Overview

aiXplain's Index & Retrieval system, aiR, allows you to embed and index data for efficient information retrieval, enabling you to build powerful search-driven agents. The system is built on Haystack, an established and well-maintained IR engine, for better maintainability, flexibility, and community support.

aiR is accessible only through the IndexFactory. Learn more in our guide on how to Index and Retrieve.

Functionality

The system enables the following actions:

  1. list and get your indexes.

  2. create an empty index.

  3. upsert to ingest (update or add) raw text documents (records) and convert them into vector representations (embeddings). The process involves two steps:

    • Embedding Generation: A neural network, such as a transformer-based model (e.g., BERT, GPT), converts the preprocessed text into dense vector embeddings. These embeddings capture the semantic meaning of the text.
    • Vector Storage: The generated vectors are then stored in a vector database (Qdrant).
  4. count the number of records inside an index.

  5. search to return the most relevant records to a user query from the vector database. The process involves three steps:

    • Query Embedding: The same embedding model used for indexing generates a vector representation of the query.
    • Similarity Search: The query vector is compared to the stored record vectors using a similarity metric (e.g., cosine similarity, Euclidean distance). The vector database quickly identifies the nearest neighbors (most similar records).
    • Result Retrieval: The records corresponding to the closest vectors are retrieved and returned as the search results.
  6. delete an index or a record.
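The upsert and search flows above can be sketched end to end. This is a minimal illustrative toy, not aiR's implementation: the character-sum `embed` function stands in for the real transformer-based embedding model, the in-memory `ToyIndex` class stands in for Qdrant, and all names here are hypothetical.

```python
import math

def embed(text, dim=8):
    # Toy stand-in for a neural embedding model (aiR uses a real
    # transformer-based model; this only illustrates the data flow).
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit-length vector

class ToyIndex:
    """In-memory stand-in for the vector database (Qdrant in aiR)."""

    def __init__(self):
        self.records = {}  # record id -> (text, embedding)

    def upsert(self, record_id, text):
        # Embedding generation + vector storage; re-using an id updates
        # the stored record, which is what makes this an upsert.
        self.records[record_id] = (text, embed(text))

    def count(self):
        return len(self.records)

    def search(self, query, top_k=2):
        # 1. Query embedding: same model as at indexing time.
        q = embed(query)
        # 2. Similarity search: cosine similarity (a plain dot product,
        #    since all vectors are unit length).
        scored = [
            (sum(a * b for a, b in zip(q, vec)), record_id, text)
            for record_id, (text, vec) in self.records.items()
        ]
        # 3. Result retrieval: return the top-k records, best first.
        return sorted(scored, reverse=True)[:top_k]
```

Note that calling `upsert` twice with the same id keeps the record count unchanged and replaces the stored vector, mirroring the update-or-add semantics described above.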

Frequently Asked Questions (FAQs)

How large can a single record be?

A record can be of any size; record size is not a limiting factor.

How many records (dataset size) can it handle?

It has been tested on datasets of up to 10,000 samples, and it can handle larger datasets.

What is the latency of indexing?

Indexing latency from our tests:

  • 100-sample dataset: 3 seconds
  • 5,000-sample dataset: 90 seconds
  • 10,000-sample dataset: 190 seconds
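These figures imply an indexing throughput of roughly 33–56 records per second, i.e. close to linear scaling with dataset size:

```python
# Indexing throughput implied by the latency figures above.
latency = {100: 3, 5_000: 90, 10_000: 190}  # dataset size -> seconds
for size, seconds in latency.items():
    print(f"{size:>6} records / {seconds:>3}s ~ {size / seconds:.1f} records/s")
```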

What is the latency of retrieval?

Less than a second.

Does the retrieval step include ranking?

Yes, the retrieved items are ranked and given scores.

What embeddings does aiR use? Can they be switched?

For now it uses OpenAI's text-embedding-ada-002. We intend to make the choice of embedding model a parameter in the future.

What vector database does aiR use?

Qdrant

Future work

  1. Allow the user to choose the embedding model.
  2. Allow the user to chunk their records during ingestion.
  3. Allow the user to choose the ranking metric (cosine similarity, dot product, L2 distance, etc.).
  4. Allow the user to choose the vector database.
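The candidate ranking metrics in item 3 behave differently with respect to vector magnitude, which is why the choice matters. A minimal sketch (plain Python, for illustration only):

```python
import math

def cosine(a, b):
    # Cosine similarity: direction only, magnitude is normalized away.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def dot_product(a, b):
    # Dot product: grows with vector magnitude.
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    # L2 (Euclidean) distance: lower means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q = [1.0, 0.0]
v_short = [1.0, 0.0]   # same direction, unit length
v_long = [10.0, 0.0]   # same direction, larger magnitude

print(cosine(q, v_short), cosine(q, v_long))          # both 1.0
print(dot_product(q, v_short), dot_product(q, v_long))  # 1.0 vs 10.0
print(l2(q, v_short), l2(q, v_long))                    # 0.0 vs 9.0
```

Cosine similarity ranks the two candidate vectors identically, while dot product and L2 distance separate them by magnitude; for unit-normalized embeddings, cosine similarity and dot product produce the same ranking.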