How to index & retrieve data
aiXplain's indexing and retrieval are powered by aiR, our Search model. Check out the aiR concept page to learn more about the model. This guide shows how to use aiR via aiXplain's FineTune service.
For this guide, we will use a dataset from PubMedQA. The guide assumes you can upload a dataset to aiXplain. Please visit the How to upload a dataset guide to learn how.
1. Embed and Index Documents
We will use aiXplain's FineTune service to create the embeddings and store them in a vector database. A few lines of code are enough to set up information retrieval on the platform.
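To make the embed-and-index idea concrete, here is a minimal, self-contained sketch of what such a pipeline does conceptually. It is not aiR or the aiXplain API: the `embed`, `build_index`, and `search` helpers are hypothetical names, and a bag-of-words count vector stands in for a learned embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Index step: embed every document once and keep the vectors."""
    return [(doc, embed(doc)) for doc in documents]


def search(index, query, num_results=2):
    """Retrieval step: embed the query and rank documents by similarity."""
    q = embed(query)
    scored = [(cosine(q, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


docs = [
    "vagus nerve signalling in mice",
    "dietary fibre and gut health",
    "obesity and steatohepatitis in mice",
]
index = build_index(docs)
top = search(index, "steatohepatitis in obese mice", num_results=1)
print(top)
```

In the rest of this guide, the platform handles both steps for you: the FineTune job plays the role of `build_index`, and running the resulting model plays the role of `search`.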
1.1 Load Search model
First, we will load the base Search model in aiXplain.
from aixplain.factories import DatasetFactory, ModelFactory, FinetuneFactory
from aixplain.enums import Function, Language
model = ModelFactory.get("66eae6656eb56311f2595011")
model.__dict__
1.2 Load dataset
Second, we will load the PubMedQA dataset.
Datasets are currently private, so you must first onboard this dataset to your personal aiXplain account and use the ID of your onboarded copy. (Running the code below with ID 66eb0f199d7e172531616c70 as-is would result in an error.)
from aixplain.factories import DatasetFactory
dataset = DatasetFactory.get("66eb0f199d7e172531616c70")
dataset.__dict__
1.3 Create Finetune job
Let's create a finetune job to start the indexing process.
finetune_name = "PubMedQA 5000 Samples"
finetune = FinetuneFactory.create(finetune_name, [dataset], model)
finetune.__dict__
1.4 Start Finetune job
Finally, let's start the process.
finetune_model = finetune.start()
status = finetune_model.check_finetune_status()
status
import time
# Poll every 10 seconds until the job reports "completed"
while status != "completed":
    status = finetune_model.check_finetune_status()
    print(f"Current status: {status}")
    time.sleep(10)
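The loop above never exits if the job stalls or ends in a state other than "completed". As a hedged sketch, the polling can be wrapped in a helper with a timeout. `wait_for_status` is a hypothetical helper, not part of the aiXplain SDK, and the "in_progress" value in the demo is an invented placeholder status.

```python
import time


def wait_for_status(check_status, target="completed", interval=10.0, timeout=3600.0):
    """Poll check_status() until it returns target or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        status = check_status()
        print(f"Current status: {status}")
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Stand-in for finetune_model.check_finetune_status, used only for this demo.
_fake_statuses = iter(["in_progress", "in_progress", "completed"])
final_status = wait_for_status(lambda: next(_fake_statuses), interval=0.01, timeout=5)
```

With a real job you could call `wait_for_status(finetune_model.check_finetune_status)`, assuming the returned status compares equal to "completed" as it does in the loop above.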
2. Retrieve Documents
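Let's test our model. We load the fine-tuned model by its ID and run a query against it, asking for 10 results through the numResults parameter.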
from aixplain.factories import ModelFactory
finetune_model = ModelFactory.get("66f31d0acfca28e7e420fd79")
query = "Does vagus nerve contribute to the development of steatohepatitis and obesity in phosphatidylethanolamine N-methyltransferase deficient mice?"
response = finetune_model.run(query, parameters={ "numResults": 10 })
response
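To try several queries in one go, the same `run` call can be wrapped in a small helper. This is a sketch only: `run_queries` is a hypothetical helper, and `StubModel` is a stand-in that mimics just the `run(query, parameters=...)` call used above so the example runs offline. The dictionary it returns is invented and does not reflect the real response format.

```python
def run_queries(model, queries, num_results=10):
    """Run each query against model and collect the raw responses by query."""
    results = {}
    for query in queries:
        results[query] = model.run(query, parameters={"numResults": num_results})
    return results


class StubModel:
    """Hypothetical stand-in that mimics only the run() call used in this guide."""

    def run(self, query, parameters=None):
        parameters = parameters or {}
        # Echo the inputs back; a real model would return retrieved documents.
        return {"query": query, "numResults": parameters.get("numResults")}


responses = run_queries(StubModel(), ["vagus nerve", "steatohepatitis"], num_results=3)
for query, response in responses.items():
    print(query, "->", response)
```

Passing `finetune_model` in place of `StubModel()` should issue one retrieval request per query.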
This process allows you to embed and index data for efficient information retrieval, enabling you to build powerful search-driven agents.