How to index & retrieve data

aiXplain's indexing and retrieval is powered by aiR, our Search model. Check out the aiR concept page to learn more about the model. This guide shows how to use aiR via aiXplain's FineTune service.

This guide uses a dataset from PubMedQA and assumes you can upload a dataset to aiXplain. If you have not done this before, see the How to upload a dataset guide.

1. Embed and Index Documents

We will use aiXplain's FineTune service to create the embeddings and store them in a vector database. The steps below set up an information retrieval pipeline on the platform with a few lines of code.
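To make the embed, index, and retrieve steps concrete, here is a minimal conceptual sketch in plain Python. It is not how aiR works internally (the guide does not describe that); it swaps in a toy bag-of-words vector for a learned embedding and a Python list for a vector database. The function names `embed`, `build_index`, and `retrieve`, and the sample documents, are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Index step: store each document alongside its vector."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(index, query, num_results=2):
    """Retrieval step: rank stored documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:num_results]]


# Illustrative documents only; not taken from PubMedQA.
documents = [
    "vagus nerve stimulation in mice",
    "dietary fat and liver disease",
    "sleep patterns in adolescents",
]
index = build_index(documents)
top = retrieve(index, "vagus nerve in obese mice", num_results=1)
print(top)
```

On the platform, the FineTune job plays the role of `build_index`, and running the resulting model plays the role of `retrieve`.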

1.1 Load Search model

First, we will load the base Search model in aiXplain.

from aixplain.factories import DatasetFactory, ModelFactory, FinetuneFactory

# Load the base Search model (aiR) by its asset ID
model = ModelFactory.get("66eae6656eb56311f2595011")
model.__dict__  # inspect the model's attributes


1.2 Load dataset

Second, we will load the PubMedQA dataset.

Info: Datasets are currently private, so you must first onboard this dataset to your own aiXplain account and use the ID of your onboarded copy. Running the code below with ID 66eb0f199d7e172531616c70 as-is will result in an error.

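from aixplain.factories import DatasetFactory

# Replace this ID with the ID of the dataset onboarded to your account
dataset = DatasetFactory.get("66eb0f199d7e172531616c70")
dataset.__dict__  # inspect the dataset's attributes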


1.3 Create Finetune job

Next, create a finetune job that pairs the dataset with the Search model. The job is started in the next step.

finetune_name = "PubMedQA 5000 Samples"
finetune = FinetuneFactory.create(finetune_name, [dataset], model)
finetune.__dict__


1.4 Start Finetune job

Finally, start the job. The call returns the model being indexed, which you can poll for its status.

finetune_model = finetune.start()

status = finetune_model.check_finetune_status()
status


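import time

# Poll every 10 seconds until indexing finishes.
# Note: this loop does not exit on its own if the job never reaches "completed".
while status != "completed":
    time.sleep(10)
    status = finetune_model.check_finetune_status()
    print(f"Current status: {status}")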
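Because a plain polling loop runs forever if the job never completes, it can help to bound the wait. The sketch below is one way to do that. `wait_for_status` is a hypothetical helper, not part of the aiXplain SDK, and it assumes, as the loop in this guide does, that the status check returns a value comparable to the string "completed".

```python
import time


def wait_for_status(check_status, target="completed", interval=10.0, timeout=3600.0):
    """Poll check_status() until it returns `target` or `timeout` seconds elapse.

    check_status is any zero-argument callable, for example
    finetune_model.check_finetune_status from this guide.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = check_status()
        print(f"Current status: {status}")
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status was still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Usage with the objects from this guide (requires a started finetune job):
# wait_for_status(finetune_model.check_finetune_status, interval=10, timeout=1800)
```

Passing the status check in as a callable keeps the helper independent of the SDK, so it can be tried out with a stand-in function before pointing it at a real job.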


2. Retrieve Documents

Once indexing has completed, test the model: load the indexed model by its ID and run a query against it.

from aixplain.factories import ModelFactory

finetune_model = ModelFactory.get("66f31d0acfca28e7e420fd79")
query = "Does vagus nerve contribute to the development of steatohepatitis and obesity in phosphatidylethanolamine N-methyltransferase deficient mice?"

response = finetune_model.run(query, parameters={ "numResults": 10 })
response
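To try several queries in one go, the same `run` call can be wrapped in a small loop. This is a sketch only: `retrieve_for_queries` is a hypothetical helper, and because this guide does not describe the structure of the response, each response is stored unchanged.

```python
def retrieve_for_queries(model, queries, num_results=10):
    """Run each query against the indexed model and collect the responses by query."""
    results = {}
    for query in queries:
        # Same call as above: the query text plus the "numResults" parameter.
        results[query] = model.run(query, parameters={"numResults": num_results})
    return results


# Usage with the model loaded above:
# responses = retrieve_for_queries(finetune_model, [query], num_results=10)
```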


This process allows you to embed and index data for efficient information retrieval, enabling you to build powerful search-driven agents.
