Index & Retrieve
This guide walks through creating an index, adding records, and running search queries with the IndexFactory, using a small synthetic dataset for demonstration.
1. Create a Synthetic Dataset
The dataset is a list of five sample records, each with an id, a text field, and a category.
synthetic_data = [
    {
        "id": "doc1",
        "text": "Artificial intelligence is transforming industries worldwide, from healthcare to finance.",
        "category": "technology"
    },
    {
        "id": "doc2",
        "text": "The Mona Lisa, painted by Leonardo da Vinci, is one of the most famous artworks in history.",
        "category": "art"
    },
    {
        "id": "doc3",
        "text": "Machine learning algorithms are being used to predict patient outcomes in hospitals.",
        "category": "technology"
    },
    {
        "id": "doc4",
        "text": "The Earth orbits the Sun once every 365.25 days, creating the calendar year.",
        "category": "science"
    },
    {
        "id": "doc5",
        "text": "Quantum computing promises to solve complex problems that are currently intractable for classical computers.",
        "category": "technology"
    },
]
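Before indexing, it can help to confirm that every record carries the fields used later in this guide and that IDs are unique, since upserts are keyed by ID. The following is a minimal sketch: `validate_records` and `REQUIRED_KEYS` are hypothetical helpers, not part of the aiXplain SDK, and the two sample records stand in for `synthetic_data`.

```python
REQUIRED_KEYS = {"id", "text", "category"}


def validate_records(records):
    """Return a list of human-readable problems found in `records`."""
    problems = []
    seen_ids = set()
    for position, record in enumerate(records):
        # Report any field this guide relies on that the record lacks.
        missing = REQUIRED_KEYS - set(record)
        if missing:
            problems.append(f"record {position}: missing keys {sorted(missing)}")
        # Report IDs that already appeared earlier in the list.
        record_id = record.get("id")
        if record_id is not None:
            if record_id in seen_ids:
                problems.append(f"record {position}: duplicate id {record_id!r}")
            seen_ids.add(record_id)
    return problems


sample = [
    {"id": "doc1", "text": "First text.", "category": "technology"},
    {"id": "doc1", "text": "Second text."},  # duplicate id, no category
]
for problem in validate_records(sample):
    print(problem)
```

Calling `validate_records(synthetic_data)` on the dataset above should return an empty list.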
2. Create an Index
Next, we will create an index using the IndexFactory.
from aixplain.factories import IndexFactory
# Create an index
index_name = "Synthetic Index"
index_description = "Index for synthetic dataset."
index = IndexFactory.create(index_name, index_description)
If you already have an index, you can list your indexes and get one by its ID:
index_list = IndexFactory.list()['results']
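# Use a separate loop variable so the `index` created above is not overwritten
for existing_index in index_list:
    print(existing_index.id, existing_index.name)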
index = IndexFactory.get("678a6dd10c3d32001d119a10")
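If you know an index by name rather than by ID, one option is to scan the listed results for a matching `name`. This is a minimal sketch: `find_index_by_name` is a hypothetical helper, and `SimpleNamespace` objects with made-up IDs stand in for the listed indexes so the example runs without an API key. It assumes only the `id` and `name` attributes printed above.

```python
from types import SimpleNamespace


def find_index_by_name(indexes, name):
    """Return the first item whose `name` matches, or None if absent."""
    for candidate in indexes:
        if candidate.name == name:
            return candidate
    return None


# Stand-ins for the objects returned by IndexFactory.list()['results'];
# the IDs below are made up for illustration.
listed = [
    SimpleNamespace(id="idx-001", name="Other Index"),
    SimpleNamespace(id="idx-002", name="Synthetic Index"),
]

match = find_index_by_name(listed, "Synthetic Index")
print(match.id if match else "not found")
```

With a real account, you would pass `IndexFactory.list()['results']` in place of `listed`.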
3. Prepare and Upsert Records
We will prepare the records and upsert them into the index. The upsert method allows you to add new records or update existing ones using their IDs.
from aixplain.modules.model.record import Record
# Prepare the records
records = [
    Record(
        value=item["text"],
        value_type="text",
        id=item["id"],
        uri="",
        attributes={"category": item["category"]}
    ) for item in synthetic_data
]
# Upsert records to the index
index.upsert(records)
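To make the add-or-update behavior concrete, here is a toy model of upsert semantics using a plain dictionary keyed by ID. It is only an illustration of the idea described above, not how the aiXplain index stores records; the `upsert` function and `store` dictionary are hypothetical.

```python
def upsert(store, records):
    """Insert or overwrite each record in `store`, keyed by its "id"."""
    for record in records:
        store[record["id"]] = record
    return store


store = {}
upsert(store, [{"id": "doc1", "text": "original"}, {"id": "doc2", "text": "other"}])
upsert(store, [{"id": "doc1", "text": "revised"}])  # same ID: updates in place

print(len(store))            # two records, not three
print(store["doc1"]["text"])  # the later text replaced the earlier one
```

The practical consequence is that reusing an ID replaces the earlier record instead of creating a second one.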
4. Count the Records in your Index
You can count the number of records in your index to verify that they have been successfully upserted.
count = index.count()
print(count)
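The count can also be checked programmatically against the number of records you expected to upsert. The sketch below assumes that `count()` returns a value convertible to `int`, which this guide does not confirm; `check_count` and `FakeIndex` are hypothetical names, and `FakeIndex` exists only so the example runs offline.

```python
def check_count(index, expected):
    """Compare index.count() with `expected` and return (ok, actual)."""
    actual = int(index.count())  # assumption: the result converts to int
    return actual == expected, actual


class FakeIndex:
    """Stand-in exposing only count(), so the sketch runs offline."""

    def __init__(self, size):
        self._size = size

    def count(self):
        return self._size


ok, actual = check_count(FakeIndex(5), expected=5)
print(ok, actual)
```

With the real index from this guide, the call would look like `check_count(index, len(synthetic_data))`.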
5. Perform a Search Query
Next, we will run a search query on the index and print the results.
import json
# Perform a search query
query = "Healthcare technology"
response = index.search(query, top_k=3)
# Print the search results
print(json.dumps(response.details, indent=4))
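To illustrate what a `top_k` cutoff does, the sketch below ranks a few documents with a toy word-overlap score and keeps only the best `top_k`. This is a deliberately simple stand-in and does not reproduce the retrieval method the index uses; `tokenize`, `toy_search`, and the shortened documents are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase `text` and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_search(query, documents, top_k=3):
    """Rank documents by shared words with the query; keep the best top_k."""
    query_words = tokenize(query)
    scored = [
        (len(query_words & tokenize(doc["text"])), doc["id"])
        for doc in documents
    ]
    # Highest score first; ties fall back to ascending id order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


documents = [
    {"id": "doc1", "text": "AI is transforming healthcare and finance."},
    {"id": "doc2", "text": "The Mona Lisa is a famous painting."},
    {"id": "doc3", "text": "Healthcare technology helps hospitals."},
]
print(toy_search("Healthcare technology", documents, top_k=2))
```

Raising `top_k` returns more of the ranked list; it does not change the order of the results that were already included.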
6. Delete Indexes
If you need to delete an index, you can use the delete method.
index.delete()