Image-based indexing
This guide walks you through creating an index of images, adding records, and running search queries with IndexFactory. We use a small sample dataset of IKEA manual pages to demonstrate.
1. Prepare your Dataset
Prepare your image-based dataset. Here we work with the 2015 IKEA manuals, extracted from the FETA dataset, for demonstration purposes. The manual pages are stored as images in an Amazon S3 bucket, and each image represents a single page of a manual.
2. Create an Index
Next, we will create an index using IndexFactory. We will use the Jina Clip v2 (Multimodal) model as the embedding model.
from aixplain.enums import DataType, EmbeddingModel
from aixplain.modules.model.record import Record
from aixplain.factories import IndexFactory
from uuid import uuid4
from tqdm import tqdm
index = IndexFactory.create(
    name="IKEA Manuals Demo",
    description="Collection of IKEA manuals",
    embedding_model=EmbeddingModel.JINA_CLIP_V2_MULTIMODAL  # Jina Clip v2 (Multimodal)
)
If you already have an index, you can list your indexes and get one by its ID:
index_list = IndexFactory.list()['results']
for existing_index in index_list:
    print(existing_index.id, existing_index.name)
index = IndexFactory.get("67d34b3654c9cf001d59ef5a")
3. Prepare and Upsert Records
We will prepare the records and upsert them into the index. The upsert method allows you to add new records or update existing ones using their IDs. Each record will use the IKEA manual page as the image URI (value_type="image").
# Source data
url = "https://aixplain-platform-assets.s3.us-east-1.amazonaws.com/samples/tests/IKEA_US_2015/IKEA_US_2015_Page_{page}.jpg"
records = []
for page in tqdm(range(1, 166)):
    image_url = url.format(page=str(page).zfill(3))
    records.append(Record(
        uri=image_url,
        value_type=str(DataType.IMAGE),  # DataType.IMAGE for image-based records
        attributes={"manual": image_url.split("/")[-1].replace(".jpg", "")}
    ))
    if page % 10 == 0:
        index.upsert(records)  # Insert the current batch of 10 records into the index
        records = []
# Upsert whatever is left over from the last, partial batch
if len(records) > 0:
    index.upsert(records)
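To see how the batching in the loop above behaves without calling the API, the sketch below reproduces the same page numbering and batch boundaries in plain Python. The helper names page_urls, manual_name, and batched are hypothetical and not part of the aiXplain SDK; the batch size of 10 and the 165 pages mirror the loop above.

```python
from typing import Iterator, List

# Same URL template as in the upsert example above.
URL_TEMPLATE = (
    "https://aixplain-platform-assets.s3.us-east-1.amazonaws.com/"
    "samples/tests/IKEA_US_2015/IKEA_US_2015_Page_{page}.jpg"
)


def page_urls(first: int = 1, last: int = 165) -> List[str]:
    """Build one URL per page; page numbers are zero-padded to 3 digits."""
    return [URL_TEMPLATE.format(page=str(p).zfill(3)) for p in range(first, last + 1)]


def manual_name(image_url: str) -> str:
    """Derive the "manual" attribute: the file name without the .jpg suffix."""
    return image_url.split("/")[-1].replace(".jpg", "")


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


urls = page_urls()
batches = list(batched(urls, 10))
batch_sizes = [len(b) for b in batches]

print(len(urls))             # total number of pages
print(len(batches))          # number of upsert calls the loop would make
print(batch_sizes[-1])       # size of the final, partial batch
print(manual_name(urls[0]))  # attribute value for the first page
```

With 165 pages and a batch size of 10, this gives 16 full batches and one final batch of 5 pages, which is why the trailing check for leftover records after the loop is needed.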
4. Count the Records in Your Index
You can count the number of records in your index to verify that they have been successfully upserted:
count = index.count()
print(count)
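As a quick sanity check, you can compare the reported count against the number of pages you upserted. The sketch below is a plain-Python helper with a hypothetical name, check_count. It assumes that index.count() returns a number (or a numeric string) and that the index holds only the 165 records from this guide.

```python
EXPECTED_RECORDS = len(range(1, 166))  # pages 1..165 from the upsert loop


def check_count(actual, expected: int = EXPECTED_RECORDS) -> str:
    """Return a short status message comparing actual and expected record counts."""
    # Assumption: the count may arrive as an int, a float, or a numeric string.
    actual = int(float(actual))
    if actual == expected:
        return f"OK: {actual} records indexed"
    if actual < expected:
        return f"Missing {expected - actual} records ({actual}/{expected})"
    return f"Index holds {actual - expected} more records than expected"


# Stand-in value for illustration; in the guide you would pass index.count().
print(check_count(165))
```

A lower count than expected would suggest that one of the batches was not upserted, in which case re-running the upsert for those pages is a reasonable first step.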
5. Perform a Search Query
Next, we will perform a search query on the index and print the results. You can pass in a text query (e.g., “Show me chairs”) or an image URL to perform similarity-based image retrieval.
response = index.search("brown chair", top_k=10)
print(response)
6. Delete Indexes
If you need to delete an index, you can use the delete method.
index.delete()