Indexing with Files
The aiXplain indexing service supports ingestion of files such as PDFs by automatically parsing their contents and storing them as searchable records. This enables document-level semantic search without requiring manual text extraction.
This guide walks through creating an index, ingesting files, and querying the content using natural language.
Create an Index
Create a new index that will store the ingested file contents.
from aixplain.factories import IndexFactory
index = IndexFactory.create(
    name="PDF Document Index",
    description="Index for storing parsed PDF content",
    embedding_model="6734c55df127847059324d9e",  # Replace with your embedding model ID
)
print(index.id)
Ingesting Files into the Index
Upsert Using File Path
You can pass a file path directly to the upsert() method. The file will be read, parsed, and indexed automatically.
pdf_path = "/path/to/document.pdf"
index.upsert(pdf_path)
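A mistyped path is an easy error to make, so it can help to check the file before handing it to upsert(). The following is a minimal sketch using only the Python standard library. The name resolve_pdf_path is hypothetical and not part of the aiXplain SDK, and the .pdf suffix check assumes you are ingesting PDFs only.

```python
from pathlib import Path


def resolve_pdf_path(path_str: str) -> str:
    """Return an absolute path string, or raise if the file is unusable.

    Hypothetical helper for illustration; not part of the aiXplain SDK.
    """
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No file found at {path}")
    # Assumption: only PDF files are being ingested in this workflow.
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got {path.suffix or 'no extension'}")
    return str(path)
```

The returned string can then be passed to index.upsert() in place of the raw path.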
Parse and Manually Create a Record
You can also extract the content using parse_file() and build a Record manually before upserting.
from aixplain.modules.model.record import Record
parsed_data = index.parse_file(pdf_path)
record = Record(
    id="doc1",
    value=parsed_data.data,
    value_type="text",
    attributes={"file_name": "document.pdf"},
)
index.upsert([record])
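A hard-coded ID such as "doc1" works for a single file, but with many files you may prefer an ID derived from the file itself. The sketch below hashes the file's bytes to produce a stable ID. The helper record_id_for_file and the "doc-" prefix are hypothetical, and whether upserting a record with an existing ID replaces the stored one is an assumption about upsert semantics that this guide does not confirm.

```python
import hashlib
from pathlib import Path


def record_id_for_file(path: str, prefix: str = "doc-") -> str:
    """Derive a stable record ID from a file's contents.

    Hypothetical helper for illustration; not part of the aiXplain SDK.
    The same bytes always produce the same ID.
    """
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    # 16 hex characters keep the ID short while remaining unlikely to collide.
    return f"{prefix}{digest[:16]}"
```

The result can be passed as the id argument when constructing the Record.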
Prepare a Record from File
To simplify ingestion, use the built-in utility method that handles parsing and record creation:
record = index.prepare_record_from_file(pdf_path)
index.upsert([record])
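The same two calls extend naturally to a folder of documents. The sketch below collects the PDFs directly inside a directory, prepares a record for each, and upserts them in one call. The function ingest_pdf_directory is hypothetical; it takes the index as a parameter and relies only on prepare_record_from_file() and upsert() as used above. Passing several records in one list is an assumption based on upsert() accepting a list.

```python
from pathlib import Path


def ingest_pdf_directory(index, directory: str) -> int:
    """Upsert every PDF directly inside `directory`; return how many were sent.

    Hypothetical helper. `index` is assumed to expose
    prepare_record_from_file() and upsert() as shown in this guide.
    """
    pdf_paths = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    records = [index.prepare_record_from_file(str(p)) for p in pdf_paths]
    if records:
        # Assumption: upsert() accepts a list containing several records.
        index.upsert(records)
    return len(records)
```

Subdirectories are not searched; swap iterdir() for rglob("*.pdf") if you need a recursive scan.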
Search Your Index
You can now search the indexed file contents using natural language queries.
response = index.search("project timeline")
response.data # Top result
response.details # Ranked results with metadata
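When checking an index against several questions at once, a small loop keeps the results together. The sketch below is a hypothetical helper, search_many, that relies only on index.search() and response.data as shown above. It makes no assumption about the structure of response.details.

```python
def search_many(index, queries):
    """Run each query and collect the top result, keyed by query string.

    Hypothetical helper. `index` is assumed to expose search(), and each
    response is assumed to expose a `data` attribute, as in this guide.
    """
    results = {}
    for query in queries:
        response = index.search(query)
        results[query] = response.data
    return results
```

For example, search_many(index, ["project timeline", "budget"]) would return a dictionary with one entry per query.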
Count Records
Verify that your file was indexed by checking the number of records in the index:
print(index.count())
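Comparing the count before and after an upsert gives a quick check that records were actually added. The sketch below is a hypothetical helper, upsert_and_count_new, built only on upsert() and count(). It assumes count() returns a value convertible to an integer.

```python
def upsert_and_count_new(index, records) -> int:
    """Upsert `records` and return how much the record count changed.

    Hypothetical helper. Assumes index.count() returns a value that
    int() can convert.
    """
    before = int(index.count())
    index.upsert(records)
    after = int(index.count())
    return after - before
```

If an upsert updates records that already exist rather than adding new ones, the difference may be smaller than len(records).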