Indexing with Files

Not OnPrem

The aiXplain indexing service supports ingestion of files such as PDFs by automatically parsing their contents and storing them as searchable records. This enables document-level semantic search without requiring manual text extraction.

This guide walks through creating an index, ingesting files, and querying the content using natural language.

Create an Index

Create a new index that will store the ingested file contents.

from aixplain.factories import IndexFactory

index = IndexFactory.create(
    name="PDF Document Index",
    description="Index for storing parsed PDF content",
    embedding_model="6734c55df127847059324d9e"  # Replace with your embedding model ID
)

print(index.id)

Ingesting Files into the Index

Upsert Using File Path

You can pass a file path directly to the upsert() method. The file will be read, parsed, and indexed automatically.

pdf_path = "/path/to/document.pdf"

index.upsert(pdf_path)
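
A mistyped path is an easy mistake to make at this step. The sketch below checks that the file exists before handing it to upsert(). The helper name upsert_file is hypothetical and not part of the aiXplain SDK. It only assumes that index.upsert() accepts a path string, as in the call above.

```python
from pathlib import Path


def upsert_file(index, path):
    """Check that a local file exists, then pass its path to index.upsert().

    `upsert_file` is an illustrative helper, not an SDK function. `index` is
    any object exposing upsert(path), such as the index created earlier.
    """
    file_path = Path(path).expanduser()
    if not file_path.is_file():
        # Fail early with a clear message instead of sending a bad path on.
        raise FileNotFoundError(f"No such file: {file_path}")
    # Pass the path as a string, matching the call form shown above.
    return index.upsert(str(file_path))
```

With this in place, upsert_file(index, pdf_path) can stand in for the direct index.upsert(pdf_path) call.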

Parse and Manually Create a Record

You can also extract the content using parse_file() and build a Record manually before upserting.

from aixplain.modules.model.record import Record

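# pdf_path is the file path defined in the previous snippet.
# The extracted text is available on the .data attribute of the result.
parsed_data = index.parse_file(pdf_path)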

record = Record(
    id="doc1",
    value=parsed_data.data,
    value_type="text",
    attributes={"file_name": "document.pdf"}
)

index.upsert([record])
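
When building records by hand, a hard-coded id such as "doc1" is easy to reuse by accident. One option, sketched here, is to derive the id from the file's bytes so that the same file always maps to the same id. The helper file_record_id is hypothetical. Whether upsert() overwrites an existing record with a matching id is an assumption suggested by the method name and is not confirmed in this guide.

```python
import hashlib
from pathlib import Path


def file_record_id(path, prefix="doc"):
    """Return an ID built from a SHA-256 digest of the file's contents.

    Illustrative helper, not part of the aiXplain SDK. Identical files
    produce identical IDs; any change to the bytes produces a new ID.
    """
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    # Keep the first 16 hex characters so the ID stays short.
    return f"{prefix}-{digest[:16]}"
```

The result can be passed as the id argument of the Record shown above, for example id=file_record_id(pdf_path).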

Prepare a Record from File

To simplify ingestion, use the built-in utility method that handles parsing and record creation:

record = index.prepare_record_from_file(pdf_path)
index.upsert([record])
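
The same two calls extend naturally to a folder of documents. The following is a minimal sketch, assuming that upsert() accepts a list containing several records (the guide only shows a single-item list) and that prepare_record_from_file() takes a path string. The helper ingest_directory and its pattern parameter are hypothetical.

```python
from pathlib import Path


def ingest_directory(index, directory, pattern="*.pdf"):
    """Prepare one record per matching file and upsert them in one call.

    Illustrative helper, not part of the aiXplain SDK. Returns the number
    of records that were prepared.
    """
    # Sort so the ingestion order is stable across runs.
    paths = sorted(Path(directory).glob(pattern))
    records = [index.prepare_record_from_file(str(p)) for p in paths]
    if records:
        # Skip the call entirely when nothing matched the pattern.
        index.upsert(records)
    return len(records)
```

Calling ingest_directory(index, "/path/to/folder") would then ingest every PDF directly inside that folder; subfolders are not searched.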

Search Your Index

You can now search your file-ingested index using natural language queries.

response = index.search("project timeline")

print(response.data)     # Top result
print(response.details)  # Ranked results with metadata

Count Records

Verify that your file was successfully indexed:

print(index.count())
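
To confirm that a particular upsert added records, the count can be compared before and after the call. This is a sketch under two assumptions not stated in this guide: that count() returns an integer, and that new records are visible to count() as soon as upsert() returns. The helper name upsert_and_confirm is hypothetical.

```python
def upsert_and_confirm(index, records):
    """Upsert records and return how much the index count changed.

    Illustrative helper, not part of the aiXplain SDK. A result of 0 does
    not always mean failure: replacing an existing record may leave the
    count unchanged.
    """
    before = index.count()
    index.upsert(records)
    after = index.count()
    return after - before
```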
