
Indexing with Files

Not OnPrem

The aiXplain indexing service supports ingestion of files such as PDFs by automatically parsing their contents and storing them as searchable records. This enables document-level semantic search without requiring manual text extraction.

This guide walks through creating an index, ingesting files, and querying the content using natural language.

Create an Index

Create a new index that will store the ingested file contents.

from aixplain.factories import IndexFactory

index = IndexFactory.create(
    name="PDF Document Index",
    description="Index for storing parsed PDF content",
    embedding_model="6734c55df127847059324d9e"  # Replace with your embedding model ID
)

print(index.id)

Ingesting Files into the Index

Upsert Using File Path

You can pass a file path directly to the upsert() method. The file will be read, parsed, and indexed automatically.

pdf_path = "/path/to/document.pdf"

index.upsert(pdf_path)
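If you have a folder of PDFs, you can repeat the same call for each file. The sketch below assumes, as described above, that `upsert()` accepts a file path string. The helper name `upsert_pdf_directory` is hypothetical, and the helper works with any object that exposes that method.

```python
from pathlib import Path


def upsert_pdf_directory(index, directory: str) -> list[str]:
    """Upsert every PDF in `directory` by path (hypothetical helper).

    Assumes `index.upsert()` accepts a file path string, as in the
    single-file example. Returns the paths that were submitted.
    """
    folder = Path(directory)
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory}")

    submitted = []
    # Matches the lowercase ".pdf" extension only; sorted() gives a
    # deterministic ingestion order.
    for pdf in sorted(folder.glob("*.pdf")):
        index.upsert(str(pdf))
        submitted.append(str(pdf))
    return submitted


# Usage, with the `index` created earlier:
# upsert_pdf_directory(index, "/path/to/pdfs")
```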

Parse and Manually Create a Record

You can also extract the content using parse_file() and build a Record manually before upserting.

from aixplain.modules.model.record import Record

parsed_data = index.parse_file(pdf_path)

record = Record(
id="doc1",
value=parsed_data.data,
value_type="text",
attributes={"file_name": "document.pdf"}
)

index.upsert([record])
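The hard-coded `id="doc1"` works for a single file. When you ingest several files, however, each record needs a distinct ID. The sketch below derives the ID from the file path. The helper `record_fields` is hypothetical: it builds the keyword arguments used in the `Record(...)` call above and hashes the absolute path, so the same file always maps to the same ID. Whether upserting an existing ID replaces the earlier record is an assumption you should confirm for your index.

```python
import hashlib
import os


def record_fields(file_path: str, text: str) -> dict:
    """Build keyword arguments for Record(...) from a file (hypothetical helper).

    The ID is a short SHA-256 digest of the absolute path, so the same
    file always maps to the same ID and different paths are very unlikely
    to collide.
    """
    abs_path = os.path.abspath(file_path)
    digest = hashlib.sha256(abs_path.encode("utf-8")).hexdigest()[:16]
    return {
        "id": f"doc-{digest}",
        "value": text,
        "value_type": "text",
        "attributes": {"file_name": os.path.basename(file_path)},
    }


# Usage with the parsed output from above:
# record = Record(**record_fields(pdf_path, parsed_data.data))
# index.upsert([record])
```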

Prepare a Record from File

To simplify ingestion, use the built-in prepare_record_from_file() method, which parses the file and builds the Record in a single step:

record = index.prepare_record_from_file(pdf_path)
index.upsert([record])
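Because `upsert()` takes a list of records, you can prepare several files and send them together. The sketch below relies on that behavior. `upsert_files` is a hypothetical helper name, and the batch size of 20 is an arbitrary illustrative default, not a documented limit.

```python
from collections.abc import Iterable


def upsert_files(index, paths: Iterable[str], batch_size: int = 20) -> int:
    """Prepare one record per file and upsert them in batches (hypothetical helper).

    Relies on index.prepare_record_from_file() and index.upsert(list_of_records)
    as shown above. Returns the number of records submitted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batch: list = []
    total = 0
    for path in paths:
        batch.append(index.prepare_record_from_file(path))
        if len(batch) == batch_size:
            index.upsert(batch)
            total += len(batch)
            batch = []  # start a fresh list for the next batch
    # Flush the final, possibly partial, batch
    if batch:
        index.upsert(batch)
        total += len(batch)
    return total


# Usage:
# upsert_files(index, ["/path/to/a.pdf", "/path/to/b.pdf"])
```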

Search Your Index

You can now query the content of your ingested files using natural language:

response = index.search("project timeline")

response.data # Top result
response.details # Ranked results with metadata
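To compare how different phrasings retrieve content, you can run the same call once per query. This is a small sketch: `top_results` is a hypothetical helper that relies only on `search()` and `response.data` from the example above.

```python
from collections.abc import Iterable


def top_results(index, queries: Iterable[str]) -> dict:
    """Map each query to the top result returned by index.search() (hypothetical helper)."""
    results = {}
    for query in queries:
        response = index.search(query)
        results[query] = response.data  # top result, as described above
    return results


# Usage:
# for query, top in top_results(index, ["project timeline", "budget"]).items():
#     print(f"{query!r}: {top}")
```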

Count Records

To verify that your file was indexed, check how many records the index contains:

index.count()
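A stronger check compares the count before and after ingestion. This sketch assumes that `count()` returns a number. `upsert_and_verify` is a hypothetical helper. If upserting a record with an existing ID replaces the earlier entry (an assumption, not confirmed here), the reported difference can be smaller than the number of records sent.

```python
def upsert_and_verify(index, records: list) -> int:
    """Upsert records and return how many new records the index reports (hypothetical helper).

    Assumes index.count() returns a number.
    """
    before = index.count()
    index.upsert(records)
    after = index.count()
    added = int(after - before)
    if added <= 0:
        print("Warning: record count did not increase after upsert")
    return added


# Usage, with the record prepared earlier:
# upsert_and_verify(index, [record])
```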