RAG Pipeline Integration¶
The problem¶
Retrieval-Augmented Generation (RAG) pipelines need a reliable, searchable data source that returns relevant document chunks for a given query. Building and maintaining this retrieval layer is a significant engineering effort: you need document ingestion, text extraction, chunking, embedding generation, index management, and a query API -- all before you can feed a single result to your language model.
Most teams either build this infrastructure from scratch or cobble together multiple services, resulting in fragile pipelines that are expensive to maintain and difficult to scale.
The solution¶
IPTO serves as a managed retrieval backend for RAG pipelines. Upload documents, and the platform handles extraction, chunking, embedding, and indexing automatically. At query time, your RAG orchestrator calls the search API to retrieve relevant chunks, then feeds those chunks to your language model as context.
Architecture¶
flowchart LR
A[User Question] --> B[RAG Orchestrator]
B --> C[IPTO Search API]
C --> D[Retrieved Chunks + Citations]
D --> B
B --> E[Language Model]
E --> F[Answer with Citations]
How it works¶
1. Upload documents¶
Providers upload documents through the presigned upload flow. IPTO accepts PDFs, plain text, images, scanned records, audio, and video files.
# Initiate upload
curl -X POST https://api.ipto.ai/v1/uploads \
-H "Authorization: Bearer $SESSION_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"dataset_id": "dset_knowledge_base",
"filename": "product-manual-v3.pdf",
"mime_type": "application/pdf",
"size_bytes": 2450000,
"checksum_sha256": "a1b2c3..."
}'
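The initiation call can also be scripted. The sketch below is a hypothetical Python helper (function name is illustrative, not part of the IPTO SDK) that computes the SHA-256 checksum and builds the request body matching the fields in the example above:

```python
import hashlib
import json

def build_upload_init(dataset_id: str, filename: str,
                      mime_type: str, content: bytes) -> dict:
    """Build the JSON body for POST /v1/uploads from raw file bytes."""
    return {
        "dataset_id": dataset_id,
        "filename": filename,
        "mime_type": mime_type,
        "size_bytes": len(content),
        "checksum_sha256": hashlib.sha256(content).hexdigest(),
    }

body = build_upload_init("dset_knowledge_base", "product-manual-v3.pdf",
                         "application/pdf", b"%PDF-1.7 example bytes")
print(json.dumps(body, indent=2))
```

Computing the checksum client-side lets the platform verify the object arrived intact before indexing begins.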
2. Automatic indexing¶
After admin approval, the platform processes each uploaded object through a multi-stage pipeline:
| Stage | What happens |
|---|---|
| Normalize | Classify file type; extract pages, text, and structure |
| Extract | Produce canonical artifacts: plain text, OCR blocks, transcripts, captions |
| Enrich | Generate dense embeddings, retrieval-ready chunks with positions |
| Index | Build immutable search segments and publish a new manifest |
No manual chunking, embedding, or index management is required.
3. Search via API¶
Your RAG orchestrator sends a search request. The search API supports multiple retrieval modes to match different query types:
| Mode | Best for |
|---|---|
| lexical | Keyword-precise queries, boolean expressions, exact phrases |
| dense | Semantic and conceptual queries |
| hybrid | Natural language queries that benefit from both keyword and semantic matching |
| auto | Let the platform choose based on query analysis (default) |
Example search request:
curl -X POST https://api.ipto.ai/v1/search \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"dataset_ids": ["dset_knowledge_base"],
"query": "How do I configure automatic failover?",
"top_k": 5,
"retrieval_mode": "hybrid",
"include_snippets": true,
"include_citations": true
}'
Example response (abbreviated):
{
"data": {
"query_id": "qry_abc123",
"results": [
{
"retrieval_event_id": "ret_001",
"search_unit_id": "unit_xyz",
"dataset_id": "dset_knowledge_base",
"score": 14.82,
"snippet": "Automatic failover is configured by setting the failover_mode parameter to 'auto' in the cluster configuration file...",
"citation_locator": {
"locator": { "page": 47 },
"display_text": "product-manual-v3.pdf p.47"
}
}
],
"charged_result_count": 5,
"timing_ms": { "total": 38 }
}
}
4. Feed results to your language model¶
Your RAG orchestrator extracts the snippets and citation locators from the search response and includes them as context in your language model prompt. The model generates an answer grounded in the retrieved content.
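This step is plain client-side string assembly. A minimal sketch, assuming the response shape shown in step 3 (the helper name and prompt wording are illustrative, not prescribed by the platform):

```python
def build_grounded_prompt(question: str, results: list) -> str:
    """Pair each retrieved snippet with its citation label and
    assemble a context-grounded prompt for the language model.

    `results` follows the shape of data.results in the search
    response above.
    """
    context_lines = []
    for i, r in enumerate(results, start=1):
        label = r["citation_locator"]["display_text"]
        context_lines.append(f"[{i}] ({label}) {r['snippet']}")
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed numbers.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

results = [{
    "retrieval_event_id": "ret_001",
    "snippet": "Automatic failover is configured by setting the "
               "failover_mode parameter to 'auto'...",
    "citation_locator": {"display_text": "product-manual-v3.pdf p.47"},
}]
prompt = build_grounded_prompt("How do I configure automatic failover?", results)
```

Keeping the citation label next to each snippet lets the model cite sources inline, which your pipeline can later map back to retrieval_event_id values for step 5.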
5. Record citations¶
When your pipeline uses a retrieved result in its final output, record a citation event. This creates an auditable link between the generated answer and its source data.
curl -X POST https://api.ipto.ai/v1/retrieval-events/ret_001/cite \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"consumer_type": "agent_run",
"consumer_id": "rag_session_20260405",
"search_unit_id": "unit_xyz"
}'
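A pipeline typically cites several results per answer. One way to batch this client-side is to build a cite request per used result, as in this hypothetical sketch (the function is illustrative; the URL and body fields follow the curl example above):

```python
def build_citation_requests(used_results: list, consumer_id: str,
                            base_url: str = "https://api.ipto.ai") -> list:
    """For each retrieved result actually used in the final answer,
    build the (url, body) pair for the cite endpoint shown above."""
    out = []
    for r in used_results:
        url = f"{base_url}/v1/retrieval-events/{r['retrieval_event_id']}/cite"
        body = {
            "consumer_type": "agent_run",
            "consumer_id": consumer_id,
            "search_unit_id": r["search_unit_id"],
        }
        out.append((url, body))
    return out

reqs = build_citation_requests(
    [{"retrieval_event_id": "ret_001", "search_unit_id": "unit_xyz"}],
    consumer_id="rag_session_20260405",
)
```

Recording citations only for results that made it into the final output (rather than every retrieved chunk) keeps the audit trail meaningful and, since usage is metered per citation, avoids paying for unused retrievals.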
Benefits¶
Why IPTO for RAG pipelines
- Hybrid retrieval: Combine lexical and vector search in a single API call. Hybrid mode uses Reciprocal Rank Fusion to merge keyword matches with semantic similarity results, improving recall for natural language queries.
- Managed indexing: Upload documents and the platform handles extraction, chunking, embedding generation, and index construction. No custom ETL pipelines, embedding jobs, or index maintenance.
- Citation tracking: Every retrieved result carries a retrieval_event_id and a citation_locator (page number, timestamp, chunk ordinal). Record citation events to create an auditable chain from generated output back to source data.
- Metered usage: Pay per retrieval and per citation. No upfront infrastructure cost for search clusters, embedding models, or vector databases.
- Multi-format support: The ingestion pipeline handles PDFs, plain text, images with OCR, audio transcripts, and video captions. All formats are searchable through the same API.
- Scoped access: API keys can be restricted to specific datasets, ensuring your RAG pipeline only retrieves from authorized sources.
- Structured query syntax: Beyond simple natural language queries, the search API supports boolean operators, phrase matching, proximity search, wildcards, and field-scoped queries for precise retrieval.