RAG Pipeline Integration

The problem

Retrieval-Augmented Generation (RAG) pipelines need a reliable, searchable data source that returns relevant document chunks for a given query. Building and maintaining this retrieval layer is a significant engineering effort: you need document ingestion, text extraction, chunking, embedding generation, index management, and a query API -- all before you can feed a single result to your language model.

Most teams either build this infrastructure from scratch or cobble together multiple services, resulting in fragile pipelines that are expensive to maintain and difficult to scale.

The solution

IPTO serves as a managed retrieval backend for RAG pipelines. Upload documents, and the platform handles extraction, chunking, embedding, and indexing automatically. At query time, your RAG orchestrator calls the search API to retrieve relevant chunks, then feeds those chunks to your language model as context.

Architecture

flowchart LR
    A[User Question] --> B[RAG Orchestrator]
    B --> C[IPTO Search API]
    C --> D[Retrieved Chunks + Citations]
    D --> B
    B --> E[Language Model]
    E --> F[Answer with Citations]

How it works

1. Upload documents

Providers upload documents through the presigned upload flow. IPTO accepts PDFs, plain text, images, scanned records, audio, and video files.

# Upload a single document
ipto objects upload <dataset_id> ./product-manual-v3.pdf

# Bulk upload an entire directory
ipto objects upload <dataset_id> ./documents/ --recursive

# Initiate upload
curl -X POST https://api.ipto.ai/v1/uploads \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_id": "dset_knowledge_base",
    "filename": "product-manual-v3.pdf",
    "mime_type": "application/pdf",
    "size_bytes": 2450000,
    "checksum_sha256": "a1b2c3..."
  }'
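The size_bytes and checksum_sha256 fields of the initiate request can be computed locally with the standard library before calling the endpoint. A minimal Python sketch (the helper name and placeholder bytes are illustrative, not part of IPTO):

```python
import hashlib
import json

def build_initiate_payload(dataset_id: str, filename: str, mime_type: str, data: bytes) -> dict:
    """Assemble the JSON body for the upload-initiate call from raw file bytes."""
    return {
        "dataset_id": dataset_id,
        "filename": filename,
        "mime_type": mime_type,
        "size_bytes": len(data),
        "checksum_sha256": hashlib.sha256(data).hexdigest(),
    }

# Placeholder bytes standing in for the real PDF.
payload = build_initiate_payload(
    "dset_knowledge_base",
    "product-manual-v3.pdf",
    "application/pdf",
    b"%PDF-1.7 example bytes",
)
print(json.dumps(payload, indent=2))
```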

2. Automatic indexing

After admin approval, the platform processes each uploaded object through a multi-stage pipeline:

Stage      What happens
Normalize  Classify file type; extract pages, text, and structure
Extract    Produce canonical artifacts: plain text, OCR blocks, transcripts, captions
Enrich     Generate dense embeddings and retrieval-ready chunks with positions
Index      Build immutable search segments and publish a new manifest

No manual chunking, embedding, or index management is required.
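The Enrich stage's position-aware chunking happens entirely server-side, but the idea is easy to illustrate: each chunk keeps its character offsets so a retrieved result can later be cited back to a source location. A toy sliding-window chunker (the parameters are illustrative; this page does not document IPTO's actual chunking strategy):

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 50) -> list[dict]:
    """Split text into overlapping chunks, recording character positions
    so each chunk remains traceable to its location in the source."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append({"start": start, "end": end, "text": text[start:end]})
        if end == len(text):
            break
        start = end - overlap  # step back to overlap with the previous chunk
    return chunks

chunks = chunk_text("x" * 450)
```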

3. Search via API

Your RAG orchestrator sends a search request. The search API supports multiple retrieval modes to match different query types:

Mode     Best for
lexical  Keyword-precise queries, boolean expressions, exact phrases
dense    Semantic and conceptual queries
hybrid   Natural language queries that benefit from both keyword and semantic matching
auto     Let the platform choose based on query analysis (default)

Example search request:

ipto search "How do I configure automatic failover?" \
  --datasets dset_knowledge_base --mode hybrid --output json

curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_ids": ["dset_knowledge_base"],
    "query": "How do I configure automatic failover?",
    "top_k": 5,
    "retrieval_mode": "hybrid",
    "include_snippets": true,
    "include_citations": true
  }'

Example response (abbreviated):

{
  "data": {
    "query_id": "qry_abc123",
    "results": [
      {
        "retrieval_event_id": "ret_001",
        "search_unit_id": "unit_xyz",
        "dataset_id": "dset_knowledge_base",
        "score": 14.82,
        "snippet": "Automatic failover is configured by setting the failover_mode parameter to 'auto' in the cluster configuration file...",
        "citation_locator": {
          "locator": { "page": 47 },
          "display_text": "product-manual-v3.pdf p.47"
        }
      }
    ],
    "charged_result_count": 5,
    "timing_ms": { "total": 38 }
  }
}
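An orchestrator typically reduces this response to (snippet, citation) pairs before prompting the model. A minimal Python sketch, using the abbreviated response above as sample data (the helper name is illustrative):

```python
# Abbreviated search response, matching the example shape above.
search_response = {
    "data": {
        "query_id": "qry_abc123",
        "results": [
            {
                "retrieval_event_id": "ret_001",
                "snippet": "Automatic failover is configured by setting the failover_mode parameter...",
                "citation_locator": {
                    "locator": {"page": 47},
                    "display_text": "product-manual-v3.pdf p.47",
                },
            }
        ],
    }
}

def extract_context(response: dict) -> list[tuple[str, str]]:
    """Return (snippet, citation display text) pairs for prompt assembly."""
    return [
        (r["snippet"], r["citation_locator"]["display_text"])
        for r in response["data"]["results"]
    ]

context = extract_context(search_response)
```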

4. Feed results to your language model

Your RAG orchestrator extracts the snippets and citation locators from the search response and includes them as context in your language model prompt. The model generates an answer grounded in the retrieved content.
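One common pattern is to number each chunk and attach its display_text so the model can cite sources inline. A sketch of such a prompt assembler (the function name and prompt wording are illustrative, not part of IPTO):

```python
def build_prompt(question: str, results: list[dict]) -> str:
    """Render retrieved chunks as numbered, cited context blocks,
    followed by the user's question."""
    context = "\n\n".join(
        f"[{i}] ({r['citation_locator']['display_text']})\n{r['snippet']}"
        for i, r in enumerate(results, start=1)
    )
    return (
        "Answer using only the context below, citing sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Sample result in the shape returned by the search API.
results = [{
    "snippet": "Automatic failover is configured by setting the failover_mode parameter...",
    "citation_locator": {"display_text": "product-manual-v3.pdf p.47"},
}]
prompt = build_prompt("How do I configure automatic failover?", results)
```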

5. Record citations

When your pipeline uses a retrieved result in its final output, record a citation event. This creates an auditable link between the generated answer and its source data.

curl -X POST https://api.ipto.ai/v1/retrieval-events/ret_001/cite \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "consumer_type": "agent_run",
    "consumer_id": "rag_session_20260405",
    "search_unit_id": "unit_xyz"
  }'

Benefits

Why IPTO for RAG pipelines

  • Hybrid retrieval: Combine lexical and vector search in a single API call. Hybrid mode uses Reciprocal Rank Fusion to merge keyword matches with semantic similarity results, improving recall for natural language queries.
  • Managed indexing: Upload documents and the platform handles extraction, chunking, embedding generation, and index construction. No custom ETL pipelines, embedding jobs, or index maintenance.
  • Citation tracking: Every retrieved result carries a retrieval_event_id and citation_locator (page number, timestamp, chunk ordinal). Record citation events to create an auditable chain from generated output back to source data.
  • Metered usage: Pay per retrieval and per citation. No upfront infrastructure cost for search clusters, embedding models, or vector databases.
  • Multi-format support: The ingestion pipeline handles PDFs, plain text, images with OCR, audio transcripts, and video captions. All formats are searchable through the same API.
  • Scoped access: API keys can be restricted to specific datasets, ensuring your RAG pipeline only retrieves from authorized sources.
  • Structured query syntax: Beyond simple natural language queries, the search API supports boolean operators, phrase matching, proximity search, wildcards, and field-scoped queries for precise retrieval.
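The Reciprocal Rank Fusion used by hybrid mode can be sketched in a few lines. This is standard RRF with the commonly used k = 60 constant; IPTO's actual fusion parameters are not documented on this page:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank))
    over every list it appears in, then documents are sorted by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# doc_b ranks high in both lists, so it tops the fused ranking
# even though it is first in neither.
lexical = ["doc_a", "doc_b", "doc_c"]
dense = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([lexical, dense])
```

Because scores depend only on ranks, not raw scores, RRF merges lexical and dense results without needing to calibrate their incomparable scoring scales.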