
Searching Data

This guide walks through the buyer and agent search workflow: creating a scoped API key, running search queries, applying filters, choosing retrieval modes, paginating results, and downloading full objects.

Prerequisites

  • An IPTO account with buyer access to one or more datasets.
  • An API key with search:query scope. If you do not have one yet, Step 1 below shows how to create one.

Step 1: Create a scoped API key

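Create an API key that carries only the scopes your search agent needs. For a read-only search workflow, search:query and datasets:read are sufficient. Key creation is authenticated with your account bearer token, shown as $TOKEN in the examples below.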

ipto keys create --name "search-agent-prod" \
  --scopes search:query,datasets:read
curl -X POST https://api.ipto.ai/v1/api-keys \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "search-agent-prod",
    "scopes": ["search:query", "datasets:read"],
    "dataset_access_mode": "all_available"
  }'
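import os

import requests

BASE = "https://api.ipto.ai"
# Account bearer token for key creation (same value as $TOKEN in the curl example)
token = os.environ["TOKEN"]
headers = {"Authorization": f"Bearer {token}"}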

resp = requests.post(
    f"{BASE}/v1/api-keys",
    headers=headers,
    json={
        "name": "search-agent-prod",
        "scopes": ["search:query", "datasets:read"],
        "dataset_access_mode": "all_available",
    },
)
resp.raise_for_status()
key = resp.json()["data"]
api_secret = key["secret"]
print(f"Save this secret: {api_secret}")
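const BASE = "https://api.ipto.ai";
// Account bearer token for key creation (same value as $TOKEN in the curl example)
const token = process.env.TOKEN;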
const headers = {
  Authorization: `Bearer ${token}`,
  "Content-Type": "application/json",
};

const keyRes = await fetch(`${BASE}/v1/api-keys`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "search-agent-prod",
    scopes: ["search:query", "datasets:read"],
    dataset_access_mode: "all_available",
  }),
});
const keyData = (await keyRes.json()).data;
const apiSecret = keyData.secret;
console.log(`Save this secret: ${apiSecret}`);

Save your secret immediately

The API key secret is returned only once at creation time. Store it in a secrets manager or environment variable. If you lose it, revoke the key and create a new one.

From this point forward, use the API key for authentication:

export IPTO_API_KEY="ipto_kp1a2b_sk_live_..."
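Scripts can read the secret from IPTO_API_KEY at startup and fail fast when it is missing, instead of sending an unauthenticated request. A minimal sketch; the helper name ipto_headers_from_env is hypothetical and not part of any IPTO SDK.

```python
import os


def ipto_headers_from_env(env=None):
    """Build IPTO request headers from the IPTO_API_KEY variable.

    `env` defaults to os.environ; pass a plain dict to test without touching the real environment.
    """
    env = os.environ if env is None else env
    secret = env.get("IPTO_API_KEY", "").strip()
    if not secret:
        # Stop early rather than let the API reject an unauthenticated call
        raise RuntimeError("IPTO_API_KEY is not set; export it before running the agent")
    return {
        "Authorization": f"Bearer {secret}",
        "Content-Type": "application/json",
    }
```

Usage: search_headers = ipto_headers_from_env() gives the same Authorization header that the examples below build by hand.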

Step 2: Basic search query

Submit a search query across all datasets accessible to your API key. The search endpoint accepts a natural language query string and returns ranked results with snippets and citations.

ipto search "invoice showing VAT dispute with Acme in March"
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "invoice showing VAT dispute with Acme in March",
    "top_k": 10,
    "include_snippets": true,
    "include_citations": true
  }'
search_headers = {"Authorization": f"Bearer {api_secret}"}

resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": "invoice showing VAT dispute with Acme in March",
        "top_k": 10,
        "include_snippets": True,
        "include_citations": True,
    },
)
resp.raise_for_status()
results = resp.json()["data"]
for hit in results["results"]:
    print(f"[{hit['rank']}] {hit['snippet'][:80]}...")
const searchHeaders = {
  Authorization: `Bearer ${apiSecret}`,
  "Content-Type": "application/json",
};

const searchRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: "invoice showing VAT dispute with Acme in March",
    top_k: 10,
    include_snippets: true,
    include_citations: true,
  }),
});
const searchData = (await searchRes.json()).data;
for (const hit of searchData.results) {
  console.log(`[${hit.rank}] ${hit.snippet?.slice(0, 80)}...`);
}

Response:

{
  "data": {
    "query_id": "qry_abc123",
    "results": [
      {
        "retrieval_event_id": "ret_001",
        "search_unit_id": "unit_abc",
        "object_id": "obj_def456",
        "dataset_id": "dset_abc123",
        "seller_tenant_id": "ten_seller_001",
        "rank": 1,
        "score": 12.73,
        "billable": true,
        "pricing_band": 2,
        "snippet": "Invoice notes mention a VAT dispute with Acme in March...",
        "citation_locator": {
          "locator": { "page": 4 },
          "display_text": "invoice-0425.pdf p.4"
        }
      }
    ],
    "next_cursor": null,
    "charged_result_count": 1,
    "timing_ms": {
      "filter": 2,
      "lexical": 19,
      "vector": 0,
      "rerank": 0,
      "total": 24
    }
  },
  "request_id": "req_010",
  "timestamp": "2026-04-05T12:00:00Z"
}
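A short sketch of consuming this response: it walks data.results and prints each hit's rank, citation text, billable flag, and score. The sample payload is an abbreviated copy of the response above; falling back to a placeholder when citation_locator is absent is a defensive assumption, not documented behavior.

```python
from typing import Any


def summarize_hits(payload: dict[str, Any]) -> list[str]:
    """Return one line per hit: rank, citation text, billable flag, and score."""
    lines = []
    for hit in payload["data"]["results"]:
        # Fall back gracefully if citation_locator is missing (e.g. citations not requested)
        citation = (hit.get("citation_locator") or {}).get("display_text", "<no citation>")
        flag = "billable" if hit.get("billable") else "free"
        lines.append(f"[{hit['rank']}] {citation} ({flag}, score={hit['score']})")
    return lines


# Abbreviated copy of the sample response above
sample = {
    "data": {
        "query_id": "qry_abc123",
        "results": [
            {
                "rank": 1,
                "score": 12.73,
                "billable": True,
                "citation_locator": {"locator": {"page": 4}, "display_text": "invoice-0425.pdf p.4"},
            }
        ],
        "next_cursor": None,
        "charged_result_count": 1,
    }
}

for line in summarize_hits(sample):
    print(line)  # [1] invoice-0425.pdf p.4 (billable, score=12.73)
```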

Step 3: Boolean and phrase queries

The query string supports structured syntax for precise searches. Combine operators to narrow results.

ipto search '"VAT dispute" AND Acme NOT "credit note"'
# Phrase search with boolean operators
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "\"VAT dispute\" AND Acme NOT \"credit note\"",
    "top_k": 20
  }'
resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": '"VAT dispute" AND Acme NOT "credit note"',
        "top_k": 20,
    },
)
resp.raise_for_status()
results = resp.json()["data"]["results"]
print(f"Found {len(results)} results")
const boolRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: '"VAT dispute" AND Acme NOT "credit note"',
    top_k: 20,
  }),
});
const boolData = (await boolRes.json()).data;
console.log(`Found ${boolData.results.length} results`);

More examples of query syntax:

# OR groups
(apple OR google) AND lawsuit

# Proximity search -- terms within 5 positions
revenue NEAR/5 forecast

# Wildcard prefix search
invest*

# Field-scoped search
title:"quarterly report" AND body:compliance
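When an agent assembles queries from user input, a small builder keeps the syntax consistent. A sketch with a hypothetical build_query helper: multi-word terms become quoted phrases, alternatives become a parenthesized OR group, and exclusions are appended with NOT in the same form as the examples above. It deliberately rejects embedded double quotes rather than guessing an escaping rule the guide does not define.

```python
from typing import Sequence


def quote(text: str) -> str:
    """Wrap multi-word text in quotes as an exact phrase; leave single words bare."""
    if '"' in text:
        raise ValueError("embedded double quotes are not handled by this sketch")
    return f'"{text}"' if " " in text else text


def build_query(required: Sequence[str], excluded: Sequence[str] = (),
                alternatives: Sequence[str] = ()) -> str:
    """Compose `(alt1 OR alt2) AND req1 AND req2 NOT ex1 NOT ex2`."""
    parts = []
    if alternatives:
        parts.append("(" + " OR ".join(quote(t) for t in alternatives) + ")")
    parts.extend(quote(t) for t in required)
    if not parts:
        raise ValueError("a query needs at least one required term or alternative")
    query = " AND ".join(parts)
    for term in excluded:
        query += f" NOT {quote(term)}"
    return query


print(build_query(["VAT dispute", "Acme"], excluded=["credit note"]))
print(build_query(["lawsuit"], alternatives=["apple", "google"]))
```

Passing the result through json= in requests (or JSON.stringify in TypeScript) handles the JSON escaping of the inner quotes, which the curl example has to do by hand.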

Step 4: Filtering by MIME type and date range

Use the filters object to narrow results by file type, language, date range, or tags -- independently of the query string.

ipto search "quarterly revenue report" \
  --filter-mime application/pdf \
  --filter-lang en
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quarterly revenue report",
    "filters": {
      "mime_types": ["application/pdf"],
      "languages": ["en"],
      "created_at_gte": "2025-01-01T00:00:00Z",
      "created_at_lte": "2025-12-31T23:59:59Z"
    },
    "top_k": 20
  }'
resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": "quarterly revenue report",
        "filters": {
            "mime_types": ["application/pdf"],
            "languages": ["en"],
            "created_at_gte": "2025-01-01T00:00:00Z",
            "created_at_lte": "2025-12-31T23:59:59Z",
        },
        "top_k": 20,
    },
)
resp.raise_for_status()
results = resp.json()["data"]["results"]
const filterRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: "quarterly revenue report",
    filters: {
      mime_types: ["application/pdf"],
      languages: ["en"],
      created_at_gte: "2025-01-01T00:00:00Z",
      created_at_lte: "2025-12-31T23:59:59Z",
    },
    top_k: 20,
  }),
});
const filterData = (await filterRes.json()).data;

Filters combine with logical AND across fields and logical OR within a single field's array values. For example, mime_types: ["application/pdf", "image/png"] matches objects that are either PDF or PNG.
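To make the combination rule concrete, here is a client-side model of it, not IPTO's server-side implementation. The object field names mime_type, language, and created_at are hypothetical, and tags are left out because this guide does not show the key used to filter on them. The date bounds are treated as inclusive, as the _gte/_lte names suggest.

```python
from datetime import datetime


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def matches_filters(obj: dict, filters: dict) -> bool:
    """AND across filter fields, OR within each field's list of values."""
    # Within a field: the object's value may match ANY listed value (OR)
    if "mime_types" in filters and obj["mime_type"] not in filters["mime_types"]:
        return False
    if "languages" in filters and obj["language"] not in filters["languages"]:
        return False
    # Across fields: every filter that is present must pass (AND)
    if "created_at_gte" in filters and _parse_ts(obj["created_at"]) < _parse_ts(filters["created_at_gte"]):
        return False
    if "created_at_lte" in filters and _parse_ts(obj["created_at"]) > _parse_ts(filters["created_at_lte"]):
        return False
    return True


filters = {
    "mime_types": ["application/pdf", "image/png"],
    "languages": ["en"],
    "created_at_gte": "2025-01-01T00:00:00Z",
    "created_at_lte": "2025-12-31T23:59:59Z",
}
png = {"mime_type": "image/png", "language": "en", "created_at": "2025-06-01T00:00:00Z"}
print(matches_filters(png, filters))  # True: PNG is one of the allowed MIME types
```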


Step 5: Choosing a retrieval mode

IPTO supports multiple retrieval strategies. Choose the one that fits your use case.

Mode | Best for | Description
--- | --- | ---
lexical | Exact terms, boolean queries, regex | Keyword and boolean search using BM25 scoring.
dense | Semantic similarity, natural language | Vector similarity search using embeddings.
hybrid | General-purpose search | Combines lexical and dense results using Reciprocal Rank Fusion.
auto | When you are unsure | The system analyzes the query and picks the best mode. This is the default.

ipto search "regulatory compliance documentation for GDPR" \
  --mode hybrid
# Explicit hybrid search
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "regulatory compliance documentation for GDPR",
    "retrieval_mode": "hybrid",
    "top_k": 20
  }'
resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": "regulatory compliance documentation for GDPR",
        "retrieval_mode": "hybrid",
        "top_k": 20,
    },
)
const hybridRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: "regulatory compliance documentation for GDPR",
    retrieval_mode: "hybrid",
    top_k: 20,
  }),
});
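Hybrid mode merges the lexical and dense rankings with Reciprocal Rank Fusion. In standard RRF each document scores the sum of 1 / (k + rank) over every list it appears in, so items ranked well by both retrievers rise to the top. The sketch below uses k = 60, a commonly used default; whether IPTO uses that constant or weights the two modes differently is not stated in this guide, so read it as an illustration of the technique, not IPTO's internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of IDs (best first) into one list of (id, score).

    score(d) = sum over lists containing d of 1 / (k + rank), with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical = ["unit_a", "unit_b", "unit_c"]
dense = ["unit_c", "unit_a", "unit_d"]
for doc_id, score in reciprocal_rank_fusion([lexical, dense]):
    print(f"{doc_id}: {score:.5f}")
```

unit_a comes out first because it sits near the top of both lists, while unit_b and unit_d each get credit from only one retriever.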

When to use auto

If your query contains boolean operators, phrases in quotes, wildcards, or regex patterns, auto mode will select lexical. For short natural-language queries without operators, auto will select hybrid. You can always override with an explicit retrieval_mode.
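The selection rule above can be approximated on the client, for example to log which mode auto is likely to choose. This is a rough sketch of the documented rule only, not IPTO's classifier: the regular expressions, the treatment of NEAR/n and -term as operators, and the 12-word cutoff for "short" are all assumptions. Because the guide does not say what auto does with long operator-free queries, the sketch returns None in that case.

```python
import re
from typing import Optional

# Assumed patterns for the query features the auto rule mentions
_OPERATOR_PATTERNS = [
    re.compile(r"\b(AND|OR|NOT)\b"),                # boolean operators (upper case)
    re.compile(r"\bNEAR/\d+\b"),                    # proximity operator
    re.compile(r'"[^"]+"'),                         # quoted phrase
    re.compile(r"\w\*|\*\w"),                       # prefix or suffix wildcard
    re.compile(r"(?:^|\s)/[^/\s][^/]*/(?:\s|$)"),   # /regex/
    re.compile(r"(?:^|\s)-\w"),                     # -term exclusion
]


def predict_auto_mode(query: str, max_short_words: int = 12) -> Optional[str]:
    """Guess the mode auto would pick, following the rule described above."""
    if any(p.search(query) for p in _OPERATOR_PATTERNS):
        return "lexical"
    if len(query.split()) <= max_short_words:
        return "hybrid"
    return None  # behavior for long operator-free queries is not documented here


print(predict_auto_mode("regulatory compliance documentation for GDPR"))  # hybrid
```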


Step 6: Pagination

When more results match than fit in one page of top_k, the response includes a next_cursor value. Send it as cursor in a follow-up request with the same query to fetch the next page; a null next_cursor means there are no more pages.

# First page
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "vendor contract terms",
    "top_k": 10
  }'

# Next page (using cursor from previous response)
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "vendor contract terms",
    "top_k": 10,
    "cursor": "eyJwYWdlIjoy..."
  }'
cursor = None
all_results = []

while True:
    body = {"query": "vendor contract terms", "top_k": 10}
    if cursor:
        body["cursor"] = cursor

    resp = requests.post(
        f"{BASE}/v1/search",
        headers=search_headers,
        json=body,
    )
    resp.raise_for_status()
    data = resp.json()["data"]
    all_results.extend(data["results"])

    cursor = data.get("next_cursor")
    if not cursor:
        break

print(f"Total results collected: {len(all_results)}")
let cursor: string | null = null;
const allResults: any[] = [];

do {
  const body: Record<string, any> = {
    query: "vendor contract terms",
    top_k: 10,
  };
  if (cursor) body.cursor = cursor;

  const res = await fetch(`${BASE}/v1/search`, {
    method: "POST",
    headers: searchHeaders,
    body: JSON.stringify(body),
  });
  const data = (await res.json()).data;
  allResults.push(...data.results);
  cursor = data.next_cursor;
} while (cursor);

console.log(`Total results collected: ${allResults.length}`);

Pagination and billing

Paginating through results does not double-charge for the same search units within a single query session. Each unique billable result is charged once.
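To reconcile expected spend on the client, an agent can count unique billable results across every page it fetched. A sketch that assumes search_unit_id is the identity of a charged result; that key is consistent with the note above but is an assumption, not a documented billing rule.

```python
from typing import Iterable


def unique_billable_units(hits: Iterable[dict]) -> set[str]:
    """Return the IDs of billable search units, counting each unit once."""
    seen: set[str] = set()
    for hit in hits:
        if hit.get("billable"):
            seen.add(hit["search_unit_id"])
    return seen


# Two hypothetical pages in which unit_abc appears twice
pages = [
    [{"search_unit_id": "unit_abc", "billable": True},
     {"search_unit_id": "unit_free", "billable": False}],
    [{"search_unit_id": "unit_abc", "billable": True},
     {"search_unit_id": "unit_xyz", "billable": True}],
]
all_hits = [hit for page in pages for hit in page]
print(len(unique_billable_units(all_hits)))
```

With the pagination loop above, pass all_results directly: len(unique_billable_units(all_results)).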


Step 7: Downloading full objects from results

Each search result includes a retrieval_event_id. Pass it to the download endpoint to get a short-lived download URL for the original file. The CLI command downloads by the result's object_id instead.

ipto objects download obj_abc123 --output ./downloaded-file.pdf
curl -X POST https://api.ipto.ai/v1/retrieval-events/$RETRIEVAL_EVENT_ID/download \
  -H "Authorization: Bearer $IPTO_API_KEY"
# `results` is the hit list from the most recent search above
retrieval_event_id = results[0]["retrieval_event_id"]

resp = requests.post(
    f"{BASE}/v1/retrieval-events/{retrieval_event_id}/download",
    headers=search_headers,
)
resp.raise_for_status()
download = resp.json()["data"]

# Download the actual file
file_resp = requests.get(download["download_url"])
file_resp.raise_for_status()
with open("downloaded-file.pdf", "wb") as f:
    f.write(file_resp.content)

print(f"Downloaded {len(file_resp.content)} bytes")
const retEventId = searchData.results[0].retrieval_event_id;

const dlRes = await fetch(
  `${BASE}/v1/retrieval-events/${retEventId}/download`,
  {
    method: "POST",
    headers: { Authorization: `Bearer ${apiSecret}` },
  }
);
const dlData = (await dlRes.json()).data;

// Download the actual file
const fileRes = await fetch(dlData.download_url);
const fileBuffer = await fileRes.arrayBuffer();
const { writeFile } = await import("fs/promises");
await writeFile("downloaded-file.pdf", Buffer.from(fileBuffer));
console.log(`Downloaded ${fileBuffer.byteLength} bytes`);

Response from download endpoint:

{
  "data": {
    "download_event_id": "dl_xyz789",
    "download_url": "https://storage.example.com/presigned-get-url...",
    "expires_at": "2026-04-05T12:15:00Z"
  },
  "request_id": "req_011",
  "timestamp": "2026-04-05T12:00:05Z"
}
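The download_url is a presigned link that stops working at expires_at (15 minutes after the request in the sample above). An agent that queues downloads can check the expiry before fetching. A sketch under two assumptions: a 30-second safety margin, and that you request a fresh URL from the download endpoint when the check fails. Whether a repeat request is metered as another download is not stated in this guide.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def url_still_valid(expires_at: str, now: Optional[datetime] = None, margin_s: int = 30) -> bool:
    """True if the presigned URL will still be valid `margin_s` seconds from now."""
    # Replace "Z" so fromisoformat also works before Python 3.11
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=margin_s) < expiry


issued = datetime(2026, 4, 5, 12, 0, 5, tzinfo=timezone.utc)
print(url_still_valid("2026-04-05T12:15:00Z", now=issued))  # True
```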

Download charges

Downloading the original file is a separate billable event from the initial retrieval. The download is metered and will appear in your spend summary.


Search endpoint availability

The POST /v1/search endpoint is planned for an upcoming release. The query syntax, request/response shapes, and retrieval concepts described in this guide are stable and reflect the finalized API design. You can build your integration against this contract today -- the endpoint will be fully operational when it ships.


Query syntax quick reference

Syntax | Description | Example
--- | --- | ---
`term` | Single keyword | `invoice`
`"phrase"` | Exact phrase match | `"chief executive officer"`
`A AND B` | Both terms required | `revenue AND forecast`
`A OR B` | Either term matches | `CEO OR "chief executive"`
`NOT A` or `-A` | Exclude term | `contract NOT termination`
`(A OR B) AND C` | Grouped boolean | `(apple OR google) AND lawsuit`
`A NEAR/n B` | Terms within n positions | `breach NEAR/5 notification`
`prefix*` | Prefix wildcard (min 3 chars) | `invest*`
`*suffix` | Suffix wildcard | `*ization`
`/regex/` | Regular expression | `/INV-[0-9]{4}/`
`field:term` | Field-scoped search | `title:"quarterly report"`

Available fields for field-scoped search: title, body, ocr, transcript, caption, metadata, tags
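A client-side lint can catch field names outside this list, and prefix wildcards shorter than the 3-character minimum, before a query is sent. A rough sketch: the hypothetical lint_query helper relies on simple regular expressions (assumptions, not IPTO's grammar), so it can misfire on edge cases such as a colon inside a quoted phrase.

```python
import re

ALLOWED_FIELDS = {"title", "body", "ocr", "transcript", "caption", "metadata", "tags"}


def lint_query(query: str) -> list[str]:
    """Return human-readable problems found in a query string (empty if none)."""
    problems = []
    # field:term or field:"phrase": capture the name before the colon
    for field in re.findall(r"\b([A-Za-z_]+):", query):
        if field not in ALLOWED_FIELDS:
            problems.append(f"unknown field '{field}'")
    # Prefix wildcard: a word at a term boundary immediately followed by *
    for prefix in re.findall(r"(?:^|[\s(:])(\w+)\*", query):
        if len(prefix) < 3:
            problems.append(f"prefix '{prefix}*' is shorter than 3 characters")
    return problems


print(lint_query("author:smith AND in*"))
```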

Operator precedence: NOT > AND > OR. Use parentheses to override.
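To see how this precedence groups an unparenthesized query, here is a small recursive-descent parser that adds explicit parentheses. It handles only bare terms, AND, OR, NOT, and parentheses (no phrases, NEAR, wildcards, or implicit forms such as `contract NOT termination`), so it illustrates the stated precedence and is not IPTO's query parser.

```python
import re


def tokenize(query: str) -> list[str]:
    """Split into parentheses and whitespace-separated words."""
    return re.findall(r"\(|\)|[^\s()]+", query)


class Parser:
    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_or(self) -> str:  # lowest precedence
        left = self.parse_and()
        while self.peek() == "OR":
            self.take()
            left = f"({left} OR {self.parse_and()})"
        return left

    def parse_and(self) -> str:
        left = self.parse_not()
        while self.peek() == "AND":
            self.take()
            left = f"({left} AND {self.parse_not()})"
        return left

    def parse_not(self) -> str:  # highest precedence
        if self.peek() == "NOT":
            self.take()
            return f"(NOT {self.parse_not()})"
        return self.parse_atom()

    def parse_atom(self) -> str:
        tok = self.take()
        if tok == "(":
            inner = self.parse_or()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return inner
        if tok is None or tok in {")", "AND", "OR", "NOT"}:
            raise ValueError(f"unexpected token: {tok!r}")
        return tok


def group(query: str) -> str:
    """Return the query with precedence made explicit by parentheses."""
    parser = Parser(tokenize(query))
    result = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token: {parser.peek()!r}")
    return result


print(group("apple OR google AND NOT lawsuit"))  # (apple OR (google AND (NOT lawsuit)))
```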


Next steps