Searching Data

This guide walks through the buyer and agent search workflow: creating a scoped API key, running search queries, applying filters, choosing retrieval modes, paginating results, and downloading full objects.

Prerequisites

An IPTO account with buyer access to one or more datasets.
An API key with search:query scope. If you do not have one yet, Step 1 below shows how to create one.

Step 1: Create a scoped API key

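Create an API key with only the scopes your search agent needs. For a read-only search workflow, search:query and datasets:read are sufficient. The requests in this step authenticate with your IPTO account's bearer token ($TOKEN in cURL, token in Python and TypeScript), not with an API key.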

Examples: IPTO CLI, cURL, Python, TypeScript

ipto keys create --name "search-agent-prod" \
  --scopes search:query,datasets:read

curl -X POST https://api.ipto.ai/v1/api-keys \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "search-agent-prod",
    "scopes": ["search:query", "datasets:read"],
    "dataset_access_mode": "all_available"
  }'

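import os
import requests

BASE = "https://api.ipto.ai"
# Account bearer token, the same value as $TOKEN in the cURL example
token = os.environ["TOKEN"]
headers = {"Authorization": f"Bearer {token}"}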

resp = requests.post(
    f"{BASE}/v1/api-keys",
    headers=headers,
    json={
        "name": "search-agent-prod",
        "scopes": ["search:query", "datasets:read"],
        "dataset_access_mode": "all_available",
    },
)
resp.raise_for_status()
key = resp.json()["data"]
api_secret = key["secret"]
print(f"Save this secret: {api_secret}")

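const BASE = "https://api.ipto.ai";
// Account bearer token, the same value as $TOKEN in the cURL example
const token = process.env.TOKEN;
if (!token) throw new Error("TOKEN is not set");
const headers = {
  Authorization: `Bearer ${token}`,
  "Content-Type": "application/json",
};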

const keyRes = await fetch(`${BASE}/v1/api-keys`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "search-agent-prod",
    scopes: ["search:query", "datasets:read"],
    dataset_access_mode: "all_available",
  }),
});
if (!keyRes.ok) throw new Error(`API key creation failed: ${keyRes.status}`);
const keyData = (await keyRes.json()).data;
const apiSecret = keyData.secret;
console.log(`Save this secret: ${apiSecret}`);

Save your secret immediately

The API key secret is returned only once at creation time. Store it in a secrets manager or environment variable. If you lose it, revoke the key and create a new one.

From this point forward, use the API key for authentication:

export IPTO_API_KEY="ipto_kp1a2b_sk_live_..."
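To keep the secret out of source code, a script can read it from the IPTO_API_KEY variable exported above. The sketch below assumes that variable name. search_headers_from_env is a hypothetical helper, not part of an IPTO SDK. If the key is missing, the helper raises an error immediately rather than sending an empty bearer token.

```python
import os


def search_headers_from_env(var_name: str = "IPTO_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    Hypothetical helper: raises immediately if the variable is unset or blank.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"{var_name} is not set; export your IPTO API key first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```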

Step 2: Basic search query

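Submit a search query across all datasets accessible to your API key. The search endpoint accepts a query string and returns ranked results. The query can be natural language or use the structured syntax covered in Step 3. Setting include_snippets and include_citations requests a text snippet and a citation locator for each hit.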

Examples: IPTO CLI, cURL, Python, TypeScript

ipto search "invoice showing VAT dispute with Acme in March"

curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "invoice showing VAT dispute with Acme in March",
    "top_k": 10,
    "include_snippets": true,
    "include_citations": true
  }'

search_headers = {"Authorization": f"Bearer {api_secret}"}

resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": "invoice showing VAT dispute with Acme in March",
        "top_k": 10,
        "include_snippets": True,
        "include_citations": True,
    },
)
resp.raise_for_status()
results = resp.json()["data"]
for hit in results["results"]:
    print(f"[{hit['rank']}] {hit['snippet'][:80]}...")

const searchHeaders = {
  Authorization: `Bearer ${apiSecret}`,
  "Content-Type": "application/json",
};

const searchRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: "invoice showing VAT dispute with Acme in March",
    top_k: 10,
    include_snippets: true,
    include_citations: true,
  }),
});
const searchData = (await searchRes.json()).data;
for (const hit of searchData.results) {
  console.log(`[${hit.rank}] ${hit.snippet?.slice(0, 80)}...`);
}

Response:

{
  "data": {
    "query_id": "qry_abc123",
    "results": [
      {
        "retrieval_event_id": "ret_001",
        "search_unit_id": "unit_abc",
        "object_id": "obj_def456",
        "dataset_id": "dset_abc123",
        "seller_tenant_id": "ten_seller_001",
        "rank": 1,
        "score": 12.73,
        "billable": true,
        "pricing_band": 2,
        "snippet": "Invoice notes mention a VAT dispute with Acme in March...",
        "citation_locator": {
          "locator": { "page": 4 },
          "display_text": "invoice-0425.pdf p.4"
        }
      }
    ],
    "next_cursor": null,
    "charged_result_count": 1,
    "timing_ms": {
      "filter": 2,
      "lexical": 19,
      "vector": 0,
      "rerank": 0,
      "total": 24
    }
  },
  "request_id": "req_010",
  "timestamp": "2026-04-05T12:00:00Z"
}
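The following is a minimal sketch of reading this response. For each hit, it lists the rank, the citation text and whether the hit is billable. It relies only on fields shown in the example above. summarize_hits is a hypothetical helper. When citation_locator is absent (for example, when include_citations is false), the helper falls back to object_id; this fallback is an assumption.

```python
import json

# Trimmed copy of the example response above.
SAMPLE = json.loads("""
{
  "data": {
    "query_id": "qry_abc123",
    "results": [
      {
        "retrieval_event_id": "ret_001",
        "object_id": "obj_def456",
        "rank": 1,
        "score": 12.73,
        "billable": true,
        "snippet": "Invoice notes mention a VAT dispute with Acme in March...",
        "citation_locator": {
          "locator": {"page": 4},
          "display_text": "invoice-0425.pdf p.4"
        }
      }
    ],
    "next_cursor": null,
    "charged_result_count": 1
  }
}
""")


def summarize_hits(payload: dict) -> list:
    """Return one 'rank. citation (billable|free)' line per hit, ordered by rank."""
    lines = []
    for hit in sorted(payload["data"]["results"], key=lambda h: h["rank"]):
        # Assumed fallback: use object_id when no citation locator was returned.
        citation = (hit.get("citation_locator") or {}).get("display_text", hit["object_id"])
        flag = "billable" if hit.get("billable") else "free"
        lines.append(f"{hit['rank']}. {citation} ({flag})")
    return lines


for line in summarize_hits(SAMPLE):
    print(line)  # 1. invoice-0425.pdf p.4 (billable)
```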

Step 3: Using boolean operators and phrase search

The query string supports structured syntax for precise searches. Combine operators to narrow results.

Examples: IPTO CLI, cURL, Python, TypeScript

ipto search '"VAT dispute" AND Acme NOT "credit note"'

# Phrase search with boolean operators
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "\"VAT dispute\" AND Acme NOT \"credit note\"",
    "top_k": 20
  }'

resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": '"VAT dispute" AND Acme NOT "credit note"',
        "top_k": 20,
    },
)
resp.raise_for_status()
results = resp.json()["data"]["results"]
print(f"Found {len(results)} results")

const boolRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: '"VAT dispute" AND Acme NOT "credit note"',
    top_k: 20,
  }),
});
const boolData = (await boolRes.json()).data;
console.log(`Found ${boolData.results.length} results`);

More examples of query syntax:

# OR groups
(apple OR google) AND lawsuit

# Proximity search -- terms within 5 positions
revenue NEAR/5 forecast

# Wildcard prefix search
invest*

# Field-scoped search
title:"quarterly report" AND body:compliance
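When queries are built from user input, a small builder keeps the syntax consistent. build_query is a hypothetical helper that does three things:

- it quotes multi-word terms as phrases;
- it joins required terms with AND;
- it appends exclusions with NOT.

With the inputs below, it reproduces the query used in this step. It does not escape double quotes inside a term, because this guide does not document an escape syntax. Serializing the body with json.dumps (or with the json= argument of requests) produces the escaped quotes that the cURL example writes by hand.

```python
import json
from typing import List, Optional


def quote_if_needed(term: str) -> str:
    """Wrap terms containing whitespace in double quotes so they match as phrases."""
    return f'"{term}"' if any(ch.isspace() for ch in term) else term


def build_query(required: List[str], excluded: Optional[List[str]] = None) -> str:
    """Join required terms with AND, then append each excluded term with NOT."""
    if not required:
        raise ValueError("build_query needs at least one required term")
    query = " AND ".join(quote_if_needed(t) for t in required)
    for term in excluded or []:
        query += f" NOT {quote_if_needed(term)}"
    return query


query = build_query(["VAT dispute", "Acme"], excluded=["credit note"])
print(query)  # "VAT dispute" AND Acme NOT "credit note"

# json.dumps escapes the inner quotes, as the cURL example does by hand.
body = json.dumps({"query": query, "top_k": 20})
print(body)
```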

Step 4: Filtering by MIME type and date range

Use the filters object to narrow results by file type, language, date range, or tags -- independently of the query string.

Examples: IPTO CLI, cURL, Python, TypeScript

ipto search "quarterly revenue report" \
  --filter-mime application/pdf \
  --filter-lang en

curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quarterly revenue report",
    "filters": {
      "mime_types": ["application/pdf"],
      "languages": ["en"],
      "created_at_gte": "2025-01-01T00:00:00Z",
      "created_at_lte": "2025-12-31T23:59:59Z"
    },
    "top_k": 20
  }'

resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": "quarterly revenue report",
        "filters": {
            "mime_types": ["application/pdf"],
            "languages": ["en"],
            "created_at_gte": "2025-01-01T00:00:00Z",
            "created_at_lte": "2025-12-31T23:59:59Z",
        },
        "top_k": 20,
    },
)
resp.raise_for_status()
results = resp.json()["data"]["results"]

const filterRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: "quarterly revenue report",
    filters: {
      mime_types: ["application/pdf"],
      languages: ["en"],
      created_at_gte: "2025-01-01T00:00:00Z",
      created_at_lte: "2025-12-31T23:59:59Z",
    },
    top_k: 20,
  }),
});
const filterData = (await filterRes.json()).data;

Filters combine with logical AND across fields and logical OR within a single field's array values. For example, mime_types: ["application/pdf", "image/png"] matches objects that are either PDF or PNG.
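To check which objects a set of filters admits, you can model the documented rule locally: AND across fields, OR within a field's array. The sketch below does this, but it is not the server's implementation. Its assumptions:

- the object field names mime_type, language and created_at are made up for the example;
- the date bounds are inclusive, based on the gte/lte naming;
- tag filters are left out, because their shape is not shown in this guide.

```python
from datetime import datetime


def _parse(ts: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def matches_filters(obj: dict, filters: dict) -> bool:
    """Local model of the documented semantics: AND across fields, OR within a field."""
    # An array-valued filter passes if the object's value is any listed value (OR).
    if "mime_types" in filters and obj["mime_type"] not in filters["mime_types"]:
        return False
    if "languages" in filters and obj["language"] not in filters["languages"]:
        return False
    # Date bounds, treated as inclusive (an assumption based on gte/lte).
    if "created_at_gte" in filters or "created_at_lte" in filters:
        created = _parse(obj["created_at"])
        if "created_at_gte" in filters and created < _parse(filters["created_at_gte"]):
            return False
        if "created_at_lte" in filters and created > _parse(filters["created_at_lte"]):
            return False
    # Every filter that was present passed (AND across fields).
    return True


filters = {
    "mime_types": ["application/pdf", "image/png"],
    "languages": ["en"],
    "created_at_gte": "2025-01-01T00:00:00Z",
    "created_at_lte": "2025-12-31T23:59:59Z",
}
png_2025 = {"mime_type": "image/png", "language": "en", "created_at": "2025-06-01T09:30:00Z"}
pdf_2026 = {"mime_type": "application/pdf", "language": "en", "created_at": "2026-02-01T00:00:00Z"}
print(matches_filters(png_2025, filters))  # True: PNG is in the OR list, date in range
print(matches_filters(pdf_2026, filters))  # False: outside the date range
```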

Step 5: Choosing a retrieval mode

IPTO supports four retrieval strategies, selected with the retrieval_mode field of the request. If you omit retrieval_mode, auto is used.

Mode	Best for	Description
`lexical`	Exact terms, boolean queries, regex	Keyword and boolean search using BM25 scoring.
`dense`	Semantic similarity, natural language	Vector similarity search using embeddings.
`hybrid`	General-purpose search	Combines lexical and dense results using Reciprocal Rank Fusion.
`auto`	When you are unsure	System analyzes the query and picks the best mode. This is the default.
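Reciprocal Rank Fusion scores each result by its position in every ranked list being merged. A result at rank r in a list contributes 1 / (k + r), and contributions are summed across lists. The sketch below uses k = 60, the constant from the original RRF formulation. The constant and any per-list weighting that IPTO uses in hybrid mode are not documented here, and the unit IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists of IDs: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["unit_a", "unit_b", "unit_c"]  # hypothetical lexical ranking
dense = ["unit_c", "unit_a", "unit_d"]    # hypothetical dense ranking
print(reciprocal_rank_fusion([lexical, dense]))
# ['unit_a', 'unit_c', 'unit_b', 'unit_d']
```

Results that both retrievers rank highly (unit_a, unit_c) end up ahead of results found by only one of them, even when the single-list result ranked near the top of its list.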

Examples: IPTO CLI, cURL, Python, TypeScript

ipto search "regulatory compliance documentation for GDPR" \
  --mode hybrid

# Explicit hybrid search
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "regulatory compliance documentation for GDPR",
    "retrieval_mode": "hybrid",
    "top_k": 20
  }'

resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": "regulatory compliance documentation for GDPR",
        "retrieval_mode": "hybrid",
        "top_k": 20,
    },
)
resp.raise_for_status()

const hybridRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: "regulatory compliance documentation for GDPR",
    retrieval_mode: "hybrid",
    top_k: 20,
  }),
});

When to use auto

If your query contains boolean operators, phrases in quotes, wildcards, or regex patterns, auto mode will select lexical. For short natural-language queries without operators, auto will select hybrid. You can always override with an explicit retrieval_mode.
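The documented auto rule can be approximated on the client, for example to log which mode a query is likely to get. guess_auto_mode is a hypothetical heuristic. It checks for the four signals named above (boolean operators, quoted phrases, wildcards and /regex/ patterns) plus the -term exclusion shorthand. It assumes that operators are uppercase, as in this guide's examples. Because the behavior for long natural-language queries is not specified, it treats every other query as hybrid. The server's actual checks may differ.

```python
import re

# Hypothetical client-side approximation of the documented `auto` rule.
# The real selector runs on the server and its exact checks are not published.
LEXICAL_SIGNALS = [
    re.compile(r'"'),                      # quoted phrase
    re.compile(r"\b(?:AND|OR|NOT)\b"),     # uppercase boolean operators
    re.compile(r"(?:^|\s)-\w"),            # -term exclusion shorthand
    re.compile(r"\w\*|\*\w"),              # prefix or suffix wildcard
    re.compile(r"(?:^|\s)/\S+/(?=\s|$)"),  # /regex/ pattern
]


def guess_auto_mode(query: str) -> str:
    """Return "lexical" when the query uses operator syntax, otherwise "hybrid"."""
    if any(pattern.search(query) for pattern in LEXICAL_SIGNALS):
        return "lexical"
    # Assumption: operator-free queries of any length are treated as hybrid;
    # the guide only describes the short natural-language case.
    return "hybrid"


for q in ['"VAT dispute" AND Acme', "invest*", "/INV-[0-9]{4}/",
          "regulatory compliance documentation for GDPR"]:
    print(guess_auto_mode(q), q)
```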

Step 6: Pagination

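When a search has more matching results than top_k, the response includes a next_cursor value. To fetch the next page, repeat the request with the same query and top_k and pass that value as cursor. When next_cursor is null, there are no more pages.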

Examples: cURL, Python, TypeScript

# First page
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "vendor contract terms",
    "top_k": 10
  }'

# Next page (using cursor from previous response)
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "vendor contract terms",
    "top_k": 10,
    "cursor": "eyJwYWdlIjoy..."
  }'

cursor = None
all_results = []

while True:
    body = {"query": "vendor contract terms", "top_k": 10}
    if cursor:
        body["cursor"] = cursor

    resp = requests.post(
        f"{BASE}/v1/search",
        headers=search_headers,
        json=body,
    )
    resp.raise_for_status()
    data = resp.json()["data"]
    all_results.extend(data["results"])

    cursor = data.get("next_cursor")
    if not cursor:
        break

print(f"Total results collected: {len(all_results)}")

let cursor: string | null = null;
const allResults: any[] = [];

do {
  const body: Record<string, any> = {
    query: "vendor contract terms",
    top_k: 10,
  };
  if (cursor) body.cursor = cursor;

  const res = await fetch(`${BASE}/v1/search`, {
    method: "POST",
    headers: searchHeaders,
    body: JSON.stringify(body),
  });
  const data = (await res.json()).data;
  allResults.push(...data.results);
  cursor = data.next_cursor;
} while (cursor);

console.log(`Total results collected: ${allResults.length}`);

Pagination and billing

Paginating through results does not double-charge for the same search units within a single query session. Each unique billable result is charged once.
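To sanity-check spend after paginating, you can count the unique billable results you collected and compare that figure with the charged_result_count values the API returns. expected_charge_count is a hypothetical helper. It keys on search_unit_id, following the note's wording about search units, and it is only an estimate, not the billing system's calculation.

```python
def expected_charge_count(results: list) -> int:
    """Count unique billable search units across every page collected for one query.

    Keys on search_unit_id (an assumption drawn from the billing note's wording).
    """
    return len({hit["search_unit_id"] for hit in results if hit.get("billable")})


# Hypothetical hits gathered while paginating a single query.
collected = [
    {"search_unit_id": "unit_a", "billable": True},
    {"search_unit_id": "unit_b", "billable": False},
    {"search_unit_id": "unit_a", "billable": True},  # same unit returned again
    {"search_unit_id": "unit_c", "billable": True},
]
print(expected_charge_count(collected))  # 2
```

With the Python loop above, call expected_charge_count(all_results).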

Step 7: Downloading full objects from results

Each search result includes a retrieval_event_id. Pass it to the download endpoint to get a time-limited download URL for the original file. The CLI command takes an object ID instead.

Examples: IPTO CLI, cURL, Python, TypeScript

ipto objects download obj_abc123 --output ./downloaded-file.pdf

curl -X POST https://api.ipto.ai/v1/retrieval-events/$RETRIEVAL_EVENT_ID/download \
  -H "Authorization: Bearer $IPTO_API_KEY"

# Use the first hit collected by the pagination loop in Step 6
retrieval_event_id = all_results[0]["retrieval_event_id"]

resp = requests.post(
    f"{BASE}/v1/retrieval-events/{retrieval_event_id}/download",
    headers=search_headers,
)
resp.raise_for_status()
download = resp.json()["data"]

# Fetch the file from the presigned URL; raise instead of saving an error page
file_resp = requests.get(download["download_url"])
file_resp.raise_for_status()
with open("downloaded-file.pdf", "wb") as f:
    f.write(file_resp.content)

print(f"Downloaded {len(file_resp.content)} bytes")

const retEventId = searchData.results[0].retrieval_event_id;

const dlRes = await fetch(
  `${BASE}/v1/retrieval-events/${retEventId}/download`,
  {
    method: "POST",
    headers: { Authorization: `Bearer ${apiSecret}` },
  }
);
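if (!dlRes.ok) throw new Error(`Download request failed: ${dlRes.status}`);
const dlData = (await dlRes.json()).data;

// Fetch the file from the presigned URL; throw instead of saving an error page
const fileRes = await fetch(dlData.download_url);
if (!fileRes.ok) throw new Error(`File fetch failed: ${fileRes.status}`);
const fileBuffer = await fileRes.arrayBuffer();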
const { writeFile } = await import("fs/promises");
await writeFile("downloaded-file.pdf", Buffer.from(fileBuffer));
console.log(`Downloaded ${fileBuffer.byteLength} bytes`);

Response from the download endpoint:

{
  "data": {
    "download_event_id": "dl_xyz789",
    "download_url": "https://storage.example.com/presigned-get-url...",
    "expires_at": "2026-04-05T12:15:00Z"
  },
  "request_id": "req_011",
  "timestamp": "2026-04-05T12:00:05Z"
}
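The download URL is presigned and stops working at expires_at. In the example response, that is just under 15 minutes after the response timestamp. The sketch below checks the URL before use. It assumes a 60-second safety margin, which is an arbitrary choice, and url_still_valid is a hypothetical helper. If it returns False, request a new URL from the download endpoint. This guide does not say whether a repeated download request is billed again.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_utc(ts: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def url_still_valid(expires_at: str, now: Optional[datetime] = None,
                    margin_s: int = 60) -> bool:
    """True if the presigned URL has more than `margin_s` seconds left before expiry."""
    now = now or datetime.now(timezone.utc)
    return parse_utc(expires_at) - now > timedelta(seconds=margin_s)


expires = "2026-04-05T12:15:00Z"  # expires_at from the example response
print(url_still_valid(expires, parse_utc("2026-04-05T12:00:05Z")))  # True
print(url_still_valid(expires, parse_utc("2026-04-05T12:14:30Z")))  # False
```

In the Python example above, call url_still_valid(download["expires_at"]) just before requests.get.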

Download charges

Downloading the original file is a separate billable event from the initial retrieval. The download is metered and will appear in your spend summary.

Search endpoint availability

The POST /v1/search endpoint is planned for an upcoming release. The query syntax, request/response shapes, and retrieval concepts described in this guide are stable and reflect the finalized API design. You can build your integration against this contract today -- the endpoint will be fully operational when it ships.

Query syntax quick reference

Syntax	Description	Example
`term`	Single keyword	`invoice`
`"phrase"`	Exact phrase match	`"chief executive officer"`
`A AND B`	Both terms required	`revenue AND forecast`
`A OR B`	Either term matches	`CEO OR "chief executive"`
`NOT A` or `-A`	Exclude term	`contract NOT termination`
`(A OR B) AND C`	Grouped boolean	`(apple OR google) AND lawsuit`
`A NEAR/n B`	Terms within n positions	`breach NEAR/5 notification`
`prefix*`	Prefix wildcard (min 3 chars)	`invest*`
`*suffix`	Suffix wildcard	`*ization`
`/regex/`	Regular expression	`/INV-[0-9]{4}/`
`field:term`	Field-scoped search	`title:"quarterly report"`

Available fields for field-scoped search: title, body, ocr, transcript, caption, metadata, and tags.

Operator precedence: NOT > AND > OR. Use parentheses to override.
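To see how precedence groups an unparenthesized query, the sketch below parses the boolean subset of the syntax and returns the query fully parenthesized. group is a hypothetical helper, not IPTO's parser. It follows these rules:

- A NOT B is read as A AND NOT B, matching examples such as contract NOT termination.
- Phrases, field:term, wildcards and /regex/ are treated as opaque terms.
- NEAR/n and the -A shorthand are not modeled.
- Terms placed side by side without an operator raise ValueError.

```python
import re

# Tokens: parentheses, field:"phrase", "phrase", or any other run of non-space text.
TOKEN = re.compile(r'\(|\)|\w+:"[^"]*"|"[^"]*"|[^\s()]+')


def group(query: str) -> str:
    """Fully parenthesize a boolean query using the precedence NOT > AND > OR."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence
        left = parse_and()
        while peek() == "OR":
            take()
            left = f"({left} OR {parse_and()})"
        return left

    def parse_and():
        left = parse_not()
        while peek() in ("AND", "NOT"):
            if peek() == "AND":
                take()
            # Otherwise a bare NOT follows: "A NOT B" is read as "A AND (NOT B)".
            left = f"({left} AND {parse_not()})"
        return left

    def parse_not():  # highest precedence
        if peek() == "NOT":
            take()
            return f"(NOT {parse_not()})"
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None or tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r} at index {pos}")
        take()
        if tok == "(":
            inner = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return inner
        return tok  # term, phrase, field:term, wildcard, or /regex/

    result = parse_or()
    if pos != len(tokens):
        # Leftover tokens: terms side by side, or an operator such as NEAR/n.
        raise ValueError(f"unsupported syntax at token {tokens[pos]!r}")
    return result


for q in ["apple OR google AND lawsuit", "NOT draft OR final"]:
    print(group(q))
# (apple OR (google AND lawsuit))
# ((NOT draft) OR final)
```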

Next steps

Managing API Keys -- Restrict API keys to specific datasets for production agents.
Uploading Data -- Publish your own datasets to the marketplace.
Provider Analytics -- See how your datasets are being searched and monetized.