Searching Data
This guide walks through the buyer and agent search workflow: creating a scoped API key, running search queries, applying filters, choosing retrieval modes, paginating results, and downloading full objects.
Prerequisites
- An IPTO account with buyer access to one or more datasets.
- An API key with the search:query scope. If you do not have one yet, Step 1 below shows how to create one.
Step 1: Create a scoped API key
Create an API key with only the scopes your search agent needs. For a read-only search workflow, search:query and datasets:read are sufficient.
import requests

BASE = "https://api.ipto.ai"
# `token` is assumed to hold your existing account bearer token.
headers = {"Authorization": f"Bearer {token}"}

resp = requests.post(
    f"{BASE}/v1/api-keys",
    headers=headers,
    json={
        "name": "search-agent-prod",
        "scopes": ["search:query", "datasets:read"],
        "dataset_access_mode": "all_available",
    },
)
resp.raise_for_status()
key = resp.json()["data"]
api_secret = key["secret"]
print(f"Save this secret: {api_secret}")
const BASE = "https://api.ipto.ai";
const headers = {
  Authorization: `Bearer ${token}`,
  "Content-Type": "application/json",
};
const keyRes = await fetch(`${BASE}/v1/api-keys`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "search-agent-prod",
    scopes: ["search:query", "datasets:read"],
    dataset_access_mode: "all_available",
  }),
});
const keyData = (await keyRes.json()).data;
const apiSecret = keyData.secret;
console.log(`Save this secret: ${apiSecret}`);
Save your secret immediately
The API key secret is returned only once at creation time. Store it in a secrets manager or environment variable. If you lose it, revoke the key and create a new one.
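From this point forward, authenticate requests with the API key secret in place of your account token, passing it as the bearer token in the Authorization header.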
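One way to do this without hard-coding the secret is to read it from the environment. The sketch below assumes the secret is stored in an environment variable named IPTO_API_KEY (the name used by the curl examples in this guide); the helper name build_search_headers is hypothetical and not part of any IPTO SDK.

```python
import os


def build_search_headers(env=None):
    """Build request headers from an API key secret stored in the environment.

    Illustrative helper only; the IPTO_API_KEY variable name is an assumption
    borrowed from the curl examples in this guide.
    """
    env = os.environ if env is None else env
    secret = env.get("IPTO_API_KEY", "").strip()
    if not secret:
        raise RuntimeError("IPTO_API_KEY is not set; create a key as shown in Step 1")
    return {
        "Authorization": f"Bearer {secret}",
        "Content-Type": "application/json",
    }


# Demo with an explicit mapping so the snippet runs without a real key.
demo_headers = build_search_headers({"IPTO_API_KEY": "example-secret"})
print(sorted(demo_headers))
```

Failing fast when the variable is missing keeps an agent from sending unauthenticated requests and makes a misconfigured deployment obvious at startup.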
Step 2: Basic search query
Submit a search query across all datasets accessible to your API key. The search endpoint accepts a natural language query string and returns ranked results with snippets and citations.
search_headers = {"Authorization": f"Bearer {api_secret}"}
resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": "invoice showing VAT dispute with Acme in March",
        "top_k": 10,
        "include_snippets": True,
        "include_citations": True,
    },
)
resp.raise_for_status()
results = resp.json()["data"]
for hit in results["results"]:
    print(f"[{hit['rank']}] {hit['snippet'][:80]}...")
const searchHeaders = {
  Authorization: `Bearer ${apiSecret}`,
  "Content-Type": "application/json",
};
const searchRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: "invoice showing VAT dispute with Acme in March",
    top_k: 10,
    include_snippets: true,
    include_citations: true,
  }),
});
const searchData = (await searchRes.json()).data;
for (const hit of searchData.results) {
  console.log(`[${hit.rank}] ${hit.snippet?.slice(0, 80)}...`);
}
Response:
{
  "data": {
    "query_id": "qry_abc123",
    "results": [
      {
        "retrieval_event_id": "ret_001",
        "search_unit_id": "unit_abc",
        "object_id": "obj_def456",
        "dataset_id": "dset_abc123",
        "seller_tenant_id": "ten_seller_001",
        "rank": 1,
        "score": 12.73,
        "billable": true,
        "pricing_band": 2,
        "snippet": "Invoice notes mention a VAT dispute with Acme in March...",
        "citation_locator": {
          "locator": { "page": 4 },
          "display_text": "invoice-0425.pdf p.4"
        }
      }
    ],
    "next_cursor": null,
    "charged_result_count": 1,
    "timing_ms": {
      "filter": 2,
      "lexical": 19,
      "vector": 0,
      "rerank": 0,
      "total": 24
    }
  },
  "request_id": "req_010",
  "timestamp": "2026-04-05T12:00:00Z"
}
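To show how an agent might consume this response, here is a minimal sketch that reduces each hit to the fields needed for a cited answer. The function name summarize_hits is hypothetical, and treating citation_locator and billable as optional is an assumption (for example, when include_citations is not set); the field names themselves come from the sample response above.

```python
def summarize_hits(payload):
    """Return one summary dict per hit from a parsed search response body."""
    data = payload["data"]
    summaries = []
    for hit in data["results"]:
        # Assumption: citation_locator may be absent, so fall back to a placeholder.
        citation = hit.get("citation_locator") or {}
        summaries.append(
            {
                "rank": hit["rank"],
                "object_id": hit["object_id"],
                "source": citation.get("display_text", "(no citation)"),
                "billable": bool(hit.get("billable", False)),
            }
        )
    return summaries


# Trimmed copy of the sample response shown above.
sample = {
    "data": {
        "query_id": "qry_abc123",
        "results": [
            {
                "retrieval_event_id": "ret_001",
                "object_id": "obj_def456",
                "rank": 1,
                "score": 12.73,
                "billable": True,
                "citation_locator": {
                    "locator": {"page": 4},
                    "display_text": "invoice-0425.pdf p.4",
                },
            }
        ],
        "next_cursor": None,
        "charged_result_count": 1,
    }
}

for row in summarize_hits(sample):
    print(row)
```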
Step 3: Using boolean operators and phrase search
The query string supports structured syntax for precise searches. Combine operators to narrow results.
Examples of query syntax:
# OR groups
(apple OR google) AND lawsuit
# Proximity search -- terms within 5 positions
revenue NEAR/5 forecast
# Wildcard prefix search
invest*
# Field-scoped search
title:"quarterly report" AND body:compliance
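When queries are assembled in code, small helpers can keep the grouping explicit. The following is a minimal sketch; the helper names phrase, any_of, and all_of are hypothetical, and because this guide does not describe how to escape quotes inside a phrase, the sketch rejects such input rather than guessing.

```python
def phrase(text):
    """Wrap text in double quotes for an exact phrase match."""
    if '"' in text:
        # Escaping rules are not documented here, so refuse instead of guessing.
        raise ValueError("this sketch does not handle embedded double quotes")
    return f'"{text}"'


def any_of(*terms):
    """Join terms with OR and parenthesize the group."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Join terms with AND."""
    return " AND ".join(terms)


query = all_of(
    any_of("apple", "google"),
    "lawsuit",
    f"title:{phrase('quarterly report')}",
)
print(query)
# (apple OR google) AND lawsuit AND title:"quarterly report"
```

The resulting string goes into the query field of the search request body unchanged.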
Step 4: Filtering by MIME type and date range
Use the filters object to narrow results by file type, language, date range, or tags -- independently of the query string.
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quarterly revenue report",
    "filters": {
      "mime_types": ["application/pdf"],
      "languages": ["en"],
      "created_at_gte": "2025-01-01T00:00:00Z",
      "created_at_lte": "2025-12-31T23:59:59Z"
    },
    "top_k": 20
  }'
resp = requests.post(
    f"{BASE}/v1/search",
    headers=search_headers,
    json={
        "query": "quarterly revenue report",
        "filters": {
            "mime_types": ["application/pdf"],
            "languages": ["en"],
            "created_at_gte": "2025-01-01T00:00:00Z",
            "created_at_lte": "2025-12-31T23:59:59Z",
        },
        "top_k": 20,
    },
)
resp.raise_for_status()
# Use a separate name so the Step 2 `results` dict stays available for Step 7.
filtered_results = resp.json()["data"]["results"]
const filterRes = await fetch(`${BASE}/v1/search`, {
  method: "POST",
  headers: searchHeaders,
  body: JSON.stringify({
    query: "quarterly revenue report",
    filters: {
      mime_types: ["application/pdf"],
      languages: ["en"],
      created_at_gte: "2025-01-01T00:00:00Z",
      created_at_lte: "2025-12-31T23:59:59Z",
    },
    top_k: 20,
  }),
});
const filterData = (await filterRes.json()).data;
Filters combine with logical AND across fields and logical OR within a single field's array values. For example, mime_types: ["application/pdf", "image/png"] matches objects that are either PDF or PNG.
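The combination rule can be illustrated with a small client-side predicate. This is only a sketch of the semantics, not how the service evaluates filters: the object attribute names mime_type and language are hypothetical, and the date-range filters are left out for brevity.

```python
# Hypothetical mapping from filter name to the object attribute it tests.
FIELD_FOR_FILTER = {"mime_types": "mime_type", "languages": "language"}


def matches_filters(obj, filters):
    """Return True if obj satisfies every supplied list-valued filter."""
    return all(
        obj.get(FIELD_FOR_FILTER[name]) in allowed  # OR within one field's values
        for name, allowed in filters.items()        # AND across the supplied fields
        if name in FIELD_FOR_FILTER
    )


filters = {"mime_types": ["application/pdf", "image/png"], "languages": ["en"]}

print(matches_filters({"mime_type": "image/png", "language": "en"}, filters))
print(matches_filters({"mime_type": "application/pdf", "language": "de"}, filters))
```

The first object passes because PNG is one of the allowed MIME types and the language matches; the second fails because the language filter is not satisfied, even though the MIME type is.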
Step 5: Choosing a retrieval mode
IPTO supports multiple retrieval strategies. Choose the one that fits your use case.
| Mode | Best for | Description |
|---|---|---|
| lexical | Exact terms, boolean queries, regex | Keyword and boolean search using BM25 scoring. |
| dense | Semantic similarity, natural language | Vector similarity search using embeddings. |
| hybrid | General-purpose search | Combines lexical and dense results using Reciprocal Rank Fusion. |
| auto | When you are unsure | System analyzes the query and picks the best mode. This is the default. |
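For intuition about how hybrid mode can merge two rankings, here is a minimal sketch of Reciprocal Rank Fusion in its standard form, where each item scores the sum of 1 / (k + rank) over the rankings it appears in. The constant k = 60 is the value commonly used in the literature; whether IPTO uses this constant, per-mode weights, or any other variation is not stated in this guide.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs; ranks are 1-based, higher fused score wins."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result IDs from a lexical pass and a dense pass.
lexical = ["unit_a", "unit_b", "unit_c"]
dense = ["unit_b", "unit_d", "unit_a"]

print(reciprocal_rank_fusion([lexical, dense]))
# ['unit_b', 'unit_a', 'unit_d', 'unit_c']
```

Items that appear in both lists (unit_a and unit_b) rise above items found by only one retriever, which is why fusion tends to work well for general-purpose search.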
When to use auto
If your query contains boolean operators, phrases in quotes, wildcards, or regex patterns, auto mode will select lexical. For short natural-language queries without operators, auto will select hybrid. You can always override with an explicit retrieval_mode.
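The rule of thumb above can be approximated client-side, for example to log which mode a query is likely to get or to force a mode explicitly. The sketch below is a rough guess at the described behavior, not the service's actual classifier: guess_auto_mode and build_search_body are hypothetical names, the pattern only covers the operators listed in this note, and every query without operators is treated as hybrid regardless of length.

```python
import re

# Operators named in the note above: booleans, quoted phrases, wildcards, /regex/.
OPERATOR_PATTERN = re.compile(
    r'\b(?:AND|OR|NOT)\b'  # boolean operators (uppercase only)
    r'|"'                  # quoted phrase
    r'|\*'                 # wildcard
    r'|/[^/\s]+/'          # /regex/ literal
)


def guess_auto_mode(query):
    """Approximate which mode auto would pick for this query."""
    return "lexical" if OPERATOR_PATTERN.search(query) else "hybrid"


def build_search_body(query, retrieval_mode=None, top_k=10):
    """Build a search request body, adding retrieval_mode only when overriding."""
    body = {"query": query, "top_k": top_k}
    if retrieval_mode is not None:
        body["retrieval_mode"] = retrieval_mode
    return body


print(guess_auto_mode("(apple OR google) AND lawsuit"))
print(guess_auto_mode("vendor contract terms"))
print(build_search_body("vendor contract terms", retrieval_mode="dense"))
```

Omitting retrieval_mode leaves the default (auto) in effect; passing one of lexical, dense, or hybrid overrides it.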
Step 6: Pagination
When a search returns more results than top_k, the response includes a next_cursor value. Pass it in the next request to fetch the next page.
# First page
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "vendor contract terms",
    "top_k": 10
  }'
# Next page (using cursor from previous response)
curl -X POST https://api.ipto.ai/v1/search \
  -H "Authorization: Bearer $IPTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "vendor contract terms",
    "top_k": 10,
    "cursor": "eyJwYWdlIjoy..."
  }'
cursor = None
all_results = []
while True:
    body = {"query": "vendor contract terms", "top_k": 10}
    if cursor:
        body["cursor"] = cursor
    resp = requests.post(
        f"{BASE}/v1/search",
        headers=search_headers,
        json=body,
    )
    resp.raise_for_status()
    data = resp.json()["data"]
    all_results.extend(data["results"])
    # A missing or null next_cursor means this was the last page.
    cursor = data.get("next_cursor")
    if not cursor:
        break
print(f"Total results collected: {len(all_results)}")
let cursor: string | null = null;
const allResults: any[] = [];
do {
  const body: Record<string, any> = {
    query: "vendor contract terms",
    top_k: 10,
  };
  if (cursor) body.cursor = cursor;
  const res = await fetch(`${BASE}/v1/search`, {
    method: "POST",
    headers: searchHeaders,
    body: JSON.stringify(body),
  });
  const data = (await res.json()).data;
  allResults.push(...data.results);
  cursor = data.next_cursor;
} while (cursor);
console.log(`Total results collected: ${allResults.length}`);
Pagination and billing
Paginating through results does not double-charge for the same search units within a single query session. Each unique billable result is charged once.
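Because each new billable result is still charged once, an agent may prefer to stop paginating as soon as it has enough hits instead of draining every page. The generator below is a minimal sketch of that idea; iter_results and the injected fetch_page callable are hypothetical, and fetch_page is assumed to return the parsed data object of a search response. The fake fetcher exists only so the example runs without network access.

```python
def iter_results(fetch_page, query, top_k=10, max_results=None):
    """Yield hits lazily, following next_cursor until it is empty or the cap is hit."""
    if max_results is not None and max_results <= 0:
        return
    cursor = None
    yielded = 0
    while True:
        body = {"query": query, "top_k": top_k}
        if cursor:
            body["cursor"] = cursor
        data = fetch_page(body)
        for hit in data["results"]:
            yield hit
            yielded += 1
            if max_results is not None and yielded >= max_results:
                return  # stop before requesting another page
        cursor = data.get("next_cursor")
        if not cursor:
            return


# Fake two-page backend keyed by cursor, standing in for POST /v1/search.
PAGES = {
    None: {"results": [{"search_unit_id": "u1"}, {"search_unit_id": "u2"}], "next_cursor": "c2"},
    "c2": {"results": [{"search_unit_id": "u3"}], "next_cursor": None},
}
calls = []


def fake_fetch(body):
    calls.append(body)
    return PAGES[body.get("cursor")]


print([hit["search_unit_id"] for hit in iter_results(fake_fetch, "vendor contract terms")])
# ['u1', 'u2', 'u3']
```

With max_results=2 the generator returns after the first page and never requests the second, so no further results are retrieved.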
Step 7: Downloading full objects from results
Each search result includes a retrieval_event_id. Use it to request a download URL for the original file.
# `results` is the search response data from Step 2.
retrieval_event_id = results["results"][0]["retrieval_event_id"]
resp = requests.post(
    f"{BASE}/v1/retrieval-events/{retrieval_event_id}/download",
    headers=search_headers,
)
resp.raise_for_status()
download = resp.json()["data"]
# Download the actual file
file_resp = requests.get(download["download_url"])
file_resp.raise_for_status()
with open("downloaded-file.pdf", "wb") as f:
    f.write(file_resp.content)
print(f"Downloaded {len(file_resp.content)} bytes")
const retEventId = searchData.results[0].retrieval_event_id;
const dlRes = await fetch(
  `${BASE}/v1/retrieval-events/${retEventId}/download`,
  {
    method: "POST",
    headers: { Authorization: `Bearer ${apiSecret}` },
  }
);
const dlData = (await dlRes.json()).data;
// Download the actual file
const fileRes = await fetch(dlData.download_url);
const fileBuffer = await fileRes.arrayBuffer();
const { writeFile } = await import("fs/promises");
await writeFile("downloaded-file.pdf", Buffer.from(fileBuffer));
console.log(`Downloaded ${fileBuffer.byteLength} bytes`);
Response from download endpoint:
{
  "data": {
    "download_event_id": "dl_xyz789",
    "download_url": "https://storage.example.com/presigned-get-url...",
    "expires_at": "2026-04-05T12:15:00Z"
  },
  "request_id": "req_011",
  "timestamp": "2026-04-05T12:00:05Z"
}
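The expires_at field indicates that the presigned URL is short-lived (about 15 minutes after the request in this sample), so it can be worth checking the remaining lifetime before starting a transfer that was queued earlier. The helper below is a minimal sketch; seconds_until_expiry is a hypothetical name, and it assumes expires_at is always an ISO 8601 UTC timestamp ending in Z, as in the sample.

```python
from datetime import datetime, timezone


def seconds_until_expiry(download, now=None):
    """Return seconds left before the presigned URL expires (negative if expired)."""
    now = now or datetime.now(timezone.utc)
    # Replace the trailing "Z" so fromisoformat also works on older Python 3 versions.
    expires_at = datetime.fromisoformat(download["expires_at"].replace("Z", "+00:00"))
    return (expires_at - now).total_seconds()


# Values copied from the sample download response above.
download = {
    "download_event_id": "dl_xyz789",
    "download_url": "https://storage.example.com/presigned-get-url...",
    "expires_at": "2026-04-05T12:15:00Z",
}
issued = datetime(2026, 4, 5, 12, 0, 5, tzinfo=timezone.utc)

print(seconds_until_expiry(download, now=issued))
# 895.0
```

If the value is zero or negative, request a new download URL instead of using the stale one. This guide does not say whether requesting a second URL for the same retrieval event is metered again.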
Download charges
Downloading the original file is a separate billable event from the initial retrieval. The download is metered and will appear in your spend summary.
Search endpoint availability
The POST /v1/search endpoint is planned for an upcoming release and is not yet operational. The query syntax, request/response shapes, and retrieval concepts described in this guide are stable and reflect the finalized API design, so you can build your integration against this contract today and enable it once the endpoint ships.
Query syntax quick reference
| Syntax | Description | Example |
|---|---|---|
| term | Single keyword | invoice |
| "phrase" | Exact phrase match | "chief executive officer" |
| A AND B | Both terms required | revenue AND forecast |
| A OR B | Either term matches | CEO OR "chief executive" |
| NOT A or -A | Exclude term | contract NOT termination |
| (A OR B) AND C | Grouped boolean | (apple OR google) AND lawsuit |
| A NEAR/n B | Terms within n positions | breach NEAR/5 notification |
| prefix* | Prefix wildcard (min 3 chars) | invest* |
| *suffix | Suffix wildcard | *ization |
| /regex/ | Regular expression | /INV-[0-9]{4}/ |
| field:term | Field-scoped search | title:"quarterly report" |
Available fields for field-scoped search: title, body, ocr, transcript, caption, metadata, tags
Operator precedence: NOT > AND > OR. Use parentheses to override.
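To make the precedence rule concrete, the sketch below rewrites a flat query with the implied grouping spelled out. It is a client-side illustration only, with hypothetical function names, and it assumes a well-formed query made of single-word terms and explicit uppercase operators, with no parentheses, quoted phrases, or regex literals.

```python
def split_on(tokens, operator):
    """Split a token list into groups separated by the given operator token."""
    groups, current = [], []
    for token in tokens:
        if token == operator:
            groups.append(current)
            current = []
        else:
            current.append(token)
    groups.append(current)
    return groups


def render_not(tokens):
    """NOT binds tightest: render ['NOT', 'draft'] as '(NOT draft)'."""
    if tokens[0] == "NOT":
        return "(NOT " + " ".join(tokens[1:]) + ")"
    return " ".join(tokens)


def explicit_grouping(query):
    """Return the query with the grouping implied by NOT > AND > OR made explicit."""
    or_parts = []
    for or_group in split_on(query.split(), "OR"):  # OR binds loosest
        and_parts = [render_not(group) for group in split_on(or_group, "AND")]
        if len(and_parts) > 1:
            or_parts.append("(" + " AND ".join(and_parts) + ")")
        else:
            or_parts.append(and_parts[0])
    return " OR ".join(or_parts)


print(explicit_grouping("contract OR invoice AND NOT draft"))
# contract OR (invoice AND (NOT draft))
```

If the intended meaning is instead (contract OR invoice) AND NOT draft, the parentheses have to be written in the query itself.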
Next steps
- Managing API Keys -- Restrict API keys to specific datasets for production agents.
- Uploading Data -- Publish your own datasets to the marketplace.
- Provider Analytics -- See how your datasets are being searched and monetized.