Search & Retrieval¶

Search is the core interaction in IPTO. Buyers submit queries, and the platform retrieves relevant content from across the datasets they have access to -- ranking, deduplicating, and metering results along the way.

Retrieval modes¶

IPTO supports four retrieval modes that control how your query is matched against indexed content.

Mode	Description
`lexical`	Keyword-based search using BM25 ranking. Best for precise term matching, boolean queries, and structured query syntax.
`dense`	Vector similarity search that matches on semantic meaning rather than exact keywords. Best for natural language questions.
`hybrid`	Combines lexical and dense retrieval, merging results using Reciprocal Rank Fusion (RRF). Balances precision and recall.
`auto`	The system selects the best mode based on query characteristics. This is the default.
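
To make the `hybrid` merge step concrete, here is a minimal sketch of Reciprocal Rank Fusion. The function name, the sample document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration; IPTO's actual fusion parameters are not documented on this page.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the lexical and dense retrievers.
lexical_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([lexical_hits, dense_hits])
print(fused)
```

Documents that appear in both lists accumulate score from each, which is why they tend to rise above documents found by only one retriever.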

Tip

Use `auto` unless you have a specific reason to force a retrieval mode. The system analyzes your query structure -- boolean operators and wildcards favor `lexical`, while natural language questions favor `hybrid`.
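
The kind of query analysis described in the tip could look roughly like the sketch below. This is a hypothetical heuristic, not IPTO's selection logic: the function name `pick_mode` and the exact pattern are assumptions, and the real system may weigh other signals.

```python
import re

# Assumed signals for "structured" queries: uppercase boolean operators,
# a leading-minus negation such as -term, or a * wildcard.
_STRUCTURED = re.compile(r"\b(?:AND|OR|NOT)\b|\*|(?:^|\s)-\w")


def pick_mode(query: str) -> str:
    """Guess a retrieval mode from the query's surface structure."""
    if _STRUCTURED.search(query):
        return "lexical"
    return "hybrid"


print(pick_mode("revenue AND forecast"))
print(pick_mode("What caused the data breach last year?"))
```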

Search flow¶

When you submit a search request, it passes through the following stages:

flowchart TD
    A[Query submitted] --> B[Authenticate & resolve tenant]
    B --> C[Resolve accessible datasets]
    C --> D[Select retrieval mode]
    D --> E[Execute search]
    E --> F[Rank & deduplicate results]
    F --> G[Build snippets & citations]
    G --> H[Meter billable results]
    H --> I[Return response]

Across these stages, the platform enforces access controls and applies filters so that only authorized, active content is returned.

Query syntax¶

IPTO supports a structured query syntax that gives you fine-grained control over how your search is interpreted.

Boolean operators¶

Combine terms with boolean logic to narrow or broaden your results.

Operator	Syntax	Example
AND	`A AND B` or `A B`	`revenue AND forecast`
OR	`A OR B`	`CEO OR "chief executive"`
NOT	`NOT A` or `-A`	`contract NOT termination`
Grouping	`(A OR B) AND C`	`(apple OR google) AND lawsuit`

Operator precedence is NOT > AND > OR, so `a OR b AND NOT c` is evaluated as `a OR (b AND (NOT c))`. Use parentheses to override precedence.
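
The precedence rules can be illustrated with a small recursive-descent parser. This is a sketch under stated assumptions, not IPTO's query parser: the tuple-based output format and the function name `parse` are invented for illustration, and field scopes, `NEAR`, and wildcards are treated as plain terms.

```python
import re

# Tokens: parentheses, quoted phrases, or runs of non-space characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')


def parse(query):
    """Parse a boolean query into nested tuples, binding NOT > AND > OR."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():  # explicit AND, or implicit AND between adjacent terms
        node = parse_not()
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":
                take()
            node = ("AND", node, parse_not())
        return node

    def parse_not():  # highest precedence
        if peek() == "NOT":
            take()
            return ("NOT", parse_not())
        return parse_primary()

    def parse_primary():
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected closing parenthesis")
            return node
        if tok.startswith("-") and len(tok) > 1:  # -A is shorthand for NOT A
            return ("NOT", tok[1:])
        return tok

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return tree


print(parse("revenue OR forecast AND NOT draft"))
print(parse("(apple OR google) AND lawsuit"))
```

Each precedence level gets its own function, and a lower-precedence function only ever calls the next tighter one, which is what makes `NOT` bind before `AND` and `AND` before `OR`.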

Phrase search¶

Match an exact sequence of words by enclosing them in double quotes.

"chief executive officer"

Phrases are position-aware and require terms to appear adjacent and in order.
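
As an illustration of position-aware matching, the sketch below finds a phrase in a tokenized text by requiring its words at consecutive positions. Whitespace tokenization and case-insensitive comparison are assumptions; the indexer's real analysis chain is not described here.

```python
def phrase_positions(tokens, phrase):
    """Return start indexes where the phrase's words occur adjacent and in order."""
    words = phrase.lower().split()
    if not words:
        return []
    lowered = [token.lower() for token in tokens]
    width = len(words)
    return [
        start
        for start in range(len(lowered) - width + 1)
        if lowered[start:start + width] == words
    ]


text = "The chief executive officer met the executive chief officer".split()
print(phrase_positions(text, "chief executive officer"))
```

The second occurrence of the three words does not match because they appear in a different order.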

Proximity search¶

Find terms that appear near each other using the NEAR/n operator, where n is the maximum distance in word positions.

apple NEAR/5 lawsuit
"data breach" NEAR/10 notification

If you omit the distance, it defaults to 10, which is equivalent to `NEAR/10`.
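
A proximity check can be sketched as a comparison of word positions. The sketch assumes single-word operands, an unordered match, and a distance measured as the absolute difference between positions; how IPTO counts distance for phrases or ordered matches is not specified on this page.

```python
def near(tokens, left, right, max_distance=10):
    """Return True if `left` and `right` occur within max_distance word positions."""
    left_positions = [i for i, token in enumerate(tokens) if token == left]
    right_positions = [i for i, token in enumerate(tokens) if token == right]
    return any(
        abs(i - j) <= max_distance
        for i in left_positions
        for j in right_positions
    )


words = "apple filed a new patent lawsuit".split()
print(near(words, "apple", "lawsuit", max_distance=5))
print(near(words, "apple", "lawsuit", max_distance=4))
```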

Wildcard search¶

Use * as a wildcard for prefix, suffix, or infix matching.

Pattern	Matches
`invest*`	invest, investor, investment, investing
`*ization`	organization, optimization, monetization
`c*o`	CEO, CFO, CTO

Note

Prefix wildcards such as `invest*` require at least 3 characters before the `*`. Suffix wildcards (`*ization`) and infix wildcards (`c*o`) are more computationally expensive -- use them sparingly.
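
One way to reason about wildcard patterns is to translate them into regular expressions. The sketch below is illustrative only: the helper names are hypothetical, `*` is assumed to match any run of word characters, and matching is assumed to be case-insensitive (consistent with `c*o` matching CEO in the table above).

```python
import re

MIN_PREFIX_CHARS = 3  # minimum literal characters before a trailing *


def compile_wildcard(pattern: str) -> re.Pattern:
    """Translate a * wildcard pattern into a compiled regular expression."""
    is_prefix_pattern = pattern.endswith("*") and pattern.count("*") == 1
    if is_prefix_pattern and len(pattern) - 1 < MIN_PREFIX_CHARS:
        raise ValueError("prefix wildcards need at least 3 characters before *")
    # Escape the literal pieces and let each * stand for zero or more word chars.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"\w*".join(literal_parts), re.IGNORECASE)


def matches(pattern: str, term: str) -> bool:
    """Return True if the whole term matches the wildcard pattern."""
    return compile_wildcard(pattern).fullmatch(term) is not None


print(matches("invest*", "investor"))
print(matches("*ization", "organization"))
print(matches("c*o", "CEO"))
```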

Field-scoped search¶

Target specific fields in the indexed content.

title:"quarterly report"
body:compliance AND tags:gdpr

Available fields: title, body, ocr, transcript, caption, metadata, tags.

Filters¶

Filters narrow results without affecting relevance scoring. Filters combine with logical AND; values within a single filter use logical OR.

Filter	Type	Description
`mime_types`	string array	Filter by file type (e.g., `["application/pdf", "image/png"]`).
`languages`	string array	Filter by content language (e.g., `["en", "de"]`).
`created_at_gte`	timestamp	Only include objects created at or after this timestamp.
`created_at_lte`	timestamp	Only include objects created at or before this timestamp.
`tags_any`	string array	Include objects matching any of the specified tags.
`object_ids`	string array	Restrict results to specific objects by ID.
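
The combination rule (AND across filters, OR across the values of one filter) can be sketched as a predicate over a single object. The filter names come from the table above; the object field names (`mime_type`, `language`, `created_at`, `tags`, `object_id`) and the function name are assumptions made for this example.

```python
from datetime import datetime, timezone


def matches_filters(obj, filters):
    """Return True if obj passes every supplied filter (AND across filters)."""
    checks = []
    if "mime_types" in filters:  # OR across the listed values
        checks.append(obj["mime_type"] in filters["mime_types"])
    if "languages" in filters:
        checks.append(obj["language"] in filters["languages"])
    if "created_at_gte" in filters:
        checks.append(obj["created_at"] >= filters["created_at_gte"])
    if "created_at_lte" in filters:
        checks.append(obj["created_at"] <= filters["created_at_lte"])
    if "tags_any" in filters:  # any overlap between the two tag sets
        checks.append(bool(set(obj["tags"]) & set(filters["tags_any"])))
    if "object_ids" in filters:
        checks.append(obj["object_id"] in filters["object_ids"])
    return all(checks)


# Hypothetical indexed object.
report = {
    "object_id": "obj_1",
    "mime_type": "application/pdf",
    "language": "en",
    "created_at": datetime(2024, 3, 1, tzinfo=timezone.utc),
    "tags": ["gdpr", "finance"],
}
pdf_or_png_with_tag = {
    "mime_types": ["application/pdf", "image/png"],
    "tags_any": ["gdpr", "hipaa"],
    "created_at_gte": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
print(matches_filters(report, pdf_or_png_with_tag))
```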

Result structure¶

Each search result contains the information you need to use, cite, and attribute the retrieved content.

Snippets¶

A snippet is a highlighted text excerpt from the matched content. Snippets show the relevant portion of the indexed text with matched terms emphasized. They are generated from stored indexed content and do not require fetching the original file.

Citations¶

A citation provides a precise locator for the matched content within the original object. Citations include:

- The `object_id` and `dataset_id` for attribution.
- A `locator` that identifies the exact position -- for example, a page number in a PDF, a timestamp range in a transcript, or a chunk ordinal in a text document.
- A `display_text` suitable for rendering in a citation list.

Scores¶

Each result includes a relevance score and a rank position. Scores are comparable within a single query but not across different queries. Results are ordered by descending score.

Billable results

Each result includes a billable flag indicating whether it counts toward metered usage. Only billable results generate retrieval charges. Duplicate results within the same query session are charged only once.
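
If you want to estimate charges on the client side, the rule above can be approximated as follows. This is a rough sketch, not the platform's metering logic: it assumes each result exposes `dataset_id` and `object_id` alongside the `billable` flag, and that a duplicate is a repeat of the same dataset and object pair within one session. The platform's own definition of a duplicate may differ.

```python
def count_new_charges(results, charged_in_session):
    """Count results that would incur a charge, updating the session's seen set."""
    new_charges = 0
    for result in results:
        if not result["billable"]:
            continue
        key = (result["dataset_id"], result["object_id"])  # assumed identity
        if key in charged_in_session:
            continue  # already charged earlier in this session
        charged_in_session.add(key)
        new_charges += 1
    return new_charges


session = set()
first_page = [
    {"dataset_id": "ds_1", "object_id": "obj_1", "billable": True},
    {"dataset_id": "ds_1", "object_id": "obj_2", "billable": False},
]
second_page = [
    {"dataset_id": "ds_1", "object_id": "obj_1", "billable": True},
    {"dataset_id": "ds_1", "object_id": "obj_3", "billable": True},
]
print(count_new_charges(first_page, session))
print(count_new_charges(second_page, session))
```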

FAQ¶

What is the maximum number of results I can request?

You can request up to 100 results per query using the top_k parameter. The default is 20. For larger result sets, use cursor-based pagination to page through additional results.
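
A pagination loop might look like the sketch below. Only `top_k`, its default of 20, and its limit of 100 come from this page; the `cursor` argument, the `next_cursor` and `results` response fields, and the `search_page` callable are hypothetical placeholders, so check the API reference for the real names.

```python
MAX_TOP_K = 100
DEFAULT_TOP_K = 20


def iter_results(search_page, query, top_k=DEFAULT_TOP_K):
    """Yield results page by page until the (assumed) cursor runs out."""
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError("top_k must be between 1 and 100")
    cursor = None
    while True:
        page = search_page(query=query, top_k=top_k, cursor=cursor)
        yield from page["results"]
        cursor = page.get("next_cursor")
        if not cursor:
            break


def fake_search_page(query, top_k, cursor):
    """In-memory stand-in for a real search call, used only for this example."""
    data = [f"result_{i}" for i in range(45)]
    start = int(cursor or 0)
    end = start + top_k
    return {
        "results": data[start:end],
        "next_cursor": str(end) if end < len(data) else None,
    }


all_results = list(iter_results(fake_search_page, "revenue AND forecast"))
print(len(all_results))
```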

Can I search across datasets from multiple providers in a single query?

Yes. When you omit the dataset_ids parameter or pass an empty array, the search spans all datasets accessible to your tenant. Results from different providers are merged and ranked together.

Does the retrieval mode affect billing?

No. Billing is based on the number of billable results returned, not the retrieval mode used. Whether you use lexical, dense, hybrid, or auto, the metering is the same.

What happens if I search for a term that does not exist in any dataset?

The API returns a successful response with an empty results array. No retrieval events are created and nothing is billed.

Can I combine filters with boolean query syntax?

Yes. Filters and query syntax work together. The query syntax controls how terms are matched, while filters restrict the candidate set. For example, you can combine the boolean query `revenue AND forecast` with the filter `mime_types: ["application/pdf"]` to search only PDF documents.
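
Putting the pieces together, a request body for that example might be assembled as in the sketch below. The parameter names `top_k`, `dataset_ids`, and `mime_types` appear on this page; the top-level keys `query`, `mode`, and `filters`, and the helper itself, are assumptions for illustration rather than the documented request schema.

```python
import json

ALLOWED_MODES = {"lexical", "dense", "hybrid", "auto"}


def build_search_request(query, mode="auto", top_k=20, dataset_ids=None, filters=None):
    """Assemble a search request body with basic validation."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown retrieval mode: {mode}")
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")
    return {
        "query": query,                    # assumed key name
        "mode": mode,                      # assumed key name
        "top_k": top_k,
        "dataset_ids": dataset_ids or [],  # empty means all accessible datasets
        "filters": filters or {},          # assumed key name
    }


request_body = build_search_request(
    "revenue AND forecast",
    filters={"mime_types": ["application/pdf"]},
)
print(json.dumps(request_body, indent=2))
```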