Frequently Asked Questions

General

What is IPTO?

IPTO is a marketplace for valuable private data. It enables organizations to list proprietary corpora for agent and API access, monetizing retrieval and downstream usage rather than raw file transfers. The platform handles hosting, indexing, discovery, retrieval, access control, usage metering, and payout calculation.

Who is IPTO for?

IPTO serves two audiences: data providers who upload and monetize proprietary corpora (documents, images, audio, video, structured records), and agent customers who access that data through search APIs and agent workflows. Platform operators manage governance, review queues, and payout processing.

What data formats does IPTO support?

IPTO supports documents and PDFs, scanned records and invoices, images with OCR-relevant content, audio and video collections, and structured business records. Each dataset specifies a source_modality that determines the extraction and enrichment pipeline applied during ingestion.

Is IPTO suitable for real-time data?

IPTO is optimized for corpus-style data that is uploaded, reviewed, indexed, and then retrieved. Uploaded objects go through a staged review process before indexing, so it is best suited for data that does not require sub-second freshness. Near-real-time use cases can work if the review and ingestion pipeline latency is acceptable.

What regions is IPTO available in?

IPTO currently operates with European infrastructure. Each tenant is assigned a region at creation time. Storage uses Hetzner Object Storage in Europe. Region expansion is on the roadmap but not part of the v1 launch.

Authentication & Security

How do I authenticate with the API?

All authenticated requests require an Authorization: Bearer <token> header. The token can be either a session token obtained through the login flow or an API key created via POST /v1/api-keys. Every request resolves a tenant_id, principal_id, principal_type, and scopes from the credential.
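As a minimal sketch, this is what attaching a bearer credential looks like; the base URL and token value are placeholders, not real IPTO endpoints or secrets:

```python
# Build the headers every authenticated IPTO call needs.
# Token and host are illustrative placeholders.
BASE_URL = "https://api.example.com"  # placeholder host

def auth_headers(token: str) -> dict:
    """Return standard headers for an authenticated JSON request."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

headers = auth_headers("sk_live_placeholder")
```

The same header shape works for both session tokens and API keys; the platform resolves tenant, principal, and scopes from whichever credential is presented.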

What is the difference between session tokens and API keys?

Session tokens are issued during interactive login (POST /v1/auth/signup or login) and are intended for console users. API keys are tenant-scoped machine credentials created via POST /v1/api-keys, designed for agents and automation. API keys carry explicit scopes and can be restricted to specific datasets.

How do I rotate my API keys?

Create a new API key with POST /v1/api-keys, update your application to use the new key, then revoke the old key with DELETE /v1/api-keys/{api_key_id}. The full secret is returned only once at creation time, so store it securely before discarding the response.
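The rotation order can be sketched as follows. The `FakeClient` below only records calls so the sequence is visible; swap in a real HTTP layer in practice, and note that the response field names are illustrative:

```python
# Sketch of key rotation: create new key, deploy it, then revoke old key.
class FakeClient:
    def __init__(self):
        self.calls = []
    def post(self, path, json=None):
        self.calls.append(("POST", path))
        # The secret appears only in this one response.
        return {"api_key_id": "key_new", "secret": "sk_shown_once"}
    def delete(self, path):
        self.calls.append(("DELETE", path))

def rotate_key(client, old_key_id):
    created = client.post("/v1/api-keys", json={"name": "rotated"})
    new_secret = created["secret"]  # persist securely: returned only once
    # ...deploy new_secret to your application before revoking...
    client.delete(f"/v1/api-keys/{old_key_id}")
    return created["api_key_id"], client.calls

new_id, calls = rotate_key(FakeClient(), "key_old")
```

Deploying the new secret before revoking the old key avoids a window where no valid credential is live.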

Are API requests encrypted?

Yes. The API is served over HTTPS only. All request and response bodies use UTF-8 JSON transmitted over TLS. Credentials in the Authorization header are never logged in plaintext.

What happens if my session token expires?

When a session token expires, the API returns a 401 unauthorized error. Re-authenticate by logging in again to obtain a fresh session token. For long-running automation, use API keys instead of session tokens, since API keys are not subject to session expiry.

Datasets & Objects

What is the maximum file size I can upload?

Files smaller than the 64 MiB multipart threshold are uploaded in a single part; for larger files, the platform automatically uses multipart upload. There is no hard upper limit published in v1, but upload URLs expire after 15 minutes, so extremely large uploads should use the multipart flow with part sizes of at least 8 MiB.
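The threshold and minimum part size imply a simple upload plan, sketched here (the function name is illustrative, only the 64 MiB and 8 MiB constants come from the FAQ):

```python
import math

MULTIPART_THRESHOLD = 64 * 1024 * 1024  # 64 MiB: single-part below this
MIN_PART_SIZE = 8 * 1024 * 1024         # 8 MiB minimum part size

def plan_upload(file_size: int, part_size: int = MIN_PART_SIZE):
    """Decide single-part vs multipart and, if multipart, the part count."""
    if file_size < MULTIPART_THRESHOLD:
        return ("single", 1)
    if part_size < MIN_PART_SIZE:
        raise ValueError("multipart parts must be at least 8 MiB")
    return ("multipart", math.ceil(file_size / part_size))

# A 200 MiB file at 8 MiB parts needs 25 parts.
mode, parts = plan_upload(200 * 1024 * 1024)
```

Larger part sizes mean fewer parts, which matters when each part must complete before the 15-minute URL expiry.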

How long does the review process take?

Newly uploaded objects start in review_state=staged and must be approved by a platform administrator before indexing begins. Staged uploads that are not reviewed within 48-72 hours expire automatically. Actual review time depends on the admin queue throughput.

Can I update an object after it has been uploaded?

Objects are immutable once uploaded. The original blob, filename, MIME type, size, and checksum cannot be changed. To replace content, delete the existing object and upload a new one. Mutable metadata such as status and review_state is managed by the platform lifecycle.

What happens to rejected objects?

Rejected objects (those with review_state=rejected) are never indexed or made searchable. They are purged from staging storage without creating any index artifacts. The rejection decision is recorded as an audit event.

How do I delete a dataset and all its objects?

Update the dataset status to pending_deletion via PATCH /v1/datasets/{dataset_id}. This creates tombstones for all contained objects and enqueues index update work. Original bytes are retained for a short grace period before final purge. The dataset transitions to deleted once cleanup completes.
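A minimal sketch of the request that starts this lifecycle, built as a (method, path, body) tuple with a placeholder dataset id:

```python
def build_delete_request(dataset_id: str):
    """Request that moves a dataset into pending_deletion."""
    return (
        "PATCH",
        f"/v1/datasets/{dataset_id}",
        {"status": "pending_deletion"},
    )

method, path, body = build_delete_request("ds_example")
```

The transition to `deleted` happens asynchronously once tombstoning and index cleanup complete, so poll the dataset status rather than assuming immediate removal.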

Search & Retrieval

What search modes are available?

IPTO supports four retrieval modes: lexical (keyword and boolean search using BM25), dense (vector similarity search), hybrid (lexical + vector merged via Reciprocal Rank Fusion), and auto (the system selects the best mode based on query characteristics). The default is auto.
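A search request body under these modes can be sketched as below; the field names `query`, `retrieval_mode`, and `top_k` follow the FAQ, and the validation is illustrative:

```python
ALLOWED_MODES = {"lexical", "dense", "hybrid", "auto"}

def build_search(query: str, mode: str = "auto", top_k: int = 20) -> dict:
    """Assemble a search request body; 'auto' is the platform default."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown retrieval_mode: {mode}")
    return {"query": query, "retrieval_mode": mode, "top_k": top_k}

req = build_search("unpaid invoices Q3", mode="hybrid")
```
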

Can I search across multiple datasets at once?

Yes. If you omit dataset_ids or pass an empty array, the search fans out across all datasets accessible to your tenant. You can also pass specific dataset_ids to scope the search. For large access sets (100+ datasets), the platform uses catalog-first resolution to efficiently select the most relevant datasets.

Does IPTO support vector/semantic search?

Yes. All indexed search units have vector embeddings. You can use retrieval_mode: "dense" for pure vector similarity search or retrieval_mode: "hybrid" to combine lexical and semantic signals. The embedding model is configurable per tenant.

How are search results ranked?

Ranking depends on the retrieval mode. Lexical mode uses BM25 with phrase and proximity scoring. Dense mode uses cosine similarity. Hybrid mode merges both via Reciprocal Rank Fusion. For multi-dataset queries, additional signals include term fingerprint matching, bloom filter screening, historical relevance, and bounded price attractiveness.
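To illustrate the hybrid merge step, here is a generic Reciprocal Rank Fusion implementation: each document scores the sum of 1/(k + rank) across the input rankings. The constant k=60 is the value commonly used in the literature; IPTO's actual parameters are not published here.

```python
from collections import defaultdict

def rrf_merge(ranked_lists, k=60):
    """Merge several ranked lists of doc ids via Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["a", "b", "c"]   # e.g. BM25 ordering
dense   = ["b", "c", "a"]   # e.g. cosine-similarity ordering
merged = rrf_merge([lexical, dense])  # → ["b", "a", "c"]
```

Note that RRF only consumes ranks, not raw scores, which is why it can fuse BM25 and cosine similarity without score normalization.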

Is there a limit on search queries?

The maximum top_k per request is 100 (default 20). Query execution has a hard timeout to protect serving costs. The API enforces rate limiting; exceeding the limit returns a 429 rate_limited error with a Retry-After header. Pagination uses cursor-based tokens for iterating through larger result sets.
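Cursor pagination can be sketched as a loop that follows the next-cursor token until it is exhausted. The `fetch_page` stub below stands in for the search endpoint and its field names are illustrative; a real client would POST the query with the cursor token and also honor `Retry-After` on 429 responses:

```python
def fetch_page(cursor=None):
    """Stubbed search endpoint returning two pages of results."""
    pages = {
        None: {"results": [1, 2], "next_cursor": "c1"},
        "c1": {"results": [3], "next_cursor": None},
    }
    return pages[cursor]

def iter_results():
    """Yield every result by following cursors until none remain."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["results"]
        cursor = page["next_cursor"]
        if cursor is None:
            break

all_results = list(iter_results())
```
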

Billing & Pricing

How does billing work?

Billing is event-driven. Each retrieval, citation, download, and outcome event generates a metering record. These events are rated against the applicable price book and written to an append-only ledger. Buyer invoices aggregate metered charges plus recurring plan fees on a monthly billing cycle.

What am I charged for as a buyer?

Buyers pay a monthly platform subscription plus metered charges for: standard or premium retrievals (per result chunk), citations (when a retrieved chunk is used in output), downloads (per original file access), and outcome-linked fees (a revenue share percentage when retrieval drives measurable downstream value).

How do I earn revenue as a provider?

Providers earn a revenue share on metered retrieval, citation, download, and outcome charges generated by buyers accessing their datasets. The default split is 60% to the provider and 40% to the platform. Curated premium supply may qualify for a 70/30 split.
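As a worked example of the split, assuming a metered total of 1000.00 in whatever billing currency applies:

```python
def split_revenue(metered_total: float, provider_share: float = 0.60):
    """Divide metered charges between provider and platform."""
    provider = round(metered_total * provider_share, 2)
    platform = round(metered_total - provider, 2)
    return provider, platform

default_split = split_revenue(1000.00)        # 60/40 → (600.0, 400.0)
premium_split = split_revenue(1000.00, 0.70)  # 70/30 → (700.0, 300.0)
```
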

When are payouts processed?

Payouts are calculated monthly at the end of each billing period. Payout release occurs after a dispute and hold window. You can check payout status via GET /v1/provider/payouts, which shows pending, approved, paid, held, or reversed states for each settlement period.

Can I set my own pricing?

Providers choose a monetization mode (open, premium, or outcome_share) and a pricing model (fixed, time_decay, or demand_curve). The default pricing model is demand_curve, which adapts to observed demand. Fully custom seller pricing is not in scope for v1, but fixed mode allows stable per-event pricing that you control.

API Keys & Access Control

How many API keys can I create?

There is no hard limit on the number of API keys per tenant in v1. Each key is independently revocable and can have different scopes and dataset restrictions. Use GET /v1/api-keys to list all keys for your tenant.

What scopes should I assign to my key?

Assign the minimum scopes your integration requires. Common patterns: search:query for search-only agents, datasets:read plus search:query for catalog browsing and search, objects:write plus datasets:write for upload automation. Available scopes are datasets:read, datasets:write, objects:write, search:query, usage:read, keys:write, billing:read, and admin:*.

What is the difference between all_available and allow_list access modes?

all_available (the default) grants the API key access to every dataset currently available to the tenant. allow_list restricts the key to only the specific datasets listed in dataset_ids. Key-level restrictions can only narrow tenant-level access, never expand it.

Can I restrict an API key to specific datasets?

Yes. Set dataset_access_mode to allow_list and provide the desired dataset_ids array when creating or updating the key via POST /v1/api-keys or PATCH /v1/api-keys/{api_key_id}. The key will only be able to search and access those specific datasets.
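A key-creation body combining minimal scopes with an allow_list restriction can be sketched like this; `dataset_access_mode`, `dataset_ids`, and the scope names follow the FAQ, while the dataset ids and key name are placeholders:

```python
def build_key_request(name, scopes, dataset_ids=None):
    """Assemble a POST /v1/api-keys body, narrowing access if ids given."""
    body = {"name": name, "scopes": scopes}
    if dataset_ids:
        body["dataset_access_mode"] = "allow_list"
        body["dataset_ids"] = dataset_ids
    else:
        body["dataset_access_mode"] = "all_available"  # platform default
    return body

body = build_key_request(
    "search-agent", ["search:query"], dataset_ids=["ds_a", "ds_b"]
)
```

Because key-level restrictions can only narrow tenant-level access, listing a dataset the tenant cannot reach does not grant access to it.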

How do I revoke an API key?

Send a DELETE /v1/api-keys/{api_key_id} request. Revocation is immediate and independent of user accounts. Any in-flight requests using the revoked key will fail with 401 unauthorized. Key revocation is recorded as an audit event.