Uploading Data¶
This guide walks through the complete provider upload workflow: creating a dataset, uploading files through presigned URLs, confirming uploads, and monitoring object status through the processing pipeline.
Prerequisites¶
Before you begin, make sure you have:
- An authenticated session -- either a session token from signup/login or an API key with the datasets:write and objects:write scopes.
- A dataset created (or one you will create in Step 1 below).
- One or more files ready to upload.
Step 1: Create a dataset¶
Datasets are containers for related objects. Each dataset declares a source modality, monetization mode, and visibility setting.
curl -X POST https://api.ipto.ai/v1/datasets \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "AP Invoices 2025",
"description": "Invoice corpus for vendor dispute search",
"source_modality": "document",
"monetization_mode": "premium",
"pricing_model": "demand_curve",
"visibility": "listed"
}'
import requests
BASE = "https://api.ipto.ai"
headers = {"Authorization": f"Bearer {token}"}
resp = requests.post(
f"{BASE}/v1/datasets",
headers=headers,
json={
"name": "AP Invoices 2025",
"description": "Invoice corpus for vendor dispute search",
"source_modality": "document",
"monetization_mode": "premium",
"pricing_model": "demand_curve",
"visibility": "listed",
},
)
resp.raise_for_status()
dataset = resp.json()
dataset_id = dataset["dataset_id"]
print(f"Created dataset: {dataset_id}")
const BASE = "https://api.ipto.ai";
const headers = {
Authorization: `Bearer ${token}`,
"Content-Type": "application/json",
};
const dsRes = await fetch(`${BASE}/v1/datasets`, {
method: "POST",
headers,
body: JSON.stringify({
name: "AP Invoices 2025",
description: "Invoice corpus for vendor dispute search",
source_modality: "document",
monetization_mode: "premium",
pricing_model: "demand_curve",
visibility: "listed",
}),
});
const dataset = await dsRes.json();
const datasetId = dataset.dataset_id;
Response:
{
"data": {
"dataset_id": "dset_abc123",
"status": "draft",
"created_at": "2026-04-05T10:00:00Z"
},
"request_id": "req_001",
"timestamp": "2026-04-05T10:00:00Z"
}
Step 2: Initiate the upload¶
Request a presigned upload URL for each file. The API returns a short-lived URL that you use to upload the raw bytes directly to cloud storage.
Using the CLI?
The CLI handles steps 2-4 (initiate, upload, confirm) in a single command.
import os
file_path = "invoice-0425.pdf"
file_size = os.path.getsize(file_path)
resp = requests.post(
f"{BASE}/v1/datasets/{dataset_id}/objects/upload",
headers=headers,
json={
"filename": "invoice-0425.pdf",
"content_type": "application/pdf",
"size_bytes": file_size,
},
)
resp.raise_for_status()
upload = resp.json()["data"]
object_id = upload["object_id"]
upload_url = upload["upload_url"]
print(f"Object ID: {object_id}")
print(f"Upload URL expires at: {upload['expires_at']}")
import { stat } from "fs/promises";
const filePath = "invoice-0425.pdf";
const fileStats = await stat(filePath);
const uploadRes = await fetch(
`${BASE}/v1/datasets/${datasetId}/objects/upload`,
{
method: "POST",
headers,
body: JSON.stringify({
filename: "invoice-0425.pdf",
content_type: "application/pdf",
size_bytes: fileStats.size,
}),
}
);
const uploadData = (await uploadRes.json()).data;
const objectId = uploadData.object_id;
const uploadUrl = uploadData.upload_url;
Response:
{
"data": {
"upload_id": "upl_xyz789",
"object_id": "obj_def456",
"blob_id": "blob_aaa111",
"review_state": "staged",
"upload_strategy": "single_part",
"upload_url": "https://storage.example.com/presigned-put-url...",
"expires_at": "2026-04-05T10:15:00Z"
},
"request_id": "req_002",
"timestamp": "2026-04-05T10:00:01Z"
}
Step 3: Upload to the presigned URL¶
Send the raw file bytes to the presigned URL with a PUT request. No Authorization header is needed -- the URL itself contains temporary credentials.
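If you are following along in Python, the same PUT can be issued with requests. This is a sketch: upload_url and the file path come from Step 2, and the helper name upload_file is ours, not part of the API.

```python
import requests

def upload_file(upload_url: str, file_path: str, content_type: str) -> None:
    """PUT the raw file bytes to the presigned URL.

    No Authorization header is needed; the URL itself carries
    temporary credentials.
    """
    with open(file_path, "rb") as f:
        resp = requests.put(
            upload_url,
            data=f,  # streams from disk instead of loading into memory
            headers={"Content-Type": content_type},
        )
    resp.raise_for_status()

# With the values from Step 2:
# upload_file(upload_url, "invoice-0425.pdf", "application/pdf")
# print("Upload complete")
```

Passing the open file object as data streams the body, which matters for large files.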
import { readFile } from "fs/promises";
const fileData = await readFile(filePath);
const putRes = await fetch(uploadUrl, {
method: "PUT",
headers: { "Content-Type": "application/pdf" },
body: fileData,
});
if (!putRes.ok) {
throw new Error(`Upload failed: ${putRes.status}`);
}
console.log("Upload complete");
Step 4: Confirm the upload¶
After the file bytes have been uploaded to cloud storage, confirm the upload so the platform can begin processing.
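In Python, confirmation is a single authenticated POST to the confirm endpoint (the same call shown in the sequence diagram at the end of this guide). A minimal sketch; the helper name confirm_upload is ours:

```python
import requests

def confirm_upload(base: str, token: str, object_id: str) -> dict:
    """Tell the platform the bytes are in place so processing can begin."""
    resp = requests.post(
        f"{base}/v1/objects/{object_id}/confirm",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()["data"]

# With the values from earlier steps:
# confirmed = confirm_upload(BASE, token, object_id)
# print(confirmed["status"], confirmed["review_state"])
```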
Response:
{
"data": {
"object_id": "obj_def456",
"status": "uploaded",
"review_state": "staged"
},
"request_id": "req_003",
"timestamp": "2026-04-05T10:01:00Z"
}
Step 5: Check object status¶
Poll the object endpoint to track processing progress. The object moves through several statuses as it is normalized, extracted, enriched, and indexed.
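A Python version of the polling loop might look like the following sketch. The interval, timeout, and helper name wait_for_object are our choices, not API requirements:

```python
import time
import requests

def wait_for_object(base: str, token: str, object_id: str,
                    interval: float = 5.0, timeout: float = 900.0) -> dict:
    """Poll GET /v1/objects/{id} until the object reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(
            f"{base}/v1/objects/{object_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        obj = resp.json()["data"]
        print(f"Status: {obj['status']}  Review: {obj['review_state']}")
        if obj["status"] in ("active", "failed"):
            return obj
        time.sleep(interval)
    raise TimeoutError(f"Object {object_id} did not finish within {timeout}s")

# With the values from earlier steps:
# obj = wait_for_object(BASE, token, object_id)
```

A hard timeout keeps the loop from spinning forever if an object stalls mid-pipeline.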
const poll = async () => {
while (true) {
const res = await fetch(`${BASE}/v1/objects/${objectId}`, {
headers: { Authorization: `Bearer ${token}` },
});
const obj = (await res.json()).data;
console.log(`Status: ${obj.status} Review: ${obj.review_state}`);
if (obj.status === "active" || obj.status === "failed") {
break;
}
await new Promise((r) => setTimeout(r, 5000));
}
};
await poll();
Response:
{
"data": {
"object_id": "obj_def456",
"dataset_id": "dset_abc123",
"status": "active",
"review_state": "approved",
"artifact_summary": {
"plain_text": true,
"ocr_blocks": true,
"chunk_embeddings": true
},
"latest_job_id": "job_pqr999",
"created_at": "2026-04-05T10:00:01Z"
},
"request_id": "req_004",
"timestamp": "2026-04-05T10:12:00Z"
}
Handling errors¶
Expired upload URLs¶
Presigned URLs have a short TTL (typically 15 minutes). If you receive a 403 or 400 from the storage endpoint, the URL has likely expired.
Fix: Call POST /v1/datasets/{id}/objects/upload again to get a fresh URL. Use the same Idempotency-Key header if you want the server to return the same object ID.
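As a sketch, re-initiating with the same Idempotency-Key looks like this in Python (the helper name and key value are illustrative; the request body mirrors Step 2):

```python
import requests

def reinitiate_upload(base: str, token: str, dataset_id: str,
                      filename: str, content_type: str, size_bytes: int,
                      idempotency_key: str) -> dict:
    """Request a fresh presigned URL after the old one expired.

    Reusing the same Idempotency-Key asks the server to return the
    same object ID rather than creating a new object.
    """
    resp = requests.post(
        f"{base}/v1/datasets/{dataset_id}/objects/upload",
        headers={
            "Authorization": f"Bearer {token}",
            "Idempotency-Key": idempotency_key,
        },
        json={
            "filename": filename,
            "content_type": content_type,
            "size_bytes": size_bytes,
        },
    )
    resp.raise_for_status()
    return resp.json()["data"]
```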
Size mismatch¶
If the bytes uploaded do not match the size_bytes declared during initiation, confirmation will fail with a 422 Unprocessable Entity error.
Fix: Ensure you calculate the file size accurately before initiating the upload. Use os.path.getsize() in Python or fs.stat() in Node.js rather than hard-coding values.
Checksum mismatch¶
If you provide a checksum_sha256 during upload initiation and the actual file hash does not match, the upload will be rejected.
Fix: Compute the SHA-256 hash of the file before initiating the upload and pass the correct value.
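A streaming SHA-256 helper in Python, as a sketch (the helper name sha256_hex is ours):

```python
import hashlib

def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Pass the result as checksum_sha256 when initiating the upload:
# payload["checksum_sha256"] = sha256_hex("invoice-0425.pdf")
```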
72-hour review window
All newly uploaded objects start with review_state=staged. An IPTO administrator must approve staged objects before they can proceed through the processing pipeline and become searchable. If an object is not reviewed within 72 hours, it automatically transitions to review_state=expired and will not be indexed. Contact support if your uploads are not reviewed in a timely manner.
Complete upload-to-review pipeline¶
The following diagram shows the full lifecycle of an uploaded object, from initiation through admin review and into the processing pipeline.
sequenceDiagram
participant Provider
participant API as IPTO API
participant Storage as Cloud Storage
participant Admin as Platform Admin
participant Pipeline as Processing Pipeline
Provider->>API: POST /v1/datasets/{id}/objects/upload
API-->>Provider: upload_url + object_id (review_state=staged)
Provider->>Storage: PUT upload_url (file bytes)
Storage-->>Provider: 200 OK
Provider->>API: POST /v1/objects/{id}/confirm
API-->>Provider: status=uploaded, review_state=staged
Note over API,Admin: Object enters review queue
alt Approved within 72 hours
Admin->>API: POST /v1/admin/staged-objects/{id}/approve
API-->>Admin: review_state=approved, job_id
API->>Pipeline: Enqueue normalization job
Pipeline->>Pipeline: normalizing -> extracting -> enriching -> indexing
Pipeline-->>API: status=active
else Rejected
Admin->>API: POST /v1/admin/staged-objects/{id}/reject
API-->>Admin: review_state=rejected
Note over API: Object purged, never indexed
else Expired (no review)
Note over API: 72 hours elapsed
API->>API: review_state=expired
Note over API: Object purged, never indexed
end
Next steps¶
- Searching Data -- Learn how buyers search across your published datasets.
- Provider Analytics -- Monitor how your datasets are performing in the marketplace.
- Managing API Keys -- Create dedicated keys for upload automation.