Uploading Data

This guide walks through the complete provider upload workflow: creating a dataset, uploading files through presigned URLs, confirming uploads, and monitoring object status through the processing pipeline.

Prerequisites

Before you begin, make sure you have:

  • An authenticated session -- either a session token from signup/login or an API key with datasets:write and objects:write scopes.
  • A dataset to upload into -- you will create one in Step 1 below if you don't have one yet.
  • One or more files ready to upload.

Step 1: Create a dataset

Datasets are containers for related objects. Each dataset declares a source modality, monetization mode, and visibility setting.

CLI:

ipto datasets create --name "AP Invoices 2025" \
  --description "Invoice corpus for vendor dispute search" \
  --modality document

cURL:

curl -X POST https://api.ipto.ai/v1/datasets \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "AP Invoices 2025",
    "description": "Invoice corpus for vendor dispute search",
    "source_modality": "document",
    "monetization_mode": "premium",
    "pricing_model": "demand_curve",
    "visibility": "listed"
  }'

Python:

import requests

BASE = "https://api.ipto.ai"
headers = {"Authorization": f"Bearer {token}"}

resp = requests.post(
    f"{BASE}/v1/datasets",
    headers=headers,
    json={
        "name": "AP Invoices 2025",
        "description": "Invoice corpus for vendor dispute search",
        "source_modality": "document",
        "monetization_mode": "premium",
        "pricing_model": "demand_curve",
        "visibility": "listed",
    },
)
resp.raise_for_status()
dataset = resp.json()["data"]
dataset_id = dataset["dataset_id"]
print(f"Created dataset: {dataset_id}")

JavaScript:

const BASE = "https://api.ipto.ai";
const headers = {
  Authorization: `Bearer ${token}`,
  "Content-Type": "application/json",
};

const dsRes = await fetch(`${BASE}/v1/datasets`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "AP Invoices 2025",
    description: "Invoice corpus for vendor dispute search",
    source_modality: "document",
    monetization_mode: "premium",
    pricing_model: "demand_curve",
    visibility: "listed",
  }),
});
if (!dsRes.ok) {
  throw new Error(`Dataset creation failed: ${dsRes.status}`);
}
const dataset = (await dsRes.json()).data;
const datasetId = dataset.dataset_id;

Response:

{
  "data": {
    "dataset_id": "dset_abc123",
    "status": "draft",
    "created_at": "2026-04-05T10:00:00Z"
  },
  "request_id": "req_001",
  "timestamp": "2026-04-05T10:00:00Z"
}

Step 2: Initiate the upload

Request a presigned upload URL for each file. The API returns a short-lived URL that you use to upload the raw bytes directly to cloud storage.

Using the CLI?

The CLI handles steps 2-4 (initiate, upload, confirm) in a single command:

# Single file
ipto objects upload <dataset_id> ./invoice-0425.pdf

# Bulk upload an entire directory
ipto objects upload <dataset_id> ./invoices/ --recursive

cURL:

curl -X POST https://api.ipto.ai/v1/datasets/$DATASET_ID/objects/upload \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "invoice-0425.pdf",
    "content_type": "application/pdf",
    "size_bytes": 812345
  }'

Python:

import os

file_path = "invoice-0425.pdf"
file_size = os.path.getsize(file_path)

resp = requests.post(
    f"{BASE}/v1/datasets/{dataset_id}/objects/upload",
    headers=headers,
    json={
        "filename": "invoice-0425.pdf",
        "content_type": "application/pdf",
        "size_bytes": file_size,
    },
)
resp.raise_for_status()
upload = resp.json()["data"]
object_id = upload["object_id"]
upload_url = upload["upload_url"]
print(f"Object ID: {object_id}")
print(f"Upload URL expires at: {upload['expires_at']}")

JavaScript:

import { stat } from "fs/promises";

const filePath = "invoice-0425.pdf";
const fileStats = await stat(filePath);

const uploadRes = await fetch(
  `${BASE}/v1/datasets/${datasetId}/objects/upload`,
  {
    method: "POST",
    headers,
    body: JSON.stringify({
      filename: "invoice-0425.pdf",
      content_type: "application/pdf",
      size_bytes: fileStats.size,
    }),
  }
);
const uploadData = (await uploadRes.json()).data;
const objectId = uploadData.object_id;
const uploadUrl = uploadData.upload_url;

Response:

{
  "data": {
    "upload_id": "upl_xyz789",
    "object_id": "obj_def456",
    "blob_id": "blob_aaa111",
    "review_state": "staged",
    "upload_strategy": "single_part",
    "upload_url": "https://storage.example.com/presigned-put-url...",
    "expires_at": "2026-04-05T10:15:00Z"
  },
  "request_id": "req_002",
  "timestamp": "2026-04-05T10:00:01Z"
}

Step 3: Upload to the presigned URL

Send the raw file bytes to the presigned URL with a PUT request. No Authorization header is needed -- the URL itself contains temporary credentials.

cURL:

curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary @invoice-0425.pdf

Python:

with open(file_path, "rb") as f:
    put_resp = requests.put(
        upload_url,
        data=f,
        headers={"Content-Type": "application/pdf"},
    )
put_resp.raise_for_status()
print(f"Upload complete: HTTP {put_resp.status_code}")

JavaScript:

import { readFile } from "fs/promises";

const fileData = await readFile(filePath);
const putRes = await fetch(uploadUrl, {
  method: "PUT",
  headers: { "Content-Type": "application/pdf" },
  body: fileData,
});

if (!putRes.ok) {
  throw new Error(`Upload failed: ${putRes.status}`);
}
console.log("Upload complete");

Step 4: Confirm the upload

After the file bytes have been uploaded to cloud storage, confirm the upload so the platform can begin processing.

cURL:

curl -X POST https://api.ipto.ai/v1/objects/$OBJECT_ID/confirm \
  -H "Authorization: Bearer $TOKEN"

Python:

resp = requests.post(
    f"{BASE}/v1/objects/{object_id}/confirm",
    headers=headers,
)
resp.raise_for_status()
print(resp.json())

JavaScript:

const confirmRes = await fetch(
  `${BASE}/v1/objects/${objectId}/confirm`,
  {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
  }
);
const confirmData = await confirmRes.json();
console.log(confirmData);

Response:

{
  "data": {
    "object_id": "obj_def456",
    "status": "uploaded",
    "review_state": "staged"
  },
  "request_id": "req_003",
  "timestamp": "2026-04-05T10:01:00Z"
}

Step 5: Check object status

Poll the object endpoint to track processing progress. The object moves through several statuses as it is normalized, extracted, enriched, and indexed.

CLI:

ipto objects get <object_id>

cURL:

curl -X GET https://api.ipto.ai/v1/objects/$OBJECT_ID \
  -H "Authorization: Bearer $TOKEN"

Python:

import time

while True:
    resp = requests.get(
        f"{BASE}/v1/objects/{object_id}",
        headers=headers,
    )
    resp.raise_for_status()
    obj = resp.json()["data"]
    print(f"Status: {obj['status']}  Review: {obj['review_state']}")

    if obj["status"] in ("active", "failed"):
        break
    time.sleep(5)

JavaScript:

const poll = async () => {
  while (true) {
    const res = await fetch(`${BASE}/v1/objects/${objectId}`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    const obj = (await res.json()).data;
    console.log(`Status: ${obj.status}  Review: ${obj.review_state}`);

    if (obj.status === "active" || obj.status === "failed") {
      break;
    }
    await new Promise((r) => setTimeout(r, 5000));
  }
};
await poll();

Response:

{
  "data": {
    "object_id": "obj_def456",
    "dataset_id": "dset_abc123",
    "status": "active",
    "review_state": "approved",
    "artifact_summary": {
      "plain_text": true,
      "ocr_blocks": true,
      "chunk_embeddings": true
    },
    "latest_job_id": "job_pqr999",
    "created_at": "2026-04-05T10:00:01Z"
  },
  "request_id": "req_004",
  "timestamp": "2026-04-05T10:12:00Z"
}

Handling errors

Expired upload URLs

Presigned URLs have a short TTL (typically 15 minutes). If you receive a 403 or 400 from the storage endpoint, the URL has likely expired.

Fix: Call POST /v1/datasets/{id}/objects/upload again to get a fresh URL. Use the same Idempotency-Key header if you want the server to return the same object ID.
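The retry can be sketched in Python, reusing the Idempotency-Key header described above. The helper names are illustrative, not part of the SDK:

```python
import os
import uuid

import requests

BASE = "https://api.ipto.ai"


def idempotency_headers(token, key):
    """Send the same key on every retry so the server can
    return the same object_id instead of creating a duplicate."""
    return {"Authorization": f"Bearer {token}", "Idempotency-Key": key}


def initiate_upload(token, dataset_id, file_path, key):
    """Request a fresh presigned URL for file_path (safe to call again
    with the same key after the previous URL has expired)."""
    resp = requests.post(
        f"{BASE}/v1/datasets/{dataset_id}/objects/upload",
        headers=idempotency_headers(token, key),
        json={
            "filename": os.path.basename(file_path),
            "content_type": "application/pdf",
            "size_bytes": os.path.getsize(file_path),
        },
    )
    resp.raise_for_status()
    return resp.json()["data"]


# Generate one key per logical upload and keep it around for retries:
# key = str(uuid.uuid4())
# upload = initiate_upload(token, dataset_id, "invoice-0425.pdf", key)
```

Store the key alongside your upload bookkeeping so a retry hours later still maps to the same object.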

Size mismatch

If the bytes uploaded do not match the size_bytes declared during initiation, confirmation will fail with a 422 Unprocessable Entity error.

Fix: Ensure you calculate the file size accurately before initiating the upload. Use os.path.getsize() in Python or fs.stat() in Node.js rather than hard-coding values.

Checksum mismatch

If you provide a checksum_sha256 during upload initiation and the actual file hash does not match, the upload will be rejected.

Fix: Compute the SHA-256 hash of the file before initiating the upload and pass the correct value.
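A minimal Python sketch that computes both values to declare at initiation; the `checksum_sha256` field name comes from the error description above, while the helper name is illustrative:

```python
import hashlib
import os


def file_upload_metadata(path, chunk_size=1 << 20):
    """Return the size_bytes and checksum_sha256 to declare when
    initiating an upload, hashing in 1 MiB chunks so large files
    never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return {
        "size_bytes": os.path.getsize(path),
        "checksum_sha256": digest.hexdigest(),
    }
```

Merge the returned dict into the initiation payload so the declared size and hash always come from the same read of the file.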


72-hour review window

All newly uploaded objects start with review_state=staged. An IPTO administrator must approve staged objects before they can proceed through the processing pipeline and become searchable. If an object is not reviewed within 72 hours, it automatically transitions to review_state=expired and will not be indexed. Contact support if your uploads are not being reviewed in a timely manner.
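Because expired and rejected objects are never indexed, a polling loop should treat those review states as terminal alongside the active/failed statuses. A sketch extending the Step 5 loop, using the status and review_state values shown in the responses above:

```python
import time

import requests

BASE = "https://api.ipto.ai"


def is_terminal(obj):
    """True once no further change is expected: processing finished
    (active/failed) or review ended without approval (rejected/expired)."""
    return (
        obj["status"] in ("active", "failed")
        or obj["review_state"] in ("rejected", "expired")
    )


def wait_for_object(token, object_id, interval=5):
    """Poll the object until it reaches a terminal state."""
    headers = {"Authorization": f"Bearer {token}"}
    while True:
        resp = requests.get(f"{BASE}/v1/objects/{object_id}", headers=headers)
        resp.raise_for_status()
        obj = resp.json()["data"]
        if is_terminal(obj):
            return obj
        time.sleep(interval)
```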


Complete upload-to-review pipeline

The following diagram shows the full lifecycle of an uploaded object, from initiation through admin review and into the processing pipeline.

sequenceDiagram
    participant Provider
    participant API as IPTO API
    participant Storage as Cloud Storage
    participant Admin as Platform Admin
    participant Pipeline as Processing Pipeline

    Provider->>API: POST /v1/datasets/{id}/objects/upload
    API-->>Provider: upload_url + object_id (review_state=staged)

    Provider->>Storage: PUT upload_url (file bytes)
    Storage-->>Provider: 200 OK

    Provider->>API: POST /v1/objects/{id}/confirm
    API-->>Provider: status=uploaded, review_state=staged

    Note over API,Admin: Object enters review queue

    alt Approved within 72 hours
        Admin->>API: POST /v1/admin/staged-objects/{id}/approve
        API-->>Admin: review_state=approved, job_id
        API->>Pipeline: Enqueue normalization job
        Pipeline->>Pipeline: normalizing -> extracting -> enriching -> indexing
        Pipeline-->>API: status=active
    else Rejected
        Admin->>API: POST /v1/admin/staged-objects/{id}/reject
        API-->>Admin: review_state=rejected
        Note over API: Object purged, never indexed
    else Expired (no review)
        Note over API: 72 hours elapsed
        API->>API: review_state=expired
        Note over API: Object purged, never indexed
    end
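The provider-side half of this sequence (initiate, upload, confirm) can be collapsed into a single helper. A minimal Python sketch, assuming the endpoints and response envelope shown in Steps 2-4:

```python
import mimetypes
import os

import requests

BASE = "https://api.ipto.ai"


def upload_file(token, dataset_id, file_path):
    """Initiate, upload, and confirm one file; returns the object_id.
    The object then sits in review_state=staged awaiting admin review."""
    headers = {"Authorization": f"Bearer {token}"}
    content_type = mimetypes.guess_type(file_path)[0] or "application/octet-stream"

    # Step 2: request a presigned URL
    resp = requests.post(
        f"{BASE}/v1/datasets/{dataset_id}/objects/upload",
        headers=headers,
        json={
            "filename": os.path.basename(file_path),
            "content_type": content_type,
            "size_bytes": os.path.getsize(file_path),
        },
    )
    resp.raise_for_status()
    upload = resp.json()["data"]

    # Step 3: PUT the raw bytes to storage (no Authorization header)
    with open(file_path, "rb") as f:
        put = requests.put(
            upload["upload_url"],
            data=f,
            headers={"Content-Type": content_type},
        )
    put.raise_for_status()

    # Step 4: confirm so review and processing can begin
    confirm = requests.post(
        f"{BASE}/v1/objects/{upload['object_id']}/confirm", headers=headers
    )
    confirm.raise_for_status()
    return upload["object_id"]
```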

Next steps