
Research Data Sharing

The problem

Researchers need to share datasets with collaborators, reviewers, and external partners under controlled conditions. They need audit trails that document who accessed which data and when, access controls that limit visibility to authorized participants, and reproducibility guarantees that ensure the same dataset version is available over time.

Traditional sharing methods such as email, shared drives, and institutional repositories lack fine-grained access control, offer no usage tracking, and make it difficult to enforce data governance policies across institutional boundaries.

The solution

IPTO provides a multi-tenant data platform with role-based access, dataset visibility controls, and scoped API keys. Researchers publish datasets with restricted visibility, invite collaborators through tenant memberships or scoped API keys, and maintain a complete audit trail of all access events.

How it works

1. Create a restricted dataset

The research lead creates a dataset with restricted visibility so that only explicitly authorized users can discover and access it.

{
  "name": "Clinical Trial Results - Phase II Longitudinal Study",
  "description": "De-identified patient outcomes and biomarker data from 2024 Phase II trials",
  "source_modality": "document",
  "monetization_mode": "open",
  "pricing_model": "fixed",
  "visibility": "restricted"
}
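The payload above can be sent as a JSON request body. The sketch below builds that request with the Python standard library but does not send it. The section does not name the dataset-creation endpoint, the host, or the authentication header, so `API_BASE`, `CREATE_DATASET_PATH`, and the `Bearer` scheme are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical values: the section does not name the base URL, the
# dataset-creation path, or the authentication header format.
API_BASE = "https://api.example.com"
CREATE_DATASET_PATH = "/v1/datasets"


def build_create_dataset_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the restricted dataset above."""
    payload = {
        "name": "Clinical Trial Results - Phase II Longitudinal Study",
        "description": "De-identified patient outcomes and biomarker data from 2024 Phase II trials",
        "source_modality": "document",
        "monetization_mode": "open",
        "pricing_model": "fixed",
        "visibility": "restricted",  # discoverable only by explicitly authorized users
    }
    return urllib.request.Request(
        url=API_BASE + CREATE_DATASET_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_create_dataset_request("YOUR_API_KEY")
print(req.get_method(), req.full_url)
# Sending it for real would be: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before any data leaves the machine.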

2. Upload and review data

Researchers upload files through the presigned upload flow. Each upload enters the staged review workflow:

State          Description
staged         Object uploaded, awaiting review
under_review   Administrator is evaluating the object
approved       Object cleared for indexing and search
rejected       Object excluded from the dataset
expired        Review deadline passed without a decision
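The review states can be modeled as a small state machine. The sketch below is a local model, not the platform's implementation. The transition map is an assumption inferred from the state descriptions. For example, it assumes that objects pass through `under_review` before a decision and that `approved`, `rejected`, and `expired` are final.

```python
from enum import Enum


class ReviewState(str, Enum):
    STAGED = "staged"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"


# Assumed transitions inferred from the state descriptions; the platform's
# actual rules may differ (e.g. whether staged objects can be approved directly).
ALLOWED_TRANSITIONS = {
    ReviewState.STAGED: {ReviewState.UNDER_REVIEW, ReviewState.EXPIRED},
    ReviewState.UNDER_REVIEW: {ReviewState.APPROVED, ReviewState.REJECTED, ReviewState.EXPIRED},
    ReviewState.APPROVED: set(),  # final
    ReviewState.REJECTED: set(),  # final
    ReviewState.EXPIRED: set(),   # final
}


def transition(current: ReviewState, target: ReviewState) -> ReviewState:
    """Move to `target`, rejecting any transition not in the assumed map."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def is_searchable(state: ReviewState) -> bool:
    # Only approved objects are cleared for indexing and search.
    return state is ReviewState.APPROVED


state = transition(ReviewState.STAGED, ReviewState.UNDER_REVIEW)
state = transition(state, ReviewState.APPROVED)
print(state.value, is_searchable(state))  # approved True
```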

Review deadlines

Staged uploads carry a review deadline of 48-72 hours. Objects that are not reviewed within this window automatically expire and are excluded from the dataset.
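The expiry rule reduces to a timestamp comparison. The section gives only a 48-72 hour window, so this sketch makes the window a hypothetical `window_hours` parameter. It also assumes that objects still in `staged` or `under_review` are the ones that can expire.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only objects still awaiting a decision can expire.
UNDECIDED_STATES = {"staged", "under_review"}


def review_deadline(staged_at: datetime, window_hours: int) -> datetime:
    """Deadline for a staged upload; the docs give a 48-72 hour window."""
    if not 48 <= window_hours <= 72:
        raise ValueError("documented review window is 48-72 hours")
    return staged_at + timedelta(hours=window_hours)


def should_expire(state: str, staged_at: datetime, now: datetime, window_hours: int = 72) -> bool:
    """True when an object still awaiting a decision has passed its deadline."""
    return state in UNDECIDED_STATES and now > review_deadline(staged_at, window_hours)


staged_at = datetime(2024, 6, 3, 9, 0, tzinfo=timezone.utc)
print(should_expire("staged", staged_at, staged_at + timedelta(hours=73)))    # True
print(should_expire("approved", staged_at, staged_at + timedelta(hours=73)))  # False
```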

3. Invite collaborators with scoped API keys

The research lead creates API keys for each collaborator or collaborating institution, restricted to the specific datasets they should access.

{
  "name": "university-of-X-collab",
  "scopes": ["search:query", "datasets:read"],
  "dataset_access_mode": "allow_list",
  "dataset_ids": ["dset_trial_phase2"]
}

Key access rules:

  • allow_list mode restricts the key to only the specified datasets
  • Keys can only narrow tenant-level access, never expand it
  • Keys are revocable independently, so access can be removed for a single collaborator without affecting others
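Together, the access rules amount to a set intersection. A key's effective datasets are the tenant's datasets that also appear on the key's allow list, and a revoked key reaches nothing. The sketch below is a local model under those assumptions. Only `allow_list` mode is modeled because this section describes no other mode. `ApiKey`, `effective_datasets`, and the dataset IDs other than `dset_trial_phase2` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ApiKey:
    name: str
    scopes: frozenset
    dataset_access_mode: str
    dataset_ids: frozenset = frozenset()
    revoked: bool = False


def effective_datasets(key: ApiKey, tenant_datasets: set) -> set:
    """Datasets a key can reach; never more than the tenant itself can access."""
    if key.revoked:
        return set()
    if key.dataset_access_mode != "allow_list":
        # Only allow_list is modeled; other modes are not described in this section.
        raise ValueError(f"unmodeled access mode: {key.dataset_access_mode}")
    # Intersection: an allow list can narrow tenant-level access but not expand it.
    return set(tenant_datasets) & set(key.dataset_ids)


def can_access(key: ApiKey, tenant_datasets: set, dataset_id: str, scope: str) -> bool:
    return scope in key.scopes and dataset_id in effective_datasets(key, tenant_datasets)


tenant_datasets = {"dset_trial_phase2", "dset_internal_qc"}
collab_key = ApiKey(
    name="university-of-X-collab",
    scopes=frozenset({"search:query", "datasets:read"}),
    dataset_access_mode="allow_list",
    # "dset_not_in_tenant" is listed but the tenant cannot see it, so it is ignored.
    dataset_ids=frozenset({"dset_trial_phase2", "dset_not_in_tenant"}),
)
print(effective_datasets(collab_key, tenant_datasets))  # {'dset_trial_phase2'}

revoked_key = replace(collab_key, revoked=True)  # revoking one key leaves others intact
print(effective_datasets(revoked_key, tenant_datasets))  # set()
```

Because `replace` returns a new frozen object, revoking one key in this model cannot alter any other key's access.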

4. Collaborators search and retrieve

Collaborators use their scoped API keys to search the dataset, retrieve results, and download original files when needed. Every interaction is logged.
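Logging every interaction implies an append-only record of who did what, to which dataset, and when. The sketch below is a local, in-memory model of that idea, not the platform's audit store. `AuditEvent`, `AuditLog`, and the action labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    principal: str   # e.g. the name of the API key that made the call
    action: str      # illustrative labels: "search", "retrieve", "download"
    dataset_id: str


class AuditLog:
    """Append-only: events can be recorded and read, but not edited or deleted."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, principal: str, action: str, dataset_id: str,
               now: Optional[datetime] = None) -> AuditEvent:
        event = AuditEvent(now or datetime.now(timezone.utc), principal, action, dataset_id)
        self._events.append(event)
        return event

    def events(self) -> Tuple[AuditEvent, ...]:
        # A tuple snapshot keeps callers from mutating the stored history.
        return tuple(self._events)


log = AuditLog()
for action in ("search", "retrieve", "download"):
    log.record("university-of-X-collab", action, "dset_trial_phase2")
print([e.action for e in log.events()])  # ['search', 'retrieve', 'download']
```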

5. Monitor access and generate audit reports

The research lead monitors all access activity through the buyer activity and spend APIs:

Endpoint                           Purpose
GET /v1/agent/activity/searches    Search history filtered by API key and date range
GET /v1/agent/activity/accesses    Datasets and objects accessed per API key
GET /v1/agent/spend                Usage and cost summaries grouped by day, dataset, or API key
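The sketch below shows how a research lead might build these requests and tally results into an audit report. The endpoint paths come from the table above. The host, the query parameter names (`api_key_id`, `from`, `to`, `group_by`), and the record fields read by `summarize_accesses` are assumptions, because the section does not document them.

```python
from collections import Counter
from datetime import date
from urllib.parse import urlencode

API_BASE = "https://api.example.com"  # hypothetical host


def activity_url(endpoint: str, api_key_id: str, start: date, end: date, **extra: str) -> str:
    """Build a GET URL for one of the activity or spend endpoints.

    Parameter names (api_key_id, from, to, group_by) are assumptions; the docs
    only say results can be filtered by API key and date range.
    """
    params = {"api_key_id": api_key_id, "from": start.isoformat(), "to": end.isoformat(), **extra}
    return f"{API_BASE}{endpoint}?{urlencode(params)}"


def summarize_accesses(records: list) -> Counter:
    """Count accesses per (api_key_id, dataset_id) for an audit report.

    Assumes each record is a dict with "api_key_id" and "dataset_id" fields.
    """
    return Counter((r["api_key_id"], r["dataset_id"]) for r in records)


june = (date(2024, 6, 1), date(2024, 6, 30))
print(activity_url("/v1/agent/activity/searches", "key_univx", *june))
print(activity_url("/v1/agent/spend", "key_univx", *june, group_by="dataset"))
```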

Benefits

Why IPTO for research data sharing

  • Access control: Restricted visibility ensures datasets are discoverable only by authorized collaborators. Scoped API keys enforce dataset-level allow lists.
  • Compliance: Append-only audit events record every upload, search, retrieval, and download with timestamps and principal identifiers. This supports IRB compliance, data use agreements, and other institutional review requirements.
  • Reproducibility: Immutable objects with SHA-256 checksums ensure that collaborators access the exact same data version, and blob deduplication stores identical content only once. Citation locators provide page-level and chunk-level references.
  • Staged review: The review workflow prevents unvetted data from becoming searchable, supporting quality control and data governance policies.
  • Revocable access: Individual API keys can be revoked at any time without affecting other collaborators, making it easy to manage access as team membership changes.
  • Cross-institutional sharing: Tenant-scoped isolation means multiple institutions can participate without infrastructure overlap. Each collaborator authenticates with their own API key against a shared dataset.
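The reproducibility guarantee lets a collaborator confirm that a downloaded file matches the checksum recorded for the object. The sketch below uses Python's `hashlib`. The section does not say where the expected digest appears in object metadata, so `expected_sha256` and the sample file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    # expected_sha256 is assumed to come from the object's recorded checksum.
    return sha256_of(path) == expected_sha256.lower()


# Demo: record a checksum, verify the file, then simulate altered data.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "outcomes.csv"  # hypothetical file name
    original = b"patient_id,outcome\n001,improved\n"
    sample.write_bytes(original)
    recorded = hashlib.sha256(original).hexdigest()
    demo_ok = verify_download(sample, recorded)
    sample.write_bytes(b"patient_id,outcome\n001,worsened\n")
    demo_tampered = verify_download(sample, recorded)

print(demo_ok, demo_tampered)  # True False
```

Reading in fixed-size chunks produces the same digest as hashing the whole file at once, so the check also works for multi-gigabyte files.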