Research Data Sharing
The problem
Researchers need to share datasets with collaborators, reviewers, and external partners under controlled conditions. They need audit trails that document who accessed which data and when, access controls that limit visibility to authorized participants, and reproducibility guarantees that ensure the same dataset version is available over time.
Traditional sharing methods such as email, shared drives, and institutional repositories lack fine-grained access control, provide no usage tracking, and make it difficult to enforce data governance policies across institutional boundaries.
The solution
IPTO provides a multi-tenant data platform with role-based access, dataset visibility controls, and scoped API keys. Researchers publish datasets with restricted visibility, invite collaborators through tenant memberships or scoped API keys, and maintain a complete audit trail of all access events.
How it works
1. Create a restricted dataset
The research lead creates a dataset with restricted visibility so that only explicitly authorized users can discover and access it.
{
  "name": "Clinical Trial Results - Phase II Longitudinal Study",
  "description": "De-identified patient outcomes and biomarker data from 2024 Phase II trials",
  "source_modality": "document",
  "monetization_mode": "open",
  "pricing_model": "fixed",
  "visibility": "restricted"
}
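As a rough illustration, the request body above could be assembled client-side before it is sent. This is only a sketch: the helper name `build_dataset_payload` is hypothetical, and the endpoint that accepts the payload is not named in this guide, so no HTTP call is shown.

```python
import json


def build_dataset_payload(name: str, description: str) -> dict:
    """Assemble the dataset-creation body shown above (hypothetical helper)."""
    if not name.strip():
        raise ValueError("dataset name must not be empty")
    return {
        "name": name,
        "description": description,
        "source_modality": "document",
        "monetization_mode": "open",
        "pricing_model": "fixed",
        # Restricted visibility: only explicitly authorized users can
        # discover and access the dataset.
        "visibility": "restricted",
    }


payload = build_dataset_payload(
    "Clinical Trial Results - Phase II Longitudinal Study",
    "De-identified patient outcomes and biomarker data from 2024 Phase II trials",
)
body = json.dumps(payload, indent=2)  # serialized form of the request body
```

Hard-coding `"visibility": "restricted"` in the helper keeps a research dataset from being created with a broader visibility by accident.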
2. Upload and review data
Researchers upload files through the presigned upload flow. Each upload enters the staged review workflow:
| State | Description |
|---|---|
| `staged` | Object uploaded, awaiting review |
| `under_review` | Administrator is evaluating the object |
| `approved` | Object cleared for indexing and search |
| `rejected` | Object excluded from the dataset |
| `expired` | Review deadline passed without a decision |
Review deadlines
Staged uploads carry a review deadline of 48-72 hours. Objects that are not reviewed within this window automatically expire and are excluded from the dataset.
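The expiry rule can be pictured with a small sketch. It is not the platform's implementation. It assumes the deadline is measured from the upload time, that both `staged` and `under_review` objects count as "without a decision", and that the deadline length (72 hours here) is passed in by the caller.

```python
from datetime import datetime, timedelta, timezone

# States in which no review decision has been recorded yet (assumption).
PENDING_STATES = {"staged", "under_review"}


def effective_state(state: str, uploaded_at: datetime,
                    deadline_hours: int, now: datetime) -> str:
    """Return the object's state after applying the review deadline."""
    deadline = uploaded_at + timedelta(hours=deadline_hours)
    if state in PENDING_STATES and now > deadline:
        return "expired"
    # Decided objects (approved, rejected) are never changed by the deadline.
    return state


uploaded = datetime(2024, 6, 1, 9, 0, tzinfo=timezone.utc)
late = uploaded + timedelta(hours=80)  # past a 72-hour deadline

print(effective_state("staged", uploaded, 72, late))    # expired
print(effective_state("approved", uploaded, 72, late))  # approved
```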
3. Invite collaborators with scoped API keys
The research lead creates API keys for each collaborator or collaborating institution, restricted to the specific datasets they should access.
{
  "name": "university-of-X-collab",
  "scopes": ["search:query", "datasets:read"],
  "dataset_access_mode": "allow_list",
  "dataset_ids": ["dset_trial_phase2"]
}
Key access rules:
- `allow_list` mode restricts the key to only the specified datasets
- Keys can only narrow tenant-level access, never expand it
- Keys are revocable independently, so access can be removed for a single collaborator without affecting others
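One way to read the narrowing rule is as a set intersection between what the tenant can access and what the key lists. The sketch below is illustrative only: the function name, the `dset_pilot` and `dset_other_tenant` identifiers, and the decision to reject modes other than `allow_list` are assumptions, since no other modes are described here.

```python
from __future__ import annotations


def effective_dataset_access(tenant_dataset_ids: set[str], key: dict) -> set[str]:
    """Datasets a key can reach: never more than the tenant itself can access."""
    mode = key.get("dataset_access_mode")
    if mode != "allow_list":
        # Other modes are not covered by this sketch.
        raise ValueError(f"unsupported dataset_access_mode: {mode!r}")
    # Intersection: a listed dataset outside tenant-level access stays unreachable.
    return tenant_dataset_ids & set(key.get("dataset_ids", []))


tenant_access = {"dset_trial_phase2", "dset_pilot"}
collab_key = {
    "name": "university-of-X-collab",
    "scopes": ["search:query", "datasets:read"],
    "dataset_access_mode": "allow_list",
    "dataset_ids": ["dset_trial_phase2", "dset_other_tenant"],
}

reachable = effective_dataset_access(tenant_access, collab_key)
# reachable contains only "dset_trial_phase2"
```

Listing `dset_other_tenant` on the key has no effect because the tenant cannot access it, which is the sense in which a key can narrow access but never expand it.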
4. Collaborators search and retrieve
Collaborators use their scoped API keys to search the dataset, retrieve results, and download original files when needed. Every interaction is logged.
5. Monitor access and generate audit reports
The research lead monitors all access activity through the buyer activity and spend APIs:
| API | Purpose |
|---|---|
| `GET /v1/agent/activity/searches` | Search history filtered by API key and date range |
| `GET /v1/agent/activity/accesses` | Datasets and objects accessed per API key |
| `GET /v1/agent/spend` | Usage and cost summaries grouped by day, dataset, or API key |
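A filtered audit query against one of these endpoints might be built as in the sketch below. Only the paths come from the table above. The host `api.example.com`, the query parameter names `api_key_id`, `from`, and `to`, and the key identifier `key_123` are placeholders assumed for illustration; consult the API reference for the real names.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not a real IPTO URL


def activity_url(path: str, api_key_id: str, start: str, end: str) -> str:
    """Build a filtered activity URL; the query parameter names are hypothetical."""
    query = urlencode({"api_key_id": api_key_id, "from": start, "to": end})
    return f"{BASE_URL}{path}?{query}"


url = activity_url(
    "/v1/agent/activity/searches",
    "key_123",        # hypothetical key identifier
    "2024-06-01",
    "2024-06-30",
)
```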
Benefits
Why IPTO for research data sharing
- Access control: Restricted visibility ensures datasets are discoverable only by authorized collaborators. Scoped API keys enforce dataset-level allow lists.
- Compliance: Append-only audit events record every upload, search, retrieval, and download with timestamps and principal identifiers. This supports IRB compliance, data use agreements, and institutional review requirements.
- Reproducibility: Immutable objects with SHA-256 checksums and blob deduplication ensure that collaborators access the exact same data version. Citation locators provide page-level and chunk-level references.
- Staged review: The review workflow prevents unvetted data from becoming searchable, supporting quality control and data governance policies.
- Revocable access: Individual API keys can be revoked at any time without affecting other collaborators, making it easy to manage access as team membership changes.
- Cross-institutional sharing: Tenant-scoped isolation means multiple institutions can participate without infrastructure overlap. Each collaborator authenticates with their own API key against a shared dataset.
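The reproducibility point can also be checked on the collaborator's side by recomputing the SHA-256 digest of a downloaded file and comparing it with the published checksum. The sketch below uses Python's standard `hashlib`. How the platform exposes the checksum is not specified here, so `expected_hex` is assumed to have been obtained separately, and the sample bytes are made up.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_checksum(data: bytes, expected_hex: str) -> bool:
    """True if the downloaded bytes match the published checksum."""
    return sha256_hex(data) == expected_hex.strip().lower()


original = b"patient_id,outcome\n001,improved\n"   # made-up sample content
published_checksum = sha256_hex(original)          # stands in for the platform's value

downloaded = b"patient_id,outcome\n001,improved\n"
tampered = b"patient_id,outcome\n001,worsened\n"

same_version = matches_checksum(downloaded, published_checksum)  # True
changed = matches_checksum(tampered, published_checksum)         # False
```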