1. Overview
1.1. Version information
Version : 0.0.1
1.2. Contact information
Contact : GA4GH Cloud Work Stream
Contact Email : ga4gh-cloud@ga4gh.org
1.3. License information
License : Apache 2.0
License URL : https://raw.githubusercontent.com/ga4gh/data-repository-service-schemas/master/LICENSE
Terms of service : null
1.4. URI scheme
BasePath : /ga4gh/drs/v1
Schemes : HTTPS, HTTP
1.5. Consumes
-
application/json
1.6. Produces
-
application/json
2. Introduction
The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data in a single, standard way regardless of where it’s stored and how it’s managed. This document describes the DRS API and provides details on the specific endpoints, request formats, and response. It is intended for developers of DRS-compatible services and of clients that will call these DRS services.
The primary functionality of DRS is to map a logical ID to a means for physically retrieving the data represented by the ID. The sections below describe the characteristics of those IDs, the types of data supported, and how the mapping works.
NOTE: this document represents a work in progress towards DRS 1.0.0. It may not be fully in sync
with the OpenAPI schema since both are being worked on. The 0.0.1 release represents the
schema as it existed at the time of the transition from the DOS to DRS name and is subject to
change as we evolve it to a DRS 1.0.0.
3. DRS API Principles
3.1. DRS IDs
Each implementation of DRS can choose its own id scheme, as long as it follows these guidelines:
-
DRS IDs are URL-safe text strings made up of alphanumeric characters and any of [.-_/]
-
One DRS ID MUST always return the same object data (or, in the case of a collection, the same set of objects). This constraint aids with reproducibility.
-
DRS does NOT support semantics around multiple versions of an object. (For example, there’s no notion of “get latest version” or “list all versions” in DRS v1.) Individual implementation MAY choose an ID scheme that includes version hints.
-
DRS implementations MAY have more than one ID that maps to the same object.
3.2. DRS Datatypes
DRS v1 supports two datatypes:
-
Blobs — these are file-like objects
-
Collections — these are sets of other DRS objects (either Blobs or Collections)
3.3. Read-only
DRS v1 is a read-only API. We expect that each implementation will define its own mechanisms and interfaces (graphical and/or programmatic) for adding and updating data.
3.4. URI convention (WORK IN PROGRESS)
For convenience, we define a recommended syntax for fully referencing DRS-accessible objects. Strings of the form drs://<server>/<id> mean “make a DRS call to the HTTP address at <server>, passing in the DRS id <id>, to retrieve the object”. For example, these strings are useful when passing objects to a WES server for processing.
3.5. Standards
The DRS API specification is written in OpenAPI and embodies a RESTful service philosophy. It uses JSON in requests and responses and standard HTTP/HTTPS for information transport.
4. Authorization & Authentication (WORK IN PROGRESS)
Users must supply credentials that establish their identity and authorization in order to use a DRS endpoint. We recommend that DRS implementations use an OAuth2 bearer token, although they can choose other mechanisms if appropriate. DRS callers can use the auth_instructions_url
from the service-info endpoint to learn how to obtain and use a bearer token for a particular implementation.
The DRS implementation is responsible for checking that a user is authorized to submit requests. The particular authorization policy is up to the DRS implementer.
5. Paths
5.1. Create a new Data Bundle
POST /bundles
5.1.1. Parameters
Type | Name | Schema |
---|---|---|
Body |
body |
5.1.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
The Data Bundle was successfully created. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
500 |
An unexpected error occurred. |
5.1.3. Tags
-
DataRepositoryService
5.2. List the Data Bundles
GET /bundles
5.2.1. Parameters
Type | Name | Description | Schema |
---|---|---|---|
Query |
alias |
If provided returns Data Bundles that have any alias that matches the |
string |
Query |
checksum |
The hexlified checksum that one would like to match on. |
string |
Query |
checksum_type |
If provided will restrict responses to those that match the provided |
string |
Query |
page_size |
Specifies the maximum number of results to return in a single page. |
integer (int32) |
Query |
page_token |
The continuation token, which is used to page through large result sets. |
string |
5.2.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Successfully listed Data Bundles. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
500 |
An unexpected error occurred. |
5.2.3. Tags
-
DataRepositoryService
5.3. Retrieve a Data Bundle
GET /bundles/{bundle_id}
5.3.1. Parameters
Type | Name | Description | Schema |
---|---|---|---|
Path |
bundle_id |
string |
|
Query |
version |
If provided will return the requested version of the selected Data Bundle. |
string |
5.3.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Successfully found the Data Bundle. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
404 |
The requested Data Bundle wasn’t found. |
|
500 |
An unexpected error occurred. |
5.3.3. Tags
-
DataRepositoryService
5.4. Update a Data Bundle
PUT /bundles/{bundle_id}
5.4.1. Parameters
Type | Name | Description | Schema |
---|---|---|---|
Path |
bundle_id |
The ID of the Data Bundle to update |
string |
Body |
body |
The new content for the Data Bundle identified by the given bundle_id. If the ID specified in the request body is different than that specified in the path, the Data Bundle’s ID will be replaced with the one in the request body. |
5.4.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
The Data Bundle was updated successfully. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
404 |
The requested Data Bundle wasn’t found. |
|
500 |
An unexpected error occurred. |
5.4.3. Tags
-
DataRepositoryService
5.5. Delete a Data Bundle
DELETE /bundles/{bundle_id}
5.5.1. Parameters
Type | Name | Schema |
---|---|---|
Path |
bundle_id |
string |
5.5.2. Responses
HTTP Code | Schema |
---|---|
200 |
5.5.3. Tags
-
DataRepositoryService
5.6. Retrieve all versions of a Data Bundle
GET /bundles/{bundle_id}/versions
5.6.1. Parameters
Type | Name | Schema |
---|---|---|
Path |
bundle_id |
string |
5.6.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
The versions for the Data Bundle were found successfully. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
404 |
The requested Data Bundle wasn’t found. |
|
500 |
An unexpected error occurred. |
5.6.3. Tags
-
DataRepositoryService
5.7. Make a new Data Object
POST /objects
5.7.1. Parameters
Type | Name | Description | Schema |
---|---|---|---|
Body |
body |
The Data Object to be created. The ID scheme is left up to the |
5.7.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Successfully created the Data Object. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
500 |
An unexpected error occurred. |
5.7.3. Tags
-
DataRepositoryService
5.8. List the Data Objects
GET /objects
5.8.1. Parameters
Type | Name | Description | Schema |
---|---|---|---|
Query |
alias |
If provided will only return Data Objects with the given alias. |
string |
Query |
checksum |
The hexlified checksum that one would like to match on. |
string |
Query |
checksum_type |
If provided will restrict responses to those that match the provided |
string |
Query |
page_size |
Specifies the maximum number of results to return in a single page. |
integer (int32) |
Query |
page_token |
The continuation token, which is used to page through large result sets. |
string |
Query |
url |
If provided will return only Data Objects with a that URL matches |
string |
5.8.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
The Data Objects were listed successfully. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
500 |
An unexpected error occurred. |
5.8.3. Tags
-
DataRepositoryService
5.9. Retrieve a Data Object
GET /objects/{object_id}
5.9.1. Parameters
Type | Name | Description | Schema |
---|---|---|---|
Path |
object_id |
string |
|
Query |
version |
If provided will return the requested version of the selected Data Object. |
string |
5.9.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
The Data Object was found successfully. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
404 |
The requested Data Object wasn’t found |
|
500 |
An unexpected error occurred. |
5.9.3. Tags
-
DataRepositoryService
5.10. Update a Data Object
PUT /objects/{object_id}
5.10.1. Parameters
Type | Name | Description | Schema |
---|---|---|---|
Path |
object_id |
The ID of the Data Object to update |
string |
Body |
body |
The new Data Object for the given object_id. If the ID specified in the request body is different than that specified in the path, the Data Object’s ID will be replaced with the one in the request body. |
5.10.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
The Data Object was successfully updated. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
404 |
The requested Data Object wasn’t found. |
|
500 |
An unexpected error occurred. |
5.10.3. Tags
-
DataRepositoryService
5.11. Delete a Data Object index entry
DELETE /objects/{object_id}
5.11.1. Parameters
Type | Name | Schema |
---|---|---|
Path |
object_id |
string |
5.11.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
The Data Object was deleted successfully. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
404 |
The requested Data Object wasn’t found. |
|
500 |
An unexpected error occurred. |
5.11.3. Tags
-
DataRepositoryService
5.12. Retrieve all versions of a Data Object
GET /objects/{object_id}/versions
5.12.1. Parameters
Type | Name | Schema |
---|---|---|
Path |
object_id |
string |
5.12.2. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
The versions for the Data Object were returned successfully. |
|
400 |
The request is malformed. |
|
401 |
The request is unauthorized. |
|
403 |
The requester is not authorized to perform this action. |
|
404 |
The requested Data Object wasn’t found. |
|
500 |
An unexpected error occurred. |
5.12.3. Tags
-
DataRepositoryService
5.13. Returns service version and other information
GET /service-info
5.13.1. Responses
HTTP Code | Description | Schema |
---|---|---|
200 |
Service information returned successfully |
5.13.2. Tags
-
DataRepositoryService
6. Definitions
6.1. AuthorizationMetadata
OPTIONAL
A set of key-value pairs that represent sufficient metadata to be granted
access to a resource. It may be helpful to provide details about a specific
provider, for example.
Name | Description | Schema |
---|---|---|
auth_type |
The auth standard being used to make data available. For example, 'OAuth2.0'. |
string |
auth_url |
The URL where the auth service is located, for example, a URL to get an OAuth |
string |
6.2. Bundle
Name | Description | Schema |
---|---|---|
aliases |
A list of strings that can be used to identify this Data Bundle. |
< string > array |
checksums |
At least one checksum must be provided. |
< Checksum > array |
created |
Timestamp of object creation in RFC3339. |
string (date-time) |
description |
A human readable description. |
string |
id |
An identifier, unique to this Data Bundle |
string |
object_ids |
The list of Data Objects that this Data Bundle contains. |
< string > array |
system_metadata |
||
updated |
Timestamp of update in RFC3339, identical to create timestamp in systems |
string (date-time) |
user_metadata |
||
version |
A string representing a version, some systems may use checksum, a RFC3339 |
string |
6.3. Checksum
Name | Description | Schema |
---|---|---|
checksum |
The hex-string encoded checksum for the Data. |
string |
type |
The digest method used to create the checksum. If left unspecified md5 |
string |
6.4. CreateBundleRequest
Name | Schema |
---|---|
bundle |
6.5. CreateBundleResponse
Name | Description | Schema |
---|---|---|
bundle_id |
The identifier of the Data Bundle created. |
string |
6.6. CreateObjectRequest
The Data Object one would like to index. One must provide any aliases
and URLs to this file when sending the CreateObjectRequest. It is up
to implementations to validate that the Data Object is available from
the provided URLs.
Name | Schema |
---|---|
object |
6.7. CreateObjectResponse
Name | Description | Schema |
---|---|---|
object_id |
The ID of the created Data Object. |
string |
6.8. DeleteBundleResponse
Name | Schema |
---|---|
bundle_id |
string |
6.9. DeleteObjectResponse
Name | Description | Schema |
---|---|---|
object_id |
The identifier of the Data Object deleted. |
string |
6.10. ErrorResponse
An object that can optionally include information about the error.
Name | Description | Schema |
---|---|---|
msg |
A detailed error message. |
string |
status_code |
The integer representing the HTTP status code (e.g. 200, 404). |
integer |
6.11. GetBundleResponse
Name | Schema |
---|---|
bundle |
6.12. GetBundleVersionsResponse
Name | Description | Schema |
---|---|---|
bundles |
All versions of the Data Bundles that match the GetBundleVersions |
< Bundle > array |
6.13. GetObjectResponse
Name | Schema |
---|---|
object |
6.14. GetObjectVersionsResponse
Name | Description | Schema |
---|---|---|
objects |
All versions of the Data Objects that match the GetObjectVersions |
< Object > array |
6.15. ListBundlesRequest
Only return Data Bundles that match all of the request parameters. A
page_size and page_token are provided for retrieving a large number of
results.
Name | Description | Schema |
---|---|---|
alias |
If provided returns Data Bundles that have any alias that matches the |
string |
checksum |
The hexlified checksum that one would like to match on. |
string |
checksum_type |
If provided will restrict responses to those that match the provided |
string |
page_size |
Specifies the maximum number of results to return in a single page. |
integer (int32) |
page_token |
The continuation token, which is used to page through large result sets. |
string |
6.16. ListBundlesResponse
A list of Data Bundles matching the request parameters and a continuation
token that can be used to retrieve more results.
Name | Description | Schema |
---|---|---|
bundles |
The list of Data Bundles. |
< Bundle > array |
next_page_token |
The continuation token, which is used to page through large result sets. |
string |
6.17. ListObjectsRequest
Allows a requester to list and filter Data Objects. Only Data Objects
matching all of the requested parameters will be returned.
Name | Description | Schema |
---|---|---|
alias |
If provided will only return Data Objects with the given alias. |
string |
checksum |
The hexlified checksum that one would like to match on. |
string |
checksum_type |
If provided will restrict responses to those that match the provided |
string |
page_size |
Specifies the maximum number of results to return in a single page. |
integer (int32) |
page_token |
The continuation token, which is used to page through large result sets. |
string |
url |
If provided will return only Data Objects with a that URL matches |
string |
6.18. ListObjectsResponse
A list of Data Objects matching the requested parameters, and a paging
token, that can be used to retrieve more results.
Name | Description | Schema |
---|---|---|
next_page_token |
The continuation token, which is used to page through large result sets. |
string |
objects |
The list of Data Objects. |
< Object > array |
6.19. Object
Name | Description | Schema |
---|---|---|
aliases |
A list of strings that can be used to find this Data Object. |
< string > array |
checksums |
The checksum of the Data Object. At least one checksum must be provided. |
< Checksum > array |
created |
Timestamp of object creation in RFC3339. |
string (date-time) |
description |
A human readable description of the contents of the Data Object. |
string |
id |
An identifier unique to this Data Object. |
string |
mime_type |
A string providing the mime-type of the Data Object. |
string |
name |
A string that can be optionally used to name a Data Object. |
string |
size |
The computed size in bytes. |
string (int64) |
updated |
Timestamp of update in RFC3339, identical to create timestamp in systems |
string (date-time) |
urls |
The list of URLs that can be used to access the Data Object. |
< URL > array |
version |
A string representing a version. |
string |
6.20. ServiceInfoResponse
Placeholder for the Info Object
Name | Description | Schema |
---|---|---|
contact |
Maintainer contact info |
object |
description |
Service description |
string |
license |
License information for the exposed API |
object |
title |
Service name |
string |
version |
Service version |
string |
6.21. SystemMetadata
OPTIONAL
These values are reported by the underlying object store.
A set of key-value pairs that represent system metadata about the object.
Type : object
6.22. URL
Name | Description | Schema |
---|---|---|
authorization_metadata |
||
system_metadata |
||
url |
A URL that can be used to access the file. |
string |
user_metadata |
6.23. UpdateBundleRequest
Name | Schema |
---|---|
bundle |
6.24. UpdateBundleResponse
Name | Description | Schema |
---|---|---|
bundle_id |
The identifier of the Data Bundle updated. |
string |
6.25. UpdateObjectRequest
Name | Schema |
---|---|
object |
6.26. UpdateObjectResponse
Name | Description | Schema |
---|---|---|
object_id |
The identifier of the Data Object updated. |
string |
6.27. UserMetadata
OPTIONAL
A set of key-value pairs that represent metadata provided by the uploader.
Type : object
7. Appendix: Motivation
Data sharing requires portable data, consistent with the FAIR data principles (findable, accessible, interoperable, reusable). Today’s researchers and clinicians are surrounded by potentially useful data, but often need bespoke tools and processes to work with each dataset. And today’s data publishers don’t have a reliable way to make their data useful to all (and only) the people they choose. |
![]() Figure 1: there’s an ocean of data, with many different tools to drink from it, but no guarantee that any tool will work with any subset of the data |
We need a standard way for data producers to make their data available to data consumers, that supports the control needs of the former and the access needs of the latter. And we need it to be interoperable, so anyone who builds access tools and systems can be confident they’ll work with all the data out there, and anyone who publishes data can be confident it will work with all the tools out there. |
![]() Figure 2: by defining a standard Data Repository API, and adapting tools to use it, every data publisher can now make their data useful to every data consumer |
We envision a world where:
|
![]() Figure 3: a standard Data Repository API enables an ecosystem of data producers and consumers |
This spec defines a standard Data Repository Service (DRS) API (“the yellow box”), to enable that ecosystem of data producers and consumers. Our goal is that all data consumers need to know about a data repo is "here’s the DRS endpoint to access it", and all data publishers need to know about tapping into the world of consumption tools is "here’s how to tell it where my DRS endpoint lives".
7.1. Federation
The world’s biomedical data is controlled by groups with very different policies and restrictions on where their data lives and how it can be accessed. A primary purpose of DRS is to support unified access to disparate and distributed data. (As opposed to the alternative centralized model of "let’s just bring all the data into one single data repository”, which would be technically easier but is no more realistic than “let’s just bring all the websites into one single web host”.)
In a DRS-enabled world, tool builders don’t have to worry about where the data their tools operate on lives — they can count on DRS to give them access. And tool users only need to know which DRS server is managing the data they need, and whether they have permissions; they don’t have to worry about how to physically get access to, or (worse) make a copy of the data. For example, if I have appropriate permissions, I can run a pooled analysis where I run a single tool across data managed by different DRS servers, potentially in different locations.