Gen3 is how data commons are made.

A data commons is a cloud-based software platform for managing, analyzing, harmonizing and sharing large datasets. Gen3 is an open source platform for developing data commons. Data commons accelerate and democratize the process of scientific discovery, especially over large or complex datasets.

Host, Manage, and Share your Data

Gen3 enables you to receive, manage, and share data that is valuable to researchers, developers, and health organizations. With Gen3, you can receive data, run quality control on it, generate globally unique IDs, share datasets with authorized individuals or any authenticated individual, and compute over that data.

Customize Your Gen3 Experience

Multiple Gen3 data commons can work together to create an interoperable ecosystem. Gen3 can be used to power APIs for a “thin middle” of framework services. These framework services provide the foundation on which you and your community can develop new tools for sharing and analyzing data with your group, collaborators, or the broader Gen3 community.

Facilitate Translational Data Science

Use Gen3’s built-in tools to find your virtual cohort, then analyze that cohort in notebooks within a cloud environment to speed up your hypothesis testing and discoveries.

Scale Your Large Dataset with Any File Type

Gen3 APIs support importing data from a variety of platforms, ranging from common clinical data platforms to genomic data platforms. Because Gen3 has a growing community, you can repurpose another person’s tools for your own data, data commons, or data ecosystem.

Gen3 helps you as...

Bioinformaticians

Organized data allows you to focus on creating unique pipelines and analyses for your projects.

Developers

“From the perspective of setting up a basic Gen3 ecosystem with all the services running, it was a breeze and the Github docs are pretty awesome.”

- Amit, Cloud Solutions Architect with Leidos

Who's using Gen3?

Federal agencies, not-for-profits, and consortiums with members spanning the globe use Gen3 and its framework services to support their research communities, access and index their data, and facilitate scientific discoveries that impact the world.

Introduction to Gen3

Gen3 is open source software, released under Apache 2.0 or similar licenses, that colocates compute and storage in a data commons. It is agnostic to the data type and the storage location. At minimum, it requires a data model, data, a secure landing page for the portal, and research goals.

Data Access Control on Gen3

Gen3 manages data access via internal access control lists stored in a database. It is capable of pulling authorization information from multiple lists, as well as syncing with external sources such as dbGaP. Gen3 supports both users and groups defined in these access control lists - a user that is associated with a group will inherit that group’s permissions.
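The group inheritance described above can be illustrated with a minimal sketch. This is not Gen3's implementation: the class name, its fields, and the user and group names are all hypothetical, and the lists are held in memory rather than in a database.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControlList:
    """Hypothetical in-memory stand-in for one access control list."""

    user_permissions: dict[str, set[str]] = field(default_factory=dict)
    group_permissions: dict[str, set[str]] = field(default_factory=dict)
    group_members: dict[str, set[str]] = field(default_factory=dict)

    def effective_permissions(self, user: str) -> set[str]:
        """Return the user's own permissions plus those of each group they belong to."""
        permissions = set(self.user_permissions.get(user, set()))
        for group, members in self.group_members.items():
            if user in members:
                permissions |= self.group_permissions.get(group, set())
        return permissions


def combined_permissions(user: str, lists: list[AccessControlList]) -> set[str]:
    """Union the user's effective permissions across several lists."""
    result: set[str] = set()
    for acl in lists:
        result |= acl.effective_permissions(user)
    return result
```

For example, a user who holds "read" directly and belongs to a group that holds "write" would end up with both permissions under this model.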

Data is stored either in S3/Google buckets or in our graph database. Only users with "read-storage" and "write-storage" permissions can access stored object data (files) within these buckets, and users with "read" and "write" permissions can access data (clinical data and data that describes a file) in the graph database. These policies prevent public access. Gen3 can also issue presigned URLs that let authorized users access objects within buckets directly, and both the generation of these presigned URLs and the object downloads themselves are logged.
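As a rough sketch of how the four permissions above could gate access, and how each decision could be logged, consider the following. The permission names come from the text; the store names, action names, the mapping between them, and the logger name are assumptions made for illustration only.

```python
import logging

logger = logging.getLogger("access_audit")  # hypothetical logger name

# Assumed mapping from (store, action) to the permission it requires.
REQUIRED_PERMISSION = {
    ("object", "download"): "read-storage",
    ("object", "upload"): "write-storage",
    ("graph", "query"): "read",
    ("graph", "submit"): "write",
}


def is_allowed(user_permissions: set[str], store: str, action: str) -> bool:
    """Check one request against the mapping and log the decision."""
    required = REQUIRED_PERMISSION.get((store, action))
    # Unknown store/action pairs are denied, so nothing is public by default.
    allowed = required is not None and required in user_permissions
    logger.info(
        "store=%s action=%s required=%s allowed=%s",
        store, action, required, allowed,
    )
    return allowed
```

Denying anything that is not explicitly mapped mirrors the statement that these policies prevent public access.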

Gen3 also has a Role-Based Access Control (RBAC) engine. It can be used to define data access controls and permissions at a more granular level, and it can determine whether a user is able to access a specific piece of data. Data in our graph database is modeled as a hierarchy that starts with Programs and Projects. A Program is the root of the tree and represents an overall group for the data. A Program has Projects underneath it that represent its subgroups. Projects can in turn have many subgroups, and so on. Permissions are generally given at the Program level, but with the RBAC engine, Gen3 can provide more specific authorization that applies to nodes further down the tree.
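One way to picture hierarchical authorization is to treat each node in the tree as a path and let a grant on a node cover everything beneath it. The sketch below assumes a hypothetical path format such as /programs/demo/projects/pilot; it is an illustration of the idea, not the RBAC engine itself.

```python
def covers(granted_path: str, resource_path: str) -> bool:
    """Return True if resource_path is granted_path itself or a node beneath it."""
    granted = [part for part in granted_path.split("/") if part]
    resource = [part for part in resource_path.split("/") if part]
    # Compare whole path segments so that "demo" does not match "demo2".
    return resource[: len(granted)] == granted


def can_access(grants: list[tuple[str, str]], permission: str, resource_path: str) -> bool:
    """grants is a list of (path, permission) pairs held by one user."""
    return any(
        granted_permission == permission and covers(path, resource_path)
        for path, granted_permission in grants
    )
```

Under this model, a "read" grant on a Program applies to every Project below it, while a "write" grant on a single Project does not extend up to its parent Program.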

Easy Data Submission

Use one of the microservices or community tools to submit data objects and metadata to a Gen3 Commons, or develop your own tools specific to your user community.

Easily Find Your Data in an Ecosystem

Gen3 will automatically index your data and provide globally unique identifiers (GUIDs). GUIDs can also be resolved at dataguids.org to find out where a data object lives within your data ecosystem.
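The idea of indexing an object under a GUID and later resolving where it lives can be sketched as follows. This is a toy in-memory index, not Gen3's indexing service: the class, its methods, and the bucket URLs are hypothetical, and a plain UUID stands in for whatever identifier format a real deployment uses.

```python
import uuid


class ObjectIndex:
    """Hypothetical in-memory index mapping GUIDs to storage locations."""

    def __init__(self) -> None:
        self._records: dict[str, list[str]] = {}

    def register(self, locations: list[str]) -> str:
        """Assign a new random identifier to an object and record its locations."""
        guid = str(uuid.uuid4())
        self._records[guid] = list(locations)
        return guid

    def resolve(self, guid: str) -> list[str]:
        """Return the known locations for a GUID; raise KeyError if it is unknown."""
        return list(self._records[guid])


index = ObjectIndex()
guid = index.register(["s3://example-bucket/sample.bam", "gs://example-bucket/sample.bam"])
locations = index.resolve(guid)
```

Because the identifier is independent of any one bucket, the same object can be listed in several locations and still be found by a single lookup.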

Open-Source Community

Engage Gen3’s broad user community. Ask a question, answer a question, request a new feature, or see if anyone else has approached a technical or scientific problem like yours in their Gen3 data commons.

Customizable Options for Data Queries

Gen3’s UI includes a data exploration tool you can customize for your data. You can choose the queries or faceted searches your user community wants, decide whether the data can leave the cloud, or develop your own apps over Gen3 APIs.

Security & Compliance

Gen3 can be deployed with various levels of security and compliance. Deploy your data commons or ecosystem with the controls needed for your data and your user community.

Control Data Access—Or Not

You can leave your data open to the Internet or control access at deeper levels within your own data use ontology, from the core data to the data objects.