What's in the Cancer Genome Atlas?

The Cancer Genome Atlas

The Cancer Genome Atlas (TCGA) is the world’s largest and richest collection of cancer genomic data. It is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer.

There are around 200 types of cancer, each characterized by molecular changes of the genome. These changes can be identified if:

Large amounts of genomic data are available to analyze;

All the data is easily accessible; and

Researchers collaborate to develop bioinformatic and mathematical tools.

TCGA currently contains more than 2.5 petabytes of publicly available data, which have contributed to hundreds of cutting-edge cancer studies.

What’s inside?

TCGA contains data on 33 different cancer types from 11,328 patients. These cancer types were chosen because of their poor prognosis and availability of samples. Matched tumor–normal tissue samples are molecularly characterized to identify genomic alterations.

Figure: The number of cases for each cancer type (denoted by 2–4-letter codes) present in TCGA, as viewed using the Cancer Genomics Cloud.

Accessing the data

TCGA data is available in two tiers:

Open Access: Public data not unique to an individual and not requiring user certification. These data include deidentified clinical and demographic data, gene expression, copy number alterations, epigenetic data, compiled summaries, and anonymized single amplicon DNA sequence data.

Controlled Access: Data that could be unique to an individual and that require user certification through dbGaP authorization. These data include primary sequence data (such as BAM and FASTQ files) and germline variant calls.

TCGA data is available to download, but it would take several weeks to download all the data at standard internet speeds. Once downloaded, the data is costly to store and requires powerful computational resources to explore, manipulate, and analyze.
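
To make the download estimate concrete, the short sketch below computes how long a bulk transfer would take at a given sustained link speed. It uses the 2.5-petabyte figure quoted earlier; the three link speeds and the function name download_days are illustrative assumptions, and real transfers would also be slowed by protocol overhead and shared bandwidth.

```python
# Rough transfer-time estimate for a dataset of a given size.
# The link speeds below are illustrative assumptions, not measurements.

PETABYTE_BYTES = 10**15  # decimal petabyte


def download_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Return the days needed to move size_bytes at a sustained link rate."""
    seconds = size_bytes * 8 / link_bits_per_second  # bytes -> bits -> seconds
    return seconds / 86_400  # 86,400 seconds per day


if __name__ == "__main__":
    tcga_bytes = 2.5 * PETABYTE_BYTES  # size quoted in the article
    assumed_links = [
        ("100 Mbit/s", 100e6),
        ("1 Gbit/s", 1e9),
        ("10 Gbit/s", 10e9),
    ]
    for label, rate in assumed_links:
        days = download_days(tcga_bytes, rate)
        print(f"{label}: about {days:,.0f} days")
```

Under these assumptions, a sustained 10 Gbit/s link needs roughly 23 days, and each tenfold drop in link speed multiplies that time by ten.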

An alternative data access model takes advantage of the capabilities of the cloud to support researchers working with TCGA data. The Seven Bridges Cancer Genomics Cloud (CGC; an NCI Cancer Genomics Cloud pilot) hosts petabytes of TCGA data alongside hundreds of bioinformatic tools, and gives on-demand access to thousands of CPU cores for analysis.

Researchers can register at www.cancergenomicscloud.org to instantly explore and work with TCGA, the world’s most complete public cancer genomic dataset.

The Cancer Genomics Cloud is now open to all researchers. More than $1,000,000 is available in computation and storage credit to support innovative research.