Amazon and the 1000 Genomes Project Make DNA Searchable

The 1000 Genomes Project, the world’s largest database of genetic sequences, is a modern marvel. But in the time it would take to download the database’s 200 terabytes of data, you could conceive and give birth to a baby, and still have time left over to see the little bundle of joy (and DNA) cut its first tooth. Thanks to the Amazon Web Services cloud, however, that wait has shrunk to a few seconds, making access to the massive data set, which could fill 16 million file cabinets or 30,000 standard DVDs, nearly instantaneous.

Amazon teamed up with the National Institutes of Health, and as of this spring the data set of over 1,700 genomes from 26 different populations is available for free to any scientist or institution (anyone, really), in the hope that easier access to the information will accelerate research into genetic diseases.

The implications are vast, says Don Preuss, staff scientist at NIH. Researchers can use the data to investigate illnesses such as breast cancer, Parkinson’s, and scores of other deadly diseases that have been genetically sequenced.

Not yet sequenced: the genetic predisposition for clicking on every item Amazon recommends.