World Repository of Human Genetics Now Hosted by Amazon

The U.S. National Institutes of Health announced
Friday (March 30) that it will host data from its 1000
Genomes Project for free on Amazon's cloud service. The 1000
Genomes Project is the world's largest database of human
genetics. It was created to act as a "reference population,"
including people of different ethnicities around the world, and
it captures all the major ways in which humankind varies
genetically. Now that the data are hosted on Amazon's servers,
they will be easier and cheaper for scientists to obtain and
analyze.

"[The Amazon hosting] makes the data available to researchers in
a way that is more useful and that avoids the researcher having
to spend lots of money on storing the data themselves, on their
local systems," Eric Schadt, director of the genomics institute
at the Mount Sinai School of Medicine in New York, wrote to
InnovationNewsDaily in an email. "This is definitely cool."

In spite of its name, the project actually holds genetic
information from 1,700 anonymous people, with 900 more to come
this year. The main difficulty with the database is that it's so
large: 200 terabytes, an amount that would fill 30,000 DVDs. The
information in the database has always been freely available at
1000genomes.org, but before the Amazon hosting deal, scientists
had to pay for the Internet bandwidth and storage space to
download the data, Schadt explained. People who did not have
access to the powerful computers needed to store the 1000 Genomes
data couldn't read the data at all.

Amazon Web Services also offers its superpowered computing
resources to researchers who want to do calculations on the
enormous genetics database. For that, Amazon will charge. The
company charged one pharmaceutical client $1,279 an hour to run
very large calculations, The New York Times' Bits blog reported.
Yet researchers may still find it worth the price. "Many will be
willing to bear this cost because
it is far less expensive than buying 500 terabytes of disk
storage and a modest-sized computer cluster to analyze those data
locally," Schadt wrote.

By making this genomics data more accessible and affordable to
researchers, the Amazon deal may ultimately help scientists
predict diseases more reliably, based on a person's genetics,
Schadt wrote.

The deal is part of a new initiative from the Obama
administration that will invest $200 million in researching
better ways to store, analyze and find interesting points in
extremely large datasets such as 1000 Genomes.