It’s an unprecedented number of participants for this type of study, says Ruth March, vice-president and head of personalized health care and biomarkers at AstraZeneca, which is headquartered in London. “That’s necessary because we’re going to be looking for very rare differences among individuals.”

To achieve that ambitious goal, AstraZeneca will partner with research institutions including the Wellcome Trust Sanger Institute in Hinxton, UK, and Human Longevity, a biotechnology company founded in San Diego, California, by genomics pioneer Craig Venter. AstraZeneca also expects to draw on data from 500,000 participants in its own clinical trials, as well as on medical samples that it has accrued over the past 15 years.

In doing so, AstraZeneca will be following a burgeoning trend in genetics research. For years, geneticists pursued common variations in human DNA sequences that are linked to complex diseases such as diabetes and heart disease. The approach yielded some important insights, but these common variations often accounted for only a small percentage of the genetic contribution to individual diseases.

Researchers are now increasingly focusing on the contribution of unusual genetic variants to disease. Combinations of these variants can hold the key to an individual’s traits, says Venter.

The hunt for important rare variants has led AstraZeneca to partner with the Institute for Molecular Medicine Finland, says Aarno Palotie, who heads the Human Genomics Program there. Finland’s population was geographically isolated until recently, he notes, which makes for a unique genetic make-up. As a result, some variations that are very rare in other populations may be more common in Finland, making them easier to detect and study.

AstraZeneca did not disclose exactly how much it would be investing in the project — “hundreds of millions of dollars” over the course of ten years was all that Menelas Pangalos, executive vice-president of the company’s innovative medicines program, would say. The company intends to use the data to inform drug development in all of its major disease areas, from diabetes to inflammation to cancer, says March.

The project should generate about 5 petabytes of data. “If you put 5 petabytes on DVDs, it would be four times the height of the Shard,” says Pangalos, referring to the London skyscraper, which stands nearly 310 meters tall. “If you wanted to put it on your iPod, it would take about 5,000 years to listen to it all.”

Much of that data will come from Human Longevity. The company, which ultimately hopes to accrue 10 million human genomes, already has 26,000 complete genomes paired with medical records. Its databases also contain additional partial genome sequences. “We’re adding one about every 15 minutes on average,” Venter says.

Using DNA sequence data alone, Venter says, his company can now predict a person’s height, weight, eye color and hair color, and produce an approximate picture of their face. Much of that detail is lurking in rare sequence variations, says Venter, whose own genome has been in public databases for more than a decade.