Genomes project publishes inventory of human genetic variation

Scientists have published the full genetic sequences of more than 1,000 people from 14 countries, creating the most complete inventory of the millions of variations between people’s DNA sequences ever assembled. The resource built by the 1,000 Genomes Project will shed light on the genetic roots of complex diseases and suggest ways to treat them – as well as informing studies of human evolution.

The results of the 1,000 Genomes Project are published on Thursday in Nature, and contain the full DNA sequences of 1,092 people drawn from 14 populations across Europe, the Americas, East Asia and Africa. The pilot results from the project were unveiled in 2010, when the genomes of 179 people were published to show that the technology and methods were robust.

The five-year project, which cost around $120m (£75m), is an international collaboration between scientists, charities and companies to map the full diversity of human DNA. It takes advantage of the rising speed and falling costs of sequencing machines.

The first human genome, published in 2003, took more than a decade to complete, but the 1,000 Genomes Project completed the bulk of its sequencing work in less than a year. A genome can nowadays be fully sequenced in just a few days.

The data now available to scientists contains 99% of all genetic variants that occur in the populations studied, down to the level of rare variations that only occur in 1 out of every 100 people. “The whole point of this resource is that we’re moving to a point where individuals are being sequenced in clinical settings and what you want to do there is sift through the variants you find in an individual and interpret them,” said Professor Gil McVean of Oxford University, a lead author for the study.

The information will be pored over by thousands of researchers, who will analyse and interpret the DNA variations between people in a bid to work out which ones are implicated in disease. In addition to the DNA sequences, the 1,000 Genomes Project has stored cell samples from all the people it has sequenced, to allow future scientific projects to look at the biological effect of the DNA variations they might want to study.

One early insight from the project has shown how some of the rarest DNA variants tend to cluster in relatively restricted geographic areas. “Within Europe, we think we’re all pretty similar,” said Prof McVean. “But if you look at the rarest variants – those present at 0.1% frequency around the world – if you find two copies of these mutations, they’re nearly always within the same country. At that kind of level, what you find in the UK is distinct from what you find in Italy, is distinct from what you find in Finland.”

These very rare variants are mutations that tend to do bad things to genes – they prevent a protein-coding gene from functioning by altering its sequence, or affect the way in which it is regulated. “It’s these rare mutations that are likely to have the strongest effects, which are also likely to be the most geographically restricted,” said Prof McVean. “That’s something we’ve documented for the first time in this work.”

Previous work to build up catalogues of differences in human DNA has involved a technique called the genome-wide association study. Here, scientists look at DNA samples from thousands of patients with a particular disease, and compare their sequences with thousands of control samples from healthy volunteers, looking at hundreds of thousands of genetic differences in each sample. This has given scientists many leads for variations implicated in bipolar disorder, Crohn’s disease, heart disease, type 1 and type 2 diabetes, rheumatoid arthritis and high blood pressure.

In September, scientists published the results of the Encode project, which sequenced the vast areas of the human genome that lie between the protein-coding genes. Those genes make up only 2% of the genome, and the rest had once been dismissed as “junk”. The focus in genomic research had largely been on looking for errors within genes themselves, but the combined results of Encode, the 1,000 Genomes Project and others will help guide the hunt for problem areas that lie elsewhere in our DNA sequence.

The next phase of the 1,000 Genomes Project will be completed when the scientists have sequenced an additional 1,500 people. “The key thing with 2,500 [genomes] is not to get deeper in the existing regions but to spread the parts of the world in which we can achieve that level of coverage,” said Prof McVean.

“At the moment there’s big holes. In the data release so far there’s nothing from the Indian subcontinent and that’s something we need to fill. Many of the remaining samples [in the next batch] are from there. Likewise we need to do deeper sampling in Africa and that’s an area we will fill in.”

Sir Mark Walport, director of the Wellcome Trust – which part-funded the study – said: “It is quite remarkable that we have gone from completion of the first human genome sequence in 2003 to being able to sequence more than 1,000 human genomes for a single study in 2012. This study is an important contribution to our understanding of human genetic variation in health and disease and the DNA sequences are freely available for analysis and use by researchers.”