1000 Genomes Project publishes analysis of completed pilot phase

Produces tool for research into genetic contributors to human disease

Small genetic differences between individuals help explain why some people have a higher risk than others for
developing illnesses such as diabetes or cancer. Today in the journal Nature, the 1000 Genomes Project, an
international public-private consortium, published the most comprehensive map of these genetic differences, called
variations, estimated to contain approximately 95 per cent of the genetic variation of any person on Earth.

Researchers produced the map using next-generation DNA sequencing technologies to systematically characterise human
genetic variation in 180 people in three pilot studies. Moreover, the full scale-up from the pilots is already under
way, with data already collected from more than 1,000 people.

"The pilot studies of the 1000 Genomes Project laid a critical foundation for studying human
genetic variation," said Richard Durbin, PhD, of the Wellcome Trust Sanger Institute and co-Chair of the
consortium. "These proof-of-principle studies are enabling consortium scientists to create a
comprehensive, publicly available map of genetic variation that will ultimately collect sequence from 2,500 people
from multiple populations worldwide and underpin future genetics research."

Genetic variation between people refers to differences in the order of the chemical units - called bases - that make up
DNA in the human genome. These differences can be as small as a single base being replaced by a different one - which
is called a single nucleotide polymorphism (abbreviated SNP) - or as large as whole sections of a chromosome being
duplicated or relocated to another place in the genome. Some of these variations are common in the population and some
are rare. By comparing many individuals to one another and by comparing one population to other populations,
researchers can create a map of all types of genetic variation.

Genomic distribution of SNP density on human chromosomes 1 to 22. The colours show the SNP density, with red indicating higher densities and blue indicating lower densities. Note high rates of SNP variation near the ends of the chromosomes.

The 1000 Genomes Project's aim is to provide a comprehensive public resource that supports researchers aiming to study
all types of genetic variation that might cause human disease. The project's approach goes beyond previous efforts in
capturing and integrating data on all types of variation, and in studying samples from numerous human populations with
informed consent allowing free data release without restriction on use. Already, these data have been used in studies
of the genetic basis for disease.

"By making data from the project freely available to the research community, it is already
impacting research for both rare and common diseases," said David Altshuler, MD, PhD, Deputy Director of the
Broad Institute of Harvard and MIT, and a co-chair of the project. "Biotech companies have
developed genotyping products to test common variants from the project for a role in disease. Every published study
using next-generation sequencing to find rare disease mutations, and those in cancer, used project data to filter out
variants that might obscure their results."

The project has studied populations with European, West African and East Asian ancestry. Using the newest technologies
for sequencing DNA, the project's nine centres sequenced the whole genome of 179 people and the protein-coding genes of
697 people. Each region was sequenced several times, so that more than 4.5 terabases (4.5 million million base letters)
of DNA sequence were collected. A consortium involving academic centres on multiple continents and technology companies
that developed and sell the sequencing equipment carried out the work.

Processing these data required many technical and computational innovations, including standardised ways to organise,
store, analyse and share DNA sequencing data. Launched in 2008, the 1000 Genomes Project started with three pilot
projects to develop, evaluate and compare strategies for producing a catalogue of genetic variations. Funded through
numerous mechanisms by foundations and national governments, the 1000 Genomes Project will cost some $120 million over
five years, ending in 2012.

When the work began, sequencing was very expensive, so the project began with two approaches aimed at increasing
efficiency. One strategy, called 'low-pass', combines partial data from many people; the second focuses only on the
part of the genome that contains protein-coding genes. By comparing these strategies to 'gold standard' data produced at
great completeness and accuracy, the project was able to show that both the alternative approaches work well and have
complementary strengths. Researchers will use both strategies in the full-scale project because, although sequencing
costs have decreased, sequencing is still relatively expensive.

"We have shown for the first time that a new approach to sequencing - low coverage of many samples
- works efficiently and well," said Gil McVean, PhD, Professor of Statistical Genetics at the University of
Oxford. "This proof-of-principle is now being applied not only in the 1000 Genomes Project, but in
disease research, as well."

The resulting map of human genetic variation includes about 15 million SNPs, 1 million short insertion/deletion
changes, and more than 20,000 structural variations. Many of the genetic variants had previously been identified, but
more than half were new. The project's database contains more than 95 per cent of the currently measurable variants
found in any individual, and continuing work will eventually identify more than 99 per cent of human variants.

Richard Gibbs, PhD, Director of the Human Genome Sequencing Center at the Baylor College of Medicine (one of the
project's sequencing centres) said: "What really excites me about this project is the focus on
identifying variants in the protein-coding genes that have functional consequences. These will be extremely useful for
studies of disease and evolution."

The improved map produced some surprises. For example, the researchers discovered that on average, each person carries
between 250 and 300 genetic changes that would cause a gene to stop working normally, and that each person also carries
between 50 and 100 genetic variations that had previously been associated with an inherited disease. No human carries a
perfect set of genes. Fortunately, because each person carries at least two copies of every gene, individuals likely
remain healthy, even while carrying these defective genes, if the second copy works normally.

In addition to looking at variants that are shared between many people, the researchers also investigated in detail the
genomes of six people: two mother-father-daughter nuclear families. By finding new variants present in the daughter but
not the parents, the team was able to observe the precise rate of mutations in humans, showing that each person has
approximately 60 new mutations that are not in either parent.
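
The trio comparison described above can be illustrated with a minimal Python sketch. The function name find_de_novo, the tuple layout for a variant and the toy data are all hypothetical and are not taken from the project; a real analysis would also need to distinguish true new mutations from sequencing errors, which this sketch ignores.

```python
def find_de_novo(child, mother, father):
    """Return variants seen in the child but in neither parent."""
    return child - (mother | father)


# Toy variants written as (chromosome, position, alternate base).
# These values are invented for illustration and are not real data.
mother = {("chr1", 1000, "A"), ("chr2", 5000, "T")}
father = {("chr1", 1000, "A"), ("chr3", 750, "G")}
daughter = {("chr1", 1000, "A"), ("chr3", 750, "G"), ("chr7", 4200, "C")}

de_novo = find_de_novo(daughter, mother, father)
print(sorted(de_novo))  # prints [('chr7', 4200, 'C')]
```

Counting the variants that remain after such a comparison, across the whole genome, is the kind of tally that underlies the figure of approximately 60 new mutations per person.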

With the completion of the pilot phase, the 1000 Genomes Project has moved into full-scale studies in which 2,500
samples from 27 populations will be studied over the next two years. Data from the pilot studies and the full-scale
project are freely available on the project web site, www.1000genomes.org.

Researchers studying specific illnesses, such as heart disease or cancer, use maps of genetic variation to help them
identify genetic changes that may contribute to the illnesses. Over the last five years, the first generation of such
studies (called genome-wide association studies or GWAS) has been based on an earlier map of genetic variation called
the HapMap. Built using older technology, HapMap lacks the completeness and detail of the 1000 Genomes Project.

"The 1000 Genomes Project map fills in the gaps between the HapMap landmarks, helping researchers
identify all candidate genes in a region associated with a disease," said Lisa Brooks, PhD, Program Director for
genetic variation at the National Human Genome Research Institute, a part of the National Institutes of Health.
"Once a disease-associated region of the genome is identified, experimental studies must be done to
identify which variants, genes, and regulatory elements cause the increased disease risk. With the new map, researchers
can just look up all the candidate genes and almost all of the variants in the database, saving them many steps in
finding the causes."

Funding

A full list of funding agencies is available at the Nature website.

Participating Centres

Organizations that committed major support to the project include: 454 Life Sciences, a Roche company, Branford, Conn.;
Life Technologies Corporation, Carlsbad, Calif.; BGI-Shenzhen, Shenzhen, China; Illumina Inc., San Diego; the Max
Planck Institute for Molecular Genetics, Berlin, Germany; the Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK;
and the National Human Genome Research Institute, which supports the work being done by Baylor College of Medicine,
Houston, Texas; the Broad Institute, Cambridge, Mass.; and Washington University, St. Louis, Missouri. Researchers at
many other institutions are also participating in the project including groups in Barbados, Canada, China, Colombia,
Finland, the Gambia, India, Malawi, Pakistan, Peru, Puerto Rico, Spain, the UK, the US, and Vietnam.

Additional information about the project, including a list of all participants and organizations, can be found at
http://www.1000genomes.org/

The National Human Genome Research Institute

The National Human Genome Research Institute is one of 27 institutes and centers at the National Institutes of Health, an
agency of the Department of Health and Human Services. NHGRI's Division of Extramural Research supports grants for
research and for training and career development. www.genome.gov

The European Molecular Biology Laboratory

The European Molecular Biology Laboratory is a basic research institute funded by public research monies from 20 member
countries and supports research by approximately 85 independent groups covering the spectrum of molecular
biology. http://www.embl.de

European Bioinformatics Institute (EBI)

European Bioinformatics Institute (EBI) is part of the European Molecular Biology Laboratory (EMBL) and is located on
the Wellcome Trust Genome Campus in Hinxton near Cambridge (UK). http://www.ebi.ac.uk

The Eli and Edythe L. Broad Institute of MIT and Harvard

The Eli and Edythe L. Broad Institute of MIT and Harvard, founded in 2003 by MIT, Harvard and its affiliated hospitals,
and Los Angeles philanthropists Eli and Edythe L. Broad, includes faculty, professional staff and students from
throughout the MIT and Harvard biomedical research communities and beyond, with collaborations spanning over a hundred
private and public institutions in more than 40 countries worldwide. http://www.broadinstitute.org/

The Wellcome Trust Sanger Institute

The Wellcome Trust Sanger Institute, which receives the majority of its funding from the Wellcome Trust, was founded in 1992. The Institute is responsible for the completion of the sequence of approximately one-third of the human genome as well as genomes of model organisms and more than 90 pathogen genomes. In October 2006, new funding was awarded by the Wellcome Trust to exploit the wealth of genome data now available to answer important questions about health and disease.

The Wellcome Trust

The Wellcome Trust is a global charitable foundation dedicated to achieving extraordinary improvements in human and animal health. We support the brightest minds in biomedical research and the medical humanities. Our breadth of support includes public engagement, education and the application of research to improve health. We are independent of both political and commercial interests.