International Consortium Launches Genetic Variation Mapping Project

HapMap Will Help Identify Genetic Contributions to Common Diseases

October 2002

WASHINGTON - An international research consortium today launched an approximately $100 million public-private effort to create the next-generation map of the human genome. Called the International HapMap Project, this new venture is aimed at speeding the discovery of genes related to common illnesses such as asthma, cancer, diabetes and heart disease.

Expected to take three years to complete, the HapMap will chart genetic variation within the human genome. By comparing genetic differences among individuals, consortium members believe they can create a tool to help researchers detect the genetic contributions to many diseases. Where the Human Genome Project provided the foundation on which researchers are making dramatic genetic discoveries, the HapMap will begin to make the results of genomic research applicable to individuals.

"The HapMap promises to accelerate medical research around the globe in many different ways," said Yusuke Nakamura, M.D., Ph.D., director of the University of Tokyo's Human Genome Center, as well as leader of the RIKEN SNP Center and the Japanese group working on the HapMap. "Not only will it lead to the identification of genes related to disease, it should help to pinpoint genes that influence how individuals react to various medications - discoveries that could improve drug design and lead to the development of diagnostic tools aimed at preventing adverse drug reactions."

To create the HapMap, DNA will be taken from blood samples collected by researchers in Nigeria, Japan, China and the United States. Initially, researchers will work with samples from between 200 and 400 people in widely distributed geographic regions. Samples will be collected from the Yorubas in Nigeria, Japanese, Han Chinese and U.S. residents with ancestry from northern and western Europe. A very careful sampling strategy has been developed to ensure that participants can give full informed consent. No medical or personal identifying information will be obtained from the people providing the samples. The samples, however, will be identified by the population from which they were collected.

"Studies like this must be done as ethically and transparently as we can," said Ellen Wright Clayton, M.D., J.D., of Vanderbilt University, who is chair of the group that is addressing the project's ethical and social issues. "For the HapMap project, we have devoted a lot of effort to achieving both these goals in order to do truly responsible science."

The samples will be processed and then stored at the Coriell Institute for Medical Research in Camden, N.J., a non-profit biomedical research center that specializes in storing living cells and making them available to scientists for further study.

Researchers from academic centers, non-profit biomedical research groups and private companies in Japan, the United Kingdom, Canada, China and the United States will analyze the samples to create the HapMap. The results will be made quickly and freely available on the Internet in keeping with the data release approach of the Human Genome Project.

Public funding for the effort will be provided by the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) in Tokyo; Genome Canada in Ottawa and Genome Quebec in Montreal; the Chinese Academy of Sciences, the Chinese Ministry of Science and Technology, and the Natural Science Foundation of China, all in Beijing; and the U.S. National Institutes of Health (NIH) in Bethesda, Md. The SNP Consortium (TSC) in Deerfield, Ill., will coordinate private funding, while The Wellcome Trust in London will provide charitable funding for the United Kingdom portion of the project.

Understanding Variation

The International HapMap Project builds on the freely available sequence of the human genome produced by the International Human Genome Sequencing Consortium. Although research shows that any two people are 99.9 percent identical at the genetic level, the 0.1 percent difference is important because it helps explain why one person is more susceptible than another to a specific disease, such as diabetes. By studying the patterns of these genetic differences, or genetic variation, in many people, researchers expect to identify which differences are related to disease.

"The goal of studying the human genome has always been to provide health benefits to all humankind. This project should be seen in that grand tradition," said Francis S. Collins, M.D., Ph.D., director of the National Human Genome Research Institute, which is part of NIH, U.S. Department of Health and Human Services. "The HapMap will provide a powerful tool to help us take the next quantum leap toward understanding the fundamental contribution that genes make to common illnesses like cancer, diabetes and mental illness."

Genetic information is physically inscribed in a linear molecule called deoxyribonucleic acid (DNA). DNA is composed of four chemicals, called bases, which are represented by the four letters of the genetic code: A, T, C and G. The Human Genome Project determined the order, or sequence, of the 3 billion A's, T's, C's and G's that make up the human genome. The order of genetic letters is as important to the proper functioning of the body as the order of letters in a word is to understanding its meaning. When a letter in a word changes, the word's meaning can be lost or altered. Variation in a DNA base sequence - when one genetic letter is replaced by another - may similarly change the meaning.

More than 2.8 million examples of these substitutions of genetic letters - called single nucleotide polymorphisms or SNPs (pronounced snips) - are already known and described in a public database called dbSNP, operated by NIH. The major source of this public SNP catalog was work done by The SNP Consortium (TSC), a collaborative genomics effort of major pharmaceutical companies, the Wellcome Trust and academic centers.

The human genome is thought to contain at least 10 million SNPs, about one in every 300 bases. Theoretically, researchers could hunt for genes using a map listing all 10 million SNPs, but there are major practical drawbacks to that approach.

Instead, the HapMap will find the chunks into which the genome is organized, each of which may contain dozens of SNPs. Researchers then need to detect only a few tag SNPs to identify a particular chunk, or block, of the genome and to know all of the SNPs associated with that one piece. This strategy works because genetic variation among individuals is organized in "DNA neighborhoods," called haplotype blocks. SNP variants that lie close to each other along the DNA molecule form a haplotype block and tend to be inherited together. SNP variants that are far from each other along the DNA molecule tend to be in different haplotype blocks and are less likely to be inherited together.
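The tag SNP idea can be illustrated with a minimal sketch. The block below is hypothetical data, not project data: four made-up haplotypes covering six SNP positions. The function name `pick_tag_snps` and the brute-force search are assumptions chosen for clarity, not the consortium's method. The sketch simply looks for the smallest set of positions whose letters are enough to tell every haplotype in the block apart.

```python
from itertools import combinations

def pick_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that distinguishes all haplotypes.

    Brute-force illustration only; 'haplotypes' is a list of equal-length strings,
    one letter per SNP position in the block.
    """
    n_snps = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            # The "signature" of a haplotype is its letters at the chosen positions.
            signatures = {tuple(h[p] for p in positions) for h in haplotypes}
            if len(signatures) == n_distinct:
                return positions
    return tuple(range(n_snps))  # reached only if the haplotypes have no positions

# Hypothetical haplotype block: 4 haplotypes, 6 SNP positions each.
block = ["ACGTAC", "ACGTGT", "GTATAC", "GTGTGT"]

tags = pick_tag_snps(block)
print(tags)  # (0, 4)
```

In this toy block, reading just two of the six positions is enough to know which haplotype a person carries, and therefore which letters sit at the other four positions.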

"Essentially, the HapMap is a very powerful shortcut that represents enormous long-term savings in studies of complex disease," said David Bentley, Ph.D., of the UK's Wellcome Trust Sanger Institute.

Because all humans descended from a common set of ancestors that lived in Africa about 100,000 years ago, there have been relatively few generations in human history compared with older species. As a result, the human haplotype blocks have remained largely intact and provide an unbroken thread that connects all people to a common past and to each other. Recent research indicates that about 65 to 85 percent of the human genome may be organized into haplotype blocks that are 10,000 bases or larger.

The exact pattern of SNP variants within a given haplotype block differs among individuals. Some SNP variants and haplotype patterns are found in some people in just a few populations. However, most populations share common SNP variants and haplotype patterns, most of which were inherited from the common ancestor population. Frequencies of these SNP variants and haplotype patterns may be similar or different among populations. For example, the gene for blood type is variable in all human populations, but some populations have higher frequencies of one blood type, such as O, while others have higher frequencies of another, such as AB. For this reason, the HapMap consortium needs to include samples from a few geographically separated populations to find the SNP variants that are common in any of the populations.

Charles Rotimi, Ph.D., leader of the Howard University group collecting the blood samples in Nigeria, said, "We need to be inclusive in the populations that we study to maximize the chance that all people will eventually benefit from this international research effort."

Because of the block pattern of haplotypes, it will be possible to identify just a few SNP variants in each block to uniquely mark, or tag, that haplotype. As a result, researchers will need to study only about 300,000 to 600,000 tag SNPs, out of the roughly 10 million SNPs thought to exist, to efficiently identify the haplotypes in the human genome. It is the haplotype blocks, and the tag SNPs that identify them, that will form the HapMap.
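The scale of that saving follows directly from the figures quoted in this release. The short calculation below uses only those numbers (3 billion bases, 10 million SNPs, 300,000 to 600,000 tag SNPs); the variable names are illustrative.

```python
GENOME_BASES = 3_000_000_000   # bases in the human genome, as stated above
TOTAL_SNPS = 10_000_000        # estimated number of SNPs, as stated above
TAG_SNPS_LOW = 300_000         # lower estimate of tag SNPs needed
TAG_SNPS_HIGH = 600_000        # upper estimate of tag SNPs needed

# Average spacing between SNPs along the genome.
bases_per_snp = GENOME_BASES // TOTAL_SNPS

# Share of all SNPs that would need to be studied.
fraction_low = TAG_SNPS_LOW / TOTAL_SNPS
fraction_high = TAG_SNPS_HIGH / TOTAL_SNPS

print(f"About one SNP every {bases_per_snp} bases")
print(f"Tag SNPs are {fraction_low:.0%} to {fraction_high:.0%} of all SNPs")
```

In other words, the tag SNP approach would require studying roughly 3 to 6 percent of the estimated SNPs in the genome.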

Haplotypes and Health

To date, most of the known disease-causing genetic variations have been found for relatively rare disorders, such as Huntington's disease and cystic fibrosis. These diseases are caused by variants in single genes that tend to have a big impact on health, making the genetic contributions to the illnesses relatively easy to find using current methods that rely on gathering family information, or pedigrees.

Researchers face a much tougher challenge when it comes to uncovering the genetic contributors to more common diseases, such as Alzheimer's disease, arthritis, cancer, diabetes, schizophrenia and stroke. These disorders are caused by many genetic variants that individually have a relatively weak contribution to the disorder, but together can increase the risk of illness. Environmental and other non-genetic factors also contribute to the disease process, making it even harder to find the genetic factors.

Researchers emphasize that the HapMap is not meant to minimize the role of environmental factors in disease development. "In fact, studying genetic factors may greatly increase the likelihood of our understanding the environmental contribution to illness, since these influences often interact," said Thomas Hudson, M.D., leader of the HapMap group at McGill University in Canada.

Once the HapMap is constructed, researchers around the globe will use it to study the genetic risk factors underlying a wide range of diseases and conditions. For any given disease, researchers would use the HapMap tag SNPs to compare the haplotype patterns of a group of people known to have the disease to a group of people without the disease, a method known as an association study. If the association study finds a certain haplotype more often in the people with the disease, researchers would then zero in on that genomic region in their search for the specific genetic variant. The tag SNPs would serve as signposts indicating that a genetic variant involved in the disease may lie nearby.
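The comparison at the heart of an association study can be sketched in a few lines. The counts below are invented for illustration, and the function name `chi_square_2x2` is an assumption; the sketch applies the standard chi-square statistic for a 2x2 table to ask whether a haplotype is found more often in people with a disease than in people without it. A real study would test many haplotypes and would need to correct for doing so, which this sketch does not attempt.

```python
def chi_square_2x2(case_with, case_without, control_with, control_without):
    """Chi-square statistic (1 degree of freedom) for a 2x2 table of counts."""
    a, b, c, d = case_with, case_without, control_with, control_without
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical counts: how many people in each group carry the haplotype.
cases_with, cases_without = 60, 40        # 100 people with the disease
controls_with, controls_without = 40, 60  # 100 people without the disease

statistic = chi_square_2x2(cases_with, cases_without,
                           controls_with, controls_without)
print(statistic)  # 8.0

# 3.84 is the usual 5 percent cutoff for one degree of freedom.
if statistic > 3.84:
    print("Haplotype frequency differs between the groups; examine this region")
```

With these made-up numbers the haplotype appears in 60 percent of cases but only 40 percent of controls, a difference large enough to flag the surrounding genomic region for closer study.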

"Even with the human sequence in hand, linking small changes in the genome to changes in health is tedious work," said Huanming Yang, Ph.D., director of the Beijing Genomics Institute and coordinator of The China HapMap Consortium. "The HapMap project will create a powerful tool for linking differences in the genome to differences in health, including increased risk for common illnesses."

Mapping an individual patient's haplotypes also may be used in the future to help customize medical treatment. Genetic variation has been shown to affect the response of patients to drugs, toxic substances and other environmental factors. Some already envision an era in which drug treatment is customized, based on the patient's haplotypes, to maximize the effectiveness of the drug while minimizing side effects.

In addition, the HapMap may eventually help pinpoint genetic variations that may contribute to good health, such as those protecting against infectious diseases or promoting longevity.

Technology and Cooperation

Carrying out such a complex project will depend on the application of robust technologies to analyze individual SNP variants. The technologies must deliver high throughput and high quality at low cost. Different groups will be using different technologies, providing the scientific community a chance to test which approaches work best. That experience is likely to speed the process of technology development, so that once the HapMap is available, the tools to use it will be much better developed.

In addition to its pioneering approach towards developing the HapMap and related technologies, the international consortium continues the strategy of pulling together a wide range of public and private partners from around the globe to both conduct and fund the research.

TSC chairman Arthur Holden said, "We are very positive about the chance to work collaboratively with the HapMap effort to support the informatic aspects of the program, as well as to ensure that the resulting HapMap will be useful in both disease and pharmacogenomic research."