Pioneering Study Compares 13 Vertebrate Genomes

Multi-Species Approach Provides Unprecedented Glimpse Into Function and Evolution of the Human Genome

BETHESDA, Md., Aug. 14, 2003 - In one of the most novel and extensive comparisons of vertebrate genomic sequences performed to date, a team led by the National Human Genome Research Institute (NHGRI) today reported results that demonstrate how such comparisons can reveal functionally important parts of the human genome beyond the genes themselves.

In a study published in the journal Nature, the researchers compared the sequence of the same large genomic region in 13 vertebrate species. The organisms included human, chimpanzee, baboon, cat, dog, cow, pig, rat, mouse, chicken, zebrafish and two species of pufferfish (Fugu, Tetraodon).

"This analysis provides convincing evidence that the sequence of the genomes of a wide range of organisms - from chimpanzees to zebrafish - provides powerful insight into the understanding of our own genome," said Francis S. Collins, M.D., Ph.D., director of NHGRI. "When it comes to elucidating the biological functions encoded by genomes, it is now clear that there is strength in numbers."

NHGRI Scientific Director Eric D. Green, M.D., Ph.D., led the research team that also included scientists from The Pennsylvania State University, University Park; the University of California, Santa Cruz (UCSC); and the University of Washington, Seattle. "Our efforts have produced the largest data set of evolutionarily diverse genomic sequence generated to date," Dr. Green said. "By focusing on targeted genomic regions, but sequencing them in multiple species, we are getting a previously unavailable glimpse through the window of vertebrate genome evolution."

By systematically comparing the patterns of a certain type of genomic change, called transposon insertions, among the different species' sequences, these investigators were able to address a heated controversy in the field of evolutionary genomics. Their analyses confirm recently proposed trees of mammalian evolution indicating that primates (human, chimpanzee, baboon) are more closely related to rodents (mouse, rat) than to carnivores (cat, dog) or artiodactyls (cow, pig). Indeed, the evidence revealed by the new sequence data refutes alternative evolutionary trees that place rodents much farther away from primates.

In its study, Dr. Green's team analyzed a genomic region containing 10 previously identified genes, the best known being the gene mutated in cystic fibrosis. An important discovery emanating from the multi-species comparative sequence analyses, however, was the presence of substantial numbers of previously unidentified DNA segments that are conserved across a wide range of species but that, unlike genes, do not code for proteins. Most of these conserved, non-coding regions could be uncovered only by using the sequences from multiple species; they are not readily apparent when comparing just two species' sequences, such as those from human and mouse. While the precise function of these conserved elements is not yet known, there is evidence that they reflect non-coding sequences that have biological roles.
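For readers who want a concrete picture of why adding species can sharpen the search for conserved segments, the following is a minimal Python sketch, not the method used in the Nature paper. The short aligned sequences are invented purely for illustration, and the function names (conservation_scores, conserved_columns) and the threshold parameter are hypothetical. It scores each aligned position by the fraction of other species that match the human base, then compares a human-mouse pair with a four-species set.

```python
from typing import Dict, List


def conservation_scores(alignment: Dict[str, str], reference: str = "human") -> List[float]:
    """Per aligned column, the fraction of non-reference species matching the reference base."""
    ref = alignment[reference]
    others = [seq for name, seq in alignment.items() if name != reference]
    if not others:
        raise ValueError("need at least one species besides the reference")
    if any(len(seq) != len(ref) for seq in others):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for i, base in enumerate(ref):
        if base == "-":
            # A gap in the reference cannot be scored as conserved.
            scores.append(0.0)
            continue
        matches = sum(1 for seq in others if seq[i] == base)
        scores.append(matches / len(others))
    return scores


def conserved_columns(alignment: Dict[str, str], reference: str = "human",
                      threshold: float = 1.0) -> List[int]:
    """Indices of columns whose conservation score meets the threshold."""
    scores = conservation_scores(alignment, reference)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Invented toy alignment (hypothetical data, not from the study).
toy_alignment = {
    "human":   "ACGTACGTAC",
    "mouse":   "ACGTACGAAC",
    "chicken": "ACGTTCGAAT",
    "fugu":    "ACGTGAGCTT",
}
pairwise = {name: toy_alignment[name] for name in ("human", "mouse")}

print(conserved_columns(pairwise))       # [0, 1, 2, 3, 4, 5, 6, 8, 9]
print(conserved_columns(toy_alignment))  # [0, 1, 2, 3, 6]
```

In this toy case the human-mouse pair looks conserved at nine of ten positions, so little stands out, while the four-species comparison narrows the candidates to five. Real analyses work on far longer alignments and use statistical models of sequence evolution rather than a simple identity count.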

"Our studies demonstrate that an important route for identifying functional elements in the human genome will be sequencing the genomes of a menagerie of animals - not just two or three species, but many species that represent a wide sampling of the evolutionary tree," said Webb C. Miller, Ph.D., of Penn State, who was one of the paper's co-authors. "This study was just the beginning of a reconnaissance expedition, but it clearly illustrates why we need to explore many other animals' genomes to identify highly conserved sequences that reflect the functional parts of the vertebrate genetic blueprint." Dr. Miller is a computer scientist whose research involves developing algorithms and software for analyzing genome sequences.

Another co-author of the study, David Haussler, Ph.D., of UCSC, agreed: "Not only is this data leading us to a better fundamental understanding of molecular evolution in vertebrate species, it is also guiding the way to the development of methods that use the evolutionary record itself to highlight functionally critical regions of the human genome." As an important adjunct contribution, the UCSC team constructed a specialized component of its Web site, genome.ucsc.edu, for viewing the sequences generated from the multiple species, as well as for examining the results of the comparative analyses reported in the study.

The use of multi-species sequences for identifying functionally important regions of the human genome, as described in the Nature paper, will be a prominent component of another NHGRI-sponsored program called the ENCyclopedia Of DNA Elements (ENCODE) project (www.genome.gov/Encode). The ultimate goal of the ENCODE project is to catalog all functional elements in the human genome sequence, thereby deepening our understanding of human biology and stimulating the development of new strategies for preventing and treating disease.

The final set of findings reported in the study revealed that, while the general types of genome changes were similar among all vertebrates studied, differences in the relative contributions of the various changes have uniquely sculpted each species' genome. These findings point to the complex ways that evolution has used millions of years of alterations to render each species' genome into its modern-day form.

Researchers emphasized that because their findings pertain to just a single genomic region, they will need to conduct analyses of additional regions to get a broader perspective. Indeed, the targeted genomic region studied in the Nature paper represents just the first of more than 100 genomic regions being sequenced in multiple species and analyzed by the NHGRI program known as the NIH Intramural Sequencing Center (NISC) Comparative Sequencing Program. This broader effort, which is led by Dr. Green, seeks to push the frontiers of genome sequencing by taking a detailed look at the similarities and differences of the same stretch of DNA among multiple species. This program is specifically designed to complement the efforts of larger sequencing centers, which typically sequence the entire genome of an individual species, such as the rat, and then conduct relatively broad-brush analyses comparing one whole-genome sequence with another.

"The findings we report in the Nature paper are just the tip of the iceberg - a sneak preview of the future, when we will have genome sequences from many, many organisms," said Dr. Green. He also noted that his program is now generating sequences from more than 30 vertebrate species, including representatives from relatively exotic evolutionary branches, such as marsupials, e.g., opossum; and monotremes, e.g., platypus.

The multi-species sequencing of targeted regions of the human genome is expected to serve as a test bed for guiding decisions about which animals should be next in line for whole-genome sequencing. At present, it typically costs more than $50 million to sequence an entire genome of a vertebrate, far more than is needed to "sample" a few targeted regions of its genome in an effort to get a preliminary glimpse. "We are close to completing the genome sequences for the most obvious species. Decisions about which genomes to sequence next will hinge on first establishing which ones will best contribute to our understanding of the function and evolution of the human genome," Dr. Green said.

In April 2003, the International Human Genome Sequencing Consortium, led in the United States by NHGRI and the U.S. Department of Energy, announced the successful completion of the Human Genome Project. In addition to sequencing the 3 billion DNA letters in the human genetic instruction book, researchers involved in the Human Genome Project sequenced the genomes of a number of organisms commonly used in biomedical research, including a bacterium (Escherichia coli), baker's yeast, two types of roundworm, two types of fruit fly, two types of sea squirt, two types of pufferfish, the mouse and the rat. NHGRI-supported researchers are now sequencing the genomes of the chimpanzee, the honeybee, the sea urchin, the chicken, the rhesus macaque, the dog and a set of nine fungi.

NHGRI is one of the 27 institutes and centers at the National Institutes of Health, an agency of the Department of Health and Human Services. Additional information about NHGRI can be found at its Web site, www.genome.gov.