New Study Should Speed Discovery of Gene Function

Even though the complete and final sequence of the human
genome will be published next month, it should surprise
no one that we still have a long way to go in identifying
and understanding how our genes actually function in life.

In biology, this is referred to as the phenotype gap: the
discrepancy between the 30,000 to 40,000 genes we believe
are present in the human genome and the mere 5,000
distinguishable traits that have been identified through studies
of inherited diseases and knockout mutations produced by gene
targeting. The inference to be made is that the critical function
of most genes remains unknown. We know the genome sequence,
but we have yet to discover all of its consequences.

Now a new study, recently published in the online edition
of the journal Proceedings of the National Academy of Sciences,
could help to fill in this gap. The study was conducted by
a team of scientists from the non-profit
Institute for Childhood and Neglected Diseases (ICND) at The
Scripps Research Institute (TSRI) in La Jolla; the non-profit
Genomics Institute of the Novartis Research Foundation (GNF);
Phenomix and Sequenom, Inc., both San Diego-based biotechnology
companies; the National Cancer Institute; and the Rockville,
Maryland-based company Celera Genomics.

The team conducted a massive study of the mouse genome to
examine the genomic variations within individual strains.
This analysis should help scientists decide which particular
mouse strains to breed in experiments aimed at mapping phenotypes
to genes, greatly increasing the speed with which the function
of genes is discovered.

"We now have a much clearer picture of the distribution
of DNA polymorphisms around the genome," says TSRI Assistant
Professor Colin Fletcher, one of the lead authors on the study.

Haplotype Patterns in Mouse Defined

Basically, explains TSRI Professor Steve Kay, the new tool
is like a global positioning system for mouse genetics: a
set of coordinates that scientists can use to "navigate" the
genome the same way that a sailboat pilot uses a constellation
of GPS satellites to navigate the Pacific Ocean.

The biological coordinates used in the new method are
single nucleotide polymorphisms (SNPs). SNPs are locations
in the genome where a particular base can vary among
individuals. Occasionally, these changes occur in the middle
of a gene and sometimes even alter the function of the product
of that gene, although the majority of SNPs are not themselves
responsible for disease.

Nevertheless, SNPs are extremely valuable for research if
they are located close to genes linked to a particular biological
trait, such as susceptibility to a specific disease.
In these cases, the SNPs serve as biological markers
that scientists can use to identify and "positionally" clone
those linked genes.

In the study, the scientists analyzed the haplotypes, or
arrangements of these SNPs, throughout the mouse genome. In
fact, the team used technology developed by Sequenom, Inc.
and data generated by Celera Genomics to discover and identify
some 80,000 SNPs, a broad sampling of the mouse genome.

"We tried to get a set of SNPs that were spaced all over
the chromosomes," says GNF's Tim Wiltshire, who was the first
author on the paper.

What the team found was that the SNPs of the eight inbred
mouse strains they surveyed tended to appear in clumps. That
is, certain regions had almost no SNPs, while other regions
were rich with the markers.

"You can go 50 million bases and find only a handful, and
then the next 20 million base pairs, there will be a SNP every
few hundred bases," says Fletcher.

This finding is intriguing because many scientists had assumed
SNPs were more or less randomly distributed throughout the
genome. But this analysis demonstrated that there is a structure
to the distribution of SNPs.
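
One simple way to see this kind of clumping is to count SNPs in
fixed-size windows along a chromosome. The sketch below is only an
illustration and not the method used in the study; the function name
snps_per_window, the window size, and the positions are all made up
for the example.

```python
from collections import Counter

def snps_per_window(positions, window_size):
    """Count SNPs falling in each consecutive window of `window_size` bases.

    `positions` are base-pair coordinates on one chromosome (hypothetical data).
    Returns a list where entry i is the count for window i.
    """
    counts = Counter(pos // window_size for pos in positions)
    n_windows = max(counts) + 1 if counts else 0
    return [counts.get(i, 0) for i in range(n_windows)]

# Made-up positions: a few SNPs early on, a long empty stretch, then a dense clump.
positions = [120, 450, 980, 5_100, 5_150, 5_300, 5_420, 5_800, 5_950]
print(snps_per_window(positions, 1_000))
```

Windows with a count of zero next to windows with many SNPs are the
pattern Fletcher describes: long SNP-poor stretches followed by
SNP-rich ones.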

The findings also probably indicate something about the
nature of laboratory mice: that the ancestors of the
various inbred strains of mice were themselves inbred when
the strains were first established around the turn of the
last century.

Because of this inbreeding, no one strain can represent
the sum total of the biology of a mouse any more than one
family would represent the sum of all human biology.

Help for Cloning Genes

Based on this analysis of the haplotype patterns of the
different mouse strains, scientists should be able to compare
different strains of mice and determine which two would be
appropriate to breed in order to create a mouse model or attempt
to positionally clone a particular gene.
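
As a rough sketch of how such a comparison might work, the example
below picks the pair of strains that differ at the most SNP positions
in a region of interest, since a cross is only informative where the
parents carry different markers. The strain names, alleles, and
function names are hypothetical, and the study itself may have used a
different criterion.

```python
from itertools import combinations

# Hypothetical genotype table: strain name -> alleles at the same ordered
# SNP positions within one region. Names and alleles are made up.
genotypes = {
    "strain_A": "AACGTTAG",
    "strain_B": "AACGTTAG",
    "strain_C": "GACTTCAG",
    "strain_D": "GGCTACAA",
}

def count_differences(a, b):
    """Number of SNP positions at which two strains carry different alleles."""
    return sum(1 for x, y in zip(a, b) if x != y)

def most_divergent_pair(table):
    """Return (strain1, strain2, n_differences) for the pair that differs
    at the most SNP positions; ties go to the first pair in sorted order."""
    best = None
    for s1, s2 in combinations(sorted(table), 2):
        n = count_differences(table[s1], table[s2])
        if best is None or n > best[2]:
            best = (s1, s2, n)
    return best

print(most_divergent_pair(genotypes))
```

In this toy table, strain_A and strain_B share a haplotype across the
region, so crossing them would offer no markers there, while strain_A
and strain_D differ at six of the eight positions.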

Positional cloning traditionally entails the use of classical
genetic mapping methods to confine the location of the gene
to a particular area in the genome, extensive sequencing of
the region in question, and the performance of computer-aided
searches through databases to find homology between sequences
in that region and known genes.

But with the new haplotype analysis and set of SNPs, researchers
could narrow the search dramatically. If a SNP is located
near a gene found to be associated with a particular trait,
scientists can recognize its proximity and use its known location
to identify nearby genes that may be linked to that trait.
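
The lookup described above can be sketched in a few lines: given the
known position of a SNP, list the annotated genes within some distance
of it. Everything here is assumed for illustration, including the gene
names, the coordinates, the genes_near_snp function, and the 500,000
base default window.

```python
# Hypothetical gene annotations: (name, chromosome, start, end) in base pairs.
genes = [
    ("gene_1", "chr1", 1_000_000, 1_050_000),
    ("gene_2", "chr1", 1_400_000, 1_420_000),
    ("gene_3", "chr1", 9_000_000, 9_100_000),
    ("gene_4", "chr2", 1_200_000, 1_250_000),
]

def genes_near_snp(chrom, position, annotations, window=500_000):
    """Return (gene name, distance) pairs for genes on `chrom` within
    `window` bases of the SNP at `position`, nearest first."""
    hits = []
    for name, g_chrom, start, end in annotations:
        if g_chrom != chrom:
            continue
        if start <= position <= end:
            distance = 0  # the SNP lies inside the gene
        else:
            distance = min(abs(position - start), abs(position - end))
        if distance <= window:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])

# A made-up SNP on chr1 associated with the trait of interest.
print(genes_near_snp("chr1", 1_200_000, genes))
```

The genes returned are candidates only; proximity to a marker narrows
the search but does not by itself show that a gene is responsible for
the trait.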