Two years ago, as the sequencing of the Drosophila genome neared completion,
researchers took an increasing interest in using the fly to study human
disease. The publication of the sequence in Science in March of
2000 has led to several studies assessing the prevalence of human disease
counterparts in the fly genome.

Researchers at the University of California San Diego were among those
who began constructing a database of human genes, fly genes and genetic
diseases early on. The result is Homophila, a database that went online
last fall. Now, in the June issue of Genome Research, they report
on 548 Drosophila genes representing 714 different diseases that
appear to be counterparts to human disease genes and may be good candidates
for study.

The authors have mined the Drosophila genome for links to existing
knowledge about medical genetics. Although the value of the fly as a model
system is well known, they argue that the biological connections are not
always obvious. At the Homophila database Web site, human genes, fly genes,
and diseases are cross-referenced and linked, for instance, to scientific
abstracts and to a catalogue of genetic conditions called OMIM (Online Mendelian
Inheritance in Man).
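
The kind of cross-referencing described above can be pictured with a small sketch. This is not Homophila's actual schema or code; the record layout, the function names, and every gene and disease label below are hypothetical placeholders chosen only to show how one table of links can be searched by disease keyword or looked up by fly gene.

```python
from collections import defaultdict

# Hypothetical records: (human gene, fly gene, disease label).
# All identifiers are placeholders, not real Homophila or OMIM entries.
RECORDS = [
    ("HUMAN_GENE_A", "fly_gene_1", "deafness syndrome X"),
    ("HUMAN_GENE_B", "fly_gene_1", "neuropathy Y"),
    ("HUMAN_GENE_C", "fly_gene_2", "deafness syndrome Z"),
]


def index_by_fly_gene(records):
    """Map each fly gene to the (human gene, disease) pairs linked to it."""
    index = defaultdict(list)
    for human, fly, disease in records:
        index[fly].append((human, disease))
    return index


def search_disease(records, keyword):
    """Return (disease, human gene, fly gene) rows whose disease label
    contains the keyword, ignoring case."""
    kw = keyword.lower()
    return [(d, h, f) for h, f, d in records if kw in d.lower()]


if __name__ == "__main__":
    for row in search_disease(RECORDS, "deafness"):
        print(row)
    print(index_by_fly_gene(RECORDS)["fly_gene_1"])
```

A keyword search and a gene lookup are two views of the same set of links, which is why a single cross-referenced table can serve both clinicians starting from a disease and fly researchers starting from a gene.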

To produce the latest version of Homophila, Ethan Bier, a biology professor
at UCSD, and colleagues screened a set of 929 human disease genes against
the complete Drosophila sequence. Their analysis identified the
548 genes as potential relatives of the human genes based on a high degree
of similarity in amino acid sequences. Whether two genes are actually
related or share similar functions, however, cannot be determined from
sequence comparison alone.

"We're trying to show with the database that the fly is an extremely
useful model system for studying genes associated with disease in humans,"
says Bier. He proposes that the set of fly genes is a starting point for
investigators interested in studying human disease in Drosophila.
Homophila is an ongoing project whose ultimate goal is to facilitate communication
between fly and human researchers.

Researchers who do not normally work together collaborated on Homophila.
The primary architect of the database is Michael Gribskov, a computational
biologist at the San Diego Supercomputer Center.

"Ethan Bier came to me in 1999 and said, 'Let's try to find all
the human disease genes in the fly,'" recalls Gribskov. "Homophila
brought together experimentalists with real biological questions and our
bioinformatics group, which does not usually work on fly genetics."
Gribskov's group builds genomic databases for the plant Arabidopsis,
among other projects.

The research team included Lorraine Potocki, a clinical geneticist and
pathologist at Baylor College of Medicine, in Houston, Texas. Working
with the fly researchers at UCSD, she generated lists of the kinds of
human disorders that can potentially be studied using Drosophila
as a model organism.

The organization of human and fly data and the analysis by Potocki suggested
that there are fly counterpart genes for human conditions like blindness,
deafness, blood disorders, and immunological disorders. "This came
as a bit of a surprise as most people don't think to study hearing or
cancer in Drosophila," says Lawrence T. Reiter, a UCSD researcher
with a background in human genetics and co-leader of the Homophila project.

The 929 human disease genes in the study were compiled from OMIM, which
was created by Victor A. McKusick, of the Johns Hopkins University School
of Medicine, and colleagues. The disease categories listed on Homophila
include neurological, cardiovascular, and skeletal-development disorders.

[Figure: Detail of a Homophila query using the keyword 'neuropathy.'
Courtesy Genome Research.]

Typing 'deafness' into the Homophila search engine, for example, brings
up hearing-loss syndromes, human genes associated with them, and the fly
genes that match these sequences. Drosophila has about a dozen
sequences resembling deafness genes in humans, according to the Homophila
database.

"The biggest surprise of this study to me was that so many human
disease genes have cognates in flies," says Gribskov. He adds that
'cognate' implies a functional similarity between genes, but not necessarily
the degree of similarity needed to infer homology (common evolutionary
origin).

"Given how different humans are from flies, I expected about 20
to 30 percent of the human disease genes to have matches in the fly,"
he adds. "We found matches for nearly 80 percent of the genes we
screened."

Two previous analyses of human disease counterparts in the fly came up
with fewer candidate sequences, although the strategies and methods vary
among the studies. Just how many human disease genes might have counterparts
in the fly is at present unknown. After the Drosophila sequence
was published, one research team reported that 178 fly genes are likely
to be homologues of a set of 287 human disease genes.

The authors of that study, Mark E. Fortini, of the University of Pennsylvania
School of Medicine, and colleagues, noted in a paper last year that estimates
in the literature put the share of human disease genes with counterparts in
the fly at one-half to three-quarters. Fortini's group found
that 62 percent of the human disease genes in their set had homologues
in the fly (178 fly genes out of 287 human genes).

Fortini and colleagues started with a list of more than 800 human genes
compiled from OMIM, medical textbooks and scientific articles on classes
of genes. The researchers eliminated over half of these, however, because
they did not meet the criterion for the study, which was that "the
human gene must actually be mutated, altered, amplified, or deleted in
human subjects with the disease." The final list, they wrote in The
Journal of Cell Biology, was not meant to be comprehensive.

How many 'hits' are generated in any cross-genomic comparison depends
on several factors, including the statistical methods used to define evolutionary
relatedness. The stricter the standard of relatedness based on the similarity
of two gene sequences, the fewer the hits.

Bier and colleagues generated hit lists for a variety of 'E values,'
a statistical measure of the odds that the match between the two sequences
could have occurred by chance. "An E value of 10^-5
means that you have to run [100,000] searches with a random query to get
the match you're seeing," explains Gribskov.
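
The arithmetic behind that rule of thumb can be sketched in a few lines. The sketch assumes the usual reading of an E value as the expected number of chance matches at least as good as the one observed, per search, so that roughly 1/E random searches are needed to expect one such match. The function name is hypothetical and is not part of any search tool.

```python
def searches_per_chance_match(e_value: float) -> int:
    """Approximate number of random-query searches needed to expect one
    chance match this good, assuming E is the expected number of such
    matches per search."""
    if e_value <= 0:
        raise ValueError("E value must be positive")
    # round() absorbs floating-point error in the division.
    return round(1 / e_value)


if __name__ == "__main__":
    for e in (1e-5, 1e-10):
        print(f"E = {e:g}: about {searches_per_chance_match(e):,} searches")
```

On this reading, an E value of 10^-5 corresponds to about 100,000 random searches per chance match, and an E value of 10^-10 to about 10 billion.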

The 548 fly genes in the UCSD analysis were identified using an E value
of 10^-10, according to the Genome Research paper. The researchers call
these genes "clear hits."
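
How the choice of cutoff changes the size of a hit list can be illustrated with made-up numbers. The gene names and E values below are invented for the example and do not come from the Genome Research paper; the only point is that a stricter (smaller) cutoff keeps a subset of the hits that pass a looser one.

```python
# Hypothetical (fly gene, E value) pairs; names and numbers are made up.
HITS = [
    ("fly_gene_1", 3e-42),
    ("fly_gene_2", 8e-11),
    ("fly_gene_3", 2e-7),
    ("fly_gene_4", 5e-4),
]


def hits_at_cutoff(hits, cutoff):
    """Keep genes whose E value is at or below the cutoff.
    A smaller cutoff is a stricter standard of similarity."""
    return [gene for gene, e_value in hits if e_value <= cutoff]


if __name__ == "__main__":
    for cutoff in (1e-5, 1e-10):
        kept = hits_at_cutoff(HITS, cutoff)
        print(f"cutoff {cutoff:g}: {len(kept)} hits -> {kept}")
```

With these invented values, three genes pass a cutoff of 10^-5 but only two pass the stricter 10^-10 standard used for the "clear hits" list.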

Gribskov's group determined that 409 of the 548 fly genes on the clear-hit
list also have cognates in yeast. If the Homophila project develops as
planned, the database will be expanded and updated to include newly discovered
human and fly genes as well as data on other species. "Now that the
human sequence data are available, we expect many more human disease gene
candidates to be identified in the coming months," says Bier.

With so much data being generated all the time, researchers face a significant
challenge in organizing and managing the information efficiently. "There's
been a real revolution in genomic databases in the last five years,"
observes Gribskov. Until the recent explosion of sequence data, he says,
many researchers were interested in downloading information and setting
up their own databases.

The trend today is toward a kind of one-stop-shopping for data. "The
personalized approach is no longer practical because the data sets are
so big," he says. "And the Internet provides a great way to
have a centralized service that is easy for any researcher to use."