Working draft of human genome sequence now publicly available on UCSC web site

UCSC researchers who performed the computer analysis to assemble a working draft
of the human genome sequence have posted their results on a UCSC web site.
Biomedical researchers throughout the world can now search the working draft
for particular genes or DNA sequences of interest.

A group led by David Haussler, professor of computer science and director of the
Center for Biomolecular Science and Engineering, created a powerful new computer
program to assemble the working draft from the sequence data obtained by the international
Human Genome Project (see related story). The policy
of the public consortium of scientists working on the Human Genome Project has been
to release sequence information to the world as soon as possible. Until the working
draft was assembled, however, the sequence data were only available in many small
pieces.

"This is the first publicly available view of the human genome sequence tentatively
placed in the order and orientation in which we think it occurs along the human chromosomes,"
Haussler said.

Haussler noted that after a 10-year effort involving many laboratories and more than
1,000 scientists, the human genome can now be downloaded from the UCSC web site in
about an hour and a half by anyone with a DSL-speed Internet connection. The genome
sequence is essentially a long string of As, Ts, Cs, and Gs, representing the chemical
units of DNA, called bases. The human genome consists of approximately 3.1 billion
bases arrayed along the length of the chromosomes.
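
The download figure can be checked with rough arithmetic. The sketch below is only
an estimate: the genome size and the 85 percent coverage come from this article, but
the encoding (2 bits per base, enough to distinguish four letters) and the 1 Mbit/s
line speed are assumptions, since the article states neither the file format nor the
connection speed. The function name is hypothetical.

```python
def download_minutes(bases, bits_per_base, link_bits_per_second):
    """Minutes needed to transfer `bases` bases at the given encoding and link speed."""
    total_bits = bases * bits_per_base
    return total_bits / link_bits_per_second / 60


GENOME_BASES = 3_100_000_000  # approximate genome size, from the article
DRAFT_FRACTION = 0.85         # share sequenced so far, from the article
draft_bases = GENOME_BASES * DRAFT_FRACTION

# Assumption: 2 bits per base and a 1 Mbit/s DSL line.
packed = download_minutes(draft_bases, 2, 1_000_000)

# Assumption: uncompressed text at 8 bits per base over the same line.
plain = download_minutes(draft_bases, 8, 1_000_000)

print(f"2 bits per base: about {packed:.0f} minutes")
print(f"8 bits per base: about {plain:.0f} minutes")
```

Under these assumptions the packed estimate lands close to an hour and a half, while
uncompressed text would take several hours on the same line.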

Scientists involved in the Human Genome Project have sequenced about 85 percent of
the human genome and continue to generate new sequence data at a rapid pace. Haussler's
group will rerun their computer analysis every few weeks, incorporating new data
so that biomedical researchers will have immediate access to the most up-to-date
assembly.

Although the current working draft still contains gaps and uncertainties, it is
already extremely useful for most biomedical research purposes, said Alan Zahler,
an associate professor of biology at UCSC. In many cases, researchers have identified
part of a gene or have other clues to the gene's sequence. They can now use that
information to search the genome for the rest of the sequence associated with the
gene they are interested in.

In addition, researchers who know the sequence of one gene can search the genome
for similar sequences to find related genes. Many genes are members of multigene
families with similar and sometimes overlapping functions, Zahler said.
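
The idea of searching a genome for sequences similar to a known one can be illustrated
with a toy example. This is a brute-force sketch, not the search software behind the
UCSC site: it slides the query along the sequence and counts mismatched bases, whereas
real tools use indexes and tolerate insertions and deletions. The function name, the
mismatch threshold, and both sequences are made up for illustration.

```python
def find_similar(genome, query, max_mismatches=1):
    """Return (position, mismatches) for each window within max_mismatches of query."""
    hits = []
    m = len(query)
    for start in range(len(genome) - m + 1):
        window = genome[start:start + m]
        # Count positions where the window and the query disagree.
        mismatches = sum(1 for x, y in zip(window, query) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Made-up sequence containing one exact copy of the query and one near copy.
genome = "TTGACCGTAAGGCTGACGGTAACC"
hits = find_similar(genome, "GACCGTAA", max_mismatches=1)
print(hits)
```

An exact hit corresponds to locating a known gene fragment, and a near hit corresponds
to a candidate relative in the same gene family.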

"Identification of all of the members of a gene family will give us a sense
for how many genes with a certain role are present in the genome," Zahler said.
"Before, it took long periods of experimentation to find out whether a gene
in humans was a member of a larger family of genes."

As a test of the completeness of the working draft, Human Genome Project scientists
searched it for known genes associated with human genetic diseases and found that,
for 95 percent of those diseases, the genes could be identified in the working draft.

"The chances of finding a particular disease gene in the working draft are apparently
quite good," Haussler said. "Technically, the working draft only covers
about 85 percent of the genome, but in practice it appears to cover 95 percent of
the disease genes."

The working draft is also an exciting beginning for scientists interested in understanding
gene structure and organization in humans, Zahler noted.

"For the first time, we will be able to look at tens of thousands of genes
at once and start to search for common themes in areas such as how classes of genes
are turned on and off, how the information in genes is processed into a form that
encodes proteins, and how that processing is regulated," he said.

Jim Kent, a graduate student working with Zahler, designed and wrote most of the
software used to assemble the working draft, which was compiled from hundreds of
thousands of fragments of various sizes.

"Imagine you have five copies of War and Peace and one of Crime and Punishment,
you put them through a paper shredder, and then try to paste together a single copy
of War and Peace from the shreds," Kent said. "That job would be a lot
like assembling the human genome, except that the genome runs to about a million
pages."
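
Kent's shredded-book analogy can be sketched in a few lines of code. The example below
is a toy greedy overlap merge, not the program used for the working draft: it repeatedly
joins the two fragments whose ends overlap the most. All names, the minimum overlap, and
the sequences are hypothetical, and the sketch ignores the hard parts of the real
problem, such as sequencing errors, repeated regions, unknown strand orientation, and
stray fragments that do not belong (the Crime and Punishment shreds).

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0 if too short."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge fragments by largest overlap until no overlaps of min_len or more remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps: the remaining pieces stay separate
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


# Made-up fragments cut from the 15-base sequence ATGGCGTACGTTAGC.
pieces = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(pieces))
```

When no remaining fragments overlap, the function returns several separate pieces,
which loosely mirrors the gaps that remain in a draft assembly.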

Haussler said Kent's accomplishments will have a very real impact on science and
medicine. "He has shown enormous talent and creativity in tackling this fundamental
problem," Haussler said.

In addition to the UCSC web site, the working draft will be available on sites
maintained by the National Center for Biotechnology Information (NCBI) and the
European Bioinformatics Institute (EBI).
Both NCBI and EBI are major contributors to the computational analysis of the human
genome data. Haussler has already sent them the current working draft and will continue
to send them updated versions as new sequence data are incorporated into the analysis.