DOE Joint Genome Institute

DOE JGI in Walnut Creek, California, provides state-of-the-science capabilities for genome sequencing and analysis. With more than 1100 worldwide collaborators on active projects, JGI is the preeminent facility for sequencing plants, microbes, and microbial communities that are foundational to energy and environmental research.

Summary

The DOE Genome Project is in a transitional phase, evolving to balance the
demands of large-scale sequencing and technology development, while at the same
time setting the stage for genomic analysis of gene function. Discussion of
progress and goals emphasized the following points:

The JGI will carry out the majority of DOE's share of human genome sequencing,
with a 5-year goal of over 600 Mb. The highest priority must be ensuring the
success of the JGI.

Sequencing costs must be reduced 2- to 4-fold, with special emphasis on "hardening"
the incremental technological improvements that will contribute to sequencing
during the next 5 years.

Development of new sequencing technologies should be supported, but only where
they hold the promise of 20- to 100-fold improvements over current methods.

The current modest investments in functional genomics should be enhanced
as funds become available.

Sequencing

Microbial Genomes

The DOE microbial genome project has made a very substantial contribution
to our understanding of the diversity of microbial life and the complexities
of evolution. A large fraction of genes (30-50%) that are found in any newly
sequenced microbial genome do not have known relatives. Sequencing of additional
genomes should be high priority both to understand our biological world and
to enlarge the repertoire of genes that may be of practical importance.

Joint Genome Institute

The DOE genome project has made a major commitment to support of a large scale
DNA sequencing facility - the Joint Genome Institute (JGI) under the direction
of Dr. Elbert Branscomb. Resources and scientific talent from genome efforts
at three national laboratories have been pooled and brought to bear. Very challenging
goals of ramping up production have been set and every effort must be made to
ensure its success.

Currently LBL, LANL and LLNL are pursuing sequencing within their own structures
to meet the production goal of 20 megabases (Mb) of finished sequence for fiscal
1998. So far, production is on track to meet this goal. This is being accomplished
even under the pressure of formulating a united scheme for factory production
and planning the new facility. These pressures will escalate in the coming months
with the stress of the move to the new facility and the need to double the sequence
output to 40 Mb in fiscal 1999. This latter goal will be very challenging because
of the many distractions. Some patience should be shown so that the factory
can get up and running effectively and have a real shot at meeting future
sequencing goals.

JGI needs to show that it can sustain production of high-quality, contiguous
sequence. This is imperative even if it comes at the expense of throughput and
cost per base during the crucial first and second years. However, every effort
within reason must be made to keep to the proposed aggressive ramp-up. The
proposed goals are:

Year    Output (Mb)    Cost/bp
1999    40             $0.50
2000    100            $0.35
2001    150            $0.30
2002    200            $0.25
2003    200            $0.20
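
The proposed goals can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the subcommittee's report. It assumes that the outputs listed without a unit (100, 150, 200) are in Mb, like the 40 Mb figure for 1999, and that the cost per base pair applies uniformly to a year's full output. The names GOALS, implied_spend and summarize are hypothetical.

```python
# Proposed JGI goals: (fiscal year, output in Mb, cost in dollars per bp).
# Assumption: outputs listed without a unit are in Mb, like the 1999 figure.
GOALS = [
    (1999, 40, 0.50),
    (2000, 100, 0.35),
    (2001, 150, 0.30),
    (2002, 200, 0.25),
    (2003, 200, 0.20),
]

BASES_PER_MB = 1_000_000


def implied_spend(output_mb, cost_per_bp):
    """Dollars implied by sequencing output_mb megabases at cost_per_bp."""
    return output_mb * BASES_PER_MB * cost_per_bp


def summarize(goals):
    """Return (total output in Mb, {year: implied spend in dollars})."""
    total_mb = sum(mb for _, mb, _ in goals)
    spend = {year: implied_spend(mb, cost) for year, mb, cost in goals}
    return total_mb, spend


total_mb, spend = summarize(GOALS)
for year, dollars in spend.items():
    print(f"{year}: about ${dollars / 1e6:.0f} million implied")
print(f"Total output over the five years: {total_mb} Mb")
```

Under these assumptions the five yearly targets sum to 690 Mb, which is consistent with the 5-year goal of over 600 Mb stated in the Summary. The per-year dollar figures are only what the targets imply if met exactly; the report itself does not state production budgets.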

In light of the major investment of the DOE genome project in the JGI,
and the importance of the success of this venture, the first funding priority
for genome research must be strategic ventures that will help ensure the success
of the JGI. The necessity of "getting on with sequencing" requires commitment
of substantial funds to production at the JGI, which will limit the ability
of the DOE genome project to support technology development.

Yet current technologies must be augmented by improvements in automation, in
sequencing chemistries and in computer tools for assembling and interpreting
DNA sequences in order to improve both efficiency and cost.

As the JASON report on the genome project points out, this is just the time
when technology development is very important for the genome project, and technology
is DOE's forte. Both the JASONs and the BERAC genome subcommittee agree
that, despite the immediate need for production sequencing, funding must be ensured
both for short-term developments that enhance current production and for
longer-term technologies that will provide the key tools for the future.

Technology Development

Current technology for genomic DNA sequencing has about equal cost contributions
from labor and reagents (capital investment in equipment is quickly amortized),
with an overall cost of about $0.50/base pair, although the many vagaries of
calculating costs make this number quite soft and subject to individual lab
interpretation. However, for the JGI (and the genome project in general) to
meet its goals, the cost must come down by a factor of 3 to 4 without compromising
sequence accuracy. Only incremental improvements to current technology are likely
to pay off in time to have any effect during the major sequence accumulation
phase of the project (until 2005). Incremental improvements to the sequencing
process itself, including alternative chemistries and longer read lengths, are
already yielding overall cost improvements. Automation of sample preparation and
handling continues to hold the promise of cutting labor costs and improving the
reliability of the process. (See addendum "A" for a more complete list of needed
improvements.)

However, just the development of improved technologies is not sufficient. It
is often the case that promising technologies languish because of the difficulties
of moving them into production streams. Disruption of the production effort
and dependence on a new, untested technology make the risks of implementation
too high. Thus a major challenge is to find ways to support "hardening" of incremental
technologies so that they can be moved into production with minimal risk. A
targeted funding method is needed to solve this problem; otherwise the investment
in many incremental technologies will have been wasted. Perhaps new cooperative
agreements can be the tool. However, it is crucial that appropriate measures
be in place to ensure that the value of incremental technologies is assessed
during this hardening phase. This will require monitoring usefulness and establishing
milestones for performance.

There is a crucial need for development of new sequencing technologies that
will be the tools for the future. The appetite for sequencing will only increase,
but the costs of current methods, even with incremental improvements, will greatly
limit sequencing capacity. While some new approaches are in the wings (see
addendum "A" for a list), they are not likely to contribute to the large-scale
sequence accumulation needed by 2005 to meet the primary goals of the genome
project. This reality should not discourage investment in longer-term development
of new sequencing technology. However, support should be predicated on new technologies
being able to reduce the cost of sequencing 20- to 100-fold. Anything less
will be too little, too late. In addition, support for long-term technologies
cannot come at the expense of JGI's success.
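
To make the two cost targets in this section concrete, the sketch below converts each fold reduction into a cost per base pair. It is illustrative only. It takes the report's figure of about $0.50/base pair as the baseline, which the report itself describes as quite soft, and the function name cost_after_reduction is hypothetical.

```python
# Baseline from the report: about $0.50 per base pair (described as "quite soft").
BASELINE_COST_PER_BP = 0.50


def cost_after_reduction(baseline, fold):
    """Cost per bp after an improvement that divides the baseline by `fold`."""
    if fold <= 0:
        raise ValueError("fold reduction must be positive")
    return baseline / fold


# The two targets discussed in this section, as (low fold, high fold).
TARGETS = {
    "incremental improvements (factor of 3 to 4)": (3, 4),
    "new technologies (20 to 100 fold)": (20, 100),
}

for label, (low_fold, high_fold) in TARGETS.items():
    upper = cost_after_reduction(BASELINE_COST_PER_BP, low_fold)
    lower = cost_after_reduction(BASELINE_COST_PER_BP, high_fold)
    print(f"{label}: ${lower:.3f} to ${upper:.3f} per bp")
```

Read this way, the incremental target corresponds to roughly $0.13 to $0.17 per base pair, and the threshold for new technologies to roughly half a cent to 2.5 cents per base pair, all relative to the soft $0.50 baseline.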

Functional genomics

It is important to lay the groundwork now for this next stage of the genome
project. Genome sequences are only starting points. The human sequence is a
tool for identifying the information for each of the 100,000 genes, with the
ultimate goal of determining the function of each gene. Defining the very complex
network of interactions among gene products will be at the heart of biomedical
research for many decades.

Genome sequence is also a tool that permits examination of human variation,
with direct applicability to understanding individual susceptibility to disease
and to environmental insults such as exposure to low radiation doses. With the
reference sequence in hand, genes that play a role in susceptibilities will
be identified, leading to an understanding of differential susceptibility in
the population. Using the mouse as a model organism is particularly powerful.

There are many aspects of the application of genome technology to DOE missions.
Analysis should be expanded on a number of these fronts if the resources can
be found. This is the payoff - the harvest of the genome project.

Efforts to sequence human and mouse cDNAs in order to get a reasonably
complete picture of the DNA regions that are expressed should be continued
and expanded if possible.

Expression levels of large numbers of genes can be determined at one time
with new chip technologies. For example, the 6000 yeast genes can be monitored
on one chip, and changes in expression patterns with environmental or genetic
differences can be determined. This important technology should be supported
and enhanced for more complex organisms.

Attention is beginning to focus on the complexity of gene products - the
proteins. It is clear that one gene can yield multiple protein products but
so far there is little understanding of the complexity on a genome wide basis.
Development of new tools in this arena is important.

The DOE has a major investment in the mouse as a model system for studying
the effects of mutations in individual genes. The groundwork should be laid
for application of new technologies that will make it possible to systematically
assess gene functions in the mouse on a genome-wide scale. What is learned
about the mouse is directly relevant to humans. Zebrafish should be assessed as
another appropriate model system.

Comparative sequencing of syntenic regions of mouse and human should be
supported.

However, given the stringent demands of the production-sequencing phase of
the Genome Project, presently available resources are far too constrained to
do justice to the scope and importance of developing tools of this nature.

Informatics

Development of informatics tools continues to be a high priority. However, there
is still the nagging concern that tools doing equivalent jobs are being developed
independently in many different labs. While it may be true that each large-scale
sequencing center will need to develop informatics tools to support its own
technologies, sharing of solutions needs to be encouraged. The development of
databases and their tools needs to be driven by the user community. It is
anticipated that the joint NIH-DOE workshop to consider appropriate informatics
goals will provide the needed guidance.

Human Genome Project 1990–2003

The Human Genome Project (HGP) was an international 13-year effort, 1990 to 2003. Primary goals were to discover the complete set of human genes and make them accessible for further biological study, and determine the complete sequence of DNA bases in the human genome. See Timeline for more HGP history.

Published from 1989 until 2002, this newsletter facilitated HGP communication, helped prevent duplication of research effort, and informed persons interested in genome research.

Citation and Credit

Unless otherwise noted, publications and webpages on this site were created for the U.S. Department of Energy Human Genome Project program and are in the public domain. Permission to use these documents is not needed, but credit the U.S. Department of Energy Human Genome Project and provide the URL http://www.ornl.gov/hgmis when using them. Materials provided by third parties are identified as such and not available for free use.