Wellcome Trust Announces Major Investment in Genome Bioinformatics

The Wellcome Trust today announced a major investment of at least £8 million over five years in the Ensembl
project, the database providing automatic annotation of the human genome.

The increased resources in staff and computer power for the gene "software" will mean a much speedier collection and
dissemination of information on the function of genes, greatly aiding the work of researchers around the world in
finding new diagnostic methods and treatments for a huge variety of diseases.

Dr Michael Dexter, Director of the Wellcome Trust, said: "Mapping the human genome is an amazing
scientific achievement with the power to touch the lives of everybody on the planet. It is important that information
is made available in the most 'user-friendly' and complete way - and made available free of charge - and this is why
the Ensembl project is so vital.

"Ensembl is a wonderful way of transmitting genetic information clearly and quickly across the world. Having such a
reference centre, and a pipeline to the wider scientific world, will prove invaluable in the coming years in the fight
against a wide range of illnesses."

Ensembl has been developed jointly by the Sanger Centre and the European Bioinformatics Institute (EMBL-EBI, part of
the European Molecular Biology Laboratory) on the Wellcome Trust Genome Campus in Hinxton, Cambridgeshire.

On June 26th an international consortium of public laboratories announced the first 'working draft' of the human genome
sequence, which was hailed as one of the most outstanding scientific achievements of our lifetime. The public
availability of this comprehensive genetic information presents huge opportunities to develop new treatments for
diseases based on understanding of the basic molecular processes of life.

However, to understand and exploit the information in the genome, sophisticated computer methods must seek biological
meaning by analysing the sequence. One important part of this is to locate genes, which make up only a small part
(probably less than three per cent) of the total DNA in humans. The resulting "annotated" DNA sequence must then be
made accessible to scientists throughout the world.

The aim of Ensembl is to provide the reference view of genome sequence data as a freely available resource for
scientists and the public. Ensembl has been providing automatically generated analysis of human genome sequence since
the end of 1999. Since the completion of the working draft the Ensembl team has been collaborating with other
international public bioinformatics centres connected with the Human Genome Project to provide an ordered view of the
working draft sequence for researchers as quickly as possible. A full analysis of the first version of the working
draft, in which the fragments of genome sequence have been organised and connected into a whole, has already been made
available via the Ensembl web site.

"This is a superb example of the synergy that is possible through collaborations of institutions in
Europe and of the quality of work that is possible in the public domain", commented Fotis Kafatos,
Director-General of EMBL.

"This is wonderful news for open, public domain bioinformatics. This grant will enable Ensembl to
expand its team and give the project sufficient compute resources to process the avalanche of sequence data that is
being generated", said Ewan Birney, who heads the Ensembl initiative from the EMBL-EBI side.

He added: "Since Ensembl went live in 1999, the Ensembl team have worked to provide researchers
worldwide with both an integrated view of what our DNA means and the programming tools to develop their own ways of
exploring that data."

The Ensembl project is based on an entirely 'open' philosophy: all data and program source code are available for the
free and unrestricted use of both academic and commercial biomedical researchers worldwide. The Ensembl site and data
resources are already being used by large numbers of researchers.

Software developers from both academia and major pharmaceutical companies have also begun participating in a totally
open software collaboration with the Ensembl team to speed the development of the software.

Graham Cameron, Joint Head of EBI, commented: "New resources such as Ensembl are critical to add
value and organise raw sequence data being deposited in the public sequence archives and so maximise the benefit to
mankind from this exciting era in science".

"Ensembl puts the genome on the desktop of biologists worldwide, and will provide key
infrastructure for functional genomics programmes being pursued at the Sanger Centre and elsewhere", said
Richard Durbin, Head of Informatics at the Sanger Centre.

Although Ensembl plans to provide a comprehensive view of genomic data for biologists, it is structured so as to be as
open as possible to ideas and data from other groups.

"The human genome is too complex for any organisation to have a monopoly of ideas or data",
said Tim Hubbard, who heads the Ensembl initiative from the Sanger side.

Notes to editors:

The Wellcome Trust is the world's largest medical research charity with an annual spend of some £600
million in the current financial year 1999/2000. The Wellcome Trust supports more than 5000 researchers at 300
locations in 42 different countries, laying the foundations for the healthcare advances of the 21st century and
helping to maintain the UK's reputation as one of the world's leading scientific nations. As well as funding major
initiatives in the public understanding of science, the Wellcome Trust is the country's leading supporter of research
into the history of medicine. http://www.wellcome.ac.uk/

The Sanger Centre, which receives the majority of its funding from the Wellcome Trust, is one of the world's
leading genome sequencing centres. Both the Sanger Centre and the Wellcome Trust have been at the forefront of
efforts to keep sequence data in the public domain. The Sanger Centre employs about 500 people on the purpose-built
campus at Hinxton. The Centre is a leading partner in the Human Genome Project, is responsible for sequencing
one-third of the human genome, and also contributes to international projects to sequence the genomes of
disease-causing organisms. http://www.sanger.ac.uk/

The EBI is an Outstation of The European Molecular Biology Laboratory (EMBL); it maintains some of the world's
largest databases of DNA and protein sequence data, develops tools to help biologists use it, and is the home of
research groups who are looking for the biological significance of this data. The EBI is also one of the world's most
important centres for bioinformatics training. EMBL is a basic research institute funded by 16 member states,
including most of the EU, Switzerland and Israel. Research at EMBL is conducted by approximately 80 independent
groups covering the spectrum of molecular biology. The Laboratory has five units: the main Laboratory in Heidelberg,
Outstations in Hinxton (the European Bioinformatics Institute), Grenoble (on the campus of the ILL and ESRF), Hamburg
(on the DESY site), and an external research programme in Monterotondo, Italy (sharing a campus with EMMA and the
CNRS). The Laboratory provides essential services to the European scientific community, welcomes a large number of
scientific visitors each year, and has an active international PhD programme. http://www.ebi.ac.uk/

Ensembl integrates and is built on top of data from existing database resources provided by both institutes.
EMBL-EBI is one of the three worldwide repositories for biological sequence data. It houses both the EMBL DNA
database and the SWISSPROT protein sequence database, which are core resources used by researchers worldwide. The
total amount of DNA sequence data deposited doubles every 6 months, while computers only double in speed every 18
months. Functional annotation of genes in Ensembl is provided from Pfam and other protein domain resources combined
in the INTERPRO project.
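
As a rough illustration of the gap described above, the short Python sketch below compares the two growth rates. It
assumes that both doubling periods (6 months for deposited sequence data, 18 months for computer speed) hold steady
over time; the function names are illustrative only and are not part of Ensembl or any EMBL-EBI software.

```python
def growth_factor(months, doubling_period_months):
    """Factor by which a quantity grows if it doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


def data_to_compute_ratio(months, data_doubling=6, compute_doubling=18):
    """How much faster deposited data has grown than computer speed after `months`.

    The default periods are the figures quoted in the text; treating them as
    constant over time is an assumption of this sketch.
    """
    return growth_factor(months, data_doubling) / growth_factor(months, compute_doubling)


if __name__ == "__main__":
    for months in (18, 36, 60):
        ratio = data_to_compute_ratio(months)
        print(f"After {months} months, data has outgrown compute by a factor of {ratio:.1f}")
```

Under these assumptions, 18 months sees the data grow eightfold while computer speed only doubles, so the same analysis
would need four times the computing resources simply to keep pace.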

Ensembl is also integrating emerging data resources that are being generated by post-genomic initiatives, such
as genetic variation projects. These include efforts to find where on the genome single bases differ
from individual to individual (single nucleotide polymorphisms, or SNPs for short).

The SNP Consortium Ltd, a collaborative effort of commercial companies and the Wellcome Trust to create a
freely available genome-wide map of such information, has recently announced the release of a total of 100,000 of
these points. The Sanger Centre has located 45% of these. The SNPs - many of which have been linked to genetic
diseases - have already been integrated into Ensembl and are visible on its web displays.