Public Release: 18-Jun-2014
Scientists about sequencing data: We drown in data but thirst for knowledge

IMAGE: This is a photo of Associate Professor Jan Baumbach, University of Southern Denmark.

Credit: University of Southern Denmark

Genomic information is becoming available at a rapidly increasing pace. Yet knowledge about how microorganisms interact with their surroundings, infect hosts and alter their molecular programs in response to changing environmental conditions largely cannot be deduced from genomic data alone, researchers from the University of Southern Denmark claim. This raises questions regarding the value of newly sequenced species.

The researchers have analyzed the genomes that are available from the past 20 years of sequencing bacterial DNA. They tried to use this pile of data to answer a simple question: Can one distinguish between pathogenic and non-pathogenic bacteria based on their DNA content alone?

No valuable knowledge about dangerous bacteria

They found that this is not possible in several cases: the data cannot be used to make such a simple but extremely important distinction. So why should we bother collecting even more data of this kind? That is the question Associate Professor Jan Baumbach and his doctoral student Eudes Barbosa, from the Department of Mathematics and Computer Science at the University of Southern Denmark, now ask in a new study.

Almost 3,000 bacterial species have been sequenced so far. Another 24,000 sequencing projects are presently under way, and there are numerous additional projects on sequencing many more organisms from all kingdoms of life.

"One may ask for the value of all this", the researchers say.

Their research results now show that, when it comes to bacteria, science cannot count on getting useful information about pathogenicity from DNA sequencing.

"Should we continue to sequence the DNA of bacteria on such a large scale? Maybe some of the effort and resources could be spent better", say Baumbach and Barbosa.

Proteins provide more valuable knowledge than DNA

Together with colleagues from the Max Planck Institute for Informatics in Germany and the Bioinformatics Department at the Federal University of Minas Gerais in Brazil, the two researchers performed in-depth investigations of 240 whole-genome DNA sequences from actinobacteria, one of the oldest clades on earth. The clade covers species of high medical relevance, such as Corynebacterium diphtheriae (causing diphtheria), Mycobacterium tuberculosis (tuberculosis) and Mycobacterium leprae (leprosy). On average, their genomes have around three million base pairs and five thousand genes.

Since the first bacterial genome, that of Haemophilus influenzae, was sequenced in 1995, researchers have deciphered several thousand species and about 50 million genes. In total, we know about ten thousand species of bacteria and bacteria-like archaea, but it is estimated that there are many more. Conservative estimates suggest well above 100 million.

The University of Southern Denmark researchers emphasize that they are not opposed to DNA sequencing as a scientific tool. One should simply be aware of its limited value for important follow-up questions, such as pathogenicity, virulence and infectiousness.

"We drown in data but starve for knowledge", Jan Baumbach says and continues:

This allows for measuring the activity of the genes under a specific condition (after infection, for instance) rather than their mere occurrence, which turns out to be uninformative, at least for bacterial infectivity.

"Such data can be expected to carry more information than the DNA sequence alone, and it can be used to illuminate the interplay of genes, as they do not act in isolation but in an orchestra," the bioinformatics group leader explains.

The important aspects of disease-causing bacteria are found in the activity of their genes, not in their DNA sequence.

"It's like a plane crash. The color of the plane does not matter. What matters is unraveling the parallel sequence of activities that led to the accident," says Eudes Barbosa.

###

Ref: Briefings in Functional Genomics: On the limits of computational functional genomics of bacterial lifestyle prediction.

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.