D-Lib Magazine, September 1998

ISSN 1082-9873

Bioinformatics

"An unprecedented wealth of data is being generated by genome sequencing projects and other experimental efforts to determine the structure and function of biological molecules. The demands and opportunities for interpreting these data are expanding more than ever."

By IJsbrand Jan Aalbersberg, Publishing Research and Technology, Elsevier Science

IJsbrand Jan Aalbersberg has a background in theoretical computer science and information retrieval.

It has been many years since computers were used only for the storage and straightforward processing of simple tabular data. New technologies have enabled an increase in the complexity of data types, of data storage -- and especially of data processing. In the domain of digital libraries, text and image searches are everyday examples of this increase in complexity. However, the digital library domain also contains quite a few other, more specialized examples that many of us will never encounter.

Molecular biology databases -- mostly consisting of gene and protein sequences -- are examples of digital libraries that have increasingly complex applications associated with them. The development and use of these applications (i.e., the techniques, algorithms, and tools to analyze, compare, and classify the data in those biological databases) take place in the field of bioinformatics. This emerging discipline is strategically located "at the frontier between biology and computer science, impacting medicine, biotechnology, and society in many ways" (Bioinformatics, page xi).

This book by Pierre Baldi and Søren Brunak presents a well-written (though technical) and complete overview of machine-learning techniques (e.g., neural networks and hidden Markov models) and their applications in the molecular biology domain. The authors concentrate on the more recent developments, but also cover earlier work from the short history of bioinformatics where relevant. Theory and examples are both provided, although the examples could be worked out more extensively to guide the reader through the technically complex material. (Clearly, such an elaboration would have increased the already considerable size of the book, which runs to more than 350 pages.) The reference list contains more than 450 references. It is a sign of the times and of the evolution of the medium that the twelfth and final section provides a 15-page list of URLs that are relevant to bioinformatics.

Still, the book has difficulty defining its audience. Since bioinformatics concerns the application of advanced computer science and probability theory to the most recent discoveries in molecular biology, the question arises: what basic knowledge should the reader have? The technical prerequisites mentioned are "basic calculus, algebra, and discrete probability theory" at an undergraduate level, while no prior knowledge of molecular biology is required (page xiii). The cover even claims that the book can be used as an introduction to bioinformatics, by both researchers and students whose primary background is in only one of those areas. It is my opinion that at least some knowledge of molecular biology is required for a proper understanding of the (sometimes rather dense) material presented, and that some experience in applying probability theory would be very helpful as well.

In conclusion, Bioinformatics: The Machine Learning Approach is a very valuable reference work and update on developments in bioinformatics, both for those who are already working in this domain and for those who wish to do so. It is, however, less appropriate as a general introduction to bioinformatics for interested and literate non-specialists.