Structured computer representations of biomedical data:
application to the ribosome

Abstract

The information explosion that is gripping molecular biology has challenged
our traditional mechanisms for the collection, storage and analysis of
experimental data. In particular, it is becoming more difficult to create
explanatory and predictive models that are consistent both internally and
with the huge volumes of published data. The difficulty increases when a
large variety of heterogeneous experimental approaches are used to gather
data from multiple perspectives. A central strategy for managing this
information overload is the creation of technologies that store and
represent these data in novel ways. To facilitate computational processing
of data, it is especially critical to develop standardized, structured
formats for representing biological data.

Most biological experiments have no standardized template for reporting
their results. These results are still predominantly disseminated in
published texts, accompanied by figures and tables for summary and
convenience. While this format is useful for knowledge extraction by readers
on a per-article basis, it does not allow for efficient integration of all
data relevant to a particular topic, and it is not amenable to
computer-based data extraction for the purposes of further computations on
these data.

To show the value of structured representations of data in dealing with these
critical issues, we have built a prototype knowledge base (RiboWEB) of
structural data pertaining to the small (30S) ribosomal subunit of E. coli.
Diverse types of data, taken principally from published journal articles,
are represented using a set of templates within this knowledge base, and
these data are linked to each other with numerous and rich connections. Not
only does this representation allow for easier and more convenient data
retrieval by human users, but it also facilitates automated data analysis by
computer programs. We believe that formal representations of the data and
models within scientific subdisciplines hold promise as a key method for
delivering the next generation of scientific data resources, and that they
represent the way in which scientific data should be published in the
future.
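
As an illustration only, the idea of storing results in templates and linking
them to one another can be sketched in Python. The class names, fields and
placeholder values below are assumptions made for this sketch; they do not
reflect the actual RiboWEB schema or any real experimental result.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Reference:
    """A published article that reports one or more observations."""
    ref_id: str
    citation: str


@dataclass
class Component:
    """A molecular component, e.g. a protein or an rRNA region (hypothetical)."""
    name: str
    kind: str


@dataclass
class Observation:
    """One experimental result stored in a fixed template, not free text."""
    technique: str
    components: list[Component]
    source: Reference
    details: dict[str, str] = field(default_factory=dict)


class KnowledgeBase:
    """Holds observations and answers simple queries over their links."""

    def __init__(self) -> None:
        self.observations: list[Observation] = []

    def add(self, obs: Observation) -> None:
        self.observations.append(obs)

    def about(self, component_name: str) -> list[Observation]:
        # Follow the observation -> component links to gather all relevant data.
        return [
            o for o in self.observations
            if any(c.name == component_name for c in o.components)
        ]

    def techniques_for(self, component_name: str) -> set[str]:
        # Which experimental approaches have reported on this component?
        return {o.technique for o in self.about(component_name)}


# Placeholder entries only; none of these values are real data.
kb = KnowledgeBase()
ref = Reference("ref-1", "placeholder citation")
comp_a = Component("component-A", "protein")
comp_b = Component("component-B", "rRNA region")
kb.add(Observation("technique-X", [comp_a, comp_b], ref, {"note": "placeholder"}))
kb.add(Observation("technique-Y", [comp_a], ref))

print(sorted(kb.techniques_for("component-A")))
```

Because every observation uses the same fields and points to shared component
and reference records, a program can collect all data on a given component
across articles and techniques without parsing any article text.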