Abstract

A report on the meeting 'Unravelling Nature's Networks: from Microarray and Proteomic
Analysis to Systems Biology', Sheffield, UK, 21-22 July 2003.

Meeting report

Understanding complex regulatory biochemical networks in cells, tissues and on an
organism-wide scale has become the holy grail of biologists and bioinformaticians
alike. The 'Unravelling Nature's Networks: from Microarray and Proteomic Analysis
to Systems Biology' meeting put together recent experimental and computational advances,
providing a picture of how far we have come in this subject, and outlining ways forward.

Having always been one of the primary goals in biology, the problem of inferring relationships
between entities in biochemical networks has been re-defined by the community each
time a new experimental technique appears. The high-throughput methods of genomics
and proteomics have increased interest in the problems of biological network reconstruction
enormously. It has become possible not only to develop sophisticated, albeit abstract,
models using classical modeling approaches, but also to apply various novel concepts.
One of the most promising examples of such concepts is discovering recurring patterns
in large sets of data, allowing effective and direct analysis of the results of modern
biological experiments.

For a while, the exciting possibilities of large-scale studies led many researchers
to think that general theories could be revealed simply by doing as many large-scale
experiments as possible and looking at the results to find common features. That time
has passed, and several practical as well as theoretical restrictions on high-throughput
methodologies are becoming apparent. It is now well understood and widely accepted
that only a combination of different approaches and different types of data and, more
importantly, the concentrated efforts of experts with different scientific backgrounds
on both 'large' and 'small' scales, can lead us to fruitful results. The predominant
idea now is that a fusion of novel techniques with expert knowledge will help to avoid
what can otherwise happen: the transformation of high-throughput concepts into scattered
attempts to gather a lot of costly experimental information, often with only a vague
purpose.

The opening talk was given by Steve Dower (University of Sheffield, UK), who studies
gene expression in signaling pathways associated with chronic inflammatory diseases
and host defense mechanisms. He showed that, although in general the considered signaling
system is highly robust to changes in activator expression and responds in a linear
fashion even to many-fold overexpression, it consists of 'noisy' stochastic components.
Indeed, he noted that gene expression varies significantly at the single-cell level,
suggesting that the pathway dynamics are governed at a higher level, possibly by the
formation of large multiprotein complexes from signal transduction components.

The combination of experimental and in silico techniques on small and large scales is routine for some laboratories. Rick Livesey
(The Wellcome Trust/Cancer Research UK Institute, Cambridge, UK) described his work
on the modeling of transcriptional regulatory networks involved in mouse forebrain
development. The goal is to develop a means of combining the results of spatial/temporal
arrays of expression of candidate transcription factors with the computational analysis
of putative regulatory regions of co-regulated genes. He made the point that it is
possible to answer some 'simple' questions about the expression of the selected genes
at single-cell resolution in a precise and reproducible way. A great deal of noise
and natural stochasticity is intrinsic to biological systems, and, as Dower described,
the same pathway may produce a wide variety of responses, which differ from cell to
cell. The reproducibility of even 'simple' experimental observations of the expression
of specified genes when comparing individual cells (for example, overexpressed versus
underexpressed) is an exciting development, as this represents progress beyond experiments
dealing with averages at the level of cell populations.

One of the main goals of bioinformatics is still the development of manageable and
accessible banks of biological information. Terry Speed (The Walter & Eliza Hall Institute
of Medical Research, Parkville, Australia) and Patrick Kemmeren (University of Utrecht,
The Netherlands) presented two different approaches to this type of development. Speed's
example was of a classical bioinformatic methodology focused on particular practical
questions such as the experimental problems of how to identify multiple proteins in
complex mixtures and the data mining and acquisition issues in the tandem mass spectrometry.
He presented a scoring algorithm based on estimating factors that influence the gas-phase
fragmentation of protonated peptides, using a recently developed 'relative proton
mobility scale' and evaluated and compared it with several other existing algorithms.
The proposed scale proved to be an effective basis for the automatic classification
and statistical analysis of MS/MS spectra. Taking a 'systems biology' view, Kemmeren
introduced a conceptually simple but promising approach to data management in which
high-throughput biological information from different sources is integrated in order
to derive common observations that would otherwise be hidden or even neglected. The
approach involves storing data from different experiments for subsequent analysis
and in a qualitative form, for example the presence or absence of a particular protein-protein
interaction or a significant or nonsignificant change of gene expression. The underlying
philosophy is that, as high-throughput methods contribute more and more data, a larger
overall picture should become visible, and newly developed methods may help refine
this image.

Mahesan Niranjan (University of Sheffield) moved away from the experimental towards
computational methods. The often-encountered problem in large-scale data analysis
is that the high dimensionality of the data makes statistical inference difficult
and restricts the methods that can be applied. Niranjan reviewed several approaches
for feature subset selection in the context of microarray expression data analysis;
such selection is crucial when it is necessary to determine a subset of genes that
helps to discriminate between different samples or experimental conditions. Using
publicly available Saccharomyces cerevisiae gene-expression data, he showed that it is indeed possible to significantly reduce
the dimensionality of microarray-related classification problems without considerable
loss in discrimination ability.

One of the pitfalls in analyzing high-throughput data is that our intuitions about
the nature of the networks often color both our approach and our interpretation of
the results. The connectivity distributions of various biochemical networks (such
as protein-protein interactions or metabolic networks) have been analyzed, and it
has been suggested that they follow a power law, which is a distinctive mark of so-called
scale-free networks. Scale-free networks have been proven to be one of the fundamental
types of interconnected systems, both real and artificial. A considerable error rate
is inherent in several of the high-throughput datasets, however; for instance, by
some estimates around half of the protein-protein interactions revealed in different
studies simply should not exist in real cells. This indicates that, in the worst case,
our judgment about the presence of a particular interaction may be as good as tossing
a coin, and it also gives rise to doubts about whether the 'scale-freeness' can be
claimed to be a distinctive property of biochemical networks. Alun Thomas (University
of Utah, Salt Lake City, USA) pointed out that it is possible to observe non-scale-free
structures that have properties resembling those of the scale-free ones. Thomas presented
a model for the structure of graphs of protein-protein interactions, derived from
yeast two-hybrid experiments and supported by the data on human protein-protein interactions,
that does not follow power law. The generated interaction graphs were also subsampled,
taking in only a fraction of the connections for each node. It was shown that the
vertex degree distribution of the subsampled graphs indeed looks approximately like
a power law distribution, while not actually being one.

Several speakers at the meeting suggested that one of the ways to analyze high-throughput
data is to take the data piecemeal, adopting comprehensible and interpretable approaches
to studying relationships inside and/or between moderate-sized groups of genes or
proteins, rather than performing system-wide searches. A parallel methodology coupled
with the systems biology approach was introduced by Kwang-Hyun Cho (University of
Manchester Institute of Science and Technology, UK). Cho uses systems of nonlinear
ordinary differential equations to model networks of biochemical reactions in signal
transduction pathways. Acknowledging that such an approach requires a large amount
of clean and trustworthy data in order to estimate parameters of equations in use,
he pointed out that even approximate and general features derived may help to develop
more efficient experimental strategies.

Béla Novák (Technical University of Budapest, Hungary) uses the tools of dynamical
systems theory to create a comprehensive differential-equation model of the Schizosaccharomyces pombe cell cycle. The model suggests that the cell-cycle engine can be broken down into
modules responsible for transitions between the consecutive cycle stages. Novák's
study has a distinct value, as it describes the transitions between all the major
cell-cycle events in a simple way, using only a few key members of the cell-cycle
control system.

The Mathematica® package Cellerator™ presented by Eric Mjolsness (University of California, Irvine, USA) allows modeling
of biological networks at various levels of scope and complexity, ranging from signal-transduction
pathways to cells and tissues, thus forming a hierarchy. The package supports the
translation of chemical reactions, regulatory interactions and physical forces into
differential equations and can run multicellular developmental models based on such
equations. This is an important achievement in creating a complete modeling framework
that can operate simultaneously at the molecular signaling level and at the level
of multiple cells that communicate and affect each other via their physical movements
and dislocations. The methods were illustrated by modeling the shoot apical meristem
of Arabidopsis thaliana.

All in all, the meeting demonstrated that the field is developing quickly and the
means of data acquisition and analysis are still being refined and extended. The task
of reconstructing gene networks is, however, daunting. Jaroslav Stark (Imperial College,
London, UK) posed an important question in his overview of the problem: what are the
limits on network reconstruction with the presently available data and methods? The
current strategy for network elucidation and analysis seems to be not to attempt to
model the entire system at once, because the paucity of data makes the results impossible
to interpret correctly, but to tackle network modules (although identifying these
still remains an open question) with simple hypotheses that are verifiable by existing
experimental techniques. For additional information on the meeting please refer to
the Biochemical Society past meetings http://www.biochemistry.org/meetings/pastmeet.cfmwebcite.