Complementary approaches to understanding
the plant circadian clock

Ozgur E. Akman1,∗ Maria Luisa Guerriero1
Laurence Loewe1 Carl Troein1,21 Centre for Systems Biology at Edinburgh,
University of Edinburgh, UK
2 School of Biological Sciences, University of Edinburgh, UK
∗ Current address: School of Engineering, Computing & Mathematics, University of Exeter, UK

Abstract

Circadian clocks are oscillatory genetic networks that help
organisms adapt to the 24-hour day/night cycle.
The clock of the green alga Ostreococcus tauri is
the simplest plant clock discovered so far. Its many advantages as
an experimental system facilitate the testing of
computational predictions.

We present a model of the Ostreococcus clock in the stochastic
process algebra Bio-PEPA and exploit its mapping to different
analysis techniques, such as ordinary differential equations,
stochastic simulation algorithms and model-checking.
The small number of molecules reported for this system tests the
limits of the continuous approximation underlying differential
equations. We investigate the difference between
continuous-deterministic and discrete-stochastic approaches.
Stochastic simulation and model-checking allow us to formulate new
hypotheses on the system behaviour, such as the presence of
self-sustained oscillations in single cells under constant light conditions.

We investigate how to model the timing of dawn and dusk in the
context of model-checking, which we use to compute how the
probability distributions of key biochemical species change over
time. These show that the relative variation in expression level is
smallest at the time of peak expression, making peak time an optimal
experimental phase marker.
Building on these analyses, we use approaches from evolutionary
systems biology to investigate how changes in the rate of mRNA degradation
impacts the phase of a key protein likely to affect fitness. We
explore how robust this circadian clock is towards such potential
mutational changes in its underlying biochemistry.
Our work shows that multiple approaches lead to a more
complete understanding of the clock.

The daily cycles in sunlight, temperature and other environmental
parameters are highly important to most organisms. To follow and
anticipate these cycles, living cells generate biochemical rhythms with
a period of approximately 24 hours (circadian). The
majority of the known circadian clocks, including those in
eukaryotes, are based on one or more interlocking transcriptional feedback loops
between a set of key genes. Crucial to the function of the clock is its
ability to entrain to environmental signals (i.e. to adjust its
internal rhythm by synchronising with external cycles),
so that the phase of gene expression is maintained under changes to the
length of the day (photoperiod).
Such entrainment acts through various photoreceptor
pathways, where light affects kinetic parameters of the core clock.
In addition to circadian entrainment,
a defining feature of circadian clocks is that they exhibit
continued oscillations in constant light conditions [?].

The circadian clocks of many organisms are organised around complex
feedback loop architectures, making the determination of design principles a challenging
computational problem. Although research has revealed much about the clock of the
foremost model plant organism, Arabidopsis thaliana, there
are still unidentified components and inconsistencies between
computational models and experimental observations [?].
For this reason, to increase our mechanistic understanding of
circadian clocks, it is desirable to investigate simpler systems
that possess functional similarity with more complex networks. The
circadian clock of the green alga Ostreococcus
tauri[?] is the simplest plant clock discovered
so far, and is thus an ideal model system for understanding plant circadian
function with the help of experiments, simulations
and theory. A quantitative model describing the biochemical
reactions of the Ostreococcus clock can serve as a focal
point for this research, yielding a low-dimensional test system
for various mathematical analysis techniques.

Bio-PEPA [?] is a stochastic process
algebra specifically defined to model and analyse biochemical
systems. Exploiting the defined formal mappings of Bio-PEPA models
into a number of equivalent representations, it is possible to
analyse Bio-PEPA models using different mathematical and
computational techniques, including ordinary differential equations
(ODEs), stochastic simulation algorithms (SSAs) and model-checking.

In previous work we used both ODEs and SSAs to model the clock of
the fungus Neurospora crassa, demonstrating that combining
different analysis methods is important for fully quantifying
the relationship between feedback architecture and circadian
behaviour [?]. Here we build
on this approach, applying a broader range of computational
techniques to the Ostreococcus clock. We develop and
analyse a Bio-PEPA model of the clock, focusing on various
stochastic methods which are the most appropriate in this case due
to the low copy numbers characteristic of the system. In particular,
we exploit the automatic generation of PRISM models from Bio-PEPA to
carry out a novel application of the PRISM
model-checker [?] to a circadian model, computing
time-dependent probability distributions for the clock components.
We use the model to quantify the variability and robustness of the
clock’s functional behaviour with respect to the following factors:
(i) internal stochastic noise, the inevitable consequence of a
system comprising a small number of molecules; (ii) environmental
changes, such as photoperiod variations and transitions between
constant light/darkness; and (iii) mutational changes that affect
the biochemical reaction rates of our model, representing
perturbations to the system that occur on an evolutionary timescale.

The rest of our paper is structured as follows. After an overview of
Bio-PEPA in Section 2, the Ostreococcus
clock is introduced in Section 3, followed by
the description of the corresponding Bio-PEPA model. In
Section 4 we analyse the model using
various approaches. We first use stochastic simulation to
investigate how different light conditions affect the oscillations
of the clock. We then explore approaches for modelling light entrainment
in a continuous-time Markov chain (CTMC) before using model-checking
to compute the time-dependent probability distributions of protein levels.
This enables us to identify the phase markers that are most robust to stochastic fluctuations.
Finally we use ideas from a recently developed framework for evolutionary
systems biology [?] to test how mutational changes in
mRNA degradation rate affect the phase of oscillations in comparison to the inherent
stochastic noise that is present at the individual cell level. The
full Bio-PEPA model is given in Appendix A.

Bio-PEPA [?] is a
stochastic process algebra, recently developed for the modelling and
analysis of biological systems. We give here a brief overview of the
main features of the language. For a detailed presentation of
its syntax and semantics, see [?].

The main components of a Bio-PEPA system are the species
components, describing the behaviour of each species, and the
model component, specifying all interactions and initial amounts of species.
The syntax of Bio-PEPA components
is given by:

where S is the species component and P is the
model component. In the prefix term (α,κ) {op} S, κ is the stoichiometry coefficient
of species S in reaction α, and the prefix
combinator “op” represents the role of S in the
reaction. Specifically, \reactant indicates a reactant,
\product a product, \activator an activator,
\inhibitor an inhibitor and \modifier a generic
modifier. The notation α {op} is a shorthand for
(α,κ) {op} S when κ=1.
The operator “+” expresses a choice between possible actions,
and the constant C is defined by an equation C\rmdefS.
The
process P\syncsIQ denotes synchronisation between
components P and Q; the set I determines the
activities on which the operands are forced to synchronise, with
\syncs∗ denoting a synchronisation on all common action types. In
the model component S(x), the parameter x∈\Nat represents
the initial number of molecules S present.
In addition to species and model components, a Bio-PEPA system consists of
kinetic rates, parameters and, if needed, locations,
events and other auxiliary information for the species.

The formal representation offered by Bio-PEPA allows for different kinds of analysis through the
defined mapping into continuous-deterministic and
discrete-stochastic analysis methods (see [?] for details).
More on Bio-PEPA can be found at [?], including two
software tools, the Bio-PEPA Eclipse Plug-in and the Bio-PEPA
Workbench [?]. Both tools process Bio-PEPA
models automatically and either compute time-series results directly
using various SSA or ODE solvers, or generate representations that
can be used by other tools.

Ostreococcus tauri is an exceptionally small green alga with
a highly reduced genome [?]. Experiments and homology searches
indicate that its circadian clock is very simple compared to higher
plants, such as Arabidopsis thaliana. Only
a handful of the clock genes identified in other plants have been found
in Ostreococcus, and only two of these appear to be central to
the clock. The first of these, which we refer to as TOC1, is homologous to
ArabidopsisTOC1 (TIMING OF CAB EXPRESSION 1)
and other PRRs (PSEUDO RESPONSE REGULATORs). The other gene,
here called LHY, is homologous to ArabidopsisLHY (LATE
ELONGATED HYPOCOTYL) and CCA1 (CIRCADIAN CLOCK
ASSOCIATED 1) [?].
An ODE model of the Ostreococcus clock as a negative feedback loop between these two genes
was introduced in [?], where it was applied to drug treatments and other perturbations.
The full model includes details of the luciferase assay used to measure mRNA and protein levels,
but here we use only the central parts of the model, which describe the dynamics of the native
mRNAs and proteins. The model is illustrated in Figure 1.

Figure 1: The genetic regulatory network underlying our model of the
Ostreococcus clock. The network comprises a single negative
feedback loop involving the LHY and TOC1 genes,
augmented by 5 light inputs which synchronise the endogenous
oscillations in gene expression to the day/night
cycle [?].

TOC1 transcription requires light, which is buffered by a “light accumulator” (acc) and is
inhibited by the presence of nucleic LHY protein (LHY_n) for most of the day. TOC1 activates LHY
transcription through an unknown mechanism, proposed in [?] to work as follows: TOC1
mRNA is translated into inactive TOC1 protein (TOC1_i), which is activated slowly during the day but
quickly after dusk. The active form (TOC1_a) drives LHY transcription throughout the night
but is quickly degraded after dawn. LHY mRNA is translated into cytosolic LHY (LHY_c), which
is quickly translocated to the nucleus, thereby closing the feedback loop. Light also accelerates the
rate of LHY degradation.

The model parameters were estimated by fitting simulated time-courses to equivalent data
obtained from experiments over a wide range of light conditions. Some experiments alternated 12 hours of
light and dark (denoted LD 12:12), others used longer or shorter days (such as LD 16:8 or 8:16), and many
included transitions between different conditions, often into constant light (LL) [?].

3.1 A Bio-PEPA model of the clock

A model of the clock as described above was
implemented in Bio-PEPA. Here we describe its main features;
for the full model, including kinetic laws and parameters, see Appendix A.

One of the key issues involved in obtaining a realistic stochastic model is the correct
scaling of the initial concentrations and kinetic parameters of the continuous ODE model in [?]
so as to obtain the respective molecule counts and rate constants for the Bio-PEPA discrete-state model.
Since the absolute values of the initial concentrations are not known, the initial values in the
original ODE model are given in arbitrary relative units. However, the peak number of TOC1 and LHY
protein molecules was estimated experimentally
over a number of free-running cycles
in LL conditions using a TopCount luminometer. From this, approximate initial values for our discrete-state
model were computed, yielding a rough estimate for the scaling factor of Ω=50.
After such rescaling the Bio-PEPA model can be analysed by ODEs and SSAs, which both give results
in molecule counts.

The proteins and mRNAs shown in Figure 1 are
modelled as the following Bio-PEPA species components that describe
the possible reactions they can participate in and how their amounts
are affected by the occurrence of each reaction. Reactions are
associated with functional rates representing the corresponding
kinetic law.
TOC1_mRNA\rmdeftransc3\product+transl5\activator+deg7\reactantLHY_mRNA\rmdeftransc8\product+deg9\reactant+transl10\activatorTOC1_i\rmdeftransl5\product+conv6\reactantLHY_c\rmdefExtra open brace or missing close braceTOC1_a\rmdefdeg4\reactant+conv6\product+transc8\activatorLHY_n\rmdefExtra open brace or missing close braceacc\rmdefprod1\product+deg2\reactant+transc3\activator

For instance, the transcription of TOC1_mRNA is modelled
by reaction transc3, which involves three different
species (TOC1_mRNA, LHY_n, and acc),
and is positively regulated by the light-accumulator acc
and negatively regulated by LHY_n. The kinetic law for
this reaction is given by a Hill function, commonly used for
describing transcription in clock
models [?, ?, ?, ?]:

Here, species names represent molecule counts, tmp_toc1_transcription=L_toc1+acc⋅R_toc1_acc/Ω and L_toc1, R_toc1_acc, R_toc1_lhy and H_toc1_lhy are parameters.
For comparisons with experiments we also defined the observables
Total_LHY=LHY_c+LHY_n and
Total_TOC1=TOC1_i+TOC1_a.

Bio-PEPA functional rates allow the definition of general kinetic laws.
We use this facility to represent the
entrainment of the system to light/dark cycles through the time-dependent
function below:

This allows us to model light-dependent reaction rates by returning
the value 1 in day-time and 0 during night-time. The parameters
tdawn and tdusk give the time of
the day (in hours) at which dawn and dusk occur, respectively;
H(x) is the Heaviside step function that returns 1 for x>0 and 0 otherwise.

Each technique that can be used to analyse Bio-PEPA
models has its particular strengths: ordinary differential equations
(ODEs) easily predict mean values and quantify dynamical changes
in terms of bifurcations, stochastic simulation algorithms
(SSAs) allow variability in the system’s responses to be measured, and
model-checking enables complex queries about the model to be formulated
and verified automatically. Here we analyse the clock model using these three
analysis methods. After briefly describing each method, we explain why it is
better suited for investigating a particular aspect of the system, and report
some of the results obtained.

4.1 Stochastic simulation: population versus single cell behaviour

Following the formulation of Gillespie’s stochastic simulation
algorithm [?], the stochastic analysis of biochemical systems has received increasing attention
due to the impact that stochastic variability can have on system behaviour.
This is particularly relevant for systems such as gene regulatory
networks, where some molecules are present in copy numbers so small that
random fluctuations are too large for the continuous approximation behind
ODEs to be justified.

Within this framework, a single molecule-by-molecule stochastic
simulation run can be viewed as a faithful representation of
behaviour at the cellular level (assuming the underlying model is
accurate). Observing the mean behaviour over a larger number of runs
is then equivalent to observing a population of cells. Most current
experimental techniques only allow population-level assays. However, as progress in high-resolution imaging
techniques reduces the minimum population size that can be measured,
it will also become possible to consider the effect of stochastic noise, which is expected to be more evident in smaller populations.

In the rest of this section we report results obtained
by solving the clock model using the Dormand–Prince ODE solver and
the Gibson–Bruck SSA, both available in
the Bio-PEPA Eclipse Plug-in [?].
We consider three different light conditions: constant dark (DD),
constant light (LL), and alternating light/dark cycles (LD).
We also consider an experiment in which the system is transferred
from constant light into constant dark (LL-DD).
For each of these,
we compare results obtained by numerical integration of the
deterministic model with those obtained by stochastic
simulation.
The initial conditions are those of the original model at dawn following entrainment to 24
hour light/dark cycles (LD 12:12).
Figures 2 and 4 report the
computed time-series behaviours for all settings.

The species of interest are TOC1_mRNA,
LHY_mRNA and the corresponding experimentally observable total protein amounts
(Total_TOC1 and Total_LHY as defined in
Section 3.1).

(a) DD – ODE

(b) DD – average 10000 SSA runs

(c) DD – single SSA run

(d) LL – ODE

(e) LL – average 10000 SSA runs

(f) LL – single SSA run

(g) LL-DD – ODE

(h) LL-DD – average 10000 SSA runs

(i) LL-DD – single SSA run

Figure 2: Comparison of the deterministic and stochastic models in different light conditions.

DD system.

The rapid damping behaviour of the system in constant dark (DD) is shown in
Figure a–c. This
damping is seen in all analysis methods: ODEs, individual SSA
runs, and the mean stochastic behaviour calculated over 10000 independent
SSA runs. Despite the fact that the self-sustained oscillations observed by
averaging over the SSA runs stop very quickly (within 1-2 days), when
looking at individual SSA runs we note that occasionally a non-zero number
of molecules can be briefly observed, even after several circadian cycles
(Figure c).

LL system.

In constant light (LL), the deterministic system
also exhibits damped oscillations, with all species tending to non-zero
constant values after about 7-8 days (Figure d). Similar behaviour
is observed by averaging over 10000 SSA runs
(Figure e), though the oscillations damp more
rapidly and the steady-state value is slightly different (e.g. the
LHY copy number is about 130 in ODEs compared to about 160 in the SSA).
However, a 10-fold increase of the scaling factor to Ω=500 yields a perfect
quantitative agreement between the SSA and ODEs (see Figure 8
in Appendix C). Although the precise source of the discrepancy
is not known at present, we hypothesise that it is caused by a breakdown of the continuous
approximation underpinning ODEs, due to the low copy numbers
obtained at small Ω values.

The other, most notable, difference between the deterministic and stochastic
models is the behaviour of individual SSA runs (Figure f), for which
persistent irregular oscillations (in both phase and amplitude) are observed.
Because of phase diffusion effects, however, these oscillations cannot be
detected in the mean behaviour: hence, in this case, neither the simple
average over multiple SSA runs nor the ODE solution gives us a correct
indication of the real behaviour of the system,
emphasising the importance of observing single realisations.

In view of these findings, we hypothesise that single cell
experimental data may exhibit sustained oscillations, whereas the behaviour
observed in a large population of cells would be closer to the rapid damping reproduced by both the
ODE and SSA average. This is due to the fact that free-running oscillations are unlikely to be synchronised over different cells.
Furthermore, visual inspection of the solutions obtained by averaging over different numbers of SSA runs suggests
that it should be possible to discern the stochastic effects with a population of around 100 cells. The
experimental data currently available for this system are at the
level of a population comprising at least 10000 cells. However, we anticipate that
the development of new experimental techniques for measuring gene expression
in single cells or small populations will enable this hypothesis to tested
experimentally in the not-too-distant future.

LL-DD system.

As an additional experiment, we consider a system which is kept in constant
light (LL) for 160 hours, and then transferred into constant dark (DD). The
time-series results are reported in Figures g–i.
It can be seen that single realisations of the SSA — approximating the
behaviour of individual cells — exhibit immediate cessation of
self-sustained oscillations following the LL-DD transfer. This behaviour can
be understood by considering the fixed points of the deterministic model.

The LL fixed point is located far from the origin in phase space and is of the stable focus type. Trajectories
of the ODE — approximating the behaviour of a large population of cells — spiral around the
fixed point as they converge to it, producing slowly damping oscillations. In the corresponding
stochastic model, fluctuations kick individual realisations of the system between these spiralling
trajectories, thereby preventing the system from remaining close to the
fixed point for long periods (see Figure 3). This leads to the irregular self-sustained
oscillations observed. By contrast, the DD fixed point of the ODE system
is located at the origin. As species concentrations must be positive, the fixed point cannot be a stable
focus, and is instead a stable node. Trajectories of the ODE converge directly onto it, generating
oscillations that quickly damp to zero. Individual
realisations of the stochastic model are thus repeatedly perturbed between rapidly convergent
trajectories. Consequently, they quickly approach
the DD steady-state following the LL-DD transition, remaining in its vicinity
thereafter (see Figure 3).

The model thus predicts that the DD behaviour of the LL-DD system at
the single-cell level mirrors that at the population level. If
further experiments were to reveal that self-sustained oscillations
are observed during the DD phase, this would indicate that the model
requires modification to convert the DD fixed point into a stable
focus bounded away from the origin.

Figure 3: Phase-space for the LL-DD transfer experiment. White and grey
dots indicate the fixed points (steady-states) of the deterministic system
in LL and DD conditions, respectively. The eigenvalues of the model obtained by
linearising the deterministic equations around a fixed point determine the behaviour of the system in
its neighbourhood. These are listed for both fixed points in Table 2 of Appendix C.

LD system.

The light conditions considered so far are experimental settings useful
for observing the system’s endogenous dynamics. It
is also informative, however, to observe the behaviour of the clock under
natural conditions (alternating 24-hour cycles of light and dark). We present
results obtained for three different photoperiods: 6 hours
light/18 hours dark (LD 6:18), 12 hours light/12 hours dark (LD
12:12), and 18 hours light/6 hours dark (LD 18:6).

As described in Section 3, exposure to periodic
external stimuli such as light/dark cycles has the effect of
resetting the free-running oscillations observed in constant light,
so as to establish stable phase relationships with the forcing
stimulus.
Compared with the free-running LL system, the entrainment to LD
cycles regularises the dynamics of the system, markedly reducing the
variability of oscillations, particularly in terms of phase. As a
consequence, persistent regular oscillations with a stable phase
relationship to the light/dark cycle are observed in both ODEs
(Figures a, d,
g) and when averaging over multiple SSA runs
(Figures b,
e, h).
This phase regularisation can also be seen in individual SSA runs of
the entrained system (compare Figures c,
f, i) with the
simulations of the free-running clock in
Figure f). These observations are
consistent with previous stochastic analyses of clock
models [?, ?]. We also
note that, as for the LL system, the deterministic and mean
stochastic behaviour, whilst very similar, are not in perfect
agreement.

(a) LD 6:18 – ODE

(b) LD 6:18 – average 10000 SSA runs

(c) LD 6:18 – single SSA run

(d) LD 12:12 – ODE

(e) LD 12:12 – average 10000 SSA runs

(f) LD 12:12 – single SSA run

(g) LD 18:6 – ODE

(h) LD 18:6 – average 10000 SSA runs

(i) LD 18:6 – single SSA run

Figure 4: Comparison of the deterministic and stochastic models for different photoperiods.

4.2 Model-checking: time-dependent probability distributions

Model-checking [?] is a formal
verification method that allows modellers to state properties of a
given model and to automatically check whether they are met. Probabilistic
model-checking (see, for instance,
[?, ?]) adds probabilistic measures in the
evaluation of queries. Recently, model-checking has grown in
popularity within the field of systems biology due to its ability to
directly answer complex questions on a model’s behaviour. Traditional model-checking
verification differs from simulation-based analysis in that the verification of a
property is obtained from a computation over the entire
state-space of the continuous-time Markov chain (CTMC) underlying
the model. The major drawback of this approach is
the state-space explosion problem: the
model’s dimension is often too large for computational viability.

Statistical model-checking (see, for instance,
[?]) is an alternative query-based approach: it
estimates the probability distributions and computes approximate
results of queries (together with an estimate of the
error) by generating
random realisations of the CTMC and averaging the results obtained
by evaluating the queries on each of them. The advantage of
statistical model-checking over exact verification approaches is
that it does not need to build the explicit state-space of the
model, which is often intractable, and it does not rely on the
transient solution of the CTMC.
In essence, statistical model-checking is a verification technique
which allows modellers to perform additional analyses of a stochastic
system by automatically evaluating queries over multiple simulation
traces.
The obvious drawback of statistical model-checking
is that it only considers a finite number of behaviours
of the system (i.e. paths in the CTMC) and, hence, the accuracy of the results is strongly
related to that number.
However, exact verification of
biological systems is generally infeasible, and statistical
model-checking often represents a good practical solution.
Another issue of probabilistic model-checking is that the transient
solution of the CTMC can incur the same averaging effect discussed
previously relating to ODEs and mean SSA behaviour: computing the
expected value of the model variables might not be sufficient
because this would be exactly the same as the deterministic
behaviour. Results of reward-based properties, for instance, are
computed in terms of expected values and this, especially in the
case of oscillations which are out of phase, does not give
satisfactory results.

PRISM [?] is a probabilistic model-checker, which
can be used to verify properties of a CTMC model. It also includes a
discrete-event simulator
for statistical
model-checking. PRISM has been used to analyse systems from a wide
range of application domains, and recently also biochemical systems [?].
Models are described using the state-based PRISM language, and it is
possible to specify quantitative properties of the system using a
property specification language which includes the temporal logic
CSL (Continuous Stochastic
Logic) [?, ?].

Using the Bio-PEPA Workbench [?], we generated a
PRISM model of the clock (together with a set of reward structures and some
standard CSL properties which are automatically generated).
In the PRISM model, one module is defined for each species, and
module local variables are used to record the current quantity of
each species. The transitions correspond to the activities of the
Bio-PEPA model and the updates take the stoichiometry into account.
Transition rates are specified in an auxiliary module which defines
the functional rates corresponding to all the reactions. In order to
have a finite CTMC, lower and upper bounds are defined for each
variable.
In the following, we
focus on statistical model-checking: in this case, the choice of the
values for the bounds has no effect on the performance of the
analysis (since the CTMC is not built) and so we can use arbitrarily
high values that are guaranteed not to be reached.

Modelling the light in PRISM.

In order to model the entrained clock (LD system), time-dependent
events must be represented. However, because of the intrinsic nature
of model-checking algorithms, which involve the numerical solution
of the CTMC underlying stochastic models, deterministic events and
time-dependent functions cannot be explicitly specified in PRISM.

We investigated several approaches to address this problem. A first
possibility is to split the model-checking algorithm into different
(two or more, depending on the time window we are interested in)
analysis steps over different time intervals, each with constant
light conditions: two different CTMCs would be considered (one for
the day-time system and one for the night-time one) with the
algorithm switching back and forth between them. The main issue with
this approach is how to merge the results obtained over the
different time periods. For some particular queries, such as those
relating to reachability, this can be done by splitting them into a number
of subqueries such that the result (i.e. probability distributions)
of one query can be used as the initial state for the next one. For
an arbitrary CSL query, however, this cannot be done, and this
strongly limits the kind of queries that could be verified.

The alternative approach we consider here is to represent the light
by approximating time using a monotonically increasing stochastic
variable. The main issue of this
approach is that we introduce an additional stochastic effect which
is absent in the system we have described so far (i.e. where light
is modelled as a deterministic on/off switch). In practice this does
not matter, provided that the stochastic variability introduced is
kept smaller than the variability of any experiments the model may
be compared to.
The introduction of an
additional variable to model time also causes an increase in the
state-space, but this is not an issue here since we focus
on statistical model-checking only. An extract from the PRISM model
showing how we model time and the day/night switch is provided in
Appendix B.

Time-dependent probability distributions of protein levels.

Using model-checking we can compute the time-dependent probability
distributions for each of the model species. For instance, by
verifying the CSL property

\Prob\Eventually[T,T](LHY_c+LHY_n=i)

for time instant T∈[0,96] and protein level i∈[0,500], we can observe how the probability distribution for LHY
protein changes over time during the first 96 hours of simulation.

(a) DD

(b) LL

(c) LD 12:12

Figure 5: Probability distribution of total LHY level over the first 4 days (0–96 hours, 10000 runs). The heatmap represents the probability distribution: a darker colour corresponds to a higher probability. Blue lines show average and standard deviation (μ±σ). Similar changes in distribution with time were
obtained for TOC1 protein.

Each of the plots in Figure 5
refers to a different light condition (DD, LL, and LD 12:12),
showing how the probability of being at a particular level changes
over time in each case. The plots also report the mean value μ
and the standard deviation σ of LHY expression. In all cases,
the initial amount is LHY=200, and then the probability mass
gradually moves away from this initial value. In DD, as expected
from the results of Section 4.1, the bulk of the probability
mass rapidly moves close to zero. In LL we can clearly
observe the effect of phase diffusion, resulting in a probability
distribution spread almost equally across a wide range of values. By
contrast, in LD we are able to observe clear oscillations in the
probability distribution; we also note that the amplitude of peak
expression is more variable than that of trough expression,
with a much broader spread of the probability mass around the mean.

In the following, we focus on the case of alternating light/dark
cycles (LD 12:12). In Figure 6(a) we
report the probability distribution for LHY protein, together with
its mean μ and standard deviation σ in a single 24-hour
cycle (from 120 to 144 hours).
Figure 6(b) plots the oscillation in
the corresponding coefficient of variation cv=σμ.
This provides a normalised measure of the sensitivity of the LHY protein
oscillation to stochastic fluctuations as a function of
circadian time. Small values of cv, therefore, correspond to robust
phase markers (a high signal-to-noise ratio) and large values
poor phase markers (a low-signal-to-noise ratio).
Commonly used phase measures in circadian research are the times of
peak and trough expression together with the time at which the
oscillation falls to its midpoint level. It can be seen in
Figure 6(b) that the coefficient of
variation is minimal around the peak, suggesting that the latter is
the best of the standard phase markers for analysing experimental LHY data.

(a) LHY – probability distribution and μ±σ

(b) LHY – coefficient of variation cv=σ/μ

Figure 6: Probability distribution of LHY level over one day (120–144 hours) in LD 12:12 (10000 runs). The distribution of TOC1 exhibits a qualitatively similar variation.

As discussed in Section 4.1, for the DD
system, both the deterministic solution and SSA average quickly attain a constant value.
Species amounts can, however, be greater than zero for short time intervals in individual SSA runs (see
Figure c). The following CSL property
computes the probability that the total LHY level remains
in the range [0,e] between 96 and 500 hours.

\vspace∗−.2ex\Prob\Globally[96,500](LHY_c+LHY_n≤0+e)

The results reported in Table 1 quantify the
observed trend, showing that there is a small probability that LHY
exceeds e for 0≤e≤20, and that this probability decreases with increasing e.

4.3 Distribution of Mutational Effects: robustness analysis

Thus far we have investigated aspects of the inherent random noise
caused by the small size of the system and the discrete nature of its
components. We have also explored the consequences of variations
in the light environment. We now consider the impact of mutational noise
caused by DNA changes that alter
reaction rates by affecting the
structure of corresponding proteins.
This analysis differs from the above in that it requires simulation of vast numbers of different parameter combinations.
We do this in the context of a recently developed framework for evolutionary
systems biology that describes how the effects of DNA changes propagate
through various levels of organisation and abstraction until they impact fitness
at the highest level of biological functionality [?].
We limit our analysis to what has been described as the “third” level of the adaptive
landscape [?], which maps a combination of biochemical reaction
rates to a computable system-level property likely to affect fitness via a quantitative mechanistic model.

Figure 7: Robustness analysis showing how inherent stochastic
noise and mutational noise affect the peak phase of total TOC1 in 4 sets of 10000 runs. Here, noise is measured by the width of the corresponding distributions in phase values, where wider widths indicate greater noise.
Plots (a–c) are dominated by the high internal stochastic noise observed in single-cell simulations (Ω=50). Plots (d–f) observe population averages as computed with Ω=50×106 and show virtually no internal stochastic noise. The first column (a,d) plots the behaviour of
the wild-type, the second (b,e) adds mutational
noise by changing the mRNA degradation rate by a factor drawn from a uniform distribution [0.5,1.5], where 1.0 is the wild-type. The third column (c,f) shows how phase depends on mutational effects on mRNA degradation.

We generated a StochKit [?] model using
the Bio-PEPA Workbench [?] in
order to build on the code base of an analysis framework that has previously
been used to investigate the evolutionary systems biology of
a different, highly simplified circadian clock system that lacks entrainment [?]. We added a class for computing phase in particularly noisy circadian clocks, using two thresholds to block stochastic noise from generating
artificially short ‘cycles’ with very low amplitude. Thresholds were set at 20% and 35% of the distance between the highest peak and lowest trough
observed over 10 days, since minima are less variable than maxima (see Figure f).
We used the peak of total TOC1 as a phase marker since our model checking results showed peaks to have higher signal/noise ratios (Figure 6(b)).

We consider the parameter combination used in the rest of the paper to be the wild-type and
introduce mutational noise by multiplying degradation rates of all mRNAs in the system (deg7 and deg9 in
Figure 1) by a uniformly distributed factor ([0.5,1.5], where 1.0 represents the wild-type). To differentiate between internal stochastic noise and mutational noise we ran 40000 simulations in
two sets, each analysing 10000 time-courses for the wild-type and 10000 for mutants.
In the first set (Figure 7(a–c)), the system size Ω=50 corresponds to that
of a single cell. The resulting noise is substantial
as indicated by the large width of the distribution of observed phase values
(Figure 7(a)).
In the second set of simulations (Figure 7(d–f)), we increased Ω a
million fold. This results in an excellent approximation of corresponding time-courses produced by ODEs; these
simulations are thus denoted as ‘mean’ here.
The resulting internal stochastic noise
is minimal (see the extremely narrow distribution in Figure 7(d)).

Our ‘mean’ results show
how phase is affected by mutational changes in the mRNA degradation rate (Figure 7(d,e)). If the same mutational changes are introduced in the noisy single cell system, they are much more difficult to detect (see
Figure 7(a,b)). As a different way of visualising results,
Figure 7(c,f) plots the high-level consequences of
mutations (phase) against their low-level effect (the factor affecting mRNA
degradation). The resulting graphs for the ‘mean’ system show clearly how a
change in mRNA degradation rate is expected to affect the mean change in
phase (Figure 7(f)).
Figure 7(c) shows the corresponding plot for single
cell simulations. It demonstrates that a wave of TOC1 peaks starts around
the time of dusk (18h in these simulations) and continues for a few hours.
The time at which the wave starts is minimally affected by the mutations we
investigated, in stark contrast to the variance, which is much lower for
higher mRNA degradation rates. In other words, a lower rate of mRNA
degradation leads to greater internal stochastic noise. This increase in
variance explains why the average phase is strongly shifted towards later
hours for smaller mRNA degradation rates in the ‘mean’ system
(Figure 7(f)), since the mean is strongly affected by
large values in skewed distributions as found in (Figure 7(c)).
Taken together, these results demonstrate that a higher mRNA degradation rate will on average move the phase forward
and make it more reliable (decrease the phase variance), whereas a decrease will move it backwards and make it less reliable.

These subtle patterns in the mutational robustness of the clock could not have been uncovered without running large numbers of simulations with varying parameter combinations. Such work requires a different infrastructure for data analysis from projects that analyse only a few parameter sets.

We have studied the circadian network of Ostreococcus tauri
by developing a process algebra model of the clock, based on an
existing deterministic representation that was parameterised according to
quantitative experimental data. We have investigated several key
aspects of the clock, such as the conditions necessary for
persistent oscillations in its constituent genes and proteins, as well as
the effect of different environmental and mutational changes on the
phases of these oscillations. We used the Bio-PEPA stochastic process
algebra as a modelling language and applied a range of the analysis
methods supported. Because of the low copy numbers of the
molecular species involved in the clock network, we focused on
stochastic analysis methods which enable the system’s intrinsic
variability to be observed.

In particular, we used stochastic simulation to explore how the clock
responds to changes in the light environment, and compared the results
obtained against the behaviour of the corresponding deterministic system. We
predict that the qualitative behaviour of the free-running (LL) clock will
be dependent on the size of the cellular population; while damped
oscillations will be observed in large populations (simulated by the SSA
average and the deterministic model), self-sustained oscillations may be
detectable in single cells (simulated by individual runs of the SSA).
Model-checking was further used to investigate how the variability of the
clock’s behaviour changes over a circadian cycle. By computing
the time-dependent probability distributions of the
clock proteins, we identified the time of peak expression as the most robust
phase marker, suggesting its use as an
experimental measure.

Finally, we added mutational noise to our system by
randomly changing the overall rate of mRNA degradation and observing
how this affects the phase of the oscillations of a key clock
protein, likely to have an impact on fitness. We found that the large
amount of stochastic noise at the single cell level makes it hard to
observe functional changes that may be induced by mutations, without
averaging over many observations.

A number of the novel hypotheses we have formulated in this
modelling study may provide new biological insights into the behaviour of
the Ostreococcus clock and will hopefully inspire subsequent experimental
research. In addition, further theoretical work can build on the novel model-checking results
reported here, to explore additional ways in which systems biology models can be
automatically analysed using approaches based on concurrency theory. Our results
demonstrate that the integration of different computational techniques is critical for fully
quantifying the architectural [?] and mutational [?] robustness
of the circadian clock.

Acknowledgements

The authors thank Gerben van Ooijen for TopCount data and
Jane Hillston and Andrew Millar for their helpful
comments. The Centre for Systems Biology at Edinburgh is a Centre
for Integrative Systems Biology (CISB) funded by BBSRC and EPSRC,
ref. BB/D019621/1. CT is supported by The International Human Frontier Science Program Organization.

Species components:LHY_c\rmdefExtra open brace or missing close braceLHY_mRNA\rmdeftransc8\product+deg9\reactant+transl10\activatorTOC1_a\rmdefdeg4\reactant+conv6\product+transc8\activatorTOC1_mRNA\rmdeftransc3\product+transl5\activator+deg7\reactantacc\rmdefprod1\product+deg2\reactant+transc3\activatorTOC1_i\rmdeftransl5\product+conv6\reactantLHY_n\rmdefExtra open brace or missing close brace

Appendix C Additional simulation and analysis results

Figure 8 shows the comparison between the ODE results
and the mean SSA behaviour with scaling factors Ω=50 and
Ω=500. We observe a slight difference between the SSA results
with the different scaling factors. The reference value is Ω=50
(estimated from experimental protein counts), and consequently the predicted
biological behaviour is that shown in
Figure b. Note, instead, that the ODE
results (Figure a) agree perfectly with the SSA
results for the larger value Ω=500
(Figure c).

Table 2: Eigenvalues of the linearised ODE system about the DD and
LL fixed points.

Table 2 lists the eigenvalues of linearisation at the fixed points of the deterministic model. These determine the dynamics in the vicinity of the steady-states [1]. All eigenvalues of the DD fixed point are negative and real, identifying it as a stable node [1]. The LL fixed point
retains 3 of the DD eigenvalues; the remaining ones comprise two complex conjugate pairs with negative real parts. The steady-state is therefore a stable focus [1]. The positions of the fixed points were estimated using the Nelder-Mead simplex algorithm [2], as implemented in the MATLAB routine fminsearch. Derivatives were computed analytically using the MATLAB Symbolic Math Toolbox.