Abstract

Flower patterning is determined by a complex molecular network but how this network functions remains to be elucidated. Here, we develop an integrative modeling approach that assembles heterogeneous data into a biologically coherent model to allow predictions to be made and inconsistencies among the data to be found. We use this approach to study the network underlying sepal development in the young flower of Arabidopsis thaliana. We constructed a digital atlas of gene expression and used it to build a dynamical molecular regulatory network model of sepal primordium development. This led to the construction of a coherent molecular network model for lateral organ polarity that fully recapitulates expression and interaction data. Our model predicts the existence of three novel pathways involving the HD-ZIP III genes and both cytokinin and ARGONAUTE family members. In addition, our model provides predictions on molecular interactions. In a broader context, this approach allows the extraction of biological knowledge from diverse types of data and can be used to study developmental processes in any multicellular organism.

INTRODUCTION

The flower is a complex structure that develops in a precise and reproducible manner within a given plant species. It is divided into functionally distinct areas of cells characterized by specific combinations of gene expression, and these molecular steady states define cell fates and identities. This patterning is determined by a complex network of molecular interactions called the molecular regulatory network (MRN), which acts at the cellular level. How this network functions and influences flower morphogenesis is a central question in developmental biology and remains to be elucidated.

In this article, we used a new integrative approach to build the MRN of sepal primordium development. During the past two decades, many of the key regulators involved in floral organ development in Arabidopsis thaliana have been identified, some of them being shared by other lateral organs. Gene expression patterns, genetic interactions, loss-of-function phenotypes, and the transcriptome have been extensively analyzed (reviewed in Sablowski, 2009; Irish, 2010; Nag and Jack, 2010; Wellmer and Riechmann, 2010). In addition, many articles have reported evidence of direct molecular interactions between proteins or between proteins and nucleic acids. Paradoxically, this ever-increasing amount of data has not proportionally increased our comprehension of the MRN underlying development. This is not only because data are still lacking but also because each piece of work is largely disconnected from the rest. Indeed, it is not certain that the literature is free of internal contradictions that might preclude the construction of a coherent MRN model.

A powerful approach to integrating data and testing their coherence is the use of mathematical and computational modeling. The concept of an MRN, where all molecular interaction data are integrated into a single graph, is particularly pertinent. Using models derived from such graphs, one should be able to perform simulations that reproduce experimental observations and predict the effects of selected perturbations, to anticipate unobserved phenomena, as well as to provide new directions for biological experiments. One approach, originally proposed by Kauffman (1969), is based on Boolean logic where the state of each gene in the network is either on or off. Boolean approaches have previously been used to analyze the small gene network that regulates the establishment of flower organ identity (Mendoza and Alvarez-Buylla, 1998; Mendoza et al., 1999; Espinosa-Soto et al., 2004; Alvarez-Buylla et al., 2008). However, no attempt has been made to extend this type of approach to other aspects of flower development. In addition, existing models tend to consider both direct (molecular) and indirect interactions. However, indirect interactions in fact reflect the observed behavior of the system, rather than being structural or functional features of it. For this reason, we chose to use these interactions differently: Molecular interactions are used to construct the interaction graph (direct interaction between any type of molecule, protein–protein, protein–DNA, RNA–RNA, etc.), whereas indirect interactions are the emergent behaviors that serve to validate the model. Genetic interactions can be direct or indirect; we considered them as being indirect unless evidence exists suggesting they could be direct, in which case we simply represented it as a molecular interaction. We therefore use the term MRN, since each interaction is meant to be of molecular nature, in contrast with the gene regulatory network, where different types of interaction data are combined in a single network.

We set out to construct a comprehensive model of the MRN that governs sepal primordium development at floral stage 3, when it begins to emerge from the flanks of the floral meristem. An MRN model is composed of an interaction graph that represents all possible molecular interactions and of a molecular regulation mechanism that determines how each element will be activated or repressed. It must be stressed that our model does not include explicitly spatial interactions between cells, but the interaction graph implicitly includes spatial phenomena (e.g., auxin accumulation). The approach consisted of the following steps: (1) inventory of known interactions among elements involved in early flower development; (2) inventory of expression data for these elements and representation of these patterns as a virtual flower atlas with cellular resolution, which was then used to identify the 16 molecular steady states (or cell types) in the stage 3 flower; (3) construction of a candidate interaction graph that meets the minimum theoretical requirements to achieve the steady state configurations; (4) identification of parameter values (weights of interaction and activation thresholds) for the molecular regulation mechanisms of each MRN element; (5) analysis of each solution for biological plausibility by examining the activation functions (molecular regulation mechanisms) of each element and by simulating loss- and gain-of-function mutations and then comparing the resulting steady states with experimental evidence not used to construct the model.

It was possible to find solutions that provide a perfect match between the steady states and the experimental observations. Furthermore, our model could reproduce most of the known genetic interactions (35 out of 37), showing that our model is biologically coherent. Interestingly, the model also predicts the existence of three previously unreported pathways involved in conferring lateral organ polarity. Evidence for these three pathways are scattered in the literature and were revealed simply by integrating the data into a coherent MRN model, illustrating the strength of our approach.

RESULTS

Generating an Exhaustive MRN Topology

We focused our study on the stage 3 flower (Smyth et al., 1990), when the sepals are being initiated. We performed an extensive search of the literature and recorded the interaction data in a database, which contains the molecular nature of each element (whether it is a transcript, protein, protein complex, small RNA, hormone, etc.); the nature of the action of each element on other elements (positive, negative, or unknown); the experimental evidence supporting the interactions between elements (direct chemical binding, induction experiment, genetic evidence, or correlation); and the reference(s) for these observations. The elements retained in our study (1) are active in the flower meristem at stage 3, (2) are connected by an interaction to at least one other element in the database, and (3) either yield patterning defects when perturbed or are connected to an element that does. Using these criteria, we obtained a list of 71 elements with 277 interactions, which are compiled in Supplemental Data Set 1 online (in the section entitled “Interaction”). A graphical representation of this network is shown in Supplemental Figure 1 online.

Expression Patterns and Expected Molecular Steady States

We assume that the developing flower is divided into zones of cells where particular combinations of genes are active. The molecular network within each cell is supposed to be in a specific steady state. To identify the different molecular steady states that exist in a stage 3 flower, in particular in the sepal primordia, the expression patterns of each gene in the database were determined using published data (compiled in Supplemental Data Set 1 online, in the section entitled “Expression”), supplemented with our own studies (see Supplemental Figure 2 online). For each gene, the distribution pattern was interpreted, expressed in binary format, and projected onto a virtual, longitudinal section of a reconstructed flower bud (Fernandez et al., 2010). Among the 71 elements of the database, we generated such maps for 54 elements (Figure 1). We then superimposed the individual maps to obtain an atlas, from which we deduced the various zones of coexpression that represent the molecular activities of the different floral cell types (see Supplemental Figure 3 online for an example with three genes). Based on the maps shown in the Figure 1, we superimposed the 35 genes expressed in the sepal primordia. We find that 35 elements show expression in sepal primordia, with the resulting atlas displaying six zones of coexpression (Figure 2A). The molecular state of each zone may thus be represented by the expression state (0 or 1) of these 35 elements.

Zones of Coexpression and Molecular Activities in the Sepal Primordia.

(A) Following the procedure illustrated in Supplemental Figure 3 online, we superimposed the 35 expression maps of elements expressed in the sepal primordia. This allowed the definition of six homogeneous zones regarding gene expression as shown on the right, where cells are colored according to these molecular activities. Genes are listed on the left, and their molecular activities are given in each zone.

(B) We used the same procedure as in (A) but only considering the elements of the final graph (Figure 3D). The 14 expression maps were superimposed, and this resulted in three homogeneous zones. Their corresponding molecular activities are shown.

Construction of the Sepal Primordia MRN Model

Our objective was to construct an MRN model based on molecular interactions that would reproduce both observed steady states as well as indirect interactions. To this end, we undertook the following steps.

Step 1: Construction of a Candidate Molecular Interaction Graph

The sepal module was composed of 48 elements, 35 of them being expressed in the primordia, while the other 13 were connected by at least one interaction to elements expressed in the sepal but for which expression data was lacking. These 48 elements were connected by 140 interactions (see Supplemental Data Set 1 online, the “Sepal” section, Supplemental Figure 1 online for the interaction graph, and Supplemental Figure 4 online for the interaction matrix). We started with the 44 known direct molecular interactions listed in the sepal database (see Supplemental Data Set 1 online, “Sepal” section) and investigated if the resulting candidate interaction graph fulfilled the required criteria for a functional regulatory network. In particular, to show multistability (several steady states), an MRN must have at least one closed positive circuit and each node must either be part of a closed circuit or its activity must depend on the dynamics of such a circuit (Soulé, 2006; Aracena, 2008). The corresponding graph is shown in Figure 3A, and the evidence for each interaction is recapitulated in Supplemental Table 1 online (row D). We found that this network was incomplete. Indeed, many elements had no input and no closed circuits were found. As a result, many elements did not have sufficient inputs to allow for state changes. Therefore, we searched for putative molecular interactions in the database that could provide appropriate inputs (so that sepal elements could be either active or inactive). When transcription factors were involved, we examined the possibility of a direct interaction by looking for the presence of consensus binding sites in the promoter region of the destination element. In this way, we introduced 12 direct molecular interactions into the initial graph (Figure 3B; see row A of Supplemental Table 1 online for the basis for each hypothesis). However, we found that even this graph did not fully satisfy our requirements. We therefore decided to look for additional information that could fill some of the existing gaps and were able to introduce 10 additional speculative hypotheses of molecular interactions (see Supplemental Table 1 online, row H). As a consequence, we obtained a subnetwork interaction graph (Figure 3C) that fulfills the requirements. This subnetwork of 21 elements connected by 32 interactions is called the “candidate interaction graph” and is used for further analysis. The resulting network is, in fact, composed of elements mainly involved in organ polarity and thus is expected to support the steady states corresponding to the adaxial and abaxial identities. These coexpression zones were obtained by superimposing the expression maps of the elements of the candidate interaction graph that correspond to zones 1 and 3 (Figure 2B). The candidate interaction graph contains nine positive closed circuits (see Supplemental Table 2 online), most of them sharing elements, and thus can sustain at least two cell fates since in theory, a single closed circuit is sufficient to sustain two steady states (Soulé, 2006; Aracena, 2008), depending on the parameters of the model.

(A) Interaction graph of the 48 elements of the sepal primordia connected by the 44 direct molecular interactions reported in the database (thick black arrows).

(B) Graph A completed with 12 hypotheses (red arrows) based on the presence of putative binding motif in the target genes.

(C) Graph B completed with 10 additional hypotheses (dashed red arrows). The explanations for each arrow are given in Supplemental Table 1 online. For hormones, the interaction is not direct but assumed to be through the known molecular pathways (indole-3-acetic acid and ARFs for auxin, and ARABIDOPSIS HISTIDINE PHOSPHOTRANSFER PROTEINS and ARABIDOPSIS RESPONSE REGULATORS for CK).

(D) Final interaction graph corresponding to the subnetwork framed in (C). This interaction graph includes 13 direct interactions and 19 hypotheses of direct molecular interaction. Elements expressed in the adaxial zone are highlighted in yellow, and those expressed in the abaxial zone are highlighted in green. For the others, the expression is either uniform across the sepal primordia or unknown. For simplicity, genes are represented either by the proteins they encode or by the mature active form of the small RNA they encode. Rectangles correspond to proteins and small RNAs, framed rectangles to complexes, and diamonds to hormones. Simple arrows design activations, whereas tee arrows design repressions. Interactions with unknown sign are represented by arrows terminated by a point.

Step 2: Organ Polarity: Predictions Based on Exploration of the Parameter Values

Parameter values (weight of interaction and activation threshold for each element) were estimated such that the network structure and dynamics were coherent with observed expression patterns. These parameters determine the rules that govern the state of each element during the simulations: The state of each element (on or off) is a function of the states of its input; in other words, it is given by the logical (activation) functions (Figure 4) that depend on the parameter values. This is important, for example, to determine whether an element with both positive and negative inputs is active or not. Note that these parameter values are generally not directly available from biological data and must therefore be estimated from the data. We used an optimization procedure (see Methods) to fit the steady state configurations of the inferred MRN model to the expression domains previously found in the virtual atlas subject to the constraints imposed by the interaction graph. When the optimization problem was unfeasible (i.e., no values of the parameters in the inferred model satisfied the given constraints), we either modified the candidate graph or obtained new data to fill knowledge gaps when possible. When the optimization problem became feasible, a search for all optimal solutions (sets of parameters that minimize the fitness criteria) was performed.

For the elements with only one input, f1 corresponds to a repression (the element is ON if its input is OFF and is OFF if its input is ON), and f2 corresponds to an activation (the element is ON if its input is ON and is OFF if its input is also OFF). For the complexes (AGO1-miR165/166, AGO7-miR390, and AS1-AS2), the function number 8 corresponds to the fact that the complex needs both inputs ON to be active. For the six elements with more than one input, several logical functions can be found and are detailed above. The numbers 0 and 1 designate the state of the element as OFF and ON, respectively. For each element indicated at the top left of each table, inputs are shown with all their possible combinations of activities. Each function gives the state of the destination element for each combination of input activity. The numbers designing each function correspond to the sum of the column (from bottom to top) in base 2.

[See online article for color version of this figure.]

We used this inference procedure to look for sets of parameter values that would allow the candidate network to adopt both abaxial and adaxial identities. In the case of genes, only the activity of the final product (protein or mature small RNA) was taken into account. When in situ data showed the expression of a given gene in a particular zone, we assume that its final product is located in the cells where the transcript is detected. In some cases, however, the available data point to more complex situations. For example, the ARGONAUTE1 (AGO1) gene is expressed everywhere (Figure 1), but the protein is degraded by the adaxially expressed AGO10 (Mallory et al., 2009), thus restricting AGO1 activity to the abaxial region. We therefore considered that in the expected steady states, AGO1 activity is abaxially localized. Once parameter values were found, they were used to derive the logical (activation) function for each element (Figure 4). These functions define the activation state of each element as a function of its inputs. We define a solution as a combination of parameter values and corresponding logical functions, allowing the network to adopt both adaxial and abaxial steady states. Among the 1.7 × 1012 possible combinations of logical functions, 338 solutions were found that matched the expected steady states.

We then used existing data to examine the validity of the parameter values determined above. First, it has been shown that miR165/166 is repressed by both ASYMMETRIC LEAVES1 (AS1)/AS2 and TAS3siRNA and that the weight of each repression alone is sufficient to switch miR165/166 off (Li et al., 2005). Among the 338 solutions, we were thus able to exclude all solutions where this is not the case. Secondly, because the degradation of ETTIN (ETT) and ARF4 by TAS3siRNA has been shown to be very efficient (Chitwood et al., 2009; see Supplemental Figure 2 online), we assumed that the inhibitory weight of TAS3siRNA on these elements is likely to be stronger than the activation weight of FILAMENTOUS FLOWER (FIL). We thus excluded all solutions where this is not the case. These two considerations allowed the elimination of 291 solutions, leaving 47 biologically plausible solutions (Figure 5). For each of them, the logical function of each element is indicated by a number, the meaning of which is shown in Figure 4. Several elements have only one possible logical function, which therefore is the same across all the solutions, whereas other elements have several possible logical functions.

Among the initial 338 solutions found, the 47 that have been further tested for biological relevance are shown. The numbers in the first column represent the solutions. The elements are listed at the top, and the number corresponding to the logical (activation) function in each solution is given. Logical functions are designated by a number and are shown in Figure 4. The two solutions (114 and 165) that fulfill the functionality criteria are highlighted.

[See online article for color version of this figure.]

Step 3: Verifying the Functionality of Molecular Interactions

Ideally, if all the molecular data and the hypotheses included in our candidate graph are correct, we expect at least one solution to exist for which all the interactions are functional. Indeed, even if the MRN models retrieve the expected steady configurations, it does not necessarily imply that all molecular interactions are functional in all models (solutions). A functional interaction has parameter values such that if the source element is active, it can have an influence on the state of the target element. We therefore examined whether solutions exist where all molecular interactions are functional. For those elements that have only one input, it is necessarily functional since this is included as a constraint in the parameter estimation procedure. For the elements with a single logical function (the three complexes, AS1, and miR165/166), we found that all interactions were functional across all the solutions. We then analyzed the molecular functions of the five elements (AUXIN RESPONSE FACTOR4 [ARF4], ETT, FIL, KAN and REV), for which several activation functions are possible. For example, REV has two possible activation functions: f4, which implies that REV is active if miR165/166 is absent and auxin is present; and f5, which implies that REV is active in the absence of miR165/166, independently of auxin. Only f4 satisfies the criterion that all elements should be functional since f5 implies that auxin is not required for REV activation. We therefore retained the solutions where REV has f4 as logical function (i.e., where the activation of REV by auxin is functional). Following the same logic for the other genes, we found only two solutions (114 and 165, highlighted in green in Figure 5) among the 47 where all inputs to all nodes are functional. The difference between these two solutions concerns the activation function of ARF4, which requires the absence of TAS3siRNA in all cases and either the presence of both Auxin and FIL (f8 in the solution 165) or the presence of either Auxin or FIL (f14 in the solution 114). We must stress that this is not a selection criteria and other solutions could be equally relevant with respect to biological reality. The fact that a given interaction is not functional in a given solution may be due to lacking data (see Discussion). However, the fact that we find two solutions that reproduce the correct steady states with parameter values implying the functionality of all the interactions suggests that our hypotheses are valid and that our network is coherent.

Model Predictions

Our MRN models should account not only for the direct molecular interactions but also for the indirect interactions not used to construct the models. To verify that this is indeed the case, we identified 37 putative indirect interactions from our database. All evidence concerning the same mutation was grouped into a single test. Furthermore, those genes known to have very similar targets were also grouped into a single gene. For example, whenever genetic evidence was based on the HD-ZIP III gene PHB, we assumed that the result was also true for REV, another family member in our network. Similarly, RDR6 genetically influences many genes and is directly involved in the production of TAS3siRNA; therefore, we transposed its genetic interactions onto TAS3siRNA. After such simplification, we identified a list of 20 mutation tests (Table 1). We then used our MRN models to run simulations by forcing a given element to be either on or off and determined whether, among all the possible network configurations, some of them were fixed points (steady states). If at least one fixed point was found for which the state of the observed gene corresponds to the observed data, then the solution was declared as being compatible with the observation. Our tests identified at least one such fixed point for each virtual mutation and with all 47 models. Network configurations predicted by these simulations were then compared with the experimental evidence. For instance, given that AS1 loss of function results in the upregulation of ETT (Iwakawa et al., 2007), when we forced AS1 to be off in our models, starting from a steady state corresponding to an adaxial configuration, we found at least one fixed point network configuration where ETT was on, indicating that the model prediction is correct concerning the interaction between these two elements. Following this logic, we tested the 37 observations of indirect interactions and found that all but two (observations 3 and 4 in Table 1) were predicted by the 47 solutions. These nonpredicted observations concerned the upregulation of FIL in the double mutant ago7-3 as1-1 but not in the single mutants. Possible reasons for this are discussed later. Finally, we show that 35 indirect interactions predicted by the models are all supported by experimental observations.

List of Evidence for Putative Indirect Interactions Used for Simulation Tests

The fact that we found two solutions compatible with (1) the experimentally observed steady states, (2) the functionality of all the direct interactions, and (3) 35 out of 37 indirect interactions suggests that the hypotheses of direct molecular interactions introduced to construct the candidate graph are valid and shows the coherence of our network. The following 19 additional hypotheses are supported by these two possible models and become predictive: ARF4 activates FIL; AS1/AS2 inhibits miR165/166 and KAN1; Auxin activates ANT, ARF4, miR390, REV, and itself; ETT activates FIL; REV activates AGO7, AGO10, and IPT5; AGO10 inhibits AGO1; cytokinin (CK) activates GTE6; FIL activates ARF4, KAN1, and ETT; TAS3siRNA inhibits miR165/166; and ANT activates FIL. In addition, the activation functions retained in the two final solutions constitute pure predictions of the models since these functions give specific mechanisms of activation for each element that are compatible with the data. The most interesting predictions are the previously unreported pathways revealed by the models. These pathways, all implicating the HD-ZIP III adaxial element REV, are as follows: (1) the activation of AS1 by REV via IPT5, CK, and GTE6; (2) the activation of TAS3siRNA by REV via AGO7; and (3) the repression of AGO1 by REV via AGO10 (Figure 3D).

DISCUSSION

Currently, a wealth of information is available on all aspects of flower development via thousands of published articles and a multitude of databases. Several useful analytical tools have been developed to mine these data. However, they do not permit users to (1) make predictions on the steady states of large networks, (2) predict the effects of local specific perturbations on the global behavior of complex networks, and (3) identify potential contradictions and gaps. For this reason, the predictive and integrative powers of MRN models provide an important complement to these analytical tools. The Boolean model used here is useful in capturing the basic properties of dynamic systems in biology (such as steady states). The parameters in these models are easy to understand, having a clear biological meaning, and the approach can be used for large networks. Importantly, they are well adapted to the existing data available for flower development, which is mostly qualitative in nature and still fragmentary.

The originality of our work mainly relies on the fact that we (1) used three different types of data in different ways, (2) make use of spatial information, and (3) made an exhaustive study of all possible solution parameter sets of the model. Direct molecular interactions were used to construct the interaction graph, expression data were used to restrict the solutions to those which reproduce the wild-type behavior, while indirect interactions were considered as outputs of the perturbed molecular network and were consequently used to validate the resulting MRN models. Most previous qualitative modeling studies (e.g., Mendoza and Alvarez-Buylla, 1998; Espinosa-Soto et al., 2004) mixed genetic (behavioral) and molecular (structural) evidence indiscriminately to propose a regulatory network, then searched for a single solution to the parameter inference problem, and the network behavior was then compared with observed behavior. In these works, spatial expression information was not used in the parameter inference procedure. On the other hand, some quantitative studies have typically focused on parameter inference from spatial expression data (e.g., Mjolsness et al., 1991; Jönsson et al., 2005) without using the structural information available from the literature. Here, we made explicit the difference between genetic data, which is an observed behavior of the system, and molecular data, which is the evidence of a structural or functional feature of the network. Due to the scarcity of molecular data, we used a small set of genetic data, only when supported by additional information (the presence of putative binding sites in the regulatory region of target genes) to formulate a restricted number of hypotheses about molecular interactions. In addition, we used expression as well as unused genetic data to obtain all possible values of the functional parameters compatible with that data.

We must stress that all of the above-mentioned approaches suffer from the same paradoxical insufficiency regarding the data; indeed, the enormous quantity of available data carries very sparse information, leading to ill-posed problems (problems for which several solutions are possible due to lack of data). We consider that if sufficient data were available, only one solution could be found, corresponding to the one occurring in the real life. If data are missing, we cannot discriminate among the different solutions to retain the good one. Several sets of parameters may meet the expected criteria, such that the final results do not provide a strong explanation for the observed behaviors. However, these results are very useful to guide the search of new information to fill the knowledge gaps. The recent availability of high-throughput data call for new modeling approaches where the information about direct molecular interactions, together with spatial and genetic expression data, could be optimally used to reconstruct big networks using a similar approach to the one presented in this work. However, for this to be done, this data must be systematically stored in a convenient form in publicly available databases, something that is still not a current practice in biology laboratories.

The literature had to be interpreted with a degree of precision and accuracy that cannot be reached with automated procedures; while significant advances have recently been made in that direction, current automatic texts mining methods still show important limitations (Kowald and Schmeier, 2011). As a result, we followed an incremental process where the MRN gradually emerged, step by step, from the careful interpretation of the existing data. It must be noted that with data being produced continuously, the database and model produced here reflects the state of knowledge at the time were the model was constructed (last update September, 2010).

Expression data were very important in the construction of the model because they guide the search of parameter values to those giving realistic behaviors, therefore reducing the size of the search space. This is illustrated by the relatively small number of solutions found when expression data were used to constrain the parameter values (338) compared with the size of the search space when no expression data was used (1.7 × 1012 possible combinations of logical functions). However, we used additional evidence on parameter values to constrain further the solutions down to 47 among which only two solutions satisfy the functionality criterion. However, it must be stressed that other biologically valid solutions may arise with further experimentation.

Insights into Flower Development

The superimposition of the expression patterns suggested the existence of six different domains in the developing sepal (Figure 2A). This would in principle imply the existence of as many steady states of the regulatory network, but at this stage, the available molecular data are not complete enough to construct a full model able to reproduce them all. We therefore restricted our analysis to the subnetwork regulating organ polarity. It must be noted that this network is not sepal specific, as it is also active in leaves and other floral organs. Antagonistic interactions between adaxial and abaxial determinants have been described and were expected to generate mutually exclusive and opposing cell fates (reviewed in Kidner and Timmermans, 2010). However, this has not been tested. Recently, Chitwood et al. (2009) identified TAS3siRNA as the signaling molecule that specifies the adaxial/abaxial boundary in developing leaves by forming a gradient of accumulation across the leaf primordia. Similar movement of miR165/166 has been proposed to form a gradient in the opposite direction, leading to a scenario where TAS3siRNA would define the adaxial side by restricting ETT and ARF4 to the abaxial zone and miR165/166 would confine the expression of the HD-ZIP III family members to the adaxial side (Kidner and Timmermans, 2010). In this scenario, however, crosstalk between TAS3siRNA and miR165/166 pathways was proposed to be missing for the network to be stable. Our models predict an activation of TAS3siRNA by REV via AGO7, a pathway that provides such a crosstalk by linking the miR165/166 pathway to TAS3siRNA.

Solving Inconsistencies

We did not come across any major contradictions while constructing the model. However, a number of questions arose. A typical example concerned the observed induction of AS1 by FIL (J. Golz, personal communication). This is not in agreement with the fact that these two genes appeared to be expressed in different domains of the sepal. In fact, this interaction was introduced in the MRN graph and was indeed not supported by any of the solutions. However, closer analysis of the expression data showed that AS1 and FIL are in fact coexpressed at an earlier stage of sepal development when abaxial/adaxial polarity is not yet established (Figure 2). A plausible explanation could be that in the real network, FIL activates AS1 initially in young primordia, and at later stages, other factors, such as GTE6, would maintain AS1 expression in the adaxial domain. However, this scenario implies that once polarity is established, AS1 must be switched off in the abaxial domain despite the activity of FIL. This suggests the existence of an additional element, which either represses AS1 in the abaxial region, strongly enough to counteract its activation by FIL, or is required for the activation of AS1 by FIL. In this case, this element must be present in the young stage of the primordia and absent later, at least in the abaxial region.

A question arose concerning the two ARFs, ETT and ARF4, which in our model are required to promote FIL. This seems to be at odds with the predicted repressor function of these ARFs (Tiwari et al., 2003). It might be that both factors do not exclusively inhibit transcription and act as activators as well. Alternatively, these ARFs could repress some yet unknown element that would repress FIL in the adaxial region. If this was the case, their action on FIL would not be direct. This would then require further analysis of the modified MRN, although this would not necessarily change the dynamics of the network.

Among the 37 indirect interactions tested, two of them were not predicted by our models. These nonpredicted observations are that (1) expression of FIL does not change upon mutating AGO7 or AS1, whereas (2) the double mutant ago7-3 as1-1 exhibits an upregulation of FIL, suggesting a redundant repression of FIL by AGO7 and AS1 (Garcia et al., 2006). The reason why these observations are not supported by our model is that in our interaction graph, AGO7 and AS1 are in the same pathway, which is not consistent with redundancy. Nevertheless, if the ago7-3 allele, which was used to generate the experimental observations, was not null, residual AGO7 activity could be sufficient to repress FIL. In this scenario, AGO7 and AS1 could both act in the same pathway and still be compatible with the experimental observations. To evaluate this possibility, we tested if in a null allele of AGO7, FIL was derepressed to a similar level as the one observed in the double mutant as1 ago7 (Garcia et al., 2006). We performed quantitative PCR analysis on two alleles of AGO7 considered as null (zip-2 and ago7-1) and found no significant upregulation of FIL in either of the mutants (see Supplemental Figure 5 online). These results confirm the redundancy between AS1 and AGO7 regarding FIL repression. Redundancy does imply two independent pathways, which do not exist in our network. To explain why FIL does not change in an as1 mutant, our network needs additional interactions. Given that FIL does not change in a single mutant ago7, we expect FIL to be repressed by an element downstream of AS1 and independent of AGO7. Interestingly, Watanabe et al. (2003) reported the existence in the FIL promoter of a putative KRUPPEL-LIKE binding site, which is required to repress FIL in the adaxial domain. Therefore, we can speculate that such a repressor could be activated by AS1.

Predictions Made by the Model

The models predict pathways not reported previously in the literature. A first pathway links REV to AS1 via IPT5, CK, and GTE6 (Figure 3D). To our knowledge, CK has not been explicitly associated with lateral organ development, nor with polarity, although several IPT genes are expressed in lateral organs and inflorescences (Miyawaki et al., 2004). Transcriptomic data (Hoth et al., 2003) show that GTE6 is induced by CK. Although RT-PCR experiments have revealed GTE6 expression in flowers (Chua et al., 2005), we could not detect signal in situ using GTE6 as a probe. From a functional point of view, REV and AS1 are both involved in adaxial identity and vascular development. Interestingly, IPT genes and CK are also involved in vascular development (Miyawaki et al., 2004; Matsumoto-Kitano et al., 2008). GTE6 is a direct activator of AS1 and has been associated in general terms with leaf morphogenesis. The gte6 mutant phenotype resembles that of the as1 mutant (Chua et al., 2005), suggesting that it could also be involved in adaxial identity. It would be interesting to determine to what extent the role of REV (and HD-ZIP III family members) in vascular development and organ polarity depends on its putative link with AS1. The integrated view of the MRN also points at REV as a central element in several RNA regulatory pathways. In particular, the model proposes that REV is linked via the complex AGO7-miR390 to TAS3siRNA, which represses ETT and ARF4 (Hunter et al., 2006). Thus, REV would promote adaxial identity, first by repressing these two abaxial genes and second by inducing AS1.

Besides predictions concerning the overall structure of the network, the model also proposes specific rules for the activation of its components. In this context, the elements with multiple inputs are of particular interest. The two models that are compatible with functionality of all interactions differ only in the mode of activation of ARF4. ARF4 expression requires the absence of TAS3siRNA in both cases and is activated either by both Auxin and FIL (f8 in solution 165) or by either Auxin or FIL (f14 in solution 114). Additional data are needed to reveal which one of these two solutions is the most relevant.

Simulations showed that the models also account for observed indirect interactions that were not used to construct the MRN. However, we must stress that the tests used to verify the evidence for an indirect interaction do not identify the precise pathways that link two particular elements, as this would require a more complex model regarding dynamics and adapted data, for example, kinetics. Nevertheless, following simulation of a particular mutation, we can check if the state of each element would be compatible with a proposed molecular pathway. For example, this led to the validation of the molecular pathway linking AS1 inactivation to ETT upregulation (Figure 3D). Therefore, we propose a scenario where loss of AS1 function derepresses miR165/166, which in turn inhibits REV (a HD-ZIP III). Loss of REV activity leads to downregulation of its target, AGO7, leading to decreased production of TAS3siRNA and subsequent derepression of the TAS3siRNA target, ETT. A decrease in TAS3siRNA accumulation should also result in ARF4 upregulation, as it is also targeted by TAS3siRNA. In addition, we can predict that AS1 loss of function should also result in IPT5 downregulation since in our network, it is activated by REV. The simulation of AS1 loss of function in our MRN models indeed results in one steady configuration of the MRN, which is in agreement with this pathway.

As shown in this work, the use of theoretical requirements as a modeling guide to obtain functional MRNs constitutes a pertinent criterion to implement the graph with appropriate interactions. Such requirements could be used to design specific experiments and identify putative regulators or targets of particular elements. In more general terms, our model should help to initiate an incremental approach wherein gradually more elements are added. We can expect the further completion of the interaction graph, as data become available, to lead to a more complete MRN able to reproduce all the six observed expression zones in the sepal and potentially more if new expression patterns subdivide the existing zones. In this study, we analyzed a snapshot of flower patterning and therefore did not take into account the evolution of the MRN behavior during flower development, neither did we model explicitly the spatial interactions between cells. The next challenging task will be to integrate these dimensions in the models to come.

METHODS

Virtual Projections

Graphical tools have been developed to visualize the expression patterns on a geometrical representation of the flower bud. In this study, we used a virtual longitudinal section of a three-dimensional reconstructed flower as a support. For each gene, we closely examined the data reporting expression and made an interpretation of the signal such that cells expressed or not a particular gene. Each projection was made using the same mockup. Once each expression pattern had been defined individually from the corresponding set of experiments, we obtained the steady states of the network (or cell identity) by superimposing all the individual maps. For each cell, we had access to the list of genes it expressed. Hence, it was straightforward to sort cells according to their gene expression identity. By associating a unique color to each state, we displayed the steady states in space by coloring individual cells according to their molecular activities.

Identification of Putative Binding Sites and Promoter Analyses

We did not perform systematic promoter searches as this would open too many possibilities with poor relevance since the presence of a short motif in a particular DNA region is not a good indication of binding. Instead, when an appropriate input was spotted (evidence of interaction), we then scanned the promoter region of the destination gene (from the end of the upstream gene to the ATG codon) for the appropriate binding site when it was known. When the motif was found, we did not consider it as a proof of binding but rather as a possibility for such a binding to occur and subsequently introduced a hypothesis. On the contrary, if no such motif was found, then we did not introduce the hypothesis. Presence of auxin response elements was based on perfect matches in direct orientation with the consensus sequence TGTCTC (Ulmasov et al., 1997). Putative AS1-AS2 binding was based on the presence of the sequence motifs CWGTTD and KMKTTGAHW (Guo et al., 2008). Putative HD-ZIP III binding was based on the presence of the sequence motif GTAATSATTAC (Sessa et al., 1998).

MRN Modeling

Our theoretical framework is based on the work of Snoussi (1989) and Thomas and Kaufman (2001) where the state space is discretized rather than time. This discretization is based on the nonlinearities usually seen in real biological phenomena. Indeed, in biology, a regulator is often inefficient below a threshold range of concentration, and its effect rapidly levels up above this threshold (Thomas and Kaufman, 2001). Regulatory interactions are thus usually described using a steep sigmoid function, which can be approximated by a Heaviside function with a threshold for each interaction. Each real threshold then defines a partition of the state space such that in each region the system behaves linearly and such that the system behavior can be described in terms of the possible successions of regions visited by the system instead of continuous trajectories on the real state space. This approach leads to a multilevel qualitative approximation of the dynamic system. Even though in this approximation information about the precise trajectories followed by the system in the real state space is lost (introducing uncertainty in state transitions), the set of possible trajectories in qualitative space (succession of regions) can be known (one of which corresponds to the trajectory in real space). In addition, it has been proven by Snoussi (1989) that the qualitative approximation has the same asymptotic behavior (steady states) as a continuous description, making it possible to study and infer the steady states of the system using the qualitative model. A simplified approach that we used here is to assume a single threshold for all interactions arising from an element of the network; the partitioning therefore results in a Boolean approximation. It must be stressed that in the resulting description, thresholds and weights are not the same as the real threshold and weights found in a quantitative model. However, a relation exists between both descriptions. Mathematical details of the approximation are found in Snoussi (1989).

MRN Requirements for Patterning and Multistability

For the MRN to be able to reproduce the observed patterns, (1) positive circuits must be present for multiple cell fates to exist (Thomas and Kaufman, 2001; Soulé et al., 2006), each stable steady state of the network corresponding to one cell fate; (2) at least one positive circuit is required for two steady stable states and several positive circuits are needed to generate more stable steady states, the maximum number of stable steady states depending on the number of positive circuits and on their intersections (Aracena et al., 2004; Aracena, 2008); and (3) each element differentially expressed among regions must be included in a positive circuit with other elements that are also differentially expressed among those regions or regulated by such a circuit.

Molecular Activation Function

The behavior of the qualitative MRN model, which is represented by the evolution of its configuration (Q ={qv}v=1,..,V) along a sequence of iterations (T={1,..,n,..,N}), is determined by state transition mapping that relates the configuration at the next iteration to the actual one through a local activation function. This function represents the regulatory mechanisms governing the molecular activities at each node. Here, the output of this function (i.e., the state variable of each element) can be either “1” if that element is present in sufficient amount to have an influence on the activity of its destination elements or “0” if not. The state of each molecular element is assumed to change according to a linear threshold associator mechanism: It tends to be activated if a regulatory force, determined by a weighted sum of its inputs, equals or exceeds a local activation threshold, resulting in the following function:where is the state of element u at iteration step n, αuv and wuv represent the sign and weight of each interaction, θv is an effective threshold for element v at which the interactions having this element as source becomes active, is the projection or image of (the qualitative value that the state tends to take depending on input activity, and (v) is the set of incident neighbors of element (v) for all v,u ∈ V. V is the set of nodes representing molecular elements (i.e., gene products, hormones, etc.). The local activation function is completed by a network update schedule function that determines the actual state evolution. Given that in this work we are interested only in the steady states of the MRN, which are independent of the order of updating, we used a parallel schedule (all elements states are updated simultaneously): .

Linear Threshold Associator MRNs and Boolean Networks

The local activation function can be seen as a Boolean function of the state of its inputs; the function computed by an element is determined by the relative values of the weights and threshold for that element. The number of possible functions that an element can compute is finite and depends on the number of inputs. Once the parameter values have been determined, the specific function computed by an element can be coded by an integer number (Figure 4). However, the MRN model used in this work has an important difference with respect to logical Boolean networks in that a single entity with n inputs can compute only a (fast decreasing) subset of the possible Boolean functions for a given number of inputs (e.g., 14 out of 16 for n = 2, 104 out of 256 for n = 3, and 1882 out of 65,536 for n = 4) (Rojas, 1996). This fact restricts the type of molecular interactions that can be modeled with a single unit; in particular, it cannot express the case where the activation of a molecular element requires the nonsimultaneous presence of either of two other products (i.e., the XOR function). This may happen, for example, if two molecules can each activate alone the transcription of a third molecule and if the activation is inhibited when both molecules are present and form a complex that cannot bind to the promoter. However, this can be modeled by introducing additional entities representing molecular complexes.

Biological Meaning of Model Parameters

The link that exists between the Boolean model and piecewise linear quantitative differential models let us give a biological interpretation to the model parameters:where kuv is the synthesis rate of gene product v induced (or inhibited) by gene product u, λv is the degradation rate of gene product v, and σv is a positive concentration threshold. It can be seen that the weight of a given interaction simply corresponds to the equilibrium ratio between the synthesis rate of the target element due to the presence (or absence in case of an inhibitory interaction) of the source element and its degradation rate (w0v is the ratio for the interaction-independent basal synthesis). Thus, the qualitative threshold corresponds to an effective quantitative threshold composed by three terms: (1) a maximum threshold occurring when no basal synthesis occurs, (2) a threshold decrease resulting from interaction-independent basal synthesis, and (3) a decrease due to basal synthesis that can be inhibited by incoming interactions. This relationship between qualitative and quantitative parameters arises from the quantitative to qualitative mapping presented by Snoussi (1989). Derivation is presented in Supplemental Methods 1 online.

Parameter Estimation

As the data are assumed to represent the steady state behavior of the real system, it is possible to infer the parameters of a qualitative network from this data. The theoretical justification can be found in Snoussi (1989) and Thomas and Kaufman (2001), where it is shown that a quantitative (real valued level and continuous time) model of a biological network can be approximated by a qualitative (discrete level and discrete time) model that retains the same steady state behavior as the quantitative one. We must stress again that we analyzed only the steady state dynamics due to the nature of the data, the study of the transitory dynamics needing time series data and other kinds of modeling.

The framework used here for the inference of parameters is that of exact global optimization. We used mathematical programming problem (MPP) and reformulation linearization techniques (Liberti et al., 2009), which allow the formal description of the nonlinear inference problem and the associated biological model as a mixed integer linear programming (MILP) problem that can be solved exactly to optimality with efficient and reliable general purpose solvers. Additionally, this methodology allows the recovery of all exact solutions. The basic issue is the reformulation of the MRN model in a suitable form that follows the MPP formulation; that is, all the elements of the model must be reformulated either as an objective function or as a set of constraints. The parameters of the MRN model become variables of the MPP problem; if a solution exists for the MPP, the parameters of the MRN model are given by that solution. Briefly, an MPP is formulated as follows:where x represents the decision variables, and is an objective function to be minimized subject to a set of constraints , which may also include variable ranges or integrality constraints on the variables. The decision variables represent the activity of the genes and other auxiliary variables. Both the biological regulatory model and interconnection graph are formulated as constraints on decision variables, while the distance of the gene regulatory network activity to expression data is the objective function (see below). We imposed two additional constraints to eliminate spurious solutions: The first one is that each molecular complex of our network can only be active if all its components are present, and the second is that each element should be able to change state according to its inputs (no insulated elements). Finally, to find all possible solutions, an iterative solving algorithm was used (see below).

Objective Function

We wanted the MRN configuration to follow the expression data as closely as possible. A natural method to ensure the closeness is to define a distance from a set of variables to the data, under a suitable metric, and use this measure as the MPP objective function that must be minimized. Given the nature of the data, a suitable closeness measure is the Hamming distance (DH); for the set of state variables Q, it is defined as:where represents the binary expression data, and the symbol represents the XOR operation (i.e., it counts the number of variables whose values are different to the corresponding data expression value).

Network Structure Constraints

We choose to constrain the structure of the network while being flexible on the sign values of the interactions by minimizing a sign distance; the natural measure is again the Hamming distance for integer values given by:

This distance reflects the number of interactions that can change its sign, where ai is the sign variable for interaction i and ɑiis its sign according to the data.

Dynamics Constraints

This problem was studied by La Rota et al. (2008), where the local activation function was reformulated as the following pair of inequalities:where, q and y are binary variables that represent the current state and its projection on the next iteration, respectively, for node v at time n and at region r; w, a, and θ, which are variables of the MPP, represent the weights and signs of each interaction and the activation thresholds of each node, respectively; R is the set of homogeneous regions sharing the same observed combination of molecular activities, and V is the set of network nodes and |V| means the cardinality of the vertex set (the number of vertex). Whenever the right-hand side of both inequalities is greater (or smaller) than the activation threshold for a node, the only solution satisfying both inequalities is that the image must be 1 (or 0). The parameter ε is a very small number that is introduced to solve the ambiguity when the right-hand side is exactly equal to the threshold (both solutions for y being possible if ε = 0), forcing y to have a value of 1. Thus, the MRN dynamics equation is rewritten as two static nonconvex nonlinear constraints that must be satisfied by any point of the trajectory of the dynamics of the system, and in particular they must be satisfied at the fixed points of the dynamics.

Fixed-Point Constraints

We assume that the observed expression data correspond to the fixed point dynamics of the real MRN in different tissues. It is trivial to show that the fixed points of an MRN are the same under any update scheme; therefore, we can discard the time dependency such that the steadiness constraints are given by the fixed point condition:

Solving the Problem

The MPP as formulated above is a nonconvex, nonlinear mixed-integer nonlinear programming problem; that is, there are both objective and constraints that are nonconvex nonlinear forms and some integrality constraints. Most nonlinearities are nonconvex products of binary variables with binary or real numbers or can be reformulated as such, which can be exactly reformulated to equivalent linear forms (Liberti et al., 2009) such that the objective function and constraints become linear (La Rota et al., 2008). The new problem after reformulation is then an equivalent MILP problem that has the same set of solutions of the original mixed-integer nonlinear programming problem but is much easier to solve. Currently, MILP solution methods are the most advanced, and general-purpose solution algorithms exist. We used the mathematical programming language AMPL (Fourer and Gay, 2002) to model the MPP and the standard ILOG CPLEX (ILOG 8.0 User’s Manual) MILP solver to solve it.

Finding All Solutions

Several possible solutions may exist for the MPP problem. To explore all of them, we used an iterative scheme such that a new solution with different logical behavior from those already found is obtained at each iteration. This has been done by computing the output of each network element for all possible binary inputs, as determined by the parameters given by this solution, thus obtaining a binary representation of the logical function, translating this representation to a decimal representation, and adding constraints as proposed by Tsai et al. (2008) to obtain multiple solutions to integer problems.

Computational Issues

The estimation procedure as stated here, once linearized, for a single solution and if restricted to the steady states of the dynamics, scales slowly (number of variables = 8|V|+18|V|2); thus, the size of the network is not a limiting factor when using state-of-the-art solvers to find a single solution. The iterative search of multiple solutions needs a growing number of variables for the computation of all possible logical functions for each node at each iteration (~ (1+2Iter)|V|+3|V|2+5|V|3) (Iter=1,..,#Solutions); this is still manageable with current solvers and may be strongly reduced using look-up tables. However, the method is not suitable to infer the parameters of transitory dynamics from time series data because the number of variables of the MPP will be multiplied by the number of time steps needed (for steady states time is not a variable), such that the problem quickly becomes unmanageable as new nodes and interactions are included or as the time discretization becomes finer to deal with strong nonlinearities. Therefore, the method presented here is suitable for solving estimation problems for qualitative dynamical models at steady states.

MRN Simulation

Simulator software implementing the MRN model concept was conceived and realized in C++. Each parameter set obtained as a solution to the MPP problem was used to define a computational instance of the model that can be simulated.

Validation of Genetic Evidence

For each set of parameters, simulation was done as follows. The mutated or induced genes were set permanently to off or on. All possible configurations where the genes responding to the mutation or induction had states as those given by the evidence were tested for steadiness by setting each configuration as initial condition and applying the state transition mapping. A test was validated for a configuration if it remained unchanged. It must be noted that a single genetic interaction evidence (e) gives information about the observed states of a restricted set of genes (Ce) under mutations in another small set of genes (Me), but nothing is known about the state of other genes in the network (set Oe, where Oe = V – Me – Ce, and V is the set of all genes in the network). Therefore, for each network solution and each evidence, we tested the network configuration steadiness for all possible configurations of set Oe; If at least one of the tests was validated, it meant that the network was compatible with evidence e.

Acknowledgments

We thank John Golz for kindly sharing unpublished results, Michel Morvan and Eric Boix for supporting the project from its beginning, Leo Liberti and Fabien Tarissan for participating in the development of the parameter inference methodology, and all our colleagues from the Reproduction et Développement des Plantes laboratory for helpful discussions and critical reading of the manuscript. C.L.R. and S.P. were funded by the Agence Nationale de la Recherche “Virtual Carpel” and “Geneshape.”

AUTHOR CONTRIBUTIONS

C.L.R., J.T., and F.M. designed the research. C.L.R., P.D., S.P., F.R., and F.M. performed research. J.C. and C.G. contributed new computational tools. C.L.R., E.F., and F.M. analyzed the data. C.L.R., P.D., E.F., C.G., J.T., and F.M. wrote the article.

Footnotes

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Françoise Monéger (francoise.moneger{at}ens-lyon.fr).

(2007). Expression of the ASYMMETRIC LEAVES2 gene in the adaxial domain of Arabidopsis leaves represses cell proliferation in this domain and is critical for the development of properly expanded leaves. Plant J.51: 173–184.