This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Genome-scale metabolic network reconstructions are considered a key step in quantifying
the genotype-phenotype relationship [1]. While the process of manually reconstructing genome-scale metabolic network models
is complex [2], such networks have already been manually reconstructed for more than 50 organisms
[3], including common model microorganisms [4,5], industrially relevant microbes [6-9], various pathogens [10-13], and recently for human cellular metabolism [14]. A modeling approach called constraint-based modeling serves to analyze the function
of such networks by solely relying on simple physical-chemical constraints [15,16] and is frequently used to predict various phenotypes of microorganisms (reviewed
in [3,17-20]). Several applications of constraint-based modeling for metabolic engineering of
microbial species has been presented (reviewed in [17,21]).

The two major computational challenges in metabolic network reconstruction are (i)
the identification of missing reactions in a metabolic network, and (ii) the association
of genes with network reactions. The identification of missing reactions, referred
to as 'gap-filling', is commonly performed based on a pre-defined metabolic capability
that the network is expected to be able to fulfill. For example, the capability to
generate essential biomass products under various genetic and environmental conditions
[22-25], the synthesis of specific compounds identified via metabolomics [26], the flow of specifically measured flux rates [27], and the ability to activate a core set of reactions under a steady-state assumption
[28]. Missing reactions are obtained either from other species [22,23,26-28], or via computational chemistry methodologies that enumerate possible metabolic routes
[29], aiming to identify a minimal number of missing reactions to fulfill the required
objective. A specific approach for network reconstruction that is based on the concept
of elementary flux modes [30] was previously applied to successfully recover missing network reactions [16,31,32]. Another gap-filling approach that integrates some of these principles has been recently
used to reconstruct 130 genome-scale metabolic network models of various bacteria
[22]. While most of the above gap-filling methods rely strictly on metabolic flux analysis
and do not utilize functional genomics data to guide the search for missing reactions,
computational methods that aim to address the second challenge of gene-reaction assignment
do rely intensively on functional genomics data. Specifically, several methods predict
gene assignment based on genomic data, utilizing principles such as conserved chromosomal
proximity [31,33,34] and similarity in phylogenetic profiles with neighboring genes in the same pathway
[32,35,36]. Others rely on an additional array of functional genomics data, including gene co-expression
and protein-protein interactions [37-43].

Here, we present a novel approach, MetabolIc Reconstruction via functionAl GEnomics
(MIRAGE), for reconstructing metabolic network models and specifically addressing
the problem of gap-filling, by searching for missing reactions whose presence is supported
by various functional genomic data. Specifically, to reconstruct a metabolic network
model for an organism of interest, MIRAGE starts from a core set of reactions, whose
presence is established via strong genomic evidence, and identifies missing reactions
that are required to activate the latter core reactions (in addition to biomass requirement)
by identifying additional reactions, whose presence is further supported by phylogenetic
profiles and gene expression data. The performance of MIRAGE, in comparison to previous
methods, is demonstrated on the reconstruction of network models for Escherichia coli and the cyanobacteria Synechocystis sp. PCC 6803, validated via existing networks for these species. Then, it is applied
to reconstruct genome-scale metabolic network models for 36 sequenced cyanobacteria
(supplied via standard Systems Biology Markup Language (SBML) files

[

44

]

), amenable for constraint-based modeling analysis and specifically for metabolic
engineering. To demonstrate the utility of the reconstructed cyanobacteria networks,
a strain design method was applied to predict gene knockouts whose implementation
is expected to significantly elevate the production rate of an important nutritional
product, astaxanthin.

Results and discussion

MIRAGE

MIRAGE is a functional genomics-based model reconstruction approach that aims to generate
a genome-scale metabolic network model for an organism of interest, given a core set
of reactions that are known to exist in its network, and optionally, a definition
of a biomass reaction. The core set of reactions can be automatically derived strictly
from genomic data, based on strong sequence similarity with known enzyme-coding genes
in other species. The method then aims to find missing reactions (from a universal
database of candidate gap-filling reactions such as the Kyoto Encyclopedia of Genes
and Genomes (KEGG)), supported by functional genomics data, whose addition to the
network would lead to a functional model. The method follows a two-step procedure,
starting with the utilization of functional genomics data to estimate the probability
of including each reaction from the universal database in the reconstructed network,
and then, metabolic flux analysis that selects the most likely set of reactions whose
addition to the network would satisfy the above described objectives.

For the first step, we utilize two functional-genomics data sources to estimate the
likelihood that a reaction from a universal reactions database should be included
in the target metabolic network: (i) enzymes' phylogenetic profiles, and (ii) gene
expression. Specifically, we define a weight for each reaction in the universal database
(that is not already included in the reconstruction's core reactions set), based on
the functional similarity between neighboring enzymes, in terms of resemblance of
phylogenetic profiles, and correlation in gene expression of the enzyme-coding genes
(Materials and methods).

Enzyme phylogenetic profiles were extracted from KEGG, representing a pattern of enzyme
presence or absence across an available collection of species. For each reaction in
KEGG, we computed a phylogenetic weight, representing the likelihood for its inclusion
in the network reconstruction. Specifically, the phylogenetic weight of a certain
reaction is calculated based on the maximal Jaccard coefficient between its phylogenetic
profile and the corresponding profiles of its neighboring core reactions in the network
(Materials and methods). Similarly, an expression weight for each reaction was calculated
by evaluating gene expression profiles (measured in the target organism) of potential
enzyme-coding genes (considering all non-annotated genes in the genome), compared
with the expression profiles of known genes associated with neighboring core reactions.
The sum of the phylogenetic and expression weights after proper normalization was
used as input for the second reconstruction step (Materials and methods).

The second reconstruction step aims to find a set of high weight gap-filling reactions
that satisfy the objectives described above. Towards this goal, we employed the following
reaction pruning procedure. Starting from a metabolic network model consisting of
all reactions in the universal reaction database, we iteratively remove potential
gap-filling reactions, as long as the removal does not affect the consistency of the
model. In each iteration, the probability of choosing a certain reaction for removal
is inversely proportional to its weight - that is, low weight reactions have a higher
probability to be chosen first for removal. The model consistency check procedure
involves the usage of constraint-based modeling to verify that the remaining network
(i) enables each core reaction to carry non-zero metabolic flux within a stoichiometrically
balanced flux distribution, accounting for reaction directionality constraints, (ii)
enables the production of all essential biomass constituents, and (iii) accounts for
the growth-associated dilution of all network metabolites (that is, guaranteeing that
the network consists of complete pathways for either the transport or de novo synthesis of all metabolites that exist in the network) [45]. Since the reactions' scanning order may affect the resulting model, the algorithm
is executed several times with different, random pruning orders (Materials and methods).
The fraction of obtained models that contains a certain reaction reflects the confidence
that it should be included in the final model. Hence, to construct the final metabolic
network model, we run the reactions removal procedure again, based on an ordering
defined by the received confidence values (Materials and methods).

Notably, the presented method extends upon the Model Building Algorithm (MBA) of Jerby
et al. [28] that was recently used to reconstruct a model of human liver metabolism. The MBA
method addresses only the first objective from the above list, while not accounting
for biomass production and growth-associated metabolite dilution, which are of less
importance for the modeling of human tissue metabolism. Furthermore, it accounts for
functional genomics data in a more limited manner, by using them only to define two
core sets of reactions with either a moderate or high probability to be retained in
a specific tissue model. In contrast, MIRAGE assigns a continuous score per each reaction
that reflects its probability to be retained in a specific species model, allowing
us to make better use of these data.

The described method is computationally demanding since each trial of the random reaction
pruning procedure (out of the 500 trials performed to gather sufficient confidence
statistics), requires eliminating each reaction from the universal reactions set in
turn, and checking the consistency of the resulting model. Implementing the speedup
heuristic suggested by Jerby et al. [28], which aims to minimize the number of linear optimizations required in each model
consistency check, provided some improvement in running time. However, each random
pruning trial still took around 35 hours, which made the entire method computationally
intractable. The significant increase in running time in comparison to the method
of Jerby et al. resulted from the markedly large size and complexity of the universal reaction database
in comparison to the human network model used by Jerby et al., and the additional reconstruction objectives previously not accounted for.

To overcome this, we implemented the following additional speed-up techniques (Materials
and methods). First, the model consistency check procedure is based on identifying
a set of flux distributions in which all core reactions are activated (that is, have
non-zero flux), and is applied following the removal of each reaction in the reaction
pruning procedure. The first speed-up involved the utilization of flux distributions
computed in one call to the model consistency check procedure in subsequent calls
to this procedure (testing the potential removal of subsequent reactions in the pruning
order) to avoid time-consuming linear programming optimizations. Second, to further
minimize the number of performed linear optimizations, the latter are now formulated
with the objective of minimizing flux through subsequent gap-filling reactions in
the pruning order. These two speed-up techniques, significantly elaborated upon in
Additional file 1, provide a 100-fold improvement in running time.

Figure 1 illustrates the working of MIRAGE on a toy model. Reactions E1, E8, E9 and E10 are
core reactions, while all the other reactions are candidates for gap-filling. MIRAGE
predicts the addition of reactions E2, E3, E4 and E7 to enable flux activation of
all core reactions, biomass production, and accounting for growth dilution of all
metabolites in the core. The inclusion of reactions leading from M3 to M5 is required
to enable flux activation of core reactions E8 and E9. In this case, the choice of
including both reactions E3 and E4 for gap-filling, instead of the single reaction
E5, is based on higher support for the former reactions in the functional-genomic
data. Reaction E2 is predicted for gap-filling to compensate for growth-associated
dilution of metabolites M6 and M9 [45].

Figure 1.The application of MIRAGE on a toy model. Core reactions (E1, E8, E9 and E10) are marked with straight lines, while gap-filling
reactions are marked with dashed lines. A weight for each reaction is computed based
on the correlation of its phylogenetic and expression profiles with those of neighboring
core reactions in the network. Reactions predicted for gap-filling by MIRAGE are in
red. Specifically, E2 is chosen to enable the growth-associated dilution of metabolites
M6 and M9. E3 and E4 are chosen (instead of E5, which has a significantly lower weight)
to enable the flux activation of E8 and E9. E7 is chosen to enable flux activation
of E8 and E9 under steady-state. Reaction E6 is not chosen for gap-filling as it is
redundant given the above-mentioned chosen essential reactions.

Validation of MIRAGE in the reconstruction of a metabolic network for E. coli

To evaluate the performance of MIRAGE, we applied it to reconstruct a metabolic network
model for E. coli, for which a comprehensively curated model (iAF1260) is already available for validation
[46]. Towards this end, we extracted a cross-species reactions dataset from KEGG having
7,211 reactions (referred to as the universal reactions set). To define a core set
of known E. coli reactions to be used by MIRAGE, we considered KEGG reactions annotated as existing
in E. coli and also belonging to iAF1260, plus the known biomass and all exchange reactions from
iAF1260. Then we removed dead-end reactions that cannot be activated within a feasible
flux distribution when considering the entire universal reactions set, yielding a
core set of 812 reactions. Performing standard flux variability analysis [47] when focusing only on this set of 812 core reactions revealed that 45% (365/812)
of these reactions are on dead-ends. MIRAGE's task is hence to identify gap-filling
reactions that would resolve these dead-ends, aiming to identify a remaining set of
109 reactions from iAF1260. Notably, our analysis did not account for subcellular
localization of metabolic processes, and hence duplicated reactions in iAF1260 that
correspond to multiple compartments were removed.

Comparison of MIRAGE's reconstructed network model for E. coli with iAF1260 shows a predictive precision of 41.9% and recall of 24.3%, which is significantly
better than random sampling of gap-filling reactions (hyper-geometric P-value <10-16; Figure 2; Additional file 1, part 6, and Supp. Table 1 in Additional file 1). As controls,
we assessed the predictive performance of using only the functional genomics data
based on the computed reaction weights (by ordering potential gap-filling reactions
based on their computed weights), and the predictive performance of MIRAGE without
utilizing functional genomics data (by assigning reactions with random weights; as
done in the MBA algorithm). Using only the functional genomics data, the resulting
predictive performance was significantly lower than that of MIRAGE (Figure 2), reaching a precision of 6.1%, under a recall level of 19.6% (P-value = 2 × 10-9). Without utilizing functional genomics data, the predictive performance was also
markedly lower, with a precision of 27.5% and recall of 20.6% (P-value <10-16). Using only gene expression [48] or phylogenetic weights (based on all species in KEGG) provided lower precision of
31.8% and 36.9%, respectively, with slightly lower recall levels (19.6% and 22.4%,
respectively) to those achieved when utilizing both (Figure 2), demonstrating the importance of integrating multiple functional-genomics data sources.
As a further control, we applied MIRAGE to reconstruct a metabolic network model for
E. coli, without prior knowledge of exchange reactions (which in the above analysis were
taken from the model of iAF1260), finding an overall similar predictive performance,
showing an improvement of MIRAGE compared to other approaches (Supp. Table 2 in Additional
file 1).

Figure 2.MIRAGE's predictive performance on reconstructing a known metabolic network of E. coli. The precision and recall of MIRAGE is marked with a star symbol. The precision and
recall of several controls, including variants of MIRAGE that utilize only phylogenetic
data, only expression data, or no functional-genomics data, are marked with a triangle,
bar, and circle, respectively. The predictive performance of the functional genomic
data (that is, by ordering potential gap-filling reactions based on their computed
functional genomic weights, without utilizing metabolic flux analysis) is shown by
the straight lines: the performance of the phylogenetic data, gene expression, and
both data sources are colored green, yellow, and purple, respectively. The performance
of random predictions of gap-filling reactions is colored blue.

Comparing the predictive performance of MIRAGE on reconstructing the metabolic network
of E. coli with that of Model SEED [22] has shown a marked advantage to the former. While the number of core reactions considered
by MIRAGE and the SEED algorithm in the reconstruction of a metabolic network model
of E. coli is close (812 and 826 reactions for MIRAGE and SEED, respectively), the number of
predicted gap-filling reactions by MIRAGE was 62, in comparison to only 10 by SEED.
This results from MIRAGE's aim to resolve all gap-filling problems instead of just
enabling biomass production as performed by SEED. The precision of MIRAGE's predictions
was significantly higher than that of SEED, reaching 41.9% for MIRAGE versus 10% for
SEED. Re-running MIRAGE given the very same definition of a biomass reaction used
in the SEED reconstruction of E. coli's model (rather than the biomass definition taken from iAF1260) still resulted in
a higher number of 76 predicted gap-filling reactions, with a significantly higher
precision of 34.2% than that achieved by SEED.

To demonstrate the utility of MIRAGE, we applied it to reconstruct genome-scale metabolic
network models for 36 cyanobacteria for which genomic data are available to define
core reactions sets. Our analysis spans all cyanobacteria for which enzyme annotations
are available in KEGG, including Synechocystis, Synechococcus, Cyanobacteria, Prochlorococcus, Anabaena, and so on [49]. For all species, we considered the same biomass function, obtained from a previously
reconstructed model of Synechocystis sp. PCC 6803 [50], assuming that CO2 is the sole carbon source. Due to lack of comprehensive gene expression for most cyanobacteria
species, we utilized here only phylogenetic data (considering all species in KEGG)
to define reaction weights.

The average size of a core reactions set for a cyanobacteria network is 570 reactions
(Figure 3a), out of which, 331 reactions belong to all of the 36 network cores (Figure 3b). The high degree of similarity between the reaction cores of the various cyanobacteria
species reflects the current knowledge on common metabolic processes across these
species, obtained mostly from sequence comparisons. These shared core reactions belong
to highly conserved metabolic pathways, such as glycolysis, gluconeogenesis, and the
TCA cycle among others. MIRAGE's predictions extend these networks in a species-specific
manner, with many reactions predicted to belong to a small number of species (Figure
3b). These species-specific reactions belong to more peripheral pathways, for example,
diterpenoid biosynthesis, fluorene degradation and others.

Figure 3.Statistics on MIRAGE's reconstructed cyanobacteria metabolic networks. (a) The number of core reactions (blue) and predicted gap-filling reactions (red) in the
various reconstructed cyanobacteria models. (b) A histogram of core reactions (blue) and predicted gap-filling reactions (red) that
participate in different numbers of reconstructed cyanobacteria models. As shown,
cyanobacteria network cores consist of many reactions that are known to exist in all
36 species, while many of the predicted gap-filling reactions are species-specific.

To evaluate the performance of MIRAGE in reconstructing cyanobacteria models, we compared
a reconstructed network model for Synechocystis sp. PCC 6803 with the manually curated models of Knoop et al. [50] and iSyn811 [51,52]. In this case, MIRAGE was applied to reconstruct a Synechocystis model by further utilizing gene expression data obtained from Tu et al. [53] as part of the reconstruction process (Materials and methods). The comparison shows
a predictive precision of 70% and recall of 24.6% for the Knoop et al. model [50] and precision of 37.5% and recall of 45% for iSyn811 [51,52]. These results are significantly better than random sampling (hyper-geometric P-values are 2.99 × 10-27 and 3.59 × 10-31 for Knoop et al.'s model and iSyn811, respectively). Again, we find that the predictive performance
of either the functional genomics data or the flux analysis alone is far worse (Figure
4). A comparison with the predictive performance of Model SEED was not possible in
this case, as the SEED algorithm was not applied to reconstruct cyanobacteria models
(focusing only on well-studied and annotated genomes). As a further evaluation criterion,
we performed a BLAST [54] search of the known enzyme sequences catalyzing the predicted gap-filling reactions
in other species against the genomes of the corresponding cyanobacteria. Reassuringly,
we found that the resulting BLAST E-scores show significantly higher sequence similarity for the set of predicted reactions
in comparison to a random set of reactions (t-test of 1.04 × 10-74). Moreover, 20.3% of predicted reactions showed E-values below 10-100, compared to 9.7% of randomly sampled reactions, testifying the overall correctness
of the predicted set of reactions.

Figure 4.MIRAGE's predictive performance on reconstructing a known metabolic network of the
cyanobacteria Synechocystis sp. PCC6803. (a) Metabolic network after Knoop et al. [50]; (b) metabolic network after Montagud et al. [51,52]. The precision and recall of MIRAGE is marked with a star symbol. The precision and
recall of a variant of MIRAGE that does not utilize functional-genomics data is marked
with a circle. The predictive performance of the functional genomic data (without
metabolic flux analysis) is shown by the purple line. The performance of random predictions
of gap-filling reactions is colored blue.

As a further evaluation of our reconstructed Synechocystis model, we applied it to predict gene knockout lethality data provided by [50]. We find that the prediction performance of our model is comparable with that of
Knoop et al. (Supp. Table 3 in Additional file 1): out of 39 genes known to be non-essential, Knoop
et al. correctly predicted 35, while our model correctly predicts 38. Out of 11 known essential
genes, Knoop et al. correctly predicted 7, while our model correctly predicts 6. The fact that our automatically
generated model reaches a similar level of prediction performance to that of a manually
curated model demonstrates the applicability and importance of our model reconstruction
approach.

To demonstrate the applicability of the reconstructed cyanobacteria networks, we applied
a computational metabolic engineering approach called Optknock [55] on these networks to rationally design genetic modifications that would increase
the production of astaxanthin, which is a powerful antioxidant belonging to the carotenoid
family. These metabolites are known to be produced by various cyanobacteria [56,57]. Optknock works by searching for gene knockouts that would couple the maximal production
and secretion of a molecule of interest with a naturally selected trait of maximizing
growth rate. Notably, 24 of the original core networks extracted from KEGG include
an astaxanthin production reaction, though only 8 of these are not dead-end. In contrast,
25 of the network models reconstructed by MIRAGE have a functional astaxanthin production
pathway, amenable for Optknock analysis.

The application of Optknock for astaxanthin production identified double gene knockouts
in 15 species that are expected to lead to astaxanthin secretion (Table 1). For 12 out of these, Optknock predicts the knockout of dimethylallyl-diphosphate:
isopentenyl-diphosphate dimethylallyl transtransferase (EC: 2.5.1.1), which consumes
an essential precursor for astaxanthin biosynthesis (1-hydroxy-2-methyl-2-butenyl4-diphosphate).
The maximal achievable astaxanthin production rate reaches 2.18 μmol gDW-1 h-1 in Prochlorococcus marinus 9601, representing a carbon utilization of 40% for astaxanthin
production (considering a CO2 uptake rate of 0.22 mmol gDW-1 h-1 [50]). This utilization of CO2 to produce astaxanthin is predicted to reduce growth rate by 57% relative to the wild-type
Prochlorococcus strain (Table 1).

Conclusions

Our paper presents a novel method, MIRAGE, for reconstructing metabolic network models
by integrating metabolic flux analysis and functional genomics data to resolve network
gaps. MIRAGE was validated based on a comparison of its predictions with manually
curated metabolic networks for E. coli [46] and Synechocystis sp. PCC 6803 [50-52]. Then it was applied to reconstruct metabolic network models for an ensemble of cyanobacteria,
with the resulting networks shown to be amenable for metabolic engineering applications
of astaxanthin secretion.

Our results show that functional genomics data enable the marked improvement of gap-filling
in metabolic networks. Furthermore, we show that the integration of more than one
type of functional genomics data can further improve the performance of MIRAGE. Naturally,
MIRAGE can be extended to account for additional functional genomics data, including
protein-protein interactions and genomic context data, which were previously used
for the identification of missing gene annotations in metabolic networks [38]. Metabolomics data can also be integrated within MIRAGE, to enable the definition
of a metabolite core, consisting of metabolites that are known to be synthesized,
and hence corresponding pathways that connect them to the rest of the network must
be identified [26].

Several existing gap-filling methods work by searching for a minimal set of missing
reactions that would enable the network to perform a certain task [22]. MIRAGE extends upon these methods by enabling the identification of pathways that
are not necessarily minimal in size, if supported by functional genomics data. However,
MIRAGE is still limited in being unable to predict the presence of alternative pathways,
in case either one is sufficient to fulfill its defined objectives. This may explain
the relatively low recall levels achieved by MIRAGE and the other tested approaches.
For example, this was demonstrated in Figure 1, where reaction R6 will not be predicted for gap-filling, as an alternative pathway
that fulfills the required metabolic objectives was chosen. The identification of
alternative pathways based on more complex integration of functional genomics data
with metabolic flux analysis is currently an open challenge for all known gap-filling
algorithms. An additional limitation of MIRAGE is that it does not explicitly account
for thermodynamic considerations as part of the network reconstruction process. Future
implementation of this approach may formulate additional thermodynamic constraints
as part of the model consistency check, as suggested in Thermodynamic Metabolic Flux
Analysis (TMFA) [58] (which would require further speedups to obtain reasonable running times).

Metabolic models generated by automated methods such as MIRAGE should be regarded
as first draft models, requiring further manual curation to bring them up to comparable
level with standard manually curated models. The growing interest in reconstructing
metabolic network models for hundreds of species raises the challenge of developing
improved such gap-filling approaches that could speed up the reconstruction process,
while the approach presented here shows a marked improvement in this direction over
the state-of-the-art, supporting the advantage of integrating functional genomic data
as part of model reconstruction. We expect MIRAGE to be used for automatic reconstructions
of many other species, leading to a significant boost in the understanding of their
metabolism.

Materials and methods

Step I: calculation of functional genomics weights

Binary vectors describing reaction phylogenetic profiles were acquired from KEGG.
A phylogenetic weight for each non-core reaction is defined as the maximal Jaccard
similarity with a phylogenetic profile of a core reaction that shares a metabolite
substrate with the reaction at hand. The Jaccard values are normalized based on the
frequency of appearance of the shared metabolite in the universal reaction database
(see Additional file 1 for details).

An expression weight for a given non-core reaction is computed by evaluating the correlation
between profiles of genes that may potentially code for an enzyme catalyzing the reaction
at hand and expression profiles of genes associated with neighboring reactions. Specifically,
for each gene in the genome of the target species, we compute the average Pearson
correlation between its expression profile and profiles of genes associated with neighboring
reactions, with the expression weight defined as the maximal such correlation obtained.
All Pearson correlations are normalized by the frequency of appearance of the connecting
metabolites (as done above).

The distribution of phylogenetic and expression weights are normalized to having the
same mean and standard deviation. Final edge weights are defined based on the sum
of normalized phylogenetic and expression weights. Reactions for which either weight
is missing are assigned the median normalized value.

First, we create a random reaction pruning list by iteratively sampling the next reaction
with probability proportional to its weight normalized by the sum of weights of the
remaining non-sampled reactions. Next, we scan through the obtained reaction list
and try to remove each reaction in turn from the model, as long as the resulting model
remains consistent. The consistency check involves verifying that: (i) each core reaction
can carry non-zero flux under steady-state and reaction directionality constraints;
(ii) there can be non-zero flux through the biomass reaction; and (iii) the growth-dilution
of each metabolite in the network is accounted for. Once we finish scanning through
the pruning list, we are left with a minimal functional model. We repeat the random
pruning procedure 500 times and count the number of times that each non-core reaction
appeared in the final model. Finally, we order the non-core reactions based on their
frequencies (from low to high) and repeat the pruning step to obtain the final model.

The details of the above and the implementation of the various speed-up techniques
that makes this algorithm computationally tractable are described in Additional file
1. The implementation of MIRAGE is available at [44].