This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Carbon-13 (13C) analysis is a commonly used method for estimating reaction rates in biochemical networks. The choice of carbon labeling pattern is an important consideration when designing these experiments. We present a novel Monte Carlo algorithm for finding the optimal substrate input label for a particular experimental objective (flux or flux ratio). Unlike previous work, this method does not require assumption of the flux distribution beforehand.

Results

Using a large E. coli isotopomer model, different commercially available substrate labeling patterns were tested computationally for their ability to determine reaction fluxes. The choice of optimal labeled substrate was found to be dependent upon the desired experimental objective. Many commercially available labels are predicted to be outperformed by complex labeling patterns. Based on Monte Carlo Sampling, the dimensionality of experimental data was found to be considerably less than anticipated, suggesting that effectiveness of 13C experiments for determining reaction fluxes across a large-scale metabolic network is less than previously believed.

Conclusions

While 13C analysis is a useful tool in systems biology, high redundancy in measurements limits the information that can be obtained from each experiment. It is however possible to compute potential limitations before an experiment is run and predict whether, and to what degree, the rate of each reaction can be resolved.

Background

In vivo metabolic reaction flux data provides insight into the dynamic function of the cell [1-3]. One widely-used experimental method for measuring in vivo reaction fluxes is steady-state substrate 13C isotope labeling [4-6]. An overview of the general 13C methods is described in Figure ​Figure1.1. Isotopomers, or isomers created from inserting labeled isotopes (often 13C) at different positions in a molecule, provide a unique way to track the progress of carbon through a metabolic network. By measuring the enrichment for 13C in metabolite pools after growing on a 13C labeled substrate, inferences about the internal flux state can be made. The approach can be summarized as a data fitting problem between simulated and experimentally measured 13C labeled metabolite concentrations. An isotopomer model, describing the positional transfer of carbon atoms for all or a subset of reactions in the network, is used to simulate data (Figure ​(Figure1a).1a). For a specified carbon input label, an isotopomer model enables the calculation of an isotopomer distribution vector (IDV) corresponding to a particular simulated steady-state flux distribution (Figure ​(Figure1b).1b). Mass spectrometry (MS) experiments on 13C-labeled metabolites (e.g. macromolecules) generate fractional 13C enrichments from fragmented macromolecules, forming a mass distribution vector (MDV) (Figure ​(Figure1c).1c). The error between the measured MDV and the MDV corresponding to the simulated IDV summarizes how well the presumed flux distribution fits the 13C experiment. The flux distribution v that minimizes this error can be computed by solving a non-linear optimization problem. Simulating 13C enrichment given a flux distribution is computationally inexpensive; however, the inverse problem of calculating the flux distribution that best fits a 13C experiment is both of greater interest and significantly more computationally difficult (Figure ​(Figure1d).1d). A review of these methods and associated challenges can be found in [6-8].

There are several distinct sources of variability in a 13C experiment that limit the confidence with which particular reactions can be determined. First, due to experimental accuracy limitations and biological variability, uncertainty arises in the experimentally measured MDV. Second, due to alternate pathways present in metabolic networks, the mass balance equations underlying a metabolic steady-state are significantly under-determined [9]. While the full network flux distribution may not be resolved at high confidence by a given experiment, certain labeling patterns may resolve fluxes through certain pathways with greater confidence than other labeling patterns, as has previously been shown [4,10].

For a given n-carbon compound, there are 2n possible 13C labeling states (as well as mixtures), and the choice of label is known to affect the ability to determine reactions fluxes [10]. As 13C methods are based upon computational modeling of isotopomer distributions, it is possible to computationally optimize the choice of substrate labeling pattern to enhance the information gained from an experiment. There are two primary motivations that drive such an endeavor. First, 13C experiments are expensive, so choosing the best experiment a priori is desirable. Second, we can assess the capability of the steady-state 13C labeling approach towards determining reaction fluxes in an unbiased manner. The issue of optimization of 13C labeling experiments has been addressed in the literature [4,10,11]. However, the use of flux sampling for optimal isotopomer experiment prediction has not been explored previously, and this approach presents several unique advantages over previous methods.

We describe a Monte Carlo sampling-based method for choosing the optimal substrate label, based upon the Constraint-Based Reconstruction and Analysis (COBRA) computational platform [12,13]. COBRA methods use manually-curated biochemical network reconstructions of known reaction stoichiometries and measurable nutrient uptake and secretion rates to define feasible ranges for internal reaction fluxes. Many of these reconstructions have been generated [14] and the procedure is well-established [13,15]. These models can be used for methods such as computing growth rates [16,17], predicting the effects of gene knockouts [16,18,19], predicting the endpoint of adaptive evolutions [20], and designing strains for industrial production [21,22]. A review of these methods can be found here [12,13,23]. Monte Carlo sampling of constraint-based metabolic models can be used to generate sets of biochemically feasible flux distributions that obey measured uptake and secretion rate constraints [24]. IDVs generated from these flux distributions in an isotopomer model can then be compared against simulated 13C data to evaluate the ability of the experiment to determine reaction fluxes. Monte Carlo sampling takes advantage of the speed with which IDVs can be simulated from putative flux distributions, making this approach suitable for large-scale analysis of in silico experiments.

A Monte Carlo sampling approach was implemented using a newly developed isotopomer model to evaluate the efficiency of different carbon labeling patterns toward determining reaction fluxes in E. coli. The dimensionality of simulated 13C data was calculated using singular value decomposition (SVD) for different substrate labeling patterns and compared to the number of undetermined dimensions in the network. 13C experiments were performed for three substrate labeling patterns to validate the prior theoretical analysis. The methods developed represent a flexible computational analysis that can be applied to various biological systems and experimental setups to estimate, a priori, the efficiency of isotopomer experiments in determining reaction fluxes.

Results and Discussion

Expanded Isotopomer Model

An isotopomer model was constructed in two phases. First, a central metabolic isotopomer model that accounts for 85 reactions including glycolysis, the TCA cycle, the pentose phosphate pathway, oxidative phosphorylation, pyruvate metabolism, and anaplerotic reactions was derived from the iJR904 E. coli reconstruction [16]. This initial model was equivalent in reaction content to commonly used isotopomer models for E. coli [25,26].

An expanded model was then constructed that includes both central and biosynthetic pathways. The iMC1010 metabolic network [19] was evaluated to determine which reactions can sustain non-zero fluxes during growth on glucose, acetate, or lactate when only certain by-products are allowed to be secreted (acetate, formate, D-lactate, pyruvate, succinate, glycerol, CO2, and ethanol). Blocked reactions, which must have zero net flux at steady state, were subsequently omitted from consideration. Groups of reactions that could be merged together without affecting model results (e.g. linear pathways) were combined in order to reduce the number of variables. Large sets of biosynthesis reactions that produce phospholipids, nucleotides, co-factors were also combined, since there no experimental measurements existed for these high-carbon metabolites. However, by-products resulting from high-carbon metabolite production (e.g. CO2, formate, succinate, fumarate, and pyruvate) that could enter back into the metabolic network were tracked. Of the original 932 reactions in the complete metabolic iMC1010 network, nearly a third were represented in the biosynthetic isotopomer model, either individually or as grouped reactions.

The final isotopomer model accounts for a total of 313 irreversible reactions, including 278 which track carbon. Inclusion of these additional pathways is likely important for accurate assessment of the flux-resolving power of 13C experiments both within and beyond central metabolism [7]. A complete listing of the reactions and metabolites in the biosynthetic network can be found in the Additional File 1.

Monte Carlo Sampling Approach

To compute possible flux distributions of the E. coli model, the network was sampled using a Markov Chain, Monte Carlo (MCMC) sampling algorithm (see Methods). The steady-state mass balance and uptake rate constraints for the metabolic network create a convex hyperspace that contains all biochemically feasible steady-state flux distributions [27]. Monte Carlo sampling generates a set of flux distributions that are spread uniformly throughout the feasible space. The inclusion of 13C experimental data reduces the feasible space in which the true flux state must lie by requiring that the IDV calculated from the putative flux distribution must match the experimental data within error. While the space of feasible flux distributions depends only on reaction stoichiometry, the space of resulting simulated IDVs differs depending on the input substrate labeling pattern. Hence, different labeling patterns can have differing abilities to resolve each reaction flux.

Here, we used Monte Carlo sampling of flux distributions to analyze the degree to which reaction fluxes can be determined by steady-state 13C labeling experiments in terms of several possible experimental objectives. For example, one possible experimental objective is to determine whether a particular reaction has a flux above or below a specified value. For this objective, a well-designed labeling pattern would be one in which flux distributions that have an objective reaction flux greater than the specified value can be easily distinguished from flux distributions with an objective reaction flux less than the specified value. As seen in Figure ​Figure2,2, a hypothetical experiment 1 produces measurement distributions which overlap whereas experiment 2 shows greater separation. If one were interested in differentiating between the two partitions, experiment 2 would be much preferable. This method allows for the scoring of any label for any given experimental objective without first knowing the true cellular flux distribution v.

Method Overview. a) The space of flux distributions is partitioned in two parts corresponding to 'high' flux versus 'low' flux. A uniform random sample is drawn from the space and is also partitioned into partition 1 and partition 2. b) For each point...

Generating and Evaluating 13C Experimental Hypotheses

An experimental hypothesis is defined as a partition of the sampled flux distribution set. While many possible hypotheses could be considered, two rational hypotheses were studied. The first case attempts to elucidate whether a reaction has high or low flux (hi-lo). The solution space is partitioned into all points with vj >threshold versus vj <threshold. A different hypothesis is generated for each reaction j. The threshold was chosen to be the median of all vj so that half of all points would be in each of the two partitions. The second set of hypotheses tested consisted of biologically relevant flux ratios. For each point the ratio of two reactions, vi/vj, was determined to be above or below some threshold that formed a partition.

Intuitively, a hypothesis score should be high if the isotopomer distributions coming from one partition are distinguishable from distributions in the other partition. While there are several ways of doing this, we chose a heuristic metric based on a Z-score, which is commonly used to determine the difference between two samples. A Z-score was calculated for each fragment (element) of the calculated MDV for each simulated flux distribution:

Zi=|x ¯hi-x ¯lo|shi2+slo2+σ2

where x ¯hi and x ¯lo are the average fragment enrichments for the upper and lower partitions, respectively, shi2 and slo2 are the variances of fragment enrichments for the upper and lower partitions, respectively, and the α is a constant equal to 0.014. α is on the order of magnitude of the uncertainty in measurements. This slight modification to the standard Z-score puts a lower bound on the expected experimental variation. The Z-score of each fragment is added together to give the Z-score of the experiment.

Z= ∑i∈fragmentZi

Using this approach, candidate flux states were sampled uniformly and experimental hypotheses tested. Z-scores were calculated for the hi-lo hypothesis corresponding to 1) individual reactions 2) reaction ratios and 3) two 'random' reactions or ratios. Random hypotheses were tested to estimate the level of noise associated with the set of flux distributions. Raw and normalized Z-scores are given in the Additional File 1. Z-scores varied from the level of noise to a maximum of >20-fold the level of noise.

To illustrate the differences in label-dependent reaction resolving capacity, two sets of Z-scores corresponding to [1-13C] glucose and [6-13C] glucose are plotted in Figure ​Figure3.3. Lighter colors indicate higher Z-scores and ease of measurement. In this case, [6-13C] glucose scores higher at measuring the pentose phosphate pathway and most of lower glycolysis, whereas [1-13C] glucose glucose scores much higher at measuring the glyoxylate shunt. The results suggest that there is no single label that yields a high score for all experimental objectives. For example, the exchange of formate (EX_for) could be easiest measured with a [1,2-13C] glucose; however, this labeling pattern is bested by [1-13C] glucose for the measurement of reaction formyltetrahydrofolate deformylase (FTHFD) (Figure ​(Figure4).4). This non-universality of labels is in line with expectations, as it has been previously shown that the choice of labels can affect the flux resolution. For many reactions, the best experiment that could be performed involves hypothetical (non-commercially available) labels. One example is the ratio of phosphofructokinase (PFK) flux to fructose bisphosphate aldolase (FBA) flux. The best label for determining this ratio is [1,2,3-13C] glucose (Z = 28.0), which gives a much higher Z-score than the best commercially available label, [1,2-13C] glucose (Z = 18.5). Thus, there may be motivation to synthesize compounds with more complex labeling patterns than commonly used. Additionally, there are certain reactions which are predicted to be difficult to measure with any labeling pattern. For example, the Z-scores for each possible labeled glucose substrate for the reaction pyruvate oxidase (POX) all lie within the level of noise, as determined by the comparison with random hi-lo experiment Z-scores.

Computational Evaluation of Glucose. Potential glucose labels are evaluated based on both absolute Z-scores and Z-scores normalized with respect to the labeling pattern with the highest score. Glucose labels are listed on top including hypothetical labeling...

In addition to label-specific reaction flux elucidation properties, 13C experiments show a clear pathway bias regardless of labeling pattern. The maximum Z-score of all labeling patterns was found for each reaction, giving a metric for the maximum potential for reaction flux determination using 13C-labeled glucose (Figure ​(Figure5A).5A). Then, the fraction of reactions that had a maximum potential at least twice the noise level was found for each subsystem (Figure ​(Figure5b).5b). Reactions that were stoichiometrically fixed by the measured constraints on acetate, glucose, D-lactate, oxygen, growth rate, and ATP maintenance were also categorized by subsystem. Stoichiometrically fixed (i.e. constraint-determined) reactions have a confidence interval of zero, and thus are label-independent and receive no additional knowledge from 13C experiments. It was found that histidine, valine, leucine, and isoleucine metabolism fluxes are completely identified solely based on the flux constraints. On the other hand, prior constraints fix none of the fluxes in central carbon metabolic systems such as glycolysis, citric acid cycle, pentose phosphate pathway, and anaplerotic reactions; however, fluxes in these pathways are all predicted to be identifiable with a 13C experiment using optimal labeling patterns for each reaction. This result is expected as these identifiable pathways are the typical pathways being studied using 13C analysis. Other pathways, such as cysteine, threonine, and lysine metabolism, are completely identifiable through a combination of prior stoichiometric constraints combined with well-chosen 13C experiments. However, many of the remaining subsystems have a fraction of reactions that cannot be determined using any 13C-labeling pattern of glucose. In particular, no additional information can be obtained from 13C-labeled glucose experiments about certain biosynthetic pathways, nucleotide salvage pathways, reductive citric acid cycle reactions, and certain alternate pathways peripheral to glycolysis, such as an alternate pathways from DHAP to D-lactate. Measuring metabolites other than amino acids may give more information on these pathways. Note that in this discussion of identifiability, the Z-score metric indicates that an experiment can significantly reduce the confidence interval of a particular reaction but does not specifically predict the value of the confidence interval. Confidence intervals are directly calculated for experimental data sets in a later section and compared to the Z-scores for the same labeling patterns.

Pathway Analysis of Optimal 13C Experiments. a) The maximum Z-score of all labeling patterns for each non-fixed reaction was found. Notably, greater than 50% of reactions have no experiment with a predicted Z-score greater than twice the noise level....

Dimensionality of Isotopomer Data

The Monte Carlo sampling approach enables the determination of the dimensionality of simulated 13C experiments for a particular substrate labeling pattern. The dimensionality gives an indication of the degree to which a particular substrate labeling pattern can specify the free dimensions inherent in a network structure, given a set experimental error. In an extreme case, if all the data falls on one point (zero dimensions), no additional information is given from the data. Similarly, a dimensionality of one indicates that the data can specify one degree of freedom. Singular value decomposition (SVD) is a data reduction technique that allows the estimation of data dimensionality (Figure ​(Figure6a).6a). A data matrix M of size (nfragments x npoints), consisting of all sample points generated from Monte Carlo Sampling, is decomposed into M = U · Σ · V T where U and V are orthonormal bases and Σ is a diagonal matrix containing singular values in descending order. The singular values are effectively weightings that describe the information content of the corresponding vectors in U and V towards reconstructing the full matrix M. A partial reconstruction of M is possible by taking only a subset of the singular values greater than some threshold. These thresholds have a direct interpretation as the uncertainty with which a data point can be measured. For example, a threshold cutoff of 0.01 indicates that the remaining uncertainty of the data falls within 0.01 or 1% error in the measurement of isotope enrichment.

Data Dimensionality with SVD. The linear dimensionality of experimental data space is measured with Singular Value Decomposition. a) The E. coli model has 313 reactions and 139 degrees of freedom. The isotopomer fragments were computed for a random sample...

To determine the dimensionality of the isotopomer data, SVD was performed on 13C fragments derived from uniformly sampled flux distribution sets for several glucose labels. The results are summarized in Figure ​Figure6b.6b. Globally, the choice of glucose labels affects the dimensionality of the resulting isotopomer data set. At the 1% (0.01) threshold, the label with the highest dimensionality was hypothetical [1,2,5-13C] glucose with 73 dimensions. The three labels for which experimental data was measured in the subsequent section, [1-13C] glucose, [6-13C] glucose, and 20% [U-13C] glucose, had dimensions 53, 53 and 35, respectively, at this cutoff. These values are all significantly lower than the best label, and, in particular, the uniform labeled experiment only produces half of the dimensionality as the optimal experiment. This result is significant. While 139 dimensions (the number of undetermined dimensions for the model used) are required to specify a unique flux vector, the dimensionality of the 13C data for each label is significantly lower. The best labeling experiment specifies just over half (73/139 = 0.52) the degrees of freedom required, and 20% [U-13C] glucose only specifies about one fourth of the possible degrees of freedom (0.26). It is worth noting that SVD is a linear operation used to approximate properties of a non-linear system and the true degrees of freedom may be even lower than reported. SVD serves as a useful upper bound on the dimensionality of data for non-linear systems, but the difference between SVD dimensionality and true dimensionality may grow to be unacceptable for large systems. For the system studied here, SVD was found to be of practical use.

Experimental Validation

In order to assess the agreement of computationally predicted flux elucidation capacity with experimental data, we took fluxomic measurements for three labeling patterns in E. coli. Flux distributions that best explain each set of 13C data were calculated using a non-linear optimization problem:

minvError(v)subjectto:vmin<v<vmaxS⋅v=0

The function Error(v) is a score of how well a given flux distribution fits the experimental data. It is defined as:

Error(v) = ∑i∈fragments(fragmenti(v)-measuredi)2σ2

where measured i is the measured fractional enrichment of fragment i, fragment i(v) is the computed fractional enrichment of fragment i as a function of the flux distribution v, and α = 0:014 is the standard deviation of the fragments as calculated from experimental replicates.

Reaction flux confidence ranges were then computed for all reactions using all three sets of 13C data and all combinations thereof. Confidence intervals for reaction rates were computed by maximizing and minimizing the value of each reaction in turn subject to a slightly relaxed score.

minα/maxαciT⋅N⋅αsubjectto:vmin≤N⋅α≤vmaxError (N⋅α)≤Errormax

where ci = (0, 0, ...0, 1, 0...0) is a vector of all zeros with a 1 in position i, and Errormax was set based on the confidence value. Because different data sets provide different levels of consistency, Errormax was chosen to be 30 units greater than the minimum error found.

These intervals were compared with Z-scores calculated through Monte Carlo methods to assess the ability of the Z-scores to predict the size of experimental reaction ranges in a label-specific manner (Figure ​(Figure7a).7a). The Z-scores were found to be correlated with the relative flux ranges in a statistically significant manner (Student's T-test, p <8.6 × 10-34). A receiver operating characteristic (ROC) curve suggests that the Z-scores can identify with both sensitivity and specificity the reactions that can be elucidated in a label-specific manner, with better performance predicting ranges that are restricted more by data (Figure ​(Figure7b).7b). These findings indicate that the Z-score is indeed a useful predictor of the degree a flux range will be constrained by a particular 13C experiment and provide experimental support for the computational approach taken.

Comparison of Calculated Z-scores and Experimental Flux Ranges. a) For several reactions, the computed Z scores are compared to the resulting measured flux ranges. Z-scores show (color coded) Z-scores for each of the 12 reactions and three glucose labels....

The number of reactions elucidated at particular confidence intervals was then found (Figure ​(Figure8).8). Using different labels provides different levels of reaction confidence (Additional File 1). Including no 13C data generates the largest flux ranges (lower black line), while adding 13C data reduces the ranges and shifts the curve left. With almost no exception, including one experiment yields larger confidence intervals than any combination of two carbon sources which in turn is a larger range than including all three sets. Of the single experiment curves, the 20% [U-13C] glucose curve provides notably worse ranges than the other two experiments, consistent with the finding that 20% [U-13C] substrate provides data with the smallest number of dimensions.

Reaction Confidence Intervals with 13C Data. The number of reactions determined at a particular confidence given different data is plotted. The range of allowable fluxes (vmax - vmin) were computed for each reaction, constrained by none, one, two or three...

At a reaction confidence of 1 mmol · gDW-1·h-1 (a relatively non-stringent cutoff), 85 reactions are specified simply from uptake rate data without any 13C data. Performing the least informative 13C experiment, using 20% [U-13C] substrate, yields 105 reactions that meet the confidence criterion, whereas the combination of all three 13C experiments yields 125 reactions that meet the criterion. In other words, performing all three experiments will increase the number of elucidated reactions by 40 reactions or about 50%. As the model used contains 278 carbon-tracking reactions and reaction groups, the increase in knowledge at 1 mmol · gDW-1·h-1 confidence from 85 to 125 reactions from using 13C data indicates that a large gap in the knowledge remains. It seems apparent that other methods must be developed to obtain flux information at the genome scale from single experiments, as would ultimately be desirable. However, as noted in the above section, the majority of reactions that are elucidated by 13C-labeled glucose experiments lie in central metabolic pathways, which tend to be both of high interest and not-specifiable by constraints alone.

Conclusions

We introduce a new framework for calculating the uncertainty inherent to 13C experiments using Monte Carlo Sampling. This allows us to predict the success of experiments before performing them. The method used here 1) does not require experimental identification of the 'real' flux state a priori [10] and 2) reports scores for the resolution capability for each reaction as opposed to Boolean identification calls [11]. This framework reveals several key findings:

• The choice of input label is important, as different labels perform better than others. In particular, the commonly-used 20% mixture of uniform label + 80% natural label was shown computationally and experimentally to resolve significantly fewer reaction fluxes than either [1-13C] glucose or [6-13C] glucose. Thus, the amount of information likely to be obtained by a 13C experiment can be predicted in a reaction-specific manner before having to carry out an experiment.

• There is no universally best label. The best label depends on the experimental objective. Certain reactions are more precisely measured with some labels than others, and no label is best at elucidating all reactions. Certain hypothetical 13C labels of glucose, for example [1,2,3-13C] glucose and [1,2,5-13C] glucose, are predicted to perform better than commercially available single labels for many reactions.

• The 13C data dimensionality is less than anticipated. Whereas each 13C experiment can measure 186 pieces of information at a time, there is a high degree of interdependence. We measured the true data dimensionality to be between 35 and 50 dimensions for commercially available labels and as high as 73 for exotic labels. This high data redundancy can partially explain why 13C experiments yield many reaction rates with high uncertainties.

This study suggests limitations of steady-state 13C analysis using solely amino acids due to the lower than expected dimensionality of the isotopomer data. However, steady-state 13C analysis is clearly still useful elucidating reaction fluxes in E. coli metabolism. Notably, as the study was conducted using only protein-derived amino acids, it would be of immediate interest to determine the additional benefit of measuring other classes of labeled metabolites, as well as the benefit of more recently developed experimental techniques such as multi-substrate [10] and dynamic flux labeling [28,29] experiments. Monte Carlo methods are generic and thus are well suited to be adapted to the experimental setup of interest. Additionally, Monte Carlo methods are amenable to biasing the sampling space based on known data to improve results; however, this is expected to incur a corresponding cost in convergence time. Possible sources of error in the current method include unwanted bias in the sampled flux space, possible inadequacy of the Z-score as a statistical metric over more sophisticated tests such as the Kolmogorov-Smirnov test, and inadequacy of the distribution median as a threshold for use in hi-lo hypotheses. Conducting Monte Carlo isotopomer analysis on new systems will become more accessible with the availability of the open source COBRA Toolbox v2.0 for MATLAB, which includes the algorithms presented here [30].

Methods

Isotopomer Network Description

The isotopomer network was derived from the iMC1010 E. coli reconstruction [19]. The content is reported in the Additional File 1. There are a total of 313 irreversible reactions including 278 that track carbon. All carbon tracking reactions are broken into elementary forward and reverse reactions.

First, a central metabolic isotopomer model was generated that includes a total of 85 reactions, including a biomass production reaction, which drains the precursor metabolites used to make biomass, and 14 system boundary exchange fluxes (for glucose, oxygen, phosphate, NO2, NO3, acetate, CO2, ethanol, formate, fumarate, glycerol, D-lactate, pyruvate, and succinate). The biomass composition is based on one that was reported previously [16,31] and used in the biosynthetic isotopomer model (see details below), but where the biomass components are replaced by the amount of ATP, NADH, NADPH, and central metabolic precursors needed to synthesize the biomass components (Additional File 1). The remaining 70 reactions participate in glycolysis, TCA cycle, pentose phosphate pathway, oxidative phosphorylation, pyruvate metabolism, and anaplerotic metabolism. The central metabolic isotopomer model includes linear mass balance equations for 67 metabolites. Carbon atoms are tracked through 46 metabolites in the core metabolic network. Changes to the central metabolic reactions include assigning fumarate reductase to utilize menaquinone and demethylmenaquinone rather than ubiquinone and adding a phosphate transport reaction coupled to proton symport.

Aside from the central metabolic reactions contained in the central isotopomer model, the biosynthetic model also includes a number of other catabolic and anabolic reactions. The fluxes were calculated with an additional constraint that flux through formyltetrahydrofolate deformylase (which removes the C1 unit from 10-formyltetrahydrofolate) was less than or equal to the measured formate secretion flux. When higher flux through this reaction was allowed the minimum error improved by only 0.3%, but the flux through this reaction was high (around half the glucose uptake rate). To adjust this aberrant behavior, the optimal flux distributions and confidence intervals were calculated with this additional constraint on the formyltetrahydrofolate deformylase flux. The resulting biosynthetic isotopomer model includes 189 metabolites (126 of which have tracked carbon atoms), 313 irreversible metabolic reactions (63 of which are reversible and involve tracked carbon atoms), and 8,612 isotopomer variables (which is equal to the number of non-linear isotopomer mass balance constraints). The model also includes a biomass reaction and 19 system boundary exchange reactions. The biomass reaction was altered to include amino acids, nucleotides, co-factors, and macromolecules rather than their precursor metabolites. In addition, the biosynthetic model balances intracellular protons as well as water molecules similar to iJR904 [16].

Monte Carlo Sampling

With traditional MCMC, a point is selected within the space which is then iteratively moved around. At each step, a random direction is chosen and the next point is chosen uniformly along this line. The set of points that this algorithm visits will converge to a uniformly distributed set. Two modifications were made: 1) Artificial Centering [32] - Because these biological spaces tend to be elongated in one direction, it is often beneficial to choose directions along the "long" direction rather than uniformly. This can be done by choosing the direction based on previously visited points. At each step, the direction is chosen by drawing a vector from the center of the previous points to one of the previous points chosen at random. 2) In place sampling - Instead of moving just one point throughout the space, many points are moved simultaneously. In this way, no "history" is kept, only the updated position of all the points. This method is described in greater detail in other literature [33,34].

Computing the Isotopomer Distribution

Each flux distribution and glucose input results in a unique isotopomer distribution. The cumomer method [35] and the elementary metabolite unit (EMU) method [36] were implemented in Matlab and utilized in calculating isotopomer distributions. These methods involve solving several linear systems of equations to compute different groups of isotopomers. For numerical reasons, a routine is introduced which checks whether all parts of the network are still connected at every step. Disconnected components can occur when fluxes to and from the component are zero, making it impossible to compute a unique isotopomer distribution within this subnetwork as many isotopomers satisfy the balance equations. By removing these components first, the other metabolites can be solved in a numerically stable fashion. The resulting isotopomers for the amino acids for each flux distribution are transformed to a mass distribution. This way, each experiment is abstracted to a (number of distributions) × (number of fragments) matrix.

Sample Preparation and 13C Measurement

Culture labeling

Prior to labeling, single colonies of E. coli K12 MG1655 were selected from stock plates and inoculated directly into 250 ml M9 medium in 500 Erlenmeyer flasks aerated by stirring at 1000 rpm. Cells were grown overnight, harvested, washed twice with water and used to inoculate 50 ml flasks containing 25 ml medium with 2 g/L 13C-labeled D-glucose, with initial OD600 0.005-0.01. Glucose was supplied as either 100% [1-13C]-labeled, 100% [6-13C]-labeled, or a mixture of 20% uniformly [U-13C]-labeled with 80% natural glucose (which is randomly 1% 13C). Cells were grown to mid-log phase, corresponding to OD600 of 0.6. 3 ml of each culture was harvested by centrifugation at 4°C. The media was aspirated and analyzed with HPLC to determine the remaining glucose concentration. Cell pellets were placed at -80°C prior to further analysis.

Derivatization and GC-MS analysis

Cells were resuspended in 0.1 ml 6 M HCl and transferred to glass vials. Protein was digested into amino acids under a nitrogen atmosphere for 18 hr at 105°C in an Eldex H/D Work Station. Digested samples were dried to remove residual HCl, resuspended with 75 μl each of tetrahydrofuran and N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide (Aldrich), and incubated for 1 hr at 80°C to derivatize amino acids. Samples were filtered through 0.2 μm PVDF filters and injected into a Shimadzu QP2010 Plus GC-MS (0.5 μl with 1:50 split ratio). GC injection temperature was 250°C and the GC oven temperature was initially 130°C for 4 min, rising to 230°C at 4°C/min and to 280°C at 20°C/min with a final hold at this temperature for 2 min. GC flow rate with helium carrier gas was 50 cm/s. The GC column used was a 15 m × 0.25 mm × 0.25 m SHRXI-5ms (Shimadzu). GC-MS interface temperature was 300 degrees with 70 eV ionization voltage. The mass spectrometer was set to scan an m/z range of 50 to 600.

Processing of GC-MS data

Mass data were retrieved from the GC-MS for fragments of 14 derivatized amino acids: cysteine and tryptophan were degraded during amino acid hydrolysis; asparagine and glutamine were converted respectively to aspartate and glutamate; arginine was not stable to the derivatization procedure. For each fragment, these data comprised mass intensities for the base isotopomer (without any heavy isotopes, M+0), and isotopomers with increasing unit mass (up to M+6) relative to that of M+0. These mass distributions were normalized by dividing by the sum of M+0 to M+6, and corrected for naturally-occurring heavy isotopes of the elements H, N, O, Si, S, and (in moieties from the derivatizing reagent) C, using matrix-based probabilistic methods as described [37,38] implemented in Microsoft Excel. Data were also corrected for carry-over of unlabeled inoculum [37].

Computing Reaction Rates from 13C Data

Reaction rates were computed from 13C data as described in the Results. In calculating the best fit flux values from experimental data, a small variation was introduced to reduce the number of variables and remove constraints. Let N be a basis for the null space of S. Then all valid fluxes can be written as:

v=N⋅αminαError N⋅αsubjectto:vmin<N⋅α<vmax

This reduced the number of variables from |v| = 335 to |α| = 139. Optimization was performed with the Tomlab/SNOPT package. This method is an iterative local optimization and is therefore not guaranteed to find the optimal solution. To address the issue of local minima, the procedure was run with many randomly generated starting points and the lowest minimum was taken.

Code and Equipment

The code was written in the MATLAB environment and the COBRA toolbox. Linear Programming was done with the Tomlab/CPLEX package and nonlinear optimization with the TOMLAB/SNOPT interface. The EMU and cumomer method were written in native Matlab but generated in Perl. Computations were performed on a Dell Studio XPS desktops (2.6 Ghz core i7 with 9-12 GB ram) and a custom Rocks cluster (100 dual Xeon 5500 series nodes).

Supplementary Material

Additional file 1:

Z-scores, confidence intervals, and isotopomer model. This file contains the absolute and relative Z-scores for individual reactions across all glucose labeling patterns tested, confidence intervals calculated using experimental 13C tracing data, and details of the model that was used in calculations.