Monday, July 4, 2016

Future perspectives for stoichiometric modeling

These are a set of notes I wrote for a specific proposed stoichiometric modeling project. I've edited out some of the details because someone is actually working on this project, but I think what remains is still worth reading.

Areas of opportunity in stoichiometric modeling:

I think local optimization techniques that can find “maximum likelihood” relationships between omics data and flux sets in FBA models are an area of huge opportunity in stoichiometric modeling.

Schellenberger (http://www.jbc.org/content/284/9/5457.full) discusses “random sampling” analysis, and I think some of the techniques he mentions could be applicable to our system. However, I think non-random sampling might be even more exciting.

Another subfield of computational biology is evolutionary genetics. In evolutionary genetics, one of the main problems is to reconstruct phylogenetic trees from modern sequencing data. It is often the case that provably optimal phylogenetic trees cannot be computed, because the problem is too computationally complex. So a modeler has to settle for a solution that is good, but not provably optimal. To arrive at these solutions they use Bayesian or “maximum likelihood” approaches. These kinds of approaches are not yet widespread in stoichiometric modeling. But I think they should be.

Differences between stoichiometric models and ODE models:

A stoichiometric model is just a set of reaction stoichiometries. Typically, the modeler defines specific rates of inputs to and outputs from the system, and the computer calculates a set of internal fluxes that allow for those exchange fluxes (a procedure called Flux Balance Analysis, or FBA). Because flux balance analysis problems often have infinitely many solutions, the solution set has to be narrowed down somehow, such as by eliminating internal loops or by eliminating reactions that have no tissue-specific expression support. Stoichiometric models can also be used to predict metabolic phenotype from gene expression data. Many current methods are summarized at http://cobramethods.wikidot.com/.
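
To make the "infinite number of solutions" point concrete, here is a minimal sketch of the steady-state mass balance that underlies FBA. It is not a full FBA (there is no objective function or linear programming solver). The toy network, the reaction names, and the helper functions are all hypothetical and only meant for illustration.

```python
# Toy network (hypothetical, for illustration only):
#   uptake:  -> A
#   r1:      A -> B   (route 1)
#   r2:      A -> B   (route 2, parallel to r1)
#   export:  B ->
REACTIONS = ["uptake", "r1", "r2", "export"]
METABOLITES = ["A", "B"]

# Stoichiometric matrix S: one row per metabolite, one column per reaction.
S = [
    [1, -1, -1, 0],   # A: made by uptake, consumed by r1 and r2
    [0, 1, 1, -1],    # B: made by r1 and r2, consumed by export
]


def net_production(S, fluxes):
    """Return S . v, the net production rate of each metabolite."""
    return [sum(coef * v for coef, v in zip(row, fluxes)) for row in S]


def is_steady_state(S, fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted."""
    return all(abs(x) <= tol for x in net_production(S, fluxes))


# Exchange fluxes (uptake, export) are fixed at 10 units.
# Only the split between the two internal routes varies.
candidates = [
    [10, 10, 0, 10],
    [10, 5, 5, 10],
    [10, 2.5, 7.5, 10],
    [10, 6, 6, 10],   # internal fluxes sum to 12, so A is not balanced
]
results = [is_steady_state(S, v) for v in candidates]
print(results)
```

The first three candidates all satisfy the same exchange fluxes, and so would any other split where r1 + r2 = 10. That remaining freedom is what has to be narrowed down with additional constraints or an objective.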

The other major kind of metabolic modeling is kinetic (ODE) modeling. Whereas stoichiometric modeling focuses on fluxes (with the exception of certain specialized techniques such as dynamic FBA), ODE modeling focuses more on concentrations. An ODE model consists of a set of equations describing how reaction rates depend on the concentrations of metabolites. The advantage of ODE models is that they can capture complex, indirect relations such as feedback inhibition. A major disadvantage is that they can require detailed descriptions of all of the enzymes in a system, which are often not available, not correct, or correct only in certain circumstances. To some extent, these disadvantages can be compensated for by using formalisms such as Metabolic Control Analysis and Biochemical Systems Theory, which rely on only a small number of parameters for each enzyme rather than a full kinetic description (I would recommend taking one of these two approaches in any new kinetic model).
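
As a rough illustration of what "feedback inhibition" looks like in an ODE model, here is a minimal sketch of a two-metabolite pathway (-> A -> B ->) in which the end product B inhibits the first step. The rate laws, the parameter values, and the `simulate` function are assumptions chosen for simplicity, not taken from any real enzyme, and the integrator is plain forward Euler rather than a proper ODE solver.

```python
def simulate(ki, vmax=1.0, k2=2.0, k3=1.0, dt=0.01, n_steps=5000):
    """Integrate the toy pathway with forward Euler and return final (A, B).

    ki is the inhibition constant for B acting on step 1.
    Pass float("inf") to switch the feedback off.
    """
    a, b = 0.0, 0.0
    for _ in range(n_steps):
        v1 = vmax / (1.0 + b / ki)   # supply of A, inhibited by B
        v2 = k2 * a                  # A -> B, first-order in A
        v3 = k3 * b                  # removal of B, first-order in B
        a += dt * (v1 - v2)
        b += dt * (v2 - v3)
    return a, b


a_fb, b_fb = simulate(ki=1.0)               # with feedback inhibition
a_free, b_free = simulate(ki=float("inf"))  # without feedback

print("B with feedback:   ", round(b_fb, 4))
print("B without feedback:", round(b_free, 4))
```

With these assumed parameters, the steady state with feedback satisfies B(1 + B) = 1, so B settles near 0.618, compared with 1.0 when the feedback is removed. A purely stoichiometric model has no place to put that concentration-dependent effect.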

Using stoichiometric models to compare wild-type with mutant phenotype:

One of the obvious experiments is to run RNA-seq on both the wild-type and the mutant, and compare gene expression levels. Analyzing expression of genes that encode enzymes will be substantially easier if a high-quality stoichiometric model is available. The “state of the art” in automated transcriptome annotation and model generation is pretty poor. The automatically generated models that I've looked at have been rife with obvious mis-annotations. A careful, semi-automated (but well-curated) reconstruction of a metabolic network for a model plant would enable substantially more reliable analyses of quantitative transcriptomic data, which would be useful not just for this project but also for many other projects in the wider plant biology community. For example, the human metabolic model made by Duarte et al. (2007) [1] has been used in numerous other studies (including Fan et al. 2014) and cited more than 700 times. It would be interesting to survey those citations and see what exactly it is being cited for, but at least some of the time, as in Fan et al. 2014, it is leading to insights into human physiology. If nothing else, a carefully reconstructed model enhances the accuracy of visualizations of omics data mapped onto the metabolic network graph (because it improves the accuracy of the mapping). That in itself, I think, is sufficient justification for funding the construction of a model.

In addition to improving omics data visualization, a stoichiometric model can also be used to predict fluxes from omics data, with one of the methods reviewed by Blazier and Papin (2012) [2]. If a subnetwork of 100 or fewer of the most relevant reactions can be identified, my own Flux Rank method (unpublished) could be used. These methods would answer the same kinds of questions as a manual evaluation of a color mapping of expression levels onto a network visualization: they would indicate whether decreased flux due to decreased enzyme gene expression, in the absence of additional regulatory interactions (such as feedback inhibition, enzyme phosphorylation, or any other post-transcriptional regulation), is alone sufficient to explain the mutant phenotype. It would also be possible to try to account for other kinds of regulation using stoichiometric modeling techniques (although kinetic modeling may be better suited for this). For example, if there were a hypothesized post-transcriptional regulatory interaction between two proteins, one could incorporate that into the model. Using the E-Flux [3] method, for instance, you would modify the bounds of a reaction based not only on the expression of enzymes directly involved in that reaction, but also on other transcripts that are hypothesized to have a regulatory interaction with it. Analogous operations are possible using other modeling methods. The problem with trying to infer these kinds of long-distance regulatory interactions from a stoichiometric model is that, with just two conditions, there is the potential for a huge number of false positives: interactions that are not real, but would be consistent with the expression data and product accumulation data for those two conditions.
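
The bound-modification idea can be sketched in a few lines. This is a simplified illustration in the spirit of expression-scaled bounds, not the published E-Flux implementation. The function name, the normalization to the most highly expressed reaction, and the multiplicative `regulators` factor are all my own assumptions for the example.

```python
def expression_scaled_bounds(expression, max_flux=1000.0, regulators=None):
    """Map expression levels to (lower, upper) flux bounds per reaction.

    expression: dict of reaction name -> expression level (arbitrary units)
    regulators: optional dict of reaction name -> factor in [0, 1] that
                encodes a hypothesized regulatory interaction (assumption)
    """
    regulators = regulators or {}
    top = max(expression.values())
    bounds = {}
    for rxn, level in expression.items():
        scale = level / top                 # relative expression, 0..1
        scale *= regulators.get(rxn, 1.0)   # extra hypothesized regulation
        bounds[rxn] = (0.0, max_flux * scale)
    return bounds


# Hypothetical expression levels for three reactions in the mutant.
mutant_expression = {"r1": 80.0, "r2": 40.0, "r3": 20.0}

# Hypothesis: some other transcript halves the capacity of r2.
bounds = expression_scaled_bounds(
    mutant_expression, max_flux=100.0, regulators={"r2": 0.5}
)
print(bounds)
```

Here r2 ends up with the same upper bound as r3 even though its own transcript is twice as abundant, which is the kind of effect a hypothesized regulatory interaction would add. The resulting bounds would then be passed to whatever flux balance solver is being used.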

To decrease the number of false positives among predicted interactions, a subnetwork of the stoichiometric model could be converted into a kinetic model. The kinetic model could be constructed based on Michaelis-Menten Kinetics (MMK), the Metabolic Control Analysis (MCA) formalism (David Fell), or the Biochemical Systems Theory (BST) formalism (Eberhard Voit). A kinetic model would be most useful for pathways where there is at least some knowledge (complete knowledge would be helpful, but not essential) of the concentrations of pathway intermediates in both the wild-type and the mutant. After choosing the pathways to model, we'd attempt to find model parameters that are consistent with the expression data and with the known kinetic parameters of the enzymes involved. Keeping within those constraints, the parameters should be further adjusted so that they are consistent with the metabolomics data. We would hypothesize that without incorporating long-distance regulatory interactions, no model matching the experimental data (for both conditions) would be possible. We would then try potential regulatory interactions until we are able to find parameters that cause the model to match the experimental data. If there are sufficiently few of these candidate interactions, they can be tested experimentally. The kinetic modeling approach may to some extent suffer from the same problem of an overwhelming number of false positive predictions as regulatory approaches with stoichiometric models. However, it has the advantage that, in addition to boundary fluxes, it also predicts internal metabolite concentrations, so if these can be measured, they can be used as constraints on the parameter space of a kinetic model (that is, as dependent variables), but not of a stoichiometric model (at least not as easily or directly).
Counteracting this advantage is the fact that kinetic models also have more independent variables: whereas reactions in a stoichiometric model are associated with just two independent variables (lower bound and upper bound), reactions in a kinetic model may be associated with many more independent variables. Fortunately, using formalisms like MCA and BST, the numbers of parameters associated with each reaction in a kinetic model can be kept to a minimum, which will hopefully allow the model to not be so under-determined that it becomes useless.
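
To give a feel for the kind of rate law involved, here is a minimal sketch comparing a Michaelis-Menten rate with a BST-style power-law rate. The function names and all numeric values are hypothetical. The power-law form (a rate constant times each concentration raised to a kinetic order) is the standard BST representation; a negative kinetic order stands in for inhibition.

```python
def michaelis_menten(s, vmax, km):
    """Irreversible single-substrate Michaelis-Menten rate."""
    return vmax * s / (km + s)


def power_law(alpha, concentrations, kinetic_orders):
    """BST-style rate: alpha * product of X_i ** g_i.

    Only metabolites listed in kinetic_orders affect the rate, so the
    parameter count is one rate constant plus one kinetic order per
    affecting metabolite.
    """
    rate = alpha
    for name, g in kinetic_orders.items():
        rate *= concentrations[name] ** g
    return rate


# Hypothetical concentrations: substrate S and an inhibitor I.
conc = {"S": 4.0, "I": 2.0}

v_mm = michaelis_menten(conc["S"], vmax=10.0, km=1.0)
v_pl = power_law(2.0, conc, {"S": 0.5, "I": -1.0})
n_params = 1 + len({"S": 0.5, "I": -1.0})

print(v_mm, v_pl, n_params)
```

In this sketch the inhibited power-law reaction needs three parameters in total, and adding another regulator costs exactly one more kinetic order. That predictable, small parameter count is what makes these formalisms attractive when the data are limited.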