INTRODUCTION

Genome sequencing and annotation have enabled the development of genome-scale constraint-based metabolic models for hundreds of microbes. These models have been used to characterize and predict the metabolic potential and behavior of a diverse collection of bacteria, eukaryotes, and archaea, including those with medical, biotechnological, and environmental applications. Initial models were built to study individual microbes grown in monoculture; however, over the past 10 years, modeling efforts have been extended to study metabolic interactions between microbes in synthetic and natural microbiomes. The remaining sections describe how constraint-based models are built from genomic information, and how these models have been used to answer qualitative and quantitative questions regarding cellular metabolism for individual species and microbial communities.

Constraint-based metabolic models are built from an organism’s genome-scale metabolic network reconstruction. A metabolic reconstruction details the enzymatic and transport reactions that an organism can catalyze and the genes responsible for these reactions. An organism’s genome annotation is one of the primary sources of information used to reconstruct a metabolic network. Metabolic and transport genes are identified, and elementally balanced and charge-balanced reactions associated with these genes are included in the reconstruction. Because many reactions commonly occur across species, a variety of metabolic databases and tools can be used to facilitate reconstructing metabolic networks (Hamilton and Reed, 2014). Databases such as KEGG (Kanehisa and Goto, 2000), MetaCyc (Krieger et al., 2004), and Model SEED (Henry et al., 2010) can be used to translate genome annotations into draft metabolic reconstructions. These reconstructions may contain metabolic gaps due to missing reactions that either occur spontaneously or are associated with genes that are incorrectly or incompletely annotated. These metabolic gaps can be identified and resolved by converting the metabolic reconstructions into constraint-based metabolic models.

___________________

a Department of Chemical and Biological Engineering, University of Wisconsin–Madison.

Constraint-based metabolic models calculate intracellular flux distributions that satisfy three fundamental types of constraints (Price et al., 2004). The first type of constraint is a steady-state mass-balance constraint, which sets the total production and consumption rates for each metabolite to be equal. This ensures that there is no net accumulation or depletion of intracellular metabolites. These mass-balance constraints can be used when metabolism is in a steady or a quasi-steady state. The second type of constraint is associated with reaction reversibility and ensures that irreversible reactions can only operate in the appropriate directions. This reversibility constraint was traditionally derived from biochemical and physiological data, but more recently it has also been determined using thermodynamic estimates of the change in Gibbs energy of a reaction (Henry et al., 2007; Fleming et al., 2009). The third type of constraint is the enzyme capacity constraint. For a subset of reactions where the flux capacities are known or measured, upper and lower bounds for fluxes can be imposed. In most cases, capacity constraints limit a small number of fluxes that can easily be measured experimentally, such as growth rates, nutrient uptake rates, or product secretion rates. Together, these three types of constraints define a solution space of possible intracellular flux distributions. Because constraint-based models often have multiple solutions, optimization can be used to identify optimal flux distributions, including those that maximize biomass yields, minimize enzyme usage or total flux, or minimize flux changes (Orth et al., 2010). In the vast majority of constraint-based models, kinetic parameters and regulatory effects are not included, but such constraints can be included if this information is available (Covert et al., 2004; Yizhak et al., 2010; Cotten and Reed, 2013).
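The three constraint types and the optimization step described above are commonly written as a linear program. The following is a sketch in standard flux balance analysis notation; the symbols (S, v, c, the bound vectors, and the index sets) are assumed here for illustration and are not taken from the original text.

```latex
\begin{align}
\max_{v} \quad & c^{\top} v
  && \text{(objective, e.g., biomass production)} \\
\text{s.t.} \quad & S v = 0
  && \text{(steady-state mass balance)} \\
& v_j \ge 0 \quad \forall j \in \mathcal{I}
  && \text{(irreversible reactions)} \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \forall j \in \mathcal{M}
  && \text{(enzyme capacity constraints)}
\end{align}
```

Here S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the flux or fluxes to optimize, I is the set of irreversible reactions, and M is the subset of reactions with known or measured capacities. Any flux vector satisfying the three constraint sets lies in the solution space; the objective picks one optimal point, or one optimal face, within it.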

Constraint-based models were initially built for individual species, and hundreds of such models exist for various bacteria, eukaryotes, and archaea (see Systems Biology Research Group, 2017, for a maintained list of models that have been validated against experimental data). Recently, Magnusdottir and colleagues reported the development of a semiautomated pipeline that was used to build 773 individual metabolic models for microbes found in the human gut microbiome (Magnusdottir et al., 2016). Multispecies models have also been developed over the past 10 years. In most of these multispecies models, the reactions and metabolites in each species are accounted for separately, meaning that metabolite production and consumption rates in each species are balanced. These multispecies models also allow for the exchange of metabolites between species by introducing an additional compartment to the model representing the media or shared environment. By modeling the shared environment explicitly, the relative or absolute abundance of different species can be predicted and accounted for. To date, most multispecies models have been developed for synthetic and natural communities containing just a few species, although this is likely to be expanded in the coming years.
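The compartment structure described above can be illustrated with a short sketch. It assumes a hypothetical naming convention in which extracellular metabolites end in "_e"; the function name, the toy species, and the toy reactions are all invented for illustration and are not elementally balanced.

```python
def build_community(species_models, shared_suffix="_e"):
    """Merge single-species reaction lists into one community network.

    species_models: dict species name -> dict reaction ID -> stoichiometry,
        where stoichiometry maps metabolite ID -> coefficient
        (negative = consumed, positive = produced).
    Metabolites whose ID ends with shared_suffix go into one shared
    environment compartment; all others stay private to their species.
    """
    community = {}
    for species, reactions in species_models.items():
        for rxn_id, stoich in reactions.items():
            merged = {}
            for met, coeff in stoich.items():
                if met.endswith(shared_suffix):
                    # Shared pool: same ID for every species.
                    merged[met] = coeff
                else:
                    # Species-specific metabolite with its own mass balance.
                    merged[f"{species}__{met}"] = coeff
            community[f"{species}__{rxn_id}"] = merged
    return community


# Hypothetical two-species example: A secretes acetate, B consumes it.
SPECIES_MODELS = {
    "A": {
        "ferment": {"glucose_e": -1, "pyruvate": 2},
        "secrete": {"pyruvate": -1, "acetate_e": 1},
    },
    "B": {
        "uptake": {"acetate_e": -1, "acetyl_coa": 1},
    },
}

community = build_community(SPECIES_MODELS)
for rxn_id, stoich in community.items():
    print(rxn_id, stoich)
```

Prefixing intracellular metabolites keeps each species' mass balances separate, while the unprefixed extracellular metabolites are the only route by which one species' products can reach another.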

While constraint-based models are inherently quantitative, meaning they provide numerical values for all fluxes in the metabolic network, they can be used to answer qualitative and quantitative questions about the metabolic behavior of an organism or microbial community. Qualitative predictions typically require less physiological information because the results are qualitatively insensitive to the enzyme capacity constraints imposed. However, if quantitative predictions are desired, then more physiological data are needed to constrain the metabolic models.

Constraint-based models can be used to answer a variety of qualitative questions regarding cellular metabolism. These models can be used to predict nutrient utilization, minimal medium requirements, product secretion, pathway utilization, gene essentiality, synthetic lethality, and missing reactions from network reconstructions. The amount of data needed to generate qualitative predictions is typically lower than that for quantitative predictions; however, the types of data needed depend on the questions being asked. To answer qualitative questions related to growth or cellular fitness, the metabolic reconstruction and a list of biomass components are needed. Here, the biomass components include the chemicals that must be produced to generate new cells, including amino acids, nucleic acids, lipids, and cofactors; what is typically not needed for qualitative predictions are measurements of biomass composition or uptake and secretion rates. While the model-predicted fluxes will depend on the enzyme capacity constraints imposed, the qualitative output of the model will not change if the capacity constraints are scaled up or down.
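The qualitative growth question described above can be illustrated with a deliberately simplified sketch. It does not solve a flux balance problem; instead it uses a reachability (network expansion) check, a cruder stand-in that ignores stoichiometry and cofactor cycling. The toy network, metabolite names, and function names are hypothetical.

```python
def producible_metabolites(reactions, medium):
    """Return every metabolite reachable from the medium.

    reactions: dict reaction ID -> (substrates, products), each a set.
    medium:    set of metabolites supplied by the environment.
    """
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # A reaction can fire once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def can_grow(reactions, medium, biomass_components):
    """Qualitative growth call: are all biomass components producible?"""
    return set(biomass_components) <= producible_metabolites(reactions, medium)


# Hypothetical toy network (all names are invented for illustration).
TOY_REACTIONS = {
    "R1": ({"glucose"}, {"pyruvate", "atp"}),
    "R2": ({"pyruvate", "ammonium"}, {"alanine"}),
    "R3": ({"pyruvate", "atp"}, {"lipid"}),
}
BIOMASS = {"alanine", "lipid"}

print(can_grow(TOY_REACTIONS, {"glucose", "ammonium"}, BIOMASS))  # True
print(can_grow(TOY_REACTIONS, {"glucose"}, BIOMASS))              # False
```

Note that the answer depends only on the reaction list, the medium, and the list of biomass components. No uptake rates or biomass composition measurements enter the calculation, which mirrors why qualitative predictions need less physiological data.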

Individual species models have been successfully used to predict what genes and nutrients are essential for growth. In this case, the models determine whether nutrients present in the media or environment can be converted by the metabolic reactions into all biomass components. In the case of gene deletion simulations, reactions associated with these genes are removed by constraining the associated fluxes to zero. Metabolic models of Escherichia

Discrepancies between qualitative model predictions and experimental results can be used to improve the metabolic models and refine genome annotations (Orth and Palsson, 2010). As noted earlier, constraint-based models can be used to help identify and fill gaps in draft metabolic models in a process referred to as gap filling. Missing reactions and isozymes in draft models can be identified by resolving discrepancies where the models predict no growth, but cells grow experimentally. Previously, mispredictions associated with carbon source utilization, gene essentiality, and synthetic lethality have been used to add reactions and genes to the metabolic models (Reed et al., 2006; Henry et al., 2009; Zomorrodi and Maranas, 2010). Similarly, reactions and genes can be removed from the models to resolve discrepancies where the models predict growth but the cells do not grow experimentally (Kumar and Maranas, 2009). With the development of high-throughput mutant fitness experiments like TnSeq (van Opijnen and Camilli, 2013) and BarSeq (Wetmore et al., 2015), these gene essentiality comparisons will become more readily available to help probe and improve constraint-based models for a variety of organisms.
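The two kinds of discrepancy described above can be sorted with a few lines of code. This is a minimal sketch: the function name, the report keys, and the example knockout labels are hypothetical, and it only triages conditions rather than performing gap filling itself.

```python
def classify_discrepancies(predicted_growth, observed_growth):
    """Sort conditions by how model and experiment disagree.

    Both arguments map a condition label (e.g., a gene knockout or a
    carbon source) to True (growth) or False (no growth).
    """
    report = {"agree": [], "gap_fill_candidates": [], "removal_candidates": []}
    for condition in sorted(predicted_growth.keys() & observed_growth.keys()):
        predicted = predicted_growth[condition]
        observed = observed_growth[condition]
        if predicted == observed:
            report["agree"].append(condition)
        elif observed:
            # Model predicts no growth but cells grow: a reaction or
            # isozyme may be missing from the reconstruction.
            report["gap_fill_candidates"].append(condition)
        else:
            # Model predicts growth but cells do not grow: a reaction or
            # gene association may need to be removed.
            report["removal_candidates"].append(condition)
    return report


# Hypothetical knockout screen (labels and outcomes are invented).
PREDICTED = {"geneA_ko": True, "geneB_ko": False, "geneC_ko": True}
OBSERVED = {"geneA_ko": True, "geneB_ko": True, "geneC_ko": False}

print(classify_discrepancies(PREDICTED, OBSERVED))
```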

Multispecies models have been used to predict the types of interactions that might exist between microbes in a community (Heinken and Thiele, 2015; Magnusdottir et al., 2016). These predictions were made based on how predicted growth rates change for each organism between monoculture and co-culture conditions. For monoculture simulations, the individual growth rate was maximized, while the sum of both microbes’ growth rates was maximized for co-culture simulations. Magnusdottir and colleagues recently predicted all pairwise interactions between 773 microbes found in the human gut microbiome under four different dietary conditions (Magnusdottir et al., 2016). Microbial growth was predicted to increase or decrease in co-culture if the co-culture growth rate was more than 10% higher or lower, respectively, than the monoculture growth rate. Most interactions were predicted to be parasitic (38-41%), in which one microbe’s growth rate increases while the other’s decreases, or commensal (22-30%), in which one microbe’s growth rate increases without affecting the other’s. The types of interactions between pairs of microbes depended on diet and oxygen conditions. For example, a low- versus high-fiber diet impacted the numbers of commensal interactions, while aerobic versus anaerobic conditions mostly impacted the numbers of mutualistic and amensal interactions (Magnusdottir et al., 2016).
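The classification rule described above can be sketched as follows. The sketch assumes that the 10% threshold is a relative change with respect to the monoculture growth rate. The function names are hypothetical, and the labels "competition" and "neutralism" are standard ecological terms added here to cover the remaining sign combinations; they are not named in the text above.

```python
def growth_effect(mono, co, threshold=0.10):
    """Return +1, -1, or 0 for increased, decreased, or unchanged growth."""
    if co > mono * (1 + threshold):
        return 1
    if co < mono * (1 - threshold):
        return -1
    return 0


# Keys are (larger effect, smaller effect) after sorting.
INTERACTION_TYPES = {
    (1, 1): "mutualism",
    (1, 0): "commensalism",
    (1, -1): "parasitism",
    (0, 0): "neutralism",
    (0, -1): "amensalism",
    (-1, -1): "competition",
}


def classify_interaction(mono_a, co_a, mono_b, co_b, threshold=0.10):
    """Label a pairwise interaction from monoculture and co-culture growth rates."""
    effects = (
        growth_effect(mono_a, co_a, threshold),
        growth_effect(mono_b, co_b, threshold),
    )
    # Sort so that (+1, -1) and (-1, +1) map to the same label.
    key = tuple(sorted(effects, reverse=True))
    return INTERACTION_TYPES[key]


# Hypothetical growth rates (1/h): A grows faster in co-culture, B slower.
print(classify_interaction(mono_a=0.20, co_a=0.30, mono_b=0.20, co_b=0.10))
```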

To answer quantitative metabolic questions, more information is typically needed to constrain the genome-scale metabolic models. This information can include measurements of biomass component composition of cells, uptake and secretion rates, growth rates, and kinetic parameters. Such information can be used by the models to predict uptake and secretion rates, intracellular fluxes in metabolic pathways, metabolite concentrations, growth rates, interspecies fluxes, and community composition.

Individual species models have frequently been used to predict metabolic fluxes in response to genetic or environmental changes. A number of constraint-based methods have been developed specifically for this purpose; they identify flux distributions that minimize flux differences between perturbed and unperturbed states (Segre et al., 2002; Shlomi et al., 2005; Kim and Reed, 2012). These methods have been used to successfully predict central metabolic fluxes and growth rates for a variety of gene knockout mutants, including mutants of E. coli, S. cerevisiae, and B. subtilis, and for different growth conditions (Kim and Reed, 2012). The accuracy of these tools has enabled the use of metabolic models for metabolic engineering strain design. For example, these tools can identify combinations of metabolic additions and deletions that couple growth to product formation (Burgard and Maranas, 2003; Kim et al., 2011) or that maximize productivity (Patil et al., 2005). These tools have been used to design strains that produce polymer precursors (Fong et al., 2005), nutraceuticals (Lee et al., 2007; Park et al., 2007), and commodity chemicals (Kim and Reed, 2010).
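One way to express the idea of minimizing flux differences is a Euclidean-distance objective, as used in the minimization of metabolic adjustment approach of Segre et al. (2002). The sketch below uses assumed notation (S, v, a reference flux vector, bound vectors, and a set D of reactions disabled by the perturbation); the other cited methods use different measures of flux change and are not shown.

```latex
\begin{align}
\min_{v} \quad & \sum_{j} \left( v_j - v_j^{\mathrm{ref}} \right)^2
  && \text{(distance from the unperturbed state)} \\
\text{s.t.} \quad & S v = 0
  && \text{(steady-state mass balance)} \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \forall j
  && \text{(reversibility and capacity bounds)} \\
& v_k = 0 \quad \forall k \in \mathcal{D}
  && \text{(reactions removed by the gene deletion)}
\end{align}
```

Here the reference flux vector is a flux distribution for the unperturbed strain, for example one obtained by maximizing growth, and the solution is the feasible mutant flux distribution closest to it.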

A number of studies have used multispecies models to predict community-, interspecies-, and intraspecies-level fluxes in microbial communities. One of the first community models was developed for a syntrophic community containing the sulfate reducer Desulfovibrio vulgaris and the methanogen Methanococcus maripaludis. Stolyar and colleagues used community measurements of lactate and hydrogen fluxes to predict acetate, methane, carbon dioxide, and biomass production rates (Stolyar et al., 2007). Wintermute and Silver used metabolic models to predict how pairs of E. coli auxotrophs would grow, and found that a strain’s growth in co-culture was correlated with the ratio of the growth benefit for acquiring that strain’s essential nutrients to the cost of producing those nutrients by the other partner strain (Wintermute and Silver, 2010). Dynamic multispecies models have been developed that can capture changes in community composition over time when species have different growth rates. Such models have been developed for communities that degrade cellulose (Salimi et al., 2010), co-utilize glucose and xylose (Hanly and Henson, 2011), cross-feed amino acids (Zhang and Reed, 2014), and reduce uranium (Zhuang et al., 2011). Zhuang and colleagues developed a dynamic community model of Rhodoferax ferrireducens and Geobacter sulfurreducens that included kinetic parameters for nutrient uptake rates. The model accurately predicted changes in G. sulfurreducens abundance in response to acetate amendment as a function of ammonium availability (Zhuang et al., 2011). In many multispecies models, either individual species are assumed to maximize their own growth rate or the combined growth rates of all species in the community are maximized. In contrast, the OptCom modeling framework was developed to allow both community- and species-level objective functions to be optimized (Zomorrodi and Maranas, 2012). Application of OptCom to different synthetic and natural communities found that some microbes in phototrophic microbial mats reduce their species-level biomass production to increase community-level biomass production, while microbes in a synthetic community representing a subsurface anaerobic environment maximize community and individual species biomass production (Zomorrodi and Maranas, 2012).
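The way dynamic multispecies models track community composition over time can be illustrated with a simplified sketch. It keeps the kinetic uptake bound and the time-stepping of a dynamic model but replaces the inner flux balance problem with a fixed biomass yield per unit of substrate, which is a strong simplification. The function name, species names, and all parameter values are hypothetical.

```python
def simulate_coculture(species, substrate, t_end=10.0, dt=0.01):
    """Euler integration of species biomass and one shared substrate.

    species: dict name -> dict with keys "biomass" (gDW/L),
        "q_max" (mmol/gDW/h), "k_m" (mmol/L), and "yield" (gDW/mmol).
    substrate: initial shared substrate concentration (mmol/L).
    Returns (final biomass per species, final substrate concentration).
    """
    biomass = {name: p["biomass"] for name, p in species.items()}
    for _ in range(int(round(t_end / dt))):
        # Kinetic upper bound on each uptake flux (saturation kinetics).
        uptake = {
            name: p["q_max"] * substrate / (p["k_m"] + substrate)
            for name, p in species.items()
        }
        # Scale uptake down if this step would use more than remains.
        demand = sum(uptake[name] * biomass[name] for name in species) * dt
        scale = min(1.0, substrate / demand) if demand > 0 else 0.0
        for name, p in species.items():
            consumed = uptake[name] * scale * biomass[name] * dt
            # Stand-in for the inner optimization: growth = yield * uptake.
            biomass[name] += p["yield"] * consumed
            substrate -= consumed
        substrate = max(substrate, 0.0)
    return biomass, substrate


# Hypothetical parameters: two species that differ only in maximum uptake rate.
SPECIES = {
    "fast": {"biomass": 0.01, "q_max": 10.0, "k_m": 0.5, "yield": 0.05},
    "slow": {"biomass": 0.01, "q_max": 5.0, "k_m": 0.5, "yield": 0.05},
}

final_biomass, final_substrate = simulate_coculture(SPECIES, substrate=20.0)
total = sum(final_biomass.values())
for name, value in final_biomass.items():
    print(f"{name}: {value / total:.2f} of community biomass")
```

In a full dynamic multispecies model, the fixed-yield line would be replaced by solving each species' constraint-based model at every time step, with the kinetic uptake rate imposed as a bound on the corresponding exchange flux.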

CHALLENGES AND FUTURE DIRECTIONS

While modeling microbiomes is an exciting and expanding area of research, there are a number of experimental and computational challenges that need to be overcome to move the field forward in its ability to more accurately predict the qualitative and quantitative behaviors of microbial communities. A current limitation for modeling microbial communities is a lack of experimental inter- and intraspecies flux measurements, which are needed to evaluate and improve model predictions. Monoculture measurements of intracellular and extracellular fluxes have been invaluable for the development of modeling approaches and the identification of objective functions that best predict monoculture behaviors (Burgard et al., 2003; Schuetz et al., 2007); however, analogous co-culture flux measurements are more difficult to acquire. Extracellular fluxes for individual species are more challenging to measure for metabolites that are produced or consumed by multiple community members. Recent advances using carbon-13 labeling experiments have been able to resolve intracellular fluxes in two-species communities (Gebreselassie and Antoniewicz, 2015). Improvements in experimental techniques to measure fluxes in microbial communities will enable the development of constraint-based modeling approaches to more accurately predict fluxes in microbial communities by identifying appropriate species- and community-level objective functions. Individual models have been successfully used to design genetic and environmental perturbations to achieve desired phenotypes; however, to extend such approaches to design microbial communities and manipulate their behaviors will require knowledge of which objective functions accurately predict intracellular and extracellular fluxes in co-culture.

Another challenge deals with building metabolic models from genomic and metagenomic data. With a few exceptions, constraint-based models are built using genomic data from individual organisms; however, metagenomic sequencing identifies metabolic genes in community members, but it lacks complete details on which microbe these genes belong to. As a result, it is difficult to predict which metabolic genes and reactions should go into different species’ models. Biggs and Papin have recently developed a new approach to address this issue (Biggs and Papin, 2016). Another challenge with using genomic and metagenomic annotations to build models is predicting what metabolites can be taken up and secreted by different organisms from sequencing data alone. While transporter mechanisms can be predicted based on sequence information, it is more difficult to predict which specific metabolites are being taken up or excreted by these transporters. Thus, improving transporter annotations and their experimental characterization will help improve predictions of nutrient uptake, product secretion, and metabolite exchange in microbial communities.

Most current constraint-based modeling studies of communities have not accounted for spatial variation in microbial communities. However, spatial chemical gradients will develop in many natural communities where good mixing does not occur. Since cellular behaviors are dependent on chemical concentrations in their local environments, future microbiome models should also include approaches to predict concentration gradients in response to flow, diffusion, and microbial metabolism.

Constraint-based models can be used to study a diverse range of organisms and microbial communities, including synthetic and natural communities associated with ocean, marine, and human environments. To date, multispecies models have mostly been used to study low-diversity communities, typically those with two or three members. As tools for building, refining, and simulating multispecies models improve, the numbers of microbiomes being modeled and their applications to describe, predict, and design the chemistries being performed by communities will increase.

ACKNOWLEDGMENTS

This work was funded by the Office of Science (BER), the U.S. Department of Energy (DE-SC0008103), the U.S. Department of Energy Great Lakes Bioenergy Research Center (DOE BER Office of Science DE-FC0207ER64494), and the National Science Foundation (NSF 1053712).

The 21st century has witnessed a complete revolution in the understanding and description of bacteria in ecosystems and microbial assemblages, and how they are regulated by complex interactions among microbes, hosts, and environments. The human organism is no longer considered a monolithic assembly of tissues, but is instead a true ecosystem composed of human cells, bacteria, fungi, algae, and viruses. As such, humans are not unlike other complex ecosystems containing microbial assemblages observed in the marine and earth environments. They all share a basic functional principle: Chemical communication is the universal language that allows such groups to properly function together. These chemical networks regulate interactions like metabolic exchange, antibiosis and symbiosis, and communication.

The National Academies of Sciences, Engineering, and Medicine’s Chemical Sciences Roundtable organized a series of four seminars in the autumn of 2016 to explore the current advances, opportunities, and challenges toward unveiling this “chemical dark matter” and its role in the regulation and function of different ecosystems. The first three focused on specific ecosystems—earth, marine, and human—and the last on all microbiome systems. This publication summarizes the presentations and discussions from the seminars.
