Discuss how additional KBase tools may be used to improve the accuracy of phenotype predictions.

Description of the app

This app uses an input metabolic model to simulate growth in a set of media conditions, with a specified set of gene knockouts and with specified media supplements. A metabolic model can be curated using phenotypic data such as Biolog growth data or gene essentiality data generated on a specific set of media conditions. The app reports differences between growth predictions and experimentally measured growth rates. It can be used to test how accurately a model replicates experimental observations, and to explore the set of metabolites that an organism can use as nutrient sources. The Simulate Growth on Phenotype Data app carries out flux balance analysis (FBA) for each medium and knockout in the phenotype dataset and displays the output (growth/no growth) as a side-by-side comparison of model predictions and experimental results. To begin, a user uploads a table of phenotype data and either loads a metabolic model or selects a model already present in KBase. KBase uses the selected model to simulate the uploaded phenotypes and presents the simulation results in a detailed, exportable report. This app also conducts some reconciliation of models with phenotype data.

Description of the input

The Simulate Growth on Phenotype Data app takes one metabolic model and one phenotype set as input. In KBase, an “FBAModel” or “Metabolic Model typed object” contains the reactions, compounds, compartments, biomass reactions, and gene associations that comprise a metabolic model. The “PhenotypeSet,” or “Phenotype Set typed object,” contains information about experimentally measured growth phenotype data. The entire Phenotype Set object is always associated with a specific “Genome” object in KBase; this object is the genome for which the growth phenotype observations were measured.

The phenotype set then includes a list of observed growth phenotypes. Each growth phenotype contains the data required to define a specific growth observation: (1) the base media condition in which the growth phenotype was observed; (2) a list of supplemental compounds that were added to the base media condition for the growth phenotype (this list can be empty if the phenotype was observed in the unaltered base media); (3) a list of genes knocked out of the associated reference genome during the growth phenotype study (this list can be empty if the growth phenotype was observed with the wildtype strain); and (4) the normalized growth rate observed for the specified strain in the specified growth condition.
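The four parts of a growth phenotype listed above can be pictured as a small record. The following Python sketch is only an illustration of that structure: the class names, field names, media name, and gene identifier are all hypothetical and are not the actual schema of the KBase Phenotype Set typed object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GrowthPhenotype:
    """One growth observation (illustrative field names, not KBase's schema)."""
    base_media: str                                                 # (1) base media condition
    additional_compounds: List[str] = field(default_factory=list)  # (2) supplements; may be empty
    gene_knockouts: List[str] = field(default_factory=list)        # (3) knockouts; empty = wildtype
    normalized_growth: float = 0.0                                  # (4) observed normalized growth


@dataclass
class PhenotypeSet:
    """A list of growth phenotypes tied to one reference genome."""
    genome: str
    phenotypes: List[GrowthPhenotype] = field(default_factory=list)


# Hypothetical example values, chosen only to show the shape of the data.
wildtype = GrowthPhenotype(base_media="ExampleBaseMedia", normalized_growth=1.0)
knockout = GrowthPhenotype(
    base_media="ExampleBaseMedia",
    gene_knockouts=["example_gene_1"],
    normalized_growth=0.0,
)
phenotype_set = PhenotypeSet(
    genome="Rhodobacter_sphaeroides_2.4.1",
    phenotypes=[wildtype, knockout],
)
```

In this sketch, an empty knockout list stands for the wildtype strain and an empty supplement list stands for the unaltered base media, mirroring the description above.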

The phenotype set was designed as a high-level representation of observed growth phenotype data, which includes but is not limited to: (1) Biolog array data, where the phenotype set includes a single growth phenotype for each well of the Biolog array; (2) gene essentiality data, where the phenotype set includes a single growth phenotype for each gene knockout; and (3) TNseq data, where transposon counts are translated into gene essentiality measures in a variety of growth conditions.

KBase offers several ways to load metabolic models into your Narrative for input into this and other apps:

KBase offers several ways to load phenotype sets into your Narrative, as well:

Upload your own data in TSV format from your local machine.

Search for and add to your Narrative a phenotype set from KBase’s reference data collection, which includes 14 Biolog and 14 gene essentiality datasets.

Use example data from the Data Browser slideout.
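If you choose the TSV upload option, it can help to check your table locally before uploading it. The sketch below reads a small tab-separated table into one record per growth phenotype. The column names and the semicolon separator for multi-valued cells are assumptions made for this illustration; consult the Upload Guide for the layout KBase actually expects.

```python
import csv
import io

# Hypothetical table: the column names and the ";" separator are assumptions.
SAMPLE_TSV = (
    "media\tadditional_compounds\tgene_knockouts\tgrowth\n"
    "BaseMedia\tcpd_A\t\t1\n"
    "BaseMedia\tcpd_B;cpd_C\tgene_1\t0\n"
)


def split_list(cell):
    """Split a semicolon-separated cell into a list, dropping empty entries."""
    return [item for item in cell.split(";") if item]


def read_phenotypes(handle):
    """Parse a tab-separated phenotype table into a list of dictionaries."""
    reader = csv.DictReader(handle, delimiter="\t")
    phenotypes = []
    for row in reader:
        phenotypes.append({
            "media": row["media"],
            "additional_compounds": split_list(row["additional_compounds"]),
            "gene_knockouts": split_list(row["gene_knockouts"]),
            "growth": float(row["growth"]),
        })
    return phenotypes


phenotypes = read_phenotypes(io.StringIO(SAMPLE_TSV))
for phenotype in phenotypes:
    print(phenotype)
```

To check a real file, you could pass an open file handle to read_phenotypes in place of the in-memory sample.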

Note that if you’re using phenotype data to evaluate the accuracy of your model, you will want to ensure that the genome associated with the phenotype data matches the genome associated with the model being used to simulate the phenotype data. You can see the genomes associated with a model or phenotype set by (1) dragging either type of object from the Data Panel to the main Narrative panel or (2) clicking on the object in the Data Panel and opening its Provenance page using the tree-like icon.

However, it’s also important to note the value of Phenotype Set objects as a quick and easy means for simulating growth of a model in hundreds of conditions at once. This is useful to quickly determine which nutrients an organism has complete utilization pathways for. When used in this way, the genome associated with the phenotype set does not have to match the genome for the model. In this case, it is more useful to ensure that the phenotype set includes a wide range of growth conditions of interest (e.g., a Biolog set). This application for phenotype sets is particularly useful for comparative genomics or for assigning an organism to an ecological niche based on its genome content.

Description of the output

The output of this app is a “PhenotypeSimulationSet,” or a list of the growth phenotypes included in the phenotype set, along with growth rates predicted by the specified metabolic model. In this data object, each phenotype is also classified based on the agreement between the model-predicted growth and the experimentally observed growth. If the model growth and experiment growth are both nonzero, this is classified as a correct positive (CP). If the model growth and experiment growth are both zero, this is classified as a correct negative (CN).

If the model growth is zero while the experiment growth is nonzero, this is classified as a false negative (FN). If the model growth is nonzero while the experiment growth is zero, this is classified as a false positive (FP). All data from the generated PhenotypeSimulationSet is displayed in tabular form within the Narrative.
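The four classes can be written as a short decision rule. The Python sketch below is an illustration, not the app's own code: the function name and the small tolerance used to decide whether a growth value counts as zero are assumptions, and the example growth values are made up.

```python
from collections import Counter


def classify(model_growth, observed_growth, tolerance=1e-6):
    """Return CP, CN, FN, or FP for one phenotype.

    The tolerance is an assumption of this sketch; the text above only
    distinguishes zero from nonzero growth.
    """
    model_grows = model_growth > tolerance
    observed_grows = observed_growth > tolerance
    if model_grows and observed_grows:
        return "CP"  # correct positive: both show growth
    if not model_grows and not observed_grows:
        return "CN"  # correct negative: neither shows growth
    if observed_grows:
        return "FN"  # false negative: model misses observed growth
    return "FP"      # false positive: model predicts unobserved growth


# Made-up (model growth, observed growth) pairs, one per class.
pairs = [(0.8, 1.0), (0.0, 0.0), (0.0, 1.0), (0.5, 0.0)]
classes = [classify(model, observed) for model, observed in pairs]
counts = Counter(classes)

# Fraction of phenotypes where model and experiment agree.
accuracy = (counts["CP"] + counts["CN"]) / len(classes)
print(classes, accuracy)
```

Counting the classes in this way gives a quick overall measure of how well a model agrees with a phenotype set.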

Point and click instructions for using this app

Note: This tutorial assumes that you have already created a new Narrative. For instructions on how to accomplish this and other tasks such as finding or uploading data to your Narrative, see the Narrative Interface User Guide.

Step 1. Add data that you want to analyze

Before running the Simulate Growth on Phenotype Data app, you will need to copy or upload the needed input data. For this analysis, we will use an example model and phenotype set available in KBase.

To add the example data to your Narrative, find the Data Panel along the left of the screen and click the Add Data (or red “+”) button. This will open the Data Browser slideout. Choose the Example tab to see a list of example datasets, which are organized by type.

Look for the Example FBAModels heading and find the three models listed beneath it. Mouse over the iRsp1140 model and click on the Add button that appears to its left to add the model to your Narrative.

The iRsp1140 object is a curated model of Rhodobacter sphaeroides 2.4.1, published by Tim Donahue’s lab at the University of Wisconsin, Madison [1].

In addition to selecting our model, we also need to select the phenotype set for the genome that our model is associated with. This object is also available in the Example tab, under the Other Examples heading.

Under that heading, locate the object called Rhodobacter_sphaeroides_2.4.1_Biolog and add it to your Narrative. Exit the Data Browser by clicking the Close button at the bottom of the window or by clicking anywhere in the main Narrative panel in the center.

Try this later… When you are ready to analyze your own models, you can upload or import a model using the Import tab, or import a genome and construct a new model using KBase. You can also use the Import tab to upload your own phenotype data. See the Upload Guide for more information.

Notice that your Data Panel now shows the two objects that you added to your Narrative:

You can find out more about these objects by mousing over their records in the Data Panel and clicking on the “…” that appears. An expanded view of the data will open:

The icons in this view let you see a data summary, download the object, see its provenance, and more. (Please see the Explore Data section of the Narrative Interface User Guide for more information.)

For now, you can examine the model by dragging and dropping the data object from the Data Panel onto the main Narrative panel. You will see something like this:

Be sure to save your Narrative frequently, using the Save button at the top right of the screen.

Step 2. Add and run the app

Now that you have the needed input data, you can add the Simulate Growth on Phenotype Data app to your Narrative. Look closer at the Apps Panel directly below your data.

You can search for apps using the search box at the top of the Apps Panel, or just scroll until you find the one you want. Locate the Simulate Growth on Phenotype Data app in the list and click on its name or icon to add it to your Narrative.

To run the app on the sample model and phenotype set, you must first fill out each input field for the app. In-depth descriptions for all input fields for this app are provided in the app details page.

For the FBA Model field, select the iRsp1140 model from the dropdown menu. In the Phenotype Set field, select the Rhodobacter_sphaeroides_2.4.1_Biolog object. Finally, in the Phenotype Simulation Result field, type a name for the phenotype simulation set that will be generated by this app. In our example, we will use the name “iRsp1140_Biolog_sim.”

Notice that as you fill in the required parameter fields, the red arrows next to those fields change to green checkmarks. Once all required fields have a green checkmark, the app is ready to run.

Click the Run button at the bottom of the app cell to launch the analysis job.

Depending on the queue size (how many other calculations have been requested by users recently), this job should take approximately 3 minutes. To check the status of your job, click on the Jobs tab near the top left of your Narrative.

Step 3. Look at the output

The Simulate Growth on Phenotype Data app creates a new output data object of type PhenotypeSimulationSet in your Data Panel, under the name you entered in the Phenotype Simulation Result field (“iRsp1140_Biolog_sim” in our example). A new output cell is generated below the app cell in the main Narrative panel to visualize this object.

There are two tabs for browsing the data: Overview and Phenotypes.

The Overview tab shows summary information about the Phenotype Simulation Set object.

The Phenotypes tab lists the phenotypes from the selected PhenotypeSet object, along with growth predicted by the input model.

Step 4. Download the results

You can download the phenotype simulation set in several formats: TSV, Excel, or JSON. Open the expanded view of the object in the Data Panel. Next, click on the Export/Download data icon to see the download options.

Note to PC users: If downloading to Excel, the data will be placed into a zipped folder whose name (or path) can be long, depending on the data object’s name and type. If the folder path becomes too long, Windows may not be able to open it. Try copying or moving the file to a folder or directory that has a shorter path if you encounter problems.