Effect of Outliers on Fit of Growth Models

The choice of a population’s growth model is frequently influenced by its fit to experimental growth curves, which in turn can be discernibly affected by the data scatter and presence of outliers. This Demonstration allows visualization and quantification of these effects by generating smooth or randomly scattered growth data with the shifted logistic equation and fitting them with the Gompertz and stretched exponential (Weibullian) models by nonlinear regression. One to three outliers can be added manually at chosen locations and their effect on the fitted growth curve’s shape and fit parameters assessed.

DETAILS

Snapshot 1: Simulated growth data with two outlier points added. The gray curves are the Gompertz and Weibullian models having the default parameter values.

Snapshot 2: The simulated growth data shown in Snapshot 1 after being fitted with the Gompertz and Weibullian models.

Snapshot 3: Simulated growth data having one outlier fitted with the Gompertz and Weibullian models. Notice the slightly better fit of the Weibullian model.

Snapshot 4: Simulated growth data having one outlier fitted with the Gompertz and Weibullian models. Notice the slightly better fit of the Gompertz model.

Sigmoid growth curves have been described by an assortment of mathematical models, of which the Gompertz and several versions of the logistic (Verhulst) model are the most common. The choice between the models is often influenced by their degree of fit, which might be affected differently by the experimental data’s scatter and outliers. This phenomenon is shown using simulated growth data with and without random scatter of a chosen amplitude. One to three outlier points may also be added at chosen locations by dragging them into position.

The simulated growth data is in the form of a growth ratio (dimensionless) versus time (arbitrary units). The growth ratio can be defined as for small and moderate growth or for intensive growth, that is, of several orders of magnitude, as encountered in microbiology, where and are the momentary and initial number, respectively. The simulated data is generated with the shifted logistic model [1] , where the growth parameters , , and are entered with sliders. The number of points to generate is also entered with a slider and so is the amplitude of the superimposed random scatter, . The seed slider allows differing random point scatter for .
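The data generation step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the shifted logistic form used here (a logistic curve lowered so that the growth ratio is zero at time zero), the parameter names a, k and tc, the default values, and the uniform scatter distribution are all assumptions, since the Demonstration's exact equation and noise model are not reproduced in this text.

```python
import math
import random

def shifted_logistic(t, a, k, tc):
    # Assumed form: a logistic curve shifted down so that Y(0) = 0.
    return a / (1.0 + math.exp(k * (tc - t))) - a / (1.0 + math.exp(k * tc))

def simulate_growth(a=5.0, k=0.3, tc=20.0, n_points=15, t_max=50.0,
                    scatter=0.2, seed=1):
    # All default values are hypothetical, chosen only for illustration.
    rng = random.Random(seed)  # the seed fixes the scatter pattern
    ts = [t_max * i / (n_points - 1) for i in range(n_points)]
    # Scatter is assumed uniform in [-scatter, +scatter]; zero gives smooth data.
    ys = [shifted_logistic(t, a, k, tc) + rng.uniform(-scatter, scatter)
          for t in ts]
    return ts, ys

ts, ys = simulate_growth()                           # scattered data
smooth_ts, smooth_ys = simulate_growth(scatter=0.0)  # smooth data
```

Changing only the seed keeps the underlying curve and scatter amplitude but produces a different pattern of scattered points, which mirrors the role of the seed slider.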

The two regression models are the Gompertz equation, , and stretched exponential (Weibull) equation, . To fit the data, you can modify the default parameters’ values until the plotted gray curves roughly match the data points (hint: always start by moving the slider first).
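As a rough illustration of the manual matching step, the sketch below defines one common parameterization of each model and measures how far a set of starting parameter values is from the data. The functional forms (a Gompertz curve shifted to pass through the origin, and a three-parameter stretched exponential), the parameter names, the data points and the starting values are all assumptions made for this example; they are not taken from the Demonstration.

```python
import math

def gompertz(t, a, b, c):
    # Assumed shifted Gompertz form, so that Y(0) = 0.
    return a * math.exp(-math.exp(b - c * t)) - a * math.exp(-math.exp(b))

def weibullian(t, a, b, n):
    # Assumed stretched exponential (Weibullian) form; requires b > 0, t >= 0.
    return a * (1.0 - math.exp(-((t / b) ** n)))

def mean_squared_error(model, params, ts, ys):
    # Average squared distance between the model curve and the data points.
    return sum((model(t, *params) - y) ** 2 for t, y in zip(ts, ys)) / len(ts)

# Hypothetical data points and hand-picked starting values.
ts = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
ys = [0.0, 0.23, 2.49, 4.75, 4.98, 4.99]
gompertz_guess = (5.0, 3.0, 0.15)
weibull_guess = (5.0, 22.0, 3.0)

print("Gompertz starting MSE:  ", mean_squared_error(gompertz, gompertz_guess, ts, ys))
print("Weibullian starting MSE:", mean_squared_error(weibullian, weibull_guess, ts, ys))
```

A small starting error plays the same role as the gray curves roughly matching the points: it gives the regression a reasonable place to begin.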

To add one to three outlier points to the generated data, select the desired number using the "number of outlier point locators" setter bar and click the light red "new locator(s)" setter to add the locators to the plot. They will appear along the axis starting at the origin. Drag each locator to the desired position of an outlier point and when all have been positioned click the light green "accept outlier point locator(s)" setter to add them to the data to be fitted. The locator crosshairs become grayed out when they have been added as outlier points (note: the outlier points will NOT be included in the fitted dataset unless the light green setter is clicked). To remove any outlier points, select "0" on the "number of outlier point locators" setter bar and then click the light red "new locator(s)" setter.

Once an approximate fit is obtained, click the bright green "fit selected model to data and plot results" setter to do the regression. The fitted curves' parameter values and the corresponding goodness-of-fit measure and mean squared error (MSE) will then appear above the plot. Often the fit can be improved by using the fitted model’s parameters as initial guesses for successive attempts. This is done by clicking the light blue "last fitted values" setter and then clicking the bright green setter again, repeating until the goodness-of-fit and MSE values remain unchanged.
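The fit-and-refit procedure can be imitated outside the Demonstration. The sketch below is a dependency-free illustration, not the Demonstration's own regression routine: it uses a small hand-written damped Gauss-Newton (Levenberg-Marquardt style) loop, an assumed shifted Gompertz form, hypothetical data and a hypothetical outlier. In practice a library routine such as SciPy's curve_fit would normally be used instead.

```python
import math

def gompertz(t, a, b, c):
    # Assumed shifted Gompertz form with Y(0) = 0 (not taken from the source).
    return a * math.exp(-math.exp(b - c * t)) - a * math.exp(-math.exp(b))

def mse(p, ts, ys):
    return sum((gompertz(t, *p) - y) ** 2 for t, y in zip(ts, ys)) / len(ts)

def r_squared(p, ts, ys):
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - mse(p, ts, ys) * len(ts) / ss_tot

def solve(A, g):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(g)
    M = [row[:] + [g[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_once(p0, ts, ys, iters=25):
    # One regression attempt: damped Gauss-Newton steps from the guesses p0.
    p, lam = list(p0), 1e-2
    best = mse(p, ts, ys)
    for _ in range(iters):
        res = [gompertz(t, *p) - y for t, y in zip(ts, ys)]
        jac = []
        for t in ts:  # forward-difference Jacobian
            row = []
            for j in range(3):
                h = 1e-6 * max(1.0, abs(p[j]))
                q = p[:]
                q[j] += h
                row.append((gompertz(t, *q) - gompertz(t, *p)) / h)
            jac.append(row)
        A = [[sum(r[i] * r[j] for r in jac) for j in range(3)] for i in range(3)]
        g = [-sum(r[i] * e for r, e in zip(jac, res)) for i in range(3)]
        for i in range(3):
            A[i][i] += lam  # damping keeps the system solvable
        try:
            step = solve(A, g)
            trial = [pi + si for pi, si in zip(p, step)]
            trial_mse = mse(trial, ts, ys)
        except (OverflowError, ZeroDivisionError):
            trial_mse = float("inf")
        if trial_mse < best:  # accept only steps that lower the error
            p, best, lam = trial, trial_mse, lam / 10
        else:
            lam *= 10
    return p

def refit_until_stable(p0, ts, ys, tol=1e-12, max_rounds=50):
    # Reuse the last fitted values as new guesses until the MSE stops changing.
    p, last = list(p0), mse(p0, ts, ys)
    for _ in range(max_rounds):
        p = fit_once(p, ts, ys)
        current = mse(p, ts, ys)
        if last - current <= tol:
            break
        last = current
    return p

# Hypothetical smooth data from an assumed shifted logistic curve, plus one outlier.
ts = [2.5 * i for i in range(21)]
ys = [5 / (1 + math.exp(0.3 * (20 - t))) - 5 / (1 + math.exp(6.0)) for t in ts]
ys_out = ys[:]
ys_out[6] += 2.0  # outlier added at t = 15
start = (5.0, 3.0, 0.2)
clean = refit_until_stable(start, ts, ys)
dirty = refit_until_stable(start, ts, ys_out)
print("without outlier:", clean, mse(clean, ts, ys), r_squared(clean, ts, ys))
print("with outlier:   ", dirty, mse(dirty, ts, ys_out), r_squared(dirty, ts, ys_out))
```

Because each attempt accepts only steps that lower the error, restarting from the last fitted values can never make the MSE worse, which is why repeating until the reported values stop changing is a sensible stopping rule. Comparing the two printed parameter sets shows how a single outlier can shift the fitted curve.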

The Demonstration illustrates that depending on the scatter and outliers’ positions, models that could otherwise be used interchangeably might produce different fitted growth curves.