Scatter plot of the residuals

Purpose

These plots display the PWRES (population weighted residuals), the IWRES (individual weighted residuals), and the NPDEs (normalized prediction distribution errors) as scatter plots with respect to the time or the prediction.

The PWRES are computed using the population parameters and the IWRES are computed using the individual parameters. For discrete outputs, only NPDEs are used.

These plots are useful to detect misspecifications in the structural and residual error models: if the model is correctly specified, the residuals should be randomly scattered around the horizontal zero‐line.

Definition

Population weighted residuals (PWRES)

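$\mathrm{PWRES}_{ij}$ are defined as the normalized difference between the observations and their mean. Let $y_i = (y_{ij}, 1 \le j \le n_i)$ be the vector of observations for subject $i$. The mean of $y_i$ is the vector $\mathbb{E}(y_i) = (\mathbb{E}(y_{ij}), 1 \le j \le n_i)$. Let $V_i$ be the $n_i \times n_i$ variance-covariance matrix of $y_i$. Then, the $i$th vector of population weighted residuals $\mathrm{PWRES}_i = (\mathrm{PWRES}_{ij}, 1 \le j \le n_i)$ is defined by

$$\mathrm{PWRES}_i = V_i^{-1/2}\left(y_i - \mathbb{E}(y_i)\right)$$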

$\mathbb{E}(y_i)$ and $V_i$ are not known in practice but are estimated empirically by Monte-Carlo simulation, without any approximation of the model.
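A minimal sketch of this Monte-Carlo estimation in plain Python, for one subject. It assumes the replicates in `simulated` have already been drawn from the model for that subject's design (the simulation step itself is not shown). It estimates the mean vector $\mathbb{E}(y_i)$ and the variance-covariance matrix $V_i$ empirically, using divisor $K-1$, which is an arbitrary choice. It then forms the symmetric inverse square root $V_i^{-1/2}$ from an eigendecomposition computed with cyclic Jacobi rotations. The function names are hypothetical, and this is not the software's own implementation.

```python
import math
import random


def mean_vector(samples):
    """Empirical mean of a list of equal-length vectors."""
    k = len(samples)
    return [sum(col) / k for col in zip(*samples)]


def covariance_matrix(samples):
    """Empirical variance-covariance matrix (divisor K - 1) of a list of vectors."""
    k, mu = len(samples), mean_vector(samples)
    n = len(mu)
    return [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in samples) / (k - 1)
             for b in range(n)] for a in range(n)]


def jacobi_eigh(matrix, max_sweeps=100):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, q), the eigenvectors being the columns of q."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    q = [[float(i == j) for j in range(n)] for i in range(n)]
    norm2 = sum(x * x for row in a for x in row)  # invariant under rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * norm2:
            break
        for p in range(n):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, r) entry becomes zero.
                theta = (a[r][r] - a[p][p]) / (2.0 * a[p][r])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, q):  # right-multiply a and q by the rotation
                    for k in range(n):
                        xp, xr = m[k][p], m[k][r]
                        m[k][p], m[k][r] = c * xp - s * xr, s * xp + c * xr
                for k in range(n):  # left-multiply a by the transposed rotation
                    xp, xr = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * xp - s * xr, s * xp + c * xr
    return [a[i][i] for i in range(n)], q


def inv_sqrt(matrix):
    """Symmetric inverse square root Q diag(1 / sqrt(lambda)) Q^T."""
    lam, q = jacobi_eigh(matrix)
    n = len(lam)
    return [[sum(q[i][k] * q[j][k] / math.sqrt(lam[k]) for k in range(n))
             for j in range(n)] for i in range(n)]


def pwres(y_obs, simulated):
    """PWRES_i = V_i^{-1/2} (y_i - E(y_i)), with E(y_i) and V_i from simulations."""
    mu = mean_vector(simulated)
    w = inv_sqrt(covariance_matrix(simulated))
    d = [y - m for y, m in zip(y_obs, mu)]
    return [sum(w[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]


# Toy usage: 2000 simulated replicates of two correlated observations.
rng = random.Random(1)
sims = []
for _ in range(2000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    sims.append([10.0 + 2.0 * z1, 5.0 + z1 + 0.5 * z2])
print(pwres([12.0, 6.5], sims))
```

The same simulations are used for the mean, the covariance and the transformation. As a result, applying `pwres` to the simulated vectors themselves gives residuals with zero mean and identity covariance, which is a convenient sanity check.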

Individual weighted residuals

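$\mathrm{IWRES}_{ij}$ are estimates of the standardized residual ($\varepsilon_{ij}$) based on individual predictions, with $f$ the structural model, $\hat{\psi}_i$ the estimated individual parameters and $g$ the function defining the residual error model:

$$\mathrm{IWRES}_{ij} = \frac{y_{ij} - f(t_{ij}, \hat{\psi}_i)}{g(t_{ij}, \hat{\psi}_i)}$$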

If the residual errors are assumed to be correlated, the individual weighted residuals can be decorrelated by multiplying each individual vector $\mathrm{IWRES}_i$ by $\hat{R}_i^{-1/2}$, where $\hat{R}_i$ is the estimated correlation matrix of the vector of residuals $\varepsilon_i = (\varepsilon_{ij}, 1 \le j \le n_i)$.
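The IWRES formula can be sketched in a few lines of Python for one subject. The example assumes that the individual predictions $f(t_{ij}, \hat{\psi}_i)$ are already available and that the residual error function can be written as a function of the prediction only. The combined error model $g = a + b f$ and the helper `combined_error` are illustrative assumptions, as are the numbers. Decorrelation by $\hat{R}_i^{-1/2}$ is not shown.

```python
from typing import Callable, List, Sequence


def combined_error(a: float, b: float) -> Callable[[float], float]:
    """Hypothetical combined residual error model g = a + b * f."""
    return lambda f: a + b * f


def iwres(y_obs: Sequence[float], f_ind: Sequence[float],
          g: Callable[[float], float]) -> List[float]:
    """IWRES_ij = (y_ij - f(t_ij, psi_i)) / g(t_ij, psi_i).

    f_ind holds the individual predictions f(t_ij, psi_i); g maps a
    prediction to the standard deviation of the residual error."""
    return [(y - f) / g(f) for y, f in zip(y_obs, f_ind)]


# Observations and individual predictions for one subject (made-up numbers).
print(iwres([11.0, 4.0], [10.0, 5.0], combined_error(a=0.5, b=0.05)))
```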

Normalized prediction distribution errors

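$\mathrm{NPDE}_{ij}$ are a nonparametric version of $\mathrm{PWRES}_{ij}$ based on a rank statistic. For any $(i,j)$, let $F_{ij} = F_{y_{ij}}(y_{ij}^{\mathrm{obs}})$, where $F_{y_{ij}}$ is the cumulative distribution function (cdf) of $y_{ij}$. NPDEs are then obtained from $F_{ij}$ by applying the inverse of the standard normal cdf $\Phi$.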

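In practice, one simulates a large number $K$ of data sets using the model, and estimates $F_{ij}$ as the fraction of simulated data below the original data, i.e.:

$$\hat{F}_{ij} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{1}_{\{y_{ij}^{\mathrm{sim},(k)} \le y_{ij}^{\mathrm{obs}}\}}$$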

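By construction, if the model is correct, $F_{ij}$ is uniformly distributed on $[0,1]$. We therefore rather use $\Phi^{-1}(F_{ij})$, which follows a standard normal distribution (with $\Phi$ the cdf of the standard normal distribution). NPDEs are defined as an empirical estimate of $\Phi^{-1}(F_{ij})$, i.e.:

$$\mathrm{NPDE}_{ij} = \Phi^{-1}(\hat{F}_{ij})$$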
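The NPDE computation for a single observation can be sketched as follows. $\hat{F}_{ij}$ is the fraction of the $K$ simulated values at or below the observation, and $\Phi^{-1}$ is taken from Python's `statistics.NormalDist`. The clamping of $\hat{F}_{ij}$ to $[1/(2K), 1 - 1/(2K)]$ is an assumption of this sketch. It keeps $\Phi^{-1}$ finite when the observation lies outside all simulated values, and the text above does not say how the software handles that case.

```python
import random
from statistics import NormalDist
from typing import Sequence


def npde(y_obs: float, y_sim: Sequence[float]) -> float:
    """NPDE_ij = Phi^{-1}(F_hat_ij), where F_hat_ij is the fraction of the K
    simulated values y_ij^{sim,(k)} that are <= the observation y_ij^obs."""
    k = len(y_sim)
    f_hat = sum(1 for y in y_sim if y <= y_obs) / k
    # Assumption: clamp F_hat to [1/(2K), 1 - 1/(2K)] so Phi^{-1} stays finite
    # when the observation lies below or above all simulated values.
    f_hat = min(max(f_hat, 1.0 / (2 * k)), 1.0 - 1.0 / (2 * k))
    return NormalDist().inv_cdf(f_hat)


# Toy usage: K = 1000 draws standing in for the model's predictive
# distribution of one observation y_ij.
rng = random.Random(0)
sims = [rng.gauss(10.0, 2.0) for _ in range(1000)]
print(npde(12.0, sims))
```

In this toy case the observation sits one standard deviation above the mean of the predictive distribution, so the printed value is a Monte-Carlo estimate of $\Phi^{-1}(\Phi(1)) = 1$.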

Examples

In the following example, the parameters of a two-compartment model with iv infusion and linear elimination are estimated on the remifentanil data set. One can see the PWRES, the IWRES and the NPDEs with respect to time (top row) and to the prediction (bottom row).

Since the points are clearly scattered unevenly around the horizontal zero‐line, these plots suggest a misspecification of the structural model.

It is possible to select some of the subplots to focus on, with the panel Subplots in Settings:

Presets

A number of elements can be overlaid on or hidden from the plots in the Display panel. Only the horizontal zero‐line, representing the theoretical mean, is always displayed. Two presets with predefined selections of displayed elements are available. The first, called “Scatter”, hides all elements except the residual points. The second, called “VPC”, instead displays empirical and predicted percentiles of the residuals as lines, together with prediction intervals as colored areas. The VPC preset is detailed below.

Predictive checks

The preset “VPC” displays prediction intervals for the median, 10th and 90th percentiles, obtained with simulations of the residuals, as well as the empirical percentiles to compare the behavior of the model to the data. Residual points are hidden, but the trend is represented with a spline interpolation.

Misspecifications in the structural model, the error model, or the covariate model can be detected as discrepancies between the observed percentiles and their prediction intervals. This can be seen, for example, in the plots of IWRES vs time and NPDE vs time below, which use a log scale on the x-axis. The population-based residuals (here the NPDEs) deviate strongly from their prediction intervals at all time points, while the individual residuals agree better, but only at early times.

Outliers (empirical percentiles outside the prediction intervals) can be marked with red points or red areas:
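The comparison behind these plots can be sketched for a single bin in Python. The sketch takes the empirical percentile of the observed residuals and computes the same percentile in each simulated replicate. The spread of those simulated percentiles gives the prediction interval, and the empirical value is flagged as an outlier when it falls outside. Several details are assumptions rather than the software's documented behavior: the 90% interval level, the linear-interpolation percentile rule, and the names `percentile` and `check_percentile`. The toy residuals are made up.

```python
import random
from statistics import NormalDist
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile (0 <= p <= 100), linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def check_percentile(obs: Sequence[float], replicates: List[Sequence[float]],
                     p: float, level: float = 90.0) -> Tuple[float, Tuple[float, float], bool]:
    """Return the empirical p-th percentile of the observed residuals in one bin,
    the prediction interval of that percentile over the simulated replicates,
    and whether the empirical value falls outside it (an outlier)."""
    empirical = percentile(obs, p)
    simulated = [percentile(rep, p) for rep in replicates]
    tail = (100.0 - level) / 2.0
    pi = (percentile(simulated, tail), percentile(simulated, 100.0 - tail))
    return empirical, pi, not pi[0] <= empirical <= pi[1]


# Toy bin: residuals simulated under the model are standard normal, while the
# "observed" residuals are shifted upward by 1.5, as a misspecified model might give.
rng = random.Random(2)
replicates = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(200)]
observed = [NormalDist().inv_cdf((j + 0.5) / 50) + 1.5 for j in range(50)]
for p in (10, 50, 90):
    print(p, check_percentile(observed, replicates, p))
```

In the plots, this check is repeated for each bin along the x-axis and for each of the displayed percentiles.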

Comparing PWRES and NPDEs

NPDEs are quite similar to PWRES, but are simulation‐based, and therefore account for the heterogeneity in study design by comparing the observations with their own distribution. NPDEs are thus displayed by default rather than PWRES.

Preventing shrinkage in IWRES

The individual estimates used to compute the IWRES can be chosen in the Display panel:

By default, the individual estimates are drawn from the conditional distributions rather than taken from usual estimators such as conditional modes (EBEs) or conditional means. This choice is recommended in order to prevent shrinkage, a phenomenon that occurs when the individual data are not sufficiently informative about one or more parameters. When this leads to overfitting, IWRES computed from these biased estimators may shrink toward 0.
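The text does not say how shrinkage is quantified. One common summary, used here as an assumption rather than something stated for this plot, is ε-shrinkage, defined as $1 - \mathrm{SD}(\mathrm{IWRES})$. Under the model, IWRES should have unit standard deviation, so values near 0 indicate little shrinkage and values near 1 indicate residuals collapsed toward 0. The function name is hypothetical.

```python
from statistics import pstdev
from typing import Sequence


def epsilon_shrinkage(iwres_values: Sequence[float]) -> float:
    """epsilon-shrinkage = 1 - SD(IWRES), using the population standard deviation."""
    return 1.0 - pstdev(iwres_values)


print(epsilon_shrinkage([1.0, -1.0, 1.0, -1.0]))  # unit spread: no shrinkage
print(epsilon_shrinkage([0.1, -0.1, 0.1, -0.1]))  # residuals shrunk toward 0
```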

Highlight

Hovering over a point highlights all the points from the same individual in yellow on all plots, and reveals the corresponding subject id and time. If the individual estimates selected in Display are draws from the conditional distributions, each observation corresponds to a set of IWRES computed from a set of simulated individual parameters. When an observation is hovered over, the points from this set are shown with a larger diameter.

If the individual estimates selected in Display are conditional modes (EBEs) or conditional means, there is only one residual per observation, and all points from the same individual are linked with segments to show the time chronology.

Binning

As for VPCs, the binning used to compute the percentiles can be changed. Several strategies exist to segment the data: equal-width binning, equal-size binning, and a least-squares criterion. The number of bins can be either set by the user or selected automatically, to obtain a good tradeoff between resolution and the number of observations per bin.

In the three figures below, NPDEs are displayed with respect to log-scaled time, and 5 bins are selected: with equal width on the left, equal size in the center, and the least-squares criterion on the right. Observations are overlaid in light purple to visualize the data density in each bin. Equal width in particular yields low density in some bins, and results in a less informative plot for low times, where data density is high.

In the figure below, the number of bins for the least-squares criterion is set automatically, allowing a more precise display.
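The three binning strategies described above can be sketched in Python. Equal width and equal size follow directly from their names. The text does not define the least-squares criterion. Here it is assumed to mean splitting the sorted x-values into contiguous bins that minimize the total within-bin sum of squared deviations, which is 1-D k-means solved exactly by dynamic programming. The software's exact criterion, and its automatic choice of the number of bins, may differ and are not reproduced. Function names and the time values are hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(xs: Sequence[float], n_bins: int) -> List[List[float]]:
    """Split the x-range into n_bins intervals of equal width."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero range
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for x in sorted(xs):
        bins[min(int((x - lo) / width), n_bins - 1)].append(x)
    return bins


def equal_size_bins(xs: Sequence[float], n_bins: int) -> List[List[float]]:
    """Split the sorted values into n_bins groups of (nearly) equal count."""
    s, m = sorted(xs), len(xs)
    return [s[b * m // n_bins:(b + 1) * m // n_bins] for b in range(n_bins)]


def least_squares_bins(xs: Sequence[float], n_bins: int) -> List[List[float]]:
    """Contiguous bins minimizing the total within-bin sum of squared deviations
    (assumed criterion; requires len(xs) >= n_bins)."""
    s, m = sorted(xs), len(xs)
    pre, pre2 = [0.0], [0.0]  # prefix sums of x and x^2
    for x in s:
        pre.append(pre[-1] + x)
        pre2.append(pre2[-1] + x * x)

    def sse(i: int, j: int) -> float:  # within-bin cost of s[i:j]
        total = pre[j] - pre[i]
        return (pre2[j] - pre2[i]) - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n_bins + 1)]
    split = [[0] * (m + 1) for _ in range(n_bins + 1)]
    cost[0][0] = 0.0
    for b in range(1, n_bins + 1):
        for j in range(b, m + 1):
            for i in range(b - 1, j):  # last bin is s[i:j], never empty
                c = cost[b - 1][i] + sse(i, j)
                if c < cost[b][j]:
                    cost[b][j], split[b][j] = c, i
    bins, j = [], m
    for b in range(n_bins, 0, -1):  # walk the optimal splits backwards
        i = split[b][j]
        bins.append(s[i:j])
        j = i
    return bins[::-1]


times = [0.25, 0.5, 1.0, 1.5, 2.0, 4.0, 8.0, 12.0, 24.0, 36.0]
for rule in (equal_width_bins, equal_size_bins, least_squares_bins):
    print(rule.__name__, rule(times, 3))
```

With times as skewed as these, equal-width binning puts most observations into the first bin, which mirrors the low-density bins described above.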

Discrete data

For categorical or count data, only NPDEs are used. Here again, NPDEs correspond to the rank of each observation among a set of simulations based on the model. However, to prevent problems with discrete values, both observations and simulations are slightly perturbed with a uniform distribution before computing the ranks.
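The perturbation step for discrete data can be sketched in Python, for integer-coded observations. The text only says the perturbation is uniform, so its width is an assumption here: independent noise on $[0, 1)$ is added to the observation and to every simulated value before ranking. This keeps the ordering between different integers and breaks ties between equal integers at random. The clamping of the estimated cdf and the name `discrete_npde` are also assumptions of this sketch.

```python
import random
from statistics import NormalDist
from typing import Sequence


def discrete_npde(y_obs: int, y_sim: Sequence[int], rng: random.Random) -> float:
    """NPDE for count or integer-coded categorical data.

    The observation and every simulated value get independent uniform noise
    on [0, 1) before ranking (assumed perturbation), so ties between equal
    integers are broken at random instead of piling up at one rank."""
    k = len(y_sim)
    obs = y_obs + rng.random()
    f_hat = sum(1 for y in y_sim if y + rng.random() <= obs) / k
    # Assumption: clamp F_hat away from 0 and 1 so that Phi^{-1} stays finite.
    f_hat = min(max(f_hat, 1.0 / (2 * k)), 1.0 - 1.0 / (2 * k))
    return NormalDist().inv_cdf(f_hat)


rng = random.Random(3)
counts_sim = [rng.randint(0, 4) for _ in range(1000)]  # toy simulated counts
print(discrete_npde(2, counts_sim, rng))
```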

Settings

Subplots

Residuals

Population residuals: Add/remove scatterplots for PWRES. Hidden by default.

Individual residuals: Add/remove scatterplots for IWRES, using the individual parameters estimated with the conditional mode or the conditional mean. By default, individual parameters come from the conditional mode estimation.

NPDE: Add/remove scatterplots for NPDE.

X-axis

time: Add/remove the scatterplots w.r.t. the time.

prediction: Add/remove the scatterplots w.r.t. the prediction.

Display

Presets: apply the predefined selections of displayed elements for scatter plots or VPC.

Residuals: Add/remove observed data.

BLQ: Add/remove BLQ data, if present, optionally in a different color.