I have a question for the group concerning the handling of BQLs. Some set them equal to missing, while others set them equal to one-half the LOQ of the assay and then include some small residual variance term in the error statement. This latter method has always bothered me because of the 1/2 LOQ issue: the value itself is completely arbitrary, and wouldn't this set up the possibility of a mixture model for some PK parameters? What does the group think about, instead of assigning one-half LOQ to the missing values, assigning a random value on the open uniform interval (0, LOQ)? This would ensure continuity. Then add a dichotomous variable (0 or 1) to indicate whether the value is imputed, and include this as a separate variance term to account for the additional uncertainty in the imputed concentration value.

Any thoughts?

Pete Bonate

*****

From: LSheiner <lewis@c255.ucsf.edu>

Subject: Re: BQLs

Date: Tue, 26 Jun 2001 09:32:34 -0700

This has been discussed before ...

First, let's be clear on what is the "right" thing to do in principle. If you are using ML, then the right thing to do is use the marginal likelihood, integrating out the "missing value". That is, if the usual likelihood contribution for datum y is L(y) = p(y|params), and y is censored (i.e., known to be < QL, but not known further), then the likelihood contribution should be L*(y) = p(y<=QL|params) = Integral[p(y|params)dy], where the integration is from -infinity to QL. In the case of normal residual noise with var = sigma**2, this is the CDF of a standard normal distribution (i.e., with mean = zero and variance = 1) evaluated at (QL-yhat)/sigma, where yhat = E(y|params) -- in fact the "Y" usually defined in $ERROR.
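The marginal-likelihood contribution described above can be sketched in a few lines of Python (this is an illustrative sketch, not code from the thread; the function names are mine):

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def likelihood_contribution(y, yhat, sigma, ql):
    """Per-observation likelihood under normal residual error.

    Observed (y >= QL): the usual normal density p(y | params).
    Censored (y < QL, reported as None here): the marginal
        L*(y) = P(y <= QL | params) = Phi((QL - yhat) / sigma).
    """
    if y is not None and y >= ql:
        z = (y - yhat) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    # Censored record: integrate the density from -infinity to QL.
    return normal_cdf((ql - yhat) / sigma)
```

For example, when the model prediction yhat sits exactly at QL, a censored record contributes probability 0.5, as expected for a symmetric residual distribution.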

Here are the two approximations Pete discusses:

1. Set y = QL/2, var(y) = QL**2/12. This assumes that p(y|y<=QL,params) is approximately U(0,QL), and that p(E(y)|params) is approximately proportional to Integral[p(y|params)dy].
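As a small sketch of approximation (1) (mine, not from the thread): the imputed value is the mean of U(0, QL) and the extra variance term is its variance, which is QL**2/12.

```python
def half_ql_imputation(ql):
    """Approximation 1: replace a BQL record with the conditional mean
    under p(y | y <= QL, params) ~ U(0, QL), and carry the uniform
    variance as an additional residual-variance term.
    Note Var(U(0, QL)) = QL**2 / 12."""
    imputed = ql / 2.0          # E[U(0, QL)]
    extra_var = ql ** 2 / 12.0  # Var[U(0, QL)]
    return imputed, extra_var
```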

2. The first part of Pete's new suggestion chooses y randomly from U(0,QL), so that the likelihood contribution is correct (under p(y|y<=QL,params) = U(0,QL)), avoiding the assumption that p(E(y)|params) is approximately proportional to Integral[p(y|params)dy]. The second part of his suggestion tries to deal with the "extra uncertainty", but I'm not sure it would do so. Even if we set var(y) = QL**2/12, consistent with the uniform distribution assumption, the fact that we are creating data where there are none leads to standard errors of parameters that are a bit too small.
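The first part of that suggestion (random draw plus an imputation flag) can be sketched as follows; this is my illustration, and the flag would then be used as a covariate on the residual variance in the model:

```python
import random

def uniform_imputation(ql, rng=None):
    """Approximation 2, first part: draw the imputed concentration
    uniformly on (0, QL) rather than fixing it at QL/2, and return a
    dichotomous indicator (1 = imputed, 0 = observed) so that an extra
    variance term can be attached to imputed records."""
    rng = rng or random.Random()
    y = rng.uniform(0.0, ql)
    return y, 1  # this record is imputed, so the flag is 1
```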

If you want to avoid the approximations in (1), and still avoid the integration, then you can use "multiple imputation"; see Hopke et al. (Hopke, P. K., C. Liu, et al. (2001). Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic. Biometrics 57(1): 22-33) and references therein. This method does as Pete suggests for the first part, but not the second. Instead, it creates multiple data sets with different random imputations and uses the separate analyses of each to deal with the added error. In effect, it performs a Monte Carlo integration rather than the explicit one that I discussed above.
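The pooling step of multiple imputation, i.e., how the m separate analyses are combined so the added imputation error shows up in the standard errors, is conventionally done with Rubin's rules. A minimal sketch (mine, not from the Hopke et al. paper):

```python
import statistics

def pool_multiple_imputations(estimates, variances):
    """Rubin's rules for combining m imputed-data analyses:
    pooled estimate  = mean of the m point estimates;
    total variance   = mean within-imputation variance
                     + (1 + 1/m) * between-imputation variance.
    The between-imputation term is what inflates the standard errors
    to reflect the uncertainty added by imputing the BQL values."""
    m = len(estimates)
    qbar = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates)  # sample variance, m-1 denominator
    total_var = within + (1.0 + 1.0 / m) * between
    return qbar, total_var
```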

Note that all of the approximations (including multiple imputation) require a model for p(y|y<=QL,params). Obviously, the "right" model is a function of the (unknown) "params", and hence the imputation methods are not fully efficient. I have no idea what the "cost" of this is, but if it is to be avoided, then one must either modify the likelihood to accept L*(y) (which can be done in NONMEM by a sophisticated user, employing the option LIKELIHOOD -- for an example, see ftp://pkpd.icon.palo-alto.med.va.gov/nonmem.dir/COMPLIANCE2.dir/), or use data augmentation methods, which can be viewed as iterative multiple imputation (see Tanner, M. A. (1991). Tools for Statistical Inference. Observed Data and Data Augmentation Methods. New York, Springer-Verlag.).