Measurement Error

Measurement error refers to a circumstance in which the true empirical value of a variable cannot be observed or measured precisely. The error is thus the difference between the actual value of that variable and what can be observed or measured. For instance, household consumption/expenditures over some interval are often of great empirical interest (in many applications because of the theoretical role they play under the forward-looking theories of consumption). These are usually observed or measured via household surveys in which respondents are asked to catalog their consumption/expenditures over some recall window. However, these respondents often cannot recall precisely how much they spent on the various items over that window. Their reported consumption/expenditures are thus unlikely to reflect precisely what they or their households actually spent over the recall interval.

Unfortunately, measurement error is not without consequence in many empirical applications. Perhaps the most widely recognized difficulty associated with measurement error is bias in estimates of regression parameters. Consider the following regression model:

y = β0 + β1·x + ε

If y and x are observed precisely (and assuming no other complications), β0 and β1 can be estimated via straightforward linear regression techniques. Suppose, however, that we actually observe x*, which is x plus some randomly distributed error v:

x* = x + v

In terms of observed variables, our regression model now becomes

y = β0 + β1·(x* – v) + ε

= β0 + β1·x* + ε – β1·v

= β0 + β1·x* + ζ

where ζ = ε – β1·v. From this setup, a problem should be immediately apparent. Because x* = x + v and ζ = ε – β1·v, x* is correlated with the error term ζ, violating a central assumption of linear regression (that is, independence between regressors and the regression error) required to recover consistent, unbiased estimates of regression parameters. If one were to regress y on what we can observe (that is, x*), the probability limit of the estimate of β1, denoted β̂1, would be

plim β̂1 = β1 · σx² / (σx² + σv²)

where σx² and σv² are the variances of x and v, and v is assumed to be uncorrelated with both x and ε. Because the ratio σx² / (σx² + σv²) is less than one whenever σv² > 0, using x* does not yield a consistent estimate of β1:

|plim β̂1| < |β1|

If both v and ε are normally distributed or if the conditional expectation from the regression model is linear, then this holds even in small samples as an expectation (Hausman 2001):

|E(β̂1)| < |β1|

This bias toward zero is generally referred to as attenuation bias. Measurement error in right-hand-side explanatory variables also results in biased and inconsistent estimates in a multiple regression framework, but there the direction of the bias is less clear and depends on the correlations between the measurement errors of the various regressors. Similarly, biased and inconsistent estimates arise when the measurement error v is correlated with ε or x, although once again the sign of the bias is no longer clear a priori.
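The attenuation result can be illustrated with a small simulation. The following is a minimal sketch, not part of the original discussion: the parameter values, the random seed, and the helper name ols_slope are illustrative assumptions, and the errors are assumed to be independent normal draws. It compares the slope from regressing y on the true x with the slope from regressing y on the mismeasured x*, alongside the classical attenuation factor σx² / (σx² + σv²).

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)            # fixed seed so the run is repeatable
n = 50_000                        # hypothetical sample size
beta0, beta1 = 1.0, 2.0           # illustrative "true" parameters
sd_x, sd_v, sd_eps = 1.0, 1.0, 1.0

x = [rng.gauss(0, sd_x) for _ in range(n)]        # true regressor
v = [rng.gauss(0, sd_v) for _ in range(n)]        # measurement error in x
eps = [rng.gauss(0, sd_eps) for _ in range(n)]    # regression error
y = [beta0 + beta1 * xi + e for xi, e in zip(x, eps)]
x_star = [xi + vi for xi, vi in zip(x, v)]        # what we actually observe

slope_true = ols_slope(x, y)          # regression on the true x
slope_obs = ols_slope(x_star, y)      # regression on the mismeasured x*
predicted = beta1 * sd_x**2 / (sd_x**2 + sd_v**2)  # classical attenuation limit

print(f"slope using x:   {slope_true:.3f}")
print(f"slope using x*:  {slope_obs:.3f}")
print(f"attenuated plim: {predicted:.3f}")
```

With equal variances for x and v, as assumed here, the attenuation factor is one half, so the slope estimated from x* should settle near half of β1 while the slope estimated from x stays close to β1.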

Measurement error in left-hand-side (dependent) variables has a different consequence. To cleanly separate the issues, imagine that x can now be observed perfectly but that the dependent variable y can be observed only with some error, as follows:

y* = y + v

Here y* is the observed variable and v is the measurement error that prevents us from observing y directly. Returning to our regression framework, we have

y = β0 + β1·x + ε

which, in terms of observed variables, yields

y* – v = β0 + β1·x + ε

or

y* = β0 + β1·x + ζ

where ζ = ε + v. Since x is still uncorrelated with the new regression error ζ (as long as v is unrelated to x), straightforward linear regression of y* on x will still yield unbiased and consistent estimates of the regression parameters β0 and β1. However, the variance of ζ will in general exceed that of ε, implying less precise estimates (and hence higher standard errors and lower t-statistics for those parameters).
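A short simulation can make the contrast with the right-hand-side case concrete. This is a sketch under stated assumptions rather than anything taken from the source: the parameter values, the seed, and the helper name ols_slope_and_se are hypothetical, and the measurement error v is drawn independently of x and ε. It fits the same regression twice, once with the true y and once with the mismeasured y*, and reports the slope and its conventional standard error in each case.

```python
import math
import random


def ols_slope_and_se(x, y):
    """Return the OLS slope of y on x and its conventional standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, used to estimate the error variance.
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


rng = random.Random(1)        # fixed seed so the run is repeatable
n = 20_000                    # hypothetical sample size
beta0, beta1 = 1.0, 2.0       # illustrative "true" parameters

x = [rng.gauss(0, 1) for _ in range(n)]
y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]   # true outcome
y_star = [yi + rng.gauss(0, 2) for yi in y]              # outcome observed with error v

slope_clean, se_clean = ols_slope_and_se(x, y)
slope_noisy, se_noisy = ols_slope_and_se(x, y_star)

print(f"true y:  slope {slope_clean:.3f}, s.e. {se_clean:.4f}")
print(f"noisy y: slope {slope_noisy:.3f}, s.e. {se_noisy:.4f}")
```

Both slopes should sit close to β1, but the standard error from the regression on y* should be noticeably larger, reflecting the extra variance that v adds to the regression error.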

Because measurement error is likely ubiquitous (particularly in many variables typically found in microlevel data, which are often based on interviews at the household, firm, or individual level), many econometric remedies for it have been proposed. Here we focus on the case of measurement errors in right-hand-side explanatory variables x because these errors lead to biased and inconsistent (as opposed to merely inefficient) estimates. While a variety of solutions has been proposed, in practice one has become particularly popular: instrumental variables.

In some sense the instrumental variables approach is rooted in part in the contributions of Vincent Geraci (1976, 1977), who explored identification and estimation of systems of simultaneous equations with generalized measurement error. Geraci established the necessary conditions for identification and efficient estimation under such circumstances. Of particular importance, his work stressed the need for prior restrictions sufficiently numerous to compensate for the additional parameters introduced by the measurement error.

Despite the rather elaborate work in the context of systems of equations by Geraci and others, in practice most instrumental variables estimation to surmount measurement error is carried out in a simple, two-stage setting. Once again, to isolate the issues, let us assume that y is observed without error but that x is not; specifically, assume that we actually observe x*, where

x* = x + v.

To implement the instrumental variables remedy for this sort of measurement error, one must have some variable z (an instrument) that is correlated with the true value x but not with the measurement error v. Furthermore, z must be correlated with y only through its correlation with x. (Following standard results for instrumental variables estimation, z can be correlated with other observed determinants of y; what it cannot be correlated with is the regression error ζ = ε – β1·v.) Once such an instrument has been identified, the standard two-stage least squares procedure can be adopted: regress x* on z, use the fitted model to predict x*, and finally regress y on the predicted x*. For example, the case of mismeasured household consumption is often addressed through instruments such as household income (often measured in a separate survey module), local prices (which influence consumption, given income), and the like. What is required is a variable correlated with the true measure and not with the error. All the concerns regarding the predictive power of instruments (see, for example, Staiger and Stock 1997) apply.
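The two-stage procedure just described can be sketched in a few lines. This is an illustrative simulation only: the data-generating process, the parameter values, the seed, and the helper name ols_fit are assumptions made for the example, and the instrument z is constructed to be correlated with x while independent of both v and ε. The sketch contrasts the naive regression of y on x* with the two-stage estimate.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(2)        # fixed seed so the run is repeatable
n = 50_000                    # hypothetical sample size
beta0, beta1 = 1.0, 2.0       # illustrative "true" parameters

z = [rng.gauss(0, 1) for _ in range(n)]                  # instrument
x = [zi + rng.gauss(0, 1) for zi in z]                   # true regressor, driven partly by z
x_star = [xi + rng.gauss(0, 1) for xi in x]              # observed regressor: x plus error v
y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]   # outcome depends on the true x

# Naive regression of y on the mismeasured regressor (attenuated).
_, naive_slope = ols_fit(x_star, y)

# Stage 1: regress x* on z and form the fitted values.
a0, a1 = ols_fit(z, x_star)
x_hat = [a0 + a1 * zi for zi in z]

# Stage 2: regress y on the fitted values from stage 1.
_, iv_slope = ols_fit(x_hat, y)

print(f"naive slope on x*: {naive_slope:.3f}")
print(f"two-stage slope:   {iv_slope:.3f}")
```

Under these assumptions the naive slope should fall near two thirds of β1, while the two-stage slope should be close to β1. Note that standard errors computed from a manually run second stage are not valid as they stand; dedicated instrumental variables routines correct them.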

The result that mismeasured explanatory variables lead to biased and inconsistent estimates generalizes to nonlinear regression and limited dependent variable models (such as logit and probit), although the instrumental variables solution discussed above is no longer effective. See Jerry Hausman (2001) for further discussion of the case of nonlinear regression and Douglas Rivers and Quang Vuong (1988) for solutions in the case of limited dependent variable models. Interestingly, in limited dependent variable models, measurement error in the dependent variable can itself lead to biased and inconsistent estimates of model parameters. See Hausman (2001) for further discussion.