Sample Attrition and Sample Selection

Missing observations occur frequently in panel data. If individuals are missing randomly, most estimation methods for the balanced panel can be extended in a straightforward manner to the unbalanced panel (e.g. Hsiao, 1986). For instance, suppose that

$$d_{it}\, y_{it} = d_{it}\,[\alpha_i + \gamma' z_{it} + u_{it}], \tag{16.34}$$

where $d_{it}$ is an observable scalar indicator variable which denotes whether information about $(y_{it}, z_{it}')$ for the $i$th individual at the $t$th time period is available or not. The indicator variable $d_{it}$ is assumed to depend on a vector of variables, $w_{it}$, an individual-specific effect $\lambda_i$, and an unobservable error term $\eta_{it}$,

$$d_{it} = I(\lambda_i + \delta' w_{it} + \eta_{it} > 0), \tag{16.35}$$

where $I(\cdot)$ is the indicator function that takes the value 1 if $\lambda_i + \delta' w_{it} + \eta_{it} > 0$ and 0 otherwise. In other words, the indicator variable $d_{it}$ determines whether $(y_{it}, z_{it})$ in (16.34) is observed or not (e.g. Hausman and Wise, 1979).
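The selection mechanism in (16.34) and (16.35) can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings that are not in the text: scalar regressors, standard normal draws for every component, and a hypothetical helper `simulate_panel`. It compares the mean of the error $u_{it}$ over all observations with its mean over the observations that survive selection, when $u_{it}$ and $\eta_{it}$ are correlated.

```python
import math
import random


def simulate_panel(n_individuals, n_periods, gamma, delta, rho, seed=0):
    """Simulate y_it = alpha_i + gamma*z_it + u_it together with the
    selection rule d_it = 1{lambda_i + delta*w_it + eta_it > 0}.

    All distributions are illustrative assumptions; corr(u_it, eta_it) = rho.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_individuals):
        alpha_i = rng.gauss(0.0, 1.0)   # individual effect, outcome equation
        lambda_i = rng.gauss(0.0, 1.0)  # individual effect, selection equation
        for t in range(n_periods):
            z = rng.gauss(0.0, 1.0)
            w = rng.gauss(0.0, 1.0)
            eta = rng.gauss(0.0, 1.0)
            # u has unit variance and correlation rho with eta
            u = rho * eta + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            d = 1 if lambda_i + delta * w + eta > 0 else 0
            y = alpha_i + gamma * z + u
            # u is stored only for this illustration; it is unobservable in practice
            rows.append({"i": i, "t": t, "z": z, "w": w, "d": d, "y": y, "u": u})
    return rows


rows = simulate_panel(5000, 3, gamma=1.0, delta=1.0, rho=0.8)
observed_u = [r["u"] for r in rows if r["d"] == 1]
mean_u_all = sum(r["u"] for r in rows) / len(rows)
mean_u_observed = sum(observed_u) / len(observed_u)
print(round(mean_u_all, 3), round(mean_u_observed, 3))
```

With a positive correlation between the two errors, the mean of $u_{it}$ in the selected sample should come out clearly above zero, while the mean over all observations stays close to zero.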

Without sample selectivity, that is, $d_{it} = 1$ for all $i$ and $t$, (16.34) is the standard variable intercept (or fixed effects) model for panel data discussed in Section 2. With sample selection, and if $\eta_{it}$ and $u_{it}$ are correlated, $E(u_{it} \mid z_{it}, d_{it} = 1) \neq 0$. Let $\theta(\cdot)$ denote the conditional expectation of $u_{it}$ conditional on $d_{it} = 1$ and $w_{it}$; then (16.34) can be written as

$$y_{it} = \alpha_i + \gamma' z_{it} + \theta(\lambda_i + \delta' w_{it}) + \epsilon_{it},$$

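where $E(\epsilon_{it} \mid z_{it}, d_{it} = 1) = 0$. The form of the selection function is derived from the joint distribution of $u$ and $\eta$. For instance, if $u$ and $\eta$ are bivariate normal, then we have the Heckman (1979) sample selection model with

$$\theta(\lambda_i + \delta' w_{it}) = \sigma_{u\eta}\, \frac{\phi(\lambda_i + \delta' w_{it})}{\Phi(\lambda_i + \delta' w_{it})},$$

where $\sigma_{u\eta}$ denotes the covariance between $u$ and $\eta$, $\phi(\cdot)$ and $\Phi(\cdot)$ are the standard normal density and distribution functions, respectively, and the variance of $\eta$ is normalized to be 1. Therefore, in the presence of sample attrition or selection, regressing $y_{it}$ on $z_{it}$ using only the observed information is invalidated by two problems: first, the presence of the unobserved effects $\alpha_i$, and second, the "selection bias" arising from the fact that $E(u_{it} \mid z_{it}, d_{it} = 1) = \theta(\lambda_i + \delta' w_{it})$.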
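For the bivariate normal case, the standard result $E(u \mid \eta > -a) = \sigma_{u\eta}\,\phi(a)/\Phi(a)$, with $\mathrm{Var}(\eta) = 1$, can be checked numerically. The sketch below is illustrative only: the function names, the index value $a = 0.5$, the covariance 0.6 and the unit variance of $u$ are all assumptions chosen for the example.

```python
import math
import random


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def selection_term(index, sigma_u_eta):
    """E(u | eta > -index) for bivariate normal (u, eta) with Var(eta) = 1."""
    return sigma_u_eta * normal_pdf(index) / normal_cdf(index)


# Monte Carlo check under assumed values (Var(u) = 1, Cov(u, eta) = 0.6)
rng = random.Random(1)
index, sigma_u_eta = 0.5, 0.6
selected_u = []
for _ in range(200_000):
    eta = rng.gauss(0.0, 1.0)
    u = sigma_u_eta * eta + math.sqrt(1.0 - sigma_u_eta ** 2) * rng.gauss(0.0, 1.0)
    if eta > -index:  # the observation is selected
        selected_u.append(u)

simulated = sum(selected_u) / len(selected_u)
analytic = selection_term(index, sigma_u_eta)
print(round(analytic, 4), round(simulated, 4))
```

The simulated conditional mean should agree with the analytic selection term up to Monte Carlo error. The agreement relies on the normality assumption; under a different error distribution the analytic term would no longer apply.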

When individual effects are random and the joint distribution function of $(u, \eta, \alpha_i, \lambda_i)$ is known, both maximum likelihood and two- or multi-step estimators can be derived (e.g. Heckman, 1979; Ryu, 1998). The resulting estimators are consistent and asymptotically normally distributed, with a speed of convergence proportional to the square root of the sample size. However, if the joint distribution of $u$ and $\eta$ is misspecified, then even without the presence of $\alpha_i$, both the maximum likelihood and Heckman (1979) two-step estimators will be inconsistent. This sensitivity of the parameter estimates to the exact specification of the error distribution has motivated the interest in semiparametric methods.

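The presence of individual effects is easily handled by pairwise differencing for those individuals that are observed in two time periods $t$ and $s$, i.e. who have $d_{it} = d_{is} = 1$. However, the sample selectivity factors are not eliminated by pairwise differencing. The expected value of $y_{it} - y_{is}$ given $d_{it} = 1$ and $d_{is} = 1$ takes the form

$$E(y_{it} - y_{is} \mid d_{it} = 1, d_{is} = 1) = \gamma'(z_{it} - z_{is}) + E(u_{it} - u_{is} \mid d_{it} = 1, d_{is} = 1).$$

In general, the two conditional means $E(u_{it} \mid d_{it} = 1, d_{is} = 1)$ and $E(u_{is} \mid d_{it} = 1, d_{is} = 1)$ are nonzero and are different from each other. If $(u_{it}, \eta_{it})$ are independent, identically distributed (iid) and are independent of $\alpha_i$, $\lambda_i$, $z$ and $w$, then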

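$$\begin{aligned}
\theta_{it} &= E(u_{it} \mid d_{it} = 1, d_{is} = 1) = E(u_{it} \mid d_{it} = 1) \\
&= E(u_{it} \mid \eta_{it} > -w_{it}'\delta - \lambda_i) = \theta(\delta' w_{it} + \lambda_i),
\end{aligned} \tag{16.39}$$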

where the second equality is due to the independence over time assumption of the error vector and the third equality is due to the independence of the errors from the individual effects and the explanatory variables. The function $\theta(\cdot)$ of the single index, $\delta' w_{it} + \lambda_i$, is the same over $i$ and $t$ because of the iid assumption on $(u_{it}, \eta_{it})$, but in general, $\theta(\delta' w_{it} + \lambda_i) \neq \theta(\delta' w_{is} + \lambda_i)$ because of the time variation of the scalar index $\delta' w_{it}$. However, for an individual $i$ that has $\delta' w_{it} = \delta' w_{is}$ and $d_{it} = d_{is} = 1$, the sample selection effect $\theta_{it}$ will be the same in the two periods. Therefore, for this particular individual, time differencing eliminates both the unobserved individual effect and the sample selection effect,

$$y_{it} - y_{is} = \gamma'(z_{it} - z_{is}) + (\epsilon_{it} - \epsilon_{is}). \tag{16.40}$$

This suggests estimating $\gamma$ by least squares from a subsample that consists of those observations that satisfy $\delta' w_{it} = \delta' w_{is}$ and $d_{it} = d_{is} = 1$,

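$$\hat{\gamma} = \left[ \sum_{i=1}^{N} \sum_{1 \le s < t \le T_i} (z_{it} - z_{is})(z_{it} - z_{is})'\, 1\{(w_{it} - w_{is})'\delta = 0\}\, d_{it} d_{is} \right]^{-1} \left[ \sum_{i=1}^{N} \sum_{1 \le s < t \le T_i} (z_{it} - z_{is})(y_{it} - y_{is})\, 1\{(w_{it} - w_{is})'\delta = 0\}\, d_{it} d_{is} \right], \tag{16.41}$$

where $T_i$ denotes the number of the $i$th individual's time series observations.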
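A scalar version of the matched-pair least squares estimator (16.41) can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes two time periods, a scalar $z_{it}$, a binary $w_{it}$ (so that exact ties in the index actually occur), a known $\delta$, and made-up parameter values. The helper names `simulate_pairs` and `differenced_ls` are hypothetical.

```python
import math
import random


def simulate_pairs(n_individuals, gamma, delta, rho, seed=0):
    """Return (dz, dy, dw) for individuals observed in both of two periods."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_individuals):
        alpha_i = rng.gauss(0.0, 1.0)
        lambda_i = rng.gauss(0.0, 1.0)
        obs = []
        for _t in range(2):
            w = float(rng.randint(0, 1))       # binary, so exact ties occur
            z = w + 0.5 * rng.gauss(0.0, 1.0)  # z is correlated with w
            eta = rng.gauss(0.0, 1.0)
            u = rho * eta + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            d = lambda_i + delta * w + eta > 0  # selection rule (16.35)
            obs.append((d, w, z, alpha_i + gamma * z + u))
        (d1, w1, z1, y1), (d2, w2, z2, y2) = obs
        if d1 and d2:  # keep pairs with d_it = d_is = 1
            pairs.append((z2 - z1, y2 - y1, w2 - w1))
    return pairs


def differenced_ls(pairs, delta=None):
    """Scalar analogue of (16.41); delta=None keeps every pair (naive version)."""
    num = den = 0.0
    for dz, dy, dw in pairs:
        if delta is not None and dw * delta != 0.0:
            continue  # drop pairs whose selection index differs across periods
        num += dz * dy
        den += dz * dz
    return num / den


pairs = simulate_pairs(50_000, gamma=1.0, delta=1.0, rho=0.8)
gamma_matched = differenced_ls(pairs, delta=1.0)  # only pairs with equal index
gamma_naive = differenced_ls(pairs)               # all pairs observed twice
print(round(gamma_matched, 3), round(gamma_naive, 3))
```

In this design the matched estimator should land near the assumed true value $\gamma = 1$, whereas differencing over all pairs leaves the selection term in the error and pulls the estimate away from it.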

The estimator (16.41) cannot be directly implemented because $\delta$ is unknown. Moreover, the scalar index $\delta' w_{it}$ will typically be continuous if any of the variables in $w_{it}$ is continuous, so exact ties $\delta' w_{it} = \delta' w_{is}$ will rarely, if ever, occur in the sample. Ahn and Powell (1993) note that if $\theta$ is a sufficiently "smooth" function, and $\hat{\delta}$ is a consistent estimator of $\delta$, observations for which the difference $(w_{it} - w_{is})'\hat{\delta}$ is close to zero should have $\theta_{it} - \theta_{is} \approx 0$. They propose a two-step procedure. In the first step, consistent semiparametric estimates of the coefficients of the "selection" equation are obtained. The result is used to obtain estimates of the "single index" variables, $w_{it}'\hat{\delta}$, characterizing the selectivity bias in the equation of interest. The second step of the approach estimates the parameters of the equation of interest by a weighted instrumental variables regression of pairwise differences in dependent variables in the sample on the corresponding differences in explanatory variables; the weights put more emphasis on pairs with $w_{it}'\hat{\delta} \approx w_{i,t-1}'\hat{\delta}$.

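Kyriazidou (1997) and Honoré and Kyriazidou (1998) generalize this concept and propose to estimate the fixed effects sample selection models in two steps. In the first step, estimate $\delta$ by either the Anderson (1970) and Chamberlain (1980) conditional maximum likelihood approach or the Manski (1975) maximum score method. In the second step, the estimated $\hat{\delta}$ is used to estimate $\gamma$, based on pairs of observations for which $d_{it} = d_{is} = 1$ and for which $(w_{it} - w_{is})'\hat{\delta}$ is "close" to zero. This last requirement is operationalized by weighting each pair of observations with a weight that depends inversely on the magnitude of $(w_{it} - w_{is})'\hat{\delta}$, so that pairs with larger differences in the selection effects receive less weight in the estimation. The Kyriazidou (1997) estimator takes the form:

$$\hat{\gamma} = \left[ \sum_{i=1}^{N} \sum_{1 \le s < t \le T_i} \frac{1}{h_N} K\!\left( \frac{(w_{it} - w_{is})'\hat{\delta}}{h_N} \right) (z_{it} - z_{is})(z_{it} - z_{is})'\, d_{it} d_{is} \right]^{-1} \left[ \sum_{i=1}^{N} \sum_{1 \le s < t \le T_i} \frac{1}{h_N} K\!\left( \frac{(w_{it} - w_{is})'\hat{\delta}}{h_N} \right) (z_{it} - z_{is})(y_{it} - y_{is})\, d_{it} d_{is} \right], \tag{16.42}$$

where $K$ is a kernel density function which tends to zero as the magnitude of its argument increases and $h_N$ is a positive constant that decreases to zero as $N \to \infty$.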

Under appropriate regularity conditions, Kyriazidou (1997) shows that the estimator (16.42) is consistent and asymptotically normally distributed. However, the rate of convergence is slower than the standard square root of the sample size.
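The kernel-weighted second step behind (16.42) can be sketched in scalar form as below. The example rests on several assumptions that are not part of the text: a standard normal kernel, a fixed bandwidth of 0.2 chosen by hand, two time periods, and the true $\delta$ used as a stand-in for a first-step estimate $\hat{\delta}$ (the first step itself is not implemented). The function names are hypothetical.

```python
import math
import random


def normal_kernel(x):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def kernel_weighted_ls(pairs, delta_hat, bandwidth):
    """Scalar analogue of (16.42); pairs hold (dz, dy, dw) with d_it = d_is = 1."""
    num = den = 0.0
    for dz, dy, dw in pairs:
        # weight shrinks as the difference in the selection index grows
        weight = normal_kernel(dw * delta_hat / bandwidth) / bandwidth
        num += weight * dz * dy
        den += weight * dz * dz
    return num / den


def simulate_pairs(n_individuals, gamma, delta, rho, seed=0):
    """Two-period panel with continuous w; keep individuals observed twice."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_individuals):
        alpha_i = rng.gauss(0.0, 1.0)
        lambda_i = rng.gauss(0.0, 1.0)
        obs = []
        for _t in range(2):
            w = rng.gauss(0.0, 1.0)      # continuous, so exact ties do not occur
            z = w + rng.gauss(0.0, 1.0)  # z is correlated with w
            eta = rng.gauss(0.0, 1.0)
            u = rho * eta + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            d = lambda_i + delta * w + eta > 0
            obs.append((d, w, z, alpha_i + gamma * z + u))
        (d1, w1, z1, y1), (d2, w2, z2, y2) = obs
        if d1 and d2:
            pairs.append((z2 - z1, y2 - y1, w2 - w1))
    return pairs


pairs = simulate_pairs(50_000, gamma=1.0, delta=1.0, rho=0.8)
gamma_kernel = kernel_weighted_ls(pairs, delta_hat=1.0, bandwidth=0.2)
# a very large bandwidth gives (almost) equal weights: the unweighted estimator
gamma_unweighted = kernel_weighted_ls(pairs, delta_hat=1.0, bandwidth=1e6)
print(round(gamma_kernel, 3), round(gamma_unweighted, 3))
```

The bandwidth governs a trade-off: a smaller value removes more of the selection effect but uses fewer effective observations, which is one way to see why the rate of convergence falls below the usual square-root rate.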

There has been an explosion of techniques and procedures for the analysis of panel data (e.g. Matyas and Sevestre, 1996). In this chapter we have discussed some popular panel data models. We did not discuss issues of duration and count data models (e.g. Cameron and Trivedi, 1998; Heckman and Singer, 1984; Lancaster, 1990; Lancaster and Intrator, 1998), simulation-based inference (e.g. Gourieroux and Monfort, 1993), specification analysis (e.g. Baltagi and Li, 1995; Lee, 1987; Li and Hsiao, 1998; Maddala, 1995; Wooldridge, 1995), measurement errors (e.g. Biorn, 1992; Griliches and Hausman, 1984; Hsiao, 1991; Hsiao and Taylor, 1991), pseudo panels or matched samples (e.g. Deaton, 1985; Moffit, 1993; Peracchi and Welsch, 1995; Verbeek, 1992), etc. In general, there does not exist a panacea for panel data analysis. It appears more fruitful to explicitly recognize the limitations of the data and focus attention on providing solutions for a specific type of model. A specific model often contains specific structural information that can be exploited. However, the power of panel data depends on the validity of the assumptions upon which the statistical methods have been built (e.g. Griliches, 1979).

Notes

* This work was supported in part by National Science Foundation grant SBR96-19330. I would like to thank two referees for helpful comments.

1 Normality is assumed for ease of relating the sampling approach and Bayesian approach estimators. It is not required.

2 See Chamberlain (1984) and Hausman and Taylor (1981) for approaches to estimating models when $u$ and $\epsilon$ are correlated.

3 Under smoothness conditions, Horowitz (1992) proposed a smoothed maximum score estimator that has an $n^{-2/5}$ rate of convergence. With even stronger conditions, Lee (1999) is able to propose a root-$n$ consistent semiparametric estimator.
