Re: st: pool cross-section survey data

Ana et al.--
As usual, I agree with Stas. But -ivregress- or -ivreg2- requires
that you have good instruments (excluded instrumental variables, to be
precise), meaning some variables that determine treatment but are
uncorrelated with the error in the regression model you wrote out
(i.e. they have no direct effect on the outcome except through the
included variables).

Matching (which addresses only selection on observables), esp.
propensity-score matching, can be thought of as reweighting. You can
then specify a many-to-many matching-type strategy where you compare
the weighted conditional mean over all of the untreated obs to the
weighted conditional mean over all of the treated obs, so that in some
sense every untreated obs is a match for each treated obs and every
treated obs is a match for each untreated obs (for the ATE estimand).
To do that, regress treatment on observables, get a predicted
probability, then generate a new weight, something like this:
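* propensity score: Pr(t=1 | x), covariate list held in local macro x
logit t `x'
predict p if e(sample)
* inverse-probability weight: 1/p if treated, 1/(1-p) if untreated
g w=cond(t,1/p,1/(1-p))
* rescale so the weights average one within each treatment group
su w if t==0, meanonly
replace w=w/r(mean) if t==0
su w if t==1, meanonly
replace w=w/r(mean) if t==1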
(You should probably do this separately by super-stratum.) Then
multiply that new weight by your sampling weight (another kind of
inverse-probability weight) to get a composite weight, which gives you
a smooth matching-style estimator. But if the predicted probability of
treatment is ever close to zero or one, then the approach is probably
no good: p must be strictly in the interior of the unit interval, and
if p is 1e-12 for one treated obs then that obs gets essentially all
the weight, effectively throwing away the rest of the data on the
treated.
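
Before trusting the weights, it is worth looking at how close p gets
to the boundaries and how concentrated w is. A rough sketch of that
check and of the composite weight, assuming p, t, and w exist as
generated above; sampwt is a made-up name standing in for whatever
your sampling weight is called:

```stata
* Sketch only. p, t, and w come from the steps above;
* sampwt is a hypothetical name for the survey sampling weight.

* Overlap check: look for predicted probabilities near 0 or 1 in either group
tabstat p, by(t) statistics(n min p1 p50 p99 max)

* How concentrated are the weights? A huge max relative to the mean
* (which is 1 after rescaling) means a few obs dominate the estimate
tabstat w, by(t) statistics(n mean p99 max)

* Composite weight: sampling weight times the treatment-propensity weight
generate double cw = sampwt*w
```

If the min or max of p in either group sits right at 0 or 1, the
caveat above applies and the reweighted means are not to be trusted.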
Note that weighting the treatment group by 1/p is for estimating the
population mean of the outcome had everyone been seen in the treatment
group (Brunell and DiNardo:
http://pan.oxfordjournals.org/cgi/content/abstract/12/1/28) and
weighting the untreated group by 1/(1-p) is for estimating the
population mean of the outcome had everyone been seen in the control
group, so the difference between those weighted means is an estimate
of the average treatment effect for the whole pop.
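
One way to get that difference of weighted means is a weighted
regression of the outcome on the treatment dummy. A minimal sketch,
assuming y is the outcome and cw is the composite weight (sampling
weight times w); both names are made up for illustration, and
psuXyear and year are borrowed from Stas's -svyset- line quoted below:

```stata
* Sketch only. y (outcome) and cw (composite weight = sampling weight * w)
* are hypothetical names; psuXyear and year follow Stas's -svyset- line.
svyset psuXyear [pw=cw], strata(year)

* With a single 0/1 regressor, the coefficient on t is the weighted mean
* of y among the treated minus the weighted mean among the untreated
svy: regress y t
```

The standard errors from this treat p as known rather than estimated,
so I would read them as approximate.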
The literature on this type of thing is probably mostly forthcoming,
but if anyone has citations, please add them to the thread...
On Thu, Oct 9, 2008 at 1:19 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
...
> svyset psuXyear [pw=weight in each wave], strata(year)
...
> If people could opt out of the treatment, or there was partial
> compliance with it, then you are in real trouble. I don't think those
> issues have been developed well enough in technical literature,
> although Steven S (or Austin N, or somebody out there!!!) can have
> more information about the topic. I would probably have more trust in
> instrumental variables estimators than in matching estimators, as the
> former are smoother, so svy-appropriate inferential procedures are
> easier to be applied towards them (-svy: ivregress- should work right
> away, for instance).