Re: st: simulate consequences of selection bias 101

I meant to say "selection on the dependent variable". I wanted to let
the students see that we might even get a sign flip if we select on the
dependent variable and run a regression of the range-restricted Y on X.

Thanks for the references.

Thomas

Austin Nichols wrote:

Thomas Gschwend <gschwend@uni-mannheim.de>:
You seem to be using the term "selection bias" in a somewhat
nonstandard way ("virtues of selection bias" is certainly an odd turn
of phrase)--do you have in mind selection on the dependent variable?
Or the classic form of selection bias (selection on unobservables, or
omitted "confounding" variables, leading to endogeneity of X) which
could be modeled as a neglected nonlinearity in X for your case?
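
* 100 evenly spaced x values on [-3, 6]; oversample x<0 by a factor of 80
clear
range x -3 6 100
expand 80 if x<0
* true model is quadratic in x with standard normal noise
g y=x^2 +invnorm(uniform())
* full sample: negative slope on x
reg y x
* selecting on y (or on x) gives a positive slope instead
reg y x if y>10
reg y x if x>0
* local polynomial smooth reveals the curvature
lpoly y x
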
In this simple case, the omitted variable is clearly just X^2.
See SJ7(4):507-541
[http://www.stata-journal.com/article.html?article=st0136] for an
inventory of common solutions for endogeneity of X.
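
A minimal sketch of the point that the omitted variable is just X^2, assuming the same simulated data as in the example above. The seed, the variable name x2, and the use of rnormal() (Stata 10.1 or later) in place of invnorm(uniform()) are choices made here for illustration, not part of the original post.

```stata
* Rebuild the simulated data (seed is arbitrary, only for reproducibility)
clear
set seed 12345
range x -3 6 100
expand 80 if x<0
generate y = x^2 + rnormal()
* Misspecified linear fit: the slope on x absorbs the curvature
regress y x
* Add the omitted regressor (x2 is a hypothetical name)
generate x2 = x^2
regress y x x2
```

With the squared term included, the coefficient on x2 should be close to 1 and the coefficient on x close to 0, whichever subsample is used.
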
A nice example of sign reversal due to omitted variables that students
can easily understand is given in Julious and Mullee (1994) citing
Charig et al. (1986):

Tell students they each have a kidney stone. In past cases, treatment
OS (open surgery) had a success rate of 78% while treatment PN
(percutaneous nephrolithotomy) had a success rate of 83% overall. Ask
them which treatment they would choose. Now tell them the success
rates look rather different when stone size is taken into account. For
smaller stones (diameter <2 cm), 93% of cases treated with OS were
successful compared with just 87% of cases treated with PN. For larger
stones (diameter >=2 cm), the success rate of OS was 73% and the
success rate of PN was 69%. Now which would they choose, even without
knowing which size stone they have?

Always good to put death on the table as a possible outcome of omitted
variables bias in regression.
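
For classroom use, the reversal can be reproduced from cell counts. This is only a sketch: the counts below are the ones commonly tabulated from Charig et al. (1986) and should be checked against the paper, and the variable names (treat, size, success, n, rate) are made up here.

```stata
* Counts as commonly tabulated from Charig et al. (1986); verify before use
clear
input str2 treat str5 size success n
"OS" "small"  81  87
"PN" "small" 234 270
"OS" "large" 192 263
"PN" "large"  55  80
end
* Success rate (percent) within each treatment-by-size cell
generate rate = 100 * success / n
list treat size success n rate, noobs
* Pool over stone size: the ranking of the two treatments reverses
collapse (sum) success n, by(treat)
generate rate = 100 * success / n
list, noobs
```

The pooled rates come out at about 78% for OS and 83% for PN, even though OS does better within each stone-size group.
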

Steven A. Julious and Mark A. Mullee. 1994. "Confounding and Simpson's
paradox". British Medical Journal 309(6967): 1480–1481.
[http://www.bmj.com/cgi/content/full/309/6967/1480]

C. R. Charig, D. R. Webb, S. R. Payne, J. E. A. Wickham. 1986.
"Comparison of treatment of renal calculi by open surgery,
percutaneous nephrolithotomy, and extracorporeal shock wave
lithotripsy". British Medical Journal 292(6524): 879–882.
[http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=3083922]

On Tue, Apr 1, 2008 at 3:32 AM, Thomas Gschwend
<gschwend@uni-mannheim.de> wrote:

Dear all,
prompted by a student's question when teaching about the virtues of
selection bias, I would like to simulate some data that fulfill the
following requirements, where Y = b0 + b1*X:
1) When regressing Y on X (for the full sample),
b1 = -.5 and significantly < 0
2) When regressing Y on X (for a subsample, say for Y > 10),
b1 = +2 and significantly > 0
I am not sure how to simulate data that fulfill both requirements.
Any help is greatly appreciated.
Thomas