Daily Dolt: Dennis “GET OFF MY LAWN” Byrne

The problem with dismissing the Carroll study because it is epidemiological is that you’ll also have to dismiss a multitude of public health studies, including ones claiming a link between radon and lung cancer. These are the same epidemiological studies that alarmed millions of Americans, frightening them into buying radon detectors and creating a huge radon mitigation business. No study is perfect, and Carroll’s shortcoming is that his data do not allow comparisons of individual women over time. But other major studies have, and according to one unchallenged comprehensive analysis of those studies, they show that a pregnant woman who has never had a child before and aborts in the first term increases her chance of breast cancer by 50 percent.

Let me offer up the model from the paper:

Two explanatory variables are selected for modeling: (abortion) and (fertility). The trends for abortion and fertility are shown in Figures 8 and 9 for countries considered. The Mathematical Model is then:

Y_i = a + b_1 x_1i + b_2 x_2i + e_i

where Y represents cumulated cohort incidence of breast cancer within a particular age group; a is intercept, b1 and b2 are coefficients, and e is random error.

That creates a guffaw from those who know statistics at all.

He has a correlation coefficient of .98.

Those who understand correlation coefficients are shooting liquid through their nose if they were drinking anything right now. I had to look at it about 20 minutes to understand this moron was trying to sell a .98 correlation coefficient.
What he has done is take aggregate data that show one factor increasing (abortion) and another decreasing (fertility) and regress on them a variable that is also increasing: incidence of breast cancer.

So if I were to regress breast cancer incidence on the number of televisions sold per person instead, I’d get about the same result over this period of time. So I can, according to this dumbass, credibly claim that television leads to breast cancer. Or, as Orac points out, the reduction in the number of pirates has led to global warming.
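
To make the television point concrete, here is a minimal sketch using entirely made-up numbers (none of this is Carroll's data, and the names `tv_sales`, `incidence`, and `fertility` are hypothetical). Each series is just a straight-line trend plus independent noise, so by construction nothing causes anything else, yet the correlations come out close to plus or minus one.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
years = range(40)

# Synthetic, assumed series: each is a linear trend plus its own noise.
tv_sales = [10 + 2.0 * t + rng.gauss(0, 3) for t in years]      # rising
incidence = [50 + 1.5 * t + rng.gauss(0, 3) for t in years]     # rising
fertility = [3.5 - 0.04 * t + rng.gauss(0, 0.1) for t in years] # falling

r_tv = pearson_r(tv_sales, incidence)
r_fert = pearson_r(fertility, incidence)

print(f"televisions vs incidence: r = {r_tv:.2f}")
print(f"fertility vs incidence:   r = {r_fert:.2f}")
```

The only thing these series share is the calendar, which is all a high correlation between trending aggregates can tell you.
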
There’s a variety of problems in this study, starting with the fact that he throws out independent variables well established by other studies. In linear regression, if you do not include other variables, you cannot control for them, and here it is not just theoretical variables that are excluded but well-established ones demonstrated over and over. To say the least, this is an underspecified model:

A regression model is underspecified if the regression equation is missing one or more important predictor variables. This situation is perhaps the worst-case scenario, because an underspecified model yields biased regression coefficients and biased predictions of the response. That is, in using the model, we would consistently underestimate or overestimate the population slopes and the population means. To make already bad matters even worse, the mean square error MSE tends to overestimate σ², thereby yielding wider confidence intervals than it should.
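
A rough illustration of that bias, again on simulated data rather than anything from the paper: the sketch below assumes a true model y = 1 + 2*x1 + 3*x2 + noise, where x2 is correlated with x1, and compares the slope on x1 with and without x2 in the regression. The coefficients, sample size, and function names are all assumptions chosen for the example.

```python
import random

def slope_one(x, y):
    """OLS slope of y on a single predictor x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slopes_two(x1, x2, y):
    """OLS slopes of y on x1 and x2 (intercept included), 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(2000)]
# x2 is the "well established" predictor that moves together with x1.
x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

b1_short = slope_one(x1, y)            # x2 omitted: underspecified
b1_full, b2_full = slopes_two(x1, x2, y)

print(f"slope on x1, x2 omitted:  {b1_short:.2f}")
print(f"slope on x1, x2 included: {b1_full:.2f}")
```

With x2 left out, the slope on x1 absorbs x2's effect and lands near 2 + 3 * 0.8 = 4.4 instead of the true 2. That is the same mechanism by which an abortion coefficient can soak up the effect of every excluded risk factor that trends alongside it.
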

No one accepts a .98 coefficient at face value. No one. That is essentially regressing one variable on itself, and in this case it’s regressing the trend toward less restrictive abortion laws against a number of factors that, over the same period, have led to an increase in breast cancer.

Ecological inference is not an acceptable means of imputing causation to individuals from macro-level data, and this study violates that principle. One might use it to explore potential causes and whether there is a gross correlation, but not to determine causality. For that one requires cohort information or some other way to address individual observations.

It’s junk science. Yet the Chicago Tribune keeps publishing a clown who insists there is a link, but is wholly unqualified to judge that and uses crappy studies to do it. Why?

Fewer children, or fewer children breastfed, and scientists see an increase in the risk of breast cancer.

Given the prevalence of formula-fed babies in developed countries, a lack of nursing is a much more likely cause (or one of many causes) for the increased rates of breast cancer we’re seeing. (Use of pesticides, hormones, antibiotics, and other potential carcinogens in our modern industrialized agriculture could be another.)

That doesn’t explain the disparity in the number of conservative op-ed columnists they publish vs the much smaller number of liberals; heck, even the relatively smaller number of center-left if you want to be charitable.

There are clearly liberals among their readership also. Not all of us vote for Rubber Chicken (R).

The same bottom-line argument is made as an excuse for the number of conservative pundits on cable and broadcast tv. And, again, it’s a false argument given that the most watched shows present a center-left point of view. (That “center-left” distinction is important because even among those pundits actual “liberal” perspectives are few and far between.)

“From the three you then use one
To make ten ones…
(And you know why four plus minus one
Plus ten is fourteen minus one?
‘Cause addition is commutative, right.)
And so you have thirteen tens,
And you take away seven,
And that leaves five…

Well, six actually.
But the idea is the important thing.

Now go back to the hundreds place,
And you’re left with two.
And you take away one from two,
And that leaves…?

Everybody get one?
Not bad for the first day!

Hooray for new math,
New-hoo-hoo-math,
It won’t do you a bit of good to review math.
It’s so simple,
So very simple,
That only a child can do it!”

While your primary point is spot on, you miss the statistical story a bit. This is an example of “spurious regression,” which is not exactly misspecification and is much worse.

Misspecification happens when an important variable is omitted, or a nonlinear relationship is modeled as linear, and can have an assortment of pathologies associated with it, but in general, regression is not a good way to sort out causality anyway, and the fact that your independent variable is strongly correlated with your dependent variable is about all that you can learn with this tool.

The problem here is that the variables are time series, serially correlated and trending. If you regress one trending variable on another or others, you automatically explain nearly all the variance in the dependent variable. It doesn’t even matter if they are trending the same way (you just get a negative correlation). R-squared values of .98 are pretty typical of regressions of nonstationary time series.

While whole books and graduate courses are devoted to figuring out how to deal with this problem, a quick and dirty fix is to detrend the data and then look for correlation. So instead of regressing the actual incidence, you could take the change in incidence as your y variable and regress it on the change in x. Now the constant term will capture the trend, and only variation from trend will be modeled. Without even doing the math you can see that the deviations are inversely correlated: the positive deviations from trend for breast cancer occur when the growth in abortion ticks down, and vice versa.
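
For anyone who wants to see the quick and dirty version, here is a sketch on simulated data (not the paper's series). It assumes two independent random walks with drift, hypothetical stand-ins for any pair of unrelated trending variables, and compares a regression in levels with a regression in first differences.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for OLS of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope, sxy ** 2 / (sxx * syy)

def random_walk(rng, n, drift):
    """Random walk with drift: each step adds drift plus Gaussian noise."""
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0, 0.5)
        path.append(level)
    return path

def diff(series):
    """First differences: the change from one period to the next."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(42)
x = random_walk(rng, 200, drift=1.0)  # unrelated trending series
y = random_walk(rng, 200, drift=0.5)  # generated independently of x

_, _, r2_levels = simple_ols(x, y)
intercept_d, slope_d, r2_diffs = simple_ols(diff(x), diff(y))

print(f"R-squared in levels:      {r2_levels:.3f}")
print(f"R-squared in differences: {r2_diffs:.3f}")
print(f"constant in differences:  {intercept_d:.2f}")
```

In levels the two unrelated series look almost perfectly related. In differences the constant picks up y's drift and the R-squared collapses, because there never was a relationship beyond the shared trend.
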

True, and part of the problem with addressing this paper is that the sheer amount of statistical malpractice is huge. Ultimately, the ecological inference problem kills it before you even look at the numbers.