Or, the thoughts of several frustrated intellectuals on Sociology, Gaming, Science, Politics, Science Fiction, Religion, and whatever the hell else strikes their fancy. There is absolutely no reason why you should read this blog. None. Seriously. Go hit your back button. It's up in the upper left-hand corner of your browser... it says "Back."
Don't say we didn't warn you.

The content of this blog is composed almost entirely of opinions... and not particularly intelligent ones in Drek's case. The opinions contained herein are not those of the blog authors' employers. Total Drek is not responsible for the content of comments.

Friday, June 22, 2007

The Death of Significance?

At Decision Science News (another h/t to Brad DeLong), Dan Goldstein prints a comment from J. Scott Armstrong who has "concluded that tests of statistical significance should never be used." [Emphasis mine.] He is not conducting statistical performance art, and I substantially agree with the conclusion. A couple random remarks:

There is evidence suggesting that social science researchers tend to tweak their statistical models to cross significance thresholds so they can produce positive results with (presumably) greater probability of publication. But,

To do so invalidates the published inferences. Because,

The "classical" statistics reported by most software packages are invalid under any pretesting (i.e., deciding on a model specification based on results from preliminary estimation). And,

The prospects for computing or simulating correct statistics are only as good as the researcher's record of the specification choices made along the way (the "choice trail"). But,

A lot of social science "theories" don't determine the full set of explanatory variables, which makes searching over specifications with statistical model diagnostics attractive. Though,

There are families of models (e.g., the 'flexible functional form' cost models in economics, which I work with) where individual coefficients have no theoretical interpretation, in which case the researcher has no direct basis for evaluating the consequences of a restriction. More broadly,

Properties of social science data often mean we need to use consistent but inefficient estimators; sometimes "better" significance from inappropriate estimation methods has little meaning. (*) Last,

For those of you with institutional access, links to the International Journal of Forecasting article are at the Decision Science News link.

(*) This sometimes leads to wacky advice being given to everyday applied researchers by econo- or sociometricians, of the "if a result from an inconsistent estimator goes away with a consistent (but inefficient) procedure, be suspicious [or vice-versa]" variety. Armstrong's bottom-line recommendations address the reasonable suspicions that might arise.
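The pretesting problem in the remarks above can be illustrated with a quick simulation — a sketch, not anyone's actual study. Suppose a researcher whose theory doesn't pin down the regressors tries ten candidate explanatory variables and reports whichever looks most "significant." Even when every effect is pure noise, the nominal 5% test rejects far more often than 5% of the time:

```python
import numpy as np

rng = np.random.default_rng(0)

def t_stat(y, x):
    """OLS t-statistic for the slope in a simple regression of y on x."""
    n = len(y)
    xc, yc = x - x.mean(), y - y.mean()
    beta = (xc @ yc) / (xc @ xc)
    resid = yc - beta * xc
    se = np.sqrt(resid @ resid / ((n - 2) * (xc @ xc)))
    return beta / se

n_obs, n_candidates, n_sims = 100, 10, 5000
honest = searched = 0
for _ in range(n_sims):
    y = rng.normal(size=n_obs)                  # outcome is pure noise...
    X = rng.normal(size=(n_obs, n_candidates))  # ...so every "effect" is spurious
    ts = [abs(t_stat(y, X[:, j])) for j in range(n_candidates)]
    honest += ts[0] > 1.96      # pre-committed: test only the first regressor
    searched += max(ts) > 1.96  # specification search: report the best of ten

print(f"pre-committed rejection rate:    {honest / n_sims:.3f}")
print(f"specification-search 'hit' rate: {searched / n_sims:.3f}")
```

The pre-committed test rejects at roughly the nominal 5% rate; the searched version rejects at roughly 1 - 0.95^10, around 40% — and the "classical" standard errors the software prints for the winning specification know nothing about the other nine tries.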

6 Comments:

I quite agree with Armstrong that we should also be publishing our null findings. If we're truly interested in getting a better grasp on the social world, we are limiting ourselves by only looking at the significant results.

This discussion is tangentially related to something else I've struggled with in sociology. We sociologists (probably based partly on the liberal political views many of us hold) are often most interested in what is happening with disadvantaged groups (women, people of color, lower social classes, etc.) and often forget about their advantaged counterparts (men, whites, the upper class). However, if we only focus on the legitimate detriments that the disprivileged face, we run the risk of forgetting that inequality is a system, and that there are likely psychological effects of privilege (if nothing else).

It's not all that "sexy" in the publishing world to report that white people, for example, have lower self-esteem than African Americans. I have yet to come across a researcher who asks, "Why do whites have lower self-esteem than African Americans?" Instead, the question is always posed as, "Why do African Americans have such high self-esteem?" However, I would argue that it is equally important to look at both questions. Just as social scientists limit ourselves by only looking at one side of what our stats are telling us, so do we limit ourselves with the kinds of questions we ask in the first place.

I agree on null findings. But to play devil's advocate for a moment, I suppose a perspective a journal editor could take is that even a lot of the positive results pertain to "small" research projects. So a class of null finding would fall under the heading of "thanks for telling me I don't need to worry about something I wasn't going to worry about."

What's of much greater concern is that properly arrived-at null findings in "big" areas not be censored. And in that regard, the point about not limiting the questions being asked is very well taken. (Econ has a similar problem in the study of inequality for different reasons. If you start probing why some upper-class parents would, say, hire naming consultants for their kids, an array of interventions in the "market" that even many left-of-center economists are afraid of start looking like good ideas.)

Regarding looking at the whole system versus parts, there also can be purely statistical reasons why that may be a good idea -- data availability for different groups, exploiting cross-correlations, etc.
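The worry behind publishing null findings — that a literature filtered by significance overstates effects — can also be shown with a toy simulation (the numbers here are invented for illustration). If many studies estimate the same small true effect but only the significant estimates get published, the published record is badly inflated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: many studies of the same small true effect,
# each reporting a noisy estimate, with journals printing only
# those estimates that clear |t| > 1.96.
true_effect, se, n_studies = 0.1, 0.5, 20000
estimates = rng.normal(true_effect, se, size=n_studies)
published = estimates[np.abs(estimates / se) > 1.96]

print(f"true effect:                 {true_effect:.2f}")
print(f"mean of all estimates:       {estimates.mean():.2f}")   # unbiased
print(f"mean of published estimates: {published.mean():.2f}")   # inflated
print(f"share of studies published:  {len(published) / n_studies:.2%}")
```

Only a few percent of studies clear the bar, and the average published estimate lands several times larger than the true effect — which is exactly why the "big" null findings matter for anyone trying to read the literature as a whole.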

"researchers tend to tweak their statistical models to cross significance thresholds so they can produce positive results with (presumably) greater probability of publication"

This makes me very skeptical when reading any sort of findings in publications... One of my biggest concerns would be with pharmaceutical companies and their clinical trial reports on some new drug. Having them report only the drugs/meds that yield significant results (after repeated testing, it's known they'll show positive) can be dangerous... which brings me to ask: who is doing the research? And what about the moral/ethical issues that should be attached?

There's good reason to be concerned over results of pharmaceutical trials, given that they're often funded by firms with strong financial stakes in the outcome. Medical journals are, increasingly, requiring disclosure of academic researchers' relationships with pharmaceutical firms. Having a strong, independent, and technically capable regulator helps, but there are cases to be made for more public funding of drug trials. (The analytical methods Armstrong recommends would help identify fragile results, too.)

Among many shocking things in this account from a NYT business reporter detained by factory management while attempting to report on the Thomas the Tank Engine recall from China is this:

Many experts have told me that one of the most serious problems in China is that the government lacks the power to control the nation’s Wild West entrepreneurs, deal makers and connected factory owners.

Bribery is rampant, and government corruption widespread. Just a few weeks ago, the top food and drug regulator was sentenced to death for taking huge bribes from pharmaceutical companies. But it’s not clear that strong messages like that will stop the anarchy.

"serious problems in China is that the government lacks the power to control the nation's Wild West entrepreneurs..." Perhaps we should be producing more of our own goods and stop all the "off-shore" activity. Just the other day (Toronto CAN), toothpaste made in China was pulled from shelves, all of which came from dollar discount stores. Sad, isn't it, that those living on marginal means would probably have bought their toiletries in the discount stores... not to say the poor are the only ones buying in these stores, but probably a good percentage of them are.

With all the Chinese cheap-product failure news (the latest: defective tires implicated in at least a couple of fatal accidents), I think there's a 21st century version of The Jungle to be written. What we've seen instead has tended towards Pietra Rivoli's Travels of a T-Shirt in the Global Economy, or even rosier accounts, which at best are non-alarming versions of the outsourced-manufacturing story for Western consumption.

Dani Rodrik of Harvard's Kennedy School of Government (his blog, focusing on international trade issues, is here) has a useful suggestion that trade agreements explicitly take place in "policy space" -- so developed countries can more effectively export labor and/or consumer product safety standards, offering faster growth to developing countries as a quid pro quo. The idea is not to throw the 'baby' of gains from trade out with the undesirable side-effects. The question is whether such agreements would be sustainable without institutions in China to effectively stem endemic make-a-quick-buck corruption.

For that matter, what progressive fair-traders might think of as undesirable side-effects might be the intended effects for at least some outsourcers -- i.e., to hell with pesky product-safety regs. If that's the case, then maybe we would be better off with, if not autarky, then limiting freer trade to a set of countries with compatible labor and consumer product safety policies.