Wednesday, December 16, 2009

People are natural cargo cultists - they (I'm saying "they" as I'm secretly a cat - just don't tell anyone) have a natural tendency to focus too much on superficial characteristics - if something is good, everything vaguely associated with it must be good too! And if something is bad, everything vaguely associated with it must be bad. So once people got it into their heads that the Soviet Union and its "socialism" were "bad", everything "socialist" must be "bad" too - like single-payer healthcare, financial system regulations, and minimum wages. Never mind that even the basic premise is wrong: the performance of Communist economies was fairly decent, and as research shows, Soviet growth performance was actually slightly above the global average in the period 1960-89, at 2.4 percent per year. Eastern Europe's growth was somewhat lower than that of the USA, Western Europe, or Japan, but it beat by huge margins growth in the capitalist countries of Latin America and South Asia, or even the UK for that matter. Not to mention China vs India, the spectacularly good performance of "socialist" Scandinavia, or how pre-Reagan "socialist" USA was much better off economically than after the Reaganite revolution.

I remember when I first started writing on economics, I was scolded for formatting my papers in a two-column, single-spaced format. While that format was common in computer science, to be taken seriously in economics a paper must be formatted as single-column, double-spaced.

But all academics eventually learn such conventions. A much greater problem than these time-wasters is standards which are actively harmful, like...

Statistical significance

Probably the single biggest problem of science is the appalling quality of statistical analysis. As a funny anecdote, I remember back at the University when I had to go through a bunch of database optimization papers - which implemented various performance hacks, and measured how well they did - and more often than not they did it on completely bogus data generated just for the occasion. If you spend even five seconds thinking about it, it's obviously wrong - all non-trivial optimizations depend highly on the characteristics of data access patterns, and the differences in results span many orders of magnitude. But for one reason or another, using realistic data never got to be part of "good" database optimization research.

For one historical reason or another, almost all of science got affected by an obsession with "statistical significance". Typical research goes as follows:

Gather data

Throw data at some statistical package

Keep tweaking the "hypothesis" until you get a "statistically significant" answer, then send it off for publication
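That last step is easy to demonstrate. Here's a quick sketch (the 20-hypothesis count, sample sizes, and z-test are my own illustration, not from any real study): test 20 unrelated "hypotheses" on pure noise, and some of them will come out "statistically significant" in roughly two-thirds of experiments, since 1 - 0.95^20 ≈ 0.64.

```python
import random
from statistics import NormalDist, mean

random.seed(0)
nd = NormalDist()

def z_test_p(a, b):
    """Two-sided z-test p-value for equal means (unit variance known)."""
    z = (mean(a) - mean(b)) / (2 / len(a)) ** 0.5
    return 2 * (1 - nd.cdf(abs(z)))

n, hypotheses, runs = 50, 20, 1000
false_positive_runs = 0
for _ in range(runs):
    # Pure noise: there is no real effect anywhere in this data.
    pvals = []
    for _ in range(hypotheses):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        pvals.append(z_test_p(a, b))
    # "Keep tweaking until something works": publish if any p < 0.05.
    if min(pvals) < 0.05:
        false_positive_runs += 1

frac = false_positive_runs / runs
print(frac)  # theory predicts about 1 - 0.95**20, i.e. ~0.64
```

So a researcher who keeps tweaking gets a publishable "result" from nothing most of the time.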

Still, that's nothing compared with the uselessness of something being "not statistically significant". Even otherwise smart people routinely misinterpret it as "with high probability not true". If so, let me start a study in which I will take the wallets of a randomly selected half of 10 such people, and we'll measure if there's any statistically significant reduction of wealth from me taking their wallets. Double-blinded and all! And as science will prove no statistically significant reduction relative to the control group, which court would dare to question science and convict me for doing so?
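For fun, here's a rough simulation of that wallet study. All the numbers - a log-normal wealth distribution, $200 wallets, 5 people per group - are made up for illustration. The wealth reduction is 100% real, yet with samples this small a t-test almost never calls it significant.

```python
import random
from statistics import mean, stdev

random.seed(1)

def welch_t(a, b):
    """Welch's t statistic for the difference in group means."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

runs, significant = 2000, 0
for _ in range(runs):
    # Ten people with log-normally distributed cash; I rob a random half
    # of $200 each. The effect is completely real by construction.
    wealth = [random.lognormvariate(6, 1) for _ in range(10)]
    random.shuffle(wealth)
    control, robbed = wealth[:5], [max(w - 200, 0) for w in wealth[5:]]
    # |t| > 2.31 is roughly the 5% two-sided cutoff at ~8 degrees of freedom.
    if abs(welch_t(control, robbed)) > 2.31:
        significant += 1

frac = significant / runs
print(frac)  # a small fraction: "science finds no significant wealth reduction"
```

The noise in individual wealth simply swamps the $200 effect at this sample size - which is exactly the point of the joke.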

While false positives - wrongly rejecting a true null hypothesis - can come from either bad luck or a bad study, false negatives - failing to reject a false null hypothesis - can come from either of those, or from insufficient sample size relative to effect strength. How strong would the effect need to be?

A randomised controlled trial of anti-smoking advice in 1445 male smokers, aged 40-59, at high risk of cardiorespiratory disease. After one year reported cigarette consumption in the intervention group (714 men) was one-quarter that of the “normal care” group (731 men); over 10 years the net reported reduction averaged 53%. The intervention group experienced less nasal obstruction, cough, dyspnoea, and loss of ventilatory function.

During the next 20 years there were 620 deaths (231 from coronary heart disease), 96 cases of lung cancer, and 159 other cancers. Comparing the intervention with the normal care group, total mortality was 7% lower, fatal coronary heart disease was 13% lower, and lung cancer (deaths+registrations) was 11% lower.

But what were the chances of getting statistically significant results? For clarity let's skip all statistical complications and do it the most brutal possible way, and even make both groups the same size. Control group size 720, intervention group size 720, true chance of death in the control group 45%, true chance in the intervention group 42% (a 7% reduction in mortality) - we just don't know it yet. Statistical significance level 95%. So the control group had a 95% chance of getting between 298 and 350 deaths (I'll skip the issue of one-sided and two-sided tests of significance, as the entire idea is highly offensive to Bayesians). Chance of the intervention group having fewer deaths than 298 - merely 38%. So assuming, entirely implausibly, that this 1440-person-strong 20-year study was perfectly run, there'd be a 62% chance that the results would be worthless, because 1440 people is very far from enough. Except maybe as fodder for a meta-analysis.
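A quick Monte Carlo check of that back-of-envelope calculation, using the ~298-death cutoff from above (the exact percentage wobbles a point or two depending on continuity corrections and how you pick the cutoff):

```python
import random

random.seed(2)

def deaths(n, p):
    """Simulate deaths among n people, each dying with probability p."""
    return sum(random.random() < p for _ in range(n))

n, runs = 720, 2000
# Brutal one-sided shortcut: call the trial a "success" if the intervention
# group lands below the control group's lower bound of ~298 deaths.
threshold, wins = 298, 0
for _ in range(runs):
    if deaths(n, 0.42) < threshold:
        wins += 1

power = wins / runs
print(power)  # lands in the high-0.3s: most such trials come up empty
```

In other words, even with a perfectly real 7% mortality reduction, flipping the same biased coin 720 times per group just isn't enough flips.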

By the way, the reduction is merely 7% because what was studied was "trying to convince people to stop smoking". Most people wouldn't be convinced or would relapse, and many in the control group would stop smoking on their own.

Anyway, how many people would we need for a study with 45% and 42% death-rate groups to have less than a 5% chance of both a false positive (conditional on the null hypothesis being true) and a false negative (conditional on the null hypothesis being false)? 3550 in each group, or 7100 altogether. And that was smoking - the biggest common health risk we know. How many people would we need if we studied normal-risk people - let's say 10% death rates during the study period - and levels of relative risk typical for a diet/lifestyle intervention - let's say 1%? Two times 1.2M, or 2.4M people. More than the entire population of Latvia. And that's before you consider how highly non-random dropouts from such studies are, and how they will swamp out any results we would get. And any double-blindness here?
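Sample sizes like these come out of a standard normal-approximation formula for comparing two proportions (several slightly different variants of it exist; they all land in the same ballpark):

```python
from statistics import NormalDist

nd = NormalDist()

def n_per_group(p1, p2, alpha=0.05, power=0.95):
    """Normal-approximation sample size per group for detecting a
    difference between death rates p1 and p2 at the given two-sided
    significance level and power."""
    z_a = nd.inv_cdf(1 - alpha / 2)   # quantile for significance
    z_b = nd.inv_cdf(power)           # quantile for power
    s1 = (p1 * (1 - p1)) ** 0.5       # per-person standard deviations
    s2 = (p2 * (1 - p2)) ** 0.5
    return ((z_a * s1 + z_b * s2) / abs(p1 - p2)) ** 2

print(round(n_per_group(0.45, 0.42)))    # ~3550 per group for the smoking trial
print(round(n_per_group(0.10, 0.099)))   # ~1.2M per group for a 1% relative risk
```

Notice the denominator: halve the effect size and the required sample quadruples, which is why a 1% relative risk at a 10% base rate blows up past a million people per group.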

In all likelihood, we will never get any data about the effects of this kind of intervention. Science will have no idea if fish oil cures cancer, if vegetables lower the risk of obesity, or if organic food is better for you than non-organic.

Creative Commons

Unless otherwise expressly stated, all original material of whatever nature created by Tomasz Węgrzanowski and included in this blog, is licensed under a Creative Commons License. It is also licensed under GFDL (for Wikipedia compatibility).