
The Winning Neuropublishing Jokes and a Statistics Lesson

Okay, I know I'm a day late to announce the winners of the Neuropublishing Jokes Contest. Unlike other bloggers with convenient excuses for delayed posts (cough, cough), I have no excuse except, well, I'll explain later. So sorry about the delay, and without further ado...

This one got the most votes from random people I pulled to my blog to help me judge. In an unexpected twist, though, this joke is actually not eligible for a prize because Anonymous chose his/her other two jokes as the official entry. But this just received so many compliments that I wanted to award the prize anyways, if just for bragging rights.

Q: How many brain scientists does it take to write a bestseller? A: None. They taught the lab rat to do it.

Again from Anonymous:

Q: How many brain scientists does it take to write a best-seller? A: Thousands! Of course, after you correct for multiple comparisons, only a handful are doing any real work.

Congratulations! Both runners up requested The Graveyard Book as their prize. So Liana, I'll be contacting you about your mailing address, and Anonymous, I have a good guess about who you are, but please contact me as well.

Okay, and here's the reason I've been procrastinating on the results. I guess, *sigh*, I'm going to have to explain Anonymous's second joke. I know I'm going to explain it slightly wrong, and some statistician will come out and tell me I'm dumb, and it'll be embarrassing for all involved (where by "all involved" I mean me). But I'll give it a try...

*rolls up sleeves*

In an ideal world, we wouldn't have to do statistics on experimental data. If we were doing an experiment on whether morning or evening testing results in better scores, one ideal data set would have all the morning scores better than all the evening scores:

Morning: 99, 97, 92, 95, 98
Evening: 85, 90, 82, 70, 88

However, that's never true in the real world. In reality, our data is noisy because of factors like individual variation, testing conditions, phase of the moon, etc. Therefore, rather than a clean difference between the datasets, we usually end up with two overlapping datasets:

Morning: 99, 97, 92, 82, 55
Evening: 85, 92, 70, 95, 88

So see how the Morning tests are mostly better, but there's a lot of overlap? With datasets like this, there are two possible interpretations.

1. Morning testing is better on average than Evening testing (i.e., the experimental conditions are Actually different)
or
2. The two testing conditions are the same, and the difference you got is just a fluke of the specific samples you took (i.e., the experimental conditions are Actually the same, aka the Null hypothesis)

To get an answer, we perform a statistical test that calculates the probability of getting a data set like ours (or one even more lopsided) if the conditions are Actually the Same. This is called the p value. In other words, if the p value is less than 5%, then in a world where the conditions are Actually the Same, there was less than a 5% chance of getting data like ours.

It's standard in the sciences now that if the p value is less than 5%, we conclude that our experimental conditions are probably different.
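If you want to see this in action, here's a toy sketch of one common way to estimate a p value -- a permutation test -- run on the example scores above. (This is just an illustration; real studies typically use tests like the t-test, and the shuffle count here is arbitrary.)

```python
import random

morning = [99, 97, 92, 82, 55]  # the noisy example scores from above
evening = [85, 92, 70, 95, 88]

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Estimate the chance of a difference in means at least as big
    as the observed one, if the conditions are Actually the Same."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels mean nothing
        fake_a, fake_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(fake_a) / len(fake_a) - sum(fake_b) / len(fake_b))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

p = permutation_p_value(morning, evening)
print(f"estimated p value: {p:.2f}")
```

(With these particular noisy scores the two group means happen to come out nearly equal, so the p value is large and we can't reject the null; run it on the "ideal" scores instead and it drops well below 5%.)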

With me so far?

Okay, so the whole p value and statistics thing works fine if you just do one experiment with one statistical test at a time. However, when you're analyzing brain imaging data, you're interested in a whole bunch of different areas. Usually, we divide the brain into tiny cubes a few millimeters wide (called voxels), and perform a statistical test on every single one. Now we have a problem, because even if every single one of the cubes is Actually the same for the two experimental conditions, 5% of them are going to pass our test, just because of random chance. Say we're testing 100,000 voxels -- that's 5,000 voxels that will light up in our brain image due to random chance!
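You can watch this happen in a quick simulation (the 100,000 voxels come from the example above; everything else is made up). Every fake voxel here is Actually the Same in both conditions, yet about 5% pass the test anyway:

```python
import random

random.seed(42)
n_voxels = 100_000
alpha = 0.05  # the usual 5% threshold

# In a null experiment -- no real difference anywhere -- each
# voxel's p value is uniformly distributed between 0 and 1.
p_values = [random.random() for _ in range(n_voxels)]

false_positives = sum(p < alpha for p in p_values)
print(f"{false_positives} of {n_voxels} voxels 'light up' by pure chance")
```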

Therefore, for neuroimaging, we have to do a more stringent statistical test, and this is called Correcting for Multiple Comparisons (cuz, we're testing multiple cubes, see?). So if you're doing an experiment, you might get activations in a whole bunch of voxels, but once you correct for multiple comparisons, only a handful are actually activated.
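Here's a sketch of the simplest such correction, the Bonferroni correction, which just divides the 5% threshold by the number of tests. (Real neuroimaging packages use fancier methods, like cluster-based or false-discovery-rate corrections; this only shows the idea.)

```python
import random

random.seed(0)
n_tests = 100_000
alpha = 0.05

# Bonferroni: divide the threshold by the number of tests, so the
# chance of even ONE false positive across ALL of them stays ~5%.
corrected_alpha = alpha / n_tests

# Null experiment again: nothing real is happening in any voxel.
p_values = [random.random() for _ in range(n_tests)]
uncorrected = sum(p < alpha for p in p_values)
corrected = sum(p < corrected_alpha for p in p_values)
print(f"false positives before correction: {uncorrected}")
print(f"false positives after correction:  {corrected}")
```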

Get it? Funny huh?

Um, get it?

Eh, well, it's really funny to neuroscientists. Just take my word for it.

Thanks to all the good folks who entered the contest. Do go over to the contest and check out all the entries. It was fun :-)

I love how your scientific brain works. My husband is a technical genius and I don't understand a lot of his jokes, but I think it's cute that he thinks they're funny. I did like your first joke though. Thanks for your comment on POV. Also thank you for becoming a follower. I already like your blog a lot.

I feel compelled to write my own short explanation of multiple comparisons for a lay audience, because I think I'm going to need it again some day...

Imagine you have a quarter and you want to know if it always comes up heads. You flip it 5 times, and it comes up heads every time.

Because you're an expert in stats, you know that that will only happen 1 out of every 32 times with a normal quarter. In other words, the probability of getting that result with a *normal* quarter is around 3%. In other words, as Livia pointed out very well, we're going to say that we think this is a trick quarter, but we acknowledge there's a 3% chance that we just got a strange set of coin flips.
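(If you want to check that arithmetic yourself, it's a one-liner:)

```python
# Five independent fair flips, all heads: (1/2) multiplied by itself 5 times
p_all_heads = 0.5 ** 5
print(p_all_heads)  # 0.03125 -- that is, 1 in 32, about 3%
```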

In scientific terms, the "null hypothesis" is that the quarter is normal. We tentatively "reject the null hypothesis", because there's only a 3% chance a normal quarter would give us this result. This is a key point about science -- EVERYTHING is tentative. We're never, ever sure about anything. We can never directly prove our hypotheses are correct, we can *only* disprove other hypotheses. And we're always doing this while acknowledging there's a certain chance that we're wrong. Hopefully, that chance is vanishingly small, but not always...

Now, on with the story. Say you go to the bank teller and tell them to open up the vault, because you heard a rumor they might have some counterfeit coins in there. You insist that they flip each of their 20,000 coins five times each, and if any of them come up with heads all five times, you're going to call the cops.

See the problem with this? While there's only a 1/32 chance that any one quarter will come up all-heads, when you do this 20,000 times, you expect several hundred quarters to have all-head results, through pure chance alone.

You need to be much more careful with your threshold for a counterfeit coin because you're testing so many, and so you "correct for multiple comparisons". The simplest way of doing this is just to change your mind about what makes you *sure* a coin is counterfeit. If 5 straight heads were enough to make you suspect a single coin, you'd require, say, 18 straight heads before you're satisfied that the bank really has a bad quarter.
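To put numbers on that (using the 20,000 coins and the 18-throw requirement from the story above), a quick back-of-the-envelope calculation:

```python
n_coins = 20_000

# At 5 flips each, every innocent coin has a 1-in-32 chance of
# looking counterfeit -- so expect hundreds of false alarms.
expected_5 = n_coins * 0.5 ** 5  # 20,000 / 32 = 625 coins

# Demanding 18 heads in a row makes even one false alarm unlikely
# across all 20,000 coins.
expected_18 = n_coins * 0.5 ** 18

print(f"expected false alarms at 5 flips:  {expected_5}")
print(f"expected false alarms at 18 flips: {expected_18:.3f}")
```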

Livia's explanation was great, but if you didn't get it the first time around, maybe that helped?

I don't miss stats (particularly since, when I was at uni, we ran a lot of it on old Apple IIs).

Did you ever hear the Monty Hall problem (aka the probability thingy with the goats I'd much rather win)? Joe and I used to argue over it a lot, until he changed the prize distribution (once goats became the big prize, I was totally on board...)

FTC Disclaimer and Copyright

Usually, I write about books that I own or that I borrowed from the library. If I received a book as a free review copy, I will say so in the post. Regardless of how I got the book, I do not do traditional book reviews. Rather, I focus on one or two writing techniques used by the author. All links to Amazon and Smashwords are affiliate links.

All content is copyrighted. Feel free to quote excerpts from this blog, provided that you link back to the original article. Please ask for permission if you wish to quote more than 60% of an article.