Scientific Research 101: Results!

March 24, 2010

So you’ve just completed your last assays on physioprofitin signaling in the Namnezian complex. Lo and behold it is qaz-mediated, just like you suspected and the beccans are off the freaking chart. woot! PiT/PlS ratios are within relevant physiological ranges and still this work of art, your labor of love, came through with the experimental goods.
With a hope and a prayer you run your stats….and YES! p < 0.01!!!!!!!!!!!!!!!
What is the correct way to report your big Result?

ooooooh, this is a pet peeve of mine. DM, you and everyone else who picked anything other than ‘other’ need to go back and take stats 101. Review in particular the theory behind p values and the null hypothesis.
I chose ‘other’, and suggested “…is consistent with…”

Well, there are two issues in play: 1) the need to understand what statistics really tell us (which is what Joe is talking about); and 2) the need to communicate science in a concise and pragmatic style that isn’t going to frustrate the reader, who just wants to know what the fuck happened and what the authors think it means.
To me, I assume that #1 is understood as a professional disclaimer to all published work (except when the applied statistical analyses are clearly inappropriate). If I think their appeal to stats is bullshit, then I can simply draw a different conclusion from theirs. But to be honest, papers that are littered with “mights”, “possiblies”, “ifs”, “buts” and “consistent withs” relating only to the use of statistics, rather than for more pertinent reasons regarding experimental design, equipment limitations &c, tend to make for tough reading. It also isn’t always clear what the writers are being so cautious and indecisive about (are they holding back because they know their experiments were a bit dodgy? Because a competitor is about to bring out a contradictory paper?)
That said, the trend for differentiating between “ultra-mega-eleventy significance!” vs “mildly significant” is a more tangible issue, imho, because it involves an active and explicitly inappropriate interpretation of statistics rather than a passive and stylistically pragmatic one, if that makes any sense.

Hmmmmm… Interesting, DSKS. My reaction is the opposite. When authors imply that a p value less than 0.05 represents either proof or causality, then I assume the authors are either statistically ignorant or hoping salesmanship will substitute for better experiments (or both: ignorant liars). However, when the wording is cautious and consistent with what the measures actually demonstrate, I tend to trust the authors more.
I guess it depends on how you read papers. If you read papers wanting to be told the take home message in the simplest possible terms, then ‘This does that’ is great. But if you read papers to find out what the authors actually observed, then the literary obfuscation is annoying.
In the end I admit I can’t come down firmly on one side or the other, though. Because when I’m on the toilet reading about some asteroid in Science, I want simply to be told whether it’s bigger than Manhattan or not and whether it’s iron or ice or dust or whatever and whether it’s going to crash into the Earth any time soon. But I suppose actual astrophysicists want a bit more detail and less hype.
Which is probably why the glamourmagz use glamourlanguage, and the niche journals are full of ‘consistent with’ and ‘correlated with’. One is toilet reading, the other is archival.

I intensely hate most of these choices. “the statistical analysis is consistent with…” is the most accurate.
The most egregious is “proved”, which is flat out icky.
I don’t care for most of the other choices, though I’ve seen them all and they may be awkward or inaccurate but they don’t OMG Totally ruin the paperz! IMO.
As a grammatical aside, why are “confirmed”, “demonstrated” and “established” encumbered with “an effect of”? They seem to read equally well without the extra words.
I’m not sure the root cause of most of these things is really a poor understanding of stats. For example, I think that the phrase “The statistical analysis (p […] whose conclusion that…? Not MINE!!!!”.
“BTW, that’s physioprofitin(TM) and you owe me royalties.”
What are you, Craig Venter? Going around trying to claim intellectual property on perfectly innocent biological molecules! Well, ok, chances are physioprofitin isn’t innocent (I’m guessing its upregulation in the Namnezian complex leads to Tourette’s or something), but you know what I mean.
Also, if the beccans are off the freakin chart, one would think that should be noted in the results.

Well I guess the polling explains it.
I’m looking at a basket of FAIL on this one. Joe is on the right track and is *almost* there. Kudos. DSKS has, I think, hands on a key that Joe is missing…
Dr Becca….oh boy. Like a moth to the brightest flame. In a word, no.

I’m with the “if you need stats to prove it it ain’t biologically meaningful…” but I’m a cranky old asshole. And I have never done a t-test for any of my results, ever. For a lot of cellular physiology stuff, the sample sizes are not so large as to preclude just showing all of them.
Besides, I’d much rather have the support of further experiments to test the point.
But with Joe’s point, I was taught to always indicate to the reader where we as authors came down strongly on an interpretation, and where we were kinda throwing out a plausible explanation. In the first case we stuck with simple declarative statements, whereas in the second we used some qualifier (perhaps, possibly, etc.)

demonstrated, for the first [muppethugging] time in [muppethugging] history,
is not inconsistent with
would prefer not to word the sentence that way in the 1st place.
provided good evidence for
did not rule out

@DM in response to #13:
Well, it sure the hell isn’t any ‘statistical analysis’ directed solely at computing a p value. To say otherwise is like flipping a coin 3 times, never getting tails, and therefore concluding that the coin has only one side, or perhaps a head on both sides (assuming someone has previously published a paper containing the sentence: “Previous studies failed to find evidence for more than one side on a coin. Here, using three-dimensional reconstruction coupled with platonic solid analysis, we present evidence that coins resemble extremely squat cylinders and therefore have in fact two major sides.”)
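To put a number on the coin example, here is a minimal sketch, assuming a fair coin and independent flips. The function name `prob_all_heads` and the 0.05 cutoff are illustrative choices, not anything from the comment above.

```python
from fractions import Fraction

def prob_all_heads(n_flips, p_heads=Fraction(1, 2)):
    """Probability that n_flips independent flips all come up heads."""
    return p_heads ** n_flips

# Three heads in a row from a fair coin is not a rare event.
p_three = prob_all_heads(3)

# Smallest run of heads whose probability under a fair coin drops below 0.05
# (the 0.05 cutoff is an assumed convention for this sketch).
flips_needed = 1
while prob_all_heads(flips_needed) >= Fraction(5, 100):
    flips_needed += 1

print(f"P(3 heads | fair coin) = {p_three}")
print(f"Flips needed for P < 0.05: {flips_needed}")
```

Even when the run is long enough to look unlikely under a fair coin, the calculation only says how surprising the flips are if the coin is fair. It says nothing about how many sides the coin has.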

Heart attacks are pretty fucking biologically meaningful, particularly when you are the biological entity in question.
And you still need stats to study biologically meaningful phenomena in the real world. There is only one thing that has the right to claim that it can “show”, “reveal”, “establish”, etc an effect.
DM. Maybe it’s too early in the morning but I’m torn between mathematical proof and the Scottish Presbyterian Church as an answer. But the sentence is already FUBAR because it starts with “statistical analysis” as the active player when it’s really a tool to aggregate your DATA.

Becca,
—“What are you, Craig Venter? Going around trying to claim intellectual property on perfectly innocent biological molecules!”—
C’mon now—I claimed IP on the word itself, as I coined it. Anyone who uses it owes me cash…sweet sweet cash…DM’s desperately late attempt at citation via hyperlink is futile. My lawyers are on the case already.
Of course, I should also add that if someone also named a protein ‘physioprofitin’, and it turns out to be fit for profitable commercial use, then they would also automatically hand over to me the rights to the protein and all associated IP—because I had also functionally defined physioprofitin as something that would “make heaps and heaps of cash when expressed”.
And oh….Muhuhahahahahaha…..eeeexellent….
Back to the main issue….DM @13…Proof or Evidence?

blarg.
Just noticed that my less than sign borked my comment.
That bit should have read something like:
I’m not sure the root cause of most of these things is really a poor understanding of stats. Rather, I think it’s as much a function of the intrinsic awkwardness resulting from the lack of human agents in scientific writing.
I think the phrase: “Based on the statistical analysis (p less than 0.05), we conclude that qaz-mediated…” is perfectly valid. But because scientific writing tends to avoid putting any people into the situation, we pretend that the stats do something. The stats didn’t establish qaz-mediated …, the experimenters established qaz-mediated… , using stats as one tool.

“I had also functionally defined physioprofitin as something that would “make heaps and heaps of cash when expressed”. “
like I said… venterlike.
😛
On an only vaguely related note, does anyone know if the Sonic Hedgehog protein namers got sued?

it’s really a tool to aggregate your DATA.
DM @13…Proof or Evidence?
we pretend that the stats do something. The stats didn’t establish qaz-mediated
Exactly. It is only the experimental result (aka, the data) that can ‘reveal’ an effect.
The notion that statistical analysis ‘reveals’ something sends me around the bend in a foaming froth.
And becca’s excuse for this practice, “But because scientific writing tends to avoid putting any people into the situation”, is wrong. The answer lies in ….the actual answer.
All the statistical analysis does is confirm the likely reliability of the experimental result. That, or similar, is the way it should be phrased. There is no reason to insert a fake statistical person into the description!
Now, anyone want to take a shot at why this is a problem for the way scientists operate (i.e., think about their experimental results), not merely a semantic preference?

i can’t stand the thought of starting a sentence with “the statistical analysis…” – it sounds to me like an appeal to a false authority. (which is an arguable point.) say what the data tell us, cite the stats to show that you are not completely full of shit.
Qaz is associated/correlated/whatever with upregulation of physioprofitin in the Namnezian complex (f values, p

It is only the experimental result (aka, the data) that can ‘reveal’ an effect.

The ‘data’ don’t do that any more than a p value does. An ‘effect’ is a conclusion, an inference, and it is ‘revealed’ only by some mysterious neurobiological mechanism in the author’s mind.

All the statistical analysis does is confirm the likely reliability of the experimental result.

‘likely reliability’? No. It says nothing about reliability. It just says the probability of getting the same numerical result from a random sampling of the same set of possible numbers (population), assuming the set of numbers actually fits the characteristics assumed by the particular statistical procedure that is being employed. If every researcher who misused a t-test was forced to eat a statistics book, we would have a surfeit of statistics books in the world and a lot of researchers with paper-bloated bellies. And if every researcher who published a paper misusing t-tests was shot in the head, the NIH grant application success rate would shoot to about 90%, because the number of researchers in the world would be cut dramatically.
In any case, as mentioned above, statistics can’t do anything. In particular, it can’t conclude anything or infer anything, or even suggest anything. Statistics is only a tool for summarizing data, and statistical tests are merely a set of calculations for determining the overlap between hypothetical sets of numbers. People conclude things, make inferences, and suggestions based on the observations they make, often using statistics as a tool for clarifying and summarizing those observations.
There’s no problem with statistics, DM, or the use of them. The problem is that scientists confuse their observations with their inferences.

OK, I think I accidentally did some html-fu there…let me rephrase the point of cutoff and try this one more time….DM, please delete the previous two…
___________
DM@20—“Now, anyone want to take a shot at why this is a problem for the way scientists operate (i.e., think about their experimental results), not merely a semantic preference?”—
Easy peasy lemon squeezy, methinks.
Bcos many scientists tend to, when seeing a P value of <0.01, believe that their observation is now the invariant truth, maybe? They not only tend to forget that it is a likelihood of reliability, as you phrased it, (or reproducibility?) but also tend to forget that their observations were limited to the boundaries of a gajillion different experimental conditions and variables etc, and the P value only tells them the reliability under those very conditions….. But, “I see it. The P value is <0.01. This must be absolute truth. I will now base subsequent and consequent research on this world view”……
Am I close?

It just says the probability of getting the same numerical result from a random sampling of the same set of possible numbers (population), assuming the set of numbers actually fits the characteristics assumed by the particular statistical procedure that is being employed.
I’m not seeing where my “reliability” is too much of a jump from that but sure, your formulation gets at the essential point. kinda unwieldy for a results section though.
An ‘effect’ is a conclusion, an inference,
fair enough. What’s your preferred way to state the result then?
AM, I think the bigger problem is in undervaluing the data point that fails to garner the all-important p-value certification. A category of Type II error, sort of. I qualify this because it has more to do with the deployment of statistical procedures than being a feature of the approach itself. A sort of meta-design type of miss.
In behavioral pharm, as one example, we’re frequently keen on inferring a “dose-dependent” effect of a drug. Some canonical types insist that you must have statistically significant differences between two drug doses, not just a difference from vehicle in both cases, even if you’ve observed results which order in an expected line. I don’t see where anyone with a brain looks at that middle dose that differs from the vehicle, but not the numerically lined-up dose, and says “well, nope, must not be a dose dependent effect”.
My concern is that the Church of the Holy Asterisk pushes people away from really understanding that the data are the driver.
It also leads (and this is especially bad for people who mistakenly think that anything smaller than 0.0499999 is meaningfully more impressive a result) to….animal use issues.
Perhaps this should be our next exercise: Is it justified to use 15% or 25% more animals simply because you think p less than 0.01 is so much more impressive than p less than 0.05?
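As a rough sketch of the animal-use question, the textbook normal-approximation formula for comparing two group means gives a feel for what a stricter threshold costs. The 80% power, the standardized effect size of 1.0, and the function name `n_per_group` are all assumptions chosen for illustration, not values from the post.

```python
import math
from statistics import NormalDist

def n_per_group(alpha, power, effect_size_d):
    """Approximate animals per group for a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2,
    where d is the assumed standardized difference between group means.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

# Assumed inputs: 80% power and a large standardized effect (d = 1.0).
n_05 = n_per_group(alpha=0.05, power=0.80, effect_size_d=1.0)
n_01 = n_per_group(alpha=0.01, power=0.80, effect_size_d=1.0)
extra_fraction = n_01 / n_05 - 1

print(f"p < 0.05 target: {n_05} per group")
print(f"p < 0.01 target: {n_01} per group")
print(f"Extra animals: {extra_fraction:.0%}")
```

Under these assumed inputs the stricter threshold needs 24 animals per group instead of 16. Other effect sizes or power targets will give different counts, so treat this as a back-of-envelope check and not a design recommendation.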

All the statistical analysis does is confirm the likely reliability of the experimental result.

Dude, you are using “reliability” in a very idiosyncratic way. What the statistical analysis tells you is the likelihood–given more or fewer assumptions, depending on the nature of the analysis–that your conclusion of a difference among groups sampled from populations is due to chance and not due to a difference among the underlying populations. Why are you making this more complicated than it needs to be?
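One way to make "due to chance" concrete is an exact permutation test: shuffle the group labels every possible way and count how often the shuffled difference is at least as large as the observed one. This is a minimal sketch with made-up measurements. The data, the group names, and the function `exact_permutation_p` are hypothetical, and a permutation test is swapped in here as an illustration, not as the analysis anyone in the thread used.

```python
from itertools import combinations

def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        # Difference in means when the first group's values sum to sum_a.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    splits = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        splits += 1
        sum_a = sum(pooled[i] for i in idx)
        if abs(mean_diff(sum_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / splits

# Hypothetical measurements, invented purely for illustration.
control = [1.0, 2.0, 3.0, 4.0]
treated = [5.0, 6.0, 7.0, 8.0]
p_value = exact_permutation_p(control, treated)
print(f"exact permutation p = {p_value:.4f}")
```

The number that comes out is the fraction of relabellings that look at least as extreme as the real data, assuming the labels are irrelevant. It is a statement about the data under that assumption, not the probability that the conclusion is true.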

The poll confirms what’s well known – biomedical types are statistically illiterate. Which isn’t surprising considering that in most universities one can get a B.S./Ph.D./M.D. without ever taking statistics. The thing is, the p=.01 only means that, given a bunch of assumptions, the chance that the observed difference is completely random is 1%. The devil is almost always in the assumptions part (the “model”). And it does not say that there is a 99% chance that the difference is non-random.
I interpreted the sentence as a statement about statistical treatment of the observed effects and entered “applied to” in the Other.

Oh, you are askin’ for the precision of language smackdown fun games time, aren’t ya?
First let’s get the word choice issues out of the way:
*“we would have a surfeit of statistics books in the world and a lot of researchers with paper-bloated bellies. “
“surfeit”- I do not think that word means what you think it means. Antonyms more appropriate in that context: “dearth”.
*“All the statistical analysis does is confirm the likely reliability of the experimental result.”
“reliability” – I do not think that word means what you think it means. Closely related synonyms that may be more appropriate in that context: reproducibility.
Reliability: reproducibility :: accuracy: precision.
P values address only precision/replicability (and that imperfectly, for the reason others have noted- you have to assume the conditions are the same).
*Strictly speaking, from a grammar-fascist perspective, data don’t reveal anything.
Now, as far as the philosophical point, I maintain that your objection to this is on par with objecting when people say things like “poison dart frogs evolved colorful markings to ward off predators”.
Now, of course poison dart frogs did no such thing, they didn’t go about saying “hmm, too many predators here, let us go forth and order some colorful markings from the Evolution R Us catalog”. And we* all know this!
(*we being, for this purpose, all biologists worth their salt and any well-educated layman to boot)
However, I can see the argument that it does sometimes get cumbersome to say “colorful markings arose randomly in poison dart frog populations, and those froggies with colorful markings tended to get consumed less often by predators” (or, conversely “as colorful markings arose in poison dart frog populations, predators that responded by NOT EATING said froggies outcompeted those that ate pretty poison froggies”)
DM, your argument about precise statistical phrasing is on par with the view:
“If we say things like “X evolved Y in order to…”, of COURSE we will all have ERRONEOUSLY purposeful views of evolution as a force that requires intelligent design or intent, and we will all therefore believe in god or the like and science will be unmade!!!1111 DOOM! DOOOOOOOOOM!!!!!1111eleventy”
Look, there are many words/phrases/verbal habits that we all use that, if taken literally, are nonsensical. But many times they still represent, in context, something perfectly reasonable.
Saying “the statistical analysis revealed an effect of qaz-mediated…” MEANS “judging by the statistical analysis, the odds of this seemingly qaz-mediated…. arising by some chance non-qaz-mediated mechanism is less than 1/20” [for p less than 0.05, of course]. If you don’t read the former and mentally substitute the latter, YOU are being sloppy (but not the writers of the paper).
Now, if you have a psychological study showing that people reading the type of statement you object to actually leads to incorrect conclusions (or, for that matter, that purpose-sounding phrases about evolution lead to intelligent design support) I’ll A) apologize profusely and B) eat my hat. If, however, you are just ASSUMING that the phrasing you find so objectionable leads to sloppy science… well then. You should perhaps eat a linguistics text (Or read a poison dart frog)

All the statistical analysis does is confirm the likely reliability of the experimental result.

This isn’t just misleading, it’s wrong. Reliability can only be confirmed by multiple studies with different samples. A single study says nothing about reliability.
To your poll question, tell me what the sample size and estimated effect size are. Your effect size could be tiny and meaningless, but you can get p
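The sample-size point can be sketched with a two-sample z-test in which the observed standardized difference is held fixed and only the group size changes. The effect size of 0.05, the group sizes, the known-variance assumption, and the function name `two_sample_z_p` are all illustrative assumptions.

```python
import math

def two_sample_z_p(effect_size_d, n_per_group):
    """Two-sided p-value for an observed standardized mean difference.

    Assumes equal group sizes and a known common variance, so that
    z = d * sqrt(n / 2) and p = erfc(|z| / sqrt(2)).
    """
    z = effect_size_d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))

tiny_effect = 0.05  # assumed: a twentieth of a standard deviation

p_small_n = two_sample_z_p(tiny_effect, n_per_group=20)
p_large_n = two_sample_z_p(tiny_effect, n_per_group=20_000)

print(f"n = 20 per group:     p = {p_small_n:.3f}")
print(f"n = 20000 per group:  p = {p_large_n:.2e}")
```

The same tiny difference goes from unremarkable to far below any conventional threshold purely because n grew, which is why a p value reported without the effect size and sample size is hard to interpret.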

DM: I stand by my original suggestion: “…is consistent with…”
Comrade: ‘Reliability’ to me implies more than probability. But I agree that it’s open to interpretation, and accept that my interpretation might be uncommon. In my defense, though, I’d offer that ‘reliability’, given its fuzziness, might not be the perfect word choice.
becca: You’re right. I used ‘surfeit’ completely backwards. My bad. But I don’t think you mean ‘precision’ either where you suggest that. ‘Reproducibility’ I might go with, though. Maybe. At least more than ‘reliability’. I like what lylebot says about ‘reliability’ in #30.
I like what DK says in #28.
In the end, what we’re wrestling with here is the tension between simplicity in communication and accuracy in communication. The ultimate in accuracy is to show every data point and not use any statistics at all. But who wants to read The Journal of Long Tables of Results Without Commentary? So we need some summary and interpretation. Summaries will always leave out details, and interpretation is always a ‘polishing’ of reality. The question is how much simplification and polishing is reasonable. Like I said earlier, I think ultimately that depends on the audience.
Maybe ‘supplemental information’, instead of the bullshit it typically is now, should actually be raw data and/or pre-statistics tables of measurements for all the stuff that is summarized in the main figures and text. Then a paper could be read on multiple levels. If you just want the take-home message, read the abstract. If you want more detail, read the paper. And if you really need to go deep into the recordings for some reason, get the supplemental info. This plan would probably preclude a lot of the retractions too, because every figure or statement in any text would have the corresponding data and steps in the analysis somewhere in the supplemental info. People couldn’t get away with shit. And the hassle for authors would be minimal. They should have the raw data and analysis for everything in the paper already anyway, right?

i feel sorry for biologists.
you must be constantly dealing with prob/stats incompetence on a daily basis to have a vigorous debate over such a simple concept.
BTW-I voted “revealed an effect of” even though I don’t really know what that means cuz:
-i didn’t like any of the choices
-it was the last one
-who the hell ever bothers typing something into “other”

Lylebot- The beccans dude! Look at the beccans. And the PiT is *tight* I’m telling you! Heck I could just present those instead of the bunnyhopping PiT/PlS ratios and it would *still* be convincing to the naked eye. “effect size”. Sheesh. This ain’t fMRI homie.

DM — I am guessing that you received some negative reviews on a manuscript, accusing you of overstating your interpretation of the data. I am also guessing that you need to suck it up, admit you were wrong, and run some more animals and/or tone down your conclusions.

You would be guessing wrong N-c. More accurate would be to speculate how many papers I get to review that use “the stats revealed!!!11!” as the results style. Also to wonder about the number of papers I review that act as if non-statistically-significant differences are meaningless and/or proof of “no-difference”.

Dr. Isis- when I read a scientific paper, unless the context is very unusual, my default assumption is “significant” = statistically significant. The effect size may be very small. Or the effect size may be ginormous and it may still be, from a biological perspective, twice as dull as toast.
When I read a standard newspaper or many blogs, I’m more likely to assume “significant” is supposed to mean “important”. That said, when a newspaper or blog is reporting on a scientific topic and they use the term in an ambiguous way, it can be irksome.
So if your pet peeve concerns papers containing statistical analysis where people use “significant” instead of “important”, I’m right with you. If, on the other hand, somebody is talking to you and says something like “healthcare reform is a significant issue” and you glare at them and rant about P values, you might be on your own.
Really, communication isn’t hard. It’s just miraculous.
joe- I am not suggesting the use of ‘precision’. Instead, I said the relationship between precision and accuracy is analogous to the relationship between reproducibility and reliability. Analogies… those are them things where you compare one thing to another.
(for reference: in the Blue Collar Comedy Tour movie, Bill says “Yeah. I like to use analogies in my show.”
[Larry has a confused look on his face. Bill leans over and stage whispers]
Bill: That’s where they compare things… )

Look, all you’re saying with statistics is that you are pretty sure your difference – whatever it is – isn’t solely due to chance. Whether you choose to set your N to perceive a difference at p less than 0.5 or less than 0.05 or less than 0.01, assuming you are in the right ballpark about the strength of the effect, you’re taking a risk it’s still due to chance. Moreover, you’re basing your perception of that chance on your possibly flawed understanding of certain properties of the true distribution of whatever you’re measuring.
I think you chose this topic to generate a bazillion comments truncated at the less-than sign.

I don’t think DM’s original challenge has been answered yet, but…Wasn’t statistics used by Nazis to ‘prove’ the superiority of the Aryan race?
…thank you thank you. We can all move on to another thread now…