Tuesday, October 12, 2010

This post is a semi-confession, and the difference between "I find X" statements and plain "X" statements is significant here.

When I don't like something, I can of course come up with some reasonable-sounding arguments specific to the particular subject, but "X is not really about X, it's just a status-seeking game gone awry" usually has no trouble getting onto that list. Sometimes it ranks high, sometimes not, but it's rarely missing.

When I like something, I tend to believe "X is not about X" arguments don't really apply - maybe there's some status seeking involved, but surely it cannot be that significant.

So here's my main point - "X is not about X" is not about X not being about X. It is a cheap, hard-to-refute shot at X to lower its status - essentially, the "X is not about X" argument is really about raising the speaker's status relative to X.

Science is not about science

For example, I find it intuitively obvious that academic science is mostly a silly status game, with just a tiny fraction of effort directed towards seeking important truths about reality.

How else can you explain:

persistent disregard for statistics, reliance on cargo cult statistics like p-values and PCA

vast majority of papers not getting even a single replication attempt

most research being about irrelevant minutiae

nobody bothered by research results not being publicly available online

review by "peers", not by outsiders

obsession about citations

obsession regarding where something gets published

routinely shelving results you don't like

This is exactly what we'd expect if it were just one big status-seeking game! Truth seeking would involve:

everything being public from even before experiments start

all results being published online immediately, most likely on researcher's blog

results being routinely reviewed by outsiders with the most clue about statistics and methodology, at least as much as by insiders with domain knowledge

journals wouldn't exist at all

citations wouldn't matter much, as hyperlinks work just as well - and nobody would bother with silliness like adding classics from the 1980s to the bibliography, as is still commonly done in many fields

most research focusing on the most important issues

vast majority of effort being put towards reproducing others' research - if it was worth studying, it's worth verifying; and if it's not worth verifying, why did anyone bother with the original research in the first place?

serious statistical training as part of the curriculum, in place of the current sham in most disciplines

It's a miracle that anything ever gets discovered at all! It would be a very cheap shot, but it's really easy to finish this line of reasoning by attributing any genuine discovery to an attempt at acquiring funding from outsiders so the status game can continue. And isn't there a huge correlation between commercial funding of science and the rate of discovery?

Medicine is definitely about health

And here's something I like - modern medicine. It's just obvious to me it's about health and I find claims to the contrary ridiculous.

Yes, plenty of medical interventions have only weak evidence behind them, but this is true of nearly everything in the universe. Hard evidence is an exception (also see the previous section), and among the few things that do have hard evidence, medical interventions are unusually well represented.

And yes, a few studies show that medical spending might not be very beneficial on the margin - mostly in the short term, mostly in the United States, mostly on small samples, always without replication, and I could go on like this.

Anyway, modern medicine has enough defenders without me getting involved, so I'll just stop here. To me it just looks like what you'd get if it was about health, and it doesn't look like what you'd get if it was about status seeking, even if a tiny bit of status seeking gets into it somehow.

But notice how the argument goes: conclusions first, and you can always fit or refute status seeking in the arguments later. This bothered me until I saw an even bigger point - it all works just as well when you replace "seeking status" with "making money" or any other motivator!

Compare these semi-silly examples:

Writing books is not about content, it's about money

Writing blog posts is not about content, it's about status

Food industry is not really about food, it's about money

Organic food is not really about food, it's about status

Commercial software is not really about software, it's about money

Open Source is not really about software, it's about status

This is just universal, and an argument like this can be applied to nearly everything. But in all of these, money and status are only mechanisms of compensation - people optimize for money or status because that's what people are good at, but the only reason there's a connection between money/status and a given action or product is the underlying connection with a desirable outcome.

To pull an Inception - "X is about status, not X" is about status, not about X being about status, not X.

35 comments:

thomas said...

"t's just obvious to me it's about health and I find claims to the contrary ridiculous."

really not obvious to me. how is medicine different from any other scientific field? it is about money of course. drug companies? they are funding quite a bit of academic research as well... it is actually quite embarrassing if you look at the status quo in preventive medicine.

while i agree about the problems, i think your solution is extremely flawed.

"vast majority of effort being put towards reproducing others' research - if it was worth studying, it's worth verifying; and if it's not worth verifying, why did anyone bothered with original research in the first place?"

who exactly pays for the replications? you won't get funding for it.

and who are those 'outsiders'? they are doing it for free too? just wondering.

how many articles in computer science have you replicated and analyzed? do you acquire funding?

Like I said in the introduction, "it's obvious to me" statements are about my intuitions. I understand it's not obviously intuitive to others (including Robin Hanson).

And of course replication would get funding! If someone cares enough to fund a study to find out if something is true or not, they'll definitely want to fund another to make sure the first one wasn't flawed. Single studies are far too unreliable to be worth anything; this much is known.

Of course if the study wasn't about finding out the truth, but about status from publishing, a cover story for the manufacturer etc., then you won't see any confirmation effort.

Many people seek truth without getting anywhere near academia. Accurate information has plenty of uses.

And speaking of computer science, this is partly motivated by reading some paper claiming Common Lisp is faster than (dubiously written) C, but not bothering to put the sources online for verification. Was that paper truth seeking or status seeking? Because I replicated enough of the study to verify that the author is a total twat.

all results being published online immediately, most likely on researcher's blog

a positive result leads to logical jumps into new research. if a PI publishes it immediately, a competitor may suddenly take that new knowledge and apply for a grant based on a "hunch" that can lead to the next big discovery. asking them to publish immediately encourages others to lurk in wait until the grunt work is done.

results being routinely reviewed by outsiders with the most clue about statistics and methodology, at least as much as by insiders with domain knowledge

who would? truthfully, without an income, no one would spend the time busting their brains over papers (after paying for them) and posting their results. if some do, and they're faulty, think of the implications: PIs who published the works would need to refute every single one of them, right? the purpose of the journal is for others (competitors, experts in the same field) to ask those questions of how valid the research is. it's so cutthroat already.

journals wouldn't exist at all

journals do the same thing this author is vouching for... except the reviewers are actually in the field and know wtf the scientists are talking about.

citations wouldn't matter much, as hyperlinks work just as well - and nobody would bother with silliness like adding classics from the 1980s to the bibliography, as is still commonly done in many fields

this guy clearly understands nothing about citations. the point is to show the logical flow and justification of your research, because it's being built on top of much other published research. there are articles you can cite to demonstrate that a field is quite controversial, as others have gotten different observations; thus, you cannot say that your own results are absolutely correct in the overarching theme of it all.

sure, I'll give you this. now, what are the most important issues? laymen such as oh, I don't know... the **AUTISM IS LINKED TO VACCINES MOVEMENT** would consider that a very, very important issue... but unfortunately, it's bull. if you say that scientists need to decide on the most important issues and all figure it out together... good job. you've just created a monopoly! Yay! Everyone's now being paid to do the exact same shit. you've just destroyed science.

vast majority of effort being put towards reproducing others' research - if it was worth studying, it's worth verifying; and if it's not worth verifying, why did anyone bother with the original research in the first place?

has the author not read any papers? there are repeatable experiments. people take others' methods and say, oh, this worked for them, so I'll do it. when they do it and figure out that the paper is full of shit, they WILL call the authors out on it. that's why science has progressed. people actually DO the experiments others have done before them.

serious statistical training as part of the curriculum, in place of the current sham in most disciplines

i'll agree with you that statistics is less of a focus in the biological sciences. i'd argue with you that no, it doesn't matter. you're designing experiments with ONE or TWO variables. science is all about simplifying and THEN looking at the differences between groups. that's one of the only ways you can do it. if you make your experiment dirty, yes, you're going to need more statistics, but generally, science and publishable results are done with very few differences between groups, so you do not need much more statistical knowledge than what you've used before. if you're going into new conditions, scientists generally consult statisticians before they figure out the results. forcing all scientists to learn shit they don't need to know will only delay science. congrats.

Anonymous and Anonymous: I added a few links to data on how rarely anyone bothers replicating results, and how rarely results survive a replication attempt.

Re income and journals: What you're saying doesn't make sense - journals don't pay researchers anything, and medical research is increasingly trying to enforce this kind of early publishing to counter publication bias and post-experimental manipulation of results.

Have you even noticed how you're saying "they cannot do X or their status would suffer", not "they cannot do X or we wouldn't find out how it really is"? Exactly my point.

Re citations: Most cited papers were never read by the people citing them, which pretty much disproves such claims.

Re importance: You don't really need millions of scientists focusing on one issue. But most studies don't get even one replication attempt. With gene-disease associations (which are easy to meta-analyze, as claims tend to have the same form), 1% of claimed positive results were consistently replicated. The other 99% were either replicated without consistent results, or nobody bothered. What is the value of this 99% of research?

This 1% figure might even be overstated, if there are any methodological weaknesses shared by the entire field.

By doing more replication and meta-analysis and less original work, you could easily get far more reliable results with the same amount of effort.

Of course with more statistical clue, fewer such false claims would get published in the first place.

Re statistics: This "you do not need much more statistical knowledge" attitude is exactly the problem. The abysmal level of statistical knowledge of the average researcher, and how rarely statisticians are consulted, are both disturbing.

You cannot possibly have all three: most research claiming high statistical significance, most research turning out to be false, most researchers having any clue about statistics.

As the first two are easily verifiable, a universal lack of clue about statistics is the only possibility left.
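The arithmetic behind this is worth spelling out. A minimal sketch (my own illustration, with made-up prior and power values, not figures from any linked study): the fraction of "significant" results that are actually true depends heavily on how plausible the tested hypotheses were to begin with.

```python
# Positive predictive value of a p < alpha result:
# out of all "significant" findings, what fraction is actually true?
def ppv(prior, power=0.8, alpha=0.05):
    true_positives = prior * power          # true hypotheses correctly detected
    false_positives = (1 - prior) * alpha   # false hypotheses passing by chance
    return true_positives / (true_positives + false_positives)

# If only 1 in 50 tested hypotheses is actually true, then even with
# decent power, most "significant" results are false:
print(round(ppv(0.02), 3))   # 0.246
# With plausible hypotheses (1 in 2 true), significance means a lot more:
print(round(ppv(0.5), 3))    # 0.941
```

This is how unreplicated single studies can be mostly false even when every reported p-value is nominally fine.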

"Of course if study wasn't about finding out truth, but about status from publishing, cover story for manufacturer etc., then you won't see any confirmation effort."

well, that won't work. you either slash the number of researchers by half or someone has to double or triple the budget. time of course is a huge problem too, especially in medicine, where it takes years for a study to complete (hence i consider it to be amongst the most unreliable fields, which is funny considering you appreciate it so much ;)). what is also known is that opening up the publishing process - at the moment - does not help much. some conferences let you publish your paper openly and people can comment on it. just nobody seems to comment - or even read it.

when you answered anonymous you were mixing things up, specifically this one:

"What is value of this 99% of research?"

sorry but you can't have it both ways. either nobody reads papers or 99% were inaccurate. you can't have an inaccurate paper when nobody knows about it ;)

"Many people seek truth without getting anywhere near academia. Accurate information has plenty of uses."

I will try to put it in terms that you like.

Re: Income & Journals

Quality of Research: Researchers pay journals every submission, and this shows a very competitive environment. If we used your method of publishing online, there would be two things that would happen. 1) Other PIs will jump at real-time results and data obtained via hard work and pursue new ideas, and 2) These real-time results are not verified yet and will lead to much time spent on red herrings, thus wasting grant money and time. Scientific progress will be hindered and we will not find out how it (the truth of whatever you're studying) really is.

Cost of Research: PIs will need to hire more people to transcribe lab notebooks and put it into laymen's terms. This is bad because 1) the cost will be high and takes away from money that could be better spent in research, and 2) the general populace will either not care enough to read it (98%), care enough and misread it (~1%), care enough and use results for their own political or ideological agenda (~1%), or care enough and understand it (your peers in your field) and criticize it.

Peer Review: It does not matter what status you have. In a majority of universities, labs or departments will have these things called Journal Clubs where they pick newly published papers and discuss the merits or demerits of a paper. Yes, there are people who have very high status because they publish in a journal like Nature. However, when you comb through the paper and look at the experimental methodology, if it's bullshit and illogical, a group of individuals will call them out on it. The reason they have high status is because they have high quality papers, not the other way around. The same thing happens with peer reviewers before the paper even makes the cut into the journal. Again, if the paper is shit, the reviewers are that author's direct competitors. They will not let it pass if they can bring up good points against the publication of the paper. This is vastly different from publishing a paper against a certain dogma. A paper will have a high chance of acceptance if its experimental method is very logical and sound.

Accessibility: As someone else on Reddit stated, the papers are one phone call away. Any PI would be happy to provide you with knowledge if you request it, especially if you don't have money to access the journal. All abstracts are available for viewing online, and laymen can go to their university library stacks and read the paper there for free. This is probably the biggest argument that renders your whole point about journals moot. Laymen have relatively easy access to papers.

I will reply to your other statements as I have time.

Re: Citations

Burden of Proof: You have the burden of proof to show that a majority of people do not actually read the papers they cite. I have read the post you linked, and it does not take into account some things. I can give you my personal anecdote: in order to develop an experimental protocol to develop a certain biochemical product, I needed to read the papers cited in this one paper. I skimmed through all of them and found out what they did. When I go to cite the paper for my PI, I copy and paste those references. Why? Should I be reformatting things that I have read and are already in citation format? No, this is a waste of time. I know which paper I'm referring to, and I understand its result. Now, maybe when I copied and pasted that citation, there was a mistake in it - I don't know. I do know that I've read the paper and know what the hell I'm talking about. Do you want all scientists to be editors, not scientists? I don't discount the fact that there are some bad science papers out there, but I want to posit this: those papers are most likely from low-tier journals. Hey! It's that status thing you were talking about again! Good science begets good publication sources, not the other way around. Once you can show me that a majority of scientists are shitty and cutting corners, then you can prove that citations can be misused. Unfortunately, your whole point is still moot because, when used correctly, citations are the best thing in science.

Re: Importance

Clinical Research vs. Translational Research vs. Basic Research: I still don't think you understand what replication means. We replicate experiments all the time. In basic research, we have a positive result. This, in turn, goes into translational research that will not work unless the basic research was correct. This, in turn, goes into clinical trials, which have high variance and few controls and would kill patients if the translational research was incorrect. If it was discovered that a paper was complete bullshit and completely non-reproducible, those scientists would get barred from publication for academic dishonesty. I think that you wanted this scenario: you wanted two groups of scientists at different locations to replicate every experiment done. This is possible, but ultimately wasteful and fruitless.

Funding of Replication: I assume that you wanted the type of replication I previously stated. Who would pay? The only people that come to mind are pharmaceutical companies that wish to legally protect their asses before they release a new drug. If you wanted taxpayers to pay for replication, by all means, go ahead! I would love it if you created 100% more jobs for scientists. The reality is, no one would pay for your method of replication. It's already being done, but it seems unsatisfactory for you.

Re: Statistics

Experimental Design: I don't think you understand experimental design. When you have every possible thing controlled for except one factor, you can use a basic Student's t-test to determine the differences between groups. For example, one biochemical test gives protein concentrations (with replication in the experiment) as absorbance values. You can compare the values and get your p-value (>95%) to determine if these are good enough to say you have a concentration of X mg/mL protein. Is there really another statistical method you should use? Another example: I do challenge experiments sometimes. Mice are vaccinated, and mice are vaccinated with an empty vector. We challenge them with lethal doses. 9/10 mice die in the control group. All of the vaccinated ones survive. We can look at p-values for blood plasma level comparisons of antibody against the challenge antigen. We can look at a statistic for the death rate (which, I consulted a statistician about, by the way). More statistics should boost your results in your t-test. If your results are equivocal, then you may need the fancier statistics to justify why you think you see some results there. I actually think that more statistics can be a detriment to good science. It either works within your confidence level or it does not. Justifying it with more bullshit statistics is grasping at straws.
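For survival counts like the 9/10 vs 0/10 example above, there is in fact an exact alternative to the t-test: Fisher's exact test, computable by hand from the hypergeometric distribution. A minimal sketch (the 2x2 numbers are just the hypothetical mouse example from this thread):

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test on a 2x2 table:
                 died  survived
       control    a       b
       vaccine    c       d
    Returns P(control deaths >= a) with all margins fixed."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    return sum(
        comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)
        for k in range(a, min(row1, col1) + 1)
    )

# 9/10 control mice die, 0/10 vaccinated mice die:
p = fisher_one_sided(9, 1, 0, 10)
print(p)  # about 6e-05
```

No normality or equal-variance assumption needed - exactly the kind of small-sample count data where a t-test is on shaky ground.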

False Research: You can't have it both ways, either. How did people discover that "most research turned out to be false"? Either people don't replicate experiments to find this out, or people replicate experiments and find this out. I'm dubious of your experience in actual research. Can you show me how a majority of published research is actually false?

Results: As you stated before in your OP, "It's a miracle that anything ever gets discovered at all!" If we did not have results, that would mean that the science and statistics researchers use at the moment do not work. Since we have results that lead to medical advances, I'd venture to say that we're doing something right.

I'm glad that you are questioning it, Mr. Taw. It shows an inquisitive mind. But please, before you delve into the issue based on a few articles you've read, go into the field and immerse yourself. Look at the logic behind why things are being done this way, and then you can give valid reasons why it's a massive fail.

> "I actually think that more statistics can be a detriment to good science"

Anonymous: Thanks for admitting this so clearly. Your attitude definitely confirms what I've been saying about total disregard for serious statistical analysis among most researchers.

And now I understand why I was unable to figure out what your arguments were even about, and why you seem unable to figure out what my arguments were about. Your epistemological position is just highly anti-Bayesian.

I cannot do anything about this. If you don't consider statistics essential to understanding reality, then I doubt you will find anything I can say convincing.

Many of the things I've been saying have the support of various statistical meta-analyses (like the ones I already linked), but that's just more "detriment to good science" to you.

Yes, in bad statistics, you can make any data say what you want. In good statistics, you can see patterns that point to the truth you were seeking.

I'm saying that in a good scientific experimental model, you need very basic statistical analyses because you cannot fuck with that. It either is or it isn't; no fancy statistics to muddle your conclusions.

I'm glad you think that I'm part of the problem, but you have to realize that the way I attack the problem is vastly different from you. You should consider doing statistics for research with lots of dirty data such as clinical data, NOT go into research with very clear, discrete controls.

OK, you're generalizing quite a bit; what type of research do you do? In my research, I construct new adenoviral based vaccines against a certain agent that is deadly. We can vaccinate two groups of mice (n=10): one with the gene of interest, one without. When we challenge them with the agent (>100xLD50), then we see all of the controls die off, and 90% of the vaccinated survive.

Tell me: what other statistical values do you need other than t-test? Like I mentioned before, you can take the times in which your mice died and do statistics, but overall, that gives fluff to the paper and does not detract from the actual result you've gotten. There were some cases where it wasn't as clear cut, and my PI suggested using the other (valid) statistical analysis to see if there is some data that is publishable. This is where I have my qualms about statistics. It can be either clear cut, or not. You wish to trust the muddy data, whereas I wish to improve the experimental model to get the clear(er) data.

Anonymous: So you're essentially saying that if the first statistical method didn't get you the *results* you want, you pick another method? *After* the experiment? Or else it would be *unpublishable*?

How the fuck is this not exactly what I've been talking about all the time???

What you should be doing is pre-registering somewhere: "we are going to run an experiment on mice with this protocol and this statistical analysis", then running the experiments, then publishing the results no matter what they are.

For frequentist statistics like the t-test, deciding which method to use based on the results makes it automatically invalid - no exceptions, ever.

There are ways to do valid statistics after seeing data, but I doubt you would manage to get any of them past peer review in a typical discipline, most people would have no idea what's going on there.
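How big is the inflation from test shopping? Under the null hypothesis, p-values are uniform on [0, 1], so keeping the best of k independent tests gives a false-positive rate of 1 - (1 - alpha)^k instead of alpha. A quick sketch (a generic illustration, not modeling any particular study):

```python
import random

alpha, k = 0.05, 3

# Analytic false-positive rate when you report whichever of k
# independent null tests comes out "significant":
analytic = 1 - (1 - alpha) ** k
print(analytic)  # 0.142625 - nearly triple the nominal 5%

# Monte Carlo check: under the null each p-value is uniform,
# and "shopping" keeps the smallest of k draws.
random.seed(0)
trials = 100_000
hits = sum(
    min(random.random() for _ in range(k)) < alpha
    for _ in range(trials)
)
print(hits / trials)  # close to 0.14
```

Pre-registration closes exactly this loophole: the analysis is fixed before any p-value is seen.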

By the way, are there many (ungated) meta-analyses of such mouse vaccine research? I'm curious how good or bad it really is.

I can't help but feel that despite epistemological differences you're both on the same page about what you expect from statistics. Anonymous seems to be at least partly aware of zes ignorance, even if not willing to admit it.

Taw, take note that Anonymous seems to indicate ze's rather wary of using doubtful, handwaving 'statistical' methods to fabricate publishable results, even if ze doesn't know any better and seems to believe that you are actually the one proposing it.

NO! Mr. Taw, if the first and only statistical method does not give positive results, then you pick another experiment after you do a few more tests to confirm your initial results. You do not pick another method.

Let me give you the thought process behind a grant. You do initial tests and constructs before getting a grant. While applying for the grant, you have preliminary, unpublished data (free for them to review) that suggests a possibility of your idea working. You suggest multiple aims that would encompass the whole grant.

When you're funded and working on your project, you will do your various tests. The meat of the project is "does the vaccine work?" If you get a good survival rate, you know that it works, but then, you must prove that it does work. You should test for blood serum levels of antibodies against the antigen in question, you must test for populations of lymphocytes using ELISPOT or Flow Cytometry, you must be able to use the mouse serum (containing the antibody) to neutralize the pathogen in question. And you must do other tests to basically confirm that beyond a doubt, your vaccine is working via a specific method and is not magic. Every step of the way, you need to control for as much as possible so that you may use the simplest statistical analyses.

If your project does not give a good survival rate, you still do the other tests to figure out what went wrong. In this scenario, most likely, the epitope was not good, so, you go back to Square 1 and choose a different epitope on this antigen to see if this works or not (you may even use a shotgun approach and choose many epitopes). Then, you do the SAME statistical analyses you've done.

It's a subtle difference, I know. Your methods should stay the same (barring optimization for specific antigens), but you may start from the beginning if it's not working.

And also, wtf? Obviously you don't look at the results and go... oh hey, let's see if analyzing it this way will give me better results. This is what a bad scientist does. Also, this wasn't what we were debating in the first place. How did we go from me saying "too many statistics bad" to you saying "too many statistics bad b/c you're looking at results and deciding on which one to use"??

Vaccine research is exceptionally vast. There are people looking at innate immune system, some look at adaptive, some look at only B cells, some only look at T cells, some only look at a subset of T cells (Treg, etc). I cannot comment on the statistical analyses there -- sometimes in Flow Cytometry, you're gating data that's only looking at a tiny population, so there are probably different statistics involved (including the t-tests you hate so much).

About publication: you can publish negative data. It just goes into a lower tier journal because it doesn't help others all that much. I've published negative and equivocal data before. There isn't much you can do with the data after it tells you the results.

I agree I probably need to learn more about statistics. In my (temp) field, there usually isn't a need for additional statistics because things either worked or didn't work with that >95% confidence value. If we do require more statistical analyses, we consult actual biostatisticians. I will definitely read per your recommendation, and I am curious if it will change my opinion of statistics.

NO! Mr. Taw, if the first and only statistical method does not give positive results, then you pick another experiment after you do a few more tests to confirm your initial results. You do not pick another method.

Let me give you the thought process behind a grant. You do initial tests and constructs before getting a grant. While applying for the grant, you have preliminary, unpublished data (free for them to review) that suggests a possibility of your idea working. You suggest multiple aims that would encompass the whole grant.

When you're funded and working on your project, you will do your various tests. The meat of the project is "does the vaccine work?" If you get a good survival rate, you know that it works, but then you must prove how it works. You should test for blood serum levels of antibodies against the antigen in question, you must test for populations of lymphocytes using ELISPOT or Flow Cytometry, and you must be able to use the mouse serum (containing the antibody) to neutralize the pathogen in question. And you must do other tests to confirm, beyond a doubt, that your vaccine is working via a specific mechanism and is not magic. Every step of the way, you need to control for as much as possible so that you may use the simplest statistical analyses.

If your project does not give a good survival rate, you still do the other tests to figure out what went wrong. In this scenario, most likely, the epitope was not good, so, you go back to Square 1 and choose a different epitope on this antigen to see if this works or not (you may even use a shotgun approach and choose many epitopes). Then, you do the SAME statistical analyses you've done.

It's a subtle difference, I know. Your methods should stay the same (barring optimization for specific antigens), but you may start from the beginning if it's not working.

I said... "and my PI suggested using the other (valid) statistical analysis to see if there is some data that is publishable."

This is an example of where I felt it was frivolous to do more statistical analyses. It was clear that it did not work, but I suppose I misspoke. He wanted to see if this was promising enough to continue with the project instead of abandoning it. In this case, we were using a specific recombinant protein for vaccination. It was not clear whether it was protective or not. He wanted to see if it was worth pursuing (by looking at different statistical results) to do more experiments and get a publishable result. I felt that it wasn't an effective enough vaccine to warrant more study. We had our differences in opinion there.

If the other statistics suggested it was worth studying? What then? I hope you see why I dislike too many fancy statistics. Yes, I realize that they can be used correctly and give great results, but more importantly, I still adhere to tightly controlled experimental design as top priority.

From what I could find on PubMed, at least for human flu vaccines, the quality of research seems low, statistical manipulation seems rampant, and it's not just the plain old bias against publishing null results.

Anonymous: The kind of statistical clarity you claim to have in mice vaccine research doesn't seem to extend to human vaccine research.

If the publication decision or the peer review decision (by editors or reviewers) depends in any way on results, and especially if it depends on which side of an arbitrary p-value cutoff they end up (0.94 vs 0.96), then to me this is already a horrible abuse of statistics.
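A toy simulation makes the point concrete (the effect size, sample size, and study count are illustrative assumptions, not drawn from anything above): when only results crossing a significance threshold get "published", the published effect sizes end up badly inflated even though every individual study is honest.

```python
import math
import random

# Illustrative numbers, not from any real study: a small true effect,
# a modest per-study sample size, and many independent honest studies.
random.seed(42)
TRUE_EFFECT = 0.1
N = 20          # subjects per study
STUDIES = 5000

all_means = []
published = []  # studies that cross the p < 0.05 "publishable" threshold
for _ in range(STUDIES):
    sample = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N)]
    mean = sum(sample) / N
    z = mean / (1.0 / math.sqrt(N))       # known sigma = 1, simple z-test
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    all_means.append(mean)
    if p < 0.05:
        published.append(mean)

print("true effect:", TRUE_EFFECT)
print("mean effect across all studies: %.3f" % (sum(all_means) / len(all_means)))
print("mean effect in 'published' studies: %.3f" % (sum(published) / len(published)))
```

Every simulated study is honest, yet the subset that survives the cutoff reports an average effect several times larger than the true one. That is what a result-dependent publication filter does to a literature.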

And this situation exists in pretty much every scientific field as an inherent side effect of journals and peer review, so I have a very low opinion of all of them.

Except for medicine's recent efforts to limit it with mandatory trial protocol registration, nobody even seems to acknowledge that the problem exists, let alone do something about it!

And this is just one of far too many problems with academic research.

Also remember this is all a recent development. Einstein didn't have to suffer from any of this peer review nonsense. Back then we were making more progress with a handful of scientists than we do now with millions of them. Just take a guess what the chances would be of any of his papers surviving peer review in an alternative universe where it was adopted a century earlier.

"We cannot say for certain why industry sponsored studies are more attractive to more prestigious journals, but such journals are preferentially targeted by all studies because of their prominence and prestige, so industry sponsored studies might have a higher probability of acceptance. The two mechanisms might be linked, but further research, especially in other specialties, is required."

There may be a higher percentage of papers being accepted with industry funding, but they fail to take into account the quality of the paper written. For example, if I were a layman and did a home experiment in my garage and found a result, then wrote a paper, (assuming I had the licenses to do the work) I probably would not out-compete a scientist doing the same exact experiment. The scientist would articulate his methodology more clearly and in jargon specific and relevant to his target audience (because he has done so before).

The Principal Investigators who manage to grab funding (competitive!) from companies such as Sanofi-Pasteur or others have that eloquence to their writing. They have proven their mettle by obtaining many grants; they have proven their mettle by publishing a plethora of papers as well. Their repertoire of skills is focused on writing grants and writing papers. Thus, I can give this small reasoning: successful people are successful.

Yes, I agree, journals should probably look more at the experimental design and results and give them more weight, but when someone writes a discussion section that is astoundingly convincing, the reviewers... well... are convinced. It has nothing to do with data manipulation, and in fact, it shows that results are results and aren't messed with, even by those people who go for "status" according to you. Otherwise, we'd see so many more papers that are perfectly aligned with their hypotheses (and I would question that to death).

I will have to agree with you in part for human vaccine trials. I personally have done little work on that, so I cannot comment on statistical manipulation (which I hate!).

I do agree that humans are "dirty data" because of the overwhelming variance between individuals. It's so difficult working with human blood as data. I am not sure what the statistically accepted number is for this type of research, but I feel that experiments should involve thousands of subjects instead of the hundreds, or even tens, that some papers use. If I were to go into this research, I would have to read up on more statistics because in those experiments, there are many factors you cannot control as you could in a mouse model.
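A back-of-envelope power calculation supports the intuition about needing thousands of subjects. This is a minimal sketch using the standard normal approximation for comparing two group means; the function name and the effect sizes are illustrative assumptions, not numbers from any particular trial.

```python
from math import ceil
from statistics import NormalDist

def needed_n_per_group(d, alpha=0.05, power=0.80):
    """Per-group sample size to detect a standardized effect size d
    in a two-sample comparison of means (normal approximation)."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

for d in (0.5, 0.2, 0.1):
    print("effect size %.1f -> n per group %d" % (d, needed_n_per_group(d)))
```

With messy human data, plausible effect sizes are small: an effect of 0.2 already needs about 400 subjects per group, and 0.1 needs over 1,500 per group even before accounting for dropout and confounders, so trials with tens of subjects are mostly noise.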

But... weren't you saying that medical research (clinical) was trying to be more transparent and that other fields aren't following suit? I can understand why they are striving for that when typical basic research isn't. It might be because there are PIs there who use statistical manipulation because the data is so dirty. I don't have a defense for the people who do that.

> but they fail to take into account the quality of the paper written.

This is something they explicitly checked for. They did use a point system for study design as a proxy for quality, so possibly something was missed, but if journals are much more willing to publish based on vague criteria like eloquence and industry connections, the system is simply broken.

I'd recommend jumping on the bandwagon of what medicine is undergoing right now. Accept that reality is complicated, so data will always be messy. Accept that researchers are human, so they will always be biased.

The least that needs to change to get science back on track:

mandatory study design registration

mandatory disclosure of conflicts of interest

a lot more statistics in education

a wider selection of adequate statistical methods instead of just the t-test - e.g. if your study involves two hypotheses that are not independent (uncorrelated is not enough), unadjusted p-values are already invalid, but hardly anybody bothers to adjust for that, and it always gets through peer review!

better controls than a naive null hypothesis and a naive control group (see active placebo effect and antidepressants)

meta-analyses to double check everything, and meta-meta-analyses to make sure meta-analyses are doing their job

open access journals to make results widely available
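On the dependent-hypotheses point: the Holm step-down adjustment is one standard correction that remains valid under arbitrary dependence between hypotheses, and it fits in a few lines. A minimal sketch (the function name is ours):

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values; controls the family-wise
    error rate under arbitrary dependence between the hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # step-down: multiply by the number of hypotheses still "alive",
        # and enforce monotonicity with a running maximum
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Two raw p-values that each look significant at 0.05 in isolation:
print(holm_adjust([0.03, 0.04]))  # both adjusted to 0.06 - no longer significant
```

Two tests that each show p = 0.03 and p = 0.04 look fine in isolation, but after adjustment neither clears the 0.05 bar - which is exactly the correction that "hardly anybody bothers" with.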

I understand that mouse vaccine research might have far neater data than human vaccine research, but I doubt your ultimate goal is developing vaccines for laboratory mice, and transferring mouse results to humans involves extremely messy statistics.

It's a good thing that you distrust blind application of complicated statistical methods - but what needs to happen is more statistical education, so that such methods can be properly applied and are no longer confusing to researchers - not avoiding them and sticking to simple methods that cannot really handle real, messy data.



Creative Commons

Unless otherwise expressly stated, all original material of whatever nature created by Tomasz Węgrzanowski and included in this blog, is licensed under a Creative Commons License. It is also licensed under GFDL (for Wikipedia compatibility).