We’re so good at medical studies that most of them are wrong

Our ability to generate massive data sets and crunch through numbers at …

It's possible to get the mental equivalent of whiplash from the latest medical findings, as risk factors are identified one year and exonerated the next. According to a panel at the American Association for the Advancement of Science, this isn't a failure of medical research; it's a failure of statistics, and one that is becoming more common in fields ranging from genomics to astronomy. The problem is that our statistical tools for evaluating the probability of error haven't kept pace with our success at obtaining massive data sets and performing multiple tests on them. Even with a low tolerance for error in any single test, the sheer number of tests performed ensures that some will produce erroneous results purely by chance.

The panel consisted of Suresh Moolgavkar from the University of Washington, Berkeley's Juliet P. Shaffer, and Stanley Young from the National Institute of Statistical Sciences. The three gave talks that partially overlapped, at least when it came to describing the problem, so it's most informative to tackle the session at once, rather than by speaker.

Why we can't trust most medical studies

Statistical validation of results, as Shaffer described it, simply involves testing the null hypothesis: that the pattern you detect in your data arose by chance alone. If you can reject the null hypothesis—and science and medicine have settled on rejecting it when the observed pattern had a five percent or smaller chance of arising by chance alone—then you accept that your finding is significant.

The problem now is that we're rapidly expanding our ability to do tests. Various speakers pointed to data sources as diverse as gene expression chips and the Sloan Digital Sky Survey, which provide tens of thousands of individual data points to analyze. At the same time, the growth of computing power has meant that we can ask many questions of these large data sets at once, and each one of these tests increases the chance that an error will occur in a study; as Shaffer put it, "every decision increases your error prospects." She pointed out that dividing data into subgroups, which can often identify susceptible subpopulations, is also a decision, and increases the chances of a spurious result. Smaller populations are also more prone to random associations.

In the end, Young noted, by the time you reach 61 tests, there's a 95 percent chance that you'll get a significant result at random. And, let's face it—researchers want to see a significant result, so there's a strong, unintentional bias towards trying different tests until something pops out.
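As a rough sketch of the arithmetic behind Young's figure (assuming 61 independent tests, each run at the usual five percent threshold), the chance of at least one spurious "significant" result comes out near 95 percent:

```python
# Chance of at least one false positive when every null hypothesis is true,
# assuming independent tests at the conventional p < 0.05 threshold.
alpha = 0.05
for n_tests in (1, 10, 20, 61):
    p_any = 1 - (1 - alpha) ** n_tests
    print(f"{n_tests:3d} tests -> {p_any:.1%} chance of at least one spurious hit")
# 61 tests -> about 95.6% under these assumptions
```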

Young went on to describe a study, published in JAMA, that was a multiple testing train wreck: exposures to 275 chemicals were considered, 32 health outcomes were tracked, and 10 demographic variables were used as controls. That was about 8,800 different tests, and as many as 9 million ways of looking at the data once the demographics were considered.
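For what it's worth, those counts are easy to reproduce if you treat each chemical-outcome pair as one test and each subset of the ten demographic controls as a separate way of slicing the data, which is one plausible reading of the numbers:

```python
chemicals, outcomes, demographics = 275, 32, 10
tests = chemicals * outcomes          # 8,800 chemical-outcome comparisons
slices = tests * 2 ** demographics    # ~9.0 million if every subset of the ten
print(tests, slices)                  # demographic controls counts as a variant
```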

The problem with models

Both Young and Moolgavkar then discussed the challenges of building a statistical model. Young focused on how the models are intended to help eliminate bias. Items like demographic information often correlate with risks of specific health outcomes, and researchers need to adjust for those when attempting to identify the residual risk associated with any other factors. As Young pointed out, however, you're never going to know all the possible risk factors, so there will always be error that ends up getting lumped in with whatever you're testing.

Moolgavkar pointed out a different challenge related to building the statistical models: even the same factor can be accounted for using different mathematical means. The models also embody decisions about how best to handle things like measuring exposures or health outcomes. The net result is that two models can be fed an identical dataset and still produce different answers.
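A toy illustration of both points, using entirely made-up data and variable names: the estimated "effect" of an exposure depends on which covariates the model adjusts for, and anything that was never measured stays lumped into that estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=500)
exposure = 0.5 * age + rng.normal(0, 5, size=500)   # older people happen to be more exposed
risk = 0.03 * age + rng.normal(0, 0.5, size=500)    # the outcome is driven by age alone

def ols(y, *columns):
    """Least-squares coefficients for y regressed on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

print("exposure 'effect', unadjusted:      ", ols(risk, exposure)[1])
print("exposure 'effect', adjusted for age:", ols(risk, exposure, age)[1])
# Same data set, two defensible models, two different answers -- and neither model
# can account for confounders that were never measured in the first place.
```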

At this point, Moolgavkar veered into precisely the issues we covered in our recent story on scientific reproducibility: if you don't have access to the models themselves, you won't be able to find out why they produce different answers, and you won't fully appreciate the science behind what you're seeing.

Consequences and solutions

It's pretty obvious that these factors create a host of potential problems, but Young provided the best measure of where the field stands. In a survey of the recent literature, he found that 95 percent of the results of observational studies on human health had failed replication when tested using a rigorous, double blind trial. So, how do we fix this?

The consensus seems to be that we simply can't rely on the researchers to do it. As Shaffer noted, experimentalists who produce the raw data want it to generate results, and the statisticians do what they can to help them find them. The problems with this are well recognized within the statistics community, but they're loath to engage in the sort of self-criticism that could make a difference. (The attitude, as Young described it, is "We're both living in glass houses, we both have bricks.")

Shaffer described how there were tools (the "family-wise error rate") that were once used for large studies, but they were so stringent that researchers couldn't use them and claim much in the way of positive results. The statistics community started working on developing alternatives about 15 years ago but, despite a few promising ideas, none of them gained significant traction within the community.
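For reference, the best-known family-wise tool is the Bonferroni correction, which simply tests each of m hypotheses at a threshold of 0.05/m; a small sketch with made-up p-values shows why it becomes so stringent as the number of tests grows:

```python
# Bonferroni control of the family-wise error rate: test each of m hypotheses
# at alpha/m so the chance of even one false positive stays below alpha.
p_values = [0.001, 0.004, 0.02, 0.03, 0.2]   # hypothetical results from 5 tests
alpha = 0.05
cutoff = alpha / len(p_values)
print([p for p in p_values if p < cutoff])   # only the first two survive; with
                                             # 8,800 tests the cutoff would be
                                             # roughly 0.0000057
```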

Both Moolgavkar and Young argued that the impetus for change had to come from funding agencies and the journals in which the results are published. These are the only groups that are in a position to force some corrections, such as compelling researchers to share both data sets and the code for statistical models.

Moolgavkar also made a forceful argument that journal editors and reviewers needed to hold studies to a minimal standard of biological plausibility. Focusing on studies of the health risks posed by particulates, he described one study indicating that the particulates in a city's air were as harmful as smoking 40 cigarettes daily, and another concluding that particulates have a significant protective effect against cardiovascular disease. "Nobody is going to tell you that, for your health, you should go out and run behind a diesel bus," Moolgavkar said. "How did this get past the reviewers?"

But, in the meantime, Shaffer seemed to suggest that we simply have to recognize the problem and communicate it to the public, so that people don't leap to health conclusions each time a new population study gets published. Medical researchers recognize the value of replication, and they don't start writing prescriptions based on the latest gene expression study—they wait for the individual genes to be validated. As we wait for any sort of reform to arrive, caution, and explaining to the public the reasons for this caution, seems like the best we can do.

Nice article- at work I still get the response that p<0.05 always means significance even when I can run a Monte Carlo sim and show that p<0.05 will happen randomly. I get statistical certainty of course- I always find p<0.05 for this result in my simulations.
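In the same spirit as the commenter's simulation (a sketch, not their actual code): draw two groups from the same population many times and count how often a t-test declares the difference "significant."

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, false_positives = 10_000, 0
for _ in range(n_sims):
    a = rng.normal(0, 1, 30)   # both groups come from the same population,
    b = rng.normal(0, 1, 30)   # so any "significant" difference is spurious
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1
print(false_positives / n_sims)   # ~0.05: p < 0.05 turns up by chance alone
```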

It sounds like these aren't experimental studies from what I can pick up in this summary (look at the 6th paragraph).

Why aren't we just lowering the 5% standard for statistical significance to, say, 1%? There's no natural law setting the threshold for significance at 5%; it was set by convention. And obviously, as laid out here, the issue is false positives, not false negatives. We seem to have more than enough researchers and studies that we can afford to err on the side of false negatives.

The other issue is that people don't publish negative results under the assumption that they aren't news - this is what creates the bias for positive results. Well, we all have databases in the cloud now. Let people upload results that don't work out and put that on the CV. And then, eventually, people won't keep testing until they randomly stumble into something that isn't there.

This issue came up for the HIV/AIDS vaccine trial in Thailand last fall. There was a lot of noise about a "successful" vaccine, even though the results were statistically significant only by a scant margin and only for one statistical test. Moreover, the vaccine seemed to be acting at a different step in the disease progression than the one it had been designed for.

It should have been more widely asked how many other trials had searched and failed, which would help to assess the significance of that positive result.

It's quite simple: we should use statistical research only to guide clinical trials. The stats are very good at eliminating hypotheses - if you can't find any correlation between two factors, there is very little chance that there is a cause-and-effect relationship.

So test your hypothesis with statistical analysis, if it survives that, you can then go forward and attempt to establish causality with a clinical trial. No cheating, only double blinded, placebo controlled studies need apply.

I think another problem is the media outlets that want to sensationalize every little tidbit of information. The media want attention, so if they think exaggerating a study will draw viewers, they'll do it. It's almost like a domino effect: the researchers want results, the publishers want sales, the media want viewership, etc. In effect they all end up reporting statistics as facts, and being refuted 6 months later, which again the media reports.

You could just use a p-value and let the person who reads the work determine whether they think it is legitimate.

In the case of the study they cited that had a multitude of outcome variables, I would say that could be a very good thing to have. Some of the effects on the outcomes you're studying may, by chance, turn out opposite to the majority, but you can still glean which way the whole study is going (as long as the outcome variables are fairly similar).

What is more concerning is the large number of studies that are non-reproducible. Occasionally, just by chance, you should come across a study that is wrong, but 95 percent seems high. I would be interested to take a closer look at that survey paper.

It is not uncommon for very good and intelligent doctors to lack the basic ability to do statistics and (even worse) the basic understanding of the ideas involved, yet still be involved in studies. My mother is a doctor and decided (kindly) to try to help me get published since she was the primary clinician on a very large-scale study and had access to a lot of data that could be studied (I'm a PhD student in math). It was very hard for me to explain to her that I don't really have the background in statistics to "make an equation for the likelihood of getting a certain disease," and, what's more, she didn't understand the basic ideas behind that approach. It was very difficult for her to understand that I couldn't really produce a new statistical model and expect it to have any true validity until it had been tested extensively in the future; instead I tried to explain that I could only really test the previous models produced from other data sets to see their validity (I mean, I could come up with a polynomial expression that exactly fits any finite data set). Needless to say, I was often very frustrated in this endeavor.
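The parenthetical point, that a high-enough-degree polynomial passes exactly through any finite data set without that telling you anything about validity, is easy to demonstrate:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 0.2, 7.7, 2.4, 9.9])        # any five numbers at all
coeffs = np.polyfit(x, y, deg=len(x) - 1)      # degree-4 polynomial: 5 coefficients
print(np.allclose(np.polyval(coeffs, x), y))   # True: a perfect "fit" with zero
                                               # guarantee of predictive validity
```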

Now that's not to say that the statisticians working on these questions don't know what they're doing, but it's important to realize that many people who are very intelligent and successful lack very basic mathematical/reasoning skills.

And this doesn't even get into the question of how slowly trials go through and how difficult it is to make a model and the questions of whether there is any basis for the assumptions in the models, etc...I guess in conclusion this is hard shit that requires people of many different fields to work together so much that few individuals can truly understand the details of the work from start to finish. So while we shouldn't discount research or believe they're just out for fame (or whatever other stupid crap you hear), we should also recognize how complex our bodies (and other things) are and that mistakes will surely be made.

Interesting article. I don't have time for a huge post, so here are a few random thoughts (of dubious quality ) after reading it:

- The problem of increased errors from multiple comparisons/statistical tests is a pretty old one. As mentioned in the article, the usual ways of handling it (controlling for familywise error rates) are extremely conservative. Newer (but not exactly "new") techniques try to control for something called the "false discovery rate" instead, which puts a limit on the percent of Type I errors acceptable to the researcher in exchange for increased power to detect real effects. These are presumably the "alternate methods" mentioned in the article (there's a sketch of one such procedure at the end of this comment). They haven't gained much traction because i) they aren't always easy to use in most of the current major statistical packages, and ii) in my experience, they aren't widely taught or acknowledged, and (from what I hear) journals don't really expect them. There's also no set standard for what constitutes a "good" level of control; if you are doing purely exploratory research, you may not want to trade ANY level of statistical power for lower error rates, while in other settings, the reverse is true. It's totally dependent on the goal of the study. So long as researchers are up front about their decisions and the possible consequences, I don't consider it a huge issue. For example, it would be nice if authors included statements like "I found 5 significant correlations in my matrix; however, there is a 20% chance that at least one of those was due to chance."

- In several fields (especially social sciences, but other fields as well) there has been a push in recent years to focus a bit less on tests of statistical significance (which are affected by things like sample size) and a bit more on effect sizes. The idea here is that the null hypothesis is almost never "true," and that the .05/.01 significance levels are really arbitrary numbers that have become the default in science over time for historical reasons. The real questions people care about are: what direction is the relationship in, how big is it, and what are the plausible ranges of the estimated parameters (e.g., confidence/credibility intervals) given the data? (There's a second sketch illustrating this at the end of the comment.)

- I'm a little confused about the way the article uses the term "model" here. When I think of a statistical model, I think of some function like y = b*x + e, e ~ N(0, t); models in this sense of the word are typically described in enough detail in articles (at least, in my experience). Providing the code is nice, but I'm not sure what that will give you above and beyond the model. What the articles *sometimes* fail to provide is detailed information on the specific algorithms used to estimate the models (e.g., REML, ML; stopping/fitting rules used by the algorithms to decide when they've converged, etc.); sure, that kind of info would be nice, but I can't see it being useful for many people. I guess I'm not clear on what the point/complaint is here.

...

Finally, a question: did they provide a cite for the "95 percent of observational studies couldn't be replicated" info? That's pretty shocking, and I'd love to read about how they came up with that number, what they found, etc.
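Regarding the first bullet above: the procedure most people mean by "controlling the false discovery rate" is the Benjamini-Hochberg step-up test; here is a bare-bones sketch with hypothetical p-values (not any particular study's data):

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected with the false discovery rate held at q."""
    p = np.asarray(p_values)
    order = np.argsort(p)
    m = len(p)
    # Find the largest k with p_(k) <= (k/m) * q; reject it and everything smaller.
    passes = p[order] <= (np.arange(1, m + 1) / m) * q
    if not passes.any():
        return np.array([], dtype=int)
    k = np.nonzero(passes)[0].max()
    return np.sort(order[: k + 1])

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74]))  # -> [0 1]
```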
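And on the second bullet, a sketch (again with invented numbers) of why effect sizes and intervals carry more information than a bare p-value: with a big enough sample, even a trivially small difference clears the significance bar.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(100.0, 15.0, size=50_000)   # very large samples make even a
treated = rng.normal(100.5, 15.0, size=50_000)   # tiny true difference "significant"

t, p = stats.ttest_ind(treated, control)
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
d = diff / np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)   # Cohen's d

print(f"p = {p:.2g}; difference = {diff:.2f} "
      f"(95% CI {diff - 1.96*se:.2f} to {diff + 1.96*se:.2f}), Cohen's d = {d:.3f}")
```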

They're a kludge for faking it because we don't really know what's going on (as evidenced by "This drug is thought to work by..." ads). This is the FDA's fault. Before the FDA, medicine was released slowly onto the market, and people were careful. Doctors wouldn't prescribe until they were satisfied. Now, the FDA stamps approval, a massive marketing campaign begins, and the drug is sold like hot cakes immediately. Guess what? Per capita, fewer people died without the FDA than do with it as a result of bad drugs. And in the unregulated drug market, fewer people die or have complications than in the regulated market, per capita. Get rid of the FDA yesterday; it's killing people both by releasing drugs that kill people and by delaying drugs that could save people.

Been saying all of this for years and being made fun of that I'm not being rational, but every single article continues to prove me right.

Note that the hockey stick artificially introduces a 1 degree drop at the beginning to get a trend line that goes up. Otherwise, no matter what you do, you get a flat line for the most part as the current trend. Which of course was bad for AGW, so they screwed with the data. (The second line is a more reasonably corrected result, which shows no trend at all.) (This is real, from the computer code released in Climategate, btw.)

Wow, it's easy to tell who the anti-science people are here. Statistics is the heart of science and scientific thought; it's what turns observations into arguments. Sure, it can be done incorrectly, but so can any proof. Misapplication does not invalidate the procedure. Sheesh.

The idea that the p-value becomes a poor test as the number of tests increases is a concept covered within the first few chapters of most bioinformatics texts, so this shouldn't be surprising for those in biological computing. This is why programs like BLAST use the E-value instead of a p-value or Z-score. I'm not a clinician, but I assumed it would be covered in other texts as well. Clinical research is far more important than anything I work on, so the fact that statistics aren't well covered for clinicians is a little scary, particularly if it's beginning to render the research unreliable.
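For readers who haven't run into it: BLAST's E-value is the number of equally good hits you'd expect by chance given the size of the search, and (under the Poisson model BLAST assumes) it relates to a p-value by p = 1 - exp(-E), so the multiple-comparison burden is carried explicitly rather than hidden.

```python
import math

def p_from_e(e_value):
    """p-value for seeing at least one chance hit, given an expected count (Poisson)."""
    return 1 - math.exp(-e_value)

for e in (10, 1, 0.05, 1e-6):
    print(f"E = {e:g}  ->  p = {p_from_e(e):.6f}")
# Large E (many chance hits expected) gives p near 1; for tiny E, p is about E.
```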

I'm not a great statistician, but I see the importance of using statistics, and even though it would irk some scientists, I suggest making probability and statistics a part of all graduate curricula in the natural sciences/engineering. This could still be mandated by the funding agencies through stipulations on training grants, but it seems a lot more effective than requiring current PIs to take time to learn proper statistics. Since graduate/junior students are doing much of the legwork anyway, I propose training them directly and then letting them "educate" their PIs.

"Statistics is the heart of science and scientific thought; it's what turns observations into arguments. "

No argument with your sentence, but if you leave your observation as statistical instead of doing a real science experiment where you control all of the variables you're not doing science, you're being lazy.

Doing a study where you give people a drug (and some no drug) and watch what happens doesn't tell you anything other than that those people experienced those issues, and you don't have a clue why. This is not science. Science is looking at the drug at a molecular level, understanding how it works directly, and seeing how it interacts with all bodily functions. That's hard to do, so they take a short cut, which is the problem with statistics. You're simply saying "probably this is the right answer" when you don't have a clue why. It's called guessing with math, not science.

llama-line "Statistics is the heart of science and scientific thought"

No... No it's not. The heart of science is repeatability through observation, experimentation, or derivation. Statistics can provide a clue to which direction to head, but it is not what science is based on. To think so is ignorant.

Observational and meta-analysis studies are generally acknowledged to be of limited value by well trained physicians as they have all the problems the author points out and more. Randomized, well powered, blinded placebo controlled trials are generally considered the gold standard and for the most part generate the only evidence worth considering.

The need for statistics is to eliminate the influence of confounding variables. These are variables that interfere but cannot be directly controlled or accounted for. We simply do not have the knowledge of many systems to be able to control for all variables. A drug that perfectly interferes with tumor growth may also affect the immune system in unpredictable ways. Similarly, many data sets are tainted, and we'd like to understand something--sometimes anything--from the data at hand. Temperature data fits this description, and statistics can help eliminate irrelevant contributions to trends (if applied correctly). If we knew everything in these situations, we may not need statistics, but, unfortunately, we remain largely ignorant of large, complicated (chaotic) systems.

Even beyond that, statistics can also point scientists in new directions of investigations; if something doesn't fit statistically within a model, it can help us know that the model isn't sufficient.

And the prospect that you don't know enough about climatology to judge has never occurred to you? Because calling AGW "nothing more than an exercise in statistics" suggests to me that you have no idea what you're talking about when it comes to this topic.

llama-line "Statistics is the heart of science and scientific thought"

No... No it's not. The heart of science is repeatability through observation, experimentation, or derivation. Statistics can provide a clue to which direction to head, but it is not what science is based on. To think so is ignorant.

I think the link is stronger than simply a "clue." It is precisely what helps us place bounds on whether what we see is likely to have been caused by the things we're controlling. Saying that "b always changes like so when I increase a" is an act of statistical reasoning. And not a very good one, unless we also add in how many times the behavior was observed. Statistics is one of the truest garbage-in garbage-out disciplines there is, which is why we should be so motivated to do it as carefully as possible. I include as part of that making careful assumptions, and stating them explicitly. "Statistically significant" is shorthand for something more complex, one that has unfortunately been elevated to too high a position.

Geminiman wrote:

No argument with your sentence, but if you leave your observation as statistical instead of doing a real science experiment where you control all of the variables you're not doing science, you're being lazy.

Sometimes it isn't possible or desirable to control certain parameters in an experiment. For example, if trying to do so is likely to introduce more bias into the models (or noise into the observations, etc.) than simply drawing from a more general population. I work with a method where it is possible, in principle, to collect 3 dimensional scattering information from a sample. In practice, however, the amount of data and time required to collect, store, and analyze it makes it unfeasible. Performing angle-averaging and ending up with a 1-dimensional version takes a fraction of a second. Both versions are statistical, and we are choosing the one with the best statistical properties because we don't know a better way. This isn't lazy, it is the only way forward. What would be lazy is knowing reasonable steps one could take to get better statistics, and not taking them.

The problem could stem from Statistics being a bit mind-bending to Science-minded folks. Folks in Science are creative, analytical, etc. And they spend a lot of time learning chemistry, anatomy, etc. But when you throw them into a Statistics class, you will notice a point when it just starts flying over many of their heads. There's just this threshold where their brain has a hard time comprehending more advanced mathematics, even though they excel at understanding complex chemistry and biology.

Statistics was required for Nursing degrees at the school I'm attending. I was in the class for Business (I was the ONLY one in the class taking it for Business, which in and of itself is also disturbing). After we passed mean, median, and mode, and started getting into p-values, hypothesis testing, etc., you could just see the confusion on the faces of 90% of the class.

I guess to make an analogy, it's like with programming, where you get some folks who are very creative with high-level programming languages, but when they take a fundamental C programming class, some just hit a brick wall when they get to pointers.

Science & Statistics are both specializations that benefit from a certain mind-set, and that's why folks who can understand both are so valuable when you put them into things like Bioinformatics. They bridge the gap between two mountains. They're able to figure out Taguchi-style statistical tests based off a few previous tests, and figure out what variables would be a waste of time to pursue and which are the fruit. They save tons of time and money.

Unfortunately, a lot of Science-oriented businesses tend to neglect how useful this can be, just as most businesses in general don't bother hiring a statistician to help out with process control and market analysis. The folks "in charge" just can't grasp what's going on. They see it as "magic", and just as a King would have a wary trust of his court magician, that's how others view statisticians. They don't truly appreciate what a good statistician can bring to the table, because they just don't get it. They don't get how crunching a few numbers in complex fashion can get these course-altering results. Instead, they'd rather toss tons of money and manpower at retesting things over and over in a fashion that does make sense to them.

The issue is that some researchers, not all, fail to account for multiple testing.

However, this article explains this very important distinction quite poorly. The article glosses over the fact that adjustment for multiple testing exists and is widely employed, and instead suggests that it's something that we (as researchers) gave up on 15 years ago because it was too hard.

Um, no.

Sure more statisticians/researchers need to be employing adjustment methods, but that doesn't discredit every study ever done as this article so absurdly suggests. Heck, most of the time you are only concerned with the primary outcome anyway so performing multiple tests (even without adjustment) on secondary outcomes for hypothesis-generating purposes is acceptable.

This particular point is almost laughable:

"In a survey of the recent literature, he found that 95 percent of the results of observational studies on human health had failed replication when tested using a rigorous, double blind trial."

Care to share those results? Failed replication how? Failed to replicate the same direction of effect, magnitude, what?

Statistics is one of the truest garbage-in garbage-out disciplines there is, which is why we should be so motivated to do it as carefully as possible. I include as part of that making careful assumptions, and stating them explicitly. "Statistically significant" is shorthand for something more complex, one that has unfortunately been elevated to too high a position.

This is the point: statistics is the technology for drawing numerical inferences from the data -- nothing more, nothing less. These numerical inferences can be compared against physical model predictions and clue us into better models. Every experimenter does this.

Statistics *done properly* tells you what information you have, and what information you don't have. The problem in the article is that medical researchers are attempting to extract information that isn't there.

...he found that 95 percent of the results of observational studies on human health had failed replication when tested using a rigorous, double blind trial.

A simple analysis of that statement would imply that observational studies provide no statistically significant benefit. Which, in a way, is true. Without further research, the results of observational studies provide scant benefit. But observational studies provide clues that should then be followed up with more rigorous methods. It is only the results of those more rigorous studies that provide real medical benefit.

Think about that next time they talk about a new study on the news that shows some food is bad/good for you.

while what you are reporting may well have been the gist of the session, I'm a bit surprised that you appear to present it as the status quo of medical scientific publishing ...

To some extent, the papers that make it through peer review are a reflection of the reviewer pool available to a journal, and there will be a lag until a critical mass of reviewers is available that meets current levels of scientific knowledge, but the field has certainly evolved beyond what you describe, IMHO.

Certainly with the advent and rapid proliferation of genomics and array-based methods (now over 10 yrs ago), where statisticians had to basically reinvent their field to change from "a handful of tests on hundreds of probands" to "a handful of samples and tens of thousands of 'tests'," standards of acceptable statistical methods have changed dramatically, as has the recognition of their necessity. Numbers like that essentially killed the usefulness of simple tricks like a Bonferroni correction for multiple testing, and cross-validation methods have taken over, as has the need for independent test sets to confirm initial results.
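As a toy version of that shift (made-up data, not any particular genomics pipeline): pick the "best" probe using only part of the samples, then check it on samples that played no part in the selection.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5000))   # 100 samples, 5,000 "probes" of pure noise
y = rng.normal(size=100)           # an outcome with NO real relationship to X

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

train, test = slice(0, 70), slice(70, 100)
# Pick the probe that looks best on the training samples alone...
best = max(range(X.shape[1]), key=lambda j: abs(corr(X[train, j], y[train])))
print("correlation in training data:", corr(X[train, best], y[train]))
# ...then check it on the held-out samples that never influenced the choice.
print("correlation in held-out data:", corr(X[test, best], y[test]))
# The first number looks striking; the second typically collapses toward zero.
```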

Dr. Jay, when you can actually run a few experiments on the climate and set up a control planet, get back to me.

So, I take it from this that you don't believe anything about the rest of geology or evolution either?

Don't forget cosmology and most of the rest of astronomy.

Those cosmologists just report what they do because they're trying to get more grants, and Matt Clary has proof (they keep on getting grants, don't they? Isn't that...interesting? hmmmmm?). Between this and Geminiman, I'd laugh if I could only stop crying long enough.

Dr. Jay, when you can actually run a few experiments on the climate and set up a control planet, get back to me.

So, I take it from this that you don't believe anything about the rest of geology or evolution either?

Seriously, to compare AGW to evolution is to do an extreme disservice to the folks that study and have attempted to understand the subject. Of course, I find evolution to be implausible, but to compare it to AGW is a low blow indeed. There is little to no "science" behind AGW because much of the data is biased. Get rid of the bias and the alarming predictions vanish into the noise. But feel free to believe otherwise - I don't take it personally.

while what you are reporting may well have been the gist of the session, I'm a bit surprised that you appear to present it as the status quo of medical scientific publishing ...

To some extent, the papers that make it through peer review are a reflection of the reviewer pool available to a journal, and there will be a lag until a critical mass of reviewers is available that meets current levels of scientific knowledge, but the field has certainly evolved beyond what you describe, IMHO.

Certainly with the advent and rapid proliferation of genomics and array-based methods (now over 10 yrs ago), where statisticians had to basically reinvent their field to change from "a handful of tests on hundreds of probands" to "a handful of samples and tens of thousands of 'tests'," standards of acceptable statistical methods have changed dramatically, as has the recognition of their necessity. Numbers like that essentially killed the usefulness of simple tricks like a Bonferroni correction for multiple testing, and cross-validation methods have taken over, as has the need for independent test sets to confirm initial results.

And, no, I'm not surprised that 95% of bad studies were not reproducible... ;-)

I think the panel was mostly focused on health risks, rather than current array studies, so the state of the art there probably got shorted - as did astronomy for that matter. It was used as an example of the sort of massive data sets we have, but they didn't delve into some of the problems and solutions that field is working with.

I'm not sure if I agree with the suggestion that genomics and array publication has changed due to recognition of statistical issues, though. I tend to view it as following a track similar to gene sequencing. At first, sequencing a gene was so hard, and had been done so rarely, that you could get a paper just for getting another one. Over time, however, it got easier and lost its novelty. So, to get a paper, you didn't just have to sequence a gene - you had to do something with the sequence, too.

In the same way, the first arrays were novel and hard work. They got easier and less novel, so now journal editors expect more extensive follow-up. The effect is good - validation of any form is always excellent - but I'm not sure whether the recognition of statistical challenges really drove it.

And the prospect that you don't know enough about climatology to judge has never occurred to you? Because calling AGW "nothing more than an exercise in statistics" suggests to me that you have no idea what you're talking about when it comes to this topic.

I find it interesting when AGW true believers react with such arrogant presumption, because that's what lost them the popular front in the battle to enact their longed-for AGW countermeasures, and continues to lose them more ground. There was a time just a few years ago when most of the educated middle-class folks I knew were gaga over AGW and really believed the "Inconvenient Truth"-like spin the popular press put on Global Warming, and they were willing to push for government support of fairly heavy measures to curb AGW. I thought carbon taxing and/or cap-and-trade were a foregone conclusion due to popular support from key demographics.

But then...the backlash over such arrogant presumption began to outweigh popular acceptance of AGW by these key demos. Today, the U.S. is unlikely to *ever* implement any stringent AGW countermeasures, thanks to climate scientists with attitudes like Dr. Jay's. It doesn't matter now what the talking heads at Copenhagen and future climate change summits say or agree to--not enough key politicians in the U.S. will support AGW countermeasures, because they know their key constituents will vote them out if they do. All because of the arrogance certain climate scientists show not just toward the common people, but toward other scientists and mathematicians as well.

Climate science is fast becoming an insular mutual wank-fest which alienates all other scientists, who are eventually going to get as tired of the arrogance as the general populace has and write climate science off as a fringe field of junk science run by green agendas rather than by scientific principles and honest exploration.

Besides which, as a great man once said all science is either physics or stamp collecting. It's becoming increasingly clear that many key climate scientists are just stamp collectors, and fairly shoddy ones at that.

And the prospect that you don't know enough about climatology to judge has never occurred to you? Because calling AGW "nothing more than an exercise in statistics" suggests to me that you have no idea what you're talking about when it comes to this topic.

I find it interesting when AGW true believers react with such arrogant presumption

You seem to be the one presuming - namely, you assume that I don't already have a detailed knowledge of Mr. Clary's posting history, including his lack of interest in defending any of his claims, limited understanding of the science behind climate change, and tendency to derail discussions of unrelated matters by making inflammatory and inaccurate comments about climate change.

Dr. Jay, when you can actually run a few experiments on the climate and set up a control planet, get back to me.

So, I take it from this that you don't believe anything about the rest of geology or evolution either?

Seriously, to compare AGW to evolution is to do an extreme disservice to the folks that study and have attempted to understand the subject. Of course, I find evolution to be implausible, but to compare it to AGW is a low blow indeed. There is little to no "science" behind AGW because much of the data is biased. Get rid of the bias and the alarming predictions vanish into the noise. But feel free to believe otherwise - I don't take it personally.

Show us the bias, then. It should be fairly easy with the use of modern statistical methods, and even if there's some 'agenda' which prevents publication in recognized journals (despite there being no evidence of papers being rejected without good reason), they should be all over the internet.

No, blog posts don't count. No, Watts giving me some photos, an analogy to water turbidity, and a complaint that NASA didn't analyze enough stations does not count. McIntyre and McKitrick's reply to Mann's '98 paper would be a great example of what I'm looking for, but that sort of work doesn't appear to be very common, for some reason.

Besides which, as a great man once said all science is either physics or stamp collecting. It's becoming increasingly clear that many key climate scientists are just stamp collectors, and fairly shoddy ones at that.

Rutherford was just as wrong as he was pithy when he said that. If that's your impression of how science goes (everything reduces to physics or it's 'stamp collecting' in the Linnean model) then you haven't studied the history of science much, and certainly have ignored the past 50 years of philosophy of science.

We could talk about reduction to physics if you're seriously interested, and it would even be on-topic, since the likely candidate for 'most famous reduction to physics' is Boltzmann's reduction of thermodynamics to statistical mechanics. Let us know.