Yes, many psychology findings may be “too good to be true” – now what?

Today, Science published the first results from a massive reproducibility project, in which more than 250 psychology researchers tried to replicate the results of 100 papers published in three psychology journals. Despite working with the original authors and using original materials, only 36% of the replications produced statistically significant results, and more than 80% of the studies reported a stronger effect size in the original study than in the replication. To the authors, however, this is not a sign of failure – rather, it tells us that science is working as it should:

Humans desire certainty, and science infrequently provides it. As much as we might wish it to be otherwise, a single study almost never provides definitive resolution for or against an effect and its explanation. The original studies examined here offered tentative evidence; the replications we conducted offered additional, confirmatory evidence. In some cases, the replications increase confidence in the reliability of the original results; in other cases, the replications suggest that more investigation is needed to establish the validity of the original findings. Scientific progress is a cumulative process of uncertainty reduction that can only succeed if science itself remains the greatest skeptic of its explanatory claims.

We discussed these findings with Jelte M. Wicherts, Associate Professor of Methodology and Statistics at Tilburg University, whose research focuses on this area but was not a co-author on the paper.

-Are you surprised by any of the results?

Overall, I am not surprised by these findings. We already have quite strong indirect and direct evidence for the relatively high publication and reporting biases in psychology (and other fields), implying that many findings in the psychological literature are too good to be true. Given the smallness of most study samples in social and cognitive psychology and the subtlety of most of the effects that we study, we cannot expect to obtain a significant outcome almost every time, as the literature appears to suggest. There are surveys in which most psychological researchers admit to making opportunistic decisions in collecting and analyzing data, and in presenting their results. Estimates of the proportion of completed psychological studies being published are lower than 50%. Many meta-analyses show signs of publication bias (small study effects). And there is recent direct evidence showing that researchers report selectively in psychology and related fields. And this collective problem is psychologically and socially understandable given that (at least, researchers think that) journals will publish only significant results.

This new study corroborates that effects reported in the psychological literature are often inflated, and the most likely culprits are publication bias and reporting biases. Some results appear to replicate just fine, while further research will be needed to determine which original findings were false positives (quite likely given these biases) and which effects turn out to be moderated by some aspect of the study. One effect went from strongly positive in the original to strongly negative in the replication.
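To see how publication bias alone can inflate reported effects, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration and not taken from the project: a true standardized effect of 0.2, 20 participants per group, a normal approximation to the significance test, and the rule that only significant studies get "published". The function names are hypothetical.

```python
import random
import statistics


def simulate_study(true_d, n_per_group, rng):
    """Simulate one two-group study; return (observed_d, significant)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    # Pooled standard deviation for equal group sizes
    pooled_sd = ((statistics.variance(control) + statistics.variance(treated)) / 2) ** 0.5
    observed_d = (statistics.mean(treated) - statistics.mean(control)) / pooled_sd
    # Normal approximation to the two-sample test: |z| > 1.96 is roughly p < .05
    z = observed_d / (2 / n_per_group) ** 0.5
    return observed_d, abs(z) > 1.96


def publication_bias_demo(true_d=0.2, n_per_group=20, n_studies=2000, seed=1):
    """Compare the mean effect over all studies with the mean over 'published' ones."""
    rng = random.Random(seed)
    results = [simulate_study(true_d, n_per_group, rng) for _ in range(n_studies)]
    all_d = [d for d, _ in results]
    published_d = [d for d, significant in results if significant]
    share_published = len(published_d) / n_studies
    return statistics.mean(all_d), statistics.mean(published_d), share_published


if __name__ == "__main__":
    mean_all, mean_published, share = publication_bias_demo()
    print(f"mean effect, all studies:       {mean_all:.2f}")
    print(f"mean effect, significant only:  {mean_published:.2f}")
    print(f"share of studies significant:   {share:.2f}")
```

With samples this small, a study can only reach significance if its observed effect is far larger than the assumed true effect, so the "published" average overshoots even though no individual study did anything wrong.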

-It’s easy to get discouraged by these findings and lose faith in research results. How do you respond to that?

Because we as a research group find so many errors in reporting of results and such clear signs of publication bias in the literature, we run the risk of becoming overly negative. This result shows that the field is actually doing pretty well; this study shows that psychology is cleaning up its act. Yes there are problems, but we can address these. Some results appear weaker or more fragile than we expected. Yet many findings appear to be robust even though the original studies were not preregistered and we do not know how many more similar studies were put in the file drawer because their results were less desirable.

-A criticism we’ve heard of replication efforts is that it’s very difficult for a new group of people to gain the skills and tools to do the same study as well as the original authors, so a perfectly valid result may still fail to be replicated. Do you think this study addresses this criticism in any way?

The Open Science Collaborators have installed several checks and balances to tackle this problem. Studies to be replicated were matched with the replicator teams on the basis not only of interests and resources, but also of the teams’ expertise. The open data files clearly indicate the expertise of each replicator team, and the claim that a group of over 250 psychologists lacks expertise in doing these kinds of experiments is a bit of a stretch. Certainly there may be debates about certain specifics of the studies, and I expect the original researchers to point at methodological and theoretical explanations for the supposed discrepancy between the original finding and the replication (several of the original researchers responded to the final replication report, as can be seen on the project’s OSF page). Such explanations are often ad hoc and typically ignore the role of chance (given the smallness of effects and sample sizes used in most original studies, finding a significant result in one study and a non-significant result in another study may well be completely accidental), but they are to be taken seriously and perhaps studied further.

One should always report one’s methods and results in a manner that allows for independent replication; we now have many safe online locations to put supplementary information, materials, and data, and so I hope this project highlights the importance of reporting studies in a much more replicable and reproducible manner.

-Is there anything else about the study you’d like to add?

This project quite clearly shows that findings reported in the top psychology journals should not be taken at face value. The project is also reassuring in the sense that it shows how many psychological researchers are genuinely concerned about true progress in our field. The project is a good example of what open science collaboration can produce. Replication protocols were preregistered. All analyses were double-checked by internal auditors. The materials and data are all open for anyone to disagree with. It cannot get much better than this.

I hope this impressive project will raise awareness among researchers, peer reviewers, and journal editors about the real need to publish all relevant studies, by which I mean those studies that are methodologically and substantively rigorous regardless of their outcome. We need to pre-register our studies more, exercise openness with respect to the data, analyses, and materials (via OSF for instance), and publish results in a more reproducible way. Publication bias and reporting biases will not entirely go away, but at least now we know how to counter them: If we want to make sure our results are directly replicable across different locations, we team up with others and do it all over again.

42 thoughts on “Yes, many psychology findings may be “too good to be true” – now what?”

This confirms what researchers and meta-researchers have suspected for a long time. That’s a good thing, but it will mean nothing if journals shrug their collective shoulders and continue to publish flashy results that are unlikely to be reproduced. Right now I don’t see any incentives for them to change their policies. It needs some heavy-hitters (e.g., PLoS) to announce that a demonstration of adequate power will be a condition for accepting an article (and, not insignificantly in the case of OA journals, a condition for accepting the authors’ money).

Requiring demonstration of adequate power before conducting the experiment is key: to calculate the power, you have to think about, and commit to, the analysis plan (what tests are optimal?), the number of tests (what family-wise alpha level?), design aspects (e.g., consequences of unbalanced design, collinearity), and sources of missingness (planned N vs actual N). This sets a strict regime, which does not allow for arbitrary decisions made in the light of negative results (e.g., “let’s try the analysis in the males / females only”).
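As a rough illustration of what such a commitment looks like in practice, here is a minimal Python sketch of an a priori sample size calculation. It is a sketch under stated assumptions, not a full power analysis: it covers only a two-sided comparison of two group means, uses the normal approximation, expresses the effect as a standardized difference (Cohen's d), and splits the family-wise alpha across tests with a Bonferroni correction. The function name `n_per_group` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80, n_tests=1):
    """Approximate participants needed per group for a two-group comparison.

    effect_size: assumed standardized mean difference (Cohen's d)
    alpha:       family-wise type I error rate
    power:       desired probability of detecting the assumed effect
    n_tests:     number of planned tests sharing alpha (Bonferroni split)
    """
    z = NormalDist().inv_cdf
    alpha_per_test = alpha / n_tests
    z_alpha = z(1 - alpha_per_test / 2)  # two-sided critical value
    z_power = z(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.8, 0.5, 0.2):
        print(f"d = {d}: {n_per_group(d)} per group, "
              f"{n_per_group(d, n_tests=5)} per group with 5 planned tests")
```

The point of the exercise is that the planned tests, the alpha level and the assumed effect all have to be written down before any data exist, and that subtle effects demand far larger samples than are typical.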

In addition, once the planned analyses are done, researchers are free to explore the data as they see fit (e.g., alpha=.05 applied to each and every test). Any unanticipated results of interest can be reported as the upshot of “additional exploratory analyses”. So that is the win-win situation.

From a statistical point of view the findings discussed above make sense. Flashy quick publications of small studies hurried to print yield a high rate of irreproducible results and are a waste of resources.

Tenure review and grant funding committees should stop placing great weight on flashy first-off publications and demand demonstration of replicated results. Flashy studies clearly demonstrating reproducibility would be a boon, and would indicate serious scientists’ efforts to ensure that what they are claiming has some validity. Such papers would take longer to develop, and require more resources. Better funding of fewer such studies would help bolster trust in the scientific process by those who ultimately fund it, the tax paying populace.

In the New York Times, Norbert Schwarz said the following:
“There’s no doubt replication is important, but it’s often just an attack, a vigilante exercise,” said Norbert Schwarz, a professor of psychology at the University of Southern California. Dr. Schwarz, who was not involved in any of the 100 studies that were re-examined, said that the replication studies themselves were virtually never vetted for errors in design or analysis.

The paper clearly indicates that the original researchers were contacted by all replication teams and that all the analyses have been audited in the RPP. Of course, Prof. Schwarz is free to re-analyze the data of the 100 experiments by using the open data on the OSF project page and report on any errors he might find.

Professor Schwarz recently opened a can of worms with a study that showed that incidental exposure to fishy smells induces suspicion and improves detection of misleading information. That study belongs to the “social priming” tradition, which stands for everything that is “fishy” with psychology right now. http://www.sciencedirect.com/science/article/pii/S0022103115000281

I am really getting annoyed with this expression: “This result shows that the field is actually doing pretty well; this study shows that psychology is cleaning up its act. Yes there are problems, but we can address these.” which has become the mantra of every so-called expert explaining the messy state of research conduct. We are moving haphazardly on many fronts of research, and it is better to admit that the king has no clothes.

This is excellent news for those of us laymen who waste everyone’s time arguing politics or whatever using psychological research to back up our claims. Any such research used against us can now be dismissed out of hand as a probable part of the irreproducible 64%. Actually I might need to comb through the 36% to see if any of those support my pre-existing convictions 😀

I think that the conclusion of this work is inescapable: the social sciences should stop advertising themselves as “scientific” and accept that whatever it is that they do, it is not science.

The notion that humans can be studied and understood as if they were robots that operate according to deterministic rules, as is the case with planets and atoms, is so misguided that it raises the question of how people come to believe such nonsense to begin with.

As the husband of a true social scientist, I have been asked to point out that psychology, especially the variety being re-considered here, is a behavioral science, not a social science. I will leave it up to others to debate what “science” is or should be.

No one in psychology would agree with the statement that psychology assumes that “humans can be studied and understood as if they were robots”. This is hysteria and an overreach. However, it is certainly the case that rules exist. Development of young human persons occurs in a systematic way, although it is not perfectly understood nor perfectly regular. Human persons have biases and limitations that can be understood and used in improving the delivery and consumption of information. Human perception has limitations. While psychology has great limits, it has had successes as well.

Norbert Schwarz’s comments are worrying, but I think may represent a view shared by many more of the older school and the very influential in the psychology field.
In the face of calls for better research practices/checks and balances, I’ve heard defenses from psychology research organisations (at least their leadership) and senior researchers (in years) of the numerous problems with psych/social science research which beggar belief. Head in the sand may be a too generous attribution. I think many of the next generation of researchers (<15 years) really want change and are interested in 'doing real research' that will stand the test of time and rigorous (re)testing. But the current climate obviously suits many for whatever reasons.

I think it’s a bit far to demand psychology stop calling themselves a science. Even with so few replicable results, clearly some research can be done that is replicable and thus valid. Perhaps there needs to be a culture shift in what type of research is carried out or what kind of results are feasible; seeing if there is any correlation in the studies that were replicated and those that were not.

The problems with low-powered studies and associated large effects that almost certainly will not be replicated have had serious consideration before; this is one of the better synopses (although it’s officially about neuroscience, all the same issues exist in psych and more…):

Prof. Schwarz has said that he was misquoted on this – in fact he said that the RPP is the sort of replication effort that’s needed to get around any vigilantism that might occur (presumably referring to debacles like the Schnall replication drama).

Why is no one mentioning “regression toward the mean”? Isn’t this a classic case? If 10,000 studies are done on random data, then 1,000 will randomly show an effect that is 90% significant. Doing them over again will show 100 of them replicating. Doing them again will show 10….. And none of these hypothetical studies mean anything.

– ‘… an effect that is 90% significant …’ is hard to interpret
– the studies as published are not ‘done on random data’
– in the ideal world the type I error rate determines the probability of false positives
– perhaps you are referring to the ‘file-drawer’ issue: 9 out of 10 studies (say) produce no results (nothing stat significant) and remain in the drawer, while the type I error is submitted. “Doing the study until you get it right” may be tantamount to doing the study until you commit a type I error. I think that this is a well known and possibly relevant source of failure to replicate.
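The arithmetic behind the scenario debated in this exchange can be sketched in a few lines of Python. This is an illustration under a loud assumption: every study tests an effect that is truly zero, so the only route to a significant result is a type I error. The function name `expected_significant` and its parameters are made up for this example.

```python
def expected_significant(n_studies, alpha=0.05, rounds=3):
    """Expected number of studies still significant after each round.

    Assumes every tested effect is truly null (a hypothetical worst case),
    so each round keeps a fraction `alpha` of the previous round's
    significant studies.
    """
    counts = []
    remaining = float(n_studies)
    for _ in range(rounds):
        remaining *= alpha  # only type I errors survive a round
        counts.append(remaining)
    return counts


if __name__ == "__main__":
    # alpha = 0.10 corresponds to the 1,000 -> 100 -> 10 sequence in the comment;
    # the conventional alpha = 0.05 shrinks faster.
    for alpha in (0.10, 0.05):
        rounded = [round(c, 2) for c in expected_significant(10_000, alpha)]
        print(f"alpha = {alpha}: {rounded}")
```

Real literatures mix true and null effects, so the actual replication rate also depends on statistical power and on the share of true effects, which this sketch leaves out.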

To the different people who have commented on my comment: I stand by my contention that the social sciences should stop advertising themselves as scientific. In fact, I am hardly the first to have called for this https://www.youtube.com/watch?v=IaO69CF5mbY . I understand the temptation: disciplines that are truly scientific produce probes that reach Pluto or experiments like the LHC. But alas, as Richard Feynman mentions in the video, following the forms only produces “cargo cult science”. For a discipline to be scientific it needs to produce predictive theories which result in “replicable” experiments. If the subjects of your study, such as humans in social science, are not capable of being understood this way, then the results of studies are meaningless. Richard Feynman warned that all these “cargo cult scientists” do is to intimidate people and I am afraid he is 100% right.

“For a discipline to be scientific it needs to produce predictive theories which result in “replicable” experiments.”

That’s one approach to phenomena, but it does not exhaust what “science” is. It hardly characterizes meteorology, for example. “Science” might be best understood as our effort to understand what is in the world and how it works; testing those efforts is secondary, often useful, but is a little different than the basic cognitive impulse that leads members of every culture to categorize the natural world, to examine its parts, and think about how those parts are related. Psychologists have, historically, tried to understand what it is that comprises mental life: thought, emotion, perception, and so on. Testing those understandings may reveal gaps or errors, but it is the initial, systematic effort at understanding that is the science part. Geologists have this dilemma all of the time: as an observational field, geologists were doing great science long before they figured out how to construct “predictive theories” (I think you mean hypotheses; theories aren’t necessarily predictive…). I also hesitate to mention particle physics, where experiments can be so costly and time-consuming to set up and evaluate that a single experiment is often touted as sufficient, without much possibility of replication. Have we found the Higgs-Boson? How many times? Neutrinos? How’s that working as replicated science?

I agree. In fact, focusing solely on testing may detract from scientific inquiry and exploration. Of course, once we have a pretty good hunch, then we need to rigorously test, re-test and test under different conditions to establish causality and generalizability. It is so much harder to achieve reproducibility in biological systems, where individual differences play an important role in the outcome, than it is in physical sciences like physics and chemistry, where the experimenter has greater control. On the other hand, many phenomena that have been shown to work under laboratory conditions also cannot be replicated in a natural environment due to unknown confounders. That’s why it is futile to argue about what is science and what is not in this context.

So this sounds like a redefinition of the term “science”. Or at the very least, it’s a popular definition and not one fully defined with rigor. Maybe a different word is in order? (see the wikipedia entry on it: https://en.wikipedia.org/wiki/Science). As a taxpayer, I will gladly support research on reproducible cures. But never your definition of science. I prefer to provide food to starving people. It’s that simple of a choice. To your example of meteorology, I completely agree it’s not science. So let’s not call it that. That of course would explain why its forecasting results have been such dismal failures. If in your science, the results are not reproducible but you consider them truth what is your test of them being scientific? Popularity among peers? That sounds a lot like good ol’ boy networking to me–not truth.

Meteorology is certainly science, even a relatively hard one. Predictions improved a lot over time and are much much better than just looking at historical data. Try to forecast the path of a hurricane based on historical data alone, for instance.

Some experts in epidemiology and statistics have been voicing the problems plaguing psychology for some time. In my view, the question is not whether psychology is a science, but where we are going with this unchecked growth in published research on human biology or behavior, and especially why the results are mostly positive.

We consider medicine a science, but some of its guidelines are based on observational studies and some randomized trials are riddled with methodological problems – for some, replication is not even attempted due to budget constraints. The issue is haphazard science in any field. In most cases, when human behavior is involved, it is difficult to replicate results or repeat the experiments. However, a good study design would eliminate many biases (not only statistical, but systematic errors like confounding, selection and information biases) that might distort the plausible inferences from the study results. Most researchers want to do good, but don’t know how to design good studies – sadly, even more science reporters don’t know how to evaluate a study design and rush to report any publication on a sexy topic. Everyone talks about the sample size, p value and statistical analysis, but very few focus on the systematic problems and limitations to judge whether a study’s results are valid.

If we have done the best we can in designing a good study, a meta-analysis can show whether the inference from the results is generalizable. Re-analyzing the data would only point to statistical problems such as data manipulations and selective reporting. Which brings us to another major problem in the current state of affairs in research conduct: the rush to publish! The majority of researchers in academia invest their entire professional life into their area of study. Their promotions depend on the number of publications they produce. There is so much incentive and temptation to tweak things. How did we start publishing so much? Some numbers are not even humanly possible.

I am so glad that there are these efforts like Retraction Watch, Meta-Research Innovation Center and AllTrials etc. to get a handle on this madness. I am also happy that we democratized doing research and anyone can investigate any question they have, but we have to slow down doing and start thinking…. otherwise, we will lose the public’s trust, which is already hanging by a thread.

In reply to the above comments, especially Barbara Piper’s. I am not saying that the areas studied by social science or psychology shouldn’t be studied. All I am saying is that these areas of knowledge should not be given the status of “science”, in the sense that their findings have predictive value as is the case with Einstein’s General Relativity or physics’ Standard Model of particles. Neither psychology nor social science produces laws that are akin to the laws of physics. There is a clear ideology behind those who insist on equating social science with hard science: the notion that human beings are like robots that operate according to deterministic rules (like the ones that are at work in the natural world that are discovered by physics through the scientific method). Never mind that there are strong arguments as to why that is certainly not the case, from the human experience itself to more sophisticated ones like those put forward by Roger Penrose https://www.youtube.com/watch?v=f477FnTe1M0 .

A pernicious effect of this insistence is that the general public is losing trust in science as a discipline. If after all, “science”, as defended by those who equate physics with social science, produces non replicable results, why should “science” be taken seriously, any more seriously than say, astrology?

In these times of confusion, it is more relevant than ever to bring clarity to these matters. This is not to say that psychologists or social scientists should stop doing what they do. But not all knowledge is or can be scientific. The areas that can be known through the scientific method require an underlying reality that follows deterministic laws, as is the case with physics. If there is no underlying determinism (and my contention is that in the case of humans there isn’t, regardless of the patterns that can sometimes be observed), the scientific method doesn’t apply. Curve fitting is not science unless it is predictive.

The problem arises when we forget that what we discover is not the final destination and that we made many assumptions along the way. Perhaps we, as the researchers of a study, know this, but the final users seem to have lost it in translation. Part of the problem is, as Feynman mentions, that the scientific appearance of the work lends credibility to the results and to the inferences drawn from them. Why not? We follow the recipe for good science, right? 🙂

Not really, because if there is no underlying law, there is nothing to discover with the scientific method no matter how much data you collect and how much number crunching you do. The ultimate test is whether the theory predicts correct results in falsifiable experiments. No prediction, no science.

There are areas of knowledge that are just not a good fit for the scientific method. My contention is that neither the social sciences nor psychology are. Although sometimes included in social science, economics is another area of knowledge that is not scientific as far as I am concerned. Number crunching and detailed derivations full of Greek letters are irrelevant if prediction is lacking. Economics is called “dismal science” for a reason.

I would be the last to “equate” social sciences with physical sciences — I tend to agree with the famous sociologist Max Weber, who argued that social sciences try to achieve understanding, while physical sciences try to achieve explanation. That is why no social scientist that I know of — and I am a social scientist — argues or thinks that “human beings are like robots that operate according to deterministic rules.” That’s bizarre.

Christian Scientist fails to explain what is at stake in claiming the mantle of “science.” Human social behavior is complex to the point of largely defying easy understanding, and certainly resisting simplistic explanation. But it’s immensely fascinating and deeply important. We read articles on newly discovered black holes on the front page of the New York Times (no experiments, of course, just observation), but the impact of that discovery on most of our lives is negligible. Solving the Middle East crisis has eluded us for decades; even fully understanding it is often bewildering. Which issue deserves research funding? What gets more money: exploring the roots of poverty or finding another subatomic particle?

We cracked most of chemistry in the 19th century — as Piaget noted in his ‘genetic epistemology’ — and we’re almost there with physics. That’s not because those realms are harder: regularities at those levels of physical phenomena make explanation and prediction that much easier. For most social sciences (and let’s not ignore the point made by David Taylor, above, that psychology is not a social science but a behavioral science), “understanding” is a goal that is indeed replicable, and specific models of understanding are replicated over and over. Here’s a simple hypothesis that I have made use of in my work in sub-Saharan Africa: women rise in prestige and power when they control their own economic resources. A simple hypothesis that has been ‘tested’ over and over — and an extremely important one for millions of women. Including women in the U.S. who found that control over their reproductivity allowed them to gain that control over their own economic status, which produced the conservative backlash that we see today in the anti-abortion movement, attacks on Planned Parenthood, etc. Check out Faye Ginsburg’s classic work on this (including “Contested Lives: The Abortion Debate in an American Community”.) Marvin Harris predicted this years ago in his book “America Now,” based on the simple hypothesis that I mentioned. Was that a prediction and are our headlines this week a test of it? I think so.

You say that your hypothesis has been tested. How exactly has that been done? Experiments with random assignment to conditions? It is easy to find confirming evidence for our favorite theories and hypotheses when we seek it. The trick, according to Popper at least, is to seek disconfirming evidence.

Your response contains the type of contradictory reasoning that has made it possible for so much so-called “social science research” to be published although it cannot be reproduced.

On one side you agree that human beings are complex beings that defy easy explanations. On the other you claim that there are occasions where those easy explanations to complex problems exist and that we should invest money in finding them, going as far as saying that said investments are more important than investments in hard science like planetary exploration or particle physics.

What you call “understanding” is what a hard scientist, like yours truly, calls a “snapshot”. Snapshots are not totally uninteresting but almost. In hard science collection and analysis of data is relevant only insofar as it allows one to develop testable hypotheses (the “guessing thing” mentioned by Richard Feynman) or to test said hypotheses. Collection and analysis of data for its own sake is not very interesting in the context of hard science.

As to why it matters that social science stops calling itself “science”, it matters for the credibility of the scientific enterprise at large with the general public. Richard Feynman explained it well: because of the success of science, there are a lot of disciplines interested in claiming “scientific” status for themselves, yet in the process not only have said disciplines not become more “scientific”, they are poisoning the scientific enterprise at large.

Of course the gay marriage study that had to be retracted from Science shows the degree to which social science has poisoned science. Even if the study had not been fraudulent, such a study does not belong in the same magazine that publishes breakthroughs in physics, chemistry or medicine.

So called “social scientists” will never be able to eliminate poverty because the existence of haves and have nots is inherent to the human condition. However, if they publish non reproducible research that claims some magic formula will solve poverty, they might well convince politicians to implement destructive agendas, as happened with eugenics, the poster-example of social science doing pseudoscience and convincing politicians to implement said pseudoscience. We all know how things ended.

So my bottom line is this: if it were up to me, I would completely defund all public investments in so called “social science”, including psychology and economics. These fields should be able to do their work and get private funding, but I don’t see the value of spending my hard earned tax dollars in fields that produce nothing of value for society at large.

Instead, how about start by googling “confirmation bias” (it’s one of those pesky psychology ideas). Then you can come back and tut-tut about how the replicability studies just prove what you always knew…

When it comes to contradictory reasoning, I am not sure Christian Scientist sees the irony in his own statement that
“So called “social scientists” will never be able to eliminate poverty because the existence of haves and have nots is inherent to the human condition.”

This reasoning would suggest that there are inherent rules that govern the human condition. Who else but social scientists would look for such rules?

You’re on Retraction Watch. You know how many hard science papers get retracted for misconduct, fakery, or just being bullshit. The retracted gay marriage study doesn’t show “the extent to which social science has poisoned science” any more than the arsenic life paper or the STAP debacle show that biochemistry or biology have done the same.

Your strawmen about what social sciences claim to be able to do are nearly as bizarre as your idolatry of Feynman and out-of-nowhere. I’m not sure you know much about psychology if you think it doesn’t produce predictive theories or reproducible results, the perverse incentives that affect all scientific publishing aside – the Yerkes-Dodson law is an easy example of this, as well as the majority of psychophysics.

Godwin is a trick used by people that have no arguments to shut down the debate. I must say though that I am still puzzled where you see Godwin. Eugenics programs were run in the United States until the 1970s https://www.youtube.com/watch?v=Nshj9rCTPdE .

MW
Such a fun thread. I could write a monograph in reply.
Instead, how about start by googling “confirmation bias” (it’s one of those pesky psychology ideas). Then you can come back and tut-tut about how the replicability studies just prove what you always knew…

Does the existence of the construct “confirmation bias” mean that nobody is able to correctly conjecture or predict anything in advance?

I would think not.

On top of that, there is a “theory” for just about anything in psychology, regardless of the validity of the evidence that is supposed to support it. Just pick and choose your personal favorite “theory” to support your opinion!

Christian Scientist
Not really, because if there is no underlying law, there is nothing to discover with the scientific method no matter how much data you collect and how much number crunching you do. The ultimate test is whether the theory predicts correct results in falsifiable experiments. No prediction, no science.
There are areas of knowledge that are just not a good fit for the scientific method. My contention is that neither the social sciences nor psychology is. Although sometimes included in social science, economics is another area of knowledge that is not scientific as far as I am concerned. Number crunching and detailed derivations full of Greek letters are irrelevant if prediction is lacking. Economics is called the “dismal science” for a reason.

I agree that behavioural and social sciences (including social psychology and economics) have their very special problems.
However, your definition of “science” seems a little narrow. First of all: how do you know that this or that field offers no prediction? The theories might not be good enough, but maybe there are better ones out there to be discovered?

You claimed above that meteorology is no science. Really? Just because current models don’t allow for a prediction of the weather next year? Prediction for the next day has actually become pretty good. Sure, it’s not perfect, but there are strictly scientific reasons for that, e.g. weather is a chaotic system. So any research into chaotic systems is unscientific? Really?

It is unlikely that any science is perfect, not even physics. Take a close look at Heisenberg’s uncertainty principle. Here we have a theory (quantum mechanics) that tells us that certain things are not only not predictable, they are even unmeasurable by definition. Oops, physics is no science then according to your definition. And yes, I do acknowledge that Einstein hated quantum mechanics pretty much for that reason (“God doesn’t play dice”).

There is probably a lot more scientific knowledge out there that only allows for probabilistic prediction than definitive “yes or no” prediction.

Yes, my definition of science is narrow, and that’s fine. For one, I have never claimed that areas outside science should not be studied. What I advocate is restricting the word “science” to something very specific, to make sure we are comparing apples to apples when we talk about scientific theories.

To your other two points:

I didn’t claim that meteorology is not science. My opinion is that meteorology is definitely scientific, meaning, I do believe that the underlying reality meteorology studies follows deterministic laws (the laws of nature). On the other hand, I do believe that meteorology is far from being a mature, accurate science whose laws are well understood to the point of making accurate predictions. So I do believe that we shouldn’t take meteorology’s (or climate science’s for that matter) predictions as seriously as we would take the threat of a meteor crashing on Earth. A threat that in 2 years NYC will succumb to a climate event is not believable with our current understanding of meteorology and climate science. A threat that in 2 years meteor X with a big enough size could crash on Earth is believable.

With respect to the notion that quantum mechanics is not scientific because of its probabilistic formulation: again, quantum mechanics provides accurate, testable predictions in falsifiable experiments that measure quantum effects in the aggregate. Pulling the Feynman trick again, he called the theory he got his Nobel Prize for, quantum electrodynamics (https://en.wikipedia.org/wiki/Quantum_electrodynamics), “the jewel of physics” for its extremely accurate predictions of quantities like the anomalous magnetic moment of the electron and the Lamb shift of the energy levels of hydrogen. I think you don’t understand quantum mechanics and the way its predictions are tested in experiments.