Replicability Rankings of Eminent Social Psychologists

Social psychology has a replication problem. The reason is that social psychologists used questionable research practices to increase their chances of reporting significant results. The consequence is that the real risk of a false positive result is higher than the stated 5% level in publications. In other words, p < .05 no longer means that at most 5% of published results are false positives (Sterling, 1959). Another problem is that selection for significance with low power produces inflated effect size estimates. Estimates suggest that published effect sizes are, on average, inflated by 100% (OSC, 2015). These problems have persisted for decades (Sterling, 1959), but only now are psychologists recognizing that published results provide weak evidence and might not replicate even if the same study were repeated exactly.
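To make the inflation mechanism concrete, here is a minimal simulation sketch in R; the sample size, true effect, and number of studies are arbitrary assumptions for illustration, not the OSC data. Filtering underpowered studies at p < .05 roughly doubles the surviving effect size estimates.

```r
# Minimal simulation of effect-size inflation under selection for significance.
# All parameter values below are made-up assumptions for illustration.
set.seed(123)
n <- 20      # participants per group (low power for this effect)
d <- 0.3     # true standardized mean difference
k <- 10000   # number of simulated studies

res <- replicate(k, {
  x <- rnorm(n, 0, 1)          # control group
  y <- rnorm(n, d, 1)          # treatment group
  tst <- t.test(y, x, var.equal = TRUE)
  # observed Cohen's d recovered from the t-statistic, plus the p-value
  c(d = unname(tst$statistic) * sqrt(2 / n), p = tst$p.value)
})

sig <- res["p", ] < .05
mean(res["d", ])       # close to the true effect of 0.3
mean(res["d", sig])    # roughly twice as large: "published" estimates are inflated
```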

How should consumers of empirical social psychology (textbook writers, undergraduate students, policy makers) respond to the fact that published results cannot be taken at face value? Jerry Brunner and I have been working on ways to correct published results for the inflation introduced by selection for significance and questionable research practices. Z-curve estimates the mean power of studies selected for significance. Here, I applied the method to test statistics that were automatically extracted from social psychology journals, computing z-curves for more than 70 eminent social psychologists (H-index > 35).
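As a rough sketch of this pipeline, the snippet below converts a handful of made-up extracted test statistics to absolute z-scores and fits a z-curve. It assumes the CRAN zcurve package, which implements the method; the actual rankings may have been produced with different code.

```r
# Sketch of the extraction-to-estimation pipeline (test statistics are invented).
library(zcurve)

# hypothetical test statistics extracted from published articles
t_vals <- c(2.10, 2.45, 3.20); t_dfs <- c(38, 52, 44)
F_vals <- c(4.30, 6.80);       F_df2 <- c(60, 85)   # all with df1 = 1

# convert every statistic to a two-sided p-value, then to an absolute z-score
p <- c(2 * pt(t_vals, t_dfs, lower.tail = FALSE),
       pf(F_vals, 1, F_df2, lower.tail = FALSE))
z <- qnorm(p / 2, lower.tail = FALSE)

fit <- zcurve(z)   # fits the statistically significant z-scores
summary(fit)       # expected replication rate = mean power after selection
plot(fit)          # a "powergraph"-style z-curve plot
```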

The results can be used to evaluate the published results reported by individual researchers. The main information provided in the table is (a) the replicability of all published p-values, (b) the replicability of just significant p-values (defined as .05 > p > .0124, where .0124 is the two-sided p-value corresponding to z = 2.5, i.e., 2*(1 - pnorm(2.5))), and (c) the replicability of p-values with moderate evidence against the null-hypothesis (.0124 > p > .0027, corresponding to 2.5 < z < 3). More detailed information is provided in the z-curve plots (powergraphs) that are linked to researchers' names. An index below 50% suggests that these p-values would no longer be significant after adjusting for selection for significance. As can be seen in the table, most just significant results are no longer significant after correction for bias.
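Readers who want to verify the band boundaries can reproduce them in base R: the cutoffs correspond to z = 1.96, 2.5, and 3, and the quoted p-values follow from the two-sided conversion. The example p-values below are invented for illustration.

```r
# two-sided p-values at the band boundaries
2 * (1 - pnorm(1.96))   # ~.0500, the conventional significance threshold
2 * (1 - pnorm(2.5))    # ~.0124, upper bound of "just significant"
2 * (1 - pnorm(3.0))    # ~.0027, boundary of "moderate evidence"

# classifying published p-values into the three bands
p <- c(.049, .020, .010, .004, .001)
cut(p, breaks = c(0, .0027, .0124, .05),
    labels = c("strong", "moderate", "just significant"))
```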

Caveat: Interpret with Care

The results should not be overinterpreted. They are estimates based on an objective statistical procedure, but no statistical method can compensate perfectly for the various practices that produced the observed distribution of p-values (transformed into z-scores). However, in the absence of any other information about which results can be trusted, these graphs provide some information. How this information is used ultimately depends on consumers' subjective beliefs. Information about the average replicability of researchers' published results may influence these beliefs.

It is also important to point out that a low replicability index does not mean that researchers committed scientific misconduct. There are no clear guidelines about acceptable and unacceptable statistical practices in psychology. Z-curve is not designed to detect scientific fraud. In fact, it assumes that researchers actually collect data but conduct analyses in ways that increase the chances of producing a significant result. The bias introduced by selection for significance is well known and considered acceptable in psychological science.

There are also many factors that can bias results in favor of researchers' hypotheses without researchers' awareness. The bias evident in many graphs therefore does not imply that researchers intentionally manipulated data to support their claims. For this reason, I attribute the bias to unidentified researcher influences. It is not important to know how the bias occurred; it is only important to detect it and to correct for it.

It is necessary to do so for individual researchers because bias varies across researchers. For example, the R-Index for all results ranges from 22% to 81%. It would be unfair to treat all social psychologists alike when their research practices are a reliable moderator of replicability. Providing personalized information about replicability allows consumers of social psychological research to avoid stereotyping social psychologists and to take individual differences in research practices into account.

Finally, it should be said that producing replicability estimates is itself subject to biases and errors. Researchers may differ in how they select the hypotheses they report. A more informative analysis would require hand-coding of researchers' focal hypothesis tests. At the moment, R-Index does not have the resources to code all published results in social psychology, let alone other areas of psychology. This is an important task for the future. For now, automatically extracted results have some heuristic value.

One unintended and unfortunate consequence of making this information available is that some researchers' reputations might be negatively affected by a low replicability score. This cost has to be weighed against the benefit to the public and the scientific community of obtaining information about the robustness of published results. In this regard, the replicability rankings are no different from actual replication studies that fail to replicate an original finding. The only difference is that replicability rankings use all published results, whereas actual replication studies are often limited to a single study or a few studies. While replication failures in a single study are ambiguous, replicability estimates based on hundreds of published results are more diagnostic of researchers' practices.

Nevertheless, statistical estimates provide no definitive answer about the reproducibility of a published result. Ideally, eminent researchers would conduct their own replication studies to demonstrate that their most important findings can be replicated under optimal conditions.

It is also important to point out that researchers have responded differently to the replication crisis that became apparent in 2011. It may be unfair to generalize from past practices to new findings for researchers who have changed their practices. If researchers preregistered their studies and followed a well-designed registered research protocol, new results may be more robust than a researcher's past record suggests.

Finally, the results show evidence of good replicability for some social psychologists. Thus, the rankings avoid the problem of selectively targeting researchers with low replicability, which can lead to a negative bias in evaluations of social psychology. The focus on researchers with a high H-index means that the results are representative of the most influential research in the field.

If you believe that you should not be listed as an eminent social psychologist, please contact me so that I can remove you from the list.

If you think you are an eminent social psychologist and want to be included in the ranking, please contact me so that I can add you to the list.

If you have any suggestions or comments about how I can make these rankings more informative, please let me know in the comments section.

*** *** *** *** ***

REPLICABILITY RANKING OF EMINENT SOCIAL PSYCHOLOGISTS
[sorted by R-Index for all tests from highest to lowest rank]
