Figures

Prediction market performance. Final market prices and survey predictions are shown for the replication of 44 publications from three top psychology journals. The prediction market predicts 29 out of 41 replications correctly, yielding better predictions than a survey carried out before the trading started. Successful replications (16 of 41 replications) are shown in black, and failed replications (25 of 41) are shown in red. Gray symbols are replications that remained unfinished (3 of 44).
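The count of correct predictions can be illustrated with a minimal sketch. It assumes that a final price above 0.5 is read as a prediction of successful replication and a price below 0.5 as a prediction of failure; that threshold, the function name `count_correct`, and the toy data are assumptions for illustration, not values from the study. For reference, 29 of 41 corresponds to about 71% correct.

```python
def count_correct(prices, outcomes, threshold=0.5):
    """Count markets whose final price points to the observed outcome.

    prices    -- final market prices scaled to the interval [0, 1]
    outcomes  -- True if the replication succeeded, False if it failed
    threshold -- assumed cutoff: price above it predicts success
    """
    return sum((price > threshold) == outcome
               for price, outcome in zip(prices, outcomes))


# Hypothetical toy data, not the study's prices or outcomes.
toy_prices = [0.8, 0.3, 0.6, 0.2]
toy_outcomes = [True, False, False, False]

correct = count_correct(toy_prices, toy_outcomes)
print(f"{correct} of {len(toy_prices)} predicted correctly")
```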

Relationship between market price and prior and posterior probabilities p0, p1, and p2 of the hypothesis under investigation. Bayesian inference (green arrows) assigns an initial (prior) probability p0 to a hypothesis, indicating its plausibility in the absence of a direct test. Results from an initial study allow this prior probability to be updated to posterior p1, which in turn determines the chances for the initial result to hold up in a replication, and thus the market price in the prediction market. Once the replication has been performed, the result can be used to generate posterior p2. Observing the market price, and using the statistical characteristics of the initial study and the replication, we can thus reconstruct probabilities p1, p2, and p0. Detailed calculations are presented in Supporting Information.
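The chain from p0 to p1 to p2 can be sketched with a textbook Bayesian update. This is a simplified illustration rather than the calculation in Supporting Information: it assumes each study is summarized by a significance level `alpha` and a statistical power `power`, and that the market price equals the probability of a significant replication result. All function names and the numbers in the example are hypothetical.

```python
def posterior_after_positive(prior, power, alpha):
    """P(hypothesis true | significant result), by Bayes' rule."""
    return prior * power / (prior * power + (1 - prior) * alpha)


def posterior_after_negative(prior, power, alpha):
    """P(hypothesis true | nonsignificant result), by Bayes' rule."""
    true_and_missed = prior * (1 - power)
    false_and_rejected = (1 - prior) * (1 - alpha)
    return true_and_missed / (true_and_missed + false_and_rejected)


def p1_from_price(price, rep_power, alpha):
    """Invert price = p1 * rep_power + (1 - p1) * alpha (assumed relation)."""
    return (price - alpha) / (rep_power - alpha)


def prior_from_posterior(posterior, power, alpha):
    """Recover p0 from p1 by undoing the update for a positive result."""
    prior_odds = posterior / (1 - posterior) * alpha / power
    return prior_odds / (1 + prior_odds)


# Hypothetical example: market price 0.594, replication power 0.9,
# original-study power 0.8, significance level 0.05 for both studies.
p1 = p1_from_price(0.594, rep_power=0.9, alpha=0.05)
p0 = prior_from_posterior(p1, power=0.8, alpha=0.05)
p2_success = posterior_after_positive(p1, power=0.9, alpha=0.05)
p2_failure = posterior_after_negative(p1, power=0.9, alpha=0.05)
print(p0, p1, p2_success, p2_failure)
```

Working in odds keeps the inversion simple: a significant result multiplies the prior odds by power/alpha, so recovering p0 from p1 divides by the same factor.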

Probability of a hypothesis being true at three different stages of testing: before the initial study (p0), after the initial study but before the replication (p1), and after replication (p2). “Error bars” (or whiskers) represent range, boxes are first to third quartiles, and thick lines are medians. Initially, priors of the tested hypothesis are relatively low, with a median of 8.8% (range, 0.7–66%). A positive result in an initial publication then moves the prior into a broad range of intermediate levels, with a median of 56% (range, 10–97%). If replicated successfully, the probability moves further up, with a median of 98% (range, 93.0–99.2%). If the replication fails, the probability moves back to a range close to the initial prior, with a median of 6.3% (range, 0.01–80%).

Final positions per participant and market. The left panel shows the portfolios in the first set of prediction markets, and the right panel shows the portfolios for the second set of prediction markets. Long positions (bets on success) are shown in green, and short positions (bets on failure) are shown in red. This figure indicates that, in both sets of prediction markets, the participants had broad portfolios with positions in several markets. Similarly, each market attracted a number of traders. Often, traders have diverging views: in each market, there is at least one trader holding a long position, and one trader holding a short position. The final portfolios show that there are a few “bears” (predominantly betting on failure) who invested in short positions only (6 of 47 traders for the first set of markets; 4 of 45 traders for the second set of markets), and “bulls” (predominantly betting on success) who invested in long positions only (3 of 47 traders for the first set of markets; 6 of 45 traders for the second set of markets). However, most of the participants fall into a wide spectrum between these two extremes.

(A) Trading interface introductory page. When entering the prediction market, participants were presented with all hypotheses along with their current price (“score”) and recent change in price. By clicking Adjust, the participants received more information on the study and the possibility to trade by buying and selling (a). For each replication, participants were presented with the hypothesis, the authors, the title, and the journal, and could buy stocks by choosing Yes or sell stocks by choosing No (b), and enter how many points they would like to invest in the specific hypothesis (c). (B) Position summary presented participants with an overview of their investments: which hypotheses, number of shares held, and current market value.

Comparison of survey responses and behavior in the two prediction markets. (A) Correlation between market price and average survey response. Market prices and average survey responses are positively correlated, suggesting that information given in the surveys was also revealed in the market (Pearson correlation coefficient of 0.78, P < 0.001, n = 43). However, market prices are more “extreme” than survey responses, which translates into a lower prediction error. Studies that were replicated successfully are shown in black, and studies that failed to replicate are shown in red. Studies that remained unfinished are shown in gray. (B) Correlation between volume of traded shares and diversity in survey responses (i.e., SD of responses; Pearson correlation coefficient of 0.51, P < 0.001, n = 43). The positive correlation between volume in the market and diversity in the surveys suggests that there was more trading for studies where participants had more diverging views on the replicability of a study. In other words, when there is larger diversity in premarket views, more trades are required to reach a “consensus” in the market pricing. (C) Negative correlation between market price and diversity in survey responses (Pearson correlation coefficient of −0.53, P < 0.001, n = 43). The diversity of survey responses is higher when the prediction market predicts a low probability that the original result will be replicated. This suggests that there is more disagreement around replications that are overall expected to fail rather than replications expected to succeed.
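The quantities compared in panels A to C can be sketched as follows: the average survey response and the SD of responses are computed per study, then correlated with market price across studies. The data layout, variable names, and numbers are hypothetical and only illustrate the computation; they are not the study's data.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical survey responses (one list per study) and final prices.
survey = [
    [0.7, 0.8, 0.9],
    [0.2, 0.6, 0.4],
    [0.1, 0.5, 0.9],
    [0.6, 0.7, 0.6],
]
prices = [0.85, 0.35, 0.30, 0.70]

mean_response = [statistics.mean(r) for r in survey]  # panel A, x axis
diversity = [statistics.stdev(r) for r in survey]     # SD of responses

print("price vs. mean response:", pearson(prices, mean_response))
print("price vs. diversity:", pearson(prices, diversity))
```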

Hypotheses for the 23 replication studies in the first set of prediction markets

Ref.

Hypothesis

33

White participants with high external motivation to respond without prejudice toward Blacks have an attentional bias toward neutral Black faces presented for 30 ms, but have an attentional bias away from neutral Black faces presented for 450 ms. These biases are eliminated when the faces display happy expressions.

34

Participants do not exhibit a delay in response when switching between pronouncing regular words and pronouncing nonwords.

35

Naive participants' judgments of the power and leadership of CEO faces are correlated positively with their companies' profits.

36

Repetition blindness (a reduction in reporting seeing an orthographically identical or similar word when it is presented in close temporal proximity amid a series of rapidly presented words or nonwords) will occur even for nonidentical orthographical neighbors (e.g., boss and bass) even when the stimuli are nonwords and when they are never repeated in the string of stimuli.

37

An increase in participants' public moral image will be related to an increased willingness to reconcile only for perpetrators, whereas an increase in participants' sense of power will be related to an increased willingness to reconcile only for victims.

38

Participants instructed to avoid race or use race in categorizing tools and guns exhibited less 1/f noise than participants in a control condition where no mention of race was made.

Participants will prefer descriptions of the city of Los Angeles that are more concrete/less abstract when they are exposed to the words “Los Angeles” during an earlier exercise. Participants who are not shown “Los Angeles” during this earlier exercise will prefer relatively less concrete/more abstract descriptions of the city of Los Angeles.

41

Word processing is slower for dense near semantic neighborhoods, i.e., words with many near neighbors are processed more slowly than words with few near neighbors.

42

Words denoting objects that typically occur high in the visual field hinder identification of targets appearing at the top of the display, whereas words denoting low objects hinder target identification at the bottom of the display.

When there are no nonoccurrences of the outcome in the presence of just one cause (cause A), increasing the number of occurrences of the outcome in the presence of that cause alone does not alter the conditional contingency. Under the conditional contingency hypothesis, therefore, such manipulations should not have a significant effect on causal judgment. As opposed to this, the tested predictions are that (i) such occurrences raise judgments of A as cause for the outcome and (ii) lower judgments of an alternative cause B.

45

When participants read sequences of digits and a task requires the joint processing of nonadjacent pairs of digits, they learn exclusively the relation between these nonadjacent digits and not relations between adjacent digits, thus suggesting attention instead of spatial contiguity as the critical factor.

46

Drug use is positively correlated with learning from experience under “sunny” conditions (in which win–loss probabilities are known before making a series of choices) but not correlated under “cloudy” conditions (in which the win–loss probabilities are not known in advance and can only be learned through trial and error).

There are semantic interference effects in the delayed naming conditions such that individuals are slower to respond to semantically related word–picture pairs than semantically unrelated word–picture pairs.

49

Participants’ ambivalence scores differ across three conditions (implemental mindset one-sided focus, implemental mindset two-sided focus, and neutral mindset), with the implemental mindset one-sided group showing a significantly lower amount of ambivalence compared with the implemental mindset two-sided group. Participants assigned to the neutral mindset condition score in the middle, although not significantly different from either group.

50

Visual statistical learning for colors operates in a feature-based manner if the covariance between feature dimensions is disrupted.

51

Attentional selection is suppressed, delayed, and diffused in time during the attentional blink, and these effects are dissociated by their time course.

52

People who read an essay undermining free will show more cheating in a simple arithmetic task than people who read a control essay.

53

When confronted with more than two pieces of information, the salient selection criterion is expected information quality, which causes a preference for consistent information.

54

There will be a triple interaction with man's availability, participant's conception risk, and participant's partnership status such that man’s availability and participant’s conception risk interact significantly for partnered women but not for unpartnered ones. In particular, this interaction will show that women with a partner will prefer attached men during the less fertile days of their cycle and single men during the more fertile days of their cycle.

55

When asked to intentionally forget a presented item list, participants will forget items that are repeated twice with several other words in between (spaced presentation) more frequently than when they are not directed to forget. This effect will not occur for items that are repeated twice consecutively (massed presentation).

Hypotheses for the 21 replication studies in the second set of prediction markets

Ref.

Hypothesis

56

Preschool children and adults who are presented with a 3 × 3 matrix of color photographs of threat-relevant and threat-irrelevant stimuli and are asked to find the threat-relevant target among eight threat-irrelevant distractors or the threat-irrelevant target among eight threat-relevant distractors will detect the threat-relevant target faster than the fear-irrelevant target.

57

Older children select the correct object more frequently than what would be expected by chance in “where” trials.

58

The discontinuity effect (that groups make more competitive choices than individuals) is larger under low partner control–joint control (the low PC-JC matrix) than under high partner control–joint control (the high PC-JC matrix).

59

The effect of processing style on social judgments will be partially mediated by hemisphere activation (stronger right hemisphere activation will be related to assimilation judgments, whereas stronger left hemisphere activation will be related to contrast judgments).

60

Men who feel threatened in their faith in the political and economic climate of their country will show a greater romantic interest in women who are portrayed as embodying benevolent sexist ideals than in women who are portrayed as career oriented, party seeking, active in social causes, or athletic.

61

Participants with higher cognitive ability are better than participants with lower cognitive ability at determining which cards must be turned over to prove the validity of a proposed rule regarding the two sides of four displayed cards.

62

Participants in the “vulnerable condition” will believe that the confederate's expressions were happier than their private feelings, and this effect will be larger in the “vulnerable condition” than in the “control condition.”

63

A relationship between self-esteem and later health outcomes is mediated through interpersonal stress.

The contribution rate to pool B in the IPD-MD (intergroup prisoner's dilemma–maximizing difference) game will be lower than the contribution rate to pool B in the IPD (intergroup prisoner's dilemma) game.

66

When pseudohomophones rather than legal nonwords are used as the nonwords in lexical tasks, there is a stimulus quality and word frequency interaction effect.

67

There is an interaction between commitment (regular vs. first-time donors) and focus intervention (the “to date” condition vs. the “to go” condition). The direction of the interaction is that regular donors donate more in the “to go” than in the “to date” condition, and this effect is decreased or reversed for first-time donors.

68

Participants faced with the task of selecting the correct ink color for a word presented to them will make more color identification errors in low-contingency trials where words have a low correlation to colors, than in medium-contingency trials where words have a stronger correlation to colors.

Considering that people sample from a database including a constant ratio of more positive than negative information for three providers, in the two-way interaction effect of provider (manipulating sample size) and valence (positive vs. negative information), the tendency to underestimate the frequency of positive and to overestimate the frequency of negative observations will increase from the provider with the smallest to the provider with the highest overall frequency due to differential regression effects.

71

When participants perform a response time task identifying words as either old or new, response times to misses (old items judged as new) are faster than response times to correct rejections (new items judged as new), indicating that priming can occur independently of recognition.

72

Participants who are expecting to play confrontational video games prefer anger-inducing experiences to exciting experiences.

73

Under low-switch conditions, recall performance of consonants is worse for degraded stimuli compared with normal stimuli.

74

Men show less sensitivity in distinguishing between friendliness and sexual interest than women.

75

The difference in measured “trust and comfort” between the high fairness condition and the low fairness condition is larger for African American participants than for white participants.

76

The sum effect will be more pronounced in the categorically related word pairs priming condition than the misaligned unrelated word pairs priming condition.
