Reproducibility

In a series of papers, we replicate lab experiments in the social sciences. We also study how accurately replicability can be predicted, both by peer scientists in prediction markets and by machine learning algorithms.

Publications

Predicting the replicability of social science lab experiments
PLOS One, 2019

with Anna Dreber, Eskil Forsell, Gideon Nave, Juergen Huber, Magnus Johannesson, Michael Kirchler, Taisuke Imai, Teck Ho, and Colin Camerer
We measure how accurately replication of experimental results can be predicted by black-box statistical models. With data from four large-scale replication projects in experimental psychology and economics, and techniques from machine learning, we train predictive models and study which variables drive predictable replication. The models predict binary replication with a cross-validated accuracy rate of 70% (AUC of 0.77) and estimates of relative effect sizes with a Spearman ρ of 0.38. The accuracy level is similar to market-aggregated beliefs of peer scientists (Camerer et al., 2016; Dreber et al., 2015). The predictive power is validated in a pre-registered out-of-sample test of the outcome of Camerer et al. (2018), where 71% (AUC of 0.73) of replications are predicted correctly and effect size correlations amount to ρ = 0.25. Basic features such as the sample and effect sizes in original papers, and whether reported effects are single-variable main effects or two-variable interactions, are predictive of successful replication. The models presented in this paper are simple tools to produce cheap, prognostic replicability metrics. These models could be useful in institutionalizing the process of evaluation of new findings and guiding resources to those direct replications that are likely to be most informative.
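For readers unfamiliar with the three metrics quoted in this abstract (accuracy, AUC, and Spearman ρ), the following is a minimal sketch of how each can be computed from a list of predictions. It is not the paper's code: the function names are hypothetical, the 0.5 decision threshold is an assumption, and only the Python standard library is used.

```python
from statistics import mean


def accuracy(labels, scores, threshold=0.5):
    """Share of studies whose thresholded score matches the binary outcome.
    The 0.5 threshold is an illustrative assumption."""
    return mean(int((s >= threshold) == bool(l)) for l, s in zip(labels, scores))


def auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    replicated study gets a higher score than a randomly chosen
    non-replicated one (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Here `labels` would hold 1 for a study that replicated and 0 otherwise, and `scores` the model's predicted probabilities; `spearman_rho` would be applied to predicted versus observed relative effect sizes.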
Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015
Nature Human Behaviour, 2018

with Colin F. Camerer, Anna Dreber, Felix Holzmeister, Teck Ho, Juergen Huber, Magnus Johannesson, Michael Kirchler, Gideon Nave, Brian A. Nosek, Thomas Pfeiffer, Nick Buttrick, Taizan Chan, Yiling Chen, Eskil Forsell, Anup Gampa, Emma Heikensten, Lily Hummer, Taisuke Imai, Siri Isaksson, Dylan Manfredi, Julia Rose, Eric-Jan Wagenmakers, and Hang Wu
Being able to replicate scientific findings is crucial for scientific progress. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.

Science News, Vox, The Atlantic, NPR, Wired, The Washington Post, Nature, Science Mag, The Guardian, Buzzfeed News, Marginal Revolution
Evaluating Replicability of Laboratory Experiments in Economics
Science, 2016

with Colin F. Camerer, Anna Dreber, Eskil Forsell, Teck Ho, Juergen Huber, Magnus Johannesson, Michael Kirchler, Johan Almenberg, Taizan Chan, Emma Heikensten, Felix Holzmeister, Taisuke Imai, Siri Isaksson, Gideon Nave, Thomas Pfeiffer, Michael Razen, and Hang Wu
The replicability of some scientific findings has recently been called into question. To contribute data about replicability in economics, we replicated 18 studies published in the American Economic Review and the Quarterly Journal of Economics between 2011 and 2014. All of these replications followed predefined analysis plans that were made publicly available beforehand, and they all have a statistical power of at least 90% to detect the original effect size at the 5% significance level. We found a significant effect in the same direction as in the original study for 11 replications (61%); on average, the replicated effect size is 66% of the original. The replicability rate varies between 67% and 78% for four additional replicability indicators, including a prediction market measure of peer beliefs.
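To give a sense of what "90% power to detect the original effect size at the 5% significance level" implies for sample sizes, here is a rough sketch using the textbook normal approximation for a two-sided, two-group comparison of means with standardized effect size d. This is an assumption made for illustration: the replications' actual power calculations depend on each study's design and test, and the function names below are hypothetical.

```python
from math import ceil
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group(d, power=0.90, alpha=0.05):
    """Approximate sample size per group needed to detect a standardized
    mean difference d with the given power in a two-sided test."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_power = _Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate power with n observations per group (ignores the
    negligible probability of rejecting in the wrong direction)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(d * (n / 2) ** 0.5 - z_alpha)
```

Because the required sample size scales with 1/d², halving the effect size to be detected roughly quadruples the number of participants needed.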

Last Week Tonight with John Oliver, The Economist, Science Mag, The Chronicle of Higher Education
Using Prediction Markets to Forecast Research Evaluations
Royal Society Open Science, 2015

with Marcus Munafo, Thomas Pfeiffer, Emma Heikensten, Johan Almenberg, Alexander Bird, Yiling Chen, Brad Wilson, Magnus Johannesson, and Anna Dreber
The 2014 Research Excellence Framework (REF2014) was conducted to assess the quality of research carried out at higher education institutions in the UK over a 6-year period. However, the process was criticized for being expensive and bureaucratic, and it was argued that similar information could be obtained more simply from various existing metrics. We were interested in whether a prediction market on the outcome of REF2014 for 33 chemistry departments in the UK would provide information similar to that obtained during the REF2014 process. Prediction markets have become increasingly popular as a means of capturing what is colloquially known as the 'wisdom of crowds', and enable individuals to trade 'bets' on whether a specific outcome will occur or not. These have been shown to be successful at predicting various outcomes in a number of domains (e.g. sport, entertainment and politics), but have rarely been tested against outcomes based on expert judgements such as those that formed the basis of REF2014.
