Testing for Underpowered Literatures
Stefan Faridani
Published: June 19, 2024
Abstract
How many experimental studies would have come to different conclusions had they been run on larger samples? I show how to estimate the expected number of statistically significant results that a set of experiments would have reported had their sample sizes all been counterfactually increased. The proposed deconvolution estimator is asymptotically normal and adjusts for publication bias. Unlike related methods, this approach requires no assumptions of any kind about the distribution of true treatment effects and allows for point masses. Simulations find good coverage even when the t-score is only approximately normal. An application to randomized controlled trials (RCTs) published in economics journals finds that doubling every sample would increase the power of t-tests by 7.2 percentage points on average. This effect is smaller than for non-RCTs and comparable to that for systematic replications in laboratory psychology, where previous studies enabled more accurate power calculations. This suggests that RCTs are, on average, relatively insensitive to sample size increases. Research funders who wish to raise power should generally consider sponsoring better-measured and higher-quality experiments rather than only larger ones.