Another look at inference after prediction
Jessica Gronsbell, Jianhui Gao, Yaqi Shi, Zachary R. McCaw, David Cheng
公開日: 2024/11/29
Abstract
From structural biology to epidemiology, predictions from machine learning (ML) models increasingly complement costly gold-standard data to enable faster, more affordable, and scalable scientific inquiry. In response, prediction-based (PB) inference has emerged to accommodate statistical analysis using a large volume of predictions together with a small amount of gold-standard data. The goals of PB inference are two-fold: (i) to mitigate bias from errors in predictions and (ii) to improve efficiency relative to classical inference using only the gold-standard data. While early PB inference methods focused on bias, their ability to enhance efficiency remains a focus of ongoing research. We revisit a foundational PB inference method and show that a simple modification can be applied to guarantee provable improvements in efficiency. In doing so, we establish new connections between augmented inverse probability weighted estimators (AIPW) and several recently proposed PB inference methods with a similar focus. The utility of our proposal, which leverages prediction-based outcomes to enhance efficiency, is demonstrated through extensive simulation studies and an application to real data from the UK Biobank. Further, we contextualize PB inference by drawing connections to historical literature from economics and statistics, highlighting how classic methods directly inform this contemporary problem.