Generalizable estimation of conditional average treatment effects using Causal Forest in randomized controlled trials

Rikuta Hamaya, Etsuji Suzuki, Konan Hara

Published: 2025/6/14

Abstract

Estimating conditional average treatment effects (CATE) from randomized controlled trials (RCTs) and generalizing them to broader populations is essential for individualized treatment rules but is complicated by selection bias and high-dimensional covariates. We evaluated Causal Forest-based CATE estimation strategies that address trial selection bias. Specifically, we compared fitting Causal Forest with the covariates of interest only, fitting it with the covariates that determine trial participation added, and repeating both models with inverse probability weighting (IPW) to reweight the trial sample to the source population. Identification theory suggests that unbiased CATE estimation is possible when covariates related to trial participation are included. However, simulation studies demonstrated that, under realistic RCT sample sizes, the variance inflation from high-dimensional covariates often outweighed the modest bias reduction. Including more than three covariates related to participation substantially degraded precision unless sample sizes were large. In contrast, IPW methods consistently improved performance across scenarios, even when the weighting model was misspecified. Application to the VITAL trial of omega-3 fatty acids and coronary heart disease further illustrated how IPW shifts estimates toward source population effects and refines heterogeneity assessments. Our findings highlight a fundamental bias-variance tradeoff in generalizing CATE from RCTs. While including trial selection variables ensures consistency in theory, in practice it may worsen performance in medical trials with a sample size of 5,000 or less. More efficient strategies are to limit CATE models to strong effect modifiers and to address selection bias separately through IPW. These results provide practical guidance for applying CATE estimation in clinical and epidemiologic research.
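The reweighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not fit a Causal Forest. It only shows, on invented simulated data, how inverse probability of participation weights can move a trial estimate toward the source population effect. The data-generating process, the helper names (`fit_logistic`, `predict_proba`, `diff_means`), and the use of a weighted difference in means are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression (hypothetical helper); returns [intercept, slopes]."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd
        beta += np.linalg.solve(hess, grad)
    return beta

def predict_proba(beta, X):
    """Predicted probability from the fitted logistic coefficients."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def diff_means(y, a, w):
    """Weighted difference in mean outcome between treated (a=1) and control (a=0)."""
    t = a == 1
    return np.average(y[t], weights=w[t]) - np.average(y[~t], weights=w[~t])

# Invented source population: one covariate x drives both participation and effect size.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
tau = 1.0 + 0.5 * x                       # individual effect; source-population average is 1.0
s = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + x))))  # participation favors high x
a = rng.binomial(1, 0.5, size=n)          # randomized treatment assignment
y = x + tau * a + rng.normal(size=n)      # observed outcome

# Model participation on the full source sample, then weight participants by 1 / P(S=1 | x).
beta = fit_logistic(x[:, None], s)
p_participate = predict_proba(beta, x[:, None])
trial = s == 1
weights = 1.0 / p_participate[trial]

naive = diff_means(y[trial], a[trial], np.ones(trial.sum()))  # unweighted trial estimate
ipw = diff_means(y[trial], a[trial], weights)                 # reweighted to the source population

print(f"unweighted trial estimate: {naive:.3f}")
print(f"IPW estimate:              {ipw:.3f}")
```

In this toy setup the unweighted estimate reflects the participants, who have above-average `x` and therefore larger effects, while the weighted estimate targets the source population average. In a Causal Forest workflow, the analogous step would be passing such weights to the forest as sample weights, keeping the participation model separate from the CATE model.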
