Selecting the Best Arm in One-Shot Multi-Arm RCTs: The Asymptotic Minimax-Regret Decision Framework for the Best-Population Selection Problem

Joonhwi Joo

Published: 2025/9/4

Abstract

We develop a frequentist decision-theoretic framework for selecting the best arm in one-shot, multi-arm randomized controlled trials (RCTs). Our approach characterizes the minimax-regret (MMR) optimal decision rule for any location-family reward distribution with full support. We show that the MMR rule is deterministic, unique, and computationally tractable, as it can be derived by solving the dual problem with nature's least-favorable prior. We then specialize to the case of multivariate normal (MVN) rewards with an arbitrary covariance matrix, and establish the local asymptotic minimaxity of a plug-in version of the rule when only estimated means and covariances are available. This asymptotic MMR (AMMR) procedure maps a covariance-matrix estimate directly into decision boundaries, allowing straightforward implementation in practice. Our analysis highlights a sharp contrast between two-arm and multi-arm designs. With two arms, the empirical success rule ("pick-the-winner") remains MMR-optimal, regardless of the arm-specific variances. By contrast, with three or more arms and heterogeneous variances, the empirical success rule is no longer optimal: the MMR decision boundaries become nonlinear and systematically penalize high-variance arms, requiring stronger evidence to select them. This result underscores that variance plays no role in optimal two-arm comparisons, but it matters critically when more than two options are on the table. Our multi-arm AMMR framework extends classical decision theory to multi-arm RCTs, offering a rigorous foundation and a practical tool for comparing multiple policies simultaneously.
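To make the objects in the abstract concrete, the following is a minimal sketch of the empirical success rule and a Monte Carlo estimate of its expected regret under MVN rewards. It does not implement the MMR or AMMR rule, whose decision boundaries are not given in the abstract. The function names, the mean vector, and the covariance matrix are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def empirical_success_rule(estimates):
    """Pick-the-winner: select the arm with the largest estimated mean."""
    return int(np.argmax(estimates))

def expected_regret(true_means, cov, rule, n_draws=20000, seed=0):
    """Monte Carlo estimate of a rule's expected regret at fixed true means.

    Regret of a single decision is the best true mean minus the true mean
    of the selected arm. Estimated means are drawn as MVN(true_means, cov).
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    draws = rng.multivariate_normal(true_means, cov, size=n_draws)
    chosen = np.array([rule(d) for d in draws])
    return float(np.mean(true_means.max() - true_means[chosen]))

# Hypothetical three-arm setting with heterogeneous variances:
# arm 2 is noisier than arms 0 and 1 (values chosen for illustration only).
cov = np.diag([1.0, 1.0, 4.0])
regret = expected_regret([1.0, 0.0, 0.0], cov, empirical_success_rule)
print(f"estimated regret of the empirical success rule: {regret:.3f}")
```

Under these assumptions, a noisy inferior arm wins more often by chance than a precise one, which is the intuition behind requiring stronger evidence before selecting high-variance arms. Any alternative rule with the same signature can be passed as `rule` to compare its regret at a given mean vector.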
