Minimax and Bayes Optimal Best-Arm Identification
Masahiro Kato
Published: 2025/6/30
Abstract
This study investigates minimax and Bayes optimal strategies in fixed-budget best-arm identification. We consider an adaptive procedure consisting of a sampling phase followed by a recommendation phase, and we design an adaptive experiment within this framework to efficiently identify the best arm, defined as the arm with the highest expected outcome. In our proposed strategy, the sampling phase consists of two stages. The first stage is a pilot phase, in which we sample all arms in equal proportions to eliminate clearly suboptimal arms and to estimate the outcome variances. In the second stage, we allocate samples to the arms in proportion to the variances estimated during the first stage. After the sampling phase, the procedure enters the recommendation phase, where we select the arm with the highest sample mean as our estimate of the best arm. We prove that this single strategy is simultaneously asymptotically minimax optimal and Bayes optimal for the simple regret, with upper bounds that match our lower bounds exactly, including the constant terms.
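The two-stage procedure described above can be illustrated with a minimal simulation sketch. It is not the paper's algorithm: the Gaussian outcomes, the function name `two_stage_bai`, the pilot fraction `pilot_frac`, the elimination threshold `gap_mult`, and the choice to spread the second-stage budget over the surviving arms only are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
import statistics


def two_stage_bai(means, sds, budget, pilot_frac=0.2, gap_mult=3.0, seed=0):
    """Simulate a two-stage fixed-budget strategy (illustrative sketch only).

    means, sds: true arm parameters, used only to simulate Gaussian outcomes.
    Returns (recommended_arm, samples_drawn_per_arm).
    """
    rng = random.Random(seed)
    k = len(means)
    samples = [[] for _ in range(k)]

    def draw(arm):
        samples[arm].append(rng.gauss(means[arm], sds[arm]))

    # Stage 1 (pilot): sample every arm the same number of times.
    pilot_per_arm = max(2, int(pilot_frac * budget) // k)
    for arm in range(k):
        for _ in range(pilot_per_arm):
            draw(arm)

    pilot_means = [statistics.fmean(s) for s in samples]
    pilot_vars = [statistics.variance(s) for s in samples]

    def std_err(arm):
        return (pilot_vars[arm] / pilot_per_arm) ** 0.5

    # Assumed elimination rule (not from the source): keep an arm only if its
    # pilot mean is within gap_mult combined standard errors of the leader's.
    leader = max(range(k), key=lambda a: pilot_means[a])
    candidates = [
        a for a in range(k)
        if pilot_means[leader] - pilot_means[a]
        <= gap_mult * (std_err(a) + std_err(leader))
    ]

    # Stage 2: split the remaining budget among surviving arms in proportion
    # to the variances estimated in the pilot stage.
    remaining = max(0, budget - k * pilot_per_arm)
    total_var = sum(pilot_vars[a] for a in candidates)
    for a in candidates:
        share = pilot_vars[a] / total_var if total_var > 0 else 1 / len(candidates)
        for _ in range(int(remaining * share)):
            draw(a)

    # Recommendation: the surviving arm with the highest sample mean.
    best = max(candidates, key=lambda a: statistics.fmean(samples[a]))
    return best, [len(s) for s in samples]
```

For example, `two_stage_bai([0.0, 0.1, 1.0], [1.0, 1.0, 1.0], budget=3000)` returns the recommended arm index together with the number of samples drawn from each arm. Rounding the per-arm allocations down means the sketch may use slightly less than the full budget.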