Diffusion Approximations for Thompson Sampling in the Small Gap Regime

Lin Fan, Peter W. Glynn

公開日: 2021/5/19

Abstract

We study the process-level dynamics of Thompson sampling in the ``small gap'' regime. The small gap regime is one in which the gaps between the arm means are of order $\sqrt{\gamma}$ or smaller and the time horizon is of order $1/\gamma$, where $\gamma$ is small. As $\gamma \downarrow 0$, we show that the process-level dynamics of Thompson sampling converge weakly to the solutions to certain stochastic differential equations and stochastic ordinary differential equations. Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem, can handle stationary, weakly dependent reward processes, and can also be adapted to analyze a variety of sampling-based bandit algorithms. Indeed, we show that the process-level dynamics of many sampling-based bandit algorithms -- including Thompson sampling designed for any single-parameter exponential family of rewards, as well as non-parametric bandit algorithms based on bootstrap re-sampling -- satisfy an invariance principle. Namely, their weak limits coincide with that of Gaussian parametric Thompson sampling with Gaussian priors. Moreover, in the small gap regime, the regret performance of these algorithms is generally insensitive to model mis-specification, changing continuously with increasing degrees of mis-specification.