Externally Valid Selection of Experimental Sites via the k-Median Problem

José Luis Montiel Olea, Brenda Prallon, Chen Qiu, Jörg Stoye, Yiwei Sun

Published: 2024/8/17

Abstract

We present a decision-theoretic justification for viewing the question of how to best choose where to experiment in order to optimize external validity as a $k$-median problem, a popular problem in computer science and operations research. We present conditions under which minimizing the worst-case, welfare-based regret among all nonrandom schemes that select $k$ sites at which to experiment is approximately, and sometimes exactly, equivalent to finding the $k$ most central vectors of baseline site-level covariates. The $k$-median problem can be formulated as a linear integer program. Two empirical applications illustrate the theoretical and computational benefits of the suggested procedure.
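
To make the notion of "most central" covariate vectors concrete, the following is a minimal sketch of a $k$-median selection by exhaustive search. It is an illustration only and not the authors' implementation: the function name `k_median_brute_force`, the toy covariates, the unweighted objective, and the use of Euclidean distance are all assumptions, and the paper's actual distance and weighting may differ. Exhaustive search is only feasible for small numbers of sites; larger instances would typically be handled through the integer programming formulation mentioned in the abstract.

```python
from itertools import combinations
from math import dist


def k_median_brute_force(covariates, k):
    """Pick k sites minimizing the total distance from every site to its
    nearest selected site (hypothetical helper; Euclidean distance and
    equal site weights are assumptions)."""
    n = len(covariates)
    best_sites, best_cost = None, float("inf")
    # Enumerate every subset of k candidate sites.
    for sites in combinations(range(n), k):
        # Each site is represented by its closest selected site.
        cost = sum(
            min(dist(x, covariates[j]) for j in sites) for x in covariates
        )
        if cost < best_cost:
            best_sites, best_cost = sites, cost
    return best_sites, best_cost


# Toy data: five sites with two baseline covariates each (made up).
covariates = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
sites, cost = k_median_brute_force(covariates, 2)
print(sites, cost)
```

In this toy example the search selects one site from each of the two visible clusters, so every non-selected site has a selected site with similar covariates.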
