Multiple testing with anytime-valid Monte Carlo p-values

Lasse Fischer, Timothy Barry, Aaditya Ramdas

Published: 2024/4/24

Abstract

In contemporary problems involving genetic or neuroimaging data, thousands of hypotheses need to be tested. Due to their high power and finite-sample guarantees on type-I error under weak assumptions, Monte Carlo permutation tests are often considered the gold standard in these settings. However, the enormous computational effort required for (thousands of) permutation tests is a major burden. In this paper, we integrate recently constructed anytime-valid permutation p-values into a broad class of multiple testing procedures, including the Benjamini-Hochberg procedure. This allows the number of permutations to be fully adapted to the underlying data and thus, for example, to the number of rejections made by the multiple testing procedure. Even though this data-adaptive stopping can induce dependencies between the p-values that violate the usual assumptions of the Benjamini-Hochberg procedure, we prove that our approach controls the false discovery rate under mild assumptions. Furthermore, our method provably decreases the required number of permutations substantially without compromising power. On a real genomics data set, our method reduced the computational time from more than three days to less than four minutes while increasing the number of rejections.
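
For orientation, the classical baseline that the abstract contrasts against can be sketched as follows: a fixed number of permutations per hypothesis, followed by the Benjamini-Hochberg procedure. This is a minimal illustration and not the paper's anytime-valid method. The function names, the two-sided difference-in-means statistic, and the default of 999 permutations are assumptions chosen for the example.

```python
import random


def permutation_p_value(x, y, num_perm=999, rng=None):
    """Monte Carlo permutation p-value for a two-sided difference in means.

    Hypothetical helper for illustration; the statistic and the default
    num_perm are assumptions, not taken from the paper.
    """
    rng = rng or random.Random(0)
    pooled = list(x) + list(y)
    n_x = len(x)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    exceed = 0
    for _ in range(num_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        if stat(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic itself, so the p-value
    # is never zero and stays valid for any finite num_perm.
    return (1 + exceed) / (1 + num_perm)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by Benjamini-Hochberg at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_star = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls below its threshold.
        if p_values[i] <= alpha * rank / m:
            k_star = rank
    return sorted(order[:k_star])
```

Because every hypothesis here uses the same `num_perm` regardless of how clear-cut its evidence is, the total cost scales with the number of hypotheses times `num_perm`, which is the computational burden the abstract describes.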
