Distribution Testing in the Presence of Arbitrarily Dominant Noise with Verification Queries
Hadley Black, Christopher Ye
Published: 2025/9/21
Abstract
We study distribution testing when there is no direct access to a source of relevant data, only to a source in which a tiny fraction of the data is relevant. To enable this, we introduce the following verification query model. The goal is to perform a statistical task on distribution $\boldsymbol{p}$ given sample access to a mixture $\boldsymbol{r} = \lambda \boldsymbol{p} + (1-\lambda)\boldsymbol{q}$ and the ability to query whether a sample was generated by $\boldsymbol{p}$ or by $\boldsymbol{q}$. In general, if $m_0$ samples from $\boldsymbol{p}$ suffice for a task, then $O(m_0/\lambda)$ samples and queries always suffice in our model. Are there tasks for which the number of queries can be significantly reduced? We study the canonical problems in distribution testing, and obtain matching upper and lower bounds that reveal smooth trade-offs between sample and query complexity. For all $m \leq n$, we obtain (i) a uniformity and identity tester using $O(m + \frac{\sqrt{n}}{\varepsilon^2 \lambda})$ samples and $O(\frac{n}{m \varepsilon^4 \lambda^2})$ queries, and (ii) a closeness tester using $O(m + \frac{n^{2/3}}{\varepsilon^{4/3} \lambda} + \frac{1}{\varepsilon^4 \lambda^3})$ samples and $O(\frac{n^2}{m^2 \varepsilon^4 \lambda^3})$ queries. Moreover, we show that these query complexities are tight for all testers using $m \ll n$ samples. Next, we show that for testing closeness using $m = \widetilde{O}(\frac{n}{\varepsilon^2\lambda})$ samples, we can achieve query complexity $\widetilde{O}(\frac{1}{\varepsilon^2\lambda})$, which is nearly optimal even for the basic task of bias estimation with unbounded samples. Our uniformity testers work in the more challenging setting where the contaminated samples are generated by an adaptive adversary (at the cost of a $\log n$ factor). Finally, we show that our lower bounds can be circumvented if the algorithm is provided with the PDF of the mixture.
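The baseline bound of $O(m_0/\lambda)$ samples and queries mentioned above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's testers: it assumes a discrete domain $\{0, \dots, n-1\}$, and the names `make_verification_oracle` and `collect_clean_samples` are hypothetical helpers introduced here for illustration. It draws from the mixture $\boldsymbol{r}$, queries every sample, and keeps only those generated by $\boldsymbol{p}$ until $m_0$ clean samples are collected.

```python
import random


def make_verification_oracle(p, q, lam, rng):
    """Simulate the verification query model (hypothetical helper).

    p, q: probability vectors over {0, ..., n-1}.
    lam:  mixture weight of p in r = lam * p + (1 - lam) * q.
    Returns (sample, verify, counts).
    """
    support = range(len(p))
    labels = []  # hidden source of each drawn sample (True = from p)
    counts = {"samples": 0, "queries": 0}

    def sample():
        # Draw one element from the mixture r and remember its source.
        from_p = rng.random() < lam
        dist = p if from_p else q
        x = rng.choices(support, weights=dist, k=1)[0]
        labels.append(from_p)
        counts["samples"] += 1
        return len(labels) - 1, x  # (sample id, observed value)

    def verify(sample_id):
        # Verification query: was this sample generated by p?
        counts["queries"] += 1
        return labels[sample_id]

    return sample, verify, counts


def collect_clean_samples(sample, verify, m0):
    """Naive baseline: query every sample, keep those generated by p."""
    clean = []
    while len(clean) < m0:
        sample_id, x = sample()
        if verify(sample_id):
            clean.append(x)
    return clean


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.0, 1 / 3, 1 / 3, 1 / 3]  # relevant distribution (assumed)
    q = [1.0, 0.0, 0.0, 0.0]        # noise distribution (assumed)
    sample, verify, counts = make_verification_oracle(p, q, 0.1, rng)
    clean = collect_clean_samples(sample, verify, m0=50)
    print(len(clean), counts)
```

In this baseline every sample is queried, so the sample and query counts coincide and are about $m_0/\lambda$ in expectation. The trade-offs stated in the abstract concern testers that use far fewer queries than samples.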