Tight Bounds for Low-Error Frequency Moment Estimation and the Power of Multiple Passes
Naomi Green-Maimon, Or Zamir
公開日: 2025/9/9
Abstract
Estimating the second frequency moment $F_2$ of a data stream up to a $(1 \pm \varepsilon)$ factor is a central problem in the streaming literature. For errors $\varepsilon > \Omega(1/\sqrt{n})$, the tight bound $\Theta\left(\log(\varepsilon^2 n)/\varepsilon^2\right)$ was recently established by Braverman and Zamir. In this work, we complete the picture by resolving the remaining regime of small error, $\varepsilon < 1/\sqrt{n}$, showing that the optimal space complexity is $\Theta\left( \min\left(n, \frac{1}{\varepsilon^2} \right) \cdot \left(1 + \left| \log(\varepsilon^2 n) \right| \right) \right)$ bits for all $\varepsilon \geq 1/n^2$, assuming a sufficiently large universe. This closes the gap between the best known $\Omega(n)$ lower bound and the straightforward $O(n \log n)$ upper bound in that range, and shows that essentially storing the entire stream is necessary for high-precision estimation. To derive this bound, we fully characterize the two-party communication complexity of estimating the size of a set intersection up to an arbitrary additive error $\varepsilon n$. In particular, we prove a tight $\Omega(n \log n)$ lower bound for one-way communication protocols when $\varepsilon < n^{-1/2-\Omega(1)}$, in contrast to classical $O(n)$-bit protocols that use two-way communication. Motivated by this separation, we present a two-pass streaming algorithm that computes the exact histogram of a stream with high probability using only $O(n \log \log n)$ bits of space, in contrast to the $\Theta(n \log n)$ bits required in one pass even to approximate $F_2$ with small error. This yields the first asymptotic separation between one-pass and $O(1)$-passes space complexity for small frequency moment estimation.