Beyond Quantity: Distribution-Aware Labeling for Visual Grounding

Yichi Zhang, Gongwei Chen, Jun Zhu, Jia Wan, Liqiang Nie

公開日: 2025/5/30

Abstract

Visual grounding requires large and diverse region-text pairs. However, manual annotation is costly and fixed vocabularies restrict scalability and generalization. Existing pseudo-labeling pipelines often overfit to biased distributions and generate noisy or redundant samples. Through our systematic analysis of data quality and distributional coverage, we find that performance gains come less from raw data volume and more from effective distribution expansion. Motivated by this insight, we propose DAL, a distribution-aware labeling framework for visual grounding. The proposed method first employs a dual-driven annotation module, where a closed-set path provides reliable pseudo labels and an open-set path enriches vocabulary and introduces novel concepts; meanwhile, it further performs explicit out-of-distribution (OOD) expression expansion to broaden semantic coverage. We then propose a consistency- and distribution-aware filtering module to discard noisy or redundant region-text pairs and rebalance underrepresented linguistic and visual content, thereby improving both data quality and training efficiency. Extensive experiments on three benchmarks demonstrate that our method consistently outperforms strong baselines and achieves state-of-the-art results, underscoring the critical role of distribution-aware labeling in building scalable and robust visual grounding datasets.

Beyond Quantity: Distribution-Aware Labeling for Visual Grounding | SummarXiv | SummarXiv