MUSHRA-1S: A scalable and sensitive test approach for evaluating top-tier speech processing systems
Laura Lechler, Ivana Balic
Published: 2025/9/23
Abstract
Evaluating state-of-the-art speech systems requires evaluation methods that are both scalable and sensitive enough to detect subtle but unacceptable artifacts. Standard MUSHRA is sensitive but does not scale, while ACR scales well but loses sensitivity and saturates at high quality levels. To address this, we introduce MUSHRA-1S, a single-stimulus variant that rates one system at a time against a fixed anchor and reference. Across our experiments, MUSHRA-1S matches standard MUSHRA more closely than ACR does, including in the high-quality regime where ACR saturates. MUSHRA-1S also effectively identifies specific deviations and reduces range-equalizing biases by fixing the rating context. Overall, MUSHRA-1S combines MUSHRA-level sensitivity with ACR-like scalability, making it a robust and scalable solution for benchmarking top-tier speech processing systems.
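To make the protocol difference concrete, the following minimal sketch contrasts how listening trials could be constructed under standard MUSHRA versus the single-stimulus MUSHRA-1S variant described in the abstract. All names (Trial, mushra_trials, mushra_1s_trials, file names) are illustrative assumptions, not part of the paper's tooling.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    item: str            # source utterance under test
    reference: str       # clean reference stimulus (fixed)
    anchor: str          # low-quality anchor stimulus (fixed)
    stimuli: list[str]   # system outputs rated in this trial


def mushra_trials(item: str, systems: list[str]) -> list[Trial]:
    # Standard MUSHRA: all system outputs are rated side by side in a single trial.
    return [Trial(item, "ref.wav", "anchor.wav", [f"{s}.wav" for s in systems])]


def mushra_1s_trials(item: str, systems: list[str]) -> list[Trial]:
    # MUSHRA-1S (as described in the abstract): one system per trial,
    # judged against the same fixed anchor and reference each time.
    return [Trial(item, "ref.wav", "anchor.wav", [f"{s}.wav"]) for s in systems]


if __name__ == "__main__":
    systems = ["system_A", "system_B", "system_C"]
    print(len(mushra_trials("utt01", systems)))     # 1 trial covering all systems
    print(len(mushra_1s_trials("utt01", systems)))  # 3 independent single-stimulus trials
```

Because each MUSHRA-1S trial is independent, new systems can be added to a benchmark without re-running existing ones, which is the source of its ACR-like scalability.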