Frustratingly Easy Zero-Day Audio DeepFake Detection via Retrieval Augmentation and Profile Matching
Xuechen Liu, Xin Wang, Junichi Yamagishi
Published: 2025/9/26
Abstract
Modern audio deepfake detectors using foundation models and large training datasets have achieved promising detection performance. However, they struggle with zero-day attacks, where the audio samples are generated by novel synthesis methods that models have not seen from reigning training data. Conventional approaches against such attacks require fine-tuning the detectors, which can be problematic when prompt response is required. This study introduces a training-free framework for zero-day audio deepfake detection based on knowledge representations, retrieval augmentation, and voice profile matching. Based on the framework, we propose simple yet effective knowledge retrieval and ensemble methods that achieve performance comparable to fine-tuned models on DeepFake-Eval-2024, without any additional model-wise training. We also conduct ablation studies on retrieval pool size and voice profile attributes, validating their relevance to the system efficacy.