Wasserstein normalized autoencoder for anomaly detection

CMS Collaboration

Published: 2025/10/2

Abstract

A novel anomaly detection algorithm is presented. The Wasserstein normalized autoencoder (WNAE) is a normalized probabilistic model that minimizes the Wasserstein distance between the learned probability distribution -- a Boltzmann distribution where the energy is the reconstruction error of the autoencoder -- and the distribution of the training data. This algorithm has been developed and applied to the identification of semivisible jets -- conical sprays of visible standard model particles and invisible dark matter states -- with the CMS experiment at the CERN LHC. Trained on jets of particles from simulated standard model processes, the WNAE is shown to learn the probability distribution of the input data in a fully unsupervised fashion, such that it effectively identifies new physics jets as anomalies. The model consistently demonstrates stable, convergent training and achieves strong classification performance across a wide range of signals, improving upon standard normalized autoencoders, while remaining agnostic to the signal. The WNAE directly tackles the problem of outlier reconstruction, a common failure mode of autoencoders in anomaly detection tasks.
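
As a rough illustration of the idea in the abstract, the Python sketch below treats the reconstruction error of an autoencoder as an energy, draws samples from the corresponding Boltzmann distribution, and measures how far those samples are from the training data. It is not the CMS implementation. The fixed linear autoencoder, the Langevin sampler, the bounded sampling box, the temperature, and the sliced approximation to the Wasserstein distance are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import numpy as np

def reconstruct(x, w):
    """Toy linear autoencoder (assumption): encode onto unit vector w, then decode."""
    z = x @ w                 # latent code, shape (n,)
    return z[:, None] * w     # reconstruction, shape (n, d)

def energy(x, w):
    """Energy = squared reconstruction error, one value per row of x."""
    return np.sum((x - reconstruct(x, w)) ** 2, axis=1)

def langevin_samples(w, n, steps, step_size, temperature, rng, bound=3.0):
    """Approximate samples from p(x) proportional to exp(-E(x)/T) inside a box.

    The box and the clipping are assumptions that keep this toy model normalizable.
    """
    x = rng.uniform(-bound, bound, size=(n, w.size))
    for _ in range(steps):
        grad = 2.0 * (x - reconstruct(x, w))   # gradient of E w.r.t. x (w has unit norm)
        noise = rng.normal(size=x.shape)
        x = x - step_size * grad / temperature + np.sqrt(2.0 * step_size) * noise
        x = np.clip(x, -bound, bound)          # keep the chain inside the box
    return x

def sliced_wasserstein(a, b, n_proj, rng):
    """Sliced 1-Wasserstein distance between two equal-size samples.

    A stand-in (assumption) for a full optimal-transport computation:
    project onto random directions, where the 1D distance is the mean
    absolute difference of the sorted projections.
    """
    dirs = rng.normal(size=(n_proj, a.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    pa = np.sort(a @ dirs.T, axis=0)
    pb = np.sort(b @ dirs.T, axis=0)
    return float(np.mean(np.abs(pa - pb)))

rng = np.random.default_rng(0)
w = np.array([1.0, 1.0]) / np.sqrt(2.0)        # hypothetical "learned" direction

# Toy training data: points scattered closely around the line spanned by w.
data = rng.normal(size=(256, 1)) * w + 0.1 * rng.normal(size=(256, 2))

# Samples from the model's Boltzmann distribution, and their distance to the data.
# Training would adjust the autoencoder weights to shrink this quantity.
model_samples = langevin_samples(w, n=256, steps=200, step_size=0.01,
                                 temperature=0.05, rng=rng)
loss = sliced_wasserstein(data, model_samples, n_proj=32, rng=rng)

# Anomaly score: a point on the line has low energy, a point off the line has high energy.
scores = energy(np.array([[1.0, 1.0], [1.0, -1.0]]), w)
```

In this toy setting the energy vanishes everywhere along the line spanned by `w`, including far from the training data, so the model distribution spreads over the whole box in that direction. This is a small-scale picture of the outlier reconstruction problem mentioned in the abstract, and the distance between data and model samples is the quantity that training would reduce.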
