Event Tokenization and Next-Token Prediction for Anomaly Detection at the Large Hadron Collider

Ambre Visive, Polina Moskvitina, Clara Nellist, Roberto Ruiz de Austri, Sascha Caron

Published: 2025/9/30

Abstract

We propose a novel use of Large Language Models (LLMs) as unsupervised anomaly detectors in particle physics. Using lightweight LLM-like networks with encoder-based architectures trained to reconstruct background events via masked-token prediction, our method identifies anomalies through deviations in reconstruction performance, without prior knowledge of signal characteristics. Applied to searches for simultaneous four-top-quark production, this token-based approach shows competitive performance against established unsupervised methods and effectively captures subtle discrepancies in collider data, suggesting a promising direction for model-independent searches for new physics.
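The scoring idea described above can be illustrated with a minimal sketch. The sketch assumes that events are already tokenized into integer sequences and that a trained model exposes a hypothetical `predict(masked_tokens, position)` function, which returns a probability distribution over tokens for the masked position. The anomaly score used here is an illustrative choice and not necessarily the authors' exact score: each token is masked in turn, and the score is the mean negative log-likelihood of the true token. The toy unigram predictor is also hypothetical. It stands in for a trained encoder and ignores context entirely.

```python
import math
from collections import Counter
from typing import Callable, Sequence

MASK = -1  # hypothetical placeholder id for a masked position

# Hypothetical model interface: (masked token list, masked position) -> {token: probability}
Predictor = Callable[[list[int], int], dict[int, float]]


def masked_reconstruction_score(tokens: Sequence[int], predict: Predictor, eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of each true token when that token alone is masked.

    A model trained only on background reconstructs background-like events well,
    so a higher score suggests a more anomalous event.
    """
    nll = 0.0
    for i, true_token in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK
        probs = predict(masked, i)
        # Clamp to eps so that tokens the model never predicts give a large but finite penalty.
        nll -= math.log(max(probs.get(true_token, 0.0), eps))
    return nll / len(tokens)


def make_unigram_predictor(background_events: Sequence[Sequence[int]]) -> Predictor:
    """Toy stand-in for a trained encoder: background token frequencies, ignoring context."""
    counts = Counter(t for event in background_events for t in event)
    total = sum(counts.values())
    dist = {t: c / total for t, c in counts.items()}

    def predict(masked: list[int], position: int) -> dict[int, float]:
        return dist  # same distribution for every position and context

    return predict


if __name__ == "__main__":
    background = [[1, 2, 1], [1, 1, 2]]
    predict = make_unigram_predictor(background)
    print(masked_reconstruction_score([1, 1, 1], predict))  # ln(1.5), about 0.405
    print(masked_reconstruction_score([7, 7, 7], predict))  # about 27.6: token 7 never seen in background
```

A real encoder would condition on the unmasked tokens, and substituting one would change only `predict`. Masking one position per pass costs one forward pass per token. Masking several positions at once is a cheaper, though noisier, alternative.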
