LEMURS dataset: Large-scale multi-detector ElectroMagnetic Universal Representation of Showers

Peter McKeown, Piyush Raikwar, Anna Zaborowska

Published: 2025/9/5

Abstract

We present LEMURS: an extensive dataset of simulated calorimeter showers designed to support the development and benchmarking of fast simulation methods in high-energy physics, most notably providing a step towards the development of foundation models. This new dataset is more robust than the well-established CaloChallenge dataset 2, featuring substantially greater statistics, a wider range of incident angles in the detector, and, most crucially, multiple detector geometries (including more realistic calorimeters). The dataset is provided in HDF5 format, with a file structure inspired by the CaloChallenge shower representation while also including more variables. The scale and diversity of LEMURS make it particularly suitable for the development of foundation models, and it has been used in the CaloDiT-2 model, a pre-trained model released in the community-standard simulation toolkit Geant4 (version 11.4.beta). All data and code for generation and analysis are openly accessible, facilitating reproducibility and reuse across the community.
