Benchmarking atmospheric circulation variability in an AI emulator, ACE2, and a hybrid model, NeuralGCM

Ian Baxter, Hamid Pahlavan, Pedram Hassanzadeh, Katharine Rucker, Tiffany Shaw

Published: 2025/10/6

Abstract

Physics-based atmosphere-land models with prescribed sea surface temperature have notable successes but also biases in their ability to represent atmospheric variability compared to observations. Recently, AI emulators and hybrid models have emerged with the potential to overcome these biases, but still require systematic evaluation against metrics grounded in fundamental atmospheric dynamics. Here, we evaluate the representation of four atmospheric variability benchmarking metrics in a fully data-driven AI emulator (ACE2-ERA5) and hybrid model (NeuralGCM). The hybrid model and emulator can capture the spectra of large-scale tropical waves and extratropical eddy-mean flow interactions, including critical levels. However, both struggle to capture the timescales associated with quasi-biennial oscillation (QBO, $\sim 28$ months) and Southern annular mode propagation ($\sim 150$ days). These dynamical metrics serve as an initial benchmarking tool to inform AI model development and understand their limitations, which may be essential for out-of-distribution applications (e.g., extrapolating to unseen climates).

Benchmarking atmospheric circulation variability in an AI emulator, ACE2, and a hybrid model, NeuralGCM | SummarXiv | SummarXiv