3D Blocking for Matrix-free Smoothers in 2D Variable-Viscosity Stokes Equations with Applications to Geodynamics

Marcel Ferrari, Cyrill Püntener, Alexander Sotoudeh, Niklas Viebig

Published: 2025/9/23

Abstract

We present the design, implementation, and evaluation of optimized matrix-free stencil kernels for multigrid smoothing in the incompressible Stokes equations with variable viscosity, motivated by geophysical flow problems. We investigate five smoother variants derived from different optimisation strategies: Red-Black Gauss-Seidel, Jacobi, fused Jacobi, blocked fused Jacobi, and a novel Jacobi smoother with RAS-type temporal blocking, a strategy that applies local iterations on overlapping tiles to improve cache reuse. To ensure correctness, we introduce an energy-based residual norm that balances velocity and pressure contributions, and validate all implementations using a high-contrast sinker benchmark representative of realistic geodynamic numerical models. Our performance study on NVIDIA GH200 Grace Hopper nodes of the ALPS supercomputer demonstrates that all smoothers scale well within a single NUMA domain, but the RAS-Jacobi smoother consistently achieves the best performance at higher core counts. It sustains over 90% weak-scaling efficiency up to 64 cores and delivers up to a threefold speedup compared to the C++ Jacobi baseline, owing to improved cache reuse and reduced memory traffic. These results show that temporal blocking, already employed in distributed-memory solvers to reduce communication, can also provide substantial benefits at the socket and NUMA level. This work highlights the importance of cache-aware stencil design for harnessing modern heterogeneous architectures and lays the groundwork for extending RAS-type temporal blocking strategies to three-dimensional problems and GPU accelerators.

3D Blocking for Matrix-free Smoothers in 2D Variable-Viscosity Stokes Equations with Applications to Geodynamics | SummarXiv | SummarXiv