Disaggregated Design for GPU-Based Volumetric Data Structures
Massimiliano Meneghin, Ahmed H. Mahmoud
Published: 2025/3/10
Abstract
Volumetric data structures typically prioritize data locality, focusing on efficient memory access patterns. Focusing on locality alone can neglect other critical performance factors, such as occupancy, communication, and kernel fusion. We introduce a novel \emph{disaggregated} design that rebalances the trade-offs between locality and these objectives -- reducing communication overhead on distributed memory architectures, mitigating register pressure arising from complex boundary conditions, and enabling kernel fusion. We provide a thorough analysis of its benefits on a single-node multi-GPU Lattice Boltzmann Method (LBM) solver. Our evaluation spans dense, block-sparse, and multi-resolution discretizations, demonstrating our design's flexibility and efficiency. Leveraging this approach, we achieve up to a $3\times$ speedup over state-of-the-art solutions.