Generating coupled cluster code for modern distributed memory tensor software

Jan Brandejs, Johann Pototschnig, Trond Saue

Published: 2024/9/10

Abstract

Using GPU-based HPC platforms efficiently for coupled cluster computations is a challenge due to heterogeneous hardware structures. The constant need to adapt software to these structures, and the man-hours this requires, make a systematization of high-performance code development desirable, even more so for higher-order coupled cluster. This is generally achieved by introducing a high-level representation of the problem, which is then translated to low-level instructions for the hardware using a compiler/translator component. Designing such software comes with another challenge: allowing efficient implementation by capturing key symmetries of tensors, while retaining the abstraction from the hardware. We review ways to address these two challenges while presenting the design decisions that led us to develop a general-order coupled cluster code generator. The systematically produced code shows excellent weak scaling behavior running on up to 1200 GPUs using the distributed memory tensor library ExaTENSOR. We present an open-source modular tensor framework, "tenpi", for coupled cluster code development with diagrammatic derivation, a visualization module, symbolic algebra, intermediate optimization and support for multiple tensor backends. Tenpi brings higher-order coupled cluster functionality to the massively parallel ExaCorr module of the DIRAC code for relativistic molecular calculations.
