Network-accelerated Active Messages

Md Ashfaqur Rahaman, Alireza Sanaee, Todd Thornley, Sebastiano Miano, Gianni Antichi, Brent E. Stephens, Ryan Stutsman

Published: 2025/9/9

Abstract

Remote Direct Memory Access (RDMA) improves host networking performance by eliminating software and server CPU involvement. However, RDMA has a limited set of operations, is difficult to program, and often requires multiple round trips to perform simple application operations. Programmable SmartNICs provide a different means to offload work from host CPUs to a NIC. This leaves applications with a complex choice: embedding logic as RPC handlers at servers, using RDMA's limited interface to access server structures via client-side logic, or running some logic on SmartNICs. The best choice varies between workloads and over time. To solve this dilemma, we present NAAM, network-accelerated active messages. NAAM applications specify small, portable eBPF functions associated with messages. Each message specifies what data it accesses using an RDMA-like interface. NAAM runs at various places in the network, including at clients, on server-attached SmartNICs, and on server host CPU cores. Due to eBPF's portability, the code associated with a message can be run at any of these locations. Hence, the NAAM runtime can dynamically steer any message to execute its associated logic wherever it makes the most sense. To demonstrate NAAM's flexibility, we built several applications, including the MICA hash table and lookups in a Cell-style B-tree. Using an NVIDIA BlueField-2 SmartNIC and its NIC-embedded switch, NAAM can run any of these operations on client, server, and NIC cores, shifting load within tens of milliseconds when server compute becomes congested. NAAM dynamically offloads up to 1.8 million MICA ops/s for YCSB-B and 750,000 Cell lookups/s from server CPUs. Finally, whereas iPipe, the state-of-the-art SmartNIC offload framework, scales to only 8 application offloads on BlueField-2, NAAM scales to hundreds of application offloads with minimal impact on tail latency due to eBPF's low overhead.
