Zero-Waiting Load Balancing with Heterogeneous Servers in Heavy Traffic
Xin Liu, Lei Ying
Published: 2025/9/28
Abstract
We study the steady-state delay performance of load balancing in large-scale systems with heterogeneous servers in heavy-traffic regimes. The system consists of $N$ servers, each with a local buffer of size $b-1$, serving jobs in first-in-first-out (FIFO) order. Jobs arrive according to a Poisson process with rate $\lambda N$, where $\lambda = 1 - N^{-\alpha}$ for any $\alpha \in (0,1)$. Service times are exponentially distributed with fully heterogeneous rates: the service rate may differ from server to server and may scale with the system size $N$. We study a queue-length-aware and service-rate-aware load balancing policy, Join-the-Fastest-Shortest-Queue (JFSQ), and show that it achieves asymptotically zero waiting time and zero waiting probability in heavy traffic, in both the Sub-Halfin-Whitt ($\alpha \in (0,0.5)$) and Super-Halfin-Whitt ($\alpha \in [0.5,1)$) regimes. The bounds on the waiting time and waiting probability explicitly capture the convergence rate with respect to the system size $N$ and show the negative effect of server heterogeneity. Our analysis builds on the general framework of Stein's method with iterative state-space peeling, in which we design a sequence of Lyapunov functions to analyze the high-dimensional heterogeneous system without assuming exchangeability or monotonicity. The analysis shows that JFSQ efficiently utilizes servers with higher capacities and that the steady-state system can be coupled with a single-server queue via Stein's method. To the best of our knowledge, this is the first work to establish delay performance bounds for a load-balancing system of size $N$ with fully heterogeneous servers in heavy traffic.
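
As an illustration of the setting described above, the following is a minimal Python sketch of a single JFSQ routing decision and of the heavy-traffic load $\lambda = 1 - N^{-\alpha}$. The abstract does not spell out the routing rule, so the sketch assumes that JFSQ sends an arriving job to a shortest queue and breaks ties in favor of the fastest server, that a server holds at most $b$ jobs (one in service plus $b-1$ in its buffer), and that a job is dropped when every server is full. The function names `heavy_traffic_load` and `jfsq_route` are hypothetical and chosen only for this example.

```python
from typing import Optional, Sequence


def heavy_traffic_load(n: int, alpha: float) -> float:
    """Per-server load lambda = 1 - N^(-alpha), with alpha in (0, 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return 1.0 - n ** (-alpha)


def jfsq_route(
    queue_lengths: Sequence[int],
    service_rates: Sequence[float],
    b: int,
) -> Optional[int]:
    """Pick a server for one arriving job under the assumed JFSQ rule.

    queue_lengths[i] counts all jobs at server i, including the one in service.
    Assumption: a server with b jobs is full (1 in service + b - 1 buffered).
    Returns the chosen server index, or None if every server is full
    (the job is dropped, which is also an assumption of this sketch).
    """
    if len(queue_lengths) != len(service_rates):
        raise ValueError("queue_lengths and service_rates must have equal length")

    # Only servers with spare room can accept the job.
    candidates = [i for i, q in enumerate(queue_lengths) if q < b]
    if not candidates:
        return None

    # Shortest queue first; among equally short queues, the highest service rate.
    return min(candidates, key=lambda i: (queue_lengths[i], -service_rates[i]))


if __name__ == "__main__":
    # Three servers: server 0 is fastest but busy; servers 1 and 2 are idle.
    print(jfsq_route([1, 0, 0], [3.0, 1.0, 2.0], b=2))
    # Load for N = 100 at the Halfin-Whitt boundary alpha = 0.5.
    print(heavy_traffic_load(100, 0.5))
```

In the first call, servers 1 and 2 tie for the shortest queue, so the assumed rule selects server 2 because its service rate is higher. Sorting on the pair (queue length, negated rate) keeps the rule queue-length-aware first and service-rate-aware second.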