Almost Sure Convergence for the Last Iterate of Stochastic Gradient Descent Schemes

Marcel Hudiani

Published: 2025/7/9

Abstract

We study the almost sure convergence rate of the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting, when the objective function $F$ is either globally convex or non-convex with a $\gamma$-Hölder continuous gradient. Using only the discrete Gronwall inequality, without the Robbins-Siegmund theorem, we recover results for both SGD and SHB: for non-convex objectives, $\min_{s\leq t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$ almost surely; for convex objectives, $F(w_t) - F_* = o(t^{2\gamma/(1+\gamma) \cdot \max(p-1,\,-2p+1)-\epsilon})$ for any $\epsilon > 0$ and momentum parameter $\beta \in (0, 1)$, and $\min_{s \leq t} F(w_s) - F_* = o(t^{p-1})$ almost surely. In addition, we prove that SHB with a constant momentum parameter $\beta \in (0, 1)$ attains the convergence rate $F(w_t) - F_* = O(t^{\max(p-1,\,-2p+1)} \log^2 \frac{t}{\delta})$ with probability at least $1-\delta$ when $F$ is convex, $\gamma = 1$, and the step size is $\alpha_t = \Theta(t^{-p})$ with $p \in (\frac{1}{2}, 1)$.
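To make the setting concrete, below is a minimal sketch (not from the paper) of the SHB recursion with the polynomially decaying step size $\alpha_t = \Theta(t^{-p})$ discussed above. The update form $w_{t+1} = w_t - \alpha_t g_t + \beta(w_t - w_{t-1})$ is the standard heavy-ball iteration; the constants, Gaussian noise model, and quadratic test objective are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def shb(grad, w0, T=10_000, p=0.75, c=0.1, beta=0.9, sigma=0.01, seed=0):
    """Stochastic heavy ball with step size alpha_t = c * t**(-p), p in (1/2, 1).

    `grad` is the gradient of F; additive Gaussian noise stands in for the
    stochastic oracle (an assumption for illustration, not the paper's model).
    """
    rng = np.random.default_rng(seed)
    w_prev = w = np.asarray(w0, dtype=float)
    for t in range(1, T + 1):
        alpha = c * t ** (-p)                               # alpha_t = Theta(t^{-p})
        g = grad(w) + sigma * rng.standard_normal(w.shape)  # noisy gradient
        # Standard SHB update: w_{t+1} = w_t - alpha_t g_t + beta (w_t - w_{t-1})
        w, w_prev = w - alpha * g + beta * (w - w_prev), w
    return w

# Illustrative run on the convex quadratic F(w) = ||w||^2 / 2 (gradient is w);
# setting beta = 0 reduces the scheme to plain SGD with the same step schedule.
w_last = shb(grad=lambda w: w, w0=[5.0, -3.0])
print(w_last)  # the last iterate w_T, which the paper's rates concern
```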
