Kavli Affiliate: Yi Zhou
| First 5 Authors: Yufeng Yang, , , ,
| Summary:
Recent studies have shown that many nonconvex machine learning problems
satisfy a generalized-smooth condition that extends beyond traditional smooth
nonconvex optimization. However, existing algorithms are not fully adapted
to such generalized-smooth nonconvex geometry and face significant
technical limitations in their convergence analyses. In this work, we first
analyze the convergence of adaptively normalized gradient descent under
function geometries characterized by generalized smoothness and a generalized
PL condition, revealing the advantage of adaptive gradient normalization.
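For concreteness, a typical adaptively normalized gradient step (illustrative notation; not necessarily the exact update analyzed in the paper) takes the form
\[
x_{t+1} = x_t - \frac{\gamma}{\|\nabla f(x_t)\| + \beta}\,\nabla f(x_t),
\]
where $\gamma > 0$ is the step size and $\beta > 0$ prevents division by zero. The normalization shrinks the step where the gradient is large, which is why such updates suit generalized-smooth objectives whose effective smoothness constant grows with the gradient norm.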
Our results provide theoretical insights into adaptive normalization across
various scenarios. For stochastic generalized-smooth nonconvex optimization, we
propose the Independent-Adaptively Normalized Stochastic Gradient Descent
algorithm, which leverages adaptive gradient normalization, independent
sampling, and gradient clipping to achieve an
$\mathcal{O}(\epsilon^{-4})$ sample complexity under
relaxed noise assumptions. Experiments on large-scale nonconvex
generalized-smooth problems demonstrate the fast convergence of our algorithm.
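To make the three ingredients concrete, below is a minimal NumPy sketch of one such update on a toy least-squares objective. The function names, hyperparameters (lr, beta, clip), and the way the two mini-batches are combined are illustrative assumptions, not the authors' exact algorithm or experimental setup.

```python
import numpy as np

# Toy least-squares problem used only to make the sketch runnable.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
y = rng.standard_normal(50)

def minibatch_grad(x, idx):
    # Mini-batch gradient of the toy objective 0.5 * ||A x - y||^2 / n.
    Ab, yb = A[idx], y[idx]
    return Ab.T @ (Ab @ x - yb) / len(idx)

def step(x, lr=0.1, beta=1.0, clip=5.0, batch=8):
    # Draw two independent mini-batches: one for the gradient estimate,
    # one for the normalization factor (the "independent sampling" idea).
    idx_g = rng.choice(len(y), size=batch, replace=False)
    idx_n = rng.choice(len(y), size=batch, replace=False)
    g = np.clip(minibatch_grad(x, idx_g), -clip, clip)   # elementwise gradient clipping
    norm_est = np.linalg.norm(minibatch_grad(x, idx_n))  # norm from the independent batch
    return x - lr / (norm_est + beta) * g                # adaptively normalized update

x = np.zeros(5)
for _ in range(200):
    x = step(x)
print("final gradient norm:", np.linalg.norm(A.T @ (A @ x - y) / len(y)))
```

Estimating the normalization factor on a batch drawn independently of the one used for the gradient estimate keeps the step size and the gradient direction statistically decoupled, which is the role independent sampling plays in the abstract's description.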
| Search Query: ArXiv Query: search_query=au:"Yi Zhou"&id_list=&start=0&max_results=3