Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance

Kavli Affiliate: Yi Zhou

| First 5 Authors: Qi Zhang, Yi Zhou, Shaofeng Zou

| Summary:

This paper provides the first tight convergence analyses for RMSProp and Adam
in non-convex optimization under the most relaxed assumptions of
coordinate-wise generalized smoothness and affine noise variance. We first
analyze RMSProp, which is a special case of Adam with adaptive learning rates
but without first-order momentum. Specifically, to address the challenges
arising from the dependence among the adaptive update, the unbounded gradient
estimate, and the Lipschitz constant, we demonstrate that the first-order term
in the descent lemma converges and that its denominator is upper bounded by a
function of the gradient norm. Based on this result, we show that RMSProp with
proper hyperparameters converges to an $\epsilon$-stationary point with an
iteration complexity of $\mathcal{O}(\epsilon^{-4})$. We then generalize our
analysis to Adam, where the additional challenge is due to a mismatch between
the gradient and the first-order momentum. We develop a new upper bound on the
first-order term in the descent lemma, which is also a function of the gradient
norm. We show that Adam with proper hyperparameters converges to an
$\epsilon$-stationary point with an iteration complexity of
$\mathcal{O}(\epsilon^{-4})$. Our complexity results for both RMSProp and Adam
match the complexity lower bound established in \cite{arjevani2023lower}.
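
For context, below is a minimal sketch of the standard (bias-correction-free) RMSProp and Adam updates, illustrating the relationship the summary relies on: RMSProp is the special case of Adam without first-order momentum. The hyperparameter names and the stabilization constant `xi` are illustrative conventions, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's algorithm): standard RMSProp and Adam
# updates on a parameter vector x, given a stochastic gradient `grad`.
# Hyperparameter names (eta, beta1, beta2, xi) are illustrative defaults.
import numpy as np

def rmsprop_step(x, grad, v, eta=1e-3, beta2=0.999, xi=1e-8):
    # Second-moment estimate only: per-coordinate adaptive learning rate.
    v = beta2 * v + (1 - beta2) * grad**2
    x = x - eta * grad / (np.sqrt(v) + xi)
    return x, v

def adam_step(x, grad, m, v, eta=1e-3, beta1=0.9, beta2=0.999, xi=1e-8):
    # First-order momentum m is what creates the gradient/momentum mismatch
    # mentioned in the summary; setting beta1 = 0 recovers the RMSProp step.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    x = x - eta * m / (np.sqrt(v) + xi)
    return x, m, v

# Example: one Adam step on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.ones(3)
m, v = np.zeros(3), np.zeros(3)
x, m, v = adam_step(x, x.copy(), m, v)
```

Setting `beta1 = 0` in `adam_step` reproduces `rmsprop_step`, which is the structural relationship that lets the RMSProp analysis be extended to Adam.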

| Search Query: ArXiv Query: search_query=au:"Yi Zhou"&id_list=&start=0&max_results=3
