Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization

Kavli Affiliate: Yi Zhou

| Summary:

Various optimal gradient-based algorithms have been developed for smooth
nonconvex optimization. However, many nonconvex machine learning problems do
not belong to the class of smooth functions, so the existing algorithms are
suboptimal for them. Instead, these problems have been shown to satisfy
certain generalized-smooth conditions, which have not been well understood in
the existing literature. In this paper, we propose a notion of
$\alpha$-symmetric generalized-smoothness that substantially extends the
existing notions and covers many important functions such as high-order
polynomials and exponential functions. We study the fundamental properties of
the functions in this class and establish descent lemmas for them. Then, to
solve such a large class of nonconvex problems, we design a special
deterministic normalized gradient descent algorithm that achieves the optimal
iteration complexity $\mathcal{O}(\epsilon^{-2})$, and we also prove that the
popular SPIDER variance reduction algorithm achieves the optimal sample
complexity $\mathcal{O}(\epsilon^{-3})$. Our results show that solving
generalized-smooth nonconvex problems is as efficient as solving smooth
nonconvex problems.
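
For context, the notion above extends prior generalized-smooth conditions; one widely used prior form, often called $(L_0, L_1)$-smoothness, bounds the Hessian norm by an affine function of the gradient norm (shown here only for intuition, not as the paper's $\alpha$-symmetric definition):

$$\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x.$$

For example, $f(x) = x^4$ is not globally Lipschitz-smooth, yet $|f''(x)| = 12x^2 \le 12 + 3\,|4x^3| = 12 + 3\,|f'(x)|$, so it satisfies this condition with $(L_0, L_1) = (12, 3)$; the precise $\alpha$-symmetric condition and its properties are developed in the paper.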

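The deterministic algorithm mentioned in the summary follows the normalized gradient descent template. The sketch below is a generic, illustrative version of that template, not the paper's specific method; the objective, step size, and the small constant `beta` are our own example choices.

```python
import numpy as np

def normalized_gd(grad, x0, step_size=0.05, beta=1e-8, num_iters=1000):
    """Generic normalized gradient descent (illustrative sketch only):
        x_{k+1} = x_k - step_size * g_k / (||g_k|| + beta),
    where g_k = grad(x_k) and beta keeps the update well-defined when g_k = 0.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad(x)
        x = x - step_size * g / (np.linalg.norm(g) + beta)
    return x

# Example objective: f(x) = ||x||^4, a high-order polynomial that is
# generalized-smooth but not globally Lipschitz-smooth; its gradient is 4 ||x||^2 x.
if __name__ == "__main__":
    grad = lambda x: 4.0 * np.dot(x, x) * x
    x_final = normalized_gd(grad, x0=np.array([2.0, -1.5]))
    print(x_final)  # expected to end up near the origin (within about step_size)
```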

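The summary also refers to the SPIDER variance reduction algorithm for the stochastic setting. Below is a generic SPIDER-style sketch with normalized steps on a toy finite-sum least-squares problem, purely for illustration; the epoch length, batch size, step size, and the least-squares example are our own assumptions, not the configuration analyzed in the paper.

```python
import numpy as np

def spider(grad_batch, n, x0, step_size=0.05, epoch_len=20, batch_size=8,
           num_iters=300, seed=0):
    """SPIDER-style recursive gradient estimator with normalized steps
    (illustrative sketch only).

    grad_batch(x, idx): mean gradient of the component functions indexed by idx.
    Every `epoch_len` iterations the estimator v is reset with a full gradient;
    otherwise it is updated with stochastic gradient differences:
        v <- v + grad_batch(x_k, idx) - grad_batch(x_{k-1}, idx).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    v = np.zeros_like(x)                          # overwritten at k = 0
    for k in range(num_iters):
        if k % epoch_len == 0:
            v = grad_batch(x, np.arange(n))       # periodic full-gradient refresh
        else:
            idx = rng.integers(0, n, size=batch_size)
            v = v + grad_batch(x, idx) - grad_batch(x_prev, idx)
        x_prev = x.copy()
        x = x - step_size * v / (np.linalg.norm(v) + 1e-8)  # normalized step
    return x

# Toy finite sum: f(x) = (1/n) * sum_i (a_i^T x - b_i)^2.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 200, 5
    A = rng.normal(size=(n, d))
    x_true = rng.normal(size=d)
    b = A @ x_true
    grad_batch = lambda x, idx: 2.0 * A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
    x_hat = spider(grad_batch, n, x0=np.zeros(d))
    print(np.linalg.norm(x_hat - x_true))  # should be far smaller than ||x_true||
```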