SYNTHESIS: A Semi-Asynchronous Path-Integrated Stochastic Gradient Method for Distributed Learning in Computing Clusters

Kavli Affiliate: Jia Liu

| First 5 Authors: Zhuqing Liu, Xin Zhang, Jia Liu

| Summary:

To increase the training speed of distributed learning, recent years have
witnessed a significant amount of interest in developing both synchronous and
asynchronous distributed stochastic variance-reduced optimization methods.
However, all existing synchronous and asynchronous distributed training
algorithms suffer from various limitations in either convergence speed or
implementation complexity. This motivates us to propose an algorithm called
SYNTHESIS (semi-asynchronous path-integrated stochastic gradient search),
which leverages the special structure of the variance-reduction framework to
overcome the limitations of both synchronous and asynchronous distributed
learning algorithms, while retaining their salient features. We consider two
implementations of SYNTHESIS under distributed and shared memory
architectures. We show that our SYNTHESIS algorithms have
$O(\sqrt{N}\epsilon^{-2}(\Delta+1)+N)$ and
$O(\sqrt{N}\epsilon^{-2}(\Delta+1)d+N)$ computational complexities for
achieving an $\epsilon$-stationary point in non-convex learning under
distributed and shared memory architectures, respectively, where $N$ denotes
the total number of training samples and $\Delta$ represents the maximum
delay of the workers. Moreover, we investigate the generalization performance
of SYNTHESIS by establishing algorithmic stability bounds for quadratic
strongly convex and non-convex optimization. We further conduct extensive
numerical experiments to verify our theoretical findings.
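
For context, the "path-integrated" stochastic gradient in the summary refers to a recursive variance-reduced estimator in the SPIDER/SARAH family, where each stochastic gradient is corrected by gradient differences accumulated along the path of iterates. The sketch below illustrates that general estimator in serial Python; it is an illustration of the underlying framework only, not the paper's SYNTHESIS implementation, and it omits the semi-asynchronous worker model and the maximum delay $\Delta$. All function and variable names are hypothetical.

```python
import numpy as np

def path_integrated_sgd(grad_batch, x0, n_samples, n_epochs=10,
                        step_size=0.05, batch_size=8, rng=None):
    """Serial SPIDER/SARAH-style (path-integrated) variance-reduced descent.

    grad_batch(x, idx) must return the average gradient of the component
    losses indexed by `idx`, evaluated at `x`.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(n_epochs):
        # Anchor step: full gradient over all N samples.
        v = grad_batch(x, np.arange(n_samples))
        x_prev, x = x.copy(), x - step_size * v
        # Inner loop: recursively update the estimator along the iterate path,
        #   v_t = grad(x_t; batch) - grad(x_{t-1}; batch) + v_{t-1},
        # keeping the variance small without recomputing the full gradient.
        for _ in range(n_samples // batch_size):
            idx = rng.choice(n_samples, size=batch_size, replace=False)
            v = grad_batch(x, idx) - grad_batch(x_prev, idx) + v
            x_prev, x = x.copy(), x - step_size * v
    return x

# Toy usage: least squares, f(x) = (1/2N) * ||A x - b||^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(200, 5)), rng.normal(size=200)
grad = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
x_hat = path_integrated_sgd(grad, np.zeros(5), n_samples=200, rng=rng)
```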

| Search Query: ArXiv Query: search_query=au:"Jia Liu"&id_list=&start=0&max_results=10
