Adjacent Leader Decentralized Stochastic Gradient Descent

Kavli Affiliate: Jing Wang

| First 5 Authors: Haoze He, Jing Wang, Anna Choromanska

| Summary:

This work focuses on the decentralized deep learning optimization framework.
We propose Adjacent Leader Decentralized Stochastic Gradient Descent (AL-DSGD) to
improve final model performance, accelerate convergence, and reduce the
communication overhead of decentralized deep learning optimizers. AL-DSGD
relies on two main ideas. First, to increase the influence of the strongest
learners on the learning system, it assigns weights to different neighbor
workers according to both their performance and their degree when averaging among
them, and it applies a corrective force on the workers dictated by both the
currently best-performing neighbor and the neighbor with the maximal degree
(see the sketch after this summary). Second, to alleviate the deterioration of the
convergence speed and performance of the nodes with lower degrees, AL-DSGD relies on
dynamic communication graphs, which effectively allow the workers to
communicate with more nodes while keeping the node degrees low.
Experiments demonstrate that AL-DSGD accelerates the convergence of
decentralized state-of-the-art techniques and improves their test performance,
especially in communication-constrained environments. We also theoretically
prove the convergence of the proposed scheme. Finally, we release to the
community a highly general and concise PyTorch-based library for distributed
training of deep learning models that supports easy implementation of any
distributed deep learning approach ((a)synchronous, (de)centralized).
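To make the two ideas concrete, below is a minimal sketch of a single AL-DSGD-style update for one worker, written in plain PyTorch. It is an illustration under assumptions, not the authors' released library or exact algorithm: the `Worker` container and its `params`, `grad`, `val_loss`, and `degree` attributes, the `mixing_weights` list, and the `pull_coeff` coefficient are hypothetical names introduced here to show where each ingredient of the summary enters the update.

```python
import torch


class Worker:
    """Hypothetical per-node state (not the paper's API): a flat parameter
    vector, the latest local stochastic gradient, a recent validation loss,
    and the node's degree in the current communication graph."""

    def __init__(self, params, grad, val_loss, degree):
        self.params, self.grad = params, grad
        self.val_loss, self.degree = val_loss, degree


def al_dsgd_step(worker, neighbors, mixing_weights, lr=0.1, pull_coeff=0.05):
    """One AL-DSGD-style update for a single worker (illustrative sketch)."""
    # (1) Performance- and degree-aware averaging over the neighborhood:
    #     stronger / better-connected neighbors receive larger weights.
    w_self = 1.0 - sum(mixing_weights)
    mixed = w_self * worker.params
    for nb, w in zip(neighbors, mixing_weights):
        mixed = mixed + w * nb.params

    # (2) Corrective force pulling toward the currently best-performing
    #     neighbor and the neighbor with the maximal degree.
    best_nb = min(neighbors, key=lambda nb: nb.val_loss)
    hub_nb = max(neighbors, key=lambda nb: nb.degree)
    pull = pull_coeff * ((best_nb.params - worker.params)
                         + (hub_nb.params - worker.params))

    # (3) Local stochastic gradient step on top of the mixed parameters.
    worker.params = mixed + pull - lr * worker.grad
    return worker.params


def next_topology(topologies, step):
    # Dynamic communication graphs: cycle through a set of sparse
    # topologies so each worker periodically meets different neighbors
    # while every individual graph keeps node degrees low.
    return topologies[step % len(topologies)]


if __name__ == "__main__":
    d = 4
    me = Worker(torch.zeros(d), torch.randn(d), val_loss=0.9, degree=2)
    nbs = [Worker(torch.randn(d), torch.randn(d), 0.7, 3),
           Worker(torch.randn(d), torch.randn(d), 0.8, 2)]
    al_dsgd_step(me, nbs, mixing_weights=[0.25, 0.25])
```

In the full method the mixing weights, corrective coefficients, and the schedule over the dynamic graph set follow the paper's scheme; the sketch only indicates where performance-and-degree-weighted averaging, the corrective pull, and the topology rotation enter the update.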

| Search Query: ArXiv Query: search_query=au:”Jing Wang”&id_list=&start=0&max_results=3
