Kavli Affiliate: Jia Liu
| First 5 Authors: Srijith Nair, Michael Lin, Peizhong Ju, Amirreza Talebi, Elizabeth Serena Bentley
| Summary:
Collaborative training methods like Federated Learning (FL) and Split
Learning (SL) enable distributed machine learning without sharing raw data.
However, FL assumes clients can train entire models, which is infeasible for
large-scale models. In contrast, while SL alleviates the client memory
constraint in FL by offloading most training to the server, it increases
network latency due to its sequential nature. Other methods address the
conundrum by using local loss functions for parallel client-side training to
improve efficiency, but they lack server feedback and may therefore suffer poor
accuracy. We propose FSL-SAGE (Federated Split Learning via Smashed Activation
Gradient Estimation), a new federated split learning algorithm that estimates
server-side gradient feedback via auxiliary models. These auxiliary models
periodically adapt to emulate server behavior on local datasets. We show that
FSL-SAGE achieves a convergence rate of $\mathcal{O}(1/\sqrt{T})$, where $T$ is
the number of communication rounds. This result matches FedAvg, while
significantly reducing communication costs and client memory requirements. Our
empirical results also verify that it outperforms existing state-of-the-art FSL
methods, offering both communication efficiency and accuracy.
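The core idea in the summary can be illustrated with a minimal sketch (not the authors' released code): each client trains against a lightweight auxiliary model that stands in for the server, and that auxiliary model is periodically adapted so that its gradient with respect to the smashed activations tracks the true server-side gradient. The layer sizes, optimizers, and the gradient-matching alignment loss below are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of an FSL-SAGE-style client step; all module shapes and the
# alignment objective are assumptions made for illustration.
import torch
import torch.nn as nn

client_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # client-side split
server_model = nn.Sequential(nn.Linear(64, 10))             # server-side split
aux_model = nn.Sequential(nn.Linear(64, 10))                 # lightweight auxiliary model

client_opt = torch.optim.SGD(client_model.parameters(), lr=0.1)
aux_opt = torch.optim.SGD(aux_model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def local_step(x, y):
    """Client update using the auxiliary model's gradient estimate,
    avoiding a synchronous round trip to the server."""
    client_opt.zero_grad()
    smashed = client_model(x)                # smashed activations at the cut layer
    loss = criterion(aux_model(smashed), y)  # auxiliary model emulates the server
    loss.backward()                          # server-side gradient feedback is estimated locally
    client_opt.step()

def align_auxiliary(x, y):
    """Periodic adaptation: nudge the auxiliary model so its gradient w.r.t.
    the smashed activations matches the true server-side gradient."""
    smashed = client_model(x).detach().requires_grad_(True)
    server_loss = criterion(server_model(smashed), y)
    true_grad, = torch.autograd.grad(server_loss, smashed)

    smashed_aux = smashed.detach().requires_grad_(True)
    aux_loss = criterion(aux_model(smashed_aux), y)
    est_grad, = torch.autograd.grad(aux_loss, smashed_aux, create_graph=True)

    aux_opt.zero_grad()
    ((est_grad - true_grad) ** 2).mean().backward()  # match gradients, not outputs
    aux_opt.step()

# Toy usage with random data.
x, y = torch.randn(8, 32), torch.randint(0, 10, (8, ))
local_step(x, y)
align_auxiliary(x, y)
```

Because the auxiliary model is much smaller than the server-side split, clients can run many local steps in parallel and only occasionally exchange activations and gradients for alignment, which is where the claimed communication savings come from.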
| Search Query: ArXiv Query: search_query=au:"Jia Liu"&id_list=&start=0&max_results=3