Federated XGBoost on Sample-Wise Non-IID Data

Kavli Affiliate: Yi Zhou

| First 5 Authors: Katelinh Jones, Yuya Jeremy Ong, Yi Zhou, Nathalie Baracaldo

| Summary:

Federated Learning (FL) is a paradigm for jointly training machine learning models in a decentralized manner: parties communicate with an aggregator to build and train a shared model without exposing their local raw data. Most FL research has focused on neural network-based approaches, while tree-based methods such as XGBoost remain underexplored because the iterative, additive nature of boosting is difficult to reconcile with federated training. Decision tree-based models, and XGBoost in particular, can handle non-IID data, which matters in FL: since the data are decentralized by design, they are inherently at risk of being non-IID. In this paper, we investigate how Federated XGBoost is affected by non-IID distributions through experiments on a range of sample size-based data skew scenarios. We conduct extensive experiments across multiple datasets and data skew partitions. Our results demonstrate that, regardless of the partition ratio, model performance remained consistent and was close to, or on par with, that of models trained in a centralized manner.
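As a rough, self-contained illustration of the sample size-based skew setup described above (not the paper's code or its federated aggregation protocol), the sketch below partitions one dataset across three hypothetical parties at unequal ratios, trains a local XGBoost model per shard, and compares each against a centralized baseline. The dataset, the 60/30/10 ratios, and the hyperparameters are all assumptions chosen for demonstration.

```python
# Minimal sketch of a sample size-based data skew partition (illustrative only):
# one dataset is split across parties at unequal ratios, and per-party models
# stand in for the local side of a federated setting.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumed skew: party sample sizes follow fixed ratios rather than an even split.
ratios = np.array([0.6, 0.3, 0.1])
rng = np.random.default_rng(0)
idx = rng.permutation(len(X_train))
split_points = (np.cumsum(ratios)[:-1] * len(idx)).astype(int)
party_indices = np.split(idx, split_points)

params = {"objective": "binary:logistic", "eval_metric": "error"}

# Centralized baseline: one model trained on the pooled training data.
central = xgb.train(params, xgb.DMatrix(X_train, label=y_train),
                    num_boost_round=50)

# Per-party models trained on the skewed shards.
for i, part in enumerate(party_indices):
    booster = xgb.train(params,
                        xgb.DMatrix(X_train[part], label=y_train[part]),
                        num_boost_round=50)
    preds = (booster.predict(xgb.DMatrix(X_test)) > 0.5).astype(int)
    print(f"party {i} ({len(part)} samples): acc={accuracy_score(y_test, preds):.3f}")

central_preds = (central.predict(xgb.DMatrix(X_test)) > 0.5).astype(int)
print(f"centralized: acc={accuracy_score(y_test, central_preds):.3f}")
```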

| Search Query: ArXiv Query: search_query=au:"Yi Zhou"&id_list=&start=0&max_results=10
