Research article | Open access | DOI: 10.1145/3458817.3476211

FedAT: a high-performance and communication-efficient federated learning system with asynchronous tiers

Published: 13 November 2021

Abstract

Federated learning (FL) trains a model over a massive number of distributed devices while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, and raises two new challenges: (1) the straggler problem, where clients lag behind due to data or resource (compute and network) heterogeneity, and (2) the communication bottleneck, where a large number of clients communicate their local updates to a central server and overwhelm it. Many existing FL methods optimize along only a single dimension of this tradeoff space. Existing solutions use asynchronous model updating or tiering-based synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a communication bottleneck, while tiering may introduce bias that favors faster tiers with shorter response latencies.
To address these issues, we present FedAT, a novel Federated learning system with Asynchronous Tiers under Non-i.i.d. training data. FedAT synergistically combines synchronous intra-tier training with asynchronous cross-tier training. By bridging synchronous and asynchronous training through tiering, FedAT mitigates the straggler effect while improving convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance training across clients, further improving accuracy. FedAT compresses uplink and downlink communication using an efficient, polyline-encoding-based compression algorithm, minimizing communication cost. Compared to state-of-the-art FL methods, FedAT improves prediction performance by up to 21.09% and reduces communication cost by up to 8.5×.
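
To make the tiered design concrete, the following minimal sketch (Python/NumPy) illustrates how synchronous intra-tier training can be bridged with asynchronous cross-tier aggregation. It is an illustration of the general technique, not the authors' implementation: every name (train_local, tier_round, NUM_TIERS) and the inverse-update-frequency weighting rule are hypothetical stand-ins.

    # Minimal sketch of tiered federated training (hypothetical, not FedAT's code).
    # Clients are grouped into tiers by response latency. Each tier runs a
    # synchronous FedAvg-style round internally; tiers report to the server
    # asynchronously, and the server reweights tiers so stragglers are not
    # drowned out by fast tiers that update more often.
    import numpy as np

    NUM_TIERS = 3
    MODEL_DIM = 10

    def train_local(model, client_id):
        # Placeholder for local SGD on the client's private (non-i.i.d.) data.
        return model - 0.01 * np.random.randn(MODEL_DIM)

    def tier_round(global_model, clients):
        # Synchronous intra-tier step: all clients in the tier start from the
        # same snapshot and their results are averaged (FedAvg-style).
        updates = [train_local(global_model, c) for c in clients]
        return np.mean(updates, axis=0)

    def cross_tier_aggregate(tier_models, tier_counts):
        # Asynchronous cross-tier step: weight each tier's latest model
        # inversely to how often it has reported so far. This is a
        # straggler-aware heuristic in the spirit of the paper, not its exact rule.
        inv = 1.0 / np.asarray(tier_counts, dtype=float)
        weights = inv / inv.sum()
        return sum(w * m for w, m in zip(weights, tier_models))

    tiers = [[0, 1], [2, 3], [4]]                  # client ids grouped by speed
    global_model = np.zeros(MODEL_DIM)
    tier_models = [global_model.copy() for _ in range(NUM_TIERS)]
    tier_counts = [1] * NUM_TIERS                  # updates seen per tier so far

    for step in range(20):
        t = step % NUM_TIERS                       # stand-in for "whichever tier finishes next"
        tier_models[t] = tier_round(global_model, tiers[t])
        tier_counts[t] += 1
        global_model = cross_tier_aggregate(tier_models, tier_counts)

Because a slow tier never blocks a round, fast tiers keep the model moving, while the inverse-frequency weights keep the slow tiers' potentially distinct data represented in the global model.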
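
The compression component is based on the Encoded Polyline Algorithm Format, a lossy delta-plus-varint scheme originally defined for GPS coordinates. The sketch below applies that format to a flat vector of model weights; the function names and the 1e5 precision factor are illustrative assumptions, not the paper's exact configuration.

    # Polyline encoding of a float vector (hypothetical helper, standard algorithm):
    # quantize each value, delta-encode against the previous one, zigzag the sign
    # into the low bit, and emit the result as 5-bit base-64-style characters.
    def polyline_encode(values, precision=1e5):
        chars, prev = [], 0
        for v in values:
            q = int(round(v * precision))          # lossy quantization step
            delta, prev = q - prev, q
            delta = ~(delta << 1) if delta < 0 else (delta << 1)
            while delta >= 0x20:                   # continuation chunks
                chars.append(chr((0x20 | (delta & 0x1f)) + 63))
                delta >>= 5
            chars.append(chr(delta + 63))          # final chunk
        return "".join(chars)

    def polyline_decode(encoded, precision=1e5):
        values, prev, i = [], 0, 0
        while i < len(encoded):
            result, shift = 0, 0
            while True:                            # reassemble 5-bit chunks
                b = ord(encoded[i]) - 63
                i += 1
                result |= (b & 0x1f) << shift
                shift += 5
                if b < 0x20:
                    break
            delta = ~(result >> 1) if result & 1 else (result >> 1)
            prev += delta
            values.append(prev / precision)
        return values

    weights = [0.12345, 0.12344, -0.50001, -0.49999]   # toy model weights
    blob = polyline_encode(weights)
    decoded = polyline_decode(blob)
    assert all(abs(a - b) < 1e-5 for a, b in zip(weights, decoded))

The scheme rewards sequences whose consecutive values are close, since small deltas encode to one or two characters each.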

Supplementary Material

MP4 File (FedAT A High Performance and Communication-Efficient Federated Learning System With Asynchronous Tiers 232 Afternoon 3.mp4)
Presentation video



Published In

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
1493 pages
ISBN:9781450384421
DOI:10.1145/3458817
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 November 2021


Author Tags

  1. asynchronous distributed learning
  2. communication efficiency
  3. federated learning
  4. tiering
  5. weighted aggregation

Qualifiers

  • Research-article

Conference

SC '21

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


Article Metrics

  • Downloads (Last 12 months)590
  • Downloads (Last 6 weeks)99
Reflects downloads up to 10 Dec 2024

Cited By
  • (2025) Function Placement for In-network Federated Learning. Computer Networks, 256 (110900). DOI: 10.1016/j.comnet.2024.110900. Online publication date: Jan-2025
  • (2024) Federated Learning with Efficient Aggregation via Markov Decision Process in Edge Networks. Mathematics, 12:6 (920). DOI: 10.3390/math12060920. Online publication date: 20-Mar-2024
  • (2024) Communication Efficiency and Non-Independent and Identically Distributed Data Challenge in Federated Learning: A Systematic Mapping Study. Applied Sciences, 14:7 (2720). DOI: 10.3390/app14072720. Online publication date: 24-Mar-2024
  • (2024) Federated Learning Security and Privacy-Preserving Algorithm and Experiments Research Under Internet of Things Critical Infrastructure. Tsinghua Science and Technology, 29:2 (400-414). DOI: 10.26599/TST.2023.9010007. Online publication date: Apr-2024
  • (2024) A multi-agent adaptive deep learning framework for online intrusion detection. Cybersecurity, 7:1. DOI: 10.1186/s42400-023-00199-0. Online publication date: 1-May-2024
  • (2024) FedCaSe: Enhancing Federated Learning with Heterogeneity-aware Caching and Scheduling. Proceedings of the 2024 ACM Symposium on Cloud Computing, 52-68. DOI: 10.1145/3698038.3698559. Online publication date: 20-Nov-2024
  • (2024) On the Impact of Heterogeneity on Federated Learning at the Edge with DGA Malware Detection. Proceedings of the Asian Internet Engineering Conference 2024, 10-17. DOI: 10.1145/3674213.3674215. Online publication date: 9-Aug-2024
  • (2024) RoleML: a Role-Oriented Programming Model for Customizable Distributed Machine Learning on Edges. Proceedings of the 25th International Middleware Conference, 279-291. DOI: 10.1145/3652892.3700765. Online publication date: 2-Dec-2024
  • (2024) Accelerating Asynchronous Federated Learning Convergence via Opportunistic Mobile Relaying. IEEE Transactions on Vehicular Technology, 73:7 (10668-10680). DOI: 10.1109/TVT.2024.3384061. Online publication date: Jul-2024
  • (2024) Multi-Attribute Auction-Based Grouped Federated Learning. IEEE Transactions on Services Computing, 17:3 (1056-1071). DOI: 10.1109/TSC.2024.3387734. Online publication date: May-2024
