Research article · Open access · DOI: 10.1145/3666025.3699346

FedHybrid: Breaking the Memory Wall of Federated Learning via Hybrid Tensor Management

Published: 04 November 2024

Abstract

Federated Learning (FL) has emerged as a learning paradigm that enables multiple devices to collaboratively train a shared model while preserving data privacy. However, a fundamental challenge that hinders the deployment of FL on mobile devices is their limited memory. This paper proposes FedHybrid, a framework that reduces the memory footprint of training while preserving model accuracy and overall training progress. Specifically, FedHybrid first selects the participating devices for each training round by jointly evaluating their memory budgets, computing capabilities, and data diversity. It then analyzes the computational graph and generates an execution plan for each selected client that meets the client's memory budget while minimizing training delay, applying a hybrid of recomputation and compression according to the characteristics of each tensor. During local training, FedHybrid carries out this execution plan with a carefully designed activation compression technique that achieves memory reduction with minimal accuracy loss. We evaluate FedHybrid extensively in simulation and on off-the-shelf mobile devices. The results show that FedHybrid achieves up to a 39.1% increase in model accuracy and a 15.5× reduction in wall-clock time under various memory budgets compared with the baselines.
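To make the hybrid tensor management described above concrete, the following is a minimal Python sketch of a per-tensor execution planner. It is an illustration under stated assumptions, not FedHybrid's actual planner: the Tensor fields, the profiled numbers, and the greedy delay-per-megabyte heuristic are hypothetical, standing in for the paper's analysis of the computational graph.

from dataclasses import dataclass

@dataclass
class Tensor:
    # Hypothetical per-tensor profile: memory held if the activation is
    # kept, delay added if it is recomputed in the backward pass, and the
    # fraction of its size retained (plus overhead) if it is compressed.
    name: str
    size_mb: float
    recompute_ms: float
    compress_ratio: float
    compress_ms: float

def plan_execution(tensors, budget_mb):
    """Choose keep/recompute/compress per tensor so that resident activation
    memory fits budget_mb, greedily taking the cheapest delay per MB saved."""
    plan = {t.name: "keep" for t in tensors}
    used_mb = sum(t.size_mb for t in tensors)

    def options(t):
        # Recomputation frees the whole tensor; compression frees part of it.
        yield "recompute", t.size_mb, t.recompute_ms
        yield "compress", t.size_mb * (1 - t.compress_ratio), t.compress_ms

    # Rank candidate (tensor, action) pairs by added delay per MB saved.
    candidates = sorted(
        ((cost / saved, t, action, saved)
         for t in tensors
         for action, saved, cost in options(t)
         if saved > 0),
        key=lambda c: c[0],
    )

    for _, t, action, saved in candidates:
        if used_mb <= budget_mb:
            break
        if plan[t.name] == "keep":  # assign at most one action per tensor
            plan[t.name] = action
            used_mb -= saved
    return plan, used_mb

if __name__ == "__main__":
    # Invented profiles: a cheap-to-recompute conv output and a large,
    # highly compressible attention map, planned against a 60 MB budget.
    acts = [Tensor("conv1_out", 48.0, 6.0, 0.25, 1.5),
            Tensor("attn_probs", 96.0, 20.0, 0.10, 2.0)]
    print(plan_execution(acts, budget_mb=60.0))

Even this toy shows the point of the hybrid: tensors that are cheap to recompute are better rematerialized, tensors that compress well relative to their recomputation cost (such as the attention map above) are better compressed, and the mix is chosen per client so that each client's memory budget is met with the least added delay.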



Published In

SenSys '24: Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, November 2024, 950 pages. ISBN: 9798400706974. DOI: 10.1145/3666025.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. federated learning
      2. mobile computing
      3. memory optimization


Acceptance Rates

Overall Acceptance Rate: 174 of 867 submissions, 20%
