
PiPar: Pipeline parallelism for collaborative machine learning

Published: 18 November 2024

Abstract

Collaborative machine learning (CML) techniques, such as federated learning, have been proposed to train deep learning models across multiple mobile devices and a server. CML techniques are privacy preserving because each device shares a locally trained model with the server rather than its raw data. However, CML training is inefficient due to low resource utilization. We identify resources idling on the server and devices, caused by sequential computation and communication, as the principal reason for low resource utilization. A novel framework, PiPar, that leverages pipeline parallelism for CML techniques is developed to substantially improve resource utilization. A new training pipeline is designed to parallelize computation on different hardware resources and communication over different bandwidth resources, thereby accelerating the training process in CML. A low-overhead automated parameter selection method is proposed to optimize the pipeline and maximize the utilization of available resources. The experimental results confirm the validity of the underlying approach of PiPar and highlight that, compared to federated learning: (i) the idle time of the server can be reduced by up to 64.1×, and (ii) the overall training time can be accelerated by up to 34.6× under varying network conditions for a collection of six small and large popular deep neural networks and four datasets, without sacrificing accuracy. It is also experimentally demonstrated that PiPar achieves performance benefits when incorporating differential privacy methods and when operating in environments with heterogeneous devices and changing bandwidths.
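The pipelining idea described in the abstract can be pictured with a minimal sketch. This is not the PiPar implementation; it only illustrates, with hypothetical timings, how handing each finished micro-batch to a background sender lets device computation overlap with communication instead of alternating with it.

```python
# Illustrative sketch only, not the PiPar implementation: overlap
# device-side computation with communication by handing each finished
# micro-batch to a background sender thread. Timings are hypothetical.
import queue
import threading
import time

NUM_MICROBATCHES = 4   # hypothetical number of micro-batches per round
COMPUTE_TIME = 0.05    # stand-in for a forward pass (seconds)
TRANSFER_TIME = 0.05   # stand-in for uploading activations (seconds)

send_queue = queue.Queue()

def device_compute(mb):
    time.sleep(COMPUTE_TIME)        # placeholder for computing micro-batch mb
    return mb

def sender():
    while True:                     # drains the queue concurrently with compute
        mb = send_queue.get()
        if mb is None:              # sentinel: no more micro-batches
            break
        time.sleep(TRANSFER_TIME)   # placeholder for sending activations

start = time.time()
worker = threading.Thread(target=sender)
worker.start()
for mb in range(NUM_MICROBATCHES):
    send_queue.put(device_compute(mb))  # next compute starts while mb is in flight
send_queue.put(None)
worker.join()

pipelined = time.time() - start
sequential = NUM_MICROBATCHES * (COMPUTE_TIME + TRANSFER_TIME)
print(f"pipelined ~{pipelined:.2f}s vs. sequential ~{sequential:.2f}s")
```

With equal compute and transfer times, the overlapped schedule approaches roughly half the sequential time, which is the kind of idle-time reduction the paper targets.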

Highlights

Compute resources are underutilized in collaborative machine learning.
Underutilization leads to idle time and increases overall training time.
Our framework PiPar uses pipeline parallelism to reduce idle time and accelerate training.
PiPar overlaps computation and communication.
PiPar reduces idle time by up to 64.1× and accelerates training by up to 34.6× (illustrated below).
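The idle-time reduction highlighted above depends on keeping both the compute and communication stages busy, and the abstract mentions a low-overhead automated parameter selection method for tuning the pipeline. The paper's method is not reproduced here; the sketch below only illustrates the general idea under assumed profiling numbers and a hypothetical per-micro-batch overhead: estimate the makespan of a two-stage (compute/transfer) pipeline for each candidate micro-batch count and pick the minimum.

```python
# Minimal sketch of pipeline parameter selection (assumptions only, not the
# method proposed in the paper): choose the micro-batch count k that
# minimizes an estimated two-stage pipeline makespan.

PER_MICROBATCH_OVERHEAD = 0.01  # hypothetical fixed cost per micro-batch (s)

def estimate_makespan(k, batch_compute, batch_transfer):
    """Makespan of a two-stage pipeline processing k micro-batches."""
    c = batch_compute / k + PER_MICROBATCH_OVERHEAD   # per-micro-batch compute
    s = batch_transfer / k + PER_MICROBATCH_OVERHEAD  # per-micro-batch transfer
    # The first micro-batch fills the pipeline; afterwards the slower stage
    # paces the remaining k - 1 micro-batches.
    return c + s + (k - 1) * max(c, s)

def select_num_microbatches(batch_compute, batch_transfer, candidates=range(1, 17)):
    return min(candidates,
               key=lambda k: estimate_makespan(k, batch_compute, batch_transfer))

if __name__ == "__main__":
    # e.g. 0.8 s of device compute and 0.6 s of uplink transfer per full batch
    k = select_num_microbatches(0.8, 0.6)
    print(f"selected micro-batch count: {k}")
```

Splitting the batch more finely increases overlap but adds per-micro-batch overhead, so the estimate has a genuine minimum rather than always favouring the largest count.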



Information

Published In

Journal of Parallel and Distributed Computing, Volume 193, Issue C
November 2024
239 pages

Publisher

Academic Press, Inc.

United States

Publication History

Published: 18 November 2024

Author Tags

  1. Collaborative machine learning
  2. Resource utilization
  3. Pipeline parallelism
  4. Edge computing

Qualifiers

  • Research-article
