
Rethinking Resource Management in Edge Learning: A Joint Pre-Training and Fine-Tuning Design Paradigm

Published: 11 December 2024

Abstract

In some applications, edge learning is shifting in focus from conventional learning from scratch to two-stage learning that combines pre-training with task-specific fine-tuning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. In this system, model pre-training is first conducted at an edge server via centralized learning on locally pre-stored general data, after which task-specific fine-tuning is performed at edge devices, starting from the pre-trained model, via federated edge learning. For this two-stage learning model, we first analyze the convergence behavior (in terms of an average squared gradient norm bound), characterizing how system parameters such as the numbers of learning rounds and the batch sizes in the two stages affect the convergence rate. Based on these analytical results, we then propose a joint communication and computation resource management design that minimizes the average squared gradient norm bound subject to constraints on the transmit power, the overall system energy consumption, and the training delay. The decision variables comprise the numbers of learning rounds, the batch sizes, the clock frequencies, and the transmit power control for both the pre-training and fine-tuning stages. Finally, numerical results are provided to evaluate the effectiveness of the proposed design. The results show that joint resource management across the pre-training and fine-tuning stages effectively balances the trade-off among training accuracy, delay, and energy consumption. The proposed design also effectively exploits the inherent trade-off between pre-training and fine-tuning, which arises from the difference in distribution between the pre-stored general data and the real-time task-specific data, thereby efficiently optimizing overall system performance.
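
For orientation, the optimization described in the abstract can be sketched as follows. This is a schematic reading of the abstract, not the paper's exact formulation; the symbols N1, N2 (learning rounds), b1, b2 (batch sizes), f (clock frequencies), and p (transmit powers) for the pre-training and fine-tuning stages are our own shorthand, as are the functions Phi, E, and T.

```latex
\begin{align*}
\min_{N_1,\, N_2,\, b_1,\, b_2,\, f,\, p} \quad
  & \Phi(N_1, N_2, b_1, b_2)
    && \text{(average squared gradient norm bound)} \\
\text{s.t.} \quad
  & p \le p^{\max}
    && \text{(transmit-power limit)} \\
  & E_{\mathrm{pre}}(N_1, b_1, f) + E_{\mathrm{ft}}(N_2, b_2, f, p) \le E^{\max}
    && \text{(overall energy budget)} \\
  & T_{\mathrm{pre}}(N_1, b_1, f) + T_{\mathrm{ft}}(N_2, b_2, f, p) \le T^{\max}
    && \text{(training-delay budget)}
\end{align*}
```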

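The two-stage pipeline itself can also be made concrete. Below is a minimal, self-contained Python/NumPy sketch of the paradigm the abstract describes: centralized pre-training on pre-stored general data at the server, followed by FedAvg-style federated fine-tuning at edge devices whose task-specific data distribution is shifted away from the general one. The toy linear-regression model, the round counts, the batch sizes, and all identifiers are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10  # toy model dimension (illustrative)

def make_data(n, w_true, noise=0.1):
    """Synthetic linear-regression data drawn from one distribution."""
    X = rng.normal(size=(n, DIM))
    return X, X @ w_true + noise * rng.normal(size=n)

def sgd_round(w, X, y, batch_size, lr=0.05):
    """One mini-batch SGD step on the squared loss."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
    return w - lr * grad

# Stage 1: centralized pre-training on pre-stored general data.
N1, b1 = 200, 64                   # pre-training rounds and batch size
w_general = rng.normal(size=DIM)   # "general" data distribution
X_pre, y_pre = make_data(2000, w_general)
w = np.zeros(DIM)
for _ in range(N1):
    w = sgd_round(w, X_pre, y_pre, b1)

# Stage 2: federated fine-tuning at K edge devices whose task-specific
# distribution differs from the general one (a shifted ground truth).
N2, b2, K = 100, 32, 5             # fine-tuning rounds, batch size, devices
w_task = w_general + 0.5 * rng.normal(size=DIM)
devices = [make_data(400, w_task) for _ in range(K)]
for _ in range(N2):
    local = [sgd_round(w, X, y, b2) for X, y in devices]  # local updates
    w = np.mean(local, axis=0)     # FedAvg aggregation at the server

print("fine-tuned parameter error:", np.linalg.norm(w - w_task))
```

The shift between w_general and w_task in this sketch mirrors the general-versus-task-specific data mismatch that, per the abstract, drives the trade-off between how many rounds (and how much energy and time) to spend in each stage.
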
Published In

IEEE Transactions on Wireless Communications, Volume 24, Issue 2
Feb. 2025
877 pages

Publisher

IEEE Press

Publication History

Published: 11 December 2024

Qualifiers

  • Research-article
