
Rethinking Resource Management in Edge Learning: A Joint Pre-Training and Fine-Tuning Design Paradigm

Published: 11 December 2024

Abstract

In some applications, edge learning is shifting in focus from conventional learning from scratch to two-stage learning that combines pre-training with task-specific fine-tuning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. In this system, model pre-training is first conducted at an edge server via centralized learning on locally pre-stored general data, after which task-specific fine-tuning is performed at edge devices, starting from the pre-trained model, via federated edge learning. For this two-stage learning model, we first analyze the convergence behavior (in terms of an average squared gradient norm bound), characterizing how system parameters such as the numbers of learning rounds and the batch sizes in the two stages affect the convergence rate. Based on these analytical results, we then propose a joint communication and computation resource management design that minimizes the average squared gradient norm bound subject to constraints on the transmit power, the overall system energy consumption, and the training delay. The decision variables comprise the numbers of learning rounds, the batch sizes, the clock frequencies, and the transmit power control for both the pre-training and fine-tuning stages. Finally, numerical results are provided to evaluate the effectiveness of the proposed design. The results show that joint resource management across the pre-training and fine-tuning stages effectively balances the trade-off among training accuracy, delay, and energy consumption. The proposed design also effectively exploits the inherent trade-off between pre-training and fine-tuning, which arises from the difference in distribution between the pre-stored general data and the real-time task-specific data, thereby efficiently optimizing overall system performance.
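
For orientation, the optimization described in the abstract can be sketched as follows. This is a schematic reading of the abstract, not the paper's exact formulation; the symbols N1, N2 (learning rounds), b1, b2 (batch sizes), f (clock frequencies), and p (transmit powers) for the pre-training and fine-tuning stages are our own shorthand, as are the functions Phi, E, and T.

```latex
\begin{align*}
\min_{N_1,\, N_2,\, b_1,\, b_2,\, f,\, p} \quad
  & \Phi(N_1, N_2, b_1, b_2)
    && \text{(average squared gradient norm bound)} \\
\text{s.t.} \quad
  & p \le p^{\max}
    && \text{(transmit-power limit)} \\
  & E_{\mathrm{pre}}(N_1, b_1, f) + E_{\mathrm{ft}}(N_2, b_2, f, p) \le E^{\max}
    && \text{(overall energy budget)} \\
  & T_{\mathrm{pre}}(N_1, b_1, f) + T_{\mathrm{ft}}(N_2, b_2, f, p) \le T^{\max}
    && \text{(training-delay budget)}
\end{align*}
```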

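The two-stage pipeline itself can also be made concrete. Below is a minimal, self-contained Python/NumPy sketch of the paradigm the abstract describes: centralized pre-training on pre-stored general data at the server, followed by FedAvg-style federated fine-tuning at edge devices whose task-specific data distribution is shifted away from the general one. The toy linear-regression model, the round counts, the batch sizes, and all identifiers are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10  # toy model dimension (illustrative)

def make_data(n, w_true, noise=0.1):
    """Synthetic linear-regression data drawn from one distribution."""
    X = rng.normal(size=(n, DIM))
    return X, X @ w_true + noise * rng.normal(size=n)

def sgd_round(w, X, y, batch_size, lr=0.05):
    """One mini-batch SGD step on the squared loss."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
    return w - lr * grad

# Stage 1: centralized pre-training on pre-stored general data.
N1, b1 = 200, 64                   # pre-training rounds and batch size
w_general = rng.normal(size=DIM)   # "general" data distribution
X_pre, y_pre = make_data(2000, w_general)
w = np.zeros(DIM)
for _ in range(N1):
    w = sgd_round(w, X_pre, y_pre, b1)

# Stage 2: federated fine-tuning at K edge devices whose task-specific
# distribution differs from the general one (a shifted ground truth).
N2, b2, K = 100, 32, 5             # fine-tuning rounds, batch size, devices
w_task = w_general + 0.5 * rng.normal(size=DIM)
devices = [make_data(400, w_task) for _ in range(K)]
for _ in range(N2):
    local = [sgd_round(w, X, y, b2) for X, y in devices]  # local updates
    w = np.mean(local, axis=0)     # FedAvg aggregation at the server

print("fine-tuned parameter error:", np.linalg.norm(w - w_task))
```

The shift between w_general and w_task in this sketch mirrors the general-versus-task-specific data mismatch that, per the abstract, drives the trade-off between how many rounds (and how much energy and time) to spend in each stage.
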
Published In

IEEE Transactions on Wireless Communications, Volume 24, Issue 2
Feb. 2025
877 pages

Publisher

IEEE Press

Publication History

Published: 11 December 2024

Qualifiers

  • Research-article
