Virtual Machine Placement in Edge Computing Based on Multi-Objective Reinforcement Learning
Figure 1: The three-tier architecture for MEC.
Figure 2: The hypervolume of a maximization problem with two objective functions.
Figure 3: The hypervolume of the eight tuples of weighting coefficients when the MEC server number is 80 and the VM request number is 500.
Figure 4: The relationship between the response latency of all VM requests and the VM request number for different numbers of MEC servers.
Figure 5: The relationship between the energy consumption of all servers and the VM request number for different numbers of MEC servers.
Figure 6: The relationship between hypervolume and the VM request number when the MEC server number is 80.
Figure 7: Variation in hypervolume with iterations when the MEC server number is 80 and the VM request number is 500.
Figure 8: The running time of EVMPRL when the MEC server number is 160.
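Figure 2 illustrates the hypervolume indicator for two maximized objectives: each solution dominates the rectangle between itself and the reference point, and the hypervolume is the area of the union of those rectangles. A minimal sketch of that 2-D computation (the point values and reference point below are illustrative, not taken from the paper's experiments):

```python
def hypervolume_2d(points, ref):
    """Hypervolume of a 2-objective maximization front w.r.t. reference point `ref`.

    Sort the points by the first objective in descending order; each point then
    contributes a rectangle slice that is not covered by any previous point.
    """
    # Discard points that do not strictly dominate the reference point.
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 reverse=True)
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:  # points with smaller y here are dominated; skip them
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv
```

For the front {(3, 1), (2, 2), (1, 3)} with reference point (0, 0), the three slices contribute 3 + 2 + 1, and adding a dominated point such as (1, 1) leaves the value unchanged.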
Abstract
1. Introduction
- This paper uses the Chebyshev scalarization function to design EVMPRL, which solves the weighting-coefficient selection problem in multi-objective VM placement in edge computing and significantly improves the quality of the solution set.
- EVMPRL always searches for solutions within the feasible domain, which is guaranteed by selecting only servers that can satisfy the current VM request as the next action. The additional operations required in previous work are thus avoided.
- EVMPRL scalarizes the Q-value rather than the objective functions, avoiding a problem in previous work where order-of-magnitude differences between optimization objectives make the impact of one objective on the final result negligible. To further limit such differences, we normalize the two reward functions so that reward values lie in [0, 1].
- The experimental results demonstrate that EVMPRL outperforms the state-of-the-art algorithm in terms of both the objectives and the quality of the solution set.
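The Chebyshev scalarization mentioned above can be sketched in a few lines. This is a generic illustration, not the paper's exact formulation: the weighted Chebyshev distance from an action's Q-vector to a utopian point is computed, and the action with the smallest scalarized value is preferred (the names `q_vec` and `utopia` are hypothetical):

```python
def chebyshev_sq(q_vec, weights, utopia):
    """Scalarized Q-value: weighted Chebyshev distance from the Q-vector
    to the utopian point. Smaller is better -- the action whose Q-vector
    lies closest to the utopian point is preferred."""
    return max(w * abs(q - z) for q, w, z in zip(q_vec, weights, utopia))
```

Because the distance is taken over normalized Q-components, no single objective can dominate the scalarized value purely through its order of magnitude, which is the motivation stated in the bullet above.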
2. Related Work
2.1. VM Placement in Cloud Computing
2.2. VM Placement in Edge Computing
3. Background
3.1. Pareto Domination
3.2. Reinforcement Learning
4. Problem Formulation
4.1. Systems Modeling
4.2. Response Latency Model
4.3. Energy Consumption Model
4.4. Mathematical Formulation
5. Proposed Algorithms
5.1. The Learning Agent
5.1.1. State Space
5.1.2. Action Space
5.1.3. Reward Vector
5.2. Scalarization Functions in RL Algorithms
5.2.1. Linear Scalarization Function
5.2.2. Chebyshev Scalarization Function
5.3. Scalarized Policies
5.3.1. The Scalarized Policy of EVMPRL
Algorithm 1 Scalarized ε-greedy()
5.3.2. The Scalarized Policy of L-EVMPRL
5.4. The Detailed Pseudo-Codes of Two Algorithms
- Parameter initialization. Initialize the approximate optimal solution set, the Q-value table, and the set of currently unplaced VMs.
- Solution construction. The learning agent selects an action for the current state according to Algorithm 1, updates the decision variable, computes the reward vector, and updates the corresponding Q-value.
- Approximate solution set update. EVMPRL compares the newly constructed solution with the current approximate solution set and updates the set accordingly.
- Termination detection. If the maximum number of iterations is reached, EVMPRL returns the approximate solution set; otherwise, it returns to step 2.
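Step 3 above amounts to non-dominated filtering of the archive. A hypothetical sketch for two minimized objectives (latency, energy) — the function name and tuple layout are illustrative, not the paper's pseudocode:

```python
def update_archive(archive, candidate):
    """Insert `candidate` = (latency, energy) into the approximate Pareto set.

    For minimization, solution `a` dominates `b` when `a` is no worse in
    both objectives and strictly better in at least one.
    """
    def dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    # Discard the candidate if it is already present or dominated.
    if candidate in archive or any(dominates(s, candidate) for s in archive):
        return archive
    # Otherwise keep only members the candidate does not dominate, plus the candidate.
    return [s for s in archive if not dominates(candidate, s)] + [candidate]
```

For example, inserting (3, 4) into the archive {(3, 5), (4, 4)} removes both existing members, since (3, 4) dominates each of them.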
Algorithm 2 Pseudocode of EVMPRL
6. Experimental Results
6.1. Simulation Setup
6.2. The Quality of the Solution Set
6.3. The Selection of Weighting Coefficients
6.4. Algorithm Comparison
6.5. Convergence Analysis
6.6. Scalability Analysis
- Based on the reward functions for response latency and energy consumption, EVMPRL prefers servers that yield the lowest possible response latency and energy consumption for the current VM request. At the cost of narrowing the search space, both objectives can approach near-optimal values.
- When selecting an action, the scalarized policy prunes low-quality solutions: it evaluates the distance from the current solution to the utopian point and prefers the action with the smallest SQ-value, which accelerates convergence.
- To avoid falling into local optima, we balance exploration and exploitation with the scalarized ε-greedy policy.
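The action-selection behavior described in the last two bullets can be sketched as follows. This is a generic scalarized ε-greedy over the feasible servers only; the feasibility filtering and the SQ-value function are placeholders for the paper's own definitions:

```python
import random


def scalarized_epsilon_greedy(feasible_servers, sq_value, epsilon, rng=random):
    """With probability epsilon, explore a random feasible server;
    otherwise exploit the feasible server with the smallest scalarized
    Q-value (the one closest to the utopian point)."""
    if rng.random() < epsilon:
        return rng.choice(feasible_servers)
    return min(feasible_servers, key=sq_value)
```

Restricting the choice to `feasible_servers` is what keeps the search inside the feasible domain, while the ε branch preserves enough exploration to escape local optima.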
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Pütz, P.; Mitev, R.; Miettinen, M.; Sadeghi, A.R. Unleashing IoT Security: Assessing the Effectiveness of Best Practices in Protecting Against Threats. In Proceedings of the ACSAC ’23: 39th Annual Computer Security Applications Conference, New York, NY, USA, 4–8 December 2023; pp. 190–204. [Google Scholar]
- Sharma, M.; Tomar, A.; Hazra, A. Edge computing for industry 5.0: Fundamental, applications and research challenges. IEEE Internet Things J. 2024, 11, 19070–19093. [Google Scholar] [CrossRef]
- Chen, Z.; Cheng, Z.; Luo, W.; Ao, J.; Liu, Y.; Sheng, K.; Chen, L. FSMFA: Efficient firmware-secure multi-factor authentication protocol for IoT devices. Internet Things 2023, 21, 100685. [Google Scholar] [CrossRef]
- Yang, J.; Shah, A.A.; Pezaros, D. A Survey of Energy Optimization Approaches for Computational Task Offloading and Resource Allocation in MEC Networks. Electronics 2023, 12, 3548. [Google Scholar] [CrossRef]
- Alashaikh, A.; Alanazi, E.; Al-Fuqaha, A. A survey on the use of preferences for virtual machine placement in cloud data centers. ACM Comput. Surv. (CSUR) 2021, 54, 1–39. [Google Scholar] [CrossRef]
- Silva Filho, M.C.; Monteiro, C.C.; Inácio, P.R.; Freire, M.M. Approaches for optimizing virtual machine placement and migration in cloud environments: A survey. J. Parallel Distrib. Comput. 2018, 111, 222–250. [Google Scholar] [CrossRef]
- Sun, Z.; Yang, H.; Li, C.; Yao, Q.; Teng, Y.; Zhang, J.; Liu, S.; Li, Y.; Vasilakos, A.V. A resource allocation scheme for edge computing network in smart city based on attention mechanism. ACM Trans. Sens. Netw. 2024. [Google Scholar] [CrossRef]
- Al-Hammadi, I.; Li, M.; Islam, S.M.; Al-Mosharea, E. Collaborative computation offloading for scheduling emergency tasks in SDN-based mobile edge computing networks. Comput. Netw. 2024, 238, 110101. [Google Scholar] [CrossRef]
- Lee, K.; Lee, S.; Lee, J. Interactive character animation by learning multi-objective control. ACM Trans. Graph. (TOG) 2018, 37, 1–10. [Google Scholar] [CrossRef]
- Das, I.; Dennis, J.E. A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Struct. Optim. 1997, 14, 63–69. [Google Scholar] [CrossRef]
- Qin, Y.; Wang, H.; Yi, S.; Li, X.; Zhai, L. A multi-objective reinforcement learning algorithm for deadline constrained scientific workflow scheduling in clouds. Front. Comput. Sci. 2021, 15, 155105. [Google Scholar] [CrossRef]
- Yi, S.; Li, X.; Wang, H.; Qin, Y.; Yan, J. Energy-aware disaster backup among cloud datacenters using multiobjective reinforcement learning in software defined network. Concurr. Comput. Pract. Exp. 2022, 34, e6588. [Google Scholar] [CrossRef]
- Van Moffaert, K.; Drugan, M.M.; Nowé, A. Scalarized multi-objective reinforcement learning: Novel design techniques. In Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Singapore, 16–19 April 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 191–199. [Google Scholar]
- Jian, C.; Bao, L.; Zhang, M. A high-efficiency learning model for virtual machine placement in mobile edge computing. Clust. Comput. 2022, 25, 3051–3066. [Google Scholar] [CrossRef]
- Xu, H.; Jian, C. A meta reinforcement learning-based virtual machine placement algorithm in mobile edge computing. Clust. Comput. 2024, 27, 1883–1896. [Google Scholar] [CrossRef]
- Zhang, L.; Zhuge, S.; Wang, Y.; Xu, H.; Sun, E. Energy-delay tradeoff for virtual machine placement in virtualized multi-access edge computing: A two-sided matching approach. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 6603–6621. [Google Scholar] [CrossRef]
- Jia, M.; Cao, J.; Liang, W. Optimal cloudlet placement and user to cloudlet allocation in wireless metropolitan area networks. IEEE Trans. Cloud Comput. 2015, 5, 725–737. [Google Scholar] [CrossRef]
- Li, Y.; Wang, S. An Energy-Aware Edge Server Placement Algorithm in Mobile Edge Computing. In Proceedings of the 2018 IEEE International Conference on Edge Computing (EDGE), San Francisco, CA, USA, 2–7 July 2018; pp. 66–73. [Google Scholar] [CrossRef]
- Gábor, Z.; Kalmár, Z.; Szepesvári, C. Multi-criteria reinforcement learning. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, WI, USA, 24–27 July 1998; Volume 98, pp. 197–205. [Google Scholar]
- Kong, Y.; He, Y.; Abnoosian, K. Nature-inspired virtual machine placement mechanisms: A systematic review. Concurr. Comput. Pract. Exp. 2022, 34, e6900. [Google Scholar] [CrossRef]
- Gao, Y.; Guan, H.; Qi, Z.; Hou, Y.; Liu, L. A multi-objective ant colony system algorithm for virtual machine placement in cloud computing. J. Comput. Syst. Sci. 2013, 79, 1230–1242. [Google Scholar] [CrossRef]
- Shaw, R.; Howley, E.; Barrett, E. An energy efficient anti-correlated virtual machine placement algorithm using resource usage predictions. Simul. Model. Pract. Theory 2019, 93, 322–342. [Google Scholar] [CrossRef]
- Li, Z.; Li, Y.; Yuan, T.; Chen, S.; Jiang, S. Chemical reaction optimization for virtual machine placement in cloud computing. Appl. Intell. 2019, 49, 220–232. [Google Scholar] [CrossRef]
- Zhao, D.; Zhou, J. An energy and carbon-aware algorithm for renewable energy usage maximization in distributed cloud data centers. J. Parallel Distrib. Comput. 2022, 165, 156–166. [Google Scholar] [CrossRef]
- Yang, C.; Xia, Y. Interval Pareto front-based multi-objective robust optimization for sensor placement in structural modal identification. Reliab. Eng. Syst. Saf. 2024, 242, 109703. [Google Scholar] [CrossRef]
- Yan, J.; Wang, H.; Li, X.; Yi, S.; Qin, Y. Multi-objective disaster backup in inter-datacenter using reinforcement learning. In Proceedings of the Wireless Algorithms, Systems, and Applications: 15th International Conference, WASA 2020, Qingdao, China, 13–15 September 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 590–601, Proceedings, Part I 15. [Google Scholar]
- Verma, A.; Kaushal, S. A hybrid multi-objective particle swarm optimization for scientific workflow scheduling. Parallel Comput. 2017, 62, 1–19. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Watkins, C.J.C.H. Learning from Delayed Rewards. Ph.D. Thesis, University of Cambridge, Cambridge, UK, 1989. [Google Scholar]
- Tsitsiklis, J.N. Asynchronous stochastic approximation and Q-learning. Mach. Learn. 1994, 16, 185–202. [Google Scholar] [CrossRef]
- Wiering, M.A.; De Jong, E.D. Computing optimal stationary policies for multi-objective markov decision processes. In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, Honolulu, HI, USA, 1–5 April 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 158–165. [Google Scholar]
- Hua, H.; Li, Y.; Wang, T.; Dong, N.; Li, W.; Cao, J. Edge computing with artificial intelligence: A machine learning perspective. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
- Zhang, L.; Zhang, H.; Yu, L.; Xu, H.; Song, L.; Han, Z. Virtual resource allocation for mobile edge computing: A hypergraph matching approach. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar]
- Fan, X.; Weber, W.D.; Barroso, L.A. Power provisioning for a warehouse-sized computer. ACM SIGARCH Comput. Archit. News 2007, 35, 13–23. [Google Scholar] [CrossRef]
- Liu, X.F.; Zhan, Z.H.; Deng, J.D.; Li, Y.; Gu, T.; Zhang, J. An energy efficient ant colony system for virtual machine placement in cloud computing. IEEE Trans. Evol. Comput. 2016, 22, 113–128. [Google Scholar] [CrossRef]
- Voß, T.; Beume, N.; Rudolph, G.; Igel, C. Scalarization versus indicator-based selection in multi-objective CMA evolution strategies. In Proceedings of the 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–6 June 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 3036–3043. [Google Scholar]
- Sonmez, C.; Ozgovde, A.; Ersoy, C. Edgecloudsim: An environment for performance evaluation of edge computing systems. Trans. Emerg. Telecommun. Technol. 2018, 29, e3493. [Google Scholar] [CrossRef]
- Ding, W.; Luo, F.; Han, L.; Gu, C.; Lu, H.; Fuentes, J. Adaptive virtual machine consolidation framework based on performance-to-power ratio in cloud data centers. Future Gener. Comput. Syst. 2020, 111, 254–270. [Google Scholar] [CrossRef]
- Tandon, A.; Patel, S. DBSCAN based approach for energy efficient VM placement using medium level CPU utilization. Sustain. Comput. Inform. Syst. 2024, 43, 101025. [Google Scholar] [CrossRef]
- Wang, J.; Wang, N.; Wang, H.; Cao, K.; El-Sherbeeny, A.M. GCP: A multi-strategy improved wireless sensor network model for environmental monitoring. Comput. Netw. 2024, 254, 110807. [Google Scholar] [CrossRef]
- Al Mindeel, T.; Spentzou, E.; Eftekhari, M. Energy, thermal comfort, and indoor air quality: Multi-objective optimization review. Renew. Sustain. Energy Rev. 2024, 202, 114682. [Google Scholar] [CrossRef]
- Qin, Y.; Wang, H.; Yi, S.; Li, X.; Zhai, L. An energy-aware scheduling algorithm for budget-constrained scientific workflows based on multi-objective reinforcement learning. J. Supercomput. 2020, 76, 455–480. [Google Scholar] [CrossRef]
Reference | Cloud/Edge | Framework | Optimization Objectives |
---|---|---|---|
Gao et al. [21] | Cloud | ACO | energy consumption, resource wastage |
Shaw et al. [22] | Cloud | ANN | energy consumption, SLA violation rate |
Li et al. [23] | Cloud | CRO | energy consumption, resource utilization |
Zhao et al. [24] | Cloud | Heuristic | share of energy generated by renewable energy sources, carbon emissions |
Jian et al. [14] | Edge | BA | power consumption, placement delay |
Xu et al. [15] | Edge | RL | energy consumption, response latency |
Zhang et al. [16] | Edge | Two-sided matching | energy consumption, communication delay |
Jia et al. [17] | Edge | Heuristic | average waiting time |
Li et al. [18] | Edge | PSO | energy consumption, resource utilization |
MEC Server Type | Number of Cores | Memory Size (GB) | CPU Processing Speed (MIPS) |
---|---|---|---|
HP Proliant ML110 G5 | 2 | 4 | 2660 |
HP Proliant DL360 G7 | 12 | 16 | 3067 |
HP Proliant DL360 Gen9 | 36 | 64 | 2300 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yi, S.; Hong, S.; Qin, Y.; Wang, H.; Liu, N. Virtual Machine Placement in Edge Computing Based on Multi-Objective Reinforcement Learning. Electronics 2025, 14, 633. https://doi.org/10.3390/electronics14030633