Deep Reinforcement Learning Algorithm with Long Short-Term Memory Network for Optimizing Unmanned Aerial Vehicle Information Transmission
Figure 1. UAV communication sketch.
Figure 2. DRL model for UAV messaging.
Figure 3. Illustration of UAV path planning and IoT data collection. (A) Starting position; (B) the UAV only connects with IoT device 1; (C) the UAV connects with both IoT devices; (D) the UAV only connects with IoT device 2.
Figure 4. Value loss functions at different learning rates.
Figure 5. Reward function.
Figure 6. Comparison of data collection under different numbers of channels and IoT devices.
Figure 7. Flight paths of the UAV with different algorithms and IoT device locations, with β = 10^−3 and λ = 10^−9. (A) IoT devices are distributed along the diagonal of the region Ω; (B) IoT devices are distributed in an “S” shape; (C) IoT devices are concentrated on the left side of the diagonal of the region Ω; (D) IoT devices are concentrated on the right side of the diagonal of the region Ω.
Figure 8. Amount of information collected by the UAV with different algorithms and IoT device locations. (A) IoT devices are distributed along the diagonal of the region Ω; (B) IoT devices are distributed in an “S” shape; (C) IoT devices are concentrated on the left side of the diagonal of the region Ω; (D) IoT devices are concentrated on the right side of the diagonal of the region Ω.
Abstract
1. Introduction
- We establish a robust simulation environment by integrating a channel and energy consumption model tailored for fixed-wing UAVs, creating a comprehensive mathematical framework for UAV information transmission.
- Our novel DRL framework addresses the optimal control problem with mixed variables by using two policy networks: a continuous policy for path control and a discrete policy for information collection (a minimal sketch of this two-headed design follows this list).
- We incorporate an LSTM network into the state representation to enhance predictive capabilities and enable effective global optimization through systematic state adjustments.
- We optimize algorithm parameters through extensive experimentation and develop a dynamic reward function to mitigate reward sparsity, enhancing learning efficiency. Simulation results demonstrate that our algorithm achieves near-optimal data collection paths while meeting speed and energy constraints.
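To make the mixed-variable control concrete, the sketch below shows one way the two policy heads mentioned above could share a common state encoder. This is a minimal sketch under stated assumptions: the class name `HybridPolicy`, the layer sizes, and the extra "no transmission" action are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    """Two-headed policy over a hybrid action space: a Gaussian head for
    continuous flight control and a categorical head for the discrete
    channel/collection decision. All sizes and names are illustrative."""
    def __init__(self, state_dim, cont_action_dim, num_channels, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Continuous head: mean of the flight-control action (e.g. acceleration command).
        self.mu = nn.Linear(hidden_dim, cont_action_dim)
        self.log_std = nn.Parameter(torch.zeros(cont_action_dim))
        # Discrete head: logits over channel choices plus a "no transmission" option.
        self.channel_logits = nn.Linear(hidden_dim, num_channels + 1)

    def forward(self, state):
        h = self.encoder(state)
        cont_dist = torch.distributions.Normal(self.mu(h), self.log_std.exp())
        disc_dist = torch.distributions.Categorical(logits=self.channel_logits(h))
        return cont_dist, disc_dist
```

During rollout, one action is drawn from each head; treating the joint log-probability as the sum of the two heads' log-probabilities keeps the standard policy-gradient machinery unchanged.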
2. Related Work
3. Model Description
3.1. Channel Modeling
3.2. Energy Consumption Model
3.3. Problem Formulation
- A decision variable represents the uplink transmission ratio of IoT device n on channel k. We define T as the total time period during which the UAV operates to complete the data collection task.
- A binary parameter indicates whether the data from IoT device n is successfully collected by the UAV: a value of 0 indicates no connection with IoT device n, and a value of 1 represents a successful connection and data transmission with IoT device n.
- In Constraint (14), the total minimum amount of data collected by the UAV during time T is the sum of the per-device minima, where u is the minimum required data amount from each IoT device. If the UAV fails to collect this data, it incurs a negative reward through the information collection term of the reward function in Section 3.4.3.
- Constraint (15) controls the channel allocation in the algorithm. If IoT device n occupies no communication channel, all of its channel-allocation indicators are zero, so their sum is zero and the constraint is satisfied. Conversely, if IoT device n occupies a single channel k for data transmission, exactly one indicator equals one, so the sum equals one and the constraint still holds. Therefore, the constraint is valid for all IoT devices (a hedged reconstruction of Constraints (14)–(16) appears after this list).
- Constraint (16) ensures that each channel can be occupied by only one IoT device at any time.
- One quantity denotes the initial energy available to the UAV, while another refers to the energy expended by the UAV during landing. Constraints (18) to (21) regulate the velocity and acceleration of the UAV during flight, as reflected in the velocity, acceleration, and energy control rewards of Section 3.4.3.
- Maximizing the connection indicators in the objective function (13) is essential, as they represent the successful connections and data transmissions between the UAV and IoT devices. This directly contributes to the total amount of information collected, which is the primary goal of the algorithm.
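Because the original symbols did not survive extraction, the following is only a plausible reconstruction of the constraints described above. The notation a_{n,k} (channel-allocation indicator), c_n (connection indicator), D_n (data collected from device n), N (number of devices), and K (number of channels) is assumed here, not taken from the paper.

```latex
% Assumed notation: a_{n,k} in {0,1} allocates channel k to IoT device n,
% c_n in {0,1} indicates a successful connection with device n,
% D_n is the data collected from device n, u the per-device minimum.
\begin{align}
\sum_{n=1}^{N} c_n D_n &\ge N u,                 && \text{cf. Constraint (14): minimum total data} \\
\sum_{k=1}^{K} a_{n,k} &\le 1, \quad \forall n,  && \text{cf. Constraint (15): at most one channel per device} \\
\sum_{n=1}^{N} a_{n,k} &\le 1, \quad \forall k,  && \text{cf. Constraint (16): at most one device per channel}
\end{align}
```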
3.4. Deep Reinforcement Learning Model
3.4.1. States
3.4.2. Action
3.4.3. Reward Function
- Distance reward: This term penalizes the UAV’s distance from its target location, encouraging it to minimize this distance and eventually land at the destination; the penalty is scaled by a positive weighting factor [39]. The target location is a predefined point representing the UAV’s intended flight destination, determined during mission planning.
- Velocity reward: The UAV incurs a fixed penalty if its velocity exceeds the specified maximum speed; otherwise, this term is zero.
- Acceleration reward: Similarly, this term assigns a fixed penalty when the UAV’s acceleration exceeds its threshold; if the acceleration remains within the allowable range, the term is zero.
- Energy control reward: To ensure sustainable operation, this term penalizes the UAV when its remaining energy falls below the required reserve; otherwise, the term is zero.
- Information collection reward: This component incentivizes the UAV to gather data from IoT devices, rewarding the data successfully transmitted at each step (a combined sketch of all five terms follows this list).
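A minimal sketch of how these five terms could be combined into a single per-step reward. Every name, weight, and threshold here (w_dist, penalty, w_info, v_max, a_max, e_min) is a placeholder, since the paper's symbols and exact definitions are not reproduced above.

```python
import numpy as np

def step_reward(pos, target, speed, accel, energy, data_collected,
                v_max, a_max, e_min, w_dist=0.01, penalty=1.0, w_info=1.0):
    """Illustrative per-step reward combining the five terms described above.
    Weights and thresholds are placeholders, not the paper's values."""
    r_dist = -w_dist * np.linalg.norm(np.asarray(pos) - np.asarray(target))  # distance to target
    r_vel = -penalty if speed > v_max else 0.0        # speed-limit violation penalty
    r_acc = -penalty if accel > a_max else 0.0        # acceleration-limit violation penalty
    r_energy = -penalty if energy < e_min else 0.0    # low-energy penalty
    r_info = w_info * data_collected                  # reward for data gathered this step
    return r_dist + r_vel + r_acc + r_energy + r_info
```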
4. Method
4.1. LSTM
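The LSTM configuration itself is not reproduced here, so the following is only a sketch, assuming the LSTM summarizes a short window of recent observations before the policy and value networks consume it; the module name, the hidden size, and the windowing convention are assumptions.

```python
import torch
import torch.nn as nn

class LSTMStateEncoder(nn.Module):
    """Encodes a window of recent observations into a fixed-size state feature.
    Sizes and naming are illustrative, not taken from the paper."""
    def __init__(self, obs_dim, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=obs_dim, hidden_size=hidden_size, batch_first=True)

    def forward(self, obs_seq):
        # obs_seq: (batch, seq_len, obs_dim) window of past UAV/IoT observations
        _, (h_n, _) = self.lstm(obs_seq)
        return h_n[-1]  # final hidden state of the last layer as the state feature
```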
4.2. Algorithm Based on Hybrid Action Space
Algorithm 1: DRL Algorithm for UAV Messaging
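The listing of Algorithm 1 is not reproduced above, so the function below is only a hedged outline of a generic actor-critic episode update over the hybrid action space, reusing the `HybridPolicy` and `LSTMStateEncoder` sketches from earlier. The gym-style `env.reset()`/`env.step()` interface, the critic, the optimizer, and the simple Monte Carlo return are all assumptions, not the paper's Algorithm 1.

```python
import numpy as np
import torch

def train_episode(env, encoder, policy, critic, optimizer, seq_len=8, gamma=0.99):
    """One illustrative actor-critic episode update; every component passed in
    (env, encoder, policy, critic, optimizer) is a placeholder, not the paper's."""
    obs_history, log_probs, values, rewards = [], [], [], []
    obs, done = env.reset(), False
    while not done:
        obs_history.append(np.asarray(obs, dtype=np.float32))
        window = np.stack(obs_history[-seq_len:])               # recent observation window
        state = encoder(torch.as_tensor(window).unsqueeze(0))   # LSTM state feature
        cont_dist, disc_dist = policy(state)
        a_cont, a_disc = cont_dist.sample(), disc_dist.sample()
        obs, reward, done, _ = env.step((a_cont.squeeze(0).numpy(), int(a_disc)))
        log_probs.append(cont_dist.log_prob(a_cont).sum() + disc_dist.log_prob(a_disc).sum())
        values.append(critic(state).squeeze())
        rewards.append(float(reward))

    # Discounted Monte Carlo returns, accumulated backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)

    advantage = returns - torch.stack(values)
    policy_loss = -(torch.stack(log_probs) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()
    optimizer.zero_grad()
    (policy_loss + value_loss).backward()
    optimizer.step()
    return sum(rewards)
```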
5. Simulation Results and Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, M.; Wang, H.; Wu, J. On UAV source seeking with complex dynamic characteristics and multiple constraints: A cooperative standoff monitoring mode. Aerosp. Sci. Technol. 2022, 121, 107315. [Google Scholar] [CrossRef]
- Zhang, X.; Zhao, H.; Wei, J.; Yan, C.; Xiong, J.; Liu, X. Cooperative Trajectory Design of Multiple UAV Base Stations with Heterogeneous Graph Neural Networks. IEEE Trans. Wirel. Commun. 2023, 22, 1495–1509. [Google Scholar] [CrossRef]
- Mohsan, S.A.H.; Othman, N.Q.H.; Li, Y.; Alsharif, M.H.; Khan, M.A. Unmanned aerial vehicles (UAVs): Practical aspects, applications, open challenges, security issues, and future trends. Intell. Serv. Robot. 2023, 16, 109–137. [Google Scholar] [CrossRef] [PubMed]
- Bayomi, N.; Fernandez, J.E. Eyes in the sky: Drones applications in the built environment under climate change challenges. Drones 2023, 7, 637. [Google Scholar] [CrossRef]
- Gu, X.; Zhang, G. A survey on UAV-assisted wireless communications: Recent advances and future trends. Comput. Commun. 2023, 208, 44–78. [Google Scholar] [CrossRef]
- Mohsan, S.A.H.; Khan, M.A.; Noor, F.; Ullah, I.; Alsharif, M.H. Towards the unmanned aerial vehicles (UAVs): A comprehensive review. Drones 2022, 6, 147. [Google Scholar] [CrossRef]
- Guo, S.; Zhang, X.; Zheng, Y.; Du, Y. An autonomous path planning model for unmanned ships based on deep reinforcement learning. Sensors 2020, 20, 426. [Google Scholar] [CrossRef]
- Azar, A.T.; Koubaa, A.; Ali, M.N.; Ibrahim, H.A.; Ibrahim, Z.F.; Kazim, M.; Ammar, A.; Benjdira, B.; Khamis, A.M.; Hameed, I.A. Drone deep reinforcement learning: A review. Electronics 2021, 10, 999. [Google Scholar] [CrossRef]
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; PMLR: London, UK, 2014; pp. 387–395. [Google Scholar]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Konda, V.; Tsitsiklis, J. Actor-critic algorithms. Adv. Neural Inf. Process. Syst. 1999, 12, 1008–1014. [Google Scholar]
- Li, B.; Wu, Y. Path planning for UAV ground target tracking via deep reinforcement learning. IEEE Access 2020, 8, 29064–29074. [Google Scholar] [CrossRef]
- Khuntia, P.; Hazra, R. An actor-critic reinforcement learning for device-to-device communication underlaying cellular network. In Proceedings of the TENCON 2018—2018 IEEE Region 10 Conference, Jeju, Republic of Korea, 28–31 October 2018; IEEE: Piscataway, NJ, USA, 2018; p. 0050. [Google Scholar]
- Lillicrap, T. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
- Na, Z.; Wang, J.; Liu, C.; Guan, M.; Gao, Z. Joint trajectory optimization and communication design for UAV-enabled OFDM networks. Ad Hoc Netw. 2020, 98, 102031. [Google Scholar] [CrossRef]
- Edfors, O.; Sandell, M.; Van De Beek, J.-J.; Landström, D.; Sjöberg, F. An Introduction to Orthogonal Frequency-Division Multiplexing. Ph.D. Thesis, Lund University, Lund, Sweden, 1996. [Google Scholar]
- Zeng, Y.; Zhang, R. Energy-efficient UAV communication with trajectory optimization. IEEE Trans. Wirel. Commun. 2017, 16, 3747–3760. [Google Scholar] [CrossRef]
- Lee, J.-H.; Park, K.-H.; Ko, Y.-C.; Alouini, M.-S. A UAV-mounted free space optical communication: Trajectory optimization for flight time. IEEE Trans. Wirel. Commun. 2019, 19, 1610–1621. [Google Scholar] [CrossRef]
- Liu, C.H.; Chen, Z.; Tang, J.; Xu, J.; Piao, C. Energy-efficient UAV control for effective and fair communication coverage: A deep reinforcement learning approach. IEEE J. Sel. Areas Commun. 2018, 36, 2059–2070. [Google Scholar] [CrossRef]
- Yin, S.; Zhao, S.; Zhao, Y.; Yu, F.R. Intelligent trajectory design in UAV-aided communications with reinforcement learning. IEEE Trans. Veh. Technol. 2019, 68, 8227–8231. [Google Scholar] [CrossRef]
- Liu, Q.; Shi, L.; Sun, L.; Li, J.; Ding, M.; Shu, F. Path planning for UAV-mounted mobile edge computing with deep reinforcement learning. IEEE Trans. Veh. Technol. 2020, 69, 5723–5728. [Google Scholar] [CrossRef]
- Xiong, Z.; Zhang, Y.; Lim, W.Y.B.; Kang, J.; Niyato, D.; Leung, C.; Miao, C. UAV-assisted wireless energy and data transfer with deep reinforcement learning. IEEE Trans. Cogn. Commun. Netw. 2020, 7, 85–99. [Google Scholar] [CrossRef]
- Yuan, X.; Hu, S.; Ni, W.; Wang, X.; Jamalipour, A. Deep Reinforcement Learning-Driven Reconfigurable Intelligent Surface-Assisted Radio Surveillance with a Fixed-Wing UAV. IEEE Trans. Inf. Forensics Secur. 2023, 18, 4546–4560. [Google Scholar] [CrossRef]
- Zhu, X.; Lin, L.; Huang, Y.; Wang, X.; Que, Y.; Jedari, B.; Piran, M.J. Secure Data Transmission Based on Reinforcement Learning and Position Confusion for Internet of UAVs. IEEE Internet Things J. 2024, 11, 21010–21020. [Google Scholar] [CrossRef]
- Peng, Y.; Song, T.; Song, X.; Yang, Y.; Lu, W. Time-Effective UAV-IRS-Collaborative Data Harvesting: A Robust Deep Reinforcement Learning Approach. IEEE Trans. Wirel. Commun. 2024, 23, 18592–18607. [Google Scholar] [CrossRef]
- Wang, Y.; Gao, Z.; Zhang, J.; Cao, X.; Zheng, D.; Gao, Y.; Ng, D.W.K.; Di Renzo, M. Trajectory design for UAV-based Internet of Things data collection: A deep reinforcement learning approach. IEEE Internet Things J. 2021, 9, 3899–3912. [Google Scholar] [CrossRef]
- Dong, R.; Wang, B.; Cao, K.; Tian, J.; Cheng, T. Secure Transmission Design of RIS Enabled UAV Communication Networks Exploiting Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2024, 73, 8404–8419. [Google Scholar] [CrossRef]
- Duan, Y.; Schulman, J.; Chen, X.; Bartlett, P.L.; Sutskever, I.; Abbeel, P. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning. arXiv 2016, arXiv:1611.02779. [Google Scholar]
- Bharadiya, J.P. Exploring the Use of Recurrent Neural Networks for Time Series Forecasting. Int. J. Innov. Sci. Res. Technol. 2023, 8, 2023–2027. [Google Scholar]
- Iparraguirre-Villanueva, O.; Guevara-Ponce, V.; Ruiz-Alvarado, D.; Beltozar-Clemente, S.; Sierra-Liñan, F.; Zapata-Paulini, J.; Cabanillas-Carbonell, M. Text Prediction Recurrent Neural Networks Using Long Short-Term Memory-Dropout. arXiv 2023, arXiv:2302.01459. [Google Scholar] [CrossRef]
- Donkol, A.A.E.-B.; Hafez, A.G.; Hussein, A.I.; Mabrook, M.M. Optimization of intrusion detection using likely point PSO and enhanced LSTM-RNN hybrid technique in communication networks. IEEE Access 2023, 11, 9469–9482. [Google Scholar] [CrossRef]
- Oyewola, D.O.; Akinwunmi, S.A.; Omotehinwa, T.O. Deep LSTM and LSTM-Attention Q-learning based reinforcement learning in oil and gas sector prediction. Knowl.-Based Syst. 2024, 284, 111290. [Google Scholar] [CrossRef]
- Zhou, L.; Chen, X.; Hong, M.; Jin, S.; Shi, Q. Efficient resource allocation for multi-UAV communication against adjacent and co-channel interference. IEEE Trans. Veh. Technol. 2021, 70, 10222–10235. [Google Scholar] [CrossRef]
- Rappaport, T.S. Wireless Communications: Principles and Practice; Cambridge University Press: Cambridge, UK, 2024. [Google Scholar]
- Al-Hourani, A.; Yanikomeroglu, H. Line-of-Sight Probability and Holding Distance in Non-Terrestrial Networks. IEEE Commun. Lett. 2024, 28, 622–626. [Google Scholar] [CrossRef]
- 3GPP. User Equipment (UE) Radio Transmission and Reception, 3rd Generation Partnership Project (3GPP), Technical Specification, 38.101, v 16.3.0. 2020. Available online: https://www.3gpp.org/ (accessed on 16 December 2024).
- Haider, A.; Hwang, S.-H. Maximum Transmit Power for UE in an LTE Small Cell Uplink. Electronics 2019, 8, 796. [Google Scholar] [CrossRef]
- Wang, L.; Wang, K.; Pan, C.; Xu, W.; Aslam, N.; Nallanathan, A. Deep Reinforcement Learning Based Dynamic Trajectory Control for UAV-Assisted Mobile Edge Computing. IEEE Trans. Mobile Comput. 2022, 21, 3536–3550. [Google Scholar] [CrossRef]
- Yuan, Y.; Lei, L.; Vu, T.X.; Chatzinotas, S.; Sun, S.; Ottersten, B. Actor-Critic Learning-Based Energy Optimization for UAV Access and Backhaul Networks. J. Wirel. Commun. Netw. 2021, 2021, 78. [Google Scholar] [CrossRef]
- Yu, Z.; Chen, Y. Persistent Monitoring UAV Path Planning Based on Entropy Optimization. In Proceedings of the 2023 IEEE 13th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Qinhuangdao, China, 11–14 July 2023; pp. 909–914. [Google Scholar]
- Xiao, S.; Tan, X.; Wang, J. A Simulated Annealing Algorithm and Grid Map-Based UAV Coverage Path Planning Method for 3D Reconstruction. Electronics 2021, 10, 853. [Google Scholar] [CrossRef]
Parameter | Value
---|---
 | 10
 | 0.6
 | 2.3
 | 0.2
 | −170 dBm/Hz
 | kg/m
 | 312.5 kHz
 | 20 dBm
 | 10 kg
He, Y.; Hu, R.; Liang, K.; Liu, Y.; Zhou, Z. Deep Reinforcement Learning Algorithm with Long Short-Term Memory Network for Optimizing Unmanned Aerial Vehicle Information Transmission. Mathematics 2025, 13, 46. https://doi.org/10.3390/math13010046