Advanced Cooperative Formation Control in Variable-Sweep Wing UAVs via the MADDPG–VSC Algorithm
Figure 1. Schematic of dynamic relationships in a variable-sweep wing UAV.
Figure 2. Diagram of the variable-sweep wing UAV and mass centers. (a) Structural diagram of the L-30A variable-sweep wing UAV. (b) Schematic of mass centers and position vectors.
Figure 3. Rotating mechanism at the wing–fuselage junction of the L-30A UAV. (a) Schematic of the rotation mechanism. (b) Schematic of the wing–fuselage junction detail.
Figure 4. Aerodynamic characteristics of the L-30A variable-sweep wing UAV. (a) Lift and drag coefficients. (b) Lift-to-drag ratio.
Figure 5. Schematic of cooperative formation control in a variable-sweep wing UAV system.
Figure 6. Structure of the MADDPG–VSC algorithm model.
Figure 7. Schematic representation of the simulation environment. (a) Terrain schematic; (b) simple scenario; (c) complex scenario.
Figure 8. UAV trajectories using MADDPG, MADDPG–VSC, and SAC algorithms. (a) MADDPG algorithm; (b) MADDPG–VSC algorithm; (c) SAC algorithm.
Figure 9. Training reward curves for MADDPG, MADDPG–VSC, and SAC algorithms.
Figure 10. Network parameter variations using MADDPG, MADDPG–VSC, and SAC algorithms. (a) MADDPG algorithm; (b) MADDPG–VSC algorithm; (c) SAC algorithm.
Figure 11. Schematic of the L-30A UAV platform and its components: (a) L-30A UAV system; (b) sensors and safety mechanisms.
Figure 12. Experimental hardware platform and task controller. (a) Hardware platform; (b) task controller.
Figure 13. Scenario map of the formation flight trajectories.
Figure 14. Latency comparison across different hardware platforms.
Figure 15. Energy consumption comparison across hardware platforms.
Figure 16. Fault tolerance and reliability comparison across hardware platforms.
Figure 17. Trajectory tracking error comparison between MADDPG and MADDPG–VSC. (a) Trajectory of MADDPG; (b) trajectory of MADDPG–VSC.
Abstract
1. Introduction
- (1) Optimization of the multi-rigid-body dynamic model for an adaptive morphing wing.
- (2) Development of a cooperative control algorithm for variable-sweep wing UAVs.
- (3) Construction of an adaptive optimization system for multi-objective reinforcement learning.
2. Dynamics Analysis of the Variable-Sweep Wing UAV
2.1. Multi-Rigid-Body Dynamic Model
- (1) Expression of center of mass and velocity
- (2) Multi-rigid-body dynamics equation
- (3) Total force and dynamic equilibrium of the system
- (4) Calculation of additional force
- (5) Simplification of forces and moment
- (6) Dynamic equilibrium of the system
- (7) Relationship between sweep angle and moment
- (8) Linearization of the dynamic equation
2.2. Dynamic Characteristics Analysis of the L-30A UAV
- (1) Connection between the model and aerodynamic analysis
- (2) Detailed analysis of dynamic characteristics
- (3) Aerodynamic characteristics analysis
- (4) Comparative analysis of aerodynamic characteristics
3. Cooperative Control of Variable-Sweep Wing UAV via MADDPG
3.1. Application of MARL in Variable-Sweep Wing UAV
- (1) State-value function
- (2) Action-value function
- (3) Q-learning update
- (4) Policy gradient update (see the formula sketch after this list)
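As a point of reference for the four entries above, the underlying quantities can be sketched in the conventional MADDPG-style formulation; the paper's own notation and subscripts may differ, so treat this only as a reminder of the standard definitions:

```latex
% State-value function of the joint policy \pi (expected discounted return from state s)
V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_{0}=s\right]

% Centralized action-value function of agent i over the joint action (a_1,\dots,a_N)
Q_{i}^{\pi}(s, a_{1}, \dots, a_{N}) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{i,t} \;\middle|\; s_{0}=s,\; a_{0}=(a_{1},\dots,a_{N})\right]

% Temporal-difference (Q-learning style) update of the critic with learning rate \eta
Q(s,a) \leftarrow Q(s,a) + \eta\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]

% Deterministic policy gradient for agent i's actor \mu_{\theta_i}, conditioned on its observation o_i
\nabla_{\theta_{i}} J(\theta_{i}) = \mathbb{E}\left[\nabla_{\theta_{i}} \mu_{\theta_{i}}(o_{i})\, \nabla_{a_{i}} Q_{i}(s, a_{1},\dots,a_{N})\big|_{a_{i}=\mu_{\theta_{i}}(o_{i})}\right]
```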
3.2. Optimization of the MADDPG–VSC Algorithm for Control Design
3.2.1. Reward Function Design
- (1) Sweep angle reward
- (2) Formation reward
- (3) Energy consumption penalty
- (4) Collision avoidance penalty (see the reward sketch after this list)
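The four terms above are typically combined into a single per-step scalar reward. The sketch below uses hypothetical weight names (`W_SWEEP`, `W_FORM`, `W_ENERGY`, `W_COLLISION`) and threshold values chosen purely for illustration; it shows how such a composite reward is usually assembled, not the paper's actual coefficients:

```python
# Hypothetical weights and safety threshold; the paper's coefficients are not reproduced here.
W_SWEEP, W_FORM, W_ENERGY, W_COLLISION = 1.0, 1.0, 0.1, 5.0

def composite_reward(sweep_err_deg: float, formation_err_m: float,
                     energy_used: float, min_separation_m: float,
                     safe_separation_m: float = 10.0) -> float:
    """Combine the four per-step terms of Section 3.2.1 into one scalar reward."""
    r_sweep = -W_SWEEP * abs(sweep_err_deg)        # (1) sweep angle reward: deviation from the commanded sweep
    r_form = -W_FORM * formation_err_m             # (2) formation reward: distance from the desired formation slot
    r_energy = -W_ENERGY * energy_used             # (3) energy consumption penalty
    r_collision = -W_COLLISION if min_separation_m < safe_separation_m else 0.0  # (4) collision avoidance penalty
    return r_sweep + r_form + r_energy + r_collision
```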
3.2.2. Exploration and Learning Efficiency
3.2.3. Convergence and Stability
3.2.4. Adaptive Sweep Angle Control Strategy
3.2.5. Flight States of UAV and Control Strategies
3.3. Structure and Training Process of the MADDPG–VSC Algorithm
3.3.1. Multi-Agent Algorithm Model Structure
3.3.2. Training Process
- (1) Initialization.
- (2) Policy selection.
- (3) Experience storage.
- (4) Parameter updates.
- (5) Target network update.
- (6) Repeat training (see the training-loop sketch after this list).
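Steps (1)–(6) above follow the usual MADDPG training loop. The sketch below assumes generic `env`, `agents`, and `buffer` interfaces (all hypothetical names) and reuses the episode, step, and batch settings listed in Section 3.4.1; it illustrates the control flow only, not the authors' implementation:

```python
def train_maddpg(env, agents, buffer, episodes=3000, max_steps=200,
                 batch_size=32, tau=0.01):
    """Control flow of steps (1)-(6); env, agents, and buffer are hypothetical interfaces."""
    for episode in range(episodes):                       # (6) repeat training
        obs = env.reset()                                 # (1) initialization
        for _ in range(max_steps):
            # (2) policy selection: each actor chooses an action with exploration noise
            actions = [agent.act(o, explore=True) for agent, o in zip(agents, obs)]
            next_obs, rewards, done, _ = env.step(actions)
            buffer.add(obs, actions, rewards, next_obs, done)   # (3) experience storage
            obs = next_obs
            if len(buffer) >= batch_size:
                batch = buffer.sample(batch_size)
                for agent in agents:
                    agent.update_critic(batch)            # (4) centralized critic update
                    agent.update_actor(batch)             #     decentralized actor update
                    agent.soft_update_targets(tau)        # (5) soft target network update
            if done:
                break
```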
3.4. Simulation Verification and Performance Analysis of the MADDPG–VSC
3.4.1. Simulation Environment and Parameter Setting
3.4.2. Algorithmic Performance Evaluation
3.4.3. Comparative Analysis of Learning Efficiency and Convergence
4. Flight Validation and Performance Analysis
4.1. Experimental Platform Overview
4.1.1. L-30A UAV System
4.1.2. Controller Selection and Algorithm Integration
- (1) Controller selection and rationale
- (2) Hardware performance evaluation and optimization
4.2. Flight Validation and Performance Analysis
4.2.1. Experiment Process Overview
4.2.2. Hardware Platform Ground Testing and Analysis
- (1) Latency test
- (2) Energy consumption test
- (3) Throughput test
- (4) Fault tolerance and reliability
4.2.3. Algorithm Performance and Data Analysis
- (1) Coordination evaluation
  - Trajectory tracking error
  - Formation stability
- (2) Real-time responsiveness evaluation
  - Data transmission efficiency
4.2.4. Key Insights and Implications
4.3. Comparative Analysis of Simulation and Flight Validation
- (1) Trajectory and formation stability
- (2) Hardware performance consistency
- (3) Response to innovation points
- (4) Insights and implications for future work
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Parameters | L-30A Variable-Sweep Wing UAV | L-30 Fixed-Wing UAV |
---|---|---|
Takeoff weight | 75 kg | 70 kg |
Service ceiling | 6800 m | 7000 m |
Velocity (cruising) | 100–120 km/h | 100–120 km/h |
Velocity (maximum) | 180 km/h | 150 km/h |
Endurance | 4.8 h | 5 h |
Parameters | 16° Wing Configuration | 60° Wing Configuration |
---|---|---|
Longitudinal Reference Length (m) | 0.80408 | 1.98564 |
Chord Length (m) | 0.33448 | 0.68220 |
Wingspan (m) | 1.53164 | 0.69072 |
Flight States | Flight Velocity (km/h) | Flight Altitude (m) | Sweep Angle (°) |
---|---|---|---|
Takeoff | 0–60 | 0–500 | 0–10 |
Climb | 60–100 | 500–1000 | 10–20 |
Cruise | 120–140 | 1000–2000 | 15–25 |
Maneuver | 100–140 | 500–1500 | 20–35 |
Dive | 120–160 | 0–1000 | 40–50 |
Vertical Attack | 160–180 | 0–500 | 50–60 |
Landing | 0–60 | 0–500 | 0–10 |
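For illustration only, the flight-state envelopes in the table above can be encoded as a lookup that clamps a commanded sweep angle to the range permitted in the current flight state; the function and dictionary names below are hypothetical, while the numeric ranges are transcribed from the table:

```python
# Sweep-angle envelopes (degrees) per flight state, transcribed from the table above.
SWEEP_ENVELOPE_DEG = {
    "takeoff": (0, 10),
    "climb": (10, 20),
    "cruise": (15, 25),
    "maneuver": (20, 35),
    "dive": (40, 50),
    "vertical_attack": (50, 60),
    "landing": (0, 10),
}

def clamp_sweep(flight_state: str, commanded_sweep_deg: float) -> float:
    """Clamp a commanded sweep angle to the envelope allowed in the current flight state."""
    lo, hi = SWEEP_ENVELOPE_DEG[flight_state]
    return max(lo, min(hi, commanded_sweep_deg))
```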
Parameters | Values |
---|---|
Max Episodes | 3000 |
Maximum Number of Steps T | 200 |
Discount Factor γ | 0.2 |
Critic Learning Rate η1 | 0.002 |
Size of Buffer U | 5000 |
Batch Size | 32 |
Actor Learning Rate η2 | 0.001 |
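The training hyperparameters listed above can be collected into a single configuration object; the field names below are illustrative only, while the values are taken directly from the table:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    max_episodes: int = 3000   # Max Episodes
    max_steps: int = 200       # Maximum number of steps T
    gamma: float = 0.2         # Discount factor γ
    critic_lr: float = 0.002   # Critic learning rate η1
    actor_lr: float = 0.001    # Actor learning rate η2
    buffer_size: int = 5000    # Size of replay buffer U
    batch_size: int = 32       # Batch size
```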
Algorithm | Maximum Deviation (m) | Average Deviation (m) | Median Error (m) | Total Deviation (m) |
---|---|---|---|---|
MADDPG | 24.06 | 11.13 | 6.04 | 25.31 |
MADDPG–VSC | 18.17 | 8.84 | 5.84 | 21.50 |
SAC | 19.00 | 9.00 | 7.00 | 22.00 |
Algorithm | Maximum Deviation (m) | Average Deviation (m) | Median Error (m) | Total Deviation (m) |
---|---|---|---|---|
MADDPG | 13.88 | 4.07 | 7.78 | 16.43 |
MADDPG–VSC | 9.75 | 3.12 | 1.69 | 11.83 |
SAC | 10.00 | 4.00 | 3.50 | 12.50 |
Algorithm | Maximum Deviation (m) | Average Deviation (m) | Median Error (m) | Total Deviation (m) |
---|---|---|---|---|
MADDPG | 12.38 | 3.91 | 2.10 | 14.66 |
MADDPG–VSC | 8.79 | 2.55 | 1.57 | 10.55 |
SAC | 9.05 | 3.15 | 2.50 | 12.50 |
Hardware Platforms | Model | Runtime (ms) | Frequency (GHz) |
---|---|---|---|
ARM | ARM-A53 | 0.5550 | 1 |
ARM+FPGA | ZCU104 | 0.3550 | 1 |
Hardware Platforms | Model | Runtime (ms) | Average Energy Consumption (W) |
---|---|---|---|
ARM+FPGA | ZCU104 | 0.78 | 15 |
ARM | ARM-A53 | 15.2 | 5 |
CPU | E5-2630v4 | 0.45 | 85 |
Waypoint | wp1 | wp2 | wp3 | wp4 | wp5 | wp6 | wp7 | wp8 |
---|---|---|---|---|---|---|---|---|
Lon/° E | 90.1437 | 90.1581 | 90.1767 | 90.1823 | 90.1618 | 90.1643 | 90.1520 | 90.1431 |
Lat/° N | 38.3884 | 38.3939 | 38.4016 | 38.3901 | 38.3838 | 38.3763 | 38.3721 | 38.3882 |
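As a side note, the leg lengths between consecutive waypoints in the table above can be estimated with the standard haversine formula; the helper below is a generic geodesy sketch, not part of the paper's method:

```python
import math

def haversine_m(lon1_deg, lat1_deg, lon2_deg, lat2_deg, earth_radius_m=6371000.0):
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dphi = math.radians(lat2_deg - lat1_deg)
    dlam = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

# Example: the leg from wp1 to wp2 in the table above is roughly 1.4 km.
# haversine_m(90.1437, 38.3884, 90.1581, 38.3939)
```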
Hardware Platforms | Model | Throughput (Tasks/s) |
---|---|---|
ARM | ARM-A53 | 100 |
ARM+FPGA | ZCU104 | 150 |
CPU | E5-2630v4 | 200 |
Waypoint | wp1 | wp2 | wp3 | wp4 | wp5 | wp6 | wp7 | wp8 |
---|---|---|---|---|---|---|---|---|
MADDPG | 20.6 m | 25.3 m | 27.8 m | 23.5 m | 26.7 m | 23.3 m | 26.2 m | 23.3 m |
MADDPG–VSC | 14.5 m | 16.9 m | 17.8 m | 14.5 m | 17.2 m | 16.8 m | 16.9 m | 14.4 m |
Task Scenario | Avg Response Time (s), MADDPG–VSC | Avg Response Time (s), MADDPG |
---|---|---|
Formation Adjustment | 1.2 | 1.8 |
Obstacle Avoidance | 1.3 | 2.0 |
Trajectory Recalibration | 1.1 | 1.7 |
Algorithm | Data Transmission Delay (s) | Data Transfer Rate (Mbps) |
---|---|---|
MADDPG–VSC | 0.2 | 1.4 |
MADDPG | 0.3 | 1.3 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).