Abstract
In recent years, innovative air combat paradigms built on unmanned aerial vehicles (UAVs), such as UAV swarms and UAV-manned aircraft cooperation, have received great attention worldwide. During operation, UAVs are expected to perform agile and safe maneuvers in response to dynamic mission requirements and a complicated battlefield environment. Deep reinforcement learning (DRL), which is well suited to sequential decision-making processes, provides a powerful tool for air combat maneuver decision-making (ACMD), and hundreds of related research papers have been published in the last five years. However, because the topic is still emerging, a systematic review and tutorial is lacking. For this reason, this paper first provides a comprehensive literature review to give readers an overall picture of the field. The review starts from DRL itself and then extends to its application in ACMD. Special attention is given to the design of the reward function, which is the core of DRL-based ACMD. Then, a maneuver decision-making method based on one-to-one dogfight scenarios is proposed to enable a UAV to win short-range air combat. The model establishment, program design, training methods and performance evaluation are described in detail. The associated Python code is available at gitee.com/wangyyhhh, giving researchers a quick start for building their own ACMD applications with slight modifications. Finally, limitations of the considered model and possible future research directions for intelligent air combat are discussed.
Acknowledgements
The authors are grateful for the financial support of the National Key Research and Development Plan (2021YFB3302501), the National Natural Science Foundation of China (12102077, 12161076, U2241263), and the Fundamental Research Funds for the Central Universities (DUT22RC(3)010, DUT22LAB305, DUT22QN223, DUT22ZD211).
Author information
Contributions
XW: Conceptualization, Funding acquisition, Writing – original draft, Writing – review & editing, Supervision; YW: Writing – original draft, Visualization, Validation; XS: Project administration, Supervision; LW: Funding acquisition, Writing – review & editing; CL: Project administration; HP: Funding acquisition, Supervision; JL: Writing – review & editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Code availability
The code is available at gitee.com/wangyyhhh.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, X., Wang, Y., Su, X. et al. Deep reinforcement learning-based air combat maneuver decision-making: literature review, implementation tutorial and future direction. Artif Intell Rev 57, 1 (2024). https://doi.org/10.1007/s10462-023-10620-2
DOI: https://doi.org/10.1007/s10462-023-10620-2