Searching and Tracking an Unknown Number of Targets: A Learning-Based Method Enhanced with Maps Merging
<p>Figure 1. The environment considered in this paper. Unmanned aerial vehicles (UAVs) and targets move in a bounded two-dimensional rectangular search region <span class="html-italic">S</span>. <math display="inline"><semantics> <mrow> <msub> <mi>c</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>ν</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mi>c</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> denote the cells in which target <math display="inline"><semantics> <msub> <mi>ν</mi> <mi>j</mi> </msub> </semantics></math> and UAV <math display="inline"><semantics> <msub> <mi>u</mi> <mi>i</mi> </msub> </semantics></math> lie at time step <span class="html-italic">t</span>, respectively. The blue cells represent the field of view (FOV) of each UAV. The dashed ellipse indicates the communication range of UAV <math display="inline"><semantics> <msub> <mi>u</mi> <mi>i</mi> </msub> </semantics></math>.</p>
<p>Figure 2. The system architecture.</p>
<p>Figure 3. An illustration of the four observation maps for UAV <math display="inline"><semantics> <msub> <mi>u</mi> <mn>0</mn> </msub> </semantics></math>. The parameters are set as follows: <math display="inline"><semantics> <mrow> <mi>N</mi> <mo>=</mo> <mn>10</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>M</mi> <mo>=</mo> <mn>5</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>C</mi> <mi>L</mi> </msub> <mo>=</mo> <mn>50</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>C</mi> <mi>W</mi> </msub> <mo>=</mo> <mn>50</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>d</mi> <mi>s</mi> </msub> <mo>=</mo> <mn>5</mn> <mspace width="0.277778em"/> <mi>cells</mi> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>d</mi> <mi>c</mi> </msub> <mo>=</mo> <mn>10</mn> <mspace width="0.277778em"/> <mi>cells</mi> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mi>U</mi> </msub> <mo>=</mo> <mn>5</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mi>T</mi> </msub> <mo>=</mo> <mn>8</mn> </mrow> </semantics></math>. The current time step is <math display="inline"><semantics> <mrow> <mi>t</mi> <mo>=</mo> <mn>200</mn> </mrow> </semantics></math>. The map <math display="inline"><semantics> <mrow> <mi>M</mi> <msub> <mi>C</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> is normalized as follows: <math display="inline"><semantics> <mrow> <mi>m</mi> <msubsup> <mi>c</mi> <mi>t</mi> <mrow> <mi>k</mi> <mi>l</mi> </mrow> </msubsup> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mi>m</mi> <msubsup> <mi>c</mi> <mi>t</mi> <mrow> <mi>k</mi> <mi>l</mi> </mrow> </msubsup> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> <mo>/</mo> <mo movablelimits="true" form="prefix">max</mo> <mrow> <mo>(</mo> <mi>M</mi> <msub> <mi>C</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> </semantics></math>, where <math display="inline"><semantics> <mrow> <mo movablelimits="true" form="prefix">max</mo> <mo>(</mo> <mi>M</mi> <msub> <mi>C</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </semantics></math> denotes the maximum element of the matrix <math display="inline"><semantics> <mrow> <mi>M</mi> <msub> <mi>C</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math>. The map <math display="inline"><semantics> <mrow> <mi>M</mi> <msub> <mi>U</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> records one UAV’s historical positions, and the map <math display="inline"><semantics> <mrow> <mi>M</mi> <msub> <mi>T</mi> <mi>t</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> records three targets’ historical positions.</p>
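The per-cell normalization described in this caption can be sketched with NumPy (array and function names are illustrative, not the authors' code):

```python
import numpy as np

def normalize_coverage_map(mc: np.ndarray) -> np.ndarray:
    """Scale every cell count by the map's maximum, as in the caption's
    formula mc_t^{kl}(u_0) = mc_t^{kl}(u_0) / max(MC_t(u_0))."""
    peak = mc.max()
    if peak == 0:                      # nothing observed yet: keep all zeros
        return mc.astype(float)
    return mc / peak

mc = np.array([[0, 2], [4, 1]])
print(normalize_coverage_map(mc))      # max is 4 -> entries 0, 0.5, 1, 0.25
```

Dividing by the running maximum keeps every entry in [0, 1], which is convenient when the map is later fed to a neural network.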
<p>Figure 4. An illustration of observations for UAV <math display="inline"><semantics> <msub> <mi>u</mi> <mn>0</mn> </msub> </semantics></math>. The parameters are consistent with those in <a href="#sensors-21-01076-f003" class="html-fig">Figure 3</a>. The value of <math display="inline"><semantics> <msub> <mi>C</mi> <mi>input</mi> </msub> </semantics></math> is set to 21 cells. The observations <math display="inline"><semantics> <mrow> <msubsup> <mi mathvariant="bold">o</mi> <mi>t</mi> <mn>1</mn> </msubsup> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msubsup> <mi mathvariant="bold">o</mi> <mi>t</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <msub> <mi>u</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> show that UAV <math display="inline"><semantics> <msub> <mi>u</mi> <mn>0</mn> </msub> </semantics></math> is very close to the right boundary of the search region.</p>
<p>Figure 5. The deep neural network architecture. The value of <math display="inline"><semantics> <msub> <mi>C</mi> <mi>input</mi> </msub> </semantics></math> is set to 21.</p>
<p>Figure 6. Training curve: the mean and variance of each episode’s cumulative reward, computed every 100 episodes.</p>
<p>Figure 7. Comparison of results as the number of UAVs increases while the number of targets is fixed at 10. Other parameters are the same as in the training process. (<b>a</b>) Average observation rate versus the number of UAVs. (<b>b</b>) Standard deviation of the observation rates versus the number of UAVs. (<b>c</b>) Exploration rate versus the number of UAVs.</p>
<p>Figure 8. Comparison of results as the total mission time increases. Other parameters are the same as in the training process. (<b>a</b>) Average observation rate versus the total mission time. (<b>b</b>) Standard deviation of the observation rates versus the total mission time. (<b>c</b>) Exploration rate versus the total mission time.</p>
<p>Figure 9. Comparison of results as the size of the search region increases. Other parameters are the same as in the training process. (<b>a</b>) Average observation rate versus the size of the search region. (<b>b</b>) Standard deviation of the observation rates versus the size of the search region. (<b>c</b>) Exploration rate versus the size of the search region.</p>
<p>Figure 10. Comparison of results as the communication range increases. Other parameters are the same as in the training process. (<b>a</b>) Average observation rate versus the communication range. (<b>b</b>) Standard deviation of the observation rates versus the communication range. (<b>c</b>) Exploration rate versus the communication range.</p>
Abstract
1. Introduction
- The average observation rate of the targets, the standard deviation of the observation rates of the targets, and the exploration rate of the search region are simultaneously optimized to enable multiple UAVs to cooperatively achieve fair observation of discovered targets and continuous search for undiscovered targets.
- Each UAV maintains four observation maps recording observation histories, and a map merging method among UAVs is proposed, which can reduce the partial observability of the environment and improve awareness of the environment.
- A DRL-based multi-UAV control policy is proposed, which allows UAVs to learn to balance tracking targets and exploring the environment by interacting with the environment.
2. Problem Formulation
- A bounded two-dimensional rectangular search region <i>S</i>, discretized into <i>C</i><sub><i>L</i></sub> × <i>C</i><sub><i>W</i></sub> equally sized cells, where <i>C</i><sub><i>L</i></sub> and <i>C</i><sub><i>W</i></sub> represent the number of cells along the length and width of the search region, respectively.
- The time step is discretized and denoted by t within a mission time duration T.
- A set of <i>N</i> moving targets in <i>S</i>. For target <i>ν</i><sub><i>j</i></sub>, the cell in which it lies at time step <i>t</i> is denoted by <i>c</i><sub><i>t</i></sub>(<i>ν</i><sub><i>j</i></sub>). The mission is to observe these targets using multiple UAVs. To simplify the mission, we assume that the maximal speed of the targets is smaller than that of the UAVs.
- A team of <i>M</i> homogeneous UAVs deployed in <i>S</i> to observe the targets. For UAV <i>u</i><sub><i>i</i></sub>, the cell in which it lies at time step <i>t</i> is denoted by <i>c</i><sub><i>t</i></sub>(<i>u</i><sub><i>i</i></sub>). Each UAV can observe the targets through its onboard sensor, whose sensing range is denoted by <i>d</i><sub><i>s</i></sub>. We assume that the UAVs fly at a fixed altitude, so the field of view (FOV) of every UAV at each time step <i>t</i> has the same, constant size. In addition, each UAV is equipped with a communication device to share information and coordinate with other UAVs. The communication range is denoted by <i>d</i><sub><i>c</i></sub>, which is assumed to be larger than the sensing range <i>d</i><sub><i>s</i></sub>. A UAV can only share information with other UAVs within this communication range. We further assume that all UAVs share a known global coordinate system.
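The formulation above can be sketched in code. The class and method names below are illustrative assumptions, as are the square FOV (Chebyshev radius <i>d</i><sub><i>s</i></sub>) and Euclidean communication radius <i>d</i><sub><i>c</i></sub>; the paper's exact geometric definitions may differ.

```python
class GridSearchRegion:
    """Bounded 2-D search region S discretized into C_L x C_W cells."""

    def __init__(self, c_l=50, c_w=50, d_s=5, d_c=10):
        assert d_c > d_s, "communication range must exceed sensing range"
        self.c_l, self.c_w = c_l, c_w
        self.d_s, self.d_c = d_s, d_c   # sensing / communication radii (cells)

    def fov(self, cell):
        """Cells inside a UAV's square field of view, clipped to the region."""
        x, y = cell
        return {(i, j)
                for i in range(max(0, x - self.d_s), min(self.c_l, x + self.d_s + 1))
                for j in range(max(0, y - self.d_s), min(self.c_w, y + self.d_s + 1))}

    def neighbors(self, cell, uav_cells):
        """Other UAVs close enough (Euclidean distance, in cells) to exchange maps."""
        x, y = cell
        return [c for c in uav_cells
                if c != cell and (c[0] - x) ** 2 + (c[1] - y) ** 2 <= self.d_c ** 2]

region = GridSearchRegion()
print((24, 0) in region.fov((20, 0)))               # within 5 cells -> True
print(region.neighbors((0, 0), [(0, 8), (0, 30)]))  # only the UAV 8 cells away
```

A target is observed at step <i>t</i> whenever its cell falls inside some UAV's `fov`, and map merging (Section 3.2) happens only between `neighbors`.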
3. Methods
3.1. Overview
3.2. Maps Merging
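The merge rule used in this section is the paper's own; as a hedged illustration only (an assumed rule, not the authors' method), a common approach keeps, for each cell, the record from whichever UAV updated that cell most recently:

```python
import numpy as np

def merge_maps(map_a, time_a, map_b, time_b):
    """Merge two UAVs' observation maps cell by cell.

    map_*  : per-cell recorded values (e.g. coverage counts)
    time_* : time step at which each cell was last updated
    The fresher record wins; ties keep UAV a's value. Illustrative rule only.
    """
    fresher_b = time_b > time_a
    merged_map = np.where(fresher_b, map_b, map_a)
    merged_time = np.maximum(time_a, time_b)
    return merged_map, merged_time

m, t = merge_maps(np.array([[1, 5]]), np.array([[3, 0]]),
                  np.array([[9, 2]]), np.array([[1, 4]]))
print(m.tolist())   # [[1, 2]]: cell 0 kept from a (t=3>1), cell 1 from b (t=4>0)
```

Whatever the exact rule, the effect described in the contribution list is the same: each exchange widens a UAV's effective view beyond its own FOV, reducing partial observability.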
3.3. Deep Reinforcement Learning
3.3.1. Observation Space
- The observation <b>o</b><sub><i>t</i></sub><sup>1</sup>(<i>u</i><sub><i>i</i></sub>) is a part of the map centered on the UAV’s current cell <i>c</i><sub><i>t</i></sub>(<i>u</i><sub><i>i</i></sub>), with length and width <i>C</i><sub>input</sub>. That is, <b>o</b><sub><i>t</i></sub><sup>1</sup>(<i>u</i><sub><i>i</i></sub>) is a <i>C</i><sub>input</sub> × <i>C</i><sub>input</sub> matrix representing the positional relationship of UAV <i>u</i><sub><i>i</i></sub> relative to the boundary of the search region <i>S</i>, which is defined as follows:
- The observation <b>o</b><sub><i>t</i></sub><sup>2</sup>(<i>u</i><sub><i>i</i></sub>) is a part of the map <i>MC</i><sub><i>t</i></sub>(<i>u</i><sub><i>i</i></sub>) centered on the UAV’s current cell <i>c</i><sub><i>t</i></sub>(<i>u</i><sub><i>i</i></sub>), with length and width <i>C</i><sub>input</sub>. Similarly, <b>o</b><sub><i>t</i></sub><sup>2</sup>(<i>u</i><sub><i>i</i></sub>) is a <i>C</i><sub>input</sub> × <i>C</i><sub>input</sub> matrix representing the observation state of the cells around UAV <i>u</i><sub><i>i</i></sub>, which is defined as follows:
- The observation <b>o</b><sub><i>t</i></sub><sup>3</sup>(<i>u</i><sub><i>i</i></sub>) is a part of the map <i>MU</i><sub><i>t</i></sub>(<i>u</i><sub><i>i</i></sub>) centered on the UAV’s current cell <i>c</i><sub><i>t</i></sub>(<i>u</i><sub><i>i</i></sub>), with length and width <i>C</i><sub>input</sub>. Like <b>o</b><sub><i>t</i></sub><sup>2</sup>(<i>u</i><sub><i>i</i></sub>), <b>o</b><sub><i>t</i></sub><sup>3</sup>(<i>u</i><sub><i>i</i></sub>) is a <i>C</i><sub>input</sub> × <i>C</i><sub>input</sub> matrix representing historical position information of other UAVs around UAV <i>u</i><sub><i>i</i></sub>, which is defined as follows:
- The observation <b>o</b><sub><i>t</i></sub><sup>4</sup>(<i>u</i><sub><i>i</i></sub>) is a part of the map <i>MT</i><sub><i>t</i></sub>(<i>u</i><sub><i>i</i></sub>) centered on the UAV’s current cell <i>c</i><sub><i>t</i></sub>(<i>u</i><sub><i>i</i></sub>), with length and width <i>C</i><sub>input</sub>. Similarly, <b>o</b><sub><i>t</i></sub><sup>4</sup>(<i>u</i><sub><i>i</i></sub>) is a <i>C</i><sub>input</sub> × <i>C</i><sub>input</sub> matrix representing historical position information of targets around UAV <i>u</i><sub><i>i</i></sub>, which is defined as follows:
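Extracting these <i>C</i><sub>input</sub> × <i>C</i><sub>input</sub> windows, with padding where the window crosses the region boundary, might look like the sketch below (the function name and zero padding are assumptions; padding is also one way the UAV "sees" the boundary in the first observation):

```python
import numpy as np

def local_observation(full_map, center, c_input=21, pad_value=0.0):
    """Return the c_input x c_input sub-map of `full_map` centered on `center`.

    Cells outside the search region are filled with `pad_value`.
    """
    half = c_input // 2
    padded = np.pad(full_map.astype(float), half, constant_values=pad_value)
    x, y = center                      # cell indices in the unpadded map
    return padded[x:x + c_input, y:y + c_input]

full = np.ones((50, 50))               # a 50 x 50 map, as in Figure 3
obs = local_observation(full, (0, 25))  # UAV sitting on the top boundary
print(obs.shape)          # (21, 21)
print(obs[:10].sum())     # the top 10 rows fall outside S -> all zeros
```

Centering the crop on the UAV makes all four observations egocentric, so the same policy network can be shared by every UAV regardless of its absolute position.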
3.3.2. Action Space
3.3.3. Network Architecture
3.3.4. Reward Function
3.3.5. Training Algorithm
4. Results
4.1. Simulation Setup and Training Results
Algorithm 1: PPO with multiple UAVs for CMUOMMT
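The multi-UAV training loop is given in Algorithm 1; at its core is the standard PPO clipped surrogate objective, sketched below. This is generic PPO-clip, not the authors' exact implementation; the clip threshold of 0.1 is assumed here for illustration.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.1):
    """Clipped surrogate L^CLIP = mean(min(r*A, clip(r, 1-eps, 1+eps)*A)),
    where r is the probability ratio pi_new/pi_old and A the advantage."""
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()

# A large ratio with positive advantage is clipped to 1 + eps:
print(ppo_clip_objective(np.array([2.0]), np.array([1.0])))   # 1.1
# A ratio inside the trust region passes through unchanged:
print(ppo_clip_objective(np.array([1.05]), np.array([2.0])))  # 2.1
```

Clipping removes the incentive to move the policy far from the behavior policy in a single update, which is what makes experience collected by several UAVs reusable over multiple gradient steps.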
4.2. Comparison with Other Methods
- the average observation rate of the targets,
- the standard deviation of the observation rates of the N targets, and
- the exploration rate of the search region.
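These three metrics can be computed from simulation logs roughly as follows. The array layout is an assumption: `observed[t, j]` is 1 when target <i>j</i> lies inside some UAV's FOV at step <i>t</i>, and `visited` flags cells that have been covered at least once.

```python
import numpy as np

def evaluation_metrics(observed, visited):
    """observed : (T, N) 0/1 matrix, target j seen by >=1 UAV at step t
    visited  : (C_L, C_W) 0/1 matrix, cell ever inside some FOV
    Returns (average observation rate, std of per-target rates, exploration rate)."""
    per_target = observed.mean(axis=0)   # fraction of steps each target is seen
    return per_target.mean(), per_target.std(), visited.mean()

observed = np.array([[1, 0], [1, 0], [1, 1], [1, 1]])   # T=4 steps, N=2 targets
visited = np.zeros((10, 10)); visited[:5] = 1            # half the region explored
avg, std, explored = evaluation_metrics(observed, visited)
print(avg, std, explored)   # 0.75 0.25 0.5
```

A low standard deviation indicates fair observation across discovered targets, while the exploration rate rewards continued search for undiscovered ones, matching the objectives listed in the introduction.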
4.3. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
A3C | Asynchronous advantage actor-critic
CMOMMT | Cooperative Multi-Robot Observation of Multiple Moving Targets
CMUOMMT | Cooperative Multi-UAV Observation of Multiple Moving Targets
CNN | Convolutional neural network
DRL | Deep reinforcement learning
EKF | Extended Kalman filter
FOV | Field of view
GA | Genetic algorithm
MOSOS | Multi-Objective Symbiotic Organisms Search
MPC | Model Predictive Control
PHD | Probability Hypothesis Density
POMDP | Partially Observable Markov Decision Process
PPO | Proximal policy optimization
PSO | Particle Swarm Optimization
SAR | Search and rescue
UAV | Unmanned aerial vehicle
Parameters | Values
---|---
T | 200
M | 5
  | 0.99
  | 10
  | 0.1
B | 64
  | 0.00005
  | 10
  | 0.001
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yan, P.; Jia, T.; Bai, C. Searching and Tracking an Unknown Number of Targets: A Learning-Based Method Enhanced with Maps Merging. Sensors 2021, 21, 1076. https://doi.org/10.3390/s21041076