Laser Based Navigation in Asymmetry and Complex Environment
Figure 1. The functionality of the elements of the methodology.
Figure 2. The proposed method learns the policy in an asynchronous manner. The robot interacts with the scene in the asymmetric environment. Each scene is divided into four segments, where each puzzle represents one specific start and target situation. A dedicated worker is dispatched to collect the state representation in each puzzle. Once the robot has learned a generalized policy π_θ0 in one scene (i.e., scene 1, top left), it is copied to a set of initialized parameters in another scene (i.e., scene 2, top right).
Figure 3. Network structure. On the left is the critic network, which predicts a scalar value from the state input variables s. On the right is the actor network, which generates a stochastic policy from the same state inputs. The complete network was learned in an actor–critic fashion. The dimension of each layer and the type of activation functions are listed in the box.
Figure 4. Experiment setup. The robot used for building the map was Tyran. Messages were processed by the map builder element and interacted with the navigation framework. The navigation framework provided the available state information for the deep RL agent.
Figure 5. Evaluation result with baseline algorithms. The baselines compared with the APPO method were the end-to-end and ADDPG methods. The horizontal axis is the episode, and the vertical axis is the moving-averaged total reward.
Figure 6. Decomposition of the agent behavior in hard mode. A map of size 100 × 100 is used to calculate the path (solid red line) to the local target, and a map of size 60 × 60 is used to monitor the local behavior of the agent. The gray solid line is used to calculate the angle to the path. The white solid footprint in the window is the next state reached by making the current move; it is usually small, as the agent makes decisions every 200 ms with a maximum speed of 0.5 m/s. Three typical stages were selected for each task, with the task number shown on the right. These are listed as (a1–d3).
Figure 7. Snapshots of the agent decisions for unseen obstacles. The blue sticks are synthetic obstacles that have not been seen before.
Figure 8. Evaluation result with the new scene: the learning process in scene 2 using the old parameters derived from scene 1, compared with learning from scratch.
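The actor–critic structure described in Figure 3 could be written roughly as below. This is a minimal sketch, assuming a PyTorch implementation; the state dimension, action dimension, hidden-layer width, and activation functions are placeholders, since the paper lists the actual values in the figure itself.

```python
# Minimal sketch of the Figure 3 actor-critic networks (all dimensions assumed).
import torch
import torch.nn as nn

STATE_DIM = 28    # assumed: laser readings plus relative target information
ACTION_DIM = 2    # assumed: linear and angular velocity
HIDDEN = 256      # assumed hidden-layer width

class Critic(nn.Module):
    """Predicts a scalar state value V(s) from the state input s."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, s):
        return self.net(s)

class Actor(nn.Module):
    """Outputs a stochastic (Gaussian) policy over continuous actions."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        )
        self.mu = nn.Linear(HIDDEN, ACTION_DIM)               # action mean
        self.log_std = nn.Parameter(torch.zeros(ACTION_DIM))  # learnable log std

    def forward(self, s):
        h = self.body(s)
        return torch.distributions.Normal(torch.tanh(self.mu(h)), self.log_std.exp())
```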
Abstract
1. Introduction
2. Related Work
2.1. Deep Reinforcement Learning
2.2. Traditional Collision-Free Navigation
2.3. Reinforcement-Learning-Based Navigation
3. Methodology
3.1. Preliminary
3.2. Problem Setting
3.3. Algorithms
Algorithm 1. Asynchronous PPO Pseudo-Code
Set the number of workers N
Set the minimum batch size
Initialize the global buffer
For n in N do
Initialize worker n using Worker() in Algorithm 2
End for
Train the network using the Train method in Algorithm 3 when notified
Algorithm 2. Asynchronous PPO Pseudo-Code for Each Worker Thread
Worker():
Set the global step counter
Set the global PPO network
Set the global buffer queue QUEUE
Initialize the training environment ENV
Initialize the local buffer
Restart a new game and get the initial state
repeat:
Take an action by sampling from the target policy distribution
Receive the new state and reward from ENV and append them to the local buffer
If done or the local buffer has reached the minimum batch size then:
For each reward r in the local buffer, traversed backwards, do:
Compute the critic target v ← r + γv, where the decay rate γ is adjusted after each batch during training
End For
Reverse the computed targets
Push the local buffer into QUEUE
Reset the local buffer
end if
Notify the main thread to update the network
Algorithm 3. Pseudo-Code for the Train Method in Algorithm 1
Initialize the critic network weights
Initialize the actor network weights
Initialize the target critic network weights
Initialize the target actor network weights
Set the learning rates for the actor and the critic
Set the decay rate for the critic network
Train method():
Critic update: minimize the mean squared error between the predicted value V(s) and the critic target v_target,
where A = v_target − V(s) is the advantage
Actor update: maximize the clipped surrogate objective min(r(θ)A, clip(r(θ), 1 − ε, 1 + ε)A), where r(θ) is the probability ratio between the current actor and the target (old) actor
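As a rough illustration of how Algorithms 1–3 fit together, the sketch below shows a single-machine version in Python/PyTorch, using worker threads and a shared queue as the global buffer. It reuses the Actor and Critic classes sketched near Figure 3; the environment interface (reset()/step() returning the next state, reward, and a done flag), the make_env() factory, and all hyperparameter values are assumptions for illustration rather than the paper's exact configuration.

```python
# Simplified sketch of the asynchronous PPO loop in Algorithms 1-3 (assumptions noted above).
import queue
import threading

import torch

N_WORKERS = 4       # number of workers N in Algorithm 1 (assumed)
MIN_BATCH = 64      # minimum batch size (assumed)
GAMMA = 0.99        # decay rate used for the critic target (assumed)
CLIP_EPS = 0.2      # PPO clipping parameter (assumed)

global_queue = queue.Queue()   # global buffer shared by all worker threads

def worker(env, actor, stop_event):
    """Algorithm 2: collect transitions and push finished batches into the global buffer."""
    s, buf = env.reset(), []
    while not stop_event.is_set():
        with torch.no_grad():
            a = actor(torch.as_tensor(s, dtype=torch.float32)).sample()
        s_next, r, done = env.step(a.numpy())
        buf.append((s, a, r))
        s = s_next
        if done or len(buf) >= MIN_BATCH:
            ret, targets = 0.0, []
            for _, _, r_t in reversed(buf):        # discounted returns, computed backwards
                ret = r_t + GAMMA * ret
                targets.append(ret)
            targets.reverse()                      # restore the original transition order
            global_queue.put((buf, targets))       # "notify" the trainer with a new batch
            buf = []
            if done:
                s = env.reset()

def train_step(actor, actor_old, critic, opt_actor, opt_critic):
    """Algorithm 3: one PPO update from the next batch in the global buffer."""
    batch, targets = global_queue.get()
    s = torch.stack([torch.as_tensor(b[0], dtype=torch.float32) for b in batch])
    a = torch.stack([b[1] for b in batch])
    v_target = torch.as_tensor(targets, dtype=torch.float32).unsqueeze(-1)
    adv = (v_target - critic(s)).detach()                 # advantage A = v_target - V(s)
    # Critic update: regress V(s) onto the discounted return.
    critic_loss = ((critic(s) - v_target) ** 2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()
    # Actor update: PPO clipped surrogate with ratio pi_theta / pi_theta_old.
    ratio = (actor(s).log_prob(a).sum(-1, keepdim=True)
             - actor_old(s).log_prob(a).sum(-1, keepdim=True).detach()).exp()
    surrogate = torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * adv)
    opt_actor.zero_grad(); (-surrogate.mean()).backward(); opt_actor.step()
    actor_old.load_state_dict(actor.state_dict())         # sync the target (old) actor

def main(make_env, actor, actor_old, critic, opt_actor, opt_critic, updates=1000):
    """Algorithm 1: dispatch the workers, then train whenever a batch arrives."""
    stop = threading.Event()
    threads = [threading.Thread(target=worker, args=(make_env(), actor, stop), daemon=True)
               for _ in range(N_WORKERS)]
    for t in threads:
        t.start()
    for _ in range(updates):
        train_step(actor, actor_old, critic, opt_actor, opt_critic)
    stop.set()
```

Note that this sketch does not synchronize the workers' policy reads with the trainer's weight updates; a full implementation would guard the shared networks (e.g., with a lock or periodic parameter copies).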
4. Experiments
4.1. Experiment Settings
- A benchmark of learning methods that use a deep RL agent in laser-based navigation. The scene was a typical office area. The start and target poses were designed to represent 4 typical situations that a robot may encounter during the navigation task. Each situation was randomly initialized and handled by a dedicated worker. For all methods, the learning rate for the critic network was and that for the actor network was . The hyperparameters for the reward functions are and . The model was trained with the Adam optimizer on a single Nvidia GeForce GTX 1060 GPU for 1000 episodes. We repeated the training 5 times for each method with different random seeds to show the stability and repeatability of the training process (a sketch of this protocol is given after this list).
- A decomposition of the decision process in harder tasks. We initialized a new Asynchronous PPO (APPO) model with the parameters trained in the first experiment and continued learning on a harder version of the tasks. The start poses were deliberately chosen (e.g., randomized with positions close to the corridor and angles opposed to the target) to increase the complexity of the task. In the test stage, each task was run 100 times.
- An evaluation of the model in unseen environments. In this experiment, we modified the scene with some unseen obstacles and tested the success rate (over 100 repeated runs) of the model learned in experiment 2.
- An evaluation of the model in a new scene. This experiment initialized a new model with the parameters learned in experiment 2 and continued to improve it in scene 2 (a warehouse) with 4 designated tasks, which are plotted in Figure 2.
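A sketch of the training/evaluation protocol of the first experiment is given below: every method is trained for 1000 episodes and the whole run is repeated with 5 random seeds, reporting the moving-averaged total reward plotted in Figure 5. The make_agent()/make_env() factories and the agent.train_episode() method are hypothetical placeholders, and the learning-rate and reward hyperparameters are deliberately omitted because the paper specifies its own values.

```python
# Sketch of the repeated-seed evaluation protocol (hypothetical agent/env factories).
import random

import numpy as np
import torch

EPISODES = 1000            # training episodes per run, as in experiment 1
SEEDS = [0, 1, 2, 3, 4]    # 5 repetitions per method with different random seeds

def moving_average(rewards, window=20):
    """Moving-averaged total reward, as plotted on the vertical axis of Figure 5."""
    return np.convolve(rewards, np.ones(window) / window, mode="valid")

def run_method(make_agent, make_env):
    """Train one method over all seeds; return the mean and std of its learning curves."""
    curves = []
    for seed in SEEDS:
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        agent, env = make_agent(), make_env()
        rewards = [agent.train_episode(env) for _ in range(EPISODES)]
        curves.append(moving_average(rewards))
    curves = np.stack(curves)
    return curves.mean(axis=0), curves.std(axis=0)
```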
4.2. Results
4.2.1. The Baseline Algorithm
4.2.2. Evaluation with the Baseline Algorithm
4.2.3. Evaluation with Hard Mode
4.2.4. Evaluation with Unseen Obstacles
4.2.5. Evaluation with the New Scene
4.3. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
Appendix A
Author | Contribution | Limitation
---|---|---
Zhu et al. [6] | Navigation of mobile robots based on RGB-camera sensors | Does not consider the dynamics of real-world environments
Wang et al. [28] | Proposes a planned-ahead hybrid reinforcement learning model to bridge the gap between synthetic studies and real-world practices | Incapable of accomplishing navigation for complex robot dynamics and environments
Zhou et al. [29] | Presents a goal-directed robot navigation system that integrates global planning based on goal-directed end-to-end learning and local planning based on reinforcement learning | Learning algorithms for continuous action spaces are not considered
Misra et al. [30] | Uses reinforcement learning in a contextual bandit setting to train a neural network agent | Does not consider the dynamics of real-world environments
Zeng et al. [31] | Proposes an improved A3C (IA3C) algorithm to learn the control policies of the robots’ local motion | Complex behaviors of moving obstacles are not considered
Chen et al. [32] | Proposes a map-based deep reinforcement learning approach for multi-robot collision avoidance in a distributed and communication-free environment | More dynamic and crowded environments are not considered
Doukhi and Lee [33] | Presents a novel approach for enabling a micro aerial vehicle system equipped with a laser range finder to autonomously navigate among obstacles and achieve a user-specified goal location in a GPS-denied environment, without the need for mapping or path planning | Dynamic obstacle avoidance in 3D space is not considered
Han and Kim [34] | Proposes a lane detection algorithm using a laser range finder for the autonomous navigation of a mobile robot | Different environmental conditions are not considered
Elfakharany and Ismail [35] | Presents a novel deep reinforcement learning based method that performs multi-robot task allocation and navigation in an end-to-end fashion, without the need to construct a map of the environment | More complex environments are not considered
References
- Ramirez, G.; Zeghloul, S. A new local path planner for nonholonomic mobile robot navigation in cluttered environments. In Proceedings of the 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation, Symposia Proceedings (Cat. No. 00CH37065), San Francisco, CA, USA, 24–28 April 2000; Volume 3, pp. 2058–2063.
- Oleynikova, H.; Honegger, D.; Pollefeys, M. Reactive avoidance using embedded stereo vision for MAV flight. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 50–56.
- Sanchez-Lopez, J.L.; Wang, M.; Olivares-Mendez, M.A.; Molina, M.; Voos, H. A real-time 3D path planning solution for collision-free navigation of multirotor aerial robots in dynamic environments. J. Intell. Robot. Syst. 2019, 93, 33–53.
- Levine, S.; Finn, C.; Darrell, T.; Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 2016, 17, 1–39.
- Tai, L.; Paolo, G.; Liu, M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 31–36.
- Zhu, Y.; Mottaghi, R.; Kolve, E.; Lim, J.J.; Gupta, A.; Fei-Fei, L.; Farhadi, A. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Sands Expo and Convention Centre, Marina Bay Sands, Singapore, 29 May–3 June 2017; pp. 3357–3364.
- Wang, F.; Zhou, B.; Chen, K.; Fan, T.; Zhang, X.; Li, J.; Tian, H.; Pan, J. Intervention aided reinforcement learning for safe and practical policy optimization in navigation. In Proceedings of the 2nd Annual Conference on Robot Learning, CoRL 2018, Zurich, Switzerland, 29–31 October 2018; pp. 410–421.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. In Proceedings of the 4th International Conference on Learning Representations, Conference Track Proceedings, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.A.; Fidjeland, A.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of go with deep neural networks and tree search. Nature 2016, 529, 484–489.
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Rob. Res. 2013, 32, 1238–1274.
- Fujimoto, S.; van Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmassan, Stockholm, Sweden, 10–15 July 2018; pp. 1582–1591. Available online: http://proceedings.mlr.press/v80/fujimoto18a.html (accessed on 18 July 2021).
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.I.; Moritz, P. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015; pp. 1889–1897.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; pp. 1928–1937.
- Gu, S.; Holly, E.; Lillicrap, T.P.; Levine, S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation, ICRA 2017, Singapore, 29 May–3 June 2017; pp. 3389–3396.
- Irpan, A. Deep Reinforcement Learning Doesn’t Work Yet. 2018. Available online: https://www.alexirpan.com/2018/02/14/rl-hard.html (accessed on 18 October 2021).
- Deisenroth, M.; Rasmussen, C.E. PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, WA, USA, 28 June–2 July 2011; pp. 465–472.
- Malavazi, F.B.; Guyonneau, R.; Fasquel, J.-B.; Lagrange, S.; Mercier, F. Lidar-only based navigation algorithm for an autonomous agricultural robot. Comput. Electron. Agric. 2018, 154, 71–79.
- Sampedro, C.; Bavle, H.; Rodriguez-Ramos, A.; de la Puente, P.; Campoy, P. Laser-based reactive navigation for multirotor aerial robots using deep reinforcement learning. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1024–1031.
- Qin, H.; Bi, Y.; Feng, L.; Zhang, Y.; Chen, B.M. A 3D rotating laser-based navigation solution for micro aerial vehicles in dynamic environments. Unmanned Syst. 2018, 6, 297–305.
- Perez-Higueras, N.; Ramon-Vigo, R.; Caballero, F.; Merino, L. Robot local navigation with learned social cost functions. In Proceedings of the 2014 11th International Conference on Informatics in Control, Automation and Robotics (ICINCO), Vienna, Austria, 1–3 September 2014; Volume 2, pp. 618–625.
- Jeni, L.A.; Istenes, Z.; Szemes, P.; Hashimoto, H. Robot navigation framework based on reinforcement learning for intelligent space. In Proceedings of the 2008 Conference on Human System Interactions, Krakow, Poland, 25–27 May 2008; pp. 761–766.
- Macek, K.; Petrović, I.; Perić, N. A reinforcement learning approach to obstacle avoidance of mobile robots. In Proceedings of the 7th International Workshop on Advanced Motion Control. Proceedings (Cat. No. 02TH8623), Maribor, Slovenia, 3–5 July 2002; pp. 462–466.
- Kim, B.; Pineau, J. Socially adaptive path planning in human environments using inverse reinforcement learning. Int. J. Soc. Rob. 2016, 8, 51–66.
- Gil, O.; Sanfeliu, A. Effects of a social force model reward in robot navigation based on deep reinforcement learning. In Proceedings of the Robot 2019: Fourth Iberian Robotics Conference, Porto, Portugal, 20–22 November 2019; pp. 213–224.
- Gao, W.; Hsu, D.; Lee, W.S.; Shen, S.; Subramanian, K. Intention-net: Integrating planning and deep learning for goal-directed autonomous navigation. In Proceedings of the 1st Annual Conference on Robot Learning, CoRL 2017, Mountain View, CA, USA, 13–15 November 2017; pp. 185–194.
- Wang, X.; Xiong, W.; Wang, H.; Wang, W.Y. Look before you leap: Bridging model-free and model-based reinforcement learning for planned-ahead vision-and-language navigation. In Proceedings of the ECCV 2018 European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 38–55.
- Zhou, X.; Gao, Y.; Guan, L. Towards goal-directed navigation through combining learning based global and local planners. Sensors 2019, 19, 176.
- Misra, D.; Langford, J.; Artzi, Y. Mapping instructions and visual observations to actions with reinforcement learning. arXiv 2017, arXiv:1704.08795.
- Zeng, J.; Qin, L.; Hu, Y.; Yin, Q.; Hu, C. Integrating a path planner and an adaptive motion controller for navigation in dynamic environments. Appl. Sci. 2019, 9, 1384.
- Chen, G.; Yao, S.; Ma, J.; Pan, L. Distributed non-communicating multi-robot collision avoidance via map-based deep reinforcement learning. Sensors 2020, 20, 4836.
- Doukhi, O.; Lee, D.J. Deep reinforcement learning for end-to-end local motion planning of autonomous aerial robots in unknown outdoor environments: Real-time flight experiments. Sensors 2021, 21, 2534.
- Han, J.H.; Kim, H.W. Lane detection algorithm using LRF for autonomous navigation of mobile robot. Appl. Sci. 2021, 11, 6229.
- Elfakharany, A.; Ismail, Z.H. End-to-end deep reinforcement learning for decentralized task allocation and navigation for a multi-robot system. Appl. Sci. 2021, 11, 2895.
Method | Task 1 | Task 2 | Task 3 | Task 4
---|---|---|---|---
End-to-End | 62% | 65% | 20% | 60%
ADDPG | 90% | 93% | 89% | 95%
APPO | 95% | 98% | 93% | 98%

Task 1 | Task 2 | Task 3 | Task 4
---|---|---|---
98% | 99% | 98% | 99%

Task 1 | Task 2 | Task 3 | Task 4
---|---|---|---
93% | 94% | 92% | 89%

Parameter | Task 1 | Task 2 | Task 3 | Task 4
---|---|---|---|---
 | 97% | 95% | 96% | 95%
 | 98% | 96% | 97% | 96%