An Advisor-Based Architecture for a Sample-Efficient Training of Autonomous Navigation Agents with Reinforcement Learning
Figure 1. Actor–critic model-based architecture with an advisor. The lowercase symbols $s$, $a$, $a_{adv}$, $r$, and $s'$ represent a single sample of the state, the agent's action, the advisor's action, the reward, and the next state, respectively, for a given time step. The uppercase symbols $S$, $A$, $A_{adv}$, $R$, and $S'$ represent batches of the corresponding lowercase quantities selected randomly from the memory replay buffer.
Figure 2. V-rep simulation environments for training and validation. The training environment is used to train each agent, and the validation environment is used to test whether the trained agents generalize to unseen environments. The size of each arena is 5 m × 5 m.
Figure 3. Physical setup. (a) Modified navigation platform; (b) physical arrangement of the training arena. The complete arena is approximately 16 feet × 10 feet.
Figure 4. Neural network architectures. The number labeled under each layer indicates the number of nodes in the corresponding layer.
Figure 5. Illustration of the advisor developed for the target-reaching with obstacle-avoidance task. (a) Input parameters of the advisor. Red, green, and blue arrows represent the proximity sensor readings $P_t$, the velocity vector $v_t$, and the displacement to the target $d_t$, respectively. (b) Decision-making strategy of the advisor. The advisor takes the proximity sensor readings $P_t$, the target distance $d_t$, and the target angle $\delta_t$ as inputs and generates the corresponding linear velocity $V_{adv}$ and angular velocity $\omega_{adv}$ control signals as outputs. The activator component chooses which navigation strategy to activate by comparing the minimum proximity reading $p_t$ against the obstacle-avoidance threshold $T_o$.
Figure 6. Performance variation of the trained agents against the advisor's terminal episode number $N_T$. In this experiment, the agent engages the advisor only in the data collection process.
Figure 7. Performance variation of the trained agent against the total number of training episodes for different advisor contributions.
Figure 8. Learning curves of each agent type. Solid lines represent the moving average (window size of 150 episodes) of the reward per step gained by each agent type across all six trials, and the shaded area represents the standard deviation.
Figure 9. Moving average and standard deviation of the collision velocity with a window size of 10 episodes. The collision speed is calculated as the average speed over the last three steps of each episode that ends with a collision. Solid lines in each plot show the variation of the averaged collision velocity, and dotted lines indicate the standard deviation of the six trials corresponding to each agent. (a) ADDPG + advisor for both data collection and policy updating; (b) DDPG without advisor; (c) ADDPG without advisor; and (d) MCP + model-free.
Figure 10. Actor's firing rate. Green and magenta lines show the moving average (window size of 30) of the actor's firing rate over six trials for the agent types ADDPG + AfD and ADDPG + AfDP. The shaded area represents the standard deviation. The red line shows the expected maximum firing rate derived from Equation (1).
Figure 11. Success rate variation of trained agents with the number of training episodes.
Figure A1. Paths followed by the trained ATD3 + AfDP and DDPG agents in six trials. The blue line represents the path followed by the ATD3 + AfDP agent, and the magenta line shows the DDPG agent's path.
Abstract
1. Introduction
- Investigate the use of prior knowledge as an advisor to reduce the sample complexity of the DDPG and TD3 algorithms for autonomous navigation tasks in the continuous domain.
- Devise an appropriate procedure for integrating the advisor role, making the training of an autonomous navigation agent safer and more efficient.
- Implement the proposed method with a physical experimental setup and analyze the performance on real navigation tasks.
2. Preliminaries
3. Related Work
4. Incorporating an Advisor
4.1. Actor–Critic Model-Based Architecture
4.2. Data Collection Process
Algorithm 1: Data collection with an advisor.
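As a hedged illustration of how an advisor can be interleaved with the actor during data collection, the Python sketch below assumes the advisor is selected with a probability that starts at the starting advisor selection probability, decays at the selection probability rate $b$ (Appendix B), and is switched off after the advisor's terminal episode $N_T$ (Figure 6). The exponential decay form, the environment interface (`env.reset`, `env.step`), and the default values are assumptions for illustration, not the authors' implementation.

```python
import random
import numpy as np

def collect_episode(env, actor, advisor, replay_buffer, episode,
                    a0=0.9, b=0.01, n_terminal=400, max_steps=300,
                    noise_std=0.1):
    """One episode of data collection with an advisor (illustrative sketch).

    With probability p_advisor the advisor's action drives the robot;
    otherwise the actor's exploration-noised action is used. The default
    values of a0, b, n_terminal, and noise_std are placeholders.
    """
    # Assumed decay schedule: advisor involvement shrinks with the episode
    # index and is switched off after the advisor's terminal episode N_T.
    p_advisor = a0 * np.exp(-b * episode) if episode < n_terminal else 0.0

    s = env.reset()
    for _ in range(max_steps):
        a_actor = actor(s) + np.random.normal(0.0, noise_std, size=2)  # (v, w)
        a_advisor = advisor(s)                   # advisor's suggestion for s
        a = a_advisor if random.random() < p_advisor else a_actor

        s_next, r, done = env.step(a)
        # Store both the executed action and the advisor's suggestion.
        replay_buffer.append((s, a, a_advisor, r, s_next, done))
        s = s_next
        if done:
            break
```

Storing the advisor's suggestion alongside the executed action matches the tuple $(s, a, a_{adv}, r, s')$ shown in Figure 1, so the policy update can later weigh the two against each other.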
4.3. Policy Updating Process
Algorithm 2: Policy updating with an advisor.
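Similarly, a hedged sketch of how the stored advisor actions could enter the policy update: the DDPG-style code below adds a Q-filtered imitation term that pulls the actor towards the advisor's action only where the critic rates that action higher. The filter, the loss weighting, and the network/optimizer interfaces are assumptions for illustration, not the paper's exact update rule.

```python
import torch
import torch.nn.functional as F

def update_policy(actor, critic, target_actor, target_critic,
                  actor_opt, critic_opt, batch,
                  gamma=0.99, tau=0.005, bc_weight=1.0):
    """One DDPG-style update consulting the stored advisor actions.

    batch holds the tensors S, A, A_adv, R, S2, done sampled from the replay
    buffer (the uppercase symbols of Figure 1). The Q-filtered imitation term
    is an assumed way of letting the advisor shape the actor.
    """
    S, A, A_adv, R, S2, done = batch

    # Critic update with the usual bootstrapped TD target.
    with torch.no_grad():
        y = R + gamma * (1.0 - done) * target_critic(S2, target_actor(S2))
    critic_loss = F.mse_loss(critic(S, A), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: deterministic policy gradient plus an imitation term that
    # pulls the actor towards the advisor only where the critic rates the
    # advisor's action higher (an assumed Q-filter).
    a_pred = actor(S)
    actor_loss = -critic(S, a_pred).mean()
    with torch.no_grad():
        filter_mask = (critic(S, A_adv) > critic(S, a_pred)).float()
    bc_loss = (filter_mask * (a_pred - A_adv).pow(2).sum(dim=1, keepdim=True)).mean()
    actor_opt.zero_grad()
    (actor_loss + bc_weight * bc_loss).backward()
    actor_opt.step()

    # Soft (Polyak) update of the target networks.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)
```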
5. Experiment Setup
5.1. Simulation Setup
5.2. Physical Setup
5.3. Navigation Agent
5.4. Reward Function
5.5. Advisor
6. Experiments and Results
- Examining the advisor's influence on training efficiency when the advisor is used for data collection, and investigating the optimal way to employ the advisor in the data collection process.
- Evaluating the performance of the advisor-based architecture and comparing it against other state-of-the-art methods.
- Assessing the applicability of the proposed advisor-based architecture to navigation tasks performed with real-world platforms while ensuring safe and efficient training.
6.1. Advisor’s Influence on Data Collection Process
6.2. Performance of Advisor-Based Architecture
6.3. Performance with the Physical Platform
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| RL | Reinforcement learning |
| DQN | Deep Q-network |
| DDPG | Deep deterministic policy gradient |
| TD3 | Twin-delayed DDPG |
| SLAM | Simultaneous localisation and mapping |
| IntRL | Interactive reinforcement learning |
| IL | Imitation learning |
| BADGR | Berkeley autonomous driving ground robot |
| V-rep | Virtual robot experimentation platform |
| API | Application programming interface |
| RGB | Red green blue |
| PC | Personal computer |
| GPU | Graphics processing unit |
| CPU | Central processing unit |
| RAM | Random access memory |
| FCL | Fully connected layers |
| ADDPG | Adapted deep deterministic policy gradient |
| ATD3 | Adapted twin-delayed DDPG |
| LfD | Learning from demonstration |
| MCP | Model predictive control |
| AfD | Advisor for data collection |
| AfP | Advisor for policy updating |
| AfDP | Advisor for both data collection and policy updating |
Appendix A. Implementation of the Advisor
| Parameter | Value |
|---|---|
| No-detect distance | 0.6 m |
| Maximum detect distance | 0.2 m |
| Left Braitenberg array | |
| Right Braitenberg array | |
| Maximum linear velocity | 1.0 m s⁻¹ |
Algorithm A1: Advisor for goal reaching with obstacle avoidance.
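As a sketch of the decision-making strategy in Figure 5, the function below switches between a Braitenberg-style obstacle-avoidance rule and a simple go-to-goal rule by comparing the minimum proximity reading against the obstacle threshold $T_o$. The weight arrays, gains, threshold value, and the assumption of eight forward-facing proximity sensors are placeholders; only the two detection distances and the maximum linear velocity correspond to entries in the table above.

```python
import numpy as np

# Distances and speed limit taken from the parameter table above; the weight
# arrays, obstacle threshold, and gains below are illustrative placeholders.
NO_DETECT_DIST = 0.6      # m, readings beyond this are ignored
MAX_DETECT_DIST = 0.2     # m, readings are clipped below this
BRAITENBERG_L = np.array([-0.2, -0.4, -0.6, -0.8, -1.0, -1.2, -1.4, -1.6])
BRAITENBERG_R = BRAITENBERG_L[::-1]
V_MAX = 1.0               # m/s, maximum linear velocity
T_OBSTACLE = 0.45         # m, threshold T_o for switching strategies (assumed)
K_V, K_W = 0.5, 1.0       # assumed gains for the goal-reaching controller

def advisor(proximity, d_target, delta_target):
    """Return (v_adv, w_adv) from eight proximity readings, target distance,
    and target angle (sensor count is an assumption of this sketch)."""
    # Normalised detection strength per sensor in [0, 1].
    p = np.clip(proximity, MAX_DETECT_DIST, NO_DETECT_DIST)
    strength = 1.0 - (p - MAX_DETECT_DIST) / (NO_DETECT_DIST - MAX_DETECT_DIST)

    if proximity.min() < T_OBSTACLE:
        # Braitenberg-style avoidance: detections modulate the wheel speeds
        # through the weight arrays, steering the robot away from obstacles.
        v_left = V_MAX + BRAITENBERG_L @ strength
        v_right = V_MAX + BRAITENBERG_R @ strength
        v_adv = 0.5 * (v_left + v_right)
        w_adv = v_right - v_left          # proportional to the wheel difference
    else:
        # Goal reaching: drive towards the target and turn towards its bearing.
        v_adv = min(K_V * d_target, V_MAX)
        w_adv = K_W * delta_target
    return v_adv, w_adv
```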
Appendix B. Parameter Values Used in the Implementation of the Advisor-Based Navigation Agent
| Hyper-Parameter | Simulation Setup | Physical Setup |
|---|---|---|
| Memory replay buffer size | 100,000 | 50,000 |
| Batch size (n) | 1000 | 500 |
| Discount rate | | |
| Soft updating rate | | |
| Policy updating rate | | |
| Actor learning rate | | |
| Critic learning rate | | |
| Starting advisor selection probability | | |
| Selection probability rate (b) | | |
| Maximum number of steps per episode | 300 | 150 |
| Parameter | Value |
|---|---|
| Threshold of target distance for goal reaching | m |
| Threshold of the minimum proximity reading for collision | m |
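These two thresholds determine how an episode ends (goal reached or collision). A generic shaped reward consistent with such termination conditions might look like the sketch below; the reward magnitudes, the dense progress term, and the numerical threshold values are assumptions, not the paper's actual reward function (Section 5.4).

```python
def reward(d_target, min_proximity, d_target_prev,
           goal_threshold=0.3, collision_threshold=0.12):
    """Generic shaped reward for goal reaching with obstacle avoidance.

    Only the idea of thresholding the target distance and the minimum
    proximity reading comes from the table above; the magnitudes, shaping
    term, and default threshold values are placeholders.
    Returns (reward, episode_done).
    """
    if d_target < goal_threshold:             # goal reached
        return 10.0, True
    if min_proximity < collision_threshold:   # collision
        return -10.0, True
    # Dense shaping: reward the progress made towards the target this step.
    return d_target_prev - d_target, False
```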
| Hyper-Parameter | Simulation Setup | Physical Setup |
|---|---|---|
| | 0 | 0 |
| c | | |
| d | 4 | 3 |
Appendix C. Adapting the TD3 Algorithm to Incorporate an Advisor
Algorithm A2: Data collection with an advisor.
Algorithm A3: Policy updating with an advisor.
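For the TD3 variant, a hedged sketch of one adapted update step: the standard TD3 machinery (clipped double-Q targets with smoothing noise clipped at $c$, and actor/target updates delayed by $d$ steps, with $c$ and $d$ listed in Appendix B) combined with the same assumed Q-filtered imitation term as above. The noise scale `sigma` is TD3's usual smoothing parameter, and the whole function is an illustration rather than the authors' Algorithms A2 and A3.

```python
import torch
import torch.nn.functional as F

def td3_update(step, actor, critic1, critic2, t_actor, t_critic1, t_critic2,
               actor_opt, critic_opt, batch, gamma=0.99, tau=0.005,
               sigma=0.2, c=0.5, d=4, a_max=1.0, bc_weight=1.0):
    """One adapted-TD3 update (illustrative). batch = (S, A, A_adv, R, S2, done)."""
    S, A, A_adv, R, S2, done = batch

    with torch.no_grad():
        noise = (torch.randn_like(A) * sigma).clamp(-c, c)        # target policy smoothing
        a2 = (t_actor(S2) + noise).clamp(-a_max, a_max)
        q_next = torch.min(t_critic1(S2, a2), t_critic2(S2, a2))  # clipped double-Q
        y = R + gamma * (1.0 - done) * q_next

    critic_loss = F.mse_loss(critic1(S, A), y) + F.mse_loss(critic2(S, A), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    if step % d == 0:                                             # delayed policy update
        a_pred = actor(S)
        actor_loss = -critic1(S, a_pred).mean()
        with torch.no_grad():                                     # advisor Q-filter (assumed)
            filter_mask = (critic1(S, A_adv) > critic1(S, a_pred)).float()
        bc_loss = (filter_mask * (a_pred - A_adv).pow(2).sum(dim=1, keepdim=True)).mean()
        actor_opt.zero_grad()
        (actor_loss + bc_weight * bc_loss).backward()
        actor_opt.step()

        # Soft (Polyak) update of all target networks.
        for net, tgt in ((actor, t_actor), (critic1, t_critic1), (critic2, t_critic2)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1.0 - tau).add_(tau * p.data)
```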
Appendix D. Actor’s Expected Maximum Firing Rate
Appendix E. Paths Followed by the Trained Agents with and without the Advisor
| Environment | DDPG | ADDPG | ADDPG + AfD | ADDPG + AfP | ADDPG + AfDP | ATD3 + AfDP | LfD | MCP + Model-Free |
|---|---|---|---|---|---|---|---|---|
| Training Environment | 72.2% | 78.3% | 81.3% | 84.5% | 88.1% | 90.8% | 80.2% | 82.3% |
| Validation Environment | 71.3% | 77.1% | 79.3% | 83.0% | 85.2% | 89.4% | 80.7% | 80.1% |
| Agent Type | Total Number of Episodes |
|---|---|
| DDPG | |
| ADDPG | |
| ADDPG + AfDP | |
| ATD3 + AfDP | |
| LfD | |
| MCP + Model-free | |
| Agent Type | Collision Velocity (m s⁻¹) |
|---|---|
| Advisor | |
| DDPG | |
| ADDPG | |
| ADDPG + AfDP | |
| ATD3 + AfDP | ± |
| LfD | |
| MCP + model-free | |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).