Diversity-Based Trajectory and Goal Selection with Hindsight Experience Replay

  • Conference paper
  • In: PRICAI 2021: Trends in Artificial Intelligence (PRICAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13033)

Abstract

Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent’s experiences contribute equally to training, and so naive uniform sampling may lead to inefficient learning. In this paper, we propose diversity-based trajectory and goal selection with HER (DTGSH). Firstly, trajectories are sampled according to the diversity of the goal states as modelled by determinantal point processes (DPPs). Secondly, transitions with diverse goal states are selected from the trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic manipulation tasks in simulated robot environments, where we show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
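
As a rough illustration of the approach summarised above, the sketch below (Python with NumPy, written for this page rather than taken from the authors' implementation) scores each trajectory by the determinant of an RBF kernel over its goal states, samples trajectories in proportion to that score, and then greedily picks a diverse subset of transitions. The kernel choice, all function names, and the greedy stand-in for exact k-DPP sampling are illustrative assumptions, not details from the paper.

```python
import numpy as np

def rbf_kernel(goals, sigma=1.0):
    # Squared-exponential similarity kernel over one trajectory's goal states.
    sq_dists = np.sum((goals[:, None, :] - goals[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def trajectory_diversity(goals, eps=1e-6):
    # DPP-style diversity score: determinant of the goal-state kernel.
    # More spread-out goal states give a larger determinant.
    L = rbf_kernel(goals)
    return np.linalg.det(L + eps * np.eye(len(goals)))

def sample_trajectories(episode_goals, n, rng):
    # Sample n episode indices with probability proportional to diversity score.
    scores = np.array([trajectory_diversity(g) for g in episode_goals])
    probs = scores / scores.sum()
    return rng.choice(len(episode_goals), size=n, p=probs)

def select_diverse_transitions(goals, k):
    # Greedy determinant maximisation as a simple stand-in for exact k-DPP
    # sampling, which would instead use an eigendecomposition of the kernel.
    L = rbf_kernel(goals)
    k = min(k, len(goals))
    selected = [int(np.argmax(np.diag(L)))]
    while len(selected) < k:
        gains = [np.linalg.det(L[np.ix_(selected + [i], selected + [i])])
                 if i not in selected else -np.inf
                 for i in range(len(goals))]
        selected.append(int(np.argmax(gains)))
    return selected

# Example: 10 episodes, each with 50 three-dimensional goal states.
rng = np.random.default_rng(0)
episode_goals = [rng.normal(size=(50, 3)) for _ in range(10)]
episode_ids = sample_trajectories(episode_goals, n=4, rng=rng)
transition_ids = select_diverse_transitions(episode_goals[episode_ids[0]], k=8)
```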

Notes

  1. https://github.com/openai/baselines.

Acknowledgements

This work was supported by JST, Moonshot R&D Grant Number JPMJMS2012.

Author information

Correspondence to Tianhong Dai.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Dai, T., Liu, H., Arulkumaran, K., Ren, G., Bharath, A.A. (2021). Diversity-Based Trajectory and Goal Selection with Hindsight Experience Replay. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science, vol. 13033. Springer, Cham. https://doi.org/10.1007/978-3-030-89370-5_3

  • DOI: https://doi.org/10.1007/978-3-030-89370-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89369-9

  • Online ISBN: 978-3-030-89370-5

  • eBook Packages: Computer Science (R0)
