Intrinsically motivated model learning for developing curious robots

Published: 01 June 2017

Abstract

Reinforcement Learning (RL) agents are typically deployed to learn a specific, concrete task based on a pre-defined reward function. However, in some cases an agent may be able to gain experience in the domain prior to being given a task. In such cases, intrinsic motivation can be used to enable the agent to learn a useful model of the environment that is likely to help it learn its eventual tasks more efficiently. This paradigm fits robots particularly well, as they need to learn about their own dynamics and affordances, which can be applied to many different tasks. This article presents the TEXPLORE with Variance-And-Novelty-Intrinsic-Rewards algorithm (TEXPLORE-VANIR), an intrinsically motivated model-based RL algorithm. The algorithm learns models of the transition dynamics of a domain using random forests. It calculates two different intrinsic motivations from this model: one to explore where the model is uncertain, and one to acquire novel experiences that the model has not yet been trained on. This article presents experiments demonstrating that the combination of these two intrinsic rewards enables the algorithm to learn an accurate model of a domain with no external rewards, and that the learned model can be used afterward to perform tasks in the domain. While learning the model, the agent explores the domain in a developing and curious way, progressively learning more complex skills. In addition, the experiments show that combining the agent's intrinsic rewards with external task rewards enables the agent to learn faster than using external rewards alone. We also present results demonstrating the applicability of this approach to learning on robots.
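The two intrinsic rewards described in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the article's implementation: it assumes a random-forest transition model (here scikit-learn's RandomForestRegressor), and the class name, the weight constants, and the L1-distance novelty measure are illustrative choices.

```python
# Minimal sketch (not the article's code): a forest-based transition model that
# yields a variance bonus (model uncertainty) and a novelty bonus (distance to
# previously seen experiences). All names and constants are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

VARIANCE_WEIGHT = 1.0  # assumed coefficients; the article tunes analogous weights
NOVELTY_WEIGHT = 1.0


class IntrinsicRewardModel:
    def __init__(self, n_trees=10):
        self.forest = RandomForestRegressor(n_estimators=n_trees)
        self.seen_inputs = None  # (state, action) inputs the model was trained on

    def fit(self, states, actions, next_states):
        """Train the forest to predict next states from (state, action) pairs."""
        X = np.hstack([states, actions])
        self.forest.fit(X, next_states)
        self.seen_inputs = X

    def intrinsic_reward(self, state, action):
        x = np.hstack([state, action]).reshape(1, -1)
        # Variance bonus: disagreement among the individual trees' predictions,
        # rewarding exploration where the model is uncertain.
        per_tree = np.stack([t.predict(x)[0] for t in self.forest.estimators_])
        variance_bonus = per_tree.var(axis=0).sum()
        # Novelty bonus: L1 distance to the nearest input the model has seen,
        # rewarding experiences the model has not yet been trained on.
        novelty_bonus = np.abs(self.seen_inputs - x).sum(axis=1).min()
        return VARIANCE_WEIGHT * variance_bonus + NOVELTY_WEIGHT * novelty_bonus
```

During task learning, a bonus of this kind could simply be added to the external reward inside the planner, which is one way to realize the combination of intrinsic and external rewards mentioned above.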

    Published In

    Artificial Intelligence, Volume 247, Issue C
    June 2017
    176 pages

    Publisher

    Elsevier Science Publishers Ltd.

    United Kingdom

    Author Tags

    1. Developmental learning
    2. Exploration
    3. Intrinsic motivation
    4. Reinforcement learning
    5. Robots

    Qualifiers

    • Research-article
