Intrinsically motivated model learning for developing curious robots

Published: 01 June 2017

Abstract

Reinforcement Learning (RL) agents are typically deployed to learn a specific, concrete task based on a pre-defined reward function. However, in some cases an agent may be able to gain experience in the domain prior to being given a task. In such cases, intrinsic motivation can be used to enable the agent to learn a useful model of the environment that is likely to help it learn its eventual tasks more efficiently. This paradigm fits robots particularly well, as they need to learn about their own dynamics and affordances, which can be applied to many different tasks. This article presents the TEXPLORE with Variance-And-Novelty-Intrinsic-Rewards algorithm (TEXPLORE-VANIR), an intrinsically motivated model-based RL algorithm. The algorithm learns models of the transition dynamics of a domain using random forests. It calculates two different intrinsic motivations from this model: one to explore where the model is uncertain, and one to acquire novel experiences that the model has not yet been trained on. This article presents experiments demonstrating that the combination of these two intrinsic rewards enables the algorithm to learn an accurate model of a domain with no external rewards, and that the learned model can be used afterward to perform tasks in the domain. While learning the model, the agent explores the domain in a developing and curious way, progressively learning more complex skills. In addition, the experiments show that combining the agent's intrinsic rewards with external task rewards enables the agent to learn faster than using external rewards alone. We also present results demonstrating the applicability of this approach to learning on robots.
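The two intrinsic rewards described in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the article's implementation: it assumes a random-forest transition model (here scikit-learn's RandomForestRegressor), and the class name, the weight constants, and the L1-distance novelty measure are illustrative choices.

```python
# Minimal sketch (not the article's code): a forest-based transition model that
# yields a variance bonus (model uncertainty) and a novelty bonus (distance to
# previously seen experiences). All names and constants are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

VARIANCE_WEIGHT = 1.0  # assumed coefficients; the article tunes analogous weights
NOVELTY_WEIGHT = 1.0


class IntrinsicRewardModel:
    def __init__(self, n_trees=10):
        self.forest = RandomForestRegressor(n_estimators=n_trees)
        self.seen_inputs = None  # (state, action) inputs the model was trained on

    def fit(self, states, actions, next_states):
        """Train the forest to predict next states from (state, action) pairs."""
        X = np.hstack([states, actions])
        self.forest.fit(X, next_states)
        self.seen_inputs = X

    def intrinsic_reward(self, state, action):
        x = np.hstack([state, action]).reshape(1, -1)
        # Variance bonus: disagreement among the individual trees' predictions,
        # rewarding exploration where the model is uncertain.
        per_tree = np.stack([t.predict(x)[0] for t in self.forest.estimators_])
        variance_bonus = per_tree.var(axis=0).sum()
        # Novelty bonus: L1 distance to the nearest input the model has seen,
        # rewarding experiences the model has not yet been trained on.
        novelty_bonus = np.abs(self.seen_inputs - x).sum(axis=1).min()
        return VARIANCE_WEIGHT * variance_bonus + NOVELTY_WEIGHT * novelty_bonus
```

During task learning, a bonus of this kind could simply be added to the external reward inside the planner, which is one way to realize the combination of intrinsic and external rewards mentioned above.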

    Published In

    Artificial Intelligence, Volume 247, Issue C
    June 2017
    176 pages

    Publisher

    Elsevier Science Publishers Ltd.

    United Kingdom

    Author Tags

    1. Developmental learning
    2. Exploration
    3. Intrinsic motivation
    4. Reinforcement learning
    5. Robots

    Qualifiers

    • Research-article
