
Neural Episodic Control

Published: 06 August 2017

Abstract

Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are, however, grossly inefficient, often taking orders of magnitude more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general-purpose deep reinforcement learning agents.
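The "semi-tabular" value function described in the abstract pairs, for each action, slowly learned state embeddings (the keys of a memory buffer) with Q-value estimates that are updated much faster than ordinary gradient descent would allow (the values); acting reads the buffer with a kernel-weighted nearest-neighbour lookup. The NumPy sketch below illustrates that read/write pattern only, under stated assumptions: the class name, hyperparameter values, exact-match test, and oldest-first eviction are illustrative choices, not the paper's implementation, which among other things uses approximate nearest-neighbour search over much larger buffers and learns the embeddings end-to-end.

```python
import numpy as np


class EpisodicMemory:
    """Per-action buffer: keys are state embeddings from a slowly changing
    encoder, values are rapidly updated Q-value estimates (hypothetical
    class, not the paper's exact data structure)."""

    def __init__(self, capacity=100000, num_neighbors=11, alpha=0.5, delta=1e-3):
        self.capacity = capacity             # maximum number of stored entries
        self.num_neighbors = num_neighbors   # neighbours used per lookup
        self.alpha = alpha                   # fast learning rate for value updates
        self.delta = delta                   # smoothing constant in the kernel
        self.keys = None                     # (n, key_dim) array of embeddings
        self.values = np.zeros(0, dtype=np.float32)

    def lookup(self, query):
        """Estimate Q(s, a) as a kernel-weighted average of the values stored
        under the nearest keys (inverse-distance kernel)."""
        if self.values.size == 0:
            return 0.0
        dists = np.sum((self.keys - query) ** 2, axis=1)
        k = min(self.num_neighbors, self.values.size)
        idx = np.argpartition(dists, k - 1)[:k]      # k nearest neighbours
        weights = 1.0 / (dists[idx] + self.delta)
        weights /= weights.sum()
        return float(weights @ self.values[idx])

    def write(self, key, value):
        """Insert a new (embedding, Q-estimate) pair; if the embedding is
        already stored, move its value quickly toward the new estimate."""
        if self.values.size > 0:
            dists = np.sum((self.keys - key) ** 2, axis=1)
            j = int(np.argmin(dists))
            if dists[j] < 1e-8:                      # state already in memory
                self.values[j] += self.alpha * (value - self.values[j])
                return
            if self.values.size >= self.capacity:    # evict the oldest entry
                self.keys = self.keys[1:]
                self.values = self.values[1:]
            self.keys = np.vstack([self.keys, key[None, :]])
            self.values = np.append(self.values, np.float32(value))
        else:
            self.keys = key[None, :].astype(np.float32)
            self.values = np.array([value], dtype=np.float32)
```

In use, one such memory would be kept per action: acting computes a lookup for every action on the current state embedding and picks the maximum (plus some exploration), and writes store return estimates for the state-action pairs just experienced, which is what lets new experience affect behaviour without waiting for slow gradient updates.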

Published In

ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
August 2017
4208 pages

Publisher

JMLR.org

Cited By

• (2024) Random latent exploration for deep reinforcement learning. Proceedings of the 41st International Conference on Machine Learning, pp. 34219-34252. 10.5555/3692070.3693463
• (2021) Guiding Evolutionary Strategies with Off-Policy Actor-Critic. Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1317-1325. 10.5555/3463952.3464104
• (2020) Multigrid neural memory. Proceedings of the 37th International Conference on Machine Learning, pp. 4561-4571. 10.5555/3524938.3525362
• (2020) Agent57. Proceedings of the 37th International Conference on Machine Learning, pp. 507-517. 10.5555/3524938.3524986
• (2020) Learning to learn variational semantic memory. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 9122-9134. 10.5555/3495724.3496489
• (2019) Generalization of reinforcement learners with working and episodic memory. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 12469-12478. 10.5555/3454287.3455404
• (2019) Ferroelectric FET Based In-Memory Computing for Few-Shot Learning. Proceedings of the 2019 Great Lakes Symposium on VLSI, pp. 373-378. 10.1145/3299874.3319450
• (2018) Fast deep reinforcement learning using online adjustments from the past. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 10590-10600. 10.5555/3327546.3327717
• (2018) A simple cache model for image recognition. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 10128-10137. 10.5555/3327546.3327675
• (2018) Learning attractor dynamics for generative memory. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 9401-9410. 10.5555/3327546.3327610
