
Neural Episodic Control

Published: 06 August 2017

Abstract

Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are, however, grossly inefficient, often taking orders of magnitude more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general-purpose deep reinforcement learning agents.
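The "semi-tabular" value function described in the abstract pairs, for each action, slowly learned state embeddings (the keys of a memory buffer) with Q-value estimates that are updated much faster than ordinary gradient descent would allow (the values); acting reads the buffer with a kernel-weighted nearest-neighbour lookup. The NumPy sketch below illustrates that read/write pattern only, under stated assumptions: the class name, hyperparameter values, exact-match test, and oldest-first eviction are illustrative choices, not the paper's implementation, which among other things uses approximate nearest-neighbour search over much larger buffers and learns the embeddings end-to-end.

```python
import numpy as np


class EpisodicMemory:
    """Per-action buffer: keys are state embeddings from a slowly changing
    encoder, values are rapidly updated Q-value estimates (hypothetical
    class, not the paper's exact data structure)."""

    def __init__(self, capacity=100000, num_neighbors=11, alpha=0.5, delta=1e-3):
        self.capacity = capacity             # maximum number of stored entries
        self.num_neighbors = num_neighbors   # neighbours used per lookup
        self.alpha = alpha                   # fast learning rate for value updates
        self.delta = delta                   # smoothing constant in the kernel
        self.keys = None                     # (n, key_dim) array of embeddings
        self.values = np.zeros(0, dtype=np.float32)

    def lookup(self, query):
        """Estimate Q(s, a) as a kernel-weighted average of the values stored
        under the nearest keys (inverse-distance kernel)."""
        if self.values.size == 0:
            return 0.0
        dists = np.sum((self.keys - query) ** 2, axis=1)
        k = min(self.num_neighbors, self.values.size)
        idx = np.argpartition(dists, k - 1)[:k]      # k nearest neighbours
        weights = 1.0 / (dists[idx] + self.delta)
        weights /= weights.sum()
        return float(weights @ self.values[idx])

    def write(self, key, value):
        """Insert a new (embedding, Q-estimate) pair; if the embedding is
        already stored, move its value quickly toward the new estimate."""
        if self.values.size > 0:
            dists = np.sum((self.keys - key) ** 2, axis=1)
            j = int(np.argmin(dists))
            if dists[j] < 1e-8:                      # state already in memory
                self.values[j] += self.alpha * (value - self.values[j])
                return
            if self.values.size >= self.capacity:    # evict the oldest entry
                self.keys = self.keys[1:]
                self.values = self.values[1:]
            self.keys = np.vstack([self.keys, key[None, :]])
            self.values = np.append(self.values, np.float32(value))
        else:
            self.keys = key[None, :].astype(np.float32)
            self.values = np.array([value], dtype=np.float32)
```

In use, one such memory would be kept per action: acting computes a lookup for every action on the current state embedding and picks the maximum (plus some exploration), and writes store return estimates for the state-action pairs just experienced, which is what lets new experience affect behaviour without waiting for slow gradient updates.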

Published In

ICML'17: Proceedings of the 34th International Conference on Machine Learning - Volume 70
August 2017
4208 pages

Publisher

JMLR.org

Cited By

• (2024) Random latent exploration for deep reinforcement learning. Proceedings of the 41st International Conference on Machine Learning, pp. 34219-34252. 10.5555/3692070.3693463
• (2021) Guiding Evolutionary Strategies with Off-Policy Actor-Critic. Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1317-1325. 10.5555/3463952.3464104
• (2020) Multigrid neural memory. Proceedings of the 37th International Conference on Machine Learning, pp. 4561-4571. 10.5555/3524938.3525362
• (2020) Agent57. Proceedings of the 37th International Conference on Machine Learning, pp. 507-517. 10.5555/3524938.3524986
• (2020) Learning to learn variational semantic memory. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 9122-9134. 10.5555/3495724.3496489
• (2019) Generalization of reinforcement learners with working and episodic memory. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 12469-12478. 10.5555/3454287.3455404
• (2019) Ferroelectric FET Based In-Memory Computing for Few-Shot Learning. Proceedings of the 2019 Great Lakes Symposium on VLSI, pp. 373-378. 10.1145/3299874.3319450
• (2018) Fast deep reinforcement learning using online adjustments from the past. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 10590-10600. 10.5555/3327546.3327717
• (2018) A simple cache model for image recognition. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 10128-10137. 10.5555/3327546.3327675
• (2018) Learning attractor dynamics for generative memory. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 9401-9410. 10.5555/3327546.3327610
