Natural Actor-Critic

Published: 01 March 2008

Abstract

In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
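
To make the recipe in the abstract concrete, the following minimal sketch implements an episodic variant of the idea for a one-dimensional Gaussian policy on a toy linear system. It is a sketch under stated assumptions, not the paper's implementation: the dynamics, the constants, and the names (score, psi, J_0) are illustrative. Each rollout's discounted return is regressed on the accumulated score function, which is exactly the compatible basis; because the features are compatible with the policy parameterization, the regression weights w are themselves the natural policy gradient, so the actor step reduces to theta <- theta + alpha * w with no explicit Fisher-matrix inversion.

import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(1)                  # policy parameters (a scalar feedback gain)
sigma, gamma, alpha = 0.5, 0.95, 0.05

def score(s, a, theta):
    # grad_theta log pi(a|s) for the Gaussian policy pi(a|s) = N(theta^T s, sigma^2);
    # this score vector is also the compatible basis function used by the critic.
    return (a - theta @ s) * s / sigma**2

for iteration in range(100):
    features, returns = [], []
    for episode in range(20):
        s = np.array([1.0])                      # fixed start state
        g, ret = 1.0, 0.0
        psi = np.zeros_like(theta)
        for t in range(50):
            a = theta @ s + sigma * rng.standard_normal()
            r = -(s @ s + 0.1 * a * a)           # quadratic cost as negative reward
            psi += g * score(s, a, theta)        # discounted sum of score vectors
            ret += g * r
            g *= gamma
            s = 0.9 * s + a                      # toy linear dynamics (assumption)
        features.append(np.append(psi, 1.0))     # [psi; 1]: the constant absorbs J_0
        returns.append(ret)
    # Critic: one least-squares solve yields [w; J_0]; with compatible features,
    # w is already the natural-gradient estimate.
    coef, *_ = np.linalg.lstsq(np.asarray(features), np.asarray(returns), rcond=None)
    w = coef[:-1]
    # Actor: natural-gradient ascent on the expected return.
    theta = theta + alpha * w

The property exploited here is the one the abstract emphasizes: with the compatible approximation f_w(s, a) = grad_theta log pi(a|s)^T w, the natural gradient of the expected return equals w itself, which is why the update is invariant to the coordinate frame of the policy representation.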

References

[1] D. Aberdeen, Policy-gradient algorithms for partially observable Markov decision processes, Ph.D. Thesis, Australian National University, 2003.
[2] D. Aberdeen, POMDPs and policy gradients, in: Proceedings of the Machine Learning Summer School (MLSS), Canberra, Australia.
[3] S. Amari, Natural gradient works efficiently in learning, Neural Computation 10 (1998) 251-276.
[4] J. Bagnell, J. Schneider, Covariant policy search, in: International Joint Conference on Artificial Intelligence, 2003.
[5] L.C. Baird, Advantage updating, Technical Report WL-TR-93-1146, Wright Lab., 1993.
[6] L.C. Baird, A.W. Moore, Gradient descent for general reinforcement learning, in: Advances in Neural Information Processing Systems, vol. 11, 1999.
[7] P. Bartlett, An introduction to reinforcement learning theory: value function methods, in: Machine Learning Summer School, 2002, pp. 184-202.
[8] D.P. Bertsekas, J.N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
[9] J. Boyan, Least-squares temporal difference learning, in: Machine Learning: Proceedings of the Sixteenth International Conference, 1999, pp. 49-56.
[10] S. Bradtke, E. Ydstie, A.G. Barto, Adaptive Linear Quadratic Control Using Policy Iteration, University of Massachusetts, Amherst, MA, 1994.
[11] O. Buffet, A. Dutech, F. Charpillet, Shaping multi-agent systems with gradient reinforcement learning, Autonomous Agents and Multi-Agent Systems 15 (2) (2007).
[12] F. Guenter, M. Hersch, S. Calinon, A. Billard, Reinforcement learning for imitating constrained reaching movements, RSJ Advanced Robotics 21 (13) (2007) 1521-1544.
[13] A. Ijspeert, J. Nakanishi, S. Schaal, Learning rhythmic movements by demonstration using nonlinear oscillators, in: IEEE International Conference on Intelligent Robots and Systems (IROS 2002), 2002, pp. 958-963.
[14] S.A. Kakade, A natural policy gradient, in: Advances in Neural Information Processing Systems, vol. 14, 2002.
[15] V. Konda, J. Tsitsiklis, Actor-critic algorithms, in: Advances in Neural Information Processing Systems, vol. 12, 2000.
[16] T. Moon, W. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 2000.
[17] J. Park, J. Kim, D. Kang, An RLS-based Natural Actor-Critic algorithm for locomotion of a two-linked robot arm, in: Proceedings of Computational Intelligence and Security: International Conference (CIS 2005), Xi'an, China, December 2005, pp. 15-19.
[18] J. Peters, S. Schaal, Policy gradient methods for robotics, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Beijing, China, 2006.
[19] J. Peters, S. Schaal, Applying the episodic natural actor-critic architecture to motor primitive learning, in: Proceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN), 2007.
[20] J. Peters, S. Vijayakumar, S. Schaal, Scaling reinforcement learning paradigms for motor learning, in: Proceedings of the 10th Joint Symposium on Neural Computation (JSNC), Irvine, CA, May 2003.
[21] J. Peters, S. Vijayakumar, S. Schaal, Reinforcement learning for humanoid robotics, in: IEEE International Conference on Humanoid Robots, 2003.
[22] J. Peters, S. Vijayakumar, S. Schaal, Natural Actor-Critic, in: Proceedings of the European Machine Learning Conference (ECML), Porto, Portugal, 2005.
[23] S. Richter, D. Aberdeen, J. Yu, Natural Actor-Critic for road traffic optimisation, in: Advances in Neural Information Processing Systems, vol. 19, 2007.
[24] R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[25] R.S. Sutton, D. McAllester, S. Singh, Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in: Advances in Neural Information Processing Systems, vol. 12, 2000.
[26] T. Ueno, Y. Nakamura, T. Shibata, K. Hosoda, S. Ishii, Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy Natural Actor-Critic, in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2006.
[27] S.V.N. Vishwanathan, X. Zhang, D. Aberdeen, Conditional random fields for reinforcement learning, in: Y. Bengio, Y. LeCun (Eds.), Proceedings of the 2007 Snowbird Learning Workshop, San Juan, Puerto Rico, 2007.
[28] X. Zhang, D. Aberdeen, S.V.N. Vishwanathan, Conditional random fields for multi-agent reinforcement learning, in: Proceedings of the 24th International Conference on Machine Learning (ICML 2007), ACM International Conference Proceeding Series, Corvallis, Oregon, 2007, pp. 1143-1150.

Information

Published In

Neurocomputing, Volume 71, Issue 7-9
March, 2008
651 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Actor-Critic methods
  2. Compatible function approximation
  3. Natural gradients
  4. Policy-gradient methods
  5. Reinforcement learning
  6. Robot learning

Qualifiers

  • Article

Cited By

  • Robot control based on motor primitives, International Journal of Robotics Research 43 (12) (2024) 1959-1991. https://doi.org/10.1177/02783649241258782
  • A Survey on Variational Autoencoders in Recommender Systems, ACM Computing Surveys 56 (10) (2024) 1-40. https://doi.org/10.1145/3663364
  • Modeling User Retention through Generative Flow Networks, in: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 5497-5508. https://doi.org/10.1145/3637528.3671531
  • Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention, in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, pp. 1872-1882. https://doi.org/10.1145/3626772.3657829
  • A FPGA Accelerator of Distributed A3C Algorithm with Optimal Resource Deployment, IET Computers & Digital Techniques 2024 (2024). https://doi.org/10.1049/2024/7855250
  • Bioinspired actor-critic algorithm for reinforcement learning interpretation with Levy–Brown hybrid exploration strategy, Neurocomputing 574 (C) (2024). https://doi.org/10.1016/j.neucom.2024.127291
  • Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction, Journal of Scientific Computing 101 (2) (2024). https://doi.org/10.1007/s10915-024-02688-x
  • AutoAssign+: Automatic Shared Embedding Assignment in streaming recommendation, Knowledge and Information Systems 66 (1) (2024) 89-113. https://doi.org/10.1007/s10115-023-01951-1
  • Reinforced Keyphrase Generation with Multi-Dimensional Reward, in: Artificial Neural Networks and Machine Learning – ICANN 2024, 2024, pp. 306-319. https://doi.org/10.1007/978-3-031-72350-6_21
  • Provably robust temporal difference learning for heavy-tailed rewards, in: Proceedings of the 37th International Conference on Neural Information Processing Systems, 2023, pp. 25693-25711. https://doi.org/10.5555/3666122.3667239
