Natural Actor-Critic

Published: 01 March 2008

Abstract

In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
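
To make the recipe in the abstract concrete, the following minimal sketch implements an episodic variant of the idea for a one-dimensional Gaussian policy on a toy linear system. It is a sketch under stated assumptions, not the paper's implementation: the dynamics, the constants, and the names (score, psi, J_0) are illustrative. Each rollout's discounted return is regressed on the accumulated score function, which is exactly the compatible basis; because the features are compatible with the policy parameterization, the regression weights w are themselves the natural policy gradient, so the actor step reduces to theta <- theta + alpha * w with no explicit Fisher-matrix inversion.

import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(1)                  # policy parameters (a scalar feedback gain)
sigma, gamma, alpha = 0.5, 0.95, 0.05

def score(s, a, theta):
    # grad_theta log pi(a|s) for the Gaussian policy pi(a|s) = N(theta^T s, sigma^2);
    # this score vector is also the compatible basis function used by the critic.
    return (a - theta @ s) * s / sigma**2

for iteration in range(100):
    features, returns = [], []
    for episode in range(20):
        s = np.array([1.0])                      # fixed start state
        g, ret = 1.0, 0.0
        psi = np.zeros_like(theta)
        for t in range(50):
            a = theta @ s + sigma * rng.standard_normal()
            r = -(s @ s + 0.1 * a * a)           # quadratic cost as negative reward
            psi += g * score(s, a, theta)        # discounted sum of score vectors
            ret += g * r
            g *= gamma
            s = 0.9 * s + a                      # toy linear dynamics (assumption)
        features.append(np.append(psi, 1.0))     # [psi; 1]: the constant absorbs J_0
        returns.append(ret)
    # Critic: one least-squares solve yields [w; J_0]; with compatible features,
    # w is already the natural-gradient estimate.
    coef, *_ = np.linalg.lstsq(np.asarray(features), np.asarray(returns), rcond=None)
    w = coef[:-1]
    # Actor: natural-gradient ascent on the expected return.
    theta = theta + alpha * w

The property exploited here is the one the abstract emphasizes: with the compatible approximation f_w(s, a) = grad_theta log pi(a|s)^T w, the natural gradient of the expected return equals w itself, which is why the update is invariant to the coordinate frame of the policy representation.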

References

[1] D. Aberdeen, Policy-gradient algorithms for partially observable Markov decision processes, Ph.D. Thesis, Australian National University, 2003.
[2] D. Aberdeen, POMDPs and policy gradients, in: Proceedings of the Machine Learning Summer School (MLSS), Canberra, Australia.
[3] S. Amari, Natural gradient works efficiently in learning, Neural Computation 10 (1998) 251-276.
[4] J. Bagnell, J. Schneider, Covariant policy search, in: International Joint Conference on Artificial Intelligence, 2003.
[5] L.C. Baird, Advantage updating, Technical Report WL-TR-93-1146, Wright Lab., 1993.
[6] L.C. Baird, A.W. Moore, Gradient descent for general reinforcement learning, in: Advances in Neural Information Processing Systems, vol. 11, 1999.
[7] P. Bartlett, An introduction to reinforcement learning theory: value function methods, in: Machine Learning Summer School, 2002, pp. 184-202.
[8] D.P. Bertsekas, J.N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
[9] J. Boyan, Least-squares temporal difference learning, in: Machine Learning: Proceedings of the Sixteenth International Conference, 1999, pp. 49-56.
[10] S. Bradtke, E. Ydstie, A.G. Barto, Adaptive Linear Quadratic Control Using Policy Iteration, University of Massachusetts, Amherst, MA, 1994.
[11] O. Buffet, A. Dutech, F. Charpillet, Shaping multi-agent systems with gradient reinforcement learning, Autonomous Agents and Multi-Agent Systems 15 (2) (2007).
[12] F. Guenter, M. Hersch, S. Calinon, A. Billard, Reinforcement learning for imitating constrained reaching movements, RSJ Advanced Robotics 21 (13) (2007) 1521-1544.
[13] A. Ijspeert, J. Nakanishi, S. Schaal, Learning rhythmic movements by demonstration using nonlinear oscillators, in: IEEE International Conference on Intelligent Robots and Systems (IROS 2002), 2002, pp. 958-963.
[14] S.A. Kakade, A natural policy gradient, in: Advances in Neural Information Processing Systems, vol. 14, 2002.
[15] V. Konda, J. Tsitsiklis, Actor-critic algorithms, in: Advances in Neural Information Processing Systems, vol. 12, 2000.
[16] T. Moon, W. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 2000.
[17] J. Park, J. Kim, D. Kang, An RLS-based Natural Actor-Critic algorithm for locomotion of a two-linked robot arm, in: Proceedings of Computational Intelligence and Security: International Conference (CIS 2005), Xi'an, China, December 2005, pp. 15-19.
[18] J. Peters, S. Schaal, Policy gradient methods for robotics, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Beijing, China, 2006.
[19] J. Peters, S. Schaal, Applying the episodic natural actor-critic architecture to motor primitive learning, in: Proceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN), 2007.
[20] J. Peters, S. Vijayakumar, S. Schaal, Scaling reinforcement learning paradigms for motor learning, in: Proceedings of the 10th Joint Symposium on Neural Computation (JSNC), Irvine, CA, May 2003.
[21] J. Peters, S. Vijayakumar, S. Schaal, Reinforcement learning for humanoid robotics, in: IEEE International Conference on Humanoid Robots, 2003.
[22] J. Peters, S. Vijayakumar, S. Schaal, Natural Actor-Critic, in: Proceedings of the European Machine Learning Conference (ECML), Porto, Portugal, 2005.
[23] S. Richter, D. Aberdeen, J. Yu, Natural Actor-Critic for road traffic optimisation, in: Advances in Neural Information Processing Systems, vol. 19, 2007.
[24] R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[25] R.S. Sutton, D. McAllester, S. Singh, Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in: Advances in Neural Information Processing Systems, vol. 12, 2000.
[26] T. Ueno, Y. Nakamura, T. Shibata, K. Hosoda, S. Ishii, Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy Natural Actor-Critic, in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2006.
[27] S.V.N. Vishwanathan, X. Zhang, D. Aberdeen, Conditional random fields for reinforcement learning, in: Y. Bengio, Y. LeCun (Eds.), Proceedings of the 2007 Snowbird Learning Workshop, San Juan, Puerto Rico, 2007.
[28] X. Zhang, D. Aberdeen, S.V.N. Vishwanathan, Conditional random fields for multi-agent reinforcement learning, in: Proceedings of the 24th International Conference on Machine Learning (ICML 2007), ACM International Conference Proceeding Series, Corvallis, Oregon, 2007, pp. 1143-1150.

Information

Published In

Neurocomputing, Volume 71, Issue 7-9
March, 2008
651 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Actor-Critic methods
  2. Compatible function approximation
  3. Natural gradients
  4. Policy-gradient methods
  5. Reinforcement learning
  6. Robot learning

Qualifiers

  • Article

Cited By

  • Robot control based on motor primitives, International Journal of Robotics Research 43 (12) (2024) 1959-1991. https://doi.org/10.1177/02783649241258782
  • A Survey on Variational Autoencoders in Recommender Systems, ACM Computing Surveys 56 (10) (2024) 1-40. https://doi.org/10.1145/3663364
  • Modeling User Retention through Generative Flow Networks, in: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 5497-5508. https://doi.org/10.1145/3637528.3671531
  • Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention, in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, pp. 1872-1882. https://doi.org/10.1145/3626772.3657829
  • A FPGA Accelerator of Distributed A3C Algorithm with Optimal Resource Deployment, IET Computers & Digital Techniques 2024 (2024). https://doi.org/10.1049/2024/7855250
  • Bioinspired actor-critic algorithm for reinforcement learning interpretation with Levy–Brown hybrid exploration strategy, Neurocomputing 574 (C) (2024). https://doi.org/10.1016/j.neucom.2024.127291
  • Global Convergence of Natural Policy Gradient with Hessian-Aided Momentum Variance Reduction, Journal of Scientific Computing 101 (2) (2024). https://doi.org/10.1007/s10915-024-02688-x
  • AutoAssign+: Automatic Shared Embedding Assignment in streaming recommendation, Knowledge and Information Systems 66 (1) (2024) 89-113. https://doi.org/10.1007/s10115-023-01951-1
  • Reinforced Keyphrase Generation with Multi-Dimensional Reward, in: Artificial Neural Networks and Machine Learning – ICANN 2024, 2024, pp. 306-319. https://doi.org/10.1007/978-3-031-72350-6_21
  • Provably robust temporal difference learning for heavy-tailed rewards, in: Proceedings of the 37th International Conference on Neural Information Processing Systems, 2023, pp. 25693-25711. https://doi.org/10.5555/3666122.3667239
