10.5555/1402821.1402863

Expediting RL by using graphical structures

Published: 12 May 2008

Abstract

The goal of reinforcement learning (RL) is to maximize reward (minimize cost) in a Markov decision process (MDP) without knowing the underlying model a priori. RL algorithms tend to be much slower than planning algorithms, which require the model as input. Recent results demonstrate that MDP planning can be expedited by exploiting the graphical structure of the MDP. We present extensions to two popular RL algorithms, Q-learning and RMax, that learn and exploit the graphical structure of problems to improve overall learning speed. Exploiting the graphical structure of the underlying MDP can greatly improve the speed of planning algorithms when the MDP has a nontrivial topological structure. Our experiments show that using the apparent topological structure of an MDP speeds up reinforcement learning even if the MDP is simply connected.
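The planning-side idea behind these extensions can be illustrated concretely: if the MDP's state transition graph is decomposed into strongly connected components, values can be solved one component at a time in reverse topological order, so no state is updated before the states it depends on have converged. The Python sketch below is not from the paper; it is a minimal illustration of that topological value-iteration idea, the kind of graph-based planning speedup the abstract alludes to, with hypothetical state, action, transition, and reward structures.

    from collections import defaultdict
    import networkx as nx  # assumed available; used only for SCCs and topological sorting

    def topological_value_iteration(states, actions, P, R, gamma=0.95, tol=1e-6):
        # Hypothetical model format (not taken from the paper): P[s][a] is a list
        # of (next_state, probability) pairs; R[s][a] is the expected immediate reward.
        G = nx.DiGraph()
        G.add_nodes_from(states)
        for s in states:
            for a in actions:
                for s2, p in P[s][a]:
                    if p > 0 and s2 != s:
                        G.add_edge(s, s2)  # s can reach s2 under some action
        scc_dag = nx.condensation(G)  # DAG whose nodes are strongly connected components
        V = defaultdict(float)
        # Solve components in reverse topological order: a component is processed
        # only after every component it can reach has already converged.
        for c in reversed(list(nx.topological_sort(scc_dag))):
            component = scc_dag.nodes[c]["members"]
            while True:  # ordinary value iteration restricted to this component
                delta = 0.0
                for s in component:
                    best = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                               for a in actions)
                    delta = max(delta, abs(best - V[s]))
                    V[s] = best
                if delta < tol:
                    break
        return dict(V)

In the RL setting the paper addresses, the transition graph is not given in advance, so the proposed Q-learning and RMax extensions must infer this structure from observed transitions as learning proceeds; the sketch above assumes a known model, which is the planning case the abstract contrasts with.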





Published In

AAMAS '08: Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3
May 2008
503 pages
ISBN: 9780981738123


Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Qualifiers

  • Research-article

Conference

AAMAS '08

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%


