10.5555/1402821.1402863

Expediting RL by using graphical structures

Published: 12 May 2008

Abstract

The goal of reinforcement learning (RL) is to maximize reward (minimize cost) in a Markov decision process (MDP) without knowing the underlying model a priori. RL algorithms tend to be much slower than planning algorithms, which require the model as input. Recent results demonstrate that MDP planning can be expedited by exploiting the graphical structure of the MDP. We present extensions to two popular RL algorithms, Q-learning and RMax, that learn and exploit the graphical structure of problems to improve overall learning speed. Exploiting the graphical structure of the underlying MDP can greatly improve the speed of planning algorithms when the MDP has a nontrivial topological structure. Our experiments show that using the apparent topological structure of an MDP speeds up reinforcement learning even if the MDP is simply connected.
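The planning-side idea behind these extensions can be illustrated concretely: if the MDP's state transition graph is decomposed into strongly connected components, values can be solved one component at a time in reverse topological order, so no state is updated before the states it depends on have converged. The Python sketch below is not from the paper; it is a minimal illustration of that topological value-iteration idea, the kind of graph-based planning speedup the abstract alludes to, with hypothetical state, action, transition, and reward structures.

    from collections import defaultdict
    import networkx as nx  # assumed available; used only for SCCs and topological sorting

    def topological_value_iteration(states, actions, P, R, gamma=0.95, tol=1e-6):
        # Hypothetical model format (not taken from the paper): P[s][a] is a list
        # of (next_state, probability) pairs; R[s][a] is the expected immediate reward.
        G = nx.DiGraph()
        G.add_nodes_from(states)
        for s in states:
            for a in actions:
                for s2, p in P[s][a]:
                    if p > 0 and s2 != s:
                        G.add_edge(s, s2)  # s can reach s2 under some action
        scc_dag = nx.condensation(G)  # DAG whose nodes are strongly connected components
        V = defaultdict(float)
        # Solve components in reverse topological order: a component is processed
        # only after every component it can reach has already converged.
        for c in reversed(list(nx.topological_sort(scc_dag))):
            component = scc_dag.nodes[c]["members"]
            while True:  # ordinary value iteration restricted to this component
                delta = 0.0
                for s in component:
                    best = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                               for a in actions)
                    delta = max(delta, abs(best - V[s]))
                    V[s] = best
                if delta < tol:
                    break
        return dict(V)

In the RL setting the paper addresses, the transition graph is not given in advance, so the proposed Q-learning and RMax extensions must infer this structure from observed transitions as learning proceeds; the sketch above assumes a known model, which is the planning case the abstract contrasts with.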





Published In

AAMAS '08: Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3
May 2008
503 pages
ISBN: 9780981738123


Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Qualifiers

  • Research-article

Conference

AAMAS '08

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%


