Non-delusional Q-learning and value iteration
Abstract
Recommendations
Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-factors. The main difference is in the policy evaluation phase: instead of ...
Focused topological value iteration
ICAPS'09: Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling
Topological value iteration (TVI) is an effective algorithm for solving Markov decision processes (MDPs) optimally: it divides an MDP into strongly connected components and solves these components sequentially. Yet TVI's usefulness tends to degrade ...
Approximate Q-Learning: An Introduction
ICMLC '10: Proceedings of the 2010 Second International Conference on Machine Learning and Computing
This paper introduces an approach to the Q-learning algorithm based on rough set theory, introduced by Zdzislaw Pawlak in 1981. During Q-learning, an agent makes action selections in an effort to maximize a reward signal obtained from the environment. Based on ...
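The recommendations above name several classical dynamic-programming and reinforcement-learning methods; the sketches below illustrate the standard textbook versions of those techniques, not the specific variants proposed in the recommended papers. First, a minimal policy iteration routine that computes optimal Q-factors for a finite-state discounted MDP using exact policy evaluation; the "enhanced" evaluation phase of the first recommended paper is not reproduced, and the dense transition/reward array shapes are assumptions made purely for illustration.

```python
import numpy as np

def policy_iteration_q(P, R, gamma=0.95):
    """Standard policy iteration on Q-factors for a finite discounted MDP.

    P: transition probabilities, shape (A, S, S); R: rewards, shape (S, A).
    This is the classical scheme (exact policy evaluation by solving a
    linear system), not the enhanced evaluation phase of the cited paper.
    """
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)                 # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[pi, np.arange(S), :]           # S x S transition matrix under pi
        R_pi = R[np.arange(S), pi]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement from the induced Q-factors.
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return Q, pi
        pi = new_pi
```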
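For the topological value iteration recommendation, a minimal sketch of plain TVI as described above (decompose the MDP into strongly connected components, then solve components one at a time, successors first), assuming dense arrays and using SciPy's strongly connected component routine; the "focused" refinement from the ICAPS'09 paper is not included.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def topological_value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Sketch of topological value iteration (TVI).

    P: transitions, shape (A, S, S); R: rewards, shape (S, A).
    Each component is solved against already-converged successor values.
    """
    A, S, _ = P.shape
    adj = P.sum(axis=0) > 0                     # S x S one-step reachability
    n_comp, label = connected_components(csr_matrix(adj.astype(int)),
                                         directed=True, connection='strong')
    # Build the DAG over components and a successors-first processing order.
    succ = [set() for _ in range(n_comp)]
    for s, t in zip(*np.nonzero(adj)):
        if label[s] != label[t]:
            succ[label[s]].add(label[t])
    order, seen = [], set()
    def visit(c):
        if c in seen:
            return
        seen.add(c)
        for d in succ[c]:
            visit(d)
        order.append(c)                         # post-order: successors first
    for c in range(n_comp):
        visit(c)

    V = np.zeros(S)
    for c in order:
        states = np.flatnonzero(label == c)
        while True:                             # value iteration on one component
            delta = 0.0
            for s in states:
                new_v = max(R[s, a] + gamma * P[a, s] @ V for a in range(A))
                delta = max(delta, abs(new_v - V[s]))
                V[s] = new_v
            if delta < tol:
                break
    return V
```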
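Finally, both the headline paper and the ICMLC '10 recommendation build on the standard tabular Q-learning update; the sketch below shows that generic update with epsilon-greedy exploration. The `env.reset()` / `env.step()` interface is an assumption for illustration, and neither the rough-set machinery of the ICMLC '10 paper nor the non-delusional corrections of the headline paper are included.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done); this interface is an
    illustrative assumption, not part of the cited papers.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # One-step update toward the bootstrapped target.
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```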
Information
Published In
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Qualifiers
- Article
Bibliometrics & Citations
Article Metrics
- 0 Total Citations
- 114 Total Downloads
- Downloads (Last 12 months): 61
- Downloads (Last 6 weeks): 11