Abstract
Basic reinforcement learning algorithms such as Q-learning are characterized by a short, inexpensive single learning step; however, the number of epochs needed to reach the optimal policy is unsatisfactory. Many methods reduce the number of required epochs, such as TD(λ > 0), Dyna, or prioritized sweeping, but their learning time is considerable. This paper proposes a combination of the Q-learning algorithm, performed in incremental mode, with an acceleration method executed in epoch mode that is based on an environment model and the distance to the terminal state. This approach preserves the short duration of a single learning step while achieving efficiency comparable to Dyna or prioritized sweeping. The proposed algorithm is compared with Q(λ)-learning, Dyna-Q, and prioritized sweeping in experiments on three maze tasks. The learning time and the number of epochs needed to reach the terminal state are used to evaluate the efficiency of the compared algorithms.
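Read as a high-level recipe, the approach amounts to ordinary one-step Q-learning after every interaction, plus a model-based value sweep run once per epoch, ordered by distance from the terminal state. The Python sketch below only illustrates that idea under assumed details: the class and method names, the ε-greedy policy, and the breadth-first form of the epoch sweep are assumptions for illustration and are not taken from the paper.

```python
# A minimal sketch of "incremental Q-learning + epoch-mode, model-based acceleration",
# assuming a discrete maze MDP with states 0..n_states-1 and actions 0..n_actions-1.
# All names and parameters here are illustrative assumptions, not the paper's algorithm.
import random
from collections import defaultdict

class EpochIncrementalSketch:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(float)        # Q[(state, action)] action-value estimates
        self.model = {}                    # model[(state, action)] = (reward, next_state)
        self.n_states, self.n_actions = n_states, n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select_action(self, s):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.Q[(s, a)])

    def incremental_step(self, s, a, r, s_next):
        # Standard one-step Q-learning update after every interaction (incremental mode).
        best_next = max(self.Q[(s_next, b)] for b in range(self.n_actions))
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
        self.model[(s, a)] = (r, s_next)   # remember the observed transition

    def epoch_update(self, terminal_state):
        # Epoch-mode acceleration (assumed form): propagate values backwards from the
        # terminal state through the learned model, in order of increasing distance to
        # the terminal state; a breadth-first search over reversed transitions gives
        # exactly that ordering.
        predecessors = defaultdict(list)
        for (s, a), (r, s_next) in self.model.items():
            predecessors[s_next].append((s, a, r))
        frontier, visited = [terminal_state], {terminal_state}
        while frontier:
            next_frontier = []
            for s_next in frontier:
                for s, a, r in predecessors[s_next]:
                    best_next = max(self.Q[(s_next, b)] for b in range(self.n_actions))
                    self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
                    if s not in visited:
                        visited.add(s)
                        next_frontier.append(s)
            frontier = next_frontier
```

Because the epoch-mode sweep only touches the remembered transitions when an epoch ends, the cost of a single on-line learning step stays that of plain Q-learning, which is the trade-off the abstract emphasizes.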
Cite this paper
Zajdel, R. (2008). Epoch-Incremental Queue-Dyna Algorithm. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing – ICAISC 2008. Lecture Notes in Computer Science, vol. 5097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69731-2_109