Abstract
Basic reinforcement learning algorithms such as Q-learning are characterized by a short, inexpensive single learning step; however, the number of epochs needed to reach the optimal policy is unsatisfactory. Many methods reduce the number of required epochs, such as TD(λ > 0), Dyna, or prioritized sweeping, but their learning time is considerable. This paper proposes a combination of the Q-learning algorithm, performed in incremental mode, with an acceleration method executed in epoch mode that is based on an environment model and the distance to the terminal state. This approach preserves the short duration of a single learning step while achieving efficiency comparable to Dyna or prioritized sweeping. The proposed algorithm is compared with Q(λ)-learning, Dyna-Q, and prioritized sweeping in experiments on three maze tasks. The learning time and the number of epochs needed to reach the terminal state are used to evaluate the efficiency of the compared algorithms.
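Read as a high-level recipe, the approach amounts to ordinary one-step Q-learning after every interaction, plus a model-based value sweep run once per epoch, ordered by distance from the terminal state. The Python sketch below only illustrates that idea under assumed details: the class and method names, the ε-greedy policy, and the breadth-first form of the epoch sweep are assumptions for illustration and are not taken from the paper.

```python
# A minimal sketch of "incremental Q-learning + epoch-mode, model-based acceleration",
# assuming a discrete maze MDP with states 0..n_states-1 and actions 0..n_actions-1.
# All names and parameters here are illustrative assumptions, not the paper's algorithm.
import random
from collections import defaultdict

class EpochIncrementalSketch:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(float)        # Q[(state, action)] action-value estimates
        self.model = {}                    # model[(state, action)] = (reward, next_state)
        self.n_states, self.n_actions = n_states, n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select_action(self, s):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.Q[(s, a)])

    def incremental_step(self, s, a, r, s_next):
        # Standard one-step Q-learning update after every interaction (incremental mode).
        best_next = max(self.Q[(s_next, b)] for b in range(self.n_actions))
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
        self.model[(s, a)] = (r, s_next)   # remember the observed transition

    def epoch_update(self, terminal_state):
        # Epoch-mode acceleration (assumed form): propagate values backwards from the
        # terminal state through the learned model, in order of increasing distance to
        # the terminal state; a breadth-first search over reversed transitions gives
        # exactly that ordering.
        predecessors = defaultdict(list)
        for (s, a), (r, s_next) in self.model.items():
            predecessors[s_next].append((s, a, r))
        frontier, visited = [terminal_state], {terminal_state}
        while frontier:
            next_frontier = []
            for s_next in frontier:
                for s, a, r in predecessors[s_next]:
                    best_next = max(self.Q[(s_next, b)] for b in range(self.n_actions))
                    self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
                    if s not in visited:
                        visited.add(s)
                        next_frontier.append(s)
            frontier = next_frontier
```

Because the epoch-mode sweep only touches the remembered transitions when an epoch ends, the cost of a single on-line learning step stays that of plain Q-learning, which is the trade-off the abstract emphasizes.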
Cite this paper
Zajdel, R. (2008). Epoch-Incremental Queue-Dyna Algorithm. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing – ICAISC 2008. Lecture Notes in Computer Science, vol. 5097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69731-2_109