[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Epoch-Incremental Queue-Dyna Algorithm

  • Conference paper
Artificial Intelligence and Soft Computing – ICAISC 2008 (ICAISC 2008)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5097))

Included in the following conference series:

Abstract

The basic reinforcement learning algorithm, as Q-learning, is characterized by short time-consuming single learning step, however, the number of epochs necessary to achieve the optimal policy is not satisfactory. There are many methods that reduce the number of necessary epochs, like TD(λ> 0), Dyna or prioritized sweeping, but their learning time is considerable. This paper proposes a combination of Q-learning algorithm performed in incremental mode with executed in epoch mode method of acceleration based on environment model and distance to terminal state. This approach ensures the maintenance of short time of a single learning step and high efficiency comparable with Dyna or prioritized sweeping. Proposed algorithm is compared with Q(λ)-learning, Dyna-Q and prioritized sweeping in the experiments on three maze tasks. The time-consuming learning process and number of epochs necessary to reach the terminal state is used to evaluate the efficiency of compared algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Asadpour, M., Siegwart, R.: Compact Q-learning optimized for micro-robots with processing and memory constraints. In: Robotics and Autonomous Systems. European Conference on Mobile Robots, vol. 48(1), pp. 49–61 (2004)

    Google Scholar 

  2. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning problem. IEEE Trans. SMC 13, 834–847 (1983)

    Google Scholar 

  3. Barto, A.G., Bradtke, S.J., Singh, S.P.: Learning to Act using Real-Time Dynamic Programming. Artificial Intelligence. Special Vol. on Computational Research on Interaction and Agency 72(1), 81–138 (1995)

    Google Scholar 

  4. Cichosz, P.: Learning systems. WNT, Warsaw (2000) (in polish)

    Google Scholar 

  5. Crook, P., Hayes, G.: Learning in a State of Confusion: Perceptual Aliasing in Grid World Navigation. In: Proc. of Towards Intelligent Mobile Robots (2003)

    Google Scholar 

  6. Kaelbing, L.P., Litman, M.L., Moore, A.W.: Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)

    Google Scholar 

  7. Lanzi, P.L.: Adaptive Agents with Reinforcement Learning and Internal Memory. In: Proc. of the Sixth International Conference on the Simulation of Adaptive Behavior, pp. 333–342. The MIT Press, Cambridge (2000)

    Google Scholar 

  8. Loch, J., Singh, S.: Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In: ICML, pp. 323–331 (1998)

    Google Scholar 

  9. Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning 13, 103–130 (1993)

    Google Scholar 

  10. Peng, J., Williams, R.J.: Efficient learning and planning within the Dyna framework. In: Meyer, J., Roitblat, H., Wilson, S. (eds.) From Animals to Animats 2, USA, pp. 281–290 (1993)

    Google Scholar 

  11. Pickett, M., Barto, A.G.: PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning. In: Proc. of the International Conference on Machine Learning, vol. 19, pp. 506–513 (2002)

    Google Scholar 

  12. Rummery, G.A., Niranjan, M.: On line q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department (1994)

    Google Scholar 

  13. Sherstov, A.A., Stone, P.: Improving Action Selection in MDP’s via Knowledge Transfer. In: Proc 20th the Nation Conference on Artificial Intelligence, vol. 20(2), pp. 1024–1029 (2005)

    Google Scholar 

  14. Sutton, R.S.: Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. In: Proc. of Seventh Int. Conf. on Machine Learning, pp. 216–224 (1990)

    Google Scholar 

  15. Sutton, R.S.: Planning by incremental dynamic programming. In: Proc. of the Ninth Conference on Machine Learning, pp. 353–357 (1991)

    Google Scholar 

  16. Sutton, R.S., Barto, A.G.: Reinforcement learning: An Introduction. MIT Press, Cambridge (1998)

    Google Scholar 

  17. Tadepalli, P., Ok, D.: Model–Based Average Reward Reinforcement Learning. Artificial Intelligence 100, 177–224 (1998)

    Article  MATH  Google Scholar 

  18. Tanner, B., Sutton, R.S.: Temporal-Difference Networks with History. In: Proc. of the 2005 International Joint Conference on Artificial Intelligence, pp. 865–870 (2005)

    Google Scholar 

  19. Watkins, C.J.C.H.: Learning from delayed Rewards. PhD thesis, Cambridge University, Cambridge, England (1989)

    Google Scholar 

  20. Wellstead, P.E.: Introduction to Physical System Modelling, Control System Principles (2000)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Leszek Rutkowski Ryszard Tadeusiewicz Lotfi A. Zadeh Jacek M. Zurada

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zajdel, R. (2008). Epoch-Incremental Queue-Dyna Algorithm. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2008. ICAISC 2008. Lecture Notes in Computer Science(), vol 5097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69731-2_109

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-69731-2_109

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69572-1

  • Online ISBN: 978-3-540-69731-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics