Abstract
Reinforcement learning algorithms are used to solve many sequential decision problems, such as game playing and mechanical control. Q-learning is a model-free reinforcement learning method. In traditional Q-learning, the agent stops immediately after it reaches the goal. In this paper we propose a new method, Experience-Based Exploration, which samples more informative state-action pairs for the Q-learning update. With Experience-Based Exploration, the agent does not stop at the goal; instead, it takes the terminal state as a new starting point and searches backward for states with high Bellman error, generating state-action pairs that are useful for learning. The efficacy of the method is shown analytically, and experiments on Gridworld verify the hypothesis.
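To make the idea concrete, below is a minimal sketch contrasting standard epsilon-greedy Q-learning with a backward phase that continues from the terminal state. It assumes a 5x5 deterministic Gridworld; the environment, the hyperparameters, and the |Bellman-error|-greedy backward rule are illustrative assumptions, not the authors' exact algorithm.

```python
# A minimal sketch of experience-based backward exploration on an assumed
# 5x5 deterministic Gridworld. Not the paper's reference implementation:
# environment, hyperparameters, and the |Bellman-error|-greedy backward
# rule are illustrative assumptions.
import random
from collections import defaultdict

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
SIZE, GOAL = 5, (4, 4)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(state, action):
    """Move deterministically, clipping to the grid; reward 1 at the goal."""
    ns = (min(max(state[0] + action[0], 0), SIZE - 1),
          min(max(state[1] + action[1], 0), SIZE - 1))
    return ns, (1.0 if ns == GOAL else 0.0), ns == GOAL

def td_error(Q, s, a, r, ns, done):
    """Bellman (TD) error; no bootstrapping from a terminal state."""
    target = r if done else r + GAMMA * max(Q[(ns, b)] for b in ACTIONS)
    return target - Q[(s, a)]

def run_episode(Q, backward_steps=20):
    # Forward phase: standard epsilon-greedy Q-learning until the goal.
    s, done = (0, 0), False
    while not done:
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        ns, r, done = step(s, a)
        Q[(s, a)] += ALPHA * td_error(Q, s, a, r, ns, done)
        s = ns
    # Backward phase: instead of terminating at the goal, restart from the
    # terminal state and follow the action whose one-step transition carries
    # the largest |Bellman error|, updating Q along the way.
    for _ in range(backward_steps):
        def score(b):
            nb, rb, db = step(s, b)
            return abs(td_error(Q, s, b, rb, nb, db))
        a = max(ACTIONS, key=score)
        ns, r, done = step(s, a)
        Q[(s, a)] += ALPHA * td_error(Q, s, a, r, ns, done)
        s = ns

Q = defaultdict(float)        # Q(s, a), keyed by (state, action) tuples
for _ in range(200):
    run_episode(Q)
print(Q[((3, 4), (1, 0))])    # learned value of stepping into the goal
```

In this sketch the backward phase simply chases the largest one-step Bellman error from the goal outward; the extra transitions it samples are exactly those whose targets have just changed, so value propagates backward faster than it would by restarting every episode from the fixed initial state.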
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (No. 81373555) and the Shanghai Committee of Science and Technology (No. 14JC1402200).
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yang, B., Lu, H., Li, B., Zhang, Z., Zhang, W. (2018). A Novel Experience-Based Exploration Method for Q-Learning. In: Zhou, Q., Gan, Y., Jing, W., Song, X., Wang, Y., Lu, Z. (eds) Data Science. ICPCSEE 2018. Communications in Computer and Information Science, vol 901. Springer, Singapore. https://doi.org/10.1007/978-981-13-2203-7_17
DOI: https://doi.org/10.1007/978-981-13-2203-7_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2202-0
Online ISBN: 978-981-13-2203-7