Abstract
We point out that value-based reinforcement learning, such as TD- and Q-learning, is not applicable to games of imperfect information. We give a reinforcement learning algorithm for two-player poker, based on gradient search in the agents' parameter spaces. The two competing agents experiment with different strategies and simultaneously shift their probability distributions towards more successful actions. The algorithm is a special case of the lagging anchor algorithm (to appear in the journal Machine Learning). We test the algorithm on a simplified, yet non-trivial, version of two-player Hold'em poker, with good results.
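To make the gradient-search idea concrete, the sketch below runs two softmax agents against each other in matching pennies, a small zero-sum game with a mixed equilibrium. Each agent samples an action from its current mixed strategy and shifts its parameters towards actions that paid off, while also being pulled towards a slowly trailing "anchor" copy of its parameters, our reading of the lagging anchor idea cited above. This is an illustrative assumption-laden sketch, not the paper's implementation: the game, the hyperparameters alpha and eta, and the exact update form are all chosen for brevity.

import numpy as np

# Illustrative sketch (not the paper's implementation): two softmax
# policies play matching pennies; each is updated by a sampled
# policy-gradient step plus a pull towards a slowly moving "anchor"
# copy of its own parameters (the lagging anchor idea).
rng = np.random.default_rng(0)

# Payoff to player 1 in matching pennies (zero-sum; player 2 gets -r).
PAYOFF = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Policy parameters and their anchors, one vector per player.
w1 = np.zeros(2); w2 = np.zeros(2)
a1 = w1.copy();   a2 = w2.copy()

alpha = 0.05   # learning rate (illustrative value)
eta = 0.1      # anchor attraction strength (illustrative value)

for step in range(20000):
    p1, p2 = softmax(w1), softmax(w2)
    i = rng.choice(2, p=p1)          # player 1 samples an action
    j = rng.choice(2, p=p2)          # player 2 samples an action
    r = PAYOFF[i, j]                 # payoff to player 1

    # REINFORCE-style gradient of log p(action): shifts probability
    # mass towards actions that did better than the current mixture.
    g1 = -p1; g1[i] += 1.0
    g2 = -p2; g2[j] += 1.0
    w1 += alpha * (r * g1 + eta * (a1 - w1))
    w2 += alpha * (-r * g2 + eta * (a2 - w2))

    # Anchors trail the parameters, damping the oscillations that
    # plain simultaneous gradient ascent shows at mixed equilibria.
    a1 += alpha * eta * (w1 - a1)
    a2 += alpha * eta * (w2 - a2)

print("player 1 strategy:", softmax(w1))  # drifts towards (0.5, 0.5)
print("player 2 strategy:", softmax(w2))

Without the anchor terms (eta = 0), plain simultaneous gradient updates tend to circle around the mixed equilibrium rather than settle; the anchor pull is what damps this rotation in the lagging anchor scheme.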
References
Dahl, F.A.: The lagging anchor algorithm: reinforcement learning in two-player zero-sum games with imperfect information. Machine Learning (to appear).
Owen, G.: Game Theory. 3rd ed. Academic Press, San Diego (1995).
Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3 (1988) 9–44.
Watkins, C.J.C.H.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, UK (1989).
Szepesvári, C., Littman, M.L.: A unified analysis of value-function-based reinforcement learning algorithms. Neural Computation 11 (1999) 2017–2060.
Tesauro, G.J.: Practical issues in temporal difference learning. Machine Learning 8 (1992) 257–277.
Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 11th International Conference on Machine Learning, Morgan Kaufmann, New Brunswick (1994) 157–163.
Dahl, F.A., Halck, O.M.: Minimax TD-learning with neural nets in a Markov game. In: López de Mántaras, R., Plaza, E. (eds.): ECML 2000. Proceedings of the 11th European Conference on Machine Learning. Lecture Notes in Computer Science Vol. 1810, Springer-Verlag, Berlin-Heidelberg-New York (2000) 117–128.
Koller, D., Megiddo, N., von Stengel, B.: Efficient computation of equilibria for extensive two-person games. Games and Economic Behavior 14 (1996) 247–259.
Luce, R.D., Raiffa, H.: Games and Decisions. Wiley, New York (1957).
Koller, D., Pfeffer, A.: Representations and solutions for game-theoretic problems. Artificial Intelligence 94 (1997) 167–215.
Schaeffer, J., Billings, D., Peña, L., Szafron, D.: Learning to play strong poker. In: Fürnkranz, J., Kubat, M. (eds.): Proceedings of the ICML-99-Workshop on Machine Learning in Game Playing, Jozef Stefan Institute, Ljubljana (1999).
Hassoun, M.H.: Fundamentals of Artificial Neural Networks. MIT Press, Cambridge, Massachusetts (1995).
Selten, R.: Anticipatory learning in two-person games. In: Selten, R. (ed.): Game Equilibrium Models, Vol. I: Evolution and Game Dynamics. Springer-Verlag, Berlin (1991).
Halck, O.M., Dahl, F.A.: On classification of games and evaluation of players — with some sweeping generalizations about the literature. In: Fürnkranz, J., Kubat, M. (eds.): Proceedings of the ICML-99-Workshop on Machine Learning in Game Playing, Jozef Stefan Institute, Ljubljana (1999).
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Dahl, F.A. (2003). A Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold'em Poker. In: De Raedt, L., Flach, P. (eds.) Machine Learning: ECML 2001. Lecture Notes in Computer Science, vol. 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5