Abstract
We point out that value-based reinforcement learning, such as TD- and Q-learning, is not applicable to games of imperfect information. We give a reinforcement learning algorithm for two-player poker, based on gradient search in the agents' parameter spaces. The two competing agents experiment with different strategies and simultaneously shift their probability distributions towards more successful actions. The algorithm is a special case of the lagging anchor algorithm (to appear in the journal Machine Learning). We test the algorithm on a simplified, yet non-trivial, version of two-player Hold'em poker, with good results.
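To make the gradient-search idea concrete, the sketch below runs two softmax agents against each other in matching pennies, a small zero-sum game with a mixed equilibrium. Each agent samples an action from its current mixed strategy and shifts its parameters towards actions that paid off, while also being pulled towards a slowly trailing "anchor" copy of its parameters, our reading of the lagging anchor idea cited above. This is an illustrative assumption-laden sketch, not the paper's implementation: the game, the hyperparameters alpha and eta, and the exact update form are all chosen for brevity.

import numpy as np

# Illustrative sketch (not the paper's implementation): two softmax
# policies play matching pennies; each is updated by a sampled
# policy-gradient step plus a pull towards a slowly moving "anchor"
# copy of its own parameters (the lagging anchor idea).
rng = np.random.default_rng(0)

# Payoff to player 1 in matching pennies (zero-sum; player 2 gets -r).
PAYOFF = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Policy parameters and their anchors, one vector per player.
w1 = np.zeros(2); w2 = np.zeros(2)
a1 = w1.copy();   a2 = w2.copy()

alpha = 0.05   # learning rate (illustrative value)
eta = 0.1      # anchor attraction strength (illustrative value)

for step in range(20000):
    p1, p2 = softmax(w1), softmax(w2)
    i = rng.choice(2, p=p1)          # player 1 samples an action
    j = rng.choice(2, p=p2)          # player 2 samples an action
    r = PAYOFF[i, j]                 # payoff to player 1

    # REINFORCE-style gradient of log p(action): shifts probability
    # mass towards actions that did better than the current mixture.
    g1 = -p1; g1[i] += 1.0
    g2 = -p2; g2[j] += 1.0
    w1 += alpha * (r * g1 + eta * (a1 - w1))
    w2 += alpha * (-r * g2 + eta * (a2 - w2))

    # Anchors trail the parameters, damping the oscillations that
    # plain simultaneous gradient ascent shows at mixed equilibria.
    a1 += alpha * eta * (w1 - a1)
    a2 += alpha * eta * (w2 - a2)

print("player 1 strategy:", softmax(w1))  # drifts towards (0.5, 0.5)
print("player 2 strategy:", softmax(w2))

Without the anchor terms (eta = 0), plain simultaneous gradient updates tend to circle around the mixed equilibrium rather than settle; the anchor pull is what damps this rotation in the lagging anchor scheme.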
References
Dahl, F.A.: The lagging anchor algorithm: reinforcement learning in two-player zero-sum games with imperfect information. Machine Learning (to appear).
Owen, G.: Game Theory. 3rd ed. Academic Press, San Diego (1995).
Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3 (1988) 9–44.
Watkins, C.J.C.H.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, UK (1989).
Szepesvári, C., Littman, M.L.: A unified analysis of value-function-based reinforcement learning algorithms. Neural Computation 11 (1999) 2017–2060.
Tesauro, G.J.: Practical issues in temporal difference learning. Machine Learning 8 (1992) 257–277.
Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 11th International Conference on Machine Learning, Morgan Kaufmann, New Brunswick (1994) 157–163.
Dahl, F.A., Halck, O.M.: Minimax TD-learning with neural nets in a Markov game. In: López de Mántaras, R., Plaza, E. (eds.): ECML 2000. Proceedings of the 11th European Conference on Machine Learning. Lecture Notes in Computer Science Vol. 1810, Springer-Verlag, Berlin-Heidelberg-New York (2000) 117–128.
Koller, D., Megiddo, N., von Stengel, B.: Efficient computation of equilibria for extensive two-person games. Games and Economic Behavior 14 (1996) 247–259.
Luce, R.D., Raiffa, H.: Games and Decisions. Wiley, New York (1957).
Koller, D., Pfeffer, A.: Representations and solutions for game-theoretic problems. Artificial Intelligence 94 (1997) 167–215.
Schaeffer, J., Billings, D., Peña, L., Szafron, D.: Learning to play strong poker. In: Fürnkranz, J., Kubat, M. (eds.): Proceedings of the ICML-99-Workshop on Machine Learning in Game Playing, Jozef Stefan Institute, Ljubljana (1999).
Hassoun, M.H.: Fundamentals of Artificial Neural Networks. MIT Press, Cambridge, Massachusetts (1995).
Selten, R.: Anticipatory learning in two-person games. In: Selten, R. (ed.): Game Equilibrium Models, Vol. I: Evolution and Game Dynamics. Springer-Verlag, Berlin (1991).
Halck, O.M., Dahl, F.A.: On classification of games and evaluation of players — with some sweeping generalizations about the literature. In: Fürnkranz, J., Kubat, M. (eds.): Proceedings of the ICML-99-Workshop on Machine Learning in Game Playing, Jozef Stefan Institute, Ljubljana (1999).
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Dahl, F.A. (2003). A Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold'em Poker. In: De Raedt, L., Flach, P. (eds.) Machine Learning: ECML 2001. Lecture Notes in Computer Science, vol. 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5