Abstract
The strength of a game-playing program rests mainly on the adequacy of its evaluation function and the efficacy of its search algorithm. This paper investigates how temporal-difference (TD) learning and genetic algorithms can be used to improve various decisions made during game-tree search. Existing TD algorithms are not directly suitable for learning search decisions; we therefore propose a modified update rule that uses the TD error of the evaluation function to shorten the lag between two rewards. Genetic algorithms can be applied directly to learn search decisions. For our experiments we selected the problem of time allocation from the set of search decisions: on each move the player decides on a certain search depth, constrained by the amount of time left. As a test bed we used the game of Lines of Action, which has roughly the same complexity as Othello. From the results we conclude that both the TD approach and the genetic approach compare favourably with existing time-allocation techniques. Finally, we briefly discuss the issues that may emerge when the algorithms are applied to more complex search decisions.
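To make the setting concrete, the following is a minimal Python sketch of the general idea of using the TD error of an evaluation function as the learning signal for a per-move time-allocation (search-depth) decision. The linear depth-selection policy, the feature choice, and all function names (`eval_position`, `choose_depth`, `td_update`) are illustrative assumptions for this sketch only, not the paper's actual update rule or neural-network architecture.

```python
# Hypothetical sketch: adjust the parameters of a depth-selection policy
# using the TD error of a position-evaluation function as the signal.

import random


def eval_position(position):
    """Stand-in heuristic evaluation; a real engine would run a search here."""
    return sum(position) / len(position)


def choose_depth(weights, features, time_left, max_depth=6):
    """Pick a search depth from a linear score of game-phase features,
    crudely capped by the remaining time."""
    score = sum(w * f for w, f in zip(weights, features))
    depth = max(1, min(max_depth, int(round(score))))
    return min(depth, max(1, int(time_left)))


def td_update(weights, features, td_error, alpha=0.01):
    """Nudge the depth-selection weights in proportion to the TD error."""
    return [w + alpha * td_error * f for w, f in zip(weights, features)]


if __name__ == "__main__":
    weights = [0.5, 0.5]
    position = [random.random() for _ in range(10)]
    time_left = 30.0
    prev_value = eval_position(position)

    for move in range(20):
        features = [move / 20.0, time_left / 30.0]      # simple game-phase features
        depth = choose_depth(weights, features, time_left)
        time_left -= depth                               # deeper searches cost more time
        position = [random.random() for _ in range(10)]  # stand-in for playing a move
        value = eval_position(position)
        td_error = value - prev_value                    # TD error of the evaluation
        weights = td_update(weights, features, td_error)
        prev_value = value
```

A genetic-algorithm variant of the same sketch would instead encode the weight vector as an individual, evaluate fitness by playing games under a fixed time budget, and evolve the population with selection, crossover, and mutation rather than per-move updates.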
Acknowledgements
The authors thank the referees for their constructive comments and suggestions for improvement.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kocsis, L., Uiterwijk, J., van den Herik, J. (2001). Learning Time Allocation Using Neural Networks. In: Marsland, T., Frank, I. (eds) Computers and Games. CG 2000. Lecture Notes in Computer Science, vol 2063. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45579-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43080-3
Online ISBN: 978-3-540-45579-0