Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search

Coulom, Rémi

doi:10.1007/978-3-540-75538-8_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4630)

Included in the following conference series:

International Conference on Computers and Games

Abstract

A Monte-Carlo evaluation estimates the value of a position by averaging the outcomes of several random continuations. The method can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework for combining tree search with Monte-Carlo evaluation that does not separate a min-max phase from a Monte-Carlo phase. Instead of backing up the min-max value close to the root and the average value at some depth, it defines a more general backup operator that progressively changes from averaging to min-max as the number of simulations grows. This approach provides fine-grained control of the tree growth, at the level of individual simulations, and allows efficient selectivity. The resulting algorithm was implemented in a 9×9 Go-playing program, Crazy Stone, that won the 10th KGS computer-Go tournament.
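
The idea of a backup operator that shifts from averaging to min-max can be illustrated with a small sketch. The abstract does not give the operator's formula, so the blend below is an assumption made purely for illustration and is not the operator defined in the paper: the names `ChildStats` and `blended_backup` and the constant `k` are hypothetical, and values are taken from the point of view of the player to move at the parent node.

```python
from dataclasses import dataclass


@dataclass
class ChildStats:
    """Statistics of one child move (hypothetical structure)."""
    visits: int   # number of simulations that went through this child
    mean: float   # average simulation outcome for the parent's player


def blended_backup(children, k=16.0):
    """Back up a value that moves from the average toward the max.

    Assumption: the weight on the max is n / (n + k), where n is the
    total number of simulations. This is an illustrative choice only.
    """
    visited = [c for c in children if c.visits > 0]
    if not visited:
        raise ValueError("no simulations to back up")

    total = sum(c.visits for c in visited)
    # Averaging backup: the mean outcome over all simulations.
    average = sum(c.visits * c.mean for c in visited) / total
    # Min-max backup: the value of the best child for the player to move.
    best = max(c.mean for c in visited)

    weight = total / (total + k)  # near 0 with few simulations, near 1 with many
    return (1.0 - weight) * average + weight * best


if __name__ == "__main__":
    few = [ChildStats(1, 0.6), ChildStats(3, 0.4)]
    many = [ChildStats(1000, 0.6), ChildStats(3000, 0.4)]
    print(blended_backup(few))   # close to the average
    print(blended_backup(many))  # close to the best child's value
```

With few simulations the backed-up value stays near the plain average, which is less sensitive to noise in individual children; with many simulations it approaches the value of the best child, as a min-max backup would.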

Author information

Authors and Affiliations

CNRS-LIFL, INRIA-SequeL, Université Charles de Gaulle, Lille, France
Rémi Coulom

Editor information

H. Jaap van den Herik, Paolo Ciancarini, H. H. L. M. (Jeroen) Donkers

Cite this paper

Coulom, R. (2007). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In: van den Herik, H.J., Ciancarini, P., Donkers, H.H.L.M. (eds) Computers and Games. CG 2006. Lecture Notes in Computer Science, vol 4630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75538-8_7

DOI: https://doi.org/10.1007/978-3-540-75538-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75537-1
Online ISBN: 978-3-540-75538-8
eBook Packages: Computer Science, Computer Science (R0)