Published by De Gruyter, August 21, 2018

A Bayesian regression approach to handicapping tennis players based on a rating system

  • Timothy C.Y. Chan and Raghav Singal

Abstract

This paper builds on a recently developed Markov Decision Process-based (MDP) handicap system for tennis, which aims to make amateur matches more competitive. The system gives points to the weaker player based on skill difference, which is measured by the point-win probability. However, estimating point-win probabilities at the amateur level is challenging since point-level data is generally only available at the professional level. On the other hand, tennis rating systems are widely used and provide an estimate of the difference in ability between players, but a rigorous determination of handicap using rating systems is lacking. Therefore, our goal is to develop a mapping between the Universal Tennis Rating (UTR) system and the MDP-based handicaps, so that two amateur players can determine an appropriate handicap for their match based only on their UTRs. We first develop and validate an approach to extract server-independent point-win probabilities from match scores. Then, we show how to map server-independent point-win probabilities to server-specific point-win probabilities. Finally, we use the estimated probabilities to produce handicaps via the MDP model, which are regressed against UTR differences between pairs of players. We conclude with thoughts on how a handicap system could be implemented in practice.
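For intuition on why the point-win probability is a natural measure of skill difference, the following minimal sketch computes the probability of winning a single standard game from a point-win probability p. It assumes points are independent and identically distributed and ignores who is serving. This is the textbook closed form for a game with deuce, not the MDP handicap model or the estimation procedure described in this paper, and the function name `game_win_probability` is illustrative only.

```python
def game_win_probability(p: float) -> float:
    """Probability of winning a standard tennis game (with deuce) when each
    point is won independently with probability p.

    Assumption: i.i.d. points and no server effect (a simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p

    # Win before deuce: the winner takes the final point, and the loser's
    # 0, 1, or 2 points fall among the earlier points in C(3,0)=1, C(4,1)=4,
    # or C(5,2)=10 ways respectively.
    win_before_deuce = p**4 * (1.0 + 4.0 * q + 10.0 * q**2)

    # Reach deuce (3 points each): C(6,3) = 20 orderings.
    reach_deuce = 20.0 * p**3 * q**3

    # From deuce, win two points in a row before the opponent does.
    # The geometric series over repeated deuces sums to p^2 / (1 - 2pq).
    win_from_deuce = p**2 / (1.0 - 2.0 * p * q)

    return win_before_deuce + reach_deuce * win_from_deuce


if __name__ == "__main__":
    for p in (0.50, 0.55, 0.60):
        print(f"p = {p:.2f} -> game-win probability = {game_win_probability(p):.4f}")
```

Under these assumptions a small edge at the point level is amplified at the game level, and further still over sets and matches, which is why a handicap of a few points can matter.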

Appendix

A Additional figures

Figure 11: Posterior distributions of the Bayesian parameters α, β, and σ corresponding to the Bayesian model (Equation (2)). Red area denotes the 80% posterior interval and black curve denotes the 95% posterior interval. The distributions are unimodal and symmetric with low standard deviations.

Figure 12: Posterior distributions of the Bayesian parameters γ and τ corresponding to the Bayesian linear regression model (Equation (3)) when fitted to the combined data (3686 matches). Red area denotes the 80% posterior interval and black curve denotes the 95% posterior interval. The distributions are unimodal and symmetric with low standard deviations.

References

Bertsekas, D. P. and J. N. Tsitsiklis. 2002. Introduction to Probability (Vol. 1). Belmont, MA: Athena Scientific.

Carpenter, B., A. Gelman, M. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. A. Brubaker, J. Guo, P. Li, and A. Riddell. 2016. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 20:1–37. doi:10.18637/jss.v076.i01

Carter, Jr., W. H. and S. L. Crews. 1974. “An Analysis of the Game of Tennis.” The American Statistician 28:130–134.

Chan, T. C. and R. Singal. 2016. “A Markov Decision Process-Based Handicap System for Tennis.” Journal of Quantitative Analysis in Sports 12:179–188. doi:10.1515/jqas-2016-0057

Eisenhauer, J. G. 2003. “Regression Through the Origin.” Teaching Statistics 25:76–80. doi:10.1111/1467-9639.00136

Fischer, G. 1980. “Exercise in Probability and Statistics, or the Probability of Winning at Tennis.” American Journal of Physics 48:14–19. doi:10.1119/1.12241

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis, volume 2. Boca Raton, FL: CRC Press. doi:10.1201/b16018

Kemeny, J. G. and J. L. Snell. 1960. Finite Markov Chains, volume 356. Princeton, NJ: van Nostrand.

Klaassen, F. J. and J. R. Magnus. 2001. “Are Points in Tennis Independent and Identically Distributed? Evidence from a Dynamic Binary Panel Data Model.” Journal of the American Statistical Association 96:500–509. doi:10.1198/016214501753168217

Klaassen, F. J. and J. R. Magnus. 2003. “Forecasting the Winner of a Tennis Match.” European Journal of Operational Research 148:257–267. doi:10.1016/S0377-2217(02)00682-3

Liu, Y. 2001. “Random Walks in Tennis.” Missouri Journal of Mathematical Sciences 13(3):1–9. doi:10.35834/2001/1303154

Newton, P. K. and K. Aslam. 2009. “Monte Carlo Tennis: A Stochastic Markov Chain Model.” Journal of Quantitative Analysis in Sports 5(3): Article 7. doi:10.2202/1559-0410.1169

O’Malley, A. J. 2008. “Probability Formulas and Statistical Analysis in Tennis.” Journal of Quantitative Analysis in Sports 4(2): Article 15. doi:10.2202/1559-0410.1100

UTR. 2015. “Universal Tennis, Universal Tennis Rating System.” http://universaltennis.com. Accessed: 23-November-2015.

Published Online: 2018-08-21
Published in Print: 2018-09-25

©2018 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 24.4.2024 from https://www.degruyter.com/document/doi/10.1515/jqas-2017-0103/html