Abstract
We aim to make sense of the growing volume of sports performance data by identifying extreme data points, which are easier for humans to interpret. In archetypoid analysis, each datum is expressed as a mixture of actual observations (archetypoids). This allows us not only to identify extreme athletes and teams, but also to describe every other athlete (or team) as a composition of the archetypoid athletes, and to establish a ranking. The utility of archetypoids in sports is illustrated with basketball and soccer data in three scenarios. In the first, archetypoids are applied to multivariate data, where they are compared with alternative methods and give the best results. In the second, we turn to functional data: although functional data (time series or trajectories) are common in sports, functional data analysis has not been exploited until now because the functions are sparsely observed. We extend archetypoid analysis to sparse functional data, which also shows the potential of functional data analysis in sports analytics. In the third scenario, features are not available, so we use proximities, and we extend archetypoid analysis to data with asymmetric relations. This study yields valuable knowledge about player, team and league performance, and makes it possible to analyze athletes' careers.
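To make the phrase "a mixture of actual observations" concrete, the following is a sketch of the usual optimization problem behind archetypoid analysis. The notation is an assumption introduced here, not taken from the abstract: $x_i$ are the $n$ observations, $k$ is the number of archetypoids, $\alpha_{ij}$ are the mixture coefficients and $\beta_{jl}$ select which observations act as archetypoids.

```latex
\begin{aligned}
\min_{\alpha,\,\beta}\quad
  & \sum_{i=1}^{n} \Bigl\| x_i - \sum_{j=1}^{k} \alpha_{ij}\, z_j \Bigr\|^2,
  \qquad z_j = \sum_{l=1}^{n} \beta_{jl}\, x_l, \\
\text{subject to}\quad
  & \alpha_{ij} \ge 0, \quad \sum_{j=1}^{k} \alpha_{ij} = 1
    \quad \text{for } i = 1,\dots,n, \\
  & \beta_{jl} \in \{0, 1\}, \quad \sum_{l=1}^{n} \beta_{jl} = 1
    \quad \text{for } j = 1,\dots,k.
\end{aligned}
```

Under these constraints each $z_j$ coincides with exactly one observed $x_l$, so every archetypoid is a real athlete or team, and each row $(\alpha_{i1},\dots,\alpha_{ik})$ gives the proportions in which observation $i$ is approximated by the archetypoids.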
Notes
library(shiny); runUrl('http://www.uv.es/vivigui/softw/AppPlayers.zip')
All data was downloaded from: http://www.basketball-reference.com/play-index/pgl_finder.cgi?lid=front_pi.
Obtained from www.linguasport.com/futbol/nacional/liga/Liga_15.htm.
Acknowledgements
The authors would like to thank the Editors and three reviewers for their very constructive suggestions, which have led to improvements in the manuscript.
Additional information
Responsible editors: A. Zimmermann and U. Brefeld.
This work has been partially supported by Grant DPI2013-47279-C2-1-R. The databases and R code (including the web application) to reproduce the results can be freely accessed at www.uv.es/vivigui/software.
Cite this article
Vinué, G., Epifanio, I. Archetypoid analysis for sports analytics. Data Min Knowl Disc 31, 1643–1677 (2017). https://doi.org/10.1007/s10618-017-0514-1
DOI: https://doi.org/10.1007/s10618-017-0514-1