
Archetypoid analysis for sports analytics

Data Mining and Knowledge Discovery

Abstract

We aim to make sense of the growing amount of sports performance data by finding extreme data points, which makes human interpretation easier. In archetypoid analysis, each datum is expressed as a mixture of actual observations (archetypoids). It therefore allows us to identify not only extreme athletes and teams, but also the composition of other athletes (or teams) in terms of the archetypoid athletes, and to establish a ranking. The utility of archetypoids in sports is illustrated with basketball and soccer data in three scenarios. In the first, with multivariate data, archetypoids are compared with alternative methods and give the best results. In the second, although functional data (time series or trajectories) are common in sports, functional data analysis has not been exploited until now because of the sparseness of the functions; we extend archetypoid analysis to sparse functional data, which also shows the potential of functional data analysis in sports analytics. In the third, features are not available, so we use proximities, and we extend archetypoid analysis to data with asymmetric relations. This study provides valuable knowledge about player, team, and league performance, which can be used to analyze athletes' careers.
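To make the phrase "expressed as a mixture of actual observations" concrete, here is a minimal sketch in Python. It assumes the archetypoids have already been chosen and only computes the mixture weights for one new observation: non-negative coefficients that sum to one and minimize the squared reconstruction error. The function names, the toy data, and the projected-gradient solver are all illustrative assumptions, not the procedure used in the paper.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_j >= 0, sum_j a_j = 1}."""
    theta, running_sum = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        running_sum += u_k
        t = (running_sum - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the last k that passes
    return [max(x - theta, 0.0) for x in v]


def mixture_weights(x, archetypoids, n_iter=2000):
    """Weights alpha minimizing ||x - sum_j alpha_j * z_j||^2 on the simplex."""
    k = len(archetypoids)
    # Step 1/L, where L = 2 * (squared Frobenius norm) bounds the gradient's
    # Lipschitz constant; the fallback avoids dividing by zero.
    frob_sq = sum(c * c for z in archetypoids for c in z) or 1.0
    step = 1.0 / (2.0 * frob_sq)
    alpha = [1.0 / k] * k  # start from the uniform mixture
    for _ in range(n_iter):
        fitted = [sum(a * z[d] for a, z in zip(alpha, archetypoids))
                  for d in range(len(x))]
        residual = [xd - fd for xd, fd in zip(x, fitted)]
        grad = [-2.0 * sum(r * zd for r, zd in zip(residual, z))
                for z in archetypoids]
        alpha = project_to_simplex([a - step * g for a, g in zip(alpha, grad)])
    return alpha


# Three hypothetical "archetypoid" players described by two made-up features.
archetypoids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = mixture_weights((0.2, 0.3), archetypoids)
print([round(w, 3) for w in weights])
```

For this toy input the weights come out close to 0.5, 0.2 and 0.3, so the observation reads as half the first archetypoid, a fifth of the second, and three tenths of the third. A point outside the convex hull of the archetypoids is mapped to its nearest point in the hull, so its weights concentrate on the closest extreme cases.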


Notes

  1. http://bayes2.ucd.ie:3838/gvinue/AppBasketball.

  2. library(shiny); runUrl('http://www.uv.es/vivigui/softw/AppPlayers.zip')

  3. All data were downloaded from http://www.basketball-reference.com/play-index/pgl_finder.cgi?lid=front_pi.

  4. Obtained from www.linguasport.com/futbol/nacional/liga/Liga_15.htm.


Acknowledgements

The authors would like to thank the Editors and three reviewers for their very constructive suggestions, which have led to improvements in the manuscript.

Author information

Corresponding author

Correspondence to G. Vinué.

Additional information

Responsible editors: A. Zimmermann and U. Brefeld.

This work has been partially supported by Grant DPI2013-47279-C2-1-R. The databases and R code (including the web application) to reproduce the results can be freely accessed at www.uv.es/vivigui/software.

Cite this article

Vinué, G., Epifanio, I. Archetypoid analysis for sports analytics. Data Min Knowl Disc 31, 1643–1677 (2017). https://doi.org/10.1007/s10618-017-0514-1
