Summary
Empirical observational results are analyzed with statistical methods in order to test relationships and differences and to interpret them appropriately. The statistical-methodological analysis strategy corresponds to a formal language that has to be learned in order to communicate with other researchers. Like other fields of knowledge, however, this language changes as new insights emerge, and with it the recommendations on the optimal statistical methods to use also change over time. This short discussion article takes up this idea and gives recommendations on five central topics: (a) directionality of hypotheses, (b) confidence intervals, (c) effect sizes, (d) confidence intervals of effect sizes, and (e) practical meaningfulness.
Abstract
In general, empirical results are analyzed using statistical methods to examine and discuss differences and interrelationships. Statistical methodological evaluation and the underlying reporting strategy can be described as a technical language that has to be learnt for successful communication between researchers, authors and reviewers; however, empirical science is a constantly changing environment, which is why the statistical methods applied have to be refined to keep pace with new empirical approaches. This line of thought is discussed in this article and five essential recommendations are presented: (a) directionality of hypotheses, (b) confidence intervals, (c) effect sizes, (d) confidence intervals of effect sizes and (e) practical meaningfulness.
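Recommendations (c) and (d), reporting an effect size together with its confidence interval rather than a bare p value, can be illustrated with a minimal sketch. This example is not from the article: the sample data are invented, and the interval uses a common large-sample approximation for the standard error of Cohen's d rather than the exact noncentral-t method (cf. Steiger, 2004).

```python
# Illustrative sketch: report a point estimate (Cohen's d) together with an
# approximate 95% confidence interval for two independent samples.
import math
import statistics

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * statistics.variance(a)
                           + (nb - 1) * statistics.variance(b))
                          / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

def d_confidence_interval(d, na, nb, z=1.96):
    """Approximate 95% CI for d via the large-sample standard error."""
    se = math.sqrt((na + nb) / (na * nb) + d * d / (2 * (na + nb)))
    return d - z * se, d + z * se

# Hypothetical scores of two training groups (made up for illustration).
group_a = [24, 27, 29, 31, 33, 35, 36, 38]
group_b = [21, 23, 25, 26, 28, 30, 31, 33]

d = cohens_d(group_a, group_b)
lo, hi = d_confidence_interval(d, len(group_a), len(group_b))
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A wide interval that barely excludes (or includes) zero conveys the uncertainty of the estimate far better than an asterisk next to a p value.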
Ethics declarations
Conflict of interest
D. Büsch and B. Strauß declare that they have no conflict of interest.
Cite this article
Büsch, D., Strauß, B. Wider die „Sternchenkunde“!. Sportwiss 46, 53–59 (2016). https://doi.org/10.1007/s12662-015-0376-x