
The Effect of Prior Probability on Skill in Two-Group Discriminant Analysis

Quality and Quantity

Abstract

Although the weights in a discriminant function (linear or quadratic) are independent of the group prior probabilities, the performance of the classifier on both training and validation data depends sensitively on these often unknown probabilities. After a review of some defects of a popular measure of performance when the group sizes are naturally disproportionate, three alternative measures of performance (or association) are considered. The measures are shown to behave differently as functions of the group prior probability; consequently, the optimum choice of prior probability depends on the specific measure of performance. Among the measures considered, only two, the index of mean square contingency and the Heidke Skill Statistic, are found to be well defined when the group sizes are disparate, and they are therefore recommended. An empirical data set on delinquency among high school students is used to illustrate the findings.
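As a rough illustration of the two recommended measures, the sketch below computes them from a 2 × 2 classification table. It relies on the standard textbook definitions (mean square contingency as chi-square divided by sample size, and the usual Heidke formula), which the abstract does not spell out, so the formulas, the function name `skill_measures`, the cell labels `a`, `b`, `c`, `d`, and the example counts are all assumptions for illustration. The fraction of correct classifications is included only as a familiar point of comparison; the abstract does not name the "popular measure" it criticizes.

```python
import math


def skill_measures(a, b, c, d):
    """Scalar measures for a 2 x 2 classification table (hypothetical helper).

    a: group-1 cases classified as group 1
    b: group-2 cases classified as group 1
    c: group-1 cases classified as group 2
    d: group-2 cases classified as group 2
    """
    n = a + b + c + d
    det = a * d - b * c                  # determinant of the table
    row1, row2 = a + b, c + d            # totals by predicted group
    col1, col2 = a + c, b + d            # totals by actual group

    # Index of mean square contingency: chi-square / n for a 2 x 2 table.
    margins = row1 * row2 * col1 * col2
    phi_squared = det ** 2 / margins if margins else math.nan

    # Heidke skill: improvement over the agreement expected by chance.
    hss_den = col1 * row2 + row1 * col2
    heidke = 2 * det / hss_den if hss_den else math.nan

    return {
        "fraction_correct": (a + d) / n,
        "phi_squared": phi_squared,
        "heidke": heidke,
    }


# Made-up counts: 10 cases in the rare group, 990 in the common group,
# with a classifier that assigns almost everything to the common group.
skewed = skill_measures(a=1, b=1, c=9, d=989)
perfect = skill_measures(a=10, b=0, c=0, d=990)
```

With these invented counts, the fraction correct stays at 0.99 even though nine of the ten rare-group cases are misclassified, while both the mean square contingency and the Heidke value remain far below 1. Changing the assumed prior probability shifts only the classification threshold, which moves counts between the cells of the table and so changes each measure differently.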




Cite this article

Paik, H. The Effect of Prior Probability on Skill in Two-Group Discriminant Analysis. Quality & Quantity 32, 201–211 (1998). https://doi.org/10.1023/A:1004359127048


  • DOI: https://doi.org/10.1023/A:1004359127048
