Abstract
This paper addresses the role of social values in educational and psychological measurement, with special attention to the consequences of testing as validity evidence, which is an inherently value-dependent enterprise. The primary measurement standards that must be met to legitimize a proposed test use are those of reliability, validity, and fairness, which are also value-laden concepts. Evidence of reliability signifies that something is being measured; the major concern is score consistency or stability. Evidence of validity circumscribes the nature of that something; the major concern is score meaning. Evidence of fairness indicates that score meaning does not differ consequentially across individuals, groups, or settings; the major concern is comparability.
Reprinted by permission of Educational Testing Service, the copyright holder.
Copyright information
© 2000 Springer Science+Business Media New York
About this chapter
Cite this chapter
Messick, S. (2000). Consequences of Test Interpretation and Use: The Fusion of Validity and Values in Psychological Assessment. In: Goffin, R.D., Helmes, E. (eds) Problems and Solutions in Human Assessment. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-4397-8_1
DOI: https://doi.org/10.1007/978-1-4615-4397-8_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-6978-3
Online ISBN: 978-1-4615-4397-8
eBook Packages: Springer Book Archive