Abstract
This research provides an example of testing for differential item functioning (DIF) using multiple indicator multiple cause (MIMIC) structural equation models. True/False items on five scales of the Schedule for Nonadaptive and Adaptive Personality (SNAP) were tested for uniform DIF in a sample of Air Force recruits, with groups defined by gender and ethnicity. Uniform DIF exists when an item is more easily endorsed for one group than the other, controlling for group mean differences on the variable under study. Results revealed significant DIF for many SNAP items, and some effects were quite large. Differentially functioning items can produce measurement bias and should be either deleted or modeled as if separate items were administered to different groups. Future research should aim to determine whether the DIF observed here holds for other samples.
Notes
Uniform DIF occurs when item thresholds differ between groups: An item is more easily endorsed for one group than the other. DIF is nonuniform if item discrimination also differs between groups; thus, the group difference depends on the level of the latent variable.
Because each scale is evaluated for DIF separately from the other scales, it is not problematic for a SNAP item to be included on more than one scale.
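The distinction between uniform and nonuniform DIF drawn in the first note can be illustrated with a small numerical sketch. It is not the MIMIC procedure used in this research; it assumes a two-parameter logistic item response function, and the function names and parameter values are hypothetical, chosen only to make the pattern visible. With equal discrimination and different thresholds, the gap in endorsement probability between groups keeps the same sign at every level of the latent variable. When discrimination also differs, the gap changes sign.

```python
import math

def endorse_prob(theta, discrimination, threshold):
    """Two-parameter logistic probability of endorsing a True/False item."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - threshold)))

def group_gap(theta, ref_params, focal_params):
    """Reference-minus-focal endorsement probability at one latent level."""
    return endorse_prob(theta, *ref_params) - endorse_prob(theta, *focal_params)

# Hypothetical (discrimination, threshold) pairs, for illustration only.
reference = (1.5, 0.0)
focal_uniform = (1.5, 0.5)     # same discrimination, higher threshold
focal_nonuniform = (0.7, 0.5)  # discrimination differs as well

thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
uniform_gaps = [group_gap(t, reference, focal_uniform) for t in thetas]
nonuniform_gaps = [group_gap(t, reference, focal_nonuniform) for t in thetas]

for t, u, n in zip(thetas, uniform_gaps, nonuniform_gaps):
    print(f"theta={t:+.1f}  uniform gap={u:+.3f}  nonuniform gap={n:+.3f}")
```

In the uniform case every printed gap is positive, so the item is easier to endorse for the reference group across the whole range of the latent variable. In the nonuniform case the gap is negative at low latent levels and positive at high ones, which is why the group difference there depends on the level of the latent variable.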
Cite this article
Woods, C.M., Oltmanns, T.F. & Turkheimer, E. Illustration of MIMIC-Model DIF Testing with the Schedule for Nonadaptive and Adaptive Personality. J Psychopathol Behav Assess 31, 320–330 (2009). https://doi.org/10.1007/s10862-008-9118-9