Abstract
Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike classical test theory, IRT models aggregate item-level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption that is unlikely to hold in practice, especially when the number of items is large. Results in the literature and the simulation studies in this paper show that wrongly assuming local independence can lead to inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach that adds a graphical component to a multidimensional IRT model in order to offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for parameter estimation and for learning the structure of the local dependence. This approach can substantially improve measurement even when no prior information on the local dependence structure is available. The model can be applied to measure either a unidimensional latent trait or multidimensional latent traits.
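To make the two components concrete, the following is a minimal sketch rather than the paper's specification. The abstract gives no formula, so the Ising-type pairwise term and the names `A` (loadings), `b` (intercepts), `S` (symmetric pairwise dependence matrix), `conditional_item_prob`, and `soft_threshold` are all assumptions made for illustration. The sketch computes the probability of endorsing one item given the latent trait and the other responses, and shows that with no pairwise terms it reduces to an ordinary multidimensional logistic IRT item. It also includes a soft-thresholding step of the kind proximal algorithms typically use with an L1 penalty; whether the paper uses this exact penalty is likewise an assumption.

```python
import numpy as np

def conditional_item_prob(j, x, theta, A, b, S):
    """P(X_j = 1 | other responses, theta) under an assumed fused model.

    Assumed joint form (illustrative only):
      P(x | theta) proportional to
      exp( sum_j (b_j + a_j' theta) x_j + sum_{i<j} s_ij x_i x_j )
    """
    others = np.delete(np.arange(len(x)), j)
    # Latent component: b_j + a_j' theta. Graphical component: sum_i s_ji x_i.
    logit = b[j] + A[j] @ theta + S[j, others] @ x[others]
    return 1.0 / (1.0 + np.exp(-logit))

def soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_1 (elementwise shrinkage toward 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Hypothetical example: three items, one latent trait.
A = np.array([[1.0], [1.2], [0.8]])   # loadings (assumed values)
b = np.array([0.0, -0.5, 0.3])        # intercepts (assumed values)
S = np.zeros((3, 3))
S[0, 1] = S[1, 0] = 0.9               # items 0 and 1 are locally dependent
theta = np.array([0.5])
x = np.array([1, 1, 0])

# With the dependence edge, endorsing item 1 raises the probability for item 0.
p_dep = conditional_item_prob(0, x, theta, A, b, S)
# With S = 0 the model is a plain logistic IRT item (local independence).
p_indep = conditional_item_prob(0, x, theta, A, b, np.zeros((3, 3)))

# Shrinkage sets small pairwise parameters exactly to zero, which is how an
# L1 penalty would select a sparse dependence structure.
shrunk = soft_threshold(np.array([0.9, -0.05, 0.2]), 0.1)
```

In this toy setup a zero entry in `S` means the corresponding item pair is conditionally independent given the latent trait, so a sparse `S` corresponds to a small number of locally dependent pairs.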
Notes
An R package and example code for the proposed approach can be downloaded from http://www.scientifichpc.com/flagirt.html.
Acknowledgements
This research was funded by NSF grant DMS-1712657, NSF grant SES-1323977, NSF grant IIS-1633360, Army Research Office grant W911NF-15-1-0159, and NIH grant R01GM047845.
Cite this article
Chen, Y., Li, X., Liu, J. et al. Robust Measurement via A Fused Latent and Graphical Item Response Theory Model. Psychometrika 83, 538–562 (2018). https://doi.org/10.1007/s11336-018-9610-4
DOI: https://doi.org/10.1007/s11336-018-9610-4