Abstract
Diagnostic models (DMs) provide researchers and practitioners with tools to classify respondents into substantively relevant classes. DMs are widely applied to binary response data; however, binary response models are not applicable to the wealth of ordinal data collected by educational, psychological, and behavioral researchers. Prior research developed confirmatory ordinal DMs that require expert knowledge to specify the underlying structure. This paper introduces an exploratory DM for ordinal data that uses a cumulative probit link along with Bayesian variable selection techniques to uncover the latent structure. Furthermore, we discuss new identifiability conditions for structured multinomial mixture models with binary attributes. A Monte Carlo simulation study provides evidence of accurate parameter recovery across moderate to large sample sizes. We apply the model to twelve items from the approaches to learning and self-description questionnaire of the public-use Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999, and report evidence supporting a three-attribute solution with eight classes to describe the latent structure underlying the teacher and parent ratings. In short, the methodology contributes to the development of ordinal DMs and broadens their applicability to theoretical and substantive issues across the social sciences.
References
Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational and Behavioral Statistics, 17(3), 251–269.
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669–679.
Allman, E. S., Matias, C., & Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. Annals of Statistics, 37, 3099–3132.
Bao, J., & Hanson, T. E. (2015). Bayesian nonparametric multivariate ordinal regression. Canadian Journal of Statistics, 43(3), 337–357.
Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–561.
Chen, J., & de la Torre, J. (2013). A general cognitive diagnosis model for expert-defined polytomous attributes. Applied Psychological Measurement, 37(6), 419–437.
Chen, J., & de la Torre, J. (2018). Introducing the general polytomous diagnosis modeling framework. Frontiers in Psychology, 9, 1–9.
Chen, Y., & Culpepper, S. A. (2018). A multivariate probit model for learning trajectories with application to classroom assessment. In Paper presentation at the international meeting of the psychometric society, New York.
Chen, Y., Culpepper, S. A., Chen, Y., & Douglas, J. (2018). Bayesian estimation of the DINA Q-matrix. Psychometrika, 83, 89–108.
Chen, Y., Culpepper, S. A., & Liang, F. (2018). Beyond the Q-matrix: A general approach to cognitive diagnostic models. In Paper presentation at the international meeting of the psychometric society, New York.
Chen, Y., Culpepper, S. A., Wang, S., & Douglas, J. A. (2018). A hidden Markov model for learning trajectories in cognitive diagnosis with application to spatial rotation skills. Applied Psychological Measurement, 42, 5–23.
Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850–866.
Cowles, M. K. (1996). Accelerating Monte Carlo Markov chain convergence for cumulative-link generalized linear models. Statistics and Computing, 6(2), 101–111.
Culpepper, S. A. (2015). Bayesian estimation of the DINA model with Gibbs sampling. Journal of Educational and Behavioral Statistics, 40(5), 454–476.
Culpepper, S. A. (2016). Revisiting the 4-parameter item response model: Bayesian estimation and application. Psychometrika, 81(4), 1142–1163.
Culpepper, S. A. (2019). Estimating the cognitive diagnosis Q matrix with expert knowledge: Application to the fraction-subtraction dataset. Psychometrika, 84, 333–357. https://doi.org/10.1007/s11336-018-9643-8.
Culpepper, S. A., & Chen, Y. (2018). Development and application of an exploratory reduced reparameterized unified model. Journal of Educational and Behavioral Statistics, 44, 3–24.
DeCarlo, L. T. (2011). On the analysis of fraction subtraction data: The DINA model, classification, latent class sizes, and the Q-matrix. Applied Psychological Measurement, 35(1), 8–26.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.
de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353.
de la Torre, J., & Douglas, J. A. (2008). Model evaluation and multiple strategies in cognitive diagnosis: An analysis of fraction subtraction data. Psychometrika, 73(4), 595–624.
DeYoreo, M., & Kottas, A. (2018). Bayesian nonparametric modeling for multivariate ordinal regression. Journal of Computational and Graphical Statistics, 27(1), 71–84.
DeYoreo, M., Reiter, J. P., & Hillygus, D. S. (2017). Bayesian mixture models with focused clustering for mixed ordinal and nominal data. Bayesian Analysis, 12(3), 679–703.
Fang, G., Liu, J., & Ying, Z. (2019). On the identifiability of diagnostic classification models. Psychometrika, 84, 19–40.
Green, B. F. (1951). A general solution for the latent class model of latent structure analysis. Psychometrika, 16(2), 151–166.
Haberman, S. J., von Davier, M., & Lee, Y.-H. (2008). Comparison of multidimensional item response models: Multivariate normal ability distributions versus multivariate polytomous ability distributions. ETS Research Report Series, 2008(2), 1–25.
Henson, R. A., & Templin, J. (2007). Importance of Q-matrix construction and its effects cognitive diagnosis model results. In Annual meeting of the national council on measurement in education, Chicago, IL.
Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210.
Hoijtink, H., & Molenaar, I. W. (1997). A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62(2), 171–189.
Jain, S., & Neal, R. M. (2004). A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13(1), 158–182.
Karelitz, T. M. (2004). Ordered category attribute coding framework for cognitive assessments. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Kaya, Y., & Leite, W. L. (2017). Assessing change in latent skills across time with longitudinal cognitive diagnosis modeling: An evaluation of model performance. Educational and Psychological Measurement, 77(3), 369–388.
Kottas, A., Müller, P., & Quintana, F. (2005). Nonparametric Bayesian modeling for multivariate ordinal data. Journal of Computational and Graphical Statistics, 14(3), 610–625.
Kruskal, J. B. (1976). More factors than subjects, tests and treatments: An indeterminacy theorem for canonical decomposition and individual differences scaling. Psychometrika, 41(3), 281–293.
Kruskal, J. B. (1977). Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and Its Applications, 18(2), 95–138.
Li, F., Cohen, A., Bottge, B., & Templin, J. (2016). A latent transition analysis model for assessing change in cognitive skills. Educational and Psychological Measurement, 76(2), 181–204.
Liu, J., Xu, G., & Ying, Z. (2013). Theory of the self-learning Q-matrix. Bernoulli, 19(5A), 1790–1817.
Liu, R., & Jiang, Z. (2018). Diagnostic classification models for ordinal item responses. Frontiers in Psychology, 9, 1–12.
Ma, W., & de la Torre, J. (2016). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology, 69(3), 253–275.
Ma, W., & de la Torre, J. (2019). An empirical Q-matrix validation method for the sequential generalized DINA model. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12156.
Madison, M. J., & Bradshaw, L. P. (2018). Assessing growth in a diagnostic classification model framework. Psychometrika, 83, 963–990.
McDonald, R. P. (1962). A note on the derivation of the general latent class model. Psychometrika, 27(2), 203–206.
Proctor, C. H. (1970). A probabilistic formulation and statistical analysis of Guttman scaling. Psychometrika, 35(1), 73–78.
Rost, J. (1988). Rating scale analysis with latent class models. Psychometrika, 53(3), 327–348.
Rupp, A. A., & Templin, J. L. (2008). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68(1), 78–96.
Shute, V. J., Hansen, E. G., & Almond, R. G. (2008). You can’t fatten a hog by weighing it-or can you? Evaluating an assessment for learning system called ACED. International Journal of Artificial Intelligence in Education, 18(4), 289–316.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298–321.
Templin, J. L. (2004). Generalized linear mixed proficiency models. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287.
Templin, J. L., Henson, R. A., Templin, S. E., & Roussos, L. (2008). Robustness of hierarchical modeling of skill association in cognitive diagnosis models. Applied Psychological Measurement, 32, 559–574.
Tourangeau, K., Nord, C., Lê, T., Sorongon, A., Hagedorn, M., Daly, P., & Najarian, M. (2015). Early childhood longitudinal study, kindergarten class of 2010–2011 (ECLS-K:2011), user’s manual for the ECLS-K:2011 kindergarten data file and electronic codebook, public version (NCES 2015-074). U.S. Department of Education. Washington, DC: National Center for Education Statistics. https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2010070. Accessed 19 Apr 2018.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.
von Davier, M. (2009). Some notes on the reinvention of latent structure models as diagnostic classification models. Measurement: Interdisciplinary Research and Perspectives, 7, 67–74.
Wang, S., Yang, Y., Culpepper, S. A., & Douglas, J. (2017). Tracking skill acquisition with cognitive diagnosis models: A higher-order hidden Markov model with covariates. Journal of Educational and Behavioral Statistics, 43(1), 57–87.
Xu, G. (2017). Identifiability of restricted latent class models with binary responses. Annals of Statistics, 45(2), 675–707.
Xu, G., & Shang, Z. (2018). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association, 113(523), 1284–1295.
Ye, S., Fellouris, G., Culpepper, S. A., & Douglas, J. (2016). Sequential detection of learning in cognitive diagnosis. British Journal of Mathematical and Statistical Psychology, 69(2), 139–158.
Acknowledgements
This research was partially supported by National Science Foundation Methodology, Measurement, and Statistics Program Grants 1632023 and 1758631 and Spencer Foundation Grant 201700062. The manuscript benefited from the comments of the Editor, the Associate Editor, three blind reviewers, and Jeff Douglas. Any remaining shortcomings belong to the author.
Appendix: Gibbs Sampling Algorithm and Full Conditional Distributions
This section discusses the full conditional distributions used to approximate the posterior distribution of the ordinal diagnostic model parameters with Gibbs sampling. For iteration \(t=1,\ldots , T\) we sample:
1. For \(i=1,\ldots ,n\),

   (a) Sample \({\varvec{\alpha }}_i^{(t)}\) from the multinomial full conditional distribution \({\varvec{\alpha }}_i^{(t)}|{\varvec{Y}}_i,\mathbf{B}^{(t-1)},{\varvec{\alpha }}_1^{(t)},\ldots ,{\varvec{\alpha }}_{i-1}^{(t)},{\varvec{\alpha }}_{i+1}^{(t-1)},\ldots ,{\varvec{\alpha }}_{n}^{(t-1)}\), where the conditional probability that \({\varvec{\alpha }}_i^{(t)}\) is classified as profile \(c\) is

   $$\begin{aligned} P({\varvec{\alpha }}_i^{(t)\top }{\varvec{v}}&=c|{\varvec{Y}}_i,\mathbf{B}^{(t-1)},{\varvec{\alpha }}_1^{(t)},\ldots ,{\varvec{\alpha }}_{i-1}^{(t)},{\varvec{\alpha }}_{i+1}^{(t-1)},\ldots ,{\varvec{\alpha }}_{n}^{(t-1)}) \\&=\frac{(n_{ci}+n_{c0})\prod _{j=1}^J \theta _{jc,y_{ij}}^{(t-1)} }{\sum _{c'=0}^{2^K-1} (n_{c'i}+n_{c'0}) \prod _{j=1}^J \theta _{jc',y_{ij}}^{(t-1)} } \end{aligned} \tag{A1}$$

   where \(\theta _{jc,y_{ij}}^{(t-1)}=\Phi \left( \tau _{jc,y_{ij}+1}-{\varvec{a}}_c^\top {\varvec{\beta }}_j^{(t-1)}\right) -\Phi \left( \tau _{jc,y_{ij}}-{\varvec{a}}_c^\top {\varvec{\beta }}_j^{(t-1)}\right) \). Notice that we integrate \({\varvec{\pi }}\) out of the prior distribution \(p(\mathbf{A},{\varvec{\pi }})=p({\varvec{\alpha }}_1,\ldots ,{\varvec{\alpha }}_n|{\varvec{\pi }})p({\varvec{\pi }})\) and instead use the conditional prior distribution \(p({\varvec{\alpha }}_i|{\varvec{\alpha }}_1^{(t)},\ldots ,{\varvec{\alpha }}_{i-1}^{(t)},{\varvec{\alpha }}_{i+1}^{(t-1)},\ldots ,{\varvec{\alpha }}_{n}^{(t-1)})\). As a result, the usual \(\pi _c\) (e.g., see Equation 7 of Culpepper, 2019) is replaced with \(n_{ci}+n_{c0}\), where \(n_{ci}\) is the number of respondents other than \(i\) that are classified in class \(c\) (e.g., see Jain & Neal, 2004) and \(n_{c0}\) is the prior Dirichlet parameter (note \(n_{c0}=1\) for a uniform prior).

   (b) For \(j=1,\ldots ,J\), update the latent augmented data from the full conditional distribution

   $$\begin{aligned} Y_{ij}^{*(t)}|Y_{ij},{\varvec{\alpha }}_i^{(t)},{\varvec{\beta }}_j^{(t-1)}\sim \mathcal {N}({\varvec{a}}_i^{(t)\top }{\varvec{\beta }}_j^{(t-1)},1)\, \mathcal {I}(\tau _{jc,y_{ij}}<Y_{ij}^{*(t)}<\tau _{jc,y_{ij}+1}) \end{aligned} \tag{A2}$$

   where \(\tau _{jc,y_{ij}}\) and \(\tau _{jc,y_{ij}+1}\) are the lower and upper thresholds for the observed value of \(Y_{ij}\) for class \(c\) and item \(j\). Recall that we follow previously discussed strategies and fix the thresholds as \({\varvec{\tau }}_j=(0,2,\ldots , 2(M_j-2))^\top \).

2. Update the latent class probabilities (i.e., the mixing weights) as \({\varvec{\pi }}^{(t)}|\mathbf{A}^{(t)}\) from the Dirichlet full conditional distribution (e.g., see Culpepper, 2015).

3. For \(j=1,\ldots ,J\),

   (a) For \(k=1,\ldots ,K\), sample \(q_{jk}^{(t)}\) from the Bernoulli full conditional distribution \(q_{jk}^{(t)}|{\varvec{\beta }}_j^{(t-1)}, q_{j1}^{(t)},\ldots ,q_{j,k-1}^{(t)},q_{j,k+1}^{(t-1)},\ldots ,q_{jK}^{(t-1)},\omega ^{(t-1)}\) (e.g., see Culpepper, 2019).

   (b) For \(p=1,\ldots ,P\), sample \(\beta _{jp}^{(t)}\) from the truncated normal full conditional distribution \(\beta _{jp}^{(t)}|Y_{1j}^{*(t)},\ldots ,Y_{nj}^{*(t)},\mathbf{A}^{(t)},\beta _{j1}^{(t)},\ldots ,\beta _{j,p-1}^{(t)},\beta _{j,p+1}^{(t-1)},\ldots ,\beta _{j,P+1}^{(t-1)}, {\varvec{q}}_j^{(t)}\) (e.g., see Culpepper, 2019).

4. Sample \(\omega ^{(t)}\) from the Beta full conditional distribution \(\omega ^{(t)}|\mathbf{Q}^{(t)}\) (e.g., see Culpepper, 2019).
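To make the respondent-level updates concrete, the following is a minimal Python sketch of steps 1(a), 1(b), and 2 only. It is an illustration under stated assumptions, not the author's implementation. All function names are hypothetical. The linear predictors \({\varvec{a}}_c^\top {\varvec{\beta }}_j\) are assumed to be precomputed in an array `eta[j, c]`, responses are assumed to be coded \(0,\ldots ,M-1\) with the same \(M\) for every item, and the fixed thresholds are assumed to be padded with \(-\infty \) and \(+\infty \) for the lowest and highest categories. The truncated normal draw uses the inverse-CDF method, and step 2 uses the standard conjugate Dirichlet update with parameters \(n_{c0}+n_c\); the appendix itself only refers to Culpepper (2015) for that step.

```python
from statistics import NormalDist
import numpy as np

_STD = NormalDist()
norm_cdf = np.vectorize(_STD.cdf, otypes=[float])      # elementwise Phi
norm_ppf = np.vectorize(_STD.inv_cdf, otypes=[float])  # elementwise inverse of Phi


def full_thresholds(M):
    """Fixed thresholds (0, 2, ..., 2(M-2)), padded with -inf/+inf (assumed)."""
    inner = 2.0 * np.arange(M - 1)
    return np.concatenate(([-np.inf], inner, [np.inf]))


def category_probs(eta, tau):
    """theta[j, c, m] = Phi(tau[m+1] - eta[j, c]) - Phi(tau[m] - eta[j, c])."""
    cdf = norm_cdf(tau[None, None, :] - eta[:, :, None])
    return np.diff(cdf, axis=2)


def sample_classes(Y, classes, theta, n_c0, rng):
    """Step 1(a): update each respondent's class in place using (A1)."""
    n, J = Y.shape
    C = theta.shape[1]
    counts = np.bincount(classes, minlength=C).astype(float)
    items = np.arange(J)
    for i in range(n):
        counts[classes[i]] -= 1.0                 # n_ci excludes respondent i
        lik = theta[items, :, Y[i]].prod(axis=0)  # likelihood of Y_i per class
        weights = (counts + n_c0) * lik
        classes[i] = rng.choice(C, p=weights / weights.sum())
        counts[classes[i]] += 1.0
    return classes


def sample_augmented(Y, classes, eta, tau, rng):
    """Step 1(b): draw Y* from the truncated normal in (A2) by inverse CDF."""
    mean = eta[:, classes].T                      # (n, J) predictor per respondent
    p_lo = norm_cdf(tau[Y] - mean)
    p_hi = norm_cdf(tau[Y + 1] - mean)
    u = rng.uniform(p_lo, p_hi)
    u = np.clip(u, 1e-12, 1.0 - 1e-12)            # keeps the inverse CDF finite
    return mean + norm_ppf(u)


def sample_pi(classes, C, n_c0, rng):
    """Step 2: Dirichlet draw with parameters n_c0 + class counts (assumed form)."""
    counts = np.bincount(classes, minlength=C)
    return rng.dirichlet(n_c0 + counts)
```

In this sketch, `Y` is an integer array of shape `(n, J)`, `classes` is an integer array of current class labels in `0..C-1` with `C = 2**K`, and `rng` is a `numpy.random.default_rng()` generator. The item-level updates in steps 3 and 4 are not shown because their full conditionals are given in Culpepper (2019) rather than in this appendix.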
Cite this article
Culpepper, S.A. An Exploratory Diagnostic Model for Ordinal Responses with Binary Attributes: Identifiability and Estimation. Psychometrika 84, 921–940 (2019). https://doi.org/10.1007/s11336-019-09683-4
DOI: https://doi.org/10.1007/s11336-019-09683-4