
Bayesian Model Assessment for Jointly Modeling Multidimensional Response Data with Application to Computerized Testing

  • Theory and Methods

Psychometrika

Abstract

Computerized assessment provides rich multidimensional data, including trial-by-trial accuracy and response time (RT) measures. A key question in modeling such data is how to incorporate RT, for example, to aid ability estimation in item response theory (IRT) models. To address this, we propose a joint model consisting of a two-parameter IRT model for the dichotomous item response data, a log-normal model for the continuous RT data, and a normal model for the corresponding paper-and-pencil scores. We then reformulate and reparameterize the model to capture the relationships among the model parameters, to facilitate prior specification, and to make the Bayesian computation more efficient. Further, we propose several new model assessment criteria based on decompositions of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML). The proposed criteria can quantify the improvement in the fit of one part of the multidimensional data given the other parts. Finally, we conduct several simulation studies to examine the empirical performance of the proposed criteria and illustrate their application using a real dataset from a computerized educational assessment program.


Fig. 1
Fig. 2



Acknowledgements

We would like to thank the Executive Editor, the Associate Editor, and three reviewers for their valuable suggestions and comments, which have led to a much improved version of the paper. This work was done while Dr. Liu was a visiting scholar at the University of Connecticut. This research was partially supported by the University of California Office of the President and the University of Connecticut InCHIP seed grant. Dr. Wang's research was also partially supported by the US National Science Foundation under Grant No. 1848451.

Author information

Corresponding author

Correspondence to Xiaojing Wang.

Supplementary Information

Electronic supplementary material: Supplementary material 1 (PDF, 1631 KB).

About this article

Cite this article

Liu, F., Wang, X., Hancock, R. et al. Bayesian Model Assessment for Jointly Modeling Multidimensional Response Data with Application to Computerized Testing. Psychometrika 87, 1290–1317 (2022). https://doi.org/10.1007/s11336-022-09845-x

  • DOI: https://doi.org/10.1007/s11336-022-09845-x
