Abstract
Fully Bayesian estimation of item response theory (IRT) models with logistic link functions suffers from low computational efficiency because the posterior density functions do not have known forms. To improve computational efficiency, this paper proposes a Bayesian estimation method that adopts a new data-augmentation strategy for uni- and multidimensional IRT models. The strategy is based on the Pólya–Gamma family of distributions, which provides closed-form posterior distributions for logistic-based models. This paper first reviews Pólya–Gamma distributions within a logistic regression framework. It then details how to derive the conditional distributions of the IRT models, how to incorporate Pólya–Gamma variables into those conditional distributions to construct Bayesian samplers, and how to draw from the resulting samplers so that faster convergence can be achieved. Simulation studies and applications to real datasets demonstrate the efficiency and utility of the proposed method.
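To make the strategy concrete before the formal derivations, the sketch below (a minimal illustration, not code from the paper) applies Pólya–Gamma augmentation to ordinary Bayesian logistic regression. It approximates PG(1, c) draws with a truncated version of the distribution's infinite gamma-sum representation; the function names, truncation length, and prior settings are all hypothetical choices for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

def rpg_approx(c, n_terms=200):
    """Approximate PG(1, c) draws via the truncated infinite-sum
    representation: w = (1 / (2*pi^2)) * sum_k g_k / ((k - 1/2)^2
    + c^2 / (4*pi^2)), with g_k ~ Gamma(1, 1)."""
    c = np.atleast_1d(np.abs(c)).astype(float)
    k = np.arange(1, n_terms + 1)[:, None]
    g = rng.gamma(1.0, 1.0, size=(n_terms, c.size))
    return (g / ((k - 0.5) ** 2 + (c / (2 * np.pi)) ** 2)).sum(axis=0) / (2 * np.pi ** 2)

def gibbs_logistic(X, y, n_iter=300, prior_var=100.0):
    """Two-block Gibbs sampler for logistic regression with a
    N(0, prior_var * I) prior on beta: alternate w | beta and
    beta | w, each of which has a closed-form conditional."""
    n, p = X.shape
    kappa = y - 0.5                         # k = y - n/2 with n = 1
    prior_prec = np.eye(p) / prior_var
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        w = rpg_approx(X @ beta)            # w_i | beta ~ PG(1, x_i' beta)
        V = np.linalg.inv(X.T @ (w[:, None] * X) + prior_prec)
        m = V @ (X.T @ kappa)               # beta | w ~ N(m, V)
        beta = rng.multivariate_normal(m, V)
        draws[t] = beta
    return draws

# Toy data generated with intercept -0.5 and slope 1.2.
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ [-0.5, 1.2])))).astype(float)
draws = gibbs_logistic(X, y)
print(draws[150:].mean(axis=0))  # posterior means; should sit near the generating values
```

No Metropolis step or tuning is needed: both conditionals are sampled exactly, which is the computational appeal of the augmentation.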
Appendix I
Proof
The conditional posterior of \(\boldsymbol{\beta}\), conditioning on a set of Pólya–Gamma random variables \(\mathbf{w} = \left(w_1, w_2, \ldots, w_N\right)\), is

\[ p\left(\boldsymbol{\beta} \mid \mathbf{w}, \mathbf{y}\right) \propto p\left(\boldsymbol{\beta}\right) \exp\left\{-\frac{1}{2}\left(\mathbf{z} - \mathbf{X}\boldsymbol{\beta}\right)^{T} \boldsymbol{\Omega} \left(\mathbf{z} - \mathbf{X}\boldsymbol{\beta}\right)\right\}, \]

where \(\mathbf{z} = \left(\frac{k_1}{w_1}, \frac{k_2}{w_2}, \ldots, \frac{k_N}{w_N}\right)\), \(\boldsymbol{\Omega} = \mathrm{diag}\left(w_1, w_2, \ldots, w_N\right)\), \(p\left(\boldsymbol{\beta}\right)\) is the prior for \(\boldsymbol{\beta}\), and \(\mathbf{X}\) is the predictor matrix consisting of known scalars. The response \(\mathbf{y}\) is reparametrized to \(\mathbf{k}\) by computing \(k = y - \frac{n}{2}\), where \(n = 1\) in a Bernoulli likelihood. Let \(\boldsymbol{\beta} = \theta_e\), \(\mathbf{X} = \left(\mathbf{a}, \mathbf{b}\right)\), and let \(p\left(\theta_e\right)\) be the prior for \(\theta_e\). Given that \(\mathbf{a}\), \(\mathbf{b}\), and \(\boldsymbol{\Omega}_e\) are constant, the conditional posterior distribution for \(\theta_e\) follows by substitution into the expression above.
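The Gaussian form of this kernel is not accidental: it follows from the integral identity that defines the Pólya–Gamma augmentation (Polson, Scott, & Windle, 2013). In the Bernoulli case (\(n = 1\)), for each observation with linear predictor \(\psi\),

\[ \frac{\left(e^{\psi}\right)^{y}}{1 + e^{\psi}} = \frac{1}{2}\, e^{k \psi} \int_0^{\infty} e^{-w \psi^2 / 2}\, p(w)\, dw, \qquad k = y - \tfrac{1}{2}, \quad w \sim \mathrm{PG}(1, 0), \]

so conditioning on \(w\) leaves a kernel that is quadratic in \(\psi\), and hence Gaussian in \(\boldsymbol{\beta}\).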
Let \(\mathbf{z}_e = \mathbf{z} + \mathbf{a} \circ \mathbf{b}\), where \(\circ\) denotes the elementwise product, so that \(\mathbf{z}_e = \left(\frac{a_1 b_1 w_{1e} + k_{1e}}{w_{1e}}, \ldots, \frac{a_I b_I w_{Ie} + k_{Ie}}{w_{Ie}}\right)\). The formula above can then be simplified as \(p\left(\theta_e \mid \mathbf{a}, \mathbf{b}, \mathbf{w}, \mathbf{y}_e\right) \propto p\left(\theta_e\right) \exp\left\{-\frac{1}{2}\left(\mathbf{z}_e - \mathbf{a}\theta_e\right)^{T} \boldsymbol{\Omega}_e \left(\mathbf{z}_e - \mathbf{a}\theta_e\right)\right\}\), where \(\boldsymbol{\Omega}_e = \mathrm{diag}\left(w_{1e}, \ldots, w_{Ie}\right)\) and \(\mathbf{k}_e = \mathbf{y}_e - \frac{1}{2}\). This result is given in Eq. 7.
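For example, assuming a standard normal prior \(p\left(\theta_e\right) = N(0, 1)\) (any normal prior works analogously), completing the square in this kernel yields the closed-form draw

\[ \theta_e \mid \mathbf{a}, \mathbf{b}, \mathbf{w}, \mathbf{y}_e \sim N\!\left(V\, \mathbf{a}^{T} \boldsymbol{\Omega}_e \mathbf{z}_e,\; V\right), \qquad V = \left(\mathbf{a}^{T} \boldsymbol{\Omega}_e \mathbf{a} + 1\right)^{-1}. \]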
Let \(\boldsymbol{\beta} = b_i\), \(\mathbf{X} = \left(\boldsymbol{\theta}, a_i\right)\), and let \(p\left(b_i\right)\) be the prior for \(b_i\). Given that \(\boldsymbol{\theta}\), \(a_i\), and \(\boldsymbol{\Omega}_b\) are constant, the conditional posterior distribution for \(b_i\) again follows by substitution into the general expression above.
Let \(\mathbf{z}_b = \mathbf{z} - a_i \boldsymbol{\theta}\), so that \(\mathbf{z}_b = \left(\frac{k_{i1} - a_i \theta_1 w_{i1}}{w_{i1}}, \ldots, \frac{k_{iN} - a_i \theta_N w_{iN}}{w_{iN}}\right)\). The formula above can then be simplified as \(f\left(b_i \mid \boldsymbol{\theta}, a_i, \mathbf{y}_i, \mathbf{w}\right) \propto p\left(b_i\right) \exp\left\{-\frac{1}{2}\left(\mathbf{z}_b + \mathbf{1} a_i b_i\right)^{T} \boldsymbol{\Omega}_b \left(\mathbf{z}_b + \mathbf{1} a_i b_i\right)\right\}\), where \(\boldsymbol{\Omega}_b = \mathrm{diag}\left(w_{i1}, \ldots, w_{iN}\right)\) and \(\mathbf{k}_i = \mathbf{y}_i - \frac{1}{2}\). This result is given in Eq. 8.
Let \(\boldsymbol{\beta} = a_i\), \(\mathbf{X} = \left(\boldsymbol{\theta}, b_i\right)\), and let \(p\left(a_i\right)\) be the prior for \(a_i\). Given that \(\boldsymbol{\theta}\), \(b_i\), and \(\boldsymbol{\Omega}_a\) are constant, the conditional posterior distribution for \(a_i\) likewise follows by substitution.
As specified previously, \(\mathbf{z} = \left(\frac{k_1}{w_1}, \frac{k_2}{w_2}, \ldots, \frac{k_N}{w_N}\right)\), \(\mathbf{k}_i = \mathbf{y}_i - \frac{1}{2}\), and \(\boldsymbol{\Omega}_a\) is identical to \(\boldsymbol{\Omega}_b\). This result is given in Eq. 9.
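As an illustration of the unidimensional \(\theta_e\) draw, the sketch below (not the authors' code) assumes a standard normal prior \(\theta_e \sim N(0, 1)\) and takes the Pólya–Gamma draws \(w_{ie}\) as given; here they are fixed at \(E[\mathrm{PG}(1, 0)] = 1/4\) purely for the demonstration, and the item parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def draw_theta(a, b, w_e, y_e):
    """One Gibbs draw of theta_e in the unidimensional 2-PL model:
    with a N(0, 1) prior, completing the square in the Gaussian
    kernel p(theta_e | a, b, w, y_e) gives a normal conditional."""
    k = y_e - 0.5                         # k_ie = y_ie - 1/2
    z_e = (a * b * w_e + k) / w_e         # entries of z_e as in Eq. 7
    V = 1.0 / (a @ (w_e * a) + 1.0)       # posterior variance
    m = V * (a @ (w_e * z_e))             # posterior mean
    return rng.normal(m, np.sqrt(V))

# Hypothetical parameters for I = 5 items and one examinee.
a = np.array([1.0, 1.2, 0.8, 1.5, 1.0])
b = np.array([-0.5, 0.0, 0.5, 1.0, -1.0])
y_e = np.array([1.0, 1.0, 0.0, 0.0, 1.0])
w_e = np.full(5, 0.25)                    # stand-in PG(1, c) draws
theta = draw_theta(a, b, w_e, y_e)
```

In a full sampler, \(w_{ie}\) would be redrawn from \(\mathrm{PG}\!\left(1, a_i(\theta_e - b_i)\right)\) each cycle, and the \(b_i\) and \(a_i\) draws follow the same completing-the-square pattern.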
In the multidimensional 2-PL model, \(\boldsymbol{\Theta}\) is the \(N \times K\) matrix of latent traits, where \(\boldsymbol{\Theta}_d\) is the \(d\)th column of the \(\boldsymbol{\Theta}\) matrix. Therefore, \(\boldsymbol{\Theta}_d\) is essentially a vector, and \(\boldsymbol{\Theta}_{-d}\) is the \(\boldsymbol{\Theta}\) matrix excluding \(\boldsymbol{\Theta}_d\). For each examinee \(e\), \(\boldsymbol{\theta}_e = \left(\theta_{e1}, \theta_{e2}, \ldots, \theta_{eK}\right)\) is the \(K\)-dimensional column vector of latent traits. Let \(\boldsymbol{\beta} = \boldsymbol{\theta}_e\), \(\mathbf{X} = \left(\mathbf{a}, \mathbf{b}\right)\), and let \(p\left(\boldsymbol{\theta}_e\right)\) be the prior for \(\boldsymbol{\theta}_e\). Given that \(\mathbf{a}\), \(\mathbf{b}\), and \(\boldsymbol{\Omega}_e\) are constant, the conditional posterior distribution for \(\boldsymbol{\theta}_e\) follows by the same substitution.
Let \(\mathbf{z}_e = \mathbf{z} + \mathbf{b}\), so that \(\mathbf{z}_e = \left(\frac{b_1 w_{1e} + k_{1e}}{w_{1e}}, \ldots, \frac{b_I w_{Ie} + k_{Ie}}{w_{Ie}}\right)\). The formula above can then be simplified as \(f\left(\boldsymbol{\theta}_e \mid \boldsymbol{\Sigma}, \mathbf{a}, \mathbf{b}, \mathbf{y}_e\right) \propto p\left(\boldsymbol{\theta}_e\right) \exp\left\{-\frac{1}{2}\left(\mathbf{z}_e - \mathbf{a}^{T} \boldsymbol{\theta}_e\right)^{T} \boldsymbol{\Omega}_e \left(\mathbf{z}_e - \mathbf{a}^{T} \boldsymbol{\theta}_e\right)\right\}\), where \(\boldsymbol{\Omega}_e = \mathrm{diag}\left(w_{1e}, \ldots, w_{Ie}\right)\) and \(\mathbf{k}_e = \mathbf{y}_e - \frac{1}{2}\). This result is given in line 1 of Eq. 12.
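Analogously to the unidimensional case, assuming a multivariate normal prior \(\boldsymbol{\theta}_e \sim N_K\!\left(\mathbf{0}, \boldsymbol{\Sigma}\right)\), completing the square gives the closed-form multivariate draw

\[ \boldsymbol{\theta}_e \mid \boldsymbol{\Sigma}, \mathbf{a}, \mathbf{b}, \mathbf{w}, \mathbf{y}_e \sim N_K\!\left(\mathbf{V}\, \mathbf{a}\, \boldsymbol{\Omega}_e \mathbf{z}_e,\; \mathbf{V}\right), \qquad \mathbf{V} = \left(\mathbf{a}\, \boldsymbol{\Omega}_e \mathbf{a}^{T} + \boldsymbol{\Sigma}^{-1}\right)^{-1}, \]

treating \(\mathbf{a}\) as the \(K \times I\) matrix of slopes so that \(\mathbf{a}^{T} \boldsymbol{\theta}_e\) stacks the \(I\) linear predictors.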
Let \(\boldsymbol{\beta} = b_i\), \(\mathbf{X} = \left(\boldsymbol{\Theta}, \mathbf{a}_i\right)\), and let \(p\left(b_i\right)\) be the prior for \(b_i\). Given that \(\boldsymbol{\Theta}\), \(\mathbf{a}_i\), and \(\boldsymbol{\Omega}_b\) are constant, the conditional posterior distribution for \(b_i\) in the multidimensional 2-PL model follows by the same substitution.
Let \(\mathbf{z}_b = \mathbf{z} - \boldsymbol{\Theta} \mathbf{a}_i\), so that \(\mathbf{z}_b = \left(\frac{k_{i1} - a_{i1}\theta_{11} w_{i1} - a_{i2}\theta_{12} w_{i1} - \cdots - a_{iK}\theta_{1K} w_{i1}}{w_{i1}}, \ldots, \frac{k_{iN} - a_{i1}\theta_{N1} w_{iN} - a_{i2}\theta_{N2} w_{iN} - \cdots - a_{iK}\theta_{NK} w_{iN}}{w_{iN}}\right)\). The formula above can then be simplified as \(f\left(b_i \mid \boldsymbol{\Theta}, \mathbf{a}_i, \mathbf{y}_i, \mathbf{w}\right) \propto p\left(b_i\right) \exp\left\{-\frac{1}{2}\left(\mathbf{z}_b + \mathbf{1} b_i\right)^{T} \boldsymbol{\Omega}_b \left(\mathbf{z}_b + \mathbf{1} b_i\right)\right\}\), where \(\boldsymbol{\Omega}_b = \mathrm{diag}\left(w_{i1}, \ldots, w_{iN}\right)\) and \(\mathbf{k}_i = \mathbf{y}_i - \frac{1}{2}\). This result is given in line 2 of Eq. 12.
Let \(\boldsymbol{\beta} = a_{id}\), \(\mathbf{X} = \left(\boldsymbol{\Theta}, \mathbf{a}_{i(-d)}, b_i\right)\), and let \(p\left(a_{id}\right)\) be the prior for \(a_{id}\). Given that \(\boldsymbol{\Theta}\), \(\mathbf{a}_{i(-d)}\), \(b_i\), and \(\boldsymbol{\Omega}_a\) are constant, the conditional posterior distribution for \(a_{id}\) follows by the same substitution.
Let \(\mathbf{z}_{ad} = \mathbf{z} - \boldsymbol{\Theta}_{-d} \mathbf{a}_{i(-d)} + \mathbf{1} b_i\). Then (1) if \(d = 1\), \(\mathbf{z}_{ad} = \left(\frac{k_{i1} - a_{i2}\theta_{12} w_{i1} - \cdots - a_{iK}\theta_{1K} w_{i1} + b_i w_{i1}}{w_{i1}}, \ldots, \frac{k_{iN} - a_{i2}\theta_{N2} w_{iN} - \cdots - a_{iK}\theta_{NK} w_{iN} + b_i w_{iN}}{w_{iN}}\right)\); (2) if \(d = K\), \(\mathbf{z}_{ad} = \left(\frac{k_{i1} - a_{i1}\theta_{11} w_{i1} - \cdots - a_{i(K-1)}\theta_{1(K-1)} w_{i1} + b_i w_{i1}}{w_{i1}}, \ldots, \frac{k_{iN} - a_{i1}\theta_{N1} w_{iN} - \cdots - a_{i(K-1)}\theta_{N(K-1)} w_{iN} + b_i w_{iN}}{w_{iN}}\right)\); and (3) if \(1< d < K\), \(\mathbf{z}_{ad} = \left(\frac{k_{i1} - a_{i1}\theta_{11} w_{i1} - \cdots - a_{i(d-1)}\theta_{1(d-1)} w_{i1} - a_{i(d+1)}\theta_{1(d+1)} w_{i1} - \cdots - a_{iK}\theta_{1K} w_{i1} + b_i w_{i1}}{w_{i1}}, \ldots, \frac{k_{iN} - a_{i1}\theta_{N1} w_{iN} - \cdots - a_{i(d-1)}\theta_{N(d-1)} w_{iN} - a_{i(d+1)}\theta_{N(d+1)} w_{iN} - \cdots - a_{iK}\theta_{NK} w_{iN} + b_i w_{iN}}{w_{iN}}\right)\). Therefore, the posterior above can be simplified as \(f\left(a_{id} \mid \boldsymbol{\Theta}, \mathbf{a}_{i(-d)}, b_i, \mathbf{y}_i, \mathbf{w}\right) \propto p\left(a_{id}\right) \exp\left\{-\frac{1}{2}\left(\mathbf{z}_{ad} - a_{id} \boldsymbol{\Theta}_d\right)^{T} \boldsymbol{\Omega}_a \left(\mathbf{z}_{ad} - a_{id} \boldsymbol{\Theta}_d\right)\right\}\), where \(\boldsymbol{\Omega}_a = \mathrm{diag}\left(w_{i1}, \ldots, w_{iN}\right)\) and \(\mathbf{k}_i = \mathbf{y}_i - \frac{1}{2}\). This result is given in line 3 of Eq. 12. \(\square\)
Cite this article
Jiang, Z., Templin, J. Gibbs Samplers for Logistic Item Response Models via the Pólya–Gamma Distribution: A Computationally Efficient Data-Augmentation Strategy. Psychometrika 84, 358–374 (2019). https://doi.org/10.1007/s11336-018-9641-x