Abstract
Objective
This study proposes a method for adjusting the test–retest reliability of self-report health questionnaires for changes that occur during the test–retest interval, as identified by an external measure, and for distinguishing such changes from random response errors.
Methods
In our application, eighty participants completed the Symptoms of Illness Checklist (SIC) on two occasions, two weeks apart. On each occasion, the SIC was completed immediately before an interview conducted by one of two physicians in a crossover design. The physician interview scores served as external measures, and structural equation modeling was used to estimate the parameters of a model that corrected for occasion-specific effects in participants’ responses using information from the interviews.
Results
Correcting for changes in symptoms during the test–retest interval increased SIC test–retest reliability from .744 to .804 and significantly improved model fit (\( \chi^{2}_{\text{diff}}(1) = 30.78 \), p < .001).
Conclusions
The results suggest that the evaluation of self-report health questionnaire test–retest reliability can be improved by identifying changes with an external measure and distinguishing them from random response errors. Applying this correction increased the estimated SIC test–retest reliability and indicated that the SIC was able to measure changes over the studied time interval. The method can be applied across a broad range of questionnaires.
Notes
AMOS allows the model to be specified graphically in the form of a path diagram. In the present case, this is the illustration presented in Fig. 1, but without the triangle and the lines emanating from it (these intercepts are estimated by default in AMOS). Readers who wish to obtain a copy of the AMOS program file used in the present study may contact Joseph Olsen at joseph_olsen@byu.edu.
Appendix: Model specification
Letting boldface characters indicate vectors or matrices, denote \( \mathbf{Y} = \boldsymbol{\alpha} + \boldsymbol{\Lambda}\boldsymbol{\eta} \), \( Cov(\boldsymbol{\eta}) = \boldsymbol{\Psi} \). Then
Here \( \sigma^{2}_{Q} \), \( \sigma^{2}_{I} \), \( \sigma^{2}_{d} \) and \( \sigma^{2}_{e} \) are the variances of the factors \( T^{(Q)} \), \( T^{(I)} \), \( d^{(I)} \), and \( e^{(Q)} \), \( \sigma_{QI} \) is the covariance between \( T^{(Q)} \) and \( T^{(I)} \), and c is the correlation coefficient obtained from regressing \( d^{(Q)}_{jk} \) on \( d^{(I)}_{jk} \). Because the mean structure of the model is saturated due to the estimation of separate intercepts for each of the observed variables, it is possible to give a simplified expression for the model-implied covariance matrix as \( \hat{\boldsymbol{\Sigma}} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' \). The maximum likelihood fitting function, which is used to index the discrepancy between this model-implied matrix and the original sample covariance matrix (\( \mathbf{S} \)), can be given as: \( F_{ML} = \log\left| \hat{\boldsymbol{\Sigma}} \right| + \operatorname{trace}[\mathbf{S}\hat{\boldsymbol{\Sigma}}^{-1}] - \log\left| \mathbf{S} \right| - t, \) where t is the total number of study variables. In the present study, \( \mathbf{S} \) is the covariance matrix for the four-variate multinormal distribution of the study variables \( y^{(I)}_{j1} \), \( y^{(I)}_{j2} \), \( y^{(Q)}_{j1} \), and \( y^{(Q)}_{j2} \).
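As a minimal sketch of how the fitting function above can be evaluated numerically, the following Python code computes the log-determinant, trace, and dimension terms for a pair of covariance matrices. The function names (`log_det_and_inverse`, `f_ml`) and the two 4 × 4 matrices are hypothetical and purely illustrative; they are not the study's data, and this is not the AMOS implementation.

```python
import math


def log_det_and_inverse(a):
    """Return (log|A|, A^-1) for a symmetric positive-definite matrix
    given as a list of lists, using Gauss-Jordan elimination."""
    n = len(a)
    # Augment A with the identity matrix: [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    log_det = 0.0
    for col in range(n):
        pivot = aug[col][col]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        log_det += math.log(pivot)  # |A| is the product of the pivots
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return log_det, [row[n:] for row in aug]


def f_ml(s, sigma_hat):
    """F_ML = log|Sigma_hat| + trace(S Sigma_hat^-1) - log|S| - t."""
    t = len(s)
    log_det_sigma, sigma_inv = log_det_and_inverse(sigma_hat)
    log_det_s, _ = log_det_and_inverse(s)
    trace = sum(s[i][k] * sigma_inv[k][i] for i in range(t) for k in range(t))
    return log_det_sigma + trace - log_det_s - t


# Hypothetical sample covariance matrix for four observed variables
# (illustrative values only, NOT taken from the study).
S = [[4.0, 2.4, 1.8, 1.5],
     [2.4, 4.2, 1.6, 1.9],
     [1.8, 1.6, 3.5, 2.6],
     [1.5, 1.9, 2.6, 3.6]]

# Hypothetical model-implied matrix that ignores all covariances.
SIGMA_DIAG = [[S[i][j] if i == j else 0.0 for j in range(4)] for i in range(4)]

print("F_ML for a perfectly fitting model:", f_ml(S, S))
print("F_ML for the diagonal model:", f_ml(S, SIGMA_DIAG))
```

The function returns zero (up to rounding) when the model-implied matrix reproduces the sample matrix exactly and a positive value otherwise, which is why it can serve as a discrepancy index.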
With sample size N, \( F_{ML} \) can be rescaled to approximate a chi-square variate for purposes of model goodness-of-fit testing: \( (N-1)F_{ML} \sim \chi^{2} \). Under standard conditions, this provides a chi-square test of model fit with degrees of freedom equal to the total number of observed means, variances, and covariances minus the number of estimated model parameters.
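The rescaling and the degrees-of-freedom count can be sketched in a few lines of Python. The helper names (`n_observed_moments`, `chi2_sf`, `fit_test`) are hypothetical, and the fitting-function value and parameter count passed to `fit_test` are assumed inputs chosen only for illustration, not values from the study. The one figure taken from the text is the chi-square difference statistic reported in the abstract (30.78 on 1 degree of freedom).

```python
import math


def n_observed_moments(t):
    """Means plus variances and covariances for t observed variables."""
    return t + t * (t + 1) // 2


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df,
    using the closed-form series for the regularized gamma function."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    total = math.erfc(math.sqrt(half))  # df = 1 case
    a = 0.5
    while a < df / 2:
        total += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return total


def fit_test(f_ml_value, n, n_params, t):
    """Return (chi-square, df, p) for the rescaled statistic (N - 1) * F_ML."""
    chi2 = (n - 1) * f_ml_value
    df = n_observed_moments(t) - n_params
    return chi2, df, chi2_sf(chi2, df)


# Four observed variables give 4 means + 4 variances + 6 covariances.
print("observed moments:", n_observed_moments(4))

# Hypothetical inputs: F_ML = 0.05 and 10 free parameters are assumptions.
print("illustrative fit test:", fit_test(0.05, 80, 10, 4))

# Difference statistic reported in the abstract: 30.78 on 1 df.
print("p-value below .001:", chi2_sf(30.78, 1) < 0.001)
```

A chi-square difference test between two nested models uses the same tail probability, with the difference in chi-square values as the statistic and the difference in degrees of freedom as df.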
Cite this article
Olsen, J.A., Bloch, D.A. & Bloch, G.J. Controlling for occasion-specific effects when assessing the test–retest reliability of self-report health questionnaires. Qual Life Res 16, 1399–1405 (2007). https://doi.org/10.1007/s11136-007-9246-9