
Controlling for occasion-specific effects when assessing the test–retest reliability of self-report health questionnaires

  • Original Paper
  • Quality of Life Research

Abstract

Objective

This study proposes a method for adjusting the test–retest reliability of self-report health questionnaires for changes that occur during the test–retest interval, as identified by an external measure, and for distinguishing such changes from random response errors.

Methods

In our application, eighty participants completed the Symptoms of Illness Checklist (SIC) on two occasions, two weeks apart, immediately before interviews given on each occasion by one of two physicians in a crossover design. The physician interview scores served as external measures, and structural equation modeling was used to estimate the parameters of a model that corrected for the occasion-specific effect of participants’ responses using information from the interviews.

Results

Correcting for changes in symptoms during the test–retest interval increased SIC test–retest reliability from .744 to .804 and significantly improved model fit (\( \chi^{2}_{\mathrm{diff}}(1) = 30.78 \), p < .001).

Conclusions

The results suggest that the evaluation of self-report health questionnaire test–retest reliability can be improved by identifying changes with an external measure and distinguishing them from random response errors. Doing so increased the estimated SIC test–retest reliability and indicated that the SIC was able to measure changes over the studied time interval. The method can be applied across a broad range of questionnaires.

Fig. 1

Notes

  1. AMOS allows the model to be specified graphically in the form of a path diagram—in the present case the illustration presented in Fig. 1, but without the triangle and the lines emanating from it (these intercepts are estimated by default in AMOS). If the reader wishes to obtain a copy of the AMOS program file used in the present study, please contact Joseph Olsen at joseph_olsen@byu.edu.

Author information

Correspondence to George J. Bloch.

Appendix: Model specification

With boldface characters denoting vectors or matrices, let \( \mathbf{Y} = \boldsymbol{\alpha} + \boldsymbol{\Lambda}\boldsymbol{\eta} \) and \( \operatorname{Cov}(\boldsymbol{\eta}) = \boldsymbol{\Psi} \). Then

$$
\begin{bmatrix} y^{(Q)}_{j1} \\ y^{(Q)}_{j2} \\ y^{(I)}_{j1} \\ y^{(I)}_{j2} \end{bmatrix}
=
\begin{bmatrix} a^{(Q)}_{1} \\ a^{(Q)}_{2} \\ a^{(I)}_{1} \\ a^{(I)}_{2} \end{bmatrix}
+
\begin{bmatrix}
1 & 0 & c & 0 & 1 & 0 \\
1 & 0 & 0 & c & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0
\end{bmatrix}
\begin{bmatrix} T^{(Q)}_{j} \\ T^{(I)}_{j} \\ d^{(I)}_{j1} \\ d^{(I)}_{j2} \\ e^{(Q)}_{j1} \\ e^{(Q)}_{j2} \end{bmatrix},
\quad
\boldsymbol{\Psi} =
\begin{bmatrix}
\sigma^{2}_{Q} & \sigma_{QI} & 0 & 0 & 0 & 0 \\
\sigma_{QI} & \sigma^{2}_{I} & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^{2}_{d} & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma^{2}_{d} & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma^{2}_{e} & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma^{2}_{e}
\end{bmatrix}
$$

Here \( \sigma^{2}_{Q} \), \( \sigma^{2}_{I} \), \( \sigma^{2}_{d} \), and \( \sigma^{2}_{e} \) are the variances of the factors \( T^{(Q)} \), \( T^{(I)} \), \( d^{(I)} \), and \( e^{(Q)} \); \( \sigma_{QI} \) is the covariance between \( T^{(Q)} \) and \( T^{(I)} \); and \( c \) is the regression coefficient obtained from regressing \( d^{(Q)}_{jk} \) on \( d^{(I)}_{jk} \). Because the mean structure of the model is saturated (a separate intercept is estimated for each observed variable), the model-implied covariance matrix can be written simply as \( \hat{\boldsymbol{\Sigma}} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' \). The maximum likelihood fitting function, which indexes the discrepancy between this model-implied matrix and the sample covariance matrix \( \mathbf{S} \), is \( F_{ML} = \log|\hat{\boldsymbol{\Sigma}}| + \operatorname{trace}[\mathbf{S}\hat{\boldsymbol{\Sigma}}^{-1}] - \log|\mathbf{S}| - t \), where \( t \) is the total number of study variables. In the present study, \( \mathbf{S} \) is the sample covariance matrix of the four study variables \( y^{(I)}_{j1} \), \( y^{(I)}_{j2} \), \( y^{(Q)}_{j1} \), and \( y^{(Q)}_{j2} \), which are assumed to follow a four-variate multinormal distribution.
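To make the matrix expressions concrete, the following Python sketch builds the loading matrix and factor covariance matrix from the specification above, forms the model-implied covariance matrix, and evaluates the maximum likelihood fitting function. It is an illustration only, not the AMOS procedure used in the study: the function names and all numeric parameter values are hypothetical, and the small Gauss-Jordan helper assumes positive definite input.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def logdet_and_inverse(m):
    """Gauss-Jordan elimination with partial pivoting.

    Returns (log|det m|, inverse of m). Assumes m is positive definite,
    so the determinant is positive and no pivot is zero.
    """
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        logdet += math.log(abs(p))
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return logdet, [row[n:] for row in aug]

def implied_cov(var_q, var_i, cov_qi, var_d, var_e, c):
    """Model-implied covariance: Lambda * Psi * Lambda'.

    Row/column order follows the appendix: y(Q)1, y(Q)2, y(I)1, y(I)2.
    """
    lam = [
        [1, 0, c, 0, 1, 0],
        [1, 0, 0, c, 0, 1],
        [0, 1, 1, 0, 0, 0],
        [0, 1, 0, 1, 0, 0],
    ]
    psi = [[0.0] * 6 for _ in range(6)]
    psi[0][0], psi[1][1] = var_q, var_i
    psi[0][1] = psi[1][0] = cov_qi
    psi[2][2] = psi[3][3] = var_d   # equal across occasions, as in Psi
    psi[4][4] = psi[5][5] = var_e   # equal across occasions, as in Psi
    return matmul(matmul(lam, psi), transpose(lam))

def f_ml(s, sigma):
    """F_ML = log|Sigma| + trace(S Sigma^-1) - log|S| - t."""
    t = len(s)
    logdet_sigma, sigma_inv = logdet_and_inverse(sigma)
    logdet_s, _ = logdet_and_inverse(s)
    product = matmul(s, sigma_inv)
    trace = sum(product[i][i] for i in range(t))
    return logdet_sigma + trace - logdet_s - t

# Hypothetical parameter values, chosen only for illustration.
params = dict(var_q=4.0, var_i=3.0, cov_qi=2.0, var_d=1.0, var_e=0.5)
sigma_full = implied_cov(c=0.8, **params)
sigma_restricted = implied_cov(c=0.0, **params)

# Treat sigma_full as if it were the sample matrix S.
fit_exact = f_ml(sigma_full, sigma_full)             # essentially zero
fit_restricted = f_ml(sigma_full, sigma_restricted)  # positive: misfit when c is fixed at 0
```

Under these assumed values, the same-occasion questionnaire and interview scores covary more strongly than the cross-occasion scores, and that difference is the information the loading \( c \) absorbs. Fixing \( c \) at zero therefore leaves a positive discrepancy.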

With sample size \( N \), \( F_{ML} \) can be rescaled to approximate a chi-square variate for purposes of model goodness-of-fit testing: \( (N-1)F_{ML} \sim \chi^{2} \). Under standard conditions, this provides a chi-square test of model fit with degrees of freedom equal to the total number of observed means, variances, and covariances minus the number of estimated model parameters.
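The degrees-of-freedom bookkeeping and the chi-square rescaling can be sketched in a few lines of Python. The parameter count here is an assumption read off the appendix matrices (four intercepts, four variances, one covariance, and the loading \( c \)), and the function names are hypothetical. The tail probability uses closed forms that hold only for 1 or an even number of degrees of freedom, which is enough for this illustration.

```python
import math

def model_df(n_observed, n_free_params):
    """Observed means, variances, and covariances minus free parameters."""
    n_moments = n_observed + n_observed * (n_observed + 1) // 2
    return n_moments - n_free_params

def chi2_statistic(f_ml_value, n):
    """Rescale the ML discrepancy: (N - 1) * F_ML."""
    return (n - 1) * f_ml_value

def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df == 1 or even df)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch only handles df == 1 or even df")

# Assumed parameter counts, based on the appendix specification.
df_full = model_df(n_observed=4, n_free_params=10)        # c estimated
df_restricted = model_df(n_observed=4, n_free_params=9)   # c fixed at 0
df_diff = df_restricted - df_full

# Chi-square difference reported in the abstract: 30.78 on 1 df.
p_diff = chi2_sf(30.78, df_diff)
```

Counted this way, the two nested models differ by one degree of freedom, which is consistent with the single-df difference test reported in the abstract, and `p_diff` falls well below .001.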

About this article

Cite this article

Olsen, J.A., Bloch, D.A. & Bloch, G.J. Controlling for occasion-specific effects when assessing the test–retest reliability of self-report health questionnaires. Qual Life Res 16, 1399–1405 (2007). https://doi.org/10.1007/s11136-007-9246-9

  • DOI: https://doi.org/10.1007/s11136-007-9246-9
