Series: Clinical Validity of PROMIS Measures Across Several Chronic Conditions
Original Article
PROMIS measures of pain, fatigue, negative affect, physical function, and social function demonstrated clinical validity across a range of chronic conditions

https://doi.org/10.1016/j.jclinepi.2015.08.038

Abstract

Objective

To present an overview of a series of studies in which the clinical validity of the National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) measures was evaluated, by domain, across six clinical populations.

Study Design and Setting

Approximately 1,500 individuals at baseline and 1,300 at follow-up completed PROMIS measures. The analyses reported in this issue were conducted post hoc, pooling data across six previous studies and accommodating the different designs of the six within-condition parent studies. Changes in T-scores, standardized response means, and effect sizes were calculated in each study. When a parent study design allowed, known-groups validity was evaluated using a linear mixed model.

Results

The results provide substantial support for the clinical validity of nine PROMIS measures in a range of chronic conditions.

Conclusion

The cross-condition focus of the analyses provided a unique and multifaceted perspective on how PROMIS measures function in “real-world” clinical settings and yielded external anchors that can support comparative effectiveness research. The current body of clinical validity evidence for the nine PROMIS measures indicates the success of NIH PROMIS in developing measures that are effective across a range of chronic conditions.

Introduction

In a succinct seven words, Lee Sechrest summed up the formidable challenge facing researchers who use and develop psychometric instruments: “Validity of measures is no simple matter.” [1] Although researchers often describe a scale as “valid” or as having been “validated,” validity is contextual. It resides not in the instrument itself but in the use of its scores. A simple example clarifies the relationship between validity and appropriate use based on context. Consider a measure of depressive symptoms whose scores are found to successfully predict clinical depression. Such a finding supports the validity of using the measure's scores to screen individuals for depression; using scores of this measure to predict substance abuse, however, is unlikely to be as successful. Poor predictive performance in the latter case does not “invalidate” the measure any more than success in the former confers global validity on the measure.

In health measurement, among the challenges that make validity “no simple matter” is the application of measures in diverse populations and for a range of purposes. Scores are used to follow people over time, evaluate interventions, compare the effectiveness of treatments, and quantify the impact of disease on quality of life. To evaluate how effective measures are for these purposes, it is critical to administer them in clinical contexts and evaluate their performance. This is not to say that other considerations are irrelevant. The quality of an instrument is influenced by the soundness of the methods used to develop it, and evaluation of these methods is a critical component in assessment of a measure's validity. But the level of validity evidence generated in the typical measurement development study often is rudimentary and inadequate for establishing the validity of using scores in clinical contexts [2].

For patient-reported outcome (PRO) measures, there are a number of particularly relevant validity evaluations. It is important to know how well scores on measures perform in quantifying the impact of disease and health problems on domains important to patients, in comparing effectiveness of treatments and management strategies, and in tracking the longitudinal course of disease. Subjecting well-constructed PRO tools to these critical tests of clinical validity is an essential step in the maturation of a new measure.

Well-constructed, generalizable, and clinically relevant PRO measures can be very useful when conducting comparative effectiveness research (CER). CER is defined as research “designed to inform health care decisions by providing evidence on the effectiveness, benefits, and harms of different treatment options” [3]. PRO scores increasingly serve as end points in treatment efficacy and effectiveness studies. They can be used to define responding (or progressing) patients in clinical trials. The magnitude of change in an individual's PRO score required to classify that individual as improved or worsened is specified a priori. Treatments are then compared with regard to differences in proportions of responders, progressors, or both. Responder analysis is appealing because it embeds meaningful change into the consideration of statistical significance. To conduct a responder analysis, however, one must answer the difficult question, “How should response to treatment be operationalized?” Purely statistical approaches to defining meaningful change have been critiqued because of the absence of an external anchor and the lack of consensus and empirical support [4]. A more patient-centered approach is to estimate meaningful change by anchoring to responses to a one-item global rating of change (GROC). However, the GROC has been criticized at several levels because of its vulnerability to response bias [5]. Clinical validity studies that evaluate changes in scores across different conditions and contexts could provide more defensible anchors for responder analysis, supporting the science of CER.
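The logic of a responder analysis described above can be sketched in a few lines of code. This is a minimal illustration, not a procedure from the studies in this series: the function names, the threshold of 5 T-score points, and the data are all illustrative assumptions.

```python
# Hypothetical sketch of a responder analysis. The a priori threshold
# (here 5 T-score points) and all data values are illustrative only.

def classify_responders(baseline, follow_up, threshold):
    """Label each participant as improved, worsened, or stable based on
    an a priori threshold applied to the change in PRO score."""
    labels = []
    for pre, post in zip(baseline, follow_up):
        change = post - pre
        if change >= threshold:
            labels.append("improved")
        elif change <= -threshold:
            labels.append("worsened")
        else:
            labels.append("stable")
    return labels

def responder_proportion(labels):
    """Proportion of participants classified as improved."""
    return labels.count("improved") / len(labels)

# Treatments are then compared on these proportions, e.g., arm A vs. arm B.
arm_a = classify_responders([45, 50, 52, 48], [52, 58, 53, 47], threshold=5)
arm_b = classify_responders([46, 49, 51, 50], [47, 50, 52, 44], threshold=5)
```

In a real trial the threshold would be justified by prior evidence (e.g., anchor-based estimates of meaningful change), which is precisely where cross-condition clinical validity evidence can help.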

This article presents an overview of the five-article series published in this issue. [An additional article in this issue is devoted to examining the ecological validity of various Patient-Reported Outcomes Measurement Information System (PROMIS®) measures across five different populations]. These publications document progress in building a body of clinical validity evidence for nine measures from the National Institutes of Health's (NIH) PROMIS [6], [7], [8], [9], [10], [11], [12]. Collectively, the findings substantially increase knowledge of the appropriate and meaningful applications of these PROMIS measures. In addition, they present an innovative approach in which measures are evaluated and compared across multiple chronic diseases and conditions. We further discuss how these findings may be used to support comparative effectiveness research.

Section snippets

Background

During the first period of NIH PROMIS funding (2004–2009), several longitudinal studies were undertaken. Each was conducted in one of six clinical conditions: chronic heart failure (CHF), chronic obstructive pulmonary disease (COPD), rheumatoid arthritis (RA), cancer, back pain, or major depression. These studies, the “parent studies” for this series of articles, addressed both substantive and psychometric research questions. For example, the back pain study evaluated the impact of spinal

Organization and scope of studies

The cross-cutting articles in this series are organized by PRO domain rather than by clinical population. This is consistent with the PROMIS measurement philosophy, which emphasizes domain-specific rather than disease-specific measurement. Table 1 reports which domains and subdomains were measured and which corresponding PROMIS measures were administered to each of the six clinical samples. (PROMIS measures themselves are available in an online Appendix at www.jclinepi.com.) As evident in Table 1

Back pain

Recruitment and all procedures were approved by the University of Washington Institutional Review Board [13]. All participants provided informed consent.

Analyses

As reported above, different clinical samples lent themselves to different research questions, but all analyses contributed to understanding the clinical validity of the PROMIS domain measures. Another point of analytic continuity was the use of global and clinical anchors (such as perceived improvement in health) that were consistent within clinical groups and comparative across populations. Table 3 identifies the selected global and clinical anchors by clinical sample.
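The change statistics named in the abstract (standardized response mean and effect size, computed on PROMIS T-score changes) can be sketched as follows. This is a minimal sketch with illustrative data, not code or values from the parent studies; in the series itself these statistics were computed within clinical samples and interpreted against the global and clinical anchors in Table 3.

```python
# Minimal sketch of two change statistics on PROMIS T-scores:
#   SRM         = mean change / SD of change scores
#   effect size = mean change / SD of baseline scores
# All data values below are illustrative, not from the parent studies.
from statistics import mean, stdev

def srm(baseline, follow_up):
    """Standardized response mean of the change scores."""
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)

def effect_size(baseline, follow_up):
    """Mean change standardized by the baseline standard deviation."""
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)

baseline_t = [48, 50, 52, 54]   # illustrative PROMIS T-scores at baseline
followup_t = [53, 56, 56, 61]   # illustrative T-scores at follow-up
```

In an anchor-based analysis, these statistics would be computed separately within anchor-defined groups (e.g., participants reporting improved vs. unchanged health), so that the magnitude of change can be tied to a clinically interpretable reference.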

For each PROMIS

Summary

This series of articles provides practical information on the responsiveness of several PROMIS domains across six clinical validation studies. Cumulatively, they report clinical validity findings for nine PROMIS measures, representing five PROMIS domains, evaluated across six clinical conditions, and including approximately 1,500 individuals at baseline and 1,300 at follow-up. The cross-condition focus of the analyses provides a unique and multifaceted perspective on how PROMIS measures

Acknowledgments

NIH Science Officers on this project have included Deborah Ader, PhD, Vanessa Ameen, MD (deceased), Susan Czajkowski, PhD, Basil Eldadah, MD, PhD, Lawrence Fine, MD, DrPH, Lawrence Fox, MD, PhD, Lynne Haverkos, MD, MPH, Thomas Hilton, PhD, Laura Lee Johnson, PhD, Michael Kozak, PhD, Peter Lyster, PhD, Donald Mattison, MD, Claudia Moy, PhD, Louis Quatrano, PhD, Bryce Reeve, PhD, William Riley, PhD, Peter Scheidt, MD, Ashley Wilder Smith, PhD, MPH, Susana Serrate-Sztein, MD, William Phillip

References (38)

  • What is comparative effectiveness research (2013)
  • K.W. Wyrwich et al. Methods for interpreting change over time in patient-reported outcome measures. Qual Life Res (2013)
  • S. Magasi et al. Content validity of patient-reported outcome measures: perspectives from a PROMIS meeting. Qual Life Res (2012)
  • D. Cella et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH roadmap cooperative group during its first two years. Med Care (2007)
  • D.A. DeWalt et al. Evaluation of item candidates: the PROMIS qualitative item review. Med Care (2007)
  • B.B. Reeve et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care (2007)
  • D. Cella et al. Meaningful change in cancer-specific quality-of-life scores: differences between improvement and worsening. Qual Life Res (2002)
  • J. Ringash et al. Interpreting differences in quality of life: the FACT-H&N in laryngeal cancer patients. Qual Life Res (2004)
  • M. Dolgin. Nomenclature and criteria for diagnosis of diseases of the heart and great vessels (1994)

    Conflict of interest: K.F.C. is an unpaid officer of the PROMIS Health Organization; D.A.D. is an unpaid member of the board of directors of the PROMIS Health Organization; B.B.R. is an unpaid member of the board of directors of the PROMIS Health Organization. A.A.S. declares a potential conflict as Senior Scientist with the Gallup Organization and as a Senior Consultant with ERT, Inc.; K.W. is an unpaid member of the board of directors of the PROMIS Health Organization; D.C. is an unpaid member of the board of directors and officer of the PROMIS Health Organization. All other authors declare no conflict of interest.

    Funding: PROMIS was funded with cooperative agreements from the National Institutes of Health (NIH) Common Fund Initiative (Northwestern University, PI: David Cella, PhD, U54AR057951, U01AR052177, R01CA60068; Northwestern University, PI: Richard C. Gershon, PhD, U54AR057943; American Institutes for Research, PI: Susan (San) D. Keller, PhD, U54AR057926; State University of New York, Stony Brook, PIs: Joan E. Broderick, PhD, and Arthur A. Stone, PhD, U01AR057948, U01AR052170; University of Washington, Seattle, PIs: Heidi M. Crane, MD, MPH, Paul K. Crane, MD, MPH, and Donald L. Patrick, PhD, U01AR057954; University of Washington, Seattle, PI: Dagmar Amtmann, PhD, U01AR052171; University of North Carolina, Chapel Hill, PI: Harry A. Guess, MD, PhD (deceased), Darren A. DeWalt, MD, MPH, U01AR052181; Children's Hospital of Philadelphia, PI: Christopher B. Forrest, MD, PhD, U01AR057956; Stanford University, PI: James F. Fries, MD, U01AR052158; Boston University, PIs: Alan Jette, PT, PhD, Stephen M. Haley, PhD (deceased), and David Scott Tulsky, PhD (University of Michigan, Ann Arbor), U01AR057929; University of California, Los Angeles, PIs: Dinesh Khanna, MD (University of Michigan, Ann Arbor), and Brennan Spiegel, MD, MSHS, U01AR057936; University of Pittsburgh, PI: Paul A. Pilkonis, PhD, U01AR052155; Georgetown University, PIs: Carol. M. Moinpour, PhD (Fred Hutchinson Cancer Research Center, Seattle), and Arnold L. Potosky, PhD, U01AR057971; Children's Hospital Medical Center, Cincinnati, PI: Esi M. Morgan DeWitt, MD, MSCE, U01AR057940; University of Maryland, Baltimore, PI: Lisa M. Shulman, MD, U01AR057967; and Duke University, PI: Kevin P. Weinfurt, PhD, U01AR052186).
