Key Summary Points

The Patient-Reported Outcomes Measurement Information System (PROMIS) questionnaires are designed to efficiently evaluate and monitor health-related quality of life across numerous aspects of physical, mental, and social health (e.g., physical function, fatigue, and pain interference).

PROMIS can assess the health status of individuals in the general population as well as those with chronic conditions, and has been validated in patients with rheumatoid arthritis.

In the AWARE study, patients with rheumatoid arthritis experienced meaningful changes from baseline in patient-reported outcomes following approximately 1 year of treatment with IV golimumab or infliximab as measured by PROMIS questionnaires (PROMIS-29, PROMIS Fatigue short form (SF) 7a, and PROMIS Pain Interference SF 6b).

PROMIS may supplement information routinely collected by healthcare providers in the clinical care of rheumatoid arthritis patients and inform shared decision-making by identifying unmet needs and providing a more comprehensive picture across several dimensions of patient health-related quality of life.

Introduction

Rheumatoid arthritis (RA) is a chronic inflammatory autoimmune disease characterized by progressive joint damage, with associated pain and fatigue. Varying degrees of disability that negatively affect patients’ health-related quality of life (HRQoL) are a hallmark of this disease [1], and maintaining or improving patient function is a key treatment goal in management of RA [2, 3]. Accurate assessment of treatment effect on HRQoL, physical function, and patient independence [4] requires detailed patient-reported input and cannot rely solely on physician-reported physical examinations.

The use of patient-reported outcomes to assess HRQoL in clinical practice and clinical trials is an area of research that has gained prominence over the last decade. Patient-reported assessments of global disease activity, pain, and physical function were included in the Core Set of RA response criteria proposed by Outcome Measures in Rheumatology (OMERACT) [5] and endorsed by the American College of Rheumatology [6]. Fatigue has also been recognized as an important patient-prioritized aspect of HRQoL, and OMERACT has recommended it be measured in future trials whenever possible [7]. Patients with RA also have prioritized sleep, mood, and participation in activities of daily life as important aspects of their disease [8,9,10]. The Patient-Reported Outcomes Measurement Information System (PROMIS®), developed by the National Institutes of Health [11, 12], is a disease-agnostic set of health assessment instruments designed to efficiently evaluate and monitor HRQoL across numerous aspects of physical, emotional, and social health [11,12,13]. PROMIS can be used to assess the health status of individuals in the general population as well as in those with chronic conditions and has been validated in patients with RA [14, 15]. PROMIS measures have greater precision than the 36-item Short-Form Health Survey (SF-36) and improved responsiveness to change when assessing physical function [16]. Notably, PROMIS questionnaires (i.e., PROMIS-29 and PROMIS Short Forms [SF]) have been shown to be responsive to changes in disease activity across multiple domains in patients with rheumatologic diseases [14, 15, 17,18,19,20], with fewer ceiling and floor effects than other instruments in patients with RA [14, 21]. PROMIS has been employed at routine rheumatology visits, allowing patients and physicians to integrate patient-reported outcomes into shared treatment decisions [22]. This system has the potential to advance understanding of HRQoL in the real-world setting in RA and other chronic diseases, especially as these measures are integrated in mobile applications [23] and electronic health record systems.

AWARE was a prospective, real-world study with the primary objective of comparing the rate of infusion reactions and overall clinical effectiveness of intravenous (IV) golimumab with those of infliximab in patients with RA [24]. The effect of IV golimumab on quality of life in patients with RA as measured by the PROMIS instruments has not been previously reported in a real-world setting with a large United States (U.S.) cohort. The objective of this analysis was to assess the impact of IV golimumab and infliximab treatment on HRQoL using periodic multidimensional PROMIS assessments. Specifically, this analysis reports the ability of the PROMIS system to detect clinically meaningful change in RA patients’ perception of pain, fatigue, physical functioning, emotions, and social participation over time when treated with commonly used RA treatments. This work makes a unique contribution because very few longitudinal evaluations of the PROMIS system have been conducted in patients with RA with moderate-to-high disease activity who are initiating therapy.

Methods

Patients and Study Design

AWARE (NCT02728934) was a prospective, noninterventional, real-world phase 4 study of the safety and effectiveness of IV golimumab and infliximab in patients with RA in the U.S. [24]. Adults (≥ 18 years) with a physician-confirmed diagnosis of RA and medically eligible for treatment with IV golimumab or infliximab (or infliximab biosimilar) in accordance with approved product labeling [25, 26] were eligible. At enrollment, patients had to have been prescribed, but never previously received, IV golimumab (if initiating IV golimumab) or infliximab (if initiating infliximab). Previous therapy with any number of other biologics, disease-modifying antirheumatic drugs, and other RA-related treatments was permitted. Patients who were pregnant, planning a pregnancy, or enrolled in an interventional study were not eligible.

This study did not include any randomization or study-related therapeutic interventions. Concomitant medication use was recorded at each infusion visit. All treatment decisions, including prescribed dose and dosing intervals of IV golimumab and infliximab and concomitant medication use, were at the discretion of the treating rheumatologist. The approved dosage of IV golimumab is 2 mg/kg at weeks 0 and 4, then every 8 weeks thereafter [25]. Infliximab is approved at a dosage of 3 mg/kg at weeks 0, 2, and 6, then every 8 weeks thereafter, with dosage adjustment permitted up to 10 mg/kg or as often as every 4 weeks [26]. The frequency of patient visits for infusions was not predefined and occurred at the discretion of the investigator according to usual clinical practice.

Ethics Declaration

This study was conducted according to the Declaration of Helsinki and the International Council for Harmonisation Good Clinical Practice guidelines. The protocol was reviewed by a centralized institutional review board (Copernicus Group, Approval QUI1-15–645); all patients provided written consent.

PROMIS Evaluations

PROMIS-29 and PROMIS SF (Fatigue and Pain Interference) assessments [11, 14, 15, 27] were collected at baseline and immediately prior to the second, fifth, and eighth infusions (~ weeks 4, 28, and 52 for IV golimumab and weeks 2, 22, and 46 for infliximab). The PROMIS-29 questionnaire comprises four questions for each of seven domains (Depression, Anxiety, Physical Function, Pain Interference, Fatigue, Sleep Disturbance, and Participation in Social Roles and Activities), and a single pain intensity item (0 to 10 visual analogue scale [VAS]). The seven-item PROMIS Fatigue SF 7a and six-item PROMIS Pain Interference SF 6b were also utilized as these longer measures may have better precision than shorter instruments embedded in the PROMIS-29. Both the four-item (in the PROMIS-29) and six-item PROMIS Pain Interference SF 6b instruments assess the extent to which pain hinders a patient's engagement with physical, mental, cognitive, emotional, recreational, and social activities. Higher PROMIS scores indicate more of the domain concept being measured. Thus, for Anxiety, Depression, Fatigue, Pain Interference, and Sleep Disturbance, higher scores represent more symptoms. In contrast, higher PROMIS scores for Physical Function and Social Participation represent better function or social participation (i.e., fewer symptoms).

Statistical Methods

Prespecified analyses included all enrolled patients who received ≥ 1 administration of IV golimumab or infliximab (full study population) (Supplementary Fig. 1). Raw scores for each PROMIS-29 domain were converted to T-scores (using standard “look-up” tables), wherein standard T-scores with a mean of 50 and standard deviation of 10 were based on values obtained in the general U.S. adult population [11]. The single-item pain intensity VAS score (0–10) was also reported.
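As an illustration of the scoring step described above, the following minimal Python sketch converts a raw domain score through a caller-supplied look-up table and expresses the resulting T-score in standard deviations from the reference mean. The function names and the shape of the `lookup` mapping are assumptions made for illustration; the actual conversion tables are those published with the PROMIS instruments and are not reproduced here.

```python
from typing import Mapping

REFERENCE_MEAN = 50.0  # T-score mean in the general U.S. adult reference population
REFERENCE_SD = 10.0    # T-score standard deviation in that population


def raw_to_t_score(raw_score: int, lookup: Mapping[int, float]) -> float:
    """Convert a summed raw domain score to a T-score via a look-up table.

    `lookup` is a hypothetical mapping supplied by the caller; the real
    values come from the official PROMIS scoring tables.
    """
    if raw_score not in lookup:
        raise ValueError(f"raw score {raw_score} is not in the look-up table")
    return lookup[raw_score]


def sd_units_from_norm(t_score: float) -> float:
    """Express a T-score as standard deviations from the reference mean."""
    return (t_score - REFERENCE_MEAN) / REFERENCE_SD
```

Under this scaling, a T-score of 55 lies one-half of a standard deviation from the general-population mean.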

Mean changes in PROMIS T-scores were determined using observed data without any imputation and summarized by treatment group in the full study population. Statistical analysis on mean changes in PROMIS T-scores was performed using analysis of covariance controlling for baseline PROMIS score and inverse probability of treatment-weighted (IPTW) propensity score to adjust for baseline imbalances between IV golimumab and infliximab in these real-world populations. Least squares (LS) mean differences between treatment groups and 95% confidence intervals (CI) are reported. The propensity scores were estimated using logistic regression and included baseline covariates of age, sex, race, region, body mass index (BMI), weight, disease duration, clinical disease activity index (CDAI) [28], biologic-naïve, other medications, number of prior biologics received, prior tumor necrosis factor inhibitor (TNFi) therapy (yes/no), selected comorbidities, and smoking status. Standardized mean differences > 0.10 [29] were used to identify imbalances at baseline.
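The weighting and balance checks described above can be sketched as follows. This is a simplified illustration rather than the study's analysis code: it assumes propensity scores have already been estimated (for example, by logistic regression), it uses unstabilized weights of 1/p for one group and 1/(1 − p) for the other (the weight form is an assumption, as the text does not specify it), and the function names are hypothetical. The standardized mean difference uses the common pooled-SD form for a continuous covariate.

```python
import math
from statistics import mean, variance
from typing import List, Sequence


def iptw_weights(treated: Sequence[int], propensity: Sequence[float]) -> List[float]:
    """Inverse probability of treatment weights.

    treated[i] is 1 for one treatment group and 0 for the other;
    propensity[i] is the estimated probability that treated[i] == 1.
    """
    weights = []
    for t, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / p if t == 1 else 1.0 / (1.0 - p))
    return weights


def standardized_mean_difference(group_a: Sequence[float],
                                 group_b: Sequence[float]) -> float:
    """Absolute difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2.0)
    if pooled_sd == 0.0:
        raise ValueError("pooled standard deviation is zero")
    return abs(mean(group_a) - mean(group_b)) / pooled_sd
```

A covariate whose standardized mean difference exceeds 0.10 would be flagged as imbalanced under the threshold used in this analysis.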

In a post hoc analysis, the proportions of patients in the full study population who achieved improvements of ≥ 3, ≥ 5, and ≥ 10 points from baseline PROMIS-29 domain and SF T-scores were summarized by treatment group as these thresholds may reflect a clinically important difference in patient-reported outcomes at either a group level or an individual patient level [15, 30,31,32].
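A minimal sketch of this responder calculation is shown below, assuming paired baseline and follow-up T-scores from observed data only (no imputation). Because higher scores are better for Physical Function and Social Participation and worse for the other domains, the sketch first converts each change to a signed improvement. The function and variable names are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

# Domains in which a higher T-score means better health; for all other
# domains a lower T-score means fewer symptoms.
HIGHER_IS_BETTER = {"Physical Function", "Social Participation"}


def improvement(domain: str, baseline: float, follow_up: float) -> float:
    """Signed improvement in T-score points; positive always means better."""
    change = follow_up - baseline
    return change if domain in HIGHER_IS_BETTER else -change


def responder_proportions(
    domain: str,
    pairs: Iterable[Tuple[float, float]],
    thresholds: Sequence[float] = (3.0, 5.0, 10.0),
) -> Dict[float, float]:
    """Proportion of (baseline, follow_up) pairs improving by >= each threshold."""
    gains = [improvement(domain, b, f) for b, f in pairs]
    if not gains:
        raise ValueError("no paired observations")
    return {t: sum(g >= t for g in gains) / len(gains) for t in thresholds}
```

For example, a Fatigue T-score that falls from 60 to 56 counts as a 4-point improvement, which meets the ≥ 3-point threshold but not the ≥ 5-point threshold.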

Additionally, to better assess shifts in scores, post hoc analyses evaluated mean changes in PROMIS T-scores and the proportions of patients with improvements of ≥ 3, ≥ 5, and ≥ 10 points from baseline in two patient subgroups: (1) patients with abnormal baseline T-scores (defined as > 5 points [one-half of the SD of 10] worse than the U.S. adult population norm of 50 [27]) (Supplementary Fig. 2) and (2) patients who completed infusion 8 and had available data from this visit (Supplementary Fig. 3). Change in the Pain Intensity score could not be evaluated in the subgroup of patients with abnormal baseline scores because the normative range for this score in the general population is unknown.
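The abnormal-baseline definition in subgroup (1) can be written as a direction-aware check. The sketch below is illustrative only; the function name and constants are hypothetical, and the domain directions follow the scoring conventions described under PROMIS Evaluations.

```python
POPULATION_NORM = 50.0  # U.S. adult population norm on the T-score metric
HALF_SD = 5.0           # one-half of the T-score standard deviation of 10

# Domains in which a higher T-score means better health.
HIGHER_IS_BETTER = {"Physical Function", "Social Participation"}


def is_abnormal_baseline(domain: str, baseline_t: float) -> bool:
    """True if the baseline T-score is > 5 points worse than the norm."""
    if domain in HIGHER_IS_BETTER:
        # Worse means lower for function and participation domains.
        return baseline_t < POPULATION_NORM - HALF_SD
    # Worse means higher for symptom domains.
    return baseline_t > POPULATION_NORM + HALF_SD
```

Under this definition, a score of exactly 55 (symptom domains) or exactly 45 (function domains) is not classified as abnormal, because the threshold is strictly greater than 5 points.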

Results

A total of 1270 patients were enrolled in AWARE and included in this analysis: 685 received IV golimumab, and 585 received infliximab. As previously reported, higher proportions of patients in the IV golimumab than infliximab group were female (85 vs. 80%) and received prior biologic therapy (65 vs. 57%). On average, IV golimumab-treated patients were also older, had a slightly lower BMI, and had a longer duration of RA compared with infliximab-treated patients (Table 1) [24]. Through week 52, 43% of IV golimumab- and 42% of infliximab-treated patients discontinued the study; the most common reasons were lack of effectiveness (20% for IV golimumab; 12% for infliximab) and adverse events (8% for IV golimumab; 12% for infliximab).

Table 1 Baseline demographic and disease characteristics: full study population

In the full study population, mean baseline T-scores for PROMIS Fatigue (four-item), Pain Interference (four-item), and Physical Function were worse than general U.S. population normative values in both treatment groups (Table 2, Fig. 1). Among PROMIS-29 domains, mean Depression T-scores at baseline (51.9 ± 9.8 IV golimumab, 52.5 ± 10.2 infliximab) were the most closely aligned to the general population normative values (i.e., least impacted by RA in the AWARE population) (Table 2). The PROMIS-29 Pain Intensity item (0 to 10 VAS) was similar at baseline in IV golimumab and infliximab patients (5.9 ± 2.3 and 6.1 ± 2.2, respectively).

Table 2 Baseline and mean (SD) change from baseline PROMIS-29 domain and Short Form T-scores through approximately 1 year
Fig. 1 Mean PROMIS-29 and Short Form T-scores at baseline and infusion 8 for IV golimumab and infliximab in (a) the full study population, (b) patients with baseline scores > 5 points worse than normative scores, and (c) patients who completed infusion 8 (completer population). Higher PROMIS T-scores indicate more of the domain concept being measured. Thus, for Anxiety, Depression, Fatigue, Pain Interference, and Sleep Disturbance, an increase in score represents worsening of the domain. In contrast, increasing PROMIS T-scores for Physical Function and Social Participation represent improvement

Mean Changes in PROMIS-29 T-Scores Through the Eighth Infusion

The proportion of patients with a baseline T-score > 5 points worse than normal scores (i.e., > 55 or < 45 depending on the direction of the score) ranged from 43% (Depression) to 91% (Pain Interference). Among these patients, mean T-scores for all PROMIS-29 domains and PROMIS SFs improved from baseline in both treatment groups, and the improvements were maintained through infusion 8 (approximately week 52) (Table 2). The greatest mean improvements after eight infusions of IV golimumab were observed in the four-item domains of Anxiety, Depression, Fatigue, and Pain Interference and Pain Interference SF 6b. Similar trends were observed among infliximab-treated patients across PROMIS domains and instruments with the exception of a numerically lower degree of improvement for both Pain Interference assessments and the Fatigue SF 7a. This subset of patients with abnormal baseline T-scores also demonstrated early improvements (at infusion 2) in all domain and SF scores in both treatment groups (Table 2). Mean improvements in T-scores progressively increased over time for most PROMIS-29 domains in both treatment groups. IV golimumab-treated patients reported steady and continued mean improvements throughout the approximately 1-year treatment period (from infusion 2 through infusion 8), except for Sleep Disturbance where no change was observed between infusions 5 and 8. In contrast, mean changes in T-scores for Fatigue SF 7a, Pain Interference (four-item), Pain Interference SF 6b, and Sleep Disturbance remained steady between infusions 5 and 8 in the infliximab group.

Similar trends in mean improvements in T-scores were observed in the full study population; however, these changes were smaller in magnitude compared with the population with abnormal baseline scores (Table 2). Of note, the magnitude of mean change in PROMIS T-scores from baseline in patients with abnormal scores at baseline (> 55) was more than double those in the full study population for some PROMIS domains (e.g., Anxiety, Depression, Sleep Disturbance). Additionally, the mean change in the Pain Intensity VAS for IV golimumab and infliximab was similar by infusion 8 (− 1.3 ± 2.4 and − 1.1 ± 2.5, respectively). Mean changes in T-scores among the subset of patients who completed infusion 8 (Supplementary Table 1) were consistent with those of the full study population (Table 2).

Comparison of PROMIS-29 Four-item vs. SF Instruments for Fatigue and Pain Interference

In the subset of patients with abnormal baseline scores, mean improvements from infusion 2 through infusion 8 for both the four-item PROMIS-29 and SF instruments related to Fatigue and Pain Interference were numerically greater in the IV golimumab group than the infliximab group, with the exception of the four-item Fatigue at infusion 5 (Table 2). In the Fatigue domain, the difference in mean improvement between the IV golimumab group and the infliximab group was greater when using Fatigue SF 7a (− 4.7 vs. − 3.0; LS mean difference [95% CI]: 1.93 [0.46, 3.39]) than when using the four-item Fatigue domain in the PROMIS-29 (− 5.5 vs. − 4.6; LS mean difference [95% CI]: 1.11 [− 0.61, 2.83]). For the Pain Interference domain, mean improvements from baseline at infusion 8 in the IV golimumab and infliximab groups, respectively, were − 5.1 and − 3.8 (LS mean difference [95% CI]: 1.85 [0.53, 3.16]) when using the Pain Interference SF 6b and − 5.1 and − 3.8 (LS mean difference [95% CI]: 2.18 [0.82, 3.54]) when using the four-item Pain Interference domain of PROMIS-29 (Table 2).

Proportions of Patients Achieving Improvements ≥ 3, ≥ 5, and ≥ 10 Points from Baseline

Among patients with abnormal baseline T-scores, response rates for achieving an improvement of ≥ 3 points were generally maintained or increased from infusion 2 through infusion 8 in both treatment groups across all PROMIS domains (Fig. 2). At infusion 8, the proportion of patients achieving an improvement ≥ 3 points tended to be numerically greater in the IV golimumab group compared with the infliximab group in all domains. In the IV golimumab group, the domains with the highest response rates for an improvement ≥ 3 points at infusion 8 included the four-item Anxiety, Depression, Fatigue, Pain Interference, Sleep Disturbance, and Social Participation domains (range 53.5–67.8%); the domain with the lowest response rate at infusion 8 was Physical Function (41.2%) (Fig. 2). Similar trends were observed for patients achieving improvements of ≥ 5 and ≥ 10 points from baseline (Table 3).

Fig. 2 (a–i) Proportion of patients with ≥ 3-point improvement from baseline in PROMIS-29 domains and Short Form Fatigue 7a and Pain 6b T-scores through infusion 8 among patients with baseline scores > 5 points worse than normative scores

Table 3 Proportion of patients with improvement from baseline PROMIS-29 domain and Short Form T-scores through approximately 1 year (among patients with baseline scores > 5 points worse than normative scores)

When assessed in the full study population and in the subgroup of patients who completed infusion 8 (regardless of baseline score), response rates for achieving improvements of ≥ 3, ≥ 5, and ≥ 10 points generally followed similar trends as those in patients with abnormal baseline scores, but tended to be numerically lower (Supplementary Tables 2 and 3).

Discussion

Patient-reported outcomes can be practical and accessible tools for assessing patient perspectives of disease burden, and if administered longitudinally, can help to monitor treatment effectiveness in patients with RA. In the real-world phase 4 AWARE study that evaluated IV golimumab and infliximab in patients with RA, mean PROMIS T-scores at baseline in the entire study population indicated impairment across all PROMIS domains, particularly in the four-item Fatigue, Pain Interference, and Physical Function domains. Baseline PROMIS T-scores from this analysis were generally consistent with those in other cohorts of patients with RA from real-world observational studies including 156 to 548 patients [15, 17, 20, 31]. The demonstration of the impact of RA on various aspects of HRQoL from this analysis and across previous studies [15, 17, 20, 31] clearly supports the utility of these PROMIS measures in patients with RA.

In the full study population, improvements across PROMIS-29 domains were observed at infusion 2 in both treatment groups, and generally these improvements were sustained through ~ 1 year of treatment. By infusion 8, 40–68% of IV golimumab- and infliximab-treated patients reported a clinically meaningful improvement of ≥ 3 points across PROMIS-29 domains. The prespecified analyses included all enrolled and treated patients in the AWARE study; however, some patients had PROMIS domain scores near normal or better than population normative values at baseline. Thus, a post hoc analysis of patients with impairments in baseline PROMIS T-scores (i.e., scores > 5 points worse than normal) was performed. In this population, PROMIS-29 and PROMIS SF assessments were able to detect change in patient-reported outcomes following IV golimumab and infliximab treatment, with evidence of a response as early as infusion 2. The mean change from baseline in T-score for six of the seven different PROMIS-29 domains, and the Pain Interference SF 6b and Fatigue SF 7a assessments increased steadily from the fifth through the eighth infusions of IV golimumab. For Sleep Disturbance, T-scores plateaued at infusion 5 of IV golimumab. Mean changes in T-scores for most PROMIS-29 domains were comparable between the treatment groups after approximately 1 year of therapy, with the exception of the four-item Fatigue and Pain Interference domains, wherein improvements were generally numerically greater in the IV golimumab group. Of note, the magnitude of change from baseline in the subgroup of patients with abnormal baseline T-scores was more than double that observed in the full study population for three PROMIS domains (Anxiety, Depression, Sleep Disturbance). This finding was directly related to the observation that ≥ 50% of the patients in the overall AWARE population had normal or near-normal baseline scores in these domains.

It is also noteworthy that PROMIS-29 four-item Fatigue and Pain Interference T-scores were generally similar to those determined using the longer Fatigue SF 7a and Pain Interference SF 6b instruments through infusion 8 of IV golimumab; however, the longer versions of both assessments allowed for greater precision (i.e., less variability and smaller SD) when monitoring an individual patient over time vs. group-level changes. Even at a group level, using a longer SF or the Computer Adaptive Testing (CAT) version of the PROMIS instruments [23] may provide greater statistical power to detect between-group differences. For example, in the subgroup of patients with baseline values > 55, the LS mean differences between the treatment groups in the Fatigue score at infusions 5 and 8 were larger when using the Fatigue SF 7a instrument vs. the Fatigue four-item instrument.

In a previous prospective observational study of patients with RA (with similar demographics as those in AWARE), T-score changes of 1 to 3 points generally reflected minimal changes, while changes of 3 to 7 points indicated a clinically meaningful change [15]. These findings were consistent with another analysis of data from 31 studies, including patients with RA, that found a change in T-score of 2 to 6 points representative of a minimally important change (patient level) (effect size of 0.2 to 0.6) [33]. In the AWARE population, among patients with baseline PROMIS T-scores > 5 points from the population norm, response rates for achieving improvements ≥ 3 points were generally maintained or increased from infusions 2 (26.7 to 51.2%) through 8 (41.2 to 67.8%) for IV golimumab across all domains.
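The correspondence between T-score points and effect size quoted above follows directly if changes are standardized by the reference-population SD of 10. That scaling is an assumption made here for illustration, not a statement of how [33] computed its estimates:

```latex
\[
\text{effect size} \;=\; \frac{\Delta T}{\mathrm{SD}_{\text{ref}}} \;=\; \frac{\Delta T}{10},
\qquad
\frac{2}{10} = 0.2,
\quad
\frac{6}{10} = 0.6 .
\]
```

On the same scale, the ≥ 3-point threshold used in this analysis corresponds to 0.3 SD of the general-population distribution.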

Routine use of PROMIS provides a comprehensive, validated assessment of how patients experience RA, complementing clinician-reported assessments, as the latter may not capture all aspects of RA and RA-related health that are important to patients. PROMIS measures relevant to RA (e.g., pain, fatigue, physical functioning, and emotional status), and the associated scoring system (T-score), clearly position patients along a continuum relative to the general U.S. population (i.e., worse or better relative to a T-score = 50), allowing assessment of change over time, and minimizing floor and ceiling effects of many other patient-reported outcome instruments [14]. PROMIS instruments are also responsive to change, can be completed quickly, and are publicly available. Most PROMIS-29 domains can be covered by as few as four items on either a fixed SF scale or the CAT version and require < 1 min to complete [23]. Routine use of these measures may enhance outcomes in patients with RA by ensuring patient perspectives on their health and response to therapy are quantified and captured. Furthermore, in a previous analysis of patients with RA, PROMIS scores approached or exceeded population normative values for individual domains when CDAI remission was achieved [14]. Thus, PROMIS measures may serve as a reasonable proxy for clinical disease activity measurement in RA (e.g., CDAI) in circumstances where only patient data are available and in-person assessments are curtailed (e.g., due to a global pandemic) [34]. Finally, use of PROMIS instruments allows for cross-disease comparisons to facilitate cost-effective resource allocation to maximize the health of a diverse population with a variety of chronic illnesses, not limited to RA.

The findings of this study must be interpreted in the context of its strengths and limitations. The AWARE study represents the largest prospective, real-world assessment of the safety and efficacy of IV golimumab and infliximab in patients with RA, which is a strength of this analysis. However, as would be expected for a real-world source of evidence and by design, concomitant treatments were not standardized. A high discontinuation rate in both treatment groups was also observed, and analyses were based on data only from patients continuing therapy, which may have introduced bias [35]. Using methods to account for discontinuation such as nonresponder imputation would likely result in smaller changes in the PROMIS scores. In addition, regression to the mean is a potential concern as patients are enrolled in a study at a time when they are feeling poorly and starting a new treatment [35]. However, inasmuch as this reflects the common circumstance where patients initiate a new treatment, our findings should generalize well to this setting. AWARE provides a substantial real-world RA patient database with which to assess patient-reported outcomes using PROMIS assessment tools. The combined baseline and longitudinal PROMIS data represent the largest assessment in adult patients with RA reported to date from a clinical trial using multiple PROMIS instruments and enrolling patients with RA with moderate-to-high disease activity initiating a new therapy. Accordingly, these data reflect real-world clinical practice, and this population is likely representative of the full disease spectrum of patients with RA who would initiate TNFi in clinical settings.

Conclusions

The results reported here support the use of the PROMIS assessment tools in patients with RA in the real-world setting and across multiple dimensions of HRQoL. The ability of the PROMIS instruments to detect a clinically meaningful change in RA patients’ perception of the impact of their disease (e.g., pain, fatigue, physical functioning) following treatment is an important observation. Utilization of these instruments is feasible in a real-world clinical care setting on paper or electronically, potentially both at and between office visits (e.g., via a smartphone app or patient portal). These PROMIS measures reflect change over time with effective interventions and provide a population-normalized reference. Sharing PROMIS results at the time of a clinical encounter, using data collected at or potentially even between visits, facilitates focused patient–physician interactions that address issues of importance to patients, shared decision making, and a more effective partnership between patients and providers.