Introduction

The validation of patient-reported outcome measures (PROMs) over time ultimately pertains to whether the inferences we make about changes in PROM scores are justified, and whether subsequent actions and decisions based on those inferences are well founded. Such inferences are typically based on comparisons of responses to PROMs over time. An important source of validity evidence is information on response processes [1], which are defined as “… the mechanisms that underlie what people do, think, or feel when interacting with, and responding to (PROM items)…” [2, p. 2]. In this paper, we will use the general term ‘response processes’ to refer to these mechanisms. We will use the more specific term ‘appraisal’ (i.e. a specific set of four main types of cognitive processes proposed to describe how people respond to PROM items) only when referring to the work of Rapkin and Schwartz [3, 4]. Inferences about change over time based on the repeated administration of PROMs require that we take into account that response processes may themselves change over time. There is ample evidence that respondents may not interpret and respond to PROM items in the same way at different time points as a result of health state changes [5, 6]. Changes in observed PROM results over time may signal changes in the measured target construct. They may also reflect a number of other response processes that influence response behaviour, including a change in the interpretation of the item, referred to as response shift [7]. Such response shifts may signal meaningful changes. However, if ignored, response shifts may threaten the validity of the inferences, actions, and decisions we make based on the results of these PROMs over time.

The last 25 years have witnessed the burgeoning of response shift research and a concomitant increase in the heterogeneity of conceptualizations, objectives, designs, methods, and reporting of results [8]. The Response Shift – in Sync Working Group [9] was therefore established to synthesize these diverse approaches wherever possible and desirable, based on a critical and comprehensive appraisal of the work to date. Among other topics, the Working Group focused on two interrelated ones: definitions and theoretical underpinnings [7], and operationalizations and response shift methods [10].

These two orienting works may benefit from further clarification and refinement, and they have implications for response shift research, either separately or combined, that have not yet been delineated. The aim of this paper is to advance response shift research by explicating the implications of these syntheses in an integrative way and by suggesting ways to improve the quality of future response shift studies. We aim to reach researchers in particular, but also health care providers and policy makers who are familiar with, or want to familiarize themselves with, response shift.

Implication 1: definition

The response shift definition proposed by Vanier and colleagues [7] needs further revision and clarification in relation to the definitions provided by Sprangers & Schwartz [11] and Rapkin & Schwartz [3, 4], and its implications for the interpretation of results need to be explicated.

Revision of the Vanier et al. [7] definition

The key definitions of response shift, provided by Sprangers and Schwartz [11], Rapkin and Schwartz [3, 4], and Vanier et al. [7] (based on Oort [12, 13]), are listed in Box 1. According to these definitions, measures that are particularly susceptible to response shift pertain to one’s self-evaluation [11], evaluation-based PROs [3, 4], and evaluation-based self-reports [7]. Hence, all definitions pertain to evaluations of oneself, of which PROs are a subset. We propose expanding the definition of Vanier et al. [7] to include any subjective evaluation requiring idiosyncratic criteria [3]. This would imply that response shift can also occur in proxy evaluations of other persons (e.g. patients, children) or objects (e.g. aesthetic evaluations of art). The definition can also be strengthened by including, in the definition itself, the sentence that Vanier et al. [7, p. 3316] formulated separately, namely that response shift is the consequence of ‘a change in the meaning of one’s self evaluation of a target construct’. Finally, the definition can be written more precisely, highlighting that response shift is an effect, resulting in:

“Response shift is an effect on observed change that cannot be attributed to target change because of a change in the meaning of the subjective evaluation of the target construct.”

Box 1 Previous and new response shift definitions

Note that observed change is change in the scores on the measurement instrument (e.g. a PROM), target change is change in the targeted construct or intended outcome (e.g. a PRO), and change in meaning of subjective evaluation refers to change in response processes (e.g. recalibration [11]) when responding to the items of the measurement instrument.

This definition coincides with the formal definition of response shift, where response shift is defined as a special case of a violation of the principle of conditional independence (PCI), which can be phrased in mathematical terms [7, 13]. There are many possible causes of violations of the PCI. The current definition refers to the special case where the violation of the PCI is caused by a change in the meaning of the subjective evaluation of the target construct. Only then is there response shift (see Box 1).
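To make this more concrete, the following is a minimal formal sketch; the notation is ours and merely illustrates the idea in [7, 13], rather than restating their exact formulation. Writing X for the observed item responses, η for the target construct, and t for the measurement occasion, the PCI requires that, given the target construct, the responses do not additionally depend on the occasion:

```latex
% Illustrative notation: X = observed item responses, \eta = target construct, t = occasion
\underbrace{P(X \mid \eta, t) = P(X \mid \eta)}_{\text{PCI holds}}
\qquad\text{versus}\qquad
\underbrace{P(X \mid \eta, t = 1) \neq P(X \mid \eta, t = 2)}_{\text{PCI violated}}
```

Under the definition above, a violation of the PCI counts as response shift only when it is caused by a change in the meaning of the subjective evaluation of the target construct; violations with other causes do not.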

Agreements and differences among the definitions

Building on prior work by Golembiewski et al. [14] and Howard et al. [15], Sprangers and Schwartz [11] conceptualized response shift as a change in meaning of one’s self-evaluation. Conversely, Vanier et al. [7] define response shift at the measurement level as a discrepancy between observed and target change, as a special case of violation of the PCI, if it is caused by a change in the meaning of subjective evaluation. Hence, whereas change in meaning of one’s self-evaluation is response shift according to the conceptual definition by Sprangers and Schwartz [11], it is a cause of response shift according to Vanier et al. [7]. Clearly, such a discrepancy between observed and target change may not always be caused by a change in the meaning of a subjective evaluation, but may, for example, be caused by social desirability responding or effort justification. In those cases, it will not be considered response shift, which is consistent with the conceptual definition. Hence, as discussed by Vanier et al. [7] and Sébille et al. [10], and as indicated above, a discrepancy between observed and target change is considered a necessary condition and change in meaning of subjective evaluation a sufficient condition for the occurrence of response shift. For example, within latent variable frameworks, a lack of longitudinal measurement invariance and the presence of longitudinal differential item functioning (DIF) provide evidence of PCI violation and could be considered a necessary but not sufficient condition for response shift to occur.
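As an illustrative sketch only (a common way of formalizing this in the SEM-based literature [12, 13], not a prescription of any particular method), consider a linear longitudinal measurement model for item i at occasion t:

```latex
% Illustrative longitudinal measurement model:
% \tau = item intercept, \lambda = factor loading, \eta_t = target construct at occasion t, \varepsilon = residual
X_{it} = \tau_{it} + \lambda_{it}\,\eta_t + \varepsilon_{it}, \qquad t = 1, 2
```

Longitudinal measurement invariance then corresponds to the constraints τ_i1 = τ_i2 and λ_i1 = λ_i2 (and, in stricter forms, equal residual variances). Items whose intercepts or loadings must be freed over time are commonly interpreted as showing recalibration or reprioritization/reconceptualization, respectively [12, 13], but such non-invariance by itself only evidences the PCI violation; attributing it to a change in meaning of the subjective evaluation requires additional evidence.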

The response shift definitions provided by Rapkin & Schwartz [3, 4] and Vanier et al. [7] are comparable in that they both refer to a discrepancy between observed and expected change [3, 4] or target change [7] that is caused by changes in appraisal [3, 4] or changes in meaning of subjective evaluation [7]. The major difference is that the definition of Rapkin & Schwartz [3, 4] refers to an empirical study in which expected change depends on the variables measured and change in meaning is assessed with a particular appraisal measure. Conversely, Vanier et al. [7] employ a formal definition based on violation of the PCI, which is applicable to any method that can detect such discrepancies and in which change in meaning can be assessed in multiple ways.

Situations might occur where a change in meaning of the subjective evaluation does not cause a discrepancy between observed and expected [3, 4] or target [7] change. Whether plausible or empirically detectable, this situation would still be considered response shift according to the conceptual definition [11], but not according to the two definitions by Rapkin & Schwartz [3, 4] and Vanier et al. [7] where these discrepancies are the conditio sine qua non.

We suggest a possible way to reconcile the two previous definitions [3, 4, 11] and the current definition: how response shift occurs (i.e. via changes in response processes, e.g. recalibration) may take place in respondents’ minds, but may be revealed in the violation of the PCI when there is a discrepancy between target and observed change caused by a change in the meaning of the subjective evaluation of the target construct.

Implication 2: theory

The theoretical model proposed by Vanier et al. [7] requires further explanation in relation to previous theoretical models [3, 11], and its implications for formulating research objectives need to be highlighted.

Recalibration, reprioritization, and reconceptualization

In the conceptual definition of response shift [11], a change in the meaning of one’s self-evaluation of a target construct results from recalibration, reprioritization, and/or reconceptualization. These were also referred to as the three types of response shift and were thus conceived as the causes of the change in meaning, i.e. the why. The definitions of Rapkin and Schwartz [3, 4] and Vanier et al. [7] do not explicitly include these three types of response shift.

Rapkin and Schwartz [3] do not depict the three types of response shift in their model but rather intend their appraisal measures to operationalize them. Changes in appraisal may indicate a change in standards of comparison (recalibration), combinatory algorithm and/or sampling strategy (reprioritization), and frame of reference (reconceptualization). Vanier et al. [7] did include the three types of response shift in their theoretical model, but removed them from the ‘why’ (i.e. why response shift can occur) and subsumed them under the ‘how’, i.e. how response shift, or the violation of the PCI due to change in meaning, can occur. Hence, the relevant question would be ‘How does response shift occur?’ and the answers could be “Through recalibrating the scale, reprioritizing the relative importance of the components constituting the target construct, and/or reconceptualizing the meaning of the target construct itself”. These ways through which response shift can occur are not exhaustive, as response shift can also occur via other response processes that induce or reflect change in meaning of the subjective evaluation, e.g. changes in response selection to normalize health state change [5, 6, 16].

In Vanier et al.’s [7] theoretical model, the ‘why’, or the explanation of the violation of the PCI due to change in meaning, needs to be sought in a broader account of why people react to changing conditions, as these reactions are the underlying cause of response shift. Hence, all possible theories explaining such reactions are subsumed under the ‘why’, e.g. theories on cognitive homeostasis, set points, meaning making, or regaining control. This distinction is particularly important as it allows for linkages with relevant fields, such as health psychology theories and approaches that are critical to advancing response shift research [17].

Adaptation and response shift

The response shift theoretical model proposed by Vanier et al. [7] can further provide insights as to why adaptation to changing health and response shift are distinct concepts, despite the fact that they are frequently confused or used interchangeably in the literature. Here we view adaptation as the lay term for what we have defined as mechanisms, i.e. behavioural, cognitive, and affective processes to accommodate health state change [7, 11]. Hence, adaptation may refer to any variable subsumed under mechanisms, e.g. coping, social comparisons, or spiritual engagement.

The theoretical model [7, p. 3317] shows that adaptation to changing health may induce response shift only if this mechanism has an additional effect on observed change that cannot be explained by its effect on target change, due to change in meaning. This happens when adaptation not only affects the responses to the PROM at follow-up through its influence on the target construct at follow-up, but also directly (i.e. through path M2 [7, p. 3317]). Consequently, the general distinction is that response shift is an effect that may be caused by adaptation. It should be noted that adaptation can take place without inducing response shift. This is the case when adaptation influences the level of the target construct and, through this influence, indirectly the responses to the PROM at follow-up (i.e. through paths M1 and TC2) [7, p. 3317]. For example, seeking pain medication may help an individual to experience less pain over time, or taking a course in mindfulness may help a person to better cope with debilitating fatigue, without affecting the meaning of the response scales of pain and fatigue, respectively. Response shift can also occur when the catalyst directly affects the responses to the measure at follow-up (i.e. through path C3) [7, p. 3317], for example when the catalyst is an acute and severe shock, such as a car accident or emergency surgery. Hence, adaptation and response shift are distinct phenomena and they do not need to co-occur.
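The distinction can be made tangible with a small numeric sketch. The simulation below is purely illustrative (hypothetical variable names and effect sizes, loosely mirroring the M1/TC2 and M2 routes described above; it is not an implementation of the model in [7]): when adaptation works only through the target construct, observed change is fully accounted for by target change, whereas an additional direct effect of adaptation on the follow-up responses produces observed change that target change cannot explain.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical baseline target construct (e.g. fatigue burden) and adaptation effort
target_t1 = rng.normal(0.0, 1.0, n)
adaptation = rng.normal(0.5, 0.2, n)  # e.g. coping, seeking treatment

# Adaptation improves the target construct at follow-up (cf. path M1)
target_t2 = target_t1 - 0.8 * adaptation


def observed_score(target, adaptation, direct_effect):
    """Illustrative PROM score: driven by the target construct (cf. path TC2)
    and, when direct_effect != 0, also directly by adaptation (cf. path M2)."""
    noise = rng.normal(0.0, 0.3, target.shape[0])
    return target + direct_effect * adaptation + noise


for label, m2_effect in [("no response shift (direct effect = 0)", 0.0),
                         ("response shift (direct effect = -0.5)", -0.5)]:
    obs_t1 = observed_score(target_t1, adaptation, 0.0)  # no direct effect at baseline
    obs_t2 = observed_score(target_t2, adaptation, m2_effect)
    observed_change = (obs_t2 - obs_t1).mean()
    target_change = (target_t2 - target_t1).mean()
    print(f"{label}: observed change = {observed_change:+.2f}, "
          f"target change = {target_change:+.2f}, "
          f"not attributable to target change = {observed_change - target_change:+.2f}")
```

In the first scenario adaptation merely improves the target construct, so observed and target change coincide (no response shift); in the second, the residual discrepancy is the kind of effect that, if caused by a change in meaning, would be labelled response shift.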

An implication of this distinction is the need to be careful and precise about the objective of an empirical study. For example, is one primarily interested in the influence of adaptation to illness on changes in the level of pain or fatigue, or in detecting response shift in the assessment of pain or fatigue over time? One should keep in mind that these two objectives are not mutually exclusive. If one adopts our definition and model, in which response shift is a possible effect of adaptation (i.e. mechanisms) on observed change that cannot be attributed to target change, due to change in meaning, another implication would be to avoid referring to interventions or treatments as designed to induce a positive response shift. Rather, such interventions or treatments are meant to stimulate adaptation, which in turn may or may not cause response shift at the measurement level.

Implication 3: methods

The proposed list of alternative explanations per method in Sébille et al. [10] would benefit from extension, and further implications for response shift detection and explanation need to be delineated. The diversity of the methods also has implications for response shift research that warrant attention.

Change in meaning of subjective evaluation

The main finding of the review by Sébille et al. [10] was that, for all methods, response shift results cannot be accepted at face value and steps need to be taken to rule out alternative explanations or make them less likely. Vanier et al. [7] have provided a list of phenomena that are related to but distinct from response shift, which may influence responses to PROMs. Some of these can be considered alternative explanations of response shift and may be applicable to a range of methods (Table 1 [7, pp. 3312–5]). Sébille and colleagues [10] have listed the major alternative explanations for each method specifically. We take our earlier work a step further by listing additional alternative explanations for each method, without claiming to be exhaustive (Table 1). One additional alternative explanation merits particular attention. According to all three definitions, the key to response shift is that a change in meaning of a subjective evaluation is at stake [3, 7, 11]. This implies that response shift methods should be able to detect a change in meaning. However, none of the extant quantitative methods are able to unequivocally attribute their results to change in the meaning of subjective evaluations [10]. From this perspective, all quantitative methods provide the necessary but not the sufficient conditions for response shift. An exception would be the appraisal method, as it directly targets change in meaning of subjective evaluations. However, it is doubtful whether the derivatives of the original Quality of Life Appraisal Profile, version 2 [18] and the Brief Appraisal Inventory [19], as employed, are able to assess appraisal, as these measures conflate appraisal of quality of life (QoL) with QoL itself and adaptation [10, 20]. Moreover, it is unlikely that an appraisal measure administered at the end of an entire set of various questionnaires would be able to assess the response processes underlying all of those items [5, 6]. Further research, particularly qualitative research, on understanding (causes of) appraisal or, more generally, response processes is needed [21, 22].

Table 1 Exploring and making alternative explanations less plausible per response shift method

Exploring alternative explanations

For each method separately, we have further expanded our earlier work by providing ways to assess the plausibility that results are caused by change in meaning, exploring empirically the possible influence of all the other listed alternative explanations, and making these less likely by design or analysis where possible (Table 1). These actions are needed to make the conclusion that response shift may have occurred (more) plausible. However, additional validity evidence (e.g. qualitative research) and theoretical and/or clinical support for the interpretation of the results are also required to confirm that the results can indeed be attributed to response shift [2, 23,24,25].

One should be aware that, in principle, we cannot know whether our results reflect the presence or absence of response shift. There are logically four possible situations, based on whether response shift is present or absent and whether the methods have detected response shift or not. Clearly, particularly in the case of false positives and false negatives, alternative explanations or rebuttal arguments [23] would help clarify these results. We are therefore obliged to explore the possible influence of alternative explanations in all empirical studies.

Different purposes

The response shift methods are diverse and range from qualitative and individualized methods to design and statistical approaches [8, 10]. Given their variety, they may be useful for different purposes. A helpful distinction might be between methods that detect response shift and methods that investigate (components of) response shift theory or, for short, aim to explain response shift.

If we want to detect response shift, then we need to operationalize the constructs that feature in the definition, and their interrelationships. We then would favour methods that are optimal for investigating violations of the PCI or discrepancies between observed and target change. Again, we need to check whether change in the meaning of people’s responses to PROM questions (i.e. subjective evaluation) is the cause of the discrepancies between observed and target change. In other words, finding their cause is part of response shift detection. Detection also concerns the scale or magnitude of the findings. This includes methods that are able to generate effect size estimates and methods that classify people as having undergone response shift or not. Finally, detection also refers to the ability to assess change over time while adjusting for response shift effects [10, Table 1]. This is key when the aim of the study is not only to detect response shift but also to assess change in the level of the target construct, adjusted for response shift.
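For intuition only, and reusing the illustrative single-item measurement model sketched earlier (our notation; not a statement about any specific method), the expected observed change can be separated into a part carried by the target construct and a residual part reflecting changed measurement parameters:

```latex
% Illustrative decomposition of expected observed change (single item, two occasions)
\mathbb{E}(X_2 - X_1)
  = \underbrace{\lambda_1\,[\mathbb{E}(\eta_2) - \mathbb{E}(\eta_1)]}_{\text{attributable to target change}}
  + \underbrace{(\tau_2 - \tau_1) + (\lambda_2 - \lambda_1)\,\mathbb{E}(\eta_2)}_{\text{changed measurement parameters (potential response shift)}}
```

Assessing change adjusted for response shift then amounts to reporting the first term (or its analogue in the chosen method); the second term is interpretable as response shift only once a change in meaning has been made plausible.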

If we want to explain response shift, we need to investigate (specific parts of) response shift theory and operationalize the constructs that feature in the theory, and their interrelationships. Based on the theoretical model proposed by Vanier and colleagues [7] we distinguish its components (i.e. target construct, catalyst, antecedents, and mechanisms), as well as “how” and “why” response shift occurs. This would imply that explanation encompasses how response shift occurs, why response shift occurs at study level via explanatory variables, and why response shift occurs at a more abstract, theoretical level by considering the underlying theories explaining the main principles behind response shift. In other words, finding the cause of response shift is part of response shift explanation. We therefore would favour methods that reveal how response shift can occur (i.e. through changes in response processes related to change in meaning of the subjective evaluation, including but not restricted to recalibration, reprioritization, and reconceptualization) and why response shift can occur by, for example, including explanatory variables (e.g. antecedents or mechanisms) in their models or conducting qualitative interviews.

In Table 2, the methods are classified according to these two, not mutually exclusive, dimensions: response shift detection versus explanation. Here we include the extant methods and analytical practices used in response shift research. As can be seen, a number of methods can be used to both detect and explain response shift. Depending on whether a study aims at detecting/quantifying or explaining response shift, a method in the respective area can be chosen. If a study targets both objectives, then a method combining both should be preferred.

Table 2 Classification of methods according to their ability to detect and explain response shift

Implication 4: future research

The need to enhance the quality and reporting of response shift studies has implications for future research.

Quality and reporting of response shift research

There is a need to enhance the quality of response shift research, which may entail the following components. First, the study aims need to be explicitly defined to inform the study design. For example, a study may aim to detect response shift and/or to assess change adjusted for the possible occurrence of response shift. Prior to embarking on such a response shift study, one should think carefully ahead about how likely it is that response shift will occur. Previous studies indicate that detection of response shift does not automatically imply that it affects the assessment of group-level change [26, 27]. In those cases, one would need to weigh the value of possibly finding response shift against the extra effort needed to find it (i.e. designing the study, collecting data, analysing, and reporting). If the balance tends to be negative, then it might be better not to assess response shift. If the aim is to investigate (parts of) response shift theory or explain response shift per se, then one would also need to think carefully ahead to conscientiously design the study (e.g. timing of assessments, collecting the requisite data, using the appropriate methods) such that response shift can be demonstrated if it is present. In other words, researchers are advised to do it well or not at all.

Second, we would like to highlight that even when a researcher is not interested in explaining response shift per se, but rather aims to assess change adjusted for response shift, the response shift itself can point to meaningful changes. Similarly, Zumbo [28] advocated an explanation-focused approach to DIF, aimed at explaining why DIF has occurred. In other words, response shift itself may reveal meaningful information in any type of study.

Third, there is a need for intentional use of different response shift methods, dependent on the research objective and context. Methods can focus on detection or explanation, on standardization or exploration, and can adopt a nomothetic (i.e. focused on populations or groups of people) or an idiographic (i.e. focused on individual differences) orientation. These approaches are all needed to advance response shift research. Moreover, applying different methods to the same data where possible, rather than using a single approach, would avoid overconfidence in the results and ‘model myopia’ [29].

Fourth, one class of methods, however, may need to be used more frequently: qualitative methods. As indicated before, none of the extant quantitative methods can unequivocally ascribe their results to change in meaning of the self-evaluation as a cause of discrepancies between observed and expected or target change. As indicated in Table 1, qualitative interviews are recommended alongside the quantitative methods to provide insight into response processes in relation to response shift [30, 31]. Particularly cognitive interviewing and think-aloud methods may shed light on how respondents interpret and respond to PROM items and whether the underlying response processes remain equivalent over time [6, 31]. Qualitative research is also needed to develop tools for measuring changes in the meaning of subjective evaluations or changes in response processes, which can be used across studies to enhance cross-study comparability. Such measures may include an interview protocol that is applicable to a range of studies or quantitative measures that ideally could be used as an explanatory variable in a statistical model, which most quantitative methods allow [10, Table 1, pp. 3328–32] (see also Table 2 of the current paper). However, construction of such a quantitative measure is far from straightforward, if possible at all, given the concerns raised about the appraisal measures [20]. Finally, qualitative methods may play a pivotal role in ongoing theoretical development.

Fifth, whereas most response shift research is practical and empirically focused, researchers are encouraged to ground their studies in a theoretical framework. For example, the theoretical model provided by Vanier et al. [7], based on explicated assumptions related to ontology (what is response shift?) and epistemology (how do we learn about response shift and how is it different from other phenomena?), may be useful. To stimulate empirical research, Vanier and colleagues [7, electronic appendix] have provided some examples of how to empirically test (parts of) their response shift model. Other theoretical frameworks can be used, including those of Rapkin and Schwartz [3, 4] and Oort et al. [32].

Sixth, to enhance the quality of research into response shift and to safeguard against false-positive findings and publication bias, we would like to encourage researchers to use pre-registration [33] or registered reports [34, 35]. Both formats distinguish between prediction and postdiction. Whereas pre-registration entails posting the protocol and analysis plan to an independent registry (timestamped before the analyses commence), a registered report is a paper accepted before the start of data collection, focusing on the study’s theoretical foundation and a prospectively planned research protocol, including methods and analysis plan [34]. The subsequent paper, including the results, will be accepted provided that the protocol was followed (or deviations are justified) and the conclusions are sound. The results themselves (i.e. significant or not) will not affect the final editorial decision [34, 35].

Seventh, the quality of the reporting of studies may benefit from improvement. The work on the synthesis of the quantitative response shift research [9] was hindered by the many studies that did not provide the requisite data to enable such a synthesis. In addition, in some studies the operationalization of response shift was ambiguous (e.g. due to conflation with measuring adaptation). A list of reporting recommendations, based on a Delphi study and endorsed by all stakeholders, including editorial boards, may be helpful. To avoid overly restrictive reporting recommendations, their purpose would need to be made explicit. For example, reporting recommendations may differ for studies aimed at detecting, explaining, or understanding response shift (e.g. qualitative studies).

Last but not least, the planning of future studies and a future research agenda may benefit from being co-led by people living with the particular condition, carers, and other stakeholders. Patients and other stakeholders provide unique insights from their perspective that may ensure that the topics of greatest importance are advanced [36]. Moreover, such engagement may enhance the quality of the study, including, for example, the study design, outcome selection, patient recruitment strategies, patient enrolment rates, and the credibility of the findings [37]. Finally, integrating diverse contributions would yield “results that go beyond the ‘average treatment effects’” as they are pertinent to specific groups of patients [36, p. 1588].

Epilogue

We consider response shift itself to provide meaningful information that improves our understanding of change over time in PROs. The key point is that the inferences, decisions, and actions we base on longitudinal PROM data must take into account the possibility that measurements of change over time may be influenced by response shift. However, a repeated finding is that response shift is not a self-evident or easy-to-understand term, and many researchers attach different meanings to it. With the Response Shift – in Sync initiative, we aimed to contribute to a common language regarding definition, theory, and methods. By further elucidating and specifying this work, we hope to advance response shift research. We also intended to delineate the logical or possible implications of our earlier work. Clearly, the current work is part of an ongoing process in which the sketched implications are open to debate and further improvement.

Although we believe a common framework could be helpful, our goal is to promote the development and testing of theoretical frameworks and methods. We intentionally have not promoted one type of response shift method over another, as favouring one method would devalue other approaches. We believe all response shift approaches are needed to advance response shift research.