Numerous stakeholders recommend advance care planning (ACP) to improve the quality of care that patients receive as they approach the end-of-life.1,2,3 Hospitalizations offer one opportunity for clinicians to initiate ACP conversations with patients.4 However, high-quality conversations, which allow patients to reveal (or potentially construct) their preferences, require clinicians to have the communication skills, the willingness to engage in emotionally complex interactions, and the time necessary to facilitate this process.5,6,7,8 Guidelines suggest screening patients to prioritize those with near-term mortality or morbidity risk based on the presence of “serious illness,” defined as the presence of a condition that carries a high risk of mortality or impacts quality of life.4 In the absence of a quantifiable definition of this term, the surprise question (which requires the treating clinician to consider whether or not he/she would be surprised if the patient died in the next year) has been widely promoted.9, 10 Pooled results of two different meta-analyses, however, suggest poor to modest accuracy of the surprise question for predicting death at 12 months.10, 11 Efforts to improve the quality of care for patients at the end-of-life therefore require better strategies to screen and prioritize patients for ACP conversations.

The objective of this study was to establish a consensus-based normative standard for risk of mortality that should prompt hospitalists to have an ACP conversation with their patients. Recognizing that people, even experts, struggle with probability-based judgments, we embedded a behavioral experiment within a Delphi process, sequentially presenting experts with cases selected from across the distribution of mortality risk and observing their judgments as the sampling frame changed. We hypothesized that experts would be more likely to recommend an immediate ACP conversation as the risk of mortality increased.

MATERIALS AND METHODS

Study Design

We used a modified Delphi method to establish an empirical standard for the threshold of mortality at which hospitalized patients should have ACP conversations. In an area that lacks certainty, the Delphi method uses multiple rounds of structured feedback to achieve consensus.12,13,14,15 We began with the consensus definition of ACP developed by Sudore et al., as “a process that supports adults, at any age or stage of health in understanding and sharing their personal values, life goals, and preferences regarding future medical care.”16 We conducted three rounds of surveys from March 2019 to June 2019, using Qualtrics Survey Software (Provo, UT) to collect anonymous responses to an iterative series of questions. Questions focused on establishing a threshold for the risk of mortality (short-term and 1-year) at which > 70% of clinicians agreed that hospitalized patients would benefit from ACP conversations, and used both case vignettes and explicit questions about prognosis to elicit these judgments.17 The Dartmouth Institutional Review Board reviewed and approved the study (IRB# 31186).

Expert Panel

We identified a multi-disciplinary group of clinicians with academic and practical expertise in ACP to participate in the Delphi process (see Appendix for details of sampling frame). We decided not to include patients, surrogates, or non-expert physicians on the panel because our objective was to establish a clinical standard for screening patients.

We contacted candidate panelists through email, explained the purpose and design of the study, and offered a wage-based, fixed, monetary honorarium (paid through a link, embedded in the email invitation, to a $25 Amazon gift card). Clinicians who completed a round were invited to participate in subsequent rounds. We provided an additional $25 honorarium to those who completed all three rounds of surveys.

Instrument Design

We used three methods to generate consensus among panelists. First, we selected cases for each round to probe areas of controversy. Second, we provided aggregated feedback to panelists after each round so that they could consider and either incorporate or reject that information into their personal judgment. Third, we became increasingly transparent about the design of the study with each round. Once we developed the survey, we beta-tested it with 13 clinicians and made iterative modifications to minimize respondent burden and maximize clarity.

Demographic and Personal Characteristics

The instrument included 11 items designed to capture the demographic and personal characteristics (e.g., age, sex, training) of the respondent. It also included items specific to the respondent’s ACP practices, including their source of expertise and the number of ACP conversations held per week.

Elicitation of Mortality Threshold for Patient Selection

Given the observation that people, even experts, lack insight into their own cognitive processes,17 we used two methods to establish the mortality threshold at which expert clinicians would recommend an ACP conversation: case vignettes to observe the use of the threshold in practice (see Table 1 and the Appendix for details of development) and direct elicitation of judgment.

Table 1 Description of Case Vignettes, Stratified by Risk of Short-term, 1-Year Mortality (with Risk Calculators), and Presence or Absence of Dementia

Indirect Elicitation Via Case Vignettes

After presenting each vignette, we asked panelists whether or not they believed an ACP conversation was indicated since no prior planning had occurred. Those who replied affirmatively were then asked to rate the priority of the conversation: (1) the patient’s primary care physician should have it in the outpatient setting; (2) the hospitalist should have it before discharge; (3) the hospitalist should have it that day. In response to feedback from panelists, after round 1, we further refined the description of the Likert options by changing the description of the current day option to “the treating hospitalist should have a “goals of care” conversation that day” (defined as discussions around decisions about near-term treatment choices and the intensity of care) with the other response options unchanged.
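The consensus rule from the Study Design section (agreement among > 70% of panelists) can be applied to the response options above with a few lines of code. The following is a minimal sketch rather than the study's actual analysis code: the function name, the option labels, and the response counts are all hypothetical, and applying the rule to the single most common option is an assumption.

```python
from collections import Counter

# Consensus requires agreement strictly greater than 70% (from Study Design).
CONSENSUS_THRESHOLD = 0.70


def consensus_option(responses, threshold=CONSENSUS_THRESHOLD):
    """Return the option chosen by more than `threshold` of panelists, else None.

    `responses` is a list with one label per panelist (hypothetical labels).
    """
    if not responses:
        return None
    option, count = Counter(responses).most_common(1)[0]
    return option if count / len(responses) > threshold else None


# Hypothetical responses for one vignette (not study data).
example = ["same_day"] * 40 + ["before_discharge"] * 10 + ["outpatient"] * 7
print(consensus_option(example))
```

Because the threshold is strict, a vignette on which exactly 70% of panelists agree would not count as consensus under this sketch.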

Round 1

Round 1 included 9 vignettes: 8 experimental stimuli plus one distractor case that we included as a check on the validity of the task and the panelists’ attention. We categorized mortality as low or high (short-term: ≤ 10% or > 10%; 1-year mortality: < 34% or ≥ 34%), and used a 3 × 2 factorial design to select cases, systematically varying the short-term and 1-year mortality as well as the presence or absence of dementia. Finally, we asked panelists an open-ended question about what drove their decision making for each case.
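To make the structure of the round 1 design concrete, the sketch below enumerates the cells obtained by crossing the three factors described above. It assumes that each factor has two levels, which is consistent with the 8 experimental stimuli; the variable names and level labels are illustrative only.

```python
from itertools import product

# Three two-level factors (assumed reading of the design; labels are illustrative).
factors = {
    "short_term_mortality": ("low (<= 10%)", "high (> 10%)"),
    "one_year_mortality": ("low (< 34%)", "high (>= 34%)"),
    "dementia": ("absent", "present"),
}


def design_cells(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = design_cells(factors)
for index, cell in enumerate(cells, start=1):
    print(index, cell)
```

Each printed cell corresponds to one experimental vignette; the distractor case sits outside this grid.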

Round 2

Round 2 included 4 vignettes, selected to represent mortalities in the middle of each distribution. Given concerns about the burden imposed on respondents by the 3 × 2 factorial design, we dropped dementia as a predictor of panelist recommendations.

Round 3

Round 3 included 5 vignettes. Based on responses in round 2, we further stratified the lower 1-year mortality group into two categories: low mortality: < 19%; intermediate mortality: 19–33%. We also dropped high mortality risk from the factorial design.
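The risk categories used across rounds can be written as simple classification rules. The sketch below is illustrative: the function names are hypothetical, probabilities are expressed as fractions between 0 and 1, and values between 33% and 34% (which the stated cut-points leave unassigned) are treated as intermediate by assumption.

```python
def _check_probability(p):
    """Raise ValueError unless p is a probability between 0 and 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must be between 0 and 1, got {p}")


def classify_short_term(p):
    """Short-term mortality: low if <= 10%, otherwise high."""
    _check_probability(p)
    return "low" if p <= 0.10 else "high"


def classify_one_year(p):
    """1-year mortality: low (< 19%), intermediate (19-33%), high (>= 34%).

    Assumption: values in the 33-34% gap are labelled intermediate.
    """
    _check_probability(p)
    if p < 0.19:
        return "low"
    if p < 0.34:
        return "intermediate"
    return "high"


# Hypothetical patient: 8% short-term risk, 25% 1-year risk.
print(classify_short_term(0.08), classify_one_year(0.25))
```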

Direct Elicitation

In rounds 2 and 3, we asked panelists to state the short-term and 1-year risk of mortality and the risk of failure to return to baseline physical and cognitive functioning that would prompt them to engage in a goals of care conversation with a hospitalized patient, using a slider to select the exact value between 0 and 100%. Finally, we included the surprise question, “would you be surprised if the patient died in the next year?,” after each case, to further elucidate the calibration of clinician prognostication about mortality. We did not include these questions in round 1 to avoid priming panelists.

Other Determinants of ACP Conversations

To ensure that we captured other potential determinants of panelists’ screening decisions, we included an open-ended question about advice the respondent would offer to trainees on how to prioritize ACP conversations in the hospital in round 1.

Statistical Analyses

We summarized demographic and personal characteristics of panelists using counts (percentages) and means (standard deviations) as appropriate. We summarized responses to the case vignettes, using proportions, and the direct elicitation slider questions, using medians (inter-quartile ranges). We used Spearman’s correlation coefficient to test the correlation between responses to the surprise question and the predicted probability of 1-year mortality.

Predicted probability of 1-year mortality = (probability of inpatient mortality, based on a disease-specific risk calculator) × (probability of 1-year post-hospitalization mortality conditional on survival to discharge, based on the Walter score) (1)
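As a worked illustration of Equation 1, the sketch below transcribes the formula literally as it is stated above. The function name and the input values are hypothetical; in the study, the inputs came from disease-specific risk calculators and the Walter score, which are not reproduced here.

```python
def predicted_one_year_mortality(p_inpatient, p_post_discharge):
    """Literal transcription of Equation 1 as stated in the text.

    p_inpatient: probability of inpatient mortality (fraction, 0 to 1).
    p_post_discharge: probability of 1-year post-hospitalization mortality,
        conditional on survival to discharge (fraction, 0 to 1).
    """
    for name, p in (("p_inpatient", p_inpatient),
                    ("p_post_discharge", p_post_discharge)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {p}")
    return p_inpatient * p_post_discharge


# Hypothetical inputs (not study data): 20% inpatient risk, 50% conditional risk.
risk = predicted_one_year_mortality(0.20, 0.50)
print(f"{risk:.2f}")
```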

To assess other determinants of panelists’ screening decisions, one author (OS) coded all the responses to the open-ended questions, using content analysis to identify dominant themes.
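The quantitative summaries described in this section (medians with inter-quartile ranges for the slider questions, and Spearman’s correlation for the surprise question) can be sketched with the Python standard library alone. This is an illustrative implementation, not the software used in the study: the helper names, the choice of the "inclusive" quartile method, and all example values are assumptions.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, (first quartile, third quartile)) of the values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical slider responses (percent) and surprise-question data.
print(median_iqr([20, 30, 40, 50, 60]))
not_surprised = [0, 0, 1, 0, 1, 1]            # 1 = would NOT be surprised
predicted_risk = [0.05, 0.12, 0.25, 0.30, 0.40, 0.55]
print(round(spearman(not_surprised, predicted_risk), 2))
```

Average ranks are used so that the many ties produced by a yes/no surprise question are handled in the conventional way.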

RESULTS

Baseline Characteristics

Of the 108 clinicians who were successfully emailed with an invitation to participate in the study, 57 (52%) completed round 1 of the Delphi process (Fig. 1). Of these initial respondents, 47 (82%) completed rounds 2 and 3. The mean age of study participants was 48.5 years (SD 9.9) and 31 (54%) were male. The majority were physicians (84%), working in the acute care setting (73%), with a mean of 23 (SD 9.8) years since the completion of medical school. Of the clinicians who participated in the Delphi process, the majority (55%) conducted research in ACP and were educators (55%) in the field. A summary of the participants’ characteristics is provided in Table 2.

Figure 1

Sampling frame for the study. We show recruitment and retention of panelists by survey round.

Table 2 Characteristics of Delphi Panelists

Elicitation of Mortality Threshold for ACP Conversations

Indirect Elicitation Via Case Vignettes

Panelists did not reach consensus on how to treat the case of the 54-year-old otherwise healthy patient with diverticulitis, included to check both their attention and the validity of the task. However, the heterogeneity of responses suggested that (a) panelists were paying attention and (b) norms around the desirability of ACP did not extend indiscriminately to all patients, affirming the validity of the task.

Among cases of patients 65 years or older, panelists immediately agreed that ACP should occur in the hospital. They did not set a threshold for mortality at which hospitalists should engage in ACP conversations. However, they did vary in their recommendations about the timing and content of those conversations (see Fig. 2), which therefore became the focus of the analysis.

Figure 2

Frequency of ACP recommendations by case. We show the frequency of the responses to the case vignettes, organized by the probability of mortality (short-term/1-year) and by round. We categorized short-term mortality as low (≤ 10%) or high (> 10%), and 1-year mortality as low (< 19%), intermediate (19–33%), and high (> 33%). Cases describing patients with dementia are denoted by a (D). Dashed lines indicate similar groups of cases. As the survey progressed, consensus on the timing and type of conversation required was achieved for all except cases with low/intermediate risk of death.

Round 1

Consensus on the timing of ACP conversations occurred only for the sickest patients, where panelists felt conversations should occur immediately. For the remainder, one-third to one-half of panelists opted to have the conversation immediately, while the others thought it could occur before discharge. The presence of dementia increased the priority that panelists assigned to having these conversations (see Fig. 2). Many participants noted potential ambiguity in the term “advance care planning,” wondering if a distinction existed between “advance care planning” and “goals of care” conversations.

Responses to the open-ended questions following each case in round 1, which asked about the clinical characteristics that informed panelists’ decisions about whether or not to recommend an ACP conversation, further affirmed the validity of the instrument (see Table 3). Panelists repeatedly referenced three themes: the age of the patient, the presence of co-morbidities influencing long-term function (dementia) and survival (cancer), and the risk of imminent death or decompensation (e.g., sepsis, respiratory failure).

Table 3 Determinants of Panelists’ ACP Recommendations for Case Vignettes

Round 2

In round 2, we explicitly differentiated between “goals of care” conversations, conducted immediately to inform near-term treatment decisions, ACP conversations conducted as part of discharge planning to inform post-acute care, and ACP conversations conducted by the primary care physician to establish preferences for future medical care. There was continued agreement that a conversation (either ACP or goals of care) needed to happen during the admission regardless of mortality risk.

Round 3

Based on responses to round 2, we further probed the role that short-term and 1-year mortality played in determining the timing of the ACP conversation, sampling cases to highlight differences among cases. In round 3, > 70% of panelists agreed that patients with high (> 10%) short-term mortality risk or high (≥ 34%) long-term risk of mortality should have a goals of care conversation. Panelists also agreed that patients ≥ 65 years old who had low (≤ 10%) short-term and low (< 19%) 1-year risk of mortality warranted an ACP rather than a goals of care conversation. However, they disagreed about the case where the patient had a low risk of short-term mortality and an intermediate (19–33%) risk of 1-year mortality (see Fig. 2).

Direct Elicitation

In rounds 2 and 3, we also probed panelists for their opinion on the exact probability of short-term and 1-year mortality that should prompt a goals of care conversation, using sliders to elicit a value between 0 and 100%. In round 2, panelists stated that they would engage in a goals of care conversation if the risk of short-term mortality was higher than 38% (IQR 20–50%), or the risk of 1-year mortality was higher than 25% (IQR 16–50%). In round 3, those assessments shifted to 30% (IQR 20–40%) or 30% (IQR 20–50%) respectively. Similarly, panelists recommended engaging in a goals of care conversation immediately only if the risk of failure to return to baseline function was relatively high (physical function: 40% [IQR 25–50%]; cognitive function: 30% [IQR 20–50%]). Assessments provided in round 3 were similar (physical function: 35% [IQR 30–50%]; cognitive function: 30% [IQR 20–40%]).

There was heterogeneity in physician responses to the surprise question (i.e., would you be surprised if the patient were dead at 1 year), with only moderate correlation (r = 0.34) to the predicted probability that the patient would be dead at 1 year: 58% of physicians would be surprised if patients with low (< 19%) mortality risk were dead at 1 year; 31% if patients with intermediate (19–33%) mortality risk were dead at 1 year; and 9% if patients with high (≥ 34%) mortality risk were dead at 1 year.

DISCUSSION

We conducted a modified Delphi study with an embedded behavioral experiment to establish a normative standard for the risk of mortality that should prompt clinicians to prioritize ACP conversations in the hospital. Instead of setting a threshold for the mortality that characterizes the “seriously ill,” as we had expected, a multi-disciplinary group of experts instead recommended that clinicians engage in ACP conversations with all hospitalized patients over the age of 65. For those with low risk of short-term and 1-year mortality, they recommended having the conversation before discharge, focusing on preferences for future medical care. However, for those with either high short-term or 1-year mortality, they recommended having the conversation immediately to ensure alignment between treatment and goals of care.

These observations have important policy implications. ACP is an integral part of the National Academy of Medicine’s objective of ensuring that patients receive person-centered, family-oriented, and evidence-based care, particularly at the end-of-life.1 Existing guidelines have therefore advocated that clinicians use the opportunity of hospitalization to initiate these conversations, screening for the presence of “serious illness” to decide whether or not to have an ACP conversation.4 Mortality risk is central to the definition of “serious illness.”18 However, our results demonstrate the difficulty that clinicians, even experts, have prioritizing based on the risk of mortality, except for the very highest risk patients. We found only moderate correlation between responses to the surprise question and calculator-based measures of 1-year mortality. Additionally, panelists explicitly recommended a much higher risk of short-term death when deciding whether or not to have a goals of care conversation than they implicitly used in practice when responding to the case vignettes.

Based on these findings, we conclude that strategies to increase ACP conversations by improving the calibration of clinicians’ predictions about mortality (i.e., helping them to select the sickest patients) may be ineffective and inefficient. Indeed, our results may obviate the need for a screening process altogether, since our experts agree that all hospitalized patients over 65 should have an ACP conversation. Although theoretically simpler to implement, and less variable than the surprise question, this new normative standard has its own set of barriers. A total of 12.4 million patients over the age of 65 require hospitalization each year.19 At community hospitals, with few specialist services, the onus of having ACP conversations would fall almost exclusively on hospitalists, of whom 40% already report unsafe workloads.20 From a societal perspective, submission of all these additional claims would increase physician billing charges and associated patient co-payments.21, 22

When designing the study, we speculated that factors other than mortality would influence experts’ recommendations, and specifically hypothesized that the presence of dementia might affect the urgency with which they advocated for hospitalists to have ACP conversations. Preliminary evidence confirmed our hypothesis. We believe this finding warrants additional investigation as we lacked the sample size to test the association between specific case characteristics and physician recommendations quantitatively (see Future Directions in the Appendix).

Finally, of interest, the Delphi process also highlighted the ongoing controversy over the content of ACP conversations. Consistent with Sudore’s work, panelists struggled to specify when to recommend ACP (focused on long-term goals and preferences) rather than goals of care conversations about current or near-term treatment.16 After several rounds, they came to consensus for all except the most ambiguous case (patients with low short-term and intermediate [19–33%] 1-year risk of mortality), recommending goals of care conversations for patients with either high (> 10%) short-term or high (≥ 34%) 1-year mortality. Feasibility and budgetary constraints required that we limit the number of rounds in our survey. Additional rounds may have allowed us to establish a more specific recommendation about how to manage those intermediate cases.

Our study had three limitations. First, we used case vignettes as a method of observing clinician judgment in practice so that we could make inferences about the risk of mortality that influenced their decision making. Case vignettes present a static view of the patient and may fail to elicit truly representative judgments.23 However, in other clinical contexts, responses to case vignettes have proven a reasonable facsimile of standardized patients, the gold standard in simulation.24 Second, less than 60% of invited panelists completed the first round of questionnaires, raising concerns about response bias. However, our response rate of 52% matches those seen in other studies using survey methods.25, 26 Moreover, panels of 5 participants in Delphi studies have been shown to have equal validity and reliability to panels of 60, suggesting that we had sufficient numbers of participants to establish consensus.14 Third, our decision to recruit panelists with academic and practical expertise in ACP may have influenced the recommendations. A panel of community hospitalists, cognizant of the barriers to ACP in practice, may have advocated a less liberal policy for screening patients. Future work will need to address potential discordance between expert and “lay” perspectives on the topic.

CONCLUSIONS

Experts recommend that hospitalists engage in ACP with all patients 65 years or older, instead of screening for those with serious illness. Implementation of this new standard will require interventions to motivate physicians to initiate these conversations with all patients.