Introduction

In a previous Hints and Kinks, we discussed the role of causal inference in tasks of health services research (HSR) using examples from health system interventions (Moser et al. 2020). In the present Hints and Kinks, we more formally introduce a principled framework for causal inference. Specifically, we discuss in more detail the role of counterfactuals in the definition of a causal effect and the ‘association is not causation’ adage. We continue with the example of a hospital merger (HM) as a health system intervention.

Counterfactuals and causal effect

We introduced counterfactuals as hypothetical outcomes that are not actually observed in a real-world setting (Hernán 2004). We used the example of a HM, where we were interested in the causal question of whether a HM reduces hospital readmissions (Moser et al. 2020). To answer this question, we need to define a causal effect: a statistical measure that contrasts the probability of hospital readmission if (1) every patient had been treated under a HM with the probability if (2) the HM had not been implemented. Note that we can never observe both situations, because either the HM is implemented or it is not, but never both. We now introduce a formal notation for causal inference which allows us to mathematically define a causal effect.

For each patient, we would like to know his or her outcome (here, a hospital readmission) if the HM had not been implemented (denoted as YnoHM) together with the outcome under the HM (denoted as YHM). The superscripts denote the counterfactual outcomes we can formalize, but which are not actually observed: only YHM can be observed if the HM is implemented. An average causal effect in the study population can then be defined by the risk difference Probability(YHM = 1) − Probability(YnoHM = 1), abbreviated as RDCausal. Note that we could also use other risk measures, for example a relative risk, to define a causal effect. The choice of effect measure depends on the research question because the underlying scale (i.e., an additive scale for a risk difference or a multiplicative scale for a risk ratio) influences its final interpretation (Hernán and Robins 2020).
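Written out in the display style used below for the associational measure, this definition reads

$$\text{RD}^{\text{Causal}} := \text{Probability}\left( Y^{\text{HM}} = 1 \right) - \text{Probability}\left( Y^{\text{noHM}} = 1 \right).$$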

An important question remains: how can we assess an effect measure based on outcomes which are not actually observed? One could compare the outcomes in the region with a HM to outcomes in a ‘control’ region with no HM. Table 1 shows hypothetical patients with (known) counterfactual outcomes and actually observed outcomes (denoted with superscripts as YnoHM, YHM, YObserved). For example, the patient with ID 5 was treated in the HM region with no observed hospital readmission (YObserved = 0). The observed outcome is equal to the counterfactual outcome in the HM region (YObserved = YHM = 0). Note that if this patient had been treated in the control region, he or she would have had a readmission (YnoHM = 1). Because this patient is actually only observed in the HM region, the outcome under the control region is never observed (YnoHM is missing). The mathematical notation for counterfactuals might be confusing at first, yet it is a necessary component of a causal inference framework.

Table 1 Study population of five patients

What is the average causal effect in the study population from Table 1? The risk difference RDCausal is zero, because Probability(YHM = 1) = 3/5 and Probability(YnoHM = 1) = 3/5. Thus, in this study population, the HM does not reduce hospital readmissions.
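To make this calculation concrete, here is a minimal Python sketch. The per-patient values are a hypothetical completion of Table 1, chosen only to be consistent with the numbers quoted in the text (patient 5's outcomes and both counterfactual risks of 3/5); the published table may assign the outcomes differently.

```python
# Hypothetical completion of Table 1: one assignment of counterfactual
# outcomes consistent with the figures quoted in the text (not the
# published table itself). Each patient has BOTH potential outcomes.
patients = [
    {"id": 1, "region": "control", "y_noHM": 1, "y_HM": 1},
    {"id": 2, "region": "control", "y_noHM": 0, "y_HM": 1},
    {"id": 3, "region": "control", "y_noHM": 0, "y_HM": 1},
    {"id": 4, "region": "HM",      "y_noHM": 1, "y_HM": 0},
    {"id": 5, "region": "HM",      "y_noHM": 1, "y_HM": 0},  # patient 5 from the text
]

n = len(patients)

# Counterfactual risks average the potential outcomes of ALL five patients,
# as if everyone had been treated under each scenario.
risk_HM = sum(p["y_HM"] for p in patients) / n      # Probability(Y^HM = 1)   = 3/5
risk_noHM = sum(p["y_noHM"] for p in patients) / n  # Probability(Y^noHM = 1) = 3/5

rd_causal = risk_HM - risk_noHM
print(f"RD_Causal = {risk_HM:.2f} - {risk_noHM:.2f} = {rd_causal:.2f}")  # prints 0.00
```

Note that the causal contrast uses both counterfactual columns of every patient, which is exactly the information a real study never has in full.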

Association versus causation

An associational effect measure generally compares risks in subsets of a study population by conditioning on certain study characteristics (see Fig. 1) (Hernán 2004). In the example of Table 1, one compares the risk of hospital readmissions among patients in the HM region with the risk among patients in the control region. Let us define

$$\begin{aligned} \text{RD}^{\text{Associational}} := {} & \text{Probability}\left( Y^{\text{Observed}} = 1 \text{ among patients in the HM region} \right) \\ & - \text{Probability}\left( Y^{\text{Observed}} = 1 \text{ among patients in the control region} \right) \end{aligned}$$

as the associational risk difference in the study population. We obtain from Table 1 that the first expression of RDAssociational is 0 (two patients were treated in the HM region, neither with an observed hospital readmission) and the second expression is 1/3 (three patients were treated in the control region, one of whom had an observed readmission). Thus, RDAssociational equals 0 − 1/3 = −1/3, i.e., the risk of hospital readmissions in the HM region is lower than the risk in the control region.
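Unlike the causal contrast, the associational contrast can be computed from the observed data alone. A minimal sketch, assuming only the observed counts stated above (two HM patients with no readmission, three control patients with one readmission):

```python
from fractions import Fraction

# Observed data as described in the text: region of treatment and Y^Observed.
observed = [
    ("HM", 0), ("HM", 0),                            # two HM patients, no readmissions
    ("control", 1), ("control", 0), ("control", 0),  # one readmission among three
]

def risk(region):
    """Observed readmission risk among patients treated in the given region."""
    outcomes = [y for r, y in observed if r == region]
    return Fraction(sum(outcomes), len(outcomes))

rd_associational = risk("HM") - risk("control")
print(f"RD_Associational = {risk('HM')} - {risk('control')} = {rd_associational}")
# prints: RD_Associational = 0 - 1/3 = -1/3
```

The two sketches make the adage tangible: the same five patients yield RDCausal = 0 but RDAssociational = −1/3, because the associational measure conditions on the region in which each patient happened to be treated.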

Fig. 1 Graphical explanation of ‘association versus causation’ using the example of a hospital merger as a health system intervention. Study outcome: hospital readmissions. ‘Association’ compares risks in subsets of a study population, indicated by the separated triangles. For example, one compares the risk of hospital readmissions among patients treated in a region with a hospital merger with the risk among patients treated in a region without a hospital merger. ‘Causation’ compares situations (i.e., ‘what-if’ questions) between hypothetical study populations. For example, one compares hospital readmissions in a population where every patient would have been treated in a region with a hospital merger with those in a population where every patient would have been treated in the same region, but without a hospital merger. Source: figure adapted from Hernán (2004)

The difference between the derived causal effect RDCausal and the associational effect RDAssociational leads to the famous ‘association is not causation’ adage. Likely because of this adage, many researchers in HSR avoid any causal terminology, especially when they use ‘only’ observational data (Hernán 2018). They argue that the above comparison of outcomes between an ‘intervention’ and a ‘control’ region does not allow for any causal conclusions because the regions differ in several ways, for example, in the case mix of treated patients, the skill-grade mix of medical personnel or the availability of health care services. If a study design randomly allocated patients before hospital entry to either the HM region or the control region (and patients and health care providers perfectly complied with that assignment), researchers would interpret the statistical findings as causal. In fact, however, many studies in HSR are observational studies without a random allocation of patients to treatment groups. Often, only ‘descriptive’ and ‘modeling’ approaches are then used to support decision-making in health systems, even if the underlying question is inherently causal. Whether the reported effect measure should come from a causal inference approach or from a descriptive or modeling approach depends strongly on the intended HSR question.

How can researchers integrate ‘causality’ into HSR? The components of a causal inference framework introduced above form the backbone of modern causal inference. Modern causal inference allows for inference which mimics a situation in which patients had been randomly allocated, despite using an observational study design. Recent calls for causal inference approaches in HSR cover, for example, comparative effectiveness research, payment scheme evaluations, health care utilization and the use of simulation studies (see Table 2). Principles of modern causal inference are described and explained in several textbooks (van der Laan and Rose 2011; Pearl et al. 2016; Hernán and Robins 2020).

Table 2 Selected study examples using causal inference approaches in health services research

Discussion

In the present Hints and Kinks, we introduced components of a principled framework for causal inference in HSR. Because ‘causal inference’ is conceptually different from ‘description’ and ‘modeling’, HSR needs an integrated causal inference framework, including specific notation, definitions and analysis techniques, to extend the traditional tasks of ‘description’ and ‘modeling’. Public health decision-making that relies solely on associational effect measures may lead to inappropriate decisions because questions about optimal decision-making are inherently causal. We urge students and researchers in the field of HSR to be aware of the different frameworks available to successfully address ‘description’, ‘modeling’ and ‘causal inference’, depending on the intended research question.