Introduction

Clinical decisions are increasingly reliant on guidelines, but clinical guidelines are only as good as the available evidence on the comparative effectiveness of interventions [1•]. Ideally, such evidence would come from randomized controlled trials. When a randomized trial is not available, it may be possible to emulate it using observational data [2•]. This approach requires appropriate confounding adjustment, avoidance of selection bias in the definition of the groups to be compared, and formulation of a research question that is relevant for decision makers.

Prior explicit attempts to emulate trials using observational data have studied, for example, postmenopausal hormone therapy [3], statins [4••], epoetin [5••], and antiretroviral therapy [6••]. Here, we review the emulation of trials to compare strategies that differ in the timing of the intervention of interest. As an example, we will consider post-polypectomy surveillance by colonoscopy. During this procedure, adenomas (benign tumors of the colon [7]) are detected and removed. Most adenomas will not develop into colorectal cancer, but most cancers arise from adenomas [8]. In patients with removed adenomas, surveillance colonoscopies are recommended to detect and remove future adenomas before they become malignant. The optimal interval between colonoscopies is not known. Current guidelines both in the USA [9] and the EU [10] are mostly based on expert opinion due to the scarcity of available evidence.

Besides reviewing a methodology to emulate trials for the comparison of strategies that administer the same intervention at different times, we also review a classification of these strategies. First, we consider point interventions to study the effectiveness of a single application of the treatment. Second, we consider sustained interventions to study the effectiveness of a fixed treatment schedule (e.g., colonoscopy at 3 years after the initial procedure). Third, we consider sustained interventions to study the effectiveness of a personalized schedule of treatment (e.g., colonoscopy every year if the most recent procedure detected large adenomas, otherwise every 3 years). To fix ideas, we review the methodology in the context of its implementation to a cohort of Norwegian individuals. We start by describing this cohort.

Data

The Norwegian Colorectal Cancer Prevention (NORCCAP) screening study was a randomized clinical trial of once-only sigmoidoscopy screening versus no sigmoidoscopy, conducted in Oslo and Telemark counties in Norway between 1999 and 2001. Our analysis includes participants in the sigmoidoscopy arm in whom at least one adenoma was detected (n = 2190). As part of the trial, endoscopies were conducted in these individuals until the bowel was free from adenomas. We excluded patients with history of serious gastrointestinal disease, known genetic predisposition to colorectal cancer, and cancer detected as a result of screening in NORCCAP.

In addition to the available data (age, sex, county, smoking, family history of colorectal cancer, and findings at NORCCAP colonoscopies), we conducted a manual chart review at all hospitals in Oslo and Telemark—guided by claims data from the governmental single-payer agency HELFO—to collect data on the date, findings (e.g., size and type of adenomas) and indication of all subsequent colonoscopies and sigmoidoscopies. Of the post-screening endoscopies, 64 % were for surveillance purposes (3 % sigmoidoscopies and 61 % colonoscopies); 30 % were clinically indicated because of symptoms (27 % colonoscopies, 3 % sigmoidoscopies); and 6 % were due to a recent incomplete endoscopy (4 % colonoscopies, 2 % sigmoidoscopies).

Our outcome of interest was incidence of colorectal cancer. For many surveillance interventions, the use of cancer incidence as an outcome is questionable because of potential lead time bias [11]: cancer cases will be detected earlier in patients with more intensive surveillance, which will make surveillance appear less beneficial. In this case, however, the use of cancer incidence as the outcome is justified because most of the beneficial effect of surveillance colonoscopy seems to be due to removing adenomas before they become malignant [12], with only a small component of the effect due to earlier detection of prevalent cancer. Death from colorectal cancer could not be studied as an outcome because there were too few cases.

We refer to the date of the last NORCCAP colonoscopy as time of “first eligibility” for our analyses. For each individual, follow-up ended at colorectal cancer diagnosis, death, sigmoidoscopy, emigration, or December 2011, whichever occurred first. Because we are trying to estimate the effects of post-baseline colonoscopies, which were not randomly assigned to the trial participants, ours is an analysis of observational data. The flow chart in Fig. 1 describes the enrollment of participants in our study. Table 1 displays the characteristics of the eligible individuals.

Fig. 1 Flowchart of selection of the 2190 eligible individuals from the intervention arm of the NORCCAP trial

Table 1 Characteristics of 2190 eligible individuals from the intervention arm of the NORCCAP trial

Three Hypothetical Randomized Trials

The design of any trial is determined by the causal question of interest, which in turn is determined by the population, the strategies being compared, and the outcome of interest to the decision makers [13]. For surveillance tests, the strategies are defined by the timing of the test. Some strategies involve a point intervention at baseline, whereas other strategies involve interventions that are sustained over time according to either a fixed schedule (e.g., do not perform a colonoscopy for 5 years after baseline, then perform a colonoscopy at the end of year 5) or a schedule that depends on each individual’s time-evolving clinical characteristics (i.e., schedule the time of every colonoscopy according to the findings at the previous colonoscopy). We refer to sustained strategies with a fixed schedule as static and to those with a subject-specific schedule as dynamic.

Here, we review three types of hypothetical trials that compare static and dynamic strategies and therefore address different questions regarding the effectiveness of surveillance colonoscopy. In all trials, eligible individuals are followed until death, loss to follow-up (i.e., emigration out of Norway), sigmoidoscopy, occurrence of the outcome (here, diagnosis of colorectal cancer), or December 31, 2011, whichever occurred earlier. In all trials, individuals receive a colonoscopy whenever it is clinically indicated (e.g., due to symptoms) but a surveillance colonoscopy only according to the trial protocol. A graphical representation of each trial is shown in Fig. 2.

Fig. 2 The three trial types considered in this paper. Circles represent randomization, dotted lines represent periods when the strategy specifies all interventions (e.g., colonoscopy or no colonoscopy), solid lines represent periods when the strategy does not specify the intervention (e.g., anything goes, colonoscopy or no colonoscopy)

Trial type #1: point interventions assigned at a fixed time after first eligibility

Individuals who survived 36 months since first eligibility are randomized to either (1) immediate surveillance colonoscopy or (2) no surveillance colonoscopy. Additional eligibility criteria are no colorectal cancer, colonoscopy, or sigmoidoscopy during the 36 months before randomization. Individuals who reach age 70 or develop any invasive non-colorectal cancer before baseline also become ineligible (other comorbidities might be added to the exclusion criteria). For each individual, follow-up starts at the time of randomization, i.e., baseline is 36 months after first eligibility.

More generally, one can consider trials in which baseline is month z, where z ranges between 36 and 84. The effect estimates from these trials will only apply to survivors without symptoms or cancer by z months after first eligibility. These trials will help determine the effect of undergoing a colonoscopy among the survivors, but they do not directly inform the decision of when to undergo the colonoscopy. The next trial type does so.

Trial type #2: sustained static strategies assigned at first eligibility

Baseline is the time of first eligibility. Individuals are randomized to either (1) surveillance colonoscopy 36 months after baseline or (2) surveillance colonoscopy 84 months after baseline. Individuals in both arms who reach age 70 or develop malignancies other than colorectal cancer may have surveillance colonoscopies at any time as determined by their physician. More generally, one can consider additional arms in which 36 is replaced by any value of x between 36 and 84. We could also consider similar trials in which baseline is any month after first eligibility. For example, one could consider a trial in which individuals who have survived 36 months after first eligibility are randomized to either (1) immediate surveillance colonoscopy or (2) surveillance colonoscopy at month 84 after first eligibility (48 months after baseline at 36 months). We will only consider trials with baseline at first eligibility.

Both trial types #1 and #2 compare fixed surveillance schedules, but they address different questions. Trial #1 helps individuals who have survived z months after adenoma removal decide whether they should undergo a surveillance colonoscopy at that time. Trial #2 helps individuals who just had their adenomas removed decide how long they should wait before having a surveillance colonoscopy (if they plan to have only one surveillance colonoscopy). Neither trial type considers strategies that assign different surveillance schedules to different individuals (i.e., dynamic strategies). The next trial type does so. 

Trial type #3: sustained dynamic strategies assigned at first eligibility

Individuals at first eligibility are randomized to either (1) receive surveillance colonoscopies according to the following rules:

  • First surveillance colonoscopy at 36 months if the adenomas detected at baseline sigmoidoscopy were low risk (1 or 2 small adenomas without villous features) and 12 months earlier (at month 24) otherwise.

  • Follow-up surveillance colonoscopy 36 months after the previous colonoscopy (surveillance or clinical) if low-risk adenomas were detected, 12 months earlier (24 months after the previous colonoscopy) if high-risk adenomas (more than two, or large, or containing villous features) were detected, and 12 months later (48 months) if no adenomas were detected.

or (2) surveillance colonoscopies according to similar rules, but where 36 months is replaced by 84 months. During the follow-up, individuals in both arms of the trial may also receive a colonoscopy whenever it is clinically indicated due to symptoms. Individuals who reach age 70 or develop malignancies other than colorectal cancer after baseline may have surveillance colonoscopies at any time as determined by their physician. For each individual, follow-up starts at the time of randomization, i.e., baseline is the time of first eligibility.

More generally, one can consider additional arms in which 36 is replaced by x with x ranging from 36 to 84, or trials in which the time until the next surveillance colonoscopy is obtained by adding or subtracting y (rather than 12) months.
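As an illustration only, the dynamic rules above can be written as a small Python sketch. The function names, the category labels, and the boolean arguments any_large and any_villous are hypothetical and do not come from the NORCCAP analysis code; the size threshold that separates small from large adenomas is left to the caller.

```python
def classify_findings(n_adenomas: int, any_large: bool, any_villous: bool) -> str:
    """Classify colonoscopy findings into the risk categories used in the text."""
    if n_adenomas == 0:
        return "none"
    if n_adenomas > 2 or any_large or any_villous:
        return "high_risk"  # more than two, or large, or villous features
    return "low_risk"       # 1 or 2 small adenomas without villous features


def months_to_next_surveillance(finding: str, x: int = 36, y: int = 12) -> int:
    """Months until the next surveillance colonoscopy under the strategy indexed by x.

    x is the interval after low-risk findings; y is the number of months
    subtracted after high-risk findings and added after a clean colonoscopy.
    """
    if finding == "low_risk":
        return x
    if finding == "high_risk":
        return x - y
    if finding == "none":
        return x + y
    raise ValueError(f"unknown finding category: {finding!r}")
```

With the defaults, months_to_next_surveillance returns 36, 24, and 48 months for low-risk, high-risk, and no adenomas, respectively; setting x = 84 gives the second arm. The category "none" cannot occur for the first surveillance colonoscopy because all eligible individuals had adenomas at baseline.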

Emulating the Design of the Hypothetical Trials

In this section, we review how to emulate the design of each of the above hypothetical trials by setting up a database with the same structure as that of the trial. In the next section, we review how to mimic the analysis of the hypothetical trials.

Trial type #1: point intervention assigned at a fixed time after first eligibility

We emulated 49 “trials,” one starting at each month z between months 36 and 84 after first eligibility. For the “trial” starting in month z, we identified the individuals who met the eligibility criteria at baseline, i.e., all individuals with adenomas detected and removed at first eligibility who were alive and had not yet had a post-screening colonoscopy/sigmoidoscopy or been diagnosed with colorectal cancer by z months of follow-up. For each trial, individuals were classified into the colonoscopy arm if they received a colonoscopy during month z and into the control arm otherwise.
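The eligibility and arm classification described above can be sketched as follows. This is a simplified illustration, not the code used for the analysis: it assumes that the first post-screening endoscopy is a colonoscopy, it ignores the exclusions for age 70 and non-colorectal cancer, and the function and argument names are hypothetical.

```python
from typing import Dict, Optional


def emulate_point_trials(first_endoscopy_month: Optional[int],
                         end_of_followup_month: int,
                         z_min: int = 36,
                         z_max: int = 84) -> Dict[int, str]:
    """Return {z: arm} for every emulated trial in which the individual is eligible.

    first_endoscopy_month: month of the first post-screening endoscopy
        (None if there was none), assumed here to be a colonoscopy.
    end_of_followup_month: month of cancer diagnosis, death, emigration,
        or administrative end of follow-up.
    """
    trials = {}
    for z in range(z_min, z_max + 1):
        if end_of_followup_month < z:
            break  # no longer under follow-up at baseline of trial z
        if first_endoscopy_month is not None and first_endoscopy_month < z:
            break  # already had an endoscopy before baseline of trial z
        trials[z] = "colonoscopy" if first_endoscopy_month == z else "control"
    return trials
```

For example, an individual whose first colonoscopy took place in month 40 is eligible for the trials starting in months 36 to 40, in the control arm of the first four and in the colonoscopy arm of the last one.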

We identified 2028 eligible individuals. On average, each participated in 45 trials, of which at most 1 was in the colonoscopy arm. The number of eligible individuals who received a colonoscopy at baseline ranged between 0 (in several trials) and 16 (in trial z = 61). See Appendix Table 3 for details. Unfortunately, all trials had zero cancers among the exposed, which means the data from NORCCAP cannot be used for a meaningful emulation of trial type #1.

Trial type #1 has the advantage of being easy to emulate and analyze when sufficient observational data are available. This approach has been used in observational studies to estimate the observational analog of the intention-to-treat effect of statin therapy [4••] and postmenopausal hormone therapy [3]. Here, we will not consider this trial type further.

Trial type #2: sustained static strategies assigned at first eligibility

We emulated a randomized trial with 49 arms, in which the participants were assigned at first eligibility to colonoscopy at a randomly assigned time ranging from month 36 to 84 after first eligibility. Classifying the 2190 eligible individuals into a single arm is not possible because, at baseline, each individual’s data are consistent with all 49 arms. To overcome this problem, we created an expanded dataset with 49 clones of each individual and assigned each of them to a different arm [14]. The 2190 eligible subjects contributed 107,310 clones (2190 × 49) to this trial. See Appendix Table 4 for details.

The clones in the expanded dataset were censored at the time their data deviated from the strategy to which they were assigned. For example, in arm 84, 12.9 % of participants were censored for having a surveillance colonoscopy too early (before month 84), 73.5 % of participants were censored for failing to have a surveillance colonoscopy in time (in month 84), and 0.5 % were censored for having a sigmoidoscopy. Those who received a colonoscopy for clinical reasons or developed malignancies other than colorectal cancer were subsequently considered “immune” from censoring.
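A minimal sketch of the cloning and censoring step for the static strategies without a grace period is given below. It is an illustration under simplifying assumptions: it considers only the month of the first surveillance colonoscopy and ignores end of follow-up, sigmoidoscopy, and the “immunity” from censoring described above. The function name and record fields are hypothetical.

```python
from typing import Dict, List, Optional

ARMS = range(36, 85)  # the 49 arms x = 36, ..., 84


def expand_to_clones(person_id: int,
                     surveillance_month: Optional[int]) -> List[Dict]:
    """Create one clone per arm and record when, and why, each clone is censored."""
    clones = []
    for x in ARMS:
        if surveillance_month is not None and surveillance_month < x:
            # Surveillance colonoscopy before the assigned month x
            censor_month, reason = surveillance_month, "too early"
        elif surveillance_month == x:
            # Data remain consistent with arm x
            censor_month, reason = None, None
        else:
            # No surveillance colonoscopy in the assigned month x
            censor_month, reason = x, "not in time"
        clones.append({"id": person_id, "arm": x,
                       "censor_month": censor_month, "reason": reason})
    return clones
```

An individual with a surveillance colonoscopy in month 40 yields 49 clones: the clone in arm 40 is never censored, those in arms 36 to 39 are censored for not having the colonoscopy in time, and those in arms 41 to 84 are censored at month 40 for having it too early.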

Trial type #3: sustained dynamic strategies assigned at first eligibility

We emulated a trial with 49 arms, one for each value of x in the dynamic strategies defined above. The 2190 individuals had to be classified into the arms that were consistent with their observed data. As in the previous trial, individuals cannot be assigned to a single arm at baseline, so we created an expanded dataset with 49 clones of each individual and assigned each of them to a different arm. The clones were censored at the time they deviated from the strategy to which they were assigned. For example, in arm 84, 11.3 % of participants were censored for having a surveillance colonoscopy too early, 79.7 % of participants for failing to have a surveillance colonoscopy in time, and 1.3 % for having a sigmoidoscopy. The 2190 eligible subjects contributed 107,310 clones (2190 × 49) to this trial. See Appendix Table 5 for details.

Emulating the Design of Hypothetical Trials with a Grace Period

So far, we have implicitly assumed that it is possible to administer a colonoscopy at a precisely specified time point, e.g., month 36. However, in many clinical settings, this may not be feasible. We may therefore be more interested in emulating trials with a grace period, that is, a window of m months during which the patient may undergo colonoscopy. For example, in trial type #2, patients would be assigned to interventions of the form “surveillance colonoscopy between x and x + m months after baseline.” Trials with a grace period more accurately reflect clinical practice in which administrative delays and patient availability may prevent an immediate intervention.

Strategies with a grace period are emulated using “clones” as described above, but with different criteria for censoring. Suppose we use a grace period of m = 6 months. An individual who received a surveillance colonoscopy in month 40 now has data consistent with arm 36 because subjects assigned to this arm are allowed to have a colonoscopy at any time between months 36 and 42. Therefore, his clones assigned to arms 36 to 40 will not be censored, whereas his clones assigned to arms 41 to 84 will be censored because he received a surveillance colonoscopy before the assigned time.
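The month-40 example can be reproduced with a short sketch. The function name is hypothetical, and only the first surveillance colonoscopy is considered.

```python
from typing import List


def uncensored_arms(surveillance_month: int, m: int = 6,
                    arms: range = range(36, 85)) -> List[int]:
    """Arms x whose grace period [x, x + m] contains the observed colonoscopy month."""
    return [x for x in arms if x <= surveillance_month <= x + m]
```

Calling uncensored_arms(40) returns arms 36 to 40, whereas setting m = 0 recovers the strategy without a grace period, for which only arm 40 is consistent with the data.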

The addition of a grace period requires us to specify the distribution of the interventions during the grace period. For example, we might ask whether most colonoscopies are performed during the first 2 months of the grace period or whether they are more equally distributed during the grace period. In our application, we will specify a uniform distribution of colonoscopies during the grace period [14].

In both trials #2 and #3 with a 6-month grace period, each of the 2190 eligible individuals in the original dataset contributed 49 clones, for a total of 107,310 clones to the expanded dataset. In trial #2, the average censoring time ranged from 41.9 months for x = 36 to 89.1 months for x = 84. In arm 84, 12.9 % of participants were censored for having a surveillance colonoscopy too early (before month 84), 71.5 % of participants were censored at month 90 for failing to have a surveillance colonoscopy in time, 0.1 % were censored after month 90 for having a second surveillance colonoscopy, and 0.6 % were censored for having a sigmoidoscopy. Across the 49 arms, there were 381 incident cases of colorectal cancer in the clones, which occurred in 12 unique individuals.

In trial #3, the average censoring time ranged from 34.2 months for x = 36 to 78.1 months for x = 84. For arm 84, 11.3 % of participants were censored for having a surveillance colonoscopy too early, 77.6 % for failing to have a surveillance colonoscopy in time, and 1.4 % for having a sigmoidoscopy. In total, there were 254 incident cases of colorectal cancer in 13 unique individuals. See Appendix Tables 4 and 5 for details.

Emulating the Analysis of the Hypothetical Trials

After reviewing how to create observational databases with the same structure as hypothetical randomized trials, we review how to use those databases to estimate the cumulative incidence curves (or their complement, the survival curves) that would have been observed under each strategy if all individuals had fully adhered to their original arm assignment. In a slight abuse of notation, we index the strategies by the variable x, which was defined in the previous sections. For example, in trial #2, x = 78 corresponds to the strategy “surveillance colonoscopy between 78 and 78 + 6 months after baseline.”

In a true randomized trial with many arms x, we could estimate these curves nonparametrically (Kaplan-Meier curves) or parametrically by fitting a pooled logistic model of the form \( \operatorname{logit} \Pr\left(Y_{t+1}=0 \mid Y_t=D_t=0,\; x\right)=\alpha_{0,t}+\alpha_1 f(x)+\alpha_2 f(x)\times t \), where t denotes time (in months), \( Y_t \) is an indicator of colorectal cancer by t, \( D_t \) is an indicator of death by t, \( \alpha_{0,t} \) is a time-varying intercept (estimated, for example, via restricted cubic splines for time with knots at 30, 60, 90, and 120 months), \( f(x) \) is a function of x (for example, a second degree polynomial), and \( f(x)\times t \) is a product term to allow the hazard ratio to vary during the follow-up. For example, for the first 36 months of follow-up, the hazard is known to be identical under all strategies, but it may change after that if colonoscopy has a non-null effect on colorectal cancer incidence.

We would then calculate the predicted values for each value of x and compute their product over time in order to estimate the survival curves. Pointwise 95 % confidence intervals for the curves can be obtained via a non-parametric bootstrap. In our emulated trials, however, the above logistic model needs to be adjusted for both baseline and post-baseline (time-varying) confounders. The procedure then needs to be modified as we now describe.
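Before turning to that modification, the product step for the unadjusted model can be illustrated with a short sketch. It assumes, for simplicity, that f(x) is a single number rather than a polynomial, and that the coefficients have already been estimated; the function names and the coefficient values used in any call are hypothetical.

```python
import math
from typing import List, Sequence


def expit(z: float) -> float:
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def survival_curve(alpha0: Sequence[float], alpha1: float,
                   alpha2: float, fx: float) -> List[float]:
    """Survival curve implied by a pooled logistic model for one strategy.

    alpha0[t] is the time-varying intercept for month t. The model predicts
    the probability of remaining free of the outcome through month t + 1,
    and the survival curve is the running product of these probabilities.
    """
    surv, curve = 1.0, []
    for t, a0 in enumerate(alpha0):
        p_no_event = expit(a0 + alpha1 * fx + alpha2 * fx * t)
        surv *= p_no_event
        curve.append(surv)
    return curve
```

The cumulative incidence curve is obtained as one minus each element of the returned list.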

Adjustment for Covariates

In both trials #2 and #3, we need to adjust for covariates that jointly predict surveillance colonoscopy \( A_t \) (and therefore censoring) and subsequent outcome. Some of these variables are fixed at the baseline of each trial; others vary during the follow-up. Let \( L_0 \) represent the vector of baseline covariates, which include age at baseline, sex, family history of colorectal cancer, history of smoking, and findings at NORCCAP colonoscopies (number of adenomas, size, histology, and presence of villous elements). Let \( L_t \) represent the vector of time-varying covariates, which include an indicator for incident non-colorectal malignancies, and a vector of the findings from the most recent colonoscopy (number of adenomas, size of largest adenoma, histological grade, and presence of villous elements).

To adjust for \( L_0 \), one could fit the pooled logistic model \( \operatorname{logit} \Pr\left(Y_{t+1}=0 \mid Y_t=0,\; x,\; L_0\right)=\alpha_{0,t}+\alpha_1 f(x)+\alpha_2 f(x)\times t+\alpha_3 L_0 \) to the expanded dataset of each trial separately. To obtain the survival curves under each strategy x, one would then calculate the predicted values for each value of x, standardize them by \( L_0 \), and compute their product. However, the time-varying covariates \( L_t \) cannot be added to the logistic model because these variables may be affected by prior treatment [10, 11] (a colonoscopy may change the findings at future colonoscopies, for example by removing adenomas; see Appendix). We therefore need to use IP weighting to adjust for \( L_t \).

The subject-specific, time-varying IP weights are \( W_t=\prod_{j=0}^{t}\frac{1}{f\left(A_j \mid \overline{A}_{j-1},\; \overline{L}_j,\; Y_j=D_j=0\right)} \). Informally, the denominator of the weights is each subject’s conditional probability of having, at each time t, his or her own surveillance colonoscopy history. We use overbars to denote history, i.e., \( \overline{L}_t=(L_0, L_1, L_2, \ldots, L_t) \).

The factors in the denominator of the weights were set to 1 in months following age 70, a non-surveillance colonoscopy, or the diagnosis of malignancies other than colorectal cancer because the individual has probability 1 of remaining uncensored during those months. The factors in the denominator were also set to 1 during the first 9 months after a colonoscopy was received, because no surveillance colonoscopies were performed during this period (only colonoscopies due to symptoms or to incompleteness of the preceding colonoscopy). In previous applications of IP weighting for strategies with grace periods, the investigators were interested only in strategies that were not sustained beyond the initial decision to treat [14]. Therefore, in those applications, the contributions to the weights were set to 1 for all time periods after treatment was first received.

For all other months, we estimate the denominator by fitting a logistic model for the conditional probability of receiving a colonoscopy to the original, unexpanded study population. We fit the model

$$ \operatorname{logit} \Pr\left(A_t=1 \mid \overline{A}_{t-1},\overline{L}_t\right)=\beta_{0,t}+\beta_1 g\left(\overline{A}_{t-1}\right)P_t+\beta_2 L_0+\beta_3 L_t P_t $$

where \( \beta_{0,t} \) is a time-varying intercept estimated via restricted cubic splines with knots at 30, 60, 90, and 120 months, \( g\left(\overline{A}_{t-1}\right) \) is the time since the most recent colonoscopy, and covariate history \( \overline{L}_t \) is summarized via the time-varying covariates \( L_t \) and the baseline variables \( L_0 \), which include age (restricted cubic splines with knots at 50, 55, 60, and 65 years); sex; family history of colorectal cancer (yes/no); history of smoking (yes/no); findings at the NORCCAP colonoscopies (indicators for three or more adenomas, adenoma greater than 10 mm, adenoma with villous component); and histological grade (1 if high grade dysplasia, 0 otherwise). The variables \( g\left(\overline{A}_{t-1}\right) \) and \( L_t \) are entered into the model only in a product (“interaction”) term with \( P_t \), an indicator for prior colonoscopy (1 if the individual had a colonoscopy before t, 0 otherwise), such that the terms are zero in individuals who have not had a previous surveillance colonoscopy.
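Once the model above has been fit, the weights follow from a running product. The sketch below is an illustration that takes the fitted monthly probabilities of colonoscopy as given; the function and argument names are hypothetical, and the flag deterministic marks the months in which the factor is set to 1 as described earlier.

```python
from typing import List, Sequence


def ip_weights(a: Sequence[int],
               p_colonoscopy: Sequence[float],
               deterministic: Sequence[bool]) -> List[float]:
    """Cumulative IP weights W_t for one individual.

    a[j]: 1 if a surveillance colonoscopy was received in month j, else 0.
    p_colonoscopy[j]: fitted probability of colonoscopy in month j given history.
    deterministic[j]: True in months where the factor is fixed at 1.
    """
    weights, w = [], 1.0
    for a_j, p_j, fixed in zip(a, p_colonoscopy, deterministic):
        if not fixed:
            # Probability of the treatment value actually observed in month j
            prob_own = p_j if a_j == 1 else 1.0 - p_j
            w /= prob_own
        weights.append(w)
    return weights
```

For instance, with a = [0, 0, 1, 0], fitted probabilities [0.2, 0.5, 0.25, 0.9], and only the last month flagged as deterministic, the weights are 1.25, 2.5, 10, and 10.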

Because the IP weights already adjusted for the baseline covariates \( L_0 \), we did not include them as covariates in the outcome model. That is, we fit the weighted pooled logistic model \( \operatorname{logit} \Pr\left(Y_{t+1}=0 \mid Y_t=0,\; x\right)=\alpha_{0,t}+\alpha_1 f(x)+\alpha_2 f(x)\times t \). To check the robustness of our estimates to different choices of functional form for time and x, we explored different parameterizations of the outcome model, including a quadratic functional form for time, cubic terms for x, and additional interaction terms between f(x) and time.

Grace Period

Because our strategies of interest include grace periods, the above-mentioned IP weights \( W_t \) need to be modified [14]. Specifically, the numerator of the factors corresponding to months included in the grace period needs to change to ensure that surveillance colonoscopies will be uniformly distributed during the grace period. For trial #2, the numerator of factors corresponding to month j of the grace period is replaced by \( \frac{1}{m+1-j} \) with j = 0, 1, …, 5 when \( A_t = 1 \), and replaced by \( \frac{m-j}{m+1-j} \) when \( A_t = 0 \). For trial #3, where there can be multiple surveillance colonoscopies, we use the same approach during all grace periods.
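That these numerators imply a uniform distribution can be checked with exact arithmetic. In the sketch below, the function names are hypothetical. The formula is also evaluated at j = m, where it equals 1; this reflects the assumption that an individual who has not had a colonoscopy by the last month of the grace period must have it then, whereas the text lists only j = 0, …, 5.

```python
from fractions import Fraction


def grace_numerator(j: int, a: int, m: int = 6) -> Fraction:
    """Numerator of the weight factor in month j of a grace period of m months."""
    if a == 1:
        return Fraction(1, m + 1 - j)
    return Fraction(m - j, m + 1 - j)


def prob_colonoscopy_in_month(j: int, m: int = 6) -> Fraction:
    """Implied probability that the colonoscopy occurs exactly in month j."""
    p = Fraction(1)
    for k in range(j):
        p *= grace_numerator(k, 0, m)  # no colonoscopy in months 0, ..., j - 1
    return p * grace_numerator(j, 1, m)  # colonoscopy in month j
```

The product telescopes: the probability of no colonoscopy before month j is (m + 1 - j)/(m + 1), and multiplying by 1/(m + 1 - j) gives 1/(m + 1) for every month of the grace period, that is, 1/7 when m = 6.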

Estimates from NORCCAP Data

Table 2 shows the 5- and 10-year risks of colorectal cancer for arms 36 and 84 in trials #2 and #3. For both static and dynamic strategies, earlier surveillance colonoscopy resulted in a lower risk. The estimated survival curves for selected arms of trials #2 and #3 are shown in Fig. 3. As expected, the survival curves are essentially identical over the first 3 years, as the strategies are the same during this time period. Results were similar in sensitivity analyses using different functional forms for f(x) and time.

Table 2 Estimated risk of colorectal cancer at 5 and 10 years under selected surveillance strategies, intervention arm of the NORCCAP trial

Fig. 3 Estimated survival curves for trials #2 and #3, intervention arm of the NORCCAP trial

Note that, had the dataset included no cancer diagnoses after surveillance colonoscopy, the finding that delaying colonoscopy increases risk would have been a foregone conclusion. In our dataset, only one individual who had a surveillance colonoscopy between months 36 and 84 subsequently developed colorectal cancer, and he was censored before getting cancer under most clinically relevant strategies. Any changes to the strategies that led to him not being censored would result in substantial changes to the estimates. Therefore, our analysis needs to be replicated in a larger dataset.

Conclusions

After a medical procedure or medication has been shown to be effective, the next question is usually how often it should be administered. In this paper, we reviewed an approach that, when applied to a sufficiently large and rich dataset, helps decide among various timing strategies. Specifically, we outlined the design and analysis of hypothetical randomized trials to compare different strategies, and provided a methodology for emulating these trials using observational data.

As a motivating example, we compared the effectiveness of different strategies for scheduling surveillance colonoscopies in patients with adenomas, a clinical question for which the available evidence is sparse [9, 15–20]. Our analysis suggests that more frequent surveillance colonoscopies lead to a greater reduction in colorectal cancer risk; as expected, the analysis also suggests that dynamic strategies are more effective than static strategies. However, our analysis is more an example of implementation than an attempt at providing definitive answers to the clinical question because the sample size of our study was small.

The application of the methods outlined in this review allowed us to specify a research question that is directly relevant to decision makers interested in timing questions. Though these methods allow adjustment for both baseline and time-varying covariates, the possibility of unmeasured confounding remains as in any observational study.