Key Summary Points

Extrapolation of survival outcomes is key to health technology assessments in oncology to quantify lifetime benefit of a novel intervention. Conventional methods to extrapolate limited overall survival (OS) data beyond clinical trial follow-up may lead to high uncertainty in long-term estimations. External data can further inform the extrapolation of OS data and increase confidence in estimates of long-term survival outcomes.

The analysis focused on cilta-cel, a CAR-T therapy for triple-class relapsed/refractory multiple myeloma. Data from the initial cutoff of the pivotal phase Ib/II CARTITUDE-1 trial demonstrated strong efficacy results (97% overall response and 89% 12-month OS rate) in this patient population with limited therapy options. OS data from CARTITUDE-1 were extrapolated using various models, including those using conventional parametric methods (without incorporating external data), as well as Bayesian models that were additionally informed by a related external data source (i.e., the phase I LEGEND-2 trial).

Variability among parametric models was greatly reduced when the external LEGEND-2 data informed the extrapolations from the pivotal CARTITUDE-1 trial. These projections were further validated with observed 28-month OS data from CARTITUDE-1.

Introduction

Estimates of long-term overall survival (OS) benefits of new oncology treatments play an important role in economic evaluations and have an impact on patient access, especially in countries where health technology assessments focus on lifetime benefits. However, typical follow-up in pivotal oncology trials is limited, and OS data may be too immature to demonstrate robust lifetime benefits before marketing authorization is granted [1]. This is particularly the case for newer immune- or cell-based therapies that have higher efficacy with longer patient survival. Standard parametric survival modeling is a conventional method of generating extrapolations from observed trial data, but it is associated with significant uncertainty, especially when trial data are highly immature [1]. Furthermore, the underlying hazard (event rate) function over time may not be appropriately represented by standard parametric models [2]. This is particularly relevant for newer cancer therapies that have the potential for achieving durable survival benefits [3]. Plateauing survival curves long after treatment administration and heavy censoring toward the end of the follow-up period present a challenge for extrapolating long-term survival from short-term clinical data [1].

To support long-term projections from short-term trial data, external information is often utilized. External source data with longer follow-up can help inform the selection of plausible extrapolation curves from the observed trial by providing evidence for external validity. Additionally, external source data can be more formally combined with trial data in the survival modeling to generate extrapolation curves [2].

Joint modeling of clinical trial data and individual patient-level data from external sources is an area of active research [4,5,6]. A review of methods for extrapolating survival from randomized trials using external data presented a structure for model choices based on various assumptions about how the disease population hazards relate to external population hazards [5]. In the literature, joint modeling of trial and external data is typically undertaken using a Bayesian framework [7]. In the Bayesian approach recently illustrated by Soikkeli et al. [8], external information was used in specifying prior distributions for the shape parameter of the survival distribution, providing a flexible framework for incorporating external individual patient data. These approaches often make less restrictive exchangeability assumptions than formally pooling data and hence are less vulnerable to differences in baseline characteristics from the external data.

Ciltacabtagene autoleucel (cilta-cel) is a chimeric antigen receptor (CAR)-T cell therapy approved in the USA and in Europe for patients with relapsed/refractory multiple myeloma (RRMM) after ≥ 4 (USA) or ≥ 3 (EU) prior lines of therapy, including a proteasome inhibitor (PI), an immunomodulatory drug (IMiD), and an anti-CD38 monoclonal antibody [9]. Patient T cells are genetically modified to target B-cell maturation antigen on the surface of multiple myeloma (MM) cells. CARTITUDE-1 was the registrational, phase Ib/II trial of cilta-cel. Initial results were reported at median follow-up of 12.4 months [10], at which point the 12-month OS rate was 89%. Updated results reported a median follow-up of 28 months with a 27-month OS rate of 70% [11]. Extrapolating CARTITUDE-1 OS data to estimate cilta-cel’s long-term treatment benefit for economic evaluation is challenging, as most patients were still alive at the end of trial follow-up. Estimations of OS, particularly in the post-trial period, can differ substantially when various standard extrapolation models are used.

In this analysis, we implemented a Bayesian approach as outlined by Soikkeli et al. [8] to estimate long-term OS of cilta-cel from the CARTITUDE-1 study 12-month data cut. External data from the LEGEND-2 study, a phase I trial assessing LCAR-B38M (a CAR construct identical to that of cilta-cel) with median follow-up of 48 months [12], were formally leveraged to inform these predictions. Availability of an additional CARTITUDE-1 data cut at 28 months provided an opportunity to support and validate this approach.

Methods

Source Data 1: CARTITUDE-1 Pivotal Trial

CARTITUDE-1 was a single-arm, open-label, multicenter, phase Ib/II study conducted primarily in the USA (NCT03548207) [10, 11]. The objectives of the study were to characterize cilta-cel safety, confirm the recommended phase II dose (phase Ib), and evaluate clinical efficacy. Eligible patients were ≥ 18 years of age, had a diagnosis of MM per International Myeloma Working Group diagnostic criteria [13], measurable disease at baseline, and Eastern Cooperative Oncology Group performance status of 0 or 1. All patients received ≥ 3 prior lines of therapy including a PI, an IMiD, and an anti-CD38 antibody, or were double refractory to a PI and an IMiD and received an anti-CD38 antibody, with evidence of progressive disease within 12 months of the last line of therapy. Patients received a single cilta-cel infusion [target dose 0.75 × 10⁶ CAR-positive viable T cells/kg (range 0.5–1.0 × 10⁶)] 5–7 days after lymphodepletion with 300 mg/m² cyclophosphamide and 30 mg/m² fludarabine daily for 3 days. Ninety-seven patients, all from the USA, received treatment with cilta-cel and were included in the present analysis. OS was a secondary endpoint. Data have been published from a September 2020 data cutoff (median 12 months follow-up) [10] and from a January 2022 data cutoff (median 28 months follow-up) [11]. Patients in CARTITUDE-1 had an unprecedented overall response rate of 98%, with 83% of patients experiencing complete response, and median OS and progression-free survival had not yet been reached at 28 months median follow-up.

Source Data 2: LEGEND-2 Trial (Historical External Data)

LEGEND-2 was a phase I, single-arm, open-label study conducted at four sites in China (NCT03090659) [12]. Enrolled patients were aged 18–80 years and had a diagnosis of RRMM. Lymphodepletion regimen varied by study site, using either cyclophosphamide 300 mg/m² or cyclophosphamide 250 mg/m² plus fludarabine 25 mg/m² for 3 days [14, 15]. LCAR-B38M CAR-T cells were infused either in three separate infusions or in a single infusion. Seventy-four patients received treatment with LCAR-B38M. Data have been published for a May 2021 data cutoff (median 48 months follow-up) [12]. This is the longest follow-up of OS in patients with RRMM receiving CAR-T therapy to date.

Comparison of CARTITUDE-1 and LEGEND-2 Studies and Populations

Both CARTITUDE-1 and LEGEND-2 enrolled patients with RRMM, were of similar design and size, and used the same CAR construct to manufacture the study CAR-T cell product. There were important differences between the two studies in trial design (cilta-cel dosing schedule and lymphodepletion regimens), eligibility criteria (previous exposure to immunomodulatory drugs, proteasome inhibitors, and anti-CD38 monoclonal antibodies), and baseline patient characteristics (Table 1); however, LEGEND-2 is currently the best available external source to utilize in modeling long-term projections of the patients with RRMM from CARTITUDE-1.

Table 1 Comparison of baseline characteristics of patients from LEGEND-2 and CARTITUDE-1 trials

Compliance with Ethics Guidelines

The LEGEND-2 and CARTITUDE-1 studies were conducted in accordance with the Declaration of Helsinki and an institutional review board or independent ethics committee at each study site approved the respective study protocol. The current analysis is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Statistical Analysis: Comparison of LEGEND-2 and CARTITUDE-1 OS Outcomes and Hazards

The authors had access to individual patient data from the two studies, and also utilized the published OS data [10,11,12]. Extrapolation of OS from 12-month CARTITUDE-1 data, adjusting for general population mortality, was carried out using two approaches: (1) an uninformed approach that used solely 12-month CARTITUDE-1 data in conventional survival models with standard parametric distributions, and (2) an externally informed approach using Bayesian survival modeling to extrapolate 12-month CARTITUDE-1 data with an informative shape prior from LEGEND-2 (similar to that previously described by Soikkeli et al.) [8].

The informative Bayesian method provides flexibility in how much weight is given to the information provided by the external source. Also, consistent with Soikkeli et al. [8], the informative prior was placed on the shape parameter, since the median survival is less prone to changes in the shape parameter compared with other parameters (e.g., scale). For instance, in a Weibull distribution, the shape parameter describes increasing or decreasing hazards over time [16], whereas the scale parameter is proportional to the median survival [17].
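The Weibull case can be made concrete with a minimal sketch. It assumes the common parameterization S(t) = exp(-(t/scale)^shape); the function names and the numeric values (a 60-month scale and three shape values) are illustrative choices, not estimates from CARTITUDE-1 or LEGEND-2.

```python
import math


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Hazard h(t) = (shape/scale) * (t/scale)**(shape - 1) for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def weibull_median(shape: float, scale: float) -> float:
    """Median survival: the time t at which S(t) = exp(-(t/scale)**shape) = 0.5."""
    return scale * math.log(2.0) ** (1.0 / shape)


# Illustrative values only (time in months); not trial estimates.
SCALE = 60.0
for shape in (0.7, 1.0, 1.5):
    h12 = weibull_hazard(12.0, shape, SCALE)
    h48 = weibull_hazard(48.0, shape, SCALE)
    if h48 < h12:
        trend = "decreasing"
    elif h48 > h12:
        trend = "increasing"
    else:
        trend = "constant"
    median = weibull_median(shape, SCALE)
    print(f"shape={shape}: hazard is {trend}, median={median:.1f} months")
```

Under this parameterization the median scales linearly with the scale parameter, whereas the shape parameter alone determines whether the hazard rises (shape > 1), falls (shape < 1), or stays constant (shape = 1).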

The first approach (conventional, uninformed) was accomplished by conducting standard survival models in R software. The eligible list of distributions to be explored in the second approach was selected on the basis of the clinical plausibility of 5-year projections from the models developed using the first approach.

In the second approach (informative Bayesian), we first used Bayesian survival models on the 48-month LEGEND-2 data using non-informative priors (i.e., the extrapolation was informed only by LEGEND-2 data) for each distribution explored. A moment-fitting approach was used to fit the matching gamma distribution to the posterior Markov chain Monte Carlo simulation results of the shape parameter from the Bayesian survival models conducted on 48-month LEGEND-2 data. This fitted gamma distribution from LEGEND-2 48-month OS data was used as an informative prior for the shape parameter used in the Bayesian survival analysis of the CARTITUDE-1 12-month data.
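The moment-fitting step can be sketched as follows. For a gamma distribution with parameters alpha (shape) and beta (rate), the mean is alpha/beta and the variance is alpha/beta^2, so matching the sample mean m and sample variance v of the posterior draws gives alpha = m^2/v and beta = m/v. The draws below are simulated stand-ins for posterior simulation output, and all names and numeric values are hypothetical rather than taken from the LEGEND-2 analysis.

```python
import random
import statistics


def fit_gamma_by_moments(draws):
    """Return (alpha, beta) of the gamma distribution (shape alpha, rate beta)
    whose mean and variance equal the sample mean and variance of the draws."""
    mean = statistics.fmean(draws)
    var = statistics.variance(draws)
    return mean * mean / var, mean / var


# Stand-in for posterior draws of a survival shape parameter (hypothetical).
# random.gammavariate takes (shape, scale), so the scale is 1 / rate.
rng = random.Random(2024)
posterior_draws = [rng.gammavariate(50.0, 1.0 / 60.0) for _ in range(20000)]

prior_alpha, prior_beta = fit_gamma_by_moments(posterior_draws)
print(f"fitted gamma prior: alpha={prior_alpha:.1f}, beta={prior_beta:.1f}")
print(f"prior mean={prior_alpha / prior_beta:.3f}")
```

By construction the fitted prior reproduces the mean and variance of the draws exactly; how well it matches the rest of the posterior depends on how close that posterior is to a gamma shape.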

The gamma distribution was chosen to characterize the shape parameter prior because the shape parameter must always be non-negative and because the gamma distribution has been used consistently for this purpose in Bayesian analyses in the previous literature [18].

This method assumes the shape parameter is exchangeable for the CARTITUDE-1 and LEGEND-2 cohorts. The strength of the informative prior was adjustable. When the fitted gamma distribution from the posterior of the LEGEND-2 results was used as the shape prior distribution for the CARTITUDE-1 Bayesian analysis, the resulting shape parameter was informed by both CARTITUDE-1 and LEGEND-2 data based on their corresponding trial sizes (referred to as an informed prior). If the fitted gamma distribution parameters from the LEGEND-2 data posterior were multiplied by a constant factor greater than 1 (e.g., 100), the resulting shape parameter from the CARTITUDE-1 Bayesian extrapolation was informed mostly by the LEGEND-2 data (referred to as a strongly informed prior).
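The effect of prior strength can be illustrated with a small sketch. It assumes that "multiplying the gamma parameters by a constant" means scaling both alpha (shape) and beta (rate), which keeps the prior mean alpha/beta fixed while dividing the prior variance by that constant. For simplicity the posterior is computed by grid approximation rather than by Markov chain Monte Carlo, with a flat prior on log(scale); the toy data, grid limits, and prior values are all hypothetical.

```python
import math


def gamma_logpdf(x, alpha, beta):
    """Log density of a gamma distribution (shape alpha, rate beta) at x > 0."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(x) - beta * x)


def weibull_loglik(times, events, shape, scale):
    """Right-censored Weibull log-likelihood: deaths add log h(t),
    every patient adds log S(t) = -(t/scale)**shape."""
    total = 0.0
    for t, died in zip(times, events):
        if died:
            total += math.log(shape / scale) + (shape - 1.0) * math.log(t / scale)
        total -= (t / scale) ** shape
    return total


def posterior_mean_shape(times, events, alpha, beta):
    """Grid approximation of E[shape | data] under a gamma(alpha, beta) prior
    on the shape and a flat prior on log(scale) over the grid."""
    shapes = [0.30 + 0.005 * i for i in range(341)]          # 0.30 to 2.00
    scales = [math.exp(2.0 + 0.05 * j) for j in range(81)]   # about 7 to 400
    log_post = []
    for k in shapes:
        log_prior = gamma_logpdf(k, alpha, beta)
        log_post.append([log_prior + weibull_loglik(times, events, k, s)
                         for s in scales])
    peak = max(max(row) for row in log_post)
    weights = [sum(math.exp(v - peak) for v in row) for row in log_post]
    return sum(k * w for k, w in zip(shapes, weights)) / sum(weights)


# Toy immature dataset (months; 1 = death, 0 = censored). Hypothetical.
times = [2, 5, 8, 10, 11, 12, 12, 12, 12, 12, 12, 12]
events = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

alpha, beta = 50.0, 60.0  # hypothetical prior with mean 0.833
weak = posterior_mean_shape(times, events, alpha, beta)
strong = posterior_mean_shape(times, events, 100 * alpha, 100 * beta)
print(f"prior mean={alpha / beta:.3f}, informed={weak:.3f}, strongly informed={strong:.3f}")
```

With the scaled prior, the posterior mean of the shape parameter stays closer to the prior mean, which mirrors the distinction between informed and strongly informed priors described above.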

The ranges of projection results from uninformed and informed approaches were compared, and the extrapolations of OS from 12-month CARTITUDE-1 data (September 2020) using informed approaches were validated with later follow-up, using 28-month CARTITUDE-1 data (January 2022).

Results

OS in CARTITUDE-1 and LEGEND-2

OS data from CARTITUDE-1 had 28 months of available follow-up, at which point 70.4% of patients were still alive. Data were heavily censored towards the end of the follow-up period (Fig. 1A). With 48-month data, LEGEND-2 represented the longest available follow-up of any clinical study of cilta-cel or any other CAR-T treatment in RRMM. The plateauing of the OS curve around month 36 and the empirical hazard for LEGEND-2 suggested that the hazards of death decreased over time (Fig. 1B). In the 12-month CARTITUDE-1 dataset, this decreasing hazard behavior was not fully observed, with high uncertainty of hazard estimates towards the end of the follow-up period, which increased the uncertainty in the standard parametric extrapolations.

Fig. 1 A Overall survival Kaplan–Meier curves for LEGEND-2 (48-month data) and CARTITUDE-1 (12-month and 28-month data). B Empirical hazards from CARTITUDE-1 and LEGEND-2

Bayesian Informed Priors Approach Reduces Uncertainty Versus Standard Parametric Extrapolations

Twelve-month data from CARTITUDE-1 were extrapolated using the standard parametric Weibull, log-normal, log-logistic, and exponential distributions, and hazards from these standard models were capped by the general population mortality hazards from US data [19] (Fig. 2A). Generalized gamma and Gompertz projections were deemed clinically implausible (5-year estimated OS projections were 0%; Table 2). The exponential distribution was not explored in the informative Bayesian approach as it is a single-parameter distribution and does not have a separate shape parameter. OS extrapolation of the CARTITUDE-1 data based on standard parametric survival models led to wide uncertainty. The exponential distribution, which assumes a constant hazard rate over time, provided the minimum Akaike information criterion and Bayesian information criterion values (with a small margin compared with other distributions; Table 2); however, it conflicted with the empirical hazard from long-term survival data of LEGEND-2, which indicates a decreasing hazard after 1 year. In contrast to the exponential distribution, other distributions can reflect changing hazard trends over time. For example, the Weibull function assumes monotonically increasing (or decreasing) trends, and log-normal and log-logistic functions assume unimodal hazards. However, extrapolation using standard (uninformed) Weibull, log-normal, and log-logistic distributions on CARTITUDE-1 12-month data still resulted in wide variations in OS across distributions [e.g., 5-year and 10-year OS estimates ranged from 27% to 50% and from 3% to 31%, respectively, and 5-year and 10-year restricted mean survival time (RMST) estimates ranged from 37.70 to 43.38 months and from 44.60 months to 66.59 months, respectively; Table 2].
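As a reminder of how the information criteria are computed, the following minimal sketch fits the exponential model to right-censored data by maximum likelihood (rate = deaths divided by total follow-up time) and evaluates AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L. The toy data are hypothetical, and taking n as the number of patients (rather than the number of events) is an assumption made here.

```python
import math


def exponential_fit(times, events):
    """Maximum likelihood fit of a constant hazard with right censoring.
    Returns (rate, log-likelihood), where rate = deaths / total follow-up."""
    deaths = sum(events)
    exposure = sum(times)
    rate = deaths / exposure
    loglik = deaths * math.log(rate) - rate * exposure
    return rate, loglik


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * loglik


# Toy data (months; 1 = death, 0 = censored). Hypothetical values.
times = [2, 5, 8, 12, 12]
events = [1, 1, 1, 0, 0]

rate, loglik = exponential_fit(times, events)
print(f"rate={rate:.4f} per month")
print(f"AIC={aic(loglik, 1):.2f}, BIC={bic(loglik, 1, len(times)):.2f}")
```

Because the exponential model has a single parameter, it carries the smallest complexity penalty, which is one reason it can rank first on these criteria when immature data do not yet reveal how the hazard changes over time.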

Fig. 2 Projections based on A standard parametric extrapolations versus B Bayesian informative prior (CARTITUDE-1 12-month data). The Bayesian approach reduced variability in projections compared with standard parametric extrapolations. KM Kaplan–Meier, OS overall survival

Table 2 OS projections based on standard parametric extrapolations versus Bayesian informed prior extrapolations

Bayesian (informed) survival models for the Weibull, log-normal, and log-logistic distributions were performed on the 12-month CARTITUDE-1 data with informative priors from LEGEND-2. Resulting extrapolations adjusted for general population mortality showed reduced variability across distributions (smaller differences in estimates of OS and RMST at 5-year intervals) (Fig. 2B and Table 2). The relative range reduction with respect to the uninformed approaches is higher for projections at later timepoints (e.g., 5-year and 10-year RMST estimates ranged from 43.36 to 45.17 months and from 67.59 to 75.73 months, respectively; Table 2). Strongly informative priors (informed primarily by the LEGEND-2 data) reduced variability more than informative priors (informed by both CARTITUDE-1 and LEGEND-2 data).
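RMST over a horizon is the area under the survival curve up to that horizon. The sketch below computes it numerically while adjusting for general population mortality in one common way, namely by never letting the applied hazard fall below a background hazard. Whether this matches the exact adjustment used in the analysis is an assumption, and the Weibull parameters and the constant background hazard are hypothetical.

```python
import math


def rmst_with_hazard_floor(model_hazard, background_hazard, horizon, step=0.01):
    """Integrate S(t) from 0 to horizon, applying at each time the larger of
    the model hazard and the background hazard.
    Returns (restricted mean survival time, survival at the horizon)."""
    n_steps = int(round(horizon / step))
    surv, area = 1.0, 0.0
    for i in range(n_steps):
        t_mid = (i + 0.5) * step
        hazard = max(model_hazard(t_mid), background_hazard(t_mid))
        next_surv = surv * math.exp(-hazard * step)
        area += 0.5 * (surv + next_surv) * step  # trapezoid on this interval
        surv = next_surv
    return area, surv


# Hypothetical decreasing-hazard Weibull model (shape 0.7, scale 80 months)
# and a hypothetical constant background hazard of 0.002 per month.
def model(t):
    return (0.7 / 80.0) * (t / 80.0) ** (0.7 - 1.0)


def background(t):
    return 0.002


for years in (5, 10):
    rmst, surv = rmst_with_hazard_floor(model, background, 12.0 * years)
    print(f"{years}-year RMST={rmst:.1f} months, OS at {years} years={surv:.2f}")
```

The floor matters most for distributions with decreasing hazards, because without it the extrapolated hazard can eventually drop below that of the general population.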

Validation of Bayesian Informed Prior Approach Using Observed Data at Later Timepoints

To determine how closely Bayesian informative prior extrapolations tracked with observed data versus standard parametric extrapolations, model projections generated from 12-month CARTITUDE-1 data using informative priors were compared with observed 28-month data from CARTITUDE-1 (Fig. 3 and Table 3). Extrapolations using informative priors and the uninformed survival model with log-normal distribution tracked most closely to the observed Kaplan–Meier curves from the later CARTITUDE-1 data cut, as these distributions had the smallest area differences between the extrapolation curve and the observed 28-month curve. Despite the constant hazard rate implication, the uninformed exponential distribution projection also tracked closely with the observed data. The uninformed Weibull distribution fit to the 12-month CARTITUDE-1 data (which implied a monotonically increasing hazard over time) was furthest from the observed 28-month curve, whereas the impact of the informed Bayesian approach on the Weibull function [i.e., moving from a monotonically increasing (uninformed) to a monotonically decreasing (informed) function] reduced the variation of long-term projections and increased the validity of projections with respect to the observed OS CARTITUDE-1 data from the later timepoint.
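An area-difference comparison of this kind can be sketched as follows. The Kaplan–Meier estimator is standard; the use of the absolute (rather than signed) difference, the integration step, and the toy data are assumptions made for illustration and are not taken from the analysis.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.
    Returns [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    at_risk, surv, curve = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for u, d in data if u == t and d)
        leaving = sum(1 for u, _ in data if u == t)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
        i += leaving
    return curve


def km_value(curve, t):
    """Value of the Kaplan-Meier step function at time t."""
    value = 1.0
    for u, s in curve:
        if u > t:
            break
        value = s
    return value


def area_between(model_surv, curve, horizon, step=0.01):
    """Midpoint-rule integral of |model S(t) - KM S(t)| over [0, horizon]."""
    n_steps = int(round(horizon / step))
    total = 0.0
    for i in range(n_steps):
        t_mid = (i + 0.5) * step
        total += abs(model_surv(t_mid) - km_value(curve, t_mid))
    return total * step


# Toy follow-up data (months; 1 = death, 0 = censored). Hypothetical values.
observed = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 0, 1])
print(observed)
print(area_between(lambda t: 1.0, observed, horizon=4.0))
```

A smaller area indicates an extrapolation that tracks the later observed curve more closely over the chosen horizon.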

Fig. 3 Bayesian informative prior approach based on CARTITUDE-1 12-month data predicts observed data at 28 months. KM Kaplan–Meier, OS overall survival

Table 3 Comparison of OS projections using standard parametric extrapolation and Bayesian informative prior approach versus observed 28-month median follow-up data

Discussion

In this analysis, the Bayesian informative prior approach was used to decrease uncertainty in OS extrapolations and help validate plausible long-term extrapolation based on standard parametric functions. There is currently no formal guidance on the use of Bayesian curves in OS extrapolations, but external data may be used to inform long-term survival estimates or assess plausibility of extrapolations [8]. The results of our analysis are consistent with other analyses that used external information to increase accuracy and decrease uncertainty of OS extrapolations [4, 8, 20].

The primary benefit of the Bayesian informative prior approach is that it provides more flexibility in combining information from multiple sources to inform the trajectory of the hazard over a longer time. LEGEND-2 48-month data show that the risk for death decreased over time, such that the OS curve plateaued starting around 36 months. Clinically, this plateau may reflect long-term efficacy of cilta-cel treatment in the subset of patients who were able to tolerate treatment and mount an effective and durable anti-tumor response after CAR-T cell infusion. A higher death rate early in the study reflects the advanced stage of disease of the study population as well as adverse events that occurred in the period after infusion. Because the CARTITUDE-1 data are less mature and heavily censored, and because the population had less advanced disease at baseline, the decreasing hazard rate over time observed in LEGEND-2 was not yet reflected in the observed CARTITUDE-1 data. Long-term projections were validated by comparing estimates derived from early CARTITUDE-1 data cuts to observed data at later timepoints.

Using evidence of decreasing hazards from LEGEND-2 data to inform the shape parameter of the CARTITUDE-1 curve decreased the variation of OS long-term projections. Specifically, using LEGEND-2 to inform the shape parameter prior altered the original (uninformed) Weibull distribution from a monotonically increasing to a monotonically decreasing hazard function. This may be plausible, given the durable clinical response. LEGEND-2 OS and the 28-month CARTITUDE-1 data did indeed suggest that the uninformed Weibull projections based on the 12-month CARTITUDE-1 data were overly pessimistic. The extrapolations using informed distributions generally had the effect of increasing survival estimates, as well as decreasing uncertainty of the projections and providing closer estimations to the OS from the later CARTITUDE-1 data cut.

This method relied on the assumption that the shape parameter used as an informative prior was exchangeable for both cohorts. LEGEND-2 had a trial population and treatment similar to those of CARTITUDE-1 but a longer follow-up period. However, there are differences in the treatment practices between China and the USA, including availability of treatments after cilta-cel, as well as differences in patient baseline characteristics. These differences would not violate exchangeability if the two cohorts demonstrated similarity in the distribution of effect modifiers that impact the shape parameters only (e.g., for the Weibull distribution, how the hazard rate changes over time). We acknowledge that there remains uncertainty in the degree of exchangeability of shape parameters between the cohorts; however, the flexibility of the Bayesian approach allowed us to explore different exchangeability assumptions (i.e., different distributions), and the later data cut from CARTITUDE-1 provided additional evidence for the appropriateness of borrowing information using the Bayesian approach. The validity of this approach should be reassessed when OS data from later data cuts become available.

The generalizability of this approach to other disease states, therapies, and trials depends on the exchangeability of the shape parameter between the external data and the dataset being extrapolated. This requires a degree of similarity between the two data sets, although adjustments can be made to control for population differences across datasets. For example, an analysis using Bayesian approaches to extrapolate OS data on nintedanib in progressive fibrosing interstitial lung disease used external data from trials of nintedanib in idiopathic pulmonary fibrosis, using propensity score matching to ensure that patients in the analysis had similar baseline characteristics [21]. In the case of cilta-cel, a similar approach of adjusting for baseline characteristics could be used in Bayesian extrapolations of future trials in MM populations earlier in their disease course using CARTITUDE-1 as the external dataset.

Conclusions

The method outlined aimed to reduce uncertainty in OS extrapolations by informing 12-month CARTITUDE-1 OS extrapolations with the shape parameter obtained from 48-month LEGEND-2 OS data. It offered a flexible approach allowing adjustable degrees of influence from the reference external data (i.e., informed versus strongly informed priors) on the shape parameter used in survival models fitted to CARTITUDE-1 data. Although this approach is not widely used currently, its advantages have been recognized by the UK National Institute for Health and Care Excellence (NICE), and it was preferred over uninformed extrapolations (e.g., NICE technical support document on flexible survival methods and NICE technology assessment of nintedanib for lung disease) [2, 21, 22]. Use of external data sources to inform OS projections from pivotal clinical trials might increase certainty of long-term projections, especially in the initial data cuts from the trial that are available at the time of the product launch.