Key Summary Points

Why carry out this study?

A network meta-analysis was performed, evaluating approved first-line tyrosine kinase inhibitors (TKIs) for metastatic renal cell carcinoma (mRCC).

This provided an up-to-date, comprehensive analysis of phase II/III randomized controlled trial data.

What was learned from this study?

No significant efficacy differences between approved first-line TKIs were observed.

Tivozanib ranked the most favourable in the analysis of grade 3 and 4 adverse events.

This produced indirect evidence to support clinical decisions and planning of future trials.

Introduction

The treatment of metastatic renal cell carcinoma (mRCC) has evolved over the last decade and now harbours several different drug classes [cytokines, tyrosine kinase inhibitors (TKIs), mammalian target of rapamycin inhibitors and immune checkpoint inhibitors (ICIs, IOs)] [1, 2]. European mRCC treatment guidelines [European Association of Urology (EAU) and European Society for Medical Oncology (ESMO)] were updated in 2018/2019, with TKIs recommended as the standard for treating favourable International Metastatic RCC Database Consortium (IMDC) risk-group patients, but as optional first-line treatments (secondary to ICIs as standard) in intermediate- or poor-risk patients [1, 2]. Guidelines now include the following TKIs as first line: cabozantinib, pazopanib, sorafenib, sunitinib and tivozanib [1, 2]. The first-line TKI options have varying potency and selectivity for vascular endothelial growth factor receptors (VEGFRs) [3, 4]. Pazopanib, sorafenib, sunitinib and cabozantinib are considered multi-targeted TKIs because they inhibit several tyrosine kinases, such as platelet-derived growth factor receptor and c-KIT, in addition to VEGFR, whereas tivozanib has been shown to potently and selectively target all three VEGF receptors [3, 4]. All approved first-line TKIs have demonstrated anti-tumour activity, but it is proposed that the off-target effects contribute to differences between the toxicity profiles [1,2,3,4]. Examples of off-target toxicities include diarrhoea, fatigue and hand-foot syndrome, whereas VEGF-associated toxicities include hypertension and hypothyroidism [3, 4]. With many TKIs available, and given the lack of head-to-head randomised controlled trials (RCTs), it is important to evaluate the relative efficacy and toxicity of each TKI to support an evidence-based approach to treatment. 
Two previous network meta-analyses (NMAs) of first-line mRCC treatments have been carried out: one predates the approval of cabozantinib as a first-line treatment; the other demonstrated that there was no significant difference in progression-free survival (PFS) associated with several first-line treatments, but this analysis did not include safety profile data for all therapies studied [5, 6]. Toxicity may be an important differentiator between these treatments. This NMA aims to provide an up-to-date, comprehensive efficacy and toxicity comparison between each of the approved first-line TKIs for mRCC.

Methods

Systematic Literature Review Methodology

We performed a systematic literature review and NMA of phase II/III RCTs assessing approved first-line TKI therapies for mRCC. PubMed, ClinicalTrials.gov, Embase, Medline, the Cochrane Central Register of Controlled Trials, Web of Science and conference abstracts from the American Society of Clinical Oncology (ASCO), ESMO and ASCO-Genitourinary were searched independently by two authors (WD and AE). Only English language publications from database inception to 15 January 2019 were included. Search terms included: randomised clinical trial; metastatic renal cell carcinoma; advanced renal cell carcinoma; tyrosine kinase inhibitor; first-line; phase II trial; phase III trial; immunotherapy; progression-free survival; adverse events. Results were restricted to phase II and phase III RCTs. Bibliographies of review articles and editorials were manually searched. The literature review process followed Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [7]: two authors (JM and KF) independently evaluated data from eligible studies, which were then checked by a third author (KW), and any disagreements were resolved by discussions moderated by a fourth author (WD). A bias risk assessment was conducted using The Cochrane Collaboration’s tool [8]. All RCTs comparing a first-line TKI with other TKIs, placebo or interferon-alfa (IFN-α; a historic standard of care) were included; other comparators were excluded. Additional exclusion criteria included: non-randomised trials, retrospective studies, second-line or later-line studies, case reports and TKIs not approved for first-line therapy.

The outcome measure used to evaluate efficacy in the NMA was PFS, as first reported by the study authors (i.e., either independent review committee or investigator assessed). In some instances, time-to-progression (TTP) data were substituted as a close approximation. The toxicity outcome measure used was the proportion of patients experiencing a maximum of grade 3 or 4 adverse events (AEs). Contact with the study authors was attempted (email and/or telephone) to obtain missing information. In some cases it was possible to derive missing information from available data using formulae adapted from Woods et al. (2010) and Altman and Bland (2011) [9, 10].

Data collection for this work is based on previously conducted studies and does not contain any study with human participants or animals performed by any of the authors. The data are solely obtained from published studies.

Statistical Analysis

All statistical analyses were performed using R software and the netmeta package [11, 12]. For the network analyses, the treatment effect [log (PFS hazard ratio)] and toxicity effect [log (relative risk of % patients experiencing grade 3 or 4 AEs)], together with estimates of the standard error of each, were input into the model using the data collated in Table 1. Network diagrams were produced: the thickness of the connecting lines represents the strength of evidence for a treatment effect [11,12,13].
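
As an illustration of the model inputs described above, the following minimal sketch shows one common way to obtain a log effect and its standard error from published summary data. It is not the authors' code: the function names and all numbers are hypothetical, and it assumes that the reported interval is a symmetric 95% CI on the log scale and that the relative risk is computed from simple event counts.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def log_hr_and_se(hr, ci_lower, ci_upper):
    """Recover (log HR, SE) from a reported HR and its 95% CI.

    Assumes the CI is symmetric on the log scale, so its width
    equals 2 * Z_95 * SE.
    """
    log_hr = math.log(hr)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)
    return log_hr, se


def log_rr_and_se(events_a, n_a, events_b, n_b):
    """Return (log RR, SE) for arm A versus arm B from event counts."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard large-sample SE of a log relative risk.
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return math.log(rr), se


# Hypothetical inputs for illustration only; not taken from Table 1.
te_pfs, se_pfs = log_hr_and_se(0.80, 0.64, 1.00)
te_ae, se_ae = log_rr_and_se(60, 100, 75, 100)
print(f"log HR = {te_pfs:.4f}, SE = {se_pfs:.4f}")
print(f"log RR = {te_ae:.4f}, SE = {se_ae:.4f}")
```

Pairs of this kind (one log effect and one standard error per pairwise comparison) are the form of input the text describes for the network model.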

Table 1 RCTs included in the network meta-analysis

We performed an NMA using a random effects model with a frequentist approach [14, 15]. A fixed effects model was tested, but significant heterogeneity was detected in the overall network (Qtotal = 13.99, p = 0.0073), which could be decomposed into considerable heterogeneity between the designs (Qbetween = 12.00, p = 0.0074) and non-significant heterogeneity within the designs (Qwithin = 1.99, p = 0.1586), where a design is a pairwise comparison of two treatments, such as sorafenib vs. tivozanib. Therefore, a random effects model was selected over a fixed effects model to account for this potential heterogeneity (different study designs, populations, treatment arms, etc.) [16].
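
The decomposition of the heterogeneity statistic can be checked with a short sketch. The Q values below are those reported in the text; the degrees of freedom are not reported and are assumed here (1 within designs, 3 between designs) purely because they give p values close to the published ones. The helper `chi2_sf` is a hypothetical name for a plain chi-square upper-tail function written with the standard library only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df: complementary error function plus a finite sum
    # over half-integer powers.
    tail = math.erfc(math.sqrt(half))
    extra = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return tail + math.exp(-half) * extra


# Q statistics as reported in the text.
q_within, q_between = 1.99, 12.00
# ASSUMPTION: degrees of freedom are not given in the text.
df_within, df_between = 1, 3

# The total statistic is the sum of the two components.
q_total = q_within + q_between
p_within = chi2_sf(q_within, df_within)
p_between = chi2_sf(q_between, df_between)
p_total = chi2_sf(q_total, df_within + df_between)
print(f"Q_total = {q_total:.2f}, p = {p_total:.4f}")
print(f"Q_between = {q_between:.2f}, p = {p_between:.4f}")
print(f"Q_within = {q_within:.2f}, p = {p_within:.4f}")
```

Under these assumed degrees of freedom the p values come out close to those reported, with small differences attributable to the Q statistics being rounded to two decimal places.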

Sources of inconsistency in the random effects model were investigated by the generation of net heat plots: the colour-scale shading of each box indicates whether the design was a source of inconsistency (red) or supported other evidence (blue), and the area of the grey boxes indicates the contribution of direct-comparison evidence and indirect evidence (given in the columns) to the network estimate of a comparison (shown in the rows) [13].

Treatments were ranked by calculating P scores using the netrank function of the netmeta package [11, 17]. P scores measure the extent of certainty that a treatment is better than another treatment, averaged over all competing treatments, while taking the precision into account [17].
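
The idea behind a P score can be sketched as follows. This is a simplified illustration and not the netrank implementation: the treatment labels and numbers are hypothetical, and the estimates are assumed to be independent so that the standard error of a difference is the root sum of squares, whereas a real NMA would use the standard errors of the network estimates.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_scores(effects, ses, smaller_is_better=True):
    """Mean one-sided certainty that each treatment beats each competitor.

    ASSUMPTION: estimates are independent, so the SE of a difference is
    sqrt(se_i^2 + se_j^2). This is a simplification for illustration.
    """
    scores = {}
    for i in effects:
        probs = []
        for j in effects:
            if i == j:
                continue
            # For log hazard ratios a smaller value is the better outcome.
            diff = effects[j] - effects[i]
            if not smaller_is_better:
                diff = -diff
            se_diff = math.sqrt(ses[i] ** 2 + ses[j] ** 2)
            probs.append(normal_cdf(diff / se_diff))
        scores[i] = sum(probs) / len(probs)
    return scores


# Hypothetical log hazard ratios versus a common reference.
effects = {"A": -0.60, "B": -0.40, "C": -0.10}
ses = {"A": 0.20, "B": 0.15, "C": 0.25}

scores = p_scores(effects, ses)
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: P score = {scores[name]:.3f}")
```

Because each pairwise certainty and its mirror image sum to 1, P scores always average 0.5 across the network, which is why several treatments can score above 50% at the same time.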

Results

The systematic literature search identified 699 unique references, which, after review, revealed 12 that fitted the screening criteria (Fig. 1). Table 1 presents the RCT data input into the model and some key trial characteristics. The RCTs directly comparing TKIs to either IFN-α or placebo demonstrated significant improvements in PFS [18,19,20], except for sorafenib versus IFN-α [21]. RCTs directly comparing TKIs to one another demonstrated mixed results: some demonstrated significant improvements in PFS [22, 23] or established non-inferiority [24] while others did not [25,26,27,28,29,30]. As data from two studies were available in abstract form only, we were unable to assess their risk of bias. All other studies included were open-label trials. We judged all studies to be at low risk of attrition and reporting bias.

Fig. 1 PRISMA diagram

For the efficacy NMA, we included all 12 studies with a total of 4306 patients (Fig. 2a), and for the safety analysis, we included data for all 12 studies with a total of 4243 patients (Fig. 3a). The strength of evidence for the sunitinib (50 mg 2/1) dosing regimen was the weakest in the NMA (Figs. 2a, 3a), perhaps because of the small sample size (Table 1). The NMA output data are tabulated in Appendix Tables S1 and S2. The eligibility criteria for the 12 studies varied; for example, the majority of studies did not specify a Memorial Sloan Kettering Cancer Center (MSKCC) prognostic group as an entry criterion, except for three studies that only enrolled patients with a favourable or intermediate MSKCC risk score [26,27,28] and one that enrolled patients of intermediate or poor IMDC risk category [23]. These differences in eligibility criteria can be a potential source of heterogeneity, which is partially accounted for in the random effects model. When analysing for specific sources of heterogeneity, the studies including only favourable or intermediate MSKCC risk patients [26,27,28] were not collectively found to be a significant cause of inconsistency (Figs. 2d, 3d). The net heat plots also show that these studies contribute important indirect evidence to the model. It was not possible to analyse the effect of restrictive eligibility criteria at the other end of the prognostic risk spectrum in this way because only one study that used such criteria, the cabozantinib study, was included [23].

Fig. 2 Network meta-analysis of PFS: a network diagram; b forest plot, with placebo as the comparator; c forest plot, with cabozantinib as the comparator; d net heat plot. Red areas are related to inconsistency, whereas blue areas support other evidence gained from the network. The area of a grey box indicates the contribution of the direct estimate of the pairwise comparison in the column to a network estimate in a row. PFS progression-free survival, HR hazard ratio, CI confidence interval

Fig. 3 Network meta-analysis of percentage grade 3 or 4 AEs: a network diagram; b forest plot, with cabozantinib as the comparator; c forest plot, with tivozanib as the comparator; d net heat plot. Red areas are related to inconsistency, whereas blue areas support other evidence gained from the network. The area of a grey box indicates the contribution of the direct estimate of the pairwise comparison in the column to a network estimate in a row. AE adverse event, RR relative risk, CI confidence interval

Figure 2b shows the NMA results of the indirect efficacy comparison with placebo; the confidence intervals demonstrate that cabozantinib, sunitinib [standard regimen (50 mg 4/2)], pazopanib, tivozanib and sorafenib treatments were significantly different from placebo, whereas the alternative sunitinib dosing regimens [50 mg 2/1 and 37.5 mg continuous daily dose (CDD)] were not. Cabozantinib had the highest probability of being the best treatment in terms of PFS (P score 0.9481), followed by sunitinib, pazopanib and tivozanib (P scores 0.7411, 0.6914 and 0.5988, respectively) (Table 2). When treatments were indirectly compared with cabozantinib, there was no significant difference in PFS between the first-line TKIs, with the exception of sorafenib, which was associated with a significantly shorter PFS (Fig. 2c). Figure 3b shows the NMA toxicity results for the indirect comparison of first-line TKIs with cabozantinib: the confidence intervals demonstrate that tivozanib, placebo and IFN-α were associated with a significantly lower likelihood of grade 3/4 AEs. Calculation of P scores confirms that tivozanib has a 92.6% probability of having the least toxicity (Table 2). Indirect toxicity comparison with tivozanib demonstrates that the grade 3/4 safety profile of tivozanib is significantly different from all other first-line TKIs (Fig. 3c).

Table 2 Network meta-analysis of PFS (left) and percentage grade 3 or 4 AEs (right): P score ranking

Discussion

The therapeutic arsenal at hand to treat kidney cancer patients is evolving rapidly, with novel IO-IO combinations (e.g., nivolumab plus ipilimumab, CheckMate-214 trial [33]) or IO-TKI combinations (e.g., avelumab plus axitinib, Javelin-101 trial [34]; pembrolizumab plus axitinib, Keynote-426 trial [35]) now being approved by the FDA and/or EMA as first-line combination treatment strategies. These combinations have improved treatment outcomes dramatically, with a significant overall survival benefit (Keynote-426 and CheckMate-214). However, the observed clinical benefit in these trials was associated with increased grade 3/4 AEs: 75.8% for pembrolizumab/axitinib [35] and 71.2% for avelumab/axitinib [34]. For the IO-IO combination, grade 3 and 4 AEs were reported to be 46% and 63%, respectively, with a treatment discontinuation rate of 22% due to adverse events [33], suggesting that there is still a role for single-agent TKI treatment, especially in elderly and less fit patients; however, the optimal TKI for these patients remains unclear.

This NMA was conducted in an attempt to provide a comprehensive comparison of the efficacy and safety of approved first-line TKIs for advanced and metastatic renal cell carcinoma. The efficacy NMA P scores ranked the first-line TKIs from highest probability of efficacy downward as follows: cabozantinib, sunitinib (50 mg 4/2), pazopanib, tivozanib, sunitinib (37.5 mg CDD), sunitinib (50 mg 2/1) and sorafenib. It is possible that the small sample size influenced the sunitinib (50 mg 2/1) ranking result. Cabozantinib had a 94.8% probability of being the best treatment in terms of PFS; however, several other treatments also had P scores > 50%, and the confidence intervals demonstrate that no significant differences between cabozantinib and either sunitinib (50 mg 4/2), pazopanib or tivozanib were observed. However, this NMA is underpowered, with wide 95% CIs, and because it was not designed as an equivalence analysis we cannot conclude “similar efficacy”; this limitation is common to the NMAs published so far.

Taken together, it is not possible to produce a clear hierarchy of first-line TKIs based on significant differences in efficacy. Consequently, the toxicity of these TKIs may play a more significant role in treatment decisions. The NMA indicated that in terms of grade 3 and 4 AEs, tivozanib had the most favourable safety profile and was shown to be associated with significantly less risk of toxicity than other TKIs. This result was consistent with the high specificity of tivozanib for VEGFR compared with other multikinase inhibitors and the hypothesis that fewer off-target side effects occur [3, 4].

A previous NMA of mRCC treatments did not include safety profile data for all therapies studied [5]. To produce a comprehensive analysis of approved first-line TKIs, unpublished missing safety data were sought and obtained from six studies. To this end, an updated value from the internal tivozanib safety data bank for the proportion of patients experiencing grade 3 or 4 AEs was also included (data on file). The NMA was also computed using the Motzer et al. (2013) grade 3 or 4 AE data [22]; tivozanib held a similar P score rank position relative to other TKIs but ranked lower than both placebo and IFN-α (data not shown).

A favourable safety profile has the direct benefit to patient quality of life of experiencing fewer side effects and may also be associated with simplified management owing to fewer dose interruptions or dose reductions that are required to mitigate side effects [3]. Indeed, in the tivozanib versus sorafenib RCT, tivozanib was associated with significantly fewer dose reductions and interruptions due to AEs than sorafenib [22]. Furthermore, low toxicity is a key characteristic of a therapy potentially suitable for use in combination therapy. The CheckMate-016 trial demonstrated that the combination of either sunitinib or pazopanib with nivolumab, an anti-PD-1 antibody, resulted in a high incidence of grade 3/4 AEs, making neither combination suitable for use [31]. The combinations of cabozantinib with nivolumab and tivozanib with nivolumab are both under investigation [32, 33]. Among these TKIs, our safety data NMA results predict that tivozanib has the greatest chance of being a suitable partner to nivolumab, although the specific overlap of each drug’s safety profile and any drug-drug interaction will also have influence. As new drugs enter the mRCC treatment landscape, there will be even more combination therapy options to pursue [4].

Cabozantinib had the greatest probability of having the highest efficacy in the NMA; however, the cabozantinib RCT was conducted only in intermediate or poor IMDC risk patients; hence, it is only indicated in these patients [23, 34]. The NMA data support the ESMO and EAU guidelines that include cabozantinib as a TKI option in poor- or intermediate-risk patients [1, 2].

The current EAU guideline recommends that tivozanib is not used in first-line mRCC treatment because the evidence is considered inferior to other recommended TKIs [2]. This NMA addresses this by providing indirect evidence to supplement the direct evidence for tivozanib. Specifically, the net heat plot diagram (Fig. 2d) demonstrates that while the sorafenib-tivozanib comparison strongly relies on the direct evidence, the placebo-tivozanib network efficacy estimate gains most evidence from indirect comparisons. The NMA results show tivozanib was not associated with significantly worse efficacy compared with other first-line TKIs and demonstrated a clearly reduced incidence of grade 3 or 4 AEs. These data support the ESMO guidelines, which recommend tivozanib as a first-line mRCC treatment in favourable-risk patients, alongside sunitinib and pazopanib, and as a TKI option in intermediate-risk patients, alongside cabozantinib, sunitinib and pazopanib [1].

In addition, two findings support the guidelines, which (1) no longer recommend sorafenib as a first-line treatment, apart from in limited-choice settings, and (2) suggest that robust data to support the use of alternative sunitinib dose regimens are lacking [1, 2]. First, the efficacy NMA showed that PFS with sorafenib was significantly shorter than with cabozantinib. Second, the P score ranking suggests that sorafenib is less likely to be the best treatment than either alternative sunitinib regimen, and these regimens were themselves not significantly different from placebo.

Limitations of meta-analyses using aggregate data have been discussed previously [5]. As the confidence intervals in our analysis and other published NMAs [5, 36] are relatively wide, results need to be treated with caution. In our NMA we used a p < 0.05 threshold to judge the statistical significance of our findings, which means that a result is statistically significant if its 95% confidence interval does not include the value of 1 (for HR and relative risk). In the forest plot with cabozantinib as the comparator (Fig. 2c), the HR point estimates are > 1, which is indicative of inferior efficacy of all other treatments compared with cabozantinib. However, once the uncertainty around the point estimates (i.e., the 95% CIs) is taken into account, the “true” effect of the other TKIs, apart from sorafenib, could be better than cabozantinib (HR < 1, lower limit of the CI) or worse than cabozantinib (HR > 1, upper limit of the CI), so these differences should be considered not statistically significant. It also has to be taken into account that for cabozantinib only one phase II trial (CABOSUN trial, N = 157) had been published [23]. The results of this trial remain controversial and a number of concerns have been raised because of the poor efficacy of sunitinib in the control arm, in which the overall survival (OS) was found to be much worse than in the majority of other studies [5]. On the other hand, tivozanib appeared to be the TKI of choice in our NMA; however, the drug is not approved in the USA because of poor OS outcomes [22]. Finally, one can argue that grade 3–4 AEs may not entirely reflect treatment-related toxicity, suggesting that clinical context is required for an appropriate interpretation of any NMA. Although we detected no statistically significant differences in the efficacy of the TKIs approved as first-line therapy, this does not rule out the possibility that a difference would emerge in an analysis of another data set.
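
The significance rule described above (a 95% CI that excludes 1) can be illustrated with a minimal sketch. The function names and the numbers are hypothetical and are not results from this NMA; the interval is assumed to be a normal-approximation CI built on the log scale.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def ci_from_log_effect(log_effect, se):
    """Back-transform a log effect and its SE to a 95% CI on the ratio scale."""
    lower = math.exp(log_effect - Z_95 * se)
    upper = math.exp(log_effect + Z_95 * se)
    return lower, upper


def is_significant(lower, upper, null_value=1.0):
    """A ratio is significant at p < 0.05 if its 95% CI excludes the null value."""
    return not (lower <= null_value <= upper)


# Hypothetical hazard ratios versus a comparator, same SE on the log scale.
ci_a = ci_from_log_effect(math.log(1.30), 0.20)  # point estimate above 1
ci_b = ci_from_log_effect(math.log(2.00), 0.20)  # larger point estimate

print(f"HR 1.30: CI {ci_a[0]:.2f} to {ci_a[1]:.2f}, "
      f"significant: {is_significant(*ci_a)}")
print(f"HR 2.00: CI {ci_b[0]:.2f} to {ci_b[1]:.2f}, "
      f"significant: {is_significant(*ci_b)}")
```

The first hypothetical case shows the situation discussed in the text: a point estimate above 1 whose interval still spans 1, so no significant difference can be claimed.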

Specific to this NMA, it may be argued that the omission of an evaluation of OS is a limitation. In this regard it should be noted that none of the currently approved TKIs for first-line treatment of advanced or metastatic renal cell cancer has shown a significant OS benefit so far, which prompted us to omit an extensive OS analysis. In addition, other recently published NMAs have provided evidence that no single TKI treatment appeared to be superior to its comparators for objective response rate (ORR) and that ORR was not predictive of OS [36].

However, PFS has been shown to be predictive of OS in TKI-treated patients and, although strict surrogacy has not been established, the US Food and Drug Administration has indicated PFS end points are acceptable [37, 38]. Furthermore, OS can be impacted by differences in sequential therapy, evidenced by the fact that none of the studies included in this analysis demonstrated a significant OS benefit over its comparator; several of these studies suggest OS results were possibly confounded by cross-over to second-line therapies, and some studies specifically evaluated cross-over in a switch design [20,21,22,23,24,25,26,27, 30, 39,40,41]. Another possible limitation is that grade 1 and 2 toxicities were not included in the analysis. By definition, higher grade AEs are more critical; however, using the IFN-α safety profile as an example, which includes mostly grade 1 and 2 AEs (resulting in it ranking higher than all TKIs except tivozanib by P score), side effects such as fatigue are known to be challenging to manage [42]. Finally, the results of our NMA cannot be directly applied to clinical practice because cabozantinib is only approved for use in intermediate or poor IMDC prognostic risk patients. The cabozantinib trial is one of four trials (out of 12) included in the NMA that had restrictive prognostic risk group entry criteria. We included these studies to provide a comprehensive analysis of all first-line approved TKIs because there was an insufficient number of trials to analyse them separately.

Conclusions

In this NMA no statistically significant differences in the efficacy among cabozantinib, sunitinib, pazopanib and tivozanib could be detected; however, tivozanib appeared to be associated with a more favourable safety profile in terms of grade 3 or 4 toxicities. The findings of this NMA may bolster information from pairwise comparisons to shape mRCC clinical decision-making and to assist planning of future RCTs.