Introduction

Transrectal ultrasound-guided systematic biopsy in the diagnostic work-up of prostate cancer (PCa) is associated with a rising prevalence of antibiotic-resistant bacterial infections and biopsy-related septic complications [1]. Furthermore, systematic biopsy is associated with increased detection of indolent or low-grade PCa [2]. Reducing systematic biopsies is pivotal in men who eventually prove to have no or low-grade PCa. The use of validated multivariate risk prediction models to determine the risk of clinically significant PCa (csPCa) and to reduce (unnecessary) biopsies is nowadays recommended in guidelines [3].

The individualized risk-adapted approach in prostate cancer diagnosis is about to change with the introduction of prostate multi-parametric magnetic resonance imaging (mpMRI) in daily practice. Despite the qualities of mpMRI in predicting the absence or presence of csPCa, today mpMRI is still utilized as a diagnostic test for improving the performance of the diagnostic work-up, and not for reducing biopsies [3, 4]. In other words, mpMRI is indicated when systematic biopsy is indicated, and thus when the likelihood of finding clinically significant disease in a subsequent systematic biopsy is high.

mpMRI is especially indicated in men with prior negative biopsy who are still suspected of having significant disease. However, in biopsy-naïve men mpMRI is also suggested as an upfront or prebiopsy diagnostic test, to improve the diagnostic yield when combining targeted and systematic biopsy [5,6,7]. Moreover, mpMRI has also been introduced as a triage test to indicate performing or not performing a biopsy [5, 8,9,10]. As a result of its high negative predictive value, men with no suspected evidence for csPCa on MRI may defer systematic biopsy [11, 12].

Utilizing mpMRI as a triage test resembles risk prediction, and may overlap with current multivariate risk prediction models for prostate cancer [13,14,15,16,17,18,19]. Most of these current risk models have been externally validated several times, and all use prostate-specific antigen (PSA) and digital rectal examination (DRE) as individual predictive input variables. To improve their predictive value, some use extra input variables such as age, prostate volume, free PSA, family history, race, and prior negative biopsy.

Due to the predictive value of mpMRI in PCa diagnosis, recently new multivariate risk prediction tools have been constructed, with the inclusion of the mpMRI suspicion score [20,21,22,23,24,25,26,27,28,29]. The purpose of this review is to explore the performance of these new MRI risk models for indicating a biopsy for prostate cancer diagnosis.

Multivariate risk prediction including mpMRI for indicating targeted biopsy

When mpMRI is included as an extra diagnostic test in multivariate risk prediction tools, the balance between benefits and risks may change and should be (re-)evaluated. The first studies in this evolution have now been published, mainly focusing on maximizing diagnostic yield, in combination with potentially reducing biopsies and reducing the detection of clinically insignificant PCa. Study characteristics are shown in Tables 1 and 2. These multivariate prediction tools are based on logistic regression models. For these models, mostly the area under the receiver-operating-characteristic (ROC) curve (AUC) is investigated for any PCa and csPCa (Table 2), sometimes in combination with decision curve and net reduction curve analysis. In most studies clinically significant PCa was defined as Gleason score 3 + 4 or higher, nowadays better referred to as International Society of Urological Pathology (ISUP) grade (G) 2 or higher.
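As an illustration of the discrimination measure used throughout these studies, the AUC can be read as the probability that a randomly chosen man with csPCa receives a higher predicted risk than a randomly chosen man without csPCa. The sketch below computes it by pairwise comparison; the function name `auc` and the example risks and outcomes are hypothetical and are not taken from any of the reviewed studies.

```python
def auc(risks, outcomes):
    """Pairwise (Mann-Whitney) estimate of the AUC.

    risks    -- predicted probabilities from a risk model
    outcomes -- 1 if csPCa was found at biopsy, 0 otherwise
    Ties between a case and a non-case count as half a win.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks and biopsy outcomes, for illustration only.
risks = [0.05, 0.12, 0.30, 0.45, 0.60, 0.80]
outcomes = [0, 0, 1, 0, 1, 1]
example_auc = auc(risks, outcomes)
print(f"AUC = {example_auc:.2f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of men with and without csPCa.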

Table 1 Characteristics of studies related to nomogram development for prostate cancer diagnosis, including multi-parametric MRI
Table 2 Characteristics and results of studies related to nomogram development for prostate cancer diagnosis, including multi-parametric MRI

In biopsy-naïve setting

Several studies have developed an MRI risk prediction model, some of which have been internally or externally validated.

Based on the ERSPC-RCs, a risk calculator including mpMRI was developed on datasets from five international centers [20]. Input variables were PSA, DRE, prostate volume, prior biopsies, age, and mpMRI (PI-RADS 1–5) (Table 2). In total, data from 504 biopsy-naïve men were used to adjust the ERSPC-RC3 into the MRI-ERSPC-RC3. MRI-ERSPC-RC3 had a significantly higher AUC for G ≥ 2 PCa compared with the non-calibrated baseline model (ERSPC-RC3 + DRE): 0.84 [95% confidence interval (CI) 0.81–0.88] versus 0.76 [0.71–0.80] (p < 0.01) (Table 2). In decision curve analysis, the MRI-ERSPC-RC3 showed a benefit for G ≥ 2 PCa threshold probabilities larger than 10%. For example, at a risk threshold of ≥ 10% G ≥ 2 PCa to indicate a biopsy, 14% (143/1000) of biopsies would have been avoided, missing G = 1 PCa in 13% (18/143) and G ≥ 2 PCa in 10% (14/143) of the avoided biopsies (i.e., negative tests) (Table 3). It should be noted that the prevalence of G ≥ 2 PCa in this cohort (42%) was rather high. Even so, at this threshold only 3.3% of all G ≥ 2 PCa (14/423) would have gone undetected overall. Because of this high-risk population, hardly any benefit was reached at risk thresholds lower than 10% G ≥ 2 PCa.
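The percentages quoted for the 10% threshold follow from simple ratios of the per-1000 counts given in the text. The short sketch below reproduces them; the helper `pct` and the variable names are illustrative choices, and only the counts themselves come from the passage above.

```python
def pct(numerator, denominator):
    """Return numerator / denominator as a percentage, rounded to one decimal."""
    return round(100.0 * numerator / denominator, 1)


# Counts per 1000 biopsy-naive men, as quoted in the text for the
# MRI-ERSPC-RC3 at a risk threshold of >= 10% G >= 2 PCa.
men = 1000
avoided = 143        # biopsies avoided (negative test)
missed_g1 = 18       # G = 1 PCa among the avoided biopsies
missed_g2plus = 14   # G >= 2 PCa among the avoided biopsies
all_g2plus = 423     # all G >= 2 PCa per 1000 men (prevalence about 42%)

summary = {
    "avoided_biopsies_pct": pct(avoided, men),
    "g1_among_avoided_pct": pct(missed_g1, avoided),
    "g2plus_among_avoided_pct": pct(missed_g2plus, avoided),
    "g2plus_missed_of_all_pct": pct(missed_g2plus, all_g2plus),
}
for name, value in summary.items():
    print(f"{name}: {value}%")
```

Note the two different denominators for missed G ≥ 2 PCa: relative to the avoided biopsies (14/143) and relative to all G ≥ 2 PCa in the cohort (14/423).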

Table 3 Test results of baseline model and MRI-model in biopsy-naive, prior negative biopsy, and combined setting

Radtke and coworkers [21] investigated multivariate prediction modeling in a similar approach, also based on the ERSPC-RC3. A single-center dataset of 660 biopsy-naïve men was used. MRI-ERSPC-RC3 reached a higher AUC (0.83), compared with ERSPC-RC3 (0.81), age refitted ERSPC-RC3 (0.80, p < 0.001), and PI-RADSv1.0 (0.76, p < 0.001) (Table 2). In decision curve analysis, the MRI-ERSPC-RC3 showed a benefit for G ≥ 2 PCa threshold probabilities larger than 10% (Table 3). In this study cohort, the estimated prevalence of G ≥ 2 PCa (46%) was also rather high, which explains why a benefit was seen only at risk thresholds above 10%.

Mehralivand and colleagues developed a risk prediction model, based on PSA, DRE, age, African American ethnicity, and incorporating mpMRI (PI-RADS 1–5) [22]. A development cohort from one institute (n = 400) and a validation cohort from two other institutes (n = 250) had a prevalence of G ≥ 2 PCa (outcome measurement) of 48.3% and 38.2%, respectively. In comparison to the baseline model, the AUC of the MRI risk prediction model increased from 0.72 to 0.84 (p < 0.001) in the development cohort, and from 0.64 to 0.84 (p < 0.001) in the validation cohort (Table 2). By applying the MRI risk prediction model to the validation cohort, a higher net benefit than with the baseline model could be achieved for risk thresholds above 10%. At a risk threshold of ≥ 10% G ≥ 2 PCa to indicate a biopsy, 17% (172/1000) of biopsies would have been avoided, missing G ≥ 2 PCa in 6% (11/172) of the avoided biopsies (Table 3). In this study cohort, the prevalence of G ≥ 2 PCa was 38%, a little lower than in the previous studies, which may contribute to the higher net benefit at the same risk threshold of ≥ 10%.

Fang et al. [23] developed an MRI risk prediction model, based on PSA, age, and (abnormal) transrectal ultrasound, incorporating mpMRI (PI-RADS 1–5). The AUC for G ≥ 3 PCa for the developing cohort (n = 894, with a prevalence of 24.4%) was 0.87, in comparison to 0.85 (p = 0.001) for the risk prediction model without mpMRI (Table 2). At a risk threshold of ≥ 10% G ≥ 3 PCa to indicate a biopsy, 10% (98/1000) of biopsies would have been avoided, missing no G ≥ 3 PCa (0%, 0/98) in the avoided biopsies (Table 3).

Bjurlin and colleagues [24] developed an MRI risk prediction model, based on PSA-density and age, incorporating mpMRI (MRI suspicion score assessment category 3–5, excluding the first categories). Bias-corrected AUC for G ≥ 2 (Gleason ≥ 7) PCa was 0.91 for the developing cohort (n = 201, with a prevalence of 34.8%) and 0.84 for the validating cohort (n = 87, with a prevalence of 31.0%) (Table 2). These AUCs were higher in comparison to the baseline model (PSA-density) (0.75 and 0.69) and the MRI suspicion score model (0.90 and 0.84).

Niu and coworkers [25] developed an MRI risk prediction model, based on adjusted PSA-density and age, incorporating mpMRI (PI-RADS 1–5). The AUC for G ≥ 2 (Gleason score ≥ 7) PCa for the developing and validating cohorts was 0.85 [0.79–0.90] (n = 151, with a prevalence of 21.2%) and 0.82 [0.76–0.89] (n = 74, with a prevalence of 24.3%), respectively (Table 2). These AUCs were higher in comparison to the baseline model (PSA-density) (0.74 [0.66–0.79], p = 0.013) and the PI-RADSv2 score (0.76 [0.71–0.84], p = 0.018).

In prior negative biopsy setting

Next to the development of MRI-ERSPC-RC3 on biopsy-naïve men, a risk calculator MRI-ERSPC-RC4 was developed for men with prior negative biopsy (n = 457, prevalence of 29% G ≥ 2 PCa) [20]. MRI-ERSPC-RC4 had a significantly higher AUC for G ≥ 2 PCa compared with the ERSPC-RC4 + DRE (baseline model): 0.85 (95% CI 0.81–0.89) versus 0.74 (95% CI 0.69–0.79, p < 0.01) (Table 2). Using a ≥ 10% risk threshold for G ≥ 2 PCa to indicate a biopsy, 36% (361/1000) of biopsies would have been avoided, missing G = 1 PCa in 15% (55/361) and G ≥ 2 PCa in 4% (15/361) of all avoided biopsies (Table 3). In this cohort the prevalence of G ≥ 2 PCa (29%) was lower than in the biopsy-naïve cohort. Therefore, in contrast to biopsy-naïve men, the decision curve analysis in men with prior negative biopsy already showed a clear net benefit of the MRI-ERSPC-RC4 at a risk threshold of ≥ 5% for G ≥ 2 PCa (Table 3).

Radtke and coworkers [21] also developed an MRI-ERSPC-RC4 for men with prior negative biopsy, next to an MRI-ERSPC-RC3. In men with previous biopsy (n = 355, estimated prevalence of 40% G ≥ 2 PCa), the discrimination of the MRI-ERSPC-RC4 (0.81) was superior to that of ERSPC-RC4 (baseline model) (0.66, p < 0.001), refitted ERSPC-RC4 (0.76, p < 0.001), and PI-RADSv1.0 (0.78, p < 0.001) (Table 2). In decision curve analysis, the MRI-ERSPC-RC4 showed a benefit for G ≥ 2 PCa threshold probabilities larger than 10% (Table 3).

Bjurlin and colleagues [24] investigated an MRI risk prediction model with a bias-corrected AUC for G ≥ 2 PCa of 0.86 for the developing cohort (n = 119, with a prevalence of 21.8% G ≥ 2 PCa) and 0.87 for the validating cohort (n = 52, with a prevalence of 9.6% G ≥ 2 PCa) (Table 2). These AUCs were higher in comparison to the baseline model (PSA-density) (0.76 and 0.76) and the MRI suspicion score model (0.83 and 0.84).

Truong et al. [26] developed an MRI risk prediction model, based on age, PSA, and prostate volume, incorporating mpMRI (PI-RADS 1–5). The AUC for predicting benign prostate pathology for the developing cohort (n = 285, with a prevalence of 46.3% benign prostate) was 0.83 (Table 2). With a cutoff probability of ≥ 0.70 used to recommend deferment of MRI-targeted biopsy, an unnecessary biopsy would have been avoided in 21.4% (61/285) of men, and 6.2% (4/61) with a G ≥ 2 PCa would have been missed. The prevalence of G ≥ 2 PCa in this cohort was 39% (111/285). Overall, only 3.6% (4/111) of all G ≥ 2 PCa would not have been detected at this threshold. Another group [30] validated this prediction tool, and found an AUC of 0.78 for predicting benign prostate pathology. An updated model was constructed with improved calibration and similar discrimination (AUC 0.79).

Huang et al. [27] developed an MRI risk prediction model, based on age, PSA, prostate volume, and DRE, incorporating mpMRI (PI-RADS 1–5). The AUC for G ≥ 2 PCa for the developing cohort (n = 231, with a prevalence of 25.5%) was 0.93 [0.89–0.96] (p < 0.001) (Table 2).

In combined biopsy-naïve and prior negative biopsy setting

Lee et al. [28] developed an MRI risk prediction model, based on age, PSA-density, and primary biopsy, incorporating biparametric MRI (PI-RADS 1–5). The AUC for csPCa (G ≥ 2 or MCCL ≥ 6 mm) for the developing cohort (n = 615, with a prevalence of 38.5%) was 0.92 [0.89–0.94] (Table 2). At a calculated probability cutoff of ≥ 10% csPCa to indicate a biopsy, 10.6% (65/615) of biopsies would have been avoided, missing G = 1 PCa in 16.9% (11/65) and G ≥ 2 PCa in 3.1% (2/65) of the avoided biopsies (Table 3).

Van Leeuwen and coworkers [29] developed a risk prediction model, based on age, PSA, DRE, prostate volume, and previous biopsy, incorporating mpMRI (MRI suspicion score category 1–5). csPCa was defined as Gleason 7 with > 5% grade 4, ≥ 20% cores positive or ≥ 7 mm maximum cancer core length (MCCL). The AUC for csPCa was 0.88 [0.85–0.92] for the developing cohort (n = 393, with a prevalence of 37.9%) and 0.86 [0.81–0.92] for the validating cohort (n = 198, with a prevalence of 44.7%) (Table 2). This was higher than that of the multivariate risk baseline model (0.80 [0.75–0.84]). With a cutoff probability of ≥ 0.10 used to indicate a biopsy, an unnecessary biopsy would have been avoided in 28.2% (282/1000) of men, missing G = 1 PCa in 26.2% (74/282) and G ≥ 2 PCa in 3.5% (10/282) of the avoided biopsies. Overall, 12.8% (74/578) of clinically insignificant PCa and 2.6% (10/379) of csPCa were not detected at this threshold.

Performance analysis of the MRI risk prediction models

Next to the comparison of AUCs between MRI-models, the performance can also be further explored using other performance parameters (Table 4). For such an analysis, performance parameters like true positive rate (TPR), false positive rate (FPR), net benefit (NB), percentage avoided biopsies (PAB), and percentage avoided clinically insignificant PCa could be investigated, as shown by Mehralivand and coworkers [22]. We were able to calculate these performance parameters for most studies, with risk thresholds ranging from 0 to 20% (Table 4).
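To make these performance parameters concrete, the sketch below derives TPR, FPR, and PAB from the four cells of a decision table at a given risk threshold. The function name and the example counts are hypothetical and chosen only for illustration; they are not values from Table 4. The percentage of avoided clinically insignificant PCa is left out, because it additionally requires the grade distribution among the men who are not biopsied.

```python
def performance(tp, fp, tn, fn):
    """Performance parameters of a biopsy decision rule.

    tp -- csPCa present, biopsy indicated (risk >= threshold)
    fp -- no csPCa, biopsy indicated
    tn -- no csPCa, biopsy avoided
    fn -- csPCa present, biopsy avoided (missed csPCa)
    """
    n = tp + fp + tn + fn
    return {
        "TPR": tp / (tp + fn),   # share of csPCa that is detected
        "FPR": fp / (fp + tn),   # share of men without csPCa still biopsied
        "PAB": (tn + fn) / n,    # share of all men in whom biopsy is avoided
    }


# Hypothetical counts per 1000 men, for illustration only.
result = performance(tp=380, fp=400, tn=200, fn=20)
for name, value in result.items():
    print(f"{name}: {100 * value:.1f}%")
```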

Table 4 Performance measures related to clinical usefulness of baseline model, MRI-model, and net differences between both models, in biopsy-naive, prior negative biopsy, and combined setting

By analyzing the individual performance parameters, physician and patient may be better informed when deciding whether or not to undergo further biopsy testing. A better trade-off can be made, based on the subjective weight of each individual performance parameter. If a patient focuses on maximizing diagnostic yield, the TPR is of vital importance in his decision. If a patient focuses on avoiding an (unnecessary) biopsy, the FPR and PAB become more prominent. The net benefit is calculated by the formula NB = (TP − w × FP)/N, where TP is the number of true positive decisions, FP the number of false positive decisions, N the total number of patients, and w a weight equal to the odds of the risk threshold or cutoff pt (expressed as a proportion), i.e., w = pt/(1 − pt) [31].
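The net benefit formula can be sketched in a few lines. In the example below, the function name `net_benefit` and all counts are hypothetical and serve only to show how the weight w penalizes false positive decisions; they are not results from the reviewed studies. The risk threshold pt is passed as a proportion (0.10 for 10%).

```python
def net_benefit(tp, fp, n, pt):
    """Net benefit NB = (TP - w * FP) / N, with w = pt / (1 - pt)."""
    if not 0 < pt < 1:
        raise ValueError("pt must be a proportion strictly between 0 and 1")
    w = pt / (1 - pt)  # odds of the risk threshold
    return (tp - w * fp) / n


# Hypothetical counts per 1000 men at a 10% risk threshold, for illustration only.
nb_model = net_benefit(tp=410, fp=440, n=1000, pt=0.10)       # biopsy if risk >= 10%
nb_biopsy_all = net_benefit(tp=420, fp=580, n=1000, pt=0.10)  # biopsy every man
print(f"model: {nb_model:.4f}, biopsy all: {nb_biopsy_all:.4f}")
```

In this made-up example the model misses 10 cancers but spares 140 false positive biopsies; at a 10% threshold each false positive is weighted as 1/9 of a true positive, so the model ends up with the higher net benefit.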

In most studies, the MRI risk prediction models were compared to baseline risk prediction models (without MRI). The input parameters in the used baseline models differed between studies, as previously discussed. The comparison between each MRI-model and its baseline model is in fact the calculated difference between the values of the individual performance parameters (Table 4). For example, using the MRI-model instead of the baseline model in biopsy-naïve men at a risk threshold of ≥ 10%, Alberts and coworkers showed a net difference of 13.7% in avoided biopsies, 11.2% in avoided detection of clinically insignificant PCa, and a 21.7% reduction in false positive tests, at the expense of 2.8% undetected csPCa.

Using the MRI-risk model instead of the baseline risk model in biopsy-naïve men, we observe that three studies show a notable beneficial net difference at risk thresholds above 10%, at least for the FPR, NB, avoided biopsies and avoided detection of clinically insignificant PCa [20,21,22]. These results come at a low price in terms of missed csPCa, even in the investigated cohorts with a remarkably high prevalence of clinically significant PCa (42.3–48.3%) [20,21,22] (Table 2).

These results would have been even better in more real-life opportunistic screening cohorts, instead of the highly selected cohorts in these tertiary hospitals. Following upfront risk prediction in biopsy-naïve men, the prevalence of csPCa might be in the range of approximately 20–30% [13, 32]. Before such a test is used, recalibration of these multivariate risk prediction tools to the in-hospital diagnosed population is mandatory.
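One simple way to think about recalibration to a population with a different prevalence is to shift the predicted odds by the ratio of the new to the original prevalence odds (an intercept-only update). The sketch below illustrates this under the strong assumption that only the prevalence differs between populations; it is not the method used in the reviewed studies, and a proper recalibration would re-estimate the intercept (and possibly the slope) on local data. The function name and the example numbers are hypothetical.

```python
def recalibrate_for_prevalence(p, old_prev, new_prev):
    """Shift a predicted risk p from a cohort with prevalence old_prev
    to a population with prevalence new_prev (intercept-only update).

    Assumes the predictors behave the same in men with and without csPCa
    in both populations; only the baseline risk differs.
    """
    for value in (p, old_prev, new_prev):
        if not 0 < value < 1:
            raise ValueError("all inputs must be strictly between 0 and 1")
    odds = p / (1 - p)
    shift = (new_prev / (1 - new_prev)) / (old_prev / (1 - old_prev))
    new_odds = odds * shift
    return new_odds / (1 + new_odds)


# Hypothetical example: a 40% predicted risk from a cohort with 42% csPCa
# prevalence, applied to a population with 25% prevalence.
adjusted = recalibrate_for_prevalence(0.40, old_prev=0.42, new_prev=0.25)
print(f"adjusted risk: {adjusted:.3f}")
```

Under these assumptions, a model developed in a high-prevalence tertiary cohort overestimates individual risk in a lower-prevalence population until it is recalibrated.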

Considerations and future perspectives

Individualized risk assessment of csPCa using a predictive model that incorporates the mpMRI suspicion score in combination with clinical and biochemical data allows a considerable reduction in unnecessary biopsies and a reduction of the risk of over-detection of clinically insignificant PCa, albeit at the expense of a low number of undiagnosed csPCa. Whether or not this has a detrimental effect on future metastatic and mortality rates remains to be seen. The reported results highlight the accuracy and discriminative power of the MRI risk prediction models, and suggest that the use of these tools would allow the identification of patients with significant disease. However, at present external validation and results of calibration steps are lacking.

We need to acknowledge that previously developed and already validated risk calculators, without mpMRI as an input variable, perform sufficiently, but can be further improved by the input of other “biomarkers” such as mpMRI. Beyond these satisfactory predictive results and potential improvements in outcomes, MRI risk prediction models may also avoid some limitations of the existing models. For example, prostate volume is an important input variable as it is related to PSA value. Prostate volume is commonly determined on TRUS or in certain circumstances estimated by DRE. The use of a TRUS-based input variable, including the variable of abnormal or suspicious TRUS, is impractical since men who undergo TRUS are also likely to undergo TRUS-guided biopsy. With mpMRI, prostate volume can also be accurately assessed, avoiding the need for performing a TRUS [33, 34]. It provides information on probably the strongest predictor of biopsy outcome currently available, i.e., PSA-density. Therefore, the results of the MRI-based models can be used to counsel men in biopsy decision-making, even before planning a TRUS. Still, the indication for prostate MRI should be critically evaluated, as even unnecessary mpMRIs can be avoided by multivariate risk prediction [35, 36]. In such a scenario, the mpMRI could at first be circumvented by using DRE to roughly estimate prostate volume. Subsequently, mpMRI could be reserved for men who most likely have an elevated PSA not related to BPH [37,38,39,40,41].

Despite the good performance of these constructed and published MRI risk prediction models, in combination with the potential advantages, we need to address some limitations to their generalizability and practicality in daily clinical practice.

The accuracy of the systematic biopsy depends on the number of cores. Most studies reported a transrectal approach with a median of 12 biopsy cores (Table 1) [20, 22,23,24,25,26,27, 30]. However, three studies used a transperineal approach with the number of biopsy cores ranging from 24 to 40 [21, 28, 29]. This is important to realize when such a model is applied in a setting where this biopsy technique is not part of routine daily practice. Furthermore, the results of biopsy schemes involving saturation biopsies appear to have a higher concordance rate with results from prostatectomy than a scheme involving 12 cores, indicating that tools based on such schemes predict the risk of csPCa more accurately [42]. However, according to a literature review, the increase in diagnostic yield becomes marginal as the number of cores increases above 12 [42, 43]. Furthermore, some existing risk calculators are derived from a six- or eight-core approach, in which the reference test may be less reliable [44, 45]. Moreover, different targeting methods (MRI/US fusion or cognitive biopsy) were used. Finally, when systematic biopsies are performed with knowledge of the mpMRI findings, the results from this input variable might overestimate the diagnostic performance of the multivariable model.

Next to the prevalence of csPCa, the characteristics of the population tested may influence the outcome of the models. Therefore, again, the predicted risk should match the risk in the investigated population. For example, a higher PSA value combined with a higher rate of positive DRE or abnormal TRUS findings may select a population at high risk for csPCa, and may reduce the additional value of mpMRI. Moreover, when experienced radiologists have interpreted the mpMRIs in the cohorts used for the predictive models, the results of these models may not be applicable to settings with less experienced radiologists.

The constructed MRI risk prediction models were based on retrospective data with a probable risk of selection bias. Furthermore, these models were constructed with relatively small sample sizes, mostly from a single institution, which may hamper extrapolation to and interpretation in larger cohorts.

Internal validation was performed on an even smaller (single-center) sample size and AUCs dropped in some examples [24, 25, 29]. Prediction tools require external validation in a multicenter study to assess their wider applicability. Only one study externally validated a constructed MRI risk model [30] with a small drop in AUC as compared to the development cohort [26].

Novel biomarkers for PCa with prognostic value will be developed. Ancillary tests such as the prostate cancer antigen 3 (PCA3) test, prostate health index, 4Kscore test, and ConfirmMDx may also be considered to provide further reassurance about omitting prostate biopsy. As a next step, these markers will be integrated in the multivariate risk prediction tools. One study included PCA3 in its analysis; however, in the multivariate analysis PCA3 did not sustain significance as an independent predictor [24].

Above all, we should take care when investigating these MRI-based risk prediction models and make sure that in our surge of enthusiasm we do not throw the baby out with the bathwater. We already have validated risk prediction models with satisfactory performance [13,14,15,16,17,18,19]; however, the use of these models has not been fully adopted in daily practice, despite recommendation by international guidelines [3, 4]. Further improvement in selecting only those men who will benefit from mpMRI will be essential in the near future. The additional value of mpMRI in risk prediction on top of current models should be carefully analyzed. These new MRI risk assessment tools require external validation in different patient populations with varying baseline risks, to ensure that the risk prediction tool performs satisfactorily prior to implementation in clinical practice.

Conclusions

Due to the predictive value of mpMRI in prostate cancer diagnosis, new multivariate risk prediction tools have recently been constructed, with the inclusion of the mpMRI suspicion score. All MRI risk prediction models had a high accuracy, with areas under the receiver-operating characteristic curves ranging from 0.78 to 0.93, and suggest that the usage of these tools would allow the selective identification of patients with significant disease. By analyzing individual performance parameters, physician and patient may be better informed to decide whether or not to undergo further biopsy testing.