Several studies have shown that a higher Gleason score (GS) is an important prognostic factor for prostate cancer (PC) regardless of treatment.1,2,3 Tumor grading was reported using Grade Groups (GGs) first proposed by authors at Johns Hopkins Hospital,2 validated in a large multi-institutional study,4 and subsequently endorsed by the 2014 International Society of Urological Pathology (ISUP) Consensus Conference,5 whereby GG 1 = GS ≤ 6, GG 2 = GS 3 + 4 = 7, GG 3 = GS 4 + 3 = 7, GG 4 = GS 8, and GG 5 = GS 9–10. The GGs reflect a biological and clinical behavioral distinction within PCs with GS 7, differentiating between GS 3 + 4 (GG 2) and GS 4 + 3 (GG 3).6
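The grade-group definitions above amount to a simple lookup. A minimal Python sketch follows; the function name `grade_group` and the restriction to Gleason patterns 3 to 5 are assumptions made for illustration and are not part of the study.

```python
def grade_group(primary: int, secondary: int) -> int:
    """Map a Gleason score (primary + secondary pattern) to a Grade Group.

    Assumption for this sketch: only patterns 3, 4, and 5 are accepted.
    """
    if primary not in (3, 4, 5) or secondary not in (3, 4, 5):
        raise ValueError("this sketch only handles Gleason patterns 3 to 5")
    total = primary + secondary
    if total <= 6:
        return 1                          # GS <= 6
    if total == 7:
        return 2 if primary == 3 else 3   # 3 + 4 versus 4 + 3
    if total == 8:
        return 4                          # 4 + 4, 3 + 5, and 5 + 3
    return 5                              # GS 9-10


# The three GS 8 combinations studied here all fall into GG 4.
assert {grade_group(4, 4), grade_group(3, 5), grade_group(5, 3)} == {4}
```

The lookup makes the point of the study visible: the order of the patterns matters at GS 7 but is discarded at GS 8.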

Currently, GG 4 is equivalent to GS 8, consisting of GSs 4 + 4, 3 + 5, and 5 + 3. GG 4 is still considered a homogeneous entity with regard to its associated prognosis and treatment allocation. However, some reports have raised questions regarding its prognostic heterogeneity, suggesting the reclassification of GG 4 into separate GGs.7,8 Moreover, given that Gleason pattern 5 has a negative prognostic significance compared with pattern 4, there is concern that GG 4 is subject to heterogeneity with respect to oncological outcomes.9,10,11 An updated meta-analysis of different GS patterns of PC in GG 4 showed that GS 4 + 4 was associated with better overall survival (OS);12 however, this meta-analysis had significant heterogeneity in the population of interest, mainly because it did not restrict the interventions implemented. Moreover, it made no distinction between GG 4 on biopsy specimens and GG 4 on radical prostatectomy (RP) specimens.

We have shown prognostic differences in patients with PC within GG 4 treated with RP based on different GSs in RP specimens, suggesting that there is considerable heterogeneity within GG 4 in terms of oncological and surgical pathologic outcomes.13 However, there is a large discrepancy between the biopsy and RP GS, with the two specimens matching exactly in only approximately 40–60% of cases.14,15,16,17 Thus, it remains unclear whether our findings on RP specimens also hold true for biopsy specimens.

As treatment decisions are generally made based on prostatic biopsy specimens, adequate biopsy GS stratification remains of utmost importance. Therefore, the purpose of this study was to determine the prognostic homogeneity/heterogeneity, as assessed by pathologic and oncologic outcomes, between the three different GS groups within biopsy GG 4 in patients treated with RP. Such analyses may enable more accurate risk stratification of patients with biopsy GG 4.

Material and Methods

Patient Selection

This study obtained approval from the Institutional Review Board at each participating institution, with all sites providing institutional data-sharing agreements prior to the initiation of the study. A total of 6724 patients were treated with RP for clinically nonmetastatic PC between 2005 and 2019 at the four participating institutions (Mayo Clinic, University Hospital Hamburg-Eppendorf, Weill Cornell Medical College, and University of Texas Southwestern). Patients with biopsy GG 4 (consisting of GSs 4 + 4, 3 + 5, and 5 + 3) were then included for analysis; as such, a total of 1791 patients were assessed. No patients received neoadjuvant hormone therapy. The multicenter retrospective nature of the study meant that preoperative staging was not standardized. In general, preoperative imaging (conventional bone scans, computed tomography [CT] scan of the chest, abdomen, and pelvis) was performed based on the patients’ clinicopathological features (i.e. prostate-specific antigen [PSA] and GS at biopsy), current guidelines, and physician discretion. Patients were considered to have non-metastatic disease if preoperative imaging showed no cancer spread from the primary site to other sites.

Data Collection and Pathologic Evaluation

Demographic, surgical, pathologic, and outcome data were collected. Data on age, biopsy GS, clinical stage, baseline PSA, RP GS, pathologic stage, and positive surgical margins (PSMs) were confirmed for all patients. Specimens were analyzed by dedicated genitourinary pathologists at each center. Pathologic stage was assigned using the 2009 American Joint Committee on Cancer tumor-node-metastasis staging system.

Management and Follow-Up

All patients were treated with RP with or without pelvic lymph node dissection according to guideline recommendations at the time of the study and at the surgeons’ discretion. While the multicenter retrospective nature of the study meant that the area of lymph node dissection was not standardized, as a rule, extended lymph node dissection was performed in the current cohort, which included only high-risk PC. Patients were followed up in accordance with institutional protocols and local guidelines at the time. In general, patients underwent physical examinations and PSA testing every 3 months in the first year after surgery, semi-annually from the second to fifth years, and annually thereafter. Biochemical recurrence (BCR) was defined as two consecutive increases in PSA over 0.2 ng/mL.18 The date of the first increase was considered the date of BCR. The cause of death was determined by the treating physician, based on chart reviews corroborated by death certificates, or by death certificates alone. Follow-up time was calculated from the date of RP.
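The BCR definition above can be expressed as a short routine. The sketch below is illustrative only: it reads "two consecutive increases in PSA over 0.2 ng/mL" as two consecutive measurements above 0.2 ng/mL, which is an assumption about the wording, and the function name `bcr_date` and the data layout are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

BCR_THRESHOLD_NG_ML = 0.2  # threshold stated in the Methods


def bcr_date(psa_series: Sequence[Tuple[date, float]]) -> Optional[date]:
    """Return the date of BCR, or None if the criterion is never met.

    Assumed reading: BCR occurs at the first of two consecutive
    postoperative PSA values above the threshold.
    """
    ordered = sorted(psa_series, key=lambda measurement: measurement[0])
    for (first_date, first_psa), (_, second_psa) in zip(ordered, ordered[1:]):
        if first_psa > BCR_THRESHOLD_NG_ML and second_psa > BCR_THRESHOLD_NG_ML:
            return first_date  # date of the first elevated value
    return None


# Hypothetical follow-up: a single elevated value does not count as BCR.
example = [
    (date(2010, 1, 1), 0.05),
    (date(2010, 4, 1), 0.25),
    (date(2010, 7, 1), 0.10),
    (date(2010, 10, 1), 0.30),
    (date(2011, 1, 1), 0.40),
]
print(bcr_date(example))
```

Under this reading, an isolated value above the threshold that is followed by a lower value is not counted as recurrence.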

Statistical Analysis

Associations of GS with categorical variables were assessed using the Chi-square test or Fisher’s exact test, and differences in continuous variables were analyzed using the Kruskal–Wallis test. BCR-free survival (BCRFS), cancer-specific survival (CSS), and OS were analyzed using the Kaplan–Meier method and the log-rank test. Extraprostatic extension (EPE) was defined as ≥ pT3a, while non-organ-confined (NOC) disease was defined as ≥ pT3a and/or lymph node-positive disease. Logistic regression analysis was performed to assess the association of GS and other predictive factors with GS upgrading, PSM, lymph node metastasis, EPE, and NOC disease. Univariable and multivariable Cox regression models were used to evaluate the association of various prognostic factors with BCR, death from PC, and all-cause mortality. The discrimination of the model was evaluated using Harrell’s concordance index. All p-values were two-sided and statistical significance was defined as p < 0.05. Statistical analyses were performed using R (The R Foundation for Statistical Computing, Vienna, Austria) and Stata/MP 14.3 statistical software (StataCorp LLC, College Station, TX, USA).
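For readers less familiar with the Kaplan–Meier method named above, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the code used in this study (the analyses were run in R and Stata); the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, estimated survival) at each distinct event time.

    times  : follow-up in months from RP
    events : True if the event (e.g. BCR) occurred, False if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Hypothetical toy cohort of five patients.
for month, s in kaplan_meier([6, 12, 12, 24, 30],
                             [True, False, True, True, False]):
    print(f"month {month}: estimated event-free survival {s:.2f}")
```

Patients censored at a given time are treated as still at risk at that time, which is the usual convention; the log-rank comparison between groups is not shown here.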

Results

Patient Demographics and their Association with the Gleason Score (GS)

A total of 1791 patients (GS 3 + 5, 190; GS 4 + 4, 1557; and GS 5 + 3, 44) were included in the analysis. Table 1 and electronic supplementary Table 1 summarize the clinicopathological characteristics of the study cohort. Lymphadenectomy was performed in 1773 patients (99.0%). There was a significant difference in RP GS and pathological node stage between the groups (p < 0.001 and p = 0.02, respectively). Biopsy GS 5 + 3 was associated with higher rates of GS upgrading and lower rates of GS downgrading in RP specimens than GS 4 + 4 and GS 3 + 5 (p = 0.0009) [Table 2]. Electronic supplementary Table 2 summarizes the clinicopathologic characteristics of the patients at each institution. Patient characteristics were heterogeneous across the participating institutions. The heterogeneity was particularly large in the proportion of patients receiving adjuvant treatments that could have affected survival (androgen deprivation therapy [ADT] 0–14.6%; radiation therapy [RT] 0–9.1%) and in the proportions of patients with GS 3 + 5 (6.4–17.6%), GS 4 + 4 (78.4–92.4%), and GS 5 + 3 (1.2–4.1%) within GG 4.
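The subgroup counts reported above can be converted into cohort shares with simple arithmetic. The short Python sketch below is only a consistency check on the published numbers; the variable names are hypothetical.

```python
# Counts per biopsy Gleason score as reported in the Results.
counts = {"3 + 5": 190, "4 + 4": 1557, "5 + 3": 44}

total = sum(counts.values())  # 1791 patients
shares = {gs: round(100 * n / total, 1) for gs, n in counts.items()}

# Share of patients who underwent lymphadenectomy (1773 of 1791).
lymphadenectomy_share = round(100 * 1773 / total, 1)

print(total, shares, lymphadenectomy_share)
# 1791 {'3 + 5': 10.6, '4 + 4': 86.9, '5 + 3': 2.5} 99.0
```

GS 4 + 4 therefore accounts for roughly 87% of the cohort, and GS 5 + 3 for only 44 patients, which limits the precision of any estimate for that subgroup.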

Table 1 Patient demographics
Table 2 Concordance between biopsy and RP specimens

Association between the GS and High-Risk Surgical Pathological Features

On multivariable analyses adjusting for PSA and clinical T stage, biopsy GS within GG 4 was significantly associated with GS upgrading in RP specimens (p = 0.004), but not with the risks of PSM, lymph node metastasis, EPE, and NOC disease (Table 3). Specifically, compared with GS 3 + 5, GS 5 + 3 was significantly associated with higher rates of GS upgrading in RP specimens (odds ratio [OR] 3.24, 95% confidence interval [CI] 1.54–6.83; p = 0.002). Similarly, biopsy GS 5 + 3 was significantly associated with higher rates of GS upgrading in RP specimens than GS 4 + 4 (OR 3.17, 95% CI 1.65–6.08; p = 0.0005).
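The reported odds ratios, confidence intervals, and p-values can be related to one another through the usual Wald approximation on the log-odds scale. The sketch below assumes Wald-type intervals and tests, which the paper does not state explicitly; the function name `wald_from_ci` is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(odds_ratio: float, lower: float, upper: float):
    """Back-calculate the standard error, z statistic, and two-sided p-value.

    Assumes the interval is symmetric on the log scale (Wald interval).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# GS 5 + 3 versus GS 3 + 5, and GS 5 + 3 versus GS 4 + 4, as reported above.
for label, args in [("5+3 vs 3+5", (3.24, 1.54, 6.83)),
                    ("5+3 vs 4+4", (3.17, 1.65, 6.08))]:
    se, z, p = wald_from_ci(*args)
    print(f"{label}: SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under this assumption the back-calculated p-values are close to the reported values of 0.002 and 0.0005, which suggests the published estimates are internally consistent.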

Table 3 Logistic regression analysis (adjusting for PSA and clinical T stage)

Association between the GS, Recurrence, and Survival

At a median follow-up of 75 months, 750 patients experienced BCR, 146 died of any cause, and 57 died of PC. GS was significantly associated with BCRFS in the log-rank analysis (p = 0.01) [Fig. 1]. The BCRFS rates in biopsy GS 3 + 5, 4 + 4, and 5 + 3 were 61.2%, 49.0%, and 42.7%, respectively, at the 7-year follow-up, and 51.5%, 44.2%, and 36.6%, respectively, at the 10-year follow-up. Table 4 shows the results of the univariable and multivariable Cox proportional hazard regression analyses in the overall cohort. In the univariable analysis, GS was significantly associated with BCRFS (p = 0.007). In the multivariable analysis that adjusted for clinicopathologic features, GS remained an independent prognostic factor for BCRFS (p = 0.03). In contrast, GS was not associated with OS or CSS. Compared with GS 3 + 5, GS 4 + 4 was significantly associated with worse BCRFS (hazard ratio 1.43, 95% CI 1.12–1.86; p = 0.005). Adding the GS did not improve the accuracy of the predictive models for BCRFS, OS, or CSS (data not shown).

Fig. 1 Kaplan–Meier estimates of oncologic outcomes stratified by different Gleason scores in 1791 prostate cancer patients with grade group 4 treated with radical prostatectomy. a Biochemical recurrence-free survival; b overall survival; c cancer-specific survival. GS Gleason score, BCRFS biochemical recurrence-free survival

Table 4 Cox regression analysis (adjusting for PSA, pathological T stage, and PSM)

Discussion

This study was conducted to investigate the prognostic differences between GS 3 + 5, GS 4 + 4, and GS 5 + 3 in biopsy specimens from patients with PC classified into GG 4 based on the association with oncologic and surgical pathologic outcomes. The results indicate that GS 5 + 3 was associated with significantly higher rates of GS upgrading in RP specimens than GS 3 + 5 and GS 4 + 4. In contrast, GS was not associated with lymph node metastasis, PSM, EPE, or NOC disease. Moreover, GS was not associated with OS or CSS, but was significantly associated with BCRFS.

Initial validation studies of grading for PC combined GS 8 into one prognostic group;5 however, the results from our study do not provide clear support for subdividing patients with GS 8 into three prognostic groups. Current evidence suggests that, as the strongest pathologic predictor of recurrence, metastasis, and PC-specific death, Gleason pattern 5 may have important biological and clinical implications and accounts for varying oncological outcomes in patients who fall within the GG 4 category.9,10,19 In addition, GS 3 + 4 and GS 4 + 3 patterns differ significantly in prognosis depending on the percentage of Gleason pattern 4 cancer present (greater or less than 50%), suggesting that the percentage of Gleason pattern 5 cancer may result in differences in prognosis between the GS 3 + 5 and GS 5 + 3 patterns, in agreement with previous studies demonstrating that the percentage of high-grade patterns has prognostic value in predicting oncological outcomes in PC patients undergoing RP.20,21,22 Therefore, the proposal to classify patients with GS 8 into a single category (GG 4) may not have strong theoretical support. However, our biopsy specimen-based study detected some limited differences within the GG 4 category in terms of BCR and GS upgrading in RP specimens.

Our findings are relevant considering the paucity of studies assessing the prognostic differences within GG 4 in patients with PC treated with RP.7,13,23,24,25 Indeed, a review of the literature shows that the evaluation of GG 4 involved biopsy specimens alone in one study, prostatectomy specimens alone in two studies, and biopsy/prostatectomy specimens in two studies (electronic supplementary Table 3); however, of these, the study involving biopsy specimens alone evaluated GS downgrading as the only outcome measure and provided no survival analysis.24 Therefore, this is the first analysis to assess differences in prognosis in terms of mortality, BCR, and surgical pathological outcomes among PC patients within the GG 4 category (GS 3 + 5 vs. GS 4 + 4 vs. GS 5 + 3) treated with RP based on biopsy specimens. In this regard, the limited heterogeneity shown in this study within GG 4 in terms of oncological and surgical pathological outcomes has clinically relevant implications for patients who fall within the GG 4 category.

Again, while the RP specimen-based studies reported not only oncological but also surgical pathologic outcomes, biopsy GS remains the mainstay of diagnosis as a basis for treatment decision making. In addition, there is a large discrepancy between biopsy and RP GS, with the concordance rate between these GSs reported to be no more than 40–60%.14,15,16,17 Moreover, GS is assigned differently in an RP specimen than in a biopsy specimen because of the much larger area of tissue sampled and the different pathological criteria used for grade assignment in biopsy and RP specimens. For example, grading differs for patients whose secondary Gleason pattern is of a higher grade but accounts for < 5% of the tumor, or whose tertiary Gleason pattern is 5 but accounts for < 5% of the tumor. Indeed, the current study showed different results from those of our previous RP specimen-based study,13 suggesting that biopsy specimens are not sufficiently accurate to yield results similar to those from RP specimens, which may explain the minimal heterogeneity observed between GS patterns within biopsy GG 4 (GS 3 + 5 vs. GS 4 + 4 vs. GS 5 + 3) despite their significant differences in GS upgrading, downgrading, and BCR.

While this study provides a number of findings of interest, it has some limitations. First, the pathological specimens were not centrally evaluated, and GS assignment and reporting relied on the individual pathologists at each center. Furthermore, this retrospective study did not evaluate the percentage of each Gleason grade in biopsy specimens, which may have affected the survival outcomes. Moreover, the GS patterns were distributed differently in our study than previously reported. In an earlier large multi-institutional study involving genitourinary pathologists and conducted from 2005 to 2014, of the 16,172 patients undergoing needle biopsies, only 44 (0.3%) and 6 (0.04%) were shown to have GS 3 + 5 = 8 and GS 5 + 3 = 8, respectively (unpublished data).26 In contrast, in our study, a majority (86.9%) of the patients had GS 4 + 4, while 10.6% and 2.5% had GS 3 + 5 and GS 5 + 3, respectively. Moreover, we found large inter-institutional heterogeneity in the proportion of patients with GS 3 + 5 (range 6.4–17.6%), GS 4 + 4 (range 78.4–92.4%), and GS 5 + 3 (range 1.2–4.1%). Thus, the proportions of patients with GS 3 + 5 and GS 5 + 3 varied from one institution to the next but were high at all institutions. This raises concern as to whether our findings are readily generalizable. The absence of central review by expert pathologists may thus be the largest limitation of this study, given that earlier studies lacking central review were also associated with high proportions of patients with GS 3 + 5 and GS 5 + 3, as in our study (electronic supplementary Table 3), and that a high percentage of GS 3 + 5 and GS 5 + 3 cases have been recategorized upon expert review.27 These factors could have led to the misinterpretation of the pathological reports, thus unpredictably affecting the oncologic outcomes. Second, the preoperative staging, operation method, and follow-up protocols could not be standardized. Moreover, due to its multicenter nature, our study may have suffered from heterogeneity in the selection of patients and administration of adjuvant and salvage treatments. Indeed, this heterogeneity was particularly large in the proportion of patients undergoing adjuvant treatments (ADT 0–14.6%; RT 0–9.1%), which could have affected survival outcomes. Third, given its multi-institutional nature, many institutional characteristics, which were likely captured only insufficiently by our regression models, may have affected the study outcomes (e.g. inherent differences in follow-up protocols, preoperative staging, operation method, monitoring of oncologic events, and pathologic specimen processing). Thus, overrepresentation of one of these institutions within one of the three subclassifications of GG 4 could have skewed the results. Fourth, some relevant data (e.g. MRI image data or use of targeted fusion biopsies) were unavailable for analysis. Moreover, the lack of patient data on factors deemed important to pathological evaluation, such as cribriform or intraductal features, was also a major limitation of the study. Fifth, the inclusion of subjects with high PSA levels in the study cohort may have led to selection or information bias. Furthermore, the reliance on conventional imaging modalities for preoperative staging is a further limitation. Finally, given the median follow-up duration of 75 months and the low number of deaths, this study may not have evaluated mortality adequately.

Conclusion

We found that patients with biopsy GG 4 exhibited only limited heterogeneity, although significant differences were seen in GS upgrading, downgrading, and BCR. Therefore, the biopsy specimen-based GG 4 classification may be deemed valid. However, caution should be exercised in interpreting the conclusions drawn from this study, given the lack of central pathological specimen evaluation. Thus, well-designed prospective studies with prolonged follow-up are warranted to validate the differential prognostic and biological values within GG 4 in the clinical setting, as well as to investigate whether such validation may lead to better clinical decision making for patients with PC.