Introduction

Of all patients diagnosed with cancer, 2% present with metastatic carcinoma of unknown primary site (CUP)1. CUP is defined as any metastatic epithelial tumour for which an extensive clinical history, physical examination, radiological studies and histopathological investigations fail to identify the primary site2. According to the European Society for Medical Oncology (ESMO) guidelines for the treatment of patients with favourable-risk CUP, the only standard treatments proposed are various chemotherapy regimens, given alone or in combination with radiotherapy or hormonal therapy3. Because of the heterogeneity of CUP tumours, clinical trials are challenging to perform, and the prognosis remains poor, with a median survival of less than 12 months and a 5-year survival of 14%4. Thus, there is an urgent need to improve treatment modalities and prolong the survival of patients with CUP5.

Personalized cancer medicine using genomics technologies has opened new ways to treat various types of cancer through the identification of targetable mutations6,7,8,9,10. Recent studies have highlighted the crucial role of precision medicine in patient stratification and the selection of effective treatment in malignant types of cancer11,12,13,14,15,16,17. Moreover, several studies have reported improved overall survival in patients with advanced and metastatic cancers who received genetically matched targeted therapies18,19. In CUP tumours, this approach may improve treatment by targeting tumour-specific and druggable somatic variants in a personalized manner4. The AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) has recently collected genomic information, including mutations and copy number variations, for a wide range of solid tumours, including CUP, from both primary and metastatic tumours20,21,22. Using these public data, we analyzed the genomic mutations and copy number alterations of 1709 CUP samples to provide insight into the genetic makeup of these tumours and to determine potentially druggable targets.

Results

Clinical characteristics of samples

In total, 45,048 samples across 17 cancer types, including CUP, were included in this study. The sample type distribution in the GENIE cohort was 24,567 primary and 15,484 metastatic tumours. The hotspot-region mutations and copy number variations of these samples were available from GENIE and cBioPortal. According to the information provided by GENIE, we divided the samples into more than 17 broader cancer types, including CUP samples (Fig. 1A). The cancer types containing the most samples were non-small cell lung cancer (9090 (15.3%)), breast invasive ductal carcinoma (8712 (14.7%)), colorectal cancer (5961 (10.0%)), glioma (3214 (5.4%)), melanoma (2492 (4.2%)) and prostate cancer (2214 (3.7%)). The number of CUP samples registered in this cohort was 1709 (2.9%), comprising 1222 metastatic (71.5%), 288 primary (16.9%), 182 (10.6%) not applicable or heme and 17 (1.0%) unspecified samples (Fig. 1B) according to the ICD-O, ICD-O-3 and MSKCC OncoTree ontology classifications. In addition, the CUP samples comprised various Not Otherwise Specified (NOS) cancer types, including but not limited to adenocarcinoma (503 (29.4%)), poorly differentiated carcinoma (156 (9.1%)), neuroendocrine carcinoma (146 (8.5%)) and squamous cell carcinoma (124 (7.3%)) (Fig. 1C). Among CUP patients, 864 (50.56%) were female and 845 (49.44%) were male (Fig. 1D).

Figure 1

Overview of the GENIE database. Distribution of tumour types among cases successfully sequenced and analyzed in this cohort.

Significantly mutated genes (SMG) in CUP samples

We analyzed genomic mutations of hotspot regions at the gene level in CUP samples in GENIE according to a previously developed method9,10. In total, 52 SMGs were identified (Fig. 2A; Supplementary Table 1). Among the SMGs, the mutation rates of TP53, KRAS, ARID1A, SMARCA4 and KMT2D were significantly higher than those of the other identified SMGs (Fig. 2B, Supplementary Table 1). Pathway enrichment analysis showed that the identified SMGs are involved in a wide range of cellular processes (Fig. 2C, Supplementary Table 2), including transcription factors/regulators, receptor tyrosine kinase signalling, cell cycle, IGF pathway-protein kinase B signalling, phosphatidylinositol-3-OH kinase (PI(3)K) signalling, Wnt/β-catenin signalling, PDGF, FGF, EGF, TGF-β and Notch signalling pathways, and the integrin signalling pathway. The identification of MAPK, PI(3)K and Wnt/β-catenin signalling pathways is consistent with classical cancer studies. Notably, almost all samples had at least one non-synonymous mutation in one of the SMGs. The number of point mutations varied across these genes, from the highest (512 mutations for TP53 across 727 cases) to the lowest (15 mutations for GLI3 across 15 cases) (Fig. 2B, Supplementary Table 1). This suggests that the numbers of both cancer-related genes (52 identified in this study) and cooperating driver mutations required during oncogenesis are few, although large-scale structural rearrangements were not included in this analysis. Interestingly, in line with the previous study by Zehir et al.9, which highlighted TERT promoter mutations across a few primary tumours, we observed similar TERT promoter mutations among CUP samples (n = 91) (Fig. 2D). Although the clinical relevance of mutations in the TERT promoter remains incompletely understood, our results reaffirm the high prevalence of these alterations in patients with advanced solid tumours and suggest an association with disease progression and poor outcome. Additionally, the presence of similar TERT promoter mutations in CUP and NSCLC samples suggests that these mutations may serve as a diagnostic marker for identifying the primary tumour in CUP patients.

Figure 2

The most significantly mutated genes in CUP samples. (A) Mutation frequency of SMGs. Genes with a cohort-level alteration frequency of > 5% or a tumour type–specific alteration frequency of > 30% are displayed. (B) Genomic alterations of 52 SMGs within CUP samples. (C) Pathway enrichment analysis of SMGs from MSigDB. (D) Genomic alterations identified in the TERT promoter among CUP samples. Mutation frequencies were analyzed using the MuSiC suite (version 0.4) and Mutation Mapper in cBioPortal (https://www.cbioportal.org/mutation_mapper).

Mutual exclusivity and co-occurrence among SMGs

Pair-wise exclusivity and co-occurrence analysis of the 52 SMGs identified 1035 significant pairs among CUP samples, of which 198 were mutually exclusive (P value < 0.001) and 837 were co-occurring (P value < 0.001) (Fig. 3 and Supplementary Table 3). Pairs with significant exclusivity included KRAS and FAT1, KRAS and NOTCH3, KRAS and NF1, KRAS and DMD, and CDKN2A and RB1. Additionally, the cohort analysis identified pairs with significant co-occurrence, including KRAS and APC, TP53 and APC, KRAS and CDKN2A, KRAS and STK11, KRAS and KEAP1, and SMARCA4 and KEAP1, highlighting the importance of these genes in CUP tumours.

Figure 3

Mutual exclusivity and co-occurrence between identified SMGs in CUP. The data were retrieved from cBioPortal (https://genie.cbioportal.org) and analyzed using the mutual exclusivity module of Gitools (version 2.3.1).

Copy number alteration among CUP samples

Analysis of copy number variation within CUP samples identified 624 frequently amplified/deleted regions. Significant amplification of MYC, FGF4 and FGF19 was observed in a small fraction of patients (Fig. 4A), while deletions of the cell cycle-related genes CDKN2B and CDKN2A were detected in only 10% and 20% of patients, respectively (Fig. 4A). We further analyzed copy number alterations of the CUP-SMGs within CUP samples (Fig. 4B) and across primary tumours of 14 cancer types registered in GENIE (Fig. 4C, Supplementary Table 4). Among CUP samples, deep deletions of TP53, RB1, CDKN2A and STK11 and amplifications of KRAS and PIK3CA were observed. In the pan-cancer analysis, the most frequent amplifications were KRAS and PIK3CA in breast cancer (66 and 114 cases) and non-small cell lung cancer (46 and 48 cases), TERT in non-small cell lung cancer (114 cases) and ATR in breast cancer (36 cases), while deletions of CDKN2A in glioma (676 cases) and of RB1 and TP53 in small cell lung cancer (15 cases) were observed across these 14 cancer types (Fig. 4C). Among the genes with significantly altered copy numbers between CUP and primary tumours, significant amplification of the TERT promoter was observed in both CUP and non-small cell lung cancer samples compared with glioma and breast primary tumours, suggesting that copy number variation of TERT may play a diagnostic role in identifying the origin of CUP tumours (Fig. 4D).

Figure 4

Copy number variation among CUP and other cancer types. (A) Major copy number alterations detected in CUP samples. (B) Amplification and deletion status of identified CUP-SMGs within CUP samples and (C) other known primary tumours registered in the GENIE database. (D) Copy number variation analysis of TERT between CUP and all primary tumours (upper panel) and NSCLC, GBM and BRCA (bottom panel). Data were analyzed using cBioPortal (http://www.cbioportal.org), version v3.2.11.

Mutation frequency of CUP-SMGs across 17 known primary tumours

To identify similar and targetable mutation patterns in CUP, we compared the genomic alteration frequency of the identified CUP-SMGs in primary tumours across 17 cancer types in GENIE (Fig. 5A). The majority of CUP-SMG mutations were enriched in non-small cell lung cancer (4221 cases), colon cancer (4011 cases) and breast cancer (3376 cases) (Fig. 5A).

Figure 5

Mutation frequency of CUP-SMGs across 17 cancer types. (A, B) Distribution of genomic alterations in 52 CUP-SMGs across primary tumours of 17 different cancer types in the GENIE cohort. The oncoplot was generated using cBioPortal (https://genie.cbioportal.org). (C) Distribution of hotspot mutations identified in KRAS among six different cancer types, including CUP. (D) Results of the gene-drug association analysis using the PanDrugs platform. The best candidate drugs, with the highest GScore and DScore, are labelled.

The most frequently mutated gene in this cohort was TP53 (44% of total samples) (Fig. 5B). Its mutations predominate in non-small cell lung cancer (46.36%; 2517 cases), colon cancer (65.55%; 2365 cases) and breast cancer (36.26%; 2060 cases) (Fig. 5B). KRAS is the second most commonly mutated gene, occurring frequently (> 10%) in most cancer types (pancreatic cancer: 74.6%, colon cancer: 44.24%, non-small cell lung cancer: 30.93%) except hepatobiliary carcinoma, cervical cancer, bladder cancer, thyroid cancer, melanoma, small-cell lung cancer, head and neck carcinoma, prostate cancer and breast cancer (Fig. 5B). PIK3CA mutations were frequent in breast cancer (36.7%) and cervical cancer (25.14%), being specifically enriched in luminal subtype tumours. Many cancer types carried mutations in chromatin remodelling genes. In particular, the histone-lysine N-methyltransferase genes KMT2D, KMT2C and KMT2B were mutated in bladder, lung and endometrial cancers, whereas KMT2A was mostly mutated in non-small cell lung cancer and colon cancer. Mutations in ARID1A were frequent in non-small cell lung cancer, colon cancer, bladder cancer and breast cancer, whereas mutations in KEAP1 and STK11 predominated in non-small cell lung cancer (8.62% and 11.75%, respectively) (Fig. 5B). KRAS mutations are typically mutually exclusive, with recurrent activating mutations (KRAS (Gly 12) and KRAS (Gly 13)) common in colon cancer, non-small cell lung cancer and pancreatic cancer. We compared the most common hotspot mutations in KRAS between CUP and other KRAS mutation-enriched cancer types (Fig. 5C). The comparison showed enrichment of G12D and G12R in pancreatic cancer, and of G12C, G12F and G13C in non-small cell lung cancer and CUP samples. These data highlight the similarity of KRAS hotspot mutations between CUP and NSCLC.

Targetable mutations and drug candidates

To identify or predict possible therapeutics based on the genomic alterations identified in SMGs in CUP samples, we performed a gene-drug association analysis using the PanDrugs platform23. The gene-drug associations were classified into two groups: “Drug targets”, in which drugs can directly target genes that contribute to the disease phenotype, and “Biomarkers”, in which genes represent a drug-response-associated status while their protein products are not targetable23. Of the 262 identified interactions, 8.7% (23/262) were classified as direct drug targets, while 91.3% (239/262) were classified as biomarkers (Fig. 5D). Interestingly, we found six FDA-approved drugs, Crizotinib (GScore: 0.76, DScore: 0.95), Copanlisib (GScore: 0.76, DScore: 0.92), Dabrafenib, Sorafenib, Vemurafenib and Regorafenib, as the best candidates for targeting ALK/MET, PIK3CA and BRAF, respectively (Fig. 5D, Supplementary Table 5). Moreover, various off-label and clinically investigated compounds targeting mutated KRAS were identified, although the GScore and DScore of these compounds were not high (Supplementary Table 5). Everolimus (mTOR inhibitor) and Bortezomib (26S proteasome inhibitor) had the highest GScore and DScore among the drug candidates in this group (Supplementary Table 5). Taken together, these data highlight the presence of at least one druggable variant and the potential of genomic alteration-guided targeted therapy in CUP patients.

Discussion

Currently, combination chemotherapy regimens are considered the first-line therapy for CUP patients24. Personalized cancer therapy based on the identification of druggable mutations has encouraged mutational profiling of various types of tumours, including metastatic tumours such as CUP25,26,27. This study analyzed the most significantly mutated genes and identified the most prevalent variants in 1709 CUP samples. The gene-drug association analysis suggested that at least one of the identified variants is linked to known and approved targeted therapy agents, or to therapeutics currently in clinical trials, highlighting the potential of a genomic alteration-based treatment approach for patients with CUP. In line with this concept, numerous clinical studies have reported durable treatment responses to mutation-matched targeted therapies, including those directed against EGFR, BRAF, KIT and MET18,28,29,30.

Currently, the targeted therapy agents Crizotinib and Copanlisib are approved for the treatment of tumours that harbour mutations in ROS1/MET/ALK and PIK3CA, respectively, while therapeutic agents for the other identified variants, including the FGFR family, MYC, MET and KRAS, are under investigation in ongoing clinical trials. A large proportion of the mutations detected in this study are associated with various signal transduction pathways, apoptotic regulation, cell cycle progression and the regulation of receptor tyrosine kinase signalling. These results are promising because the majority of available targeted drugs act on one of these pathways, which are commonly altered in various types of cancer with known primary tumours31,32,33,34,35. The most frequently mutated gene identified in this study was TP53 (43%, 743/1709), with numerous non-synonymous coding region variants. Consistent with these data, previous studies demonstrated an association of TP53 mutations with metastatic progression in multiple cancer types, supporting the high TP53 mutation load reported in CUP36,37.

Other common variants detected in this cohort were observed in genes involved in activating and regulating key signal transduction pathways, including BRAF and KRAS. To our knowledge, this is the first study to report various codon 12 variants of KRAS in CUP samples. The detection of codon 12 mutations in this cohort is consistent with the highly aggressive behaviour of CUP tumours25,29. Furthermore, characterizing the mutational status of KRAS has become clinically relevant in some malignancies because the presence of a KRAS mutation is known to confer resistance to some tyrosine kinase inhibitors38,39,40,41. Although no approved therapeutic agent that targets and inhibits mutant KRAS activity is currently available, recent clinical studies reported a partial response in CUP patients with a KRAS(G12D) mutation treated with Trametinib (MEK inhibitor)30,42. In this study, we also observed the KRAS(G12C) variant in 25% of CUP samples. Recent promising results in NSCLC from Sotorasib (AMG-510), a specific covalent inhibitor of KRAS(G12C), suggest that this KRAS variant is a possible druggable target in CUP patients43. Moreover, targeting KRAS(G12C) with Sotorasib in advanced solid tumours showed encouraging anticancer activity, which might be useful in CUP44.

Similar to other studies, we also identified activating BRAF (V600E) mutations in 4.3% (74/1709) of cases24,25,26. This offers the potential of using BRAF inhibitors such as Vemurafenib and Dabrafenib for CUP with BRAF (V600E) mutations. In line with this, the gene-drug association analysis assigned a high GScore and DScore to the BRAF inhibitors Dabrafenib and Vemurafenib for targeting the V600E variant identified in CUP samples. Moreover, a clinical study showed a complete clinical response in CUP patients treated with the BRAF(V600E)-targeted therapy Vemurafenib in combination with the immunotherapy agent Ipilimumab45.

Mutations in MET and ERBB2 (HER2) amplification were detected in 30 and 27 cases, respectively, suggesting the possibility of targeting these receptor tyrosine kinases28. Targeting MET with Crizotinib in patients without exon-14 skipping, combined with the HER2 inhibitor Trastuzumab, has been reported to be successful in CUP patients. The success of combined HER2- and MET-targeted therapy with Trastuzumab (for cases with HER2 amplification) and Crizotinib in advanced and metastatic tumours, including HER2-amplified and MET-mutant CUP tumours, supports further evaluation of these genes as druggable targets in patients with CUP46. Our results support those of other CUP studies highlighting the value of sequencing techniques, particularly gene mutation detection, for identifying actionable targets11,24,25,26,27.

Taken together, these data highlight the molecular heterogeneity of CUP tumours. The mutations detected across the majority of CUP cases included in this study highlight not only the genomic instability present in these tumours but also the potential application of targeted therapies for a significant proportion of patients with CUP, which might improve the prognosis and therapeutic decisions for these patients12.

Material and methods

Data collection

GENIE v5.0 provided the mutation, copy number variation, gene fusion and clinical information of 59,442 tumour samples21. Most onco-types were classified into 17 categories according to OncoTree (http://oncotree.mskcc.org/oncotree/). The onco-types not included in these 17 categories were excluded from our analysis. Raw data were provided by the GENIE project and downloaded from Synapse (syn17112456, https://www.synapse.org/) using either R commands or cBioPortal (https://genie.cbioportal.org/)47,48. The preprocessing protocols for these data are described in the GENIE-provided data guide.
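
The selection step above can be sketched as follows. This is a minimal illustration, not the GENIE-provided R commands: the column names (CANCER_TYPE, SAMPLE_TYPE), the toy rows, the retained cancer-type labels and the helper names load_samples and sample_type_breakdown are assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Toy clinical table; column names and rows are illustrative assumptions.
TOY_CLINICAL = "\n".join([
    "#comment lines such as this one are skipped",
    "SAMPLE_ID\tCANCER_TYPE\tSAMPLE_TYPE",
    "S1\tCancer of Unknown Primary\tMetastasis",
    "S2\tCancer of Unknown Primary\tMetastasis",
    "S3\tCancer of Unknown Primary\tPrimary",
    "S4\tNon-Small Cell Lung Cancer\tPrimary",
    "S5\tRare Unlisted Tumour\tPrimary",
])

# Hypothetical subset of the retained categories.
KEPT_TYPES = {"Cancer of Unknown Primary", "Non-Small Cell Lung Cancer"}


def load_samples(handle, kept_cancer_types):
    """Read a tab-delimited clinical table and keep only the listed cancer types."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return [row for row in reader if row["CANCER_TYPE"] in kept_cancer_types]


def sample_type_breakdown(rows, cancer_type):
    """Return {sample_type: (count, percentage)} for one cancer type."""
    counts = Counter(
        row["SAMPLE_TYPE"] for row in rows if row["CANCER_TYPE"] == cancer_type
    )
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}


rows = load_samples(io.StringIO(TOY_CLINICAL), KEPT_TYPES)
breakdown = sample_type_breakdown(rows, "Cancer of Unknown Primary")
```

Applied to the real clinical table, the same tabulation would yield a sample-type breakdown of the kind reported for the CUP samples in Fig. 1B.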

Significantly mutated genes (SMG) analysis

The SMG analysis was performed according to previously developed criteria and protocols20,21. We used the MuSiC suite49 to identify significantly mutated genes in CUP samples and in pan-cancer tumours. This test assigns mutations to seven categories (AT transition, AT transversion, CG transition, CG transversion, CpG transition, CpG transversion and indel) and then uses statistical methods based on convolution, the hypergeometric distribution (Fisher's test, P value < 0.05) and likelihood to combine the category-specific binomials into overall P values. Genes with a cohort-level alteration frequency of ≥ 5% or a tumour type-specific alteration frequency of ≥ 30% were included in our analysis, while tumours with no mutations or with more than 500 mutations were excluded. Differentially mutated sites were plotted using the Mutation Mapper module in cBioPortal (https://www.cbioportal.org/mutation_mapper).
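
To make the seven mutation categories and the inclusion criteria concrete, the following is a minimal sketch and not the MuSiC implementation. The function names and the way sequence context is passed (the bases immediately before and after the mutated position) are assumptions for illustration; a C followed by G, or a G preceded by C, is treated as a CpG site.

```python
# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def classify_mutation(ref, alt, prev_base="N", next_base="N"):
    """Assign a variant to one of the seven categories named in the text.

    ref/alt are the reference and alternate alleles; prev_base/next_base are
    the flanking reference bases (hypothetical inputs for this sketch).
    """
    ref, alt = ref.upper(), alt.upper()
    # Anything that is not a single-base substitution is treated as an indel.
    if len(ref) != 1 or len(alt) != 1 or "-" in (ref, alt):
        return "indel"
    kind = "transition" if (ref, alt) in TRANSITIONS else "transversion"
    if ref in {"A", "T"}:
        site = "AT"
    elif (ref == "C" and next_base.upper() == "G") or (
        ref == "G" and prev_base.upper() == "C"
    ):
        site = "CpG"
    else:
        site = "CG"
    return f"{site} {kind}"


def keep_tumour(n_mutations):
    """Exclude tumours with no mutations or with more than 500 mutations."""
    return 1 <= n_mutations <= 500


def include_gene(cohort_frequency, max_type_specific_frequency):
    """Inclusion rule: >= 5% cohort-level or >= 30% tumour type-specific."""
    return cohort_frequency >= 0.05 or max_type_specific_frequency >= 0.30
```

For example, `classify_mutation("C", "T", next_base="G")` falls in the CpG transition category, whereas the same substitution followed by any other base is a CG transition.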

Copy number variation analysis

Copy number alteration data were available from AACR Project GENIE in cBioPortal. In the present study, we selected the 17 most common cancer types to compare their copy number variation frequencies with those of CUP samples. We calculated the changes in the average frequency of copy number variation (amplification and deletion) between CUP and pan-cancer samples using the R code provided in cBioPortal.
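
The frequency comparison described above can be illustrated with a short sketch. It is not the R code provided in cBioPortal; it assumes discrete copy number calls coded in the cBioPortal convention (2 for amplification, -2 for deep deletion), and the toy calls and the function name cna_frequencies are hypothetical.

```python
from collections import defaultdict

AMPLIFICATION = 2    # assumed discrete code for amplification
DEEP_DELETION = -2   # assumed discrete code for deep deletion


def cna_frequencies(calls):
    """Compute per-cancer-type amplification and deletion frequencies.

    calls: iterable of (cancer_type, cna_value) pairs for a single gene.
    Returns {cancer_type: {"amp": fraction, "del": fraction}}.
    """
    counts = defaultdict(lambda: {"n": 0, "amp": 0, "del": 0})
    for cancer_type, value in calls:
        entry = counts[cancer_type]
        entry["n"] += 1
        if value == AMPLIFICATION:
            entry["amp"] += 1
        elif value == DEEP_DELETION:
            entry["del"] += 1
    return {
        cancer_type: {"amp": c["amp"] / c["n"], "del": c["del"] / c["n"]}
        for cancer_type, c in counts.items()
    }


# Toy calls for one gene; values are illustrative only.
toy_calls = [
    ("CUP", 2), ("CUP", 0), ("CUP", -2), ("CUP", 0),
    ("NSCLC", 2), ("NSCLC", 2),
]
freqs = cna_frequencies(toy_calls)
# Difference in amplification frequency between CUP and a comparison type.
amp_difference = freqs["CUP"]["amp"] - freqs["NSCLC"]["amp"]
```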

Mutual exclusivity and co-occurrence analysis

We used Fisher’s exact test to identify pairs of SMGs with significant exclusivity or co-occurrence (P value < 0.001 after Benjamini–Hochberg correction). We identified significant pairs by analyzing all CUP samples together. We then used Dendrix50, a de novo driver exclusivity algorithm, to identify sets of approximately mutually exclusive mutations across all samples. Mutual exclusivity and co-occurrence were plotted using Gitools (version 2.3.1)51.
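
The pair-wise test can be sketched as below. This is an illustrative standard-library implementation of a two-sided Fisher's exact test with Benjamini–Hochberg adjustment, not the Gitools or Dendrix code. The gene names and sample sets are toy data, and labelling the direction of association by the sign of the odds ratio is an assumption of this sketch.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x samples mutated in both genes.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pairwise_tests(mutated, n_samples):
    """Test every gene pair; mutated maps gene -> set of mutated sample IDs."""
    raw = []
    for g1, g2 in combinations(sorted(mutated), 2):
        s1, s2 = mutated[g1], mutated[g2]
        a, b, c = len(s1 & s2), len(s1 - s2), len(s2 - s1)
        d = n_samples - a - b - c
        if a * d > b * c:
            direction = "co-occurrence"
        elif a * d < b * c:
            direction = "exclusivity"
        else:
            direction = "neutral"
        raw.append((g1, g2, direction, fisher_exact_two_sided(a, b, c, d)))
    adjusted = benjamini_hochberg([r[3] for r in raw])
    return [(g1, g2, direction, q) for (g1, g2, direction, _), q in zip(raw, adjusted)]


# Toy data: GENE_A and GENE_C share samples, GENE_B never overlaps with them.
toy = {
    "GENE_A": {1, 2, 3, 4, 5},
    "GENE_B": {6, 7, 8, 9, 10},
    "GENE_C": {1, 2, 3, 4, 5},
}
results = pairwise_tests(toy, n_samples=10)
```

Pairs whose adjusted P value falls below the chosen threshold (0.001 in this study) would then be reported as significantly exclusive or co-occurring according to their direction label.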