An in-silico method leads to recognition of hub genes and crucial pathways in survival of patients with breast cancer

Dashti, Sepideh; Taheri, Mohammad; Ghafouri-Fard, Soudeh

doi:10.1038/s41598-020-76024-2

Published: 30 October 2020

Scientific Reports volume 10, Article number: 18770 (2020)

Abstract

Breast cancer is a highly heterogeneous disorder characterized by dysregulated expression of numerous genes and cascades. In the current study, we aimed to use a systems biology strategy to identify key genes and signaling pathways in breast cancer. We retrieved two microarray datasets (GSE65194 and GSE45827) from the NCBI Gene Expression Omnibus database. R packages were used for identification of differentially expressed genes (DEGs), assessment of gene ontology and pathway enrichment evaluation. The DEGs were integrated to construct a protein–protein interaction network. Next, hub genes were recognized using the Cytoscape software, and lncRNA–mRNA co-expression analysis was performed to evaluate the potential roles of lncRNAs. Finally, the clinical importance of the obtained genes was assessed using Kaplan–Meier survival analysis. In total, 887 DEGs, including 730 upregulated and 157 downregulated DEGs, were detected between breast cancer and normal samples. By combining the results of functional analysis, MCODE, CytoNCA and CytoHubba, two hub genes, MAD2L1 and CCNA2, were selected. We also identified 12 lncRNAs significantly correlated with MAD2L1 and CCNA2. According to the Kaplan–Meier plotter database, MAD2L1, CCNA2, RAD51-AS1 and LINC01089 have the greatest predictive potential among all candidate hub genes. Our study offers a framework for recognition of the mRNA–lncRNA network in breast cancer and detection of important pathways that could be used as therapeutic targets in this kind of cancer.

Introduction

Breast cancer is the second most frequent cancer and the fifth leading cause of cancer-associated mortality¹. This type of cancer is associated with dysregulation of several genes (both coding and non-coding) and signaling pathways². Breast cancer is a molecularly heterogeneous disorder that is classified into five subtypes: luminal A, luminal B, basal-like, HER2-enriched and normal-like. This classification is based on the presence or abundance of estrogen receptor (ER), progesterone receptor (PR), HER2 and Ki67³,⁴. However, several recent studies have indicated the significance of other genes and signaling pathways in determining the overall survival (OS) of patients²,⁵. Among the recently appreciated genes in this regard are long non-coding RNAs (lncRNAs)⁶. These transcripts are involved in the regulation of fundamental cell survival pathways and have functional interactions with proteins and other non-coding RNAs that participate in the pathogenesis of breast cancer⁷. Identification of such networks is an important step towards the design of targeted therapies in breast cancer.

In the current study, we retrieved two microarray datasets (GSE65194 and GSE45827) from the NCBI Gene Expression Omnibus (GEO) database. R packages were used for identification of differentially expressed genes (DEGs), assessment of gene ontology (GO) and pathway enrichment evaluation. The DEGs were integrated to construct a protein–protein interaction (PPI) network. Next, hub genes were recognized using the Cytoscape software, and lncRNA–mRNA co-expression analysis was performed to evaluate the potential roles of lncRNAs.

Methods

In this study, we used a systems biology approach for data mining of two microarray datasets of normal and malignant breast tissue (GSE65194 and GSE45827). We aimed to identify differentially expressed genes (DEGs) and lncRNAs and to construct an mRNA–lncRNA network based on co-expression analysis. Figure 1 summarizes the steps of the bioinformatics strategy.

Gene expression profile data collection

Two gene expression profiles associated with breast cancer (GSE65194 and GSE45827) were obtained from the NCBI Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/). Both datasets were generated on the chip-based GPL570 platform (HG-U133_Plus_2; Affymetrix Human Genome U133 Plus 2.0 Array). GSE65194 included 130 breast cancer samples (41 triple negative, 30 HER2 positive, 29 luminal A, 30 luminal B) and 11 normal breast tissue samples⁸. Similarly, GSE45827 contained 130 tumor tissue specimens (41 triple negative, 30 HER2 positive, 29 luminal A, 30 luminal B) as well as 11 normal tissue samples⁹.

Data preprocessing and DEGs identification

All raw data files were subjected to quantile normalization and background correction using Robust Multichip Average (RMA)¹⁰. RMA is an effective tool in the affy Bioconductor package for both mRNA and lncRNA profiling data¹¹. Quality control assessment was performed with the AgiMicroRna Bioconductor package¹². We performed principal component analysis (PCA)¹³ as a dimensionality reduction step to find similarities among the samples of each group, and plotted the results with the ggplot2 package of R¹⁴. The linear models for microarray data (LIMMA) R package¹⁵ in Bioconductor (https://www.bioconductor.org/)¹⁶ was used to perform differential expression gene analysis (DEGA) between breast cancer and normal breast samples. The Student's t-test was applied, and DEGs with a false discovery rate (FDR) < 0.01 and |log2FC (fold change)| > 2 were screened.
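The screening rule above (FDR < 0.01 and |log2FC| > 2) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used the limma R package; the Benjamini-Hochberg procedure is assumed here as the FDR adjustment, and the gene names and values are invented toy data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def screen_degs(genes, log2_fc, p_values, fdr_cutoff=0.01, lfc_cutoff=2.0):
    """Keep genes with FDR < fdr_cutoff and |log2FC| > lfc_cutoff."""
    fdr = benjamini_hochberg(p_values)
    return [
        (gene, lfc, q)
        for gene, lfc, q in zip(genes, log2_fc, fdr)
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff
    ]


# Toy input (hypothetical names and values, not taken from the study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2_fc = [3.1, -2.4, 0.8, 2.6]
p_values = [0.0001, 0.002, 0.0005, 0.2]

degs = screen_degs(genes, log2_fc, p_values)
for gene, lfc, q in degs:
    direction = "up" if lfc > 0 else "down"
    print(gene, direction, round(q, 5))
```

In this toy input, GENE_C fails the fold-change cutoff and GENE_D fails the FDR cutoff, so only GENE_A and GENE_B pass.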

Functional enrichment analysis

To identify the role of DEGs in breast cancer, KEGG pathway and GO function enrichment analyses in three functional ontologies, namely biological process (BP), cellular component (CC) and molecular function (MF), were performed using the DAVID system (https://david.ncifcrf.gov/). An adjusted P < 0.05 was considered statistically significant¹⁷.
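The idea behind over-representation analysis can be sketched with a plain hypergeometric test in Python. This is a simplified illustration, not DAVID's implementation: DAVID reports a modified Fisher exact P value (the EASE score), which is more conservative, and all counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_term, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: genes in the background
    n_in_term:  background genes annotated to the pathway or GO term
    n_selected: genes in the DEG list
    n_overlap:  DEGs annotated to the term
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the term,
# 4 DEGs of which 3 belong to the term.
p = enrichment_p_value(n_universe=20, n_in_term=5, n_selected=4, n_overlap=3)
print(round(p, 4))
```

In a full analysis, the P values of all tested terms would then be adjusted for multiple testing before the 0.05 cutoff is applied.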

PPI network construction, cluster analysis and key gene identification

To predict interactive relationships among the proteins encoded by the common DEGs, we constructed a PPI network using the online STRING database (https://string-db.org/)¹⁸. A minimum interaction score > 0.4 was required to construct the PPI network. Cytoscape software version 3.7.1 (https://www.cytoscape.org/) was applied to visualize the PPI networks and analyze the hub genes¹⁹. We used the Molecular COmplex DEtection (MCODE) algorithm (version 1.5.1) to find PPI subnetworks and the highly interconnected clusters within the PPI network. MCODE is a Cytoscape plug-in for which we set maximum depth = 100, node score = 0.2, and K-core = 2 as threshold parameters²⁰. CytoHubba (version 1.6)²¹ and CytoNCA (version 2.1.6)²² are two other plug-ins, which provide multiple algorithms to detect hub genes in the network. In addition, the identified key genes were selected for additional expression analysis on 1104 cancer and 113 normal samples from the TCGA project in The Encyclopedia of RNA Interactomes (ENCORI) database (https://starbase.sysu.edu.cn/panCancer.php). The Pearson correlation coefficient was assessed between hub genes. The correlation coefficients were also checked on the TCGA dataset by using the Gene Expression Profiling Interactive Analysis (GEPIA) database (https://gepia.cancer-pku.cn/).
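The network steps above (score filtering, degree ranking, k-core pruning) can be sketched in Python as follows. This is an illustration only: it does not reproduce MCODE, CytoHubba or CytoNCA, which use more elaborate scoring, and the protein names and scores are hypothetical.

```python
from collections import defaultdict


def build_network(scored_edges, min_score=0.4):
    """Undirected adjacency sets from (protein_a, protein_b, score) tuples."""
    adjacency = defaultdict(set)
    for a, b, score in scored_edges:
        if score > min_score and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def k_core(adjacency, k):
    """Nodes left after repeatedly removing nodes with fewer than k neighbors."""
    remaining = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in remaining.items() if len(nbrs) < k]:
            for nbr in remaining.pop(node):
                remaining[nbr].discard(node)
            changed = True
    return set(remaining)


# Toy interaction list (hypothetical proteins and combined scores).
edges = [
    ("A", "B", 0.9),
    ("A", "C", 0.8),
    ("B", "C", 0.7),
    ("C", "D", 0.5),
    ("D", "E", 0.3),  # dropped: score is not above 0.4
]
network = build_network(edges)

# Rank nodes by degree, the simplest centrality measure.
degree_ranking = sorted(network, key=lambda n: (-len(network[n]), n))
core = k_core(network, k=2)
print(degree_ranking, sorted(core))
```

Degree is only one of the centrality measures named in the text; the same adjacency structure could feed betweenness or closeness calculations.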

Prediction of lncRNAs function

LncRNA–mRNA co-expression analysis was performed to evaluate the potential roles of lncRNAs. The full list of lncRNA genes with approved HUGO Gene Nomenclature Committee (HGNC) symbols was downloaded from the HGNC website (https://www.genenames.org/)²³. The list of lncRNA gene names was compared with the gene symbols in our dataset, and the overlapping genes were chosen. Then, differentially expressed lncRNAs were selected according to the cutoff criteria |logFC| > 0.5 and adjusted P value < 0.01. These more lenient selection criteria were applied because lncRNAs are expressed at lower levels than mRNAs. Next, the Pearson correlation coefficient was calculated in our dataset between the differentially expressed lncRNAs and the two key protein-coding genes (MAD2L1 and CCNA2) obtained from the previous steps based on functional annotation and co-expression analysis. LncRNAs with correlation coefficients higher than 0.6 or lower than − 0.6 were chosen as the lncRNAs co-expressed with MAD2L1 and CCNA2. To uncover the importance of these candidate genes in different molecular subtypes of breast cancer, their expression was also examined in four breast cancer subtypes: luminal A, luminal B, basal-like and HER2-enriched.
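The co-expression filter described above amounts to computing a Pearson correlation per lncRNA and keeping those beyond the ± 0.6 cutoff. A minimal Python sketch, with hypothetical lncRNA names and invented expression values:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpressed(lncrna_profiles, hub_profile, cutoff=0.6):
    """Map lncRNA name -> r for lncRNAs with |r| above the cutoff."""
    selected = {}
    for name, profile in lncrna_profiles.items():
        r = pearson(profile, hub_profile)
        if abs(r) > cutoff:
            selected[name] = r
    return selected


# Toy expression values across five samples (hypothetical).
hub_gene = [1.0, 2.0, 3.0, 4.0, 5.0]
lncrnas = {
    "LNC_1": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the hub gene
    "LNC_2": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the hub gene rises
    "LNC_3": [1.0, 3.0, 2.0, 3.0, 1.0],   # unrelated to the hub gene
}
partners = coexpressed(lncrnas, hub_gene)
print(sorted(partners))
```

Both positively and negatively correlated lncRNAs are retained, since the cutoff is applied to the absolute value of r.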

Survival analysis

Survival analysis was carried out on these candidate hub genes to assess their effects on breast cancer survival. Recurrence free survival (RFS) and overall survival (OS) analyses were performed on expression data from 6234 breast cancer patients using the Kaplan–Meier plotter (kmplot.com/), which can evaluate the effect of gene expression on survival in 21 cancer types²⁴. Patients were split into a low-expression group and a high-expression group at the mean expression level. The hazard ratio was calculated for both RFS and OS, and the P value was determined with the log-rank test.
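The mean split and the Kaplan–Meier estimate can be sketched in a few lines of Python. This is a toy illustration of the method, not the Kaplan–Meier plotter's code; the follow-up times, event flags and expression values are invented.

```python
def split_by_mean(expression):
    """True for patients whose expression is above the cohort mean."""
    mean = sum(expression) / len(expression)
    return [value > mean for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time.

    events[i] is 1 if the event (relapse or death) was observed at times[i]
    and 0 if the patient was censored at that time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort (hypothetical): follow-up in months, event flag, expression.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
expression = [9.0, 8.5, 4.0, 7.5, 3.0, 4.0]

high = split_by_mean(expression)
high_curve = kaplan_meier(
    [t for t, h in zip(times, high) if h],
    [e for e, h in zip(events, high) if h],
)
low_curve = kaplan_meier(
    [t for t, h in zip(times, high) if not h],
    [e for e, h in zip(events, high) if not h],
)
print(high_curve, low_curve)
```

Censored patients lower the number at risk without lowering the survival estimate, which is why the low-expression curve in this toy cohort stays at 1.0 until the single observed event.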

Results

DEGs screening

Before performing differentially expressed gene analysis (DEGA), we carried out background correction and normalization and removed the batch effect. We used the AgiMicroRna Bioconductor package for quality control assessment. Degradation plots, which indicate the quality of RNA hybridization along the probe sets, were drawn, and the RNA quality was good. Furthermore, box plots of the gene expression data were created to assess the distribution of the data after normalization. In the box plots, the different arrays had similar median expression levels, indicating that the correction was performed properly. Additionally, a PCA plot was drawn to illustrate the spatial distribution of the samples before and after batch effect correction (Supplementary Figure S1a). Principal component analysis (PCA) provides information about the structure of the analyzed dataset and can be used to find similarities between samples. We found two samples from the normal group that were spatially far from the other normal samples and therefore removed them. Furthermore, a heatmap was drawn to illustrate the correlation between samples using the Pheatmap package of R 3.6.1²⁵ (Supplementary Figure S1b). After correction, batch effect removal and data normalization, 887 DEGs, including 730 upregulated and 157 downregulated DEGs, were screened between breast cancer and normal samples from GSE65194 and GSE45827 using |logFC| > 2 and FDR < 0.01 as cut-off criteria. The lists of upregulated and downregulated DEGs are given in Supplementary Tables S1 and S2, respectively. Figure 2a is a Venn diagram illustrating the overlap between the two datasets. Moreover, to visualize the overall expression levels of the DEGs, a volcano plot was created from the log2 FC scores and log10 P values in R (Fig. 2b).
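The reason arrays share a similar median after normalization can be seen in a small Python sketch of quantile normalization. It covers only the normalization step; RMA also performs background correction and probe-set summarization, which are omitted here, ties are ignored, and the intensity values are invented.

```python
from statistics import median


def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of values per sample).

    Every sample is mapped onto a shared reference distribution: the mean
    of the k-th smallest values across samples. Ties are not averaged here.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Two toy arrays with three probes each (hypothetical intensities).
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(raw)
medians = [median(sample) for sample in normalized]
print(normalized, medians)
```

Because every sample ends up with the same set of values in a different order, their box plots coincide, which matches the similar medians reported above.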

KEGG and GO enrichment analysis

To further examine the role of the common DEGs in breast cancer, we performed GO and KEGG pathway enrichment analysis²⁶,²⁷. We found 10 dysregulated pathways based on an adjusted P < 0.05. Up-regulated DEGs were enriched in six pathways, including ‘Cell cycle’, ‘Oocyte meiosis’ and ‘Focal adhesion’. Down-regulated DEGs were enriched in five pathways, including ‘Peroxisome-proliferator-activated receptors (PPAR) signaling pathway’, ‘Metabolism of xenobiotics by cytochrome P450’, ‘Adipocytokine signaling pathway’ and ‘Cytokine-cytokine receptor interaction’ (Fig. 3a). The results of each GO functional analysis are presented in Fig. 3b–d. The genes enriched in the KEGG pathway and GO enrichment analyses are shown in Supplementary Tables S3–S6.

PPI network construction and module analysis

The interaction information among the DEGs and the PPI network were obtained using the STRING online database. From the common DEGs, 887 DEGs (730 up-regulated and 157 down-regulated) were filtered into a PPI network with 887 nodes and 10,398 edges at a combined score > 0.4. Genes with a combined score > 0.9 were then selected as key DEGs to be imported into Cytoscape. The Cytoscape software was applied to evaluate the interactive relationships between the candidate proteins. Afterward, two clusters consisting of 65 nodes and 23 nodes were screened with a cut-off k-score = 12 based on the MCODE scoring system (Supplementary Figure S2). CytoNCA and CytoHubba are two Cytoscape plug-ins for centrality analysis that give insight into the most influential nodes or edges in a network. We ran the CytoHubba application and extracted data from four calculation methods (EPC, MCC, MNC, and Stress). The top 100 nodes ranked by these four methods were selected (Supplementary Table S7). Moreover, four algorithms from the CytoNCA application (Degree, Eigenvector, Betweenness, and Closeness) were employed, and the top 100 nodes based on these four approaches were obtained (Supplementary Table S7). A Venn diagram was then created to identify the significant hub genes shared by all groups; its result is given in Supplementary Table S8. Through this overlap analysis, we identified a list of 26 key genes, most of which belonged to MCODE cluster 1 (Supplementary Table S8). Since highly interconnected proteins in a network accumulate in a cluster, we chose only the 20 genes from our list that belonged to cluster 1 (Table 1). All the selection steps are illustrated in Fig. 4a.
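The overlap step, keeping only nodes that rank in the top N under every centrality measure, reduces to a set intersection. A minimal Python sketch with hypothetical gene names and scores (the study used N = 100 across eight measures):

```python
def top_n(scores, n):
    """Names of the n highest-scoring nodes (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:n])


def consensus_hubs(score_tables, n):
    """Nodes that appear in the top n of every centrality measure."""
    top_sets = [top_n(scores, n) for scores in score_tables.values()]
    return set.intersection(*top_sets)


# Toy centrality scores (hypothetical, not from Supplementary Table S7).
score_tables = {
    "degree": {"G1": 9, "G2": 7, "G3": 4, "G4": 1},
    "betweenness": {"G1": 0.5, "G2": 0.1, "G3": 0.3, "G4": 0.0},
    "closeness": {"G1": 0.8, "G2": 0.6, "G3": 0.7, "G4": 0.2},
}
hubs = consensus_hubs(score_tables, n=2)
print(hubs)
```

Requiring agreement across measures is stricter than any single ranking, which is why the consensus list is much shorter than each top-N list.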

Table 1 Key differentially expressed genes acquired by centrality analysis.

Key genes functional annotation and co-expression analysis

GO enrichment and KEGG pathway analysis of these 20 genes indicated that four pathways were enriched: cell cycle, progesterone-mediated oocyte maturation, oocyte meiosis, and the p53 signaling pathway. CCNA2, CDK1, MAD2L1, and CCNB1 were significantly enriched in biological terms such as cell cycle, mitosis, nuclear division and M phase, and in the cell cycle and progesterone-mediated oocyte maturation pathways. In particular, by checking the expression data of 1104 cancer and 113 normal samples from the TCGA project in the ENCORI database, we found that these four genes were strongly expressed in the breast cancer specimens compared with normal breast tissue (MAD2L1: fold change 4.28, adjusted P value 1.4e−70; CCNA2: fold change 6.88, adjusted P value 3.2e−91; CCNB1: fold change 5.63, adjusted P value 1.8e−111; CDK1: fold change 8.54, adjusted P value 5.3e−121). Additionally, we calculated the Pearson correlation for these 20 candidate genes and found a strong and significant correlation between them (Supplementary Table S9, Fig. 4b). Interestingly, CCNA2 and MAD2L1, two important genes in the cell cycle pathway and in some crucial biological processes related to cell division, were highly correlated, with a correlation coefficient higher than 0.9 in our analysis. Furthermore, the correlation of these two genes in the TCGA dataset in the GEPIA database was consistent with our analysis.

Identification of differentially expressed lncRNAs and co-expression analysis

After downloading the list of lncRNA genes from the HGNC database, lncRNA gene symbols were extracted from the GSE65194 and GSE45827 datasets. A total of 334 lncRNA probes were identified in these two datasets using this approach. Finally, 159 lncRNA probe IDs with |logFC| > 0.5 and adjusted P value < 0.01 between 20 normal samples and 258 breast cancer tissue samples were selected (Fig. 2c). Among these lncRNAs, 77 were up-regulated (Supplementary Table S10) and 80 were down-regulated (Supplementary Table S11) in breast cancer. We calculated the Pearson correlation coefficient between the differentially expressed lncRNAs and MAD2L1 and CCNA2 based on their expression values. LncRNAs with a Pearson correlation coefficient ≥ 0.6 or ≤ − 0.6 were selected as key lncRNAs co-expressed with MAD2L1 and CCNA2. In total, 12 lncRNAs met this criterion (Table 2). Additionally, Table 3 shows the expression of these genes in four breast cancer subtypes. Our selected genes appear to be more important in the more aggressive subtypes (basal-like and HER2-enriched). However, deregulation of CARMN, PRINS and MEG3 may be crucial in all subtypes of breast cancer.

Table 2 Key lncRNAs which co-expressed with MAD2L1 and CCNA2.

Table 3 Relative expression of our candidate genes in different molecular subtypes of breast cancer and healthy breast tissue in GSE65194 and GSE45827.

Survival analysis of candidate hub genes

Associations between the expression of candidate hub genes and the RFS and OS of breast cancer patients were evaluated using the KM method to estimate the prognostic importance of the hub genes in our study. The results indicated that low expression of MAD2L1, CCNA2 and NCK1-DT was associated with a higher RFS rate than high expression. Conversely, high expression of MEG3, RAD51-AS1, PRINS, LINC01089, LINC02256, FUT8-AS1, LINC01279, CARMN, EPB41L4A-AS1, EIF3J-DT and TNFRSF14-AS1 was associated with a significantly longer RFS time among patients with breast cancer. MAD2L1, CCNA2, RAD51-AS1 and LINC01089 had the greatest predictive potential based on RFS among all candidate hub genes. The hazard ratio was also calculated for OS. High expression of MAD2L1, CCNA2 and FUT8-AS1 was associated with a lower OS rate than low expression. On the other hand, low expression of LINC01279, RAD51-AS1 and CARMN was correlated with significantly worse OS in breast cancer patients. The expression of the other candidate hub genes was not significantly associated with OS (Table 4).
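The significance of such group differences is judged with the log-rank test named in the Methods. A minimal two-group version can be sketched in Python; it is an illustration of the test statistic under invented data, not the Kaplan–Meier plotter's implementation, and it does not compute the hazard ratio.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events are 1 for an observed event and 0 for censoring. The p-value
    uses the chi-square distribution with one degree of freedom.
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti in times_a if ti >= t)  # at risk in group A
        n_b = sum(1 for ti in times_b if ti >= t)  # at risk in group B
        d_a = sum(1 for ti, e in zip(times_a, events_a) if ti == t and e)
        d_b = sum(1 for ti, e in zip(times_b, events_b) if ti == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))


# Toy groups (hypothetical): early events in the high-expression group.
high_times, high_events = [1, 2, 3], [1, 1, 1]
low_times, low_events = [4, 5, 6], [1, 1, 1]
chi2, p_value = logrank_test(high_times, high_events, low_times, low_events)
print(round(chi2, 2), round(p_value, 3))
```

Two groups with identical event times give a statistic of zero and a P value of 1, while clearly separated groups such as the toy example give a small P value.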

Table 4 Recurrence free survival (RFS) and overall survival (OS) of candidate hub genes.

Discussion

In the present study, we used a bioinformatics strategy to identify key genes and signaling pathways in breast cancer pathogenesis, with a focus on the role of lncRNAs and their interactions with protein-coding genes. Such interactions can be assessed using experimental approaches, which are costly and laborious. Bioinformatics methods for this purpose fall into two groups: strategies that use sequence, structural data and physicochemical features, and methods that are based on network construction. The latter can capture the inherent topological characteristics of the associated biological networks, which are often disregarded by the former strategies⁶. In the present work, we used GPL570, which is a suitable platform for evaluating the expression level of lncRNAs in tumorigenesis¹¹,²⁸. We identified 730 upregulated and 157 downregulated DEGs between breast cancer and normal samples. Up-regulated DEGs were enriched in ‘Cell cycle’, ‘Oocyte meiosis’ and ‘Focal adhesion’. A previous bioinformatics study using topological characteristics of genes in breast cancer identified these pathways as hub subnetworks²⁹. The role of these pathways has also been acknowledged in the pathogenesis of another hormone-related cancer, namely prostate cancer³⁰. We also found that down-regulated DEGs were enriched in the ‘PPAR signaling pathway’, ‘Metabolism of xenobiotics by cytochrome P450’, ‘Adipocytokine signaling pathway’ and ‘Cytokine-cytokine receptor interaction’ pathways. PPARs are nuclear hormone receptors that participate in the modulation of different aspects of tumorigenesis such as cell proliferation, survival and apoptosis³¹. Xenobiotic metabolizing enzymes are also involved in tumorigenesis and in the response of cancer patients to therapeutic options. Integration of expression data of these genes with eQTL data and allele frequency data from the 1000 Genomes project has shown considerable inter-population differences in the related pathways, which might influence cancer prognosis and response to treatment³². Adipocytokines can also influence cell proliferation, survival and the malignant phenotypes of breast cancer cells through regulation of several cellular and molecular pathways, thus worsening the survival of patients³³. Cytokine signaling has important functions in the formation, proliferation, and migration of breast cancer cells, thus modulating the invasiveness, angiogenesis and metastatic potential of these cells³⁴.

Our in silico analyses revealed that CCNA2, CDK1, MAD2L1 and CCNB1 were significantly enriched in several biological pathways. These four genes were strongly expressed in breast cancer samples compared with normal breast tissue. Notably, these four genes have been among the top dysregulated genes in small cell lung cancer, as revealed by GO and KEGG analysis and construction of a PPI network³⁵. Such similarity between these two different types of cancer implies a fundamental role for these genes in the carcinogenesis process and supports them as potential therapeutic targets. MAD2L1 forms a complex with the APC/C and CDC20 and subsequently stimulates the M-A checkpoint to halt cell cycle transition at this stage when chromatin segregation is anomalous. Yet, over-expression of E2F1 in atypical cells affects the formation of this complex, leading to cell cycle transition even in the presence of abnormal chromosomes³⁶. CDK1/cyclin B is a maturation-promoting factor³⁷ and the checkpoint for the G2/M transition³⁸,³⁹, so it is expected to be involved in cell cycle regulation and tumorigenesis. We also identified 12 lncRNAs significantly correlated with MAD2L1 and CCNA2. As expected from the KEGG analysis, KM analysis indicated that low expression of MAD2L1, CCNA2 and NCK1-DT was associated with a higher RFS rate than high expression. Conversely, high expression of MEG3, RAD51-AS1, PRINS, LINC01089, LINC02256, FUT8-AS1, LINC01279, CARMN, EPB41L4A-AS1, EIF3J-DT and TNFRSF14-AS1 was associated with a significantly longer RFS time among patients with breast cancer. The hazard ratio was also calculated for OS. High expression of MAD2L1, CCNA2 and FUT8-AS1 and low expression of LINC01279, RAD51-AS1 and CARMN were correlated with significantly worse OS in breast cancer patients, while the expression of the other candidate hub genes was not significantly associated with OS.

According to previous studies, MEG3 is down-regulated in breast cancer tissues⁴⁰,⁴¹,⁴². Recently, Zhang et al. showed the ability of MEG3 to inhibit breast cancer growth and induce apoptosis by activating ER stress, NF-κB and p53 pathways in a breast cancer cell line⁴³. RAD51-AS1, also known as TODRA, is transcribed from upstream of RAD51 in a divergent manner. Gazy et al. identified a conserved E2F1 binding site in the promoter region of RAD51-AS1 and considered this lncRNA a target gene of E2F1 in breast cancer. RAD51-AS1 negatively regulates RAD51 expression, and higher expression of RAD51-AS1 has been associated with a less aggressive tumor phenotype⁴⁴. PRINS (Psoriasis susceptibility-related RNA Gene Induced by Stress) is a stress-induced lncRNA that regulates apoptosis⁴⁵,⁴⁶. Yu et al. considered PRINS a HIF-1α dependent lncRNA due to its significant over-expression under hypoxic conditions in renal tubular cells⁴⁷. Moreover, increased levels of PRINS have been observed in colorectal adenocarcinoma cells. This lncRNA interacts with trefoil factor 3 (TFF3), the AKT/PI3K signaling pathway and miR-491-5p⁴⁸. PRINS levels were down-regulated in MCF-7 and MDA-MB-231 cell lines following exposure to the apoptotic and anti-proliferative agent CCT137690⁴⁹. LINC01089 (also known as LncRNA Inhibiting Metastasis; LIMT) is an EGF-regulated lncRNA that is down-regulated in breast cancer tissues and cell lines, especially in aggressive subtypes of breast cancer⁵⁰. Yuan et al. reported a significant correlation between low expression of LINC01089 and lymph node metastasis and poor prognosis of breast cancer. LINC01089 modulates breast tumorigenesis by inhibiting β-catenin transcription and consequently blocking Wnt/β-catenin signaling⁵¹. LINC02256 (ENSG00000261064) is a validated novel long intergenic non-protein coding RNA with 2 transcripts, located on 15q13.3. Based on GTEx (Release v6) results, it has ubiquitous expression in breast and other tissues. The potential contribution of this lncRNA to breast cancer should be evaluated in future studies. FUT8-AS1 was up-regulated in endometrioid endometrial cancer patients in association with poor survival⁵². According to another TCGA data mining study on glioblastoma, FUT8-AS1 over-expression has been associated with poor patient outcomes⁵³. LINC01279 was significantly upregulated in patients with endometriosis. According to the study by Liu et al., there is a strong association between this lncRNA and cell cycle-dependent kinase-14 and CXC motif chemokine ligand-12. Hence, LINC01279 might contribute to the pathogenesis of endometriosis⁵⁴. In another bioinformatics analysis of differential gene expression in breast cancer, LINC01279 was significantly down-regulated⁵⁵. However, further studies are needed to elucidate its function in breast cancer. CARMN (also known as MiR143HG) is recognized as a tumor suppressor in bladder cancer. Xie et al. observed down-regulation of CARMN in bladder cancer tissues compared with normal tissues. Moreover, there was an association between CARMN over-expression and a high survival rate in bladder cancer patients. The CARMN/miR-1275/AXIN2 axis takes part in bladder tumorigenesis by interacting with the Wnt/β-catenin pathway⁵⁶. Furthermore, there is an association between down-regulation of CARMN and poor survival in endometrial carcinoma⁵⁷. CARMN was significantly down-regulated in hepatocellular carcinoma (HCC) tissues and cells, and its over-expression was associated with good prognosis. Generally, this gene contributes to the development and progression of HCC by blocking the MAPK and Wnt signaling pathways⁵⁸. EPB41L4A-AS1 (also known as TIGA1) is a p53-regulated gene. EPB41L4A-AS1 was down-regulated in many human cancers in correlation with poor prognosis. EPB41L4A-AS1 acts as a repressor of the Warburg effect and is involved in cancer metabolic reprogramming⁵⁹. In a recent study in early stage breast cancer, significant down-regulation of EPB41L4A-AS1 was observed in tumor tissues⁶⁰. EIF3J-DT is a novel lncRNA with no publication reporting its biological function in breast cancer to date. Based on one study in HCC, EIF3J-DT might have potential prognostic value⁶¹. In addition, EIF3J-DT regulates multi-drug resistance by interacting with autophagy in gastric cancer⁶². According to the study by He et al., TNFRSF14-AS1 might have prognostic value in breast cancer, but this result needs further confirmation⁶³. Based on the available literature, the lncRNAs identified in the current study have putative roles in the pathogenesis of breast cancer and other types of cancer.

Taken together, in the present study, we intended to introduce a precise method to discover and prioritize the most probable candidate genes involved in breast cancer. Gene expression analysis in different molecular subtypes indicated the importance of our chosen genes in the more aggressive subtypes. On the other hand, CARMN, PRINS and MEG3 probably have an important role in the pathogenesis of all subtypes of breast cancer. We also added several lines of evidence from the literature regarding the role of the candidate genes in the pathogenesis of cancer. Although this study provides evidence to guide future differential expression studies in breast cancer, its limitation is the lack of experimental evaluation of the candidate genes. Our in silico method identified a number of hub genes and related lncRNAs that are possibly involved in the pathogenesis of breast cancer and in patients' prognosis, and that could therefore serve as therapeutic targets or biomarkers for this malignancy.

References

Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).
Yang, K. D., Gao, J. & Luo, M. Identification of key pathways and hub genes in basal-like breast cancer using bioinformatics analysis. Oncotargets Ther. 12, 1319–1331. https://doi.org/10.2147/Ott.S158619 (2019).
Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167. https://doi.org/10.1200/jco.2008.18.1370 (2009).
Prat, A., Ellis, M. J. & Perou, C. M. Practical implications of gene-expression-based assays for breast oncologists. Nat. Rev. Clin. Oncol. 9, 48 (2012).
Feng, Y. X. et al. Breast cancer development and progression: risk factors, cancer stem cells, signaling pathways, genomics, and molecular pathogenesis. Genes Dis. 5, 77–106. https://doi.org/10.1016/j.gendis.2018.05.001 (2018).
Zhang, H., Liang, Y. C., Han, S. Y., Peng, C. & Li, Y. Long noncoding RNA and protein interactions: from experimental results to computational models based on network methods. Int. J. Mol. Sci. https://doi.org/10.3390/Ijms20061284 (2019).
Tuersong, T., Li, L. L., Abulaiti, Z. & Feng, S. M. Comprehensive analysis of the aberrantly expressed lncRNA-associated ceRNA network in breast cancer. Mol. Med. Rep. 19, 4697–4710. https://doi.org/10.3892/mmr.2019.10165 (2019).
Maire, V. et al. Polo-like kinase 1: a potential therapeutic option in combination with conventional chemotherapy for the management of patients with triple-negative breast cancer. Can. Res. 73, 813–823 (2013).
Gruosso, T. et al. Chronic oxidative stress promotes H2AX protein degradation and enhances chemosensitivity in breast cancer patients. EMBO Mol. Med. 8, 527–549 (2016).
Irizarry, R. A. et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003).
Zhang, X. et al. Long non-coding RNA expression profiles predict clinical phenotypes in glioma. Neurobiol. Dis. 48, 1–8 (2012).
Lopez-Romero, P. AgiMicroRna: Processing and differential expression analysis of agilent microRNA chips. R package version 2 (2016).
Yeung, K. Y. & Ruzzo, W. L. Principal component analysis for clustering gene expression data. Bioinformatics 17, 763–774 (2001).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, Berlin, 2016).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47–e47 (2015).
Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115 (2015).
Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
Szklarczyk, D. et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2018).
Article CAS PubMed Central Google Scholar
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Article CAS PubMed PubMed Central Google Scholar
Bader, G. D. & Hogue, C. W. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. 4, 2 (2003).
Article Google Scholar
Chin, C.-H. et al. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst. Biol. 8, S11 (2014).
Article PubMed PubMed Central Google Scholar
Tang, Y., Li, M., Wang, J., Pan, Y. & Wu, F.-X. CytoNCA: a cytoscape plugin for centrality analysis and evaluation of protein interaction networks. Biosystems 127, 67–72 (2015).
Article CAS PubMed Google Scholar
Braschi, B. et al. Genenames.org: the HGNC and VGNC resources in 2019. Nucleic Acids Res. 47, D786–D792 (2018).
Article CAS PubMed Central Google Scholar
Nagy, Á., Lánczky, A., Menyhárt, O. & Győrffy, B. Validation of miRNA prognostic power in hepatocellular carcinoma using expression data of independent datasets. Sci. Rep. 8, 9227 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Kolde, R. Pheatmap: pretty heatmaps. R package version 1 (2012).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
Article CAS PubMed Google Scholar
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Article CAS PubMed PubMed Central Google Scholar
Jiang, L. et al. Co-expression network analysis of the lncRNAs and mRNAs associated with cervical cancer progression. Arch. Med. Sci. AMS 15, 754 (2019).
Article CAS PubMed Google Scholar
Zhuang, D. Y., Jiang, L., He, Q. Q., Zhou, P. & Yue, T. Identification of hub subnetwork based on topological features of genes in breast cancer. Int. J. Mol. Med. 35, 664–674. https://doi.org/10.3892/ijmm.2014.2057 (2015).
Article CAS PubMed Google Scholar
Fan, S. T. et al. Identification of the key genes and pathways in prostate cancer. Oncol. Lett. 16, 6663–6669. https://doi.org/10.3892/ol.2018.9491 (2018).
Article CAS PubMed PubMed Central Google Scholar
Gou, Q., Gong, X., Jin, J. H., Shi, J. J. & Hou, Y. Z. Peroxisome proliferator-activated receptors (PPARs) are potential drug targets for cancer therapy. Oncotarget 8, 60704–60709. https://doi.org/10.18632/oncotarget.19610 (2017).
Article PubMed PubMed Central Google Scholar
Li, Y. et al. Tumoral expression of drug and xenobiotic metabolizing enzymes in breast cancer patients of different ethnicities with implications to personalized medicine. Sci. Rep. https://doi.org/10.1038/S41598-017-04250-2 (2017).
Article PubMed PubMed Central Google Scholar
Li, J. & Han, X. Adipocytokines and breast cancer. Curr. Probl. Cancer 42, 208–214. https://doi.org/10.1016/j.currproblcancer.2018.01.004 (2018).
Article PubMed Google Scholar
Fasoulakis, Z., Kolios, G., Papamanolis, V. & Kontomanolis, E. N. Interleukins associated with breast cancer. Cureus 10, e3549 (2018).
PubMed PubMed Central Google Scholar
Ni, Z., Wang, X. T., Zhang, T. C., Li, L. L. & Li, J. X. Comprehensive analysis of differential expression profiles reveals potential biomarkers associated with the cell cycle and regulated by p53 in human small cell lung cancer. Exp. Ther. Med. 15, 3273–3282. https://doi.org/10.3892/etm.2018.5833 (2018).
Article PubMed PubMed Central Google Scholar
May, K. M., Paldi, F. & Hardwick, K. G. Fission yeast Apc15 stabilizes MCC-Cdc20-APC/C complexes, ensuring efficient Cdc20 ubiquitination and checkpoint arrest. Curr. Biol. CB 27, 1221–1228. https://doi.org/10.1016/j.cub.2017.03.013 (2017).
Article CAS PubMed Google Scholar
Draetta, G. et al. Cdc2 protein kinase is complexed with both cyclin A and B: evidence for proteolytic inactivation of MPF. Cell 56, 829–838. https://doi.org/10.1016/0092-8674(89)90687-9 (1989).
Article CAS PubMed Google Scholar
Fisher, D. & Nurse, P. Cyclins of the fission yeast Schizosaccharomyces pombe. Semin. Cell Biol. 6, 73–78 (1995).
Article CAS PubMed Google Scholar
Nasmyth, K. Viewpoint: putting the cell cycle in order. Science 274, 1643–1645. https://doi.org/10.1126/science.274.5293.1643 (1996).
Article ADS CAS PubMed Google Scholar
Sun, L., Li, Y. & Yang, B. Downregulated long non-coding RNA MEG3 in breast cancer regulates proliferation, migration and invasion by depending on p53’s transcriptional activity. Biochem. Biophys. Res. Commun. 478, 323–329 (2016).
Article CAS PubMed Google Scholar
Zhang, C.-Y. et al. Overexpression of long non-coding RNA MEG3 suppresses breast cancer cell proliferation, invasion, and angiogenesis through AKT pathway. Tumor Biol. 39, 1010428317701311 (2017).
Google Scholar
Zhu, M. et al. MEG3 overexpression inhibits the tumorigenesis of breast cancer by downregulating miR-21 through the PI3K/Akt pathway. Arch. Biochem. Biophys. 661, 22–30 (2019).
Article CAS PubMed Google Scholar
Zhang, Y. et al. Long noncoding RNA MEG3 inhibits breast cancer growth via upregulating endoplasmic reticulum stress and activating NF-κB and p53. J. Cell. Biochem. 120, 6789–6797 (2019).
Article CAS PubMed Google Scholar
Gazy, I. et al. TODRA, a lncRNA at the RAD51 locus, is oppositely regulated to RAD51, and enhances RAD51-dependent DSB (double strand break) repair. PLoS ONE 10, e0134120 (2015).
Article CAS PubMed PubMed Central Google Scholar
Sonkoly, E. et al. Identification and characterization of a novel, psoriasis susceptibility-related noncoding RNA gene, PRINS. J. Biol. Chem. 280, 24159–24167 (2005).
Article CAS PubMed Google Scholar
Szegedi, K. et al. The anti-apoptotic protein G1P3 is overexpressed in psoriasis and regulated by the non-coding RNA, PRINS. Exp. Dermatol. 19, 269–278 (2010).
Article CAS PubMed Google Scholar
Yu, T.-M. et al. RANTES mediates kidney ischemia reperfusion injury through a possible role of HIF-1α and LncRNA PRINS. Sci. Rep. 6, 18424 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Hanisch, C. et al. TFF3-dependent resistance of human colorectal adenocarcinoma cells HT-29/B6 to apoptosis is mediated by miR-491-5p regulation of lncRNA PRINS. Cell Death Discov. 3, 16106 (2017).
Article CAS PubMed PubMed Central Google Scholar
Balcı, T. O., Kayabaşı, Ç. & Gündüz, C. Effect of CCT137690 on long non-coding RNA expression profiles in MCF-7 and MDA-MB-231 cell lines. Bosnian J. Basic Med. Sci. 20(1), 56–62 (2019).
Google Scholar
Sas-Chen, A. et al. LIMT is a novel metastasis inhibiting lncRNA suppressed by EGF and downregulated in aggressive breast cancer. EMBO Mol. Med. 8, 1052–1064 (2016).
Article CAS PubMed PubMed Central Google Scholar
Yuan, H. et al. Long noncoding RNA LINC01089 predicts clinical prognosis and inhibits cell proliferation and invasion through the Wnt/β-catenin signaling pathway in breast cancer. Oncotargets Ther. 12, 4883 (2019).
Article CAS Google Scholar
Xu, Q. et al. A long noncoding RNAs signature to improve survival prediction in endometrioid endometrial cancer. J. Cell. Biochem. 120, 8300–8310 (2019).
Article CAS Google Scholar
Shergalis, A., Bankhead, A., Luesakul, U., Muangsin, N. & Neamati, N. Current challenges and opportunities in treating glioblastoma. Pharmacol. Rev. 70, 412–445 (2018).
Article CAS PubMed PubMed Central Google Scholar
Liu, J. et al. Identification of LINC01279 as a cell cycle-associated long non-coding RNA in endometriosis with GBA analysis. Mol. Med. Rep. 18, 3850–3858 (2018).
CAS PubMed PubMed Central Google Scholar
Dong, H. et al. Bioinformatic analysis of differential expression and core GENEs in breast cancer. Int. J. Clin. Exp. Pathol. 11, 1146–1156 (2018).
PubMed PubMed Central Google Scholar
Xie, H. et al. LncRNA miR143HG suppresses bladder cancer development through inactivating Wnt/β-catenin pathway by modulating miR-1275/AXIN2 axis. J. Cell. Physiol. 234, 11156–11164 (2019).
Article CAS PubMed Google Scholar
Shi, F. et al. LncRNA miR143HG up-regulates p53 in endometrial carcinoma by sponging miR-125a. Cancer Manag. Res. 11, 10117 (2019).
Article CAS PubMed PubMed Central Google Scholar
Lin, X. et al. Long non-coding RNA miR143HG predicts good prognosis and inhibits tumor multiplication and metastasis by suppressing mitogen-activated protein kinase and Wnt signaling pathways in hepatocellular carcinoma. Hepatol. Res. 20(1), 56–62 (2019).
Google Scholar
Liao, M. et al. LncRNA EPB41L4A-AS1 regulates glycolysis and glutaminolysis by mediating nucleolar translocation of HDAC2. EBioMedicine 41, 200–213 (2019).
Article PubMed PubMed Central Google Scholar
Rao, A. K. D. M. et al. Identification of lncRNAs associated with early-stage breast cancer and their prognostic implications. Mol. Oncol. 13, 1342 (2019).
Article CAS Google Scholar
Gu, J.-X. et al. Six-long non-coding RNA signature predicts recurrence-free survival in hepatocellular carcinoma. World J. Gastroenterol. 25, 220 (2019).
Article PubMed PubMed Central Google Scholar
Luo, Y., Zhou, R., Huang, N., Sun, L. & Liao, W. (American Society of Clinical Oncology, 2017).
He, Y. et al. A prognostic 11 long noncoding RNA expression signature for breast invasive carcinoma. J. Cell. Biochem. 120(10), 16692–16702 (2019).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

This study was financially supported by Shahid Beheshti University of Medical Sciences.

Author information

Authors and Affiliations

Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Sepideh Dashti & Soudeh Ghafouri-Fard
Urogenital Stem Cell Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Mohammad Taheri

Contributions

S.D. and M.T. collected and analyzed the data. S.G.F. designed the study, wrote the draft and revised it. All authors contributed equally and are fully aware of the submission.

Corresponding author

Correspondence to Soudeh Ghafouri-Fard.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Supplementary Information 4.

Supplementary Information 5.

Supplementary Information 6.

Supplementary Information 7.

Supplementary Information 8.

Supplementary Information 9.

Supplementary Information 10.

Supplementary Information 11.

Supplementary Information 12.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Dashti, S., Taheri, M. & Ghafouri-Fard, S. An in-silico method leads to recognition of hub genes and crucial pathways in survival of patients with breast cancer. Sci Rep 10, 18770 (2020). https://doi.org/10.1038/s41598-020-76024-2

Received: 08 April 2020
Accepted: 22 October 2020
Published: 30 October 2020
DOI: https://doi.org/10.1038/s41598-020-76024-2
