Four-copy number alteration (CNA)-related lncRNA prognostic signature for liver cancer

Cheng, Zhenyun; Guo, Yan; Sun, Jingjing; Zheng, Lei

doi:10.1038/s41598-022-17927-0

Download PDF

Article
Open access
Published: 22 August 2022

Four-copy number alteration (CNA)-related lncRNA prognostic signature for liver cancer

Zhenyun Cheng^1,2,
Yan Guo^1,2,
Jingjing Sun^1,2 &
…
Lei Zheng^1,2

Scientific Reports volume 12, Article number: 14261 (2022) Cite this article

1198 Accesses
2 Citations
1 Altmetric
Metrics details

Subjects

Abstract

The objective of this study was to identify CNA-related lncRNAs that can better evaluate the prognosis of patients with liver cancer. Prognostic molecular subtypes were identified, followed by tumor mutation and differential expression analyses. Genomic copy number anomalies and their association with lncRNAs were also evaluated. A risk model was built based on lncRNAs, as well as a nomogram, and the differences in the tumor immune microenvironment and drug sensitivity between the High_ and Low_risk groups were compared. Weighted gene co-expression network analysis was used to identify modules with significant enrichment in prognostic-related lncRNAs. In total, two subtypes were identified, TP53 and CTNNB1 were common high-frequency mutated genes in the two subtypes. A total of 8,372 differentially expressed (DE) mRNAs and 798 DElncRNAs were identified between cluster1 and cluster2. In addition, a four-lncRNA signature was constructed, and statistically significant differences between the Low_ and High_risk groups were found in terms of CD8 T cells, resting memory CD4 T cells, etc. Enrichment analysis showed that prognostic-related lncRNAs were involved in the cell cycle, p53 signaling pathway, non-alcoholic fatty liver disease, etc. A prognostic prediction signature, based on four-CNA-related lncRNAs, could contribute to a more accurate prognosis of patients with liver cancer.

Investigation of Hippo pathway-related prognostic lncRNAs and molecular subtypes in liver hepatocellular carcinoma

Article Open access 20 March 2023

Network of clinically-relevant lncRNAs-mRNAs associated with prognosis of hepatocellular carcinoma patients

Article Open access 07 July 2020

A disulfidptosis-related lncRNAs signature in hepatocellular carcinoma: prognostic prediction, tumor immune microenvironment and drug susceptibility

Article Open access 07 January 2024

Introduction

Liver cancer is the most prevalent primary malignancy of the liver and the fourth leading form of life-threatening cancer worldwide¹. Early screening and diagnosis of liver cancer, such as imaging examination and serological indicators, have been widely used and have greatly improved in recent years^2,3. However, the early diagnosis rate of liver cancer is rather low; only 30–40% of patients are diagnosed at an early stage⁴. At present, the most effective way to treat liver cancer is radical tumor resection; nevertheless, the survival rate of patients remains poor, the 5-year survival rate being only 18%⁵. Therefore, new prognostic biomarkers are urgently needed to promote the treatment and accurate diagnosis of patients with liver cancer.

Long noncoding RNAs (lncRNAs) are RNAs that are greater than 200 nucleotides in length and lack protein-coding ability⁶. Although lncRNAs are one of the least understood classes of molecules, recent studies have shown that they are involved in a wide range of biological processes and are associated with many diseases, such as autoimmune thyroid diseases, cancer, and cardiovascular diseases^7,8,9. There is evidence that abnormally expressed lncRNAs are associated with the progression of cancers¹⁰, including liver cancer. Wang et al. found that by activating the Wnt signaling pathway, lnctcf7 improved the self-renewal of human hepatoma stem cells¹¹. Xin et al. suggested that lncRNA HULC inhibits PTEN and accelerates liver cancer through autophagy cooperation with miR15a¹². Fu et al. illustrated that lncRNA PURPL accelerates cell proliferation in liver cancer through the regulation of p53¹³.

Copy number alteration (CNA) is a significant cause of genetic variation¹⁴ and defines as somatic copy number changes, which has been reported to be strongly associated with morbid consequences, such as developmental disorders and cancer¹⁵. There is evidence that CNA has important functions in the pathogenesis of numerous tumors^16,17. The gain or loss of the tumor genome copy number is closely related to differential gene expression, particularly for oncogenes and tumor suppressor genes¹⁵. Numerous studies have reported the association between lncRNAs and CNAs in cancers. For instance, Zhong et al. have identified CNA-related lncRNAs that can better predict cervical cancer prognosis¹⁸, Athie et al. shown that the lncRNA ALAL-1 could be used as a regulator of lung cancer immune evasion via CNA analysis¹⁹, and Zhong et al. revealed the prognosis-related lncRNAs by analyzing the expression profiles of lncRNAs and CNAs in bladder cancer²⁰, however, few studies have explored the regulatory relationships between lncRNAs and CNAs in liver cancer, and the CNA-related lncRNA prognostic model in liver cancer is largely unknown.

Accordingly, the goal of the present study was to analyze the regulatory relationships between lncRNAs and CNAs in liver cancer. This was achieved by screening CNA-related lncRNAs that can evaluate liver cancer prognosis based on CNA, methylation, and gene expression data (a schematic of the study design is shown in Fig. 1). The results of this study offer predictive biomarkers for liver cancer.

Results

Identification of prognostic molecular subtype

A total of 6060 mRNAs, 3966 methylation genes, and 4961 CNA regions with significant prognostic associations were obtained. In addition, two subtypes, including cluster1 (n = 101) and cluster2 (n = 234), were identified using iClusterPlus (Table 1). Cluster2 had the most favorable prognosis (Fig. 2A). PCA results showed the mRNA expression pattern and methylation pattern in the two subtypes were different (Fig. 2B,C). Moreover, based on the methylation level values of prognostic related methylated genes in each sample, the hierarchical clustering analysis was conducted. Hierarchical clustering analysis results revealed that the samples were divided into three groups, and total 116, 218, one samples were included in 1, 2, 3 groups, respectively, and the two identified clusters were tending to cluster together (Fig. 2D). Moreover, to further observe the tendentiousness of the identified clusters contained in each group in the hierarchical clustering, the sample distribution proportion diagram of each cluster in group 1 and 2 was drawn (Fig. 2E), the results shown that group 2 was mainly included cluster2, and group 1 was mainly contained cluster1, and the methylation pattern in the two groups were found different using chi square test (P < 0.05). In addition, the frequency of 102 mutated genes showed significant differences between two subtypes (Fig. 3A; Table 2), and there were more mutated genes in cluster1. Among the 102 mutated genes, the top10 high-frequency mutated genes in subtypes are shown in Fig. 3B,C, and TP53 and CTNNB1 were common high-frequency mutated genes in the two subtypes.

Table 1 P value of the corresponding survival difference under different cluster numbers.

Full size table

Table 2 Mutation distribution of high-frequency mutation genes in the two molecular subtypes.

Full size table

Identification of differentially expressed (DE) mRNAs and DElncRNAs in two subtypes

Applying the screening criteria of P < 0.05, and |log₂FC|> 0.263, a total of 8,372 DEmRNAs (7,123 up-regulated and 1249 down-regulated) and 798 DElncRNAs (577 up- and 221 down-regulated) were identified between cluster1 and cluster2 (Supplementary Fig. 1A,B).

LncRNAs abnormal expressions related to CNAs

The variant frequency of lncRNAs in the samples was calculated to evaluate the association between CNAs and lncRNA expression. The frequency of copy number gains and losses of lncRNAs on each chromosome varied (Fig. 4A); for example, numerous copies of chromosomes 4, 8, 9, and 17 were deficient, whereas there were a greater number of copies of chromosomes 5, 6, and 7. In addition, based on the expression profile and CNA profile of 1,358 lncRNAs, the correlation distribution between the copy number and lncRNA expression profile showed an overall trend of positive association (Fig. 4B). Numerous regions with lncRNA copy number gain and loss were revealed (Fig. 4C), indicating that the abnormal lncRNAs copy number might be associated with the progression of liver cancer. In addition, as shown in the heatmap (Fig. 4D), the variant ratio of lncRNAs in cluster1 increased compared to that in cluster2, and the Chi square test results shown that among the 1,358 lncRNAs, total 1,238 lncRNAs had significant difference on CNA between cluster1 and cluster2 (Supplementary Table 1). Next, a total of 52 lncRNAs with CNA frequency > 75% in samples were screened, and the differences in the expression of these 52 lncRNAs with copy number gain, copy number loss, and normal copy number were evaluated using Kruskal–Wallis test. The results showed that the expression of most lncRNAs were significantly difference among the three groups (P < 0.05; Supplementary Fig. 2 and Supplementary Table 2), suggesting that lncRNA abnormal expressions are associated with CNAs.

Establishment of a lncRNA signature

A total of 34 lncRNAs were screened as candidate lncRNAs, as described in the Methods section. Univariate Cox regression analysis was conducted, and a total of 12 prognostic-related lncRNAs were identified (Table 3). LASSO Cox regression analysis was performed, and four lncRNAs were utilized to build the signature (Fig. 5A), containing LOC339803, F11-AS1, PCAT2, and TMEM220-AS1. The prognostic capacity of the lncRNA signature was also evaluated in training, testing, validation sets, and the patients in the High_risk group had a poorer prognosis than those in the Low_risk group (Fig. 5B–D). The AUCs at 1-, 3- and 5-year survival time were all approximately 0.7 (Fig. 5B–D). In addition, patients with high expression of F11-AS1 and TMEM220-AS1 had a favorable prognosis, whereas high expression of LOC339803 and PCAT2 indicated poor prognosis (Fig. 6).

Table 3 Univariate Cox regression results.

Full size table

Construction of the nomogram

After the univariate Cox regression analysis was carried out, RiskGroup, AJCC_PATHOLOGIC_TUMOR_STAGE, NEW_TUMOR_EVENT_AFTER_INITIAL_TREATMENT were identified with P < 0.05 (Fig. 7A), and these characteristics were used to build a nomogram (Fig. 7B). The calibration curves were matched to actual 1-, 3-, and 5-year survival (Fig. 7C).

Clinical characteristics

The distribution of each clinical characteristic in the High_ and Low_risk groups was statistically explored, and the results showed significant differences in Pathologic-T, Pathologic-stage, Grade, Vascular invasion between the Low_ and High_risk groups (Table 4). In addition, the High_risk group presented more cluster1 samples (Supplementary Fig. 3), which might explain the poor prognosis of patients in the High_risk group.

Table 4 Clinical features of the dataset.

Full size table

Tumor immune microenvironment

As shown in Fig. 8A, ten immune cells, including memory B cells, regulatory T cells (Tregs), and M0 macrophages, showed obvious differences between the Low_ and High_risk groups. There were also statistically significant differences in immune score, estimate score, and tumor purity between the Low_ and High_risk groups (Fig. 8B).

Drug sensitivity prediction

The IC50 of 138 drugs was quantified, and the differences between High_ and Low_risk groups were compared. In the case of 30 of these drugs (including Erlotinib, Lapatinib, and Gefitinib), a significant difference in IC50 was found between the two groups (Supplementary Fig. 4; Table 5), suggesting that the High_risk group may be more resistant to these drugs.

Table 5 The IC50 of 30 drugs.

Full size table

Identification of enriched lncRNA modules

The WGCNA package was employed to build a scale-free co-expression network, and the soft threshold power for matrix transformation was analyzed with the square of the related coefficient between log2k and log2p (k) being 0.85, and the power = 10 (Fig. 9A). For each module, the minimum number of genes was set to 30, and the similarity was greater than 0.1. These modules were clustered, and the modules with correlation coefficients greater than 0.8 were merged, yielding a total of seven modules (Fig. 9B). The two lncRNAs were clustered into a gray module, and the other two lncRNAs, TMEM220-AS1 (blue module) and F11-AS1 (brown module), were further analyzed. The enrichment analysis showed that the blue module was enriched in 487 GO-BP terms and 19 KEGG pathways (including cell cycle, p53 signaling pathway, DNA replication), and the brown module was enriched in 168 GO-BP terms and 15 KEGG pathways (including non-alcoholic fatty liver disease, fatty acid degradation, etc.) (Fig. 9C,D).

Discussion

CNAs have important functions in tumor progression²¹. In the present study, iClusterPlus was utilized for cluster analysis based on mRNA expression, methylation, CNA data, and iClusterplus, which can decrease the dimension of a dataset without altering the sample size. The results showed that two subtypes, cluster1 and cluster2, were identified. Cluster2 had the most favorable prognosis, and the CNA frequency of lncRNAs in cluster1 was higher than that in cluster2, which suggests that CNA-related lncRNAs were correlated with the prognosis of patients with liver cancer. Moreover, TP53 and CTNNB1 were common high-frequency mutated genes in both subtypes. In cancer, TP53 is the most frequently mutated gene, and more than 50% of human tumors carry TP53 gene mutations, including liver cancer^22,23,24. Mutations in CTNNB1 have been implicated in the pathogenesis of liver cancer²⁵. These results indicate that subtype classification might help evaluate the prognosis of patients with liver cancer and have specific regulatory relationships at the level of transcription, genome, and epigenome.

Zheng et al. aimed to screen prognostic biomarkers of lncRNA associated with CNA in ovarian cancer²⁶, however, prognostic biomarkers of four lncRNAs associated with CNA were screened after LASSO Cox regression analysis in this study, containing LOC339803, F11-AS1, PCAT2, and TMEM220-AS1, and these four lncRNAs were further used to build the CNA-related lncRNA prognostic model for liver cancer. The patients in High_risk group had a poorer prognosis than patients in Low_risk group in all sets. The AUCs at 1-, 3-, and 5-year survival times in all sets were all approximately 0.7, suggesting that the performance of the lncRNA signature was reliable. In addition, patients with high expression of F11-AS1 and TMEM220-AS1 had a favorable prognosis, whereas high expression of LOC339803 and PCAT2 was associated with poor prognosis. Du et al. found that lncRNA F11-AS1 regulates PTEN expression by competitive binding with miR-3146 and inhibits the progression of liver hepatocellular carcinoma, and F11-AS1 may be used as a therapeutic target for liver hepatocellular carcinoma²⁷. Cao et al. revealed that TMEM220-AS1 inhibits hepatocellular carcinoma by regulating the miR-484/MAGI1 axis²⁸. Xue et al. documented that lncRNA LOC339803 facilitates the invasion and migration of hepatocellular carcinoma cells by acting as a ceRNA of miR-30a-5p²⁹. Han et al. implied that PCAT2 plays a vital role in prostate cancer³⁰. Our results are consistent with those reported above. However, few studies have reported PCAT2 expression in liver cancer. In addition, enrichment analysis showed that TMEM220-AS1 is involved in the cell cycle, DNA replication pathways, p53 signaling pathway, etc., and F11-AS1 is involved in non-alcoholic fatty liver disease, fatty acid degradation pathways, etc. The cell cycle is a complex process that is regulated by a variety of proteins at multiple levels, and the cell cycle pathway plays a crucial role in tumorigenesis³¹. Studies have reported that the p53 signaling pathway plays a vital role in the regulation of tumor progression^32,33,34.

DNA replication is a basic biological process, in this process, disorder can lead to genomic instability, which is a hallmark of cancer³⁵. Non-alcoholic fatty liver disease can develop into cirrhosis via fibrosis and can be complicated by hepatocellular carcinoma^36,37. As fatty acids are essential for cancer cell proliferation, fatty acid degradation could provide a therapeutic strategy³⁸. Thus, LOC339803, F11-AS1, PCAT2, and TMEM220-AS1 might have vital functions in the pathogenesis of liver cancer and could be used as prognostic markers for the cancer.

Because changes in the immune microenvironment have a profound effect on the progression of liver cancer³⁹, the immune microenvironment changes were analyzed using the ESTIMATE algorithm and CIBERSORT. The results revealed that ten immune cells, and the immune, estimate scores, and tumor purity had obvious differences between Low- and High_risk groups. It has been reported that in hepatocellular carcinoma, regulatory T cells (Tregs) and exhausted CD8 T cells are increased and may clonally expand⁴⁰. In patients with liver cancer, tumor-associated macrophages (TAMs) are regularly increased through immunohistochemical staining⁴¹. Rohr-Udilova et al. found that resting mast cells in hepatocellular carcinoma were increased when compared to healthy livers⁴². In addition, high immune and estimate scores are correlated with clinicopathological characteristics and poor prognosis in cancer⁴³. In addition, statistically significant differences in IC50 of 30 drugs were found when Low- and High_risk groups were compared, among these drugs were Erlotinib, Lapatinib, Gefitinib, etc. These results revealed that these CNA-related lncRNA signatures might better predict the survival of patients with liver cancer, and these ten immune cells are related to the progression of liver cancer.

However, this study had some limitations. First, the data analyzed were downloaded from public databases, and external validation was required to show the utility of lncRNAs related signatures. Second, the fold-change was not calculated in the drug sensitivity prediction anlaysis dut to no quantized value that can represent resistant or sensitive, and further research should be conducted. Besides, the lncRNAs from which short peptide are transcribed should be considered in this study. In addition, the immune cell proportion was estimated using only the CIBERSORT algorithm, further relevant experiments should be carried out to verify this. Moreover, that’s would be better if there were more relevant experiments to validate the biomarkers and pathways identified in this study.

Conclusion

In summary, a CNA-related lncRNA prognostic signature, which is closely correlated with the immune microenvironment, was constructed in this study. This signature is likely to improve the accuracy of liver cancer prognosis and provide insights into predictive biomarkers or potential targets for patients with liver cancer.

Materials and methods

Data collection and processing

The gene expression RNA-seq (log2(fpkm + 1)) data of GDC TCGA LIHC were downloaded from the UCSC Xene platform (https://xenabrowser.net/)⁴⁴, and the genes with expression less than 1 in more than half of the samples were filtered out. Those genes with “protein_coding” annotation (based on the downloaded gene annotation file in GENCODE V22 version) were reserved as mRNA, and the genes with “antisense,” “sense_intronic,” and “lincRNA”, etc., annotation information were reserved as lncRNA. In addition, the CNA (cna_hg19.seg; https://cbioportal-datahub.s3.amazonaws.com/lihc_tcga.tar.gz; Affymetrix SNP 6.0 array), 450k methylation (gene level methylation values, and the probe with the most obvious negative correlation with the gene was selected as the methylation value of the gene, so that each gene has a methylation level value), and clinical and survival information in the TCGA database were obtained from the cBioportal website (http://cbioportal.org)⁴⁵. The samples corresponding to RNA-seq, CNA, 450K methylation, and clinical survival information (OS and OS.time) were matched one by one, and the samples with these data were retained. As a result of these screenings, a total of 335 samples meeting the requirements were obtained; the clinical features of these samples are shown in Table 4.

Screening of prognostic molecular subtype

Based on the mRNA expression, 450k methylation, CNA, and survival information of the 335 samples, the FsbyCox function in the CancerSubtypes package⁴⁶ was employed to perform the univariate Cox regression analysis, and the prognostic-related characteristics of mRNA, methylation gene, and CNA region were acquired with the cutoff value of P < 0.05. Cluster analysis was then carried out using iClusterPlus⁴⁷ in R software package, and the parameter was set as K = 1:6 in order to select the best number of clusters. Combined with the sample survival information, the log-rank test was conducted, and the classification results with the lowest P value were selected to determine the molecular subtypes. To verify the classification results, principal component analysis (PCA) and hierarchical clustering analysis were performed.

Tumor mutation analysis

The somatic mutation file processed using Mutect software was obtained from the TCGA database⁴⁸. The oncoplot function in maftools⁴⁹ R package was employed to draw the waterfall of the top10 mutated genes with a high mutation frequency. The mutation frequency of each gene in different subtypes was calculated, and the differences were compared using the Chi-square test.

Differential expression analysis

The linear regression and empirical Bayesian methods offered in the limma package⁵⁰ in R software were utilized to conduct differential expression analysis, and the P values were adjusted using the Benjamini & Hochberg method for multiple comparisons. The DEmRNA and DElncRNA were screened with a cutoff value of P < 0.05 and |log₂FC|> 0.263 owing to acquiring more DElncRNA for subsequent analysis.

Genomic copy number anomalies and their association to lncRNAs

The GISTIC 2.0 tool⁵¹ was used to define CNA extracted from the TCGA-LIHC dataset with a cutoff value of gain /loss threshold > 0.1 and Q < 0.25 (When using GISTIC2 to the detect significantly gain or loss genomic regions in a group of samples, the integration of all results of gistic can be obtained, including gain and loss regions, and the samples of gain or loss in each region and the Q value in the peak region are acquired). Copy numbers ≥ 1 or ≤ −1 were considered gain and loss, respectively. The variant frequency of lncRNAs in samples was calculated, and the copy number gain and loss distribution of lncRNA in the genome were analyzed using the Rcircos tool⁵². Samples with lncRNA expression profiles were chosen, and Pearson correlation coefficients between CNA and lncRNA expression were analyzed. In addition, the differences of the expression of lncRNA with CNA frequency > 75% between normal, copy loss, and gain samples were compared using Kruskal–Wallis test.

Screening lncRNA prognostic markers with CNA and construction of lncRNA signature

First, the lncRNAs that met the following criteria were considered as candidate lncRNAs: CNA frequency > 5%, a significant positive correlation between CNA and expression (correlation coefficient > 0.3 and P < 0.05), and DE between different subtypes. Univariate Cox regression analysis was then performed to identify prognostic-related lncRNAs using the Survminer package (P < 0.05)⁵³. In addition, the samples in the TCGA database were categorized into training set (n = 234) and testing set (n = 101) based on 7:3, and the whole sample dataset was used as the validation set (n = 335). In the training set, the LASSO Cox regression analysis was performed using the glmnet package⁵⁴, and a 20-fold cross-validation was utilized to build the lncRNA signature. The risk score was analyzed using the following formula: Risk score = β_lncRNA1 × expr_lncRNA1 + β_lncRNA2 × expr_lncRNA2 + … + β_lncRNAn × expr_lncRNAn (β represents the regression coefficient of lncRNAs, and expr represents the lncRNA expression level). The samples were then categorized into High_ and Low_risk groups based on the median risk score. Kaplan–Meier survival curve analysis was then conducted. In addition, the model was validated in the testing and validation sets. To verify the prognostic performance of the lncRNA signature, receiver operating characteristic (ROC) analysis was carried out in three sets using the survivalROC package⁵⁵.

Development of the nomogram

Nomogram is a method to display the results of the signature intuitively and effectively, and is conveniently applied in the prediction of the outcome. It uses the length of the line to represent the different variables, thereby exhibiting the effect of different variable values on the outcome. To test whether the Riskscore model was an independent prognostic factor, univariate and multivariate Cox regression analyses were carried out on RiskGroup and clinical characteristics, including AGE, AJCC_PATHOLOGIC_TUMOR_STAGE, and GRADE, and the characteristics with P < 0.05, were used to build a nomogram. The nomogram was validated by assessing the discrimination and calibration. To be clear, the calibration curve of the nomogram was plotted to observe the nomogram prediction probabilities against the observed rates.