Introduction

Lung cancer remains one of the deadliest cancers in the world, and 85% of them are non-small cell lung cancer (NSCLC)1. According to statistics, lung adenocarcinoma (LUAD) is a major subtype of NSCLC, accounting for approximately 40% of all patients2. Despite significant advances in the treatment of LUAD, the 5-year overall survival (OS) rate for LUAD is less than 20%3.

In recent years, with the rapid development of immunotherapy, especially the application of immune checkpoint inhibitors (ICIs), the prognosis of lung cancer patients has been improved significantly4,5. However, not all patients respond to ICIs, and a minority of patients do not benefit from ICIs6. In addition, it is worth noting that similar to chemotherapy or targeted therapy, a majority of patients who initially respond to ICI therapy and then develop primary or secondary resistance.

A growing number of studies suggest that tumor microenvironment (TME) plays an important role in cancer development and antitumor processes and may involve in immunotherapy resistance7. TME contains various immune cells, stromal cells and extracellular matrix molecules, of which tumor-infiltrating lymphocytes (TILs) are the main immune cells exerting anti-tumor activity8,9. TILs have been reported as predictive immune efficacy biomarkers for lung cancer10. However, TILs, especially CD8 + T cells, eventually developed a state of dysfunction known as T-cell exhaustion during long-term tumor fight11. The exhausted T cells have been demonstrated to exhibit overexpression of inhibitory receptors, metabolic dysregulation, epigenetic reprogramming and loss of effector functions12,13,14. And studies have proven that exhausted T cells can induce immune tolerance15. Immunotherapy depends on the activation of T cells to eliminate tumors. However, the emergence of T-cell exhaustion impedes it capacity of anti-tumor and facilitates immune escape. The formation of exhausted T cells is regulated by multiple factors, and a deep understanding of key genes that regulated T cell exhaustion might help to interpret the underlying mechanism.

Therefore, our study aimed to explored the crucial molecules of T cell exhaustion in LUAD, and attempted to construct a prognostic signature and nomogram model to predict prognosis and immunotherapy response for LUAD patients.

Methods

Publicly available datasets

We downloaded the normalized RNA sequencing data (transcripts per million, TPM) and raw counts data of 539 LUAD patients from the cancer genome atlas (TCGA) (https://portal.gdc.cancer.gov/) database. The normalized matrix files of GSE72094 cohort from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/), which had included 442 LUAD patients, used as an independent external cohort for risk model validation. The patients’ characteristics and other information of TCGA-LUAD and GSE72094 cohorts were shown in Supplementary file 1: Table S1. In addition, immunotherapy cohorts including GSE135222 and GSE91061 from GEO database were included for validation of immunotherapy response. Probe IDs were converted to gene symbols according to the platform annotation file. If multiple probes matched the same gene symbol, their median expression values were selected. The information of TCGA-LUAD and GEO datasets was summarized in Supplementary file 1: Table S2.

Weighted correction network analysis (WGCNA)

Weighted correction network analysis (WGCNA) was used to construct co-expression networks of scale-free distributions of genes based on correlations to gene expression, from which the most relevant modules to clinical features were identified16. In our study, we used the "pickSoftThreshold" method to choose an appropriate soft threshold parameter. Then, the adjacency was transformed into a topological overlap matrix (TOM) and genes with similar expression patterns were divided into the same module. Finally, genes from modules that had high correlation coefficients with T-cell exhaustion and dysfunction were identified for subsequent analysis.

Consensus clustering analysis

The “ConsensusClusterPlus” R package was used to perform the unsupervised consensus clustering analysis17. We used 80% of the items for subsampling and divided each subsample into groups by the k-means algorithm, repeating this clustering process 1000 times. Then, The cumulative distribution function (CDF) curve and consensus matrix are used to identify the optimal k-value. Based on the optimal k-value, patients were clustered into sub-clusters. In addition, the principal component analysis (PCA) and principal coordinate analysis (PCoA) were performed to show the distribution difference of sub-clusters.

Functional enrichment analysis

Enrichment analysis in Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) were based on the “Clusterprofiler” R package, where “adjusted p < 0.05” was considered significant18. The gene sets of “hallmark” from the Molecular Signatures Database (MSigDB) were analyzed through the gene set enrichment analysis software (v4.3.2). P.adjust < 0.05 and q values < 0.25 were considered statistically significant19,20.

Tumor immune microenvironment evaluation

The ESTIMATE algorithm tool was used to was used to calculate stromal, immune and estimated scores21. The CIBERSORTx (https://cibersort.stanford.edu/) was used to evaluate the composition of 22 kinds of tumor-infiltrating immune cells based on the principle of linear support vector regression22. Single sample gene set enrichment analysis (ssGSEA) based on the R package GSVA was used to quantify single sample immune cell infiltration scores. The immune cell abundance identifier (ImmuCellAI) (http://bioinfo.life.hust.edu.cn/ImmuCellAI) was utilized to obtain the scores of T cell exhaustion23. Moreover, we compared the mRNA expression levels of immune checkpoint suppressor molecules including programed cell death protein 1 (PD1), programmed death ligand 1 (PD-L1), cytotoxic T-lymphocyte associated protein 4 (CTLA4), lymphocyte activating 3 (LAG3), T cell immunoreceptor with Ig and ITIM domains (TIGIT), hepatitis A virus cellular receptor 2 (HAVCR2) and other immunosuppressive molecule in two clusters.

Development and verification of prognostic signature

Firstly, differential genes between cluster 1 and cluster 2 in TCGA cohort were analyzed with the “DESeq2” R package based on the mRNA expression of raw counts data24. The threshold of adjusted P value less than 0.05 (P < 0.05) and the absolute log2 fold-change greater than 1 (|log2FC|> 1) were define differential genes. Next, univariate cox regression analysis was utilized to screen differential genes that were associated with OS in TCGA and GSE72094 cohort using the “survival” R package (P < 0.05). Then, we performed machine learning algorithms including the least absolute shrinkage and selection operator (LASSO) regression analysis25 and random forest (RF) algorithm26 to improve the accuracy and reliability of prognostic signature using the “glmnet” and “randomForestSRC” R packages, respectively. Furthermore, we used the intersection of LASSO and the RF to determine candidate genes, followed by multivariate cox regression analysis. According to the results of the multivariate cox analysis, the prognostic signature was established as follows: Riskscore = \(\sum_{i=1}^{n}\left(Coef\times Ni\right)\), where Coef referred the coefficient of gene i and Ni represented the expression value of gene i.

To further validate the predictive power of the prognostic signature, we performed Kaplan–Meier survival analysis in the TCGA-LUAD and GSE72094 cohorts, where the cutoff value was set to the median riskscore. Time-dependent receiver operating characteristic curve (ROC) were plotted using the “timeROC” R package to predict OS at 1, 2 and 3 years in the training dataset (TCGA-LUAD) and validation cohort (GSE72094). Finally, we further analyzed the relationship between the prognostic signature and clinical characteristics (including age, gender and stage) to clarify the independent prognostic value of riskscore using multivariate Cox regression analyses.

Development and evaluation of the nomogram

We performed the univariate and multivariate Cox regression analyses for clinical parameters and riskscore. In the multivariate Cox model, variables with P < 0.05 were included in the construction of the nomogram. The calibration analysis and time-dependent ROC curves were used to assess the prognostic accuracy of the nomogram model. Decision curve analysis (DCA) curves used to evaluate the net benefit of the nomogram.

Immunotherapeutic response prediction

Tumor Immune Dysfunction and Exclusion (TIDE) was a computational method to predict ICIs response27. Based on transcriptomic data from TCGA cohort, T-cell dysfunction scores and TIDE scores were calculated through an online website (http://tide.dfci.harvard.edu). The Cancer Immunome Atlas (TCIA) database describes the immune landscape of 20 solid tumors and developed a novel score called “Immunophenscore” (IPS), which also predicted the response to ICIs. We calculated each patients’ IPS scores through an online website (https://tcia.at/)28. Tumor mutational load (TMB) was a valuable biomarker to assess the efficiency of immunotherapy29. We utilized the “TCGAbiolinks” R package to download somatic mutation data, and then calculated TMB values for each LUAD patient in the TCGA cohort based on the “Maftools” R package30. In addition, the mRNA expression of PD-1, PD-L1 and CTLA4 was analyzed in low or high-risk groups.

Chemotherapeutic drug sensitivity prediction

We utilized the “pRophetic” R package to estimate the half-maximal inhibitory concentrations (IC50) of drugs to predict the sensitivity of LUAD patients to chemotherapy and targeted therapies31.

Statistical analysis

All statistical analyses were performed using R software (version 4.2.2). Wilcoxon test was used to compare the differences between groups. The log-rank test was used to compare Kaplan–Meier survival curves. Univariate and multivariate Cox analyses were performed to establish independent prognostic factors. All P values were two-sided and less than 0.05 were considered statistically significant.

Results

Identification of Tex-related genes and functional analysis

WGCNA analysis was conducted according to the expression of LUAD mRNA in the TCGA cohort (Fig. 1A). We choose a soft threshold of 4, which met the scale-free network rule (Fig. 1B). Figure 1C–E showed that the purple module was the most significantly related to T-cell exhaustion (Tex) (cor = 0.5, P = 7.3e-25) and T cell dysfunction (cor = 0.75, P = 3.2e-68). Therefore, the purple module (n = 371) was characterized as Tex-related genes. GO and KEGG enrichment analysis revealed possible biological processes involved in Tex-related genes (Supplementary file 2: Fig. S1).

Figure 1
figure 1

WGCNA analysis in TCGA cohort. (A) The coexpression network. (B) The soft threshold power of WGCNA. The left one showed the analysis of the scale-free index for various soft threshold powers. The right one showed the analysis of the average connectivity for various soft threshold powers. (C) Heatmap displayed correlation between module eigengenes and T-cell exhaustion and T-cell dysfunction. (D) Identification of the modules most significantly associated with T-cell exhaustion. (E) Identification of modules most significantly associated with T-cell dysfunction.

Identification of Tex subgroups

Tex subgroups were identified by using an unsupervised clustering approach based on the expression levels of Tex related genes in TCGA. Based on the consensus CDF curve (Fig. 2A), the relative change in area under the CDF curve (Fig. 2B), and the consensus matrix (Fig. 2C), we finally selected k = 2 as the best cluster. Two different subgroups were identified, including 375 patients in cluster 1 and 155 patients in cluster 2. It is clear that patients in cluster 1 had a better OS than those in cluster 2 (P = 0.039, Fig. 2D). Two significantly different subgroups were further confirmed by the results of PCA as well as PCoA analysis (Fig. 2E–F). In addition, we compared the differences in clinical characteristics between cluster 1 and cluster 2. The results in Table 1 showed that patients in cluster 2 had a higher proportion of female and older patients (> 65 years), and there was no difference in the tumor stage between two subgroups.

Figure 2
figure 2

Unsupervised consensus cluster analysis for LUAD patients in the TCGA cohort based on T-cell exhaustion related genes. (A) Consensus CDF from k = 2–9. (B) Delta area under the cumulative distribution function (CDF) curve of different clusters ranging from k = 2–9. (C) Consensus matrix for k = 2. (D) The overall survival (OS) probability of the patients in the two clusters. (E) The principal component analysis (PCA) and principal coordinate analysis (PCoA) of two clusters.

Table 1 The clinical characteristics between cluster 1 and cluster 2.

Immunological landscape of Tex subtypes

First, we calculated the composition of immune cells between clsuter 1 and cluster 2 by the CIBERSORT algorithm (Fig. 3A). Comparing with cluster 1, cluster 2 had significantly more abundance of CD8 T cell and CD4 memory activated T cell. However, despite having higher CD4/CD8 T cell levels, patients in cluster 2 had worse prognosis than those in cluster 1. This was contradictory to the fact that high levels of tumor infiltrating CD4/CD8 T cells were generally associated with better prognosis. Therefore, we hypothesized that these CD4/CD8 T cells in cluster 2 were prone to exhaustion. Next, we compared the mRNA expression levels of immune checkpoint suppressor molecules in the two clusters. The results showed that PDCD1 (PD1), CD274 (PD-L1), CTLA4, LAG3, TIGIT, HAVCR2 and other immunosuppressive molecules were upregulated in cluster 2 (Fig. 3B). Then, we assessed the TME score using an estimation algorithm. The results displayed that cluster 2 had higher immune, stromal, and estimate scores than those in cluster 1 (Fig. 3C). In addition, we also found that T-cells exhaustion as well as T-cell dysfunction scores were significantly higher in cluster 2 than in cluster 1 (Fig. 3D–E). Finally, we performed GSEA analysis based on the Tex-related genes identified by WGCNA, and the results showed that cluster 2 was enriched in IL6/JAK/STAT3 signaling, interferon_gamma response, TNFA signaling via NFKB and inflammatory response (Fig. 4). Based on the above results, we inferred that cluster 2 mediated the suppressive immune microenvironment associated with T-cell exhaustion.

Figure 3
figure 3

Immune analysis of different clusters. (A) Differences in abundance of tumor-infiltrating immune cells in different clusters based on CIBERSORT. (B) Differential mRNA expression of immune checkpoints in different clusters. (C) The stromal, immune and estimate scores between two clusters. (D) Differences in T-cell exhasution score between different clusters. (E) Differences in T-cell dysfunction score between different clusters. (*p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001).

Figure 4
figure 4

Gene set enrichment analysis (GSEA) showed revealed enrichment for biological processes associated with immunosuppression.

Machine learning identifies hub genes and functional analysis

First of all, we performed differential analysis of mRNA expression in cluster1 and cluster 2 based on the TCGA cohort to identify differential genes (Fig. 5A). Then, univariate cox regression analysis was utilized to screen differential genes that were associated with OS in TCGA and GSE72094 cohort. Venn diagram showed a total of 35 genes were identified as Tex related prognostic differential genes (Fig. 5B). Next, top 15 genes were selected using RF (Fig. 5C) and 11 genes were selected using LASSO (Fig. 5D–E). Finally, 9 intersected genes (including CPS1, FOSL1, GJB3, HLA-DOB, IGF2BP1, IGFBP1, KLK11, KRT6A, KRT81) were identified as hub genes (Fig. 5F). GO and KEGG analysis showed that these nine hub genes were involved in response to cAMP, carbon–nitrogen ligase activity, MHC class II receptor activity, gap junction channel activity and so on (all adjusted P < 0.05, Supplementary file 2: Fig. S2).

Figure 5
figure 5

Machine learning identifies hub genes associated with involved in the regulation of T-cell exhaustion. (A) Volcano map shows differential genes between cluster1 and cluster 2 in the TCGA. (B) 35 intersected differential genes associated with OS in TCGA and GSE72094 cohort. (C) The importance of 35 genes using random forest (RF). (DE) 11 hub genes using the least absolute shrinkage and selection operator (LASSO) regression analysis. (F) 9 intersected hub genes based on 11 genes in LASSO and 15 top genes in RF.

Construction of prognostic signature

The hub genes were enrolled in multivariate cox analysis to construct the prognostic signature: riskscore = (CPS1 × 0.0376) + (FOSL1 × 0.0731) + (GJB3 × 0.0641)—(HLA-DOB × 0.1590) + (IGF2BP1 × 0.0732) + (IGFBP1 × 0.1072)—(KLK11 × 0.0445) + (KRT6A × 0.0460) + (KRT81 × 0.0439)—0.2897. Patients in the TCGA cohort were classified as low- or high-risk group when the cutoff value was defined as the median riskscore (Fig. 6A). The Kaplan–Meier survival curves showed showed that the median OS of the low-risk group was longer than that of the low-risk group (P < 0.001, Fig. 6B). The time-dependent ROC curve displayed that the risk score had a good predictive performance, with an area under the curve (AUC) of 0.718, 0.703, 0.692 at 1, 2, and 3 years, respectively (Fig. 6C). In addition, when the riskscore was included in multivariate cox analysis combined with age, gender, and stage, it still was demonstrated as an independent prognostic factor for OS (HR = 1.904; 95% CI 1.389–2.612; P < 0.001; Fig. 6D). To further confirmed the reliability of the riskscore, we used the same formula to calculate the riskscore for patients in the GSE72094 cohort and used it for external validation. Similar to the previous results (Fig. 6E–H), the riskscore was an independent prognostic factor (HR = 1.990; 95% CI: 1.321–2.997; P < 0.001) as well and had good predictive power with an AUC at 1-, 2-, 3-year of 0.692, 0.709, 0.715. Moreover, we further stratified Tex subgroups based on riskscore and found that cluster 2 with high riskscore had the worst prognosis (Supplementary file 2: Fig. S3).

Figure 6
figure 6

The establishment and validation of the riskscore in TCGA and GSE72094 cohort respectively. (A, E) Risk map for prognostic signature and heat map for hub genes expression. (B, F) Kaplan–Meier curves for the riskscore. (C, G) The prognostic signature predict time-dependent ROC curves at 1, 2, and 3 years OS. (D, H) Multivariate cox regression analysis to verify the independent predictive value of the riskscore.

Correlation between the riskscore and clinical features

We explored relationships between the riskscore and clinical features in TCGA cohort and found that gender and TNM stage were significant correlated with the riskscore. Male, T3-4 stage, more lymph node metastases, distant metastases, and advanced stage patients tended to have higher riskscores (Fig. 7).

Figure 7
figure 7

Correlation between the riskscore and clinical features. (A) age, (B) gender, (C) T stage, (D) N stage, (E) M stage, (F) pathological stage. (ns p > 0.05; *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001).

Construction and evaluation of nomogram model

The above results suggested that both tumor stage and risk score were independent prognostic factors. Thus, a nomogram was constructed to predict survival combined pathological stage with the riskscore (Fig. 8A). Calibration curves showed substantial agreement between the expected and actual probabilities of nomgram in predicting 1, 2, and 3-year survival (Fig. 8B). DCA curves showed the nomogram model had a greater net benefit in predicting the 3‐year OS (Fig. 8C). The time-dependent ROC curves showed a higher accuracy of the nomogram in predicting survival, with AUC values of 0.739, 0.716, 0.714 at 1, 2, and 3 years OS (Fig. 8D). Meanwhile, the model was validated using the GSE72094 cohort, and the results were similar to the results in TCGA cohort (Fig. 8E–G).

Figure 8
figure 8

Construction and evaluation of the nomogram. (A) The nomogram combined the riskscore with stage for prognostic prediction of a patient with LUAD in the TCGA cohort. (B, E) Calibration curves of 1-year, 2-year, and 3-year OS for LUAD patients in the TCGA cohort and GSE72094 cohort. (C, F) Decision curve analysis of 3-year survival benefit in the TCGA cohort and GSE72094 cohort. (D, G) Time-dependent receiver operating characteristic (ROC) curves of the nomogram to predict 1-year, 2-year, and 3-year OS in the TCGA cohort and GSE72094 cohort.

Immunotherapeutic response prediction

To assess ability of the riskscore as a biomarker for predicting immunotherapeutic response, we estimated the distribution of TME scores, IPS score, TIDE scores, TMB, immune cell infiltration scores and the mRNA expression of HLA and immune checkpoint inhibitors in different risk groups. The results of the TME score analysis indicated that the low-risk group had higher immune score (P < 0.001; Fig. 9A), stromal score (P = 0.05; Fig. 9B), and estimate score (P < 0.01; Fig. 9C). Further, TCIA database analysis revealed the low-risk group had higher IPS (P < 0.001; Fig. 9D). We also found that TIDE scores and TMB were higher in the high-risk group (both P < 0.001; Fig. 9E–F). In addition, our study results showed that the mRNA expression of PD-1 and PD-L1 were not significantly different between the two groups, except for CTLA4, which was significantly upregulated in the low-risk group. (Fig. 9G–I). We also observed that the low-risk group had significantly higher immune cell infiltration than the high-risk group, including activated CD8T cells, cytotoxic cells, dendritic cells, T helper cells, and so on (Supplementary file 2: Fig. S4). Lastly, we investigated the relationship between high- and low-risk groups and human leukocyte antigens (HLA). Except for HLA-A/B/C/F/G, the expression of HLA appeared to be higher in low-risk group (Supplementary file 2: Fig. S5).

Figure 9
figure 9

The distribution of immune score (A), stromal score (B), estimate score (C), IPS score (D), TIDE score (E), TMB (F), and the mRNA expression of immune checkpoint inhibitors (GI) in different risk groups.

To further validate that the riskscore could effectively predict immune efficacy response, we also included two immunotherapy cohorts to evaluate the riskscore model. GSE91061, which including 109 melanoma samples with anti-CTLA4 and anti-PD1 therapy demonstrated patients with complete response (CR) or partial response (PR) had a lower riskscore than those with stable disease (SD) or progressive disease (PD) (Fig. 10A). Moreover, GSE135222, which including 27 advanced NSCLC samples with anti-PD-1/PD-L1 confirmed that the low-risk group had better PFS compared to the high-risk group (Fig. 10B).

Figure 10
figure 10

(A) Differences in immunotherapy response between low- and high-risk groups in the GSE91061 dataset. (B) Progression-free survival for patient with anti-PD-1/PD-L1 therapy between the low- and high-risk groups in the GSE135222 dataset.

Chemotherapeutic drug sensitivity prediction

The IC50 values of common chemotherapy and targeted drugs for LUAD were calculated to further explored the drug sensitivity between low- and high-risk groups. By comparing the difference in IC50 values in two risk groups (Fig. 11), we found that patients in the low-risk group had a higher sensitivity to paclitaxel, docetaxel, doxorubicin, gefitinib, lapatinib, and tipifarnib while patients in high-risk group had a higher sensitivity to axitinib and methotrexate. Besides, patients in the high and low risk groups showed no significant difference in the sensitivity of cisplatin, etoposide, erlotinib and gefitinib.

Figure 11
figure 11

The IC50 values of chemotherapy and targeted drugs for LUAD in different risk groups. (*p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001).

Discussion

T-cell exhaustion is considered to be one of the most crucial reasons for immune resistance32. However, it is still not known what regulates T-cell exhaustion in the tumor microenvironment. Therefore, an in-depth exploration of key regulatory genes of T-cell exhaustion could help to understand the potential mechanisms of immune resistance and accordingly provide a theoretical basis for clinical decision making.

In the present study, we used WGCNA to identify the module of genes most related to T-cell exhaustion and T-cell dysfunction. Next, unsupervised clustering approach based on the expression levels of these genes divided patients into two different subgroups. Obviously, patients in Cluster 1 had better OS compared with patients in cluster 2. It was intriguing that patients in cluster 2 had higher abundance of CD8 T cell and CD4 memory activated T cell compared to group 1. This is contradictory to the fact that high levels of tumor-infiltrating CD4/CD8 T cells usually associated with good outcome33,34. Therefore, we hypothesized that these CD4/CD8 T cells in cluster 2 maybe exhausted T cells. Research had revealed that exhausted T cells often exhibit persistent high expression of multiple suppressive receptors, such as PD-1, CTLA-4, and LAG-313. In our study, we found that PD-1, CTLA4, LAG3, TIGIT, HAVCR2 and other immunosuppressive molecules were upregulated in cluster 2. In addition, GSEA analysis cluster 2 was enriched in IL6/JAK/STAT3 signaling, interferon_gamma response, TNF_alpha signaling via NFKB and inflammatory response. Studies had demonstrated that JAK/STAT signaling pathway35,36 the IFN-γsignaling pathway37 and TNF-ɑ signaling via NFKB38 caused immune escape by upregulating PD-L1 expression. Therefore, we inferred that cluster 2 might mediate the suppressive immune microenvironment.

Next, we further analyzed the significantly differentially expressed genes between the two groups and used machine learning algorithms to finally identify nine key regulatory genes, including CPS1, FOSL1, GJB3, IGF2BP1, HLA-DOB, IGFBP1, KRT6A, KLK11, KRT81, that may be involved in regulating T cell exhaustion. The enzyme carbamoyl phosphate synthetase 1 (CPS1) is a key rate-limiting enzyme in the urea cycle, involving in ammonium conversion and mediated arginine metabolism and pyrimidine metabolism39. T cell is highly sensitive to extracellular levels of arginine40, and arginine is essential for T cell function41,42. It has been shown that low arginine mediates an immunosuppressive microenvironment, which may suppress T-cell responses by providing a brake on T-cell proliferation43. In addition, CPS1 deficiency could lead to hyperammonemia, which impair mitochondrial function, reduce ATP synthesis, and increase free radical formation, leading to oxidative stress44, thereby possibly inducing T-cell exhaustion. FOSL1 encodes Fra-1, which is initially found to be highly expressed in solid tumors and is the member of the FOS family in activator protein (AP-1). Studies indicate that the AP-1 family members, including Fra-1, have essential effects in T cell development45. In melanoma, Fra-1 suppress the conversion of Treg cells into effector T cells under the regulation of Ubc13-IKK signaling axis46. In addition, Fra-1 is strongly associated with Epithelial-to-Mesenchymal Transition (EMT) in cancer cells46,47. It has been recently shown that EMT-related pathways could harm CD8 + T cell function, leading to immune evasion48. Moreover, a study from Lee et al. found a significant correlation between Fra-1 and PD-L1 expression, and high Fra-1 expression was associated with poorer overall survival49. Connexin 31 encoded by gap junction protein Beta 3 (GJB3), is one of the major members of the connexin family. Studies have indicated that connexin proteins mainly serve as channels to transport metabolites such as nucleotides, glutamate and glucose50. There is increasing evidence to support that metabolic changes in tumor cells affect the function of immune cells52. A study from Huo et al. reported that GJB3 promoted neutrophil survival and polarization by forming a channel between pancreatic tumor cells and neutrophils, transferring cyclic adenosine monophosphate (cAMP) from cancer to neutrophils53. Meanwhile, some studies suggested that cAMP could induce T-cell senescence54,55. Therefore, we hypothesized that GJB1 might be able to transfer cAMP to T cells in the same way which might cause T-cell exhaustion. The RNA modification N6-methyladenosine (m6A) has a significant value in the immune system56, and its dysregulation is associated with poor prognosis57. For example, a study showed that the m6A reader insulin-like growth factor-2 mRNA-binding protein 1 (IGF2BP1) could recognized the 30-UTR of PD-L1 mRNA and thus mediated stable PD-L1 expression58. The major histocompatibility complex, class II, DO beta (HLA-DOB) belonged to the HLA class II beta chain paralogue. It is demonstrated that HLA-II has antigen-presenting functions, participates in T-cell differentiation, and mediates the activation of T cells to provoke immune responses59. The insulin-like growth factor binding protein 1 (IGFBP1) has a vital action in regulating cell growth60. It had shown that ICFBP1 was expressed in T cells61. A research from Han et al. suggested that IGFBP1 had effects on CD4 + T cell immunomodulation62. Moreover, high expression of IGFBP1 is demonstrated to be closely associated with unfavorable OS in NSCLC63. Kallikrein related peptidase 11 (KLK11) is a member of the human KLK gene family, which is known to perform in a number of physiological processes, including extracellular matrix (ECM) remodeling, cell proliferation and differentiation64. Many studies have shown that KLK11 is aberrantly expressed in tumors and significantly correlated with survival65,66,67. The keratin 6A (KRT6A) and keratin 81 (KRT81) are the member of the keratin gene family. KRT6A gene overexpression in LUAD promotes lung cancer cell proliferation by EMT68. KRT81 has been identified as a promising biomarker for the identification of squamous cell lung cancer69.

The nine key genes described above may involved in T-cell exhaustion. Although the specific mechanisms have not yet been reported, we constructed a prognostic signature based on these genes that was confirmed to have some clinical significance. In the TCGA cohort, the prognostic signature was associated with OS, and patients in the low-risk group had a good prognosis. Then we performed external validation using the GSE72094 cohort to further confirm the general applicability of the prognostic signature. Moreover, we found that cluster 2 with high risk scores had the worst prognosis.

Immunotherapy had improved prognosis in lung cancer, but the benefit was limited. Only a minority of patients did benefit from ICIs6. One of the more widely utilized immune efficacy markers in clinical practice was PD-L1 expression. Numerous studies demonstrated that the expression level of PD-L1 is closely related to the efficacy of immunotherapy70,71. However, PD-L1 was not completely accurate in predicting immune efficacy. Hence, it was necessary to discover a signature that could predict the efficacy of immunotherapy. In our study, the prognostic signature we constructed may be useful to predict the immune response. Patients in the low-risk group had higher immune, stromal, and estimate scores, higher IPS, lower TIDE score which suggested a better response to immune checkpoint inhibitors (ICIs) therapy in the low-risk group. In the meantime, we included two separate immunotherapy cohorts, GSE135222 and GSE91061, which demonstrated better immune efficacy in the low-risk group. Besides, chemotherapy and targeted therapy were both the major strategies for the treatment of advanced LUAD. We found differences in the sensitivity of chemotherapeutic and targeted agents between high and low risk groups, suggesting that this prognostic signature might provide assistance in the selection of clinically sensitive agents for LUAD. Finally, the nomogram model combined with tumor stage and the riskscore could effectively predict the prognosis of LUAD patients.

Although our study had been successfully validated in an external cohort. However, there are several limitations in our study. Firstly, the study was based on publicly available databases and its training cohort (TCGA) and validation cohort (GEO) were retrospective. Therefore prospective studies are necessary to validate our conclusions. Secondly, there was a lack of molecular mechanism studies to investigate the functional role of candidate key genes, and in the future we will conduct further in vitro and in vivo experiments to confirm the potential regulatory mechanisms. Thirdly, in lack of a large cohort of lung cancer immunotherapy patients, our findings still need to be further validated by future prospective studies with larger samples.

Conclusions

In summary, we identified nine key genes (including CPS1, FOSL1, GJB3, HLA-DOB, IGF2BP1, IGFBP1, KLK11, KRT6A, KRT81) that may involved in the regulation of T-cell exhaustion and constructed a riskscore that could help predict immunotherapy response and the selection of chemotherapeutic and targeted agents. In addition, the nomogram built in combination with tumor stage and the riskscore may be a powerful tool for LUAD survival prediction.