Introduction

T cells play an essential role in antitumor immunity by directly killing tumor cells. Recent progress in TCR-T and chimeric antigen receptor-T (CAR-T) cell therapies has demonstrated that targeting tumor antigens with T cells is an effective approach to cancer treatment.1,2,3,4,5,6 Because tumor-specific antigens (TSAs) derived from random cancer mutations are rarely shared among patients, TCRs recognizing tumor-associated antigens (TAAs) such as NY-ESO-1, GP100, and MAGE are commonly used in TCR-T cell therapies.1,2,3,7,8 Owing to the limited range of cancer types expressing these TAAs and the restriction to specific human leukocyte antigen (HLA) types, only a small group of patients can benefit from these therapies. The heterogeneous expression of TAAs within a tumor further limits the therapeutic effect.4,9 Moreover, expression of the target antigens in normal cells may cause on-target, off-tumor toxicity.10,11,12 To overcome these barriers, polyclonal TCRs that target multiple tumor antigens with improved tumor specificity are needed.

The host immune system uses tumor antigens to activate T cells that specifically attack tumors.13 These naturally selected tumor antigen-specific T (Tas) cells are enriched but suppressed in the tumor microenvironment.14,15 The high response rate and low toxicity of tumor-infiltrating lymphocyte (TIL) therapy indicate that naturally occurring Tas cells are the most suitable immune cells for cancer treatment.16,17,18 Nevertheless, extracting and expanding TILs from tumor tissue remains challenging. Sufficient TILs for treatment can only be produced from samples with high TIL levels, yet, owing to the suppressive microenvironment, TILs are typically present in low numbers within a tumor.19 The majority of cancer patients are therefore not eligible for TIL therapy. A feasible approach to overcome this limitation is to clone TCRs from Tas cells and use them to generate personalized TCR-T cells, which could yield large numbers of T cells expressing polyclonal TCRs that recognize both TSAs and TAAs without prior knowledge of the antigen sequences. In addition, the natural selection of self TCRs could reduce the risk of on-target, off-tumor toxicity of these TCR-T cells in the same patient. However, previous studies have shown that only a small fraction of TCRs in the TIL repertoire are tumor reactive.13 Current methods of Tas cell identification rely on inefficient in vitro co-culture of T cells with tumor antigens or autologous tumor cells,20 which largely impedes their clinical application. Sorting Tas cells from patient tumor samples by FACS may be an effective approach to obtain tumor antigen-specific TCRs for personalized TCR-T cell therapy. However, Tas cells are not well defined, and specific biomarkers distinguishing them from other tumor-infiltrating T cell populations are lacking.

The therapeutic efficacy of immune checkpoint blockade (ICB) also depends on Tas cell-mediated tumor destruction.21 Moreover, intratumoral accumulation of Tas cells occurs at a late stage of the antitumor immune cycle, implying that earlier steps such as antigen presentation and recognition have been completed.22 Biomarkers that evaluate Tas cell levels in tumors may therefore predict response to ICB more precisely than markers of early steps in the immune cycle, such as tumor mutation burden (TMB).

Owing to continuous tumor antigen stimulation in the tumor microenvironment, Tas cells may display gene expression and TCR expansion features distinct from those of bystander T cells in a tumor. We hypothesized that single-cell transcriptomic analysis and TCR sequencing, combined with neoantigen stimulation, could identify Tas cells as well as their biomarkers.

Results

In vitro tumor antigen stimulation system

We established an in vitro tumor antigen stimulation system to investigate the antigen specificity of TCRs from sequenced T cells (Fig. 1a). Briefly, TCR expression constructs were built and transduced into T cells from peripheral blood mononuclear cells (PBMCs) to establish TCR-T cells. Potential neoantigen epitopes derived from tumor mutations were identified by whole-exome sequencing (WES) and bulk RNA sequencing (RNA-seq) of tumor and peri-tumor tissues. B lymphoblastoid cells generated from autologous B cells by Epstein-Barr virus (EBV) infection were used as antigen-presenting cells (APCs). In vitro tumor antigen stimulation was performed by co-culturing TCR-T cells with APCs transduced with tandem minigenes encoding tumor-specific mutation sequences. The responses of the tested TCRs were evaluated by interferon γ (IFNγ) production. In parallel, tumors from the sequenced patients were transplanted into immunodeficient NOD-SCID IL-2 receptor gamma null (NSG) mice to establish patient-derived xenograft (PDX) models for therapeutic experiments with TCR-T cells.

Fig. 1: ScRNA-seq identified tumor-enriched T cell clusters.

a Scheme of overall study design. b t-SNE projection of the expanded CD4+ and CD8+ T cells. Cells are color-coded by cluster. c Mean expression of genes associated with T cell subtypes in each cluster. d Cells from tumor, peri-tumor, normal and blood tissues (left to right panels) are highlighted in blue in the t-SNE plot of the expanded CD4+ T cells. e Tissue prevalence estimated by Ro/e score in expanded CD4+ T cells. f Cells from tumor, peri-tumor, normal and blood tissues (left to right panels) are highlighted in blue in the t-SNE plot of the expanded CD8+ T cells. g Tissue prevalence estimated by Ro/e score in expanded CD8+ T cells.

ScRNA-seq and TCR-seq identify T cell clusters enriched in tumor

CD45+CD3+ T cells were sorted from tumor, peri-tumor and normal tissues and from peripheral blood of 5 treatment-naïve lung cancer patients, which allowed us to profile and trace T cell clones across a tumor and its adjacent tissues. Using the 10x Genomics platform, scRNA-seq and TCR-seq were performed on T cells from the 20 samples of the 5 patients (Supplementary information, Table S1).

After filtering out low-quality cells and removing doublets, 178 million RNA transcripts were obtained from 42,511 T cells with complementarity-determining region 3 (CDR3) sequences (Supplementary information, Fig. S1a). A total of 22,592 unique TCRs were identified by matching CDR3 regions. Some T cells contained TCRs with two α and/or two β chains (Supplementary information, Table S2). Because the same dual-chain TCR was detected in multiple T cells, these cells were unlikely to be doublets. Four TCR forms, 1α1β, 1α2β, 2α1β, and 2α2β, were therefore considered in the analysis. After removing the batch effect using canonical correlation analysis (CCA) implemented in Seurat,23 we grouped T cells into CD4+ and CD8+ T cells. The t-distributed stochastic neighbor embedding (t-SNE) projection indicated that the batch effect was removed successfully (Supplementary information, Fig. S1b).
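The TCR bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each cell has already been reduced to lists of productive α- and β-chain CDR3 amino-acid sequences (the `Cell` record, its fields and the CDR3 strings are hypothetical), defines a TCR as the exact set of CDR3s within one patient, and labels each cell with one of the four forms, dropping cells outside them (also an assumption).

```python
from collections import Counter, namedtuple

# Illustrative sketch; the record type, field names and sequences are hypothetical.
Cell = namedtuple("Cell", ["barcode", "patient", "alpha_cdr3s", "beta_cdr3s"])


def clonotype_key(cell):
    """Identify a TCR by patient plus the exact (order-independent) set of α and β CDR3s."""
    return (cell.patient,
            tuple(sorted(cell.alpha_cdr3s)),
            tuple(sorted(cell.beta_cdr3s)))


def tcr_form(cell):
    """Return '1α1β', '1α2β', '2α1β' or '2α2β'; None for any other chain configuration."""
    n_alpha, n_beta = len(cell.alpha_cdr3s), len(cell.beta_cdr3s)
    if n_alpha in (1, 2) and n_beta in (1, 2):
        return f"{n_alpha}α{n_beta}β"
    return None  # e.g. α-only or β-only cells are excluded in this sketch


cells = [
    Cell("AAAC", "P1", ["CAVRDSNYQLIW"], ["CASSLGQGYEQYF"]),
    Cell("AAAG", "P1", ["CALSEAGNNRKLIW", "CAVRDSNYQLIW"], ["CASSLGQGYEQYF"]),
    Cell("AAAT", "P1", ["CAVRDSNYQLIW", "CALSEAGNNRKLIW"], ["CASSLGQGYEQYF"]),
]
form_counts = Counter(tcr_form(c) for c in cells)
unique_tcrs = {clonotype_key(c) for c in cells}
```

The second and third cells collapse onto the same dual-α TCR once the CDR3s are sorted, which is the situation in which recurring dual-chain TCRs argue against doublets.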

Given that Tas cells are expanded by tumor antigens, expanded T cells (≥2 T cells sharing the same TCR in a patient) were selected for further analysis (Supplementary information, Table S2). We performed separate cluster analyses on expanded CD4+ and CD8+ T cells and identified ten CD4 and nine CD8 clusters with specific gene expression signatures (Fig. 1b; Supplementary information, Fig. S1c, d, and Table S3). Classic markers of T cell subtypes indicated the presence of conventional naïve, effector, memory and exhausted clusters in both CD4+ and CD8+ T cells, as well as CD4+ regulatory T cells (Tregs) (Fig. 1c). Two clusters with active mitosis (CD4_C10_MKI67 and CD8_C9_MKI67), characterized by high expression of MKI67, TK1 and STMN1, were identified in CD4+ and CD8+ T cells, respectively. Similar clusters have been reported in previous studies.24,25,26
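The expansion filter (≥2 cells sharing a TCR within a patient) amounts to counting clone sizes per patient. A minimal sketch, assuming TCRs have already been keyed by (patient, TCR identifier); the `expanded_cells` helper and the identifiers are hypothetical:

```python
from collections import Counter


# Illustrative sketch; helper name and identifiers are hypothetical.
def expanded_cells(cell_to_tcr, min_cells=2):
    """Keep cells whose TCR is carried by at least `min_cells` cells of the same patient.

    cell_to_tcr maps a cell barcode to a (patient, tcr_id) key, so identical TCR
    identifiers from different patients are counted separately.
    """
    clone_sizes = Counter(cell_to_tcr.values())
    return {cell: key for cell, key in cell_to_tcr.items()
            if clone_sizes[key] >= min_cells}


cell_to_tcr = {
    "c1": ("P1", "TCR_A"),
    "c2": ("P1", "TCR_A"),
    "c3": ("P1", "TCR_B"),  # singleton, not expanded
    "c4": ("P2", "TCR_A"),  # same identifier, different patient
}
expanded = expanded_cells(cell_to_tcr)
```

Keying by patient keeps a coincidentally identical CDR3 combination in two patients from being counted as one expanded clone.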

To find T cell populations enriched in tumor, we examined the tissue distribution of each cluster (Fig. 1d, f). Consistent with previous reports,24 blood and tissue-resident T cells were grouped into distinct clusters. Two CD4 clusters (CD4_C1_LEF1 and CD4_C2_PRF1) and two CD8 clusters (CD8_C1_LTB and CD8_C2_CX3CR1) contained mostly T cells from blood, whereas the other clusters were almost exclusively located in the other three tissues. The memory and effector clusters were shared by peri-tumor and normal tissues but were less frequently observed in tumor, except for the memory clusters with high GZMK expression (CD4_C3_GZMK and CD8_C3_GZMK), which were distributed across all three tissues (Fig. 1d, f). The exhausted clusters (CD4_C9_CXCL13 and CD8_C8_CXCL13), CD4+ Tregs (CD4_C8_FOXP3), and the clusters with active mitosis (CD4_C10_MKI67 and CD8_C9_MKI67) were enriched in tumor. The CD4_C10_MKI67 cluster was also present in normal tissues (Fig. 1d, f). The tissue distribution of each cluster was further confirmed by the ratio of observed to expected cell numbers (Ro/e) (Fig. 1e, g).24 Interestingly, high expression of exhaustion markers (HAVCR2, CTLA4, TIGIT, LAG3 and TOX) was not limited to the two exhausted clusters; CD4+ Tregs and the CD8 cluster with active mitosis showed similar expression levels of these genes (Fig. 1c). Intermediate levels of exhaustion genes were also observed in the CD4 cluster with active mitosis. These findings revealed an exhaustion-like gene expression profile in tumor-enriched clusters, suggesting that T cells in these clusters had undergone antigen stimulation.
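Ro/e compares the observed number of cells of a cluster in a tissue with the number expected if cluster membership were independent of tissue. Assuming the expected count is taken from the chi-square independence model (row total × column total / grand total), the score can be sketched as below; cluster names are borrowed from the text but the counts are made up.

```python
# Illustrative sketch; the counts below are made up.
def ro_e(counts):
    """Observed/expected ratio for every cluster x tissue cell.

    counts: {cluster: {tissue: n_cells}}; missing tissues count as 0.
    Expected = row_total * column_total / grand_total (chi-square independence model).
    Ro/e > 1 suggests enrichment of the cluster in that tissue; < 1 suggests depletion.
    """
    tissues = sorted({t for row in counts.values() for t in row})
    row_total = {c: sum(row.values()) for c, row in counts.items()}
    col_total = {t: sum(row.get(t, 0) for row in counts.values()) for t in tissues}
    grand_total = sum(row_total.values())
    result = {}
    for cluster, row in counts.items():
        result[cluster] = {}
        for tissue in tissues:
            expected = row_total[cluster] * col_total[tissue] / grand_total
            result[cluster][tissue] = (row.get(tissue, 0) / expected
                                       if expected > 0 else float("nan"))
    return result


counts = {
    "CD8_C8_CXCL13": {"tumor": 30, "blood": 10},
    "CD8_C1_LTB": {"tumor": 10, "blood": 30},
}
scores = ro_e(counts)
```

Here each cluster/tissue combination is expected to hold 20 cells, so the exhausted cluster scores 1.5 in tumor and 0.5 in blood.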

Tumor-specific clonal expansion occurs in tumor-enriched T cell clusters

Tumor antigen stimulation leads to clonal expansion of Tas cells in tumor. We therefore analyzed T cell clones in tumor and its adjacent tissues to identify populations with tumor-specific clonal expansion. The frequency of each CD4+ or CD8+ T cell clone in each tissue was calculated by dividing the number of T cells carrying that TCR in the tissue by the total number of CD4+ or CD8+ T cells, respectively, in that tissue (Supplementary information, Table S4).
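The per-tissue clone frequency defined above can be sketched as follows, for one lineage (CD4+ or CD8+) of one patient. The input format and identifiers are hypothetical; the denominator is all T cells of that lineage in the tissue, not only expanded ones.

```python
from collections import Counter, defaultdict


# Illustrative sketch; input format and identifiers are hypothetical.
def clone_frequencies(cells):
    """Per-tissue clone frequency for one lineage (CD4+ or CD8+) of one patient.

    cells: iterable of (tcr_id, tissue) pairs, one per T cell of that lineage.
    Returns {tcr_id: {tissue: cells of this TCR in tissue / all lineage T cells in tissue}}.
    """
    cells = list(cells)  # iterated twice below
    tissue_totals = Counter(tissue for _, tissue in cells)
    freqs = defaultdict(dict)
    for (tcr, tissue), n in Counter(cells).items():
        freqs[tcr][tissue] = n / tissue_totals[tissue]
    return dict(freqs)


cd8_cells = ([("TCR_A", "tumor")] * 3 + [("TCR_B", "tumor")]
             + [("TCR_A", "blood"), ("TCR_C", "blood")])
freqs = clone_frequencies(cd8_cells)
```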

We compared the distribution of tumor TCRs in tumor and its adjacent tissues. Both tissue-shared and tumor-specific TCRs were observed (Fig. 2a), suggesting that circulation and local expansion together contribute to T cell composition in tumor. Overall, CD8+ T cells exhibited higher clone frequencies than CD4+ T cells in tumor (Fig. 2a), suggesting that tumor antigens are mainly presented by MHC class I molecules in these patients.

Fig. 2: Tumor-enriched T cells were expanded by tumor antigens.

a Heatmap showing the frequency of each TCR in the four tissues. The top 200 TCRs in expanded CD4+ and CD8+ T cells from tumor are shown. Columns represent different clonotypes, and rows represent different tissues. The color key represents the frequency of each TCR in each tissue. b Line chart showing the frequency of each TCR in the four tissues. The top 10 TCRs of expanded tumor CD4+ (top) and CD8+ (bottom) T cells in each cluster are shown. Each line represents a TCR. c Scatter plot showing the correlation between TMB and the percentage of tumor CD4+ (top) or CD8+ (bottom) T cells from each cluster among the total tumor CD4+ or CD8+ T cells in 10 patients, respectively.

To identify which clusters contained T cell clones specifically expanded in tumor, we selected the top 10 TCRs of tumor T cells in each cluster and compared their distribution in tumor and its adjacent tissues (Supplementary information, Table S5). The four blood-specific clusters were not analyzed because they contained very few tumor T cells. Tumor-specific TCR expansion was observed in the tumor-enriched clusters (C9, C10 in CD4+ T cells and C8, C9 in CD8+ T cells) but not in the other clusters (Fig. 2b). These results suggest that T cells in the tumor-enriched clusters were expanded by tumor antigens.
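Selecting the top 10 TCRs per cluster is a ranking by the number of tumor T cells carrying each TCR within that cluster. A minimal sketch with hypothetical identifiers; how ties were broken in the study is not stated, and here they keep first-encountered order, which is what `Counter.most_common` does:

```python
from collections import Counter, defaultdict


# Illustrative sketch; helper name and identifiers are hypothetical.
def top_tcrs_per_cluster(tumor_cells, n=10):
    """Rank TCRs within each cluster by the number of tumor T cells carrying them.

    tumor_cells: iterable of (cluster, tcr_id) pairs for tumor T cells.
    Ties keep first-encountered order, as Counter.most_common does.
    """
    per_cluster = defaultdict(Counter)
    for cluster, tcr in tumor_cells:
        per_cluster[cluster][tcr] += 1
    return {cluster: [tcr for tcr, _ in counter.most_common(n)]
            for cluster, counter in per_cluster.items()}


tumor_cells = [("CD8_C8", "TCR_A"), ("CD8_C8", "TCR_B"),
               ("CD8_C8", "TCR_A"), ("CD8_C3", "TCR_C")]
top = top_tcrs_per_cluster(tumor_cells, n=1)
```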

The proportions of T cell clusters with tumor-specific TCRs positively correlate with TMB

Neoantigens, encoded by mutated tumor genes, are a major type of TSA. Since Tas cells are expanded by tumor antigens, the abundance of Tas cells in a tumor might correlate with its TMB, a surrogate for neoantigen burden. We therefore performed WES on paired tumor and peri-tumor samples to detect tumor-specific mutations, and bulk RNA-seq to evaluate the expression of each mutation. To increase the sample size, tumor samples from five additional lung cancer patients were subjected to scRNA-seq, and their tumor and peri-tumor samples to WES and bulk RNA-seq. A total of 10 patients were analyzed (Supplementary information, Table S1). TMB was calculated as the average number of mutations per megabase (Supplementary information, Table S6). T cells from the 10 tumor samples were assigned to the ten CD4 and nine CD8 clusters using the R package “cellassign” (Supplementary information, Fig. S2a–d, and Table S7).27 To calculate the percentage of each cluster among tumor CD4+ or CD8+ T cells, the number of tumor T cells in a cluster was divided by the total number of CD4+ or CD8+ T cells in the tumor.
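Both quantities defined above are simple ratios. The sketch assumes the TMB denominator is the size of the exome territory callable for mutations (the 30 Mb used here is an illustrative placeholder, not a value from this study) and that cluster labels for the tumor T cells of one lineage are already available, for example from the cellassign step; function names are hypothetical.

```python
from collections import Counter


# Illustrative sketch; function names and numbers are hypothetical.
def tumor_mutation_burden(n_mutations, callable_bases):
    """Mutations per megabase of sequenced (callable) territory."""
    return n_mutations / (callable_bases / 1_000_000)


def cluster_percentages(cluster_labels):
    """Percentage of each cluster among all tumor T cells of one lineage (CD4+ or CD8+)."""
    total = len(cluster_labels)
    return {cluster: 100 * n / total for cluster, n in Counter(cluster_labels).items()}


tmb = tumor_mutation_burden(300, 30_000_000)  # placeholder numbers
pct = cluster_percentages(["CD8_C8", "CD8_C8", "CD8_C9", "CD8_C3"])
```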

As shown in Fig. 2c, the proportions of clusters with tumor-specific TCR expansion (C9, C10 in CD4+ T cells and C8, C9 in CD8+ T cells) were significantly and positively associated with TMB. Most of the other clusters showed a negative, although not statistically significant, association with TMB. These data support the idea that TCR expansion in the tumor-enriched clusters is mainly driven by neoantigens derived from tumor mutations.
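The text does not state which correlation statistic underlies Fig. 2c. Purely as an illustration, a rank-based (Spearman) coefficient, which is less sensitive to the skewed distribution typical of TMB, can be computed with the standard library as below; a p-value would additionally require a test such as `scipy.stats.spearmanr`. The function names and values are hypothetical.

```python
import math


# Illustrative sketch; Spearman is an assumed choice, and the values are made up.
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


tmb_values = [2.1, 5.4, 8.0, 12.3]   # made-up TMB (mutations/Mb)
cluster_pct = [3.0, 6.5, 6.0, 15.2]  # made-up % of one cluster among tumor T cells
rho = spearman_rho(tmb_values, cluster_pct)
```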

CXCL13 is a unique marker for both CD4+ and CD8+ T cells expressing tumor-specific TCRs

To identify biomarkers for T cells carrying TCRs specifically expanded in tumor, gene expression of T cells with tumor-specific TCRs was compared with that of T cells with tissue-shared TCRs among tumor CD4+ or CD8+ T cells. CD4+ Treg cells were also compared with CD4+ T cells with tissue-shared TCRs. A number of differentially expressed genes were identified in these T cell populations (Fig. 3a). Interestingly, CXCL13 was specifically expressed in both CD4+ and CD8+ T cells with tumor-specific TCRs (Fig. 3b). Although some exhaustion genes were also highly expressed in these T cells, most of them were also detected in Treg cells or showed intermediate levels in T cells with tissue-shared TCRs. We further examined CXCL13 expression in each T cell cluster and found that the tumor-enriched clusters (C9, C10 in CD4+ T cells and C8, C9 in CD8+ T cells) expressed high levels of CXCL13 (Fig. 3c). We then selected the top 20 tumor TCRs from each of the CD4+CXCL13+, CD4+CXCL13−, CD8+CXCL13+, and CD8+CXCL13− T cell groups and compared their distribution in the tumor and its adjacent tissues. Tumor-specific TCR expansion was observed in CXCL13+ T cells, whereas TCRs in CXCL13− T cells were shared among tissues (Fig. 3d). We also compared the frequencies of the four TCR forms (1α1β, 1α2β, 2α1β, and 2α2β) in CD4+CXCL13+, CD4+CXCL13−, CD8+CXCL13+, and CD8+CXCL13− T cells of the 10 sequenced patients and found no statistically significant differences between CXCL13+ and CXCL13− cells in either CD4+ or CD8+ T cells (Supplementary information, Fig. S2e).
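The significance thresholds given in the Fig. 3a legend (|log2(FC)| > 0.58, P < 0.05) can be applied to a precomputed differential-expression table as sketched below. The cutoff 0.58 corresponds to roughly a 1.5-fold change. The table format, gene values and helper name are hypothetical, and whether the P values were adjusted for multiple testing is not stated in this section.

```python
import math

# Illustrative sketch; table format, values and helper name are hypothetical.
LOG2FC_CUTOFF = round(math.log2(1.5), 2)  # 0.58, i.e. about a 1.5-fold change
P_CUTOFF = 0.05


def upregulated_genes(de_table):
    """Genes significantly higher in the group of interest than in the reference group.

    de_table: {gene: (log2_fold_change, p_value)}, fold change = group / reference.
    Returns gene names sorted from largest to smallest log2 fold change.
    """
    hits = [(gene, lfc) for gene, (lfc, p) in de_table.items()
            if lfc > LOG2FC_CUTOFF and p < P_CUTOFF]
    return [gene for gene, _ in sorted(hits, key=lambda item: item[1], reverse=True)]


de_table = {  # made-up values for a T-specific vs T-shared comparison
    "CXCL13": (3.2, 1e-10),
    "PDCD1": (0.9, 0.01),
    "GZMK": (-1.0, 1e-4),  # down, not up
    "XIST": (0.7, 0.2),    # not significant
}
up = upregulated_genes(de_table)
```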

Fig. 3: Identification of specific markers for Tas cells.

a Volcano plots showing differentially expressed genes between CD4+ T-shared and T-specific cells, CD8+ T-shared and T-specific cells, and CD4+ T-shared cells and CD4+ Treg cells. Blue dots represent significantly upregulated genes in CD4+ T-specific, CD8+ T-specific and CD4+ Treg cells, respectively (|log2(FC)| > 0.58, P < 0.05). b Heatmap showing the expression of selected genes in CD4+ T-specific, CD8+ T-specific, CD4+ Treg, CD4+ T-shared and CD8+ T-shared cells (right panel). c Box plot showing the expression of CXCL13 in CD4+ (top panel) and CD8+ T cell clusters (bottom panel). d Line chart showing the frequency of each TCR in the four tissues. The top 20 TCRs of CD4+CXCL13+, CD4+CXCL13−, CD8+CXCL13+, and CD8+CXCL13− T cells are shown. Each line represents a TCR. e The correlation between the percentage of CXCL13+ T cells among tumor T cells and the TMB of each tumor. f The correlation between the median expression of CXCL13 and the TMB of each cancer type.

Moreover, the percentages of CXCL13+ cells among total tumor T cells were positively associated with TMB in the ten sequenced non-small cell lung cancer (NSCLC) patients (Fig. 3e). Analysis of tumor RNA-seq data from The Cancer Genome Atlas (TCGA) also showed a significant correlation between the median expression of CXCL13 and TMB across multiple cancer types (Fig. 3f). These data indicate that CXCL13 is a unique marker for T cells specifically expanded in tumor.

TCRs specifically expanded in tumor respond to neoantigen stimulation in vitro

To examine whether these tumor-specific TCRs recognize TSAs, we performed in vitro tumor antigen stimulation to test the tumor antigen specificity of tumor-specific and tissue-shared TCRs. Using patients P4 and P5 as examples, we selected the top 5 TCRs from each of the CD4+CXCL13+, CD4+CXCL13−, CD8+CXCL13+ and CD8+CXCL13− T cell groups in the tumor of each patient (Supplementary information, Table S8). These TCRs were synthesized and cloned into a lentiviral vector expressing the surface marker Thy1.1. A total of 20 TCR-T cell lines were generated for each patient by transducing these TCRs into autologous T cells collected from PBMCs. For each patient, 100 mutations detected by WES and expressed at medium to high mRNA levels in the tumor by bulk RNA-seq were selected as potential tumor antigens (Supplementary information, Table S9). Five tandem minigenes, each consisting of 20 of the selected mutation sequences, were constructed and transduced into autologous lymphoblastoid cell lines (LCLs).
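The grouping of 100 candidate mutations into five tandem constructs of 20 can be sketched as follows. The ranking key (one expression value per mutation), the mutation identifiers and the helper names are hypothetical; the actual minigene design (flanking length, linkers, codon choice) is not described in this section.

```python
# Illustrative sketch; ranking key, identifiers and helper names are hypothetical.
def select_candidates(mutations, n=100):
    """Keep the n mutations with the highest (hypothetical) tumor expression value."""
    return sorted(mutations, key=lambda m: m["expression"], reverse=True)[:n]


def group_into_tandem_minigenes(candidates, per_construct=20):
    """Split the ranked candidates into consecutive blocks, one block per tandem construct."""
    return [candidates[i:i + per_construct]
            for i in range(0, len(candidates), per_construct)]


mutations = [{"id": f"MUT{i:03d}", "expression": float(i)} for i in range(150)]
constructs = group_into_tandem_minigenes(select_candidates(mutations))
```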

We then co-cultured each TCR-T cell line with mixed LCLs transduced with the respective tandem minigenes. TCR-T cells co-cultured with non-transduced LCLs were used as controls. IFNγ production in TCR-T cells was measured by intracellular staining to evaluate the response to antigen stimulation (Fig. 4a; Supplementary information, Fig. S4a). TCR-T cells that produced at least 3-fold more IFNγ than the control were scored as positive responders. In patient P4, one CD4+ and four CD8+ tumor-specific TCRs responded positively to tumor antigen stimulation, whereas no positive response was detected for tissue-shared TCRs (Fig. 4b). In patient P5, one CD4+ and all of the CD8+ tumor-specific TCRs responded positively to tumor antigen stimulation, but none of the tissue-shared TCRs did (Fig. 4c). These data provide direct evidence that TCRs specifically expanded in tumor recognize tumor antigens. One CD8+ tumor-specific TCR from P4 and most of the CD4+ tumor-specific TCRs in both patients did not respond to tumor antigen stimulation, possibly because they recognize TAAs or tumor mutations not included in the analysis. The low frequencies of CD4+ tumor-specific TCRs also suggest that the corresponding tumor antigens might be expressed at relatively low levels in tumors. Taken together, these data lead us to conclude that CXCL13+ T cells in tumor are Tas cells.
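The positivity rule (at least 3-fold more IFNγ than the non-transduced LCL control) can be written as a small function. This sketch assumes the readout is the percentage of IFNγ+ TCR-T cells from intracellular staining; the function name and values are illustrative.

```python
# Illustrative sketch; function name and values are hypothetical.
def call_response(ifng_stimulated, ifng_control, min_fold=3.0):
    """Score a TCR-T line as responsive if IFNγ+ cells rise >= min_fold over control.

    Inputs are % IFNγ+ cells from intracellular staining; the control must be > 0.
    Returns (fold_change, is_positive).
    """
    if ifng_control <= 0:
        raise ValueError("control IFNγ+ percentage must be positive")
    fold = ifng_stimulated / ifng_control
    return fold, fold >= min_fold


results = {
    "TCR_1": call_response(6.0, 2.0),  # 3.0-fold: positive
    "TCR_2": call_response(2.5, 1.0),  # 2.5-fold: negative
}
```

Using `>=` makes a response of exactly 3-fold count as positive, matching "at least 3-fold".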

Fig. 4: T cells from tumor-enriched clusters recognize neoantigens.

a FACS plots showing IFNγ expression in TCR-T cells stimulated with LCLs expressing tandem minigenes of tumor mutations or with control LCLs. b, c Bar plots showing the fold change of IFNγ expression in each TCR-T cell line stimulated with LCLs expressing tandem minigenes compared with control LCLs (right panel). The top five tumor TCRs in each group of P4 (b) and P5 (c) are shown. d, e Growth of P4 (d) and P5 (e) PDX tumors treated with TCR-T cells expressing the top three TCRs of each group in (b, c); non-transduced T cells were used as control. Representative of two independent experiments (n = 5 mice/group; values represent means ± SEM; **P < 0.01 by two-sided t-test).

TCR-T cells engineered with TCRs of Tas cells inhibit growth of autologous PDX tumors

Next, we asked whether TCR-T cells engineered with Tas TCRs could be used to treat autologous tumors. PDX tumors were established in NSG mice using tumor samples from patient P4 or P5. We treated the PDX tumors by adoptive transfer of T cells engineered with the top 3 TCRs from each of the CD4+CXCL13+, CD4+CXCL13−, CD8+CXCL13+ and CD8+CXCL13− T cell groups in the tumor of each patient. Non-transduced T cells were used as control.

We found that TCR-T cells expressing TCRs from CD8+CXCL13+ T cells showed significant therapeutic effects on autologous PDX tumors from patient P4 (Fig. 4d) or P5 (Fig. 4e). Among TCR-T cells engineered with TCRs from CD4+CXCL13+ T cells, only those carrying TCRs that responded positively to in vitro neoantigen stimulation exhibited a therapeutic effect. No therapeutic effect was observed with T cells engineered with TCRs from CXCL13− T cells (Fig. 4d, e). These data demonstrate that TCR-T cells engineered with Tas TCRs can effectively treat autologous tumors.

CXCL13 expression in tumor predicts response to ICB

Since Tas cells play an essential role in the antitumor immune response, we asked whether the levels of these cells in tumor, measured by CXCL13 expression, could predict the response to ICB. We collected the objective response rates (ORRs) of ICB in different cancer types from reported clinical trials28 and found that the median expression of CXCL13 was significantly associated with the ORRs across multiple cancer types (Fig. 5a). Analysis of published RNA-seq data from tumor samples collected before ICB treatment also showed that high CXCL13 expression was associated with improved overall survival (Fig. 5b) and better response to ICB (Fig. 5c).29,30,31,32
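The responder versus non-responder comparisons in Fig. 5c use a two-sided Mann–Whitney U test. The standard-library sketch below computes U and a p-value from the normal approximation without tie or continuity correction, so for small groups it only approximates the exact results of tools such as `scipy.stats.mannwhitneyu`; the helper name and expression values are hypothetical.

```python
import math


# Illustrative sketch; helper name and values are hypothetical.
def mann_whitney_u(x, y):
    """Two-sided Mann–Whitney U test using the normal approximation.

    U counts pairs (xi, yj) with xi > yj, ties counting 0.5. No tie or
    continuity correction is applied, so small-sample p-values are approximate.
    Returns (U for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


responders = [5.1, 6.3, 7.2]      # made-up CXCL13 expression values
non_responders = [1.2, 2.0, 3.4]
u, p = mann_whitney_u(responders, non_responders)
```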

Fig. 5: Expression of CXCL13 predicts response to ICB.

a The correlation between the median expression of CXCL13 and the objective response rate of each cancer type. b Kaplan–Meier overall survival curves comparing high and low CXCL13 expression in patients from datasets PRJEB23709 (left) and GSE91061 (right). c Box plots showing CXCL13 expression in clinical non-responders (progressive disease) versus responders (stable disease, partial response and complete response) in the PRJEB23709, GSE93157, GSE91061 and PRJEB25780 datasets (*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001 by two-sided Mann–Whitney U test). d Representative IHC images of CXCL13+TLS+, CXCL13+TLS−, and CXCL13−TLS− tumor sections stained for CXCL13, CD8, and CD4. Scale bars, 20 μm. Red arrows indicate cells co-expressing CD8 and CXCL13. Blue arrows indicate cells co-expressing CD4 and CXCL13. e Kaplan–Meier curves of PFS comparing CXCL13+ with CXCL13− patients with melanoma (top) and CRC (bottom). P values were calculated by the log-rank test.

We further used immunohistochemistry (IHC) to detect CXCL13 protein in tumor sections and found that CXCL13 was specifically expressed in CD4+ and CD8+ T cells (Fig. 5d). Consistent with a previous report,33 tertiary lymphoid structures (TLSs) in tumor displayed high CXCL13 expression, suggesting that Tas cells were enriched in TLSs. However, sporadic CXCL13 expression was also observed in some tumors without TLSs (Fig. 5d). We then collected tumor samples from melanoma and colorectal cancer (CRC) patients before ICB treatment and examined CXCL13 expression by IHC. Consistent with the results from published datasets, high levels of CXCL13 protein were associated with improved progression-free survival (PFS) in these patients (Fig. 5e and Tables 1–4). These data suggest that the level of CXCL13, reflecting the abundance of Tas cells in tumor, could predict the response to ICB.

Table 1 Clinical characteristics for the SYSUCC melanoma cohort.
Table 2 Clinical characteristics for the SYSUCC CRC cohort.
Table 3 Univariate and multivariate Cox proportional hazards model analysis for progression-free survival of melanoma patients from SYSUCC.
Table 4 Univariate and multivariate Cox proportional hazards model analysis for progression-free survival of CRC patients from SYSUCC.

Gene expression profiling identifies surface biomarkers of Tas cells

Surface markers could facilitate the sorting of Tas cells from tumor. We therefore selected genes encoding surface proteins that were highly expressed in CD4+ or CD8+ Tas cells (Fig. 3b) and compared their expression across T cell clusters in tumor (Fig. 6a). We found that CD200 was specifically expressed in CD4+CXCL13+ Tas cells, and that ENTPD1 and LAYN could distinguish CD8+CXCL13+ Tas cells from other CD8 clusters, although both were also expressed in CD4+ Tregs.34 Analysis of published scRNA-seq data showed that CD4+CXCL13+ and CD8+CXCL13+ Tas cells with similar features were present in all nine analyzed tumor types (Fig. 6b; Supplementary information, Fig. S3 and Table S10), suggesting that these markers could identify Tas cells across tumor types.

Fig. 6: Identification of surface markers for Tas cells.

a Violin plot showing the expression of selected genes in tumor T cells of each cluster. b Box plot showing the expression of selected marker genes of CD4+ and CD8+ Tas cells, and CD4+ Tregs, in clusters identified in CD4+ (top panel) and CD8+ T cells (bottom panel) from nine different cancer types. c FACS plot showing the gating strategy for sorting CD4+ and CD8+ Tas cells from tumor (left panel); representative bar plot of five independent experiments showing the expression of CXCL13 in different groups of immune cells and tumor cells by qPCR (right panel). d CD4+ and CD8+ Tas cells sorted from tumors were activated with anti-CD3/CD28 antibodies for 2 days, cultured in IL-2 for 7 days, and then stimulated with autologous tumor cells for 24 h. FACS plot showing the expression of IFNγ detected by intracellular staining (left panel); representative bar plot of five independent experiments showing the percentages of IFNγ+ cells (right panel).

We then tested whether CD200 and ENTPD1 could be used to isolate CD4+ and CD8+ Tas cells, respectively, from tumor samples by flow cytometry. LAYN was not tested because no commercial antibody was available. Different immune cell subsets were sorted from patient tumors, and CXCL13 expression in these subsets was evaluated by qPCR (Fig. 6c; Supplementary information, Fig. S4b). Consistent with the scRNA-seq data, both CD4+CD200+ and CD8+ENTPD1+ T cells showed significantly higher CXCL13 expression than the corresponding marker-negative T cells (Fig. 6c). Moreover, autologous tumor cells could activate CD4+CD200+ and CD8+ENTPD1+ T cells in vitro (Fig. 6d). These data demonstrate that the surface markers CD200 and ENTPD1 enable the isolation of CD4+ and CD8+ Tas cells, respectively, from tumor by FACS.

Discussion

In summary, using scRNA-seq, TCR-seq, and in vitro neoantigen stimulation, we characterized Tas cells in tumor. Several markers were identified that specifically distinguish CD4+ and/or CD8+ Tas cells from bystander T cells in tumor. We further showed that T cells engineered with Tas TCRs exhibited significant therapeutic effects on autologous PDX tumors, and that the identified surface biomarkers enabled the isolation of Tas cells from tumors by FACS. On this basis, we developed a TCR-T cell therapy strategy targeting personalized TSAs (Supplementary information, Fig. S5).

We comprehensively analyzed gene expression, clonal expansion, TCR lineage and antigen specificity of T cells in tumor and its adjacent tissues, and found that tumor-infiltrating T cells consisted of bystanders circulating from adjacent tissues and Tas cells expanded locally in tumor. Although exhaustion markers were expressed in Tas clusters, these genes were also expressed in other clusters, particularly in CD4+ Tregs.24,25,34 ENTPD1 was also highly expressed in CD4+ Tregs, but it could distinguish CD8+ Tas cells from other CD8+ T cell populations. CD200 was identified as a specific marker of CD4+ Tas cells. These two surface markers enabled the isolation of CD8+ and CD4+ Tas cells from tumor by FACS. Given that the co-inhibitory molecule CD200 is specifically expressed on CD4+ Tas cells, it will be worth testing whether CD200 blockade could enhance the efficacy of immunotherapy.

Our data revealed that CXCL13 was specifically expressed in all the Tas clusters. Since its expression was not detected in tumor cells or in non-T immune cells in tumor, IHC staining of CXCL13 could be used to examine the location and distribution of Tas cells in tumor sections. CXCL13 is a CXC chemokine that promotes the chemotactic migration of CXCR5+ immune cells.35,36 Consistent with other reports,33,37 high CXCL13 expression was observed in TLSs, suggesting that Tas cells play an important role in TLS formation by recruiting CXCR5+ immune cells such as B cells, T cells and dendritic cells into tumor. However, scattered CXCL13 expression was also observed in some samples without TLSs. These two patterns of CXCL13 distribution imply that other factors are required for TLS development in tumor. Overall, the level of Tas cells in tumor, as measured by CXCL13 expression, could indicate the strength of antitumor immunity in patients. Consistently, high CXCL13 levels were associated with improved response to anti-PD-1/PD-L1 and anti-CTLA-4 treatment in the clinical trial datasets PRJEB23709 (anti-PD-1 or anti-CTLA-4 plus anti-PD-1), GSE91061 (anti-CTLA-4 or anti-PD-1), PRJEB25780 (anti-PD-L1), and GSE93157 (anti-PD-1). These findings suggest that Tas cells are essential for the efficacy of different ICB treatments. Combining CXCL13 with other markers such as PD-L1 might provide more precise prediction of response to immunotherapy.

Although both CD4+ and CD8+ Tas cells were identified in tumor, more high-frequency TCRs were observed in CD8+ Tas cells. These data suggest that tumor antigens are mainly presented by MHC class I molecules, consistent with a previous report.38 TCRs with high frequencies also showed better responses during in vitro antigen stimulation and in vivo treatment. Hence, a high frequency of a Tas TCR might indicate that the corresponding tumor antigen is widely expressed in the tumor. Since tumor antigens are heterogeneously expressed within a tumor, TCR-T cells targeting widely expressed tumor antigens are likely to be more effective in cancer treatment. It will be worth comparing the therapeutic effects of TCR-T cells using Tas TCRs with different frequencies, which might help determine the number and frequencies of Tas TCRs to be used for personalized TCR-T cell therapy. Some of the tested CXCL13+ TCRs did not respond to tumor antigen stimulation, possibly because they recognize TSAs derived from mutations outside the top 100 included in the assay.

Overall, the identification of Tas cells and their specific biomarkers paves the way for personalized TCR-T cell therapy targeting patient-specific tumor antigens. Based on these findings, we have initiated a clinical trial of personalized TCR-T cells for the treatment of solid tumors. This individualized strategy of T cell therapy may generate efficient and safe antitumor immune responses by amplifying the naturally occurring antitumor immunity in patients, and may provide an option for patients resistant to current immunotherapies.

Methods and materials

Reagents

Chromium™ Single Cell 5′ Library & Gel Bead Kit (1000006), Chromium™ Single Cell 3′/5′ Library Construction Kit (1000020), Chromium™ Single Cell V(D)J Enrichment Kit, Human T Cell (1000005), Chromium™ Single Cell A Chip Kit (120236) and Chromium i7 Multiplex Kit (120262) were purchased from 10x Genomics (California, USA). All cell culture reagents were purchased from Gibco (California, USA) unless otherwise indicated.

Collection, preparation and analysis of clinical samples

Ten patients diagnosed with NSCLC were enrolled for scRNA-seq at Sun Yat-sen University (SYSU) Cancer Center (Guangzhou, China). Detailed patient characteristics are listed in Supplementary information, Table S1.

Peripheral blood samples were collected prior to surgery in heparin anticoagulant tubes and then subjected to erythrocyte lysis with ACK buffer (8.29 g NH4Cl, 1 g KHCO3 and 37.2 mg Na2EDTA dissolved in 1000 mL ddH2O, pH 7.2–7.4) before FACS staining. Surgical tumor, peri-tumor and normal tissue samples were cut into pieces of about 1 mm³ and washed twice with phosphate-buffered saline (PBS) containing 2% fetal bovine serum (FBS). Tissues were then digested in 15 mL RPMI (Invitrogen, California, USA) supplemented with 2% FBS, 50 U/mL collagenase type IV (Invitrogen, California, USA) and 20 U/mL DNase (Roche, Indianapolis, IN), incubated at 37 °C for 2 h with gentle shaking, and further processed with the gentleMACS dissociator (Miltenyi Biotec, Germany). After digestion, tissues were washed twice and filtered through a 70 μm cell strainer to obtain single-cell suspensions for FACS staining. Each sample was stained with CD45 (APC-Cy7, clone 2D1), CD3 (APC, clone UCHT1) and CD19 (PE, clone HIB19) at 4 °C for 30 min. After washing, cells were sorted twice using a BD FACS Aria II (BD Biosciences, New Jersey, USA).
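As a check on the ACK buffer recipe, the stated masses can be converted to molar concentrations. The arithmetic below assumes Na2EDTA was the dihydrate (372.24 g/mol), which the recipe does not specify; the constant and function names are illustrative.

```python
# Illustrative arithmetic; the EDTA hydrate form is an assumption.
# Molar masses in g/mol.
MOLAR_MASS = {"NH4Cl": 53.49, "KHCO3": 100.12, "Na2EDTA·2H2O": 372.24}
# Grams per litre from the recipe (37.2 mg = 0.0372 g).
RECIPE_G_PER_L = {"NH4Cl": 8.29, "KHCO3": 1.0, "Na2EDTA·2H2O": 0.0372}


def millimolar(grams, molar_mass, volume_l=1.0):
    """Convert a mass dissolved in volume_l litres to a mM concentration."""
    return grams / molar_mass / volume_l * 1000


concentrations = {name: millimolar(g, MOLAR_MASS[name])
                  for name, g in RECIPE_G_PER_L.items()}
# roughly 155 mM NH4Cl, 10 mM KHCO3 and 0.1 mM Na2EDTA
```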

For FACS sorting of the indicated cell subsets and in vitro tumor cell stimulation, single-cell suspensions from human lung cancer tissues were obtained as described above and stained with CD45 (PE, clone 2D1), CD3 (FITC, clone OKT3), CD4 (APC-Cy7, clone A161A1), CD8 (PerCP-Cy5.5, clone SK1), CD19 (PE-Cy7, clone SJ25C1), CD200 (APC, clone OX-104) and ENTPD1 (APC, clone A1) at 4 °C for 30 min. After washing, cells were sorted twice using a BD FACS Aria II. Seven groups of cells were collected: CD45−, CD45+CD3+CD19−, CD45+CD3−CD19+, CD45+CD4+CD200+, CD45+CD4+CD200−, CD45+CD8+ENTPD1+ and CD45+CD8+ENTPD1−. Part of the sorted cells was lysed in TRIzol reagent (Invitrogen, California, USA) for qPCR analysis. Total RNA from each cell group was reverse-transcribed using the GoScript Reverse Transcription System (Promega, Wisconsin, USA). The reverse-transcription product was amplified with TransStart Top Green qPCR SuperMix (TransGen Biotech, Nanjing, China) and analyzed with gene-specific primers on a Bio-Rad CFX system. The PCR primer sequences are:

hCXCL13-Forward: 5′-TATCCCTAGACGCTTCATTGATCG-3′

hCXCL13-Reverse: 5′-CCATTCAGCTTGAGGGTCCACA-3′

h18s-Forward: 5′-GCGGCGGAAAATAGCCTTTG-3′

h18s-Reverse: 5′-GATCACACGTTCCACCTCATC-3′
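The text does not state how CXCL13 levels were quantified from the qPCR data. A common choice with an 18S reference is the 2^−ΔΔCt method; the Python sketch below assumes that method, and the function name `relative_expression` and the choice of a calibrator group are illustrative only, not the authors' procedure.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_cal: float, ct_ref_cal: float) -> float:
    """Fold change of a target gene by the 2^-ddCt method (assumed, not stated in the text).

    ct_target / ct_ref:         Ct of CXCL13 and 18S in the sample of interest
    ct_target_cal / ct_ref_cal: Ct of CXCL13 and 18S in a calibrator group
    """
    delta_ct_sample = ct_target - ct_ref              # normalize to 18S rRNA
    delta_ct_calibrator = ct_target_cal - ct_ref_cal  # same normalization for the calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: a sorted subset vs. a calibrator subset.
    print(relative_expression(25.0, 10.0, 28.0, 10.0))  # 8.0
```

The method assumes roughly equal amplification efficiencies for CXCL13 and 18S; if that does not hold, an efficiency-corrected model would be more appropriate.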

The remaining sorted cells were activated with anti-CD3 (clone OKT3) and anti-CD28 (clone CD28.2) antibodies (5 μg/mL each; Peprotech, New Jersey, USA) at a density of 1 × 10⁵ cells per well. After 48 h, activated T cells were transferred to 24-well plates and cultured with 200 U/mL IL-2 at a density of 2 × 10⁵ cells/mL for 7 days. Cryopreserved single-cell suspensions of the corresponding tumor tissues were thawed, washed, resuspended in RPMI 1640 medium, and mixed with T cells at a 1:1 ratio. Brefeldin A (10 μg/mL; BioLegend, California, USA) was added 4 h before harvest. After 24 h of co-culture, cells were washed twice with PBS containing 1% FBS and stained with CD4 (BV421, clone A161A1) or CD8 (APC, clone SK1) at 4 °C for 30 min. Cells were then washed, fixed using the Transcription Factor Buffer Set (562574, BD Biosciences), and stained with an IFNγ antibody (APC, clone 4S.B3) for 45 min at 4 °C. After two washes, cells were analyzed on a FACS Aria II flow cytometer (BD Biosciences), and data were analyzed with FlowJo software.

Tumor samples collected before anti-PD1 treatment were obtained from 34 melanoma and 53 CRC patients at the SYSU Cancer Center. PFS was defined as the time from initial treatment to the date of disease progression. Radiological evaluation of disease was based on the Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1. Patients who had not progressed were censored at their last scan. The study was approved by the research ethics committee of the SYSU Cancer Center (#GZR2017-216). Written informed consent was obtained from each patient or their guardian.

Patient-derived tumor xenografts

PDX tumor models were established following the published protocol.39 Briefly, surgical tumor samples were cut into pieces of approximately 3 × 3 × 3 mm and transplanted into 6–8-week-old female immunodeficient NSG mice. The remaining tissue was processed for scRNA-seq, WES and bulk RNA-seq. Mice were bred and maintained at the local animal facility in accordance with the legislation and the ethical approval obtained for establishing the PDX models. Mice were monitored for tumor growth every three days for up to 6 months after transplantation. Tumors were harvested and passaged into another batch of NSG mice when they reached about 500 mm³. Tumors that had been passaged two more times were stored in liquid nitrogen for further experiments.

For the treatment of PDX tumors, 8–10-week-old female NSG mice bearing tumors of similar size were randomly assigned to groups of five mice. Each mouse received 2 × 10⁷ TCR-T cells or control T cells intravenously, together with an intraperitoneal injection of IL-2 (1 × 10⁶ U per mouse). Tumor size was measured every 3 days and calculated as length × width.
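The text does not describe how randomization was carried out. A minimal Python sketch of simple randomization into groups of five, plus the length × width size measure, is shown below; the function names are hypothetical, and matching mice by tumor size before allocation is not modeled.

```python
import random
from typing import Hashable, List, Optional, Sequence


def randomize_into_groups(mouse_ids: Sequence[Hashable], n_groups: int,
                          per_group: int = 5, seed: Optional[int] = None) -> List[list]:
    """Shuffle tumor-bearing mice and split them into equal treatment groups."""
    needed = n_groups * per_group
    if len(mouse_ids) < needed:
        raise ValueError(f"need at least {needed} mice, got {len(mouse_ids)}")
    rng = random.Random(seed)  # seeded RNG so the allocation can be reproduced
    shuffled = list(mouse_ids)
    rng.shuffle(shuffled)
    return [shuffled[i * per_group:(i + 1) * per_group] for i in range(n_groups)]


def tumor_size(length_mm: float, width_mm: float) -> float:
    """Tumor size as defined in the text: length x width (mm^2)."""
    return length_mm * width_mm


if __name__ == "__main__":
    groups = randomize_into_groups([f"M{i}" for i in range(10)], n_groups=2, seed=0)
    print(groups)
```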

scRNA-seq and data preprocessing

The scRNA-seq libraries were prepared using the Chromium™ Single Cell 5′ Reagent Kit on the Chromium platform (10x Genomics, California, USA) following the manufacturer’s instructions and sequenced on an Illumina HiSeq X Ten platform. Cell Ranger (version 3.1.0) was used to preprocess the PE150 sequencing reads. Briefly, raw reads in BCL format were converted to FASTQ format using “cellranger mkfastq”, the FASTQ reads were aligned to the human reference genome (GRCh38/hg38) using STAR, and “cellranger count” was used to derive the gene expression matrix for each sample.
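For orientation, the two Cell Ranger steps named above can be scripted. The Python sketch below only assembles the command lines and does not run them. The flags used (`--id`, `--run`, `--csv`, `--fastqs`, `--sample`, `--transcriptome`) are standard Cell Ranger 3.x options, but every path, sample name and the reference directory are hypothetical placeholders; the authors' actual invocations are not given in the text.

```python
import shlex
from typing import List


def mkfastq_cmd(run_id: str, bcl_dir: str, sample_csv: str) -> List[str]:
    """Demultiplex a sequencing run folder (BCL) into FASTQ files."""
    return ["cellranger", "mkfastq", f"--id={run_id}",
            f"--run={bcl_dir}", f"--csv={sample_csv}"]


def count_cmd(sample_id: str, fastq_dir: str, sample_name: str, ref_dir: str) -> List[str]:
    """Align reads (STAR runs inside Cell Ranger) and build the gene-by-cell UMI matrix."""
    return ["cellranger", "count", f"--id={sample_id}", f"--fastqs={fastq_dir}",
            f"--sample={sample_name}", f"--transcriptome={ref_dir}"]


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    commands = [
        mkfastq_cmd("run1", "/data/bcl/run1", "samples.csv"),
        count_cmd("P01_tumor", "run1/outs/fastq_path", "P01_tumor",
                  "refdata-cellranger-GRCh38-3.0.0"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))  # pass the list to subprocess.run(...) to execute
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting problems when they are later passed to `subprocess.run`.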

Determination of cell types from scRNA-seq data

The Seurat (v3.1.3) R toolkit was used to analyze the scRNA-seq data. First, low-quality cells were filtered: dead or dying cells with more than 20% mitochondrial RNA content and cells with fewer than 200 UMIs were removed. Cell doublets were predicted and removed using DoubletFinder. For each patient, an expected doublet rate of 4% was assumed, 5 principal components were used, and the default value of 20% was used for pN (the number of generated artificial doublets expressed as a proportion of the merged real-artificial data). For each library, the PC neighborhood size pK was estimated as the maximum of the distribution of mean-variance-normalized bimodality coefficient scores. Cells expressing more than one of three lineage markers (CD2, CD79A, CD68) were also defined as doublets and removed. The filtered gene expression matrix of each sample was then normalized using the “NormalizeData” function, and only highly variable genes, identified with the “FindVariableFeatures” function, were retained. Next, the “FindIntegrationAnchors” and “IntegrateData” functions were used to integrate the gene expression matrices of all samples and correct batch effects between samples. The “RunPCA” function was used to perform principal component analysis (PCA), and the “FindNeighbors” function was used to construct a k-nearest-neighbor graph. The most representative principal components (PCs) were used for clustering with the “FindClusters” function to determine cell types. Finally, t-SNE was used to visualize the cell types.
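The per-cell filters above can be expressed compactly. The Python sketch below re-implements only the three threshold rules (UMI count, mitochondrial fraction, lineage-marker co-expression); it is an illustration, not the Seurat or DoubletFinder code. It assumes mitochondrial genes carry the human "MT-" prefix, and the `Cell` type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Lineage markers from the text: T cells (CD2), B cells (CD79A), macrophages (CD68).
LINEAGE_MARKERS = ("CD2", "CD79A", "CD68")


@dataclass
class Cell:
    barcode: str
    counts: Dict[str, int] = field(default_factory=dict)  # gene symbol -> UMI count


def passes_qc(cell: Cell, max_mito_frac: float = 0.20, min_umi: int = 200) -> bool:
    total = sum(cell.counts.values())
    if total < min_umi:                        # fewer than 200 UMIs: removed
        return False
    # Assumption: mitochondrial genes are identified by the human "MT-" prefix.
    mito = sum(n for gene, n in cell.counts.items() if gene.startswith("MT-"))
    if mito / total > max_mito_frac:           # more than 20% mitochondrial RNA: removed
        return False
    # Marker-based doublet rule: detecting more than one lineage marker -> removed.
    detected = sum(1 for m in LINEAGE_MARKERS if cell.counts.get(m, 0) > 0)
    return detected <= 1


def filter_cells(cells: List[Cell]) -> List[Cell]:
    return [c for c in cells if passes_qc(c)]
```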

Cell types were annotated using the following rules. Based on the top 10 differentially expressed genes identified with the “wilcoxauc” function in presto, genes such as CD2, CD3D, CD3E, CD3G and CD247 were used as T cell markers. The percentage of cells expressing CD4 or CD8A was used to define CD4+ and CD8+ T cells. CD4+ and CD8A+ T cells with TCR clonal expansion (≥ 2 T cells sharing the same TCR in a patient) were further clustered using the single-cell analysis pipeline described above. To obtain higher-resolution clusters, the “resolution” parameter of FindClusters was increased from 0.3 to 0.5. T cells from all tumor samples were then clustered with the R package “cellassign” (learning rate 0.001), using the 10 most differentially expressed genes of each expanded CD4+ and CD8+ T cell group as the training set.
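The clonal-expansion criterion (at least two T cells with the same TCR in one patient) can be sketched as follows. This is a Python illustration under the assumption that each cell has already been assigned a clonotype label; the `CellRecord` layout is hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (barcode, patient_id, clonotype); clonotype is None when no TCR was recovered.
CellRecord = Tuple[str, str, Optional[str]]


def clonally_expanded(cells: List[CellRecord], min_size: int = 2) -> List[str]:
    """Barcodes of T cells whose TCR is shared by >= min_size cells of the same patient."""
    # Clone sizes are counted per patient, so identical TCRs in two patients are not pooled.
    sizes = Counter((patient, tcr) for _, patient, tcr in cells if tcr is not None)
    return [barcode for barcode, patient, tcr in cells
            if tcr is not None and sizes[(patient, tcr)] >= min_size]
```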

TCRs detected only in tumor tissue were defined as specific TCRs. Cells carrying these TCRs were divided into CD4+ Treg, CD4+ T-specific and CD8+ T-specific cells according to the expression levels of FOXP3 and IL2RA. The remaining cells were divided into CD4+ T-shared and CD8+ T-shared cells.
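The grouping rule can be written as a small decision function. In this Python sketch the expression threshold is hypothetical, and requiring both FOXP3 and IL2RA above it to call a Treg is an assumption; the text says only that the split was made "according to the expression levels" of these genes.

```python
from typing import Set


def classify_t_cell(lineage: str, tcr_tissues: Set[str], foxp3: float, il2ra: float,
                    threshold: float = 0.0) -> str:
    """Assign one of the five T cell groups described in the text.

    lineage:     "CD4" or "CD8"
    tcr_tissues: tissues in which this cell's clonotype was detected,
                 e.g. {"tumor"} or {"tumor", "blood"}
    threshold:   hypothetical expression cutoff for FOXP3/IL2RA
    """
    if lineage not in ("CD4", "CD8"):
        raise ValueError("lineage must be 'CD4' or 'CD8'")
    if tcr_tissues != {"tumor"}:              # TCR also found outside the tumor
        return f"{lineage}+ T-shared"
    # Assumption: Treg = CD4+ cell expressing both FOXP3 and IL2RA above threshold.
    if lineage == "CD4" and foxp3 > threshold and il2ra > threshold:
        return "CD4+ Treg"
    return f"{lineage}+ T-specific"
```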

Bulk RNA-seq and WES data processing

Bulk RNA-seq reads were aligned to the hg19 reference genome with STAR, duplicates were removed with Picard, and indels were realigned with GATK4. WES reads were aligned to hg19 with BWA-MEM; duplicates were then removed with sambamba markdup, reads were realigned and base quality scores recalibrated with the GATK RealignerTargetCreator, IndelRealigner and BaseRecalibrator tools, and somatic variants were called with Mutect2 and filtered with the GATK FilterMutectCalls and SelectVariants tools. Finally, variants were annotated with ANNOVAR.

Calculation of tissue preference for each T cell cluster

We used the Ro/e value to estimate the tissue preference of each T cell cluster, as previously described.24 Ro/e is the ratio of the observed to the expected number of cells of a cluster in a given tissue, where the expected cell numbers for each cluster in each tissue are obtained from the χ2 test. Ro/e > 1 for a cluster in a tissue indicates that the cluster is enriched in that tissue.
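Under the χ2 test, the expected count for cluster c in tissue t is (row total of c × column total of t) / grand total, so Ro/e = observed / expected. A minimal Python sketch of that calculation on a cluster-by-tissue count table follows; the dictionary layout is an assumption for illustration.

```python
from typing import Dict, Tuple

Counts = Dict[Tuple[str, str], int]  # (cluster, tissue) -> number of cells


def ro_e(counts: Counts) -> Dict[Tuple[str, str], float]:
    """Observed / expected cell numbers per (cluster, tissue) cell of the table."""
    clusters = sorted({c for c, _ in counts})
    tissues = sorted({t for _, t in counts})
    row = {c: sum(counts.get((c, t), 0) for t in tissues) for c in clusters}
    col = {t: sum(counts.get((c, t), 0) for c in clusters) for t in tissues}
    total = sum(row.values())
    result = {}
    for c in clusters:
        for t in tissues:
            expected = row[c] * col[t] / total   # chi-square expected count
            result[(c, t)] = counts.get((c, t), 0) / expected
    return result
```

For example, a cluster with 30 tumor and 10 normal cells, in a dataset where half of all 100 cells come from each tissue, has an expected tumor count of 20 and therefore Ro/e = 1.5 in tumor.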

TCR analysis

Full-length TCR V(D)J segments were enriched from amplified 5′ library cDNA by PCR using the Chromium Single Cell V(D)J Enrichment Kit according to the manufacturer’s protocol (10x Genomics). TCR sequences for each T cell were assembled with the Cell Ranger vdj pipeline (v3.1.0), which identifies the CDR3 sequences and rearranged TCR genes. Analysis was performed using Loupe V(D)J Browser v2.0.1 (10x Genomics). Some T cells carried two α and/or two β chains, so four TCR forms (1α1β, 1α2β, 2α1β and 2α2β) were considered. Cells with identical chains in the same form were treated as one class and assigned a unique TCR name. In this way, a TCR diversity table containing clonotype frequencies and cell barcodes was obtained.
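One reading of the naming rule is that every distinct combination of α and β chains (within the four allowed forms) becomes one named clonotype. The Python sketch below follows that reading; keying clonotypes on CDR3 strings alone and the "TCR1, TCR2, …" naming scheme are hypothetical simplifications.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def tcr_form(alphas: Sequence[str], betas: Sequence[str]) -> str:
    return f"{len(alphas)}α{len(betas)}β"


def assign_clonotypes(cells: Iterable[Tuple[str, Sequence[str], Sequence[str]]]):
    """cells: (barcode, alpha CDR3s, beta CDR3s).

    Returns (barcode -> TCR name, TCR name -> number of cells).
    """
    names: Dict[Tuple[Tuple[str, ...], Tuple[str, ...]], str] = {}
    assignment: Dict[str, str] = {}
    for barcode, alphas, betas in cells:
        # Keep only the four forms considered in the text (1α1β, 1α2β, 2α1β, 2α2β).
        if not (1 <= len(alphas) <= 2 and 1 <= len(betas) <= 2):
            continue
        # Sorting makes the key independent of the order in which chains were reported.
        key = (tuple(sorted(alphas)), tuple(sorted(betas)))
        if key not in names:
            names[key] = f"TCR{len(names) + 1}"  # hypothetical naming scheme
        assignment[barcode] = names[key]
    return assignment, Counter(assignment.values())
```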

Trajectory analysis using Monocle 2

Monocle 2 was used to infer cell state transitions in CD8+ and CD4+ T cells. It applies reversed graph embedding to reconstruct single-cell trajectories. Briefly, UMI count matrices were used to create a CellDataSet object with the negbinomial.size expression family and otherwise default settings. Monocle 2 variable genes were selected with the following cutoffs: dispersion_empirical > dispersion_fit and mean expression > 0.001. Dimensionality reduction and cell ordering were performed using the DDRTree method and the orderCells function.
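The gene-selection cutoffs can be illustrated outside R. The Python sketch below applies the two stated criteria to rows shaped like the table returned by Monocle 2's `dispersionTable()` (columns `gene_id`, `mean_expression`, `dispersion_fit`, `dispersion_empirical`); it mirrors the filtering logic only and is not Monocle code.

```python
from typing import Any, Dict, List


def select_ordering_genes(disp_table: List[Dict[str, Any]],
                          min_mean: float = 0.001) -> List[str]:
    """Keep genes whose empirical dispersion exceeds the fitted trend and whose
    mean expression exceeds min_mean (the two cutoffs stated in the text)."""
    return [row["gene_id"] for row in disp_table
            if row["mean_expression"] > min_mean
            and row["dispersion_empirical"] > row["dispersion_fit"]]
```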

Generation of TCR-T cells

The α and β chain sequences of each sequenced TCR were linked by a 2A sequence (GSGATNFSLLKQAGDVEENPGP), synthesized, and subcloned into the pHAGE lentiviral vector expressing the surface marker mThy1.1. Lentivirus was produced in 293T cells. Briefly, 293T cells were plated in a T175 flask in Dulbecco’s modified Eagle’s medium (DMEM) with 10% FBS 1 day before transfection, to reach 70% confluence on the day of transfection. On the day of transfection, 33 μg lentiviral vector, 30 μg DR8.91 and 10.5 μg VSVG plasmid were added to 4 mL of 0.125 M CaCl2, and 4 mL of 2× HEPES-buffered saline was then added dropwise to the CaCl2-DNA mixture while vortexing to form calcium phosphate-DNA complexes. The complexes were incubated for 20 min at room temperature and then added to the 293T culture. The transfection medium was removed 6 h later and replaced with DMEM supplemented with 20% FBS. Lentiviral supernatant was harvested at 48 h and 72 h after transfection. After concentration by ultracentrifugation (20,000 rpm at 4 °C for 90 min), the lentiviral particles were resuspended in cold sterile 1× HBSS with 5% sucrose and stored at −80 °C.

Cryopreserved PBMCs isolated from patient blood were thawed and seeded at 1 × 10⁶ cells per well in RPMI 1640 supplemented with human IL-2 (200 U/mL, Peprotech, New Jersey, USA) in a 24-well plate pre-coated with anti-CD3 and anti-CD28 antibodies (5 μg/mL each, Peprotech, New Jersey, USA). After 48 h, 5 × 10⁵ activated T cells were transferred to RetroNectin (Takara, Japan)-coated 24-well plates, TCR lentivirus was added, and the plates were centrifuged at 2000 rpm for 50 min at 30 °C. After centrifugation, supernatants were replaced with fresh medium containing 200 U/mL IL-2. Transduction efficiency was measured 72 h later by flow cytometric detection of mThy1.1. Transduced T cells were fed with fresh medium and IL-2 every 3 days and cryopreserved 10 days post-transduction.

Generation of LCLs expressing minigenes of mutations

Immortalized autologous B cells (lymphoblastoid cell lines, LCLs) were generated according to the published protocol.40 Each minigene was designed to encode 12 mutant epitopes (Supplementary information, Table S2), and codon-optimized sequences were synthesized and subcloned into the pHAGE lentiviral vector expressing GFP. LCLs were transduced with minigene lentiviruses. GFP+ LCLs were sorted on a FACS Aria II and kept in culture until use in functional assays.
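The epitope length used in the minigenes is not stated here (see Table S2). The Python sketch below assumes a common tandem-minigene design, in which each missense mutation is represented by the mutant residue flanked by up to 12 wild-type residues on each side, and 12 such epitopes are concatenated per minigene. The flank length and function names are assumptions, and codon optimization is not modeled.

```python
from typing import List

FLANK = 12  # assumed residues on each side of the mutation (not stated in the text)


def mutant_epitope(protein: str, pos: int, alt: str, flank: int = FLANK) -> str:
    """Peptide window around a missense mutation; pos is 1-based."""
    idx = pos - 1
    if not 0 <= idx < len(protein):
        raise ValueError("position outside protein")
    mutated = protein[:idx] + alt + protein[idx + 1:]
    start = max(0, idx - flank)                 # window is truncated at protein ends
    end = min(len(mutated), idx + flank + 1)
    return mutated[start:end]


def build_minigenes(epitopes: List[str], per_minigene: int = 12) -> List[str]:
    """Concatenate epitopes into minigenes of up to per_minigene epitopes each."""
    return ["".join(epitopes[i:i + per_minigene])
            for i in range(0, len(epitopes), per_minigene)]
```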

In vitro tumor antigen stimulation assay

TCR-T cells (6 × 10⁵) were co-cultured with 6 × 10⁵ mixed LCLs expressing mutation minigenes at 37 °C for 16 h in RPMI 1640 supplemented with L-glutamine (2 mM), antibiotics (100 U/mL penicillin and 100 μg/mL streptomycin) and 10% FBS. Brefeldin A (10 μg/mL; BioLegend, California, USA) was added 4 h before harvest. Cells were washed twice with PBS containing 1% FBS and stained with CD3 (PE-Cy7, clone HIT3a) and mCD90.1 (PE, clone OX-7) at 4 °C for 30 min. Cells were then washed, fixed using the transcription factor buffer set, and stained with an IFNγ antibody (APC, clone 4S.B3) for 45 min at 4 °C. After two washes, cells were analyzed on a FACS Aria II flow cytometer, and data were analyzed with FlowJo software.
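The readout of this assay is the fraction of IFNγ+ cells among transduced (CD3+ mCD90.1+) T cells. The Python sketch below shows that gating arithmetic on exported event intensities; the event dictionary keys and the three gate values are hypothetical, since in practice gates are drawn in FlowJo from controls.

```python
from typing import Dict, List


def percent_ifng_positive(events: List[Dict[str, float]], cd3_gate: float,
                          thy11_gate: float, ifng_gate: float) -> float:
    """Percent IFNγ+ among CD3+ mCD90.1+ (transduced) T cells.

    Each event is a dict of compensated intensities keyed "CD3", "mCD90.1", "IFNg";
    the gate thresholds are hypothetical values set from controls.
    """
    transduced = [e for e in events
                  if e["CD3"] > cd3_gate and e["mCD90.1"] > thy11_gate]
    if not transduced:
        return 0.0
    positive = sum(1 for e in transduced if e["IFNg"] > ifng_gate)
    return 100.0 * positive / len(transduced)
```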

IHC staining and scoring

IHC staining was performed on formalin-fixed paraffin-embedded (FFPE) tumor tissue sections. Tumor tissues were fixed in 10% formalin, embedded in paraffin, and serially cut into 4-µm-thick sections for histopathological study. Sections were deparaffinized in xylene, rehydrated through absolute and 90% ethanol, and washed with distilled water. After antigen retrieval in citrate buffer (0.01 M, pH 6.4), sections were incubated in blocking solution (5% horse serum, 3% BSA, 0.1% Tween-20 in PBS) for 1 h at room temperature (RT) and then stained with antibodies against CXCL13 (R&D Systems, clone Q53X90, 1:200), CD8 (Zsbio, clone ZA0508, 1:200) and PD-L1 (CST, clone E1L3N, 1:100) at 4 °C overnight in a humidified chamber. Sections were washed several times with PBS, incubated for 1 h at RT with the secondary antibody, and developed with peroxidase-conjugated avidin/biotin and 3,3′-diaminobenzidine (DAB) substrate (Leica Microsystems). All sections were then counterstained with haematoxylin, dehydrated and mounted. Human FFPE tonsil sections were used as positive controls. IHC images were analyzed independently and blindly by three pathologists.

TLSs in the tumor area were identified as lymphocyte aggregates with histological features analogous to those of lymphoid tissue.

Criteria used for the quantification of TLS, CXCL13, PD-L1 and CD8 IHC staining include: (1) TLSs either within the tumor area or in direct contact with tumor cells at the tumor margin were counted; (2) CXCL13 expression was evaluated separately in TLSs and in sporadic (non-TLS) lymphocytes; (3) CXCL13 staining intensity was semi-quantitatively scored as positive (scored 1), any appreciable brown staining in lymphocytes or the intercellular space, or negative (scored 0), no appreciable staining in lymphocytes or the intercellular space; (4) the amount of CD8 staining was quantified with ImageJ (National Institutes of Health, http://imagej.nih.gov/ij) at 10× magnification, and a cutoff value derived from an ROC curve was used to categorize it at two levels: high (scored 1) and low (scored 0); (5) PD-L1 staining in tumor cells was semi-quantitatively categorized at two levels: positive (scored 1), any appreciable brown staining in more than 1% of tumor cells, or negative (scored 0), no appreciable staining or brown staining in less than 1% of tumor cells.
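Criteria (4) and (5) can be sketched as follows. The text does not say which outcome the CD8 ROC curve was built against, nor how the cutoff was chosen from it. This Python sketch assumes a binary outcome label and picks the cutoff that maximizes Youden's J (sensitivity + specificity − 1), a common but here assumed criterion. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def pdl1_score(percent_positive_tumor_cells: float) -> int:
    """1 = positive (staining in more than 1% of tumor cells), 0 = negative."""
    return 1 if percent_positive_tumor_cells > 1.0 else 0


def youden_cutoff(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Cutoff maximizing Youden's J over all observed values.

    values: CD8 staining amounts from ImageJ; labels: assumed 1/0 outcome.
    A value >= cutoff is called 'high'.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcome classes")
    best_cut, best_j = values[0], float("-inf")
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cut and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1          # sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def cd8_level(value: float, cutoff: float) -> int:
    """1 = high, 0 = low, relative to the ROC-derived cutoff."""
    return 1 if value >= cutoff else 0
```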

Analysis of the correlation of CXCL13 expression with TMB and ORRs

MAF files containing mutation information were downloaded from GDC (https://portal.gdc.cancer.gov/). TMB per megabase was calculated for each sample by dividing the total number of mutations by the size of the targeted coding region (38 Mb). The median TMB and median CXCL13 expression of each cancer type were calculated, and the relationship between them was analyzed with Spearman correlation. Objective response rates for each cancer type were collected as reported.28 The relationship between median CXCL13 expression and objective response rate was also analyzed with Spearman correlation.

Statistical analysis

Based on IHC staining scores, patients were divided into groups. The Kaplan-Meier method was used to estimate PFS or OS, and the log-rank test was used to compare Kaplan-Meier curves. HRs and corresponding 95% CIs were estimated using Cox regression models. χ2 tests were used to assess the associations of CXCL13 expression and TLS with age, gender and stage. All sample sizes were large enough to ensure proper statistical analysis. Statistical analyses were performed using GraphPad Prism (GraphPad Software, Inc.). A two-sided P value of less than 0.05 was considered statistically significant. All Mann-Whitney tests were two-tailed unless otherwise indicated (paired or unpaired depending on the experiment). The number of replicates (n), the number of independent experiments performed, and P values for each experiment are reported in the corresponding figure legends.
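The analyses above were run in GraphPad Prism. To make the Kaplan-Meier estimate concrete, here is a minimal pure-Python sketch of the product-limit estimator, with censoring handled as described for PFS (patients without progression are censored at their last scan). The log-rank test and Cox model are not reproduced, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 = progression observed, 0 = censored at last scan
    Returns (time, survival) at each time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= deaths + censored             # censored patients leave the risk set
    return curve
```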

DEGs between CD4+ T-shared and T-specific cells, between CD8+ T-shared and T-specific cells, and between CD4+ T-shared and CD4+ Treg cells were obtained using the R package edgeR with log2 fold change > 0.58 (approximately 1.5-fold) and P < 0.05.