Introduction

COVID-19 is a potentially severe disease caused by SARS-CoV-2. As of March 30, 2021, SARS-CoV-2 has affected 128,339,981 people, resulting in 2,806,701 million people deaths worldwide. COVID-19 not only endangers people's health but also affects the economy (Bolay et al. 2020). Clinically, COVID-19 presents with influenza-like symptoms of headache, fever, dry cough, diarrhea, fatigue, pneumonia, conjunctivitis, and gastrointestinal symptoms such as diarrhoea (Abobaker et al. 2020; Berthelot et al. 2020; Burhan et al. 2020). Although the efforts and resources spent by clinicians and scientists trying to explore antiviral drugs and vaccines (Lundstrom 2020; Sharma et al. 2020; Wiersinga et al. 2020), there is still no effective clinical treatment for COVID-19 to date. Therefore, there is an urgent need to better understand the host immune response in SARS-CoV-2 virus infection that will offer fundamental insights into the treatment and design effective therapeutic interventions for the disease.

In order to mount effective adaptive responses against pathogens, the human adaptive immune system harbors a vast array of T-cell receptors (Hou et al. 2019a, b, c). High-throughput sequencing (HTS) of T cell receptors is a useful tool for analyses of T lymphocytes' clonality and diversity (Hou et al. 2019a, b, c). In this study, we performed a comprehensive immunological analysis to explore the extent to which SARS-CoV-2 influences the TCR repertoire, including diversity of TCR, CDR3 frequency distribution, and CDR3 length distribution, V/J usage, rearrangement sites, and overlap indices. Meanwhile, we identified a list of TCR sequences that may be correlated with SARS-CoV-2 viral antigens in COVID-19 patients, which could be used to distinguish COVID-19 patients from healthy controls accurately. These results provided a link between COVID-19 and TCR repertoires and helped prevent the occurrence or progression of COVID-19 disease with timely diagnosis and treatment.

Materials and methods

Biological materials

All TCR sequences used here evaluation was downloaded from immuneaccess (https://clients.adaptivebiotech.com/pub/covid-2020) (Nolan et al. 2020). As described previously, this is an exploratory study of unselected eligible participants baseline to, infected with, or recovering from COVID-19. Whole blood samples were collected under the INCOVE project at Providence St. Joseph Health (Seattle, WA), and patients were enrolled during the active phase and monitored through disease. This study includes T-cell repertoire data from 139 samples, including 60, 65, 14 samples collected from baseline, acute, and convalescent stages, respectively. COVID-19 patients aged 23 to 89 years (mean ± SD, 60.30 ± 17.54) were consented and enrolled via a virtual study design. This study was approved by the Western Institutional Review Board (WIRB reference number 1–1,281,891-1, Protocol ADAP-006). All the participating individuals provided their written informed consent. In addition, the TCR sequences for 21 healthy subjects were obtained from Adaptive Biotechnologies immuneACCESS siIte (https://doi.org/10.21417/B7SG6T). These healthy subjects were tested negative for anti-HIV antibodies, anti-hepatitis B surface antigen (anti-HBsAg) antibodies and exhibited no laboratory or clinical signs of other immunological disorders or infectious diseases. Among these 21 healthy donors, ten were males and eleven were females, and had a mean age of 56.61 ± 16.34 years.

High-Throughput Sequencing

Whole blood samples blood were collected in EDTA tubes and sent to isolated peripheral blood mononuclear cells (PBMC) and DNA extracted for TCRβ analysis via the immunoSEQ (Nolan et al. 2020). In briefly, extracted genomic DNA was amplified using a bias-controlled multiplex-PCR system, followed by high-throughput sequencing. Raw data processing and analysis were performed through immunoSEQ. Then, demultiplexed reads were processed to reduce amplification and sequencing bias. Subsequently, we used the international ImMunoGeneTics information system (IMGT; http://imgt.cines.fr) database for the alignment. According to the criteria of the international ImMunoGeneTics Collaboration, the TCRβ CDR3 region starts at the second conserved cysteine encoded by the 3’ position of the Vβ gene segment and ends with the conserved phenylanine encoded by the 5’ position of the Jβ gene segment. The number of amino acid between these codons determines the CDR3 length. According to the identity of each sequence after alignment, the relative abundance of each TCRβ CDR3 sequence was clear and calculated. The percent frequency of each distinct DNA sequence, amino acid sequence and V-J combination was also identified. Additionally, we used batch correction to eliminate the batch effect of different datasets. Subsequently, multiple TCR data statistics were performed included TCR repertoire diversity, V/J usage, V-J pairing, CDR3 length distribution, and the length distribution of n1Insertion, vDeletion, d5Deletion, d3Deletion, jDeletion, n2Insertion were identified as previously described (Gomez-Tourino et al. 2017; Hou et al. 2019a, b, c; Huang et al. 2019).

Calculation of diversity indices

The immune repertoire was characterized by examining the diversity and clonality. Analysis of the immune repertoire involves several diversity indexes, such as the D50 index, Shannon index, Simpson index and Gini index (Hou et al. 2020; Marcon et al. 2014; Venturi et al. 2007).

Analysis of deletion and insertion patterns

For each of the 6 rearrangement sites (Vdel, D5del, D3del, Jdel, n1ins, and n2ins) the frequency of clonotypes bearing a particular number of indels was calculated for cells of each samples. The indel patterns are shown classified into germline, short, medium and long. Indels were classifified into length of the relevant rearrangement event. A cumulative frequency was calculated for each of classes and rearrangement sites.

Calculation of overlap indices

Interindividual sharing TCR repertoires were quantifified by calculating the number of common amino acid sequences in each sharing category (overlap (X, Y) =|X and Y|/min (|X|, |Y|) for amino acid sequences).

ROC curve analysis and disease-associated-clones definition

Receiver operating curve (ROC) analysis was performed to evaluate the value of disease-associated TCRβ clones to differentiate between cases and controls. Disease associated clones were defined as those TCRβ presented in at least four COVID-19 patients and less than three healthy individuals. Exhausted leave-one-out cross-validation was used to assess the identifier’s performance during model training. More specifically, given that there were N samples in each group, 4/5 N samples were used as training data and processed using the above classification model. The remaining 1/5 N samples were used as testing data to perform the classification. The cross-validations were repeated 5 N times until every sample had been used as testing data five times (Huang et al. 2019). The area under the ROC curve (AUC) values were calculated using the predicted probability of a given TCRβ chain belonging to the COVID-19 population. The parameter in which the highest AUC value was determined, and the points where the sum of sensitivity and specificity levels reached the maximum were accepted as the best cut-of values (Huang et al. 2019).

Statistical analysis

For comparisons of two groups, statistical analyses were conducted using the Mann–Whitney U-test or unpaired t-test. Data were presented as the mean ± SD values or as percentages (%). Statistical analyses were performed using SPSS20.

Results and discussion

The diversity of TCRβ repertoires was relatively low in COVID-19 patients

The TCRβ repertoires of different groups (baseline, acute, and convalescent, and healthy controls) were examined. Diversity indices were calculated, consisting of the D50 index, Shannon index, Simpson index, and inversesimpson index (Fig. 1). For the Simpson index, the smaller the index was, the higher diversity had. Besides, for inversesimpson index, D50 index, and Shannon index, an increased value indicates greater diversity. According to the Fig. 1, there was a lower TCR diversity in COVID-19 patients. It is worth noting that the diversity of the convalescent group was lower than that of the baseline group and the acute group. The reason for this phenomenon may be that the TCR diversity of convalescent group had not convalescent during this period.

Fig. 1
figure 1

The diversity of TCRβ repertoire in COVID-19 patients from baseline, acute, and convalescent group, and healthy controls (HC) are measured by the Shannon index (a), D50 index (b), inversesimpson index (c), and Simpson index (d). Data are presented as the mean ± SD values and compared using the t-test. *P < 0.05, **P < 0.01, ***P < 0.001(two-tailed)

Abnormal TRBV/J gene usage in COVID-19 patients

TCR gene rearrangement generates a diversity of T lymphocytes by V(D)J recombination. We identified disease-related changes of TRBV and TRBVJ gene usage in COVID-19 patients (Fig. 2). The results showed that there were more differentially expressed TRBV and TRBJ gene subfamilies in baseline group, and acute group, and less in the convalescent group compared with the healthy control group. The results showed that TRBV10, and TRBV2 exhibited significantly lower expression in COVID-19 patients, while TRBV12, TRBV20, TRBV24, TRBV3, and TRBV9 showed higher expression in COVID-19 patients compared with that in healthy controls, whether in the baseline, acute, and convalescent group (Fig. 2a–c). Besides, TRBV6 was lower expressed, and TRBV15, TRBV23, and TRBV29 were higher expressed in the baseline and acute group, no difference in the convalescent group. Regarding the degree of TRBJ usage, TRBJ2-5 showed higher usage, while TRBJ1-3 and TRBJ1-4 showed significantly lower usage in COVID-19 patients when compared to healthy control, whether in the baseline, acute, and convalescent group (Fig. 2d–f). Moreover, TRBJ1-2 and TRBJ1-6 showed significantly higher usage, while TRBJ1-1 and TRBJ2-6 showed significantly lower usage in the baseline and acute group, no difference in the convalescent group. In addition, COVID-19 patients could be distinguished from healthy controls by a principal coordinate analysis (PCA) based on TRBV segment usage (Fig. 2g), indicating that TRBV gene expression changes were involved in the pathogenesis of COVID-19.

Fig. 2
figure 2

Usage frequency of TRBV/J segments was different between COVID-19 patients and healthy controls. a–c Significant differences of TRBV gene usage in COVID-19 patients from baseline (a), acute (b), and convalescent (c) group, compared with that in healthy controls (HC). d–f Differences of TRBJ gene usage in COVID-19 patients from baseline (d), acute (e), and convalescent (f) group, compared with that in healthy controls. Data were statistically analyzed by t-test and FDR correction. Principal coordinate analysis (PCA) based on TRBV segment usage showed profound differences between COVID-19 patients and healthy controls (f)

Moreover, we further analyzed the differential expression TRBJ gene subfamilies among the three disease groups (baseline, acute, and convalescent). TRBJ1-3 and TRBJ1-4 were significantly higher expression, TRBJ2-2 were significantly lower expression in baseline and acute group, compared with a convalescent group (Fig. 3a, b). In addition, three-dimensional landscapes of TRBV gene usage in conjunction with TRBJ usage were plotted. We found the top three V-J combinations of covid-19 patients (including baseline, acute, and convalescent group) were TRBV7-TRBJ2-7, TRBV5-TRBJ2-7, TRBV6-TRBJ2-7 (Fig. 3c–e). However, the top three V-J combinations in healthy controls were TRBV6-TRBJ1-1, TRBV5-TRBJ1-1,TRBV6-TRBJ2-7 (Fig. S1).

Fig. 3
figure 3

Usage frequency of TRBV/J segments was different among the three different infected stages of COVID-19 disease. a, b Significant differences of TRBJ gene usage between baseline and convalescent group (a), Significant differences of TRBJ gene usage between acute and convalescent group (b). Data were statistically analyzed by t-test and FDR correction. Three-dimensional landscapes depicting the mean utilization frequency of specific Vβ–Jβ gene segment combinations in the TCRβ chains expressed in patients with COVID-19 from the baseline (c), acute (d), and convalescent (e) group

Divergence of CDR3 length distribution and recombination events (InDels) in COVID-19 patients

The analysis of CDR3 length variation might reveal the structure–function relationships of different TCRs, recognizing specific antigens (Hou et al. 2016). Importantly, we found there were significant differences of TCRβ CDR3 length distribution in COVID-19 patients compared with that in HCs (Fig. 4a–c), whether in the baseline, acute or convalescent group. Moreover, there were significant differences among the three different infection stages of COVID-19 patients (Fig. S2). CDR3 diversity is generated by the extensive genomic rearrangement that takes place among TRBV, TRBD, and TRBJ gene segments and by the nucleotide insertions and deletions (InDels) within the junction of V-D-J (Gomez-Tourino et al. 2017). Therefore, to understand the molecular basis for these features, we analyzed the Indel nucleotides in the 6 rearrangement sites (Videos, Jdels, D3dels, D3dels, n1ins, and n2ins), and found that length distribution of the six recombination events (indels) was significantly different between COVID-19 patients and healthy control, whether in the baseline (Fig. S3), acute (Fig. 4d–f, Fig. S4) or convalescent group (Fig. S5). The differences were mainly concentrated in germline (no deletions/insertions) and long d5dels and Jdels. Moreover, there were significant differences among the three different infection stages of COVID-19 patients (Figs. 4g–i, S6).

Fig. 4
figure 4

Altered CDR3 length distribution and InDel patterns in COVID-19 patients. a–c Altered CDR3 length in COVID-19 patients from the baseline group (a), acute group (b), and convalescent group (c), compared with healthy controls. d–f TCRβ CDR3 sequences from acute COVID-19 patients and healthy controls were analyzed for the number of insertions and deletions (InDels) at each of the three rearrangement sites, D3del (d), D5del (e), Jdel (f). The deletions were classified as short, medium or long depending on their frequency in the overall distribution for each of the 3 rearrangement sites. G: germline (no deletions/insertions). Short: 1–11 (Jdel), 1–5 (D5del, D3del). Medium: 6–9 (D5del), 6–8 (D3del), 12–18 (Jdel). Long: 10–11 (D5del), 9–10 (D3del), 19–25 (Jdel). g–i TCRβ CDR3 sequences of COVID-19 patients from baseline and acute group were analyzed for the number of InDels at each of the three rearrangement sites, D3del (g), D5del (h), Jdel (i)

Overlap index of TCRβ CDR3 increased in COVID-19 patients

TCRβ clonotypes that are shared between individuals are thought to play a vital role in the efficacy of pathogen-specific responses and infection control (Zhao et al. 2016). We found the overlap indices of TCRβ CDR3 amino acid sequences were substantially higher in COVID-19 patients than that in HCs (Fig. 5a–c). However, the overlap index was similar among the three different infection stages of COVID-19 patients (Fig. S7). Notably, we found 522, 777, 1022 amino acid sequences shared by 80% of the COVID-19 patients from the baseline group, acute group, and convalescent group, respectively. Besides, sharing TCR repertoires between any two samples in each group was also quantified (Fig. 5d–f).

Fig. 5
figure 5

TCRβ CDR3s were highly shared across COVID-19 patients. a–b Overlap indices for TCRβ sequences are higher in COVID-19 patients than in healthy control, whether in the baseline group (a), acute group (b), or convalescent group (c). d–f The overlap level of TCRβ CDR3s between any two samples in each group is calculated in COVID-19 patients from baseline group (d), acute group (e), and convalescent group (f)

Diagnose of COVID-19 with the disease-associated CDR3s

Finally, we intended to identify the COVID-19 disease-related specific clones and used them to construct the ROC curve to distinguish COVID-19 patients from healthy controls. We screened out 15, 15, 30 TCRβ CDR3 clones that were abundantly and widely presented in the COVID-19 patients from the baseline, acute, or convalescent group, respectively, but rarely appear in the healthy controls. We defined these clones as COVID-19 disease-associated TCRβ CDR3 clones, and found that the unique and total disease-associated TCRβ clones frequencies were significantly higher in COVID-19 patients than in healthy controls (Figs. S8, 6ac). In the baseline group, these 15 disease-related TCRβ clones could separate COVID-19 patients from healthy control with a sensitivity of 95.58%, a specificity of 89.32%, and an accuracy of 96.67% (AUC (the area under the ROC curve) = 63.2%) (Fig. 6d). The same analysis was carried out in the acute and convalescent group, with a sensitivity of 96.28%, a specificity of 92.26% and an accuracy of 96.83% (AUC = 63.34%) in the acute group (Fig. 6e), and a sensitivity of 78%, a specificity of 95.98% and an accuracy of 92.0% (AUC = 72.84%) in convalescent group (Fig. 6f). In addition, we identified some TCRβ CDR3 clones which changed with different infection stage. Among these, the abundance of 14 clones increased with the disease's progression (baseline-acute-convalescent) (Fig. 7a). The abundance of 13 clones increased to a peak from the baseline stage to the acute stage, and then decreased from the acute stage to the convalescent stage (Fig. 7b).

Fig. 6
figure 6

Disease-associated TCRβ clones can distinguish COVID-19 patients with healthy control. a–c The total frequency of disease-associated TCRβ clones are significantly higher in PBMC of COVID-19 patients from baseline group (a), acute group (b), and convalescent group (c) than that in healthy controls. d–f The ROC curve of classification of healthy controls and COVID-19 patients from baseline group (d), acute group (e), and convalescent group (f) according to disease-associated TCRβ clones. Data are presented as the mean ± SD values and compared using the t-test. ***indicate P values less than 0.001

Fig. 7
figure 7

The heat map showed the changes in the expression level of disease-associated TCRβ CDR3 clones during different stages of infection. The abundance of 14 clones was increased in COVID-19 patients from baseline to acute stage to convalescent stage (a). The abundance of 13 clones was increased in COVID-19 patients from baseline stage to acute stage, and then decrease from acute stage to convalescent stage (b)

Discussion

COVID-19 became a severe pandemic, widespread in the world. As of March 30, 2021, the number of global infections is estimated to be over 128 million, with over 2.8 million deaths. The adaptive immune responses are essential for SARS-CoV-2 virus clearance. Understanding humoral responses to SARS-CoV-2 is critical for improving diagnostics, therapeutics, and vaccines (Du and Yuan 2020; Shrock et al. 2020). In this study, we used ImmunoSEQ-technology to characterize the TCR repertoire features in PBMCs of COVID-19. Results indicated that the patients with COVID-19 showed lower diversity than healthy controls, although the difference was not statistically significant. Moreover, our analysis results TRBV/J gene exhibited significantly different expression in COVID-19 patients. The results may be associated with immune-related response to virus antigen. In response to disease-related antigen stimulation, T cells become activated and undergo clonal expansion. However, the total number of immune cells generally remains within a relatively stable range to maintain its normal physiological functions. The increased number of dominant T cells leads to a decreased number of other T cells (Sui et al. 2015). As a result, COVID-19 patients had relative increases in the proportion of some TRBV(J) gene segments and decreased others.

TCR CDR3 loops can vary in both length and sequence, and this gives them the ability to recognize a diverse set of antigens (Afik et al. 2017; Pickman et al. 2013). Indeed, we found that COVID-19 patients tend to select longer clonotypes compared with healthy controls. Recombination events generate diversity in antigen recognition in the CDR3 region of the TCR. We also analyzed the deletions or insertions of 6 rearrangement sites, and results showed that the Indel lengths distribution was different between COVID-19 patients and healthy controls. Different rearrangements may result in longer CDR3 lengths in COVID-19 patients.

More generally, the higher proportion of shared clones among abundant clonotypes could result from peripheral selection and expansion of antigen-specific clonotypes driven by persistent pathogens (Chandran et al. 2015). We found that the overlap degree of the TCR repertoire was significantly higher in COVID-19 patients. It may be that SARS-CoV-2 viruses foster disease-associated public repertoire formation. If so, specific public sequences may prove useful for the longitudinal monitoring of immune responses during COVID-19 diseases. In simian immunodeficiency virus (SIV)-infected rhesus macaques, the emergence of public Gag-specific clonotypes correlates with protection, and the absolute number of shared clonotypes inversely correlates with viral load (Price et al. 2009). So, disease-associated public TCRs may explain certain infections' susceptibility, correlations of pathogenesis, and clinical manifestations of the disease. Besides, some specific public sequences can be applied toward developing novel molecular biomarkers potentially useful for accurate disease classification. Huang et al. identified a set of TCRβ and IGH clones, which can be used to distinguish IgAN from NIgAN and healthy controls with high accuracy (Huang et al. 2019). In this study, we also identified a set of SARS-CoV-2 associated TCRβ clones which can distinguish patients with COVID-19 from healthy control with an accuracy rate of more than 92%. However, the main limitation of our study is the relatively low sample number. Hence, to estimate whether the biomarker performs well, validation studies with large numbers of cases in other ethnic populations are required.

In conclusion, we present a comprehensive overview of TCRβ CDR3 repertoire in COVID-19 patients. These analyses provide information to distinguish COVID-19 patients and healthy controls. More importantly, this provided a basis for using it as a new biomarker in the future.