Introduction

During the COVID-19 pandemic, cases of reinfection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have been reported worldwide [1,2,3]. According to the Centers for Disease Control and Prevention, reinfection is defined as at least two SARS-CoV-2 infection episodes caused by viruses of different lineages confirmed by sequencing, or a second positive test result obtained using a specimen collected >90 days after the first one, except in the case of severely immunocompromised individuals [4]. However, this definition may be revised as more information becomes available.

Re-detection of viral RNA in patients has been associated with either reinfection or reactivation (incomplete virus elimination leading to positive results after a negative test) [5, 6]. To distinguish between these possibilities, reinfection and/or reactivation should preferably be confirmed by viral genome sequencing to evaluate if the viruses causing primary and secondary infections are of different lineages [7, 8].

Understanding the natural course of reinfection and reactivation is pivotal for developing control strategies for COVID-19. We therefore used high-throughput sequencing of samples from four putative cases of COVID-19 reinfection from a database composed of more than 30,000 tests from a region of southern Brazil between March and November 2020, before the beginning of the national vaccination campaign. These genome sequences were analyzed to identify the mutational patterns of the viruses from the first and second infections and reconstruct their evolutionary history.

Materials and methods

The Clinical Epidemiology Laboratory (Epiclin) is located at the Federal University of Health Sciences of Porto Alegre (UFCSPA) and funded by Associação Hospitalar Moinhos de Vento (AHMV). The Institutional Review Board of Moinhos de Vento Hospital approved this study under protocol number 32149620.9.0000.5330, following the procedures set out in national guidelines for ethics committees. At Epiclin, 31,973 tests for SARS-CoV-2 were carried out between April and December 2020. Nasopharyngeal and oropharyngeal swab samples were obtained from four regions belonging to the Northeast and Metropolitan mesoregions of Rio Grande do Sul, southern Brazil. We identified (by name and date of birth) four symptomatic patients who had experienced at least two distinct episodes of COVID-19 separated by an interval of at least 45 days and who tested positive based on amplification of the three viral genes targeted by the Allplex™ SARS-CoV-2 Assay (Seegene, South Korea) (Supplementary Fig. S1).
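The screening step described above (matching tests by name and date of birth and requiring at least 45 days between positive results) can be sketched as follows. This is only an illustrative sketch: the record fields (`name`, `birth_date`, `collected`, `positive`), the function name, and the example records are hypothetical, and the laboratory's actual database query is not described here.

```python
from collections import defaultdict
from datetime import date

MIN_INTERVAL_DAYS = 45  # minimum interval between episodes stated in the text


def find_candidates(tests, min_interval_days=MIN_INTERVAL_DAYS):
    """Return {patient_key: (first_positive, last_positive)} for patients whose
    first and last positive tests are at least `min_interval_days` apart.

    `tests` is an iterable of dicts with the hypothetical keys
    'name', 'birth_date', 'collected' (datetime.date) and 'positive' (bool).
    """
    positives = defaultdict(list)
    for t in tests:
        if t["positive"]:
            # Patients are matched by normalized name plus date of birth.
            key = (t["name"].strip().lower(), t["birth_date"])
            positives[key].append(t["collected"])

    candidates = {}
    for key, dates in positives.items():
        dates.sort()
        if (dates[-1] - dates[0]).days >= min_interval_days:
            candidates[key] = (dates[0], dates[-1])
    return candidates


if __name__ == "__main__":
    # Invented records for illustration only.
    records = [
        {"name": "Patient A", "birth_date": date(1980, 1, 1),
         "collected": date(2020, 5, 5), "positive": True},
        {"name": "Patient A", "birth_date": date(1980, 1, 1),
         "collected": date(2020, 5, 25), "positive": False},
        {"name": "patient a ", "birth_date": date(1980, 1, 1),
         "collected": date(2020, 7, 2), "positive": True},
    ]
    print(find_candidates(records))
```

Candidates flagged in this way are only putative reinfections; as noted in the Introduction, sequencing is needed to distinguish reinfection from reactivation.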

Sequencing libraries were constructed using a QIAseq SARS-CoV-2 Kit (QIAGEN, Hilden, Germany). Sequencing was performed using an Illumina MiSeq instrument and a MiSeq Reagent Kit v3 (600 cycles) (Illumina, USA) as described by Sant'Anna et al. [9].

A variant-calling pipeline, available at https://github.com/fhsantanna/variant_calling_pipeline, was built using Snakemake. Reads were preprocessed using fastp version 0.23.2, which performed quality trimming and removed sequencing adapters. Subsequently, preprocessed reads were mapped to the SARS-CoV-2 archetype genome (MN908947) using BWA version 0.7.17-r1188. Next, optical duplicates were removed using MarkDuplicates from the GATK package version 4.2.0.0. Deduplicated reads were then trimmed for ARTIC primers using iVar version 1.3.1. Trimmed reads were assembled using SAMtools version 1.15, and variants were called using iVar with a minimum read depth of 5. Variants were filtered using iVar with default parameters. Finally, consensus sequences were generated using iVar, with a minimum read depth of 5. Consensus sequences were evaluated using the COVID-19 genome annotator (http://giorgilab.unibo.it/coronannotator/). Mutations that were not found in the variant-calling step were removed manually from the consensus sequence.
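To illustrate the last two steps, the sketch below shows a simplified depth-thresholded consensus call and the reversion of consensus substitutions that have no supporting variant call. It is not iVar's algorithm and not the code of the pipeline cited above: the majority-base rule, the function names, and the toy inputs are assumptions made for illustration, and indels are ignored.

```python
from collections import Counter

MIN_DEPTH = 5  # minimum read depth stated in the text


def call_consensus(pileup, min_depth=MIN_DEPTH):
    """Simplified consensus: majority base per position, 'N' below min_depth.

    `pileup` is a list of strings, one per reference position, each holding
    the bases observed in the reads covering that position.
    """
    consensus = []
    for bases in pileup:
        if len(bases) < min_depth:
            consensus.append("N")  # too few reads to call this position
        else:
            consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


def revert_unsupported(reference, consensus, called_positions):
    """Revert consensus substitutions that lack a variant call.

    Positions are 0-based. Masked positions ('N') are left untouched.
    Assumes `reference` and `consensus` have the same length (no indels).
    """
    out = []
    for i, (ref, con) in enumerate(zip(reference, consensus)):
        if con != ref and con != "N" and i not in called_positions:
            out.append(ref)
        else:
            out.append(con)
    return "".join(out)


if __name__ == "__main__":
    toy_pileup = ["AAAAA", "CCC", "GGGGT", "", "TTTTTT"]
    print(call_consensus(toy_pileup))
    print(revert_unsupported("ACGTA", "ATGNC", called_positions={1}))
```

In this toy setting, positions covered by fewer than five reads are masked rather than filled with the reference base, so low-coverage regions cannot be mistaken for the absence of mutations.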

For phylogenetic reconstruction, all available SARS-CoV-2 sequences from the year 2020 from the cities where the putative reinfection cases occurred were used as references. These sequences were downloaded from GISAID in July 2021 and analyzed together with the genome sequences from this study. The sequence Wuhan/2019 was also included in the dataset. Sequences were aligned using MAFFT version 7.480 with the FFT-NS-2 option and default parameters. A phylogenetic tree was built using IQ-TREE version 2.1.4-beta, with the best-fit model determined automatically (option -m MFP) and ultrafast bootstrapping with 1,000 replicates. Next, time-tree inference and a "mugration" analysis using discrete PANGO lineages were performed using TreeTime version 0.8.1. Subsequently, the tree was plotted using a script written in R (ggtree library), with branches colored according to PANGO lineage.

Results and discussion

Among the 31,973 samples in our collection, we found 11 putative reinfection cases, and in four of these cases, all of the viral targets were amplified in the diagnostic assay. This is consistent with previous studies showing reinfection to be a rare event within a short timeframe (~one year) [10,11,12].

The four patients were between 33 and 76 years old, symptomatic, and unvaccinated against SARS-CoV-2, and three of them were male (Table 1). The first and last samples from patients P1, P2, and P3 were collected approximately three months apart (September and November 2020). The four samples from patient P4 were collected on 5, 18, and 25 May and on 2 July 2020 (~60 days between the first and fourth collections). The third sample from P4 (25 May 2020) gave a negative result in the RT-PCR test.

Table 1 Data for samples included in this study

We sequenced all nine of the positive samples, but the data from one of them were of insufficient quality for further analysis. We therefore conducted a comparative analysis of the remaining eight genome sequences. For generating the consensus sequences of the viral genomes, 3,276 to 1,662,544 reads were mapped to the reference sequence (Table 1). The breadth of coverage ranged from 37.2% to 99.98%, and the mean depth of coverage ranged from 10.61- to 6791.66-fold (Table 1).
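For reference, breadth and mean depth of coverage can be computed from per-position read depths as in the minimal sketch below. The depth threshold used to count a position as covered is not stated here, so the default of 1 is an assumption, as are the function name and the toy depth values.

```python
def coverage_stats(depths, min_depth=1):
    """Return (breadth_percent, mean_depth) for a list of per-position depths.

    Breadth is the percentage of reference positions with depth >= min_depth
    (the threshold of 1 is an assumption); mean depth is averaged over all
    reference positions, including uncovered ones.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    breadth_percent = 100.0 * covered / n
    mean_depth = sum(depths) / n
    return breadth_percent, mean_depth


if __name__ == "__main__":
    toy_depths = [0, 10, 20, 0, 30]  # invented values, five positions
    breadth, mean_depth = coverage_stats(toy_depths)
    print(f"breadth = {breadth:.1f}%, mean depth = {mean_depth:.2f}-fold")
```

A genome can therefore have a high mean depth but a low breadth if reads pile up on a few amplicons, which is why both values are reported in Table 1.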

The timetree shown in Figure 1 presents the relationships between the episodes of infection in a temporal framework and highlights their lineage groups. Two of the four cases (P1 and P3) exhibited infections with distinct lineages: B.1.1.28 and B.1.1.33 in the first (sample 1, S1) and second infection (S2), respectively. For patient P2, the sequence obtained from sample S1 grouped with the B.1.1.28 lineage, but the S2 sequence did not group with the B.1.1.28 or B.1.1.33 lineage. In the fourth case (P4), three positive samples (S1, S2, and S4) were sequenced, and two sequences (S1 and S4) were of sufficient quality for phylogenetic reconstruction. Both sequences clustered within the B.1.1.33 lineage. The period from March to November 2020 represents the first wave of the COVID-19 pandemic in southern Brazil. During that time, B.1.1.33 and B.1.1.28, descendants of the B.1 lineage, were the main circulating lineages [9, 13, 14].

Fig. 1 Time-scaled phylogenetic tree of SARS-CoV-2 reinfection cases. Geometrical shapes indicate the patients who were reinfected (legend). Internal branches are colored according to the most probable PANGO lineage of the last common ancestor of the clade (legend).

Considering only regions covered in both viral genomes, comparison of the sequences from samples S1 and S2 of patients P1, P2, and P3 showed a total of two, seven, and 12 distinct nucleotide (nt) substitutions, respectively, while the two sequences from each patient shared one, two, and four nt substitutions, respectively (Fig. 2).
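The logic of this comparison can be sketched with set operations, as shown below. The sketch assumes that "distinct" means substitutions present in only one of the two samples and "shared" means substitutions present in both; the function name, the data layout, and the toy positions are hypothetical and do not correspond to the mutations reported here.

```python
def compare_substitutions(subs_a, subs_b, covered_a, covered_b):
    """Compare two samples' substitutions within mutually covered regions.

    subs_*    : dict mapping genome position -> alternate base
    covered_* : set of positions with sufficient coverage in that sample
    Returns (shared, distinct) as sets of (position, base) tuples.
    """
    both = covered_a & covered_b  # positions covered in both genomes
    a = {(pos, base) for pos, base in subs_a.items() if pos in both}
    b = {(pos, base) for pos, base in subs_b.items() if pos in both}
    shared = a & b    # same substitution in both samples
    distinct = a ^ b  # substitution seen in only one sample
    return shared, distinct


if __name__ == "__main__":
    # Invented positions and bases for illustration only.
    s1 = {100: "T", 200: "G", 300: "A"}
    s2 = {100: "T", 250: "C", 400: "G"}
    cov1 = set(range(0, 350))
    cov2 = set(range(50, 500))
    shared, distinct = compare_substitutions(s1, s2, cov1, cov2)
    print(len(shared), "shared;", len(distinct), "distinct")
```

Restricting the comparison to mutually covered positions avoids counting a substitution as distinct merely because the corresponding region was not sequenced in the other sample.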

Fig. 2 Mutation profile of the genome sequences of each patient's samples. Nucleotide substitutions are represented by colored bars. Black bars indicate regions not covered in both genome sequences, while grey bars indicate positions not included in the consensus sequence due to low coverage depth.

Some of these substitutions resulted in amino acid (aa) changes in the encoded viral proteins. From patient P1, sample S1 contained the substitution ORF8:R115L. From P2, S1 contained ORF6:I33T and N:A134V and S2 contained NSP3:N1636S and NSP12b:A176P. From P3, S1 contained NSP15:A171V, S:V1176F, ORF3a:G224C, and ORF8:I121L, and S2 contained NSP6:L37F, NSP15:Q19H, S:S939F, and ORF6:I33T.

The sequences of the first (S1) and fourth (S4) samples from patient P4 showed four distinct nt substitutions, resulting in the aa changes NSP7:Q34H, S:Q675R, NSP3:V1635G, and NSP3:N1636S (Fig. 2). The phylogenetic reconstruction showed that these sequences were in the outermost clade of the B.1.1.33 lineage. The S3 sample from this patient was negative, suggesting that patient P4 was reinfected after that sample was collected. However, we cannot rule out a reactivation, since we did not have access to the clinical data and outcomes of the patients, and both sequences clustered in the same lineage [4, 6]. Another limitation of this study is the low quality of two of the eight genome sequences determined, which were not accepted by genomic databases (GISAID and GenBank; Supplementary Information). The low sequence quality may be associated with a low viral copy number, as suggested by high cycle threshold (Ct) values.

An individual's immune response might be deficient in episodes of reinfection, allowing the second virus to evade the limited and transitory immunity induced by the primary infection [1, 15, 16]. Decreased immunity to SARS-CoV-2 over time selects for immune escape variants, which can be a threat to vaccine strategies [7, 17, 18].

Here, we describe three cases of confirmed reinfection and one case of suspected reinfection in southern Brazil using genomic and phylogenetic analysis. Genomic monitoring is a crucial tool for differentiating between reactivation and reinfection and for providing information to guide public health systems.