Introduction

Following natural infection or vaccination, sensitive measurement of SARS-CoV-2 serological status is important to identify immune correlates of protection from future waves of the pandemic, evaluate those in need of booster vaccination and identify candidates for SARS-CoV-2 antibody therapy. The rapid response to the COVID-19 pandemic has led to the development of a wide range of serological tests suitable for evaluating SARS-CoV-2 exposure, infection or vaccination status1,2,3. Typically, these tests are approved for use by the regulatory authorities based on their performance against a panel of reference sera, including positive and negative controls, at either 14 or 21 days post infection4.

Public Health England reported a 93.9% sensitivity for the Abbott SARS-CoV-2 IgG Nucleoprotein assay5 and 100% for the Roche Elecsys Nucleoprotein assay at ≥ 14 days post infection6. This led to widespread adoption of these tests across NHS laboratories for testing at population level. Other studies have confirmed this test performance at 14–21 days post infection7,8. Population level serological studies have also based their conclusions, which are vital to guiding national policy, on these tests9 without considering how time since infection influences the performance of the test. The problem with this approach is that it does not take into account SARS-CoV-2 humoral dynamics and changes in avidity over time10,11. Although serological tests with limited diagnostic range may demonstrate excellent sensitivity shortly after infection, it is unclear how they will perform as time passes following infection or vaccination.

In order to address this question, we applied 4 widely used serological assays in parallel to serial samples from the Co-STARs study12 in which staff testing seropositive to SARS-CoV-2 were followed for up to 200 days following infection. We compared the proportion of samples that remained seropositive over time using a survival analysis and determined the decay rate of the nucleoprotein (N) antibody and the spike (S) antibody for each test using a previously published mathematical model fitted to the data.

Materials and methods

Study setting and design

Serological testing was performed on stored serum samples collected as part of the Co-STARs study (www.clinicaltrials.gov, NCT04380896), approved by the UK National Health Service Health Research Authority and run at Great Ormond Street Hospital between April 29th and November 2020 in accordance with the relevant guidelines10. Briefly, Co-STARs was a 1-year single-centre prospective cohort study of antibody responses to COVID-19 infection in healthcare workers. Serum samples were taken from the 3657 participants at baseline and underwent a screening ELISA using the EDI assay. Repeated monthly serum samples were then taken from those with a seropositive baseline screening test for up to 250 days after the date of infection. Written informed consent was obtained from all participants. Those samples identified as seropositive with available symptom start date had further confirmatory testing with the quantitative three antigen MSD assay.

Study participants

The majority of hospital staff were eligible for the Co-STARs study12. Only those participants with significant immunosuppression, those who had received blood products within 6 months of recruitment and those who had active and ongoing symptoms of SARS-CoV-2 infection (within the last 21 days) were excluded. Only samples from individuals with at least one positive test from any platform were included in the analysis. Moreover, individuals without a known symptom start date were removed.

Data collection

As part of the Co-STARs study all participants undertook a detailed standardised online questionnaire at study entry12. This included the date of onset of COVID-19 symptoms, and any SARS-CoV-2 diagnostic test results.

Comparison of serological assays

Samples taken as part of the Co-STARs study12 which had an accompanying symptom start date available for analysis were initially screened for seropositivity by the EDI assay or by any of the three antigens of the Meso Scale Discovery (MSD) assay. The selected samples each underwent testing with 4 serological assays: (1) the Roche Elecsys Anti-SARS-CoV-2 electrochemiluminescence immunoassay (ECLIA), which detects antibodies to the nucleocapsid (N) antigen (Roche-N); (2) the Roche Elecsys Anti-SARS-CoV-2 S ECLIA, which detects antibodies to the spike (S) antigen (Roche-S); (3) the Abbott Nucleoprotein Chemiluminescent Microparticle Immunoassay (CLMIA), which detects antibodies to the nucleocapsid (N) antigen (Abbott-N); and (4) the four antigen Meso Scale Discovery (MSD) assay. All tests were performed as per the manufacturers' specifications. The MSD assay was undertaken at the WHO Pneumococcal Supranational Reference Laboratory at the UCL Institute of Child Health. Only 3 antigens were reported from the MSD assay (the Spike, the Nucleoprotein and the RBD) as the baseline test performance of the N-terminal domain (NTD) antibody response was insufficient for further evaluation, as previously reported13. The Roche-N and Roche-S assays were undertaken by the Laboratory Medicine Service of Swansea Bay University Health Board, Morriston Hospital, Swansea. The Abbott-N assay was undertaken by Public Health Wales Microbiology at Cardiff and Vale University Hospital. All samples were stored and transported between laboratories at −80 °C and only removed for aliquoting prior to testing to avoid unnecessary freeze–thaw cycles.

Statistical analysis and modelling

In order to evaluate the relative proportion of seropositive tests in the parallel serological assays over time, a time-to-event analysis was performed using the R package survival14,15. For each assay, the event of interest was the first negative test after a first positive test, and the time-to-event was measured from symptom onset to that negative test. Only tests taken > 14 days after symptom onset were considered in the analysis. No tests were performed between 14 and 21 days post symptom onset, and thus using 14 or 21 days post symptom onset as the threshold did not affect our results. A participant was defined as seropositive when at least one of the 4 tests undertaken was seropositive. If the other tests that were run in parallel never became seropositive, the time-to-event was set to the earliest test taken for that individual. If a participant never became seronegative during the follow-up period, a right-censored observation was added at the time of the last serological test.
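The event definition and survival estimate described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the R survival code used in the study; the function names, the input layout (a list of `(day, is_positive)` pairs per participant and assay) and the `min_day` parameter are assumptions introduced here for illustration.

```python
from collections import Counter


def time_to_seroreversion(tests, min_day=14):
    """Return (time, event) for one participant on one assay.

    tests: list of (day_since_symptom_onset, is_positive) pairs (hypothetical layout).
    event = 1 means seroreversion observed, 0 means right-censored.
    """
    kept = sorted(t for t in tests if t[0] > min_day)  # only tests > 14 days
    if not kept:
        raise ValueError("no tests after min_day")
    seen_positive = False
    for day, positive in kept:
        if positive:
            seen_positive = True
        elif seen_positive:
            return day, 1  # first negative after a first positive
    if not seen_positive:
        # Never positive on this assay: event placed at the earliest test.
        return kept[0][0], 1
    return kept[-1][0], 0  # still positive at last test: right-censored


def kaplan_meier(observations):
    """Kaplan-Meier estimate from (time, event) pairs.

    Returns [(event_time, probability_of_remaining_seropositive), ...].
    """
    events = Counter(t for t, e in observations if e == 1)
    leaving = Counter(t for t, _ in observations)
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # events and censored both leave the risk set
    return curve
```

For example, `kaplan_meier([(30, 1), (60, 0), (90, 1), (120, 0)])` yields one step at day 30 and another at day 90, with the censored participant at day 60 reducing the number at risk without producing a step.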

Additionally, the decay rate after 21 days since symptom onset was estimated using a Bayesian generalized linear mixed model as implemented in the R package MCMCglmm16, where time from symptom onset was included as a fixed effect and study participants as a random effect. Therefore, a unique slope for the regression was estimated for the entire population, while the intercept was allowed to vary between the study participants. The decay rate was estimated from the slope of the linear model.
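The structure of this model (one shared slope, participant-specific intercepts) can be illustrated with a much simpler estimator. The sketch below is not the Bayesian MCMCglmm fit used in the study: it swaps in a within-participant least-squares estimator, which removes each participant's own mean before pooling, so that differences in intercept do not bias the common slope. The function name, the record layout and the `min_day` parameter are assumptions made for illustration.

```python
from collections import defaultdict


def common_slope(records, min_day=21):
    """Estimate one shared slope of log titer against time.

    records: iterable of (participant_id, day_since_onset, log_titer)
             (hypothetical layout). Only days > min_day are used.
    Each participant keeps an individual intercept, which is handled by
    centring that participant's days and log titers on their own means.
    """
    by_participant = defaultdict(list)
    for pid, day, log_titer in records:
        if day > min_day:
            by_participant[pid].append((day, log_titer))

    sxy = 0.0  # pooled within-participant cross-products
    sxx = 0.0  # pooled within-participant sum of squares of time
    for obs in by_participant.values():
        mean_day = sum(d for d, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for d, y in obs:
            sxy += (d - mean_day) * (y - mean_y)
            sxx += (d - mean_day) ** 2
    if sxx == 0:
        raise ValueError("need at least one participant with two distinct days")
    return sxy / sxx  # log titer units per day; negative means decay
```

Unlike the mixed model, this estimator gives no posterior distribution or credible interval, and participants with a single sample contribute nothing to the slope.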

To assess the overall diagnostic capability of the Abbott-N assay, a receiver operating characteristic curve (ROC) analysis was performed using the pROC package within R17. The MSD-N and Roche-N assays were used as the gold standard for the comparison.
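A ROC analysis of this kind can be sketched as follows. The example is plain Python rather than the pROC package, and it selects the cut-off by Youden's J (sensitivity + specificity − 1), one common criterion for an "optimal" threshold; whether the study used exactly this criterion is an assumption. The function names and the toy values are illustrative only and are not study data.

```python
def sens_spec_at(values, labels, threshold):
    """Sensitivity and specificity when 'positive' means value >= threshold.

    values: assay index values; labels: True if the reference assay is positive.
    """
    n_pos = sum(1 for l in labels if l)
    n_neg = len(labels) - n_pos
    tp = sum(1 for v, l in zip(values, labels) if l and v >= threshold)
    tn = sum(1 for v, l in zip(values, labels) if not l and v < threshold)
    return tp / n_pos, tn / n_neg


def roc_points(values, labels):
    """(threshold, sensitivity, specificity) at every observed value."""
    return [(thr, *sens_spec_at(values, labels, thr)) for thr in sorted(set(values))]


def youden_optimal(points):
    """Point maximising Youden's J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


# Toy data (not from the study): index values and reference-assay status.
values = [0.2, 0.5, 0.7, 0.9, 1.0, 1.6, 2.5]
labels = [False, True, False, True, True, True, True]
best_threshold, best_sens, best_spec = youden_optimal(roc_points(values, labels))
```

In this toy example a fixed cut-off of 1.4 classifies three reference-positive samples as negative, whereas the Youden-optimal cut-off recovers two of them without losing specificity, which is the trade-off the ROC analysis is designed to expose.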

Ethical approval and consent

The study had national Integrated Research Application System (IRAS) approval and all participants in the study provided informed consent.

Results

A total of 950 samples from 329 participants seropositive by any assay after 14 days underwent testing with the Roche-N, Roche-S, the MSD and the Abbott-N assay. The majority of the participants (98%, 321/329) had a positive result by two or more assays.

Antibody decay with time

Plotting the raw log transformed antibody titers over time since symptom onset (Fig. 1) demonstrated that antibody dynamics were dependent on the assay undertaken. The production of spike antibodies was demonstrated to be maintained at high levels up to 200 days when evaluated by the MSD and the quantitative Roche-S assay. All nucleoprotein antibody assays demonstrated decay of the nucleoprotein antibody over time. This was most pronounced in the Abbott-N assay and much less so in the Roche-N assay, which demonstrated slow waning of the nucleoprotein antibody.

Figure 1

Log transformed serial serological antibody titer data plotted by time from symptom onset. Antibody dynamics are dependent on the assay used with the sensitive Roche-S and MSD-S assay demonstrating maintenance of the spike protein antibody while the nucleoprotein antibody is shown to wane with the MSD and Abbott-N assays but to a lesser extent with the Roche-N assay.

Assay sensitivity with time post symptom onset

The existing published test performance for all assays undertaken is provided in Table 1. The sensitivity of all assays (at least 14 days from symptom onset) at 50, 100 and 150 days is provided in Table 2. All assays demonstrated a reasonable sensitivity at 50 days following infection (Fig. 2a). As time passed following infection, samples tested with the Abbott-N assay rapidly became seronegative (Fig. 2a), with a median survival time inferred at 175 days (95% CI 168–185 days), whereas the survival probability at 150 days was inferred to be 95% for the Roche-N assay (95% CI 0.92–0.97) and 91% for the MSD-N assay (95% CI 0.87–0.94). The Roche-S and MSD-S assays remained seropositive for the duration of the study. The MSD-RBD assay showed some evidence of waning seropositivity over time (90% survival probability at 150 days, 95% CI 0.88–0.94).

Table 1 A summary of the existing published data for the commercially available tests in this comparison.
Table 2 Sensitivity of compared assays at 50, 100 and 150 days from symptom onset.
Figure 2

Comparison of seropositivity and antibody dynamics between serological tests. The Roche-S assay targets the spike antibody, the Abbott-N and the Roche-N assays target the N-antibody, while the MSD assay targets antibodies to the N antigen, the S antigen and the Receptor Binding Domain (RBD) of the spike protein in parallel. (a) Kaplan–Meier curve and numbers at risk (the number of participants under follow up with serological tests available for analysis at that time point) for different serological tests. The Y-axis represents the probability of remaining seropositive, while the X-axis shows days after symptom onset, with numbers of participants under follow up shown in the table below. (b) Inferred posterior density distributions of the decay rate in a generalized linear mixed model.

A total of 45% (159/329) of the individuals had a negative result using the Abbott-N assay during the course of the study. For the MSD test, 16% (52/329) of participants had a negative test for the N antigen, 11% (36/329) for RBD, and 3% (11/329) for the S antigen. For the Roche platform, 5.5% (18/329) of the individuals had a negative result with the Roche-N assay, while only 4.8% (16/329) of them had a negative result for the S antigen over the course of the study.

Mathematical model fits to estimate antibody decay

To estimate the decay rate for each antibody and assay studied, a generalized linear mixed model was fitted to the trajectory of antibody decay after 21 days from symptom onset, where the decay rate was estimated as the slope of the antibody titer through time. Under the most sensitive and quantitative Roche-S assay the spike antibody demonstrated no decay at all and rather a slow rate of increased titers over time from symptom onset (0.0031, 95% CI 0.0018–0.0044, Fig. 2b). In accordance with the raw observed data, all nucleoprotein antibodies under the mathematical model decayed. This was most pronounced in the Abbott-N assay (−0.022, 95% CI −0.023 to −0.02) and least pronounced in the Roche-N assay (−0.0025, 95% CI −0.0039 to −0.0012, Fig. 2b, Table 3).
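A slope on a log scale can be translated into a half-life, which is sometimes easier to interpret than a rate per day. The sketch below shows the conversion only; the base of the logarithm used for the titers is not stated here, so the natural logarithm default is an assumption, and the resulting half-lives should be read as illustrative rather than as study results. The function name and the `log_base` parameter are introduced for this example.

```python
import math


def half_life_days(slope_per_day, log_base=math.e):
    """Days for the titer to halve, given a slope on a log scale.

    slope_per_day: change in log titer per day (negative means decay).
    log_base: base of the logarithm applied to the titers (assumed, not
              stated in the text; natural log by default).
    A non-negative slope means no decay, reported here as infinity.
    """
    if slope_per_day >= 0:
        return math.inf
    # Halving reduces log_base(titer) by log_base(2).
    return math.log(2, log_base) / -slope_per_day


# Point estimates of the slopes quoted in the text above.
slopes = {"Abbott-N": -0.022, "Roche-N": -0.0025, "Roche-S": 0.0031}
half_lives = {assay: half_life_days(s) for assay, s in slopes.items()}
```

If the titers had instead been transformed with a base-10 logarithm, passing `log_base=10` would give proportionally shorter half-lives for the same slopes.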

Table 3 Decay rate for each serological assay (log arbitrary units per day) estimated in a generalized linear mixed model.

The lower performance of the Abbott-N assay can be explained by its failure to detect low titers as antibody concentrations wane over time. When compared to the quantitative MSD-N, 26% (222/860) of all samples positive by the MSD-N were negative for the Abbott-N test (Fig. 3a). A total of 75% of samples (137/183) positive by the MSD-N with an MSD arbitrary titer value lower than 403 were negative for the Abbott-N assay. Using the current manufacturer-recommended threshold of 1.4 arbitrary units, the Abbott-N test was characterized by a high specificity of 0.96 and a sensitivity of 0.74 using all our test results after 14 days. Using a ROC curve (Fig. 3b), the optimal cut-off that maximises both specificity and sensitivity was estimated to be 0.845 arbitrary units.

Figure 3

Comparison of antibody titers between the Abbott-N assay and the MSD-N assay. (a) The quantitative results for the MSD-N assay were compared to those of the Abbott-N test for each sample taken. Colours divide the samples depending on whether they were positive (green) or negative (red) for the MSD-N assay. Dotted red lines represent the seropositivity threshold for the Abbott-N assay (horizontal) and the MSD-N test (vertical). (b) ROC curve for the Abbott-N assay using the MSD-N test as gold standard. The diagonal line represents the profile of a random classifier. The blue shaded area shows the 95% CI.

Discussion

Sensitive measurement of SARS-CoV-2 seropositivity is key to evaluate who has been infected or exposed to SARS-CoV-2, to determine the correlates of protection from future disease, stratify those that need booster vaccination and target the use of anti-SARS-CoV-2 antibodies to those that are seronegative. To our knowledge no other study has evaluated the sensitivity of multiple diagnostic tests in parallel on longitudinally collected serological samples. This study demonstrates that as time elapses after infection, the sensitivity of serological testing varies widely depending on the test used. Although serological tests may be demonstrated to perform well 14–21 days after infection, this initial test performance often diminishes as time passes. In order to evaluate whether or not the population maintains SARS-CoV-2 antibodies it is vital that we utilize serological tests that remain sensitive over time.

Initial published baseline test performance reports concluded that the Abbott-N assay was a high-performance test and a key tool in SARS-CoV-2 surveillance18. Our data demonstrate that as time passes following infection the sensitivity of this assay declines rapidly until, at under 6 months following infection, it is no more than 50% sensitive. Our findings support the concerns raised by others regarding the poor performance of some nucleoprotein based assays21,22.

In contrast, the Roche assays, particularly the Roche Elecsys Anti-SARS-CoV-2 Spike assay, maintained high sensitivity for the 200-day duration of the study. Although there remains no single correlate of sterilizing or protective immunity following SARS-CoV-2 infection or vaccination, it is clear that natural infection and the presence of neutralizing spike antibodies decrease the possibility of re-infection and the severity of disease upon re-exposure to currently circulating strains23. Our finding that spike antibodies remained at high titers 200 days after infection adds to our previous study on this topic10 and provides further evidence in support of long-lasting protection against severe disease from currently circulating strains. Fitting mathematical models to the raw data of the Roche spike assay demonstrated that spike antibody titers did not decay but rather increased slightly over the duration of the study. The Roche nucleoprotein assay also maintained sensitivity for the duration of the study with a low rate of decay. Although this assay is semi-quantitative, our findings suggest that it could be used to sensitively distinguish those who have been vaccinated from those who have been both vaccinated and infected.

Many studies have evaluated the impact of time on test sensitivity over the first 3 weeks following symptom onset24,25,26. However, we found no other study that had examined the sensitivity of antibody testing on parallel longitudinal samples collected between 1 and 6 months after infection or exposure. Assays with a higher titer cut-off for detection may perform well in the initial period after infection, but fail to detect seropositivity as antibody levels wane over time. We show that the Abbott-N test failed to detect 75% of samples positive for the MSD-N with a titer value lower than 403, which makes the Abbott-N assay less suitable for seroprevalence studies. Using a ROC curve and the MSD-N and Roche-N assays as the gold standard, we showed that a lower threshold of 0.845 instead of 1.4 arbitrary units may be more suitable to optimize the sensitivity and specificity. Even though different thresholds may be relevant depending on whether sensitivity or specificity needs to be prioritized, our findings suggest that the high Abbott-N test threshold results in a high number of false negatives. These findings are concordant with previous reports showing a range of high uncertainty between 0.49 and 1.4 arbitrary units27. Barzin et al.28 used Abbott-N testing alone to determine SARS-CoV-2 seroprevalence in 2973 asymptomatic out-patients in North Carolina, estimating a seroprevalence of 0.8%. Similarly, Wilkins et al.29 used Abbott-N on 6510 healthcare workers up to 150 days after symptom onset and estimated a seroprevalence of 4.8%. Our findings suggest that previously published surveys of SARS-CoV-2 seroprevalence such as these could have significantly underestimated the true prevalence of SARS-CoV-2 humoral immunity.

Memory T-cell interferon gamma release or proliferation assays in response to SARS-CoV-2 antigens provide an alternative means of assessing prior exposure to infection. However, these assays are limited by cross reactive immunity to the seasonal coronaviruses decreasing specificity30,31.

Although all serological tests used in the study demonstrated a high initial specificity, one limitation is that only 38% of participants had a confirmatory SARS-CoV-2 PCR result. Our data may therefore be influenced by an unknown proportion of falsely positive serological tests. However, at entry to the study, all seropositive participants had both a screening EDI nucleoprotein assay and an MSD assay performed which limited the chances of a falsely positive result due to a single erroneous test. Not all samples were processed at the same time; the Roche and Abbott-N assays were processed 3 months after the MSD assays. Despite this, we believe that sample storage and freeze-thawing cycles are unlikely to have influenced our findings as the Roche quantitative spike assay was performed last and demonstrated the highest prolonged levels of spike antibody of all tests used.

In summary, although serological tests may demonstrate high sensitivity 3 weeks after SARS-CoV-2 infection, this is far from the case with some tests 6 months after infection. The Abbott-N assay performed poorly at this time, whereas the Roche and MSD tests maintained a high sensitivity for the 200 days of the study. Tests that perform poorly over time will lead to spurious estimates in population level seroprevalence studies, and findings from these studies should be adjusted to account for the sensitivity of the test used and the time since infection. Test performance as time passes post infection should be considered before evaluating who is a candidate for booster vaccination or anti-SARS-CoV-2 antibody therapy.