Introduction

While a thorough clinical assessment is a physician’s foremost tool for diagnosing Lyme disease (Lyme borreliosis, LB), laboratory work-up is often required. The cornerstone of this laboratory work-up is serology. Traditionally, guidelines in both the USA and Europe have advised two-tier testing for the serodiagnosis of LB [1,2,3]. In the first tier, a highly sensitive (but possibly false-positive) enzyme immunoassay (EIA) is performed, after which positive or equivocal results require confirmation in a second tier by IgM/IgG immunoblotting. While some have come close, no single test has, to the best of our knowledge, attained or surpassed the combined sensitivity and specificity of standard two-tiered testing (STTT).

An important component of several EIAs is the VlsE protein. VlsE is highly immunogenic and antibodies are produced early after onset of infection with Borrelia burgdorferi sensu lato (Bbsl) [4, 5]. Within the VlsE protein, a 26-amino acid sequence named invariable region 6 (IR6) has been shown to be highly conserved among the various subspecies of Bbsl, and to be highly immunogenic [6]. This has led to the development of the C6-ELISA, in which a synthetic peptide based on IR6 is used as antigen. The C6-ELISA is used both in the USA and Europe [7, 8].

For the second tier, American healthcare providers rely on immunoblots prepared from native cultured B. burgdorferi sensu stricto (Bbss), the primary causative agent in North America. In Europe, LB is caused by a variety of Bbsl species (e.g., B. garinii (Bgar) or B. afzelii (Bafz)) [9]. It is, therefore, virtually impossible to standardize immunoblotting in Europe using Borrelia lysates with respect to choice of antigen and uniform interpretation criteria. As a result, European immunoblots rely primarily on recombinant antigens of the various subspecies of Bbsl prevalent on that continent [10]. Other drawbacks of immunoblotting are that it is considered laborious and, being non-quantitative, may be prone to inter-assay variation.

A further argument for revisiting the STTT algorithm is its limited sensitivity early in the course of the disease, specifically for diagnosing early localized LB (an erythema migrans, EM). A meta-analysis found that STTT has a sensitivity of approximately 50% in patients with an EM [11]. For this reason, most guidelines consider EM a clinical diagnosis and recommend against performing serological testing for early localized LB [1,2,3], even though extra certainty in the form of laboratory testing can be desirable, for instance when cutaneous lesions are atypical or vague.

Various solutions to the aforementioned problems with confirmatory testing by immunoblot have been proposed. In an American study comparing the diagnostic parameters of STTT vs. the C6-ELISA alone, the C6 assay was shown to be significantly more sensitive than standard two-tiered testing in EM, and comparably sensitive in various forms of disseminated LB [12]. However, this did come at the cost of a significantly lower specificity, even though the difference was small (C6 98.8%; STTT 99.5%) [12]. Other studies have examined the diagnostic parameters of so-called modified two-tier testing (MTTT) using different EIAs or ELISAs in both tiers [7, 13,14,15]. These studies showed that the MTTT algorithm had far better sensitivity in early LB than STTT, but without the associated loss of specificity. These findings, among others, have led the FDA to recently approve a Borrelia EIA for use as a confirmatory second-tier test [16]. Previous research on MTTT has, however, been limited to sera from the USA. The differences in genospecies and diagnostic tests between Europe and the USA necessitate that these findings be reproduced in European sera before MTTT algorithms can also be used on that continent.

In this multiple-gate case-control study, we have investigated the sensitivity of various assays and algorithms of MTTT in sera of European (Dutch) patients with physician-diagnosed EM as the most prevalent manifestation of early LB in Europe, and their specificity in a variety of control sera. We have used one of several commercial EIAs in the first tier, followed by a European immunoblot or the C6-ELISA in the second tier. In line with findings from the USA, we hypothesize that MTTT will improve sensitivity of serological testing in early LB over STTT, while maintaining adequate specificity.

Materials and methods

EM sera

Sera from 228 patients with early localized LB (an EM) were used as cases. These were selected from all sera sent between October 2010 and October 2011 by general practitioners (GPs) requesting Borrelia serology at the Certe Laboratory of Infectious Diseases (Certe LID) in Groningen, the Netherlands. Clinical data were collected from GPs by questionnaires, which were sent to the GP before serological test results were available. Selection of participants was consecutive: the study was performed on sera from all patients (1) who had sera sent in during the given timeframe, (2) for whom the necessary clinical information was available, (3) who had a probable EM > 5 cm as diagnosis, and (4) for whom enough serum was available to perform all tests. All other sera were excluded. The EM diagnosis was made by the GP, based on Dutch guidelines [17]. It was made purely on the basis of clinical findings, and hence independently of serological test results. Sera were drawn at the time of the clinical diagnosis. The process of serum selection is also given in Fig. 1. Of the selected EM sera, 46.1% were collected from males and 53.9% from females. The median age of patients was 53 years (range 1 to 86, IQR 39–62 years). The sample size was not determined by a formal sample size calculation.

Fig. 1 Serum selection

Population control sera

Two hundred thirty-one sera were selected from the serum bank archived at the National Institute of Public Health and the Environment of the Netherlands (RIVM) and were used as population controls (PopC) (Fig. 1) [18]. These sera were collected between 2006 and 2007 as a representative sample of the general population of the Netherlands. PopC sera were selected to yield an age and sex distribution comparable with the EM patients. Of the selected control sera, 45.9% were collected from males and 54.1% from females. The median age of population controls was 53 years (range 1 to 79, IQR 40–62 years).

Cross-reactive sera

Ten sera with VDRL ≥ 1:32 and positive TPPA (syphilis), 10 sera with positive anti-nuclear factor (ANF), 10 sera with positive rheumatoid factor (RF), 10 sera positive for IgM and IgG antibodies to CMV, and 10 sera positive for IgM and IgG antibodies to EBV were used as cross-reactive controls (CRC) (Fig. 1). These sera were selected from the serum bank of Certe LID.

Assays

Four commercially available immunoassays were performed on all samples (Table 1) in a non-blinded fashion. All sera that gave an equivocal or positive result in at least one of the EIAs/CLIA were tested by immunoblot. All assays were performed according to the manufacturers’ instructions. The Enzygnost-2 and immunoblot on EM sera were performed as part of normal clinical routine. All other assays were performed in batches, as were the Enzygnost-2 and immunoblot on PopC and CRC sera. In the interim, sera were stored at −80 °C. An effort was made to keep the number of freeze-thaw cycles to a minimum.

Table 1 Overview of assays

Assays were interpreted using cutoffs pre-defined by the manufacturer. A test was deemed positive when either the IgM component, the IgG component, or both were positive. A test was considered negative when neither component was reactive. A test was considered equivocal when both components were equivocal, or when one component was equivocal and the other negative.
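The interpretation rule above can be written out as a small function. This is a minimal sketch and not the software used in the study; the names `Result` and `combine_components` are hypothetical, and it assumes each assay reports a separate IgM and IgG component result that has already been classified against the manufacturer's cutoffs.

```python
from enum import Enum


class Result(Enum):
    NEGATIVE = "negative"
    EQUIVOCAL = "equivocal"
    POSITIVE = "positive"


def combine_components(igm: Result, igg: Result) -> Result:
    """Combine IgM and IgG component results into one overall assay result."""
    # Positive when either component, or both, is positive.
    if Result.POSITIVE in (igm, igg):
        return Result.POSITIVE
    # Negative only when neither component is reactive.
    if igm is Result.NEGATIVE and igg is Result.NEGATIVE:
        return Result.NEGATIVE
    # Remaining cases: both equivocal, or one equivocal and the other negative.
    return Result.EQUIVOCAL
```

For example, an equivocal IgM result combined with a negative IgG result yields an equivocal overall result, whereas an equivocal IgM result combined with a positive IgG result yields a positive one.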

Algorithms

Three serodiagnostic algorithms were evaluated:

  1. Single-tier: one of the EIAs/CLIA as sole test. Equivocal test results were classified as negative.

  2. STTT: one of the EIAs/CLIA followed by immunoblot. This algorithm was evaluated in its standard form, i.e. equivocal test results in the EIAs were classified as positive, but equivocal test results in the immunoblot were classified as negative.

  3. MTTT: one of the EIAs/CLIA (C6 excepted) followed by the C6-ELISA. This algorithm was evaluated both (3a) classifying equivocal test results as negative (‘strict’) and (3b) classifying equivocal test results as positive (‘permissive’).
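The three algorithms can be expressed as simple Boolean rules on the overall result of each tier. The sketch below is illustrative only: the function names and string labels are hypothetical, and it assumes that in MTTT the strict or permissive handling of equivocal results applies to both tiers, which the description above does not state explicitly.

```python
NEG, EQV, POS = "negative", "equivocal", "positive"


def single_tier(eia: str) -> bool:
    """Single-tier: equivocal results count as negative."""
    return eia == POS


def sttt(eia: str, blot: str) -> bool:
    """STTT: equivocal counts as positive in the EIA, as negative in the immunoblot."""
    first_tier_reactive = eia in (EQV, POS)
    return first_tier_reactive and blot == POS


def mttt(eia: str, c6: str, permissive: bool) -> bool:
    """MTTT: first-tier EIA/CLIA followed by the C6-ELISA.

    Assumption: the equivocal rule is applied identically in both tiers.
    """
    reactive = (EQV, POS) if permissive else (POS,)
    return eia in reactive and c6 in reactive
```

Under these rules a serum with an equivocal first-tier result and a positive C6-ELISA is classified as positive by the permissive variant and as negative by the strict variant.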

Statistical analysis

Diagnostic parameters of tests within each serum group were compared with one another using the McNemar test, or the exact McNemar test where applicable. The 95% confidence intervals of proportions (i.e. of the sensitivity and specificity) were determined using the Clopper-Pearson method. The differences in sensitivity and specificity of the permissive vs. strict variant of MTTT were assessed using a one-sample t test, as algorithm interpretation rules prevented the McNemar test from being used in these comparisons. The same applied to the comparison of the MTTT-strict algorithm vs. the C6-ELISA as single-tier test, and the comparison of MTTT-permissive vs. the MTTT-permissive IgG only. We performed subanalyses to assess consistency of reactivity across the various conditions in CRC sera. For all analyses, p values < 0.05 were considered statistically significant.
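For readers who wish to reproduce this type of analysis, the sketch below shows one way to compute a Clopper-Pearson interval and an exact McNemar p value using only the Python standard library. The statistical software used in the study is not specified here, so this is an assumption-laden illustration rather than the authors' implementation; the function names are hypothetical, and `b` and `c` denote the two discordant cell counts of a paired 2 × 2 table.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float) -> float:
    """Find p in [0, 1] with f(p) = target, for f decreasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion k/n."""
    # Lower bound p solves P(X >= k | p) = alpha/2, i.e. cdf(k - 1; p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound p solves P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def exact_mcnemar(b: int, c: int) -> float:
    """Two-sided exact McNemar p value from the discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)
```

As a usage example, `clopper_pearson(177, 228)` gives an interval for a sensitivity of 177 positive results in 228 sera (the count 177 is hypothetical), and `exact_mcnemar(0, 5)` evaluates a comparison in which all five discordant pairs favour the same test.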

Ethical statement

The study was conducted according to the principles of the Declaration of Helsinki and in conformity with institutional regulations and guidelines. The study utilized only patient materials left over from standard clinical practice. Therefore, the Dutch Medical Research Involving Human Subjects Act does not apply to this study and informed consent was not sought from participants.

Results

Using the single-tier algorithm, three assays performed comparably with regard to sensitivity (78–81%), except for the Liaison which showed a significantly lower sensitivity (70%) than all other assays (p < 0.01) (Table 2). All single-tier assays performed comparably with respect to specificity, both in PopC (89–94%) and CRC sera (62–78%) (Table 3). False positivity in CRC sera was mainly due to IgM cross-reactivity in patients with acute EBV or CMV infection (data not shown).

Table 2 Sensitivity of all algorithms
Table 3 Specificity of all algorithms

Sensitivities of STTT protocols were much lower (44–47%). Specificities increased to 97% for PopC sera and 88–92% for CRC sera. No significant differences were seen regarding sensitivity and specificity between the various assays as used in STTT (Tables 2 and 3).

We used MTTT in two variants: classifying equivocal results as negative (“strict”) and classifying equivocal results as positive (“permissive”). The highest sensitivity within the MTTT algorithm was achieved using the permissive variant with the Enzygnost-2 in the first tier, and the C6-ELISA in the second (77.6%). The highest specificity was achieved using the strict variant of the MTTT algorithm, irrespective of the first-tier EIA that was used (96.5%). In both variants, the sensitivity of the combination Enzygnost-2/C6 was better than that of the combination Liaison/C6 (strict p = 0.003; permissive p = 0.007). Other comparisons within each algorithm/variant yielded no significant differences (Tables 2 and 3).

When comparing the single-tier, MTTT-strict, or MTTT-permissive algorithms to STTT, we found sensitivity of all these algorithms to be significantly better than the equivalent STTT algorithm. The specificity of all assays in the single-tier algorithm was significantly lower than that of STTT. However, the specificity of both MTTT strategies in the population controls and in controls with a potentially cross-reactive condition was comparable with that of STTT.

Comparing the permissive variant of the MTTT algorithm to the strict variant, we found that the permissive strategy resulted in a higher sensitivity for all assays (p < 0.01). This increase in sensitivity was 4.8% (95% CI 2.0–7.6) for the Enzygnost-1/C6 and Enzygnost-2/C6, and 5.7% (95% CI 2.6–8.7) for the Liaison/C6. Interestingly, specificity in both population controls and controls with a potentially cross-reactive illness was comparable between the two variants. The MTTT-permissive algorithm also outperformed equivalent single-tier assays, as all MTTT-permissive combinations had comparable sensitivity, but a higher specificity than their single-tier equivalent.

Comparing all MTTT combinations to the “benchmark” C6-ELISA, we found unsurprisingly that the assays in the strict variant had a significantly better specificity than the single-tier C6-ELISA. These combinations did all have significantly lower sensitivity. As previously mentioned, all MTTT-permissive variants had a better sensitivity than their equivalent strict variant, and two such combinations (Enz1/C6 and Enz2/C6) even had comparable sensitivity with the single-tier C6-ELISA. Importantly, for the best scoring combination (Enz2/C6), this did not come at the cost of a loss of specificity, which was still better than that of the single-tier C6-ELISA (specificity PopC C6 93.5% vs. Enz2/C6 96.1%, p = 0.03). All comparisons between MTTT variants and the C6-ELISA are given in Tables 2 and 3.

The diagnostic odds ratios for the various assays and algorithms are given in Table 4.

Table 4 Diagnostic odds ratios of all algorithms
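As a reminder of how a diagnostic odds ratio (DOR) relates to sensitivity and specificity, the sketch below uses the standard definition, i.e. the positive likelihood ratio divided by the negative likelihood ratio. The function name is hypothetical, and values computed this way from rounded percentages in the text need not match those in Table 4, which may have been derived from different control groups or from raw counts.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = [sens / (1 - spec)] / [(1 - sens) / spec], for proportions strictly in (0, 1)."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr / negative_lr


# Illustrative input: the permissive Enzygnost-2/C6 combination as reported in the text
# (sensitivity 77.6%, specificity in population controls 96.1%).
dor_enz2_c6 = diagnostic_odds_ratio(0.776, 0.961)
```

A DOR of 1 indicates a test that does not discriminate between cases and controls; higher values indicate better discrimination.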

Among the screening EIAs, the relative contribution of the IgM component was highest in the Enzygnost-1 assay. That assay’s IgG component, which does not contain VlsE, performed rather poorly (Fig. 2a, orange and pink bars combined), as did the IgM component of the Liaison, which relies on recombinant antigens only (Fig. 2a, orange and green bars combined). Use of MTTT was most useful for correct classification of IgM-positive PopC sera (Fig. 2b) and also filtered out a significant number of IgM false-positive CRC sera (Fig. 2c). Based on Fig. 2a–c, it could be hypothesized that using only the IgG component of each first-tier assay would result in a large improvement of specificity in CRC sera with only a minor decrease in sensitivity. Diagnostic parameters of this MTTT-permissive IgG-only variant are given in Table 5. This variant without a separate IgM component had the expected increase in specificity in CRC sera, but this did come at the cost of a statistically significant decrease in sensitivity for all combinations. Within the IgG-only variant, the Enz2-IgG/C6 outperformed both other combinations (both comparisons p < 0.01). Further subanalysis of both MTTT algorithms showed that Enzygnost-2-IgG outperformed Liaison-IgG in terms of sensitivity (p = 0.041), while having comparable specificity in PopC and CRC sera, but only in the permissive variant (strict variant, sensitivity Enz1-IgG vs. Lia-IgG p = 0.664) (Supplementary Table 1).

Fig. 2 Reactivity per component: single-tier and MTTT-permissive (C6 and RecomL not shown). a EM sera, percentages calculated from n = 228. b PopC sera, percentages calculated from n = 231. c CRC sera, percentages calculated from n = 50

Table 5 Diagnostic parameters of algorithms with IgG component only

Subanalysis of IgG-blot positive vs. IgG-blot negative EM sera showed that the additional value of VlsE in the IgG-EIAs was restricted to blot-negative sera, i.e. early infection (IgG-blot NEG, Enz2-IgG/Lia-IgG vs. Enz1-IgG p < 0.01; IgG-blot POS, all comparisons p > 0.05) (Supplementary Table 2).

Discussion

In our study, we have evaluated various algorithms of single-tier testing or MTTT compared with STTT in European patients with early localized LB. Previous studies conducted in the USA have shown that MTTT protocols generally improve sensitivity of serological tests for LB without losing specificity [7, 13,14,15]. Our results show that the same holds true for the European situation, even when the screening EIA contains VlsE and the confirmatory test is the C6-ELISA, which functions with the IR6 peptide of VlsE.

Overall, the best diagnostic parameters were found using the Enzygnost-2 in the first tier, even though differences with the other assays within any algorithm were mostly non-significant. The highest sensitivity was found when using this assay as a standalone single-tier test. As was to be expected, this did result in a markedly lower specificity. Conversely, STTT had the best specificity, but lacked sensitivity in the EM sera that were used. The sensitivity of STTT in our study was comparable with the reported sensitivity of serological testing in EM patients in a meta-analysis by Leeflang and colleagues [11], and we therefore consider it to be representative of EM patients in general.

The algorithms of MTTT that we studied all showed marked improvement of sensitivity as compared with STTT, without significant loss of specificity. Because MTTT is quite new in Europe, we also investigated which “rules” should be used for assay interpretation. Interestingly, we found that classifying equivocal results as positive resulted in an increase of sensitivity and only a negligible and non-significant decrease in specificity. For each assay, this permissive approach maintained a sensitivity comparable with that of its single-tier equivalent, but with a significantly better specificity.

Finally, it must be noted that the best performing combination (Enzygnost-2/C6 in the permissive variant) had a sensitivity comparable with that of the “benchmark” C6-ELISA, but a significantly better specificity in population controls. While the absolute differences were small (C6 93.5% vs. Enz2/C6 96.1%), this is still of great significance in the field of LB diagnostics. Borrelia serology is frequently requested for patients with a low pre-test probability of having LB; hence, the impact of a small improvement in specificity on the eventual positive predictive value can be substantial. This implies that two-tier testing is still advantageous, in spite of the C6-ELISA’s excellent diagnostic parameters. It must be noted that further improvements to the diagnostic parameters of multi-tier Borrelia serology may be possible when using more innovative ways of combining assays or assays’ components than the simple Boolean logic (i.e. AND/OR) we have employed in the current study [19, 20]. Our findings do not support dropping the IgM component from the first-tier assays, even in assays which have VlsE in their IgG component, even though such a move has recently been gaining popularity [21, 22].
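The effect of a small gain in specificity on the positive predictive value (PPV) at low pre-test probability can be illustrated with Bayes' rule. In the sketch below, the specificities are those reported above for population controls; the shared sensitivity of 77.6% and the pre-test probability of 5% are assumptions chosen purely for illustration and are not estimates from this study.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              pretest_probability: float) -> float:
    """PPV = true positives / (true positives + false positives), as proportions."""
    true_pos = sensitivity * pretest_probability
    false_pos = (1 - specificity) * (1 - pretest_probability)
    return true_pos / (true_pos + false_pos)


ASSUMED_SENSITIVITY = 0.776   # assumption: same sensitivity for both strategies
ASSUMED_PRETEST = 0.05        # hypothetical pre-test probability of LB

ppv_c6 = positive_predictive_value(ASSUMED_SENSITIVITY, 0.935, ASSUMED_PRETEST)
ppv_mttt = positive_predictive_value(ASSUMED_SENSITIVITY, 0.961, ASSUMED_PRETEST)
print(f"PPV single-tier C6: {ppv_c6:.1%}; PPV Enz2/C6 permissive: {ppv_mttt:.1%}")
```

Under these assumptions, the PPV rises from roughly 39% to roughly 51%, even though specificity differs by only 2.6 percentage points. At a higher pre-test probability the gap narrows, which is why the setting in which an algorithm is used matters.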

One might argue that serological testing for an EM is not good clinical practice. After all, clinical guidelines state that an EM is considered a clinical diagnosis [1,2,3]. However, this is in part because current serological algorithms lack adequate sensitivity. Taking into account our findings, it may be prudent to further investigate the applicability of MTTT for patients with early LB. Of course, treatment of a cutaneous lesion which is recognized as an EM should not be delayed by serological confirmation, as antibodies may not always have formed. However, both our and previous studies show that requests for serology are nonetheless frequent in the setting of Dutch general practice, even in patients with clear-cut EM [8]. This implies that patients and clinicians value the added certainty that laboratory work-up gives to the physician’s clinical assessment. Laboratory testing may be especially relevant in situations where the LB diagnosis is not clear-cut, for example, when skin lesions do not resemble a typical EM, or if patients present with only generalized symptoms without a skin lesion [23, 24]. Considering the improved diagnostic parameters of MTTT, serology may serve to aid in the diagnosis of these situations. All of the aforementioned does not negate the recommendation, however, that cutaneous lesions should be treated as an EM without awaiting serological outcomes, if the clinician recognizes them as such. Finally, it is important to note once more that the true value of any MTTT algorithm will depend on the setting in which the algorithm is used (i.e. the pre-test probability and resulting positive and negative predictive values).

Our study also indicates that the IgM and IgG components of the different assays rely on different antigens for their reactivity. The IgG assay without VlsE was substantially less reactive in EM sera, as was the IgM assay which did not contain a whole cell extract but relied on recombinant antigens only. This last finding is in agreement with a recent study from Northern Europe [25]. Our subanalysis allows for a tentative hypothesis that the VlsE from the Enzygnost-2, consisting of three species, performs better than Liaison’s Bbav-VlsE; however, these results depended on the algorithm used (strict vs. permissive). Further studies should confirm whether addition of VlsE from different species improves diagnostic accuracy.

A limitation of our study is that we did not include a polymerase chain reaction (PCR) or culture on skin biopsies obtained from the EM to verify the presence of Bbsl in these lesions. However, the sensitivity of culture or PCR is not perfect [26, 27]. Excluding EMs in which the spirochete cannot be directly detected may therefore lead to an overestimation of the sensitivity. Branda and colleagues [13] also found a trend towards higher sensitivity of their MTTT algorithms in patients with culture-confirmed EM compared with culture-negative EMs. Our study might also have benefitted from multiple sera from the same patient from both the acute and convalescent phase; however, these were not available for a sufficient number of patients in this convenience sample. In contrast, a strength of our study was that the population controls were drawn from a representative sample of the population as a whole, whereas many studies use blood bank donors as controls. Blood bank donors are inherently a healthier subset of the entire population, with the potential for bias towards a different background seroprevalence.

Due to the rarity of definite and probable disseminated LB, we could not collect sufficient sera during the study period to include these in the study as well. Even though one of the major advantages of MTTT (i.e. improved sensitivity) primarily applies to early (localized) LB, it is imperative that future studies compare different MTTT algorithms for patients with disseminated LB as well. Even though the expected improvement in diagnostic parameters will likely be smaller, other advantages of MTTT will still apply: EIAs require less labor, which lowers the cost of LB serology, and owing to their quantitative nature they carry a lower risk of inter-assay variation [28].

As a final remark, it is evident that MTTT protocols are not the solution to some of the other inherent problems of serology for LB diagnostics. MTTT does not solve the problem of distinguishing between an active and past infection after IgG-seroconversion, and all serology still depends on indirect methods of detection, as opposed to direct detection of the causative microorganism. MTTT protocols may, however, be a partial solution to some of the problems surrounding serology, which cause controversy among doctors and patients.