CC BY 4.0 license. Open Access. Published online by De Gruyter, January 12, 2024.

Should we depend on reference intervals from manufacturer package inserts? Comparing TSH and FT4 reference intervals from four manufacturers with results from modern indirect methods and the direct method

  • Niek F. Dirks, Wendy P.J. den Elzen, Jacquelien J. Hillebrand, Heleen I. Jansen, Edwin ten Boekel, Jacoline Brinkman, Madelon M. Buijs, Ayse Y. Demir, Ineke M. Dijkstra, Silvia C. Endenburg, Paula Engbers, Jeannette Gootjes, Marcel J.W. Janssen, Wilhelmina H.A. Kniest-de Jong, Maarten B. Kok, Stephan Kamphuis, Adrian Kruit, Etienne Michielsen, Albert Wolthuis, Anita Boelen and Annemieke C. Heijboer

Abstract

Objectives

Correct interpretation of thyroid function tests relies on correct reference intervals (RIs) for thyroid-stimulating hormone (TSH) and free thyroxine (FT4). ISO15189 mandates periodic verification of RIs, but laboratories struggle with cost-effective approaches. We investigated whether indirect methods (utilizing historical laboratory data) could replace the direct approach (utilizing healthy reference individuals) and compared results with manufacturer-provided RIs for TSH and FT4.

Methods

We collected historical data (2008–2022) from 13 Dutch laboratories to re-establish RIs by employing indirect methods, TMC (for TSH) and refineR (for FT4). Laboratories used common automated platforms (Roche, Abbott, Beckman or Siemens). Indirect RIs (IRIs) were determined per laboratory per year and clustered per manufacturer (>1,000,000 data points per manufacturer). Direct RIs (DRIs) were established in 125 healthy individuals per platform.

Results

TSH IRIs remained robust over the years for all manufacturers. FT4 IRIs proved robust for three manufacturers (Roche, Beckman and Siemens), but the IRI upper reference limit (URL) of Abbott showed a decrease of 2 pmol/L from 2015. Comparison of the IRIs and DRIs for TSH and FT4 showed close agreement when adequate age stratification was used. Manufacturer-provided RIs, notably those of Abbott, Roche and Beckman, exhibited inappropriate URLs (overall difference of 0.5–1.0 µIU/mL) for TSH. For FT4, the URLs provided by Roche, Abbott and Siemens were overestimated by 1.5–3.5 pmol/L.

Conclusions

These results underscore the importance of RI verification as manufacturer-provided RIs are often incorrect and RIs may not be robust. Indirect methods offer cost-effective alternatives for laboratory-specific or platform-specific verification of RIs.

Introduction

Euthyroidism is defined as a normal thyroid function with thyroid stimulating hormone (TSH) and free thyroxine (FT4) concentrations falling within the reference interval (RI). Other combinations of TSH and FT4 may lead to diagnoses such as primary hypothyroidism (elevated TSH and reduced FT4) or hyperthyroidism (reduced TSH and elevated FT4). Obviously, a correct diagnosis depends greatly on correct definition of RIs provided by the laboratory.

Ideally, laboratories establish their own lower and upper reference limits (LRLs and URLs) to define an RI, tailored to their specific method and patient population. The Clinical and Laboratory Standards Institute (CLSI) provides guidance (CLSI C28-A3c) for determining RIs [1]. However, determining RIs in accordance with this guideline, with at least 120 suitable reference individuals, is time-consuming and costly. Instead, laboratories frequently rely on RIs from manufacturer package inserts or from another laboratory. Subsequently, when transitioning to a new methodology, existing RIs are often adjusted after a method comparison study. The aforementioned guideline also provides guidance for verifying RIs established by the manufacturer or another laboratory. Unfortunately, RIs established externally are not necessarily transferable, because pre-analytical, analytical and biological variables may differ [2], [3], [4]. Ensuring the suitability of RIs requires aligning the externally used platform and reference population with the current platform and patient population. Comparability of the platform may be assessed by means of a method comparison study according to CLSI EP09 [5]. Comparability of the test subject population is more subjective and can only be objectively reviewed by validation of the externally established RI in the test subject population. CLSI C28-A3c recommends verifying the new RI with a reference population of at least 20 samples, accepting the externally established RI when no more than 2 of 20 values fall outside the proposed RI. Of note, when all 20 reference values fall within the proposed RI, the RI might be too wide for the test subject population; this verification procedure therefore does not minimize the risk of accepting an RI that is too wide.
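The 20-sample acceptance rule summarized above can be sketched in a few lines of Python. This is an illustration only: the function name `verify_reference_interval` and the TSH values are hypothetical, and the rule coded here is the check as paraphrased in the text, not the full CLSI procedure.

```python
def verify_reference_interval(values, lrl, url, max_outside=2):
    """Return (accepted, n_outside) for a candidate RI [lrl, url].

    Encodes the rule as summarized in the text: with 20 reference
    samples, accept when no more than 2 values fall outside the RI.
    """
    if len(values) < 20:
        raise ValueError("at least 20 reference values are expected")
    n_outside = sum(1 for v in values if v < lrl or v > url)
    return n_outside <= max_outside, n_outside


# Hypothetical TSH values (µIU/mL) from 20 reference individuals.
tsh = [0.9, 1.2, 1.4, 1.5, 1.7, 1.8, 2.0, 2.1, 2.2, 2.4,
       2.5, 2.6, 2.8, 3.0, 3.1, 3.3, 3.6, 3.9, 4.4, 5.1]
accepted, n_outside = verify_reference_interval(tsh, lrl=0.27, url=4.20)
print(accepted, n_outside)
```

Widening the candidate interval can only reduce `n_outside`, so an overly wide RI always passes this check, which is the weakness noted above.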

An alternative and increasingly robust approach using modern methodology is to establish or validate an RI using existing historical laboratory data [3]. Utilizing indirect methods on historical data yields estimated RIs that reflect the laboratory’s analytical system and test subject population. Additionally, provided sufficient historical data are available, it enables more extensive partitioning (e.g. for sex and age) than is usually affordable for laboratories using the direct method. In recent years, multiple modern indirect methods have been developed that have overcome the limitations of the well-known Bhattacharya and Hoffmann methods [6]. In a recent study, Ammer et al. compared the Hoffmann method with four modern indirect methods, Truncated-Maximum-Likelihood (TML), kosmic, Truncated-Minimum-Chi-Square (TMC), and refineR, alongside the nonparametric direct method using 120 reference samples [7]. TMC performed best for TSH, which exhibits a heavily skewed distribution, while refineR showed the best results for FT4, which shows an approximately normal distribution. With a fraction of pathological results of 10 % or less for TSH and FT4, TMC and refineR even outperformed the direct method (using 120 reference individuals).

Once RIs have been established, laboratories should periodically re-evaluate them to comply with ISO standard 15189. The frequency of reassessment remains arbitrary and often occurs only when transitioning to a new method. Therefore, we investigated the robustness of TSH and FT4 RIs over the past 15 years to determine whether more frequent periodical reevaluation of RIs is necessary. We used two indirect methods for estimation of RIs, TMC (for TSH) and refineR (for FT4) on historical data (2008–2022). Additionally, we assessed the suitability of TSH and FT4 RIs estimated with these indirect methods and manufacturer-provided RIs by comparing them to estimated RIs using the direct method.

Materials and methods

Sample and data collection

Direct RIs

TSH and FT4 were measured in serum aliquots obtained from 125 reference individuals (healthy employees of Amsterdam UMC, including 54 males and 71 females, aged 19–68 years). These reference samples were collected between September 2022 and February 2023. From 69 of the 125 reference individuals heparinized plasma aliquots were also available. Written consent was obtained from each participant. Serum and heparinized plasma samples were collected, spun at 3,000×g for 10 min at 18 °C, aliquoted and stored at −80 °C. The storage time did not exceed 10 months.

Indirect RIs

All of the 13 participating laboratories are based in The Netherlands, which is considered an iodine sufficient area. These laboratories support a combined total of 28 hospitals throughout The Netherlands and provide additional primary care laboratory service including phlebotomy at over 500 different locations across the country. For IRIs, TSH and FT4 results from 2008 to 2022 were retrieved from the laboratory information system (LIS) by the participating laboratories and transferred to Excel tables together with an anonymous and unique patient identifier, a unique sample identifier, date of analysis, sex and age. Non-numerical results and results from patients under the age of 18 or above the age of 100 were excluded. To reduce the percentage of pathological results in the datasets, all results from a patient were excluded from the analyses when multiple results were obtained for that patient within the same year (repeated measurements). In case laboratories applied factors on their TSH or FT4 results, these factors were reversed before conducting the analyses. Table 1 provides an overview of the analytical systems, the approximate percentage of primary care results, the number of TSH and FT4 tests and the age and sex distribution of results in 2022 for each of the participating laboratories. Of note, some laboratories have switched manufacturers and may also be included in analyses for a manufacturer different from the one mentioned in Table 1.
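The exclusion steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the Excel and R workflow the authors used; the record keys (`patient_id`, `date`, `age`, `result`) and the example records are hypothetical, and the order of the steps is an assumption because the text does not specify it.

```python
from collections import Counter
from datetime import date


def apply_exclusions(records):
    """Apply the exclusion steps described in the text to LIS records.

    Each record is a dict with the (hypothetical) keys
    'patient_id', 'date' (datetime.date), 'age' and 'result'.
    """
    kept = []
    for rec in records:
        # Step 1: drop non-numerical results (e.g. '<0.01').
        try:
            value = float(rec["result"])
        except (TypeError, ValueError):
            continue
        # Step 2: keep adults aged 18 to 100 years only.
        if not 18 <= rec["age"] <= 100:
            continue
        kept.append({**rec, "result": value})

    # Step 3: drop every result of a patient who was measured more
    # than once within the same calendar year (assumed to run last).
    per_patient_year = Counter((r["patient_id"], r["date"].year) for r in kept)
    return [r for r in kept
            if per_patient_year[(r["patient_id"], r["date"].year)] == 1]


records = [
    {"patient_id": "A", "date": date(2022, 1, 10), "age": 45, "result": "1.8"},
    {"patient_id": "B", "date": date(2022, 2, 3), "age": 52, "result": "2.4"},
    {"patient_id": "B", "date": date(2022, 9, 1), "age": 53, "result": "6.1"},
    {"patient_id": "C", "date": date(2022, 3, 5), "age": 17, "result": "2.0"},
    {"patient_id": "D", "date": date(2022, 4, 7), "age": 60, "result": "<0.01"},
    {"patient_id": "B", "date": date(2021, 5, 1), "age": 51, "result": "2.2"},
]
clean = apply_exclusions(records)
print([(r["patient_id"], r["date"].year) for r in clean])
```

Patient B is removed for 2022 (two results in that year) but retained for 2021, which reflects the per-year definition of repeated measurements given in the text.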

Table 1:

Characteristics in 2022 of participating laboratories.

| Laboratory | Analytical system | Number of adult TSH/FT4 tests | Percentage of primary care results | Median age (IQR) for adult results, years | Average % of female adult results |
|---|---|---|---|---|---|
| 1 | Abbott Architect | >200,000/>100,000 | ∼50 % | 59.8 (42.3–75.1) | 67.9 % |
| 2 | Abbott Architect | >50,000/>20,000 | ∼35 % | 59.9 (43.2–74.2) | 67.0 % |
| 3 | Roche Cobas | >20,000/>20,000 | ∼10 % | 58.9 (42.1–72.4) | 65.4 % |
| 4 | Roche Cobas | >200,000/>50,000 | ∼45 % | 60.7 (44.4–73.8) | 68.2 % |
| 5 | Roche Cobas | >50,000/>20,000 | ∼35 % | 61.1 (45.6–73.5) | 64.8 % |
| 6 | Siemens Atellica | >50,000/>50,000 | ∼50 % | 61.8 (45.8–74.9) | 65.8 % |
| 7 | Beckman Unicel DxI | >200,000/>50,000 | 100 % | 54.5 (37.6–70.0) | 69.3 % |
| 8 | Beckman Unicel DxI | >20,000/>5,000 | ∼20 % | 59.1 (44.0–72.4) | 66.0 % |
| 9 | Abbott Architect | >20,000/>20,000 | ∼55 % | 63.0 (49.0–75.0) | 73.0 % |
| 10 | Siemens Atellica | >50,000/>20,000 | ∼40 % | 59.3 (42.9–73.1) | 64.4 % |
| 11 | Roche Cobas | >50,000/>20,000 | ∼30 % | 63.1 (48.5–75.3) | 62.4 % |
| 12 | Siemens Atellica | >20,000/>10,000 | ∼30 % | 58.6 (40.3–72.7) | 67.8 % |
| 13 | Siemens Atellica | >100,000/>50,000 | 100 % | 58.2 (41.0–71.9) | 68.4 % |

IQR, interquartile range.

Analytical procedures

All participating laboratories were accredited according to ISO 15189 (or CCKL (National Coordination Committee for Quality Assurance for Health Care Laboratories in The Netherlands) in the past) and performed internal and external quality assessments. TSH and FT4 concentrations were determined on their testing platform using the appropriate reagents and calibrators according to manufacturer instructions (see Table 1 for the details on the testing platforms). Blood samples were drawn by venipuncture according to local guidelines. All laboratories used heparinized plasma tubes for TSH and FT4. The Supplemental Materials and Methods provides precise information regarding the TSH and FT4 assays employed by each manufacturer.

Direct reference intervals

TSH and FT4 concentrations were analyzed using Roche Cobas e601, Abbott Alinity I, and Siemens Atellica IM 1300 at the Endocrine Laboratory of Amsterdam UMC. Nij Smellinghe Hospital analyzed TSH and FT4 concentrations using the Beckman Unicel DxI. DRIs were based on 125 samples. DRIs were calculated parametrically or non-parametrically, depending on the distribution of results. The 2.5th and 97.5th percentiles and their 90 % confidence intervals were defined as per CLSI C28-A3c using MedCalc Statistical Software version 18.5 [1]. Given that internationally TSH and FT4 are typically measured in serum, while all participating Dutch laboratories in this study used heparinized plasma samples, method comparisons (Passing-Bablok regression using Analyse-it for Microsoft Excel, version 2.30) between 69 paired serum and heparinized plasma samples were performed to identify any significant differences between the two matrices.
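As an illustration of the non-parametric estimate of the 2.5th and 97.5th percentiles, the sketch below uses Python's standard library rather than MedCalc, so numerical details may differ from the authors' software. The function name is hypothetical, the 90 % confidence intervals are omitted, and the interpolation at rank p·(n + 1) is the default behaviour of `statistics.quantiles`, assumed here to be an acceptable stand-in for the rank-based approach.

```python
import statistics


def nonparametric_reference_limits(values):
    """Estimate the 2.5th and 97.5th percentiles of reference values.

    statistics.quantiles(..., n=40) returns 39 cut points; the first is
    the 2.5th percentile and the last the 97.5th percentile. The default
    'exclusive' method interpolates at rank p * (n + 1).
    """
    if len(values) < 120:
        raise ValueError("at least 120 reference values are recommended")
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]


# Toy data: 125 evenly spaced values standing in for 125 reference results.
toy = list(range(1, 126))
lrl, url = nonparametric_reference_limits(toy)
print(lrl, url)
```

With 125 values the limits fall between the 3rd and 4th lowest and the 122nd and 123rd highest observations, which shows how strongly each limit depends on a handful of individuals.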

Indirect reference intervals

Methodology

The indirect methods TMC and refineR are based on the assumption that the majority of the included data is non-pathological [8, 9]. Both algorithms presuppose the presence of a range of test results where the contribution of pathological results is negligible, and the distribution of the non-pathological fraction can be effectively modeled using a Box-Cox transformed normal distribution. The models are defined by a power parameter λ, mean µ, and standard deviation σ. For TMC, RI estimation was conducted as described in the TMC Software Manual (version 13, revision 2022-11-15). For refineR, R package refineR version 1.6.0 was used. For both TMC and refineR, a combination of Excel and R Statistical Software v. 4.2.1 (R Foundation for Statistical Computing, Vienna, Austria) was used to perform the exclusion steps and the analyses. Two hundred bootstraps were used to estimate RIs for FT4 using refineR. Resulting histograms with proposed reference limits (RLs) from TMC and refineR were examined visually and checked for erroneously modelled distributions. Years with erroneously modelled distributions were excluded from subsequent analyses.
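To make the role of λ, µ and σ concrete, the sketch below converts the parameters of a Box-Cox transformed normal model into reference limits on the original scale. It is not the TMC or refineR algorithm, which estimate these parameters from mixed data; it only illustrates the final step, under the simplifying assumption that truncation of the transformed distribution can be ignored. Function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def inverse_box_cox(y, lam):
    """Map a Box-Cox transformed value back to the original scale."""
    if abs(lam) < 1e-12:
        return math.exp(y)          # lambda = 0 corresponds to a log transform
    return (lam * y + 1.0) ** (1.0 / lam)


def reference_limits(lam, mu, sigma, coverage=0.95):
    """Central `coverage` interval of a Box-Cox transformed normal model."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 1.96 for 95 %
    return (inverse_box_cox(mu - z * sigma, lam),
            inverse_box_cox(mu + z * sigma, lam))


# Hypothetical skewed analyte modelled as log-normal (lambda = 0).
lo_skewed, hi_skewed = reference_limits(lam=0.0, mu=math.log(2.0), sigma=0.5)
# Hypothetical symmetric analyte (lambda = 1, i.e. no real transformation).
lo_sym, hi_sym = reference_limits(lam=1.0, mu=14.0, sigma=2.0)
print(round(lo_skewed, 2), round(hi_skewed, 2), round(lo_sym, 2), round(hi_sym, 2))
```

A λ near 0 gives an interval that is asymmetric around the median, as seen for TSH, whereas a λ near 1 gives a symmetric interval, as expected for an approximately normal analyte such as FT4.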

Data stability

The stability of test results over time was assessed. Details are provided in the Supplemental Materials and Methods.

Stratification

Because the populations for estimation of DRIs (19–68 years) and IRIs (18–100 years) were different, we assessed whether stratification based on age was required to compare directly and indirectly estimated RIs for TSH or FT4. Details are provided in the Supplemental Materials and Methods.

Reflex-testing bias

A number of laboratories use reflex testing for the assessment of thyroid function, meaning that measurement of FT4 is automatically triggered when the TSH result is outside the RI. Requesting only a TSH, only an FT4, or the combination irrespective of the TSH result remained possible. To evaluate whether the FT4 dataset from any laboratory was biased towards non-euthyroid states as a result of a higher percentage of hypothyroid and hyperthyroid individuals, we calculated laboratory IRIs for FT4 in 2022 from a subset of data containing only FT4 results from patients with normal TSH (based on the TSH IRIs in 2022).

Long-term robustness of reference intervals

Equivalence limits (ELs), as described by Haeckel et al., were used to compare IRIs [10]. In short, ELs are determined by the permissible uncertainty at the two reference limits of the RI. It is based on the permissible analytical standard deviation derived from the empirical biological variation (calculated from the RI). Years with RLs that surpassed ELs corresponding to the overall median RLs were considered non-robust and were excluded from subsequent analyses. In that case, a new median was determined without the non-robust RLs. This procedure was repeated until no RLs surpassed ELs.
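The iterative exclusion of non-robust years can be sketched as below. The calculation of the ELs themselves (Haeckel et al.) is not reproduced: it is passed in as a function, and `placeholder_els`, which simply takes ±10 % around the median, is a hypothetical stand-in used only to make the example run. The yearly values are invented.

```python
from statistics import median


def robust_years(limits_by_year, equivalence_limits):
    """Iteratively exclude years whose RL falls outside the ELs.

    limits_by_year: dict mapping year -> reference limit (LRL or URL).
    equivalence_limits: callable returning (low, high) for a median RL;
        its exact form (Haeckel et al.) is NOT reproduced here.
    """
    kept = dict(limits_by_year)
    while kept:
        low, high = equivalence_limits(median(kept.values()))
        outside = [y for y, rl in kept.items() if not low <= rl <= high]
        if not outside:
            break
        # Drop non-robust years, then recompute the median on the rest.
        for year in outside:
            del kept[year]
    return kept


def placeholder_els(median_rl, fraction=0.10):
    # Hypothetical stand-in: +/-10 % around the median, illustration only.
    return median_rl * (1 - fraction), median_rl * (1 + fraction)


# Invented yearly URLs for one platform.
urls = {2013: 19.0, 2014: 18.9, 2015: 17.0, 2016: 16.4,
        2017: 16.5, 2018: 16.3, 2019: 16.4}
print(sorted(robust_years(urls, placeholder_els)))
```

Because the median is recomputed after every exclusion round, the ELs are recalculated around the new median before the remaining years are checked again.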

Comparison of the indirect and manufacturer provided reference intervals with the direct reference intervals

IRIs per manufacturer (median of all laboratories) from 2022 and manufacturer-provided RIs were compared with DRIs using bias ratios (BRs) as described by Ozarda et al. [11]. In short, BRs are based on the permissible bias at the minimum level and must not exceed 0.375 (minimal bias ratio). Desirable (<0.250) and optimal (<0.125) BRs have also been defined.
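A short sketch of the bias ratio calculation follows, using the formula given in the footnotes to Tables 2 and 3 (difference between a reference limit and the corresponding DRI limit, divided by the DRI width over 3.92). The function names are hypothetical, and the use of the absolute difference is an assumption; it reproduces the values tabulated for the Roche manufacturer-suggested TSH RI.

```python
def bias_ratios(lrl, url, lrl0, url0):
    """Bias ratios of a candidate RI (lrl, url) against a reference RI (lrl0, url0)."""
    sd_ri = (url0 - lrl0) / 3.92          # SD implied by a central 95 % interval
    return abs(lrl - lrl0) / sd_ri, abs(url - url0) / sd_ri


def grade(br):
    """Classify a bias ratio with the thresholds quoted in the text."""
    if br < 0.125:
        return "optimal"
    if br < 0.250:
        return "desirable"
    if br <= 0.375:
        return "minimal"
    return "exceeds minimal"


# Roche TSH from Table 2: manufacturer RI 0.27-4.20 vs. DRI 0.79-4.79 µIU/mL.
br_lrl, br_url = bias_ratios(0.27, 4.20, lrl0=0.79, url0=4.79)
print(round(br_lrl, 3), grade(br_lrl), round(br_url, 3), grade(br_url))
```

Because the denominator depends only on the width of the DRI, the same absolute deviation yields a larger BR for analytes with a narrow reference interval.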

Expert meeting group

An expert meeting group was established, comprising clinical chemists with expertise in endocrinology and data science from each of the participating laboratories. This group convened on multiple occasions to discuss the methodology, results and consequences of our findings.

Results

Direct reference intervals

TSH measured using the Roche Cobas showed a very small relative difference between serum and heparinized samples (4 %), which was considered clinically insignificant. Comparison graphs are depicted in Supplemental Figure 1. Results of the direct estimation of RIs are included in Table 2. As expected, TSH distributions were skewed and the non-parametric method was used to define the TSH DRIs. Roche, Abbott and Beckman showed normal distributions for FT4 (D’Agostino-Pearson test) and therefore the parametric method was used to define FT4 DRIs. FT4 results obtained with the Siemens Atellica were not normally distributed, necessitating the use of the non-parametric method.

Table 2:

Comparison of the indirect and direct TSH RIs with the manufacturer-suggested TSH RIs.

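| Platform | Direct method (2023), 19–68 years: LRL (90 % CI) | Direct method (2023), 19–68 years: URL (90 % CI) | Indirect method (2022), 18–100 years: median LRL (range) | Indirect method (2022), 18–100 years: median URL (range) | Indirect, 18–100 years: BR LRL | Indirect, 18–100 years: BR URL | Indirect method (2022), 18–60 years: median LRL (range) | Indirect method (2022), 18–60 years: median URL (range) | Indirect, 18–60 years: BR LRL | Indirect, 18–60 years: BR URL | Manufacturer-suggested: LRL | Manufacturer-suggested: URL | Manufacturer-suggested: BR LRL | Manufacturer-suggested: BR URL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Roche Cobas (n=4) | 0.79 (0.60–0.95) | 4.79 (4.41–6.46) | 0.66 (0.65–0.67) | 5.02 (4.86–5.32) | 0.127 | 0.225 | 0.65 (0.63–0.71) | 4.73 (4.64–5.04) | 0.137 | 0.059 | 0.27 | 4.20 | 0.510 | 0.578 |
| Abbott Alinity (n=3) | 0.67 (0.43–0.77) | 3.97 (3.82–5.71) | 0.55 (0.47–0.56) | 4.39 (4.34–5.10) | 0.151 | 0.528 | 0.56 (0.49–0.57) | 4.06 (3.94–4.32) | 0.138 | 0.113 | 0.35ᵃ | 4.94ᵃ | 0.402 | 1.219 |
| Siemens Atellica (n=4) | 0.76 (0.60–0.89) | 4.76 (4.40–6.94) | 0.66 (0.65–0.67) | 5.09 (5.06–5.58) | 0.098 | 0.323 | 0.66 (0.62–0.67) | 4.71 (4.62–5.05) | 0.098 | 0.049 | 0.55 | 4.78 | 0.206 | 0.020 |
| Beckman DxI (n=2) | 0.71 (0.60–0.83) | 4.67 (4.27–6.71) | 0.64 (0.61–0.66) | 5.33 (5.16–5.49) | 0.069 | 0.653 | 0.64 (0.61–0.67) | 4.84 (4.61–5.06) | 0.069 | 0.168 | 0.38 | 5.33 | 0.327 | 0.653 |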
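ᵃ Abbott provides a TSH RI based on the central 99 % interval, instead of the regular central 95 % interval. The Abbott TSH RI based on the central 95 % interval would be 0.47–3.67 µIU/mL. Bias ratios for these RLs are 0.251 (LRL) and 0.377 (URL).

$\mathrm{BR}_{\mathrm{LRL}} = \dfrac{|\mathrm{LRL} - \mathrm{LRL}_0|}{\mathrm{SD}_{\mathrm{RI}}}$, $\mathrm{BR}_{\mathrm{URL}} = \dfrac{|\mathrm{URL} - \mathrm{URL}_0|}{\mathrm{SD}_{\mathrm{RI}}}$, $\mathrm{SD}_{\mathrm{RI}} = \dfrac{\mathrm{URL}_0 - \mathrm{LRL}_0}{3.92}$, where $\mathrm{LRL}_0$ and $\mathrm{URL}_0$ are the limits obtained with the direct method; minimal BR<0.375, desirable BR<0.250, optimal BR<0.125. LRL, lower reference limit; URL, upper reference limit; BR, bias ratio.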

Indirect reference intervals

Data stability

Moving monthly median TSH plots (Supplemental Figure 2) indicated the stability of TSH results across the studied time periods in nearly all laboratories. Moving monthly median plots for FT4 (Supplemental Figure 3) show that FT4 results were generally less stable. Further details are given in the Supplemental Materials and Methods.

Stratification

Detailed examination of TSH and FT4 IRIs for each 10-year age interval showed the need to stratify the TSH and FT4 IRIs at higher ages. From 60 years onwards, the TSH IRIs surpassed the ELs (Supplemental Figure 4). For FT4, IRIs surpassed the ELs from 70 years onwards (Supplemental Figure 5). For comparison, age groups 18–60 and 18–100 were used for TSH and 18–70 and 18–100 for FT4. More details are provided in the Supplemental Materials and Methods.

Long-term robustness of reference intervals

Over seven million results for adult TSH were retrieved from the LISs between 2008 and 2022, of which around 55 % were from patients aged 18–60 years. Over 2 million adult FT4 results were retrieved, of which around 70 % were from patients aged 18–70 years. Figures 1 and 2 show the IRIs in the age group 18–60 (TSH) or 18–70 (FT4) clustered per testing platform (Roche, Abbott, Beckman or Siemens). Single non-robust years that were excluded from the analyses frequently showed cut-off violations in the same year in the moving monthly median plots. All median TSH IRIs and the Roche, Beckman and Siemens FT4 IRIs showed long-term robustness during the studied time period. FT4 IRIs from Abbott did not show continued robustness; instead, they declined during the period 2013–2015, as was expected based on the data stability results.

Figure 1: 
Indirect TSH reference intervals (RIs) for the four manufacturers during the periods 2008–2022 (Roche, Abbott and Beckman) or 2014–2022 (Siemens). Individual laboratory indirect RLs are depicted as single dots. Indirect RIs are defined by the median yearly LRLs and URLs from the individual laboratories (grey area). Manufacturer opted RLs are depicted as dotted lines. RIs estimated by the direct method are depicted as dashed lines.

Figure 2: 
Indirect FT4 reference intervals (RIs) for the four manufacturers (Roche, Abbott, Beckman & Siemens) during the time period 2008–2022. Individual laboratory indirect RLs are depicted as single dots. Indirect RIs are defined by the median yearly LRLs and URLs from the individual laboratories (grey area). Manufacturer opted RLs are depicted as dotted lines. RIs estimated by the direct method are depicted as dashed lines.

Comparison of the indirect and direct reference intervals with the manufacturer suggested reference intervals

Table 2 (TSH) and Table 3 (FT4) show the results from the comparison of the DRIs, IRIs and the manufacturer-suggested RIs. For visual comparison, manufacturer-suggested RIs and DRIs were included as horizontal lines in Figures 1 and 2. TSH IRI estimation in the age group 18–60 gave results very similar to the DRIs, with desirable or even optimal BRs. The need for RI partitioning is reflected by the improved BRs obtained with the indirect method in the age group 18–60 compared to those obtained with the age group 18–100. The TSH RI suggested by Siemens showed good agreement with the DRI, with BRs below the desirable level. In contrast, the TSH RIs suggested by Roche, Abbott and Beckman did not meet the minimal BR criterion when compared with the DRIs. FT4 IRIs in the age groups 18–70 and 18–100 were wider than the DRIs. Using the age group 18–70 only slightly improved the comparison. None of the FT4 RLs obtained by the indirect method met the minimal BR criterion except the indirect Siemens FT4 LRL, which showed an optimal BR. FT4 URLs as suggested by Roche, Abbott and Siemens were too high compared to the direct method. The manufacturer-suggested LRLs and the Beckman URL were closer to those obtained by the direct method, yet did not always meet the minimal BR criterion (Abbott and Beckman).

Table 3:

Comparison of the indirect and direct FT4 RIs with the manufacturer-suggested FT4 RIs.

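| Platform | Direct method (2023), 19–68 years: LRL (90 % CI) | Direct method (2023), 19–68 years: URL (90 % CI) | Indirect method (2022), 18–100 years: median LRL (range) | Indirect method (2022), 18–100 years: median URL (range) | Indirect, 18–100 years: BR LRL | Indirect, 18–100 years: BR URL | Indirect method (2022), 18–70 years: median LRL (range) | Indirect method (2022), 18–70 years: median URL (range) | Indirect, 18–70 years: BR LRL | Indirect, 18–70 years: BR URL | Indirect method (2022), 18–70 years, TSH within RI: median LRL (range) | Indirect method (2022), 18–70 years, TSH within RI: median URL (range) | Indirect, 18–70 years, TSH within RI: BR LRL | Indirect, 18–70 years, TSH within RI: BR URL | Manufacturer-suggested: LRL | Manufacturer-suggested: URL | Manufacturer-suggested: BR LRL | Manufacturer-suggested: BR URL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Roche Cobas (n=4) | 12.6 (12.1–13.1) | 19.8 (19.3–20.2) | 11.6 (11.3–12.0) | 20.8 (20.5–21.2) | 0.544 | 0.544 | 11.5 (11.4–11.9) | 20.6 (20.4–21.0) | 0.599 | 0.436 | 11.6 (11.4–12.0) | 20.2 (19.7–20.9) | 0.544 | 0.218 | 12 | 22 | 0.327 | 1.198 |
| Abbott Alinity (n=3) | 9.6 (9.4–9.9) | 13.6 (13.3–13.8) | 10.1 (10.1–10.5) | 16.5 (15.9–16.7) | 0.490 | 2.842 | 10.2ᵃ (9.9–10.4) | 16.2ᵃ (15.9–16.6) | 0.588 | 2.548 | 10.5ᵃ (10.3–10.6) | 16.2ᵃ (15.9–16.5) | 0.882 | 2.548 | 9.01ᵇ | 19.5ᵇ | 0.578 | 5.782 |
| Siemens Atellica (n=4) | 12.1 (11.4–12.9) | 19.5 (17.9–21.1) | 12.0ᶜ (11.7–12.3) | 20.9ᶜ (20.2–22.2) | 0.053 | 0.742 | 12.0ᶜ (11.6–12.3) | 20.7ᶜ (20.3–20.7) | 0.053 | 0.636 | 12.2 (11.8–12.5) | 20.7 (20.0–21.4) | 0.053 | 0.636 | 11.5 | 22.7 | 0.318 | 1.695 |
| Beckman DxI (n=2) | 8.6 (8.3–9.0) | 13.9 (13.6–14.3) | 7.9 (7.8–8.1) | 15.3 (15.1–15.4) | 0.518 | 1.035 | 7.9 (7.8–8.0) | 15.0 (14.9–15.0) | 0.518 | 0.814 | 8.1 (7.9–8.3) | 14.5 (14.3–14.7) | 0.370 | 0.444 | 7.86 | 14.41 | 0.547 | 0.377 |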
ᵃ n=2, 2022 data from laboratory 9 was excluded after visual check of the refineR model. ᵇ Abbott provides an FT4 RI based on the central 99 % interval, instead of the regular central 95 % interval. The Abbott FT4 RI based on the central 95 % interval would be 10.3–17.8 pmol/L. Bias ratios for these RLs are 0.686 (LRL) and 4.077 (URL). ᶜ n=3, 2022 data from laboratory 12 was excluded after visual check of the refineR model.

$\mathrm{BR}_{\mathrm{LRL}} = \dfrac{|\mathrm{LRL} - \mathrm{LRL}_0|}{\mathrm{SD}_{\mathrm{RI}}}$, $\mathrm{BR}_{\mathrm{URL}} = \dfrac{|\mathrm{URL} - \mathrm{URL}_0|}{\mathrm{SD}_{\mathrm{RI}}}$, $\mathrm{SD}_{\mathrm{RI}} = \dfrac{\mathrm{URL}_0 - \mathrm{LRL}_0}{3.92}$, where $\mathrm{LRL}_0$ and $\mathrm{URL}_0$ are the limits obtained with the direct method; minimal BR<0.375, desirable BR<0.250, optimal BR<0.125. LRL, lower reference limit; URL, upper reference limit; BR, bias ratio.

Discussion

This paper describes our findings on the robustness of TSH and FT4 RIs, the inaccuracy of TSH and FT4 RIs from manufacturer package inserts and the applicability of modern indirect methods to estimate TSH and FT4 RIs. We demonstrated the robustness of Roche, Abbott, Beckman and Siemens TSH IRIs. Our results also revealed the robustness of Roche, Beckman and Siemens FT4 IRIs, while the data from Abbott showed a decline in the FT4 URL over the time period 2013–2015. Earlier, Algeciras-Schimnich et al. have suggested that lot-to-lot validation studies might not be adequate to assess robustness of TSH and FT4 results [12, 13], which potentially explains why the decline in Abbott results might have been overlooked.

Furthermore, we identified inaccuracies in the TSH and FT4 RIs from manufacturer package inserts and discourage transferring these RIs to our laboratories, because they are too narrow (Roche TSH), too wide (Abbott TSH, Beckman TSH, Roche FT4 and Abbott FT4), too low (Beckman FT4) or too high (Siemens). As expected, the LRL and URL of Abbott based on the central 95 % interval (instead of the central 99 % interval) more closely resembled the DRI, but were still too low (TSH) or too high (FT4).

The TSH IRIs derived from the age group 18–60 demonstrated excellent agreement with the results from the direct method. The FT4 IRIs derived from the age group 18–70 are wider than the results from the direct study. The reasons for these less-than-optimal FT4 IRIs are possibly multifaceted, encompassing analytical, inter- and intra-individual variations, seasonal fluctuations and day-to-day variability and the distribution of results, all of which differ for TSH and FT4. In addition, Ammer et al. showed that indirect methods might outperform the direct method [7]. Overall, IRIs represent a significant improvement over the manufacturer-suggested FT4 RIs.

Alarmingly, the manufacturer-suggested FT4 URL from Abbott differed by almost 6 pmol/L from the result obtained with the direct method in 2023. When taking the central 95 % interval from Abbott instead of the central 99 % interval indicated in the manufacturer’s assay information, the results still differed by more than 4 pmol/L. Interestingly, the indirect method also overestimated the FT4 URL, by approximately 3 pmol/L. As another recent study also showed an actual Abbott FT4 upper limit of normal (ULN) of approximately 16 pmol/L, we hypothesized that this difference was caused by a second rapid decline in assay results near the URL between measurement of the latest FT4 results for the IRIs (2022) and collection and measurement of the samples for the DRIs (2023) [14]. We therefore re-measured 95 samples that had been collected and measured at laboratory 1 in June 2022 (lots 35506UD00 and 40045UD00) and stored frozen at −80 °C, using newer Abbott reagent at laboratory 1 in August 2023 (lot 50481UD00). A proportional bias of 12 % was found (Supplemental Figure 6). To confirm these results, we additionally measured the samples at the Endocrine Laboratory of Amsterdam UMC with the most recent Abbott FT4 reagent in August 2023 and found a significant proportional difference of 16 % between the two reagents. This indicates that the initial negative trend had continued, which explains the observed difference between the indirect and direct URLs.

The amount of filtering that historical LIS data require before they are suitable for use in an indirect method remains a subject of debate. Ma et al. have evaluated two approaches for the use of the indirect method to establish IRIs for TSH and FT4 [14]. They compared a largely unfiltered dataset with patient data downloaded directly from the LIS and a dataset with patient data that met strict exclusion criteria. Both approaches gave similar results, indicating that establishing IRIs for TSH and FT4 without extensive filtering using additional exclusion criteria is possible. We decided to exclude all results from repeated measurements, as Arzideh et al. showed that the use of all data without a filtering step results in a significant bias in reference limits [15]. Our findings confirm the applicability of indirect methods with minimal filtering to establish or validate TSH and FT4 RIs. Several studies have already drawn this conclusion, yet none of them matched the comprehensiveness of our study, which includes data from all four major manufacturers, nor did they assess the robustness of TSH and FT4 IRIs using historical data. We have provided a summary of all efforts to estimate direct or indirect TSH and FT4 RIs from 2008 to 2023 in Supplemental Table 1.

Bohn et al. have used a very similar approach in their pursuit of RI harmonization for 16 analytes, including TSH and FT4 [10]. For TSH, a method-independent harmonized RI was deemed plausible in Canada, while method-specific FT4 results differed substantially. Our findings corroborate these observations regarding FT4 differences, but we also found substantial differences between the Abbott TSH URL (4.06 µIU/mL) and the Roche, Siemens and Beckman TSH URLs (4.73, 4.71 and 4.84 µIU/mL, respectively). Based on our results, we are therefore still far from harmonized, method-independent RIs for TSH and FT4, and we await the efforts of the IFCC Working Group for Standardization of Thyroid Function Tests (WG-STFT) to effectuate this harmonization [16], [17], [18], [19].

TSH and, to a lesser extent, FT4 show an age dependency, with the RI broadening with increasing age. Whether this rise in TSH and FT4 is physiological, a representation of thyroid dysfunction, or a reflection of comorbidities remains a matter of debate [20]. Nonetheless, it means that correct stratification is necessary to adequately compare directly and indirectly estimated RIs [4]. We discuss the implications of age-specific TSH and FT4 RIs in the laboratory in another paper (Jansen et al., in preparation).

The reason why appropriate RIs are essential in the assessment of thyroid function was shown by Coene et al. [21]. Based on the diagnosis of subclinical thyroid disease in The Netherlands, they concluded that the TSH and FT4 RIs applied by an individual laboratory influence the assessment of thyroid function. They showed that subclinical thyroid disease could be a laboratory-induced condition resulting from the large variation in TSH concentrations and applied RIs. We looked at the diagnoses of overt hypo- and hyperthyroidism (based on the first measurement per patient) and calculated the number of new patients diagnosed with overt thyroid disease using the Abbott manufacturer-suggested RIs (TSH: 0.35–4.94 µIU/mL, FT4: 9.01–19.5 pmol/L) compared to the Abbott IRIs (TSH: 0.57–4.06 µIU/mL, FT4: 10.5–16.2 pmol/L) in the laboratories employing Abbott assays. The percentage of overt hypothyroidism diagnoses increased from 0.8 to 3.3 %, while the percentage of overt hyperthyroidism diagnoses increased from 1.2 to 3.9 %.
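The effect of the applied RIs on classification can be illustrated with a small sketch. It uses the two sets of Abbott RIs quoted above and the definitions of overt hypo- and hyperthyroidism from the Introduction; the function name and the two patient results are hypothetical and do not come from the study data.

```python
def classify(tsh, ft4, tsh_ri, ft4_ri):
    """Classify one TSH/FT4 pair against a given pair of RIs."""
    tsh_lrl, tsh_url = tsh_ri
    ft4_lrl, ft4_url = ft4_ri
    if tsh > tsh_url and ft4 < ft4_lrl:
        return "overt hypothyroidism"   # elevated TSH, reduced FT4
    if tsh < tsh_lrl and ft4 > ft4_url:
        return "overt hyperthyroidism"  # reduced TSH, elevated FT4
    return "other"                      # euthyroid, subclinical or discordant


# RIs quoted in the text (TSH in µIU/mL, FT4 in pmol/L).
MANUFACTURER = {"tsh_ri": (0.35, 4.94), "ft4_ri": (9.01, 19.5)}
INDIRECT = {"tsh_ri": (0.57, 4.06), "ft4_ri": (10.5, 16.2)}

# Two hypothetical patients: (TSH, FT4).
patients = [(4.5, 10.0), (0.4, 17.5)]
for tsh, ft4 in patients:
    print(classify(tsh, ft4, **MANUFACTURER), "->", classify(tsh, ft4, **INDIRECT))
```

Both hypothetical patients fall in the "other" category under the manufacturer-suggested RIs but are classified as overt disease under the IRIs, which is the direction of the shift reported above.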

Our study has a few noteworthy limitations. First, we were unable to differentiate between non-pregnant and pregnant females in the datasets, despite the known influence of pregnancy on TSH and FT4 RIs [22]. Additionally, although we have attempted to exclude patients on thyroid medication by excluding results from repeated measurements, we were not able to quantify the success rate of this exclusion. Dutch guidelines permit annual monitoring of TSH in stable patients with thyroid disease, which means these patients could still be part of our datasets. Notably, Jansen et al. demonstrated that patients on levothyroxine exhibit higher FT4 concentrations without complete TSH suppression, and it is unclear whether refineR classifies these results as pathological or non-pathological [23].

Conclusions

Our findings underscore the importance of exercising caution when considering implementation of manufacturer-suggested RIs for TSH and FT4 in the laboratory. We observed that indirectly estimated RIs for TSH and FT4 agree more closely with DRIs than the manufacturer-suggested RIs do. Laboratories should establish their own RIs tailored to their specific methods and the characteristics of their patient population, or adopt RIs from a more reliable source than the manufacturer package inserts. We recommend frequent verification of RIs to identify potential drifts in results and thereby prevent under- or overestimation of thyroid disorders.


Corresponding author: Annemieke C. Heijboer, Endocrine Laboratory, Department of Laboratory Medicine, Amsterdam UMC Location University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands; Amsterdam Gastroenterology, Endocrinology & Metabolism, Amsterdam, The Netherlands; Department of Laboratory Medicine, Amsterdam UMC Location Vrije Universiteit Amsterdam, Boelelaan 1117, Amsterdam, The Netherlands; and Amsterdam Reproduction & Development Research Institute, Amsterdam, The Netherlands, E-mail:

  1. Research ethics: This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

  2. Informed consent: Informed consent was obtained from all individuals included in this study.

  3. Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Competing interests: The authors state no conflict of interest.

  5. Research funding: None declared.

  6. Data availability: Not applicable.

References

1. CLSI. Defining, establishing, and verifying reference intervals in the clinical laboratory; approved guideline, third edition. CLSI document EP28-A3c. Wayne, PA: Clinical and Laboratory Standards Institute; 2008.

2. Ozcurumez, MK, Haeckel, R. Biological variables influencing the estimation of reference limits. Scand J Clin Lab Invest 2018;78:337–45. https://doi.org/10.1080/00365513.2018.1471617.

3. Tate, JR, Yen, T, Jones, GR. Transference and validation of reference intervals. Clin Chem 2015;61:1012–5. https://doi.org/10.1373/clinchem.2015.243055.

4. Haeckel, R, Wosniok, W. The importance of correct stratifications when comparing directly and indirectly estimated reference intervals. Clin Chem Lab Med 2021;59:1628–33. https://doi.org/10.1515/cclm-2021-0353.

5. Clinical and Laboratory Standards Institute (CLSI). Measurement procedure comparison and bias estimation using patient samples; approved guideline, third edition. CLSI document EP09-A3. Wayne, PA: CLSI; 2013.

6. Haeckel, R, Wosniok, W, Streichert, T, Members of the Section Guide Limits of the DGKL. Review of potentials and limitations of indirect approaches for estimating reference limits/intervals of quantitative procedures in laboratory medicine. J Lab Med 2021;45:35–53. https://doi.org/10.1515/labmed-2020-0131.

7. Ammer, T, Schützenmeister, A, Prokosch, H, Zierk, J, Rank, CM, Rauh, M. RIbench: a proposed benchmark for the standardized evaluation of indirect methods for reference interval estimation. Clin Chem 2022;68:1410–24. https://doi.org/10.1093/clinchem/hvac142.

8. Wosniok, W, Haeckel, R. A new indirect estimation of reference intervals: truncated minimum chi-square (TMC) approach. Clin Chem Lab Med 2019;57:1933–47. https://doi.org/10.1515/cclm-2018-1341.

9. Ammer, T, Schützenmeister, A, Prokosch, H, Rauh, M, Rank, CM, Zierk, J. refineR: a novel algorithm for reference interval estimation from real-world data. Sci Rep 2021;11:16023. https://doi.org/10.1038/s41598-021-95301-2.

10. Haeckel, R, Wosniok, W, Arzideh, F. Equivalence limits of reference intervals for partitioning of population data. Relevant differences of reference limits. LaboratoriumsMedizin 2016;40:199–205. https://doi.org/10.1515/labmed-2016-0002.

11. Ozarda, Y, Ichihara, K, Jones, G, Streichert, T, Ahmadian, R, IFCC Committee on Reference Intervals and Decision Limits (C-RIDL). Comparison of reference intervals derived by direct and indirect methods based on compatible datasets obtained in Turkey. Clin Chim Acta 2021;520:186–95. https://doi.org/10.1016/j.cca.2021.05.030.

12. Algeciras-Schimnich, A, Bruns, DE, Boyd, JC, Bryant, SC, La Fortune, KA, Grebe, SKG. Failure of current laboratory protocols to detect lot-to-lot reagent differences: findings and possible solutions. Clin Chem 2013;59:1187–94. https://doi.org/10.1373/clinchem.2013.205070.

13. Katzman, BM, Ness, KM, Algeciras-Schimnich, A. Evaluation of the CLSI EP26-A protocol for detection of reagent lot-to-lot differences. Clin Biochem 2017;50:768–71. https://doi.org/10.1016/j.clinbiochem.2017.03.012.

14. Ma, C, Cheng, X, Xue, F, Li, X, Yin, Y, Wu, J, et al. Validation of an approach using only patient big data from clinical laboratories to establish reference intervals for thyroid hormones based on data mining. Clin Biochem 2020;80:25–30. https://doi.org/10.1016/j.clinbiochem.2020.03.012.

15. Arzideh, F, Özcürümez, M, Albers, E, Haeckel, R, Streichert, T. Indirect estimation of reference intervals using first or last results and results from patients without repeated measurements. J Lab Med 2021;45:103–9. https://doi.org/10.1515/labmed-2020-0149.

16. Thienpont, LM, Van Uytfanghe, K, Van Houcke, S, Das, B, Faix, JD, MacKenzie, F, et al. A progress report of the IFCC committee for standardization of thyroid function tests. Eur Thyroid J 2014;3:109–16. https://doi.org/10.1159/000358270.

17. Thienpont, LM, Van Uytfanghe, K, Beastall, G, Faix, JD, Ieiri, T, Miller, WG, et al. Report of the IFCC working group for standardization of thyroid function tests; part 1: thyroid-stimulating hormone. Clin Chem 2010;56:902–11. https://doi.org/10.1373/clinchem.2009.140178.

18. Thienpont, LM, Van Uytfanghe, K, Beastall, G, Faix, JD, Ieiri, T, Miller, WG, et al. Report of the IFCC working group for standardization of thyroid function tests; part 2: free thyroxine and free triiodothyronine. Clin Chem 2010;56:912–20. https://doi.org/10.1373/clinchem.2009.140194.

19. Thienpont, LM, Faix, JD, Beastall, G. Standardization of free thyroxine and harmonization of thyrotropin measurements: a request for input from endocrinologists and other physicians. Thyroid 2015;25:1379–80. https://doi.org/10.1089/thy.2015.0309.

20. Raverot, V, Bonjour, M, du Payrat, AJ, Perrin, P, Roucher-Boulez, F, Lasolle, H, et al. Age- and sex-specific TSH upper-limit reference intervals in the general French population: there is a need to adjust our actual practices. J Clin Med 2020;9:792. https://doi.org/10.1530/endoabs.70.aep898.

21. Coene, KL, Demir, AY, Broeren, MAC, Verschuure, P, Lentjes, EGWM, Boer, AK. Subclinical hypothyroidism: a “laboratory-induced” condition? Eur J Endocrinol 2015;173:499–505. https://doi.org/10.1530/eje-15-0684.

22. Osinga, JAJ, Derakhshan, A, Palomaki, GE, Ashoor, G, Männistö, T, Maraka, S, et al. TSH and FT4 reference intervals in pregnancy: a systematic review and individual participant data meta-analysis. J Clin Endocrinol Metab 2022;107:2925–33. https://doi.org/10.1210/clinem/dgac425.

23. Jansen, HI, Bult, MM, Bisschop, PH, Boelen, A, Heijboer, AC, Hillebrand, JJ. Increased fT4 concentrations in patients using levothyroxine without complete suppression of TSH. Endocr Connect 2023;12:e220538. https://doi.org/10.1530/ec-22-0538.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/cclm-2023-1237).


Received: 2023-11-01
Accepted: 2023-12-30
Published Online: 2024-01-12

© 2024 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 21.5.2024 from https://www.degruyter.com/document/doi/10.1515/cclm-2023-1237/html