Data mining of pediatric reference intervals

Jakob Zierk; Markus Metzler; Manfred Rauh

doi:10.1515/labmed-2021-0120

Open Access Published by De Gruyter October 13, 2021

From the journal Journal of Laboratory Medicine

Abstract

Laboratory tests are essential to assess the health status and to guide patient care in individuals of all ages. The interpretation of quantitative test results requires availability of appropriate reference intervals, and reference intervals in children have to account for the extensive physiological dynamics with age in many biomarkers. Creation of reference intervals using conventional approaches requires the sampling of healthy individuals, which is opposed by ethical and practical considerations in children, due to the need for a large number of blood samples from healthy children of all ages, including neonates and young infants. This limits the availability and quality of pediatric reference intervals, and ultimately negatively impacts pediatric clinical decision-making. Data mining approaches use laboratory test results and clinical information from hospital information systems to create reference intervals. The extensive number of available test results from laboratory information systems and advanced statistical methods enable the creation of pediatric reference intervals with an unprecedented age-related accuracy for children of all ages. Ongoing developments regarding the availability and standardization of electronic medical records and of indirect statistical methods will further improve the benefit of data mining for pediatric reference intervals.

Keywords: data mining; indirect methods; pediatric reference intervals

Introduction

Laboratory tests are a ubiquitous tool for health assessment in modern medicine and support diagnostic and therapeutic decisions in patients of all ages [1]. The interpretation of quantitative laboratory test results requires knowledge of the distribution of test results in healthy individuals, and the effect of both physiological and pathological factors that influence test results [2]. Therefore, most laboratory test results need to be reported together with the so-called reference interval, which represents the central 95% range of physiological test results in a group of individuals that is comparable to the person in which the test was performed [3].

Direct methods to create reference intervals

The classical approach (the so-called direct method) to create reference intervals is both straightforward and challenging: laboratory testing is performed in a sufficiently large group of healthy individuals, and the 2.5th and 97.5th percentiles of test results are reported as lower and upper reference limits [3, 4]. Different statistical methods exist to calculate percentiles, and the appropriate technique depends on sample size and sample distribution, and may require mathematical transformation of the dataset [5]. The number of samples required for sufficiently accurate and precise reference intervals depends on the distribution of test results, and minimum sample sizes of 120–400 samples are generally recommended [6]. To account for factors that influence test results, reference intervals have to be stratified by the relevant covariates, which depend on the measured analyte and in most cases include at least sex, age, and the measurement method used. In practice, the recruitment of a sufficiently large number of healthy individuals while accounting for the relevant covariates is the critical limiting factor for practical, logistical, and financial reasons [7].
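As a minimal sketch of the percentile step described above, the following Python snippet computes nonparametric reference limits from simulated results. The hemoglobin-like values, the function name `direct_reference_interval`, and the hard minimum of 120 samples are illustrative assumptions, not a prescribed procedure.

```python
import random
import statistics


def direct_reference_interval(values, minimum_samples=120):
    """Return the 2.5th and 97.5th percentiles of results from healthy individuals."""
    if len(values) < minimum_samples:
        raise ValueError("too few samples for a reliable reference interval")
    # quantiles(n=40) yields 39 cut points in 2.5% steps; the first and the
    # last cut point are the 2.5th and the 97.5th percentile.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Simulated results from 400 healthy individuals (assumed mean 14, SD 1).
rng = random.Random(42)
healthy = [rng.gauss(14.0, 1.0) for _ in range(400)]
lower, upper = direct_reference_interval(healthy)
print(f"reference interval: {lower:.1f} to {upper:.1f}")
```

In a real study, this calculation would be repeated for every stratum (e.g. each sex and age group), which is why the required number of samples multiplies so quickly.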

Challenges when using direct methods to create pediatric reference intervals

When reference intervals for children need to be established, ethical considerations additionally complicate the recruitment of healthy individuals. Blood drawing from children, especially young infants and neonates, is associated with pain and stress and results in blood loss (albeit minor), without individual benefit for the participating healthy children in reference interval studies. These factors restrict the availability of high-quality reference intervals that stratify perfectly for all relevant covariates [7], although the Canadian CALIPER study and others have dramatically improved the situation [8], [9], [10]. Due to the age-dependent physiological changes in children, precise stratification for age is essential for many analytes, but is directly opposed by the resulting increase (i.e. multiplication per age group) in necessary blood samples. Importantly, this especially concerns the most vulnerable pediatric populations, neonates and young infants, which exhibit the most dynamic changes in many biomarkers with age [7, 11, 12]. Ethical considerations are particularly pronounced in these very young children, who are considered to require the highest possible level of protection from interventions from which they do not directly benefit (i.e. blood drawing for the establishment of reference intervals). This results in the dilemma that availability and quality of reference intervals are worst for the most vulnerable pediatric subpopulations, which are overrepresented in clinical decision-making due to their disproportionate share in pediatric morbidity and mortality.

Although reference intervals established using direct methods are considered the gold standard, they do not necessarily reflect a unique ground truth, but are influenced by the choice of statistical methods and of inclusion and exclusion criteria. Recently, Hickman et al. showed that the choice of outlier exclusion criteria (e.g. Reed-Dixon or Tukey) can have a substantial effect on the resulting reference intervals [13]. Similarly, the strict inclusion criteria employed when using direct methods can lead to the selection of a “super-healthy” minority of individuals, especially in the elderly, where the prevalence of comorbidities is high and a majority of individuals take prescription medication. Direct methods exclude these individuals and result in reference intervals that are based on samples from the minority of elderly patients without any comorbidities and prescription medication, which has resulted in the exclusion of up to 80% of screened candidates in major studies [2, 14]. It is therefore unclear whether these reference intervals are appropriate for the population in which they are ultimately used, or whether they are inappropriately narrow and lead to unnecessary flagging of essentially “normal” test results. Finally, the challenges associated with creating reference intervals using direct methods often result in reference intervals that do not stratify as accurately as required for all relevant covariates, resulting in the use of inappropriate (e.g. not adequately age-matched) reference intervals due to the lack of more appropriate alternatives.
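The influence of outlier exclusion can be illustrated with a small sketch. It applies Tukey's fences (values outside Q1 − 1.5·IQR to Q3 + 1.5·IQR are dropped) to a simulated right-skewed analyte and compares the resulting limits. The simulated distribution and the function names are assumptions for illustration and do not reproduce the analysis in [13].

```python
import math
import random
import statistics


def central_95(values):
    """2.5th and 97.5th percentiles of the given values."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def tukey_filter(values, k=1.5):
    """Keep only values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Simulated right-skewed (log-normal) results from 400 healthy individuals.
rng = random.Random(7)
results = [math.exp(rng.gauss(3.0, 0.5)) for _ in range(400)]

kept = tukey_filter(results)
raw_limits = central_95(results)
filtered_limits = central_95(kept)
print(f"removed {len(results) - len(kept)} results as outliers")
print(f"upper limit without exclusion: {raw_limits[1]:.1f}")
print(f"upper limit after Tukey:       {filtered_limits[1]:.1f}")
```

Because the simulated distribution is skewed, the fences remove genuinely physiological high values, and the upper reference limit shifts downward although no pathological result was present.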

Indirect methods as a complement to direct methods

Due to these concerns, so-called indirect methods or data mining methods have been developed as a complement or alternative to conventional reference interval methods [4, 15], [16], [17]. The intent of these methods is to use test results that have been obtained during patient care to create reference intervals, and a prerequisite for their application is therefore that the analyzed dataset indeed contains a subset of physiological test results. Thus, the challenge of all indirect methods is the identification of the physiological samples (or of their distribution) in the mixed input dataset, which contains both physiological test results and abnormal values.

Two strategies, which are not mutually exclusive, exist towards this goal. Metadata-driven strategies apply filters to the input dataset, or more specifically to the associated metadata, to selectively include and exclude test results. A non-exhaustive list of typical filters includes patients’ diagnoses, performed procedures, treatment units (e.g. outpatient vs. inpatient, intensive care units), medical specialties, number of test results per analyte in a given time range, or test results in other analytes. The aim of these filters is a selective enrichment of physiological values of the analyte under consideration. The performance of this strategy depends on the availability and quality of the metadata and on the validity and completeness of the assumptions that are used to construct the applied filters.
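A metadata-driven filter of this kind could be sketched as follows. The record layout (`LabResult`), the care-setting labels, and the diagnosis labels are hypothetical; real laboratory and hospital information systems will use their own coding schemes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabResult:
    """One test result with the metadata used for filtering (hypothetical layout)."""
    patient_id: str
    value: float
    care_setting: str                 # e.g. "outpatient", "inpatient", "intensive_care"
    diagnoses: frozenset = frozenset()


# Assumed exclusion rules for a hemoglobin analysis; illustrative only.
EXCLUDED_SETTINGS = frozenset({"intensive_care", "oncology"})
EXCLUDED_DIAGNOSES = frozenset({"anemia"})


def likely_physiological(result):
    """True if no exclusion rule applies to this result."""
    if result.care_setting in EXCLUDED_SETTINGS:
        return False
    return not (result.diagnoses & EXCLUDED_DIAGNOSES)


results = [
    LabResult("p1", 13.8, "outpatient"),
    LabResult("p2", 8.9, "intensive_care"),
    LabResult("p3", 9.5, "outpatient", frozenset({"anemia"})),
    LabResult("p4", 14.4, "inpatient", frozenset({"fracture"})),
]
enriched = [r.value for r in results if likely_physiological(r)]
print(enriched)
```

The filter only enriches physiological values. It cannot guarantee that every remaining result is physiological, which is why it is often combined with a statistical method.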

The second strategy is primarily statistical: the basic assumption of statistical indirect reference methods is the presence of a “major” distribution of physiological test results, which can be identified despite a “contamination” by abnormal test results. More specifically, the vast majority of statistical indirect methods rely on the presence of a range of test results in the input dataset in which the contamination with abnormal test results is negligible, and use the shape of the (truncated) distribution in this range to estimate the parameters of the distribution of physiological test results (see Figure 1 for an example as implemented in the kosmic algorithm). In most cases, these methods assume a specific distribution type (typically a Gaussian distribution or a Box-Cox-transformed Gaussian distribution) for the physiological test results. The range of uncontaminated test results is identified either visually by a human operator using an appropriate graphical representation, or using a mathematical optimization method. A variety of indirect methods have been implemented and are currently used for estimating reference intervals. The most recent or historically important ones include the visual Hoffmann approach [18] and the Bhattacharya method [19], Arzideh et al.’s Truncated Maximum Likelihood (TML) method [20], [21], [22], Wosniok et al.’s Truncated Minimum Chi-square approach (TMC) [23], a modified version of the TML method by Zierk et al. called kosmic [24], and most recently the refineR algorithm by Ammer et al. [25] (see also Haeckel et al. for a review of different methods [16], although kosmic is only briefly discussed and refineR was published after that review). These methods have been applied to clinical databases and the derived reference intervals are used for clinical decision-making; however, a comparative in-depth evaluation of their performance has not yet been performed.

kosmic has been evaluated in various benchmarking datasets, showing valid reference intervals in the presence of up to 20–30% abnormal test results in simulated scenarios, and in real-world datasets even if samples with a high proportion of pathological test results (e.g. blood counts from intensive care units and hematology/oncology) were included. refineR has been shown to outperform kosmic under most conditions in simulated datasets, and outperforms even the direct method (n=120) in the published simulation studies, although no real-world data were analyzed for either comparison. Despite these results, a more comprehensive benchmark and head-to-head comparison of indirect reference interval algorithms is required, and would clarify and strengthen the role of different indirect as well as direct approaches. Importantly, a benchmark study should support individual laboratories when selecting indirect methods for reference interval estimation or validation by providing guidance on the minimal number of samples and the maximum proportion of abnormal test results.
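The Box-Cox-transformed Gaussian assumption mentioned above can be made concrete with a short sketch. It uses the standard Box-Cox transformation, y = (x^λ − 1)/λ for λ ≠ 0 and y = ln x for λ = 0, and back-transforms Gaussian limits to the original scale. The parameter values and the function names are illustrative assumptions and are not taken from any of the cited implementations.

```python
import math
from statistics import NormalDist


def box_cox(x, lam):
    """Box-Cox transformation of a positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Back-transformation to the original scale (requires lam * y + 1 > 0)."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)


def reference_limits(mu, sigma, lam):
    """Central 95% limits for a result whose Box-Cox transform is N(mu, sigma)."""
    z = NormalDist().inv_cdf(0.975)
    return inverse_box_cox(mu - z * sigma, lam), inverse_box_cox(mu + z * sigma, lam)


# Example: a log-normal analyte (lam = 0) with a median of 20 units.
lo, hi = reference_limits(math.log(20.0), 0.3, 0.0)
print(f"{lo:.1f} to {hi:.1f}")
```

The limits are symmetric on the transformed scale but asymmetric around the median on the original scale, which is how a single additional parameter λ accommodates skewed analytes.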

Figure 1:

Estimation of reference intervals in a “contaminated” dataset using the statistical approach implemented in kosmic (example using simulated hemoglobin test results).

Based on the histogram of test results H, the cumulative density of test results D is determined. Subsequently, the cumulative density F of a normal distribution is compared to D within a truncation interval T using the Kolmogorov-Smirnov distance KS. An optimization process identifies the truncation interval T and the normal distribution’s parameters μ and σ that result in the minimum KS, and these are used to construct the estimated distribution of physiological test results. This process is performed for different “skewness” factors λ (prior Box-Cox transformation of test results using λ) to enable the estimation of non-normal distributions (modified from ref. [24]).
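The idea behind Figure 1 can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the kosmic implementation: the truncation interval T is fixed by hand instead of being optimized, no Box-Cox parameter λ is searched, and a plain grid search replaces the optimizer. The simulated hemoglobin-like data, the grids, and all function names are assumptions.

```python
import bisect
import random
import statistics
from statistics import NormalDist


def cumulative_density(sorted_values, edges):
    """Empirical cumulative density D at each edge, restricted to the interval T."""
    start = bisect.bisect_left(sorted_values, edges[0])
    counts = [bisect.bisect_right(sorted_values, e) - start for e in edges]
    if counts[-1] == 0:
        raise ValueError("no test results inside the truncation interval")
    return [c / counts[-1] for c in counts]


def ks_distance(density, edges, mu, sigma):
    """Largest gap between D and a normal CDF truncated to the same interval."""
    dist = NormalDist(mu, sigma)
    f_low = dist.cdf(edges[0])
    mass = dist.cdf(edges[-1]) - f_low
    if mass <= 0.0:
        return 1.0
    return max(abs((dist.cdf(e) - f_low) / mass - d) for e, d in zip(edges, density))


def fit_in_truncation_interval(values, low, high, mu_grid, sigma_grid, n_bins=35):
    """Grid search for the (KS, mu, sigma) triple with the smallest KS inside [low, high]."""
    edges = [low + (high - low) * i / n_bins for i in range(n_bins + 1)]
    density = cumulative_density(sorted(values), edges)
    return min((ks_distance(density, edges, mu, s), mu, s)
               for mu in mu_grid for s in sigma_grid)


# Simulated mixed dataset: physiological results plus a lower "anemic" population.
rng = random.Random(1)
mixed = [rng.gauss(14.0, 1.0) for _ in range(4000)]
mixed += [rng.gauss(10.0, 1.5) for _ in range(800)]

mu_grid = [12.0 + 0.05 * i for i in range(81)]
sigma_grid = [0.5 + 0.05 * i for i in range(31)]
ks, mu, sigma = fit_in_truncation_interval(mixed, 13.0, 16.5, mu_grid, sigma_grid)

z = NormalDist().inv_cdf(0.975)
estimated = (mu - z * sigma, mu + z * sigma)
naive = statistics.quantiles(mixed, n=40, method="inclusive")
print(f"estimated: {estimated[0]:.1f} to {estimated[1]:.1f}")
print(f"naive percentiles of all results: {naive[0]:.1f} to {naive[-1]:.1f}")
```

The truncation interval in this sketch was chosen so that it contains almost no abnormal results. Identifying such an interval automatically is the hard part that the published algorithms address.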

Advantages of indirect methods in pediatric laboratory medicine

While indirect methods have not been specifically developed to create pediatric reference intervals, the long-standing gaps in pediatric reference intervals together with the unique and extraordinary ethical challenges of drawing blood from healthy children have led to their extensive use to create pediatric reference intervals. The availability of large clinical and laboratory databases enables the application of both metadata-driven indirect methods and statistical approaches, and often both are combined. Importantly, this enables the creation of reference intervals specifically for those very young age groups where conventional methods are most limited, i.e. neonates and young infants, as these children are overrepresented in laboratory databases due to their major contribution to pediatric morbidity. Additionally, the extensive availability of samples enables much more fine-grained stratification by age, which better represents the continuous dynamics of many analytes during physiological pediatric development. The most precise representation of change with age in many biomarkers can be achieved using continuous reference intervals and percentile charts, which, although not specific to indirect methods and first created using direct methods, have been considerably advanced by indirect methods and the resulting extensive availability of data points [4, 26].

Applications of indirect methods to create pediatric reference intervals

Indirect methods have been extensively used to establish pediatric reference intervals:

Loh and Metz have established continuous reference intervals (represented using percentile charts) for children from birth to 18 years in 22 biochemistry analytes [27, 28]. To this end, they used a metadata-driven approach and included only test results from primary care providers and from children in whom only one test result was available during the one-year study period, under the assumption that children who visited a primary care provider and in whom no retesting was performed were most likely healthy. The statistical methods subsequently used were direct statistical methods typically used for creating percentile charts (the LMS method by Cole and Green [29]).
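For readers unfamiliar with the LMS method, the following sketch shows how a centile is obtained once the three age-specific parameters are known: L (the Box-Cox power), M (the median), and S (the coefficient of variation). The centile is M·(1 + L·S·z)^(1/L) for L ≠ 0 and M·exp(S·z) for L = 0, where z is the standard normal quantile. The parameter values below are invented for illustration and are not taken from [27, 28].

```python
import math
from statistics import NormalDist


def lms_centile(l, m, s, centile):
    """Value at the given centile (0-100) for LMS parameters L, M and S."""
    z = NormalDist().inv_cdf(centile / 100.0)
    if abs(l) < 1e-12:
        return m * math.exp(s * z)
    return m * (1.0 + l * s * z) ** (1.0 / l)


# Hypothetical parameters for one age: L = 1 (no skewness), M = 50, S = 0.1.
for centile in (2.5, 50.0, 97.5):
    print(centile, round(lms_centile(1.0, 50.0, 0.1, centile), 1))
```

In a percentile chart, L, M and S are smooth functions of age, so evaluating this formula along the age axis traces each centile curve.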

Christensen et al. have established a variety of neonatal and young infants’ reference intervals that stratify for gestational age, an essential pediatric covariate [30], [31], [32], [33]. To exclude abnormal samples, they applied elaborate exclusion criteria (e.g. exclusion of samples for red cell reference intervals if children received red cell transfusions, had a diagnosis of anemia, or if the mothers had a diagnosis with a high probability of neonatal anemia). Additionally, they reported a narrower range (5th and 95th percentiles instead of 2.5th and 97.5th percentiles) for their reference intervals.

Statistical indirect methods have for example been used by Ahmed et al. to create pediatric reference intervals for alkaline phosphatase and creatinine concentration in Pakistani children using the TML approach [34, 35]. Similarly, Chung has established pediatric reference intervals for ionized calcium using kosmic and Bhattacharya analysis [36].

In the PEDREF study (Next-generation pediatric reference intervals, https://www.pedref.org/), we have used a combination of metadata-driven criteria and statistical approaches to create pediatric reference intervals. Our first analyses included single-center data from the University Hospital Erlangen only, and we have continuously expanded the study with data from additional centers from across Germany, with the current dataset containing >20,000,000 test results from >1,000,000 children from 15 centers. We have used both the TML method and kosmic to identify the proportion of physiological test results, and employed retesting frequency as a surrogate for children’s health status. In our first publication, we showed that the exclusion of patients from intensive care units and hemato-oncological wards is not necessary in children >3 months, as the indirect method used (TML) correctly identifies the physiological sample distribution even in the presence of samples from intensive care patients and oncological patients [37]. In later publications, we removed all test results from children with multiple measurements (or repeat measurements within 50 days in children aged <100 days) to reduce the fraction of abnormal test results in the input dataset [12]. However, this filtering technique is highly unspecific and results in unnecessary exclusion of samples. More fine-grained filtering, e.g. by diagnoses and performed procedures obtained from the electronic medical record (EMR), would allow both more sensitive and more specific sample exclusion, and therefore increase the number of test results available for analysis while reducing the proportion of pathological test results. However, multi-center analyses of EMRs (or any other datasets that contain extensive metadata, like diagnoses and performed procedures) are restricted in Germany and most other jurisdictions, because the transfer of such high-dimensional datasets is incompatible with privacy regulations due to an unacceptable re-identification risk even if directly identifying information is removed. This also severely restricts the applicability of multi-center data mining approaches to premature neonates, a subpopulation of particular pediatric interest, as information regarding gestational age and birthweight is not available.
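The retesting filter described above can be sketched as follows. This is a simplified version that ignores the age-dependent time window and keeps only results from children with exactly one measurement. The tuple layout and the example values are assumptions.

```python
from collections import Counter


def drop_retested_patients(results):
    """Keep only results from patients with a single measurement.

    results: list of (patient_id, value) tuples for one analyte.
    """
    counts = Counter(patient_id for patient_id, _ in results)
    return [(pid, value) for pid, value in results if counts[pid] == 1]


results = [
    ("p1", 0.41),
    ("p2", 0.95), ("p2", 1.30),   # retested, results rising
    ("p3", 0.38),
    ("p4", 0.44), ("p4", 0.43),   # retested, results stable
]
kept = drop_retested_patients(results)
print(kept)
```

Patient p4 in this example illustrates the lack of specificity: both results are plausibly physiological, yet they are discarded solely because a second sample was taken.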

Methods to create continuous reference intervals or percentile charts using data mining

When continuous reference intervals or percentile charts are created using statistical indirect methods, the most common approach is a sequential procedure [11, 12, 37], [38], [39] (see Figure 2): First, the input dataset is split into discrete age groups of varying sample size and age ranges. Second, an indirect method (e.g. TML or kosmic) is applied to each group, resulting in age-specific reference intervals. In a third step, these reference intervals are merged using statistical methods (e.g. using cubic smoothing splines or fractional polynomials) to yield a single representation for each reference limit. While this approach is straightforward, it has a major limitation: because reference intervals are estimated independently for each age group, the information available from the input dataset is not used as efficiently as possible. Improved approaches should account for the fact that the distribution of physiological samples changes continuously with age, which would reduce the necessary sample size and increase the precision of the resulting reference intervals. While such statistical approaches are firmly established for direct methods (e.g. Generalized Additive Models for Location, Scale and Shape [40]), corresponding indirect methods are not yet ready for clinical application. Hepp et al. have developed the first integrated indirect reference interval method for a continuous covariate; however, limitations in the model’s assumptions (a Gaussian distribution of both physiological and pathological test results, and a constant ratio of abnormal test results and physiological test results across all ages) restrict its applicability to real-world data [41]. Despite its current restrictions, the availability of this algorithm is a major step forward to an indirect algorithm that incorporates change with age at its core. Importantly, the source code of this algorithm’s implementation, as well as that of most other current indirect methods (e.g. kosmic, refineR, TMC, TML), is freely available under an open-source license, enabling further development and improvement by the scientific community.
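The three steps of the sequential procedure can be sketched as follows. Two stand-ins are used and should not be mistaken for the published methods: plain percentiles replace the indirect method in step two, and linear interpolation between age-group midpoints replaces splines or fractional polynomials in step three. The simulated analyte, the age-group edges, and all function names are assumptions.

```python
import random
import statistics
from bisect import bisect_right


def estimate_limits(values):
    """Stand-in for an indirect method such as TML or kosmic: plain percentiles."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def age_specific_limits(samples, group_edges):
    """Steps 1 and 2: split by age group, then estimate limits per group."""
    rows = []
    for start, end in zip(group_edges, group_edges[1:]):
        values = [value for age, value in samples if start <= age < end]
        lower, upper = estimate_limits(values)
        rows.append(((start + end) / 2, lower, upper))
    return rows


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation, held constant outside the range of xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def continuous_limits(rows, age):
    """Step 3 (stand-in): merge the age-specific limits into continuous curves."""
    ages = [row[0] for row in rows]
    lower = interpolate(ages, [row[1] for row in rows], age)
    upper = interpolate(ages, [row[2] for row in rows], age)
    return lower, upper


# Simulated analyte that rises linearly with age (values in arbitrary units).
rng = random.Random(3)
samples = []
for _ in range(6000):
    age = rng.uniform(0.0, 18.0)
    samples.append((age, rng.gauss(0.3 + 0.03 * age, 0.05)))

rows = age_specific_limits(samples, [0, 1, 2, 4, 6, 9, 12, 15, 18])
lower_10, upper_10 = continuous_limits(rows, 10.0)
print(f"limits at 10 years: {lower_10:.2f} to {upper_10:.2f}")
```

Each age group is estimated in isolation in this sketch, which makes the limitation discussed above visible: a group with few samples yields noisy limits even if its neighbours are well populated.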

Figure 2:

Approach to create continuous reference intervals and percentile charts in the PEDREF study [12].

Test results (example for girls’ creatinine values) are retrieved from laboratory information systems (A), and the distribution of physiological samples is identified using kosmic (see Figure 1) for each age (B). The determined age-specific distributions are subsequently merged to create continuous percentile charts from birth to 18 years (C).

Summary and outlook

The availability of advanced indirect statistical methods and large clinical and laboratory databases has enabled extensive data mining of pediatric reference intervals. Greater proliferation and ongoing standardization of electronic medical records will increase the quality and quantity of available metadata to incorporate into data mining approaches. As an example, the German Medical Informatics Initiative (MII) is providing a standardized core data set of laboratory test results, demographic data, diagnoses, and performed procedures of all patients treated at German university hospitals, with an infrastructure to access this dataset while respecting privacy regulations [42, 43]. A current aim of the PEDREF study is to use this data to further improve pediatric reference intervals, and to more precisely stratify for pediatric covariates (e.g. gestational age).

Conclusions

Data mining approaches complement and extend conventional reference interval approaches, especially in pediatrics, where ethical and practical considerations most severely limit the applicability of direct approaches. This improves the availability and quality of reference intervals, with more exact stratification for age and the establishment of population-specific reference intervals in global settings, where conventional population-based methods cannot be used due to resource constraints. Ultimately, this enables us to tackle the particular challenges of creating pediatric reference intervals, which prevent the optimal use of laboratory test results in children and adolescents.

Corresponding author: Jakob Zierk, Department of Pediatrics and Adolescent Medicine, University Hospital Erlangen, Erlangen, Germany, E-mail: jakob.zierk@uk-erlangen.de

Research funding: None declared.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: Authors state no conflict of interest.
Informed consent: Not applicable.
Ethical approval: Not applicable.

References

1. Rohr, U-P, Binder, C, Dieterle, T, Giusti, F, Messina, CGM, Toerien, E, et al. The value of in vitro diagnostic testing in medical practice: a status report. PLoS One 2016;11:e0149856. https://doi.org/10.1371/journal.pone.0149856.

2. Horowitz, GL. The power of asterisks. Clin Chem 2015;61:1009–11. https://doi.org/10.1373/clinchem.2015.243048.

3. Jones, G, Barker, A. Reference intervals. Clin Biochem Rev 2008;29:S93–7.

4. Haeckel, R, Wosniok, W, Arzideh, F, Zierk, J, Gurr, E, Streichert, T. Critical comments to a recent EFLM recommendation for the review of reference intervals. Clin Chem Lab Med 2017;55:341–7. https://doi.org/10.1515/cclm-2016-1112.

5. Higgins, V, Asgari, S, Adeli, K. Choosing the best statistical method for reference interval estimation. Clin Biochem 2019;71:14–6. https://doi.org/10.1016/j.clinbiochem.2019.06.006.

6. CLSI. Defining, establishing, and verifying reference intervals in the clinical laboratory; approved guideline, 3rd ed. Wayne, PA: Clinical and Laboratory Standards Institute; 2008, Report No.: CLSI document C28-A3.

7. Ceriotti, F. Establishing pediatric reference intervals: a challenging task. Clin Chem 2012;58:808–10. https://doi.org/10.1373/clinchem.2012.183483.

8. Adeli, K. Closing the gaps in pediatric reference intervals: an update on the CALIPER project. Clin Biochem 2014;47:737–9. https://doi.org/10.1016/j.clinbiochem.2014.05.037.

9. Adeli, K, Higgins, V, Trajcevski, K, Habeeb, NW-A. The Canadian laboratory initiative on pediatric reference intervals: a CALIPER white paper. Crit Rev Clin Lab Sci 2017;54:358–413. https://doi.org/10.1080/10408363.2017.1379945.

10. Hoq, M, Matthews, S, Karlaftis, V, Burgess, J, Cowley, J, Donath, S, et al. Reference values for 30 common biochemistry analytes across five different analyzers in neonates and children 30 days to 18 years of age. Clin Chem 2019;65:1317–26. https://doi.org/10.1373/clinchem.2019.306431.

11. Zierk, J, Hirschmann, J, Toddenroth, D, Arzideh, F, Haeckel, R, Bertram, A, et al. Next-generation reference intervals for pediatric hematology. Clin Chem Lab Med 2019;57:1595–607. https://doi.org/10.1515/cclm-2018-1236.

12. Zierk, J, Baum, H, Bertram, A, Boeker, M, Buchwald, A, Cario, H, et al. High-resolution pediatric reference intervals for 15 biochemical analytes described using fractional polynomials. Clin Chem Lab Med 2021;59:1267–78. https://doi.org/10.1515/cclm-2020-1371.

13. Hickman, PE, Koerbin, G, Potter, JM, Glasgow, N, Cavanaugh, JA, Abhayaratna, WP, et al. Choice of statistical tools for outlier removal causes substantial changes in analyte reference intervals in healthy populations. Clin Chem 2020;66:1558–61. https://doi.org/10.1093/clinchem/hvaa208.

14. Adeli, K, Raizman, JE, Chen, Y, Higgins, V, Nieuwesteeg, M, Abdelhaleem, M, et al. Complex biological profile of hematologic markers across pediatric, adult, and geriatric ages: establishment of robust pediatric and adult reference intervals on the basis of the Canadian health measures survey. Clin Chem 2015;61:1075–86. https://doi.org/10.1373/clinchem.2015.240531.

15. Farrell, C-JL, Nguyen, L. Indirect reference intervals: harnessing the power of stored laboratory data. Clin Biochem Rev 2019;40:99–111. https://doi.org/10.33176/AACB-19-00022.

16. Haeckel, R, Wosniok, W, Streichert, T. Review of potentials and limitations of indirect approaches for estimating reference limits/intervals of quantitative procedures in laboratory medicine. J Lab Med 2021;45:35–53. https://doi.org/10.1515/labmed-2020-0131.

17. Obstfeld, AE, Patel, K, Boyd, JC, Drees, J, Holmes, DT, Ioannidis, JPA, et al. Data mining approaches to reference interval studies. Clin Chem 2021;67:1175–81. https://doi.org/10.1093/clinchem/hvab137.

18. Hoffmann, RG. Statistics in the practice of medicine. J Am Med Assoc 1963;185:864–73. https://doi.org/10.1001/jama.1963.03060110068020.

19. Bhattacharya, CG. A simple method of resolution of a distribution into Gaussian components. Biometrics 1967;23:115–35. https://doi.org/10.2307/2528285.

20. Arzideh, F, Wosniok, W, Haeckel, R. Indirect reference intervals of plasma and serum thyrotropin (TSH) concentrations from intra-laboratory data bases from several German and Italian medical centres. Clin Chem Lab Med 2011;49:659–64. https://doi.org/10.1515/CCLM.2011.114.

21. Arzideh, F, Wosniok, W, Haeckel, R. Reference limits of plasma and serum creatinine concentrations from intra-laboratory data bases of several German and Italian medical centres: comparison between direct and indirect procedures. Clin Chim Acta 2010;411:215–21. https://doi.org/10.1016/j.cca.2009.11.006.

22. Arzideh, F, Brandhorst, G, Gurr, E, Hinsch, W, Hoff, T, Roggenbuck, L, et al. An improved indirect approach for determining reference limits from intra-laboratory data bases exemplified by concentrations of electrolytes. J Lab Med 2009;33:52–66. https://doi.org/10.1515/jlm.2009.015.

23. Wosniok, W, Haeckel, R. A new indirect estimation of reference intervals: truncated minimum chi-square (TMC) approach. Clin Chem Lab Med 2019;57:1933–47. https://doi.org/10.1515/cclm-2018-1341.

24. Zierk, J, Arzideh, F, Kapsner, LA, Prokosch, H-U, Metzler, M, Rauh, M. Reference interval estimation from mixed distributions using truncation points and the Kolmogorov-Smirnov distance (kosmic). Sci Rep 2020;10:1704. https://doi.org/10.1038/s41598-020-58749-2.

25. Ammer, T, Schützenmeister, A, Prokosch, H-U, Rauh, M, Rank, CM, Zierk, J. refineR: a novel algorithm for reference interval estimation from real-world data. Sci Rep 2021;11:16023. https://doi.org/10.1038/s41598-021-95301-2.

26. Higgins, V, Adeli, K. Advances in pediatric reference intervals: from discrete to continuous. J Lab Precis Med 2018;3:3. https://doi.org/10.21037/jlpm.2018.01.02.

27. Loh, TP, Antoniou, G, Baghurst, P, Metz, MP. Development of paediatric biochemistry centile charts as a complement to laboratory reference intervals. Pathology 2014;46:336–43. https://doi.org/10.1097/pat.0000000000000118.

28. Loh, TP, Metz, MP. Trends and physiology of common serum biochemistries in children aged 0–18 years. Pathology 2015;47:452–61. https://doi.org/10.1097/pat.0000000000000274.

29. Cole, TJ, Green, PJ. Smoothing reference centile curves: the LMS method and penalized likelihood. Stat Med 1992;11:1305–19. https://doi.org/10.1002/sim.4780111005.

30. Christensen, RD, Henry, E, Jopling, J, Wiedmeier, SE. The CBC: reference ranges for neonates. Semin Perinatol 2009;33:3–11. https://doi.org/10.1053/j.semperi.2008.10.010.

31. Christensen, RD, Del Vecchio, A, Henry, E. Expected erythrocyte, platelet and neutrophil values for term and preterm neonates. J Matern Fetal Med 2012;25:77–9. https://doi.org/10.3109/14767058.2012.715472.

32. Christensen, RD, Jopling, J, Henry, E, Wiedmeier, SE. The erythrocyte indices of neonates, defined using data from over 12,000 patients in a multihospital health care system. J Perinatol 2008;28:24–8. https://doi.org/10.1038/sj.jp.7211852.

33. Christensen, RD, Yaish, HM, Henry, E, Bennett, ST. Red blood cell distribution width: reference intervals for neonates. J Matern Fetal Med 2015;28:883–8. https://doi.org/10.3109/14767058.2014.938044.

34. Ahmed, S, Zierk, J, Khan, AH. Establishment of reference intervals for alkaline phosphatase in Pakistani children using a data mining approach. Lab Med 2020;51:484–90. https://doi.org/10.1093/labmed/lmz096.

35. Ahmed, S, Zierk, J, Siddiqui, I, Khan, AH. Indirect determination of serum creatinine reference intervals in a Pakistani pediatric population using big data analytics. World J Clin Pediatr 2021;10:72–8. https://doi.org/10.5409/wjcp.v10.i4.72.

36. Chung, JZY. Paediatric reference intervals for ionised calcium – a data mining approach. Clin Chem Lab Med 2021;59:e271–3. https://doi.org/10.1515/cclm-2021-0006.

37. Zierk, J, Arzideh, F, Haeckel, R, Rascher, W, Rauh, M, Metzler, M. Indirect determination of pediatric blood count reference intervals. Clin Chem Lab Med 2013;51:863–72. https://doi.org/10.1515/cclm-2012-0684.

38. Zierk, J, Arzideh, F, Rechenauer, T, Haeckel, R, Rascher, W, Metzler, M, et al. Age- and sex-specific dynamics in 22 hematologic and biochemical analytes from birth to adolescence. Clin Chem 2015;61:964–73. https://doi.org/10.1373/clinchem.2015.239731.

39. Zierk, J, Arzideh, F, Haeckel, R, Cario, H, Frühwald, MC, Groß, H-J, et al. Pediatric reference intervals for alkaline phosphatase. Clin Chem Lab Med 2017;55:102–10. https://doi.org/10.1515/cclm-2016-0318.

40. Rigby, RA, Stasinopoulos, DM. Generalized additive models for location, scale and shape, (with discussion). J Roy Stat Soc C Appl Stat 2005;54:507–54. https://doi.org/10.1111/j.1467-9876.2005.00510.x.

41. Hepp, T, Zierk, J, Rauh, M, Metzler, M, Mayr, A. Latent class distributional regression for the estimation of non-linear reference limits from contaminated data sources. BMC Bioinf 2020;21:524. https://doi.org/10.1186/s12859-020-03853-3.

42. Semler, SC, Wissing, F, Heyder, R. German medical informatics initiative. Methods Inf Med 2018;57:e50–6. https://doi.org/10.3414/me18-03-0003.

43. Gehring, S, Eulenfeld, R. German medical Informatics initiative: unlocking data for research and health care. Methods Inf Med 2018;57:e46–9. https://doi.org/10.3414/me18-13-0001.

Received: 2021-09-05

Accepted: 2021-09-28

Published Online: 2021-10-13

Published in Print: 2021-12-20

This work is licensed under the Creative Commons Attribution 4.0 International License.
