Introduction

The large-scale collection and storage of health data continues to transform healthcare delivery. To improve patient care, strategies that allow for the extraction of information and insights from clinical data are paramount. Computerized clinical decision support (CDS) systems provide a way to translate research findings into real-time interventions, promote quality improvement, and decrease variation in care.1 CDS may be coupled with machine learning (ML) approaches, which use algorithms that leverage statistical methods to learn useful patterns from data. The resulting artificial intelligence (AI)-based CDS (AI-CDS) may allow for improved predictive performance in identifying patients with a higher or lower likelihood of developing a disease, experiencing clinical deterioration, or benefiting from a particular management strategy. AI-CDS represents an exciting opportunity to further improve care delivery. In this narrative review, we summarize key concepts related to AI-CDS: the principles and current state of CDS and AI, the role of AI-CDS in pediatric care, key concepts related to the development and implementation of AI-CDS, the challenges associated with AI-CDS, and future steps in the field.

“Big data” in healthcare

While some definitions of “big data” refer to the size of the dataset (e.g., as too large to be stored or analyzed by conventional software solutions), this definition is complicated by the ever-growing capacity of computer systems to work with larger datasets.2 Another definition by Gartner describes “big data” as “high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making, and process automation”.3 While “volume” refers to the large size of the dataset, “velocity” refers to the speed at which data are generated and must be processed. “Variety” refers to the ability to use heterogeneous data, which in a healthcare context may include medical recordkeeping, billing, genomics, social determinants of health, and patient-reported outcomes. There are increasing efforts to leverage big data for purposes beyond recordkeeping to improve the delivery of patient care.4 AI, defined as a field of study that focuses on how computers learn from data and the development of algorithms that make this learning possible, provides one important way to utilize these technologies to improve practice.5

Clinical decision support

CDS may be defined as “computer systems designed to impact clinician decision making about individual patients at the point in time that these decisions are made.”6 CDS generally encompasses three steps: (1) acquiring patient data, (2) summarizing data, and (3) suggesting an appropriate course of action.7 CDS may include alerts, reminders, order sets, drug-dose calculations, care summary dashboards, and point-of-care information retrieval systems.8 Within pediatrics, CDS has been used in a variety of applications, including traumatic brain injury (TBI),9,10 asthma,11 urinary tract infections (UTIs),12 screening for developmental disorders,13 ventilatory support,14,15 and antibiotic selection.16

A systematic review evaluating the impact of CDS tools from 148 randomized controlled trials (primarily in adults) demonstrated that CDS improved outcomes when used for performing preventive services (odds ratio [OR], 1.42; 95% CI 1.27–1.58), ordering clinical studies (OR, 1.72; 95% CI, 1.47–2.00), and prescribing therapies (OR, 1.57; 95% CI 1.35–1.82) compared to care provided without CDS.8 Few studies, however, have evaluated balancing measures, including unintended consequences or adverse effects such as false negatives or increased physician workload.17 In pediatrics, traditional rule-based CDS is routinely used to improve patient care but is frequently limited by poor model specificity, often resulting in false positive alerts.18 These can cause physician dissatisfaction, contribute to burnout, and lead to patient harm. One study of drug alerts in the primary care setting demonstrated a decline in the utilization of an alert-based CDS as additional reminders accumulated for the same patient.19 In turn, ignored alerts from a CDS can result in adverse consequences. In one report from a pediatric intensive care unit, clinicians repeatedly overrode alerts in an electronic health record (EHR) drug allergy alerting system, resulting in the deterioration of a patient over time and emphasizing the potentially harmful effects that can arise from alarm fatigue.20

AI and predictive modeling

The use of AI may overcome some limitations attributed to traditional CDS. If CDS is perceived as ineffective or intrusive, users may ultimately view it as a burden to effective patient care.21,22 In addition, CDS may result in inaccurate or poorly individualized recommendations. Prior qualitative work, for example, has cited important concerns with CDS, including inaccuracy of predictions leading to false positive and false negative results.23 AI-CDS represents an important evolution in the development of CDS models.

AI-CDS systems are sometimes called “non-knowledge-based” CDS, as they differ from the “if/then” rules that define rule-based (or “knowledge-based”) CDS because their predictions are based on statistical or ML algorithms.24 ML, considered a domain of AI, is a set of methods for inferring useful relationships in large datasets for which the assumptions and techniques of traditional statistics may be poorly suited. While the validity of traditional statistics comes from attempting to put conservative error bounds on inference, the validity of ML is generally defined with respect to out-of-sample predictive usefulness. ML has origins in statistics but evolved along an independent trajectory as computing power rapidly advanced, fostering the development of computational methods for the analysis of increasingly large datasets. Instead of relying on traditional “hard-coded” algorithms or prediction rules, ML algorithms improve experientially through training, often entailing dimension reduction steps that make these methods particularly well suited to high-dimensional data.25,26 Supervised ML is generally used for predictive modeling (e.g., the likelihood of an event happening). In contrast, unsupervised ML is generally used to find natural groupings or clusters in the data (e.g., phenotypes of a disease or syndrome).5

In supervised ML, models undergo a period of training in which the ML algorithm is provided with input data in addition to a predefined outcome measure (such as the presence of a disease) to identify relationships between the outcome of interest and the features (or predictors) (Fig. 1).27 Outcome data may be categorical (a “classification” task), such as in-hospital mortality or intensive care unit admission. Alternatively, a “regression” task predicts a continuous outcome (such as hospital length of stay). Hundreds of algorithms for supervised ML have been described. Common methods used in the context of classification include logistic regression, k-nearest neighbors, naïve Bayes, decision trees, random forests, neural networks, and support vector machines. Other algorithms, such as ensemble methods, combine the strengths of multiple base learners. Details of specific ML algorithms are summarized elsewhere.5

Fig. 1: Development of a supervised machine learning algorithm.

Datasets frequently require cleaning and/or preprocessing (such as “one-hot” encoding of categorical variables). Initial analyses are then performed to identify the variables most strongly associated with the study outcome. A portion of the data may be set aside as a holdout cohort for internal validation, with the remainder used for model training. Following internal validation, the model may be tested in distinct datasets.
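To make the workflow in Fig. 1 concrete, the following minimal sketch (Python with scikit-learn) illustrates one-hot encoding of categorical predictors, a holdout split, model training, and internal validation. The dataset, file path, and column names are hypothetical and for illustration only; they do not correspond to any published model.

```python
# Minimal sketch of the supervised ML workflow in Fig. 1 (hypothetical data and columns).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("cohort.csv")           # hypothetical tabular dataset
y = df["outcome"]                         # e.g., presence of disease (0/1)
X = df.drop(columns=["outcome"])

categorical = ["sex", "triage_category"]          # illustrative categorical features
numeric = ["age_months", "heart_rate", "wbc"]     # illustrative numeric features

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),  # "one-hot" encoding
    ("num", StandardScaler(), numeric),
])

model = Pipeline([("prep", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

# Hold out a portion of the data for internal validation; train on the remainder.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model.fit(X_train, y_train)
auroc = roc_auc_score(y_hold, model.predict_proba(X_hold)[:, 1])
print(f"Holdout AUROC: {auroc:.2f}")
```

In practice, additional steps such as handling of missing data, calibration assessment, and external validation in a distinct dataset would follow this basic pipeline.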

To date, most predictive models in children have been derived using more classical ML algorithms. For example, models to identify patients at risk of UTIs,12 pneumonia,28 bacterial meningitis,29 clinically important TBI,9 serious bacterial infections,30 septic arthritis,31 or intra-abdominal injury32 have been derived using readily understandable logistic regression or decision trees. Such approaches offer familiarity and interpretability to clinicians by design and have a track record of relatively robust predictive performance in some settings. When models perform similarly, the simpler, more easily applied, and more transparent model should be preferred over the more complicated one.

Within the pediatric acute care setting, ML models have been reported for a variety of indications, some of which demonstrate superior potential compared to classical approaches. Some examples are provided below:

1. Prediction of children with clinically important TBI: investigators used optimal classification trees to identify children at risk of clinically important TBI and compared their model to a frequently utilized model developed through classification and regression trees (CART).33 The optimal classification tree models demonstrated comparable sensitivity to the originally published CART model, but with higher specificity, potentially allowing for a reduction in unnecessary CT scans.

2. Emergency department testing: Singh et al. reported an ML model to predict the need for frequently performed clinical testing (urinary dipstick testing, electrocardiogram, abdominal ultrasonography, testicular ultrasonography, bilirubin level testing, and forearm radiographs) among children presenting to a pediatric emergency department.34 Using an outcome of testing ordered in triage, the model demonstrated a high area under the receiver operating characteristic curve (AUROC) (0.89–0.99 across individual use cases, including urinary dipstick testing and electrocardiograms) and a high positive predictive value (0.77–0.94), with results available by a mean time of 165 min. The investigators further characterized model explainability using SHapley Additive exPlanations (SHAP) to ensure the clinical relevance of the contributing features.

3. Sepsis prediction: given its diagnostic complexity, the presence of multiple overlapping diagnoses, and the interaction of variables with one another, the early identification of pediatric sepsis may be particularly well suited to an ML application to guide CDS. Several recent studies have demonstrated that ML methods may accurately identify children with sepsis.35,36,37 One recent multicenter study, for example, trained a model to identify children with septic shock (defined as systolic hypotension and vasoactive use or ≥30 ml/kg of crystalloid administration within 24 h) using data limited to the first 2 h of presentation and demonstrated an AUROC of 0.83–0.85 across the two included validation datasets.37 This study, which utilized the least absolute shrinkage and selection operator with 10-fold cross-validation, demonstrated a specificity of 62% when using a pre-selected sensitivity of 90% (a minimal illustrative sketch of this type of threshold selection is shown after this list). The model was trained on demographic data, chronic conditions, prior visit data, emergency department visit data, clinical data (e.g., vital signs), and laboratory data.

4. Prediction of children with significant intra-abdominal injury: several investigators have developed ML models to identify children presenting following abdominal trauma who are at risk of intra-abdominal injury requiring intervention, including injuries that resulted in death, therapeutic angiography or laparotomy, blood transfusion, or hospital admission for ≥2 nights to receive intravenous fluids.38,39 In one study, the random forest, support vector machine, and generalized linear modeling algorithms demonstrated high predictive performance in identifying patients at low risk of this outcome, potentially resulting in a decreased requirement for imaging in this cohort.39
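As a minimal illustration of the approach described in the sepsis example above (and not the published model itself), the following sketch trains an L1-penalized (“lasso”-style) logistic regression with 10-fold cross-validation on synthetic data and then selects the probability threshold that achieves a pre-specified sensitivity of 90% on a validation set, reporting the corresponding specificity.

```python
# Illustrative sketch on synthetic data: L1-penalized logistic regression with 10-fold
# cross-validation, followed by selection of a decision threshold that achieves a
# pre-specified sensitivity of 90%.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve

X, y = make_classification(n_samples=5000, n_features=30, weights=[0.95, 0.05],
                           random_state=0)          # rare positive outcome
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegressionCV(Cs=10, cv=10, penalty="l1", solver="liblinear",
                           scoring="roc_auc").fit(X_tr, y_tr)

probs = clf.predict_proba(X_val)[:, 1]
fpr, tpr, thresholds = roc_curve(y_val, probs)

# First operating point (highest threshold) whose sensitivity (TPR) is at least 90%.
idx = np.argmax(tpr >= 0.90)
print(f"Threshold: {thresholds[idx]:.3f}, "
      f"sensitivity: {tpr[idx]:.2f}, specificity: {1 - fpr[idx]:.2f}")
```

Fixing sensitivity and reporting the resulting specificity mirrors how such models are often tuned when missed cases are considered more costly than false alarms.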

Other models have been described to broadly classify pediatric diagnoses,40 identify young infants with serious bacterial infections,41 and assist with ventilator support.42 However, few models in pediatric practice have been externally validated in the peer-reviewed literature.

Role of AI-CDS

The integration of computers to assist in medical decision making was first discussed as a possibility in 1959, when authors postulated that computers may be able to perform complex reasoning tasks, collect and process clinical information, and remind the physician of overlooked diagnoses.43 As described above, several studies constructing or evaluating AI models in children have been reported, suggesting that AI-CDS may have the potential to improve care. Despite this avid interest, less work has been done to externally validate these models, construct CDS tools from these AI models with relevant stakeholder groups, and evaluate their implementation.

A template for an AI-CDS model is provided in Fig. 2. Clinical data may be collected from both structured and unstructured sources. Structured data exist within predefined fields, such as vital signs, laboratory results, or diagnosis codes. Unstructured data are less organized and more subject to irregularities, such as the information contained within clinician notes and imaging reports. To assist with the interpretation of unstructured data from notes, natural language processing (NLP), a branch of AI concerned with the computational interpretation and production of human language, may allow for the translation of free text into results that can be used in risk prediction models.44 Neural network and deep learning models have also been used to automate the interpretation of unstructured imaging data, the results of which may also feed into a CDS model; such models have been used for pediatric pneumonia detection from chest radiographs.45,46

Fig. 2: Functioning of an artificial intelligence clinical decision support (AI-CDS) tool.

Electronic health data exist in a variety of formats, including structured (in discrete fields) and unstructured (such as in narrative notes). The machine learning algorithm may then be applied to these data. When a desired threshold of disease probability is reached, a best practice alert may be provided to the treatment team.

One recent study demonstrated a role for an NLP-based tool in the interpretation of chest radiographs for pneumonia. Using data from a single center, the authors fed NLP-extracted features into a random forest-based ML classifier. The model demonstrated high discriminative performance (AUROC of 0.95). The authors subsequently deployed the model in a CDS tool within the EHR, in which chest radiographs with a high probability of pneumonia triggered an interruptive best practice advisory.47
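The general pattern of coupling NLP-derived features with a tree-based classifier can be sketched as follows. This is a hedged illustration using simple TF-IDF bag-of-words features and placeholder report text, not the authors’ actual pipeline.

```python
# Illustrative sketch: free-text radiology reports converted to TF-IDF features feeding
# a random forest classifier. Report texts and labels below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

reports = [
    "focal consolidation in the right lower lobe consistent with pneumonia",
    "clear lungs, no focal consolidation or effusion",
    "patchy left lower lobe opacity, pneumonia not excluded",
    "no acute cardiopulmonary abnormality",
]
labels = [1, 0, 1, 0]   # 1 = report suggests pneumonia (hypothetical annotations)

nlp_model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), stop_words="english")),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
nlp_model.fit(reports, labels)

new_report = ["right middle lobe consolidation, findings compatible with pneumonia"]
print(nlp_model.predict_proba(new_report)[:, 1])  # probability that could trigger an alert
```

In deployment, the predicted probability for a new report would be compared against a pre-specified threshold to decide whether to surface a best practice advisory to the treatment team.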

Another recent implementation study evaluated the role of AI-CDS in the management of asthma in children in a randomized controlled study that included 184 participants. For one group, AI-CDS summarized relevant clinical information, provided a prediction of the risk of asthma exacerbations over the following year, and suggested asthma management plans; this group was compared to a group managed with usual care. Both groups showed similar declines in asthma exacerbations (12% for the intervention group and 15% for the control group). However, use of the AI-CDS resulted in significant decreases in time spent reviewing medical charts and time to follow-up, and reduced the cost of care compared to the usual care group.48

Development and implementation of AI-CDS

The process of AI-CDS development extends from initial model derivation and validation to implementation, study, and dissemination (Fig. 3). Model development is best performed by an interdisciplinary team of stakeholders, including clinicians and other potential end users, data scientists, clinical informaticians, and implementation scientists. For the development and reporting of prediction models, the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement provides a rubric for best-practice reporting approaches.49 In addition, the minimum information about clinical artificial intelligence modeling (MI-CLAIM) checklist provides considerations specific to AI.50 MI-CLAIM emphasizes the need to perform benchmarking, clearly define input data types, and describe data preprocessing steps (e.g., transformations).

Fig. 3: Steps involved in the development of artificial intelligence clinical decision support.

Stakeholders should be recruited early in the process to evaluate existing models, identify key priorities, and develop a machine learning model. Models should then be externally validated with specific consideration of balancing measures, including false positives and negatives and the performance of the model in minority and/or socioeconomically disadvantaged subgroups. Models may then be implemented within the electronic health record with subsequent evaluation. Models should be studied and compared to the standard of care and, if proven favorable, may then be disseminated.

In addition to evaluating a model’s overall performance, consideration should be given to other metrics, including those related to cost, resource utilization, and other balancing measures. Pediatric-specific considerations for balancing measures may include those in the Pediatric Patient Reported Outcomes Measurement Information System rubrics, including those pertaining to physical, mental, and social wellness.51 Others include those related to painful procedures (e.g., venipuncture), radiation exposure, unnecessary exposure to medications, and harm to the family.

After model development and validation, the next step is for the AI-based algorithm to be implemented in the form of a CDS tool. Models may further be scaled through full integration into the EHR using EHR-vendor-specific modules, via application programming interfaces built on standards like the Fast Healthcare Interoperability Resources (FHIR), or as standalone systems.52 When addressing implementation, interoperability standards within EHRs must be considered.53 Following local deployment, models require proactive and continued monitoring to evaluate changes in performance over time, retraining with additional targeted and augmented data, and identification of barriers and facilitators to clinician use.54 Continued vigilance is required after local implementation of a model to evaluate for other balancing effects and a decline in performance over time. CDS tools should have systems in place to report problems and provide feedback.55,56
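As one hedged example of standards-based integration, the sketch below retrieves recent heart-rate observations for a single patient from a FHIR R4 server so that they could be assembled into model inputs. The endpoint, patient identifier, and open (unauthenticated) access are hypothetical; a production deployment would require authorization (e.g., SMART on FHIR), error handling, and appropriate data governance.

```python
# Minimal sketch of retrieving structured data through a FHIR REST API so that model
# inputs can be assembled outside any single EHR vendor's tooling.
import requests

FHIR_BASE = "https://fhir.example-hospital.org/R4"   # hypothetical endpoint
patient_id = "12345"                                  # hypothetical patient

# Search for heart-rate Observations (LOINC 8867-4) for this patient, newest first.
resp = requests.get(
    f"{FHIR_BASE}/Observation",
    params={"patient": patient_id, "code": "http://loinc.org|8867-4", "_sort": "-date"},
    headers={"Accept": "application/fhir+json"},
    timeout=10,
)
bundle = resp.json()

heart_rates = [
    entry["resource"]["valueQuantity"]["value"]
    for entry in bundle.get("entry", [])
    if "valueQuantity" in entry["resource"]
]
print(heart_rates)   # most recent values could be passed to the AI-CDS model
```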

An AI-CDS that performs well locally may indicate the potential for generalizability. Traditional and emerging methods of clinical investigation can be used to evaluate whether the AI-CDS tool results in improved patient-centered outcomes across several centers. Randomized, controlled trials comparing the use of the AI-CDS to existing standards of care are essential to confirm that the costs of scaling, implementing, and maintaining a tool more broadly are warranted and that dissemination does not have unintended effects that might negatively impact patient care. Methods of pragmatic clinical investigation that meld quality improvement techniques, such as patient care unit- or institution-level process standardization, with randomization (e.g., clustered, stepped-wedge designs) or that use EHR-embedded trial elements (e.g., randomization at the point of care) should be considered to aid the overall efficiency of conducting trials in this space.57,58 An AI-CDS tool should only be disseminated after it has demonstrated generalizable value through rigorous clinical investigation.

A recent editorial published by Shortliffe on AI-CDS identified important priorities which must be considered during implementation.59 First, the reasoning behind the AI decision should be transparent so that the clinician can comprehend the rationale behind the decision. In other words, the “black box” algorithms of AI (in which a decision is made by an algorithm that lacks interpretability, such as deep learning models) are inadequate for clinical contexts where transparency and clinician trust in the models are paramount. Second, CDS should promote additional efficiency and blend seamlessly with the clinical environment. Third, CDS tools should be intuitively constructed and simple to use so that no major training is required for their use. Fourth, a CDS should reflect an understanding of the pertinent domain and answer clinically relevant questions. Fifth, advice should be offered in a way that recognizes the knowledge of the user to augment—but not replace—decision making. Finally, a CDS tool should be constructed on a rigorous, peer-reviewed scientific basis to establish its reliability and generalizability.

Challenges of AI-CDS in children

Though AI-CDS tools may have advantages compared to rule-based CDS in many situations, their use may result in additional challenges. One review of qualitative research on the use of AI in the clinical context, which included clinicians, consumers, healthcare executives, and industry professionals, reported positive perceptions about its use, though it also identified many reservations, including wariness about liability from AI-mediated errors, the need for training, reputational harm, worsening communication, privacy concerns, lack of proof regarding efficacy, and a lack of explainability.60

Some limitations in the development and implementation of AI-CDS carry particular importance for research in children. These include the need for large datasets, challenges relating to imbalanced data, challenges of generalizability, a lack of evidence-based care guidelines, maturational variation in children, and ethical issues.

1. Need for large, high-resolution datasets: high-resolution datasets, which contain demographic data combined with detailed clinical data (including relevant data from a patient’s past medical history, physical examination, vital signs, monitor data, laboratory results, and outcome data), are ideal for ML applications. From the standpoint of model development and prediction, an important limitation for pediatric research lies in the lack of large datasets. Children account for a smaller proportion of healthcare resource utilization,61 and datasets in children (whether derived from the EHR, prospective studies, or administrative data) are frequently smaller. For ML to be successful, granular data are required, yet such data are commonly unavailable in administrative datasets (defined as data collected for administrative or billing purposes)62 and the procurement of more detailed datasets can be expensive. When using smaller datasets, advanced ML methods may offer no advantage or only a slight advantage over classic linear models.63,64

2. Imbalanced datasets: pediatric AI research may be challenged by the presence of imbalanced datasets, a term used to describe when one outcome class (such as the absence of a disease state) occurs substantially more commonly than the other (the presence of the disease state). For example, only 0.1% of children presenting to the emergency department will ultimately be diagnosed with sepsis, making its prediction a challenge.65 A number of techniques have been described to handle imbalanced datasets. Among these, undersampling (in which the number of majority class instances is reduced), oversampling (in which the number of minority class instances is inflated using bootstrapping or a similar approach), or combinations of the two66 may overcome the limitations of imbalanced datasets and are particularly relevant in ML for pediatric applications. An additional challenge with imbalanced datasets is the appropriate measurement of results. Classic performance measures like the AUROC may be inappropriate in these cases, given that performance may be overestimated by a model that tends to over-predict the majority class. Alternative approaches, such as the area under the precision-recall curve (AUPRC), may be more appropriate in this context because the AUPRC reflects the prevalence of the outcome and represents the relationship between sensitivity and positive predictive value, which are arguably the most important performance measures for rare event prediction (a minimal illustrative sketch follows this list).67

3. Generalizability: models derived from one hospital system may generalize poorly to another. One recently reported example is the Epic Sepsis Model, a proprietary model to predict sepsis in adults that was developed across three health systems. In an external validation in a different hospital system, the model demonstrated a decline in performance, with an AUROC of 0.63 (compared to AUROCs of 0.76–0.83 as originally reported).68 In addition, challenges exist with the interoperability of definitions between EHR systems, which need to be reconciled prior to their use.

4. Lack of evidence-based guidelines and variation in care: CDS is most effective when the detection of disease can serve as a prompt to promote evidence-based care. However, a large evidence base does not currently exist for the optimal management of many common pediatric conditions. As such, the role of a CDS in providing recommendations may be affected by substantial variations in practice patterns, both between individual providers and across institutions.

5. Developmental variation in children: pediatric AI models must account for the physiologic changes and changes in disease risk that occur throughout childhood. Vital signs need to be adjusted for age.69 ML tools for developmental disorders similarly need to adjust for maturational changes.70 The interpretation of radiographs likewise must account for maturational changes in children, differing appearances of common pathologies, and broader differential diagnoses, all of which can create challenges for automated interpretation.71

6. Ethical issues: many important ethical challenges have been raised regarding the use of AI-CDS to augment the care of children. Efforts to demonstrate the benefit of AI-CDS among pediatric patients can be expected to encounter a set of challenges already faced by other clinical investigators aiming to conduct trials enrolling pediatric patients. Waived or deferred consent is relatively uncommon in pediatric clinical trials in the United States. This poses a potential challenge during regulatory reviews of the design of embedded, pragmatic studies that seek to evaluate the use of AI-CDS nested within usual care workflows. Other ethical issues surrounding the deployment of AI-CDS relate to the determination of acceptable performance standards for models, questions of whether “black box” tools should be relied on in high-stakes clinical care, the potential infringement of data protection rights of individuals, and whether the use of AI-CDS must be disclosed to patients and their caregivers.72,73

7. Disparities in care: some studies have suggested that socioeconomic and racial disparities in care may be reduced through the standardization of care provided by a CDS system.74,75 However, AI models have also been demonstrated to potentiate systemic racism in healthcare.76 For example, crisis standards of care during the COVID-19 pandemic have commonly relied on rubrics incorporating regression-based sequential organ failure assessment (SOFA) scores to guide the allocation of scarce critical care resources, and recently highlighted racial disparities in SOFA scores77,78,79 have cast substantial doubt on the use of SOFA-based CDS for patients with COVID-19. Similarly, the potential for inequity was noted in a popular web-based system for the identification of young infants at risk of UTI that used a dichotomized variable of race,12 prompting criticism that such a model may delay the identification of Black children with UTI.80

8. Medicolegal issues: the implementation of an AI-guided CDS to partially assist or automate decision making carries medicolegal implications, such as for patients who have a poor outcome due to inaccurate predictions or CDS recommendations. Others have raised concerns about patient privacy and the possibility that such systems may be vulnerable to hacking. In response to stakeholder feedback, the United States Food and Drug Administration (FDA) proposed a regulatory framework in 2021 describing a potential approach to the review of AI/ML-driven software modifications, in which AI software would be considered a “medical device”.81 In it, the FDA described priorities including the need for “best practice” model development, a patient-centered approach, consideration of biases, and continual monitoring of a software product throughout its post-market life to provide reasonable assurance of safety and effectiveness as a model continues to iteratively improve over time.

9. Limitations of CDS at large: many limitations associated with CDS system implementation also apply to AI-CDS. One study performed in the primary care setting, for example, noted that providers were likely to use prompts when they were provided as a means of support and choice, and were resistant when such prompts were perceived as a means of enforcement.82
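To illustrate two of the points raised above regarding imbalanced data (item 2), the following sketch uses synthetic data with a rare outcome to show simple random oversampling of the minority class and to contrast the AUROC with the AUPRC. It is illustrative only and not drawn from any published pediatric model.

```python
# Illustrative sketch (synthetic data): random oversampling of a rare outcome class and
# comparison of AUROC vs. AUPRC, whose baseline equals the outcome prevalence.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.utils import resample

X, y = make_classification(n_samples=20000, n_features=20, weights=[0.999, 0.001],
                           random_state=0)           # ~0.1% prevalence, as in ED sepsis
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling: bootstrap the minority class up to the majority class size.
X_min, y_min = X_tr[y_tr == 1], y_tr[y_tr == 1]
X_maj, y_maj = X_tr[y_tr == 0], y_tr[y_tr == 0]
X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                              n_samples=len(y_maj), random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([y_maj, y_min_up])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
probs = clf.predict_proba(X_te)[:, 1]

print(f"AUROC: {roc_auc_score(y_te, probs):.3f}")
print(f"AUPRC: {average_precision_score(y_te, probs):.3f}  "
      f"(baseline = prevalence = {y_te.mean():.4f})")
```

Because the AUPRC baseline equals the outcome prevalence, a seemingly high AUROC can coexist with a modest AUPRC, making the latter a more honest summary for rare-event prediction.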

Other proposed issues with AI-CDS include the impact on user skill (e.g., human skill deterioration over time), the need for continued maintenance, and challenges with data quality.24 Finally, AI-CDS systems are expensive to develop and maintain over time.83

Future steps

A well-validated and implemented AI-CDS tool may allow for improved care of patients at the bedside. However, for AI-CDS to benefit children, several critical steps are required. Situations that may benefit from improved predictive performance need to be identified. To construct predictive models, large datasets are needed. These may be generated via the construction of larger, federated datasets, defined as datasets that are mapped from subsidiary datasets interconnected across computer systems.84 These data may be derived either retrospectively (from hospital medical records) or through prospective trials for use in ML applications. These models need to be published in accordance with reporting guidelines and externally validated in distinct populations and settings. Importantly, more effort is required to ensure generalizability by assessing the performance of models on distinct patient cohorts.

Prior to and in conjunction with data collection and modeling efforts, continued engagement is needed with key stakeholders, including physicians and other healthcare providers, payors, computer scientists, and regulators to identify ways in which AI-CDS models may be most effectively implemented. Critically, patients and their families are a frequently overlooked stakeholder group, though they have an important voice in the conversation about AI-based technologies in healthcare. Two previous studies have suggested that parents may generally be receptive toward the use of AI,85 though there may be differences in trust by race and age.86 More data are needed to assess their baseline comfort with the use of AI in the development of healthcare predictions and recommendations: while the public at large appears to have a favorable perspective on the role of AI applications in healthcare,87,88,89,90,91,92 important concerns have emerged, such as a loss of humanism in medicine.93 Other studies have identified racial disparities with respect to the comfort in the role of AI in medicine.88 Little has been reported on the opinions of caregivers on the role of AI in medicine.

The role of AI in CDS continues to evolve. More work is needed to compare AI-CDS tools to conventional CDS or routine practice, for which little has been reported. There is growing interest in explainable AI, which is more transparent and allows the user to evaluate how predictions are constructed.94 Models that do not require periodic tuning and can instead improve over time when exposed to new data, an approach termed adaptive CDS, also represent an important opportunity for growth.95
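As a brief illustration of one widely used explainability approach, the following sketch applies SHAP (SHapley Additive exPlanations) to an already-trained tree-based classifier. The variable names (`model`, `X_val`) are placeholders, and the example assumes the open-source `shap` package is installed.

```python
# Minimal sketch of post-hoc explainability with SHAP. `model` is assumed to be a trained
# tree-based classifier (e.g., a random forest) and `X_val` its validation feature matrix.
import shap

explainer = shap.TreeExplainer(model)          # model-specific explainer for tree models
shap_values = explainer.shap_values(X_val)     # per-patient, per-feature contributions

# Global view: which features drive predictions across the validation cohort.
shap.summary_plot(shap_values, X_val)
```

Per-patient contribution values of this kind can also be surfaced alongside an alert so that clinicians can see which features drove an individual prediction.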

Conclusion

Applied in the appropriate clinical context, AI-CDS may provide an opportunity to seamlessly provide evidence-based and individualized care to children. Given the novelty of ML-based models in clinical practice, their use in any healthcare domain currently remains limited, including in the care of children. AI-CDS may be able to overcome some of the limitations frequently ascribed to CDS, though more research is required. For AI-CDS to positively impact the care of children, a diverse group of stakeholders is required to promote the development of validated tools from high-resolution datasets, with high predictive accuracy, that are transparent and easy to use, and that complement the existing knowledge base of the clinician.