Introduction

Clinical care moved away from traditional intuition-based approaches towards evidence-based medicine (EBM) [1]. EBM aspires to provide the best possible care for patients, supported by high-quality evidence. Nonetheless, high-quality evidence is rather an unclear term, and thus, objective measurements of its certainty have been introduced. The most used framework to address evidence certainty is the Grading of Recommendations Assessment, Development and Evaluation (GRADE). GRADE is crucial in translating scientific evidence into medical recommendations and is also a valuable tool to help clinicians better understand the gathered evidence [2].

High-quality randomized controlled trials (RCTs) represent the basis of the foundation of evidence [3]. Since the beginning of the twenty-first century, the publication rate of high-quality research concerning neonates has been declining [4]. Many factors contribute to this, but the most significant are the rising standards to define the quality of clinical research, which mainly evolved in adult medicine and are not easy to be reproduced in neonatology. In fact, the high certainty of evidence consists typically of several high quality and large randomized controlled trials, whose results are in consensus with each other. Nonetheless, this is uncommon in neonatal trials due to varying co-interventions and definitions for multifaceted and multifactorial outcomes (e.g., bronchopulmonary dysplasia) [2], This variation makes it difficult to combine results and estimate the effectiveness of interventions and the certainty of the evidence [2]. Being born extremely prematurely usually means that the newborn is in a critical condition and has many medical problems occurring simultaneously and influencing each other [5]. Therefore, it is challenging for a single intervention to significantly impact on an outcome, making it unfair to deem these interventions unsuitable in neonatology [5]. Thus, we hypothesize that certainty of the available evidence is relatively low, and we designed a meta-epidemiological review to examine what has been the evidence certainty in the latest Cochrane neonatal reviews and investigate if the number of trials and enrolled patients is associated with the certainty of evidence.

Methods

Protocol

We performed a systematic meta-epidemiological review, whose protocol was registered in Open Science Framework, and it is available from https://osf.io/7k6s8/. The protocol was agreed before commencing the search and included search criteria, analysis plan, and full description of methods.

Search process and screening

For this work, we searched Cochrane neonatal reviews published between January 2022 and May 2024 from the Cochrane review register. We decided to focus on the past 2 years to provide the most recent view of neonatal evidence. Furthermore, we hypothesized that these reviews would be similar in terms of methodology, considering that guidance on risk of bias was updated in 2019, and GRADE guidance is continuously updated. The search results were then uploaded to Covidence software (Veritas Healthcare, Melbourne, Australia) for abstract screening. Two authors (TV and IK) performed the screening process independently, and disagreements, if any, were solved by reaching a mutual consensus or consulting the third author.

Inclusion and exclusion criteria

We included all Cochrane reviews on interventions that targeted neonates and had at least one meta-analysis performed for which the evidence certainty was rated according to GRADE criteria. If the review remained qualitative and did not perform any statistical pooling of the results, it was excluded.

Data extraction and treatment

Data from the reviews were extracted to a pre-designed Excel spreadsheet. First, 20% of the reviews were extracted independently by two authors (TV and IK) to pilot and test the extraction process, and as there were no conflicts, one author (TV) extracted the remaining 80% of the review data. The following information was extracted from each included review and for each outcome: year of publication, intervention, control, patient population, setting, number of studies, number of participants, effect estimates, and evidence certainty rating as declared by the meta-analysis according to GRADE classification [6]. This information was extracted from the presented summary of findings tables.

We classified the patient populations analyzed by each Cochrane meta-analysis in three groups as follows: (1) preterm neonates (gestational age less than 37+0 weeks), (2) term neonates (gestational age 37+0 weeks or more), (3) neonates of any gestational age. Furthermore, the meta-analyzed interventions were classified as ventilation, nutrition, medication, and others. Finally, the outcomes were classified as “subjective” or “objective” for the outcome assessor. Objective outcomes were, for example, laboratory parameters and clinical measurements which are measured by standardized methods. Death, intraventricular hemorrhage, and bronchopulmonary dysplasia were also considered as objective clinical outcomes. On the contrary, outcomes that may be possibly affected by the knowledge of the received intervention were considered subjective (e.g., duration of invasive ventilation, need for reintubation, time to discharge, pain). For full transparency, all the extracted data is available in the online supplementary file 1.

Statistics

For categorized variables, we presented absolute numbers with proportions (%). We used cross tabulation and Chi-squared or Fisher test, as appropriate, to examine differences in categorical outcomes between groups. We presented the findings for different interventions and outcomes in a traffic-light plot where green indicates high, yellow indicates moderate, orange indicates low, and red indicates very low certainty. The mean number of studies and patients per each evidence certainty categories were analyzed with ANOVA followed by Bonferroni post hoc test. A sensitivity analysis with Kruskal–Wallis test was also made. Statistical analyses were made by using SPSS 29.0 (IBM, Chicago-IL, USA), and p value less than 0.05 was considered statistically significant. We have reported the findings of this review according to the meta-epidemiological extension of the preferred reporting items in systematic reviews and meta-analysis (PRISMA) guideline [7].

Results

We screened 55 Cochrane reviews and included 49 of these for our analysis; six articles were excluded due to lack of quantitative meta-analysis in the review (Fig. 1). The included 49 reviews reported a total of 443 outcomes whose certainty of evidence was evaluated. Of the included articles, 14 reviews with 115 outcomes focused on feeding interventions, 8 reviews with 75 outcomes on ventilation, 17 reviews with 180 outcomes on medications, and 10 reviews with 73 outcomes were classified as miscellaneous.

Fig. 1
figure 1

PRISMA flowchart of the study selection process

Overall, the certainty of evidence was reported to be high in 8 (1.8%), moderate in 89 (20.2%), low in 195 (44.0%), and very low in 151 (34.0%) of the outcomes (Fig. 2). There were significant differences in certainty of evidence among reviews focusing on different interventions (p < 0.001). For instance, the highest proportion of very low certainty was in the medication reviews, and the highest proportion of low certainty was in feeding reviews; ventilation and other interventions had rather similar certainty (Fig. 2). There was no high certainty of evidence outcomes in ventilation reviews. Reviews reporting subjective outcomes had a significantly greater proportion of very low certainty (100 out of 243, i.e., 41.2%) than those with objective outcomes (50 out of 198, i.e., 25.3%, p < 0.001), and, finally, certainty was best in the reviews focusing solely on preterm neonates (Fig. 2).

Fig. 2
figure 2

Traffic-light plot of the evidence certainties. Evidence certainties are classified by type of intervention, outcome subjectivity, and patient maturity; differences are tested by chi-squared/Fisher test (more details in the main text). Overall certainties are also depicted. Absolute numbers presented in the graph and the scale is proportional to 100% for colors

Reviews with at least one outcome with high certainty of evidence included significantly more trials and enrolled patients compared to those with low certainty (p < 0.001 for both the number of studies and patients; Fig. 3A, B, respectively). Results of post hoc analyses are shown in Fig. 3: numbers of studies and patients were significantly different between outcomes of any certainty of evidence with the only exception of moderate vs high certainty comparisons. A sensitivity analysis with Kruskal–Wallis test showed similarly p < 0.001 for both the number of studies and patients. Specifically, reviews with at least one outcome with high certainty had approximately 3 and 1.5 times more studies or patients than those with very low or low certainty, respectively.

Fig. 3
figure 3

Average number of trials (A) and participants (B) among outcomes with a reported certainty of evidence. Columns and T-bar represent means and 95% confidence intervals (CIs), respectively. Horizontal lines represent significant post hoc comparisons. Data are analyzed with ANOVA and Bonferroni post hoc test

Outcomes with high certainty of evidence were eight, and their details are presented in Table 1. The interventions which had high certainty for at least one outcome were early developmental intervention programs, lumbar puncture position, oral dextrose gel, methylxanthine for apnea prevention, musical/vocal interventions for preterm neonates, and indomethacin and ibuprofen in PDA patients. Interestingly, only four (50%) of the high-certainty evidence outcomes were based on a pharmacological intervention. Furthermore, four reviews reported null findings, as they reported no clear difference between the intervention and control groups.

Table 1 Outcomes with high certainty of evidence in neonatal Cochrane reviews published in 2022–2024

Discussion

In general, we found that most outcomes in the Cochrane neonatal reviews published in the last 2 years have low or very low certainty of evidence. Only about 2% of the outcomes had high certainty, and around 20% had moderate certainty. These percentages varied slightly across different categories, such as type of intervention, but the overall proportions of certainty were almost identical within these categories. We also found a significant association between certainty of evidence and the number of trials and enrolled patients. However, our main finding was the worrisome and striking low number of reviews with high certainty for different interventions and outcomes. For example, none of the studies on ventilation interventions had high certainty of evidence.

There are many possible explanations for this result. First, there is a high variance in the number of trials comparing different non-invasive and invasive ventilation modalities. For example, the use of non-invasive neurally adjusted ventilation and nasal high frequency oscillatory ventilation was studied only in 5 small trials and approximately 10 randomized studies, respectively [8,9,10,11]. Second, the understanding of the outcome assessments subjectivity plays a key part, as ventilation strategies are hard to be blinded to attending clinicians. A recent meta-epidemiological review on this matter found a worrying rate of incorrectly performed risk of bias assessment leading to downgrading of certainty [12], and the proportion of incorrect assessments was similar between Cochrane and non-Cochrane reviews. Nonetheless, the recent NASONE trial showed that it is possible to compare complex ventilation strategies for a long time with an assessor-blinded design and reduce possible bias [9]. More and above this, it is important to remind that ventilation cannot be considered as a single intervention, like a pharmaceutical therapy. Ventilation is a complex technique delivered on various modes and, even within the same mode, with several possible combinations of ventilatory parameters and thus several therapeutic strategies. This is even more complex for non-invasive ventilation where the use of different interfaces plays a significant role in ventilation efficacy, patient-ventilator interaction, and comfort [13, 14]. The lack of detailed standardization of the ventilatory intervention can significantly downgrade the quality of results, and only recent trials have implemented this concept [9, 15]. Finally, a ventilatory strategy may be applied to neonates with completely different lung mechanics and pathophysiology (e.g., respiratory distress syndrome or bronchopulmonary dysplasia or neonatal acute respiratory distress syndrome): this also plays a role in downgrading the quality and demands a close pathophysiology phenotyping to enroll homogeneous populations [16, 17]. This is the basis for a personalized neonatal respiratory care and currently represent an important need [18].

The number of trials and enrolled patients was associated with the certainty of evidence. In other words, outcomes that were more widely studied had higher certainty. This is consistent with previously published meta-epidemiological reports and is worrisome. In fact, as previous reports date to 2006 and 2013, the situation has not significantly improved over time [19, 20].

As previously reported, the majority of the studies and reviews focus on preterm neonates. In fact, critically ill preterm neonates are more numerous than term infants, and the two age groups present with very different diseases and comorbidities. Thus, not only preterm neonates are more commonly admitted to NICUs than term babies, but these latter present with more complex and rare life-threatening conditions such as, for instance, meconium aspiration [21, 22] that are more difficult to study in randomized trials. The association between the certainty of evidence and the number of trials and enrolled patients was expected since, according to the GRADE classification, an intervention needs a sample size large enough to produce precise results [23]. One of the domains in the GRADE assessment is the imprecision of the outcome estimate. As imprecision is mostly based on the interpretation of the effect estimates, whose confidence intervals are narrower for more common event and larger populations. It is also interesting that four of the eight high certainty outcomes reported the “so called” null findings, where the analyzed intervention did not show evidence of a difference compared to control intervention.

The classification of the certainty of evidence may change if reviews are updated, and the outcome estimates usually become more precise [24]. It is more common for very low evidence to be reclassified into higher categories, than for high evidence outcome to be downgraded [12, 25]. Continuous research efforts to improve the quality of neonatal care are needed. Traditionally, this means designing more and larger randomized trials. However, meta-epidemiological reviews of existing literature are important, as these hold potential to spare resources from unnecessary trials.

Strengths and limitations

The main strength of our work was the full transparency and respect of best review practices without any protocol violations. The main limitation comes from the fact that we classified the review interventions and outcomes after data extraction and classified the outcomes as subjective and objective based on our own experience and opinion. To reduce this and promote transparency, we have provided the whole data as a supplementary file. Furthermore, a further limitation was that we had a protocol deviation, as we initially planned to use dual data extraction but performed the data extraction by individual author after 20% of the reports were extracted. This decision was based on the fact that there were no issues with the extraction as it was made directly from the summary of findings tables as those were. Another limitation may be that we decided to focus solely on Cochrane reviews and on a relatively narrow amount of time (i.e., 2022–2024). We decided so because (1) Cochrane reviews are conducted rigorously using the same formats and protocols, and (2) we wanted to have a picture of the really recent situation. In fact, we considered that the risk of bias tool was revised in 2019 and implemented into use only in 2020; this means that the protocols for the reviews published in 2022 have been written mostly in 2020–2021. However, it could be that the results here are influenced by the short timeline, and a longer study period could have altered the results in either direction. Thus, a future study should investigate if results presented here may be different on a wider time window or comparing Cochrane and non-Cochrane meta-analyses. A clear limitation of our results was the lack of detailed extraction on the GRADE domain assessments and the reasons for downgrading the evidence certainty. Although these were provided in the Cochrane reviews, we did not extract this information, and this will be the object of future meta-epidemiological studies. Thus, future research should analyze the GRADE assessments more in detail and clarify the reasons of downgrading the certainty and whether these decisions have been correctly taken.

Conclusion

Only 2% of the outcomes had high certainty of evidence in the Cochrane neonatal reviews published in the last 2 years. The certainty was significantly associated with the number of included trials and participants. Neonatal trials and reviews still need more and larger studies to improve the overall quality of the evidence.