Introduction

Abnormalities of the foetal brain occur in approximately 25 per 10,000 births in the UK [1] and can result from environmental, chromosomal, genetic or acquired causes. Accurate diagnosis of foetal brain abnormalities is necessary to guide management of the pregnancy and facilitate parental counselling.

Ultrasound scanning (USS) is the primary diagnostic imaging method for screening of the pregnancy and is considered the reference standard for imaging the foetal brain. On occasion, technical limitations hinder clear visualisation of the foetal anatomy [2, 3], which has led to the exploration of other diagnostic tests to supplement USS.

Advances in MR technology have meant initial technical restrictions in imaging the foetus with in utero magnetic resonance (iuMR) imaging have been overcome, experience within radiology has increased and a growing body of literature confirms increasing use of iuMR in diagnosing foetal brain abnormalities [4–7]. Despite this, the true clinical value of iuMR has not been established. Previous limited statistical evidence was unable to demonstrate, in terms of diagnostic accuracy, any benefit [8].

To our knowledge, there have been only two other recently published systematic reviews, in which Rossi and van Doorn aimed to clarify the additional benefit of MRI in the diagnostic pathway when used in addition to USS [9, 10]. Rossi reviewed 13 studies and van Doorn selected 27 studies for review. Despite similar aims and inclusion criteria, only seven studies were included in both reviews. This could, along with differences in search dates, be due to the differences in exclusion criteria. The criteria used by Rossi excluded studies without an outcome reference diagnosis (ORD), non-English publications and those where data were reported in graphs or percentages. Van Doorn's review excluded studies with a sample size of less than 20 and studies where diagnoses were inadequately described. We felt a new systematic review was justified in order to update the existing reviews, to attempt to limit the number of studies excluded and to identify any other studies which may have been erroneously excluded.

The aim of this study is to answer the following question: Is the diagnostic accuracy of iuMR superior, equivalent or inferior to USS? We aimed to assess diagnostic accuracy of iuMR following antenatal USS through:

(a) Measurement of diagnostic accuracy of antenatal USS alone (i.e. prior to iuMR) in relation to an ORD determined by postnatal imaging, surgery or post-mortem examination

(b) Measurement of diagnostic accuracy of iuMR (following antenatal USS) relative to an ORD

Secondary aims were to determine if counselling and/or management of the pregnancy changes as a result of iuMR imaging and to identify the foetal brain anomalies for which iuMR is most useful.

Methods

Protocol

The protocol was written in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [11] and registered with the International Prospective Register of Systematic Reviews (PROSPERO, CRD42015010265).

Eligibility criteria

All study designs were considered eligible apart from case reports, reviews or commentaries.

Participants

Participants were pregnant women who, owing to suspicion of a brain abnormality, had undergone prenatal ultrasound and subsequent prenatal iuMR of their foetus’ brain, with any findings confirmed by an ORD.

Reference standard

Reference standards accepted to confirm the outcome diagnosis were postnatal imaging (transcranial US, MRI or CT) and surgery or, in cases of foetal demise or neonatal death, autopsy and post-mortem MR imaging.

Exclusions

Studies not reported in English, and for which a translation was unavailable, were excluded. If an English abstract was available it was scrutinised for relevant information, but limited data meant adherence to the inclusion criteria could not be certain.

Search methods

We sought to identify all studies in which iuMR imaging was used to supplement USS for imaging foetal brain abnormalities in utero. A sensitive search strategy using MeSH and free-text terms, detailed in Appendix 1, was applied to the electronic databases listed below and adapted for each database.

Databases searched were Medline (via OVID) (1966 to present), EMBASE (via OVID) (1980 to present), Cochrane Register of Diagnostic Test Accuracy Studies (accessed 18/03/2015 and 02/10/2015) and Web of Science (1900 to present). In addition, we searched relevant journals, conference proceedings and examined reference lists of relevant and included studies.

Electronic searches were conducted in March 2015 without date restriction and later updated to identify all relevant papers up to September 2015.

Data collection

Selection of studies

Screening of citations was completed independently by two reviewers (DJ, CM). Any disagreements were resolved by consensus. Where only abstracts were available, attempts were made to contact authors for full reports. If the same data had been published in more than one publication, the most up to date or complete study was selected.

A PRISMA flowchart was used to document and report any decisions made during the study selection process [9] (Fig. 1).

Fig. 1 PRISMA flowchart of study selection and exclusions

Assessment of methodological quality of included studies

Included studies were assessed independently for methodological quality (DJ and CM) using a modified Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [12]. Studies were rated in terms of bias risk and applicability using signalling questions to score the four key domains: Patient selection, Index tests, Reference standard, and Flow and Timing. Studies were scored as “Yes”, “No” or “Unclear” for each checklist item. Additional signalling questions were introduced for both study design and index tests. These were to determine prospective versus retrospective design and details regarding USS and iuMR technique and reporting, as these were elements considered likely to introduce bias.

Data items and analysis

Study characteristics and outcomes were extracted independently (DJ and CM) and recorded using a data collection form (Appendix 2) which was piloted on three papers to ensure suitability. Characteristics noted for each study are listed in Appendix 2. The number of correct and incorrect diagnoses made by both USS and iuMR were also recorded as judged by the ORD confirmed by postnatal imaging, autopsy or surgery. Clinical examination was discounted as a reference standard as the majority of structural brain abnormalities are not apparent externally. Where studies reported the results of imaging from multiple anatomical areas, only results of the foetal brain were included.

It was anticipated that all studies would recruit only (or predominantly) foetuses with a brain abnormality diagnosed by USS, meaning the sensitivity and specificity of the imaging modalities could not be estimated because of the lack of foetuses without brain abnormality. Therefore, the analysis defined diagnostic accuracy for each modality as the percentage of cases where the diagnosis was confirmed by ORD. In foetuses with multiple abnormalities a primary diagnosis was identified as the abnormality with the most detrimental clinical outcome. In cases where both modalities identified the primary diagnosis but one provided a more specific diagnosis and/or additional information without fundamentally changing the primary diagnosis, our analysis assumed both modalities were correct but the nature of disagreements was subsequently investigated.

A meta-analysis of the diagnostic accuracy of iuMR in relation to USS was conducted using the Stata statistical analysis software [13]. For each study the odds ratio for the paired iuMR and USS accuracies and its standard error were computed using the method of Becker and Balagtas, using a 0.5 correction for zero cells [14, 15]. Odds ratios were combined using a random effects model and the I² statistic was used as an indicator of heterogeneity within the included studies [16, 17].
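The pooling step described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the review: the per-study odds ratio uses the ordinary unpaired (Woolf) standard error rather than the paired method of Becker and Balagtas, the random effects model is taken to be the DerSimonian-Laird estimator, and the function names and study counts are hypothetical. It is intended only to show how a 0.5 zero-cell correction, random effects weighting and the I² statistic fit together, not to reproduce the Stata analysis.

```python
import math


def log_odds_ratio(correct_a, total_a, correct_b, total_b):
    """Log odds ratio of test A versus test B being correct.

    ASSUMPTION: the two tests are treated as independent samples (Woolf
    standard error); the paired method cited in the text is not reproduced.
    """
    a, b = correct_a, total_a - correct_a
    c, d = correct_b, total_b - correct_b
    if 0 in (a, b, c, d):
        # 0.5 continuity correction, applied only when a cell is zero
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def random_effects(log_ors, ses):
    """DerSimonian-Laird pooling (assumed estimator) with I^2 from Cochran's Q."""
    w = [1 / s ** 2 for s in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "pooled_or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "i2_percent": i2,
    }


# Hypothetical studies: (iuMR correct, USS correct, foetuses with an ORD)
studies = [(28, 22, 30), (40, 31, 45), (12, 12, 12), (55, 38, 60)]
effects = [log_odds_ratio(mr, n, us, n) for mr, us, n in studies]
summary = random_effects([e[0] for e in effects], [e[1] for e in effects])
print(summary)
```

An odds ratio above 1 in this sketch favours iuMR; I² expresses the share of the variation between study estimates that is attributed to heterogeneity rather than chance.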

Results

Our initial searches generated a total of 1252 potential studies with 807 remaining for additional scrutiny after duplicates were removed. Further screening resulted in 34 published studies for final inclusion [3, 18–50]. Categories for exclusion of full papers reviewed but rejected are listed in the PRISMA flowchart (Fig. 1).

Study characteristics

The 34 studies, listed in Table 1, were published over a 20-year period (1994–2014). Nineteen were prospective [3, 18–35], 12 retrospective [36–47] and three unspecified [48–50]. All studies selected a consecutive cohort of patients with either a remit to investigate all foetal brain abnormalities (24 studies [3, 18–23, 29–32, 35, 39–41, 44–49]) or to investigate a more specific brain abnormality, e.g. ventriculomegaly or corpus callosum anomalies (10 studies [24–28, 34, 36, 38, 43, 50]).

Table 1 Studies included in the review and their characteristics

USS was performed in a tertiary centre and/or conducted by foetal medicine experts in 21/34 studies [3, 18–21, 26–29, 31, 32, 35–38, 40–43, 47], in 12/34 it was either unclear or not specified [22–24, 30, 33, 34, 39, 45, 46, 48–50], and in one study [44] USS was performed in a routine clinical setting. Clear details regarding USS technique (transabdominal or transvaginal, views obtained) and equipment (manufacturer, transducer) were provided in 21 studies [18–20, 22, 24–26, 28, 32–37, 39, 40, 42, 43, 47, 48, 50]. The remaining 13 studies [3, 21, 23, 27, 29–31, 38, 41, 44–46, 49] provided minimal information or details were not given. Three out of 34 acknowledged technical difficulties in some cases which limited the USS [3, 28, 48]. The age range of foetuses reported across studies was 13–41 weeks gestation. Time delay between USS and iuMR was less than 2 weeks in 19/34 [3, 18–21, 23–25, 27, 30, 32, 33, 39, 41, 42, 44, 46, 47, 49] and not specified in 13/34 studies [22, 26, 28, 29, 31, 34, 36, 38, 40, 43, 45, 48, 50]. In two studies [35, 37] there were cases in which the time delay was greater than 2 weeks.

Experience of the clinician reporting the iuMR study was available in only 10/34 studies [20, 21, 25, 27, 28, 30, 32, 35, 37, 42]; half of these quantified it in terms of years (between 1 and 15) and the remainder gave a description of ‘experienced’. In two studies, the reporting radiologist was unaware of USS findings [21, 37]. Information regarding MR technique was reported in all papers, including at least two of the following: manufacturer, sequences, types of receiver coils and patient positioning. Fast T2-weighted sequences were performed in all studies, with some using additional sequences (e.g. T1, DWI, 3D and FLAIR). Early studies reported the use of fasting and sedation to achieve optimal imaging [22, 34].

Methodological quality

The methodological quality assessments using the QUADAS-2 criteria are presented in Fig. 2. Risk of bias for patient selection and applicability was low in 31/34 (91 %) studies [3, 18–45, 47, 50], high in one (3 %) [46] and unclear in two (6 %) [48, 49]. High risk of bias was due to patient selection criteria not being defined and retrospective study designs. The risk of bias due to conduct and interpretation of the index tests was low in 15/34 (44 %) [3, 18, 20, 21, 25, 28, 30, 32, 35–37, 40, 42, 43, 47], high in 4/34 (12 %) [38, 44–46] and unclear in 15/34 (44 %) [19, 22–24, 26, 27, 29, 31, 33, 34, 39, 41, 48–50]. Potential bias introduced by the reference standard was considered low risk in 19/34 (56 %) studies [3, 18, 19, 21, 22, 24, 28–31, 35, 36, 38, 40, 44, 47–50], high risk in nine (26 %) [20, 27, 32–34, 41, 43, 45, 46] and unclear in 6/34 (18 %) [23, 25, 26, 37, 39, 42], because a proportion of cases within these studies either did not have a confirmed outcome or had an outcome determined by clinical examination. Bias in the flow and timing, as judged by timing between USS and iuMR imaging or by the methods used for analysis of findings, was deemed low in 15/34 (42 %) [3, 18, 19, 23–25, 30, 32, 33, 35, 39, 46, 47, 49], high in 11/34 (32 %) [21, 26, 31, 34, 36–42] and unclear in 9/34 (27 %) [20, 22, 27–29, 44, 48, 50].

Fig. 2 QUADAS risk of bias assessment

Diagnostic accuracy of US and MRI

The 34 included studies reported a combined total of 2530 foetuses (median 32.5, range 10–834), of which 62 % (n = 1571) were excluded because they did not have an iuMR (n = 796), did not have an ORD (n = 542), had non-brain pathology (n = 159) or were excluded for other reasons (n = 74). Consequently, this systematic review reports on the outcomes of 959 foetuses. In 6/34 studies [19, 28–30, 44, 49] all foetuses had an ORD; combined, these contributed 186/959 foetuses to the analysis in this review (median 24.5, range 12–72). The remaining 773/959 foetuses (median 38, range 10–834) were from the outstanding 28 studies [18, 20–27, 30–43, 45–48, 50].

The overall diagnostic accuracy combined across 34 studies was 75.2 % for USS and 91.0 % for iuMR (overall odds ratio = 3.10, 95 % CI 1.98 to 4.86, p < 0.0001; Fig. 3). Although individual studies were heterogeneous (I² = 45 %; p = 0.002), nearly all reported an improvement in diagnostic accuracy following iuMR. The data are also represented in the form of a L’Abbé plot (Fig. 4) in which the diagnostic accuracies of iuMR and USS are presented as percentages.

Fig. 3 Forest plot showing the odds ratios of all studies (first author and date only) and overall odds ratio with confidence intervals

Fig. 4 L’Abbé plot of diagnostic accuracy of USS and iuMR. Circle size is proportional to sample size of each study

Agreement between USS and iuMR

The reports from USS and iuMR were in agreement and agreed with the ORD in 527/959 (55 %). USS and iuMR were in agreement but discordant with the ORD in 52/959 (5.5 %) foetuses (Table 1a and b, and 2).

Table 2 Results of the number and percentage of foetuses within each category of outcome

In 160/959 (16.5 %) foetuses iuMR and USS were in agreement regarding the primary diagnosis but additional information was added—either secondary diagnoses or a more concise/confident primary diagnosis given. In this category iuMR provided additional information in 146/959 (15 %) and USS provided additional information in 14/959 (1.5 %) cases as confirmed by ORD.

Disagreement between USS and iuMR

The diagnoses on iuMR and USS disagreed in 222 (23 %) cases. Of these, the iuMR was in agreement with the ORD in 186 (19 %), the majority of which were abnormalities undetected by USS (139/186, 75 %). The remaining 47/186 (25 %) were abnormalities reported by USS but correctly excluded by iuMR. In 34 cases the USS diagnosis was incorrectly overturned by iuMR, 10 of which were abnormalities wrongly excluded or missed by iuMR and 24/34 were abnormalities diagnosed by iuMR but not found by USS or on the ORD (Table 2b and 3b).
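As a check on how the agreement and disagreement categories relate to the pooled accuracies, the short Python sketch below recombines the counts quoted in the text. The mapping of categories to "correct" for each modality is an editorial reading of the analysis rule stated in the Methods (both modalities are counted as correct when they agree on the primary diagnosis, even if one adds information); the variable names are illustrative only. Under that reading, the counts reproduce the reported 75.2 % for USS and 91.0 % for iuMR.

```python
# Counts quoted in the text, out of 959 foetuses with an ORD.
TOTAL = 959
both_correct = 527        # USS and iuMR agree and match the ORD
agree_extra_info = 160    # same primary diagnosis, one test adds detail (both counted correct)
iumr_correct_only = 186   # tests disagree, iuMR matches the ORD
uss_correct_only = 34     # tests disagree, USS diagnosis wrongly overturned by iuMR


def accuracy(correct, total=TOTAL):
    """Percentage of foetuses whose diagnosis was confirmed by the ORD."""
    return round(100 * correct / total, 1)


uss_accuracy = accuracy(both_correct + agree_extra_info + uss_correct_only)
iumr_accuracy = accuracy(both_correct + agree_extra_info + iumr_correct_only)
print(uss_accuracy, iumr_accuracy)  # 75.2 91.0
```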

Table 3 Discordant diagnoses according to abnormality detected

Table 3 presents the discordant diagnoses between USS and iuMR according to category of anomaly. The most frequent areas of disagreement were midline (24 %) and posterior fossa abnormalities (21 %). In particular agenesis of the corpus callosum and the Dandy Walker spectrum of abnormalities were frequently missed or, less frequently, wrongly identified on USS. The most frequently misdiagnosed anomalies on both USS and iuMR were cortical formation abnormalities (17 %) such as hemimegalencephaly, lissencephaly and heterotopia.

Changes in counselling and management

Eleven studies [3, 18, 28–31, 40, 41, 44, 47, 48] reporting on 186 foetuses specified the benefit of iuMR in terms of changes to counselling of parents or management of the pregnancy. These changes as a result of findings on iuMR affected 78/186 (41.9 %) foetuses.

Discussion

This systematic review and meta-analysis demonstrates that using iuMR to support USS in the diagnosis of foetal brain abnormalities increases diagnostic accuracy by 16 percentage points (75 % for USS alone and 91 % for iuMR as an adjunct). The heterogeneity of the included studies was moderate (I² = 45 %, p = 0.002) according to the definitions of Higgins et al. [51], suggesting methodological and clinical variability and inconsistency in the measurement of outcomes within each study. Although investigation of heterogeneity is recommended [51], the ability to do so is compromised by the lack of reporting (and indeed quantification) of all the ways in which studies differ. The performance of both diagnostic tests is influenced by many factors, and a limitation of this review was incomplete reporting of characteristics that would potentially influence diagnostic performance, such as operator experience (specified in just a third of included studies) and technical difficulties (three studies) [3, 28, 48].

iuMR is not without its limitations, and our review demonstrated that iuMR overestimates the presence of abnormalities more frequently than it fails to identify them. This could be explained by the nature of foetal iuMR, in which the need for fast imaging compromises image quality. To the untrained eye, artefacts from maternal breathing, foetal movement and image aliasing may potentially mimic or obscure pathology [52]. It is for this reason that ‘experience’ should perhaps be defined by the number of foetal brain examinations reported.

The timing of USS in relation to iuMR imaging is also relevant in the assessment of both tests. The foetal brain develops rapidly, and significant delay between the two examinations may influence the ability to diagnose accurately because of natural brain development, an increase in the size of critical anatomical structures or disease progression. Thirteen out of 34 studies failed to report delay time, making an overall analysis of the effect of this criterion unreliable.

The extent to which iuMR ultimately contributes to changes in management or in counselling regarding the pregnancy is also unclear as this was only reported in a small proportion of studies. Equally the impact of a wrong diagnosis made by iuMR was not defined in any study despite it occurring in 14/34 studies [18, 19, 21, 23, 26, 29, 33, 35, 36, 39–42, 44, 47].

Our review builds on the systematic reviews undertaken by Rossi et al. and van Doorn. Rossi identified 2323 potential studies published between 2000 and the end of 2012 and reviewed 13 studies (710 foetuses), having excluded 2293 by title and abstract. Van Doorn searched for publications between 1990 and March 2014, identified 2748 and excluded 2577 by title and abstract, reviewing 27 studies (1184 abnormalities detected by USS but only 454 with an ORD). The differences in search dates and in exclusion criteria, described earlier, appear to account for the variation in the studies included in each review.

An important difference between the two is that Rossi restricted studies to those where outcomes were confirmed by a reference diagnosis, although they chose to accept clinical examination as an ORD, whereas van Doorn's selection criteria did not require an ORD. A strength of our review was the requirement of an ORD for any outcomes included in the meta-analysis. As previously stated, we excluded clinical examination as an ORD. Although this significantly reduced the number of outcomes available, we felt this was justified as most structural brain abnormalities, and consequently diagnostic accuracy, cannot be determined with certainty on clinical examination alone.

Our analysis included 34 studies, of which 15 were additional to those included in the previous reviews owing to more recent searches and differences in selection criteria, such as unlimited year of publication or sample size within studies. Although van Doorn's searches were not restricted to English-language publications and did not require an ORD, our review included more studies. This may be due to van Doorn's exclusion of studies with a sample size of less than 20, which accounts for six additional studies in this review, and to the requirement of ‘adequate description of diagnoses’, which was not clearly defined by van Doorn.

Even with subtle differences in methods between all the reviews, findings were similar. Rossi reported that iuMR was accurately able to identify brain abnormalities in 94.3 % of included foetuses, van Doorn reported 80 % and our study 91 %, an increase of 15–20 % when compared to USS alone. Both Rossi and van Doorn reported that the highest proportion of disagreement between USS and iuMR was related to midline abnormalities, particularly the posterior fossa. iuMR was better able to diagnose abnormalities in this anatomical region, also consistent with the findings of this systematic review which incorporates a further four studies published since 2012.

Although heterogeneity was not quantified by Rossi and van Doorn, both reviews highlighted the inadequate reporting of study characteristics which may compromise the findings of all systematic reviews. In order to adequately assess the accuracy of a diagnostic test and determine its true benefit in clinical practice, optimal study design is necessary [51].

We believe replication of the previous reviews is both justified and necessary: it provides reassurance that the minor differences in inclusion and exclusion criteria, both at study selection and data extraction, do not change the outcomes significantly, thus adding weight to the current evidence base. In spite of the different nature of all the studies, the diagnostic accuracy of iuMR was clearly superior across the studies, but the heterogeneity identified may compromise these findings. The moderate level of heterogeneity identified by our review warranted further investigation, but this was prevented by insufficient reporting of study characteristics. Despite the increasing use of iuMR in clinical practice, poor study design has previously brought into question both its diagnostic capabilities above those achieved by USS and its benefit in terms of guiding the management of pregnancy, and further studies are needed [53]. For this reason we instigated the MERIDIAN [54] project, a large prospective study to investigate iuMR imaging in the diagnosis of foetal brain abnormalities to provide definitive evidence to guide future practice.

Conclusion

When foetal brain abnormalities are suspected on USS, iuMR imaging is able to contribute significantly to the diagnostic pathway by both clarifying findings and significantly increasing the detection rate of abnormalities, particularly midline and posterior fossa anomalies. Limitations of previous studies suggest that further investigation is still required to clarify the full impact of iuMR.