Introduction

Crohn’s disease (CD) is associated with substantial morbidity [1, 2]. The management of inflammation is crucial in preventing disease complications—emphasizing the importance of effective assessment of inflammation [3].

Colonoscopy stands as the gold standard for CD diagnosis. However, it is invasive, and the evaluation of the small bowel remains inadequate [4]. Magnetic Resonance Enterography (MRE) is a non-invasive technique that is effective for assessment of CD activity [4,5,6,7,8]. MRE’s non-invasive nature also offers potential in evaluating treatment response or identifying therapeutic inefficiency. This can prompt early detection and timely adjustments in therapy to maintain clinical remission [9,10,11]. Nevertheless, diagnosing Crohn’s disease using MRE is time intensive and demands high expertise.

In recent years, Artificial Intelligence (AI), especially convolutional neural networks (CNN), have notably impacted computer vision. CNNs, a type of deep learning (Fig. 1), excel in pattern recognition and are affecting the way in which medical images can be analyzed [12, 13]. This technology offers an innovative approach to diagnosing and monitoring CD activity [14,15,16,17].

Fig. 1
figure 1

CNN specialize in image processing, utilizing small filters per layer to identify recurring patterns. Their hierarchical structure enables shallow layers to detect low-level patterns and deeper layers to grasp high-level image comprehension

We reviewed the literature to evaluate articles focused on the use of deep learning to improve MRE analysis in CD.

Methods

This review adhered to the guidelines outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). The PRISMA checklist for systematic reviews can be found in Supplementary Materials 1. Studies were selected based on the following systematic approach.

Search strategy

We conducted an extensive literature search on October 1, 2023, using the PubMed/MEDLINE database to identify studies investigating the use of deep learning for detecting CD in MRI/MRE. Our search terms included “MRI or MRE,” and “Crohn’s disease,” and “deep learning” or related terms, such as “convolutional neural networks,” “machine learning,” and “artificial intelligence.”

Detailed information about the complete search strategies can be found in Supplementary Materials 2. We also conducted a manual search of the references in the studies we included.

Inclusion criteria encompassed studies that (1) assessed the effectiveness of a deep learning model in detecting CD on MRI/MRE, (2) were published in the English language, (3) were peer-reviewed original publications, and (4) included an outcome measure.

Exclusions were applied to articles not related to computer vision, non-deep learning articles, non-original articles, and abstracts.

This study is registered with PROSPERO under the registration number CRD42023484725.

Study selection

Two authors OB, medical student, and EK, senior abdominal radiologist, autonomously assessed the titles and abstracts to ascertain if the studies satisfied the inclusion criteria. When the title met the inclusion criteria or if any uncertainty arose, a thorough examination of the full-text article was conducted. If a relevant title appeared in the references section of one of the included studies, it was also screened for inclusion. In cases of disagreements, a third reviewer DE, medical student, was consulted for resolution.

Data extraction

Utilizing a uniform data extraction template in Microsoft Excel, two reviewers (OB and EK) separately gathered information. The data encompassed details, such as publication year, study design and location, patient count, ethical considerations, inclusion and exclusion criteria, study population description, deep learning technique, utilization of an online database, database size, incorporation of an independent test dataset, performance of cross-validation, assessment metrics employed, and the main findings.

Research questions

The overall aim of this systematic review is to analyze original, peer-reviewed journal publications between 2019 and 2023 on deep learning applications to MRE-based CD assessment. We have defined the following main research questions for our study:

  1. 1.

    What are the different applications of deep learning on MRE in CD?

  2. 2.

    Is there potential for advancement of conventional techniques using deep learning?

  3. 3.

    What are the strengths and limitations of these applications, especially with respect to the field of CD disease burden assessment?

  4. 4.

    What key research gaps are currently being investigated or should be investigated, according to researchers?

Quality assessment and risk of bias

We evaluated the quality of the studies using the modified Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) criteria [18]. The task was performed by two researchers: OB and DE. Justifications for the assessment of risk were noted for each criterion in each of the five studies included, and disagreements were resolved by a third researcher (MA, senior abdominal radiologist). No study was excluded based on quality assessment, and it is important to note that four of the five studies had noted “high risk” in at least one category. Details are available in Supplementary materials 2.

Results

Study selection and characteristics

The initial literature search resulted in 16 articles. Five studies met our inclusion criteria. Studies were published between 2019 and 2023. A total of 468 subjects were analyzed. Table 1 summarizes the characteristics of the included studies. Figure 2 summarizes the clinical application of each study included.

Table 1 A summary of articles in the literature review that applied deep learning techniques for magnetic resonance imaging involving Crohn’s disease
Fig. 2
figure 2

Graphical depiction categorizing deep learning studies according to their clinical application

Four of the studies were retrospective, one was prospective. In four of the studies, a board-certified radiologist, served as reference standard. No studies have performed external validation.

Descriptive summary of results

Studies included in this review utilized deep learning techniques for various tasks.

Son et al. used deep learning-based reconstruction (DLR) and Lian et al. employed CNN to improve image quality. Lamash et al. used CNN for inflammation assessment, while Van Harten et al. measured intestinal motility via centerline segmentation. McFarlane et al. utilized 3D image processing and reconstruction (3D-IPR) with AI to create pre-procedure 3D animations of perianal fistulas (Fig. 2).

Son et al. utilized a DLR technique to enhance the quality of MRE images in Crohn’s patients [18]. A qualitative assessment was carried out by two abdominal radiologists who evaluated three distinct sets of images: (1) original, (2) images processed with a conventional filter, and (3) images processed using the deep learning tool. The mean scores assigned by the radiologists revealed a statistically significant improvement in overall image quality for the DLR image set in comparison to both the original and conventionally filtered images (e.g., Coronal overall image quality: 3.6 (Original), 3.8 (Filtered), and 4.7 (DLR)). Additionally, they conducted a signal-to-noise ratio (SNR) analysis and observed a significant increase in SNR when employing the DLR method.

Lian et al. developed and tested a CNN algorithm with a dataset from 392 patients with epidemic IBD [19]. For the diagnosis of epidemic IBD, they achieved sensitivity 95% and specificity 47%. This demonstrates their CNN algorithm significantly improved image quality.

Lamash et al. employed a 1.5-T MRI system and T1-weighted post-contrast VIBE sequence to examine 23 pediatric patients with Crohn’s disease [20]. Their CNN-based segmentation demonstrated Dice Similarity Coefficients of 75% for the lumen, 81% for the wall, and 97% for the background. The median relative contrast enhancement value (P = 0.003) demonstrated discriminatory potential between active and non-active disease segments, while various other extracted markers showed differentiation capacity between segments with and without strictures (P < 0.05).

Van Harten et al. utilized a deep learning technique for quantification of intestinal motility as a marker of inflammation [21]. They achieved sensitivity 80% and PPV 86% for severe bowel disease with 312 annotated segments between the two groups (185 segments in 14 healthy volunteers vs. 127 segments in the severe bowel disease group consisting of 10 patients).

McFarlane et al. evaluated utility of deep learning-based 3D reconstruction models for 4 perineal CD patients [22]. They found the model provided a more comprehensive visual representation of the disease. No quantitative measures were conducted.

Quality assessment

As per the QUADAS-2 tool, three papers were identified with a high risk of bias in at least one category. Additionally, none of the five papers discussed in this review have been externally validated. A detailed evaluation of the bias risk is provided in Supplementary Document 2, Table 2.

Table 2 A summary of articles in the literature review that applied deep learning techniques for magnetic resonance imaging involving Crohn’s disease

Discussion

Accurate assessment of CD is pivotal for patient management [23,24,25]. MRE provides non-invasive insights into both structural and functional aspects without ionizing radiation. Objective endpoints for CD management have been evaluated and established in the form of MRE indices [10, 11, 26]. However, prior research raises concern about MRE analysis being prone to human error, stemming from the time and attention required in radiologists’ interpretations of the entire bowel [27].

The complex nature of CD can benefit from more reliable assessment of disease activity, distribution, and treatment response. This is especially true for clinical trials, where precise quantification of disease is essential. Interestingly, our review demonstrates varied clinical tasks of deep learning applied to CD in MRE. These range from improving image quality, segmenting disease to quantify burden, and 3D reconstruction for surgery planning.

Image quality

The use of deep learning models presents significant potential advancements in accuracy with these algorithms applied to MRE image analysis for CD [12, 13, 18]. The two included studies in this review evaluated image quality using deep learning applications: Son et al. attempted a DLR technique on 35 patients with known IBD, while Lian et al. employed CNN optimization on 392 patients with suspected IBD. Both studies reported a significantly increased quantitative signal-to-noise ratio. Son et al. demonstrated a qualitative improvement in image quality, including contrast, sharpness, motion artifacts, and blurring in the coronal and axial images; however, DLR image sets appeared more synthetic [18]. Lian et al. demonstrated a lower high-frequency error norm with the optimized CNN algorithm compared to other algorithms and reported a sensitivity of 95.2% and specificity of 46.7% for epidemic IBD, indicating significantly increased image quality with the use of CNN optimization compared to baseline [19].

Quantification of disease burden

Lamash et al. evaluated the effectiveness of CNN-based segmentation in 23 pediatric patients [20]. Aside from showcasing identification of active from non-active bowel segments, similar to conventional methods, deep learning models were able to accurately, and automatically, identify bowel strictures. Stricture length measurements between two senior radiologists were also reported with greater agreement.

CD, among other gastrointestinal diseases, is associated with deviations in small bowel motility [28,29,30]. A novel study has emerged for the quantitative assessment of 3D cine-MRI of small intestinal motility in both healthy patients and those with severe IBD [21]. By employing a combination of stochastic tracking and CNN-based orientation classifiers, Van Harten et al. successfully differentiated between motile and non-motile bowel segments. Their findings indicated that such models may surpass the clinically recommended model in assessing ileal CD activity. This highlights the potential for precise, automated, and non-invasive monitoring of intestinal inflammation in CD patients [27].

3D image processing and reconstruction

McFarlane et al. developed a CNN-based algorithm to assist in 3D image reconstruction, enhancing surgical planning for four patients with complex perianal fistulas in Crohn’s disease [22]. The reconstruction models provided a comprehensive representation of the perianal disease, aiding in the identification of internal fistula orifices, seton placements, and fistula tracts. While not providing extra information beyond what is acquired through MRI, 3D image reconstruction offers a more realistic depiction of the anatomy. This study demonstrates the potential of CNN-based diagnostic tools to assist surgeons improving surgical outcomes and reducing the number of surgical procedures needed for patients.

The heterogeneous focus of the included studies highlights the broad potential of deep learning in CD assessment via MRE. However, this diversity also complicates direct comparison of outcomes. Despite these challenges, the variety underscores the versatility of deep learning technologies in improving diagnosis, monitoring, and treatment planning for CD. To harness this potential fully, future research must aim for more standardized study designs. Such consistency will facilitate clearer comparisons and enable validation of deep learning applications across clinical settings. This approach will not only streamline research efforts but also accelerate the integration of these technologies into clinical practice, offering new avenues for patient care in CD management.

The heterogeneity observed in the studies under review reflects the multifaceted applications of deep learning across the spectrum of CD management via MRE, highlighting its potential to revolutionize the patient journey in radiology. From improving images through disease burden quantification to treatment planning, deep learning could potentially enhance each step, offering a more nuanced, precise, and patient-centered approach.

Despite the potential on display, research around deep learning is still in its early days, with few studies tackling these varied applications. Analyzing this evolving field of research is crucial. As deep learning and MRE technology advance, they promise to offer deeper insights into CD, enhancing diagnosis, monitoring, and surgical planning. Continuously reviewing these developments is essential for harnessing their full potential to improve patient care.

Limitations

This systematic review, while providing insights into the emerging role of deep learning in enhancing MRE for CD assessment, has several limitations that warrant discussion:

Heterogeneity of included studies: The diversity in the objectives, methodologies, and outcomes of the included studies presents a significant challenge in synthesizing findings. This heterogeneity stems from varied focuses, such as image quality improvement, disease burden quantification, and 3D reconstruction for surgical planning. While this breadth highlights the potential of deep learning across different aspects of CD imaging, it complicates direct comparisons and synthesis of results and prevents performing a meta-analysis. Our rationale for including these diverse studies was to capture this research field’s current directions, acknowledging that deep learning in CD MRE is rapidly evolving with research in its early stages.

Methodological diversity: The included studies exhibit a range of study designs, patient populations, and deep learning techniques, which may influence the generalizability and comparability of findings. The predominance of retrospective studies and the absence of external validation in the analyzed research further contribute to potential biases and limit the strength of our conclusions.

Quality and relevance assessment: Our systematic review adhered to established guidelines and employed the QUADAS-2 tool for assessing the risk of bias. However, the subjective nature of some assessment criteria and the limited number of studies meeting our inclusion criteria may have impacted the robustness of our evaluation.

Addressing the limitations identified in this review requires an effort toward conducting prospective, multicenter studies with larger sample sizes, and external validation. Standardizing outcome measures and employing consistent deep learning methodologies will facilitate more direct comparisons of findings. Future reviews should also focus on overcoming the challenges of study heterogeneity by exploring specific tasks and specific algorithms of deep learning in CD MRE.

Conclusion

In this review, we have included a set of preliminary studies evaluating deep learning’s potential to improve conventional assessment of CD MRE. Deep learning models in this field offer promising enhancements to current MRE readings in the tasks of improving image quality, quantification of disease burden, and enhancing surgical planning.

Despite the promise shown by these studies, they are not without limitations. All, but one, of the studies are retrospective and none of them have had external validation. Although this review helps shape our current understanding of deep learning applications in relation to MRE and CD assessment, the findings discussed are limited due to heterogeneity in study design, the small size of the studies included, and the varied clinical tasks explored. As a result of these limitations, the clinical significance of deep learning is still uncertain and in active research.

Future research needs to include direct comparisons between deep learning and conventional radiological assessments. Each of the algorithms discussed must be further evaluated prior to its introduction to the clinical setting. Multicenter prospective studies will be crucial in validating the effectiveness of these systems, thereby establishing their role in the clinical management of CD.