Introduction

Helicobacter pylori (H. pylori) infects the epithelial lining of the stomach and is the major cause of chronic gastritis, peptic ulcer disease, and gastric cancer1. H. pylori eradication has become the standard therapy to cure peptic ulcer disease1. In regions with a high incidence of gastric adenocarcinoma, eradication of H. pylori is advocated to prevent the development of gastric cancer2.

Several diagnostic methods, using invasive or non-invasive techniques with varying levels of sensitivity and specificity, have been developed to detect H. pylori infection. Invasive methods, including the rapid urease test, histology, and culture, require endoscopy with biopsies of gastric tissue3. The rapid urease test is based on the production of the urease enzyme by H. pylori. The sensitivity of the test is significantly lower in patients with intestinal metaplasia and in those with peptic ulcer bleeding4,5,6. Additionally, treatment with proton-pump inhibitors (PPIs), antibiotics, or bismuth compounds may lead to false-negative results because these agents suppress the production of urease by H. pylori3. Furthermore, several organisms in the oral cavity or stomach, such as Klebsiella pneumoniae, Staphylococcus aureus, Proteus mirabilis, Enterobacter cloacae, and Citrobacter freundii, also exhibit urease activity and may give false-positive results6. Histology is more expensive than the rapid urease test, and many factors affect its diagnostic accuracy, such as the number and location of the biopsy specimens, the experience of the pathologist, the staining techniques, PPI or antibiotic use4, and the presence of other bacterial species with structural similarity to Helicobacter7.

Several studies have demonstrated that the judgment of H. pylori infection by conventional white light endoscopy could be based on the presence of diffuse redness, rugal hypertrophy, or thick and whitish mucus8. However, diagnosis by the impression of a gastroenterologist using endoscopic images is inaccurate and cannot be used for the management of gastrointestinal diseases in clinical practice8.

Recently, emerging studies have highlighted the application of artificial intelligence in the diagnosis of gastrointestinal diseases9,10,11. For example, deep learning on endoscopic images with convolutional neural networks (CNNs) has been used to detect small-intestinal or colonic lesions12 and to assess the invasion depth of gastric cancer13,14,15. Computer-aided analysis of endoscopic images using CNNs has also been developed for the diagnosis of H. pylori infection10,16,17. However, several studies applying artificial intelligence to the diagnosis of H. pylori infection used inadequate gold standards, such as serum H. pylori antibody10,16 and urine H. pylori antibody17. A positive serum or urine H. pylori antibody test indicates either active or past H. pylori infection. Therefore, studies using antibody tests as the gold standard for active H. pylori infection might misclassify subjects with past infection as positive, and this inadequate gold standard would impair the diagnostic accuracy of the developed artificial intelligence system. Additionally, some of these studies excluded patients with peptic ulcer and gastric cancer from the investigated population18. Exclusion of these important target populations might limit the generalizability of the CNN decision system for the diagnosis of H. pylori infection.

With regard to artificial intelligence technology in the diagnosis of gastrointestinal diseases, Liu et al. proposed two sub-networks, an O-stream and a P-stream: the original image was used as the input of the O-stream to extract color and global features, and the preprocessed image was used as the input of the P-stream to extract texture and detailed features19. Sobri et al. proposed a computer-vision approach that extracts texture and color features, computing Gray-Level Co-occurrence Matrix (GLCM) features from the wavelet-transformed image. They applied the discrete wavelet transform to endoscopic images, classified endoscopic gastritis images with these image features, and combined the texture and color-moment features to build a support vector machine (SVM) classifier20. Many preprocessing studies use the discrete wavelet transform, GLCM, and color-space conversion to extract texture features.
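For illustration only (this is not the exact pipeline of Sobri et al.), the following Python sketch shows how GLCM texture descriptors can be computed from the wavelet-transformed version of a grayscale endoscopic image; it assumes the scikit-image (≥ 0.19) and PyWavelets libraries, and the image path is hypothetical.

import numpy as np
import pywt
from skimage import io, color, img_as_ubyte
from skimage.feature import graycomatrix, graycoprops

# Load an endoscopic image and convert it to grayscale (path is hypothetical).
rgb = io.imread("endoscopy_frame.png")
gray = img_as_ubyte(color.rgb2gray(rgb))

# Single-level 2-D discrete wavelet transform; keep the approximation sub-band.
approx, (horiz, vert, diag) = pywt.dwt2(gray, "haar")
approx = img_as_ubyte(approx / approx.max())

# Gray-Level Co-occurrence Matrix and a few common texture descriptors.
glcm = graycomatrix(approx, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = [graycoprops(glcm, prop).mean()
            for prop in ("contrast", "correlation", "energy", "homogeneity")]
print(features)  # small texture feature vector usable by a classifier such as SVM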

Hierarchical feature learning through high-dimensional learned kernels, densely connected parameters, and nonlinear activation functions allows CNNs to learn informative features, with the added benefit of translation invariance. However, many earlier methods used separate deep learning modules to extract image features tailored to the nature of the underlying problem. Jain et al. proposed a CNN-based WCENet model for anomaly detection and localization in Wireless Capsule Endoscopy (WCE) images21. Zhang et al. proposed a dense CNN-based stereo-matching method with multiscale feature connections, termed Dense-CNN. A new dense connection network with multiscale convolutional layers was constructed, rich image features were extracted, and the combined multiscale features with context information were used to estimate the stereo-matching cost. A new loss-function strategy was also proposed to learn the network parameters more effectively, which improved the performance of the Dense-CNN model in disparity calculation22. Several previous studies have shown that adding a visual attention mechanism to a CNN architecture can extract more informative features from the original image and improve the performance of artificial intelligence.

Currently, diagnosis of H. pylori infection during endoscopy requires gastric biopsies with the rapid urease test, histology, or culture in clinical practice. However, these approaches require biopsy instruments and incur the costs of the rapid urease test, histology, and culture. Additionally, histological examination and culture of H. pylori are time-consuming. Furthermore, gastric biopsy may induce bleeding in patients taking antiplatelet or anticoagulant agents and in those with coagulopathy. If a novel artificial intelligence system using real-time endoscopic images has a similar or even higher diagnostic accuracy for H. pylori infection than the aforementioned biopsy-based methods, it may replace these diagnostic modalities, reduce medical costs, provide an immediate diagnosis, and avoid biopsy-induced bleeding in patients with a bleeding tendency.

In this study, we hypothesized that artificial intelligence learning technology can accurately assess H. pylori status from endoscopic images, and we aimed to develop a novel artificial intelligence classification system for the diagnosis of H. pylori infection using a CNN and a Concurrent Spatial and Channel Squeeze and Excitation (scSE) network, combined with different classification models, for deep learning of gastric images. To increase the generalizability of the system, we included subjects with and without major upper gastrointestinal diseases such as peptic ulcer and gastric cancer. In addition, we used an accurate method, the rapid urease test, as the gold standard for the diagnosis of H. pylori infection. Furthermore, the current study combined the CNN model with attention technology, which improved the classification performance for both body and antrum images.

Materials and methods

Patient population

Patients receiving endoscopy with gastric biopsies for the rapid urease test at An Nan Hospital (Tainan, Taiwan) from October 2020 to December 2021 were retrospectively identified. The exclusion criteria were (1) previous eradication treatment for H. pylori infection, (2) history of gastrectomy, (3) use of antibiotics within the previous 4 weeks, (4) use of a proton pump inhibitor within 2 weeks before endoscopy, (5) coexistence of serious concomitant illness (for example, decompensated liver cirrhosis, uremia, and malignancy), and (6) upper gastrointestinal bleeding. The patients were divided into 5 equal subsets of about 60 patients each. The endoscopic images from the first three subsets (n = 182), who underwent endoscopy between October 2020 and June 2021, were assigned to the derivation group for creating an artificial intelligence classification system for the diagnosis of H. pylori infection. The endoscopic images from the other two subsets (n = 120), who underwent endoscopy between July 2021 and December 2021, were assigned to the validation group for assessing the accuracy of the derived system. The study protocol was approved by the Institutional Review Board of the An Nan Hospital of China Medical University (TMANH109-REC008). The Institutional Review Board waived the informed consent requirement because this was a retrospective study.

Upper endoscopy and gastric images

Upper endoscopy was performed using a standard endoscope (GIF-Q260J; Olympus, Tokyo, Japan). Gastric images captured during high-definition, white-light examination of the antrum (forward view) and body (forward and retroflex views) were used for both the derivation and validation datasets. An antral biopsy specimen and a body biopsy specimen were obtained for the rapid urease test, and H. pylori status was determined by its result (Delta West, Bentley, WA, Australia)23. Archived gastric images obtained during standard white-light examination were extracted from the endoscopic database. Two endoscopists independently screened and excluded images of suboptimal quality (i.e., blurred images, excessive mucus, food residue, bleeding, and/or insufficient air insufflation). The representative areas were then independently selected by the two endoscopists according to standard selection criteria: (1) clear images, (2) no bubbles, blood, or food residue, (3) no light reflection, and (4) no specific lesions (e.g., erosion, ulcer, or tumor). No special tool was used for representative-area selection. Table 1 shows the numbers of patients and images in the derivation and validation groups. The major gastrointestinal diseases in the study population were gastroesophageal reflux disease (n = 69), non-ulcer dyspepsia (n = 199), gastric ulcer (n = 20), duodenal ulcer (n = 12), and gastric cancer (n = 2).

Table 1 Numbers of patients and images in the derivation and validation datasets.

Figure 1 shows the overall research flowchart. Endoscopic images of the gastric body and antrum from patients receiving endoscopy, with H. pylori status confirmed by the rapid urease test, were obtained for the derivation of an artificial intelligence classification system. The CNN and scSE networks were combined with different classification models for deep learning of the gastric images. The characteristics of the sample images were extracted effectively, and the classification model used the gastroscopic images from the antrum or body to make a comprehensive evaluation and diagnosis. All methods were performed in accordance with the relevant guidelines and regulations. The total number of patients providing endoscopic images was 302, of whom 136 were H. pylori-negative and 166 were H. pylori-positive. The endoscopic images were obtained from the gastric antrum and body (Fig. 2), and H. pylori status in the two gastric parts was classified as positive or negative by the artificial intelligence classification model.

Figure 1

Overall research flow chart.

Figure 2

(a) Endoscopic images of the body and (b) Endoscopic images of the antrum.

Image capture

Since the original input image affects the accuracy of the output, unnecessary feature information was removed. As shown in Fig. 3, the representative area was selected by the endoscopists for image capture. Because traditional image pre-processing methods may destroy important original features of the image without improving the accuracy of machine learning classification, no traditional image pre-processing technology was used in this study. Two deep learning neural network models, CNN and scSE, were used directly to extract image features for subsequent analysis by various machine learning classification methods.

Figure 3

(a) Body and (b) Antrum image capture.

Convolutional neural networks (CNN)

A problem with traditional deep learning models is that they ignore the three-dimensional structure of the data, such as the horizontal, vertical, and color channels. In a CNN24, each image in the training and test sets passes through a series of layers, including convolutional, pooling, and fully connected layers. The convolutional and pooling layers preserve spatial structure and avoid a large increase in the number of parameters, while the fully connected layer uses the extracted image features and the connections between each neuron and the neurons of the preceding layer to perform the final classification25. Because a CNN has a shared-weight architecture and translation invariance, and feature extraction and classification are learned jointly during training, the network can learn effectively in parallel26 and achieves excellent results on image data27.
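As a minimal illustration of the architecture described above (the study does not specify its framework; PyTorch is assumed here, and the layer sizes are arbitrary), a CNN stacks convolution and pooling layers with shared weights and ends in a fully connected layer for binary classification:

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Convolution + pooling for feature extraction, a fully connected layer for classification."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 56 * 56, num_classes)  # assumes a 224 x 224 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)            # shared-weight, translation-tolerant feature maps
        return self.classifier(x.flatten(1))

logits = SimpleCNN()(torch.randn(1, 3, 224, 224))  # e.g. one RGB endoscopic image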

Spatial and channel squeeze and excitation block (scSE)

In the scSE network28, the Spatial Squeeze and Channel Excitation (cSE) block and the Channel Squeeze and Spatial Excitation (sSE) block were used to recalibrate the network features, treating informative feature maps or feature channels as important. Weighting was used to reduce the impact of unimportant features, so useful information was given a higher weight, while uninformative information was given a lower weight29. As shown in Fig. 4, in the cSE model the C × W × H feature map was reduced to a C × 1 × 1 vector through global average pooling; two 1 × 1 convolutional (fully connected) layers then processed this vector to obtain C-dimensional channel descriptors, which were normalized with the Sigmoid activation function and finally multiplied channel-wise with the original feature map to obtain the cSE output30. As shown in Fig. 5, the sSE model was a spatial attention mechanism that mainly used a 1 × 1 convolution to compress the original feature map from C × W × H to 1 × W × H; a Sigmoid layer then normalized the spatial information to values between 0 and 1 to obtain the spatial attention map, which was multiplied element-wise with the original feature map to complete the spatial recalibration31. As shown in Fig. 6, the scSE was composed of a parallel connection of the two modules, cSE and sSE: after the original feature map passed through the sSE and cSE models, the outputs of the two modules were added to obtain a more accurately calibrated feature map32.

Figure 4

Channel attention mechanism of cSE architecture model.

Figure 5

Spatial attention mechanism of sSE architecture model.

Figure 6

A scSE architecture model composed of cSE and sSE.
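For clarity, the following PyTorch sketch (an illustration under our own assumptions, not the authors' code) implements the cSE, sSE, and scSE blocks described above and in Figs. 4, 5, and 6:

import torch
import torch.nn as nn

class CSE(nn.Module):
    """cSE: spatial squeeze (global average pooling) followed by channel excitation."""
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        self.excite = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                            # C x W x H -> C x 1 x 1
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
    def forward(self, x):
        return x * self.excite(x)                               # channel-wise recalibration

class SSE(nn.Module):
    """sSE: channel squeeze (1 x 1 convolution) followed by spatial excitation."""
    def __init__(self, channels: int):
        super().__init__()
        self.excite = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())  # -> 1 x W x H
    def forward(self, x):
        return x * self.excite(x)                               # spatial recalibration

class SCSE(nn.Module):
    """scSE: parallel cSE and sSE branches whose recalibrated outputs are added."""
    def __init__(self, channels: int):
        super().__init__()
        self.cse, self.sse = CSE(channels), SSE(channels)
    def forward(self, x):
        return self.cse(x) + self.sse(x)

out = SCSE(256)(torch.randn(1, 256, 8, 8))  # output keeps the input feature-map shape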

Derivation and training algorithm

The endoscopic images of 182 patients were used for deep machine learning. Classification is the process of predicting the category of a given data point and belongs to supervised learning, in which the target is provided together with the input data33,34,35. Because the task is to predict H. pylori infection, classification algorithms are appropriate. As shown in Fig. 7, the layers of the feature extraction network were increased from the original two to four, and the last layer was matched to the input of classification models such as KNN, SVM, RF, GBDT, AdaBoost, XGBoost, LGBoost, and CatBoost. The output of the network was globally average pooled, compressing the original 8 × 8 × 256 feature map into a one-dimensional vector for the classification model.

Figure 7

Convolutional network combined with classification model.
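A hedged sketch of the pipeline in Fig. 7 is given below: four stacked convolutional blocks produce an 8 × 8 × 256 feature map, global average pooling compresses it to a one-dimensional vector, and an external classifier (CatBoost is shown as one example) is fitted on these vectors. PyTorch, a 128 × 128 input size, and the intermediate channel counts are assumptions; an scSE block as sketched earlier could be inserted after each convolutional block.

import torch
import torch.nn as nn
from catboost import CatBoostClassifier  # one of the eight classifiers compared

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))

# Four stacked convolutional blocks ending in an 8 x 8 x 256 feature map (128 x 128 input assumed).
# The weights here are untrained and random, for illustration only.
extractor = nn.Sequential(conv_block(3, 32), conv_block(32, 64),
                          conv_block(64, 128), conv_block(128, 256))
gap = nn.AdaptiveAvgPool2d(1)  # global average pooling: 8 x 8 x 256 -> 256-dimensional vector

def to_features(images: torch.Tensor):
    with torch.no_grad():
        return gap(extractor(images)).flatten(1).numpy()

# X, y: pooled feature vectors and rapid urease test labels (hypothetical arrays), e.g.
# clf = CatBoostClassifier(iterations=300, verbose=0)
# clf.fit(to_features(train_images), y_train)
# probs = clf.predict_proba(to_features(test_images))[:, 1]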

Validation algorithm

Endoscopic images from 120 patients were used to evaluate the performance of the derived artificial intelligence diagnostic system. Different evaluation methods exist for each machine learning model, and many indexes can be used to measure the performance of a classification or prediction model. Parameter tuning and feature selection for different models are typically used to achieve better evaluation performance and to guide appropriate fine-tuning of parameters and optimization goals36. In this study, six evaluation metrics were used to judge the performance of each classification model: accuracy, positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, and area under the curve (AUC). Table 2 shows the confusion matrix for binary classification, which is composed of true negatives (TN, predicted negative and actually negative), false negatives (FN, predicted negative but actually positive), false positives (FP, predicted positive but actually negative), and true positives (TP, predicted positive and actually positive)37. The performance of the artificial intelligence diagnostic system for a single gastric image was assessed. The Chi-square test was used to compare the performance of the different models, and differences were considered statistically significant at P < 0.05. Because the distribution of H. pylori on the gastric surface is heterogeneous across the antrum and body, we also assessed the performance of the scSE-CatBoost diagnostic module on the representative antrum and body images of the same patients.

Table 2 Confusion matrix for binary classification.
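For reference, five of the six metrics follow directly from the confusion-matrix counts, as in the illustrative Python function below (the counts shown are arbitrary examples, not study data); AUC is computed from predicted probabilities instead.

def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive the study's confusion-matrix-based metrics from TP, FP, TN, and FN counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "PPV":         tp / (tp + fp),   # positive predictive value
        "NPV":         tn / (tn + fn),   # negative predictive value
    }

print(confusion_metrics(tp=56, fp=12, tn=45, fn=7))  # illustrative counts only
# AUC is computed from predicted probabilities rather than the confusion matrix, e.g.
# sklearn.metrics.roc_auc_score(y_true, predicted_probabilities).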

Informed consent statement

All authors have reviewed the manuscript and approved its publication.

Approval statement

All experimental protocols were approved by the An Nan Hospital Medical Foundation Human Body Experiment Committee.

Results

Evaluation of each machine learning model helps in understanding its performance. Therefore, this study used the attention mechanism combined with classification models to classify H. pylori status as positive or negative and to evaluate and compare the two gastric regions, the body and the antrum. The classification methods K-Nearest Neighbors (KNN)38, Support Vector Machine (SVM)39, Adaptive Boosting (AdaBoost)40, Random Forest (RF)41, Gradient Boosting Decision Tree (GBDT)42, eXtreme Gradient Boosting (XGBoost)43, Light Gradient Boosting (LGBoost)44, and Categorical Boosting (CatBoost)45 were applied to the CNN and scSE models. The performance of each model was assessed by six parameters: accuracy, sensitivity, specificity, PPV, NPV, and AUC.
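The comparison can be summarized by the following illustrative Python sketch, which fits the eight classifiers on pooled feature vectors and reports accuracy and AUC; the data here are synthetic placeholders, and the library interfaces (scikit-learn, XGBoost, LightGBM, CatBoost) are assumptions rather than code from the study.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic placeholder data standing in for pooled CNN/scSE feature vectors and
# rapid urease test labels; the real study used the derivation and validation images.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 256)), rng.integers(0, 2, 100)
X_test, y_test = rng.normal(size=(40, 256)), rng.integers(0, 2, 40)

models = {
    "KNN": KNeighborsClassifier(), "SVM": SVC(probability=True),
    "AdaBoost": AdaBoostClassifier(), "RF": RandomForestClassifier(),
    "GBDT": GradientBoostingClassifier(), "XGBoost": XGBClassifier(),
    "LGBoost": LGBMClassifier(), "CatBoost": CatBoostClassifier(verbose=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)                     # features from the CNN/scSE extractor
    probs = model.predict_proba(X_test)[:, 1]
    print(name, accuracy_score(y_test, (probs > 0.5).astype(int)), roc_auc_score(y_test, probs))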

Performance of CNN or scSE combined with different classification models for the diagnosis of H. pylori infection by endoscopic images from the gastric body or antrum

Table 3 shows the performance of CNN combined with different classification models for the diagnosis of H. pylori infection using endoscopic images from the gastric body. The CNN-CatBoost classification model had the best performance, with an accuracy of 88%, sensitivity of 93%, specificity of 80%, and AUC of 0.87. Table 4 displays the performance of scSE combined with different classification models for the diagnosis of H. pylori infection using endoscopic images from the gastric body. The scSE-LGBoost classification model achieved the best performance with an accuracy of 90%, sensitivity of 93%, specificity of 83%, and AUC of 0.88.

Table 3 Performance of CNN combined with different classification models for the diagnosis of H. pylori infection by single endoscopic image from gastric body.
Table 4 Performance of scSE combined with different classification models for the diagnosis of H. pylori infection by single endoscopic image from gastric body.

Table 5 lists the performance of CNN combined with different classification models for the diagnosis of H. pylori infection using endoscopic images from the gastric antrum. CNN-LGBoost had the best performance, with an accuracy of 87%, sensitivity of 89%, specificity of 86%, and AUC of 0.87. Table 6 demonstrates the performance of scSE combined with different classification models for the diagnosis of H. pylori infection by endoscopic images of the gastric antrum. Both scSE-KNN and scSE-CatBoost achieved the best performance, with an accuracy of 89%, sensitivity of 90%, specificity of 88%, and AUC of 0.89.

Table 5 Performance of CNN combined with different classification models for the diagnosis of H. pylori infection by single endoscopic image from gastric antrum.
Table 6 Performance of scSE combined with different classification models for the diagnosis of H. pylori infection by single endoscopic image from gastric antrum.

Comprehensive assessment of H. pylori status by the scSE-CatBoost classification model with endoscopic images of both the body and antrum

Table 7 shows the results of the comprehensive assessment of H. pylori status by the scSE-CatBoost classification model with endoscopic images of both the body and the antrum of the same patients. In this comprehensive model, H. pylori status was judged negative only if both the body image and the antrum image were classified as negative by the scSE-CatBoost model; if either the body or the antrum image was classified as positive, the comprehensive assessment was judged positive. The comprehensive assessment using endoscopic images from the antrum and body of the same patients performed well, with an accuracy of 90%, sensitivity of 100%, specificity of 81%, and AUC of 0.88.

Table 7 Comprehensive assessment for H. pylori status by scSE-CatBoost classification models with endoscopic images from the antrum and body of same patients.
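The per-patient decision rule described above reduces to a logical OR of the body and antrum predictions, as in this minimal illustrative snippet:

def patient_level_call(body_positive: bool, antrum_positive: bool) -> str:
    """Comprehensive assessment: positive if either image is classified positive, negative only if both are negative."""
    return "H. pylori positive" if (body_positive or antrum_positive) else "H. pylori negative"

print(patient_level_call(body_positive=False, antrum_positive=True))  # -> "H. pylori positive"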

Discussion

In this study, we developed a novel artificial intelligence classification system for the diagnosis of H. pylori infection from endoscopic images using the CNN and scSE networks together with machine learning methods. The sensitivity, specificity, and accuracy for predicting H. pylori status by the scSE-CatBoost classification model using endoscopic images from both the antrum and the body were 100%, 81%, and 90%, respectively. These results indicate that the scSE-CatBoost classification model can achieve high accuracy for the diagnosis of H. pylori infection with white-light endoscopic images. It is important to note that the negative predictive value of our artificial intelligence-assisted H. pylori diagnosis system was 100%; the probability of positive H. pylori status is extremely low if our image diagnosis system yields a negative result. Therefore, it is not necessary to perform further biopsy to check H. pylori status during endoscopy in these patients. Avoiding unnecessary biopsy for H. pylori testing has clinical implications because it can decrease medical costs, save endoscopy time, and prevent biopsy-induced bleeding in patients with a bleeding tendency. Currently, we still suggest that endoscopists perform biopsy with the rapid urease test or histology to confirm the diagnosis of H. pylori infection in patients with positive predictions by our artificial intelligence diagnostic system, because the positive predictive value of the system is suboptimal (82%) and the diagnosis should be confirmed before administration of eradication therapy. Nonetheless, the accuracy of our artificial intelligence diagnostic system may be further improved by deep learning on more endoscopic images and by the application of new learning technologies in the future. If the accuracy of our artificial intelligence diagnostic system becomes comparable to that of the rapid urease test or histology, our image diagnostic system has great potential to replace the current biopsy-dependent methods for H. pylori testing.

The current study offers several innovative improvements in the diagnosis of H. pylori infection by CNN and scSE networks. First, we examined the performance of CNN and scSE networks combined with different classification models for the diagnosis of H. pylori infection; the results showed that the scSE-CatBoost classification model could achieve very high accuracy. Second, we assessed endoscopic images obtained with the white-light endoscopic system commonly used in daily practice in endoscopy units. Some studies used blue-laser or linked-color images to develop image classification systems for the diagnosis of H. pylori infection, but these imaging modalities are not readily available in most endoscopy units. Third, some artificial intelligence diagnostic systems excluded endoscopic images from patients with peptic ulcer and gastric cancer, limiting the generalizability of those systems in patients with important gastrointestinal diseases. In the current study, we included subjects with and without major upper gastrointestinal diseases in the development of the artificial intelligence image diagnostic system; therefore, our classification system can be applied to the diagnosis of H. pylori infection in patients with peptic ulcer and gastric cancer. In addition, some previous artificial intelligence diagnostic systems used inadequate tests (serum or urine H. pylori antibody) as gold standards for the diagnosis of H. pylori infection10,16. In the current study, we used the rapid urease test as the gold standard, a reliable test for H. pylori infection with a sensitivity of 90–95% and a specificity of 95–100%3.

This study used deep learning combined with classification models on datasets of endoscopic images from the gastric body and antrum. Model evaluation mainly compared CNN and scSE. The experimental results showed that scSE achieved better performance for both gastric body and antrum images. The main reason is that the scSE model performs weighting operations on the information channels to enhance effective information and suppress invalid information. Adding the scSE model introduces more nonlinearity into the overall network, which better fits the complex correlations between channels, increasing the effectiveness of feature extraction while keeping the number of parameters and computations low.

Our data showed that the comprehensive assessment by the scSE-CatBoost classification model with endoscopic images of both the body and antrum performed well in determining H. pylori status, achieving an accuracy of 0.90, a sensitivity of 1.00, a specificity of 0.81, and an AUC of 0.88.

Our study has several limitations. First, the assessment of endoscopic images was not performed in real time, whereas real-time assessment of H. pylori infection during live endoscopy is important in clinical practice. Second, we only included patients without previous H. pylori eradication therapy, so it remains unclear whether the artificial intelligence-assisted image diagnosis system can be applied to post-eradication assessment of H. pylori status. Third, this was a retrospective study, and our artificial intelligence-assisted image diagnosis system still requires prospective validation in other populations.

Conclusions

In clinical practice, the judgment of H. pylori infection based on gastroenterologists' impression of endoscopic images is often inaccurate. The comprehensive assessment of gastric endoscopic images by the scSE-CatBoost classification model and deep learning can achieve good performance in determining H. pylori status. The current study suggests that a machine learning-based image recognition system can be applied to distinguish H. pylori status and has great potential for the survey or diagnosis of H. pylori infection during endoscopy.