Published online Mar 27, 2023.
https://doi.org/10.5051/jpis.2300160008
Deep-learning performance in identifying and classifying dental implant systems from dental imaging: a systematic review and meta-analysis
Abstract
Deep learning (DL) offers promising performance in computer vision tasks and is highly suitable for dental image recognition and analysis. We evaluated the accuracy of DL algorithms in identifying and classifying dental implant systems (DISs) using dental imaging. In this systematic review and meta-analysis, we explored the MEDLINE/PubMed, Scopus, Embase, and Google Scholar databases and identified studies published between January 2011 and March 2022. Studies conducted on DL approaches for DIS identification or classification were included, and the accuracy of the DL models was evaluated using panoramic and periapical radiographic images. The quality of the selected studies was assessed using QUADAS-2. This review was registered with PROSPERO (CRD42022309624). From 1,293 identified records, 9 studies were included in this systematic review and meta-analysis. The DL-based implant classification accuracy was no less than 70.75% (95% confidence interval [CI], 65.6%–75.9%) and no higher than 98.19% (95% CI, 97.8%–98.5%). The weighted accuracy was calculated, and the pooled sample size was 46,645, with an overall accuracy of 92.16% (95% CI, 90.8%–93.5%). The risk of bias and applicability concerns were judged as high for most studies, mainly regarding data selection and reference standards. DL models showed high accuracy in identifying and classifying DISs using panoramic and periapical radiographic images. Therefore, DL models are promising prospects for use as decision aids and decision-making tools; however, there are limitations with respect to their application in actual clinical practice.
INTRODUCTION
Artificial intelligence (AI) is a fast-growing and promising approach in healthcare, and an increasing number of new AI technologies have been introduced in the past decade [1, 2]. Likewise, the application of AI has already begun to change the paradigm of dental science, since AI models have shown similar or superior accuracy to that of dental professionals in most clinical areas, including implantology, endodontics, maxillofacial surgery, prosthodontics, orthodontics, and periodontics [3, 4]. Many scientific papers in the field of AI-based dentistry have been published, and active research on clinical applications is also being conducted [5].
In the 1950s, a dental implant system (DIS) was developed based on the concept of “osseointegration”, and today, DISs have become a standard treatment modality for replacing missing teeth and rehabilitating edentulous and partially edentulous jaws [6]. In order to improve implant–bone interactions by increasing primary stability and accelerating osseointegration, new and improved DISs—featuring surface and material modifications, such as surface coating, coronal interfaces and flanges, tapered and thread types, and innovations in the apex shape—are being continuously developed and revised [7, 8]. Accordingly, hundreds of manufacturers worldwide produce thousands of different types and varieties of DISs, and it is clinically and practically important to clearly identify and classify which DIS is present in the oral cavity for proper maintenance and management [9, 10, 11].
Convolutional neural network algorithms based on deep learning (DL), a subfield of AI, have displayed encouraging performance in computer vision tasks and have been demonstrated to be highly suitable for dental image recognition and analysis [12, 13, 14, 15, 16]. Dental imaging techniques, such as panoramic and periapical radiographs, are valuable methods for identifying and classifying various types of DISs, but they are dependent on subjective human interpretation. Several recent studies have suggested that DL is highly accurate in the identification and classification of various types of DISs, and the classification performance of DL systems has been shown to be equal or superior to that of dental professionals specialized or non-specialized in implantology [17, 18, 19, 20, 21, 22, 23, 24, 25]. In this systematic review, we investigated the current status of DL-based identification and classification of DISs using dental radiographic images and evaluated the accuracy of DL through a meta-analysis.
MATERIALS AND METHODS
In this systematic review and meta-analysis, studies on the DL-based identification and classification of DISs were identified, and the accuracy with which various types of DISs were classified from dental radiographic images was investigated. The current systematic review was conducted using the PRISMA guidelines for reporting items and was registered with PROSPERO (registration number CRD42022309624) [26].
Search strategy
We explored the MEDLINE/PubMed, Scopus, Embase, Cochrane, and Google Scholar electronic databases and identified studies published between January 2011 and March 2022, with no language restrictions. A comprehensive title/abstract/keyword search was conducted using the following search query: “artificial intelligence,” “deep learning,” “machine learning,” “neural networks, computer,” “dental implants,” and “dental implantation.” In addition, a manual search of bibliographies, citations, and related articles was performed to identify additional relevant articles, and all included articles were compiled using the bibliographic software tool EndNote (version 20; Clarivate Analytics, Philadelphia, PA, USA).
Eligibility criteria
The eligibility assessment was conducted by one reviewer (JHL), who screened all the titles and abstracts. The following inclusion criteria were employed in the selection of studies: 1) DL approaches for DIS segmentation, detection, identification, or classification; and 2) assessment of the accuracy of DL models using dental radiography, including panoramic, periapical, and bitewing radiographs, and cone-beam computed tomography (CBCT). The exclusion criteria were as follows: 1) letters or narrative reviews, 2) studies in which details of the dataset or data modality were not mentioned, and 3) studies without a clear explanation of the convolutional neural network-based model. Finally, duplicate data and articles were excluded from the analysis.
Quality assessment
The methodological quality of the selected research was evaluated independently using the validated Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool for risk of bias and assessment of applicability [27]. Any disagreement was resolved by discussion or referral to a third reviewer (JHL). The QUADAS-2 checklist consists of 4 domains related to patient (data) selection, the index test, the reference standard, and flow and timing, as well as applicability concerns regarding patient (data) selection, the index test, and the reference standard.
Data analysis
Two reviewers (AC and AN) independently extracted the data based on predetermined criteria. Any disagreement was resolved by discussion or by referral to a third reviewer (JHL). Detailed descriptive characteristics (including the first author, year of publication, country, radiographic methods, sample size, manufacturers and brands of DISs, training/validation and test set ratio, and DL architecture or modeling framework) were extracted, and a comparative evaluation of accuracy-related metrics (including accuracy, precision, recall, F1 score, sensitivity, specificity, positive and negative predictive values, the Youden index, intersection over union, and area under the receiver operating characteristic curve [AUC-ROC]) was performed for each study. The findings of the meta-analysis are presented in forest plots, with point estimates and 95% confidence intervals (CIs) for each study and overall.
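As an illustration of how a 95% CI around a single study's accuracy can be obtained, the following minimal Python sketch uses the normal-approximation (Wald) interval for a proportion. The review does not state which interval method was used, so the choice of the Wald interval, the function name wald_ci, and the test-set size of 300 images are assumptions made for illustration only.

```python
import math


def wald_ci(accuracy, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    accuracy: observed proportion of correct classifications (0 to 1)
    n:        number of test images the proportion is based on
    z:        standard normal quantile (1.96 for a 95% interval)
    """
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


# Hypothetical test-set size (an assumption, not a value reported in the review).
low, high = wald_ci(0.7075, 300)
print(f"{low:.1%} to {high:.1%}")
```

Under these assumptions, an accuracy of 70.75% yields an interval of approximately 65.6% to 75.9%.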
RESULTS
Literature search
Our search identified 1,293 records, of which 488 were screened. After the title and abstract evaluation, 11 studies remained; their full texts were assessed for eligibility, and 9 studies were included in the systematic review and meta-analysis [17, 18, 19, 20, 21, 22, 23, 24, 25]. A detailed flowchart of the current study is presented in Figure 1.
Figure 1
Flowchart of study selection.
Study characteristics
The detailed characteristics of the 9 studies included in this systematic review and meta-analysis are presented in Table 1. The included studies were published between 2020 and 2022 in 4 countries (Korea: 4 studies [20, 21, 22, 25]; Japan: 3 studies [17, 19, 23]; Brazil: 1 study [24]; and France: 1 study [18]). The mean number of different types of DISs included in each study was 6±3 (median: 6; range: 3–12), and the mean number of radiographic images included in the dataset was 5,977±4,379 (median: 7,325; range: 801–11,980). The ratio of the training/validation to test sets was 80:20 in 7 studies [17, 18, 20, 21, 22, 23, 24], while ratios of 75:25 [19] and 97.5:2.5 [25] were applied in the other 2 studies. For all studies, a DL-based accuracy analysis was performed using datasets that were not publicly released and were specific to each study, and architectures based on deep convolutional neural networks were adopted; the most frequently used were customized algorithms (n=4) [19, 23, 24, 25], followed by GoogLeNet Inception v3 (n=3) [18, 20, 22], ResNet-18/34/50/101/152 (n=2) [22, 23], VGG-16/19 (n=1) [19], and YOLO v3 (n=1) [17].
Table 1
Descriptive characteristics of the studies included in this systematic review
Risk of bias and accuracy outcomes
A summary diagram of the methodological quality assessment is shown in Figure 2. The risk of bias in each individual study was assessed in 4 domains (data selection, the index test, the reference standard, and flow and timing) and was high for most studies with respect to data selection (n=7) and the reference standard (n=6). Likewise, applicability concerns, including data selection, the index test, and the reference standard, were also present in most studies with respect to data selection (n=7) and the reference standard (n=6). These results were primarily due to the collection of biased datasets and the absence of a valid reference standard independent of the index test. The DL-based implant identification and classification accuracy was assessed to be no less than 70.75% (95% CI, 65.6%–75.9%) [17] and no higher than 98.19% (95% CI, 97.8%–98.5%) [23]. The weighted accuracy was calculated, and the pooled sample size was 46,645, with an overall accuracy of 92.16% (95% CI, 90.8%–93.5%) (Figure 3). In a comparison between sample size and accuracy, 3 studies [18, 22, 24] reported an accuracy of more than 80% despite a small sample size (<2,000). In the rest of the studies, a weak positive correlation was observed, with accuracy tending to increase with sample size (r²=0.0879) (Figure 4A). Out of the 9 studies included in the analysis, only 1 [19] used a 75:25 training/validation and test set ratio and reported 90.02% accuracy. Seven studies used a training/validation and test set ratio of 80:20, and the accuracy ranged from 70.75% to 98.19%. The remaining study [25] used a 97.5:2.5 training/validation and test set ratio and showed an accuracy of 77.79% (Figure 4B).
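The sample-size-weighted pooling of accuracy described above can be sketched as follows. This is a minimal illustration, assuming that each study's accuracy is weighted by its number of images; the review does not specify the exact weighting scheme or how the pooled CI was derived, so no interval is computed here. The function name weighted_accuracy and the three (accuracy, sample size) pairs are hypothetical and are not values from the included studies.

```python
def weighted_accuracy(studies):
    """Pool accuracies across studies, weighting each by its sample size.

    studies: iterable of (accuracy, sample_size) pairs, accuracy in 0 to 1.
    Returns (pooled_accuracy, pooled_sample_size).
    """
    studies = list(studies)
    total_n = sum(n for _, n in studies)
    if total_n == 0:
        raise ValueError("pooled sample size must be positive")
    pooled = sum(acc * n for acc, n in studies) / total_n
    return pooled, total_n


# Hypothetical studies for illustration only (not data from the review).
example = [(0.70, 1000), (0.90, 3000), (0.98, 6000)]
pooled, total_n = weighted_accuracy(example)
print(f"pooled accuracy {pooled:.2%} over {total_n} images")
```

Because larger studies contribute more weight, the pooled value in this hypothetical example lies closer to the accuracy of the largest study than a simple unweighted mean would.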
Figure 2
Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool for the risk of bias and assessment of the applicability of the studies included in the review.
Figure 3
Forest plot for reporting accuracy and 95% CIs, showing the largest effect size in each paper. The blue diamond shows the overall estimate.
CI: confidence interval.
Figure 4
Scatter plot (A) between sample size and accuracy and (B) between the training/validation and test set ratio and classification accuracy of dental implant systems.
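For readers interpreting the r² value reported for the relationship between sample size and accuracy, the following minimal Python sketch computes the Pearson correlation coefficient and its square. It assumes that r² denotes the squared Pearson correlation, which the review does not state explicitly; the function name and the sample data are hypothetical and are not values from the included studies.

```python
import math


def pearson_r_squared(x, y):
    """Return (r, r squared) for two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r


# Hypothetical sample sizes and accuracies (illustration only).
sizes = [800, 1200, 5000, 9000, 12000]
accuracies = [0.85, 0.80, 0.78, 0.93, 0.95]
r, r2 = pearson_r_squared(sizes, accuracies)
print(f"r = {r:.3f}, r squared = {r2:.3f}")
```

Under this interpretation, an r² of 0.0879 corresponds to a correlation coefficient of about 0.30, meaning that sample size accounts for roughly 9% of the variance in accuracy.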
DISCUSSION
When mechanical and biological complications, such as screw loosening, fracture, or peri-implantitis, occur in DISs, a specific identification or classification of the manufacturer, brand, or type of the DIS can reduce the additional efforts required from clinical practitioners to remove or replace the DIS and avoid the possibility of other unintended iatrogenic complications [28, 29, 30]. However, many DISs placed in the jaw remain unidentified or unlabeled due to various internal and external environmental contingencies, such as the relocation or closure of dental practices or closure of dental implant manufacturers. Therefore, the dental radiography-based identification and classification of DISs implanted in the jaw have significant clinical advantages in providing appropriate professional care.
To the authors’ best knowledge, this is the first systematic review and meta-analysis of the DL-based classification accuracy of various types of DISs using dental radiographic images, and 9 studies were included in the main analysis. All studies included in this review reported a high accuracy (over 70%), confirming the potential applicability of DL-based AI technology to support clinical decision-making. Nevertheless, there are doubts and unclear scientific underpinnings in terms of reliability and validity regarding whether DL can be used appropriately in actual clinical practice. Furthermore, because the studies analyzed herein had a high risk of bias, the points discussed below need to be considered in the interpretation and analysis.
First, existing studies have not considered significant parameters (including contrast, intensity level, sharpness, aspect ratio, orientation, and resolution) related to the quality management and standardization of dental radiographic images. Four studies [20, 21, 24, 25] described a quality assessment, including characteristics such as noise, haziness, blur, positioning errors, or distortion in the images used in the datasets, but none of the studies conducted validation by an oral and maxillofacial radiologist who was not involved in managing the dataset. In addition, the mixed use of panoramic and periapical radiographic images requires caution when interpreting the findings regarding comparative accuracy. Five studies [17, 18, 19, 23, 25] focused on the analysis of panoramic radiographic images, 2 studies [22, 24] analyzed periapical radiographic images, and 2 other studies [20, 21] analyzed both panoramic and periapical radiographic images in their datasets. In 1 study [18], the outline of the implant fixture was manually cropped and used as data, whereas all other studies cropped the images in a rectangular or square shape to define the region of interest (ROI). This non-standardization of the ROI has a negative effect on the quality management of the dataset.
Second, although the amount of data according to class labels is a key factor in conducting successful DL-based analysis, most studies used a small number of radiographic images and a small collection of different types of DISs. Compared to DL-based medical studies that used tens of thousands of images as a dataset, the studies included in this review included between 3 and 12 different types of DISs and between 801 and 11,980 radiographic images for DL training and inference; these numbers are relatively low for application in actual clinical practice [31, 32]. In addition, because most studies did not cross-validate the models and results [17, 18, 20, 21, 22, 24, 25], the accuracy may have been overestimated. Therefore, since thousands of DISs currently exist, the accuracy-related outcomes of this review are likely to be highly biased.
Third, because the DL algorithms used in each study included in this review have different architectures and structures, it is difficult to objectively and quantitatively compare the classification performance for the different types of DISs. In general, it is expected that more recently developed or modified DL algorithms would be more accurate; however, significant differences in accuracy according to differences between each algorithm could not be identified. This may indicate that although the DL model itself is a considerably important factor, the quantity and quality of the dataset are currently more important. Two studies using different algorithms with the same dataset were included in this review, and the accuracy did not change significantly depending on the DL algorithm [21, 25].
Fourth, it is necessary to consider 3-dimensional (3D) dental radiographic imaging. All 9 studies utilized 2-dimensional (2D) radiographs that included only panoramic and periapical radiographic images as datasets for DL applications [17, 18, 19, 20, 21, 22, 23, 24, 25]. Several recent studies have developed and experimentally verified various DL algorithms for the recognition and localization of 3D images, such as computed tomography and magnetic resonance imaging [33, 34]. CBCT images, which are widely used in the field of dental implantology, have an advantage compared to 2D images in that they have less distortion and can obtain 3D volume information [35]. Therefore, the development of DL models that can use 3D CBCT images as input and a comparative evaluation with 2D images are absolutely necessary.
Finally, in order to evaluate the applicability and feasibility of DL models as decision aids and decision-making tools in clinical practice, the accuracy of DL models should be compared with that of dental professionals. Three studies compared the accuracy between DL models and dental professionals, and DL showed higher classification accuracy than dental professionals on average with respect to parameters such as accuracy, AUC-ROC, sensitivity, and specificity [20, 21, 25]. According to a recent study, using DL as a decision-aid tool significantly improved the classification accuracy of dental professionals (P<0.05) [25]. In particular, when assisted by DL, dental professionals who specialized in implantology (mean accuracy: 88.56%) showed higher accuracy than the DL did alone (mean accuracy: 80.56%) [25]. The result of this previous study supports the conjecture that the assistance of DL and the knowledge of experienced dental professionals can have synergistic effects on each other.
The efficiency and robustness of DL technology critically depend on advanced DL architecture and a well-organized dataset. DL technology is advancing rapidly, and dental implant-related datasets are likewise being developed for use in actual clinical practice through standardization and quality improvement. Based on the limited findings of the current systematic review, the following conclusions were drawn. According to the studies included in this review, 1) the DL models developed to identify and classify DISs using panoramic and periapical radiographic images showed 70.75% to 98.19% accuracy for 3 to 12 different types of DISs, and 2) DL models can potentially be used as decision aids and decision-making tools; however, there are limitations concerning their practical clinical use.
Funding: This study was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1A2C1083978) and the Fund of Biomedical Research Institute, Jeonbuk National University Hospital.
Conflict of Interest: No potential conflict of interest relevant to this article was reported.
Author Contributions:
Conceptualization: Akhilanand Chaurasia, Arunkumar Namachivayam, Revan Birke Koca-Ünsal, Jae-Hong Lee.
Formal analysis: Akhilanand Chaurasia, Arunkumar Namachivayam, Revan Birke Koca-Ünsal, Jae-Hong Lee.
Investigation: Akhilanand Chaurasia, Arunkumar Namachivayam, Revan Birke Koca-Ünsal, Jae-Hong Lee.
Methodology: Akhilanand Chaurasia, Arunkumar Namachivayam, Revan Birke Koca-Ünsal, Jae-Hong Lee.
Project administration: Akhilanand Chaurasia, Arunkumar Namachivayam, Revan Birke Koca-Ünsal, Jae-Hong Lee.
Writing - original draft: Akhilanand Chaurasia, Arunkumar Namachivayam, Revan Birke Koca-Ünsal, Jae-Hong Lee.
Writing - review & editing: Akhilanand Chaurasia, Arunkumar Namachivayam, Revan Birke Koca-Ünsal, Jae-Hong Lee.
References
Jokstad A, Braegger U, Brunski JB, Carr AB, Naert I, Wennerberg A. Quality of dental implants. Int Dent J 2003;53 Suppl 2:409–443.
Jokstad A, Ganeles J. Systematic review of clinical and patient-reported outcomes following oral rehabilitation on dental implants with a tapered compared to a non-tapered implant design. Clin Oral Implants Res 2018;29 Suppl 16:41–54.
Hadj Saïd M, Le Roux MK, Catherine JH, Lan R. Development of an artificial intelligence model to identify a dental implant from a radiograph. Int J Oral Maxillofac Implants 2020;36:1077–1082.
Li Y, Li W, Xiong J, Xia J, Xie Y. Comparison of supervised and unsupervised deep learning methods for medical image synthesis between computed tomography and magnetic resonance images. BioMed Res Int 2020;2020:5193707.