Abstract
Objectives
To investigate how a deep learning (DL) model makes decisions in breast lesion classification, using a newly defined region of evidence (ROE) and “explainable AI” (xAI) techniques.
Methods
A data set of 785 2D breast ultrasound images acquired from 367 women was used. DenseNet-121 was used to classify each lesion as benign or malignant. Classification performance was assessed by accuracy, sensitivity, specificity, and receiver operating characteristic (ROC) analysis in two experiments, one with coarse and one with fine regions of interest (ROIs) as input. The area under the ROC curve (AUC) was calculated, and the true-positive, false-positive, true-negative, and false-negative results on the test sets were also reported, broken down by high, medium, and low resemblance.
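The evaluation described above reduces to a few counts over the test set. The following Python sketch computes accuracy, sensitivity, and specificity from thresholded predictions, the AUC as the Mann-Whitney probability that a malignant case scores higher than a benign one, and the TP/FP/TN/FN breakdown per resemblance level. It assumes label 1 = malignant, 0 = benign, and one model score per image; the function names, the 0.5 threshold, and the toy data are illustrative assumptions, not the study's code or data.

```python
def confusion_counts(labels, preds):
    """Count TP/FP/TN/FN, treating 1 (malignant) as the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(labels, preds):
        if y == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


def accuracy_sensitivity_specificity(labels, preds):
    c = confusion_counts(labels, preds)
    total = c["TP"] + c["FP"] + c["TN"] + c["FN"]
    accuracy = (c["TP"] + c["TN"]) / total
    sensitivity = c["TP"] / (c["TP"] + c["FN"])  # true-positive rate
    specificity = c["TN"] / (c["TN"] + c["FP"])  # true-negative rate
    return accuracy, sensitivity, specificity


def auc_mann_whitney(labels, scores):
    """AUC as P(score of a malignant case > score of a benign case); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def breakdown_by_resemblance(labels, preds, resemblance):
    """Group confusion counts by resemblance level, e.g. 'high', 'medium', 'low'."""
    groups = {}
    for y, p, r in zip(labels, preds, resemblance):
        ys, ps = groups.setdefault(r, ([], []))
        ys.append(y)
        ps.append(p)
    return {r: confusion_counts(ys, ps) for r, (ys, ps) in groups.items()}


# Made-up toy data, only to show the calls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
print(accuracy_sensitivity_specificity(labels, preds))  # each value is 2/3
print(auc_mann_whitney(labels, scores))                 # 8/9
```

Computing the AUC from ranks rather than from a sampled ROC curve avoids choosing threshold steps; with real data, a library routine such as scikit-learn's `roc_auc_score` gives the same value.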
Results
The models with coarse and fine ROIs of the ultrasound images as input achieved AUCs of 0.899 and 0.869, respectively. The accuracy, sensitivity, and specificity were 88.4%, 87.9%, and 89.2% for the coarse-ROI model and 86.1%, 87.9%, and 83.8% for the fine-ROI model. The DL model captured ROEs that showed high resemblance to the regions physicians consider when they assess the image.
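The abbreviation list names gradient-weighted class activation mapping (Grad-CAM), a common way to visualize where a CNN finds its evidence. As an illustration only, assuming Grad-CAM-style heatmaps taken from the last convolutional layer, the map for one image can be computed from that layer's activations and the gradients of the malignant-class score (before softmax) with respect to them: each channel is weighted by its spatially averaged gradient, the weighted channels are summed, and negative values are clipped. The sketch uses plain nested lists of shape [K][H][W] and omits upsampling to image resolution; extracting activations and gradients from a trained network is framework-specific and not shown.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one image and one class.

    activations: [K][H][W] feature maps from a convolutional layer.
    gradients:   [K][H][W] d(class score)/d(activation), same shape.
    Returns an [H][W] map rescaled so its maximum is 1 (if any value is positive).
    """
    K = len(activations)
    H = len(activations[0])
    W = len(activations[0][0])
    # Channel weight alpha_k: gradient averaged over the spatial grid.
    alphas = [sum(sum(row) for row in gradients[k]) / (H * W) for k in range(K)]
    # Weighted sum of feature maps, then ReLU to keep only positive evidence.
    cam = [
        [max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(K))) for j in range(W)]
        for i in range(H)
    ]
    # Max-normalization is a common display convention, not part of the definition.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: channel 0 has positive gradients, channel 1 negative ones.
acts = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```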
Conclusions
We have demonstrated the effectiveness of using DenseNet to classify breast lesions with a limited quantity of 2D grayscale ultrasound image data. We have also proposed a new ROE-based metric system that can help physicians and patients better understand how AI makes decisions when reading images, and that could potentially be integrated as part of the evidence in early screening or triage of patients undergoing breast ultrasound examinations.
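The abstract does not state how a case is assigned high, medium, or low resemblance. One hypothetical way to make such a grading reproducible is to threshold the model's normalized heatmap into a binary ROE mask and compare it with a physician-annotated region by intersection over union (IoU). This is a swapped-in proxy, not the authors' metric; the 0.5 heatmap threshold and the IoU cut-offs of 0.5 and 0.2 are arbitrary placeholders.

```python
def roe_mask(heatmap, threshold=0.5):
    """Binary ROE: 1 where the normalized heatmap reaches the (hypothetical) threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in heatmap]


def iou(mask_a, mask_b):
    """Intersection over union of two same-shaped 0/1 masks."""
    pairs = [(a, b) for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb)]
    inter = sum(a & b for a, b in pairs)
    union = sum(a | b for a, b in pairs)
    return inter / union if union else 0.0


def resemblance_level(heatmap, physician_mask, high=0.5, medium=0.2):
    """Map model-vs-physician IoU to 'high' / 'medium' / 'low' (hypothetical bins)."""
    score = iou(roe_mask(heatmap), physician_mask)
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


heatmap = [[0.9, 0.8, 0.1], [0.7, 0.2, 0.0]]  # made-up normalized heatmap
print(resemblance_level(heatmap, [[1, 1, 0], [0, 0, 0]]))  # high (IoU 2/3)
print(resemblance_level(heatmap, [[0, 1, 1], [0, 0, 0]]))  # medium (IoU 1/4)
print(resemblance_level(heatmap, [[0, 0, 1], [0, 1, 1]]))  # low (IoU 0)
```

A quantitative proxy like this trades the nuance of a physician's judgment for repeatability; in practice the cut-offs would need to be calibrated against physician ratings.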
Key Points
• The two models with coarse and fine ROIs of ultrasound images as input achieve AUCs of 0.899 and 0.869, respectively. The accuracy, sensitivity, and specificity are 88.4%, 87.9%, and 89.2% for the model with coarse ROIs and 86.1%, 87.9%, and 83.8% for the model with fine ROIs.
• The first model with coarse ROIs is slightly better than the second model with fine ROIs according to these evaluation metrics.
• The results from the coarse and fine ROIs are consistent, and the peripheral tissue is also a factor that influences breast lesion classification.
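Whether the AUC gap between the two models (0.899 vs 0.869) is more than sampling noise can be checked with DeLong's nonparametric test for correlated ROC curves, which applies when both models score the same cases. The abstract does not say which test, if any, was used; the sketch below assumes paired scores for the same test images, label 1 = malignant, and at least two cases of each class. It estimates each AUC from per-case structural components, estimates the variance of their difference from the components' covariances, and applies a two-sided z-test.

```python
import math


def _psi(x, y):
    """Kernel: 1 if the positive case outscores the negative one, 1/2 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(labels, scores):
    """Structural components V10 (one per positive) and V01 (one per negative)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for two AUCs on the same cases.

    Returns (auc_a, auc_b, z, p_value).
    """
    v10a, v01a = _components(labels, scores_a)
    v10b, v01b = _components(labels, scores_b)
    auc_a = sum(v10a) / len(v10a)
    auc_b = sum(v10b) / len(v10b)
    m, n = len(v10a), len(v01a)  # numbers of positives and negatives
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        # Degenerate case: the paired differences have no spread, so z is undefined.
        return auc_a, auc_b, math.nan, math.nan
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p
```

With only a handful of cases the normal approximation is rough; on a test set of realistic size it is the standard way to compare two AUCs measured on the same images.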
Abbreviations
- AI: Artificial intelligence
- AUC: Area under the curve
- CNN: Convolutional neural network
- DL: Deep learning
- FN: False negative
- FP: False positive
- Grad-CAM: Gradient-weighted class activation mapping
- HC: High confidence
- HR: High resemblance
- LC: Low confidence
- LR: Low resemblance
- MC: Medium confidence
- MR: Medium resemblance
- RCTs: Randomized controlled trials
- RNN: Recurrent neural network
- ROC: Receiver operating characteristic
- ROE: Region of evidence
- ROI: Region of interest
- TN: True negative
- TP: True positive
- US: Ultrasound
Acknowledgements
This project was supported by the Medical Science and Technology Research Foundation of Guangdong (B2019045; the project was approved but received no subsidy).
Funding
The authors state that this work has not received any funding.
Ethics declarations
Guarantor
The scientific guarantor of this publication is Jinfeng Xu, the director of the Ultrasound Department of Shenzhen People’s Hospital.
Conflict of interest
The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry
One of the authors has significant statistical expertise.
Informed consent
Written informed consent was obtained from all subjects (patients) in this study.
Ethical approval
Institutional Review Board approval was obtained.
Methodology
• retrospective
• diagnostic or prognostic study
• performed at one institution
Supplementary Information
ESM 1
(DOC 1447 kb)
About this article
Cite this article
Dong, F., She, R., Cui, C. et al. One step further into the blackbox: a pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound. Eur Radiol 31, 4991–5000 (2021). https://doi.org/10.1007/s00330-020-07561-7