One step further into the blackbox: a pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound

Dong, Fajin; She, Ruilian; Cui, Chen; Shi, Siyuan; Hu, Xuqiao; Zeng, Jieying; Wu, Huaiyu; Xu, Jinfeng; Zhang, Yun

doi:10.1007/s00330-020-07561-7

Imaging Informatics and Artificial Intelligence
Published: 06 January 2021

Volume 31, pages 4991–5000, (2021)
European Radiology

Fajin Dong¹,²,
Ruilian She³,
Chen Cui¹,
Siyuan Shi¹,
Xuqiao Hu²,
Jieying Zeng²,
Huaiyu Wu²,
Jinfeng Xu ORCID: orcid.org/0000-0001-5380-4625² &
…
Yun Zhang¹

943 Accesses
17 Citations
13 Altmetric
1 Mention

Abstract

Objectives

To investigate how a DL model makes decisions in lesion classification with a newly defined region of evidence (ROE) by incorporating “explainable AI” (xAI) techniques.

Methods

A data set of 785 2D breast ultrasound images was acquired from 367 female patients. DenseNet-121 was used to classify each lesion as benign or malignant. For performance assessment, classification results were evaluated by calculating accuracy, sensitivity, specificity, and the receiver operating characteristic (ROC) curve for experiments with both coarse and fine regions of interest (ROIs). The area under the curve (AUC) was evaluated, and the true-positive, false-positive, true-negative, and false-negative results on the test sets were also reported, broken down into high, medium, and low resemblance.
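
The evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, chosen only to show how accuracy, sensitivity, specificity, and AUC relate to the true-positive, false-positive, true-negative, and false-negative counts. Malignant is assumed to be the positive class.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN, assuming 1 = malignant (positive), 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall on malignant), specificity (recall on benign)."""
    total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),
        "specificity": c["tn"] / (c["tn"] + c["fp"]),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores
    a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(counts)
auc_value = auc(y_true, scores)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are reported separately.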

Results

The two models, taking coarse and fine ROIs of ultrasound images as input, achieve AUCs of 0.899 and 0.869, respectively. The accuracy, sensitivity, and specificity of the model with coarse ROIs are 88.4%, 87.9%, and 89.2%; those of the model with fine ROIs are 86.1%, 87.9%, and 83.8%. The DL model captures ROEs that closely resemble the regions physicians consider when they assess the image.

Conclusions

We have demonstrated the effectiveness of using DenseNet to classify breast lesions with a limited quantity of 2D grayscale ultrasound image data. We have also proposed a new ROE-based metric system that can help physicians and patients better understand how AI makes decisions when reading images, and which could potentially be integrated as part of the evidence in early screening or triaging of patients undergoing breast ultrasound examinations.

Key Points

• The two models with coarse and fine ROIs of ultrasound images as input achieve an AUC of 0.899 and 0.869, respectively. The accuracy, sensitivity, and specificity of the model with coarse ROIs are 88.4%, 87.9%, and 89.2%, and with fine ROIs are 86.1%, 87.9%, and 83.8%, respectively.

• The first model with coarse ROIs is slightly better than the second model with fine ROIs according to these evaluation metrics.

• The results from coarse and fine ROIs are consistent, and the peripheral tissue is also a factor that influences breast lesion classification.


Abbreviations

AI: Artificial intelligence
AUC: Area under the curve
CNN: Convolutional neural network
DL: Deep learning
FN: False negative
FP: False positive
Grad-CAM: Gradient-weighted class activation mapping
HC: High confidence
HR: High resemblance
LC: Low confidence
LR: Low resemblance
MC: Medium confidence
MR: Medium resemblance
RCTs: Randomized controlled trials
RNN: Recurrent neural network
ROC: Receiver operating characteristic
ROE: Region of evidence
ROI: Region of interest
TN: True negative
TP: True positive
US: Ultrasound


Acknowledgements

This project was supported by the Medical Science and Technology Research Foundation of Guangdong (B2019045, project approval, but non-subsidy).

Funding

The authors state that this work has not received any funding.

Author information

Authors and Affiliations

The Key Laboratory of Cardiovascular Remodeling and Function Research, Chinese Ministry of Education and Chinese Ministry of Health, and The State and Shandong Province Joint Key Laboratory of Translational Cardiovascular Medicine, Qilu Hospital of Shandong University, No. 107 Wenhuaxi Road, Jinan, 250012, People’s Republic of China
Fajin Dong, Chen Cui, Siyuan Shi & Yun Zhang
Department of Ultrasound, First Affiliated Hospital of Southern University of Science and Technology, The Second Clinical Medical College of Jinan University, Shenzhen People’s Hospital, Shenzhen, 518020, People’s Republic of China
Fajin Dong, Xuqiao Hu, Jieying Zeng, Huaiyu Wu & Jinfeng Xu
Department of Obstetrics and Gynecology, The Second Clinical Medical College of Jinan University, Shenzhen People’s Hospital, Shenzhen, 518020, People’s Republic of China
Ruilian She

Corresponding authors

Correspondence to Jinfeng Xu or Yun Zhang.

Ethics declarations

Guarantor

The scientific guarantor of this publication is Jinfeng Xu, who is the director of the Ultrasound Department of Shenzhen People’s Hospital.

Conflict of interest

The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry

One of the authors has significant statistical expertise.

Informed consent

Written informed consent was obtained from all subjects (patients) in this study.

Ethical approval

Institutional Review Board approval was obtained.

Methodology

• retrospective

• diagnostic or prognostic study

• performed at one institution

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

ESM 1

(DOC 1447 kb)

About this article

Cite this article

Dong, F., She, R., Cui, C. et al. One step further into the blackbox: a pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound. Eur Radiol 31, 4991–5000 (2021). https://doi.org/10.1007/s00330-020-07561-7

Received: 27 May 2020
Revised: 28 October 2020
Accepted: 24 November 2020
Published: 06 January 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s00330-020-07561-7
