One step further into the blackbox: a pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound

  • Imaging Informatics and Artificial Intelligence
  • Published in European Radiology

Abstract

Objectives

To investigate how a deep learning (DL) model makes decisions in lesion classification, using a newly defined region of evidence (ROE) that incorporates explainable AI (xAI) techniques.

Methods

A data set of 785 2D breast ultrasound images was acquired from 367 female patients. A DenseNet-121 model was used to classify each lesion as benign or malignant. For performance assessment, classification results were evaluated by computing accuracy, sensitivity, specificity, and the receiver operating characteristic (ROC) curve for experiments with both coarse and fine regions of interest (ROIs). The area under the curve (AUC) was computed, and the true-positive, false-positive, true-negative, and false-negative results on the test sets, broken down by high, medium, and low resemblance, were also reported.
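The reported metrics follow directly from confusion-matrix counts, and AUC can be obtained from the rank statistic underlying the ROC curve. A minimal sketch in plain Python (function names are illustrative, not from the paper; label 1 denotes malignant):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels (1 = malignant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def auc_score(y_true, scores):
    """AUC as the probability that a malignant case outranks a benign one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice a library routine (e.g. scikit-learn's `roc_auc_score`) would be used, but the rank formulation above matches the nonparametric AUC definition used by the DeLong comparison the paper relies on.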

Results

The two models, taking coarse and fine ROIs of ultrasound images as input, achieved AUCs of 0.899 and 0.869, respectively. The accuracy, sensitivity, and specificity of the model with coarse ROIs were 88.4%, 87.9%, and 89.2%; with fine ROIs they were 86.1%, 87.9%, and 83.8%, respectively. The DL model captured ROEs with high resemblance to the regions physicians consider when assessing the same images.

Conclusions

We have demonstrated the effectiveness of using DenseNet to classify breast lesions with a limited quantity of 2D grayscale ultrasound image data. We have also proposed a new ROE-based metric system that can help physicians and patients better understand how the AI makes decisions when reading images, and that could be integrated as part of the evidence used in early screening or triage of patients undergoing breast ultrasound examinations.
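One plausible way to operationalize an ROE-based resemblance metric is to compare the model's evidence region (e.g. a thresholded Grad-CAM heatmap) with a physician-annotated region and bin the overlap into high/medium/low resemblance. The sketch below uses intersection-over-union on pixel sets; the `resemblance_grade` function and its thresholds are hypothetical illustrations, not the paper's actual definition:

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks given as sets of (row, col) pixels."""
    inter = len(mask_a & mask_b)
    union = len(mask_a | mask_b)
    return inter / union if union else 0.0


def resemblance_grade(model_roe, physician_roi, hi=0.5, lo=0.2):
    """Bin the model-ROE / physician-ROI overlap into high/medium/low resemblance.

    The cutoffs hi and lo are illustrative placeholders, not values from the study.
    """
    score = iou(model_roe, physician_roi)
    if score >= hi:
        return "high"
    if score >= lo:
        return "medium"
    return "low"
```

Grading each test case this way yields exactly the kind of TP/FP/TN/FN breakdown by high, medium, and low resemblance that the Results section reports.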

Key Points

The two models with coarse and fine ROIs of ultrasound images as input achieved AUCs of 0.899 and 0.869, respectively; accuracy, sensitivity, and specificity were 88.4%, 87.9%, and 89.2% with coarse ROIs and 86.1%, 87.9%, and 83.8% with fine ROIs.

The first model, with coarse ROIs, performed slightly better than the second model, with fine ROIs, on these evaluation metrics.

The results from the coarse and fine ROIs are consistent, and the peripheral tissue surrounding a lesion also influences breast lesion classification.
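The distinction between the two inputs amounts to how much peripheral tissue is kept around the lesion bounding box. A minimal sketch (the function name and margin handling are illustrative assumptions, not the paper's preprocessing code): a fine ROI crops the tight box, while a coarse ROI expands it by a margin, clipped to the image bounds.

```python
def crop_roi(image, box, margin=0):
    """Crop an ROI from a 2-D image given as a list of rows.

    box = (top, left, bottom, right) with exclusive bottom/right.
    margin == 0 yields a tight 'fine' ROI; margin > 0 yields a 'coarse'
    ROI that retains peripheral tissue around the lesion.
    """
    h, w = len(image), len(image[0])
    top = max(0, box[0] - margin)
    left = max(0, box[1] - margin)
    bottom = min(h, box[2] + margin)
    right = min(w, box[3] + margin)
    return [row[left:right] for row in image[top:bottom]]
```

Feeding both crops of the same lesion to the classifier is what allows the coarse-vs-fine comparison reported in the Key Points.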



Abbreviations

AI:

Artificial intelligence

AUC:

Area under the curve

CNN:

Convolutional neural network

DL:

Deep learning

FN:

False negative

FP:

False positive

Grad-CAM:

Gradient-weighted class activation mapping

HC:

High confidence

HR:

High resemblance

LC:

Low confidence

LR:

Low resemblance

MC:

Medium confidence

MR:

Medium resemblance

RCTs:

Randomized controlled trials

RNN:

Recurrent neural network

ROC:

Receiver operating characteristic

ROE:

Region of evidence

ROI:

Region of interest

TN:

True negative

TP:

True positive

US:

Ultrasound


Acknowledgements

This project was supported by the Medical Science and Technology Research Foundation of Guangdong (B2019045; project approved but not subsidized).

Funding

The authors state that this work has not received any funding.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Jinfeng Xu or Yun Zhang.

Ethics declarations

Guarantor

The scientific guarantor of this publication is Jinfeng Xu, who is the director of Ultrasound Department of Shenzhen People’s Hospital.

Conflict of interest

The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry

One of the authors has significant statistical expertise.

Informed consent

Written informed consent was obtained from all subjects (patients) in this study.

Ethical approval

Institutional Review Board approval was obtained.

Methodology

• retrospective

• diagnostic or prognostic study

• performed at one institution

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

ESM 1

(DOC 1447 kb)


About this article


Cite this article

Dong, F., She, R., Cui, C. et al. One step further into the blackbox: a pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound. Eur Radiol 31, 4991–5000 (2021). https://doi.org/10.1007/s00330-020-07561-7
