Natural language processing to predict isocitrate dehydrogenase genotype in diffuse glioma using MR radiology reports

Kim, Minjae; Ong, Kai Tzu-iunn; Choi, Seonah; Yeo, Jinyoung; Kim, Sooyon; Han, Kyunghwa; Park, Ji Eun; Kim, Ho Sung; Choi, Yoon Seong; Ahn, Sung Soo; Kim, Jinna; Lee, Seung-Koo; Sohn, Beomseok

doi:10.1007/s00330-023-10061-z

Neuro
Published: 11 August 2023

Volume 33, pages 8017–8025, (2023)
European Radiology

Minjae Kim¹,²,
Kai Tzu-iunn Ong³,
Seonah Choi⁴,
Jinyoung Yeo³,
Sooyon Kim⁵,
Kyunghwa Han¹,
Ji Eun Park²,
Ho Sung Kim²,
Yoon Seong Choi¹,
Sung Soo Ahn¹,
Jinna Kim¹,
Seung-Koo Lee¹ &
Beomseok Sohn¹,⁶ (ORCID: orcid.org/0000-0002-6765-8056)

A Commentary to this article was published on 22 September 2023

Abstract

Objectives

To evaluate the performance of natural language processing (NLP) models to predict isocitrate dehydrogenase (IDH) mutation status in diffuse glioma using routine MR radiology reports.

Materials and methods

This retrospective, multi-center study included consecutive patients with diffuse glioma with known IDH mutation status from May 2009 to November 2021 whose initial MR radiology report was available prior to pathologic diagnosis. Five NLP models (long short-term memory [LSTM], bidirectional LSTM, bidirectional encoder representations from transformers [BERT], BERT graph convolutional network [GCN], BioBERT) were trained, and area under the receiver operating characteristic curve (AUC) was assessed to validate prediction of IDH mutation status in the internal and external validation sets. The performance of the best performing NLP model was compared with that of the human readers.
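As a rough illustration of the AUC metric used above, the following minimal sketch computes it with the rank-based (Mann-Whitney) identity: the AUC equals the probability that a randomly chosen IDH-mutant case receives a higher score than a randomly chosen IDH-wildtype case. The label coding (1 = mutant, 0 = wildtype), the function name `auc`, and the toy data are assumptions for illustration and are not taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for IDH-mutant, 0 for IDH-wildtype (this coding is an assumption).
    scores: model-predicted probability of IDH mutation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count mutant/wildtype pairs in which the mutant case scores higher;
    # tied pairs count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_scores)
print(round(toy_auc, 3))  # 0.917 (11 of 12 pairs ranked correctly)
```

The pairwise form is quadratic in the number of cases, which is acceptable for validation sets of a few hundred reports; a sort-based implementation would be preferable for larger data.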

Results

A total of 1427 patients (mean age ± standard deviation, 54 ± 15 years; 779 men, 54.6%) were included: 720 in the training set, 180 in the internal validation set, and 527 in the external validation set. In the external validation set, BERT GCN showed the highest performance in predicting IDH mutation status (AUC 0.85, 95% CI 0.81−0.89), which was higher than that of LSTM (AUC 0.77, 95% CI 0.72−0.81; p = .003) and BioBERT (AUC 0.81, 95% CI 0.76−0.85; p = .03). The performance of BERT GCN was also higher than that of a neuroradiologist (AUC 0.80, 95% CI 0.76−0.84; p = .005) and a neurosurgeon (AUC 0.79, 95% CI 0.76−0.84; p = .04).
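The abstract reports 95% confidence intervals for each AUC but does not state how they were computed. Purely as an illustrative assumption, the sketch below shows one common approach, a percentile bootstrap over cases. The function names, the resampling settings, and the toy data are hypothetical and should not be read as the authors' method.

```python
import random


def auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample contains one class only
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(bootstrap_auc_ci(toy_labels, toy_scores, n_boot=500, seed=1))
```

With only a handful of cases the interval is very wide; the narrower intervals in the abstract reflect validation sets of 180 and 527 patients.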

Conclusion

BERT GCN was externally validated for predicting IDH mutation status in patients with diffuse glioma from routine MR radiology reports, with performance superior or at least comparable to that of human readers.

Clinical relevance statement

Natural language processing may be used to extract relevant information from routine radiology reports to predict cancer genotype and provide prognostic information that may aid in guiding treatment strategy and enabling personalized medicine.

Key Points

• A transformer-based natural language processing (NLP) model predicted isocitrate dehydrogenase mutation status in diffuse glioma with an AUC of 0.85 in the external validation set.

• The best NLP models were superior or at least comparable to human readers in both internal and external validation sets.

• Transformer-based models showed higher performance than a conventional NLP model such as long short-term memory.

Abbreviations

AUC: Area under the receiver operating characteristic curve
BERT: Bidirectional encoder representations from transformers
BiLSTM: Bidirectional long short-term memory
GCN: Graph convolutional network
IDH: Isocitrate dehydrogenase
LSTM: Long short-term memory
NLP: Natural language processing
VASARI: Visually Accessible Rembrandt Images

Funding

The authors state that this work has not received any funding.

Author information

Authors and Affiliations

Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
Minjae Kim, Kyunghwa Han, Yoon Seong Choi, Sung Soo Ahn, Jinna Kim, Seung-Koo Lee & Beomseok Sohn
Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
Minjae Kim, Ji Eun Park & Ho Sung Kim
Department of Artificial Intelligence, College of Computing, Yonsei University, Seoul, Korea
Kai Tzu-iunn Ong & Jinyoung Yeo
Department of Neurosurgery, Brain Tumor Center, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
Seonah Choi
Department of Statistics and Data Science, Yonsei University, Seoul, Korea
Sooyon Kim
Department of Radiology and Center for Imaging Sciences, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
Beomseok Sohn

Corresponding author

Correspondence to Beomseok Sohn.

Ethics declarations

Guarantor

The scientific guarantor of this publication is Beomseok Sohn.

Conflict of interest

The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry

Sooyon Kim and Kyunghwa Han kindly provided statistical advice for this manuscript.

Informed consent

Written informed consent was waived by the Institutional Review Board.

Ethical approval

Institutional Review Board approval was obtained.

Study subjects or cohorts overlap

There was no cohort overlap.

Methodology

• Retrospective

• Diagnostic or prognostic study

• Multicenter study

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kim, M., Ong, K.Ti., Choi, S. et al. Natural language processing to predict isocitrate dehydrogenase genotype in diffuse glioma using MR radiology reports. Eur Radiol 33, 8017–8025 (2023). https://doi.org/10.1007/s00330-023-10061-z

Received: 12 January 2023
Revised: 18 May 2023
Accepted: 22 June 2023
Published: 11 August 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s00330-023-10061-z
