1 Introduction

Drugs and diseases play a central role in many areas of biomedical research and healthcare. A large part of biomedical text-processing research has focused on scientific abstracts in English; see [9] for a good overview of the field. In contrast to the biomedical literature, research on processing electronic health records (EHRs) and user-generated texts (UGTs) about drug therapy has not reached the same level of maturity. The bottleneck of modern supervised models for named entity recognition (NER) is the human effort needed to annotate sufficient training examples for each language or domain. Moreover, state-of-the-art text processing models may perform extremely poorly under domain shift [14]. Recent advances in neural networks, especially deep contextualized word representations via language models [1, 4, 12, 16] and Transformer-based architectures [19], offer new opportunities to improve NER models in the biomedical field.

In this work, we take the task a step further from existing monolingual research in a single domain [2, 3, 6, 12, 13, 20, 22] by exploring multilingual transfer between EHRs and UGTs in different languages. Our goal is not to outperform state-of-the-art models on each dataset separately, but to ask whether we can transfer knowledge from a high-resource language, such as English, to a low-resource one, e.g., Russian, for NER of biomedical entities. Our transfer learning strategy involves pretraining the multilingual cased BERT [4] on one corpus and transferring the learned weights to initialize training on a gold-standard corpus in another language or domain. We seek to answer the following research questions: RQ1: How well does a BERT-based NER model trained on one corpus work for the detection of drugs and diseases in another language or domain in the zero-shot setting? RQ2: Given a small number of training examples, can the NER model perform as well as a model trained on much larger datasets? RQ3: Does transfer learning help achieve more stable performance across varying sizes of training data?

All experiments are carried out on four datasets: the English corpora CADEC [10] and n2c2 [7], a dataset of EHRs in Russian [17], and our novel dataset of UGTs in Russian. All three existing corpora share an entity of interest with our corpus. To our knowledge, this is the first work exploring the cross-lingual transfer ability of multilingual BERT for bioNER across two domains in English and Russian.

2 Data

Each corpus is characterized by two parameters: (i) language: English (EN) or Russian (RU); (ii) domain: electronic health records (EHRs) or user-generated texts (UGTs). A statistical summary of the datasets is presented in Table 1. Since the corpora use different annotation schemes for disease-related entities, and these subtypes are highly imbalanced within each corpus, we merge them into a single primary type named Disease; a minimal sketch of this unification is shown below. Further, we refer to the four datasets by their language and domain.
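
The unification step can be illustrated with a short sketch. The subtype names below (e.g., ADR, Symptom, Finding) are only illustrative; the actual label sets differ across the four corpora.

```python
# Minimal sketch of the label-unification step: all disease-related subtypes
# are collapsed into a single Disease class, while Drug tags are kept as-is.
# The subtype names are illustrative, not the exact labels of each corpus.
DISEASE_SUBTYPES = {"ADR", "Disease", "Symptom", "Finding"}

def unify_label(bio_tag: str) -> str:
    """Map a BIO tag such as 'B-ADR' to 'B-Disease'; leave 'O' and Drug tags unchanged."""
    if bio_tag == "O":
        return bio_tag
    prefix, _, etype = bio_tag.partition("-")
    return f"{prefix}-Disease" if etype in DISEASE_SUBTYPES else bio_tag

print(unify_label("B-ADR"))   # B-Disease
print(unify_label("I-Drug"))  # I-Drug
```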

Table 1. Summary statistics of the four datasets: the number of Drug and Disease entities, the number of documents and sentences, the average document length (in sentences), the average sentence length (in tokens), and the average length of a Drug/Disease entity (in tokens).

CADEC (EN UGT). The CSIRO Adverse Drug Event Corpus [10] contains medical forum posts from AskaPatient.com about 12 drugs from two categories: Diclofenac and Lipitor. The dataset was annotated by medical students and computer scientists. The agreement between four annotators, computed on a set of 55 user posts, was approximately 78% for Diclofenac posts and 95% for Lipitor posts.

Our Dataset of UGTs (RU UGT). We collected and annotated user posts in Russian from the publicly accessible source Otzovik.com; we note that we obtained all reviews without accessing password-protected information. Four annotators from the I.M. Sechenov First Moscow State Medical University and the Department of Pharmacology of the Kazan Federal University were asked to read each review and highlight all text spans containing drug names and the patient's health conditions experienced before, during, or after drug use. The agreement between two annotators, computed on a set of 100 posts, was 72%.

Russian EHRs (RU EHR). Shelmanov et al. [17] created a corpus of Russian clinical notes from a multi-disciplinary pediatric center. The authors extended an annotation scheme from the CLEF eHealth 2014 Task 2.

n2c2 (EN EHR). This corpus consists of de-identified EHRs [7]. Two independent annotators annotated each record in the dataset and a third annotator resolved conflicts. For both EHR corpora, the agreement rates were not provided.

3 Models

For NER, we utilize BERT with a softmax layer over all possible tags as the output. Word labels are encoded with the BIO tagging scheme, and the model is trained at the sentence level. Due to space constraints, we refer to [4, 12] for more details. In particular, we use BERT\(_{base}\), Multilingual Cased (Multi-BERT), which is pretrained on 104 languages and has 12 layers, 12 attention heads, 768 hidden units per layer, and a total of 110M parameters. All models were trained without hyperparameter tuning or explicit parameter selection. The loss function became stable (without significant further decreases) after 35–40 epochs. We use the Adam optimizer with a polynomial learning-rate decay applied at each epoch and warm-up steps at the beginning of training; a minimal sketch of this setup is given below. As baselines, we utilized an LSTM-CRF with default settings from the Saber library [5] and BioBERT [12]. For the LSTM-CRF, we adopted (i) 200-dimensional word2vec embeddings trained on 2.5M health-related posts in English [18] and (ii) 300-dimensional word2vec embeddings trained on the Russian National Corpus [11].
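
The following sketch illustrates this setup with the HuggingFace transformers library; the paper does not specify the implementation, and the learning rate, warm-up length, and number of training steps below are assumptions.

```python
# Illustrative reconstruction of the Multi-BERT NER setup (not the authors' code).
import torch
from transformers import (BertForTokenClassification, BertTokenizerFast,
                          get_polynomial_decay_schedule_with_warmup)

# BIO tag set after unifying all disease subtypes into a single Disease class.
LABELS = ["O", "B-Drug", "I-Drug", "B-Disease", "I-Disease"]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased",   # Multi-BERT: 12 layers, 12 heads, 768 hidden units
    num_labels=len(LABELS),           # softmax layer over all BIO tags
)

# Adam with warm-up followed by polynomial learning-rate decay;
# the concrete hyperparameter values here are assumptions.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
num_training_steps = 40 * 500         # e.g., ~40 epochs of ~500 batches each
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer, num_warmup_steps=1000, num_training_steps=num_training_steps)

# One sentence-level training step:
#   out = model(input_ids, attention_mask=mask, labels=tag_ids)
#   out.loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```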

4 Experiments and Evaluation

We randomly split each dataset into a 70% training set and a 30% test set. We trained a total of 720 models on one machine with 8 NVIDIA P40 GPUs; training all models took approximately 96 hours. We compare all models in terms of precision (P), recall (R), and F1-score (F) on the test sets with an exact matching criterion via the CoNLL evaluation script. Our experiments are available at https://github.com/dartrevan/multilingual_experiments.
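
The exact-match criterion can be sketched as follows: an entity counts as a true positive only if both its type and its exact token boundaries match the gold annotation. This is a simplified re-implementation in the spirit of the CoNLL script, not the script itself.

```python
# Simplified entity-level exact-match evaluation over BIO tag sequences.
from typing import List, Set, Tuple

def bio_to_spans(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Convert a BIO tag sequence into a set of (type, start, end) spans.
    Stray I- tags without a preceding B- are ignored in this sketch."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last open span
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and etype != tag[2:]):
            if start is not None:
                spans.add((etype, start, i))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans

def precision_recall_f1(gold_sents: List[List[str]], pred_sents: List[List[str]]):
    tp = fp = fn = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        tp += len(g & p); fp += len(p - g); fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: a perfectly predicted two-token Drug mention.
print(precision_recall_f1([["B-Drug", "I-Drug", "O"]], [["B-Drug", "I-Drug", "O"]]))
```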

Comparison with Baselines. Table 2 shows the in-corpus (IC) performance of Multi-BERT compared with BioBERT and LSTM-CRF when trained and tested on the same corpus. On all datasets, the BERT-based models achieve higher scores than the LSTM-CRF based on word embeddings. The difference in performance between BioBERT and Multi-BERT is not statistically significant; we measured significance with a two-tailed t-test (\(p \le 0.05\)). All models achieve much higher performance for the detection of drugs than of diseases; this can be explained by boundary problems in multi-word disease expressions (see the average lengths in Table 1).
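
As an illustration of this significance check, the sketch below runs a paired two-tailed t-test over per-run F1-scores; the per-run values and the pairing by random seed are assumptions, since the paper reports only the test and the significance level.

```python
# Hypothetical significance check between BioBERT and Multi-BERT F1-scores.
from scipy import stats

biobert_f1   = [84.9, 85.3, 84.6, 85.1, 84.8]  # hypothetical per-seed F1-scores
multibert_f1 = [84.7, 85.0, 84.9, 84.5, 85.2]

t_stat, p_value = stats.ttest_rel(biobert_f1, multibert_f1)  # two-tailed by default
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value > 0.05:
    print("Difference is not statistically significant at alpha = 0.05")
```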

Zero-Shot Transfer. To answer RQ1, we trained Multi-BERT on one corpus and then applied it to another language/domain in a zero-shot fashion, i.e., without further training. Results for the out-of-corpus (OOC) performance of Multi-BERT are presented in Table 3. For drug recognition, the best generalizability is achieved when training on EHRs and evaluating on UGTs in English. For OOC performance on the EN UGT corpus, the model reaches F1-scores of 77.08% and 36.31% when trained on the EN EHR and RU UGT corpora, respectively, while IC training reaches an F1-score of 84.88%. We note that the number of sentences in the EN EHR corpus is nine times higher than in the EN UGT corpus, and 78% of Drug tokens in the EN UGT corpus are present in the EN EHR set (see Table 4). For OOC performance on the RU UGT corpus, the model achieves F1-scores of 26.31% and 34.78% when trained on the EN UGT and EN EHR corpora, respectively, while the IC performance is an F1-score of 60.45%.

Table 2. In-corpus (IC) performance of Multi-BERT compared with BiLSTM-CRF and BioBERT, measured by precision, recall, and F1-score with an exact matching criterion.
Table 3. Out-of-corpus (OOC) performance of Multi-BERT in the zero-shot setting. OOC performance is obtained by training on one corpus (train) and testing on another (test).
Table 4. Summary statistics of Byte Pair Encoding (BPE) tokens of entities in the four datasets: the number of unique BPE tokens, the intersection between unique token sets, and the percentage of shared tokens within each unique set.
Table 5. The number of training sentences needed to achieve 99% of the full-dataset performance with pretrained Multi-BERT.

For disease recognition, Multi-BERT generalizes much worse to corpora other than the one it was trained on. For OOC performance on the RU UGT corpus, the model achieves F1-scores of 24.12% and 30.86% when trained on the EN UGT and RU EHR corpora, respectively, while the IC performance is an F1-score of 49.35%. For OOC performance on the EN UGT corpus, the model obtains F1-scores of 37.94% and 4.32% when trained on the RU UGT and EN EHR corpora, respectively, while the IC performance is an F1-score of 67.25%. One possible explanation is the well-known differences between layperson language and professional medical terminology.

Few-Shot Transfer. Transfer learning aims to solve a problem on a "target" dataset using knowledge learned from a "source" dataset [5, 15, 21]. In the transfer learning setting, the BERT-based NER model is pretrained on one of three "source" datasets (see Table 2 for the IC performance of these models). To answer RQ2 and RQ3, we begin by randomly sampling 50 sentences from a "target" training set, train the pretrained model on this subsampled dataset, and test it on the "target" test set. Next, we increase the sample size by 50 sentences of the "target" training set and repeat the procedure, up to 2000 sentences of the training set. In each round, we train from scratch to avoid overfitting, as suggested in [8]; a sketch of this protocol follows.
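
The sketch below assumes that each round re-initializes the model from the pretrained "source" checkpoint rather than continuing from the previous round; `train_and_evaluate` is a hypothetical stand-in for the actual training and evaluation code.

```python
# Few-shot protocol: grow the "target" training subset by 50 sentences per round
# (up to 2000), re-train from the pretrained checkpoint each time, and record
# the smallest subset reaching at least 99% of the full-dataset F1 (cf. Table 5).
import random

def train_and_evaluate(train_subset, test_set, source_checkpoint):
    """Hypothetical stand-in: fine-tune Multi-BERT initialized from
    `source_checkpoint` on `train_subset` and return the F1-score on `test_set`."""
    raise NotImplementedError

def few_shot_curve(target_train, target_test, source_checkpoint,
                   full_dataset_f1, step=50, max_size=2000, seed=42):
    shuffled = target_train[:]
    random.Random(seed).shuffle(shuffled)
    curve, reached_99_at = [], None
    for size in range(step, max_size + step, step):
        f1 = train_and_evaluate(shuffled[:size], target_test, source_checkpoint)
        curve.append((size, f1))
        if reached_99_at is None and f1 >= 0.99 * full_dataset_f1:
            reached_99_at = size
    return curve, reached_99_at
```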

For each pretraining setup, we record the size of the subset at which the model achieves at least 99% of the F1-measure obtained on the full dataset. Results for the RU UGT, RU EHR, and EN UGT datasets are given in Table 5 and Fig. 1. Multi-BERT pretrained on the EN UGT set and trained on 2000 sentences from the EN EHR corpus (2.81% of the full corpus) obtains 92% and 76% of the full-dataset F1 on drugs and diseases, respectively. As shown in Table 5 and Fig. 1, models with transferred knowledge outperform models without the pretraining phase even when both the domain and the language shift between the "source" and "target" sets. The transfer learning strategy can require up to 550 fewer sentences than training from scratch. In particular, models require only 10% and 23% of the EN UGT and RU UGT corpora, respectively, to achieve results as good as the full-dataset performance. We believe this observation is particularly important for low-resource languages and new domains (e.g., social media, clinical trials). We also observe that the performance of pretrained models becomes more stable across different training-set sizes, with smaller deviations between F1-scores (see Fig. 1).

Fig. 1. Performance of Multi-BERT models with pretraining on the source dataset (the corpus name in the legend) or without pretraining (the "No pretrain" line) for the EN UGT, RU UGT, and RU EHR datasets. Y-axis: F1-scores for the detection of Drug or Disease mentions; X-axis: the number of sentences used for training.

5 Conclusion and Future Work

We studied the task of recognizing drug and disease mentions in English and in a low-resource language in the biomedical area, using a newly collected Russian corpus of user reviews about drugs (RU UGT) with 3,624 manually annotated entities. We ask: can additional pretraining on an existing dataset improve the bioNER performance of a multilingual BERT-based NER model on a new dataset with a small number of labeled examples when the domain, the language, or both shift between these datasets? Our study comprised 720 models trained on different subsets of two corpora in English and two corpora in Russian. For each language, we experimented with the clinical domain, i.e., electronic health records, and the social media domain, i.e., reviews about drug therapy. As expected, models pretrained on data in the same language or the same domain obtain better results in zero-shot and few-shot settings. To our surprise, we found that pretraining on data with two shifts can also be effective. The model with the best pretraining achieves 99% of the full-dataset performance using only 23.56% of the training data of our RU UGT corpus, while the model pretrained on data with two shifts (the EN EHR set) uses 26.1% of the training data. The model without pretraining achieves similar results on the RU UGT corpus using 31.97% of the training set.

We foresee three directions for future work. First, transfer learning and multi-task strategies over three or more domains remain to be explored. Second, a promising research direction is the evaluation of multilingual BERT on a broader set of entity types. Third, future research will focus on the creation of fine-grained entity types in our corpus of Russian reviews, which can help in finding associations between drugs and adverse drug reactions.