Abstract
Recognizing named entities in Adverse Drug Reaction (ADR) narratives is a crucial step towards extracting valuable patient information from unstructured text and transforming that information into an easily processable structured format. This in turn motivates the use of advanced data analytics to support data-driven pharmacovigilance. Yet existing biomedical named entity recognition (NER) tools are limited in their ability to identify certain entity types in these domain-specific narratives, resulting in poor accuracy. To address this shortcoming, we propose a novel methodology called Tiered Ensemble Learning System with Diversity (TELS-D), an ensemble approach that integrates a rich variety of named entity recognizers to produce the final result. Biomedical NER faces two specific challenges: the classes are highly imbalanced, and no single method performs best. We address the first challenge with a balanced, under-sampled bagging strategy whose sampling depends on the level of imbalance, counteracting the highly skewed class distribution. To address the second challenge, we design an ensemble of heterogeneous entity recognizers that leverages a novel ensemble combiner. Our experimental results demonstrate that for biomedical text datasets: (i) a balanced learning environment combined with an ensemble of heterogeneous classifiers consistently improves performance over the individual base learners, and (ii) stacking-based ensemble combiners outperform simple majority-voting-based solutions by 0.3 in F1-score.
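The two mechanisms named in the abstract, balanced under-sampled bagging and a stacking-based combiner compared against majority voting, can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the TELS-D implementation. The names `balanced_bag`, `majority_vote` and `LookupStacker` are hypothetical. The bagging rule below draws exactly the minority-class count from every class; the abstract only says the paper's strategy depends on the imbalance level, without giving the rule. The stacker's meta-learner is a simple lookup table over base-learner outputs (the Behavior-Knowledge Space method), standing in for whatever meta-classifier the authors actually use.

```python
import random
from collections import Counter, defaultdict


def balanced_bag(samples, labels, rng):
    """Draw one exactly balanced bootstrap bag (assumed sampling rule).

    Every class contributes as many examples as the minority class has,
    sampled with replacement, so rare entity tags are not swamped by "O".
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    bag = []
    for y, xs in by_class.items():
        bag.extend((rng.choice(xs), y) for _ in range(n_min))
    rng.shuffle(bag)
    return bag


def majority_vote(predictions):
    """Combine one token's base-learner labels by plurality vote.

    Ties go to the label encountered first (Counter.most_common ordering).
    """
    return Counter(predictions).most_common(1)[0][0]


class LookupStacker:
    """Stacking-style combiner with a lookup-table meta-learner.

    fit() maps each tuple of base predictions to the gold label seen most
    often with it on held-out data; predict() falls back to majority voting
    for prediction tuples never seen during fitting.
    """

    def fit(self, base_preds, gold):
        counts = defaultdict(Counter)
        for preds, y in zip(base_preds, gold):
            counts[tuple(preds)][y] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in counts.items()}
        return self

    def predict(self, preds):
        return self.table.get(tuple(preds), majority_vote(preds))


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["aspirin", "the", "patient", "nausea", "was", "given"]
    tags = ["DRUG", "O", "O", "ADR", "O", "O"]
    bag = balanced_bag(tokens, tags, rng)
    print(sorted(y for _, y in bag))  # ['ADR', 'DRUG', 'O']: one per class

    # Held-out predictions from three base recognizers, with gold labels.
    held_out = [("O", "O", "DRUG"), ("O", "O", "DRUG"),
                ("O", "O", "O"), ("ADR", "ADR", "O")]
    gold = ["DRUG", "DRUG", "O", "ADR"]
    stacker = LookupStacker().fit(held_out, gold)

    token_preds = ("O", "O", "DRUG")
    print(majority_vote(token_preds))    # O
    print(stacker.predict(token_preds))  # DRUG
```

The toy held-out data encode one systematic pattern: when only the third recognizer tags a token as DRUG, the gold label is DRUG. Majority voting outvotes that specialist, whereas the stacker learns to trust it, which is the intuition behind a trained combiner beating a fixed voting rule.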
We are grateful for funding that supported this research in part, including the Seeds of STEM at WPI via the Institute of Education Sciences, U.S. Department of Education grant R305A150571; Oak Ridge Associated Universities (ORAU) for two ORISE Fellowships to conduct research with the U.S. Food and Drug Administration; and the Department of Education GAANN fellowship grant P200A150306.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wunnava, S. et al. (2019). Multi-layered Learning for Information Extraction from Adverse Drug Event Narratives. In: Cliquet Jr., A., et al. Biomedical Engineering Systems and Technologies. BIOSTEC 2018. Communications in Computer and Information Science, vol 1024. Springer, Cham. https://doi.org/10.1007/978-3-030-29196-9_22
DOI: https://doi.org/10.1007/978-3-030-29196-9_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29195-2
Online ISBN: 978-3-030-29196-9
eBook Packages: Computer Science, Computer Science (R0)