KXtractor: An Effective Biomedical Information Extraction Technique Based on Mixture Hidden Markov Models

Song, Min; Song, Il-Yeol; Hu, Xiaohua; Allen, Robert B.

doi:10.1007/11567752_5

Conference paper

Part of the book series: Lecture Notes in Computer Science (TCSB, volume 3680)

Abstract

We present a novel information extraction (IE) technique, KXtractor, which combines text chunking with Mixture Hidden Markov Models (MiHMM). KXtractor addresses a limitation of single Part-Of-Speech (POS) HMMs, which have difficulty modeling rich representations of text in which features overlap among state units such as word, line, sentence, and paragraph. KXtractor also resolves a limitation of traditional HMMs for IE, which operate only on semi-structured data such as HTML documents and other text sources in which language grammar does not play a pivotal role. We compared KXtractor with three IE techniques: 1) RAPIER, an inductive learning-based machine learning system, 2) a dictionary-based extraction system, and 3) a single POS HMM. Our experiments showed that KXtractor outperforms these three IE systems in extracting protein-protein interactions. The F-measure for KXtractor was higher than those of RAPIER, the dictionary-based system, and the single POS HMM by 16.89%, 16.28%, and 8.58%, respectively. In addition, both the precision and the recall of KXtractor are higher than those of the three systems.
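
The comparison above is reported in terms of precision, recall, and F-measure. The abstract does not say which F-measure variant was used, so the following is only a minimal sketch that assumes the usual weighted harmonic mean of precision and recall (balanced F1 when beta is 1). The function names and the counts in the example are hypothetical and are not taken from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    Assumption: beta = 1 (balanced F1); the abstract does not state the weighting.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # no correct extractions at all
    return (1 + beta ** 2) * precision * recall / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the paper):
    # 8 correct extractions, 2 spurious ones, 8 missed interactions.
    p, r = precision_recall(tp=8, fp=2, fn=8)
    print(f"precision={p:.3f} recall={r:.3f} F1={f_measure(p, r):.3f}")
```

Because the F-measure is a harmonic mean, a system has to improve both precision and recall to raise it substantially, which is consistent with the abstract reporting gains on all three measures.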

Author information

Authors and Affiliations

College of Information Science and Technology, Drexel University, Philadelphia, PA, 19104
Min Song, Il-Yeol Song, Xiaohua Hu & Robert B. Allen

Editor information

Editors and Affiliations

Centre for Computational and Systems Biology, The Microsoft Research - University of Trento, Piazza Manci, 17, 38050, Povo (TN), Italy
Corrado Priami
Department of Computer Science, Georgia State University, GA 30303, Atlanta, USA
Alexander Zelikovsky

About this paper

Cite this paper

Song, M., Song, I.-Y., Hu, X., Allen, R.B. (2005). KXtractor: An Effective Biomedical Information Extraction Technique Based on Mixture Hidden Markov Models. In: Priami, C., Zelikovsky, A. (eds) Transactions on Computational Systems Biology II. Lecture Notes in Computer Science, vol 3680. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11567752_5

DOI: https://doi.org/10.1007/11567752_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29401-6
Online ISBN: 978-3-540-31661-9
eBook Packages: Computer Science, Computer Science (R0)
