DOI: 10.1145/3307339.3342167
research-article

PPPred: Classifying Protein-phenotype Co-mentions Extracted from Biomedical Literature

Published: 04 September 2019

ABSTRACT

The MEDLINE database provides an extensive source of scientific articles and heterogeneous biomedical information in the form of unstructured text. Among the most important pieces of knowledge in these articles are the relations between human proteins and their phenotypes, which can remain hidden because of the exponential growth of publications. This presents a range of opportunities for developing computational methods that extract these biomedical relations from the articles. However, no method currently exists for the automated extraction of relations involving human proteins and Human Phenotype Ontology (HPO) terms. In our previous work, we developed a comprehensive database composed of all co-mentions of proteins and phenotypes. In this study, we present a supervised machine learning approach called PPPred (Protein-Phenotype Predictor) for classifying the validity of a given sentence-level co-mention. Using a gold standard dataset developed in-house, we demonstrate that PPPred significantly outperforms several baseline methods. This two-step approach of co-mention extraction and classification constitutes a complete biomedical relation extraction pipeline for extracting protein-phenotype relations.
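
The two-step idea in the abstract (extract sentence-level co-mentions, then classify each one as valid or not) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not describe PPPred's features, learning algorithm, or lexicons, so the toy dictionaries `PROTEINS` and `PHENOTYPES`, the `CoMention` type, and the keyword rule in `classify` are hypothetical stand-ins, not the authors' method. The rule-based classifier merely marks the place where a trained supervised model would sit, and `precision_recall_f1` shows one common way such a classifier could be scored against gold-standard labels.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons (assumption); a real system would use full
# protein-name and HPO-term dictionaries.
PROTEINS = {"BRCA1", "TP53"}
PHENOTYPES = {"breast carcinoma", "short stature"}


@dataclass(frozen=True)
class CoMention:
    sentence: str
    protein: str
    phenotype: str


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_comentions(text):
    # Step 1: keep every (protein, phenotype) pair found in the same sentence.
    found = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for protein in sorted(PROTEINS):
            if not re.search(rf"\b{re.escape(protein)}\b", sentence):
                continue
            for phenotype in sorted(PHENOTYPES):
                if phenotype in lowered:
                    found.append(CoMention(sentence, protein, phenotype))
    return found


# Step 2 stand-in (assumption): a keyword rule in place of a trained model.
NEGATION_CUES = ("not associated", "no association", "unrelated")


def classify(comention):
    # Returns True when the co-mention is judged to express a valid relation.
    lowered = comention.sentence.lower()
    return not any(cue in lowered for cue in NEGATION_CUES)


def precision_recall_f1(gold, predicted):
    # Standard binary metrics over parallel lists of booleans.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sample = (
        "Mutations in BRCA1 are linked to breast carcinoma. "
        "TP53 was not associated with short stature."
    )
    for cm in extract_comentions(sample):
        print(cm.protein, "|", cm.phenotype, "|", classify(cm))
```

In a supervised setting, `classify` would be replaced by a model trained on annotated co-mentions, while the extraction step and the evaluation function could stay largely as they are.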

    • Published in

      BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
      September 2019
      716 pages
      ISBN: 9781450366663
      DOI: 10.1145/3307339

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      BCB '19 Paper Acceptance Rate: 42 of 157 submissions, 27%
      Overall Acceptance Rate: 254 of 885 submissions, 29%
