DOI: 10.1145/3539618.3591997
short-paper
Open Access

HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented Prompting

Published: 18 July 2023

ABSTRACT

Medical decision-making processes can be enhanced by comprehensive biomedical knowledge bases, which require fusing knowledge graphs constructed from different sources via a uniform index system. The index system often organizes biomedical terms in a hierarchy so that entities can be aligned at a fine granularity. To address the challenge of scarce supervision in the biomedical knowledge fusion (BKF) task, researchers have proposed various unsupervised methods. However, these methods rely heavily on ad-hoc lexical and structural matching algorithms, which fail to capture the rich semantics conveyed by biomedical entities and terms. Recently, neural embedding models have proved effective in semantic-rich tasks, but they need sufficient labeled data to be adequately trained. To bridge the gap between the label-scarce BKF task and neural embedding models, we propose HiPrompt, a supervision-efficient knowledge fusion framework that elicits the few-shot reasoning ability of large language models through hierarchy-oriented prompts. Empirical results on the collected KG-Hi-BKF benchmark datasets demonstrate the effectiveness of HiPrompt.

