ABSTRACT
Measures of semantic similarity between medical concepts are central to many techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate how effectively eight common corpus-driven measures capture semantic relatedness by comparing them against concept pairs whose relatedness was judged by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approximately 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used to prime the measure, that is, the corpus used as evidence from which the corpus-driven similarities are drawn. This paper provides guidelines for implementing semantic similarity measures in medical informatics and concludes with implications for medical information retrieval.
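The evaluation described in the abstract, scoring concept pairs with a corpus-driven measure and correlating those scores with human judgements, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not name the eight measures or the correlation statistic, so the sketch uses cosine similarity over toy concept vectors and Pearson correlation, and the concept names, vectors, and ratings are hypothetical rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def evaluate(vectors, ratings):
    """Correlate measure scores with human ratings over the same concept pairs."""
    pairs = sorted(ratings)
    measure_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    human_scores = [ratings[pair] for pair in pairs]
    return pearson(measure_scores, human_scores)


# Hypothetical concept vectors, standing in for representations built from a corpus.
concept_vectors = {
    "concept_a": [1.0, 0.0, 1.0],
    "concept_b": [1.0, 0.0, 0.8],
    "concept_c": [0.0, 1.0, 0.0],
}

# Hypothetical human relatedness ratings for each concept pair (higher = more related).
human_ratings = {
    ("concept_a", "concept_b"): 4.0,
    ("concept_a", "concept_c"): 1.0,
    ("concept_b", "concept_c"): 1.5,
}

if __name__ == "__main__":
    print(f"correlation with human judgements: {evaluate(concept_vectors, human_ratings):.3f}")
```

Rebuilding `concept_vectors` from a different corpus and calling `evaluate` again is one way to observe the corpus sensitivity the abstract reports.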
Index Terms
- An evaluation of corpus-driven measures of medical concept similarity for information retrieval