DOI: 10.1145/2396761.2398661
poster

An evaluation of corpus-driven measures of medical concept similarity for information retrieval

Published: 29 October 2012

ABSTRACT

Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare them against human-judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approximately 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used to prime the measure, i.e., the corpus used as the evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.
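
The evaluation described above, scoring concept pairs with a corpus-driven measure and correlating those scores with human judgements, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the eight measures or the correlation coefficient used, so cosine similarity over concept vectors and Pearson correlation are assumptions, and the names `cosine`, `pearson`, `evaluate_measure`, and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def evaluate_measure(vectors, judged_pairs):
    """Correlate a measure's scores with human scores.

    vectors: dict mapping a concept id to its corpus-derived vector
             (hypothetical representation; depends on the priming corpus).
    judged_pairs: list of (concept_a, concept_b, human_score) tuples.
    """
    predicted = [cosine(vectors[a], vectors[b]) for a, b, _ in judged_pairs]
    human = [score for _, _, score in judged_pairs]
    return pearson(predicted, human)


if __name__ == "__main__":
    # Toy, made-up vectors and scores purely for illustration; not data from the paper.
    toy_vectors = {
        "concept_a": [1.0, 0.0, 2.0],
        "concept_b": [0.9, 0.1, 1.8],
        "concept_c": [0.0, 3.0, 0.1],
    }
    toy_pairs = [
        ("concept_a", "concept_b", 4.0),
        ("concept_a", "concept_c", 1.0),
        ("concept_b", "concept_c", 1.5),
    ]
    print(evaluate_measure(toy_vectors, toy_pairs))
```

Under this setup, changing the priming corpus changes the vectors passed to `evaluate_measure`, which is where the corpus sensitivity reported in the abstract would show up.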


    • Published in

      CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
      October 2012
      2840 pages
      ISBN: 9781450311564
      DOI: 10.1145/2396761

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
