poster

An evaluation of corpus-driven measures of medical concept similarity for information retrieval

Authors:
Bevan Koopman (CSIRO, Brisbane, Australia)
Guido Zuccon (CSIRO, Brisbane, Australia)
Peter Bruza (Queensland University of Technology, Brisbane, Australia)
Laurianne Sitbon (Queensland University of Technology, Brisbane, Australia)
Michael Lawley (CSIRO, Brisbane, Australia)

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management, October 2012, Pages 2439–2442. https://doi.org/10.1145/2396761.2398661

Published: 29 October 2012

ABSTRACT

Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against human-judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approximately 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used to prime the measure, that is, the corpus used as evidence from which the corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.
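
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the measure (cosine over document-level co-occurrence counts), the rank correlation (Spearman; the abstract does not say which coefficient was used), the concept identifiers, the toy corpus and the human ratings are all assumptions made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_vectors(documents):
    """Map each concept to a Counter of the concepts it shares a document with."""
    vectors = {}
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical corpus: each document is the list of concept IDs found in it.
corpus = [
    ["heart_attack", "chest_pain", "aspirin"],
    ["myocardial_infarction", "chest_pain", "aspirin"],
    ["fracture", "cast", "x_ray"],
    ["heart_attack", "x_ray", "chest_pain"],
]

# Hypothetical human relatedness ratings for concept pairs (higher = more related).
human = {
    ("heart_attack", "myocardial_infarction"): 4.0,
    ("heart_attack", "fracture"): 1.0,
    ("aspirin", "cast"): 1.5,
}

vectors = cooccurrence_vectors(corpus)
pairs = list(human)
measure_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
human_scores = [human[p] for p in pairs]
print("Spearman correlation:", spearman(measure_scores, human_scores))
```

Swapping the `corpus` argument for a different collection while keeping the rated pairs fixed is one way to observe the corpus sensitivity the abstract reports; the toy data here is far too small to say anything about real performance.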

Index Terms

1. Information systems
  1. Information retrieval

Published in
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761
General Chair: Xuewen Chen (Wayne State University, USA)
Program Chairs: Guy Lebanon (Georgia Institute of Technology), Haixun Wang (Microsoft Research Asia), Mohammed J. Zaki (Rensselaer Polytechnic Institute)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
medical information retrieval
semantic similarity
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%