Methods Inf Med 2013; 52(05): 395-402
DOI: 10.3414/ME12-01-0054
Original Articles
Schattauer GmbH

Developing Topic-specific Search Filters for PubMed with Click-through Data

J. Li
1   Institute of Medical Information and Library, Chinese Academy of Medical Sciences, Beijing, China
2   National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Z. Lu
2   National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

Publication History

received: 12 June 2012

accepted: 27 February 2013

Publication Date:
20 January 2018 (online)

Summary

Objectives: Search filters have been developed to improve access to the immense and ever-growing body of biomedical publications. To date, however, the number of filters remains limited because current development methods require substantial human effort for manual document review and filter term selection. We therefore investigate automatic methods for generating search filters.

Methods: We present an automated method for developing topic-specific filters from users’ search logs in PubMed. For a given topic, we first detect the user queries relevant to it and take the articles clicked for those queries as the topic-relevant document set. Next, we statistically identify informative terms that best represent the topic-relevant document set by comparing it against a background set of topic-irrelevant articles. Lastly, the selected representative terms are combined with Boolean operators and evaluated on benchmark datasets to derive the final filter with the best performance.
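
The term-selection and combination steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not name the statistic used, so the smoothed log-odds score, the whitespace tokenizer, the OR-only combination, and the function names `rank_terms` and `build_filter` are all assumptions made for illustration. The `[tiab]` suffix is PubMed's Title/Abstract field tag.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def rank_terms(relevant_docs, background_docs, top_k=3):
    """Rank terms by a smoothed log-odds score (an assumed statistic):
    high when a term is frequent in the topic-relevant set and rare
    in the background set."""
    rel_df = document_frequency(relevant_docs)
    bg_df = document_frequency(background_docs)
    n_rel, n_bg = len(relevant_docs), len(background_docs)
    scores = {}
    for term, count in rel_df.items():
        # Add-0.5 smoothing keeps both proportions strictly inside (0, 1).
        p_rel = (count + 0.5) / (n_rel + 1.0)
        p_bg = (bg_df.get(term, 0) + 0.5) / (n_bg + 1.0)
        scores[term] = (math.log(p_rel / (1 - p_rel))
                        - math.log(p_bg / (1 - p_bg)))
    # Sort by descending score; break ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def build_filter(terms):
    """Join the selected terms with Boolean OR (assumed combination)."""
    return " OR ".join(f"{t}[tiab]" for t in terms)


# Toy document sets, invented for illustration only.
relevant = ["kidney dialysis outcomes",
            "renal kidney transplant",
            "kidney injury study"]
background = ["diabetes insulin study",
              "depression therapy outcomes",
              "pregnancy cohort study"]

top_terms = rank_terms(relevant, background, top_k=2)
query = build_filter(top_terms)
```

In the method described above, several candidate combinations would be evaluated on benchmark datasets and the best-performing one retained; the sketch builds only one candidate.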

Results: We applied our method to develop filters for four clinical topics: nephrology, diabetes, pregnancy, and depression. For the nephrology filter, our method obtained performance comparable to the state of the art (sensitivity of 91.3%, specificity of 98.7%, precision of 94.6%, and accuracy of 97.2%). Similarly, high-performing results (over 90% in all measures) were obtained for the other three search filters.
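
For reference, the four reported measures follow from the standard confusion-matrix definitions. The sketch below computes them in Python; the counts are hypothetical and are not taken from the study, and the function name `filter_metrics` is introduced here for illustration.

```python
def filter_metrics(tp, fp, tn, fn):
    """Compute standard evaluation measures for a search filter.

    tp: relevant articles retrieved      fp: irrelevant articles retrieved
    tn: irrelevant articles not retrieved fn: relevant articles missed
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of relevant articles retrieved
        "specificity": tn / (tn + fp),   # share of irrelevant articles excluded
        "precision": tp / (tp + fp),     # share of retrieved articles that are relevant
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = filter_metrics(tp=90, fp=5, tn=895, fn=10)
```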

Conclusion: Using PubMed click-through data, we developed a high-performance method for generating topic-specific search filters that is significantly more efficient than existing manual methods. All data sets used in this study (topic-relevant and topic-irrelevant document sets) and a demonstration system are publicly available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/downloads/CQ_filter/

 