Testing a Cancer Meta Spider
Introduction
In the healthcare profession, potentially significant decisions often depend on the availability of reliable and up-to-date information. Yet health-related data, especially Web-based data, are highly distributed, of varying quality, and difficult to locate. For instance, clinical information is often mingled with non-clinical information, and consumer information is not distinguished from clinical or research information (Hersh, 1996). Documents with different amounts of technical detail and varying quality are often mixed together in an unstructured way, and it has become increasingly difficult to judge the quality and credibility of a piece of Web-based medical information. As a result, medical professionals and the general public increasingly experience information and cognitive overload (Bowman et al., 1994) when seeking medical information.
The development of medical information retrieval (IR) systems such as medical search engines and peer-reviewed medical Web directories has helped alleviate this problem. However, the effectiveness and usefulness of these systems have been limited by low search precision and poor presentation of the retrieved documents. More specifically, the following should be noted.
- Finding specific answers to user questions can be time-consuming and expensive, in part because of the effort required to browse through large collections of returned documents and to identify the relevant ones. A search query in a general search engine like Google often returns thousands of results.
- Traditional search engines, including medical search engines, present search results as ranked lists, ordered by estimated relevance to the query. A major drawback of this presentation is that it fails to give users a quick overall “feel” for the retrieved documents and often requires significant manual browsing effort from users to locate documents of interest.
The rest of the paper is structured as follows. Section 2 surveys the current status of IR in the healthcare domain and some IR techniques, including meta-searching, document indexing, and post-retrieval clustering and visualization. In Section 3, we present the architectural design and major components of Cancer Spider. Section 4 discusses the design of a user study conducted to evaluate the proposed approach, and Section 5 reports and discusses the findings of this user study. We conclude the paper in Section 6 with a discussion of future research and system development.
Section snippets
Information retrieval in healthcare
Healthcare is an information-intensive business. Hersh (1996) classified textual health information into two main categories. The first category is patient-specific information. The second category is knowledge-based information, which can be further divided into the following three layers. Primary knowledge-based information contains original research reported in academic journals, books, technical reports, and other sources. Secondary knowledge-based information consists of indexes that
Cancer Spider system architecture
In this section, we present the architectural design of Cancer Spider, focusing on its use both as a medical meta-search tool and as a document clustering and visualization tool. Although the current implementation of Cancer Spider focuses on cancer-related topics, it can be easily adapted for other medical areas, since the core technologies are domain-independent.
Intended end-users of Cancer Spider are cancer researchers, physicians and medical librarians. The design goal of Cancer Spider is
Comparison base
We conducted a user study to evaluate the proposed approach of meta-searching, clustering, and visualization implemented in the Cancer Spider system. In our experiment, Cancer Spider was compared with NLM Gateway (http://gateway.nlm.nih.gov/gw/Cmd?GMBasicSearch), the portal search engine to NLM's multiple literature databases.
NLM Gateway is a Web-based system that lets users simultaneously search multiple retrieval systems at the US National Library of Medicine (NLM) through a unified Web
Performance
Measured by theme-based precision and recall, the performance of the two IR systems, Cancer Spider and NLM Gateway, was comparable. The main statistics are summarized in Table 1. Both the mean precision and the mean recall of Cancer Spider were comparable with those of NLM Gateway. On average, NLM Gateway seemed to be slightly better than Cancer Spider, but the difference was not statistically significant, as suggested by the p-value of the pair-wise t-test. The F-measure, which was the
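The measures named in this section can be illustrated with a minimal sketch. This is not the study's evaluation code. It assumes that each system's output and the expert judgments can be reduced to sets of theme labels (the snippet does not define theme-based precision and recall), it uses the balanced F-measure, and it computes only the t statistic of a paired t-test. The function names and all sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def precision_recall_f(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure.

    Assumption: both arguments are collections of theme labels.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t-test, with len(scores_a) - 1 degrees of freedom."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Hypothetical example: themes found by one subject versus expert-identified themes.
found = {"chemotherapy", "radiation", "surgery", "diet"}
expert = {"chemotherapy", "radiation", "immunotherapy"}
print(precision_recall_f(found, expert))

# Hypothetical per-subject precision scores under the two systems.
system_a = [0.50, 0.62, 0.40, 0.71]
system_b = [0.55, 0.60, 0.48, 0.70]
print(paired_t_statistic(system_a, system_b))
```

Converting the t statistic into a p-value requires the Student t distribution, which the Python standard library does not provide; a statistics package or a t table would be needed for that step.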
Conclusions and future directions
Both Cancer Spider and NLM Gateway have strengths and weaknesses. NLM Gateway, as a portal site to many high-quality NLM databases, is a conventional medical search engine. Cancer Spider's data coverage is not as comprehensive as that of NLM Gateway. However, Cancer Spider draws its strength from post-retrieval processing and clustering capability.
With respect to system performance measured by theme-based precision and recall, Cancer Spider and NLM Gateway achieved comparable results. When
Acknowledgements
The Cancer Spider project is supported in part by the following research grants:
- NSF Digital Library Initiative-2, “High-Performance Digital Library Systems: From Information to Knowledge Management,” IIS-9817473, April 1999–March 2002.
- NIH/NLM, “UMLS Enhanced Dynamic Agents to Manage Medical Knowledge,” 1 R01 LM06919-1A1, February 2001–January 2004.
We would like to thank the researchers at the Arizona Cancer Center who participated in the user study and the librarians at the Arizona Health and
References (37)
- et al. The retrieval effectiveness of medical information on the Web. International Journal of Medical Informatics (2001).
- et al. Internet categorization and search: a self-organizing approach. Journal of Visual Communication and Image Representation (1996).
- Designing information-abundant Web sites: issues and recommendations. International Journal of Human–Computer Studies (1997).
- et al. Evaluating the effectiveness of visual user interfaces for information retrieval. International Journal of Human–Computer Studies (2000).
- et al. Scalable Internet resource discovery: research problems and approaches. Communications of the ACM (1994).
- Transformation-based error-driven learning and natural language processing. Computational Linguistics (1995).
- Chau, M., Zeng, D., Chen, H., 2001. Personalized spiders for Web search and analysis. In: Proceedings of the First...
- et al. Internet browsing and searching: user evaluations of category map and concept space techniques. Journal of the American Society for Information Science (1998).
- et al. MetaSpider: meta-searching and categorization on the Web. Journal of the American Society for Information Science & Technology (2001).
- et al. Profusion: intelligent fusion from multiple different search engines. Journal of Universal Computer Science (1996).
- How effective is suffixing? Journal of the American Society for Information Science.
- Information Retrieval: A Health Care Perspective.
- SavvySearch: a meta-search engine that learns which search engines to query. AI Magazine.
- Stemming algorithms: a case study for detailed evaluation. Journal of the American Society for Information Science.
- Medical Information on the Internet: A Guide for Health Professionals.
- Self-Organizing Maps.