Abstract
PURPOSE: We evaluate how argumentation in scientific articles can be used to propose an original index pruning strategy, which significantly reduces the size of the engine’s indexes while having a limited impact on retrieval effectiveness. METHODS: A Bayesian classifier trained on explicitly structured MEDLINE abstracts generates the argumentative categories. The categories are used to generate four different argumentative indexes. A fifth index contains the complete abstract, together with the title and the list of Medical Subject Headings (MeSH) terms. This last index is used as a baseline against which to compare the results obtained when only a specific argumentative index is queried. RESULTS and CONCLUSION: When titles and Medical Subject Headings are also stored in the respective indexes, querying the PURPOSE and CONCLUSION indexes achieves 78.4% and 74.3% of the baseline, respectively, while the size of the index is divided by two. It is concluded that argumentation can be a powerful index pruning strategy, complementing more traditional approaches.
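The pipeline summarised in the abstract can be illustrated with a small sketch: classify each sentence into an argumentative category, then build one inverted index per category alongside a full baseline index. This is not the authors' implementation. The four category names are inferred from the abstract's own section labels, and the classifier (a multinomial naive Bayes with add-one smoothing), the tokenizer, the toy training sentences, and the helper names `build_indexes` and `postings` are all assumptions made for illustration. Titles and MeSH terms are omitted for brevity.

```python
import math
from collections import Counter, defaultdict

# Assumed category set, inferred from the labels used in structured abstracts.
CATEGORIES = ("PURPOSE", "METHODS", "RESULTS", "CONCLUSION")


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {c: Counter() for c in CATEGORIES}
        self.sentence_counts = Counter()
        self.vocabulary = set()

    def train(self, labelled_sentences):
        for label, sentence in labelled_sentences:
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.sentence_counts[label] += 1
            self.vocabulary.update(tokens)

    def classify(self, sentence):
        tokens = tokenize(sentence)
        total = sum(self.sentence_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for c in CATEGORIES:
            n_words = sum(self.word_counts[c].values())
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.sentence_counts[c] + 1) / (total + len(CATEGORIES)))
            for tok in tokens:
                score += math.log((self.word_counts[c][tok] + 1) / (n_words + vocab_size))
            scores[c] = score
        return max(scores, key=scores.get)


def build_indexes(classifier, abstracts):
    """Return one inverted index (term -> set of doc ids) per category, plus FULL."""
    indexes = {name: defaultdict(set) for name in CATEGORIES + ("FULL",)}
    for doc_id, sentences in abstracts.items():
        for sentence in sentences:
            category = classifier.classify(sentence)
            for tok in tokenize(sentence):
                indexes[category][tok].add(doc_id)
                indexes["FULL"][tok].add(doc_id)
    return indexes


def postings(index):
    """Index size measured as the total number of (term, document) postings."""
    return sum(len(doc_ids) for doc_ids in index.values())


# Toy training data standing in for explicitly structured MEDLINE abstracts.
TRAINING = [
    ("PURPOSE", "We aim to evaluate the effect of the treatment"),
    ("PURPOSE", "The objective of this study was to assess the risk"),
    ("METHODS", "Patients were randomly assigned to two groups"),
    ("METHODS", "Samples were analysed using a standard assay"),
    ("RESULTS", "Mortality was significantly lower in the treated group"),
    ("RESULTS", "We observed a significant increase in survival"),
    ("CONCLUSION", "We conclude that the treatment is effective"),
    ("CONCLUSION", "These findings suggest that screening should be recommended"),
]

# Toy unstructured abstracts, given as lists of sentences.
ABSTRACTS = {
    "doc1": [
        "The objective was to assess the effect of screening",
        "Patients were assigned to two groups",
        "Mortality was lower in the screening group",
        "We conclude that screening is effective",
    ],
    "doc2": [
        "We aim to evaluate a standard assay",
        "Samples were analysed using the assay",
        "These findings suggest that the assay should be recommended",
    ],
}

classifier = NaiveBayes()
classifier.train(TRAINING)
indexes = build_indexes(classifier, ABSTRACTS)

for name, index in indexes.items():
    print(name, postings(index))
```

Because every sentence contributes to the FULL index and to exactly one argumentative index, each argumentative index can never hold more postings than the baseline. The size reduction and the retrieval effectiveness reported in the abstract depend on the real collection and classifier, which this toy example does not reproduce.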
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ruch, P., Baud, R., Marty, J., Geissbühler, A., Tbahriti, I., Veuthey, AL. (2005). Latent Argumentative Pruning for Compact MEDLINE Indexing. In: Miksch, S., Hunter, J., Keravnou, E.T. (eds) Artificial Intelligence in Medicine. AIME 2005. Lecture Notes in Computer Science, vol 3581. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527770_36
DOI: https://doi.org/10.1007/11527770_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27831-3
Online ISBN: 978-3-540-31884-2
eBook Packages: Computer Science, Computer Science (R0)