Abstract
This paper proposes using association rule mining algorithms as an approach to Cross-Language Information Retrieval. These algorithms have been widely used to analyse market basket data. The idea is to map the problem of finding associations between sales items to the problem of finding term translations over a parallel corpus. The proposal was validated through experiments using queries in two distinct languages, Portuguese and Finnish, to retrieve documents in English. The results show that the performance of the proposed approach is comparable to that of the monolingual baseline and of query translation via machine translation, even though these systems employ more complex Natural Language Processing techniques. Combining machine translation with our approach yielded the best results, even outperforming the monolingual baseline.
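To make the mapping from market baskets to term translation concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each aligned sentence pair of a parallel corpus is treated as one transaction, and it scores a rule "source term → target term" by support (co-occurrence count) and confidence (co-occurrence count divided by the source term's count). The function names, the thresholds, and the toy Portuguese-English corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def mine_translation_rules(pairs, min_support=2, min_confidence=0.5):
    """Mine rules source_term -> target_term from aligned sentence pairs.

    Each (source_sentence, target_sentence) pair plays the role of one
    market-basket transaction. Thresholds are illustrative assumptions.
    """
    src_count = Counter()   # number of pairs containing each source term
    pair_count = Counter()  # number of pairs containing (source, target)
    for src_sentence, tgt_sentence in pairs:
        src_terms = set(src_sentence.lower().split())
        tgt_terms = set(tgt_sentence.lower().split())
        src_count.update(src_terms)
        pair_count.update(product(src_terms, tgt_terms))

    rules = {}
    for (s, t), support in pair_count.items():
        confidence = support / src_count[s]
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(s, []).append((t, confidence))
    # Highest-confidence candidate first; ties broken alphabetically.
    for s in rules:
        rules[s].sort(key=lambda r: (-r[1], r[0]))
    return rules


def translate_query(query, rules):
    """Replace each query term by its best-scoring rule consequent.

    Terms with no rule are kept unchanged (an assumption of this sketch).
    """
    translated = []
    for term in query.lower().split():
        candidates = rules.get(term)
        translated.append(candidates[0][0] if candidates else term)
    return translated


# Hypothetical toy parallel corpus (Portuguese -> English).
pairs = [
    ("casa azul", "blue house"),
    ("casa grande", "big house"),
    ("carro azul", "blue car"),
    ("carro grande", "big car"),
]
rules = mine_translation_rules(pairs)
print(translate_query("casa azul", rules))
```

On this toy corpus, only the correct pairings co-occur in two sentence pairs, so the support threshold filters out the incidental ones. A real parallel corpus would need preprocessing such as stop-word removal and stemming, and far more data, before thresholds like these become meaningful.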
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Geraldo, A.P., Moreira, V.P., Gonçalves, M.A. (2009). On-Demand Associative Cross-Language Information Retrieval. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_16
DOI: https://doi.org/10.1007/978-3-642-03784-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03783-2
Online ISBN: 978-3-642-03784-9
eBook Packages: Computer Science, Computer Science (R0)