Abstract
The paper studies concept-based cross-language information retrieval (CLIR). The document collection was a subset of the TREC collection, and the test requests were formed from TREC's health-related topics. Two translation dictionaries were used: a general dictionary and a domain-specific (medical) dictionary. The effects of translation method, conjunction, and facet order on the effectiveness of concept-based cross-language queries were studied, and concept-based structuring of cross-language queries was compared to mechanical structuring based on the output of dictionaries. The performance of translated Finnish queries against English documents was compared to the performance of original English queries against the same documents, and the different CLIR query types were compared with one another. No major difference was found between concept-based and mechanical structuring. The best translation method was a simultaneous look-up in the medical dictionary and the general dictionary, in which case cross-language queries performed as well as the original English queries. The results showed that cross-language queries perform well relative to monolingual queries especially at high levels of exhaustivity (the number of mutually restrictive concepts in a request). This suggests that conjunction disambiguates cross-language queries. An extensive study was made of the relative importance of the concepts of requests. On the basis of the classification data of request concepts, it was shown how the order of facets in a query affects cross-language as well as monolingual queries.
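To illustrate how conjunction can disambiguate a translated query, the following is a minimal sketch, not the retrieval system or the dictionaries used in the study. The dictionary entries, the documents, and the function names (`translate`, `build_query`, `matches`) are all hypothetical, and strict Boolean matching is assumed purely for simplicity. Each Finnish concept is looked up in both dictionaries at once, its translations form one facet, and a document must match every facet.

```python
# Toy dictionaries: illustrative assumptions, not the study's actual data.
GENERAL = {"sydän": ["heart", "core"], "kohtaus": ["attack", "scene"]}
MEDICAL = {"sydän": ["heart", "cardiac"], "kohtaus": ["attack", "seizure"]}


def translate(word):
    """Simultaneous look-up: merge translations from both dictionaries.

    Medical translations come first; duplicates are dropped.
    """
    merged = []
    for dictionary in (MEDICAL, GENERAL):
        for target in dictionary.get(word, []):
            if target not in merged:
                merged.append(target)
    return merged


def build_query(concepts):
    """Turn each source-language concept into one facet (a set of synonyms)."""
    return [set(translate(word)) for word in concepts]


def matches(query, document):
    """Conjunction of facets: every facet must share a word with the document."""
    tokens = set(document.lower().split())
    return all(facet & tokens for facet in query)


docs = [
    "patient admitted after a heart attack",
    "the opening scene of the play",
    "reactor core inspection report",
]

one_facet = build_query(["kohtaus"])
two_facets = build_query(["sydän", "kohtaus"])

for doc in docs:
    print(doc, "|", matches(one_facet, doc), "|", matches(two_facets, doc))
```

In this toy setting the single-facet query also retrieves the document about a theatre scene, because "scene" is a wrong-sense translation of "kohtaus". Adding the second facet removes it, since a wrong-sense translation of one concept rarely co-occurs with translations of the other.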
Pirkola, A., Keskustalo, H. & Järvelin, K. The Effects of Conjunction, Facet Structure, and Dictionary Combinations in Concept-Based Cross-Language Retrieval. Information Retrieval 1, 217–250 (1999). https://doi.org/10.1023/A:1009939707058
DOI: https://doi.org/10.1023/A:1009939707058