Abstract
In this article we describe a hybrid technique for dictionary-based query translation suitable for English-Chinese cross language information retrieval. This technique marries a graph-based model for the resolution of candidate term ambiguity with a pattern-based method for the translation of out-of-vocabulary (OOV) terms. We evaluate the performance of this hybrid technique in an experiment using several NTCIR test collections. Experimental results indicate a substantial increase in retrieval effectiveness over various baseline systems incorporating machine- and dictionary-based translation.
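To illustrate the general idea of graph-based resolution of candidate term ambiguity, here is a minimal Python sketch. It is not the authors' model: the abstract does not specify the graph construction or the ranking function, so the sketch assumes a weighted PageRank-style walk over candidate translations. The function names (`rank_candidates`, `select_translations`), the toy dictionary, the association weights in `cooc`, and the damping value are all hypothetical.

```python
from itertools import combinations


def rank_candidates(candidates, cooc, damping=0.85, iterations=50):
    """Score candidate translations with a weighted PageRank-style walk.

    candidates: dict mapping a source term to its candidate translations.
    cooc: dict mapping frozenset({a, b}) to an association weight between
          two translations (hypothetical toy values, not real corpus counts).
    """
    nodes = [(src, tgt) for src, tgts in candidates.items() for tgt in tgts]

    # Link only candidates that belong to different source terms, so that
    # competing translations of the same term never reinforce each other.
    weights = {node: {} for node in nodes}
    for a, b in combinations(nodes, 2):
        if a[0] == b[0]:
            continue
        w = cooc.get(frozenset((a[1], b[1])), 0.0)
        if w > 0:
            weights[a][b] = w
            weights[b][a] = w

    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            incoming = 0.0
            for neighbour, w in weights[node].items():
                # Each neighbour shares its score in proportion to edge weight.
                incoming += score[neighbour] * w / sum(weights[neighbour].values())
            updated[node] = (1 - damping) / len(nodes) + damping * incoming
        score = updated
    return score


def select_translations(candidates, cooc):
    """Keep the highest-scoring candidate for each source term."""
    score = rank_candidates(candidates, cooc)
    return {
        src: max(tgts, key=lambda tgt: score[(src, tgt)])
        for src, tgts in candidates.items()
    }


# Toy data (assumed for illustration only).
candidates = {
    "bank": ["银行", "河岸"],      # financial institution / river bank
    "interest": ["利息", "兴趣"],  # money paid on a loan / curiosity
}
cooc = {
    frozenset(("银行", "利息")): 9.0,
    frozenset(("银行", "兴趣")): 1.0,
    frozenset(("河岸", "兴趣")): 1.0,
}

best = select_translations(candidates, cooc)
```

On this toy input the walk favours the mutually reinforcing pair, so `best` maps "bank" to "银行" and "interest" to "利息". In a real system the association weights would come from target-language corpus statistics, and OOV terms would have to be handled separately, for example by the pattern-based method the abstract mentions.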