Abstract
We present a framework for mining synonymous transliterations from Web pages collected via a search engine. We propose an integrated statistical measure for selecting the search keywords that are submitted to the search engine to retrieve relevant Web snippets. To help identify synonymous transliterations, we compare the similarity between pairs of transliterations. Experimental results show that the framework harvests about 5.04 synonymous transliterations per input transliteration on average. The retrieved results could be beneficial for constructing ontologies, especially in the domain of foreign person names.
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hsu, CC., Chen, CH. (2008). Synonymous Chinese Transliterations Retrieval from World Wide Web by Using Association Words. In: Bubak, M., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds) Computational Science – ICCS 2008. ICCS 2008. Lecture Notes in Computer Science, vol 5101. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69384-0_96
DOI: https://doi.org/10.1007/978-3-540-69384-0_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69383-3
Online ISBN: 978-3-540-69384-0
eBook Packages: Computer Science (R0)