Abstract
We describe syntactic structure transfer, a central design question in machine translation, between two languages, Tamil (source) and Hindi (target), which belong to two different language families, Dravidian and Indo-Aryan respectively. Tamil and Hindi differ extensively at the level of clausal construction, which makes transferring the structure difficult. The approach described here is hybrid: we use conditional random fields (CRFs) to identify clause boundaries in the source language, Transformation-Based Learning (TBL) to extract the transfer rules, and a semantic classification of postpositions (PSP) to choose the semantically appropriate structure in constructions that have a one-to-many mapping in the target language. We have evaluated the system using web data and the results are encouraging.
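To make the one-to-many mapping step concrete, the following is a minimal sketch of choosing a target postposition from a semantic class. It is not the authors' implementation: the function name `choose_psp`, the class labels `container` and `surface`, the fallback table, and the single toy entry (Tamil locative marker `il` rendered as Hindi `meM` or `par`) are all assumptions made for illustration, since the abstract does not give the actual classification.

```python
# Illustrative sketch only. The semantic classes, the mapping table, and the
# default choice below are hypothetical and are not taken from the paper.

# source case marker -> {semantic class of the head noun: target postposition}
PSP_CANDIDATES = {
    "il": {"container": "meM", "surface": "par"},
}

# Fallback used when the semantic class is unknown (an assumed default).
DEFAULT_PSP = {"il": "meM"}


def choose_psp(source_marker, noun_class):
    """Pick one target postposition for a source marker with several options.

    Returns the postposition listed for the given semantic class, the
    default for that marker if the class is not listed, or None if the
    marker itself is not in the table.
    """
    candidates = PSP_CANDIDATES.get(source_marker, {})
    if noun_class in candidates:
        return candidates[noun_class]
    return DEFAULT_PSP.get(source_marker)


if __name__ == "__main__":
    print(choose_psp("il", "container"))
    print(choose_psp("il", "surface"))
```

In a full system a lookup of this kind would run after clause boundaries have been identified and the learned transfer rules applied; here it stands alone to show only the disambiguation idea.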
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lalitha Devi, S., Ram R, V.S., Pralayankar, P., T, B. (2010). Syntactic Structure Transfer in a Tamil to Hindi MT System – A Hybrid Approach. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_37
DOI: https://doi.org/10.1007/978-3-642-12116-6_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)