ABSTRACT
We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence and possible hidden variables. This approach allows a baseline machine translation system to be extended easily by adding new feature functions. We show that a baseline statistical machine translation system is significantly improved using this approach.
- L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer. 1986. Maximum mutual information estimation of hidden markov model parameters. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 49--52, Tokyo, Japan, April.Google Scholar
- A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39--72, March. Google ScholarDigital Library
- P. Beyerlein. 1997. Discriminative model combination. In Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, pages 238--245, Santa Barbara, CA, December.Google ScholarCross Ref
- P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263--311. Google ScholarDigital Library
- J. N. Darroch and D. Ratcliff. 1972. Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics, 43:1470--1480.Google ScholarCross Ref
- B. H. Juang, W. Chou, and C. H. Lee. 1995. Statistical and discriminative methods for speech recognition. In A. J. R. Ayuso and J. M. L. Soler, editors, Speech Recognition and Coding - New Advances and Trends. Springer Verlag, Berlin, Germany.Google Scholar
- H. Ney. 1995. On the probabilistic-interpretation of neural-network classifiers and discriminative training criteria. IEEE Trans. on Pattern Analysis and Machine Intelligence, 17(2):107--119, February. Google ScholarDigital Library
- S. Nießen, F. J. Och, G. Leusch, and H. Ney. 2000. An evaluation tool for machine translation: Fast evaluation for MT research. In Proc. of the Second Int. Conf. on Language Resources and Evaluation (LREC), pages 39--45, Athens, Greece, May.Google Scholar
- F. J. Och, C. Tillmann, and H. Ney. 1999. Improved alignment models for statistical machine translation. In Proc. of the Joint SIGDAT Conf. on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 20--28, University of Maryland, College Park, MD, June. Google ScholarDigital Library
- K. A. Papineni, S. Roukos, and R. T. Ward. 1997. Feature-based language understanding. In European Conf. on Speech Communication and Technology, pages 1435--1438, Rhodes, Greece, September.Google Scholar
- K. A. Papineni, S. Roukos, and R. T. Ward. 1998. Maximum likelihood and discriminative training of direct translation models. In Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 189--192, Seattle, WA, May.Google Scholar
- K. A. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. 2001. Bleu: a method for automatic evaluation of machine translation. Technical Report RC22176 (W0109-022), IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY, September.Google Scholar
- J. Peters and D. Klakow. 1999. Compact maximum entropy language models. In Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, Keystone, CO, December.Google Scholar
- R. Schlüter and H. Ney. 2001. Model-based MCE bound to the true Bayes' error. IEEE Signal Processing Letters, 8(5):131--133, May.Google ScholarCross Ref
- W. Wahlster. 1993. Verbmobil: Translation of face-to-face dialogs. In Proc. of MT Summit IV, pages 127--135, Kobe, Japan, July.Google Scholar
- Discriminative training and maximum entropy models for statistical machine translation
Recommendations
Syntactic discriminative language model rerankers for statistical machine translation
This article describes a method that successfully exploits syntactic features for n-best translation candidate reranking using perceptrons. We motivate the utility of syntax by demonstrating the superior performance of parsers over n-gram language ...
N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination
EACL '09: Proceedings of the 12th Conference of the European Chapter of the Association for Computational LinguisticsIn this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven ...
Maximum entropy based phrase reordering model for statistical machine translation
ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational LinguisticsWe propose a novel reordering model for phrase-based statistical machine translation (SMT) that uses a maximum entropy (MaxEnt) model to predicate reorderings of neighbor blocks (phrase pairs). The model provides content-dependent, hierarchical phrasal ...
Comments