Article

Discriminative training and maximum entropy models for statistical machine translation

Authors:
Franz Josef Och, University of Technology, Aachen, Germany
Hermann Ney, University of Technology, Aachen, Germany

ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, July 2002, pages 295–302
https://doi.org/10.3115/1073083.1073133

Published: 6 July 2002

ABSTRACT

We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence and possible hidden variables. This approach allows a baseline machine translation system to be extended easily by adding new feature functions. We show that a baseline statistical machine translation system is significantly improved using this approach.
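The core idea of the abstract can be sketched in a few lines of Python. In a direct maximum entropy (log-linear) model, each knowledge source is a feature function h_m(e, f) with a weight λ_m. The chosen translation is the candidate e that maximizes Σ_m λ_m h_m(e, f). With h_1 = log p(e), h_2 = log p(f | e) and λ_1 = λ_2 = 1, this reduces exactly to the source-channel decision rule argmax_e p(e) p(f | e). This is a minimal sketch under stated assumptions, not the authors' implementation. The candidate list, the probability tables `LM` and `TM`, the word-count feature `h_len` and its hand-set weight are all hypothetical. Hidden variables are omitted, and the weights are fixed by hand instead of being trained.

```python
import math
from typing import Callable, Dict, List, Sequence

# A feature function h_m(e, f) scores a (target e, source f) sentence pair.
# Hidden variables such as alignments are omitted in this sketch.
FeatureFn = Callable[[str, str], float]


def loglinear_score(e: str, f: str, features: Sequence[FeatureFn],
                    lambdas: Sequence[float]) -> float:
    """Unnormalised log-linear score: sum_m lambda_m * h_m(e, f)."""
    return sum(lam * h(e, f) for lam, h in zip(lambdas, features))


def posterior(candidates: List[str], f: str, features: Sequence[FeatureFn],
              lambdas: Sequence[float]) -> Dict[str, float]:
    """Pr(e | f) restricted to a finite candidate list (softmax over scores)."""
    scores = {e: loglinear_score(e, f, features, lambdas) for e in candidates}
    top = max(scores.values())  # subtract the maximum for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {e: math.exp(s - top) / z for e, s in scores.items()}


def decide(candidates: List[str], f: str, features: Sequence[FeatureFn],
           lambdas: Sequence[float]) -> str:
    """argmax_e of the weighted feature sum; the normaliser does not matter."""
    return max(candidates, key=lambda e: loglinear_score(e, f, features, lambdas))


# Hypothetical toy models for a single source sentence (numbers are made up).
SRC = "das Haus ist klein"
LM = {"the house is small": 0.020, "small is the house": 0.002,
      "the house is little": 0.008, "the small house": 0.004}   # p(e)
TM = {"the house is small": 0.10, "small is the house": 0.30,
      "the house is little": 0.20, "the small house": 0.25}     # p(SRC | e)
CANDIDATES = list(LM)


def h_lm(e: str, f: str) -> float:
    return math.log(LM[e])            # h_1(e, f) = log p(e)


def h_tm(e: str, f: str) -> float:
    return math.log(TM[e])            # h_2(e, f) = log p(f | e); table only covers SRC


def h_len(e: str, f: str) -> float:
    return float(len(e.split()))      # hypothetical extra feature: target word count


if __name__ == "__main__":
    # lambda_1 = lambda_2 = 1 reproduces the source-channel rule argmax p(e) p(f|e).
    baseline = decide(CANDIDATES, SRC, [h_lm, h_tm], [1.0, 1.0])
    # Extending the baseline means appending a feature and its weight.
    extended = decide(CANDIDATES, SRC, [h_lm, h_tm, h_len], [1.0, 1.0, -1.0])
    print(baseline)   # the house is small
    print(extended)   # the small house
```

The normalising denominator is the same for every candidate, so `decide` never computes it. `posterior` is only needed when actual probabilities are wanted. Adding a knowledge source leaves the existing features untouched: the system only needs one more function and one more weight.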

Index Terms
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Published in
ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
July 2002
543 pages
General Chair: Pierre Isabelle
Publisher: Association for Computational Linguistics, United States
Qualifiers
- Article

Acceptance Rates
Overall acceptance rate: 85 of 443 submissions, 19%

Article Metrics
- 215 total citations
- 2,435 total downloads
- Downloads (last 12 months): 43
- Downloads (last 6 weeks): 5
