DOI: 10.3115/1073083.1073133

Discriminative training and maximum entropy models for statistical machine translation

Published: 6 July 2002

ABSTRACT

We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence and possible hidden variables. This approach allows a baseline machine translation system to be extended easily by adding new feature functions. We show that a baseline statistical machine translation system is significantly improved using this approach.
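The idea in the abstract can be illustrated with a minimal sketch, assuming a log-linear model in which each knowledge source is a feature function h(e, f) with its own weight. Everything named here is hypothetical and chosen for illustration only: the toy probability tables, the functions `h_lm`, `h_tm`, and `h_len`, and the weights. Hidden variables, which the abstract allows, are left out. With exactly two features (log language model and log translation model) and both weights set to 1, the decision rule reduces to the source-channel rule; adding a third feature extends the system without changing the scoring code.

```python
import math

def score(e, f, feature_functions, weights):
    """Weighted sum of feature values for target sentence e and source sentence f."""
    return sum(w * h(e, f) for w, h in zip(weights, feature_functions))

def loglinear_posterior(candidates, f, feature_functions, weights):
    """Normalized p(e | f) over a finite candidate list under a log-linear model."""
    scores = [score(e, f, feature_functions, weights) for e in candidates]
    top = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {e: x / z for e, x in zip(candidates, exps)}

def best_translation(candidates, f, feature_functions, weights):
    """Argmax over candidates; the normalizer cancels, so raw scores suffice."""
    return max(candidates, key=lambda e: score(e, f, feature_functions, weights))

# Toy, made-up probabilities (assumptions for illustration, not data from the paper).
TOY_LM = {"the house": 0.6, "house the": 0.1}
TOY_TM = {("das Haus", "the house"): 0.5, ("das Haus", "house the"): 0.5}

def h_lm(e, f):
    return math.log(TOY_LM[e])        # language model feature, log p(e)

def h_tm(e, f):
    return math.log(TOY_TM[(f, e)])   # translation model feature, log p(f | e)

def h_len(e, f):
    return float(len(e.split()))      # extra feature: target length in words

candidates = ["the house", "house the"]

# Source-channel special case: two features, both weights equal to 1.
source_channel_best = best_translation(candidates, "das Haus", [h_lm, h_tm], [1.0, 1.0])

# Extended system: same code, one more feature function and weight.
posterior = loglinear_posterior(
    candidates, "das Haus", [h_lm, h_tm, h_len], [1.0, 1.0, 0.2]
)
```

In this sketch the weights are fixed by hand. Choosing them by discriminative training, as the title describes, would be a separate step that is not shown here.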

Published in

ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
July 2002
543 pages

Publisher

Association for Computational Linguistics
United States

Qualifiers

Article

Acceptance Rates

Overall acceptance rate: 85 of 443 submissions, 19%
