ABSTRACT
In (Chen, 2009), we show that for a variety of language models belonging to the exponential family, the test set cross-entropy of a model can be accurately predicted from its training set cross-entropy and its parameter values. In this work, we show how this relationship can be used to motivate two heuristics for "shrinking" the size of a language model to improve its performance. We use the first heuristic to develop a novel class-based language model that outperforms a baseline word trigram model by 28% in perplexity and 1.9% absolute in speech recognition word-error rate on Wall Street Journal data. We use the second heuristic to motivate a regularized version of minimum discrimination information models and show that this method outperforms other techniques for domain adaptation.
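The prediction rule underlying this abstract can be sketched concretely. The Python fragment below is a minimal illustration, assuming the form reported in Chen (2008; 2009): predicted test-set cross-entropy equals training-set cross-entropy plus a penalty proportional to the sum of the magnitudes of the model's parameters, scaled by the training set size. The function name and the constant `gamma` are assumptions for illustration, not code or values taken from this paper; the cross-entropy units must match those used when the constant was fit.

```python
# Minimal sketch of the performance-prediction relationship assumed from
# Chen (2008; 2009). Both the functional form and the constant gamma are
# assumptions here, not the paper's own code.

def predict_test_cross_entropy(train_cross_entropy, lambdas,
                               num_train_events, gamma=0.938):
    """Predict test-set cross-entropy for an exponential language model.

    train_cross_entropy -- cross-entropy of the model on its training set
    lambdas             -- iterable of the model's parameter values
    num_train_events    -- number of events (words) in the training set
    gamma               -- empirical constant (value assumed here)
    """
    l1_norm = sum(abs(lam) for lam in lambdas)  # sum of parameter magnitudes
    return train_cross_entropy + gamma * l1_norm / num_train_events

# Toy usage: 4 parameters of magnitude 0.5 trained on 100 events adds a
# predicted penalty of 0.938 * 2.0 / 100 ~= 0.019 to the training value.
print(predict_test_cross_entropy(7.0, [0.5] * 4, 100))
```

Under a relationship of this shape, the "shrinking" heuristics in the abstract amount to reducing the parameter-magnitude term without raising training cross-entropy by as much, so that the predicted (and actual) test performance improves.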
REFERENCES
- Michiel Bacchiani, Michael Riley, Brian Roark, and Richard Sproat. 2006. MAP adaptation of stochastic grammars. Computer Speech and Language, 20(1):41--68.
- Jerome R. Bellegarda. 2004. Statistical language model adaptation: review and perspectives. Speech Communication, 42(1):93--108.
- Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467--479, December.
- Stanley F. Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University.
- Stanley F. Chen. 2008. Performance prediction for exponential language models. Technical Report RC 24671, IBM Research Division, October.
- Stanley F. Chen. 2009. Performance prediction for exponential language models. In Proc. of HLT-NAACL.
- Chuang-Hua Chueh and Jen-Tzung Chien. 2008. Reliable feature selection for language model adaptation. In Proc. of ICASSP, pp. 5089--5092.
- Stephen Della Pietra, Vincent Della Pietra, Robert L. Mercer, and Salim Roukos. 1992. Adaptive language modeling using minimum discriminant estimation. In Proc. of the Speech and Natural Language DARPA Workshop, February.
- Marcello Federico. 1996. Bayesian estimation methods for n-gram language model adaptation. In Proc. of ICSLP, pp. 240--243.
- Marcello Federico. 1999. Efficient language model adaptation through MDI estimation. In Proc. of Eurospeech, pp. 1583--1586.
- Joshua T. Goodman. 2001. A bit of progress in language modeling. Technical Report MSR-TR-2001-72, Microsoft Research.
- Rukmini Iyer, Mari Ostendorf, and Herbert Gish. 1997. Using out-of-domain data to improve in-domain language models. IEEE Signal Processing Letters, 4(8):221--223, August.
- Frederick Jelinek, Bernard Merialdo, Salim Roukos, and Martin Strauss. 1991. A dynamic language model for speech recognition. In Proc. of the DARPA Workshop on Speech and Natural Language, pp. 293--295, Morristown, NJ, USA.
- Slava M. Katz. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(3):400--401, March.
- Jun'ichi Kazama and Jun'ichi Tsujii. 2003. Evaluation and extension of maximum entropy models with inequality constraints. In Proc. of EMNLP, pp. 137--144.
- Dietrich Klakow. 1998. Log-linear interpolation of language models. In Proc. of ICSLP.
- Reinhard Kneser, Jochen Peters, and Dietrich Klakow. 1997. Language model adaptation using dynamic marginals. In Proc. of Eurospeech.
- Hirokazu Masataki, Yoshinori Sagisaka, Kazuya Hisaki, and Tatsuya Kawahara. 1997. Task adaptation using MAP estimation in n-gram language modeling. In Proc. of ICASSP, volume 2, pp. 783--786, Washington, DC, USA. IEEE Computer Society.
- Douglas B. Paul and Janet M. Baker. 1992. The design for the Wall Street Journal-based CSR corpus. In Proc. of the DARPA Speech and Natural Language Workshop, pp. 357--362, February.
- P. Srinivasa Rao, Michael D. Monkowski, and Salim Roukos. 1995. Language model adaptation via minimum discrimination information. In Proc. of ICASSP, volume 1, pp. 161--164.
- P. Srinivasa Rao, Satya Dharanipragada, and Salim Roukos. 1997. MDI adaptation of language models across corpora. In Proc. of Eurospeech, pp. 1979--1982.
- George Saon, Daniel Povey, and Geoffrey Zweig. 2005. Anatomy of an extremely fast LVCSR decoder. In Proc. of Interspeech, pp. 549--552.
- Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, and Geoffrey Zweig. 2005. The IBM 2004 conversational telephony system for rich transcription. In Proc. of ICASSP, pp. 205--208.
- Andreas Stolcke. 1998. Entropy-based pruning of backoff language models. In Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, pp. 270--274, Lansdowne, VA, February.
- Wen Wang and Mary P. Harper. 2002. The Super-ARV language model: Investigating the effectiveness of tightly integrating multiple knowledge sources. In Proc. of EMNLP, pp. 238--247.
- Wen Wang, Yang Liu, and Mary P. Harper. 2002. Rescoring effectiveness of language models using different levels of knowledge and their integration. In Proc. of ICASSP, pp. 785--788.
- Wen Wang, Andreas Stolcke, and Mary P. Harper. 2004. The use of a linguistically motivated language model in conversational speech recognition. In Proc. of ICASSP, pp. 261--264.
- Hirofumi Yamamoto and Yoshinori Sagisaka. 1999. Multi-class composite n-gram based on connection direction. In Proc. of ICASSP, pp. 533--536.
- Hirofumi Yamamoto, Shuntaro Isogai, and Yoshinori Sagisaka. 2003. Multi-class composite n-gram language model. Speech Communication, 41(2--3):369--379.