Applications of Language Modeling in Speech-To-Speech Translation

Liu, Fu-Hua; Gu, Liang; Gao, Yuqing; Picheny, Michael

doi:10.1023/B:IJST.0000017019.94181.d4

Published: April 2004

Volume 7, pages 221–229, (2004)

International Journal of Speech Technology

Abstract

This paper describes various language modeling issues in a speech-to-speech translation system. These issues are addressed in the IBM speech-to-speech system we developed for the DARPA Babylon program in the context of two-way translation between English and Mandarin Chinese. First, the language models for the speech recognizer had to be adapted to the specific domain to improve the recognition performance for in-domain utterances, while keeping the domain coverage as broad as possible. This involved considerations of disfluencies and lack of punctuation, as well as domain-specific utterances. Second, we used a hybrid semantic/syntactic representation to minimize the data sparseness problem in a statistical natural language generation framework. Serious inflection and synonym issues arise when words in the target language are to be determined in the translation output. Instead of relying on tedious handcrafted grammar rules, we used N-gram models as a post-processing step to enhance the generation performance. When an interpolated language model was applied to a Chinese-to-English translation task, the translation performance, measured by the objective BLEU metric, improved substantially from 0.318 to 0.514 when the correct transcription was used as input. Similarly, the BLEU score improved from 0.194 to 0.300 for the same task when the input was speech data.
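The abstract does not give the form of the interpolated language model or the N-gram post-processing step, so the following is only a minimal sketch of one common reading: two bigram models, one in-domain and one general, are linearly interpolated and used to choose among candidate generation outputs that differ in inflection. The toy corpora, the add-one smoothing, the weight `lam`, and all function names are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect context and bigram counts from whitespace-tokenized sentences."""
    context, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context[prev] += 1
            bigrams[(prev, cur)] += 1
    return context, bigrams, vocab


def bigram_prob(model, prev, cur, vocab_size):
    """Add-one smoothed estimate of P(cur | prev) (assumed smoothing scheme)."""
    context, bigrams, _ = model
    return (bigrams[(prev, cur)] + 1) / (context[prev] + vocab_size)


def interpolated_logprob(sentence, in_domain, general, lam=0.7):
    """Log-probability under lam * P_in_domain + (1 - lam) * P_general."""
    vocab_size = len(in_domain[2] | general[2])
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (lam * bigram_prob(in_domain, prev, cur, vocab_size)
             + (1 - lam) * bigram_prob(general, prev, cur, vocab_size))
        total += math.log(p)
    return total


def pick_best(candidates, in_domain, general, lam=0.7):
    """Post-processing step: keep the candidate the interpolated model prefers."""
    return max(candidates,
               key=lambda c: interpolated_logprob(c, in_domain, general, lam))


# Hypothetical toy corpora standing in for in-domain and general training text.
in_domain_lm = train_bigram(["where does it hurt",
                             "he has a fever",
                             "she has a headache"])
general_lm = train_bigram(["he has two brothers",
                           "they have a car",
                           "it does not matter"])

# Two candidate outputs that differ only in verb inflection.
candidates = ["he have a fever", "he has a fever"]
best = pick_best(candidates, in_domain_lm, general_lm)
print(best)
```

In this sketch the weight `lam` controls the trade-off the abstract mentions between in-domain accuracy and broad coverage: a value near 1 trusts the in-domain counts, while a lower value lets the general model cover word sequences the in-domain data never saw. In practice such a weight is usually tuned on held-out data.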

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, 1101 Kitchawan Road, Rt. 134, Yorktown Heights, NY, 10598, USA
Fu-Hua Liu, Liang Gu, Yuqing Gao & Michael Picheny

About this article

Cite this article

Liu, FH., Gu, L., Gao, Y. et al. Applications of Language Modeling in Speech-To-Speech Translation. International Journal of Speech Technology 7, 221–229 (2004). https://doi.org/10.1023/B:IJST.0000017019.94181.d4

Issue Date: April 2004
DOI: https://doi.org/10.1023/B:IJST.0000017019.94181.d4
