Robust In-Car Speech Recognition Based on Nonlinear Multiple Regressions

Li, Weifeng; Takeda, Kazuya; Itakura, Fumitada

doi:10.1155/2007/16921

Research Article
Open access
Published: 01 December 2006

Weifeng Li¹,
Kazuya Takeda¹ &
Fumitada Itakura²

EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 016921 (2006)

Abstract

We address the problem of improving hands-free speech recognition in different car environments using a single distant microphone. We propose a nonlinear multiple-regression-based enhancement method for in-car speech recognition. To make the in-car recognition system data-driven, we develop an effective algorithm for adapting the regression parameters to different driving conditions. We also devise a model compensation scheme that synthesizes the training data using the optimal regression parameters and selects the optimal HMM for the test speech. In isolated word recognition experiments conducted in 15 real car environments, the proposed adaptive regression approach achieves average relative word error rate (WER) reductions of 52.5% and 14.8% compared to the original noisy speech and the ETSI advanced front end, respectively.
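As a reading aid for the "average relative WER reduction" figure quoted above, the following minimal Python sketch shows one common way such a number is computed: the per-condition relative reduction, 100 × (baseline − improved) / baseline, averaged over conditions. The function names and the WER pairs are hypothetical and chosen only for illustration; they are not values from the article, and the abstract does not state exactly how the authors averaged over the 15 environments.

```python
def relative_wer_reduction(baseline_wer, improved_wer):
    """Relative WER reduction in percent: 100 * (baseline - improved) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


def average_relative_reduction(pairs):
    """Mean of the per-condition relative reductions (an assumed averaging scheme)."""
    reductions = [relative_wer_reduction(b, i) for b, i in pairs]
    return sum(reductions) / len(reductions)


# Hypothetical (baseline, improved) WER pairs in percent; NOT results from the article.
example_pairs = [(20.0, 10.0), (40.0, 30.0)]

if __name__ == "__main__":
    for baseline, improved in example_pairs:
        print(baseline, improved, relative_wer_reduction(baseline, improved))
    print("average:", average_relative_reduction(example_pairs))
```

Note that averaging per-condition reductions generally gives a different number from computing a single reduction on WERs pooled over all conditions, so the averaging scheme matters when comparing reported figures.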

Author information

Authors and Affiliations

Graduate School of Information Science, Nagoya University, Nagoya, 464-8603, Japan
Weifeng Li & Kazuya Takeda
Department of Information Engineering, Faculty of Science and Technology, Meijo University, Nagoya, 468-8502, Japan
Fumitada Itakura

Corresponding author

Correspondence to Weifeng Li.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Li, W., Takeda, K. & Itakura, F. Robust In-Car Speech Recognition Based on Nonlinear Multiple Regressions. EURASIP J. Adv. Signal Process. 2007, 016921 (2006). https://doi.org/10.1155/2007/16921

Received: 31 January 2006
Revised: 10 August 2006
Accepted: 29 October 2006
Published: 01 December 2006
DOI: https://doi.org/10.1155/2007/16921
