Computer Speech & Language
Volume 19, Issue 1, January 2005, Pages 31-54
 
doi:10.1016/j.csl.2003.12.003
Copyright © 2004 Elsevier Ltd. All rights reserved.

Additive background noise as a source of non-linear mismatch in the cepstral and log-energy domain

Febe de Wet (corresponding author), Johan de Veth, Loe Boves and Bert Cranen

Department of Language and Speech, University of Nijmegen, P.O. Box 9103, Nijmegen 6500 HD, The Netherlands

Received 21 February 2003; revised 15 October 2003; accepted 15 December 2003. Available online 24 February 2004.


Abstract

The aim of this investigation is to determine to what extent automatic speech recognition can be improved if, in addition to the linear compensation accomplished by mean and variance normalisation, a non-linear mismatch reduction technique is applied to the cepstral features and to the energy features. A further goal is to determine whether the degree of mismatch between the feature distributions of the training and test data that is associated with acoustic mismatch differs for the cepstral and energy features. To this end, two non-linear mismatch reduction techniques, time-domain noise reduction and histogram normalisation, were evaluated on the Aurora2 digit recognition task as well as on a continuous speech recognition task with noisy test conditions similar to those in the Aurora2 experiments. The experimental results show that both non-linear mismatch reduction techniques improve recognition performance, and that the best results are obtained when the two techniques are applied in combination. The results also reveal that the mismatch in the energy features is much larger, both quantitatively and qualitatively, than the corresponding mismatch in the cepstral coefficients. The most substantial gains in average recognition rate are therefore obtained by reducing training-test mismatch for the energy features.
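
The contrast the abstract draws between linear compensation (mean and variance normalisation) and non-linear compensation (histogram normalisation) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, each feature dimension is assumed to be handled independently as a non-empty list of floats, and histogram normalisation is approximated here by simple empirical quantile mapping onto a reference (training) sample. Time-domain noise reduction is not sketched.

```python
from bisect import bisect_right
from statistics import fmean, pstdev


def mean_variance_normalise(values):
    """Linear compensation: shift to zero mean and scale to unit variance."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        # A constant feature carries no spread to rescale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def histogram_normalise(test_values, reference_values):
    """Non-linear compensation via empirical quantile mapping (assumed variant).

    Each test value is replaced by the reference value found at the same
    empirical quantile, so the mapped test data follow the reference
    distribution even when the distortion is not a shift or a scaling.
    """
    sorted_test = sorted(test_values)
    sorted_ref = sorted(reference_values)
    n_test, n_ref = len(sorted_test), len(sorted_ref)
    mapped = []
    for v in test_values:
        # Number of test values <= v, i.e. the empirical CDF times n_test.
        rank = bisect_right(sorted_test, v)
        # ceil(rank * n_ref / n_test) - 1, computed with integers only.
        idx = (rank * n_ref + n_test - 1) // n_test - 1
        mapped.append(sorted_ref[idx])
    return mapped


# Usage: a non-linearly distorted test feature is mapped back onto the
# reference values, which no single shift and scale could achieve.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
distorted = [10.0, 20.0, 40.0, 80.0, 160.0]
restored = histogram_normalise(distorted, reference)
```

In this toy case the distorted values grow geometrically, so mean and variance normalisation alone would leave their distribution skewed relative to the reference, whereas the quantile mapping restores the reference values exactly.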

Article Outline

1. Introduction
2. Mismatch reduction techniques
2.1. Time-domain noise reduction
2.2. Histogram normalisation
3. Speech data and experimental set-up
3.1. Aurora2
3.1.1. Speech data
3.1.2. Hidden Markov modelling
3.2. VIOS
3.2.1. Speech data
3.2.2. Hidden Markov modelling
3.3. Acoustic pre-processing and feature extraction
3.4. Mismatch reduction experiments
4. Results and discussion
4.1. Aurora2
4.1.1. Experiment I: TDNR
4.1.2. Experiment II: HN
4.1.3. Experiment III: TDNR and HN
4.1.4. Discussion
4.2. VIOS
4.2.1. Experiment I: TDNR
4.2.2. Experiment II: HN
4.2.3. Experiment III: TDNR and HN
4.2.4. Discussion
5. General discussion
6. Conclusions
References







 