Computer Speech & Language
Volume 20, Issue 1, January 2006, Pages 2-21
doi:10.1016/j.csl.2004.06.001
Copyright © 2004 Elsevier Ltd All rights reserved.

Multiple resolution analysis for robust automatic speech recognition

Roberto Gemello (a, corresponding author), Franco Mana (a), Dario Albesano (a) and Renato De Mori (b)

(a) Loquendo, Via Valdellatorre, 4, 10149 Torino, Italy
(b) Lia Ceri-Iup, University of Avignon, BP 1228, 84911 Avignon Cedex 9, France

Received 16 April 2004; revised 28 May 2004; accepted 3 June 2004. Available online 29 July 2004.

Abstract

This paper investigates the potential of exploiting the redundancy implicit in multiple resolution analysis for automatic speech recognition systems. The analysis is performed by a binary tree of elements, each consisting of a half-band filter followed by a downsampler that discards odd-indexed samples. Filter design and the computation of features from the resulting samples are discussed, and recognition performance for different design choices is presented.
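The binary-tree analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the filters, so the 2-tap Haar low-pass/high-pass pair is used as a hypothetical stand-in for the half-band filters, and every node is assumed to be split into both a low and a high branch (a full tree). The names `analysis_element` and `resolution_tree` are illustrative.

```python
import numpy as np

# Hypothetical half-band filter pair: 2-tap Haar filters stand in for the
# paper's filters, which the abstract does not specify.
LOW = np.array([1.0, 1.0]) / np.sqrt(2.0)
HIGH = np.array([1.0, -1.0]) / np.sqrt(2.0)

def analysis_element(x, h):
    """One tree element: half-band FIR filter, then downsampling by 2
    that keeps even-indexed samples and discards odd ones."""
    y = np.convolve(x, h, mode="full")[: len(x)]  # causal filtering, same length as x
    return y[::2]

def resolution_tree(x, depth):
    """Full binary tree of analysis elements.

    Each node is split into a low-pass and a high-pass branch, so the
    result holds 2**depth leaf signals, ordered by branch path
    (which is not the same as ascending frequency order).
    """
    level = [np.asarray(x, dtype=float)]
    for _ in range(depth):
        level = [analysis_element(s, h) for s in level for h in (LOW, HIGH)]
    return level

frame = np.random.default_rng(0).normal(size=256)  # stand-in speech frame
leaves = resolution_tree(frame, 3)
print(len(leaves), [len(b) for b in leaves])  # 8 leaves of 32 samples each
```

Each level halves the sample count, so a 256-sample frame analysed to depth 3 yields 8 sub-band signals of 32 samples each. Quantities computed from such leaves (for example, band energies) could then serve as the redundant features that the later stages normalize and reduce; how the paper actually derives features from the samples is not stated in the abstract.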

A paradigm consisting of redundant feature extraction, followed by feature normalization and then dimensionality reduction, is proposed. Feature normalization is performed by denoising algorithms. Two of them are considered and evaluated, namely signal-to-noise ratio (SNR)-dependent spectral subtraction and soft thresholding. Dimensionality reduction is performed with principal component analysis (PCA).
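The three stages of the proposed paradigm can be illustrated with a short sketch. The soft-thresholding and PCA functions follow their standard textbook definitions. The spectral-subtraction function is an assumption: it uses a generic over-subtraction factor that decreases linearly with the estimated SNR, and its constants (4.0, 0.15, the clamping range and the spectral floor) are illustrative, not taken from the paper. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft thresholding: shrink each coefficient toward zero by t;
    coefficients with magnitude below t become zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def snr_spectral_subtraction(power, noise_power, floor=0.01):
    """SNR-dependent spectral subtraction (hypothetical rule, assumed here).

    power, noise_power: positive per-band power estimates.
    The over-subtraction factor alpha shrinks as the local SNR grows.
    """
    snr_db = 10.0 * np.log10(power / noise_power)
    # Illustrative schedule: alpha = 4 - 0.15 * SNR_dB, clamped to [1, 4.75].
    alpha = np.clip(4.0 - 0.15 * snr_db, 1.0, 4.75)
    cleaned = power - alpha * noise_power
    # A spectral floor keeps every band strictly positive.
    return np.maximum(cleaned, floor * power)

def pca_fit(features, n_components):
    """Estimate a PCA projection from training vectors (one per row)."""
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def pca_transform(features, mean, basis):
    """Project features onto the retained principal components."""
    return (features - mean) @ basis

# Usage: redundant features -> normalization -> dimensionality reduction.
rng = np.random.default_rng(0)
noisy = rng.normal(size=(200, 24))  # stand-in for redundant feature vectors
normalized = soft_threshold(noisy, 0.5)
mean, basis = pca_fit(normalized, n_components=8)
reduced = pca_transform(normalized, mean, basis)
print(reduced.shape)  # (200, 8)
```

Either denoising function could occupy the normalization slot; the sketch uses soft thresholding on synthetic data only to show how the stages chain together. In practice the PCA projection would be estimated once on training features and then applied unchanged to test features.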

Experiments on telephone corpora and on the Aurora3 corpus are reported. On clean speech, the proposed paradigm yields a word error rate (WER) marginally lower than that obtained with perceptual linear prediction (PLP) coefficients. On noisy data, however, the proposed paradigm performs significantly better when the same denoising algorithm is applied to all the analysis methods being compared.
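Word error rate is conventionally computed as the minimum number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcription, divided by the number of reference words. A minimal sketch of that standard definition follows; the function name is illustrative, and the paper's own scoring tool is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    using the minimum word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j]: edit distance between the first i ref words and first j hyp words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("two" -> "too") and one deletion ("four") over 4 words.
print(word_error_rate("one two three four", "one too three"))  # 0.5
```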

Article Outline

1. Introduction
2. Multiple resolution analysis
3. Experiments with multiple resolution analysis and dimensionality reduction on telephone speech
4. Denoising
4.1. Spectral subtraction
4.2. Soft thresholding
4.3. Experiments
5. Conclusions
Acknowledgements
References