Speech Communication
Volume 48, Issue 11, November 2006, Pages 1447-1457
Robustness Issues for Conversational Interaction
 
doi:10.1016/j.specom.2006.06.008
Copyright © 2006 Elsevier B.V. All rights reserved.

A feature extraction method using subband based periodicity and aperiodicity decomposition with noise robust frontend processing for automatic speech recognition

Kentaro Ishizuka (a, corresponding author) and Tomohiro Nakatani (a)

(a) NTT Communication Science Laboratories, NTT Corporation, Hikaridai 2-4, Seikacho, Sourakugun, Kyoto 619-0237, Japan

Received 30 July 2005; 
revised 7 June 2006; 
accepted 26 June 2006. 
Available online 21 July 2006.


Abstract

This paper proposes a frontend processing technique that employs a speech feature extraction method called Subband based Periodicity and Aperiodicity DEcomposition (SPADE), and examines its validity for automatic speech recognition in noisy environments. SPADE divides speech signals into subband signals, which are then decomposed into their periodic and aperiodic features, and uses both features as speech feature parameters. SPADE employs independent periodicity estimation within each subband and periodicity–aperiodicity decomposition design based on a parallel distributed processing technique motivated by the human speech perception process. Unlike other speech features, this decomposition of speech into two characteristics provides information about periodicities and aperiodicities, and thus allows the utilization of the robustness exhibited by periodic features without losing certain essential information included in aperiodic features. This paper first introduces an implementation of SPADE that operates in the frequency domain, and then examines the validity of combining SPADE with speech enhancement methods. For this examination, we combine SPADE with noise compensation methods that operate in the frequency domain and cepstral normalization methods. In addition, we employ an energy parameter calculation method based on the SPADE framework. An evaluation with the AURORA-2J noisy continuous digit speech recognition database (Japanese AURORA-2) shows that SPADE combined with adaptive Wiener filtering, cepstral normalization, and the energy parameter achieves average word accuracy rates of 82.58% with clean training and 92.55% with multicondition training. These rates are higher than those achieved with ETSI WI008 advanced DSR frontend processing (77.98% and 91.01%, respectively) whose speech feature parameter is based on conventional Mel-frequency cepstral coefficients. 
Compared with the ETSI WI008 advanced DSR frontend, the proposed method reduces word error rates by 20.9% with clean training and 17.2% with multicondition training. These results confirm that SPADE combined with noise reduction methods can increase robustness in the presence of noise.
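
The relation between the word accuracy rates and the word error rate reductions quoted above can be checked with a short calculation. The sketch below is not from the paper: it assumes the word error rate is 100% minus the word accuracy rate, and the function name `relative_wer_reduction` is hypothetical. Using the rounded accuracies from the abstract, it gives about 20.9% for clean training and about 17.1% for multicondition training; the small gap from the reported 17.2% presumably comes from rounding in the published figures.

```python
def relative_wer_reduction(baseline_acc: float, proposed_acc: float) -> float:
    """Relative word error rate reduction, in percent.

    Assumption (not stated in the abstract): word error rate is taken as
    100 minus the word accuracy rate, both in percent.
    """
    baseline_wer = 100.0 - baseline_acc
    proposed_wer = 100.0 - proposed_acc
    if baseline_wer <= 0.0:
        raise ValueError("baseline word error rate must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Word accuracy rates quoted in the abstract (ETSI WI008 baseline vs. SPADE-based frontend).
clean = relative_wer_reduction(77.98, 82.58)
multi = relative_wer_reduction(91.01, 92.55)

print(f"clean training:          {clean:.1f}% relative reduction")
print(f"multicondition training: {multi:.1f}% relative reduction")
```

Expressing the gain relative to the baseline error rate, rather than as an absolute difference in accuracy, makes the two training conditions comparable even though their baseline accuracies differ considerably.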

Keywords: Speech feature; Noise robust frontend; Subband; Periodicity; Aperiodicity

Article Outline

1. Introduction
2. Speech feature extraction method “SPADE” performed in the frequency domain (SPADE-QUEEN)
3. “SPADE-QUEEN” combined with noise robust processing
3.1. Non-linear spectral subtraction (NSS)
3.2. Adaptive Wiener filtering (AWF)
3.3. Cepstral mean/variance normalization (CMN/CVN)
3.4. Periodic energy coefficient (PE)
4. Experiment
4.1. AURORA-2J
4.2. Evaluation of feature extraction method
4.3. Evaluation of noise robust frontend processing
4.4. Evaluation under clean test condition
5. Conclusion
Acknowledgements
Appendix. Computational complexity of SPADE-QUEEN based system
References


