Copyright © 2001 Elsevier Science B.V. All rights reserved.
Time and frequency filtering of filter-bank energies for robust HMM speech recognition
Available online 14 February 2001.
Abstract
Every speech recognition system requires a signal representation that parametrically models the temporal evolution of the speech spectral envelope. Current parameterizations involve, either explicitly or implicitly, a set of energies from frequency bands that are often distributed on a mel scale. These energies are computed in diverse ways, but the computation always includes smoothing of basic spectral measurements and non-linear amplitude compression. Several linear transformations are then applied to the two-dimensional time-frequency sequence of energies before it enters the HMM pattern-matching stage. In this paper, a recently introduced technique that filters that sequence of energies along the frequency dimension is presented, and the resulting parameters are compared with the widely used cepstral coefficients. This frequency filtering transformation is then considered jointly with the time filtering transformation used to compute dynamic parameters, showing that the flexibility of this combined (tiffing) approach can be exploited to design a robust set of filters. Recognition experiments are reported that show the potential of tiffing for enhanced and more robust HMM speech recognition.
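The processing chain described in the abstract can be sketched in a few lines. The chain consists of compressed filter-bank energies (FBEs), filtering along frequency to obtain static parameters, and filtering along time to obtain dynamic parameters. This is a minimal illustration under stated assumptions, not the authors' implementation. The frequency filter is assumed to be the two-tap H(z) = z - z^{-1}, a choice often associated with frequency filtering but not stated in this abstract. The following are also assumptions: bands beyond the edges are zero-padded, the time filter is a plain first difference over one frame on each side, and the compression is a natural log. The function names `compress`, `frequency_filter`, `time_filter` and `tiffing` are hypothetical.

```python
import math
from typing import List, Tuple

# One row per frame, one column per filter-bank band (hypothetical layout).
Matrix = List[List[float]]


def compress(energies: Matrix, floor: float = 1e-10) -> Matrix:
    """Non-linear amplitude compression: natural log of each FBE, floored to avoid log(0)."""
    return [[math.log(max(e, floor)) for e in frame] for frame in energies]


def frequency_filter(frame: List[float]) -> List[float]:
    """Filter one frame along the band index with H(z) = z - z^-1 (assumed filter).

    out[k] = x[k+1] - x[k-1]; bands outside [0, Q-1] are taken as zero (assumption).
    """
    padded = [0.0] + list(frame) + [0.0]
    return [padded[k + 2] - padded[k] for k in range(len(frame))]


def time_filter(frames: Matrix) -> Matrix:
    """First-difference dynamic parameters: d_t = c_{t+1} - c_{t-1}.

    The first and last frames are repeated at the edges (assumption).
    """
    n = len(frames)
    out = []
    for t in range(n):
        nxt = frames[min(t + 1, n - 1)]
        prv = frames[max(t - 1, 0)]
        out.append([a - b for a, b in zip(nxt, prv)])
    return out


def tiffing(energies: Matrix) -> Tuple[Matrix, Matrix]:
    """Compress, filter along frequency (static), then filter along time (dynamic)."""
    static = [frequency_filter(frame) for frame in compress(energies)]
    return static, time_filter(static)


if __name__ == "__main__":
    # Toy input: 3 frames x 4 bands whose log energies are 0..3, 1..4 and 2..5.
    energies = [[math.exp(v) for v in range(s, s + 4)] for s in range(3)]
    static, dynamic = tiffing(energies)
    for s_row, d_row in zip(static, dynamic):
        print([round(v, 6) for v in s_row], [round(v, 6) for v in d_row])
```

For this toy input, the interior static outputs are all 2. The log energy rises by 1 per band, and the assumed filter spans two bands. The two edge bands reflect the zero padding. Adding the same constant to every log energy of a frame, as a spectrally flat gain would, leaves the interior outputs unchanged in this sketch, because the filter taps sum to zero.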
Author Keywords: Robust speech recognition; Time and frequency filtering; Modulation spectrum; Filter-bank energies
Article Outline
- 1. Introduction
- 2. Non-linearly compressed filter-bank energies
- 2.1. Spectral smoothing
- 2.2. Quasi-optimality of frequency averaging
- 2.3. Non-linear compression
- 2.4. Compressed FBEs assumed in this work
- 3. Linear transformation of the parameter vector
- 3.1. Disadvantages of cepstral coefficients for speech recognition
- 3.2. The frequency filtering technique
- 3.3. FF and decorrelation of FBEs
- 3.4. FF and discriminative liftering
- 3.5. Recognition tests with static parameters
- 3.6. Alternative combination of FF and non-linearity
- 4. Temporal filtering
- 5. Tiffing (time and frequency filtering)
- 5.1. The two-dimensional modulation spectrum (2D-MS)
- 5.2. 2D-MS-assisted design of the time and frequency filters for robust speech recognition
- 5.3. Tiffing versus cepstral-time matrices
- 5.4. Recognition tests with the Aurora database and recognition setup
- 5.5. Conclusion: advantage of time and frequency filtering
- 6. Optimal transformations of the whole set of features: PCA and LDA
- 7. Conclusions
- Acknowledgements
- References