Computer Speech & Language
Volume 17, Issues 2-3, April-July 2003, Pages 113-136
New Computational Paradigms for Acoustic Modeling in Speech Recognition
 
doi:10.1016/S0885-2308(03)00004-4
Copyright © 2003 Elsevier Science Ltd. All rights reserved.

Non-parametric probability estimation for HMM-based automatic speech recognition

Fabrice Lefèvre

Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403, Orsay Cedex, France

Received 6 June 2001; revised 5 February 2003; accepted 5 February 2003. Available online 28 February 2003.


Abstract

During the last decade, the most significant advances in continuous speech recognition (CSR) have come from the use of hidden Markov models (HMMs) for acoustic modeling. These models address one of the major issues in CSR: the simultaneous modeling of temporal and frequency distortions in the speech signal. In an HMM, the temporal dimension is handled by a directed graph of states, and each state accounts for local frequency distortions through a probability density function. This study seeks to improve HMM performance by introducing a very effective non-parametric probability density function estimate: the k-nearest neighbors (k-nn) estimate.
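For readers unfamiliar with the k-nn estimate, the following is a minimal sketch of the textbook form, p(x) ≈ k / (N · V), where V is the volume of the smallest ball around x that contains k of the N reference vectors. It is not taken from the article: the function name `knn_density`, the Euclidean distance, and the toy data are assumptions made for illustration only.

```python
import math

def knn_density(x, references, k):
    """Textbook k-nn density estimate (illustrative, not the article's code).

    p(x) ~ k / (N * V), where V is the volume of the smallest ball
    centered on x that contains k of the N reference vectors.
    """
    d = len(x)
    # Euclidean distances from x to every reference vector, ascending.
    dists = sorted(math.dist(x, r) for r in references)
    radius = dists[k - 1]  # distance to the k-th nearest reference
    # Volume of the d-dimensional ball of that radius.
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    volume = unit_ball * radius ** d
    return k / (len(references) * volume)

# Toy 1-D example with four evenly spaced reference points.
refs = [[0.0], [1.0], [2.0], [3.0]]
density = knn_density([1.5], refs, k=2)
```

Unlike a Gaussian mixture, this estimate has no parameters to fit beyond k; the reference vectors themselves act as the model, which is why fast neighbor search matters in practice.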

First, experiments on a short-term speech spectrum identification task compare the k-nn estimate with the widespread estimate based on mixtures of Gaussian functions. Then the adaptations required to integrate the k-nn estimate into an HMM-based recognition system are developed. An optimal training protocol is obtained by introducing membership coefficients into the HMM parameters. The membership coefficients measure the degree of association between a reference acoustic vector and an HMM state. The training procedure applies the expectation-maximization (EM) algorithm to the estimation of the membership coefficients, and its convergence is shown with respect to the maximum likelihood criterion. This study leads to the development of a baseline k-nn/HMM recognition system, which is evaluated on the TIMIT speech database. Finally, further improvements of the k-nn/HMM system are sought through the introduction of temporal information into the representation space (delta coefficients) and the adaptation of the references (mainly gender modeling and contextual modeling).
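The abstract does not give the formula that links membership coefficients to state output probabilities. As a rough illustration of the idea only, the sketch below scores each state by averaging the membership coefficients of the k nearest reference vectors, in the style of a soft (fuzzy) k-nn rule. The function name `knn_state_scores`, the row-normalized membership table, and the toy data are all hypothetical and should not be read as the article's method.

```python
import math

def knn_state_scores(x, references, memberships, k):
    """Average the membership coefficients of the k nearest references.

    references:  list of acoustic vectors (lists of floats)
    memberships: memberships[i][s] is the assumed degree of association
                 between reference i and state s (each row sums to 1)
    Returns one posterior-like score per state; the scores sum to 1.
    """
    # Indices of the references, ordered by Euclidean distance to x.
    order = sorted(range(len(references)),
                   key=lambda i: math.dist(x, references[i]))
    nearest = order[:k]
    n_states = len(memberships[0])
    return [sum(memberships[i][s] for i in nearest) / k
            for s in range(n_states)]

# Toy example: three 1-D references and two states.
refs = [[0.0], [1.0], [10.0]]
memb = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
scores = knn_state_scores([0.2], refs, memb, k=2)
```

With hard memberships (each row a one-hot vector) this reduces to the usual k-nn vote. Soft memberships let a single reference vector contribute to several states, which is the kind of quantity an EM procedure can re-estimate.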

Article Outline

1. Introduction
2. Non-parametric discriminant probability estimates
2.1. Principle
2.2. Maximum a posteriori rule and classification error bounds
3. k-nn/HMM system training
3.1. Output state probability computation
3.2. Training membership coefficients
4. Representation space and reference adaptations
4.1. Integration of delta coefficients in the k-nn/HMM system
4.2. Gender modeling
4.3. Contextual modeling
5. Experiments
5.1. Local identification with the k-nn rule
5.1.1. Signal analysis and fast algorithm for k-nn computation
5.1.2. k-nn/Gaussian mixture comparison on a local identification task
5.1.3. Detailed analysis of the identification results
5.1.4. Speech/silence separation
5.2. Evaluation of baseline k-nn/HMM and Gauss/HMM systems
5.3. Integrating state-of-the-art techniques into the k-nn/HMM system
5.3.1. Delta coefficients
5.3.2. Gender-dependent modeling
5.3.3. Contextual modeling
5.4. Discussion
6. Conclusion
Appendix A. Proof of the reestimation formulae for the k-nn/HMM system parameters
References



