Computer Speech & Language
Volume 22, Issue 1, January 2008, Pages 69-83
 
doi:10.1016/j.csl.2007.06.002
Copyright © 2007 Elsevier Ltd All rights reserved.

Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement

Qin Yan (a, corresponding author), Saeed Vaseghi (a), Esfandiar Zavarehei (a), Ben Milner (b), Jonathan Darch (b), Paul White (c) and Ioannis Andrianakis (c)

(a) School of Computer and Information Engineering, Hohai University, Nanjing 210000, China
(b) School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
(c) Institute of Sound and Vibration Research, University Road, Highfield, Southampton SO17 1BJ, UK

Received 13 June 2006; revised 22 May 2007; accepted 15 June 2007. Available online 10 July 2007.


Abstract

This paper presents a speech enhancement method based on tracking and denoising the formants of a linear prediction (LP) model of the spectral envelope of speech, together with the parameters of a harmonic noise model (HNM) of its excitation. The main advantages of tracking and denoising the prominent energy contours of speech are the efficient use of the spectral and temporal structures of successive speech frames and the mitigation of the processing artefact known as ‘musical noise’ or ‘musical tones’.

The formant-tracking linear prediction (FTLP) model estimation consists of three stages: (a) speech pre-cleaning based on a spectral amplitude estimation, (b) formant-tracking across successive speech frames using the Viterbi method, and (c) Kalman filtering of the formant trajectories across successive speech frames.
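Stage (c) can be illustrated with a minimal sketch of a scalar Kalman filter applied to one formant trajectory. This is not the paper's state-dependent filter: the random-walk state model, the function name `kalman_filter_track` and the variances `process_var` and `obs_var` are assumptions chosen only for illustration.

```python
def kalman_filter_track(observations, process_var=1e3, obs_var=1e4):
    """Filter a noisy scalar trajectory (e.g. one formant frequency per frame).

    Assumed model (hypothetical, not taken from the paper):
        state:       x[t] = x[t-1] + w,  w ~ N(0, process_var)
        observation: z[t] = x[t]   + v,  v ~ N(0, obs_var)
    """
    if not observations:
        return []

    estimate = observations[0]   # initialise the state with the first observation
    error_var = obs_var          # initial uncertainty equals the observation noise
    filtered = [estimate]

    for z in observations[1:]:
        # Predict: a random walk keeps the estimate but grows the uncertainty.
        error_var += process_var
        # Update: blend the prediction and the new observation via the Kalman gain.
        gain = error_var / (error_var + obs_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)

    return filtered
```

For example, `kalman_filter_track([500.0, 510.0, 900.0, 505.0, 500.0])` pulls the outlying third value towards its neighbours instead of following it. A larger `obs_var` relative to `process_var` gives a smoother but slower-reacting track.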

The HNM parameters for the excitation signal comprise the voiced/unvoiced decision, the fundamental frequency, the amplitudes of the harmonics and the variance of the noise component of the excitation. A frequency-domain pitch extraction method is proposed that searches for the peak signal-to-noise ratios (SNRs) at the harmonics. For each speech frame, several pitch candidates are calculated. An estimate of the pitch trajectory across successive frames is obtained using a Viterbi decoder. The trajectories of the noisy excitation harmonics across successive speech frames are modeled and denoised using Kalman filters.
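The step from per-frame pitch candidates to a single pitch trajectory can be sketched as a small Viterbi search. The abstract does not give the cost function, so everything specific here is an assumption: each candidate carries a score (standing in for a harmonic SNR measure), transitions are penalised in proportion to the pitch jump in Hz, and the names `viterbi_pitch_track` and `jump_penalty` are hypothetical.

```python
def viterbi_pitch_track(candidates, jump_penalty=0.05):
    """Pick one pitch candidate per frame so that the total score is maximal.

    candidates: list of frames; each frame is a non-empty list of
                (f0_hz, score) tuples, where a higher score is better.
    jump_penalty: assumed cost per Hz of pitch change between frames.
    Returns the list of selected f0 values, one per frame.
    """
    if not candidates:
        return []

    # best[t][j]: best cumulative score of any path ending at candidate j of frame t
    best = [[score for _, score in candidates[0]]]
    back = []  # back[t-1][j]: index of the best predecessor in frame t-1

    for t in range(1, len(candidates)):
        row, ptr = [], []
        for f0, score in candidates[t]:
            # Score of reaching this candidate from each candidate of the previous frame.
            options = [
                best[-1][k] - jump_penalty * abs(f0 - prev_f0)
                for k, (prev_f0, _) in enumerate(candidates[t - 1])
            ]
            k_best = max(range(len(options)), key=options.__getitem__)
            row.append(options[k_best] + score)
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the best final candidate.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [candidates[-1][j][0]]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(candidates[t - 1][j][0])
    path.reverse()
    return path
```

With the frames `[[(100.0, 5.0), (200.0, 4.0)], [(102.0, 4.0), (204.0, 4.5)], [(101.0, 5.0), (202.0, 3.0)]]`, a frame-by-frame maximum would jump to 204 Hz in the middle frame, whereas the decoded path stays near 100 Hz because the octave jump costs more than the small gain in score.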

The proposed method is used to deconstruct noisy speech, denoise its model parameters and then reconstitute speech from its cleaned parts. Experimental evaluations show the performance gains of the formant tracking, pitch extraction and noise reduction stages.
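The reconstitution step can be pictured with a minimal sketch of how one excitation frame might be rebuilt from HNM parameters: a sum of harmonics of the fundamental plus a Gaussian noise component. Zero harmonic phases, the 8 kHz sampling rate, the fixed random seed and the function name `synthesize_hnm_frame` are assumptions for illustration; the paper's actual synthesis procedure is not described in the abstract.

```python
import math
import random


def synthesize_hnm_frame(f0_hz, harmonic_amps, noise_std, n_samples,
                         fs_hz=8000, seed=0):
    """Build one excitation frame as harmonics of f0 plus white Gaussian noise.

    harmonic_amps[k] is the amplitude of harmonic k + 1 (phases assumed zero).
    An empty harmonic_amps list yields a noise-only (unvoiced) frame.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    frame = []
    for n in range(n_samples):
        harmonic_part = sum(
            amp * math.cos(2.0 * math.pi * (k + 1) * f0_hz * n / fs_hz)
            for k, amp in enumerate(harmonic_amps)
        )
        noise_part = rng.gauss(0.0, noise_std)
        frame.append(harmonic_part + noise_part)
    return frame
```

In a complete system the resulting excitation would then be passed through the LP synthesis filter built from the denoised formant tracks; that stage is omitted here.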

Keywords: HNM; Kalman; Formant

Article Outline

1. Introduction
2. An overview of formant-tracking LP model with HNM of excitation
3. Estimation of a formant-tracking LP model from noisy speech
3.1. Initial-cleaning of spectral amplitudes of noisy speech
3.2. HMM-based formant tracking
3.3. Formant tracking using Viterbi decoder with MSE criterion
3.4. Investigation of the effect of noise on formant estimation
3.5. Formant track smoothing with state-dependent Kalman filters
3.6. Performance evaluation of formant tracking LP model
4. Estimation of harmonic noise model (HNM) of excitation
4.1. Fundamental frequency (pitch) estimation
4.2. Estimation of harmonic amplitudes of excitation
4.3. Estimation of noise component of HNM
5. Kalman filtering of trajectories of formants and harmonics
5.1. State-dependent Kalman filters
6. Performance evaluation for speech enhancement
6.1. Speech distortion measurements
7. Conclusion
Acknowledgements
References