Abstract
In this chapter, we focus on the statistical methods that constitute a speech spectral enhancement system and describe some of their fundamental components. We begin in Sect. 44.2 by formulating the problem of spectral enhancement. In Sect. 44.3, we address the time-frequency correlation of spectral coefficients for speech and noise signals, and present statistical models that conform to these characteristics. In Sect. 44.4, we present estimators for speech spectral coefficients under speech presence uncertainty based on various fidelity criteria. In Sect. 44.5, we address the problem of speech presence probability estimation. In Sect. 44.6, we present useful estimators for the a priori signal-to-noise ratio (SNR) under speech presence uncertainty: the decision-directed approach, which is heuristically motivated, and the recursive estimation approach, which is based on statistical models and follows the rationale of Kalman filtering. In Sect. 44.7, we describe the improved minima-controlled recursive averaging (IMCRA) approach for noise power spectrum estimation. In Sect. 44.8, we provide a detailed example of a speech enhancement algorithm, and demonstrate its performance in environments with various noise types. In Sect. 44.9, we survey the main types of spectral enhancement components, and discuss the significance of the choice of statistical model, fidelity criterion, a priori SNR estimator, and noise spectrum estimator. We conclude with some comments in Sect. 44.10.
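As a rough illustration of the decision-directed idea mentioned above, the following sketch estimates the a priori SNR for a single frequency bin and applies a spectral gain frame by frame. It is a minimal sketch under stated assumptions, not the chapter's algorithm: the function name `decision_directed_gains`, the parameters `alpha` and `xi_min` and their default values, the constant known noise power, and the choice of a Wiener gain as the suppression rule are all illustrative assumptions.

```python
def decision_directed_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Return one spectral gain per frame for a single frequency bin.

    noisy_power: sequence of |Y(t)|^2 values, one per frame
    noise_power: noise variance for this bin (assumed known and constant here)
    alpha:       weight on the previous frame's clean-speech estimate (assumed value)
    xi_min:      lower bound on the a priori SNR (assumed value)
    """
    gains = []
    prev_clean_power = 0.0  # |X_hat(t-1)|^2; zero before the first frame
    for y2 in noisy_power:
        # A posteriori SNR: noisy power relative to noise power.
        gamma = y2 / noise_power
        # Decision-directed a priori SNR: blend the previous frame's
        # clean-speech estimate with the current instantaneous estimate.
        xi = alpha * prev_clean_power / noise_power \
            + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        # Wiener gain, used here only as a simple stand-in suppression rule.
        gain = xi / (1.0 + xi)
        prev_clean_power = (gain ** 2) * y2
        gains.append(gain)
    return gains


if __name__ == "__main__":
    # Five noise-only frames followed by five frames with strong speech.
    frames = [1.0] * 5 + [100.0] * 5
    for g in decision_directed_gains(frames, noise_power=1.0):
        print(round(g, 4))
```

In this toy input the gain stays near its floor while the bin contains only noise and rises over a few frames once the speech component appears, which reflects the smoothing effect of the recursion on the previous frame's estimate.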
Abbreviations
- ACS: autocorrelation coefficient sequences
- HMP: hidden Markov processes
- IMCRA: improved minima-controlled recursive averaging
- ITU: International Telecommunication Union
- KLT: Karhunen-Loève transform
- LSA: log-spectral amplitude
- MAP: maximum a posteriori
- MMSE-LSA: MMSE of the log-spectral amplitude
- MMSE-SA: MMSE of the spectral amplitude
- MMSE: minimum mean-square error
- MS: minimum statistics
- MSE: mean-square error
- PESQ: perceptual evaluation of speech quality
- SNR: signal-to-noise ratio
- STFT: short-time Fourier transform
- SegSNR: segmental SNR
- TI: transinformation index
- VAD: voice activity detector
- WGN: white Gaussian noise
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Cohen, I., Gannot, S. (2008). Spectral Enhancement Methods. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_44
DOI: https://doi.org/10.1007/978-3-540-49127-9_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)