Spectral Enhancement Methods

Cohen, Israel; Gannot, Sharon

doi:10.1007/978-3-540-49127-9_44

Part of the book series: Springer Handbooks (SHB)

Abstract

In this chapter, we focus on the statistical methods that constitute a speech spectral enhancement system and describe some of their fundamental components. We begin in Sect. 44.2 by formulating the problem of spectral enhancement. In Sect. 44.3, we address the time-frequency correlation of spectral coefficients for speech and noise signals, and present statistical models that conform with these characteristics. In Sect. 44.4, we present estimators for speech spectral coefficients under speech presence uncertainty based on various fidelity criteria. In Sect. 44.5, we address the problem of speech presence probability estimation. In Sect. 44.6, we present useful estimators for the a priori signal-to-noise ratio (SNR) under speech presence uncertainty. We present the decision-directed approach, which is heuristically motivated, and the recursive estimation approach, which is based on statistical models and follows the rationale of Kalman filtering. In Sect. 44.7, we describe the improved minima-controlled recursive averaging (IMCRA) approach for noise power spectrum estimation. In Sect. 44.8, we provide a detailed example of a speech enhancement algorithm, and demonstrate its performance in environments with various noise types. In Sect. 44.9, we survey the main types of spectral enhancement components, and discuss the significance of the choice of statistical model, fidelity criterion, a priori SNR estimator, and noise spectrum estimator. Some concluding comments are made in Sect. 44.10.
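As a rough illustration of the decision-directed approach mentioned in the abstract, the following Python sketch uses the widely cited Ephraim-Malah form of the a priori SNR recursion combined with a Wiener gain. It is not taken from the chapter itself and may differ from the estimators presented in Sect. 44.6. The function name `decision_directed_wiener`, the weighting factor `alpha = 0.98`, and the floor `xi_min` of -25 dB are assumptions chosen for the example, and speech presence uncertainty is ignored.

```python
import numpy as np

def decision_directed_wiener(noisy_power, noise_power, alpha=0.98,
                             xi_min=10 ** (-25 / 10)):
    """Sketch of decision-directed a priori SNR estimation (hypothetical helper).

    noisy_power: array (frames, bins) of |Y(t, k)|^2 from an STFT.
    noise_power: noise power spectrum estimate, shape (bins,) or (frames, bins).
    Returns the a priori SNR estimate and a Wiener gain, both (frames, bins).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.broadcast_to(np.asarray(noise_power, dtype=float),
                                  noisy_power.shape)
    # A posteriori SNR: noisy power relative to the noise power estimate.
    gamma = noisy_power / np.maximum(noise_power, 1e-12)

    xi = np.empty_like(gamma)
    gain = np.empty_like(gamma)
    # Assumed initialization: instantaneous estimate from the first frame.
    prev_clean_snr = np.maximum(gamma[0] - 1.0, 0.0)
    for t in range(gamma.shape[0]):
        instantaneous = np.maximum(gamma[t] - 1.0, 0.0)
        # Weighted sum of the previous frame's clean-speech SNR estimate
        # and the current instantaneous estimate, floored at xi_min.
        xi[t] = np.maximum(alpha * prev_clean_snr
                           + (1.0 - alpha) * instantaneous, xi_min)
        gain[t] = xi[t] / (1.0 + xi[t])  # Wiener gain
        # |G Y|^2 / noise power, carried over to the next frame.
        prev_clean_snr = gain[t] ** 2 * gamma[t]
    return xi, gain
```

Multiplying the noisy STFT coefficients by `gain` and inverting the transform would give an enhanced signal. A value of `alpha` close to 1 smooths the SNR estimate over time, which is the usual explanation for the reduced musical noise of this estimator, at the cost of slower response to speech onsets.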

Abbreviations

ACS: autocorrelation coefficient sequences
HMP: hidden Markov processes
IMCRA: improved minima-controlled recursive averaging
ITU: International Telecommunication Union
KLT: Karhunen-Loève transform
LSA: log-spectral amplitude
MAP: maximum a posteriori
MMSE-LSA: MMSE of the log-spectral amplitude
MMSE-SA: MMSE of the spectral amplitude
MMSE: minimum mean-square error
MS: minimum statistics
MSE: mean-square error
PESQ: perceptual evaluation of speech quality
SNR: signal-to-noise ratio
STFT: short-time Fourier transform
SegSNR: segmental SNR
TI: transinformation index
VAD: voice activity detector
WGN: white Gaussian noise

Author information

Authors and Affiliations

Department of Electrical Engineering, Technion-Israel Institute of Technology, Technion City, 32000, Haifa, Israel
Israel Cohen Ph.D
School of Electrical Engineering, Bar-Ilan University, 52900, Ramat-Gan, Israel
Sharon Gannot Ph.D

Corresponding authors

Correspondence to Israel Cohen Ph.D or Sharon Gannot Ph.D.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

Cohen, I., Gannot, S. (2008). Spectral Enhancement Methods. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_44

DOI: https://doi.org/10.1007/978-3-540-49127-9_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)
