Part of the book series: Springer Handbooks (SHB)

Abstract

In this chapter, we focus on the statistical methods that constitute a speech spectral enhancement system and describe some of their fundamental components. We begin in Sect. 44.2 by formulating the problem of spectral enhancement. In Sect. 44.3, we address the time-frequency correlation of spectral coefficients for speech and noise signals, and present statistical models that conform with these characteristics. In Sect. 44.4, we present estimators for speech spectral coefficients under speech presence uncertainty based on various fidelity criteria. In Sect. 44.5, we address the problem of speech presence probability estimation. In Sect. 44.6, we present useful estimators for the a priori signal-to-noise ratio (SNR) under speech presence uncertainty. We present the decision-directed approach, which is heuristically motivated, and the recursive estimation approach, which is based on statistical models and follows the rationale of Kalman filtering. In Sect. 44.7, we describe the improved minima-controlled recursive averaging (IMCRA) approach for noise power spectrum estimation. In Sect. 44.8, we provide a detailed example of a speech enhancement algorithm, and demonstrate its performance in environments with various noise types. In Sect. 44.9, we survey the main types of spectral enhancement components, and discuss the significance of the choice of statistical model, fidelity criterion, a priori SNR estimator, and noise spectrum estimator. Some concluding comments are made in Sect. 44.10.
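The abstract names the decision-directed approach for a priori SNR estimation without giving its form on this page. The following is a minimal sketch, assuming the commonly used formulation in which the a priori SNR estimate is a weighted sum of the previous frame's estimated clean power (normalized by the noise power) and the floored a posteriori SNR minus one. The function name, the Wiener gain used as the spectral gain, and the values of `alpha` and `xi_min` are illustrative assumptions, not taken from the chapter.

```python
import numpy as np


def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Sketch of a decision-directed a priori SNR estimator (assumed form).

    noisy_power : array (frames, bins), squared STFT magnitudes |Y|^2
    noise_power : array broadcastable to noisy_power, noise variance estimate
    alpha       : weighting factor (assumed value, often chosen close to 1)
    xi_min      : lower bound on the a priori SNR (assumed value, -25 dB)
    Returns (xi, gain), each of shape (frames, bins).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.broadcast_to(
        np.asarray(noise_power, dtype=float), noisy_power.shape
    )

    xi = np.empty_like(noisy_power)
    gain = np.empty_like(noisy_power)
    # Estimated clean-speech power of the previous frame; zero before the first frame.
    prev_clean_power = np.zeros(noisy_power.shape[1])

    for t in range(noisy_power.shape[0]):
        gamma = noisy_power[t] / noise_power[t]   # a posteriori SNR
        ml_term = np.maximum(gamma - 1.0, 0.0)    # floored instantaneous estimate
        xi_t = alpha * prev_clean_power / noise_power[t] + (1.0 - alpha) * ml_term
        xi_t = np.maximum(xi_t, xi_min)           # apply the lower bound
        g = xi_t / (1.0 + xi_t)                   # Wiener gain (assumed gain rule)
        prev_clean_power = (g ** 2) * noisy_power[t]
        xi[t], gain[t] = xi_t, g

    return xi, gain


if __name__ == "__main__":
    # Toy input: constant high-SNR spectrum over 5 frames and 4 bins.
    xi, gain = decision_directed_snr(np.full((5, 4), 100.0), np.ones(4))
    print(xi[:, 0])
    print(gain[:, 0])
```

With a constant high-SNR input, the estimate climbs over successive frames rather than jumping immediately, which reflects the smoothing role of `alpha`. A different gain rule (for example an LSA-type gain) could be substituted for the Wiener gain without changing the recursion.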

Abbreviations

ACS: autocorrelation coefficient sequences
HMP: hidden Markov processes
IMCRA: improved minima-controlled recursive averaging
ITU: International Telecommunication Union
KLT: Karhunen-Loève transform
LSA: log-spectral amplitude
MAP: maximum a posteriori
MMSE-LSA: MMSE of the log-spectral amplitude
MMSE-SA: MMSE of the spectral amplitude
MMSE: minimum mean-square error
MS: minimum statistics
MSE: mean-square error
PESQ: perceptual evaluation of speech quality
SNR: signal-to-noise ratio
STFT: short-time Fourier transform
SegSNR: segmental SNR
TI: transinformation index
VAD: voice activity detector
WGN: white Gaussian noise

Author information

Corresponding authors

Correspondence to Israel Cohen, Ph.D., or Sharon Gannot, Ph.D.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Cohen, I., Gannot, S. (2008). Spectral Enhancement Methods. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_44

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
