Modelling Speech Intelligibility in Adverse Conditions

Jørgensen, Søren; Dau, Torsten

doi:10.1007/978-1-4614-1590-9_38

Conference paper
First Online: 01 January 2013

Part of the book series: Advances in Experimental Medicine and Biology (volume 787)

Abstract

Jørgensen and Dau (J Acoust Soc Am 130:1475–1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of using the reduction of temporal modulation energy as the intelligibility metric, as the STI does, the sEPSM uses the signal-to-noise ratio in the envelope domain (SNR_env). This metric was shown to be key to predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction. The key role of the SNR_env metric is further supported here by the ability of a short-term version of the sEPSM to predict speech masking release for different speech materials and modulated interferers. However, the sEPSM cannot account for speech subjected to phase jitter, a condition in which the spectral structure of the speech signal is strongly affected while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted successfully by the spectro-temporal modulation index (STMI) (Elhilali et al., Speech Commun 41:331–348, 2003), which assumes an explicit analysis of the spectral “ripple” structure of the speech signal. However, since the STMI applies the same decision metric as the STI, it fails to account for spectral subtraction. The results from this study suggest that the SNR_env might be a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is “realized” in the auditory system remains unresolved.
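
The abstract does not spell out how SNR_env is computed, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats the envelope power as the variance of a temporal envelope normalized by its squared mean, and estimates the speech envelope power by subtracting the noise envelope power from that of the noisy mixture. The function names, the `floor` parameter, and the synthetic sinusoidal envelopes are all hypothetical, and the sketch works on a single broadband envelope rather than on the outputs of the auditory and modulation filterbanks used in the sEPSM.

```python
import math

def envelope_power(env):
    """Normalized AC envelope power: envelope variance over squared mean."""
    n = len(env)
    mean = sum(env) / n
    ac_power = sum((x - mean) ** 2 for x in env) / n
    return ac_power / mean ** 2

def snr_env(env_mix, env_noise, floor=1e-3):
    """Envelope-domain SNR as a power ratio.

    The speech envelope power is estimated as mixture minus noise.
    `floor` is a hypothetical lower limit that keeps the ratio finite
    and non-negative; it is not a value taken from the paper.
    """
    p_noise = max(envelope_power(env_noise), floor)
    p_speech = max(envelope_power(env_mix) - p_noise, floor)
    return p_speech / p_noise

# Synthetic envelopes (assumed for illustration): one second at 1 kHz.
fs = 1000
t = [i / fs for i in range(fs)]
# Noise envelope: weak 8 Hz fluctuation around a mean of 1.
env_noise = [1 + 0.1 * math.sin(2 * math.pi * 8 * x) for x in t]
# Mixture envelope: the same noise fluctuation plus a strong 4 Hz
# speech-like modulation.
env_mix = [1 + 0.5 * math.sin(2 * math.pi * 4 * x)
             + 0.1 * math.sin(2 * math.pi * 8 * x) for x in t]

ratio = snr_env(env_mix, env_noise)
ratio_db = 10 * math.log10(ratio)
print(f"SNR_env = {ratio:.2f} ({ratio_db:.1f} dB)")
```

In this toy case the noise envelope power is 0.005 and the mixture envelope power is 0.13, so the ratio comes out at about 25 (roughly 14 dB). Processing that raises the envelope fluctuations of the noise lowers this ratio even when the speech modulation itself is unchanged. A metric based only on the reduction of the speech modulation does not capture that.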

References

Berouti M, Schwartz R, Makhoul J (1979) Enhancement of speech corrupted by acoustic noise. Proc IEEE Int Conf Acoust, Speech, Signal Proces (ICASSP-79), USA 4:208–211
Dubbelboer F, Houtgast T (2008) The concept of signal-to-noise ratio in the modulation domain and speech intelligibility. J Acoust Soc Am 124:3937–3946
Elhilali M, Chi T, Shamma SA (2003) A spectro-temporal modulation index (STMI) for assessment of speech intelligibility. Speech Commun 41:331–348
Elhilali M, Ma L, Micheyl C, Oxenham AJ, Shamma SA (2009) Temporal coherence in the perceptual organization and cortical representation of auditory scenes. Neuron 61:317–329
Ewert S, Dau T (2000) Characterizing frequency selectivity for envelope fluctuations. J Acoust Soc Am 108:1181–1196
French N, Steinberg J (1947) Factors governing intelligibility of speech sounds. J Acoust Soc Am 19:90–119
Green DM, Swets JA (1988) Signal detection theory and psychophysics. Peninsula Publishing, Los Altos, pp 238–239
Holube I, Fredelake S, Vlaming M, Kollmeier B (2010) Development and analysis of an International Speech Test Signal (ISTS). Int J Audiol 49:891–903
Jørgensen S, Dau T (2011) Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing. J Acoust Soc Am 130:1475–1487
Kjems U, Boldt JB, Pedersen MS, Lunner T, Wang D (2009) Role of mask pattern in intelligibility of ideal binary-masked noisy speech. J Acoust Soc Am 126:1415–1426
Moore BCJ, Glasberg BR (1983) Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J Acoust Soc Am 74:750–753
Nielsen JB, Dau T (2009) Development of a Danish speech intelligibility test. Int J Audiol 48:729–741
Piechowiak T, Ewert SD, Dau T (2007) Modeling comodulation masking release using an equalization-cancellation mechanism. J Acoust Soc Am 121:2111–2126
Steeneken HJM, Houtgast T (1980) A physical method for measuring speech transmission quality. J Acoust Soc Am 67:318–326
Wagener K, Josvassen JL, Ardenkjaer R (2003) Design, optimization and evaluation of a Danish sentence test in noise. Int J Audiol 42:10–17

Acknowledgements

We thank Ewen MacDonald and Hedwig Gockel for helpful comments and suggestions.

Author information

Authors and Affiliations

Department of Electrical Engineering, Centre for Applied Hearing Research, Technical University of Denmark, Ørsteds Plads, Building 352, DK-2800 Kgs. Lyngby, Denmark
Søren Jørgensen & Torsten Dau

Authors

Søren Jørgensen
Torsten Dau

Corresponding author

Correspondence to Torsten Dau.

Editor information

Editors and Affiliations

Department of Experimental Psychology, University of Cambridge, Cambridge, United Kingdom
Brian C. J. Moore
Physiology Department, University of Cambridge, Cambridge, United Kingdom
Roy D. Patterson
Physiology Department, University of Cambridge, Cambridge, United Kingdom
Ian M. Winter
MRC-Cognition and Brain Sciences Unit, Cambridge, United Kingdom
Robert P. Carlyon
MRC-Cognition and Brain Sciences Unit, Cambridge, United Kingdom
Hedwig E Gockel

About this paper

Cite this paper

Jørgensen, S., Dau, T. (2013). Modelling Speech Intelligibility in Adverse Conditions. In: Moore, B., Patterson, R., Winter, I., Carlyon, R., Gockel, H. (eds) Basic Aspects of Hearing. Advances in Experimental Medicine and Biology, vol 787. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1590-9_38

DOI: https://doi.org/10.1007/978-1-4614-1590-9_38
Published: 16 April 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1589-3
Online ISBN: 978-1-4614-1590-9
eBook Packages: Biomedical and Life Sciences (R0)
