Early reflection detection using autocorrelation to improve robustness of speaker verification in reverberant conditions

Al-Karawi, Khamis A.; Mohammed, Duraid Y.

doi:10.1007/s10772-019-09648-z

Published: 11 October 2019

Volume 22, pages 1077–1084, (2019)

International Journal of Speech Technology

Abstract

A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades automatic speaker recognition performance. This paper proposes a robust feature-extraction method to improve system performance under such conditions. A room impulse response is assumed to comprise three parts: a direct-path response, early reflections and late reverberation. Although late reverberation is known to be a major cause of performance degradation, this paper focuses on the effect of early reflections, because they and their properties play an essential role in the acoustics of an enclosure. In the first stage, the proposed method estimates the early reflections from the speech signal using the autocorrelation function; in the second stage, the estimates are combined with an anechoic signal and used to train the system. The method appears promising, achieving a substantial improvement in system performance in terms of reduced equal error rate and the detection error trade-off, especially at longer reverberation times.
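The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single early reflection modelled as a delayed, attenuated copy of the source, and it assumes the source is approximately white, so that the normalised autocorrelation peak at the reflection lag equals a / (1 + a^2) for reflection gain a. Real speech has pitch-related autocorrelation peaks that this sketch does not handle. The function names, the lag search window (`min_ms`, `max_ms`) and the test signal are all hypothetical.

```python
import numpy as np

def autocorrelation(x):
    """Biased autocorrelation for non-negative lags, normalised so r[0] == 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()      # zero-pad to avoid circular wrap-around
    spec = np.fft.rfft(x, nfft)
    r = np.fft.irfft(spec * np.conj(spec), nfft)[:n]
    return r / r[0]

def estimate_reflection(y, fs, min_ms=1.0, max_ms=50.0):
    """Stage 1 (sketch): estimate delay (samples) and gain of one early reflection.

    Assumption: y[n] = x[n] + a * x[n - d] with x roughly white and 0 < a < 1.
    """
    r = autocorrelation(y)
    lo = int(round(min_ms * 1e-3 * fs))
    hi = min(int(round(max_ms * 1e-3 * fs)), len(r) - 1)
    delay = lo + int(np.argmax(r[lo:hi + 1]))
    # For white x the peak height is rho = a / (1 + a^2), which cannot exceed 0.5.
    rho = float(np.clip(r[delay], 1e-12, 0.5))
    gain = (1.0 - np.sqrt(1.0 - 4.0 * rho ** 2)) / (2.0 * rho)
    return delay, gain

def add_reflection(anechoic, delay, gain):
    """Stage 2 (sketch): combine the estimated reflection with an anechoic signal."""
    x = np.asarray(anechoic, dtype=float)
    y = x.copy()
    y[delay:] += gain * x[:len(x) - delay]
    return y

# Synthetic demonstration with white noise standing in for the source signal.
rng = np.random.default_rng(0)
fs = 16000
source = rng.standard_normal(2 * fs)
observed = add_reflection(source, delay=80, gain=0.5)   # 5 ms reflection
est_delay, est_gain = estimate_reflection(observed, fs)
print(est_delay, round(est_gain, 3))
```

In a training pipeline along the lines the abstract describes, the estimated delay and gain would be applied to clean enrolment speech with `add_reflection` before feature extraction, so that the training data better matches the reverberant test condition.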

Author information

Khamis A. Al-Karawi
Present address: University of Diyala, Diyala, Baqubah, Iraq

Authors and Affiliations

School of Engineering, Al-Iraqia University, Baghdad, Iraq
Duraid Y. Mohammed

Corresponding author

Correspondence to Khamis A. Al-Karawi.

About this article

Cite this article

Al-Karawi, K.A., Mohammed, D.Y. Early reflection detection using autocorrelation to improve robustness of speaker verification in reverberant conditions. Int J Speech Technol 22, 1077–1084 (2019). https://doi.org/10.1007/s10772-019-09648-z

Received: 11 April 2019
Accepted: 26 September 2019
Published: 11 October 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10772-019-09648-z
