Early reflection detection using autocorrelation to improve robustness of speaker verification in reverberant conditions

International Journal of Speech Technology

Abstract

A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades automatic speaker recognition performance. To improve system performance, this paper proposes an effective and robust method for extracting features for speech processing. A room impulse response is assumed to comprise three parts: a direct-path response, early reflections and late reverberation. Although late reverberation is known to be a major cause of performance degradation, this paper focuses on the effect of early reflections, because early reflections and their properties play an important role in the acoustics of an enclosure. In the first stage, the proposed method estimates the early reflections from the speech signal using the autocorrelation function; in the second stage, these estimates are combined with an anechoic signal to train the system. The method appears promising, achieving a substantial improvement in system performance in terms of equal error rate and detection error trade-off, especially at longer reverberation times.
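The three-part model of the room impulse response (RIR) described above can be made concrete with a short sketch. It assumes a sampled RIR `h` at rate `fs`, treats the largest-magnitude sample plus a small tolerance (`direct_ms`, 2.5 ms here) as the direct path, and ends the early part 50 ms after that peak, the boundary conventionally used for speech clarity (C50). These window lengths and the helper name `split_rir` are illustrative assumptions; the abstract does not state the boundaries the authors used.

```python
import numpy as np


def split_rir(h, fs, direct_ms=2.5, early_ms=50.0):
    """Split a room impulse response into direct-path, early and late parts.

    Assumed (not from the paper): the direct path ends `direct_ms` after the
    largest-magnitude sample, and early reflections end `early_ms` after it.
    The three returned arrays have the same length as `h` and sum back to `h`.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))  # direct-path arrival
    direct_end = min(len(h), peak + int(round(direct_ms * fs / 1000)) + 1)
    early_end = min(len(h), max(direct_end, peak + int(round(early_ms * fs / 1000))))

    direct, early, late = (np.zeros_like(h) for _ in range(3))
    direct[:direct_end] = h[:direct_end]  # also keeps any samples before the peak
    early[direct_end:early_end] = h[direct_end:early_end]
    late[early_end:] = h[early_end:]
    return direct, early, late


if __name__ == "__main__":
    fs = 1000
    h = np.zeros(300)
    h[10], h[30], h[100] = 1.0, 0.5, 0.1  # toy RIR: direct, one early, one late tap
    d, e, l = split_rir(h, fs)
    print(np.flatnonzero(d), np.flatnonzero(e), np.flatnonzero(l))
```

Keeping the pre-peak samples in the direct part means the three components always add back to the original response, which makes the split easy to sanity-check.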
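The first stage relies on the fact that a reflection arriving D samples after the direct sound adds a peak at lag D to the autocorrelation of the received signal: for y = x ∗ h with h[n] = δ[n] + a·δ[n − D] and a white source x, the normalised autocorrelation at lag D is about a/(1 + a²). The sketch below is a hypothetical `detect_early_reflections` helper, not the authors' implementation. It computes the autocorrelation with zero-padded FFTs and reports local maxima inside an assumed 1–50 ms search window above an assumed threshold. Real speech is far from white, so pitch periodicity also produces autocorrelation peaks; in practice some pre-whitening (for example, working on an LPC residual) would likely be needed before peak picking.

```python
import numpy as np


def autocorr(y):
    """Biased linear autocorrelation for lags 0..len(y)-1, normalised so r[0] == 1."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    nfft = 1 << (2 * n - 1).bit_length()  # zero-pad so the FFT product is not circular
    spec = np.fft.rfft(y, nfft)
    r = np.fft.irfft(np.abs(spec) ** 2, nfft)[:n]
    return r / r[0]


def detect_early_reflections(y, fs, min_ms=1.0, max_ms=50.0, threshold=0.1, max_peaks=5):
    """Return up to `max_peaks` (delay_seconds, rho) pairs, strongest first.

    `min_ms`, `max_ms` and `threshold` are illustrative assumptions, not values
    taken from the paper.
    """
    r = autocorr(y)
    lo = max(1, int(round(min_ms * fs / 1000)))
    hi = min(len(r) - 2, int(round(max_ms * fs / 1000)))
    # Local maxima of the autocorrelation above the threshold inside the early window.
    peaks = [k for k in range(lo, hi + 1)
             if r[k] > threshold and r[k] >= r[k - 1] and r[k] >= r[k + 1]]
    peaks.sort(key=lambda k: r[k], reverse=True)
    return [(k / fs, float(r[k])) for k in peaks[:max_peaks]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000
    x = rng.standard_normal(2 * fs)  # white-noise stand-in for the source signal
    y = x.copy()
    y[80:] += 0.6 * x[:-80]  # one reflection, 10 ms later, gain 0.6
    print(detect_early_reflections(y, fs))  # one peak near 0.01 s, rho close to 0.44
```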
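For the second stage, one plausible reading of "combined with an anechoic signal" is to convolve clean training speech with a sparse impulse response built from the detected reflections, so that the training data carries early-reflection colouring similar to the test data. The sketch assumes each reflection is isolated, so its gain can be recovered by inverting ρ = a/(1 + a²) and taking the root with |a| < 1. With several overlapping reflections, that inversion is only approximate. The function names are hypothetical and the approach is an illustration, not the paper's documented procedure.

```python
import numpy as np


def gain_from_rho(rho):
    """Invert rho = a / (1 + a**2) for the root with |a| < 1.

    Assumes a single isolated reflection, for which |rho| < 0.5.
    """
    rho = float(np.clip(rho, -0.499, 0.499))
    if rho == 0.0:
        return 0.0
    return (1.0 - np.sqrt(1.0 - 4.0 * rho * rho)) / (2.0 * rho)


def early_reflection_filter(reflections, fs):
    """Unit direct path plus one tap per (delay_seconds, rho) pair."""
    taps = [(int(round(d * fs)), gain_from_rho(rho)) for d, rho in reflections]
    length = max([k for k, _ in taps], default=0) + 1
    h = np.zeros(length)
    h[0] = 1.0
    for k, g in taps:
        h[k] += g
    return h


def augment_training_signal(anechoic, reflections, fs):
    """Convolve clean speech with the estimated early-reflection response,
    truncated to the original length so frames stay aligned."""
    h = early_reflection_filter(reflections, fs)
    return np.convolve(anechoic, h)[: len(anechoic)]


if __name__ == "__main__":
    fs = 8000
    clean = np.zeros(400)
    clean[0] = 1.0  # an impulse as a stand-in for anechoic speech
    reflections = [(0.01, 0.6 / 1.36)]  # e.g. output of an autocorrelation detector
    out = augment_training_signal(clean, reflections, fs)
    print(out[0], out[80])  # direct path and the recovered reflection gain
```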
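The reported gains are expressed as equal error rate (EER) and detection error trade-off (DET) behaviour. EER is the operating point at which the false-rejection rate on target trials equals the false-acceptance rate on non-target trials. Below is a minimal threshold-sweep sketch. The function name is hypothetical, and evaluation toolkits typically interpolate along the DET/ROC curve rather than picking the closest observed threshold.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by trying every observed score as the decision threshold.

    A trial is accepted when its score >= threshold. At each threshold, compute
    FRR (targets rejected) and FAR (non-targets accepted) and keep the midpoint
    of the pair whose rates are closest.
    """
    if len(target_scores) == 0 or len(nontarget_scores) == 0:
        raise ValueError("need at least one target and one non-target score")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


if __name__ == "__main__":
    print(equal_error_rate([3, 4, 5], [0, 1, 2]))        # 0.0 (perfectly separated)
    print(equal_error_rate([1, 2, 3, 4], [0, 1, 2, 3]))  # 0.375 (overlapping scores)
```

Plotting FRR against FAR over all thresholds on normal-deviate axes gives the DET curve, so the same two rate computations underlie both metrics.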

Figs. 1–9 (images not reproduced)

Author information

Corresponding author

Correspondence to Khamis A. Al-Karawi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Al-Karawi, K.A., Mohammed, D.Y. Early reflection detection using autocorrelation to improve robustness of speaker verification in reverberant conditions. Int J Speech Technol 22, 1077–1084 (2019). https://doi.org/10.1007/s10772-019-09648-z

  • DOI: https://doi.org/10.1007/s10772-019-09648-z
