Abstract
A speech signal captured by a distant microphone is generally smeared by reverberation, which severely degrades automatic speaker recognition performance. To improve system performance, an effective and robust feature-extraction method for speech processing is proposed. In this paper, a room impulse response is assumed to consist of three parts: a direct-path response, early reflections and late reverberation. Although late reverberation is known to be a major cause of performance degradation, this paper focuses on the effect of early reflections, because early reflections and their properties play an important role in the acoustics of an enclosure. In the first stage, the proposed method estimates the early reflections from the speech signal using the autocorrelation function; in the second stage, these estimates are combined with an anechoic signal and used to train the system. The method appears promising, achieving a substantial improvement in system performance, with a reduced equal error rate (EER) and improved detection error trade-off (DET) performance, especially at longer reverberation times.
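The first stage can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' implementation: it builds a synthetic room impulse response from the three parts named above (a unit direct path, three discrete early reflections at hypothetical delays of 3, 7 and 13 ms at an assumed 8 kHz sample rate, and an exponentially decaying noise tail starting at 50 ms), and it excites the room with white noise instead of speech. For a white source, the expected autocorrelation of the reverberant signal is proportional to that of the impulse response, so each early reflection appears as a peak at its delay. Real speech is itself strongly correlated and would tend to obscure these peaks unless it is whitened first (for instance by linear-prediction inverse filtering). All function names, delays, amplitudes and window lengths are hypothetical.

```python
import math
import random

FS = 8000  # assumed sample rate in Hz (hypothetical, not from the paper)


def make_rir(early, late_start, late_len, late_std, late_tau, rng):
    """Synthetic RIR = direct path + early reflections + late reverberation.

    early is a list of (delay_in_samples, amplitude) pairs; the late part is
    Gaussian noise under an exponentially decaying envelope.
    """
    h = [0.0] * (late_start + late_len)
    h[0] = 1.0  # direct-path response
    for delay, amp in early:
        h[delay] += amp  # discrete early reflection
    for n in range(late_len):
        h[late_start + n] += rng.gauss(0.0, late_std) * math.exp(-n / late_tau)
    return h


def convolve(x, h):
    """Full linear convolution of x with h, skipping zero taps of h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for j, hj in enumerate(h):
        if hj != 0.0:
            for n, xn in enumerate(x):
                y[n + j] += hj * xn
    return y


def autocorr(y, max_lag):
    """Biased autocorrelation estimate r[k] for k = 0..max_lag."""
    n = len(y)
    return [sum(y[i] * y[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def detect_early_reflections(y, max_lag, num_peaks):
    """Delays (in samples) of the num_peaks largest local maxima of the
    normalised autocorrelation within lags 1..max_lag."""
    r = autocorr(y, max_lag + 1)  # one extra lag for the right-hand neighbour
    rn = [v / r[0] for v in r]    # normalise so that rn[0] == 1
    peaks = [k for k in range(1, max_lag + 1)
             if rn[k] > rn[k - 1] and rn[k] >= rn[k + 1]]
    peaks.sort(key=lambda k: rn[k], reverse=True)
    return sorted(peaks[:num_peaks]), rn


rng = random.Random(0)
# Hypothetical reflections at 3, 7 and 13 ms; the late tail starts at 50 ms.
early = [(24, 0.5), (56, 0.45), (104, 0.4)]
h = make_rir(early, late_start=400, late_len=240, late_std=0.05,
             late_tau=60.0, rng=rng)
x = [rng.gauss(0.0, 1.0) for _ in range(6000)]  # white-noise stand-in for speech
y = convolve(x, h)
# Search only the first 25 ms so the late tail does not interfere.
delays, rn = detect_early_reflections(y, max_lag=200, num_peaks=3)
print("estimated delays (samples):", delays)
print("estimated delays (ms):", [1000 * d / FS for d in delays])
```

The autocorrelation also contains weaker peaks at the differences between reflection delays (here 32, 48 and 80 samples). The sketch relies on those cross-terms being smaller than the reflections themselves, which holds for the chosen amplitudes but not in general.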
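The abstract does not specify how the early-reflection estimates are combined with anechoic speech in the second stage. One plausible reading, sketched here as an assumption, is to filter each clean training utterance with a sparse FIR filter made of a unit direct path plus one tap per estimated reflection, so that the training data share the early-reflection structure of the test condition. The helper names and the example delays and gains are hypothetical.

```python
def early_reflection_filter(delays, gains):
    """Sparse FIR filter: unit direct path plus one tap per (delay, gain) pair.

    delays are in samples; delays and gains are hypothetical stage-one estimates.
    """
    h = [0.0] * (max(delays, default=0) + 1)
    h[0] = 1.0  # direct path
    for d, g in zip(delays, gains):
        h[d] += g  # estimated early reflection
    return h


def apply_fir(x, h):
    """Causal FIR filtering; the output is truncated to len(x)."""
    taps = [(j, hj) for j, hj in enumerate(h) if hj != 0.0]
    return [sum(hj * x[n - j] for j, hj in taps if n >= j) for n in range(len(x))]


def augment_training_set(clean_utterances, delays, gains):
    """Return reflection-matched copies of anechoic training utterances."""
    h = early_reflection_filter(delays, gains)
    return [apply_fir(u, h) for u in clean_utterances]


# Usage: filtering a unit impulse exposes the filter taps.
impulse = [1.0] + [0.0] * 9
out = augment_training_set([impulse], delays=[2, 5], gains=[0.5, 0.25])[0]
print(out)
```

Filtering the impulse returns the filter itself, zero-padded to the input length, which is a quick check that the direct path and both reflection taps land at the intended delays.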
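The reported improvement is expressed as equal error rate (EER), the operating point at which the false-acceptance rate equals the false-rejection rate. The following is a minimal threshold-sweep sketch of that generic definition, assuming higher scores mean "same speaker" and using made-up scores; it is not the evaluation code used in the paper, which the abstract does not describe.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the decision threshold over all observed scores.

    A trial is accepted when score >= threshold. FAR is the fraction of
    non-target (impostor) trials accepted; FRR is the fraction of target trials
    rejected. Returns the mean of FAR and FRR where the two are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Usage with made-up scores (not results from the paper).
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.1, 0.2, 0.3, 0.75]
print(equal_error_rate(targets, impostors))
```

Sweeping only the observed scores keeps the sketch exact for small score sets; larger evaluations usually interpolate between the two error curves instead.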
About this article
Cite this article
Al-Karawi, K.A., Mohammed, D.Y. Early reflection detection using autocorrelation to improve robustness of speaker verification in reverberant conditions. Int J Speech Technol 22, 1077–1084 (2019). https://doi.org/10.1007/s10772-019-09648-z
DOI: https://doi.org/10.1007/s10772-019-09648-z