Abstract
Recently developed approaches to distant speech processing still fall short of close-talking speech processing in speech recognition, speaker identification, and diarization quality. Sound source localization remains an important component of multi-channel distant speech processing applications. This paper considers an approach to improving speaker localization quality with large-aperture microphone arrays. To mitigate the shortcomings of signal acquisition with large-aperture arrays and to reduce the impact of noise and interference, a time-frequency masking approach is proposed that applies Complex Angular Central Gaussian Mixture Models for directional clustering of sound sources and inter-component phase analysis for restoring polyharmonic speech components. The approach is tested on real-life multi-speaker recordings and is shown to increase speaker localization accuracy for non-overlapped and partially overlapped speech.
Acknowledgments
This research was financially supported by the Foundation NTI (Contract 20/18gr, ID 0000000007418QR20002) and by the Government of the Russian Federation (Grant 08-08).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Astapov, S., Popov, D., Kabarov, V. (2020). Directional Clustering with Polyharmonic Phase Estimation for Enhanced Speaker Localization. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_5
DOI: https://doi.org/10.1007/978-3-030-60276-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)