On Residual CNN in Text-Dependent Speaker Verification Task

Malykh, Egor; Novoselov, Sergey; Kudashev, Oleg

doi:10.1007/978-3-319-66429-3_59

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Deep learning approaches are still not very common in the speaker verification field. We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. Although we were not able to surpass the baseline system in quality, we achieved quite good results for such a new approach, obtaining a 5.23% EER on the RSR2015 evaluation part. Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative.
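The error rate quoted in the abstract is presumably the equal error rate (EER), the operating point at which the false-acceptance and false-rejection rates coincide. The following is a minimal sketch of how an EER can be approximated from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not target_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False-acceptance rate: impostor trials that would be accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False-rejection rate: target trials that would be rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (made up): one target scores low and one impostor scores high.
toy_targets = [0.9, 0.8, 0.7, 0.35]
toy_impostors = [0.1, 0.2, 0.3, 0.75]
toy_eer = equal_error_rate(toy_targets, toy_impostors)
print(f"EER on toy scores: {toy_eer:.2%}")
```

With a finite set of trials the two error curves rarely cross exactly, so this sketch reports the average of the two rates at the closest point. Evaluation toolkits often interpolate between neighbouring thresholds instead.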

Acknowledgements

This work was financially supported by the Ministry of Education and Science of the Russian Federation, contract 14.578.21.0126 (ID RFMEFI57815X0126).

Author information

Authors and Affiliations

ITMO University, St. Petersburg, Russia
Egor Malykh, Sergey Novoselov & Oleg Kudashev
STC-innovations Ltd., St. Petersburg, Russia
Sergey Novoselov & Oleg Kudashev

Corresponding author

Correspondence to Egor Malykh.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Malykh, E., Novoselov, S., Kudashev, O. (2017). On Residual CNN in Text-Dependent Speaker Verification Task. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_59

DOI: https://doi.org/10.1007/978-3-319-66429-3_59
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)