On Residual CNN in Text-Dependent Speaker Verification Task

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

Deep learning approaches are still not very common in the speaker verification field. We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. Although we were not able to surpass the baseline system in quality, we achieved quite good results for such a new approach, obtaining a 5.23% EER on the evaluation part of RSR2015. Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative.
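The two metrics quoted in the abstract, the equal error rate (EER) and a relative improvement, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `equal_error_rate` and `relative_reduction` are hypothetical, the threshold sweep is one simple way to approximate the EER, and the scores in the usage note are toy values, not results from the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    The EER is the operating point where the false acceptance rate (FAR)
    equals the false rejection rate (FRR). With a finite set of trials the
    two rates rarely match exactly, so we take the threshold where they are
    closest and return their average.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(reference_eer, new_eer):
    """Relative EER reduction of new_eer with respect to reference_eer."""
    return (reference_eer - new_eer) / reference_eer
```

For example, with toy target scores `[0.9, 0.8, 0.7, 0.4]` and impostor scores `[0.6, 0.3, 0.2, 0.1]`, one target and one impostor trial are misclassified at the best threshold, so the function returns 0.25. A relative improvement of 18% means that the fused system's EER is 0.82 times the EER of the best individual system; the abstract does not state the absolute values.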

Acknowledgements

This work was financially supported by the Ministry of Education and Science of the Russian Federation, contract 14.578.21.0126 (ID RFMEFI57815X0126).

Author information

Corresponding author

Correspondence to Egor Malykh.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Malykh, E., Novoselov, S., Kudashev, O. (2017). On Residual CNN in Text-Dependent Speaker Verification Task. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_59

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_59

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
