Abstract
This paper investigates ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset, but the resulting accuracy was low compared with the results reported for English spontaneous telephone speech. Two methods proved especially useful for Russian spontaneous speech: i-vector based deep neural network adaptation and speaker-dependent bottleneck features, which provide 8.6% and 11.9% relative word error rate reduction over the baseline system, respectively.
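To make the reported figures concrete, the following minimal sketch shows how a relative word error rate (WER) reduction relates to absolute WER values. The function names are illustrative, and the baseline WER of 30.0% is a hypothetical placeholder chosen only for the arithmetic; the abstract does not state the absolute WER of the baseline system.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction over the baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


def wer_after_reduction(baseline_wer: float, relative_reduction_pct: float) -> float:
    """Return the absolute WER implied by a given relative reduction."""
    return baseline_wer * (1.0 - relative_reduction_pct / 100.0)


if __name__ == "__main__":
    # ASSUMPTION: hypothetical baseline WER, not a value from the paper.
    hypothetical_baseline_wer = 30.0

    # Relative reductions as stated in the abstract.
    reported_reductions = {
        "i-vector based DNN adaptation": 8.6,
        "speaker-dependent bottleneck features": 11.9,
    }

    for method, reduction_pct in reported_reductions.items():
        implied_wer = wer_after_reduction(hypothetical_baseline_wer, reduction_pct)
        print(f"{method}: {reduction_pct}% relative reduction "
              f"implies {implied_wer:.2f}% WER at a "
              f"{hypothetical_baseline_wer:.1f}% baseline")
```

A relative reduction is always measured against the baseline WER, so the same 8.6% relative gain corresponds to a different number of absolute WER points depending on how accurate the baseline is.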
Acknowledgements
This work was partially supported financially by the Government of the Russian Federation, Grant 074-U01, and by the Ministry of Education and Science of the Russian Federation, contract 14.579.21.0057, ID RFMEFI57914X0057.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Prudnikov, A., Medennikov, I., Mendelev, V., Korenevsky, M., Khokhlov, Y. (2015). Improving Acoustic Models for Russian Spontaneous Speech Recognition. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_29
DOI: https://doi.org/10.1007/978-3-319-23132-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)