Abstract
When learning English, Chinese students tend to spend much of their time practicing reading and writing while neglecting their spoken English. This study presents an intelligent spoken English pronunciation training system based on speech recognition. The system takes Mel Frequency Cepstral Coefficients (MFCC) as the characteristic parameters of the speech signal and introduces a deep neural network algorithm to improve the accuracy of speech recognition. Taking tone, speech speed and intonation as the evaluation criteria, a simulation experiment comparing manual (human) evaluation with machine evaluation was carried out. The results showed that the deep neural network achieved a high speech recognition rate and that the three evaluation criteria were reliable, which provides a reference for the development of spoken English learning systems.
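To make the feature extraction step concrete, the following is a minimal sketch of how Mel Frequency Cepstral Coefficients can be computed from a raw signal: pre-emphasis, framing, Hamming windowing, power spectrum, mel filterbank, logarithm, and a discrete cosine transform. It is an illustration under stated assumptions, not the authors' implementation. The sample rate, frame length, hop size, number of filters, number of coefficients, the pre-emphasis factor 0.97, and all function names are assumptions chosen for illustration and do not come from the article.

```python
import math

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak of 1.0 at center
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def dct2(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_out)]

def mfcc(signal, sample_rate=8000, frame_len=256, hop=128, n_filters=20, n_ceps=13):
    """Return one list of n_ceps coefficients per frame (all defaults are assumed values)."""
    # Pre-emphasis boosts high frequencies before spectral analysis.
    emphasized = [signal[0]] + [signal[i] - 0.97 * signal[i - 1]
                                for i in range(1, len(signal))]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        spec = power_spectrum(emphasized[start:start + frame_len])
        # Floor the energies so the logarithm is always defined.
        energies = [max(sum(f * s for f, s in zip(filt, spec)), 1e-10) for filt in bank]
        features.append(dct2([math.log(e) for e in energies], n_ceps))
    return features
```

Calling `mfcc(samples)` on a list of floating-point samples returns one coefficient vector per frame. In a system of the kind described above, vectors like these would be the input to the recognition network. The naive DFT is used only to keep the sketch free of dependencies; a practical system would use an FFT routine.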
Funding
Supported by Humanities and Social Sciences Research Project of Mudanjiang Normal University: Chinese Writing—Research on the New Generation of Chinese American Female Novels (No. GG2018009).
Cite this article
Cai, J., Liu, Y. Research on English pronunciation training based on intelligent speech recognition. Int J Speech Technol 21, 633–640 (2018). https://doi.org/10.1007/s10772-018-9523-8
DOI: https://doi.org/10.1007/s10772-018-9523-8