Research on English pronunciation training based on intelligent speech recognition

International Journal of Speech Technology

Abstract

When learning English, Chinese students tend to spend a lot of time practicing reading and writing while neglecting spoken English. This study presents an intelligent spoken English pronunciation training system based on speech recognition. The system uses Mel Frequency Cepstral Coefficients (MFCCs) as the characteristic parameters of the speech signal and introduces a deep neural network algorithm to improve recognition accuracy. Taking tone, speech speed, and intonation as the evaluation criteria, a simulation experiment comparing manual evaluation with machine evaluation was carried out. The results showed that the deep neural network achieved a high speech recognition rate and that the three evaluation criteria were reliable, which provides a reference for the development of spoken English learning systems.
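
As an illustration of the feature named in the abstract, the following is a minimal textbook sketch of MFCC extraction for a single speech frame. It is not the authors' implementation, which is not shown in this preview. The function names, the 8 kHz sample rate, the 256-sample frame, the 20 mel filters, and the 12 output coefficients are all assumptions chosen for the example. The sketch uses only the Python standard library and a naive DFT, and it omits pre-emphasis and framing; a practical system would use an FFT library.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1)
                  for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling edge
            row[k] = (right - k) / (right - center)
        filters.append(row)
    return filters


def dct2(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    """MFCCs of one frame: power spectrum, mel filterbank, log, DCT."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Floor the energies so that an empty filter never produces log(0).
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    return dct2(log_energies, num_coeffs)


# Usage: a synthetic 440 Hz tone stands in for one frame of recorded speech.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(256)]
coeffs = mfcc(frame, sample_rate)
print(len(coeffs))  # prints 12
```

In a recognition pipeline of the kind the abstract describes, one such coefficient vector per frame would typically be stacked over time and fed to the neural network as its input features.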

Funding

Supported by Humanities and Social Sciences Research Project of Mudanjiang Normal University: Chinese Writing—Research on the New Generation of Chinese American Female Novels (No. GG2018009).

Author information

Corresponding author

Correspondence to Ying Liu.

Cite this article

Cai, J., Liu, Y. Research on English pronunciation training based on intelligent speech recognition. Int J Speech Technol 21, 633–640 (2018). https://doi.org/10.1007/s10772-018-9523-8

  • DOI: https://doi.org/10.1007/s10772-018-9523-8
