
A new formant feature and its application in Mandarin vowel pronunciation quality assessment

Journal of Central South University

Abstract

To improve Mandarin vowel pronunciation quality assessment, a novel formant feature was proposed and applied to formant classification for evaluating the pronunciation quality of Mandarin Chinese vowels. The formant candidates of each frame were plotted on the time-frequency plane to form a bitmap, and Gabor features were extracted from this bitmap to represent the formant trajectory. The features were then classified with a Gaussian mixture model (GMM), and the classification posterior probability was mapped to a pronunciation quality grade. Experiments comparing this Gabor-transform-based formant trajectory feature with several traditionally used features show that the method achieves a human-machine scoring correlation coefficient (CC) of 0.842, higher than the 0.832 obtained with traditional speech recognition techniques. Because the long-term information captured by formant classification and the short-term information captured by speech recognition are complementary, linear and nonlinear methods of combining the two results were also investigated to further improve evaluation performance. Experiments on PSK show that combining them with a neural network gives the best CC, 0.913, which is very close to the inter-human rating correlation of 0.94.
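
The abstract only outlines the pipeline (formant bitmap, Gabor feature, GMM posterior), so the following is a minimal illustrative sketch of those steps, not the authors' implementation. It assumes NumPy, and every concrete choice in it is an assumption: a 64-bin grid up to 4 kHz, a real Gabor filter bank with two wavelengths and four orientations (sigma = 0.56 × wavelength, aspect ratio 0.5), mean/std pooling of the response magnitudes, and diagonal-covariance GMMs whose parameters are already trained. The helper names (`formant_bitmap`, `gabor_features`, `class_posteriors`, etc.) are hypothetical. Formant candidate extraction, GMM training and the posterior-to-grade mapping are not shown.

```python
import numpy as np


def formant_bitmap(candidates, n_bins=64, f_max=4000.0):
    """Rasterize per-frame formant candidates (Hz) onto a frequency x time grid.

    Assumed layout: rows are frequency bins (0 = lowest), columns are frames.
    """
    bitmap = np.zeros((n_bins, len(candidates)))
    for t, freqs in enumerate(candidates):
        for f in freqs:
            if 0.0 <= f < f_max:
                bitmap[int(f / f_max * n_bins), t] = 1.0
    return bitmap


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5):
    """Real (even-symmetric) 2-D Gabor kernel with zero mean."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
    kernel = envelope * np.cos(2 * np.pi * x_r / wavelength)
    return kernel - kernel.mean()  # remove DC so flat regions give no response


def filter2d_same(image, kernel):
    """Zero-padded 'same'-size cross-correlation (odd square kernel).

    Equal to convolution here because the even Gabor kernel is point-symmetric.
    """
    pad = kernel.shape[0] // 2
    windows = np.lib.stride_tricks.sliding_window_view(np.pad(image, pad), kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def gabor_features(bitmap, n_orientations=4, wavelengths=(4.0, 8.0), size=15):
    """Mean and std of |response| for each filter in an assumed Gabor bank."""
    feats = []
    for lam in wavelengths:
        for i in range(n_orientations):
            kern = gabor_kernel(size, 0.56 * lam, np.pi * i / n_orientations, lam)
            resp = np.abs(filter2d_same(bitmap, kern))
            feats.extend([resp.mean(), resp.std()])
    return np.array(feats)


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) under a diagonal-covariance GMM; means/variances have shape (K, D)."""
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_comp = np.log(weights) + log_norm - 0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum())  # log-sum-exp over components


def class_posteriors(x, models, priors):
    """Bayes posterior over classes, one (weights, means, variances) GMM per class."""
    logs = np.array([np.log(p) + gmm_log_likelihood(x, *m) for m, p in zip(models, priors)])
    post = np.exp(logs - logs.max())
    return post / post.sum()


if __name__ == "__main__":
    # Hypothetical trajectory: F1 rising from 700 Hz, F2 falling from 1800 Hz, 40 frames.
    cands = [[700 + 5 * t, 1800 - 10 * t] for t in range(40)]
    feat = gabor_features(formant_bitmap(cands))
    print(feat.shape)  # (16,)
```

Pooling each filter's response into a mean and a standard deviation yields a fixed-length vector regardless of vowel duration, which is what a GMM classifier needs. Assuming CC denotes the Pearson correlation, the human-machine figure quoted above can be computed from paired scores as `np.corrcoef(machine_scores, human_scores)[0, 1]`.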

Author information

Correspondence to Xiao-chun Lu  (卢小春).

Additional information

Foundation item: Project (61062011) supported by the National Natural Science Foundation of China; Project (2010GXNSFA013128) supported by the Natural Science Foundation of Guangxi Province, China.

About this article

Cite this article

Lu, Xc., Pan, Fp., Yin, Jx. et al. A new formant feature and its application in Mandarin vowel pronunciation quality assessment. J. Cent. South Univ. 20, 3573–3581 (2013). https://doi.org/10.1007/s11771-013-1883-2

  • DOI: https://doi.org/10.1007/s11771-013-1883-2
