Abstract
Recognizing human emotion from speech is a challenging yet important task in speech technology. In this paper, in addition to the prosody features derived from emotional speech, voice quality features are extracted as new emotional features to improve emotion recognition. Using a support vector machine classifier, four emotions (anger, joy, sadness, and neutral) from a Chinese natural emotional speech corpus are discriminated by combining prosody and voice quality features. The experimental results show that combining prosody and voice quality features yields an overall accuracy of 76% for emotion recognition, an improvement of approximately 10% over using prosody features alone. The results also show that voice quality features are effective emotional features and complement prosody features in improving emotion recognition.
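The combination described in the abstract can be illustrated with a minimal sketch of feature-level fusion followed by a linear support vector machine. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature names (pitch, energy, jitter, shimmer), the synthetic class centres, the one-vs-rest scheme, and the Pegasos-style sub-gradient trainer, which stands in for whatever SVM training procedure the author used. The accuracy returned by demo() is measured on the synthetic training data and says nothing about the 76% result reported above.

```python
import random

EMOTIONS = ["anger", "joy", "sadness", "neutral"]


def fuse(prosody, voice_quality):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(prosody) + list(voice_quality)


def standardize(rows):
    """Z-score every column so that no feature dominates by scale."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    stds = []
    for j in range(dim):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(var ** 0.5 or 1.0)  # constant column: divide by 1
    return [[(r[j] - means[j]) / stds[j] for j in range(dim)] for r in rows]


def train_linear_svm(rows, targets, epochs=50, lam=0.01, seed=0):
    """Binary linear SVM (targets are +1/-1) trained by Pegasos-style
    sub-gradient descent on the hinge loss. The bias is stored as the
    last weight and is regularized together with the others."""
    rng = random.Random(seed)
    w = [0.0] * (len(rows[0]) + 1)
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x = rows[i] + [1.0]
            margin = targets[i] * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:  # sample violates the margin
                w = [wj + eta * targets[i] * xj for wj, xj in zip(w, x)]
    return w


def train_one_vs_rest(rows, labels):
    """One binary SVM per emotion: that emotion against all others."""
    return {
        e: train_linear_svm(rows, [1 if lab == e else -1 for lab in labels])
        for e in EMOTIONS
    }


def predict(models, row):
    """Pick the emotion whose SVM gives the largest decision value."""
    x = row + [1.0]
    return max(models, key=lambda e: sum(w * v for w, v in zip(models[e], x)))


def demo(seed=0):
    """Train and score on synthetic data; returns training accuracy."""
    rng = random.Random(seed)
    # Hypothetical class centres, [pitch, energy | jitter, shimmer].
    # Invented for illustration only; anger and joy share prosody values
    # and differ only in the voice quality half of the vector.
    centres = {
        "anger": [1.0, 1.0, 1.0, 1.0],
        "joy": [1.0, 1.0, -1.0, -1.0],
        "sadness": [-1.0, -1.0, 1.0, -1.0],
        "neutral": [-1.0, -1.0, -1.0, 1.0],
    }
    rows, labels = [], []
    for emotion, c in centres.items():
        for _ in range(20):
            prosody = [rng.gauss(m, 0.3) for m in c[:2]]
            voice_quality = [rng.gauss(m, 0.3) for m in c[2:]]
            rows.append(fuse(prosody, voice_quality))
            labels.append(emotion)
    rows = standardize(rows)
    models = train_one_vs_rest(rows, labels)
    hits = sum(predict(models, r) == lab for r, lab in zip(rows, labels))
    return hits / len(rows)
```

Calling demo() returns the fraction of synthetic utterances classified correctly. To compare against a prosody-only baseline in the spirit of the abstract, one could train on the first two columns alone and repeat the evaluation; in a real experiment the score should be computed on held-out utterances rather than on the training set.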
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, S. (2008). Emotion Recognition in Chinese Natural Speech by Combining Prosody and Voice Quality Features. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87734-9_52
DOI: https://doi.org/10.1007/978-3-540-87734-9_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87733-2
Online ISBN: 978-3-540-87734-9
eBook Packages: Computer Science, Computer Science (R0)