Emotional Vocal Expressions Recognition Using the COST 2102 Italian Database of Emotional Speech

Atassi, Hicham; Riviello, Maria Teresa; Smékal, Zdeněk; Hussain, Amir; Esposito, Anna

doi:10.1007/978-3-642-12397-9_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5967)

Abstract

This paper proposes a new speaker-independent approach to classifying emotional vocal expressions, using the COST 2102 Italian database of emotional speech. The audio recordings, extracted from video clips of Italian movies, possess a certain degree of spontaneity and are either noisy or slightly degraded by an interruption, which makes the collected stimuli more realistic than those in available emotional databases containing utterances recorded under studio conditions. The audio stimuli represent six basic emotional states: happiness, sarcasm/irony, fear, anger, surprise, and sadness. Under these more realistic conditions, and with a speaker-independent approach, the proposed system classifies the emotions under examination with 60.7% accuracy, using a hierarchical structure consisting of a Perceptron and fifteen Gaussian Mixture Models (GMMs) trained to distinguish between the two emotions in each pair under examination. The most discriminative features were selected with the Sequential Floating Forward Selection (SFFS) algorithm from a large set of spectral, prosodic, and voice quality features. The results were compared with the subjective evaluation of the stimuli provided by human subjects.
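The figure of fifteen pairwise models follows from the six emotional states: there are C(6, 2) = 15 unordered pairs. The abstract does not say how the Perceptron and the pairwise models are combined, so the sketch below is only an illustration of generic one-vs-one (pairwise) voting, not the authors' method. The names `pairwise_vote`, `make_toy_scorer`, and `toy_scorers`, the majority-vote rule, and the one-dimensional toy "feature" are all assumptions made for this example; in a real system each scorer would be replaced by a trained pair model, such as the log-likelihoods of two GMMs.

```python
from collections import Counter
from itertools import combinations

EMOTIONS = ["happiness", "sarcasm/irony", "fear", "anger", "surprise", "sadness"]

# Every unordered pair of the six emotions: C(6, 2) = 15 pairs,
# matching the fifteen pairwise models mentioned in the abstract.
PAIRS = list(combinations(EMOTIONS, 2))


def pairwise_vote(features, pair_scorers):
    """Return the emotion that wins the most pairwise comparisons.

    pair_scorers maps each (a, b) pair to a hypothetical callable that
    returns (score_a, score_b) for the given features.
    Majority voting is an assumption of this sketch, not the paper's rule.
    """
    votes = Counter()
    for a, b in PAIRS:
        score_a, score_b = pair_scorers[(a, b)](features)
        votes[a if score_a >= score_b else b] += 1
    # Ties are broken by the order of EMOTIONS (max keeps the first maximum).
    return max(EMOTIONS, key=lambda e: votes[e])


def make_toy_scorer(a, b):
    """Toy stand-in for a trained pair model: negative distance to a made-up class centre."""
    centre = {e: float(i) for i, e in enumerate(EMOTIONS)}
    return lambda x: (-abs(x - centre[a]), -abs(x - centre[b]))


toy_scorers = {pair: make_toy_scorer(*pair) for pair in PAIRS}
n_pairs = len(PAIRS)
predicted = pairwise_vote(3.1, toy_scorers)
```

With these made-up class centres (0 to 5, in the order of `EMOTIONS`), the input 3.1 lies closest to the centre assigned to anger, so anger wins all five of its pairwise comparisons.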

Author information

Authors and Affiliations

Department of Computing Science and Mathematics, University of Stirling, UK
Hicham Atassi & Amir Hussain
Department of Telecommunications, Brno University of Technology, Czech Republic
Hicham Atassi & Zdeněk Smékal
Department of Psychology and IIASS, Second University of Naples, Italy
Maria Teresa Riviello & Anna Esposito

Editor information

Editors and Affiliations

Second University of Naples, and IIASS, Via Pellegrino, 84019, Vietri sul Mare, SA, Italy
Anna Esposito
Centre for Language and Communication Studies, Trinity College, The University of Dublin, Dublin 2, Ireland
Nick Campbell & Carl Vogel
Department of Computing Science & Mathematics, University of Stirling, FK9 4LA, Stirling, Scotland, UK
Amir Hussain
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, P.O. Box 217, 7500 AE, Enschede, The Netherlands
Anton Nijholt

About this chapter

Cite this chapter

Atassi, H., Riviello, M.T., Smékal, Z., Hussain, A., Esposito, A. (2010). Emotional Vocal Expressions Recognition Using the COST 2102 Italian Database of Emotional Speech. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_21

DOI: https://doi.org/10.1007/978-3-642-12397-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12396-2
Online ISBN: 978-3-642-12397-9
eBook Packages: Computer Science, Computer Science (R0)