ABSTRACT
Although many papers have been published on emotional speech recognition, most of them contribute to the feature extraction phase. For classification, the hidden Markov model (HMM) is still the most commonly used method, even though it has been shown to be less accurate than its discriminative counterpart, the hidden conditional random fields (HCRF) model, in tasks such as phone classification and gesture recognition. In this study, we therefore investigate the use of the HCRF model for emotional speech classification. In our experiments, we extracted Mel-frequency cepstral coefficient (MFCC) features from the well-known Berlin emotional speech dataset (EMO) and the eNTERFACE 2005 dataset. We then used 10-fold cross-validation to train and evaluate the model and to compare its results with those of the HMM. The experiments show that HCRF achieves a statistically significant improvement (p-value ≤ 0.05) in classification accuracy. In addition, we speed up the training phase of the model by caching the gradient computation, so our computation time is much lower than that of existing methods.
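The evaluation protocol described above (10-fold cross-validation followed by a significance check on the accuracy difference) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the statistical test, so the exact paired sign-flip test here is an assumption, and `train_majority` and `train_nearest_centroid` are hypothetical stand-ins for the HMM and HCRF trainers, operating on fixed-length feature vectors rather than MFCC sequences.

```python
import itertools
import random
from statistics import mean


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle 0..n_items-1 and split them into k disjoint, near-equal folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(train_fn, features, labels, folds):
    """Return per-fold accuracy; train_fn(X, y) must return a predict(x) callable."""
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        predict = train_fn([features[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def paired_sign_flip_p_value(acc_a, acc_b):
    """Exact two-sided sign-flip test on paired per-fold accuracy differences.

    Assumption: the source only reports p <= 0.05 and does not name its test.
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


def train_majority(X, y):
    """Hypothetical weak baseline: always predict the most frequent label."""
    most_common = max(set(y), key=y.count)
    return lambda x: most_common


def train_nearest_centroid(X, y):
    """Hypothetical stronger model: predict the label of the closest class mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(x, centroids[c])))
    return predict


if __name__ == "__main__":
    # Synthetic two-class data standing in for utterance-level features.
    rng = random.Random(1)
    X = [[rng.gauss(3 * (i % 2), 1.0), rng.gauss(3 * (i % 2), 1.0)]
         for i in range(100)]
    y = [i % 2 for i in range(100)]
    folds = k_fold_indices(len(y), k=10)
    acc_weak = cross_validate(train_majority, X, y, folds)
    acc_strong = cross_validate(train_nearest_centroid, X, y, folds)
    print("mean accuracy (baseline):", mean(acc_weak))
    print("mean accuracy (stronger):", mean(acc_strong))
    print("p-value:", paired_sign_flip_p_value(acc_strong, acc_weak))
```

Both models are scored on the same folds, so the per-fold accuracies form matched pairs; that pairing is what lets the difference be tested with only ten observations.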
Index Terms
- Emotional speech classification using hidden conditional random fields