DOI: 10.1145/2069216.2069246
research-article

Emotional speech classification using hidden conditional random fields

Published: 13 October 2011

ABSTRACT

Although a great number of papers have been published on emotional speech recognition, most of them focus on the feature extraction phase. For classification, the hidden Markov model (HMM) is still the most commonly used method, even though HMM has been shown to be less accurate than its discriminative counterpart, the hidden conditional random field (HCRF) model, in tasks such as phone classification and gesture recognition. In this study, we therefore investigate the use of the HCRF model for emotional speech classification. In our experiments, we extracted Mel-frequency cepstral coefficient (MFCC) features from the well-known Berlin emotional speech dataset (EMO) and the eNTERFACE 2005 dataset. We then used 10-fold cross-validation to train and evaluate the HCRF model and compared its results with those of HMM. The experiments show that HCRF achieves a statistically significant improvement in classification accuracy (p-value ≤ 0.05). In addition, we speed up the training phase of the model by caching the gradient computation, so our computation time is much lower than that of existing methods.
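The abstract does not spell out the HCRF formulation. As a rough illustration only, the sketch below shows how a model of the general HCRF form, p(y | x) = Σ_h exp Ψ(y, h, x) / Σ_{y', h} exp Ψ(y', h, x), turns a sequence of frame features (for example MFCC vectors) into class posteriors by summing over hidden-state paths with a forward recursion. The per-class linear-chain structure, the `emit`/`trans` parameterization, the class name `LinearChainHCRF`, and the toy numbers are all assumptions made for illustration, not the authors' model; training (including the gradient caching mentioned in the abstract) is omitted.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


class LinearChainHCRF:
    """Toy HCRF (hypothetical): one chain of hidden states per emotion class.

    emit[y][s]     -- weight vector scoring a frame x_t in hidden state s of class y
    trans[y][s][r] -- score for moving from hidden state s to r within class y
    Both parameterizations are illustrative assumptions.
    """

    def __init__(self, emit, trans):
        self.emit = emit
        self.trans = trans

    def class_log_score(self, y: int, frames: List[Vector]) -> float:
        """log sum_h exp(Psi(y, h, x)) over all hidden-state paths h,
        computed with the forward recursion instead of enumerating paths."""
        emit, trans = self.emit[y], self.trans[y]
        n_states = len(emit)
        alpha = [dot(emit[s], frames[0]) for s in range(n_states)]
        for x in frames[1:]:
            alpha = [
                dot(emit[r], x)
                + logsumexp([alpha[s] + trans[s][r] for s in range(n_states)])
                for r in range(n_states)
            ]
        return logsumexp(alpha)

    def posterior(self, frames: List[Vector]) -> List[float]:
        """p(y | x) for every class y: class scores are normalized jointly."""
        scores = [self.class_log_score(y, frames) for y in range(len(self.emit))]
        log_z = logsumexp(scores)
        return [math.exp(s - log_z) for s in scores]

    def classify(self, frames: List[Vector]) -> int:
        """Return the class with the highest posterior probability."""
        probs = self.posterior(frames)
        return max(range(len(probs)), key=probs.__getitem__)


# Toy usage: 2 classes, 2 hidden states each, 2-dimensional stand-in features.
emit = [
    [[1.0, 0.0], [0.5, 0.5]],   # class 0
    [[0.0, 1.0], [-0.5, 0.5]],  # class 1
]
trans = [
    [[0.2, -0.1], [0.0, 0.3]],  # class 0
    [[0.1, 0.1], [-0.2, 0.4]],  # class 1
]
model = LinearChainHCRF(emit, trans)
frames = [[1.0, 0.2], [0.8, 0.1], [0.9, 0.3]]
print(model.posterior(frames), model.classify(frames))
```

Normalizing the class scores jointly, rather than scoring each class with its own generative likelihood as an HMM classifier does, is what makes this kind of model discriminative.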

    Published in

      ACM Other conferences
      SoICT '11: Proceedings of the 2nd Symposium on Information and Communication Technology
      October 2011
      225 pages
      ISBN:9781450308809
      DOI:10.1145/2069216

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 147 of 318 submissions, 46%
