research-article

Emotional speech classification using hidden conditional random fields

Authors:
La The Vinh, Kyung Hee University, Korea
Sungyoung Lee, Kyung Hee University, Korea
Young-Koo Lee, Kyung Hee University, Korea

SoICT '11: Proceedings of the 2nd Symposium on Information and Communication Technology, October 2011, pages 146–150
https://doi.org/10.1145/2069216.2069246

Published: 13 October 2011

ABSTRACT

Although a great number of papers have addressed emotional speech recognition, most of them contribute to the feature extraction phase. For the classification step, the hidden Markov model (HMM) is still the most commonly used method, even though HMMs have been shown to be less accurate than their discriminative counterpart, the hidden conditional random field (HCRF) model, in tasks such as phone classification and gesture recognition. In this study, we therefore investigate the use of the HCRF model for the emotional speech classification problem. In our experiments, we extracted Mel-frequency cepstral coefficient (MFCC) features from the well-known Berlin emotional speech dataset (EMO) and the eNTERFACE 2005 dataset. We then used 10-fold cross-validation to train and evaluate the HCRF and to compare its results with those of an HMM. The experiments show that the HCRF achieves a significant improvement in classification accuracy (p-value ≤ 0.05). In addition, we speed up the training phase of the model by caching the gradient computation, so our computation time is much lower than that of existing methods.
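The abstract contrasts the generative HMM with the discriminative HCRF. As a rough illustration of how an HCRF scores an utterance, the sketch below computes the class posterior p(y | x) for a linear-chain HCRF: for each emotion class it runs a log-space forward pass that sums exp(Psi(y, h, x)) over all hidden-state paths h, then normalizes across classes. This is a minimal sketch under stated assumptions, not the authors' implementation: the parameter layout (per-class hidden states, each with a weight vector over an MFCC frame, plus per-class transition log-potentials), the names `class_log_score`, `hcrf_posterior` and `ClassParams`, and the toy numbers are all hypothetical, and training (including the gradient caching the abstract mentions) is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical parameter layout for one emotion class y:
#   state_weights[s] : weight vector (length D) scoring an MFCC frame in hidden state s
#   transitions[r][s]: log-potential for moving from hidden state r to hidden state s
ClassParams = Tuple[List[List[float]], List[List[float]]]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def class_log_score(frames: Sequence[Sequence[float]],
                    state_weights: List[List[float]],
                    transitions: List[List[float]]) -> float:
    """Forward algorithm in log space: log of the sum over all hidden-state
    paths h of exp(Psi(y, h, x)) for one class y."""
    n = len(state_weights)
    # alpha[s]: log-sum of scores of all partial paths that end in state s
    alpha = [dot(state_weights[s], frames[0]) for s in range(n)]
    for x_t in frames[1:]:
        alpha = [
            dot(state_weights[s], x_t)
            + logsumexp([alpha[r] + transitions[r][s] for r in range(n)])
            for s in range(n)
        ]
    return logsumexp(alpha)


def hcrf_posterior(frames: Sequence[Sequence[float]],
                   params: Sequence[ClassParams]) -> List[float]:
    """p(y | x) for every class y: normalize the per-class path sums across classes."""
    scores = [class_log_score(frames, w, a) for (w, a) in params]
    log_z = logsumexp(scores)
    return [math.exp(s - log_z) for s in scores]


if __name__ == "__main__":
    # Toy input: three 2-dimensional frames standing in for MFCC vectors (made-up values).
    frames = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    shared_transitions = [[0.0, -1.0], [-1.0, 0.0]]
    params = [
        ([[1.0, 0.0], [0.0, 0.5]], shared_transitions),  # class 0 (e.g. "anger")
        ([[0.0, 1.0], [0.5, 0.0]], shared_transitions),  # class 1 (e.g. "neutral")
    ]
    probs = hcrf_posterior(frames, params)
    print("class posteriors:", probs)
```

Classifying an utterance then amounts to picking the class with the largest posterior. Giving each class its own hidden states mirrors a per-class HMM topology, which makes a side-by-side comparison with HMMs straightforward; a shared hidden-state space with class-dependent compatibility weights is another common way to set up an HCRF.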

References

M. E. Ayadi, M. S. Kamel, and F. Karray. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3):572–587, 2011.
D. Bitouk, R. Verma, and A. Nenkova. Class-level spectral features for emotion recognition. Speech Communication, 52(7–8):613–625, 2010.
F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss. A database of German emotional speech. In Proceedings of the 9th European Conference on Speech Communication and Technology, pages 1517–1520, 2005.
D. A. Cairns and J. H. L. Hansen. Nonlinear analysis and classification of speech under stressed conditions. Journal of the Acoustical Society of America, 96(6):3392–3400, 1994.
R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. Taylor. Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine, 18(1):32–80, 2001.
A. Gunawardana, M. Mahajan, A. Acero, and J. C. Platt. Hidden conditional random fields for phone classification. In Proceedings of the International Conference on Speech Communication and Technology, pages 1117–1120, 2005.
A. I. Iliev, M. S. Scordilis, J. P. Papa, and A. X. Falcão. Spoken emotion recognition through optimum-path forest classification using glottal features. Computer Speech and Language, 24(3):445–460, 2010.
H.-K. J. Kuo and Y. Gao. Maximum entropy direct models for speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 14(3):873–881, May 2006.
J. Lafferty. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, pages 282–289. Morgan Kaufmann, 2001.
A. McCallum, D. Freitag, and F. C. N. Pereira. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 591–598, 2000.
T. L. Nwe, S. W. Foo, and L. C. D. Silva. Speech emotion recognition using hidden Markov models. Speech Communication, 41(4):603–623, 2003.
M. O., A. J., H. A., K. I., S. A., and S. R. Multimodal caricatural mirror. In Proceedings of the SIMILAR NoE Summer Workshop on Multimodal Interfaces, pages 13–20, 2005.
A. Quattoni, S. Wang, L.-P. Morency, M. Collins, and T. Darrell. Hidden conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1848–1852, Oct. 2007.
A. Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In Proceedings of the Empirical Methods in Natural Language Processing, pages 133–142, 1996.
M. J. Schervish. P values: What they are and what they are not. The American Statistician, 50(3):203–206, August 1996.
D. Tacconi, O. Mayora, P. Lukowicz, B. Arnrich, C. Setz, G. Troster, and C. Haring. Activity and emotion recognition to support early diagnosis of psychiatric diseases. In Proceedings of the Second International Conference on Pervasive Computing Technologies for Healthcare, pages 100–102, 2008.

Index Terms

- Information systems
  - Information systems applications

Published in
SoICT '11: Proceedings of the 2nd Symposium on Information and Communication Technology
October 2011
225 pages
ISBN: 9781450308809
DOI: 10.1145/2069216
General Chairs:
Thang Huynh Quyet, Hanoi University of Science and Technology, Vietnam
Dinh Khang Tran, Hanoi University of Science and Technology, Vietnam
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
HCRF
HMM
MFCC
emotion classification
emotional speech
Acceptance Rates
Overall Acceptance Rate: 147 of 318 submissions, 46%