Abstract
In this paper, we present the method of our submission to the Emotion Recognition in the Wild (EmotiW) challenge, whose task is to automatically classify the emotions acted by human subjects in video clips recorded in real-world conditions. In our method, each video clip is represented by three types of image-set models, i.e. linear subspace, covariance matrix, and Gaussian distribution, all of which can be viewed as points residing on certain Riemannian manifolds. Corresponding Riemannian kernels are then employed on these set models to measure similarity/distance. For classification, three types of classifiers, i.e. kernel SVM, logistic regression, and partial least squares, are investigated and compared. Finally, an optimal fusion of the classifiers learned from different kernels and different modalities (video and audio) is conducted at the decision level to further boost performance. We perform extensive evaluations on the EmotiW 2014 challenge data (including the validation set and the blind test set) and assess the effects of the different components of our pipeline. Our method achieves the best performance reported so far on this benchmark. To further evaluate its generalization ability, we also conduct experiments on the EmotiW 2013 data and on two well-known lab-controlled databases, CK+ and MMI. The results show that the proposed framework significantly outperforms state-of-the-art methods.
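To make the covariance-matrix branch of the pipeline concrete, the sketch below represents each video clip (a set of per-frame feature vectors) by a regularized covariance matrix, a point on the manifold of symmetric positive-definite (SPD) matrices, and compares two clips with an RBF kernel under the Log-Euclidean metric. This is a minimal illustration, not the authors' implementation: the feature dimension, regularization constant, kernel bandwidth, and function names are all assumptions, and real features (e.g. HOG or deep features per frame) would replace the random data.

```python
import numpy as np

def covariance_model(frames, eps=1e-3):
    """Represent an image set (frames: n_frames x d feature matrix)
    by its covariance matrix, regularized to stay strictly SPD."""
    C = np.cov(frames, rowvar=False)
    return C + eps * np.eye(C.shape[0])

def log_euclidean_kernel(C1, C2, gamma=0.1):
    """RBF kernel on the SPD manifold under the Log-Euclidean metric:
    d(C1, C2) = ||logm(C1) - logm(C2)||_F."""
    def logm(C):
        # Matrix logarithm via eigendecomposition (valid for SPD matrices).
        w, V = np.linalg.eigh(C)
        return V @ np.diag(np.log(w)) @ V.T
    d = np.linalg.norm(logm(C1) - logm(C2), 'fro')
    return np.exp(-gamma * d**2)

# Two hypothetical clips with different frame counts but same feature dim.
rng = np.random.default_rng(0)
clip_a = rng.normal(size=(40, 5))   # 40 frames, 5-dim features
clip_b = rng.normal(size=(60, 5))

Ca, Cb = covariance_model(clip_a), covariance_model(clip_b)
k_ab = log_euclidean_kernel(Ca, Cb)  # kernel value in (0, 1]
k_aa = log_euclidean_kernel(Ca, Ca)  # self-similarity is exactly 1
```

A Gram matrix of such kernel values over training clips could then be fed to a kernel classifier (e.g. an SVM with a precomputed kernel), mirroring the kernel SVM / logistic regression / PLS comparison described in the abstract.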
Acknowledgments
This work is partially supported by the 973 Program under contract No. 2015CB351802, the Natural Science Foundation of China under contracts Nos. 61390511, 61222211, and 61379083, and the Youth Innovation Promotion Association CAS under No. 2015085.
Cite this article
Liu, M., Wang, R., Li, S. et al. Video modeling and learning on Riemannian manifold for emotion recognition in the wild. J Multimodal User Interfaces 10, 113–124 (2016). https://doi.org/10.1007/s12193-015-0204-5