ABSTRACT
In this paper, we use facial action unit (AU) detection to construct an end-to-end deep learning framework for spotting macro- and micro-expressions in long video sequences. The proposed framework focuses on individual components of facial muscle movement rather than processing the whole image, which eliminates image changes caused by noise such as body or head movement. Unlike existing approaches that deploy deep learning with classical Convolutional Neural Network (CNN) models, the proposed framework uses Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), or our proposed Concat-CNN models to learn the correlation between the AU features of distinct frames. The Concat-CNN applies three convolutional kernels of different sizes to capture features of different durations, and emphasizes both local and global mutation features by varying the dimensionality (max-pooling size) of the output space. Our proposal achieves state-of-the-art overall F1-scores: 0.2019 on CAS(ME)2-cropped, 0.2736 on SAMM Long Videos, and 0.2118 on CAS(ME)2. It not only outperforms the baseline but also ranked 3rd in the FME Challenge 2021 on the combined CAS(ME)2-cropped and SAMM-LV datasets.
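The multi-kernel idea behind Concat-CNN can be illustrated with a minimal sketch: several temporal convolution kernels of different sizes slide over a per-frame AU feature sequence, each response is max-pooled, and the pooled features are concatenated. The kernel sizes (3, 5, 7), pooling size, random kernels, and plain-numpy implementation below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def conv1d_valid(x, k):
    """Valid 1-D convolution of a (T, C) AU sequence with a (K, C) kernel,
    producing one scalar response per temporal window (length T - K + 1)."""
    T, _ = x.shape
    K = k.shape[0]
    return np.array([np.sum(x[t:t + K] * k) for t in range(T - K + 1)])

def concat_cnn_features(au_seq, kernel_sizes=(3, 5, 7), pool_size=2, rng=None):
    """Concat-CNN-style extractor (sketch): kernels of different temporal
    extents observe short- and long-duration AU dynamics; non-overlapping
    max-pooling keeps local mutation peaks; pooled maps are concatenated."""
    rng = np.random.default_rng(0) if rng is None else rng
    feats = []
    for K in kernel_sizes:
        # Random kernel for illustration; a trained model would learn these.
        k = rng.standard_normal((K, au_seq.shape[1]))
        r = conv1d_valid(au_seq, k)
        pooled = [r[i:i + pool_size].max()
                  for i in range(0, len(r) - pool_size + 1, pool_size)]
        feats.append(np.array(pooled))
    return np.concatenate(feats)

# Toy input: 20 frames, 17 AU intensities per frame (e.g., OpenFace-style output)
au_seq = np.random.default_rng(1).standard_normal((20, 17))
print(concat_cnn_features(au_seq).shape)  # (24,)
```

Because each kernel size yields a different response length, the pooled feature vector mixes coarse (large-kernel) and fine (small-kernel) temporal evidence in a single representation.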