ABSTRACT
In this paper, we use facial action unit (AU) detection to construct an end-to-end deep learning framework for spotting macro- and micro-expressions in long video sequences. The proposed framework focuses on individual components of facial muscle movement rather than processing the whole image, which eliminates image changes caused by noise such as body or head movement. Unlike existing approaches that deploy deep learning with classical Convolutional Neural Network (CNN) models, the proposed framework uses Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), or our proposed Concat-CNN models to learn the correlation between the AU features of distinct frames. The Concat-CNN applies three convolutional kernels of different sizes to capture features of different durations, and emphasizes both local and global mutation features by varying the dimensionality (max-pooling size) of the output space. Our proposal achieves state-of-the-art overall F1-scores: 0.2019 on CAS(ME)2-cropped, 0.2736 on SAMM Long Videos, and 0.2118 on CAS(ME)2. It not only outperforms the baseline but also ranked 3rd in the FME Challenge 2021 on the combined CAS(ME)2-cropped and SAMM-LV datasets.
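The multi-kernel idea behind Concat-CNN can be illustrated with a minimal sketch: several temporal convolution kernels of different sizes slide over a per-frame AU feature sequence, each response is max-pooled, and the pooled features are concatenated. The kernel sizes (3, 5, 7), pooling size, random kernels, and plain-numpy implementation below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def conv1d_valid(x, k):
    """Valid 1-D convolution of a (T, C) AU sequence with a (K, C) kernel,
    producing one scalar response per temporal window (length T - K + 1)."""
    T, _ = x.shape
    K = k.shape[0]
    return np.array([np.sum(x[t:t + K] * k) for t in range(T - K + 1)])

def concat_cnn_features(au_seq, kernel_sizes=(3, 5, 7), pool_size=2, rng=None):
    """Concat-CNN-style extractor (sketch): kernels of different temporal
    extents observe short- and long-duration AU dynamics; non-overlapping
    max-pooling keeps local mutation peaks; pooled maps are concatenated."""
    rng = np.random.default_rng(0) if rng is None else rng
    feats = []
    for K in kernel_sizes:
        # Random kernel for illustration; a trained model would learn these.
        k = rng.standard_normal((K, au_seq.shape[1]))
        r = conv1d_valid(au_seq, k)
        pooled = [r[i:i + pool_size].max()
                  for i in range(0, len(r) - pool_size + 1, pool_size)]
        feats.append(np.array(pooled))
    return np.concatenate(feats)

# Toy input: 20 frames, 17 AU intensities per frame (e.g., OpenFace-style output)
au_seq = np.random.default_rng(1).standard_normal((20, 17))
print(concat_cnn_features(au_seq).shape)  # (24,)
```

Because each kernel size yields a different response length, the pooled feature vector mixes coarse (large-kernel) and fine (small-kernel) temporal evidence in a single representation.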