DOI: 10.1145/3382507.3418867
ICMI '20 Conference Proceedings | Short Paper | Public Access

Multimodal Gated Information Fusion for Emotion Recognition from EEG Signals and Facial Behaviors

Published: 22 October 2020

ABSTRACT

Emotions elicit neural and behavioral responses that are detectable through scalp electroencephalogram (EEG) signals and measures of facial expressions. We propose a multimodal deep representation learning approach for emotion recognition from EEG and facial expression signals. The method jointly learns unimodal representations that are aligned with the other modality through a cosine-similarity objective, and combines them with a gated fusion mechanism. We evaluated our method on two databases: DAI-EF and MAHNOB-HCI. The results show that the learned representation captures mutual and complementary information between EEG signals and facial behaviors, measured by action units, head movements, and eye movements extracted from face videos, in a manner that generalizes across databases, and that it outperforms comparable fusion methods on this task.
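The abstract describes two technical components: unimodal representations aligned across modalities with a cosine-similarity term, and a gated unit that weighs the two modalities during fusion. The PyTorch sketch below illustrates these two ideas in their simplest form, in the spirit of gated multimodal units; the layer sizes, feature dimensions, variable names, and loss weighting are illustrative assumptions and do not reflect the authors' actual architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    # Minimal gated fusion of an EEG embedding and a facial-behavior embedding.
    # Dimensions are illustrative, not the paper's configuration.
    def __init__(self, eeg_dim, face_dim, hidden_dim):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, hidden_dim)          # unimodal EEG projection
        self.face_proj = nn.Linear(face_dim, hidden_dim)        # unimodal face projection
        self.gate = nn.Linear(eeg_dim + face_dim, hidden_dim)   # modality gate

    def forward(self, eeg, face):
        h_eeg = torch.tanh(self.eeg_proj(eeg))
        h_face = torch.tanh(self.face_proj(face))
        z = torch.sigmoid(self.gate(torch.cat([eeg, face], dim=-1)))
        fused = z * h_eeg + (1.0 - z) * h_face  # gated combination of the two modalities
        return h_eeg, h_face, fused

def alignment_loss(h_eeg, h_face):
    # Push the unimodal representations toward each other (cosine similarity -> 1),
    # standing in for the cross-modal alignment term mentioned in the abstract.
    return 1.0 - F.cosine_similarity(h_eeg, h_face, dim=-1).mean()

# Toy usage with random features (batch of 8); in practice the inputs would be
# EEG features and facial-behavior features (action units, head/eye movements).
eeg = torch.randn(8, 128)
face = torch.randn(8, 64)
model = GatedFusion(eeg_dim=128, face_dim=64, hidden_dim=256)
h_eeg, h_face, fused = model(eeg, face)
loss = alignment_loss(h_eeg, h_face)  # added to the emotion classification loss during training

In the gated-multimodal-unit formulation, the sigmoid gate z decides, per dimension, how much of each modality's representation enters the fused vector, which is what allows a model of this kind to exploit complementary information when one modality is noisy or uninformative.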


Supplemental Material

3382507.3418867.mp4 (mp4, 30.4 MB)


      • Published in

        ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
        October 2020
        920 pages
        ISBN: 9781450375818
        DOI: 10.1145/3382507

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 October 2020


        Qualifiers

        • short-paper

        Acceptance Rates

        Overall acceptance rate: 453 of 1,080 submissions (42%)
