DOI: 10.1145/3581783.3612836
Research Article | Open Access

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

Published: 27 October 2023

ABSTRACT

The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia. The challenge focuses on system robustness and consists of three distinct tracks: (1) MER-MULTI, in which participants recognize both discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to the test videos to evaluate modality robustness; and (3) MER-SEMI, which provides a large number of unlabeled samples for semi-supervised learning. In this paper, we introduce the motivation behind the challenge, describe the benchmark dataset, and provide statistics about the participants. To continue using this dataset after MER 2023, please sign a new End User License Agreement and send it to our official email address. We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
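As an illustration of the MER-NOISE setting, the sketch below shows one common way to corrupt the audio modality with additive noise at a fixed signal-to-noise ratio. This is an assumption for demonstration only: the abstract does not specify the challenge's official corruption pipeline, and the SNR level, the NumPy-based mixing routine, and the function name add_noise_at_snr are all illustrative choices rather than the organizers' procedure.

import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix a noise waveform into a clean waveform at a target SNR (in dB)."""
    # Tile or trim the noise so it matches the length of the clean signal.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(clean)]

    clean_power = np.mean(clean ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12

    # Scale the noise so that 10 * log10(clean_power / scaled_power) equals snr_db.
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + scaled_noise

# Usage: corrupt a synthetic 1-second, 16 kHz waveform at 5 dB SNR.
rng = np.random.default_rng(seed=0)
clean_wave = rng.standard_normal(16000).astype(np.float32)
noise_wave = rng.standard_normal(16000).astype(np.float32)
noisy_wave = add_noise_at_snr(clean_wave, noise_wave, snr_db=5.0)

A lower snr_db yields a more heavily corrupted signal, so sweeping this value is a straightforward way to probe how quickly a model's performance degrades as one modality becomes unreliable.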

