DOI: 10.1145/3581783.3612836
Research Article | Open Access

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

Published: 27 October 2023

ABSTRACT

The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia. The challenge focuses on system robustness and consists of three distinct tracks: (1) MER-MULTI, in which participants recognize both discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to the test videos to evaluate modality robustness; and (3) MER-SEMI, which provides a large number of unlabeled samples for semi-supervised learning. In this paper, we introduce the motivation behind the challenge, describe the benchmark dataset, and provide statistics about the participants. To continue using this dataset after MER 2023, please sign a new End User License Agreement and send it to our official email address. We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
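As an illustration of the MER-NOISE setting, the sketch below shows one common way to corrupt the audio modality with additive noise at a fixed signal-to-noise ratio. This is an assumption for demonstration only: the abstract does not specify the challenge's official corruption pipeline, and the SNR level, the NumPy-based mixing routine, and the function name add_noise_at_snr are all illustrative choices rather than the organizers' procedure.

import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix a noise waveform into a clean waveform at a target SNR (in dB)."""
    # Tile or trim the noise so it matches the length of the clean signal.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(clean)]

    clean_power = np.mean(clean ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12

    # Scale the noise so that 10 * log10(clean_power / scaled_power) equals snr_db.
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + scaled_noise

# Usage: corrupt a synthetic 1-second, 16 kHz waveform at 5 dB SNR.
rng = np.random.default_rng(seed=0)
clean_wave = rng.standard_normal(16000).astype(np.float32)
noise_wave = rng.standard_normal(16000).astype(np.float32)
noisy_wave = add_noise_at_snr(clean_wave, noise_wave, snr_db=5.0)

A lower snr_db yields a more heavily corrupted signal, so sweeping this value is a straightforward way to probe how quickly a model's performance degrades as one modality becomes unreliable.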

