ABSTRACT
Exploiting social media to spread hate has increased tremendously over the years. Lately, multi-modal hateful content such as memes has gained more traction than uni-modal content. Moreover, the implicit nature of their content payloads makes them fairly challenging for existing hateful meme detection systems to detect. In this paper, we present a use-case study analyzing such systems' vulnerability to external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings, performed by humans with little knowledge of the model, can make existing detection models highly vulnerable. Empirically, we observe a noticeable performance drop of as much as 10% in the macro-F1 score for certain attacks. As a remedy, we attempt to boost the models' robustness using contrastive learning as well as an adversarial training-based method, VILLA. Using an ensemble of these two approaches, on two of our high-resolution datasets, we are able to (re)gain the performance to a large extent for certain attacks. We believe ours is a first step toward addressing this crucial problem in an adversarial setting and will inspire more such investigations in the future.
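To make the attack setting concrete, the "very simple perturbations" the abstract refers to can be illustrated with a minimal sketch: adding Gaussian pixel noise to the meme image (a uni-modal visual attack) and injecting character-level typos into the overlaid text (a uni-modal textual attack). This is our own illustrative code, not the authors' implementation; the function names and parameter choices are assumptions.

```python
# Illustrative sketch of simple, low-knowledge perturbation attacks on a
# meme's two modalities. NOT the paper's exact attack code; function names
# (perturb_image, perturb_text) and defaults are our own assumptions.
import random

import numpy as np


def perturb_image(img: np.ndarray, sigma: float = 10.0, seed: int = 0) -> np.ndarray:
    """Add Gaussian pixel noise and clip back to the valid uint8 range."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def perturb_text(text: str, swap_rate: float = 0.1, seed: int = 0) -> str:
    """Randomly swap adjacent alphabetic characters -- a crude typo attack."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < swap_rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```

Both perturbations preserve the meme's overall semantics for a human reader (the image is slightly noisy, the caption has typos) while shifting the inputs away from the distribution a detection model was fine-tuned on, which is what makes such attacks cheap to mount yet effective.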
- Tariq Habib Afridi, Aftab Alam, Muhammad Numan Khan, Jawad Khan, and Young-Koo Lee. 2020. A Multimodal Memes Classification: A Survey and Open Research Issues. https://doi.org/10.48550/ARXIV.2009.08395
- Piush Aggarwal, Michelle Espranita Liman, Darina Gold, and Torsten Zesch. 2021. VL-BERT+: Detecting Protected Groups in Hateful Multimodal Memes. In Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021). Association for Computational Linguistics, Online, 207–214. https://doi.org/10.18653/v1/2021.woah-1.22
- Natalie Alkiviadou. 2019. Hate speech on social media networks: towards a regulatory framework? Information & Communications Technology Law 28, 1 (2019), 19–35. https://doi.org/10.1080/13600834.2018.1494417
- Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Belhassen Bayar and Matthew C. Stamm. 2016. A Deep Learning Approach to Universal Image Manipulation Detection Using a New Convolutional Layer. In Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security (Vigo, Galicia, Spain) (IH&MMSec ’16). Association for Computing Machinery, New York, NY, USA, 5–10. https://doi.org/10.1145/2909827.2930786
- Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. 2013. Evasion Attacks against Machine Learning at Test Time. In Machine Learning and Knowledge Discovery in Databases, Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Železný (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 387–402.
- N. Carlini and D. Wagner. 2017. Towards Evaluating the Robustness of Neural Networks. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, Los Alamitos, CA, USA, 39–57. https://doi.org/10.1109/SP.2017.49
- Olivier Chapelle, Jason Weston, Léon Bottou, and Vladimir Vapnik. 2000. Vicinal Risk Minimization. In Advances in Neural Information Processing Systems, T. Leen, T. Dietterich, and V. Tresp (Eds.). Vol. 13. MIT Press. https://proceedings.neurips.cc/paper/2000/file/ba9a56ce0a9bfa26e8ed9e10b2cc8f46-Paper.pdf
- Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. arXiv preprint arXiv:2002.05709 (2020).
- Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey Hinton. 2020. Big Self-Supervised Models are Strong Semi-Supervised Learners. arXiv preprint arXiv:2006.10029 (2020).
- Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020. UNITER: UNiversal Image-TExt Representation Learning. In ECCV.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
- Samuel Dooley, Tom Goldstein, and John P. Dickerson. 2021. Robustness Disparities in Commercial Face Detection. ArXiv abs/2108.12508 (2021).
- Ivan Evtimov, Russel Howes, Brian Dolhansky, Hamed Firooz, and Cristian Canton Ferrer. 2020. Adversarial Evaluation of Multimodal Models under Realistic Gray Box Assumption. https://doi.org/10.48550/ARXIV.2011.12902
- Edgar González Fernández, Ana Sandoval Orozco, Luis García Villalba, and Julio Hernandez-Castro. 2018. Digital Image Tamper Detection Technique Based on Spectrum Analysis of CFA Artifacts. Sensors 18, 9 (Aug. 2018), 2804. https://doi.org/10.3390/s18092804
- Elisabetta Fersini, Francesca Gasparini, Giulia Rizzi, Aurora Saibene, Berta Chulvi, Paolo Rosso, Alyssa Lees, and Jeffrey Sorensen. 2022. SemEval-2022 Task 5: Multimedia automatic misogyny identification. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). Association for Computational Linguistics.
- Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, and Jingjing Liu. 2020. Large-Scale Adversarial Training for Vision-and-Language Representation Learning. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.). Vol. 33. Curran Associates, Inc., 6616–6628. https://proceedings.neurips.cc/paper/2020/file/49562478de4c54fafd4ec46fdb297de5-Paper.pdf
- Darina Gold, Piush Aggarwal, and Torsten Zesch. 2021. GerMemeHate: A Parallel Dataset of German Hateful Memes Translated from English. https://www.inf.uni-hamburg.de/en/inst/ab/lt/publications/2021-alacamyimmam-konvens-mmhs21.pdf#page=9
- Thomas Gottron. 2008. Content Code Blurring: A New Approach to Content Extraction. In 2008 19th International Workshop on Database and Expert Systems Applications. 29–33. https://doi.org/10.1109/DEXA.2008.43
- Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, and N. Asokan. 2018. All You Need is "Love": Evading Hate-speech Detection. https://doi.org/10.48550/ARXIV.1808.09115
- Amos Guiora and Elizabeth A. Park. 2017. Hate Speech on Social Media. Philosophia 45, 3 (July 2017), 957–971. https://doi.org/10.1007/s11406-017-9858-4
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. https://doi.org/10.48550/ARXIV.1512.03385
- Ming Shan Hee, Roy Ka-Wei Lee, and Wen-Haw Chong. 2022. On Explaining Multimodal Hateful Meme Detection Models. https://doi.org/10.48550/ARXIV.2204.01734
- Siddharth Jaiswal, Karthikeya Duggirala, Abhisek Dash, and Animesh Mukherjee. 2022. Two-Face: Adversarial Audit of Commercial Face Recognition Systems. Proceedings of the International AAAI Conference on Web and Social Media 16, 1 (May 2022), 381–392. https://doi.org/10.1609/icwsm.v16i1.19300
- Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Tuo Zhao. 2020. SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 2177–2190. https://doi.org/10.18653/v1/2020.acl-main.197
- Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, and Davide Testuggine. 2020. The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes. https://doi.org/10.48550/ARXIV.2005.04790
- Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. https://doi.org/10.48550/ARXIV.1412.6980
- Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, and Thomas Dandres. 2019. Quantifying the Carbon Emissions of Machine Learning. arXiv preprint arXiv:1910.09700 (2019).
- Vijaysinh Lendave. 2021. A guide to different types of noises and image denoising methods. https://analyticsindiamag.com/a-guide-to-different-types-of-noises-and-image-denoising-methods/
- Linjie Li, Zhe Gan, and Jingjing Liu. 2021. A Closer Look at the Robustness of Vision-and-Language Pre-trained Models. arxiv:2012.08673 [cs.CV]
- Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang. 2019. VisualBERT: A Simple and Performant Baseline for Vision and Language. https://doi.org/10.48550/ARXIV.1908.03557
- Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiaowei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, 2020. Oscar: Object-semantics aligned pre-training for vision-language tasks. In European Conference on Computer Vision. Springer, 121–137.
- Phillip Lippe, Nithin Holla, Shantanu Chandra, Santhosh Rajamanickam, Georgios Antoniou, Ekaterina Shutova, and Helen Yannakoudakis. 2020. A Multimodal Framework for the Detection of Hateful Memes. https://doi.org/10.48550/ARXIV.2012.12871
- Xiaofeng Liu, Yang Zou, Lingsheng Kong, Zhihui Diao, Junliang Yan, Jun Wang, Site Li, Ping Jia, and Jane You. 2018. Data Augmentation via Latent Space Interpolation for Image Classification. In 2018 24th International Conference on Pattern Recognition (ICPR). 728–733. https://doi.org/10.1109/ICPR.2018.8545506
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://doi.org/10.48550/ARXIV.1907.11692
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. https://doi.org/10.48550/ARXIV.1706.06083
- Francesco Marra, Diego Gragnaniello, Davide Cozzolino, and Luisa Verdoliva. 2018. Detection of GAN-Generated Fake Images over Social Networks. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). 384–389. https://doi.org/10.1109/MIPR.2018.00084
- Martina Miliani, Giulia Giorgi, Ilir Rama, Guido Anselmi, and Gianluca E. Lebani. 2020. DANKMEMES @ EVALITA 2020: The Memeing of Life: Memes, Multimodality and Politics. In EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020. Accademia University Press, 275–283. https://doi.org/10.4000/books.aaccademia.7330
- Niklas Muennighoff. 2020. Vilio: State-of-the-art Visio-Linguistic Models applied to Hateful Memes. https://doi.org/10.48550/ARXIV.2012.07788
- Lin Pan, Chung-Wei Hang, Avirup Sil, and Saloni Potdar. 2021. Improved Text Classification via Contrastive Adversarial Training. https://doi.org/10.48550/ARXIV.2107.10137
- Tianyu Pang, Xiao Yang, Yinpeng Dong, Kun Xu, Jun Zhu, and Hang Su. 2020. Boosting Adversarial Training with Hypersphere Embedding. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.). Vol. 33. Curran Associates, Inc., 7779–7792. https://proceedings.neurips.cc/paper/2020/file/5898d8095428ee310bf7fa3da1864ff7-Paper.pdf
- Zoe Papakipos and Joanna Bitton. 2022. AugLy: Data Augmentations for Robustness. arxiv:2201.06494 [cs.AI]
- Gabriel Peyré and Marco Cuturi. 2018. Computational Optimal Transport. https://doi.org/10.48550/ARXIV.1803.00567
- Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, and Tanmoy Chakraborty. 2021. Detecting Harmful Memes and Their Targets. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, Online, 2783–2796. https://doi.org/10.18653/v1/2021.findings-acl.246
- Yao Qiu, Jinchao Zhang, and Jie Zhou. 2021. Improving Gradient-based Adversarial Training for Text Classification by Contrastive Learning and Auto-Encoder. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, Online, 1698–1707. https://doi.org/10.18653/v1/2021.findings-acl.148
- Sylvestre-Alvise Rebuffi, Sven Gowal, Dan A. Calian, Florian Stimberg, Olivia Wiles, and Timothy Mann. 2021. Data Augmentation Can Improve Robustness. https://doi.org/10.48550/ARXIV.2111.05328
- Sylvestre-Alvise Rebuffi, Sven Gowal, Dan A. Calian, Florian Stimberg, Olivia Wiles, and Timothy Mann. 2021. Fixing Data Augmentation to Improve Adversarial Robustness. https://doi.org/10.48550/ARXIV.2103.01946
- Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett (Eds.). Vol. 28. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf
- Benet Oriol Sabat, Cristian Canton Ferrer, and Xavier Giro i Nieto. 2019. Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation. arxiv:1910.02334 [cs.MM]
- Chhavi Sharma, Deepesh Bhageria, William Scott, Srinivas PYKL, Amitava Das, Tanmoy Chakraborty, Viswanath Pulabaigari, and Björn Gambäck. 2020. SemEval-2020 Task 8: Memotion Analysis - the Visuo-Lingual Metaphor!. In Proceedings of the Fourteenth Workshop on Semantic Evaluation. International Committee for Computational Linguistics, Barcelona (online), 759–773. https://doi.org/10.18653/v1/2020.semeval-1.99
- Patrice Y. Simard, Yann A. LeCun, John S. Denker, and Bernard Victorri. 2012. Transformation Invariance in Pattern Recognition – Tangent Distance and Tangent Propagation. Springer Berlin Heidelberg, Berlin, Heidelberg, 235–269. https://doi.org/10.1007/978-3-642-35289-8_17
- Shardul Suryawanshi, Bharathi Raja Chakravarthi, Mihael Arcan, and Paul Buitelaar. 2020. Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text. In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying. European Language Resources Association (ELRA), Marseille, France, 32–41. https://aclanthology.org/2020.trac-1.6
- Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. https://doi.org/10.48550/ARXIV.1312.6199
- Hao Tan and Mohit Bansal. 2019. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. https://doi.org/10.48550/ARXIV.1908.07490
- Önsen Toygar, Felix O Babalola, and Yiltan Bitirim. 2020. FYO: a novel multimodal vein database with palmar, dorsal and wrist biometrics. IEEE Access 8 (2020), 82461–82470.
- Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. 2017. Ensemble Adversarial Training: Attacks and Defenses. https://doi.org/10.48550/ARXIV.1705.07204
- Riza Velioglu and Jewgeni Rose. 2020. Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://doi.org/10.48550/ARXIV.2012.12975
- Nishant Vishwamitra, Hongxin Hu, Ziming Zhao, Long Cheng, and Feng Luo. 2021. Understanding and Measuring Robustness of Multimodal Learning. https://doi.org/10.48550/ARXIV.2112.12792
- Guoqing Wang, Chuanxin Lan, Hu Han, Shiguang Shan, and Xilin Chen. 2019. Multi-modal face presentation attack detection via spatial and channel attentions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops.
- Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. 2019. Detectron2. https://github.com/facebookresearch/detectron2
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. https://doi.org/10.48550/ARXIV.1609.08144
- Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan L. Yuille, and Kaiming He. 2019. Feature Denoising for Improving Adversarial Robustness. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Karren Yang, Wan-Yi Lin, Manash Barman, Filipe Condessa, and Zico Kolter. 2021. Defending Multimodal Fusion Models Against Single-Source Adversaries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 3340–3349.
- Savvas Zannettou, Barry Bradlyn, Emiliano De Cristofaro, Haewoon Kwak, Michael Sirivianos, Gianluca Stringhini, and Jeremy Blackburn. 2018. What is Gab: A Bastion of Free Speech or an Alt-Right Echo Chamber. In Companion Proceedings of the The Web Conference 2018 (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1007–1014. https://doi.org/10.1145/3184558.3191531
- Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. 2019. Theoretically Principled Trade-off between Robustness and Accuracy. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 7472–7482. https://proceedings.mlr.press/v97/zhang19p.html
- Lin Zhao, Changsheng Chen, and Jiwu Huang. 2021. Deep Learning-Based Forgery Attack on Document Images. IEEE Transactions on Image Processing 30 (2021), 7964–7979. https://doi.org/10.1109/TIP.2021.3112048
- Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, and Jingjing Liu. 2020. FreeLB: Enhanced Adversarial Training for Natural Language Understanding. In International Conference on Learning Representations. https://openreview.net/forum?id=BygzbyHFvB
HateProof: Are Hateful Meme Detection Systems really Robust?