
Identifying Creative Harmful Memes via Prompt-based Approach

Published: 30 April 2023 · DOI: 10.1145/3543507.3587427

ABSTRACT

The creative nature of memes has made it possible for harmful content to spread quickly and widely on the internet. Harmful memes range from spreading hate speech and promoting violence to causing emotional distress to individuals or communities. These memes are often designed to be misleading, manipulative, and controversial, making them challenging to detect and remove from online platforms. Previous studies have focused on fusing the visual and language modalities to capture contextual information. However, meme analysis still suffers severely from data deficiency, which leaves fusion modules insufficiently trained. Further, conventional pretrained encoders for text and images exhibit a wide semantic gap between their feature spaces, leading to low performance. To address these gaps, this paper reformulates harmful meme analysis as an auto-filling task and presents a prompt-based approach to identifying harmful memes. Specifically, we first transform the multimodal data into a single (i.e., textual) modality by generating captions and attributes for the visual data, and then prepend this textual data to the prompt of a pre-trained language model. Experimental results on two benchmark harmful meme datasets demonstrate that our method outperforms state-of-the-art methods. We conclude by demonstrating the transferability and robustness of our approach for identifying creative harmful memes.
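To make the reformulation concrete, here is a minimal sketch of the two-step pipeline the abstract describes: collapse the image into text via captioning, then cast classification as auto-filling with a masked language model. The model names, the prompt template, and the "harmful"/"harmless" verbalizer below are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of the prompt-based reformulation; the exact
# captioner, template, and verbalizer used by the paper may differ.
import torch
from PIL import Image
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    BlipForConditionalGeneration,
    BlipProcessor,
)

# Step 1: transform the visual modality into text with an image captioner
# (BLIP is used here for illustration; any captioner fits the same slot).
cap_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
cap_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_image(path: str) -> str:
    image = Image.open(path).convert("RGB")
    inputs = cap_processor(image, return_tensors="pt")
    out = cap_model.generate(**inputs, max_new_tokens=30)
    return cap_processor.decode(out[0], skip_special_tokens=True)

# Step 2: auto-filling with a masked language model. The caption is
# prepended to the meme's overlaid text, and the model fills the mask
# slot with a label word.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

LABEL_WORDS = ("harmful", "harmless")  # verbalizer: one label word per class

def classify_meme(image_path: str, meme_text: str) -> str:
    caption = caption_image(image_path)
    prompt = (
        f"Image: {caption}. Text: {meme_text}. "
        f"This meme is {tokenizer.mask_token}."
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    # Score each label word at the mask position (assumes each label word
    # is a single token in the vocabulary, as with BERT's wordpieces here).
    scores = {
        word: logits[0, mask_pos, tokenizer.convert_tokens_to_ids(word)].item()
        for word in LABEL_WORDS
    }
    return max(scores, key=scores.get)
```

In practice the label-word scores would be fine-tuned on the benchmark datasets rather than used zero-shot, and the "attributes of the visual data" mentioned in the abstract could be appended to the prompt in the same way as the caption.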


Published in

WWW '23: Proceedings of the ACM Web Conference 2023
April 2023, 4293 pages
ISBN: 9781450394161
DOI: 10.1145/3543507

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• short-paper
• Research
• Refereed limited

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
