ABSTRACT
The creative nature of memes makes it possible for harmful content to spread quickly and widely on the internet. Harmful memes range from spreading hate speech and promoting violence to causing emotional distress to individuals or communities. Such memes are often designed to be misleading, manipulative, and controversial, making them challenging to detect and remove from online platforms. Previous studies focused on fusing the visual and language modalities to capture contextual information. However, meme analysis still suffers severely from data deficiency, which leaves fusion modules insufficiently trained. Furthermore, conventional pretrained encoders for text and images exhibit a large semantic gap between their feature spaces, leading to low performance. To address these gaps, this paper reformulates harmful meme analysis as an auto-filling (cloze) task and presents a prompt-based approach to identify harmful memes. Specifically, we first transform the multimodal data into a single (i.e., textual) modality by generating captions and attributes for the visual content, and then prepend this textual data to the input of a prompt-based pre-trained language model. Experimental results on two benchmark harmful-meme datasets demonstrate that our method outperforms state-of-the-art methods. We conclude with an analysis of the transferability and robustness of our approach for identifying creative harmful memes.
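The pipeline described in the abstract, rendering the image as a caption and attributes and then casting classification as filling a masked slot, can be illustrated with a minimal sketch. The template wording and the names `build_prompt` and `VERBALIZER` are assumptions for illustration, not the paper's actual implementation; in the full method, the `[MASK]` slot would be filled by a pre-trained masked language model, and the caption and attributes would come from an image captioner such as BLIP.

```python
# Hypothetical sketch of the prompt-based reformulation: convert the visual
# modality to text, prepend it to the meme text, and end with a cloze slot.

VERBALIZER = {"harmful": 1, "harmless": 0}  # label words -> class ids


def build_prompt(meme_text: str, caption: str, attributes: list) -> str:
    """Render the visual content as text, prepend it to the meme text, and
    end with a masked slot for the language model to auto-fill."""
    attrs = ", ".join(attributes)
    return (
        f"Image caption: {caption}. "
        f"Image attributes: {attrs}. "
        f"Meme text: {meme_text}. "
        f"This meme is [MASK]."
    )


def verbalize(predicted_token: str):
    """Map the token predicted at the [MASK] position back to a class id."""
    return VERBALIZER.get(predicted_token)


prompt = build_prompt(
    "when they said it couldn't get worse",
    "a crowd of people holding signs",
    ["crowd", "protest", "signs"],
)
assert "[MASK]" in prompt
assert verbalize("harmful") == 1
```

In this framing, the classifier head of a conventional fusion model is replaced by the language model's own vocabulary distribution at the masked position, so no fusion module needs to be trained from scratch.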
Index Terms
- Identifying Creative Harmful Memes via Prompt based Approach