
Hate Speech Detection in Limited Data Contexts Using Synthetic Data Generation

Published: 13 January 2024


Abstract

A growing body of work has focused on text classification methods for detecting the increasing amount of hate speech posted online. This progress has been limited to a select number of highly resourced languages, causing detection systems to either under-perform or not exist in limited data contexts. This is mostly caused by a lack of training data, which are expensive to collect and curate in these settings. In this work, we propose a data augmentation approach that addresses the lack of data for online hate speech detection in limited data contexts using synthetic data generation techniques. Given a handful of hate speech examples in a high-resource language such as English, we present three methods to synthesize new examples of hate speech data in a target language that retain the hate sentiment of the original examples but transfer the hate targets. We apply our approach to generate training data for hate speech classification tasks in Hindi and Vietnamese. Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain. This method can be adopted to bootstrap hate speech detection models from scratch in limited data contexts. As the growth of social media within these contexts continues to outstrip response efforts, this work furthers our capacities for detection, understanding, and response to hate speech. Disclaimer: This work contains terms that are offensive and hateful. These, however, cannot be avoided due to the nature of the work.


1 INTRODUCTION

The increase in hateful content online has motivated research in automatic approaches for detecting hate speech [24, 57]. Applied approaches from prior work have included heuristic (e.g., dictionaries, distance metrics, rule-based systems) and machine learning–based (e.g., topic modeling [3], word embeddings [5], and deep learning [70]) methods. However, the task of detecting hate speech in limited data contexts is difficult [18, 39]. There is a lack of datasets for training hate speech detection models in many languages, and this presents one of the main long-standing challenges for hate speech detection [65]. This problem is exacerbated for less-popular under-resourced languages [24, 29].

Since only a small proportion of the huge amount of content generated daily is hate speech, most curated datasets have a very high class imbalance, with very few positive hate class samples. Collecting and labeling hate speech data from scratch has been shown to be expensive and is not guaranteed to result in sufficient data for training a model [44, 64]. This work explores the effectiveness of synthetic data generation techniques for limited data contexts with little to no ground-truth hate speech data. Within the scope of this article, we describe high-resource languages as languages with ample availability of digital data broadly and hate speech data specifically. Limited data contexts refer to language domains with little to no labeled hate speech examples, whether or not they have unlabeled data resources. While these languages may be reasonably represented in language modeling data, they often do not have existing hate speech repositories to support the work of hate speech detection [30]. These contexts represent the target context in this article.

Data augmentation explores strategies for increasing the diversity of training samples without explicitly collecting new data [23]. Data augmentation techniques have increasingly been used for addressing imbalances or biases in training data by creating new data points through oversampling, heuristics, or geometric transformations [59]. This idea has been successfully applied in other domains, such as audio classification [63] and video classification [69]. With considerations for the sensitive and subjective nature of hate speech, we draw on techniques from data augmentation to generate synthetic examples via context transfer from a freely available high-resource hate speech data repository to a language with limited hate speech data.

In this work, we address the issue of limited hate speech data by exploiting available resources from other, higher-resourced languages. We propose few-shot methods for hate speech data augmentation in limited data contexts and compare the performance of three synthetic hate speech generation methods. The first approach involves automatic machine translation (MT) of hateful posts from a high-resource language into the limited data language. In the second approach, we identify suitable contextual replacement tokens in the hate speech examples from the high-resource language. Our method, contextual entity substitution (CES), takes as input a handful of examples in a high-resource language such as English and heuristically replaces the person or group under attack in the high-resource context with potential hate-targeted persons/groups in the target context. This semi-heuristic method retains the sentiment of hate toward the target group without altering the meaning of the text, as generative approaches are prone to do. Third, we use an open source language model, BLOOM [56], to synthetically generate hate speech examples in the target context. We design the prompts such that the model generates hateful posts in the target context when given a few hate speech examples.

We conducted multiple experiments to investigate the performance of the proposed data augmentation approaches in two languages: Hindi and Vietnamese. Though these languages are not considered low-resourced (they are fairly represented in language modeling research due to their presence in unlabeled data sources such as Wikipedia), they have very little hate speech data available, making it nonetheless difficult to train hate speech detection models [30]. A systematic review of 463 hate speech research works found only 4% and <1% representation for the Hindi and Vietnamese languages, respectively [29]. Our findings show that synthetic data generated via the CES method can further improve model performance on the target language. Our analyses indicate that the magnitude of the performance gain from CES depends on the careful curation of an entity replacement table and is sensitive to the quality of the replacement matching setup and to domain drift.

In summary, the main contributions of this article include the following:

(1)

development of a method for employing synthetic data generation techniques to counter harmful content like hate speech on social media platforms especially in limited data contexts,

(2)

empirical investigation of gains vs. noise tradeoff in combining synthetic machine-translated hate speech data with few original hate speech posts from limited data contexts, and

(3)

development of a new use-case for multilingual large language models showing how generative language models can be used to develop models that counter hate speech.

In the following sections, we present related work, explain our synthetic data generation methodologies in detail, present the experiments that we performed along with their results, and then discuss the implications of our results. The code, data, and entity table used for our work are available in our GitHub repository.1


2 RELATED WORK

2.1 Hate Speech Detection in Limited Data Contexts

Detecting hate speech content in limited data contexts remains a critical yet challenging task for machine learning systems. Publicly available ground-truth datasets for hate speech, while abundant in some languages such as English and Chinese, are limited to nonexistent in other contexts such as Burmese and Tagalog. Data unavailability hampers the development of effective hate speech detection models in these contexts [4, 8]. Previous works have explored curating hate speech datasets in low-resource languages by leveraging the knowledge of context experts [44]. However, the data work required for curating hate speech datasets is often an expensive, time-consuming step that is not guaranteed to return sufficient data for model training [44, 55].

Earlier works have used SVMs, CNNs, and RNNs for hate speech and offensive language detection in limited data contexts [14, 19, 52]. With the growth of large language models, researchers have leveraged pre-trained multilingual language models such as BERT [20] and XLM-R [15] to perform hate speech classification for limited data contexts via few-shot learning [2, 4, 62]. Aluru et al. [4] evaluated the effectiveness of the multilingual BERT (mBERT) [20] and Language-Agnostic SEntence Representations (LASER) [51] models in detecting hate speech content in both high-resource languages (such as English and Spanish) and low-resource languages (such as Indonesian and Polish) and found that the LASER embedding model with logistic regression performed best in the low-resource scenario, whereas BERT-based models performed better in the high-resource scenario. They also show that data from other languages tend to improve performance in low-resource settings. Lauscher et al. [34] also show that multilingual transformer models like mBERT tend to perform poorly in zero-shot transfer to distant target languages, and augmentation with few annotated samples from the distant language can help improve performance.

Other researchers have explored using transfer learning to adapt existing labeled hate speech data in English and other languages to unlabeled data in new target domains. This often involves leveraging cross-lingual contextual embeddings to make predictions in the low-resource language [8, 50]. In their work, Ranasinghe and Zampieri [50] analyzed how XLM-R, a cross-lingual contextual embedding architecture [15], performs on the task of detecting offensive language in languages such as Bengali and Hindi. They implemented a transfer learning strategy by sequentially training an XLM-R model on English-language offensive speech data and then on the offensive speech data of the lower-resourced language. They found that using the model fine-tuned on Hindi training data achieves an F1 score of 0.806, and fine-tuning on both Hindi and English training data yields an improved F1 score of 0.857.

Our work builds on these existing works by combining transfer learning techniques with contextual entity substitution and language generation methods. We employ a few-shot setup to train an mBERT model on some hate speech examples and then on the augmented data to measure improvement in model performance with synthetic data.

2.2 Context Transfer across Languages

A more targeted approach to improve the performance of models on tasks in limited data contexts involves employing data from higher-resourced languages related to the limited data context. Exploiting similarity in vocabulary and syntax makes insights gained from the high-resource language data reasonably transferable to the limited data context [31, 66]. Khemchandani et al. [31] proposed RelateLM, a mechanism to effectively incorporate new low-resource languages into existing pre-trained language models by aligning low-resource lexicon embeddings with their counterparts in a related high-resource language [31]. They tested the effectiveness of this mechanism on Oriya and Assamese, two Indic languages whose data are unavailable in the mBERT model. In contrast to monolingual BERT, they found benefits in starting from a BERT model fine-tuned on Hindi (a higher-resourced Indic language) and then using RelateLM to incorporate Oriya and Assamese.

Within the context of hate speech, prior works have explored how models trained in one context can be transferred to a different language context [26, 67]. Gröndahl et al. [26] show that hate speech models tend to perform poorly on data that differ from their initial training data. Swamy et al. [60] demonstrated that hate speech models trained on the BERT model tend to perform competitively for different datasets, though generalization depends highly on the training data used. In analyzing the generalizability of hate speech models, Yoder et al. [67] found that targeted demographic categories such as gender/sexuality and race/ethnicity play a significant role and vary from one context to another. Our work takes a data-centric, rather than a model-centric, approach. To address the generalization shortcomings of pretrained models, we focus on improving the synthetic data by transferring the hate sentiment to the limited data context and substituting the contextually relevant target of hate speech to create a new dataset that fits the new domain.

2.3 Data Augmentation and Synthetic Data Generation in NLP

Recent advancements in image generation [25, 49, 54, 68], text generation [10, 48, 61], and speech synthesis [45, 58] have led to the development of an area of research in which model outputs can be used to retrain newer models. This reduces annotation costs, maintains data privacy, and can also help with data imbalance and scarcity issues. In audio processing, text-to-speech models are being used to provide training data that reduce the word error rate of speech recognition models [28] and to capture words that were not present in the original training data [22]. Image generation models are being used to improve dermatology classifiers [53], detect floods [12], and recognize actions [32]. Techniques such as cropping and noise injection are commonly applied in image and sound processing [47, 59]. However, these techniques do not work well for text data, as they can potentially change the original meaning of the input sentence. To this end, there is a growing body of work on data augmentation for natural language processing, exploring tasks such as machine translation [66], automatic speech recognition [43], and named-entity recognition [21].

Text generation models are helping mitigate the class imbalance problem by synthesizing new examples for under-represented classes with few-shot approaches [35]. A majority of these works frame the data augmentation requirement as a text generation task [23]. For example, Xia et al. [66] proposed a generalized framework for data augmentation in low-resourced machine translation: given a parallel corpus between a related high-resourced language and English, they generate a parallel corpus between the low-resourced language and English through unsupervised machine translation. This technique increased model performance by 1.5 to 8.0 BLEU points compared to the supervised back-translation baseline. The importance of diversity and naturalism has also been studied to help build better synthetic datasets [6].

Data augmentation techniques have successfully been applied to construct hate speech classifiers [11, 27]. For instance, Hartvigsen et al. [27] augmented existing toxic content datasets by leveraging GPT-3, a text generation model, to generate large-scale data on toxic and benign statements targeted at minority identity groups. The authors found that not only was the machine-generated dataset of high quality, but toxicity detection models trained on it significantly outperformed those trained on existing human-curated toxicity datasets. Similar to these works, we aim to generate synthetic data to improve the hate speech detection accuracy of a machine learning classifier. However, we situate our work specifically in improving hate speech detection accuracy for limited data contexts. We also present an alternative to large language models: a competitive synthetic data generation methodology that heuristically contextualizes hate speech drawn from a high-resource language for the low-resource language.


3 METHODOLOGY

In this section, we describe our methodology to augment hate speech posts in limited data contexts using synthetic data generation techniques and to evaluate their performance in model training. The initial step involves curating a hate speech dataset in a high-resource language, which is a relatively easy task and is described next.

3.1 Dataset Curation

To start the synthetic data generation process, the first step is to identify a high-resource language and curate hateful posts in the selected language. The sources for the hate speech dataset are diversified to mitigate bias or over-representation of a single target group or individual. This is a relatively easy task due to the abundance of such datasets in the high-resource context. For our experiments, we use English as our high-resource language and use data curated by Mathew et al. [42], which covers 18 different groups targeted with hate speech in the American context. The authors built a corpus of hate speech posts using lexicons provided by Davidson et al. [17], Ousidhoum et al. [46], and Mathew et al. [40]. To reduce ambiguity in the nature of the posts, we selected only posts labeled as hateful and discarded posts labeled as offensive. From this dataset, we use a subset of 3,000 hateful posts in English. We pre-processed this data to remove the tags, hashtags, links, and emoticons from the text. We consider only posts with a word length greater than two after this pre-processing step.
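To make this step concrete, the following is a minimal sketch of the pre-processing described above. The specific regular expressions and emoji ranges are our own assumptions rather than the released pipeline's exact rules.

```python
import re
from typing import Optional

def preprocess(post: str) -> Optional[str]:
    """Strip tags, hashtags, links, and emoticons; drop very short posts."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", post)                 # links
    text = re.sub(r"[@#]\w+", " ", text)                               # tags and hashtags
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", " ", text)  # common emoji ranges
    text = re.sub(r"\s+", " ", text).strip()
    # Keep only posts with a word length greater than two, as described above
    return text if len(text.split()) > 2 else None

raw_english_posts = ["RT @user check this out https://t.co/abc #tag some example text"]
posts = [p for p in (preprocess(p) for p in raw_english_posts) if p]
```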

3.2 Machine Translation

After curating the hate speech posts, we use these data to augment the hate speech data in the target language. Das et al. [16] show that automatic machine translation can boost classification performance when detecting hate speech in limited data contexts. For our first synthetic data generation approach, we apply a similar methodology, using Microsoft Azure's machine translation API to convert the curated hate speech posts into Hindi and Vietnamese.
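A minimal sketch of this translation step is shown below, using the Azure Translator v3 REST endpoint; the key and region values are placeholders, and batching and error handling are simplified.

```python
import requests

AZURE_KEY = "<translator-key>"        # placeholder credentials
AZURE_REGION = "<resource-region>"
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def translate(posts, target_lang="hi"):
    """Translate a batch of English posts into the target language (e.g., 'hi' or 'vi')."""
    resp = requests.post(
        ENDPOINT,
        params={"api-version": "3.0", "from": "en", "to": target_lang},
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_KEY,
            "Ocp-Apim-Subscription-Region": AZURE_REGION,
            "Content-Type": "application/json",
        },
        json=[{"text": p} for p in posts],
    )
    resp.raise_for_status()
    return [item["translations"][0]["text"] for item in resp.json()]
```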

3.3 Contextual Entity Substitution

Our second synthetic data generation approach builds on automatic machine translation. In this approach, we leverage the contextual nature of hate speech to account for differences in target groups and individuals based on different geography while transferring the hate sentiment across contexts. The main idea behind this approach is to identify the target entities subjected to hate speech or hate terms that are used in the high-resource context and substitute them with entities and hate terms from the target context. Figure 1 illustrates the framework we develop to generate synthetic hateful posts while accounting for this context shift.

Fig. 1.

Fig. 1. Framework to synthetically generate hate speech posts in the limited data context. This framework takes the English hateful dataset as input and contextually translates it to the target language of interest with the help of human-curated entity tables.

The next step involves building an entity table in the high-resource context. This entity table is an instantiation of the practice of creating lexicon lists as done in other works in the literature. For example, the PeaceTech Lab has curated hate lexicons for languages spoken in conflict-prone countries such as Lebanon, Cameroon, and Sudan. The PeaceTech Lab lexicons are a series of hate speech terms explaining inflammatory social media keywords and offering counter-speech suggestions to combat the spread of hate speech [1]. However, these lexicons are only available for a handful of languages and contexts.

To create the entity table, we categorized lexicons into target groups, target individuals, hate terms, target countries, and political groups. We also differentiated entities (such as countries) that are present in the hateful posts but might not necessarily be the targets and created another category for them. We rely on multiple sources, including lexicons collected by Mathew et al. [41] that were derived from sources including Hatebase2 and the Urban Dictionary.3 We annotated 200 posts in the dataset to identify the most common target groups, individuals, countries, and hate terms and added them to the corresponding column in the entity table.

Subsequently, we automatically identify candidate entities for substitution in the hate speech dataset in the high-resource language. We adopt a heuristic approach that leverages the entity table and named entity recognition (NER) models. We iterate over the hate speech posts, find the words with a Levenshtein similarity score greater than a threshold value (0.75 in our case) to words in the entity table, and then replace these words with a corresponding MASK-x. The MASK corresponds to the entity we replace, and the suffix x represents the category of the entity. <MASK-G>, <MASK-I>, <MASK-CT>, <MASK-HT>, and <MASK-P> correspond to target groups, target individuals, target countries, hate terms, and political groups, respectively. For robust coverage in cases where certain names were not captured in our entity table, we used spaCy's NER model4 to identify all entities tagged PERSON and replace them with <MASK-I>.
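The sketch below illustrates this masking step under simplifying assumptions: single-token matching, difflib's ratio standing in for Levenshtein similarity, and a toy entity table with placeholder entries rather than the curated one.

```python
import difflib
import spacy

nlp = spacy.load("en_core_web_sm")  # any English spaCy pipeline with NER

ENTITY_TABLE = {           # toy stand-in for the curated English entity table
    "G": ["<target-group>"],
    "I": ["<target-individual>"],
    "HT": ["<hate-term>"],
}
THRESHOLD = 0.75           # similarity threshold used in the paper

def similarity(a, b):
    # difflib's ratio stands in for a Levenshtein similarity score here
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def mask_post(post):
    masked = []
    for tok in post.split():
        category = next(
            (cat for cat, words in ENTITY_TABLE.items()
             if any(similarity(tok, w) >= THRESHOLD for w in words)),
            None,
        )
        masked.append(f"<MASK-{category}>" if category else tok)
    text = " ".join(masked)
    # Fallback: mask any PERSON entity that the table missed
    for ent in reversed(nlp(text).ents):
        if ent.label_ == "PERSON":
            text = text[:ent.start_char] + "<MASK-I>" + text[ent.end_char:]
    return text
```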

We created a similar entity table for the target context. To create this entity table, we ask two native speakers of Hindi and Vietnamese to review a sample of hate speech posts in their respective languages and to identify the hate target entities. Using these data and their experience with the context, they created the corresponding entity table for both languages. We subsequently included a lexicon of hate terms in the "hate-term" column of the entity table. This is the distinguishing part of the pipeline for different target contexts: we can create contextually relevant hateful posts in the target context of our interest just by modifying the contents of the entity table. Table 1 shows the statistics of the entity table in English, Hindi, and Vietnamese.

Table 1.

                     English   Hindi   Vietnamese
Hate-terms           56        19      23
Target groups        140       21      26
Target individuals   24        28      13

Table 1. Count of Lexicons in the Top Three Categories in the Entity Tables for English, Hindi, and Vietnamese

After creating the entity table, we use the machine translation API to translate the masked English hateful posts into the target context. Our experiments showed that machine translation preserves the masks while translating the other words in the post, though we also observed a slight loss in semantics during masked translation compared to standard translation. Our study results show that the subsequent entity substitution was able to bridge this loss in semantic information; the results are discussed further in Section 4.

Creating the synthetic hate speech posts involves combining the entity table and the masked translated posts in the target context. The different MASK-x annotations specify which entity categories to replace, maintaining semantic relevance. We randomly choose an entity from the corresponding entity category and replace the MASK-x with that entity. We could theoretically increase the replacement seed to produce an exponential number of synthetic hateful posts from a single masked translated post. However, through our initial experiments, we found that setting the seed value to 1 gives the best results: a reasonably diversified dataset that helps avoid overfitting.
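A minimal sketch of this substitution step is below, assuming a target-context entity table with the same category keys; the entries shown are placeholders, not the released table.

```python
import random
import re

TARGET_TABLE = {  # illustrative placeholders for the Hindi or Vietnamese entity table
    "G": ["<target-group>"],
    "I": ["<target-individual>"],
    "CT": ["<target-country>"],
    "HT": ["<hate-term>"],
    "P": ["<political-group>"],
}

def substitute(masked_post, seed=1):
    """Fill each <MASK-x> with a random entity of category x.
    `seed` is the number of variants per masked post; the paper uses 1."""
    fill = lambda m: random.choice(TARGET_TABLE[m.group(1)])
    return [re.sub(r"<MASK-([A-Z]+)>", fill, masked_post) for _ in range(seed)]
```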

The entity replacement step completes our pipeline for synthetically generating hateful posts in the limited data context. Table 2 shows how a machine-translated hateful post differs from a machine-translated hateful post with contextual entity substitution. In the first example in Table 2, we find that the word "kike" has been transliterated from Latin to Devanagari. In contrast, the CES method refers to "Bhagat Singh" (an Indian freedom fighter) as "penisless," which is contextually relevant in the Indian domain. In the second example, we again find the word "dyke" being transliterated from the Latin script to the Devanagari script. The CES methodology, however, substitutes this entity with "Heejra," a term for transgender people in India.

Table 2.

S. No.   Machine Translated                                      Contextual Entity Substitution
1.       this ugly kike cunt keeps showing up on my timeline     this penisless Bhagat Singh keeps showing up on my timeline
2.       angry bald dyke                                         angry bald Heejra

Table 2. Qualitative Difference between the Synthetic Hate Speech Data Using Machine Translation and the Synthetic Hate Speech Data Using Contextual Entity Adaptation

Finally, we use these synthetically generated posts to augment the hateful posts in the target context and then train the machine learning model for hate speech detection. The results are explained in detail in Section 4.4.

3.4 BLOOM Language Model

Drawing from advancements in natural language modeling, we investigate how large language models can help improve the classification accuracy of hate speech detection models. We develop a third methodology for generating synthetic hate speech posts in the target context using the BLOOM language model with 3 billion parameters [56].

This method is different from the previous two methods, as it depends only on a sample of hateful posts in the target context, which are used as few-shot examples to help the language model generate similar posts. To generate hate speech posts, we pass a few posts from the target context and prompt the language model to generate a sixth post, as shown in Table 3. The entire prompt is a string of Devanagari text given as input to BLOOM-LM, which is then asked to predict the following post, also in Devanagari. Based on our empirical analyses, we chose five input examples, set a repetition penalty of 2 to prevent post repetition, applied early stopping with sampling, and specified a maximum length of 100 tokens for the generated post.

Table 3.
Prompt
Post: The fiscal deficit in previous governments was at an alarming level of 3.3 per cent. The situation was that no one was ready to give loan to UP because no one gives loan at a loss of more than 3 percent. Yogi ji reduced this deficit and brought it down to the level of 2.97 per cent.
Post: Slogans of Pakistan indabad raised in Mumbai..!! Slogans of Pakistan Zindabad kept being raised in front of party’s quota minister in Thackeray government, Abu Azmi! Respected Sir, it is a request that these Pakistan lovers, traitors, traitors should be badly thrown out of the country..
Post: If this is the condition today, then tomorrow it will definitely be seen in UP and Delhi! Rather, people from every corner of the country are settled in Delhi, from where will they show their papers!
Post: People call Yogi government as casteist, it is very shameful that they have always run governments for the health of one caste.
Post: In the case of rape and subsequent brutal murder of Dr. Priyanka Reddy in Hyderabad, India’s so-called secularists are refraining from raising their voice today because the accused Muslim and the locality, Asaduddin Owaisi, is it not enough to protest?
Post:
Generated Post
Do you know that India was going to become a world leader, but by ruining it by people like Modiji, we had become the poorest nation in the world.

Table 3. Prompt Engineering to Generate the Hateful Posts Using the BLOOM Language Model
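A sketch of this generation step with the Hugging Face transformers API is below. The prompt is shown in English for readability, whereas the actual prompts are Devanagari (or Vietnamese) text; decoding parameters approximate those stated above (early stopping applies to beam search, so it is omitted here), and the trailing post-extraction is our assumption.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")

def generate_post(examples):
    """Prompt BLOOM with five target-context posts and sample a sixth."""
    prompt = "".join(f"Post: {p}\n" for p in examples) + "Post:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        do_sample=True,          # sampling
        repetition_penalty=2.0,  # repetition penalty of 2
        max_new_tokens=100,      # cap on the generated continuation
        pad_token_id=tokenizer.eos_token_id,
    )
    generated = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return generated.split("\n")[0].strip()  # keep only the first generated post
```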

3.5 Model and Metrics

After generating the different types of synthetic data, we fine-tune the Multilingual BERT model [20] using them. We report the average F1 scores of three independent runs of the training step. The same methodology is adopted for Hindi and Vietnamese.


4 EXPERIMENTS AND RESULTS

We focus on generating synthetic hateful posts to reduce the data imbalance problem and bootstrap hate speech detection work in new contexts. We collected datasets from high-resource and limited data contexts to perform our experiments. The dataset collected from the high-resource domain (i.e., English) supports the translation and entity substitution steps. The other datasets (in Hindi and Vietnamese) contain non-hateful posts and a small set of hateful posts on which data augmentation is performed.

4.1 Training Data

For Hindi, we use the dataset curated by Bhardwaj et al. [7], and for Vietnamese, we use the dataset curated by Luu et al. [37]. Table 4 shows the distribution of the hateful and non-hateful posts in each of the datasets. Since we only use hateful posts from English, we report only the number of hateful posts available in English. As illustrated in Table 4, the number of hateful posts was lowest in the Hindi dataset; hence, we keep 450 posts as the upper limit for our data in the low-resource language. Using a few-shot training setup, we gradually augment the hateful posts in the low-resource language with synthetic hateful posts.

Table 4.

             Train Set              In-Domain Test Set       Out-Of-Domain Test Set
             Non-Hateful  Hateful   Non-Hateful  Hateful     Non-Hateful  Hateful
English      -            5,936     -            -           -            -
Hindi        3,050        478       277          133         1,753        1,017
Vietnamese   19,886       2,556     4,344        642         -            -

Table 4. Distribution of Non-hateful and Hateful Posts in Different Data Sources

4.1.1 Test Data.

To make our experimental conditions mirror real-world scenarios, our test dataset contains only original posts curated from the limited data context. We use the test data provided by Bhardwaj et al. [7] and Luu et al. [37] in Hindi and Vietnamese, respectively. Since these test data are obtained from the same source as the training data, we call this the in-domain test set. However, in field deployments, real-time production data often vary from the dataset on which the classifier was trained. This difference could be due to the different forms of hate speech on different social media platforms, domain and narrative shifts, or dissimilarity in data curation methodologies. To observe the performance of the trained models in such a scenario, we leverage another dataset in Hindi by Bohra et al. [9] and term this the Out-Of-Domain (OOD) test set. These data comprise Hindi-English code-mixed posts, in contrast to the training data, which comprise monolingual hate speech posts in Hindi. We transliterate this code-mixed data into Devanagari to carry out our test experiments. Due to the limited availability of open source hate speech datasets in Vietnamese, we did not perform the OOD analysis in Vietnamese.

4.2 Model Details

For all experiments, we fine-tuned the cased multilingual BERT model [20]. We used BERT's sub-word tokenizer to tokenize the pre-processed input post and encode it into 768 dimensions using BERT embeddings. The encoding layer is followed by a dropout layer with a probability of 0.1, followed by a linear output layer that projects the 768-dimensional embedding into a two-dimensional vector. We use the cross-entropy loss function and the Adam optimizer to train the model, with a batch size of 16, a learning rate of \(1\times 10^{-5}\) without weight decay, and a gradient clipping norm of 1.0, and fine-tune the model for 10 epochs. We separate 10% of the training dataset for cross-validation and use the remaining 90% during the fine-tuning step. Our model has 177M trainable parameters, and we use a Microsoft Azure Virtual Machine with 1 GPU and 8 GB memory to fine-tune the model. Below, we report the experimental setup and our results.
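For reference, a minimal PyTorch sketch of this setup is below. The pooled [CLS] representation stands in for "BERT embeddings," which is an assumption on our part; the hyperparameters follow the description above.

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

class HateSpeechClassifier(nn.Module):
    """mBERT encoder -> dropout(0.1) -> linear layer projecting 768 dims to 2 classes."""
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-multilingual-cased")
        self.dropout = nn.Dropout(0.1)
        self.out = nn.Linear(768, 2)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.out(self.dropout(hidden.pooler_output))

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = HateSpeechClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # no weight decay
loss_fn = nn.CrossEntropyLoss()

def train_step(texts, labels):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    logits = model(enc["input_ids"], enc["attention_mask"])
    loss = loss_fn(logits, torch.tensor(labels))
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping norm of 1.0
    optimizer.step()
    return loss.item()
```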

4.3 Synthetic Hateful Augmentation through Machine Translation

We analyzed the impact of machine-translated hateful posts from English for training hate speech detection models in Hindi and Vietnamese. We use non-hateful posts available in Hindi and Vietnamese and a baseline of 100 original hateful posts in both languages. This mimics the typical real-world case where a handful of labeled hateful posts is available against a majority of non-hateful posts. This initial split had 18% of the training data labeled as original hate speech and about 82% as non-hate speech. This base case achieves mean F1 scores of 84.46 and 64.29 for Hindi and Vietnamese, respectively.

To test the effectiveness of MT examples for augmentation, we increase the baseline training data by adding 50 original hateful posts and compare the results with a training data setup of the baseline of 100 original hateful posts plus 50 new machine-translated hateful posts. We iteratively execute this increment of original vs. synthetic for seven steps until the hateful/non-hateful split is even (50:50). Table 5 shows the macro F1 score of models trained on the baseline and subsequent synthetic increments. The all-original model (All-Orig) acts as an upper limit on performance, representing the case where a complete set of original hateful posts is available and no data augmentation is needed. Our results show that in this ideal case where additional original hateful posts are added to the training data, model performance attained F1 scores up to 88.48 (All-Orig, 7a) and 67.44 (All-Orig, 5a), compared to the initial baseline scores of 84.46 and 64.29 for Hindi and Vietnamese, respectively.

Table 5.

S. No.   Model Type   Original hateful posts   Synthetic hateful posts   Mean macro F1 (H)   Mean macro F1 (V)
1.       Base         100                      0                         84.46               64.29
2a.      All-Orig     150                      0                         85.76               66.58
2b.      MT           100                      50                        84.69               63.66
2c.      CES          100                      50                        85.62               63.25
2d.      BLOOM-LM     100                      50                        85.35               63.47
3a.      All-Orig     200                      0                         86.85               66.10
3b.      MT           100                      100                       84.07               63.33
3c.      CES          100                      100                       84.52               63.04
3d.      BLOOM-LM     100                      100                       84.91               64.48
4a.      All-Orig     250                      0                         86.77               66.89
4b.      MT           100                      150                       84.32               63.82
4c.      CES          100                      150                       85.54               61.94
4d.      BLOOM-LM     100                      150                       85.13               63.70
5a.      All-Orig     300                      0                         87.71               67.44
5b.      MT           100                      200                       85.80               62.48
5c.      CES          100                      200                       85.06               62.26
5d.      BLOOM-LM     100                      200                       85.25               63.74
6a.      All-Orig     350                      0                         86.93               65.03
6b.      MT           100                      250                       85.62               62.16
6c.      CES          100                      250                       85.91               61.78
6d.      BLOOM-LM     100                      250                       84.25               63.79
7a.      All-Orig     400                      0                         88.48               65.22
7b.      MT           100                      300                       84.05               63.34
7c.      CES          100                      300                       85.84               61.06
7d.      BLOOM-LM     100                      300                       84.92               63.92
8a.      All-Orig     450                      0                         87.61               63.31
8b.      MT           100                      350                       84.65               62.00
8c.      CES          100                      350                       85.99               63.25
8d.      BLOOM-LM     100                      350                       84.68               64.23

  • For each run, we use a constant 450 non-hateful posts.

Table 5. Comparison of Contextual Entity Substitution and BLOOM Language Model Synthetic Data Generation Methodologies vs. Machine Translated Synthetic Data Generation Methodology for Hindi (H) and Vietnamese (V)


Finding. We observe that models trained using data augmented with machine-translated posts showed very little improvement over the baseline for Hindi (with mean F1 scores ranging from 84.05 to 85.80 vs. the 84.46 baseline) and did not outperform the baseline for Vietnamese (with mean F1 scores ranging from 62.16 to 63.66 vs. the 64.29 baseline). In general, the MT models did not significantly improve on the baseline as more translated data were added, indicating that the MT data potentially introduced more noise than signal to the model.

4.4 Synthetic Hateful Augmentation through Contextual Entity Substitution

We follow the setup described in Section 4.3 to compare the baseline results with synthetic examples generated from the original English hate speech dataset using our CES method described in Section 3.3. Similarly, we use 450 non-hateful posts and iteratively augment the original hateful posts in Hindi and Vietnamese with synthetic CES posts in increments of 50. Table 5 shows the comparative results between the CES method vs. the MT and All-Orig models for Hindi and Vietnamese.

Finding. For Hindi, we find that in the majority of the steps, the CES methodology outperforms the machine-translated methodology and closes the gap with the models trained on all original hateful posts of the same quantity. The CES methodology shows a boost in performance with a mean F1 score of up to 85.99 with 350 synthetic hate posts (CES, 8c), which is better than both the baseline of 100 original hate posts alone and MT augmentation in all cases. For Vietnamese, both the MT and CES scenarios show a decrease in performance after adding more synthetic data, resulting in mean F1 scores that were lower than the baseline. Broadly, we observe an increase in performance for CES-augmented models in Hindi but a surprising dominance of MT over CES methods in Vietnamese. We hypothesize that this is possibly due to the nature of the entity table for Vietnamese and discuss this in Section 5.

4.5 Synthetic Hateful Posts through Hateful Language Generation

Next, we again augment the existing 100 hateful posts, this time using hateful language generated by the BLOOM large language model [56] (BLOOM-LM). In this method, we partition the hate speech dataset in the low-resource language into subsets of five posts and synthetically generate a sixth hateful post for each subset. We use the 100 available hateful posts in the low-resource language to generate 20 synthetic hateful posts. Then, we randomize the 100 posts in the low-resource language to re-order and re-group the hateful posts to form new prompts. This re-ordered dataset generates 20 more synthetic hateful posts. We iterate this step until we acquire the required number of synthetic posts.
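A sketch of this augmentation loop is below, reusing a `generate_post` function like the one sketched in Section 3.4; the function name and loop structure are our assumptions.

```python
import random

def augment_with_bloom(hateful_posts, n_required, generate_post):
    """Group posts into fives, generate one synthetic post per group,
    then reshuffle and repeat until n_required posts are produced."""
    synthetic = []
    posts = list(hateful_posts)           # e.g., the 100 available hateful posts
    while len(synthetic) < n_required:
        random.shuffle(posts)             # re-order to form new prompt groups
        for i in range(0, len(posts) - len(posts) % 5, 5):
            synthetic.append(generate_post(posts[i:i + 5]))
            if len(synthetic) == n_required:
                break
    return synthetic
```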

Finding. We find that the BLOOM-LM method outperforms the MT method in both Hindi and Vietnamese as we increase the amount of synthetic data. BLOOM-LM also narrows the performance gap with models trained on the same quantity of all-original hateful posts. However, the CES method outperforms the BLOOM-LM method in Hindi in most cases, while the BLOOM-LM method outperforms the CES method in most Vietnamese cases. Specifically, for Vietnamese, we observe that adding more BLOOM-LM synthetic data leads to a steady increase in performance. We hypothesize that this is potentially due to greater representation of Vietnamese data than Hindi in the BLOOM pretraining dataset and discuss this in Section 5.

4.6 Results on OOD Test Set

To test the robustness of the CES method in comparison to the All-Orig, MT, and BLOOM-LM cases, we mimic a real-world deployment scenario and test the trained models on entirely new data from a different source than the training data. This is particularly challenging for hate speech models, since differences in platform sources, hate lingo, narratives, and so on can lead to entirely new forms of hate speech. We found a suitable OOD dataset only for Hindi and thus use that for our analysis.

Finding. In the base case, training with 450 non-hateful and 100 original hateful posts, the mean F1 was 50.81. We observe that the BLOOM-LM method performs better than the CES and MT methods on OOD data. As we incrementally add synthetic data, we notice a reduction in performance for both the MT and CES methods: MT on the OOD test data dropped from a mean F1 of 45.85 to 41.49 and CES from 46.20 to 41.71, whereas BLOOM-LM performance ranged from 50.91 to 51.95.

In general, we observe that in the OOD test, fewer training data performed better than more training data for all the methods: All-Orig, MT, CES, and BLOOM-LM. This makes sense, since more training data increases the already significant deviation between the training set and the new test set. Nonetheless, a CES approach may be more relevant for languages not represented in large language models like BLOOM. Since OOD data often represent the present state of the world at test/deployment time, we argue that incorporating newer entities from the real-world dataset into the entity table can significantly improve the performance of the CES method.


5 DISCUSSION

5.1 Interpretability Analysis

The performance boost obtained through training on synthetic data with the CES method helps validate our hypothesis of transferring hate speech context across languages. To develop a deeper understanding of our results and examine if the performance boost was, in fact, due to the presence of context-specific entities, we interpreted our model results using the SHapley Additive exPlanations (SHAP) framework [36]. The SHAP framework helps us calculate the contribution of each word when the model makes its prediction.

In our interpretability analysis, we obtained the average SHAP value for every word in the test data and sorted the words by maximum contribution across the entire dataset. We then annotated the top 20 words with respect to whether they are entities and calculated the percentage contribution by entities across the top 20 contributing words in the test set. This analysis helps us understand whether entities play a greater role in classifier prediction for the model trained on the CES synthetic data compared to the model trained on MT synthetic data.
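A sketch of this analysis with the shap library is below, assuming the fine-tuned model is wrapped in a Hugging Face text-classification pipeline; the checkpoint path, class index, and token aggregation are illustrative assumptions, and exact indexing depends on the shap and transformers versions.

```python
import numpy as np
import shap
from transformers import pipeline

# Fine-tuned mBERT checkpoint; the path is illustrative
clf = pipeline("text-classification", model="./mbert-hate-ces", top_k=None)

explainer = shap.Explainer(clf)       # shap infers a text masker from the pipeline
shap_values = explainer(test_posts)   # test_posts: list of posts from the test set

# Average absolute contribution of each token across the test set
contrib = {}
for sv in shap_values:
    for token, value in zip(sv.data, sv.values[:, 1]):  # index 1 assumed = hateful class
        contrib.setdefault(token.strip(), []).append(abs(value))

top20 = sorted(contrib, key=lambda t: np.mean(contrib[t]), reverse=True)[:20]
# top20 is then manually annotated as entity vs. non-entity
```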

We observe that the average contribution of entities on classifier prediction is 31% for the MT model. However, it is 38% for the CES model in the Hindi language. We found even more promising results for Vietnamese as there was only a 13% contribution by the entities toward the final prediction with MT while there was a 59% contribution by entities in the CES model. This provides further evidence of entities playing a greater role in guiding the model prediction when the model is fine-tuned on the synthetic data with contextual entity substitution.

5.2 Implications

The scarcity of data for hate speech detection in low-resource language contexts has been well documented [24, 29, 38]. Data work for machine learning (hate speech detection inclusive) is considered boring, expensive, and intensive, especially when accounting for geographic and language barriers [44, 55]. Our work presents three significant implications: (1) by presenting methods for augmenting hate speech data in limited data contexts and comparing their performance on in-domain and out-of-domain test sets, we address a lingering question for hate speech practitioners about technical approaches for boosting limited hate speech data for real-world deployments; (2) our empirical findings highlight the important role of humans-in-the-loop of hate speech detection systems for creating and maintaining structures, managing domain drifts, and evaluating performance; and (3) we motivate the need for more research in synthetic hate speech data generation and, broadly, in the inclusion of more lower-resourced languages in large language models for use in downstream applications.

Our findings show that automatically translating hate speech data from one language to another is not the best approach for data augmentation. This is mostly due to the loss of contextual relevance of hate targets as the model translates from one language to another. Drawing from findings in vision systems [6], two key properties of good synthetic data are naturalism and diversity. Naturalism implies that the data may not be real, but they must capture certain structural properties seen in real data. We attempt to achieve this naturalism by translating data from one language to another. However, prior work has shown that machine-translating hate speech data is subject to the quality of the translation system, the annotation scheme used in both languages, and class balance [13].

Our contextual entity substitution method addresses a major limitation of machine-translated hate speech data by infusing structure and context into the translated results. The CES method also confers the diversity property on the generated synthetic data. We have shown that this method outperforms simple machine translation and performs comparably to models trained using only original data or generative methods. However, since this method is heavily reliant on a finite set of entities in the entity table, we see no remarkable improvements as more CES synthetic data are generated. For instance, our analysis of the entity tables in Hindi and Vietnamese from Table 1 shows that the Hindi entity table, which has more target individuals than the Vietnamese one, led to more diverse synthetic data generation. The success of this method depends on continuous updates to the entity table to account for domain drifts and to improve the diversity of the generated synthetic data. This supports previous calls to include context experts as part of effective hate speech detection and tracking projects [44].

The BLOOM large language model used in this work has been trained on 46 natural languages [56], and our findings show that the level of language representation can play a role in the quality of the sentences generated by the model. For instance, since Vietnamese had twice the size of Hindi language data in the BLOOM pretraining dataset, we observe that the quality of synthetic data generated in Vietnamese is better than for Hindi. As models include more diverse languages in their pre-training setup, these methods can be extended to newer contexts. Future work can also explore the potential benefits of further fine-tuning the language models on data from the languages of interest prior to generating synthetic examples. Furthermore, our findings motivate the need for additional work in prompt engineering for synthetic hate speech data. Our initial experimentation with target-guided prompting (see Table 6) seeks to generate hate speech targeting a specific group. This approach could potentially improve the quality of synthetic data generated to train machine learning models.

Table 6.
Prompt
Target group: Muslim Post: If you are a Muslim, you are a criminal, end of the matter!
Target group: Muslim Post: There is only one similarity between Kashmir and Palestine, Muslims are being openly oppressed in both the places and the world is silent!
Target group: Muslim Post: The video of slogan of Pakistan Zindabad in the presence of Samajwadi Party MLA Abu Asim Azmi in Shramik Special train running from Wadala in Mumbai is going viral
Target group: Muslim Post: If you are a Muslim then you are a criminal, and if you are anti-BJP then you are a straight terrorist.
Target group: Muslim Post: In the case of rape and subsequent brutal murder of Dr. Priyanka Reddy in Hyderabad, India’s so-called secularists are refraining from raising their voice today because the accused Muslim and the locality, Asaduddin Owaisi, is it not enough to protest?
Target group: Muslim Post:
Generated Post
If you are a Muslim or not a Hindu, you will have to leave the country.

Table 6. Target Group-specific Prompt Engineering to Generate Targeted Hateful Posts Using the BLOOM Language Model

Overall, we find that there is no single recipe for augmenting hate speech data in low-resource contexts. When the entity table is comprehensive, the CES method shines; when the language is well represented in a generative large language model, the language generation technique performs well. In general, adapting hate speech from one context to another is bound to introduce noise and domain shift, and choosing whether to perform contextual substitution or language generation will depend on the constraints in the limited data context of preference.

5.3 Limitation and Future Work

We recognize that this work presents some limitations, and some of them suggest promising directions for future work. Our present analyses have investigated the performance of our proposed methods on Hindi and Vietnamese even though these languages have reasonably decent representation in many language models. This selection bias might have influenced the performance of the proposed methods. Though within the context of hate speech detection research there are very few resources in Hindi and Vietnamese [29, 30], it is unclear whether our methods will work for many less-resourced languages. We believe that approaches for hate speech detection using synthetic data generation should be extended to lesser-resourced languages, and future work should consider this.

Our methodology adopts a random matching mechanism for selecting substituted entities from the table. Further work is needed to explore other matching methods, for example, exploring the effectiveness of adding another layer of semantic understanding to adapt the entities more closely to their corresponding replacement. This semantic coherence could potentially increase the quality of the hateful posts and further boost the performance of the models.

An ethical concern with researching hate speech detection methodologies using synthetically generated data is the possibility for bad actors to adopt these strategies for propagating synthetically generated hateful content on social media. While we unequivocally denounce such use, we argue that the proposed methodologies can be responsibly deployed to inhibit the spread of content produced by such malicious use. A model trained on synthetic data could even be more astute in detecting synthetic hate speech because of the distribution similarity with the data. The proposed methods could also be extended to incorporate techniques such as watermarking [33] to detect synthetically generated texts while retaining the benefits of data augmentation.


6 CONCLUSION

In this work, we address the issue of data imbalance and data unavailability affecting the performance of automatic hate speech detection systems in limited data contexts. We investigated three approaches to generate synthetic hate speech data and presented a novel methodology for transferring hateful sentiment across languages while retaining contextual relevance in the target domains. We augmented a small number of hateful posts in Hindi and Vietnamese with synthetically generated hateful posts and trained machine learning models in a few-shot setup. Our findings show significant benefits of our proposed methods under different scenarios. Our contribution will help practitioners and researchers working on hate speech detection in limited data contexts build more robust machine learning systems to further their capacity to counter hate speech.


ACKNOWLEDGMENTS

We thank Microsoft for providing compute for the experiment in Azure credits, and the Computing For Good Fellowship at Georgia Institute of Technology, which partially funded the first author’s work on this project. We also thank our partners at The Carter Center for their support in the project. Finally, we thank our Technologies and International Development Lab colleagues and the anonymous reviewers who provided critical feedback to help improve this article.


REFERENCES

[1] 2021. PeaceTech Lab—Hate Speech Lexicons. Retrieved from https://www.peacetechlab.org/hate-speech
[2] Ali Raza, Farooq Umar, Arshad Umair, Shahzad Waseem, and Beg Mirza Omer. 2022. Hate speech detection on Twitter using transfer learning. Computer Speech & Language 74 (2022), 101365.
[3] Alshalan Raghad, Al-Khalifa Hend, Alsaeed Duaa, Al-Baity Heyam, and Alshalan Shahad. 2020. Detection of hate speech in COVID-19-related tweets in the Arab region: Deep learning and topic modeling approach. Journal of Medical Internet Research 22, 12 (2020), e22609.
[4] Aluru Sai Saketh, Mathew Binny, Saha Punyajoy, and Mukherjee Animesh. 2021. A deep dive into multilingual hate speech classification. In Machine Learning and Knowledge Discovery in Databases, Applied Data Science and Demo Track: European Conference (ECML PKDD'20), Ghent, Belgium, September 14–18, 2020, Proceedings, Part V. Springer, 423–439.
[5] Badri Nabil, Kboubi Ferihane, and Chaibi Anja Habacha. 2022. Combining FastText and GloVe word embedding for offensive and hate speech text detection. Proc. Comput. Sci. 207 (2022), 769–778.
[6] Baradad Jurjo Manel, Wulff Jonas, Wang Tongzhou, Isola Phillip, and Torralba Antonio. 2021. Learning to see by looking at noise. Advances in Neural Information Processing Systems 34 (2021), 2556–2569.
[7] Bhardwaj Mohit, Akhtar Md Shad, Ekbal Asif, Das Amitava, and Chakraborty Tanmoy. 2020. Hostility detection dataset in Hindi. arXiv preprint arXiv:2011.03588 (2020).
[8] Bigoulaeva Irina, Hangya Viktor, and Fraser Alexander. 2021. Cross-lingual transfer learning for hate speech detection. In Proceedings of the 1st Workshop on Language Technology for Equality, Diversity and Inclusion. 15–25.
[9] Bohra Aditya, Vijay Deepanshu, Singh Vinay, Akhtar Syed Sarfaraz, and Shrivastava Manish. 2018. A dataset of Hindi-English code-mixed social media text for hate speech detection. In Proceedings of the 2nd Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media. 36–41.
[10] Brown Tom, Mann Benjamin, Ryder Nick, Subbiah Melanie, Kaplan Jared D., Dhariwal Prafulla, Neelakantan Arvind, Shyam Pranav, Sastry Girish, Askell Amanda, et al. 2020. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33 (2020), 1877–1901.
[11] Cao Rui and Lee Roy Ka-Wei. 2020. HateGAN: Adversarial generative-based data augmentation for hate speech detection. In Proceedings of the 28th International Conference on Computational Linguistics. 6327–6338.
[12] Cardoso Renato, Vallecorsa Sofia, and Nemni Edoardo. 2022. Conditional progressive generative adversarial network for satellite image generation. In NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research.
[13] Casula Camilla and Tonelli Sara. 2020. Hate speech detection with machine-translated data: The role of annotation scheme, class imbalance and undersampling. In Proceedings of the 7th Italian Conference on Computational Linguistics (CLiC-it'20), Vol. 2769. CEUR-WS.org.
[14] Chopra Shivang, Sawhney Ramit, Mathur Puneet, and Shah Rajiv Ratn. 2020. Hindi-English hate speech detection: Author profiling, debiasing, and practical perspectives. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 386–393.
[15] Conneau Alexis, Khandelwal Kartikay, Goyal Naman, Chaudhary Vishrav, Wenzek Guillaume, Guzmán Francisco, Grave Edouard, Ott Myle, Zettlemoyer Luke, and Stoyanov Veselin. 2019. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 8440–8451.
[16] Das Mithun, Banerjee Somnath, and Mukherjee Animesh. 2022. Data bootstrapping approaches to improve low resource abusive language detection for Indic languages. In Proceedings of the 33rd ACM Conference on Hypertext and Social Media. 32–42.
[17] Davidson Thomas, Warmsley Dana, Macy Michael, and Weber Ingmar. 2017. Automated hate speech detection and the problem of offensive language. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 11. 512–515.
[18] De Gibert Ona, Perez Naiara, García-Pablos Aitor, and Cuadros Montse. 2018. Hate speech dataset from a white supremacy forum. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). 11–20.
[19] Demilie Wubetu Barud and Salau Ayodeji Olalekan. 2022. Detection of fake news and hate speech for Ethiopian languages: A systematic review of the approaches. J. Big Data 9, 1 (2022), 66.
[20] Devlin Jacob, Chang Ming-Wei, Lee Kenton, and Toutanova Kristina. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[21] Ding Bosheng, Liu Linlin, Bing Lidong, Kruengkrai Canasai, Nguyen Thien Hai, Joty Shafiq R., Si Luo, and Miao Chunyan. 2020. DAGA: Data augmentation with a generation approach for low-resource tagging tasks. In Conference on Empirical Methods in Natural Language Processing.
[22] Fazel Amin, Yang Wei, Liu Yulan, Barra-Chicote Roberto, Meng Yixiong, Maas Roland, and Droppo Jasha. 2021. SynthASR: Unlocking synthetic data for speech recognition. arXiv preprint arXiv:2106.07803 (2021).
[23] Feng Steven Y., Gangal Varun, Wei Jason, Chandar Sarath, Vosoughi Soroush, Mitamura Teruko, and Hovy Eduard. 2021. A survey of data augmentation approaches for NLP. In Findings of the Association for Computational Linguistics (ACL-IJCNLP'21). 968–988.
[24] Fortuna Paula and Nunes Sérgio. 2018. A survey on automatic detection of hate speech in text. ACM Comput. Surv. 51, 4 (2018), 1–30.
[25] Goodfellow Ian, Pouget-Abadie Jean, Mirza Mehdi, Xu Bing, Warde-Farley David, Ozair Sherjil, Courville Aaron, and Bengio Yoshua. 2020. Generative adversarial networks. Commun. ACM 63, 11 (2020), 139–144.
[26] Gröndahl Tommi, Pajola Luca, Juuti Mika, Conti Mauro, and Asokan N. 2018. All you need is "love": Evading hate speech detection. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security. 2–12.
[27] Hartvigsen Thomas, Gabriel Saadia, Palangi Hamid, Sap Maarten, Ray Dipankar, and Kamar Ece. 2022. ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 3309–3326.
[28] Hu Ting-Yao, Armandpour Mohammadreza, Shrivastava Ashish, Chang Jen-Hao Rick, Koppula Hema, and Tuzel Oncel. 2022. Synt++: Utilizing imperfect synthetic data to improve speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'22). IEEE, 7682–7686.
[29] Jahan Md Saroar and Oussalah Mourad. 2023. A systematic review of hate speech automatic detection using natural language processing. Neurocomputing (2023), 126232.
[30] Joshi Pratik, Santy Sebastin, Budhiraja Amar, Bali Kalika, and Choudhury Monojit. 2020. The state and fate of linguistic diversity and inclusion in the NLP world. arXiv preprint arXiv:2004.09095 (2020).
[31] Khemchandani Yash, Mehtani Sarvesh, Patil Vaidehi, Awasthi Abhijeet, Talukdar Partha Pratim, and Sarawagi Sunita. 2021. Exploiting language relatedness for low web-resource language model adaptation: An Indic languages study. In Annual Meeting of the Association for Computational Linguistics.
[32] Kim Yo-whan. 2022. How Transferable are Video Representations Based on Synthetic Data? Ph.D. Dissertation. Massachusetts Institute of Technology.
[33] Kirchenbauer John, Geiping Jonas, Wen Yuxin, Katz Jonathan, Miers Ian, and Goldstein Tom. 2023. A watermark for large language models. arXiv preprint arXiv:2301.10226 (2023).
[34] Lauscher Anne, Ravishankar Vinit, Vulic Ivan, and Glavas Goran. 2020. From zero to hero: On the limitations of zero-shot cross-lingual transfer with multilingual transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 4483–4499.
[35] Lee Kenton, Guu Kelvin, He Luheng, Dozat Tim, and Chung Hyung Won. 2021. Neural data augmentation via example extrapolation. arXiv preprint arXiv:2102.01335 (2021).
[36] Lundberg Scott M. and Lee Su-In. 2017. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30 (2017).
[37] Luu Son T., Nguyen Kiet Van, and Nguyen Ngan Luu-Thuy. 2021. A large-scale dataset for hate speech detection on Vietnamese social media texts. In Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices: 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE'21), Kuala Lumpur, Malaysia, July 26–29, 2021, Proceedings, Part I. Springer, 415–426.
  37. [37] Luu Son T., Nguyen Kiet Van, and Nguyen Ngan Luu-Thuy. 2021. A large-scale dataset for hate speech detection on vietnamese social media texts. In Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices: 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE’21), Kuala Lumpur, Malaysia, July 26–29, 2021, Proceedings, Part I 34. Springer, 415426.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. [38] Madukwe Kosisochukwu, Gao Xiaoying, and Xue Bing. 2020. In data we trust: A critical analysis of hate speech detection datasets. In Proceedings of the 4th Workshop on Online Abuse and Harms. 150161.Google ScholarGoogle ScholarCross RefCross Ref
  39. [39] Madukwe Kosisochukwu Judith, Gao Xiaoying, and Xue Bing. 2022. Token replacement-based data augmentation methods for hate speech detection. World Wide Web (2022), 122.Google ScholarGoogle Scholar
  40. [40] Mathew Binny, Dutt Ritam, Goyal Pawan, and Mukherjee Animesh. 2019. Spread of hate speech in online social media. In Proceedings of the 10th ACM Conference on Web Science. 173182.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. [41] Mathew Binny, Illendula Anurag, Saha Punyajoy, Sarkar Soumya, Goyal Pawan, and Mukherjee Animesh. 2020. Hate begets hate: A temporal study of hate speech. Proc. ACM Hum.-Comput. Interact. 4, CSCW2 (2020), 124.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. [42] Mathew Binny, Saha Punyajoy, Yimam Seid Muhie, Biemann Chris, Goyal Pawan, and Mukherjee Animesh. 2021. Hatexplain: A benchmark dataset for explainable hate speech detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 1486714875.Google ScholarGoogle ScholarCross RefCross Ref
  43. [43] Meng Linghui, Xu Jin, Tan Xu, Wang Jindong, Qin Tao, and Xu Bo. 2021. MixSpeech: Data augmentation for low-resource automatic speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’21), 70087012.Google ScholarGoogle Scholar
  44. [44] Nkemelu Daniel, Shah Harshil, Essa Irfan, and Best Michael L.. 2022. Tackling hate speech in low-resource languages with context experts. In Proceedings of the 2022 International Conference on Information and Communication Technologies and Development. 1–11.Google ScholarGoogle Scholar
  45. [45] Oord Aaron van den, Dieleman Sander, Zen Heiga, Simonyan Karen, Vinyals Oriol, Graves Alex, Kalchbrenner Nal, Senior Andrew, and Kavukcuoglu Koray. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016).Google ScholarGoogle Scholar
  46. [46] Ousidhoum Nedjma, Lin Zizheng, Zhang Hongming, Song Yangqiu, and Yeung Dit-Yan. 2019. Multilingual and multi-aspect hate speech analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 4675–4684.Google ScholarGoogle Scholar
  47. [47] Perez Luis and Wang Jason. 2017. The effectiveness of data augmentation in image classification using deep learning. ArXiv abs/1712.04621 (2017).Google ScholarGoogle Scholar
  48. [48] Raffel Colin, Shazeer Noam, Roberts Adam, Lee Katherine, Narang Sharan, Matena Michael, Zhou Yanqi, Li Wei, and Liu Peter J.. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21, 1 (2020), 5485–5551.Google ScholarGoogle Scholar
  49. [49] Ramesh Aditya, Dhariwal Prafulla, Nichol Alex, Chu Casey, and Chen Mark. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022).Google ScholarGoogle Scholar
  50. [50] Ranasinghe Tharindu and Zampieri Marcos. 2021. Multilingual offensive language identification for low-resource languages. Transactions on Asian and Low-Resource Language Information Processing 21, 1 (2021), 1–13.Google ScholarGoogle Scholar
  51. [51] Research Facebook. 2019. LASER: Language-Agnostic SEntence Representations.Google ScholarGoogle Scholar
  52. [52] Romim Nauros, Ahmed Mosahed, Talukder Hriteshwar, and Islam Md Saiful. 2021. Hate speech detection in the bengali language: A dataset and its baseline evaluation. In Proceedings of International Joint Conference on Advances in Computational Intelligence (IJCACI’20). Springer, 457468.Google ScholarGoogle ScholarCross RefCross Ref
  53. [53] Sagers Luke W, Diao James A, Groh Matthew, Rajpurkar Pranav, Adamson Adewole S, and Manrai Arjun K. 2022. Improving dermatology classifiers across populations using images generated by large diffusion models. arXiv preprint arXiv:2211.13352 (2022).Google ScholarGoogle Scholar
  54. [54] Saharia Chitwan, Chan William, Saxena Saurabh, Li Lala, Whang Jay, Denton Emily, Ghasemipour Seyed Kamyar Seyed, Ayan Burcu Karagol, Mahdavi S. Sara, Lopes Rapha Gontijo, et al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems 35 (2022), 36479–36494.Google ScholarGoogle Scholar
  55. [55] Sambasivan Nithya, Kapania Shivani, Highfill Hannah, Akrong Diana, Paritosh Praveen, and Aroyo Lora M.. 2021. “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 115.Google ScholarGoogle Scholar
  56. [56] Scao Teven Le, Fan Angela, Akiki Christopher, Pavlick Ellie, Ilić Suzana, Hesslow Daniel, Castagné Roman, Luccioni Alexandra Sasha, Yvon François, Gallé Matthias, et al. 2022. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100 (2022).Google ScholarGoogle Scholar
  57. [57] Schmidt Anna and Wiegand Michael. 2019. A survey on hate speech detection using natural language processing. In Proceedings of the 5th International Workshop on Natural Language Processing for Social Media. Association for Computational Linguistics, 110.Google ScholarGoogle Scholar
  58. [58] Shen Jonathan, Pang Ruoming, Weiss Ron J., Schuster Mike, Jaitly Navdeep, Yang Zongheng, Chen Zhifeng, Zhang Yu, Wang Yuxuan, Skerrv-Ryan Rj, et al. 2018. Natural tts synthesis by conditioning wavenet on mel spectrogram predictions. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’18). IEEE, 47794783.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. [59] Shorten Connor and Khoshgoftaar Taghi M.. 2019. A survey on image data augmentation for deep learning. Journal of big data 6, 1 (2019), 1–48.Google ScholarGoogle Scholar
  60. [60] Swamy Steve Durairaj, Jamatia Anupam, and Gambäck Björn. 2019. Studying generalisability across abusive language detection datasets. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL’19). 940950.Google ScholarGoogle ScholarCross RefCross Ref
  61. [61] Thoppilan Romal, Freitas Daniel De, Hall Jamie, Shazeer Noam, Kulshreshtha Apoorv, Cheng Heng-Tze, Jin Alicia, Bos Taylor, Baker Leslie, Du Yu, et al. 2022. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239 (2022).Google ScholarGoogle Scholar
  62. [62] Toraman Cagri, Şahinuç Furkan, and Yılmaz Eyup Halit. 2022. Large-scale hate speech detection with cross-domain transfer. In Proceedings of the 13th Language Resources and Evaluation Conference. 2215–2225.Google ScholarGoogle Scholar
  63. [63] Wei Shengyun, Zou Shun, Liao Feifan, et al. 2020. A comparison on data augmentation methods based on deep learning for audio classification. In Journal of Physics: Conference Series, Vol. 1453. IOP Publishing, 012085.Google ScholarGoogle ScholarCross RefCross Ref
  64. [64] Whang Steven Euijong, Roh Yuji, Song Hwanjun, and Lee Jae-Gil. 2023. Data collection and quality challenges in deep learning: A data-centric ai perspective. The VLDB Journal (2023), 1–23.Google ScholarGoogle Scholar
  65. [65] Wu Shijie and Dredze Mark. 2020. Are all languages created equal in multilingual BERT? In Proceedings of the 5th Workshop on Representation Learning for NLP. 120–130.Google ScholarGoogle Scholar
  66. [66] Xia M., Kong X., Anastasopoulos Antonios, and Neubig Graham. 2019. Generalized data augmentation for low-resource translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 5786–5796.Google ScholarGoogle Scholar
  67. [67] Yoder Michael Miller, Ng Lynnette Hui Xian, Brown David West, and Carley Kathleen M.. 2022. How hate speech varies by target identity: A computational analysis. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL’22). 27–39.Google ScholarGoogle Scholar
  68. [68] Yu Jiahui, Xu Yuanzhong, Koh Jing Yu, Luong Thang, Baid Gunjan, Wang Zirui, Vasudevan Vijay, Ku Alexander, Yang Yinfei, Ayan Burcu Karagol, et al. 2022. Scaling autoregressive models for content-rich text-to-image generation. arXiv preprint arXiv:2206.10789 (2022).Google ScholarGoogle Scholar
  69. [69] Yun Sangdoo, Oh Seong Joon, Heo Byeongho, Han Dongyoon, and Kim Jinhyung. 2020. Videomix: Rethinking data augmentation for video classification. arXiv preprint arXiv:2012.03457 (2020).Google ScholarGoogle Scholar
  70. [70] Ziqi Z., Robinson D., and Jonathan T.. 2019. Hate speech detection using a convolution-LSTM based deep neural network. IJCCS 11816 (2019), 2546–2553.Google ScholarGoogle Scholar

Published in

ACM Journal on Computing and Sustainable Societies, Volume 2, Issue 1 (March 2024), 255 pages. EISSN: 2834-5533. DOI: 10.1145/3613746.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 15 February 2023
• Accepted: 22 February 2023
• Online AM: 12 October 2023
• Published: 13 January 2024
