Introduction and background

The rise of Natural Language Processing (NLP) tasks focused on hate speech Badjatiya et al. (2017) and the analysis of online debates Celli et al. (2014) have both highlighted bad behaviors in social media, such as offensive language against vulnerable groups (e.g., immigrants, minorities) Poletto et al. (2017), as well as aggressive language against women Saha et al. (2018). An under-researched, yet important, area of investigation is anti-policy hate: hate speech against politicians, policy making and laws at any level (national, regional and local). While anti-policy hate speech has been addressed in Arabic Guellil et al. (2020), it remains under-researched in most European languages.

In recent years, scientific research has contributed to the automatic detection of hate speech from text with datasets annotated with hate labels, aggressiveness, offensiveness and other related dimensions Sanguinetti et al. (2018). Scholars have presented systems for the detection of hate speech in social media focused on specific targets, such as immigrants Del Vigna et al. (2017), and language domains, such as racism Kwok and Wang (2013), misogyny Frenda et al. (2019) or cyberbullying Menini et al. (2019). Each type of hate speech has its own vocabulary and its own dynamics; the selection of a specific domain is therefore crucial to obtain clean data and to restrict the scope of experiments and learning tasks.

We have formulated three Research Questions:

  • RQ1: How different are hate speech domains, such as anti-immigrant and anti-policy hate?

  • RQ2: Is it possible to perform cross-domain training to exploit techniques and models trained in one domain (e.g., anti-immigration) to detect hate speech in another domain (e.g., against policy makers)?

  • RQ3: Is it possible to identify and track the topics of public debate involved/not involved in hate speech?

In order to address RQ1, we performed correlation and classification analysis. The former was carried out to measure how different language features are related to hate speech in different domains, the latter to test the performance of classifiers in different domains. To address RQ2, we performed cross-domain classification and applied hate speech models trained in an anti-immigration domain to a policy-making domain. Finally, to address RQ3, we extracted the hashtags from tweets labelled as hateful and non-hateful and visualized the network of co-occurrences with a Yifan Hu graph Yifan and Shi (2015).

With this research, we aim to provide actionable insights for evidence-based decision-making Kyriazis et al. (2020), as online hate is often a predictor of offline crime Williams et al. (2020). We selected Twitter as the source of data and Italian as the target language for two reasons:

  1. There are datasets annotated with anti-immigrant hate speech labels in Italian, but no datasets annotated with hate speech labels against policy making;

  2. Italy has had, at least since the 2018 elections, a large audience that pays attention to hyper-partisan sources on Twitter, which are prone to producing and retweeting messages of hate against policy making Giglietto et al. (2019).

This paper contributes to scientific research in NLP and hate speech detection in two ways. First, the production of a new corpus, annotated with hate speech labels, in an under-resourced language (Italian). Second, the classification of hateful tweets against policy making, and its comparison to the classification of hate speech against immigrants.

The paper is structured as follows: after a literature review (‘Related work’), we collect a stream of tweets in Italian using keywords (i.e., hashtags) related to laws and regulations (‘Data collection and annotation’). We then train, test, and evaluate models for hate speech from existing resources, analyze the predictive power of each feature, visualize the results (‘Experiments and discussion’), and draw conclusions (‘Conclusion and future work’).

Related work

Hate speech is defined as any expression that is abusive, insulting, intimidating, harassing, and/or incites, supports and facilitates violence, hatred, or discrimination. It is directed against people (individuals or groups) on the basis of their race, ethnic origin, religion, gender, age, physical condition, disability, sexual orientation, political conviction, and so forth Karmen and Melita (2012). A recent study defined the relationships between hate speech and related concepts (see Fig. 1), highlighting the fact that the phenomena involved make hate speech especially hard to model, with the risk of creating biased data and models prone to overfitting. In addition, the literature also reports cases of annotators' insensitivity to differences in dialects and offenses Sap et al. (2019) that make annotation difficult. For these reasons, one of the largest challenges in the field of hate speech is to investigate architectures that are explainable, stable and well-performing across different languages and domains Poletto et al. (2020).

Fig. 1 Relation between hate speech and related concepts. Source: Poletto et al. (2020)

Another key issue is that many recent approaches based on word embeddings Kenneth (2017), Deep Learning algorithms and pre-trained transformers such as BERT Jacob et al. (2018), Tenney et al. (2019), Polignano et al. (2019) are vulnerable to undesirable bias in training data, especially in the political domain Wich et al. (2020), and suffer from poor interpretability MacAvaney et al. (2019). In other words, it can be difficult to understand how systems based on Deep Learning techniques make their decisions about hateful/non-hateful messages. Moreover, the decisions taken by such systems might be based on biased and unfair models. One method for explaining the decisions of transformer models is to look at the attention vectors Clark et al. (2019). Yet, studies show that learned attention weights are frequently uncorrelated with gradient-based measures of feature importance, and different attention distributions can nonetheless yield similar predictions Jain and Wallace (2019). In the context of policy making, the transparency of decisions and the possibility to interpret the results should be considered a priority.

Despite the many NLP studies on hate speech against various targets, such as immigrants, there are few works on hate speech detection against politicians and policy making. Previous approaches to this task exploited transparent Machine Learning (ML) algorithms, such as Gaussian Naïve Bayes, Random Forests and Support Vector Machines (SVM), as well as Deep Learning algorithms, such as Convolutional Neural Networks (CNN), Multi-Layer Perceptrons (MLP) and Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) or bi-directional LSTM (Bi-LSTM), on top of word embeddings either extracted from the training set or pre-trained on other resources with transfer learning. These studies show that good results can be obtained with Bi-LSTM, MLP and SVM Guellil et al. (2020).

Studies that provided useful datasets in the field of hate speech include the SemEval 2019 shared task, which studied multilingual hate speech against immigrants and women in English and Spanish Basile et al. (2019). In Italian there are two main corpora, both about anti-immigrant hate: the Italian HS corpus Poletto et al. (2017) and HaSpeeDe-tw2018, the dataset released during the EVALITA campaign in 2018 Sanguinetti et al. (2020b). The former is a collection of more than 5,700 tweets manually annotated with hate speech, aggressiveness, irony and other forms of potentially harassing communication. The latter is a dataset (3000 tweets for training and 1000 for testing) manually annotated with hate speech labels. The results of HaSpeeDe-tw2018, reported in Table 1, are the state of the art in hate speech detection in Italian and show that lexical resources, such as polarity and emotion lexica, are useful for this task Bosco et al. (2018), Fersini et al. (2018).

Table 1 State-of-the-art in hate speech classification

Most hate speech recognition systems at HaSpeeDe-tw2018 exploit SVM, Recurrent Neural Networks with LSTM or ensemble learning (meta) Bai et al. (2018), Michele et al. (2018), De la Pena Sarracén et al. (2018), and word embeddings as features Santucci et al. (2018), either pre-trained or extracted from the training set. Some systems also use cross-platform data (i.e. Facebook and Twitter) and show that this strategy yields similar results for Twitter Corazza et al. (2019). Crucially, the best performing systems make use of lexical resources for polarity, subjectivity and emotions Cimino et al. (2018), showing that word embeddings are more effective when combined with lexical resources. The current state of the art in the HaSpeeDe task on Twitter is 0.808 macro-F1, obtained using transformer-based models Sanguinetti et al. (2020a). Regarding visualization, the heuristic power of network graphs has been known in computational social science for at least a decade. For example, network graphs of topics or Twitter hashtags can be used to analyze the sentiment polarization of hyper-partisan topics Kiran and Weber (2017). As another example, networks of replies annotated with personality types can represent the conversational dynamics of neurotic and emotionally stable users Celli and Rossi (2013).

In the next section, we describe how we created the dataset and annotated it with hate speech labels.

Data collection and annotation

In order to monitor the reactions of society towards policy making, we retrieved a stream of tweets in Italian from March to May 2020 using snowball sampling. Starting from a set of seed hashtags, for instance #dpcm (decree of the president of the council of ministers), #legge (law) and #leggedibilancio (budget law), we retrieved a sample of tweets and then added the new hashtags contained in this sample to the list of seed hashtags in order to retrieve new tweets (a minimal sketch of this sampling loop follows the list below). We called this dataset Policycorpus. We removed duplicates, retweets and tweets containing only hashtags and urls. In total we obtained a set of 1264 tweets (1000 for training and 264 for testing). The amount of hate labels in the Policycorpus is 11% (1124 normal and 140 hate tweets). It is strongly unbalanced, like the it-HS corpus (17% of hate tweets), because it reflects the raw distribution of hate tweets on Twitter. The HaSpeeDe-tw corpus (32% of hate tweets), instead, has a distribution that oversamples hate tweets. At the end of the sampling process, the list of seeds included about 60 hashtags referring to:

  • Laws, such as #decretorilancio (#relaunchdecree), #leggelettorale (#electorallaw), #decretosicurezza (#securitydecree)

  • Politicians and policy makers, such as #Salvini, #decretoSalvini (#Salvinidecree), #Renzi, #Meloni, #DraghiPremier

  • Political parties, such as #lega (#league), #pd (#Democratic Party)

  • Political tv shows, such as #ottoemezzo, #nonelarena, #noneladurso, #Piazzapulita

  • Topics of the public debate, such as #COVID, #precari (#precariousworkers), #sicurezza (#security), #giustizia (#justice), #ItalExit

  • Hyper-partisan slogans, such as #vergognaConte (#shameonConte), #contedimettiti (#ConteResign) or #noicontrosalvini (#WeareagainstSalvini)
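As an illustration of the sampling procedure, the following minimal sketch shows the snowball loop in Python. It assumes a hypothetical fetch_tweets(hashtag) helper that wraps the Twitter search API and returns a list of tweet texts; authentication, pagination and rate-limit handling are omitted.

    import re

    def snowball_sample(seed_hashtags, fetch_tweets, rounds=3):
        # Expand a seed hashtag list by harvesting new hashtags from retrieved tweets
        seeds = set(h.lower() for h in seed_hashtags)
        corpus = []
        for _ in range(rounds):
            new_hashtags = set()
            for tag in seeds:
                for tweet in fetch_tweets(tag):  # hypothetical Twitter API wrapper
                    corpus.append(tweet)
                    new_hashtags.update(h.lower() for h in re.findall(r"#\w+", tweet))
            seeds |= new_hashtags  # extend the seed list for the next round
        return seeds, corpus

    # Example usage with the seed hashtags mentioned above:
    # seeds, tweets = snowball_sample(["#dpcm", "#legge", "#leggedibilancio"], fetch_tweets)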

This is the first corpus in Italian annotated with hate speech against policy makers. We plan to make this resource available upon request.

To produce gold standard labels, we asked two Italian experts in communication to manually label the tweets in the Policycorpus, distinguishing between hate and normal tweets according to the following guidelines: by definition, hate speech is any expression that is abusive, insulting, intimidating, harassing, and/or incites violence, hatred, or discrimination. It is directed against people on the basis of their race, ethnic origin, religion, gender, age, physical condition, disability, sexual orientation, political conviction, and so forth. Translated examples:

  1. "A clear #NO to #Netherlands, which would like us to use the #MES economic resources but in exchange for Italy's renunciation of its budgetary autonomy. To the Netherlands we say: thank you and goodbye, WE ARE NOT INTERESTED!!" is normal because it does not contain hate, insults, intimidation, violence or discrimination.

  2. "... There is a weekly catwalk of the #jackal #no #notAtAll! Listening to a Po #clown after a true PATRIOT, a doctor from #Bergamo, cannot be borne, seen or heard. Giletti should stop inviting certain SLACKERS FROM THE PO VALLEY! #COVID-19 #NonelArena" contains hate speech, including insults like #clown and #jackal.

  3. "I have my say ... #Draghi is a great economist but we don't need a #Monti-style economist ... We don't need another technical #government to obey the banking lobby! We need a political leader! We need #ItalExit! We need the #Lira! #No to #DraghiPremier" is a normal case, despite the strong negative sentiment. It might be controversial for the presence of the term lobby, often used in abusive contexts, but in this case it is not directed against people on the basis of their race, ethnic origin, religion, gender, age, physical condition, disability, sexual orientation or political conviction.

The Inter-Annotator Agreement is k = 0.53. Although the score is not high, it is in line with the score reported in the literature for hate speech against immigrants (k = 0.54) Poletto et al. (2017) and indicates that the detection of hate speech is a hard task for humans. All the examples of disagreement were discussed and an agreement was reached between the annotators. The cases of disagreement occurred more often when the sentiment of the tweet was negative, mainly due to:

  • The use of vulgar expressions not explicitly directed against specific people but generically against political choices.

  • The negative interpretation of hyper-partisan hashtags, such as #contedimettiti (#ConteResign) or #noicontrosalvini (#WeareagainstSalvini), in tweets without explicit insults or abusive language.

  • The substitution of explicit insults with derogatory words, such as the word "circus" instead of "clowns".
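For reference, the agreement score above is Cohen's kappa, which corrects the raw agreement between two annotators for the agreement expected by chance. A minimal sketch of its computation with scikit-learn, on toy label sequences (1 = hate, 0 = normal):

    from sklearn.metrics import cohen_kappa_score

    # Toy labels assigned by two annotators to the same tweets
    annotator_1 = [0, 0, 1, 0, 1, 0, 0, 1]
    annotator_2 = [0, 0, 1, 1, 0, 0, 0, 1]

    print("Cohen's kappa:", round(cohen_kappa_score(annotator_1, annotator_2), 2))

In the next section, we report and discuss the results of the experiments.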

Experiments and discussion

Our goal is to create models of hate speech that automatically predict hateful tweets against policy makers in the Policycorpus. First, we describe the features extracted from text, then we perform in-domain and cross-domain classification, and finally we conduct feature analysis and visualize the hashtag networks. As discussed in ‘Related work’, we aim to develop explainable Artificial Intelligence (AI) models, hence we exploited ML algorithms based on lexical resources (Lex), such as SVM, Adaboost and Random Forests, in addition to more advanced techniques, such as neural networks based on the AlBERTo pre-trained transformer model. We ran two different experiments:

  • In experiment one, we tried to answer RQ2, using different algorithms to train models on the existing corpora. We then performed a cross-domain classification, evaluating models trained on HaSpeeDe-tw and it-HS on the Policycorpus test set (‘In-domain and cross-domain classification’);

  • In experiment two, we tried to answer RQ1 with a feature analysis, to understand which features are the best predictors of hate speech in the policy-making domain with respect to the anti-immigration domain (‘Feature analysis’).

Finally, to answer RQ3, we visualized the networks of hashtags in order to understand the relationships between topics used in normal and hateful tweets (‘Hashtags network analysis’). First of all, we describe the features extracted from text.

Feature extraction

Building upon the previous work presented in the literature, we adopted linguistic resources for the extraction of features to use with ML algorithms. In particular, we used:

  • LIWC Tausczik and Pennebaker (2010), a linguistic resource available in many languages, including Italian Alparone et al. (2004), that maps words to 68 psycholinguistic dimensions, such as linguistic dimensions (e.g., pronouns, articles, tense), psychological processes (e.g., cognitive mechanisms, sensations, certainty, causation), human processes (e.g., sex, social life, family), personal concerns (e.g., leisure, money, religion, death) and spoken categories (e.g., assent, nonfluencies);

  • NRC Mohammad et al. (2013), a linguistic resource that maps words to 10 emotion and polarity features: positive words, negative words, anger, anticipation, fear, sadness, joy, surprise, trust and disgust;

  • 22 additional language-independent stylometric features Celli (2015), including positive/negative emoticons/emojis, ratios of punctuation, question and exclamation marks, numbers, operators, links, hashtags, mentions or email addresses, parentheses, lowercase/uppercase characters, and the ratio of repeated bigrams.

Together, these resources yield a matrix of 100 features, much less sparse than a bag-of-words representation. In addition, we used AlBERTo, a transformer model trained on Italian tweets, which extracts a dense matrix of more than 700 embedding features Polignano et al. (2019).
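To make the feature extraction step concrete, the sketch below computes an illustrative subset of the stylometric features listed above; it is a simplified example rather than the exact feature set, and the LIWC and NRC dimensions would be added analogously by dictionary lookup.

    import re

    def stylometric_features(tweet: str) -> dict:
        # A handful of language-independent stylometric features (illustrative subset)
        n = max(len(tweet), 1)
        return {
            "punctuation_ratio": sum(c in ".,;:" for c in tweet) / n,
            "exclamation_ratio": tweet.count("!") / n,
            "question_ratio": tweet.count("?") / n,
            "uppercase_ratio": sum(c.isupper() for c in tweet) / n,
            "lowercase_ratio": sum(c.islower() for c in tweet) / n,
            "digit_ratio": sum(c.isdigit() for c in tweet) / n,
            "hashtags": len(re.findall(r"#\w+", tweet)),
            "mentions": len(re.findall(r"@\w+", tweet)),
            "links": len(re.findall(r"https?://\S+", tweet)),
        }

    print(stylometric_features("Giletti should stop inviting certain SLACKERS!! #NonelArena"))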

In-domain and cross-domain classification

Hate speech labels are naturally unbalanced, as normal tweets are, fortunately, the large majority, especially in the Policycorpus and it-HS corpus. As this is a natural condition, we chose to keep the labels unbalanced and measure performance with two metrics: the area under the ROC curve (ROC AUC), which is insensitive to class imbalance, and the weighted-average F-measure, which takes into account the difference in performance between the two classes. In this experiment, we trained and tested various algorithms using a training-test split for evaluation: 88-12% in the it-HS corpus, 75-25% in HaSpeeDe-tw2018 and 80-20% in the Policycorpus (Table 2).
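As an illustration of this evaluation protocol, the sketch below trains one of the transparent classifiers and reports both metrics. The feature matrix and labels here are randomly generated stand-ins with the same shape and class balance as the Policycorpus; in the real experiments they come from the lexical features or the AlBERTo embeddings.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1264, 100))   # stand-in for the 100 lexical features
    y = rng.random(1264) < 0.11        # ~11% hate labels, as in the Policycorpus

    # 80-20% split, mirroring the Policycorpus evaluation setting
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    print("weighted F1:", f1_score(y_te, clf.predict(X_te), average="weighted"))
    print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))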

Table 2 Results of the classification of hate speech in Italian on the Italian HS corpus (it-HS), HaSpeeDe-tw2018 (HaSpeeDe-tw) and Policycorpus (PC) with different algorithms, lexical features (Lex) and transformer embeddings (AlBERTo)
Table 3 Per-class results of the classification on each corpus with the best algorithm (AlBERTo + neural networks)

A closer look at the per-class performance obtained with the best algorithm (AlBERTo + neural networks) reveals that, in general, the algorithm performs better at detecting normal tweets and worse at recognizing hate tweets, which have poor recall. The fact that recall is higher in the HaSpeeDe-tw corpus than in the Policycorpus suggests that balancing the number of hate examples with the normal ones has a positive effect on recall. Precision is similar in these two datasets (0.75); the it-HS corpus has a higher precision on the hate class, but its recall follows the same pattern as the other two corpora. We present these results in Table 3.

In an attempt to address RQ2, we used the models trained on the HaSpeeDe-tw and it-HS corpora in the previous experiment to automatically produce predictions on the Policycorpus test set, thus performing a cross-domain backtest. Given the differences between the domains, we expected poor results; the outcomes are presented in Table 4.

Table 4 Results of the cross-domain classification of hate speech in Italian on the Policycorpus-test (PC-test) with the models trained on the HaSpeeDe-tw2018 and Italian HS corpora

As expected, the results of cross-domain classification show that the domain shift had a huge impact on the performance of the classifiers, particularly from HaSpeeDe-tw to the Policycorpus, where the results measured with weighted-average F1 are below the majority baseline, suggesting that the features are so different that the model cannot use them in the correct way. Surprisingly, the models trained on the it-HS corpus produced good results, but only those trained with ML algorithms, particularly Random Forests and Adaboost, which are more capable of exploiting weak features. AlBERTo with neural networks in this case performed only slightly better than the majority baseline. We believe that the large training size of the it-HS corpus had a positive effect on cross-domain adaptation.
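The backtest itself is straightforward once both corpora are mapped to the same feature space: the model is fitted on the source corpus and scored on the target test set, with no re-training. A minimal sketch, assuming X_src/y_src hold the it-HS features and labels and X_tgt/y_tgt the Policycorpus test set:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import f1_score

    def cross_domain_backtest(X_src, y_src, X_tgt, y_tgt):
        # Train on the source domain, evaluate on the target domain without re-training
        model = AdaBoostClassifier(random_state=0).fit(X_src, y_src)
        return f1_score(y_tgt, model.predict(X_tgt), average="weighted")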

Feature analysis

The cross-domain classification highlighted the difference in features between the corpora. To measure this difference, and to answer RQ1, we computed the Pearson correlation between the lexical features and the hate speech scores. In Table 5 we present the lexical features most correlated with hate speech in each dataset. Positive correlations indicate the best features for classifying hateful messages and negative correlations indicate the best features for classifying normal messages. All these features were used in the classification experiments.
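A sketch of this correlation ranking, assuming the lexical features are held in a pandas DataFrame with one column per feature plus a binary hate column:

    import pandas as pd

    def rank_features_by_correlation(df: pd.DataFrame, label_col: str = "hate") -> pd.Series:
        # Pearson correlation of each lexical feature with the hate label, sorted:
        # the top of the ranking marks hateful, the bottom marks normal messages
        corr = df.drop(columns=[label_col]).corrwith(df[label_col])
        return corr.sort_values(ascending=False)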

Table 5 Results of the correlation ranking between different lexical features and hate speech
Fig. 2 Visualization of the average activations of each token in the attention vectors, associated with hate and normal labels, for each corpus

The analysis revealed that stylometric features, such as the ratios of lowercase and uppercase characters, have strong predictive power in the HaSpeeDe-tw2018 dataset, but not in the it-HS corpus, where there is more variety. LIWC features, such as the ratios of sexual, anger and swear words, are among the best predictors of hate speech against politicians. This experiment clearly shows that the most useful features for the detection of hate speech in the anti-immigration domain are punctuation (the more punctuation, the more likely a message is non-hateful) and exclamation marks (the more exclamations, the more likely a message is hateful). In the Policycorpus, sexual and swear words are markers of hateful messages, while lowercase characters, numbers and positive emotions are markers of non-hateful messages. It is interesting to note that lowercase letters are correlated with hate speech in the anti-immigration domain, while in the anti-policy domain they are correlated with non-hateful messages. The similarity between the best features in it-HS and the Policycorpus explains the good result obtained in the cross-domain classification with ML algorithms.

We also exploited the attention vectors of AlBERTo to try to explain the poor performance in the cross-domain classification. Using the average activations of each token in the attention vectors, we computed the strongest predictors in the model. The results, represented as wordclouds in Fig. 2, show the most frequent tokens activated to detect hate and normal labels for each corpus. The clear difference between the tokens used in the anti-immigrant and anti-policy domains is a clue to the poor performance in cross-domain classification.
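A simplified sketch of how such average token activations can be read off a transformer with the Hugging Face transformers library; the checkpoint name is a placeholder, and attention is averaged over layers and heads, taking the attention each token receives as its activation score:

    import torch
    from transformers import AutoModel, AutoTokenizer

    MODEL_NAME = "<italian-alberto-checkpoint>"  # placeholder for the AlBERTo model id

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True)

    def token_activations(text: str):
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            attentions = model(**inputs).attentions  # layers x (batch, heads, seq, seq)
        avg = torch.stack(attentions).mean(dim=(0, 2))[0]  # average over layers and heads
        received = avg.mean(dim=0)                         # attention received by each token
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        return sorted(zip(tokens, received.tolist()), key=lambda t: -t[1])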

Hashtags network analysis

Fig. 3 Detailed visualization of the hashtag network of a small portion of the Policycorpus with Yifan Hu trees. The blue cloud (above) contains the hashtags of normal messages, the red cloud (below) contains the hashtags from hate tweets, and the smaller cloud between the two contains the hashtags appearing in both

To address RQ3, we treated the ‘normal’ and ‘hate’ classes as nodes in a network that we plotted with Yifan Hu trees Yifan and Shi (2015). In this way we were able to visualize the network of hashtags connected to the ‘normal’ or ‘hate’ nodes in the Policycorpus: hashtags appearing only in hateful contexts, hashtags appearing only in normal contexts, and hashtags appearing in both. The results, depicted in Fig. 3, show a pattern with a blue cloud (above), which represents the network of hashtags in normal tweets, and a red cloud (below), which represents the hashtags in hate messages. Between the two, there is a smaller cloud of hashtags used in both contexts. A closer look at these hashtags reveals the topics of the public debate that are most controversial.

These topics include politicians (#Salvini, #Meloni, #Conte, #Draghi), economic issues (#lira, #MES), keywords related to the pandemic (#covid, #pandemia, #mascherine) and to political tv shows (#nonelarena).
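A sketch of how such a network can be assembled with networkx; the Yifan Hu layout used for the figure is available in tools such as Gephi, so a spring layout serves as a stand-in here:

    import re
    import networkx as nx
    import matplotlib.pyplot as plt

    def hashtag_network(labelled_tweets):
        # Connect each hashtag to the 'hate' or 'normal' node of the tweets it occurs in
        G = nx.Graph()
        for text, label in labelled_tweets:  # label is "hate" or "normal"
            for tag in re.findall(r"#\w+", text.lower()):
                G.add_edge(label, tag)
        return G

    G = hashtag_network([("Giletti invites SLACKERS #nonelarena #covid", "hate"),
                         ("New budget law approved #legge #covid", "normal")])
    nx.draw(G, pos=nx.spring_layout(G, seed=0), with_labels=True)
    plt.show()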

Conclusion and future work

In this paper, we presented a new resource for the analysis of hate speech against policy makers on Twitter. The dataset, named Policycorpus, is the first of this type in Italian, an under-resourced language. We confirmed that the annotation of hate speech is difficult, and detailed the cases of disagreements between annotators. Using this resource, we demonstrated that:

  • Deep Learning algorithms and transformer-based models achieve state-of-the-art performance in both domains.

  • Machine Learning algorithms are suitable for cross-domain classification from hate speech against immigrants to hate speech against policy makers.

  • Hate speech against immigrants can be detected by looking at the style of the written text (e.g., punctuation and exclamation marks), while hate speech towards policy makers relies more on vocabulary and psycholinguistic aspects (e.g., swear words).

We also visualized the spread of hate speech against policy makers on Twitter Hagen et al. (2019) and identified clusters of hashtags that appear only in hate tweets or in both normal and hate tweets. We suggest that this method can be exploited to track which topics convey hatred towards policy makers. By combining hate speech detection algorithms and visualizations, one can build a dashboard for monitoring hate speech on Twitter. The final aspect that we want to highlight is that the amount of data available, and its balance between classes, can help to improve the performance of the classifiers. In the future, we plan to run experiments on domain adaptation and collect more data for hate speech detection against policy makers.