short-paper

A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research

Authors:
Mohammadreza Rezvan (Kno.e.sis Center, Dayton, OH, USA)
Saeedeh Shekarpour (University of Dayton, Dayton, OH, USA)
Lakshika Balasuriya (Kno.e.sis Center, Dayton, OH, USA)
Krishnaprasad Thirunarayan (Kno.e.sis Center, Dayton, OH, USA)
Valerie L. Shalin (Kno.e.sis Center, Dayton, OH, USA)
Amit Sheth (Kno.e.sis Center, Dayton, OH, USA)

WebSci '18: Proceedings of the 10th ACM Conference on Web Science, May 2018, Pages 33–36
https://doi.org/10.1145/3201064.3201103

Published: 15 May 2018

ABSTRACT

A quality annotated corpus is essential to research. Despite the recent focus of the Web science community on cyberbullying research, the community lacks standard benchmarks. This paper provides both a quality annotated corpus and an offensive words lexicon capturing different types of harassment content: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We first crawled data from Twitter using this content-tailored offensive lexicon. As the mere presence of an offensive word is not a reliable indicator of harassment, human judges annotated tweets for the presence of harassment. Our corpus consists of 25,000 annotated tweets for the five types of harassment content and is available on the Git repository.
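The lexicon-driven collection step described in the abstract can be illustrated with a minimal sketch. It assumes a per-type lexicon held as plain sets of lowercase terms; the entries shown (`term_s1`, `term_r1`, and so on) are hypothetical placeholders, not the authors' lexicon, and the function names `candidate_types` and `bucket_candidates` are introduced here only for illustration. A lexicon match marks a tweet as a candidate for human annotation; it is not itself a harassment label.

```python
import re
from collections import defaultdict

# Hypothetical placeholder lexicon (assumption): the real entries live in the
# authors' repository and are not reproduced here.
LEXICON = {
    "sexual": {"term_s1", "term_s2"},
    "racial": {"term_r1"},
    "appearance-related": {"term_a1"},
    "intellectual": {"term_i1"},
    "political": {"term_p1"},
}

# Simple tokenizer: lowercase letters, digits, underscores, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def candidate_types(text, lexicon=LEXICON):
    """Return the sorted harassment types whose lexicon shares a token with text."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return sorted(t for t, words in lexicon.items() if tokens & words)


def bucket_candidates(tweets, lexicon=LEXICON):
    """Group tweets by matched type.

    The result holds candidates for annotation only: a matched word does not
    by itself indicate harassment, so human judges still decide the label.
    """
    buckets = defaultdict(list)
    for tweet in tweets:
        for t in candidate_types(tweet, lexicon):
            buckets[t].append(tweet)
    return dict(buckets)
```

A tweet can land in more than one bucket when it matches terms from several types, which is one reason the type-specific judgment is left to the annotators.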

Published in
WebSci '18: Proceedings of the 10th ACM Conference on Web Science
May 2018
399 pages
ISBN: 9781450355636
DOI: 10.1145/3201064
General Chairs: Hans Akkermans (Vrije Universiteit Amsterdam, The Netherlands), Kathy Fontaine (Rensselaer Polytechnic Institute, USA), Ivar Vermeulen (Vrije Universiteit Amsterdam, The Netherlands)
Program Chairs: Geert-Jan Houben (TU Delft, The Netherlands), Matthew S. Weber (Rutgers University, New Jersey, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
annotated corpus
appearance-related
context
cyberbullying
harassment
intellectual
offensive lexicon
political
profane word
racial
sexual
Acceptance Rates
WebSci '18 paper acceptance rate: 30 of 113 submissions (27%). Overall acceptance rate: 218 of 875 submissions (25%).