DOI: 10.1145/3201064.3201103
short-paper

A Quality Type-aware Annotated Corpus and Lexicon for Harassment Research

Published: 15 May 2018

ABSTRACT

A quality annotated corpus is essential to research. Despite the recent focus of the Web science community on cyberbullying research, the community lacks standard benchmarks. This paper provides both a quality annotated corpus and an offensive words lexicon capturing different types of harassment content: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We first crawled data from Twitter using this content-tailored offensive lexicon. As mere presence of an offensive word is not a reliable indicator of harassment, human judges annotated tweets for the presence of harassment. Our corpus consists of 25,000 annotated tweets for the five types of harassment content and is available on the Git repository.
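
The lexicon-driven selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' crawler: the lexicon entries below are hypothetical placeholders (the real entries live in the authors' repository), and the function names `candidate_types` and `select_candidates` are assumptions made for illustration. The sketch only marks tweets as candidates per harassment type and leaves the label empty, since the abstract states that a lexicon match alone is not a reliable indicator and that human judges assign the final annotation.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical placeholder lexicon: one set of terms per harassment type.
# The actual terms are NOT reproduced here; these names are assumptions.
LEXICON: Dict[str, Set[str]] = {
    "sexual": {"term_s1", "term_s2"},
    "racial": {"term_r1"},
    "appearance-related": {"term_a1"},
    "intellectual": {"term_i1"},
    "political": {"term_p1"},
}

# Simple tokenizer: lowercase runs of letters, digits, underscores, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def candidate_types(tweet: str, lexicon: Dict[str, Set[str]] = LEXICON) -> List[str]:
    """Return the harassment types whose lexicon terms occur in the tweet."""
    tokens = set(TOKEN_RE.findall(tweet.lower()))
    return sorted(t for t, words in lexicon.items() if tokens & words)


def select_candidates(
    tweets: List[str], lexicon: Dict[str, Set[str]] = LEXICON
) -> List[Dict[str, Optional[object]]]:
    """Keep tweets matching at least one lexicon term.

    The 'label' field stays None: a match only makes a tweet a candidate,
    and the harassment judgment is left to human annotators.
    """
    selected = []
    for text in tweets:
        types = candidate_types(text, lexicon)
        if types:
            selected.append({"text": text, "candidate_types": types, "label": None})
    return selected
```

A tweet containing terms from two lists is returned as a candidate for both types, which keeps the selection step separate from the later per-type annotation.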


Published in

    WebSci '18: Proceedings of the 10th ACM Conference on Web Science
    May 2018
    399 pages
    ISBN: 9781450355636
    DOI: 10.1145/3201064

    Copyright © 2018 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    WebSci '18 paper acceptance rate: 30 of 113 submissions (27%). Overall acceptance rate: 218 of 875 submissions (25%).
