DOI: 10.1145/3501247.3531572
research-article

LIWC-UD: Classifying Online Slang Terms into LIWC Categories

Published: 26 June 2022

ABSTRACT

Linguistic Inquiry and Word Count (LIWC), a popular tool for automated text analysis, relies on an expert-crafted internal dictionary of psychologically relevant words and their corresponding categories. While LIWC’s dictionary covers a significant portion of commonly used words, the continuous evolution of language and the use of slang in settings such as social media require fixed resources to be updated frequently in order to stay relevant. In this work we present LIWC-UD, an automatically generated extension to LIWC’s dictionary that includes terms defined in Urban Dictionary. While the original LIWC dictionary contains 6,547 unique entries, LIWC-UD consists of 141K unique terms automatically categorized into LIWC categories with high confidence using a BERT classifier. LIWC-UD covers many additional terms that are commonly used on social media platforms like Twitter. We release LIWC-UD publicly to the community as a supplement to the original LIWC lexicon.
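The dictionary-lookup idea behind a tool of this kind, and the effect of adding a supplementary lexicon, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the terms, the category labels (`positive_emotion`, `negative_emotion`), the whitespace tokenization, and the function names are hypothetical and are not taken from LIWC, LIWC-UD, or the authors' implementation.

```python
from collections import Counter

# Hypothetical toy lexicons: these entries are invented for illustration
# and are NOT actual LIWC or LIWC-UD entries.
BASE_LEXICON = {
    "happy": {"positive_emotion"},
    "sad": {"negative_emotion"},
}
SUPPLEMENT_LEXICON = {
    "lit": {"positive_emotion"},
    "salty": {"negative_emotion"},
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation (an assumed scheme)."""
    return [token.strip(".,!?") for token in text.lower().split()]


def merge_lexicons(base, supplement):
    """Return a new lexicon in which base entries take precedence over supplement entries."""
    merged = {term: set(cats) for term, cats in supplement.items()}
    merged.update({term: set(cats) for term, cats in base.items()})
    return merged


def category_counts(text, lexicon):
    """Count how many tokens of the text fall into each category."""
    counts = Counter()
    for token in tokenize(text):
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts


def coverage(text, lexicon):
    """Fraction of tokens that have an entry in the lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)
```

With these toy entries, a sentence made up mostly of slang has no tokens matched by the base lexicon alone, while the merged lexicon matches the two slang terms. Letting base entries take precedence is one possible design choice for treating the supplement as an addition to, rather than a replacement for, the expert-crafted dictionary.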

Supplemental Material

WS22_S7_100.mp4 (mp4, 1,019.8 MB)

Published in

WebSci '22: Proceedings of the 14th ACM Web Science Conference 2022
June 2022
479 pages
ISBN: 9781450391917
DOI: 10.1145/3501247

Copyright © 2022 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited

Overall Acceptance Rate: 218 of 875 submissions, 25%
