
LIWC-UD: Classifying Online Slang Terms into LIWC Categories

Authors:
Mohamed Bahgat, University of Edinburgh, United Kingdom
Steve Wilson, Oakland University, USA
Walid Magdy, University of Edinburgh, United Kingdom

WebSci '22: Proceedings of the 14th ACM Web Science Conference 2022, June 2022, Pages 422–432. https://doi.org/10.1145/3501247.3531572

Published: 26 June 2022

ABSTRACT

Linguistic Inquiry and Word Count (LIWC), a popular tool for automated text analysis, relies on an expert-crafted internal dictionary of psychologically relevant words and their corresponding categories. While LIWC's dictionary covers a significant portion of commonly used words, the continuous evolution of language and the use of slang in settings such as social media mean that fixed resources must be updated frequently to stay relevant. In this work we present LIWC-UD, an automatically generated extension to LIWC's dictionary that includes terms defined in Urban Dictionary. While the original LIWC dictionary contains 6,547 unique entries, LIWC-UD consists of 141K unique terms automatically categorized into LIWC categories with high confidence using a BERT classifier. LIWC-UD covers many additional terms that are commonly used on social media platforms such as Twitter. We release LIWC-UD publicly to the community as a supplement to the original LIWC lexicon.
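
The general idea of extending a fixed lexicon with confidently classified new terms can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the threshold value, the function names, the category labels and the toy scores are all assumptions made for illustration, and the per-category probabilities merely stand in for the output of a classifier such as BERT. Tokens are matched exactly, which is a simplification.

```python
from collections import Counter

# Hypothetical threshold; the abstract only says "high confidence".
CONFIDENCE_THRESHOLD = 0.9


def extend_lexicon(base, predictions, threshold=CONFIDENCE_THRESHOLD):
    """Return a copy of `base` extended with confidently classified new terms.

    base:        dict mapping term -> set of category names
    predictions: dict mapping term -> dict of category -> probability
                 (a stand-in for per-category classifier scores)
    """
    extended = {term: set(cats) for term, cats in base.items()}
    for term, scores in predictions.items():
        if term in extended:
            continue  # keep expert-crafted entries unchanged
        confident = {cat for cat, p in scores.items() if p >= threshold}
        if confident:
            extended[term] = confident
    return extended


def category_counts(text, lexicon):
    """Count how many whitespace-separated tokens fall into each category."""
    counts = Counter()
    for token in text.lower().split():
        for cat in lexicon.get(token, ()):
            counts[cat] += 1
    return counts


# Toy data, invented for illustration only.
base_lexicon = {"happy": {"posemo"}, "sad": {"negemo"}}
classifier_scores = {
    "lit": {"posemo": 0.97, "negemo": 0.01},
    "meh": {"posemo": 0.40, "negemo": 0.55},  # below threshold: dropped
    "happy": {"negemo": 0.99},                # already in base: ignored
}

extended_lexicon = extend_lexicon(base_lexicon, classifier_scores)
counts = category_counts("That party was lit and I am happy", extended_lexicon)
```

In this sketch the slang term "lit" is added to the positive-emotion category and is then counted alongside "happy", whereas the base lexicon alone would count only "happy". Raising the threshold trades coverage for precision.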

Supplemental Material

WS22_S7_100.mp4 (mp4, 1,019.8 MB)

Published in
WebSci '22: Proceedings of the 14th ACM Web Science Conference 2022, June 2022, 479 pages
ISBN: 9781450391917
DOI: 10.1145/3501247
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Expansion
LIWC
Lexicons
Urban Dictionary
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 218 of 875 submissions, 25%