ABSTRACT
Linguistic Inquiry and Word Count (LIWC), a popular tool for automated text analysis, relies on an expert-crafted internal dictionary of psychologically relevant words and their corresponding categories. While LIWC’s dictionary covers a significant portion of commonly used words, the continuous evolution of language and the use of slang in settings such as social media mean that fixed resources must be updated frequently to stay relevant. In this work we present LIWC-UD, an automatically generated extension to LIWC’s dictionary that includes terms defined in Urban Dictionary. While the original LIWC dictionary contains 6,547 unique entries, LIWC-UD consists of 141K unique terms automatically categorized into LIWC categories with high confidence using a BERT classifier. LIWC-UD covers many additional terms that are commonly used on social media platforms such as Twitter. We release LIWC-UD publicly to the community as a supplement to the original LIWC lexicon.
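To make the idea of a confidence-filtered dictionary extension concrete, the following is a minimal sketch and not the authors' implementation. It assumes a classifier has already produced per-category probabilities for candidate slang terms. The probabilities, the 0.9 threshold, the two category labels, the tiny base lexicon, and the helper names `extend_lexicon` and `count_categories` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical stand-in for a few base-lexicon entries (not real LIWC data).
BASE_LEXICON = {
    "happy": {"posemo"},
    "sad": {"negemo"},
}

# Hypothetical classifier outputs: term -> {category: probability}.
# In practice these would come from a trained classifier; the numbers are made up.
PREDICTED = {
    "lit": {"posemo": 0.97, "negemo": 0.01},
    "salty": {"posemo": 0.05, "negemo": 0.93},
    "meh": {"posemo": 0.40, "negemo": 0.55},
    "happy": {"posemo": 0.10, "negemo": 0.95},
}


def extend_lexicon(base, predicted, threshold=0.9):
    """Return a copy of `base` plus new terms whose categories pass `threshold`."""
    extended = {term: set(cats) for term, cats in base.items()}
    for term, scores in predicted.items():
        if term in extended:
            continue  # assumption: existing expert-crafted entries are kept as-is
        confident = {cat for cat, p in scores.items() if p >= threshold}
        if confident:  # skip terms with no confident category
            extended[term] = confident
    return extended


def count_categories(text, lexicon):
    """Count category hits for whitespace-separated, lowercased tokens."""
    counts = Counter()
    for token in text.lower().split():
        for cat in lexicon.get(token, ()):
            counts[cat] += 1
    return counts


extended = extend_lexicon(BASE_LEXICON, PREDICTED)
counts = count_categories("that party was lit but he got salty", extended)
print(sorted(extended))
print(dict(counts))
```

In this sketch, a term with no category above the threshold ("meh") is left out entirely, and a term already in the base lexicon ("happy") keeps its original categories. Both are design assumptions that trade coverage for precision.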
Index Terms
- LIWC-UD: Classifying Online Slang Terms into LIWC Categories