Detecting Sentiment toward Emerging Infectious Diseases on Social Media: A Validity Evaluation of Dictionary-Based Sentiment Analysis
Abstract
1. Introduction
2. Methods
2.1. Data
2.2. Target DSA
2.3. Sentiment Classification of DSA
2.4. Coding Procedure
2.5. Analytical Plan
3. Results
3.1. DSA Validity Evaluation
3.2. Textual Features Associated with Invalidity of DSA
4. Discussion
Suggestions for Scholars on the Use of DSA
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Lerner, J.S.; Li, Y.; Valdesolo, P.; Kassam, K.S. Emotion and Decision Making. Annu. Rev. Psychol. 2015, 66, 799–823.
- Golder, S.A.; Macy, M.W. Diurnal and Seasonal Mood Vary with Work, Sleep, and Daylength Across Diverse Cultures. Science 2011, 333, 1878–1881.
- Chan, C.-H.; Bajjalieh, J.; Auvil, L.; Wessler, H.; Althaus, S.; Welbers, K.; van Atteveldt, W.; Jungblut, M. Four best practices for measuring news sentiment using ‘off-the-shelf’ dictionaries: A large-scale p-hacking experiment. Comput. Commun. Res. 2021, 3, 1–27.
- Kim, E.H.-J.; Jeong, Y.K.; Kim, Y.; Kang, K.Y.; Song, M. Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news. J. Inf. Sci. 2016, 42, 763–781.
- Lim, S.; Tucker, C.S.; Kumara, S. An unsupervised machine learning model for discovering latent infectious diseases using social media data. J. Biomed. Inform. 2017, 66, 82–94.
- Clark, E.M.; Jones, C.A.; Williams, J.R.; Kurti, A.N.; Norotsky, M.C.; Danforth, C.M.; Dodds, P.S. Vaporous Marketing: Uncovering Pervasive Electronic Cigarette Advertisements on Twitter. PLoS ONE 2016, 11, e0157304.
- Lu, Y.; Wu, Y.; Liu, J.; Li, J.; Zhang, P.; Vaughan, T.; Bie, B. Understanding Health Care Social Media Use from Different Stakeholder Perspectives: A Content Analysis of an Online Health Community. J. Med. Internet Res. 2017, 19, e109.
- Kumar, C.S.P.; Babu, L.D.D. Evolving dictionary based sentiment scoring framework for patient authored text. Evol. Intell. 2021, 14, 657–667.
- Vishnu, K.; Apoorva, T.; Gupta, D. Learning domain-specific and domain-independent opinion oriented lexicons using multiple domain knowledge. In Proceedings of the 2014 Seventh International Conference on Contemporary Computing (IC3), Noida, India, 7–9 August 2014; pp. 318–323.
- van Atteveldt, W.; van der Velden, M.A.C.G.; Boukes, M. The Validity of Sentiment Analysis: Comparing Manual Annotation, Crowd-Coding, Dictionary Approaches, and Machine Learning Algorithms. Commun. Methods Meas. 2021, 15, 121–140.
- Taboada, M.; Brooke, J.; Tofiloski, M.; Voll, K.; Stede, M. Lexicon-Based Methods for Sentiment Analysis. Comput. Linguist. 2011, 37, 267–307.
- Maynard, D.G.; Greenwood, M.A. Who Cares about Sarcastic Tweets? Investigating the Impact of Sarcasm on Sentiment Analysis. In Proceedings of the LREC 2014 Proceedings, Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland, 26–31 May 2014.
- Farooq, U. Negation Handling in Sentiment Analysis at Sentence Level. JCP 2017, 12, 470–478.
- Kennedy, A.; Inkpen, D. Sentiment Classification of Movie Reviews Using Contextual Valence Shifters. Comput. Intell. 2006, 22, 110–125.
- Polanyi, L.; Zaenen, A. Contextual Valence Shifters. In Computing Attitude and Affect in Text: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–10.
- Zhou, Z.; Zhang, X.; Sanderson, M. Sentiment Analysis on Twitter through Topic-Based Lexicon Expansion; Springer: Berlin/Heidelberg, Germany, 2014; pp. 98–109.
- Brody, S.; Diakopoulos, N. Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, 27–31 July 2011; pp. 562–570.
- Kumar, A.; Sebastian, T.M. Sentiment Analysis on Twitter. Int. J. Comput. Sci. Issues 2012, 9, 372.
- Yadav, P.; Pandya, D. SentiReview: Sentiment Analysis Based on Text and Emoticons. In Proceedings of the 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bengaluru, India, 21–23 February 2017; pp. 467–472.
- Sun, G.; Tang, T.; Peng, T.-Q.; Liang, R.; Wu, Y. SocialWave: Visual Analysis of Spatio-Temporal Diffusion of Information on Social Media. ACM Trans. Intell. Syst. Technol. 2018, 9, 1–23.
- Krippendorff, K. Reliability in Content Analysis. Hum. Commun. Res. 2004, 30, 411–433.
- Lazarus, R.S. From Psychological Stress to the Emotions: A History of Changing Outlooks. Annu. Rev. Psychol. 1993, 44, 1–22.
- Hung, C.; Chen, S.-J. Word sense disambiguation based sentiment lexicons for sentiment classification. Knowl.-Based Syst. 2016, 110, 224–232.
- Lesk, M. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual International Conference on Systems Documentation, Toronto, ON, Canada, 1986; pp. 24–26.
- Banerjee, S.; Pedersen, T. An adapted Lesk algorithm for word sense disambiguation using WordNet. In Proceedings of the Computational Linguistics and Intelligent Text Processing, Mexico City, Mexico, 17–23 February 2002; pp. 136–145, PASCAL Archive.
- Banerjee, S.; Pedersen, T. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, 9–15 August 2003; pp. 805–810.
- Rahman, H.A.A.; Wah, Y.B.; Huat, O.S. Predictive Performance of Logistic Regression for Imbalanced Data with Categorical Covariate. Pertanika J. Sci. Technol. 2021, 29, 181–197.
- Teh, P.L.; Rayson, P.; Pak, I.; Piao, S. Sentiment analysis tools should take account of the number of exclamation marks. In Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, Brussels, Belgium, 11–13 December 2015; pp. 1–6.
- Ayvaz, S.; Shiha, M.O. The Effects of Emoji in Sentiment Analysis. Int. J. Comput. Electr. Eng. 2017, 9, 360–369.
- Shadish, W.R.; Cook, T.D.; Campbell, D.T. Statistical conclusion validity and internal validity. In Experimental and Quasi-Experimental Designs for Generalized Causal Inference; Houghton Mifflin: Boston, MA, USA, 2002; pp. 45–48.
- Baden, C.; Pipal, C.; Schoonvelde, M.; van der Velden, M.A.C.G. Three Gaps in Computational Text Analysis Methods for Social Sciences: A Research Agenda. Commun. Methods Meas. 2021, 16, 1–18.
- Hutto, C.; Gilbert, E. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. In Proceedings of the International AAAI Conference on Web and Social Media, Seattle, WA, USA, 30 March–2 April 2008; Volume 8. Available online: https://ojs.aaai.org/index.php/ICWSM/article/view/14550 (accessed on 13 February 2021).
- Kolchyna, O.; Souza, T.T.P.; Treleaven, P.; Aste, T. Twitter Sentiment Analysis: Lexicon Method, Machine Learning Method and Their Combination. arXiv 2015, arXiv:1507.00955.
Textual Features | Definitions | Reasoning and Examples | References |
---|---|---|---|
Semantic Level | | | |
Embedded Hashtag | A hashtag that forms part of the grammatical structure of a sentence. | An embedded hashtag can threaten the validity of DSA because a hashtag structurally embedded in a tweet can be meaningful and widely used, yet generally cannot be captured by DSA. (e.g., #Stopspreadingebola by donating $5 to our NGO.) | N.A. |
Irrealis | A function indicating that a certain situation or action is not known to have happened. | It is challenging to estimate the sentiment of a text containing irrealis accurately because irrealis can change the meaning of sentiment-bearing words in a subtle manner. Irrealis markers include modal verbs (e.g., would, could, would have), conditional markers (e.g., if), negative polarity items (e.g., any, anything), certain verbs (e.g., expect, doubt, assume), and questions. (e.g., if it spreads, it will destroy everything it touches.) | [11] |
Sarcasm | A sarcastic statement is defined as one where the opposite meaning is intended. | Sarcasm completely shifts the orientation of sentiment by using the opposite meaning of words given a context. (e.g., What did I tell you? This may be the “great plague.”) | [12] |
Negation | Negations are terms that reverse the sentiment of a certain word. | Negations change the orientation of a sentence from positive to negative or negative to positive (e.g., no, not, rather, never, none, nobody, no one, nothing, neither, nor, nowhere, without). (e.g., Ebola ain’t fun.) | [11,13,14,15] |
Intensifier | Intensifiers are terms that intensify the degree of the expressed sentiment. | Intensifiers change the sentiment of a sentence by increasing the strength of sentiment (e.g., very, really, extraordinarily, huge, total). (e.g., the risk of Ebola infection for travelers is very low.) | [11,14,15] |
Diminisher | Diminishers are terms that decrease the degree of the expressed sentiment. | Diminishers change the sentiment of a sentence by decreasing the strength of sentiment (e.g., slightly, somewhat, minor). (e.g., I’m a little worried about Ebola.) | [11,14] |
Word-Level | | | |
Unconfirmed typo | A misspelled word. | A misspelled word may hold sentiment but is not generally capturable by DSA. (e.g., I feel bad about Ebola.) | [16] |
Lengthened word | A word lengthened by repeating characters. | A lengthened word is difficult to capture through DSA because of its unstructured format, although it may carry stronger sentiment than the same word in its ordinary form. (e.g., who’s got the biggest smile to save N lives against Ebola? Nooooobody.) | [16,17] |
Irregularly capitalized word | A word that is capitalized in an uncommon way. | An irregularly capitalized word may carry stronger sentiment than a word in its ordinary format but is not generally capturable by DSA. (e.g., I think Bill Gates is a GREAT man!) | [18] |
Abbreviation | A shortened form of a word. | An abbreviation may carry sentiment but is generally ignored by DSA. (e.g., there is no cure or something is really bs.) | [16] |
Acronym | A shortened form of a phrase that consists of the initials of each word. | An acronym may carry sentiment but is generally ignored by DSA. (e.g., who TF eats bats?) | [19] |
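
Some of the word-level and semantic-level features defined above can be flagged with simple pattern matching. The following is a minimal sketch only, not the coding procedure used in this study: the function name `flag_features`, the three-character repetition threshold, the four-letter minimum for capitalized words, and the restriction to the negation terms listed in the table are all illustrative assumptions.

```python
import re

# Negation terms taken from the table above; any real word list would be longer.
NEGATIONS = {"no", "not", "never", "none", "nobody", "nothing",
             "neither", "nor", "nowhere", "without"}

# Assumption: a letter repeated three or more times in a row marks lengthening.
LENGTHENED = re.compile(r"([A-Za-z])\1{2,}")
TOKEN = re.compile(r"[A-Za-z']+")


def flag_features(text):
    """Return heuristic flags for three textual features in one tweet."""
    tokens = TOKEN.findall(text)
    return {
        "negation": any(t.lower() in NEGATIONS for t in tokens),
        "lengthened_word": any(LENGTHENED.search(t) for t in tokens),
        # Assumption: all-caps tokens of four or more letters count as
        # irregular capitalization; shorter ones are more likely acronyms.
        "all_caps_word": any(t.isupper() and len(t) >= 4 for t in tokens),
    }


if __name__ == "__main__":
    print(flag_features("I think Bill Gates is a GREAT man!"))
    print(flag_features("who's got the biggest smile? Nooooobody."))
```

Heuristics of this kind miss many cases. For instance, "Nooooobody" is flagged as lengthened but not as a negation, and "ain't" is not in the word list, which is one reason manual coding remains the reference point.
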
Metric | LIWC | ANEW | SWN | orgSWN | adSWN | Mean |
---|---|---|---|---|---|---|
F1, Neg | 0.34 | 0.20 | 0.35 | 0.24 | 0.31 | 0.29 |
F1, Neu | 0.70 | 0.47 | 0.01 | 0.51 | 0.17 | 0.37 |
F1, Pos | 0.30 | 0.13 | 0.19 | 0.15 | 0.13 | 0.18 |
F1, Macro Average | 0.45 | 0.27 | 0.18 | 0.30 | 0.21 | 0.28 |
Accuracy (%) | 56.84 | 32.87 | 19.22 | 37.46 | 21.25 | 33.53 |
Tweets (n) | 7421 | 7797 | 7790 | 7319 | 7175 | 7500 |
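
The macro-averaged F1 in the table above is the unweighted mean of the per-class F1 scores. As a small illustration, the sketch below recomputes it from the rounded per-class values reported in the table; the helper name `macro_f1` is hypothetical, and because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit (adSWN yields about 0.20 here against a reported 0.21).

```python
# Per-class F1 scores copied from the table above (rounded to two decimals).
f1_by_dictionary = {
    "LIWC": {"Neg": 0.34, "Neu": 0.70, "Pos": 0.30},
    "ANEW": {"Neg": 0.20, "Neu": 0.47, "Pos": 0.13},
    "SWN": {"Neg": 0.35, "Neu": 0.01, "Pos": 0.19},
    "orgSWN": {"Neg": 0.24, "Neu": 0.51, "Pos": 0.15},
    "adSWN": {"Neg": 0.31, "Neu": 0.17, "Pos": 0.13},
}


def macro_f1(per_class):
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class.values()) / len(per_class)


if __name__ == "__main__":
    for name, scores in f1_by_dictionary.items():
        print(f"{name}: macro F1 = {macro_f1(scores):.2f}")
```

Because every class counts equally, a dictionary that performs well on the frequent neutral class but poorly on negative and positive tweets still receives a low macro average.
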
Dictionary | Class | LIWC Neg | LIWC Neu | LIWC Pos | LIWC Mix | ANEW Neg | ANEW Neu | ANEW Pos | ANEW Mix | SWN Neg | SWN Neu | SWN Pos | SWN Mix | orgSWN Neg | orgSWN Neu | orgSWN Pos | orgSWN Mix | Averaged Matched Cases |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LIWC | Neg | | | | | | | | | | | | | | | | | 41.72 |
LIWC | Neu | | | | | | | | | | | | | | | | | |
LIWC | Pos | | | | | | | | | | | | | | | | | |
LIWC | Mix | | | | | | | | | | | | | | | | | |
ANEW | Neg | 11.22 | 8.69 | 1.54 | 1.38 | | | | | | | | | | | | | 34.40 |
ANEW | Neu | 6.80 | 21.67 | 6.10 | 1.24 | | | | | | | | | | | | | |
ANEW | Pos | 8.62 | 20.66 | 9.85 | 2.21 | | | | | | | | | | | | | |
ANEW | Mix | 0.00 | 0.01 | 0.00 | 0.01 | | | | | | | | | | | | | |
SWN | Neg | 18.58 | 36.02 | 11.45 | 2.99 | 16.16 | 24.89 | 27.97 | 0.03 | | | | | | | | | 33.72 |
SWN | Neu | 0.08 | 0.22 | 0.08 | 0.01 | 0.08 | 0.17 | 0.14 | 0.00 | | | | | | | | | |
SWN | Pos | 7.98 | 14.71 | 5.95 | 1.83 | 6.59 | 10.71 | 13.17 | 0.00 | | | | | | | | | |
SWN | Mix | 0.00 | 0.09 | 0.01 | 0.01 | 0.01 | 0.05 | 0.05 | 0.00 | | | | | | | | | |
orgSWN | Neg | 10.68 | 15.28 | 3.05 | 1.40 | 10.62 | 8.48 | 11.31 | 0.01 | 21.37 | 0.06 | 8.96 | 0.01 | | | | | 31.36 |
orgSWN | Neu | 9.08 | 20.82 | 4.62 | 1.26 | 6.04 | 16.25 | 13.48 | 0.01 | 24.73 | 0.19 | 10.76 | 0.09 | | | | | |
orgSWN | Pos | 5.28 | 11.92 | 8.51 | 1.94 | 4.80 | 8.83 | 14.03 | 0.00 | 18.78 | 0.12 | 8.74 | 0.01 | | | | | |
orgSWN | Mix | 1.59 | 3.00 | 1.31 | 0.26 | 1.38 | 2.26 | 2.51 | 0.00 | 4.14 | 0.01 | 2.00 | 0.00 | | | | | |
adSWN | Neg | 13.51 | 22.49 | 6.82 | 2.33 | 11.19 | 16.98 | 16.99 | 0.00 | 31.18 | 0.18 | 13.73 | 0.06 | 21.37 | 0.06 | 8.96 | 0.01 | 30.99 |
adSWN | Neu | 1.56 | 6.14 | 0.91 | 0.21 | 1.77 | 4.50 | 2.55 | 0.00 | 6.23 | 0.08 | 2.50 | 0.01 | 24.73 | 0.19 | 10.76 | 0.09 | |
adSWN | Pos | 9.51 | 18.02 | 8.57 | 1.92 | 8.00 | 11.67 | 18.32 | 0.03 | 26.18 | 0.13 | 11.68 | 0.03 | 18.78 | 0.12 | 8.74 | 0.01 | |
adSWN | Mix | 2.04 | 4.39 | 1.19 | 0.38 | 1.87 | 2.67 | 3.46 | 0.00 | 5.44 | 0.00 | 2.55 | 0.01 | 4.14 | 0.01 | 2.00 | 0.00 | |
Average of Total Matched Cases | | | | | | | | | | | | | | | | | | 34.44 |
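
Each four-by-four block in the table above cross-tabulates the labels that two dictionaries assign to the same tweets, and the diagonal cells are the cases in which both dictionaries agree. The sketch below reads the agreement between ANEW and LIWC directly from that block. It is an illustration under stated assumptions: the helper name `matched_share` is hypothetical, and the way the "Averaged Matched Cases" column aggregates such figures is not stated in the table, so the sketch does not attempt to reproduce that column.

```python
# ANEW (rows) by LIWC (columns) block copied from the table above, in percent.
# Row and column order: Neg, Neu, Pos, Mix.
anew_vs_liwc = [
    [11.22, 8.69, 1.54, 1.38],
    [6.80, 21.67, 6.10, 1.24],
    [8.62, 20.66, 9.85, 2.21],
    [0.00, 0.01, 0.00, 0.01],
]


def matched_share(block):
    """Sum of the diagonal: share of tweets given the same label by both dictionaries."""
    return sum(block[i][i] for i in range(len(block)))


if __name__ == "__main__":
    total = sum(sum(row) for row in anew_vs_liwc)
    print(f"cells sum to {total:.2f}%")
    print(f"matched cases: {matched_share(anew_vs_liwc):.2f}%")
```

For this block the cells sum to 100%, and the diagonal gives 42.75% of tweets on which ANEW and LIWC assign the same class.
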
Textual Features (IVs) | Inconsistency (DV) | | | | |
---|---|---|---|---|---|
Dictionary | LIWC | ANEW | SWN | orgSWN | adSWN |
Intercept | −0.42 *** | 0.73 *** | 1.83 *** | 0.49 *** | 1.66 *** |
Semantic Level | | | | | |
Embedded hashtags | −0.08 | −0.16 | −0.08 | 0.13 | −0.29 * |
Irrealis | 0.47 * | 0.21 | −1.06 *** | −0.11 | −0.37 |
Sarcasm | 1.09 | 0.94 | −0.32 | −0.34 | −1.69 * |
Negations | 0.77 *** | 0.09 | −1.17 *** | 0.42 ** | −0.60 *** |
Intensifiers | 0.21 | −0.08 | −0.30 | 0.35 * | −0.28 |
Diminishers | 0.33 | −0.25 | −0.23 | 0.85 | −0.33 |
Word-Level | | | | | |
Unconfirmed typos | 0.62 | 0.30 | −1.55 *** | 0.24 | −0.95 ** |
Lengthened words | 0.95 | 0.93 | −0.72 | 0.13 | 0.69 |
Irregularly capitalized words | 0.45 | 0.83 * | −1.08 *** | 0.29 | −0.60 * |
Abbreviations | 0.46 * | 0.54 * | −0.78 ** | 0.19 | −0.38 |
Acronyms | 1.00 * | 0.52 | −1.14 ** | 1.26 * | −0.78 |
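
One common way to read coefficients like those in the table above is to exponentiate them into odds ratios. The sketch below does this for the negation row, on the assumption that the coefficients are log-odds from a logistic regression with a binary inconsistency outcome; the table itself does not state the model, so this reading is an assumption, and the helper name `odds_ratio` is hypothetical.

```python
import math

# Negation coefficients copied from the table above, one per dictionary.
negation_coefs = {
    "LIWC": 0.77,
    "ANEW": 0.09,
    "SWN": -1.17,
    "orgSWN": 0.42,
    "adSWN": -0.60,
}


def odds_ratio(coef):
    """Convert a log-odds coefficient into an odds ratio (assumes a logit model)."""
    return math.exp(coef)


if __name__ == "__main__":
    for name, coef in negation_coefs.items():
        print(f"{name}: coefficient {coef:+.2f}, odds ratio {odds_ratio(coef):.2f}")
```

Under that assumption, the LIWC coefficient of 0.77 corresponds to an odds ratio of about 2.16, meaning the odds of an inconsistent classification roughly double when a tweet contains a negation, whereas the SWN coefficient of −1.17 corresponds to an odds ratio of about 0.31.
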
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, S.; Ma, S.; Meng, J.; Zhuang, J.; Peng, T.-Q. Detecting Sentiment toward Emerging Infectious Diseases on Social Media: A Validity Evaluation of Dictionary-Based Sentiment Analysis. Int. J. Environ. Res. Public Health 2022, 19, 6759. https://doi.org/10.3390/ijerph19116759