Abstract
The connected society we live in today has allowed online users to willingly share opinions on an unprecedented scale. Given this advent of mass opinion sharing, it is crucial to devise algorithms that efficiently identify the emotions expressed within opinionated content. Traditional opinion-based classifiers require extracting high-dimensional feature representations, which are computationally expensive to process and can degrade the accuracy of a classifier. In this paper, we propose an unsupervised graph-based approach for extracting Twitter-specific emotion-bearing patterns to be used as features. By using a more representative list of patterns as features, we improve the precision and recall of a given emotion classification task. Due to its novel bootstrapping process, the full system is also adaptable to different domains and languages. The experimental results demonstrate that the extracted patterns are effective in identifying emotions for English, Spanish, and French Twitter streams. We also provide detailed experiments and offer an extended version of our algorithm to support the classification of Indonesian microblog posts. Overall, our empirical results demonstrate that the proposed approach bears desirable characteristics such as accuracy, generality, adaptability, minimal supervision, and coverage.
Notes
Predefined dictionaries require time and human effort to construct, and they may not be suitable for certain domains and languages. Moreover, directly translating them to other languages may not be effective or guarantee high accuracy.
By emotion-related terms, we are referring to connector words and subject words that are relevant to emotion.
Considering that the languages we are studying are syntactic, where word order carries most or all of the meaning, graph analysis plays an important role in preserving the structure and meaning of text.
Other external sources or dictionaries quickly become outdated, since they might not cover such recent and commonly used informal words.
Unigrams require extra computational effort and may at times misrepresent a text.
We used around 5 query hashtags, as noisy labels, to collect the social data related to each emotion category. For example, for the emotion “sadness” we queried for tweets that contain the hashtag “#sadness” and other emotion-related hashtags.
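The hashtag-based noisy labeling described in the last note can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the seed hashtags in `EMOTION_HASHTAGS`, the function names, and the rule of discarding tweets that match more than one emotion are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical seed hashtags per emotion (assumption: the actual query
# hashtags used to collect the data are not listed in this text).
EMOTION_HASHTAGS = {
    "sadness": {"#sadness", "#sad", "#depressed"},
    "joy": {"#joy", "#happy", "#excited"},
}

HASHTAG_RE = re.compile(r"#\w+")


def noisy_label(tweet: str) -> Optional[str]:
    """Return the single emotion whose seed hashtags occur in the tweet.

    Returns None when no emotion matches or when several match
    (an assumed filtering rule to keep the noisy labels unambiguous).
    """
    tags = {tag.lower() for tag in HASHTAG_RE.findall(tweet)}
    matches = [emo for emo, seeds in EMOTION_HASHTAGS.items() if tags & seeds]
    return matches[0] if len(matches) == 1 else None


def strip_label_hashtags(tweet: str) -> str:
    """Drop seed hashtags so the noisy label does not leak into the features."""
    all_seeds = set().union(*EMOTION_HASHTAGS.values())
    kept = [tok for tok in tweet.split() if tok.lower() not in all_seeds]
    return " ".join(kept)
```

Removing the seed hashtags before feature extraction is a common precaution in distant supervision, since otherwise a classifier could simply learn the label hashtag itself.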
Acknowledgments
This research was partially supported by the Ministry of Science and Technology (#103-2221-E-007-092).
About this article
Cite this article
Saravia, E., Argueta, C. & Chen, YS. Unsupervised graph-based pattern extraction for multilingual emotion classification. Soc. Netw. Anal. Min. 6, 92 (2016). https://doi.org/10.1007/s13278-016-0403-4
DOI: https://doi.org/10.1007/s13278-016-0403-4