
Unsupervised graph-based pattern extraction for multilingual emotion classification

  • Original Article
Social Network Analysis and Mining

Abstract

The connected society we live in today has allowed online users to willingly share opinions on an unprecedented scale. Given this mass opinion sharing, it is crucial to devise algorithms that efficiently identify the emotions expressed in opinionated content. Traditional opinion-based classifiers require high-dimensional feature representations, which are computationally expensive to process and can misrepresent the text or degrade the accuracy of a classifier. In this paper, we propose an unsupervised graph-based approach for extracting Twitter-specific emotion-bearing patterns to be used as features. By using a more representative set of patterns as features, we improve the precision and recall of a given emotion classification task. Owing to its novel bootstrapping process, the full system is also adaptable to different domains and languages. The experimental results demonstrate that the extracted patterns are effective in identifying emotions in English, Spanish, and French Twitter streams. We also provide detailed experiments and offer an extended version of our algorithm that supports the classification of Indonesian microblog posts. Overall, our empirical results demonstrate that the proposed approach bears desirable characteristics such as accuracy, generality, adaptability, minimal supervision, and coverage.
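The abstract does not spell out the extraction algorithm. The following is therefore only a minimal sketch of the general idea, built on several assumptions:

- It builds a directed word-adjacency graph, so word order is preserved in the spirit of note 4.
- It uses weighted node degree as a hypothetical stand-in for whatever graph measure separates connector words from subject words (note 3).
- It turns short windows that mix both word types into generalized patterns, with subject words replaced by `*`, and uses those patterns as binary features.

All function names, the degree-based split, the window length and the `*` placeholder are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def build_word_graph(tweets):
    """Directed, weighted word-adjacency graph stored as an edge Counter.

    Edge (left, right) is counted each time `right` immediately follows
    `left`, so word order is preserved.
    """
    edges = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        for left, right in zip(tokens, tokens[1:]):
            edges[(left, right)] += 1
    return edges


def split_connectors_and_subjects(edges, n_connectors):
    """Hypothetical split: the n_connectors nodes with the highest weighted
    degree become connector words; all remaining nodes become subject words."""
    degree = Counter()
    for (left, right), weight in edges.items():
        degree[left] += weight
        degree[right] += weight
    ranked = [word for word, _ in degree.most_common()]
    return set(ranked[:n_connectors]), set(ranked[n_connectors:])


def extract_patterns(tweet, connectors, subjects, max_len=3):
    """Collect windows of 2..max_len known tokens that mix at least one
    connector and one subject word; subject words are generalized to '*'."""
    tokens = tweet.lower().split()
    patterns = set()
    for size in range(2, max_len + 1):
        for start in range(len(tokens) - size + 1):
            window = tokens[start:start + size]
            if not all(t in connectors or t in subjects for t in window):
                continue  # skip windows containing words unseen in the graph
            if any(t in connectors for t in window) and any(t in subjects for t in window):
                patterns.add(" ".join("*" if t in subjects else t for t in window))
    return patterns


def pattern_features(tweet, vocabulary, connectors, subjects):
    """Binary feature vector: 1 if the tweet yields the pattern, else 0."""
    found = extract_patterns(tweet, connectors, subjects)
    return [1 if pattern in found else 0 for pattern in vocabulary]


if __name__ == "__main__":
    corpus = ["i feel so happy today", "i feel so sad today", "i am so tired"]
    graph = build_word_graph(corpus)
    connectors, subjects = split_connectors_and_subjects(graph, n_connectors=3)
    vocabulary = sorted(set().union(*(extract_patterns(t, connectors, subjects) for t in corpus)))
    print(sorted(connectors))  # ['feel', 'i', 'so']
    print(pattern_features("i feel so happy today", vocabulary, connectors, subjects))  # [0, 0, 1, 0, 0, 1, 1]
```

Degree is only a placeholder for the connector/subject split. A fuller treatment would likely use a different graph measure and a tweet-aware tokenizer that handles punctuation, mentions and emoji. The point of the sketch is that the feature space is a small set of generalized patterns rather than every unigram.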


Figs. 1–13


Notes

  1. Predefined dictionaries require time and human effort to construct, and they may not be suitable for certain domains and languages. Moreover, directly translating them into other languages may not be effective or guarantee high accuracy.

  2. https://www.mturk.com.

  3. By emotion-related terms, we are referring to connector words and subject words that are relevant to emotion.

  4. Because the languages we study are syntactic (word order carries most or all of the meaning), graph analysis plays an important role in preserving the structure and semantics of text.

  5. Other external sources or dictionaries quickly become outdated because they may not cover recent, commonly used informal words.

  6. Unigrams require extra computational effort and may at times misrepresent a text.

  7. We used around five query hashtags as noisy labels to collect the social data related to each emotion category. For example, for the emotion “sadness” we queried for tweets containing the hashtag “#sadness” and other emotion-related hashtags.

  8. https://github.com/stanfordnlp.

  9. https://en.wikipedia.org/wiki/Genetic_algorithm.
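Note 7 describes collecting data with emotion hashtags that act as noisy labels. The following is a minimal sketch of that labeling step, built on several assumptions:

- Only “#sadness” is named in the note. The other hashtags in `EMOTION_HASHTAGS` are hypothetical placeholders.
- A tweet is kept only when it matches exactly one emotion category.
- The matched query hashtags are removed from the text so a classifier cannot simply learn the label source.

None of these rules is confirmed by the note itself.

```python
import re

# Hypothetical query hashtags; note 7 mentions about five per emotion,
# but only "#sadness" is named, so the others are placeholders.
EMOTION_HASHTAGS = {
    "sadness": {"#sadness", "#sad"},
    "joy": {"#joy", "#happy"},
}

HASHTAG = re.compile(r"#\w+")


def noisy_label(tweet, emotion_hashtags=EMOTION_HASHTAGS):
    """Return (emotion, cleaned_text), or None when the tweet matches
    zero or several emotion categories (assumed filtering rule)."""
    tags = {tag.lower() for tag in HASHTAG.findall(tweet)}
    matched = [emotion for emotion, query in emotion_hashtags.items() if tags & query]
    if len(matched) != 1:
        return None
    query = emotion_hashtags[matched[0]]
    # Drop only the query hashtags of the matched emotion; keep other hashtags.
    cleaned = HASHTAG.sub(lambda m: "" if m.group(0).lower() in query else m.group(0), tweet)
    return matched[0], " ".join(cleaned.split())


if __name__ == "__main__":
    print(noisy_label("So #Sad about #monday"))  # ('sadness', 'So about #monday')
    print(noisy_label("Great day #happy #sad"))  # None
```

Discarding tweets that carry hashtags from several categories trades coverage for cleaner labels. Whether the original data collection did this is not stated.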


Acknowledgments

This research was partially supported by the Ministry of Science and Technology (#103-2221-E-007-092).

Author information

Correspondence to Yi-Shin Chen.


About this article


Cite this article

Saravia, E., Argueta, C. & Chen, YS. Unsupervised graph-based pattern extraction for multilingual emotion classification. Soc. Netw. Anal. Min. 6, 92 (2016). https://doi.org/10.1007/s13278-016-0403-4


  • DOI: https://doi.org/10.1007/s13278-016-0403-4
