Event extraction using behaviors of sentiment signals and burst structure in social media

  • Regular paper
Knowledge and Information Systems

Abstract

Significant world events often cause a behavioral convergence in the expression of shared sentiment. This paper examines the blogosphere as a framework for studying users' psychological behavior, using their sentiment responses as a form of ‘sensor’ to infer important real-world events automatically. We formulate a novel temporal sentiment index function based on a quantitative measure of the valence of affect-bearing words in blog posts, where the set of affect-bearing words is drawn from psychological research on the structure of emotion. The annual local minima and maxima of the proposed sentiment signal are used to extract the significant events of each year, and the corresponding blog posts are then analyzed with topic modeling tools to understand their content. The paper next examines how the discovered topics correlate with world news events reported by a mainstream news provider, Cable News Network (CNN), and with results found using the Google search engine. To understand sentiment at a finer granularity over time, we then propose a stochastic burst detection model, extended from the work of Kleinberg, that works incrementally on streaming data. The model is used to extract bursts of sentiment occurring within a specific mood label (for example, a burst in observations of the mood ‘shocked’). The blog posts at those time indices are analyzed to extract topics, which are compared with real-world news events. Our comprehensive experiments on a large-scale collection of 12 million posts from LiveJournal show that the proposed sentiment index function coincides well with significant world events, while bursts in sentiment allow us to locate finer-grained external world events.
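The idea of a temporal sentiment index can be illustrated with a minimal sketch. It is not the paper's actual index function: it assumes the index for a day is simply the mean valence of the affect-bearing words posted that day, and it reports the lowest and highest days in the supplied period. The lexicon `VALENCE`, its values, the helper names `sentiment_index` and `extreme_days`, and the toy posts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical valence lexicon on a 1-9 scale (ANEW-style layout).
# The numbers are illustrative placeholders, not actual ANEW ratings.
VALENCE = {"happy": 8.0, "love": 8.5, "sad": 2.0, "terror": 1.5, "shocked": 3.0}


def sentiment_index(posts):
    """Map each day to the mean valence of affect-bearing words posted that day.

    posts: iterable of (date, text) pairs. Days without any lexicon word
    are omitted from the result.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, text in posts:
        for word in text.lower().split():
            if word in VALENCE:
                totals[day] += VALENCE[word]
                counts[day] += 1
    return {day: totals[day] / counts[day] for day in sorted(counts)}


def extreme_days(index):
    """Return (lowest_day, highest_day) of the index over the supplied period."""
    low = min(index, key=index.get)
    high = max(index, key=index.get)
    return low, high


# Toy input, invented for this sketch.
posts = [
    (date(2004, 1, 1), "happy happy love"),
    (date(2004, 1, 2), "shocked sad terror"),
    (date(2004, 1, 3), "sad love"),
]
index = sentiment_index(posts)
low, high = extreme_days(index)
```

In this sketch the posts written on the extreme days would then be passed to a topic model to see what was being discussed; that step, and the incremental burst detection model, are not shown here.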

References

  1. Allan J, Papka R, Lavrenko V (1998) Online new event detection and tracking. In: Proceedings of the ACM international conference on research and development in information retrieval (SIGIR), pp 37–45

  2. Back M, Küfner A, Egloff B (2010) The emotional timeline of September 11, 2001. Psychol Sci 21(10): 1417

  3. Balog K, Mishne G, de Rijke M (2006) Why are they excited? Identifying and explaining spikes in blog mood levels. In: Proceedings of the conference of the European chapter of the association for computational linguistics (EACL), pp 207–210

  4. Baumeister R, DeWall C, Vohs K, Alquist J (2010) Does emotion cause behavior (apart from making people do stupid, destructive things)? Oxford University Press, New York

  5. Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. J Mach Learn Res 3: 993–1022

  6. Bracken P, Giller J, Summerfield D (1995) Psychological responses to war and atrocity: the limitations of current concepts. Soc Sci Med 40(8): 1073–1082

  7. Bradley M, Lang P (1999) Affective norms for English words (ANEW): instruction manual and affective ratings. University of Florida, Gainesville

  8. Cao L (2008) Behavior informatics and analytics: let behavior talk. In: Proceedings of the IEEE international conference on data mining workshops, pp 87–96

  9. Cao L (2010) In-depth behavior understanding and use: the behavior informatics approach. Inf Sci 180(17): 3067–3085

  10. Cao L, Ou Y, Yu P (2011) Coupled behavior analysis with applications. IEEE Trans Knowl Data Eng 24: 1378–1392

  11. Christakis N, Fowler J (2009) Connected: the surprising power of our social networks and how they shape our lives. Little, Brown and Company, New York

  12. Chua A, Razikin K, Goh D (2011) Social tags as news event detectors. J Inf Sci 37(1): 3

  13. Church T, Katigbak MS, Reyes J, Jensen S (1998) Language and organisation of Filipino emotion concepts: comparing emotion concepts and dimensions across cultures. Cognit Emot 12(1): 63–92

  14. Colombetti G (2005) Appraising valence. J Conscious Stud 12(10): 103–126

  15. Cont R, Bouchaud J (2000) Herd behavior and aggregate fluctuations in financial markets. Macroecon Dyn 4(2): 170–196

  16. Coontz R (2009) Blogs: happiness barometers? Science 325: 5941

  17. Das Sarma A, Jain A, Yu C (2011) Dynamic relationship and event discovery. In: Proceedings of the ACM international conference on web search and data mining (WSDM), pp 207–216

  18. Dodds P, Danforth C (2010) Measuring the happiness of large-scale written expression: songs, blogs, and presidents. J Happiness Stud 11(4): 441–456

  19. Duong T, Phung D, Bui H, Venkatesh S (2006) Human behavior recognition with generic exponential family duration modeling in the hidden semi-Markov model. In: Proceedings of the international conference on pattern recognition, pp 202–207

  20. Fan T, Chang C (2010) Sentiment-oriented contextual advertising. Knowl Inf Syst 23: 321–344

  21. Feng S, Wang D, Yu G, Gao W, Wong K (2011) Extracting common emotions from blogs based on fine-grained sentiment clustering. Knowl Inf Syst 27: 281–302

  22. Fontaine J, Scherer K, Roesch E, Ellsworth P (2007) The world of emotions is not two-dimensional. Psychol Sci 18(12): 1050

  23. Fujiki T, Nanno T, Suzuki Y, Okumura M (2004) Identification of bursts in a document stream. In: Proceedings of the first international workshop on knowledge discovery in data streams, pp 55–64

  24. Galati D, Sini B, Tinti C, Testa S (2008) The lexicon of emotion in the neo-Latin languages. Soc Sci Inf 47(2): 205

  25. Gehm T, Scherer KR (1988) Factors determining the dimensions of subjective emotional space. Lawrence Erlbaum Associates, Hillsdale

  26. Gilbert E, Karahalios K (2010) Widespread worry and the stock market. In: Proceedings of the international AAAI conference on weblogs and social media (ICWSM)

  27. Giles J (2010) Blogs and tweets could predict the future. New Sci 206(2765): 20–21

  28. Glance N, Hurst M, Tomokiyo T (2004) Blogpulse: automated trend discovery for weblogs. In: Proceedings of the WWW workshop on the weblogging ecosystem: aggregation, analysis and dynamics

  29. Griffiths T, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci 101(90001): 5228–5235

  30. Gruhl D, Guha R, Kumar R, Novak J, Tomkins A (2005) The predictive power of online chatter. In: Proceedings of the ACM international conference on knowledge discovery and data mining (SIGKDD), pp 78–87

  31. He Q, Chang K, Lim E (2007) Using burstiness to improve clustering of topics in news streams. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 493–498

  32. Kim E, Gilbert S, Edwards M, Graeff E (2009) Detecting sadness in 140 characters: sentiment analysis of mourning Michael Jackson on Twitter. Technical report, Web Ecology Project

  33. Kleinberg J (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7(4): 373–397

  34. Kramer A (2010) An unobtrusive behavioral model of gross national happiness. In: Proceedings of the ACM conference on human factors in computing systems (SIGCHI), pp 287–290

  35. Kumar R, Novak J, Raghavan P, Tomkins A (2003) On the bursty evolution of blogspace. In: Proceedings of the international conference on world wide web (WWW), pp 568–576

  36. Kumaran G, Allan J (2004) Text classification and named entities for new event detection. In: Proceedings of the ACM international conference on research and development in information retrieval (SIGIR), pp 297–304

  37. Leshed G, Kaye J (2006) Understanding how bloggers feel: recognizing affect in blog posts. In: Proceedings of the ACM conference on human factors in computing systems (SIGCHI), p 1024

  38. Luo D, Yang J, Krstajic M, Ribarsky W, Keim D (2011) Eventriver: visually exploring text collections with temporal references. IEEE Trans Vis Comput Graph PP(99): 1

  39. Makkonen J, Ahonen-Myka H, Salmenkivi M (2003) Topic detection and tracking with spatio-temporal evidence. Advances in information retrieval, pp 549–549

  40. Mauss I, Robinson M (2009) Measures of emotion: a review. Cognit Emot 23(2): 209–237

  41. Mishne G, De Rijke M (2006) Capturing global mood levels using blog posts. In: Proceedings of the AAAI spring symposium on computational approaches to analysing weblogs, pp 145–152

  42. Mishne G, Glance N (2006) Predicting movie sales from blogger sentiment. In: Proceedings of the AAAI spring symposium on computational approaches to analysing weblogs

  43. Néda Z, Ravasz E, Brechet Y, Vicsek T, Barabási A (2000) Self-organizing processes: the sound of many hands clapping. Nature 403: 849–850

  44. Nguyen T, Phung D, Adams B, Tran T, Venkatesh S (2010) Classification and pattern discovery of mood in weblogs. Adv Knowl Discov Data Mining 6119: 283–290

  45. Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2): 1–135

  46. Pennebaker J, Francis M, Booth R (2007) Linguistic inquiry and word count (LIWC) [computer software]. LIWC Inc, Austin, Texas

  47. Phung D, Duong T, Bui H, Venkatesh S (2005) Topic transition detection using hierarchical hidden Markov and semi-Markov models. In: Proceedings of the ACM international conference on multimedia

  48. Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2): 257–286

  49. Russell J (1980) A circumplex model of affect. J Pers Soc Psychol 39(6): 1161–1178

  50. Russell J (1983) Pancultural aspects of the human conceptual organization of emotions. J Pers Soc Psychol 45(6): 1281

  51. Russell J (2009) Emotion, core affect, and psychological construction. Cognit Emot 23(7): 1259–1283

  52. Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proceedings of the international conference on world wide web (WWW), pp 851–860

  53. Saleh B, Masseglia F (2010) Discovering frequent behaviors: time is an essential element of the context. Knowl Inf Syst 28: 311–331

  54. Shaver P, Murdaya U, Fraley R (2001) Structure of the Indonesian emotion lexicon. Asian J Soc Psychol 4(3): 201–224

  55. Silver R, Holman E, McIntosh D, Poulin M, Gil-Rivas V (2002) Nationwide longitudinal study of psychological responses to September 11. J Am Med Assoc 288(10): 1235

  56. Smith C, Ellsworth P (1985) Patterns of cognitive appraisal in emotion. J Pers Soc Psychol 48(4): 813

  57. Solomon RC (2003) Against valence (‘positive and negative emotions’). Not Passion’s Slave 1(9): 162–178

  58. Subasic I, Berendt B (2010) Discovery of interactive graphs for understanding and searching time-indexed corpora. Knowl Inf Syst 23: 293–319

  59. Tausczik Y, Pennebaker J (2010) The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol 29(1): 24

  60. Thelwall M, Buckley K, Paltoglou G (2011) Sentiment in Twitter events. J Am Soc Inf Sci Technol 62(2): 406–418

  61. Tran T, Phung D, Bui H, Venkatesh S (2006) AdaBoost.MRF: boosted Markov random forests and application to multilevel activity recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1686–1693

  62. Tsai F, Zhang Y (2010) D2S: document-to-sentence framework for novelty detection. Knowl Inf Syst 29: 419–433

  63. Tumasjan A, Sprenger T, Sandner P, Welpe I (2010) Predicting elections with Twitter: what 140 characters reveal about political sentiment. In: Proceedings of the international AAAI conference on weblogs and social media (ICWSM)

  64. Venkatesh S, Adams B, Phung D, Dorai C, Farrell R, Agnihotri L, Dimitrova N (2008) YouTube and I find: personalizing multimedia content access. Proc IEEE (special issue on advances in multimedia and information retrieval) 96(4): 697–711

  65. Xing Z, Pei J, Yu P (2011) Early classification on time series. Knowl Inf Syst 31: 105–127

  66. Yoshida M, Kinase R, Kurokawa J, Yashiro S (1970) Multi-dimensional scaling of emotion. Jpn Psychol Res 12(2): 45–61

  67. Zhang K, Zi J, Wu L (2007) New event detection based on indexing-tree and named entity. In: Proceedings of the ACM international conference on research and development in information retrieval (SIGIR), pp 215–222

  68. Zhao Q, Mitra P, Chen B (2007) Temporal and information flow based event detection from social text streams. In: Proceedings of the national conference on artificial intelligence (AAAI), pp 1501–1506

Corresponding author

Correspondence to Thin Nguyen.

About this article

Cite this article

Nguyen, T., Phung, D., Adams, B. et al. Event extraction using behaviors of sentiment signals and burst structure in social media. Knowl Inf Syst 37, 279–304 (2013). https://doi.org/10.1007/s10115-012-0494-9

  • DOI: https://doi.org/10.1007/s10115-012-0494-9
