Abstract
Significant world events often cause people's expressions of shared sentiment to converge. This paper examines the use of the blogosphere as a framework for studying users' psychological behaviors, treating their sentiment responses as a form of ‘sensor’ from which important real-world events can be inferred automatically. We formulate a novel temporal sentiment index function using a quantitative measure of the valence of affect-bearing words in blog posts, where the set of affect-bearing words is inspired by psychological research on the structure of emotion. The annual local minima and maxima of the proposed sentiment signal are used to extract the significant events of each year, and the corresponding blog posts are further analyzed with topic modeling tools to understand their content. The paper then examines how the discovered topics correlate with world news events reported by the mainstream news service provider Cable News Network and found using the Google search engine. Next, to understand sentiment at a finer granularity over time, we propose a stochastic burst detection model that extends the work of Kleinberg to operate incrementally on streaming data. The proposed model is then used to extract sentiment bursts occurring within a specific mood label (for example, a burst in the mood ‘shocked’). The blog posts at those time indices are analyzed to extract topics, which are compared to real-world news events. Our comprehensive set of experiments, conducted on a large-scale set of 12 million posts from LiveJournal, shows that the proposed sentiment index function coincides well with significant world events, while bursts in sentiment allow us to locate finer-grained external world events.
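As a rough illustration of the idea of a temporal sentiment index, the sketch below averages word valence per day and picks out the lowest and highest day of each year. It is a minimal sketch under stated assumptions, not the paper's actual index function: the lexicon `VALENCE`, its values, and the helper names `daily_sentiment_index` and `annual_extrema` are all hypothetical, and a plain per-day mean stands in for whatever weighting the authors use.

```python
from collections import defaultdict
from datetime import date

# Hypothetical valence lexicon on a 1-9 scale. The words and values are
# illustrative only; they are not taken from ANEW or from the paper.
VALENCE = {"happy": 8.0, "love": 8.5, "sad": 2.0, "shocked": 2.5, "angry": 2.5}


def daily_sentiment_index(posts):
    """Map each day to the mean valence of the lexicon words seen that day.

    `posts` is an iterable of (date, text) pairs. Days on which no lexicon
    word occurs are omitted from the result.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, text in posts:
        for word in text.lower().split():
            word = word.strip(".,!?;:'\"")  # drop surrounding punctuation
            if word in VALENCE:
                totals[day] += VALENCE[word]
                counts[day] += 1
    return {day: totals[day] / counts[day] for day in counts}


def annual_extrema(index):
    """Return {year: (day_of_minimum, day_of_maximum)} for a daily index."""
    by_year = defaultdict(list)
    for day in index:
        by_year[day.year].append(day)
    return {
        year: (min(days, key=index.get), max(days, key=index.get))
        for year, days in by_year.items()
    }


# Toy input, invented for this example.
posts = [
    (date(2004, 3, 1), "So happy, love this!"),
    (date(2004, 6, 1), "Happy but sad."),
    (date(2004, 9, 1), "Shocked and sad."),
]
index = daily_sentiment_index(posts)
extrema = annual_extrema(index)
```

On this toy input the September day has the lowest index and the March day the highest. In a pipeline like the one the abstract describes, the posts written on such extreme days would then be passed to a topic model.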
References
Allan J, Papka R, Lavrenko V (1998) Online new event detection and tracking. In: Proceedings of the ACM international conference on research and development in information retrieval (SIGIR), pp 37–45
Back M, Küfner A, Egloff B (2010) The emotional timeline of September 11, 2001. Psychol Sci 21(10): 1417
Balog K, Mishne G, de Rijke M (2006) Why are they excited? Identifying and explaining spikes in blog mood levels. In: Proceedings of the conference of the European chapter of the association for computational linguistics (EACL), pp 207–210
Baumeister R, DeWall C, Vohs K, Alquist J (2010) Does emotion cause behavior (apart from making people do stupid, destructive things)? Oxford University Press, New York
Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. J Mach Learn Res 3: 993–1022
Bracken P, Giller J, Summerfield D (1995) Psychological responses to war and atrocity: the limitations of current concepts. Soc Sci Med 40(8): 1073–1082
Bradley M, Lang P (1999) Affective norms for English words (ANEW): instruction manual and affective ratings. University of Florida, Gainesville
Cao L (2008) Behavior informatics and analytics: let behavior talk. In: Proceedings of the IEEE international conference on data mining workshops, pp 87–96
Cao L (2010) In-depth behavior understanding and use: the behavior informatics approach. Inf Sci 180(17): 3067–3085
Cao L, Ou Y, Yu P (2011) Coupled behavior analysis with applications. IEEE Trans Knowl Data Eng 24: 1378–1392
Christakis N, Fowler J (2009) Connected: the surprising power of our social networks and how they shape our lives. Little, Brown and Company, New York
Chua A, Razikin K, Goh D (2011) Social tags as news event detectors. J Inf Sci 37(1): 3
Church T, Katigbak MS, Reyes J, Jensen S (1998) Language and organisation of Filipino emotion concepts: comparing emotion concepts and dimensions across cultures. Cognit Emot 12(1): 63–92
Colombetti G (2005) Appraising valence. J Conscious Stud 12(10): 103–126
Cont R, Bouchaud J (2000) Herd behavior and aggregate fluctuations in financial markets. Macroecon Dyn 4(2): 170–196
Coontz R (2009) Blogs: happiness barometers? Science 325: 5941
Das Sarma A, Jain A, Yu C (2011) Dynamic relationship and event discovery. In: Proceedings of the ACM international conference on web search and data mining (WSDM), pp 207–216
Dodds P, Danforth C (2010) Measuring the happiness of large-scale written expression: songs, blogs, and presidents. J Happiness Stud 11(4): 441–456
Duong T, Phung D, Bui H, Venkatesh S (2006) Human behavior recognition with generic exponential family duration modeling in the hidden semi-Markov model. In: Proceedings of the international conference on pattern recognition, pp 202–207
Fan T, Chang C (2010) Sentiment-oriented contextual advertising. Knowl Inf Syst 23: 321–344
Feng S, Wang D, Yu G, Gao W, Wong K (2011) Extracting common emotions from blogs based on fine-grained sentiment clustering. Knowl Inf Syst 27: 281–302
Fontaine J, Scherer K, Roesch E, Ellsworth P (2007) The world of emotions is not two-dimensional. Psychol Sci 18(12): 1050
Fujiki T, Nanno T, Suzuki Y, Okumura M (2004) Identification of bursts in a document stream. In: Proceedings of the first international workshop on knowledge discovery in data streams, pp 55–64
Galati D, Sini B, Tinti C, Testa S (2008) The lexicon of emotion in the neo-Latin languages. Soc Sci Inf 47(2): 205
Gehm T, Scherer KR (1988) Factors determining the dimensions of subjective emotional space. Lawrence Erlbaum Associates, Hillsdale
Gilbert E, Karahalios K (2010) Widespread worry and the stock market. In: Proceedings of the international AAAI conference on weblogs and social media (ICWSM)
Giles J (2010) Blogs and tweets could predict the future. New Sci 206(2765): 20–21
Glance N, Hurst M, Tomokiyo T (2004) Blogpulse: automated trend discovery for weblogs. In: Proceedings of the WWW workshop on the weblogging ecosystem: aggregation, analysis and dynamics
Griffiths T, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci 101(90001): 5228–5235
Gruhl D, Guha R, Kumar R, Novak J, Tomkins A (2005) The predictive power of online chatter. In: Proceedings of the ACM international conference on knowledge discovery and data mining (SIGKDD), pp 78–87
He Q, Chang K, Lim E (2007) Using burstiness to improve clustering of topics in news streams. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 493–498
Kim E, Gilbert S, Edwards M, Graeff E (2009) Detecting sadness in 140 characters: sentiment analysis of mourning Michael Jackson on Twitter. Technical report, Web Ecology Project
Kleinberg J (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7(4): 373–397
Kramer A (2010) An unobtrusive behavioral model of gross national happiness. In: Proceedings of the ACM conference on human factors in computing systems (SIGCHI), pp 287–290
Kumar R, Novak J, Raghavan P, Tomkins A (2003) On the bursty evolution of blogspace. In: Proceedings of the international conference on world wide web (WWW), pp 568–576
Kumaran G, Allan J (2004) Text classification and named entities for new event detection. In: Proceedings of the ACM international conference on research and development in information retrieval (SIGIR), pp 297–304
Leshed G, Kaye J (2006) Understanding how bloggers feel: recognizing affect in blog posts. In: Proceedings of the ACM conference on human factors in computing systems (SIGCHI), p 1024
Luo D, Yang J, Krstajic M, Ribarsky W, Keim D (2011) Eventriver: visually exploring text collections with temporal references. IEEE Trans Vis Comput Graph PP(99): 1
Makkonen J, Ahonen-Myka H, Salmenkivi M (2003) Topic detection and tracking with spatio-temporal evidence. Advances in information retrieval, pp 549–549
Mauss I, Robinson M (2009) Measures of emotion: a review. Cognit Emot 23(2): 209–237
Mishne G, De Rijke M (2006) Capturing global mood levels using blog posts. In: Proceedings of the AAAI spring symposium on computational approaches to analysing weblogs, pp 145–152
Mishne G, Glance N (2006) Predicting movie sales from blogger sentiment. In: Proceedings of the AAAI spring symposium on computational approaches to analysing weblogs
Néda Z, Ravasz E, Brechet Y, Vicsek T, Barabási A (2000) Self-organizing processes: the sound of many hands clapping. Nature 403: 849–850
Nguyen T, Phung D, Adams B, Tran T, Venkatesh S (2010) Classification and pattern discovery of mood in weblogs. Adv Knowl Discov Data Mining 6119: 283–290
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2): 1–135
Pennebaker J, Francis M, Booth R (2007) Linguistic inquiry and word count (LIWC) [computer software]. LIWC Inc, Austin, Texas
Phung D, Duong T, Bui H, Venkatesh S (2005) Topic transition detection using hierarchical hidden Markov and semi-Markov models. In: Proceedings of the ACM international conference on multimedia
Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2): 257–286
Russell J (1980) A circumplex model of affect. J Pers Soc Psychol 39(6): 1161–1178
Russell J (1983) Pancultural aspects of the human conceptual organization of emotions. J Pers Soc Psychol 45(6): 1281
Russell J (2009) Emotion, core affect, and psychological construction. Cognit Emot 23(7): 1259–1283
Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proceedings of the international conference on world wide web (WWW), pp 851–860
Saleh B, Masseglia F (2010) Discovering frequent behaviors: time is an essential element of the context. Knowl Inf Syst 28: 311–331
Shaver P, Murdaya U, Fraley R (2001) Structure of the Indonesian emotion lexicon. Asian J Soc Psychol 4(3): 201–224
Silver R, Holman E, McIntosh D, Poulin M, Gil-Rivas V (2002) Nationwide longitudinal study of psychological responses to September 11. J Am Med Assoc 288(10): 1235
Smith C, Ellsworth P (1985) Patterns of cognitive appraisal in emotion. J Pers Soc Psychol 48(4): 813
Solomon RC (2003) Against valence (‘positive and negative emotions’). Not Passion’s Slave 1(9): 162–178
Subasic I, Berendt B (2010) Discovery of interactive graphs for understanding and searching time-indexed corpora. Knowl Inf Syst 23: 293–319
Tausczik Y, Pennebaker J (2010) The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol 29(1): 24
Thelwall M, Buckley K, Paltoglou G (2011) Sentiment in Twitter events. J Am Soc Inf Sci Technol 62(2): 406–418
Tran T, Phung D, Bui H, Venkatesh S (2006) AdaBoost.MRF: boosted Markov random forests and application to multilevel activity recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1686–1693
Tsai F, Zhang Y (2010) D2S: document-to-sentence framework for novelty detection. Knowl Inf Syst 29: 419–433
Tumasjan A, Sprenger T, Sandner P, Welpe I (2010) Predicting elections with Twitter: what 140 characters reveal about political sentiment. In: Proceedings of the international AAAI conference on weblogs and social media (ICWSM)
Venkatesh S, Adams B, Phung D, Dorai C, Farrell R, Agnihotri L, Dimitrova N (2008) YouTube and I find: personalizing multimedia content access. Proc IEEE (special issue on advances in multimedia and information retrieval) 96(4): 697–711
Xing Z, Pei J, Yu P (2011) Early classification on time series. Knowl Inf Syst 31: 105–127
Yoshida M, Kinase R, Kurokawa J, Yashiro S (1970) Multi-dimensional scaling of emotion. Jpn Psychol Res 12(2): 45–61
Zhang K, Zi J, Wu L (2007) New event detection based on indexing-tree and named entity. In: Proceedings of the ACM international conference on research and development in information retrieval (SIGIR), pp 215–222
Zhao Q, Mitra P, Chen B (2007) Temporal and information flow based event detection from social text streams. In: Proceedings of the national conference on artificial intelligence (AAAI), pp 1501–1506
Cite this article
Nguyen, T., Phung, D., Adams, B. et al. Event extraction using behaviors of sentiment signals and burst structure in social media. Knowl Inf Syst 37, 279–304 (2013). https://doi.org/10.1007/s10115-012-0494-9
DOI: https://doi.org/10.1007/s10115-012-0494-9