ABSTRACT
Many socio-economic indicators are sensitive to real-world events. Properly characterizing these events helps to identify which of them drive fluctuations in the indicators. In this paper, we propose a novel generative model of real-world events and employ it to extract events from a large corpus of news articles. We introduce the notion of an event class, which is an abstract grouping of similarly themed events. These event classes are manifested in news articles in the form of event triggers, which are specific words that describe the actions or incidents reported in an article. We use the extracted events to predict fluctuations in different socio-economic indicators. Specifically, we focus on food prices and predict the prices of 12 different crops based on real-world events that potentially influence food price volatility, such as transport strikes and festivals. Our experiments demonstrate that incorporating event information in the prediction tasks reduces the root mean square error (RMSE) of prediction by 22% compared to the standard ARIMA model. We also predict sudden increases in food prices (i.e., spikes) using events as features, and achieve an average 5-10% increase in accuracy compared to baseline models, including an LDA topic-model-based predictive model.
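As a reading aid for the RMSE comparison reported in the abstract, the following minimal Python sketch shows how RMSE and a relative reduction against a baseline are typically computed. The function names and the toy price series are assumptions made for illustration; they are not the paper's code or data, and the resulting figure is unrelated to the reported 22%.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error improves on baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical toy values, invented for illustration (not from the paper).
actual = [10.0, 12.0, 11.0, 15.0]
baseline_forecast = [11.0, 11.0, 12.0, 13.0]  # stands in for a baseline such as ARIMA
event_forecast = [10.5, 11.5, 11.5, 14.0]     # stands in for an event-aware model

baseline_rmse = rmse(actual, baseline_forecast)
event_rmse = rmse(actual, event_forecast)
reduction = relative_reduction(baseline_rmse, event_rmse)
print(f"baseline RMSE={baseline_rmse:.3f}, event RMSE={event_rmse:.3f}, "
      f"reduction={reduction:.0%}")
```

A relative reduction of 0.22 under this definition would correspond to the kind of improvement the abstract describes, assuming the paper measures the reduction relative to the ARIMA baseline's RMSE.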
Index Terms
- Predicting Socio-Economic Indicators using News Events