DOI: 10.1145/2939672.2939817
research-article

Predicting Socio-Economic Indicators using News Events

Published: 13 August 2016

ABSTRACT

Many socio-economic indicators are sensitive to real-world events. Properly characterizing these events helps identify the ones that drive fluctuations in the indicators. In this paper, we propose a novel generative model of real-world events and employ it to extract events from a large corpus of news articles. We introduce the notion of an event class, which is an abstract grouping of similarly themed events. Event classes are manifested in news articles as event triggers, which are specific words that describe the actions or incidents reported in an article. We use the extracted events to predict fluctuations in different socio-economic indicators. Specifically, we focus on food prices and predict the price of 12 different crops based on real-world events that potentially influence food price volatility, such as transport strikes and festivals. Our experiments demonstrate that incorporating event information in the prediction tasks reduces the root mean square error (RMSE) of prediction by 22% compared to the standard ARIMA model. We also predict sudden increases in food prices (i.e., spikes) using events as features, and achieve an average 5-10% increase in accuracy compared to baseline models, including a predictive model based on an LDA topic model.
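To make the headline metric concrete, the following is a minimal sketch of how an RMSE and a relative RMSE reduction over a baseline could be computed. The function names and the toy price series are assumptions invented for illustration; they are not taken from the paper, and the forecasts are simple stand-ins rather than outputs of ARIMA or of the proposed event model.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Toy numbers, invented purely for illustration (not data from the paper).
actual_prices = [10.0, 12.0, 11.0, 15.0]
baseline_forecast = [11.0, 11.0, 12.0, 13.0]  # stand-in for a baseline forecast
event_forecast = [10.5, 11.5, 11.5, 14.0]     # stand-in for an event-aware forecast

baseline_rmse = rmse(actual_prices, baseline_forecast)
event_rmse = rmse(actual_prices, event_forecast)
reduction = relative_reduction(baseline_rmse, event_rmse)
print(f"RMSE reduction over baseline: {reduction:.0%}")
```

Under this definition, a 22% reduction means the event-aware model's RMSE is 0.78 times the baseline's RMSE.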

Published in

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN: 9781450342322
DOI: 10.1145/2939672

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States


Acceptance Rates

KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions, 6%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
