
Integrating Multiple Data Sources for Stock Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5175)

Abstract

In many real-world applications, decisions are made by collecting and weighing information from multiple data sources. Take the stock market as an example. We never base a decision on a single piece of advice but rely on a collection of information, such as stock price movements, trading volumes and market indices, as well as news articles, expert comments and special announcements (e.g., an increase in stamp duty). Yet modeling the stock market is difficult because: (1) the process governing market states (up and down) is stochastic, which makes it hard to capture with a deterministic approach; and (2) the market state is invisible, although it is influenced by visible market information such as stock prices and news articles. In this paper, we model the stock market process with a Non-homogeneous Hidden Markov Model (NHMM) that takes multiple sources of information into account when making a prediction. Our model contains three major elements: (1) external events, which denote events happening within the stock market (e.g., a drop in the US interest rate); (2) the observed market state, which denotes the current market status (e.g., a rise in the stock price); and (3) the hidden market state, which conceptually exists but is invisible to market participants. Specifically, we model external events using the information contained in news articles, and model the observed market state using historical stock prices. Based on these two pieces of observable information and the previous hidden market state, we aim to identify the current hidden market state so as to predict the immediate market movement. Extensive experiments were conducted to evaluate our work, and the encouraging results indicate that the proposed approach is practically sound and effective.
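The inference step described above (previous hidden state plus two observable signals, yielding the current hidden state) can be illustrated with a standard HMM forward-filtering recursion in which the transition matrix is chosen by the day's external event. This is a minimal sketch under stated assumptions, not the authors' model: the two hidden states, the three event classes, every probability, and the helper names `observed_states`, `step`, `filter_hidden` and `predict_next` are hypothetical, and the abstract does not describe how the paper extracts events from news or estimates its parameters.

```python
"""Minimal NHMM-style filtering sketch (hypothetical parameters throughout)."""
from typing import Dict, List, Sequence

# Hypothetical hidden market states and observed market states.
HIDDEN = ("bull", "bear")
OBS = ("up", "down")

# Hypothetical event-dependent transition matrices (the "non-homogeneous" part):
# TRANS[event][i][j] = P(hidden_t = j | hidden_{t-1} = i, event_t).
TRANS: Dict[str, List[List[float]]] = {
    "none": [[0.8, 0.2], [0.2, 0.8]],
    "positive": [[0.9, 0.1], [0.5, 0.5]],
    "negative": [[0.5, 0.5], [0.1, 0.9]],
}

# Hypothetical emissions: EMIT[i][k] = P(obs_t = k | hidden_t = i).
EMIT: List[List[float]] = [[0.7, 0.3], [0.3, 0.7]]


def observed_states(prices: Sequence[float]) -> List[int]:
    """Map consecutive prices to observation indices: 0 = up (or flat), 1 = down."""
    return [0 if cur >= prev else 1 for prev, cur in zip(prices, prices[1:])]


def step(belief: List[float], event: str) -> List[float]:
    """Propagate a hidden-state distribution one day through TRANS[event]."""
    a = TRANS[event]
    n = len(belief)
    return [sum(belief[i] * a[i][j] for i in range(n)) for j in range(n)]


def filter_hidden(prior: List[float], events: Sequence[str], obs: Sequence[int]) -> List[float]:
    """Forward filtering: P(hidden_T | events_1..T, obs_1..T)."""
    belief = list(prior)
    for event, o in zip(events, obs):
        # Predict from the previous hidden-state belief and today's external event.
        predicted = step(belief, event)
        # Weight by the likelihood of today's observed market state.
        unnorm = [predicted[j] * EMIT[j][o] for j in range(len(predicted))]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]  # normalise to a distribution
    return belief


def predict_next(belief: List[float], next_event: str) -> List[float]:
    """One-step-ahead distribution over observed states (up, down)."""
    hidden = step(belief, next_event)
    return [sum(hidden[j] * EMIT[j][k] for j in range(len(hidden))) for k in range(len(OBS))]


if __name__ == "__main__":
    prices = [100.0, 101.5, 101.0, 102.3]      # toy closing prices
    events = ["positive", "none", "positive"]  # toy news-derived event labels
    belief = filter_hidden([0.5, 0.5], events, observed_states(prices))
    print(dict(zip(HIDDEN, belief)))
    print(dict(zip(OBS, predict_next(belief, "none"))))
```

Keying a transition matrix on a discrete event class is the simplest way to make transitions non-homogeneous. NHMM formulations more commonly let transition probabilities vary with covariates through a function such as a multinomial logistic link, with parameters estimated from data (for example by EM) rather than set by hand.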

Editor information

James Bailey, David Maier, Klaus-Dieter Schewe, Bernhard Thalheim, Xiaoyang Sean Wang

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, D., Fung, G.P.C., Yu, J.X., Liu, Z. (2008). Integrating Multiple Data Sources for Stock Prediction. In: Bailey, J., Maier, D., Schewe, KD., Thalheim, B., Wang, X.S. (eds) Web Information Systems Engineering - WISE 2008. WISE 2008. Lecture Notes in Computer Science, vol 5175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85481-4_8

  • DOI: https://doi.org/10.1007/978-3-540-85481-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85480-7

  • Online ISBN: 978-3-540-85481-4

  • eBook Packages: Computer Science, Computer Science (R0)