Abstract
In many real-world applications, decisions are made by collecting and weighing information from multiple data sources. Take the stock market as an example: investors rarely decide on a single piece of advice, but rely on a collection of information such as stock price movements, trading volumes, and market indices, as well as news articles, expert comments, and special announcements (e.g., an increase in stamp duty). Yet modeling the stock market is difficult because (1) the process governing market states (up and down) is stochastic and hard to capture with a deterministic approach; and (2) the market state is invisible, yet it is influenced by visible market information such as stock prices and news articles. In this paper, we model the stock market process with a Non-homogeneous Hidden Markov Model (NHMM) that takes multiple sources of information into account when making a prediction. Our model contains three major elements: (1) the external event, which denotes an event happening within the stock market (e.g., a drop in the US interest rate); (2) the observed market state, which denotes the current market status (e.g., a rise in the stock price); and (3) the hidden market state, which conceptually exists but is invisible to market participants. Specifically, we model external events using the information contained in news articles, and the observed market state using historical stock prices. Based on these two pieces of observable information and the previous hidden market state, we aim to identify the current hidden market state so as to predict the immediate market movement. Extensive experiments were conducted to evaluate our work. The encouraging results indicate that the proposed approach is practically sound and effective.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, D., Fung, G.P.C., Yu, J.X., Liu, Z. (2008). Integrating Multiple Data Sources for Stock Prediction. In: Bailey, J., Maier, D., Schewe, K.-D., Thalheim, B., Wang, X.S. (eds) Web Information Systems Engineering - WISE 2008. Lecture Notes in Computer Science, vol 5175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85481-4_8
DOI: https://doi.org/10.1007/978-3-540-85481-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85480-7
Online ISBN: 978-3-540-85481-4
eBook Packages: Computer Science, Computer Science (R0)