Abstract
Online mining of data streams is an important data mining problem with broad applications. It is also a difficult one, because a data stream is continuous and potentially unbounded, so it cannot be stored in full and rescanned. In this paper, we propose a new single-pass algorithm, DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of transactions. DSM-FI projects each transaction of the stream into a set of sub-transactions and inserts these sub-transactions into a new in-memory summary data structure, the SFI-forest (summary frequent itemset forest), which maintains the set of all frequent itemsets embedded in the transaction data stream generated so far. The set of all frequent itemsets is then determined from the current SFI-forest. Theoretical analysis and experimental studies show that DSM-FI uses stable memory, makes only one pass over an online transactional data stream, and outperforms existing algorithms for one-pass mining of frequent itemsets.
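The project-then-insert idea can be illustrated with a minimal Python sketch. The abstract does not say how a transaction is split into sub-transactions or how the SFI-forest is laid out, so the sketch makes its own assumptions: each transaction is sorted and projected into its item suffixes, and each suffix is inserted as a counted path in a prefix tree keyed by its first item. The names `Node`, `project`, `insert`, and `support` are hypothetical, and the sketch omits any pruning or error threshold, so it should be read as an illustration of the idea rather than as DSM-FI itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a prefix tree in the (assumed) summary forest."""
    item: str
    count: int = 0
    children: dict = field(default_factory=dict)


def project(transaction):
    """Assumed projection: split a transaction into its item-suffix sub-transactions."""
    items = sorted(set(transaction))
    return [items[i:] for i in range(len(items))]


def insert(forest, sub):
    """Insert one sub-transaction as a path in the tree rooted at its first item."""
    children = forest
    for item in sub:
        node = children.setdefault(item, Node(item))
        node.count += 1
        children = node.children


def support(forest, itemset):
    """Count the transactions that contain every item of a non-empty itemset."""
    target = sorted(set(itemset))
    root = forest.get(target[0])
    if root is None:
        return 0

    def walk(node, i):
        if node.item == target[i]:
            i += 1
            if i == len(target):
                # Every sub-transaction passing through this node contains the itemset.
                return node.count
        elif node.item > target[i]:
            # Paths are sorted, so target[i] cannot appear further down.
            return 0
        return sum(walk(child, i) for child in node.children.values())

    return walk(root, 0)


# Single pass over a toy stream: each transaction is seen once and then discarded.
forest = {}
for transaction in [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]:
    for sub in project(transaction):
        insert(forest, sub)
```

With this toy stream, `support(forest, ["a", "c"])` counts the two transactions that contain both `a` and `c`. Keying each tree by the first item of a suffix means an itemset's support can be read from a single tree, at the cost of storing every suffix of every transaction.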
Cite this article
Li, HF., Shan, MK. & Lee, SY. DSM-FI: an efficient algorithm for mining frequent itemsets in data streams. Knowl Inf Syst 17, 79–97 (2008). https://doi.org/10.1007/s10115-007-0112-4
DOI: https://doi.org/10.1007/s10115-007-0112-4