DSM-FI: an efficient algorithm for mining frequent itemsets in data streams

Li, Hua-Fu; Shan, Man-Kwan; Lee, Suh-Yin

doi:10.1007/s10115-007-0112-4

Regular Paper
Published: 09 January 2008

Knowledge and Information Systems, Volume 17, pages 79–97 (2008)

Hua-Fu Li¹,
Man-Kwan Shan² &
Suh-Yin Lee³

Abstract

Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult problem since the streaming data possess some inherent characteristics. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of online transactions. According to the proposed algorithm, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary frequent itemset forest) for maintaining the set of all frequent itemsets embedded in the transaction data stream generated so far. Finally, the set of all frequent itemsets is determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms the existing algorithms of one-pass mining of frequent itemsets.
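
The project-then-insert step described above can be illustrated with a minimal sketch. The abstract does not specify how a transaction is projected or how the SFI-forest is laid out, so everything below is an assumption made for illustration: each transaction is sorted and projected into its suffixes, and each suffix is inserted as a path into a forest of counted prefix trees. The names `Node`, `project`, `insert`, and `process_stream` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One prefix-tree node: an item, a counter, and its child nodes."""
    item: str
    count: int = 0
    children: dict = field(default_factory=dict)


def project(transaction):
    """Assumed projection: every suffix of the sorted, de-duplicated transaction."""
    items = sorted(set(transaction))
    return [items[i:] for i in range(len(items))]


def insert(forest, sub_transaction):
    """Insert one sub-transaction as a path, incrementing counts along it."""
    children = forest
    for item in sub_transaction:
        node = children.setdefault(item, Node(item))
        node.count += 1
        children = node.children


def process_stream(stream):
    """Single pass over the stream: project each transaction, update the forest."""
    forest = {}  # maps a root item to the root Node of its tree
    for transaction in stream:
        for sub in project(transaction):
            insert(forest, sub)
    return forest


forest = process_stream([["a", "c", "d"], ["b", "c"], ["a", "c"]])
print({item: node.count for item, node in forest.items()})
```

Under this assumed projection, every transaction containing an item contributes exactly one sub-transaction that starts with that item, so each root counter equals the item's support in the stream seen so far. The sketch does not cover pruning of infrequent entries or the final step of determining all frequent itemsets from the forest, which the abstract mentions but does not detail.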

Author information

Authors and Affiliations

Department of Computer Science, Kainan University, Taoyuan, Taiwan
Hua-Fu Li
Department of Computer Science, National Chengchi University, Taipei, Taiwan
Man-Kwan Shan
Department of Computer Science, National Chiao-Tung University, Hsinchu, Taiwan
Suh-Yin Lee

Corresponding author

Correspondence to Hua-Fu Li.

About this article

Cite this article

Li, HF., Shan, MK. & Lee, SY. DSM-FI: an efficient algorithm for mining frequent itemsets in data streams. Knowl Inf Syst 17, 79–97 (2008). https://doi.org/10.1007/s10115-007-0112-4

Received: 09 November 2005
Revised: 07 October 2006
Accepted: 01 September 2007
Published: 09 January 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s10115-007-0112-4
