
DSM-FI: an efficient algorithm for mining frequent itemsets in data streams

  • Regular Paper

Knowledge and Information Systems

Abstract

Online mining of data streams is an important data mining problem with broad applications. It is also a difficult one. Streaming data are continuous, unbounded, and arrive at high speed, so an algorithm can usually examine each transaction only once and with limited memory. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of online transactions. In DSM-FI, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary frequent itemset forest), which maintains the set of all frequent itemsets embedded in the transaction data stream generated so far. The set of all frequent itemsets is then determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms existing one-pass algorithms for mining frequent itemsets.
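The abstract only outlines the data flow: transactions are projected into sub-transactions, inserted into a forest of trees, and frequent itemsets are read off that forest. The Python sketch below illustrates this flow under explicit assumptions and should not be read as the paper's implementation. It assumes that items are sorted and that each transaction is projected into its suffixes, so that `abcd` yields `abcd`, `bcd`, `cd` and `d`. Each suffix is inserted into a prefix tree rooted at its first item. The class `SuffixForest` and its methods are hypothetical names. The sketch keeps exact counts and does no pruning, so it does not reproduce DSM-FI's stable-memory behaviour. It also finds frequent itemsets with a simple level-wise search rather than the paper's own procedure.

```python
class Node:
    """Prefix-tree node: an item label, a count, and children keyed by item."""
    __slots__ = ("item", "count", "children")

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


class SuffixForest:
    """Hypothetical stand-in for an SFI-forest: one prefix tree per item.

    Assumption: each sorted transaction is projected into its suffixes, and the
    suffix starting with item x is inserted into the tree rooted at x.
    No pruning is performed, so memory grows with the stream.
    """

    def __init__(self):
        self.trees = {}            # first item -> root Node of its tree
        self.n_transactions = 0    # transactions seen so far

    def add_transaction(self, transaction):
        items = sorted(set(transaction))
        self.n_transactions += 1
        for i in range(len(items)):            # project into suffix sub-transactions
            sub = items[i:]
            if sub[0] not in self.trees:
                self.trees[sub[0]] = Node(sub[0])
            node = self.trees[sub[0]]
            node.count += 1
            for item in sub[1:]:               # insert the rest as a prefix path
                if item not in node.children:
                    node.children[item] = Node(item)
                node = node.children[item]
                node.count += 1

    def support(self, itemset):
        """Number of transactions seen so far that contain every item of itemset."""
        items = sorted(set(itemset))
        if not items:
            return self.n_transactions
        root = self.trees.get(items[0])
        return 0 if root is None else self._count(root, items[1:])

    def _count(self, node, needed):
        if not needed:                         # all query items matched on this path
            return node.count
        total = 0
        for child in node.children.values():
            if child.item == needed[0]:        # matched the next query item
                total += self._count(child, needed[1:])
            elif child.item < needed[0]:       # item not in the query: skip over it
                total += self._count(child, needed)
        return total

    def frequent_itemsets(self, min_support):
        """Level-wise search; min_support is a fraction of transactions seen."""
        threshold = min_support * self.n_transactions
        level = [(x,) for x in sorted(self.trees) if self.trees[x].count >= threshold]
        result = {}
        while level:
            next_level = []
            for itemset in level:
                result[itemset] = self.support(itemset)
                for x in sorted(self.trees):   # extend only with larger items
                    if x > itemset[-1]:
                        cand = itemset + (x,)
                        if self.support(cand) >= threshold:
                            next_level.append(cand)
            level = next_level
        return result


if __name__ == "__main__":
    forest = SuffixForest()
    for t in [["a", "b", "c"], ["a", "c"], ["b", "c", "d"], ["a", "b", "c", "d"]]:
        forest.add_transaction(t)
    print(forest.support(["a", "c"]))          # 3
    print(forest.frequent_itemsets(0.5))
```

Because items on every path are sorted, each sub-transaction that contains a query itemset reaches exactly one node labelled with the query's last item. Summing those node counts therefore gives the support without double counting. The level-wise extension is complete because every frequent itemset's sorted prefix is itself frequent.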



Author information

Correspondence to Hua-Fu Li.


About this article

Cite this article

Li, HF., Shan, MK. & Lee, SY. DSM-FI: an efficient algorithm for mining frequent itemsets in data streams. Knowl Inf Syst 17, 79–97 (2008). https://doi.org/10.1007/s10115-007-0112-4
