Identifying Approximate Itemsets of Interest in Large Databases

Applied Intelligence

Abstract

This paper presents a method for discovering approximate frequent itemsets of interest in large-scale databases. The method uses the central limit theorem to improve efficiency, reducing the required sample size by about half compared with previous approximations. Further efficiency is gained by pruning uninteresting frequent itemsets from the search space. Besides improving efficiency, this pruning reduces the number of itemsets that the user needs to consider. The model and algorithm have been implemented and evaluated on both synthetic and real-world databases. Our experimental results demonstrate the efficiency of the approach.
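To give a feel for the sample-size claim, the following is a minimal sketch that compares two textbook sample-size estimates for approximating an itemset's support to within an error eps with confidence 1 - delta. It is not the paper's own derivation, which is not reproduced in this abstract. It assumes a Hoeffding/Chernoff-style bound, n >= ln(2/delta) / (2 eps^2), as the "previous approximation", and the usual normal approximation for a sample proportion, n >= z^2 p(1 - p) / eps^2 with worst case p = 0.5, as the central-limit-theorem estimate. The function names and parameter values are illustrative only.

```python
import math
from statistics import NormalDist


def chernoff_sample_size(eps: float, delta: float) -> int:
    """Hoeffding/Chernoff-style bound (assumed form): n >= ln(2/delta) / (2 eps^2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def clt_sample_size(eps: float, delta: float, p: float = 0.5) -> int:
    """Normal-approximation estimate (assumed form): n >= z^2 p (1 - p) / eps^2.

    z is the two-sided standard normal quantile for confidence 1 - delta;
    p = 0.5 is the worst case because it maximises the variance p (1 - p).
    """
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / eps ** 2)


if __name__ == "__main__":
    eps, delta = 0.01, 0.05  # illustrative values, not taken from the paper
    n_chernoff = chernoff_sample_size(eps, delta)
    n_clt = clt_sample_size(eps, delta)
    print("Chernoff-style sample size:", n_chernoff)
    print("CLT-based sample size:     ", n_clt)
    print("ratio: %.2f" % (n_clt / n_chernoff))
```

With these illustrative settings the normal-approximation estimate is a little over half of the Chernoff-style one, which is in line with the reduction described above. The ratio depends on delta, and the normal approximation is an asymptotic estimate rather than a strict guarantee, so the comparison should be read as indicative only.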




Cite this article

Zhang, C., Zhang, S. & Webb, G.I. Identifying Approximate Itemsets of Interest in Large Databases. Applied Intelligence 18, 91–104 (2003). https://doi.org/10.1023/A:1020995206763


  • DOI: https://doi.org/10.1023/A:1020995206763
