DOI: 10.1145/2628194.2628243
research-article
Open Access

Condensed representation of frequent itemsets

Published: 07 July 2014

ABSTRACT

Pattern explosion remains one of the major problems in pattern mining: mining algorithms produce a very large number of patterns when analyzing a database with a predefined minimum support threshold. To overcome this problem, we automatically infer variables from the patterns found, so that those patterns can be generalized and represented compactly. We introduce the novel concept of meta-patterns and present the RECAP algorithm. Meta-patterns can take several forms, and the sets of patterns can be grouped according to different criteria. These choices involve a trade-off between the expressiveness and the compaction of the patterns. The proposed solution achieves good results on the tested dataset, reducing the number of patterns found to less than half.
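
To make the pattern explosion problem concrete, the following minimal Python sketch counts frequent itemsets by brute force over a toy database at several minimum support thresholds. It is not the RECAP algorithm and does not build meta-patterns; the function name `frequent_itemsets` and the five example transactions are assumptions made purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support.

    Illustrative brute-force enumeration (an assumption for this sketch,
    not the method of the paper); only suitable for tiny toy databases.
    """
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support count = number of transactions containing the candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                frequent[frozenset(candidate)] = count
                found_any = True
        if not found_any:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
    return frequent


# Hypothetical toy database of five transactions over items a-d.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c", "d"},
]

for min_support in (4, 3, 2):
    n_patterns = len(frequent_itemsets(transactions, min_support))
    print(f"min_support={min_support}: {n_patterns} frequent itemsets")
```

On this toy database the number of frequent itemsets grows from 4 to 10 to 14 as the threshold drops from 4 to 3 to 2, even though only four distinct items are involved. Real databases with many items show the same effect at a far larger scale, which is what motivates condensed representations.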

Published in

IDEAS '14: Proceedings of the 18th International Database Engineering & Applications Symposium
July 2014
411 pages
ISBN: 9781450326278
DOI: 10.1145/2628194

Copyright © 2014 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 74 of 210 submissions, 35%
