ABSTRACT
Pattern explosion remains one of the major problems in pattern mining: mining algorithms produce a very large number of patterns when analyzing a database under a predefined minimum support threshold. To overcome this problem, we automatically infer variables from the patterns found and use them to generalize those patterns into a compact representation. We introduce the novel concept of meta-patterns and present the RECAP algorithm. Meta-patterns can take several forms, and sets of patterns can be grouped according to different criteria; these choices entail a trade-off between the expressiveness and the compactness of the resulting patterns. On the tested dataset, the proposed solution reduces the number of patterns found to less than half.
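To make the idea of generalizing patterns through an inferred variable concrete, the following is a minimal sketch under stated assumptions. It is not the RECAP algorithm, whose details the abstract does not give. It assumes one hypothetical form of meta-pattern: itemsets that differ in exactly one item are merged into a pair `(fixed_items, values)`, read as "the fixed items plus a variable X ranging over `values`". The function name `generalize`, the greedy grouping criterion, and the sample items are all illustrative assumptions.

```python
from collections import defaultdict


def generalize(patterns):
    """Merge itemsets that differ in exactly one item (illustrative only).

    Returns (meta, rest):
      meta: list of (fixed_items, values) pairs, each standing for the
            patterns fixed_items | {x} for every x in values
      rest: set of patterns that could not be merged with any other
    """
    patterns = {frozenset(p) for p in patterns}

    # Map each "context" (a pattern with one item removed) to the items
    # that complete it into an observed pattern.
    contexts = defaultdict(set)
    for p in patterns:
        for item in p:
            contexts[p - {item}].add(item)

    covered = set()
    meta = []
    # Hypothetical grouping criterion: greedily take the largest groups first.
    ordered = sorted(contexts.items(),
                     key=lambda kv: (-len(kv[1]), sorted(kv[0])))
    for fixed, values in ordered:
        # Only keep values whose pattern is not already represented.
        free = frozenset(v for v in values if (fixed | {v}) not in covered)
        if len(free) < 2:
            continue  # a variable with a single value compacts nothing
        covered |= {fixed | {v} for v in free}
        meta.append((fixed, free))

    return meta, patterns - covered


meta, rest = generalize([
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"tea", "sugar"},
])
print(len(meta) + len(rest))  # 2 representations instead of 4 patterns
```

In this toy input the three patterns containing `bread` collapse into a single meta-pattern with a three-valued variable, while `{tea, sugar}` is left as is. A different grouping criterion or meta-pattern form would trade expressiveness against compaction differently, which is the trade-off the abstract refers to.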
Index Terms
- Condensed representation of frequent itemsets