Condensed representation of frequent itemsets

Authors:
Daniel Serrano, Universidade de Lisboa, Lisboa
Cláudia Antunes, Universidade de Lisboa, Lisboa

IDEAS '14: Proceedings of the 18th International Database Engineering & Applications Symposium, July 2014, Pages 168–175
https://doi.org/10.1145/2628194.2628243

Published: 07 July 2014

ABSTRACT

Pattern explosion remains one of the major problems in pattern mining: mining algorithms produce a very large number of patterns when analyzing a database with a predefined minimum support threshold. To overcome this problem, we automatically infer variables from the patterns found and use them to generalize those patterns into a compact representation. We introduce the novel concept of meta-patterns and present the RECAP algorithm. Meta-patterns can take several forms, and the sets of patterns can be grouped according to different criteria; these choices involve a trade-off between expressiveness and compaction of the patterns. On the tested dataset, the proposed solution reduces the number of patterns found to less than half.
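
The pattern explosion described above can be illustrated with a minimal sketch. It is not the RECAP algorithm and does not construct meta-patterns, since the abstract does not specify either; it is only a brute-force frequent itemset count. The function name `frequent_itemsets` and the four-transaction toy database are assumptions introduced for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset contained in
    at least min_support transactions (brute-force enumeration)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            # Support count: number of transactions containing the candidate.
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                result[cand] = support
                found_any = True
        if not found_any:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
    return result


# Hypothetical toy database (not from the paper).
transactions = [
    frozenset("abcd"),
    frozenset("abc"),
    frozenset("abd"),
    frozenset("acd"),
]

for threshold in (4, 3, 2):
    count = len(frequent_itemsets(transactions, threshold))
    print(f"min_support={threshold}: {count} frequent itemsets")
```

On this toy database, lowering the threshold from 4 to 3 to 2 raises the number of frequent itemsets from 1 to 7 to 13, even though there are only four transactions over four items. This growth is what a condensed representation aims to contain.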



Published in
IDEAS '14: Proceedings of the 18th International Database Engineering & Applications Symposium
July 2014, 411 pages
ISBN: 9781450326278
DOI: 10.1145/2628194
Editors: Ana Maria Almeida (ISEP), Jorge Bernardino (CISUC-Polytechnic Institute of Coimbra), Elsa Ferreira Gomes (ISEP)
General Chairs: Bipin C. Desai (Concordia University), Jorge Bernardino (CISUC-Polytechnic Institute of Coimbra)
Program Chairs: Ana Maria Almeida (ISEP), Bipin C. Desai (Concordia University)
Copyright © 2014 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
compaction
frequent itemset mining
pattern explosion
summarization
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 74 of 210 submissions, 35%