UP-Hist Tree: An Efficient Data Structure for Mining High Utility Patterns from Transaction Databases

Authors:
Siddharth Dawar, Indraprastha Institute of Information Technology, Delhi, India
Vikram Goyal, Indraprastha Institute of Information Technology, Delhi, India

IDEAS '15: Proceedings of the 19th International Database Engineering & Applications Symposium, July 2015, Pages 56–61, https://doi.org/10.1145/2790755.2790771

Published: 13 July 2015

ABSTRACT

High-utility itemset mining is an emerging research area in the field of data mining. Several algorithms proposed for finding high-utility itemsets in transaction databases rely on a data structure called the UP-tree. However, UP-tree-based algorithms generate many candidates because the UP-tree holds only limited information for estimating the utility of itemsets. In this paper, we present a data structure named the UP-Hist tree, which maintains a histogram of item quantities with each node of the tree. The histogram allows tighter utility estimates to be computed, and therefore more effective pruning of the search space. Extensive experiments on real as well as synthetic datasets show that our algorithm based on the UP-Hist tree outperforms state-of-the-art pattern-growth algorithms in terms of the total number of candidate high-utility itemsets that must be verified.
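
As a rough illustration of the idea in the abstract, the sketch below attaches a quantity histogram to each node of a prefix tree and uses it to bound the total quantity an item can contribute over k transactions. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `HistNode`, `insert`, and `quantity_bound`, the toy transactions, and the exact form of the bound are all hypothetical and do not come from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HistNode:
    """Hypothetical tree node: an item plus a histogram of its quantities."""
    item: str
    hist: Counter = field(default_factory=Counter)  # quantity -> number of transactions
    children: dict = field(default_factory=dict)    # item -> HistNode


def insert(root, transaction):
    """Insert a transaction, given as (item, quantity) pairs in a fixed item order."""
    node = root
    for item, qty in transaction:
        node = node.children.setdefault(item, HistNode(item))
        node.hist[qty] += 1


def quantity_bound(hist, k, largest):
    """Sum of the k largest (or k smallest) quantities recorded in hist."""
    total, remaining = 0, k
    for q in sorted(hist, reverse=largest):
        take = min(hist[q], remaining)
        total += q * take
        remaining -= take
        if remaining == 0:
            break
    return total


# Toy database (assumed for illustration only).
root = HistNode("root")
for t in ([("a", 1), ("b", 5)], [("a", 2), ("b", 1)], [("a", 4)]):
    insert(root, t)

a = root.children["a"]
b = a.children["b"]
k = sum(b.hist.values())                        # transactions on the path a -> b: 2
low = quantity_bound(a.hist, k, largest=False)  # 1 + 2 = 3
high = quantity_bound(a.hist, k, largest=True)  # 4 + 2 = 6
print(k, low, high)                             # prints: 2 3 6
```

In this toy example, item `a` appears with quantities 1 and 2 in the two transactions that also contain `b`, so its true total quantity there is 3, which lies within the histogram-derived range [3, 6]. Multiplying such bounds by a per-item unit profit would give utility estimates. How the paper actually defines and uses its estimates is not stated in the abstract.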

Index Terms

UP-Hist Tree: An Efficient Data Structure for Mining High Utility Patterns from Transaction Databases
1. General and reference
2. Information systems
  1. Data management systems

Published in
IDEAS '15: Proceedings of the 19th International Database Engineering & Applications Symposium
July 2015
251 pages
ISBN:9781450334143
DOI:10.1145/2790755
General Chair: Bipin C. Desai, Concordia University
Program Chair: Motomichi Toyama, Keio University
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
IDEAS
International Database Engineering & Applications Symposium
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 74 of 210 submissions, 35%
Article Metrics
- Total Citations: 17
- Total Downloads: 184
- Downloads (Last 12 months): 5
- Downloads (Last 6 weeks): 0