ABSTRACT
High-utility itemset mining is an emerging research area in the field of data mining. Several algorithms proposed for finding high-utility itemsets in transaction databases rely on a data structure called the UP-tree. However, UP-tree-based algorithms generate a large number of candidates, because the UP-tree holds too little information to compute tight utility estimates for itemsets. In this paper, we present a data structure named the UP-Hist tree, which maintains a histogram of item quantities at each node of the tree. The histogram allows better utility estimates to be computed, and so the search space to be pruned more effectively. Extensive experiments on real and synthetic datasets show that our algorithm based on the UP-Hist tree outperforms state-of-the-art pattern-growth-based algorithms in terms of the total number of candidate high-utility itemsets that are generated and need to be verified.
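The idea of keeping a quantity histogram at each node can be illustrated with a minimal sketch. It is not the paper's algorithm: the `HistNode` class, the `quantity_bound` helper, the unit profit, and the support value `k` are all hypothetical names and numbers chosen for illustration. The sketch assumes that a histogram maps each observed quantity to the number of transactions carrying it, so that the quantity an item contributes to any `k` of those transactions can be bounded from below and above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HistNode:
    """Hypothetical tree node: an item plus a histogram of its quantities."""
    item: str
    # quantity -> number of transactions through this node with that quantity
    hist: Counter = field(default_factory=Counter)
    children: dict = field(default_factory=dict)

    def add(self, quantity: int) -> None:
        """Record one transaction in which the item occurs `quantity` times."""
        self.hist[quantity] += 1

    @property
    def count(self) -> int:
        """Number of transactions recorded at this node."""
        return sum(self.hist.values())


def quantity_bound(hist, k, largest):
    """Sum of the k largest (or k smallest) quantities recorded in hist."""
    total, remaining = 0, k
    for q in sorted(hist, reverse=largest):
        take = min(hist[q], remaining)
        total += q * take
        remaining -= take
        if remaining == 0:
            break
    return total


node = HistNode("a")
for q in (1, 1, 2, 5):
    node.add(q)

unit_profit = 3  # assumed per-unit utility of item "a"
k = 2            # assumed support of some itemset containing "a"
lower = unit_profit * quantity_bound(node.hist, k, largest=False)
upper = unit_profit * quantity_bound(node.hist, k, largest=True)
print(lower, upper)
```

In this toy setting, a node that stored only the total quantity (9 units) could offer no bound tighter than 27 for the item's contribution, whereas the histogram narrows the contribution over any two transactions to the range 6 to 21.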
Index Terms
- UP-Hist Tree: An Efficient Data Structure for Mining High Utility Patterns from Transaction Databases