Utility Mining Across Multi-Dimensional Sequences

Authors:
Wensheng Gan
Jinan University, Guangzhou, Guangdong, China

Jerry Chun-Wei Lin
Western Norway University of Applied Sciences, Bergen, Norway

Jiexiong Zhang
Harbin Institute of Technology (Shenzhen), Shenzhen, China

Hongzhi Yin
University of Queensland, Brisbane, QLD, Australia

Philippe Fournier-Viger
Harbin Institute of Technology (Shenzhen), Shenzhen, China

Han-Chieh Chao
National Dong Hwa University, Hualien, Taiwan, ROC

Philip S. Yu
University of Illinois at Chicago, Chicago IL, USA

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 5, Article No. 82, pp. 1–24. https://doi.org/10.1145/3446938

Published: 10 May 2021


Abstract

Knowledge extraction from databases is a fundamental task in the database and data mining communities, and it has been applied to a wide range of real-world applications. Unlike support-based mining models, the utility-oriented mining framework integrates utility theory to provide more informative and useful patterns. Time-dependent sequence data are common in real life and have been widely used in applications such as analyzing sequential user behavior on the Web, influence maximization, route planning, and targeted marketing. Unfortunately, all existing algorithms overlook the fact that the processed data not only contain rich features (e.g., occurrence quantity, risk, and profit) but may also be associated with multi-dimensional auxiliary information; for example, a transaction sequence can be associated with purchaser profile information. In this article, we first formulate the problem of utility mining across multi-dimensional sequences and propose a novel framework named MDUS to extract Multi-Dimensional Utility-oriented Sequential useful patterns. To the best of our knowledge, this is the first study that incorporates the time-dependent sequence order, quantitative information, utility factor, and auxiliary dimensions. Two algorithms, named MDUS_EM and MDUS_SD, are presented to address the formulated problem. The former is based on database transformation, and the latter performs pattern joins and a searching method to identify desired patterns across multi-dimensional sequences. Extensive experiments are carried out on six real-life datasets and one synthetic dataset to show that the proposed algorithms can effectively and efficiently discover useful knowledge from multi-dimensional sequential databases. Moreover, the MDUS framework can provide better insight, and it is more adaptable to real-life situations than existing models.
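
To make the problem setting concrete, the following minimal Python sketch shows one way the utility of a sequential pattern could be computed over sequences that carry auxiliary dimensions. It is an illustration only, not the MDUS_EM or MDUS_SD algorithm. The abstract does not give formal definitions, so several points are assumptions: the utility of a pattern in one sequence is taken as the maximum over all order-preserving matches (a convention common in high-utility sequential pattern mining), `"*"` is used as a wildcard for a dimension value, and the names `sequence_utility`, `md_utility`, `profit`, and `db`, along with the toy data, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A quantitative sequence: an ordered list of itemsets, each mapping item -> quantity.
QSequence = List[Dict[str, int]]
# A record pairs auxiliary dimension values (e.g. a purchaser profile) with a sequence.
Record = Tuple[Dict[str, str], QSequence]


def sequence_utility(pattern: List[Set[str]], qseq: QSequence,
                     profit: Dict[str, int]) -> int:
    """Utility of `pattern` in one sequence (assumed: max over all matches)."""
    # best[j] = highest utility found so far for the first j pattern elements,
    # or None when those elements cannot be matched yet.
    best: List = [0] + [None] * len(pattern)
    for itemset in qseq:
        # Iterate backwards so one itemset serves at most one pattern element.
        for j in range(len(pattern), 0, -1):
            element = pattern[j - 1]
            if best[j - 1] is None or not all(i in itemset for i in element):
                continue
            gain = sum(itemset[i] * profit[i] for i in element)
            candidate = best[j - 1] + gain
            if best[j] is None or candidate > best[j]:
                best[j] = candidate
    return best[-1] if best[-1] is not None else 0


def md_utility(dim_pattern: Dict[str, str], seq_pattern: List[Set[str]],
               db: List[Record], profit: Dict[str, int]) -> int:
    """Total utility over records whose dimensions match `dim_pattern`."""
    total = 0
    for dims, qseq in db:
        # "*" (an assumed wildcard) matches any value of that dimension.
        if all(v == "*" or dims.get(k) == v for k, v in dim_pattern.items()):
            total += sequence_utility(seq_pattern, qseq, profit)
    return total


# Hypothetical toy data: unit profits and three purchaser sequences.
profit = {"a": 2, "b": 1, "c": 5}
db: List[Record] = [
    ({"age": "young", "city": "Bergen"}, [{"a": 1, "b": 2}, {"c": 1}, {"a": 3}]),
    ({"age": "young", "city": "Chicago"}, [{"a": 2}, {"c": 2}]),
    ({"age": "senior", "city": "Bergen"}, [{"a": 1}, {"c": 4}]),
]

# Utility of "a followed by c" among young purchasers in any city.
print(md_utility({"age": "young", "city": "*"}, [{"a"}, {"c"}], db, profit))  # 21
```

A pattern here combines a dimension part (which purchasers) with a sequence part (which ordered purchases), so the same sequence pattern can have different utilities for different purchaser groups. A mining algorithm would additionally enumerate candidate patterns and prune the search space, which this sketch does not attempt.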

Index Terms

Utility Mining Across Multi-Dimensional Sequences
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Business intelligence
  2. Information systems applications
    1. Data mining
    2. Decision support systems
      1. Data analytics


Published in
ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 5
October 2021
508 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3461317
Editor: Charu Aggarwal, IBM T. J. Watson Research, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 10 May 2021
- Accepted: 1 January 2021
- Revised: 1 November 2020
- Received: 1 May 2020
Author Tags
Economic
auxiliary dimension
pruning strategies
sequential data
utility mining
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 12
- Total Downloads: 463
- Downloads (Last 12 months): 205
- Downloads (Last 6 weeks): 20
