research-article

Utility Mining Across Multi-Dimensional Sequences

Published: 10 May 2021

Abstract

Knowledge extraction from databases is a fundamental task in the database and data mining communities, and it has been applied to a wide range of real-world applications. Unlike support-based mining models, the utility-oriented mining framework integrates utility theory to provide more informative and useful patterns. Time-dependent sequence data are common in real life and have been widely used in applications such as analyzing sequential user behavior on the Web, influence maximization, route planning, and targeted marketing. Unfortunately, all existing algorithms overlook the fact that the processed data not only contain rich features (e.g., occurrence quantity, risk, and profit) but may also be associated with multi-dimensional auxiliary information; e.g., a transaction sequence can be associated with purchaser profile information. In this article, we first formulate the problem of utility mining across multi-dimensional sequences and propose a novel framework named MDUS to extract Multi-Dimensional Utility-oriented Sequential useful patterns. To the best of our knowledge, this is the first study that incorporates time-dependent sequence order, quantitative information, the utility factor, and auxiliary dimensions. Two algorithms, named MDUSEM and MDUSSD, are presented to address the formulated problem. The former is based on database transformation, and the latter performs pattern joins and a searching method to identify the desired patterns across multi-dimensional sequences. Extensive experiments are carried out on six real-life datasets and one synthetic dataset to show that the proposed algorithms can effectively and efficiently discover useful knowledge from multi-dimensional sequential databases. Moreover, the MDUS framework can provide better insight, and it is more adaptable to real-life situations than existing models.
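
To make the problem setting concrete, the following is a minimal brute-force sketch of how a pattern's utility could be computed over sequences that carry auxiliary dimensions. It is not MDUSEM or MDUSSD, whose details are not given on this page. The toy database, the unit profits, the dimension names (`age`, `city`), and the helper names `max_utility` and `pattern_utility` are all hypothetical, and the choice to score a pattern by its maximum utility over all occurrences in a sequence is an assumption borrowed from common practice in high-utility sequential pattern mining.

```python
# Hypothetical unit profit per item (the "external utility").
UNIT_PROFIT = {"a": 2, "b": 5, "c": 1}

# Hypothetical database. Each record pairs auxiliary dimensions (a purchaser
# profile) with a quantitative sequence: an ordered list of itemsets, where
# each itemset maps an item to its purchased quantity.
DATABASE = [
    ({"age": "young", "city": "NY"}, [{"a": 2, "b": 1}, {"c": 3}, {"b": 2}]),
    ({"age": "young", "city": "LA"}, [{"a": 1}, {"b": 3, "c": 1}]),
    ({"age": "senior", "city": "NY"}, [{"b": 1}, {"a": 4}, {"b": 1}]),
]


def max_utility(pattern, qseq):
    """Return the largest utility of `pattern` over its occurrences in `qseq`.

    `pattern` is a list of item sets that must appear in order, each inside a
    distinct itemset of `qseq`. Returns None if the pattern does not occur.
    """
    # best[k] is the best utility found so far for the first k pattern elements.
    best = [0] + [None] * len(pattern)
    for itemset in qseq:
        # Go through k in descending order so that one itemset of the
        # sequence matches at most one element of the pattern.
        for k in range(len(pattern), 0, -1):
            element = pattern[k - 1]
            if best[k - 1] is None or not all(i in itemset for i in element):
                continue
            gain = sum(itemset[i] * UNIT_PROFIT[i] for i in element)
            candidate = best[k - 1] + gain
            if best[k] is None or candidate > best[k]:
                best[k] = candidate
    return best[-1]


def pattern_utility(pattern, dims, database):
    """Sum the pattern's utility over sequences whose profile matches `dims`."""
    total = 0
    for profile, qseq in database:
        if all(profile.get(key) == value for key, value in dims.items()):
            utility = max_utility(pattern, qseq)
            if utility is not None:
                total += utility
    return total


if __name__ == "__main__":
    pattern = [{"a"}, {"b"}]  # "a" is bought first, "b" in a later itemset
    print(pattern_utility(pattern, {"age": "young"}, DATABASE))
    print(pattern_utility(pattern, {}, DATABASE))
```

In this sketch a pattern would count as high-utility for a given dimension combination when `pattern_utility` reaches a user-chosen minimum utility threshold. Restricting `dims` changes which sequences contribute, which is the kind of additional information the auxiliary dimensions provide.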

        • Published in

          ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 5
          October 2021
          508 pages
          ISSN: 1556-4681
          EISSN: 1556-472X
          DOI: 10.1145/3461317
          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 10 May 2021
          • Accepted: 1 January 2021
          • Revised: 1 November 2020
          • Received: 1 May 2020

          Qualifiers

          • research-article
          • Research
          • Refereed
