DOI: 10.1145/2492517.2500301
research-article

Efficient mining of frequent itemsets in social network data based on MapReduce framework

Published: 25 August 2013

ABSTRACT

Social networks promote information sharing between people everywhere and at all times, and mining the data produced in this data-rich environment can be extremely useful. Frequent itemset mining plays an important role in mining associations, correlations, sequential patterns, causality, episodes, multidimensional patterns, max-patterns, partial periodicity, emerging patterns, and many other significant data mining tasks in social networks. As social network data grows exponentially toward a terabyte or more, most traditional frequent itemset mining algorithms become ineffective because of either huge resource requirements or large communication overhead. Cloud computing has shown that very large datasets can be processed on commodity clusters given the right programming model. MapReduce, a parallel programming model and one of the most important techniques in cloud computing, has emerged as a way to mine datasets of terabyte scale or larger on clusters of computers. In this paper, we propose an efficient frequent itemset mining algorithm, called IMRApriori, based on the MapReduce framework, that targets the Hadoop cloud, a parallel storage and computing platform. The paper presents experimental results that corroborate the theoretical claims.
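The abstract does not describe how IMRApriori works internally, so the following is only a minimal sketch of the general idea it builds on: Apriori-style support counting expressed as a map step and a reduce step. It is simulated in plain Python rather than on Hadoop, and every name in it (`map_phase`, `reduce_phase`, `frequent_itemsets`, `min_count`, `max_k`) is a hypothetical choice made for illustration, not the authors' algorithm or API.

```python
from collections import Counter
from itertools import combinations


def map_phase(split, k, prev_frequent):
    """Mapper (illustrative): emit (itemset, 1) for each candidate k-itemset.

    A k-itemset is a candidate only if all of its (k-1)-subsets were
    frequent in the previous pass (the Apriori pruning rule).
    """
    for transaction in split:
        items = sorted(set(transaction))
        for itemset in combinations(items, k):
            if k == 1 or all(
                sub in prev_frequent for sub in combinations(itemset, k - 1)
            ):
                yield itemset, 1


def reduce_phase(pairs, min_count):
    """Reducer (illustrative): sum counts per itemset, keep the frequent ones."""
    totals = Counter()
    for itemset, count in pairs:
        totals[itemset] += count
    return {s: c for s, c in totals.items() if c >= min_count}


def frequent_itemsets(splits, min_count, max_k=3):
    """Run one simulated map/reduce pass per itemset size k."""
    result = {}
    prev_frequent = set()
    for k in range(1, max_k + 1):
        # In a real cluster each split would be handled by a separate mapper.
        pairs = (p for split in splits for p in map_phase(split, k, prev_frequent))
        level = reduce_phase(pairs, min_count)
        if not level:
            break  # no frequent k-itemsets, so no larger ones can exist
        result.update(level)
        prev_frequent = set(level)
    return result


# Hypothetical toy data: two input splits of transactions.
example_splits = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"], ["a", "b", "c"]],
]
example_result = frequent_itemsets(example_splits, min_count=3)
```

In this sketch each pass over the data corresponds to one MapReduce job. The cost of repeated passes and of shuffling candidate counts between mappers and reducers is the kind of overhead the abstract attributes to traditional approaches at terabyte scale.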

        • Published in

          ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
          August 2013
          1558 pages
          ISBN:9781450322409
          DOI:10.1145/2492517

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 116 of 549 submissions, 21%
