research-article

Efficient mining of frequent itemsets in social network data based on MapReduce framework

Authors:
Zahra Farzanyar, York University, Toronto, Canada
Nick Cercone, York University, Toronto, Canada

ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, August 2013, Pages 1183–1188. https://doi.org/10.1145/2492517.2500301

Published: 25 August 2013


ABSTRACT

Social networks promote information sharing between people everywhere and at all times. Mining the data produced in this data-rich environment can be extremely useful. Frequent itemset mining plays an important role in mining associations, correlations, sequential patterns, causality, episodes, multidimensional patterns, max-patterns, partial periodicity, emerging patterns, and many other significant data mining tasks in social networks. With the exponential growth of social network data towards a terabyte or more, most traditional frequent itemset mining algorithms become ineffective because of either huge resource requirements or large communication overhead. Cloud computing has shown that very large datasets can be processed on commodity clusters, given the right programming model. MapReduce, a parallel programming model and one of the most important techniques for cloud computing, has emerged for mining datasets of terabyte scale or larger on clusters of computers. In this paper, we propose an efficient frequent itemset mining algorithm, called IMRApriori, based on the MapReduce framework, which works with the Hadoop cloud, a parallel storage and computing platform. The paper presents experimental results that corroborate the theoretical claims.
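
The abstract does not describe how IMRApriori works internally, so the following is only a minimal sketch of the general idea it builds on: a level-wise Apriori search in which each level's support counting is expressed as a map step and a reduce step. It is not the authors' algorithm and it does not use Hadoop; the map and reduce phases are simulated in a single Python process. All names (`map_phase`, `reduce_phase`, `generate_candidates`, `mine_frequent_itemsets`, `min_count`) and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(split, candidates):
    """Mapper: emit (candidate, 1) for each candidate contained in a transaction."""
    for transaction in split:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:
                yield candidate, 1


def reduce_phase(pairs, min_count):
    """Reducer: sum counts per candidate and keep those reaching min_count."""
    totals = defaultdict(int)
    for candidate, count in pairs:
        totals[candidate] += count
    return {c: n for c, n in totals.items() if n >= min_count}


def generate_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def mine_frequent_itemsets(splits, min_count):
    """Run one simulated map/reduce round per itemset size until no candidates remain."""
    # Level 1 candidates: every distinct item seen in any split (hypothetical input layout).
    candidates = list({frozenset([i]) for split in splits for t in split for i in t})
    result, k = {}, 1
    while candidates:
        # Each split would be handled by a separate mapper on a real cluster.
        pairs = (pair for split in splits for pair in map_phase(split, candidates))
        frequent = reduce_phase(pairs, min_count)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


if __name__ == "__main__":
    # Two splits standing in for two blocks of a distributed transaction file.
    splits = [
        [["a", "b", "c"], ["a", "b"]],
        [["a", "c"], ["b", "c"], ["a", "b", "c"]],
    ]
    for itemset, count in mine_frequent_itemsets(splits, min_count=3).items():
        print(sorted(itemset), count)
```

In this sketch every itemset size costs one full pass over all splits, which is the kind of repeated scanning and communication overhead the abstract identifies as a problem at terabyte scale.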


Index Terms

Efficient mining of frequent itemsets in social network data based on MapReduce framework
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
      1. Web-based interaction
2. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications
    1. Data mining


Published in
ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
August 2013, 1558 pages
ISBN: 9781450322409
DOI: 10.1145/2492517
General Chairs: Jon Rokne (University of Calgary, Calgary, AB, Canada), Christos Faloutsos (Carnegie Mellon University, Pittsburgh, PA)
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cloud computing
frequent itemset mining
mapreduce
social networks
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 116 of 549 submissions, 21%