research-article

PTE: Enumerating Trillion Triangles On Distributed Systems

Authors:
Ha-Myung Park

KAIST, Daejeon, Republic of Korea

KAIST, Daejeon, Republic of Korea
View Profile

,
Sung-Hyon Myaeng

KAIST, Daejeon, Republic of Korea

KAIST, Daejeon, Republic of Korea
View Profile

,
U. Kang

Seoul National University, Seoul, Republic of Korea

Seoul National University, Seoul, Republic of Korea
View Profile

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningAugust 2016Pages 1115–1124https://doi.org/10.1145/2939672.2939757

Published:13 August 2016Publication History

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 1115–1124

ABSTRACT

How can we enumerate triangles from an enormous graph with billions of vertices and edges? Triangle enumeration is an important task for graph data analysis with many applications including identifying suspicious users in social networks, detecting web spams, finding communities, etc. However, recent networks are so large that most of the previous algorithms fail to process them. Recently, several MapReduce algorithms have been proposed to address such large networks; however, they suffer from the massive shuffled data resulting in a very long processing time. In this paper, we propose PTE (Pre-partitioned Triangle Enumeration), a new distributed algorithm for enumerating triangles in enormous graphs by resolving the structural inefficiency of the previous MapReduce algorithms. PTE enumerates trillions of triangles in a billion scale graph by decreasing three factors: the amount of shuffled data, total work, and network read.

Experimental results show that PTE provides up to 47 times faster performance than recent distributed algorithms on real world graphs, and succeeds in enumerating more than 3 trillion triangles on the ClueWeb12 graph with 6.3 billion vertices and 72 billion edges, which any previous triangle computation algorithm fail to process.

Supplemental Material

kdd2016_park_trillion_triangles_01-acm.mp4

mp4

263.8 MB

Download

References

Jesse Alpert and Nissan Hajaj. http://googleblog.blogspot.kr/2008/07/we-knew-web-was-big.html, 2008.Google Scholar
Shaikh Arifuzzaman, Maleq Khan, and Madhav V. Marathe. PATRIC: a parallel algorithm for counting triangles in massive networks. In CIKM, 2013. Google ScholarDigital Library
Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient algorithms for large-scale local triangle counting. TKDD, 2010. Google ScholarDigital Library
Jonathan W Berry, Bruce Hendrickson, Randall A LaViolette, and Cynthia A Phillips. Tolerating the community detection resolution limit with edge weighting. Phys. Rev. E, 83(5):056119, 2011.Google ScholarCross Ref
Bin-Hui Chou and Einoshin Suzuki. Discovering community-oriented roles of nodes in a social network. In DaWaK, pages 52--64, 2010. Google ScholarDigital Library
Jonathan Cohen. Graph twiddling in a mapreduce world. CiSE, 11(4):29--41, 2009. Google ScholarDigital Library
Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarDigital Library
Jean-Pierre Eckmann and Elisha Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS, 99(9):5825--5829, 2002.Google ScholarCross Ref
Facebook. http://newsroom.fb.com/company-info, 2015.Google Scholar
Ilias Giechaskiel, George Panagopoulos, and Eiko Yoneki. PDTL: parallel and distributed triangle listing for massive graphs. In ICPP, 2015. Google ScholarDigital Library
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In OSDI, pages 17--30, 2012. Google ScholarDigital Library
Herodotos Herodotou. Hadoop performance models. arXiv, 2011.Google Scholar
Xiaocheng Hu, Yufei Tao, and Chin-Wan Chung. Massive graph triangulation. In SIGMOD, pages 325--336, 2013. Google ScholarDigital Library
ByungSoo Jeon, Inah Jeon, Lee Sael, and U Kang. Scout: Scalable coupled matrix-tensor factorization - algorithm and discoveries. In ICDE, 2016.Google ScholarCross Ref
U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. Net-ray: Visualizing and mining billion-scale graphs. In PAKDD, 2014.Google Scholar
U Kang, Brendan Meeder, Evangelos E. Papalexakis, and Christos Faloutsos. Heigen: Spectral analysis for billion-scale graphs. TKDE, pages 350--362, 2014. Google ScholarDigital Library
U Kang, Hanghang Tong, Jimeng Sun, Ching-Yung Lin, and Christos Faloutsos. Gbase: an efficient analysis platform for large graphs. VLDB J., 21(5):637--650, 2012. Google ScholarDigital Library
U Kang, Charalampos E. Tsourakakis, and Faloutsos Faloutsos. Pegasus: A peta-scale graph mining system - implementation and observations. ICDM, 2009. Google ScholarDigital Library
Jinha Kim, Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, and Hwanjo Yu. OPT: A new framework for overlapped and parallel triangulation in large-scale graphs. In SIGMOD, pages 637--648, 2014. Google ScholarDigital Library
Matthieu Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci., pages 458--473, 2008. Google ScholarDigital Library
Rasmus Pagh and Francesco Silvestri. The input/output complexity of triangle enumeration. In PODS, pages 224--233, 2014. Google ScholarDigital Library
Ha-Myung Park and Chin-Wan Chung. An efficient mapreduce algorithm for counting triangles in a very large graph. In CIKM, pages 539--548, 2013. Google ScholarDigital Library
Ha-Myung Park, Francesco Silvestri, U Kang, and Rasmus Pagh. Mapreduce triangle enumeration with guarantees. In CIKM, pages 1739--1748, 2014. Google ScholarDigital Library
Filippo Radicchi, Claudio Castellano, Federico Cecconi, Vittorio Loreto, and Domenico Parisi. Defining and identifying communities in networks. PNAS, 101(9):2658--2663, 2004.Google ScholarCross Ref
Thomas Schank. Algorithmic aspects of triangle-based network analysis. Phd thesis, University Karlsruhe, 2007.Google Scholar
Siddharth Suri and Sergei Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614, 2011. Google ScholarDigital Library
Twitter. https://about.twitter.com/company, 2015.Google Scholar
Mark N. Wegman and Larry Carter. New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci., 22(3):265--279, 1981.Google ScholarCross Ref
Zhi Yang, Christo Wilson, Xiao Wang, Tingting Gao, Ben Y. Zhao, and Yafei Dai. Uncovering social network sybils in the wild. TKDD, 2014. Google ScholarDigital Library

Index Terms

PTE: Enumerating Trillion Triangles On Distributed Systems
1. Information systems
  1. Information systems applications
    1. Data mining
2. Theory of computation
  1. Design and analysis of algorithms

Recommendations

Enumerating Trillion Subgraphs On Distributed Systems

How can we find patterns from an enormous graph with billions of vertices and edges? The subgraph enumeration, which is to find patterns from a graph, is an important task for graph data analysis with many applications, including analyzing the social ...
Read More
BIGMiner: a fast and scalable distributed frequent pattern miner for big data

Frequent itemset mining is widely used as a fundamental data mining technique. Recently, there have been proposed a number of MapReduce-based frequent itemset mining methods in order to overcome the limits on data size and speed of mining that ...
Read More
Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph

An i-triangulated graph is a graph in which every odd cycle has two non-crossing chords; i-triangulated graphs form a subfamily of perfect graphs. A slightly more general family of perfect graphs are clique-separable graphs. A graph is clique-separable ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
General Chairs:
Balaji Krishnapuram
IBM
,
Mohak Shah
Bosch
,
Program Chairs:
Alex Smola
Amazon
,
Charu Aggarwal
IBM
,
Dou Shen
Baidu
,
Rajeev Rastogi
Amazon
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 August 2016
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
big data
distributed algorithm
graph algorithm
network analysis
scalable algorithm
triangle enumeration
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '16 Paper Acceptance Rate66of1,115submissions,6%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 42
  Total Citations
  View Citations
- 270
  Total Downloads
- Downloads (Last 12 months)25
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

PTE: Enumerating Trillion Triangles On Distributed Systems

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Enumerating Trillion Subgraphs On Distributed Systems

BIGMiner: a fast and scalable distributed frequent pattern miner for big data

Finding a maximum-weight induced k-partite subgraph of an i-triangulated graph