DOI: 10.1145/2939672.2939757
research-article

PTE: Enumerating Trillion Triangles On Distributed Systems

Published: 13 August 2016

ABSTRACT

How can we enumerate triangles in an enormous graph with billions of vertices and edges? Triangle enumeration is an important task in graph data analysis, with applications that include identifying suspicious users in social networks, detecting web spam, and finding communities. However, recent networks are so large that most previous algorithms fail to process them. Several MapReduce algorithms have recently been proposed to handle such large networks, but they shuffle massive amounts of data, which results in very long processing times. In this paper, we propose PTE (Pre-partitioned Triangle Enumeration), a new distributed algorithm that enumerates triangles in enormous graphs by resolving the structural inefficiency of the previous MapReduce algorithms. PTE enumerates trillions of triangles in a billion-scale graph by reducing three factors: the amount of shuffled data, the total work, and the network read.

Experimental results show that PTE is up to 47 times faster than recent distributed algorithms on real-world graphs, and that it succeeds in enumerating more than 3 trillion triangles on the ClueWeb12 graph with 6.3 billion vertices and 72 billion edges, which no previous triangle computation algorithm has been able to process.
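
To make the task concrete, the following is a minimal single-machine sketch of triangle enumeration on a small in-memory graph. It is not PTE and does not reflect the paper's partitioning scheme; it uses a common degree-ordering technique, and the function name `enumerate_triangles` and the edge-list input format are assumptions made for illustration only.

```python
from collections import defaultdict


def enumerate_triangles(edges):
    """Yield each triangle of an undirected simple graph exactly once.

    Illustrative baseline only (hypothetical helper, not the PTE algorithm).
    `edges` is an iterable of (u, v) pairs with comparable vertex ids.
    """
    # Build undirected adjacency sets; self-loops and duplicate edges are ignored.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Total order on vertices: by degree, with ties broken by vertex id.
    def rank(x):
        return (len(adj[x]), x)

    # Orient every edge from the lower-ranked endpoint to the higher-ranked one.
    out = {u: {v for v in nbrs if rank(u) < rank(v)} for u, nbrs in adj.items()}

    # A triangle with ranks u < v < w is found only from its lowest-ranked
    # edge (u, v), by intersecting the out-neighbors of u and v.
    for u in out:
        for v in out[u]:
            for w in out[u] & out[v]:
                yield (u, v, w)


if __name__ == "__main__":
    # Complete graph on four vertices.
    k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
    for triangle in enumerate_triangles(k4):
        print(triangle)
```

Orienting edges by degree keeps high-degree vertices from dominating the intersections, but this sketch still holds the whole graph in memory on one machine, which is exactly what becomes infeasible at the scale the abstract describes.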

Supplemental Material

kdd2016_park_trillion_triangles_01-acm.mp4 (mp4, 263.8 MB)

          • Published in

            KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
            August 2016
            2176 pages
            ISBN: 9781450342322
            DOI: 10.1145/2939672

            Copyright © 2016 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions, 6%
            Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
