
GPU-Accelerated Graph Clustering via Parallel Label Propagation

Authors:
Yusuke Kozawa (AIST, Tokyo, Japan)
Toshiyuki Amagasa (University of Tsukuba, Tsukuba, Japan)
Hiroyuki Kitagawa (University of Tsukuba, Tsukuba, Japan)

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, November 2017, Pages 567–576. https://doi.org/10.1145/3132847.3132960

Published: 6 November 2017


ABSTRACT

Graph clustering has recently attracted much attention as a technique for extracting community structures from various kinds of graph data. As available graph data grows ever larger, accelerating graph clustering has become important for handling large-scale graphs. To this end, this paper proposes a fast graph clustering method using GPUs. The proposed method parallelizes label propagation, one of the fastest graph clustering algorithms. It has three characteristics: (1) efficient parallelization: the label propagation algorithm is transformed into a sequence of data-parallel primitives; (2) load balancing: the method adopts primitives that keep the load well balanced among threads and blocks; and (3) out-of-core processing: we also develop algorithms that efficiently handle large-scale datasets that do not fit into GPU memory. Moreover, the GPU out-of-core algorithm is extended to exploit CPUs and GPUs simultaneously for further performance gains. Extensive experiments with real-world and synthetic datasets show that the proposed method outperforms an existing parallel CPU implementation by a factor of up to 14.3 without sacrificing accuracy.
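
To make the idea of expressing label propagation as a sequence of data-parallel primitives more concrete, the following is a minimal CPU-side sketch in Python. It is not the authors' GPU implementation. The function names (`propagate_once`, `label_propagation`), the adjacency-list input, the synchronous update, and the tie-breaking rule (smallest label wins) are all assumptions made for illustration. Each step is written as a gather, a sort, a run-length count, and a per-vertex (segmented) reduction, which are the kinds of primitives that GPU libraries typically provide.

```python
from itertools import groupby


def propagate_once(adj, labels):
    """One synchronous label-propagation step (illustrative sketch only)."""
    # Gather: emit one (vertex, neighbor label) pair per directed edge.
    pairs = [(v, labels[u]) for v, nbrs in enumerate(adj) for u in nbrs]
    # Sort: bring identical (vertex, label) pairs next to each other.
    pairs.sort()
    # Run-length count: how often each label occurs around each vertex.
    counts = [(v, lab, sum(1 for _ in run))
              for (v, lab), run in groupby(pairs)]
    # Segmented reduce: per vertex, keep the most frequent label.
    new_labels = list(labels)  # isolated vertices keep their label
    for v, segment in groupby(counts, key=lambda t: t[0]):
        # Assumed tie-break: prefer the smaller label for determinism.
        best = max(segment, key=lambda t: (t[2], -t[1]))
        new_labels[v] = best[1]
    return new_labels


def label_propagation(adj, max_iters=20):
    """Repeat the step until labels stop changing or max_iters is reached."""
    labels = list(range(len(adj)))  # every vertex starts in its own cluster
    for _ in range(max_iters):
        new_labels = propagate_once(adj, labels)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
    adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
    print(label_propagation(adj))
```

On a GPU, the sort and the segmented reduction would each map to a library primitive, and choosing primitives whose work is split evenly across threads is what the load-balancing characteristic in the abstract refers to. Note that a purely synchronous update, as assumed here, can oscillate on some graphs, which is why the sketch caps the number of iterations.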




Published in
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN: 9781450349185
DOI: 10.1145/3132847
General Chairs:
- Ee-Peng Lim (Singapore Management University, Singapore)
- Marianne Winslett (University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore)
Program Chairs:
- Mark Sanderson (RMIT, Australia)
- Ada Fu (Chinese University of Hong Kong, Hong Kong)
- Jimeng Sun (Georgia Tech, USA)
- Shane Culpepper (RMIT, Australia)
- Eric Lo (Chinese University of Hong Kong, Hong Kong)
- Joyce Ho (Emory University, USA)
- Debora Donato (Mix Tech, Inc., USA)
- Rakesh Agrawal (Data Insights Laboratories, USA)
- Yu Zheng (Microsoft Research Asia, China)
- Carlos Castillo (Qatar Computing Research Institute, Qatar)
- Aixin Sun (Nanyang Technological University, Singapore)
- Vincent S. Tseng (National Cheng Kung University, Taiwan)
- Chenliang Li (Wuhan University, China)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
community detection
gpu
graph clustering
label propagation
Qualifiers
- research-article
Acceptance Rates
- CIKM '17 paper acceptance rate: 171 of 855 submissions, 20%
- Overall acceptance rate: 1,861 of 8,427 submissions, 22%
Article Metrics
- Total citations: 12
- Total downloads: 427
- Downloads (last 12 months): 31
- Downloads (last 6 weeks): 5