ABSTRACT
Graph clustering has recently attracted much attention as a technique for extracting community structures from various kinds of graph data. As available graph data grows ever larger, accelerating graph clustering has become an important issue for handling large-scale graphs. To this end, this paper proposes a fast graph clustering method using GPUs. The proposed method parallelizes label propagation, one of the fastest graph clustering algorithms. Our method has three characteristics: (1) efficient parallelization: the label propagation algorithm is transformed into a sequence of data-parallel primitives; (2) load balance: the method adopts primitives that keep the load well balanced among threads and blocks; and (3) out-of-core processing: we also develop algorithms that efficiently handle large-scale datasets that do not fit into GPU memory. Moreover, the GPU out-of-core algorithm is extended to exploit CPUs and GPUs simultaneously for further performance gain. Extensive experiments with real-world and synthetic datasets show that the proposed method outperforms an existing parallel CPU implementation by a factor of up to 14.3 without sacrificing accuracy.
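To make the idea of expressing label propagation as a sequence of bulk primitives concrete, the following is a minimal sequential sketch in Python. It is not the GPU implementation described above: the function name `label_propagation`, the synchronous update schedule, the smallest-label tie-breaking rule, and the `max_iters` cap are all assumptions made for illustration. Each round is written as a gather, a sort, and a reduce-by-key, which are the kinds of steps that map onto data-parallel primitives.

```python
from itertools import groupby


def label_propagation(num_vertices, edges, max_iters=20):
    """Synchronous label propagation on an undirected graph.

    Each round is written as three bulk steps (gather, sort, reduce-by-key)
    to mirror a data-parallel formulation; this sequential version is only
    an illustration.
    """
    # Store each undirected edge in both directions as (destination, source).
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    labels = list(range(num_vertices))  # every vertex starts in its own cluster

    for _ in range(max_iters):
        # Gather: for each vertex, collect the current labels of its neighbors.
        pairs = [(dst, labels[src]) for dst, src in directed]
        # Sort so identical (vertex, label) pairs become contiguous runs.
        pairs.sort()
        # Reduce by key: count the length of each run.
        counts = [(v, lab, sum(1 for _ in run))
                  for (v, lab), run in groupby(pairs)]
        # Segmented arg-max: pick the most frequent label per vertex.
        # Assumption: ties are broken in favor of the smallest label id.
        new_labels = labels[:]
        for v, group in groupby(counts, key=lambda t: t[0]):
            best = min(group, key=lambda t: (-t[2], t[1]))
            new_labels[v] = best[1]
        if new_labels == labels:  # no label changed, so the process converged
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(label_propagation(6, edges))
```

On this small graph the two triangles end up in separate clusters. Purely synchronous updates can oscillate on some graphs (bipartite structures are the usual example), which is why the sketch caps the number of rounds instead of looping until convergence.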