
UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware

Abstract

With a cluster of commodity hardware, how can we efficiently find all connected components of an enormous graph containing hundreds of billions of nodes and edges? The problem of finding connected components has been used in various applications such as pattern recognition, reachability indexing, graph compression, graph partitioning, and random walk. Several algorithms have been proposed to efficiently find connected components in various environments. Most existing single-machine and distributed-memory algorithms are limited in scalability as they have to load all data generated during the process into the main memory; they require expensive machines with vast memory capacities to handle large graphs. Several MapReduce algorithms try to handle large graphs by exploiting distributed storage but fail due to the data explosion problem, a phenomenon in which the size of data increases significantly as the computation proceeds. The latest MapReduce algorithms resolve the problem by introducing two distinct star-operations and executing them alternately; however, the star-operations still cause massive network traffic since a star-operation is a distributed operation that connects each node to its smallest neighbor. In this paper, we unite the two star-operations into a single operation, namely UniStar, and propose UniCon, a new distributed algorithm for finding connected components in enormous graphs using UniStar. The partition-aware processing of UniStar effectively resolves the data explosion problem. We further optimize UniStar by filtering dispensable edges and exploiting a hybrid data structure. Experimental results on a cluster of 10 cheap machines, each equipped with an Intel Xeon E3-1220 CPU (4 cores at 3.10GHz), 16GB RAM, and 2 SSDs of 1TB, show that UniCon is up to 13 times faster than competitors on real-world graphs. UniCon succeeds in processing a tremendous graph with 129 billion edges, which is up to 4096 times larger than the graphs competitors can process.

Introduction

Given a large graph containing hundreds of billions of nodes and edges, how can we find all connected components efficiently on a cluster of commodity hardware? A connected component in a graph is a maximal subset of nodes connected by paths. Finding connected components is one of the most important tasks in the field of graph analysis with various applications including pattern recognition [1, 2], reachability indexing [3–5], graph compression [6–8], graph partitioning [9–11], random walk [12], etc. Meanwhile, billion- to trillion-scale graphs have emerged recently, which are very challenging to handle because of their enormous size.

Various methods have been proposed to efficiently find connected components in large graphs, which are common these days with the expansion of data on the Web. Parallel algorithms [13–15], external algorithms [16–19], and distributed-memory algorithms [20–25] run quickly on moderate-sized graphs. However, these algorithms fail when the graph is large because they have to load all data generated during the process, including the entire input graph, into the main memory (see Fig 14). Some algorithms [15, 25, 26] reportedly handle hundred-billion-scale graphs by exploiting expensive machines, but such machines are unaffordable for common data scientists.

Several MapReduce algorithms [26–31] try to handle large graphs by exploiting distributed storage but fail because they execute many distributed operations or suffer from the data explosion problem, which significantly increases the size of data as the computation proceeds and leads to massive disk and network I/O. To resolve these problems, recent MapReduce algorithms [32–34] propose two distinct star-operations and conduct them alternately, where a star-operation is a distributed operation that transforms the input graph into another graph with the same connectivity. Our question, then, is: can we improve the performance of the MapReduce algorithms by merging the two star-operations into one? The data explosion problem reappears if the two star-operations are combined carelessly.

In this paper, we propose UniStar, a unified star-operation, and UniCon, a new distributed algorithm using UniStar. UniStar avoids the data explosion problem by partition-aware processing, which partitions nodes and processes nodes in each partition together. We further optimize UniStar in two ways: 1) filtering dispensable edges to reduce intermediate data and 2) minimizing the memory consumption in workers by a custom data structure HybridMap. We summarize the main contributions of this paper as follows:

  • Algorithm. We propose UniStar, a unified star-operation that avoids the data explosion problem. We also propose UniCon, a fast and scalable distributed algorithm using UniStar for finding connected components in an enormous graph.
  • Theory. We prove the correctness and various properties of UniCon. We guarantee that the expected memory usage of a worker in UniCon is O((|V| + |E|)/ρ) where |V|, |E|, and ρ are the numbers of nodes, edges, and partitions, respectively.
  • Experiment. Extensive experiments show that UniCon outperforms the state-of-the-art distributed algorithms; UniCon runs as fast as distributed-memory algorithms and succeeds in processing a tremendous graph with 129 billion edges, using only 10 cheap machines each of which is equipped with Intel Xeon E3-1220 CPU (4-cores at 3.10GHz), 16GB RAM, and 2 SSDs of 1TB.

The codes and datasets used in this paper are available at https://github.com/UniCon2021/UniCon.

Related work

In this section, we review and compare existing methods for finding connected components in three categories: single-machine algorithms, distributed-memory algorithms, and MapReduce algorithms. We also describe how the proposed method is improved over the existing algorithms.

Single-machine algorithms

Traditional graph traversal algorithms such as breadth-first search and depth-first search find connected components in time linear in the size of the graph. Loading the entire graph into memory, they require O(|V| + |E|) memory space where |V| and |E| are the numbers of nodes and edges in the graph, respectively. Union-Find based algorithms [35–37] reduce the required memory space to O(|V|) by exploiting a parent pointer tree data structure. Our proposed method UniCon uses Rem [37], a Union-Find based single-machine algorithm, with a modification, as a module. Multi-core algorithms [13–15] reduce the running time by exploiting multi-core CPUs. ConnectIt [15] is the state-of-the-art multi-core algorithm showing the fastest performance by advanced optimization techniques such as edge sampling, tree linking, and tree compression. However, the multi-core algorithms, including ConnectIt, are limited in scalability because they require loading the entire graph into the main memory. To process a large graph using the above multi-core algorithms, we have to prepare an expensive machine with a huge memory capacity. For example, in [15], the authors use a machine with 72 cores and 1TB memory to process billion-scale graphs; our experiments show that ConnectIt fails to process large graphs on a commodity machine (see Section “Results on Real-world Datasets”).
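To make the Union-Find idea concrete, the following is a minimal Python sketch of a Union-Find structure with path compression that maps every node to the smallest node of its component. It is a simplified stand-in for Rem's variant, not the implementation used by UniCon, and all names are illustrative.

class UnionFind:
    # Minimal Union-Find with path compression; a simplified stand-in for Rem.
    def __init__(self, num_nodes):
        # p[u] points toward the representative (smallest) node of u's component
        self.p = list(range(num_nodes))
    def find(self, u):
        root = u
        while self.p[root] != root:
            root = self.p[root]
        while self.p[u] != root:          # path compression
            self.p[u], u = root, self.p[u]
        return root
    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:                      # attach the larger root under the smaller
            if ru < rv:
                self.p[rv] = ru
            else:
                self.p[ru] = rv

def connected_components(num_nodes, edges):
    # Map each node to the smallest node of its connected component
    uf = UnionFind(num_nodes)
    for u, v in edges:
        uf.union(u, v)
    return {u: uf.find(u) for u in range(num_nodes)}

# Toy usage: two components {0, 1, 2} and {3, 4}
print(connected_components(5, [(1, 0), (2, 1), (4, 3)]))   # {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}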

Distributed-memory algorithms

To improve speed and scalability, distributed-memory algorithms exploit the main memory of multiple machines to store the input and all intermediate data generated during the process. Pregel-like systems [18, 20–23, 38, 39] describe graph algorithms, including connected component computation, as a set of node-centric operations that repeatedly propagate the value of a node to neighboring nodes. FastSV [25], the state-of-the-art distributed-memory algorithm, and LACC [24] compute connected components in a linear algebraic way; they implement the Awerbuch-Shiloach algorithm [40] using the Combinatorial BLAS library [41], which provides several primitives to represent graph algorithms. However, the above distributed-memory algorithms fail when the intermediate data does not fit into the memory. To process a large graph with distributed-memory algorithms, we need an expensive cluster with massive memory capacity. For example, LACC and FastSV use Cray XC40, a supercomputer composed of more than 4,000 nodes (262,000 cores, 360 TB main memory), to process a graph of 50 billion edges. We show that LACC and FastSV fail to process large graphs on a cluster of commodity hardware (see Section “Results on Real-world Datasets”).

MapReduce algorithms

MapReduce [42] is a framework for processing large data using a cluster of multiple commodity machines. While distributed-memory algorithms are limited to moderate-sized graphs, MapReduce is suitable for handling enormous graphs as it processes data in an I/O efficient manner on a distributed file system. Several MapReduce algorithms [26–30, 32–34] have been proposed to find connected components in enormous graphs. Pegasus [28] propagates the label of each node to its neighbors using a distributed operation in each round. The number of rounds required by Pegasus is O(d) where d is the diameter of the graph. As each distributed operation takes non-trivial time, Pegasus does not scale well to large graphs. Hash-Greater-to-Min [29] reduces the number of rounds to O(log|V|), while Hash-to-Min, proposed in the same paper, runs faster in practice. Hash-to-Min builds initial clusters, each consisting of a node and its neighbors, and then merges the clusters in each round. Hash-to-Min, however, suffers from data explosion and load balancing problems. The alternating algorithm [32] resolves the data explosion problem by dividing the union operation of Hash-to-Min into two distributed operations, namely star-operations, and by executing them alternately. PACC [34] resolves the load balancing problem by redesigning the star-operations to partition the nodes; however, PACC still generates a large amount of intermediate data as it alternates the two star-operations like the alternating algorithm. Note that our method UniCon elaborately unifies the two distributed operations to reduce the amount of intermediate data without load balancing problems. Cracker [31] also improves on the alternating algorithm via vertex pruning and edge reduction. Stergiou et al. [26] propose a label-propagation based distributed method that guarantees a logarithmic number of rounds by shortcutting. The algorithm is reported to handle the largest dataset to date by exploiting a large-scale cluster of 5,000 workers, each of which has 128GB memory. Unfortunately, the algorithm is not tested in our experiments because it is not publicly available and not reproducible; it is implemented on Yahoo’s private graph processing system. We believe that the method cannot handle large graphs on commodity machines because, as described in [26], it requires loading the entire graph into memory every round to examine every edge in the graph.

Preliminaries

In this section, we define the problem of finding all connected components. Symbols frequently used in this paper are listed in Table 1.

Problem definition

Let G = (V, E) denote an undirected graph where V and E are the sets of nodes and edges, respectively. The nodes in V are totally ordered; u < v indicates that u precedes v (or v follows u). An edge between two nodes u and v is denoted as an ordered pair: (u, v) if u > v, (v, u) otherwise. We say two nodes u and v are connected if G contains a path from u to v. A connected component, shortly a component, of G is a maximal subset of V where all pairs are connected in G. Every node belongs to exactly one component. We denote the component containing node u by Λ(u, G). The problem of interest in this paper is finding all components in a given graph. This problem is equivalent to mapping each node u to the representative node in Λ(u, G). Even though any node in a component can be the representative node, we consider the most preceding node in the component to be the representative node. For a node set S, we denote the most preceding node in S by m(S). Then, we formally define the problem of finding connected components as follows:

Definition 1. (Finding All Connected Components) Given an undirected graph G = (V, E), the problem of finding all connected components is to map each node u ∈ V to the most preceding node m(Λ(u, G)) in the component containing u.

We define several symbols that are used to describe our method. For a graph G = (V, E) and a node u ∈ V, we denote by Γ−(u, G) = {v | (u, v) ∈ E} the set of preceding neighbors of u. Similarly, we denote by Γ+(u, G) = {v | (v, u) ∈ E} the set of following neighbors of u. Γ(u, G) = Γ−(u, G) ∪ Γ+(u, G) denotes the set of all neighbors of u. UniCon partitions the nodes by a random hash function ξ: V → [ρ] where ρ is the number of partitions and [ρ] = {0, …, ρ − 1} is the set of partition ids. ξ(u) is the partition id of node u. For a node set S, [S]i = {v ∈ S | ξ(v) = i} denotes the i-th partition of node set S. Each partition [V]i of node set V is totally ordered.
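As a toy illustration of the partitioning notation, the following Python sketch fixes a number of partitions ρ, a hash function ξ, and the partition [S]i of a node set S. The modulo hash is an assumption made only for illustration; UniCon may use any random hash function.

RHO = 4                                  # number of partitions rho (illustrative value)

def xi(u, rho=RHO):
    # Partition id of node u; any uniform hash over node ids would do
    return u % rho

def partition(S, i, rho=RHO):
    # [S]_i: the subset of node set S whose partition id is i
    return {v for v in S if xi(v, rho) == i}

# Example: [S]_1 of S = {0, ..., 9} with rho = 4 is {1, 5, 9}
print(partition(set(range(10)), 1))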

Proposed method

In this section, we propose UniCon, a new distributed algorithm for finding connected components. UniCon achieves high-speed and high-scalability by dealing with the following challenges.

  1. To avoid data explosion problems, existing MapReduce algorithms [32–34] divide the union operation of Hash-to-Min [29] into two star-operations, which still transfer massive data via the network. How do we reunite the two star-operations into one while resolving the data explosion problems? We propose a new star-operation UniStar that alleviates the data explosion problems by partition-aware processing; it removes duplicate edges in each partition and leads to early convergence as nodes jump to near the representative node through the edges in each partition. (Section “UniStar: The Unified Star Operation”)
  2. UniStar reads and writes all edges of the input graph, but most edges do not change anymore after several rounds. How do we identify such edges during the process and filter them out to minimize the size of data I/O? We elaborately design three types of edges that no longer contribute to updating the graph. UniCon filters out such dispensable edges and reduces the intermediate data size significantly. (Section “UniStar-opt: Filtering Out Dispensable Edges”)
  3. In UniStar, each worker uses a data structure to keep the preceding node for each node. It is easy to run out of memory if the data structure is inadequately designed, especially on commodity machines. How do we efficiently design the data structure and guarantee the memory consumption of UniCon? A hybrid data structure of an array and a hash table ensures that the expected memory size required by each worker is O((|V| + |E|)/ρ) while showing fast performance in practice, where ρ is the number of partitions. (Section “A Hybrid Map Data Structure”)

We describe the overall structure of UniCon. Algorithm 1 is the pseudocode of UniCon. UniCon consists of three steps: sketching, partitioning, and finishing. Fig 1 is an example showing the input and output of each step. The sketching step, proposed in [34], computes connected components on each chunk of the input graph to reduce the graph size and performs load balancing (line 2), where a chunk is a subset of edges stored consecutively in storage. In the partitioning step, UniCon partitions the input graph into ρ overlapping subgraphs by iteratively running the unified star-operation UniStar or its optimized version UniStar-opt (lines 3-9). If the number of input edges is at most a threshold τ, UniCon runs Rem instead of UniStar to reduce the number of rounds (lines 4-8). After the partitioning step, the nodes in each subgraph are connected to the representative node by paths so that the finishing step computes connected components correctly by independently processing each subgraph using Rem (line 10). We describe UniStar in Section “UniStar: The Unified Star Operation” and UniStar-opt in Section “UniStar-opt: Filtering Out Dispensable Edges” in detail.

Algorithm 1: UniCon

Input: E as an edge list and V where G = (V, E)

Output: {(u, m(Λ(u, G))) | u ∈ V}

1 F ← ∅;

2 E ← Sketching(E)          // of [34]

3 repeat

4 if |E| > τ then

5   E, ‘sep’ ← UniStar-opt(E)     // Algorithm 2

6   F ← F ∪ ‘sep’;

7 else

8   E ← Rem(E)          // of [37]

9 until Convergence;

10 return Finishing(V, E, F)

Fig 1. An example showing the input and output for each step of UniCon.

https://doi.org/10.1371/journal.pone.0277527.g001

UniStar: The unified star operation

We first demonstrate that combining two star-operations in a simple way causes data explosion problems. After that, we propose UniStar, a unified star-operation that resolves the data explosion problems.

UniStar-naïve.

One simple method to combine the two distributed operations is UniStar-naïve. For each node u, UniStar-naïve connects each node v ∈ Γ(u, G) ∪ {u} to m(Γ(u, G) ∪ {u}), but if v ≠ m([Γ(u, G) ∪ {u}]ξ(v)) then connects v to m([Γ(u, G) ∪ {u}]ξ(v)) for load balancing, like in [33]. In round 1 of Fig 2, for example, the neighbors Γ(4, G) ∪ {4} = {1, 2, 4, 5, 6, 8} of node 4 are connected to m([Γ(4, G) ∪ {4}]0) = 2 or m([Γ(4, G) ∪ {4}]1) = 1, and m([Γ(4, G) ∪ {4}]0) = 2 is connected to m(Γ(4, G) ∪ {4}) = 1.
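The following toy Python sketch imitates one round of UniStar-naïve on a single machine: every node independently processes its whole neighborhood. It assumes a modulo partition hash and in-memory processing, so it is only a sketch of the distributed operation, but it shows why edges get duplicated.

from collections import defaultdict

def xi(u, rho=2):
    return u % rho                        # illustrative partition hash (assumption)

def unistar_naive_round(edges, rho=2):
    # One round of UniStar-naive: every node u processes Gamma(u, G) ∪ {u} on its own.
    # In the distributed setting each node emits its edges separately, so the same
    # edge can be copied many times; the list below is kept as a multiset to show it.
    nbrs = defaultdict(set)
    for u, v in edges:                    # edges are ordered pairs (u, v) with u > v
        nbrs[u].add(v)
        nbrs[v].add(u)
    out = []
    for u, neigh in nbrs.items():
        group = neigh | {u}
        g_min = min(group)                # m(Gamma(u, G) ∪ {u})
        for v in group:
            # minimum of v's partition inside the neighborhood of u
            p_min = min(w for w in group if xi(w, rho) == xi(v, rho))
            target = g_min if v == p_min else p_min   # load-balancing rule
            if v != target:
                out.append((max(v, target), min(v, target)))
    return out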

Fig 2. UniStar-naïve causes a data explosion problem.

The more edges, the darker the red background color. The blue lines show how edge (17, 16) propagates; the edge is copied every round. The dashed line marks the partition.

https://doi.org/10.1371/journal.pone.0277527.g002

Simply combining the two distributed operations, however, causes a data explosion problem, which prolongs the running time significantly or leads to failure. Fig 2 demonstrates the data explosion problem caused by UniStar-naïve. The number of edges rises to 29 in round 2, while the number of input edges is 14. The number of edges increases because each edge is processed by both of its endpoint nodes. For example, UniStar-naïve in round 1 copies edge (17, 16) in the original graph to edges (17, 12) and (17, 16) since m(Γ(16, G) ∪ {16}) = 12 and m(Γ(17, G) ∪ {17}) = 16. The copied edges are copied again and again in subsequent rounds, causing the data explosion problem.

UniStar.

UniStar avoids the data explosion problem in two ways: partition-aware processing and excluding intact edges.

Partition-aware processing. For each partition i, partition-aware processing handles the nodes in partition i and their incident edges together on the same machine. Partition-aware processing has two advantages: it significantly reduces the number of edges by removing duplicate edges within each partition, and it accelerates convergence by providing opportunities for each node u to jump to near m(Λ(u, G)) through the edges in each partition. Let Gi = (Vi, Ei) be the subgraph of G induced by the set Ei of edges incident to the nodes in partition i. UniStar computes G′i = (V′i, E′i) by connecting each node u to m(Λ(u, Gi)), but if u ≠ m([Λ(u, Gi)]ξ(u)) then connecting u to m([Λ(u, Gi)]ξ(u)) for load balancing, where Λ(u, Gi) is Gi’s connected component that contains u. Then, the output graph of UniStar is G′ = (V′, E′) where V′ = ⋃i∈[ρ] V′i and E′ = ⋃i∈[ρ] E′i. Fig 3 shows an illustration of UniStar when the number ρ of partitions is 2. In this example, nodes 20 and 2 in G′0 are connected to nodes 2 and 1, respectively, because Λ(20, G0) = {1, 2, 4, 5, 6, 8, 11, 12, 16, 17, 20}, m([Λ(20, G0)]ξ(20)) = 2, and m(Λ(2, G0)) = 1.
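To illustrate the partition-aware step, here is a toy Python sketch that processes one subgraph Gi: it finds the components of Gi with an in-memory Union-Find and connects every node to the component minimum, or to its partition-local minimum for load balancing. The modulo partition hash and the fully in-memory processing are assumptions for illustration; the actual UniStar reduce function works on distributed data.

from collections import defaultdict

def xi(u, rho):
    return u % rho                        # illustrative partition hash (assumption)

def unistar_one_partition(E_i, rho):
    # Connect each node u of G_i to m(Lambda(u, G_i)) if u is the minimum of its
    # partition slice, otherwise to m([Lambda(u, G_i)]_{xi(u)}).
    parent = {}
    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]          # path halving
            u = parent[u]
        return u
    def union(u, v):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)      # keep the smaller node as root
    for u, v in E_i:
        union(u, v)
    comp = defaultdict(set)                        # group nodes by component minimum
    for u in list(parent):
        comp[find(u)].add(u)
    out = []
    for m_all, nodes in comp.items():
        for u in nodes:
            m_part = min(w for w in nodes if xi(w, rho) == xi(u, rho))
            target = m_all if u == m_part else m_part
            if u != target:
                out.append((u, target))            # u > target, so the pair is ordered
    return out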

Fig 3. An example of UniStar when the number of partitions ρ = 2.

Red and blue areas represent partitions 0 and 1, respectively. UniStar divides G into overlapping subgraphs G0 and G1, computes G′0 from G0 by connecting each node u ∈ V0 to m(Λ(u, G0)) = 1 or m([Λ(u, G0)]ξ(u)) = 2, and computes G′1 in the same way. Then, G′ is the union of G′0 and G′1.

https://doi.org/10.1371/journal.pone.0277527.g003

Theorem 1. The output graph G′ = (V′, E′) of UniStar has the same connectivity as the input graph G = (V, E).

Proof. If nodes u and v in Gi are connected by a path, then m(Λ(u, Gi)) = m(Λ(v, Gi)). All nodes u in G′i are connected to m(Λ(u, Gi)) directly or through m([Λ(u, Gi)]ξ(u)). It indicates that Gi and G′i have the same connectivity. G and G′ also have the same connectivity as E = ⋃i∈[ρ] Ei and E′ = ⋃i∈[ρ] E′i by definition.

Excluding intact edges. UniStar reduces the amount of data to process by excluding several intact edges when dividing the input graph G into the overlapping subgraphs Gi for i ∈ [ρ]. We say an edge is intact if the edge has not changed when transforming Gi to G′i, and we let Ii be the set of intact edges from partition i. An intact edge (u, v) ∈ Iξ(v) with ξ(u) ≠ ξ(v) implies that u has no path to another node through node v, and thus UniStar excludes the edge from Gξ(u) in the next round. In Fig 4, for example, blue edges in round r are intact, and the intact edge sets are I0 = {(4, 2)} and I1 = {(4, 1), (11, 9), (13, 9), (17, 16)}. Let G(r), G′(r), Gi(r), and G′i(r) be G, G′, Gi, and G′i in round r, respectively. Similarly, let Ii(r) be Ii in round r. The output of round r is the input of round r + 1, i.e., G′(r) = G(r + 1). UniStar divides G(r + 1) into Gi(r + 1) for i ∈ [ρ], and each edge (u, v) in G(r + 1) exists in the two subgraphs Gξ(u)(r + 1) and Gξ(v)(r + 1) if ξ(u) ≠ ξ(v). If edge (u, v) is an intact edge in Ii(r), meanwhile, UniStar excludes (u, v) from Gj(r + 1) for j ≠ i. In Fig 4, a non-intact edge exists in both G0(r + 1) and G1(r + 1), but the intact edge (4, 1) ∈ I1(r) exists only in G1(r + 1) and is excluded from G0(r + 1).

Fig 4. Intact edges in round r are marked in blue.

The output of UniStar in round r becomes the input in round r + 1. Intact edge (u, v) in round r such that ξ(u)≠ξ(v) is excluded in a subgraph of round r + 1. The excluded intact edges are marked with red dashed lines.

https://doi.org/10.1371/journal.pone.0277527.g004

A MapReduce version of UniStar is listed in Algorithm 2 (ignore the blue lines). Given an edge (u, v) such that u > v, the map function of UniStar emits 〈ξ(u);(u, v)〉 and 〈ξ(v);(u, v)〉 (lines 1-4) so that the input of the reduce function is the edge set Ei. If node u (or v) has a tag that tells that edge (u, v) was intact in the previous round, UniStar does not emit 〈ξ(u);(u, v)〉 (or 〈ξ(v);(u, v)〉), because edge (u, v) is excluded from Gξ(u) (or Gξ(v)). The tag is attached during the reduce function of UniStar in the previous round (lines 17 and 19). Given Ei, the reduce function of UniStar first finds m(Λ(u, Gi)) as p(u) for each node u ∈ Vi using the Rem algorithm (line 8) where Gi = (Vi, Ei) is the subgraph induced by Ei. A node u ∈ c means that edge (u, p(u)) has changed. For each node v in each connected component pu of Gi where u = m(pu), UniStar emits 〈v, u〉 if v = m([pu]ξ(v)), otherwise it emits 〈v, m([pu]ξ(v))〉 (lines 9-11, 15-19, and 22-23). If v = m([pu]ξ(v)) and v ∉ c, UniStar attaches a tag to the emitted pair 〈v;u〉 meaning that edge (v, u) is intact (lines 15-19). The tag is used for excluding intact edges in the next round.

Algorithm 2: UniStar-opt

Map: input 〈u;v〉 such that u > v

1 if u has no tag then

2  emit 〈ξ(u);(u, v)〉

3 if v has no tag and ξ(u)≠ξ(v) then

4  emit 〈ξ(v);(u, v)〉

Reduce: input 〈i;Ei

5 Let Gi = (Vi, Ei) be the subgraph induced by Ei

6 Compute |Γ+(u, Gi)| and |Γ(u, Gi)| for each u ∈ Vi ∩ [V]i

7 Clear all tags in Ei

 /* p: set of key-value pairs where each key is a node u and the value p(u) is m(Λ(u, Gi)) */

 /* c: set of keys of changed pairs in p: (u, p(u)) ∉ Ei if u ∈ c */

8 p, c← findCCInPartition(Ei)      // Rem [37] with change tracking

9 foreach u ∈ the set of unique values in p do

10  let pu be the set of all keys whose value is u in p

11 foreach v ∈ pu\{u} do

12   if ξ(v) = i and |Γ(v, Gi)| = |Γ+(v, Gi)| = 0 then

      // case 3: do nothing

13   else if ξ(u) = i and |Γ(u, Gi)| = 0 then

14    emit 〈v, u〉 to ‘sep’      // case 2

15   else if v = m([pu]ξ(v)) then

16    if v ∉ c and ξ(v) = i and ξ(u) ≠ i then

17     emit 〈v;u with a tag〉

18    else if v ∉ c and ξ(v) ≠ i then

19     emit 〈v with a tag;u

20    else if ξ(v) = i and |Γ+(v, Gi)| = 0 then

21     emit 〈v, m([pu]ξ(v))〉 to ‘sep’      // case 1

22    else

23     emit 〈v, m([pu]ξ(v))〉
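For concreteness, the Map step of Algorithm 2 (lines 1-4) can be transcribed into Python roughly as follows. The function signature and the boolean tag flags are illustrative assumptions, not the paper's Hadoop API.

def xi(node, rho):
    return node % rho                     # illustrative partition hash (assumption)

def unistar_opt_map(u, v, u_tagged, v_tagged, rho):
    # Map step for one edge (u, v) with u > v. A tag on an endpoint means the edge
    # was intact in the previous round and is excluded from that endpoint's subgraph.
    pairs = []
    if not u_tagged:
        pairs.append((xi(u, rho), (u, v)))        # send the edge to partition xi(u)
    if not v_tagged and xi(u, rho) != xi(v, rho):
        pairs.append((xi(v, rho), (u, v)))        # and to partition xi(v) if different
    return pairs

# Example: an untagged edge (7, 2) with rho = 2 goes to partitions 1 and 0
print(unistar_opt_map(7, 2, False, False, 2))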

UniStar-opt: Filtering Out Dispensable Edges

In this section, we propose UniStar-opt that reduces the size of processed data significantly by filtering out dispensable edges. We have noticed that, even if we filter out a considerable number of edges during UniStar, UniCon is able to compute the connected components of a graph correctly. UniStar-opt excludes such filtered edges from the input of subsequent rounds so that the amount of disk and network I/Os decreases dramatically (see Fig 10). UniStar-opt filters out an edge in three cases:

  1. Case 1. ξ(u) = ξ(v) = i and Γ+(u, Gi) = ∅ where Γ+(u, Gi) = {w|(w, u)∈Ei}
  2. Case 2. ξ(v) = i and Γ(v, Gi) = ∅ where Γ(v, Gi) = {w|w has no tag, (w, v)∈Ei}∪{w|w has no tag, (v, w)∈Ei}
  3. Case 3. ξ(u) = i and Γ(u, Gi) = Γ+(u, Gi) = ∅

UniStar-opt accumulates the edges of cases 1 and 2 over several rounds into ‘sep’ and uses ‘sep’ as the input of the finishing step. The edges of case 3 are simply discarded. In case 1, edge (u, v) belongs entirely to partition i, and node u has no following neighbor. By the definition of Gi, all edges incident to node u in G are also incident to node u in Gi if ξ(u) = i; this fact guarantees that, if node u has no following neighbor in Gi, there is no node w connected to m(Λ(w, G)) through (u, v). Thus, it is safe for UniStar-opt to exclude the edge (u, v) from the input of the next round. Note that, even if node u is not yet connected to m(Λ(u, G)), the finishing step eventually connects u to m(Λ(u, G)). Cases 2 and 3 are for the edges that remain unchanged in subsequent rounds of UniStar since they are in connected components that have already been discovered. In case 2, every edge incident to node v is intact in Gi where ξ(v) = i. In this case, v is m(Λ(v, G)) and all w ∈ Λ(v, Gi) are directly connected to v. That is, the edges do not change anymore, and UniStar-opt filters them out safely. In case 3, the only edge incident to node u is (u, v), which is intact in Gi where ξ(u) = i. UniStar-opt discards (u, v) from Gξ(u) because node v has no chance to connect with another node through node u and the same edge also exists in Gξ(v).
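Assuming the counts of Algorithm 2 (lines 5-6) are available, the three filtering cases can be checked in constant time per edge, roughly as in the Python sketch below; the argument names and returned labels are illustrative, not the paper's implementation.

def classify_edge(u, v, i, xi, n_following_u, n_untagged_u, n_untagged_v):
    # Decide how UniStar-opt treats an edge (u, v) with u > v in partition i.
    #   n_following_u     : |Gamma+(u, G_i)|, the number of following neighbors of u
    #   n_untagged_u / _v : number of non-intact (untagged) edges incident to u / v
    # Returns 'drop' (case 3), 'sep' (cases 1 and 2), or 'keep'.
    if xi(u) == i and n_untagged_u == 0 and n_following_u == 0:
        return 'drop'   # case 3: every edge at u is intact and u has no following neighbor
    if xi(v) == i and n_untagged_v == 0:
        return 'sep'    # case 2: v's component is already settled
    if xi(u) == i and xi(v) == i and n_following_u == 0:
        return 'sep'    # case 1: no node reaches its minimum through (u, v)
    return 'keep'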

Theorem 2. For an edge (u, v) such that ξ(u)≠ξ(v), if (u, v) is the only edge incident to node u in Gξ(u), (u, v) also exists in Gξ(v).

Proof. A logically equivalent claim: if (u, v) is in Eξ(u), then (u, v) ∈ Eξ(v) or (u, w) ∈ Eξ(u) for some w ≠ v. The claim is directly true for non-intact edges by the definition of Ei: a non-intact edge (u, v) exists in both Eξ(u) and Eξ(v) if ξ(u) ≠ ξ(v). All edges in the original graph are non-intact. Assume that a non-intact edge (u, v) changes to an intact edge in partition ξ(u) of round r, i.e., (u, v) ∉ Iξ(u)(r − 1) and (u, v) ∈ Iξ(u)(r). Edge (u, v) also exists in Eξ(v)(r) as (u, v) is non-intact. If (u, v) is in E′ξ(v)(r), (u, v) belongs to Iξ(v)(r) and (u, v) exists in both Eξ(u)(r + 1) and Eξ(v)(r + 1), following the claim. If (u, v) is not in E′ξ(v)(r), it implies (u, w) is in E′ξ(v)(r) for some w ≠ v. Then, (u, w) belongs to Eξ(u)(r + 1), following the claim.

Fig 5 shows an illustration of UniStar-opt when the threshold τ is 4. The edges filtered by cases 1, 2, and 3 are marked with orange, green, and purple dashed lines, respectively. Blue lines are intact edges. Fig 5(a) is an input graph G consisting of 4 connected components and 11 edges. In round 1 of Fig 5(b), edges (4, 2) and (20, 12) in G0 and edges (15, 9) and (19, 9) in G1 are filtered by case 1. For example, edge (4, 2) is in case 1 because node 4 has no following neighbor and nodes 2 and 4 are in the same partition. In round 2 of Fig 5(c), edge (6, 5) in G1 is filtered by case 2 because ξ(5) = 1 and all edges incident to node 5 in G1 are intact. In the same round, edge (6, 5) in G0 is discarded by case 3 because ξ(6) = 0 and edge (6, 5) is intact and the only edge incident to node 6 in G0. The number of remaining edges shrinks quickly every round, and round 3 is the last round because the number of remaining edges is less than τ = 4. After running the Rem algorithm on the graph induced by the remaining edges, the output edges of Rem and the edges filtered by cases 1 and 2 together become the input of the finishing step as in Fig 5(e). In this example, the input and output of Rem are the same.

Fig 5. An example of UniStar-opt when the threshold τ = 4.

Intact edges are represented as blue lines. The edges filtered by cases 1, 2, and 3 are marked with orange, green, and purple dashed lines, respectively. As the number of remaining edges after round 3 is less than τ = 4, the output edges of Rem on the graph induced by the remaining edges and the edges filtered by cases 1 and 2 become the input of the finishing step.

https://doi.org/10.1371/journal.pone.0277527.g005

A MapReduce version of UniStar-opt, the optimized version of UniStar, is listed in Algorithm 2; lines added or modified from UniStar are marked in blue. The reduce function processes the edges of the three filtering cases at lines 20-21 (case 1), lines 13-14 (case 2), and line 12 (case 3), respectively. By computing |Γ+(u, Gi)| and |Γ(u, Gi)| for each node u ∈ Vi ∩ [V]i in advance (lines 5-6), UniStar-opt checks in constant time which case each edge falls into.

A hybrid map data structure

To figure out m(Λ(u, Gi)) for each node u, UniStar (as well as UniStar-opt) uses the Rem algorithm with a modification for tracking changes. The original Rem algorithm uses an array of size |V| for the mapping table p that maps each node u to m(Λ(u, Gi)). If UniStar used the original Rem algorithm, each worker processing a subgraph Gi = (Vi, Ei) would require |V| memory space for the mapping table p and cause an out-of-memory error when |V| exceeds the memory size of a worker; even though |Vi| is much smaller than |V|, the array size for p must be |V| since every node may belong to Vi. One easy solution to avoid an out-of-memory error is to use a hash table for p instead of an array; it is guaranteed that the memory space required by a hash table is O(|Vi|). However, accessing values by key in a hash table is 10 to 100 times slower than accessing values by index in an array.

We propose HybridMap, a data structure that guarantees fast performance and low memory usage by exploiting the fact that Gi is induced by Ei. HybridMap takes advantage of both an array and a hash table; an array is used for nodes in [V]i, and a hash table is used for nodes not in [V]i. As Ei is the set of edges incident to nodes in [V]i, most nodes in Vi are in [V]i. This means that most accesses to p are performed quickly on the array, with far fewer accesses to the hash table. HybridMap requires as much memory space as the size of the array plus the number of items in the hash table. The array size equals the cardinality of [V]i as the nodes are totally ordered. The number of items in the hash table is |Vi\[V]i|, and it is bounded by |Ei| = O(|V|/ρ) because the initialization step of UniCon makes |E| = α|V| for a constant α [34]. The hash table size |Vi\[V]i| is much smaller than the array size |[V]i| in practice (see Section “Efficacy of HybridMap”).
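The idea behind HybridMap can be sketched in a few lines of Python under the assumption that ξ(u) = u mod ρ, so the nodes of partition i index densely into an array via u // ρ; the class and method names are illustrative, not the paper's implementation.

class HybridMap:
    # Node-to-value map for the worker of partition i: an array for the frequent
    # nodes in [V]_i (fast index access) and a dict for the few nodes of V_i
    # outside [V]_i (their number is bounded by |E_i|).
    def __init__(self, i, rho, num_local_nodes):
        self.i = i                               # partition id handled by this worker
        self.rho = rho                           # number of partitions
        self.arr = [None] * num_local_nodes      # one slot per node of [V]_i
        self.other = {}                          # nodes of V_i not in [V]_i
    def _is_local(self, u):
        return u % self.rho == self.i            # assumed hash: xi(u) = u mod rho
    def get(self, u, default=None):
        if self._is_local(u):
            val = self.arr[u // self.rho]        # dense index of u within [V]_i
            return default if val is None else val
        return self.other.get(u, default)
    def put(self, u, value):
        if self._is_local(u):
            self.arr[u // self.rho] = value
        else:
            self.other[u] = value

# Usage: the worker of partition 1 with rho = 4 and node ids below 100
m = HybridMap(i=1, rho=4, num_local_nodes=25)
m.put(5, 1)                                      # node 5 is local (5 % 4 == 1): array
m.put(6, 2)                                      # node 6 is not local: dict
print(m.get(5), m.get(6))                        # -> 1 2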

Theorem 3. UniStar with HybridMap requires O((|V| + |E|)/ρ) memory space to compute G′i from Gi.

Proof. UniStar stores one value for each node of Vi in HybridMap. HybridMap uses an array for nodes in [V]i and a hash table for nodes in Vi\[V]i. As UniStar partitions the nodes randomly, the expected size of [V]i is |V|/ρ. As Gi is induced by Ei, |Vi\[V]i| is at most |Ei|, which occurs when every edge contains a distinct node not belonging to [V]i. Assuming that the edges are evenly distributed by a random hash function, the expected size of Ei is 2|E|/ρ; every edge can be in two subgraphs. Thus, HybridMap requires O(|[V]i| + |Vi\[V]i|) = O((|V| + |E|)/ρ).

UniStar with HybridMap avoids an out-of-memory error by setting ρ to be O((|V| + |E|)/M) where M is the memory size of a worker. Large ρ decreases the benefit of UniStar’s partition-aware processing and increases the running time. Thus, setting ρ as low as possible is good for performance, even though the running time does not increase much as ρ increases (see Fig 7).

Experiments

In this section, we aim to answer the following questions from the experiments:

  1. Q1 Efficacy of UniStar (Section “Efficacy of UniStar”). How much intermediate data does UniStar reduce to resolve the data explosion problem?
  2. Q2 Efficacy of edge filtering (Section “Efficacy of Edge Filtering”). How many edges are filtered by the edge filtering of UniStar-opt?
  3. Q3 Efficacy of HybridMap (Section “Efficacy of HybridMap”). How efficient is UniCon using HybridMap compared to using an array or a hash table?
  4. Q4 Scalability (Section “Scalability”). How does UniCon scale up in terms of the number of machines and the data size?
  5. Q5 Performance on real-world datasets (Section “Results on Real-world Datasets”). How well does UniCon perform on real-world graphs compared to state-of-the-art algorithms?

Experimental settings

Datasets.

We evaluate UniCon with the real-world graphs summarized in Table 2. TW is a follower-followee network in Twitter. LJ and FS are friendship networks in the social networking services LiveJournal and Friendster, respectively. SD is a domain-level hyperlink network. GSH, CW, and HL are page-level hyperlink networks. RMAT-k for k ∈ {21, 23, 25, 27, 29, 31, 33} is a synthetic graph following the R-MAT model [43], and we generate it using TeGViz [44], a distributed graph generator. We set the R-MAT parameters (a, b, c, d) to (0.57, 0.19, 0.19, 0.05).

Machines.

The cluster used in the experiment consists of 10 machines, and each machine is equipped with Intel Xeon E3-1220 CPU (4-cores at 3.10GHz), 16GB RAM, and 2 SSDs of 1TB. Hadoop v3.2.1, Spark v3.0.1, and MPICH v3.3 are installed. One machine of the cluster acts as the master and also as a worker, and the others act as workers. Single machine algorithms are tested on the master node.

Algorithms.

We implement three versions of UniCon (UniCon-naïve, UniCon-base, UniCon-opt) on Hadoop. For a fair comparison, we use the original codes from the authors of the competitors: Cracker, FastSV, LACC, PowerGraph, PACC, and ConnectIt. All codes are publicly available on the Web. Rem is implemented with C++11.

  • UniCon-naïve: the naïve version of the proposed method described in Section “UniStar-naïve”.
  • UniCon-base: the proposed method in Section “UniStar”.
  • UniCon-opt: the proposed method in Section “UniStar-opt: Filtering Out Dispensable Edges”.
  • PACC [34]: the state-of-the-art MapReduce algorithm.
  • Cracker [30]: a MapReduce algorithm implemented in Apache Spark. We add Cracker here because Cracker and PACC have not been tested together so far.
  • PowerGraph [22]: the connected component algorithm implemented on PowerGraph, a representative pregel-like distributed-memory graph processing system.
  • LACC [24], FastSV [25]: the state-of-the-art distributed-memory algorithms.
  • ConnectIt [15]: the state-of-the-art multi-core algorithm. UF-Rem-CAS and LDD sampling are used since the combination is the fastest according to [15].
  • Rem [37]: the fastest sequential algorithm based on Union-Find according to [35].

Parameters.

Unless otherwise noted, we use all 10 workers. To find the optimal condition for each method, we vary the parameter values and compare the running time. Fig 6 shows the effect of the threshold τ on the running time of UniCon-opt on each dataset. When processing GSH, CW, and HL with τ = 2000M, UniCon-opt gets an out-of-memory error as it tries to handle large data using only a single machine. We use the optimal τ for UniCon-opt unless otherwise noted.

Fig 7 shows the running time of UniCon-opt and UniCon-base with various partition numbers ρ. The running time soars when ρ is too small because the methods do not exploit all the workers. Also, the running time tends to increase as ρ increases because of computational overhead, but the increase in running time is marginal when the graph is large enough, e.g., GSH, CW, and HL. This implies that, when the graph is enormous, UniCon can avoid an out-of-memory error by increasing ρ, with only a slight increase in running time. Accordingly, we set the number of partitions ρ to 280 for CW. Both PACC and Cracker perform best when the number of partitions is 20, and PowerGraph, LACC, and FastSV perform best when the number of processors is 10, 4, and 36, respectively; thus, we use these as the default values.

Fig 7. The running time of UniCon-opt and UniCon-base on various partition number ρ.

With large data like GSH, CW, and HL, when ρ is greater than each optimal value, the running time of UniCon increases marginally as ρ increases.

https://doi.org/10.1371/journal.pone.0277527.g007

Efficacy of UniStar

We compare UniCon-base and UniCon-naïve to show the effects of UniStar; UniCon-base uses UniStar and UniCon-naïve uses UniStar-naïve. For the two operations, Fig 8 shows the input and intermediate data sizes in each round, where the intermediate data is the output of the Map step in the MapReduce implementation. Thanks to partition-aware processing and the exclusion of intact edges, UniStar effectively resolves the data explosion problem that UniStar-naïve suffers from. UniStar reduces the input data and intermediate data sizes significantly, by up to 4× and 8×, respectively. UniStar-naïve fails to process CW and HL because of massive intermediate data.

Fig 8. The input and intermediate data sizes of UniStar and UniStar-naïve each round.

UniStar reduces both sizes by up to 4× and 8×, respectively, compared to UniStar-naïve.

https://doi.org/10.1371/journal.pone.0277527.g008

Fig 9 shows the running time of UniStar and UniStar-naïve, and their cumulative sums. UniStar is 2-5 times faster than UniStar-naïve in every case and UniCon-base takes fewer rounds than UniCon-naïve; accordingly, UniCon-base reduces the total running time by up to 4.3× compared to UniCon-naïve.

Fig 9. The running time of UniStar and UniStar-naïve, and their cumulative sums, respectively.

UniStar is faster than UniStar-naïve in all rounds and requires fewer rounds.

https://doi.org/10.1371/journal.pone.0277527.g009

Efficacy of edge filtering

The lines in Fig 10 show the numbers of input edges to UniCon-opt (τ = 0) and UniCon-base in each round. We fix τ to 0 to show the efficacy of edge filtering, which is applied in UniCon-opt. The edge filtering of UniCon-opt decreases the input size rapidly for every dataset. Meanwhile, UniCon-base, which has no edge filtering, takes a huge amount of input every round. In round 12 of UniCon-opt, the number of input edges increases because the last round of UniCon-opt is the finishing step; it takes as input the output edges of the partitioning step and the edges filtered by cases 1 and 2. UniCon-opt shrinks the input size by 80.4% on average every round. The bars in four colors represent the input-size reductions from four cases: excluded intact edges (Section “UniStar: The Unified Star Operation”) and the edges filtered by the three cases in Section “UniStar-opt: Filtering Out Dispensable Edges”. All four cases contribute greatly to reducing the size of the input data. Note that TW, FS, and GSH have only one connected component, so there are no edges filtered by case 2, in which edges belonging to early discovered connected components are filtered.

Fig 10. The number of input edges (denoted by lines) for UniCon-opt (τ = 0) and UniCon-base each round.

By filtering dispensable edges (denoted by bars), UniCon-opt shrinks the input size by 80.4% on average every round on CW.

https://doi.org/10.1371/journal.pone.0277527.g010

Fig 11 shows the running time of UniStar-opt (τ = 0) and UniCon-base, and their cumulative sums, in each round. The running time of UniStar-opt drops dramatically as the input size plummets every round.

Fig 11. The running time of UniStar-opt and UniStar each round.

By the edge filtering, the running time of UniStar-opt drops quickly.

https://doi.org/10.1371/journal.pone.0277527.g011

Efficacy of HybridMap

UniCon-opt uses HybridMap to keep the connectivity of nodes in each partition. To show the effectiveness of HybridMap, we compare the original UniCon-opt with versions in which HybridMap is replaced with arrays and hash tables, respectively. Fig 12 shows the average size of data that UniCon-opt (τ = 0) stores in memory on each worker, and Fig 13 shows the running time of UniCon-opt, when the underlying data structure is HybridMap, an array, or a hash table. HybridMap takes only O((|V| + |E|)/ρ) space for each worker in each round, as analyzed theoretically in Theorem 3, while UniCon-opt with arrays fails in processing GSH, CW, and HL because |V| of GSH, CW, and HL exceeds the memory size of a worker. Although HybridMap takes more space than a stand-alone hash table, the space is already sufficiently small, and HybridMap reduces the running time by using an array for frequently accessed nodes. The size of the data actually stored in the hash table of HybridMap is much smaller than the theoretical bound O(|E|/ρ). UniCon-opt with HybridMap shows the best performance in terms of speed every round, and its cumulative sum is also the lowest, except on LJ and TW. UniCon-opt with arrays shows a long running time, even though an array access is much faster than a hash-table access, because it allocates an array of size |V| on every worker, which takes a long time. The running time of UniCon-opt with HybridMap is 22.7% lower than that of UniCon-opt with hash tables on CW.

Fig 12. The number of items stored in HybridMap, a hash table, and an array in each round of UniCon-opt.

The Y-axis is in a log scale. HybridMap significantly reduces the data stored in memory compared to an array, letting UniCon-opt succeed in processing the large graphs GSH, CW, and HL. HybridMap and a hash table have no significant difference in terms of the peak number of stored items.

https://doi.org/10.1371/journal.pone.0277527.g012

Fig 13. The running time of UniStar-opt with HybridMap, a hash table, and an array, respectively, each round.

UniCon-opt with HybridMap outperforms UniCon-opt with hash tables when the graph is large enough (FS, SD, GSH, CW, and HL) thanks to the fast random accesses of HybridMap. UniCon-opt with arrays fails on large graphs (GSH, CW, and HL) because of an out-of-memory error.

https://doi.org/10.1371/journal.pone.0277527.g013

Scalability

Fig 14 shows (left) the data scalability and (right) the machine scalability of all algorithms in Section “Algorithms”. In the left figure, RMAT graphs with various sizes are used for the data scalability analysis. Both axes are in a log scale. UniCon and PACC show sub-linear scalability with slopes under 1. This is because Hadoop takes a long time when processing small data due to start-up and clean-up overheads. Distributed-memory algorithms (PowerGraph, LACC, FastSV), Cracker, ConnectIt, and Rem show near-linear scalability but fail on medium-sized graphs because of out-of-memory errors. As a result, UniCon processes a 4096× larger graph than LACC and FastSV, a 1024× larger graph than PowerGraph and ConnectIt, a 256× larger graph than Cracker, and a 4× larger graph than Rem. Even though Rem is a sequential algorithm, Rem handles larger graphs than the distributed-memory algorithms because Rem stores only nodes in memory while the distributed-memory algorithms must store the entire graph in memory. According to [25] and [15], LACC, FastSV, and ConnectIt can reportedly process graphs with 100 billion edges when expensive machines are available (e.g., a supercomputer with 262K cores for LACC and FastSV, and an expensive server with 72 cores and 1TB memory for ConnectIt). On the commodity cluster of 10 cheap machines used in this experiment, however, they cannot even process a graph containing only a billion edges.

Fig 14. Data and machine scalability.

(left) UniCon handles up to 4096× larger graphs than competitors. (right) UniCon-opt shows the best performance regardless of the number of machines.

https://doi.org/10.1371/journal.pone.0277527.g014

In the right figure, the machine scalability analysis shows the running time on various numbers of machines. TW is used. Both axes are in a log scale. All distributed-memory algorithms and MapReduce algorithms are tested, but Cracker, PowerGraph, LACC, and FastSV are omitted here because they fail to process TW due to out-of-memory errors. UniCon-opt shows the best performance regardless of the number of machines. The slope of UniCon-opt from 2 to 10 machines is -0.67, meaning that the running time decreases by 1.59× when the number of machines doubles.

Results on Real-world Datasets

Fig 15 shows the relative running time, compared to UniCon-opt, of all the algorithms in Section “Algorithms” on the real-world graphs listed in Table 2. UniCon-opt shows the best performance on all graphs except LJ. All distributed-memory algorithms (PowerGraph, LACC, FastSV), ConnectIt, and Cracker fail on all graphs except LJ because of out-of-memory errors. Even on LJ, UniCon-opt is faster than LACC and FastSV, while ConnectIt is the fastest. Only UniCon and PACC succeed in processing CW, the largest real-world graph tested in this experiment. UniCon-opt outperforms PACC for all graphs; the speed of UniCon-opt is 143% of PACC’s.

Fig 15. The relative running time, compared to UniCon-opt, of competitors on real-world graphs.

o.o.m.: out-of-memory error. UniCon-opt with optimal τ outperforms all competitors on all graphs except for LJ. Rem, ConnectIt, and PowerGraph are faster than UniCon-opt on LJ but fail on all other graphs because of out-of-memory errors.

https://doi.org/10.1371/journal.pone.0277527.g015

Fig 16 shows the numbers of rounds required by all algorithms except Rem on the real-world graphs. For PACC, one execution of a star-operation is counted as one round. Thanks to partition-aware processing, UniCon-opt requires fewer rounds than PACC, Cracker, FastSV, and LACC, taking up to 11 fewer rounds than competitors. UniCon-opt runs more rounds than PACC on CW because the optimal τ = 2M of UniCon-opt is small. This implies that UniStar-opt is more efficient than PACC’s star-operations, and thus UniCon-opt switches to a single-machine algorithm only when the data size has been sufficiently reduced.

Fig 16. The numbers of rounds required by all algorithms except Rem on real-world graphs.

UniCon-opt requires up to 11 fewer rounds than competitors.

https://doi.org/10.1371/journal.pone.0277527.g016

Conclusion

In this paper, we propose UniStar, a unified star-operation, and UniCon, a new distributed algorithm for finding connected components in an enormous graph using UniStar. The partition-aware processing of UniStar effectively avoids the data explosion problem, reducing the intermediate data size by up to 87.5% compared to UniStar-naïve. The edge filtering of UniCon shrinks the size of the input data by 80.4% on average each round. The HybridMap data structure of UniCon ensures that the memory consumption of each worker is O((|V| + |E|)/ρ), where ρ is the number of partitions, and improves performance by 22.7% compared to using a typical hash table. As a result, on a commodity cluster, UniCon handles up to 4096 times larger graphs than the graphs competitors can process. With a cluster of only 10 cheap machines, UniCon succeeds in processing a graph containing 129 billion edges, showing the fastest performance.

References

  1. He L, Chao Y, Suzuki K, Wu K. Fast connected-component labeling. Pattern Recognition. 2009;42(9):1977–1987.
  2. Kang U, McGlohon M, Akoglu L, Faloutsos C. Patterns on the Connected Components of Terabyte-Scale Graphs. In: Webb GI, Liu B, Zhang C, Gunopulos D, Wu X, editors. ICDM. IEEE Computer Society; 2010. p. 875–880.
  3. Lyu Q, Li Y, He B, Gong B. DBL: Efficient Reachability Queries on Dynamic Graphs. In: DASFAA. vol. 12682 of Lecture Notes in Computer Science. Springer; 2021. p. 761–777.
  4. Medini D, Covacci A, Donati C. Protein Homology Network Families Reveal Step-Wise Diversification of Type III and Type IV Secretion Systems. PLoS Computational Biology. 2006;2(12). pmid:17140285
  5. Albert R. Scale-free networks in cell biology. Journal of Cell Science. 2005;118(21):4947–4957. pmid:16254242
  6. Aktas ME, Nguyen T, Akbas E. Homology Preserving Graph Compression. In: ICMLA; 2021. p. 930–935.
  7. Kang U, Faloutsos C. Beyond ‘Caveman Communities’: Hubs and Spokes for Graph Compression and Mining. In: Cook DJ, Pei J, Wang W, Zaïane OR, Wu X, editors. ICDM. IEEE Computer Society; 2011. p. 300–309.
  8. Lim Y, Kang U, Faloutsos C. SlashBurn: Graph Compression and Mining beyond Caveman Communities. TKDE. 2014;26(12):3077–3089.
  9. Bruglieri M, Cordone R. Metaheuristics for the Minimum Gap Graph Partitioning Problem. Computers and Operations Research. 2021;132:105301.
  10. Lim Y, Lee W, Choi H, Kang U. Discovering large subsets with high quality partitions in real world graphs. In: BIGCOMP. IEEE Computer Society; 2015. p. 186–193.
  11. Lim Y, Lee W, Choi H, Kang U. MTP: discovering high quality partitions in real world graphs. WWW. 2017;20(3):491–514.
  12. Jung J, Shin K, Sael L, Kang U. Random Walk with Restart on Large Graphs Using Block Elimination. TODS. 2016;41(2):12:1–12:43.
  13. Shiloach Y, Vishkin U. An O(log n) Parallel Connectivity Algorithm. J Algorithms. 1982;3(1):57–67.
  14. Bader DA, Cong G. A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs). J Parallel Distributed Comput. 2005;65(9):994–1006.
  15. Dhulipala L, Hong C, Shun J. ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms. CoRR. 2020;abs/2008.03909.
  16. Kim M, Lee S, Han W, Park H, Lee J. DSP-CC-: I/O Efficient Parallel Computation of Connected Components in Billion-Scale Networks. TKDE. 2015;27(10):2658–2671.
  17. Kyrola A, Blelloch GE, Guestrin C. GraphChi: Large-Scale Graph Computation on Just a PC. In: Thekkath C, Vahdat A, editors. OSDI. USENIX Association; 2012. p. 31–46.
  18. Low Y, Gonzalez JE, Kyrola A, Bickson D, Guestrin C, Hellerstein JM. GraphLab: A New Framework For Parallel Machine Learning. CoRR. 2014;abs/1408.2041.
  19. Zheng D, Mhembere D, Burns RC, Vogelstein JT, Priebe CE, Szalay AS. FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs. In: FAST. USENIX Association; 2015. p. 45–58.
  20. Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, et al. Pregel: a system for large-scale graph processing; 2010. p. 135–146.
  21. Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I. GraphX: Graph Processing in a Distributed Dataflow Framework. In: Flinn J, Levy H, editors. OSDI. USENIX Association; 2014. p. 599–613.
  22. Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In: OSDI; 2012. p. 17–30.
  23. Chen R, Shi J, Chen Y, Chen H. PowerLyra: differentiated graph computation and partitioning on skewed graphs. In: EuroSys; 2015. p. 1:1–1:15.
  24. Azad A, Buluç A. LACC: A Linear-Algebraic Algorithm for Finding Connected Components in Distributed Memory. In: IPDPS. IEEE; 2019. p. 2–12.
  25. Zhang Y, Azad A, Hu Z. FastSV: A Distributed-Memory Connected Component Algorithm with Fast Convergence. In: Biros G, Yang UM, editors. PPSC. SIAM; 2020. p. 46–57.
  26. Stergiou S, Rughwani D, Tsioutsiouliklis K. Shortcutting Label Propagation for Distributed Connected Components. In: Chang Y, Zhai C, Liu Y, Maarek Y, editors. WSDM. ACM; 2018. p. 540–546.
  27. Turifi M. Optimisation techniques for finding connected components in large graphs using GraphX. The University of Salford; 2018.
  28. Kang U, Tsourakakis CE, Faloutsos C. PEGASUS: A Peta-Scale Graph Mining System. In: ICDM; 2009. p. 229–238.
  29. Rastogi V, Machanavajjhala A, Chitnis L, Sarma AD. Finding connected components in map-reduce in logarithmic rounds. In: ICDE; 2013. p. 50–61.
  30. Lulli A, Ricci L, Carlini E, Dazzi P, Lucchese C. Cracker: Crumbling large graphs into connected components. In: ISCC. IEEE Computer Society; 2015. p. 574–581.
  31. Lulli A, Carlini E, Dazzi P, Lucchese C, Ricci L. Fast Connected Components Computation in Large Graphs by Vertex Pruning. IEEE Trans Parallel Distributed Syst. 2017;28(3):760–773.
  32. Kiveris R, Lattanzi S, Mirrokni VS, Rastogi V, Vassilvitskii S. Connected Components in MapReduce and Beyond. In: SOCC; 2014. p. 18:1–18:13.
  33. Park H, Park N, Myaeng S, Kang U. Partition Aware Connected Component Computation in Distributed Systems. In: ICDM; 2016. p. 420–429.
  34. Park HM, Park N, Myaeng SH, Kang U. PACC: Large scale connected component computation on Hadoop and Spark. PLOS ONE. 2020;15(3):1–25. pmid:32187232
  35. Patwary MMA, Blair JRS, Manne F. Experiments on Union-Find Algorithms for the Disjoint-Set Data Structure. In: Festa P, editor. SEA. vol. 6049. Springer; 2010. p. 411–423.
  36. Tarjan RE, van Leeuwen J. Worst-case Analysis of Set Union Algorithms. J ACM. 1984;31(2):245–281.
  37. Dijkstra EW. A Discipline of Programming. Prentice-Hall; 1976.
  38. Shun J, Blelloch GE. Ligra: a lightweight graph processing framework for shared memory. In: PPoPP; 2013. p. 135–146.
  39. Zhu X, Chen W, Zheng W, Ma X. Gemini: A Computation-Centric Distributed Graph Processing System. In: OSDI; 2016. p. 301–316.
  40. Awerbuch B, Shiloach Y. New Connectivity and MSF Algorithms for Shuffle-Exchange Network and PRAM. IEEE Trans Computers. 1987;36(10):1258–1263.
  41. Buluç A, Gilbert JR. The Combinatorial BLAS: design, implementation, and applications. Int J High Perform Comput Appl. 2011;25(4):496–509.
  42. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. vol. 51; 2008. p. 107–113.
  43. Chakrabarti D, Zhan Y, Faloutsos C. R-MAT: A Recursive Model for Graph Mining. In: SDM. SIAM; 2004. p. 442–446.
  44. Jeon B, Jeon I, Kang U. TeGViz: Distributed Tera-Scale Graph Generation and Visualization. In: ICDMW. IEEE Computer Society; 2015. p. 1620–1623.
  45. Kwak H, Lee C, Park H, Moon SB. What is Twitter, a social network or a news media? In: Rappa M, Jones P, Freire J, Chakrabarti S, editors. WWW. ACM; 2010. p. 591–600.