Incremental k-core decomposition: algorithms and evaluation

Sarıyüce, Ahmet Erdem; Gedik, Buğra; Jacques-Silva, Gabriela; Wu, Kun-Lung; Çatalyürek, Ümit V.

doi:10.1007/s00778-016-0423-8

Regular Paper
Published: 15 February 2016

Volume 25, pages 425–447 (2016)
The VLDB Journal

Abstract

A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for dynamic graph data. In this paper, we propose a suite of incremental k-core decomposition algorithms for dynamic graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have changed and efficiently process this subgraph to update the k-core decomposition. We present incremental algorithms for both insertion and deletion operations, and propose auxiliary vertex state maintenance techniques that can further accelerate these operations. Our results show a significant reduction in runtime compared to non-incremental alternatives. We illustrate the efficiency of our algorithms on different types of real and synthetic graphs, at varying scales. For a graph of 16 million vertices, we observe throughputs reaching a million times those of the non-incremental algorithms.

Notes

https://github.com/sparsehash/sparsehash.

References

Aksu, H., Canim, M., Chang, Y., Korpeoglu, I., Ulusoy, O.: Distributed-Core View Materialization and Maintenance for Large Dynamic Graphs. IEEE Trans. Knowl. Data Eng. 26(10), 2439–2452 (2014)
Alvarez-Hamelin, J.I., Dall’Asta, L., Barrat, A., Vespignani, A.: k-Core decomposition: a tool for the visualization of large scale networks. In: The Computing Research Repository (CoRR), arXiv:abs/cs/0504107 (2005)
Andersen, R., Chellapilla, K.: Finding dense subgraphs with size bounds. In: Workshop on Algorithms and Models for the Web Graph (WAW), pp. 25–37 (2009)
Bader, G.D., Hogue, C.W.V.: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinf. 4(1), 1–27 (2003). doi:10.1186/1471-2105-4-2
Balasundaram, B., Butenko, S., Hicks, I.: Clique relaxations in social network analysis: the maximum \(k\)-plex problem. Oper. Res. 59, 133–142 (2011)
Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
Batagelj, V., Zaversnik, M.: An O(m) algorithm for cores decomposition of networks. In: The Computing Research Repository (CoRR), arXiv:cs.DS/0310049 (2003)
Baur, M., Gaertler, M., Görke, R., Krug, M., Wagner, D.: Augmenting k-core generation with preferential attachment. Netw. Heterog. Media 3(2), 277–294 (2008)
Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: A recursive model for graph mining. In: SIAM International Conference on Data Mining (SDM) (2004)
Cheng, J., Ke, Y., Chu, S., Ozsu, M.T.: Efficient core decomposition in massive networks. In: IEEE International Conference on Data Engineering (ICDE), pp. 51–62 (2011)
DIMACS. 10th DIMACS Implementation Challenge. http://www.cc.gatech.edu/dimacs10
Dorogovtsev, S.N., Goltsev, A.V., Mendes, J.F.F.: k-core organization of complex networks. Phys. Rev. Lett. 96(4), 040601 (2006)
Dourisboure, Y., Geraci, F., Pellegrini, M.: Extraction and classification of dense communities in the web. In: World Wide Web Conference (WWW), pp. 461–470 (2007)
Erdős, P., Hajnal, A.: On chromatic number of graphs and set-systems. Acta Math. Hung. 17, 61–99 (1966)
Erdős, P., Rényi, A.: On the Evolution of Random Graphs, pp. 17–61. Institute of Mathematics, Hungarian Academy of Sciences, Budapest, Hungary (1960)
Fortunato, S.: Community detection in graphs. Phys. Rep. 483(3–5), 75–174 (2009)
Gaertler, M.: Dynamic analysis of the autonomous system graph. In: International Workshop on Inter-domain Performance and Simulation (IPS), pp. 13–24 (2004)
Giatsidis, C., Thilikos, D.M., Vazirgiannis, M.: D-cores: Measuring collaboration of directed graphs based on degeneracy. In: IEEE International Conference on Data Mining (ICDM), pp. 201–210 (2011)
Giatsidis, C., Thilikos, D.M., Vazirgiannis, M.: Evaluating cooperation in communities with the \(k\)-core structure. In: International Conference on Advances in Social Network Analysis and Mining (ASONAM), pp. 87–93 (2011)
Healy, J., Janssen, J., Milios, E., Aiello, W.: Characterization of graphs using degree cores. In: Workshop on Algorithms and Models for the Web Graph (WAW), pp. 137–148 (2006)
Kortsarz, G., Peleg, D.: Generating sparse 2-spanners. J. Algorithms 17(2), 222–236 (1994)
Li, R.-H., Yu, J.X.: Efficient Core Maintenance in Large Dynamic Graphs. CoRR, arXiv:1207.4567 (2012)
Luczak, T.: Size and connectivity of the k-core of a random graph. Discrete Math. 91(1), 61–68 (1991)
Nanavati, A.A., Siva, G., Das, G., Chakraborty, D., Dasgupta, K., Mukherjea, S., Joshi, A.: On the structural properties of massive telecom call graphs: findings and implications. In: ACM International Conference on Information and Knowledge Management (CIKM), pp. 435–444 (2006)
Ozgul, F., Erdem, Z., Bowerman, C., Atzenbeck, C.: Comparison of feature-based criminal network detection models with k-core and n-clique. In: International Conference on Advances in Social Network Analysis and Mining (ASONAM), pp. 400–401 (2010)
Saito, H., Toyoda, M., Kitsuregawa, M., Aihara, K.: A large-scale study of link spam detection by graph algorithms. In: International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), pp. 45–48 (2007)
Samudrala, R., Moult, J.: A graph-theoretic algorithm for comparative modeling of protein structure. J. Mol. Biol. 279(1), 287–302 (1998)
Sarıyüce, A.E., Gedik, B., Jacques-Silva, G., Wu, K.-L., Çatalyürek, Ü.V.: Streaming algorithms for k-core decomposition. In: Proceedings of the Very Large Data Bases Conference (PVLDB), pp. 433–444 (2013)
Seidman, S.B.: Network structure and minimum degree. Soc. Netw. 5(3), 269–287 (1983)
SNAP. Stanford network analysis package. http://snap.stanford.edu/snap
Turaga, D., Andrade, H., Gedik, B., Venkatramani, C., Verscheure, O., Harris, J.D., Cox, J., Szewczyk, W., Jones, P.: Design principles for developing stream processing applications. Softw. Pract. Exp. 40(12), 1073–1104 (2010)
Verma, A., Butenko, S.: Network clustering via clique relaxations: a community based approach. In: 10th DIMACS Implementation Challenge (2011)
Wuchty, S., Almaas, E.: Peeling the yeast protein network. Proteomics 5(2), 444–449 (2005)
Zhang, Y., Parthasarathy, S.: Extracting, analyzing and visualizing triangle k-core motifs within networks. In: IEEE International Conference on Data Engineering (ICDE), pp. 1049–1060 (2012)

Acknowledgments

This work is partially sponsored by the US Defense Advanced Research Projects Agency (DARPA) under the Social Media in Strategic Communication (SMISC) program (Agreement No. W911NF-12-C-0028). The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of DARPA or the US Government. This work is also partially sponsored by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under Grant EEEAG #112E271.

Author information

Authors and Affiliations

Sandia National Labs, Livermore, CA, USA
Ahmet Erdem Sarıyüce
Bilkent University, Ankara, Turkey
Buğra Gedik
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Gabriela Jacques-Silva & Kun-Lung Wu
The Ohio State University, Columbus, OH, USA
Ümit V. Çatalyürek

Corresponding author

Correspondence to Ahmet Erdem Sarıyüce.

Appendix: Generic RCD maintenance

As stated earlier in Sect. 4.3.1, maintaining RCD values is a non-trivial operation. Yet, it is critical in reducing the scopes of the traversals, potentially bringing down the cost of edge modifications. Efficient mechanisms for maintaining RCD values are therefore needed. Here, we introduce the generic versions of the RCD maintenance algorithms, which update the RCD values of vertices up to the given hop count n. In other words, given the number of hops, n, the proposed algorithms maintain the RCD values for \(n, n-1, \ldots , 1\).

multihopPrepareRCDsInsertion (Algorithm 9) is used at the beginning of the multihop traversal-based edge insertion algorithm, explained in Sect. 4.5. It prepares the RCD values before the multihop traversal operation is performed for the inserted edge, \((u_{1}, u_{2})\), and the given hop distance, n. This preparation is needed as the RCD values of the root(s) may have changed due to the updated degrees, and this change may have propagated to RCD values of other vertices. The preparation phase is performed assuming that the K values are intact. Those will be updated during the traversal, and a re-computation of RCDs would be required at the end (multihopRecomputeRCDs procedure).

The preparation starts with determining the root vertices based on their K values. If the K values of the extremities of the inserted edge are not equal, we increment the \( RCD (r,h)\) value of the root r for all \(h \le n\). The rationale is that the root vertex gains a new neighbor with a higher K value, which, by Definition 7, increases all RCD values of the root by one. Following this increment, we check whether \( RCD (r, h)\) has exceeded k, the K value of the root, because this implies further changes in the \( RCD (\cdot ,h+1)\) values of r’s neighbors (by Definition 7). In the preparation phase, \( RCD (\cdot ,n)\) of a vertex only changes when \( RCD (\cdot ,n-1)\) of a neighbor changes, and that is what we are checking for. Remember that \( RCD (u,n)\) is the number of u’s neighbors, w, where either \(K(u)<K(w)\), or \(K(u)=K(w)\) and \( RCD (w,n-1) > K(u)\). Throughout the algorithm, we accumulate the vertices whose \( RCD (\cdot ,h)\) values have just exceeded k in the next frontier set, where h is the hop number. We skip this accumulation when the last hop number is being processed, since no further processing is needed in that case. When the hop number h is greater than 1, we process the neighbors (with the same K value) of the vertices in the current frontier set by incrementing their \( RCD (\cdot ,h)\) values. We also check whether k is exceeded and populate the next frontier set accordingly.
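The recurrence recalled above can also be evaluated directly, which gives a simple reference against which incremental updates can be checked. The following is a minimal Python sketch, not the paper's computeRCD pseudocode. The adjacency-dict representation, the name `rcd_by_definition`, and the example graph are assumptions made for illustration. The base case assumes that \( RCD (u,1)\) counts the neighbors w with \(K(w) \ge K(u)\).

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def rcd_by_definition(adj: Graph, K: Dict[int, int], n: int) -> Dict[int, List[int]]:
    """Return rcd[u][h] for h = 1..n (index 0 is unused padding)."""
    rcd = {u: [0] * (n + 1) for u in adj}
    for h in range(1, n + 1):          # hop h only needs the finished hop h-1
        for u in adj:
            count = 0
            for w in adj[u]:
                if K[w] > K[u]:
                    count += 1         # a higher-K neighbor always counts
                elif K[w] == K[u] and (h == 1 or rcd[w][h - 1] > K[u]):
                    count += 1         # an equal-K neighbor counts only if it looks promotable
            rcd[u][h] = count
    return rcd


# Hypothetical example: a triangle {0, 1, 2} with a tail 2-3-4.
adj: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
K = {0: 2, 1: 2, 2: 2, 3: 1, 4: 1}     # K values of this graph
rcd = rcd_by_definition(adj, K, 3)
print(rcd[3], rcd[4])
```

In this example, vertex 3 has a 1-hop value of 2, which exceeds its K value of 1, while its 2-hop value drops to 1. This shows how looking further than one hop can rule a vertex out as a candidate for a K increase.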

If the K values of the extremities of the inserted edge are equal, we perform different operations for \(h=1\) and \(h>1\), where h is the current hop number. For \(h=1\), where \( RCD (u,1)\) is actually equal to \( MCD (u)\), we just increment the \( RCD (\cdot ,1)\) values of both extremities of the inserted edge (by Definition 7), check whether k is exceeded, and populate the next frontier set accordingly. If \(h>1\), we need to handle the newly inserted edge separately. Let \(u_1\) and \(u_2\) be the extremities of the inserted edge. We first check whether \( RCD (u_1,h-1)\) [and dually \( RCD (u_2,h-1)\)] is greater than k. If so, we increment \( RCD (u_2,h)\) [and dually \( RCD (u_1,h)\)] and perform the k value checks to populate the next frontier as needed. After that, we process the neighbors (with the same K value) of the vertices in the current frontier set. One important difference in this step is that we exclude the edge between \(u_1\) and \(u_2\), because that edge has already been handled.
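The two insertion cases described above can be sketched together as follows. This is an illustrative Python reading of the prose, not a transcription of Algorithm 9. The names `prepare_rcd_insertion`, `rcd_by_definition`, and `build`, the adjacency-dict representation, and the example graph are assumptions. The sketch takes k to be the K value of the root, keeps K values untouched as the preparation phase requires, and omits the last-hop shortcut for brevity. Its result is compared against a direct re-evaluation of the recurrence.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def rcd_by_definition(adj: Graph, K: Dict[int, int], n: int) -> Dict[int, List[int]]:
    """Reference values: evaluate the recurrence hop by hop (index 0 unused)."""
    rcd = {u: [0] * (n + 1) for u in adj}
    for h in range(1, n + 1):
        for u in adj:
            rcd[u][h] = sum(
                1 for w in adj[u]
                if K[w] > K[u] or (K[w] == K[u] and (h == 1 or rcd[w][h - 1] > K[u]))
            )
    return rcd


def prepare_rcd_insertion(adj, K, rcd, u1, u2, n) -> None:
    """Update rcd in place; adj must already contain the edge (u1, u2)."""
    equal = K[u1] == K[u2]
    root = u1 if K[u1] <= K[u2] else u2
    k = K[root]
    frontier: Set[int] = set()            # vertices whose RCD(., h-1) just exceeded k
    for h in range(1, n + 1):
        crossed: Set[int] = set()

        def bump(w: int) -> None:
            rcd[w][h] += 1
            if rcd[w][h] == k + 1:        # the value has just exceeded k
                crossed.add(w)

        if not equal:
            bump(root)                    # new higher-K neighbor counts at every hop
        elif h == 1:
            bump(u1)
            bump(u2)
        else:                             # the new edge is handled separately
            if rcd[u1][h - 1] > k:
                bump(u2)
            if rcd[u2][h - 1] > k:
                bump(u1)
        for v in frontier:
            for w in adj[v]:
                if K[w] != k or (equal and {v, w} == {u1, u2}):
                    continue              # other K value, or the edge handled above
                bump(w)
        frontier = crossed


def build(edges) -> Graph:
    adj: Graph = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical example: triangle {0, 1, 2}, tail 2-3-4, and a pendant vertex 5 on 0.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (0, 5)]
K = {0: 2, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}
n = 3


def check(u1: int, u2: int) -> bool:
    adj = build(edges)
    rcd = rcd_by_definition(adj, K, n)    # values before the insertion
    adj[u1].add(u2)
    adj[u2].add(u1)
    prepare_rcd_insertion(adj, K, rcd, u1, u2, n)
    return rcd == rcd_by_definition(adj, K, n)


unequal_ok = check(4, 2)                  # K(4) < K(2): vertex 4 is the root
equal_ok = check(4, 5)                    # K(4) = K(5)
print(unequal_ok, equal_ok)
```

Because values only grow during this phase, a vertex crosses the threshold at most once per hop, so testing for the value `k + 1` right after each increment is enough to detect it.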

Multihop algorithms are only applicable to the edge insertion operation. For removal, using 1-hop information (MCD values) is necessary and sufficient, as stated in the last paragraph of Sect. 4.3.3. Therefore, using multihop information does not bring any additional benefit in terms of running time. However, given that we are interested in sliding window scenarios, where removals happen together with insertions, we need to maintain the RCD values when an edge is removed as well. For this purpose, we develop the multihopPrepareRCDsRemoval method. Detailed pseudocode and explanation can be found in “RCD maintenance for edge removal” in this appendix.

After the multihop traversal, if the K values of some vertices have been incremented, this creates a cascading effect on the RCD values of the surrounding vertices. Handling these cascades and performing the updates efficiently is again of great importance. Algorithm 10 finds the vertices whose RCD values need to be updated and updates them efficiently. It has two main parameters: the set of vertices whose K values are updated (\(\mathtt{changed}\)), and the hop distance up to which RCD values are to be updated (n). We start the algorithm by marking the \(\mathtt{changed}\) vertices as visited. Throughout the algorithm, we mark vertices via the visited array to prevent duplicates during the update procedure. In the main for loop (the second one), we process the updates for each hop, in order. At each iteration, we populate the \(\mathtt{changed}\) set with the updated vertices and then update the RCD values of the vertices in \(\mathtt{changed}\). The cascading effect propagates by a single-hop neighborhood at each iteration. For example, if a vertex u has its K value updated and we want the RCD values up to a 3-hop distance to be updated, then \( RCD (1)\), \( RCD (2)\) and \( RCD (3)\) of u will be updated. Furthermore, \( RCD (2)\) and \( RCD (3)\) values of some vertices in u’s hop-1 neighborhood will be updated, and \( RCD (3)\) values of some vertices in u’s hop-2 neighborhood will be updated.

Pruning the vertices in the neighborhood is critical in making the procedure efficient. For the edge insertion case, given a vertex v, we prune the neighborhood vertices by checking whether they have been visited previously and whether the K value of the neighbor vertex is either equal to the K value of v or equal to the K value of v minus 1 (plus 1 for the edge removal case). This check is based on Definition 7: \( RCD (n)\) of a neighbor vertex may change iff the K values are equal. For the edge insertion case, given that there are also some vertices whose K values are incremented, we need to consider them as well by checking the neighbor vertices with a K value that is one less. Likewise, for edge removal, we check the neighbor vertices with a K value that is one more, as stated in the comments of the pseudocode of Algorithm 10. We accumulate the vertices to be updated in the \(\mathtt{changed}\) set and update their RCD values for the hop distance at that iteration. The computeRCD procedure at the end finds the RCD values for all hop numbers up to h, making direct use of Definition 7. In summary, we handle the cascading effect of RCD maintenance efficiently by means of the aforementioned pruning techniques.
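To make the locality of the cascade concrete, the sketch below recomputes RCD values only near the vertices whose K values changed. It is a deliberately conservative stand-in, not Algorithm 10: it relies only on the observation that \( RCD (v,h)\) can depend on K values at most h hops away from v, and it does not apply the K-based pruning described above, so it may recompute more entries than the paper's procedure. All names (`recompute_rcd_near`, `rcd_entry`, `rcd_by_definition`) and the example graph are assumptions.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def rcd_entry(adj, K, rcd, u, h) -> int:
    """One entry of the recurrence, read off the definition recalled earlier."""
    return sum(
        1 for w in adj[u]
        if K[w] > K[u] or (K[w] == K[u] and (h == 1 or rcd[w][h - 1] > K[u]))
    )


def rcd_by_definition(adj: Graph, K: Dict[int, int], n: int) -> Dict[int, List[int]]:
    rcd = {u: [0] * (n + 1) for u in adj}
    for h in range(1, n + 1):
        for u in adj:
            rcd[u][h] = rcd_entry(adj, K, rcd, u, h)
    return rcd


def recompute_rcd_near(adj, K, rcd, changed, n) -> Dict[int, int]:
    """Refresh rcd in place after the K values of `changed` were updated.

    Returns the hop distance (at most n) of every vertex that was touched.
    """
    dist = {u: 0 for u in changed}
    queue = deque(changed)
    while queue:                          # breadth-first search, cut off at n hops
        v = queue.popleft()
        if dist[v] == n:
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    for h in range(1, n + 1):             # hop h reads only the finished hop h-1
        for u, d in dist.items():
            if d <= h:                    # farther vertices cannot change at hop h
                rcd[u][h] = rcd_entry(adj, K, rcd, u, h)
    return dist


# Hypothetical example: the cycle 0-2-3-4-5-0, vertex 1 on the triangle {0, 1, 2},
# and a tail 1-6-7. Assume the edge (4, 5) was just inserted.
adj: Graph = {
    0: {1, 2, 5}, 1: {0, 2, 6}, 2: {0, 1, 3}, 3: {2, 4},
    4: {3, 5}, 5: {0, 4}, 6: {1, 7}, 7: {6},
}
n = 2
K_old = {0: 2, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}
K_new = {**K_old, 3: 2, 4: 2, 5: 2}      # K values after the insertion

rcd = rcd_by_definition(adj, K_old, n)    # state with the old K values
dist = recompute_rcd_near(adj, K_new, rcd, {3, 4, 5}, n)
matches_full = rcd == rcd_by_definition(adj, K_new, n)
print(sorted(dist), matches_full)
```

Vertices 6 and 7 lie more than two hops from any changed vertex, so their entries are never touched. The K-based checks described in the text would shrink the visited region further.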

1.1 RCD maintenance for edge removal

Algorithm 11 adjusts the RCD values when there is an edge removal and is very similar to Algorithm 9. One important difference is that, instead of incrementing the RCD values, we need to decrement them whenever necessary. We also check whether the \( RCD (\cdot ,h)\) value goes below \(k+1\), which implies changes in \( RCD (\cdot ,h+1)\) values of neighbor vertices. Another difference between Algorithms 11 and 9 exists when the K values of the removed edge extremities are equal. In this case, we need to remember the RCD values for all hop numbers before the edge removal operation. This enables us to process the hop numbers \(h>1\).
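The removal case can be sketched as the mirror image of the insertion preparation. As before, this is an illustrative Python reading of the description above rather than the pseudocode of Algorithm 11. The names `prepare_rcd_removal`, `rcd_by_definition`, and `build` and the example graph are assumptions, and k is taken to be the K value of the root. The copy stored in `before` plays the role of the remembered pre-removal RCD values for the equal-K case.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def rcd_by_definition(adj: Graph, K: Dict[int, int], n: int) -> Dict[int, List[int]]:
    """Reference values: evaluate the recurrence hop by hop (index 0 unused)."""
    rcd = {u: [0] * (n + 1) for u in adj}
    for h in range(1, n + 1):
        for u in adj:
            rcd[u][h] = sum(
                1 for w in adj[u]
                if K[w] > K[u] or (K[w] == K[u] and (h == 1 or rcd[w][h - 1] > K[u]))
            )
    return rcd


def prepare_rcd_removal(adj, K, rcd, u1, u2, n) -> None:
    """Update rcd in place; the edge (u1, u2) must already be gone from adj."""
    equal = K[u1] == K[u2]
    root = u1 if K[u1] <= K[u2] else u2
    k = K[root]
    before = {u1: list(rcd[u1]), u2: list(rcd[u2])}   # values before the removal
    frontier: Set[int] = set()            # vertices whose RCD(., h-1) fell below k + 1
    for h in range(1, n + 1):
        dropped: Set[int] = set()

        def lower(w: int) -> None:
            rcd[w][h] -= 1
            if rcd[w][h] == k:            # the value has just fallen below k + 1
                dropped.add(w)

        if not equal:
            lower(root)                   # the lost higher-K neighbor counted at every hop
        elif h == 1:
            lower(u1)
            lower(u2)
        else:                             # did the lost neighbor count before the removal?
            if before[u2][h - 1] > k:
                lower(u1)
            if before[u1][h - 1] > k:
                lower(u2)
        for v in frontier:
            for w in adj[v]:
                if K[w] == k:
                    lower(w)
        frontier = dropped


def build(edges) -> Graph:
    adj: Graph = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical example: triangle {0, 1, 2}, tail 2-3-4, and a pendant vertex 5 on 0.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (0, 5)]
K = {0: 2, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}
n = 3


def check(u1: int, u2: int) -> bool:
    adj = build(edges)
    rcd = rcd_by_definition(adj, K, n)    # values before the removal
    adj[u1].discard(u2)
    adj[u2].discard(u1)
    prepare_rcd_removal(adj, K, rcd, u1, u2, n)
    return rcd == rcd_by_definition(adj, K, n)


equal_ok = check(3, 4)                    # K(3) = K(4)
unequal_ok = check(2, 3)                  # K(3) < K(2): vertex 3 is the root
print(equal_ok, unequal_ok)
```

As in the insertion sketch, K values are left untouched here. Any K decrements are found by the removal traversal itself, after which the RCD values would be refreshed again.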

About this article

Cite this article

Sarıyüce, A.E., Gedik, B., Jacques-Silva, G. et al. Incremental k-core decomposition: algorithms and evaluation. The VLDB Journal 25, 425–447 (2016). https://doi.org/10.1007/s00778-016-0423-8

Received: 13 October 2014
Revised: 04 January 2016
Accepted: 16 January 2016
Published: 15 February 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s00778-016-0423-8
