1 Correction to: Optim Lett https://doi.org/10.1007/s11590-015-0971-7

This article provides a corrigendum to “Finding a maximum k-club using the k-clique formulation and canonical hypercube cuts,” published online in Optim Lett, 2015. Due to programming errors in our C++ implementations, the computational results reported in the article are incorrect. In some pathological instances, a very large number of k-cliques that are not k-clubs can be detected, which can adversely affect the performance of the proposed algorithms. This corrigendum presents completely revised computational results, discussion, and conclusions that are meant to replace Sections 3 and 4 of the original article.

2 Computational results

In this section, we study the computational performance of the ITDBC and DBC algorithms from Section 2 of the original article in solving the MkCP. Direct solution of the integer programming (IP) formulations, referred to as F1 [4] and F2 [5], serves as our baseline for comparison. All four approaches (DBC, ITDBC, F1, and F2) are implemented in C++, and \(\hbox {Gurobi}^{\textsc {TM}}\) Optimizer 7.0.2 [2] is used to solve the IP formulations. All numerical experiments are conducted on a 64-bit \(\hbox {Linux}^\circledR \) compute node with dual \(\hbox {Intel}^\circledR \) \(\hbox {Xeon}^\circledR \) E5-2620 hex-core 2.0 GHz processors and 32 GB RAM. For one instance with more than \(10^4\) vertices, a large-memory compute node with 256 GB RAM is used.

We solve the MkCP for \(k \in \{2,3,\ldots ,7\}\) on the test-bed from [4] using the four approaches we consider. We solve the MkCP on each connected component using the F1, F2, and DBC approaches in decreasing order of the number of vertices in the component. If the largest k-club known prior to solving the MkCP on a component is larger than the number of vertices in that component, we terminate. A vertex of maximum degree in the graph and its neighbors form a feasible k-club for all \(k \ge 2\), and it is used to initialize the F1, F2, and DBC approaches. By contrast, the ITDBC approach is initialized using the largest distance-\(\lfloor \frac{k}{2}\rfloor \) neighborhood of a vertex and, in each iteration, solves the MkCP on the subgraph induced by the largest distance-k neighborhood (only if it is larger than the largest k-club known). ITDBC also uses the trim procedure, a scale-reduction technique, described in Algorithm 1 of the original article. Both the DBC and ITDBC approaches implement the CHCs through the lazy cut callback procedure in Gurobi [2]. In the callback, if the k-clique encountered is also a k-club, variables are fixed to zero for vertices whose distance-k neighborhoods are smaller than the k-club detected.
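The membership test inside this callback hinges on the distinction between a k-clique (pairwise distances at most k in G) and a k-club (pairwise distances at most k in the induced subgraph). A minimal sketch of such a check, with illustrative function names and independent of our Gurobi callback code, is:

```cpp
#include <cassert>
#include <queue>
#include <vector>
using namespace std;

// BFS distances from src, restricted to the subgraph induced by S
// (inS marks membership); unreachable vertices keep distance -1.
vector<int> bfsInduced(const vector<vector<int>>& adj, int src,
                       const vector<bool>& inS) {
    vector<int> dist(adj.size(), -1);
    dist[src] = 0;
    queue<int> q;
    q.push(src);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u])
            if (inS[v] && dist[v] < 0) { dist[v] = dist[u] + 1; q.push(v); }
    }
    return dist;
}

// S is a k-club iff every pair of its vertices is within distance k
// in the subgraph induced by S (not merely in G).
bool isKClub(const vector<vector<int>>& adj, const vector<int>& S, int k) {
    vector<bool> inS(adj.size(), false);
    for (int v : S) inS[v] = true;
    for (int s : S) {
        vector<int> d = bfsInduced(adj, s, inS);
        for (int t : S)
            if (d[t] < 0 || d[t] > k) return false;
    }
    return true;
}
```

For example, in the star \(K_{1,4}\) with hub 0, the four leaves form a 2-clique (every pair is at distance 2 in the graph, through the hub) but not a 2-club, since the subgraph they induce has no edges; adding the hub yields a 2-club.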

Table 1 A comparison of running times (s) averaged over 10 samples on the test-bed from [4]; fastest (on average) running times are highlighted in bold font

F1 and F2 use Gurobi’s barrier implementation to solve the root node relaxation due to potential degeneracies, while DBC and ITDBC use Gurobi’s default setting, which automatically selects the solver. All approaches use the dual simplex solver at all other nodes of the branch-and-bound tree. In all approaches, we set the Gurobi objective cutoff parameter to the size of the largest k-club known prior to calling the Gurobi optimization procedure. This informs the solver that we are only interested in solutions with better objective values [2].

We impose a 1-h limit using the Gurobi parameter for limiting the solve time. We also check the total elapsed running time after every component is solved in F1, F2, and DBC, and after every iteration of ITDBC. However, on some instances Gurobi did not strictly enforce the time limit parameter, resulting in significantly longer running times; these must be interpreted as failures to solve to optimality within the 1-h limit. All other Gurobi parameters, including numerical tolerances, branching strategies, heuristics, cuts, and the use of multiple threads, were left at their default values.
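For reproducibility, the settings above can be expressed through Gurobi’s C++ parameter API roughly as follows. This is a sketch under the assumption of an already-constructed model; the function and variable names are illustrative, not our exact code:

```cpp
#include "gurobi_c++.h"

// Illustrative sketch of the solver settings described in the text.
void configureSolver(GRBModel& model, double bestKClubSize,
                     bool barrierAtRoot, bool usesLazyCuts) {
    // Root relaxation: barrier (2) for F1/F2, automatic (-1) for DBC/ITDBC.
    model.set(GRB_IntParam_Method, barrierAtRoot ? 2 : -1);
    // Dual simplex (1) at all other branch-and-bound nodes.
    model.set(GRB_IntParam_NodeMethod, 1);
    // Objective cutoff: only solutions better than the incumbent k-club matter.
    model.set(GRB_DoubleParam_Cutoff, bestKClubSize);
    // 1-h time limit (not always strictly enforced, as noted above).
    model.set(GRB_DoubleParam_TimeLimit, 3600.0);
    // Required when CHCs are added through a lazy cut callback (DBC/ITDBC).
    if (usesLazyCuts) model.set(GRB_IntParam_LazyConstraints, 1);
}
```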

We report average running times over 10 samples for each order (n) and edge density (\(\rho \)) we consider in Table 1, along with the number of instances that were not solved to optimality within 1 h. The fastest average running time for every pair of n and \(\rho \) is highlighted in bold font. On this test-bed, F1 and F2 are comparable for smaller k. For larger values of k, we often find that F2 is significantly faster than F1, with some exceptions. F2 uses a slightly different variable definition than F1 and avoids “big-M” parameters in the constraints, which facilitates better performance with a general-purpose solver.

By contrast, DBC and ITDBC are close in performance on most instances in this test-bed for all k values we consider. However, there are some notable exceptions where their performances are significantly different, as we identify next using \((n,\rho , k)\) triples: ITDBC is, on average, significantly better than DBC on \((200, 1.5\%, 4), (100, 2\%, 5),\) and \((300, 1.5\%, 6)\); DBC is better than ITDBC on \((200, 1\%, 5)\). The former are cases where the preprocessing techniques in ITDBC were very effective, while in the latter case they were not. In particular, if the number of vertices at distance k or less from the fixed vertex is very large, it would be more effective to solve that instance directly using DBC. When preprocessing is not effective, ITDBC might solve a large instance in several iterations using DBC, degrading the performance of ITDBC.

Comparing DBC/ITDBC against F1/F2 on this test-bed, we frequently find that when one or more of the samples were not solved to optimality by DBC and/or ITDBC, the average running time is worse than that of directly solving the formulations. However, when the decomposition approaches are faster, they are generally much faster than directly solving the formulations. Superior performance of the DBC algorithm depends on how often the detected k-clique is also a k-club. Whenever this is not the case, we add a CHC to eliminate the k-clique that is not a k-club. The drawback of the CHC is that it eliminates only the single k-clique from which the cut is constructed. Let \({\widetilde{\omega }}_k(G)\) and \({\bar{\omega }}_k(G)\) denote the size of a maximum k-clique (the k-clique number) and a maximum k-club (the k-club number) in G, respectively. Because every subset of a k-clique is also a k-clique, the number of k-cliques that are not k-clubs that need to be eliminated in the DBC approach is exponential in the gap \({\widetilde{\omega }}_k(G)-{\bar{\omega }}_k(G)\). Consequently, a very large number of CHCs have to be generated on instances with a large gap; such instances are pathological for the DBC approach, and the performance of DBC degrades significantly on them. This can in turn degrade the performance of ITDBC if the preprocessing techniques are not effective on such pathological instances. To highlight this issue, in Table 2 we report the average and maximum values of the gap \({\widetilde{\omega }}_k(G)-{\bar{\omega }}_k(G)\) over 10 samples for each setting of \((n,\rho , k)\).
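For concreteness, a CHC removing a detected k-clique takes the standard single-point elimination form: if \(S\) is a k-clique that is not a k-club and \(x_i\) is the binary variable for vertex \(i\), the cut

\[ \sum _{i \in S} (1-x_i) + \sum _{i \notin S} x_i \ge 1 \]

is violated only by the characteristic vector of \(S\). Every other binary vector, including those of all subsets and supersets of \(S\), remains feasible, which is why each offending k-clique must be cut individually.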

Table 2 Average\(\backslash \)maximum gap \({\widetilde{\omega }}_k-{\bar{\omega }}_k\) over 10 samples on the test-bed from [4]

Table 3 lists the DIMACS clustering benchmark instances [1] used in our study, many of which model real data. Tables 4, 5, 6, 7, 8, and 9 tabulate the results of our experiments in solving the MkCP for \(k \in \{2,3,\ldots ,7\}\) on this test-bed. The fastest running time is highlighted in bold font. For the purposes of our discussion, we divide these instances into two groups: group-1 includes instances with fewer than 1600 vertices, and group-2 includes the remaining instances. The two groups are separated by a line in Tables 3, 4, 5, 6, 7, 8, and 9.

Table 3 Number of vertices, edges, and edge density for DIMACS clustering benchmarks used in this study

On group-1, with three exceptions, we find that the decomposition approaches outperform directly solving F1/F2; consistent with our discussions on the other test-bed, these exceptions (\(k=3\) on football and \(k=3,4\) on email) also correspond to large gaps for \({\widetilde{\omega }}_k(G)-{\bar{\omega }}_k(G)\). Interestingly, we are able to directly solve the MkCP for \(k=3\) on football using F1 and F2 in under 2 min, while all four approaches fail to solve the MkCP to optimality for \(k=3,4\) on email.

Table 4 The 2-clique number, the 2-club number, the number of \(\mathrm{CHC}\mathrm{s}\) added, and the running time in seconds for DBC, ITDBC, F1, and F2 on the DIMACS test-bed
Table 5 The 3-clique number, the 3-club number or the size of the largest 3-club found, the number of \(\mathrm{CHC}\mathrm{s}\) added, and the running time in seconds for DBC, ITDBC, F1, and F2 on the DIMACS test-bed

The DBC and ITDBC approaches are comparable in terms of overall running time for most instances in group-1 for all k values we consider; we review the exceptions to this observation next. One exceptional instance, email for \(k=2\), solved much faster with ITDBC due to effective preprocessing. In fact, detailed runtime statistics reveal that the initial heuristic solution is optimal, and preprocessing reduces the instance from the original 1133 vertices to fewer than 300 vertices in the three ITDBC calls in which DBC is invoked. However, the same email graph for \(k=5\) required 21 iterations of ITDBC, each with 1000+ vertices, as preprocessing was not as effective even after the optimal solution was detected in the first iteration. Another exceptional instance, polblogs, is an example where ITDBC is noticeably slower than DBC for \(k=2,3,4\). Detailed iteration statistics of ITDBC for this instance show that, for the aforementioned k values, multiple iterations were needed and the graph was not sufficiently reduced by preprocessing. In such cases, it is more effective to solve the entire instance directly using DBC. As an illustration, we report iteration statistics for polblogs in Table 10 for \(k=3\), the case that took the largest number of iterations among all k values we consider for this instance.

Table 6 The 4-clique number, the 4-club number or the size of the largest 4-club found, the number of \(\mathrm{CHC}\mathrm{s}\) added, and the running time in seconds for DBC, ITDBC, F1, and F2 on the DIMACS test-bed
Table 7 The 5-clique number, the 5-club number or the size of the largest 5-club found, the number of \(\mathrm{CHC}\mathrm{s}\) added, and the running time in seconds for DBC, ITDBC, F1, and F2 on the DIMACS test-bed
Table 8 The 6-clique number, the 6-club number or the size of the largest 6-club found, the number of \(\mathrm{CHC}\mathrm{s}\) added, and the running time in seconds for DBC, ITDBC, F1, and F2 on the DIMACS test-bed
Table 9 The 7-clique number, the 7-club number or the size of the largest 7-club found, the number of \(\mathrm{CHC}\mathrm{s}\) added, and the running time in seconds for DBC, ITDBC, F1, and F2 on the DIMACS test-bed
Table 10 Progress of the ITDBC algorithm when solving polblogs (\(|V|=1490, |E|=16{,}715\)) for \(k=3\)

Note that even if \({\widetilde{\omega }}_k(G)-{\bar{\omega }}_k(G)=0\) (or nearly zero), if there is a large number of optimal k-cliques that are not k-clubs, we may still have to eliminate them before finding a maximum k-club. Pertinently, even if the gap is zero and DBC does not generate any CHCs, ITDBC could still require CHCs. The subgraph induced by the distance-k neighborhood of the fixed vertex, potentially a very different graph than the original, can result in very different runtime behavior of ITDBC compared to DBC. This is reflected in the number of CHCs added by the two decomposition approaches on the instances polbooks for \(k=3,4\), polblogs for \(k=3\), and netscience for \(k=5\). We also find such performance variations between different computing platforms, a phenomenon germane to IP solvers [3].
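The distance-k neighborhoods that drive this behavior can be extracted with a depth-limited breadth-first search. A minimal sketch, with illustrative names rather than our implementation, is:

```cpp
#include <queue>
#include <vector>
using namespace std;

// Closed distance-k neighborhood N_G^k[v]: all vertices within
// distance k of v in G, returned in BFS (nondecreasing-distance) order.
vector<int> distKNeighborhood(const vector<vector<int>>& adj, int v, int k) {
    vector<int> dist(adj.size(), -1);
    vector<int> result;
    dist[v] = 0;
    result.push_back(v);
    queue<int> q;
    q.push(v);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        if (dist[u] == k) continue;  // do not expand beyond distance k
        for (int w : adj[u])
            if (dist[w] < 0) {
                dist[w] = dist[u] + 1;
                result.push_back(w);
                q.push(w);
            }
    }
    return result;
}
```

For instance, on the path 0–1–2–3–4 we have \(N_G^2[0]=\{0,1,2\}\); ITDBC then solves the MkCP on the subgraph induced by such a set, which can behave very differently from the original graph.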

We close our discussion of the group-1 DIMACS instances with a note on what we observed with polbooks for \(k=3\). This instance is not solved to optimality within the 1-h time limit by ITDBC, although it is solved to optimality by DBC, F1, and F2 in fewer than 4 s (the fastest being DBC in less than 1 s). The very first iteration of the ITDBC approach, which solves the problem on \(N_G^3[v]\) for vertex \(v=5\) (a vertex with the largest distance-3 neighborhood, fixed to be included in the solution), stymies the DBC approach in a manner the entire instance does not. Ironically, no maximum 3-club in polbooks contains vertex 5, as a valid upper bound smaller than the 3-club number of polbooks is returned at the end of the first iteration. Furthermore, this one iteration terminates suboptimally upon reaching the 1-h running time limit even though the reduced instance contains only 104 vertices. This observation is a reminder that even small instances of this NP-hard problem can be extremely challenging to solve using this approach. Although the original instance is easy for DBC to solve, the subgraph induced by \(N_G^3[5]\) presents a dramatically different level of difficulty. We verified this observation independently with the DBC code by adding the constraint \(x_5 = 1\) to the original instance (which, without the constraint, solved in under 1 s); the constrained instance also terminated suboptimally after an hour.

We finally turn our attention to the so-called group-2 instances in the DIMACS test-bed, with several thousands of vertices. An immediate observation is that on the largest of these instances, PGPgiantcompo, F2 crashes due to memory issues for all values of k we consider, as does F1 for nearly all values of k. In general, the less memory-intensive decomposition approaches fare better. ITDBC, owing to its preprocessing techniques, is very effective on all three instances when \(k=2\) and in solving power for all k values we consider. The next largest instance in this group, hep-th, with 8000+ vertices, is not solved to optimality within the time limit by any of the approaches for \(k=3,\ldots ,7\). However, one of the decomposition approaches, often ITDBC, finds the best known solution for this instance. The largest instance, PGPgiantcompo, with 10,000+ vertices, is solved to optimality for \(k=2,3,4,7\) by ITDBC, and its best known solution is found by ITDBC when \(k=5,6\).

3 Conclusion

The original article (Optim Lett, https://doi.org/10.1007/s11590-015-0971-7) introduces a decomposition approach to solve the maximum k-club problem that employs the maximum k-clique problem, a graph-theoretic relaxation of the k-club problem. The approach then uses a naive cutting plane, one that eliminates precisely one binary vector at a time, whenever a k-clique that is not a k-club is detected. The article also introduces an iterative version that can incorporate more effective preprocessing techniques because, in each iteration, some vertex is fixed to be included in the k-club.

In this corrigendum, a revised computational study of the proposed approaches, in comparison to direct solution of integer programming formulations for the problem, is presented; the corrigendum was necessitated by coding errors detected after the article appeared online in Optimization Letters. We find that the performance of the decomposition algorithms introduced in Section 2 of the original article is heavily dependent on the gap \({\widetilde{\omega }}_k(G)-{\bar{\omega }}_k(G)\) between the sizes of a maximum k-clique and a maximum k-club of the graph. When this gap is small, the decomposition algorithms generally outperform directly solving the formulations; in this case, the performance gains are very significant for larger graphs and values of k. If this gap is 10 or more, the performance of the decomposition algorithms deteriorates significantly, unless preprocessing techniques drastically reduce the instance. This behavior is a direct consequence of the nature of the cutting planes added, which eliminate one k-clique at a time.

It is encouraging to note that for most of the larger instances with 1000+ vertices from the DIMACS test-bed based on real data, across all values of k we consider, at least one of the decomposition approaches is significantly faster than directly solving the formulations. Barring the exceptions noted in our computational study, the effectiveness of the decomposition approaches when the gap \({\widetilde{\omega }}_k(G)-{\bar{\omega }}_k(G)\) is small, especially on large-scale real-life instances, warrants further investigation into the addition of stronger cutting planes.