doi:10.1016/j.inffus.2005.01.008
Copyright © 2005 Elsevier B.V. All rights reserved.
Moderate diversity for better cluster ensembles
Stefan T. Hadjitodorov (a), Ludmila I. Kuncheva (b) and Ludmila P. Todorova (a)
(a) Center for Biomedical Engineering (CLBME), Bulgarian Academy of Sciences, “Acad G. Bonchev” Str., block 105, Sofia 1113, Bulgaria
(b) School of Informatics, University of Wales, Bangor, Bangor, Gwynedd LL57 1UT, United Kingdom
Received 21 September 2004;
revised 29 January 2005;
accepted 29 January 2005.
Available online 3 March 2005.
Abstract
Adjusted Rand index is used to measure diversity in cluster ensembles and a diversity measure is subsequently proposed. Although the measure was found to be related to the quality of the ensemble, this relationship appeared to be non-monotonic. In some cases, ensembles which exhibited a moderate level of diversity gave a more accurate clustering. Based on this, a procedure for building a cluster ensemble of a chosen type is proposed (assuming that an ensemble relies on one or more random parameters): generate a small random population of cluster ensembles, calculate the diversity of each ensemble and select the ensemble corresponding to the median diversity. We demonstrate the advantages of both our measure and procedure on 5 data sets and carry out statistical comparisons involving two diversity measures for cluster ensembles from the recent literature. An experiment with 9 data sets was also carried out to examine how the diversity-based selection procedure fares on ensembles of various sizes. For these experiments the classification accuracy was used as the performance criterion. The results suggest that selection by median diversity is no worse and in some cases is better than building and holding on to one ensemble.
Keywords: Pattern recognition; Machine learning; Multiple classifiers; Cluster ensembles; Diversity measures; Adjusted Rand index
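The selection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The adjusted Rand index is computed from the standard contingency-table formula. The diversity measure used here (one minus the mean pairwise adjusted Rand index among the members of an ensemble) is an assumption made for illustration only, since the abstract does not define the proposed measure. The function names are hypothetical, and generating the random population of ensembles is left to the caller.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("partitions must label the same n >= 2 objects")
    # Pairs of objects placed together in both partitions.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of objects placed together in each partition separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        # Degenerate case (e.g. both partitions are a single cluster).
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def ensemble_diversity(ensemble):
    """ASSUMED measure: 1 - mean pairwise ARI among ensemble members.

    `ensemble` is a list of at least two partitions (label sequences).
    """
    pairs = list(combinations(ensemble, 2))
    mean_ari = sum(adjusted_rand_index(p, q) for p, q in pairs) / len(pairs)
    return 1.0 - mean_ari


def select_median_ensemble(population):
    """Return the ensemble whose diversity is the median of the population."""
    ranked = sorted(population, key=ensemble_diversity)
    return ranked[len(ranked) // 2]
```

With an odd population size the median is a single ensemble; for an even size this sketch takes the upper of the two middle ensembles, which is one possible convention among several.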
Fig. 1. The generic pairwise cluster ensemble algorithm.
Fig. 2. Ensemble accuracy, A, versus 6 diversity measures for the four-gauss data. The bottom curve in each plot is the averaged individual accuracy (dot marker).
Fig. 3. Ensemble accuracy, A, versus 6 diversity measures for the wine data. The bottom curve in each plot is the averaged individual accuracy (dot marker).
Fig. 4. Fitted polynomial of degree 3 for the ensemble accuracy versus Dnp−3 for the four-gauss and wine data. The bottom curve in each plot is the averaged individual accuracy (dot marker).
Fig. 5. Artificial data sets: (a) four-gauss; (b) easy-doughnut; (c) difficult-doughnut.
Fig. 6. Total numbers of statistically significant differences in favour of each method. The numbers on the x-axis correspond to those in Table 6.
Fig. 7. (A) Classification accuracy versus ensemble size for glass, wine and iris data. (B) Classification accuracy versus ensemble size for glass, wine and iris data. (C) Classification accuracy versus ensemble size for segmentation, soybean and contractions data.
Table 1.
Correlation coefficients between the 6 diversity measures and the ensemble accuracy for the examples in Fig. 2 and Fig. 3

Shown also is the correlation between the individual average and the ensemble accuracy.
Table 2.
Summary of the design of the 7 ensemble types

Table 3.
Data sets

Table 4.
Ensemble accuracies (ar(P*, PT)) for the 7 ensemble models and the 5 data sets, averaged across 100 realizations

The largest value for each data set is shown in boldface.
Table 5.
Ensemble accuracies (ar(Pmed, PT)) for ensemble model
and the 5 data sets, averaged across 100 runs (samples of 15 ensembles and selection)

The largest value for each data set is shown in boldface.
Table 6.
Statistical significance of the differences between the “competitors”

Entry (i, j) in the table shows the number of comparisons (out of 35) where competitor i has been better than competitor j.
Key:
1 Base accuracy (equivalent to randomly chosen ensemble)
2 Dp, selection by median
3 H, selection by median
4 Dnp−1, selection by median
5 Dnp−2, selection by median
6 Dnp−3, selection by median
7 Dnp−4, selection by median
8 Dp, selection by maximum
9 H, selection by maximum
10 Dnp−1, selection by maximum
11 Dnp−2, selection by maximum
12 Dnp−3, selection by maximum
13 Dnp−4, selection by maximum
Table 7.
Total number of statistically significant differences in favour of each method for sample sizes 5, 15 and 25

The median-selection results are highlighted in bold. Methods are numbered as in Table 6.
Table 8.
Additional data sets
a Personal communication from Dr. Fernando Vilariño, Computer Vision Centre, Barcelona, Spain.