Abstract
The main contribution of this paper is a parameter-free clustering approach that distributes the given data instances into the most appropriate clusters. The approach proceeds in three steps. First, we apply a multi-objective genetic algorithm to find alternative clustering solutions that constitute the Pareto front; the result is a pool of the clusters reported by all the solutions. Second, we determine the homogeneity of each cluster in the pool and keep the most homogeneous clusters. These clusters need not all come from a single solution, because the solution most favored under the multiple objectives may contain clusters that are less homogeneous than the best clusters in other solutions. Finally, because a data instance may belong to more than one cluster in the resulting set, we reduce its membership to the cluster whose centroid is closest to the instance. Many applications, such as gene expression data analysis, need such a parameter-free approach because the correctness of the post-processing is directly affected by the outcome of the clustering process. We demonstrate the applicability and effectiveness of the proposed approach through experiments on two benchmark data sets.
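The pooling, filtering, and membership-reduction steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the genetic algorithm is omitted and the alternative solutions are assumed to be given, the homogeneity measure (mean distance of members to their centroid) is an assumption because the abstract does not define one, and the rule for deciding how many clusters to keep (add clusters in order of homogeneity until every instance is covered) is likewise a hypothetical choice. The function names and the toy data are illustrative only.

```python
import math

def centroid(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def mean_dist_to_centroid(data, members):
    # Assumed homogeneity measure: lower value means a more homogeneous cluster.
    pts = [data[i] for i in members]
    c = centroid(pts)
    return sum(math.dist(p, c) for p in pts) / len(pts)

def robust_clusters(data, solutions):
    # Step 1: pool the clusters reported by all alternative solutions.
    pool = {frozenset(cluster) for solution in solutions for cluster in solution}

    # Step 2: rank the pool by homogeneity and keep clusters until every
    # instance is covered (assumed selection rule, not taken from the paper).
    ranked = sorted(pool, key=lambda c: (mean_dist_to_centroid(data, c), sorted(c)))
    kept, covered = [], set()
    for cluster in ranked:
        if not cluster <= covered:
            kept.append(cluster)
            covered |= cluster

    # Step 3: an instance that appears in several kept clusters stays only
    # in the one whose centroid is closest to it.
    centroids = [centroid([data[i] for i in cluster]) for cluster in kept]
    final = [[] for _ in kept]
    for i, point in enumerate(data):
        candidates = [k for k, cluster in enumerate(kept) if i in cluster]
        best = min(candidates, key=lambda k: math.dist(point, centroids[k]))
        final[best].append(i)
    return [cluster for cluster in final if cluster]

# Toy data: three tight pairs along the x-axis.
data = [(0, 0), (0, 1), (4, 0), (4, 1), (10, 0), (10, 1)]
# Two hypothetical Pareto-front solutions, each merging two of the pairs.
solutions = [
    [[0, 1], [2, 3, 4, 5]],
    [[0, 1, 2, 3], [4, 5]],
]
result = robust_clusters(data, solutions)
print(sorted(result))
```

In this toy case neither input solution separates all three pairs, yet the combined result does, because the tight clusters are taken from different solutions and the overlapping instances are then assigned to their nearest centroid. The sketch assumes every instance appears in at least one cluster; singleton clusters would score as perfectly homogeneous under this measure, so a real homogeneity criterion would need to handle them.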
This study was supported by the Scientific and Technical Research Council of Turkey (Grant number TÜBITAK EEEAG 109E241).
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peng, P. et al. (2011). From Alternative Clustering to Robust Clustering and Its Application to Gene Expression Data. In: Yin, H., Wang, W., Rayward-Smith, V. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2011. IDEAL 2011. Lecture Notes in Computer Science, vol 6936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23878-9_50
DOI: https://doi.org/10.1007/978-3-642-23878-9_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23877-2
Online ISBN: 978-3-642-23878-9
eBook Packages: Computer Science (R0)