From Alternative Clustering to Robust Clustering and Its Application to Gene Expression Data

Peng, Peter; Nagi, Mohamad; Şair, Omer; Suleiman, Iyad; Qabaja, Ala; ElSheikh, Abdallah M.; Gao, Shang; Özyer, Tansel; Kianmehr, Keivan; Naji, Ghada; Ridley, Mick; Rokne, Jon; Alhajj, Reda

doi:10.1007/978-3-642-23878-9_50

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6936)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

The main contribution of this paper is a parameter-free clustering approach that distributes the given data instances into the most appropriate clusters. The approach proceeds in several steps. First, we apply a multi-objective genetic algorithm to find alternative clustering solutions that constitute the Pareto front. The result is a pool of the clusters reported by all the solutions. Next, we measure the homogeneity of each cluster in the pool and keep the most homogeneous clusters. These need not all come from a single solution, because the solution most favored by the multiple objectives may still contain some clusters that are less homogeneous than the best clusters in other solutions. Finally, because a given data instance may belong to more than one cluster in the resulting set, we reduce its membership to the single cluster whose centroid is closest to the instance. Many applications, such as gene expression data analysis, need such a parameter-free approach because the correctness of the post-processing depends directly on the outcome of the clustering process. We demonstrate the applicability and effectiveness of the proposed approach through experiments on two benchmark data sets.
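
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify several details, so the following are assumptions made only for illustration: all objectives are minimized; homogeneity is approximated by the mean distance of a cluster's members to its centroid (the hypothetical `scatter` helper); and pooled clusters are kept greedily, most homogeneous first, until every instance is covered. The genetic algorithm itself is omitted, so the alternative solutions are supplied by hand as label lists.

```python
import math


def pareto_front(scores):
    """Indices of non-dominated solutions, assuming every objective is minimized."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def scatter(points):
    """Assumed homogeneity proxy: mean distance to the centroid (lower is better)."""
    center = centroid(points)
    return sum(math.dist(p, center) for p in points) / len(points)


def robust_clusters(data, solutions):
    """Return one cluster index per instance; `solutions` holds one label list each."""
    # Step 1: pool the clusters of all alternative solutions as sets of indices.
    pool = []
    for labels in solutions:
        for label in sorted(set(labels)):
            members = frozenset(i for i, l in enumerate(labels) if l == label)
            if members not in pool:
                pool.append(members)

    # Step 2: rank the pool by homogeneity and keep clusters greedily until
    # every instance is covered (an assumed selection rule, not from the paper).
    ranked = sorted(pool, key=lambda m: scatter([data[i] for i in m]))
    kept, covered = [], set()
    for members in ranked:
        if not members <= covered:
            kept.append(members)
            covered |= members
        if len(covered) == len(data):
            break

    # Step 3: an instance in several kept clusters stays only in the one
    # whose centroid is closest to it.
    centers = [centroid([data[i] for i in sorted(m)]) for m in kept]
    assignment = []
    for i, point in enumerate(data):
        candidates = [k for k, m in enumerate(kept) if i in m]
        assignment.append(min(candidates, key=lambda k: math.dist(point, centers[k])))
    return assignment


# Toy objective vectors: the third solution is dominated by the second.
front = pareto_front([(1, 5), (2, 2), (3, 3), (5, 1)])

# Three well-separated pairs; each hand-made solution merges two of them.
data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (18.0, 0.0), (19.0, 0.0)]
solutions = [[0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1]]
assignment = robust_clusters(data, solutions)
```

In this toy run neither input solution separates all three pairs, yet the pooled result places each pair in its own cluster, which is the effect the abstract describes. Note that with this particular homogeneity proxy a singleton cluster always scores best, so a real implementation would need a measure or rule that avoids that degenerate case.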

This study was supported by the Scientific and Technical Research Council of Turkey (Grant number TÜBITAK EEEAG 109E241).

Author information

Authors and Affiliations

Computer Science Dept, University of Calgary, Calgary, Alberta, Canada
Peter Peng, Ala Qabaja, Abdallah M. ElSheikh, Shang Gao, Jon Rokne & Reda Alhajj
School of Computing, University of Bradford, Bradford, UK
Mohamad Nagi, Iyad Suleiman & Mick Ridley
Dept of Computer Engineering, TOBB University, Ankara, Turkey
Omer Şair & Tansel Özyer
Dept of Elect. & Comp. Eng., University of Western Ontario, London, ON, Canada
Keivan Kianmehr
Department of Biology, Lebanese University, Tripoli, Lebanon
Ghada Naji
Department of Computer Science, Global University, Beirut, Lebanon
Reda Alhajj

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, University of Manchester, Sackville Street Building, M60 1QD, Manchester, UK
Hujun Yin
School of Computing Sciences, University of East Anglia, NR4 7TJ, Norwich, UK
Wenjia Wang
University of East Anglia, NR4 7TJ, Norwich, UK
Victor Rayward-Smith

About this paper

Cite this paper

Peng, P. et al. (2011). From Alternative Clustering to Robust Clustering and Its Application to Gene Expression Data. In: Yin, H., Wang, W., Rayward-Smith, V. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2011. IDEAL 2011. Lecture Notes in Computer Science, vol 6936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23878-9_50

DOI: https://doi.org/10.1007/978-3-642-23878-9_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23877-2
Online ISBN: 978-3-642-23878-9
eBook Packages: Computer Science, Computer Science (R0)
