Abstract
Cancer has been identified as the leading cause of death. It is predicted that around 20–26 million people will be diagnosed with cancer by 2020. Given this alarming rate, there is an urgent need for a more effective methodology to understand, prevent and cure cancer. Microarray technology provides a useful basis for achieving this goal: cluster analysis of gene expression data supports the discrimination of patients, the identification of possible tumor subtypes and individualized treatment. Amongst clustering techniques, k-means is normally chosen for its simplicity and efficiency. However, it treats all data attributes as equally important. This paper presents a new locally weighted extension of k-means, which has proven more accurate across many published datasets than the original algorithm and other extensions found in the literature.
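To make the idea of local attribute weighting concrete, the following is a minimal sketch of a generic per-cluster weighted k-means. It is not the algorithm proposed in this paper, whose details the abstract does not give. It assumes an exponential weighting of each attribute's within-cluster variance, similar in spirit to entropy-weighted k-means, so that attributes along which a cluster is tight receive larger weights for that cluster. The function name `locally_weighted_kmeans`, the parameter `gamma` and the toy data are all hypothetical.

```python
import numpy as np

def locally_weighted_kmeans(X, k, gamma=1.0, n_iter=50, seed=0):
    """Illustrative per-cluster attribute-weighted k-means (an assumption,
    not the paper's method). Returns labels, centers and a (k, d) weight
    matrix whose rows each sum to 1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Start from k distinct data points and uniform attribute weights.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, d), 1.0 / d)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Weighted squared distance of every point to every center.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2  # shape (n, k, d)
        dist = (diff2 * weights[None, :, :]).sum(axis=2)    # shape (n, k)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so stop early
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members) == 0:
                continue  # keep the previous center and weights
            centers[c] = members.mean(axis=0)
            # Per-attribute variance inside the cluster.
            disp = ((members - centers[c]) ** 2).mean(axis=0)
            # Softmax of negative dispersion: tight attributes weigh more.
            logits = -disp / gamma
            logits -= logits.max()  # numerical stability
            w = np.exp(logits)
            weights[c] = w / w.sum()
    return labels, centers, weights

# Toy data: attribute 0 separates two groups, attribute 1 is noise.
data_rng = np.random.default_rng(42)
group_a = np.column_stack([data_rng.normal(0.0, 0.1, 20), data_rng.normal(0.0, 1.0, 20)])
group_b = np.column_stack([data_rng.normal(5.0, 0.1, 20), data_rng.normal(0.0, 1.0, 20)])
X = np.vstack([group_a, group_b])
labels, centers, weights = locally_weighted_kmeans(X, k=2)
print(weights)
```

Because each cluster keeps its own weight vector, the same attribute can matter for one cluster and be nearly ignored for another, which is the property that standard k-means lacks.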
Acknowledgements
The authors would like to thank X. Z. Fern and C. E. Brodley for the source code of HBGF, and C. Domeniconi for the implementation of LAC.
Conflict of Interest
The authors declare that they have no conflict of interest.
Cite this article
Iam-On, N., Boongoen, T. A New Locally Weighted K-Means for Cancer-Aided Microarray Data Analysis. J Med Syst 36 (Suppl 1), 43–49 (2012). https://doi.org/10.1007/s10916-012-9889-0
DOI: https://doi.org/10.1007/s10916-012-9889-0