A New Locally Weighted K-Means for Cancer-Aided Microarray Data Analysis

  • Original Paper
  • Journal of Medical Systems

Abstract

Cancer has been identified as a leading cause of death. It is predicted that around 20–26 million people will be diagnosed with cancer by 2020. Given this alarming rate, more effective methods to understand, prevent and cure cancer are urgently needed. Microarray technology provides a useful basis for achieving this goal: cluster analysis of gene expression data can discriminate between patients, identify possible tumor subtypes and support individualized treatment. Among clustering techniques, k-means is often chosen for its simplicity and efficiency. However, it treats all data attributes as equally important when measuring distance. This paper presents a new locally weighted extension of k-means, which has proven more accurate than the original algorithm and other extensions in the literature across many published datasets.
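The abstract does not give the paper's weighting scheme, so the following is only a minimal sketch of what a locally weighted k-means can look like, not the authors' algorithm. It follows the exponential per-cluster weighting used by LAC (Domeniconi et al., reference 9) and entropy-weighted k-means (Jing et al., reference 19): each cluster c keeps its own weights w_cj over attributes j, a point is assigned to the cluster minimising sum_j w_cj (x_j - c_j)^2, and attributes with low within-cluster dispersion D_cj receive high weight via w_cj ∝ exp(-D_cj / h). The function names, the bandwidth parameter `h` and the farthest-first seeding are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

# Illustrative LAC/EWKM-style sketch; NOT the method proposed in this paper.


def weighted_sq_dist(x: Point, centre: Point, weights: Sequence[float]) -> float:
    """Weighted squared Euclidean distance: sum_j w_j * (x_j - c_j)^2."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, centre, weights))


def farthest_first(data: Sequence[Point], k: int) -> List[List[float]]:
    """Hypothetical deterministic seeding: start at data[0], then repeatedly
    add the point farthest from all centres chosen so far."""
    unit = [1.0] * len(data[0])
    centres = [list(data[0])]
    while len(centres) < k:
        nxt = max(data, key=lambda x: min(weighted_sq_dist(x, c, unit) for c in centres))
        centres.append(list(nxt))
    return centres


def locally_weighted_kmeans(
    data: Sequence[Point], k: int, h: float = 1.0, max_iter: int = 50
) -> Tuple[List[int], List[List[float]], List[List[float]]]:
    """Each cluster keeps its own attribute-weight vector (hypothetical helper)."""
    dim = len(data[0])
    centres = farthest_first(data, k)
    weights = [[1.0 / dim] * dim for _ in range(k)]  # start with equal weights
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # 1. Assign each point to the cluster with the smallest weighted distance.
        new_labels = [
            min(range(k), key=lambda c: weighted_sq_dist(x, centres[c], weights[c]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if not members:
                continue  # keep the old centre and weights for an empty cluster
            # 2. Move the centre to the mean of its members.
            centres[c] = [sum(col) / len(members) for col in zip(*members)]
            # 3. Per-attribute dispersion: mean squared deviation along attribute j.
            disp = [
                sum((x[j] - centres[c][j]) ** 2 for x in members) / len(members)
                for j in range(dim)
            ]
            # 4. w_cj proportional to exp(-disp_cj / h); subtracting the minimum
            #    avoids underflow and leaves the normalised weights unchanged.
            lo = min(disp)
            raw = [math.exp(-(d - lo) / h) for d in disp]
            total = sum(raw)
            weights[c] = [r / total for r in raw]
    return labels, centres, weights
```

Smaller values of `h` concentrate each cluster's weight on fewer attributes; as `h` grows, the weights approach 1/d and the assignment step reduces to that of plain k-means.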

Fig. 1
Fig. 2

References

  1. Aggarwal, C., Procopiuc, C., Wolf, J. L., Yu, P. S., and Park, J. S., Fast algorithms for projected clustering. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 61–72, 1999.

  2. Alizadeh, A. A. et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503–511, 2000.

  3. Armstrong, S. et al., MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat. Genet. 30:41–47, 2002.

  4. Bittner, M. et al., Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406(6795):536–540, 2000.

  5. Boongoen, T., and Shen, Q., Nearest-neighbor guided evaluation of data reliability and its applications. IEEE Trans. Syst. Man Cybern., Part B 40(6):1622–1633, 2010.

  6. Boongoen, T., Shang, C., Iam-On, N., and Shen, Q., Extending data reliability measure to a filter approach for soft subspace clustering. IEEE Trans. Syst. Man Cybern., Part B 41(6):1705–1714, 2011.

  7. Cheng, Y., and Church, G. M., Biclustering of expression data. In: Proceedings of the International Conference on Intelligent Systems for Molecular Biology, pp. 93–103, 2000.

  8. de Souto, M., Costa, I., de Araujo, D., Ludermir, T., and Schliep, A., Clustering cancer gene expression data: a comparative study. BMC Bioinformatics 9:497, 2008.

  9. Domeniconi, C., Gunopulos, D., Ma, S., Yan, B., Al-Razgan, M., and Papadopoulos, D., Locally adaptive metrics for clustering high dimensional data. Data Mining and Knowledge Discovery 14(1):63–97, 2007.

  10. Dy, J. G., and Brodley, C. E., Feature selection for unsupervised learning. J. Mach. Learn. Res. 5:845–889, 2004.

  11. Dyrskjot, L. et al., Identifying distinct classes of bladder carcinoma using microarrays. Nat. Genet. 33:90–96, 2003.

  12. Gan, G. J., and Wu, J. H., A convergence theorem for the fuzzy subspace clustering (FSC) algorithm. Pattern Recogn. 41:1939–1947, 2008.

  13. Garber, M. E. et al., Diversity of gene expression in adenocarcinoma of the lung. Proc. Natl. Acad. Sci. USA 98(24):13784–13789, 2001.

  14. Golub, T. et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537, 1999.

  15. Gordon, G. J. et al., Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res. 62(17):4963–4967, 2002.

  16. Gu, J., and Liu, J. S., Bayesian biclustering of gene expression data. BMC Genomics 9(Suppl 1):S4, 2008.

  17. Iam-On, N., and Boongoen, T., New soft subspace method to gene expression data clustering. In: Proceedings of IEEE-EMBS International Conference on Biomedical and Health Informatics, pp. 984–987, 2012.

  18. Iam-On, N., Boongoen, T., and Garrett, S., LCE: a link-based cluster ensemble method for improved gene expression data analysis. Bioinformatics 26(12):1513–1519, 2010.

  19. Jing, L., Ng, M. K., and Huang, J. Z., An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data. IEEE Trans. Knowl. Data Eng. 19(8):1026–1041, 2007.

  20. Jolliffe, I., Principal component analysis. Springer: New York, 1986.

  21. Khan, J. et al., Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 7(6):673–679, 2001.

  22. Kriegel, H. P., Kroger, P., and Zimek, A., Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. KDD 3(1):1–ex, 2009.

  23. Laiho, P. et al., Serrated carcinomas form a subclass of colorectal cancer with distinct molecular basis. Oncogene 26(2):312–320, 2007.

  24. Ng, A., Jordan, M., and Weiss, Y., On spectral clustering: analysis and an algorithm. Advances in NIPS 14, 2001.

  25. Nutt, C. et al., Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res. 63(7):1602–1607, 2003.

  26. Pomeroy, S. et al., Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870):436–442, 2002.

  27. Ramaswamy, S. et al., Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl. Acad. Sci. USA 98(26):15149–15154, 2001.

  28. Shipp, M. A. et al., Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat. Med. 8:68–74, 2002.

  29. Singh, D. et al., Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209, 2002.

  30. Spang, R., Diagnostic signatures from microarrays: a bioinformatics concept for personalized medicine. BIOSILICO 1:264–268, 2003.

  31. Strehl, A., and Ghosh, J., Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3:583–617, 2002.

  32. Su, A. et al., Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res. 61(20):7388–7393, 2001.

  33. Wallqvist, A., Rabow, A., Shoemaker, R., Sausville, E., and Covell, D., Establishing connections between microarray expression data and chemotherapeutic cancer pharmacology. Mol. Cancer Ther. 1:311–320, 2002.

Acknowledgements

The authors would like to thank X. Z. Fern and C. E. Brodley for the source code of HBGF (hybrid bipartite graph formulation), and C. Domeniconi for the implementation of LAC (locally adaptive clustering).

Conflict of Interest: The authors declare that they have no conflict of interest.

Author information

Correspondence to Natthakan Iam-On.

About this article

Cite this article

Iam-On, N., Boongoen, T. A New Locally Weighted K-Means for Cancer-Aided Microarray Data Analysis. J Med Syst 36 (Suppl 1), 43–49 (2012). https://doi.org/10.1007/s10916-012-9889-0

  • DOI: https://doi.org/10.1007/s10916-012-9889-0
