
Ascending and Descending Order of Random Projections: Comparative Analysis of High-Dimensional Data Clustering

Conference paper
In: Harmony Search and Nature Inspired Optimization Algorithms

Abstract

Random projection has been widely used for dimensionality reduction. In this paper, a variant of the iterative random projection K-means algorithm for clustering high-dimensional data is proposed and validated experimentally. The iterative random projection K-means (IRP K-means) method [1] fuses dimensionality reduction (random projection) with clustering (K-means): it starts from a chosen low dimension and gradually increases the dimensionality, applying K-means to the projected data in each iteration. The proposed variant, in contrast, starts from a high dimension and gradually reduces the dimensionality. The performance of the proposed algorithm is tested on five high-dimensional data sets: two image data sets and three gene expression data sets. A comparative analysis is carried out against the RP K-means and IRP K-means methods, based on the K-means objective function, the mean squared error (MSE). The analysis indicates that our variant of IRP K-means achieves better clustering performance than the two earlier (RP and IRP) methods. Specifically, on the AT&T Faces data set our method achieves the best average MSE (\(9.2759\times 10^9\)), whereas the average MSE of IRP K-means is \(1.9134\times 10^{10}\). On the Yale image data set our method gives an MSE of \(1.6363\times 10^8\), whereas the MSE of IRP K-means is \(3.45\times 10^8\). On the GCM and Lung data sets we obtain roughly an order-of-magnitude improvement in average MSE. On the Leukemia data set, the average MSE is \(3.6702\times 10^{12}\) for the proposed method and \(7.467\times 10^{12}\) for IRP K-means. In summary, the proposed algorithm outperforms the other two methods on all five data sets.
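The abstract describes the procedure only in prose, so a short illustrative implementation may help. The following Python sketch is one reading of the descending-order idea, not the authors' published code: the function name descending_rp_kmeans, the dimension schedule dims, and the warm-start rule (seeding each stage's centroids from the previous stage's assignments, mirroring what IRP K-means [1] does in ascending order) are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection


def descending_rp_kmeans(X, n_clusters, dims, seed=0):
    """Cluster X by running K-means on random projections whose target
    dimensionality decreases stage by stage (dims sorted high to low)."""
    rng = np.random.RandomState(seed)
    labels = None
    for d in dims:
        # Project the original data to d dimensions with a Gaussian random matrix.
        Xp = GaussianRandomProjection(n_components=d,
                                      random_state=rng).fit_transform(X)
        if labels is None:
            # First (highest-dimensional) stage: ordinary K-means.
            km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        else:
            # Later stages: warm-start each centroid from the mean of the
            # points assigned to that cluster at the previous stage.
            centers = np.vstack([
                Xp[labels == k].mean(axis=0) if np.any(labels == k)
                else Xp[rng.randint(len(Xp))]   # re-seed an emptied cluster
                for k in range(n_clusters)
            ])
            km = KMeans(n_clusters=n_clusters, init=centers, n_init=1)
        labels = km.fit_predict(Xp)
    # km.inertia_ is the summed squared error of the final stage; dividing by
    # the number of points gives an MSE-style score like those reported above.
    return labels, km.inertia_ / len(X)


# Hypothetical usage on stand-in data (400 points in 1024 dimensions):
X = np.random.rand(400, 1024)
labels, mse = descending_rp_kmeans(X, n_clusters=10, dims=[512, 256, 128, 64])
```

Note that the MSE returned here is measured in the final projected space; the choice of dimension schedule is a free parameter of the method, just as the ascending schedule is in IRP K-means.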


References

  1. Cardoso, A., Wichert, A.: Iterative random projections for high-dimensional data clustering. Pattern Recogn. Lett. 33, 1749–1755 (2012)

  2. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982)

  3. Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26, 189–206 (1984)

  4. Fradkin, D., Madigan, D.: Experiments with random projections for machine learning. In: KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2003)

  5. Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: KDD '01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2001)

  6. Fern, X.Z., Brodley, C.E.: Random projection for high dimensional data clustering: a cluster ensemble approach. In: Proceedings of the Twentieth International Conference on Machine Learning (2003)

  7. Deegalla, S., Boström, H.: Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification. In: Proceedings of the 5th International Conference on Machine Learning and Applications (ICMLA), FL, pp. 245–250 (2006)

  8. Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31, 651–666 (2010)

  9. Alshamiri, A.K., Singh, A., Surampudi, B.R.: A novel ELM K-means algorithm for clustering. In: Proceedings of the 5th International Conference on Swarm, Evolutionary and Memetic Computing (SEMCCO), pp. 212–222, Bhubaneswar, India (2014)

  10. Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22, 60–65 (2003)

  11. Achlioptas, D.: Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66, 671–687 (2003). Special issue on PODS 2001

  12. Li, P., Hastie, T.J., Church, K.W.: Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 287–296. ACM, New York, NY, USA (2006)

  13. Hecht-Nielsen, R.: Context vectors: general purpose approximate meaning representations self-organized from raw data. In: Computational Intelligence: Imitating Life, pp. 43–56 (1994)

  14. Papadimitriou, C.H., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing: a probabilistic analysis. In: Proceedings of the 17th ACM Symposium on Principles of Database Systems, pp. 159–168 (1998)

  15. Boutsidis, C., Zouzias, A., Drineas, P.: Random projections for k-means clustering. Adv. Neural Inf. Process. Syst. 23, 298–306 (2010)

  16. Dasgupta, S.: Experiments with random projection. In: Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000), pp. 143–151 (2000)

  17. Selim, S.Z., Alsultan, K.: A simulated annealing algorithm for the clustering problem. Pattern Recogn. 24, 1003–1008 (1991)

  18. Magen, A.: Dimensionality reductions that preserve volumes and distance to affine spaces, and their algorithmic applications. In: Randomization and Approximation Techniques in Computer Science, Lecture Notes in Computer Science, vol. 2483, pp. 239–253. Springer (2002)


Acknowledgements

The first author would like to thank Dr. Angelo Cardoso for providing the IRP K-means code.

Author information


Corresponding author

Correspondence to Raghunadh Pasunuri.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Pasunuri, R., China Venkaiah, V., Dhariyal, B. (2019). Ascending and Descending Order of Random Projections: Comparative Analysis of High-Dimensional Data Clustering. In: Yadav, N., Yadav, A., Bansal, J., Deep, K., Kim, J. (eds) Harmony Search and Nature Inspired Optimization Algorithms. Advances in Intelligent Systems and Computing, vol 741. Springer, Singapore. https://doi.org/10.1007/978-981-13-0761-4_14

