Abstract
Random projection has been widely used for dimensionality reduction. In this paper, a variant of the iterative random projection K-means algorithm for clustering high-dimensional data is proposed and validated experimentally. Iterative random projection K-means (IRP K-means) [1] fuses dimensionality reduction (random projection) with clustering (K-means): it starts from a chosen low dimension and gradually increases the dimensionality, applying K-means to the projected data in each iteration. The proposed variant, in contrast, starts from a high dimension and gradually reduces the dimensionality. The performance of the proposed algorithm is tested on five high-dimensional data sets: two image and three gene expression data sets. A comparative analysis is carried out against K-means clustering with RP K-means and IRP K-means, based on the K-means objective function, i.e., the mean squared error (MSE). The analysis indicates that our variant of IRP K-means yields better clustering performance than the other two (RP and IRP) methods. Specifically, for the AT&T Faces data set, our method achieved the best average MSE \((9.2759\times 10^9)\), whereas the average MSE of IRP K-means is \(1.9134\times 10^{10}\). For the Yale Image data set, our method gives an MSE of \(1.6363\times 10^8\), whereas the MSE of IRP K-means is \(3.45\times 10^8\). For the GCM and Lung data sets, we obtained an order-of-magnitude improvement in the average MSE. For the Leukemia data set, the average MSE is \(3.6702\times 10^{12}\) for the proposed method and \(7.467\times 10^{12}\) for IRP K-means. In summary, the proposed algorithm outperforms the other two methods on all five data sets.
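The descending-order scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian random projection matrix, a basic Lloyd's K-means, and that cluster assignments are carried over from one projection stage to the next; the function names (`descending_rp_kmeans`, `kmeans`) and the dimension schedule are our own choices for illustration.

```python
import numpy as np

def kmeans(X, labels, k, iters=20):
    """Basic Lloyd's iterations, started from a given label assignment."""
    for _ in range(iters):
        # Recompute centers; reseed an empty cluster from an arbitrary point.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else X[j] for j in range(k)])
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def descending_rp_kmeans(X, k, dims, seed=0):
    """Sketch of the proposed variant: cluster under successively *smaller*
    random projections, carrying the assignment across stages."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = rng.integers(0, k, size=n)        # random initial assignment
    for m in sorted(dims, reverse=True):       # high -> low dimension
        # Gaussian random projection matrix, scaled to roughly
        # preserve pairwise distances (Johnson-Lindenstrauss style).
        R = rng.normal(0.0, 1.0 / np.sqrt(m), size=(d, m))
        labels = kmeans(X @ R, labels, k)
    return labels
```

The original IRP K-means of Cardoso and Wichert [1] would instead iterate `dims` in ascending order; the variant studied here reverses that schedule while keeping the rest of the loop unchanged.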
References
Cardoso, A., Wichert, A.: Iterative random projections for high-dimensional data clustering. Pattern Recogn. Lett. 33, 1749–1755 (2012)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982)
Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26, 189–206 (1984)
Fradkin, D., Madigan, D.: Experiments with random projections for machine learning. In: KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2003)
Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: KDD '01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2001)
Fern, X.Z., Brodley, C.E.: Random projection for high dimensional data clustering: a cluster ensemble approach. In: Proceedings of the Twentieth International Conference of Machine Learning (2003)
Deegalla, S., Bostrom, H.: Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification. In: Proceedings of the 5th International Conference on Machine Learning and Applications (ICMLA), FL, pp. 245–250 (2006)
Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31, 651–666 (2010)
Alshamiri, A.K., Singh, A., Surampudi, B.R.: A novel ELM K-means algorithm for clustering. In: Proceedings of the 5th International Conference on Swarm, Evolutionary and Memetic Computing (SEMCO), Bhubaneswar, India, pp. 212–222 (2014)
Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22, 60–65 (2003)
Achlioptas, D.: Database-friendly random projections: Johnson–Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66, 671–687 (2003). Special Issue on PODS 2001
Li, P., Hastie, T.J., Church, K.W.: Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 287–296. ACM, New York, NY, USA (2006)
Hecht-Nielsen, R.: Context vectors: general purpose approximate meaning representations self-organized from raw data. In: Computational Intelligence: Imitating Life, pp. 43–56 (1994)
Papadimitriou, C.H., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing: a probabilistic analysis. In: Proceedings of 17th ACM Symposium on the Principles of Database Systems, pp. 159–168 (1998)
Boutsidis, C., Zouzias, A., Drineas, P.: Random projections for k-means clustering. Adv. Neural Inf. Process. Syst. 23, 298–306 (2010)
Dasgupta, S.: Experiments with random projection. In: Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000), pp. 143–151 (2000)
Selim, S.Z., Alsultan, K.: A simulated annealing algorithm for the clustering problem. Pattern Recogn. 24, 1003–1008 (1991)
Magen, A.: Dimensionality reductions that preserve volumes and distance to affine spaces, and their algorithmic applications. In: Randomization and Approximation Techniques in Computer Science. Lecture Notes in Computer Science, vol. 2483, pp. 239–253. Springer (2002)
Acknowledgements
The first author would like to thank Dr. Angelo Cardoso for providing the IRP-Kmeans code.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pasunuri, R., China Venkaiah, V., Dhariyal, B. (2019). Ascending and Descending Order of Random Projections: Comparative Analysis of High-Dimensional Data Clustering. In: Yadav, N., Yadav, A., Bansal, J., Deep, K., Kim, J. (eds) Harmony Search and Nature Inspired Optimization Algorithms. Advances in Intelligent Systems and Computing, vol 741. Springer, Singapore. https://doi.org/10.1007/978-981-13-0761-4_14
DOI: https://doi.org/10.1007/978-981-13-0761-4_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0760-7
Online ISBN: 978-981-13-0761-4
eBook Packages: Intelligent Technologies and Robotics (R0)