Abstract
Random projection has been widely used for dimensionality reduction. In this paper, a variant of the iterative random projection K-means algorithm for clustering high-dimensional data is proposed and validated experimentally. Iterative random projection K-means (IRP K-means) [1] fuses dimensionality reduction (random projection) with clustering (K-means): it starts from a chosen low dimension and gradually increases the dimensionality, applying K-means to the projected data in each iteration. The proposed variant, in contrast, starts from a high dimension and gradually reduces the dimensionality. The performance of the proposed algorithm is tested on five high-dimensional data sets: two image and three gene expression data sets. A comparative analysis is carried out against K-means clustering with RP K-means and IRP K-means, based on the K-means objective function, i.e., the mean squared error (MSE). The analysis indicates that our variant of IRP K-means yields better clustering performance than the other two (RP and IRP) methods. Specifically, for the AT&T Faces data set, our method achieved the best average MSE \((9.2759\times 10^9)\), whereas the average MSE of IRP K-means is \(1.9134\times 10^{10}\). For the Yale Image data set, our method gives an MSE of \(1.6363\times 10^8\), whereas the MSE of IRP K-means is \(3.45\times 10^8\). For the GCM and Lung data sets, we obtained an order-of-magnitude improvement in the average MSE. For the Leukemia data set, the average MSE is \(3.6702\times 10^{12}\) for the proposed method and \(7.467\times 10^{12}\) for IRP K-means. In summary, the proposed algorithm outperforms the other two methods on all five data sets.
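The descending-order scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian random projection matrix, a basic Lloyd's K-means, and that cluster assignments are carried over from one projection stage to the next; the function names (`descending_rp_kmeans`, `kmeans`) and the dimension schedule are our own choices for illustration.

```python
import numpy as np

def kmeans(X, labels, k, iters=20):
    """Basic Lloyd's iterations, started from a given label assignment."""
    for _ in range(iters):
        # Recompute centers; reseed an empty cluster from an arbitrary point.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else X[j] for j in range(k)])
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def descending_rp_kmeans(X, k, dims, seed=0):
    """Sketch of the proposed variant: cluster under successively *smaller*
    random projections, carrying the assignment across stages."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = rng.integers(0, k, size=n)        # random initial assignment
    for m in sorted(dims, reverse=True):       # high -> low dimension
        # Gaussian random projection matrix, scaled to roughly
        # preserve pairwise distances (Johnson-Lindenstrauss style).
        R = rng.normal(0.0, 1.0 / np.sqrt(m), size=(d, m))
        labels = kmeans(X @ R, labels, k)
    return labels
```

The original IRP K-means of Cardoso and Wichert [1] would instead iterate `dims` in ascending order; the variant studied here reverses that schedule while keeping the rest of the loop unchanged.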
References
Cardoso, A., Wichert, A.: Iterative random projections for high-dimensional data clustering. Pattern Recogn. Lett. 33, 1749–1755 (2012)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982)
Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26, 189–206 (1984)
Fradkin, D., Madigan, D.: Experiments with random projections for machine learning. In: KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2003)
Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: KDD '01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2001)
Fern, X.Z., Brodley, C.E.: Random projection for high dimensional data clustering: a cluster ensemble approach. In: Proceedings of the Twentieth International Conference of Machine Learning (2003)
Deegalla, S., Bostrom, H.: Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification. In: Proceedings of the 5th International Conference on Machine Learning and Applications (ICMLA), FL, pp. 245–250 (2006)
Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31, 651–666 (2010)
Alshamiri, A.K., Singh, A., Surampudi, B.R.: A novel ELM K-means algorithm for clustering. In: Proceedings of the 5th International Conference on Swarm, Evolutionary and Memetic Computing (SEMCO), Bhubaneswar, India, pp. 212–222 (2014)
Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms 22, 60–65 (2003)
Achlioptas, D.: Database-friendly random projections: Johnson–Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66, 671–687 (2003). Special Issue on PODS 2001
Li, P., Hastie, T.J., Church, K.W.: Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 287–296. ACM, New York, NY, USA (2006)
Hecht-Nielsen, R.: Context vectors: general purpose approximate meaning representations self-organized from raw data. In: Computational Intelligence: Imitating Life, pp. 43–56 (1994)
Papadimitriou, C.H., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing: a probabilistic analysis. In: Proceedings of 17th ACM Symposium on the Principles of Database Systems, pp. 159–168 (1998)
Boutsidis, C., Zouzias, A., Drineas, P.: Random projections for k-means clustering. Adv. Neural Inf. Process. Syst. 23, 298–306 (2010)
Dasgupta, S.: Experiments with random projection. In: Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000), pp. 143–151 (2000)
Selim, S.Z., Alsultan, K.: A simulated annealing algorithm for the clustering problem. Pattern Recogn. 24, 1003–1008 (1991)
Magen, A.: Dimensionality reductions that preserve volumes and distance to affine spaces, and their algorithmic applications. In: Randomization and Approximation Techniques in Computer Science. Lecture Notes in Computer Science, vol. 2483, pp. 239–253. Springer (2002)
Acknowledgements
The first author would like to thank Dr. Angelo Cardoso for providing the IRP-Kmeans code.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pasunuri, R., China Venkaiah, V., Dhariyal, B. (2019). Ascending and Descending Order of Random Projections: Comparative Analysis of High-Dimensional Data Clustering. In: Yadav, N., Yadav, A., Bansal, J., Deep, K., Kim, J. (eds) Harmony Search and Nature Inspired Optimization Algorithms. Advances in Intelligent Systems and Computing, vol 741. Springer, Singapore. https://doi.org/10.1007/978-981-13-0761-4_14
DOI: https://doi.org/10.1007/978-981-13-0761-4_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0760-7
Online ISBN: 978-981-13-0761-4
eBook Packages: Intelligent Technologies and Robotics (R0)