ABSTRACT
Subspace clustering (SC) aims to cluster data lying in a union of low-dimensional subspaces. Usually, SC learns an affinity matrix and then performs spectral clustering. Both steps suffer from high time and space complexity, making it difficult to cluster large datasets. This paper presents a method called k-Factorization Subspace Clustering (k-FSC) for large-scale subspace clustering. k-FSC directly factorizes the data into k groups by pursuing structured sparsity in a matrix factorization model. Thus, k-FSC avoids learning an affinity matrix and performing eigenvalue decomposition, and has low (linear) time and space complexity on large datasets. The effectiveness of the k-FSC model is proved theoretically. An efficient algorithm with a convergence guarantee is proposed to solve the k-FSC optimization. In addition, k-FSC is able to handle sparse noise, outliers, and missing data, which are pervasive in real applications. This paper also provides an online extension and an out-of-sample extension of k-FSC to handle streaming data and to cluster arbitrarily large datasets. Extensive experiments on large-scale real datasets show that k-FSC and its extensions outperform state-of-the-art subspace clustering methods.
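To convey the core idea of factorizing data directly into k groups without building an affinity matrix, the sketch below uses a simple k-subspaces-style alternation: assign each point to the subspace with the smallest projection residual, then refit each subspace basis by truncated SVD. This is not the paper's exact structured-sparsity objective or algorithm; all names and parameters here are illustrative assumptions.

```python
import numpy as np

def k_subspaces(X, k, r, n_iter=30, n_restarts=5, seed=0):
    """Cluster columns of X into k groups, each modeled by an r-dimensional
    subspace, by alternating (1) residual-based assignment and (2) truncated-SVD
    basis refits. Returns the labeling with the lowest total residual over
    several random restarts. (A simplified stand-in for k-FSC, not its algorithm.)"""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    best_labels, best_err = None, np.inf
    for _ in range(n_restarts):
        labels = rng.integers(0, k, size=n)
        for _ in range(n_iter):
            bases = []
            for j in range(k):
                Xj = X[:, labels == j]
                if Xj.shape[1] < r:  # degenerate cluster: re-seed from random columns
                    Xj = X[:, rng.choice(n, size=r, replace=False)]
                U = np.linalg.svd(Xj, full_matrices=False)[0][:, :r]
                bases.append(U)
            # projection residual of every column w.r.t. every subspace
            res = np.stack([np.linalg.norm(X - U @ (U.T @ X), axis=0) for U in bases])
            labels = res.argmin(axis=0)
        err = res.min(axis=0).sum()
        if err < best_err:
            best_labels, best_err = labels, err
    return best_labels

# Synthetic data: 200 points drawn from a union of two 3-dimensional subspaces in R^20.
rng = np.random.default_rng(1)
d, r, k, n_per = 20, 3, 2, 100
subspaces = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(k)]
X = np.hstack([B @ rng.standard_normal((r, n_per)) for B in subspaces])
truth = np.repeat(np.arange(k), n_per)

pred = k_subspaces(X, k, r)
# Cluster labels are defined up to permutation; with k=2, check both mappings.
acc = max(np.mean(pred == truth), np.mean(pred == 1 - truth))
```

Note that, unlike spectral methods, no n-by-n affinity matrix is ever formed: each iteration costs time linear in the number of points, which is the scalability property the abstract emphasizes.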