A DCA Based Algorithm for Feature Selection in Model-Based Clustering

Nguyen, Viet Anh; Le Thi, Hoai An; Le, Hoai Minh

doi:10.1007/978-3-030-41964-6_35

Conference paper
First Online: 04 March 2020

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12033)

Abstract

The Gaussian Mixture Model (GMM) is a model-based clustering approach that has been used in many applications thanks to its flexibility and effectiveness. However, on high-dimensional data, GMM-based clustering loses these advantages due to over-parameterization and noise features. To deal with this issue, we incorporate feature selection into GMM clustering. For the first time, a nonconvex sparsity-inducing regularization is considered for feature selection in GMM clustering. The resulting optimization problem is nonconvex, and we develop a DCA (Difference of Convex functions Algorithm) to solve it. Numerical experiments on several benchmark and synthetic datasets illustrate the efficiency of our algorithm and its superiority over an EM method for GMM clustering with \(l_1\) regularization.
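The abstract does not state which nonconvex regularizer or DC decomposition the authors use, so the following is only a minimal sketch under explicit assumptions, not the paper's algorithm. It assumes standardized features, a GMM with a shared diagonal covariance, and a capped-\(l_1\) penalty \(\lambda \min(1, \theta|\mu_{kj}|)\) on each cluster-mean coordinate. Writing the penalty as \(\lambda\theta|\mu| - \lambda\max(0, \theta|\mu| - 1)\) (a difference of two convex functions) lets DCA linearize the second term, so every penalized mean update becomes a soft-thresholding step; with a plain \(l_1\) penalty the update would be a single soft-threshold of the weighted mean. A feature whose mean is zero in every cluster does not separate the clusters and is treated as deselected. All function names (`dca_penalized_mean`, `m_step_means`, `selected_features`) are hypothetical.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t*|.|: argmin_u 0.5*(u - z)**2 + t*|u|."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def capped_l1(t, theta):
    """Assumed capped-l1 penalty min(1, theta*|t|), a nonconvex surrogate of l0."""
    return min(1.0, theta * abs(t))


def dca_penalized_mean(b, a, lam, theta, mu0=None, max_iter=100, tol=1e-10):
    """Minimize F(mu) = 0.5*a*(mu - b)**2 + lam*min(1, theta*|mu|) with DCA.

    DC decomposition (both parts convex):
      G(mu) = 0.5*a*(mu - b)**2 + lam*theta*|mu|
      H(mu) = lam*max(0, theta*|mu| - 1)
    Each iteration takes y in dH(mu_l) and solves min_mu G(mu) - y*mu,
    whose closed form is soft_threshold(b + y/a, lam*theta/a).
    """
    mu = b if mu0 is None else mu0
    for _ in range(max_iter):
        # Subgradient of H: zero on the flat part of the cap, +/- lam*theta beyond it.
        y = lam * theta * math.copysign(1.0, mu) if theta * abs(mu) > 1.0 else 0.0
        new_mu = soft_threshold(b + y / a, lam * theta / a)
        if abs(new_mu - mu) <= tol:
            return new_mu
        mu = new_mu
    return mu


def m_step_means(X, resp, var, lam, theta):
    """Hypothetical penalized M-step for the cluster means.

    X: n x d standardized data, resp: n x K responsibilities tau_ik,
    var: length-d shared feature variances sigma_j^2.
    For cluster k and feature j the penalized objective is
    0.5*a*(mu - b)**2 + penalty, with a = n_k / sigma_j^2 and b the
    responsibility-weighted mean (assumes no cluster is empty, n_k > 0).
    """
    n, d, K = len(X), len(X[0]), len(resp[0])
    means = [[0.0] * d for _ in range(K)]
    for k in range(K):
        nk = sum(resp[i][k] for i in range(n))
        for j in range(d):
            b = sum(resp[i][k] * X[i][j] for i in range(n)) / nk
            a = nk / var[j]
            means[k][j] = dca_penalized_mean(b, a, lam, theta)
    return means


def selected_features(means):
    """Keep feature j if at least one cluster has a nonzero mean for it."""
    return [j for j in range(len(means[0])) if any(m[j] != 0.0 for m in means)]
```

For example, with two well-separated clusters on feature 0 and only a small offset on feature 1, `m_step_means` with `lam=1.0, theta=1.0` keeps the large means unshrunk (capped-\(l_1\) stops penalizing once \(\theta|\mu| > 1\)) and sets feature 1's means to zero, so `selected_features` returns `[0]`. The unbiasedness for large coefficients is the usual motivation for a nonconvex penalty over \(l_1\), which would shrink every nonzero mean by the same amount.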

Author information

Authors and Affiliations

Computer Science and Application Department, LGIPM, University of Lorraine, Metz, France
Viet Anh Nguyen, Hoai An Le Thi & Hoai Minh Le

Corresponding author

Correspondence to Hoai Minh Le.

Editor information

Editors and Affiliations

Department of Applied Informatics, Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Kietikul Jearanaitanakij
Faculty of Computer Science and Information, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
Ali Selamat
Department of Applied Informatics, Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński
King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Suphamit Chittayasothorn

About this paper

Cite this paper

Nguyen, V.A., Le Thi, H.A., Le, H.M. (2020). A DCA Based Algorithm for Feature Selection in Model-Based Clustering. In: Nguyen, N., Jearanaitanakij, K., Selamat, A., Trawiński, B., Chittayasothorn, S. (eds) Intelligent Information and Database Systems. ACIIDS 2020. Lecture Notes in Computer Science, vol 12033. Springer, Cham. https://doi.org/10.1007/978-3-030-41964-6_35

DOI: https://doi.org/10.1007/978-3-030-41964-6_35
Published: 04 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41963-9
Online ISBN: 978-3-030-41964-6
eBook Packages: Computer Science, Computer Science (R0)