A DCA Based Algorithm for Feature Selection in Model-Based Clustering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12033)

Abstract

Gaussian Mixture Model (GMM) clustering is a model-based clustering approach that has been used in many applications thanks to its flexibility and effectiveness. However, on high-dimensional data, GMM-based clustering loses its advantages because of over-parameterization and noise features. To deal with this issue, we incorporate feature selection into GMM clustering. For the first time, a nonconvex sparsity-inducing regularization is considered for feature selection in GMM clustering. The resulting optimization problem is nonconvex, and we develop a DCA (Difference of Convex functions Algorithm) to solve it. Numerical experiments on several benchmark and synthetic datasets illustrate the efficiency of our algorithm and its superiority over an EM method for GMM clustering with \(l_1\) regularization.
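The abstract does not state which nonconvex penalty or DC decomposition the paper uses, so the following is only a generic sketch of how a DCA iteration handles a nonconvex sparsity-inducing penalty. It assumes a capped-\(l_1\) penalty and a simple least-squares data term in place of the GMM likelihood; the function names `soft_threshold` and `dca_capped_l1` and the parameters `lam` and `a` are hypothetical and chosen for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def dca_capped_l1(y, lam=1.0, a=1.0, max_iter=100, tol=1e-10):
    """DCA sketch for  min_x 0.5*||x - y||^2 + lam * sum_j min(|x_j|, a) / a.

    Assumed DC split of the penalty:
        min(|x|, a) / a = |x| / a - max(|x| - a, 0) / a,
    so that
        G(x) = 0.5*||x - y||^2 + (lam/a) * ||x||_1   (convex)
        H(x) = (lam/a) * sum_j max(|x_j| - a, 0)     (convex)
    and the objective equals G(x) - H(x).
    """
    y = np.asarray(y, dtype=float)
    x = y.copy()
    for _ in range(max_iter):
        # Step 1: pick a subgradient of H at the current iterate.
        v = (lam / a) * np.sign(x) * (np.abs(x) > a)
        # Step 2: minimize the convex majorant G(x) - <v, x>.
        # Completing the square turns it into soft-thresholding of y + v.
        x_new = soft_threshold(y + v, lam / a)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x


if __name__ == "__main__":
    y = np.array([3.0, 0.2, -2.5, -0.1])
    print("DCA, capped-l1:", dca_capped_l1(y, lam=0.5, a=1.0))
    print("plain l1 prox :", soft_threshold(y, 0.5))
```

In this toy setting both approaches set the two small entries to zero, but the capped-\(l_1\) solution leaves the large entries unshrunk, whereas plain \(l_1\) soft-thresholding shrinks them by the threshold. This is the usual motivation for nonconvex penalties in feature selection; how it carries over to the GMM objective is specific to the paper and is not reproduced here.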



Author information

Corresponding author

Correspondence to Hoai Minh Le.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, V.A., Le Thi, H.A., Le, H.M. (2020). A DCA Based Algorithm for Feature Selection in Model-Based Clustering. In: Nguyen, N., Jearanaitanakij, K., Selamat, A., Trawiński, B., Chittayasothorn, S. (eds) Intelligent Information and Database Systems. ACIIDS 2020. Lecture Notes in Computer Science, vol 12033. Springer, Cham. https://doi.org/10.1007/978-3-030-41964-6_35

  • DOI: https://doi.org/10.1007/978-3-030-41964-6_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41963-9

  • Online ISBN: 978-3-030-41964-6

  • eBook Packages: Computer Science, Computer Science (R0)
