Feature selection based on an improved cat swarm optimization algorithm for big data classification

The Journal of Supercomputing

Abstract

Feature selection is an optimization problem that is generally solved by combining an optimization algorithm with a classifier. Genetic algorithms and particle swarm optimization (PSO) are two commonly used optimization algorithms. More recently, cat swarm optimization (CSO) was proposed and shown to outperform PSO; however, CSO is limited by long computation times. In this paper, we modify CSO to obtain an improved algorithm, ICSO, and apply it to feature selection in a text classification experiment on big data. The results show that ICSO outperforms traditional CSO and that, for big data classification, combining term frequency-inverse document frequency (TF-IDF) with ICSO for feature selection yields higher accuracy than using TF-IDF alone.
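To make the abstract's setup concrete, the following is a minimal sketch of how a candidate feature subset might be scored when TF-IDF weights are combined with a wrapper-style search. It is not the ICSO algorithm itself, which this page does not describe. The function names (`tf_idf`, `apply_mask`, `loo_accuracy`), the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, the toy documents, and the leave-one-out 1-nearest-neighbour classifier are all assumptions chosen for illustration; an optimizer such as CSO would propose the binary `mask` and use the returned accuracy as its fitness value.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of vocab[j] in docs[i].

    Assumes non-empty documents, whitespace tokenization, and idf = log(N / df).
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, matrix


def apply_mask(matrix, mask):
    """Keep only the columns whose mask entry is truthy (a candidate feature subset)."""
    return [[w for w, keep in zip(row, mask) if keep] for row in matrix]


def loo_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (stand-in fitness)."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(row, rows[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


# Hypothetical toy corpus, for illustration only.
docs = ["beef noodle soup", "beef noodle stew", "mango shaved ice", "mango ice dessert"]
labels = ["savory", "savory", "sweet", "sweet"]

vocab, weights = tf_idf(docs)
# A candidate subset that keeps only the terms "beef" and "mango".
mask = [term in {"beef", "mango"} for term in vocab]
fitness = loo_accuracy(apply_mask(weights, mask), labels)
```

In a wrapper approach of this kind, the search algorithm repeatedly proposes masks and keeps those with higher fitness, so the cost of each fitness evaluation largely determines total run time.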

Author information

Corresponding author

Correspondence to Jason C. Hung.

Cite this article

Lin, KC., Zhang, KY., Huang, YH. et al. Feature selection based on an improved cat swarm optimization algorithm for big data classification. J Supercomput 72, 3210–3221 (2016). https://doi.org/10.1007/s11227-016-1631-0

  • DOI: https://doi.org/10.1007/s11227-016-1631-0
