Feature selection based on long short term memory for text classification

Multimedia Tools and Applications

Abstract

Selecting discriminative terms from the large quantity of terms in text documents helps achieve better text classification accuracy. To address the task of selecting discriminative terms from text, a deep learning based feature selection method is proposed. The method is built on the long short term memory (LSTM) network. A deep network based on LSTM is trained in an unsupervised manner to extract deep features from bag-of-words term frequency vectors. The deep features are integrated with term frequencies to evaluate the effectiveness of terms. The proposed method goes beyond the limitations of term frequency information by using deep features for feature selection. Experiments on nine public datasets demonstrate that our method selects discriminative terms better than the compared methods.
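The pipeline outlined in the abstract (term frequency vectors, deep features, a per-term score, top-k selection) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: `stand_in_encoder` is a fixed random tanh projection that merely stands in for the trained LSTM network, and `score_terms` uses a hypothetical score (the summed absolute covariance between a term's frequency and each deep feature). The abstract does not give the authors' architecture or scoring formula, so this is not their method.

```python
import math
import random


def term_frequency_matrix(docs, vocab):
    """Build bag-of-words term-frequency vectors, one row per document."""
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for token in doc.split():
            if token in index:
                row[index[token]] += 1.0
        rows.append(row)
    return rows


def stand_in_encoder(tf_rows, hidden_size, seed=0):
    """ASSUMPTION: placeholder for the trained LSTM feature extractor.

    A fixed random tanh projection maps each term-frequency vector to a
    'deep feature' vector of length hidden_size. It is not an LSTM.
    """
    rng = random.Random(seed)
    n_terms = len(tf_rows[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_terms)]
               for _ in range(hidden_size)]
    return [[math.tanh(sum(w * x for w, x in zip(w_row, row)))
             for w_row in weights]
            for row in tf_rows]


def score_terms(tf_rows, deep_rows):
    """ASSUMPTION: hypothetical score combining deep features with term frequencies.

    For each term, sum over deep features the absolute covariance between
    the term's frequency column and that deep feature.
    """
    n_docs = len(tf_rows)
    n_terms = len(tf_rows[0])
    n_hidden = len(deep_rows[0])
    tf_mean = [sum(r[j] for r in tf_rows) / n_docs for j in range(n_terms)]
    h_mean = [sum(r[k] for r in deep_rows) / n_docs for k in range(n_hidden)]
    scores = []
    for j in range(n_terms):
        total = 0.0
        for k in range(n_hidden):
            cov = sum((tf_rows[i][j] - tf_mean[j]) * (deep_rows[i][k] - h_mean[k])
                      for i in range(n_docs)) / n_docs
            total += abs(cov)
        scores.append(total)
    return scores


def select_top_terms(vocab, scores, k):
    """Return the k terms with the highest scores."""
    ranked = sorted(range(len(vocab)), key=lambda j: scores[j], reverse=True)
    return [vocab[j] for j in ranked[:k]]


# Toy corpus (made up for this sketch, not one of the paper's datasets).
docs = ["the cat sat", "the dog ran", "the cat ran"]
vocab = ["the", "cat", "sat", "dog", "ran"]
tf = term_frequency_matrix(docs, vocab)
deep = stand_in_encoder(tf, hidden_size=4)
print(select_top_terms(vocab, score_terms(tf, deep), k=2))
```

In this toy corpus, "the" occurs exactly once in every document, so its covariance with any deep feature is zero and it is never ranked above terms whose frequency varies. A real implementation would replace the placeholder encoder with a trained network and the score with the formula from the full paper.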

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This research was supported by the Fundamental Research Funds for Guangdong Natural Science Foundation, Grant No. 2022A1515011848; Guangzhou Philosophy and Social Science, Grant No. 2020GZYB04; Guangdong Philosophy and Social Science, Grant No. GD22YYJ15.

Author information

Corresponding author

Correspondence to Heyong Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Results of classification accuracy analysis

Table 3 Classification accuracy of CF dataset (Corresponding to Fig. 9)

Table 4 Classification accuracy of CR dataset (Corresponding to Fig. 10)

Table 5 Classification accuracy of CNAE dataset (Corresponding to Fig. 11)

Table 6 Classification accuracy of IMDB dataset (Corresponding to Fig. 12)

Table 7 Classification accuracy of KDC dataset (Corresponding to Fig. 13)

Table 8 Classification accuracy of TTC dataset (Corresponding to Fig. 14)

Table 9 Classification accuracy of WEBKB dataset (Corresponding to Fig. 15)

Table 10 Classification accuracy of R8 dataset (Corresponding to Fig. 16)

Table 11 Classification accuracy of R52 dataset (Corresponding to Fig. 17)

1.2 Results of semantics analysis

Terms shown in bold were manually selected as related to the topics of the datasets (all terms are stemmed and converted to lowercase).

Table 12 Top 20 Terms for CF dataset (Corresponding to Fig. 18 (a))

Table 13 Top 20 Terms for CR dataset (Corresponding to Fig. 18 (b))

Table 14 Top 20 Terms for IMDB dataset (Corresponding to Fig. 18 (c))

Table 15 Top 20 Terms for WEBKB dataset (Corresponding to Fig. 18 (d))

Table 16 Top 20 Terms for R8 dataset (Corresponding to Fig. 18 (e))

Table 17 Top 20 Terms for R52 dataset (Corresponding to Fig. 18 (f))

1.3 Results of sparsity analysis

Table 18 Sparsity for CF dataset (Corresponding to Fig. 19 (a))

Table 19 Sparsity for CR dataset (Corresponding to Fig. 19 (b))

Table 20 Sparsity for CNAE dataset (Corresponding to Fig. 19 (c))

Table 21 Sparsity for IMDB dataset (Corresponding to Fig. 19 (d))

Table 22 Sparsity for KDC dataset (Corresponding to Fig. 19 (e))

Table 23 Sparsity for TTC dataset (Corresponding to Fig. 19 (f))

Table 24 Sparsity for WEBKB dataset (Corresponding to Fig. 19 (g))

Table 25 Sparsity for R8 dataset (Corresponding to Fig. 19 (h))

Table 26 Sparsity for R52 dataset (Corresponding to Fig. 19 (i))

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hong, M., Wang, H. Feature selection based on long short term memory for text classification. Multimed Tools Appl 83, 44333–44378 (2024). https://doi.org/10.1007/s11042-023-16990-7

  • DOI: https://doi.org/10.1007/s11042-023-16990-7
