Abstract
In recent years, cost-sensitive feature selection has drawn much attention, but several issues remain open. In particular, most existing work deals with single-typed data, while only a few studies handle hybrid data; moreover, both the test cost of a feature and the misclassification cost of an object are often assumed to be fixed, whereas in practice they usually vary with the error range of the data, or equivalently with the data granularity. In view of these facts, a feature–granularity selection approach is proposed that selects the optimal feature subset and the optimal data granularity simultaneously so as to minimize the total cost of processing hybrid data. In the approach, an adaptive neighborhood model is first constructed, in which neighborhood granules are generated adaptively according to the types of features. Then, several kinds of variable cost settings are discussed in light of real-world scenarios, and finally an optimal feature–granularity selection algorithm is designed. Experimental results on sixteen UCI datasets show that the proposed algorithm achieves a good trade-off among feature dimension reduction, data granularity selection and total cost minimization. In particular, the influences of different cost settings on feature–granularity selection are discussed thoroughly, which may provide feasible schemes for decision making.
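The adaptive neighborhood idea can be illustrated with a minimal sketch (not the authors' implementation): for a hybrid sample, categorical features are compared by exact equality, while numeric features are compared against a distance threshold `delta` that plays the role of the data granularity. All names and the toy data below are illustrative assumptions.

```python
# Minimal sketch of an adaptive neighborhood granule for hybrid data:
# categorical features must match exactly; numeric features must fall
# within the granularity threshold `delta`.

def neighborhood(x, samples, features, is_numeric, delta):
    """Return the indices of samples lying in the neighborhood granule of x."""
    granule = []
    for i, y in enumerate(samples):
        inside = True
        for f in features:
            if is_numeric[f]:
                # Numeric feature: tolerate an error range of delta.
                if abs(x[f] - y[f]) > delta:
                    inside = False
                    break
            else:
                # Categorical feature: require exact equality.
                if x[f] != y[f]:
                    inside = False
                    break
        if inside:
            granule.append(i)
    return granule

# Toy hybrid data: feature 0 is numeric, feature 1 is categorical.
data = [(0.10, "a"), (0.12, "a"), (0.50, "a"), (0.11, "b")]
print(neighborhood(data[0], data, [0, 1], {0: True, 1: False}, 0.05))
# → [0, 1]
```

Shrinking `delta` produces finer granules (and typically higher test cost under variable cost settings), while enlarging it coarsens the granules, which is the trade-off the feature–granularity selection navigates.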
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61603173 and 11871259, the Natural Science Foundation of Fujian Province, China, under Grant No. 2017J01771, the Institute of Meteorological Big Data-Digital Fujian and Fujian Key Laboratory of Data Science and Statistics.
Ethics declarations
Conflict of interest
Authors Shujiao Liao, Qingxin Zhu and Yuhua Qian declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Liao, S., Zhu, Q. & Qian, Y. Feature–granularity selection with variable costs for hybrid data. Soft Comput 23, 13105–13126 (2019). https://doi.org/10.1007/s00500-019-03854-2