Abstract
We investigate the rough-set-based framework for feature selection in decision tables with numeric attributes. We compare two alternative families of functions that evaluate how well a subset of attributes determines the distinguished decision attribute: discernibility-based functions over discretized numeric data, and distance-based functions often used in fuzzy-rough approaches to feature selection. In both cases, the idea is to compare objects belonging to different decision classes, either by verifying whether discretized attributes distinguish them or by measuring distances between their values over the original numeric attributes. We draw a correspondence between the functions that evaluate subsets of numeric attributes under the two methodologies. For a subset of numeric attributes, we consider a function that counts the pairs of objects belonging to different decision classes that are not discerned by the discretized attributes, averaged over all possible choices of binary discretization cuts over the attribute ranges. We prove that this function can be rewritten in terms of distances between values of the original numeric attributes. Namely, it equals the average fuzzy indiscernibility function computed with the product t-norm, which combines the indiscernibility degrees obtained over particular attributes.
Supported by the Polish National Science Centre grant 2011/01/B/ST6/03867.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ślęzak, D., Betliński, P. (2012). A Role of (Not) Crisp Discernibility in Rough Set Approach to Numeric Feature Selection. In: Hassanien, A.E., Salem, AB.M., Ramadan, R., Kim, Th. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2012. Communications in Computer and Information Science, vol 322. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35326-0_2
DOI: https://doi.org/10.1007/978-3-642-35326-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35325-3
Online ISBN: 978-3-642-35326-0
eBook Packages: Computer Science (R0)