A Comparison of Different Data Transformation Approaches in the Feature Ranking Context

  • Conference paper
Discovery Science (DS 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9956)

Abstract

Due to the omnipresence of high-dimensional datasets, feature selection and ranking are very important steps in data preprocessing. In this work, we propose three transformations for real-valued features. The transformations are based on estimating the probability densities of the features. Originally, we proposed modified distance measures for the ReliefF algorithm, which is one of the most prominent feature ranking algorithms. To enable their comparison with other feature ranking algorithms, we present data transformations that are mathematically equivalent to the modified distance measures. Finally, we evaluate the proposed transformations in combination with several feature ranking methods on a set of benchmark datasets.
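The equivalence between a modified distance measure and a data transformation can be illustrated with a small sketch. The abstract does not define the three transformations, so the example below assumes a stand-in: each value of a real-valued feature is mapped to its empirical cumulative distribution value. The function names `ecdf_transform` and `cdf_distance` are hypothetical and are not taken from the paper. Under this assumption, the plain absolute difference on the transformed feature equals the modified distance on the raw feature, so any ranking method can use the transformed data directly.

```python
import numpy as np

def ecdf_transform(column):
    """Map each value to its empirical CDF value (assumed stand-in transformation)."""
    column = np.asarray(column, dtype=float)
    sorted_vals = np.sort(column)
    # side="right" counts how many observed values are <= each value
    counts = np.searchsorted(sorted_vals, column, side="right")
    return counts / column.size

def cdf_distance(column, i, j):
    """Modified distance on raw data: difference of empirical CDF values."""
    column = np.asarray(column, dtype=float)
    f_i = np.mean(column <= column[i])
    f_j = np.mean(column <= column[j])
    return abs(f_i - f_j)

# A feature with one extreme value that would dominate a range-normalised distance.
feature = np.array([0.1, 0.2, 0.25, 0.3, 50.0])
transformed = ecdf_transform(feature)

# Absolute difference on transformed data matches the modified distance on raw data.
plain = abs(transformed[0] - transformed[4])
modified = cdf_distance(feature, 0, 4)
print(transformed, plain, modified)
```

Because the transformation is applied once per feature before ranking, the ranking algorithm itself needs no changes, which is what makes a comparison across different ranking methods possible.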

Notes

  1. Due to its implementation in Weka, the SVM-RFE algorithm could not be applied to datasets with non-binary nominal features, hence the results of SVM-RFE are based on 19 (and not 28) small datasets. From now on, we refer to SVM-RFE as SVM.

Acknowledgements

We would like to acknowledge the support of the EC through the projects: MAESTRA (FP7-ICT-612944) and HBP (FP7-ICT-604102), and the Slovenian Research Agency through a young researcher grant and the program Knowledge Technologies (P2-0103).

Author information

Corresponding author

Correspondence to Matej Petković.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Petković, M., Panov, P., Džeroski, S. (2016). A Comparison of Different Data Transformation Approaches in the Feature Ranking Context. In: Calders, T., Ceci, M., Malerba, D. (eds) Discovery Science. DS 2016. Lecture Notes in Computer Science, vol 9956. Springer, Cham. https://doi.org/10.1007/978-3-319-46307-0_20

  • DOI: https://doi.org/10.1007/978-3-319-46307-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46306-3

  • Online ISBN: 978-3-319-46307-0

  • eBook Packages: Computer Science, Computer Science (R0)
