Abstract
Data mining techniques usually require a flat data table as input. For categorical attributes there is often no canonical flat data table, since such attributes can be considered at different levels of granularity (such as continent, country or local region). Choosing the best granularity level for a data mining task can be very tedious, especially when a large number of attributes with different granularity levels is involved. In this paper we propose two approaches to select the granularity levels automatically in the context of a naive Bayes classifier. The two approaches are based on the χ² independence test, including a correction for multiple testing, and on the minimum description length principle.
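The first approach can be sketched roughly as follows. This is a hypothetical illustration, not the paper's actual procedure: the function names, the toy data and the selection heuristic (drill down to a finer level only if its independence hypothesis against the class is rejected) are assumptions made for the sketch; only `scipy.stats.chi2_contingency` and Holm's step-down procedure are standard.

```python
# Hypothetical sketch of granularity selection via chi-squared independence
# tests with a Holm correction for multiple testing. Illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

def contingency(labels, values):
    """Build the class-vs-attribute contingency table from two parallel lists."""
    classes = sorted(set(labels))
    cats = sorted(set(values))
    table = np.zeros((len(classes), len(cats)))
    for l, v in zip(labels, values):
        table[classes.index(l), cats.index(v)] += 1
    return table

def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure: boolean rejection decision per hypothesis."""
    m = len(p_values)
    order = np.argsort(p_values)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all later ones are too
    return reject

# Toy data: the coarse level (continent) is independent of the class,
# while the fine level (country) is strongly class-dependent.
y = ['pos'] * 10 + ['neg'] * 10
coarse = (['EU'] * 5 + ['AS'] * 5) * 2
fine = ['DE'] * 9 + ['FR'] + ['FR'] * 9 + ['DE']
pvals = [chi2_contingency(contingency(y, a))[1] for a in (coarse, fine)]
rejected = holm_reject(pvals)  # independence rejected only for the fine level
```

In this toy setting the test retains the independence hypothesis for the coarse level and rejects it for the fine level, so a drill-down heuristic would keep the country-level attribute for the naive Bayes model.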
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Ince, K., Klawonn, F. (2013). Handling Different Levels of Granularity within Naive Bayes Classifiers. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2013. IDEAL 2013. Lecture Notes in Computer Science, vol 8206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41278-3_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41277-6
Online ISBN: 978-3-642-41278-3