Abstract
Data mining techniques usually require a flat data table as input. For categorical attributes there is often no canonical flat data table, since such attributes can be considered at different levels of granularity (such as continent, country or local region). Choosing the best granularity level for a data mining task can be very tedious, especially when a large number of attributes with different granularity levels is involved. In this paper we propose two approaches to select the granularity levels automatically in the context of a naive Bayes classifier. The two approaches are based on the χ² independence test, including a correction for multiple testing, and on the minimum description length principle.
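The first approach can be sketched roughly as follows. This is a hypothetical illustration, not the paper's actual procedure: the function names, the toy data and the selection heuristic (drill down to a finer level only if its independence hypothesis against the class is rejected) are assumptions made for the sketch; only `scipy.stats.chi2_contingency` and Holm's step-down procedure are standard.

```python
# Hypothetical sketch of granularity selection via chi-squared independence
# tests with a Holm correction for multiple testing. Illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

def contingency(labels, values):
    """Build the class-vs-attribute contingency table from two parallel lists."""
    classes = sorted(set(labels))
    cats = sorted(set(values))
    table = np.zeros((len(classes), len(cats)))
    for l, v in zip(labels, values):
        table[classes.index(l), cats.index(v)] += 1
    return table

def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure: boolean rejection decision per hypothesis."""
    m = len(p_values)
    order = np.argsort(p_values)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all later ones are too
    return reject

# Toy data: the coarse level (continent) is independent of the class,
# while the fine level (country) is strongly class-dependent.
y = ['pos'] * 10 + ['neg'] * 10
coarse = (['EU'] * 5 + ['AS'] * 5) * 2
fine = ['DE'] * 9 + ['FR'] + ['FR'] * 9 + ['DE']
pvals = [chi2_contingency(contingency(y, a))[1] for a in (coarse, fine)]
rejected = holm_reject(pvals)  # independence rejected only for the fine level
```

In this toy setting the test retains the independence hypothesis for the coarse level and rejects it for the fine level, so a drill-down heuristic would keep the country-level attribute for the naive Bayes model.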
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Ince, K., Klawonn, F. (2013). Handling Different Levels of Granularity within Naive Bayes Classifiers. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2013. IDEAL 2013. Lecture Notes in Computer Science, vol 8206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41278-3_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41277-6
Online ISBN: 978-3-642-41278-3