
Handling Different Levels of Granularity within Naive Bayes Classifiers

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2013

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8206)

Abstract

Data mining techniques usually require a flat data table as input. For categorical attributes there is often no canonical flat table, since such attributes can be considered at different levels of granularity (such as continent, country, or local region). Choosing the best granularity level for a data mining task can be very tedious, especially when many attributes, each with its own granularity hierarchy, are involved. In this paper we propose two approaches to select the granularity levels automatically in the context of a naive Bayes classifier. The approaches are based on the χ² independence test, including a correction for multiple testing, and on the minimum description length principle.
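
The two selection mechanisms named in the abstract can be illustrated with short sketches. The first is a minimal sketch of χ²-based level selection; the abstract says only "correction for multiple testing", so the choice of the Holm procedure here, the preference for the coarsest significant level, and all function and variable names are assumptions of this sketch, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): choose a granularity
# level for ONE categorical attribute by testing, at every level of its
# hierarchy, whether the attribute is independent of the class variable.
# Levels that survive a Holm correction are considered informative; among
# those, the coarsest is kept, since it needs the fewest naive Bayes
# parameters. This preference and all names are assumptions.
import numpy as np
from scipy.stats import chi2_contingency

def chi2_pvalue(values, labels):
    """p-value of a chi-squared independence test between one attribute
    (encoded at one granularity level) and the class labels."""
    cats = sorted(set(values))
    classes = sorted(set(labels))
    table = np.zeros((len(cats), len(classes)))
    for v, y in zip(values, labels):
        table[cats.index(v), classes.index(y)] += 1
    _, p, _, _ = chi2_contingency(table)
    return p

def select_level(levels, labels, alpha=0.05):
    """levels: list of (name, values) pairs ordered finest -> coarsest,
    e.g. [("region", ...), ("country", ...), ("continent", ...)]."""
    pvals = {name: chi2_pvalue(vals, labels) for name, vals in levels}
    # Holm step-down correction: compare the k-th smallest p-value
    # (0-indexed) against alpha / (m - k); stop at the first failure.
    m = len(pvals)
    significant = set()
    for k, (name, p) in enumerate(sorted(pvals.items(), key=lambda x: x[1])):
        if p > alpha / (m - k):
            break
        significant.add(name)
    # keep the coarsest level that is still dependent on the class
    for name, _ in reversed(levels):
        if name in significant:
            return name
    return levels[0][0]  # nothing significant: fall back to the finest level
```

The second sketch scores a single granularity level with a two-part description length, the usual reading of the minimum description length principle: bits to encode the class-conditional parameters plus bits to encode the data under them. The Laplace smoothing and the ½·log₂(n)-bits-per-parameter model cost are assumptions of this sketch; the level with the smallest total score would be kept.

```python
# Illustrative two-part MDL score for one granularity level: code length
# of the data under P(attribute | class) plus a penalty for the parameters
# a naive Bayes classifier would need at this level. Smoothing and the
# exact parameter-cost term are assumptions of this sketch.
import math
from collections import Counter

def mdl_score(values, labels):
    n = len(values)
    class_counts = Counter(labels)
    cats = sorted(set(values))
    joint = Counter(zip(values, labels))
    # data cost: -log2-likelihood of the attribute values given the class,
    # with Laplace smoothing of the conditional probabilities
    data_bits = 0.0
    for (v, y), c in joint.items():
        p = (c + 1) / (class_counts[y] + len(cats))
        data_bits -= c * math.log2(p)
    # model cost: (|cats| - 1) free parameters per class value,
    # charged 0.5 * log2(n) bits each
    k = (len(cats) - 1) * len(class_counts)
    return data_bits + 0.5 * k * math.log2(n)
```

A coarser level has fewer categories and therefore a smaller model cost, but it may fit the data worse; the MDL score trades the two off, so unlike the first sketch it needs no significance threshold α.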

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ince, K., Klawonn, F. (2013). Handling Different Levels of Granularity within Naive Bayes Classifiers. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2013. IDEAL 2013. Lecture Notes in Computer Science, vol 8206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41278-3_63

  • DOI: https://doi.org/10.1007/978-3-642-41278-3_63

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41277-6

  • Online ISBN: 978-3-642-41278-3

  • eBook Packages: Computer Science, Computer Science (R0)
