On the Optimality of the Simple Bayesian Classifier under Zero-One Loss

  • Published: November 1997
  • Volume 29, pages 103–130, (1997)
  • Pedro Domingos
  • Michael Pazzani

Abstract

The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article's results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
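One of the abstract's claims — that the simple Bayesian classifier can be zero-one optimal for disjunctions even though they violate the independence assumption — can be checked with a small sketch. This is not the authors' code, just an illustration: an unsmoothed maximum-likelihood naive Bayes trained on every boolean vector of a three-attribute disjunction (Laplace smoothing is deliberately omitted, since on this 8-example dataset it would distort the single negative example's class-conditional estimates):

```python
from itertools import product

def train_nb(X, y):
    """Maximum-likelihood naive Bayes for boolean attributes (no smoothing)."""
    classes = sorted(set(y))
    prior = {c: sum(1 for t in y if t == c) / len(y) for c in classes}
    cond = {}  # cond[(c, i)] = P(x_i = 1 | class = c)
    for c in classes:
        rows = [x for x, t in zip(X, y) if t == c]
        for i in range(len(X[0])):
            cond[(c, i)] = sum(r[i] for r in rows) / len(rows)
    return classes, prior, cond

def predict_nb(model, x):
    """Pick the class maximizing P(c) * prod_i P(x_i | c)."""
    classes, prior, cond = model
    def score(c):
        s = prior[c]
        for i, v in enumerate(x):
            s *= cond[(c, i)] if v else 1 - cond[(c, i)]
        return s
    return max(classes, key=score)

# Target concept: the disjunction x1 OR x2 OR x3, with a uniform
# distribution over all 8 boolean vectors.  The attributes are strongly
# dependent given the class (the negative class is a single point).
X = list(product([0, 1], repeat=3))
y = [int(any(x)) for x in X]

model = train_nb(X, y)
preds = [predict_nb(model, x) for x in X]
print(preds == y)  # → True: zero misclassifications despite the violated assumption
```

Note that the classifier's probability *estimates* are far from the true posteriors here (e.g. it scores the all-zero instance well under its true posterior of 1 for the negative class), which is exactly the abstract's distinction: poor under quadratic loss, yet optimal under zero-one loss, because only the ranking of the two class scores matters.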



Author information

Authors and Affiliations

  1. Department of Information and Computer Science, University of California, Irvine, CA, 92697

    Pedro Domingos & Michael Pazzani



About this article

Cite this article

Domingos, P., Pazzani, M. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning 29, 103–130 (1997). https://doi.org/10.1023/A:1007413511361



Keywords

  • Simple Bayesian classifier
  • naive Bayesian classifier
  • zero-one loss
  • optimal classification
  • induction with attribute dependences
