Abstract
This paper proposes a learning criterion for stochastic rules, developed by extending Valiant's PAC (Probably Approximately Correct) learning model, which is a learning criterion for deterministic rules. Stochastic rules here refer to rules that probabilistically assign one of a set of classes {Y} to each attribute vector X. The proposed criterion is based on the idea that learning stochastic rules may be regarded as probably approximately correct identification of conditional probability distributions over classes for given input attribute vectors. An algorithm based on the MDL (Minimum Description Length) principle, called the MDL algorithm, is used for learning stochastic rules. Specifically, for stochastic rules with finite partitioning (each specified by a finite number of disjoint cells of the domain together with an associated probability parameter vector), this paper derives target-dependent upper bounds and worst-case upper bounds on the sample size required by the MDL algorithm to learn stochastic rules with given accuracy and confidence. Based on these sample-size bounds, this paper proves polynomial-sample-size learnability of stochastic decision lists (newly proposed in this paper as a stochastic analogue of Rivest's decision lists) with at most k literals in each decision (for fixed k), and polynomial-sample-size learnability of stochastic decision trees (a stochastic analogue of decision trees) of depth at most k. Sufficient conditions for polynomial-sample-size learnability and polynomial-time learnability of arbitrary classes of stochastic rules with finite partitioning are also derived.
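As a rough illustration of the MDL-based selection described in the abstract, the following Python sketch chooses, among candidate stochastic rules with finite partitioning, the one minimizing a two-part description length: a code length for the rule itself plus the negative log-likelihood of the observed class labels under the rule's conditional distribution. This is a minimal sketch under assumed names (StochasticRule, model_code_length, mdl_select are illustrative, not notation from the paper), not the paper's actual algorithm or analysis.

```python
# A minimal sketch of two-part MDL selection for stochastic rules with
# finite partitioning. All names here are hypothetical illustrations.
import math
from typing import Callable, List, Sequence, Tuple


class StochasticRule:
    """A stochastic rule with finite partitioning: disjoint cells of the
    attribute space, each paired with a probability of assigning class 1."""

    def __init__(self,
                 cells: List[Callable[[Sequence[int]], bool]],
                 probs: List[float],
                 model_code_length: float):
        self.cells = cells                          # membership tests for the disjoint cells
        self.probs = probs                          # P(class = 1 | x falls in cell i)
        self.model_code_length = model_code_length  # bits needed to describe the rule itself

    def prob_of_one(self, x: Sequence[int]) -> float:
        """Conditional probability of class 1 given attribute vector x."""
        for cell, p in zip(self.cells, self.probs):
            if cell(x):
                return p
        return 0.5  # fallback for x outside every listed cell


def data_code_length(rule: StochasticRule,
                     sample: List[Tuple[Sequence[int], int]]) -> float:
    """Code length of the class labels given the rule: the negative
    log-likelihood (in bits) of the observed labels."""
    total = 0.0
    for x, y in sample:
        p = min(max(rule.prob_of_one(x), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += -math.log2(p if y == 1 else 1.0 - p)
    return total


def mdl_select(candidates: List[StochasticRule],
               sample: List[Tuple[Sequence[int], int]]) -> StochasticRule:
    """Pick the candidate minimizing total description length:
    L(rule) + L(labels | rule)."""
    return min(candidates,
               key=lambda r: r.model_code_length + data_code_length(r, sample))
```

In this sketch, a stochastic decision list in the spirit of the abstract could be represented by ordering the cells and taking each membership test to be a conjunction of at most k literals, with the first satisfied test determining the class probability.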
References
Abe, N. & Warmuth, M. (1990). On the computational complexity of approximating distributions by probabilistic automata. Proceedings of the Third Workshop on Computational Learning Theory (pp. 52–66), Rochester, NY: Morgan Kaufmann.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Autom. Contr., AC-19, 716–723.
Angluin, D. & Laird, P. (1988). Learning from noisy examples. Machine Learning, 2, 343–370.
Barron, A.R. (1985). Logically smooth density estimation. Ph.D. dissertation, Dept. of Electrical Eng., Stanford Univ.
Barron, A.R. & Cover, T.M. (1991). Minimum complexity density estimation. IEEE Trans. on IT, IT-37, 1034–1054.
Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M.K. (1987). Occam's razor. Information Processing Letters, 24, 377–380.
Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M.K. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36, 929–965.
Cesa-Bianchi, N. (1990). Learning the distribution in the extended PAC model. Proceedings of the First International Workshop on Algorithmic Learning Theory (pp. 236–246), Tokyo, Japan: Japanese Society for Artificial Intelligence.
Ehrenfeucht, A., Haussler, D., Kearns, M., & Valiant, L. (1989). A general lower bound on the number of examples needed for learning. Information and Computation, 82, 247–251.
Fisher, R.A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd.
Gallager, R.G. (1986). Information theory and reliable communication. New York: Wiley.
Haussler, D. (1989). Generalizing the PAC model for neural net and other learning applications. Technical Report UCSC CRL-89-30, Univ. of California at Santa Cruz.
Haussler, D. (1990). Decision theoretic generalizations of the PAC learning model. Proceedings of the First International Workshop on Algorithmic Learning Theory (pp. 21–41), Tokyo, Japan: Japanese Society for Artificial Intelligence.
Haussler, D. & Long, P. (1990). A generalization of Sauer's lemma. Technical Report UCSC CRL-90-15, Univ. of California at Santa Cruz.
Kearns, M. & Li, M. (1988). Learning in the presence of malicious errors. Proceedings of the 20th Annual ACM Symposium on Theory of Computing (pp. 267–279), Chicago, IL.
Kearns, M. & Schapire, R. (1990). Efficient distribution-free learning of probabilistic concepts. Proceedings of the 31st Symposium on Foundations of Computer Science (pp. 382–391), St. Louis, Missouri.
Kraft, C. (1949). A device for quantizing, grouping, and coding amplitude modulated pulses. M.S. Thesis, Department of Electrical Engineering, MIT, Cambridge, MA.
Kraft, C. (1955). Some conditions for consistency and uniform consistency of statistical procedures. University of California Publications in Statistics, 2, 125–141.
Kullback, S. (1967). A lower bound for discrimination in terms of variation. IEEE Trans. on IT, IT-13, 126–127.
Laird, P.D. (1988). Efficient unsupervised learning. Proceedings of the First Annual Workshop on Computational Learning Theory (pp. 91–96), Cambridge, MA: Morgan Kaufmann.
Pednault, E.P.D. (1989). Some experiments in applying inductive inference principles to surface reconstruction. Proceedings of the 11th International Joint Conference on Artificial Intelligence (pp. 1603–1609), Morgan Kaufmann.
Pitman, E.J.G. (1979). Some Basic Theory for Statistical Inference. London: Chapman and Hall.
Pitt, L. & Valiant, L.G. (1988). Computational limitations on learning from examples. Journal of the ACM, 35, 965–984.
Quinlan, J.R. & Rivest, R.L. (1989). Inferring decision trees using the minimum description length principle. Information and Computation, 80, 227–248.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.
Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11, 416–431.
Rissanen, J. (1984). Universal coding, information, prediction, and estimation. IEEE Trans. on IT, IT-30, 629–636.
Rissanen, J. (1986). Stochastic complexity and modeling. Annals of Statistics, 14, 1080–1100.
Rissanen, J. (1989). Stochastic complexity in statistical inquiry. World Scientific, Series in Computer Science, 15.
Rivest, R.L. (1987). Learning decision lists. Machine Learning, 2, 229–246.
Schreiber, F. (1985). The Bayes Laplace statistic of the multinomial distributions. AEU, 39, 293–298.
Segen, J. (1989). From features to symbols: Learning relational shape. In J.C. Simon (Ed.), Pixels to features. Elsevier Science Publishers B.V.
Sloan, R. (1988). Types of noise in data for concept learning. Proceedings of the First Annual Workshop on Computational Learning Theory (pp. 91–96), Cambridge, MA: Morgan Kaufmann.
Solomonoff, R.J. (1964). A formal theory of inductive inference, Part 1. Information and Control, 7, 1–22.
Valiant, L.G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134–1142.
Valiant, L.G. (1985). Learning disjunctions of conjunctions. Proceedings of the Ninth International Joint Conference on Artificial Intelligence (pp. 560–566), Los Angeles, CA: Morgan Kaufmann.
Vapnik, V.N. & Chervonenkis, A.Ya. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, XVI(2), 264–280.
Wallace, C.S. & Boulton, D.M. (1968). An information measure for classification. Computer Journal, 11, 185–194.
Yamanishi, K. (1989). Inductive inference and learning criterion of stochastic classification rules with hierarchical parameter structures. Proceedings of the 12th Symposium of Information Theory and Its Applications, 2 (pp. 707–712) (in Japanese), Inuyama, Japan.
Yamanishi, K. (1990a). Inferring optimal decision lists from stochastic data using the minimum description length criterion. Presented at the 1990 IEEE International Symposium on Information Theory, San Diego, CA.
Yamanishi, K. (1990b). A learning criterion for stochastic rules. Proceedings of the Third Annual Workshop on Computational Learning Theory (pp. 67–81), Rochester, NY: Morgan Kaufmann.
Additional information
An extended abstract of this paper appeared in Proceedings of the 3rd Annual Workshop on Computational Learning Theory.
About this article
Cite this article
Yamanishi, K. A learning criterion for stochastic rules. Mach Learn 9, 165–203 (1992). https://doi.org/10.1007/BF00992676