Abstract
Statistical relational learning (SRL) is a subarea of machine learning that addresses statistical inference on data that are correlated rather than independently and identically distributed (i.i.d.), as is generally assumed. For the traditional i.i.d. setting, distribution-free bounds such as the Hoeffding bound exist; they provide confidence bounds on the generalization error of a classification algorithm given its hold-out error on a sample of size N. No bounds of this form currently exist for the kinds of interactions in the data that relational classification algorithms consider. In this paper, we extend the Hoeffding bounds to the relational setting. In particular, we derive distribution-free bounds for certain classes of data generation models that do not produce i.i.d. data and that are based on the types of interactions considered by relational classification algorithms developed in SRL. We conduct empirical studies on synthetic and real data, which show that these data generation models are indeed realistic and that the derived bounds are tight enough for practical use.
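For orientation, the classical i.i.d. hold-out bound mentioned above can be sketched in a few lines. This is only the standard Hoeffding bound for a 0/1 loss, not the relational bounds derived in the paper; the function names `hoeffding_epsilon` and `holdout_interval` and the confidence parameter `delta` are illustrative assumptions, not notation taken from the article.

```python
import math


def hoeffding_epsilon(n: int, delta: float, two_sided: bool = True) -> float:
    """Deviation eps such that, for n i.i.d. hold-out examples with a loss
    in [0, 1], |true error - hold-out error| <= eps holds with probability
    at least 1 - delta (standard Hoeffding bound, i.i.d. assumption only)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    # Two-sided: P(|dev| >= eps) <= 2 exp(-2 n eps^2); one-sided drops the 2.
    k = 2.0 if two_sided else 1.0
    return math.sqrt(math.log(k / delta) / (2.0 * n))


def holdout_interval(holdout_error: float, n: int, delta: float):
    """Two-sided confidence interval for the generalization error,
    clipped to the valid error range [0, 1]."""
    eps = hoeffding_epsilon(n, delta, two_sided=True)
    return max(0.0, holdout_error - eps), min(1.0, holdout_error + eps)


if __name__ == "__main__":
    # Hypothetical numbers chosen purely for illustration.
    low, high = holdout_interval(holdout_error=0.12, n=1000, delta=0.05)
    print(f"generalization error in [{low:.4f}, {high:.4f}] w.p. >= 0.95")
```

The width of the interval shrinks at the rate 1/sqrt(N) and does not depend on the data distribution, which is what "distribution-free" refers to here. The guarantee relies on the hold-out examples being independent, which is exactly the assumption that relational data violate.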
Cite this article
Dhurandhar, A., Dobra, A. Distribution-free bounds for relational classification. Knowl Inf Syst 31, 55–78 (2012). https://doi.org/10.1007/s10115-011-0406-4
DOI: https://doi.org/10.1007/s10115-011-0406-4