Abstract
Many applications, such as knowledge base completion and automated diagnosis of patients, have access only to positive examples. They lack the negative examples that standard relational learning techniques require, and they suffer under the closed-world assumption. The corresponding propositional problem is known as Positive and Unlabeled (PU) learning. In that field, using the label frequency (the fraction of true positive examples that are labeled) is known to make learning easier. This notion has not yet been explored in the relational domain. The goal of this work is twofold: (1) to explore whether using the label frequency is also useful when working with relational data and (2) to propose a method for estimating the label frequency from relational positive and unlabeled data. Our experiments confirm the usefulness of both knowing the label frequency and our estimate of it.
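To make the definition of the label frequency concrete, the following minimal Python sketch computes it on toy counts and shows one consequence that follows from the definition alone: since only true positives can be labeled, the fraction of labeled examples equals the label frequency times the fraction of positives. The function names and the numbers are hypothetical and chosen for illustration; this is not the estimation method proposed in the paper, which has to work without knowing the number of true positives.

```python
def label_frequency(num_labeled: int, num_true_positives: int) -> float:
    """Fraction of true positive examples that are labeled.

    Hypothetical helper: in real PU data the number of true positives
    is unknown, which is why the label frequency must be estimated.
    """
    if num_true_positives <= 0:
        raise ValueError("num_true_positives must be positive")
    if not 0 <= num_labeled <= num_true_positives:
        raise ValueError("labeled examples must be a subset of the positives")
    return num_labeled / num_true_positives


def positive_fraction(num_labeled: int, num_examples: int, c: float) -> float:
    """Recover the fraction of positives from the labeled fraction.

    Uses labeled_fraction = c * positive_fraction, which holds because
    every labeled example is a true positive.
    """
    if not 0 < c <= 1:
        raise ValueError("label frequency must lie in (0, 1]")
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return (num_labeled / num_examples) / c


# Toy numbers (assumed): 1000 examples, 120 true positives, 30 of them labeled.
c = label_frequency(num_labeled=30, num_true_positives=120)
alpha = positive_fraction(num_labeled=30, num_examples=1000, c=c)
print(f"label frequency: {c:.2f}, fraction of positives: {alpha:.2f}")
```

With these toy counts, a quarter of the positives are labeled, so the 3% of examples that carry a label correspond to 12% of examples being positive.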
Acknowledgements
JB is supported by IWT (SB/141744). JD is partially supported by the KU Leuven Research Fund (C14/17/070, C32/17/036), FWO-Vlaanderen (SBO-150033, G066818N, EOS-30992574, T004716N), Chist-Era ReGround project, and EU VA project Nano4Sports.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Bekker, J., Davis, J. (2018). Positive and Unlabeled Relational Classification Through Label Frequency Estimation. In: Lachiche, N., Vrain, C. (eds) Inductive Logic Programming. ILP 2017. Lecture Notes in Computer Science, vol 10759. Springer, Cham. https://doi.org/10.1007/978-3-319-78090-0_2
DOI: https://doi.org/10.1007/978-3-319-78090-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78089-4
Online ISBN: 978-3-319-78090-0
eBook Packages: Computer Science, Computer Science (R0)