Boosting First-Order Clauses for Large, Skewed Data Sets

Oliphant, Louis; Burnside, Elizabeth; Shavlik, Jude

doi:10.1007/978-3-642-13840-9_15


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5989)

Abstract

Creating an effective ensemble of clauses for large, skewed data sets requires finding a diverse, high-scoring set of clauses and then combining them so as to maximize predictive performance. We have adapted the RankBoost algorithm to maximize area under the recall-precision curve, a metric far better suited to highly skewed data sets than area under the ROC curve. We have also explored a range of weak hypotheses for our modified RankBoost algorithm beyond individual clauses. We provide results on four large, skewed data sets showing that our modified RankBoost algorithm outperforms the original on area under the recall-precision curve.
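
To illustrate why the choice of metric matters on skewed data, the following minimal sketch scores one synthetic ranking (10 positives, 1000 negatives) under both metrics. It is not the paper's method or data: the helper names `roc_auc` and `average_precision` are hypothetical, the ranking is made up, and average precision is used here as one common estimate of area under the recall-precision curve, which may differ from how the authors compute it.

```python
def roc_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Synthetic, assumed ranking: 10 negatives outrank every positive,
# then the 10 positives, then the remaining 990 negatives.
labels = [0] * 10 + [1] * 10 + [0] * 990
scores = [len(labels) - i for i in range(len(labels))]  # strictly decreasing

roc = roc_auc(scores, labels)
ap = average_precision(scores, labels)
print(f"ROC AUC = {roc:.3f}, average precision = {ap:.3f}")
```

On this ranking the ROC area is 0.99 while average precision is roughly 0.33: ten false positives at the top barely register against 1000 negatives in ROC space, yet they halve or worse the precision at every recall level.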

Appears in the ILP-2009 Springer LNCS post-conference proceedings.



Author information

Authors and Affiliations

Computer Sciences Department
Louis Oliphant & Jude Shavlik
Radiology Department
Elizabeth Burnside
Biostatistics and Medical Informatics Department, University of Wisconsin-Madison
Louis Oliphant, Elizabeth Burnside & Jude Shavlik


Editor information

Editors and Affiliations

Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Heverlee, Belgium
Luc De Raedt


About this paper

Cite this paper

Oliphant, L., Burnside, E., Shavlik, J. (2010). Boosting First-Order Clauses for Large, Skewed Data Sets. In: De Raedt, L. (eds) Inductive Logic Programming. ILP 2009. Lecture Notes in Computer Science, vol 5989. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13840-9_15


DOI: https://doi.org/10.1007/978-3-642-13840-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13839-3
Online ISBN: 978-3-642-13840-9
eBook Packages: Computer Science, Computer Science (R0)
