
Cluster-based concept invention for statistical relational learning

Authors:
Alexandrin Popescul, University of Pennsylvania, Philadelphia, PA
Lyle H. Ungar, University of Pennsylvania, Philadelphia, PA

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 2004, pages 665–670. https://doi.org/10.1145/1014052.1014137

Published: 22 August 2004

ABSTRACT

We use clustering to derive new relations that augment the database schema used in the automatic generation of predictive features in statistical relational learning. Entities derived from clusters increase the expressivity of feature spaces by creating new first-class concepts, which in turn give rise to new features. For example, in CiteSeer, papers can be clustered based on words or citations to give "topics", and authors can be clustered based on the documents they co-author to give "communities". Such cluster-derived concepts become part of more complex feature expressions. Of the large number of generated features, those that improve predictive accuracy, as judged by statistical feature selection criteria, are kept in the model. We present results demonstrating improved accuracy on two tasks, venue prediction and link prediction, using CiteSeer data.
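The idea of turning clusters into a new relation that feature generation can join against can be illustrated with a small sketch. It is not the authors' implementation: the toy corpus (`PAPERS`), the citation relation (`CITES`), the helper names (`derive_topic_relation`, `cluster_features`) and the feature names (`cites_topic_t`) are all hypothetical, and plain k-means with first-k initialization is swapped in for illustration because the abstract does not say which clustering algorithm was used. Papers are clustered by their words, the cluster labels become a new relation topic(paper, cluster), and that relation is joined with cites to produce features such as "number of cited papers in topic t".

```python
from collections import Counter

# Hypothetical toy corpus: paper id -> set of words appearing in the paper.
PAPERS = {
    "p1": {"logic", "induction", "rules"},
    "p2": {"cluster", "kmeans", "distance"},
    "p3": {"logic", "rules", "clauses"},
    "p4": {"cluster", "distance", "density"},
    "p5": {"logic", "clauses", "induction"},
    "p6": {"kmeans", "density", "cluster"},
}

# Hypothetical cites(citing, cited) relation.
CITES = [("p1", "p3"), ("p1", "p5"), ("p2", "p4"), ("p2", "p6"),
         ("p3", "p5"), ("p4", "p6"), ("p5", "p1"), ("p6", "p2")]


def kmeans(vectors, k, iters=20):
    """Plain k-means (illustrative choice); centroids start at the first k points."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: each non-empty cluster's centroid becomes its members' mean.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def derive_topic_relation(papers, k):
    """Cluster papers on binary word vectors; return topic(paper, cluster) as a dict."""
    vocab = sorted(set().union(*papers.values()))
    ids = list(papers)
    vectors = [[1.0 if w in papers[p] else 0.0 for w in vocab] for p in ids]
    return dict(zip(ids, kmeans(vectors, k)))


def cluster_features(paper, cites, topic, k):
    """Count cites(paper, q) AND topic(q, t) for each cluster t."""
    counts = Counter(topic[q] for p, q in cites if p == paper)
    return {f"cites_topic_{t}": counts[t] for t in range(k)}


topics = derive_topic_relation(PAPERS, k=2)
feats = {p: cluster_features(p, CITES, topics, k=2) for p in PAPERS}
print(topics)       # {'p1': 0, 'p2': 1, 'p3': 0, 'p4': 1, 'p5': 0, 'p6': 1}
print(feats["p1"])  # {'cites_topic_0': 2, 'cites_topic_1': 0}
```

In the setting the abstract describes, features like these would be only a few of many candidates generated from the augmented schema, and a statistical feature selection criterion would decide which of them, if any, enter the model; that selection step is not shown here.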

Index Terms

1. Computing methodologies
  1. Machine learning

Published in
KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2004
874 pages
ISBN: 1581138881
DOI: 10.1145/1014052
General Chairs: Won Kim (Cyber Database Solutions), Ronny Kohavi (Amazon.com)
Program Chairs: Johannes Gehrke (Cornell University), William DuMouchel (AT&T Labs Research)
Copyright © 2004 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
feature generation
relational learning
Qualifiers
- Article

Acceptance Rates (conference)
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
Article Metrics
- Total Citations: 16
- Total Downloads: 615
- Downloads (last 12 months): 0
- Downloads (last 6 weeks): 0

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Cluster-based concept invention for statistical relational learning

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Relational visual cluster validity (RVCV)

Aggregation-based feature invention and relational concept classes

Some connectivity based cluster validity indices