Heuristic Measures of Interestingness

Hilderman, Robert J.; Hamilton, Howard J.

doi:10.1007/978-3-540-48247-5_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1704)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

The tuples in a generalized relation (i.e., a summary generated from a database) are unique and can therefore be considered a population whose structure can be described by some probability distribution. In this paper, we present and empirically compare sixteen heuristic measures that evaluate the structure of a summary and assign it a single real-valued index representing its interestingness relative to other summaries generated from the same database. The heuristics are based upon well-known measures of diversity, dispersion, dominance, and inequality used in several areas of the physical, social, ecological, management, information, and computer sciences. Their use for ranking summaries generated from databases is a new application area. All sixteen heuristics rank less complex summaries (i.e., those with few tuples and/or few non-ANY attributes) as most interesting. We demonstrate that, for sample data sets, the orders in which some of the measures rank summaries are highly correlated.
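
The idea in the abstract can be illustrated with a minimal sketch. It treats the tuple counts of each summary as a probability distribution, scores every summary with two well-known diversity measures, and compares the two resulting rankings. Everything specific here is an assumption made for illustration: the abstract does not name the sixteen measures, so Shannon entropy and Simpson's concentration index are stand-ins; the summaries "A", "B", "C" and their counts are invented; and the sort direction chosen for each measure (low entropy first, high concentration first) is a guess at "less complex first", not the authors' definition.

```python
import math


def probabilities(counts):
    """Turn the tuple counts of a summary into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon_entropy(counts):
    """Shannon entropy in bits; lower values mean a less diverse summary."""
    return -sum(p * math.log2(p) for p in probabilities(counts) if p > 0)


def simpson_index(counts):
    """Simpson's concentration index; higher values mean more dominance."""
    return sum(p * p for p in probabilities(counts))


def rank_summaries(summaries, measure, reverse):
    """Return summary names ordered by the given measure."""
    return sorted(summaries, key=lambda name: measure(summaries[name]),
                  reverse=reverse)


def spearman(order_a, order_b):
    """Spearman rank correlation between two orderings of the same names.

    Assumes no ties and at least two names.
    """
    n = len(order_a)
    position_b = {name: i for i, name in enumerate(order_b)}
    d2 = sum((i - position_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical summaries: each list holds the counts of the tuples in one
# generalized relation (invented data, not taken from the paper).
summaries = {
    "A": [50, 50],
    "B": [25, 25, 25, 25],
    "C": [70, 20, 10],
}

# Assumed direction: low entropy first, high concentration first.
by_entropy = rank_summaries(summaries, shannon_entropy, reverse=False)
by_simpson = rank_summaries(summaries, simpson_index, reverse=True)

print("entropy ranking:", by_entropy)
print("simpson ranking:", by_simpson)
print("rank correlation:", spearman(by_entropy, by_simpson))
```

On this toy data the two measures agree on the last-ranked summary but swap the first two, which is the kind of partial agreement a rank correlation is meant to quantify. With real summaries, a library routine that handles ties would be a safer choice than the hand-written `spearman` helper.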

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2
Robert J. Hilderman & Howard J. Hamilton

Editor information

Editors and Affiliations

Computer Science Department, UNC Charlotte, Charlotte, N.C. 28223 and Institute of Computer Science, Polish Academy of Sciences,
Jan M. Żytkow
Faculty of Informatics and Statistics, University of Economics, Prague, nám. W. Churchilla 4, 130 67, Prague, Czech Republic
Jan Rauch

Cite this paper

Hilderman, R.J., Hamilton, H.J. (1999). Heuristic Measures of Interestingness. In: Żytkow, J.M., Rauch, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1999. Lecture Notes in Computer Science, vol 1704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48247-5_25

DOI: https://doi.org/10.1007/978-3-540-48247-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66490-1
Online ISBN: 978-3-540-48247-5
eBook Packages: Springer Book Archive
