Fast discovery of unexpected patterns in data, relative to a Bayesian network

Authors:
Szymon Jaroszewicz, Technical University of Szczecin, Szczecin, Poland
Tobias Scheffer, Humboldt-Universität zu Berlin, Berlin, Germany

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 2005, pages 118–127. https://doi.org/10.1145/1081870.1081887

Published: 21 August 2005

ABSTRACT

We consider a model in which background knowledge on a given domain of interest is available in the form of a Bayesian network, in addition to a large database. The mining problem is to discover unexpected patterns: our goal is to find the strongest discrepancies between the network and the database. This problem is intrinsically difficult because it requires inference in a Bayesian network and processing the entire, potentially very large, database. We introduce a sampling-based method that is efficient and yet provably finds the approximately most interesting unexpected patterns. We give a rigorous proof of the method's correctness. Experiments shed light on its efficiency and practicality for large-scale Bayesian networks and databases.
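The abstract does not spell out the interestingness measure or the sampling scheme, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes (hypothetically) that the discrepancy of a pattern, meaning an assignment of values to a small set of attributes, is the absolute difference between its probability under the network and its relative frequency in the database. Instead of exact inference and a full database scan, it estimates both quantities from a fixed number of samples: forward (logic) sampling from the network, and uniform sampling of database rows with replacement. A Hoeffding bound combined with a union bound over all patterns yields a radius within which every estimated discrepancy lies with probability at least 1 - delta. The toy network `NETWORK`, the function names, and the fixed sample size are illustrative assumptions; this is not the authors' algorithm.

```python
import itertools
import math
import random

# Hypothetical toy network over binary attributes with edges A -> B and A -> C.
# Each node maps to (parents, cpt), where cpt[parent_values] = P(node = 1 | parents).
NETWORK = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(0,): 0.2, (1,): 0.7}),
    "C": (("A",), {(0,): 0.5, (1,): 0.5}),
}
ORDER = ["A", "B", "C"]  # a topological order of NETWORK


def sample_network(rng):
    """Forward (logic) sampling: draw each node given its already-sampled parents."""
    row = {}
    for node in ORDER:
        parents, cpt = NETWORK[node]
        p_one = cpt[tuple(row[p] for p in parents)]
        row[node] = 1 if rng.random() < p_one else 0
    return row


def hoeffding_radius(n, delta):
    """Half-width eps with P(|sample mean - true mean| > eps) <= delta for n i.i.d. draws in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def frequency(rows, attrs, values):
    """Fraction of rows in which every attribute in attrs takes the matching value."""
    hits = sum(all(r[a] == v for a, v in zip(attrs, values)) for r in rows)
    return hits / len(rows)


def top_discrepancies(database, n_samples, delta, max_size=2, seed=0):
    """Rank attribute-value patterns by estimated |P_network - P_database|.

    Returns (ranked, radius): with probability at least 1 - delta, every
    estimated discrepancy is within `radius` of its true value.
    """
    rng = random.Random(seed)
    bn_rows = [sample_network(rng) for _ in range(n_samples)]
    db_rows = [rng.choice(database) for _ in range(n_samples)]  # sampling with replacement

    patterns = [
        (attrs, values)
        for size in range(1, max_size + 1)
        for attrs in itertools.combinations(ORDER, size)
        for values in itertools.product((0, 1), repeat=size)
    ]
    # Union bound over two estimates (network and database) per pattern.
    eps = hoeffding_radius(n_samples, delta / (2 * len(patterns)))

    ranked = [
        (abs(frequency(bn_rows, a, v) - frequency(db_rows, a, v)), a, v)
        for a, v in patterns
    ]
    ranked.sort(key=lambda t: t[0], reverse=True)
    # Each frequency is off by at most eps, so each difference by at most 2 * eps.
    return ranked, 2 * eps


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Hypothetical database in which B is always 1, contradicting the network.
    db = [{"A": int(data_rng.random() < 0.3), "B": 1, "C": int(data_rng.random() < 0.5)}
          for _ in range(5000)]
    ranked, radius = top_discrepancies(db, n_samples=20000, delta=0.05)
    print(ranked[:3], radius)
```

In this toy setting the network gives P(B = 1) = 0.3·0.7 + 0.7·0.2 = 0.35 while the database has B = 1 in every row, so a single-attribute pattern on B should rank first with an estimated discrepancy near 0.65. A fixed sample size keeps the sketch short; a sequential scheme that stops sampling as soon as the top patterns are separated by more than the confidence radius would usually need fewer samples.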

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN:159593135X
DOI:10.1145/1081870
General Chair:
Robert Grossman, University of Illinois at Chicago & Open Data Partners, USA
Program Chairs:
Roberto Bayardo, IBM Almaden Research, USA
Kristin Bennett, RPI, USA
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Bayesian networks
association rules
sampling
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
Article Metrics
- 26 Total Citations
- 978 Total Downloads
- Downloads (Last 12 months): 4
- Downloads (Last 6 weeks): 0
