DOI: 10.1145/1081870.1081887
Article

Fast discovery of unexpected patterns in data, relative to a Bayesian network

Published: 21 August 2005

ABSTRACT

We consider a model in which background knowledge on a given domain of interest is available in terms of a Bayesian network, in addition to a large database. The mining problem is to discover unexpected patterns: our goal is to find the strongest discrepancies between network and database. This problem is intrinsically difficult because it requires inference in a Bayesian network and processing the entire, potentially very large, database. A sampling-based method that we introduce is efficient and yet provably finds the approximately most interesting unexpected patterns. We give a rigorous proof of the method's correctness. Experiments shed light on its efficiency and practicality for large-scale Bayesian networks and databases.
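
As a rough illustration of the problem setting described above, the sketch below estimates the discrepancy between a toy Bayesian network and a toy database by drawing samples from both instead of processing either exhaustively. It is not the authors' algorithm: the two-variable network, the example database, the function names, and the choice of a Hoeffding bound for the error estimate are all assumptions made for this example.

```python
import math
import random

# Hypothetical toy network over two binary variables: A -> B.
P_A = 0.3                          # P(A = 1)
P_B_GIVEN_A = {0: 0.2, 1: 0.9}     # P(B = 1 | A)


def sample_network(rng):
    """Draw one (a, b) record from the toy network by forward sampling."""
    a = 1 if rng.random() < P_A else 0
    b = 1 if rng.random() < P_B_GIVEN_A[a] else 0
    return (a, b)


def hoeffding_half_width(n, delta):
    """Half-width of a two-sided Hoeffding interval for a mean of n draws in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def estimate_discrepancy(database, pattern, n, delta, rng):
    """Estimate |P_data(pattern) - P_network(pattern)| from n samples of each.

    Records are drawn from the database with replacement. Returns the point
    estimate and an error bound that holds with probability at least
    1 - delta (union bound over the two frequency estimates).
    """
    data_hits = sum(rng.choice(database) == pattern for _ in range(n))
    net_hits = sum(sample_network(rng) == pattern for _ in range(n))
    estimate = abs(data_hits / n - net_hits / n)
    # Each frequency gets delta / 2; the two half-widths add up.
    bound = 2.0 * hoeffding_half_width(n, delta / 2.0)
    return estimate, bound


rng = random.Random(0)
# Hypothetical database in which A = 1 and B = 1 co-occur far more often
# than the network predicts (0.3 * 0.9 = 0.27).
database = [(1, 1)] * 60 + [(0, 0)] * 30 + [(0, 1)] * 5 + [(1, 0)] * 5
estimate, bound = estimate_discrepancy(database, (1, 1), n=20000, delta=0.05, rng=rng)
print(f"estimated discrepancy: {estimate:.3f} +/- {bound:.3f}")
```

In this toy setup the true discrepancy for the pattern (A = 1, B = 1) is |0.60 - 0.27| = 0.33, so the estimate should land close to that value. One could, for instance, grow `n` until the bound is tight enough to rank candidate patterns reliably; that refinement is not shown here.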


    • Published in

      KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
      August 2005
      844 pages
      ISBN: 159593135X
      DOI: 10.1145/1081870

      Copyright © 2005 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
