Data & Knowledge Engineering
Volume 17, Issue 3, December 1995, Pages 245-262
 
doi:10.1016/0169-023X(95)00024-M
Copyright © 1995 Published by Elsevier B.V.

Paper

Set-oriented data mining in relational databases

Maurice Houtsma (a, *) and Arun Swami (b)

(a) University of Twente, Enschede, The Netherlands
(b) IBM Almaden Research Center, San Jose, CA, USA

Received 25 July 1994; revised 14 March 1995; accepted 28 July 1995. Available online 28 December 1999.


Abstract

Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing.

In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily and its performance makes it feasible to build interactive data mining tools for large databases.
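As a rough illustration of what "set-oriented" means here, the sketch below finds frequent itemsets using only relational-style steps: sort, join on transaction id, group, and count. It is a minimal sketch written for this summary, not the paper's SETM algorithm or its SQL formulation. The function name `frequent_itemsets`, the `(trans_id, item)` row layout, and the `min_support` parameter are all assumptions, and an in-memory lookup stands in for the merge-scan join mentioned in the abstract.

```python
from collections import Counter
from itertools import groupby


def frequent_itemsets(sales, min_support):
    """Return {itemset_tuple: count} for itemsets in >= min_support transactions.

    `sales` is an iterable of (trans_id, item) rows (a hypothetical layout).
    """
    # Base relation, de-duplicated and sorted on (trans_id, item).
    base = sorted(set(sales))

    # Items per transaction; this lookup plays the role of the join partner.
    by_tid = {tid: [item for _, item in rows]
              for tid, rows in groupby(base, key=lambda r: r[0])}

    # Group-and-count over single items, then filter by support.
    counts = Counter(item for _, item in base)
    frequent = {(item,): c for item, c in counts.items() if c >= min_support}
    result = dict(frequent)

    # Rows of the form (trans_id, itemset) that survived the support filter.
    current = [(tid, (item,)) for tid, item in base if (item,) in frequent]

    while current:
        # Join step: extend each row with a lexicographically larger item
        # from the same transaction, so each itemset is generated once.
        extended = [(tid, items + (item,))
                    for tid, items in current
                    for item in by_tid[tid]
                    if item > items[-1]]

        # Group-and-count step over the extended itemsets.
        counts = Counter(items for _, items in extended)
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)

        # Keep only rows whose itemset met the support threshold.
        current = [(tid, items) for tid, items in extended if items in frequent]

    return result
```

Each pass corresponds to one join followed by one aggregation, which is why this style of algorithm maps naturally onto SQL and can lean on a query optimizer. Association rules would then be derived from the returned counts in a separate step that is not shown here.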

Author Keywords: Data mining; Optimization; Set-oriented algorithms

