Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering
As a prolific research area in data mining, subspace clustering and related problems induced a vast quantity of proposed solutions. However, many publications compare a new proposition—if at all—with one or two competitors, or even with a so-called “...
Semi-analytical method for analyzing models and model selection measures based on moment analysis
In this article we propose a moment-based method for studying models and model selection measures. By focusing on the probabilistic space of classifiers induced by the classification algorithm rather than on that of datasets, we obtain efficient ...
Closed patterns meet n-ary relations
Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms for frequent closed set mining are now available. Generalizing such a task to n-ary relations (n ≥ 2) ...
DOLPHIN: An efficient algorithm for mining distance-based outliers in very large datasets
This work presents DOLPHIN, a novel distance-based outlier detection algorithm that works on disk-resident datasets and whose I/O cost corresponds to the cost of sequentially reading the input dataset file twice. It is both theoretically and ...
Bellwether analysis: Searching for cost-effective query-defined predictors in large databases
How to mine massive datasets is a challenging problem with great potential value. Motivated by this challenge, much effort has concentrated on developing scalable versions of machine learning algorithms. However, the cost of mining large datasets is not ...