doi:10.1016/S0167-8655(03)00165-X
Copyright © 2003 Elsevier B.V. All rights reserved.
Detecting pattern-based outliers
Tianming Hu
,
and Sam Y. Sung
Department of Computer Science, National University of Singapore, Singapore 117543, Singapore
Received 11 January 2003;
revised 23 June 2003.
Available online 29 July 2003.
References and further reading may be available for this article. To view references and further reading you must
purchase this article.
Abstract
Outlier detection targets those exceptional data that deviate from the general pattern. Besides high density clustering, there is another pattern called low density regularity. Thus, there are two types of outliers w.r.t. them. We propose two techniques: one to identify the two patterns and the other to detect the corresponding outliers.
Author Keywords: Outlier detection; Complete spatial randomness; Clustering; Regular spacing
Fig. 1. (a) Complete spatial randomness (csr), (b) clustering with two clusters, (c) regularity with a small Gaussian disturbance, (d) ratio vs. k.
Fig. 2. (a) Cluster-based global/local outliers, (b) density of clustering, (c) LOF with
k=2, (d) regularity-based outliers, (e) density of regularity, (f) LOF with
k=1,…,10.
Fig. 3. In csr, the fraction of top
p VOV outliers that have an absolute
z-score greater than 1.96.
Fig. 4. (a) Data with both cluster and regularity-based outliers, (b) density, (c) VOV with
k=2.
Fig. 5. (a) Ratio for ionosphere data, (b) LOF vs. VOV with
k=3 for ionosphere data, (c) LOF vs. VOV with
k=7 for ionosphere data, (d) ratio for cancer data. (e) LOF vs. VOV with
k=3 for cancer data, (f) LOF vs. VOV with
k=7 for cancer data, (g) ratio for diabetes data, (h) LOF vs. VOV with
k=3 for diabetes data, (i) LOF vs. VOV with
k=7 for diabetes data.
Fig. 6. Comparison of makeup of prediction by LOF (left bar) and VOV (right bar). TP∩ denotes true positive intersection. TP− denotes true positive difference. FP denotes false positive. Ionosphere data: (a)
k=3, (b)
k=7; Cancer data: (c)
k=3, (d)
k=7; Diabetes data: (e)
k=3, (f)
k=7.