On Detecting Spatial Outliers

GeoInformatica

Abstract

The ever-increasing volume of spatial data has greatly challenged our ability to extract useful but implicit knowledge from it. As an important branch of spatial data mining, spatial outlier detection aims to discover objects whose non-spatial attribute values differ significantly from those of their spatial neighbors. These objects, called spatial outliers, may reveal important phenomena in a number of applications, including traffic control, satellite image analysis, weather forecasting, and medical diagnosis. Most existing spatial outlier detection algorithms focus on identifying single-attribute outliers and can misclassify normal objects as outliers when their neighborhoods contain true spatial outliers with very large or very small attribute values. In addition, many spatial applications involve multiple non-spatial attributes, which should be processed together to identify outliers. To address these two issues, we formulate the spatial outlier detection problem in a general way, design two robust detection algorithms, one for a single attribute and the other for multiple attributes, and analyze their computational complexity. Experiments were conducted on a real-world data set, West Nile virus data, to validate the effectiveness of the proposed algorithms.
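The misclassification issue described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the function names, the k-nearest-neighbor definition of a neighborhood, the toy data, and the standardization by mean and standard deviation are all assumptions made for illustration. The sketch compares each object's value with a summary of its neighbors' values, once using the mean and once using the median, which is one common way to make a neighborhood summary less sensitive to extreme values.

```python
import math
import statistics


def k_nearest(points, i, k):
    """Return indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def neighborhood_scores(points, values, k, summary):
    """Score each object by how far its value lies from a summary of its neighbors.

    `summary` is any function that reduces a list of neighbor values to one
    number (for example statistics.mean or statistics.median).
    """
    diffs = []
    for i in range(len(points)):
        neighbor_values = [values[j] for j in k_nearest(points, i, k)]
        diffs.append(values[i] - summary(neighbor_values))
    # Standardize the differences (an assumed, simple normalization).
    mu = statistics.mean(diffs)
    sigma = statistics.pstdev(diffs)
    if sigma == 0:
        return [0.0] * len(diffs)
    return [abs(d - mu) / sigma for d in diffs]


# Toy data (hypothetical): ten locations on a line, one extreme value at index 5.
points = [(float(x), 0.0) for x in range(10)]
values = [10.0] * 10
values[5] = 100.0

mean_scores = neighborhood_scores(points, values, k=3, summary=statistics.mean)
median_scores = neighborhood_scores(points, values, k=3, summary=statistics.median)

for i in range(len(points)):
    print(i, round(mean_scores[i], 2), round(median_scores[i], 2))
```

In this toy setting, the mean-based scores of the normal objects adjacent to index 5 are inflated because the extreme value falls inside their neighborhoods, while the median-based scores of those objects are the same as for every other normal object.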

Author information

Corresponding author

Correspondence to Feng Chen.

About this article

Cite this article

Chen, D., Lu, C.-T., Kou, Y. et al. On Detecting Spatial Outliers. GeoInformatica 12, 455–475 (2008). https://doi.org/10.1007/s10707-007-0038-8

  • DOI: https://doi.org/10.1007/s10707-007-0038-8
