Fractal Mining - Self Similarity-based Clustering and its Applications

Chapter in Data Mining and Knowledge Discovery Handbook

Summary

Self-similarity is the property of being invariant with respect to the scale used to look at the data set. Self-similarity can be measured using the fractal dimension. Fractal dimension is an important characteristic of many complex systems and can serve as a powerful representation technique. In this chapter, we present a new clustering algorithm based on the self-similarity properties of data sets, together with its applications to other areas of Data Mining, such as projected clustering and trend analysis. Clustering is a widely used knowledge discovery technique. The new algorithm, which we call Fractal Clustering (FC), places points incrementally in the cluster for which the change in the fractal dimension after adding the point is the least. This is a very natural way of clustering points, since points in the same cluster have a great degree of self-similarity among them (and much less self-similarity with respect to points in other clusters). FC requires only one scan of the data, can be suspended at will while still providing the best answer possible at that point, and is incremental. We show via experiments that FC deals effectively with large data sets, high dimensionality, and noise, and is capable of recognizing clusters of arbitrary shape.
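
The placement rule described above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the box-counting estimator, the grid sizes, and the names `box_counting_dimension` and `assign_point` are assumptions made for illustration. The sketch also recomputes each dimension from scratch, which a one-scan method would presumably avoid by keeping box counts up to date incrementally.

```python
import math

def box_counting_dimension(points, sizes):
    """Estimate a fractal dimension as the least-squares slope of
    log N(r) versus log(1/r), where N(r) is the number of grid cells
    of side r that contain at least one point.
    Assumes `points` is non-empty and `sizes` has two or more values."""
    xs, ys = [], []
    for r in sizes:
        occupied = {tuple(math.floor(c / r) for c in p) for p in points}
        xs.append(math.log(1.0 / r))
        ys.append(math.log(len(occupied)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def assign_point(clusters, point, sizes):
    """Append `point` to the cluster whose estimated dimension changes
    least when the point is added, and return that cluster's index.
    Hypothetical helper; ties go to the earliest cluster."""
    best_index, best_change = None, math.inf
    for i, cluster in enumerate(clusters):
        before = box_counting_dimension(cluster, sizes)
        after = box_counting_dimension(cluster + [point], sizes)
        change = abs(after - before)
        if change < best_change:
            best_index, best_change = i, change
    clusters[best_index].append(point)
    return best_index

# Illustrative data: a diagonal line segment and a filled square.
sizes = (1 / 2, 1 / 4, 1 / 8, 1 / 16)
line = [(i / 64, i / 64) for i in range(64)]
square = [(2 + i / 16, 2 + j / 16) for i in range(16) for j in range(16)]
clusters = [list(line), list(square)]

# A point on the diagonal leaves the line's box counts unchanged,
# so it joins the line cluster rather than the square.
chosen = assign_point(clusters, (0.51, 0.51), sizes)
```

With these grid sizes the line segment has an estimated dimension of about 1 and the filled square about 2. A point that falls in an already occupied cell of a cluster leaves that cluster's estimate unchanged, while the same point adds a new cell at every scale to the other cluster and shifts its slope.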

Author information

Correspondence to Daniel Barbara.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Barbara, D., Chen, P. (2009). Fractal Mining - Self Similarity-based Clustering and its Applications. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_28

  • DOI: https://doi.org/10.1007/978-0-387-09823-4_28

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-09822-7

  • Online ISBN: 978-0-387-09823-4

  • eBook Packages: Computer Science, Computer Science (R0)
