Interactive Exploration of Subspace Clusters for High Dimensional Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10438)

Abstract

PreDeCon is a fundamental clustering algorithm for finding arbitrarily shaped clusters hidden in high-dimensional feature spaces, an important research topic with many potential applications. However, it suffers from very high runtime and offers no interaction with users. Our algorithm, AnyPDC, addresses both problems by casting PreDeCon as an anytime algorithm: it quickly produces an approximate result and then iteratively refines it until it reaches the result of PreDeCon. This scheme not only speeds up the algorithm significantly but also lets users interact with it during execution. Experiments on large real-world datasets show that AnyPDC obtains good approximate results very early, yielding an order-of-magnitude speedup over PreDeCon. More interestingly, while anytime techniques usually end up slower than batch ones, AnyPDC is faster than PreDeCon even when it runs to the end.
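To make the anytime idea concrete, the following is a minimal sketch of the general pattern only: produce a usable interim clustering after each small batch of work, and converge to the batch result when run to completion. It is not AnyPDC and not PreDeCon. It assumes a much simpler stand-in task (grouping points whose distance is at most `eps`, with no subspace preferences and no density threshold), and the names `anytime_components`, `eps`, and `block_size` are hypothetical.

```python
from math import dist


def anytime_components(points, eps, block_size=2):
    """Yield progressively refined cluster labels; the last yield is the batch result.

    Stand-in task (an assumption, not PreDeCon): two points belong to the
    same cluster if a chain of neighbors within distance eps links them.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for start in range(0, n, block_size):
        # Refinement step: run range queries for one block of points only.
        for i in range(start, min(start + block_size, n)):
            for j in range(n):
                if i != j and dist(points[i], points[j]) <= eps:
                    parent[find(i)] = find(j)
        # Interim result: a caller may inspect it and stop iterating here.
        yield [find(i) for i in range(n)]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (0.0, 2.0)]
snapshots = list(anytime_components(points, eps=1.1))
final = snapshots[-1]
```

Because the function is a generator, a caller can stop consuming it at any point and keep the most recent snapshot, which is the interaction the anytime scheme is meant to allow.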

Notes

  1. http://www.cs.ucr.edu/~eamonn/time_series_data/

  2. http://www.bbci.de/competition/iii/

  3. http://www.cru.uea.ac.uk/data/

Acknowledgments.

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2015.10.

Corresponding author

Correspondence to Son T. Mai.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kristensen, J., Mai, S.T., Assent, I., Jacobsen, J., Vo, B., Le, A. (2017). Interactive Exploration of Subspace Clusters for High Dimensional Data. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_25

  • DOI: https://doi.org/10.1007/978-3-319-64468-4_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64467-7

  • Online ISBN: 978-3-319-64468-4

  • eBook Packages: Computer Science, Computer Science (R0)