Abstract
PreDeCon is a fundamental clustering algorithm for finding arbitrarily shaped clusters hidden in high-dimensional feature spaces, an important research topic with many potential applications. However, it suffers from very high runtime and a lack of user interaction. Our algorithm, called AnyPDC, addresses both problems by casting PreDeCon as an anytime algorithm: it quickly produces an approximate result and iteratively refines it until it reaches the result of PreDeCon. This scheme not only speeds up the algorithm significantly but also lets users interact with it during execution. Experiments on large real datasets show that AnyPDC obtains good approximate results very early, yielding an order-of-magnitude speedup over PreDeCon. More interestingly, while anytime techniques usually end up slower than batch ones, AnyPDC is faster than PreDeCon even if it runs to the end.
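The anytime scheme described above (an early approximate result that is refined until it matches the batch result) can be illustrated with a toy sketch. The code below is not AnyPDC or PreDeCon. It assumes a much simpler notion of a cluster, namely a connected component of the eps-neighborhood graph, and ignores subspace preferences and core-point conditions entirely. The names `anytime_components`, `eps`, and `batch_size` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def _find(parent: List[int], i: int) -> int:
    """Return the root of i in a union-find forest, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def anytime_components(
    points: Sequence[Point], eps: float, batch_size: int
) -> Iterator[List[int]]:
    """Yield progressively refined cluster labels (illustrative only).

    Assumption: a cluster is a connected component of the graph that links
    two points when their Euclidean distance is at most eps. Point pairs are
    examined in batches; after each batch the current labels are yielded, so
    a caller can stop early and keep an approximate clustering. The last
    yielded labels equal the result of examining all pairs at once.
    """
    n = len(points)
    parent = list(range(n))
    pairs = list(combinations(range(n), 2))
    if not pairs:
        # Zero or one point: the trivial labeling is already final.
        yield list(range(n))
        return
    for start in range(0, len(pairs), batch_size):
        for i, j in pairs[start:start + batch_size]:
            dist_sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            if dist_sq <= eps * eps:
                ri, rj = _find(parent, i), _find(parent, j)
                if ri != rj:
                    # Keep the smaller index as root so labels are stable.
                    parent[max(ri, rj)] = min(ri, rj)
        yield [_find(parent, k) for k in range(n)]
```

A caller would iterate over the generator and break out as soon as an intermediate clustering is good enough, which mirrors the interactive use the abstract describes. In this sketch the intermediate results can only over-segment the data, since later batches merge clusters but never split them.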
Acknowledgments.
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2015.10.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kristensen, J., Mai, S.T., Assent, I., Jacobsen, J., Vo, B., Le, A. (2017). Interactive Exploration of Subspace Clusters for High Dimensional Data. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_25
DOI: https://doi.org/10.1007/978-3-319-64468-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64467-7
Online ISBN: 978-3-319-64468-4
eBook Packages: Computer Science, Computer Science (R0)