
Divisive clustering of high dimensional data streams


Abstract

Clustering streaming data is gaining importance as automatic data acquisition technologies are deployed in diverse applications. We propose a fully incremental projected divisive clustering method for high-dimensional data streams that is motivated by high density clustering. The method is capable of identifying clusters in arbitrary subspaces, estimating the number of clusters, and detecting changes in the data distribution which necessitate a revision of the model. The empirical evaluation of the proposed method on numerous real and simulated datasets shows that it is scalable in dimension and number of clusters, is robust to noisy and irrelevant features, and is capable of handling a variety of types of non-stationarity.



Acknowledgments

David Hofmeyr gratefully acknowledges funding from both the Engineering and Physical Sciences Research Council (EPSRC) and the Oppenheimer Memorial Trust.

Author information

Correspondence to David P. Hofmeyr.

Appendix 1: Proofs

Before we can prove Lemma 2, we require the following preliminaries.

The algorithm for computing the dip of a distribution function F constructs a unimodal distribution function G with the following properties: (i) the modal interval of G, \([m, M]\), is equal to the modal interval of the unimodal distribution function closest to F in the supremum norm, which we denote by \(F^U\); (ii) \(\Vert F - G\Vert _\infty = 2\Vert F - F^U\Vert _\infty \); (iii) G is the greatest convex minorant of F on \((-\infty , m]\); (iv) G is the least concave majorant of F on \([M, \infty )\). By construction, the function G is linear between its nodes. A node \(n \le m\) of G satisfies \(G(n) = \hbox {liminf}_{x \rightarrow n}F(x)\), while a node \(n \ge M\) of G satisfies \(G(n) = \hbox {limsup}_{x \rightarrow n}F(x)\). If F is the distribution function of a discrete random variable, then G is continuous.

The function \(F^U\) can be constructed by finding appropriate values \(b<m\), \(B>M\) s.t. \(F^U\) is equal to \(G+\hbox {Dip}(F)\) on \([b, m]\), equal to \(G-\hbox {Dip}(F)\) on \([M, B]\), interpolates linearly between G(m) and G(M), and is given appropriate tails, which we choose to decrease linearly to 0 and increase linearly to 1, respectively.

We also require the following preliminary result, which relies on the notion of a step linear function.

Definition 3

(Step Linear) A function f is step linear on a non-empty, compact interval \(I = [a, b]\), if

$$\begin{aligned} f(x) = \alpha + \beta \left\lfloor (x-a)\frac{n}{b-a}\right\rfloor , \quad \forall x \in I, \end{aligned}$$

for some \(\alpha , \beta \in \mathbb {R}\) and \(n \in {\mathbb {N}}\).

A step linear function is piecewise constant, with n jumps of size \(\beta \) spaced equally on I and the final jump occurring at b. The approximate empirical distribution function \(\tilde{F}\) (Sect. 4.2.2) is therefore step linear over the approximating intervals.
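
To illustrate Definition 3 numerically, the short Python sketch below (all helper names are our own, not from the paper, and the placement of the equally spaced atoms is chosen to match Definition 3 rather than copied from Sect. 4.2.2) evaluates a step linear function and checks that the empirical distribution function of a sample whose points in \([a, b]\) have been replaced by n equally spaced points is step linear on \([a, b]\) with \(\beta = 1/N\), where N is the total sample size.

```python
import numpy as np

def step_linear(x, a, b, alpha, beta, n):
    # f(x) = alpha + beta * floor((x - a) * n / (b - a)), x in [a, b]  (Definition 3)
    return alpha + beta * np.floor((x - a) * n / (b - a))

def ecdf(x, data):
    # right-continuous empirical distribution function of `data`, evaluated at x
    return np.searchsorted(np.sort(data), x, side="right") / data.size

rng = np.random.default_rng(0)
a, b, n = 0.0, 1.0, 5
inside = a + (b - a) * np.arange(1, n + 1) / n   # n equally spaced atoms, final one at b
outside = 2.0 + rng.random(20)                   # remainder of the sample, entirely above b
sample = np.concatenate([inside, outside])
N = sample.size

# evaluate away from the jump locations to avoid floating point ties
grid = a + (b - a) * (np.arange(50) + 0.5) / 50
alpha = ecdf(a, sample)                          # proportion of the sample at or below a
assert np.allclose(ecdf(grid, sample), step_linear(grid, a, b, alpha, 1.0 / N, n))
```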

Proposition 4

Let f be step linear on an interval \(I = [a, b]\) and satisfy \(\lim _{x \rightarrow a^-}f(x) = \alpha - \beta \), where \(\alpha , \beta \) are as in the above definition applied to f. Let g be linear on I and continuous on a neighbourhood of I. Then

$$\begin{aligned} \sup _{x \in I} \vert f(x) - g(x) \vert\le & {} \max \left\{ \hbox {limsup}_{x \rightarrow a}\vert f(x) - g(x) \vert ,\right. \\&\left. \hbox {limsup}_{x \rightarrow b} \vert f(x) - g(x) \vert \right\} . \end{aligned}$$

Proof

Let \(f_m\) and \(f^M\) be linear on a neighbourhood of I s.t. they form the closest lower and upper bounding functions of f on I respectively. Since f is step linear, we have,

$$\begin{aligned} \lim _{x \rightarrow a^-}f(x) = f_m(a),&\lim _{x \rightarrow b^-}f(x) = f_m(b),\\ f(a) = f^M(a),&f(b) = f^M(b). \end{aligned}$$

We therefore have, by above and the fact that \(g, f_m\), and \(f^M\) are linear on I,

$$\begin{aligned} \sup _{x \in I}\vert f(x) - g(x) \vert\le & {} \max \left\{ \sup _{x \in I}\vert f^M(x) - g(x)\vert ,\right. \\&\left. \sup _{x \in I}\vert f_m(x) - g(x) \vert \right\} \\= & {} \max \left\{ \vert f^M(b) - g(b)\vert ,\right. \\&\vert f^M(a) - g(a)\vert , \vert f_m(a) - g(a) \vert ,\\&\left. \vert f_m(b) - g(b)\vert \right\} \\= & {} \max \left\{ \hbox {limsup}_{x \rightarrow a}\vert f(x) - g(x) \vert ,\right. \\&\left. \hbox {limsup}_{x \rightarrow b}\vert f(x) - g(x) \vert \right\} . \end{aligned}$$

\(\square \)
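
Proposition 4 can be checked numerically on an example. The sketch below (arbitrary illustrative values, not taken from the paper) compares a step linear f with a linear g on a dense grid and confirms that the supremum of \(\vert f - g \vert \) on I does not exceed the larger of the two endpoint limsups, each computed from the two one-sided limits of f.

```python
import numpy as np

# Numerical check of Proposition 4 (illustrative values only)
a, b, n = 0.0, 1.0, 7
alpha, beta = 0.1, 0.05

def f(x):
    # step linear function of Definition 3
    return alpha + beta * np.floor((x - a) * n / (b - a))

def g(x):
    # an arbitrary linear function on a neighbourhood of [a, b]
    return 0.05 + 0.42 * x

grid = np.linspace(a, b, 100001)
lhs = np.max(np.abs(f(grid) - g(grid)))

# limsup of |f - g| at a and b: compare both one-sided limits of f with g,
# using lim_{x -> a^-} f(x) = alpha - beta as in the hypothesis
rhs = max(abs(alpha - beta - g(a)), abs(alpha - g(a)),
          abs(alpha + beta * (n - 1) - g(b)), abs(alpha + beta * n - g(b)))

assert lhs <= rhs + 1e-9
print(f"sup |f - g| = {lhs:.4f} <= {rhs:.4f}")
```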

We are now in a position to prove Lemma 2, which states that the dip of a compactly approximated sample, as described in Sect. 4.2.2, provides a lower bound on the dip of the true sample.

Proof of Lemma 2

Let \(I = [a, b]\) be any compact interval and \(F_I\) the empirical distribution function of \((\mathcal {X}\cap I^c) \cup \hbox {Unif}(\mathcal {X}, I)\). Assume \(\vert \mathcal {X}\cap I \vert >1\), since otherwise \(F_I = F_\mathcal {X}\) and we are done. We can assume that the endpoints of I are elements of \(\mathcal {X}\) since this defines the same uniform set. \(F_\mathcal {X}\) and \(F_I\) are therefore equal on \(\hbox {Int}(I)^c\). In fact, since \(\mathcal {X}\) consists of unique points, \(\exists \epsilon > 0\) s.t. \(F_I(x) = F_\mathcal {X}(x) \ \forall x \not \in (a+\epsilon , b-\epsilon )\). Define \(F^\prime _I\) to be equal to \(F_\mathcal {X}^U\) for \(x \not \in \hbox {Int}(I)\) and to interpolate linearly between \(F_\mathcal {X}^U(a)\) and \(F_\mathcal {X}^U(b)\) on \(\hbox {Int}(I)\). By construction, \(F^\prime _I\) is a continuous unimodal distribution function.

We now show \(\Vert F_I - F_I^\prime \Vert _\infty \le \Vert F_\mathcal {X}- F_\mathcal {X}^U\Vert _\infty \). To see this, suppose that it is not true, i.e., \(\exists x\) s.t. \(\vert F_I(x) - F_I^\prime (x)\vert > \sup _y \vert F_\mathcal {X}(y) - F_\mathcal {X}^U(y)\vert \). Clearly \(x \in \hbox {Int}(I)\) due to the equalities discussed above and the construction of \(F^\prime _I\). Because of the continuity of \(F_\mathcal {X}^U\) and \(F_I^\prime \) and the equality of \(F_\mathcal {X}\) and \(F_I\) on \((a, a+\epsilon ) \cup (b-\epsilon , b)\), we have

$$\begin{aligned} \hbox {limsup}_{y \rightarrow a} \vert F_I(y) - F^\prime _I(y) \vert = \hbox {limsup}_{y \rightarrow a}\vert F_\mathcal {X}(y) - F^U_\mathcal {X}(y) \vert \end{aligned}$$

and

$$\begin{aligned} \hbox {limsup}_{y \rightarrow b} \vert F_I(y) - F^\prime _I(y) \vert = \hbox {limsup}_{y \rightarrow b}\vert F_\mathcal {X}(y) - F^U_\mathcal {X}(y) \vert . \end{aligned}$$

But by Proposition 4 one of these left hand sides is at least as large as \(\vert F_I(x) - F_I^\prime (x) \vert \), leading to a contradiction.

We have shown that the addition of a single interval cannot increase the dip. We can apply the same logic to the now modified sample \((\mathcal {X}\cap I^c) \cup \hbox {Unif}(\mathcal {X}, I)\), iterating the addition of disjoint intervals to obtain a non-increasing sequence of dips. \(\square \)
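
As a sanity check of Lemma 2, one can compare the dip of a sample with the dip of its compact approximation on an interval. The sketch below assumes that \(\hbox {Unif}(\mathcal {X}, I)\) replaces the points of \(\mathcal {X}\) falling in I with the same number of equally spaced points spanning their range (our reading of the construction in Sect. 4.2.2), and relies on the third-party Python package diptest, whose dipstat function we assume returns Hartigan's dip statistic.

```python
import numpy as np
import diptest  # third-party package; we assume diptest.dipstat(x) returns the dip of x

rng = np.random.default_rng(1)
# a clearly bimodal sample
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(4.0, 1.0, 500)])

def approximate_on_interval(x, a, b):
    """Replace the points of x lying in [a, b] with the same number of equally
    spaced points spanning their range (our reading of Unif(X, I))."""
    inside = (x >= a) & (x <= b)
    pts = np.sort(x[inside])
    unif = np.linspace(pts[0], pts[-1], pts.size)
    return np.concatenate([x[~inside], unif])

x_tilde = approximate_on_interval(x, 1.5, 2.5)   # approximate over the low-density valley
dip_x, dip_tilde = diptest.dipstat(np.sort(x)), diptest.dipstat(np.sort(x_tilde))
print(dip_x, dip_tilde)
assert dip_tilde <= dip_x + 1e-10   # Lemma 2: the approximation cannot increase the dip
```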

In the proof of Lemma 2 above, we do not show that \(F_I^\prime \) is the closest unimodal distribution function to \(F_I\); however, its existence ensures that the closest unimodal distribution function is at least as close. Now, the sample approximations we employ still contain all t atoms after t observations, but they can be stored in \({\mathcal {O}}(k)\) space when k intervals are used. The following proposition shows that the dip of such a sample approximation can also be computed in \({\mathcal {O}}(k)\) time.

Proposition 5

The dip of a sample consisting of k uniform sets with disjoint ranges can be computed in \({\mathcal {O}}(k)\) time.

Proof

We begin by showing that there exists a unimodal distribution function which is linear on the ranges of the uniform sets and which achieves the minimal distance to the empirical distribution function of the sample.

Let F be a continuous unimodal distribution function s.t. \(\Vert F - \tilde{F}\Vert _\infty = \hbox {Dip}(\tilde{F})\). Define \(F^\prime \), analogously to \(F^\prime _I\) in the proof of Lemma 2, to be the continuous distribution function which is equal to F outside and at the boundaries of the intervals defining the uniform sets, and which interpolates linearly on them. Using the same logic, we know that \(\sup _{x}\vert F^\prime (x) - \tilde{F}(x) \vert \le \sup _x\vert F(x) - \tilde{F}(x)\vert \), hence \(\Vert F^\prime - \tilde{F}\Vert _\infty = \hbox {Dip}(\tilde{F})\).

Proposition 4 ensures that points in the interior of the intervals will not be chosen by the dip algorithm as end points of the modal interval of G, nor as points at which the difference between the functions is supremal. The number of possible choices for these locations is therefore \({\mathcal {O}}(k)\), and the algorithm need not evaluate the functions except at the endpoints of the intervals. \(\square \)
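
The following sketch illustrates the data structure implied by Proposition 5: k uniform sets with disjoint ranges are stored as (left, right, count) triples, and the approximate distribution function is only ever evaluated at the 2k range endpoints, the candidate locations identified in the proof. The names and the exact representation are ours; the paper's implementation may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class UniformSet:
    left: float   # left endpoint of the set's range
    right: float  # right endpoint of the set's range
    count: int    # number of equally spaced atoms the set summarises

def endpoint_evaluations(sets: List[UniformSet]) -> List[Tuple[float, float]]:
    """Evaluate the approximate ECDF at the 2k range endpoints: by Proposition 4
    these are the only candidate nodes of G and the only candidate locations of
    the supremal difference, so the dip requires O(k) evaluations."""
    sets = sorted(sets, key=lambda s: s.left)   # disjoint ranges admit a total order
    total = sum(s.count for s in sets)
    out, seen = [], 0
    for s in sets:
        out.append((s.left, seen / total))      # value of the ECDF just below the range
        seen += s.count
        out.append((s.right, seen / total))     # value of the ECDF at the right endpoint
    return out

# e.g. three uniform sets summarising 10, 5 and 25 atoms respectively
print(endpoint_evaluations([UniformSet(0.0, 1.0, 10),
                            UniformSet(2.0, 2.5, 5),
                            UniformSet(4.0, 6.0, 25)]))
```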

Finally, we provide a proof of Proposition 3.

Proof of Proposition 3

For \(s > 1\) we have \(\Vert v_s - v_{s-1}\Vert = \Vert v_s\Vert \Vert v_s-v_{s-1}\Vert \ge \vert v_s \cdot (v_s - v_{s-1})\vert = \vert v_s \cdot v_{s-1}-1\vert \), by the Cauchy–Schwarz inequality and the fact that \(\Vert v_t\Vert = 1 \ \forall t\). Therefore, since \(\{v_t\}_{t=1}^\infty \) is almost surely convergent, and therefore almost surely Cauchy, we have \(v_s \cdot v_{s-1} \xrightarrow {a.s.} 1 \Rightarrow \arccos (v_s \cdot v_{s-1}) \xrightarrow {a.s.}0\). Now, we can easily show that,

$$\begin{aligned} \lambda _t \le \gamma ^{t-1}\lambda _1 + (1-\gamma )\sum _{i=1}^{t-2}\gamma ^i\arccos (v_{t-i}\cdot v_{t-i-1}). \end{aligned}$$

Take \(\epsilon > 0\) and t large enough that \(\gamma ^{t-1}\lambda _1 < \gamma \epsilon \), and \(t>k+2\), where \(k = \lfloor \log (\epsilon (1-\gamma )/2\pi )/\log (\gamma )-1\rfloor \). Consider,

$$\begin{aligned} \sum _{i=1}^{t-2}\gamma ^i\arccos (v_{t-i}\cdot v_{t-i-1})\le & {} \sum _{i=1}^{k}\arccos (v_{t-i}\cdot v_{t-i-1})\\&+\frac{\pi \gamma ^{k+1}}{1-\gamma }, \end{aligned}$$

and \(\frac{\pi \gamma ^{k+1}}{1-\gamma } \le \frac{\epsilon }{2}\). In all,

$$\begin{aligned} \lambda _t > \epsilon \Rightarrow \sum _{i=0}^k\arccos (v_{t-i}\cdot v_{t-i-1}) > \epsilon /2. \end{aligned}$$

Notice that k does not depend on t. With probability 1, for any given \(\epsilon >0\) there is a \({\mathcal {T}}\) s.t. \(T>{\mathcal {T}}\) implies \(\sum _{i=0}^k\arccos (v_{T-i}\cdot v_{T-i-1})\le \epsilon /2\), implying \(\lambda _T \le \epsilon \) for all \(T > {\mathcal {T}}\), and the result follows. \(\square \)
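
To see the mechanics behind Proposition 3, the following simulation assumes (our assumption, consistent with the bound used in the proof but not stated in this excerpt) that \(\lambda _t\) is updated by the exponentially weighted recursion \(\lambda _t = \gamma \lambda _{t-1} + (1-\gamma )\arccos (v_t \cdot v_{t-1})\). When the direction estimates \(v_t\) converge, the angles between consecutive estimates vanish and \(\lambda _t\) is driven towards zero.

```python
import numpy as np

rng = np.random.default_rng(2)
d, T, gamma = 10, 5000, 0.99
target = np.zeros(d)
target[0] = 1.0           # the direction to which the estimates v_t converge

lam = np.pi / 2           # arbitrary initial value lambda_1
v_prev = target.copy()
for t in range(2, T + 1):
    v = target + rng.normal(scale=1.0 / t, size=d)   # estimates converging to `target`
    v /= np.linalg.norm(v)
    angle = np.arccos(np.clip(v @ v_prev, -1.0, 1.0))
    lam = gamma * lam + (1 - gamma) * angle          # assumed update for lambda_t
    v_prev = v

print(f"lambda_T after T = {T} steps: {lam:.2e}")    # small, and shrinking as T grows
```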


Cite this article

Hofmeyr, D.P., Pavlidis, N.G. & Eckley, I.A. Divisive clustering of high dimensional data streams. Stat Comput 26, 1101–1120 (2016). https://doi.org/10.1007/s11222-015-9597-y
