Constrained distance based clustering for time-series: a comparative and experimental study

Lampert, Thomas; Dao, Thi-Bich-Hanh; Lafabregue, Baptiste; Serrette, Nicolas; Forestier, Germain; Crémilleux, Bruno; Vrain, Christel; Gançarski, Pierre

doi:10.1007/s10618-018-0573-y

Published: 30 May 2018

Data Mining and Knowledge Discovery, Volume 32, pages 1663–1707 (2018)

Thomas Lampert¹,
Thi-Bich-Hanh Dao²,
Baptiste Lafabregue¹,
Nicolas Serrette²,
Germain Forestier³,
Bruno Crémilleux⁴,
Christel Vrain² &
Pierre Gançarski¹

Abstract

Constrained clustering is becoming an increasingly popular approach in data mining. It offers a balance between supervised methods, which require the complex task of formally defining thematic classes, and unsupervised approaches, which ignore expert knowledge and intuition. Nevertheless, the application of constrained clustering to time-series analysis is relatively unknown. This is partly due to the unsuitability of the Euclidean distance metric, which is typically used in data mining, to time-series data. This article addresses this gap by presenting an exhaustive review of constrained clustering algorithms and by modifying publicly available implementations to use a more appropriate distance measure: dynamic time warping. It presents a comparative study in which their performance is evaluated when applied to time-series. It is found that k-means based algorithms become computationally expensive and unstable under these modifications. Spectral approaches are easily applied and offer state-of-the-art performance, whereas declarative approaches are also easily applied and guarantee constraint satisfaction. An analysis of the results identifies several factors that influence an algorithm’s performance when constraints are introduced.
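To illustrate why the abstract calls the Euclidean distance unsuitable for time-series, the following minimal sketch compares it with a textbook dynamic time warping (DTW) distance on two series that differ only by a small time shift. It is not taken from the article's implementations: the function names, the squared local cost, the final square root, and the optional band parameter `window` are assumptions made for this example.

```python
import math


def dtw_distance(x, y, window=None):
    """Textbook DTW distance between two 1-D sequences (illustrative sketch).

    `window` is an optional band half-width limiting how far the alignment
    may stray from the diagonal; None means no restriction.
    """
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide so the final cell is reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2  # assumed local cost: squared difference
            cost[i][j] = d + min(cost[i - 1][j],      # repeat y[j-1]
                                 cost[i][j - 1],      # repeat x[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


def euclidean_distance(x, y):
    """Point-by-point Euclidean distance between equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# The same bump, shifted by one time step.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(euclidean_distance(a, b))  # 2.0: the shift is penalised at every step
print(dtw_distance(a, b))        # 0.0: warping absorbs the shift
```

The quadratic cost of filling the table for every pair of series is one plausible reason why distance-heavy algorithms become expensive under DTW, although the article's own analysis should be consulted for the actual causes.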

Notes

COP-KMeans is presented in more detail in Sect. 4.2.1.
The algorithms developed by Kamvar et al. (2003) and Li et al. (2009) are presented in more detail in Sects. 4.2.2 and 4.2.3 respectively.
Samarah is discussed in more detail in Sect. 4.2.4.
CPClustering is discussed in more detail in Sect. 4.2.5.

References

Aghabozorgi S, Shirkhorshidi A, Wah T (2015) Time-series clustering: a decade review. Inf Syst 53:16–38
Al-Razgan M, Domeniconi C (2009) Clustering ensembles with active constraints. Springer, Berlin, pp 175–189
Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75(2):245–248
Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-of-squares clustering. Math Program 131(1–2):195–220
Alzate C, Suykens J (2009) A regularized formulation for spectral clustering with pairwise constraints. In: Proceedings of the international joint conference on neural networks, pp 141–148
Anand R, Reddy C (2011) Graph-based clustering with constraints. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining, pp 51–62
Anand S, Bell D, Hughes J (1995) The role of domain knowledge in data mining. In: Proceedings of the international conference on information and knowledge management, pp 37–43
Antunes C, Oliveira A (2001) Temporal data mining: an overview. In: KDD workshop on temporal data mining, pp 1–13
Babaki B (2017) MIPKmeans. https://github.com/Behrouz-Babaki/MIPKmeans. Accessed 01 May 2017
Babaki B, Guns T, Nijssen S (2014) Constrained clustering using column generation. In: Proceedings of the international conference on AI and OR techniques in constraint programming for combinatorial optimization problems, pp 438–454
Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 31(3):606–660
Banerjee A, Ghosh J (2006) Scalable clustering algorithms with balancing constraints. Data Min Knowl Discov 13(3):365–395
Bar-Hillel A, Hertz T, Shental N, Weinshall D (2003) Learning distance functions using equivalence relations. In: Proceedings of the international conference on machine learning, pp 11–18
Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. J Mach Learn Res 6:937–965
Basu S, Banerjee A, Mooney R (2002) Semi-supervised clustering by seeding. In: Proceedings of the international conference on machine learning, pp 19–26
Basu S, Banerjee A, Mooney R (2004) Active semi-supervision for pairwise constrained clustering. In: Proceedings of the SIAM international conference on data mining, pp 333–344
Basu S, Bilenko M, Mooney R (2004) A probabilistic framework for semi-supervised clustering. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 59–68
Basu S, Davidson I, Wagstaff K (2008) Constrained clustering: advances in algorithms, theory, and applications, 1st edn. Chapman & Hall, London
Bellet A, Habrard A, Sebban M (2015) Metric learning. Morgan & Claypool Publishers, Los Altos
Bilenko M, Mooney R (2003) Adaptive duplicate detection using learnable string similarity measures. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 39–48
Bilenko M, Basu S, Mooney R (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the international conference on machine learning, pp 11–18
Bradley P, Bennett K, Demiriz A (2000) Constrained k-means clustering. Technical Report MSR-TR-2000-65, Microsoft Research
Chen W, Feng G (2012) Spectral clustering: a semi-supervised approach. Neurocomputing 77(1):229–242
Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/. Accessed 01 May 2017
Cheng H, Hua K, Vu K (2008) Constrained locally weighted clustering. Proc VLDB Endow 1(1):90–101
Cohn D, Caruana R, Mccallum A (2003) Semi-supervised clustering with user feedback. Technical Report TR2003-1892, Department of Computer Science, Cornell University
Cucuringu M, Koutis I, Chawla S, Miller G, Peng R (2016) Simple and scalable constrained clustering: a generalized spectral method. In: Proceedings of the international conference on artificial intelligence and statistics, pp 445–454
Dao TBH, Duong KC, Vrain C (2013) A declarative framework for constrained clustering. In: Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases, pp 419–434
Dao TBH, Vrain C, Duong KC, Davidson I (2016) A framework for actionable clustering using constraint programming. In: Proceedings of the European conference on artificial intelligence, pp 453–461
Dao TBH, Duong KC, Vrain C (2017) Constrained clustering by constraint programming. Artif Intell 244:70–94
Davidson I, Basu S (2007) A survey of clustering with instance level constraints. ACM Trans Knowl Discov Data 77(1):1–41
Davidson I, Ravi S (2005) Clustering with constraints: Feasibility issues and the k-means algorithm. In: Proceedings of the SIAM international conference on data mining, pp 307–314
Davidson I, Ravi S (2006) Identifying and generating easy sets of constraints for clustering. In: Proceedings of the AAAI conference on artificial intelligence, pp 336–341
Davidson I, Ravi S (2007) Intractability and clustering with constraints. In: Proceedings of the international conference on machine learning, pp 201–208
Davidson I, Wagstaff K, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: European conference on principles of data mining and knowledge discovery, pp 115–126
Davidson I, Ravi S, Shamis L (2010) A SAT-based framework for efficient constrained clustering. In: Proceedings of the SIAM international conference on data mining, pp 94–105
Delattre M, Hansen P (1980) Bicriterion cluster analysis. IEEE Trans Pattern Anal Mach Intell PAMI 2(4):277–291
Demiriz A, Bennett K, Embrechts M (1999) Semi-supervised clustering using genetic algorithms. In: Proceedings of the conference on artificial neural networks in engineering, pp 809–814
Demiriz A, Bennett K, Bradley P (2008) Chap 9: Using assignment constraints to avoid empty clusters in k-means clustering. In: Basu S, Davidson I, Wagstaff K (eds) Constrained clustering: advances in algorithms, theory, and applications, 1st edn. Chapman & Hall, London, pp 201–220
Dimitriadou E, Weingessel A, Hornik K (2002) A mixed ensemble approach for the semi-supervised problem. In: Proceedings of the international conference on artificial neural networks, pp 571–576
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. In: Proceedings of the international conference on very large data bases
Ding S, Qi B, Jia H, Zhu H, Zhang L (2013) Research of semi-supervised spectral clustering based on constraints expansion. Neural Comput Appl 22:405–410
Domeniconi C, Al-Razgan M (2008) Penta-training: clustering ensembles with bootstrapping of constraints. In: Proceedings of workshop on supervised and unsupervised ensemble methods and their applications, pp 47–51
Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Discov 14(1):63–97
Fisher D (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172
Forestier G, Gançarski P, Wemmert C (2010) Collaborative clustering with background knowledge. Data Knowl Eng 69(2):211–228
Forestier G, Wemmert C, Gançarski P (2010) Towards conflict resolution in collaborative clustering. In: IEEE International conference on intelligent systems, pp 361–366
Fred ALN, Jain AK (2002) Data clustering using evidence accumulation. In: Proceedings of the IEEE international conference on pattern recognition, pp 276–280
Gançarski P, Wemmert C (2007) Collaborative multi-step mono-level multi-strategy classification. J Multimed Tools Appl 35(1):1–27
Ganji M, Bailey J, Stuckey P (2016) Lagrangian constrained clustering. In: Proceedings of the SIAM international conference on data mining, pp 288–296
Ge R, Ester M, Jin W, Davidson I (2007) Constraint-driven clustering. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 320–329
Grira N, Crucianu M, Boujemaa N (2006) Fuzzy clustering with pairwise constraints for knowledge-driven image categorization. IEE Proc Vis Image Signal Process (CORE B) 153(3):299–304
Guns T, Dao TBH, Vrain C, Duong KC (2016) Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering. In: Proceedings of the European conference on artificial intelligence, pp 462–470
Hadjitodorov ST, Kuncheva LI (2007) Selecting diversifying heuristics for cluster ensembles. In: Proceedings of the international workshop on multiple classifier systems, pp 200–209
Handl J, Knowles J (2006) On semi-supervised clustering via multiobjective optimization. In: Proceedings of the annual conference on genetic and evolutionary computation, pp 1465–1472
Hansen P, Delattre M (1978) Complete-link cluster analysis by graph coloring. J Am Stat Assoc 73(362):397–403
Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79(1–3):191–215
Hiep T, Duc N, Trung B (2016) Local search approach for the pairwise constrained clustering problem. In: Proceedings of the symposium on information and communication technology, pp 115–122
Hoi S, Jin R, Lyu M (2007) Learning nonparametric kernel matrices from pairwise constraints. In: International conference on machine learning, pp 361–368
Hoi S, Liu W, Chang SF (2008) Semi-supervised distance metric learning for collaborative image retrieval. In: Proceedings of the IEEE international conference on computer vision and pattern recognition
Hoi S, Liu W, Chang SF (2010) Semi-supervised distance metric learning for collaborative image retrieval and clustering. ACM Trans Multimed Comput Commun Appl 6(3):18
Huang H, Cheng Y, Zhao R (2008) A semi-supervised clustering algorithm based on must-link set. In: Proceedings of the international conference on advanced data mining and applications, pp 492–499
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Iqbal A, Moh’d A, Zhan Z (2012) Semi-supervised clustering ensemble by voting. In: Proceedings of the international conference on information and communication systems, pp 1–5
Kamvar S, Klein D, Manning C (2003) Spectral learning. In: Proceedings of the international joint conference on artificial intelligence, pp 561–566
Kavitha V, Punithavalli M (2010) Clustering time series data stream–a literature survey. Int J Comput Sci Inf Secur 8(1):289–294
Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4):349–371
Keogh E, Lin J (2005) Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl Inf Syst 8(2):154–177
Kittler J (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20(3):226–239
Klein D, Kamvar S, Manning C (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the international conference on machine learning, pp 307–314
Kruskal J (1964) Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1):1–27
Kuhn H, Tucker A (1951) Nonlinear programming. In: Proceedings of the Berkeley symposium, pp 481–492
Kulis B, Basu S, Dhillon I, Mooney R (2005) Semi-supervised graph clustering: a kernel approach. In: Proceedings of the international conference on machine learning, pp 457–464
Kulis B, Basu S, Dhillon I, Mooney R (2009) Semi-supervised graph clustering: a kernel approach. Mach Learn 74(1):1–22
Laxman S, Sastry P (2006) A survey of temporal data mining. Sadhana 31(2):173–198
Li T, Ding C (2008) Weighted consensus clustering. In: Proceedings of the SIAM international conference on data mining, pp 798–809
Li T, Ding C, Jordan M (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In: Proceedings of the IEEE international conference on data mining, pp 577–582
Li Z, Liu J (2009) Constrained clustering by spectral kernel learning. In: IEEE international conference on computer vision, pp 421–427
Li Z, Liu J, Tang X (2008) Pairwise constraint propagation by semidefinite programming for semi-supervised classification. In: Proceedings of the international conference on machine learning, pp 576–583
Li Z, Liu J, Tang X (2009) Constrained clustering via spectral regularization. In: Proceedings of the international conference on computer vision and pattern recognition, pp 421–428
Liao TW (2005) Clustering of time series data: a survey. Pattern Recognit 38(11):1857–1874
Lines J, Bagnall A (2015) Time series classification with ensembles of elastic distance measures. Data Min Knowl Discov 29(3):565–592
Lu Z, Carreira-Perpiñán M (2008) Constrained spectral clustering through affinity propagation. In: IEEE conference on computer vision and pattern recognition, pp 1–8
Lu Z, Ip H (2010) Constrained spectral clustering via exhaustive and efficient constraint propagation. In: Proceedings of the European conference on computer vision, pp 1–14
Lu Z, Leen T (2005) Semi-supervised learning with penalized probabilistic clustering. In: Proceedings of the advances in neural information processing systems
Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
du Merle O, Hansen P, Jaumard B, Mladenović N (1999) An interior point algorithm for minimum sum-of-squares clustering. SIAM J Sci Comput 21(4):1485–1505
Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1):91–118
Mueller M, Kramer S (2010) Integer linear programming models for constrained clustering. In: Proceedings of the international conference on discovery science, pp 159–173
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Proceedings of the international conference on neural information processing systems, pp 849–856
Ng M (2000) A note on constrained k-means algorithms. Pattern Recognit 33(3):515–519
Ouali A, Loudni S, Lebbah Y, Boizumault P, Zimmermann A, Loukil L (2016) Efficiently finding conceptual clustering models with integer linear programming. In: Proceedings of the international joint conference on artificial intelligence, pp 647–654
Pedrycz W (2002) Collaborative fuzzy clustering. Pattern Recognit Lett 23(14):1675–1686
Pelleg D, Baras D (2007) K-means with large and noisy constraint sets. In: Proceedings of the European conference on machine learning, pp 674–682
Petitjean F, Ketterlin A, Gançarski P (2011) A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognit 44(3):678–693
Rangapuram S, Hein M (2012) Constrained 1-spectral clustering. In: Proceedings of the international conference on artificial intelligence and statistics, pp 1143–1151
Rani S, Sikka G (2012) Recent techniques of clustering of time series data: a survey. Int J Comput Appl 52(15):1–9
Rossi F, van Beek P, Walsh T (eds) (2006) Handbook of constraint programming. Foundations of artificial intelligence. Elsevier, Amsterdam
Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Comput Appl Math 20:53–65
Rutayisire T, Yang Y, Lin C, Zhang J (2011) A modified COP-KMeans algorithm based on sequenced cannot-link set. In: Proceedings of the International Conference on Rough Sets and Knowledge Technology, pp 217–225
Sakoe H, Chiba S (1971) A dynamic programming approach to continuous speech recognition. In: Proceedings of the international congress on acoustics, vol 3, pp 65–69
Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49
Shental N, Bar-Hillel A, Hertz T, Weinshall D (2013) Computing Gaussian mixture models with EM using equivalence constraints. In: International conference on neural information processing systems, pp 465–472
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Strehl A, Ghosh J (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Tan W, Yang Y, Li T (2010) An improved COP-KMeans algorithm for solving constraint violation. In: Proceedings of the international FLINS conference on foundations and applications of computational intelligence, pp 690–696
Tang W, Xiong H, Zhong S, Wu J (2007) Enhancing semi-supervised clustering: a feature projection perspective. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 707–716
Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the international conference on machine learning, pp 1103–1110
Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the international conference on machine learning, pp 577–584
Wagstaff K, Basu S, Davidson I (2006) When is constrained clustering beneficial, and why? In: Proceedings of the national conference on artificial intelligence and the eighteenth innovative applications of artificial intelligence conference
Wang J, Wu S, Vu H, Li G (2010) Text document clustering with metric learning. In: International ACM SIGIR conference on research and development in information retrieval, pp 783–784
Wang X, Davidson I (2010) Active spectral clustering. In: Proceedings of the IEEE international conference on data mining, pp 561–568
Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 563–572
Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P, Keogh E (2013) Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309
Wang X, Qian B, Davidson I (2014) On constrained spectral clustering and its applications. Data Min Knowl Discov 28(1):1–30
Wemmert C, Gançarski P, Korczak J (2000) A collaborative approach to combine multiple learning methods. Int J Artif Intell Tools 9(1):59–78
Xiao W, Yang Y, Wang H, Li T, Xing H (2016) Semi-supervised hierarchical clustering ensemble and its application. Neurocomputing 173(3):1362–1376
Xing E, Ng A, Jordan M, Russell S (2002) Distance metric learning, with application to clustering with side-information. In: Proceedings of the advances in neural information processing systems, pp 521–528
Yang F, Li T, Zhou Q, Xiao H (2017) Cluster ensemble selection with constraints. Neurocomputing 235:59–70
Yang Y, Tan W, Li T, Ruan D (2012) Consensus clustering based on constrained self-organizing map and improved Cop-Kmeans ensemble in intelligent decision support systems. Knowl Based Syst 32:101–115
Yi J, Jin R, Jain A, Yang T, Jain S (2012) Semi-crowdsourced clustering: generalizing crowd labeling by robust distance metric learning. In: Proceedings of the advances in neural information processing systems, pp 1772–1780
Yu Z, Wong HS, You J, Yang Q, Liao H (2011) Knowledge based cluster ensemble for cancer discovery from biomolecular data. IEEE Trans NanoBioscience 10(2):76–85
Zha H, He X, Ding CHQ, Gu M, Simon HD (2001) Spectral relaxation for k-means clustering. In: Proceedings of the international conference on neural information processing systems, pp 1057–1064
Zhang T, Ando R (2006) Analysis of spectral kernel design based semi-supervised learning. In: Proceedings of the international conference on neural information processing systems, pp 1601–1608
Zhi W, Wang X, Qian B, Butler P, Ramakrishnan N, Davidson I (2013) Clustering with complex constraints-algorithms and applications. In: Proceedings of the conference on artificial intelligence, pp 1056–1062
Zhu X, Loy C, Gong S (2016) Constrained clustering with imperfect oracles. IEEE Trans Neural Netw Learn Syst 27(6):1345–1357

Author information

Authors and Affiliations

ICube, University of Strasbourg, Strasbourg, France
Thomas Lampert, Baptiste Lafabregue & Pierre Gançarski
LIFO, University of Orléans, Orléans, France
Thi-Bich-Hanh Dao, Nicolas Serrette & Christel Vrain
MIPS, University of Haute-Alsace, Mulhouse, France
Germain Forestier
GREYC, University of Caen Normandie, Caen, France
Bruno Crémilleux

Corresponding author

Correspondence to Thomas Lampert.

Additional information

Communicated by Jian Pei.

Funding: CNES/Unistra R&T research Grant Number 2016-033.

Appendices

Appendix A: Full metric scores

See Tables 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 and 18.

Table 8 Unconstrained mean ARI
Table 9 Mean consistency (measured using 50% constraint sets)
Table 10 Performance on ECG5000
Table 11 Performance on ElectricDevices
Table 12 Performance on FacesUCR
Table 13 Performance on InsectWingbeatSound
Table 14 Performance on MALLAT
Table 15 Performance on StarLightCurves
Table 16 Performance on TwoPatterns
Table 17 Performance on uWaveGestureLibraryX
Table 18 Performance on UWaveGestureLibraryAll

Appendix B: Constraint coherence

As described in Davidson et al. (2006): “We consider all constraint pairs composed of an ML and a CL constraint (pairs composed of the same constraint type cannot be contradictory). To determine the coherence of two constraints, a and b, we compute the projected overlap of each constraint on the other”.

Let $\mathbf {a}$ and $\mathbf {b}$ be vectors connecting the points constrained by a, i.e. $(a_1,a_2)$, and b, i.e. $(b_1,b_2)$, respectively. We first project the points bound by constraint a onto the line that is defined by the points bound by constraint b, such that

$$\begin{aligned} a'_1 &= ((a_1 - b_1) \cdot \mathbf {e})\, \mathbf {e} + b_1,\\ a'_2 &= ((a_2 - b_1) \cdot \mathbf {e})\, \mathbf {e} + b_1, \end{aligned}$$

where

$$\begin{aligned} \mathbf {e} = \frac{\mathbf {b}}{|\mathbf {b}|}. \end{aligned}$$

The points $a'_1$, $a'_2$, $b_1$, and $b_2$ now all lie on the line spanned by the unit vector $\mathbf {e}$, and can therefore be reduced to scalar coordinates along it, such that

$$\begin{aligned} a''_i = a'_i \cdot \mathbf {e}, \quad b''_i = b_i \cdot \mathbf {e}, \quad \text {where }\ i \in \{1,2\} . \end{aligned}$$

The 1D points of each constraint are then sorted such that $a''_1 \le a''_2$ and $b''_1 \le b''_2$. With this ordering in place, the overlap of constraint a on constraint b becomes

$$\begin{aligned} o_a^b = \max \left\{ 0, \min \{a''_2,b''_2\} - \max \{a''_1, b''_1\}\right\} . \end{aligned}$$

A CL constraint c and an ML constraint m are coherent if neither overlaps the other, such that

$$\begin{aligned} \text {coh}_{cm} = {\left\{ \begin{array}{ll} 1, \quad \text {if }\ o_c^m = 0\ \text { and }\ o_m^c = 0,\\ 0, \quad \text {otherwise,} \end{array}\right. } \end{aligned}$$

and the coherence of a set of constraints $C$, with cannot-link subset $C_{\text {CL}}$ and must-link subset $C_{\text {ML}}$, is defined to be the fraction of CL and ML constraint pairs that are coherent, such that

$$\begin{aligned} \text {COH}(C) = \frac{\sum _{c \in C_{\text {CL}}, m \in C_{\text {ML}}}\text {coh}_{cm}}{|C_{\text {CL}}||C_{\text {ML}}|}. \end{aligned}$$
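The definitions in this appendix can be sketched in a few lines of Python. This is an illustrative reading of the formulas, not the authors' code: points are treated as plain coordinate vectors, the function names are hypothetical, and the tolerance `tol` used in place of an exact comparison with zero is an assumption added to cope with floating-point rounding.

```python
import math
from itertools import product


def _dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def overlap(a, b):
    """Projected overlap of constraint a = (a1, a2) on constraint b = (b1, b2)."""
    (a1, a2), (b1, b2) = a, b
    b_vec = [q - p for p, q in zip(b1, b2)]
    norm = math.sqrt(_dot(b_vec, b_vec))
    if norm == 0.0:
        raise ValueError("constraint b joins two identical points")
    e = [c / norm for c in b_vec]  # unit vector e = b / |b|

    def project(point):
        # a'_i = ((a_i - b_1) . e) e + b_1
        t = _dot([p - q for p, q in zip(point, b1)], e)
        return [t * ek + bk for ek, bk in zip(e, b1)]

    # Scalar coordinates along e, sorted so that the first is the smaller.
    a_1d = sorted(_dot(project(p), e) for p in (a1, a2))
    b_1d = sorted(_dot(p, e) for p in (b1, b2))
    return max(0.0, min(a_1d[1], b_1d[1]) - max(a_1d[0], b_1d[0]))


def coherent(c, m, tol=1e-12):
    """1 if neither constraint overlaps the other (up to tol), else 0."""
    return 1 if overlap(c, m) <= tol and overlap(m, c) <= tol else 0


def coherence(cannot_links, must_links, tol=1e-12):
    """Fraction of (CL, ML) pairs that are coherent."""
    if not cannot_links or not must_links:
        raise ValueError("need at least one CL and one ML constraint")
    total = sum(coherent(c, m, tol) for c, m in product(cannot_links, must_links))
    return total / (len(cannot_links) * len(must_links))


# One ML constraint along the x-axis and two CL constraints.
ml = ((0.0, 0.0), (1.0, 0.0))
cl_perpendicular = ((0.5, 1.0), (0.5, 2.0))  # projects to a single point on ml
cl_parallel = ((0.2, 0.1), (0.8, 0.1))       # projects inside ml
print(coherence([cl_perpendicular, cl_parallel], [ml]))  # 0.5
```

In this toy set the perpendicular pair has zero overlap in both directions and counts as coherent, while the parallel pair overlaps and does not, so half of the CL and ML pairs are coherent.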

About this article

Cite this article

Lampert, T., Dao, TBH., Lafabregue, B. et al. Constrained distance based clustering for time-series: a comparative and experimental study. Data Min Knowl Disc 32, 1663–1707 (2018). https://doi.org/10.1007/s10618-018-0573-y

Received: 26 September 2017
Accepted: 19 May 2018
Published: 30 May 2018
Issue Date: November 2018
DOI: https://doi.org/10.1007/s10618-018-0573-y
