Constrained distance based clustering for time-series: a comparative and experimental study

Data Mining and Knowledge Discovery

Abstract

Constrained clustering is becoming an increasingly popular approach in data mining. It offers a balance between supervised methods, which require a formal and often complex definition of thematic classes, and unsupervised approaches, which ignore expert knowledge and intuition. Nevertheless, the application of constrained clustering to time-series analysis remains relatively unexplored. This is partly because the Euclidean distance, which is typically used in data mining, is unsuitable for time-series data. This article addresses this gap by presenting an exhaustive review of constrained clustering algorithms and by modifying publicly available implementations to use a more appropriate distance measure, dynamic time warping. It presents a comparative study in which the performance of these algorithms is evaluated when applied to time-series. It is found that k-means based algorithms become computationally expensive and unstable under these modifications. Spectral approaches are easily applied and offer state-of-the-art performance, whereas declarative approaches are also easily applied and guarantee constraint satisfaction. An analysis of the results identifies several factors that influence an algorithm’s performance when constraints are introduced.

Notes

  1. COP-KMeans is presented in more detail in Sect. 4.2.1.

  2. The algorithms developed by Kamvar et al. (2003) and Li et al. (2009) are presented in more detail in Sects. 4.2.2 and 4.2.3 respectively.

  3. Samarah is discussed in more detail in Sect. 4.2.4.

  4. CPClustering is discussed in more detail in Sect. 4.2.5.

References

  • Aghabozorgi S, Shirkhorshidi A, Wah T (2015) Time-series clustering: a decade review. Inf Syst 53:16–38

  • Al-Razgan M, Domeniconi C (2009) Clustering ensembles with active constraints. Springer, Berlin, pp 175–189

  • Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75(2):245–248

  • Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-of-squares clustering. Math Program 131(1–2):195–220

  • Alzate C, Suykens J (2009) A regularized formulation for spectral clustering with pairwise constraints. In: Proceedings of the international joint conference on neural networks, pp 141–148

  • Anand R, Reddy C (2011) Graph-based clustering with constraints. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining, pp 51–62

  • Anand S, Bell D, Hughes J (1995) The role of domain knowledge in data mining. In: Proceedings of the international conference on information and knowledge management, pp 37–43

  • Antunes C, Oliveira A (2001) Temporal data mining: an overview. In: KDD workshop on temporal data mining, pp 1–13

  • Babaki B (2017) MIPKmeans. https://github.com/Behrouz-Babaki/MIPKmeans. Accessed 01 May 2017

  • Babaki B, Guns T, Nijssen S (2014) Constrained clustering using column generation. In: Proceedings of the international conference on AI and OR techniques in constraint programming for combinatorial optimization problems, pp 438–454

  • Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 31(3):606–660

  • Banerjee A, Ghosh J (2006) Scalable clustering algorithms with balancing constraints. Data Min Knowl Discov 13(3):365–395

  • Bar-Hillel A, Hertz T, Shental N, Weinshall D (2003) Learning distance functions using equivalence relations. In: Proceedings of the international conference on machine learning, pp 11–18

  • Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. J Mach Learn Res 6:937–965

  • Basu S, Banerjee A, Mooney R (2002) Semi-supervised clustering by seeding. In: Proceedings of the international conference on machine learning, pp 19–26

  • Basu S, Banerjee A, Mooney R (2004) Active semi-supervision for pairwise constrained clustering. In: Proceedings of the SIAM international conference on data mining, pp 333–344

  • Basu S, Bilenko M, Mooney R (2004) A probabilistic framework for semi-supervised clustering. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 59–68

  • Basu S, Davidson I, Wagstaff K (2008) Constrained clustering: advances in algorithms, theory, and applications, 1st edn. Chapman & Hall, London

  • Bellet A, Habrard A, Sebban M (2015) Metric learning. Morgan & Claypool Publishers, Los Altos

  • Bilenko M, Mooney R (2003) Adaptive duplicate detection using learnable string similarity measures. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 39–48

  • Bilenko M, Basu S, Mooney R (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the international conference on machine learning, pp 11–18

  • Bradley P, Bennett K, Demiriz A (2000) Constrained k-means clustering. Technical Report MSR-TR-2000-65, Microsoft Research

  • Chen W, Feng G (2012) Spectral clustering: a semi-supervised approach. Neurocomputing 77(1):229–242

  • Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/. Accessed 01 May 2017

  • Cheng H, Hua K, Vu K (2008) Constrained locally weighted clustering. Proc VLDB Endow 1(1):90–101

  • Cohn D, Caruana R, McCallum A (2003) Semi-supervised clustering with user feedback. Technical Report TR2003-1892, Department of Computer Science, Cornell University

  • Cucuringu M, Koutis I, Chawla S, Miller G, Peng R (2016) Simple and scalable constrained clustering: a generalized spectral method. In: Proceedings of the international conference on artificial intelligence and statistics, pp 445–454

  • Dao TBH, Duong KC, Vrain C (2013) A declarative framework for constrained clustering. In: Proceedings of the European conference on machine learning and principles and practice of knowledge discovery in databases, pp 419–434

  • Dao TBH, Vrain C, Duong KC, Davidson I (2016) A framework for actionable clustering using constraint programming. In: Proceedings of the European conference on artificial intelligence, pp 453–461

  • Dao TBH, Duong KC, Vrain C (2017) Constrained clustering by constraint programming. Artif Intell 244:70–94

  • Davidson I, Basu S (2007) A survey of clustering with instance level constraints. ACM Trans Knowl Discov Data 77(1):1–41

  • Davidson I, Ravi S (2005) Clustering with constraints: Feasibility issues and the k-means algorithm. In: Proceedings of the SIAM international conference on data mining, pp 307–314

  • Davidson I, Ravi S (2006) Identifying and generating easy sets of constraints for clustering. In: Proceedings of the AAAI conference on artificial intelligence, pp 336–341

  • Davidson I, Ravi S (2007) Intractability and clustering with constraints. In: Proceedings of the international conference on machine learning, pp 201–208

  • Davidson I, Wagstaff K, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: European conference on principles of data mining and knowledge discovery, pp 115–126

  • Davidson I, Ravi S, Shamis L (2010) A SAT-based framework for efficient constrained clustering. In: Proceedings of the SIAM international conference on data mining, pp 94–105

  • Delattre M, Hansen P (1980) Bicriterion cluster analysis. IEEE Trans Pattern Anal Mach Intell PAMI-2(4):277–291

  • Demiriz A, Bennett K, Embrechts M (1999) Semi-supervised clustering using genetic algorithms. In: Proceedings of the conference on artificial neural networks in engineering, pp 809–814

  • Demiriz A, Bennett K, Bradley P (2008) Chap 9: Using assignment constraints to avoid empty clusters in k-means clustering. In: Basu S, Davidson I, Wagstaff K (eds) Constrained clustering: advances in algorithms, theory, and applications, 1st edn. Chapman & Hall, London, pp 201–220

  • Dimitriadou E, Weingessel A, Hornik K (2002) A mixed ensemble approach for the semi-supervised problem. In: Proceedings of the international conference on artificial neural networks, pp 571–576

  • Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. In: Proceedings of the international conference on very large data bases

  • Ding S, Qi B, Jia H, Zhu H, Zhang L (2013) Research of semi-supervised spectral clustering based on constraints expansion. Neural Comput Appl 22:405–410

  • Domeniconi C, Al-Razgan M (2008) Penta-training: clustering ensembles with bootstrapping of constraints. In: Proceedings of workshop on supervised and unsupervised ensemble methods and their applications, pp 47–51

  • Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Discov 14(1):63–97

  • Fisher D (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172

  • Forestier G, Gançarski P, Wemmert C (2010) Collaborative clustering with background knowledge. Data Knowl Eng 69(2):211–228

  • Forestier G, Wemmert C, Gançarski P (2010) Towards conflict resolution in collaborative clustering. In: IEEE international conference on intelligent systems, pp 361–366

  • Fred ALN, Jain AK (2002) Data clustering using evidence accumulation. In: Proceedings of the IEEE international conference on pattern recognition, pp 276–280

  • Gançarski P, Wemmert C (2007) Collaborative multi-step mono-level multi-strategy classification. J Multimed Tools Appl 35(1):1–27

  • Ganji M, Bailey J, Stuckey P (2016) Lagrangian constrained clustering. In: Proceedings of the SIAM international conference on data mining, pp 288–296

  • Ge R, Ester M, Jin W, Davidson I (2007) Constraint-driven clustering. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 320–329

  • Grira N, Crucianu M, Boujemaa N (2006) Fuzzy clustering with pairwise constraints for knowledge-driven image categorization. IEE Proc Vis Image Signal Process 153(3):299–304

  • Guns T, Dao TBH, Vrain C, Duong KC (2016) Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering. In: Proceedings of the European conference on artificial intelligence, pp 462–470

  • Hadjitodorov ST, Kuncheva LI (2007) Selecting diversifying heuristics for cluster ensembles. In: Proceedings of the international workshop on multiple classifier systems, pp 200–209

  • Handl J, Knowles J (2006) On semi-supervised clustering via multiobjective optimization. In: Proceedings of the annual conference on genetic and evolutionary computation, pp 1465–1472

  • Hansen P, Delattre M (1978) Complete-link cluster analysis by graph coloring. J Am Stat Assoc 73(362):397–403

  • Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79(1–3):191–215

  • Hiep T, Duc N, Trung B (2016) Local search approach for the pairwise constrained clustering problem. In: Proceedings of the symposium on information and communication technology, pp 115–122

  • Hoi S, Jin R, Lyu M (2007) Learning nonparametric kernel matrices from pairwise constraints. In: International conference on machine learning, pp 361–368

  • Hoi S, Liu W, Chang SF (2008) Semi-supervised distance metric learning for collaborative image retrieval. In: Proceedings of the IEEE international conference on computer vision and pattern recognition

  • Hoi S, Liu W, Chang SF (2010) Semi-supervised distance metric learning for collaborative image retrieval and clustering. ACM Trans Multimed Comput Commun Appl 6(3):18

  • Huang H, Cheng Y, Zhao R (2008) A semi-supervised clustering algorithm based on must-link set. In: Proceedings of the international conference on advanced data mining and applications, pp 492–499

  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218

  • Iqbal A, Moh’d A, Zhan Z (2012) Semi-supervised clustering ensemble by voting. In: Proceedings of the international conference on information and communication systems, pp 1–5

  • Kamvar S, Klein D, Manning C (2003) Spectral learning. In: Proceedings of the international joint conference on artificial intelligence, pp 561–566

  • Kavitha V, Punithavalli M (2010) Clustering time series data stream–a literature survey. Int J Comput Sci Inf Secur 8(1):289–294

  • Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4):349–371

  • Keogh E, Lin J (2005) Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl Inf Syst 8(2):154–177

  • Kittler J (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20(3):226–239

  • Klein D, Kamvar S, Manning C (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the international conference on machine learning, pp 307–314

  • Kruskal J (1964) Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1):1–27

  • Kuhn H, Tucker A (1951) Nonlinear programming. In: Proceedings of the Berkeley symposium, pp 481–492

  • Kulis B, Basu S, Dhillon I, Mooney R (2005) Semi-supervised graph clustering: a kernel approach. In: Proceedings of the international conference on machine learning, pp 457–464

  • Kulis B, Basu S, Dhillon I, Mooney R (2009) Semi-supervised graph clustering: a kernel approach. Mach Learn 74(1):1–22

  • Laxman S, Sastry P (2006) A survey of temporal data mining. Sadhana 31(2):173–198

  • Li T, Ding C (2008) Weighted consensus clustering. In: Proceedings of the SIAM international conference on data mining, pp 798–809

  • Li T, Ding C, Jordan M (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In: Proceedings of the IEEE international conference on data mining, pp 577–582

  • Li Z, Liu J (2009) Constrained clustering by spectral kernel learning. In: IEEE international conference on computer vision, pp 421–427

  • Li Z, Liu J, Tang X (2008) Pairwise constraint propagation by semidefinite programming for semi-supervised classification. In: Proceedings of the international conference on machine learning, pp 576–583

  • Li Z, Liu J, Tang X (2009) Constrained clustering via spectral regularization. In: Proceedings of the international conference on computer vision and pattern recognition, pp 421–428

  • Liao TW (2005) Clustering of time series data: a survey. Pattern Recognit 38(11):1857–1874

  • Lines J, Bagnall A (2015) Time series classification with ensembles of elastic distance measures. Data Min Knowl Discov 29(3):565–592

  • Lu Z, Carreira-Perpiñán M (2008) Constrained spectral clustering through affinity propagation. In: IEEE conference on computer vision and pattern recognition, pp 1–8

  • Lu Z, Ip H (2010) Constrained spectral clustering via exhaustive and efficient constraint propagation. In: Proceedings of the European conference on computer vision, pp 1–14

  • Lu Z, Leen T (2005) Semi-supervised learning with penalized probabilistic clustering. In: Proceedings of the advances in neural information processing systems

  • Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416

  • du Merle O, Hansen P, Jaumard B, Mladenović N (1999) An interior point algorithm for minimum sum-of-squares clustering. SIAM J Sci Comput 21(4):1485–1505

  • Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1):91–118

  • Mueller M, Kramer S (2010) Integer linear programming models for constrained clustering. In: Proceedings of the international conference on discovery science, pp 159–173

  • Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Proceedings of the international conference on neural information processing systems, pp 849–856

  • Ng M (2000) A note on constrained k-means algorithms. Pattern Recognit 33(3):515–519

  • Ouali A, Loudni S, Lebbah Y, Boizumault P, Zimmermann A, Loukil L (2016) Efficiently finding conceptual clustering models with integer linear programming. In: Proceedings of the international joint conference on artificial intelligence, pp 647–654

  • Pedrycz W (2002) Collaborative fuzzy clustering. Pattern Recognit Lett 23(14):1675–1686

  • Pelleg D, Baras D (2007) K-means with large and noisy constraint sets. In: Proceedings of the European conference on machine learning, pp 674–682

  • Petitjean F, Ketterlin A, Gançarski P (2011) A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognit 44(3):678–693

  • Rangapuram S, Hein M (2012) Constrained 1-spectral clustering. In: Proceedings of the international conference on artificial intelligence and statistics, pp 1143–1151

  • Rani S, Sikka G (2012) Recent techniques of clustering of time series data: a survey. Int J Comput Appl 52(15):1–9

  • Rossi F, van Beek P, Walsh T (eds) (2006) Handbook of constraint programming. Foundations of artificial intelligence. Elsevier, Amsterdam

  • Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65

  • Rutayisire T, Yang Y, Lin C, Zhang J (2011) A modified COP-KMeans algorithm based on sequenced cannot-link set. In: Proceedings of the international conference on rough sets and knowledge technology, pp 217–225

  • Sakoe H, Chiba S (1971) A dynamic programming approach to continuous speech recognition. In: Proceedings of the international congress on acoustics, vol 3, pp 65–69

  • Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49

  • Shental N, Bar-Hillel A, Hertz T, Weinshall D (2013) Computing Gaussian mixture models with EM using equivalence constraints. In: International conference on neural information processing systems, pp 465–472

  • Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905

  • Strehl A, Ghosh J (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  • Tan W, Yang Y, Li T (2010) An improved COP-KMeans algorithm for solving constraint violation. In: Proceedings of the international FLINS conference on foundations and applications of computational intelligence, pp 690–696

  • Tang W, Xiong H, Zhong S, Wu J (2007) Enhancing semi-supervised clustering: a feature projection perspective. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 707–716

  • Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the international conference on machine learning, pp 1103–1110

  • Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the international conference on machine learning, pp 577–584

  • Wagstaff K, Basu S, Davidson I (2006) When is constrained clustering beneficial, and why? In: Proceedings of the national conference on artificial intelligence and the eighteenth innovative applications of artificial intelligence conference

  • Wang J, Wu S, Vu H, Li G (2010) Text document clustering with metric learning. In: International ACM SIGIR conference on research and development in information retrieval, pp 783–784

  • Wang X, Davidson I (2010) Active spectral clustering. In: Proceedings of the IEEE international conference on data mining, pp 561–568

  • Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 563–572

  • Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P, Keogh E (2013) Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309

  • Wang X, Qian B, Davidson I (2014) On constrained spectral clustering and its applications. Data Min Knowl Discov 28(1):1–30

  • Wemmert C, Gançarski P, Korczak J (2000) A collaborative approach to combine multiple learning methods. Int J Artif Intell Tools 9(1):59–78

  • Xiao W, Yang Y, Wang H, Li T, Xing H (2016) Semi-supervised hierarchical clustering ensemble and its application. Neurocomputing 173(3):1362–1376

  • Xing E, Ng A, Jordan M, Russell S (2002) Distance metric learning, with application to clustering with side-information. In: Proceedings of the advances in neural information processing systems, pp 521–528

  • Yang F, Li T, Zhou Q, Xiao H (2017) Cluster ensemble selection with constraints. Neurocomputing 235:59–70

  • Yang Y, Tan W, Li T, Ruan D (2012) Consensus clustering based on constrained self-organizing map and improved Cop-Kmeans ensemble in intelligent decision support systems. Knowl Based Syst 32:101–115

  • Yi J, Jin R, Jain A, Yang T, Jain S (2012) Semi-crowdsourced clustering: generalizing crowd labeling by robust distance metric learning. In: Proceedings of the advances in neural information processing systems, pp 1772–1780

  • Yu Z, Wong HS, You J, Yang Q, Liao H (2011) Knowledge based cluster ensemble for cancer discovery from biomolecular data. IEEE Trans NanoBioscience 10(2):76–85

  • Zha H, He X, Ding CHQ, Gu M, Simon HD (2001) Spectral relaxation for k-means clustering. In: Proceedings of the international conference on neural information processing systems, pp 1057–1064

  • Zhang T, Ando R (2006) Analysis of spectral kernel design based semi-supervised learning. In: Proceedings of the international conference on neural information processing systems, pp 1601–1608

  • Zhi W, Wang X, Qian B, Butler P, Ramakrishnan N, Davidson I (2013) Clustering with complex constraints-algorithms and applications. In: Proceedings of the conference on artificial intelligence, pp 1056–1062

  • Zhu X, Loy C, Gong S (2016) Constrained clustering with imperfect oracles. IEEE Trans Neural Netw Learn Syst 27(6):1345–1357


Author information

Corresponding author

Correspondence to Thomas Lampert.

Additional information

Communicated by Jian Pei.

Funding: CNES/Unistra R&T research Grant Number 2016-033.

Appendices

Appendix A: Full metric scores

See Tables 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 and 18.

Table 8 Unconstrained mean ARI
Table 9 Mean consistency (measured using 50% constraint sets)
Table 10 Performance on ECG5000
Table 11 Performance on ElectricDevices
Table 12 Performance on FacesUCR
Table 13 Performance on InsectWingbeatSound
Table 14 Performance on MALLAT
Table 15 Performance on StarLightCurves
Table 16 Performance on TwoPatterns
Table 17 Performance on uWaveGestureLibraryX
Table 18 Performance on UWaveGestureLibraryAll

Appendix B: Constraint coherence

As described in Davidson et al. (2006): “We consider all constraint pairs composed of an ML and a CL constraint (pairs composed of the same constraint type cannot be contradictory). To determine the coherence of two constraints, a and b, we compute the projected overlap of each constraint on the other”.

Let \(\mathbf {a}\) and \(\mathbf {b}\) be vectors connecting the points constrained by a, i.e. \((a_1,a_2)\), and b, i.e. \((b_1,b_2)\), respectively. We first project the points bound by constraint a onto the line that is defined by the points bound by constraint b, such that

$$\begin{aligned} a'_1 &= ((a_1 - b_1) \cdot \mathbf {e})\, \mathbf {e} + b_1,\\ a'_2 &= ((a_2 - b_1) \cdot \mathbf {e})\, \mathbf {e} + b_1, \end{aligned}$$

where

$$\begin{aligned} \mathbf {e} = \frac{\mathbf {b}}{|\mathbf {b}|}. \end{aligned}$$

The points \(a'_1\), \(a'_2\), \(b_1\), and \(b_2\) now all lie on the line spanned by the unit vector \(\mathbf {e}\), so each can be reduced to a scalar coordinate along \(\mathbf {e}\):

$$\begin{aligned} a''_i = a'_i \cdot \mathbf {e}, \quad b''_i = b_i \cdot \mathbf {e}, \quad \text {where }\ i \in \{1,2\} . \end{aligned}$$

The 1D points of each constraint are then sorted such that \(a''_1 \le a''_2\) and \(b''_1 \le b''_2\). With this ordering, the overlap of constraint a on constraint b becomes

$$\begin{aligned} o_a^b = \max \left\{ 0, \min \{a''_2,b''_2\} - \max \{a''_1, b''_1\}\right\} . \end{aligned}$$

Two constraints are coherent if there is no overlap between them, such that

$$\begin{aligned} \text {coh}_{cm} = \begin{cases} 1, & \text {if }\ o_c^m = 0\ \text { and }\ o_m^c = 0,\\ 0, & \text {otherwise,} \end{cases} \end{aligned}$$

and the coherence of a set of constraints is defined to be the fraction of coherent ML/CL constraint pairs within the set, such that

$$\begin{aligned} \text {COH}(C) = \frac{\sum _{c \in C_{\text {CL}}, m \in C_{\text {ML}}}\text {coh}_{cm}}{|C_{\text {CL}}||C_{\text {ML}}|}. \end{aligned}$$
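The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names `overlap` and `coherence`, the representation of a constraint as a pair of coordinate vectors, the choice \(\mathbf {b} = b_2 - b_1\), and the tolerance `tol` (used in place of an exact comparison with zero to absorb floating-point noise) are all introduced here for the example.

```python
import numpy as np


def overlap(a, b):
    """Projected overlap o_a^b of constraint a on constraint b.

    Each constraint is a pair of points (coordinate vectors).
    Assumes the two points of b are distinct, so that e is defined.
    """
    a1, a2 = (np.asarray(p, dtype=float) for p in a)
    b1, b2 = (np.asarray(p, dtype=float) for p in b)

    # Unit vector along constraint b (assumed here to be b2 - b1).
    vec_b = b2 - b1
    e = vec_b / np.linalg.norm(vec_b)

    # Project the points of a onto the line through b1 and b2.
    a1_proj = np.dot(a1 - b1, e) * e + b1
    a2_proj = np.dot(a2 - b1, e) * e + b1

    # Reduce all four points to 1D coordinates along e and sort each pair.
    a_lo, a_hi = sorted((np.dot(a1_proj, e), np.dot(a2_proj, e)))
    b_lo, b_hi = sorted((np.dot(b1, e), np.dot(b2, e)))

    # Length of the intersection of the two 1D intervals (0 if disjoint).
    return float(max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo)))


def coherence(ml, cl, tol=1e-12):
    """Fraction of (ML, CL) pairs with no overlap in either direction.

    tol is an assumption of this sketch, not part of the definition.
    """
    coherent = sum(
        1
        for m in ml
        for c in cl
        if overlap(c, m) <= tol and overlap(m, c) <= tol
    )
    return coherent / (len(ml) * len(cl))


# Toy 2D example: one ML constraint and two CL constraints.
m = ((0.0, 0.0), (1.0, 0.0))
c1 = ((0.5, 1.0), (0.5, 2.0))    # perpendicular to m: no overlap
c2 = ((0.25, 0.5), (0.75, 0.5))  # parallel to m: overlap of 0.5
print(coherence([m], [c1, c2]))  # 0.5
```

In the toy example, `c1` is perpendicular to `m`, so each projects to a single point outside the interior of the other and the pair is coherent, whereas `c2` runs parallel to `m` and overlaps it, so only one of the two ML/CL pairs counts.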

About this article

Cite this article

Lampert, T., Dao, TBH., Lafabregue, B. et al. Constrained distance based clustering for time-series: a comparative and experimental study. Data Min Knowl Disc 32, 1663–1707 (2018). https://doi.org/10.1007/s10618-018-0573-y

  • DOI: https://doi.org/10.1007/s10618-018-0573-y
