1 Introduction and Background

A recurrence plot (RP) is an advanced technique of nonlinear data analysis [3]. Technically speaking, a recurrence plot R visualizes those times at which the trajectory x of a dynamical system visits roughly the same area in phase space [3]: \(R_{i,j}=\varTheta ( \epsilon - \Vert x_i - x_j \Vert )\), where \(\epsilon \) is the similarity threshold, \(\Vert \cdot \Vert \) a norm, \(\varTheta (\cdot )\) the unit step function, and \(i,j=1 \ldots N\), with N the number of states. In addition, a cross recurrence plot (CRP) shows all those times at which a state \(x_i \in \mathbb {R}^m\) in one dynamical system co-occurs with a state \(y_j \in \mathbb {R}^m\) in a second dynamical system [3]: \(R_{i,j}=\varTheta ( \epsilon - \Vert x_i - y_j \Vert )\), where the dimension m of both systems must be the same, but the number of states can be different.
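The two definitions above can be illustrated with a minimal NumPy sketch. The function names `cross_recurrence_plot` and `recurrence_plot` are hypothetical, the Euclidean norm is an assumed choice for \(\Vert \cdot \Vert \), and \(\varTheta (0)\) is assumed to be 1, so that a distance exactly equal to \(\epsilon \) counts as a recurrence.

```python
import numpy as np


def cross_recurrence_plot(x, y, eps):
    """Binary matrix R[i, j] = Theta(eps - ||x_i - y_j||) for two trajectories."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D input as a sequence of scalar states (m = 1).
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    if x.shape[1] != y.shape[1]:
        raise ValueError("both systems must have the same dimension m")
    # Pairwise Euclidean distances via broadcasting, shape (N_x, N_y).
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    # Assumption: Theta(0) = 1, so dist == eps is a recurrence.
    return (dist <= eps).astype(np.uint8)


def recurrence_plot(x, eps):
    """A recurrence plot is the cross recurrence plot of x with itself."""
    return cross_recurrence_plot(x, x, eps)


# Usage: four scalar states, two of which lie within eps of each other.
R = recurrence_plot([0.0, 1.0, 0.05, 2.0], eps=0.1)
```

The sketch materializes the full distance matrix, so it needs memory proportional to the product of the two trajectory lengths and is meant for short series only.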

Recurrence quantification analysis (RQA) is a method of nonlinear data analysis that quantifies the number and duration of recurrences of a dynamical system represented by its state space trajectory [3]. RQA measures are derived from RP structures and can be employed to study the dynamics, transitions, or synchronization of complex systems [3, 4]. The determinism measure (\(DET^{\mu }\)), which is the fraction of recurrence points that form diagonal lines of minimum length \(\mu \), has, for example, been successfully applied to detect dynamical transitions [4].
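Writing \(P(l)\) for the number of diagonal lines of exactly length l, the determinism described above can be expressed as \(DET^{\mu } = \sum _{l \ge \mu } l\,P(l) / \sum _{l \ge 1} l\,P(l)\). The following sketch computes it from a binary (cross) recurrence matrix. The helper names are hypothetical, and the sketch assumes that every diagonal is counted, including the main diagonal of an RP, which some implementations exclude.

```python
import numpy as np


def diagonal_line_lengths(R):
    """Lengths of all maximal diagonal runs of ones in a binary matrix R."""
    R = np.asarray(R)
    n_rows, n_cols = R.shape
    lengths = []
    # Offsets cover every diagonal, from the bottom-left to the top-right corner.
    for k in range(-(n_rows - 1), n_cols):
        run = 0
        for value in np.diagonal(R, offset=k):
            if value:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:
            lengths.append(run)
    return lengths


def determinism(R, mu=2):
    """Fraction of recurrence points on diagonal lines of length >= mu."""
    lengths = diagonal_line_lengths(R)
    total = sum(lengths)  # equals the number of recurrence points in R
    if total == 0:
        return 0.0
    return sum(l for l in lengths if l >= mu) / total


# Usage: one diagonal line of length 3 plus one isolated recurrence point.
det = determinism([[1, 0, 0], [0, 1, 0], [1, 0, 1]], mu=2)  # 3 of 4 points
```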

2 Recent Trends and Advances

In time series mining, many algorithms are based on analogical reasoning or pairwise dissimilarity comparisons of (sub)sequences [13]. In general, the distance between time series needs to be carefully defined in order to reflect the underlying dissimilarity of the data, where the choice of distance measure usually depends on the invariance required by the domain [1].

Recent work [9–12] has introduced novel time series distance measures that use recurrence quantification analysis (RQA) techniques. The main idea [9] is to compare time series pairwise by (i) computing a cross recurrence plot (CRP) that reveals all times at which roughly the same states co-occur and, subsequently, (ii) quantifying the number and length of all diagonal line structures that indicate similar subsequences. Figure 1(a–b) shows a toy example, where a labeled time series is compared to two unlabeled data stream segments using CRPs as well as corresponding RQA measures.

It has been shown [9, 11] that traditional RQA measures, such as the average diagonal line length and the determinism, can be used to compare time series that exhibit similar segments or subsequences at arbitrary positions. Time series with such an order invariance [9] can, for instance, be found in automotive engineering [11], where vehicular sensors observe driving behavior patterns in their naturally occurring order and the recorded car drives are compared according to the co-occurrence of these patterns. Although the recurrence plot-based distance [11] was originally developed to determine characteristic driving profiles [12], this approach can be used to find representatives in arbitrary sets of single- or multi-dimensional time series of variable length [10].

In addition, it has been proposed to employ video compression algorithms for measuring the dissimilarity between un-thresholded recurrence plots and, accordingly, between the time series that generated them [8]. This approach relies on the underlying assumption that video compression algorithms are able to detect similar structures in images or recurrence plots, which correspond to time series patterns. The results [8] show that the compression distance of recurrence plots works especially well for time series that represent shapes. A follow-up study [5] compared the performance of various MPEG video compression algorithms and furthermore introduced a compression distance for cross recurrence plots. Figure 1(c) contrasts two un-thresholded recurrence plots, which reveal structural dissimilarities between the examined time series.
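The compression idea can be illustrated with a simplified stand-in. The sketch below does not reproduce the measure of [5, 8]: it swaps the MPEG video codec for the general-purpose `zlib` compressor and uses the generic normalized compression distance, \((C(ab) - \min (C(a), C(b))) / \max (C(a), C(b))\), where \(C(\cdot )\) is the compressed size in bytes. The function names, the 8-bit quantization, and the restriction to one-dimensional series are assumptions made for illustration.

```python
import zlib

import numpy as np


def unthresholded_rp(x):
    """Pairwise distance matrix of a 1-D series, scaled to 8-bit grayscale."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    peak = dist.max()
    if peak > 0:
        dist = dist / peak
    return np.round(255 * dist).astype(np.uint8)


def compressed_size(data):
    """Size in bytes after zlib compression (stand-in for a video codec)."""
    return len(zlib.compress(data, 9))


def rp_compression_distance(x, y):
    """Normalized compression distance between two un-thresholded RPs."""
    a = unthresholded_rp(x).tobytes()
    b = unthresholded_rp(y).tobytes()
    ca, cb, cab = compressed_size(a), compressed_size(b), compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


# Usage: a sine wave compared with itself and with uniform noise.
t = np.linspace(0.0, 4.0 * np.pi, 64)
noise = np.random.default_rng(0).uniform(-1.0, 1.0, 64)
d_same = rp_compression_distance(np.sin(t), np.sin(t))
d_diff = rp_compression_distance(np.sin(t), noise)
```

A byte-stream compressor only sees the row-wise serialization of each plot, so it captures two-dimensional structure less directly than the video codecs studied in [5, 8].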

Although recurrence plots have been adopted by the data mining community [2, 5, 8–12], their computation and quantification generally involve operations with quadratic time and space complexity. Hence, recent work [7, 14] has introduced approximate RQA measures, which exhibit significantly lower complexity while maintaining high accuracy. Most importantly, these novel approximations [7, 14] enable us to efficiently use recurrence quantification analysis for relatively long time series and fast time series streams. Figure 1(d) illustrates the fast computation of the approximate determinism (aDET) [7], which allows us, for example, to filter or identify time series segments with a certain behavior in an online fashion. The approximation of various RQA measures, such as laminarity and determinism, is explained at full length in a recent publication [14].
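One way to see why the recurrence plot need not be created is the following sketch, which is written in the spirit of [7, 14] but may differ from their exact formulation. It assumes a one-dimensional series discretized as \(\zeta = \lfloor z / (2\epsilon ) \rfloor \) and treats two states as recurrent when their discretized values are equal. Let \(S^{\mu }\) denote the number of ordered pairs of identical length-\(\mu \) windows in \(\zeta \). A diagonal line of length l contains \(l - \mu + 1\) such pairs, so \(\mu S^{\mu } - (\mu - 1) S^{\mu + 1}\) equals the number of recurrence points on diagonal lines of length at least \(\mu \), and dividing by \(S^{1}\) yields the determinism of the discretized plot. The function names are hypothetical.

```python
from collections import Counter

import numpy as np


def similar_pairs(zeta, length):
    """Ordered pairs (i, j) whose discretized windows of this length are identical."""
    n_windows = len(zeta) - length + 1
    if n_windows <= 0:
        return 0
    # Hash every window once; identical windows fall into the same bucket.
    counts = Counter(tuple(zeta[i:i + length]) for i in range(n_windows))
    return sum(c * c for c in counts.values())


def approximate_determinism(z, eps, mu=2):
    """Determinism of the discretized series, computed without building the RP."""
    # Discretization assumed from the text: zeta = floor(z / (2 * eps)).
    zeta = np.floor(np.asarray(z, dtype=float) / (2.0 * eps)).astype(int).tolist()
    if not zeta:
        return 0.0
    s_1 = similar_pairs(zeta, 1)  # all recurrence points
    s_mu = similar_pairs(zeta, mu)
    s_next = similar_pairs(zeta, mu + 1)
    return (mu * s_mu - (mu - 1) * s_next) / s_1


# Usage: the discretized series is [0, 1, 0, 2].
adet = approximate_determinism([0.2, 1.3, 0.4, 2.5], eps=0.5, mu=2)
```

Counting windows with a hash table takes time roughly linear in the series length for fixed \(\mu \), in contrast to the quadratic cost of building and scanning the plot; the price is the rounding error introduced by the discretization.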

Fig. 1.

Recurrence plot-based distances: (a) illustrates a time series mining scenario that assumes a labeled sequence x and a data stream with unlabeled segments y and z. In case (b) we compare time series x with segments y and z to assign labels. (b) shows two cross recurrence plots that indicate similar states (\(\epsilon = 0.1\)) for time series pairs (xy) and (xz), where recurrence points are represented by ‘1’ entries and diagonal line structures are highlighted in bold font. According to the determinism, \(DET^2_{x,y} = 4/9 > 4/12 = DET^2_{x,z}\), the pair (xy) is more similar than (xz) [11], meaning x and y might be from the same class. (c) shows another way to determine the pairwise dissimilarity of time series. In this case (c) we create un-thresholded recurrence plots (\(\epsilon = 0\)), which facilitate pairwise comparisons by means of image processing and video compression algorithms [5, 8]. The images in (c) resemble each other in structure since time series x and y have a similar shape. In case (d) we compute the approximate determinism to assess the ‘complexity’ of our sample data stream at time interval z and to filter/identify ‘ir-/relevant’ segments with a certain (nonlinear) behavior. (d) illustrates the recurrence plot of segment z and its discretized version \(\zeta = \lfloor \frac{z}{2\epsilon } \rfloor \). In our example (d) we achieve a fairly reasonable approximation of the determinism, \(DET^2_{z,z} = 14/20 \approx 10/18 = aDET^2_{\zeta ,\zeta }\). Although the discretization step introduces some rounding errors, it allows us to approximate all traditional RQA measures in an efficient way without even creating and quantifying the RP [7, 14].

3 Conclusion and Open Problems

Recurrence quantification analysis (RQA) is a method of nonlinear data analysis for the investigation of dynamical systems, which has its origin in theoretical physics [3, 4]. Recently, RQA was adopted by the data mining community in order to: (i) define novel time series distance measures [5, 8, 11] and (ii) process massive data streams by means of approximate measures [7, 14].

Although RQA has been successfully applied to data mining problems from engineering [12] and climatology [6, 14], there exist open problems which prevent its widespread acceptance by the time series fraternity. The main problem with traditional RQA is that it excludes curved structures, which prevents us from comparing time series with local scaling or warping invariance [1]. This issue might be addressed by feeding un-thresholded RPs [5, 8] into convolutional neural networks. In the case of the recently introduced approximate RQA [7, 14], it is necessary to investigate time series representations and discretization techniques that enable us to bound the approximation error.