Leveraging large-deviation statistics to decipher the stochastic properties of measured trajectories

Samudrajit Thapa; Agnieszka Wyłomańska; Grzegorz Sikora; Caroline E Wagner; Diego Krapf; Holger Kantz; Aleksei V Chechkin; Ralf Metzler

doi:10.1088/1367-2630/abd50e

1. Introduction

Brownian motion (BM) is characterised by the linear scaling with time of the mean squared displacement (MSD), $\left\langle {\mathbf{r}}^{2}\left(t\right)\right\rangle =2dDt$ in d dimensions, where D is the diffusion coefficient and angular brackets denote the ensemble average over a large number of particles. In many biological and soft-matter systems, this linear scaling has been reported to be violated [1–4]. Instead, anomalous diffusion with the power-law scaling ⟨r²(t)⟩≃ t^α of the MSD is observed. The anomalous diffusion exponent α characterises subdiffusion when 0 < α < 1 and superdiffusion when α > 1 [5–8].

Passive and actively-driven diffusive motion are key to the spreading of viruses, vesicles, or proteins in living biological cells [9–11]. Pinpointing the precise details of their dynamics will ultimately pave the way for improved strategies in drug delivery, or lead to better understanding of molecular signalling used in gene silencing techniques. Similarly, improved analyses of the stochastic dynamics of financial or climate time series will deepen our understanding of economic markets or climate impact.

The most-used observable in the analysis of time-series r(t) garnered for the position of viruses or vesicles by modern single-particle tracking setups in biological cells or for the key quantities in financial or climate dynamics, such as price or temperature, is the time-averaged MSD (TAMSD) [5, 7]

$\begin{equation}\bar{{\delta }^{2}\left({\Delta}\right)}=\frac{1}{T-{\Delta}}{\int }_{0}^{T-{\Delta}}{\left[\mathbf{r}\left(t+{\Delta}\right)-\mathbf{r}\left(t\right)\right]}^{2}\enspace \mathrm{d}t,\end{equation} \tag{ 1 }$

expressed as function of the lag time Δ for a given measurement time T.⁹ For BM at sufficiently long T, the TAMSD (1) converges to the MSD, formally ${\mathrm{lim}}_{T\to \infty }\enspace \bar{{\delta }^{2}\left({\Delta}\right)}=\langle {\mathbf{r}}^{2}\left({\Delta}\right)\rangle =2dD{\Delta}$ , reflecting the ergodicity of this process in the Boltzmann–Khinchin sense [12]. Anomalous diffusion processes may be MSD-ergodic, with a TAMSD of the form $\bar{{\delta }^{2}\left({\Delta}\right)}\simeq {{\Delta}}^{\beta }$ with β = α, e.g. fractional Brownian motion (FBM), or they may be 'weakly non-ergodic', e.g. β = 1 for continuous time random walks (CTRWs) with scale-free waiting times [5, 7, 12].

Due to the random nature of the process, the TAMSD is inherently irreproducible from one trajectory to another, even for BM. The emerging amplitude spread is quantified in terms of the dimensionless variable

$\begin{equation}\xi =\frac{\bar{{\delta }^{2}\left({\Delta}\right)}}{\left\langle \bar{{\delta }^{2}\left({\Delta}\right)}\right\rangle },\end{equation} \tag{ 2 }$

where $\left\langle \bar{{\delta }^{2}\left({\Delta}\right)}\right\rangle$ is the average of the TAMSD over many trajectories [7, 12]. The variance of ξ is the ergodicity breaking (EB) parameter $\mathrm{E}\mathrm{B}\left({\Delta}\right)=\left\langle {\xi }^{2}\right\rangle -1$ . Together with the full distribution ϕ(ξ), EB provides valuable information on the underlying stochastic process [7]. For BM, in the limit of large T, each realisation leads to the same result, ϕ(ξ) = δ(ξ − 1) and EB = 0. For scale-free CTRWs, even in the limit T → ∞ EB retains a finite value and the TAMSD remains a random variable, albeit with a known distribution ϕ(ξ) [5, 7, 12].

The MSD and TAMSD or, alternatively, the power spectrum and its single trajectory analogue [13, 14], are insufficient to fully characterise a measured stochastic process. A TAMSD of the form $\bar{{\delta }^{2}\left({\Delta}\right)}\simeq {\Delta}$ , for example, may represent BM or weakly non-ergodic anomalous diffusion. Similarly, the linearity of the MSD, ⟨r²(t)⟩ ≃ t, is the same for BM and for random-diffusivity models with non-Gaussian distribution (see below). For the identification of a random process from data, additional observables need to be considered which may then be used to build a decision tree [15]. Recent work targeted at objective ranking of the most likely process behind the data is based on Bayesian-maximum likelihood approaches or on machine learning applications [16–19]. While it has been shown that they successfully achieve physical model distinction given measured data, a disadvantage of these methods is that they are often technically involved and thus require particular skills, plus they are computationally expensive. Here we provide a comparatively easy-to-implement and reliable method based on large-deviation properties encoded in the TAMSD. As we will see, this method is highly sensitive and able to identify important properties of the physical process underlying the measured data. Moreover, it detects correlations in the data and has significantly sharper bounds than the well known Chebyshev inequality [20–22] widely used in different applications [23–25]. The large-deviation method is thus an important tool complementing analysis by TAMSD- or single-trajectory power spectra-amplitude fluctuations or displacement correlations. In the following we report analytical results for the large-deviation statistic of the TAMSD and demonstrate the efficacy of this approach for various data sets ranging from microscopic tracer motion to climate statistics.

The paper continues with a discussion of large deviations of the TAMSD in section 2. Section 3 then defines the data sets used, section 4 outlines the method, and results are presented in section 5. We draw our conclusions in section 6. Details of the calculations, additional information, and additional figures are presented in the appendices.

2. Large deviations of the TAMSD

Large-deviation theory is concerned with the asymptotic behaviour of large fluctuations of random variables [26–29]. It finds applications in a wide range of fields such as information theory [30], risk management [31], or the development of sampling algorithms for rare events [32]. In thermodynamics and statistical mechanics, large-deviation theory finds prominent applications as described in [33]. More recently, large deviations of a variety of random variables have been analysed for different stochastic processes [34–42]. In fact, large-deviation theory is closely related to extreme value statistics [43–45] (see also appendix E), which has also received renewed interest recently [46, 47].

An intuitive definition of the large-deviation principle can be given as follows. Let A_N be a random variable indexed by the integer N, and let P(A_N ∈ B) be the probability that A_N takes a value from the set B. We say that A_N satisfies a large-deviation principle with rate function I_B if $P\left({A}_{N}\in B\right)\approx {\mathrm{e}}^{-N{I}_{B}}$ , at large N [33]. The exact definition operates with supremum and infimum of the above probability and the rate function [29]. However, sometimes it is difficult or even impossible to find explicit formulas for the rate function or the large-deviation principle. Still, in such cases one may be able to find an upper bound for the probability P(A_N ∈ B), i.e. $P\left({A}_{N}\in B\right){\leqslant}{\mathrm{e}}^{-N{I}_{B}}$ . This is exactly the case we consider here.

When the TAMSD is a random variable corresponding to a Gaussian process, we arrive at a strict upper bound on the probability P((ξ − 1) > ɛ) that a given realisation of the TAMSD deviates from the expected mean by a preset amount ɛ [48]: $P\left(\left(\xi -1\right){ >}\varepsilon \right){\leqslant}{\mathrm{e}}^{-F\left(\varepsilon ,{\Delta},N\right)}$ . Here, F is a function of the deviation ɛ, the lag time Δ, and the number N of points in the trajectory. We will see in what follows that for the only relevant physical case N ≫ Δ the function F can be written as F(ɛ, Δ, N) ≈ NI_ɛ(Δ). This form highlights the large deviation principle for the TAMSD of BM.

2.1. Theoretical bounds on the deviations of TAMSD

BM is characterised by the overdamped Langevin equation $\mathrm{d}X\left(t\right)/\mathrm{d}t=\sqrt{2D}\eta \left(t\right)$ , driven by white Gaussian noise η(t) with zero mean and autocorrelation function ⟨η(t₁)η(t₂)⟩ = δ(t₁ − t₂). In the following we consider discretised trajectories of BM, $\mathbb{X}=\left(X\left(1\right),X\left(2\right),\dots ,X\left(N\right)\right)$ . For BM the following statements can be shown to hold.

2.1.1. Chebyshev's inequality

Before we come to large-deviations, we recall the (one-sided) Chebyshev inequality for any random variable X with mean μ and finite variance. For BM, Chebyshev's inequality for the TAMSD reads (see appendix A for details)

$\begin{equation}P\left(\left(\xi -1\right){\geqslant}\varepsilon \right){\leqslant}4{\Delta}/\left(4{\Delta}+3N{\varepsilon }^{2}\right).\end{equation} \tag{ 3 }$

While this inequality is useful for a first analysis and will serve as a reference below, we will show that the large-deviation result presented here has significantly sharper bounds.
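As a quick numerical reference, the right-hand side of equation (3) can be tabulated directly. The following is a minimal sketch; the function name `chebyshev_bound` and the chosen grid of ɛ values are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def chebyshev_bound(eps, lag, n_points):
    """Right-hand side of equation (3): 4*lag / (4*lag + 3*N*eps**2).

    eps      : deviation(s) epsilon (scalar or array)
    lag      : lag time Delta in units of the sampling step
    n_points : number N of points in the trajectory
    """
    eps = np.asarray(eps, dtype=float)
    return 4.0 * lag / (4.0 * lag + 3.0 * n_points * eps**2)

# Illustrative grid (assumed): the lag times used in the figures, N = 300.
eps_values = np.array([0.1, 0.3, 0.5, 0.8, 1.0])
for lag in (1, 2, 10, 20):
    print(lag, np.round(chebyshev_bound(eps_values, lag, 300), 4))
```

The bound decays only algebraically, as 1/(Nɛ²), which is the reason it becomes loose at longer lag times.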

2.1.2. Large deviations of the TAMSD for BM

By employing the properties of Gaussian quadratic forms and Chernoff's inequality for subgamma random variables the following exact formula was obtained [48],

$\begin{equation}P\left(\left(\xi -1\right){ >}\varepsilon \right){\leqslant}\mathrm{exp}\left(-\left(N-{\Delta}\right)\tilde {I}\right),\end{equation} \tag{ 4 }$

where the function $\tilde {I}=a\mathcal{H}\left(b\right)$ with $a=\left[4{D}^{2}{\Delta}\left({\Delta}+1\right)\left(2{\Delta}+1\right)\right]/\left[3\bar{\lambda }{\left({\Delta}\right)}^{2}\right]$ and $b=\left[3\bar{\lambda }\left({\Delta}\right)\varepsilon \right]/\left[2D\left({\Delta}+1\right)\left(2{\Delta}+1\right)\right]$ . Moreover, $\mathcal{H}\left(u\right)=1+u-\sqrt{1+2u}$ and $\bar{\lambda }\left({\Delta}\right)=2\enspace \mathrm{max}\left\{{\lambda }_{j}\left({\Delta}\right)\right\}$ , where λ_j(Δ) (j = 1, 2, ..., N −Δ) are the eigenvalues of the (N − Δ) × (N − Δ) positive-definite covariance matrix Σ(Δ) for the increment vector $\mathbb{Y}=\left(X\left(1+{\Delta}\right)-X\left(1\right),X\left(2+{\Delta}\right)-X\left(2\right),\dots ,X\left(N\right)-X\left(N-{\Delta}\right)\right)$ . Note the exponential decay of the upper bound with (N − Δ) as long as the function $\tilde {I}$ is independent of (N − Δ). Moreover although the diffusion coefficient D explicitly appears in (4) it cancels out both in the function $\mathcal{H}\left(\cdot \right)$ and its prefactor, as $\bar{\lambda }$ contributes the factor D. It is noteworthy that $\tilde {I}$ is independent of the diffusion coefficient D. This can be understood intuitively, as different values of D in the log–log plot of the TAMSD merely shift the amplitude but leave the amplitude spread unchanged [7, 12].

For the special choices Δ = 1 and Δ = 2 the eigenvalues of Σ(Δ) can be calculated explicitly. This is relevant because for such low values of Δ, the conclusions drawn from the TAMSD analysis of sufficiently long T (large N) are statistically significant. The eigenvalues are obtained numerically for other values of Δ [49]. For Δ = 1, the eigenvalues λ_j(Δ = 1) = 2D and therefore $\bar{\lambda }=4D$ . Using this in (4) we get

$\begin{equation}P\left(\left(\xi -1\right){ >}\varepsilon \right){\leqslant}\mathrm{exp}\left\{-\left(N-1\right)\mathcal{H}\left(\varepsilon \right)/2\right\}.\end{equation} \tag{ 5 }$

Thus for Δ = 1 the function $\tilde {I}=\mathcal{H}\left(\varepsilon \right)/2$ is independent of (N − Δ) and consequently the large deviation bound clearly decays exponentially with (N − Δ) ≈ N.
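The closed form (5) is simple enough to evaluate directly. The sketch below assumes nothing beyond equation (5) and the definition of $\mathcal{H}$; the function names are illustrative.

```python
import numpy as np

def H(u):
    """H(u) = 1 + u - sqrt(1 + 2u), as defined below equation (4)."""
    u = np.asarray(u, dtype=float)
    return 1.0 + u - np.sqrt(1.0 + 2.0 * u)

def largedev_bound_lag1(eps, n_points):
    """Right-hand side of equation (5), valid for lag time Delta = 1."""
    return np.exp(-(n_points - 1) * H(eps) / 2.0)

# Evaluate the bound on an illustrative grid of deviations for N = 300.
eps_values = np.linspace(0.1, 1.0, 10)
for eps, bound in zip(eps_values, largedev_bound_lag1(eps_values, 300)):
    print(f"eps = {eps:.1f}  bound = {bound:.3e}")
```

Since $\mathcal{H}\left(u\right)\approx {u}^{2}/2$ for small u, the exponent grows roughly as (N − 1)ɛ²/4 at small deviations.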

For Δ = 2 the eigenvalues are given as (see appendix B) λ_j(Δ = 2) = 4D[1 + cos(jπ/[N − 1])]. We then have $\bar{\lambda }\left({\Delta}\right)=2\enspace \mathrm{max}\left\{{\lambda }_{j}\left({\Delta}\right)\right\}=8D\left[1+\mathrm{cos}\left(\pi /\left[N-1\right]\right)\right]\approx 16D$ . Note that in this case although $\tilde {I}$ depends on N through $\bar{\lambda }$ , the upper bound still decays exponentially with N at large N.

For larger Δ the eigenvalues and $\bar{\lambda }\left({\Delta}\right)$ can be calculated numerically. Figure C1 in appendix C shows that with increasing Δ, $\bar{\lambda }\left({\Delta}\right)$ becomes N-dependent. However, as demonstrated in figures C2 and C3, the large deviation bound (the right-hand side of equation (4)) still decays exponentially with N, with the same behaviour for other ɛ values. Thus, by using the results of analytical calculations for small Δ (Δ = 1 and 2) along with a numerical analysis for larger Δ we conclude that the large deviation form exp(−NI_ɛ(Δ)) of the upper bound for the probability P((ξ − 1) > ɛ) is valid asymptotically at large N provided that Δ ≪ N.
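The numerical evaluation of the bound (4) for arbitrary Δ can be sketched as follows. The covariance matrix used here is an assumption: for BM sampled at unit time steps, two increments of span Δ starting at positions i and j overlap over max(0, Δ − |i − j|) steps, so that Σ_ij = 2D max(0, Δ − |i − j|). This choice reproduces the eigenvalues quoted above for Δ = 1 and Δ = 2, but it is not taken from the appendices, and all function names are illustrative.

```python
import numpy as np

def increment_covariance(lag, n_points, D=0.5):
    """Assumed covariance matrix Sigma(Delta) of the BM increment vector Y.

    Entries are 2*D*max(0, lag - |i - j|) for unit sampling time.
    """
    idx = np.arange(n_points - lag)
    overlap = np.clip(lag - np.abs(idx[:, None] - idx[None, :]), 0, None)
    return 2.0 * D * overlap

def lambda_bar(lag, n_points, D=0.5):
    """lambda_bar(Delta) = 2 * max_j lambda_j(Delta)."""
    eigenvalues = np.linalg.eigvalsh(increment_covariance(lag, n_points, D))
    return 2.0 * eigenvalues.max()

def H(u):
    """H(u) = 1 + u - sqrt(1 + 2u)."""
    u = np.asarray(u, dtype=float)
    return 1.0 + u - np.sqrt(1.0 + 2.0 * u)

def largedev_bound(eps, lag, n_points, D=0.5):
    """Right-hand side of equation (4), with a and b as defined in the text."""
    lam = lambda_bar(lag, n_points, D)
    a = 4.0 * D**2 * lag * (lag + 1) * (2 * lag + 1) / (3.0 * lam**2)
    b = 3.0 * lam * np.asarray(eps, dtype=float) / (2.0 * D * (lag + 1) * (2 * lag + 1))
    return np.exp(-(n_points - lag) * a * H(b))

# Illustrative evaluation at eps = 0.5 and N = 300.
for lag in (1, 2, 10, 20):
    print(lag, lambda_bar(lag, 300), float(largedev_bound(0.5, lag, 300)))
```

Because $\bar{\lambda }$ scales linearly with D, changing `D` in this sketch leaves the bound unchanged, in line with the D-independence of $\tilde {I}$ noted above.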

The schematic in figure 1 outlines how the upper bound (4) can be utilised to check for the consistency with BM of a given dataset of trajectories. As is seen in the plot of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ vs ɛ, the upper bound (4) divides the plot area into two regions: the region below the curve (grey-shaded) is allowed while the region above the curve is forbidden for $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ of a data set consistent with BM. This is verified for a simulated data set of M = 100 BM trajectories with N = 300 points each. The empirical probability $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ was computed using the procedure outlined in section 4 with the lag time fixed at Δ = 1. This probability of deviation of the normalised TAMSD from its mean clearly lies in the grey-shaded Brownian domain for the simulated BM dataset as expected from the theoretical bound (4).

**Figure 1.** Central idea of the large-deviation based analysis. Left: schematic showing how a data set of trajectories can be checked for consistency with BM. Right: illustration of the method. The theoretical curve (4) (labelled 'largedev bound') divides the plot area into two regions: the grey-shaded region corresponds to the Brownian domain including permitted values of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ for a BM, and the unshaded region represents non-Brownian values. Estimates of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ for simulated BM (labelled 'BM') lie in the Brownian domain. The lag time was fixed at Δ = 1 and the data set consisted of M = 100 trajectories with N = 300 points each.

3. Data sets for large-deviation analysis

In order to delve into the question of what information one might gain from computing the empirical probability $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ for a given data set and comparing it with the theoretical bound (4) for BM, we consider data sets from different stochastic processes and experiments. These data sets, which contain both BM and non-Brownian processes, are described below.

3.1. Simulated data

Simulated data serve as benchmarks for the experimental data below. We simulate 100 trajectories each for different processes (figures 2(A)–(D)). This number of trajectories is of the same order as in the experimental data sets. A larger set of 10 000 analysed trajectories is presented in figure 3. In addition to BM, we simulate FBM, scaled Brownian motion (SBM), CTRW, a superstatistical process, and a diffusing-diffusivity (DD) process, see appendix F for their exact definitions. FBM [50] is governed by the Langevin equation, driven by power-law correlated fractional Gaussian noise (FGN) η_H(t) with Hurst index H (0 < H < 1), related to the anomalous diffusion exponent by α = 2H. SBM is characterised by the standard Langevin equation but with time-dependent diffusivity D(t) ∝ t^(α−1) [7, 51]. CTRW is a renewal process with Gaussian jump lengths and long-tailed distribution ψ(τ) ≃ τ^(−1−α) (0 < α < 1) of sojourn times between jumps [52, 53]. For the simulated superstatistical process [54, 55] the diffusivity for each trajectory is drawn from a Rayleigh distribution. Finally, the DD process is governed by the Langevin equation with white Gaussian noise, but with a time-dependent, stochastic diffusivity, evolving as the square of an Ornstein–Uhlenbeck (OU) process with correlation time τ_c [56].

**Figure 2.** Variation of the estimates of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ as function of the deviation ɛ. In the panels the theoretical curve (4) (labelled 'largedev bound') divides the plot area into two regions: the grey-shaded region corresponds to the Brownian domain including permitted values of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ for a BM, and the unshaded region represents non-Brownian values. Different diffusive processes can be categorised and distinguished based on the estimated values of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ for the given process or data set. (A)–(D) show results for simulated processes with M = 100 trajectories of N = 300 points each, and for lag times Δ = 1, 2, 10, and 20, respectively. The insets show the results on semi-log scale. The statistical uncertainty is of the order of 0.01. The parameters of the simulated stochastic processes are: D = 0.5 for BM, D_H = 0.5 for FBM, D₀ = 0.5 for SBM, τ₀ = 1 for CTRW, D₀ = 10 for the superstatistical process, τ_c = 10 and D_⋆ = 0.2 for DD. (E)–(H) show results for different experimental datasets for lag times Δ = 1, 2, 10, and 20, respectively. The insets again show the results on semi-log scale. The statistical uncertainty is of the order of 0.01. For more details see appendix F.

**Figure 3.** Variation of the estimates of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ with respect to ɛ, for different simulated datasets with N = 300, M = 10 000 at lag times Δ = 1, 2, 10 and 20 simulation time steps. The parameters of the simulated processes are the same as in figure 2. The statistical uncertainty is of the order of 10⁻⁴. The fact that the results do not change significantly as compared to figure 2 highlights the robustness of the large deviation analysis.

3.2. Beads tracked in aqueous solution

This data set (labelled 'BM, x-dim' and 'BM, y-dim' for the two directions) consists of 150 two-dimensional trajectories from single particle tracking of 1.2 μm-sized polystyrene beads in aqueous solution [13]. The time resolution of the data is 0.01 s.

3.3. Beads tracked in mucin hydrogels

These data are from micron-sized tracer beads tracked in mucin hydrogels (MUC5AC with 1 wt% mucin) at pH = 2 (labelled 'pH = 2, x-dim' and 'pH = 2, y-dim') and pH = 7 (labelled 'pH = 7, x-dim' and 'pH = 7, y-dim') [57]. The imaging was performed at a rate of 30.3 frames per second. The pH = 2 data set consists of 131 two-dimensional trajectories of 300 points each while the pH = 7 data set consists of 50 trajectories of 300 points each.

3.4. Climate data

We also use daily temperature records over a 100 year period, after removing the annual cycle (these 'anomalies' represent deviations from the corresponding mean daily temperature) [58]. This data set consists of uninterrupted daily temperature recordings starting 1 January 1893 and is validated by the German Weather Service (Deutscher Wetterdienst, 2016). The records were taken at the meteorological station at Potsdam Telegraphenberg (52.3813 latitude, 13.0622 longitude, 81 m above sea level).

4. Description of the test algorithm

We here outline the procedural algorithm of how, from a data set of trajectories we compute the empirical probability $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ for large deviations of the normalised TAMSD ξ, equation (2), given the preset deviation ɛ. We take M discretised trajectories of length N of a given process (simulated or experimental) as basis. For a fixed time lag Δ the test algorithm then proceeds as follows:

(a) Calculate the TAMSD for each trajectory according to the discretised expression
$\begin{equation}\bar{{\delta }^{2}\left({\Delta}\right)}={\left(N-{\Delta}\right)}^{-1}\sum _{j=1}^{N-{\Delta}}{\left[X\left(j+{\Delta}\right)-X\left(j\right)\right]}^{2}.\end{equation} \tag{ 6 }$
(b) Calculate the ensemble-averaged TAMSD $\left\langle \bar{{\delta }^{2}\left({\Delta}\right)}\right\rangle$ .
(c) For each trajectory calculate the normalised TAMSD ξ, equation (2).
(d) Calculate the number of trajectories M_ɛ that satisfy the condition ξ − 1 > ɛ for a given value of ɛ.
(e) The empirical probability $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ of the deviations is calculated as M_ɛ/M.
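Steps (a)–(e) can be sketched in a few lines. The code below assumes that the data set is stored as a two-dimensional array with one one-dimensional trajectory per row (shape M × N); the function names and the simulated BM test set are illustrative choices, not the authors' implementation.

```python
import numpy as np

def tamsd(trajectories, lag):
    """Step (a): discretised TAMSD, equation (6), for each row of an (M, N) array."""
    x = np.asarray(trajectories, dtype=float)
    displacements = x[:, lag:] - x[:, :-lag]      # X(j + lag) - X(j), j = 1..N-lag
    return np.mean(displacements**2, axis=1)      # average over the N - lag terms

def empirical_deviation_probability(trajectories, lag, eps_values):
    """Steps (b)-(e): fraction of trajectories with xi - 1 > eps, for each eps."""
    delta2 = tamsd(trajectories, lag)
    xi = delta2 / delta2.mean()                   # (b) ensemble average, (c) normalise
    eps_values = np.atleast_1d(np.asarray(eps_values, dtype=float))
    counts = (xi[None, :] - 1.0 > eps_values[:, None]).sum(axis=1)   # (d) M_eps
    return counts / xi.size                       # (e) M_eps / M

# Illustrative test set: M = 100 simulated BM trajectories with N = 300, D = 0.5.
rng = np.random.default_rng(0)
M, N, D = 100, 300, 0.5
bm = np.cumsum(rng.normal(0.0, np.sqrt(2.0 * D), size=(M, N)), axis=1)
eps_grid = np.linspace(0.1, 1.0, 10)
print(empirical_deviation_probability(bm, lag=1, eps_values=eps_grid))
```

The resulting values can then be compared point by point with the bound (4) at the same Δ and N. Note that with M trajectories the empirical probability is resolved only in steps of 1/M.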

5. Results

5.1. Large deviations in simulated data sets

Figures 2(A)–(D) show the comparison of the simulated data with the theoretical upper bounds (3) and (4) for BM, as function of the deviation ɛ ∈ [0.1, 1]. Here each of the M = 100 simulated trajectories was of length N = 300. For small values of the deviation parameter ɛ the theoretical bound (4) is quite high, leading to uninteresting results for P((ξ − 1) > ɛ) which for all the processes fall within the BM bound. Also for very large values of ɛ the estimated values of P((ξ − 1) > ɛ), corresponding to the fraction of trajectories satisfying ξ − 1 > ɛ, drop to zero as intuitively expected, again yielding uninteresting results. Thus it is the intermediate range of ɛ values that produces interesting results for the classification of different processes. This range depends on the processes under investigation as well as the respective lag time. We find that for the simulated processes and the experimental datasets considered in this study the deviation parameter ɛ in the range [0.3, 0.8] is the most informative.

According to figures 2(A)–(D) we see that the theoretical bound (4) from large-deviation theory clearly distinguishes model classes and/or diffusive regimes. BM, subdiffusive FBM, and superdiffusive SBM clearly lie below the bound (4). In contrast, superdiffusive FBM with a large H exponent, subdiffusive SBM with a small α, CTRW, and random-diffusivity models (superstatistical and DD) clearly exceed the bound (4). Thus, non-Gaussianity (as realised for CTRW and the random-diffusivity processes) is not a unique criterion for the violation of the large-deviation bound. But according to these results the large-deviation method, for a given value of the scaling exponent α, allows one to distinguish FBM and SBM that both have a Gaussian probability density function (PDF). These observations can be leveraged to extend the schematic in figure 1 to build an empirical initial test to narrow down the list of plausible models (see figure H1). However, the conclusions drawn from such an extension depend on the assumption that all trajectories in the data set come from the same process, and clearly more analytical work needs to be done to address processes such as FBM, SBM, or CTRW, as well as mixtures of these models or noisy trajectories [59] more specifically.

The large deviation method is surprisingly robust with respect to the number of analysed trajectories, as can be seen from the marginal improvement of the results based on 10 000 trajectories in figure 3. Figure H2 shows the effect of change in the number of points N and lag time Δ on the analysis. Variation of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ with N at fixed lag time Δ = 1 and ɛ = 0.5 shows that the distinction of different processes remains valid for trajectories of different length. Moreover it also shows the exponential decay of the theoretical $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ with (N − Δ). Variation of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ with Δ at fixed N = 300 and ɛ = 0.5 shows that the analysis is useful at Δ ≪ N, as for long Δ the empirical $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ for all the processes falls within the theoretical bound (4), see the right panel in figure H2. Note that this is relevant because the TAMSD values at short Δ are more significant statistically than at long Δ. Figures H3 and H4 further analyse FBM and demonstrate the validity of the theoretical bound (D.1) derived for FBM. For further analysis of SBM see figure H5.

Chebyshev's inequality (3) practically provides the same result as the large-deviation bound for the very short Δ = 1, as seen in figures 2(A) and 3(A). However, at longer Δ it provides a much higher estimate than the large-deviation bound, and it is unable to distinguish subdiffusive SBM with α = 0.3 from BM, as both lie below Chebyshev's bound, see figures 2(B)–(D). Moreover, for long Δ the probability $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ for all simulated processes lies either below or very close to the bound of Chebyshev's inequality, rendering it ineffective in discerning different processes.¹⁰ Chebyshev's inequality (3) lies above the large deviation bound (4), except for the cases Δ = 1 and 2 with small ɛ, when it is slightly below but still quite close to the bound set by (4).

5.2. Large deviations in experimental data sets

5.2.1. Beads tracked in aqueous solution

Polystyrene beads tracked in aqueous solution were analysed in [13] using single-trajectory power spectral analysis, concluding that the data are consistent with BM. From figures 2(E)–(H) it can be seen that the estimated probability $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ somewhat exceeds the theoretical bound (4) for BM. To understand this non-BM-like behaviour shown in the large-deviation analysis we closely examined the motion of individual beads. Indeed, the displacement distributions of some beads showed non-Gaussian behaviour, that we could attribute to bead-bead collisions as well as to imprecise localisation of the bead centre when the recorded tracks suffered from non-localised brightness. We removed the non-Gaussian trajectories using the Jarque–Bera (JB) test component-wise (see appendix G). From the filtered data set (M = 129 in x-direction and M = 125 in y-direction) we see that the large-deviation analysis within the error bars is now consistent with BM (especially for Δ = 1, see figure H10). The large-deviation analysis is thus more sensitive to non-BM-like behaviour than other methods [13]. We also note that the analysis based on Chebyshev's inequality could not distinguish these features.
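A component-wise normality filter of this kind could look as follows. This is only a sketch under stated assumptions: it applies `scipy.stats.jarque_bera` to the single-step displacements of each one-dimensional trajectory and keeps a trajectory when the p-value is at least 0.05. The actual procedure, test quantity, and significance level are specified in appendix G and may differ; the function name and the synthetic test data are hypothetical.

```python
import numpy as np
from scipy.stats import jarque_bera

def gaussian_trajectory_mask(trajectories, alpha=0.05):
    """Boolean mask of trajectories whose single-step displacements pass the JB test.

    trajectories : (M, N) array, one 1D trajectory (one spatial component) per row
    alpha        : significance level (assumed value, not taken from the paper)
    """
    x = np.asarray(trajectories, dtype=float)
    displacements = np.diff(x, axis=1)
    p_values = []
    for row in displacements:
        _, p_value = jarque_bera(row)     # returns (statistic, p-value)
        p_values.append(p_value)
    return np.array(p_values) >= alpha

# Synthetic illustration: Gaussian tracks, one of them with artificial large jumps
# mimicking hypothetical localisation glitches.
rng = np.random.default_rng(2)
steps = rng.normal(size=(20, 300))
steps[0, ::50] += 25.0
tracks = np.cumsum(steps, axis=1)
keep = gaussian_trajectory_mask(tracks)
filtered_tracks = tracks[keep]
print(keep.sum(), "of", keep.size, "trajectories retained")
```

At a 5% significance level a few genuinely Gaussian trajectories are expected to be rejected by chance, so the size of the retained set fluctuates from data set to data set.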

5.2.2. Beads tracked in mucin hydrogels

The data sets (M = 131 at pH = 2 and M = 50 at pH = 7) consisting of beads tracked in mucin hydrogels show different trends of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ depending on the pH values, as seen for N = 300 in figures 2(E)–(H). Notably, for the beads tracked at pH = 2 $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ remains significantly above the bound set by (4), particularly at short Δ. This implies that the spread of the TAMSD is inconsistent with BM and hence the dynamics cannot be explained solely by BM. The data sets at pH = 7 show significantly different behaviour. We observe a clear distinction in the trend of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ along the two directions of motion. Along the x-direction (labelled 'x-dim') $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ remains slightly above the theoretical bound for BM from large-deviation theory for most of the range of ɛ at Δ = 1 and Δ = 2, while it remains below the theoretical bound for the motion along the y-direction. As for the beads in aqueous solution, Chebyshev's inequality provides a looser bound.

The mucin data sets were analysed extensively in terms of Bayesian and other standard data analysis methods in [60]. The MSD and TAMSD exponents for the data at pH = 2 and 7 correspond to α = 0.46 and 0.36 and $\left\langle \beta \right\rangle =1.09$ and 0.94, respectively. The angular bracket for β denotes that these exponents were determined from the ensemble-averaged TAMSD. The discrepancy between the α and β values suggests EB and hence a contribution from a model such as CTRW. For CTRW, the ensemble-averaged TAMSD scales with the total measurement time T as a power-law [7]. However, as shown in [60], the ensemble-averaged TAMSD for the data sets at pH = 7 showed no dependence on T, while the data sets at pH = 2 showed a very weak dependence, ruling out CTRW as a model of diffusion. Moreover, in the Bayesian analysis carried out in [60], BM, FBM and DD models were compared and relative probabilities were assigned to each of them, based on the likelihood for each trajectory to be consistent with a given process. It was observed that for both pH = 2 and pH = 7, and for most of the trajectories, both BM and FBM had high probabilities. On comparing the estimated Hurst index H for the FBM, it was seen that for pH = 7, H ≈ 0.5 with a very small spread from trajectory to trajectory. In this sense, the pH = 7 data seemed to be very close to BM. This was also confirmed independently by looking at β extracted from the TAMSD. In contrast, the estimated H for the pH = 2 data showed a large spread in the range 0.3 ⩽ H ⩽ 0.7. These observations are now clearly supported by the results for $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ , demonstrating that the data sets at pH = 7 are close to BM while the data sets at pH = 2 cannot be explained (solely) by BM. Thus, for this data set the large-deviation analysis again demonstrates its effectiveness in unveiling the physical origin of the stochastic time series.

5.2.3. Climate data

The climate data were successfully modelled by an autoregressive fractionally integrated moving average model, more specifically, ARFIMA(1, d, 0) with d ≈ 0.15 [58, 61]. ARFIMA(0, d, 0) corresponds to FGN with H = d + 0.5. It demonstrates the long-range correlations which are characteristic of FGN. We mention here that long-range correlations in the variations of the maximum daily temperature from their average values were reported for data from 14 different meteorological stations in [62]. The climate data set under consideration here also exhibited short-range correlations due to which ARFIMA(1, d, 0) fitted the data better than ARFIMA(0, d, 0) [58]. These short-range correlations could be explained by the average atmospheric circulation period of 4–5 days [58]. For our tests of deviations of the TAMSD from the ensemble-averaged TAMSD, we construct FBM trajectories (M = 100) of length N = 300 by taking a cumulative sum of FGN. If the temperature anomalies could be described by ARFIMA(0, d, 0), or, equivalently, by FGN, the cumulative sum would be FBM and hence should show a similar trend of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ , as seen for the simulated FBM processes in figures 2(A)–(D). That means that it remains below the theoretical upper bounds (3) and (4) for FBM, as long as the scaling exponent does not become too large. Alternatively, deviations of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ from the trend exhibited by simulated FBM, particularly at α = 1.3 corresponding to d = 0.15 reported in [58], would support the result in [58] that ARFIMA(0, d, 0) does not completely explain the data of surface temperature anomalies. This indeed turns out to be the case for N = 300 in figures 2(E)–(H) where we observe that $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ remains above the theoretical upper bound for BM from large-deviation theory, especially at short Δ.
Moreover, comparing with figures 2(A)–(D) we clearly observe that $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ remains well above the theoretical upper bound (4) for BM for the climate data at sufficiently large values of ɛ, while it always remains below the bound for simulated FBM with α = 1.3 for all lag times. This corroborates the finding in [58] that ARFIMA(0, d, 0) (or equivalently FBM for the data constructed by taking the cumulative sum) cannot completely explain the climate data. In comparison, Chebyshev's inequality (3) provides the same information for short lag times but fails to distinguish the climate data from corresponding simulated FBM for long Δ, as this bound lies above the empirical probability $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ for both corresponding simulated FBM and climate data. In order to check whether the short-term correlations are indeed relevant, we create an artificially correlated process in the form of an integrated OU process, the results of which are shown in figure 4. With a correlation length of five steps the result of this OU process indeed leads to an ɛ dependence of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ that is very similar to the climate data's. Conversely, as soon as we remove the correlations in the climate data by random reshuffling of the temperature anomalies, the large-deviation behaviour becomes BM-like.
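The two surrogate constructions used in this comparison, an integrated OU process with a correlation time of five steps and a shuffled version of the increments, can be sketched as follows. The discretisation of the OU noise as a unit-variance AR(1) sequence with coefficient exp(−1/5) is an assumption made for illustration, as are the function names; in the actual analysis the shuffling is applied to the measured temperature anomalies rather than to simulated noise.

```python
import numpy as np

def integrated_ou(n_traj, n_points, corr_steps=5.0, rng=None):
    """Cumulative sum of stationary OU noise, discretised as an AR(1) process.

    Returns the trajectories and the underlying correlated increments.
    """
    rng = np.random.default_rng() if rng is None else rng
    phi = np.exp(-1.0 / corr_steps)               # one-step autocorrelation (assumed form)
    kicks = rng.normal(scale=np.sqrt(1.0 - phi**2), size=(n_traj, n_points))
    noise = np.empty((n_traj, n_points))
    noise[:, 0] = rng.normal(size=n_traj)         # stationary start with unit variance
    for k in range(1, n_points):
        noise[:, k] = phi * noise[:, k - 1] + kicks[:, k]
    return np.cumsum(noise, axis=1), noise

def shuffled_trajectories(increments, rng=None):
    """Destroy temporal correlations by shuffling each row before the cumulative sum."""
    rng = np.random.default_rng() if rng is None else rng
    shuffled = np.array([rng.permutation(row) for row in increments])
    return np.cumsum(shuffled, axis=1)

rng = np.random.default_rng(1)
correlated, increments = integrated_ou(100, 300, corr_steps=5.0, rng=rng)
uncorrelated = shuffled_trajectories(increments, rng=rng)
print(correlated.shape, uncorrelated.shape)
```

Shuffling preserves the distribution of the increments of each trajectory and removes only their ordering, so any change in P((ξ − 1) > ɛ) between the two sets can be attributed to the correlations.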

**Figure 4.** Variation of the estimates of $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ as function of ɛ for the climate data, in comparison with the integrated OU process with correlation time of five steps. (A)–(D), respectively, show the results for Δ = 1, 2, 10, and 20. The insets show the results in semi-log scale. We observe that random shuffling of the temperature anomalies, before taking the cumulative sum to create the trajectories, removes the correlations in the data, and P((ξ − 1) > ɛ) behaves very similarly to BM ('uncorr climate').

6. Conclusions

It is the purpose of time series analysis to detect the underlying physical process encoded in a measured trajectory, and thus to unveil the physical mechanisms governing the spreading of, e.g. viruses, vesicles, or signalling proteins in living cells or tissues. Recently, considerable work has been directed to the characterisation of stochastic trajectories using Bayesian analysis [16–18, 63, 64, 115] and machine learning [19, 65, 66]. While highly successful, these methods are typically technically involved and computationally expensive. Moreover, the associated algorithms often rely heavily on data pre-processing [19]. To avoid overly expensive computations, it is highly advantageous to first go through a decision tree, to narrow down the possible families of physical stochastic mechanisms. For instance, one can distinguish ergodic from non-ergodic or Gaussian from non-Gaussian processes, and thereby eliminate entire classes of candidate models.

Here we analyse a new method based on large-deviation theory, concluding that it is a highly efficient and easy-to-use tool for such a characterisation. We show how we can straightforwardly infer relevant information on the underlying physical process based on the theoretical bounds of the deviations of the TAMSD—routinely measured in single-particle-tracking experiments and supercomputing studies and easy to construct for any time-series such as daily temperature data—from the corresponding trajectory-average. Specifically, we demonstrate that this tool is able to detect the short-time correlations which effect non-FBM behaviour in daily temperature anomalies, as well as the crossover from BM-like behaviour at pH = 7 to non-ergodic, non-BM-like at pH = 2 for the mucin data, and the delicate sensitivity to non-Gaussian trajectories for beads in aqueous solution. We conclude from our analyses here that the large-deviation method would be an excellent basis for a first efficient screening of measured trajectories, before, if necessary, more refined methods are then applied with a reduced search space of possible physical mechanisms.

There are two seeming limitations to the large-deviation tool. First, it is easy to formulate this tool for one-dimensional trajectories, while the generalisation to higher dimensions is not straightforward. However, as we demonstrated it can be used component-wise and, remarkably, can be used to probe the degree of isotropy of the data. In fact, from figures 2(E) and (F) we concluded that the tracer bead motion in mucin at pH = 7 was anisotropic. In this sense, the one-dimensional definition of the large deviation tool is in fact an advantage. Second, it is not trivial to derive similar expressions as (4) for other stochastic processes. Here, numerical evaluations can be used instead. Moreover, in this case we can also use Chebyshev's inequality, with the caveat that it works best at short lag times Δ. Generally, however, the bound provided by the large-deviation theory is considerably more stringent than Chebyshev's inequality, as shown by our analysis here.

For anomalous diffusion processes, we demonstrated that superdiffusive FBM with large H values is outside the large-deviation bound. Superdiffusive FBM processes applied in mathematical finance are indeed in this range of H values [67–69], and our large-deviation tool is therefore well suited for the analysis of such processes, particularly for different anomalous diffusion exponents. We also showed that the large-deviation tool is able to uncover subtle correlations in the data, similarly to ARFIMA analyses applied mainly in mathematical finance and time series analysis. This similarity between the two methods strengthens the connections to physical models recently worked out between random coefficient autoregressive models and random-diffusivity models [70].

The large-deviation test investigated here is a highly useful tool serving as an easy-to-implement and to-apply initial test in the decision tree for the classification of the physical mechanisms underlying measured time series from single particle trajectories. We note that this work represents a first step in the study of large deviation theory applications to the analysis and classification of stochastic trajectories. While here we demonstrated that the method is useful and sufficiently sensitive in the aspects discussed above, more work is needed to extend this method to other stochastic processes as well as processes with measurement noise.

In building a full-fledged decision tree to identify the physical process underlying a measured time series [15], as many different complementary observables as possible should be analysed [5, 7, 111]. More conventionally, these include the ensemble-averaged MSD, the TAMSD, position and displacement correlation functions, codifference methods [114], non-Gaussianity parameters or kurtosis [56], first-passage methods [109, 110], mean-maximal excursion methods [113], the p-variation method [78], or single trajectory ergodicity and mixing measures [112]. Other interesting methods for time series analysis are based on novel information entropy measures and feature identification methods [116–120]. Finally, we mention the use of Josephson devices as detectors to efficiently detect a non-Gaussian noise component, the Lévy flights, in a Gaussian background noise signal [121].

Acknowledgments

ST acknowledges Deutscher Akademischer Austauschdienst (DAAD) for a PhD Scholarship (program ID 57214224). CEW is an Open Philanthropy Project fellow of the Life Sciences Research Foundation. RM and AC acknowledge financial support by the German Science Foundation (DFG, Grant ME 1535/7-1). RM also acknowledges the Foundation for Polish Science (Fundacja na rzecz Nauki Polskiej) for support within a Humboldt Polish Honorary Research Scholarship. We acknowledge the support of the German Research Foundation (DFG) and the Open Access Publication Fund of Potsdam University.

Appendix A.: Derivation of Chebyshev's inequality for TAMSD of BM

Using the Markov inequality, one can also show that for any random variable with mean μ and variance σ², and any positive number k > 0, the following Chebyshev inequality (one-sided) holds [21]

$\begin{equation}P\left(X-\mu {\geqslant}k\right){\leqslant}\frac{{\sigma }^{2}}{{\sigma }^{2}+{k}^{2}}.\end{equation} \tag{ A.1 }$

Here we derive Chebyshev's inequality for the TAMSD statistic for BM. For the TAMSD it takes the following form

$\begin{equation}P\left(\left(\xi -1\right){\geqslant}k/\left\langle \bar{{\delta }^{2}\left({\Delta}\right)}\right\rangle \right){\leqslant}\frac{{\sigma }^{2}}{{\sigma }^{2}+{k}^{2}},\end{equation} \tag{ A.2 }$

where ${\sigma }^{2}=\mathrm{Var}\left(\bar{{\delta }^{2}\left({\Delta}\right)}\right)=4{\left\langle \bar{{\delta }^{2}\left({\Delta}\right)}\right\rangle }^{2}{\Delta}/3N$ [71]. Taking the notation $\varepsilon =k/\left\langle \bar{{\delta }^{2}\left({\Delta}\right)}\right\rangle$ one obtains the following

$\begin{align}\hfill P\left(\left(\xi -1\right){\geqslant}\varepsilon \right)& {\leqslant}\frac{4{\left\langle \bar{{\delta }^{2}\left({\Delta}\right)}\right\rangle }^{2}{\Delta}/3N}{4{\left\langle \bar{{\delta }^{2}\left({\Delta}\right)}\right\rangle }^{2}{\Delta}/3N+{\varepsilon }^{2}{\left\langle \bar{{\delta }^{2}\left({\Delta}\right)}\right\rangle }^{2}}\hfill \\ \hfill & =\frac{4{\Delta}}{4{\Delta}+3N{\varepsilon }^{2}}.\hfill \end{align} \tag{ A.3 }$
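For orientation, the closed-form bound (A.3) is easy to tabulate. The following minimal sketch evaluates its right-hand side; the function name `chebyshev_bound` and the chosen values of ɛ and Δ are illustrative assumptions.

```python
import numpy as np

def chebyshev_bound(eps, lag, n_points):
    """Right-hand side of (A.3): 4*lag / (4*lag + 3*N*eps**2)."""
    eps = np.asarray(eps, dtype=float)
    return 4.0 * lag / (4.0 * lag + 3.0 * n_points * eps ** 2)

# Tabulate the bound for a trajectory length N = 300 and several lag times.
eps_values = np.array([0.1, 0.25, 0.5, 1.0])
for lag in (1, 2, 10, 20):
    print(lag, np.round(chebyshev_bound(eps_values, lag, n_points=300), 4))
```

At fixed ɛ and N the bound grows with Δ, so it becomes less restrictive at longer lag times.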

Appendix B.: Eigenvalues of the covariance matrix of increments for BM

The (N − Δ) × (N − Δ) positive-definite covariance matrix Σ(Δ) for the vector of increments $\mathbb{Y}=\left(X\left(1+{\Delta}\right)-X\left(1\right),X\left(2+{\Delta}\right)-X\left(2\right),\dots ,X\left(N\right)-X\left(N-{\Delta}\right)\right)$ takes the form

$\begin{equation}{\Sigma}\left({\Delta}\right)=\left[\begin{matrix}\hfill {\sigma }_{{\Delta}}\left(0\right)\hfill & \hfill \enspace \enspace \enspace {\sigma }_{{\Delta}}\left(1\right)\enspace \enspace \enspace \hfill & \hfill \enspace \enspace \enspace {\sigma }_{{\Delta}}\left(2\right)\enspace \enspace \enspace \hfill & \hfill \enspace \enspace \dots \enspace \hfill & \hfill \enspace \enspace \enspace \dots \enspace \enspace \hfill & \hfill {\sigma }_{{\Delta}}\left(N-{\Delta}-1\right)\hfill \\ \hfill {\sigma }_{{\Delta}}\left(1\right)\hfill & \hfill {\sigma }_{{\Delta}}\left(0\right)\hfill & \hfill {\sigma }_{{\Delta}}\left(1\right)\hfill & \hfill \ddots \hfill & \hfill \hfill & \hfill {\vdots}\hfill \\ \hfill {\sigma }_{{\Delta}}\left(2\right)\hfill & \hfill {\sigma }_{{\Delta}}\left(1\right)\hfill & \hfill \ddots \hfill & \hfill \ddots \hfill & \hfill \ddots \hfill & \hfill {\vdots}\hfill \\ \hfill {\vdots}\hfill & \hfill \ddots \hfill & \hfill \ddots \hfill & \hfill \ddots \hfill & \hfill {\sigma }_{{\Delta}}\left(1\right)\hfill & \hfill {\sigma }_{{\Delta}}\left(2\right)\hfill \\ \hfill {\vdots}\hfill & \hfill \hfill & \hfill \ddots \hfill & \hfill {\sigma }_{{\Delta}}\left(1\right)\hfill & \hfill {\sigma }_{{\Delta}}\left(0\right)\hfill & \hfill {\sigma }_{{\Delta}}\left(1\right)\hfill \\ \hfill {\sigma }_{{\Delta}}\left(N-{\Delta}-1\right)\hfill & \hfill \dots \hfill & \hfill \dots \hfill & \hfill {\sigma }_{{\Delta}}\left(2\right)\hfill & \hfill {\sigma }_{{\Delta}}\left(1\right)\hfill & \hfill {\sigma }_{{\Delta}}\left(0\right)\hfill \end{matrix}\right]\end{equation} \tag{ B.1 }$

with its elements given by

$\begin{equation}{\sigma }_{{\Delta}}\left(j\right)=\begin{cases}2D\left({\Delta}-j\right)\hfill & j{\leqslant}{\Delta}-1\hfill \\ 0\hfill & j{ >}{\Delta}-1\hfill \end{cases}.\end{equation} \tag{ B.2 }$

For the case of Δ = 1, the (N − 1) × (N − 1) covariance matrix Σ(Δ = 1) has elements given by

$\begin{equation}{\sigma }_{{\Delta}=1}\left(j\right)=\begin{cases}2D\hfill & j=0\hfill \\ 0\hfill & j{ >}0\hfill \end{cases}.\end{equation} \tag{ B.3 }$

Hence the matrix Σ(Δ = 1) is a diagonal matrix with the constant main diagonal 2D and all zero entries outside the main diagonal. The characteristic polynomial of Σ(Δ = 1) has the form

$\begin{equation*}\vert {\Sigma}\left({\Delta}=1\right)-\lambda I\vert ={\left(2D-\lambda \right)}^{N-1}\end{equation*}$

and roots λ_j(Δ = 1) = 2D, which are the eigenvalues of that matrix.

For the case Δ = 2 the (N − 2) × (N − 2) covariance matrix Σ(Δ = 2) has elements given by

$\begin{equation}{\sigma }_{{\Delta}=2}\left(j\right)=\begin{cases}2D\left(2-j\right)\hfill & j=0,1\hfill \\ 0\quad \hfill & \quad j{ >}1\hfill \end{cases}.\end{equation} \tag{ B.4 }$

Hence the matrix Σ(Δ = 2) is a tridiagonal Toeplitz matrix. The formula for the eigenvalues of such matrices is well known in the mathematical literature [72],

$\begin{equation*}{\lambda }_{j}\left({\Delta}=2\right)=D\left[4+4\enspace \mathrm{cos}\left(\frac{j\pi }{N-1}\right)\right].\end{equation*}$
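The two closed-form results can be checked numerically. The sketch below builds the matrix (B.1) from the elements (B.2) and compares its eigenvalues with the expressions for Δ = 1 and Δ = 2, taking j = 1, ..., N − 2 in the latter case. The function name and the values of N and D are illustrative assumptions.

```python
import numpy as np

def bm_increment_covariance(n_points, lag, diff_coeff=1.0):
    """Toeplitz matrix (B.1) with elements (B.2) for BM increments."""
    size = n_points - lag
    idx = np.arange(size)
    j = np.abs(idx[:, None] - idx[None, :])          # |row - column|
    return 2.0 * diff_coeff * np.clip(lag - j, 0, None)

N, D = 50, 0.7

# Numerical eigenvalues (returned in ascending order).
eig1 = np.linalg.eigvalsh(bm_increment_covariance(N, 1, D))
eig2 = np.linalg.eigvalsh(bm_increment_covariance(N, 2, D))

# Closed-form eigenvalues for lag 2, with j = 1, ..., N - 2.
j = np.arange(1, N - 1)
exact2 = np.sort(D * (4.0 + 4.0 * np.cos(j * np.pi / (N - 1))))

print(np.allclose(eig1, 2.0 * D), np.allclose(eig2, exact2))
```

For Δ > 2 the same matrix constructor can be passed to `np.linalg.eigvalsh` to obtain the eigenvalues numerically.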

Appendix C.: Dependence of $\bar{\lambda }$ and the large-deviation bound on the trace length N

Figure C1 shows the parameter $\bar{\lambda }$ as function of the trace length N for different values of the lag time Δ. For Δ = 1 and Δ = 2, $\bar{\lambda }$ is computed from the exact expressions derived in appendix B, while for Δ > 2, numerical computations are used. It can be seen from figure C1 that $\bar{\lambda }$ is independent of N only for short lag times, Δ ≪ N, while there is a significant dependence on N for longer Δ. In the limit Δ ≪ N (very large N), $\bar{\lambda }$ appears to saturate to a plateau value depending on the lag time Δ, albeit with appreciable fluctuations. The consequence of this observation is the exponential decay with N of the large-deviation bound given by equation (4) for different values of Δ at fixed ɛ. This is demonstrated in figure C2 for ɛ = 0.5, where the logarithm with base 10 of the right-hand-side of equation (4) decays linearly with N. Thus the exponential form of the bound on the probability $P\left(\left(\xi -1\right){ >}\varepsilon \right)$ holds as long as Δ ≪ N.

**Figure C2.** Large-deviation bound, the right-hand side of equation (4), as function of the number of points N with fixed large deviation parameter ɛ = 0.5. Note the distinct exponential decay with N for different values of the lag time Δ.

This can also be seen from equation (4): indeed, our numerical estimates show that the value of b is less than unity in the depicted domain of the parameters ɛ, Δ, and N, thus $\mathcal{H}\left(b\right)\approx {b}^{2}/2$, and the function $\tilde {I}$ becomes independent of $\bar{\lambda }$ and, consequently, of N. This is demonstrated in figure C3.
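
As a sketch of this step, assume that a and b are defined as in appendix D (for BM, H = 1/2 and D_H = D). Expanding $\mathcal{H}\left(b\right)=1+b-\sqrt{1+2b}$ to second order in b then gives

$\begin{equation*}\tilde {I}=a\mathcal{H}\left(b\right)\approx \frac{a{b}^{2}}{2}=\frac{1}{2}\enspace \frac{2{D}_{\mathrm{H}}^{2}S}{{\bar{\lambda }}^{2}}\enspace \frac{{\bar{\lambda }}^{2}{\varepsilon }^{2}{{\Delta}}^{4H}}{{D}_{\mathrm{H}}^{2}{S}^{2}}=\frac{{\varepsilon }^{2}{{\Delta}}^{4H}}{S\left({\Delta},H,N\right)},\end{equation*}$

in which $\bar{\lambda }$ has cancelled. For BM the summand in (D.2) vanishes for i ⩾ Δ, so that S no longer depends on N once N ⩾ 2Δ, and neither does $\tilde {I}$ within this approximation.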

**Figure C3.** $\tilde {I}$ as function of N for different Δ, with ɛ = 0.5.

Appendix D.: Large deviations of TAMSD for FBM

Taking equation (4.5) from [48] one can obtain the large deviation theory for FBM (see below for details of the stochastic process FBM). Namely, if we consider the vector of increments $\mathbb{Y}=\left(X\left(1+{\Delta}\right)-X\left(1\right),X\left(2+{\Delta}\right)-X\left(2\right),\dots ,X\left(N\right)-X\left(N-{\Delta}\right)\right)$ of FBM with Hurst exponent H and generalised diffusion coefficient D_H then we have

$\begin{equation}P\left(\left(\xi -1\right){ >}\varepsilon \right){\leqslant}\mathrm{exp}\left(-\left(N-{\Delta}\right)\tilde {I}\right),\end{equation} \tag{ D.1 }$

where the function $\tilde {I}=a\mathcal{H}\left(b\right)$ with $a=2{D}_{\mathrm{H}}^{2}S\left({\Delta},H,N\right)/\bar{\lambda }{\left({\Delta}\right)}^{2}$ and $b=\frac{\bar{\lambda }\left({\Delta}\right)\varepsilon {{\Delta}}^{2H}}{{D}_{\mathrm{H}}S\left({\Delta},H,N\right)}$ . Here the function $\mathcal{H}\left(u\right)=1+u-\sqrt{1+2u}$ and $\bar{\lambda }\left({\Delta}\right)=2\enspace \mathrm{max}\left\{{\lambda }_{j}\left({\Delta}\right)\right\}$ , where λ_j(Δ) (j = 1, 2, ..., N − Δ) are the eigenvalues of the (N − Δ) × (N − Δ) positive-definite covariance matrix Σ(Δ) for the vector of increments for FBM. Moreover the function S(Δ, H, N) is defined as

$\begin{equation}S\left({\Delta},H,N\right)=\sum _{i=0}^{N-{\Delta}-1}{\left[{\left(i+{\Delta}\right)}^{2H}-2{i}^{2H}+\vert i-{\Delta}{\vert }^{2H}\right]}^{2}.\end{equation} \tag{ D.2 }$

It is worthwhile noting that for the FBM case the eigenvalues of the covariance matrix Σ(Δ) are not given in explicit form and need to be calculated numerically. Also note that equation (D.1) is independent of the generalised diffusion coefficient D_H which gets cancelled both in a and b.
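The numerical evaluation of the bound (D.1) can be sketched as follows. The function names are hypothetical, and the covariance of the FBM increments is an assumption here: it is written in the standard form $D_{\mathrm{H}}[(k+\Delta)^{2H}-2k^{2H}+|k-\Delta|^{2H}]$ for increments separated by k steps, which reduces to (B.2) for H = 1/2.

```python
import numpy as np

def fbm_increment_covariance(n_points, lag, hurst, d_h=1.0):
    """Covariance matrix of lag-`lag` FBM increments (assumed standard form)."""
    size = n_points - lag
    idx = np.arange(size)
    k = np.abs(idx[:, None] - idx[None, :]).astype(float)
    two_h = 2.0 * hurst
    return d_h * ((k + lag) ** two_h - 2.0 * k ** two_h + np.abs(k - lag) ** two_h)

def s_function(lag, hurst, n_points):
    """The sum S(lag, H, N) of equation (D.2)."""
    i = np.arange(n_points - lag, dtype=float)
    two_h = 2.0 * hurst
    return np.sum(((i + lag) ** two_h - 2.0 * i ** two_h + np.abs(i - lag) ** two_h) ** 2)

def fbm_large_deviation_bound(eps, lag, hurst, n_points, d_h=1.0):
    """Right-hand side of (D.1), with a, b and H(u) as given in the text."""
    cov = fbm_increment_covariance(n_points, lag, hurst, d_h)
    lam_bar = 2.0 * np.linalg.eigvalsh(cov).max()     # numerical eigenvalues
    s = s_function(lag, hurst, n_points)
    a = 2.0 * d_h ** 2 * s / lam_bar ** 2
    b = lam_bar * eps * lag ** (2.0 * hurst) / (d_h * s)
    rate = a * (1.0 + b - np.sqrt(1.0 + 2.0 * b))
    return np.exp(-(n_points - lag) * rate)

# Bound at eps = 0.5 and lag 2 for several Hurst exponents, N = 300.
for hurst in (0.3, 0.5, 0.65, 0.8):
    print(hurst, fbm_large_deviation_bound(0.5, 2, hurst, 300))
```

Calling the last function with two different values of `d_h` gives the same result, in line with the cancellation of D_H noted above.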

Appendix E.: Connection between extreme value statistic and large deviation theory

Consider M discrete trajectories, $\left\{{\left\{{X}_{1},{X}_{2},\dots ,{X}_{N}\right\}}_{1},{\left\{{X}_{1},{X}_{2},\dots ,{X}_{N}\right\}}_{2},\dots ,{\left\{{X}_{1},{X}_{2},\dots ,{X}_{N}\right\}}_{M}\right\}$ of length N of a given process. Let Y_j be a statistic over each trajectory j, j ∈ {1, 2, ..., M} (for instance, Y could be the TAMSD). Large deviation theory deals with bounds of the form P(Y > ɛ) ⩽ exp(−F), where F is the rate function and ɛ is the deviation parameter. On the other hand, the extreme value statistic deals with the probability $P\left(\mathrm{max}\left\{{Y}_{1},{Y}_{2},\dots ,{Y}_{M}\right\}{ >}z\right)$ . This probability can be written as

$\begin{align*}\hfill P\left(\mathrm{max}\left\{{Y}_{1},{Y}_{2},\dots ,{Y}_{M}\right\}{ >}z\right)& =1-P\left(\mathrm{max}\left\{{Y}_{1},{Y}_{2},\dots ,{Y}_{M}\right\}{\leqslant}z\right)\hfill \\ \hfill & =1-P\left({Y}_{1}{\leqslant}z,{Y}_{2}{\leqslant}z,\dots ,{Y}_{M}{\leqslant}z\right)\hfill \\ \hfill & =1-\prod _{j=1}^{M}P\left({Y}_{j}{\leqslant}z\right)=1-{P}^{M}\left({Y}_{1}{\leqslant}z\right)\hfill \\ \hfill & =1-{\left[1-P\left({Y}_{1}{ >}z\right)\right]}^{M}.\hfill \end{align*}$

The last three equalities come from the fact that the considered trajectories represent independent realisations of the same process.
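The relation can be illustrated with a quick Monte Carlo check. In this sketch the statistic Y is replaced by independent exponential random variables, purely as an assumption for illustration, since their tail probability P(Y > z) = exp(−z) is known exactly. The function name `prob_max_exceeds` is hypothetical.

```python
import numpy as np

def prob_max_exceeds(p_single, m):
    """P(max{Y_1..Y_M} > z) from the single-trajectory tail P(Y_1 > z)."""
    return 1.0 - (1.0 - p_single) ** m

rng = np.random.default_rng(1)
M, z, n_rep = 100, 5.0, 20000

# n_rep independent ensembles of M i.i.d. exponential variables (toy statistic).
samples = rng.exponential(size=(n_rep, M))
empirical = np.mean(samples.max(axis=1) > z)
predicted = prob_max_exceeds(np.exp(-z), M)
print(empirical, predicted)
```

Read in the other direction, a large-deviation bound on P(Y₁ > z) turns into an upper bound on the probability that the largest of the M values exceeds z.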

Appendix F.: Simulated processes

For our analysis in the central figure 2 we simulate 100 trajectories each for different processes. The number of trajectories is of the same order as in the experimental datasets we analyse.

Brownian motion: BM is characterised by the Langevin equation in the overdamped limit as [73, 74]

$\begin{equation}\frac{\mathrm{d}X\left(t\right)}{\mathrm{d}t}=\sqrt{2D}\eta \left(t\right),\end{equation} \tag{ F.1 }$

driven by the white Gaussian noise η(t) with zero mean and autocorrelation function ⟨η(t₁)η(t₂)⟩ = δ(t₁ − t₂). The parameter D is the diffusion coefficient.

Fractional Brownian motion: FBM has been used to explain anomalous diffusion in a number of experiments [75–82], where the underlying process had long-range correlations. FBM [50, 83] is given by the Langevin equation

$\begin{equation}\frac{\mathrm{d}{X}_{\text{FBM}}\left(t\right)}{\mathrm{d}t}={\eta }_{\text{H}}\left(t\right),\end{equation} \tag{ F.2 }$

driven by the FGN η_H(t) with autocorrelation function

$\begin{equation}\langle {\eta }_{\text{H}}\left({t}_{1}\right){\eta }_{\text{H}}\left({t}_{2}\right)\rangle =2H\left(2H-1\right){D}_{\mathrm{H}}{\times}\vert {t}_{1}-{t}_{2}{\vert }^{2\left(H-1\right)},\end{equation} \tag{ F.3 }$

where D_H is the generalised diffusion coefficient and H is the Hurst index, which is related to the anomalous diffusion exponent α as H = α/2.
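One way to generate such trajectories is sketched below. It draws unit-step FGN by Cholesky factorisation of its covariance matrix and takes the cumulative sum. This is an assumed, illustrative method and not necessarily the one used for the figures. The discrete autocovariance of the increments is chosen such that ⟨X²(n)⟩ = 2D_H n^{2H}, and the function name is hypothetical.

```python
import numpy as np

def simulate_fbm(n_traj, n_steps, hurst, d_h=1.0, rng=None):
    """FBM trajectories X(1), ..., X(n_steps) as cumulative sums of FGN."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(n_steps, dtype=float)
    two_h = 2.0 * hurst
    # Autocovariance of unit-step FGN at separation k.
    gamma = d_h * ((k + 1.0) ** two_h - 2.0 * k ** two_h + np.abs(k - 1.0) ** two_h)
    idx = np.arange(n_steps)
    cov = gamma[np.abs(idx[:, None] - idx[None, :])]   # Toeplitz covariance matrix
    chol = np.linalg.cholesky(cov)
    # Correlate independent Gaussian numbers row by row.
    fgn = rng.standard_normal((n_traj, n_steps)) @ chol.T
    return np.cumsum(fgn, axis=1)

# M = 100 trajectories of length N = 300 with alpha = 2H = 1.3.
trajs = simulate_fbm(100, 300, hurst=0.65, rng=np.random.default_rng(2))
```

The Cholesky approach costs O(N³) operations for the factorisation, which is unproblematic for N = 300. For much longer traces, faster generators such as circulant embedding are commonly used instead.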

Scaled Brownian motion: SBM has been used as a model of anomalous diffusion in numerous experiments [84–89], particularly those with fluorescence recovery after photobleaching [90]. SBM [7, 91] is characterised by equation (F.1) but with a time-dependent diffusivity given by $D\left(t\right)={D}_{0}{t}^{\alpha -1}$, with constant D₀ and the anomalous diffusion exponent α. The parameter 0 < α < 1 leads to a subdiffusive MSD while 1 < α < 2 leads to a superdiffusive MSD.
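A minimal simulation sketch for SBM follows. It uses independent Gaussian increments whose variance over each time step equals the integral of 2D(t) over that step, so that the MSD 2D₀t^α/α is reproduced exactly on the time grid. The function names and the step size `dt` are illustrative assumptions.

```python
import numpy as np

def sbm_increment_variances(n_steps, alpha, d0=1.0, dt=1.0):
    """Variance of each increment: integral of 2*D0*t**(alpha-1) over the step."""
    t = dt * np.arange(n_steps + 1)
    return 2.0 * d0 * np.diff(t ** alpha) / alpha

def simulate_sbm(n_traj, n_steps, alpha, d0=1.0, dt=1.0, rng=None):
    """SBM trajectories sampled at times dt, 2*dt, ..., n_steps*dt."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(sbm_increment_variances(n_steps, alpha, d0, dt))
    return np.cumsum(std * rng.standard_normal((n_traj, n_steps)), axis=1)

# 100 subdiffusive trajectories of 300 steps.
trajs = simulate_sbm(100, 300, alpha=0.5, rng=np.random.default_rng(4))
```

For α = 1 all increment variances equal 2D₀ dt and the scheme reduces to ordinary BM.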

Continuous time random walk: the subdiffusive CTRW has been used to describe a number of experiments [3, 53, 92–94] exhibiting anomalous diffusion. It is a renewal process with Gaussian jumps with an asymptotic power-law distributed waiting time between successive jumps [7, 52, 53, 95]. The asymptotic PDF of the waiting time τ is given by $\psi \left(\tau \right)\approx {\tau }_{0}^{\alpha }{\tau }^{\left(-1-\alpha \right)}$ , where 0 < α < 1 is the anomalous diffusion exponent of the MSD. We refer to [96] for details of the simulation.
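For illustration only, a subdiffusive CTRW with this asymptotic waiting time PDF can be sketched as follows. The scheme is an assumption and not necessarily that of [96]: waiting times are drawn from a Pareto law with P(τ > t) = (τ₀/t)^α for t ⩾ τ₀, the walker waits before each jump, and the position is recorded at integer times. All function and parameter names are hypothetical.

```python
import numpy as np

def ctrw_positions(jump_times, jumps, sample_times):
    """Position at each sample time: sum of all jumps made up to that time."""
    cum = np.concatenate(([0.0], np.cumsum(jumps)))
    n_done = np.searchsorted(jump_times, sample_times, side="right")
    return cum[n_done]

def simulate_ctrw(n_steps, alpha, tau0=1.0, jump_std=1.0, rng=None):
    """One CTRW trajectory sampled at times 1, 2, ..., n_steps."""
    rng = np.random.default_rng() if rng is None else rng
    # Each waiting time is >= tau0, so this many draws always cover [0, n_steps].
    n_max = int(np.ceil(n_steps / tau0)) + 1
    # Inverse-transform sampling of Pareto waiting times; 1 - U lies in (0, 1].
    waits = tau0 * (1.0 - rng.random(n_max)) ** (-1.0 / alpha)
    jump_times = np.cumsum(waits)
    jumps = jump_std * rng.standard_normal(n_max)      # Gaussian jump lengths
    return ctrw_positions(jump_times, jumps, np.arange(1, n_steps + 1))

rng = np.random.default_rng(5)
trajs = np.array([simulate_ctrw(300, alpha=0.7, rng=rng) for _ in range(100)])
```

Because the walker is immobile between jumps, each trajectory is piecewise constant in time, with occasional very long plateaus produced by the heavy tail of the waiting times.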

Superstatistical process: by a superstatistical process [54, 55] we mean a process which is defined by equation (F.1) where the diffusion coefficient is a random variable, that is, there exists a distribution of diffusivities over the tracers in a single particle tracking experiment. The convolution of such distributions of diffusivities with a Gaussian distribution can give rise to non-Gaussian displacement distributions routinely observed in many experiments [57, 97–104]. Since the diffusivity has a Rayleigh-like distribution in many of these experiments, we applied the Rayleigh distribution for the diffusivity in our simulated superstatistical process [105],

$\begin{equation}p\left(D\right)=\frac{D}{{D}_{0}^{2}}\enspace \mathrm{exp}\left(-\frac{{D}^{2}}{2{D}_{0}^{2}}\right),\end{equation} \tag{ F.4 }$

where D₀ is the scale parameter of the Rayleigh distribution and is related to the mean diffusivity via $\langle D\rangle ={D}_{0}\sqrt{\left(\pi /2\right)}$.
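Such an ensemble can be sketched in a few lines: each trajectory is ordinary BM with its own diffusivity, drawn once from the Rayleigh distribution (F.4). The function name, the time step `dt` and the parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_superstatistical(n_traj, n_steps, d0=1.0, dt=1.0, rng=None):
    """BM trajectories with one Rayleigh-distributed diffusivity per trajectory."""
    rng = np.random.default_rng() if rng is None else rng
    diff = rng.rayleigh(scale=d0, size=n_traj)          # p(D) of equation (F.4)
    steps = np.sqrt(2.0 * diff[:, None] * dt) * rng.standard_normal((n_traj, n_steps))
    return np.cumsum(steps, axis=1), diff

trajs, diff = simulate_superstatistical(100, 300, d0=1.0, rng=np.random.default_rng(6))
print(diff.mean(), np.sqrt(np.pi / 2.0))   # sample mean vs. D0*sqrt(pi/2) for D0 = 1
```

Each individual trajectory is Gaussian, and the non-Gaussian displacement distribution only emerges when the displacements of the whole ensemble are pooled.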

Diffusing diffusivity: the minimal DD model can be expressed as the set of stochastic differential equations [56]

$\begin{equation}\frac{\mathrm{d}{X}_{\mathrm{D}\mathrm{D}}\left(t\right)}{\mathrm{d}t}=\sqrt{2D\left(t\right)}{\eta }_{1}\left(t\right),\end{equation} \tag{ F.5a }$

$\begin{equation}D\left(t\right)={Y}^{2}\left(t\right),\end{equation} \tag{ F.5b }$

$\begin{equation}\frac{\mathrm{d}Y\left(t\right)}{\mathrm{d}t}=-\frac{Y\left(t\right)}{{\tau }_{\mathrm{c}}}+\sigma {\eta }_{2}\left(t\right),\end{equation} \tag{ F.5c }$

where the time dependent diffusion coefficient is defined as the square of the OU process Y(t) and τ_c is the relaxation time to the stationary limit [106]. η₁(t) and η₂(t) are independent white Gaussian noise with zero mean and unit variance. In the long time, stationary limit the diffusion coefficients are distributed roughly exponentially [56],

$\begin{equation}p\left(D\right)={\left(\pi D{D}_{\star }\right)}^{-1/2}\enspace \mathrm{exp}\left[-D/{D}_{\star }\right],\end{equation} \tag{ F.6 }$

where D_⋆ = σ² τ_c. The TAMSD for this DD model grows linearly with lag time but the PDF of the process is non-Gaussian (Laplacian) for times less than the relaxation time τ_c, and it crosses over to a Gaussian PDF for t ≫ τ_c. This behaviour was seen in a number of experiments [97, 99].
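A possible discretisation of equations (F.5a)–(F.5c) is sketched below. The OU process Y(t) is advanced with its exact one-step update and started in the stationary state with variance σ²τ_c/2, while the position is advanced with a simple Euler step. The function name, the time step and the parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_dd(n_traj, n_steps, sigma=1.0, tau_c=10.0, dt=0.1, rng=None):
    """Minimal diffusing-diffusivity model; returns positions and D(t)."""
    rng = np.random.default_rng() if rng is None else rng
    var_st = 0.5 * sigma ** 2 * tau_c                  # stationary variance of Y
    decay = np.exp(-dt / tau_c)
    kick = np.sqrt(var_st * (1.0 - decay ** 2))
    y = np.sqrt(var_st) * rng.standard_normal(n_traj)  # stationary initial condition
    x = np.zeros((n_traj, n_steps + 1))
    diff = np.empty((n_traj, n_steps))
    for k in range(n_steps):
        diff[:, k] = y ** 2                                        # (F.5b)
        noise = rng.standard_normal(n_traj)
        x[:, k + 1] = x[:, k] + np.sqrt(2.0 * diff[:, k] * dt) * noise   # (F.5a)
        y = decay * y + kick * rng.standard_normal(n_traj)        # (F.5c), exact OU step
    return x, diff

x, diff = simulate_dd(2000, 100, rng=np.random.default_rng(11))
print(diff.mean())   # to be compared with the mean D_star / 2 of the PDF (F.6)
```

Choosing dt well below τ_c resolves the short-time non-Gaussian regime, while runs much longer than τ_c are needed to observe the crossover to the Gaussian PDF.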

Appendix G.: The Jarque–Bera test for Gaussianity

In statistics, the JB test is a goodness-of-fit test used to check whether the sample data have skewness and kurtosis matching those of the Gaussian distribution. The test statistic is always non-negative. If it is far from zero, we can suspect that the data are not drawn from a Gaussian distribution. The JB statistic for a random sample x₁, x₂, ..., x_n is defined as follows [107],

$\begin{equation}\mathrm{J}\mathrm{B}=\frac{n}{6}\left({S}^{2}+\frac{1}{4}{\left(K-3\right)}^{2}\right),\end{equation} \tag{ G.1 }$

where S and K are the empirical skewness and kurtosis, respectively.

In the literature, the JB test based on the JB statistic is considered as one of the most effective tests for Gaussianity. It is especially useful in the problem of recognition between heavy- and light-tailed (Gaussian) distributions of the data.
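
The statistic (G.1) can be computed directly from the sample moments. The following minimal sketch uses the moment estimators normalised by n. The function name is hypothetical, and the Gaussian and Laplace samples serve only as an illustration of light- and heavy-tailed data.

```python
import numpy as np

def jarque_bera_statistic(x):
    """JB statistic (G.1) from the empirical skewness S and kurtosis K."""
    x = np.asarray(x, dtype=float)
    n = x.size
    c = x - x.mean()
    m2 = np.mean(c ** 2)
    skew = np.mean(c ** 3) / m2 ** 1.5
    kurt = np.mean(c ** 4) / m2 ** 2
    return n / 6.0 * (skew ** 2 + 0.25 * (kurt - 3.0) ** 2)

rng = np.random.default_rng(8)
jb_gauss = jarque_bera_statistic(rng.standard_normal(5000))    # light tails
jb_laplace = jarque_bera_statistic(rng.laplace(size=5000))     # heavy tails
print(jb_gauss, jb_laplace)
```

Under the Gaussian null hypothesis the statistic is asymptotically χ²-distributed with two degrees of freedom, so values far above a few units indicate non-Gaussian data.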