
Linear dynamical modes as new variables for data-driven ENSO forecast


Abstract

A new data-driven model for analysis and prediction of spatially distributed time series is proposed. The model is based on a linear dynamical mode (LDM) decomposition of the observed data, derived from a recently developed nonlinear dimensionality reduction approach. The key point of this approach is its ability to take into account simple dynamical properties of the observed system by revealing the system’s dominant time scales. The LDMs are used as new variables for empirical construction of a nonlinear stochastic evolution operator. The method is applied to the sea surface temperature anomaly field in the tropical belt, where the El Niño–Southern Oscillation (ENSO) is the main mode of variability. The advantage of LDMs over the traditionally used empirical orthogonal function decomposition is demonstrated for these data. Specifically, it is shown that the new model has competitive ENSO forecast skill compared with other existing ENSO models.




Notes

  1. NOAA_ERSST_V4 data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/.


Acknowledgements

The results (Sects. 3–4) were obtained and analyzed with the support of the Government of the Russian Federation (Agreement #14.Z50.31.0033 with the Institute of Applied Physics of RAS). The methods (Sect. 2) were developed with the support of the Russian Science Foundation (Grant #16-12-10198).


Corresponding author

Correspondence to Andrey Gavrilov.

Appendices

Appendix 1: LDM decomposition

Here we describe the LDM decomposition, which is a special case of the more general NDM decomposition developed in Gavrilov et al. (2016), Mukhin et al. (2015a) and Mukhin et al. (2017).

LDM expression First, in accordance with Sect. IIB1 of Gavrilov et al. (2016), the initial data \({\mathbf {x}}=({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_N)\) are rotated to the full traditional EOF basis:

$$\begin{aligned} {\mathbf {y}}_n={\mathbf {V}}^T\cdot {\mathbf {x}}_n. \end{aligned}$$
(12)

Here \({\mathbf {x}}_n \in \mathbb {R}^D\) is the spatially distributed data measured at time moment n (D equals the number of nodes in the spatial grid), \({\mathbf {y}}_n \in \mathbb {R}^D\) is the vector of EOF-based PCs, and \({\mathbf {V}}\) is the full \(D \times D\) rotation matrix whose columns are the D EOFs obtained via diagonalization of the covariance matrix \(\frac{1}{N}\sum \nolimits _{n=1}^{N}{\mathbf {x}}_n\cdot {\mathbf {x}}_n^T\) and ordered so that the corresponding eigenvalues \(\lambda _1,\ldots ,\lambda _D\) decrease.
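For illustration, the rotation (12) amounts to a covariance eigendecomposition followed by a projection; a minimal NumPy sketch (the array `x` of shape (D, N) is a hypothetical stand-in for the observed anomaly field with the time mean removed):

```python
import numpy as np

def eof_rotation(x):
    """Rotate data to the full EOF basis, Eq. (12).

    x : (D, N) array -- D spatial grid nodes, N time moments (anomalies).
    Returns y (D, N) EOF-based PCs, V (D, D) with EOFs in columns,
    and the eigenvalues lam sorted in decreasing order.
    """
    D, N = x.shape
    C = (x @ x.T) / N                # covariance matrix (1/N) sum_n x_n x_n^T
    lam, V = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]    # reorder so that lambda_1 >= ... >= lambda_D
    lam, V = lam[order], V[:, order]
    y = V.T @ x                      # y_n = V^T x_n for all n at once
    return y, V, lam

# usage sketch on synthetic data
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 500))
x -= x.mean(axis=1, keepdims=True)   # remove the time mean
y, V, lam = eof_rotation(x)
```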

Next, the LDM model is constructed in the space of the K leading PCs:

$$\begin{aligned} \left( \begin{array}{c} y_{1,n} \\ \vdots \\ y_{K,n} \end{array} \right) = {\mathbf {A}}\cdot {\mathbf {p}}_n+{\mathbf {c}} + \sigma \left( \begin{array}{c} \varvec{\xi }_{1,n} \\ \vdots \\ \varvec{\xi }_{K,n} \end{array} \right) , \end{aligned}$$
(13)

Here \({\mathbf {p}}_n \in \mathbb {R}^d\) is the vector of LDM-based PCs, the \(K \times d\) matrix \({\mathbf {A}}\) and the \(K\)-dimensional vector \({\mathbf {c}}\) are the parameters of the LDM, \(\varvec{\xi }_{1,n},\ldots ,\varvec{\xi }_{K,n}\) are delta-correlated random normal processes, and \(\sigma\) is their amplitude.

By direct analogy with the geometrical interpretation of NDMs (Sect. IIA of Gavrilov et al. (2016)), the LDM defines a \(d\)-dimensional linear manifold in the space spanned by the K leading EOFs (we assume \(d \le K\)), and the time series \({\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_{N}\) are the values of the coordinates on this manifold. The number K is a hyperparameter which is optimized in the algorithm. To do this correctly, all the other PCs are modeled as noise:

$$\begin{aligned} \begin{array}{lcl} y_{K+1,n} &{} = &{} \sigma _{K + 1} \varvec{\xi }_{K + 1,n}, \\ ~~~\vdots &{}&{} \\ y_{D,n} &{} = &{} \sigma _{D} \varvec{\xi }_{D,n}. \end{array} \end{aligned}$$
(14)

Here \(\varvec{\xi }_{K+1,n},\ldots ,\varvec{\xi }_{D,n}\) are delta-correlated random normal processes and \(\sigma _{K+1},\ldots ,\sigma _D\) are their amplitudes (not to be confused with \(\sigma _k\) in Sect. 2.2.2).

Taking into account Eqs. (12)–(14), we can rewrite the LDM as \({\mathbf {L}}({\mathbf {p}}_n)\) introduced in Eq. (3):

$$\begin{aligned} {\mathbf {L}}({\mathbf {p}}_n) = {\mathbf {V}}\cdot \left( \begin{array}{c} \sum \limits _{i=1}^{d} A_{1i}p_{in}+c_1 \\ \vdots \\ \sum \limits _{i=1}^{d} A_{Ki}p_{in}+c_K \\ 0 \\ \vdots \\ 0 \end{array} \right) . \end{aligned}$$
(15)
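In code, the deterministic map (15) is an affine transformation of the LDM-based PCs placed in the first K EOF coordinates, zero-padded and rotated back to the data space. A minimal sketch in the notation of Eqs. (12)–(15):

```python
import numpy as np

def ldm_map(p, A, c, V):
    """Deterministic part of the LDM, Eq. (15).

    p : (d,) LDM-based PCs at one time moment
    A : (K, d) LDM matrix, c : (K,) offset vector
    V : (D, D) full EOF matrix from Eq. (12)
    """
    D, K = V.shape[0], A.shape[0]
    z = np.zeros(D)
    z[:K] = A @ p + c      # first K EOF coordinates; the remaining D - K are zero
    return V @ z           # rotate back from the EOF basis to the data space

# usage sketch with random parameters
rng = np.random.default_rng(1)
D, K, d = 10, 4, 2
V = np.linalg.qr(rng.standard_normal((D, D)))[0]   # any orthogonal matrix works here
A, c, p = rng.standard_normal((K, d)), rng.standard_normal(K), rng.standard_normal(d)
x_rec = ldm_map(p, A, c, V)
```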

LDM learning As mentioned in Sect. 2.1.2, to estimate both the values \({\mathbf {A}}\), \({\mathbf {c}}\), \(\varvec{\varsigma }=(\sigma ;\sigma _{K+1},\ldots ,\sigma _{D})\) and the time series \({\mathbf {p}}=({\mathbf {p}}_1,\ldots ,{\mathbf {p}}_N)\) in the described model we use a probabilistic Bayesian approach (Jeffreys 1998). According to this approach, each of the unknown parameters \({\mathbf {A}}\), \({\mathbf {c}}\), \(\varvec{\varsigma }\), \({\mathbf {p}}\) is assigned a prior PDF, and the parameters are then learned by maximization of the posterior PDF, which updates the priors using the observations \({\mathbf {x}}\) and the model (3). The choice of the prior PDFs is explained in detail in Appendix A of Gavrilov et al. (2016) for the case of one-dimensional NDMs and generalizes straightforwardly to the case of a multidimensional manifold (Mukhin et al. 2017). Here we present these PDFs rewritten for the special case of LDMs, omitting the detailed explanation.

The prior PDF for the PCs introduced in the main text is as follows:

$$\begin{aligned} \begin{array}{l} P_{pr}({\mathbf {p}}|\varvec{\tau }) ={\prod \limits _{i=1}^{d} \frac{1}{\sqrt{2\pi }}} \exp {\left( -\frac{\big |p_{i1}\big |^2}{2} \right) }\times \prod \limits _{i=1}^{d}\prod \limits _{n=1}^{N-1}\frac{1}{\sqrt{2\pi (1-\alpha _i^2)}}\exp {\left( -\frac{(p_{i,n+1} - \alpha _{i} \cdot p_{in})^2}{ 2 (1-\alpha _i^2)} \right) },\\ \alpha _{i}:=\exp \left( -{1}/{\tau _i}\right) . \end{array} \end{aligned}$$
(16)

This PDF corresponds to the prior assumption that the time series \(p_{i1},\ldots ,p_{iN}\) can be produced by the simplest linear stochastic evolution operator (red noise) with autocorrelation time \(\tau _i\), thus taking into account dynamical information about the connection between successive states (see also Appendix A1 of Gavrilov et al. 2016). The vector \(\varvec{\tau }\) is treated as a hyperparameter (see below), and no specific a priori knowledge about it is assumed.

One should keep in mind that the \(\tau _i\)’s are only the autocorrelation times of the a priori assumed PCs (prior PDF (16)). This prior PDF is, however, updated by the observations, consistent with the Bayesian framework. As a result, the \(\tau _i\)’s are not exactly equal to the autocorrelation times of the obtained PCs. In practice, they restrict the smallest scale of each PC, i.e. they filter out all scales faster than some value related to \(\tau _i\).
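For illustration, the log of the prior (16) for a single PC time series reduces to an AR(1) (red-noise) log-density; a minimal sketch, where `p` is a hypothetical 1-D array of PC values and `tau` the prior autocorrelation time:

```python
import numpy as np

def log_prior_pc(p, tau):
    """Log of the red-noise prior (16) for one PC time series.

    p : (N,) values of one LDM-based PC; tau : autocorrelation time tau_i > 0.
    """
    alpha = np.exp(-1.0 / tau)            # AR(1) coefficient alpha_i
    resid = p[1:] - alpha * p[:-1]        # innovations p_{i,n+1} - alpha_i p_{i,n}
    var = 1.0 - alpha**2                  # innovation variance for unit-variance AR(1)
    lp = -0.5 * p[0]**2 - 0.5 * np.log(2.0 * np.pi)        # first point ~ N(0, 1)
    lp += np.sum(-0.5 * resid**2 / var - 0.5 * np.log(2.0 * np.pi * var))
    return lp
```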

The prior PDF for the LDM coefficients \({\mathbf {A}}\) and \({\mathbf {c}}\) is chosen to complement the PDF (16):

$$\begin{aligned} \begin{array}{l} P_{pr}({\mathbf {A}},{\mathbf {c}}| K) =\prod \limits _{k=1}^{K}\left( \frac{d+1}{2\pi \lambda _k}\right) ^{\frac{d+1}{2}} \exp {\left( -\frac{d+1}{2\lambda _k} (c_k^2+\sum \limits _{i=1}^{d} A_{ki}^2)\right) }. \end{array} \end{aligned}$$
(17)

It is derived directly from Appendix A2 of Gavrilov et al. (2016) and ensures the uniqueness of the variances of the PCs (i.e. their normalization).
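A corresponding sketch of the log of the prior (17), with `lam` holding the K leading EOF eigenvalues \(\lambda _1,\ldots ,\lambda _K\):

```python
import numpy as np

def log_prior_Ac(A, c, lam):
    """Log of the prior (17) for the LDM coefficients.

    A : (K, d) matrix, c : (K,) vector, lam : (K,) leading eigenvalues lambda_k.
    """
    K, d = A.shape
    q = c**2 + np.sum(A**2, axis=1)       # c_k^2 + sum_i A_ki^2 for each k
    return np.sum(0.5 * (d + 1) * np.log((d + 1) / (2.0 * np.pi * lam))
                  - 0.5 * (d + 1) * q / lam)
```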

The prior PDF for \(\varvec{\varsigma }\) is simply taken to be constant, because the Bayesian problem is well-posed with respect to \(\varvec{\varsigma }\):

$$\begin{aligned} P_{pr}(\varvec{\varsigma }| K)=const. \end{aligned}$$
(18)

Thus, the Bayesian posterior PDF is expressed via the prior PDFs in a standard way:

$$P({\mathbf {A}}, {\mathbf {c}}, \varvec{\varsigma },{\mathbf {p}} \,|\,{\mathbf {x}},\varvec{\tau },K)\propto P({\mathbf {x}}|{\mathbf {A}},{\mathbf {c}},\varvec{\varsigma },{\mathbf {p}},\varvec{\tau },K)\times P_{pr}({\mathbf {p}}|\varvec{\tau }) ~ P_{pr}({\mathbf {A}},{\mathbf {c}}| K) ~ P_{pr}(\varvec{\varsigma }| K).$$
(19)

Here the likelihood of the LDM, \(P({\mathbf {x}}|{\mathbf {A}},{\mathbf {c}},\varvec{\varsigma },{\mathbf {p}},\varvec{\tau },K)\), is the only function depending on the observations. It follows from the Gaussianity of the noise in our LDM model (12)–(15):

$$\begin{aligned} \begin{array}{l} P({\mathbf {x}}|{\mathbf {A}},{\mathbf {c}},\varvec{\varsigma },{\mathbf {p}},\varvec{\tau },K) =P({\mathbf {V}}^T{\mathbf {x}}|{\mathbf {A}},{\mathbf {c}},\varvec{\varsigma },{\mathbf {p}},\varvec{\tau },K)\\ =\prod \limits _{n=1}^N \prod \limits _{k=1}^K \frac{1}{\sqrt{2\pi \sigma ^2}} \exp {\left[ -\frac{\left| \sum \nolimits _{i=1}^D V_{ik}x_{in}-\sum \nolimits _{i=1}^{d} A_{ki}p_{in}-c_k\right| ^2}{ 2 \sigma ^2} \right] } \\ \times \prod \limits _{n=1}^{N}\prod \limits _{k=K+1}^{D}\frac{1}{\sqrt{2\pi \sigma _{k}^2}} \exp {\left[ -\frac{\left| \sum \nolimits _{i=1}^D V_{ik}x_{in}\right| ^2}{ 2 \sigma _{k}^2} \right] }. \end{array} \end{aligned}$$
(20)

The parameters \({\mathbf {A}}\), \({\mathbf {c}}\), \(\varvec{\varsigma }\), \({\mathbf {p}}\) of the LDM are found by maximization of (19). Note that the exponential factors in (20) are quadratic in \({\mathbf {p}}\) as well as in the coefficients \({\mathbf {A}}\), \({\mathbf {c}}\), and the number of coefficients is proportional to \(d+1\). In the case of a general NDM parameterized by a nonlinear polynomial, the exponential factor would be a polynomial of higher order in \({\mathbf {p}}\), and the number of coefficients would grow dramatically with the manifold dimension d and the polynomial degree m (proportionally to \(C^{m}_{m+d}\)). These facts make the maximization problem well-posed (the solution is unique) and numerically fast for LDM dimensions \(d>2\), in contrast with the case of a general NDM.
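For illustration, the log of the likelihood (20) can be assembled as in the following sketch (array shapes follow the notation above; this is an illustration under stated conventions, not the authors' implementation):

```python
import numpy as np

def log_likelihood_ldm(x, V, A, c, p, sigma, sigma_tail):
    """Log of the LDM likelihood, Eq. (20).

    x : (D, N) data; V : (D, D) EOF matrix; A : (K, d), c : (K,) LDM parameters;
    p : (d, N) LDM-based PCs; sigma : scalar noise amplitude of the K leading PCs;
    sigma_tail : (D - K,) amplitudes sigma_{K+1}, ..., sigma_D.
    """
    K = A.shape[0]
    y = V.T @ x                                  # rotate to the EOF basis, Eq. (12)
    resid = y[:K] - (A @ p + c[:, None])         # misfit of the K leading PCs
    ll = np.sum(-0.5 * resid**2 / sigma**2 - 0.5 * np.log(2.0 * np.pi * sigma**2))
    s2 = sigma_tail[:, None]**2                  # the remaining PCs are pure noise
    ll += np.sum(-0.5 * y[K:]**2 / s2 - 0.5 * np.log(2.0 * np.pi * s2))
    return ll
```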

LDM optimization To find the hyperparameters \((\varvec{\tau },K)\), the Bayesian evidence, or marginal likelihood, is used as a cost function. It represents the probability that the model with the given hyperparameters reproduces the observed data and is defined as follows:

$$\begin{aligned} P({\mathbf {x}}|\varvec{\tau },K)&= \int \limits _{{\mathbf {A}},{\mathbf {c}},\varvec{\varsigma },{\mathbf {p}}} P({\mathbf {x}}|{\mathbf {A}},{\mathbf {c}},\varvec{\varsigma },{\mathbf {p}},\varvec{\tau },K) \\ & \quad \times P_{pr}({\mathbf {p}}|\varvec{\tau }) ~P_{pr}({\mathbf {A}},{\mathbf {c}}| K) ~P_{pr}(\varvec{\varsigma }| K) \, d{\mathbf {A}}\, d{\mathbf {c}}\, d\varvec{\varsigma }\, d{\mathbf {p}}. \end{aligned}$$
(21)

Such a cost function allows us to find, in a Bayesian way, optimal hyperparameters for which the model is neither underfitted nor overfitted. It is explored in detail in Gavrilov et al. (2016).

To estimate (21) for different hyperparameters and find the optimal LDM, we use exactly the same algorithm based on the Laplace approximation as given in Gavrilov et al. (2016).
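As a generic sketch of the idea (not the authors' specific algorithm), the Laplace approximation replaces the integrand in (21) by a Gaussian centered at the posterior mode, so the log-evidence becomes the maximized log joint density plus a Hessian correction:

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_evidence(neg_log_joint, theta0):
    """Laplace approximation of log of the integral of exp(-neg_log_joint(theta)).

    neg_log_joint : callable on the stacked parameter vector theta, equal to the
                    negative log of (likelihood x priors) for fixed hyperparameters
    theta0        : (M,) initial guess for the posterior mode
    """
    res = minimize(neg_log_joint, theta0, method="BFGS")
    M = res.x.size
    H = np.linalg.inv(res.hess_inv)       # (approximate) Hessian at the mode
    _, logdet = np.linalg.slogdet(H)
    return -res.fun + 0.5 * M * np.log(2.0 * np.pi) - 0.5 * logdet
```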

Appendix 2: Bayesian approach to evolution operator construction

Here we describe the procedure for constructing the evolution operator for the PCs. It is generally analogous to the procedure described in Molkov et al. (2011), Molkov et al. (2012), Mukhin et al. (2015b) and Gavrilov et al. (2017), but with slight modifications. Following this procedure, we first formulate the model and the prior PDFs for its parameters explicitly.

The evolution operator model is defined by Eq. (5), where we parameterize the function \(\varvec{\varphi }\) by an ANN in the form of a perceptron with one hidden layer and hyperbolic tangent activation function (Hornik et al. 1989):

$$\begin{aligned} \varvec{\varphi }({\mathbf {z}},{\mathbf {f}},t)=\sum \limits _{i=1}^m(\varvec{\alpha }_{i} + t\varvec{\beta }_{i})\cdot \tanh (\varvec{\omega }_{i}^T{\mathbf {z}}+\varvec{\nu }_{i}^T{\mathbf {f}}+\gamma _{i}). \end{aligned}$$
(22)

Here \({\mathbf {z}}\) stands for the collection of \({\mathbf {p}}_{n-1},\ldots ,{\mathbf {p}}_{n-l}\) from (5); \(\varvec{\alpha }_i\in \mathbb {R}^{d}\), \(\varvec{\beta }_i\in \mathbb {R}^{d}\), \(\varvec{\omega }_i\in \mathbb {R}^{ld}\), \(\varvec{\nu }_i\in \mathbb {R}^{\dim {\mathbf {f}}}\), \(\gamma _i\in \mathbb {R}\) are the ANN coefficients; we collect them into the vector \(\varvec{\mu }_{\varphi }=(\varvec{\alpha }_1,\varvec{\beta }_1,\varvec{\omega }_1,\varvec{\nu }_1,\gamma _1,\ldots ,\varvec{\alpha }_m,\varvec{\beta }_m,\varvec{\omega }_m,\varvec{\nu }_m,\gamma _m)\).

Generally, to model slow non-stationarity of the parameters of the real system’s evolution operator, it is appropriate to introduce a weak time dependence into all parameters of \(\varvec{\varphi }\). To first order, this leads to a linear dependence of \(\varvec{\varphi }\) on time t. For the particular case of an ANN, however, it can readily be shown that the form (22) provides a universal approximation of such an expansion for any nonlinear function of \(({\mathbf {z}},{\mathbf {f}},t)\) (Molkov et al. 2011).
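A minimal sketch of the forward pass (22) in NumPy, with array shapes following the dimensions listed above:

```python
import numpy as np

def phi(z, f, t, alpha, beta, omega, nu, gamma):
    """Perceptron parameterization of the evolution operator, Eq. (22).

    z : (l*d,) concatenated delayed states p_{n-1}, ..., p_{n-l}
    f : (dim_f,) forcing at time n; t : scalar time
    alpha, beta : (m, d); omega : (m, l*d); nu : (m, dim_f); gamma : (m,)
    Returns the d-dimensional value of phi(z, f, t).
    """
    h = np.tanh(omega @ z + nu @ f + gamma)               # (m,) hidden-layer outputs
    return ((alpha + t * beta) * h[:, None]).sum(axis=0)  # weighted sum over m neurons
```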

The matrix \(\hat{{\mathbf {g}}}\) in the random part of the model is parameterized as a lower-triangular matrix via its entries \(\varvec{\mu }_{g}=(g_{11}; \ g_{21}, \ g_{22};\,\, \ldots ; g_{d1}, \ldots , \ g_{dd})\), which is sufficient for representing an arbitrary cross-correlation matrix of the random part (Mukhin et al. 2015b).

The prior PDF for the ANN coefficients is derived in Molkov et al. (2011) and Molkov et al. (2012) for the case of normalized ANN inputs and outputs. Here we have multidimensional time series whose components have different norms, and this difference is important. Taking the norms into account, the expressions from Molkov et al. (2011, 2012) yield the following modified prior PDF for the model parameters:

$$\begin{aligned} P_{pr}(\varvec{\mu }_\varphi ,\varvec{\mu }_g|m)&\propto \prod \limits _{i=1}^{m}\prod \limits _{k=1}^{d} \sqrt{\frac{m}{2\pi s_k }} \exp {\left( -\frac{m\alpha _{ki}^2}{2 s_k } \right) } \\ &\quad \times \prod \limits _{i=1}^{m}\prod \limits _{j=1}^{ld} \sqrt{\frac{1}{2\pi \sigma _\omega ^2}} \exp {\left( -\frac{\omega _{ji}^2}{2\sigma _\omega ^2} \right) } \times \quad \prod \limits _{i=1}^{m}\prod \limits _{j=1}^{\dim {\mathbf {f}}} \sqrt{\frac{1}{2\pi \sigma _\nu ^2}} \exp {\left( -\frac{\nu _{ji}^2}{2\sigma _\nu ^2} \right) } \\ &\quad\times \prod \limits _{i=1}^{m} \sqrt{\frac{1}{2\pi \sigma _\gamma ^2}} \exp {\left( -\frac{\gamma _{i}^2}{2\sigma _\gamma ^2} \right) }. \end{aligned}$$
(23)

Here \(s_k=\frac{1}{N-1}\sum \nolimits _{n=2}^{N}(p_{k,n}-p_{k,n-1})^2\), \(\sigma _\omega ^2=2Nd/\sum \nolimits _{n=1}^{N}|{\mathbf {p}}_{n}|^2\), \(\sigma _\nu ^2=2N\dim {\mathbf {f}}/\sum \nolimits _{n=1}^{N}|{\mathbf {f}}_{n}|^2\), and \(\sigma _\gamma ^2=2\cdot (ld+\dim {\mathbf {f}})\). In the case of the EOF-ANN model the time series \({\mathbf {p}}_n\) were used with their original variances, while in the case of the LDM-ANN model \({\mathbf {p}}_n\) were normalized to unit variance, because their variance does not carry any information about the system. The forcing time series were always normalized to equal variances.
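For illustration, these scales are computed directly from the PC and forcing time series; a minimal sketch:

```python
import numpy as np

def prior_scales(p, f, l):
    """Scales s_k, sigma_omega^2, sigma_nu^2, sigma_gamma^2 entering the prior (23).

    p : (d, N) PC time series; f : (dim_f, N) forcing time series; l : number of delays.
    """
    d, N = p.shape
    dim_f = f.shape[0]
    s = np.sum(np.diff(p, axis=1)**2, axis=1) / (N - 1)   # s_k, increments of each PC
    sigma_omega2 = 2.0 * N * d / np.sum(p**2)             # scale of the omega weights
    sigma_nu2 = 2.0 * N * dim_f / np.sum(f**2)            # scale of the nu weights
    sigma_gamma2 = 2.0 * (l * d + dim_f)                  # scale of the biases gamma
    return s, sigma_omega2, sigma_nu2, sigma_gamma2
```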

In the notation defined above, the probability of generating the time series \({\mathbf {p}}\) by the model is the following:

$$\begin{aligned} & P({\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_N |\varvec{\mu }_\varphi ,\varvec{\mu }_g,m,l) \\ &\quad =\prod \limits _{n=1}^{l}\prod \limits _{k=1}^{d}\frac{1}{\sqrt{2\pi \lambda _k}}\exp {\left( -\frac{p_{k,n}^2}{2 \lambda _k } \right) } \\ &\qquad \times\prod \limits _{n=l+1}^{N}\frac{1}{\sqrt{\det (2\pi \hat{{\mathbf {g}}}\hat{{\mathbf {g}}}^T)}}\exp {\left( -\frac{{\mathbf {h}}_n^T (\hat{{\mathbf {g}}}\hat{{\mathbf {g}}}^T)^{-1} {\mathbf {h}}_n}{2} \right) },\\ {\mathbf {h}}_n &= {\mathbf {p}}_{n}-{\mathbf {p}}_{n-1}-\varvec{\varphi }\big ({\mathbf {p}}_{n-1},\ldots ,{\mathbf {p}}_{n-l};\,\, {\mathbf {f}}_n;t_n\big ). \end{aligned}$$
(24)

Here \(\lambda _k=\frac{1}{N}\sum \nolimits _{n=1}^{N}p_{k,n}^2\). Note that in the expression (24) the initializing values \({\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_{l}\) are treated as model parameters via the first term (see Gavrilov et al. 2017 for a detailed explanation).
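A sketch of the log of the likelihood (24), assuming `phi` is a callable implementing Eq. (22) and `g` is the lower-triangular noise matrix (an illustration under the shape conventions above, not the authors' code):

```python
import numpy as np

def log_likelihood_evolution(p, f, t, l, phi, g):
    """Log of the model likelihood, Eq. (24).

    p : (d, N) PC time series; f : (dim_f, N) forcing; t : (N,) times;
    l : number of delays; phi : callable (z, f_n, t_n) -> (d,);
    g : (d, d) lower-triangular noise matrix.
    """
    d, N = p.shape
    lam = np.sum(p**2, axis=1) / N                        # lambda_k = (1/N) sum_n p_kn^2
    ll = np.sum(-0.5 * p[:, :l]**2 / lam[:, None]         # first l states as parameters
                - 0.5 * np.log(2.0 * np.pi * lam[:, None]))
    G = g @ g.T                                           # noise covariance g g^T
    Ginv = np.linalg.inv(G)
    _, logdetG = np.linalg.slogdet(2.0 * np.pi * G)
    for n in range(l, N):
        z = p[:, n - l:n][:, ::-1].ravel(order="F")       # p_{n-1}, ..., p_{n-l}
        h = p[:, n] - p[:, n - 1] - phi(z, f[:, n], t[n])  # residual h_n
        ll += -0.5 * h @ Ginv @ h - 0.5 * logdetG
    return ll
```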

The further procedure of model construction is similar to the procedure for the LDM described above. To learn the model parameters, we maximize the Bayesian posterior PDF, which can be expressed as follows:

$$\begin{aligned} \begin{array}{l} P(\varvec{\mu }_\varphi ,\varvec{\mu }_g | {\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_N,m,l)\propto \\ ~~~~~~~~ P({\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_N |\varvec{\mu }_\varphi ,\varvec{\mu }_g,m,l)\times P_{pr}(\varvec{\mu }_\varphi ,\varvec{\mu }_g|m). \end{array} \end{aligned}$$
(25)

To find the optimal hyperparameters \((l, m)\), we maximize the marginal likelihood:

$$\begin{aligned} & P( {\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_N | m,l )\\ &\quad = \int \limits _{\varvec{\mu }_\varphi ,\varvec{\mu }_g}P({\mathbf {p}}_{1},\ldots ,{\mathbf {p}}_N |\varvec{\mu }_\varphi ,\varvec{\mu }_g,m,l)\times P_{pr}(\varvec{\mu }_\varphi ,\varvec{\mu }_g|m) \, d\varvec{\mu }_\varphi \, d\varvec{\mu }_g. \end{aligned}$$
(26)

The particular algorithm for learning and optimization using (25) and (26) is the same as described in Gavrilov et al. (2017).

Cite this article

Gavrilov, A., Seleznev, A., Mukhin, D. et al. Linear dynamical modes as new variables for data-driven ENSO forecast. Clim Dyn 52, 2199–2216 (2019). https://doi.org/10.1007/s00382-018-4255-7
