1 Introduction

1.1 Predictions Based on Neural Networks

This section reviews how neural networks may predict the temporal evolution of signals.

1.1.1 Prediction by Static (Feed-Forward) NN

Each neuron processes a set of input values. Each input is associated with a weight, a variable value that may be determined by supervised or unsupervised training techniques such as data clustering, together with a bias. A neuron’s output is computed from its weights and bias. In the context of classification, all such activities require labeled datasets, and thus supervised learning. In supervised learning, humans verify that the neural network’s predictions are accurate, which helps the network learn how labels and data are related. Face identification, image recognition and labeling, voice detection, and speech transcription are a few examples. Through classification, deep learning can link the pixels in a picture to a person’s name. Grouping, or clustering, is the identification of commonalities; note that labels are not always necessary for a deep learning model to detect them. Unsupervised learning occurs when a system learns on its own, without helpful human labels to draw from. This opens the possibility of creating extremely precise models. Customer churn analysis is an example of a clustering application.

As is well known, predictive analytics uses methods like predictive modeling and machine learning to examine historical data and forecast future patterns [7]. Neural networks differ from conventional forecasting techniques. Linear regression, the most popular model, is in fact a rather straightforward approach to problem-solving compared to a neural network. Linear regression models use only input and output nodes to generate predictions; a neural network also uses hidden layers, which improve prediction accuracy because the network learns similarly to how people do. So why is neural-network prediction not used by everyone? Neural networks can be prohibitively expensive due to their high computational requirements, and massive data sets are required to train them, which an organization might not have. But as computing technology becomes more affordable, the first obstacle could soon vanish, and thanks to technologies like Artificial Neural Networks (ANNs) there may soon be far fewer "unpleasant shocks".

1.1.2 Prediction by Dynamic (with Feedback) NN

ANNs are often regarded as effective instruments for modeling intricate, nonlinear systems with imprecise dynamic models. ANNs were first utilized as reliable predictors of various processes with a static dependence on input–output data. When an ANN must be used to describe an approximate model of time-dependent input–output interactions, the time effect should be included, which necessitates the creation of a dynamic ANN, or DNN [11]. In continuous-time modelling, we will refer to DNNs as Differential Neural Networks. The review [13] sets out the different recurrent and differential forms of Dynamic Neural Networks (DNN), their mathematical construction, and techniques for adjusting the network weights. The characteristics of DNNs motivate their use in representing the dynamics of decontamination processes. That review details recent findings on the application of DNNs for the modelling and control of treatment systems based on either biological or chemical processes, and describes the modeling application of DNNs to common methods used in the treatment of wastewater, contaminated soil, and the atmosphere. The major benefits of using an approximate DNN-based model instead of designing a complex mathematical description for each treatment are analyzed with a view to enhancing the efficiency of the decontamination treatment. In this paper, we also highlight the remarkable efficiency of DNNs as a keystone tool for the modelling of epidemics [15, 18].

1.2 On Mathematical Predictions of Epidemics

In the last few years, researchers and government officials have used computer-based models to try to forecast the course of the coronavirus pandemic (see, for example, [2, 8, 10, 19]). Several mathematical models have been developed to predict the future of coronavirus disease 2019 (COVID-19) outbreaks globally. These forecasts have a significant impact on how soon and how forcefully governments respond to an outbreak. However, rather than producing accurate quantitative predictions regarding the magnitude or duration of illness burdens, the primary and most efficient application of epidemiological models is to evaluate the relative efficacy of different interventions in lowering the disease burden.

Several studies remark that models are hardly crystal balls when it comes to making predictions, and according to science journalist Miles O’Brien (PBS NewsHour), "all of them require human assumptions" [1]. The creation of these models and their eventual goals are more sophisticated than many of us think, according to research periodicals. Our world is complex and has more data than knowledge. The Global Epidemic and Mobility Model, or GLEAM, is curated by a group of bio-statisticians at Seattle’s Fred Hutchinson Cancer Research Center [3]. They create mathematical models that describe how infections spread chaotically and exponentially. According to one projection, 17,000 to 29,300 additional fatalities were likely to be reported in the US for the week ending February 13, 2021, alone, totaling 465,000 to 508,000 COVID-19 deaths by that time. The accuracy of mathematical forecasts in battling epidemics is still a work in progress. Nevertheless, creating such disease prediction models is a crucial issue for scientific societies worldwide. It necessitates prompt and comprehensive answers, including potential applications in defining new policies and prevention schemes.

1.3 Main Concepts of This Paper

The results presented here are based on three principal concepts:

  • Although we have hundreds of years of theoretical knowledge on how to create mathematical models of infectious diseases, have any of these models ever been put to the test using all of the data sources at our disposal in real time? No. We are, in effect, building this automobile while it is already hurtling down the highway, learning more about these models as we go. For a more accurate model design, it is very difficult to take into account all of the human factors (social, informational, climatic, and others) acting during the course of a disease.

  • Any recommended model must account for the inherent uncertainties associated with the most recent data. For instance, we lack sufficient statistical data to verify all of the conditions that must be satisfied in order to use the available stochastic prediction models (such as the Kalman filter or any of its modifications, with their requirements that noises have Gaussian distributions with known covariance matrices, that the model be locally linear, that all participating parameters be known exactly, and so on). We have only one data trajectory (realization), which makes it complicated to apply statistical concepts such as mathematical expectation (mean value), variance, and confidence intervals. Nor can we repeat the experiment to obtain at least one other data curve. This indicates that a statistical approach is not applicable to this kind of problem!

  • Given the previous items, we suppose that the current data set represents the output of some dynamic system governed by a nonlinear ordinary differential equation and may be modeled by a Differential Neural Network with time-varying weight matrix parameters whose dynamics are governed by special learning laws containing slow and fast components.

All results reported below confirm the good performance of the suggested approach.

2 DNN Model with Slow and Fast Learning

2.1 Ideas of a Prediction Algorithm for Models with Complete Information

2.1.1 Non-causal Model

Consider initially an ideal scenario in which we know that the following mathematical model produces the scalar output \(x\left( t\right) \in {\mathbb {R}}\) of a dynamic plant:

$$\begin{aligned} \left. \begin{array}{c} x^{\left( n\right) }\left( t\right) =f\left( t,x\left( t\right) ,{\dot{x}} \left( t\right) ,\ldots ,x^{\left( n-1\right) }\left( t\right) \right) \\ \\ x^{\left( r\right) }\left( 0\right) =x_{0}^{\left( r\right) }, r=0,1,\ldots ,n-1 \end{array} \right\} \end{aligned}$$
(1)

where the nonlinear function \(f:{\mathbb {R}}_{+}\times {\mathbb {R}} ^{n}\rightarrow {\mathbb {R}}\) and the initial conditions \(x_{0}^{\left( r\right) }\) are supposed to be known exactly. Defining the vector \({\textbf{x}}\left( t\right) \in {\mathbb {R}}^{n}\) with components

$$\begin{aligned} x_{1}\left( t\right) :=x\left( t\right) ,\; x_{2}\left( t\right) :={\dot{x}}_{1}\left( t\right) ,\;\ldots ,\; x_{n-1}\left( t\right) :={\dot{x}}_{n-2}\left( t\right) ,\; x_{n}\left( t\right) :={\dot{x}}_{n-1}\left( t\right) , \end{aligned}$$
(2)

we can represent (1) as

$$\begin{aligned} \left. \begin{array}{c} {{\dot{\textbf{x}}}}\left( t\right) ={\textbf{F}}\left( t,{\textbf{x}}\left( t\right) \right) ={{\textbf{A}}}{{\textbf{x}}}\left( t\right) \mathbf {+b}v(t) \\ \\ {\textbf{A}}=\left( \begin{array}{ccccc} 0 &{} 1 &{} 0 &{} \cdots &{} 0 \\ 0 &{} 0 &{} 1 &{} \cdots &{} 0 \\ \vdots &{} &{} \ddots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} 0 &{} 1 \\ 0 &{} 0 &{} \cdots &{} 0 &{} 0 \end{array} \right) \in {\mathbb {R}}^{n\times n},\quad {\textbf{b}}=\left( \begin{array}{c} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{array} \right) \in {\mathbb {R}}^{n\times 1} \\ v(t)=f\left( t,x_{1}\left( t\right) ,x_{2}\left( t\right) ,\ldots ,x_{n}\left( t\right) \right) \in {\mathbb {R}} \end{array} \right\} \end{aligned}$$
(3)

In the corresponding integral form, the differential model (3) can be rewritten as

$$\begin{aligned} \begin{array}{c} {\textbf{x}}\left( t+T\right) ={\textbf{x}}\left( t\right) +\displaystyle \int \limits _{\tau =t}^{t+T}{\textbf{F}}\left( \tau ,{\textbf{x}}\left( \tau \right) \right) d\tau = {\textbf{x}}\left( t\right) +T\,{\textbf{r}}\left( t,T\right) \\ {\textbf{r}}\left( t,T\right) :=\dfrac{1}{T}\displaystyle \int \limits _{\tau =t}^{t+T} {\textbf{F}}\left( \tau ,{\textbf{x}}\left( \tau \right) \right) d\tau \end{array} \end{aligned}$$
(4)

where the variable \({\textbf{r}}\left( t,T\right) \) represents the "averaged rate" of change of the considered output variable \({\textbf{x}}\left( t\right) \) on the time interval \(\left[ t,t+T\right] \). Considering the data set \( \left\{ {\textbf{x}}\left( \tau \right) \right\} _{\tau \in \left[ 0,t\right] } \) as the information on the process available up to the moment t, we may conclude that \({\textbf{r}}\left( t,T\right) \) (4) contains information on the nearest future \(\left\{ {\textbf{x}}\left( \tau \right) \right\} _{\tau \in \left[ t,t+T\right] }\) with horizon T and hence may be considered as “non-causal”.

2.1.2 Causal Approximation

Introduce the standard delay operator \(e^{-sT}\) and differentiation operator s, acting as

$$\begin{aligned} e^{-sT}f\left( t\right) =f\left( t-T\right) ,sf\left( t\right) =f^{\prime }\left( t\right) \end{aligned}$$

and using the local approximation

$$\begin{aligned} e^{sT}\simeq 1+sT+\dfrac{T^{2}}{2}s^{2}+\dfrac{T^{3}}{6}s^{3} \end{aligned}$$

for the "forecasting operator" \(e^{sT}\), we can obtain the following approximate relation:

$$\begin{aligned} \left. \begin{array}{c} {\textbf{r}}\left( t,T\right) =e^{sT}e^{-sT}{\textbf{r}}\left( t,T\right) =e^{sT} {\textbf{r}}\left( t-T,T\right) \simeq \\ \left( 1+sT+\dfrac{T^{2}}{2}s^{2}+\dfrac{T^{3}}{6}s^{3}\right) {\textbf{r}} \left( t-T,T\right) ={\textbf{r}}\left( t-T,T\right) + \\ T\,{{\dot{\textbf{r}}}}\left( t-T,T\right) +\dfrac{T^{2}}{2}\,{\ddot{\textbf{r}}} \left( t-T,T\right) +\dfrac{T^{3}}{6}\,\mathbf {\dddot{r}}\left( t-T,T\right) := {\textbf{r}}_{caus}\left( t,T\right) , \end{array} \right\} \end{aligned}$$
(5)

where

$$\begin{aligned} {\textbf{r}}\left( t-T,T\right) =\dfrac{1}{T}\displaystyle \int \limits _{\tau =t-T}^{t} {\textbf{F}}\left( \tau ,{\textbf{x}}\left( \tau \right) \right) d\tau , \end{aligned}$$
(6)

depends only on the available information \(\left\{ {\textbf{x}}\left( \tau \right) \right\} _{\tau \in \left[ t-T,t\right] }\). Hence, the new variable \({\textbf{r}}_{caus}\left( t,T\right) \) (5) can be treated as the "causal approximation" of the variable \({\textbf{r}}\left( t,T\right) \), and the integral representation (4) of the considered dynamics (1) can be locally approximated as

$$\begin{aligned} {\textbf{x}}\left( t+T\right) \simeq {\textbf{x}}\left( t\right) +T\,{\textbf{r}} _{caus}\left( t,T\right) . \end{aligned}$$
(7)

Remark 1

Since the right-hand side of (7) contains only the information \(\left\{ x\left( \tau \right) \right\} _{\tau \in \left[ t-T,t \right] }\), available up to time t, we can consider the value \({\textbf{x}}\left( t+T\right) \) as the "prediction (or forecast)" of the process \( \left\{ {\textbf{x}}\left( \tau \right) \right\} _{\tau \in \left[ 0,t\right] } \) ahead over the horizon T.
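To make the prediction step concrete, consider the following minimal numerical sketch of (5)–(7) (Python; the routine names `r_avg` and `predict`, the test signal, and the horizon are illustrative assumptions, not part of the paper's toolbox). It exploits the fact that, along the true dynamics, the averaged rate (6) telescopes to \(\left( {\textbf{x}}\left( t\right) -{\textbf{x}}\left( t-T\right) \right) /T\):

```python
import numpy as np

def r_avg(x, i, w, dt):
    """Averaged rate (6): along the true dynamics, (1/T) * integral of F
    over [t - T, t] telescopes to (x(t) - x(t - T)) / T, with T = w * dt."""
    return (x[i] - x[i - w]) / (w * dt)

def predict(x, i, w, dt):
    """Causal prediction (7) of x at time t_i + T from samples up to index i."""
    T = w * dt
    # sample r(. - T, T) at the four most recent grid points ...
    r = np.array([r_avg(x, j, w, dt) for j in (i - 3, i - 2, i - 1, i)])
    # ... and form its time derivatives by backward finite differences
    r1 = (r[3] - r[2]) / dt
    r2 = (r[3] - 2 * r[2] + r[1]) / dt**2
    r3 = (r[3] - 3 * r[2] + 3 * r[1] - r[0]) / dt**3
    r_caus = r[3] + T * r1 + T**2 / 2 * r2 + T**3 / 6 * r3   # eq. (5)
    return x[i] + T * r_caus                                  # eq. (7)

# toy check on a smooth signal: predict two time units ahead
dt = 0.01
t = np.arange(0.0, 12.0, dt)
x = np.sin(0.5 * t)
w = 200                                   # horizon T = w * dt = 2.0
i = 999                                   # "present" time t_i = 9.99
print(predict(x, i, w, dt), np.sin(0.5 * (t[i] + w * dt)))
```

The two printed numbers agree up to the truncation error of the third-order expansion of \(e^{sT}\).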

3 Prediction Algorithm for Models with Incomplete Information: DNN Approach

When the original dynamics \({\textbf{F}}\left( t,{\textbf{x}}\left( t\right) \right) \) in (3) is completely or partially unknown, we suggest applying the DNN approach [11], which has shown good results when applied to various problems in bioengineering and environmental science [12, 13].

3.1 DNN Identification Model

Artificial neural networks (ANNs) are regarded as effective modeling tools for nonlinear, complicated systems with ambiguous dynamic models. ANNs were first utilized as reliable predictors of various processes with a static dependence on input–output data. The time effect must be included in an ANN when it is used to characterize an approximate model of time-dependent input–output relationships, which necessitates the construction of a dynamic ANN: Recurrent Neural Networks (RNNs) in discrete time or Differential Neural Networks (DNNs) in continuous time. DNNs, sometimes referred to as Auto Associative or Feedback Networks, are a subclass of ANNs in which the connections between the input and the output are organized into a directed cycle. As a result, the network develops an internal state that displays dynamic, temporally dependent behavior. A DNN allows signals to travel both forward and backward by including loops in the network design or topology. To achieve the required behavior of the DNN, a particular tuning of the time-dependent weight matrix parameters is carried out. In our scenario, following [11], we define the single-layer DNN model, in which the measurable output x(t) is a vector, as

$$\begin{aligned} \left. \begin{array}{c} \dfrac{d}{dt}{\hat{\textbf{x}}}\left( t\right) {=A\hat{\textbf{x}}}\left( t\right) \mathbf {{+}b}x_{1}^{\left( n\right) }\left( t\right) {+}\hat{W}\left( t\right) \sigma \left( {\hat{\textbf{x}}}\left( t\right) \right) {+}L\left[ x_{1}(t){-}C^{\top }{\hat{\textbf{x}}}\left( t\right) \right] , \\ \\ {\hat{\textbf{x}}}\left( 0\right) {=}{\textbf{x}}\left( 0\right) \in {\mathbb {R}} ^{n}, C^{\top }{=}\left( 1,0,\ldots ,0\right) \in {\mathbb {R}}^{n}, \hat{W}\left( t\right) \in {\mathbb {R}}^{n\times p},\sigma :{\mathbb {R}} ^{n}\rightarrow {\mathbb {R}}^{p}, \end{array} \right\} \end{aligned}$$
(8)

where

  • \(\sigma ^{\top }\left( {\hat{\textbf{x}}}\right) =\left( \sigma _{1}\left( {\hat{\textbf{x}}}\right) ,\sigma _{2}\left( {\hat{\textbf{x}}} \right) ,\ldots ,\sigma _{p}\left( {\hat{\textbf{x}}}\right) \right) \) is the vector with sigmoidal components

    $$\begin{aligned} \sigma _{j}\left( {\hat{\textbf{x}}}\right) =\frac{\alpha _{j}}{1+\beta _{j}e^{-\gamma _{j}^{\top }{\hat{\textbf{x}}}}}+\delta _{j}, j=1,\ldots ,p \end{aligned}$$

    (\(\alpha _{j},\) \(\beta _{j}\) and \(\delta _{j}\) are positive scalars and \( \gamma _{j}\in {\mathbb {R}} ^{n}\) is a weighting vector for the components of \({\hat{\textbf{x}}}\));

  • \(\hat{W}\left( t\right) \) is the weight matrix, changing in time according to the Learning Procedure (LP)

    $$\begin{aligned} \left. \begin{array}{c} \dfrac{d}{dt}\hat{W}\left( t\right) =K^{-1}P\left[ \mathbf {x(}t{)- \hat{\textbf{x}}(}t\mathbf {)}\right] \sigma ^{\top }\left( {\hat{\textbf{x}}} \left( t\right) \right) \\ \\ 0<K=K^{\top }\in {\mathbb {R}}^{n\times n},0<P=P^{\top }\in {\mathbb {R}}^{n\times n} \end{array} \right\} \end{aligned}$$
    (9)
  • The vector L \(\in {\mathbb {R}} ^{n}\) must be selected in such a way that

    $$\begin{aligned} \begin{array}{c} L\in {\mathbb {R}}^{n\times 1}:A^{0}(L)=A-LC^{\top }\text { is Hurwitz,} \\ \textrm{spectrum}\left( A^{0}(L)\right) \in {\mathbb {C}} ^{-}\text {.} \end{array} \end{aligned}$$

As mentioned in [11], by a special selection of the matrix P we may guarantee a good DNN approximation (identification) \( {\hat{\textbf{x}}}\left( t\right) \simeq {\textbf{x}}\left( t\right) \) for practically all \(t\ge 0\). The next subsection explains how the algorithms (8) and (9) should be modified to generate a good prediction trajectory \({\hat{\textbf{x}}}\left( t+T\right) \) using only the available information \(\left\{ {\hat{\textbf{x}}} \left( \tau \right) \right\} _{\tau \in \left[ t-T,t\right] }\).
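As a rough illustration of how the identifier (8) and the learning law (9) operate together, the following sketch integrates both by an explicit Euler scheme for a second-order toy plant (Python; the plant, the gains, the sigmoid parameters, and the initialization of \(\hat{W}\) are illustrative assumptions, not the values used later in this paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, dt, steps = 2, 8, 1e-3, 10_000

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # companion matrix as in (3)
b = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])
L_gain = np.array([4.0, 4.0])            # A - L C^T has poles {-2, -2}: Hurwitz
KinvP = 10.0 * np.eye(n)                 # K^{-1} P, with K, P symmetric pos. def.

Gamma = rng.normal(size=(p, n))          # gamma_j weighting vectors

def sigma(xh):
    # sigmoidal components with alpha_j = beta_j = 1, delta_j = 0
    return 1.0 / (1.0 + np.exp(-Gamma @ xh))

x = np.array([1.0, 0.0])                 # plant state
xh = x.copy()                            # estimate, x_hat(0) = x(0) as in (8)
W = 0.1 * rng.normal(size=(n, p))        # nonzero start so the learning acts

for _ in range(steps):
    xddot = -x[0] - 0.5 * x[1]           # x1^(n)(t), assumed measurable in (8)
    e1 = x[0] - C @ xh                   # output error x1 - C^T x_hat
    s = sigma(xh)
    xh = xh + dt * (A @ xh + b * xddot + W @ s + L_gain * e1)   # eq. (8)
    W = W + dt * (KinvP @ (x - xh))[:, None] * s[None, :]       # eq. (9)
    x = x + dt * (A @ x + b * xddot)     # integrate the plant itself

print("identification error:", np.linalg.norm(x - xh))
```

The printed error stays small, reflecting the combined action of the output-correction term and the weight adaptation.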

3.2 DNN Prediction Model

The DNN dynamics (8) in the integral causal format (7) may be represented as

$$\begin{aligned} {\hat{\textbf{x}}}\left( t+T\right) ={\hat{\textbf{x}}}\left( t\right) +T\, {\hat{\textbf{r}}}_{caus}\left( t,T\right) , \end{aligned}$$
(10)

where

  • the signal \({\hat{\textbf{x}}}\left( t\right) \) is generated by (8),

  • the auxiliary vector \({\hat{\textbf{r}}}_{caus}\left( t,T\right) \) is defined as

    $$\begin{aligned} \begin{array}{c} {\hat{\textbf{r}}}_{caus}\left( t,T\right) \text {:=}{\hat{\textbf{r}}}\left( t-T,T\right) +T\dfrac{d}{dt}{\hat{\textbf{r}}}\left( t-T,T\right) \\ + \dfrac{T^{2}}{2}\dfrac{d^{2}}{dt^{2}}{\hat{\textbf{r}}}\left( t-T,T\right) + \dfrac{T^{3}}{6}\dfrac{d^{3}}{dt^{3}}{\hat{\textbf{r}}}\left( t-T,T\right) \end{array} \end{aligned}$$
    (11)

    with

    $$\begin{aligned} \left. \begin{array}{c} {\hat{\textbf{r}}}\left( t-T,T\right) :=\dfrac{1}{T}\displaystyle \int \limits _{\tau =t-T}^{t}{\hat{\textbf{F}}}\left( \tau ,{\textbf{x}}\left( \tau \right) \right) d\tau , \\ \\ {\hat{\textbf{F}}}\left( \tau ,{\textbf{x}}\left( \tau \right) \right) :={ A\hat{\textbf{x}}}\left( \tau \right) \mathbf {+b}x_{1}^{\left( n\right) }\left( \tau \right) + \hat{W}\left( \tau \right) \sigma \left( {\hat{\textbf{x}}}\left( \tau \right) \right) +{L}\left[ x_{1}(\tau )-C^{\top }{\hat{\textbf{x}}}\left( \tau \right) \right] , \end{array} \right\} \nonumber \\ \end{aligned}$$
    (12)
  • the derivatives \(x_{1}^{\left( m\right) }\left( t\right) \), \(\left( m=1,\ldots ,n\right) \) and \(\dfrac{d^{k}}{dt^{k}}{\hat{\textbf{r}}}\left( t-T,T\right) \), \(\left( k=1,2,3\right) \) are calculated recurrently based on the "super-twisting algorithm" [9, 16]. To differentiate a time function \(f\left( t\right) \), the super-twisting controller is designed to reduce the error s(t) (\(s=x-f\)) between its input \(f\left( t\right) \) and its output x(t) to zero:

    $$\begin{aligned} \left. \begin{array}{c} {\dot{x}}(t)=-\alpha \sqrt{|s(t)|}\textrm{sign}(s(t))+y(t), \\ {\dot{y}}(t)=-M\,\textrm{sign}(s(t)) \\ \left| \ddot{f}\right| <F_{0},M>F_{0} \end{array} \right\} \end{aligned}$$
    (13)

    The error s(t) is reduced to zero after a finite time interval \(t_{0}\), after which the state component y(t) equals the first time derivative of the function \(f\left( t\right) \), namely, \(y(t)=\frac{d}{dt}f\left( t\right) \) for all \( t\ge t_{0}\). If \(f\left( t\right) \) is corrupted by a bounded noise \( \left| s(t)\right| \le \Delta =\textrm{const}\), then an upper bound of the differentiation error is given by the inequality

    $$\begin{aligned} \left| y(t)-\frac{d}{dt}f\left( t\right) \right| \le \alpha _{1}\Delta +\alpha _{2}\sqrt{\Delta },\quad \alpha _{1},\alpha _{2}\text { positive constants.} \end{aligned}$$
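A minimal simulation of the differentiator (13) may look as follows (Python; the test signal, the gains \(\alpha \) and M, and the step size are illustrative assumptions):

```python
import numpy as np

dt = 1e-4
t = np.arange(0.0, 5.0, dt)
# test signal with |f''| <= 4, slightly corrupted by bounded noise
f = np.sin(2.0 * t) + 1e-3 * np.random.default_rng(1).normal(size=t.size)

alpha, M = 6.0, 8.0                      # M > F0 = 4 dominates |f''|
x, y = f[0], 0.0
y_hist = np.empty_like(t)

for k, fk in enumerate(f):
    s = x - fk                           # sliding variable s = x - f
    x += dt * (-alpha * np.sqrt(abs(s)) * np.sign(s) + y)   # eq. (13)
    y += dt * (-M * np.sign(s))
    y_hist[k] = y                        # y converges to df/dt in finite time

# after convergence, compare with the true derivative 2 cos(2t)
mask = t > 4.0
print(np.abs(y_hist[mask] - 2.0 * np.cos(2.0 * t[mask])).max())
```

The residual error is bounded by the noise-dependent inequality above rather than vanishing exactly.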

3.3 DNN Predictor with Slow and Fast Components

There are several systems whose trajectories can be understood as the overlapping of signals formed by the combination of slow and fast components. Such systems are also known as multi-rate systems, which appear naturally in mobile robotics [4], chemical [17] and biochemical [6] reactions, the evolution of diseases [14], the evolution of animal populations in ecosystems [5], and many other areas. The same type of combined dynamics is valid for describing the evolution of both infected and deceased persons suffering from the COVID-19 disease.

The developed DNN structure with a mixed (slow and fast) learning scheme could be useful to represent the dynamics of COVID-19. This is justified by the observation that the evolution of infected and deceased persons can be represented as the combination of a slow dynamics defined by seasonal variations and a fast evolution corresponding to the daily evolution. Notice that such multi-rate dynamics has not previously been considered in the design of non-parametric identifiers based on differential neural networks, which is indeed a contribution of this study.

3.3.1 Slow Predictive Component

Based on available data \(\left\{ x\left( \tau \right) \right\} _{\tau \in \left[ 0,t\right] }\) let us reconstruct a "slow" trajectory \(\left\{ x_{slow}\left( \tau \right) \right\} _{\tau \in \left[ 0,t\right] }\) defined as the best least squares polynomial approximation of a given order N, that is,

$$\begin{aligned} \left. \begin{array}{c} x_{slow}\left( t\right) =\sum \limits _{i=0}^{N}{\bar{c}}_{i}t^{i}, \\ \\ {{\bar{\textbf{c}}}}=\arg \underset{{\textbf{c}}\in {\mathbb {R}}^{N+1}}{\min } \displaystyle \int \limits _{\tau =0}^{t}\left( x\left( \tau \right) -\sum \limits _{i=0}^{N}c_{i}\tau ^{i}\right) ^{2}d\tau \\ =\left( \displaystyle \int \limits _{\tau =0}^{t}\mathbf {\tau }\left( \tau \right) \mathbf {\tau }^{\top }\left( \tau \right) d\tau \right) ^{-1}\displaystyle \int \limits _{\tau =0}^{t}x\left( \tau \right) \mathbf {\tau }\left( \tau \right) d\tau , \\ {{\bar{\textbf{c}}}}^{\top }:=\left( {\bar{c}}_{0},\ldots ,{\bar{c}}_{N}\right) \in {\mathbb {R}}^{N+1},\mathbf {\tau }^{\top }\left( \tau \right) :=\left( 1,\tau ,\tau ^{2},\ldots ,\tau ^{N}\right) \in {\mathbb {R}}^{N+1} \end{array} \right\} \end{aligned}$$
(14)

The behavior of the trajectory \({{\textbf{x}}}_{slow}\left( t\right) \) is shown in Fig. 1 for the COVID-19 case.

Fig. 1
figure 1

Comparative evolution of the slow part of the infected cases over time using the corresponding DNN with its state \({\textbf{x}}_{slow}\left( t\right) \)
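Numerically, the fit (14) reduces to an ordinary polynomial least squares problem, and the fast residual of Sect. 3.3.2 follows immediately; a minimal sketch (Python; the order N and the test signal are illustrative assumptions):

```python
import numpy as np

def slow_fast_split(x, t, N=5):
    """Least squares polynomial fit (14) and the residual fast part (19)."""
    V = np.vander(t, N + 1, increasing=True)   # columns 1, t, t^2, ..., t^N
    # solves the normal equations (V^T V) c = V^T x in a numerically stable way
    c, *_ = np.linalg.lstsq(V, x, rcond=None)
    x_slow = V @ c
    return x_slow, x - x_slow

t = np.linspace(0.0, 10.0, 500)
x = 0.1 * t**2 + np.sin(6.0 * t)               # slow trend + fast oscillation
x_slow, x_fast = slow_fast_split(x, t)
print(np.abs(x_fast - np.sin(6.0 * t)).max())  # residual ~ the fast oscillation
```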

Then, as in (10), (11) and (12), define \( {\hat{\textbf{x}}}_{slow}\left( t+T\right) \) as

$$\begin{aligned} {\hat{\textbf{x}}}_{slow}\left( t+T\right) ={\hat{\textbf{x}}}_{slow}\left( t\right) +T\,{\hat{\textbf{r}}}_{caus}^{slow}\left( t,T\right) , \end{aligned}$$
(15)

where

  • \({\hat{\textbf{x}}}_{slow}\left( t\right) \) is generated by the following DNN model:

    $$\begin{aligned} \left. \begin{array}{c} \dfrac{d}{dt}{\hat{\textbf{x}}}_{slow}\left( t\right) {=A\hat{\textbf{x}}} _{slow}\left( t\right) \mathbf {+b}x_{1,slow}^{\left( n\right) }\left( t\right) + \\ \hat{W}_{slow}\left( t\right) \sigma \left( {\hat{\textbf{x}}} _{slow}\left( t\right) \right) +L\left[ x_{1,slow}(t)-C^{\top } {\hat{\textbf{x}}}_{slow}\left( t\right) \right] , \\ \dfrac{d}{dt}\hat{W}_{slow}\left( t\right) =K^{-1}P\left( {\textbf{x}} _{slow}\mathbf {(t)}-\hat{\textbf{x}}_{slow}\mathbf {(}t\mathbf {)} \right) \sigma ^{\top }\left( {\hat{\textbf{x}}}_{slow}\left( t\right) \right) \\ {\hat{\textbf{x}}}_{slow}\left( 0\right) ={\textbf{x}}\left( 0\right) \in {\mathbb {R}}^{n},C^{\top }=\left( 1,0,\ldots ,0\right) \in {\mathbb {R}} ^{n}, \\ \hat{W}_{slow}\left( t\right) \in {\mathbb {R}}^{n\times p},\sigma : {\mathbb {R}}^{n}\rightarrow {\mathbb {R}}^{p}, \end{array} \right\} \end{aligned}$$
    (16)
  • the auxiliary vector \({\hat{\textbf{r}}}_{caus}^{slow}\left( t,T\right) \) is defined as

    $$\begin{aligned} \left. \begin{array}{c} {\hat{\textbf{r}}}_{caus}^{slow}\left( t,T\right) :={\hat{\textbf{r}}} ^{slow}\left( t-T,T\right) +T\dfrac{d}{dt}{\hat{\textbf{r}}}^{slow}\left( t-T,T\right) + \\ \dfrac{T^{2}}{2}\dfrac{d^{2}}{dt^{2}}{\hat{\textbf{r}}}^{slow}\left( t-T,T\right) +\dfrac{T^{3}}{6}\dfrac{d^{3}}{dt^{3}}{\hat{\textbf{r}}} ^{slow}\left( t-T,T\right) , \end{array} \right\} \end{aligned}$$
    (17)

    with

    $$\begin{aligned} \left. \begin{array}{c} {\hat{\textbf{r}}}^{slow}\left( t-T,T\right) :=\dfrac{1}{T} \displaystyle \int \limits _{\tau =t-T}^{t}{\hat{\textbf{F}}}^{slow}\left( \tau ,{\textbf{x}} _{slow}\left( \tau \right) \right) d\tau , \\ \\ {\hat{\textbf{F}}}^{slow}\left( \tau ,{\textbf{x}}_{slow}\left( \tau \right) \right) :={A\hat{\textbf{x}}}_{slow}\left( \tau \right) \mathbf {+b} x_{1,slow}^{\left( n\right) }\left( \tau \right) + \\ \hat{W}\left( \tau \right) \sigma \left( {\hat{\textbf{x}}}_{slow}\left( \tau \right) \right) +{L}\left[ x_{1,slow}(\tau )-C^{\top } {\hat{\textbf{x}}}_{slow}\left( \tau \right) \right] , \end{array} \right\} \end{aligned}$$
    (18)

3.3.2 Fast Predictive Component

Define \(x_{fast}\left( t\right) \) as

$$\begin{aligned} x_{fast}\left( t\right) :=x\left( t\right) -x_{slow}\left( t\right) \end{aligned}$$
(19)

The behavior of the trajectory \({\textbf{x}}_{fast}\left( t\right) \) is shown in Fig. 2 for the COVID-19 case.

Fig. 2
figure 2

Comparative evolution of the fast part of the infected cases over time using the corresponding DNN with its state \({\textbf{x}}_{fast}\left( t\right) \)

Then, as in (16), (17) and (18), define \({\hat{\textbf{x}}}_{fast}\left( t+T\right) \) as

$$\begin{aligned} {\hat{\textbf{x}}}_{fast}\left( t+T\right) ={\hat{\textbf{x}}}_{fast}\left( t\right) +T\,{\hat{\textbf{r}}}_{caus}^{fast}\left( t,T\right) , \end{aligned}$$
(20)

where

  • \({\hat{\textbf{x}}}_{fast}\left( t\right) \) is generated by the following DNN model:

    $$\begin{aligned} \left. \begin{array}{c} \dfrac{d}{dt}{\hat{\textbf{x}}}_{fast}\left( t\right) {=A\hat{\textbf{x}}} _{fast}\left( t\right) \mathbf {+b}x_{1,fast}^{\left( n\right) }\left( t\right) + \\ \hat{W}_{fast}\left( t\right) \sigma \left( {\hat{\textbf{x}}}_{fast}\left( t\right) \right) +L\left[ x_{1,fast}(t)-C^{\top }{\hat{\textbf{x}}} _{fast}\left( t\right) \right] , \\ \\ \dfrac{d}{dt}\hat{W}_{fast}\left( t\right) =K^{-1}P\left( {\textbf{x}}_{fast} \mathbf {(}t)-{\hat{\textbf{x}}}_{fast}\mathbf {(}t\mathbf {)}\right) \sigma ^{\top }\left( {\hat{\textbf{x}}}_{fast}\left( t\right) \right) \\ \\ {\hat{\textbf{x}}}_{fast}\left( 0\right) ={\textbf{x}}\left( 0\right) \in {\mathbb {R}}^{n},C^{\top }=\left( 1,0,\ldots ,0\right) \in {\mathbb {R}} ^{n}, \\ \hat{W}_{fast}\left( t\right) \in {\mathbb {R}}^{n\times p}, \sigma : {\mathbb {R}}^{n}\rightarrow {\mathbb {R}}^{p}, \end{array} \right\} \end{aligned}$$
    (21)
  • the auxiliary vector \({\hat{\textbf{r}}}_{caus}^{fast}\left( t,T\right) \) is defined as

    $$\begin{aligned} \left. \begin{array}{c} {\hat{\textbf{r}}}_{caus}^{fast}\left( t,T\right) :={\hat{\textbf{r}}} ^{fast}\left( t-T,T\right) +T\dfrac{d}{dt}{\hat{\textbf{r}}}^{fast}\left( t-T,T\right) + \\ \dfrac{T^{2}}{2}\dfrac{d^{2}}{dt^{2}}{\hat{\textbf{r}}}^{fast}\left( t-T,T\right) +\dfrac{T^{3}}{6}\dfrac{d^{3}}{dt^{3}}{\hat{\textbf{r}}} ^{fast}\left( t-T,T\right) , \end{array} \right\} \end{aligned}$$
    (22)

    with

    $$\begin{aligned} \left. \begin{array}{c} {\hat{\textbf{r}}}^{fast}\left( t-T,T\right) :=\dfrac{1}{T}\displaystyle \int \limits _{\tau =t-T}^{t}{\hat{\textbf{F}}}^{fast}\left( \tau ,{\textbf{x}}_{fast}\left( \tau \right) \right) d\tau , \\ {\hat{\textbf{F}}}^{fast}\left( \tau ,{\textbf{x}}_{fast}\left( \tau \right) \right) :={A\hat{\textbf{x}}}_{fast}\left( \tau \right) \mathbf {+b} x_{1,fast}^{\left( n\right) }\left( \tau \right) + \\ \hat{W}\left( \tau \right) \sigma \left( {\hat{\textbf{x}}}_{fast}\left( \tau \right) \right) +L\left[ x_{1,fast}(\tau )-C^{\top }{\hat{\textbf{x}}} _{fast}\left( \tau \right) \right] . \end{array} \right\} \end{aligned}$$
    (23)

3.4 Joint Slow and Fast Predictor

In this paper, we use a more advanced predictor consisting of two components, slow \({\hat{\textbf{x}}}_{slow}\left( t+T\right) \) and fast \( {\hat{\textbf{x}}}_{fast}\left( t+T\right) \), namely,

$$\begin{aligned} {\hat{\textbf{x}}}\left( t+T\right) ={\hat{\textbf{x}}}_{slow}\left( t+T\right) +{\hat{\textbf{x}}}_{fast}\left( t+T\right) \text {.} \end{aligned}$$
(24)
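In a numerical implementation, the joint step (24) is a plain sum of the two component predictions. Schematically (Python; `slow_fast_split` and `predict` are the hypothetical helper routines sketched in Sects. 3.3.1 and 2.1.2, not part of the paper's toolbox):

```python
# joint slow/fast prediction step, cf. (15), (20) and (24)
x_slow, x_fast = slow_fast_split(x, t)     # split per (14) and (19)
x_slow_pred = predict(x_slow, i, w, dt)    # slow causal step, cf. (15)
x_fast_pred = predict(x_fast, i, w, dt)    # fast causal step, cf. (20)
x_pred = x_slow_pred + x_fast_pred         # joint predictor, eq. (24)
```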

4 Structure of Numerical Procedure

The suggested predictive numerical structure consists of the following steps:

  1. 1.

    Based on the given discrete-time data \(\left\{ x\left( k\right) \right\} _{k\in \left[ 0,1,\ldots {\mathcal {K}}\right] }\), where \(x\left( k\right) \) corresponds to the data value at day k, and applying a spline approximation (in the examples herein, we use a 15th-order spline), we construct the continuous-time curve \(\left\{ x\left( \tau \right) \right\} _{\tau \in \left[ 0,t\right] }\), where \(t={\mathcal {K}}\Delta \) (\(\Delta \) is the time interval between discrete data points); a sketch of this step follows the list.

  2. 2.

    Then, using (14) and (19), and based on the obtained curve \(\left\{ x\left( \tau \right) \right\} _{\tau \in \left[ 0,t \right] }\), we construct the slow \(x_{slow}\left( t\right) \) and fast \(x_{fast}\left( t\right) \) trajectories.

  3. 3.

    Applying the procedures (16), (17) and (18), we obtain the slow predictive curve \({\hat{\textbf{x}}}_{slow}\left( t+T\right) \) (15).

  4. 4.

    Then, applying the procedures (21), (22) and (23), we obtain the fast predictive curve \( {\hat{\textbf{x}}}_{fast}\left( t+T\right) \) (20).

  5. 5.

    The last step is to construct the final predictive curve \({\hat{\textbf{x}}}\left( t+T\right) \) (24) for the desired T (for example, taking \(T=60,90,120\) days for the COVID-19 prediction).
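As mentioned in step 1, the interpolation stage can be sketched as follows (Python; SciPy's splines are limited to degree 5, so a cubic spline stands in here for the 15th-order spline, and the synthetic daily counts are an illustrative assumption):

```python
import numpy as np
from scipy.interpolate import CubicSpline

days = np.arange(0, 90)                              # k = 0, 1, ..., K
cases = 1000.0 / (1.0 + np.exp(-(days - 45) / 7.0))  # synthetic daily data
x_of_t = CubicSpline(days, cases)                    # continuous curve x(tau)

tau = np.linspace(0.0, 89.0, 2000)                   # dense evaluation grid
x_cont = x_of_t(tau)                                 # input for (14) and (19)
print(x_of_t(45.5))                                  # value between two days
```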

The corresponding block scheme is shown in Fig. 3.

Fig. 3
figure 3

Flow diagram describing how the proposed forecasting evolution is derived using the tools included in the toolbox

The developed algorithm was implemented according to the following pseudocode.

  1. 1.

    Load information corresponding to infected or deceased patients suffered from COVID-19 sickness

  2. 2.

    Interpolate the loaded information using a third-order spline strategy

  3. 3.

    Implement a p-th order low-pass filter with a finite-impulse-response (FIR) strategy using a cut-off frequency of 0.5 Hz. This frequency was determined using the collected information. The filter order p is fixed to 7, considering the evolution of the COVID information.

  4. 4.

    Filter the loaded information, separating the slow and fast components of the infected or deceased datasets according to the selected cut-off frequency (a sketch of this stage follows the list).

  5. 5.

    Develop the slow learning algorithm in the first differential neural network, implemented as a non-parametric identifier.

  6. 6.

    Develop the fast learning algorithm in the second differential neural network, implemented as a non-parametric identifier.

  7. 7.

    Divide the information into the training period and the complementary validation period.

  8. 8.

    Evaluate both the slow and fast identifiers to reproduce the information corresponding to the training period.

  9. 9.

    Repeat the identification task until the least mean square identification error for both the slow and the fast learning is smaller than a given threshold value \(\varepsilon \).

  10. 10.

    Once the expected training quality is reached, recover the values of the weights produced during this part of the process for both the slow and fast evaluations.

  11. 11.

    Develop the numerical simulation of the differential neural network working as the predictor, using two models with the weights recovered from the slow and the fast evolutions of the training algorithms.

  12. 12.

    Add the results of the slow and fast predictors to reconstruct the information during the prediction period.

  13. 13.

    Compare, if possible, the information obtained from the COVID statistics during the prediction period with the data obtained during the evaluation of the combined identifier.

  14. 14.

    Determine the least mean square error and the maximum error for the prediction period, if possible, to characterize the quality of the prediction task.
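A minimal sketch of the filtering stage in steps 3 and 4 (Python; the FIR design below is an assumption — the stated cut-off is re-expressed here as a normalized cutoff of 0.05 of the Nyquist frequency, and the synthetic series is illustrative):

```python
import numpy as np
from scipy.signal import firwin, filtfilt

rng = np.random.default_rng(2)
days = np.arange(365)
x = np.cumsum(rng.normal(size=365)) + 50.0 * np.sin(days / 58.0)

taps = firwin(numtaps=8, cutoff=0.05)    # order p = 7 -> 8 taps; cutoff is
                                         # normalized to the Nyquist frequency
x_slow = filtfilt(taps, 1.0, x)          # zero-phase low-pass filtering
x_fast = x - x_slow                      # fast component, as in (19)
print(x_slow[:3], x_fast[:3])
```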

5 Seventy Days Prediction of Infections and Deaths for Different Countries

This research uses the publicly available dataset “2019 Novel Coronavirus Data Repository”, published by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) and available at https://github.com/CSSEGISandData/COVID-19. The models achieved and the code used in their generation are available in a repository (COVID-19 MLP, Riteh AI and Robotics Group, 2020) located at https://github.com/RitehAIandRobot/COVID-19-MLP.

The presented set of numerical simulations considered a temporal horizon of 70 days. All the selected parameters were obtained using the Hurwitz conditions for \(A-{\textbf {L}}C^{\top }\). The values of the parameters in the activation functions were obtained using a uniform distribution for the exponential term, with unitary gain and a fixed offset of 0.5. The parameters used for solving the numerical simulation in this study were the following:

$$\begin{aligned} A=1.0\cdot 10^{-2} \cdot \left[ \begin{array}{cccc} -25 &{} 0 &{} 0 &{} 0 \\ -8 &{} -32 &{} 0 &{} 0 \\ -12 &{} -11 &{} -44 &{} 0 \\ -13 &{} -13 &{} -12 &{} -52 \end{array} \right] \end{aligned}$$
(25)

The number of sigmoidal functions (artificial neurons in the DNN) used for the identification process was 9600. The parameters in the sigmoidal functions were \(\alpha _{j} = 1\) for all \( j=1,\ldots ,9600\). The parameters in the denominator were \(\beta _{j} = 0.05\) and \(\beta _{j} = 0.08\) for all \(j=1,\ldots ,9600\). The period T was fixed to 10 days. All the initial conditions were fixed as random values between 0 and 1. These selections were obtained using a trial-and-error method that effectively estimated the number of persons infected with and deceased from the SARS-CoV-2 virus. These estimations were evaluated against the collected information reported by the World Health Organization.
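Under the stated choices, the activation parameters can be assembled as in the following sketch (Python; the uniform range for \(\gamma _{j}\), the use of a single \(\beta _{j}\) value, and n = 4 are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 4, 9600                               # state dimension, neuron count
alpha = np.ones(p)                           # unitary gains alpha_j
beta = np.full(p, 0.05)                      # denominator parameter beta_j
delta = np.full(p, 0.5)                      # fixed offset of 0.5
Gamma = rng.uniform(-1.0, 1.0, size=(p, n))  # uniform exponential-term weights

def sigma(xhat):
    """Vector of the p sigmoidal components defined in Sect. 3.1."""
    return alpha / (1.0 + beta * np.exp(-Gamma @ xhat)) + delta

print(sigma(np.zeros(n))[:3])                # each component = 1/1.05 + 0.5
```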

The values of matrices K, P, and L are as follows:

$$\begin{aligned} \left. \begin{array}{c} K= \left[ \begin{array}{cccc} 25.0 &{} 1.2 &{} -1.5 &{} 2.3 \\ 1.2 &{} 32.0 &{} -1.1 &{} -1.3 \\ -1.5 &{} -1.1 &{} 44 &{} 1.2 \\ 2.3 &{} -1.3 &{} 1.2 &{} 52 \end{array} \right] , \\ P= \left[ \begin{array}{cccc} 2.5 &{} 0.8 &{} -1.2 &{} -0.3 \\ 0.8 &{} 5.2 &{} -1.1 &{} -0.1 \\ -1.2 &{} -1.1 &{} 6.4 &{} -1.2 \\ -0.3 &{} -0.1 &{} -1.2 &{} 9.2 \end{array} \right] , \\ L= \left[ \begin{array}{cccc} -15 &{} -28 &{} -32 &{} -53 \end{array} \right] ^{\top }. \end{array} \right\} \end{aligned}$$
(26)

5.1 Turkey

Figure 4 compares the estimated and actual evolution of the data for infected people in Turkey. The comparison of trajectories confirms at first glance the effectiveness of the proposed DNN-based forecasting over an estimation period of 70 days, as well as the effective estimation of the forecast information.

Fig. 4
figure 4

Prediction results with future estimation for the infected people detected in Turkey using the proposed DNN-based predictor

Figure 5 depicts the evolution of the predicted data for deceased people in Turkey using the proposed multi-rate identifier. The figure compares the estimated data with the actual data.

Fig. 5
figure 5

Prediction results with future estimation for the deceased people detected in Turkey using the proposed DNN-based predictor

5.2 USA

Figure 6 compares the estimated and actual evolution of the data for infected people in the United States of America. The comparison of trajectories confirms at first glance the effectiveness of the proposed DNN-based forecasting over a period of 70 days.

Fig. 6
figure 6

Prediction results with future estimation for the infected people detected in the United States of America using the proposed DNN-based predictor

Figure 7 depicts the evolution of the predicted data for deceased people in the United States of America using the proposed multi-rate identifier. The figure compares the estimated data with the actual data.

All the previous results confirm the effectiveness of the proposed forecaster based on the dual DNN configuration. Moreover, the proposed technique can easily be implemented in different forecast problems, taking advantage of the generalized formulation presented here.

For both studied cases, we include some methods used for comparison: a traditional recurrent neural network (RNN), a long short-term memory (LSTM) network, and a gated recurrent unit (GRU). These networks were considered for comparison taking into account their significant outcomes as potential predictors of complex time-dependent information. We present two tables (one for infected and one for deceased persons) comparing several quality measurements, including the least mean square evaluation of the signals corresponding to the evolution of infected and deceased persons during the COVID outbreak (Tables 1 and 2). With the aim of introducing a fair comparison, the number of flops used for each of the prediction tasks was also estimated. These results confirm the advances generated by applying the proposed predictor based on the combined learning method introduced in this study.

Table 1 Comparison results for infected persons showing the prediction outcomes for the infected persons and the flops used to perform the task
Table 2 Comparison results for deceased persons showing the prediction outcomes for the deceased persons and the flops used to perform the task
Fig. 7
figure 7

Prediction results with future estimation for the deceased people detected in the United States of America using the proposed DNN-based predictor

The outcomes shown in the previous tables confirm the benefits of the proposed methodology, including the prediction quality as well as the convergence conditions (noting the maximum error value). However, the increased number of flops required by the methodology considered in this study still calls for some work to improve the prediction abilities. Moreover, the better least mean square errors obtained with the proposed methodology highlight the benefit of introducing the mixed learning with slow and fast dynamics.

6 Conclusions

  • In this paper, it is shown that time-series data may be effectively modeled by a Differential Neural Network (DNN) with time-varying weight matrix parameters whose dynamics are governed by special learning laws containing slow and fast components;

  • This study also demonstrates one of the possible applications of the suggested technique, COVID-19 epidemic prediction, where we suppose that the current data set represents the output of some dynamic system governed by a nonlinear ordinary differential equation; the method has been evaluated on the databases of two nations (Turkey and the USA) and has demonstrated good performance (70 days of forecast).