1 Introduction

We consider time-homogeneous stochastic differential equations (SDEs) of the form

$$\begin{aligned} dX_t=\mu (X_t) dt + \sigma (X_t) dW_t\,, \quad X_0=x\,, \end{aligned}$$
(1)

where \(x\in {\mathbb {R}}^d\) is the initial value, \(\mu :{\mathbb {R}}^d \longrightarrow {\mathbb {R}}^d\) is the drift and \(\sigma : {\mathbb {R}}^d \longrightarrow {\mathbb {R}}^{d \times d}\) is the diffusion coefficient.

The Euler–Maruyama approximation with step-size \(\delta >0\) of the solution to (1) is given by

$$\begin{aligned} X^\delta _t=x+\int _0^t \mu (X^\delta _{\underline{s}}) ds + \int _0^t \sigma (X^\delta _{\underline{s}}) dW_s\,, \end{aligned}$$
(2)

with \({\underline{s}}=j\delta \) for \(s\in [j \delta , (j+1)\delta )\), \(j=0,\ldots ,(T-\delta )/\delta \). In particular, for \(t\in \{j \delta : j=0,\ldots ,(T-\delta )/\delta \}\), we have

$$\begin{aligned} X^\delta _{t+\delta }=X^\delta _{t}+\mu (X^\delta _{t}) \delta + \sigma (X^\delta _{t}) (W_{t+\delta }-W_{t})\,. \end{aligned}$$
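For illustration, the recursion above can be implemented in a few lines. The following sketch (in Python, with \(\mu \), \(\sigma \), the initial value and the step size as user-supplied placeholders) is only meant to make the scheme concrete; it is not part of the analysis.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, delta, rng=None):
    """One path of the Euler-Maruyama scheme (2) on the grid j*delta, j=0,...,T/delta.

    mu:    drift, maps R^d -> R^d
    sigma: diffusion coefficient, maps R^d -> R^{d x d}
    x0:    initial value in R^d; T: time horizon; delta: step size (T/delta an integer)
    """
    rng = np.random.default_rng() if rng is None else rng
    d = len(x0)
    n_steps = int(round(T / delta))
    X = np.empty((n_steps + 1, d))
    X[0] = x0
    for j in range(n_steps):
        dW = rng.normal(scale=np.sqrt(delta), size=d)  # increment W_{t+delta} - W_t
        X[j + 1] = X[j] + mu(X[j]) * delta + sigma(X[j]) @ dW
    return X
```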

For \(\mu ,\sigma \) Lipschitz, Itô [9] proved existence and uniqueness of the solution of (1). In this case the Euler–Maruyama method (2) converges with strong order 1/2 to the true solution, see [12, Theorem 10.2.2]. Higher order algorithms exist, but require stronger conditions on the coefficients.

SDEs with less regular coefficients frequently appear in applications. For example, in stochastic control theory, whenever the optimal control is of bang–bang type, meaning that the strategy is of the form \({\mathbf {1}}_\mathcal{S}(X)\) for a measurable set \(\mathcal{S}\subseteq {\mathbb {R}}^d\), the drift of the controlled dynamical system is discontinuous. Furthermore, there are models which involve only noisy observations of a signal that has to be filtered. After applying filtering theory, the diffusion coefficient typically is degenerate in the sense that \(\Vert \sigma (x)^\top v\Vert =0\) for some \(x\in {\mathbb {R}}^d\) and some nonzero \(v\in {\mathbb {R}}^d\). This motivates the study of SDEs with these kinds of irregularities in the coefficients.

If \(\mu \) is bounded and measurable, and \(\sigma \) is bounded, Lipschitz, and uniformly non-degenerate, i.e. if there exists a constant \(c_0>0\) such that for all \(x\in {\mathbb {R}}^d\) and all \(v\in {\mathbb {R}}^d\) it holds that \(\Vert \sigma (x)^\top v\Vert \ge c_0 \Vert v\Vert \), Zvonkin [29] and Veretennikov [25, 26] prove existence and uniqueness of a solution. Veretennikov [27] extends these results by allowing a part of the diffusion to be degenerate.

In [16], existence and uniqueness of a solution is proven for the case where the drift is discontinuous at a hyperplane or at a special hypersurface and where the diffusion coefficient is degenerate, and in [23] it is shown how these results extend to the non-homogeneous case.

Currently, research on numerical methods for SDEs with irregular coefficients is highly active. Hutzenthaler et al. [8] introduce the tamed Euler–Maruyama scheme and prove strong order 1/2 convergence for SDEs with continuously differentiable and polynomially growing drift that satisfy a one-sided Lipschitz condition. Sabanis [22] proves strong convergence of the tamed Euler–Maruyama scheme from a different perspective and also considers the case of locally Lipschitz diffusion coefficient. Gyöngy [4] proves almost sure convergence of the Euler–Maruyama scheme for the case where the drift satisfies a monotonicity condition.

Halidias and Kloeden [7] show that the Euler–Maruyama scheme converges strongly for SDEs with a discontinuous monotone drift coefficient. Kohatsu-Higa et al. [13] show weak convergence with rates smaller than 1 of a method where they first regularize the discontinuous drift and then apply the Euler–Maruyama scheme. Étoré and Martinez [1, 2] introduce an exact simulation algorithm for one-dimensional SDEs with a drift coefficient which is discontinuous in one point, but differentiable everywhere else. For one-dimensional SDEs with piecewise Lipschitz drift and possibly degenerate diffusion coefficient, in [14] an existence and uniqueness result is proven, and a numerical method, which is based on applying the Euler–Maruyama scheme to a transformation of (1), is presented. This method converges with strong order 1/2. In [15] a (non-trivial) extension of the method is introduced, which converges with strong order 1/2 also in the multidimensional case. The paper also contains an existence and uniqueness result for the multidimensional setting under more general conditions than, e.g., the ones stated in [16].

The method introduced in [15] is the first numerical method that is proven to converge with a positive strong rate for multidimensional SDEs with discontinuous drift and degenerate diffusion coefficient. It requires the application of a transformation and of its numerical inverse in each step, which makes the method rather slow in practice. Furthermore, the method requires specific information about the geometry of the discontinuity of the drift in order to calculate this transformation. This is a drawback if, e.g., the method is to be applied to solving control problems, since the control is usually not explicitly known. Hence a method is preferred that can deal with the discontinuities in the drift automatically.

First results in this direction are contained in a series of papers by Ngo and Taguchi. In [21] they show convergence of order up to 1/4 of the Euler–Maruyama method for multidimensional SDEs with discontinuous bounded drift that satisfies a one-sided Lipschitz condition and with Hölder continuous, bounded, and uniformly non-degenerate diffusion coefficient. In [19] they extend this result to cases where the drift is not necessarily one-sided Lipschitz for one-dimensional SDEs, and in [20] they extend the result for one-dimensional SDEs by allowing for discontinuities also in the diffusion coefficient. For many applications, however, these results are not applicable: the multidimensional result requires a one-sided Lipschitz drift, the extensions beyond that condition only hold for one-dimensional SDEs, and the method of proof relies on uniform non-degeneracy of the diffusion coefficient.

Contrasting the above, there are several delimiting results which state that even equations with infinitely often differentiable coefficients cannot always be solved approximately in finite time, even if the Euler–Maruyama method converges, cf. Hairer et al. [6], Jentzen et al. [10], Müller-Gronbach and Yaroslavtseva [18], Yaroslavtseva [28]. However, there is still a big gap between the assumptions on the coefficients under which convergence with strong convergence rate has been proven and the properties of the coefficients of the equation presented in [6].

In this paper we prove strong convergence of order \(1/4-\epsilon \) for arbitrarily small \(\epsilon >0\) of the Euler–Maruyama method for multidimensional SDEs with discontinuous drift satisfying a piecewise Lipschitz condition and with a degenerate diffusion coefficient. Note that we do not impose a one-sided Lipschitz condition on the drift. So even for SDEs with non-degenerate diffusion coefficient, which do not have a one-sided Lipschitz drift, this result is novel.

Our convergence proof is based on estimating the difference between the Euler–Maruyama scheme and the scheme presented in [15]. Close to the set of discontinuities of the drift, we have no tight estimate of this difference, so we need to study the occupation time of an Itô process with degenerate diffusion coefficient there. Away from the set of discontinuities, it is essential to estimate the probability that during one step the distance between the interpolation of the Euler–Maruyama method and the previous Euler–Maruyama step becomes greater than some threshold.

This paper’s result is the first one that gives strong convergence and also a strong convergence rate of a fully explicit scheme for multidimensional SDEs with discontinuous drift and degenerate diffusion coefficient, and the first one for multidimensional SDEs with discontinuous drift that does not satisfy a one-sided Lipschitz condition.

2 Preliminaries

In this section we first state the assumptions on the coefficients of SDE (1), under which the result of this paper is proven, then we study the occupation time of an Itô process close to a hypersurface, and finally we recall the transformation from [15], which is also essential for our proof.

2.1 Definitions and assumptions

We want to prove strong convergence of the Euler–Maruyama method for SDEs with discontinuous drift coefficient. Instead of the usual requirement of Lipschitz continuity we only assume that the drift is a piecewise Lipschitz function on \({\mathbb {R}}^d\).

Definition 2.1

([15, Definitions 3.1 and 3.2]) Let \(A\subseteq {\mathbb {R}}^d\).

  1.

    For a continuous curve \(\gamma :[0,1]\longrightarrow {\mathbb {R}}^d\), let \(\ell (\gamma )\) denote its length,

    $$\begin{aligned} \ell (\gamma )=\sup _{n,\,0\le t_0<t_1<\cdots <t_n\le 1}\sum _{k=1}^n \Vert \gamma (t_k)-\gamma (t_{k-1})\Vert \,. \end{aligned}$$

    The intrinsic metric \(\rho \) on A is given by

    $$\begin{aligned} \rho (x,y):=\inf \{\ell (\gamma ):\gamma :[0,1]\longrightarrow A \text { is a continuous curve satisfying } \gamma (0)=x,\, \gamma (1)=y\}\,, \end{aligned}$$

    where \(\rho (x,y):=\infty \), if there is no continuous curve from x to y.

  2.

    Let \(f:A\longrightarrow {\mathbb {R}}^m\) be a function. We say that f is intrinsic Lipschitz, if it is Lipschitz w.r.t. the intrinsic metric on A, i.e. if there exists a constant L such that

    $$\begin{aligned} \forall x,y\in A: \Vert f(x)-f(y)\Vert \le L \rho (x,y)\,. \end{aligned}$$

The prototypical examples of intrinsic Lipschitz functions are, as in the one-dimensional case, differentiable functions with bounded derivative.

Lemma 2.2

([15, Lemma 3.8]) Let \(A\subseteq {\mathbb {R}}^d\) be open and let \(f:A\longrightarrow {\mathbb {R}}^m\) be a differentiable function with \(\Vert f'\Vert <\infty \). Then f is intrinsic Lipschitz with Lipschitz constant \(\Vert f'\Vert \).

Definition 2.3

([15, Definition 3.4]) A function \(f{:}\,{\mathbb {R}}^d\longrightarrow {\mathbb {R}}^m\) is piecewise Lipschitz, if there exists a hypersurface \(\Theta \) with finitely many connected components and with the property that the restriction \(f|_{{\mathbb {R}}^d\backslash \Theta }\) is intrinsic Lipschitz. We call \(\Theta \) an exceptional set for f, and we call

$$\begin{aligned} \sup _{x,y\in {\mathbb {R}}^d\backslash \Theta }\frac{\Vert f(x)-f(y)\Vert }{\rho (x,y)} \end{aligned}$$

the piecewise Lipschitz constant of f.
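As a simple illustration (our own, not from [15]): the function \(f(x)={\mathbf {1}}_{\{x_1\ge 0\}}-{\mathbf {1}}_{\{x_1<0\}}\) on \({\mathbb {R}}^d\) is piecewise Lipschitz with exceptional set \(\Theta =\{x\in {\mathbb {R}}^d:x_1=0\}\): it is constant on each of the two connected components of \({\mathbb {R}}^d\backslash \Theta \), and \(\rho (x,y)=\infty \) for x, y in different components, so its piecewise Lipschitz constant is 0, although f is not even continuous in the Euclidean sense.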

In this paper \(\Theta \) will be a fixed \(C^3\)-hypersurface, and we will only consider piecewise Lipschitz functions with exceptional set \(\Theta \). In the following, \(L_f\) denotes the piecewise Lipschitz constant of a function f, if f is piecewise Lipschitz, and it denotes the Lipschitz constant, if f is Lipschitz.

We define the distance \(d(x,\Theta )\) between a point x and the hypersurface \(\Theta \) by \(d(x,\Theta ):=\inf \{\Vert x-y\Vert :y \in \Theta \}\), and for every \(\varepsilon >0\) we define \(\Theta ^\varepsilon :=\{x\in {\mathbb {R}}^d: d(x,\Theta )<\varepsilon \}\).

Recall that, since \(\Theta \in C^3\), for every \(\xi \in \Theta \) there exists an open environment \(U\subseteq \Theta \) of \(\xi \) and a continuously differentiable function \(n{:}\,U\longrightarrow {\mathbb {R}}^d\) such that for every \(\zeta \in U\) the vector \(n(\zeta )\) has length 1 and is orthogonal to the tangent space of \(\Theta \) in \(\zeta \). On a given connected open subset of \(\Theta \) the local unit normal vector n is unique up to a factor \(\pm 1\).

We recall a definition from differential geometry.

Definition 2.4

Let \(\Theta \subseteq {\mathbb {R}}^d\) be any set.

  1.

    An environment \(\Theta ^\varepsilon \) is said to have the unique closest point property, if for every \(x\in {\mathbb {R}}^d\) with \(d(x,\Theta )<\varepsilon \) there is a unique \(p\in \Theta \) with \(d(x,\Theta )=\Vert x-p\Vert \). In this case we can define a mapping \(p{:}\,\Theta ^{\varepsilon }\longrightarrow \Theta \) assigning to each x the point \(p(x)\in \Theta \) closest to x.

  2.

    \(\Theta \) is said to be of positive reach, if there exists \(\varepsilon >0\) such that \(\Theta ^\varepsilon \) has the unique closest point property. The reach of \(\Theta \) is the supremum over all such \(\varepsilon \) if such an \(\varepsilon \) exists, and 0 otherwise.
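As a concrete illustration of these notions (our own example), let \(\Theta \) be the unit circle in \({\mathbb {R}}^2\). Then for \(x\ne 0\),

$$\begin{aligned} d(x,\Theta )=\big |\Vert x\Vert -1\big |\,,\qquad p(x)=\frac{x}{\Vert x\Vert }\,,\qquad n(p(x))=\frac{x}{\Vert x\Vert }\,, \end{aligned}$$

so \(\Theta ^\varepsilon \) is the open annulus \(\{1-\varepsilon<\Vert x\Vert <1+\varepsilon \}\). The closest point is unique for every x with \(d(x,\Theta )<1\), whereas for \(x=0\) every point of \(\Theta \) is closest; hence \(\Theta \) has reach 1.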

Now, we give assumptions which are sufficient for the results in [15] to hold and which we need to prove the main result here.

Assumption 2.1

We assume the following for the coefficients of (1):

  1.

    \(\mu \) and \(\sigma \) are bounded;

  2.

    the diffusion coefficient \(\sigma \) is Lipschitz;

  3.

    the drift coefficient \(\mu \) is a piecewise Lipschitz function \({\mathbb {R}}^d\longrightarrow {\mathbb {R}}^d\). Its exceptional set \(\Theta \) is a \(C^3\)-hypersurface of positive reach;

  4.

    non-parallelity condition: there exists a constant \(c_0>0\) such that \(\Vert \sigma (\xi )^\top n(\xi )\Vert \ge c_0\) for all \(\xi \in \Theta \);

  5.

    the function \(\alpha {:} \Theta \longrightarrow {\mathbb {R}}^d\) defined by

    $$\begin{aligned} \alpha (\xi ):=\lim _{h\rightarrow 0}\frac{\mu (\xi -h n(\xi ))-\mu (\xi +hn(\xi ))}{2 \Vert \sigma (\xi )^\top n(\xi )\Vert ^2} \end{aligned}$$
    (3)

    is \(C^3\) and all derivatives up to order three are bounded.

Theorem 2.5

([15, Theorem 3.21]) Let Assumption 2.1 hold. Then SDE (1) has a unique strong solution.

Remark on Assumption 2.1:

  1.

    For existence and uniqueness of a solution to (1), in [15, Theorem 3.21] instead of Assumption 2.1.1 only boundedness in an \(\varepsilon \)-environment of \(\Theta \) is needed. However, for the proof of our convergence result we require global boundedness. Note that other results in the literature on numerical methods for SDEs with discontinuous drift also rely on boundedness of the coefficients, cf. [19,20,21].

  2.

    Assumption 2.1.2 is a technical condition; the focus in this paper is on other types of irregularities in the coefficients. There are results in the literature, where the authors deal with a non-globally Lipschitz diffusion coefficient, see, e.g., [5], but in contributions where only Hölder continuity is required for \(\sigma \), usually uniform non-degeneracy is assumed.

  3.

    Assumption 2.1.3 is a geometrical condition which we require in order to locally flatten \(\Theta \), i.e. to map \(\Theta \) to a hyperplane in a regular way. This is crucial in many places in [15] and here, in particular for the proof of Theorem 2.7 below. In addition to that, Assumption 2.1.3 implies that there exists a constant \(c_1\) such that \(\Vert n'(\xi )\Vert \le c_1\) for every \(\xi \in \Theta \) and every local unit normal vector n on \(\Theta \), see [15, Lemma 3.10].

  4.

    Assumption 2.1.4 means that the diffusion coefficient must have a component orthogonal to \(\Theta \) in all \(\xi \in \Theta \). This condition is significantly weaker than uniform non-degeneracy, and it is essential: in [16] we give a counterexample for the case where the non-parallelity condition does not hold. Then, even existence of a solution is not guaranteed.

  5.

    Assumption 2.1.5 is a technical condition, which is required for our transformation method to work. Boundedness of \(\alpha \) and \(\alpha '\) is needed for proving the local invertibility of our transform. Existence and boundedness of \(\alpha ''\) and \(\alpha '''\) are required for the multidimensional version of Itô’s formula to hold for the transform, see [15]. Moreover, it has been shown in [15, Proposition 3.13] that \(\alpha \) is a well-defined function on \(\Theta \), i.e. it does not depend on the choice of the normal vector n and, in particular, on its sign.

Example 2.6

Suppose \(\Theta \) is the finite and disjoint union of orientable compact \(C^3\)-manifolds. Then \(\Theta \) is of positive reach by the lemma in [3], and each connected component of \(\Theta \) separates \({\mathbb {R}}^d\) into two open connected components by the Jordan–Brouwer separation theorem, see [17].

Thus \({\mathbb {R}}^d\backslash \Theta \) is the union of finitely many disjoint open connected subsets of \({\mathbb {R}}^d\); we can write \({\mathbb {R}}^d\backslash \Theta =A_1\cup \cdots \cup A_n\).

Suppose there exist bounded and Lipschitz \(C^3\)-functions \(\mu _1,\ldots ,\mu _n:{\mathbb {R}}^d\longrightarrow {\mathbb {R}}^d\) such that \(\mu =\sum _{k=1}^n {\mathbf {1}}_{A_k}\mu _k\), and suppose that \(\sigma :{\mathbb {R}}^d\longrightarrow {\mathbb {R}}^{d\times d}\) is bounded, Lipschitz, and \(C^3\) with \(\sigma (\xi )^\top n(\xi )\ne 0\) for every \(\xi \in \Theta \).

Then it is readily checked that \(\mu \) and \(\sigma \) satisfy Assumption 2.1.

In Sect. 4 we present a number of concrete examples which satisfy Assumption 2.1 and we perform numerical tests on the associated SDEs.

2.2 Occupation time close to a hypersurface

In this section we study the occupation time of an Itô process close to a \(C^3\)-hypersurface. In the proof of our main theorem, the Euler–Maruyama approximation \(X^\delta \) in equation (2) will play the role of that Itô process.

Theorem 2.7

Let \(\Theta \) be a \(C^3\)-hypersurface of positive reach and let \({\varepsilon _0}>0\) be such that the closure of \(\Theta ^{\varepsilon _0}\) has the unique closest point property. Let further \(X=(X_t)_{t\ge 0}\) be an \({\mathbb {R}}^d\)-valued Itô process

$$\begin{aligned} X_t=X_0+\int _0^t A_s ds+\int _0^t B_s dW_s\,, \end{aligned}$$

with progressively measurable processes \(A=(A_t)_{t\ge 0}\), \(B=(B_t)_{t\ge 0}\), where \(A\) is \({\mathbb {R}}^d\)-valued and \(B\) is \({\mathbb {R}}^{d\times d}\)-valued. Let the coefficients \(A,B\) be such that

  1.

    there exists a constant \(c_{AB}\) such that for almost all \(\omega \in \Omega \) it holds that

    $$\begin{aligned} \forall t\in [0,T]: X_t(\omega )\in \Theta ^{\varepsilon _0} \Longrightarrow \max (\Vert A_t(\omega )\Vert ,\Vert B_t(\omega )\Vert )\le c_{AB}\,; \end{aligned}$$
  2.

    there exists a constant \(c_0\) such that for almost all \(\omega \in \Omega \) it holds that

    $$\begin{aligned} \forall t\in [0,T]: X_t(\omega )\in \Theta ^{\varepsilon _0} \Longrightarrow n (p(X_t(\omega )) )^\top B_t(\omega )B_t(\omega )^\top n (p(X_t(\omega ) ))\ge c_0^2\,. \end{aligned}$$

Then there exists a constant C such that for all \(0<\varepsilon <\varepsilon _0/2\),

$$\begin{aligned} \int _0^{T} {\mathbb P}\left( \{X_s \in \Theta ^\varepsilon \} \right) ds \le C\varepsilon \,. \end{aligned}$$
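Before turning to the proof, the statement can be illustrated numerically (our own illustration, not part of the argument): take \(\Theta \) the unit circle in \({\mathbb {R}}^2\) and X a planar Brownian motion, so that both conditions hold with, e.g., \(\varepsilon _0=1/2\) and \(c_{AB}=c_0=1\). The following Python sketch estimates the expected occupation time of \(\Theta ^\varepsilon \) by Monte Carlo; halving \(\varepsilon \) should roughly halve the estimate.

```python
import numpy as np

def occupation_time_near_circle(eps, T=1.0, delta=1e-3, n_paths=2000, rng=None):
    """Monte Carlo estimate of E(time a planar Brownian motion spends in Theta^eps),
    with Theta the unit circle; by Theorem 2.7 this quantity is O(eps)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_steps = int(round(T / delta))
    x = np.zeros((n_paths, 2))          # all paths start in the origin
    time_in = np.zeros(n_paths)
    for _ in range(n_steps):
        x += rng.normal(scale=np.sqrt(delta), size=(n_paths, 2))
        dist = np.abs(np.linalg.norm(x, axis=1) - 1.0)  # distance to the unit circle
        time_in += delta * (dist < eps)
    return time_in.mean()

# print(occupation_time_near_circle(0.2), occupation_time_near_circle(0.1))
```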

For the proof we will construct a one-dimensional Itô process Y with the property that Y is close to 0, if and only if X is close to \(\Theta \). For the construction of Y we decompose the path of X into pieces close to \(\Theta \) and pieces farther away. These pieces are then mapped to \({\mathbb {R}}\) by using a signed distance of X from \(\Theta \) and pasted together in a continuous way.

A signed distance to \(\Theta \) is locally given by \(D(x):=n(p(x))^\top (x-p(x))\), where n is a local unit normal vector.

Lemma 2.8

For all \(x\in \Theta ^{\varepsilon _0}\) it holds that \(D'(x)=n(p(x))^\top \).

Proof

Fix \(x\in \Theta ^{\varepsilon _0}\backslash \Theta \) and consider the function h defined by \(h(b):=\Vert x-p(x+b)\Vert ^2\). By definition of the projection map p, h has a minimum at \(b=0\), so that \(h'(0)=0\). Hence from \(h'(b)=-2(x-p(x+b))^\top p'(x+b)\), we get \((x-p(x))^\top p'(x)=0\). This implies \(n(p(x))^\top p'(x)=0\), since \((x-p(x))\) is a scalar multiple of n(p(x)).

Using that \(D(x)=a\Vert x-p(x)\Vert \) for an \(a\in \{-1,1\}\), we compute

$$\begin{aligned} D'(x)&=a\Vert x-p(x)\Vert ^{-1} (x-p(x))^\top ({{\text {id}}_{{\mathbb {R}}^d}}-p'(x)) =a\,n(p(x))^\top ({{\text {id}}_{{\mathbb {R}}^d}}-p'(x))\nonumber \\&=a\big (n(p(x))^\top -n(p(x))^\top p'(x)\big ) =an(p(x))^\top \,. \end{aligned}$$
(4)

For \(\psi \in {\mathbb {R}}\) with \(|\psi |\) small we get

$$\begin{aligned} D(x+\psi n(p(x)))&=n\Big (p\big (x+\psi n(p(x))\big )\Big )^\top \Big (x+\psi n(p(x))-p\big (x+\psi n(p(x))\big )\Big )\\&=n(p(x))^\top (x+\psi n(p(x))-p(x))=D(x)+\psi \,, \end{aligned}$$

such that the directional derivative of D in direction n(p(x)) in x is 1. From this and from (4) it follows that \(D'(x)=n(p(x))^\top \). This also holds for \(x\in \Theta \) by the continuity of \(D'\). \(\square \)

The following lemma states that for any continuous curve \(\gamma \) in \(\Theta ^{\varepsilon _0}\) there is a continuous path of unit normal vectors, such that to every point of \(\gamma \) we can assign a signed distance in a continuous way.

Lemma 2.9

Let \(\gamma :[a,b]\longrightarrow \Theta ^{\varepsilon _0}\) be a continuous function. Then there exists \(m:[a,b]\longrightarrow {\mathbb {R}}^d\) such that

  1.

    m is continuous;

  2.

    \(\Vert m(t)\Vert =1\) for all \(t\in [a,b]\);

  3.

    m(t) is orthogonal to \(\Theta \) in the point \(p(\gamma (t))\) for all \(t\in [a,b]\).

Proof

For \(\xi \in \Theta \) we denote the tangent space to \(\Theta \) in \(\xi \) by \({{\text {tang}}}_\xi \). Let

$$\begin{aligned} S:=\{a\le s\le b: \exists m:[a,s]\longrightarrow {\mathbb {R}}^d\text { continuous},\; \Vert m(t)\Vert =1,\; m(t)\bot {{\text {tang}}}_{p(\gamma (t))}\;\forall t\in [a,s]\}\,. \end{aligned}$$

The set \(S\) is nonempty and its elements are bounded by b. Let \(s_1:=\sup S\). There exists an open and connected subset \(U\subseteq \Theta \) such that \(p(\gamma (s_1))\in U\), and a unit normal vector \(n_1:U\longrightarrow {\mathbb {R}}^d\).

Since U is open and \(p\circ \gamma \) is continuous, there exists \(\eta >0\) such that \(p(\gamma ([s_1-\eta ,s_1]))\subseteq U\). By the definition of \(s_1\) there exists \(s\in (s_1-\eta ,s_1)\) and \(m:[a,s]\longrightarrow {\mathbb {R}}^d\) continuous, with \(\Vert m(t)\Vert =1\) and \(m(t)\bot {{\text {tang}}}_{p(\gamma (t))}\) for all \( t\in [a,s]\).

Since \(n_1\) is unique up to a factor \(\pm 1\), the mapping \(n_1\circ p \circ \gamma \) either coincides with m or \(-m\) on \((s_1-\eta ,s)\). Without loss of generality we may assume that the former is the case. Thus we can extend m continuously to \([a,s_1]\) by defining \(m(t):= n_1(p(\gamma (t)))\) for all \(t\in (s,s_1]\).

Now, if \(s_1\) was strictly smaller than b, then we could use the same mapping \(n_1\circ p \circ \gamma \) to extend m continuously beyond \(s_1\), contradicting the definition of \(s_1\). \(\square \)

We will need the following estimate on the local time of a one-dimensional Itô process.

Lemma 2.10

Let \(Y=(Y_t)_{t\ge 0}\) be an Itô process with bounded and progressively measurable coefficients \(\hat{A}=(\hat{A}_t)_{t\ge 0},\hat{B}=(\hat{B}_t)_{t\ge 0}\).

Then \(\sup _{y\in {\mathbb {R}}}{\mathbb {E}}(L_T^y(Y)) \le \left( 3 T^2\Vert \hat{A}\Vert _\infty ^2 +\frac{3}{2}T \Vert \hat{B}\Vert _\infty ^2\right) ^{1/2}\).

The claim is a special case of [19, Lemma 3.2]. We give a proof for the convenience of the reader.

Proof

From the Meyer–Tanaka formula [11, Section 3.7, Eq. (7.9)] we have

$$\begin{aligned} 2 L_T^y(Y)&=|Y_T-y|-|Y_0-y|-\int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) dY_s\\&\le |Y_T-Y_0|+\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) dY_s\right| \\&\le \left| \int _0^T \hat{A}_s ds\right| + \left| \int _0^T \hat{B}_s dW_s\right| +\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) \hat{A}_sds\right| \\&\quad +\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) \hat{B}_s dW_s\right| \\&\le 2\int _0^T |\hat{A}_s |ds+ \left| \int _0^T \hat{B}_s dW_s\right| +\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) \hat{B}_s dW_s\right| \,. \end{aligned}$$

Using the inequality \((a+b+c)^2\le 3(a^2+b^2+c^2)\) we get

$$\begin{aligned} 4 L_T^y(Y)^2\le 12 \Vert \hat{A}\Vert _\infty ^2 T^2+3\left| \int _0^T \hat{B}_s dW_s\right| ^2 +3\left| \int _0^T \left( {\mathbf {1}}_{\{Y_s>y\}}-{\mathbf {1}}_{\{Y_s<y\}}\right) \hat{B}_s dW_s\right| ^2\,, \end{aligned}$$

and, using Itô’s \(L^2\)-isometry,

$$\begin{aligned} 4{\mathbb {E}}\left( L_T^y(Y)^2\right) \le 12 \Vert \hat{A}\Vert _\infty ^2 T^2+6\int _0^T \hat{B}_s^2 ds\le 12 \Vert \hat{A}\Vert _\infty ^2 T^2+6T \Vert \hat{B}\Vert _\infty ^2 \,. \end{aligned}$$

The claim now follows by applying the Cauchy–Schwarz inequality and taking the supremum over all \(y\in {\mathbb {R}}\). \(\square \)

We are ready to prove the result of this section.

Proof of Theorem 2.7

Let \(\varepsilon _1=\varepsilon _0/2\). Define a mapping \(\lambda : {\mathbb {R}}\longrightarrow {\mathbb {R}}\) by

$$\begin{aligned} \lambda (z)= {\left\{ \begin{array}{ll} z- \frac{2}{3 \varepsilon _1^2}z^3 +\frac{1}{5 \varepsilon _1^4}z^5 &{}\quad |z|\le \varepsilon _1\\ \frac{8 \varepsilon _1}{15} &{}\quad z> \varepsilon _1\\ -\frac{8 \varepsilon _1}{15} &{}\quad z< -\varepsilon _1\,. \end{array}\right. } \end{aligned}$$

Note that \(\lambda '(0)=1\) and \(\lambda '(\pm \varepsilon _1)=\lambda ''(\pm \varepsilon _1)=0\), so that \(\lambda \in C^2\).
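For later use we record the explicit derivatives on \([-\varepsilon _1,\varepsilon _1]\) (a direct computation from the definition above):

$$\begin{aligned} \lambda '(z)=1-\frac{2}{\varepsilon _1^2}z^2+\frac{1}{\varepsilon _1^4}z^4=\Big (1-\frac{z^2}{\varepsilon _1^2}\Big )^2\,,\qquad \lambda ''(z)=\frac{4z}{\varepsilon _1^2}\Big (\frac{z^2}{\varepsilon _1^2}-1\Big )\,, \end{aligned}$$

and \(\lambda (\pm \varepsilon _1)=\pm (1-\tfrac{2}{3}+\tfrac{1}{5})\varepsilon _1=\pm \tfrac{8\varepsilon _1}{15}\), so the three pieces indeed fit together in a \(C^2\) way; the formula for \(\lambda '\) also yields the lower bound \(\lambda '(z)\ge (3/4)^2\) for \(|z|\le \varepsilon _1/2\) used below.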

Next we decompose the path of X: let \(\tau _{0}:=\inf \{t\ge 0: X_t\in \Theta ^{\varepsilon _1}\}\). In particular we have \(\tau _0=0\), if \(X_0\in \Theta ^{\varepsilon _1}\). For \(k\in {\mathbb {N}}_0\), define

$$\begin{aligned} \kappa _{k+1}&:=\inf \{t\ge \tau _{k}: X_t\notin \Theta ^{2\varepsilon _1}\}\wedge T\,,\\ \tau _{k+1}&:=\inf \{t\ge \kappa _{k+1}: X_t\in \Theta ^{\varepsilon _1}\}\wedge T\,. \end{aligned}$$

By Lemma 2.9 there exist continuous \(m_k:[\tau _k,\kappa _{k+1}]\longrightarrow {\mathbb {R}}^d\), with \(\Vert m_k(t)\Vert =1\) and \(m_k(t)\bot {{\text {tang}}}_{p(X_t)}\) for all \(t\in [\tau _k,\kappa _{k+1}]\). Without loss of generality \(m_0\) can be chosen such that \(m_0(\tau _0)^\top (X_{\tau _0}-p(X_{\tau _0}) )\ge 0\). We construct a one-dimensional process Y as follows:

$$\begin{aligned} Y_{t}&= {\left\{ \begin{array}{ll} \lambda (m_0(\tau _0)^\top (X_{\tau _0}-p(X_{\tau _0}))) &{}\quad t\le \tau _0 \\ \lambda (m_k(t)^\top (X_{t}-p(X_{t}))) &{}\quad t\in [\tau _k,\kappa _{k+1}] \\ \lambda (m_k(\kappa _{k+1})^\top (X_{\kappa _{k+1}}-p(X_{\kappa _{k+1}}))) &{}\quad t\in [\kappa _{k+1},\tau _{k+1}]\,, \end{array}\right. } \end{aligned}$$

where without loss of generality the \(m_k\) are chosen such that

$$\begin{aligned} \lambda (m_{k+1}(\tau _{k+1})^\top (X_{\tau _{k+1}}-p(X_{\tau _{k+1}}))) =\lambda (m_k(\kappa _{k+1})^\top (X_{\kappa _{k+1}}-p(X_{\kappa _{k+1}})))\,. \end{aligned}$$
(5)

Note that by construction both sides of (5) can only take the values \(\pm \lambda (\varepsilon _1)\).

We have thus constructed a continuous \([\lambda (-\varepsilon _1),\lambda (\varepsilon _1)]\)-valued process Y with the property that the occupation time of Y in an environment of 0 is the same as the occupation time of X in an environment of \(\Theta \), i.e. for all \(0<\varepsilon <\varepsilon _1\), \(Y_t\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\) if and only if \(X_t\in \Theta ^\varepsilon \).

To show that Y is an Itô process, we want to use Itô’s formula. For this we note that Y, depending on the proximity of X to \(\Theta \), is either constant or locally of the form \(Y_t=\lambda (n(p(X_t))^\top (X_t-p(X_t)))\) for a suitable choice of the unit normal vector. Denote \(D(x)=n(p(x))^\top (x-p(x) )\). The function \(D\) is locally a signed distance to \(\Theta \) and \(D \in C^2\). This can be seen by following the proof of [3, Theorem 1]. Hence, we may apply Itô’s formula to get

$$\begin{aligned} dY_t\,=\,&\lambda '(D(X_t))D'(X_t) A_t dt +\lambda '(D(X_t)) D'(X_t) B_t dW_t +\frac{1}{2}{\text {tr}}\left( B_t^\top (\lambda \circ D)''(X_t)B_t\right) dt\,. \end{aligned}$$

By Lemma 2.8 we have \(D'(x)=n(p(x))^\top \), and hence

$$\begin{aligned} (\lambda (D(x)))''= (\lambda '(D(x))n(p(x))^\top )' =\lambda ''(D(x))n(p(x))n(p(x))^\top +\lambda '(D(x))n'(p(x))\,. \end{aligned}$$

Since \(\lambda '\) and \(\lambda ''\) are bounded by construction, \(\Vert n(p(x))n(p(x))^\top \Vert =1\), \(\Vert n'\Vert \) is bounded (cf. the remark on Assumption 2.1.3), and by Assumption 1 of the theorem, the coefficients of Y are uniformly bounded. Therefore \(dY_t=\hat{A}_t dt+ \hat{B}_t dW_t\), with bounded and progressively measurable \(\hat{A},\hat{B}\).

Let \(0<\varepsilon \le \varepsilon _1/2\). For all \(|z|\le \varepsilon \), we have \(\lambda '(z)\ge \left( \frac{3}{4}\right) ^2\). Thus by Assumption 2 of the theorem,

$$\begin{aligned} \left( \frac{3}{4}\right) ^2 c_0^2\int _0^t{\mathbf {1}}_{\left\{ X_s\in \Theta ^\varepsilon \right\} }ds&= \left( \frac{3}{4}\right) ^2 c_0^2\int _0^t{\mathbf {1}}_{\left\{ Y_s\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\right\} }ds\\&\le \int _0^t {\mathbf {1}}_{\left\{ Y_s\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\right\} } \lambda '\left( D(X_s)\right) ^2 n(p(X_s))^\top B_sB_s^\top n(p(X_s))ds\\&= \int _0^t{\mathbf {1}}_{\left\{ Y_s\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\right\} } d\left[ Y\right] _s\,. \end{aligned}$$

By the occupation time formula [11, Chapter 3, 7.1 Theorem] for one-dimensional continuous semimartingales, we get

$$\begin{aligned} \int _0^T{\mathbb P}\left( \{X_s\in \Theta ^\varepsilon \}\right) ds&\le \left( \frac{4}{3 c_0}\right) ^{2} {\mathbb {E}}\left( \int _0^T{\mathbf {1}}_{\{Y_s\in (-\lambda (\varepsilon ),\lambda (\varepsilon ))\}}d\left[ Y\right] _s\right) \\&=2 \left( \frac{4}{3 c_0}\right) ^{2} {\mathbb {E}}\left( \int _{\mathbb {R}}{\mathbf {1}}_{(-\lambda (\varepsilon ),\lambda (\varepsilon ))}(y)L_T^{y}\left( Y\right) dy\right) \\&\le \frac{4^3}{3^2c_0^2}\, \sup _{y\in {\mathbb {R}}}{\mathbb {E}}\left( L_T^{y}\left( Y\right) \right) \varepsilon \,. \end{aligned}$$

\(\square \)

2.3 The transformation

The proof of convergence is based on a transformation that removes the discontinuity from the drift and makes the drift Lipschitz while preserving the Lipschitz property of the diffusion coefficient. A suitable transform is presented in [15]. We recall it here.

Define \(G{:}{\mathbb {R}}^d\longrightarrow {\mathbb {R}}^d\),

$$\begin{aligned} G(x)={\left\{ \begin{array}{ll} x+{\tilde{\phi }}(x) \alpha (p(x))&{}\quad x\in \Theta ^{\varepsilon _0}\\ x &{}\quad x\in {\mathbb {R}}^d\backslash \Theta ^{\varepsilon _0}\,, \end{array}\right. } \end{aligned}$$

where \(\varepsilon _0>0\) is smaller than the reach of \(\Theta \), see Assumption 2.1.3, \(\alpha \) is the function defined in Assumption 2.1.5, and

$$\begin{aligned} {\tilde{\phi }}(x)=n(p(x))^\top (x-p(x)) \Vert x-p(x)\Vert \phi \left( \frac{\Vert x-p(x)\Vert }{c}\right) \,, \end{aligned}$$

with positive constant c and

$$\begin{aligned} \phi (u)= {\left\{ \begin{array}{ll} (1+u)^3(1-u)^3 &{}\quad |u|\le 1\\ 0 &{}\quad |u|> 1. \end{array}\right. } \end{aligned}$$

Note that \(G''\) is piecewise Lipschitz with exceptional set \(\Theta \).

If c is chosen sufficiently small, see [15, Lemma 3.18], G is invertible by [15, Theorem 3.14]. Furthermore, Itô’s formula holds for G and \(G^{-1}\) by [15, Theorem 3.19].

With this we can define a process \(Z=(Z_t)_{t\ge 0}\) by \(Z_t=G(X_t)\), which solves the SDE

$$\begin{aligned} dZ_t =\tilde{\mu }(Z_t) dt+\tilde{\sigma }(Z_t) dW_t\,, \end{aligned}$$
(6)

where

$$\begin{aligned} \tilde{\mu }(z)&=G'(G^{-1}(z))\mu (G^{-1}(z))+\frac{1}{2}{\text {tr}}\left( \sigma (G^{-1}(z))^\top G''(G^{-1}(z))\sigma (G^{-1}(z))\right) \,,\\ \tilde{\sigma }(z)&=G'(G^{-1}(z)) \sigma (G^{-1}(z)) \,. \end{aligned}$$

From [15, Theorem 3.20] we know that \(\tilde{\mu }\) and \(\tilde{\sigma }\) are Lipschitz, and hence the solution to (6) can be approximated with strong order 1/2 using the Euler–Maruyama scheme.
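To make the transformation more tangible, the following one-dimensional sketch (our own; \(\Theta =\{0\}\), so \(p(x)=0\) and \(n\equiv 1\), with a hypothetical constant value alpha0 in place of \(\alpha (p(x))\)) implements G and inverts it numerically; it only illustrates the shape of G, not the method of [15].

```python
import numpy as np
from scipy.optimize import brentq

def phi(u):
    # phi(u) = (1+u)^3 (1-u)^3 = (1-u^2)^3 for |u| <= 1, and 0 otherwise
    return (1.0 - u**2) ** 3 if abs(u) <= 1.0 else 0.0

def G(x, alpha0=0.5, c=0.1):
    """One-dimensional version of the transform: Theta = {0}, p(x) = 0, n = 1."""
    return x + x * abs(x) * phi(abs(x) / c) * alpha0  # x*|x|*phi(|x|/c) plays the role of tilde phi

def G_inv(z, alpha0=0.5, c=0.1):
    """Numerical inverse by bracketing; |G(x) - x| <= alpha0*c^2 < c, so [z-c, z+c] brackets the root."""
    return brentq(lambda x: G(x, alpha0, c) - z, z - c, z + c)

# sanity check of invertibility on a grid around the discontinuity point 0
xs = np.linspace(-0.2, 0.2, 41)
assert np.allclose([G_inv(G(x)) for x in xs], xs, atol=1e-10)
```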

3 Main result

We are ready to formulate the main result.

Theorem 3.1

Let Assumption 2.1 hold. Then the Euler–Maruyama method (2) converges to the solution of SDE (1) with strong order \(1/4-\epsilon \) for arbitrarily small \(\epsilon >0\), i.e. there exists a constant C such that for every \(\epsilon >0\) and all sufficiently small step sizes \(\delta >0\),

$$\begin{aligned} {\mathbb {E}}\left( \sup _{t\in [0,T]} \Vert X_t-X^\delta _t\Vert ^2\right) ^{1/2} \le C \delta ^{1/4-\epsilon }\,. \end{aligned}$$

In preparation for the proof of the main result, we prove two lemmas.

Lemma 3.2

Let Assumption 2.1.1 hold. Then there exists a constant C such that for sufficiently small step size \(\delta \)

$$\begin{aligned} {\mathbb {E}}\left( \int _0^{T}\Vert X^\delta _s -X^\delta _{\underline{s}}\Vert ^2 ds\right) \le C \delta \,. \end{aligned}$$

Proof

By the definition of the Euler–Maruyama method (2) we have

$$\begin{aligned}&{\mathbb {E}}\left( \int _0^{T}\Vert X^\delta _{\underline{s}}-X^\delta _s\Vert ^2 ds\right) = \sum _{j=0}^{T/\delta -1} {\mathbb {E}}\left( \int _{j\delta }^{(j+1)\delta }\Vert X^\delta _{j \delta } -X^\delta _s\Vert ^2 ds\right) \\&\quad \le \frac{T}{\delta }\sup _{t\in \{j \delta : j=0,\ldots ,T/\delta -1\}} {\mathbb {E}}\left( \int _{t}^{t+\delta }\Vert X^\delta _{t} -X^\delta _s\Vert ^2 ds\right) \\&\quad =\frac{T}{\delta }\sup _{t\in \{j \delta : j=0,\ldots ,T/\delta -1\}} {\mathbb {E}}\left( \int _{t}^{t+\delta }\Vert \mu (X^\delta _t) (s-t) + \sigma (X^\delta _t)(W_s-W_t)\Vert ^2 ds\right) \\&\quad \le \frac{ 2T}{\delta } \sup _{t\in \{j \delta : j=0,\ldots ,T/\delta -1\}}{\mathbb {E}}\left( \int _{t}^{t+\delta }\Vert \mu (X^\delta _t) (s-t)\Vert ^2 ds + \int _t^{t+\delta } \Vert \sigma (X^\delta _t) (W_s-W_t)\Vert ^2 ds\right) \\&\quad \le \frac{2T}{\delta } \left( \Vert \mu \Vert _\infty ^2 \delta ^3 + \Vert \sigma \Vert ^2_\infty \sup _{t\in \{j \delta : j=0,\ldots ,T/\delta -1\}} \int _t^{t+\delta }{\mathbb {E}}( \Vert W_s-W_t\Vert ^2) ds\right) \\&\quad =\frac{2T}{\delta } \left( \Vert \mu \Vert _\infty ^2 \delta ^3 + d\Vert \sigma \Vert ^2_\infty \sup _{t\in \{j \delta : j=0,\ldots ,T/\delta -1\}}\int _t^{t+\delta }(s-t) ds\right) \\&\quad = \frac{2T}{\delta } \left( \Vert \mu \Vert _\infty ^2 \delta ^3 + \frac{d}{2} \Vert \sigma \Vert ^2_\infty \delta ^2\right) \le C \delta \,. \end{aligned}$$

\(\square \)

For all \(\delta ,\varepsilon >0\) and all \(j=0,\ldots ,T/\delta -1\), define

$$\begin{aligned} {\Omega _{\delta ,\varepsilon ,j}}:=\left\{ \omega \in \Omega : \sup _{s\in [j\delta ,(j+1)\delta ]} \left\| X^\delta _s(\omega )-X^\delta _{\underline{s}}(\omega )\right\| \ge \varepsilon \right\} \,. \end{aligned}$$
(7)

Lemma 3.3

Let Assumption 2.1.1 hold. Then there exists a constant C such that for all \(0<\delta \le 1\), all \(\varepsilon >0\), and all \(j=0,\ldots ,T/\delta -1\), it holds that \({\mathbb P}({\Omega _{\delta ,\varepsilon ,j}})\le C\exp \left( -\varepsilon /(\Vert \sigma \Vert _\infty \delta ^{1/2})\right) \).

Proof

$$\begin{aligned} {\mathbb P}&\left( \sup _{j\delta \le s\le (j+1)\delta }\Vert X^\delta _{s}-X^\delta _{\underline{s}}\Vert \ge \varepsilon \right) ={\mathbb P}\left( \sup _{j\delta \le s\le (j+1)\delta }\Vert \mu (X^\delta _{\underline{s}})(s-{\underline{s}})+\sigma (X^\delta _{\underline{s}})(W_s-W_{\underline{s}})\Vert \ge \varepsilon \right) \\&\le {\mathbb P}\left( \sup _{j\delta \le s\le (j+1)\delta }\Big \{\Vert \mu \Vert _\infty \delta + \Vert \sigma \Vert _\infty \Vert W_s-W_{\underline{s}}\Vert \Big \}\ge \varepsilon \right) ={\mathbb P}\left( \sup _{j\delta \le s\le (j+1)\delta }\Vert W_s-W_{\underline{s}}\Vert \ge \frac{\varepsilon -\Vert \mu \Vert _\infty \delta }{\Vert \sigma \Vert _\infty } \right) \\&={\mathbb P}\left( \sup _{0\le s\le 1}\Vert W_s-W_0\Vert \ge \frac{\varepsilon -\Vert \mu \Vert _\infty \delta }{\Vert \sigma \Vert _\infty \delta ^{1/2}} \right) ={\mathbb P}\left( \exp \left( \sup _{0\le s\le 1}\Vert W_s\Vert \right) \ge \exp \left( \frac{\varepsilon -\Vert \mu \Vert _\infty \delta }{\Vert \sigma \Vert _\infty \delta ^{1/2}}\right) \right) \\&\le {\mathbb {E}}\left( \exp (\Vert W_1\Vert )\right) \exp \left( \frac{\Vert \mu \Vert _\infty \delta -\varepsilon }{\Vert \sigma \Vert _\infty \delta ^{1/2}} \right) \le C\exp \left( -\frac{\varepsilon }{\Vert \sigma \Vert _\infty \delta ^{1/2}} \right) \,, \end{aligned}$$

where we applied Doob’s submartingale inequality, and in the last step used that \(\delta \le 1\). \(\square \)

Now, we are ready to prove our main result.

Proof of Theorem 3.1

Since \(G^{-1}\) is Lipschitz by the proof of [15, Theorem 3.20],

$$\begin{aligned} {\mathbb {E}}\Big (\sup _{0\le t\le T} \Vert X_t-X^\delta _t \Vert ^2 \Big )^{1/2} \le L_{G^{-1}} {\mathbb {E}}\Big (\sup _{0\le t\le T} \Vert Z_t-G (X^\delta _t ) \Vert ^2 \Big )^{1/2}\,, \end{aligned}$$
(8)

with \(Z=G(X)\) as in (6). Let \(Z^\delta \) be the Euler–Maruyama approximation of Z. It holds that

$$\begin{aligned}&{\mathbb {E}}\Big (\sup _{0\le t\le T} \Vert Z_t-G (X^\delta _t ) \Vert ^2 \Big )^{1/2} \le {\mathbb {E}}\Big (\sup _{0\le t\le T} \Vert Z_t- Z^\delta _t \Vert ^2 \Big )^{1/2}\nonumber \\&\,+\,{\mathbb {E}}\Big (\sup _{0\le t\le T} \Vert Z^\delta _t-G (X^\delta _t ) \Vert ^2 \Big )^{1/2}\,. \end{aligned}$$
(9)

For estimating the first term in (9), recall that by [15, Theorem 3.20], the transformed SDE (6) has Lipschitz coefficients. Since the Euler–Maruyama method converges with strong order 1/2 for SDEs with Lipschitz coefficients (see [12, Theorem 10.2.2]), there exists a constant \(C_1>0\) such that for sufficiently small \(\delta >0\),

$$\begin{aligned} {\mathbb {E}}\left( \sup _{0\le t\le T}\Vert Z_t- Z^\delta _t\Vert ^2\right) \le C_1 \delta \,. \end{aligned}$$
(10)

We now turn to the second term in (9), i.e. we estimate the difference between G applied to the Euler–Maruyama approximation of X and the Euler–Maruyama approximation of Z. Denote, for all \(\tau \in [0,T]\),

$$\begin{aligned} u(\tau ):={\mathbb {E}}\left( \sup _{0\le t\le \tau } \Vert G(X^\delta _{t})-Z^\delta _{t}\Vert ^2\right) \,. \end{aligned}$$

With \(\nu (x_1,x_2)=G'(x_1)\mu (x_2)+\frac{1}{2}{\text {tr}}(\sigma (x_2)^\top G''(x_1)\sigma (x_2))\) we have by Itô’s formula,

$$\begin{aligned} G(X^\delta _{t})=G(X^\delta _{0})+\int _0^{t}\nu (X^\delta _s,X^\delta _{{\underline{s}}}) ds+\int _0^{t}G'(X^\delta _s)\sigma (X^\delta _{{\underline{s}}})dW_s\,, \end{aligned}$$

so that

$$\begin{aligned} u(\tau )&={\mathbb {E}}\left( \sup _{0\le t\le \tau }\left\| \int _0^{t} \nu (X^\delta _s,X^\delta _{{\underline{s}}})ds +\int _0^{t}G'(X^\delta _s)\sigma (X^\delta _{{\underline{s}}})dW_s-\int _0^{t}\tilde{\mu }\left( Z^\delta _{{\underline{s}}}\right) ds\right. \right. \\ {}&\left. \left. \quad -\int _0^{t}\tilde{\sigma }\left( Z^\delta _{{\underline{s}}}\right) dW_s\right\| ^2\right) \\&\le {\mathbb {E}}\left( \sup _{0\le t\le \tau }\left( 4\left\| \int _0^{t}\left( \nu (X^\delta _s,X^\delta _{{\underline{s}}}) -\nu (X^\delta _{{\underline{s}}},X^\delta _{{\underline{s}}})\right) ds\right\| ^2 +4\left\| \int _0^{t}\left( G'(X^\delta _s)\sigma (X^\delta _{{\underline{s}}})\right. \right. \right. \right. \\ {}&\quad \left. \left. \left. \left. -G'(X^\delta _{{\underline{s}}})\sigma (X^\delta _{{\underline{s}}})\right) dW_s\right\| ^2 \right. \right. \\&\left. \left. \quad +\,4\left\| \int _0^{t}\left( \tilde{\mu }(G(X^\delta _{{\underline{s}}}))-\tilde{\mu }(Z^\delta _{{\underline{s}}})\right) ds\right\| ^2 +4\left\| \int _0^{t}\left( \tilde{\sigma }(G(X^\delta _{{\underline{s}}}))-\tilde{\sigma }(Z^\delta _{{\underline{s}}})\right) dW_s\right\| ^2 \right) \right) \,. \end{aligned}$$

Applying the Cauchy–Schwarz inequality to the Lebesgue integrals and the d-dimensional Burkholder–Davis–Gundy inequality [8, Lemma 3.7] to the Itô integrals, we obtain

$$\begin{aligned} u(\tau )&\le 4 T \,{\mathbb {E}}\left( \int _0^{\tau }\left\| \nu (X^\delta _s,X^\delta _{{\underline{s}}}) -\nu (X^\delta _{{\underline{s}}},X^\delta _{{\underline{s}}})\right\| ^2 ds\right) \nonumber \\&\quad +8d\,{\mathbb {E}}\left( \int _0^{\tau }\left\| G'(X^\delta _s)\sigma (X^\delta _{{\underline{s}}})-G'(X^\delta _{{\underline{s}}})\sigma (X^\delta _{{\underline{s}}})\right\| ^2 ds\right) \nonumber \\&\quad +4T\,{\mathbb {E}}\left( \int _0^{\tau }\left\| \tilde{\mu }(G(X^\delta _{{\underline{s}}}))-\tilde{\mu }(Z^\delta _{{\underline{s}}}) \right\| ^2 ds\right) \nonumber \\&\quad +8d\,{\mathbb {E}}\left( \int _0^{\tau }\left\| \tilde{\sigma }(G(X^\delta _{{\underline{s}}}))-\tilde{\sigma }(Z^\delta _{{\underline{s}}})\right\| ^2 ds\right) \nonumber \\&=:4T\,E_1+8d\,E_2+4T\,E_3+8d\,E_4\,. \end{aligned}$$
(11)

For estimating \(E_1\) in (11), we will use that

$$\begin{aligned} \left\| \nu (x_1,x_2)-\nu (x_2,x_2)\right\| ^2\le {\left\{ \begin{array}{ll} \left( 2L_{G'}^2 \Vert \mu \Vert _\infty ^2 +\frac{1}{2}L_{G''}^2\Vert \sigma \Vert _\infty ^4\right) \Vert x_1-x_2\Vert ^2&{}\quad x_1\notin \Theta ^\varepsilon \,,\Vert x_1-x_2\Vert <\varepsilon \\ 4 \Vert \mu \Vert _\infty ^2 \Vert G'\Vert ^2_\infty + \Vert \sigma \Vert _\infty ^4 \Vert G''\Vert _\infty ^2 &{}\quad \text {otherwise}\,. \end{array}\right. } \end{aligned}$$

With this and the definition of \({\Omega _{\delta ,\varepsilon ,j}}\) in (7), we get

$$\begin{aligned} E_1&=\int _0^{\tau } {\mathbb {E}}\left( \left\| \nu (X^\delta _s,X^\delta _{\underline{s}})-\nu (X^\delta _{\underline{s}},X^\delta _{\underline{s}})\right\| ^2\left( {\mathbf {1}}_{\{X^\delta _s \notin \Theta ^\varepsilon \}}{\mathbf {1}}_{\Omega ^c_{\delta ,\varepsilon ,{\underline{s}}/\delta }}+{\mathbf {1}}_{\{X^\delta _s \notin \Theta ^\varepsilon \}}{\mathbf {1}}_{\Omega _{\delta ,\varepsilon ,{\underline{s}}/\delta }}+{\mathbf {1}}_{\{X^\delta _s \in \Theta ^\varepsilon \}}\right) \right) ds\\&\le \left( 2 L_{G'}^2 \Vert \mu \Vert _\infty ^2+\frac{1}{2}L_{G''}^2\Vert \sigma \Vert _\infty ^4\right) \varepsilon ^2 T\\&\quad + \left( 4 \Vert \mu \Vert _\infty ^2 \Vert G'\Vert ^2_\infty + \Vert \sigma \Vert ^4 \Vert G''\Vert _\infty ^2 \right) \left( \int _0^{T} {\mathbb P}({\Omega _{\delta ,\varepsilon ,{\underline{s}}/\delta }}) ds+ \int _0^T{\mathbb P}( \{X^\delta _s \in \Theta ^\varepsilon \})ds\right) \,. \end{aligned}$$

By Lemma 3.3, \(\int _0^T {\mathbb P}({\Omega _{\delta ,\varepsilon ,{\underline{s}}/\delta }}) ds\le C_2\exp \left( - \varepsilon /(\Vert \sigma \Vert _\infty \delta ^{1/2} )\right) \), and by Theorem 2.7, \(\int _0^T {\mathbb P}( \{X^\delta _s \in \Theta ^\varepsilon \})ds \le C_3 \varepsilon \), for suitable constants \(C_2,C_3\). In order to balance the two contributions \(\exp (- \varepsilon /(\Vert \sigma \Vert _\infty \delta ^{1/2} ))\) and \(\varepsilon \) in the bound on \(E_1\) for \(\delta \) sufficiently small, we choose \(\varepsilon =-\Vert \sigma \Vert _\infty \delta ^{1/2} \log (\Vert \sigma \Vert _\infty \delta ^{1/2} )=\Vert \sigma \Vert _\infty \delta ^{1/2-2 \epsilon } (-\delta ^{2 \epsilon } \log (\Vert \sigma \Vert _\infty \delta ^{1/2} ))\) for arbitrarily small \( \epsilon >0\); with this choice \(\exp (- \varepsilon /(\Vert \sigma \Vert _\infty \delta ^{1/2} ))=\Vert \sigma \Vert _\infty \delta ^{1/2}\). Hence, with \(C_4=(2 L_{G'}^2 \Vert \mu \Vert _\infty ^2+\frac{1}{2}L_{G''}^2\Vert \sigma \Vert _\infty ^4)T\), \(C_5=(4 \Vert \mu \Vert _\infty ^2 \Vert G'\Vert ^2_\infty + \Vert \sigma \Vert _\infty ^4 \Vert G''\Vert _\infty ^2) C_2\), \(C_6=(4 \Vert \mu \Vert _\infty ^2 \Vert G'\Vert ^2_\infty + \Vert \sigma \Vert _\infty ^4 \Vert G''\Vert _\infty ^2) C_3\), we get

$$\begin{aligned} E_1&\le C_4 \varepsilon ^2+C_5 \exp \left( - \frac{\varepsilon }{\Vert \sigma \Vert _\infty \delta ^{1/2}}\right) + C_6 \varepsilon \\&= C_4 \Vert \sigma \Vert _\infty ^2 \delta ^{1-4\epsilon } (-\delta ^{2 \epsilon }\log (\Vert \sigma \Vert _\infty \delta ^{1/2} ))^2\\&\quad +C_5 \Vert \sigma \Vert _\infty \delta ^{1/2}+C_6 \Vert \sigma \Vert _\infty \delta ^{1/2-2 \epsilon } (-\delta ^{2 \epsilon }\log (\Vert \sigma \Vert _\infty \delta ^{1/2} ))\,. \end{aligned}$$

Thus, with \(C_7=C_4 \Vert \sigma \Vert _\infty ^2 +C_5 \Vert \sigma \Vert _\infty +C_6 \Vert \sigma \Vert _\infty \) and for arbitrarily small fixed \(\epsilon >0\), it holds that for sufficiently small \(\delta \)

$$\begin{aligned} E_1\le C_7 \delta ^{1/2-2\epsilon }\,. \end{aligned}$$
(12)

For estimating \(E_2\) in (11), we apply Lemma 3.2 to get

$$\begin{aligned} E_2\le L_{G'}^2\Vert \sigma \Vert _\infty ^2 \int _0^{T} {\mathbb {E}}\left( \left\| X^\delta _s-X^\delta _{{\underline{s}}}\right\| ^2 \right) ds \le L_{G'}^2\Vert \sigma \Vert _\infty ^2 C_8 \delta \,. \end{aligned}$$
(13)

For estimating \(E_3,E_4\) in (11), we use that \(\tilde{\mu }, \tilde{\sigma }\) are Lipschitz by [15, Theorem 3.20], to get

$$\begin{aligned} E_3&\le L_{\tilde{\mu }}^2\int _0^{\tau }{\mathbb {E}}\left( \left\| G(X^\delta _{{\underline{s}}})-Z^\delta _{{\underline{s}}}\right\| ^2 \right) ds\le L_{\tilde{\mu }}^2 \int _0^{\tau }u(s)ds\,, \end{aligned}$$
(14)
$$\begin{aligned} E_4&\le L_{\tilde{\sigma }}^2\int _0^{\tau }{\mathbb {E}}\left( \left\| G(X^\delta _{{\underline{s}}})-Z^\delta _{{\underline{s}}}\right\| ^2\right) ds\le L_{\tilde{\sigma }}^2\int _0^{\tau }u(s)ds\,. \end{aligned}$$
(15)

Combining the estimates (12), (13), (14), (15) with (11), we get

$$\begin{aligned} 0\le u(\tau ) \le C_9 \int _0^\tau u(s) ds + 4 T C_7 \delta ^{1/2-2\epsilon }+ 8d L_{G'}^2\Vert \sigma \Vert _\infty ^2 C_8 \delta \,, \end{aligned}$$

with \(C_9= 4T L_{\tilde{\mu }}^2+8dL_{\tilde{\sigma }}^2\). Using that \(4 T C_7 \delta ^{1/2-2\epsilon }+ 8d L_{G'}^2\Vert \sigma \Vert _\infty ^2 C_8 \delta \le C_{10} \delta ^{1/2-2\epsilon }\) for \(\delta \le 1\), and applying Gronwall’s inequality yields for all \(\tau \in [0,T]\),

$$\begin{aligned} u(\tau )\le C_{10} \exp (C_9 \tau ) \delta ^{1/2-2\epsilon }\,. \end{aligned}$$
(16)

Combining (10) and (16) with (9), and the result with (8), finally yields

$$\begin{aligned} {\mathbb {E}}\left( \sup _{0\le t\le T} \Vert X_t-X^\delta _t \Vert ^2 \right) ^{1/2} \le L_{G^{-1}}\Big (C_1 \delta \Big )^{1/2} +L_{G^{-1}}\Big (C_{10}\exp (C_9 T) \delta ^{1/2-2\epsilon }\Big )^{1/2} \le C \delta ^{1/4-\epsilon }\,, \end{aligned}$$

for a suitably chosen constant C, for arbitrarily small \(\epsilon >0\), and for \(\delta \) sufficiently small. \(\square \)

4 Examples

We ran simulations for several examples: some of theoretical interest, as well as an example coming from applications.

When studying stochastic dynamical systems which include a noisy signal, filtering this signal leads to a higher-dimensional system with a degenerate diffusion coefficient. Stochastic control problems often lead to an optimal control policy which makes the drift of the system discontinuous. Examples are models with incomplete market information in mathematical finance, where the rate at which cashflows are paid from a firm value process changes systematically when the asset-liability ratio passes a certain threshold that triggers a rating change.

The class of equations studied here appears frequently in several areas of applied mathematics and the natural sciences.

4.1 Step-function

In the first example the drift is the step function \(\mu (x_1,x_2)=(3({\mathbf {1}}_{\{x_1 \ge 0\}}-{\mathbf {1}}_{\{x_1 < 0\}}),1)^\top \), and \(\sigma \equiv {{\text {id}}_{{\mathbb {R}}^2}}\). It can easily be checked that these coefficients satisfy Assumption 2.1. In particular, note that the non-parallelity condition is trivially satisfied, since \(\sigma \) is uniformly non-degenerate. Since \(\mu \) does not satisfy a one-sided Lipschitz condition, our result is the first one that gives a strong convergence rate of the Euler–Maruyama method for this example.
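For this example the exceptional set is \(\Theta =\{x\in {\mathbb {R}}^2:x_1=0\}\) and, choosing \(n(\xi )=(1,0)^\top \), the function \(\alpha \) from (3) can be computed directly: for small \(h>0\) we have \(\mu (\xi -h n(\xi ))=(-3,1)^\top \) and \(\mu (\xi +h n(\xi ))=(3,1)^\top \), while \(\Vert \sigma (\xi )^\top n(\xi )\Vert =1\), so

$$\begin{aligned} \alpha (\xi )=\frac{(-3,1)^\top -(3,1)^\top }{2}=(-3,0)^\top \,, \end{aligned}$$

a constant function, so that Assumption 2.1.5 is satisfied trivially.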

4.2 Discontinuity along the unit circle

In this example the drift has a discontinuity along the unit circle, and the diffusion coefficient is degenerate on the whole of \({\mathbb {R}}^2\):

$$\begin{aligned} \mu (x_1,x_2)={\left\{ \begin{array}{ll} (1,1)^\top &{} x_1^2+x_2^2\ge 1\\ (-x_1,x_2)^\top &{} x_1^2+x_2^2<1\,, \end{array}\right. } \qquad \sigma (x_1,x_2)=\frac{2}{1+x_1^2+x_2^2} \left( \begin{array}{cc} x_1 &{} 0 \\ x_2 &{} 0 \end{array} \right) \,. \end{aligned}$$

Assumption 2.1 largely follows from Example 2.6. The non-parallelity condition is readily verified:

$$\begin{aligned} \left\| \frac{2}{(1+x_1^2+x_2^2)(x_1^2+x_2^2)} \left( \begin{array}{cc} x_1 &{} x_2 \\ 0 &{} 0 \end{array} \right) \left( \begin{array}{c} x_1 \\ x_2 \end{array} \right) \right\| =\frac{2\sqrt{x_1^2+x_2^2}}{(1+x_1^2+x_2^2)(x_1^2+x_2^2)} =1 \end{aligned}$$

for all points \((x_1,x_2)\) that lie on the unit circle, i.e. \(x_1^2+x_2^2=1\).
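As a usage illustration (our own), the coefficients of this example can be passed directly to the Euler–Maruyama sketch following (2); the helper name euler_maruyama below refers to that sketch and is not part of the paper.

```python
import numpy as np

def mu(x):
    # drift with a discontinuity along the unit circle
    if x[0]**2 + x[1]**2 >= 1.0:
        return np.array([1.0, 1.0])
    return np.array([-x[0], x[1]])

def sigma(x):
    # degenerate diffusion coefficient: the second column is identically zero
    factor = 2.0 / (1.0 + x[0]**2 + x[1]**2)
    return factor * np.array([[x[0], 0.0], [x[1], 0.0]])

# one approximate path started at (1.5, 0), e.g. with step size 2^-10:
# path = euler_maruyama(mu, sigma, np.array([1.5, 0.0]), T=1.0, delta=2.0**-10)
```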

4.3 Dividend maximization under incomplete information

In insurance mathematics, a well-studied problem is the maximization of the expected discounted future dividend payments until the time of ruin of an insurance company, a value which serves as a risk measure. In [24] the problem is studied in a setup that allows for incomplete information about the market. This leads to a joint filtering and stochastic optimal control problem, and after solving the filtering problem, the driving dynamics are high dimensional and have a degenerate diffusion coefficient. This issue is described in more detail in [24]. Solving the stochastic optimal control problem in dimensions higher than three with the usual technique (solving an associated partial differential equation) becomes practically infeasible. Therefore, one has to resort to simulation. The SDE that has to be simulated has the coefficients

$$\begin{aligned} \mu (x_1,\ldots ,x_d)&= \begin{pmatrix} \vartheta _d+\sum _{i=1}^{d-1}(\vartheta _i-\vartheta _d)x_{i+1}-{\bar{u}}{\mathbf {1}}_{[f(x_2,\ldots ,x_d),\infty )}(x_1)\\ q_{d1} + \sum _{j=1}^{d-1} (q_{j1}-q_{d1}) x_{j+1}\\ \vdots \\ q_{d(d-1)} + \sum _{j=1}^{d-1} (q_{j(d-1)}-q_{d(d-1)}) x_{j+1}\\ \end{pmatrix}\,\\ \sigma (x_1,\ldots ,x_d)&=\left( \begin{array}{llll} \beta &{} 0 &{} \ldots &{} 0\\ x_2 \frac{\vartheta _1-\vartheta _d-\sum _{j=1}^{d-1}(\vartheta _j-\vartheta _d) x_{j+1}}{\beta } &{} \vdots &{} &{} \vdots \\ \vdots &{} \vdots &{} &{} \vdots \\ x_d \frac{\vartheta _{d-1}-\vartheta _d-\sum _{j=1}^{d-1}(\vartheta _j-\vartheta _d) x_{j+1}}{\beta } &{} 0 &{} \ldots &{} 0 \end{array} \right) \,, \end{aligned}$$

where \({\bar{u}},\beta ,(\vartheta _i)_{i=1}^d,(q_{ij})_{i,j=1}^d\) are known constants. The arguments \(x_2,\ldots ,x_d\) are elements of the simplex \(\{(x_2,\ldots ,x_d)\in [0,1]^{d-1}:\sum _{j=1}^{d-1}x_{j+1}\le 1\}\), and the corresponding processes stay within this simplex almost surely, see [24]. The function \(f\) determines the hypersurface \(\Theta \) along which the drift is discontinuous: \(\Theta =\{(x_1,\ldots ,x_d): x_1=f(x_2,\ldots ,x_d)\}\). In our simulations we choose \(d=5\) and \(f\) affine linear, but note that we need not restrict ourselves to affine linear \(f\).

We need to check Assumption 2.1: Since \(x_2,\ldots ,x_d \in [0,1]\), \(\mu , \sigma \) are bounded, and all first order derivatives of the entries of \(\sigma \) are bounded. Hence, \(\sigma \) is Lipschitz. \(\mu \) is piecewise Lipschitz, and since \(f\) is affine linear, \(\Theta \in C^3\). Whether the non-parallelity condition holds depends on the choice of the parameters, but for ours the condition is satisfied. Assumption 2.1.5 can easily be checked. Note that the coefficients can be extended to the whole of \({\mathbb {R}}^d\) in a way that they still satisfy our assumptions.

4.4 Error estimate

The \(L^2\)-error is estimated by

$$\begin{aligned} {{\text {err}}}_k := {\bar{e}} \,\hat{E}\left( \left\| X_T^{(k)} - X_T^{(k-1)}\right\| ^2\right) ^{1/2}\,, \end{aligned}$$

where \(X_T^{(k)}\) is the numerical approximation of \(X_T\) with step size \(\delta ^{(k)}\), \(\hat{E}\) is an estimator of the mean value using \(2^{14}\) paths, and \({\bar{e}}\) is a normalizing constant so that \({{\text {err}}}_1=\sqrt{1/4}\).
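One way to compute such an estimate is to couple the approximations for consecutive step sizes through a common Brownian path; the following sketch (in Python) does this for the dyadic choice \(\delta ^{(k)}=2^{-k}T\). Both the coupling and the dyadic step sizes are our assumptions, and the normalization by \({\bar{e}}\) is omitted.

```python
import numpy as np

def em_endpoint(mu, sigma, x0, T, n_steps, dW):
    """Euler-Maruyama endpoint using prescribed Brownian increments dW of shape (n_steps, d)."""
    x = np.array(x0, dtype=float)
    delta = T / n_steps
    for j in range(n_steps):
        x = x + mu(x) * delta + sigma(x) @ dW[j]
    return x

def estimate_error(mu, sigma, x0, T, k, n_paths=2**14, rng=None):
    """Monte Carlo estimate of E(||X_T^(k) - X_T^(k-1)||^2)^(1/2), with delta^(k) = 2^-k T
    and the two approximations coupled through a common Brownian path."""
    rng = np.random.default_rng() if rng is None else rng
    d = len(x0)
    n_fine = 2**k
    sq_errs = np.empty(n_paths)
    for i in range(n_paths):
        dW_fine = rng.normal(scale=np.sqrt(T / n_fine), size=(n_fine, d))
        dW_coarse = dW_fine.reshape(n_fine // 2, 2, d).sum(axis=1)  # pairwise sums of increments
        x_fine = em_endpoint(mu, sigma, x0, T, n_fine, dW_fine)
        x_coarse = em_endpoint(mu, sigma, x0, T, n_fine // 2, dW_coarse)
        sq_errs[i] = np.sum((x_fine - x_coarse) ** 2)
    return np.sqrt(sq_errs.mean())
```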

Figure 1 shows \(\log _2\) of the estimated \(L^2\)-error of the Euler–Maruyama approximation of \(X_T\) plotted over \(\log _2 \delta ^{(k)}\) for the examples presented above. We observe that the theoretical convergence rate is approximately obtained for the example of a step-function and that the other examples converge at a faster rate. In particular, for the examples with degenerate diffusion coefficient, the convergence rate is not worse than for the other example. Even for the step-function example, for sufficiently small step-size the convergence rate seems to be higher than the theoretical one. Hence, it will be an interesting topic for future research to prove sharpness, or find a sharp bound.

Fig. 1 Estimated \(L^2\)-errors

Even though the proven rate for the Euler–Maruyama method is lower than for the transformation-based method from [15], the calculations are usually faster in practice using the first method, since the simulation of a single path is faster. Table 1 confirms this claim: we observe that computation times are higher by up to two orders of magnitude for the transformation method, while the estimated error is of comparable size.

Table 1 Runtimes in seconds using sequential computation and estimated errors for the Euler–Maruyama method (EM) and the transformation method (GM) with 512 time-steps and 1024 paths

For completeness, we remark that one can construct examples where the transformation method is much faster while giving a smaller error. For example, start with prescribing the transform \(G(x)=x+x|x|\phi (10x)\) and set \(\mu (x)=\frac{1}{2}(G^{-1})''(G(x))\) and \(\sigma (x)=(G^{-1})'(G(x))\). This leads to \(\tilde{\mu }(z)=0\) and \(\tilde{\sigma }(z)=1\). Hence, if we use the transformation method with the same G, then \(Z^\delta =Z=W\) and the transformation method gives the estimate \(G^{-1}(W)\), which is the exact solution.
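For the reader's convenience, the coefficients in this construction can be written out via the inverse function rule (for \(y=G(x)\)):

$$\begin{aligned} (G^{-1})'(G(x))=\frac{1}{G'(x)}\,,\qquad (G^{-1})''(G(x))=-\frac{G''(x)}{G'(x)^3}\,, \end{aligned}$$

so that \(\mu (x)=-G''(x)/(2G'(x)^3)\) and \(\sigma (x)=1/G'(x)\); for \(|x|\ge 1/10\) we have \(G(x)=x\) and hence \(\mu (x)=0\), \(\sigma (x)=1\).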

5 Conclusion

In this paper we have for the first time proven strong convergence and a positive strong convergence rate of an explicit method (the Euler–Maruyama method) for multidimensional SDEs with discontinuous drift and degenerate diffusion coefficient, or with a discontinuous drift that does not satisfy a one-sided Lipschitz condition, or both. The Euler–Maruyama method has the advantage that it does not need the exact form of the set of discontinuities of the drift as an input, and that in practice the computation of one path is fast in comparison to the other method in the literature that can deal with this class of SDEs (the transformation-based method from [15]). Our numerical experiments suggest that, in addition to these advantages, the Euler–Maruyama method converges at a higher rate than the theoretically obtained one for many examples; it will be a topic of future research to prove sharpness of our bound, or to find a sharp one.