Abstract
We study a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension. The algorithm is based on an analogy between the BSDE and reinforcement learning: the gradient of the solution plays the role of the policy function, and the loss function is given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results obtained using TensorFlow illustrate the efficiency and accuracy of the studied algorithm for several 100-dimensional nonlinear PDEs from physics and finance, such as the Allen–Cahn equation, the Hamilton–Jacobi–Bellman equation, and a nonlinear pricing model for financial derivatives.
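The idea described above can be summarized in a few lines of code: discretize the forward process and the BSDE in time, treat the initial value \( u(0,\xi ) \) and the step-wise gradient approximations as trainable parameters, and minimize the mean squared mismatch with the terminal condition. The following NumPy sketch illustrates this loss; the function names, the diagonal diffusion, and the plain-Python "networks" are illustrative assumptions, not the paper's TensorFlow implementation.

    import numpy as np

    def deep_bsde_loss(y0, z_nets, f, g, mu, sigma, xi, T, N, rng, batch=64):
        # y0:     trainable scalar approximating u(0, xi)
        # z_nets: list of N trainable maps x -> array of shape (batch, d)
        #         approximating the "policy" sigma^T grad_x u at each time step
        d, dt = xi.shape[0], T / N
        x = np.tile(xi, (batch, 1))          # forward process, all paths start at xi
        y = np.full(batch, y0)               # BSDE solution along each path
        for n in range(N):
            t = n * dt
            dw = np.sqrt(dt) * rng.standard_normal((batch, d))   # Brownian increments
            z = z_nets[n](x)
            y = y - f(t, x, y, z) * dt + np.sum(z * dw, axis=1)  # BSDE recursion
            x = x + mu(t, x) * dt + sigma(t, x) * dw             # Euler step (diagonal sigma)
        return np.mean((g(x) - y) ** 2)      # error against the terminal condition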








References
Bellman, R.: Dynamic Programming. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ. Reprint of the 1957 edition, with a new introduction by Stuart Dreyfus (2010)
Bender, C., Denk, R.: A forward scheme for backward SDEs. Stoch. Process. Appl. 117(12), 1793–1812 (2007)
Bender, C., Schweizer, N., Zhuo, J.: A primal-dual algorithm for BSDEs. arXiv:1310.3694 (2014)
Bergman, Y.Z.: Option pricing with differential interest rates. Rev. Financ. Stud. 8(2), 475–500 (1995)
Briand, P., Labart, C.: Simulation of BSDEs by Wiener chaos expansion. Ann. Appl. Probab. 24(3), 1129–1171 (2014)
Chassagneux, J.-F.: Linear multistep schemes for BSDEs. SIAM J. Numer. Anal. 52(6), 2815–2836 (2014)
Chassagneux, J.-F., Richou, A.: Numerical simulation of quadratic BSDEs. Ann. Appl. Probab. 26(1), 262–304 (2016)
Crisan, D., Manolarakis, K.: Solving backward stochastic differential equations using the cubature method: application to nonlinear pricing. SIAM J. Financ. Math. 3(1), 534–571 (2012)
Darbon, J., Osher, S.: Algorithms for overcoming the curse of dimensionality for certain Hamilton–Jacobi equations arising in control theory and elsewhere. Res. Math. Sci. 3(19), 26 (2016)
Debnath, L.: Nonlinear Partial Differential Equations for Scientists and Engineers, 3rd edn. Birkhäuser/Springer, New York (2012)
E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. arXiv:1706.04702 (2017)
E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: Linear scaling algorithms for solving high-dimensional nonlinear parabolic differential equations. arXiv:1607.03295 (2017)
E, W., Hutzenthaler, M., Jentzen, A., Kruse, T.: On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. arXiv:1708.03223 (2017)
Gobet, E., Lemor, J.-P., Warin, X.: A regression-based Monte Carlo method to solve backward stochastic differential equations. Ann. Appl. Probab. 15(3), 2172–2202 (2005)
Gobet, E., Turkedjiev, P.: Linear regression MDP scheme for discrete backward stochastic differential equations under general conditions. Math. Comput. 85(299), 1359–1391 (2016)
Gobet, E., Turkedjiev, P.: Adaptive importance sampling in least-squares Monte Carlo algorithms for backward stochastic differential equations. Stoch. Process. Appl. 127(4), 1171–1203 (2017)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, MA (2016). http://www.deeplearningbook.org
Han, J., E, W.: Deep learning approximation for stochastic control problems. arXiv:1611.07422 (2016)
Han, J., Jentzen, A., E, W.: Overcoming the curse of dimensionality: solving high-dimensional partial differential equations using deep learning. arXiv:1707.02568 (2017)
Henry-Labordère, P.: Counterparty risk valuation: a marked branching diffusion approach. arXiv:1203.2369 (2012)
Henry-Labordère, P., Oudjane, N., Tan, X., Touzi, N., Warin, X.: Branching diffusion representation of semilinear PDEs and Monte Carlo approximation. arXiv:1603.01727 (2016)
Henry-Labordère, P., Tan, X., Touzi, N.: A numerical algorithm for a class of BSDEs via the branching process. Stoch. Process. Appl. 124(2), 1112–1140 (2014)
Hinton, G.E., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the International Conference on Machine Learning (ICML) (2015)
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521, 436–444 (2015)
Pardoux, É., Peng, S.: Adapted solution of a backward stochastic differential equation. Syst. Control Lett. 14(1), 55–61 (1990)
Pardoux, É., Peng, S.: Backward stochastic differential equations and quasilinear parabolic partial differential equations. In: Stochastic Partial Differential Equations and Their Applications (Charlotte, NC, 1991), vol. 176 of Lecture Notes in Control and Inform. Sci. Springer, Berlin, pp. 200–217 (1992)
Pardoux, É., Tang, S.: Forward-backward stochastic differential equations and quasilinear parabolic PDEs. Probab. Theory Relat. Fields 114(2), 123–150 (1999)
Peng, S.: Probabilistic interpretation for systems of quasilinear parabolic partial differential equations. Stoch. Stoch. Rep. 37(1–2), 61–74 (1991)
Acknowledgements
Christian Beck and Sebastian Becker are gratefully acknowledged for useful suggestions regarding the implementation of the deep BSDE method. This project has been partially supported by the Major Program of NNSFC under grant 91130005, by ONR grant N00014-13-1-0338, and by DOE grant DE-SC0009248.
Appendix A: Special Cases of the Proposed Algorithm
In this appendix, we illustrate the general algorithm of Subsect. 3.2 in several special cases. More specifically, in Appendices A.1 and A.2 we provide special choices for the functions \( \psi _m \), \( m \in {\mathbb {N}}\), and \( \Psi _m \), \( m \in {\mathbb {N}}\), employed in (3.14), and in Appendices A.3 and A.4 we provide special choices for the function \( \Upsilon \) in (3.9).
A.1 Stochastic Gradient Descent (SGD)
Example A.1
Assume the setting in Subsect. 3.2, let \( ( \gamma _m )_{ m \in {\mathbb {N}}} \subseteq (0,\infty ) \), and assume for all \( m \in {\mathbb {N}}\), \( x \in {\mathbb {R}}^{ \varrho } \), \( ( \varphi _j )_{ j \in {\mathbb {N}}} \in ( {\mathbb {R}}^{ \rho } )^{ {\mathbb {N}}} \) that
Then it holds for all \( m \in {\mathbb {N}}\) that
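Assuming that the recursion in Example A.1 reduces to the classical update \( \Theta _m = \Theta _{m-1} - \gamma _m \cdot (\text {stochastic gradient}) \), a minimal Python sketch reads as follows; the names are illustrative.

    import numpy as np

    def sgd_step(theta, grad, gamma_m):
        # theta:   current parameter vector Theta_{m-1}
        # grad:    stochastic estimate of the gradient of the loss at Theta_{m-1}
        # gamma_m: step size (learning rate) for step m
        return theta - gamma_m * grad

    # usage with a decaying step-size schedule, e.g. gamma_m = c / m:
    # theta = sgd_step(theta, grad, 0.1 / m)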
A.2 Adaptive Moment Estimation (Adam) with Mini-Batches
In this subsection, we illustrate how the so-called Adam optimizer (see [25]) can be employed in conjunction with the deep BSDE method in Subsect. 3.2 (cf. also Subsect. 4.1 above).
Example A.2
Assume the setting in Subsect. 3.2, assume that \( \varrho = 2 \rho \), let \( {\text {Pow}}_r :{\mathbb {R}}^{ \rho } \rightarrow {\mathbb {R}}^{ \rho } \), \( r \in (0,\infty ) \), be the functions which satisfy for all \( r \in (0,\infty ) \), \( x = ( x_1, \dots , x_{ \rho } ) \in {\mathbb {R}}^{ \rho } \) that
let \( \varepsilon \in (0,\infty ) \), \( ( \gamma _m )_{ m \in {\mathbb {N}}} \subseteq (0,\infty ) \), \( ( J_m )_{ m \in {\mathbb {N}}_0 } \subseteq {\mathbb {N}}\), \( \mathbb {X}, \mathbb {Y} \in (0,1) \), let \( \mathbf{m} = ( \mathbf{m}^{ (1) } , \dots , \mathbf{m}^{ ( \rho ) } ) :{\mathbb {N}}_0 \times \Omega \rightarrow {\mathbb {R}}^{ \rho } \) and \( \mathbb {M} = ( \mathbb {M}^{ (1) } , \dots , \mathbb {M}^{ ( \rho ) } ) :{\mathbb {N}}_0 \times \Omega \rightarrow {\mathbb {R}}^{ \rho } \) be the stochastic processes which satisfy for all \( m \in {\mathbb {N}}_0 \) that \( \Xi _m = ( \mathbf{m}_m^{ (1) }, \dots , \mathbf{m}^{ (\rho ) }_m , \mathbb {M}_m^{ (1) }, \dots , \mathbb {M}_m^{ (\rho ) } ) \), and assume for all \( m \in {\mathbb {N}}\), \( x = ( x_1, \dots , x_{ \rho } ) , y = ( y_1, \dots , y_{ \rho } ) \in {\mathbb {R}}^{ \rho } \), \( ( \varphi _j )_{ j \in {\mathbb {N}}} \in ( {\mathbb {R}}^{ \rho } )^{ {\mathbb {N}}} \) that
and
Then it holds for all \( m \in {\mathbb {N}}\) that
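For concreteness, a standard Adam step with the bias corrections of [25] can be sketched as follows; here \( \beta _1, \beta _2 \) play the roles of \( \mathbb {X}, \mathbb {Y} \) above. The sketch states the generic optimizer rather than a verbatim restatement of Example A.2; in particular, mini-batching enters only through the gradient estimate passed in.

    import numpy as np

    def adam_step(theta, grad, m1, m2, step, gamma,
                  beta1=0.9, beta2=0.999, eps=1e-8):
        # m1, m2: running first- and second-moment estimates of the gradient
        #         (the processes m and M above); step counts from 1
        m1 = beta1 * m1 + (1.0 - beta1) * grad          # first moment
        m2 = beta2 * m2 + (1.0 - beta2) * grad ** 2     # componentwise second moment
        m1_hat = m1 / (1.0 - beta1 ** step)             # bias corrections (see [25])
        m2_hat = m2 / (1.0 - beta2 ** step)
        theta = theta - gamma * m1_hat / (np.sqrt(m2_hat) + eps)
        return theta, m1, m2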
A.3 Euler–Maruyama Scheme
Example A.3
Assume the setting in Subsect. 3.2, let \( \mu :[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d \) and \( \sigma :[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{ d \times d } \) be functions, and assume for all \( s, t \in [0,T] \), \( x, w \in {\mathbb {R}}^d \) that
Then it holds for all \( m, j \in {\mathbb {N}}_0 \), \( n \in \{ 0, 1, \dots , N - 1 \} \) that
In the setting of Example A.3, we consider, under suitable further hypotheses and for every sufficiently large \( m \in {\mathbb {N}}_0 \), the random variable \( \mathcal {U}^{ \Theta _m } \) as an approximation of \( u(0,\xi ) \), where \( u :[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^k \) is a suitable solution of the PDE
with \( u(T,x) = g(x) \), \( e^{ (d) }_1 = (1,0,\dots ,0) \), \( \dots \), \( e^{ (d) }_d = (0,\dots ,0,1) \in {\mathbb {R}}^d \) for \( t \in [0,T] \), \( x = ( x_1, \dots , x_d ) \in {\mathbb {R}}^d \) (cf. (PDE) in Sect. 2 above).
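A minimal NumPy sketch of the corresponding forward simulation on the uniform grid \( t_n = n T / N \) follows; the uniform grid and the function names are illustrative assumptions, not part of Example A.3.

    import numpy as np

    def euler_maruyama_path(mu, sigma, xi, T, N, rng):
        # One Euler-Maruyama path of dX_t = mu(t, X_t) dt + sigma(t, X_t) dW_t:
        # X_{n+1} = X_n + mu(t_n, X_n) dt + sigma(t_n, X_n) (W_{t_{n+1}} - W_{t_n})
        d, dt = xi.shape[0], T / N
        x = np.asarray(xi, dtype=float).copy()
        path = [x.copy()]
        for n in range(N):
            dw = np.sqrt(dt) * rng.standard_normal(d)   # Brownian increment
            x = x + mu(n * dt, x) * dt + sigma(n * dt, x) @ dw
            path.append(x.copy())
        return np.stack(path)                           # array of shape (N + 1, d)

    # usage: path = euler_maruyama_path(mu, sigma, xi, T=1.0, N=20,
    #                                   rng=np.random.default_rng(0))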
A.4 Geometric Brownian Motion
Example A.4
Assume the setting in Subsect. 3.2, let \( \bar{\mu }, \bar{\sigma } \in {\mathbb {R}}\), and assume for all \( s, t \in [0,T] \), \( x = ( x_1, \dots , x_d ) \), \( w = ( w_1, \dots , w_d ) \in {\mathbb {R}}^d \) that
Then it holds for all \( m, j \in {\mathbb {N}}_0 \), \( n \in \{ 0, 1, \dots , N \} \) that
In the setting of Example A.4, we view, under suitable further hypotheses (cf. Subsect. 4.4 above) and for every sufficiently large \( m \in {\mathbb {N}}_0 \), the random variable \( \mathcal {U}^{ \Theta _m } \) as an approximation of \( u(0,\xi ) \), where \( u :[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^k \) is a suitable solution of the PDE
with \( u(T,x) = g(x) \) for \( t \in [0,T] \), \( x = ( x_1, \dots , x_d ) \in {\mathbb {R}}^d \).
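Assuming that Example A.4 takes the forward process to be componentwise geometric Brownian motion \( dX_t = \bar{\mu } X_t \, dt + \bar{\sigma } X_t \, dW_t \), the grid values can be sampled without discretization error via the exact solution formula, as in the following sketch (names illustrative).

    import numpy as np

    def gbm_path(xi, mu_bar, sigma_bar, T, N, rng):
        # Exact sampling of componentwise geometric Brownian motion on the grid
        # t_n = n T / N, using
        # X_{n+1} = X_n * exp((mu_bar - sigma_bar^2 / 2) dt + sigma_bar dW_n)
        d, dt = xi.shape[0], T / N
        x = np.asarray(xi, dtype=float).copy()
        path = [x.copy()]
        for _ in range(N):
            dw = np.sqrt(dt) * rng.standard_normal(d)
            x = x * np.exp((mu_bar - 0.5 * sigma_bar ** 2) * dt + sigma_bar * dw)
            path.append(x.copy())
        return np.stack(path)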
E, W., Han, J. & Jentzen, A. Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations. Commun. Math. Stat. 5, 349–380 (2017). https://doi.org/10.1007/s40304-017-0117-6