DeepCME: A deep learning framework for computing solution statistics of the chemical master equation

Ankit Gupta; Christoph Schwab; Mustafa Khammash

doi:10.1371/journal.pcbi.1009623

Abstract

Stochastic models of biomolecular reaction networks are commonly employed in systems and synthetic biology to study the effects of stochastic fluctuations emanating from reactions involving species with low copy-numbers. For such models, the Kolmogorov’s forward equation is called the chemical master equation (CME), and it is a fundamental system of linear ordinary differential equations (ODEs) that describes the evolution of the probability distribution of the random state-vector representing the copy-numbers of all the reacting species. The size of this system is given by the number of states that are accessible by the chemical system, and for most examples of interest this number is either very large or infinite. Moreover, approximations that reduce the size of the system by retaining only a finite number of important chemical states (e.g. those with non-negligible probability) result in high-dimensional ODE systems, even when the number of reacting species is small. Consequently, accurate numerical solution of the CME is very challenging, despite the linear nature of the underlying ODEs. One often resorts to estimating the solutions via computationally intensive stochastic simulations. The goal of the present paper is to develop a novel deep-learning approach for computing solution statistics of high-dimensional CMEs by reformulating the stochastic dynamics using Kolmogorov’s backward equation. The proposed method leverages superior approximation properties of Deep Neural Networks (DNNs) to reliably estimate expectations under the CME solution for several user-defined functions of the state-vector. This method is algorithmically based on reinforcement learning and it only requires a moderate number of stochastic simulations (in comparison to typical simulation-based approaches) to train the “policy function”. This allows not just the numerical approximation of various expectations for the CME solution but also of its sensitivities with respect to all the reaction network parameters (e.g. rate constants). We provide four examples to illustrate our methodology and provide several directions for future research.

Author summary

We develop a deep learning framework for estimating solutions of the chemical master equation (CME) that is fundamental to stochastic analysis of reaction networks. The CME is a system of ordinary differential equations that describes the time-evolution of the probability density of the random state-vector, and owing to an inherent curse of dimensionality, directly solving the CME is generally impractical with existing approaches. Moreover, the commonly employed simulation-based approaches for estimating CME solutions often require an exorbitant amount of computational time, even for moderately-sized networks. To counter these issues, we develop a deep reinforcement learning based method, called DeepCME, in this paper. DeepCME not only estimates function expectations based on the CME solution, but it also solves the more challenging problem of estimating their sensitivities with respect to all the model parameters. We illustrate our approach with four carefully chosen reaction network examples with varying sizes. Our results demonstrate that DeepCME reliably estimates the expectations of interest, along with all the parametric sensitivities, at a fraction of the computational cost of simulation-based estimators. We present many directions for future research and suggest further improvements to DeepCME that can greatly enhance its accuracy and applicability.

Citation: Gupta A, Schwab C, Khammash M (2021) DeepCME: A deep learning framework for computing solution statistics of the chemical master equation. PLoS Comput Biol 17(12): e1009623. https://doi.org/10.1371/journal.pcbi.1009623

Editor: James R. Faeder, University of Pittsburgh, UNITED STATES

Received: June 9, 2021; Accepted: November 8, 2021; Published: December 8, 2021

Copyright: © 2021 Gupta et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The source code used to produce the results and analyses presented in this manuscript are available from GitHub repository: https://github.com/ankitgupta83/DeepCME.

Funding: This project has received funding from the European Research Council (ERC) https://erc.europa.eu/ under the European Union’s Horizon 2020 research and innovation programme grant agreement no. 743269 (CyberGenetics project). This grant was awarded to M.K. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

This is a PLOS Computational Biology Methods paper.

1 Introduction

Stochastic modelling in systems and synthetic biology has become an indispensable tool to quantitatively understand the intrinsically noisy dynamics within living cells [1]. Intracellular reaction networks typically involve low copy-number species in reactions that fire intermittently at random times, as opposed to continuously. Hence, deterministic models of such networks based on Ordinary Differential Equations (ODEs) fail to capture the essential properties of the system, and stochastic models become necessary [2].

Among the most widely used stochastic models are continuous-time Markov chains (CTMCs) whose states represent the copy-numbers of all species involved in the Chemical Reaction Network (CRN) [3]. If the number of species in the CRN is n, the Markov chain evolves over a discrete, possibly infinite, state-space comprising all accessible states. In most applications, the key object of interest is the probability distribution p(t) of the random state at time t. This probability distribution evolves in time t according to Kolmogorov’s forward equation that is more famously known in the chemical literature as the Chemical Master Equation (CME) (see, e.g., [4], and (8)). The CME is a system of coupled, deterministic ODEs describing the rates of inflow and outflow of probability at each state in the state-space . For even very small examples of CRNs, can be very large or infinite, and hence the CME cannot be solved directly despite the linear nature of its constituent ODEs. Hence, one typically estimates CME solutions numerically either by simulating the CTMC trajectories with numerical methods like the Stochastic Simulation Algorithm (SSA) [5] or the modified Next Reaction Method (mNRM) [6], or one models (parts of) the CME asymptotically in various parameter regimes, such as the large copy-number limit, or the large systems limit (see, e.g., [3, 7] and the references therein). Then, Fokker-Planck PDEs govern the evolution of the limiting densities. Solutions of these PDEs are known to admit DNN approximations which are free from the “Curse of Dimensionality” (CoD), see e.g. [8] and the references there.

The main drawback of simulation-based solvers is that obtaining statistically precise estimates of the CME solution can be very cumbersome, due to the high cost of CTMC trajectory simulation. This led to the development of the Finite State Projection (FSP) method [9] that approximately solves the CME by truncating the state-space to a finite, tractable size. The FSP has been successfully used in many important biological studies with stochastic reaction network models. Over time, numerous algorithmic improvements to the original FSP method have been made, using advanced techniques such as Krylov Subspace approximations [10] or Tensor-Train representations [11]. Despite these advances, the scope of FSP’s applicability is still fairly limited because of the CoD inherent to the CME for complex CRNs: the dimension of the copy-number space of a large number of species involved in the CRN can be potentially prohibitive. With the algorithmic complexity of deterministic solution methods of the CME scaling exponentially with the number of species n, the CoD obviates the efficient numerical treatment of the CME for complex CRNs. In spite of these drawbacks, simulation schemes like the SSA or mNRM combined with FSP and its variants have emerged as the methodology of choice during the past decades for the computational exploration of complex CRNs in systems biology. This is mainly due to the lack of computational schemes that can effectively deal with CoD.

In the past decade, with the ubiquitous emergence of possibly massive, noisy data from natural biological CRNs, and the possibility of engineering synthetic biological CRNs, the question of efficient numerical analysis of CRNs has become pivotal. Indeed several tasks in computational biology strongly depend on the availability of scalable, efficient computational tools to analyse large CRNs. These include structure and parameter identification in large CRNs, assimilation of observable data into CRN models, Bayesian estimation of non-observable quantities of interest conditional on CRNs, among many others.

Recently, in the context of high-dimensional partial differential equations (PDEs), deep-learning based numerical approaches have been found highly effective in dealing with the CoD in these settings and appear efficient in numerical approximation of PDE solutions with high-dimensional state and parameter spaces [12–14]. We refer to the survey [8] and the references therein. Importantly, several types of PDEs considered in these studies also arise from various asymptotic scalings (large copy-numbers, large systems limits) of large CRNs. (e.g. [4, 7, 15, 16]). Furthermore, DNNs have been shown to be at least as expressive as certain tensor-structured formats from numerical multi-linear algebra, which were developed for the CME in [11] (see also [17]).

Motivated by these advances and observations, in this paper we develop and explore corresponding deep-learning approaches for the efficient numerical solution of CMEs and for the related tasks of parameter estimation, and inference.

Before detailing our approach, we remark that leveraging Machine Learning (ML) based approaches for the numerical treatment of complex CRNs is, in our view, natural and critical: CRNs being themselves networks, any viable computational approach should, in some sense, mimic this structure in order to accommodate the complexity of CRNs. This is in line with our previous work on tensor network based computational methods (e.g. [11, 18]). On the other hand, ML-based computational methodologies for data assimilation and quantitative prediction of complex systems is currently undergoing intense development. We therefore expect that corresponding advances in computational ML, such as progress in interpretability and training methods for DNNs, will entail corresponding methodological advances in the exploration of large, complex CRNs in biological systems engineering.

Next we briefly describe our ML approach to solving the CME. Instead of estimating the probability p(t, x) for each state , one is often interested in learning the expectation of suitable real-valued functions g, referred to as the output function, under this probability distribution. Therefore, one is interested in the input-output map that associates an initial density to (1) In the case this sum is formal for now. We later will indicate some sufficient conditions for this sum to be well-defined—applying state-space truncation schemes e.g. [9], we may assume that holds with a small error in the estimated expectation, which renders the summation finite. For example, if , for some and i ∈ {1, …, n}, then the output to be estimated is the m-th moment of the random copy-number of the i-th species at time t, i.e. Another relevant example is when , the indicator function for some subset defined as

Then the output is the probability of the state X(t) being in set A at time t

One method of choice to numerically approximate the map is by stochastic simulations generated with the SSA and its variants (e.g. [5, 19–21] and the references therein) combined with ensemble averaging. Generally, this approach mandates a large number of sample paths, to achieve Monte Carlo convergence to reasonable accuracy for at fixed t > 0. In the present paper, we propose DeepCME, a deep neural network based methodology to emulate the above-mentioned map. Also in the present approach path simulation is required, during the DNN training phase. However, we find that the number of paths to achieve DNN training generally is lower than by direct use of Monte Carlo estimator combined with stochastic simulations; accuracy is achieved through the generalization properties of DNNs rather than through approximation of admissible sets of initial densities.

As is by now well-known in ML, an essential ingredient in DNN based approaches to emulate high-dimensional maps is the mathematical setup of suitable loss-functions which determine the training process. In DeepCME, we propose a particular loss function which is inspired by other, recent approaches in computational finance (e.g. [12] and the references therein). Specifically, using Kolmogorov’s backward equation, Kurtz’s random time change formulation [22] and Ito’s formula for jump processes, we identify an equation that the output quantity of interest along with some “policy map” must uniquely satisfy for each stochastic trajectory (X(t))_t≥0 almost surely. Minimising a “loss” function that measures the error in this equation, allows us to train a deep neural network to learn the policy map and the quantity of interest in a reinforcement learning framework. Remarkably, this approach also yields the sensitivities of the quantity of interest w.r.t. all model parameters. Estimating these parametric sensitivities is important for many applications, but it is considered a difficult problem towards which a lot of research effort has recently been directed [23–31].

This paper is organised as follows. In Section 2 we provide some background on the CTMC model of a reaction network. In Section 3 we present our main results that allow us to cast the problem of solving a CME into the reinforcement learning framework. In Section 4 we describe our deep-learning approach and its implementation in TensorFlow. In Section 5 we illustrate this approach with four examples. Finally, in Section 6 we conclude and present directions for future research.

2 Preliminaries

2.1 The stochastic model

We start by describing the continuous-time Markov chain (CTMC) model of a reaction network. Consider a network with n species, denoted by X₁, …, X_n, that participate in K reactions of the form (2) where ν_ki (resp. ) is the number of molecules of species X_i consumed (resp. produced) by reaction k. The system’s state at any time is the vector of copy-numbers of the n species. As time advances, this state gets displaced by the stoichiometric vector when reaction k fires, and this event occurs at rate λ_k(x) where is the propensity function for reaction k. Commonly λ_k is given by mass-action kinetics [3] (3) with c_k > 0 being the associated rate constant.

There are many ways to formally specify the CTMC representing a reaction network. One way is through its generator, which is an operator that captures the rate of change of the probability distribution of the process (see Chapter 4 in [22]). It is given by (4) for any f that is a bounded real-valued function on the state-space of the Markov chain. The state-space is assumed to be nonempty and closed under the reaction dynamics, i.e. if and λ_k(x) > 0 then (x + ζ_k) is also in .

Another way to specify the CTMC is via Kurtz’s random time change representation (see Chapter 6 in [22]) (5) where R_k(t) is a counting process that counts the number of firings of reaction k in the time-period [0, t]. As is customary in trajectory-simulation (e.g. [3, 19]) which will also be required by us for DNN training, we express it in terms of an independent unit rate Poisson process Y_k as e.g. [21] (6)

With this representation in place, we consider the CTMC (X(t))_t≥0 on the canonical probability space generated by the independent unit-rate Poisson processes Y₁, …, Y_K.

2.2 Kolmogorov’s forward and backward equations

Let (X(t))_t≥0 be the CTMC representing reaction dynamics with some initial state . For any state , let (7) be the probability that the CTMC is in state x at time t. These probabilities evolve deterministically in time according to Kolmogorov’s forward equation, more widely known as the Chemical Master Equation (CME) [3, 4]. The CME is the following system of deterministic linear ODEs (8) for each . Note that the number of ODEs in this CME system is equal to , the number of elements in , which is typically exorbitantly large or even infinite.

Consider an output function such that (9) for some finite time horizon T > 0. The Kolmogorov’s backward equation [32] describes the evolution of the martingale (w.r.t. the filtration generated by (X(t))_t≥0) (10) in the time interval [0, T], and it is given by (11) with the terminal condition V_g(T, x) = g(x), . The backward Eq (10) will play a key role in our development of a deep learning approach for estimating quantities of the form .

In the case where the state-space is finite, i.e. , we can enumerate it as . Then the CTMC generator in (11) can be expressed as the m × m transition rate matrix Q = [Q_ij] given by Here we assume for convenience that all stoichiometry vectors (ζ_k-s) are distinct. Viewing p(t) as the vector , we can express the CME (8) as (12)

Here, , i, j ∈ 1: m. CME (12) admits the closed-form solution (13)

Similarly, viewing V_g(t) as the vector , the backward Eq (11) can be solved as (14) where g denotes the vector . We are interested in networks where is extremely large or infinite. Then, numerically computing the matrix exponential in (13) or in (14) is not an option.

2.3 Parametric sensitivity analysis

Now consider the situation where the propensity functions depend on a scalar parameter θ (like reaction rate constant for mass-action kinetics, temperature etc.). Denoting the θ-dependent CTMC as (X_θ(t))_t≥0, it is often of interest to compute the parametric sensitivity (15) of the observed output at time T. Such sensitivity values are important for many applications and their direct calculation is generally impossible but a number of simulation-based approaches have recently been developed to provide efficient numerical estimation of these sensitivity values; we mention only [23–31].

Theorem 3.3 in [29] proves that (16) where V_g(t, x) is defined by (10) with X(⋅) replaced by X_θ(⋅). The main difficulty in using this formula for computing sensitivities is that the function (17) is unknown and hence it must be estimated “on the fly” by numerically generating auxiliary paths [29]. In the method we develop in this paper we shall “learn” (i.e., emulate by ML techniques) this function using deep neural networks. This would provide a simple direct way to estimate the parameter sensitivity via formula (16). This approach would in fact yield sensitivities w.r.t. all the model parameters in one shot, unlike what is afforded by existing sensitivity estimation approaches. In other words once the function x ↦ Δ_k V_g(t, x) is available for each k ∈ 1: K, we can use a common set of simulated trajectories to evaluate Monte Carlo estimators for sensitivities w.r.t. all parameters, based on expression (16), without any extra simulation effort. This is unlike most simulation-based approaches where estimation of each parameter sensitivity requires an additional set of distinct trajectories.

3 Main results

In this section we state and prove the key result on which our deep learning approach depends. Recall that our goal is to estimate (see (1)) which is the same as V_g(0, x₀) (see (10)) if the initial state of the CTMC is X(0) = x₀. Also recall the random time-change representation (5) and the definition of the reaction counting process R_k from (6). Henceforth we shall denote the centred version of this process as (18)

This centred process is a local martingale w.r.t. the filtration generated by (X(t))_t≥0 (see Chapter 1 in [33]).

We now state an assumption that we require for our approach.

Assumption 3.1 (Non-explosivity of the CTMC) Let (X(t))_t≥0 be the CTMC given by (5) with deterministic initial condition X(0) = x₀. Let denote the state-space of this CTMC and let be the filtration it generates. If τ_M is the -stopping time defined by then τ_M → ∞ almost surely as M → ∞.

Remark 3.2 There are a number of works in the literature that provide sufficient conditions for this non-explosivity condition to hold, subject to the form of the propensity functions (see for example in [34–36] and the references therein). Under the no-explosion assumption a probability distribution p(t) satisfying the CME exists uniquely (see e.g. Lemma 1.23 in [33]).

Next we present the main result on which our deep learning approach is based.

Theorem 3.3 (Expected output and policy map characterization) Suppose Assumption 3.1 holds for the CTMC (X(t))_t≥0 and the output function satisfies (9). Let be a real number and let be a measurable -valued function on such that the following relation holds almost surely (19)

Then and exist uniquely and they can be identified as (20) where V_g(t, x) is given by (10) and the difference operator Δ_k is as in (17).

Remark 3.4 (Connection to our deep learning approach) Before we prove this theorem, we briefly describe how this result translates into our deep learning approach, details of which will be provided in Section 4. We can view as the “policy map” (in the parlance of reinforcement learning) that decides actions based on the current time-state pair (t, x), and depending on these actions the constant initial value is evolved in the time-interval [0, T] according to the r.h.s. of (19) for any CTMC trajectory (X(t))_t≥0. Theorem 3.3 shows that the only way the final outcome of this evolution matches g(X(T)) at time T, is when is exactly the expected output , and the policy map is exactly (Δ₁ V_g(t, x), …, Δ_K V_g(t, x)) where Δ_k V_g(t, x) is the difference in the expected output at time T, due to a single firing of reaction k at time t with system’s state at x = X(t).

Using the modified next reaction method [6], one can easily generate trajectories of the CTMC (X(t))_t≥0 along with the associated centred reaction counting processes . For each such trajectory, relation (19) can be interpreted in terms of known and unknown quantities as (21)

We represent the unknown map by a DNN and consider unknown as an optimisation variable. Then by minimising a “loss” function that measures the discrepancy in relation (21) we try to recover the optimal values of and that are given by (20). This allows us to estimate the output of interest (as ) and also its parametric sensitivities by substituting for Δ_k V_g(t, x) in (16).

Observe that in traditional simulation-based estimation approaches, each simulated trajectory contributes with a small weight (viz. reciprocal of the sample size) to the Monte Carlo estimator for the output or one of its parameter sensitivities. This is quite unlike the proposed deep learning approach where each trajectory specifies an almost sure relationship between the unknown quantities that determine both the output and all its parameter sensitivities. Hence the deep learning approach is able to extract more information out of a small number of simulated trajectories as our examples in Section 5 illustrate.

Proof.[Proof of Theorem 3.3] We prove this result in two steps. We first show that and given by (20) satisfy (19) almost surely. Then, we prove that if another such pair satisfying (19) exists then we must necessarily have and .

Applying Ito’s formula for jump Markov processes to V_g(t, X(t)) we obtain

Using Kolmogorov’s backward Eq (11) and simplifying we get (22)

Noting that V_g(T, X(T)) = g(X(T)) and we see that (19) holds with and given by (20).

Now let be another pair satisfying (19), i.e. We subtract (22) from this equation to obtain (23) where Note that is a local martingale w.r.t. the filtration generated by (X(t))_t≥0 as it is defined as a sum of stochastic integrals whose integrands are adapted to and whose integrators are local martingales w.r.t. (see Appendix A.3 in [33]).

If τ_M is the stopping time defined in Assumption 3.1, then the stopped process m(t ∧ τ_M) is a martingale, where a ∧ b ≔ min{a, b}. Applying Doob’s maximal inequality [22] on the submartingale |m(t ∧ τ_M)| we obtain (24) Note that terms on both sides of the inequality are monotonically increasing in M. This monotonicity is obvious for the term on the l.h.s. and for the term on the r.h.s. it follows from the conditional Jensen’s inequality and from the martingale property Letting M → ∞ and using the monotone convergence theorem on both sides of (24) we obtain where we have used the fact that τ_M → ∞ as M → ∞ due to Assumption 3.1. Relation (23) informs us that m(T) = 0 almost surely and hence This is sufficient to conclude that and for any t ∈ [0, T].

As this holds for any CTMC trajectory (X(t))_t≥0, we must have for any . This completes the proof of this theorem.

4 DeepCME: Deep learning formulation for CME

In this section we detail our deep learning method for solving CME, referred to as DeepCME. We have computationally implemented this method using the machine learning library TensorFlow [37]. Our source code for generating the ensuing numerical experiments is available at GitHub: https://github.com/ankitgupta83/DeepCME.

As outlined in Remark 3.4, our approach is based on the almost sure relationship established in Theorem 3.3. Even though this result was presented for a single output function g(x), it can be easily extended for a vector-valued function g(x) = (g₁(x), …, g_R(x)) by considering the unknown variable as a R-dimensional vector and the unknown map that takes a time-state pair (t, x) as input and produces an output in the space of R × K matrices. Such an extension is useful because in most applications one is interested in estimating multiple statistical properties (like means, variances, covariances etc.) of the CME solution p(T, ⋅).

We now define the “loss” function that measures the discrepancies in the R almost sure relations given by (21). Let be the following continuously differentiable function where Δ = (Δ₁, …, Δ_R) is a vector of positive threshold values and

We define the loss function as (25) where the expectation is estimated by computing the sample mean over a finite batch of “training” trajectories. During the training process this loss function is minimised in order to learn the optimal , which estimates our expectations of interest and the optimal matrix-valued policy map (see Remark 3.4). This policy map will enable us to estimate sensitivities of the quantities of interest w.r.t. all the model parameters as discussed in Section 2.3. The threshold values Δ = (Δ₁, …, Δ_R) help in neutralising the disparities in the relative magnitudes of the estimated quantities and the discrepancies in the corresponding almost sure relations. The loss function minimisation is performed with the stochastic gradient descent (SGD) algorithm that makes use of the automatic differentiation routines that are built in TensorFlow. Differentiability properties of the function L which defines the loss function are important for convergence of the SGD iterations. Our choice of ϕ(x) makes L(x₁, …, x_R) a differentiable combination of norm (for components with absolute values greater than 1) and norm squared (for components with absolute values strictly less than 1). Having such a combination makes the training more robust and promotes sparsity.

In DeepCME we first encode the matrix-valued policy map by a DNN and we include as a vector of trainable variables. Then a batch of training trajectories is generated, and based on and the DNN-encoded policy map , the loss function is evaluated for this training data by measuring the discrepancy (according to (25)) in the almost sure relationship presented in Theorem 3.3. Keeping the training data fixed, this loss function is then minimised by adjusting and the DNN with SGD for a given number of iterations. Once these iterations are complete, provides estimates for the expectations of interest and their parametric sensitivities can be estimated by evaluating Monte Carlo estimators based on expression (16), using the DNN-encoded policy map and the training trajectories.

In the next two subsections we elaborate more on the DNN encoding of the policy map and the loss function computation based on simulated trajectories.

4.1 DNN encoding of the policy map

Recall from Section 2.2 that if the state-space is finite then V_g(t, x) can in principle be found by exponentiating the transition rate matrix Q multiplied with (T − t) (see (14)). Hence, if λ = λ₁ + iλ₂ is an eigenvalue of Q, then on the associated eigenspace we would expect that the dependence of V_g(t, x) on time t is given by . Motivated by this rationale, rather than passing the time-values t directly as inputs to the DNN that encodes , we shall pass temporal features of the form (26) where λ₁₁, …, λ_r1, λ₁₂, …, λ_r2 are 2r trainable variables that represent the r dominant eigenvalues of the generator of the CTMC. Additionally, r trainable variables ψ₁, …, ψ_r are included to represent ‘phase shifts’. Problem-specific temporal features, like the ones we consider, have been successfully employed in existing deep learning methods for ODE-based reaction network models (see, e.g., [38] and the references therein). Note that the mapping between time t and the temporal features is one-to-one and hence no information is lost by substituting time inputs with temporal features.

We encode the policy map as a fully connected feedforward deep neural network whose architecture is schematically shown in Fig 1. This neural network consists of an input layer, L hidden layers and an output layer. Mathematically, DNNs Φ considered here are determined by a tuple (27) where in layer ℓ = 1, …, L + 1, the map is an affine transformation i.e. , with weight matrix , and bias vector . As mentioned, in the presently considered DNNs, the input layer takes the temporal features and the state vector x = (x₁, …, x_n).

Download:

Fig 1. Architecture of the neural network.

DNN architecture to encode the matrix-valued map . The inputs (t, x) are passed to an input layer, which leaves the state values x unchanged but activates a dictionary of temporal features (26). The resulting output is propagated through a DNN with L fully connected hidden layers, and an additional output layer with each layer having N_H nodes. For simplicity, we assume no sparsity in the weight matrices and the bias vectors of these layers. In the final step, the output from the output layer is cast into the policy-map matrix corresponding to inputs (t, x). In Section 4.2 we describe how the loss function can be computed using this matrix-valued map for a batch of stochastic trajectories.

https://doi.org/10.1371/journal.pcbi.1009623.g001

The nonlinearities in (27) act on vectors in component-wise, with possibly different activations at each layer. The number L + 1 denotes the number of layers (sometimes referred to as depth) of the DNN Φ, and L denotes the number of hidden layers of DNN Φ.

With the DNN Φ, we associate a realization, i.e., a map

The relation between the DNN parameters Φ and its realisation as a map is not one-to-one: for several choices of Φ, realizations R(Φ) may give rise to the same map . This over-parametrization of DNNs is well-known to cause multiple minima in loss functions of DNN parameters, and to obstruct use of efficient optimisation algorithms in numerical DNN training.

The goal of DNN approximations is to provide a parsimonious surrogate map R(Φ) for many-parametric, input-output maps which are not explicitly known and are accessible computationally only through possibly noisy evaluations.

The input layer transforms the time-value t into temporal features (26) but leaves the state vector x = (x₁, …, x_n) unchanged. For the layers, we assume fixed width, i.e., that each layer consists of N_H nodes (including the output layer). We also assume that no activation is applied at the output layer, i.e. ρ_L+1 is the identity function, and all activations in the hidden layers are equal, i.e. for ℓ = 1, …, L and for i = 1, …, N_H, . In the ensuing numerical examples, we employ the so-called ReLU-activation for the hidden layers, which is given by (28)

Remark 4.1 More generally, for , we may choose the activations , observing that increasing the value of k increases differentiability of realizations of the DNN Φ. This may be of relevance in cases where the diffusion limits for large copy number counts of particular species imply higher smoothness of the map x ↦ p(T, x).

4.2 Loss function computation based on the training data

To numerically evaluate the loss function, we require simulated training trajectories of the form where X denotes the CTMC and each is the vector of centred reaction counting processes defined by (18). Such trajectories can be easily generated with Anderson’s modified next reaction (mNRM) method [6]. We discretise the time-interval [0, T] as Based on this partition, each simulated trajectory can be viewed as a collection of J + 1 triplets For each j we pass the time-state pair (t_j, X(t_j)) as input to the DNN-encoded matrix valued policy map to obtain . This allows us to compute iteratively as with . Here each is a R × 1 vector, is a R × K matrix and is a K × 1 vector. Following this scheme we can compute for the q-th simulated trajectory . With M such i.i.d. trajectories, the loss function (25) can be estimated as (29) Here, we made use of (23).

Remark 4.2 In the loss function (25) and its MC estimate (29), one could add a sparsity-promoting regularization term, in which case (25) would become (30)

Here, μ ≥ 0 is a penalty parameter and promotes sparsity in weights W_ℓ and biases b_ℓ comprising Φ. In the numerical experiments we report we did not use this device.

Remark 4.3 When the time-interval [0, T] is large, instead of using a single DNN to approximate the policy map , it may beneficial to employ multiple temporal DNNs that are uniformly distributed in the time-interval [0, T]. All these DNNs have the same structure, as shown in Fig 1. If N_T such DNNs are employed, then we use the m-th DNN to represent the policy map for t ∈ [(m − 1)δ, mδ) where m = 1, …, N_T and δ = T/N_T. Distributing DNNs across time would reduce the complexity of the policy map (as a function of time t) that is needed to be learned. This is helpful in scenarios where the stochastic dynamics has intricate temporal features, such as oscillations.

5 Examples

We now present four examples to illustrate our DeepCME method. All these examples are reaction networks with n species, denoted by X₁, …, X_n, and 2n reactions. By varying n, we shall investigate how the DeepCME method performs as the network gets larger and compare its performance with simulation based methods.

In all the examples, we assume that all the species have zero copy-numbers initially, and we consider two output functions g₁(x) = x_n and whose expectation is to be estimated under the probability distribution given by the CME solution at time T = 1. In other words, we shall use DeepCME to estimate the first two moments of the copy-number of species X_n at time T, viz. (31) We shall compare these estimates to those obtained by simulating 1000 CTMC trajectories with mNRM [6]. Our method DeepCME also yields estimates of the sensitivities of the estimated moments (31) w.r.t. all model parameters. We plot these estimates and compare them with the estimates obtained via the simulation-based Bernoulli Path Algorithm (BPA) [29]. These latter estimates are based on a sample of size 1000 and for each sample BPA requires generation of a certain number of auxiliary paths (see Section 2.3) which we set to be 10 in our examples.

In all the examples, we encode the policy map as a DNN with L = 2 hidden layers and N_H = 4 nodes per layer (see Fig 1), irrespective of the number of species n. The activation function for all hidden layer nodes is ReLU(x) (see (28)) and we choose r = 1 for the temporal features (26) to transform the time-values. For loss function computation, we partition the time-interval [0, T] into J = 50 equal size time-increments.

The neural network is trained with a training batch of M = 100 trajectories generated a priori with mNRM (see Section 4), and another such batch of M = 100 trajectories is used for validation. We display the loss function for the validation trajectories to track the training process. To facilitate comparison across network sizes, we normalise all the loss function trajectories to be one at the start of training. Note that the definition of our loss function (25) depends on certain threshold values Δ = (Δ₁, Δ₂). We choose these values as where (resp. ) denotes the sample mean (resp. standard deviation) of the values of the output function g_j for the trajectories in the training batch. Finally, to gauge the computational efficiency of DeepCME we compare the total central processing unit (CPU) time it requires (including the time to generate training and validation trajectories) to the total CPU time required by simulation-based approaches (mNRM and BPA) to estimate the expectations (31) and all its parameter sensitivities. All the computations were performed on the Euler computing cluster of ETH Zurich [39].

5.1 Independent birth death network

As our first example (see Fig 2A), we consider a network of n species that are all undergoing independent birth-death reactions We set k = 10 and γ = 1. The propensity functions obey mass-action kinetics and are hence affine functions of the state variable x.

Download:

Fig 2. Independent birth death network.

(A) Depicts the network with n species and 2n reactions with mass-action kinetics. (B) The CPU times are shown for DeepCME for different values of n (denoted as # species), and for comparison the time needed by simulation based methods (mNRM for function estimates and BPA for parameter sensitivities) with 1000 trajectories is also indicated. (C) Plots the validation loss function w.r.t. training steps for various n values. Panels (D-F) show estimates for the function values ( and ) at T = 1 and the parameter sensitivities. The estimates with simulation based methods are shown as 95% confidence intervals with 1000 samples.

https://doi.org/10.1371/journal.pcbi.1009623.g002

For n = 5, 10, 20 species, we apply DeepCME to this reaction network by training the neural network for 10′000 SGD iterations. Based on the trained neural network, we compute estimates of the first two moments (31) and their sensitivities to both the model parameters k and γ. We also estimate these quantities with simulation-based methods (mNRM and BPA) with 1000 samples, and since the propensity functions are linear we can compute these quantities exactly as well. In plots shown in Fig 2D, 2E and 2F, we compare the estimates from all these approaches for various values of n. Observe that DeepCME is in general quite accurate in estimating both the moments and their parametric sensitivities, but there are a few cases where the error is significant (e.g. sensitivity w.r.t. γ for and n = 20). These errors can in principle be reduced by employing a different neural network to encode the policy map. In our experience, these errors were also reduced in some cases by including a sparsity promoting term in the loss function (see Remark 4.2) but the result was highly sensitive to the relative weight (i.e. parameter μ in (30)) of this term.

The CPU time required by DeepCME and simulation-based methods for obtaining moment and sensitivity estimates are plotted in Fig 2B. Note that the CPU time for simulation-based methods grows linearly with the network size n, but for DeepCME this growth is sub-linear owing to the fixed structure of the underlying neural network. Despite this fixed structure, the validation loss function trajectories for DeepCME are similar for all n (see Fig 2C), indicating that the training process has low dependence on the number of species, probably because the species are evolving independently.

5.2 Linear signalling cascade

Our second example is a linear cascade with n-species (see Fig 3A), where species X_i catalyses the production of species X_i+1. The 2n reactions are given by We set β₀ = 10, k = 5 and γ = 1. As in the previous example, all the propensity functions obey mass-action kinetics and are hence affine functions of the state.

Download:

Fig 3. Linear signalling cascade.

(A) Depicts the network with n species and 2n reactions with mass-action kinetics. (B) The CPU times are shown for DeepCME for different values of n (denoted as # species), and for comparison the time needed by simulation based methods (mNRM for function estimates and BPA for parameter sensitivities) with 1000 trajectories is also indicated. (C) Plots the validation loss function w.r.t. training steps for various n values. Panels (D-F) show estimates for the function values ( and ) at T = 1 and the parameter sensitivities. The estimates with simulation based methods are shown as 95% confidence intervals with 1000 samples.

https://doi.org/10.1371/journal.pcbi.1009623.g003

For number of species n = 2, 5, 10, we apply DeepCME to this reaction network by training the neural network for 10′000 SGD iterations. Then we compute the moment estimates (31) and their sensitivities to all the model parameters. These quantities are also estimated with simulation-based methods (mNRM and BPA) with 1000 samples, and as with the previous example, the linearity of the propensity functions enables us to compute these quantities exactly as well. In the plots shown in Fig 3D, 3E and 3F, we compare the estimates from all these approaches for various values of n. Observe that DeepCME is accurate in estimating the moments but some of the parameter sensitivity estimates are not very accurate (e.g. sensitivity w.r.t. k for and n = 10). This is because the training process is not successful, as indicated by the validation loss function trajectories shown in Fig 3C. The CPU times for DeepCME and simulation-based methods are plotted in Fig 3B, and as expected they show sub-linear growth w.r.t. n for the former but linear growth for the latter.

Download:

Fig 4. Linear signalling cascade (continued).

In panels (A) and (B) we illustrate that the accuracy of the DeepCME estimates remains unaltered when the DNN shape parameters—N_H (number of nodes per layer) and L (number of hidden layers)—are doubled from their default values of N_H = 4 and L = 2. In panel (C) we illustrate that the accuracy of these estimates improves when the number of training trajectories is increased from M = 100 to M = 500.

https://doi.org/10.1371/journal.pcbi.1009623.g004

It is natural to ask if the accuracy of the estimates provided by DeepCME for n = 10 can be improved by making the DNN “deeper” (by increasing the number of hidden layers L) or “wider” (by increasing the number of nodes per layer N_H). We tested this by doubling each of these shape parameters, while keeping the other the same, and rerunning the DeepCME training procedure. As results in Fig 4A and 4B indicate, changing the DNN shape parameters did not improve the accuracy of the estimates. However we found that when we increase the number of training trajectories (see Fig 4C), the accuracy of the estimates does improve and this improvement is quite substantial in some cases (e.g. sensitivity w.r.t. γ for and n = 10).

5.3 Nonlinear signalling cascade

We now consider a variant of the network in the previous example where the catalytic production of species X_i+1 by species X_i is non-linear (see Fig 5A) and is given by a activating Hill propensity with a basal rate (32) where b = 1, k_m = 100, k₀ = 10 and H = 1. Other reactions have mass-action kinetics as in the previous example, with the same rate constants β₀ = 10 and γ = 1.

Download:

Fig 5. Nonlinear signalling cascade.

(A) Depicts the network with n species and 2n reactions. The reactions shown with red dashed-arrow have propensities given by a nonlinear activating Hill function (32). Other reactions have mass-action kinetics. (B) The CPU times are shown for DeepCME for different values of n (denoted as # species), and for comparison the time needed by simulation based methods (mNRM for function estimates and BPA for parameter sensitivities) with 1000 trajectories is also indicated. (C) Plots the validation loss function w.r.t. training steps for various n values. Panels (D-F) show estimates for the function values ( and ) at T = 1 and the parameter sensitivities (only the significant sensitivities are shown). The estimates with simulation based methods are shown as 95% confidence intervals.

https://doi.org/10.1371/journal.pcbi.1009623.g005

For number of species n = 2, 5, 10, we apply DeepCME to this reaction network by training the neural network for 10′000 SGD iterations. Then we compute the moment estimates (31) and their sensitivities to all the model parameters. These quantities are also estimated with simulation-based methods (mNRM and BPA) with 1000 samples, and unlike previous examples we cannot compute these quantities exactly due to nonlinear propensities. In the plots shown in Fig 5D, 5E and 5F, we compare the estimates from DeepCME and simulation-based approaches for various values of n. Observe that DeepCME is reasonably accurate in estimating the moments and their parametric sensitivities for all values of n. The success of the training process is shown by the validation loss function profiles in Fig 5C. Note that these loss functions increase monotonically with n and this is consistent with the observation that errors in DeepCME-estimated quantities increase with n (see Fig 5D, 5E and 5F). The CPU times for DeepCME and simulation-based methods are displayed in Fig 5B, and as in the previous examples they show sub-linear growth w.r.t. n for the former but linear growth for the latter.

5.4 Linear signalling cascade with feedback

Lastly we consider another variant of the network in the second example where there is negative feedback in the production of X₁ from species X_n (see Fig 6A) which is given by a repressing Hill function with a basal rate (33) where b = 1, k_m = 100, k₀ = 10 and H = 1. Other reactions have mass-action kinetics as in the second example, with the same rate constants k = 5 and γ = 1. Due to the presence of feedback, oscillations can arise in the dynamics and to better represent this temporal dependence of the policy map we encode it with N_T = 5 identical DNNs (see Remark 4.3).

Download:

Fig 6. Linear signalling cascade with feedback.

(A) Depicts the network with n species and 2n reactions. The reaction shown with red dashed-arrow has propensity given by a nonlinear repressing Hill function (33). All other reactions have mass-action kinetics. (B) The CPU times are shown for DeepCME for different values of n (denoted as # species), and for comparison the time needed by simulation based methods (mNRM for function estimates and BPA for parameter sensitivities) with 1000 trajectories is also indicated. (C) Plots the validation loss function w.r.t. training steps for various n values. Panels (D-F) show estimates for the function values ( and ) at T = 1 and the parameter sensitivities (only the significant sensitivities are shown). The estimates with simulation based methods are shown as 95% confidence intervals.

https://doi.org/10.1371/journal.pcbi.1009623.g006

For number of species n = 2, 5, 10, we apply DeepCME to this reaction network by training the neural network for 10′000 SGD iterations. Then we compute the moment estimates (31) and their sensitivities to all the model parameters, and we also estimate these quantities with simulation-based methods (mNRM and BPA) using 1000 samples. In the plots shown in Fig 6D, 6E and 6F), we compare the estimates from both these approaches for various values of n. Observe that DeepCME is quite accurate in estimating the moments for n = 2, 5 and the parametric sensitivities for only n = 2. For n = 5, 10 only the sensitivities for are accurate but the sensitivities for are not accurate with our neural network architecture. The validation loss function trajectories are shown in Fig 6C. The CPU times for DeepCME and simulation-based methods are plotted in Fig 6B, and they show a similar growth pattern as our earlier examples.

6 Conclusion

Over the past couple of decades, stochastic reaction network models have become increasingly popular as a modelling paradigm for noisy intracellular processes. Many consequential biological studies have experimentally highlighted the random dynamical fluctuations within living cells, and have employed such stochastic models to quantify the effects of this randomness in shaping the phenotype at both the population and the single-cell levels [40]. As experimental technologies continue to improve at a rapid pace, it is urgent to develop computational tools that are able to bring larger and more realistic systems within the scope of stochastic modelling and analysis.

The central object of interest in stochastic reaction network models is a high-dimensional system of linear ODEs, called the Chemical Master Equation (CME). Numerical solutions to the CME are difficult to obtain and commonly used simulation-based schemes to estimate the solutions often require an inordinate amount of computational time, even for moderately-sized networks. Inspired by the recent success of machine learning approaches in solving high-dimensional PDEs [13], our goal in this paper is to devise a similar strategy, based on deep reinforcement learning to numerically estimate solutions to CMEs. We develop such a method, called DeepCME, and we illustrate it with a number of examples. The neural network we train in DeepCME provides estimates for expectations based on the CME solution and in principle it also provides estimates for the sensitivities of these expectations w.r.t. all the model parameters without any extra effort. Such parametric sensitivities are important for many applications, such as evaluating a network’s robustness properties [41] or identifying its critical components [42], but they are even more difficult to estimate than solutions to the CME [23–31].

Our work opens up several directions for future research. The machine-learning based computational framework and the mathematical formulation which we provide allows one to deploy and transfer strong ML methodologies to the quantitative analysis and to data assimilation into complex CRNs. The present, basic approach can be improved/extended in a number of ways.

Firstly, it needs to be investigated how the architecture of the neural network can be optimally selected, for improved convergence of the training process, based on the CRN model. Overparametrized neural network architectures may be regularised by adding suitable weight-bias penalties in the loss-function. The resulting improved convergence will increase the accuracy of the estimates provided by DeepCME, especially for the parameter sensitivities, and reduce the number of trajectories needed for the neural network training.

Secondly, although the presently proposed framework requires relatively few ‘exact’ stochastic simulations of the dynamics, it could nevertheless be computationally prohibitive for many large biological networks, especially if they are multiscale in nature [43, 44]. It might be possible to improve efficiency by replacing exact simulations with τ-leaping simulations [20], multi-level schemes [21] or with simulations based on reduced models for multiscale networks [16, 43–45]. Incorporating such approaches for generating training trajectories would make our approach computationally feasible for much larger networks than those considered here. In particular, multi-level simulation schemes which are based on coupling techniques [21] would allow one to construct a lower variance estimator for the loss function (29). This could in turn benefit the accuracy and the convergence of the training process (see, e.g. [46] for the development of multilevel DNN training algorithms, albeit in another class of applications). In the context of multiscale networks, identifying the appropriate copy-number scalings that give rise to reduced models with simpler dynamics is a highly specialised task requiring careful theoretical analysis [16]. However our approach can be extended to “learn” these scaling factors during the training process by including them as trainable subnetworks into the ML feature space and employing them to scale the state-vectors in the input layer of the DNNs (see Fig 1). It is quite possible that incorporating these scaling factors would enhance the expressivity of the DNN.

Thirdly, the parameter sensitivities that we compute in our method could be employed in an ‘outer’ gradient descent method with the purpose of inferring model parameters by matching the computed statistics of CME solution with experimental data [47].

On the theoretical front, greater mathematical effort is required to understand how deep reinforcement-learning approaches can help in circumventing the curse of dimensionality inherent to CMEs. Alternative training approaches, such as Generative Adversarial Nets (GANs), may also be suitable for acceleration of the training process (see, e.g., [48]).

Finally, the architecture of the DNNs may include feature spaces comprising parametric dictionaries of motifs, which are adjusted during training to the reaction rates and to the kinetics of the CRN under consideration.

References

1. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic Gene Expression in a Single Cell. Science. 2002;297(5584):1183–1186. pmid:12183631
- View Article
- PubMed/NCBI
- Google Scholar
2. McAdams HH, Arkin A. Stochastic mechanisms in gene expression. Proc Natl Acad Sci, Biochemistry. 1997;94:814–819. pmid:9023339
- View Article
- PubMed/NCBI
- Google Scholar
3. Anderson DA, Kurtz TG. Continuous time Markov chain models for chemical reaction networks. In: Koeppl H, Setti G, di Bernardo M, Densmore D, editors. Design and Analysis of Biomolecular Circuits. Springer-Verlag; 2011.
4. van Kampen NG. A power series expansion of the master equation. Canadian Journal of Physics. 1961;39(4):551–567.
- View Article
- Google Scholar
5. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry. 1977;81(25):2340–2361.
- View Article
- Google Scholar
6. Anderson DF. A modified next reaction method for simulating chemical systems with time dependent propensities and delays. The Journal of chemical physics. 2007;127(21):214107. pmid:18067349
- View Article
- PubMed/NCBI
- Google Scholar
7. Altı ntan D, Koeppl H. Hybrid master equation for jump-diffusion approximation of biomolecular reaction networks. BIT. 2020;60(2):261–294.
- View Article
- Google Scholar
8. Hornung F, Jentzen A, Salimova D. Space-time deep neural network approximations for high-dimensional partial differential equations. Switzerland: Seminar for Applied Mathematics, ETH Zürich; 2020. 2020-35. Available from: https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2020/2020-35.pdf.
9. Munsky B, Khammash M. The finite state projection algorithm for the solution of the chemical master equation. Journal of Chemical Physics. 2006;124(4).
- View Article
- Google Scholar
10. MacNamara S, Burrage K, Sidje RB. Multiscale modeling of chemical kinetics via the master equation. Multiscale Modeling & Simulation. 2008;6(4):1146–1168.
- View Article
- Google Scholar
11. Kazeev V, Khammash M, Nip M, Schwab C. Direct Solution of the Chemical Master Equation Using Quantized Tensor Trains. PLoS Comput Biol. 2014;10(3):e1003359. pmid:24626049
- View Article
- PubMed/NCBI
- Google Scholar
12. Weinan E, Han J, Jentzen A. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics. 2017;5(4):349–380.
- View Article
- Google Scholar
13. Han J, Jentzen A, Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences. 2018;115(34):8505–8510. pmid:30082389
- View Article
- PubMed/NCBI
- Google Scholar
14. Hermann J, Schätzle Z, Noé F. Deep-neural-network solution of the electronic Schrödinger equation. Nature Chemistry. 2020;12(10):891–897. pmid:32968231
- View Article
- PubMed/NCBI
- Google Scholar
15. Kurtz TG. Strong approximation theorems for density dependent Markov chains. Stochastic Processes Appl. 1977/78;6(3):223–240.
- View Article
- Google Scholar
16. Kang HW, Kurtz TG. Separation of Time-Scales and Model Reduction for Stochastic Reaction Networks. Ann Appl Probab. 2013;23(2):529–583.
- View Article
- Google Scholar
17. Cohen N, Sharir O, Shashua A. On the Expressive Power of Deep Learning: A Tensor Analysis. 29th Annual Conference on Learning Theory (COLT). 2016;.
18. Gupta A, Mikelson J, Khammash M. A finite state projection algorithm for the stationary solution of the chemical master equation. The Journal of Chemical Physics. 2017;147(15):154101. pmid:29055349
- View Article
- PubMed/NCBI
- Google Scholar
19. Gillespie DT. Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics. 2001;115(4):1716–1733.
- View Article
- Google Scholar
20. Cao Y, Gillespie DT, Petzold LR. Efficient step size selection for the tau-leaping simulation method. The Journal of Chemical Physics. 2006;124(4). pmid:16460151
- View Article
- PubMed/NCBI
- Google Scholar
21. Anderson DF, Higham DJ. Multilevel Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics. Multiscale Modeling & Simulation. 2012;10(1):146–179.
- View Article
- Google Scholar
22. Ethier SN, Kurtz TG. Markov processes: Characterization and Convergence. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. New York: John Wiley & Sons Inc.; 1986.
23. Anderson D. An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains. SIAM: Journal on Numerical Analysis. 2012;50.
- View Article
- Google Scholar
24. Rathinam M, Sheppard PW, Khammash M. Efficient computation of parameter sensitivities of discrete stochastic chemical reaction networks. Journal of Chemical Physics. 2010;132. pmid:20095724
- View Article
- PubMed/NCBI
- Google Scholar
25. Sheppard PW, Rathinam M, Khammash M. A pathwise derivative approach to the computation of parameter sensitivities in discrete stochastic chemical systems. Journal of Chemical Physics. 2012;136. pmid:22280752
- View Article
- PubMed/NCBI
- Google Scholar
26. Plyasunov S, Arkin AP. Efficient stochastic sensitivity analysis of discrete event systems. Journal of Computational Physics. 2007;221:724–738.
- View Article
- Google Scholar
27. Gupta A, Khammash M. Unbiased Estimation of Parameter Sensitivities for Stochastic Chemical Reaction Networks. SIAM Journal on Scientific Computing. 2013;35(6).
- View Article
- Google Scholar
28. Gupta A, Khammash M. An efficient and unbiased method for sensitivity analysis of stochastic reaction networks. Journal of The Royal Society Interface. 2014;11(101). pmid:25354975
- View Article
- PubMed/NCBI
- Google Scholar
29. Gupta A, Rathinam M, Khammash M. Estimation of parameter sensitivities for stochastic reaction networks using tau-leap simulations. SIAM Journal on Numerical Analysis. 2018;56(2):1134–1167.
- View Article
- Google Scholar
30. Gupta A, Khammash M. Sensitivity analysis for stochastic chemical reaction networks with multiple time-scales. Electron J Probab. 2014;.
- View Article
- Google Scholar
31. Gupta A, Khammash M. Sensitivity analysis for multiscale stochastic reaction networks using hybrid approximations. Bulletin of Mathematical Biology. 2019;81(8):3121–3158. pmid:30302636
- View Article
- PubMed/NCBI
- Google Scholar
32. Norris JR. Markov chains. vol. 2 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press; 1998.
33. Anderson DF, Kurtz TG. Stochastic analysis of biochemical systems. Springer; 2015.
34. Gupta A, Briat C, Khammash M. A Scalable Computational Framework for Establishing Long-Term Behavior of Stochastic Reaction Networks. PLoS Comput Biol. 2014;10(6):e1003669. pmid:24968191
- View Article
- PubMed/NCBI
- Google Scholar
35. Rathinam M. Moment growth bounds on continuous time Markov processes on non-negative integer lattices. Quarterly of Applied Mathematics. 2015; p. 347–364.
- View Article
- Google Scholar
36. Engblom S. On the stability of stochastic jump kinetics. arXiv preprint arXiv:12023892. 2012;.
37. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. Tensorflow: A system for large-scale machine learning. In: 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16); 2016. p. 265–283.
38. Yazdani A, Lu L, Raissi M, Karniadakis GE. Systems biology informed deep learning for inferring parameters and hidden dynamics. PLoS computational biology. 2020;16(11):e1007575. pmid:33206658
- View Article
- PubMed/NCBI
- Google Scholar
39. Euler Cluster Specifications;. https://scicomp.ethz.ch/wiki/Euler.
40. Eldar A, Elowitz MB. Functional roles for noise in genetic circuits. Nature. 2010;467(7312):167–173. pmid:20829787
- View Article
- PubMed/NCBI
- Google Scholar
41. Stelling J, Gilles ED, Doyle FJ. Robustness properties of circadian clock architectures. Proceedings of the National Academy of Sciences. 2004;101(36):13210–13215. pmid:15340155
- View Article
- PubMed/NCBI
- Google Scholar
42. Feng Xj, Hooshangi S, Chen D, Li G, Weiss R, Rabitz H. Optimizing genetic circuits by global sensitivity analysis. Biophysical journal. 2004;87(4):2195–2202. pmid:15454422
- View Article
- PubMed/NCBI
- Google Scholar
43. Cao Y, Gillespie DT, Petzold LR. The slow-scale stochastic simulation algorithm. Journal of Chemical Physics. 2005;122(1):1–18. pmid:15638651
- View Article
- PubMed/NCBI
- Google Scholar
44. E W, Liu D, Vanden-Eijnden E. Nested stochastic simulation algorithms for chemical kinetic systems with multiple time scales. J Comput Phys. 2007;221(1):158–180.
- View Article
- Google Scholar
45. Hepp B, Gupta A, Khammash M. Adaptive hybrid simulations for multiscale stochastic reaction networks. The Journal of chemical physics. 2015;142(3):034118. pmid:25612700
- View Article
- PubMed/NCBI
- Google Scholar
46. Lye K, Mishra S, Molinaro R. A Multi-level procedure for enhancing accuracy of machine learning algorithms. European Journal of Applied Mathematics. 2020;.
- View Article
- Google Scholar
47. Munsky B, Trinh B, Khammash M. Listening to the noise: random fluctuations reveal gene network parameters. Molecular systems biology. 2009;5(1):318. pmid:19888213
- View Article
- PubMed/NCBI
- Google Scholar
48. Francesca Cairoli and Ginevra Carbone and Luca Bortolussi Abstraction of Markov Population Dynamics via Generative Adversarial Nets. arXiv 2106.12981.

[ref1] 1. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic Gene Expression in a Single Cell. Science. 2002;297(5584):1183–1186. pmid:12183631
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. McAdams HH, Arkin A. Stochastic mechanisms in gene expression. Proc Natl Acad Sci, Biochemistry. 1997;94:814–819. pmid:9023339
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Anderson DA, Kurtz TG. Continuous time Markov chain models for chemical reaction networks. In: Koeppl H, Setti G, di Bernardo M, Densmore D, editors. Design and Analysis of Biomolecular Circuits. Springer-Verlag; 2011.

[ref4] 4. van Kampen NG. A power series expansion of the master equation. Canadian Journal of Physics. 1961;39(4):551–567.
View Article
Google Scholar

[11] View Article

[12] Google Scholar

[ref5] 5. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry. 1977;81(25):2340–2361.
View Article
Google Scholar

[14] View Article

[15] Google Scholar

[ref6] 6. Anderson DF. A modified next reaction method for simulating chemical systems with time dependent propensities and delays. The Journal of chemical physics. 2007;127(21):214107. pmid:18067349
View Article
PubMed/NCBI
Google Scholar

[17] View Article

[18] PubMed/NCBI

[19] Google Scholar

[ref7] 7. Altı ntan D, Koeppl H. Hybrid master equation for jump-diffusion approximation of biomolecular reaction networks. BIT. 2020;60(2):261–294.
View Article
Google Scholar

[21] View Article

[22] Google Scholar

[ref8] 8. Hornung F, Jentzen A, Salimova D. Space-time deep neural network approximations for high-dimensional partial differential equations. Switzerland: Seminar for Applied Mathematics, ETH Zürich; 2020. 2020-35. Available from: https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2020/2020-35.pdf.

[ref9] 9. Munsky B, Khammash M. The finite state projection algorithm for the solution of the chemical master equation. Journal of Chemical Physics. 2006;124(4).
View Article
Google Scholar

[25] View Article

[26] Google Scholar

[ref10] 10. MacNamara S, Burrage K, Sidje RB. Multiscale modeling of chemical kinetics via the master equation. Multiscale Modeling & Simulation. 2008;6(4):1146–1168.
View Article
Google Scholar

[28] View Article

[29] Google Scholar

[ref11] 11. Kazeev V, Khammash M, Nip M, Schwab C. Direct Solution of the Chemical Master Equation Using Quantized Tensor Trains. PLoS Comput Biol. 2014;10(3):e1003359. pmid:24626049
View Article
PubMed/NCBI
Google Scholar

[31] View Article

[32] PubMed/NCBI

[33] Google Scholar

[ref12] 12. Weinan E, Han J, Jentzen A. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics. 2017;5(4):349–380.
View Article
Google Scholar

[35] View Article

[36] Google Scholar

[ref13] 13. Han J, Jentzen A, Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences. 2018;115(34):8505–8510. pmid:30082389
View Article
PubMed/NCBI
Google Scholar

[38] View Article

[39] PubMed/NCBI

[40] Google Scholar

[ref14] 14. Hermann J, Schätzle Z, Noé F. Deep-neural-network solution of the electronic Schrödinger equation. Nature Chemistry. 2020;12(10):891–897. pmid:32968231
View Article
PubMed/NCBI
Google Scholar

[42] View Article

[43] PubMed/NCBI

[44] Google Scholar

[ref15] 15. Kurtz TG. Strong approximation theorems for density dependent Markov chains. Stochastic Processes Appl. 1977/78;6(3):223–240.
View Article
Google Scholar

[46] View Article

[47] Google Scholar

[ref16] 16. Kang HW, Kurtz TG. Separation of Time-Scales and Model Reduction for Stochastic Reaction Networks. Ann Appl Probab. 2013;23(2):529–583.
View Article
Google Scholar

[49] View Article

[50] Google Scholar

[ref17] 17. Cohen N, Sharir O, Shashua A. On the Expressive Power of Deep Learning: A Tensor Analysis. 29th Annual Conference on Learning Theory (COLT). 2016;.

[ref18] 18. Gupta A, Mikelson J, Khammash M. A finite state projection algorithm for the stationary solution of the chemical master equation. The Journal of Chemical Physics. 2017;147(15):154101. pmid:29055349
View Article
PubMed/NCBI
Google Scholar

[53] View Article

[54] PubMed/NCBI

[55] Google Scholar

[ref19] 19. Gillespie DT. Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics. 2001;115(4):1716–1733.
View Article
Google Scholar

[57] View Article

[58] Google Scholar

[ref20] 20. Cao Y, Gillespie DT, Petzold LR. Efficient step size selection for the tau-leaping simulation method. The Journal of Chemical Physics. 2006;124(4). pmid:16460151
View Article
PubMed/NCBI
Google Scholar

[60] View Article

[61] PubMed/NCBI

[62] Google Scholar

[ref21] 21. Anderson DF, Higham DJ. Multilevel Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics. Multiscale Modeling & Simulation. 2012;10(1):146–179.
View Article
Google Scholar

[64] View Article

[65] Google Scholar

[ref22] 22. Ethier SN, Kurtz TG. Markov processes: Characterization and Convergence. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. New York: John Wiley & Sons Inc.; 1986.

[ref23] 23. Anderson D. An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains. SIAM: Journal on Numerical Analysis. 2012;50.
View Article
Google Scholar

[68] View Article

[69] Google Scholar

[ref24] 24. Rathinam M, Sheppard PW, Khammash M. Efficient computation of parameter sensitivities of discrete stochastic chemical reaction networks. Journal of Chemical Physics. 2010;132. pmid:20095724
View Article
PubMed/NCBI
Google Scholar

[71] View Article

[72] PubMed/NCBI

[73] Google Scholar

[ref25] 25. Sheppard PW, Rathinam M, Khammash M. A pathwise derivative approach to the computation of parameter sensitivities in discrete stochastic chemical systems. Journal of Chemical Physics. 2012;136. pmid:22280752
View Article
PubMed/NCBI
Google Scholar

[75] View Article

[76] PubMed/NCBI

[77] Google Scholar

[ref26] 26. Plyasunov S, Arkin AP. Efficient stochastic sensitivity analysis of discrete event systems. Journal of Computational Physics. 2007;221:724–738.
View Article
Google Scholar

[79] View Article

[80] Google Scholar

[ref27] 27. Gupta A, Khammash M. Unbiased Estimation of Parameter Sensitivities for Stochastic Chemical Reaction Networks. SIAM Journal on Scientific Computing. 2013;35(6).
View Article
Google Scholar

[82] View Article

[83] Google Scholar

[ref28] 28. Gupta A, Khammash M. An efficient and unbiased method for sensitivity analysis of stochastic reaction networks. Journal of The Royal Society Interface. 2014;11(101). pmid:25354975
View Article
PubMed/NCBI
Google Scholar

[85] View Article

[86] PubMed/NCBI

[87] Google Scholar

[ref29] 29. Gupta A, Rathinam M, Khammash M. Estimation of parameter sensitivities for stochastic reaction networks using tau-leap simulations. SIAM Journal on Numerical Analysis. 2018;56(2):1134–1167.
View Article
Google Scholar

[89] View Article

[90] Google Scholar

[ref30] 30. Gupta A, Khammash M. Sensitivity analysis for stochastic chemical reaction networks with multiple time-scales. Electron J Probab. 2014;.
View Article
Google Scholar

[92] View Article

[93] Google Scholar

[ref31] 31. Gupta A, Khammash M. Sensitivity analysis for multiscale stochastic reaction networks using hybrid approximations. Bulletin of Mathematical Biology. 2019;81(8):3121–3158. pmid:30302636
View Article
PubMed/NCBI
Google Scholar

[95] View Article

[96] PubMed/NCBI

[97] Google Scholar

[ref32] 32. Norris JR. Markov chains. vol. 2 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press; 1998.

[ref33] 33. Anderson DF, Kurtz TG. Stochastic analysis of biochemical systems. Springer; 2015.

[ref34] 34. Gupta A, Briat C, Khammash M. A Scalable Computational Framework for Establishing Long-Term Behavior of Stochastic Reaction Networks. PLoS Comput Biol. 2014;10(6):e1003669. pmid:24968191
View Article
PubMed/NCBI
Google Scholar

[101] View Article

[102] PubMed/NCBI

[103] Google Scholar

[ref35] 35. Rathinam M. Moment growth bounds on continuous time Markov processes on non-negative integer lattices. Quarterly of Applied Mathematics. 2015; p. 347–364.
View Article
Google Scholar

[105] View Article

[106] Google Scholar

[ref36] 36. Engblom S. On the stability of stochastic jump kinetics. arXiv preprint arXiv:12023892. 2012;.

[ref37] 37. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. Tensorflow: A system for large-scale machine learning. In: 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16); 2016. p. 265–283.

[ref38] 38. Yazdani A, Lu L, Raissi M, Karniadakis GE. Systems biology informed deep learning for inferring parameters and hidden dynamics. PLoS computational biology. 2020;16(11):e1007575. pmid:33206658
View Article
PubMed/NCBI
Google Scholar

[110] View Article

[111] PubMed/NCBI

[112] Google Scholar

[ref39] 39. Euler Cluster Specifications;. https://scicomp.ethz.ch/wiki/Euler.

[ref40] 40. Eldar A, Elowitz MB. Functional roles for noise in genetic circuits. Nature. 2010;467(7312):167–173. pmid:20829787
View Article
PubMed/NCBI
Google Scholar

[115] View Article

[116] PubMed/NCBI

[117] Google Scholar

[ref41] 41. Stelling J, Gilles ED, Doyle FJ. Robustness properties of circadian clock architectures. Proceedings of the National Academy of Sciences. 2004;101(36):13210–13215. pmid:15340155
View Article
PubMed/NCBI
Google Scholar

[119] View Article

[120] PubMed/NCBI

[121] Google Scholar

[ref42] 42. Feng Xj, Hooshangi S, Chen D, Li G, Weiss R, Rabitz H. Optimizing genetic circuits by global sensitivity analysis. Biophysical journal. 2004;87(4):2195–2202. pmid:15454422
View Article
PubMed/NCBI
Google Scholar

[123] View Article

[124] PubMed/NCBI

[125] Google Scholar

[ref43] 43. Cao Y, Gillespie DT, Petzold LR. The slow-scale stochastic simulation algorithm. Journal of Chemical Physics. 2005;122(1):1–18. pmid:15638651
View Article
PubMed/NCBI
Google Scholar

[127] View Article

[128] PubMed/NCBI

[129] Google Scholar

[ref44] 44. E W, Liu D, Vanden-Eijnden E. Nested stochastic simulation algorithms for chemical kinetic systems with multiple time scales. J Comput Phys. 2007;221(1):158–180.
View Article
Google Scholar

[131] View Article

[132] Google Scholar

[ref45] 45. Hepp B, Gupta A, Khammash M. Adaptive hybrid simulations for multiscale stochastic reaction networks. The Journal of chemical physics. 2015;142(3):034118. pmid:25612700
View Article
PubMed/NCBI
Google Scholar

[134] View Article

[135] PubMed/NCBI

[136] Google Scholar

[ref46] 46. Lye K, Mishra S, Molinaro R. A Multi-level procedure for enhancing accuracy of machine learning algorithms. European Journal of Applied Mathematics. 2020;.
View Article
Google Scholar

[138] View Article

[139] Google Scholar

[ref47] 47. Munsky B, Trinh B, Khammash M. Listening to the noise: random fluctuations reveal gene network parameters. Molecular systems biology. 2009;5(1):318. pmid:19888213
View Article
PubMed/NCBI
Google Scholar

[141] View Article

[142] PubMed/NCBI

[143] Google Scholar

[ref48] 48. Francesca Cairoli and Ginevra Carbone and Luca Bortolussi Abstraction of Markov Population Dynamics via Generative Adversarial Nets. arXiv 2106.12981.

Figures

Abstract

Author summary

1 Introduction

2 Preliminaries

2.1 The stochastic model

2.2 Kolmogorov’s forward and backward equations

2.3 Parametric sensitivity analysis

3 Main results

4 DeepCME: Deep learning formulation for CME

4.1 DNN encoding of the policy map

4.2 Loss function computation based on the training data

5 Examples

5.1 Independent birth death network

5.2 Linear signalling cascade

5.3 Nonlinear signalling cascade

5.4 Linear signalling cascade with feedback

6 Conclusion

References