Elsevier

Journal of Theoretical Biology

Volume 242, Issue 1, 7 September 2006, Pages 101-116

A stochastic model of gene transcription: An application to L1 retrotransposition events

https://doi.org/10.1016/j.jtbi.2006.02.010

Abstract

A simplified mathematical model of gene transcription is presented, based on a system of coupled chemical reactions and a corresponding set of stochastic equations similar to those used in enzyme kinetics theory. The quasi-stationary distribution for the model is derived, and its usefulness is illustrated with an example of model parameter estimation using sparse time course data on L1 retrotransposon expression kinetics. The issue of model validation is also discussed, and a simple validation procedure for the estimated model is devised. The procedure compares model-predicted values with the laboratory data using standard Bayesian techniques and modern Markov chain Monte Carlo methodology.

Introduction

Modeling gene transcription events is important because the transcription process is a key component of the regulatory pathways supporting most of an organism's biological functions. These pathways are sets of feedforward and feedback loops designed to respond to a stimulus and/or sustain homeostasis. Occasionally, however, these interacting networks permit aberrant behavior that manifests itself as a phenotypic shift that may prove harmful, and ultimately lethal, to the organism. A better understanding of control mechanisms such as transcription may provide valuable insights into the complexity of biological systems in health and disease. One possible way of analysing the transcription process is to set it in the general framework of systems biology, which emphasizes a kinetic system characterization of biological regulatory networks. A regulatory network or, more generally, a reaction network is a chemical system involving multiple reactions and chemical species. Such a network view of biological organization aims to draw on mathematical methods developed in the context of dynamical systems and computational theories, since regulatory networks are dynamic and stochastic in nature. For some recent examples of such modeling for specific regulatory systems, as well as further references, see, e.g., Reddy and Yin, 1999, Rao and Arkin, 2003, McAdams and Arkin, 1997. The stochastic character of bio-networks arises from the fact that, when modeling chemical reactions within biological cells, the number of molecules involved, at least for some of the species, is often too small for a deterministic model to provide a good representation of the behavior of the system. This is in contrast with the setting of classical chemistry, where the amounts of reactants are typically so large that stochastic fluctuations are irrelevant and reaction networks are modeled with systems of ordinary differential equations.

In the current article, our first goal is to develop a simplified formal stochastic model of gene transcription. The model could be of interest in its own right or as one of the modules of a larger reaction network. For instance, in the case of the L1 retrotransposon gene discussed below, the process of interest would typically be reverse transcription rather than transcription itself. However, in the relevant reaction network the former reaction is coupled with the latter, and thus the transcription model is also relevant in the context of modeling the complex regulatory system of the L1 retrotransposon (cf. e.g., Lu and Ramos, 2003, Lu and Ramos, 1998).

Our second goal is to introduce some new procedures for model parameter estimation and general model validation. By the latter term we mean the statistical comparison of the output from the created stochastic model with the available laboratory data. Such data are often time dependent and possibly sparse and/or incomplete (e.g., they may contain missing values). Surprisingly, relatively little attention has so far been devoted in the literature to the issues of parameter estimation and validation for network models (see, e.g., Bower and Bolouri, 2001; Kitano, 2001, for a review). In most cases the techniques developed so far have been based on deterministic assumptions which, in general, do not apply to the stochastic kinetic models that we are concerned with. On the other hand, current progress in Bayesian stochastic simulation methodology allows, in principle, direct inference about the parameters of any fully specified model, taking account of prior information about parameter values in the form of probability distributions. Herein we propose to utilize these modern Bayesian methods, specializing them to our gene transcription model. We note, however, that many of the ideas may also be extended to other similar systems with quasi-steady-state distributions.

Throughout the article we find it convenient to use data on the L1 retrotransposon to illustrate some of the conceptual and numerical issues discussed. The L1 retrotransposon is the only active autonomous retrotransposon in mammals (see, e.g., Moran et al., 1996, Sassaman et al., 1997, Borgen et al., 1973). The full-length mammalian L1 is approximately 6000 base-pairs (bp) long and is composed of a 5′ untranslated region bearing an internal promoter, two open reading frames separated by an intergenic region, and a 3′ untranslated region containing a poly-A tail. In the L1 system, transcription of the L1 gene is greatly accelerated by the availability of proteins that interact with the regulatory region of the gene as well as proteins that form the pre-initiation complex (Sassaman et al., 1997). The identity of these proteins continues to be debated, but experimental evidence suggests that the rate of L1 transcription is directly proportional to the number of proteins recruited to the regulatory region (Lu and Ramos, 2003). Additionally, it has been found (see, e.g., Sassaman et al., 1997, Lu and Ramos, 2003, Kerzee and Ramos, 2000, Lu et al., 2000) that the carcinogen benzo(a)pyrene (BaP), once metabolized, greatly increases the rate at which the transcription regulatory proteins are recruited, and hence increases the rate of transcription. In the current work, we use time course data acquired from an experiment in which, following addition of the carcinogen, the concentration of L1 mRNA was measured at three time points in isogenic cells derived from a mouse. Herein we use the data coming from the Northern blot experiment (see, e.g., Rosenthal et al., 1998, for more details on Northern blot).
Another typical example of a data-producing technology available for model inference would be real-time polymerase chain reaction (PCR) for measurement of gene expression, or possibly real-time imaging of gene expression, for instance using microscope digital imaging systems which employ a fluorescent reporter mechanism together with the principles of fluorescence resonance energy transfer (FRET). For a general reference on real-time PCR, see Wong and Medrano (2005); for details of FRET and its application to the monitoring of gene expression, see Lakowicz (1983) and Zlokarnik et al. (1998).

The paper is organized as follows. In the next section (Section 2) we give a brief description of a fairly generic transcription model via a system of coupled chemical reactions. To illustrate our particular interest in the transcription process, we also briefly discuss these reactions in relation to the model of reverse transcription of the L1 retrotransposon. For the benefit of readers less familiar with the topic, in Section 3 we give a brief overview of the general theory of mathematical models for stochastic biochemical kinetic networks. In Section 4, we discuss formal methods for approximating these general stochastic systems and show that, in the limit of large system volume, our set of stochastic equations reduces to the set of deterministic rate equations well known in the literature or (under a different scaling) to the set of so-called Langevin equations. In Section 5, we specialize our discussion to the transcription model and show that under our model assumptions the system attains an explicit (quasi) equilibrium distribution. The result is utilized in Section 6, where we discuss the issue of making inferences about the relevant system parameters in the presence of incomplete and very sparse time course data from the L1 retrotransposon experiment. We also review therein different statistical methods of inference about the system parameters, arguing for the appropriateness of Bayesian models in our specific context. In Section 7, we introduce a general algorithm for a simulation-based model validation procedure and illustrate it with a numerical example using our L1 data. We conclude with a summary and some discussion in Section 8. The relevant mathematical derivations are presented in the Appendix.

Section snippets

Gene transcription model

Transcription is a physically complex process in which proteins (TProt) bind to the regulatory region of target genes to induce conformational changes that allow for unwinding of the DNA double helix and facilitate recruitment of proteins involved in the formation of a pre-initiation complex. These physical changes are designed to allow RNA polymerase (RNAP) to find specific sites among the billions of bases of genomic DNA (gDNA) to initiate transcription. The transcription of information coded

Mathematical models of reaction kinetics

Assume that our reaction network is a system which consists of $K$ chemical species $\{s_1,\ldots,s_K\}$ interacting through $M$ chemical reaction channels $\{r_1,\ldots,r_M\}$. The system is assumed to be well stirred, confined to a constant volume $N$, and to be in thermal (but not chemical) equilibrium at some constant temperature. In the case of the simplest stochastic model for such a network the system is treated as a continuous time Markov chain whose state is a vector $X(t)=(X_1(t),\ldots,X_K(t))$ giving the number of
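To make the continuous time Markov chain description concrete, the following is a minimal sketch of a Gillespie-type stochastic simulation for a small two-species network. The four reactions, the rate constants `k1` to `k4`, and the function name `gillespie` are illustrative assumptions chosen to resemble a TProt/mRNA system; they are not taken from the paper's model (3.6).

```python
import random

# Hypothetical network (an assumption for illustration, not the paper's model):
#   0     -> TProt          propensity k1
#   TProt -> 0              propensity k2 * TProt
#   TProt -> TProt + mRNA   propensity k3 * TProt
#   mRNA  -> 0              propensity k4 * mRNA
STATE_CHANGES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def gillespie(x0, rates, t_end, rng):
    """Simulate one trajectory; returns a list of (time, (TProt, mRNA))."""
    k1, k2, k3, k4 = rates
    t, x = 0.0, list(x0)
    path = [(t, tuple(x))]
    while True:
        propensities = [k1, k2 * x[0], k3 * x[0], k4 * x[1]]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_end:
            break
        # pick the firing channel with probability proportional to its propensity
        k = rng.choices(range(4), weights=propensities)[0]
        x[0] += STATE_CHANGES[k][0]
        x[1] += STATE_CHANGES[k][1]
        path.append((t, tuple(x)))
    return path


if __name__ == "__main__":
    trajectory = gillespie((0, 0), (10.0, 1.0, 2.0, 0.5), 50.0, random.Random(1))
    print("events:", len(trajectory) - 1, "final state:", trajectory[-1][1])
```

Each step draws an exponential waiting time from the total propensity and then selects one channel, which is the standard construction for a jump Markov chain of this type.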

Reaction rate equation and chemical Langevin equation

The stochastic system with trajectories given by (3.4) may sometimes be approximated either by a somewhat simpler stochastic system or by a purely deterministic one. We now briefly discuss both of these approaches. We first consider the general setting of (3.4) and then specialize the discussion to the gene transcription model (3.6).

If $N$ is the volume times Avogadro's number and $x$ gives the number of molecules of each species present, then $c = N^{-1}x$ gives the concentrations in moles per unit
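As an illustration of the deterministic approximation, the sketch below converts molecule counts to concentrations via $c = N^{-1}x$ and integrates a reaction rate equation with a simple forward Euler scheme. The two-species linear system, the constants `kappa`, and the helper names are assumptions made for the example and are not the paper's equations.

```python
def to_concentration(x, n):
    """Scale molecule counts x by the system size n, i.e. c = x / n."""
    return tuple(xi / n for xi in x)


def rre_step(c, kappa, dt):
    """One forward Euler step of a hypothetical linear rate equation:
         dc1/dt = kappa1 - kappa2 * c1          (TProt)
         dc2/dt = kappa3 * c1 - kappa4 * c2     (mRNA)
    """
    k1, k2, k3, k4 = kappa
    c1, c2 = c
    return (c1 + dt * (k1 - k2 * c1), c2 + dt * (k3 * c1 - k4 * c2))


def integrate(c0, kappa, t_end, dt=1e-3):
    """Integrate from concentration c0 up to time t_end."""
    c = c0
    for _ in range(int(round(t_end / dt))):
        c = rre_step(c, kappa, dt)
    return c


if __name__ == "__main__":
    kappa = (1.0, 0.5, 2.0, 1.0)
    print("start:", to_concentration((0, 0), 100))
    print("long-run concentrations:", integrate((0.0, 0.0), kappa, 40.0))
```

For this assumed system the deterministic steady state is $c_1 = \kappa_1/\kappa_2$ and $c_2 = \kappa_1\kappa_3/(\kappa_2\kappa_4)$, which the numerical solution approaches for large times.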

Stationary and quasi-stationary distribution

Setting in (3.2)
$$\frac{\partial}{\partial t}P(x,t\mid x_0)=0$$
and solving the resulting equation
$$\sum_{k=1}^{M}\left[\lambda_k(x-\nu_k'+\nu_k)\,P(x-\nu_k'+\nu_k,t\mid x_0)-\lambda_k(x)\,P(x,t\mid x_0)\right]=0$$
gives the steady state or stationary distribution of the stochastic kinetic system. The corresponding steady state of the deterministic approximation (i.e., the reaction rate equation) is achieved when the condition
$$\frac{d}{dt}C(t)=0$$
is satisfied.
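The stationarity condition can be checked numerically in a case where the answer is known in closed form. The sketch below assumes a single-species birth-death system (production at constant rate `a`, degradation at rate `b * x`, net changes of +1 and -1), for which the stationary distribution is Poisson with mean `a / b`. This toy system and the function names are assumptions for illustration, not the paper's transcription model.

```python
import math


def poisson_pmf(x, mean):
    """Poisson probability mass at x (zero for negative x)."""
    if x < 0:
        return 0.0
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))


def stationary_residual(p, x, a, b):
    """Probability inflow minus outflow at state x for the birth-death chain.

    Inflow:  birth from x - 1 (rate a) and death from x + 1 (rate b * (x + 1)).
    Outflow: birth or death out of x (rate a + b * x).
    A stationary distribution makes this residual vanish for every x.
    """
    inflow = a * p(x - 1) + b * (x + 1) * p(x + 1)
    outflow = (a + b * x) * p(x)
    return inflow - outflow


if __name__ == "__main__":
    a, b = 12.0, 0.8
    good = max(abs(stationary_residual(lambda x: poisson_pmf(x, a / b), x, a, b))
               for x in range(60))
    bad = max(abs(stationary_residual(lambda x: poisson_pmf(x, 10.0), x, a, b))
              for x in range(60))
    print("residual with mean a/b:", good)
    print("residual with a wrong mean:", bad)
```

The residual is zero up to rounding error only when the candidate distribution has mean `a / b`, which mirrors how an explicit stationary distribution is verified against the balance equation.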

Even in moderately complicated stochastic kinetic systems the direct analytic solution of Eq. (5.1) is typically intractable and thus the

Model parameters estimation with sparse data

We shall discuss model parameter estimation using the example of L1 data from the Northern blot experiment to examine the influence of an external factor (BaP) on the system reaction constants. In its simplest version, the above question of BaP influence on the system may be translated into a question about the relationship between BaP levels and the corresponding value of the constant κ3. The question is relevant, for instance, in the context of modeling the relationship between the levels of carcinogen
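A minimal sketch of Bayesian estimation of a single rate constant from very sparse data is given below, using a random-walk Metropolis sampler on the log scale. Everything specific here is an assumption made for illustration: the three counts are made-up numbers (not the paper's Northern blot data), the observations are taken to be Poisson with mean `scale * kappa`, and the prior is Gamma with shape `a0` and rate `b0`. This is not the authors' estimation procedure.

```python
import math
import random


def log_posterior(kappa, data, scale, a0, b0):
    """Unnormalized log posterior: Gamma(a0, b0) prior, Poisson(scale * kappa) data."""
    lp = (a0 - 1.0) * math.log(kappa) - b0 * kappa
    for y in data:
        lp += y * math.log(scale * kappa) - scale * kappa
    return lp


def metropolis(data, scale, a0, b0, n_iter, step, rng):
    """Random-walk Metropolis on phi = log(kappa); returns draws of kappa."""
    phi = 0.0
    # the extra "+ phi" is the log Jacobian of the change of variables
    current = log_posterior(math.exp(phi), data, scale, a0, b0) + phi
    draws = []
    for _ in range(n_iter):
        proposal = phi + rng.gauss(0.0, step)
        candidate = log_posterior(math.exp(proposal), data, scale, a0, b0) + proposal
        if rng.random() < math.exp(min(0.0, candidate - current)):
            phi, current = proposal, candidate
        draws.append(math.exp(phi))
    return draws


if __name__ == "__main__":
    synthetic_counts = [18, 25, 31]  # made-up values for illustration only
    draws = metropolis(synthetic_counts, 10.0, 2.0, 1.0, 20000, 0.3, random.Random(0))
    kept = draws[2000:]  # discard burn-in
    print("posterior mean of kappa:", sum(kept) / len(kept))
```

Because this assumed prior and likelihood are conjugate, the exact posterior is Gamma with shape `a0 + sum(data)` and rate `b0 + len(data) * scale`, which gives a convenient check on the sampler before it is applied to a model without a closed-form posterior.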

Model validation for L1 retrotransposon data

In the previous sections, we have proposed a simple model of transcription and suggested an approach to model parameter estimation in cases when, as in the L1 retrotransposon experiment discussed, only limited experimental data are available. A relevant question, however, is whether the model adequately represents reality. Answering this type of question may be the most important step in the model building process, but it is often also one of the most overlooked
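One common simulation-based way to compare model output with data is a posterior predictive check, sketched below. It is offered as a generic illustration and is not claimed to be the validation algorithm of the paper. The Gamma posterior parameters, the Poisson observation model, the made-up counts, and the choice of the range as the discrepancy statistic are all assumptions.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate by Knuth's product method (fine for moderate means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def posterior_predictive_pvalue(data, scale, shape, rate, n_rep, rng, stat):
    """Fraction of replicated data sets whose statistic is at least the observed one.

    Each replicate draws kappa from an assumed Gamma(shape, rate) posterior and
    then simulates len(data) Poisson(scale * kappa) observations.
    """
    observed = stat(data)
    exceed = 0
    for _ in range(n_rep):
        kappa = rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale
        replicate = [sample_poisson(scale * kappa, rng) for _ in data]
        if stat(replicate) >= observed:
            exceed += 1
    return exceed / n_rep


if __name__ == "__main__":
    synthetic_counts = [18, 25, 31]  # made-up values for illustration only
    p = posterior_predictive_pvalue(
        synthetic_counts, 10.0, 76.0, 31.0, 2000, random.Random(7),
        lambda ys: max(ys) - min(ys),
    )
    print("posterior predictive p-value for the range:", p)
```

A p-value close to 0 or 1 would suggest that the fitted model has difficulty reproducing the chosen feature of the data, while intermediate values give no evidence of misfit for that feature.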

Discussion

We have proposed herein a simple stochastic kinetic model of transcription which consists of a set of four linked monomolecular chemical reactions and two time evolving species (mRNA and TProt). The model attempts to capture the relationship between the experimentally observed gene transcription rate and the number of protein facilitators that participate in the regulation of the gene. One of the main advantages of the proposed mathematical model is that it offers an explicit formula for the

Acknowledgments

The first author would like to acknowledge the generous research support for this work from the Center for Genetics and Molecular Medicine at the University of Louisville. All authors would like to acknowledge the anonymous referee whose thoughtful comments and suggestions helped improve the original manuscript.

References (47)

  • A. Borgen et al. Metabolic conversion of benzo(a)pyrene by Syrian hamster liver microsomes and binding of metabolites to deoxyribonucleic acid. J. Med. Chem. (1973)
  • Bower, J., Bolouri, H. (Eds.) (2001). Computational Modeling of Genetic and Biochemical Networks. MIT Press, Cambridge,...
  • Boys, R.J., Wilkinson, D.J., Kirkwood, T.B.L., 2005. Bayesian inference for a discretely observed stochastic kinetic...
  • Y. Cao et al. Efficient formulation of the stochastic simulation algorithm for chemically reacting systems. J. Chem. Phys. (2004)
  • M.-H. Chen et al. Monte Carlo Methods in Bayesian Computation. Springer Series in Statistics (2000)
  • R. Durrett. Essentials of Stochastic Processes. Springer Texts in Statistics (1999)
  • Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability,...
  • H. El Samad et al. Stochastic modeling of gene regulatory networks. Int. J. Robust Nonlinear Control (2005)
  • S.N. Ethier et al. Characterization and convergence
  • N. Friedman. Inferring cellular networks using probabilistic graphical models. Science (2004)
  • Gardiner, C.W., 2004. Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, third ed. Springer...
  • D.T. Gillespie. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. (1977)
  • D.T. Gillespie. Chemical Langevin equation. J. Chem. Phys. (2000)