A stochastic model of gene transcription: An application to L1 retrotransposition events
Introduction
Modeling gene transcription events is important because transcription is a key component of the regulatory pathways that support most of an organism's biological functions. These pathways are sets of feedforward and feedback loops designed to respond to a stimulus and/or sustain homeostasis. Occasionally, however, these interacting networks permit aberrant behavior that manifests itself as a phenotypic shift that may prove harmful, and ultimately lethal, to the organism. A better understanding of control mechanisms such as transcription may provide valuable insights into the complexity of biological systems in health and disease. One possible way of analyzing the transcription process is to set it in the general framework of systems biology, which emphasizes a kinetic characterization of biological regulatory networks. A regulatory network or, more generally, a reaction network is a chemical system involving multiple reactions and chemical species. Such a network view of biological organization aims to draw on mathematical methods developed in the context of dynamical systems and computational theories, since regulatory networks are dynamic and stochastic in nature. For some recent examples of such modeling for specific regulatory systems, as well as further references, see, e.g., Reddy and Yin, 1999, Rao and Arkin, 2003, McAdams and Arkin, 1997. The stochastic character of bio-networks arises from the fact that, when modeling chemical reactions within biological cells, the number of molecules involved, at least for some of the species, is often too small for a deterministic model to provide a good representation of the behavior of the system. This is in contrast with the settings of classical chemistry, where the amounts of reactants are typically so large that stochastic fluctuations are negligible and reaction networks are modeled with systems of ordinary differential equations.
In the current article, our first goal is to develop a simplified formal stochastic model of gene transcription. The model could be of interest in its own right or as one of the modules of a larger reaction network. For instance, in the case of the L1 retrotransposon gene discussed below, the process of interest would typically be reverse transcription rather than transcription itself. However, in the relevant reaction network the former reaction is coupled with the latter, and thus the transcription model is also relevant in the context of modeling the complex regulatory system of the L1 retrotransposon (cf., e.g., Lu and Ramos, 2003, Lu and Ramos, 1998).
Our second goal is to introduce some new procedures for model parameter estimation and general model validation. By the latter term we mean herein the statistical comparison of the output of the stochastic model with the available laboratory data. Such data are often time dependent and possibly sparse and/or incomplete (e.g., they may contain missing values). Surprisingly, the literature has so far devoted relatively little attention to the issues of parameter estimation and validation for network models (see, e.g., Bower and Bolouri, 2001; Kitano, 2001, for a review). In most cases the techniques developed so far have been based on deterministic assumptions which, in general, do not apply to the stochastic kinetic models that we are concerned with. On the other hand, current progress in Bayesian stochastic simulation methodology allows, in principle, for direct inference to be made about the parameters of any fully specified model, taking account of prior information about parameter values in the form of probability distributions. Herein we propose to utilize these modern Bayesian methods, specializing them to our gene transcription model. We note, however, that many of the ideas may also be extended to other similar systems with quasi-steady-state distributions.
Throughout the article we find it convenient to use data on the L1 retrotransposon to illustrate some of the conceptual and numerical issues discussed. The L1 retrotransposon is the only active autonomous retrotransposon in mammals (see, e.g., Moran et al., 1996, Sassaman et al., 1997, Borgen et al., 1973). The full-length mammalian L1 is approximately 6000 base-pairs (bp) long and is composed of a 5’-untranslated region bearing an internal promoter, two open reading frames separated by an intergenic region, and a 3’-untranslated region containing a poly-A tail. In the L1 system, transcription of the L1 gene is greatly accelerated by the availability of proteins that interact with the regulatory region of the gene as well as proteins that form the pre-initiation complex (Sassaman et al., 1997). The identity of these proteins continues to be debated, but experimental evidence suggests that the rate of L1 transcription is directly proportional to the number of proteins recruited to the regulatory region (Lu and Ramos, 2003). Additionally, it has been found (see, e.g., Sassaman et al., 1997, Lu and Ramos, 2003, Kerzee and Ramos, 2000, Lu et al., 2000) that the carcinogen benzo(a)pyrene (BaP), once metabolized, greatly increases the rate at which the transcription regulatory proteins are recruited, and hence increases the rate of transcription. In the current work, we use time course data acquired from an experiment in which, following addition of the carcinogen, the concentration of L1 mRNA was measured at three time points in isogenic cells derived from a mouse. The data come from a Northern blot experiment (see, e.g., Rosenthal et al., 1998, for more details on Northern blot).
Another typical example of a data-producing technology available for model inference would be real-time polymerase chain reaction (PCR) for measurement of gene expression, or possibly real-time imaging of gene expression, for instance using microscope digital imaging systems that employ a fluorescent reporter mechanism together with the principles of fluorescence resonance energy transfer (FRET). For a general reference on real-time PCR, see Wong and Medrano (2005); for details of FRET and its application to the monitoring of gene expression, see Lakowicz (1983) and Zlokarnik et al. (1998).
The paper is organized as follows. In Section 2 we give a brief description of a fairly generic transcription model via a system of coupled chemical reactions. To illustrate our particular interest in the transcription process, we also briefly discuss these reactions in relation to the model of reverse transcription of the L1 retrotransposon. For the benefit of readers less familiar with the topic, in Section 3 we give a brief overview of the general theory of mathematical models for stochastic biochemical kinetic networks. In Section 4, we discuss formal methods of approximation for these general stochastic systems and show that, in the limit of large system volume, our set of stochastic equations reduces to the set of deterministic rate equations well known in the literature or (under a different scaling) to the set of so-called Langevin equations. In Section 5, we specialize our discussion to the transcription model and show that under our model assumptions the system attains an explicit (quasi) equilibrium distribution. This result is utilized in Section 6, where we discuss the issue of making inferences about the relevant system parameters in the presence of incomplete and very sparse time course data from the L1 retrotransposon experiment. We also review therein different statistical methods of inference about the system parameters, arguing for the appropriateness of Bayesian models in our specific context. In Section 7, we introduce a general algorithm for a simulation-based model validation procedure and illustrate it with a numerical example using our L1 data. We conclude with a summary and some discussion in Section 8. The relevant mathematical formula derivations are presented in the Appendix.
Section snippets
Gene transcription model
Transcription is a physically complex process in which proteins (TProt) bind to the regulatory region of target genes to induce conformational changes that allow for unwinding of the DNA double helix and facilitate recruitment of proteins involved in the formation of a pre-initiation complex. These physical changes are designed to allow RNA polymerase (RNAP) to find specific sites among the billions of bases of genomic DNA (gDNA) to initiate transcription. The transcription of information coded
Mathematical models of reaction kinetics
Assume that our reaction network is a system which consists of chemical species interacting through chemical reaction channels. The system is assumed to be well stirred, confined to a constant volume, and to be in thermal (but not chemical) equilibrium at some constant temperature. In the simplest stochastic model for such a network, the system is treated as a continuous time Markov chain whose state is a vector giving the number of
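To make the continuous time Markov chain view concrete, the following minimal sketch simulates one trajectory with the standard direct-method stochastic simulation algorithm. It assumes a hypothetical single-species network (production at constant rate `k_prod`, degradation at rate `k_deg * x`) rather than the transcription model of this paper, and all function names and rate values are illustrative choices.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, rng):
    """Simulate 0 -> X (rate k_prod) and X -> 0 (rate k_deg * x) up to t_end."""
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        a_prod = k_prod            # propensity of the production channel
        a_deg = k_deg * x          # propensity of the degradation channel
        a_total = a_prod + a_deg
        if a_total == 0.0:         # no reaction can fire any more
            break
        t += rng.expovariate(a_total)   # exponential waiting time to next event
        if t > t_end:
            break
        # Pick the channel with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        states.append(x)
    return times, states


# Hypothetical rate constants, chosen only for illustration.
rng = random.Random(0)
times, states = gillespie_birth_death(k_prod=10.0, k_deg=1.0, x0=0, t_end=50.0, rng=rng)
```

The state is a molecule count that changes by one at random event times, which is exactly the kind of jump process described above; a multi-species network would replace the scalar `x` with a vector and add one propensity per reaction channel.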
Reaction rate equation and chemical Langevin equation
The stochastic system with trajectories given by (3.4) may be sometimes approximated by either a somewhat simpler stochastic system or by a purely deterministic one. We now briefly discuss both of these approaches. We first consider the general setting of (3.4) and then specialize the discussion to the gene transcription model (3.6).
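As a rough illustration of the two approximations, the sketch below treats a hypothetical single-species production and degradation network (not the transcription model (3.6)). It integrates the deterministic rate equation with forward Euler and a chemical Langevin equation with Euler-Maruyama steps. The function names, rate constants, step size, and the truncation of the state at zero are all assumptions made for the example.

```python
import math
import random


def rate_equation(k_prod, k_deg, x0, t_end, dt):
    """Forward-Euler solution of dx/dt = k_prod - k_deg * x."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += (k_prod - k_deg * x) * dt
    return x


def langevin_path(k_prod, k_deg, x0, t_end, dt, rng):
    """Euler-Maruyama for dX = (k_prod - k_deg X) dt + sqrt(k_prod) dW1 - sqrt(k_deg X) dW2."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        drift = (k_prod - k_deg * x) * dt
        # One independent Gaussian increment per reaction channel.
        noise = (math.sqrt(k_prod * dt) * rng.gauss(0.0, 1.0)
                 - math.sqrt(k_deg * max(x, 0.0) * dt) * rng.gauss(0.0, 1.0))
        x = max(x + drift + noise, 0.0)   # crude truncation keeps the state non-negative
    return x


# Hypothetical values: the deterministic solution relaxes to k_prod / k_deg.
x_det = rate_equation(k_prod=100.0, k_deg=1.0, x0=0.0, t_end=20.0, dt=0.001)
x_cle = langevin_path(k_prod=1000.0, k_deg=1.0, x0=1000.0, t_end=5.0, dt=0.001,
                      rng=random.Random(1))
```

The deterministic run settles at the fixed point, while the Langevin run keeps fluctuating around it with a spread on the order of the square root of the molecule count.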
If the vector giving the number of molecules of each species present is divided by the volume times Avogadro's number, the result gives the concentrations in moles per unit
Stationary and quasi-stationary distribution
Setting the time derivative to zero in (3.2) and solving the resulting equation gives the steady state or stationary distribution of the stochastic kinetic system. The corresponding steady state of the deterministic approximation (i.e., the reaction rate equation) is achieved when the analogous condition is satisfied.
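For a sufficiently simple network the stationary equation can be solved directly. The sketch below assumes a hypothetical single-species production and degradation network, not the paper's model, and solves the balance recursion on a truncated state space. For this particular network the result is known to be a Poisson distribution with mean `k_prod / k_deg`, which the code uses as a check. All names and values are illustrative.

```python
import math


def stationary_birth_death(k_prod, k_deg, n_max):
    """Stationary probabilities on {0, ..., n_max} from the balance recursion
    pi(n + 1) * k_deg * (n + 1) = pi(n) * k_prod."""
    weights = [1.0]
    for n in range(n_max):
        weights.append(weights[-1] * k_prod / (k_deg * (n + 1)))
    total = sum(weights)
    return [w / total for w in weights]


def poisson_pmf(lam, n):
    """Poisson probability mass, computed on the log scale for stability."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


# Hypothetical rate constants; the truncation level n_max is also an assumption.
pi = stationary_birth_death(k_prod=10.0, k_deg=1.0, n_max=100)
max_gap = max(abs(p - poisson_pmf(10.0, n)) for n, p in enumerate(pi))
```

Here `max_gap` is the largest pointwise difference between the recursion and the Poisson formula, and it is negligible as long as `n_max` sits far in the tail.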
Even in moderately complicated stochastic kinetic systems the direct analytic solution of Eq. (5.1) is typically intractable and thus the
Model parameters estimation with sparse data
We shall discuss model parameter estimation using the example of L1 data from the Northern blot experiment to examine the influence of an external factor (BaP) on the system reaction constants. In its simplest version the above question of BaP influence on the system may be translated into a question about the relationship of BaP levels and the corresponding value of constant. The question is relevant, for instance, in the context of modeling the relationship between the levels of carcinogen
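To show how Bayesian inference can work with only a handful of observations, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the observations are treated as independent Poisson counts with a common mean, and the prior on that mean is a Gamma distribution, so the posterior is available in closed form. The counts are synthetic and the prior settings are arbitrary.

```python
import random


def gamma_poisson_posterior(counts, prior_shape, prior_rate):
    """Conjugate update for a Gamma(shape, rate) prior and Poisson(lam) counts."""
    return prior_shape + sum(counts), prior_rate + len(counts)


def credible_interval(shape, rate, level, n_draws, rng):
    """Equal-tailed interval estimated from sorted posterior draws."""
    draws = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(n_draws))
    lower = draws[int((1.0 - level) / 2.0 * n_draws)]
    upper = draws[int((1.0 + level) / 2.0 * n_draws) - 1]
    return lower, upper


# Synthetic counts at three time points (NOT the L1 Northern blot data).
counts = [12, 9, 15]
shape, rate = gamma_poisson_posterior(counts, prior_shape=2.0, prior_rate=0.2)
posterior_mean = shape / rate
lower, upper = credible_interval(shape, rate, level=0.95, n_draws=20000,
                                 rng=random.Random(0))
```

With three data points the interval stays wide, and the prior carries visible weight. That is the practical reason prior information matters in the sparse-data setting.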
Model validation for L1 retrotransposon data
In the previous sections, we have proposed a simple model of transcription and suggested an approach to model parameter estimation in cases when, as in the L1 retrotransposon experiment discussed, only limited experimental data are available. A relevant question, however, is whether the model adequately represents reality. Answering this type of question may possibly be the most important step in the model building process but often is also one of the most overlooked
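One generic way to ask whether a fitted model reproduces the data is a posterior predictive check, sketched below. This is a standard technique used here for illustration and is not claimed to be the validation algorithm of this paper. The Gamma posterior parameters, the Poisson observation model, the choice of the sample range as discrepancy statistic, and the synthetic counts are all assumptions.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def predictive_p_value(observed, shape, rate, n_rep, rng):
    """Share of replicated data sets whose range is at least the observed range."""
    obs_stat = max(observed) - min(observed)
    exceed = 0
    for _ in range(n_rep):
        lam = rng.gammavariate(shape, 1.0 / rate)            # draw a parameter value
        rep = [sample_poisson(lam, rng) for _ in observed]   # simulate replicate data
        if max(rep) - min(rep) >= obs_stat:
            exceed += 1
    return exceed / n_rep


# Synthetic counts and hypothetical posterior parameters, for illustration only.
observed = [12, 9, 15]
p_value = predictive_p_value(observed, shape=38.0, rate=3.2, n_rep=5000,
                             rng=random.Random(0))
```

A value of `p_value` close to 0 or 1 would suggest that the model struggles to reproduce the chosen feature of the data, whereas intermediate values give no evidence of misfit for that feature.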
Discussion
We have proposed herein a simple stochastic kinetic model of transcription which consists of a set of four linked monomolecular chemical reactions and two time evolving species (mRNA and TProt). The model attempts to capture the relationship between the experimentally observed gene transcription rate and the number of protein facilitators that participate in the regulation of the gene. One of the main advantages of the proposed mathematical model is that it offers an explicit formula for the
Acknowledgments
The first author would like to acknowledge the generous research support for this work from the Center for Genetics and Molecular Medicine at the University of Louisville. All authors would like to acknowledge the anonymous referee whose thoughtful comments and suggestions helped improve the original manuscript.
References (47)
- et al. Multiscale stochastic simulation algorithm with stochastic partial equilibrium assumption for chemically reacting systems. J. Chem. Phys. (2005)
- A rigorous derivation of the chemical master equation. Physica A (1992)
- et al. Identification of genes differentially expressed in vascular smooth muscle cells following benzo[a]pyrene challenge: implications for chemical atherogenesis. Biochem. Biophys. Res. Commun. (1998)
- et al. Redox activation of a novel a-type mouse L1Md retrotransposon in vascular smooth muscle cells. J. Biol. Chem. (2003)
- et al. Benzo(a)pyrene activates L1Md retrotransposon and DNA repair in vascular smooth muscle cells. Mutat. Res. (2000)
- et al. High frequency retrotransposition in cultured mammalian cells. Cell (1996)
- et al. Northern blot identification of mRNA containing sequence for protein allergen alt, a1, in eight strains of Alternaria alternata. Ann. Allergy Asthma Immunol. (1998)
- et al. Stochastic vs. deterministic modeling of intracellular viral kinetics. J. Theoret. Biol. (2002)
- Ball, K., Kurtz, T., Popovic, L., Rempala, G., 2005. Asymptotic analysis of multiscale approximations to reaction...
- Bayarri, M., Berger, J., Higdon, D., Kennedy, M., Kottas, A., Paulo, R., Sacks, J., Cafeo, J., Cavendish, J., Lin,...
- Metabolic conversion of benzo(a)pyrene by Syrian hamster liver microsomes and binding of metabolites to deoxyribonucleic acid. J. Med. Chem.
- Efficient formulation of the stochastic simulation algorithm for chemically reacting systems. J. Chem. Phys.
- Monte Carlo Methods in Bayesian Computation. Springer Series in Statistics
- Essentials of Stochastic Processes. Springer Texts in Statistics
- Stochastic modeling of gene regulatory networks. Int. J. Robust Nonlinear Control
- Characterization and convergence
- Inferring cellular networks using probabilistic graphical models. Science
- Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem.
- Chemical Langevin equation. J. Chem. Phys.
Cited by (20)
Computational modeling of RNase, antisense ORF0 RNA, and intracellular compartmentation and their impact on the life cycle of the line retrotransposon
2021, Computational and Structural Biotechnology Journal
Citation Excerpt: These parameter sweeps are intended to model the influence of carcinogen mediated epigenetic dysregulation on the fates and dispositions of LINE-1 components. Previous simulations of LINE-1 activation by Rempala et al. [38–39] focused on the steady state solutions of relatively simple systems. These included reactions involving the creation of LINE-1 mRNA from the corresponding DNA, the formation of a single protein from the mRNA and the creation of complementary DNA.
Towards uncertainty quantification and inference in the stochastic SIR epidemic model
2012, Mathematical Biosciences
Citation Excerpt: Among these research efforts, various Bayesian and likelihood based approaches to parameter estimation have been attempted. Some authors have worked with a deterministic, continuous or steady state alternative model to infer the parameters in the CME [21–23]. Other noteworthy approaches to Bayesian inference with the CME include [24,22], where the CME is approximated by a diffusion process and [25] where the van Kampen expansion is used to derive a diffusion approximation; other similar papers include [26,27].
Introduction and Overview of Technological Advances and Predictive Assays
2010, Comprehensive Toxicology, Second Edition
Genomics, Bioinformatics, and Computational Biology
2010, Comprehensive Toxicology, Second Edition
Algebraic methods for inferring biochemical networks: A maximum likelihood approach
2009, Computational Biology and Chemistry
Statistical Inference of Rate Constants in Chemical and Biochemical Reaction Networks Using an “Inverse” Event-Driven Kinetic Monte Carlo Method
2023, Journal of Physical Chemistry B