Effect of environmental stress on regulation of gene expression in the yeast

https://doi.org/10.1016/j.physa.2015.02.076

Highlights

  • Transcriptional responses to external stimuli are modeled using an ODE with an unknown forcing term.

  • Hyperparameters are estimated using Gaussian Process priors and type II maximum likelihood.

  • State-space trajectories representing signature responses to different environmental stimuli are generated.

Abstract

Several mathematical models have been proposed to predict the activation state of a transcription factor (TF) from the expression levels of its target genes. This inference problem is complicated, however, because different genes may be regulated by different activation schemes (linear, exponential, sigmoidal, etc.). In addition to transcriptional regulation, the instantaneous rate of gene expression is also determined by the independent rates of baseline production and degradation. Consequently, the set of solutions to any model equations describes an infinite number of trajectories in probability space, thus rendering the problem NP-hard. In the current study we used a Gaussian process (GP) approach to address this inverse problem. Experimental gene expression data were modeled by a putative linear activation scheme, and the discrepancy between theory and experiment was modeled by a GP. Model hyperparameters were calculated using maximum likelihood estimates to generate continuous TF state-space profiles. Identifiability of model parameters was optimized by obtaining TF state-space functions for multiple genes simultaneously. We found that model parameters were sensitive to environmental stress conditions, producing different state-space profiles for different stresses.

Introduction

Transcription factors (TFs) are proteins that regulate the expression of genes in the organism’s genome by binding to regulatory binding sites in the promoter region of the gene. Clustering algorithms have been used in the past to identify novel binding sites of TFs from the expression patterns of their target genes [1]. However, unlike the coding regions, the promoter regions of most genes are not well understood and the nucleotide binding motifs are not highly conserved [2].

Furthermore, most TFs themselves need to be activated (e.g. by phosphorylation or dephosphorylation) before they can activate their target genes [3], [4]. Several experimental approaches, such as protein binding arrays and chromatin immunoprecipitation (ChIP), have been used to identify the activation state of different TFs. However, due to the technical limitations and high costs of these experimental procedures, they can only give a relatively static view of the system at a limited number of time points along the experiment.

The problem can be addressed computationally with mathematical models that extract information on TF activation dynamics from the expression patterns of their target genes. Both linear and non-linear models have been proposed to infer the activation state of TFs and the expression levels of their target genes using this approach [5], [6], [7], [8], [9].

The inference problem is further complicated by the limited identifiability of the model parameters that represent the independent processes of baseline mRNA transcription and degradation. Consequently, the set of solutions to any model describes an infinite number of trajectories in probability space, thus rendering the problem NP-hard.

To address the issue of model uncertainty, we propose here a Bayesian inference approach with a Gaussian process approximation, in which the parameters of a putative activation model are estimated using a Gaussian process. A Gaussian process (GP) is a non-parametric regression method that places probability distributions over functions. Several non-parametric methods have been used recently in modeling studies of gene expression regulation, including Bayesian analysis [10], [11], Dirichlet processes [12], [13] and the Indian buffet process [14].
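To make the idea of a GP with hyperparameters chosen by type II maximum likelihood concrete, the following is a minimal sketch and not the procedure used in this study. It assumes a zero-mean GP with a squared-exponential kernel and Gaussian observation noise, and it selects the kernel length scale by a simple grid search over the log marginal likelihood. The kernel choice, the grid, the toy sine data and all function names are illustrative assumptions.

```python
import math

def rbf_kernel(t1, t2, variance, length_scale):
    """Squared-exponential covariance between two time points (assumed kernel)."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_marginal_likelihood(times, values, variance, length_scale, noise_var):
    """log p(y | t, hyperparameters) for a zero-mean GP with Gaussian noise."""
    n = len(times)
    cov = [[rbf_kernel(times[i], times[j], variance, length_scale)
            + (noise_var if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    # Forward substitution solves L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((values[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)

def fit_length_scale(times, values, candidates, variance=1.0, noise_var=0.05):
    """Type II maximum likelihood by grid search over candidate length scales."""
    return max(candidates,
               key=lambda ls: log_marginal_likelihood(times, values, variance,
                                                      ls, noise_var))

# Toy data (hypothetical): a smooth profile sampled at 12 time points.
times = [0.5 * k for k in range(12)]
values = [math.sin(t) for t in times]
best = fit_length_scale(times, values, [0.05, 0.2, 1.0, 1.5, 3.0, 10.0])
```

In practice the marginal likelihood would usually be maximized with a gradient-based optimizer over all hyperparameters jointly; the grid search is used here only to keep the sketch short and dependency-free.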

GP inference of a linear activation model of gene expression in the fruit fly has recently been used to identify potential target genes of the TFs Twist and Mef2 under both single-target and multiple-target activation schemes [15]. The two TFs control mesoderm development in the fly. While that model successfully predicted the expression patterns of the target genes under the single-target activation scheme, it fell somewhat short in predicting their expression patterns under the more biologically realistic multiple-target activation scheme. Here, we addressed this problem by introducing into our model a hidden variable ($h_j(t)$ in Eq. (1)) that represents the activation level (i.e. concentration of the active form) of the TF. The hidden-variable approach has more recently been applied in studies of motion capture, computational developmental biology, and geostatistics [16], as well as transcription factor regulation of gene expression [17]. However, in our opinion, the issue of parameter identifiability has not been properly addressed in these studies.

In the current study, we used a GP to place a prior on the space of functions of TF expression levels. Since the expression level of the TF in our model is a latent variable, we refer to it as a state-space function. State-space models are based on the notion that there is an unobserved true state of the system, a latent state, that evolves over time and can only be observed indirectly. GPs have recently been used in various other computational optimization problems, including the art gallery problem [18] and the traveling salesman problem [19]. Here, we addressed the identifiability issue by extending the GP model to approximate the responses of multiple genes. A similar computational approach has previously been applied to school exam score prediction, pollution prediction and gene expression data [20], [21].

The use of linear regulation models of gene expression reduces uncertainties about the biological interpretation of model parameters, at the risk of not capturing fine subtleties in the dynamics of these highly complex interacting systems, including saturation, repression and combinatorial interactions. To address this dilemma, we used a GP to model the discrepancy between our linear model predictions and the observed experimental results. In the current study we focused on Rap1p and Abf1p, two TFs that regulate the expression of genes encoding ribosomal proteins in the yeast Saccharomyces cerevisiae, under two related environmental stress conditions: oxidative stress and glucose-limited (starvation) growth. Ribosomes contain over 60 different protein units, most of which are present in a single copy per ribosome [22]. The expression of the multiple ribosomal genes must be tightly coordinated to ensure the production of equimolar amounts of the various ribosomal protein units. Furthermore, the production rate of these ribosomal units must respond to the physiological challenges the cell is facing, in order to allow a rapid and accurate adjustment of the rate of ribosome formation to new demands and environmental stresses. Accurate characterization of the activity profiles of TFs that control the expression of various ribosomal proteins may provide important insight into system control of the various signal transduction cascades that may be involved in this tightly coordinated process.

Section snippets

Model

We assume that the rate of change in mRNA level ($\dot{x}_i$) of target gene $i$ can be described by a linear differential equation [6]:

$$\dot{x}_i = \sum_{j=1}^{F} K_j h_j(t) + b - \mu x_i, \qquad i = 1, \ldots, N_{\mathrm{exp}}$$

where $h_j(t)$ is a hidden variable (i.e. state-space function) that represents the activation level (i.e. concentration of the active form) of the $j$th TF and $K_j$ is a coupling constant that represents the coupling strength of mRNA transcription in response to the $j$th TF binding to its binding site in the DNA’s promoter region. Thus, the
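The forward direction of this model, computing an mRNA trajectory from known TF activity profiles, can be sketched as follows. The sketch reads the model as dx/dt = Σ_j K_j h_j(t) + b − μx, taking b as the baseline production rate and μ as the degradation rate (the minus sign on the degradation term is an assumption consistent with the abstract). The integrator (classical fourth-order Runge-Kutta), the Gaussian-pulse TF profile, the parameter values and all function names are hypothetical choices for illustration only.

```python
import math

def simulate_mrna(h_funcs, couplings, baseline, decay, x0, t_end, dt=0.01):
    """Integrate dx/dt = sum_j K_j * h_j(t) + b - mu * x with classical RK4.

    h_funcs   : list of callables h_j(t), the TF activity profiles
    couplings : list of coupling constants K_j
    baseline  : baseline production rate b
    decay     : degradation rate mu
    Returns a list of (t, x) pairs.
    """
    def rate(t, x):
        drive = sum(k * h(t) for k, h in zip(couplings, h_funcs))
        return drive + baseline - decay * x

    n_steps = int(round(t_end / dt))
    t, x = 0.0, x0
    trajectory = [(t, x)]
    for _ in range(n_steps):
        k1 = rate(t, x)
        k2 = rate(t + dt / 2, x + dt / 2 * k1)
        k3 = rate(t + dt / 2, x + dt / 2 * k2)
        k4 = rate(t + dt, x + dt * k3)
        x += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
        trajectory.append((t, x))
    return trajectory

# Hypothetical example: one TF whose activity is a transient pulse around t = 5.
pulse = lambda t: math.exp(-0.5 * (t - 5.0) ** 2)
trajectory = simulate_mrna([pulse], couplings=[2.0], baseline=0.5,
                           decay=0.5, x0=1.0, t_end=20.0)
```

With a constant TF activity h, the trajectory relaxes to the steady state (Σ_j K_j h_j + b)/μ, which gives a quick sanity check for any implementation. The inverse problem treated in this study runs the other way: the h_j(t) are unknown and must be inferred from the observed x_i.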

Results and discussion

We first tested our model on synthetic data simulating the regulation of 10 test “genes” by two “TFs” as shown in Fig. 2(A). Using the toy $h(t)$ profiles, three expression data sets were synthesized according to Eq. (2) with $\sigma^2 = 0.05$ and three replicates. As can be seen in Fig. 2(B) (lower panel), the inferred mean $h(t)$ profile (dashed line) closely corresponds to the true profile (solid line) from which the data were produced. Also shown (upper panel) are the three replicates generated for one
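Replicate generation of this kind can be sketched in a few lines. Eq. (2) is not reproduced in this excerpt, so the sketch simply assumes additive, independent Gaussian observation noise with variance σ² = 0.05 on a noise-free expression profile; the clean profile, the seed and the function name are hypothetical.

```python
import random

def make_replicates(clean_profile, noise_var=0.05, n_replicates=3, seed=0):
    """Return noisy copies of a clean profile (assumed additive Gaussian noise)."""
    rng = random.Random(seed)      # seeded so the synthetic data are reproducible
    sigma = noise_var ** 0.5       # standard deviation from the variance
    return [[x + rng.gauss(0.0, sigma) for x in clean_profile]
            for _ in range(n_replicates)]

# Hypothetical noise-free expression profile at 8 time points.
clean = [1.0 + 0.5 * t for t in range(8)]
replicates = make_replicates(clean)
```

Fixing the seed makes it possible to compare an inferred profile against the same synthetic data across runs.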

Conclusions

In the current study we presented a stochastic linear model to solve the inverse problem of inferring TF state-space functions from the expression patterns of the genes they regulate. We used a Gaussian process approach and informative priors to address the problem of identifiability of model parameters and the uncertainty in the kinetic scheme of the regulation reaction. As a caveat, it should be noted that with the exception of well distinguishable TF activity profiles and a sufficiently smooth

Acknowledgments

This work was supported in part by a grant from the Arkansas Biosciences Institute (Grant No. Gross 0824BR092007) to EG. EG wishes to thank Dr. David McNabb for helpful discussions on Saccharomyces cerevisiae.

References (40)

  • I. Simon et al., Serial regulation of transcriptional regulators in the yeast cell cycle, Cell (2001)
  • T.I. Lee et al., Transcriptional regulatory networks in Saccharomyces cerevisiae, Science (2002)
  • C.T. Harbison et al., Transcriptional regulatory code of a eukaryotic genome, Nature (2004)
  • V.J. Lynch et al., Regulatory evolution through divergence of a phosphoswitch in the transcription factor CEBPB, Nature (2011)
  • F.M. Uckun et al., Serine phosphorylation by SYK is critical for nuclear localization and transcription factor function of Ikaros, Proc. Natl. Acad. Sci. USA (2012)
  • M. Barenco et al., Ranked prediction of p53 targets using hidden variable dynamic modeling, Genome Biol. (2006)
  • S. Rogers et al., Bayesian model-based inference of transcription factor activity, BMC Bioinform. (2007)
  • N.D. Lawrence et al., Advances in Neural Information Processing Systems, NIPS (2007)
  • R. Khanin et al., Statistical reconstruction of transcription factor activity using Michaelis–Menten kinetics, Biometrics (2007)
  • M. Opper et al., Learning combinatorial transcriptional dynamics from gene expression data, Bioinformatics (2010)
  • T. Aijo et al., Learning gene regulatory networks from gene expression measurements using non-parametric molecular kinetics, Bioinformatics (2009)
  • C.A. Penfold et al., Nonparametric Bayesian inference for perturbed and orthologous gene regulatory networks, Bioinformatics (2012)
  • J. Meng et al., Bayesian non-negative factor analysis for reconstructing transcription factor mediated regulatory networks, Proteome Sci. (2011)
  • B. Shahbaba et al., Bayesian nonparametric variable selection as an exploratory tool for discovering differentially expressed genes, Stat. Med. (2012)
  • B. Chen et al., Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies, BMC Bioinform. (2010)
  • A. Honkela et al., Model-based method for transcription factor target identification with limited data, Proc. Natl. Acad. Sci. USA (2010)
  • M.A. Alvarez et al., Linear latent force models using Gaussian processes, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • P. Gao et al., Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities, Bioinformatics (2008)
  • A. Krause et al., Near optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies, J. Mach. Learn. Res. (2008)
  • W. Kongkaew et al., A Gaussian process regression model for the traveling salesman problem, J. Comput. Sci. (2012)