Effect of environmental stress on regulation of gene expression in the yeast
Introduction
Transcription factors (TFs) are proteins that regulate the expression of genes in an organism’s genome by binding to specific regulatory sites in the genes’ promoter regions. Clustering algorithms have been used in the past to identify novel TF binding sites from the expression patterns of their target genes [1]. However, unlike coding regions, the promoter regions of most genes are not well understood, and their nucleotide binding motifs are not highly conserved [2].
Furthermore, most TFs must themselves be activated (e.g. by phosphorylation or dephosphorylation) before they can activate their target genes [3], [4]. Several experimental approaches, such as protein binding arrays and chromatin immunoprecipitation (ChIP), have been used to identify the activation state of different TFs. However, owing to technical limitations and the high cost of these procedures, they can provide only a relatively static view of the system at a limited number of time points during an experiment.
The problem can be addressed computationally with mathematical models that extract information on TF activation dynamics from the expression patterns of their target genes. Both linear and non-linear models have been proposed to infer the activation state of TFs and the expression levels of their target genes using this approach [5], [6], [7], [8], [9].
The inference problem is further complicated by the non-identifiability of the model parameters that represent the independent processes of basal mRNA transcription and degradation. Consequently, the set of solutions to any such model comprises an infinite number of trajectories in probability space, rendering the inverse problem ill-posed.
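To make the identifiability problem concrete, the sketch below assumes the single-TF linear form dm/dt = B + S*f(t) - D*m of the hidden-variable model in [6]; the symbols B (basal transcription rate), S (coupling constant), D (decay rate) and the test profile f are illustrative assumptions, not values from this study. Because f is latent, two ambiguities appear: rescaling f by c while dividing S by c, or shifting f by a constant while compensating in the basal rate B, leaves the predicted mRNA trajectory unchanged.

```python
import numpy as np

def simulate_mrna(B, S, D, f, t, m0=0.0):
    """Forward-Euler integration of the assumed model dm/dt = B + S*f(t) - D*m."""
    m = np.empty_like(t)
    m[0] = m0
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        m[k] = m[k - 1] + dt * (B + S * f[k - 1] - D * m[k - 1])
    return m

t = np.linspace(0.0, 10.0, 1001)
f = 1.0 + np.sin(t)  # hypothetical latent TF activity profile

# Reference parameter set (hypothetical values).
m_ref = simulate_mrna(B=0.5, S=2.0, D=0.8, f=f, t=t)

# Scale ambiguity: f -> c*f and S -> S/c give the same drive S*f.
c = 3.0
m_scaled = simulate_mrna(B=0.5, S=2.0 / c, D=0.8, f=c * f, t=t)

# Offset ambiguity: f -> f - a and B -> B + S*a give the same drive B + S*f.
a = 0.7
m_shifted = simulate_mrna(B=0.5 + 2.0 * a, S=2.0, D=0.8, f=f - a, t=t)
```

The three trajectories agree to floating-point round-off, so a fit to the mRNA data alone cannot prefer one parameter set over the others; extra constraints, such as informative priors or several genes sharing one TF profile, are needed to pin the parameters down.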
To address the issue of model uncertainty, we propose here a Bayesian inference approach in which the parameters of a putative activation model are estimated using a Gaussian process. A Gaussian process (GP) is a non-parametric regression method that places a probability distribution over functions. Several non-parametric methods have recently been used in modeling studies of gene expression regulation, including Bayesian analysis [10], [11], Dirichlet processes [12], [13] and the Indian buffet process [14].
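As a minimal illustration of what "a probability distribution over functions" means in practice, the sketch below draws sample functions from a zero-mean GP prior with a squared-exponential kernel. The kernel choice and the hyperparameters `variance` and `lengthscale` are assumptions for illustration; the covariance function used in the study is not specified here.

```python
import numpy as np

def squared_exponential(x1, x2, variance=1.0, lengthscale=1.0):
    """k(x, x') = variance * exp(-(x - x')^2 / (2 * lengthscale^2))."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def sample_gp_prior(x, n_samples=3, jitter=1e-6, seed=0, **kernel_args):
    """Draw functions f ~ GP(0, k) evaluated at the points x."""
    K = squared_exponential(x, x, **kernel_args)
    # A small diagonal jitter keeps K numerically positive definite for Cholesky.
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((len(x), n_samples))
    return L @ z  # each column is one sampled function

t = np.linspace(0.0, 10.0, 200)
samples = sample_gp_prior(t, n_samples=5, lengthscale=2.0)
```

Increasing `lengthscale` yields smoother sampled functions; in a GP-based inference scheme this prior is then conditioned on data rather than sampled directly.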
GP inference of a linear activation model of gene expression in the fruit fly has recently been used to identify potential target genes of the TFs Twist and Mef2, which control mesoderm development in the fly, under both single-target and multiple-target activation schemes [15]. While that model successfully predicted the expression patterns of the target genes under the single-target activation scheme, it fell somewhat short in predicting them under the more biologically realistic multiple-target activation scheme. Here, we addressed this problem by introducing into our model (Eq. (1)) a hidden variable that represents the activation level (i.e. the concentration of the active form) of the TF. Hidden-variable approaches have more recently been applied to motion capture, computational developmental biology and geostatistics [16], as well as to transcription factor regulation of gene expression [17]. However, in our opinion, the issue of parameter identifiability has not been properly addressed in these studies.
In the current study, we used a GP to place a prior on the space of functions describing TF expression levels. Since the expression level of the TF in our model is a latent variable, we refer to it as a state-space function. State-space models are based on the notion that there is an unobserved true state of the system, a latent state, that evolves over time and can only be observed indirectly. GPs have recently been used in various other computational optimization problems, including the art gallery problem [18] and the traveling salesman problem [19]. Here, we addressed the identifiability issue by extending the GP model to approximate the responses of multiple genes. A similar computational approach has previously been used to predict school exam scores and pollution levels, and to model gene expression data [20], [21].
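One way to see why pooling several target genes can help is that, under a linear model, each gene's mRNA profile is a linear transformation of the same latent TF profile, so all genes are jointly Gaussian with it under a GP prior. The sketch below discretizes this idea under stated assumptions: each gene responds as m_i = A_i f, where A_i is a hypothetical discretized convolution with an exponential decay kernel (zero basal rate, zero initial mRNA, made-up sensitivities and decay rates). It conditions the latent f on noisy observations of one gene or of three genes and compares the remaining posterior uncertainty. It illustrates the general principle only and is not the authors' implementation.

```python
import numpy as np

def se_kernel(t, variance=1.0, lengthscale=2.0):
    """Squared-exponential prior covariance for the latent TF profile (assumed)."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * d**2 / lengthscale**2)

def response_matrix(t, S, D):
    """Left Riemann sum of m(t) = S * int_0^t exp(-D (t - u)) f(u) du."""
    dt = t[1] - t[0]
    lag = t[:, None] - t[None, :]
    A = S * np.exp(-D * np.clip(lag, 0.0, None)) * dt
    return np.tril(A, k=-1)  # only past values u < t contribute

def posterior_latent_cov(t, genes, noise_var=0.01):
    """Covariance of f given noisy observations y_i = A_i f + eps for each (S, D) in genes."""
    K = se_kernel(t)
    A = np.vstack([response_matrix(t, S, D) for S, D in genes])
    Kyy = A @ K @ A.T + noise_var * np.eye(A.shape[0])
    Kfy = K @ A.T
    return K - Kfy @ np.linalg.solve(Kyy, Kfy.T)

t = np.linspace(0.0, 10.0, 50)
one_gene = [(1.0, 0.5)]
three_genes = [(1.0, 0.5), (2.0, 1.0), (0.5, 0.2)]  # hypothetical (S, D) pairs

var_one = np.trace(posterior_latent_cov(t, one_gene))
var_three = np.trace(posterior_latent_cov(t, three_genes))
```

Because the three-gene observation set contains the one-gene set, Gaussian conditioning guarantees `var_three <= var_one`: every additional target gene can only tighten the estimate of the shared latent profile.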
The use of linear models of gene regulation reduces uncertainty about the biological interpretation of model parameters, at the risk of missing finer subtleties in the dynamics of these highly complex interacting systems, including saturation, repression and combinatorial interactions. To address this dilemma, we used a GP to model the discrepancy between our linear model predictions and the observed experimental results.
In the current study we focused on Rap1p and Abf1p, two TFs involved in regulating the expression of genes that encode ribosomal proteins in the yeast Saccharomyces cerevisiae, under two related environmental stress conditions: oxidative stress and glucose-limited (starvation) growth. Ribosomes contain over 60 different proteins, most of which are present in a single copy per ribosome [22]. The expression of the many ribosomal protein genes must therefore be tightly coordinated to ensure the production of equimolar amounts of the various ribosomal proteins. Furthermore, the production rate of these proteins must respond to the physiological challenges the cell faces, allowing rapid and accurate adjustment of the rate of ribosome formation to new demands and environmental stresses. Accurate characterization of the activity profiles of the TFs that control the expression of ribosomal proteins may therefore provide important insight into how the signal transduction cascades involved in this tightly coordinated process are controlled.
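The idea of letting a GP capture the discrepancy between linear-model predictions and observed data can be sketched as ordinary GP regression on the residuals. In the example below, the "observed" profile, the linear prediction, the kernel and the noise level are all synthetic assumptions; the GP posterior mean is added back to the linear prediction as a data-driven correction. This is a generic model-discrepancy sketch, not the specific formulation used in the study.

```python
import numpy as np

def se_kernel(x1, x2, variance=1.0, lengthscale=1.5):
    """Squared-exponential covariance (assumed kernel for the discrepancy term)."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * d**2 / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise_var=0.01):
    """Posterior mean of a zero-mean GP fitted to (x_train, y_train)."""
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return se_kernel(x_test, x_train) @ alpha

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 40)

linear_prediction = 0.3 * t                      # hypothetical linear-model output
true_profile = 0.3 * t + 0.8 * np.tanh(t - 5.0)  # hypothetical response with a saturating term
observed = true_profile + 0.05 * rng.standard_normal(t.size)

residual = observed - linear_prediction          # discrepancy the linear model misses
corrected = linear_prediction + gp_posterior_mean(t, residual, t)

rmse_linear = np.sqrt(np.mean((observed - linear_prediction) ** 2))
rmse_corrected = np.sqrt(np.mean((observed - corrected) ** 2))
```

The linear part keeps its interpretable parameters, while the GP term absorbs structure such as saturation that the linear form cannot represent.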
Model
We assume that the rate of change in the mRNA level of each target gene can be described by a linear differential equation [6], in which a hidden variable (i.e. a state-space function) represents the activation level (i.e. the concentration of the active form) of each TF, and a coupling constant represents the strength with which mRNA transcription responds to the binding of that TF to its binding site in the gene’s promoter region. Thus, the
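For concreteness, one common way of writing such a model, following the single-TF form of [6] generalized to several TFs, is shown below. The symbols are assumptions for illustration (m_i: mRNA level of gene i; f_j(t): activation level of TF j; S_ij: coupling constant; B_i: basal transcription rate; D_i: mRNA decay rate) and may differ from the authors' own notation. Integrating the equation shows that each mRNA profile is an exponentially weighted (convolved) version of the TF activities; since this map is linear, the m_i are jointly Gaussian with the f_j whenever the f_j are given GP priors.

```latex
\begin{aligned}
\frac{\mathrm{d}m_i(t)}{\mathrm{d}t} &= B_i + \sum_{j} S_{ij}\, f_j(t) - D_i\, m_i(t), \\
m_i(t) &= \frac{B_i}{D_i} + \left(m_i(0) - \frac{B_i}{D_i}\right) e^{-D_i t}
          + \sum_{j} S_{ij} \int_0^{t} e^{-D_i (t-u)}\, f_j(u)\, \mathrm{d}u .
\end{aligned}
```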
Results and discussion
We first tested our model on synthetic data simulating the regulation of 10 test “genes” by two “TFs” as shown in Fig. 2(A). Using the toy profiles, three expression data sets were synthesized according to Eq. (2) with and three replicates. As can be seen in Fig. 2(B) (lower panel), the inferred mean profile (dashed line) closely corresponds to the true profile (solid line) from which the data were produced. Also shown (upper panel) are the three replicates generated for one
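A synthetic data set of the kind described here (10 target genes, two TFs, three replicates) can be generated along the following lines. The sketch assumes the multi-TF linear form dm_i/dt = B_i + sum_j S_ij f_j(t) - D_i m_i, two made-up TF profiles, randomly drawn parameters, a basal steady-state initial condition and Gaussian measurement noise; none of these choices or values are taken from the study.

```python
import numpy as np

def simulate_targets(t, F, B, S, D):
    """Euler-integrate dm_i/dt = B_i + sum_j S_ij f_j(t) - D_i m_i for all genes at once."""
    M = np.empty((len(B), len(t)))
    M[:, 0] = B / D  # assumed initial condition: steady state of the basal rate
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        drive = B + S @ F[:, k - 1]
        M[:, k] = M[:, k - 1] + dt * (drive - D * M[:, k - 1])
    return M

rng = np.random.default_rng(42)
t = np.linspace(0.0, 12.0, 601)
n_genes, n_tfs, n_replicates = 10, 2, 3

# Two hypothetical TF activity profiles (rows of F).
F = np.vstack([
    np.exp(-0.5 * (t - 3.0) ** 2),     # transient pulse
    1.0 / (1.0 + np.exp(-(t - 6.0))),  # sigmoidal switch-on
])

B = rng.uniform(0.05, 0.2, n_genes)          # basal transcription rates
S = rng.uniform(0.0, 1.0, (n_genes, n_tfs))  # coupling constants (gene x TF)
D = rng.uniform(0.3, 1.0, n_genes)           # mRNA decay rates

M_true = simulate_targets(t, F, B, S, D)

# Sample 13 time points and add independent Gaussian noise for each replicate.
obs_idx = np.arange(0, len(t), 50)
noise_sd = 0.05
replicates = M_true[None, :, obs_idx] + noise_sd * rng.standard_normal(
    (n_replicates, n_genes, obs_idx.size)
)
```

Inference can then be checked by recovering `F` from `replicates` alone and comparing the inferred profiles with the known truth.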
Conclusions
In the current study we presented a stochastic linear model to solve the inverse problem of inferring TF state-space functions from the expression patterns of the genes they regulate. We used a Gaussian process approach and informative priors to address the identifiability of model parameters and the uncertainty in the kinetic scheme of the regulation reaction. As a caveat, it should be noted that with the exception of well distinguishable TF activity profiles and a sufficiently smooth
Acknowledgments
This work was supported in part by an Arkansas Biosciences Institute grant (Grant No. Gross 0824BR092007) to EG. EG wishes to thank Dr. David McNabb for helpful discussions on Saccharomyces cerevisiae.
References (40)
- et al. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell (2001).
- et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science (2002).
- et al. Transcriptional regulatory code of a eukaryotic genome. Nature (2004).
- et al. Regulatory evolution through divergence of a phosphoswitch in the transcription factor CEBPB. Nature (2011).
- et al. Serine phosphorylation by SYK is critical for nuclear localization and transcription factor function of Ikaros. Proc. Natl. Acad. Sci. USA (2012).
- et al. Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biol. (2006).
- et al. Bayesian model-based inference of transcription factor activity. BMC Bioinform. (2007).
- et al. Advances in Neural Information Processing Systems, NIPS (2007).
- et al. Statistical reconstruction of transcription factor activity using Michaelis–Menten kinetics. Biometrics (2007).
- et al. Learning combinatorial transcriptional dynamics from gene expression data. Bioinformatics (2010).