Evolutionary modeling and inference of gene network
Introduction
In this paper, we present an Evolutionary Modeling (EM) approach for identifying a causal model from observed time series data. Our hybrid EM method, which combines Genetic Programming (GP) with statistical analysis, builds a system of differential equations, i.e., a causal model, for gene regulatory networks as well as for other real-world complex systems.
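As a rough illustration of how GP and statistical fitting can be combined, the sketch below assumes that GP proposes the structure of a right-hand side as a set of terms F_k(x), and that a least-squares (LMS) step fits their coefficients a_k to derivatives estimated from the observed time series by central differences. The names `fit_coefficients` and `solve`, and the finite-difference derivative estimate, are illustrative assumptions rather than the paper's implementation.

```python
from typing import Callable, List, Sequence

Term = Callable[[Sequence[float]], float]  # one GP-generated term F_k(x)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_coefficients(series: List[List[float]], dt: float, gene: int,
                     terms: List[Term]) -> List[float]:
    """Least-squares coefficients a_k for dx_gene/dt ~ sum_k a_k * F_k(x).

    Assumption: derivatives are approximated by central differences of the
    observed series (rows = time points sampled every dt, columns = genes).
    """
    rows, targets = [], []
    for t in range(1, len(series) - 1):
        deriv = (series[t + 1][gene] - series[t - 1][gene]) / (2 * dt)
        rows.append([f(series[t]) for f in terms])
        targets.append(deriv)
    k = len(terms)
    # Normal equations: (A^T A) a = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(ata, aty)
```

For example, with x0 = t and x1 = t² sampled every 0.1, fitting the terms [x0, x1, 1] for gene 1 gives coefficients close to [2, 0, 0], i.e., it recovers dx1/dt = 2·x0. In this arrangement GP only has to search over term structures, while the coefficients are obtained in closed form.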
The inference of gene regulatory networks is an important and heavily studied topic in bioinformatics. In recent years, the amount and accuracy of available gene expression profiles have greatly increased thanks to advances in molecular engineering technology. The transcription of genes into mRNAs and their translation into proteins is termed gene expression. Genes and their products interact with one another, resulting in a cascade of reactions within the cell. These interactions form a complex network, and understanding its underlying structure and predicting its behavior is a challenging task for biologists. Accomplishing this task would be a major contribution to genome analysis.
Several problems arise in modeling these genetic networks. First, the enormous number of related genes makes the solution space very high-dimensional, with many local optima. Second, unlike protein reactions, very little is known about the reactions of mRNAs, so there is no standard kinetic framework to work with. Third, the significant noise in microarray observation data makes numerical approaches very difficult. Finally, since very few genetic network structures are known, an acquired model has to be evaluated experimentally.
The rest of this paper is organized as follows. Section 2 describes conventional models of gene networks. Section 3 details modeling by GP + LMS, i.e., how the GP and LMS methods are integrated in the course of evolution. Three examples are used to examine the effectiveness of our method; their experimental results and some discussion are presented in Section 4. Section 6 describes perturbation analysis for application to large-scale gene network analysis. Analyses of conventional-model simulations and of microarray observations are presented in Sections 7.1 (Experiment with artificial data) and 7.3 (Microarray analysis). Discussion and conclusions are given in Section 8.
Conventional models of gene networks
The objective of genetic network modeling is to formulate the transition of gene expression, i.e., the concentrations of products such as mRNAs. These concentrations are measured with cDNA microarrays [1] and are the variables that define the behavior of the genes.
Most existing genetic network models parameterize the degree of effect between genes as a matrix, e.g., $W_{i,j}$ (Eq. (1)) and $g_{i,j}$ (Eq. (2)). The most abstract form of such a model is the Boolean network [2], which approximates gene expression into
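Eqs. (1) and (2) are not reproduced in this excerpt. As an assumption, the sketch below uses two widely used matrix-parameterized forms: a linear weight-matrix model, dx_i/dt = Σ_j W_{i,j} x_j, and an S-system, dx_i/dt = α_i Π_j x_j^{g_{i,j}} - β_i Π_j x_j^{h_{i,j}}, together with a forward-Euler integrator that turns either into a time series. It illustrates how, in such models, a fixed matrix of interaction strengths determines the whole system, in contrast to the free-form right-hand sides that GP derives.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Deriv = Callable[[Sequence[float]], Vector]


def linear_model(w: Sequence[Sequence[float]]) -> Deriv:
    """Assumed linear weight-matrix model: dx_i/dt = sum_j W[i][j] * x_j."""
    def deriv(x: Sequence[float]) -> Vector:
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    return deriv


def s_system(alpha: Sequence[float], beta: Sequence[float],
             g: Sequence[Sequence[float]], h: Sequence[Sequence[float]]) -> Deriv:
    """S-system: dx_i/dt = alpha_i * prod_j x_j**g[i][j] - beta_i * prod_j x_j**h[i][j].

    Assumes strictly positive concentrations, since exponents may be negative.
    """
    def deriv(x: Sequence[float]) -> Vector:
        out = []
        for i in range(len(x)):
            prod_g, prod_h = 1.0, 1.0
            for j, xj in enumerate(x):
                prod_g *= xj ** g[i][j]  # production term
                prod_h *= xj ** h[i][j]  # degradation term
            out.append(alpha[i] * prod_g - beta[i] * prod_h)
        return out
    return deriv


def simulate(deriv: Deriv, x0: Sequence[float], dt: float, steps: int) -> List[Vector]:
    """Forward-Euler integration; returns steps + 1 state vectors."""
    xs = [list(x0)]
    for _ in range(steps):
        d = deriv(xs[-1])
        xs.append([x + dt * dx for x, dx in zip(xs[-1], d)])
    return xs
```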
Genetic programming
Our method is based on EM, in which GP is used to derive an arbitrary form for the right-hand side of differential equations. Such models have been used to describe sunspots, chemical reactions, population dynamics, etc. [8]. We use GP [9], proposed by Koza as an extension of the Genetic Algorithm [10], to generate differential equations that represent the behavior of the genes in a biological network. Each GP individual is a tree structure composed of mathematical operations and variables.
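A minimal sketch of a tree-structured GP individual, assuming a small hypothetical function set {+, -, *} and a terminal set of gene variables x0, x1, ... plus random constants (the paper's actual sets F and T are not given in this excerpt). Each tree is read as the right-hand side of one differential equation, dx_i/dt = tree_i(x); `random_tree` follows the common "grow" initialization.

```python
import random
from dataclasses import dataclass, field
from typing import List, Sequence, Union

# Hypothetical function set F (binary operators only); the paper's actual
# function and terminal sets are not reproduced in this excerpt.
FUNCS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


@dataclass
class Node:
    """One node of a GP tree: an operator, a gene variable, or a constant."""
    kind: str                      # "op", "var", or "const"
    value: Union[str, int, float]  # operator symbol, variable index, or constant
    children: List["Node"] = field(default_factory=list)

    def evaluate(self, x: Sequence[float]) -> float:
        """Evaluate the tree as the right-hand side f(x) for state vector x."""
        if self.kind == "const":
            return float(self.value)
        if self.kind == "var":
            return float(x[self.value])
        left, right = (child.evaluate(x) for child in self.children)
        return FUNCS[self.value](left, right)

    def __str__(self) -> str:
        if self.kind == "const":
            return f"{self.value:g}"
        if self.kind == "var":
            return f"x{self.value}"
        left, right = self.children
        return f"({left} {self.value} {right})"


def random_tree(rng: random.Random, n_vars: int, depth: int) -> Node:
    """Grow a random tree with at most `depth` operator levels."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return Node("var", rng.randrange(n_vars))
        return Node("const", round(rng.uniform(-1.0, 1.0), 2))
    op = rng.choice(sorted(FUNCS))
    return Node("op", op, [random_tree(rng, n_vars, depth - 1),
                           random_tree(rng, n_vars, depth - 1)])
```

For instance, the tree `((x0 * x1) + 1.5)` evaluates to 7.5 at x = [2, 3]. A system of n equations is then simply a list of n such trees, one per gene.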
Experimental results
We prepared three different tasks to test the effectiveness of our method. Experimental parameters are summarized in Tables 1 and 2. The function and terminal sets F and T are defined as follows:
Robustness
Although the previous section shows that our approach can acquire the exact equation form, yielding time series very close to the observed target data, another factor must be considered: robustness. To test the robustness of our method against real-world noise, we repeated the E-cell experiment (Exp. 2) with noise-added data sets, in which 5% and 10% random noise was added to the target time series. The acquired time series are plotted together with the target data in Figs. 8 and 9.
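The noise experiment can be sketched as follows. The excerpt does not specify the noise distribution; this sketch assumes uniform noise proportional to each value, so a 5% level perturbs every data point by at most ±5% of its magnitude. The helper `rmse` is a hypothetical addition, one simple way to quantify how far an acquired time series lies from the target.

```python
import math
import random
from typing import List

Series = List[List[float]]  # rows = time points, columns = genes


def add_relative_noise(series: Series, level: float, seed: int = 0) -> Series:
    """Copy of `series` with each value scaled by (1 + u), u ~ Uniform(-level, level).

    Assumption: "5% noise" means level=0.05 relative to each observed value;
    the excerpt does not state the exact noise model.
    """
    rng = random.Random(seed)
    return [[v * (1.0 + rng.uniform(-level, level)) for v in row] for row in series]


def rmse(a: Series, b: Series) -> float:
    """Root-mean-square difference between two equally shaped time series."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))
```

Under these assumptions, `add_relative_noise(target, 0.05)` and `add_relative_noise(target, 0.10)` would produce the 5% and 10% data sets; fixing the seed keeps each noisy data set reproducible across runs.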
Implementation of perturbation analysis
In our approach, the behavior of genes is modeled with general differential equations. The acquired models are evaluated by biologists and used to analyze gene expression and to suggest possible relations between genes. To give biologists insight, concise and explanatory programs are preferred. Moreover, gene networks are biologically presumed to be sparse, which further favors simple and concise expressions. Though GP + LMS reduces the known problems of bloat and introns, acquired
Indicating robust parameters
This section describes a Monte Carlo analysis for identifying robust and relevant terms in complicated programs.
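The excerpt does not detail the Monte Carlo procedure, so the following is only a sketch under an assumed design: GP (+ LMS) is run repeatedly with different random seeds, each run reports the set of terms in its acquired equation, and terms appearing in at least a chosen fraction of runs are flagged as robust. `run_once`, `fake_run`, and the 0.8 threshold are hypothetical placeholders, not the paper's procedure.

```python
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, List


def term_frequencies(run_once: Callable[[int], FrozenSet[str]],
                     n_runs: int) -> Dict[str, float]:
    """Fraction of independent runs (seeds 0..n_runs-1) whose model contains each term."""
    counts: Counter = Counter()
    for seed in range(n_runs):
        counts.update(run_once(seed))
    return {term: c / n_runs for term, c in counts.items()}


def robust_terms(freqs: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Terms appearing in at least `threshold` of the runs (threshold is an assumption)."""
    return sorted(t for t, f in freqs.items() if f >= threshold)


def fake_run(seed: int) -> FrozenSet[str]:
    """Hypothetical stand-in for one GP + LMS run: two stable terms plus one
    seed-dependent spurious term, mimicking sensitivity to the random seed."""
    rng = random.Random(seed)
    spurious = f"x{rng.randrange(5)}*x{rng.randrange(5)}"
    return frozenset({"x0", "x1*x2", spurious})
```

With 100 runs of `fake_run`, only the stable terms `x0` and `x1*x2` pass the threshold, while each spurious product appears in only a small fraction of runs.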
The GP described above models the structure of the network in the following form:
However, given the number of data points relative to the number of parameters, a large portion of the model structure and parameters cannot be completely determined from the data alone.
Under such conditions, GP is sensitive to noise and to the random seed, and the generated programs are more likely to be overly
LMSGP
We have proposed a method for inferring a system of DEs from observed time series using GP together with the LMS method, and several experiments showed that it can successfully infer the causal model. More precisely, we succeeded in acquiring systems of DEs whose behavior is very close to the observed time series and in inferring the exact equation forms. The effectiveness of the LMS method and the superiority of our approach over the previous method were confirmed by comparative
Conclusion
Whereas conventional models approximate all reactions with a uniform function, biological systems consist of multiple functional regions, layers, and cell types, which behave differently in different environments. GP can model such systems with a general form of differential equation for each specific gene. These models can suggest useful hypotheses to guide further experiments.
Although we limited the function set to a minimum in our experiments, GP can include more functions such as
References
Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. (1996).
Coarse-grained reverse engineering of genetic regulatory networks. Biosystems (2000).
Exploring the new world of the genome with DNA microarrays. Nat. Genet. (1999).
A system to find genetic networks using weighted network model. Genome Inf. Ser. Workshop Genome Inform. (1999).
Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics (2000).
Linear modeling of genetic networks from experimental data. Proc. Int. Conf. Intell. Syst. Mol. Biol. (2000).
Rules for the evolution of gene circuitry. Pac. Symp. Biocomput. (1998).
A comparison of genetic network models. Pac. Symp. Biocomput. (2001).
S. Ando, H. Iba, The matrix modeling of gene regulatory networks-reverse engineering by genetic algorithms, in:...
Evolutionary modeling of systems of ordinary differential equations with genetic programming. Genetic Program. Evol. Mach. (2000).