Elsevier

Information Sciences

Volume 145, Issues 3–4, September 2002, Pages 237-259

Evolutionary modeling and inference of gene network

https://doi.org/10.1016/S0020-0255(02)00235-9

Abstract

This paper describes an Evolutionary Modeling (EM) approach to building a causal model, in the form of a system of differential equations, from time series data. The main target of the modeling is the gene regulatory network. Our work features a hybrid of Genetic Programming (GP) and statistical analysis: GP and the Least Mean Square (LMS) method are combined to identify a concise form of regulation between the variables from a given set of time series. The approach was evaluated on several real-world problems. Further, Monte Carlo analysis is applied to identify the robust and significant influences in the results for the purpose of gene network analysis.

Introduction

In this paper, we present an Evolutionary Modeling (EM) approach for identifying a causal model from observed time series data. Our hybrid EM method, which combines Genetic Programming (GP) with statistical analysis, is used to build a system of differential equations, i.e., a causal model, for gene regulatory networks as well as other practical real-world complex systems.

The inference of gene regulatory networks is an important and heavily studied topic in bioinformatics. In recent years, the amount and accuracy of available gene expression profiles have greatly increased thanks to advances in molecular engineering technology. The transcription of genes into mRNAs and their translation into proteins is termed the expression of the genes. The genes and their products interact with one another, resulting in a cascade of reactions within the cell. Together they form a complex network, and understanding its underlying structure and predicting its behavior is a challenging task for biologists. Accomplishing this task would be a major contribution to genome analysis.

Some of the problems we face in modeling these genetic networks are as follows. First, the enormous number of related genes makes the solution space very high-dimensional, with many local optima. Second, knowledge of mRNA reactions is very limited compared with that of protein reactions, so there is no standard kinetic framework to work with. Third, the significant noise in microarray observation data makes a numerical approach very difficult. Finally, since the structure of very few genetic networks is known, an acquired model has to be evaluated experimentally.

The rest of this paper is organized as follows. In Section 2, conventional models of gene networks are described. In Section 3, we describe the details of modeling by GP + LMS, i.e., how the GP and LMS methods are integrated in the course of evolution. Three examples are used to examine the effectiveness of our method; their experimental results and some discussion are given in Section 4. The details of perturbation analysis for application to large-scale gene network analysis are described in Section 6. The analysis of conventional model simulation and of microarray observations is shown in Sections 7.1 (Experiment with artificial data) and 7.3 (Microarray analysis). Discussion and conclusions are given in Section 8.

Section snippets

Conventional models of gene networks

The objective of genetic network modeling is to formulate the transition of gene expression, i.e., the concentration of products such as mRNAs, which are observed by cDNA microarrays [1]. These concentrations are the variables that define the behavior of the genes.

Most existing genetic network models parameterize the degree of effect between genes in a matrix, e.g., W_{i,j} (Eq. (1)) and g_{i,j} (Eq. (2)). The most abstract form of such a model is the Boolean network [2], which approximates the gene expression into

Genetic programming

Our method is based on EM, in which GP is used to derive an arbitrary form for the right-hand side of a differential equation. EM has been used to model sunspots, chemical reactions, population transitions, and other phenomena [8]. We use GP [9], an extension of the Genetic Algorithm [10] proposed by Koza, to generate a differential equation that represents a behavior model of a gene in a biological network. A GP individual is designed as a tree structure of mathematical operations and variables.
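As an illustration of the tree representation, the following minimal sketch evaluates the right-hand side of one differential equation from a tree of operations and variables. The tuple-based encoding, the `evaluate` function, and the example individual are assumptions made for illustration and are not taken from the paper; the operators are the +, −, × used in the experiments.

```python
import operator

# Operators used in the paper's experiments; the dictionary itself is illustrative.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, state):
    """Evaluate a GP tree at the current variable values.

    A node is a numeric constant, a variable name such as "X1",
    or a tuple (op, left, right) with op in FUNCTIONS.
    """
    if isinstance(node, tuple):
        op, left, right = node
        return FUNCTIONS[op](evaluate(left, state), evaluate(right, state))
    if isinstance(node, str):
        return state[node]
    return float(node)


# Hypothetical individual encoding dX1/dt = X1 * X2 - X1
tree = ("-", ("*", "X1", "X2"), "X1")
rate = evaluate(tree, {"X1": 2.0, "X2": 0.5})
print(rate)  # -1.0
```

In this encoding, one individual describes the time derivative of a single gene, so a full network model would hold one such tree per variable.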

Experimental results

We prepared three different tasks to test the effectiveness of our method. Experimental parameters are summarized in Tables 1 and 2. The function set $F$ and terminal set $T$ are defined as follows: $F=\{+,-,\times\}$, $T=\{X_1,\ldots,X_n,1\}$.
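To make the role of the function and terminal sets concrete, the sketch below grows random expression trees whose internal nodes come from F and whose leaves come from T. The grow procedure, the `p_terminal` probability, the depth limit, and the population size are illustrative assumptions; the paper's actual initialization settings are those in its parameter tables.

```python
import random

FUNCTION_SET = ["+", "-", "*"]  # F = {+, -, x}


def terminal_set(n_vars):
    """T = {X1, ..., Xn, 1}."""
    return [f"X{i}" for i in range(1, n_vars + 1)] + [1.0]


def random_tree(n_vars, max_depth, rng, p_terminal=0.3):
    """Grow a random tree; p_terminal is an illustrative choice."""
    if max_depth == 0 or rng.random() < p_terminal:
        return rng.choice(terminal_set(n_vars))
    op = rng.choice(FUNCTION_SET)
    left = random_tree(n_vars, max_depth - 1, rng, p_terminal)
    right = random_tree(n_vars, max_depth - 1, rng, p_terminal)
    return (op, left, right)


def depth(tree):
    """Number of operator levels above the deepest leaf."""
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0


def leaves(tree):
    """List all terminals appearing in the tree."""
    if isinstance(tree, tuple):
        return leaves(tree[1]) + leaves(tree[2])
    return [tree]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = [random_tree(n_vars=3, max_depth=4, rng=rng) for _ in range(5)]
for individual in population:
    print(depth(individual), individual)
```

Because F contains only addition, subtraction, and multiplication, every tree generated this way expands to a polynomial in the variables.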

Robustness

Although the preceding section shows that our approach is effective in acquiring an exact form that closely fits the observed target data, another factor must be considered: robustness. To test the robustness of our method against real-world noise, we conducted the E-cell experiment (i.e., Exp. 2) with noise-added data sets. Random noise of 5% and 10% was added to the target time series. The acquired time series are plotted together with the target data in Figs. 8 and 9.
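One way to prepare such noise-added data sets is sketched below. The noise model is an assumption: each sample is scaled by a factor drawn uniformly from within ±5% or ±10% of its value. The text here does not state which distribution was used, and the `add_noise` helper and the example series are hypothetical.

```python
import random


def add_noise(series, level, rng):
    """Scale each value by (1 + e), with e drawn uniformly from [-level, level].

    Assumed noise model for illustration only.
    """
    return [x * (1.0 + rng.uniform(-level, level)) for x in series]


rng = random.Random(42)  # fixed seed for repeatability
target = [1.0, 1.2, 1.5, 1.9, 2.4]  # hypothetical target time series
noisy_5 = add_noise(target, 0.05, rng)
noisy_10 = add_noise(target, 0.10, rng)
print(noisy_5)
print(noisy_10)
```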

Implementation of perturbation analysis

In our approach, the behavior of genes is modeled with a general differential equation model. The acquired models are evaluated by biologists and used to analyze gene expression and to suggest possible relations between genes. To give biologists insight, concise and explanatory programs are preferred. The biological presumption that these gene networks are sparse further encourages simple and concise expressions. Though GP + LMS reduces the known problems of bloat and introns, acquired

Indicating robust parameters

This section describes a Monte Carlo analysis for identifying the robust and relevant terms in complicated programs.

The GP described above models the structure of the network in the following form:

$$\frac{d}{dt}X_i=\sum_{k=1}^{m}\alpha_k\prod_{l=1}^{n}X_l^{g_l}.$$

However, given the number of data points relative to the number of parameters, a large portion of the model structure and parameters cannot be fully determined.
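For a fixed set of product terms, the coefficients α_k of this polynomial form enter linearly, so they can be estimated by least squares. The sketch below is an illustration under several assumptions: it solves the normal equations directly, the derivatives are taken as already known (in practice they would have to be estimated from the time series, for example by finite differences), and all function names and the synthetic data are hypothetical. It is not presented as the paper's LMS procedure.

```python
def monomial(exponents, state):
    """Product over l of X_l ** g_l for one sample of the variables."""
    value = 1.0
    for x, g in zip(state, exponents):
        value *= x ** g
    return value


def solve(a, b):
    """Solve a small dense system by Gauss-Jordan elimination with partial pivoting.

    Assumes the system is non-singular; no rank check is done in this sketch.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_coefficients(terms, states, derivatives):
    """Least-squares alpha from the normal equations (Phi^T Phi) alpha = Phi^T y."""
    phi = [[monomial(g, s) for g in terms] for s in states]
    k = len(terms)
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(phi, derivatives)) for i in range(k)]
    return solve(ata, aty)


# Synthetic check with a known model: dX1/dt = 2 * X1 * X2 - 0.5 * X1
terms = [(1, 1), (1, 0)]  # exponent vectors (g_1, g_2) for each term
states = [(1.0, 2.0), (2.0, 1.0), (3.0, 0.5), (0.5, 4.0)]
derivatives = [2.0 * x1 * x2 - 0.5 * x1 for x1, x2 in states]
alpha = fit_coefficients(terms, states, derivatives)
print(alpha)
```

When the number of samples is small relative to the number of terms, the normal equations become ill-conditioned or singular, which is one concrete way to see why parts of the model cannot be determined from the data alone.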

Under such conditions, GP is sensitive to noise and random seeds, and the generated programs are more likely to be overly

LMS+GP

We have proposed a method for inferring a system of DEs from observed time series using GP together with the LMS method. Several experiments showed that the method can successfully infer the causal model. More precisely, we succeeded in acquiring a system of DEs that closely fits the observed time series and in inferring the exact equation form. The effectiveness of the LMS method and the superiority of our approach over the previous method were confirmed by comparative

Conclusion

Though conventional models approximate all reactions with a uniform function, biological systems consist of multiple functional regions, layers, and cell types, which behave differently under different environments. Genetic programming can model such systems in a general form of differential equations for a specific gene. These models can suggest useful hypotheses to guide further experiments.

Though we limited the function set to a minimum in our experiments, GP can include more functions such as

References (29)

  • J. Felsenstein, Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods, Methods Enzymol. (1996)
  • M. Wahde et al., Coarse-grained reverse engineering of genetic regulatory networks, Biosystems (2000)
  • P.O. Brown et al., Exploring the new world of the genome with DNA microarrays, Nat. Genet. (1999)
  • T. Moriyama et al., A system to find genetic networks using weighted network model, Genome Inf. Ser. Workshop Genome Inform. (1999)
  • P. D’Haeseleer et al., Genetic network inference: from co-expression clustering to reverse engineering, Bioinformatics (2000)
  • E.P. van Someren et al., Linear modeling of genetic networks from experimental data, Proc. Int. Conf. Intell. Syst. Mol. Biol. (2000)
  • M.A. Savageau, Rules for the evolution of gene circuitry, Pac. Symp. Biocomput. (1998)
  • L.F. Wessels et al., A comparison of genetic network models, Pac. Symp. Biocomput. (2001)
  • S. Ando, H. Iba, The matrix modeling of gene regulatory networks-reverse engineering by genetic algorithms, in:...
  • H. Cao et al., Evolutionary modeling of systems of ordinary differential equations with genetic programming, Genetic Program. Evol. Mach. (2000)
  • J.R. Koza, Genetic programming III: Darwinian invention and problem solving, Morgan Kaufmann, San Francisco, 1999, John...
  • J.H. Holland, Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence (1975)
  • J.R. Koza, Genetic programming: on the programming of computers by means of natural selection, Complex adaptive systems (1992)
  • V. Babovic et al., Evolutionary algorithms approach to induction of differential equations
