Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Optimal parameter identification of synthetic gene networks using harmony search algorithm

  • Wei Zhang,

    Roles Conceptualization, Formal analysis, Validation, Visualization, Writing – original draft

    Affiliation Institute of Cyber-Systems and Control, Department of Control and Engineering, Zhejiang University, Hangzhou, China

  • Wenchao Li,

    Roles Software, Visualization

    Affiliation Institute of Cyber-Systems and Control, Department of Control and Engineering, Zhejiang University, Hangzhou, China

  • Jianming Zhang ,

    Roles Supervision

    jmzhang@iipc.zju.edu.cn

    Affiliation Institute of Cyber-Systems and Control, Department of Control and Engineering, Zhejiang University, Hangzhou, China

  • Ning Wang

    Roles Supervision

    Affiliation Institute of Cyber-Systems and Control, Department of Control and Engineering, Zhejiang University, Hangzhou, China

Abstract

Computational modeling of engineered gene circuits is an important while challenged task in systems biology. In order to describe and predict the response behaviors of genetic circuits using reliable model parameters, this paper applies an optimal experimental design(OED) method to obtain input signals. In order to obtain informative observations, this study focuses on maximizing Fisher information matrix(FIM)-based optimal criteria and to provide optimal inputs. Furthermore, this paper designs a two-stage optimization with the modified E-optimal criteria and applies harmony search(HS)-based OED algorithm to minimize estimation errors. The proposed optimal identification methodology involves estimation errors and the sample size to pursue a trade-off between estimation accuracy and measurement cost in modeling gene networks. The designed cost function takes two major factors into account, in which experimental costs are proportional to the number of time points. Experiments select two types of synthetic genetic networks to validate the effectiveness of the proposed HS-OED approach. Identification outcomes and analysis indicate the proposed HS-OED method outperforms two candidate OED approaches, with reduced computational effort.

Introduction

In synthetic biology, synthetic gene circuits and networks offer the opportunity to modify behaviors of cellular systems in a controllable and stable way. With well-designed modules or so-called biobricks, it has become feasible to design and build complex DNA circuits that can detect and trigger activities in cells [1]. One purpose of synthetic biology is to design and construct artificial biological systems with certain cellular functions [2, 3]. For instance, a DNA-based chemical oscillator has been constructed to accomplish molecular computing in the future [4]. This prototype DNA oscillator was programmed to generate repeating patterns. Synthetic gene circuits have also been applied for programming cellular functionality [5, 6]. Constructing gene networks involves optimization of biochemical parameters and network topologies. The SynNet method applied a two-step optimization strategy to find the global biochemical parameters [7]. Accurate predictive models are important in aiding the process of designing synthetic biological systems. In order to construct mathematical models, computational approaches are used to infer unknown model parameters using expression data. Emerging genomic data provide necessary information in modeling. The topics discussed in model identification include network structures, functional form of nonlinear dynamics and coefficients that represent reaction rates.

Computational modeling of genetic circuits is not only useful in modular construction of synthetic biological systems, but also is beneficial for exploring the gene regulations mechanisms underlying expression data. Gene circuit modeling has became a powerful tool for synthetic biologists. Multiple types of computational methods have been developed for modeling gene regulatory circuits [8, 9]. Mathematical models of genetic oscillators have been derived to investigate the dynamics of gene regulation [10]. Prior biological knowledge such as sparse interactions and network motifs also play crucial roles in identification [11]. Furthermore, sparse regression approaches have been applied to infer biological networks of nonlinear dynamics [12, 13]. Parameter estimation using single cell expression data will be another promising option. In this study, the fluorescence levels of the reporter protein are used to reflect the expression levels, leading to limited sample size [14]. Compared with experiment-based trail and error approaches, model-based computer aided techniques are able to find the best solution within a defined search space using validated models. However, both solutions are considered to lack efficient predictive capabilities.

Deterministic modeling has the advantage of relatively clear biophysical meanings and computational efficiency. However, current computational approaches and mathematical models are still lacking efficient ability to accurately capture and predict the expression dynamics [15]. In general, such computational modeling tasks are time-consuming, even for medium-size gene circuits [16]. Before identification, the model sets, computational complexity and experimental conditions should be taken into consideration. There are some practical problems, such as the number of experiments and measured time points to guarantee the plausibility and reliability of predictive models. The model complexity, partly reflected by the number of unknown kinetic parameters, directly influence the computational complexity. Some research works have noticed the cost in measuring the expression levels of mRNA and expressed proteins [17]. This analysis is qualitative to some extent. This study tries to analyze the measurement cost quantitatively and introduce a penalty in the cost function. Furthermore, a heuristic optimization is applied to solve the optimization problem.

In order to obtain predictive models of genetic circuits, the principle of maximum entropy was employed to build a minimal model with three constraints [18]. As protein synthesis, degradation and positive feedback have been taken into consideration in the modeling, feedback parameters can be yielded for circuit design. The difficulties in modeling of synthetic biological systems partly lie in structural and parametric uncertainty as well as the lack of efficient information. In order to obtain accurate kinetic parameters, increasing the number of experiments and measured data will be a candidate choice. Meanwhile, the quality of data should not be ignored since it directly influences the accuracy of modeling. Only with high-quality measurements, parameter estimation approaches are possible to obtain reliable parameters of biological networks. Global optimization and nonlinear least square methods are commonly used to infer possible model parameters. In most cases, established parameter estimation methods still have high level of uncertainty in dealing with biochemical networks. Under this circumstance, optimal identification through well designed perturbation experiments become a feasible and promising solution.

As researches always explore approaches to accomplish the model identification of synthetic gene networks in an ACAP(as accurate as possible) style, increasing the amount of measurement is a direct and feasible solution. This expensive and time-consuming solution has limited ability to increase information content of experiments. Accurate models need high-quality measurements, which depend on appropriately designed experiments. Relying only on prior experience is not enough to design input signals in modeling of synthetic gene circuits. In this way, optimal experimental design(OED) provides a feasible way to design input signals, including selection of sampling periods and the number of measured points. The basic principle of OED is to yield informative measurements, thus increasing accuracy of identification. As computational modeling is an importance part in computational systems biology, the topic of OED have been discussed to improve the accuracy of identified models [19]. Measurement noise as well as robust architecture of biological circuits are factors that influence the accuracy and reliability of predictive models. Thus, optimal experiment design is regarded as a powerful tool to minimize the number of experiments needed to infer biological parameters, thus reducing experimental cost. Dynamic models of regulatory networks can be denoted in the form of differential equations [20]. In modeling of biochemical network models, pros and cons of three optimal design approaches have been compared [21]. For computational modeling of biochemical networks, lack of efficient experimental data make it difficult to obtain reliable mathematical models. Under the framework of Bayes estimation, the OED is used to predict which experiments can infer accuracy model parameter distribution [22]. This method applied the k-Nearest Neighbor method to estimate the Jensen Shannon divergence between predictive densities of competing models. For gene network, experimental design is beneficial to reduce the uncertainty in network inference. To collect enough observations for network inference, the resource spent on perturbation experiments is considerable [23]. Well-designed perturbation experiments are necessary to obtain high-quality observations.

The problem of OED is usually converted to an optimization problem that involves the judgment of model quality, which is related the scalar functions of FIM [24]. Several factors such as measured time points, sampling time and efficient information contained in measurements have influence on the modeling quality of synthetic gene circuits. Given a cost function, screening search strategies can be applied to find the suitable input level for a given system. However, this brute search strategy may become ineffective due to huge space of viable points, especially for complex gene networks. Considering this limitation, the optimal input level can be determined in another search strategy. The core problem in optimal identification is to keep balance between estimation accuracy and measurement cost using a well designed cost function. This study applied a modified harmony search algorithm to solve OED problem in modeling gene circuits.

Under the framework of deterministic modeling, the paper proposes an two-stage identification method to obtain model parameters at a low experimental cost. In this method, the model quality is judged by the modified E-optimal criteria and the measurement cost is reflected the number of collected data points. Regarding two factors in computational modeling of gene networks, a cost function with two terms is constructed and minimized in the cost function. The first penalty term represents the derivation between predicted and measured output, while the second penalty term denotes the normalized experimental cost related with the number of collected time points. It is noted that selection of measured time points should consider the complexity of the given system, i.e. the number of unknown parameters. Traditional gradient-based methods hardly accomplish computation of the scalar function of information matrix. This paper applies a optimal identification based on harmony search algorithm to compute FIM-based modified E-optimal criteria as well as the parameter vector of ODE systems. Different with existing parameter estimation methods, the proposed method searches an optimal input level to get most informative observations in the outer loop. Experimental outcomes that involve two kinds of synthetic gene networks illustrate that the proposed optimal identification is able to achieve a tradeoff between estimation accuracy and measurement cost.

1 Model identification of gene networks

Under the framework of deterministic modeling, the modeling of gene regulatory networks can be converted to parameter estimation of ordinary differential equation(ODE)-based systems. The task of parameter estimation is to compute parameters of predictive models that predict the system behaviors by extracting information from measurements. There are inherent limitations when applying common estimation approaches in in biological networks. Firstly, empirical parameterizations of the functional form is usually difficult for complex biological networks. Secondly, estimating system parameters typically requires high-quality measured data that is supported by well-designed experiments. Correlation between model parameters is another negative factor that influence estimation accuracy [25]. Other factors such as sampling frequencies and regularization terms in regression also have impact on the performance of deterministic modeling [26].

1.1 Parameter estimation of gene networks

System identification approaches need the model set such as ordinary differential equations(ODE) or stochastic differential differential equations(SDE) to describe the expression behaviors for a given system. In this case, low-order approximation of objective function become a feasible solution. In order to describe the expression behaviors of gene networks, the ODE models are defined by Eq (1). (1) where x(t) denote the observed expression levels of genes, u(t) is the experimental input signal that can be artificially modified. Parameter vector p* can be estimated by minimizing the cost function that is related with prediction error. As a typical kind of biological networks, gene regulatory networks(GRN) consist of a series of biochemical reactions. Coding regions of DNA will be firstly transcribed as mRNA molecules, then translated to proteins. Expression levels of genes or transcription factors (TFs) are used to estimate unknown kinetic parameters. In deterministic modeling of gene networks, prediction errors between measurements and model parameters are defined by Eq (2). (2) where yi and denote the measured and predicted output at the measured time ti. In order to estimate the parameters in ODE models for gene networks, gene expression levels often play the role of information source. The operating principle of parameter estimation is mainly based on the minimization of deviation between model prediction and experimental measurements. This study selects the root-mean-square error(RMSE) index to reflect the degree of deviation, which is defined in Eq (3). (3) where n is the number of time points collected. In addition, the residual sum of squares(RSS) is also a measure of the discrepancy between the measured data and the model. As the quality of measurements influence the accuracy of estimation, effects of parameter variations are taken into consideration in the weighted residuals sum square error(wRSSE) index, that is defined in Eq (4). (4)

This wRSSE index reflects the degree of uncertainty in measured data to some degree. Gradient-based methods are used to search local minimums of the cost function [27]. According to gradient-based methods, the objective function is approximated by gradient vector, defined by Eq (5). (5) where pi(i = 1, …, np) are inferred components of the parameter vector. Unknown parameter vectors represented as individuals are operated by genetic operations including mutation and selection. Such individuals in the population of heuristic algorithms are regarded as candidate solutions. For instance, solutions are represented as strings of binary numbers in genetic algorithms. Evolutional strategies are relatively efficient in dealing with nonlinear ODEs. With the assumption of positive definite Hessian matrix, unique minimizer for the model can be found by solving the linear system. Thus the parameter vector for next iteration can be defined as Eq (6). (6)

Global optimization methods that minimize ϕ(p) over all possible values of p are hard to find, but local optimization approaches are feasible. To search the optimal parameter vector p*, evolutionary algorithms have been widely applied in the optimization problem.

1.2 Sensitivity analysis of gene networks

The initial purpose of sensitivity analysis is to find those parameters that influence system dynamics significantly. Sensitivity analysis also plays a special role in analyzing practical identifiability and uncertainty assessment. For the given system y = f(x, p, t), changes of systems states x(t) depend on changes of model parameters p around a reference point s(0), described by Eq (7). (7)

Regarding the parameter space, time-dependent linear approximation of parameter sensitivity behaviors can be captured by finite difference approximation. Using augmented dynamic ODE system, implicit computation of the sensitivity matrices for gene network are calculated as Eq (8). (8) Where Jx(t, p0) = ∂f(x, p, t)/∂x and Jp(t, p0) = ∂f(x, p, t)/∂p are Jacobian matrices evaluated at the parameter values p0. With the assumption of Gaussian approximation and Q = I, the Hessian matrix of the objective function ϕ(p) can be rewritten in terms of parameter sensitivities, denoted by Eq (9). (9) where S(p) denotes a matrix of time-dependent blocks sij(tk) = ∂xi(tk)/∂pj. Existence of linearly dependent columns of the sensitivity matrix S(p) will lead to non-unique solutions of p. That indicate part of model parameters are unidentifiable at a specific reference point p0 in parameter space. Correlations between two column vectors S.,i, S.,j of the sensitivity matrix S are described by Eq (10). (10) where σ(S.,i), σ(S.,j) represent the covariance between the i-th and j-th columns of S. For two linearly dependent columns, the correlation index is |corr(S.,i, S.,j)| = 1. Threshold for correlation between two model parameters is related with pairwise correlations.

1.3 FIM-based estimation accuracy analysis

After obtaining the optimal parameter vector , goodness-of-fit(GOF) and estimation accuracy will be analyzed in subsequent evaluation. As true kinetic parameters p* are unknown, it is crucial to judge the accuracy of estimated parameters and the degree of deviation. Assume fixed model structure, the observed data become mixed with the measurement noise, that is described by Eq (11). (11)

With the assumption of a normally distributed random variable on measurement error, the resulting residuals for each measurement are given as Eq (12). (12)

With the objective function , the measurement variance can be applied to construct weight matrix Q. Generally, there are two options in values of weight matrix Q. The choice of Q = I corresponds to equal weight for errors, regardless of measurement accuracies. The situation Q = C−1 indicates weighting based on the inverse of the measurement covariance matrix C. In the second situation, standard deviations of measurements appear on the diagonal positions. According to the principle of goodness-of-fit(GOF), statistics for ϕ(p) should follow χ2 distribution with r degrees of freedom, where r equals to the number of data points minus that of estimated parameters. With an assumption of norm distribution, i.e. XiN(μ, σ2), the average value is denoted as . Additionally, the sampling distribution is defined as Eq (13). (13)

This sampling distribution associated with the sample variance follows a χ2 distribution with N − 1 degrees of freedom, i.e. . If a new statistic Z is constructed as , then this statistic index implies the chi-squared distribution χ2(k), denoted by Eq (14). (14)

In computational modeling of gene networks, large-scale models and scarce experimental data will inevitably lead to limited observability of states as well as uncertainty in parameter estimation. Under the Gaussian-Markov assumption, the least squares estimator can be regarded as un-biased estimators, in which the variance is minimized. With the optimal parameter vector p*, the inverse matrix of variance for p* is called the information matrix.

After solving the time-dependent ODE equations, the sensitivity matrices are computed as at time points ti(i = 1, …, N). Under the Gaussian assumption, the Fisher information matrix can be calculated as Eq (15). (15)

Given input-output data, the FIM represents a measures of the information content with regard to the parameters θ in the model. If the measurements for cp(ti) and cs(ti) are independent, then the covariance matrix of measurements consist of diagonal elements, described by Eq (16). (16)

In Eq (16), crjj denotes the diagonal elements of the inverted FIM and represent the variance of measured state variables m at time point ti. Assume that Gaussian noise in the measured data, the FIM is given by Eq (17). (17) Where S(ti; p*) denote sensitivity matrix for p*, and represents the measurement covariance. The diagonal elements of covariance matrix C(ti) should be positive. Parameter uncertainty region can be further determined by eigenvectors and inverse eigenvalues of the FIM. In order to analysis estimation accuracy in deterministic modeling, the parameter uncertainty region is investigated using the information matrix F(p*). In general, small eigenvalues of F(p*) indicate large uncertainty. If the calculated F(p*) is non-singular, then the Crámer-Rao lower bound for the variances of estimated parameters can be determined according to Eq (18). (18) where are diagonal elements of information matrix. The framework of estimation accuracy analysis using FIM-based optimality is designed considering both model accuracy and measurement cost. Given measured expression levels of expressed proteins, the locally optimal kinetic parameters are inferred through curve fitting. Sensitivity matrices for model parameters are calculated to construct FIM.

2 Optimal identification of gene networks

Considering complex regulations between network components, it is beneficial to perform optimal experimental design to obtain measurements with maximized information content. Informative measurements are useful in enhancing the accuracy of parameter identification. Generally, experimental design can be performed based on statistic criterions that are closely related with covariance matrices of the selected models. The OED problem for gene network inference have been investigated in order to obtain mathematical models with high credibility [2830]. For simulation-based OED, the basic purpose is to minimize the uncertainty bounds on the estimated parameters [31, 32]. Computational approaches play a crucial role in reducing the uncertainty of structure identification and parameter estimation. This section will propose a novel framework that pursue a balance between estimation accuracy and measurement costs in modeling synthetic gene networks.

2.1 Basic framework of optimal experiment design(OED)

For deterministic modeling of synthetic gene circuits, major factors that influence the accuracy of estimation include but are not limited to information content of measurement, the number of time points, the sampling periods of output. There are several optimal principles that guide the process of optimal design. The D-optimality principle tries to minimize the covariance matrix of parameter estimate, which correspond to maximizing the determination of information matrix. The A-optimality seeks to minimize the trace of the inverse of the FIM. This A-optimality criterion results in minimizing the average variance of estimated regression coefficients. As for the commonly used D-optimality, this principle maximize the determinant of the FIM and results in maximizing the differential Shannon information content of the parameter estimation [33, 34]. The OED problem can be solved by maximizing measures of FIM, i.e. ζ* = argmaxΦ(FIM(θ, ζ)), where ζ is a function of input level and sampling periods etc. The measurement set selection design can be converted to an optimization problem , where λi is an integer weight that is either 0 or 1.

Another design principle is based on E-optimality, where the minimum eigenvalue of FIM is maximized. This study applies the modified E-optimal criteria as the quantitative index to evaluate the constructed mathematical models of gene networks. The modified E-norm of the FIM is defined as the ratio of the maximum eigenvalue of the FIM by its minimum eigenvalue [35, 36]. This modified E-optimal criteria is defined by Eq (19). (19) where λF represent the eigenvalues of information matrix , tf represents the duration time of perturbation experiments. Information matrix are calculated based on best available estimated parameter vector θ instead of true value θ*. This modified E-optimal criteria can be further used a useful index to determine the appropriate input level that is likely to provide highly informative measurements.

As the sample sizes of measurements are usually limited for regulatory networks due to experimental cost [37], this paper applies an optimization method that reduce experimental cost without loss of modeling accuracy. Both quality and experimental cost have been taken into consideration in the cost function. The proposed OED method search an appropriate sample size by solving the optimization problem, shown by Eq (20). (20) where nm and np represent the number of measured time points and parameters. In the proposed cost function, the first penalty term represents estimation accuracy, with a modified E-optimal criteria. The second penalty term corresponds to the normalized measurement cost, which is related with the number of measured time points. The second penalty term is motivated by the degree of freedom in χ2 distribution. Since the combination of two monotonic functions lead to a unimodal function, the trajectory of cost function is expected to firstly decline and then increase after reaching the minimal point.

2.2 Harmony search-based OED

As a kind of nature-inspired optimization, harmony search(HS) algorithm shares certain characters with other evolution strategies algorithms such as genetic algorithms. The basic idea of harmony search algorithm is to find a vector x that minimizes a given objective function [38]. Standard HS algorithm begins with generating HMS random vectors and stores these vectors in harmony memory denoted by HM. New candidate solution vectors xnew are generated to replace the worse vector in HM. As initial algorithm parameters, global variables including the number of variables v, the maximum of iterations uIter, pIter, harmony memory size HMS and harmony consideration rate HMCR are defined. The pseudo code of HS-based OED is described by Algorithm 1.

There are two iteration loops in the diagram, where the outer loop is responsible to find the best input level u* and the inner loop update the best model parameter estimation p*. In Algorithm 1, lb, ub represent the lower and upper bound for the values of HMu, HMp. The cfun1, cfun2 represent the cost function of inner and outer iteration loop respectively.

Algorithm 1: Pseudo code of HS-based optimal experimental design algorithm.

Input: Choose input u0 based on prior knowledge.;

1 Initialization of HS-OED algorithm: HMu,HMp,PAR,HMCRu,HMCRp,NV AR;

2 Estimate parameter vector p0 using u0;

3 for i = 1 to uIter do

4  initialize HMu;

5  HMu(1,:) = ;

6  HMu(2:end,:) = lb+(ub-lb)*rand();

7  Compute cfun1(u, p) of every u in HMu and load them into fitness array;

8  Compare fitness array to find best input;

9  Update HMu and fitness array;

10  Update as best input;

11  for j = 1 to pIter do

12   initialize HMu;

13   HMp(1,:) = ;

14   HMp(2:end,:) = lb+(ub-lb)*rand();

15   Compute cfun2(u, p) for every parameter in HMp and load to fit2;

16   Compare fit2 array to determine the best parameter vector p*;

17   Update HMp and fit2 array;

18   Update as the current best parameter vector;

19  end

20 end

Output: Optimal input level u*, parameter vector p*;

In the HS-based OED framework, ubest and are used to find better parameter vector pbest. The proposed HS-based OED determines the optimal input level u* in the outer loop and computes the model parameter p* in the inner loop. When the inner iteration obtains a candidate parameter vector pc, it is necessary to compare the value of cfun1(u, pc) with that of current best parameter vector p*. When the condition cfun1(u, pc) < cfun1(u, p*) is satisfied, the model parameter vector will be updated as p* = pc. In this way, the second stage of iteration optimization begins searching the optimal parameter vector u* with optimal input level p*. This two stage optimization framework is described by Fig 1.

thumbnail
Fig 1. The flow chart of harmony search-based OED using a double nested loops.

The inner loop updates the parameter vector based on an initial model parameter vector and the outer loop aims to determine the optimal input level u* by minimizing the cost function.

https://doi.org/10.1371/journal.pone.0213977.g001

When brute screening strategy is applied to find a suitable input level, the screening sizes and steps should be carefully selected. Such screening strategy is relatively inefficient and restricted to a fixed combination of possible input levels. Compared with the screening search strategy, this study applies random search strategy which is able to find an flexible input level rather than selecting one fixed input value. Another advantage of this random search strategy is that the optimal parameter vector of a given system is estimated at the same time.

3 Experimental outcomes and analysis

The main purpose of deterministic modeling based OED is to offer an suitable input signal u* for two types of genetic circuits. Kinetic model parameters are estimated and evaluated from the view of error indexes. Two types of engineered gene circuits with mathematical models are selected as benchmarks [39]. Each system is described by a set of ODEs that describe biochemical reactions governing protein species. Parameter estimation for synthetic gene networks are performed in two stages: initial estimation and parameter refinement. Initial parameter vectors are computed based on prior information and biological experience such as the response time of genetic circuits. The refinement pursues parameters with higher accurateness with the initial parameter vector.

3.1 Unbuffered synthetic gene networks

In the unbuffered system, the chemical species SKN7m activates expression of green fluorescence protein(GFP) from the synthetic promoter PTRSSRE. With the small molecule doxorubicin DOX as input, the reactions considered in the unbuffered system include the DOX activated production of SKN7m, the binding and unbinding of SKN7m, the activated production of reporter protein(GFP) and degradation of all species. Define Cm as the complex formed between SKN7m, the simplified reactions are described as followings: (21) where δc denotes the degradation rate of Cm. Established modeling approaches choose periodic inputs to maximize the steady state peak-peak amplitude percent error between the loaded and unloaded trajectory of output protein GFP. Furthermore, multiple types of inputs including two step and three square inputs are used to induce the response of unbuffered gene networks and to provide input-output data for parameter identification. Error indexes of wRSSE and RMSE under different types of input signals are computed in Table 1.

thumbnail
Table 1. Error indexes of kon, koff in modeling unbuffered gene network.

The wRSSE and RMSE indexes are selected to judge the performance of parameter estimation. The title Step i(i = 1, 2) represent the single and double inputs with the same concentration 20 μM. The columns T100, T200, T400 denote the periodic inputs with periods T1 = 100min,T2 = 200min and T4 = 400min.

https://doi.org/10.1371/journal.pone.0213977.t001

In the parameter setting part of harmony search algorithm, the number of variables is settled as np = 3, the harmony memory size HMS = 15, the harmony consideration rate HMCR = 0.2. Matrices including HMu, fitu are initialized to store relevant indexes, where fitu represents a fitness vector related with optimal input. The first step of optimal identification is to compute the parameter vector using measured input-output data. With the computed vector p, response curves of unbuffered gene networks using four types of inputs including single step up and three periodic inputs are fitted, shown in Fig 2.

thumbnail
Fig 2. Curve fitting of unbuffered gene circuit using four types of input signals.

The periods of first three square inputs are 100,200 and 400 mins respectively. In order to get stable response curve, the simulation time is set as 3000 mins.

https://doi.org/10.1371/journal.pone.0213977.g002

In Fig 2, the oscillatory periods for three square inputs are 100, 200 and 400 mins. Fitted responses of the unbuffered system are basically consistent with the measured output trajectories. This indicates that estimated parameters kon and koff are able to capture part of response dynamics under these circumstance.

Further estimation take all 9 kinetic parameters into consideration. Since incoherent noise may have negative influence on parameter estimation, parameter estimation is performed in two groups: simulation(S) and experimental(E) group. Deterministic modeling of unbuffered systems with 9 kinetic parameters are performed using a step input and four periodic inputs. Similarly, the weighted RSSE(wRSSE) and RMSE indexes under different types of input signals are computed and compared in Table 2.

thumbnail
Table 2. Error indexes of 9 kinetic parameters in modeling unbuffered gene network.

The wRSSE and RMSE indexes are selected to evaluate the performance of parameter estimation. The column denoted Step 1 represents the single and double inputs with the same concentration 20 μM. The columns T150, T200, T250 and T500 denote the periods of square inputs are 150, 200, 250 and 500mins. The symbol # indicates the shortage of relevant tests.

https://doi.org/10.1371/journal.pone.0213977.t002

Since the step test in experimental group provides low-quality measurement, the realistic group only discusses the errors using periodic inputs. Both wRSSE and RMSE indexes fluctuate depending on selection of input signals. This phenomenon indicates that selection of input signals will influence the deviation of estimated parameters from the true values. One of possible reasons is that degrees of perturbation posed to the system are in different levels. Among four square inputs, error indexes obtained by Square 4 that has the least number of measured data points are highest in two groups. During deterministic modeling, the values of wRSSE indexes are generally lower than that of RMSE. To evaluate the effectiveness of estimated parameters, the quantile-quantile(QQ) plots are introduced. When a set of observations are approximately normally distributed, a normal QQ-plot of the observations will lead to an approximately straight line. In this study, QQ-plot is used to compare the estimation error with normal distribution. In numerical simulation, Step 2 represent double step inputs, while square inputs denoted by Square 1,2,4 correspond to the periods of 100, 200 and 400(mins). To visualize the accuracy of estimated model parameters, the quantile-quantile(QQ) plot of Step 2 and Square 1 inputs are compared in Fig 3.

thumbnail
Fig 3. The quantile-quantile(QQ) plots for estimated parameters of unbuffered gene networks using step and periodic inputs.

These QQ plots for unbuffered systems show similar patterns observed in error indexes. Squared inputs are likely to reduce the deviation between estimated parameters from the true values.

https://doi.org/10.1371/journal.pone.0213977.g003

The QQ-plots of estimated parameters for the unbuffered system reflects the deviation of estimated errors against that of standard normal quantile. In Fig 3, the deviation under Square 1 input is smaller than that of other three input signals. Subsequent sensitivity analysis computes the variance for estimated parameters, especially key kinetic parameters. Sensitivity analysis aims to detect those parameters that have significant influence on the response behaviors of a given system, depicted in Fig 4.

thumbnail
Fig 4. Sensitivity analysis of unbuffered gene circuits.

The kinetic parameter kon has negative sensitivity while the degradation rate koff has positive sensitivity. The values of sensitivity matrices exhibit periodic changes.

https://doi.org/10.1371/journal.pone.0213977.g004

In Fig 4, the parameter kon has negative sensitivity while koff has positive sensitivity. That means changes in the value of kinetic parameters kon, koff, have opposite impacts on the response of GFP protein. The quantitative influence of kon is slightly higher than that of koff. With the computed Fisher information matrix, the Crámer-Rao lower bound(CRLB) for kon and koff are computed as 0.687 and 0.0415 respectively, illustrating that parameter uncertainty for production rate kon is higher than that of the degradation rate koff.

Another important task of optimal identification is to provide a low-cost solution without much loss of model accurateness. Measurement cost, which is reflected in the number of inferred parameters, is expected to be high for complex systems. When the number of measured points go beyond a specific level, the benefit brought by increasing the sample size become limited while the cost keeps increasing. The optimal identification approach uses a modified harmony search(HS) method to select a suitable sample size. Selection of measured output points nm, should also consider the complexity of a given system, that is related with the number of unknown parameters np. When the value of nm exceeds a specific level, the benefit of increasing the amount of measured time points in enhancing the modeling accuracy will be limited.

3.2 Buffered synthetic gene networks

In the buffered gene networks, an additional driver module was introduced in the buffered gene networks to eliminate the retroactivity and to improve the response dynamics of regulatory systems. Similar with the unbuffered system, the small molecule DOX induces expression of protein STAT5 − HKRR and activator of transcription 5(STAT5). Buffered genetic circuits accomplish cellular functions based on a series of biochemical reactions, which include DOX activated production of STAT5-HKRR from promoter, SKN7 activated production of reporter protein. Phosphorylation of STAT5-HKRR and YPD1/SKN7 phosphotransfer reactions can be described as the followings: (22) where SKN7** denotes doubly phosphorylated SKN7, which activates expression of reporter GFP from the synthetic promoter PTRSSRE and also binds plasmid-encoded load sites. In unbuffered systems, SKN7m binds promoter directly, while SKN7 in this buffered system needs activation by a series of phosphotransfer reactions. Input concentrations of small molecule DOX will change the response behaviors of genetic circuits. When the concentration of step signal increases, the output level of reporter protein GFP increase as a consequence, shown in Fig 5.

thumbnail
Fig 5. The input-output relations between input DOX and output GFP for buffered genetic circuit.

Optimal experiment design requires a suitable input level to stimulate the system behaviors. In the numerical simulation, the variation range of input DOX concentration is selected as [10−3,102]μM. The output-input ratio for buffered gene circuit decreases as the concentrations of input DOX increases, and remains stable a level of 0.0195.

https://doi.org/10.1371/journal.pone.0213977.g005

In Fig 5, the value of output-input ratio that can be also regarded as the gain of gene circuits, changes in the interval [1.95 × 10−2, 350]. Low-level magnitude of input signal contain limited energy to perturb the system dynamics while high level of input push the system to a saturation condition. When the magnitude of input exceeds a threshold, the expression level of reporter protein declines to increase and keeps a stable level. It is important to choose a suitable level of input for subsequent parameter refinement. Instead of grid search, a novel two-step search strategy is applied to determine the optimal level of input. After double nested iterations using harmony search(HS) algorithm, the optimal level of input is selected as 28.6 μM and the model parameter p* is inferred. Using this optimal input level, the comparison of predicted and measured output of buffered gene network using estimated parameter p* are shown in Fig 6.

thumbnail
Fig 6. Curve fitting for output of buffered gene networks.

In order to validate the effectiveness of estimated parameters, an additional step up signal is introduced to stimulate the response behavior of buffered systems.

https://doi.org/10.1371/journal.pone.0213977.g006

In the first step, initial model parameter vector p0 in ODEs of buffered gene networks are computed by minimizing the error index between the predicted and measured output. From Fig 6, the trajectories of predicted output match the measured output, revealing that initial parameter vector capture output dynamics to some extent. Multiple squared inputs are used to validate the effectiveness of parameters under various experimental conditions. The magnitude of testing inputs is set as 20μM and the oscillatory periods of square waves are selected as 100,200,300,400,500 (min) respectively.

In addition, response behaviors of the buffered system are also influenced by adding load plasmids that are regarded as genetic loads. In numerical simulations, three levels of genetic loads are introduced to evaluate the robustness of inferred model parameters. The load variants are encoded on high-copy 2μ yeast plasmids, with unloaded(model+0x), single loaded(model+x) and double loaded(model+2x) additional copies of PTRSSRE [40]. Such genetic loads lead to reversible binding reactions and slow down the increase of free SKN7m that activates the expression of reporter protein GFP. Under these circumstances, the predicted trajectories of expressed GFP are compared with that of the measured output, shown in Fig 7.

thumbnail
Fig 7. Comparison of predicted output trajectories and measured output GFP concentrations under three square inputs.

The periods for three square inputs are 100,300,500 mins. The legends pt0, pt1 and pt2 correspond to the experimental conditions of unloaded(model+0x), single loaded(model+x) and double loaded(model+2x).

https://doi.org/10.1371/journal.pone.0213977.g007

According to the principle of persistent excitation, square input signals are considered to excite the system dynamics more sufficiently than step signals. It is observed from Fig 7 that response curves under the conditions of model+0x, model+1x and model+2x are approximately the same trajectories. Under three levels of loads, the response curves of buffered system show limited attenuation. In the part of parameter refinement, estimated parameter vector p* reflects the desired behavior of buffered systems. Subsequent analysis will evaluate the accuracy of parameter vector numerically.

Under regulation of multiple periodic inputs, the response curves of buffered gene networks exhibit oscillatory behaviors with different periods. With measured input and output data, unknown parameters of the ODE model can be acquired by estimation algorithms. After obtaining the parameter vector p* using simulated and experimental datasets, two kinds of error indexes including wRSSE and RMSE indexes are calculated. Compared with single step signal, double input signal is introduced in the simulation group. In simulation(S) and experimental(E) group, two step inputs and four square inputs are introduced to perturb system dynamics. Signal periods Square i(i = 12, 3, 4) in the simulation group are selected as 150,200,250 and 500 minutes, that are consistent with the settings of inputs in realistic experiments. Two kinds of error indexes are computed and depicted in Table 3.

thumbnail
Table 3. Performance evaluation indexes of deterministic modeling for the buffered gene network.

The columns T150, T200, T250 and T500 denote the periods of square inputs are 150, 200, 250 and 500 minutes. The symbol # indicates the lack of measured data with efficient quality.

https://doi.org/10.1371/journal.pone.0213977.t003

It is observed from Table 3 that double input signal Step 2 is able to obtain relatively low error indexes, than that obtained by single step input Step 1 in simulation. This phenomenon indicates that double step input can improve the estimation accuracy by increasing the degree of perturbation. Meanwhile, error indexes of the buffered system are slightly higher than that of unbuffered system, reflecting robust of the proposed estimation algorithm. Among four square inputs, the periodic input T2 = 200min obtains the lowest estimation error, showing that oscillatory periods of square inputs is another potential factor that influence the accuracy of deterministic modeling.

For this buffered network, sensitivity analysis is performed to detect key parameters that have significant influence on the output behavior. Among 23 model parameters in the buffered system, this study selects 18 parameters and computes the corresponding sensitivity matrices. For buffered gene networks, the trajectories of sensitivity matrices of 18 kinetic parameters that change over time are depicted in Fig 8.

thumbnail
Fig 8. Sensitivity analysis of model parameters for buffered gene network.

Among these 23 kinetic parameters, sensitivity matrices of 18 model parameters are plotted to reflect the various levels of influence.

https://doi.org/10.1371/journal.pone.0213977.g008

From basic sensitivity analysis, kinetic parameters δz, k6, kp are considered kinetic parameters with highest level of sensitivity. Chemical reactions involved in genetic networks can be reversible. Each reaction has two parameters related with production rates. Perturbations of production rates are more likely to alter the output behaviors of genetic circuits. It is noted that δz, k6 have negative sensitivity while kp has positive sensitivity. Subsequent analysis focus on quantitative impact of those kinetic parameters with high sensitivity levels on the system output. Based on trajectories of sensitivity matrices in Fig 8, parameters kp and k6 have positive and negative sensitivity respectively. Furthermore, ±50% numerical perturbations have been introduced to kinetic parameters kp, kpp, k6 and k7 that have different degrees of influence on the system output. Quantitative influences on the system output GFP are analyzed in three-dimensions, shown in Fig 9.

thumbnail
Fig 9. Quantitative influence of kinetic parameters on system output.

Kinetic parameters with various sensitivity levels have different degree of influence on system output, which correspond to the expressed level of reporter protein GFP in the synthetic genetic network.

https://doi.org/10.1371/journal.pone.0213977.g009

Influence of kinetic parameters with various sensitivity levels are analyzed. Parameters including kp and k6 with high sensitivity have relatively strong impact on the system output. In Fig 9(a), parameters kp and k6 have exhibited opposite impacts on the system output, i.e. the expressed GFP concentration. When the value of kp increases, the expressed GFP level is promoted as a consequence. It is observed from Fig 9(b) that variation of kp has significant impact on output behaviors while k7 has a limited power to change expression behaviors of reporter protein GFP.

In order to compute the modified-E optimal criteria and select the optimal input magnitude, the proposed two-stage to calculate the best input level u* and the parameter vector p* in double nested loops. In the outer iteration loop, the HS algorithm parameters are settled as uIter = 3000, HMS = 15, HMSu = 10, HMCR = 0.8. According to the previous analysis of input-response relations, the feasible interval for input level is selected as [1,30]μM. In the second stage of optimization, the maximum number of iterations in the inner loop is pIter = 5000. Other parameters are the same with that in outer loop. The modified E-optimality criteria and cost function cfun1(u, p*) are minimized in the outer loop. In Fig 10, the trajectories of modified E-optimality criteria under five inputs monotonically decreases and converges to a domain around zero.

thumbnail
Fig 10. Trajectories of the modified E-optimality criteria using HS-OED algorithm.

In the outer loop, the value of modified-E optimal criteria decreases as the value of fitness function is minimized by the proposed HS-OED method.

https://doi.org/10.1371/journal.pone.0213977.g010

During the first stage of optimization, the error value reaches the minimum when the iteration number exceeds 2000. This study focuses on keeping a balance between experimental cost and estimation accuracy in identification of synthetic gene networks. After initial curve fitting and parameter refinement, the optimal parameter vector for the buffered system is estimated using the proposed two-stage optimization approach.

In the two step optimization, the outer loop determines the optimal input while the inner loop applies the designed input to estimate parameter vector. In order to compare the proposed HS-based OED with other approaches, various optimization are compared in the inner iteration loop, where the fitness function denote estimation error. Heuristic optimization approaches including particle swarm optimization(PSO) and genetic algorithm(GA) are able to perform the task of optimal experimental design with suitable fitness functions [41, 42]. Under the framework of deterministic modeling, GA and PSO algorithms have been applied to minimize the fitness function value, thus estimating optimal parameter vectors. For GA algorithm, the population size is 50, the mutation and crossover rate are settled as 0.8 and 0.2, which correspond to HMCR and PAR in HS algorithm. For PSO algorithm, the weight is 1 and coefficients are c1 = c2 = 1. Trajectories of fitness function provided by GA,PSO and HS-OED algorithm are compared by Fig 11.

thumbnail
Fig 11. Fitness trajectories of harmony search denoted by HS-OED and two heuristic optimization approaches in the inner iteration loop of OED.

The HS-OED algorithm has reduced computational time than GA and PSO algorithm, and obtaining lower output errors in estimating parameter vectors.

https://doi.org/10.1371/journal.pone.0213977.g011

In Fig 11, the proposed HS-OED algorithm has shown similar convergence patterns compared with GA and PSO optimization approaches. Standard PSO method usually suffers from premature convergence problem due to loss of diversity in solution search. Experiments of optimal experimental design are performed by the MATLAB software on a PC with Intel i5-3320M and 8GB RAM. After four replicates, the average computation time of 3000 iterations for HS-OED algorithm is 428 seconds. While average computational time of GA and PSO-based OED for 3000 iterations are 3137 and 7378 seconds. In addition, the average best fitness value of HS-OED is computed as 0.0332, which is significantly lower than 0.207 and 0.1873 obtained by GA and PSO-based OED respectively. In this case, HS-OED algorithm provides a feasible and efficient solution to improve optimization ability.

To evaluate the performance of these optimal identification methods, the proposed HS-OED algorithm is further compared with other two methods denoted by PSO-OED and GA-OED in accuracy evaluation. After parameter estimation, estimation accuracy is analyzed based on the value of Crámer-Rao lower bound(CRLB). In estimation analysis, the CRLB values of kinetic parameters ki(i = 1,…,5) are compared in Table 4.

thumbnail
Table 4. Comparison of lower bounds in estimating kinetic parameters ki obtained by HS-OED algorithm and GA,PSO-based OED methods.

These indexes are obtained in numerical simulation.

https://doi.org/10.1371/journal.pone.0213977.t004

Those parameters with higher CRLB values are considered to have higher degree of uncertainty during modeling. Accuracy indexes computed by HS-OED algorithm are significantly lower than that of other two OED methods. And GA-OED has provided more superior performance than PSO-OED. Using experimentally measured step response data, the CRLB for estimated parameters k1, k2 are computed as 1.486 and 0.185, which are higher than that of kon, koff in the unbuffered system. For the buffered system, the CRLB for production rates of k3, k4 are computed as 1.432 and 0.375. The kinetic parameter kg has the highest CRLB value of 2.2767, indicating that it has the highest level of uncertainty. With the purpose of illustrating the advantage of optimal input u*, the values of CRLB for five parameters under u = 5μM, 10μM, 20μM and u* are compared in Fig 12.

thumbnail
Fig 12. Crámer-Rao lower bound(CRLB) of kinetic parameters k1, k2, k3, k4 and k5 of the buffered system.

Compared with three input levels i.e. u = 5μM, 10μM, 20μM, the optimal input u* is able to reduce the values of CRLB for specific parameters.

https://doi.org/10.1371/journal.pone.0213977.g012

The number of measured time points nm is another crucial factor that influence the modeling quality. After reaching a specific threshold, the improvement of model accuracy brought by increased nm become limited. Meanwhile, the variance of parameter estimation decline to decrease and stays a level that is beyond that of CRLB. The optimal identification method tries to control the experimental cost without loss of much modeling quality. After simulation experiments and comparison, the penalty coefficient λ in cost function is tuned for specific gene networks to achieve a tradeoff. To find the suitable number of measurements, the experiments apply nm as the independent variable and calculate the value of cost function that has two penalty terms. Trajectories of cost function are recorded with the increasing iterations, shown in Fig 13.

thumbnail
Fig 13. Cost trajectories of five periodic inputs for buffered system.

Five trajectories of cost function converge to the point N = 50, indicating the amount of measurements meet the requirement of modeling.

https://doi.org/10.1371/journal.pone.0213977.g013

The step is set as 10 and the penalty coefficient λ for measurement cost is suggested as 0.8. It is observed from Fig 13 that five trajectories of cost function meet together at the point which correspond to nm = 50 measured time points. This mean that the sample size nm = 50 can be regarded as a suitable number of measured time points for buffered system. Considering the limited number of time points single experiment, several perturbation experiments under different conditions can be performed to collect enough number of measurements.

As visualizing the trajectories of cost function provide limited information about optimal identification, the three-dimensional surface depicts a broader view of this process, shown in Fig 14.

thumbnail
Fig 14. The three-dimensional surface of designed cost function under five square inputs.

The optimal identification algorithm picks nm = 50 as the suitable number of measured time points.

https://doi.org/10.1371/journal.pone.0213977.g014

The measurement cost is reflected by the number of observed data points nm which is proportional to resources spent in experiments. The optimal identification method increases the number of measured time points at a step of 10. In order to reach the same normalized accuracy level, which can be denoted by CRLB/nm, the necessary number of measured points in buffered system is approximately 67% higher than that in unbuffered system. since nm and np equal to 50 and 23 in two systems. For buffered gene circuit, the value of penalty term that represents the measurement cost (nmnp)/nm equal to 0.54, which multiplies a penalty coefficient λ. The proposed optimal identification is effective to pursue a balance between two factors in deterministic modeling of gene circuits.

Experimental outcomes with two synthetic gene circuits indicate the parameter refinement is able to improve the model quality as well as to reduce the uncertainty. Different with traditional single optimality criteria, the propose optimal identification suggest not to increase the measured points of output when the profit of increasing measurements is not significant. Computational burden of this nested iteration framework is still huge, especially for complex synthetic gene networks. Incorporating biological prior knowledge in the constraints conditions will be beneficial to reduce the viable region.

4 Conclusion

In order to guide the modular construction of synthetic gene networks, a novel optimal identification method that provides accurate predictive models at a low experimental cost is proposed. For synthetic gene networks, optimal experiment design(OED) is feasible and effective to design an appropriate input level through maximizing information content related optimal criteria. In this case, this paper proposes a harmony search-based OED(HS-OED) approach and designs a two-stage optimization. Optimal input signals and parameter vectors are computed by this two-stage optimization framework. The first contribution of this paper is to apply harmony search strategy instead of brute screening search to improve the accuracy and efficiency in searching optimal inputs. Secondly, the designed cost function can be used to select an appropriate number of data points collected for computational modeling of gene circuits. Simulation and experimental analysis indicate the proposed HS-OED method can obtain accurate model parameters than established heuristic algorithms based OED method, with enhanced estimation accuracy. Furthermore, the amount of measured data points are reduced to offer a low-cost identification solution for synthetic gene networks.

Appendix

In the parameter estimation experiments, kinetic parameters of two synthetic gene networks are computed based on the mechanistic mathematical models. The ODE model for unbuffered system consists of 3 equations, shown as follows. where Xm and denote abundance of SKN7m and reporter protein GFP respectively, n1, n2 are Hill coefficients. The ODE system for buffered system consists of 10 equations and 23 kinetic parameters. Assume Z denotes STAT5-HKRR fusion, W, X represent YPD1 and SKN7, is the reporter protein GFP. Thus the ODE system for buffered genetic system can be described as follows: where G represents the expression level of reporter protein GFP, parameters km, kg are maximum activated protein production rates, Kdox, Kgfp are respective Kd for Hill equations, kw, kx are production rates for YPD1 and SKN7 respectively. Kinetic parameters ki(i = 1, 2, 3, 4) are constrained to 1-50 [μM]−1. Based on prior knowledge, the conditions k1k3, k2k4 are used as constraints during computational modeling. Retroactivity between modules are denoted by r1, r2 which will attenuate the response behaviors of synthetic gene networks.

Acknowledgments

We thank Prof Guang Li and Wuming Zhang for the instructive discussions and suggestions.

References

  1. 1. Yamazaki K.I., de Mora K. and Saitoh K. BioBrick-based ‘Quick Gene Assembly’in vitro. Synthetic Biology. 2017,2(1): p.ysx003.
  2. 2. Xie M., Fussenegger M. Designing cell function: assembly of synthetic gene circuits for cell biology applications. Nature Reviews Molecular Cell Biology. 2018, 1(19):507–525.
  3. 3. Segall-Shapiro T. H., Sontag E. D., Voigt C. A. Engineered promoters enable constant gene expression at any copy number in bacteria. Nature biotechnology. 2018, 36(4):352. pmid:29553576
  4. 4. Srinivas N, Parkin J, Seelig G, Winfree E., Soloveichik D. Enzyme-free nucleic acid dynamical systems. Science. 2017, 358(6369):eaal2052. pmid:29242317
  5. 5. Huynh L, Tagkopoulos I. Optimal part and module selection for synthetic gene circuit design automation. ACS Synthetic Biology. 2014, 3(8):556. pmid:24933033
  6. 6. Carbonell-Ballestero M., García-Ramallo E., Montañez R., Rodríguez-Caso C. and Macía J. Dealing with the genetic load in bacterial synthetic biology circuits: convergences with the Ohm’s law. Nucleic Acids Research. 2016, 44(1):496–507. pmid:26656950
  7. 7. Mohammadi P., Beerenwinkel N., Benenson Y. Automated design of synthetic cell classifier circuits using a two-step optimization strategy. Cell systems. 2017, 4(2):207–218. pmid:28189580
  8. 8. Le Novere N. Quantitative and logic modelling of molecular and gene networks. Nature Reviews Genetics. 2015, 16(3):146.
  9. 9. Huang B., Jia D., Feng J., Levine H., Onuchic J.N., Lu M. RACIPE: a computational tool for modeling gene regulatory circuits using randomization. BMC Systems Biology. 2018, 12(1):74–85. pmid:29914482
  10. 10. O’Brien E.L., Van Itallie E., Bennett M.R. Modeling synthetic gene oscillators. Mathematical Biosciences, 2012, 236(1):1–15. pmid:22266166
  11. 11. Alon U. Network motifs: theory and experimental approaches. Nature Reviews Genetics. 2007, 8(6):450–461. pmid:17510665
  12. 12. Mangan N.M., Brunton S.L., Proctor J.L. Kutz J.N. Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Transactions on Molecular, Biological and Multi-Scale Communications. 2016, 2(1):52–63.
  13. 13. Pirgazi J., Khanteymoori A. R. A robust gene regulatory network inference method base on Kalman filter and linear regression. PloS one. 2018, 13(7):e0200094. pmid:30001352
  14. 14. Llamosi A., Gonzalez-Vargas A.M., Versari C., Cinquemani E., Ferrari-Trecate G., Hersen P, et al. What population reveals about individual cell identity: single-cell parameter estimation of models of gene expression in yeast. PLoS computational biology. 2016, 12(2):e1004706. pmid:26859137
  15. 15. Marchisio M.A., Stelling J. Computational design of synthetic gene circuits with composable parts. Bioinformatics. 2008, 24(17):1903–1910. pmid:18579565
  16. 16. Liao C., Blanchard A.E., Lu T. An integrative circuit-host modelling framework for predicting synthetic gene network behaviours. Nature Microbiology. 2017, 2(12):1658–1666. pmid:28947816
  17. 17. Meyer P., Cokelaer T., Chandran D., Kim K.H., Loh P.R., Tucker G., et al. Network topology and parameter estimation: from experimental design methods to gene regulatory network kinetics using a community based approach. BMC Systems Biology. 2014, 8(1):13. pmid:24507381
  18. 18. Firman T., Balázsi G., Ghosh K. Building predictive models of genetic circuits using the principle of maximum caliber. Biophysical journal. 2017, 113(9):2121–2130. pmid:29117534
  19. 19. Faller D., Klingmüller U., Timmer J. Simulation methods for optimal experimental design in systems biology. Simulation. 2003, 79(12):717–725.
  20. 20. Bandara S., Schlöder J.P., Eils R., Bock H.G., Meyer T. Optimal experimental design for parameter estimation of a cell signaling model. Plos Computational Biology. 2009, 5(11):e1000558. pmid:19911077
  21. 21. Melykuti B., August E., Papachristodoulou A., El-Samad H. Discriminating between rival biochemical network models: three approaches to optimal experiment design. BMC Systems Biology. 2010, 4(1):38. pmid:20356406
  22. 22. Vanlier J., Tiemann C. A., Hilbers P. A., van Riel N. A. Optimal experiment design for model selection in biochemical networks. BMC Systems Biology. 2014, 8(1):1–16.
  23. 23. Dehghannasiri R., Yoon B. J., Dougherty E. R. Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015, 12(4):938–950. pmid:26357334
  24. 24. Adoteye K., Banks H. T., Flores K. B. Optimal design of non-equilibrium experiments for genetic network interrogation. Applied Mathematics Letters. 2015, 40:84–89. pmid:25558126
  25. 25. Raue A., Kreutz C., Maiwald T., Bachmann J., Schilling M., Klingmüller U., et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009, 25(15):1923–1929. pmid:19505944
  26. 26. Busetto A. G., Hauser A., Krummenacher G., Sunnaker M., Dimopoulos S., Ong C. S., et al. Near-optimal experimental design for model selection in systems biology. Bioinformatics. 2013, 29(20):2625–2632. pmid:23900189
  27. 27. Nimmegeers P., Telen D., Logist F., Van Impe J. Dynamic optimization of biological networks under parametric uncertainty. BMC Systems Biology. 2016, 10(1):86–105. pmid:27580913
  28. 28. Steinke F., Seeger M. W., Tsuda K. Experimental design for efficient identification of gene regulatory networks using sparse Bayesian models. BMC Systems Biology. 2007, 1(1):51–65. pmid:18021391
  29. 29. Dehghannasiri R., Yoon B. J., Dougherty E. R. Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015, 12(4):938–950.
  30. 30. Ud-Dean S. M., Gunawan R. Optimal design of gene knockout experiments for gene regulatory network inference. Bioinformatics. 2016, 32(6):875–883. pmid:26568633
  31. 31. Banga J. R., Balsacanto E. Parameter estimation and optimal experimental design. Essays in Biochemistry. 2008, 45:195–209. pmid:18793133
  32. 32. Adoteye K., Banks H. T., Flores K. B. Optimal design of non-equilibrium experiments for genetic network interrogation. Applied Mathematics Letters. 2015, 40:84–89.
  33. 33. Bouffier A. M., Arnold J., Schüttler H. B. A mine alternative to d-optimal designs for the linear model. Plos One. 2014, 9(10):e110234. pmid:25356931
  34. 34. Wilson, A. D., Murphey, T. D. Local e-optimality conditions for trajectory design to estimate parameters in nonlinear systems. American Control Conference. Proc Am Control Conf. 2014:443.
  35. 35. Walter E., Pronzato L. Qualitative and quantitative experiment design for phenomenological models—a survey. Automatica. 1990, 26(2):195–213
  36. 36. Stigter J. D., Molenaar J. Network inference via adaptive optimal design. BMC Research Notes. 2012, 5(1):518. pmid:22999252
  37. 37. Zhu F., Shi L., Engel J. D., Guan Y. Regulatory network inferred using expression data of small sample size: application and validation in erythroid system. Bioinformatics. 2015, 31(15): 2537–2544. pmid:25840044
  38. 38. Geem Z. W., Joong H. K., Loganathan G. V. A new heuristic optimization algorithm: harmony search. Simulation. 2001, 76(2):60–68.
  39. 39. Mishra D., Rivera P. M., Lin A., Del Vecchio D., Weiss R. A. load driver device for engineering modularity in biological networks. Nature Biotechnology. 2014, 32(12):1268–1275. pmid:25419739
  40. 40. Futcher A.B., Cox B.S. Copy number and the stability of 2-micron circle-based artificial plasmids of Saccharomyces cerevisiae. J. Bacteriol. 1984, 157:283–290. pmid:6361000
  41. 41. Lin C. D., Anderson C. M., Hamada M. S., Moore L. M., Sitter R. R. Using genetic algorithms to design experiments: a review. Quality and Reliability Engineering International. 2015, 31(2):155–167.
  42. 42. Liu Z., Zhu P., Chen W., Yang R.J. Improved particle swarm optimization algorithm using design of experiment and data mining techniques. Structural and Multidisciplinary Optimization. 2015, 52(4):813–826.