Review
Recent developments in parameter estimation and structure identification of biochemical and genomic systems
Introduction
The task of biomathematical modeling comprises the conversion of a biological system into a simplified analogue that is easier to analyze, interrogate, predict, extrapolate, manipulate, and optimize than the biological system itself. The typical approach to mathematical model construction consists of nine phases: (1) data selection; (2) collection of information on network structure and regulation; (3) specification of assumptions and simplifications; (4) selection of a mathematical modeling framework; (5) estimation of parameter values; (6) model diagnostics; (7) model validation; (8) model refinements; and (9) model application (see Fig. 1).
The first phase requires the identification and selection of data that are available and suitable to support the purposes of the modeling effort. The second phase is dedicated to collecting information regarding the structure and regulation of the system from the literature and, where feasible, from de novo experiments. This phase is confounded by the fact that the true topology of the biological network is often not fully understood and that regulatory details are in many cases incomplete, obscure, or entirely missing. Under these circumstances the task of this and the next phases includes inferences about the network structure and its regulation from biological data. After collecting all relevant information regarding the biological system, the third phase is dedicated to combining this information with additional, acceptable assumptions and simplifications that aim to fill the gaps in the available information. During this phase one also decides which components and interactions of the system should be included in the model. The results are usually visualized as diagrams with nodes denoting the components and arrows representing interactions between them. The fourth phase includes the choice of a suitable mathematical modeling framework and the formulation of symbolic equations. The process usually starts with converting the ‘wiring diagram’ or the ‘network topology’ obtained from the third phase into model equations. In many biological systems analyses, these form a set of ordinary differential equations that represent the dynamic changes in system variables and are governed by fluxes between them. The particular symbolic format of the fluxes depends on the chosen mathematical modeling framework and defines what questions can be asked of the model and what types of methods will be applicable. 
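As a minimal sketch of how a wiring diagram can be converted into ordinary differential equations, consider a hypothetical linear pathway (input → X1 → X2 → output). Each arrow is a flux, and each variable changes as inflow minus outflow. The power-law flux forms, the rate constants, and all function names below are illustrative assumptions and are not taken from the review.

```python
def fluxes(x1, x2):
    """Fluxes along the arrows of the hypothetical diagram: in -> X1 -> X2 -> out."""
    v_in = 1.0               # constant input flux (assumed value)
    v_12 = 2.0 * x1 ** 0.5   # X1 -> X2, power-law rate (assumed form)
    v_out = 1.0 * x2         # X2 -> out, first-order loss (assumed form)
    return v_in, v_12, v_out


def rhs(x1, x2):
    """Rate of change of each variable = incoming flux - outgoing flux."""
    v_in, v_12, v_out = fluxes(x1, x2)
    return v_in - v_12, v_12 - v_out


def simulate(x1, x2, t_end=20.0, dt=0.001):
    """Integrate with the explicit Euler method (kept simple on purpose;
    a production model would normally use an adaptive ODE solver)."""
    for _ in range(int(round(t_end / dt))):
        d1, d2 = rhs(x1, x2)
        x1 += dt * d1
        x2 += dt * d2
    return x1, x2


x1_end, x2_end = simulate(1.0, 0.1)
print(x1_end, x2_end)
```

With these assumed rate laws the steady state can be checked by hand: the input flux of 1 must equal 2·X1^0.5, so X1 = 0.25, and it must equal X2, so X2 = 1. The simulation approaches these values.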
After the symbolic equations are formulated, the task of the fifth phase is to determine appropriate numerical parameter values that convert the symbolic model into a numerical model and make the latter consistent with experimental observations. Once this parameterized, initial model is obtained, the sixth phase is dedicated to diagnostics of the model, including analyses associated with steady states, sensitivities, gains, and stability, as well as dynamic features, such as bifurcations and oscillations. In the seventh phase the validity of the model is tested, either with experimental data that had not been used for model construction in a process called cross-validation, or with information from a different biological level. For instance, a metabolic model could be tested against physiological or clinical observations. As presented so far, the modeling process appears to be quite straightforward. However, in most cases it is not linear but cyclic, requiring the return to earlier phases. Addressing the iterative nature of modeling, an eighth phase of model refinement is almost always necessary. Finally, once the model is deemed reliable and appropriate for the stated purposes, phase nine allows the modeler to reap the fruit of the laborious model design. It is now possible to make predictions, generate new, testable hypotheses, suggest the design of novel biological experiments, or manipulate and optimize the model, for instance, toward increases in yield of some organic compound in metabolic engineering, or toward the development of drug treatments in disease.
Among the nine modeling phases, the most challenging task is usually the estimation of parameter values. This estimation is not an isolated task but closely related to other phases in the modeling process. For instance, the size and complexity of the hypothesized model in the second phase have a direct impact on the difficulty of parameter estimation and also affect later analyses as well as the interpretation of results. Most importantly, the choice of the modeling framework naturally influences the degree of complexity, feasibility, and practicability of the parameter estimation task. As a simple example, an explicit linear model permits the use of linear regression methods, which are very well worked out. As soon as the model becomes nonlinear, many of these methods become inapplicable.
Because of the importance of issues related to model selection and to implications for parameter estimation, we will use Section 2 to review the rationale and special demands on mathematical models for biological pathways and to introduce some of the most prevalent and representative modeling frameworks. Generally, model selection and parameter estimation depend on knowledge about the system, the purpose of the modeling effort, and available data. If much is known about the details of the mechanisms governing the biological system, mechanistic formulations, perhaps based on principles of physics, are often a good choice. By contrast, if details are lacking, it has been shown that canonical models, such as Lotka–Volterra models and models within Biochemical Systems Theory (BST), are very advantageous for the purposes of mathematical modeling in biological systems. Pertinent details of canonical models will be reviewed in Section 2.3.3.
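For orientation, the canonical formats named above are commonly written as follows. The symbols are conventional choices assumed here for illustration; this section does not define its own notation. The last line shows why power-law terms are convenient for estimation: a single power-law flux becomes linear in its parameters after a logarithmic transformation.

```latex
% Lotka--Volterra model for n variables
\frac{dX_i}{dt} = X_i \Bigl( a_i + \sum_{j=1}^{n} b_{ij} X_j \Bigr), \qquad i = 1, \dots, n

% S-system within BST: one aggregate production and one aggregate degradation term
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}, \qquad i = 1, \dots, n

% A single power-law flux is linear in logarithmic coordinates
v = \alpha \prod_{j=1}^{n} X_j^{g_j}
\quad \Longrightarrow \quad
\ln v = \ln \alpha + \sum_{j=1}^{n} g_j \ln X_j
```

Here the rate constants \alpha_i and \beta_i are non-negative, and the kinetic orders g_{ij} and h_{ij} are real numbers that quantify the effect of X_j on the production or degradation of X_i.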
The development of parameter estimation methods is driven by the availability of experimental data. Different types of data require distinctly different classes of methods. Conversely, the various estimation methods require different types of data. As a pertinent example, data for metabolic pathway models have traditionally characterized the kinetic properties of individual enzymes catalyzing particular steps within a metabolic pathway of interest. The generation of this information occurred hand in hand with the concepts and terminology of enzyme kinetics and the data were measured specifically to parameterize models in familiar formats such as Michaelis–Menten or Hill rate laws. The strategy of using these types of ‘local’ descriptions of model components (one enzymatic process at a time) and merging them into a much more comprehensive, dynamical pathway model is referred to as ‘bottom-up’ or ‘forward’ modeling and will be discussed in Section 3.
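To illustrate the 'local' estimation step of bottom-up modeling, the following sketch estimates the two parameters of a Michaelis–Menten rate law from initial-rate data for one enzyme. It uses the classical double-reciprocal (Lineweaver–Burk) linearization and ordinary least squares. The data are synthetic and noise-free, and the parameter values and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_michaelis_menten(substrate, rate):
    """Estimate (Vmax, Km) from v = Vmax * S / (Km + S) using
    the double-reciprocal form 1/v = (Km/Vmax) * (1/S) + 1/Vmax."""
    slope, intercept = fit_line([1.0 / s for s in substrate],
                                [1.0 / v for v in rate])
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Synthetic, noise-free measurements generated with assumed Vmax = 10, Km = 2
S = [0.5, 1.0, 2.0, 4.0, 8.0]
v = [10.0 * s / (2.0 + s) for s in S]

vmax, km = fit_michaelis_menten(S, v)
print(vmax, km)
```

The linearization is chosen here only because it keeps the example short. With noisy data the reciprocal transformation distorts the error structure, so a direct nonlinear regression on the untransformed rate law is generally preferable.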
Steady-state data may also be used in parameter estimation. These data characterize a metabolic system under conditions where all concentrations have reached constant levels and all fluxes are in balance. Specifically, this type of steady-state analysis is either based on stoichiometric analysis or on experiments that measure the responses of a biological system after a small perturbation around the steady state. Some of these methods will also be discussed in Section 3.
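The steady-state condition used in stoichiometric analysis can be sketched as a simple balance check: for a stoichiometric matrix N and a flux vector v, every metabolite must satisfy N·v = 0. The branched pathway, the flux values, and the function names below are hypothetical.

```python
# Hypothetical branched pathway with five fluxes:
#   v0: -> A,  v1: A -> B,  v2: A -> C,  v3: B ->,  v4: C ->
# Rows are metabolites (A, B, C); columns are fluxes v0..v4.
N = [
    [1, -1, -1,  0,  0],   # A is produced by v0, consumed by v1 and v2
    [0,  1,  0, -1,  0],   # B is produced by v1, consumed by v3
    [0,  0,  1,  0, -1],   # C is produced by v2, consumed by v4
]


def net_rates(stoich, v):
    """Net rate of change of each metabolite, i.e. the product N * v."""
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in stoich]


def is_steady_state(stoich, v, tol=1e-9):
    """True if all metabolite balances vanish within the tolerance."""
    return all(abs(r) <= tol for r in net_rates(stoich, v))


balanced = [3.0, 2.0, 1.0, 2.0, 1.0]     # influx 3 splits into 2 and 1
unbalanced = [3.0, 2.0, 2.0, 2.0, 1.0]   # branch v2 drains more than is supplied

print(is_steady_state(N, balanced), net_rates(N, unbalanced))
```

The balance equations constrain the admissible flux distributions but do not by themselves determine kinetic parameters, which is why such analyses are combined with other data in practice.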
Recent innovations in biological technology enable us to tackle the parameter estimation task in an entirely different, more comprehensive manner, using a ‘top-down’ or ‘inverse’ approach. The biological tools for these purposes are geared toward generating time series data or ‘global’ data of genes, proteins, or metabolites, sometimes under different sets of conditions, such as initial concentrations, or upon various gene knock-outs or the inhibition of specific enzymatic steps. Inverse methods are very appealing, because they provide measurements on cellular or organismal systems in a larger context. In particular, if the data are generated in vivo and with only minimal disturbance of the biological system, the insights gained are considered to be as close to reality as is presently possible and potentially much more valuable than data obtained from experiments performed in an artificial in vitro setting. Details and features of traditional and newly developed techniques with respect to parameter estimation will also be addressed in Section 3.
The potentially high rewards of the inverse modeling approach have motivated scientists from different backgrounds and from all over the world to dedicate considerable effort to the many challenges that must be overcome to make the approach useful. For BST models alone, about one hundred articles and numerous proceedings and book chapters describing computational methods for inverse tasks have appeared within the past decade. We address the specific challenges and requirements of inverse modeling in Section 3.4, along with different types of support algorithms. Many of the pertinent methods target the main problem of optimizing parameter values against the observed time series data. Others suggest strategies for circumventing the costly integration of differential equations, smoothing noisy data, estimating slopes, constraining the parameter search space, or reducing the complexity of the inference task. These auxiliary methods and algorithms will be reviewed in Section 4. The primary focus will be on methods applicable to models within BST, but we will also discuss related issues that are of interest to other modeling approaches.
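One of the auxiliary strategies mentioned above, circumventing the integration of differential equations by estimating slopes, can be sketched for a single variable that decays according to a power law, dX/dt = -β·X^h. Slopes are estimated from the time series, and the parameters then follow from a linear regression in logarithmic coordinates, so that no ODE solver is needed. The data are synthetic and noise-free, and all names and values are assumptions made for this example.

```python
import math


def central_slopes(t, x):
    """Central-difference slope estimates at the interior time points
    (assumes evenly spaced samples)."""
    return [(x[k + 1] - x[k - 1]) / (t[k + 1] - t[k - 1])
            for k in range(1, len(t) - 1)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_power_law_decay(t, x):
    """Fit dX/dt = -beta * X**h via ln(-slope) = ln(beta) + h * ln(X).
    Assumes a strictly decreasing, positive time series."""
    slopes = central_slopes(t, x)
    interior = x[1:-1]
    h, log_beta = fit_line([math.log(xi) for xi in interior],
                           [math.log(-s) for s in slopes])
    return math.exp(log_beta), h


# Exact solution of dX/dt = -0.5 * X**0.5 with X(0) = 4 is X(t) = (2 - 0.25 t)**2
t = [0.5 * k for k in range(13)]              # t = 0, 0.5, ..., 6
x = [(2.0 - 0.25 * tk) ** 2 for tk in t]

beta, h = estimate_power_law_decay(t, x)
print(beta, h)
```

The recovery is essentially exact here only because the trajectory is a quadratic in time, for which central differences are exact. With real, noisy measurements the data would first have to be smoothed, and the slope estimates would carry errors into the regression.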
The earlier discussion of the second modeling phase (see Fig. 1) mentioned that the true topology of a biological system is sometimes not fully understood or obscure. In such a case, parameter estimation is much more complicated, because it is a priori not clear how to formulate an ill-characterized biological system mathematically. As we will discuss, the use of concept maps [1] and of canonical models is of great help in this situation, because it converts the task of identifying the uncertain structure and regulation of the biological system into a parameter estimation task. Generally, structure identification tasks are much more difficult than parameter estimation, which is already very challenging. Canonical models render both tasks reachable. In particular, one should note that there is no clear boundary between parameter estimation and structure identification if canonical models are used. Section 5 will introduce some of the most relevant structure identification methods, namely the determination of the Jacobian matrix, direct observations, correlation-based approaches, ‘simple-to-general’ and ‘general-to-specific’ modeling, and specially tailored time series data analysis within the framework of BST.
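The blurred boundary between parameter estimation and structure identification can be made concrete with a small sketch. In a power-law model, a kinetic order close to zero means that one variable has no detectable effect on another, while positive and negative values indicate activation and inhibition. Reading the network off an estimated matrix of kinetic orders is then a thresholding step. The matrix, the tolerance, and the function name are hypothetical.

```python
def infer_edges(kinetic_orders, names, tol=1e-3):
    """Translate estimated kinetic orders into a list of regulatory edges.

    kinetic_orders[i][j] is the assumed effect of variable j on the
    production of variable i; entries within `tol` of zero are treated
    as 'no interaction'.
    """
    edges = []
    for i, row in enumerate(kinetic_orders):
        for j, g in enumerate(row):
            if abs(g) > tol:
                effect = "activates" if g > 0 else "inhibits"
                edges.append((names[j], names[i], effect))
    return edges


# Hypothetical estimates for a three-variable system
g = [
    [0.0, 0.0, -0.8],      # X3 inhibits production of X1 (feedback)
    [0.5, 0.0, 0.0],       # X1 drives production of X2
    [0.0, 0.75, 0.0002],   # X2 drives X3; the tiny self-effect is ignored
]

edges = infer_edges(g, ["X1", "X2", "X3"])
for source, target, effect in edges:
    print(source, effect, target)
```

The choice of tolerance is a modeling decision: a threshold that is too small keeps spurious interactions caused by noise, whereas one that is too large may remove weak but real regulation.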
Among all parameter estimation or structure identification methods proposed so far, no single method has risen to the top and can be declared the clear winner in terms of efficiency, robustness, and reliability for the majority of realistic case studies. However, it seems that a combination of methods may be useful in a large number of cases. To make inverse modeling more effective and transparent, we propose in Section 6 an operational ‘work-flow’ that guides the user through the various steps of the estimation process, identifies possible problem areas, and suggests promising solutions based on the specific characteristics of the various available algorithms. An interesting consequence, and actually an advantage, of the combined approach is the general result that the optimized solution often consists of multiple, distinctly different parameter sets that are all consistent with the data and that can lead to novel hypotheses for further theoretical and experimental investigation.
Biological systems consist of many organizational layers including genetic, transcriptomic, proteomic, and metabolic levels, as well as phenomena of cell physiology, cell communication, tissue and organ function, populations, and ecology. In this review, we focus primarily on model construction at the genomic and metabolic levels, although many of the computational methods are independent of a particular application. The main reason is that the genomic and metabolic levels are currently best supported by available quantitative data. Metabolic time profiles are particularly well suited because of the stoichiometric property of metabolic pathways, which creates natural constraints on possible parameter combinations, and because their main drivers, namely metabolite concentrations and fluxes, can be measured, at least in principle. In contrast, no material flow is involved in gene–gene, gene–protein, or protein–protein interactions, and the measurable effects are seen in their consequences rather than in characteristics of the processes themselves. Therefore, gene and protein networks must often be studied with coarser methods than metabolic systems, such as graph theory and Boolean or Bayesian methods, which are applied under the simplifying assumption of binary on/off states. Nevertheless, recent methodological developments have enabled the generation of some dynamic profiles of gene networks, and these have been used for the quantitative identification of gene regulatory networks, primarily by several Japanese groups (see Section 2.3.4). Dynamic data on protein levels are still very rare, and quantitative time series responses are very difficult to obtain. Commensurate with the availability of data, we will primarily focus on the construction and analysis of metabolic pathway models but also discuss issues related to gene interaction networks.
Because the material in this review discusses numerous complementary aspects of parameter estimation and structure identification, it seems useful to summarize the structure of the review in the form of a roadmap, which is given in Table 1.
Section snippets
Model requirements
Mathematical modeling and control theory have a long history in physics and engineering. However, the demands and specific requirements in modeling biological systems are quite different and necessitate the adaptation and extension of former methods, as well as the development of novel, additional tools that are optimally suited for modeling biological phenomena. The peculiarities of biological system modeling can be generally described by five aspects (e.g., [2]). First, the biological
Kinetic model construction
The collection of input information and the choice of a mathematical model framework result in a symbolic model, which is typically in the form of ordinary differential equations, as discussed before. The next step is to assign numerical values to all parameters in the model. There is no unique recipe for this task of parameter estimation. In fact, the estimation problem is almost always complicated and continues to be the bottleneck of biomathematical modeling.
In this section, we review the classes of
Parameter estimation techniques for top-down modeling approaches
Many of the recent techniques for top-down parameter estimation have been developed for BST models. Most of them are similarly applicable to other canonical models, although a few take advantage of the specific form of power laws in BST. The main algorithms and their representative references are shown in Figs. 3 and 4. A historic listing of representative algorithms is presented in Table 3.
Inference of network structure
As mentioned in Section 1, the traditional approach of modeling begins with collecting network information that is translated into the design of a stoichiometric wiring diagram, which may then be converted into a fully kinetic metabolic pathway model, if desired. The translation and conversion more or less reflect the actual biological system as long as the input information is essentially correct and complete. In reality, information on network connectivity and regulation is often only
Toward a streamlined ‘work-flow’ for inverse modeling
As described in Sections 4 and 5, many methods have been developed recently that attempt to solve parameter estimation and structure identification problems through inverse modeling using the BST formalism. Most of these methods were developed to address the main problem of optimizing parameter values against observed time series data using gradient-based methods, regression algorithms, or evolutionary
Open issues
As mentioned in Section 3.4 and in the previous section, the challenges of inverse modeling can be classified into issues related to data, model structure, computation, and mathematical features of the representation. Most of the recent articles have acknowledged and discussed various computational issues in great detail and some have addressed data and model related issues. However, there has been little discussion of model validity and quality beyond residual errors, the conditions under
Acknowledgments
The authors are grateful to Dr. Siren Veflingstad and two anonymous reviewers for critically reading the manuscript and providing constructive suggestions. This work was supported in part by a National Heart, Lung and Blood Institute Proteomics Initiative (Contract N01-HV-28181; D. Knapp, PI), a Molecular and Cellular Biosciences Grant (MCB-0517135; E.O. Voit, PI) from the National Science Foundation, a grant from the National Institutes of Health (R01 GM063265; Y.A. Hannun, PI), and an
References (260)
- et al., Mechanistic simulations of inflammation: current state and future prospects, Math. Biosci. (2009)
- et al., Advances in flux balance analysis, Curr. Opin. Biotechnol. (2003)
- et al., A reconstruction of the metabolism of Methanococcus jannaschii from sequence data, Gene (1997)
- et al., Combining metabolic flux analysis tools and 13C NMR to estimate intracellular fluxes of cultured astrocytes, Neurochem. Int. (2008)
- et al., Quantitative analysis of intracellular metabolic fluxes using GC–MS and two-dimensional NMR spectroscopy, J. Biosci. Bioeng. (2002)
- et al., Dynamic flux balance analysis of diauxic growth in Escherichia coli, Biophys. J. (2002)
- et al., Constraints-based models: regulation of gene expression reduces the steady-state solution space, J. Theor. Biol. (2003)
- et al., Mathematical modelling of metabolism, Curr. Opin. Biotechnol. (2000)
- Biochemical systems analysis. I. Some mathematical properties of the rate law for the component enzymatic reactions, J. Theor. Biol. (1969)
- The behavior of intact biochemical control systems, Curr. Top. Cell. Regul. (1972)
- Modeling forest growth I. Canonical approach, Ecol. Model.
- Biochemical systems analysis. II. The steady-state solutions for an n-pool system using a power-law approximation, J. Theor. Biol.
- Recasting nonlinear differential equations as S-systems: a canonical nonlinear form, Math. Biosci.
- Development of genetic circuitry exhibiting toggle switch or oscillatory behavior in Escherichia coli, Cell
- Power-law models of signal transduction pathways, Cell. Signal.
- The potential for signal integration and processing in interacting MAP kinase cascades, J. Theor. Biol.
- MCA has more to say, J. Theor. Biol.
- The mathematics of metabolic control analysis revisited, Metab. Eng.
- Kinetic modeling using S-systems and lin-log approaches, Biochem. Eng. J.
- Lotka–Volterra representation of general nonlinear systems, Math. Biosci.
- Equivalence between S-systems and Volterra-systems, Math. Biosci.
- Development of fractal kinetic theory for enzyme-catalysed reactions and implications for the design of biochemical pathways, Biosystems
- Subunit structure of regulator proteins influences the design of gene circuitry: analysis of perfectly coupled and completely uncoupled circuits, J. Mol. Biol.
- Rules for coupled expression of regulator and effector genes in inducible circuits, J. Mol. Biol.
- Completely uncoupled and perfectly coupled gene expression in repressible systems, J. Mol. Biol.
- The dawn of a new era of metabolic systems analysis, Drug Discovery Today BioSilico
- The tricarboxylic-acid cycle in Dictyostelium discoideum. 1. Formulation of alternative kinetic representations, J. Biol. Chem.
- Comparative characterization of the fermentation pathway of Saccharomyces cerevisiae using biochemical systems theory and metabolic control analysis: steady-state analysis, Math. Biosci.
- Comparative characterization of the fermentation pathway of Saccharomyces cerevisiae using biochemical systems theory and metabolic control analysis: model definition and nomenclature, Math. Biosci.
- Comparative characterization of the fermentation pathway of Saccharomyces cerevisiae using biochemical systems theory and metabolic control analysis: model validation and dynamic behavior, Math. Biosci.
- Mathematical models of purine metabolism in man, Math. Biosci.
- Biological systems modeling and analysis: a biomolecular technique of the twenty-first century, J. Biomol. Tech.
- Understanding through modeling
- Microbial pathway models
- Hybrid modeling in biochemical systems theory by means of functional petri nets, J. Bioinform. Comput. Biol.
- Integrative biological systems modeling: challenges and opportunities, Frontiers Comput. Sci. Chin.
- Nonlinear Differential Equations of Chemically Reacting Systems
- The Regulation of Cellular Systems
- Systems Biology: Properties of Reconstructed Networks
- Metabolic Engineering: Principles and Methodologies
- Metabolic flux balancing: basic concepts, scientific, and practical use, Bio/Technology
- Reconstruction of amino acid biosynthesis pathways from the complete genome sequence, Genome Res.
- The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities, Proc. Natl. Acad. Sci. USA
- Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network, Genome Res.
- System analysis of acetone–butanol–ethanol fermentation based on time-sliced metabolic flux analysis
- Metabolic flux distributions in Corynebacterium glutamicum during growth and lysine overproduction, Biotechnol. Bioeng.
- System estimation from metabolic time series data, Bioinformatics
- Enzyme Kinetics: From Diastase to Multi-Enzyme Systems
- Die Kinetik der Invertinwirkung, Biochem. Zeitschrift
- A linear steady-state treatment of enzymatic chains. General properties, control and effector strength, Eur. J. Biochem.