Simultaneous equations, error-in-variable models, and model integration in systems ecology

doi:10.1016/S0304-3800(01)00326-X

Ecological Modelling

Volume 142, Issue 3, 15 August 2001, Pages 285-294

https://doi.org/10.1016/S0304-3800(01)00326-X

Abstract

Numerous dynamic ecological models of varied temporal and spatial scales exist in systems ecology. In general, small-scale models are more accurate, more capable of reflecting small local variations in eco-processes, and more sensitive to outside disturbances than large-scale models. On the other hand, large-scale models are more comprehensive, and usually describe the ecosystem's average properties. There has been increased interest in how to integrate accurate small-scale models with comprehensive large-scale models. Two-stage or three-stage least squares regression is the classic parameter estimation method for such purposes. In this study, a two-stage error-in-variable method is introduced to estimate the parameters for model integration. It is proved theoretically that when the restriction is exactly identifiable, the two-stage least squares regression and the two-stage error-in-variable model produce the same estimates. If the restriction is over-identifiable, both methods have solutions, but the estimates are not necessarily identical. For under-identifiable systems, the estimate from the error-in-variable model still exists, but the estimate from the two-stage least squares regression is no longer valid. An example is provided to demonstrate, step by step, how to use the two-stage error-in-variable model.

Introduction

To study how an ecological system functions, it is common practice to first break the system down into sub-systems. The outputs from some sub-systems are treated as the inputs to others, and all sub-systems are coupled through these outputs and inputs to form a complete system (see the simplest example in Fig. 1). In forestry, this practice is often referred to as scaling (Jarvis, 1995), linking (Somers and Nepal, 1994), aggregation (O'Neill and Rust, 1979), disaggregation (Zhang et al., 1993), or ‘integration’ (Daniels and Burkhart, 1988; Tang, 1991). Problems analogous to the Fig. 1 example also occur in systems ecology, in the form of variable aggregation in ecological simulation models (Luckyanov et al., 1983). Luckyanov (1983/1984) studied the problem of linear aggregation and separability in linear models of ecological systems and proved a theorem on the existence of a general solution. Attempts were also made on the problem of aggregation in nonlinear ecological system models (Logofet and Svirezhev, 1986; Iwasa et al., 1987). These studies are theoretically oriented: the authors assumed that the parameters in the systems are given or known, and focused on finding the conditions under which a system can be aggregated. The problems in forestry are usually the opposite, i.e. it is known that the variables in the system can be aggregated, but the parameters in the system are generally unknown and have to be estimated from sample data. In this study, we concentrate on the parameter estimation method for integrated ecological models.

The system in Fig. 1 can be represented by the equations $dx_{1}/dt = f_{1}(t, x_{1}, u_{1}, a_{1})$, $dx_{2}/dt = f_{2}(t, x_{2}, u_{2}, a_{2})$ and $dx_{3}/dt = f_{3}(t, x_{1}, x_{2}, x_{3}, a_{3})$, where $t$ is time, $x_{i}$ are the states of the system, $u_{i}$ are the inputs to the system, and $a_{i}$ are the parameters to be estimated. As a typical simultaneous estimation problem, the solution to the equations can be expressed as $x_{1} = F_{1}(t, u_{1}, a_{1})$, $x_{2} = F_{2}(t, u_{2}, a_{2})$, and $x_{3} = F_{3}(t, x_{1}, x_{2}, a_{3})$. There are at least two approaches available to estimate the parameters in the system. First, the three sub-systems could be fitted separately from $u_{1}$ and $u_{2}$ and $X_{1}$, $X_{2}$, and $X_{3}$ to obtain the estimated parameters $\hat{a}_{1}$, $\hat{a}_{2}$, and $\hat{a}_{3}$; hence, the estimated states should satisfy the equations: $X_{1} = F_{1}(t, u_{1}, \hat{a}_{1}),$ (1) $X_{2} = F_{2}(t, u_{2}, \hat{a}_{2}),$ (2) $X_{3} = F_{3}(t, X_{1}, X_{2}, \hat{a}_{3}).$ (3)

However, the parameters can also be estimated by integrating the three sub-systems into a single model such as: $x_{3} = F_{3}(t, F_{1}(t, u_{1}, a_{1}), F_{2}(t, u_{2}, a_{2}), a_{3}).$ (4) The parameter estimates $\bar{a}_{1}$, $\bar{a}_{2}$, and $\bar{a}_{3}$ obtained through Eq. (4) are usually different from $\hat{a}_{1}$, $\hat{a}_{2}$, and $\hat{a}_{3}$. In terms of prediction, the result from Eq. (4) with $u_{1}$ and $u_{2}$ as input should be more accurate than the prediction through Eq. (3) with $X_{1}$ and $X_{2}$ as input, which are themselves outputs of the first two sub-models with $u_{1}$ and $u_{2}$ as input (George et al., 1982). If the correlation between the variables in Eq. (1) or Eq. (2) is low and the chain of serial linking is long, the estimation errors will be large, and the propagated errors that result from such indirect prediction are not necessarily random. In fact, if the output from the sub-model of the previous stage is used as the input to the model in the next stage, then the variables are endogenous, and the estimates are biased. If the model chain is very long, the accumulation of such biases will make the final estimate problematic (see Chapter 14, George et al., 1982).
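The difference between chained prediction and integrated prediction can be sketched in a few lines of Python. The closed forms chosen for `F1`, `F2`, and `F3` below are hypothetical placeholders (the article does not specify them), and the function names `predict_chained` and `predict_integrated` are introduced only for this illustration.

```python
import math


# Hypothetical closed-form sub-models; placeholders, not the article's F1..F3.
def F1(t, u1, a1):
    return a1 * u1 * (1.0 - math.exp(-t))


def F2(t, u2, a2):
    return a2 * u2 * t


def F3(t, x1, x2, a3):
    return a3 * (x1 + x2)


def predict_chained(t, X1, X2, a3):
    """Prediction in the style of Eq. (3): upstream states X1, X2 are inputs."""
    return F3(t, X1, X2, a3)


def predict_integrated(t, u1, u2, a1, a2, a3):
    """Prediction in the style of Eq. (4): x3 directly from the inputs u1, u2."""
    return F3(t, F1(t, u1, a1), F2(t, u2, a2), a3)


t, u1, u2 = 2.0, 1.0, 1.0
a1, a2, a3 = 0.5, 0.3, 2.0  # illustrative parameter values only

x3_integrated = predict_integrated(t, u1, u2, a1, a2, a3)

# With error-free upstream states, both routes give the same prediction.
X1, X2 = F1(t, u1, a1), F2(t, u2, a2)
x3_chained = predict_chained(t, X1, X2, a3)

# An error of 0.1 in the upstream state X1 passes directly into the chained
# prediction (scaled by a3 in this linear F3).
x3_chained_noisy = predict_chained(t, X1 + 0.1, X2, a3)
```

In this toy setting the two routes agree only when the upstream states carry no error; any error in `X1` or `X2` reaches the downstream prediction, which is the propagation the paragraph above describes.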

If the sub-model equations are linear, then econometricians define the system as linear simultaneous equations with X₁, X₂, and X₃ as endogenous variables. Two-stage or three-stage least squares regression (George et al., 1982) is the classic parameter estimation method for such equations. This method is capable of eliminating the error propagation, and the estimated parameters are asymptotically unbiased. In forestry, Borders (1989) discussed the applicability of linear simultaneous equations in forest growth-and-yield modeling. Estimation procedures are well documented. However, the procedures work only for identifiable equations without restrictions between the equations. In systems ecology, unidentifiable simultaneous equations are frequently found, and very often these equations are nonlinear. The objective of this study is to introduce a two-stage regression method based on the error-in-variable model to solve such problems.
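As a rough illustration of why two-stage least squares (TSLS) is used when a regressor is endogenous, the following sketch fits a two-equation linear system with one exogenous variable `z` and one endogenous regressor `y1`. The data-generating values (slopes, intercepts, error scales) are assumptions made up for this example and are not taken from the article; only the Python standard library is used.

```python
import random


def ols(x, y):
    """Least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def tsls(z, y1, y2):
    """TSLS for y2 = c + b * y1 + e, with y1 endogenous and z exogenous."""
    # Stage 1: regress the endogenous regressor on the exogenous variable.
    c1, b1 = ols(z, y1)
    y1_hat = [c1 + b1 * zi for zi in z]
    # Stage 2: regress the response on the stage-1 fitted values.
    return ols(y1_hat, y2)


def simulate(n, b_true, seed=0):
    """Hypothetical linear system whose two equations share an error component."""
    rng = random.Random(seed)
    z, y1, y2 = [], [], []
    for _ in range(n):
        zi = rng.uniform(0.0, 10.0)
        shock = rng.gauss(0.0, 1.0)  # common to both equations, so y1 is endogenous
        y1i = 1.0 + 0.8 * zi + shock + rng.gauss(0.0, 0.5)
        y2i = 2.0 + b_true * y1i + 2.0 * shock + rng.gauss(0.0, 0.5)
        z.append(zi)
        y1.append(y1i)
        y2.append(y2i)
    return z, y1, y2


z, y1, y2 = simulate(5000, b_true=1.5)
_, b_ols = ols(y1, y2)
_, b_tsls = tsls(z, y1, y2)
print(f"OLS slope:  {b_ols:.3f}")
print(f"TSLS slope: {b_tsls:.3f}")
```

Under these assumptions the direct regression of `y2` on `y1` should overshoot the true slope of 1.5, because `y1` is correlated with the error of the second equation, while the TSLS estimate should land close to 1.5.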

Section snippets

Linear simultaneous equations and two-stage regression

For simplicity, a general conclusion about simultaneous equations (George et al., 1982) is introduced. Suppose the observations on $p$ endogenous variables $y_{1}, y_{2}, \ldots, y_{p}$ are $Y_{ti}$ ($1 \le t \le T$, $1 \le i \le p$), and the observations on $q$ exogenous variables $x_{1}, x_{2}, \ldots, x_{q}$ are $x_{tj}$ ($1 \le t \le T$, $1 \le j \le q$), then the general form of simultaneous equations becomes: $Y_{t1} b_{11} + \cdots + Y_{tp} b_{p1} + x_{t1} a_{11} + \cdots + x_{tq} a_{q1} + e_{t1} = 0$ $\vdots$ $Y_{t1} b_{1p} + \cdots + Y_{tp} b_{pp} + x_{t1} a_{1p} + \cdots + x_{tq} a_{qp} + e_{tp} = 0,$ where $e_{tj}$ are random errors. Let $B = (b_{ij})_{p \times p}$, $A = (a_{ij})_{q \times p}$, $Y_{t} = (Y_{t1}\ Y_{t2}\ \ldots\ Y_{tp})$, $x_{t} = (x_{t1}\ x_{t2}\ \ldots\ x_{tq})$, and e_t=−(e_t1

Error-in-variable model

In fact, the simultaneous equations can also be regarded as an error-in-variable model, in which explanatory variables contain measurement errors (Fuller, 1987). In forestry, such a situation often appears in forest mensuration. For example, Curtis et al. (1974) and Smith and Watts (1987) discussed the applicability of the error-in-variable model in the field of forest growth and yield. To be presented in the form of an error-in-variable model, Eq. (5) has to be re-written as: $y_{t} B + x_{t} A = 0, \quad Y_{t} = y_{t} + \delta_{t}, \quad 1 \le t \le T,$ and there

The TSEM generalized to simultaneous nonlinear equations

The TSEM method can be extended to solve simultaneous nonlinear equations of the form $f_{1}(Y_{t1}, \ldots, Y_{tp}, x_{t1}, \ldots, x_{tq}, c) = e_{t1}$ $\vdots$ $f_{p}(Y_{t1}, \ldots, Y_{tp}, x_{t1}, \ldots, x_{tq}, c) = e_{tp},$ where $Y_{ti}$ and $x_{tj}$ ($1 \le i \le p$, $1 \le j \le q$, and $1 \le t \le T$) are observations, $e_{ti}$ are random errors, and $c$ is a parameter vector. If the equations are treated as an error-in-variable model: $f_{1}(y_{t1}, \ldots, y_{tp}, x_{t1}, \ldots, x_{tq}, c) = 0$ $\vdots$ $f_{p}(y_{t1}, \ldots, y_{tp}, x_{t1}, \ldots, x_{tq}, c) = 0,$ with $Y_{ti} = y_{ti} + \varepsilon_{ti},$ then the parameter vector $c$ can be estimated by the TSEM method. Suppose a unique solution can be found from solving Eq. (17)

The TSEM parameter estimation by a simulation study

A data set (Table 1) of forest stand age (year), stand mean DBH (diameter at breast height, cm), and stand volume (m³/ha) was simulated with the following nonlinear system: $y_{1} = \frac{b_{1} x}{b_{2} + x} + e_{1},$ $y_{2} = b_{3} + b_{4} \ln(y_{1}) + e_{2},$ where $x$ is the age, $y_{1}$ is the DBH, $y_{2}$ is the volume, and $b_{1}$, $b_{2}$, $b_{3}$, and $b_{4}$ are parameters. The values of the four parameters used in the simulation are given beforehand (Table 2). Errors $e_{1}$ and $e_{2}$ are random variables with $e_{1} = 8(u - 0.5)$ and $e_{2} = 0.1(u - 0.5)$, where $u$ is uniformly distributed on [0, 1].
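A data set of this kind could be generated along the following lines. This is a sketch under stated assumptions: it reads the first equation as y₁ = b₁x/(b₂ + x) + e₁, it draws a separate uniform value for each error term, and the parameter values and stand ages are hypothetical, since Tables 1 and 2 are not reproduced here. The function name `simulate_stand` is introduced only for this illustration.

```python
import math
import random


def simulate_stand(ages, b1, b2, b3, b4, seed=0):
    """Simulate (age, DBH, volume) rows from the two-equation nonlinear system."""
    rng = random.Random(seed)
    rows = []
    for x in ages:
        # Errors follow e1 = 8(u - 0.5) and e2 = 0.1(u - 0.5), u uniform on [0, 1].
        # Assumption: an independent u is drawn for each error term.
        e1 = 8.0 * (rng.random() - 0.5)
        e2 = 0.1 * (rng.random() - 0.5)
        y1 = b1 * x / (b2 + x) + e1          # DBH, including its error e1
        y2 = b3 + b4 * math.log(y1) + e2     # volume equation uses the simulated y1
        rows.append((x, y1, y2))
    return rows


# Hypothetical parameter values and ages; NOT the values in Table 2 or Table 1.
B1, B2, B3, B4 = 40.0, 30.0, 1.0, 2.0
rows = simulate_stand(range(10, 61, 5), B1, B2, B3, B4)
for age, dbh, volume in rows:
    print(f"{age:3d}  {dbh:7.3f}  {volume:7.3f}")
```

The illustrative parameters are chosen so that the simulated DBH stays positive over the chosen ages, which keeps the logarithm in the second equation defined.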

The

Discussion and conclusions

Although the TSLS model can be used to estimate the parameters in Eq. (5), one has to make sure that the model is identifiable and find some restrictions on matrices A and B. However, if the parameters in linear simultaneous equations are estimated through the TSEM, treating endogenous variables as subject to measurement error and exogenous variables as error-free, there is no need to verify whether the model is identifiable. For example, if there is an

Acknowledgements

The authors are grateful to the National Science Foundation of China (NSFC Grant No. 39670609) for financial support. They also thank Brenda Laishley and Ian Corns of the Northern Forestry Center, Canadian Forest Service, for editing the manuscript.

References (17)

R.F. Daniels et al., An integrated system of forest stand models, For. Ecol. Manage. (1988)
Y. Iwasa et al., Aggregation in model ecosystems. I. Perfect aggregation, Ecol. Model. (1987)
D.O. Logofet et al., Averaging and aggregation in ecological models: an attempt at a non-linear approach, Ecol. Model. (1986)
N.K. Luckyanov et al., Aggregation of variables in simulation models of water ecosystems, Ecol. Model. (1983)
R.V. O'Neill et al., Aggregation error in ecological models, Ecol. Model. (1979)
G.L. Somers et al., Linking individual-tree and stand-level growth models, For. Ecol. Manage. (1994)
B.E. Borders, Systems of equations in forest stand modeling, For. Sci. (1989)
R.O. Curtis et al., Which dependent variable in site index-height-age regression?, For. Sci. (1974)

There are more references available in the full text version of this article.

¹: Tel.: +86-10-62889178; fax: +86-10-62585584.

²: Tel.: +86-10-62209346.
