A multi-model approach to analysis of environmental phenomena

https://doi.org/10.1016/j.envsoft.2005.12.026

Abstract

A data-driven methodology named Evolutionary Polynomial Regression (EPR) is introduced. EPR permits the symbolic and multi-purpose modelling of physical phenomena through the simultaneous solution of a number of models. Multi-purpose modelling, or “multi-modelling”, lets the user choose a model according to its purpose: (a) gaining scientific knowledge from data modelling, (b) on-line and off-line forecasting, (c) data augmentation (i.e. infilling of missing data in time series), and so on. This allows a more robust model selection phase. A case study based on the application of Evolutionary Polynomial Regression to the study of the thermal behaviour of a stream is presented.

Introduction

Modelling of environmental phenomena usually relies on sampled data, which are often incomplete. Ideally, analysis should provide new insights into the phenomena, forecast the output accurately for a range of inputs and outputs, and fill in gaps in data records. This can be achieved by creating a range of specific models, i.e. models chosen for well-defined purposes, although constructing and choosing the models is often challenging. Environmental phenomena are sometimes non-linear in their dynamics and affected by non-Gaussian background noise. The temptation is to use complex non-linear modelling strategies to describe the phenomena better. Unfortunately, the answers from these strategies are very difficult to interpret in physical terms.

An additional problem relates to discontinuities, i.e. gaps, which are often present in data records, in particular in environmental data. We are interested in “reconstructing” the information contained in the missing data without influencing the construction of the models.

This paper presents the Evolutionary Polynomial Regression (EPR) technique, a novel, model-based technique capable of simultaneously reconstructing and modelling data series that contain information about physical phenomena (Giustolisi et al., 2004a). It provides simple, well-defined and effective models that are useful both for on-line forecasting and for simulation (off-line forecasting). Here, simulation means a prediction ahead in time that uses the model-generated output instead of the field-measured output; it is therefore also referred to as “off-line forecasting” (Ljung, 1999).
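The distinction between on-line forecasting and simulation can be illustrated with a minimal Python sketch. The first-order model `toy_model`, its coefficients and the sample temperatures are hypothetical and are not taken from the paper; only the difference in what is fed back into the model matters here.

```python
def one_step_ahead(model, measured_output, inputs):
    """On-line forecasting: each prediction uses the last *measured* output."""
    predictions = []
    for t in range(1, len(inputs)):
        predictions.append(model(measured_output[t - 1], inputs[t]))
    return predictions


def simulate(model, initial_output, inputs):
    """Off-line forecasting (simulation): each prediction uses the previous
    *model-generated* output; a measurement is needed only to start."""
    predictions = []
    previous = initial_output
    for t in range(1, len(inputs)):
        previous = model(previous, inputs[t])
        predictions.append(previous)
    return predictions


def toy_model(previous_water_temp, air_temp):
    """Hypothetical model; the coefficients are illustrative only."""
    return 0.5 * previous_water_temp + 0.5 * air_temp


air = [10.0, 12.0, 14.0, 12.0]      # input (made-up values)
water = [10.0, 11.5, 12.0, 12.5]    # measured output (made-up values)

online = one_step_ahead(toy_model, water, air)
offline = simulate(toy_model, water[0], air)
```

In this sketch the two forecasts agree on the first step and then diverge, because the simulation carries its own earlier predictions forward while the on-line forecast is corrected by each new measurement.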

EPR models are usually polynomial structures in which each monomial can contain user-defined functions. These structures can also improve the physical interpretation of the studied phenomenon (Giustolisi et al., 2004b). EPR has the advantage of combining evolutionary algorithms with traditional numerical regression (Giustolisi and Savic, 2003). EPR is an incremental development of a hybrid methodology (Davidson et al., 1999, 2003) which incorporated least squares optimisation within symbolic regression (McKay et al., 1995).
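To make the hybrid idea concrete, the following Python sketch searches over exponents of a single-term pseudo-polynomial, assumed here to have the form y = a0 + a1·x1^e1·x2^e2, and estimates the coefficients by ordinary least squares. It is an illustration under stated assumptions, not the authors' implementation: an exhaustive search stands in for the evolutionary algorithm, the function names and synthetic data are hypothetical, the candidate exponents follow the range quoted in the case study, and the inputs are assumed positive so that fractional powers are defined.

```python
from itertools import product

CANDIDATES = (0, 0.5, 1, 2)  # exponent 0 effectively deselects an input


def monomial(row, exponents):
    """Product of the inputs in `row`, each raised to its exponent."""
    value = 1.0
    for x, e in zip(row, exponents):
        value *= x ** e
    return value


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def fit(X, y, exponent_rows):
    """Least-squares coefficients [a0, a1, ...] for fixed exponents,
    obtained from the normal equations, plus the sum of squared errors."""
    design = [[1.0] + [monomial(row, e) for e in exponent_rows] for row in X]
    k = len(design[0])
    AtA = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    Aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    coeffs = solve(AtA, Aty)
    sse = sum((sum(c * v for c, v in zip(coeffs, d)) - t) ** 2
              for d, t in zip(design, y))
    return coeffs, sse


def search_single_term(X, y):
    """Rank every exponent combination by fit error (exhaustive search,
    used here in place of an evolutionary search)."""
    results = []
    for exps in product(CANDIDATES, repeat=len(X[0])):
        if not any(exps):
            continue  # all-zero exponents duplicate the constant term a0
        coeffs, sse = fit(X, y, [exps])
        results.append((sse, exps, coeffs))
    return sorted(results, key=lambda r: r[0])


# Synthetic data generated from y = 1 + 2 * sqrt(x1) * x2 (made-up example)
X = [(1.0, 2.0), (4.0, 1.0), (9.0, 3.0), (16.0, 2.0), (25.0, 1.0)]
y = [1.0 + 2.0 * (x1 ** 0.5) * x2 for x1, x2 in X]

ranked = search_single_term(X, y)
best_sse, best_exps, best_coeffs = ranked[0]
```

Because `ranked` keeps every candidate structure rather than only the best one, a user could inspect several models and pick different ones for different purposes, which is the spirit of the multi-model approach.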

Thus, EPR is a hybrid methodology capable of producing a series of pseudo-polynomial models, from which one can choose those considered best for a particular purpose. It is unlikely that the same model would be selected for short gap reconstruction, for forecasting the phenomenon (with a particular time horizon), or for gaining physical insight. This approach is possible because EPR does not impose a rigid structure; instead, it allows a multi-structure strategy in which each structure has its own performance and its own advantages for a specific modelling goal.

EPR is tested and demonstrated by means of a UK environmental case study which analyses the thermal behaviour of a river. Air temperature (input) and water temperature (output) data were available, but the data series had several gaps of different duration. Therefore, several models were constructed in order to reconstruct (infill) data (Bennis et al., 1997), to obtain a model for on-line forecasting, and to discover scientific knowledge about the dynamics of the heat transfer process over a short time scale. In summary, the case study contains all the features that typify the analysis of an environmental phenomenon.
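As a rough illustration of infilling, the Python sketch below fills gaps in an output series by running a model through them, using measurements wherever they exist. The function `infill`, the model `toy_model` with its coefficients, and the data are hypothetical; the sketch also assumes that the first sample is measured and that the input series has no gaps.

```python
def infill(water, air, model):
    """Return a copy of `water` in which each missing value (None) is
    replaced by the model output computed from the previous value."""
    filled = list(water)
    for t in range(1, len(filled)):
        if filled[t] is None:
            # Inside a gap the previous value may itself be model-generated.
            filled[t] = model(filled[t - 1], air[t])
    return filled


def toy_model(previous_water_temp, air_temp):
    """Hypothetical model; the coefficients are illustrative only."""
    return 0.5 * previous_water_temp + 0.5 * air_temp


air = [10.0, 12.0, 14.0, 12.0]       # input, complete (made-up values)
water = [10.0, None, None, 12.0]     # output with a two-sample gap

reconstructed = infill(water, air, toy_model)
```

Measured samples are left untouched, so only the gap is reconstructed and the original record is not altered.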

A general overview

EPR is a data-driven hybrid paradigm for environmental symbolic modelling based on evolutionary computing. The strategy returns expressions in symbolic form, as they are usually defined and referred to in the mathematics literature (Watson and Parmee, 1996). This approach can be seen as the opposite of the numerical regression performed by Artificial Neural Networks, which is therefore termed sub-symbolic, as in Minns (2000).

According to the classification of modelling techniques based on colours (Giustolisi,

The River Barle

The River Barle is the main tributary of the upper River Exe. It is located in a rural area of south-west England (see Webb et al., 2003).

Our data collection consists of two years of hourly air (input) and water temperature samples (output). The collection of data at hourly intervals was considered to pose a potential problem of over-sampling because information collected at this frequency may over-represent the background noise and distort the modelling procedure. Therefore, data were

The strategy

The modelling phase was undertaken as follows:

  • The structure of Eq. (1) is assumed polynomial.

  • Each monomial term of Eq. (1) consists of elements from X raised to pre-specified power values.

  • No hypotheses are made about a0, besides its positive sign.

  • The assumed range of possible exponents of terms from X is (0; 0.5; 1; 2). The exponent 0 is useful for deselecting unnecessary inputs, the exponent 0.5 smooths the effect of the inputs, the exponent 1 produces a linear effect of the input and

Conclusions

The Evolutionary Polynomial Regression results for the case study show the effectiveness of the symbolic multi-model approach in dealing with environmental problems. We demonstrated the ability of EPR to produce parsimonious and efficient models, which can be flexibly adapted to accurate on-line forecasting and simulation. The case study confirmed the real capabilities of the multi-model approach enabled by EPR. Additionally, the multi-model EPR strategy not only gave a good physical insight of the

Acknowledgement

The work in this paper was supported in part by the U.K. Engineering and Physical Sciences Research Council, grant AF/000964.

References (24)

  • Giustolisi, O., Savic, D.A., 2004a. Decision Support for Water Distribution System Rehabilitation Using Evolutionary...
  • Giustolisi, O., Savic, D.A., 2004b. A novel strategy to perform genetic programming: Evolutionary Polynomial...