A multi-model approach to analysis of environmental phenomena
Introduction
Modelling of environmental phenomena usually relies on sampled data, which are often incomplete. Ideally, analysis should provide new insights into the phenomena, forecast the output accurately for a range of inputs, and fill gaps in data records. This can be achieved by creating a range of specific models, i.e. models chosen for well-defined purposes, although constructing and choosing the models is often challenging. Environmental phenomena are sometimes non-linear in their dynamics and affected by non-Gaussian background noise. The temptation is to use complex non-linear modelling strategies to describe the phenomena better. Unfortunately, the answers from these strategies are very difficult to interpret in physical terms.
An additional problem relates to discontinuities, i.e. gaps, which are often present in data records, particularly in environmental data. We are interested in “reconstructing” the information contained in missing data without influencing the construction of the models.
This paper presents the Evolutionary Polynomial Regression (EPR) technique, a novel, model-based technique capable of simultaneously reconstructing and modelling data series that contain information about physical phenomena (Giustolisi et al., 2004a). It provides simple, well-defined and effective models that are useful both for on-line forecasting and for simulation (off-line forecasting). Here, simulation means an ahead prediction that uses the model-generated output instead of the field-measured output; it is therefore also referred to as “off-line forecasting” (Ljung, 1999).
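The distinction between on-line forecasting and simulation can be illustrated with a minimal sketch. The first-order model, its coefficients `a` and `b`, the function names and the synthetic data are all assumptions made for illustration; they are not the models or data of the paper.

```python
import numpy as np

def one_step_forecast(y_measured, u, a, b):
    """On-line forecast: each prediction uses the previous *measured* output."""
    y_hat = np.empty(len(u))
    y_hat[0] = y_measured[0]
    for t in range(1, len(u)):
        y_hat[t] = a * y_measured[t - 1] + b * u[t - 1]
    return y_hat

def simulate(y0, u, a, b):
    """Simulation (off-line forecast): each prediction uses the previous *model* output."""
    y_hat = np.empty(len(u))
    y_hat[0] = y0
    for t in range(1, len(u)):
        y_hat[t] = a * y_hat[t - 1] + b * u[t - 1]
    return y_hat

# Hypothetical input (e.g. air temperature) and noisy measured output (e.g. water temperature).
rng = np.random.default_rng(0)
u = 10.0 + 5.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, 50))
y_true = simulate(8.0, u, a=0.8, b=0.2)
y_meas = y_true + rng.normal(0.0, 0.1, size=u.size)

forecast = one_step_forecast(y_meas, u, a=0.8, b=0.2)   # fed by measurements
simulated = simulate(y_meas[0], u, a=0.8, b=0.2)        # fed by its own output
```

Because simulation never sees the measured output after the first sample, its errors can accumulate over time, whereas the on-line forecast is corrected by each new measurement.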
EPR models are usually polynomial structures in which each monomial can contain user-defined functions. These structures can also improve the physical interpretation of the studied phenomenon (Giustolisi et al., 2004b). EPR has the advantage of combining evolutionary algorithms with traditional numerical regression (Giustolisi and Savic, 2003). EPR is an incremental development of a hybrid methodology (Davidson et al., 1999, Davidson et al., 2003) which incorporated least squares optimisation within symbolic regression (McKay et al., 1995).
Thus, EPR is a hybrid methodology capable of producing a series of pseudo-polynomial models, from which one can choose those considered best for a particular purpose. It is unlikely that the same model would be selected for short gap reconstruction, for forecasting the phenomenon (with a particular time horizon), or for gaining physical insight. This approach is possible with EPR because it does not have a rigid structure, but allows a multi-structure strategy with multiple performances where each different structure has its own advantages for a specific modelling goal.
EPR is tested and demonstrated by means of a UK environmental case study which analyses the thermal behaviour of a river. Air temperature (input) and water temperature (output) data were available, but the data series had several gaps of different duration. Therefore, several models were constructed: to reconstruct (infill) data (Bennis et al., 1997), to obtain a model for on-line forecasting, and to support scientific knowledge discovery about the dynamics of the heat transfer process over a short time scale. In summary, the case study contains all the features that typify the analysis of an environmental phenomenon.
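As a rough illustration of what infilling an output series from an input series involves, the sketch below fills gaps in a water temperature record using a straight line fitted to the complete air/water pairs. This is a deliberately simple stand-in, not the EPR models built in the case study; the function name `infill_gaps` and the sample values are hypothetical.

```python
import numpy as np

def infill_gaps(air, water):
    """Fill NaN gaps in `water` with a straight line fitted to the complete pairs."""
    air = np.asarray(air, dtype=float)
    water = np.asarray(water, dtype=float)
    known = ~np.isnan(water) & ~np.isnan(air)
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(air[known], water[known], deg=1)
    filled = water.copy()
    gaps = np.isnan(water) & ~np.isnan(air)   # only fill where the input is available
    filled[gaps] = intercept + slope * air[gaps]
    return filled

# Hypothetical temperatures in degrees Celsius; NaN marks a gap in the record.
air = np.array([5.0, 7.0, 9.0, 11.0, 13.0, 15.0])
water = np.array([6.0, 7.0, np.nan, 9.0, np.nan, 11.0])
filled = infill_gaps(air, water)
```

The model is fitted only on samples where both series are present, so the gaps themselves do not influence the fitted coefficients.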
A general overview
EPR is a data-driven hybrid paradigm for environmental symbolic modelling based on evolutionary computing. The strategy returns expressions in symbolic form, as they are usually defined and referred to in the mathematics literature (Watson and Parmee, 1996). This approach can be seen as the opposite of the numerical regressions performed in Artificial Neural Networks, which are therefore termed sub-symbolic, as in Minns (2000).
According to the classification of modelling techniques based on colours (Giustolisi,
The River Barle
The River Barle is the main tributary of the upper River Exe. It is located in a rural zone of South-west England, see Webb et al. (2003).
Our data collection consists of two years of hourly air (input) and water temperature samples (output). The collection of data at hourly intervals was considered to pose a potential problem of over-sampling because information collected at this frequency may over-represent the background noise and distort the modelling procedure. Therefore, data were
The strategy
The modelling phase was undertaken as follows:
- The structure of Eq. (1) is assumed polynomial.
- Each monomial term of Eq. (1) consists of elements from X raised to pre-specified power values.
- No hypotheses are made about a0, besides its positive sign.
- The assumed range of possible exponents of terms from X is (0; 0.5; 1; 2). The exponent 0 is useful for deselecting the non-necessary inputs, the exponent 0.5 smoothes the effect of the inputs, the exponent 1 produces a linear effect of the input and
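The assumptions listed above can be sketched in code as follows. This is a minimal illustration, not the EPR implementation: an exhaustive enumeration of exponent combinations stands in for the evolutionary search, the sign of a0 is not constrained, the inputs are assumed strictly positive so that the exponent 0.5 is defined, and the names `build_terms`, `fit_structure` and `search` are hypothetical.

```python
import itertools
import numpy as np

EXPONENTS = (0.0, 0.5, 1.0, 2.0)  # candidate exponents listed in the text

def build_terms(X, exponent_matrix):
    """Each row of exponent_matrix defines one monomial: prod_i X[:, i] ** e_i."""
    cols = [np.prod(X ** np.asarray(row), axis=1) for row in exponent_matrix]
    return np.column_stack([np.ones(len(X))] + cols)  # leading column carries a0

def fit_structure(X, y, exponent_matrix):
    """Least-squares estimate of the coefficients for one fixed structure."""
    Z = build_terms(X, exponent_matrix)
    coeffs, *_ = np.linalg.lstsq(Z, y, rcond=None)
    sse = float(np.sum((Z @ coeffs - y) ** 2))
    return coeffs, sse

def search(X, y, n_terms=2):
    """Enumerate all structures with n_terms monomials (stand-in for the evolutionary search)."""
    rows = list(itertools.product(EXPONENTS, repeat=X.shape[1]))
    rows = [r for r in rows if any(r)]  # the all-zero row would duplicate the constant a0
    best = None
    for structure in itertools.combinations(rows, n_terms):
        coeffs, sse = fit_structure(X, y, structure)
        if best is None or sse < best[2]:
            best = (structure, coeffs, sse)
    return best

# Hypothetical, strictly positive inputs and a target built from two known monomials.
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(40, 2))
y = 2.0 + 3.0 * X[:, 0] + 0.5 * X[:, 0] ** 0.5 * X[:, 1] ** 2
structure, coeffs, sse = search(X, y, n_terms=2)
```

An exponent of 0 in a row removes that input from the monomial, which is how unnecessary inputs are deselected. In practice the number of candidate structures grows quickly with the number of inputs and terms, which is the motivation for replacing enumeration with an evolutionary search.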
Conclusions
Evolutionary Polynomial Regression results for the case study show the effectiveness of the symbolic multi-model approach in dealing with environmental problems. We demonstrated the ability of EPR to produce parsimonious and efficient models, which can be flexibly adapted to accurate on-line forecasting and simulation. The case study confirmed the practical capabilities of the multi-model approach enabled by EPR. Additionally, the multi-model EPR strategy not only gave a good physical insight into the
Acknowledgement
The work in this paper was supported in part by the U.K. Engineering and Physical Sciences Research Council, grant AF/000964.
References
- et al. Improving single-variable and multivariable techniques for estimating missing hydrological data. Journal of Hydrology (1997)
- et al. Symbolic and numerical regression: experiments and applications. Information Sciences (2003)
- et al. Stream temperature/air temperature relationship: a physical interpretation. Journal of Hydrology (1999)
- et al. Genetic programming as a model induction engine. Journal of Hydroinformatics, IAHR-IWA (2000)
- et al. Symbolic and numerical regression: a hybrid technique for polynomial approximators
- et al. Applied Regression Analysis (1998)
- Bootstrap methods. Another look at the jackknife. Annals of Statistics (1979)
- Using genetic programming to determine Chèzy resistance coefficient in corrugated channels. Journal of Hydroinformatics, IAHR-IWA (2004)
- Giustolisi, O., Savic, D.A., 2003. Evolutionary Polynomial Regression (EPR): Development and Applications. Report...