Elsevier

Ecological Modelling

Volume 221, Issue 8, 24 April 2010, Pages 1119-1130

Comparison and ranking of different modelling techniques for prediction of site index in Mediterranean mountain forests

https://doi.org/10.1016/j.ecolmodel.2010.01.007

Abstract

Forestry science has a long tradition of studying the relationship between stand productivity and abiotic and biotic site characteristics, such as climate, topography, soil and vegetation. Many of the early site quality modelling studies related site index to environmental variables using basic statistical methods such as linear regression. Because most ecological variables show a typical non-linear course and a non-constant variance distribution, a large fraction of the variation remained unexplained by these linear models. More recently, the development of more advanced non-parametric and machine learning methods provided opportunities to overcome these limitations. Nevertheless, these methods also have drawbacks. Due to their increasing complexity they are not only more difficult to implement and interpret, but also more vulnerable to overfitting. Especially in a context of regionalisation, this may prove to be problematic. Although many non-parametric and machine learning methods are increasingly used in applications related to forest site quality assessment, their predictive performance has only been assessed for a limited number of methods and ecosystems.

In this study, five different modelling techniques are compared and evaluated, i.e. multiple linear regression (MLR), classification and regression trees (CART), boosted regression trees (BRT), generalized additive models (GAM), and artificial neural networks (ANN). Each method is used to model site index of homogeneous stands of three important tree species of the Taurus Mountains (Turkey): Pinus brutia, Pinus nigra and Cedrus libani. Site index is related to soil, vegetation and topographical variables, which are available for 167 sample plots covering all important environmental gradients in the research area. The five techniques are compared in a multi-criteria decision analysis in which different model performance measures, ecological interpretability and user-friendliness are considered as criteria.

When these criteria are combined, GAM is found in most cases to outperform all other techniques for modelling site index for the three species. BRT is a good alternative when the ecological interpretability of the technique is of higher importance. When user-friendliness is more important, MLR and CART are the preferred alternatives. Despite its good predictive performance, ANN is penalized for its complex, non-transparent models and large training effort.

Introduction

In forestry, accurate estimation of site productivity is crucial for good forest resource management (Seynave et al., 2005). Productivity depends strongly on the quality of the site (i.e. the collective of physical and biotic factors present at a given location). Forest research has a long-standing tradition of studies concerning the impact of biotic and abiotic characteristics such as climate, topography, soil and vegetation on site productivity (e.g., Amen, 1945). To estimate forest site quality, foresters face the problem of integrating all these site factors. Moreover, the forest itself is an important site-forming factor, which makes only approximations possible unless forest and site are considered as a complex interrelated ecosystem (Spurr and Barnes, 1980). Because of this complexity, for most areas in Europe and North America forest site quality has been derived only empirically from the species-specific dominant height of an even-aged tree population of known age, rescaled to a reference age and termed site index (SI) (Fontes et al., 2003).

For several applications, however, it is not possible to measure this site index directly, e.g. in mixed, uneven-aged stands, for stand conversion to another tree species, for afforestation of non-forested land, or because site conditions changed over time. By linking dominant height to environmental variables (Corona et al., 1998, Curt et al., 2001), landscape characteristics (Iverson et al., 1997) and understory vegetation data (Bergès et al., 2006), site quality can be estimated at non-monitored sites. Most of the early site studies predicted forest growth from one or a few environmental variables that could be measured in the field relatively easily and at low cost. Several studies have tried to model site index by coupling age and tree height measurements to abiotic site properties, but with mixed success (see e.g., Corona et al., 1998, Chen et al., 2002, Bergès et al., 2005). Many of these yielded low accuracy and a high degree of variation (Kayahara et al., 1998, Curt et al., 2001).

Linear regression is one of the oldest and most widely used statistical techniques for modelling site quality because of its ease of use and straightforward interpretability (Curt et al., 2001, Seynave et al., 2005). Although it is a powerful approach when appropriately applied, many ecological relations are non-linear, data often have a non-constant variance distribution, and many explanatory variables are collinear. As a consequence, linear regression may not be appropriate or may leave much of the variation unexplained (Guisan et al., 2002).

More recently, the development of more advanced non-parametric and machine learning techniques and the growing availability of geodatasets at high spatial resolution have opened up many opportunities to predict forest site quality with greater accuracy. Despite the flexibility of these techniques to account for non-linear relationships, they are more vulnerable to overfitting the data, i.e. fitting noise, which results in unstable regression coefficients (Harrell et al., 1996, Guisan and Thuiller, 2005). In addition, the implementation of these models, the capacity to integrate them with other software and their interpretability can become complicated and should be weighed against the improvement in accuracy and precision.
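The overfitting risk described above can be illustrated with a small, self-contained sketch. It is not taken from the study: the data are simulated, and a one-nearest-neighbour predictor stands in for "a very flexible learner" (it is not one of the five techniques compared here). The function names and the simulated relationship are assumptions made purely for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour: memorises the training data, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


rng = random.Random(0)  # fixed seed so the run is reproducible


def simulate(n):
    """Hypothetical non-linear response with additive Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [20.0 + 3.0 * math.sqrt(x) + rng.gauss(0.0, 2.0) for x in xs]
    return xs, ys


train_x, train_y = simulate(40)
test_x, test_y = simulate(40)

line = fit_line(train_x, train_y)
nn = fit_nearest_neighbour(train_x, train_y)

line_train, line_test = rmse(line, train_x, train_y), rmse(line, test_x, test_y)
nn_train, nn_test = rmse(nn, train_x, train_y), rmse(nn, test_x, test_y)

print(f"linear: train RMSE {line_train:.2f}, held-out RMSE {line_test:.2f}")
print(f"1-NN:   train RMSE {nn_train:.2f}, held-out RMSE {nn_test:.2f}")
```

The flexible learner reproduces the training data exactly, so its training error says nothing about how it will predict at new sites; only the held-out error is informative.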

Non-parametric and machine learning techniques that may be better suited to address the mentioned problems of linear regression should be identified and their performance compared. In the domain of forest site quality assessment, Mckenney and Pedlar (2003) successfully used classification and regression trees (CART) to model site index from environmental variables for two boreal tree species in Canada. The performance of non-parametric techniques such as CART, generalized additive models (GAM) and artificial neural networks (ANN) compared to parametric techniques was investigated by Moisen and Frescino (2002) for the prediction of several species-independent forest characteristics in the Interior Western United States. Wang et al. (2005) also evaluated these techniques for the spatial prediction of site index of Lodgepole pine in Canada. Both studies concluded that these non-parametric approaches can be more effective predictors. Boosted regression trees (BRT), an extension of CART, is a promising technique used in ecological research on species distributions and seems to be a powerful tool for all kinds of ecological modelling (Leathwick et al., 2006, Guisan et al., 2006, Elith et al., 2008). Recently, a number of software programs have been developed that incorporate many of the mentioned techniques for the prediction of species distributions (Thuiller et al., 2009). Yet no such tool exists for the prediction of continuous response variables such as site index. Many studies have already concluded that there is no generally best modelling technique; depending on the scope and the goal of the study, some techniques will probably be better suited than others in particular situations. This study can serve as a guideline for gaining more insight into the strengths of the different techniques for modelling site index.

There is no single definitive test to evaluate models, and many measures of predictive performance have been formulated (Guisan and Zimmermann, 2000, Moisen and Frescino, 2002, Wang et al., 2005). Moreover, other factors such as the ecological interpretability or the user-friendliness of a technique can be of importance in making a final evaluation and ranking of site index modelling techniques (Maggini et al., 2006). Multi-criteria decision analysis (MCDA) is a family of commonly used methodologies to assist in complex decision-making situations, as it allows the consideration of multiple criteria in incommensurate units (i.e. a combination of quantitative and qualitative criteria) to provide a final ranking of alternative decisions (Herath, 2004, Mendoza and Martins, 2006). The aim of this study is to compare and evaluate two non-parametric statistical techniques (GAM, CART), one machine learning technique (ANN) and one hybrid technique (BRT) for modelling site index. Although not expected to provide the best performance, multiple linear regression (MLR) is included in this study for its straightforward interpretability and as a benchmark against which other techniques can be compared. Each method is used to model site index in homogeneous stands of three important tree species of the Taurus Mountains (Turkey): Pinus brutia Ten. (Calabrian pine), Pinus nigra ssp. pallasiana (Arnold) K. Richt (Crimean pine) and Cedrus libani A. Rich. (Lebanon cedar). The specific objectives of this study are:

  • (1) to compare the modelling techniques with respect to their predictive performance;

  • (2) to rank the modelling techniques according to predictive performance and user-oriented criteria including user-friendliness and ecological interpretability.
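To make the ranking idea concrete, the following is a minimal sketch of one simple MCDA scheme, a normalised weighted sum. The text shown here does not state which MCDA method the study applied, so this is only a stand-in. The alternatives "A", "B" and "C", their scores and the weights are hypothetical placeholders, not results from the study.

```python
def rank_alternatives(scores, weights):
    """Rank alternatives by weighted sum of criterion scores (higher is better).

    scores:  {alternative: {criterion: score on a common 0-1 scale}}
    weights: {criterion: non-negative importance weight}
    """
    total_weight = sum(weights.values())
    normalised = {c: w / total_weight for c, w in weights.items()}
    totals = {
        alt: sum(normalised[c] * crit_scores[c] for c in normalised)
        for alt, crit_scores in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical scores, invented for illustration only.
scores = {
    "A": {"performance": 0.9, "interpretability": 0.3, "user_friendliness": 0.2},
    "B": {"performance": 0.7, "interpretability": 0.7, "user_friendliness": 0.6},
    "C": {"performance": 0.4, "interpretability": 0.8, "user_friendliness": 0.9},
}

equal_weights = {"performance": 1, "interpretability": 1, "user_friendliness": 1}
performance_first = {"performance": 8, "interpretability": 1, "user_friendliness": 1}

ranking_equal = rank_alternatives(scores, equal_weights)
ranking_performance = rank_alternatives(scores, performance_first)

print("equal weights:     ", ranking_equal)
print("performance first: ", ranking_performance)
```

The two weightings reverse the order of the hypothetical alternatives, which shows why a ranking of techniques should always be reported together with the weights that produced it.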

Section snippets

Study area

The study area (55 000 ha) covers the Ağlasun forest district (37°33′N, 30°32′E, 350–2200 m above sea level) in southern Anatolia, Turkey. The region has a cold and sub-humid Mediterranean climate with pronounced winter precipitation and summer drought (Paulissen et al., 1993). Limestone is the predominating parent material. Locally also conglomerates and sandstones are present. Soil depth, moisture regime and stoniness vary with topography. Most soils can be classified as leptosols, regosols or

Results

A total of 15 SI-models were built, using five modelling techniques for each of the three species. All models were critically investigated for confounding factors and collinearity between explanatory variables, and were checked to confirm that all basic assumptions were met. The three studied species clearly differ in site quality needs, as expressed by the different models (Table 3). Only easting and soil organic matter content seem to be common predictors for site index for all species, whereas P. brutia

Predictive performance

Based on our data, non-parametric techniques outperform MLR for predicting site index. Only CART performed worse than MLR for all species, which was also observed by Moisen and Frescino (2002) in predicting other forest characteristics. Leathwick et al. (2006) concluded from their study on modelling demersal fish species richness that, owing to its capability for fitting interactions among predictor variables, BRT appears to offer considerable performance gains over modelling techniques such as GAM.
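As an illustration of how predictive performance can be quantified, the sketch below computes two widely used measures: root mean squared error (RMSE) and the Nash-Sutcliffe model efficiency. The text shown here does not list the measures actually used in the study, so the choice is an assumption (the reference list does include Nash et al., 1970). The observed values are hypothetical site index values in metres.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the response."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nash_sutcliffe(observed, predicted):
    """Model efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical observations, invented for illustration only.
observed = [18.0, 22.0, 25.0, 15.0]
mean_only = [sum(observed) / len(observed)] * len(observed)

print("perfect model:", rmse(observed, observed), nash_sutcliffe(observed, observed))
print("mean-only model:", rmse(observed, mean_only), nash_sutcliffe(observed, mean_only))
```

Computing such measures on data not used for fitting, for example through cross-validation, keeps the comparison between simple and flexible techniques fair.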

Conclusions

Five modelling techniques were compared and evaluated for predicting the site index of three tree species in the Taurus Mountains of Turkey. Based on a multi-criteria decision analysis that simultaneously evaluated ‘predictive performance’, ‘ecological interpretability’ and ‘user-friendliness’ of the models, GAM is the preferred technique for modelling site index of these species. BRT is a good second choice in case the ecological interpretability of the technique is of high importance. When

Acknowledgements

This research was supported by the Research Fund K.U. Leuven (OT/07/046) and a Concerted Action of the Flemish Government (GOA 02/2) in the framework of the Sagalassos Archaeological project. Special thanks go to the Turkish Forest Administration offices at Burdur and Ağlasun for granting access permission to the forests and for providing valuable forest inventory data.

References (62)

  • A. Guisan et al.

    Predictive habitat distribution models in ecology

    Ecol. Model.

    (2000)
  • G. Herath

    Incorporating community objectives in improved wetland management: the use of the analytic hierarchy process

    J. Environ. Manage.

    (2004)
  • G.J. Kayahara et al.

    Testing site index site-factor relationships for predicting Pinus contorta and Picea engelmannii x P. glauca productivity in central British Columbia, Canada

    Forest Ecol. Manage.

    (1998)
  • A. Lehmann et al.

    GRASP: generalized regression analysis and spatial prediction

    Ecol. Model.

    (2003)
  • D.W. Mckenney et al.

    Spatial models of site index based on climate and soil properties for two boreal tree species in Ontario, Canada

    Forest Ecol. Manage.

    (2003)
  • G.A. Mendoza et al.

    Multi-criteria decision analysis in natural resource management: a critical review of methods and new modelling paradigms

    Forest Ecol. Manage.

    (2006)
  • G.G. Moisen et al.

    Predicting tree species presence and basal area in Utah: a comparison of stochastic gradient boosting, generalized additive models, and tree-based methods

    Ecol. Model.

    (2006)
  • G.G. Moisen et al.

    Comparing five modelling techniques for predicting forest characteristics

    Ecol. Model.

    (2002)
  • J.E. Nash et al.

    River flow forecasting through conceptual models 1: a discussion of principles

    J. Hydrol.

    (1970)
  • S.D. Pohekar et al.

    Application of multi-criteria decision making to sustainable energy planning—a review

    Renew. Sustain. Energy Rev.

    (2004)
  • K.S. Szwaluk et al.

    Near-surface soil characteristics and understory plants as predictors of Pinus contorta site index in Southwestern Alberta, Canada

    Forest Ecol. Manage.

    (2003)
  • Y.H. Wang et al.

    Evaluation of spatial predictions of site index obtained by parametric and nonparametric methods—a case study of Lodgepole pine productivity

    Forest Ecol. Manage.

    (2005)
  • L.E. Allison

    Organic carbon: Walkley–Black method

  • L.E. Allison et al.

    Carbonate: volumetric calcimeter method

  • J.T. Amen

    Prediction of site index for Yellow-poplar from soil and topography

    J. Forest.

    (1945)
  • L. Bergès et al.

    Sessile oak (Quercus Petraea Liebl.) site index variations in relation to climate, topography and soil in even-aged high-forest stands in northern France

    Ann. Forest Sci.

    (2005)
  • L. Bergès et al.

    Can understory vegetation accurately predict site index? A comparative study using floristic and abiotic indices in Sessile oak (Quercus Petraea Liebl.) stands in northern France

    Ann. Forest Sci.

    (2006)
  • G.J. Bouyoucos

    Hydrometer method improved for making particle size analyses of soils

    Agron. J.

    (1962)
  • M. Boydak

    Silvicultural characteristics and natural regeneration of Pinus brutia Ten—a review

    Plant Ecol.

    (2004)
  • L. Breiman et al.

    Classification and Regression Trees

    (1984)
  • M.E. Cerrato et al.

    Comparison of models for describing corn yield response to nitrogen-fertilizer

    Agron. J.

    (1990)