Comparison and ranking of different modelling techniques for prediction of site index in Mediterranean mountain forests
Introduction
In forestry, accurate estimation of site productivity is crucial for sound forest resource management (Seynave et al., 2005). Productivity depends strongly on site quality (i.e. the combined physical and biotic factors present at a given location). Forest research has a long tradition of studying how biotic and abiotic characteristics such as climate, topography, soil and vegetation affect site productivity (e.g., Amen, 1945). To estimate forest site quality, foresters face the problem of integrating all these site factors. Moreover, the forest itself is an important site-forming factor, so only approximations are possible unless forest and site are treated as a complex, interrelated ecosystem (Spurr and Barnes, 1980). Because of this complexity, in most areas of Europe and North America forest site quality has been derived only empirically, from the species-specific dominant height of an even-aged stand of known age, rescaled to a reference age; this quantity is termed the site index (SI) (Fontes et al., 2003).
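As a minimal sketch of what "rescaled to a reference age" can mean in practice, the Python snippet below projects a measured dominant height to a reference age along an anamorphic Chapman–Richards guide curve. The curve form, the reference age of 50 years and the shape parameters `k` and `c` are illustrative assumptions, not the height-growth model used in this study.

```python
import math


def growth_shape(age: float, k: float, c: float) -> float:
    """Chapman-Richards shape term (1 - exp(-k * age)) ** c."""
    return (1.0 - math.exp(-k * age)) ** c


def site_index(dominant_height: float, age: float, ref_age: float = 50.0,
               k: float = 0.03, c: float = 1.3) -> float:
    """Project dominant height (m) measured at `age` (years) to `ref_age`
    along an anamorphic guide curve. ref_age, k and c are hypothetical."""
    if age <= 0 or ref_age <= 0:
        raise ValueError("ages must be positive")
    # Anamorphic curves share one shape, so height scales by the ratio of shape terms.
    return dominant_height * growth_shape(ref_age, k, c) / growth_shape(age, k, c)


# A stand with 18 m dominant height at 40 years, projected to the reference age.
print(round(site_index(18.0, 40.0), 1))
```

Because all curves in an anamorphic family share the same shape, a stand measured exactly at the reference age keeps its height as its site index, and younger stands are projected upward.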
For several applications, however, site index cannot be measured directly, e.g. in mixed, uneven-aged stands, when converting a stand to another tree species, when afforesting non-forested land, or when site conditions have changed over time. By linking dominant height to environmental variables (Corona et al., 1998, Curt et al., 2001), landscape characteristics (Iverson et al., 1997) and understory vegetation data (Bergès et al., 2006), site quality can be estimated at unmonitored sites. Most early site studies predicted forest growth from one or a few environmental variables that could be measured in the field relatively easily and at low cost. Several studies have tried to model site index by coupling age and tree height measurements to abiotic site properties, with mixed success (see e.g., Corona et al., 1998, Chen et al., 2002, Bergès et al., 2005). Many of these models had low accuracy and high variability (Kayahara et al., 1998, Curt et al., 2001).
Linear regression is one of the oldest and most widely used statistical techniques for modelling site quality because it is easy to use and straightforward to interpret (Curt et al., 2001, Seynave et al., 2005). Although it is a powerful approach when appropriately applied, many ecological relations are non-linear, data often show non-constant variance, and many explanatory variables are collinear. As a consequence, linear regression may be inappropriate or may leave a large proportion of the variation unexplained (Guisan et al., 2002).
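To make the collinearity problem concrete, the sketch below fits an MLR by ordinary least squares and computes variance inflation factors (VIF) with NumPy. The predictors (elevation, a temperature proxy derived from it, soil organic matter) and all data are synthetic and hypothetical; the common rule of thumb that VIF values above roughly 5 to 10 signal problematic collinearity is a general convention, not a criterion taken from this study.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def vif(X: np.ndarray) -> np.ndarray:
    """VIF of each column: 1 / (1 - R^2_j), where R^2_j comes from
    regressing column j on all other columns."""
    out = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        coef = fit_mlr(others, X[:, j])
        fitted = np.column_stack([np.ones(len(X)), others]) @ coef
        ss_res = np.sum((X[:, j] - fitted) ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = ss_tot / ss_res  # algebraically equal to 1 / (1 - R^2_j)
    return out


# Synthetic, hypothetical data: temperature is almost a linear copy of elevation.
rng = np.random.default_rng(0)
n = 200
elevation = rng.uniform(350, 2200, n)                          # m a.s.l.
temperature = 20 - 0.006 * elevation + rng.normal(0, 0.2, n)   # nearly collinear
organic_matter = rng.uniform(1, 8, n)                          # %
X = np.column_stack([elevation, temperature, organic_matter])
si = 30 - 0.005 * elevation + 0.8 * organic_matter + rng.normal(0, 1.5, n)

print(fit_mlr(X, si))
print(vif(X))  # large values for elevation and temperature, about 1 for organic matter
```

The inflated VIFs show why coefficients for elevation and temperature become unstable when both enter the same linear model, even though the fitted values may still look reasonable.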
More recently, the development of more advanced non-parametric and machine learning techniques and the growing availability of high-resolution geodatasets have opened up many opportunities to predict forest site quality with greater accuracy. Despite their flexibility in accounting for non-linear relationships, these techniques are more vulnerable to overfitting, i.e. fitting noise, which results in unstable regression coefficients (Harrell et al., 1996, Guisan and Thuiller, 2005). In addition, implementing these models, integrating them with other software and interpreting them can become complicated, and these costs should be weighed against the gain in accuracy and precision.
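Overfitting is usually exposed with k-fold cross-validation. In the hedged sketch below, polynomial degree stands in for model flexibility (it is not GAM, ANN or BRT itself): on synthetic data with a linear signal, a degree-12 polynomial always achieves a training error no larger than a straight line, but typically shows a much larger cross-validated error. The data, the number of folds and the degrees are assumptions chosen only for illustration.

```python
import numpy as np


def kfold_rmse(x: np.ndarray, y: np.ndarray, degree: int, k: int = 5, seed: int = 0) -> float:
    """Pooled out-of-fold RMSE of a polynomial fit of the given degree."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    sq_err = []
    for test in folds:
        train = np.setdiff1d(idx, test)           # all indices not in this fold
        coef = np.polyfit(x[train], y[train], degree)
        sq_err.append((np.polyval(coef, x[test]) - y[test]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_err))))


def train_rmse(x: np.ndarray, y: np.ndarray, degree: int) -> float:
    """RMSE on the same data used for fitting (optimistic by construction)."""
    coef = np.polyfit(x, y, degree)
    return float(np.sqrt(np.mean((np.polyval(coef, x) - y) ** 2)))


# Synthetic data: linear signal plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 30)
y = 2.0 * x + rng.normal(0, 1.0, 30)

for d in (1, 12):
    print(d, round(train_rmse(x, y, d), 2), round(kfold_rmse(x, y, d), 2))
```

Comparing techniques on cross-validated rather than training error is what keeps a flexible model from being rewarded for fitting noise.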
Non-parametric and machine learning techniques that may be better suited to address the problems of linear regression mentioned above should be identified and their performance compared. In the domain of forest site quality assessment, Mckenney and Pedlar (2003) successfully used classification and regression trees (CART) to model site index from environmental variables for two boreal tree species in Canada. Moisen and Frescino (2002) compared the performance of non-parametric techniques such as CART, generalized additive models (GAM) and artificial neural networks (ANN) with that of parametric techniques for predicting several species-independent forest characteristics in the Interior Western United States. Wang et al. (2005) also evaluated these techniques for the spatial prediction of site index of Lodgepole pine in Canada. Both studies concluded that these non-parametric approaches can be more effective predictors. Boosted regression trees (BRT), an extension of CART, is a promising technique used in ecological research on species distributions and appears to be a powerful tool for many kinds of ecological modelling (Leathwick et al., 2006, Guisan et al., 2006, Elith et al., 2008). Recently, several software programs have been developed that incorporate many of these techniques for predicting species distributions (Thuiller et al., 2009). Yet no such tool exists for predicting continuous response variables such as site index. Many studies have concluded that there is no generally best modelling technique; depending on the scope and goal of a study, some techniques will be better suited than others. This study aims to provide insight into the strengths of the different techniques for modelling site index.
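The core idea of BRT can be sketched as least-squares gradient boosting: many small regression trees are fitted sequentially, each to the residuals of the current ensemble, and added with a shrinkage factor (learning rate). The minimal NumPy version below uses single-split trees (stumps, i.e. a tree complexity of 1, which cannot represent interactions between predictors) and omits the stochastic subsampling used in common BRT implementations. The function names, settings and synthetic data are hypothetical, not the configuration used in this study.

```python
import numpy as np


def fit_stump(X, r):
    """Best single split (feature, threshold, left mean, right mean) minimising the SSE of r."""
    best = (np.inf, 0, 0.0, r.mean(), r.mean())   # fallback: predict the mean everywhere
    n = len(r)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        csum, csq = np.cumsum(rs), np.cumsum(rs ** 2)
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue  # cannot split between tied values
            left_sum, right_sum = csum[i - 1], csum[-1] - csum[i - 1]
            sse = csq[-1] - left_sum ** 2 / i - right_sum ** 2 / (n - i)
            if sse < best[0]:
                thr = 0.5 * (xs[i] + xs[i - 1])
                best = (sse, j, thr, left_sum / i, right_sum / (n - i))
    return best[1:]


def predict_stump(stump, X):
    j, thr, left, right = stump
    return np.where(X[:, j] <= thr, left, right)


def fit_brt(X, y, n_trees=200, learning_rate=0.1):
    """Fit stumps to the current residuals and add them with shrinkage."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(X, y - pred)
        pred += learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps


def predict_brt(model, X):
    f0, lr, stumps = model
    pred = np.full(len(X), f0)
    for s in stumps:
        pred += lr * predict_stump(s, X)
    return pred


# Synthetic example: a smooth effect of one predictor and a threshold effect of another.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(300, 2))
y = 10 * np.sin(3 * X[:, 0]) + 5 * (X[:, 1] > 0.5) + rng.normal(0, 0.5, 300)
model = fit_brt(X, y)
mse = np.mean((predict_brt(model, X) - y) ** 2)
print(round(float(mse), 3))
```

A small learning rate combined with many trees is the usual trade-off in boosting: each tree changes the fit only slightly, which tends to generalise better than a few large steps.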
There is no single definitive test to evaluate models, and many measures of predictive performance have been formulated (Guisan and Zimmermann, 2000, Moisen and Frescino, 2002, Wang et al., 2005). Moreover, other factors such as the ecological interpretability or the user-friendliness of a technique can be important in making a final evaluation and ranking of site index modelling techniques (Maggini et al., 2006). Multi-criteria decision analysis (MCDA) is a family of commonly used methodologies to assist in complex decision-making situations, as it allows multiple criteria expressed in incommensurate units (i.e. a combination of quantitative and qualitative criteria) to be considered together to provide a final ranking of alternative decisions (Herath, 2004, Mendoza and Martins, 2006). The aim of this study is to compare and evaluate two statistical non-parametric techniques (GAM, CART), one machine learning technique (ANN) and one hybrid modelling technique (BRT) for modelling site index. Although not expected to provide the best performance, multiple linear regression (MLR) is included in this study for its straightforward interpretability and as a benchmark against which the other techniques can be compared. Each method is used to model site index in homogeneous stands of three important tree species of the Taurus Mountains (Turkey): Pinus brutia Ten. (Calabrian pine), Pinus nigra ssp. pallasiana (Arnold) K. Richt (Crimean pine) and Cedrus libani A. Rich. (Lebanon cedar). The specific objectives of this study are:
- (1) to compare the modelling techniques with respect to their predictive performance;
- (2) to rank the modelling techniques according to predictive performance and user-oriented criteria, including user-friendliness and ecological interpretability.
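The MCDA method used for the final ranking is not specified in this excerpt, so the sketch below shows only the simplest member of the family, a weighted sum of min-max normalised criterion scores, with cost criteria (such as RMSE, where lower is better) flipped before weighting. The alternatives "A", "B", "C", their scores and the weights are invented for illustration; they are not results of this study.

```python
import numpy as np


def weighted_sum_rank(scores, weights, benefit):
    """Rank alternatives (rows) on criteria (columns) by a weighted sum of
    min-max normalised scores. benefit[j] is True if higher is better."""
    s = np.asarray(scores, dtype=float)
    lo, hi = s.min(axis=0), s.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero for constant criteria
    norm = (s - lo) / span                       # each criterion rescaled to [0, 1]
    norm = np.where(benefit, norm, 1.0 - norm)   # flip cost criteria so 1 is always best
    w = np.asarray(weights, dtype=float)
    total = norm @ (w / w.sum())
    return total, np.argsort(-total)             # indices from best to worst


names = ["A", "B", "C"]
# Hypothetical columns: RMSE of SI (m, lower is better),
# ecological interpretability (1-5), user-friendliness (1-5).
scores = [[2.1, 4, 5],
          [1.8, 4, 3],
          [1.6, 2, 2]]
total, order = weighted_sum_rank(scores, weights=[0.4, 0.3, 0.3],
                                 benefit=[False, True, True])
print([names[i] for i in order], np.round(total, 2))
```

Normalisation is what makes incommensurate units (metres of error versus ordinal ratings) comparable; changing the weights expresses a different user priority and can change the ranking.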
Study area
The study area (55 000 ha) covers the Ağlasun forest district (37°33′N, 30°32′E, 350–2200 m above sea level) in southern Anatolia, Turkey. The region has a cold, sub-humid Mediterranean climate with pronounced winter precipitation and summer drought (Paulissen et al., 1993). Limestone is the predominant parent material; conglomerates and sandstones also occur locally. Soil depth, moisture regime and stoniness vary with topography. Most soils can be classified as leptosols, regosols or
Results
A total of 15 SI models were built, using five modelling techniques for each of the three species. All models were critically examined for confounding factors and collinearity between explanatory variables, and checked to ensure that all basic assumptions were met. The three species clearly differ in their site requirements, as expressed by the different models (Table 3). Only easting and soil organic matter content appear to be common predictors of site index for all species, whereas P. brutia
Predictive performance
Based on our data, the non-parametric techniques outperform MLR for predicting site index. Only CART performed worse than MLR for all species, as Moisen and Frescino (2002) also observed when predicting other forest characteristics. Leathwick et al. (2006) concluded from their study on modelling demersal fish species richness that, owing to its capability to fit interactions among predictor variables, BRT appears to offer considerable performance gains over techniques such as GAM.
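The exact set of performance measures used in this study is not listed in this excerpt. As a hedged sketch, the function below computes several common ones on observed versus predicted site index: root mean squared error (RMSE), mean bias, mean absolute error (MAE) and the Nash–Sutcliffe model efficiency (NSE), which compares the squared error with the variance of the observations. In a model comparison these would be computed on held-out (e.g. cross-validated) predictions; the function name and example values are hypothetical.

```python
import numpy as np


def performance(observed, predicted) -> dict:
    """Common accuracy measures for a continuous response such as site index."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    rmse = float(np.sqrt(np.mean(err ** 2)))
    bias = float(np.mean(err))             # > 0 means overestimation on average
    mae = float(np.mean(np.abs(err)))
    # Nash-Sutcliffe efficiency: 1 = perfect fit, 0 = no better than the observed mean.
    nse = float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2))
    return {"rmse": rmse, "bias": bias, "mae": mae, "nse": nse}


# Hypothetical observed and predicted site index values (m).
print(performance([10, 12, 14, 16], [11, 12, 13, 17]))
```

Reporting bias alongside RMSE is useful because two models with similar RMSE can differ in whether they systematically over- or underestimate site index.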
Conclusions
Five modelling techniques were compared and evaluated for predicting the site index of three tree species in the Taurus Mountains of Turkey. Based on a multi-criteria decision analysis that simultaneously evaluated ‘predictive performance’, ‘ecological interpretability’ and ‘user-friendliness’ of the models, GAM is the preferred technique for modelling site index of these species. BRT is a good second choice if the ecological interpretability of the technique is of high importance. When
Acknowledgements
This research was supported by the Research Fund K.U. Leuven (OT/07/046) and a Concerted Action of the Flemish Government (GOA 02/2) in the framework of the Sagalassos Archaeological project. Special thanks go to the Turkish Forest Administration offices at Burdur and Ağlasun for granting access permission to the forests and for providing valuable forest inventory data.
References (62)
Species distribution models and ecological theory: a critical assessment and some possible new approaches. Ecol. Model. (2007).
Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecol. Model. (2002).
Regeneration of Lebanon Cedar (Cedrus libani A. Rich.) on karstic lands in Turkey. Forest Ecol. Manage. (2003).
Relationship between environmental factors and site index in Douglas-fir plantations in central Italy. Forest Ecol. Manage. (1998).
Predicting site index of Douglas-fir plantations from ecological variables in the Massif Central area of France. Forest Ecol. Manage. (2001).
Hydrotest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environ. Model. Softw. (2007).
Estimating stone and boulder content in forest soils—evaluating the potential of surface penetration methods. Catena (1996).
Elevation and exposition rather than soil types determine communities and site suitability in Mediterranean mountain forests of southern Anatolia, Turkey. Forest Ecol. Manage. (2007).
Comparing multiple criteria decision methods to extend a geographical information system on afforestation. Comput. Electron. Agric. (2005).
Generalized linear and generalized additive models in studies of species distributions: setting the scene. Ecol. Model. (2002).
Predictive habitat distribution models in ecology. Ecol. Model.
Incorporating community objectives in improved wetland management: the use of the analytic hierarchy process. J. Environ. Manage.
Testing site index site-factor relationships for predicting Pinus contorta and Picea engelmannii x P. glauca productivity in central British Columbia, Canada. Forest Ecol. Manage.
GRASP: generalized regression analysis and spatial prediction. Ecol. Model.
Spatial models of site index based on climate and soil properties for two boreal tree species in Ontario, Canada. Forest Ecol. Manage.
Multi-criteria decision analysis in natural resource management: a critical review of methods and new modelling paradigms. Forest Ecol. Manage.
Predicting tree species presence and basal area in Utah: a comparison of stochastic gradient boosting, generalized additive models, and tree-based methods. Ecol. Model.
Comparing five modelling techniques for predicting forest characteristics. Ecol. Model.
River flow forecasting through conceptual models 1: a discussion of principles. J. Hydrol.
Application of multi-criteria decision making to sustainable energy planning—a review. Renew. Sustain. Energy Rev.
Near-surface soil characteristics and understory plants as predictors of Pinus contorta site index in Southwestern Alberta, Canada. Forest Ecol. Manage.
Evaluation of spatial predictions of site index obtained by parametric and nonparametric methods—a case study of Lodgepole pine productivity. Forest Ecol. Manage.
Organic carbon: Walkley–Black method.
Carbonate: volumetric calcimeter method.
Prediction of site index for Yellow-poplar from soil and topography. J. Forest.
Sessile oak (Quercus Petraea Liebl.) site index variations in relation to climate, topography and soil in even-aged high-forest stands in northern France. Ann. Forest Sci.
Can understory vegetation accurately predict site index? A comparative study using floristic and abiotic indices in Sessile oak (Quercus Petraea Liebl.) stands in northern France. Ann. Forest Sci.
Hydrometer method improved for making particle size analyses of soils. Agron. J.
Silvicultural characteristics and natural regeneration of Pinus brutia Ten—a review. Plant Ecol.
Classification and Regression Trees.
Comparison of models for describing corn yield response to nitrogen-fertilizer. Agron. J.