A performance comparison of machine learning methods to estimate the fast-growing forest plantation yield based on laser scanning metrics

https://doi.org/10.1016/j.compag.2015.07.004

Abstract

Machine learning models appear to be an attractive route towards tackling high-dimensional problems, particularly in areas where a lack of knowledge exists regarding the development of effective algorithms, and where programs must dynamically adapt to changing conditions. The objective of this study was to evaluate the performance of three machine learning tools for predicting stand volume of fast-growing forest plantations, based on statistical vegetation metrics extracted from an Airborne Laser Scanning (ALS) survey. The forests used in this study were composed of 1138 ha of commercial plantations that consisted of hybrids of Eucalyptus grandis and Eucalyptus urophylla, managed for pulp production. Three machine learning tools were implemented: neural network (NN), random forest (RF) and support vector regression (SV); and their performance was compared to a regression model (RM). The RF and the RM presented an RMSE in the leave-one-out cross-validation of 31.80 and 30.56 m3 ha−1, respectively. The NN and SV presented a higher RMSE than the others, equal to 64.44 and 65.30 m3 ha−1, respectively. The coefficient of determination and bias were similar across all modeling techniques. The ranking of ALS metrics based on their relative importance for the estimation of stand volume showed some differences. Rather than being limited to a subset of predictor variables, machine learning techniques explored the complete metrics set, looking for patterns between them and the dependent variable.

Introduction

Remote Sensing (RS) has been used as an efficient assessment tool to monitor large forest areas. RS techniques allow the retrieval of spatial data on features of the environment such as trees, roads, stream flow, and other objects located on the ground surface (Zhou et al., 2013). The available expertise in multi-spectral image acquisition, processing, and interpretation, together with its relatively low cost, has resulted in the widespread use of this method in forest monitoring activities (Prasad et al., 2011). However, multi-spectral RS encounters problems when assessing vertical information directly (i.e. incorporating a third dimension), since it performs less well in sensing structure under medium to high leaf area conditions. Radiometric interference from the surface, weather conditions, atmospheric turbidity, and the angle of solar incidence also present problems for multi-spectral RS (Pflugmacher et al., 2012, Proy et al., 1989, Stojanova et al., 2010).

Airborne Laser Scanning (ALS) has been employed to generate Digital Elevation Models (DEM) throughout the last 20 years (Montaghi et al., 2013). Due to its ability to penetrate the forest canopy, ALS technology has become the primary data source for characterizing vertical forest structure (White et al., 2013), and its use has expanded towards new applications such as monitoring vegetation. Based on Light Detection and Ranging (LiDAR) technology, this sensor provides horizontal and vertical information at high spatial resolution and high vertical accuracy that significantly increase our understanding of the real 3D structure of the forest (Næsset, 2004, Patenaude et al., 2004).

The large amount of data constrains the direct use of ALS as an input to the modeling of forest parameters. After collection, the raw ALS data must be reduced to a numerical representation by calculating spatial ALS metrics, which can then be used to create predictive equations for forest inventory attributes. The number of metrics can easily reach hundreds of variables, and the selection of these metrics remains an empirical process highly dependent on human intervention. After eliminating all but the most descriptive metrics, forest attributes are then estimated through statistical regression analyses that explore the correlation between field measurements and ALS metrics (Gleason and Im, 2012, Lefsky et al., 1999, Næsset, 1997, Næsset and Bjerknes, 2001, Nelson et al., 1988, Reutebuch et al., 2005, Zhao et al., 2009). Linking ALS metrics to field data is an effective method for estimating several forest attributes (e.g., stem volume, basal area, biomass) at the stand or regional level, but there remains a large set of assumptions and site-specific considerations that must be made (Zhao et al., 2011). In fact, a large number of variables could theoretically improve the precision of the models, but models with fewer variables are much easier to interpret (Murphy et al., 2010). It is thus important to develop parsimonious models, mainly because prediction models should be valid for general conditions, and degrees of freedom should not be unreasonably discarded. Finally, large sets of predictor variables often bear strong inter-correlations, which can lead to unstable predictions (Latifi et al., 2010).
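The reduction of raw returns to plot-level metrics described above can be illustrated with a minimal sketch. It is not the authors' processing chain: the metric names, the 2 m cover threshold, and the sample heights are assumptions chosen for illustration, and the input is assumed to be return heights already normalized to height above ground.

```python
import math

def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("no values")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def plot_metrics(heights, cover_threshold=2.0):
    """Reduce the return heights (m above ground) of one plot to a few metrics.

    The threshold of 2 m used for the cover ratio is an assumed value.
    """
    h = sorted(heights)
    n = len(h)
    mean = sum(h) / n
    var = sum((x - mean) ** 2 for x in h) / (n - 1)  # sample variance
    return {
        "h_mean": mean,
        "h_max": h[-1],
        "h_sd": math.sqrt(var),
        "h_p50": percentile(h, 50),
        "h_p90": percentile(h, 90),
        # Share of returns above the threshold, a simple canopy cover proxy.
        "cover": sum(1 for x in h if x > cover_threshold) / n,
    }

# Invented return heights for a single plot.
heights = [0.1, 0.3, 5.2, 12.4, 14.8, 15.1, 16.0, 17.3, 18.9, 20.0]
metrics = plot_metrics(heights)
```

Each plot yields one row of such metrics; a full metric set simply extends the dictionary with more percentiles, density ratios, and moments, which is how the number of candidate predictors grows into the hundreds.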

Three approaches have been used to select metrics and develop regression models with ALS data. One is to adjust models based on empirical relationships between field data and ALS metrics established by other studies (Zhao et al., 2011). Another is to determine the best relationship between ALS metrics and field data by optimizing a certain statistical measure (stepwise or exhaustive search) (Næsset, 1997, Patenaude et al., 2004). A third approach involves the use of multivariate statistical analysis, based on the assumption that multicorrelated ALS metrics should be used to estimate forest parameters (Valbuena et al., 2013).

In the context of remote sensing of forest structure via ALS data, studies have shown that the majority of estimation models are not only site- or species-specific, but also scale-dependent, which indicates that the models should be applied at a scale or pixel (grid cell) size commensurate with the plot size used in the model fitting (Næsset, 2002). Zhao et al. (2009) demonstrated that it was possible to reduce the effects of plot scale by using machine learning tools.

These machine learning tools are effective when producing software, since they can tackle high-dimensional problems, poorly understood domains that lack the knowledge needed to develop effective algorithms, and domains where programs must dynamically adapt to changing conditions (Mitchell, 1997). An additional advantage is that machine learning allows the user to implement a continuous learning process. Previous remote sensing studies have shown a superior or promising level of performance by artificial intelligence techniques over more classical methods (Atzberger, 2004, Durbha et al., 2007, Fang et al., 2003, Zhao et al., 2011, Zhao et al., 2008).

The primary objective of this study is to evaluate the performance of three machine learning tools (NN, RF, and SV) for stand volume estimation. The secondary objectives are: (1) to compare the performance of artificial intelligence (AI) tools to the usual regression model and (2) to assess the relative importance of ALS metrics through AI.

Material and methods

The study area is located in the State of São Paulo and is characterized by mountainous topography, with elevations ranging from 589 to 1294 m above sea level. The area covers approximately 1138 ha. The forest consists of a commercial plantation, the stock being hybrids of Eucalyptus grandis W. Hill ex Maiden and Eucalyptus urophylla S.T. Blake managed by Fibria SA to supply a pulp production company. The tree spacing is 3 × 2 m, resulting in 1666 plants per hectare, and stand ages range from 2 to 8 years.
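The planting density quoted above follows directly from the spacing, as this small sketch shows. The function name is hypothetical, and a regular rectangular grid with no gaps for roads or missing trees is assumed.

```python
def plants_per_hectare(row_spacing_m, plant_spacing_m):
    """Planting density implied by a regular rectangular spacing grid."""
    area_per_plant_m2 = row_spacing_m * plant_spacing_m
    return 10_000 / area_per_plant_m2  # 1 ha = 10,000 m^2

density = plants_per_hectare(3, 2)
print(int(density))  # 10,000 / 6 = 1666.67, truncated to 1666
```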

The

Results

The estimation based on artificial intelligence tools used all 104 metrics extracted from the ALS data sets to estimate stand volume, requiring neither metric selection nor reduction. The RF and the RM presented an RMSE of 31.80 m3 ha−1 and 30.56 m3 ha−1, respectively. The NN and SV showed a higher RMSE than RM and RF, equal to 64.44 and 65.30 m3 ha−1, respectively. The bias calculated for all techniques was approximately normally distributed, centered on zero with a small negative skew. Scatter plots of the
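The RMSE and bias reported above come from leave-one-out cross-validation. As a minimal sketch of that procedure, the example below uses a one-predictor least-squares line in place of the study's models; the data are invented, the function names are hypothetical, and the sign convention for bias (predicted minus observed) is an assumption.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loocv(xs, ys):
    """Return (rmse, bias) from leave-one-out cross-validation."""
    residuals = []
    for i in range(len(xs)):
        # Refit the model with plot i held out, then predict plot i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        residuals.append((a + b * xs[i]) - ys[i])  # predicted minus observed
    n = len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    bias = sum(residuals) / n
    return rmse, bias

# Invented numbers: a height metric (m) and stand volume (m3 ha-1) for six plots.
height_p90 = [12.0, 15.5, 18.0, 21.0, 24.5, 27.0]
volume = [95.0, 140.0, 180.0, 235.0, 290.0, 340.0]
rmse, bias = loocv(height_p90, volume)
```

Swapping `fit_line` for any other fitting routine leaves the loop unchanged, which is what allows different techniques to be compared on the same held-out errors.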

Discussion

This study examined the performance of three artificial intelligence tools for the area-based estimation of stand volume in eucalyptus stands of different ages. Their performance was compared to the traditional methodology, which uses regression models with variable reduction and selection.

The regression models require direct interaction and calibration from the modeler since these models are built upon relationships and assumptions that are not always repeated in other data sets. During

Conclusion

Rather than being limited to a subset of ALS metrics in attempting to explain as much variability in a dependent variable as possible, artificial intelligence tools explore the complete metrics set when looking for patterns between ALS metrics and stand volume. Through a machine learning approach, AI tools can readily be employed to deal with the problems inherent in estimating stand volume from an ALS data set. This approach can be implemented through software, continuously learning

Acknowledgments

This paper is part of the research program developed by GET-LiDAR. FIBRIA SA is acknowledged for their funding of this study regarding the laser and field data set acquisition. The Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, CAPES, supported this study via a student grant and ForEAdapt (FP7-PEOPLE-2009-IRSES) supported this work through a visit to SLU/Sweden during data processing. We would like to thank Simon Dunster for his review of our translation of this paper into

References (51)

  • E. Næsset et al. (2008). Estimation of above- and below-ground biomass across regions of the boreal forest zone using airborne laser. Remote Sens. Environ.
  • R. Nelson et al. (1988). Estimating forest biomass and volume using airborne laser data. Remote Sens. Environ.
  • M. Nilsson (1996). Estimation of tree heights and stand volume using an airborne lidar system. Remote Sens. Environ.
  • R. Özçelik et al. (2010). Estimating tree bole volume using artificial neural network models for four species in Turkey. J. Environ. Manage.
  • R. Özçelik et al. (2013). Estimating Crimean juniper tree height using nonlinear regression and artificial neural network models. Forest Ecol. Manage.
  • G.G. Parker et al. (2004). The canopy surface and stand development: assessing forest canopy structure and complexity with near-surface altimetry. Forest Ecol. Manage.
  • D. Pflugmacher et al. (2012). Using landsat-derived disturbance history (1972–2010) to predict current forest structure. Remote Sens. Environ.
  • C. Proy et al. (1989). Evaluation of topographic effects in remotely sensed data. Remote Sens. Environ.
  • P.R. Stephens et al. (2012). Airborne scanning LiDAR in a double sampling forest carbon inventory. Remote Sens. Environ.
  • D. Stojanova et al. (2010). Estimating vegetation height and canopy cover from remotely sensed data with machine learning. Ecol. Inform.
  • K. Zhao et al. (2011). Characterizing forest canopy structure with lidar composite metrics and machine learning. Remote Sens. Environ.
  • K. Zhao et al. (2009). Lidar remote sensing of forest biomass: a scale-invariant estimation approach using airborne lasers. Remote Sens. Environ.
  • J. Zhou et al. (2013). Mapping local density of young eucalyptus plantations by individual tree detection in high spatial resolution satellite images. Forest Ecol. Manage.
  • L. Breiman (2001). Random forests. Mach. Learn.
  • J.L. Clutter et al. (1983). Timber Management: A Quantitative Approach.