The CCB-ID approach to tree species mapping with airborne imaging spectroscopy
- Subject Areas
- Biogeography, Ecology, Data Mining and Machine Learning, Spatial and Geographic Information Science
- Keywords
- biogeography, species mapping, imaging spectroscopy, remote sensing, modeling, open source, CCB-ID
- Copyright
- © 2018 Anderson
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. The CCB-ID approach to tree species mapping with airborne imaging spectroscopy. PeerJ Preprints 6:e26972v1 https://doi.org/10.7287/peerj.preprints.26972v1
Abstract
Background. Biogeographers assess how species distributions and abundances affect the structure, function, and composition of ecosystems. Yet we face a major challenge: it is difficult to precisely map species across landscapes. Novel Earth observations could help overcome this challenge. Airborne imaging spectrometers measure plant functional traits at high resolution, and these measurements can be used to identify tree species. Plant traits are often highly conserved within species and highly variable between species, which provides the biophysical basis for species mapping. In this paper I describe CCB-ID, a trait-based approach to species identification with imaging spectroscopy, which was developed as part of a NIST-sponsored ecological data science evaluation (ECODSE).
Methods. These methods were developed using NEON airborne imaging spectroscopy data. CCB-ID classifies tree species using trait-based reflectance variation and decision tree-based machine learning models, approximating the morphological-trait and dichotomous-key approach traditionally used in botanical classification. First, outliers were removed using a spectral variance threshold. The remaining samples were transformed using principal components analysis (PCA) and resampled by species to reduce common species biases. Gradient boosting and random forest classifiers were trained using the transformed and resampled feature data. Prediction probabilities were then calibrated using sigmoid regression, and sample-scale predictions were averaged to the crown scale.
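The sequence of steps described above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic spectra, not the CCB-ID code itself: the 3-sigma variance threshold, the number of components, the per-species sample count, the averaging of the two classifiers, and all variable names are assumptions chosen for the example, and the models are applied back to their own training pixels only to keep the sketch short.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for pixel spectra (hypothetical sizes, not NEON data).
n_species, n_crowns, n_bands, px_per_crown = 3, 20, 50, 15
crown_species = np.arange(n_crowns) % n_species
crown_id = np.repeat(np.arange(n_crowns), px_per_crown)
y = crown_species[crown_id]
species_means = rng.normal(0.3, 0.1, size=(n_species, n_bands))
X = species_means[y] + rng.normal(0.0, 0.03, size=(y.size, n_bands))

# 1. Outlier removal: drop samples whose across-band variance is extreme.
#    The 3-sigma rule is an assumption; the source only names a variance threshold.
var = X.var(axis=1)
keep = np.abs(var - var.mean()) < 3 * var.std()
X, y, crown_id = X[keep], y[keep], crown_id[keep]

# 2. PCA transformation of the remaining spectra.
pca = PCA(n_components=10, random_state=0)
Xt = pca.fit_transform(X)

# 3. Resample to an equal number of samples per species (with replacement).
n_per_species = 100
idx = np.concatenate([
    rng.choice(np.flatnonzero(y == s), n_per_species, replace=True)
    for s in range(n_species)
])
Xr, yr = Xt[idx], y[idx]

# 4. Train both classifiers, each wrapped in sigmoid probability calibration.
models = [
    CalibratedClassifierCV(
        GradientBoostingClassifier(n_estimators=50, random_state=0),
        method="sigmoid", cv=3),
    CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=50, random_state=0),
        method="sigmoid", cv=3),
]
for model in models:
    model.fit(Xr, yr)

# 5. Average the two models' pixel probabilities (an assumption), then
#    average pixel-scale probabilities within each crown.
pixel_prob = np.mean([m.predict_proba(Xt) for m in models], axis=0)
crowns = np.unique(crown_id)
crown_prob = np.vstack([pixel_prob[crown_id == c].mean(axis=0) for c in crowns])

print(crown_prob.shape)  # one row of species probabilities per crown
```

Averaging probabilities rather than taking a majority vote keeps each crown's output as a probability distribution over species, which is what a cross-entropy metric requires.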
Results. This approach performed well according to the competition metrics, receiving a rank-1 accuracy score of 0.919, and a cross-entropy cost score of 0.447 on the test data. Accuracy and specificity scores were high for all species, but precision and recall scores were variable for rare species. PCA transformation improved accuracy scores compared to models trained using reflectance data, but outlier removal and data resampling exacerbated class imbalance problems.
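For readers unfamiliar with the two competition metrics, the sketch below shows how rank-1 accuracy and cross-entropy cost are commonly computed from crown-scale probabilities. It assumes the standard definitions (fraction of crowns whose highest-probability species is correct, and mean negative log probability assigned to the true species); the exact formulas used by the competition may differ in detail, and the function names and toy values are hypothetical.

```python
import numpy as np

def rank1_accuracy(prob, true_idx):
    """Fraction of crowns whose highest-probability species is the true one."""
    return float(np.mean(np.argmax(prob, axis=1) == true_idx))

def cross_entropy(prob, true_idx, eps=1e-15):
    """Mean negative log of the probability assigned to the true species."""
    p_true = prob[np.arange(len(true_idx)), true_idx]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

# Toy example: 4 crowns, 3 species (values are illustrative only).
prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
])
true_idx = np.array([0, 1, 1, 2])

acc = rank1_accuracy(prob, true_idx)
cost = cross_entropy(prob, true_idx)
print(acc, cost)
```

Higher rank-1 accuracy is better, while lower cross-entropy is better; the latter penalizes confident wrong predictions, which is why probability calibration matters for that score.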
Discussion. CCB-ID accurately classified tree species using NEON imaging spectroscopy data, achieving the best classification scores among participants. However, it failed to overcome several well-known species mapping challenges, such as precisely identifying rare species. Key takeaways include: (1) training models to maximize metrics beyond accuracy (e.g. recall) could improve rare species predictions; (2) within-genus trait variation may drive spectral separability, precluding efforts to distinguish between functionally convergent species; (3) outlier removal and data resampling exacerbated class imbalance problems, and should be carefully implemented; (4) PCA transformation greatly improved model results; and (5) feature selection could further improve species classification models. CCB-ID is open source, designed for use with NEON data, and available to support future species mapping efforts.
Author Comment
This is a submission to PeerJ for review.
Supplemental Information
CCB-ID prediction probability performance
Per-species secondary performance metrics from the test data. These metrics were calculated using the prediction probability confusion matrix reported in Table S1. Low specificity scores for Pinus palustris, which do not appear in the binary classification results (Fig. 3, main text), reflect how it was frequently predicted at higher probabilities as a minority class.
Confusion matrix of prediction probability results
Prediction probability results of the CCB-ID model using the competition test data. Each cell contains the sum of prediction probabilities from all observed crowns per species. These data were used to generate Figure S1.
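A confusion matrix built from summed prediction probabilities, and the per-species metrics derived from it, might be computed as in the sketch below. This is an illustration under assumptions: rows are taken to be observed species and columns predicted species, the metric definitions are the usual one-vs-rest formulas, and the function names and toy values are hypothetical rather than taken from the CCB-ID code.

```python
import numpy as np

def soft_confusion(prob, true_idx, n_classes):
    """Sum each crown's probability vector into the row of its observed species."""
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, true_idx, prob)  # handles repeated row indices correctly
    return cm

def per_species_metrics(cm):
    """One-vs-rest metrics from a (possibly fractional) confusion matrix."""
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp   # observed as species i, predicted as another
    fp = cm.sum(axis=0) - tp   # predicted as species i, observed as another
    tn = total - tp - fn - fp
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Toy example: 4 crowns, 3 species (values are illustrative only).
prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
])
true_idx = np.array([0, 1, 1, 2])

cm = soft_confusion(prob, true_idx, n_classes=3)
metrics = per_species_metrics(cm)
for name, values in metrics.items():
    print(name, np.round(values, 3))
```

Because every crown contributes fractional probability to every column, a species can accumulate false positives here without ever being the top-ranked prediction, which is one way specificity can look lower in a probability-based matrix than in binary classification results.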