Earth observation based indication for avian species distribution models using the spectral trait concept and machine learning in an urban setting

https://doi.org/10.1016/j.ecolind.2019.106029

Highlights

  • New methodology to create species distribution models (SDM) for bird species.

  • Random forest is the most suitable machine learning technique for the SDMs.

  • Texture metrics are the most important indicator describing bird-breeding ranges.

  • Medium to high accuracies (59–90%) for 44 bird species in an urban setting.

  • Repeatable and cost-effective methods for deriving high-resolution SDMs.

Abstract

Birds respond strongly to vegetation structure and composition, yet typical species distribution models (SDMs) that incorporate Earth observation (EO) data use discrete land-use/cover data to model habitat suitability. Because this approach neglects the internal spatial composition and heterogeneity captured in EO data, we propose a novel scheme that derives continuous indicators of vegetation heterogeneity from high-resolution EO data.

The deployed concepts encompass vegetation fractions for determining vegetation density and spectral traits for quantifying vegetation heterogeneity. Both indicators are derived from RapidEye data and are therefore spatially continuous at a resolution of 6.5 m. Using these indicators as predictors, derived from a single EO image, we model breeding bird habitats for the city of Leipzig, Germany, with a random forest (RF) classifier.

SDMs are trained for the breeding sites of 44 urban bird species, featuring medium to very high accuracies (59–90%). Analysing similarities between the models regarding variable importance of single predictors allows species groups to be determined based on their preferences and dependencies regarding the amount of vegetation and its spatial and structural heterogeneity. When combining the SDMs, models of urban bird species richness can be derived.

Combining high-resolution EO data with the RF machine learning technique yields very detailed insights into the ecology of the urban avifauna. This opens up opportunities to optimise greenspace management schemes or urban development in densifying cities with regard to overall bird species richness or single species under threat of local extinction.

Introduction

Modelling species-specific potential breeding sites can be an integral part of urban, peri-urban and non-urban biodiversity studies and conservation strategies (Guisan and Thuiller, 2005). The urban environment is especially rich in birds, often surpassing its rural surroundings in terms of biomass and diversity (Chace and Walsh, 2006). Identifying and protecting breeding sites is key to sustaining viable populations of single species under threat and to increasing overall species richness and abundance. A core factor determining where birds breed is vegetation structure, such as vegetation density and diversity (Paker et al., 2014). Earth observation datasets provide a cost-effective, reproducible and straightforward means of analysing such vegetation parameters.

Satellite-derived information has been widely used to predict species richness, diversity and turnover in a variety of kingdoms (Rocchini et al., 2010, Rocchini et al., 2017). While the analysis of such diversity parameters is valuable (Rocchini et al., 2010), those analyses lack species-specific information. For multiple use cases such as species protection measures or environmental impact assessments, species distribution models (SDMs) are needed (Guisan and Thuiller, 2005). However, existing SDMs have two major shortcomings: one concerns the modelling technique, the other the characteristics of the input data.

Regarding modelling techniques, studies often use regression (Bino et al., 2008, Warton et al., 2015). Because of the assumptions inherent to most regression methods, problems such as collinearity between predictors, outliers, or non-linear and exponential relationships may result in poor model performance (Rousseeuw and Leroy, 2005, Dormann et al., 2013). To overcome these limitations, a more flexible machine learning approach seems favourable for SDMs. One particularly robust and well-established procedure in ecology and EO studies is the RF algorithm (Cutler et al., 2007, Belgiu and Drăguţ, 2016), an ensemble learning method consisting of a multitude of decision trees (Breiman, 2001). RFs can handle highly collinear predictors, both quantitative (numeric) and qualitative (non-numeric), with all kinds of variable interactions, which often makes them superior to regression.
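The ensemble idea behind RFs can be sketched in a few lines of Python. This is a toy illustration only, not the implementation used in the study: it grows depth-one trees (stumps) on bootstrap samples with a random feature per tree and combines them by majority vote. The function names (`fit_stump`, `fit_forest`, `predict`), the two predictors and all values are assumptions made for this example; a real analysis would use an established RF library with fully grown trees.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Fit a depth-one tree: the single threshold split with the fewest errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Grow each stump on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # sample with replacement
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the ensemble."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical predictors per site: [vegetation fraction, NDVI texture]
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30],
     [0.80, 0.90], [0.90, 0.70], [0.85, 0.80]]
y = [0, 0, 0, 1, 1, 1]  # 0 = absence, 1 = presence (toy labels)

forest = fit_forest(X, y)
print(predict(forest, [0.05, 0.05]), predict(forest, [0.95, 0.95]))
```

Because every tree sees a different bootstrap sample and feature, single trees may err, while the vote across trees is more stable; this is the property that makes the ensemble robust to correlated predictors.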

Input data are often inadequate because many models use classified, discrete land-use/cover data (Falcucci et al., 2007). This entails two important pitfalls: first, the loss of information on the internal heterogeneity within a land-use/cover class and, second, the loss of transition zones between classes through sharp boundaries (Palmer et al., 2002, Lausch et al., 2015). However, transition zones and internal heterogeneity are key factors for bird species’ distribution (He et al., 2015).

Urban environments are dynamic and complex and, within them, sites of high biodiversity can be found next to intensely managed ones (Haase et al., 2014, Knapp et al., 2017). This species richness, however, seems to be in danger, as recent reports state that multiple species in Europe (Bowler et al., 2019) and also in Germany (Gedeon et al., 2004) are in rapid decline. This trend is especially apparent for bird species breeding in urban and agricultural settings, which show the most rapid decline of all habitat types considered (Gedeon et al., 2004). Since the case study area of this paper, the city of Leipzig, Germany, is characterised by a dense centre with vast parks as well as a large natural forest and fertile agrarian surroundings, it is an ideal case study for developing models for those endangered species groups and also for the large group of forest birds (Wellmann et al., 2018).

Urban ornithological studies show that even small patches of vegetation can serve as viable breeding sites (Ikin et al., 2013) and that birds respond to both vegetation composition and configuration (Chace and Walsh, 2006). Hence, complex urban settings such as the city of Leipzig call for data of high spatial resolution, as provided by the RapidEye satellite fleet (Tigges et al., 2013). From such high-resolution EO data, various plant characteristics can be analysed by using the spectral traits approach (Lausch et al., 2016). This spectral trait framework, introduced by Lausch et al. (2016), builds on the traits framework (Kattge et al., 2011) by incorporating those plant traits that are detectable by EO-based techniques. The spectral traits concept hence includes biochemical, biophysical, physiological, structural, phenological or functional characteristics of plants, populations and communities (Kattge et al., 2011, Lausch et al., 2016).

The spectral trait concept is a functional approach in which every plant trait corresponds to a function that is relevant for (i) the plant and (ii) the larger ecosystem (Violle et al., 2007). Therefore, the spectral traits approach is an efficient interface linking EO data to key ecosystem characteristics, functions and services (Lausch et al., 2016), which in turn could be linked to the breeding behaviour of bird species.

One way of analysing the spatial diversity of spectral traits in a plant community is by quantifying the composition and configuration of a plant trait related product, e.g., the Normalized Difference Vegetation Index (NDVI), in space and over time (Wellmann et al., 2018). For this, texture measures following Haralick et al. (1973), such as those derived from the grey level co-occurrence matrix (GLCM), are powerful and well-established methods, which St-Louis et al. (2009) used to predict bird species diversity.
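As a rough illustration of this idea, the following Python sketch computes the NDVI from near-infrared and red reflectance, quantises it into grey levels, builds a GLCM for one pixel offset, and derives two Haralick-type texture metrics (contrast and homogeneity). The window size, the number of grey levels, the horizontal offset and the reflectance values are all assumptions chosen for the example; this excerpt does not state the parameters used in the study.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel."""
    return [[(n - r) / (n + r) if (n + r) != 0 else 0.0
             for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir, red)]


def quantize(image, levels, lo=-1.0, hi=1.0):
    """Map continuous values in [lo, hi] to integer grey levels 0..levels-1."""
    scale = levels / (hi - lo)
    return [[min(levels - 1, int((v - lo) * scale)) for v in row] for row in image]


def glcm(q, levels, dx=1, dy=0):
    """Symmetric, normalised co-occurrence matrix for one pixel offset.

    Assumes the window is larger than the offset, so at least one pixel pair exists.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(q), len(q[0])
    for y in range(rows):
        for x in range(cols):
            yy, xx = y + dy, x + dx
            if 0 <= yy < rows and 0 <= xx < cols:
                a, b = q[y][x], q[yy][xx]
                counts[a][b] += 1
                counts[b][a] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """High when neighbouring pixels differ strongly in grey level."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """High when neighbouring pixels have similar grey levels."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


LEVELS = 8  # assumed number of grey levels

# Toy 3 x 3 windows: one uniform canopy, one vegetation/bare-ground checkerboard.
red = [[0.1] * 3 for _ in range(3)]
nir_uniform = [[0.5] * 3 for _ in range(3)]
nir_mixed = [[0.5, 0.1, 0.5], [0.1, 0.5, 0.1], [0.5, 0.1, 0.5]]

p_uniform = glcm(quantize(ndvi(nir_uniform, red), LEVELS), LEVELS)
p_mixed = glcm(quantize(ndvi(nir_mixed, red), LEVELS), LEVELS)
print(contrast(p_uniform), homogeneity(p_uniform))
print(contrast(p_mixed), homogeneity(p_mixed))
```

The uniform window yields zero contrast and maximal homogeneity, whereas the checkerboard window yields high contrast and low homogeneity, which is the kind of continuous heterogeneity signal that discrete land-cover classes cannot express.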

Consequently, combining high-resolution satellite data with machine learning techniques can create novel and detailed insights into the ecology of urban birds and their habitats. Since there is no established framework for modelling bird-breeding sites based on continuous spectral EO data, this paper seeks to develop a corresponding methodology to predict the breeding sites of urban bird species. The following research questions guide the development:

  • (i) Are fractional vegetation cover and spectral plant traits meaningful indicators for the prediction of breeding sites for species in the urban environment?

  • (ii) What are suitable modelling techniques?

  • (iii) How accurate are SDMs solely derived from EO data?

  • (iv) How do the SDMs help to predict bird species richness?
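Regarding question (iv), one common way to move from single-species models to richness is to stack the binary presence predictions of all species and count them per grid cell. The sketch below illustrates that idea in Python; this excerpt does not specify how the authors combined their SDMs, so the stacking approach, the function name `stacked_richness`, the placeholder species names and the grid values are all assumptions.

```python
def stacked_richness(presence_maps):
    """Sum binary presence grids cell by cell to get predicted species richness."""
    grids = list(presence_maps.values())
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [[sum(g[r][c] for g in grids) for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical 2 x 2 prediction grids (1 = predicted breeding presence).
presence_maps = {
    "species_a": [[1, 0], [1, 1]],
    "species_b": [[1, 0], [0, 1]],
    "species_c": [[0, 0], [1, 1]],
}

richness = stacked_richness(presence_maps)
print(richness)  # [[2, 0], [2, 3]]
```

Each cell of the result is the number of species predicted to breed there, so cells with high counts mark candidate richness hotspots.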

Section snippets

Study area

Leipzig is a dense city in Eastern Germany located at 51°20′N, 12°22′E with 560,000 inhabitants. The city houses a considerable number of natural biotopes and breeding-bird species richness is comparably high (Fig. 1). Almost 40% of all bird species breeding in Germany (n = 314) can also be found in Leipzig (n = 120) (StUfa, 1995, Völkl et al., 2004). Important breeding grounds are located along a north to south transect in the large remnants of the alluvial forest on the floodplains. This

Data and methods

To model the presence and absence of 44 breeding bird species, we propose a new methodology that only uses a single RapidEye EO data set (Fig. 2). The EO-based methodology builds on fractional vegetation cover, the NDVI and a principal component analysis (PCA). We then use these products to calculate indicators of vegetation density and of spatial heterogeneity, the latter mainly with a grey level co-occurrence matrix (GLCM). Using a random forest (RF) classifier on the aforementioned data

Accuracies of computed random forest models

The dataset was split into 20% for testing and 80% for growing the random forests. The mean overall accuracy for the 44 models, based on the testing data set, is 78%, with the best model featuring an overall accuracy of 90% and the worst 59% (Fig. 4 and Table A1 in the Appendix). The mean accuracy for predicting absences (representing species’ specificity) is approximately 77%, while the prediction for presences (representing species’ sensitivity) is about 70% (Table 2). Low sensitivity values
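The evaluation scheme described here (an 80/20 split, then overall accuracy, sensitivity and specificity on the held-out part) can be sketched as follows. The helper names `train_test_split` and `evaluate`, the random seed and the toy labels are assumptions for illustration and do not reproduce the study's data or code.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(y_true, y_pred):
    """Accuracy metrics for binary labels (1 = presence, 0 = absence).

    Assumes the test labels contain at least one presence and one absence.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "overall_accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true presences predicted present
        "specificity": tn / (tn + fp),   # share of true absences predicted absent
    }


# Toy example: ten sample IDs, split 80/20.
train_ids, test_ids = train_test_split(range(10))

# Hypothetical test labels and predictions for one species.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
scores = evaluate(y_true, y_pred)
print(scores)
```

When presences are much rarer than absences, overall accuracy is dominated by specificity, so a model can score well overall while still missing many presences; reporting sensitivity separately makes this visible.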

Discussion

This study proposes a new approach to the integration of satellite-derived data for a more transferable, comparable and cost-efficient way to derive high-resolution SDMs. It does so by deriving indicators directly from continuous Earth observation data in order to reduce the deficiencies arising from pre-classified land cover/land use products. These indicators build on functional vegetation traits as crucial habitat variables for species modelling. Since previous studies predicting animal

Conclusions

This study shows that satellite-derived vegetation parameters describing the composition and configuration of vegetation traits in a continuous way can play a crucial role in expanding the knowledge about species distribution patterns. Generally, results are promising and show that the usage of a single RapidEye scene paired with machine learning models can produce SDMs at high resolution and accuracy. Since the provisioning of suitable nesting grounds is key for the survival of a species, the

CRediT authorship contribution statement

Thilo Wellmann: Conceptualization, Methodology, Software, Writing - original draft, Data curation, Formal analysis. Angela Lausch: Supervision, Methodology. Sebastian Scheuer: Software, Writing - review & editing. Dagmar Haase: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was carried out as part of the project ENABLE, funded through the 2015–2016 BiodivERsA COFUND call for research proposals, with the national funders The Swedish Research Council for Environment, Agricultural Sciences, and Spatial Planning, Swedish Environmental Protection Agency, German Aeronautics and Space Research Centre, National Science Centre (Poland), The Research Council of Norway and the Spanish Ministry of Economy and Competitiveness. We further wish to thank the Horizon

Data accessibility

The final modelling dataset and the fractional vegetation cover product for Leipzig can be accessed via: http://doi.org/10.5281/zenodo.3597379.

References (64)

  • D. Rocchini et al. (2010). Remotely sensed spectral heterogeneity as a proxy of species diversity: recent advances and open challenges. Ecol. Inf.
  • J. Tigges et al. (2013). Urban vegetation classification: benefits of multitemporal RapidEye satellite data. Remote Sens. Environ.
  • T. Wellmann et al. (2018). Urban land use intensity assessment: the potential of spatio-temporal spectral traits with remote sensing. Ecol. Ind.
  • S. Bernard et al. Influence of hyperparameters on random forest accuracy
  • G. Bino et al. (2008). Accurate prediction of bird species richness patterns in an urban environment using Landsat-derived NDVI and spectral unmixing. Int. J. Remote Sens.
  • BirdLife International, 2017. The IUCN Red List of Threatened Species. Retrieved 11 April 2018, from...
  • D. Bowler et al. (2019). Long-term declines of European insectivorous bird populations and potential causes. Conserv. Biol.
  • L. Breiman (2001). Random forests. Mach. Learn.
  • C. Chen et al. (2004). Using random forest to learn imbalanced data. Univ. California, Berkeley
  • D. Cutler et al. (2007). Random forests for classification in ecology. Ecology
  • C. Dormann et al. (2013). Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography
  • J. Evans et al. Modeling species distribution and change using random forest
  • A. Falcucci et al. (2007). Changes in land-use/land-cover patterns in Italy and their implications for biodiversity conservation. Landscape Ecol.
  • J. Gamon et al. (1995). Relationships between NDVI, canopy structure, and photosynthesis in three Californian vegetation types. Ecol. Appl.
  • R. Geary (1954). The contiguity ratio and statistical mapping. Incorporated Stat.
  • Gedeon, K., Mitschke, A., & Sudfeldt, C., 2004. Brutvögel in Deutschland. Stiftung Vogelmonitoring Deutschland....
  • M. Greenacre et al. (2006). Multiple Correspondence Analysis and Related Methods
  • A. Guisan et al. (2005). Predicting species distribution: offering more than simple habitat models. Ecol. Lett.
  • D. Haase et al. (2019). Front and back yard green analysis with subpixel vegetation fractions from earth observation data in a city. Landscape Urban Plann.
  • D. Haase et al. (2014). A quantitative review of urban ecosystem service assessments: concepts, models, and implementation. Ambio
  • R. Haralick et al. (1973). Textural features for image classification. IEEE Trans. Syst. Man Cybern.
  • K. He et al. (2015). Will remote sensing shape the next generation of species distribution models? Remote Sens. Ecol. Conserv.