Introduction

Monitoring progress towards Aichi Biodiversity Targets (SCBD 2010) and the Sustainable Development Goals (United Nations 2015), and using corresponding indicators in global environmental governance, all require reliable descriptions of species distribution over time and space (Visconti et al. 2016; Kissling et al. 2018). Gaps in accessible information on species distributions, except in a handful of well-sampled regions and species, necessitates integrating all available data and knowledge sources, and intensifying cooperation to more effectively address societal biodiversity information needs (Kot et al. 2010; Meyer et al. 2015, 2016; GBIF Secretariat 2017). A valuable alternative source by which this scarcity of information can be compensated is experts and their knowledge on species-environment interactions (Murray et al. 2009; Martin et al. 2012). There are few countries in which accessible information on expert-expected species ranges are above one-fifth of all species groups (Map of Life 2018). Therefore, expert knowledge is still the most comprehensive source of information on species distribution all around the world. Experts may come up with beliefs concerning the distribution of species at a specific location, or predictions given specific environmental evidence (Kuhnert 2011). Belief is the simplest form of mental representation in which an expert expresses their knowledge. Expert knowledge of species distributions and its environmental preferences are either gained through direct observation, or deducted from indirect sources of information. Expert knowledge is always subject to a level of uncertainty. In practice, however, application of expert knowledge in species distribution mapping, and accounting for the corresponding level of uncertainty, presents challenges (Ferrier et al. 2002), and has not yet been widely promoted (Carpenter 2002; Franklin 2010).

Several approaches have been developed for knowledge elicitation and incorporation into species distribution modelling practices. Multi-criteria decision methods such as the analytical hierarchy process (Anselin et al. 1989; Store and Kangas 2001; Doswald et al. 2007), expert system formalisms (Yang et al. 2006), and fuzzy sets theory (Rüger et al. 2005) have been widely applied in habitat suitability modelling. However, conventional data driven distribution models often outperform mentioned approaches when sufficient species occurrence data are available. Another approach is to conduct expert knowledge using Bayesian models (Choy et al. 2009; Kuhnert et al. 2010), which accommodates the experts’ knowledge in the form of probabilities. This approach requires an expert to have beliefs on the probability of either presence or absence of species, which is unlikely in most cases, thus often treated as corresponding complementary steps (Niamir et al. 2011).

A flexible framework for approaching the representation of knowledge and therefore the recognized ignorance is the Dempster–Shafer theory of evidence (DST) (Dempster 1968; Shafer 1976; Dempster 2008a). The DST draws inferences from incomplete and uncertain knowledge provided by various independent knowledge sources. An advantage of this theory is its ability to deal with ignorance. In particular, it provides explicit estimation of the extent of imprecision and conflict resulting from the knowledge provided by various experts, and can deal with any union of hypotheses (Le Hegarat-Mascle et al. 1997). It handles ambiguous and incomplete information using evidential functions. Evidential functions can be derived from probabilities (Lee et al. 1987; Rombaut and Zhu 2002), from the distance to cluster centre (Bloch 1996), or from fuzzy membership functions (Binaghi et al. 2000; Boudraa et al. 2004). These functions can then be used to represent incomplete knowledge, and to distinguish between the lack of information and observed information (An et al. 1994). The DST has been successfully applied in the geo-information domain (Malpica et al. 2007), particularly in remote sensing and image processing (Mertikas and Zervakis 2001), decision making (Altieri et al. 2017), and uncertainty management (Comber et al. 2004; Clements et al. 2006; Baraldi and Zio 2010; Feizizadeh 2018), and has been widely used in risk management (Neshat and Pradhan 2015; González et al. 2018).

In this study we aim to demonstrate the application of the DST to compensate for the gaps in accessible information on species distribution where occurrence data is neither available nor reliable, nonetheless expert knowledge exists and surveying knowledge holders is more feasible than running a systemic field sampling. This also provides a framework that accounts for knowledge uncertainty, aggregates knowledge from multiple experts, and thus combines beliefs. We formed an independent expert panel and derived the evidential functions using an online questionnaire (see Appendix S2) for two eagle species, the Bonelli’s eagle (Aquila fasciata), and the short-toed eagle (Circaetus gallicus) in Spain. We then evaluated the performance of our approach using an independent occurrence dataset from the atlas of Spanish breeding birds (Martí and del Moral 2003).

Explanation of Dempster–Shafer theory of evidence

The DST is a generalisation of traditional probability that allows uncertainty to be quantified (Shafer 1982; Dempster 2008a). This theory first supposes the definition of a set of hypotheses Θ called the frame of discernment, defined as follows:

$$\varTheta = \, \left\{ {H_{ 1}, H_{ 2}, \ldots ,H_{\text{N}} } \right\} \, .$$

It is composed of N hypotheses. Let’s denote P(Θ) as the power set composed with the 2 N propositions A of Θ, where ∅ denotes the empty set:

$$P(\Theta ){\mkern 1mu} = {\mkern 1mu} \{ \emptyset ,{\mkern 1mu} \left\{ {H_{1} } \right\},\left\{ {H_{2} } \right\}, \ldots ,{\mkern 1mu} \left\{ {H_{{\text{N}}} } \right\},{\mkern 1mu} \{ H_{1} \cup H_{2} \} ,{\mkern 1mu} \{ H_{1} \cup H_{3} \} ,{\mkern 1mu} \ldots ,\Theta \} {\mkern 1mu} .$$

A key point of evidence theory is the Basic Probability Assignment (BPA) which is a function from P(Θ) to [0,1] defined by:

$$m:P(\varTheta ) \to \left[ {0, 1} \right]$$
$$A \to m\left( A \right)$$

and satisfies the following conditions:

$$\sum\limits_{A \in P(\varTheta )} {m(A) = 1;\,\,\,m(\emptyset ) = 0}$$

From a general point of view, contrary to probability theory—which assigns the probability mass to single elementary events, the DST makes basic probability assignments m(A) on sets of outcomes. The basic belief assignment m(A) expresses the degree of belief that a specific element x belongs to the set A only, and not to any subset of A (Baraldi and Zio 2010). The DST has also conceptual differences from Bayesian Theory. Bayesian theory requires an explicit formulation of conditioning and prior probabilities of events. While the DST embeds conditioning information into its evidential function, making it appropriate for situations where it is difficult to either collect or posit such probabilities, or isolate their contribution (Lee et al. 1987; Hoffman and Murphy 1993; Comber et al. 2004). Unlike probability theory that imposes restrictive conditions on the specification of the likelihood of events as a result of the requirement that the probabilities of the occurrence event must sum to 1, there are two measures of likelihood in the DST called evidential functions. The evidential functions are the degree of support (Sup) and the degree of plausibility (Pls), which represent the upper and lower probability that the available evidence supports a particular belief (Dempster 2008b), as well as the degree of uncertainty (Unc) and the degree of disbelief (Dis) (Fig. 1). Pls is always equal, or larger than Sup, and subtracting Sup from Pls reveals Unc representing the ignorance in one’s belief. If there is no doubt then Sup is equal to Pls. However, if Sup and Dis are set to zero, then there can only remain Unc representing full ignorance, given the evidence.

Fig. 1
figure 1

Schematic relationship of evidential functions (y-axis) for continuous (right x-axis) and categorical (left x-axis) environmental domain (i.e. evidence). The evidential functions are the degree of support (in green), and the degree of plausibility (in blue) which represent the upper and lower probability that the available evidence supports a particular belief, as well as the degree of uncertainty (in red) and the degree of disbelief (in purple)

The Dempster’s (1968) rule of combination is the first one within the framework of evidence theory which can combine two BPAs m1 and m2 to yield a new BPA:

$$m(A) = \frac{{\sum\nolimits_{B \cap C = A} {m_{1} (B)m_{2} (C)} }}{1 - k}.$$

with

$$k = \sum\nolimits_{B \cap C = \emptyset } {m_{1} (B)m_{2} (C)}$$

where k measures the degree of conflict between m1 and m2, k = 0 corresponds to the absence of conflict, whereas k = 1 implies complete contradiction between m1 and m2. The evidential function resulting from the combination of J information sources SJ is defined as

$${\text{m }} = {\text{ m}}_{ 1} \oplus {\text{m}}_{ 2} \cdots \oplus {\text{m}}_{j} \cdots \oplus {\text{m}}_{J} .$$

This provides a framework for the estimation of evidential functions, which are integrated according to Dempster’s rule of combination. For each spatial variable used as an evidence, two independent functions must be estimated, either Sup and Dis, or Sup and Unc (Chung and Fabbri 1993; An et al. 1994). Given evidences X1 and X2, each with Sup and Dis functions, the combined Sup, Dis, and Unc are shown in the following equations (Wright 1996; Carranza and Hale 2001):

$$Sup_{{X_{1} X_{2} }} = \frac{{Sup_{{X_{1} }} Sup_{{X_{2} }} + Sub_{{X_{1} }} Unc_{{X_{2} }} + Sub_{{X_{2} }} Unc_{{X_{1} }} }}{\beta },$$
$$Dis_{{X_{1} X_{2} }} = \frac{{Dis_{{X_{1} }} Dis_{{X_{2} }} + Dis_{{X_{1} }} Unc_{{X_{2} }} + Dis_{{X_{2} }} Unc_{{X_{1} }} }}{\beta },$$
$$Unc_{{X_{1} X_{2} }} = \frac{{Unc_{{X_{1} }} Unc_{X2} }}{\beta },$$
$$\beta = 1 - Sup_{{X_{1} }} Dis_{{X_{2} }} - Dis_{{X_{1} }} Sup_{{X_{2} }} ,$$

where \(\beta\) ensures that the sum of Bel, Dis, and Unc is equal to 1.

The orthogonal sum thus allows two functions to be combined into a third function, which effectively pools pieces of evidence to support propositions of interest and multi-source information can be easily fused in the framework of evidence theory. Derivation of the evidential functions is the most crucial step since it represents the belief as well as the uncertainty surrounding selected evidence. This step requires prudence when applying DST to species distribution since the assignment of function values fully rely on the quality of expert knowledge (Moon 1989; Tangestani and Moore 2002).

Materials and methods

Study area

The study area is mainland Spain, covering about 493,000 km2. Spain may be divided into three climatic areas due to its large size; Atlantic, Mediterranean, and Interior. Forests are widely distributed in Spain and cover approximately half of the country (Balnco et al. 2005). Spain also comprises important mountain ranges, situated mainly in the north (Pyrenees and Cantabrian mountains) and the southeast (Baetic mountains), with relevant mountain chains traversing central Spain from west to east (Iberian and Central systems). The Iberian sclerophyllous and semi-deciduous forests is the dominant ecoregion (Olson et al. 2001) followed by a band of the Cantabrian mixed forests in the north, patches of conifer and montane forests, and Mediterranean sclerophyllous and woodlands. Mainland Spain is of particular interest for the target species of this study in a European context.

Choice of species and independent occurrence data

We selected two raptors for this study; the Bonelli’s eagle, a well-known species, and the poorly-sampled short-toed eagle. This selection provided us an opportunity to further investigate the application of an approach where expert knowledge is mainly based on direct observation and extensive literature, versus the case that it is mainly based on deduction and indirect observations. The Bonelli’s eagle is a medium to large, cliff-nesting bird of prey with its western Palaearctic populations mainly located in the Mediterranean area (Del Hoyo et al. 1994). The European population has suffered a significant decline over the last decade, and the Bonelli’s eagle has consequently been listed as an endangered European Species (Birdlife-International 2015). Spain supports 730–800 breeding pairs (Real 2003) or about 80% of the total European population. Since the Bonelli’s eagle is included as a priority target species for special conservation measures in European and Spanish legislations, high-priority conservation has been urged and consequently, it is a fairly well-studied species (Muñoz and Real 2013; Muñoz et al. 2013). The short-toed eagle is a medium-sized bird of prey. The European population migrates mainly to sub-Saharan Africa (Del Hoyo et al. 1994), leaving in September–October and returning in March–April. This species builds its nests on trees, mainly pines, in forests with relatively little human disturbance (López-Iborra et al. 2011). Nowadays, the species seems to be more abundant than in the past decades, with estimates of approximately 10,000 breeding pairs (Palomino et al. 2011). Currently, there are several reports on their wintering in southern Spain (Jiménez and Muñoz 2008). Due to its shy behaviour, commonness, and low-priority conservation status, the short-toed eagle remains one of the least-sampled raptor species in the region.

We obtained the occurrence datasets from the Atlas of Spanish breeding birds (Martí and del Moral 2003) for both species using Universal Transverse Mercator (UTM) 10 × 10 km grid cells (n = 684). A species was recorded as being present in a grid cell if the atlas of Spanish breeding birds’ database held at least one reliable record of that species from a location within that grid cell. There are 943 grid cells (~ 20% of the study area) for the Bonelli’s eagle, and 2667 grid cells (~ 57% of the study area) for the short-toed eagle is marked as present (see Appendix S1). Since the sampling scheme was harmonized across the country and has been repeated for years, we assumed grids with no occurrence report as absences, 2017 grid cells for the short-toes eagle and 3741 grid cell for the Bonelli’s eagle.

Choice of experts

Independent experts were identified through their membership of a scientific society and by their relevant publications. All selected experts had performed long term extensive studies on birds of prey and specifically on the selected species in Spain. They were invited to participate and reply to the online questionnaire (see Appendix S2) independently. Nine experts accepted the invitation and thus formed the knowledge domain for this study. Seven experts submitted online questionnaires for both species, while two did so for one species only. Therefore the knowledge domain for each species consisted of eight experts. Experts did not exchange knowledge nor fill in the questionnaires collaboratively. We modified the questionnaires in order to avoid any pairwise or multiple comparisons. We also asked experts to provide us with comments at the end of each section. Experts were also asked to express their main sources of knowledge too.

Environmental predictors

We studied the literature and consulted experts on species habitat preferences for both the Bonelli’s eagle and the short-toed eagle in order to collect a list of potentially relevant environmental predictors. The predictors were chosen on the basis of availability, the spatial correlation among predictors (i.e. Multicollinearity) (Legendre 1993), and potential predictive power for the target in Spain. We then presented the expert panel with the land cover classes (EEA 2016), topographical attributes (i.e. elevation, slope, and aspect) (USGS 2006), as well as two bioclimatic variables; maximum temperature in the warmest month, and minimum temperature in the coldest month (Fick and Hijmans 2017). We provided expert panel with 2 classes of aspect (north facing slopes and south facing slopes), 3 classes of slope (flat < 30%, steep 30% < > 100%, and very steep > 100%), and 3 classes of elevation (Lowlands < 1000 masl, Midlands 1000 masl < > 2000 masl, Highlands > 2000 masl). However, the experts had an option to provide their own classes or define response curves for each gradient. We harmonized the spatial resolution of the predictors at 1 × 1 km in Universal Transverse Mercator (UTM) projection. We aggregated and resampled topographical attributes by the mean function and the land cover classes by the majority function from their original spatial resolution (i.e. 0.1 × 0.1 km). See Appendixes S1 and S4.

Performance measures

We measured the discrimination capacity of species distribution models by analysing their receiver operation characteristic (ROC) curves. A ROC curve plots sensitivity values (i.e. a true positive fraction) on the y-axis against 1—specificity values (i.e. a false positive fraction) for all the thresholds on the x-axis. The area under such a curve (AUC) provides a threshold-independent measure across all possible classification thresholds for each model (Fielding and Bell 1997). We used the full independent atlas dataset to measure the discrimination capacity of the models. We also measured the calibration (i.e. goodness-of-fit) (Lemeshow and Hosmer 1982) of the models using Miller’s calibration statistic (Miller et al. 1991; Pearce and Ferrier 2000) based on the full independent atlas dataset. Miller’s calibration statistic evaluates the ability of a distribution model to correctly predict the proportion of species occurrences with a given environmental profile. It is based on the hypothesis that the calibration line (i.e. perfect calibration) has an intercept of zero and a slope of one. The calibration plot shows the model’s estimated probability (x-axis) against the mean observed proportion of positive cases (y-axis) for equal-sized probability intervals (number of intervals = 10). We compared the models by calculating the root mean square error (RMSE) of the calibration plot for each model (Armstrong and Collopy 1992; Niamir et al. 2016). RMSE is always between 0 and 1. For example, if the model line lies exactly on the calibration line, then the RMSE will be 0.

To further investigate the performance of the proposed approach we compared the outcome of the DST models with multiple SDMs within an ensemble forecasting framework (Araújo and New 2007). This provides insights to better explore uncertainty propagation and spatial patterns. Since the study area, the environmental predictors, the target species, and the test dataset were exactly the same (Araújo and New 2007; Godsoe 2012; Qiao et al. 2018); we used AUC to measures and compare the discrimination power of the SDMs and DST models. We also calculated and compared the RMSE calibration plot as a measure of calibration for the SDMs. We used four methods; generalized linear model, generalized additive models, boosted regression trees, and random forests to form an unweighted ensemble framework. See Appendix S5 for settings. We followed the recommended default setting of SDM R package (Naimi and Araújo 2016). For each species and in 100 realizations, we randomly sampled 50% of the species atlas datasets to train, and the remaining to test the models. The idea was to compute the result of the models repeatedly using varied input values and then to assess the accuracy of each (Heuvelink 1998). This so-called Monte Carlo simulation allowed us to assess uncertainty in the ensemble framework.

We also performed a complementary assessment to measure geographical pattern similarity between the DST models and multiple SDMs using the SDMTools package (VanDerWal et al. 2014) proposed by Warren et al. (2008) and modified by Broennimann et al. (2012).

Experimental settings

We used a structured procedure to interrogate the experts. The experts were provided with online questionnaires consisting of factual questions See Appendix S3. The procedure to assign evidential functions and to combine BPAs was as follows:

  • The experts assigned their beliefs in form of likelihood (P) given the evidence. To avoid misinterpretations we stay at the verbal models when asking beliefs. We set four linguistic probabilities (Xu et al. 2003; Bárdossy and Fodor 2004) “very unlikely”, “unlikely”, “likely”, and “very likely” in order to subsequently derive a numerical value to represent belief (Fig. 2); e.g. “It is unlikely that Bonelli’s eagle nest in mine, dump, or constitution sites”. For continuous evidence layers we implemented this process in a fuzzy inference system (Zadeh 1988), into which the evidential functions were imported as membership functions for the given evidence; e.g. “It is likely that the short-toed eagle nest in steep (above 100%) slopes”. Experts had the option not to assign belief (P) to evidence (i.e. environmental factor). This provides an opportunity to accommodate beliefs from variety of evidence. This means that experts can skip a piece of evidence while others assign beliefs to that.

    Fig. 2
    figure 2

    Word-to-probability relationship adopted from (Xu et al. 2003). Top; Experts assigned their beliefs in form of likelihood (italic) using verbal models, and the corresponding numerical values (bold) were used to generate evidential functions. Bottom; experts self-evaluated the level of confidence in their knowledge using a five-point scale ranging from “not confident at all” to “quite confident”, the corresponding numerical values (bold) were used to generate evidential functions

  • In each question—corresponding to a piece of evidence—the expert is asked to self-evaluate the level of confidence in her/his knowledge in order to express their belief on favourable habitat of the target species. We used a five-point scale ranging from “not confident at all” to “quite confident”. We assumed that the level of confidence and the degree of uncertainty are complementary percentage, i.e. Uncertainty =1 − Confidence. We then converted these rankings, using Fig. 2, into belief equivalents to obtain the degree of uncertainty to be applied to that specific belief.

    Given the probability of occurrence (P) and the degree of uncertainty (Unc) for the given evidence, the degree of support (Sup), and the degree of plausibility (Pls) were calculated as follows:

    $$Sup \, = \, P \, {-} \, \left( {Unc \, \times \, P} \right)$$
    $$Pls \, = \, P \, + \, \left( {Unc \, \times \, \left( {1 - P} \right)} \right)$$
  • This approach insured asymmetric weights of uncertainty for the upper (Pls) and the lower (Sup) probabilities.

  • We applied the evidential functions to their corresponding piece of evidence (i.e. environmental factor) to generate the Sup and the Unc maps for each of the experts. Then the generated maps were combined using Dempster’s (1968) rules of combination to produce the Sup map and the Unc map for each of the environmental predictors.

  • Finally, the Sup map and the Unc map of all the environmental factors were combined using the same rules to obtain the final Sup map of the target species along with the degree of uncertainty. The Pls and the Dis maps were generated accordingly. Note that this procedure is not sensitive to the order in which the environmental factors and the experts are combined.

Results

We produced the support (Sup) and the plausibility (Pls) maps for each of the environmental factors (i.e. evidence) based on the knowledge domain using combination rules for both species, and calculated the discrimination capacity and RMSE of the Miller’s calibration plot of the Sup and the Pls maps (Table 1). For Bonelli’s eagle, the slope gradient and the land cover classes were the most discriminative evidential layers with mean AUC values of 0.70 and 0.69, respectively. The discrimination capacity for the aspect and the bioclimatic variables was relatively low with AUC values below 0.60. Two experts did not assign probability values based on the aspect for the Bonelli’s eagle. They both commented that they believe this variable is not a discriminative parameter for this species.

Table 1 Performance of the DST models and the SDMs ensemble to map the distribution of the Bonelli’s Eagle and the short-toed eagle in Spain

For the short-toed eagle the slope gradient gained the highest discrimination capacity with mean AUC values of 0.69 followed by the land cover with mean AUC values of 0.64. The Sup was more discriminative (AUC = 0.68) than the Pls (AUC = 0.62), suggesting that experts overestimated their uncertainty when using the land cover classes. Three of the experts did not use the minimum temperature in the coldest month on the basis that the short-toed eagle is a migratory species. However, the other experts acknowledged that those few birds wintering in southern Spain would avoid freezing temperatures and therefore assigned their evidential functions using this variable.

We applied Dempster’s combination rule recursively so that the above-mentioned evidence maps were combined in a pairwise manner. When we combined all the evidential maps for the short-toed eagle (Fig. 3) the mean AUC improved to 0.70 (Table 1). However, the most discriminative distribution map for the short-toed eagle was achieved by combining the land cover, the elevation, and the slope (AUC = 0.71). The discrimination capacity of distribution maps for the Bonelli’s eagle sharply improved to 0.80 when all the environmental predictors except the maximum temperature in the warmest month were included (Fig. 3). The mean AUC decreased to 0.78 when the temperature in the warmest month also combined.

Fig. 3
figure 3

Distribution maps of the Bonelli’s eagle (left column) and the short-toed eagle (right column) in Spain, modelled by the DST approach (upper row), and the SDMs ensemble (lower row). The DST approach is purely based on expert knowledge. The SDMs ensemble is based on the Atlas of Spanish Breeding Birds. Both approaches used five environmental factors. Maps are presented on a graduated darkgreen-red scale; darkgreen indicating low probability of occurrence, and red indicating high probability of occurrence

We calculated the RMSE of the calibration plot of the final distribution maps for both species to evaluate the reliability of the output (Table 1). The maps of the short-toed eagle were better calibrated (i.e. the model’s estimated probability closer to the Miller’s calibration line) between the probability ranges 0.4 and 0.8 compared to the Bonelli’s eagle. The Bonelli’s eagle maps generally underestimated the probability of occurrence. Overall, in the case of short-toed eagle although the discrimination capacity of the maps was not high (AUC ~ 0.70), the models were well-calibrated (RMSE ~ 0.22) thus reliable for spatial and temporal extrapolation. However, in the case of the Bonelli’s eagle predictive power of the evidential functions was high (AUC ~ 0.80), but the models were not well-calibrated (RMSE ~ 0.40). The uncertainty propagation assessment revealed that the overall uncertainty of the combined maps was diminished when the Dempster’s combination rule was applied recursively for both species. We should note that this solely reflects the uncertainty of the combined beliefs, and should not be interpreted as an accuracy measure, as it absolutely depends on the quality of the expert panel.

We compared the performance of the DST outcomes (i.e. the expert panel) with the performance of the SDMs ensemble (Fig. 3). The average AUC values as a result of the Monte Carlo simulation for Bonelli’s eagle were 0.81 with the variation around 0.09. These values for the short-toed eagle were 0.74 with slightly higher standard deviation around 0.11. The SDMs ensemble outperformed the DST models with small differences for both species. In case of the short-toed eagle the RMSE of the calibration plot were almost similar between SDMs ensemble and DST models, while for the Bonelli’s SDMs ensemble was better calibrated (Table 1). There were high values of geographical pattern similarity between the predictions obtained using multiple SDMs and those obtained using DST models (Hellinger’s I > 0.95; Table 1).

To have a better insight into the variability among experts, we compared the evidential functions assigned by the expert panel. It revealed that two of the experts consistently assigned relatively higher probabilities, while one other consistently assigned relatively lower probabilities for both species. We also noticed in our experiments that direct observations and field experiments made experts more confident, resulting in them assigning a relatively lower uncertainty level to their estimates. In contrast, the experts who obtained their knowledge indirectly and deductively, self-evaluated a lower level of confidence in their estimates. In general the expert’s self-acknowledged confidence level—and therefore the difference between support and plausibility functions for the Bonelli’s eagle, was relatively lower compared with the short-toed eagle. This was in accordance with the fact that the Bonelli’s eagle is a better-known species in the study area than the short-toed eagle. The experts were more consistent in expressing their knowledge when using land cover classes for both species than when using other environmental predictors. There were higher levels of inconsistency among experts when considering topographic variables, though their levels of confidence were relatively high. The experts were less confident when assigning belief functions to bioclimatic variables. We also noticed that experts were more confident in expressing their knowledge based on visible and geographically detectable phenomena (e.g. the land cover class and the aspect) than those with implicit characteristics (e.g. the elevation, and the temperature).

Discussion

Inductive models can be employed to study species-environmental relationships, and to support initiatives to achieve targets where information on species distribution is accessible at the desired spatial scale. However, for many terrestrial and marine regions sufficient information on species distribution is not accessible to inform policymakers (Kot et al. 2010; Meyer et al. 2015). Utilizing expert knowledge, as an alternative source of information, may allow obtaining reliable information on species distribution where occurrence data are scarce (GBIF Secretariat 2017; Map of Life 2018). However, expert knowledge is always subject to uncertainty and this should be accounted for in the procedure.

Walker et al. (2003) categorized the nature of uncertainty into epistemic uncertainty and stochastic uncertainty. Stochastic uncertainty is unavoidable and will always be present when dealing with nature. In contrast, the epistemic uncertainty—or recognised ignorance (Brown 2004)—is due to the imperfect knowledge that may be reduced with study and research. Uncertainty in expert knowledge stems from both incomplete understanding of a phenomenon, and an incomplete ability to accurately generalize knowledge beyond the scope of observations (Walker et al. 2003).

The result of this study demonstrates a successful implementation of expert knowledge elicitation and combination through DST evidential functions while accounting for knowledge uncertainty across multiple experts. In this study, we presented an approach that accounts for the recognized ignorance in the process of knowledge acquisition that helps to distinguish the epistemic uncertainty from the innate uncertainty of natural phenomena. Our approach was entirely based on the knowledge of experts with a variety of backgrounds, i.e. academics and technical fieldworkers; the evidential functions effectively represent experts’ epistemic uncertainty.

We are aware of the potential existing biases in expert knowledge. Experts’ replies to the questionnaire may be biased, producing optimistic or pessimistic responses (Cooke 1991), or they may say “don’t know” and assign 100% uncertainty to a class within a single variable or to an entire set of variables. Furthermore, the rule of combination offers a basis for both aggregation and propagation of uncertainties (Baraldi and Zio 2010). Our results show a decrease in the level of uncertainty, combining them in each iteration. In his third general principle about probability, Laplace and Dale (1995) defined that if the events are independent of one another, then the probability of their combined occurrence is the product of their probabilities. This suggests that the number of pair-wise combinations may compensate for the overall gap between the degree of support and the degree of plausibility, and thus decrease the level of uncertainty per iteration. The number of experts and the variety of explanatory variables may be modified based on the required level of confidence.

We evaluated the final distribution maps produced by the proposed DST approach with an independent atlas dataset, further compared with two conventional inductive approaches. In the case of the Bonelli’s eagle our proposed approach demonstrated almost similar discrimination power compared to inductive approaches, while inductive approaches outperformed DST in the case of the short-toed eagle. Bonelli’s eagle is a cliff-nesting species, evidence of nesting is generally more obvious than it is for the short-toed eagle, which is a forest species. Indeed, cliffs are easier to monitor than forested areas; the reproduction of Bonelli’s eagles is usually confirmed by direct observation of the chicks in the nest, whereas in the case of the short-toed eagle, even if there is reasonable confidence that they are breeding in an area, the nest is rarely observed and therefore leaving doubts. Although evidential functions better discriminated favourable habitats for the Bonelli’s eagle than the short-toed eagle, the distribution map of the short-toed eagle was well-calibrated, while this was not the case for the Bonelli’s eagle. This might be an artefact of direct observations and inductive sources of knowledge—being able to identify highly favourable areas and to a lesser extent highly unfavourable areas, while not being able to identify areas where the probability is intermediate. For the short-toed eagle, which is a relatively lesser known species, experts provided more general beliefs and less specific classes than for the Bonelli’s eagle. This resulted in a well-calibrated model though with a lower discrimination capacity, mainly failing to identify the most favourable areas. This also suggests that areas with intermediate probability are more abundant than areas with high or low probability, and it is more difficult to discriminate the presence from the absence in areas with intermediate probability.

Although the DST is a powerful tool for probabilistic reasoning and has been applied widely in engineering and computer science, it has yet to reach the ecological modelling mainstream. Our proposed approach using the DST would be practical where knowledge of a species’ geographic distribution is needed for conservation purposes, especially in case of poorly-sampled species. However, use of knowledgeable experts and, well-structured elicitation processes, are pre-requisites for maximizing the reliability of expert-based models (Murray et al. 2009; Smith et al. 2012). The particular strength of the proposed approach in this study is that it is explicit in accounting for uncertainty in knowledge for model prediction by using evidential functions. It also yields similar results to conventional inductive methods and provides a framework to propagate and aggregate uncertainty, and it capitalizes on the range of data sources usually considered by an expert.