Introduction

The increasing use of volunteered geographic information (VGI) across various application domains has revolutionized how spatial knowledge can be derived. This is attributed to many of the advantages of VGI, notably the ability to provide pervasive location-based data and to allow more flexible data collection mechanisms. These features not only enable better reflections of the observations of ubiquitous users on the ground, but also facilitate the capture of data that may be otherwise left out by traditional data collection means.

Among the applications characterized by their increasing reliance on VGI efforts, species surveillances have attracted considerable attention from researchers (Goodchild 2007; Zhu et al. 2015). The use of high quality VGI for these purposes has implications for scientific inquiries pertaining to environmental, economic, health security, and by extension, sustainability issues.

VGI systems for such purposes can be readily built using existing off-the-shelf computing technologies. However, it remains challenging to assure the quality of VGI so as to derive valuable knowledge. Indeed, any application involving VGI inevitably faces the challenges of high diversity and a greater level of data uncertainty (Goodchild and Li 2012; Kuhn 2007) because VGI can be developed by both authoritative agencies and amateur communities (Foody et al. 2013) as well as by contributors of varying levels of knowledge and expertise (Goodchild 2009; Tulloch 2008). Furthermore, VGI can be created for various personal purposes (Coleman et al. 2009) and collected without explicit quality control measures (Girres and Touya 2010) or metadata (Brando and Bucher 2010). Gathering meaningful intelligence from VGI thus demands careful treatment of such data. In other words, it is crucial to measure and assure the quality of VGI. Therefore, this paper proposes a system to assure the quality of VGI collected for the purposes of species surveillances. The system takes advantage of fuzzy set theory to handle the data uncertainty and ambiguity inherent in VGI contributions, explicitly incorporating a property unique to VGI: trust. To demonstrate the usefulness of the fuzzy system in handling VGI quality, we applied it to a specific case scenario, i.e., a VGI-based crop pest surveillance.

The following section reviews how the quality of VGI is assured by the approaches from existing work. It also illustrates the related shortcomings. To improve the existing approaches of VGI quality assurance, a fuzzy system is then presented. It is followed by a discussion on the features of the fuzzy system and the future directions of this line of research. The last section concludes the paper.

Assuring the quality of VGI

Direct and indirect approaches

Several existing studies on VGI quality assurance have adopted a direct approach, which compares a VGI dataset to an authoritative gold standard reference dataset. For example, Zielstra and Zipf (2010) examined the completeness of a German OpenStreetMap dataset in comparison to a TeleAtlas MultiNet dataset. Haklay (2010) compared a London OpenStreetMap dataset with an Ordnance Survey dataset based on positional accuracy and completeness. More comprehensively, Girres and Touya (2010) extended the work of Haklay (2010) by comparing a French OpenStreetMap dataset with a BD TOPO® IGN dataset based on a larger set of spatial data quality elements.

Such a direct approach can be seen as an adoption of the traditional data quality assessment method that focuses on internal data quality (Devillers et al. 2005). It has limited applicability for assuring the quality of VGI, however, because authoritative gold standard reference datasets are generally absent for VGI applications (Bishr 2007; Kuhn 2007). For example, when VGI is used for species surveillances, voluntary observations are often conducted in sparsely populated, rural, or less explored areas of the world, where gold standard reference datasets are often lacking. In addition, a VGI dataset is often more up to date than its authoritative counterpart and thus may be more accurate than the so-called gold standard reference dataset (Goodchild and Li 2012). To cope with this issue, indirect approaches relying on surrogate criteria have been proposed. Four mainstream indirect approaches are described as follows:

  1.

    The user review approach (Goodchild and Li 2012; Maué and Schade 2008). This approach is user-driven and relies on Linus’ Law which assumes that “given enough eyes all bugs are shallow”. Based on Linus’ Law, user contributions converge on a truth through an iterative error correction process, either in terms of attributive error or positional error, or both. If one user commits an error, the error can be detected or corrected by the other users. Haklay et al. (2010) have applied this approach to OpenStreetMap and suggested its applicability to VGI in general.

  2.

    The provenance approach (Celino 2013; Trame and Keßler 2011). This approach relies on the history of volunteered information. Requesting or tracing the history of a VGI dataset (e.g., who are the data providers?) is helpful in better understanding and assessing its quality.

  3.

    The geographic approach (Goodchild and Li 2012). This approach is based on Tobler’s first law of geography, which assumes things that are closer are more related than things that are farther apart (Tobler 1970). A VGI contribution should fit its geographic context, e.g., a report of a species occurrence is more likely to be true if many similar reports exist nearby. In addition, more credit can be given to a VGI if it is volunteered by a local resident who is physically close to the site of the VGI event and is familiar with the local environment (Seeger 2008).

  4.

    The trust approach (Bishr 2007; Bishr and Janowicz 2010; Bishr and Mantelas 2008). It uses trust as a proxy of quality to establish a link between VGI quality and VGI contributors’ authority based on subjective evaluations. It rests on the extent to which a VGI contributor has provided honest and accurate information. Trusted VGI contributors tend to provide more trustworthy information compared to less trusted ones. The criteria for evaluating the trustworthiness of VGI replace traditional quality measures of geospatial information (e.g., completeness, logical consistency, and positional accuracy). Indeed, the information asymmetry and imperfection of a VGI environment can lead to social uncertainties in VGI consumptions (Sniezek and Van Swol 2001). When high social uncertainties exist, trust appears to be particularly important as it reduces social uncertainties by confining the range of behavior expected from another (Sniezek and Van Swol 2001).

Challenges in using the indirect approaches for species surveillance applications

Among the indirect approaches, the user review approach works well for VGI that is more traceable, such as that in Wikimapia and OpenStreetMap. However, it is problematic for species surveillance applications because the objects being recorded are often highly mobile or persist for only a short period of time. It is hardly possible to return to every reported location to verify the surveillance report, so such reports are effectively not peer-reviewable. Goodchild and Li (2012) also pointed out that this approach works less well for obscure phenomena, including short-lived ones. Conducting the review process for time-critical issues (e.g., pest outbreaks) is also impractical because the process is generally time-consuming. Additionally, Linus’ Law sometimes fails. In a crowdsourcing-based cropland capture game, Salk et al. (2015) demonstrated that majority agreement among volunteers cannot fully substitute for quality assessment by experts on crowdsourced tasks.

The provenance approach, geographic approach, and trust approach appear to be applicable for species surveillance applications. However, when used alone, all three approaches fall short in fully describing VGI data quality.

The provenance approach considers VGI provenance, including data contributors’ expertise. The challenge is how to incorporate the provenance of user expertise appropriately, because the expertise level of a VGI contributor is difficult to collect (Keßler et al. 2009). Contributors may also be reluctant to provide such information because of concerns about personal privacy and security (Song and Sun 2010). According to Coleman et al. (2009), VGI contributors can be classified into five types: (1) neophyte, (2) interested amateur, (3) expert amateur, (4) expert professional, and (5) expert authority. Normally, people are inclined to trust contributors who are expert professionals or expert authorities. However, a contributor considered to be an expert may understand a project’s specification very well but lack knowledge of local history or attributes. Conversely, a contributor considered a neophyte or interested amateur may know little about the professional side of a VGI project but be very familiar with the characteristics and details of his or her current location. In short, the boundary between non-expert amateur and expert professional is quickly blurring in VGI environments, where the expertise of a contributor cannot be judged simply from contributor type.

As for the geographic approach, considering only the fitness of geographic context is ineffective when a user report fits its surrounding geographic context well but is in fact a false observation.

Regarding the trust approach, it remains unclear how trust as a proxy of quality can be effectively realized in VGI contexts. Doing so demands appropriate methods to evaluate and quantify the trustworthiness of VGI. Bishr and Mantelas (2008) proposed an approach that combines the trust approach and the geographic approach to assure VGI quality. Their work provides two valuable insights into the usage of the proxy. First, the four indirect approaches reviewed here are not mutually exclusive. For instance, some of the elements of trust fall under the geographic approach, i.e., the trustworthiness of VGI can be assessed based on geographic contexts. Second, their approach leverages the crowd’s dual roles in VGI creation: contributing locational data and ascertaining the reliability of data (i.e., user trust rating). The second role can be helpful in evaluating the trustworthiness of VGI. Despite these insights, the combined approach does not account well for the fuzziness that is inherent in trust (Chang et al. 2005). Assessing the quality of VGI based on trust requires special attention to the fuzzy nature of trust.

Novel approaches thus are called for to synthesize the advantages and minimize the disadvantages of the approaches mentioned above to assure the quality of VGI, with a better way to account for user expertise, geographic context, and fuzziness involved in trust judgment.

A fuzzy system

To address the problems mentioned above, we present a rule-based fuzzy system to assure the quality of user-generated species surveillance reports. The system uses trust as a proxy of quality, considering both the track record of the VGI contributors (i.e., provenance of user expertise) and the fitness of geographic context as defining factors of the trust.

Fuzziness in geospatial data quality

Traditionally, geospatial data quality is categorized into internal quality and external quality (Devillers et al. 2005). The former refers to the assessment of the difference between a dataset and the reality it represents. The latter refers to fitness for use, i.e., the extent to which a dataset suits its different uses. Evaluating geospatial data quality of either kind involves using reality or fitness as the baseline for comparison. The result of the comparison is clear-cut, or “crisp”: a dataset either meets or fails to meet the standard.

From a user perspective, VGI quality may be considered by users to be meeting the standard or slightly below standard, implying a transition between all levels of quality. It is extremely limiting to treat a VGI that is slightly below the standard in the same way as another VGI that virtually fails to meet the standard. Yongting (1996) proposed the concept of fuzzy quality to account for such a transition by expressing the quality with a fuzzy set instead of Boolean logic. In addition, given the role of trust in evaluating VGI data quality and the nature of trust being inherently fuzzy (Chang et al. 2005), adopting fuzzy set theory to assess the quality of VGI is likely to capture more accurately the whole assessment process.

Fuzzy set theory

Fuzzy sets were first introduced by Zadeh (1965) to model continuous phenomena. They generalize conventional crisp sets by allowing their elements to have degrees of membership. The membership is defined by mapping every element x from a universe of discourse X to the interval [0, 1], representing the degree to which x is an element of a fuzzy set A, expressed as Eq. 1.

$$\begin{aligned} & \mu_{A} (x):X \to [0,1], \;{\text{where}} \\ & \mu_{A} (x) = 1\quad if\;x\;is\;totally\;in\;the\;fuzzy\;set; \\ & \mu_{A} (x) = 0\quad if\;x\;is\;not\;in\;the\;fuzzy\;set; \\ & 0 < \mu_{A} (x) < 1\quad if\;x\;is\;partly\;in\;the\;fuzzy\;set. \\ \end{aligned}$$
(1)

Fuzzy sets are often used for modelling subjective human reasoning expressed in natural language, in which many expressions have vague or imprecise meanings (Caha et al. 2012). They are therefore a prominent alternative to more traditional modelling paradigms for addressing complex, ill-defined, and less tractable systems (Manca and Curtin 2012). In geography, fuzzy sets have been applied to modelling the uncertainty inherent in spatial datasets (Al-kheder et al. 2008; Zhang et al. 2014).

System development

To introduce the system development based on fuzzy set theory, the following sections first describe its core fuzzy inference method. Then the two input variables (i.e., provenance of user expertise and fitness of geographic context), one output variable (i.e., the trustworthiness of user reports), and fuzzy rules of the system are defined. Last, the system usage is introduced.

Fuzzy inference

Mamdani-style fuzzy inference is adopted in the system because it is better suited to handling fuzziness and data uncertainty and works better with human inputs (Power et al. 2001). The inference requires the developer to create both input and output membership functions from linguistic interpretations of a subject. It generates output values through compositional inference rules and a defuzzification algorithm. Details about Mamdani-style fuzzy inference can be found in Mamdani (1974) and Negnevitsky (2005). Figure 1 gives a brief workflow showing how our fuzzy system derives the quality (trustworthiness) of a user report using Mamdani-style inference, in four steps:

Fig. 1 Workflow of the Mamdani-style trustworthiness score inference

  1. Step 1.

    Fuzzification: Fuzzifying the crisp inputs of the system against appropriate linguistic fuzzy sets and generating membership degrees based on given membership functions.

  2. Step 2.

    Rule evaluation: Applying a fuzzy rule set to infer fuzzy trustworthiness outputs.

  3. Step 3.

    Aggregation of the rule outputs: Aggregating the output of each rule into a single fuzzy set for the overall fuzzy output.

  4. Step 4.

    Defuzzification: Defuzzifying the aggregate output fuzzy set into a final crisp trustworthiness score using the center of gravity (COG) algorithm. The algorithm returns the centroid of the aggregate set on the interval [a, b], i.e., the membership-weighted mean of the output values, using Eq. 2.

$$COG = \frac{{\sum\limits_{x = a}^{b} {\mu_{aggregate} (x)\,x} }}{{\sum\limits_{x = a}^{b} {\mu_{aggregate} (x)} }}.$$
(2)
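
As a minimal sketch of steps 3 and 4, the Python fragment below aggregates two clipped output sets with a pointwise maximum and then applies the discrete form of Eq. 2. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the sampling resolution, the two firing strengths (0.4 and 0.25), and the use of clipping to apply a firing strength are hypothetical choices. The triangle breakpoints are those of the Medium and High trustworthiness sets in Eq. 6.

```python
def triangle(x, a, b, c):
    """Triangular membership: 0 at the feet a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def centre_of_gravity(mu, a=0.0, b=10.0, steps=1000):
    """Discrete COG (Eq. 2) of membership function mu sampled on [a, b]."""
    xs = [a + (b - a) * i / steps for i in range(steps + 1)]
    weights = [mu(x) for x in xs]
    total = sum(weights)
    if total == 0:
        raise ValueError("aggregate set is empty; COG is undefined")
    return sum(w * x for w, x in zip(weights, xs)) / total


def aggregate(x):
    """Step 3: union (max) of two clipped rule outputs.

    The firing strengths 0.4 and 0.25 are assumed values for illustration.
    """
    medium = min(0.4, triangle(x, 2.5, 5.0, 7.5))
    high = min(0.25, triangle(x, 5.5, 7.0, 8.5))
    return max(medium, high)


# Step 4: crisp trustworthiness score of the aggregate set.
score = centre_of_gravity(aggregate)
```

A finer sampling grid brings the discrete sum closer to the continuous centroid; the grid of 1001 points used here is an arbitrary choice.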

Input variable one: provenance of user expertise

The proposed system adopts user confidence, the strength with which a person believes that a piece of information is the best available (Peterson and Pitz 1988), as a surrogate for the provenance of user expertise, because confidence has been shown to be a valid cue to information accuracy (Sniezek and Van Swol 2001). The confidence value, specifically the level of confidence about the correctness of a user report, is provided by the user who generated the report. Confidence contributes to the willingness to accept a piece of information, especially when other materials about the information providers are unavailable (Cofta 2007; Sniezek and Buckley 1995). Indeed, confidence has been utilized to automatically evaluate the expertise of volunteers in performing tasks such as land cover map validation (Foody et al. 2013) and galaxy classification (Bordogna et al. 2014a, b).

Our fuzzy system requires users to choose a value from a ten-point Likert scale to report their confidence levels. The value provides a self-evaluated measure of VGI quality. Following the four-level fuzzy confidence adopted in Yu and Tsai (2006), four linguistic fuzzy sets are defined for the input user confidence levels: Not Confident (NC), Somewhat Confident (SC), Confident (C), and Very Confident (VC). They use standard triangular and left/right trapezoidal shapes. The corresponding membership functions are defined by Eq. 3 and illustrated in Fig. 2a. Note that the four fuzzy sets are not symmetric around the median value of the universe of discourse (i.e., five) (Fig. 2a). The confidence declared by non-expert VGI contributors tends to be less reliable than the confidence declared by experts, because some contributors may be somewhat overconfident about their expertise (Pulford 1996). The membership functions representing moderate to relatively high levels of user confidence (i.e., SC, C, and VC) are therefore shifted closer to the right end of the universe of discourse to compensate for overconfidence. The left starting point of VC is kept at 7.5, allowing the values between 7.5 and 8 to have low degrees of membership in VC.

Fig. 2 Membership functions of (a) contributor confidence level, (b) fitness of geographic context, and (c) trustworthiness

$$\mu_{\text{C}} (c) = \left\{ {\begin{array}{*{20}l} {Not\;Confident\left\{ {\begin{array}{*{20}l} 1 \hfill & {0 \le c \le 1} \hfill \\ {\left( { - \frac{2}{3}c + \frac{5}{3}} \right){\kern 1pt} } \hfill & {1 \le c \le 2.5} \hfill \\ 0 \hfill & {c \ge 2.5} \hfill \\ \end{array} } \right.} \hfill \\ {Somewhat\;Confident\left\{ {\begin{array}{*{20}l} 0 \hfill & {c \le 1.5} \hfill \\ {\frac{2}{5}c - \frac{3}{5}} \hfill & {1.5 \le c \le 4} \hfill \\ { - \frac{2}{5}c + \frac{13}{5}} \hfill & {4 \le c \le 6.5} \hfill \\ 0 \hfill & {c \ge 6.5} \hfill \\ \end{array} } \right.} \hfill \\ {Confident\left\{ {\begin{array}{*{20}l} 0 \hfill & {c \le 4.5} \hfill \\ {\frac{2}{5}c - \frac{9}{5}} \hfill & {4.5 \le c \le 7} \hfill \\ { - \frac{2}{5}c + \frac{19}{5}} \hfill & {7 \le c \le 9.5} \hfill \\ 0 \hfill & {c \ge 9.5} \hfill \\ \end{array} } \right.} \hfill \\ {Very\;Confident\left\{ {\begin{array}{*{20}l} 0 \hfill & {c \le 7.5} \hfill \\ {\frac{1}{2}c - \frac{15}{4}} \hfill & {7.5 \le c \le 9.5} \hfill \\ 1 \hfill & {c \ge 9.5} \hfill \\ \end{array} } \right.} \hfill \\ \end{array} } \right.{\kern 1pt} {\kern 1pt} {\kern 1pt}$$
(3)
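
The piecewise definitions in Eq. 3 can be expressed compactly with a single trapezoid helper, as sketched below. The helper and the names `trapezoid`, `CONFIDENCE_SETS`, and `fuzzify_confidence` are hypothetical and introduced only for illustration; the breakpoints are read directly off Eq. 3.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership with feet a, d and shoulders b, c.

    a == b gives a left shoulder, c == d a right shoulder,
    and b == c a triangle.
    """
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Breakpoints (a, b, c, d) taken from Eq. 3; universe of discourse 0 to 10.
CONFIDENCE_SETS = {
    "NC": (0.0, 0.0, 1.0, 2.5),
    "SC": (1.5, 4.0, 4.0, 6.5),
    "C": (4.5, 7.0, 7.0, 9.5),
    "VC": (7.5, 9.5, 10.0, 10.0),
}


def fuzzify_confidence(c):
    """Return the membership degree of confidence level c in each set."""
    return {name: trapezoid(c, *pts) for name, pts in CONFIDENCE_SETS.items()}
```

For a confidence level of 8, this gives a membership of 0.6 in C and 0.25 in VC, and zero in NC and SC, which matches a direct evaluation of Eq. 3.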

Input variable two: fitness of geographic context

Species occurrences usually form clusters. Therefore, fitness of geographic context is evaluated using spatial clustering analysis. According to Tobler’s Law, a species is most likely to be observed at its habitat center (i.e., cluster center), and the likelihood decreases with increasing distance from the habitat center. Therefore, when users contribute a cluster of species surveillance reports, the fitness of geographic context of each report is evaluated using its spatial proximity to the center of the cluster.

The fuzzy system uses the DBSCAN clustering algorithm (Ester et al. 1996) to locate VGI clusters. DBSCAN can effectively distinguish noise points (i.e., outliers) and discover clusters with arbitrary shapes. Fitness of geographic context is quantified based on an inverse hyperbolic sine function (Eq. 4). The equation captures the defining characteristic of the fitness of geographic context: it decays with distance from the center of a VGI cluster (i.e., an inverse relation with distance), generating a value between 0 (zero fitness of geographic context) and 10 (perfect fitness of geographic context). Outliers identified by DBSCAN are assigned zero fitness of geographic context.

$$Fitness\;of\;geographic\;context = \left\{ {1 - \ln \left[ {\frac{Dist_{rtc}}{Dist_{max} } + \sqrt {\left( {\frac{Dist_{rtc}}{Dist_{max} }} \right)^{2} \,+ \,1} } \right]} \right\} \times 10,$$
(4)

where \(Dist_{rtc}\) is the distance from a user report to its corresponding cluster center, and \(Dist_{max}\) is the distance between the cluster’s outermost user report and the cluster’s center.
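
Because the bracketed logarithm in Eq. 4 is the inverse hyperbolic sine of the distance ratio, the score can be computed in a few lines. The sketch below is illustrative only: the function name, the `is_outlier` flag, and the handling of a degenerate cluster with zero radius are assumptions that the text does not specify.

```python
import math


def fitness_of_geographic_context(dist_rtc, dist_max, is_outlier=False):
    """Eq. 4: (1 - asinh(dist_rtc / dist_max)) * 10.

    Outliers flagged by the clustering step are assigned zero fitness.
    """
    if is_outlier:
        return 0.0
    if dist_max <= 0:
        # Assumption: every report of the cluster sits on its center.
        return 10.0
    ratio = dist_rtc / dist_max
    # ln(r + sqrt(r^2 + 1)) is asinh(r).
    return (1.0 - math.asinh(ratio)) * 10.0
```

A report at the cluster center scores 10. The outermost report of a cluster, where the ratio is 1, scores about 1.19, so the value 0 is reached only by reports flagged as outliers.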

Following the three-level fuzzy proximity adopted in Al-kheder et al. (2008), three linguistic fuzzy sets—Relatively Low (RL), Medium (M), and Relatively High (RH)—are defined for the input fitness of geographic context, using standard triangular and left/right trapezoidal shapes. The corresponding membership functions are defined by Eq. 5 and illustrated in Fig. 2b. The fuzzy sets are symmetric around the median value of the universe of discourse (i.e., five) (Fig. 2b).

$$\mu_{F} (f) = \left\{ {\begin{array}{*{20}l} {Relatively\;Low\left\{ {\begin{array}{*{20}l} 1 \hfill & 0 \le f \le 0.5 \hfill \\ {\left( { - \frac{1}{4}f + \frac{9}{8}} \right)} \hfill & 0.5 \le f \le 4.5 \hfill \\ 0 \hfill & f \ge 4.5 \hfill \\ \end{array} } \right.} \hfill \\ {Medium\left\{ {\begin{array}{*{20}l} 0 \hfill & f \le 2.5 \hfill \\ {\frac{2}{5}f - 1} \hfill & 2.5 \le f \le 5 \hfill \\ { - \frac{2}{5}f + 3} \hfill & 5 \le f \le 7.5 \hfill \\ 0 \hfill & f \ge 7.5 \hfill \\ \end{array} } \right.} \hfill \\ {Relatively\;High\left\{ {\begin{array}{*{20}l} 0 \hfill & f \le 5.5 \hfill \\ {\frac{1}{4}f - \frac{11}{8}} \hfill & 5.5 \le f \le 9.5 \hfill \\ 1 \hfill & f \ge 9.5 \hfill \\ \end{array} } \right.} \hfill \\ \end{array} } \right.{\kern 1pt}$$
(5)

Output variable: trustworthiness

Following the five-level fuzzy trustworthiness in Song et al. (2004), five linguistic fuzzy sets—Very Low (VL), Low (L), Medium (M), High (H), and Very High (VH)—are defined for the output trustworthiness using standard triangular and left/right trapezoidal shapes. The corresponding membership functions with a universe of discourse from 0 to 10 are defined by Eq. 6. The five fuzzy sets are asymmetric around the median value (i.e., five) (Fig. 2c) for the following reasons. Goodchild and Li (2012) suggested that greater weights can be assigned to similar reports that are spatially clustered than to a single report. This system assesses clustered reports which already have relatively greater weights. Therefore, the fuzzy sets representing relatively poor data quality (i.e., VL and L) are placed closer to the left end of the universe of discourse, meaning that a trustworthiness can be linguistically interpreted as low or very low only when it is associated with a sufficiently low value. The peak of VL is not placed at zero to ensure that the peak value stays the same over a certain range (Zhang et al. 2014). Additionally, the wider range of M can maintain sufficient overlap in adjacent fuzzy sets (especially L and M) for the system to respond smoothly (Negnevitsky 2005).

$$\mu_{T} (t) = \left\{ {\begin{array}{*{20}l} {Very\;Low\left\{ {\begin{array}{*{20}l} 1 \hfill & {0 \le t \le 0.5} \hfill \\ {\left( { - t + \frac{3}{2}} \right)} \hfill & {0.5 \le t \le 1.5} \hfill \\ 0 \hfill & {t \ge 1.5} \hfill \\ \end{array} } \right.} \hfill \\ {Low\left\{ {\begin{array}{*{20}l} 0 \hfill & {t \le 0.5} \hfill \\ {\frac{2}{3}t - \frac{1}{3}} \hfill & {0.5 \le t \le 2} \hfill \\ { - \frac{2}{3}t + \frac{7}{3}} \hfill & {2 \le t \le 3.5} \hfill \\ 0 \hfill & {t \ge 3.5} \hfill \\ \end{array} } \right.} \hfill \\ {Medium\left\{ {\begin{array}{*{20}l} 0 \hfill & {t \le 2.5} \hfill \\ {\frac{2}{5}t - 1} \hfill & {2.5 \le t \le 5} \hfill \\ { - \frac{2}{5}t + 3} \hfill & {5 \le t \le 7.5} \hfill \\ 0 \hfill & {t \ge 7.5} \hfill \\ \end{array} } \right.} \hfill \\ {High\left\{ {\begin{array}{*{20}l} 0 \hfill & {t \le 5.5} \hfill \\ {\frac{2}{3}t - \frac{11}{3}} \hfill & {5.5 \le t \le 7} \hfill \\ { - \frac{2}{3}t + \frac{17}{3}} \hfill & {7 \le t \le 8.5} \hfill \\ 0 \hfill & {t \ge 8.5} \hfill \\ \end{array} } \right.} \hfill \\ {Very\;High\left\{ {\begin{array}{*{20}l} 0 \hfill & {t \le 7.5} \hfill \\ {\frac{2}{3}t - 5} \hfill & {7.5 \le t \le 9} \hfill \\ 1 \hfill & {t \ge 9} \hfill \\ \end{array} } \right.} \hfill \\ \end{array} } \right.$$
(6)

Fuzzy rules

The full IF-THEN fuzzy rule set defined for this system is shown in Fig. 3, using a conjunction, AND, for all the rules (e.g., IF confidence level is SC AND fitness of geographic context is RL THEN trustworthiness is VL). The conjunctions in the fuzzy rules are evaluated using the fuzzy operation intersection (Negnevitsky 2005). Assuming that A and B are two fuzzy sets with membership functions \(\mu_{A}\) and \(\mu_{B}\), respectively, the fuzzy operation intersection for creating the intersection of the two fuzzy sets is expressed as Eq. 7.

Fig. 3 Fuzzy rule set defined for the system, using a conjunction, AND, for all the rules

$$\mu_{A \cap B} (x) = min{\kern 1pt} {\kern 1pt} {\kern 1pt} [\mu_{A} (x),\mu_{B} (x)].$$
(7)
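
To illustrate how Eq. 7 is applied during rule evaluation, the following sketch computes the firing strength of every antecedent pair for one report. It stops short of the rule consequents, which are defined in Fig. 3 and are not reproduced here. The helper names are hypothetical; the breakpoints come from Eqs. 3 and 5.

```python
from itertools import product


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership with feet a, d and shoulders b, c."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Breakpoints (a, b, c, d) read off Eq. 3 and Eq. 5.
CONFIDENCE = {
    "NC": (0.0, 0.0, 1.0, 2.5),
    "SC": (1.5, 4.0, 4.0, 6.5),
    "C": (4.5, 7.0, 7.0, 9.5),
    "VC": (7.5, 9.5, 10.0, 10.0),
}
FITNESS = {
    "RL": (0.0, 0.0, 0.5, 4.5),
    "M": (2.5, 5.0, 5.0, 7.5),
    "RH": (5.5, 9.5, 10.0, 10.0),
}


def firing_strengths(confidence, fitness):
    """Min-AND (Eq. 7) of each confidence/fitness antecedent pair.

    Pairs with zero strength are dropped because their rules do not fire.
    """
    mu_c = {k: trapezoid(confidence, *v) for k, v in CONFIDENCE.items()}
    mu_f = {k: trapezoid(fitness, *v) for k, v in FITNESS.items()}
    strengths = {}
    for c_name, f_name in product(mu_c, mu_f):
        strength = min(mu_c[c_name], mu_f[f_name])
        if strength > 0.0:
            strengths[(c_name, f_name)] = strength
    return strengths
```

With a confidence level of 8 and a fitness of geographic context of 6.5, four antecedent pairs fire: (C, M) with strength 0.4, and (C, RH), (VC, M), and (VC, RH) with strength 0.25 each.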

System output surface

To evaluate the performance of a Mamdani-style fuzzy system, we used its three-dimensional output surface, following the suggestion by Negnevitsky (2005). A satisfactory system is reached through empirical tuning until the system generates a gradually changing surface that appropriately emulates subjective human reasoning about how the interactions of the system’s inputs influence its output in the context in which the problem is viewed.

The output surface of our system is shown in Fig. 4. The membership functions and the fuzzy rules mentioned above were decided by assessing this surface. The general trend should be that higher user confidence levels and higher fitness of geographic context lead to higher trustworthiness, while certain special considerations should be appropriately reflected on the surface. For example, if a report has an extremely low user confidence level (implying very low user expertise), its trustworthiness should be very low even if its fitness of geographic context is high. Conversely, even if a report has an extremely low fitness of geographic context, its trustworthiness should be moderate if its user confidence level is very high.

Fig. 4 Output surface of the fuzzy system

System usage with a running example

Figure 5 shows an example of generating the trustworthiness score of a report (7.68) from two crisp inputs: a confidence level of 8 and a fitness of geographic context of 6.5. The red vertical line through the aggregate output fuzzy set marks the location of the COG.

Fig. 5 Fuzzy inference process for an assumed user report

Once the system has generated the trustworthiness scores for a VGI dataset, a user-preferred threshold is used to reject or accept the reports. Non-outlier reports with trustworthiness scores lower than the threshold are rejected; all other non-outlier reports are accepted. Outlier reports are treated separately. Outlier reports with trustworthiness scores lower than the threshold can simply be discarded. Outlier reports above the threshold, however, should be treated with caution: they should be withheld for further observation, i.e., to see whether similar reports are later made nearby to confirm them.
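
The decision rule described above can be summarized in a short sketch. The function name and the string labels are hypothetical, and treating an outlier whose score exactly equals the threshold as withheld is an assumption, since the text only distinguishes scores lower than and above the threshold.

```python
def triage(score, threshold, is_outlier):
    """Classify one report as 'accepted', 'rejected', or 'withheld'."""
    if score < threshold:
        # Below the threshold: non-outliers are rejected, outliers discarded.
        return "rejected"
    # At or above the threshold: outliers wait for nearby confirmation.
    return "withheld" if is_outlier else "accepted"


# Hypothetical reports as (trustworthiness score, outlier flag) pairs.
reports = [(7.68, False), (3.2, False), (6.1, True), (2.0, True)]
decisions = [triage(score, 5.0, outlier) for score, outlier in reports]
```

With a threshold of 5, the four hypothetical reports are classified as accepted, rejected, withheld, and rejected, respectively.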

The selection of the threshold is context-dependent and subject to the accuracy requirements of specific projects. Setting a higher threshold can reduce the number of false positives (FP), but it will inevitably increase the number of false negatives (FN). Setting a lower threshold can reduce the number of FN but will increase the number of FP. In the context of VGI, an FN is preferable to an FP because incorrectly rejecting good quality VGI is less harmful than incorrectly accepting poor quality VGI. One can certainly choose a very high threshold to collect only VGI with extremely high trustworthiness scores and ignore FN.

This system has been implemented using the following tools. The DBSCAN algorithm was integrated into ArcGIS as an extension through Python. The Fuzzy Logic Toolbox of MATLAB was used to perform the fuzzy inference. Figure 6 illustrates the architecture of the implemented system.

Fig. 6 Architecture of the system implementation

Case study: a VGI-based crop pest surveillance

Motivation

VGI has previously been explored for location-based crop pest management, given its potential for fostering interactive digital communities in which farmers and experts collaboratively manage crop pest risks (Deng and Chang 2012; Suen et al. 2014). In a VGI-based pest management system, the task of acquiring geospatial data for crop pest surveillance is delegated to farmers, who share their location-based observations. Information relevant to managing crop pests is then discovered from the shared surveillance data through various spatiotemporal analytics and subsequently disseminated to the farmers to help them better manage crop pest risks.

Inspired by these previous studies, and to demonstrate the usefulness of the fuzzy system in handling VGI quality, we applied the system to measure the quality of a set of crop pest surveillance reports collected in Xiajiang prefecture of Jiangxi province, China.

Study design and data analysis

VGI collection and quality assessment using the fuzzy system

The major crop cultivated in Xiajiang prefecture was rice, which accounted for around 90% of the total cropland of the prefecture (216 km²). Two hundred local rice farmers distributed across the prefecture were recruited to conduct a rice pest surveillance.

The pest surveillance was conducted by the farmers from 15 to 25 August 2014. They reported rice pest incidents (pest occurrences, damage, or both) observed during their daily farming activities. To report an observed pest incident, the observer inserted a flat bamboo chip firmly into the soil of the rice paddy where the incident was observed. The observer also recorded the species name, observation time, and confidence level on the bamboo chip. No other coaching was given to the farmers, so as to keep intervention in the user contributions to a minimum. After the pest surveillance, we collected the geographic coordinates of the inserted bamboo chips using Trimble® GeoXT handheld GPS devices, which delivered a 50 cm positioning accuracy.

Various rice pest incidents were reported. The species included mainly rice stem borers, rice leaf rollers, rice plant hoppers, rice water weevils, and mole crickets. Of these species, the rice stem borers’ scope of activity was relatively fixed over time, which made it easier to conduct post-surveys to verify the actual presence of the reported rice stem borer incidents. Rice stem borer incident reports were therefore used to evaluate the usefulness of the system. During the pest surveillance period, 209 rice stem borer incident reports were collected.

The quality of the 209 incident reports was assessed using the fuzzy system. A threshold must be applied to the generated trustworthiness scores to determine whether a report should be accepted. As mentioned above, one can set a high threshold to collect only VGI with extremely high trustworthiness scores and ignore FN. In this case study, however, we intended to preserve as many reports as possible, so a moderate threshold was more appropriate. We used a range of thresholds, from 4 to 6 with an increment of 0.2, to evaluate the performance differences. Outliers, if any, were treated according to the method stated in the “System usage with a running example” section. The system generated categorical results, i.e., accepted, rejected, and withheld.

Ground truth data collection

From 26 August to 1 September 2014 (immediately after the pest surveillance conducted by the farmers), pest management experts from the local agricultural department conducted a field pest survey to verify the actual presence of the 209 reported rice stem borer incidents. The experts looked for evidence, including feeding wounds or holes, larval frass, egg masses, damage symptoms, and pupae of the stem borers, within a two-meter buffer zone (considering the mobility of the borers) surrounding each bamboo chip. If none of this evidence could be detected within the buffer zone of a report, the report was rejected by the experts; otherwise, it was accepted. The pest survey thus generated categorical results, i.e., reports being accepted and reports being rejected. Since the survey was conducted by experienced experts, its results were considered accurate ground truth data.

Conformity tests

Subsequently, conformity tests were conducted. For each threshold within the interval [4, 6], Cohen’s kappa statistic (Viera and Garrett 2005), which shows the degree of agreement between the fuzzy system-generated results and the pest survey results, was calculated. The sensitivity (Eq. 8) and specificity (Eq. 9) were also calculated. Note that reports in withheld status were not included in the calculations. The system-generated results corresponding to the highest kappa value were mapped for visualization, and a confusion matrix was provided to show the degree of agreement in detail.

$$Sensitivity = TP/(TP + FN),$$
(8)
$$Specificity = TN/(TN + FP),$$
(9)

where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively.
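As a worked illustration of Eqs. 8 and 9 together with Cohen’s kappa, the following Python sketch computes all three from a 2 × 2 confusion matrix. Kappa is taken in its standard form, (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `agreement_metrics` is hypothetical, and the counts are illustrative placeholders rather than the values shown in Fig. 8.

```python
from typing import Dict


def agreement_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, overall agreement, and Cohen's kappa."""
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    # Chance agreement: product of the marginal totals for each category,
    # summed over "accepted" and "rejected".
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "sensitivity": tp / (tp + fn),   # Eq. 8
        "specificity": tn / (tn + fp),   # Eq. 9
        "agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


# Illustrative counts only (not the case-study confusion matrix).
metrics = agreement_metrics(tp=40, fn=10, fp=5, tn=45)
```

With these placeholder counts the observed agreement is 0.85 and the chance agreement is 0.5, so kappa is 0.7, while sensitivity and specificity are 0.8 and 0.9.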

To evaluate the impact of sample size on system performance, randomization tests were performed. Using the threshold that corresponded to the highest kappa value and following the method described above, 30 rounds of conformity tests were conducted on 30 different subsets of the whole sample set. That is, ten 90 % subsets, ten 60 % subsets, and ten 30 % subsets were randomly extracted from the whole dataset. Cohen’s kappa statistic was calculated for each run, and the mean and standard deviation of the kappa statistics were calculated for each group of subsets of the same percentage.
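The randomization procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: subsets are drawn without replacement, the sample standard deviation is used, the random seed is fixed only for reproducibility, and the names `cohen_kappa` and `randomization_test` are hypothetical. The input pairs are synthetic and do not reproduce the case-study data.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[bool, bool]  # (accepted by the system?, accepted by the survey?)


def cohen_kappa(pairs: Sequence[Pair]) -> float:
    """Cohen's kappa for two binary ratings of the same reports."""
    n = len(pairs)
    observed = sum(s == g for s, g in pairs) / n
    system_pos = sum(s for s, _ in pairs) / n
    survey_pos = sum(g for _, g in pairs) / n
    expected = system_pos * survey_pos + (1 - system_pos) * (1 - survey_pos)
    return (observed - expected) / (1 - expected)


def randomization_test(
    pairs: List[Pair],
    fractions: Sequence[float] = (0.9, 0.6, 0.3),
    runs: int = 10,
    seed: int = 0,
) -> Dict[float, Tuple[float, float]]:
    """Mean and standard deviation of kappa over random subsets of each size."""
    rng = random.Random(seed)
    summary = {}
    for fraction in fractions:
        size = round(fraction * len(pairs))
        # Assumption: each subset is sampled without replacement.
        kappas = [cohen_kappa(rng.sample(pairs, size)) for _ in range(runs)]
        summary[fraction] = (statistics.mean(kappas), statistics.stdev(kappas))
    return summary


# Synthetic pairs for illustration: 150 TP, 10 FN, 15 FP, 25 TN.
pairs = (
    [(True, True)] * 150 + [(False, True)] * 10
    + [(True, False)] * 15 + [(False, False)] * 25
)
summary = randomization_test(pairs)
```

Each entry of `summary` maps a subset fraction to the mean and standard deviation of the ten kappa values obtained for that fraction, which is the kind of summary plotted in Fig. 9.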

Results

From the 209 rice stem borer incident reports, the fuzzy system identified eight clusters and 16 outliers (Fig. 7a) and generated trustworthiness scores ranging from 0.51 to 9.10 (Fig. 7b).

Fig. 7

Maps showing the VGI quality assessment results generated by the fuzzy system. The background map is the road map of Bing Maps. The thumbnail in the lower right corner shows the relative location of Xiajiang prefecture in China. a–c The grey polyline represents the boundary of Xiajiang prefecture. a The 209 reported rice stem borer incidents. b Inferred trustworthiness scores of the reports. c Final statuses of the reports based on a threshold of five

The results of the conformity tests using different thresholds are shown in Table 1. The sensitivity and specificity values confirm that raising and lowering the threshold can increase the numbers of FN and FP, respectively, leading to lower kappa values. The highest kappa value (0.67) corresponds to the thresholds 4.8 and five. Because both thresholds obtained the same kappa value, the integer five was adopted as the threshold for further analyses in this case study.

Table 1 Results of the conformity tests using different thresholds

With a threshold of five (Fig. 7c), the fuzzy system rejected 29 reports, including six outliers with trustworthiness scores lower than five (red circles), and accepted 170 reports (green circles). Ten outlier reports were held (blue circles) because of their relatively high user confidence levels. The conformity test showed that around 91 % of the fuzzy system-generated results agreed with the survey results, with a corresponding kappa value of 0.67. Details are visualized by the confusion matrix shown in Fig. 8. Of the ten pest incident reports that were held, eight corresponded to locations that had in fact suffered infestations according to the survey results.

Fig. 8

Confusion matrix visualizing the degree of agreement

Finally, using the threshold of five, the system performed better with larger sample sizes: the mean kappa value increased and the standard deviation decreased as the sample size increased (Fig. 9).

Fig. 9

Means and standard deviations (shown as whiskers) calculated for the Cohen’s kappa statistics from the three groups of percentile subsets extracted for testing the impacts of sample sizes

Discussion

Features of the system

Using the pest surveillance data, we demonstrated that a proper use of fuzzy set theory can lead to the desired VGI quality assurance results. The fuzzy system was developed on the idea that the quality of VGI can be assured based on its geographic context and the provenance of user expertise, and that trust can be used as a proxy for quality. The fuzziness involved in trust judgement requires special attention, and quality itself is also inherently fuzzy. Therefore, fuzzy set theory was adopted as the key to the system design, as it readily incorporates semantic knowledge into the quality assessment. Bordogna et al. (2014a) promote a linguistic decision-making approach to assessing the quality of VGI. Our system extends their work by demonstrating the utility of fuzzy set theory in assessing the quality of user-contributed species surveillance reports in particular.

The system design echoes the view of van Exel et al. (2010) that assessing VGI quality must consider not only feature quality and user quality but also the interdependency between them. To account for fitness of geographic context (feature quality), DBSCAN clustering was used to identify VGI clusters. As pointed out by Goodchild and Li (2012), quality measures of VGI can arise from the data themselves. More credit can be given to a cluster of similar reports than to a single report, in which case one can develop quality metrics based on the clustered reports. In our system, the metric is based on the proximity of each user report to its corresponding cluster center, so as to measure the report’s fitness of geographic context. It resembles the approach of Gao et al. (2014), in which a distance-decay function is used to measure the memberships of a cluster of VGI points assigned to the Harvard University campus: the closer a point was to the campus core area, the higher the membership it obtained. Similarly, Liu et al. (2010) used an interpolation procedure to measure the weights of candidate point locations assigned to the South China region: the closer a location was to the core area of the region, the higher the weight it obtained. In addition, confidence was used as a surrogate for the provenance of user expertise (user quality). The case study confirmed that asking the volunteers to self-evaluate the correctness of their observations was useful for assessing the quality of the generated information. By considering the interdependency between feature quality and user quality, the fuzzy system could detect VGI that seemed to fit the geographic context well but was in fact highly uncertain.

Through the compositional inference rules (Fig. 3), the system can appropriately emulate human reasoning about how the interactions of the two system input variables influence the system output (Fig. 4). A simple linear method, for example combining the values of the two variables by summation, does not have the same capability, as demonstrated by the three exemplar input–output combinations in Table 2. For Combinations 1 and 2, the simple linear method obtains two identical values (i.e., 12), while the fuzzy system generates two different values (i.e., 3.6 and 5). This is an advantage of using a rule-based fuzzy system. In Combination 1, although the fitness of geographic context is perfect, the fuzzy system treats the report as less trustworthy than that of Combination 2 because of its overly low user confidence. In Combination 2, although the fitness of geographic context is relatively low, the system assigns the report higher credibility because the user confidence is very high. For Combinations 1 and 3, the simple linear method obtains two different values (i.e., 12 and 7.9), while the fuzzy system generates two identical values (i.e., 3.6). The linear method gives more credit to Combination 1, which appears less appropriate from the standpoint of human judgement because of its overly low user confidence level. Therefore, a fuzzy system with appropriately defined system parameters (e.g., membership functions) can deal with such complicated non-linear cases by mimicking human thinking.

Table 2 Three different input–output combinations

Furthermore, VGI datasets are often large in volume, as VGI contributors on the ground are ubiquitous. In the case study, although the entire dataset was not large, the system’s performance was observed to improve with increasing sample size rather than deteriorate (Fig. 9). The system is also robust in handling non-clustered VGI (i.e., outlier VGI detected by DBSCAN). Such VGI can be processed with two options, i.e., discarding (for reports with low user confidence levels) and holding (for reports with high user confidence levels), rather than simply being rejected as poor-quality VGI. In the case study, the survey results showed that eight of the ten held outlier reports corresponded to locations that had in fact suffered pest infestations. This supports our view that outlier VGI should not be simply discarded.

Finally, since user histories (provenance) can be harvested from VGI and fitness of geographic context can be determined from VGI, the approach also has the potential to be adapted and applied to similar VGI applications in different contexts, e.g., VGI-based earthquake casualty surveillance.

Potential future improvement

The fuzzy system should be extended in ways that generalize its applicability. Fuzzy logic provides tools to model the inherent fuzziness that would otherwise be neglected by traditional crisp logic, but it also introduces subjectivity into the modelling process. Imprecision related to subjectivity has often been cited as a limitation of conventional fuzzy systems (Adhikari and Li 2013; Al-kheder et al. 2008). In our study, although the selection and tuning of the system parameters are justified, they are still the results of a subjective process. Therefore, optimizing the system parameters is perhaps the most important improvement.

Taking the case study as an example, the highest kappa value obtained was 0.67, with 91 % agreement. Although 0.67 is considered substantial agreement (Viera and Garrett 2005), the specificity (0.66) was not as good as the sensitivity (0.96) (Table 1). To preserve as many reports as possible while reducing the number of false positives, one solution is to further improve the system through parameter calibration (e.g., membership function calibration) based on sensitivity analyses using the pest survey data (ground truth data) collected in the case study. After calibration, the system can be generalized to larger spatiotemporal extents with greater reliability. Another solution is to use the consensus approach, in which system parameters are optimized based on the subjective opinions of multiple decision-makers (Zhang et al. 2014). However, both solutions are often laborious and time-consuming. To remedy this problem, a machine learning approach, which uses an artificial neural network to determine the appropriate system parameters automatically, seems more promising, as demonstrated in studies using combined neuro-fuzzy systems to understand environmental quality issues (e.g., Carnevale et al. 2009; Yan et al. 2010). It will be interesting to investigate how such systems can be used to better assure VGI quality.

Moreover, in calculating fitness of geographic context, the spatial extent of a cluster depends on the VGI points within the cluster. A small number of false contributions in a VGI cluster would not significantly affect the cluster’s spatial extent (which affects the memberships of the points within the cluster), especially when the point density of the cluster is high. However, if the majority of the contributions in a VGI cluster are false, our approach will be less effective because the uncertainty about the spatial extent of the cluster is high. This problem points to the need to incorporate a user reputation database into our fuzzy system, so that contributions from contributors with lower reputation can be excluded before the system performs a refined data quality assurance. The idea is similar to that of the facilitated-VGI system suggested by Cinnamon and Schuurman (2013).

Conclusion

In this paper, a fuzzy system to assure the quality of VGI collected for the purposes of species surveillance is presented. Although a growing volume of volunteered geospatial data on species surveillance is being garnered from the general public, means of VGI quality assurance are still limited. Developing robust computational approaches to assure the quality of VGI is crucial to the development of such public participatory surveillance programmes. The fuzzy system has the potential to benefit relevant experts, scientists, policy makers, and ordinary VGI users alike. Quantitatively, the usefulness of the system is demonstrated through a crop pest surveillance case study, although further calibration of the system parameters will be needed. Qualitatively, the system offers several advantages, mainly in handling linguistic fuzziness, measuring geographic context, acquiring provenance, and treating outliers. Nevertheless, future work is needed to establish a neuro-fuzzy system and a user reputation system to generalize its applicability.