
1 Introduction

Access to large sets of Remote Sensing Images (RSIs) has increased over the years, and RSIs are currently a common source of information in many agribusiness applications. Identifying crops is essential for knowing and monitoring land use, defining new land expansion strategies, and estimating viable production value. In this work, we focus on the use of RSIs for a crucial agro-economic activity in the Brazilian state of Minas Gerais: coffee crop mapping.

Automatic recognition of coffee plantations in RSIs is typically modeled as a supervised classification problem. The task is rather challenging, however, mainly because the relief and the age of the crop may hinder the recognition process: different spectral responses and texture patterns can be observed for different regions. Because of this, spectral information may be significantly reduced or even totally lost. Moreover, since coffee growing is not a seasonal activity, there may be plantations of different ages in different regions, which also affects the observed spectral patterns.

Although several approaches have advanced the state of the art in coffee mapping in recent years [5, 6, 8], one problem still remains: how to automatically obtain representative samples for the classification of new geographic areas?

A strategy for obtaining extra training data is to transfer knowledge from previously mapped regions. However, as Nogueira et al. [6] have shown, due to the aforementioned differences in coffee patterns, direct transfer does not yield satisfactory results.

In this work, we investigate the application of existing unsupervised domain adaptation (UDA) approaches to the task of transferring knowledge between crop regions with different coffee patterns. Our intent is to evaluate the effectiveness of UDA approaches for mapping new coffee crop areas. UDA methods employ labeled data from one or more prior datasets to build a learning model for unseen or unlabeled data. UDA assumes that the source (prior labeled datasets) and target (new unlabeled data) domains have related but different probability distributions; the divergence between these distributions is called domain shift.

Since supervised learning methods typically expect source and target data to follow the same distribution, the presence of domain shift can degrade accuracy on the target data if training occurs directly on a source domain without adaptation. Ideally, we would like to learn a proper domain adaptation in an unsupervised manner. This task, however, is rather challenging, and its relevance to realistic applications has been attracting attention in recent years [12].

Encouraged by these challenges, in this paper we perform a comparative experimental study of various UDA methods for RSIs. We use a dataset composed of four remote sensing images of coffee crops in scenarios with different plant and terrain conditions.

The remainder of this paper is organized as follows: Sect. 2 presents an overview of UDA techniques. Sections 3 and 4 present, respectively, the evaluation protocol and the experimental results of our analysis. We conclude in Sect. 5 with final remarks and future research directions.

2 Unsupervised Domain Adaptation Approaches

Our experimental evaluation focuses on feature-based UDA methods [12]. Seven approaches were selected from the literature and are summarized in Table 1 according to their main properties. They are grouped into three branches: data-centric, subspace-centric, and hybrid methods. We briefly introduce each UDA method according to its branch in the next subsections.

Table 1. Summary of Unsupervised Domain Adaptation Methods

2.1 Data Centric Approaches

To align source and target data, data-centric methods attempt to find a single transformation that projects both domains into a domain-invariant space, reducing the divergence between domains while preserving the data properties of the original spaces [3, 4, 7].

Transfer Component Analysis (TCA) [7]: its goal is to learn a set of transfer components in a Reproducing Kernel Hilbert Space. When domain data are projected onto the latent space spanned by these transfer components, the distance between the domain distributions is reduced while the data variance is preserved.
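To make the quantity being minimized concrete, the sketch below computes the empirical Maximum Mean Discrepancy with a linear kernel, i.e., the squared distance between the two domain mean embeddings. TCA itself operates in an RKHS and learns the projection; this toy example and its variable names are ours:

```python
import numpy as np

def linear_mmd(Xs, Xt):
    """Empirical MMD with a linear kernel: squared distance between
    the mean feature vectors of the two domains."""
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(diff @ diff)

# Toy check with two shifted Gaussian domains.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(200, 10))  # source features
Xt = rng.normal(0.5, 1.0, size=(150, 10))  # target features (shifted)
print(linear_mmd(Xs, Xt))  # positive; shrinks as the domains align
```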

Joint Distribution Adaptation (JDA) [3]: extends the Maximum Mean Discrepancy (MMD) to measure differences in both the marginal and the conditional distributions. Although TCA minimizes the marginal distribution divergence between source and target data, its formulation does not guarantee that the divergence of the conditional distributions is reduced, which may lead to poor adaptation. JDA improves upon TCA by integrating MMD with Principal Component Analysis (PCA) to build a feature representation that is effective and robust for large domain shifts.
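As a rough illustration of the extra term JDA introduces, the following sketch sums per-class MMD values using pseudo-labels for the target. It reuses `linear_mmd` from the previous sketch; in JDA the pseudo-labels are refined iteratively, which we omit here:

```python
import numpy as np

def conditional_mmd(Xs, ys, Xt, yt_pseudo):
    """Per-class linear MMD, summed over classes: the conditional term
    JDA adds to the marginal MMD. Target labels are pseudo-labels
    produced by a source-trained classifier (reuses linear_mmd above)."""
    total = 0.0
    for c in np.unique(ys):
        Xt_c = Xt[yt_pseudo == c]
        if len(Xt_c) > 0:                      # skip classes absent in target
            total += linear_mmd(Xs[ys == c], Xt_c)
    return total
```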

Transfer Joint Matching (TJM) [4]: aims to minimize the distribution distance between domains while reweighting the most discriminative instances in the final adaptation, since some source instances may be more relevant to the classification task than others. TCA, JDA, and TJM rely on the assumption that there always exists a transformation that can project the source and target data into a common subspace which simultaneously reduces the distribution difference and preserves most of the original information. This assumption, however, is not always realistic: known problems arising from strong domain shifts suggest that such a space may not always exist.

2.2 Subspace Centric Approaches

In contrast to data-centric methods, subspace-centric methods do not assume the existence of a unified transformation. They rely on manipulating the subspaces of the source and target domains [1] or the path between them [2], on the grounds that each domain's subspace has its own particular features to be exploited.

Subspace Alignment (SA) [1]: this approach projects source and target data onto different subspaces using PCA. The method then learns a linear transformation matrix M that aligns the source subspace to the target one while minimizing the Frobenius norm of their difference.
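The alignment admits a closed form. A minimal sketch, assuming Xs and Xt are row-wise feature matrices and the subspace dimension d is a tuning parameter of our choosing:

```python
import numpy as np
from sklearn.decomposition import PCA

def subspace_alignment(Xs, Xt, d=20):
    """Project each domain onto its own PCA basis, then align the
    source basis to the target one with the closed form M = Ps.T @ Pt."""
    Ps = PCA(n_components=d).fit(Xs).components_.T  # (D, d) source basis
    Pt = PCA(n_components=d).fit(Xt).components_.T  # (D, d) target basis
    M = Ps.T @ Pt           # minimizes ||Ps @ M - Pt||_F (Ps orthonormal)
    return Xs @ (Ps @ M), Xt @ Pt   # aligned source, projected target
```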

Geodesic Flow Kernel (GFK) [2]: is an approach that integrates an infinite number of subspaces that lie on the geodesic flow from the source subspace to the target one using the kernel trick.

The main drawback of subspace-centric methods is that, while they focus on reducing the geometrical shift between subspaces, the distribution shift between the projected data of the domains is not explicitly treated, as it is in data-centric methods.

2.3 Hybrid Approaches

CORAL [10]: addresses the drawbacks of both data- and subspace-centric methods. Domain shift is minimized by aligning the covariance of the source and target distributions in the original feature space. In contrast to subspace-centric methods, CORAL performs the alignment without subspace projection, which would require intensive computation and complex hyper-parameter tuning.

In addition, CORAL does not assume a unified transformation like data-centric methods; instead, it applies an asymmetric transformation to the source data only.
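A minimal numpy sketch of this whitening-and-recoloring step; the regularization strength lam is our assumption, not a value prescribed by the original paper:

```python
import numpy as np

def _mat_pow(C, p):
    # Power of a symmetric PSD matrix via eigendecomposition (stays real).
    w, V = np.linalg.eigh(C)
    return (V * np.maximum(w, 1e-12) ** p) @ V.T

def coral(Xs, Xt, lam=1.0):
    """Whiten the source features, then re-color them with the target
    covariance; the target itself is left untouched (asymmetric)."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + lam * np.eye(d)  # regularized covariances
    Ct = np.cov(Xt, rowvar=False) + lam * np.eye(d)
    return Xs @ _mat_pow(Cs, -0.5) @ _mat_pow(Ct, 0.5), Xt
```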

Joint Geometrical and Statistical Alignment (JGSA) [11]: aims to reduce the statistical and geometrical divergence between domains by exploiting both shared and domain-specific properties of the source and target data. An overall objective function is built from five terms: target variance, between-class and within-class source variance, distribution shift, and subspace shift.

3 Methodology

We performed an extensive set of experiments on the Brazilian Coffee Scenes dataset in order to evaluate the robustness of UDA methods in a remote-sensing agriculture scenario. The experiments were designed to answer the following research questions:

  1. Can knowledge transfer between coffee plantation datasets from different geographic regions yield complementary results?

  2. Is it possible to infer a spatial relationship among the coffee samples correctly predicted by learning models trained on different data sources?

To answer question (1), we use Venn diagrams of the predictions to analyze the complementarity among the different coffee datasets.

Concerning question (2), we perform a visual analysis of the samples that are correctly predicted by specific models, using two different methods to project the original data representation into a 2D space: Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE).

3.1 Data

The Brazilian Coffee Scenes dataset consists of four multi-spectral remote sensing images taken by the SPOT sensor in 2005, covering coffee cultivation regions over four counties in the state of Minas Gerais, Brazil: Arceburgo (AR), Guaxupé (GX), Guaranésia (GA), and Monte Santo (MS). Each county is partitioned into multiple tiles of 64 × 64 pixels, each labeled as one of two classes (coffee or non-coffee). To mitigate class imbalance, we applied random under-sampling, balancing the data by randomly selecting a subset of the majority class. In our analysis, we considered each county as a different domain; with four domains (AR, GX, GA, and MS), there are 12 possible domain adaptation combinations. We used the low-level Border/Interior Pixel Classification (BIC) descriptor [9] for feature extraction, since BIC has been shown to be very effective for coffee crops [5, 8].
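A simple sketch of the balancing step, under the assumption that it keeps, for each class, as many samples as the rarest class has:

```python
import numpy as np

def undersample(X, y, seed=0):
    """Balance a dataset (e.g., coffee vs. non-coffee tiles) by randomly
    keeping, for every class, as many samples as the rarest class has."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes])
    return X[keep], y[keep]
```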

3.2 Setup and Implementation Details

We compared seven state-of-the-art methods: Transfer Component Analysis (TCA) [7], Geodesic Flow Kernel (GFK) [2], Subspace Alignment (SA) [1], Joint Distribution Adaptation (JDA) [3], Transfer Joint Matching (TJM) [4], CORAL [10], and Joint Geometrical and Statistical Alignment (JGSA) [11], as well as transfer with no adaptation (NA). A brief description of the methods is given in Sect. 2 (for more details we recommend the original papers). We follow a full-training evaluation protocol, in which a Support Vector Machine (SVM) is trained on the labeled source data and tested on the unlabeled target data. In our experimental setup, parameter tuning is always performed on the source data, since cross-validation is impossible without labeled samples from the target domain. We evaluate all methods by empirically searching the parameter space for the settings that yield the highest average kappa over all datasets, and we report the best accuracy results of each method.
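The protocol can be summarized by the sketch below; the linear SVM kernel and the function names are our assumptions, and any of the adaptation sketches above (e.g., coral or subspace_alignment) can be plugged in as adapt:

```python
from sklearn.svm import SVC
from sklearn.metrics import cohen_kappa_score

def evaluate_transfer(Xs, ys, Xt, yt, adapt=None, **params):
    """Full-training protocol: optionally adapt the features, train an
    SVM on the labeled source, and score kappa on the target
    (yt is used for evaluation only, never for training or tuning)."""
    if adapt is not None:                 # e.g. coral, subspace_alignment
        Xs, Xt = adapt(Xs, Xt, **params)
    clf = SVC(kernel='linear').fit(Xs, ys)
    return cohen_kappa_score(yt, clf.predict(Xt))
```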

4 Experimental Results and Discussion

In this section, we compare different strategies for transferring knowledge between geographic domains in order to map coffee crops. SVM classifiers were trained with no adaptation (NA) and with each of the seven selected UDA approaches.

Table 2 presents the cases of positive and negative transfer for the datasets using L2-Norm Z-score normalization.

4.1 Complementarity of Cross-Domain Predictions

In this subsection, we select the pair (L2-Norm Z-score/TCA) to analyze the complementarity of predictions between source and target data. Given a target dataset, each set in the diagram represents a source dataset on which the (L2-Norm Z-score/TCA) pipeline was trained. Intersections between sets contain samples that were correctly predicted from all the corresponding sources. The results are presented as Venn diagrams, shown in Fig. 1.
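A self-contained sketch of how such a diagram can be built with the matplotlib-venn package, using toy predictions in place of the actual (L2-Norm Z-score/TCA) outputs:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib_venn import venn3   # pip install matplotlib-venn

def correct_sets(predictions, yt):
    """Indices of target samples each source-trained model gets right."""
    return {src: set(np.flatnonzero(pred == yt))
            for src, pred in predictions.items()}

# Toy stand-in: ground truth and three per-source prediction vectors.
rng = np.random.default_rng(0)
yt = rng.integers(0, 2, size=300)
preds = {src: np.where(rng.random(300) < 0.8, yt, 1 - yt)  # ~80% correct
         for src in ('GX', 'GA', 'MS')}

sets = correct_sets(preds, yt)
venn3(list(sets.values()), set_labels=list(sets.keys()))
plt.title('Correctly predicted target samples per source')
plt.show()
```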

Table 2. Positive (blue) and negative (red) transfer using L2-Norm Z-score normalization
Fig. 1. Venn diagrams of samples correctly predicted in different target data: (a) Arceburgo, (b) Guaxupé, (c) Guaranésia, (d) Monte Santo.

As expected, most samples in all diagrams lie in the intersection of the three sets, i.e., the easiest samples are correctly predicted regardless of which source dataset is used for training. However, a considerable number of samples were correctly predicted from a single source only.

This suggests the existence of complementary information that can be exploited to build a more reliable learning model. One can also notice a relationship of “similarity” between domains, i.e., some pairs of domains perform better than others; for instance, GX and MS in Fig. 1b.

However, this relationship is not always bidirectional, as exemplified by AR and GA. In Fig. 1a, GA yields good results as a source domain, correctly predicting 76.92% of the samples; but in Fig. 1c, where GA is the target, MS is more useful than AR, correctly classifying 77.71% of the samples against 75.36% from AR.

4.2 Visual Analysis

In this section, we investigate the spatial relationship between the samples. Given a fixed adaptation approach, we focus on the samples that were correctly predicted exclusively from one specific source dataset. For this purpose, we propose a visual analysis of these samples, projecting the original data representation into a 2D space with two different methods: Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). The PCA and t-SNE projections are shown in Figs. 2 and 3, respectively.
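A minimal scikit-learn sketch of the two projections, using random stand-in features; the perplexity value, figure layout, and the coloring by source are our choices for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 64))        # stand-in for the feature vectors
src = rng.integers(0, 3, size=400)    # which source predicted each sample

Z_pca = PCA(n_components=2).fit_transform(X)
Z_tsne = TSNE(n_components=2, perplexity=30,
              random_state=0).fit_transform(X)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(*Z_pca.T, c=src, s=8); ax1.set_title('PCA')
ax2.scatter(*Z_tsne.T, c=src, s=8); ax2.set_title('t-SNE')
plt.show()
```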

Fig. 2. 2D-space projections using PCA

Fig. 3. 2D-space projections using t-SNE

A visual analysis of the projections reveals important aspects of the data and of the complementarity between source datasets. First, the PCA projections give little insight into the spatial relationship of correctly predicted samples; they mostly show sparsity over the feature space. PCA is a powerful dimensionality reduction technique, since it projects the original high-dimensional data into a low-dimensional space while preserving as much variance as possible. However, PCA does not preserve the local structure of the original data, i.e., points that are close (with respect to some metric) in the original high-dimensional space may not remain close in the new low-dimensional space.

Second, in contrast to PCA, t-SNE follows a non-linear manifold approach and can create low-dimensional representations that preserve local structure, as shown in Fig. 3. In addition, we can notice a tendency toward complementarity between the learning models, since the samples correctly predicted from different sources tend to form clusters. This behavior suggests shared properties between the source and target data, with each cluster containing samples that are more likely to be drawn from a specific source. Another way of interpreting this is to consider that remote sensing images can present high intra-class variance due to the huge spatial extent they cover: an entire image can be seen as a composition of several probability distributions, some of which are better explained by different sources of data.

5 Conclusion

This paper described a comparative experimental analysis of seven UDA approaches for automatic coffee crop mapping. We conducted two sets of experiments to verify whether existing unsupervised domain adaptation approaches can assist in transferring knowledge between datasets from different geographic domains. The main conclusion is that employing a UDA strategy is more effective than transferring knowledge without any adaptation.

In terms of mean accuracy, TCA [7] presented the most suitable results. The negative transfer phenomenon was noticed in several experiments, supporting the importance of an effective adaptation. Analyzing the complementarity of predictions, we observed the existence of additional information that could be exploited from multiple source datasets to build a more reliable learning model. Finally, a visual analysis identified clusters among the samples correctly predicted using different source data, showing that some target samples are likely to be drawn from a specific source. This indicates that a robust UDA approach should recognize the importance of multiple sources, since each source contributes differently to distinct samples of the target.

As future work, we intend to investigate ways of avoiding negative transfer and to employ UDA strategies in other vegetation mapping applications.