Comparison of Three Different Random Forest Approaches to Retrieve Daily High-Resolution Snow Cover Maps from MODIS and Sentinel-2 in a Mountain Area, Gran Paradiso National Park (NW Alps)

Richiardi, Chiara; Siniscalco, Consolata; Adamo, Maria

doi:10.3390/rs15020343

Open AccessArticle

Comparison of Three Different Random Forest Approaches to Retrieve Daily High-Resolution Snow Cover Maps from MODIS and Sentinel-2 in a Mountain Area, Gran Paradiso National Park (NW Alps)

by

Chiara Richiardi

^1,2,*

,

Consolata Siniscalco

² and

Maria Adamo

¹

Institute of Atmospheric Pollution Research (IIA), National Research Council (CNR), c/o Interateneo Physics Department, Via Amendola 173, 70126 Bari, Italy

²

Department of Life Sciences and Systems Biology, University of Torino, Via Pier Andrea Mattioli 25, 10125 Turin, Italy

^*

Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(2), 343; https://doi.org/10.3390/rs15020343

Submission received: 2 November 2022 / Revised: 26 December 2022 / Accepted: 4 January 2023 / Published: 6 January 2023

(This article belongs to the Special Issue Emerging Remote Sensing Techniques for Monitoring Glaciers and Snow)

Download

Browse Figures

Versions Notes

Abstract

:

In the Alpine environment, snow plays a key role in many processes involving ecosystems, biogeochemical cycles, and human wellbeing. Due to the inaccessibility of mountain areas and the high spatial and temporal heterogeneity of the snowpack, satellite spatio-temporal data without gaps offer a unique opportunity to monitor snow on a fine scale. In this study, we present a random forest approach within three different workflows to combine MODIS and Sentinel-2 snow products to retrieve daily gap-free snow cover maps at 20 m resolution. The three workflows differ in terms of the type of ingested snow products and, consequently, in the type of random forest used. The required inputs are the MODIS/Terra Snow Cover Daily L3 Global dataset at 500 m and the Sentinel-2 snow dataset at 20 m, automatically retrieved through the recently developed revised-Let It Snow workflow, from which the selected inputs are, alternatively, the Snow Cover Extent (SCE) map or the Normalized Difference Snow Index (NDSI) map, and a Digital Elevation Model (DEM) of consistent resolution with Sentinel-2 imagery. The algorithm is based on two steps, the first to fill the gaps of the MODIS snow dataset and the second to downscale the data and obtain the high resolution daily snow time series. The workflow is applied to a case study in Gran Paradiso National Park. The proposed study represents a first attempt to use the revised-Let It Snow with the purpose of extracting temporal parameters of snow. The validation was achieved by comparison with both an independent dataset of Sentinel-2 to assess the spatial accuracy, including the snowline elevation prediction, and the algorithm’s performance through the different topographic conditions, and with in-situ data collected by meteorological stations, to assess temporal accuracy, with a focus on seasonal snow phenology parameters. Results show that all of the approaches provide robust time series (overall accuracies of A1 = 93.4%, and A2 and A3 = 92.6% against Sentinel-2, and A1 = 93.1%, A2 = 93.7%, and A3 = 93.6% against weather stations), but the first approach requires about one fifth of the computational resources needed for the other two. The proposed workflow is fully automatic and requires input data that are readily and globally available, and promises to be easily reproducible in other study areas to obtain high-resolution daily time series, which is crucial for understanding snow-driven processes at a fine scale, such as vegetation dynamics after snowmelt.

Keywords:

data fusion; gap-filling; MODIS; random forest; Sentinel-2; snow cover; snow cover duration

1. Introduction

Snow is crucial in Alpine ecosystems, being a fundamental driver of multiple factors, both abiotic and biotic. Snow dynamics have a significant impact on ecosystems, as its seasonality affects the phenology of plants and animals, the ground thermal regime [1,2], and it is an important water reservoir, pivotal to mountain hydrological regimes [3,4]. On a wider scale, due to its high albedo, it affects the global energy budget [5]. Snow also affects human welfare; on the one hand, by providing socio-economic services related to tourism and water supply [6,7], and on the other hand, creating avalanche risk [8,9]. Snow studies are also relevant for monitoring urban streams and lakes [10,11]. For these reasons, in recent years, the study of snow distribution and its changes has become the focus of numerous studies [12,13,14,15,16,17,18,19,20].

Satellite remote sensing data can provide continuous spatio-temporal information on snow cover over long time series and on a global scale [21], overcoming the limitations of other types of observations, such as manual or automatic measurements by weather stations, in gathering information over wide and inaccessible areas [22]. The most widely used methods for extracting snow cover from satellite images are based on data from passive sensors [23], which include microwave and optical data. Microwave passive sensors can retrieve information related to snow depth (SD) and snow water equivalent (SWE), and are not affected by cloud cover or solar illumination, but are limited by a low spatial resolution (>1 km) [24]. SWE has been retrieved from the Advanced Microwave Scanning Radiometer—Earth Observing System (AMSR-E) [25], the Scanning Multichannel Microwave Radiometer (SMMR) [26], the Microwave Radiation Imager (MWRI) [27], and the Special Sensor Microwave/Imager (SSM/I) [28] satellites. In contrast to microwave sensors, optical data offer long observation time series at higher spatial resolutions (≤1 km) [24]. Most snow products are based on data acquired from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensors [29] carried by the Terra and Aqua satellites, and the Advanced Very-High-Resolution Radiometer (AVHRR) sensors carried on NOAA’s Polar Orbiting Environmental Satellites (POES) [30], providing daily coverage, but with low spatial resolution. In contrast, the Landsat [31] and, more recently, the Sentinel-2 constellations [32], provide high spatial resolution, but at the expense of revisit times of 16 days and 5 days, respectively. Optical sensors exploit the reflectance/absorption behavior of snow in the short-wave infrared (SWIR) and visible bands [33], summarized in the Normalized Difference Snow Index (NDSI) [34] to discriminate snow from other land cover classes and, in particular, from clouds [23]. The greatest limitation of optical sensors is their dependence on lighting and weather conditions. On the contrary, the use of Synthetic Aperture Radar (SAR) sensors to estimate snow and related variables depends on the roughness and dielectric properties of snow [21], and it is not affected by cloud cover and light conditions. Data collected from different types of sensors have been used in snow-related applications, varying in both temporal and spatial resolution, and type of wavelength emitted: the X-band satellites constellation COSMO-SkyMed [35], the satellites TanDEM-X [36] and TerraSAR-X [37]. More frequently used, due to the longer time series available and the C-band suitability in snow detection, are the Sentinel-1 [38,39,40], ERS [41], Envisat [42], and Radarsat [43] satellites. Lastly, the L-band Advanced Land Observation Satellite (ALOS) [44] has been employed in snow cover detection. Snow detection with SAR sensors has achieved great results in wet snow detection since it was first proposed to be used for this purpose [45], but dry snow detection is still a field under development [46]. Finally, products are available with daily temporal coverage that blend different types of data, such as the Canadian Meteorological Centre (CMC) snow product [47], a combination of weather stations and climate models with a spatial resolution of 24 km; the ESA’s GlobSnow product [48], with 25 km resolution, which integrates satellite data, station observations, and models; and the Interactive Multisensor Snow and Ice Mapping System (IMS) [49] produced by the U.S. National Ice Center, with 1 km resolution.

In summary, no sensor is currently available that provides both adequate spatial and temporal resolution for fine-scale spatio-temporal studies. Several methods have been proposed to overcome this limitation and provide time series of snow cover without gaps, which can be divided into methods based on gap-filling of binary snow data, i.e., snow/no snow maps, gap-filling of continuous snow data, such as fractional snow cover (FSC) or NDSI maps, and, lastly, methods based on the fusion of different data sources.

Among the gap-filling of binary data methods, SNOWL is a spatial filtering method based on snow transition elevation developed for MODIS [50]; simple and easy to implement, but not suitable for fine-scale analysis. The locally weighted logistic regression (LWLR) computes the snow cover probability of the cloudy pixel based on the topographic and spatial relations with its neighboring cloud-free pixels [51]. Examples of temporal filtering are the Terra and Aqua combination (TAC), and the three-days-temporal-filtering (3DTF), which, respectively, use the two daily MODIS images and the images acquired on the previous and following day to fill in the gaps due to clouds [52]. A multi-day-combination approach was used by [53] to replace cloudy pixels with cloud-free pixels in a temporal interval of up to 1 month. These methods benefit from a simple application, but, as the temporal window increases due to persistent cloud cover, the temporal resolution and accuracy of the result will decrease.

Other authors have proposed techniques based on the gap-filling of continuous data. The spatial and temporal adaptive gap-filling method (STAGFM) was proposed by Chen et al. [54] for gap-filling of MODIS NDSI data based on the concept of the similar pixel, i.e., the pixels with smaller NDSI differences from the central pixel, which, when covered by clouds, can be predicted by its weighted similar pixels. This was developed into the conditional probability interpolation method based on a space-time cube (STCPI), which combines a space-time cube and conditional probability interpolation method to produce daily gap-free products [55]. The accuracy of both methods depends on the availability of cloud-free pixels. Other authors have proposed the non-local spatio-temporal filtering (NSTF) method to retrieve cloudy pixels by using the appropriate similar pixels in the remaining known part of an image via an automatic machine learning technique and the use of ancillary data, such as land cover [56].

More recently, several methods based on data fusion have been adopted, fusing SAR and optical data or coarse and high resolution optical data, such as MODIS and Landsat or Sentinel-2 imagery. Liu et al. [57], through an object-based principal component analysis–support vector machine (PCA–SVM) method for snow cover mapping, attempted to merge MODIS snow cover products and the Sentinel-1 scattering; however, the variations of the polarization parameters were not sufficient to differentiate snow from non-snow cover, thus leading to misclassification and false detection in the snow pixel classification. Rittger et al. [58] proposed a machine-learning-based method to fuse Landsat and MODIS images and several ancillary data, some of which were hardly available, to obtain daily images at 30 m resolution, requiring very high computational resources. Another recent work is that of Revuelto et al. [59], which proposes two different approaches, namely ProbMODIS and ProbSentinel, for downscaling MODIS snow products using Sentinel-2 based on snow probability computation.

Snow distribution, especially in mountain areas, has high heterogeneity, both spatially and temporally [60]. Therefore, it is important to have the maximum temporal and spatial resolution of snow [61], which is why Landsat and Sentinel-2 imagery are being increasingly used. Sentinel-2, compared to Landsat, has the advantage of benefitting from a stronger temporal coverage when gap-filling procedures must be applied to obtain a daily temporal series, from which snow phenology metrics can be drawn.

To fulfill these requirements (high spatial and temporal resolution), the goal of this work is to develop a workflow to retrieve a gap-free time series of snow cover maps at high resolution, to investigate the snow dynamics on a fine scale. The proposed method, based on a machine learning approach, involves the data fusion of MODIS snow data [62], consisting of daily images but at low spatial resolution, and the Sentinel 2 constellation data [63], which, conversely, achieve high spatial resolution but are acquired every 5 days. We compared three different approaches, differing in the type of input data used for training random forests, and experimented on a mountain area in the northwest Alps. Validation was performed both to investigate temporal and spatial accuracy through, respectively, a comparison with ground truth data collected by meteorological stations, and a comparison with the Sentinel-2 almost cloud-free images themselves.

This document is divided into five sections. Section 2 describes the study area, the data used, and explains the workflow implemented, as well as the validation methodology. Section 3 reports the results of our double validation approach. Section 4 discusses the experiment and, finally, Section 5 concludes this paper.

2. Materials and Methods

2.1. Study Area

The study area, which covered approximately 2066 km², was centered on Gran Paradiso National Park (GPNP), a protected area established in 1922 for the conservation of the ibex (Capra ibex), whose population, thanks to protection and reintroduction operations, has given rise to all of the individuals currently occurring in the Alps [64]. The GPNP extends over the northwestern Alps, straddling the Piedmont and Val d’Aosta regions, on the border with France (Figure 1). The area is characterized by mountain reliefs up to 4000 m and steep valleys on the Piedmont side which are gentler on the Valle d’Aosta side. The climate is alpine-continental, characterized by a harsh winter and dominated by the presence of snow from November to June above 2000 m, as well as by a short summer. The park has a wide variety of ecosystems distributed along an altitudinal gradient. Perennial snows dominate the landscape above 2500–3000 m. Under these altitudes, seasonal snows play an important role in all forms of life in the park, influencing animal behavior and protecting small plants and soil from freezing in the cold season, and gradually releasing water in the spring as it melts, also determining the plants greening and growing season length [65].

2.2. Data

2.2.1. MODIS

The Moderate Resolution Imaging Spectroradiometer (MODIS) is a sensor on board the Terra and Aqua (EOS AM-1) satellites of the National Aeronautics and Space Administration (NASA) [66]. In this study, we used only the data acquired by Terra to keep the workflow straightforward and simple. Furthermore, countless other studies have shown that the combination of Terra and Aqua does not completely remove gaps, which are instead entirely handled in the first step of the proposed algorithm. The Terra satellite has been collecting daily data since 24 February 2000. It follows a sun-synchronous, near-polar, circular orbit at an altitude of 705 km, with a swath width of 2330 km, collecting data in 36 bands at various spatial resolutions (250, 500, or 1000 m) depending on the different spectral bands, ranging from 0.4 μm to 14.4 μm in wavelength. In addition to MODIS images at different processing levels, there are several MODIS products readily available. In this study, we used the Level 3 dataset MOD10A1 [62], distributed in sinusoidal grid tiles at 500 m of resolution by the National Snow and Ice Data Center (NSIDC). We collected the scenes of tile h18 v04 from 30 July 2015 to 31 December 2021 (available at https://nsidc.org/data/MOD10A1/versions/61, accessed on 24 March 2022). Each scene includes two data quality files, two auxiliary data files, three geolocation files, three metadata files, the FSC (NDSI_Snow_Cover), the raw NDSI, and the albedo (Snow_Albedo_Daily_Tile) data.

2.2.2. Sentinel-2

The Copernicus Sentinel-2 (S2) mission [63] comprises twin satellites, S2A and S2B, in orbit since June 2015 and March 2017, respectively, carrying the Multi-Spectral Instrument (MSI) that samples 13 spectral bands with a spatial resolution ranging from 10 to 60 m, with a swath width of 290 km, and a temporal resolution of about 5 days. All scenes are delivered in 100 × 100 km² grid tiles in the UTM/WGS84 projection.

In this study, we used all the available scenes from the beginning of the mission (i.e., 30 July 2015) to 31 December 2021. The images were processed following the workflow described in Richiardi et al. [67]: first extracting the cloud cover with the Fmask (function of mask) processor [68], then, performing the atmospheric and topographic correction with Sen2Cor [69]. Finally, the Snow Cover Extent (SCE) is extracted through the revised-Let It Snow (rLIS) algorithm, which exploits the approach firstly proposed by Gascoin et al. [32] and then modified by Richiardi et al. [67]. rLIS, which is specifically aimed at improving the discrimination between snow and clouds in mountain areas, is a tree algorithm based on thresholds on the normalized difference snow index (NDSI) and the red and the shortwave-infrared (SWIR) reflectance, and on a Digital Elevation Model (DEM). Primary, it identifies the snowline elevation using conservative thresholds, and then detects the snow cover using lower conservative thresholds and the snowline elevation, under which the snow presence is excluded. Furthermore, in this case, no data and cloudy pixels, retrieved by Fmask, were masked out.

2.2.3. Digital Elevation Model

Given the complex topography of the territory, and to be consistent with S2 imagery, we used a DEM at 20 m resolution. It was created by merging the Digital Terrain Models (DTMs) available on the geoportals of Piedmont (5 m resolution, https://www.geoportale.piemonte.it/geonetwork/srv/ita/catalog.search#/metadata/r_piemon:224de2ac-023e-441c-9ae0-ea493b217a8e, accessed on 5 February 2021) and Aosta Valley (2 m resolution, https://geoportale.regione.vda.it/download/dtm/, accessed on 5 February 2021) Regions and the European Digital Elevation Model version 1.1 (EU-DEM, 25 m resolution, https://land.copernicus.eu/imagery-in-situ/eu-dem/eu-dem-v1.1, accessed on 5 February 2021), to fill the remaining study area.

2.3. Workflow: Two-Stage Random Forests

The implemented workflow, based on random forest (RF) [70], aims to obtain a complete high-resolution time series of snow cover maps by performing data fusion between MODIS and S2 images. The algorithm was developed in R programming language and is based on two steps: the first is aimed at gap-filling the MODIS snow dataset, which is then downscaled and merged with the S2 data in the second step (Figure 2).

In the first step, the MOD10A1 snow dataset is used as training variable, while as predictive variables are used topographical features, i.e., elevation, slope and aspect, temporal features, i.e., Day of Year (DOY) and year, and latitude and longitude. The selected topographical variables exert a strong control on snow distribution [58,71], and the temporal variables are used to exploit the strong interannual consistency of the snow and, thus, reinforce the prediction ability. The use of latitude and longitude also serves as a proxy for local micro-climatic variability. The random forest is constructed using simple random sampling with a size of 25% of each MODIS image, excluding invalid pixels (e.g., cloud cover, missing data, poor light conditions, etc.), which is, then, used to fill the gaps in each MODIS scene. In the second phase, the same predictor variables are used, including the gap-filled MOD10A1 dataset, and, as training dataset are ingested the aforementioned snow products derived from 200 S2 scenes with cloud cover of less than 10% (Figure 3). Again, the random forest is used to complete missing parts of scenes, due to cloud cover or partial acquisitions, and to retrieve the total scene in days without acquisitions (Figure 4). In the case of cloud cover greater than 10%, the S2 scenes are discarded and reconstructed in their entirety.

To find the highest accuracy, we compared three approaches that differed in the selection of data used for random forests training and, consequently, in the type of random forest itself (Table 1):

The first approach (A1) uses the FSC data from MODIS as input for a regression RF in the first step, along with the binary Snow Cover Extent (SCE) product from S2 for a classification RF in the second step.
The second approach (A2) uses the MODIS FSC as input for a regression RF in the first step, and the Normalized Difference Snow Index (NDSI) from S2 for the second step.
The third approach (A3) uses the raw NDSI maps from both MODIS and S2 and a regression RF in both the first and second steps, respectively.

The training dataset for the second step RF was built through a stratified random sampling approach, with a sample size of 2.5% of each available scene with a cloud cover lower than 10%, retaining only simultaneously valid pixels both in S2 and MODIS.

The stratified random sampling for the construction of the training dataset is designed to uniformly select within each topographic variable considered. In other words, the 5% of each scene (with less than 10% cloud cover) is randomly selected within each combination of elevation, aspect and slope classes, so as to be evenly represented. Elevation classes are defined by every 100 m of altitude. The surface faces (north, north-east, east, south-east, south, south-west, west, north-west, and flat) identify the aspect classes. Finally, slope classes (Table 2) are delineated by increasing degrees of steepness, based on the relationship between it and snow cover, as found by other studies [71]. Before each random process, i.e., random sampling and random forest training and classification/prediction steps, seeds were set, equal for each corresponding step of the three approaches, to ensure replicability.

In A1, the RF outputs in step 2 are the daily cloud-free SCE maps at 20 m resolution (Figure 3). In A2 and A3, instead, step 2 results in daily cloud-free NDSI maps at 20 m resolution, thus, further steps are necessary to obtain the final SCE maps; i.e., retrieve a water layer to mask out water bodies and extract the snow cover from the NDSI map.

The water mask is automatically extracted from the Normalized Difference Water Index (NDWI) [72] map derived from the first near-cloud and snow-free S2 image of the ingested collection, using a fixed threshold of 0.3 (i.e., NDWI > 0.3 is considered water and < 0.3 not water). Finally, the SCE map is extracted, using a restrictive NDSI threshold (NDSI > 0.4) in the first instance to extract the snowline elevation (i.e., the lower altitudinal snow limit) and then applying a less restrictive (NDSI > 0.2) one in the second, excluding areas below this threshold and water areas. The first NDSI threshold is widely accepted in optical satellite snow detection [73,74], while the second was established through a trial-and-error approach. The final output is, therefore, the binary snow cover map for all three approaches (Figure 4).

Random Forest

Random forest (RF) consists of an ensemble learning method for classification and regression that operates by constructing a multitude of decision trees at training time [75]. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the average prediction of the individual trees is returned. In the case of regression forests, the prediction values always lie within the range of observed values.

In this study, we used the random forest models growth through the ranger package [76], which has the advantage of running in parallel individual trees in C++ exploiting different cores; widely used in the processing of large data sets, such as satellite imagery [58,77,78].

The hyperparameters considered for random forest tuning were:

ntree: Number of trees grown by the forest;
mtry: Number of variables randomly sampled as candidates at each split;
node size: Minimum number of observations in a terminal node.

Although default values can be used [79], tuning was sequentially conducted for each hyperparameter considered, keeping the previous one found to be the best constant in the next tests, in order to have more control over the RF process, trading between performances and computational time. First, the ntree hyperparameter was tuned, setting mtry and node size with the default values, i.e., the squared root of the number of the predictive variables for classification and one-third of the number of predicted variables for regression, and 1 for classification and 5 for regression, respectively, and finding the optimal value. Next, keeping node size with the default value and ntree with the optimal value previously found, mtry was tuned. Finally, setting the optimal values of ntree and mtry, the tuning of node size was performed.

The experiment was run on a Windows machine with 128 GB of RAM and an Intel i9-11900F, 2.50 GHz CPU with 16 cores.

2.4. Validation

The in situ data collected by the weather stations allow for a complete time comparison, verifying the accuracy of the entire time series day by day. As they represent individual points, they do not allow for verifying the spatial accuracy of the entire image. On the contrary, the comparison with S2 images allows for validating the entire image, but, being sporadic data, they leave uncertainties on the reconstructed time series. Both approaches were adopted for validation, obtaining both a spatial and temporal perspective of accuracy.

2.4.1. Comparison with Real Sentinel-2 SCE Maps

Spatial validity was carried out by comparing the predicted SCE maps with an independent dataset of high-quality SCE maps from S2 with low cloud cover, i.e., between 10 and 15%, which were excluded from RF training and which achieved an overall accuracy of 96% [67]. A total of 44 images met these requirements and were, thus, used for the validation. Cloudy pixels and other gaps in the real S2 scenes, such as no data due to partial acquisitions, were excluded from the computation. The accuracy of the products was investigated both overall and in the different seasons, as well as throughout the different topographies.

2.4.2. Comparison with In Situ Measurements from Weather Stations

Temporal validity was performed by comparing all predicted scenes against daily Snow Depth (SD) data collected by weather stations (Table 3) present in the study area (Figure 1). These weather stations are owned and managed by the Piedmont Regional Agency for Environmental Protection (ARPA Piemonte). Meteorological stations data are accessible through a dedicated portal (available online at https://www.arpa.piemonte.it/rischinaturali/accesso-ai-dati/annali_meteoidrologici/annali-meteo-idro/banca-dati-meteorologica.html, accessed on 12 April 2022).

The daily SD data were collected for the time window of this study (i.e., from 30 July 2015 to 31 December 2021). The day is considered with snow for SD ≥ 1 cm and as snow-free otherwise, as below this threshold it is unlikely to detect snow through remote sensing [80,81,82,83].

In addition to validating the prediction accuracy, the snow season parameters were also validated by comparing the First Snow Day (FSD), i.e., the first day of stable seasonal snow (i.e., lasting more than 15 consecutive days) [84] per pixel, expressed in DOY, the Last Snow Day (LSD), last day of stable season snow per pixel, again, expressed in DOY, and the Snow Cover Duration (SCD), i.e., the number of days of snow cover on the ground in the snow year per pixel, expressed in number of days, computed from 1 September to 31 August of the following calendar year [85], with the same parameters extracted from the weather stations.

2.4.3. Evaluation Metrics

Both comparisons were performed by building a confusion matrix between the predicted dataset (i.e., the synthetic snow cover scenes) and the reference datasets (i.e., the real S2 SCE maps in the first case and SD in the second).

To assess the workflow performance, the following metrics were adopted: the F1-score, Producer and User Accuracy for both classes, and the overall metrics Overall Accuracy (OA) and Macro-F1 score [86,87,88]. The Macro-F1 score, the arithmetic mean of all the per-class F1-scores, was considered because of its capability to manage imbalanced datasets.

The accuracy of the snow seasonal parameters SCD, FSD, and LSD was calculated through the metrics Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Bias Error (MBE), defined as follows:

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(p_{i} - o_{i})}^{2}}

(1)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | p_{i} - o_{i} |

(2)

MBE = \frac{1}{n} \sum_{i = 1}^{n} (p_{i} - o_{i})

(3)

where p_i is predicted values, o_i is observed values, and i is the total number of observations. The RMSE is a quadratic scoring rule which measures the average magnitude of the error. The MAE gives the average of absolute difference between model prediction and target value, while the MBE measures the average magnitude of the errors. In addition, McNemar’s chi-squared (χ²) test was used to evaluate the statistical significance between the performances of the three approaches [89].

3. Results

3.1. Comparison with Real Sentinel-2 SCE Maps

Spatial validation of the predicted images was achieved by comparison with S2 images. All of the images with cloud cover between 10% and 15% were selected, which were excluded from the RF training but still characterized by high accuracy (>96%). A total of 44 scenes, spread over the various months between 2015 and 2021, met these requirements and were used for the comparison, the results of which are shown in Figure 5. Producer and User Accuracies of the two classes are shown in the Supplementary Materials.

In general, the A1 approach reaches the highest overall accuracy (A1 = 93.6%), closely followed by the other two approaches (A2 and A3 = 92.6%). Even considering the performance in individual seasons, there is no outstanding approach compared to others. Snow and No-Snow classes have an opposite behavior: the critical seasons for one class are those in which the other achieves better metrics, and vice versa. The summer season (JJA = June-July-August) is the one which presents the lowest performances for Snow (range values for A1, A2, and A3: PA = 0.81–0.84, UA = 0.89, F1 = 0.85–0.86), while the No-Snow class presents very high values (range values: PA = 0.98, UA = 0.96–0.97, F1 = 0.97). The contrary happens during the winter season (DJF = December-January-February), during which the Snow class has better performances (range values: PA = 0.95–0.96, UA = 0.94–0.95, F1 = 0.95) compared to the No-Snow class (range values: PA = 0.82–0.86, UA = 0.87–0.88, F1 = 0.84–0.87). Unexpectedly, an intermediate behavior occurs in spring (MAM = March-April-May) and autumn (SON = September-October-November), which correspond, respectively, to the snow melting and accumulation phase, in which instable snow behavior is expected, and, thus, is harder to predict. During spring, the Snow keeps high metric values (range values: PA = 0.93–0.95, UA = 0.95, F1 = 0.94–0.95), slightly higher than those for No-Snow (range values: PA = 0.91–0.93, UA = 0.89–0.91, F1 = 0.91). Finally, in fall, the Snow presents lower performances (range values: PA = 0.86–0.88, UA = 0.89–0.91, F1 = 0.88–0.90) than No-Snow (range values: PA = 0.95–0.96, UA = 0.93–0.94, F1 = 0.94–0.95). The results show that the trend of metrics by class goes hand in hand with their representativeness: in winter and spring, the snow class is the most extensive and reaches higher values, while the Non-Snow class, limited to a few pixels, has lower values. In contrast, in summer and autumn, when the situation is reversed and snow-covered territory is limited, the opposite occurs.

Figure 6 shows the algorithm’s performances with respect to different elevations, slopes, and aspects. A2 and A3 perform about the same in all three cases, while A1 shows a slightly different behavior, in particular considering slopes below 10 degrees and the Flat aspect. In all cases, A1 reaches higher performances compared to the other two approaches.

The Macro-F1 score reaches the maximum value around 2500 m a.s.l., where both classes are more equally represented, while, at the extremes, where one or the other class is practically never present, the metric shows low performance. This occurs because the training dataset is skewed toward one of the two classes and, therefore, overestimates them, rather than correctly identifying the minority class when present. Regarding slope areas with low slopes, i.e., less than 5 degrees, they are mainly distributed in the valley bottoms, i.e., at lower elevations, just as higher slopes correspond to higher elevations. This explains the similar trend among Macro-F1 metrics with respect to elevation and slope. Finally, the different exposure does not affect, in all three cases, the performance of the algorithm, which is high for all exposures, except the Flat one. This is because, in both A2 and A3, the watersheds are masked with a fixed layer during the transition from the NDSI map to the binary snow map, while, in A1, the determination of snow cover is directly determined.

Slope and aspect maps of the study area are available in the Supplementary Materials.

Figure 7 shows the prediction accuracy of the predicted snowline (SWL) elevation against the independent dataset of real S2 scenes, with RMSE, MAE, and MBE reported in Table 4. The SWL altitude is computed as the lower altitude at which the snow covers above 50% of the altitudinal band. Being the snow map of 20 m resolution, each altitudinal band investigated is of 20 m in elevation. The SWL pixels are the edges of the snow patches retrieved (Figure 8).

In 25% of times (11 times out of 44 observations), the snowline is predicted with an elevation error of 0 m. However, the results show that there are outliers in the time series, up to an estimation error of 680 m. This suggests that the overall time series is reliable, but the single predicted scenes are not always, and it would be useful to integrate a prediction check based on contiguous days into the algorithm to eliminate these outliers.

3.2. Comparison with In Situ Measurements from Weather Stations

The temporal validation of the reconstructed series was achieved by comparison with Snow Depth data measured by weather stations, the results of which are shown in Figure 9. Producer and User Accuracies of the two classes are shown in the Supplementary material.

All three approaches have almost equal behavior at all stations, except for Locana Rosone and Rosone, both of which are in the valley bottom, at 700 m altitude, about 240 m apart, where the performances of the Snow class are low, especially for A1. This issue may stem from the fact that the valley floors are heavily shaded, so the training dataset, as well as the validation against the S2 themselves, has a low capacity in correctly detecting snow.

The overall accuracy is almost the same for the three approaches: always above 90% up to about 95%, in all the weather stations analyzed, except for the Rosone station, where A1 shows a slightly worst performance. On closer analysis, the Snow class metrics reveal that the behavior between A1 and the other two approaches diverges in the Locana Rosone and Rosone stations, which are characterized by proximity to each other and lower altitude (both at about 700 m a.s.l.). The other stations, on the contrary, are all above 1500 m a.s.l. Clearly, the location of the weather stations equipped with snow gauges is linked to the presence of snow, so they are typically located in open areas at high altitudes, and, thus, the spatial representativeness of the data obtained for analysis must be considered. Snow’s PA is always around 90%, except for the two aforementioned weather stations, where A1 reaches higher values than A2 and A3. Considering Snow’s UA, A1 shows the worst performances, with values less than 20%, while A2 and A3 values only decrease to 50–60%. The F1-score confirms the worst behavior of A1 compared to A2 and A3 for Locana Rosone and Rosone stations, while for all the other stations the performances are virtually equal. A1, therefore, has a higher Producer Accuracy of the Snow class than A2 and A3, but, at the same time, of the times that it indicates the presence of snow, this is true at a lower rate than A2 and A3 (lower User Accuracy). In other words, at the Locana Rosone and Rosone stations, all three overestimate the snow class; A1 to a greater extent than A2 and A3. The metrics of the No-Snow class are also essentially the same for the three approaches at all stations.

Finally, the snow phenology parameters were tested against weather stations (Table 5). The FSD is estimated with an RMSE of 17.2 days. In absolute values, the prediction is wrong by 7.9 days, while on average the error is 2.5 days. The LSD is predicted with an RMSE of 8.7 days, while the days of difference between the predicted and the observed is 5.5 in absolute value, or, on average, 2 days. Finally, the SCD shows an RMSE of 18.9 days, and a difference of 11 days in absolute value and 0.5 days on average. These values are total for all stations.

As pointed out by other authors [90,91], this validation approach based on the comparison of two spatially different types of data (bi-dimensional 20 × 20 m pixels on the one hand and uni-dimensional point data on the other) could lead to errors due to inaccuracies in satellite images georeferencing and co-registration, and, in addition, point data may not be representative of the pixel. However, weather stations represent useful temporally continuous ground truth data and, moreover, results show a good predictive ability of seasonal snow parameters.

3.3. McNemar Test

McNemar test results show (Table 6) that the three approaches’ performances are statistically different, in particular A1 against A2 and A3, while, in comparison, the significance of the difference between A2 and A3 is much lower. This difference could be explained by the different random forest in the second step of the algorithm: while A1 directly produces the binary snow cover map, A2 and A3 give a NDSI map from which to obtain the final binary product. Different from the NDSI computed from the real S2 scene, the predicted NDSI does not show a clear discontinuity in values between snow-covered and non-snow-covered areas, but shows a gradual transition from higher to lower altitudes. This could lead to the observed difference.

3.4. Random Forests

In total, four random forests were tuned: for step 1, the RF regression with MODIS FSC (used in A1 and A2) and the regression with MODIS NDSI (used in A3), and for step 2, the RF classification with SCE S2 (in A1) and the RF regression with NDSI S2 (in A2 and A3). The tuning results of the other RFs are given in the Supplementary Materials.

The two random forests in the workflow were tuned to extensively explore the predictive behavior of RF. The results reveal that this step is superfluous, and no significant improvement in performance is obtained when varying the hyperparameters considered, as highlighted by other studies [92]. The RF of the first step, which was used to complete the time series of the MODIS snow product, was fitted with a training sample of 25% of each MODIS scene available for the study area in the time frame under consideration, equal to 1.9 × 10⁴ pixels. No other training sizes were tested. First, the optimal number of trees to grow, between 100 and 1000, was investigated as a function of the explained variance (R² score) and the computational time. Based on the RF tuning results, we used an RF with 500 trees, mtry equal to 7, and a minimum node size of 3. In the second phase, tuning tests of the ntree hyperparameter were carried out with 100, 200, and 300 trees, the results of which showed that, with larger forests, the calculation time considerably increases with almost no improvement. An RF of 100 trees was then grown, with mtry equal to 3 and a minimum node size of 2. The RF tuning results are showed in the Supplementary Materials.

In terms of computational resources (Table 7), A1 is the most efficient, taking an average of 1.5 min per scene, while A2 and A3 reconstruct each scene in an average of 2 up to 3 min. Furthermore, the random forest training in the second step is faster in A1 compared to the other two approaches, considering a training dataset of 2.3 × 10⁷ observations and eight variables; above all, when reconstructing daily images, A1 requires 20 GB RAM to process, while A2 and A3 occupy more than 110 GB, which is not available on most machines.

Variable Importance

In the ranger package, is it possible to compute the ’impurity_corrected’ measure Gini index for classification and the variance reduction of the responses for regression [93]. The Gini index is a splitting measure that determines the information gained at each split node. Specifically, at each split, the Gini, or impurity index, measures the variable ability to separate observations into two distinct classes, lowering entropy. The average of all Gini scores obtained in the forest for variable yields the Gini variable importance measure [94]. The variance reduction is a method of calculating the importance of features on out-of-bag (OOB) samples based on the average decrease in accuracy [95]. In both cases, the higher the score, the better the predictive ability of the variable.

Figure 10 shows the variable importance of the RFs used in the first (a) and second steps (b). The most important features in the RF of step 1 are the day of the year, elevation, and year, followed by latitude (Y) and longitude (X), and, lastly, aspect and slope. Therefore, the determining factors are season and elevation. In the second RF, the most important features are the output of the first RF, i.e., MODIS FSC, elevation, and the day of the year, in accordance with the results obtained by Rittger et al. [58].

4. Discussion

In this study, three approaches were compared, differing in the type of input data used, based on the same two-stage workflow: the first aimed at filling the gaps of the MODIS snow time series and the second at performing data fusion between MODIS and S2. In both steps, random forests were grown using the satellite MOD10A1 and S2 snow map datasets as training, respectively. Therefore, it must be noted that the training dataset is not entirely free of errors and uncertainties, e.g., due to forests and shaded areas, which have not been explored in this study.

The proposed strategy was applied in one region, extending to about 2066 km², of the European Alps, and included 7 years of data. As the entire workflow is fully automated and requires only readily available satellite data and a DEM, it allows for applications to other study areas and for several years.

This comparison shows that all three approaches are valid, as supported by the high accuracy metrics values, and allow the reconstruction of a daily time series of snow cover at high resolutions, using various types of data: FSC, SCE, and NDSI maps. Overall, we can say that, among them, the A1 approach, which takes as input the FSC MOD10A1 dataset in the first step, and the SCE product in the second, offers the best opportunity, both in terms of accuracy and computational resources. Compared to the other two approaches, A1 is faster, since it does not require a further step to extract the binary snow map from the NDSI map with an optimized threshold and the masking of water bodies, and, moreover, the classification RF is inherently faster and lighter compared to the regression RF performed in step 2 by A2 and A3. When looking at the individual daily maps retrieved, outliers are present, but the entire time series is robust, as confirmed by the results. A future improvement of the algorithm, however, could introduce a probability score or post-processing smoothing based on contiguous days to handle these outliers.

To keep the algorithm easily reproducible in other areas, readily available variables were used: orographic features retrieved from a high-resolution DEM, S2 and MODIS data, which are globally and freely available. On the other hand, variables used in other studies, such as Land Use/Land Cover (LULC), distance to water bodies, and variables related to snow transport, such as wind fields and distance to topographic barriers, as in Rittger et al. [58], were discarded. This choice was guided by the fact that this work aims at reconstructing long time series of snow; thus, this would require high-resolution time series maps of LULC to be consistent, which are difficult to obtain. Moreover, the LULC also suffers from unpredictable errors, especially if moving backwards in time, so it was deemed appropriate not to use it. Other studies used the LULC as a variable for retrieving snow cover maps [96,97], but, in those cases, the data resolution and availability were consistent with the analysis scale. At the European level, CORINE Land Cover cartography at 30 m of resolution is available, which, however, is updated every 6 years. In general, the availability of LULC time series at high resolution represents a data gap at a global level. Despite this, the overall accuracy achieved by Rittger et al. [58] is slightly higher compared to our results (from + 1 up to + 5% based on the different seasons), while the F1-score is practically the same. Wind speed is an important parameter for the simulation of fine-scale snow distribution [98,99,100]; nevertheless, in Rittger et al. [58], this variable has a low importance compared to the others. This is probably due to the resolution of the data used by the authors not being consistent with the scale of analysis. This, on the one hand, shows that sufficiently realistic snow cover data (OA > 90%) can be obtained even without this specific input data, and, on the other hand, highlights a gap in the availability of high-resolution atmospheric data, which would be critical for improving the predictive ability of fine-scale models. In addition, snow transport by wind is affected by snow characteristics at the surface, such as the grain size, depending on various factors, which include melting/freezing cycles and snow age, that are scarcely available. This type of analysis is, however, probably too fine for the decametric scale, whereas it would be relevant for the sub-metric scale or snow depth analysis.

Another aspect of using fewer variables is the decrease in the computational resources required to run the algorithm. Although the machine used for the experiment had considerable resources (128 GB RAM), these were barely sufficient to run the RF of step 2 with a sample size of 5% of each training scene used in a first attempt, corresponding to a training dataset of 4.5 × 10⁷ observations. The computational time required to reconstruct each image with this RF was around 45 min, making the process untenable for long time series. To overcome this problem, the sample size was halved and a stratified random sampling approach was adopted, which proved effective in that, with half the training data, the final accuracy improved slightly and, most importantly, the computational performance of the algorithm greatly benefited. Luan et al. [101], to handle the same issue, proposed an approach based on a temporally dynamic RF training, which dissected the entire time period into smaller time windows of m-days based on the availability of Landsat images, then merged with MODIS. However, this approach misses the opportunity to exploit the strong interannual consistency of snow to predict long time series that may have few data available in the past and could, thus, benefit from the higher density of acquisitions in more recent years, not to mention the fact that this approach is not suitable for areas that suffer from cloud persistence.

5. Conclusions

Snow is an important driver of both biotic and non-biotic processes in mountain areas, both in terms of water intake and for its role of thermal insulation for low vegetation. The availability of high-resolution daily snow maps is critical to monitoring its dynamics.

In this context, this study presents a random-forest-based data fusion workflow to obtain high-resolution daily images using S2 and MODIS data, built on predictive variables: (1) orographic (elevation, aspect and slope), (2) topographic (latitude and longitude), and (3) temporal (day of the year, DOY, and year). The RF in the first step carries out the actual prediction, while the second RF performs a downscaling. The results show that the proposed workflow obtains an accurate gap-free time series of daily snow cover maps at 20 m. We compared three different approaches, differing in the type of data selected as input: in the case of MODIS, the FSC or, alternatively, the NDSI, and in the case of S2, the SCE or, again, the NDSI. The first approach achieves the best accuracy, slightly higher than the other two, and outperforms the latter in terms of computational performances. The workflow requires only three types of data as input: MODIS and S2 snow data and a DEM at a resolution consistent with the latter, from which topographic predictor variables are retrieved. The most important predictor variables are the day of the year, elevation, and year in the first RF and MODIS, elevation and the day of the year in the final RF. The necessary input data are readily and globally available, making the workflow potentially straightforwardly reproducible in other study areas obtaining snow data critical for various types of fine-scale studies. The generalizability of the workflow, with respect to the study site and time period analyzed, will be investigated in the future. The tuning results show that this step is superfluous, and it is possible to directly use the default values to fit the RFs. The presence of snow in forest areas was not investigated in this study and remains a challenging issue, partly due to the lack of ground truth data in these areas.

While the proposed method overcomes the limitations of detecting snow in cloudy conditions and the trade-off between high spatial and temporal resolution, it does not allow the study of time series prior to the start of the MODIS mission. A future improvement could investigate the integration of climate models, AVHRR data, or a combination of both, such as the dataset recently developed by Rößler and Dietz [102], to overcome this limitation. Other future developments include the integration of a snow redistribution model by wind transport and avalanches [98,103], and of Landsat images into the workflow to thicken the S2 collection and, most importantly, to extend the time series from 2000 to the present day [3]. This time series, of about 20 years, will allow the evolution of fine-scale accumulation and melting of snow to be explored, crucial to understanding the ecological dynamics of the study area.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs15020343/s1, Figure S1: Aspect map of the study area; Figure S2: Slope map of the study area; Figure S3: Producer Accuracy (PA) and User Accuracy (UA) for the classes Snow and No-Snow validated against independent dataset of Sentinel-2; Figure S4: Producer Accuracy (PA) and User Accuracy (UA) for the classes Snow and No-Snow validated against in situ snow depth data; Figure S5: Tuning of the hyperparameters of the regression random forest fitted in step 1 of Approaches 1 and 2. (a) Number of trees, (b) relative computing time, (c) mtry and (d) node size in function of R² score; Figure S6: Tuning of the hyperparameters of the regression random forest fitted in step 2 of Approach 2 and 3, (a) mtry and (b) node size in function of R² score; Figure S7: Tuning of the hyperparameters of the regression random forest fitted in step 1 of Approach 3. (a) Number of trees, (b) relative computing time, (c) mtry and (d) node size in function of R² score; Figure S8: Tuning of the hyperparameters of the classification random forest fitted in step 2 of Approach 1. (a) Number of trees, (b) relative computing time, (c) mtry and (d) node size in function of Out-Of-Bag (OOB) prediction error score.

Author Contributions

C.R.: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Writing—Original Draft, Writing—Review and Editing, Visualization; C.S.: Writing—Review and Editing, Visualization, Supervision; M.A.: Conceptualization, Methodology, Writing—Review and Editing, Visualization, Supervision, Project administration, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the H2020 E-SHAPE project—EuroGEO Showcases: Applications Powered by Europe (www.e-shape.eu (accessed on 23 June 2022))—myEcosystem showcase, Grant Agreement: 820852.

Data Availability Statement

The workflow source code is available on a public repository in GitHub (https://github.com/chiararik/SCEgapfilling, accessed on 1 November 2022) [104].

Acknowledgments

The authors would like to express their gratitude to the Piedmont Regional Agency for Environmental Protection (ARPA Piemonte) for the ground truth data collected by the weather stations. This work also benefited from the support of the ABRESO project, for which we thank the exceptional team.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Zhang, T. Influence of the seasonal snow cover on the ground thermal regime: An overview. Rev. Geophys. 2005, 43, 4002. [Google Scholar] [CrossRef]
Zhu, J.; Wu, Q.; Wu, F.; Yue, K.; Ni, X. Decline in carbon decomposition from litter after snow removal is driven by a delayed release of carbohydrates. Plant Soil 2022, 7, 1–13. [Google Scholar] [CrossRef]
Bormann, K.J.; Brown, R.D.; Derksen, C.; Painter, T.H. Estimating snow-cover trends from space. Nat. Clim. Chang. 2018, 8, 924–928. [Google Scholar] [CrossRef]
Rixen, C.; Høye, T.T.; Macek, P.; Aerts, R.; Alatalo, J.M.; Anderson, J.T.; Arnold, P.A.; Barrio, I.C.; Bjerke, J.W.; Björkman, M.P.; et al. Winters are changing: Snow effects on Arctic and alpine tundra ecosystems. Arct. Sci. 2022, 8, 572–608. [Google Scholar] [CrossRef]
Rumpf, S.B.; Gravey, M.; Brönnimann, O.; Luoto, M.; Cianfrani, C.; Mariethoz, G.; Guisan, A. From white to green: Snow cover loss and increased vegetation productivity in the European Alps. Science 2022, 376, 1119–1122. [Google Scholar] [CrossRef] [PubMed]
Yang, Y.; Wu, X.-J.; Liu, S.-W.; Xiao, C.-D.; Wang, X. Valuating service loss of snow cover in Irtysh River Basin. Adv. Clim. Chang. Res. 2019, 10, 109–114. [Google Scholar] [CrossRef]
Wu, X.; Wang, X.; Liu, S.; Yang, Y.; Xu, G.; Xu, Y.; Jiang, T.; Xiao, C. Snow cover loss compounding the future economic vulnerability of western China. Sci. Total. Environ. 2020, 755, 143025. [Google Scholar] [CrossRef] [PubMed]
Ma, L.; Qin, D.; Bian, L.; Xiao, C.; Luo, Y. Assessment of Snow Cover Vulnerability over the Qinghai-Tibetan Plateau. Adv. Clim. Chang. Res. 2011, 2, 93–100. [Google Scholar] [CrossRef]
Wang, S.; Yang, B.; Zhou, Y.; Wang, F.; Zhang, R.; Zhao, Q. Snow Cover Mapping and Ice Avalanche Monitoring from the Satellite Data of the Sentinels. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci.—ISPRS Arch. 2018, 42, 1765–1772. [Google Scholar] [CrossRef] [Green Version]
Noges, T.; Eckman, R.; Kangur, K.; Noges, P.; Reinart, A.; Roll, G.; Simola, H.; Viljanen, M. European Large Lakes—Ecosystem Changes and Their Ecological and Socioeconomic Impacts; Developments in Hydrobiology 199; Springer: Berlin/Heidelberg, Germany, 2008; ISBN 9781402083785. [Google Scholar]
Pehme, K.-M.; Burlakovs, J.; Kriipsalu, M.; Pilecka, J.; Grinfelde, I.; Tamm, T.; Jani, Y.; Hogland, W. Urban hydrology research fundamentals for waste management practices. Res. Rural Dev. 2019, 1, 160–167. [Google Scholar] [CrossRef]
Dietz, A.J.; Wohner, C.; Kuenzer, C. European Snow Cover Characteristics between 2000 and 2011 Derived from Improved MODIS Daily Snow Cover Products. Remote Sens. 2012, 4, 2432. [Google Scholar] [CrossRef] [Green Version]
Bi, Y.; Xie, H.; Huang, C.; Ke, C. Snow Cover Variations and Controlling Factors at Upper Heihe River Basin, Northwestern China. Remote. Sens. 2015, 7, 6741. [Google Scholar] [CrossRef] [Green Version]
Ke, C.-Q.; Li, X.-C.; Xie, H.; Ma, D.-H.; Liu, X.; Kou, C. Variability in snow cover phenology in China from 1952 to 2010. Hydrol. Earth Syst. Sci. 2016, 20, 755–770. [Google Scholar] [CrossRef] [Green Version]
Li, C.; Su, F.; Yang, D.; Tong, K.; Meng, F.; Kan, B. Spatiotemporal variation of snow cover over the Tibetan Plateau based on MODIS snow product, 2001–2014. Int. J. Climatol. 2018, 38, 708–728. [Google Scholar] [CrossRef]
Choubin, B.; Alamdarloo, E.H.; Mosavi, A.; Hosseini, F.S.; Ahmad, S.; Goodarzi, M.; Shamshirband, S. Spatiotemporal dynamics assessment of snow cover to infer snowline elevation mobility in the mountainous regions. Cold Reg. Sci. Technol. 2019, 167, 102870. [Google Scholar] [CrossRef]
Notarnicola, C. Hotspots of snow cover changes in global mountain regions over 2000–2018. Remote Sens. Environ. 2020, 243, 111781. [Google Scholar] [CrossRef]
Vorkauf, M.; Marty, C.; Kahmen, A.; Hiltbrunner, E. Past and future snowmelt trends in the Swiss Alps: The role of temperature and snowpack. Clim. Chang. 2021, 165, 44. [Google Scholar] [CrossRef]
Liu, J.; Li, Y.; Yu, J.; Yao, Y. Dynamic characteristics of snow frequency and its relationship with climate change on the Tibetan plateau from 2001 to 2015. Earth Sci. Inform. 2022, 15, 1233–1247. [Google Scholar] [CrossRef]
Tang, Z.; Deng, G.; Hu, G.; Zhang, H.; Pan, H.; Sang, G. Satellite observed spatiotemporal variability of snow cover and snow phenology over high mountain Asia from 2002 to 2021. J. Hydrol. 2022, 613, 2629–2645. [Google Scholar] [CrossRef]
Awasthi, S.; Varade, D. Recent advances in the remote sensing of alpine snow: A review. GIScience Remote Sens. 2021, 58, 852–888. [Google Scholar] [CrossRef]
Dozier, J.; Painter, T.H. Multispectral and Hyperspectral Remote Sensing of Alpine Snow Properties. Annu. Rev. Earth Planet. Sci. 2004, 32, 465–494. [Google Scholar] [CrossRef] [Green Version]
Dietz, A.J.; Kuenzer, C.; Gessner, U.; Dech, S. Remote sensing of snow—A review of available methods. Int. J. Remote Sens. 2011, 33, 4094–4134. [Google Scholar] [CrossRef]
Li, X.; Jing, Y.; Shen, H.; Zhang, L. The recent developments in cloud removal approaches of MODIS snow cover product. Hydrol. Earth Syst. Sci. 2019, 23, 2401–2416. [Google Scholar] [CrossRef] [Green Version]
Kelly, R.E.J.; Chang, A.T.C.; Foster, J.L.; Hai, D.K. Development of a passive microwave global snow monitoring algorithm for the Advanced Microwave Scanning Radiometer-EOS. In Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS), Sydney, Ausralia, 9–13 July 2001; Volume 2, pp. 804–806. [Google Scholar] [CrossRef]
Dong, J.; Walker, J.; Houser, P.; Sun, C. Scanning multichannel microwave radiometer snow water equivalent assimilation. J. Geophys. Res. Earth Surf. 2007, 112, D7. [Google Scholar] [CrossRef] [Green Version]
Li, L.; Chen, H.; Guan, L. Retrieval of Snow Depth on Arctic Sea Ice from the FY3B/MWRI. Remote Sens. 2021, 13, 1457. [Google Scholar] [CrossRef]
Forman, B.A.; Xue, Y. Machine learning predictions of passive microwave brightness temperature over snow-covered land using the special sensor microwave imager (SSM/I). Phys. Geogr. 2016, 38, 176–196. [Google Scholar] [CrossRef]
Salomonson, V.V.; Barnes, W.; Xiong, J.; Kempler, S.; Masuoka, E. An overview of the Earth Observing System MODIS instrument and associated data systems performance. In Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS), Waikoloa, HI, USA, 26 September–2 October 2003; Volume 2, pp. 1174–1176. [Google Scholar] [CrossRef]
Wu, X.; Naegeli, K.; Premier, V.; Marin, C.; Ma, D.; Wang, J.; Wunderle, S. Evaluation of snow extent time series derived from Advanced Very High Resolution Radiometer global area coverage data (1982–2018) in the Hindu Kush Himalayas. Cryosphere 2021, 15, 4261–4279. [Google Scholar] [CrossRef]
Crawford, C.J.; Manson, S.M.; Bauer, M.E.; Hall, D.K. Multitemporal snow cover mapping in mountainous terrain for Landsat climate data record development. Remote Sens. Environ. 2013, 135, 224–233. [Google Scholar] [CrossRef]
Gascoin, S.; Grizonnet, M.; Bouchet, M.; Salgues, G.; Hagolle, O. Theia Snow collection: High-resolution operational snow cover maps from Sentinel-2 and Landsat-8 data. Earth Syst. Sci. Data 2019, 11, 493–514. [Google Scholar] [CrossRef] [Green Version]
Warren, S.G. Optical properties of ice and snow. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2019, 377, 20180161. [Google Scholar] [CrossRef]
Dozier, J. Spectral signature of alpine snow cover from the landsat thematic mapper. Remote Sens. Environ. 1989, 28, 9–22. [Google Scholar] [CrossRef]
Paloscia, S.; Pettinato, S.; Santi, E.; Valt, M. COSMO-SkyMed Image Investigation of Snow Features in Alpine Environment. Sensors 2017, 17, 84. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Rizzoli, P.; Martone, M.; Rott, H.; Moreira, A. Characterization of Snow Facies on the Greenland Ice Sheet Observed by TanDEM-X Interferometric SAR Data. Remote Sens. 2017, 9, 315. [Google Scholar] [CrossRef] [Green Version]
Falk, U.; Gieseke, H.; Kotzur, F.; Braun, M. Monitoring snow and ice surfaces on King George Island, Antarctic Peninsula, with high-resolution TerraSAR-X time series. Antarct. Sci. 2015, 28, 135–149. [Google Scholar] [CrossRef]
Tsai, Y.-L.S.; Dietz, A.; Oppelt, N.; Kuenzer, C. Wet and Dry Snow Detection Using Sentinel-1 SAR Data for Mountainous Areas with a Machine Learning Technique. Remote Sens. 2019, 11, 895. [Google Scholar] [CrossRef] [Green Version]
Karbou, F.; Veyssière, G.; Coleou, C.; Dufour, A.; Gouttevin, I.; Durand, P.; Gascoin, S.; Grizonnet, M. Monitoring Wet Snow Over an Alpine Region Using Sentinel-1 Observations. Remote Sens. 2021, 13, 381. [Google Scholar] [CrossRef]
Lievens, H.; Brangers, I.; Marshall, H.-P.; Jonas, T.; Olefs, M.; De Lannoy, G. Sentinel-1 snow depth retrieval at sub-kilometer resolution over the European Alps. Cryosphere 2022, 16, 159–177. [Google Scholar] [CrossRef]
Koskinen, J.T.; Pulliainen, J.T.; Hallikainen, M.T. The use of ERS-1 SAR data in snow melt monitoring. IEEE Trans. Geosci. Remote Sens. 1997, 35, 601–610. [Google Scholar] [CrossRef]
Rao, Y.S.; Venkataraman, G.; Singh, G. ENVISAT-ASAR data analysis for snow cover mapping over Gangotri region. In Microwave Remote Sensing of the Atmosphere and Environment V; SPIE: Bellingham, WA, USA, 2006; Volume 6410, p. 641007. [Google Scholar]
Muhuri, A.; Manickam, S.; Bhattacharya, A. Scattering Mechanism Based Snow Cover Mapping Using RADARSAT-2 C-Band Polarimetric SAR Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3213–3224. [Google Scholar] [CrossRef]
Venkataraman, G.; Singh, G.; Yamaguchi, Y. Fully polarimetric ALOS PALSAR data applications for snow and ice studies. In Proceedings of the International Geoscience and Remote Sensing Symposium, Honolulu, HI, USA, 25–30 July 2010; pp. 1776–1779. [Google Scholar] [CrossRef]
Nagler, T.; Rott, H. Retrieval of wet snow by means of multitemporal SAR data. IEEE Trans. Geosci. Remote Sens. 2000, 38, 754–765. [Google Scholar] [CrossRef]
Tsai, Y.-L.S.; Dietz, A.; Oppelt, N.; Kuenzer, C. Remote Sensing of Snow Cover Using Spaceborne SAR: A Review. Remote Sens. 2019, 11, 1456. [Google Scholar] [CrossRef] [Green Version]
Brown, R.; Brasnett, B. The Canadian Meteorological Centre Global Daily Snow Depth Analysis, 1998-2011: Overview, Experience and Applications. In Proceedings of the 68th Eastern Snow Conference, Montreal, QC, Canada, 14–16 June 2011; pp. 197–200. [Google Scholar]
Metsämäki, S.; Pulliainen, J.; Salminen, M.; Luojus, K.; Wiesmann, A.; Solberg, R.; Böttcher, K.; Hiltunen, M.; Ripper, E. Introduction to GlobSnow Snow Extent products with considerations for accuracy assessment. Remote Sens. Environ. 2015, 156, 96–108. [Google Scholar] [CrossRef]
Ramsay, B.H. The interactive multisensor snow and ice mapping system. Hydrol. Process. 1998, 12, 1537–1546. [Google Scholar] [CrossRef]
Parajka, J.; Pepe, M.; Rampini, A.; Rossi, S.; Blöschl, G. A regional snow-line method for estimating snow cover from MODIS during cloud cover. J. Hydrol. 2010, 381, 203–212. [Google Scholar] [CrossRef]
López-Burgos, V.; Gupta, H.V.; Clark, M. Reducing cloud obscuration of MODIS snow cover area products by combining spatio-temporal techniques with a probability of snow approach. Hydrol. Earth Syst. Sci. 2013, 17, 1809–1823. [Google Scholar] [CrossRef] [Green Version]
Li, M.; Zhu, X.; Li, N.; Pan, Y. Gap-Filling of a MODIS Normalized Difference Snow Index Product Based on the Similar Pixel Selecting Algorithm: A Case Study on the Qinghai–Tibetan Plateau. Remote Sens. 2020, 12, 1077. [Google Scholar] [CrossRef] [Green Version]
Poussin, C.; Guigoz, Y.; Palazzi, E.; Terzago, S.; Chatenoux, B.; Giuliani, G. Snow Cover Evolution in the Gran Paradiso National Park, Italian Alps, Using the Earth Observation Data Cube. Data 2019, 4, 138. [Google Scholar] [CrossRef]
Chen, S.; Wang, X.; Guo, H.; Xie, P.; Sirelkhatim, A.M. Spatial and Temporal Adaptive Gap-Filling Method Producing Daily Cloud-Free NDSI Time Series. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2251–2263. [Google Scholar] [CrossRef]
Chen, S.; Wang, X.; Guo, H.; Xie, P.; Wang, J.; Hao, X. A Conditional Probability Interpolation Method Based on a Space-Time Cube for MODIS Snow Cover Products Gap Filling. Remote Sens. 2020, 12, 3577. [Google Scholar] [CrossRef]
Hou, J.; Huang, C.; Zhang, Y.; Guo, J.; Gu, J. Gap-Filling of MODIS Fractional Snow Cover Products via Non-Local Spatio-Temporal Filtering Based on Machine Learning Techniques. Remote Sens. 2019, 11, 90. [Google Scholar] [CrossRef] [Green Version]
Liu, Y.; Chen, X.; Hao, J.-S.; Li, L.-H. Snow cover estimation from MODIS and Sentinel-1 SAR data using machine learning algorithms in the western part of the Tianshan Mountains. J. Mt. Sci. 2020, 17, 884–897. [Google Scholar] [CrossRef]
Rittger, K.; Krock, M.; Kleiber, W.; Bair, E.H.; Brodzik, M.J.; Stephenson, T.R.; Rajagopalan, B.; Bormann, K.J.; Painter, T.H. Multi-sensor fusion using random forests for daily fractional snow cover at 30 m. Remote Sens. Environ. 2021, 264, 112608. [Google Scholar] [CrossRef]
Revuelto, J.; Alonso-González, E.; Gascoin, S.; Rodríguez-López, G.; López-Moreno, J.I. Spatial Downscaling of MODIS Snow Cover Observations Using Sentinel-2 Snow Products. Remote Sens. 2021, 13, 4513. [Google Scholar] [CrossRef]
Bormann, K.J.; Westra, S.; Evans, J.P.; McCabe, M.F. Spatial and temporal variability in seasonal snow density. J. Hydrol. 2013, 484, 63–73. [Google Scholar] [CrossRef]
Dedieu, J.-P.; Carlson, B.Z.; Bigot, S.; Sirguey, P.; Vionnet, V.; Choler, P. On the Importance of High-Resolution Time Series of Optical Imagery for Quantifying the Effects of Snow Cover Duration on Alpine Plant Habitat. Remote Sens. 2016, 8, 481. [Google Scholar] [CrossRef] [Green Version]
Hall, D.K.; Riggs, G.A. MODIS/Terra Snow Cover Daily L3 Global 500 m SIN Grid, Version 61; NASA National Snow and Ice Data Center Distributed Active Archive Center: Boulder, CO, USA, 2022. [Google Scholar]
Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA's Optical High-Resolution Mission for GMES Operational Services. Remote Sens. Environ. 2012, 120, 25–36. [Google Scholar] [CrossRef]
Stüwe, M.; Nievergelt, B. Recovery of alpine ibex from near extinction: The result of effective protection, captive breeding, and reintroductions. Appl. Anim. Behav. Sci. 1991, 29, 379–387. [Google Scholar] [CrossRef]
Hammond, J.C.; Saavedra, F.A.; Kampf, S.K. Global snow zone maps and trends in snow persistence 2001–2016. Int. J. Clim. 2018, 38, 4369–4383. [Google Scholar] [CrossRef]
Justice, C.O.; Townshend, J.R.G.; Vermote, E.F.; Masuoka, E.; Wolfe, R.E.; Saleous, N.; Roy, D.P.; Morisette, J.T. An overview of MODIS Land data processing and product status. Remote Sens. Environ. 2002, 83, 3–15. [Google Scholar] [CrossRef]
Richiardi, C.; Blonda, P.; Rana, F.; Santoro, M.; Tarantino, C.; Vicario, S.; Adamo, M. A Revised Snow Cover Algorithm to Improve Discrimination between Snow and Clouds: A Case Study in Gran Paradiso National Park. Remote Sens. 2021, 13, 1957. [Google Scholar] [CrossRef]
Qiu, S.; Zhu, Z.; He, B. Fmask 4.0: Improved cloud and cloud shadow detection in Landsats 4–8 and Sentinel-2 imagery. Remote Sens. Environ. 2019, 231, 111205. [Google Scholar] [CrossRef]
Main-Knorn, M.; Pflug, B.; Louis, J.; Debaecker, V.; Müller-Wilm, U.; Gascon, F. Sen2Cor for Sentinel-2. In Image and Signal Processing for Remote Sensing XXIII; SPIE: Bellingham, WA, USA, 2017; Volume 10427, p. 3. [Google Scholar]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
Mott, R.; Schirmer, M.; Bavay, M.; Grünewald, T.; Lehning, M. Understanding snow-transport processes shaping the mountain snow-cover. Cryosphere 2010, 4, 545–559. [Google Scholar] [CrossRef] [Green Version]
Saydi, M.; Ding, J.-L. Impacts of topographic factors on regional snow cover characteristics. Water Sci. Eng. 2020, 13, 171–180. [Google Scholar] [CrossRef]
Gao, B.-C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266. [Google Scholar] [CrossRef]
Yin, D.; Cao, X.; Chen, X.; Shao, Y.; Chen, J. Comparison of automatic thresholding methods for snow-cover mapping using Landsat TM imagery. Int. J. Remote Sens. 2013, 34, 6529–6538. [Google Scholar] [CrossRef]
Hall, D.K.; Riggs, G.A.; Salomonson, V.V. Development of methods for mapping global snow cover using moderate resolution imaging spectroradiometer data. Remote Sens. Environ. 1995, 54, 127–140. [Google Scholar] [CrossRef]
Belgiu, M.; Drăgu, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
Wright, M.N.; Ziegler, A. Ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 2017, 77, 1–17. [Google Scholar] [CrossRef] [Green Version]
Raab, C.; Stroh, H.G.; Tonn, B.; Meißner, M.; Rohwer, N.; Balkenhol, N.; Isselstein, J. Mapping semi-natural grassland communities using multi-temporal RapidEye remote sensing data. Int. J. Remote Sens. 2018, 39, 5638–5659. [Google Scholar] [CrossRef]
Poppiel, R.R.; Lacerda, M.P.C.; Rizzo, R.; Safanelli, J.L.; Bonfatti, B.R.; Silvero, N.E.Q.; Demattê, J.A.M. Soil Color and Mineralogy Mapping Using Proximal and Remote Sensing in Midwest Brazil. Remote Sens. 2020, 12, 1197. [Google Scholar] [CrossRef] [Green Version]
Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and tuning strategies for random forest. WIREs Data Min. Knowl. Discov. 2019, 9, e1301. [Google Scholar] [CrossRef] [Green Version]
Orsolini, Y.; Wegmann, M.; Dutra, E.; Liu, B.; Balsamo, G.; Yang, K.; de Rosnay, P.; Zhu, C.; Wang, W.; Senan, R.; et al. Evaluation of snow depth and snow cover over the Tibetan Plateau in global reanalyses using in situ and satellite remote sensing observations. Cryosphere 2019, 13, 2221–2239. [Google Scholar] [CrossRef] [Green Version]
Parajka, J.; Blöschl, G. Validation of MODIS snow cover images over Austria. Hydrol. Earth Syst. Sci. 2006, 10, 679–689. [Google Scholar] [CrossRef] [Green Version]
Simic, A.; Fernandes, R.; Brown, R.; Romanov, P.; Park, W. Validation of VEGETATION, MODIS, and GOES+ SSM/I snow-cover products over Canada based on surface snow depth observations. Hydrol. Process. 2004, 18, 1089–1104. [Google Scholar] [CrossRef]
Hao, X.; Huang, G.; Zheng, Z.; Sun, X.; Ji, W.; Zhao, H.; Wang, J.; Li, H.; Wang, X. Development and validation of a new MODIS snow-cover-extent product over China. Hydrol. Earth Syst. Sci. 2022, 26, 1937–1952. [Google Scholar] [CrossRef]
Metsämäki, S.; Böttcher, K.; Pulliainen, J.; Luojus, K.; Cohen, J.; Takala, M.; Mattila, O.-P.; Schwaizer, G.; Derksen, C.; Koponen, S. The accuracy of snow melt-off day derived from optical and microwave radiometer data—A study for Europe. Remote Sens. Environ. 2018, 211, 1–12. [Google Scholar] [CrossRef]
Diodato, N.; Ljungqvist, F.C.; Bellocchi, G. Empirical modelling of snow cover duration patterns in complex ter-rains of Italy. Theor. Appl. Climatol. 2022, 147, 1195–1212. [Google Scholar] [CrossRef]
Tharwat, A. Classification assessment methods. Appl. Comput. Inform. 2018, 17, 168–192. [Google Scholar] [CrossRef]
Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
Haddon, M. Modelling and Quantitative Methods in Fisheries, 2nd ed.; Chapman and Hall/CRC: New York, NY, USA; London, UK, 2011; ISBN 9781119130536. [Google Scholar]
Congalton, R.G.; Green, K. Assessing the Accuracy of Remotely Sensed Data, 3rd ed.; CRC Press: Boca Raton, FL, USA; Taylor & Francis Group: Boca Raton, FL, USA, 2019. [Google Scholar]
Mayr, S.; Kuenzer, C.; Gessner, U.; Klein, I.; Rutzinger, M. Validation of Earth Observation Time-Series: A Review for Large-Area and Temporally Dense Land Surface Products. Remote Sens. 2019, 11, 2616. [Google Scholar] [CrossRef] [Green Version]
Ford, T.W.; Quiring, S.M. Comparison of Contemporary In Situ, Model, and Satellite Remote Sensing Soil Moisture With a Focus on Drought Monitoring. Water Resour. Res. 2019, 55, 1565–1582. [Google Scholar] [CrossRef]
Maxwell, A.E.; Warner, T.A.; Fang, F. Implementation of machine-learning classification in remote sensing: An applied review. Int. J. Remote Sens. 2018, 39, 2784–2817. [Google Scholar] [CrossRef] [Green Version]
Nembrini, S.; König, I.R.; Wright, M.N. The revival of the Gini importance? Bioinformatics 2018, 34, 3711–3718. [Google Scholar] [CrossRef] [Green Version]
Sandri, M.; Zuccolotto, P. A Bias Correction Algorithm for the Gini Variable Importance Measure in Classification Trees. J. Comput. Graph. Stat. 2008, 17, 611–628. [Google Scholar] [CrossRef]
Archer, K.J.; Kimes, R.V. Empirical characterization of random forest variable importance measures. Comput. Stat. Data Anal. 2008, 52, 2249–2260. [Google Scholar] [CrossRef]
Dobreva, I.D.; Klein, A.G. Fractional snow cover mapping through artificial neural network analysis of MODIS surface reflectance. Remote Sens. Environ. 2011, 115, 3355–3366. [Google Scholar] [CrossRef]
Xiao, X.; He, T.; Liang, S.; Liu, X.; Ma, Y.; Liang, S.; Chen, X. Estimating fractional snow cover in vegetated environments using MODIS surface reflectance data. Int. J. Appl. Earth Obs. Geoinf. 2022, 114, 103030. [Google Scholar] [CrossRef]
Liston, G.E.; Haehnel, R.B.; Sturm, M.; Hiemstra, C.A.; Berezovskaya, S.; Tabler, R.D. Simulating complex snow distributions in windy environments using SnowTran-3D. J. Glaciol. 2007, 53, 241–256. [Google Scholar] [CrossRef] [Green Version]
Vionnet, V.; Martin, E.; Masson, V.; Guyomarc'H, G.; Naaim-Bouvet, F.; Prokop, A.; Durand, Y.; Lac, C. Simulation of wind-induced snow transport and sublimation in alpine terrain using a fully coupled snowpack/atmosphere model. Cryosphere 2014, 8, 395–415. [Google Scholar] [CrossRef] [Green Version]
Luan, W.; Zhang, X.; Xiao, P.; Wang, H.; Chen, S. Binary and Fractional MODIS Snow Cover Mapping Boosted by Machine Learning and Big Landsat Data. IEEE Trans. Geosci. Remote Sens. 2022, 60, 962. [Google Scholar] [CrossRef]
Rößler, S.; Dietz, A.J. Detection of Snow Cover from Historical and Recent AVHHR Data—A Thematic TIMELINE Processor. Geomatics 2022, 2, 9. [Google Scholar] [CrossRef]
Wayand, N.E.; Marsh, C.B.; Shea, J.M.; Pomeroy, J.W. Globally scalable alpine snow metrics. Remote Sens. Environ. 2018, 213, 61–72. [Google Scholar] [CrossRef]
Richiardi, C.; Adamo, M. chiararik/SCEgapfilling: Snow cover gap filling workflow. Zenodo 2022, 13, 1957. [Google Scholar]

Figure 1. Study area (blue line), located in NW Alps, Italy, and centered on the Gran Paradiso National Park (GPNP, green line). The study area is entirely covered by the Sentinel-2 tile T32TLR (red line).

Figure 2. Algorithm workflow chart. (a1) MODIS dataset: in Approaches 1 and 2 is selected the Fractional Snow Cover (FSC), in Approach 3 is selected the Normalized Difference Snow Index (NDSI). (a2) MODIS dataset of step 1 after gap-filling. (b) Sentinel-2 dataset: in Approach 1 is selected the Snow Cover Extent (SCE) product derived from Sentinel-2 imagery. In Approaches 2 and 3 is se-lected the NDSI derived from the same Sentinel-2 imagery. (c) Random forest 1: in all the three approaches is a regression random forest. (d) Random forest 2: in Approach 1 is a classification random forest, whose output is directly the binary snow cover, in Approaches 2 and 3 is a regres-sion random forest, which gives as intermediate output the NDSI map, subsequently converted in the final binary snow cover output.

Figure 3. S2 scenes with a cloud cover threshold below 10% used to train the random forest (a) per year and (b) per month.

Figure 4. Example of three scenes outputs. First row: MODIS Fractional Snow Cover (FSC) data; second row: real S2 Snow Cover Extent (SCE), with gaps due to clouds and cloud shadows and No Data; third to fifth rows: output SCE maps. In the days without S2 acquisition, as in (c), the entire scene is predicted.

Figure 5. Accuracy metrics for Snow and No-Snow classes of the classification’s performances against an independent dataset of S2 scenes. F1 = F1-score, OA = Overall Accuracy, DJF = December-January-February, MAM = March-April-May, JJA = June-July-August, SON = September-October-November. A1 = blue line, A2 = red line, A3 = yellow line.

Figure 6. Macro-F1 score of the predicted snow cover maps according to elevation (left plot), slope (middle plot) and aspect (right plot). A2 (red) and A3 (yellow) are almost coincident in the three graphs. (Fl = Flat, N = North, NE = North-East, E = East, SE = South-East, S = South, SW = South-West, W = West, NW = North-West).

Figure 7. Scatterplots of the predicted and observed snowline elevation (SWL) (m a.s.l.) for the three approaches (A1–A3).

Figure 8. Example of the snowline (SWL) elevation and pixels predicted against the real S2 scene acquired on 24 June 2016. Top left: natural color S2 with close-up area. A1–A3: close-up with real S2 SCE map in background and SWL pixels and elevation computed, respectively, with A1, A2, and A3 approaches.

Figure 9. Accuracy metrics for Snow and No-Snow classes of the classification’s performances against weather stations. F1 = F1-score, OA = Overall Accuracy. A1 = blue line, A2 = red line, A3 = yellow line.

Figure 10. Variable importance of the regression random forests respectively used in step 1 (a), used to retrieve the gap-filled MODIS now data time series, and in step 2 (b), used to retrieve the daily high resolution snow cover time series.

Table 1. Description of the different input datasets and random forests selected in the three different approaches compared in this study. Fractional Snow Cover (FSC), Snow Cover Extent (SCE), Normalized Difference Snow Index (NDSI).

Figure 2	Data	Approach 1	Approach 2	Approach 3	RF Variables
(a)	MOD10A1 dataset	FSC	FSC	NDSI
(b)	S2 dataset	SCE	NDSI	NDSI
(c)	Random forest 1	Regression	Regression	Regression	Elevation Slope Aspect Day of Year Latitude Longitude Year
(d)	Random forest 2	Classification	Regression	Regression	Elevation Slope Aspect Day of Year Year Latitude Longitude Gap-filled MODIS

Table 2. Classes of slope for the training data sampling.

Slope (Degrees)	Class
0–5	1
5–10	2
10–15	3
15–20	4
20–25	5
25–35	6
35–45	7
>45	8

Table 3. Location and topographic features of the weather stations used for validation (CRS: WGS84 UTM 32N). Fl = Flat, N = North, NE = North-East, E = East, SE = South-East, S = South, SW = South-West, W = West, NW = North-West.

Station	Elevation (m a.s.l.)	Slope (Degrees)	Aspect	UTM Coordinates (m)
Station	Elevation (m a.s.l.)	Slope (Degrees)	Aspect	Northing	Easting
Ceresole Reale	1573	4.6	SW	5,032,244	362,681
Ceresole Villa	1581	2.5	SE	5,033,408	360,081
Eugio	1900	12.9	NE	5,035,124	378,243
Lago Serrù	2283	12.4	N	5,035,792	354,154
Lago Agnel	2304	8.2	E	5,036,613	354,538
Lago Valsoera	2365	7.8	SE	5,038,103	374,395
Locana Rosone	700	4.9	E	5,032,556	376,343
Rosone	701	5.8	S	5,032,323	376,293
Telessio	1940	17.5	NW	5,037,970	372,845
Val Soera	2412	21.8	W	5,038,281	374,628

Table 4. Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Bias Error (MBE) of the predicted snowline elevation.

	RMSE (m)	MAE (m)	MBE (m)
A1	162	85	50
A2	162	98	29
A3	223	113	44

Table 5. Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Bias Error (MBE) calculated comparing snow seasonal parameters (First Snow Day, FSD, Last Snow Day, LSD, and Snow Cover Duration, SCD) of overall predicted snow cover data and weather stations.

Parameter	RMSE (Days)	MAE (Days)	MBE (Days)
FSD	17.2	7.9	2.5
LSD	8.7	5.5	2.0
SCD	18.9	11.0	0.5

Table 6. McNemar test between the three approaches’ performances against weather stations.

	A1 vs. A2	A1 vs. A3	A2 vs. A3
χ²	604.4	464.5	7.6
p-value	<2.2 × 10⁻¹⁶	<2.2 × 10⁻¹⁶	0.0058

Table 7. Computational and time resources required by the processing.

	A1	A2	A3
RF training (step 2) time (min)	43	56	70
Prediction time per image (min)	1.5	2	3
RF prediction (step 2) resources (GB of RAM)	20	>110	>110

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Richiardi, C.; Siniscalco, C.; Adamo, M. Comparison of Three Different Random Forest Approaches to Retrieve Daily High-Resolution Snow Cover Maps from MODIS and Sentinel-2 in a Mountain Area, Gran Paradiso National Park (NW Alps). Remote Sens. 2023, 15, 343. https://doi.org/10.3390/rs15020343

AMA Style

Richiardi C, Siniscalco C, Adamo M. Comparison of Three Different Random Forest Approaches to Retrieve Daily High-Resolution Snow Cover Maps from MODIS and Sentinel-2 in a Mountain Area, Gran Paradiso National Park (NW Alps). Remote Sensing. 2023; 15(2):343. https://doi.org/10.3390/rs15020343

Chicago/Turabian Style

Richiardi, Chiara, Consolata Siniscalco, and Maria Adamo. 2023. "Comparison of Three Different Random Forest Approaches to Retrieve Daily High-Resolution Snow Cover Maps from MODIS and Sentinel-2 in a Mountain Area, Gran Paradiso National Park (NW Alps)" Remote Sensing 15, no. 2: 343. https://doi.org/10.3390/rs15020343

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Comparison of Three Different Random Forest Approaches to Retrieve Daily High-Resolution Snow Cover Maps from MODIS and Sentinel-2 in a Mountain Area, Gran Paradiso National Park (NW Alps)

Abstract

1. Introduction

2. Materials and Methods

2.1. Study Area

2.2. Data

2.2.1. MODIS

2.2.2. Sentinel-2

2.2.3. Digital Elevation Model

2.3. Workflow: Two-Stage Random Forests

Random Forest

2.4. Validation

2.4.1. Comparison with Real Sentinel-2 SCE Maps

2.4.2. Comparison with In Situ Measurements from Weather Stations

2.4.3. Evaluation Metrics

3. Results

3.1. Comparison with Real Sentinel-2 SCE Maps

3.2. Comparison with In Situ Measurements from Weather Stations

3.3. McNemar Test

3.4. Random Forests

Variable Importance

4. Discussion

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI