Introduction

In the last decades, there has been an increase in the reporting of landslide phenomena that caused damage or threatened society (Petley, 2012). Recent reviews highlight a growing number of publications dealing with the quantitative spatial prediction of landslides (Reichenbach et al., 2018; Steger & Kofler, 2019; Wu et al., 2015). For very large areas, such assessments are usually based on statistical methods and include national-scale analyses (e.g. (Dikau & Glade, 2003; Domínguez-Cuesta & Bobrowsky, 2017; Ferentinou & Chalkias, 2013; Gaprindashvili & Westen, 2015; Graff et al., 2012; Komac, 2006; Komac & Ribicic, 2006; Liu et al., 2013; Malet et al., 2008; Sabatakakis et al., 2013; Trigila et al., 2013)), continental-scale assessments (e.g. (van Den Eeckhaut et al., 2012; Günther et al., 2013; Günther et al., 2014; Wilde et al., 2018)) and global models (Hong et al., 2007; Lin et al., 2017; Nadim et al., 2006). Lack of reliable spatial geotechnical information and restrictions related to computational resources are well known to hamper the application of physically based spatial landslide models for such large territories (Corominas et al., 2014).

The primary purpose of a landslide susceptibility map is to spatially depict the relative likelihood of landslides affecting an area under a given set of geo-environmental conditions (Brabb, 1984; Fell et al., 2008; Glade & Crozier, 2005). The selection of an appropriate modelling approach should consider the size and characteristics of the target area and the availability and quality of input data (Corominas et al., 2014; Guzzetti, 2005). Under real-world data conditions, the size of a study area is likely to influence the consistency and quality of available geo-environmental and landslide information. Compared to physically based slope stability models, statistical landslide susceptibility analyses are more flexible in terms of input data and thus often applied for the assessment of large areas (Cascini, 2008; Corominas et al., 2014; Sabatakakis et al., 2013; van Westen et al., 2008). Statistically based landslide susceptibility assessments are based on the assumption that the potential location of an upcoming slope failure can be estimated by analysing past landslides and their relation to spatial geo-environmental variables. In most cases, the underlying classification algorithms build a statistical relation between past landslide presence/absence information and a set of explanatory variables that describe the underlying terrain conditions, e.g. the topography, lithology and land cover. The ensuing prediction rule often expressed as a susceptibility score between zero and one is then applied to each terrain unit of a study area (e.g. raster cell, slope unit) (Jacobs et al., 2020). For validation, the predicted susceptibility score is confronted with model-independent landslide observations in order to get insights into the prediction capability of the model (Chung & Fabbri, 2003; Frattini et al., 2010; Reichenbach et al., 2018). Several studies have shown that the explanatory power of statistically based spatial landslide predictions is particularly dependent on the reliability of landslide training data (Ardizzone et al., 2002; Harp et al., 2011; Steger et al., 2016a; Zêzere et al., 2017).

In its basic form, a landslide inventory represents a collection of past landslide locations. For most scientific purposes, a further distinction between landslide types is required (Corominas et al., 2014; Guzzetti et al., 2012). Additional information on the spatial accuracy of events, their temporal occurrence and mapping uncertainties can further enhance the usability of inventory data (Guzzetti et al., 2012). However, for very large areas, consistent and accurate landslide information is rarely available and the creation of an assembled landslide inventory based on a variety of sources is common practice (Malamud et al., 2004). The resultant spatially heterogeneous data quality poses a challenge for the spatial prediction of landslide phenomena using statistically based approaches (Ardizzone et al., 2002; Fressard et al., 2014; Harp et al., 2011; Hussin et al., 2016; Steger et al., 2017).

The positional accuracy of a landslide inventory is known to be associated with the underlying data source and mapping technique applied (Guzzetti et al., 2012). To what extent positional inaccuracies are propagated into the final modelling and validation results also depends on the chosen raster resolution and the type of terrain representation (Jacobs et al., 2020; Steger et al., 2016b; Steger et al., 2020). Alternatives to a raster-based representation of the landscape, like slope units (Alvioli et al., 2016), are gaining increasing popularity in the context of statistical landslide susceptibility modelling, also because they are considered to be geomorphically more meaningful and potentially less sensitive to landslide positional errors and noise in geo-environmental input data (Alvioli et al., 2016; Camilo et al., 2017; van Den Eeckhaut et al., 2009; Guzzetti et al., 2006; Jacobs et al., 2020; Schlögel et al., 2018).

Another concern in statistical landslide susceptibility modelling is the spatial representativeness (i.e. completeness) of available landslide information (Guzzetti et al., 2012; Steger et al., 2017). A common source of such spatial bias relates to the underreporting of landslides that did not cause social, infrastructural or monetary damage leading to an underrepresentation of landslides far from settlements and infrastructure (Bell et al., 2012; Brardinoni et al., 2003; Petschko et al., 2014; Sabatakakis et al., 2013). Human activity (e.g. earthmoving) and natural processes (e.g. erosion or ecological succession) may as well induce a spatially changing completeness of landslide information among different land cover types (Bell et al., 2012; Petschko et al., 2016). Varying completeness of landslide data can also be related to the boundaries of administrative units. For example, the resources allocated for landslide mapping may vary among administrative units, such as political counties, provinces or municipalities, which can introduce another type of spatial bias (Steger et al., 2017; Trigila et al., 2013). Merging existing landslide inventories from different study sites may also introduce spatial inconsistencies because areas, where less detailed mapping campaigns were performed, are likely to under represent the portion of past landslide occurrences.

Literature indicates that the positional accuracy and spatial consistency (i.e. representativeness) of mapped events are frequently ignored in statistical susceptibility modelling. Especially for large study sites, where data inconsistencies represent the rule rather than an exception, flawed landslide information may lead to erroneous landslide susceptibility models, which in turn may negatively influence the explanatory power of each subsequent decision or analysis. Previous publications highlighted some strategies that may counteract associated error propagations by focusing on the sampling of non-landslide locations (van Den Eeckhaut et al., 2012), positional inaccuracies of landslide information (Jacobs et al., 2020; Steger et al., 2016b) and systematic inventory-based incompleteness (Steger et al., 2017).

This research aims (i) to assess landslide susceptibility for the entire Austrian territory by (ii) counterbalancing landslide inventory-based incompleteness and by (iii) minimizing the effect of positional inaccuracies through a slope unit terrain representation. The application of different classification algorithms (i.e. logistic regression vs. mixed-effects logistic regression) and different terrain representations (i.e. grid-based vs. slope units) allowed the comparison and evaluation of four models in terms of their ability to counterbalance flawed landslide information. Only shallow translational earth and debris slides (further termed landslides) were considered within this analysis (Cruden & Varnes, 1996; Dikau et al., 1996).

Study area

Containing approximately 84,000 km2, the territory of Austria (Fig. 1) is located centrally on the European continent and its approx. 8.8 million inhabitants are distributed across nine federal provinces (Statistik Austria, 2017). The topography of Austria can be described as undulating to flat in the East and predominantly mountainous in the West. The Danube river basin in the East crosses the political capital Vienna in the very East. In contrast, the alpine landscape in the central and western parts exhibits elevations of up to 3798 m asl. at the Großglockner peak. From the north-eastern lower and gentle terrain to the high alpine landscape in the central and western parts, the Austrian territory is characterized by a considerable morphological variety. Landslides represent a substantial threat to private and public properties, critical infrastructure and the population where the landslide favouring geomorphological conditions coincides with specific socio-economical and demographical settings (Petschko et al., 2013; Schweigl & Hervás, 2009).

Fig. 1
figure 1

(A) The location of Austria at the central European territory. (B) Overview of the Austrian morphological and political territory. (C) Shallow landslide events that occurred within the Austrian territory. Photographs were provided by the Geological Survey of Austria

The geological setting of the Austrian territory is defined by a Northern part mainly dominated by the presence of the Bohemian Massif and Molasse zones. These partly deforested flat to hilly landscapes are located within the northern part of Austria (Krenmayr et al., 2000). The Helvetic and Flysch zones represent relatively narrow lithological units located south of the Molasse zone and correspond to sedimentary deposits. The Flysch zone, an elongated sandstone-rich unit, represents a gentle to hilly landscape that originated from the depositional processes dating from Upper Cretaceous and Neogene. The Flysch zone is well known to be particularly prone to landslides of slide-type movement (Petschko et al., 2014; Schwenk, 1992; Terhorst & Damm, 2009). Various lithologies like the Greywacke zone, the Calcareous Alps, the Penninic unit, crystalline rocks and the Periadriatic compose the high elevation alpine terrain, where landsliding is common, but often underestimated (Krenmayr et al., 2000).

The climate is influenced by the Alpine territory, which represents 62.8% of the total area of Austria (BMLFUW, 2007), with strong influences from the Atlantic continental, sub-Mediterranean and polar or subpolar air masses. Temperature regimes largely depend on elevation ranges and seasonal trends (Auer et al., 2007; Hiebl & Frei, 2016). The mean annual rainfall is also greatly variable over the territory. The humid season also corresponds with the warmer period. Precipitation average annual values range from a few hundred millimetres per year (400 to 600 mm/year in the Eastern part) to more than 2000 mm/year in some of the Alpine areas (BMLFUW, 2007). Localized, highly intense rainfall events are mainly concentrated during the summer-autumn period and can be manifested as heavy thunderstorms, bringing heavy hail, in summer, occurring throughout the country. However, comprehensive historical data analysis has shown significant regional and different seasonal tendencies (Auer et al., 2007).

The predisposition for landsliding in Austria depends on an interplay of different environmental factors, such as lithology and its weathering products, topography, land cover and human impact (Bell et al., 2012; Schweigl & Hervás, 2009). Extreme precipitation events, as well as rapid snow-melting, are known to represent typical landslide triggering factors (Glade et al., 2020). Severe landslide events caused major consequences, for instance in August 2005 in Gasen und Haslau, where hundreds of landslides were reported after a heavy rainfall of 200 mm in 48 h (Tilch, 2009). In June 2009, thousands of landslides were triggered by heavy rainfalls in Feldbach (Hornich & Adelwöhrer, 2010).

Data

Landslide inventory

The landslide information used for this study consists of 23,891 shallow translational landslides (Cruden & Varnes, 1996; Dikau et al., 1996; Hungr et al., 2014). The final landslide inventory was compiled from nine sub-inventories, which differ in their spatial reference area (i.e. mapping domain) and applied mapping technique. From now on, the term “mapping domain” is used to describe the spatial extension (i.e. a polygon) delimiting the area of each sub-inventory. Complementary inventory characteristics, such as the coverage area (km2), density (slides/km2), the mapping technique used, location description and also a quality indicator about the positional accuracy of each inventory, are described within Table 1 and represented by Fig. 2(D). Although information on the trigger mechanism was not available for all the sub-inventories, it is known that commonly the main triggers of shallow earth and debris slides in Austria are associated with hydro-meteorological events and not of tectonic origin (Schweigl & Hervás, 2009). Consequently, the present study refers to hydro-meteorological landslide trigger only.

Table 1 Characterization of available landslide information. The positional accuracy indicator (one to three crosses) was deduced from the authority’s inventory data documentation. The positional accuracy was estimated higher (+++) in the case of inventories mapped utilizing high-resolution aerial photography, high-resolution DTM derivatives and GPS field campaigns. Mid accuracy (++) was assigned to digitalized inventories from other sources. Lower accuracy (+) was assigned to landslides inventories build on inferior accuracy mapping procedures (e.g. digitalization from old archives)
Fig. 2
figure 2

The landslide inventory composed of 23,891 observations is displayed within the central image as black points which are the basis for the calculated landslide density (B). Two closer frames display details of the lithology (A) and land cover variables (B), respectively. The nine different inventory mapping domains (Mp) are given in (D)

Available landslide inventories are a more or less accurate representation of the past landslide locations (Guzzetti et al., 1999; Guzzetti et al., 2012). Detailed new mapping campaigns that cover very large territories (e.g. entire nations) are costly and time-consuming and do not necessarily lead to a representative landslide data set since the footprint of old landslides might be absent within specific areas. In practice, it usually remains to the researcher to work with the best available data by keeping in mind its limitations. For the present study, the spatial bias in the available landslide information plays a significant role. Literature suggests that in Austria, specific land cover features may over- or underrepresent the true landslide occurrence (Bell et al., 2012; Petschko et al., 2014). Although the compiled inventory contains a considerable number of landslides, the first inspection of available meta-data and the available spatial information indicated heterogeneous completeness among the different mapping domains (Fig. 2(D)). For instance, the very low landslide density in the mountainous region of Tyrol is likely associated with the absence or unavailability of a detailed and province-specific landslide inventory (i.e. only Mp1 was available). In contrast, the comparably high landslide density observed for the province of Lower Austria (Mp9) is associated not only with its high landslide susceptibility but also with the underlying systematic mapping campaign associated with a recent project (Petschko et al., 2016). Between all the nine sub-inventories, the mapping domain named Mp1 covers the highest portion of the Austrian territory (73% of the area).

Geo-environmental variables

(Schweigl & Hervás, 2009) reported that landslide occurrence in Austria is mainly controlled by geo-environment factors such as geological, geomorphological and land cover features. For this study, five topographical variables were derived from a resampled airborne laser scanning (ALS)–based DTM available at a 10 m × 10 m spatial resolution. Slope angle is the most commonly used variable in landslide susceptibility modelling (Budimir et al., 2015; Coe et al., 2004; Corominas et al., 2014; Kanungo et al., 2009; Malamud et al., 2014; Pourghasemi & Rossi, 2016; Süzen & Kaya, 2012; van Westen et al., 2008; Wu et al., 2015). This variable was selected to describe the driving forces that directly influence the downslope sliding potential. For large areas, elevation may reflect altitude-dependent environmental, climatic and morphological conditions that are associated with slope stability (e.g. general climatic features, elevation-dependent weathering variations) (Corominas et al., 2014; Dai & Lee, 2002; van Westen et al., 2008). The topographic wetness index (TWI) (Beven & Kirkby, 1979) was used as a hydrological proxy which considers the upslope contributing area and slope gradient. Applied as a conditioning variable for this study, the TWI indicates potential spatial differences in soil moisture and flow accumulation (Süzen & Kaya, 2012).

The slope aspect may describe a varying degree of insolation, which influences soil moisture and weathering conditions (Corominas et al., 2014; Dai & Lee, 2002; van Westen et al., 2008). The slope aspect is also mentioned to be a relevant landslide predisposing variable considering that distinctive aspects may influence the terrain exposition to rainfalls and radiation, conditioning terrain humidity and vegetation patterns (Catani et al., 2013; Pourghasemi & Rossi, 2016). For this study, the aspect was derived from the DTM and resampled from the continuously scaled layer (originally from 0 to 360°) by calculating the respective cosine and sine, representing the degree of north and east exposedness (Brenning, 2009; Brenning & Trombotto, 2006; Steger et al., 2016b). The lithological map provided by the Geological Survey of Austria at the scale 1:500,000 (Weber, 1997) was used as an indicator for the parent structural material composition. The lithological classes were represented by their lithostratigraphical units. The land cover data, used to account for an associated inventory-based incompleteness (cf. “Statistical modelling”), was obtained from the CORINE land cover data (CLC) (European Environment Agency & EEA, 2012).

Methods

The methodological framework (Fig. 3) consists of four main steps: (i) data collection and preparation, (ii) exploratory data analysis, (iii) statistical modelling and (iv) model evaluation. Four models were created by testing two classifiers (logistic regression, mixed-effect logistic regression) and two landscape representations (grid-based, slope unit–based). The slope units were semi-automatically delimited using GRASS GIS and r.slopeunits (Alvioli et al., 2016). All statistical analyses were performed using the open-source statistical software “R” (Core Team, 2020). The terrain and thematic GIS parameters were computed using the open-source “SAGA GIS” (Conrad et al., 2015), while final visualizations of the maps were conducted within ESRI ArcGIS (ArcGIS Desktop, 2017).

Fig. 3
figure 3

Methodological framework. (i) Data collection and preparation. (i.a) National-scale landslide inventory assemble. (i.b) Landscape representation through terrain units (grid approaches and slope units). (i.c) selection, preparation and representation of the variables through the selected landscape representation. (ii) First knowledge gain on the data through the exploratory data analysis. (iii) Statistical classifiers applied. (iv) Outcome evaluation and best performing model selection

Data collection and preparation

Two different landscape representations were tested. For the grid-based approach, all the variables were resampled to the modelling resolution of 100 m × 100 m. The delineation of slope units was performed using the r.slopeunits extension of GRASS GIS (Alvioli et al., 2016) using a scale-dependent parameter optimization, as suggested by (Schlögel et al., 2018). The geo-environmental variables were assigned to the underlying slope units according the following criteria: for continuous variables (e.g. slope and elevation), the mean value was taken; while for categorical variables (e.g. land cover and lithology), the predominant class was taken. Slope units that were entirely located within the class “flat”, as depicted by the topographical position index (Weiss, 2001), or were predominantly located within fluvial deposits, as depicted by the lithological map, were not considered for modelling. The exclusion of such flat “trivial terrain” was expected to increase the explanatory power of the results at the costs of an apparent lower predictive performance (Steger & Glade, 2017). These trivial areas were also excluded for the grid-based analyses. To guarantee a parsimonious model and to enhance the interpretability of the results, we opted to build the study upon a compact set of frequently applied geo-environmental variables (cf. “Geo-environmental variables”).

Exploratory data analysis

An initial examination of the available datasets built the basis to gain insights into empirical relations between geo-environmental variables and the presence/absence of landslide occurrence. The data visualization techniques were also used to explore suspicious data patterns, which may be indicative of data biases. Conditional frequency plots (for continuous variables) and spineplots (for categorical variables) were used to highlight the ratio of landslide presence to absence across the geo-environmental data values. First insights into the capability of geo-environmental variables to distinguish landslide presence from absence observations were gained by evaluating the discriminatory power of single-variable models (Murillo-García et al., 2019; Steger et al., 2020). In this case, the obtained metric reflects the fitting performance of a single-predictor logistic regression measured via the area under the receiver operating characteristic curve (AUROC) (Beguería, 2006; Remondo et al., 2003).

Statistical modelling

Regression-based classification methods are frequently applied to model landslide susceptibility (Brenning, 2005; Budimir et al., 2015; Malamud et al., 2014; Wu et al., 2015). Two classifiers, which are both based on a generalized linear model (GLM), were confronted for this study: logistic regression and mixed-effects logistic regression. The former relates to the most widely applied classifier in landslide susceptibility modelling, while the latter is rather new in the field and allows to additionally isolate variation related to a heterogeneous landslide inventory completeness among classes of categorical variables (i.e. mapping domains, land cover units) (Steger et al., 2017).

Atkinson and Massari (1998) and Guzzetti et al. (1999) were among the first to apply logistic regression (LR) for spatial landslide susceptibility modelling. Besides producing smooth prediction surfaces, the LR results are straightforward to interpret (Felicísimo et al., 2013; Goetz et al., 2015). The applied LR model assumes a binomial probability distribution of the response and the final output relates to the probability of landsliding (Hosmer & Lemeshow, 2000). LR was fitted using six predictors: lithology, slope angle, elevation, TWI, eastness and northness. To better represent the previously explored non-linear relationship between shallow landslide occurrence and slope angle (cf. “Exploratory data analysis”), a quadratic term (X2) was applied (Osborne, 2015).

Most statistical classifiers used in the field of landslide susceptibility modelling, such as LR, can be assigned to the category of fixed-effects models which aim to assess the direct influence of each predictor variable (i.e. the fixed-effect) on the response (Bolker et al., 2009; Steger et al., 2017). Mixed-effects models enable to additionally consider random effects in order to account for data hierarchies or nuisance effects (Bolker et al., 2009; Zuur et al., 2009). For statistical landslide susceptibility modelling, mixed-effects modelling already proved efficient to separate effects related to a systematic incompleteness inherent in the landslide data from effects that describe the quantity of interest, namely landslide susceptibility (Steger et al., 2017). The random intercept was only used for parameter estimation and averaged-out for the final spatial prediction (Bolker et al., 2009; Zuur et al., 2009). Table 2 summarizes the selected variables applied within the LR and MELR models. It is important to note that the random intercepts (i.e. bias-describing variables) were only used for parameter estimation, and lately averaged-out of the final predictions. The final spatial predictions are based on the fixed-effects variables (i.e. slope angle, lithology, elevation, TWI and aspect; eastness and northness). More details on mixed-effect modelling for assessing landslide susceptibility can be found in (Steger et al., 2017). Rather than simply ignoring biased variables, the advantage of mixed-effects model adoption is closely related to the confounding effects, often observed between environmental variables (e.g. slope and land cover). Therefore, the usage of bias-describing variables as random intercepts enables to account for the associated variation during model parameter estimation (Steger et al., 2021), while the effects of these same variables are averaged-out for the predictions. For this study, the two categorical variables, land cover and mapping domain, which mainly describe a systematic incompleteness of landslide information, were introduced as random intercepts to reduce associated confounding and direct bias propagation.

Table 2 Summary of the variables considered within this publication to fit and predict the statistical models

Land cover variables are well known to describe not only landslide influencing processes but often also a heterogeneous completeness of landslide information (Bell et al., 2012; Petschko et al., 2016). For instance, forest cover may impede the identification of landslides while an over reporting of landslide information is likely nearby settlements or agricultural land (Bell et al., 2012; Brardinoni et al., 2003; Petschko et al., 2016; Steger et al., 2017). The completeness of landslide information was also expected to vary across the nine mapping domains (cf. “Landslide inventory”). Thus, the nine mapping domains (Table 1; Fig. 2(D)) were also introduced as a random intercept within the MELR.

Model evaluation

Model evaluation is considered an essential step in landslide susceptibility modelling (Chung & Fabbri, 2003; Guzzetti et al., 2006). The AUROC was used as a performance metric and assessed for the training samples (fitting performance) and test samples (predictive performance) obtained by repeated non-spatial (cross-validation (CV)) and spatial partitions (spatial cross-validation (SCV)). The training and test sample partitioning was performed using a k-Fold cross-validation technique using 25 repetitions and 10-fold for each repetition. This multi-fold partitioning technique is described in more detail within (Brenning, 2012; Schratz et al., 2019). In summary, for each model 250 AUROC values that relate to different partitions of training and test data were calculated (25 repetitions times 10 folds). Model overfitting describes the tendency of a model to adapt itself too closely to the training sample and therefore fails to explain independent test set observations (Hosmer & Lemeshow, 2000). An overfitted landslide susceptibility model may reproduce the training observations in great detail, while yet unseen future landslide locations may remain undetected (Brenning, 2005; Goetz et al., 2015). The difference between the fitting and predictive performances of each model was calculated to get insights into the index of model overfitting. (Steger et al., 2017) highlighted that considerable differences between non-spatially (CV) and spatially (SCV) assessed predictive performances can indicate systematic spatial inconsistencies in the modelling results. Thus, mean AUROC differences (Δ|CV − SCV|) were calculated to expose inconsistent modelling results. Besides a detailed quantitative model evaluation, also a geomorphological plausibility check was conducted to explore whether the results were affected by evident inventory-based biases or artefacts (Steger et al., 2016a).

The geomorphological plausibility of landslide susceptibility models might suffer in case the model training is based on biased landslide data, such as underreporting of past events within specific territories. Taking the landslide data background into account, subsequent bias propagations and misleading spatial predictions can be identified. The morphological coherence of the maps was assessed qualitatively by considering known and suspected flaws in the available landslide data and by scrutinizing whether those flaws are reflected in the final prediction pattern. As an example, a section in the Tyrolean Alps (Fig. 2(A, D); Fig. 6) was selected to check the coherence of the results. This landslide-prone area is characterized by a high relief energy (540–3166 m asl.; mean elevation of 1638 m asl.) and steep terrain (mean slope 25°, max. 76°) and known to be underrepresented in terms of available landslide information. Predicted very low landslide susceptibility scores at the prevalent hill slopes were therefore interpreted as an indicator of a direct landslide data bias propagation and a low morphological coherence of the final map.

Results

Exploratory data analysis

The exploratory analysis (Fig. 4) shows comparable conditional frequencies for the grid-based and slope unit–based terrain representations. For both landscape representations, the plots for the variable lithology evidence the highest conditional landslide frequencies for the units Flysch (Fz), Helvetic zone (Hz) and South Alpine (Sa). Comparably low landslide densities were observed for the units Penninic window (Pw) and Bohemian Massif (Bm). High conditional landslide frequencies were calculated for the land cover class pastures (P), broad-leaved forests (Bf) and mixed forests (Mf). Bare soils (Bs) showed low landslide frequencies. The highest landslide frequencies were observed for medium inclined terrain, with the highest values between 10 and 30° (Fig. 4(D, H)).

Fig. 4
figure 4

Conditional frequency plots. Graphs (A)–(D) represent the grid approach, and graphs (E)–(H) represent the slope unit approach. The light orange colour represents landslide presence (TRUE), while the light-blue colour represents the non-landslide observations (FALSE). The y-axis represents either the binary or the relative proportion between landslide and non-landslide. Legend for the internal variables categories: Graphs (A) and (E) - Lithology: (a) Austroalpine crystalline, (b) Austroalpine Paleozoic, (c) Austroalpine Permo-Mesozoic, (d) Flysch zone, (e) Penninic window, (f) Bohemian Massif, (g) Helvetic window, (h) Periadriatic rocks, (i) Quaternary, (j) South Alpine units, (k) Tertiary basis. Graphs (B) and (F) - Land cover: (1) = settlements, (2) = arable and fallow land, (3) = pasture, (4) = grassland, (5) = broad-leaved forests, (6) = coniferous forests, (7) = mixed forests, (8) = bare soils. For the spine plots (graphs (A), (B), (E) and (D)), the width of the bars corresponds to the relative spatial coverage of each variable class within the Austrian territory

The land cover plots (Fig. 4(B, F)) also reveal a comparably low number of inventoried landslides for settlement areas and arable land. Since these land cover units are simultaneously associated with lower slope angles and low elevations, such univariate data inspections have to be interpreted with care. An additional consideration of the underlying landslide data origin (e.g. mapping purpose) indicates that the observed relations (e.g. land cover vs. landslide occurrence) do not necessarily depict geomorphically plausible relations but are likely to describe a spatially heterogeneous landslide data completeness. Higher landslide frequency was also associated with low to mid slope steepness and lower to medium elevation (Fig. 4), where usually most of the settled areas tend to be located. This supports the argument of an overrepresentation of landslides near settlements. Lower frequencies of landslides were associated with land cover features like arable land, grassland, coniferous forest and bare soils. Predominantly located at steeper slopes, coniferous forests were associated with the lowest frequency of reported landslides compared to the other forest types.

The discriminatory power of each variable varied from 0.5 (TWI and Eastness when applied for the grid approach) to 0.73 (for lithology when applied for the grid approach, refer to Table 3). For the grid-based approach, lithology and slope angles were most efficient in discriminating landslide presence from absence. The discriminatory power was generally lower when assessed for the slope unit models.

Table 3 Univariate AUROCs associated with each single-predictor model

Spatial prediction maps

It was observed that the appearance of the predictive maps differed substantially. In general, the MELR classifier produced a less spatially contrasting prediction pattern over the territory. LR predicted clear heterogeneous landslide susceptibility maps. The LR models clearly reproduced the higher and lower landslide distribution pattern observed in Fig. 2(B). Although this general pattern can also be observed when applying MELR on slope units (Fig. 5), it is much more accentuated when potential bias-causing effects are ignored (i.e. LR models).

Fig. 5
figure 5

National-scale landslide susceptibility maps. The four models were produced under diverse statistical classifiers (LR or MELR) for grid and slope unit approaches. The predictive performance scores (median AUROC) are represented on the bottom-left corner. The upper maps represent the grid approaches (predicted through MELR and LR), while the bottom maps represent the slope unit models (predicted through MELR and LR). In order to avoid any additional bias, some areas have not been modelled (e.g. flat areas, glaciers)

A closer look at the high alpine areas of the Tyrolean Alps indicates considerable discrepancies in the produced spatial prediction patterns when comparing the different modelling strategies (Fig. 6). In contrast to MELR, LR models tend to produce low landslide susceptibility scores even for steeper parts of the Tyrolean Alps. This behaviour became particularly evident when comparing landslide susceptibility scores across slope angles for this same mountainous region (Fig. 8). Instead, for this same region and despite the low amount of inventoried landslides in this area, MELR was still able to assign comparably high susceptibility scores to the steep terrain of this particular area. For this area, 94% of the pixels were classified lower than 0.5 when applying LR. When accounting for potential inventory bias using MELR, the percentage of pixels predicted lower than 0.5 decreased to 53%.

Fig. 6
figure 6

Closer cutout to a region with a low density of landslides. The upper maps represent the grid approach (predicted through MELR and LR, respectively), and the bottom maps represent the slope unit approach (also respectively predicted through MELR and LR).

Model evaluation

Among the grid-based approaches, LR reached the highest predictive performance (CV: 0.842; SCV: 0.778) compared to MELR (CV: 0.834; SCV: 0.773). Also, for the slope unit models based on LR, the AUROCs (CV: 0.769; SCV: 0.723) were slightly higher than their MELR equivalents (CV: 0.755; SCV: 0.734). Lower predictive performances were constantly obtained for the spatial cross-validation technique (SCV) compared to non-spatial cross-validation (CV) as also observed by Petschko et al., (2014), Steger & Glade, (2017), Steger et al., (2016b).

The fitting performance achieved through SCV (black crosses in Fig. 7) showed a similar trend with slightly higher values for the LR models. Fitting performances can jointly be interpreted with the predictive performance to obtain insights into the index of model overfit (here named as overfitting index and represented by the black points within the lower graph of Fig. 7) (Brenning, 2005; Goetz et al., 2015; Murillo-García et al., 2019; Tien Bui et al., 2012). Comparing the two classifiers, the overfitting index was lower for MELR indicating a lower index of overfitting. This tendency was particularly evident for the slope unit terrain partition. The mean difference between CV and SCV (Δ|CV − SCV|), in Fig. 7, was constantly lower for MELR indicating spatially more robust modelling results (Steger et al., 2017).

Fig. 7
figure 7

Quantitative model evaluation based on the spatial (SCV) and non-spatial cross-validation (CV). The boxplots in the upper graphs represent the CV- and SCV-based AUROCs, while the black cross symbol refers to the median SCV fitting performance. The graph at the bottom represents the overfitting index calculated for SCV partitions (black points) and an indicator for spatially inconsistent modelling results, i.e. the difference between median spatial and non-spatial validation scores (red triangles)

Discussion

This research tackled two pending challenges related to the topic of large area statistical landslide susceptibility assessment, namely (i) the systematic incompleteness of landslide data and (ii) the unprecise positional location of landslide samples. These challenges were faced using (i) a mixed-effects modelling approach and (ii) an alternative representation of the terrain, namely slope units.

The initial exploratory data analysis provided further evidence that systematic biases are inherent in the available landslide data. The inventory data that was based on reports was assessed to overrepresent densely populated areas (gentle slopes and relatively lower elevations), which is reflected by a very high conditional frequency of landslides (Fig. (C, D, G, H)). As a consequence, an underreporting of landslides in sparsely populated regions can be expected. Particular high conditional landslide frequencies for both terrain units were associated with elevations below 1500 m and slope inclinations below 20 degrees. Additionally, known challenges in landslide reporting and mapping within specific land cover classes (e.g. forested areas) as described by (Bell, 2007); (Petschko et al., 2016); (Conoscenti et al., 2016) might have contributed to a heterogeneous landslide data completeness. This in turn supported the decision to include land cover as a bias-describing effect within the mixed-effects models.

While the four models reached moderately high and similar predictive performances, the appearance of the final maps varied significantly. Such contrasting spatial prediction patterns for similarly performing models have also been reported in the literature (Hussin et al., 2016; Sterlacchini et al., 2011). The slightly higher predictive performance of the LR models should be interpreted in the context of the underlying landslide data bias. By including a bias-describing predictor like land cover, LR directly reproduced the associated data bias which also led to over-optimistic AUC values (Goetz et al., 2015; Hosmer & Lemeshow, 2000). Although a vast amount of publications focuses on the AUC values as a model and variable selection criterion, high AUCs might not necessarily reflect the real quality of the final maps, especially under data bias conditions (Steger et al., 2016b; Steger et al., 2017). From a purely quantitative point of view, MELR models performed slightly worse compared to LR, a classifier extensively used in the field (Akgun, 2012; Atkinson & Massari, 2011; van Den Eeckhaut et al., 2012; Goetz et al., 2015; Guzzetti et al., 1999; Lee et al., 2004; Moosavi & Niazi, 2016; Nefeslioglu et al., 2008; Pourghasemi et al., 2013; Regmi et al., 2014; Reichenbach et al., 2014; Reichenbach et al., 2018; Trigila et al., 2013). Steger et al. (2017) have shown the potential of mixed-effects modelling for handling landslide data bias for study sites in Austria. This study provides evidence that the application of such approaches is beneficial, also when applied for large area assessments (national scale). MELR achieved was able to reduce the propagation of biased relationships into the final results and produced geomorphologically coherent predictions.

Within this research, the quality of the models was not only interpreted from calculated predictive performance estimates, but also on the basis of other indicators, such as the overfitting index and the difference between CV and SCV. Fitting a model too closely to characteristics in training data is a common concern for landslide susceptibility models using statistical methods (Goetz et al., 2015). Such model overfitting was constantly observed to be lower for the MELR models (Fig. 7) compared to their LR counterparts. At the same time, MELR models were also associated with a lower difference in AUCs between the validation techniques (e.g. Δ|CV − SCV|; Fig. 7), indicating a higher spatial consistency of the results (Steger et al., 2017).

Moreover, beyond the numerical evaluation, the model selection was additionally conducted by assessing the geomorphic plausibility of the maps (Bell, 2007; Steger et al., 2016a). This might be particularly important because differently appearing maps can be associated with similar performance estimates and the appearance of the final maps co-determines the practical acceptance by the final users (Brenning, 2005; Goetz et al., 2015). A close inspection of the prediction patterns created for the Austrian territory (Fig. 5) provided valuable evidence that the inventory biases were directly propagated into the final LR results. In fact, the lack of spatially consistent landslide data across the sampled target area (Fig. 2) is a recurrent challenge in the field of landslide susceptibility assessments, especially for very large areas. The inclusion of landslide mapping domains as a random effect within MELR counterbalanced the associated data heterogeneity across the territory and enabled more plausible results.

In order to assess how landslide data incompleteness affected the final maps, we focused on regions where the landslide occurrence is well known to be underestimated due to underreporting. For the selected landslide-prone Tyrolean area, only a few landslides were registered within the available inventory (samples originally from Mp1), which is mostly reported biased. When viewing the maps from a national-scale perspective (Fig. 5), it became obvious that LR reproduced this data bias by assigning particularly low susceptibility scores to this area (Fig. 6). From a geomorphic viewpoint, much higher landslide susceptibility scores can be expected for this particular region. The approach based on mixed-effects modelling was able to counterbalance this landslide data flaw and produced more appropriate predictions (i.e. higher susceptibility scores) for the hillside areas. In other words, even in the case that very few landslides were officially registered for the Tyrolean hill slope areas, MELR assigned relatively high probability scores as a consequence of the prevalent morphology.

For this sub-region, the lower plausibility of the LR predictions became particularly evident when plotting the predicted susceptibility values against a factor directly related to shear stress, namely the slope angle (Fig. 8). A more detailed view at the prediction patterns for this region (Fig. 8(B)) shows for LR a particularly high concentration of very low susceptibility scores (close to zero) associated even with high slope inclinations (between 30 and 40 degrees). MELR in its place produced a more balanced representation of landslide susceptibility for this region by predicting a substantial amount of medium to steeply inclined slopes as considerably susceptible to landsliding (Fig. 8(B)).

Fig. 8
figure 8

Comparison between predicted susceptibility scores (y-axis in (A) and (B)) and slope angle for MELR (left-side graph (A)) and LR (right-side graph (B)). The graphs and summary statistics at the bottom of each graph are related to the shown area

The application of slope units as an alternative to grid-based assessments has recently gained more and more attention in the field of landslide susceptibility assessment (Alvioli et al., 2016; Camilo et al., 2017; Guzzetti & Reichenbach, 1994; Jacobs et al., 2020; Lombardo et al., 2018; Reichenbach et al., 2014; Schlögel et al., 2018). Our results are in line with (van Den Eeckhaut et al., 2009) and demonstrate an overall higher predictive performance for grid-based assessments, in comparison to their slope unit counterparts. However, other studies showed that this is not necessarily always the case (Erener & Düzgün, 2012). A common aspect of all publications that compared both landscape representations (slope units vs. grid cells) for the purpose of landslide susceptibility modelling is the observed similar predictive performance of the underlying models, despite the rather different appearance of the final maps (e.g. van Den Eeckhaut et al. (2009)). This already suggests that focusing on obtained performance metrics as the sole criteria to select a model is subject to limitations.

For landslide susceptibility assessments, slope units may frequently cover a larger areal extent than a pixel within a conventional grid-based model. As a consequence, inaccurately mapped landslides are still more likely to be assigned to the correct spatial entity (i.e. slope unit) compared to their grid-based counterparts (Steger et al., 2016b). Thus, slope unit–based models are likely to be less sensitive to positional inaccuracies of inventory data. (Jacobs et al., 2020) tackled the effects of uncertain landslide point positioning on landslide susceptibility models and confirmed an improved capacity of slope units to handle positionally inaccurate landslide data, compared to pixel-based representations. Within this research, slope units, with generally larger size than pixels, were better able to accommodate positional inaccuracies from the inventories while still being a reliable terrain unit for landslide susceptibility models. Finally, the joint analysis of several evaluation criteria (prediction pattern surfaces, the susceptibility frequencies distribution and the different validations measures) provides further evidence of the utility of slope units under landslide data bias conditions and positional inaccurate sampling points.

Ultimately, the current results provide evidence that a too detailed representation of the terrain may be detrimental in the presence of inaccurate landslide data and that actively counterbalancing known systematic data biases (e.g. averaging out bias-describing variables using mixed-effects modelling) can improve the plausibility of the results. In this context, we consider mixed-effects models in combination with slope units valuable to handle the impact of flawed landslide information.

Conclusion

Landslide inventory data available for large areas is usually affected by positional inaccuracies and spatial incompleteness. For national-scale analyses, the common unavailability of accurate and representative information on past slope instabilities impedes the straightforward creation of statistically based landslide susceptibility models. This research highlighted that an adaptation of the research design can minimize the propagation of landslide data flaws into susceptibility models for very large areas. The underlying comparative analyses were based on four models which are related to different classifiers (conventional logistic regression vs. mixed-effects logistic regression) and different terrain representations (grid-based vs. slope unit–based).While conventional logistic regression did not specifically account for the underlying data bias, a mixed-effects modelling approach was applied to counterbalance effects associated with a systematic spatial landslide data incompleteness. Using slope units, instead of the more common pixel-based terrain representation, allowed to reduce the effects of positionally inaccurate landslide locations. A holistic evaluation of modelling results (i.e. quantitative and qualitative assessments) provided evidence that mixed-effects modelling in combination with a slope unit terrain representation was beneficial under the prevalent flawed landslide data conditions compared to the standard procedures (e.g. logistic regression and a grid-based terrain representation).

For large area landslide susceptibility assessment, we recommend to (i) gain insights into potential landslide data flaws in order to (ii) allow a corresponding adaptation of the modelling design. In case the landslide data is heterogeneously complete across an area, we advise to avoid explanatory variables that describe and therefore reproduce the underlying landslide data incompleteness. Instead, mixed-effects modelling can prove useful to explicitly reduce associated biases. Avoidance of a detailed representation of the terrain (e.g. via high-resolution grid-based models) is beneficial to tackle the challenge of positionally inaccurate landslide information. In this context, we advocate considering an alternative representation of the terrain, such as slope units. We finally emphasize that such large area analyses have to be interpreted with care, even if the flaws inherent in the data were accounted for in the research design. The present results provide a generalized overview of landslide-prone areas in Austria, but they are not applicable for local decision-making.