Introduction

Landslide assessment of transmission tower foundations is critical to ensuring the construction and stable operation of power systems (Zhou et al. 2022). However, it is a complicated task that involves geotechnics, geomorphology, hydrology, and statistics (Liu et al. 2020). Although establishing a physical model to evaluate the stability of a foundation slope is reliable (Jiang et al. 2018; Wang et al. 2019a), this approach is only suitable for a small research area or a single slope (Park et al. 2019). Because it is time-consuming and expensive (Lee et al. 2013), applying physical models to evaluate slope stability over wide areas is a challenge. The landslide susceptibility map (LSM), which is the result of landslide susceptibility prediction (LSP), can solve these problems.

An LSM expresses the susceptibility indices of landslides in a region. The model that generates the LSM takes the factors that lead to landslides as the input and produces the spatial distribution of the landslide susceptibility indices as the output. Recently, many models for generating an LSM have been proposed, including data statistical models (e.g., the entropy data mining model (Chen et al. 2017), the Dempster-Shafer model (Chen et al. 2016) and the frequency ratio model (Kayastha 2015)) and machine learning models (e.g., the neuro-fuzzy model (Oh and Pradhan 2011), artificial neural networks (ANNs) (Tian et al. 2018b), random forest (Sun et al. 2020) and decision tree (Guo et al. 2021)). These methods can help engineers to obtain an LSM. However, they cannot extract critical information from the landslide influencing factors; in particular, the machine learning methods are prone to overfitting, which makes it difficult to improve the prediction accuracy (Wang et al. 2019b). Each of these models has its advantages and disadvantages, which are shown in Table 1.

Table 1 Some advantages and disadvantages of models previously used for LSM

To address the above problems, deep learning models have received more attention. Unlike the previously mentioned approaches, deep learning is a non-linear tool that extracts the key features from the input data (Lecun et al. 2015). In many fields, the accuracy and objectivity of deep learning models can match or exceed the level of human experts (Krizhevsky et al. 2017). Among deep learning models, convolutional neural networks (CNNs) have a remarkable ability to solve pattern recognition problems (Anwer et al. 2018). CNNs have also been applied to the prediction of LSMs (Azarafza et al. 2021; Nikoobakht et al. 2022). However, it is time-consuming and impractical to enumerate the combinations of model parameters in a specific range when generating LSMs (Hakim et al. 2022). Hence, many existing studies focus on comparing the impact of different models on the accuracy (Sun et al. 2020).

A suitable CNN model plays a decisive role in the prediction result (Lee et al. 2018; Wang et al. 2019c). Several optimization algorithms have been used to find suitable CNN hyperparameters. For example, Ezzat et al. (2021) used gravitational search optimization to tune a CNN for diagnosing COVID-19 from chest X-ray images. Kim et al. (2020) applied the harmony search (HS) algorithm to improve the performance of a 1D CNN in human respiration pattern recognition. Hakim et al. (2022) applied a CNN model obtained by the grey wolf optimizer to LSM prediction. However, a few studies have shown the disadvantages of these methods. For example, the HS algorithm is not efficient enough in performing local search in numerical optimization and is sensitive to the parameter settings (Yi et al. 2015). The grey wolf optimizer is prone to falling into local optima and converges slowly (Xie et al. 2020). To overcome these drawbacks, the Bayesian optimization algorithm is adopted in this study to rapidly search for suitable CNN hyperparameters and training options for the prediction of LSMs. Furthermore, this study explores the performance of CNNs with different frameworks in LSP and applies the LSP technique to the stability evaluation of tower foundation slopes.

Therefore, this paper takes Luoding and Xinyi counties (Guangdong Province, China), which have experienced various landslide hazards in recent decades, as the study area. It combines the LSP technique with the big data of transmission tower locations and, for the first time, investigates the regularity of CNN performance on LSP by permuting and combining the key layers of the CNN based on Bayesian optimization. The LSM predicted by the optimized CNN is then used to evaluate the stability of transmission tower foundation slopes on a large scale and to identify the suitable and unsuitable geological, geographical and environmental situations for tower construction in the study area. Specifically, the goals of this paper are: (a) determining the best CNN hyperparameters and training options using the Bayesian optimization algorithm, (b) applying the optimized CNN to generate LSMs, (c) analyzing model performance, and (d) evaluating the stability of transmission tower foundations in the study area.

Materials

Study area

The study area comprises Luoding and Xinyi counties, a mountainous region of Guangdong Province, China, extending between 22° 11′ N and 22° 57′ N latitude and 110° 40′ E and 111° 52′ E longitude and covering an area of approximately 5408 km². The altitude of the study area is high in the middle, and mountainous terrain accounts for more than 50% of the area. According to data from the National Meteorological Center of China (http://data.cma.cn/), the average monthly rainfall in the study area from 1960 to 2019 is 136.43 mm. The rainy season usually lasts from April to September and accounts for more than 75% of the annual rainfall.

The geological conditions of the study area can be viewed from three aspects: stratum rocks, igneous rocks and metamorphic plutonic intrusion rocks. The stratum is mainly composed of Jixian-Qingbaikou System (silica biotite schist, plagioclase, etc.), Quaternary (gravel, clay, silt, etc.) and Carboniferous (sandstone, shale, dolomite, limestone, etc.). Igneous rocks are mainly composed of plutonic acid igneous rocks (mixed granites). The metamorphic plutonic intrusion rocks are mainly granite.

Landslide inventory map

A landslide inventory is a basic but essential tool for landslide hazard management, representing a fundamental base of knowledge on the spatial distribution of existing landslides (Tian et al. 2018a). The locations, scales, and directions of the historical landslides in this paper are mainly obtained from the report of the Guangdong Provincial Geological Environment Monitoring Station of the China Geological Environment Monitoring Institute (Li et al. 2012). The locations of the landslides are collected and stored in a geographic information system (GIS) database. In total, 424 landslide locations are collected from the study area. Landslides in the study area can be divided into small-, medium-, and large-scale according to their volume. Most of the landslides are medium-scale (less than 1,000,000 m³) or small-scale (less than 100,000 m³), which together account for more than 90% of the landslides. Furthermore, to allow the CNN models to learn the features of non-landslide locations, 424 locations that are less prone to landslides are also randomly selected in the study area. Additionally, the locations of 1658 transmission towers in the study area are obtained from OpenStreetMap (https://www.openstreetmap.org/). The landslide inventory map of the study area is shown in Fig. 1.

Fig. 1 Landslide inventory map of Luoding and Xinyi counties

Preparation of influencing factors

The selection of landslide influencing factors is particularly important for landslide susceptibility mapping. According to Reichenbach et al. (2018), a total of 596 factors were used to assess landslide susceptibility from 1983 to 2016, and the average number of factors used in each model is 9. The selected landslide influencing factors should be measurable, operable, uneven, complete, and non-redundant (Ayalew and Yamagishi 2005). In addition, some studies (Aditian et al. 2018; Chang et al. 2020; Huang et al. 2020b) have shown that between 4 and 12 factors are suitable for LSP. In this study, 14 factors (Table 2) are selected as the input of LSM prediction in the study area: altitude, aspect, distance to faults (m), distance to rivers (m), distance to roads (m), land use, lithology, normalized difference vegetation index (NDVI), plan curvature, profile curvature, rainfall (mm/month), surface roughness (standard deviation of the slope, SDS), slope (°), and topographic wetness index (TWI). Land use, lithology, and aspect are discrete variables, and the rest are continuous variables. To standardize the input form, these landslide influencing factors need to be processed (Guo et al. 2021). Categorical variables are generally classified according to the heuristic classification of the related thematic information (Calvello and Ciurleo 2016). Numerical variables can be classified by the natural breaks method, which has been suggested for data with jumps (Calvello and Ciurleo 2016; Chen et al. 2017). Finally, the landslide influencing factor datasets are resampled to rasters with a resolution of 30 m (Fig. 2).

Table 2 The information about landslide influencing factors used in this study
Fig. 2 Thematic maps of the study area. a Altitude, b aspect, c land use, d lithology, e distance to faults, f distance to rivers, g TWI, h distance to roads, i plan curvature, j profile curvature, k rainfall, l NDVI, m SDS, n slope

Evaluation of influencing factors

The selection of features is important for the prediction of LSMs (Guo et al. 2021). Redundant features interfere with the recognition ability of a model, reduce its generalization ability, and increase the operation time (Zhou et al. 2021). To verify the validity of the selected landslide influencing factors and to eliminate irrelevant factors so as to improve the predictive ability of the model, the gain ratio (GR) technique (Dash and Liu 1997) is adopted in this paper. When the GR of a factor is less than or equal to zero, the factor is considered irrelevant to landslides and should not be used as an input of the model. The calculation of the GR is given below.

Let \({p}_{k}\) be the probability of event \(k\) occurring (either landslide or non-landslide) and \(n\) be the number of possible events. The information entropy before splitting the factor into several classes is:

$$Ent\left(D\right)=-\sum\nolimits_{k=1}^{n}{p}_{k}\log_{2}{p}_{k}$$
(1)

where \(D\) is the total sample set of a factor (e.g., altitude, slope), \({D}_{i}\) (\(i=1, 2, \dots , q\)) is the subset of \(D\) that falls into class \(i\) after splitting the factor, and \(q\) is the number of classes (e.g., \(q=6\) for altitude, \(q=9\) for slope). Then, the information gain of each factor is:

$$Gain\left(D\right)=Ent\left(D\right)-\frac{1}{b}\sum\nolimits_{i=1}^{q}{a}_{i}Ent\left({D}_{i}\right)$$
(2)

where \({a}_{i}\) is the number of samples (either landslide or non-landslide) in class \(i\), and \(b\) is the total number of samples. The split information of a landslide influencing factor is computed as follows:

$$SplitInfo\left(D\right)=-\sum\nolimits_{i=1}^{q}\frac{{a}_{i}}{b}\log_{2}\frac{{a}_{i}}{b}$$
(3)

Finally, the GR of the factor is calculated by the following formula:

$$GainRatio\left(D\right)=\frac{Gain\left(D\right)}{SplitInfo\left(D\right)}$$
(4)
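Equations 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study; the function names `entropy` and `gain_ratio` and the list-based input format are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Eq. 1: Ent(D) = -sum_k p_k * log2(p_k) over the label values."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(classes, labels):
    """Gain ratio of one factor (Eqs. 2-4).

    classes[i] is the class of sample i for this factor (e.g. a slope bin);
    labels[i] is 1 for a landslide sample and 0 for a non-landslide sample.
    """
    b = len(labels)  # total number of samples
    subsets = {}     # class -> labels of the samples in that class (D_i)
    for cls, lab in zip(classes, labels):
        subsets.setdefault(cls, []).append(lab)

    # Eq. 2: entropy before the split minus the weighted entropy after it
    gain = entropy(labels) - sum(len(s) * entropy(s) for s in subsets.values()) / b
    # Eq. 3: entropy of the class sizes a_i / b
    split_info = -sum((len(s) / b) * math.log2(len(s) / b) for s in subsets.values())
    # Eq. 4; a factor with a single class has no split information (assumed GR = 0)
    return gain / split_info if split_info > 0 else 0.0
```

A factor whose classes separate landslide and non-landslide samples perfectly yields a GR of 1, whereas a factor whose classes each contain the same mix of both labels yields a GR of 0.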

Methodology

CNN

Convolutional neural networks (CNNs) are non-linear tools that can automatically extract key features from large amounts of data (Lin et al. 2022). They are often used as a prediction model in LSM tasks (Hakim et al. 2022; Thi-Ngo et al. 2021; Wang et al. 2019b). In LSP, the input of a CNN is the landslide influencing factors, and the output is the landslide (positive class) and non-landslide (negative class) labels.

Optimization of CNN hyperparameters and training options

The hyperparameters of a CNN and its training options (the target variables) can significantly affect the prediction results, so selecting appropriate values is a challenge. In this paper, Bayesian optimization is chosen to obtain the CNN hyperparameters and training options. Bayesian optimization does not require the gradient of the objective function, so it performs well within a short time even when the objective function is not differentiable. The optimization process mainly includes the following steps.

The relationship between the target variables and the error rate on the validation data set is assumed to follow a Gaussian process (Garrido-Merchán and Hernández-Lobato 2020). (a) The original distribution is then modified by the responses of the input (i.e., the target variable \({X}_{i}=\left\{{x}_{i1}, {x}_{i2}, \dots \right\}\)) and output (i.e., the results \({Y}_{i}=\left\{{y}_{i1}, {y}_{i2}, \dots \right\}\)). The mathematical expression can be represented by formula 6:

$$p\left(\theta |X,y\right)=\frac{p\left(y|X,\theta \right)p(\theta )}{p\left(y|X\right)}$$
(6)

where \(p\left(\theta |X,y\right)\) is the corrected distribution (posterior distribution), \(p(\theta )\) is the original distribution (prior distribution), \(p\left(y|X,\theta \right)\) is the likelihood of the sample, and \(p\left(y|X\right)\) is the probability distribution of the sample results.

(b) Points other than the previous sample points (i.e., \({X}_{i}\)) are selected as the input for the next iteration, either to refine the distribution near known samples or to explore regions where the distribution is unknown. If the stopping conditions (the error rate on the validation set and the maximum number of optimization iterations) are not satisfied, the program returns to step (b); otherwise, it outputs the target variable \({X}_{i}\), which is the optimized solution.
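The loop in steps (a) and (b) can be sketched as follows. This is a minimal, dependency-free illustration and not the implementation used in this study. The squared-exponential kernel and its length scale, the expected-improvement rule for choosing the next point, the discrete candidate grid, the fixed iteration budget, and the objective `toy_validation_error` (a stand-in for training a CNN and measuring its validation error) are all assumptions made for the example.

```python
import math


def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def expected_improvement(mean, std, best):
    """Expected improvement over `best` when minimizing the objective."""
    if std == 0.0:
        return 0.0
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, candidates, n_init=3, n_iter=10):
    """Minimize `objective` over a list of scalar candidates."""
    step = (len(candidates) - 1) // (n_init - 1)
    xs = [candidates[i * step] for i in range(n_init)]  # evenly spread start
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        offset = sum(ys) / len(ys)            # centre data for the zero-mean GP
        centred = [y - offset for y in ys]

        def score(c):
            mean, std = gp_posterior(xs, centred, c)
            return expected_improvement(mean, std, min(centred))

        x_next = max(remaining, key=score)    # point not sampled before (step b)
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


def toy_validation_error(log10_lr):
    """Hypothetical stand-in for 'train the CNN, return the validation error'."""
    return 0.15 + 0.05 * (log10_lr + 3.4) ** 2
```

For instance, `bayes_opt(toy_validation_error, [-5 + 0.1 * i for i in range(41)])` searches a grid of log10 learning rates between -5 and -1 using 13 evaluations of the objective instead of all 41.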

CNN architecture optimization

Furthermore, to explore how increasing the number of convolutional layers and pooling layers in a CNN affects the prediction performance, this paper uses a total of 5 convolution blocks (each including a convolutional layer and a ReLU layer) and pooling blocks (each a max pooling layer) to perform permutations (Fig. 3).

Fig. 3 The CNN model before optimization, \(k=[1, 5]\)

Table 3 summarizes these hyperparameters and their search spaces. The convolution blocks and the pooling blocks are arranged and combined, and a total of 59 cases are obtained. To compare learning rate strategies, the cases are first optimized with the piecewise decay (PD) learning rate strategy. Then, the five best CNN architectures selected from the optimization results are trained with the constant learning rate strategy, with all other conditions (for example, the hyperparameters and their search space) kept the same. Finally, the model with the maximum area under the curve (AUC) value is selected from the optimization results as the prediction model of the LSM in this paper.

Table 3 Hyperparameters and their search space of CNN

Accuracy measures

The receiver operating characteristic (ROC) curve is a standard technique for evaluating the performance of landslide prediction methods (Bradley 1997). It is produced by plotting the true positive (TP) rate against the false positive (FP) rate at various threshold values. Moreover, the AUC measure has been widely used to quantitatively evaluate the performance of LSM approaches (Reichenbach et al. 2018). Specifically, AUC values of 0.5–0.6, 0.6–0.7, 0.7–0.8, 0.8–0.9, and 0.9–1 represent poor, average, good, very good, and excellent performance, respectively (Thi-Ngo et al. 2021).
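The ROC construction and the AUC rating bands described above can be sketched as follows. This is an illustrative sketch, not the evaluation code of this study; the function names are hypothetical, and assigning a boundary value such as 0.8 to the higher band is an assumption, because the quoted ranges do not say which band includes the boundary.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by trapezoidal integration over all thresholds.

    labels[i] is 1 (landslide) or 0 (non-landslide); scores[i] is the
    predicted susceptibility index of sample i.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing a score cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
    return auc


def rate_auc(auc):
    """Map an AUC value to the qualitative bands quoted in the text."""
    bands = [(0.9, "excellent"), (0.8, "very good"), (0.7, "good"),
             (0.6, "average"), (0.5, "poor")]
    for lower, name in bands:
        if auc >= lower:  # assumption: the lower bound of each band is inclusive
            return name
    return "below the rated range"
```

Under this banding, an AUC of 0.863 is rated "very good" and an AUC of 0.798 is rated "good".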

In addition, the landslide frequency ratio (FR) can be used to evaluate the performance of the model even if the landslide susceptibility zones in an LSM vary (Guo et al. 2021). The mathematical expression of the FR is as follows:

$$FR=\frac{{LA}_{i}/TLA}{{A}_{i}/TA}$$
(7)

where \({A}_{i}\) and \({LA}_{i}\) are the area and landslide area in each susceptibility zone, while \(TA\) and \(TLA\) are the total area and total landslide area in the study area. The FR index also considers the relationship between the landslide area and the susceptibility zone in different grades, which indicates the reasonableness of a model to predict the susceptibility zone.
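As a sketch of Eq. 7, the FR of each susceptibility zone can be computed from two per-zone area tables. The function name, the dictionary-based input, and the numbers in the usage note are hypothetical and chosen only to illustrate the arithmetic; they are not results of this study.

```python
def frequency_ratio(zone_area, zone_landslide_area):
    """Eq. 7: FR_i = (LA_i / TLA) / (A_i / TA) for every susceptibility zone.

    zone_area[z] is the area A_i of zone z; zone_landslide_area[z] is the
    landslide area LA_i inside it. Both must use the same area unit.
    """
    total_area = sum(zone_area.values())                      # TA
    total_landslide_area = sum(zone_landslide_area.values())  # TLA
    return {
        zone: (zone_landslide_area[zone] / total_landslide_area)
        / (zone_area[zone] / total_area)
        for zone in zone_area
    }
```

With made-up areas of 40, 30, 15, 10 and 5 (very low to very high) and landslide areas of 2, 8, 15, 25 and 50, the very high zone holds 50% of the landslide area on 5% of the total area, so its FR is 10, while the very low zone has an FR of 0.05. An FR above 1 means that landslides are over-represented in that zone.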

The procedures of generating LSMs

The procedure for generating LSMs mainly contains the following steps:

(a) The landslide inventory of the study area is produced, including the query of historical landslide data and the use of satellite remote sensing images for location proofreading.

(b) The landslide influencing factors of 424 landslide samples and 424 non-landslide samples are extracted by using ArcGIS (Environmental Systems Research Institute, Inc., Redlands, CA, USA). Then, 80% and 20% of the samples are randomly assigned as the training set and testing set, respectively (Asadi et al. 2022; Nhu et al. 2020; Xing et al. 2021).

(c) The Bayesian algorithm is used to optimize the CNN hyperparameters and training options. The best-performing parameters are selected from the 59 cases obtained by combining different numbers of convolutional layers and pooling layers. Then, the robustness of the best-performing CNN model is verified by fivefold cross-validation.

(d) The landslide influencing factors of the study area are extracted by using ArcGIS and then input into the best-performing CNN to obtain the landslide susceptibility indices (probability). This step is performed in MATLAB (MathWorks Inc., Natick, MA, USA).

(e) The LSM is generated according to the landslide susceptibility indices, and the natural breaks method (Chen et al. 2017) is used to classify the indices as very low, low, moderate, high, and very high.

(f) Finally, the performances of the CNN models are evaluated by the accuracy measures.

The overall flowchart is shown in Fig. 4.
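The sample bookkeeping in steps (b) and (c), namely the 80%/20% split and the fivefold cross-validation, can be sketched as follows. The study itself uses ArcGIS and MATLAB, so this Python sketch only illustrates the index handling. Splitting each class separately (stratification), the fixed random seed, and the function names are assumptions; the text only states that the samples are randomly assigned.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Randomly split sample indices into training and testing sets per class."""
    rng = random.Random(seed)
    train, test = [], []
    for value in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == value]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, nearly equal folds
    for i in range(k):
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, folds[i]
```

With 424 landslide and 424 non-landslide samples, `stratified_split` returns 678 training indices and 170 testing indices, with 85 testing samples from each class.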

Fig. 4
figure 4

Illustration of the methodology in this study

Results and discussion

Relationships between landslides and influencing factors

The degree of influence of each factor can be reflected by two evaluators, the FR and the GR. The FR reflects the landslide frequency in each class of a factor (Table 4). For instance, the FR of slope angles from 28° to 65° is 4.19, indicating that landslides in the study area occur disproportionately on steep slopes. The FRs of SDS lead to a similar conclusion. In addition, FRs roughly increase with decreasing distance to faults, rivers and roads, and most landslides occur within distances of less than 1017 m, 1131 m, and 640 m, respectively. For lithology, the H class (Xiaoyunwushan Formation, Triassic; the main components are conglomerate, glutenite intercalated with black siltstone, and lime shale) has the highest FR (3.42), indicating that this lithology has a large impact on landslides. It is followed by class C (Liujiang Formation and Kuofeng Formation, Devonian; the main components are thin siliceous rock, siliceous mudstone, dolomite) and class D (Pingzheng granulite formation, Fengtongkou formation, Lankeng formation and Shapingwan formation, Jixian System-Qingbaikou System; the main components are silicic biotite schist, plagioclase amphibole), whose FRs are 1.29 and 1.25, respectively.

Table 4 FRs of landslide influencing factors
Table 5 The best-selected parameters
Table 6 The results of 5-fold cross-validation
Table 7 Hyperparameters search space and the best results of BPNN and SVM
Table 8 FR values in each susceptibility level
Table 9 Suggestions for the transmission tower construction

The GR reflects the relationship between the influencing factors and landslides. Figure 5 shows that SDS and slope contribute the most to landslides, with GRs of 0.113 and 0.103, respectively. The next group of factors (distance to faults, TWI, land use, lithology, NDVI, distance to roads, altitude, and rainfall) has GR values between 0.011 and 0.065. The GR values of the remaining factors (distance to rivers, plan curvature, profile curvature, and aspect) are less than 0.01 but positive, which means they do not decrease the model performance. An additional calculation supports this point: the LSMs generated without aspect, profile curvature, or plan curvature had AUC values of 0.8446, 0.8602 and 0.8511, respectively, all lower than that of the map considering these factors (0.8612). In general, the topographic factors (e.g., SDS and slope) are more important for landslides in this study area than the geological (e.g., lithology, distance to faults), hydrological (e.g., TWI) and environmental ones (e.g., NDVI, distance to rivers, rainfall, and land use). Additionally, the GRs of all landslide influencing factors are greater than zero, so all of them are considered relevant to landslides.

Fig. 5 Gain ratio of each landslide influencing factor

Optimization results and accuracy analysis of CNN architectures

The AUC results for the 59 cases obtained by combining different numbers of convolution blocks and pooling blocks are shown in Table 10. The AUC ranges between 0.8096 and 0.8631. Figure 6a shows that, for tasks with few input features (such as LSP), the AUC first increases and then decreases as the number of convolution blocks increases when the CNN architecture has no pooling layer. The AUC reaches the maximum value of 0.8631 when the number of convolution blocks is 3 (index 6).

Fig. 6 AUC results from Bayesian optimization and the gravitational search optimization algorithm

In addition, the model performance decreases when pooling layers are used. Figure 6b shows that the AUC decreases as the number of pooling layers increases. It is worth noting that the best AUC results appear in the CNN models without pooling blocks. The function of the pooling blocks is to extract key features from a large amount of information. However, for LSP, where the input dimension is small, adding pooling layers may discard key features, which has the opposite of the intended effect.

Figure 6d compares the different learning rate strategies. The AUC drops when the learning rate strategy is switched from PD to constant. This indicates that, in the later stage of training, a small step size helps the search to reach a smaller value of the loss function. After comparing the 59 cases, the optimal CNN model has 3 convolution blocks, no max pooling layer, and a piecewise decay learning rate (Table 4). The optimal CNN architecture is shown in Fig. 7, and the hyperparameters and training options are shown in Table 5.
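The difference between the two learning rate strategies can be sketched as follows. This is a generic illustration of a piecewise (step) decay schedule; the drop factor, the drop period, and the function names are assumptions, because the values used in this study are given in its tables rather than in the text.

```python
def piecewise_decay_lr(initial_lr, epoch, drop_factor=0.1, drop_period=10):
    """Learning rate at a 0-based epoch under a piecewise decay schedule.

    The rate is multiplied by `drop_factor` once every `drop_period` epochs,
    so the step size shrinks in the later stage of training.
    The default values are illustrative assumptions.
    """
    return initial_lr * drop_factor ** (epoch // drop_period)


def constant_lr(initial_lr, epoch):
    """Learning rate under the constant strategy: unchanged at every epoch."""
    return initial_lr
```

With these assumed defaults and an initial rate of 0.01, the piecewise schedule keeps 0.01 for epochs 0 to 9 and then uses a rate ten times smaller for epochs 10 to 19, whereas the constant schedule stays at 0.01 throughout.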

To elucidate the differences between optimization methods, the gravitational search algorithm is also applied to LSP. The results of the 59 cases are shown in Table 11. The AUC ranges between 0.500 and 0.861, which is wider than the range obtained by the Bayesian optimization algorithm. The trends of the AUCs obtained by the two algorithms are basically similar. For example, the AUCs first increase and then decrease as the number of convolution blocks increases (Fig. 6a), and the maximum (0.8612) appears when there are 3 convolution blocks; the AUCs roughly decrease as the number of pooling blocks increases (Fig. 6c); and they decrease when the learning rate strategy is switched from PD to constant (Fig. 6e). It is worth noting that, as shown in Table 11, the AUC obtained by the gravitational search algorithm changes drastically when the number of convolution blocks or pooling blocks increases. In addition, the CNN model with 5 convolution blocks, no pooling blocks, and a PD learning rate is trained 10 times. The results of the two optimization algorithms are shown in Fig. 8, which demonstrates that, compared with Bayesian optimization, the gravitational search algorithm produces unstable results when the number of target variables increases.

Fig. 7 Selected CNN model by Bayesian optimization

Comparisons show that the Bayesian optimization algorithm is more stable than the gravitational search algorithm, and the corresponding AUC is higher. To demonstrate the robustness of the proposed CNN model, fivefold cross-validation is further carried out on the model in Fig. 7. Table 6 shows that the average accuracy is 0.753 and the average AUC is 0.827 on the testing dataset. The AUC (Fig. 10) summarizes the separation of the positive (landslide) and negative (non-landslide) classes over all probability thresholds, and the fivefold average lies between 0.8 and 0.9, which indicates that the model is very good. In addition, the classification accuracy rate (CAR) and AUC vary within a small range and their distributions are relatively concentrated, which indirectly shows that the model is stable and not easily affected by how the dataset is partitioned. The CNN model established by Bayesian optimization therefore has excellent stability and reliability and can be used for the analysis of landslide susceptibility in the research area.

Fig. 8 Performance of Bayesian optimization and the gravitational search optimization algorithm. Mean of B = mean AUC of Bayesian optimization, Mean of G = mean AUC of gravitational search optimization algorithm

The application of CNN-based landslide susceptibility mapping methods

The LSMs of the study area are generated by using 14 landslide influencing factors, and the results for the two algorithms are shown in Fig. 11a and b. The landslide susceptibility index of each pixel is classified with the natural breaks method in ArcGIS into five susceptibility classes: very low, low, moderate, high, and very high. The high and very high susceptibility areas are basically the same in the LSMs obtained by the two algorithms. The obvious difference is that the low susceptibility area of the CNN optimized by the gravitational search algorithm (CNNG) is larger than that of the CNN optimized by Bayesian optimization (CNNB), while the very low susceptibility area of CNNG is smaller. The spatial distribution of the landslide susceptibility classes shows that slope and distance to faults are critical influencing factors. On the one hand, the high and very high susceptibility areas are mainly distributed in the central and western parts of the study area, which have steep slopes and are close to faults. On the other hand, the low susceptibility areas are mainly located in the central, northeastern, and southwestern parts of the study area, which have small slope angles and are far from faults. It is worth noting that the importance of lithology for landslides is illustrated in Guo et al. (2021), which is consistent with the current results. As mentioned in Al-Najjar and Pradhan (2021), slope is an important factor in landslides, which is also consistent with the current results.

To better explain the rationality and capability of the optimized CNN model, two other machine learning models (back propagation neural network (BPNN) and support vector machine (SVM)) and a statistical model (the entropy data mining model, hereafter the Entropy model) are added for comparison. The neuron number of the BPNN is determined by the formula \(n=\sqrt{i+j}+k\) (where \(i\) and \(j\) are the numbers of inputs and outputs of the model, and \(k\) is an integer from 1 to 10) (Geng et al. 2018). The maximum AUC is found when \(n=12\). The SVM model uses the Gaussian kernel function (Aktas and San 2019), and the search spaces of the box constraint and kernel scale are from \({10}^{-3}\) to \({10}^{3}\). The maximum AUC is obtained by the Bayesian algorithm when box constraint = 972.83 and kernel scale = 98.822, as shown in Table 7. The LSMs for the different models are shown in Fig. 11.

The LSMs obtained by the five models (CNNB, CNNG, BPNN, SVM, and Entropy model) are broadly similar but differ in detail. For example, the high and very high landslide susceptibility areas of all LSMs are distributed in the middle of the study area, especially along the canyon in the middle area. Meanwhile, the low and very low susceptibility areas are in the northeast and southwest of the study area. The LSM of the Entropy model differs from those of the other models: it predicts the higher-elevation central and southern regions as high susceptibility areas, and the west as low susceptibility areas.

The distribution of susceptibility zones and the corresponding percentage of landslides in each level are computed and shown in Fig. 9. The CNN models exhibit superior capabilities. In general, an ideal LSM for transmission tower foundations should contain fewer high and very high susceptibility areas. From this viewpoint, the machine learning methods perform better than the statistical model. In addition, in the LSM generated by the Entropy model, fewer landslides occur in the very high susceptibility areas. By contrast, more than 80% of historical landslides occur in the high and very high landslide susceptibility zones generated by the CNN-based model, indicating that the CNN outperforms the other models. This conclusion is also confirmed by the FR results (Table 8). Between the CNN models optimized by the two methods, the FRs of CNNB in the high and very high susceptibility areas are higher than those of CNNG, which indicates that CNNB is more effective than CNNG.

Fig. 9 The classification results considering different methods. CNNB = obtained by Bayesian optimization, CNNG = obtained by the gravitational search optimization

In addition, Table 8 shows that the FRs of the five models in the very low and low susceptibility areas are all less than 1. The FRs increase in the moderate susceptibility areas and are greater than 1 in the high and very high susceptibility areas. In the very high susceptibility area, the FRs of all five models reach their maximum values: 5.730 (CNNB), 5.225 (CNNG), 3.961 (Entropy model), 5.723 (SVM), and 5.134 (BPNN). This reveals that the landslide distribution becomes gradually denser with increasing landslide susceptibility level.

The CNNB model exhibits superior capabilities. It has the largest FR (5.730) in the very high susceptibility areas and the smallest FR (0.057) in the very low susceptibility areas. Meanwhile, in the ROC curves (Fig. 10), CNNB achieves the highest AUC (0.863), while the AUCs of the other models are 0.861 (CNNG), 0.798 (Entropy model), 0.844 (SVM) and 0.861 (BPNN). In general, among the abovementioned models, the machine learning models are better than the statistical model. At the same time, the CNNB model is the best on the various evaluation indicators, so it can be a powerful tool for mapping landslide susceptibility at a regional scale.

Fig. 10 ROC curves of different models

Fig. 11 a–e Distribution of transmission towers in the LSM predicted by each model and f the percentage of transmission towers in each susceptibility zone

As shown in Fig. 11, most of the transmission towers (about 72%) are distributed in the very low and low landslide susceptibility areas, while about 14% of the towers are located in the high and very high susceptibility areas. Most of the unstable tower foundations are distributed in the canyon in the south of the study area, mainly because of the heavy rainfall (> 1500 mm/year), steep slopes (> 30°) and proximity to roads. The suggestions for transmission tower construction are listed in Table 9. Avoiding these unsuitable combinations of factors that are responsible for landslides helps to ensure the safety of the power supply system.

Findings and limitations of this study

Several studies have generated LSMs by using CNN models (Hakim et al. 2022; Sameen et al. 2020; Wang et al. 2019b; Yi et al. 2020). Although these studies report remarkable conclusions and practical application examples, they do not investigate how the framework of a CNN regularly affects model performance, which would be beneficial for understanding CNN architectures. The method based on the LSMs obtained by CNNs with Bayesian optimization provides a new idea for the evaluation of transmission tower foundation stability.

One of the main limitations of this study is that the sample points are located only in the test site, which is common in most existing studies (Chang et al. 2022; Huang et al. 2022, 2020a). Collecting landslide inventory data is a challenge, whether through an on-site field survey or a search of remote sensing images and historical data, because a lot of professional knowledge is needed (Zhu et al. 2020). Some studies (Aditian et al. 2018; Huang et al. 2019) converted the landslides into landslide cells at pixel scale (for example, 30 m) to increase the number of samples. However, this operation requires landslide influencing factors with higher spatial resolution to provide richer spatial information around the landslide (Yi et al. 2020), and determining the optimal spatial resolution of factors such as the DEM and its derivatives requires many experiments (Merghadi et al. 2020). Generative adversarial networks (GANs) have also been used to increase the number of landslide samples (Al-Najjar and Pradhan 2021); however, this method does not always improve the model performance because of over-fitting. Overall, the impact of these sample generation methods on model performance has not been explored in the same region, so their applicability to LSM deserves further investigation. The Bayesian optimization-based CNNs used for LSMs can also be applied to analyze the susceptibility of other geohazards, for example, floods (Panahi et al. 2021; Tien Bui et al. 2020), land subsidence (Hakim et al. 2020) and multi-hazards (Ullah et al. 2022). As the landslide prediction model in this study is limited to a single deep learning framework, future research on hybrid deep learning methods (e.g., hybrid deep learning frameworks, hybrid deep learning-machine learning frameworks) may improve the reliability and accuracy of LSMs.

Concluding remarks

Two well-known CNN hyperparameter optimization algorithms, namely Bayesian optimization and the gravitational search algorithm, are applied to generate the LSM of the transmission tower foundation slopes in Xinyi and Luoding counties. Overall, three regularities are found:

(a) The CNNs produce the best result with 3 convolution layers.

(b) The best performance is achieved without using pooling layers.

(c) A piecewise decay learning rate strategy yields better performance.

Additionally, the Bayesian optimization algorithm is more efficient and stable than the gravitational search algorithm. The LSM results indicate that 72%, 14%, and 14% of the transmission towers are located in the low and very low, moderate, and high and very high landslide susceptibility zones, respectively. The towers in the high and very high landslide susceptibility zones need more attention.