Abstract

Background

Landslide-affecting factors are uncorrelated or non-linearly correlated, which limits the predictive performance of traditional machine learning methods for landslide susceptibility assessment. Deep learning methods can take advantage of the high-level representation and reconstruction of information from landslide-affecting factors. In this paper, a novel deep learning-based algorithm that combines classifiers from both deep learning and machine learning is proposed for landslide susceptibility assessment. A stacked autoencoder (StAE) and a sparse autoencoder (SpAE) each consist of an input layer for raw data, a hidden layer for feature extraction, and an output layer for classification and prediction. As a case study, Oda City and Gotsu City in Shimane Prefecture, southwestern Japan, were used for susceptibility assessment and prediction of landslides triggered by extreme rainfall.

Results

The prediction performance was compared by analyzing real landslide and non-landslide data. Among the traditional machine learning methods, the prediction performance of random forest (RF) was better than that of the support vector machine (SVM), so RF was combined with both StAE and SpAE. The results show that the prediction ratio of the combined classifiers was 93.2% for the StAE combined with RF model and 92.5% for the SpAE combined with RF model, which were higher than those of the SVM (87.4%), RF (89.7%), StAE (84.2%), and SpAE (88.2%) models.

Conclusions

This study provides an example of combined classifiers giving a better predictive ratio than a single classifier. The asymmetric and unsupervised autoencoder combined with RF can exploit optimal non-linear features from landslide-affecting factors successfully, outperforms some conventional machine learning methods, and is promising for landslide susceptibility assessment.

Recently, deep learning algorithms have brought a series of advances to the field of machine learning (Huang et al. 2019), since the capability of a neural network to fit a decision boundary has become significantly more reliable (LeCun et al. 2015) and such networks can successfully learn and extract patterns and unique features from big data (Ayinde et al. 2019). Deep learning can also effectively avoid local optima and eliminates the need to set model parameters because of its autonomous processes (Zhang et al. 2017). At present, the core techniques of deep learning are neural networks that have two or more hidden layers, including the following: the adaptive neuro-fuzzy inference system (Park et al. 2012); recurrent neural networks (Chen et al. 2015); deep belief networks (Huang and Xiang 2018); long short-term memory (Xiao et al. 2018; Yang et al. 2019); and convolutional neural networks (Wang et al. 2019). The deep learning-based autoencoder is a semi-unsupervised learning method that requires no prior knowledge, such as a landslide inventory, which means that landslide and non-landslide labels and assumptions of linear or non-linear correlation are not needed (Huang et al. 2019). For landslide susceptibility assessment, traditional methods for de-correlation are based on the prior assumption that there are linear correlations between landslides and non-landslides. However, landslide-affecting factors are usually non-linearly related in practical applications. The autoencoder, driven by data rather than prior knowledge, can transform raw data into non-linearly correlated features.

In this paper, novel deep learning algorithms, namely a stacked autoencoder and a sparse autoencoder each combined with traditional machine learning, are proposed for landslide susceptibility prediction. StAE and SpAE are unsupervised learning methods, as they do not require external labels on landslide information. The encoding and decoding processes both happen within the dataset. The input and output data have the same number of dimensions, and the hidden layer has fewer dimensions. Autoencoders are learned automatically from the dataset, which makes it easy to train specialized instances of the algorithm that perform well on a specific type of landslide-affecting factor. The autoencoder technique takes advantage of dimension reduction by the stacked autoencoder and dropout by the sparse autoencoder to handle non-linear correlations among the landslide-affecting factors, and it gives better feature descriptions than the original data. It does not require additional methods for preparing appropriate training data. In summary, this study proposes a method that combines the advantages of deep learning with the benefits of machine learning for landslide susceptibility assessment. The landslides in Oda City and Gotsu City in Shimane Prefecture, southwestern Japan, are used as a case study. The stacked autoencoder and sparse autoencoder are each combined with random forest, which was selected because it gave better predictive performance than the support vector machine.

Study area

The study area is located in Oda City and Gotsu City, Shimane Prefecture, southwestern Japan (Fig. 1). The elevation varies from sea level to 1123 m (Table 1). The average annual rainfall recorded from 2008 to 2018 at the Fukumitsu, Oda, and Sakurae rainfall stations was 1657 mm, 1786 mm, and 2011 mm, respectively (Fig. 2). The cumulative rainfall for 2013 recorded at the Fukumitsu, Oda, and Sakurae rainfall stations was 2270 mm, 2102 mm, and 2656 mm, respectively (http://www.jma.go.jp/jma/index.html). In this study, a total of 90 landslides were caused by extreme rainfall from May to October 2013 (Table 2), and 69 of the landslides were triggered by extreme rainfall in August 2013. Based on field investigation, these landslides can be described as shallow landslides.

Fig. 1

Study area and landslide inventory of Oda City and Gotsu City in Shimane Prefecture, southwestern Japan (RGB color of Sentinel-2 satellite)

Spatial data setting

Landslide susceptibility prediction can be evaluated as a binary classification problem between landslides and non-landslides. A spatial database setting including landslide pixel grid, non-landslide pixel grid, and related landslide-affecting factors is needed for statistical analysis (Huang et al. 2019). This spatial database was divided into a training dataset and a validation dataset.

The 90 real landslides and 90 non-landslides artificially generated in ArcGIS software were randomly split into two parts with a ratio of 70% to 30%. Seventy percent of the landslide and non-landslide grid cells were selected for training the model, and the remaining 30% were used for validating the model. Furthermore, the landslide (event) and non-landslide (non-event) grid cells were set to 1 and 0, respectively, and the values of 1 and 0 were used for classification and prediction as the output variables of the landslide susceptibility prediction models. Thereafter, the calculated frequency ratio (FR) values were used as numeric input variables of the landslide susceptibility prediction models.
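The 70%/30% split described above can be sketched as follows. This is a minimal illustration only: the function name `stratified_split`, the random seed, the placeholder feature values, and the choice to split each class separately (so both subsets stay balanced) are assumptions, not details given in the study.

```python
import random

def stratified_split(samples, train_fraction=0.7, seed=42):
    """Split (features, label) samples into training and validation sets.

    Each class is shuffled and cut separately so that the landslide /
    non-landslide proportion is the same in both subsets (an assumption;
    the study only states that the split was random).
    """
    rng = random.Random(seed)
    train, validation = [], []
    for label in (1, 0):  # 1 = landslide (event), 0 = non-landslide (non-event)
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation

# Hypothetical samples: 90 landslide and 90 non-landslide grid cells,
# each described by 14 placeholder frequency ratio values.
samples = [([1.0] * 14, 1) for _ in range(90)] + [([0.5] * 14, 0) for _ in range(90)]
train, validation = stratified_split(samples)
print(len(train), len(validation))  # 126 54
```

With 90 samples per class, this yields 63 training and 27 validation cells for each class.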

The landslide-affecting factors in the study area are complex, and it is difficult to confirm which are the most important and necessary among the topographic, geological, and hydrological factors, distance to stream, and distance to road. In landslide susceptibility modeling, landslides are assumed to reoccur under conditions similar to those of past landslides (Westen et al. 2003; Lee and Talib 2005; Dagdelenler et al. 2016). A total of 14 affecting factors were acquired and chosen as input variables for the landslide susceptibility models (Figs. 3, 4 and 5).

For continuous affecting factors, the Jenks natural break method was used to divide each factor into five classes. The frequency ratio of all subclasses of each landslide-affecting factor was then calculated, as shown in Tables 1, 2 and 3. The frequency ratio indicates that all 14 landslide-affecting factors have significant influences on landslide occurrence. Some studies have suggested that the correlations between affecting factors should be eliminated to reduce model noise in landslide susceptibility assessment (Hong et al. 2017; Lin et al. 2017; Chen et al. 2018a). However, deep learning algorithms generally handle hundreds or thousands of input variables owing to their strong feature extraction ability, so 14 input variables will not result in information redundancy. In addition, some collinearity between landslide-affecting factors can be tolerated by rapidly developing machine learning models (Huang et al. 2019). These 14 landslide-affecting factors, quantified by the frequency ratio, provide valuable information for producing landslide susceptibility maps. Therefore, all 14 landslide-affecting factors are used as input variables in the models to evaluate their performance and feature extraction capabilities for landslide susceptibility assessment.

Table 3 Description and frequency ratio (FR) of geological factors in the study area

Methodology

This study was performed using the following main steps (Fig. 6): (1) correlation analysis between the landslide inventory and landslide-affecting factors using the frequency ratio, (2) landslide susceptibility prediction using the SVM and RF models in machine learning, (3) landslide susceptibility prediction using StAE and SpAE employing a back propagation neural network in deep learning, (4) evaluation of StAE and SpAE combined with the machine learning model that gave the better prediction ratio between SVM and RF, and (5) validation and comparison of predictive performance based on the area under the curves and the landslide susceptibility maps produced by the six models.

The landslide samples were created after collecting and preparing the landslide inventory map, the DEM-derived factors, and the remote sensing and geological factors. The landslide inventory samples were counted and used to randomly generate the same number of non-landslide samples. The final data combined the landslide and non-landslide samples with a defined label (1 and 0, respectively) for each sample. Fourteen landslide-affecting factors were prepared from a spatial database. The values of the landslide-affecting factors at each sample location were extracted, and the derived information was prepared using RStudio. The dependent variable was converted with one-hot encoding. The data were then divided into subsets for training (70%) and validation (30%). The StAE and SpAE models were trained in an unsupervised manner for feature extraction, and a set of new features was generated. These new features were used to train StAE-bpnn and SpAE-bpnn in deep learning, as well as the anomaly detection-based StAE with RF and SpAE with RF models, where RF was selected because it gave a better prediction rate than the SVM model. In this study, the validation of the proposed models was based on the well-known area under the receiver operating characteristic curve. Parameter tuning was also used to improve accuracy. Finally, landslide susceptibility maps were generated using the equal interval function in ArcGIS 10.6 software.

Frequency ratio (FR)

The number of landslide pixel grids in each class is counted, and the frequency ratio for each factor class is obtained by dividing the landslide ratio by the area ratio. The frequency ratio shows the correlation between landslides and affecting factors in a specific area. If this ratio is greater than 1, the relationship between landslides and the affecting factor’s class is strong; if the ratio is less than 1, the relationship is weak. A value of 1 indicates an average correlation (Meten et al. 2015).
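The frequency ratio calculation described above can be sketched as follows. The function name `frequency_ratio` and the slope classes and pixel counts are hypothetical and chosen only to illustrate the arithmetic; they are not values from the study.

```python
def frequency_ratio(landslide_counts, area_counts):
    """Frequency ratio per factor class.

    FR = (landslide pixels in class / all landslide pixels)
         / (pixels in class / all pixels in the study area)
    """
    total_landslides = sum(landslide_counts.values())
    total_area = sum(area_counts.values())
    fr = {}
    for cls, area in area_counts.items():
        landslide_ratio = landslide_counts.get(cls, 0) / total_landslides
        area_ratio = area / total_area
        fr[cls] = landslide_ratio / area_ratio
    return fr

# Hypothetical slope classes with made-up pixel counts.
landslide_counts = {"0-10": 5, "10-20": 15, "20-30": 30}
area_counts = {"0-10": 4000, "10-20": 3000, "20-30": 3000}

fr = frequency_ratio(landslide_counts, area_counts)
for cls, value in fr.items():
    print(cls, round(value, 2))
# 0-10 0.25   (weak relationship, FR < 1)
# 10-20 1.0   (average)
# 20-30 2.0   (strong relationship, FR > 1)
```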

Support vector machine (SVM)

The two main principles of SVM are the optimal classification hyperplane and the use of kernel functions. The purpose of the optimal classification hyperplane is to accurately distinguish the two types of samples, landslides and non-landslides, while maximizing the classification margin. Determining the kernel function and its optimal parameters is critical for evaluating landslide susceptibility using SVM. Polynomial kernels and the radial basis function are the most commonly used kernels in the literature (Huang and Zhao 2018). Two parameters must be optimized in the SVM model: the penalty coefficient C and the kernel function parameter.

Random forest (RF)

The random forest, a classification tree algorithm based on repeated binary partitioning of the data, can significantly reduce the computations required for classification and regression. In RF algorithms, predictive models are established by using many decision trees. These trees and their decisions are generated from randomly selected variables and samples. Once the model is established, each sample is first classified individually by every decision tree in the model, and the results are then aggregated over all trees (Huang and Zhao 2018). The proportion of decision trees in the RF model that predict landslide occurrence is used to generate the landslide susceptibility index (Goetz et al. 2015).
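The way tree votes can be turned into a susceptibility index may be illustrated with a toy sketch. The five decision stumps below are hypothetical stand-ins for trained trees, and the feature names and thresholds are made up; a real RF model would learn its trees from bootstrap samples and random variable subsets.

```python
def susceptibility_index(trees, features):
    """Proportion of trees that vote 'landslide' (1) for one grid cell."""
    votes = [tree(features) for tree in trees]
    return sum(votes) / len(votes)

# Hypothetical decision stumps standing in for trained decision trees.
trees = [
    lambda x: 1 if x["slope_fr"] > 1.0 else 0,
    lambda x: 1 if x["road_fr"] > 1.2 else 0,
    lambda x: 1 if x["ndvi_fr"] > 0.8 else 0,
    lambda x: 1 if x["slope_fr"] + x["road_fr"] > 2.5 else 0,
    lambda x: 1 if x["ndvi_fr"] > 0.5 else 0,
]

# Hypothetical frequency ratio values for one grid cell.
cell = {"slope_fr": 1.4, "road_fr": 1.0, "ndvi_fr": 0.9}

index = susceptibility_index(trees, cell)  # votes are 1, 0, 1, 0, 1
label = 1 if index > 0.5 else 0            # majority vote
print(index, label)  # 0.6 1
```

The index is a value between 0 and 1 that can be mapped directly, while the majority vote gives the binary class.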

Stacked autoencoder (StAE)

The StAE is an artificial neural network and a special type of multi-layer perceptron. It is an unsupervised learning algorithm with an asymmetric structure, in which the middle (bottleneck) layer represents the encoding of the input data (Yu and Príncipe 2019). The bottleneck constrains the amount of information that can traverse the full network, forcing a learned compression of the input data. The StAE is trained to reconstruct the input landslide-affecting factors at the output layer for feature representation, while the bottleneck prevents the network from simply copying the data. The middle layer has a lower dimension to avoid overfitting, and it can either select a subset of features with the highest importance or apply a dimension reduction technique (Hinton and Salakhutdinov 2006; Charte et al. 2018). In this study, the StAE combined with a back propagation neural network was used to obtain features of lower dimension than the input data, which can be used for learning the most important features of the data.
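The encoding and reconstruction described above can be written compactly for a single encoder and decoder layer. This is a generic sketch rather than the exact formulation used in the study: the symbols $W_e$, $b_e$, $W_d$, $b_d$, the output activation $g$, and the dimensions $n$ and $m$ are assumptions introduced for illustration, while the tanh activation and the mean square reconstruction error follow the description in this paper.

```latex
% x: vector of n landslide-affecting factor values (input)
% h: bottleneck code of dimension m < n
% \hat{x}: reconstruction of x at the output layer
\begin{align}
  h       &= \tanh(W_e x + b_e), \\
  \hat{x} &= g(W_d h + b_d), \\
  L(x, \hat{x}) &= \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2 .
\end{align}
```

Because $m < n$, minimizing $L$ forces $h$ to retain only the information most useful for reconstructing $x$; stacking several such layers gives the deep structure.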

Sparse autoencoder (SpAE)

The SpAE consists of an input layer, hidden layers, and an output layer. Each layer in this neural network contains a sufficient number of neurons. Dropout randomly deactivates some hidden layer nodes and reduces the mutual dependence between nodes, thereby regularizing the neural network. Additionally, dropout can effectively prevent overfitting and vanishing gradients (Huang et al. 2019). To initially achieve de-correlation among the 14 landslide-affecting factors, dropout was added to the input layer.

The process of SpAE is as follows. First, some of the neurons in the network are randomly dropped for a mini-batch of training samples, and the outputs of the remaining neurons are fed to the next layer. After this mini-batch has been processed, the dropped neurons are restored and some neurons in the network are randomly dropped once again. The corresponding parameters are updated by stochastic gradient descent, performed only on the neurons that have not been dropped.
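The dropout procedure described above can be sketched as follows. The function name `dropout`, the drop rate of 0.2, the seed, and the input values are assumptions for illustration; the sketch also omits the rescaling of kept activations that many deep learning libraries apply, and it does not perform the gradient update itself.

```python
import random

def dropout(activations, drop_rate, rng):
    """Randomly zero activations for one mini-batch.

    Returns the masked activations and the mask (1 = kept, 0 = dropped).
    The original activations are left untouched, so the dropped neurons
    are 'restored' simply by drawing a new mask for the next mini-batch.
    """
    mask = [0 if rng.random() < drop_rate else 1 for _ in activations]
    kept = [a * m for a, m in zip(activations, mask)]
    return kept, mask

rng = random.Random(0)
# Hypothetical input layer: 14 frequency ratio values for one sample.
inputs = [0.8, 1.2, 0.5, 1.9, 1.0, 0.3, 2.1, 0.7, 1.4, 0.9, 1.1, 0.6, 1.7, 0.4]

masks = []
for batch in range(3):
    kept, mask = dropout(inputs, drop_rate=0.2, rng=rng)
    masks.append(mask)
    # Only neurons with mask == 1 would take part in the gradient update.
    print("mini-batch", batch, "dropped neurons:", mask.count(0))
```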

Results

Landslide susceptibility modelling using the six models

All models based on deep learning and machine learning were coded in the R language in RStudio. For the SVM and RF models, parameters were determined using a 10-fold cross-validation approach. The SVM model used a radial basis function kernel, and its parameters were tuned by grid search. For the RF model, the parameters ‘mtry’ and ‘tree’ were set to 3 and 300, respectively. The autoencoder models based on deep neural networks were implemented using the H2O package. The hyperbolic tangent (tanh) function was used as the activation function in every hidden layer to encode and decode the input to the output in the undercomplete autoencoder.

For the StAE, five hidden layers with encoders and decoders were designed in the H2O library, using the tanh activation function in each layer. A stacked autoencoder, also known as a deep autoencoder, is constructed by stacking autoencoders on top of each other, so that the output of each layer is wired to the inputs of the successive layer. As seen in Fig. 7, the StAE was composed of 80–50–2–50–80. To obtain good parameters, the StAE employed greedy layer-wise training. The StAE offers the benefits of a deep network, which has greater expressive power, and it can usually capture a useful hierarchical grouping of the input. Finally, the model class was determined by the majority vote among all trees using the RF model.

The aim of the sparse autoencoder (SpAE) is to make a large number of neurons have a low average output, so that the neurons are inactive most of the time. The limitation that autoencoders have only a small number of hidden units can be overcome by adding a sparsity constraint, under which a large number of hidden units, usually more than the number of inputs, can be used. Three hidden layers with encoders and decoders were designed in the H2O library, using the tanh activation function in each layer, and were composed of 200–200–200 (Fig. 8). Sparsity can be achieved by adding a sparsity term to the loss function during training or by manually zeroing all but the few strongest hidden unit activations. For classification, the model class was determined by the RF model by means of the majority vote among all trees. The reconstruction error, measured as mean square error, was used for anomaly detection in both StAE and SpAE, with values of 0.068 and 0.088, respectively.
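The use of the reconstruction error for anomaly detection can be illustrated with a short sketch. The function names, the example vectors, and the threshold of 0.05 are hypothetical; the study reports mean square error values for StAE and SpAE but does not state how a threshold was chosen, so the flagging step below is an assumption.

```python
def reconstruction_mse(original, reconstructed):
    """Mean square error between an input vector and its reconstruction."""
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared) / len(squared)

def flag_anomalies(errors, threshold):
    """Mark samples whose reconstruction error exceeds the threshold."""
    return [error > threshold for error in errors]

# Hypothetical inputs and autoencoder reconstructions for two grid cells.
cell_a = ([1.0, 0.5, 2.0], [0.9, 0.6, 1.7])  # reconstructed well
cell_b = ([1.0, 0.5, 2.0], [0.2, 1.4, 1.0])  # reconstructed poorly

errors = [reconstruction_mse(x, x_hat) for x, x_hat in (cell_a, cell_b)]
flags = flag_anomalies(errors, threshold=0.05)  # threshold is an assumption
print([round(e, 4) for e in errors], flags)
# [0.0367, 0.8167] [False, True]
```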

Landslide susceptibility maps produced by the six models

The landslide susceptibility maps were derived from the SVM, RF, StAE, SpAE, StAE with RF, and SpAE with RF models in the ArcGIS 10.6 software (Fig. 9). For better visualization and comparison, the indices were reclassified into five classes using the equal interval function: very low (0–0.2), low (0.2–0.4), moderate (0.4–0.6), high (0.6–0.8), and very high (0.8–1). The susceptibility class areas of the StAE model as the best performance (Table 4) were 6.31%, 13.58%, 33.04%, 36.81%, and 10.26%, respectively. The susceptibility class areas of the RF model (Fig. 9b) and the StAE model (Fig. 9c) have very high values. The susceptibility index values of the SVM model (Fig. 9a) and the StAE model (Fig. 9c) were prominent near the road (Fig. 3f). SpAE and SpAE with RF have lower class area percentages for the very high (0.8–1.0) index of the susceptibility map. RF and StAE have lower class area percentages for the moderate (0.4–0.6) index. StAE with RF and SpAE with RF have lower class area percentages for the very low (0.0–0.2) index (Fig. 9d, e, f and Table 4).
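The equal interval reclassification and the class area percentages can be sketched as follows. This is an illustration outside ArcGIS: the function names and the index values are hypothetical, and assigning a value that falls exactly on a class boundary to the upper class is an assumption.

```python
CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]

def reclassify(index, n_classes=5):
    """Map a susceptibility index in [0, 1] to one of five equal intervals."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("susceptibility index must lie in [0, 1]")
    # An index of exactly 1.0 belongs to the last class.
    position = min(int(index * n_classes), n_classes - 1)
    return CLASS_NAMES[position]

def class_area_percentages(indices):
    """Percentage of grid cells falling in each susceptibility class."""
    counts = {name: 0 for name in CLASS_NAMES}
    for index in indices:
        counts[reclassify(index)] += 1
    return {name: 100.0 * n / len(indices) for name, n in counts.items()}

# Hypothetical susceptibility index values for eight grid cells.
indices = [0.1, 0.35, 0.5, 0.75, 0.95, 1.0, 0.05, 0.45]
percentages = class_area_percentages(indices)
print(percentages)
# {'very low': 25.0, 'low': 12.5, 'moderate': 25.0, 'high': 12.5, 'very high': 25.0}
```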

Discussion

Validation of prediction performance

The landslide susceptibility assessment was verified using the area under the curve on the validation dataset for the six models. The predictive ratio for landslide susceptibility assessment is mainly calculated from the confusion matrix. The true positive rate (TPR) is defined as the ratio of true positives to the sum of true positives and false negatives, and the false positive rate (FPR) is defined as the ratio of false positives to the sum of false positives and true negatives (Zhang and Wang 2019). In general, true positives are landslide grid cells that are predicted as landslides, true negatives are non-landslide grid cells that are predicted as non-landslides, false positives are non-landslide grid cells that are predicted as landslides, and false negatives are landslide grid cells that are predicted as non-landslides (Huang et al. 2019). The area under the curve was applied to assess the prediction performance of the landslide susceptibility index values on the validation dataset. The prediction rate values of the SVM, RF, StAE, SpAE, StAE with RF, and SpAE with RF models were obtained by calculating the area under the prediction rate curves. The combined classifiers, StAE with RF and SpAE with RF, have relatively higher prediction rates than the single classifiers SVM, RF, StAE, and SpAE (Fig. 10). This means that classifiers combining an autoencoder with traditional machine learning perform better than a single classifier.

The autoencoder is an unsupervised learning method, as it does not require external labels on landslide information. The encoding and decoding processes both happen within the dataset. The input and output data have the same number of dimensions, and the hidden layer has fewer dimensions. The hidden layer therefore contains compressed information from the input layer, which is why it acts as a dimension reduction of the original input. From the hidden layer, the neural network is able to decode the information to its original dimensions. Autoencoders are learned automatically from data examples, which is a useful property: it is easy to train specialized instances of the algorithm that perform well on a specific type of input. The autoencoder does not require additional methods for preparing appropriate training data.
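The confusion matrix rates and the area under the curve described above can be computed as in the following sketch. The function names, labels, and scores are hypothetical and are not the validation data of this study; the sketch assumes that both classes are present in the labels.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when cells with score >= threshold are predicted as landslides."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule over all score thresholds."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(labels, scores, t)
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical validation cells: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

tp, fp, tn, fn = confusion_counts(labels, scores, threshold=0.5)
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
auc = roc_auc(labels, scores)
print(tpr, fpr, auc)  # 1.0 0.25 0.9375
```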

Fig. 10

The area under the curves for prediction ratio and validation of landslide susceptibility maps produced by the six models

Sample size

One of the challenges in landslide susceptibility mapping is determining how many landslides the inventory needs to contain. Several articles have addressed the number of inventoried landslides needed to produce acceptable landslide susceptibility maps, with sample sizes varying from 0 to several thousand for study areas of different scales. The sample size affects the result of the statistical analysis: as the sample size increases, the result becomes more acceptable. According to Demoulin and Chung (2007), a Bayesian method in machine learning delivered satisfying prediction rates in spite of a limited sample size of ten landslides in an area of about 15 × 15 km. Heckmann et al. (2014) state that small samples result in large standard errors and wide confidence intervals for the population parameters. In the case of regression parameters, small samples cause the estimation to be uncertain, and there is a higher risk of coefficients being insignificant when the respective confidence interval includes zero. With respect to replicate sampling and model selection, the diversity of selected models is expected to be higher for small samples. However, increasing the sample size causes standard errors and confidence intervals in parameter estimation to decrease. In a significance-based stepwise model selection, very large samples are expected to facilitate the inclusion of more and more variables. Reichenbach et al. (2018) report that some articles did not use any landslide inventory and were instead based on the relative importance of thematic maps as landslide-affecting factors (Adler and Huffman 2007). In this study, all models obtained prediction rates from 84% to 93% using 90 landslides (about 20 km square), which is similar to a previous study (Sabokbar et al. 2014) of a different study area where 82 landslides were used (about 24 km square).

Study limitation

In this study, all landslide points were obtained by GPS during field investigation from May to October 2013, without the aid of satellite imagery or an unmanned aerial vehicle (UAV). As seen in Fig. 3f, most landslide points were in the vicinity of human activity near the roads in the mountains, not inside the mountainous area. The concentration of the landslide inventory near the roads may affect the landslide susceptibility maps (Fig. 9), resulting in landslide susceptibility index values that are higher near the roads than in other areas.

Landslide susceptibility mapping is based on the probability of reoccurrence in areas where landslides have already occurred, unlike physically based modeling, and it relies on the following: (1) an abundant number of inventoried landslides for statistical analysis, (2) the sampling strategy used to construct non-landslide samples for regression and classification, (3) the scale of the study area, (4) the resolution of the DEM, (5) a relatively even spatial distribution of the landslide inventory in the study area, (6) the boundary setting of the study area used to construct landslide-affecting factors, and (7) a reasonable selection of landslide-affecting factors. Constructing a distinct landslide inventory that distinguishes between rainfall and earthquake triggering factors is considered a more important step than using any advanced classifier for landslide susceptibility mapping.

Conclusion

In this study, the classifiers combined with both deep learning and traditional machine learning, StAE with RF and SpAE with RF models, are proposed for landslide susceptibility prediction. The autoencoder consists of input layers for raw data, hidden layers for feature extraction, and output layers for landslide susceptibility prediction. The combined classifiers have the advantage of both machine learning and deep learning, i.e., dimension reduction of the StAE model and dropout of the SpAE model for feature extraction.

The six models were applied in Oda City and Gotsu City, Shimane Prefecture, southwestern Japan. The correlation between landslides and landslide-affecting factors, measured by the frequency ratio, was high for NDVI, distance to road, and altitude. Performance assessment was carried out with the SVM, RF, StAE, SpAE, StAE with RF, and SpAE with RF models. The results show that the proposed StAE with RF and SpAE with RF models have a relatively better prediction rate than single classifiers such as the SVM, RF, StAE, and SpAE models. In conclusion, the proposed combined classifier is promising for classification between landslides and non-landslides in landslide susceptibility prediction because it can overcome the limitations of conventional machine learning algorithms, extract features and recognize patterns, reduce computations, and improve performance.

Funding

The study was financially supported by funding awarded under the study, “Initiation and motion mechanisms of long runout landslides due to rainfall and earthquake in the falling pyroclastic deposit slope area” (JSPS-B-19H01980, Principal Investigator: Fawu Wang).

Contributions

FW conducted field investigation in 2013 and provided guidance in the study area of landslides triggered by extreme rainfall in Shimane Prefecture, southwestern Japan. KN carried out the landslide susceptibility assessment and produced landslide susceptibility maps using deep learning combined with machine learning. Both authors read and approved the final manuscript.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.