Abstract
This paper addresses the prediction of slope stability with machine learning (ML). Five well-known ML algorithms, namely neural network (NNet), decision tree (DT), support vector machine (SVM), k-nearest neighbor (kNN), and random forest (RF), are used to demonstrate the effectiveness of ML for the binary classification of slope stability on a case-history dataset that contains outliers. The study also evaluates winsorization as a treatment for these outliers by quantifying the effect of outliers on the models' prediction performance. To this end, all generated ML models are assessed and compared on both the unwinsorized (i.e., raw) and winsorized datasets using performance metrics obtained from the confusion matrix (Recall, Precision, Accuracy, and F1-Score). The experimental results showed that winsorization enhanced prediction performance: every ML model built with the winsorized dataset outperformed its unwinsorized counterpart. The RF model achieved the best prediction performance, especially on the winsorized dataset. Moreover, SVM was found to be the most sensitive to outliers among the applied algorithms, and kNN the least sensitive. The increase in accuracy reached nearly 20% for SVM, followed by 18% for DT, 11% for NNet, 10% for RF, and 4% for kNN. Overall, the results reveal not only the performance of ML algorithms on the slope stability problem but also how the handling of outliers in a dataset affects the models' prediction performance.
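The abstract relies on two concrete procedures: winsorizing outliers (clipping extreme values to percentile limits instead of deleting them) and scoring binary predictions with Recall, Precision, Accuracy, and F1 from the confusion matrix. The following is a minimal, standard-library-only Python sketch of both, under stated assumptions: the 5th/95th percentile limits, the linear-interpolation percentile rule, the positive-class encoding (1), the toy data, and the helper names `percentile`, `winsorize`, and `binary_metrics` are all hypothetical and are not taken from the paper, which does not state its clipping limits here.

```python
"""Toy sketch: percentile winsorization plus confusion-matrix metrics.

Assumptions (not taken from the paper): 5th/95th percentile limits,
linear-interpolation percentiles, and the positive class encoded as 1.
"""
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100), interpolating linearly between ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of an empty sequence")
    pos = (len(data) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def winsorize(values: Sequence[float], lower_pct: float = 5.0,
              upper_pct: float = 95.0) -> List[float]:
    """Clip every value into [P_lower, P_upper] instead of deleting outliers."""
    low = percentile(values, lower_pct)
    high = percentile(values, upper_pct)
    return [min(max(v, low), high) for v in values]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Recall, Precision, Accuracy and F1 from the 2x2 confusion matrix."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn  # every remaining pair is a true negative
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy feature: 19 ordinary values plus two extremes.
    feature = list(range(1, 20)) + [-50, 500]
    print(winsorize(feature))  # -50 is clipped to 1, 500 is clipped to 19

    # Hypothetical labels: 1 = positive class, 0 = negative class.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

Winsorization keeps the sample size unchanged, which matters for a small case-history dataset. In practice the percentile limits would normally be computed on the training split and then applied to the test split; this sketch does not show a train/test split.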
Data Availability
The dataset analyzed during the current study is publicly available from the source cited in the References section.
References
Abramson LW, Lee TS, Sharma S, Boyce GM (2001) Slope stability and stabilization methods. John Wiley & Sons
Aguinis H, Gottfredson RK, Joo H (2013) Best-practice recommendations for defining, identifying, and handling outliers. Organ Res Methods 16:270–301. https://doi.org/10.1177/1094428112470848
Barnett V, Lewis T (1984) Outliers in statistical data, Wiley series in probability and mathematical statistics: applied probability and statistics. Wiley
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees, vol 432. Wadsworth International Group, Belmont, CA, p 9
Brownlee J (2020) Data preparation for machine learning: data cleaning, feature selection, and data transforms in Python. Machine Learning Mastery
Chakraborty A, Goswami D (2017) Prediction of slope stability using multiple linear regression (MLR) and artificial neural network (ANN). Arab J Geosci 10:1–11. https://doi.org/10.1007/s12517-017-3167-x
Choobbasti AJ, Farrokhzad F, Barari A (2009) Prediction of slope stability using artificial neural network (case study: Noabad, Mazandaran, Iran). Arab J Geosci 2:311–319. https://doi.org/10.1007/s12517-009-0035-3
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Dawson EM, Roth WH, Drescher A (1999) Slope stability analysis by strength reduction. Geotechnique 49:835–840. https://doi.org/10.1680/geot.1999.49.6.835
Demir S, Sahin EK (2023a) Random forest importance-based feature ranking and subset selection for slope stability assessment using the ranger implementation. Eur J Sci Technol 4823–28. https://doi.org/10.31590/ejosat.1254337
Demir S, Sahin EK (2023b) An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost. Neural Comput Appl 35(4):3173–3190. https://doi.org/10.1007/s00521-022-07856-4
Demir S, Sahin EK (2023c) Predicting occurrence of liquefaction-induced lateral spreading using gradient boosting algorithms integrated with particle swarm optimization: PSO-XGBoost, PSO-LightGBM, and PSO-CatBoost. Acta Geotech 18:3403–3419. https://doi.org/10.1007/s11440-022-01777-1
Do HY, Cetin KS (2018) Evaluation of the causes and impact of outliers on residential building energy use prediction using inverse modeling. Build Environ 138:194–206. https://doi.org/10.1016/j.buildenv.2018.04.039
Duncan JM (1996) State of the art: limit equilibrium and finite-element analysis of slopes. J Geotech Eng-Asce 122:577–596. https://doi.org/10.1061/(Asce)0733-9410(1996)122:7(577)
Duncan JM, Wright SG, Brandon TL (2014) Soil strength and slope stability. John Wiley & Sons
Eberhardt E (2003) Rock slope stability analysis–utilization of advanced numerical techniques. Earth and Ocean sciences at UBC:41
Feng XD, Li SC, Yuan C, Zeng P, Sun Y (2018) Prediction of slope stability using naive bayes classifier. KSCE J Civ Eng 22:941–950. https://doi.org/10.1007/s12205-018-1337-3
Field A (2013) Discovering statistics using IBM SPSS statistics. SAGE
Frost J (2019) Introduction to Statistics: an intuitive guide for analyzing data and unlocking discoveries. Statistics By Jim Publishing
James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning: with applications in R. Springer
Ghosh D, Vogt A (2012) Outliers: an evaluation of methodologies. In: Joint Statistical Meetings
Gou JP, Sun LY, Du L, Ma HX, Xiong TS, Ou WH, Zhan YZ (2022) A representation coefficient-based k-nearest centroid neighbor classifier. Expert Syst Appl 194. https://doi.org/10.1016/j.eswa.2022.116529
Griffiths DV, Lane PA (1999) Slope stability analysis by finite elements. Geotechnique 49:387–403. https://doi.org/10.1680/geot.1999.49.3.387
Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, vol 2. Springer
Hiran KK, Jain RK, Lakhwani K, Doshi R (2021) Machine learning: master supervised and unsupervised learning algorithms with real examples (English Edition). BPB Publications
Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition, IEEE, pp 278–282
Hoang ND, Pham AD (2016) Hybrid artificial intelligence approach based on metaheuristic and machine learning for slope stability assessment: a multinational data analysis. Expert Syst Appl 46:60–68. https://doi.org/10.1016/j.eswa.2015.10.020
Krahn J (2003) The 2001 R.M. hardy lecture: the limits of limit equilibrium analyses. Can Geotech J 40:643–660. https://doi.org/10.1139/T03-024
Li J, Dong M (2012) Method to predict slope safety factor using SVM. In: Earth and Space 2012: Engineering, Science, Construction, and Operations in Challenging Environments. pp 888–899
Li AJ, Cassidy MJ, Wang Y, Merifield RS, Lyamin AV (2012) Parametric Monte Carlo studies of rock slopes based on the Hoek-Brown failure criterion. Comput Geotech 45:11–18. https://doi.org/10.1016/j.compgeo.2012.05.010
Lim K, Lyamin AV, Cassidy MJ, Li AJ (2016) Three-dimensional slope stability charts for frictional fill materials placed on purely cohesive clay. Int J Geomech 16. https://doi.org/10.1061/(Asce)Gm.1943-5622.0000526
Lin Y, Zhou K, Li JL (2018) Prediction of slope stability using four supervised learning methods. IEEE Access 6:31169–31179. https://doi.org/10.1109/Access.2018.2843787
Liu ZB, Shao JF, Xu WY, Chen HJ, Zhang Y (2014) An extreme learning machine approach for slope stability evaluation and prediction. Nat Hazards 73:787–804. https://doi.org/10.1007/s11069-014-1106-7
Mahmoodzadeh A, Mohammadi M, Ali HFH, Ibrahim HH, Abdulhamid SN, Nejati HR (2022) Prediction of safety factors for slope stability: comparison of machine learning techniques. Nat Hazards 111:1771–1799. https://doi.org/10.1007/s11069-021-05115-8
Nanehkaran YA, Licai Z, Chengyong J, Chen J, Anwar S, Azarafza M, Derakhshani R (2023) Comparative analysis for slope stability by using machine learning methods. Appl Sci 13(3):1555. https://doi.org/10.3390/app13031555
Neaupane KM, Achet SH (2004) Use of backpropagation neural network for landslide monitoring: a case study in the higher Himalaya. Eng Geol 74:213–226. https://doi.org/10.1016/j.enggeo.2004.03.010
Pham K, Kim D, Park S, Choi H (2021) Ensemble learning-based classification models for slope stability analysis. CATENA 196. https://doi.org/10.1016/j.catena.2020.104886
Pirizadeh M, Alemohammad N, Manthouri M, Pirizadeh M (2021) A new machine learning ensemble model for class imbalance problem of screening enhanced oil recovery methods. J Petrol Sci Eng 198. https://doi.org/10.1016/j.petrol.2020.108214
Qian ZG, Li AJ, Merifield RS, Lyamin AV (2015) Slope stability charts for two-layered purely cohesive soils based on finite-element limit analysis methods. Int J Geomech 15. https://doi.org/10.1061/(Asce)Gm.1943-5622.0000438
Quinlan JR (2014) C4.5: programs for machine learning. Elsevier
Rousseeuw PJ, Leroy AM (2005) Robust regression and outlier detection. John Wiley & Sons
Sahin EK (2023) Implementation of free and open-source semi-automatic feature engineering tool in landslide susceptibility mapping using the machine-learning algorithms RF, SVM, and XGBoost. Stoch Env Res Risk A 37:1067–1092. https://doi.org/10.1007/s00477-022-02330-y
Singh A, Misra SC (2020) A dominance based Rough Set analysis for investigating employee perception of safety at workplace and safety compliance. Saf Sci 127. https://doi.org/10.1016/j.ssci.2020.104702
Steward T, Sivakugan N, Shukla SK, Das BM (2011) Taylor’s slope stability charts revisited. Int J Geomech 11:348–352. https://doi.org/10.1061/(Asce)Gm.1943-5622.0000093
Tukey JW (1962) The future of data analysis. Ann Math Stat 33:1–67
Wang L, Gopal R, Shankar R, Pancras J (2015) On the brink: Predicting business failure with mobile location-based checkins. Decis Support Syst 76:3–13. https://doi.org/10.1016/j.dss.2015.04.010
Wu J, Chen X-Y, Zhang H, Xiong L-D, Lei H, Deng S-H (2019) Hyperparameter optimization for machine learning models based on bayesian optimization. J Electron Sci Technol 17:26–40
Xiao SG, Guo WD, Zeng JX (2018) Factor of safety of slope stability from deformation energy. Can Geotech J 55:296–302. https://doi.org/10.1139/cgj-2016-0527
Xue XH (2017) Prediction of slope stability based on hybrid PSO and LSSVM. J Comput Civil Eng 31. https://doi.org/10.1061/(Asce)Cp.1943-5487.0000607
Yang X-S (2019) Introduction to algorithms for data mining and machine learning. Academic press
Yang XL, Yin JH (2004) Slope stability analysis with nonlinear failure criterion. J Eng Mech-Asce 130:267–273. https://doi.org/10.1061/(Asce)0733-9399
Yang B, Li XJ, Liu YH, Chen LG, Guo RQ, Wang FM, Yan K (2022) Comparison of models for predicting winter individual thermal comfort based on machine learning algorithms. Build Environ 215. https://doi.org/10.1016/j.buildenv.2022.108970
Zhang Y, Meratnia N, Havinga P (2010) Outlier detection techniques for wireless sensor networks: a survey. IEEE Commun Surv tutorials 12:159–170
Zhang ZF, Liu ZB, Zheng LF, Zhang Y (2014) Development of an adaptive relevance vector machine approach for slope stability inference. Neural Comput Appl 25:2025–2035. https://doi.org/10.1007/s00521-014-1690-1
Zhang WG, Li HR, Han L, Chen LL, Wang L (2022) Slope stability prediction using ensemble learning techniques: a case study in Yunyang County, Chongqing, China. J Rock Mech Geotech Eng 14:1089–1099. https://doi.org/10.1016/j.jrmge.2021.12.011
Zhou Z-H (2021) Machine learning. Springer Nature
Zhou J, Li EM, Yang S, Wang MZ, Shi XZ, Yao S, Mitri HS (2019) Slope stability prediction for circular mode failure using gradient boosting machine approach based on an updated database of case histories. Saf Sci 118:505–518. https://doi.org/10.1016/j.ssci.2019.05.046
Zhu JL, Ge ZQ, Song ZH, Gao FR (2018) Review and big data perspectives on robust data mining approaches for industrial process modeling with outliers and missing data. Annu Rev Control 46:107–133. https://doi.org/10.1016/j.arcontrol.2018.09.003
Funding
No funding was received for conducting this study.
Author information
Authors and Affiliations
Contributions
S.D. Conceptualization, Investigation, Writing-review and editing, Writing-original draft, Visualization. E.K.S. Conceptualization, Methodology, Software, Writing-review and editing, Visualization.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Communicated by H. Babaie.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Demir, S., Sahin, E.K. Application of state-of-the-art machine learning algorithms for slope stability prediction by handling outliers of the dataset. Earth Sci Inform 16, 2497–2509 (2023). https://doi.org/10.1007/s12145-023-01059-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12145-023-01059-8