Abstract
In this study, we propose a new method for classifying soils based on three machine learning models: support vector classification (SVC), multilayer perceptron (MLP), and random forest (RF). The models were developed using a database of 4888 soil samples obtained from projects in Vietnam. Fifteen soil properties (variables) were selected as input parameters for classifying the samples into five soil classes: lean clay (CL), elastic silt (MH), fat clay (CH), clayey sand (SC), and silt (ML). The results were evaluated quantitatively and qualitatively using learning curves (time and number of training samples), confusion matrices, and statistical metrics including precision, recall, accuracy, and F1-score. The results indicate that all three models perform well (average accuracy score = 0.968), with the SVC model (accuracy score = 0.984) classifying the soils most accurately.
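The evaluation metrics named in the abstract can all be derived from a confusion matrix. The following is a minimal sketch in plain Python, not the authors' code: the five class labels come from the abstract, while the counts in `CONFUSION` and the helper names `per_class_metrics` and `accuracy` are hypothetical and chosen only for illustration. The numbers are not results from the study.

```python
# Illustrative sketch: the counts below are invented for this example and are
# NOT results reported in the study.
CLASSES = ["CL", "MH", "CH", "SC", "ML"]

# CONFUSION[i][j] = number of samples whose true class is CLASSES[i]
# and whose predicted class is CLASSES[j] (rows: true, columns: predicted).
CONFUSION = [
    [48, 1, 1, 0, 0],
    [2, 45, 0, 0, 3],
    [1, 0, 49, 0, 0],
    [0, 0, 0, 50, 0],
    [0, 4, 0, 1, 45],
]


def per_class_metrics(confusion, classes):
    """Return {class: (precision, recall, f1)} for a square confusion matrix."""
    metrics = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]                            # correctly predicted as class k
        predicted_k = sum(row[k] for row in confusion)  # column sum: TP + FP
        actual_k = sum(confusion[k])                    # row sum: TP + FN
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics


def accuracy(confusion):
    """Overall accuracy: correctly classified samples divided by all samples."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


if __name__ == "__main__":
    for name, (p, r, f1) in per_class_metrics(CONFUSION, CLASSES).items():
        print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    print(f"accuracy={accuracy(CONFUSION):.3f}")
```

Reporting per-class precision, recall, and F1-score alongside overall accuracy shows which soil classes are confused with one another, which a single accuracy score can hide.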
Acknowledgements
This study was funded by the Ministry of Education and Training under grant number B2020-GHA-03, chaired by the University of Transportation. The authors would like to thank the Department of Science, Technology, and Environment (Ministry of Education and Training), the University of Transport and Communications, and other agencies for their support and for providing the data used in this research.
Author information
Contributions
Manh Duc Nguyen: conceptualization, methodology, software, writing—review and editing, validation, and supervision. Romulus Costache: conceptualization, methodology, software, writing—review and editing, validation, and supervision. An Ho Sy: data curation, writing—original draft, software, and validation. Peyman Yariyan: data curation, writing—original draft, software, and validation. Hassan Ahmadzadeh: data curation, writing—original draft, software, and validation. Hiep Van Le: data curation, writing—original draft, software, and validation. Indra Prakash: writing—review and editing, validation, and supervision. Binh Thai Pham: conceptualization, methodology, software, writing—review and editing, validation, and supervision.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nguyen, M.D., Costache, R., Sy, A.H. et al. Novel approach for soil classification using machine learning methods. Bull Eng Geol Environ 81, 468 (2022). https://doi.org/10.1007/s10064-022-02967-7
DOI: https://doi.org/10.1007/s10064-022-02967-7