RESEARCH ARTICLE


Soil Unconfined Compressive Strength Prediction Using Random Forest (RF) Machine Learning Model



Hai-Bang Ly1, Binh Thai Pham1, *
1 Department of Civil Engineering, University of Transport Technology, Hanoi 100000, Vietnam





Creative Commons License
© 2020 Ly and Thai Pham.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: (https://creativecommons.org/licenses/by/4.0/legalcode). This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at the Department of Civil Engineering, University of Transport Technology, Hanoi 100000, Vietnam;
E-mail: binhpt@utt.edu.vn


Abstract

Aims:

Understanding the mechanical performance and applicability of soils is crucial in geotechnical engineering. This study investigated the feasibility of using the Random Forest (RF) algorithm, a popular machine learning method, to predict the soil unconfined compressive strength (UCS), which is one of the most important mechanical properties of soils.

Methods:

A total of 118 soil samples were collected and tested in laboratory experiments carried out for the Long Phu 1 power plant project, Vietnam. The data used for modeling include clay content, moisture content, specific gravity, void ratio, liquid limit and plastic limit as input variables, with the UCS as the target. Several assessment criteria were used to evaluate the RF model, namely the correlation coefficient (R), root mean squared error (RMSE) and mean absolute error (MAE).

Results:

The results showed that RF exhibited a strong capability to predict the UCS, with R values of 0.914 and 0.848 for the training and testing datasets, respectively. A sensitivity analysis was then conducted to reveal the importance of the input parameters for the prediction of the UCS using RF. Specific gravity was found to be the most influential variable, followed by clay content, liquid limit, plastic limit, moisture content and void ratio.

Conclusion:

This study might help in the accurate and quick prediction of the UCS for practical purposes.

Keywords: Unconfined compressive strength (UCS), Unconfined compression test, Random forest, Machine learning, Root mean squared error (RMSE), Mean absolute error (MAE).



1. INTRODUCTION

Soil science is a complex discipline that involves fundamental and applied aspects of soil biology, soil physics and soil chemistry [1]. In civil engineering, understanding the mechanical properties of soils in relation to their applications is of fundamental importance [2]. Soil mechanics allows engineers to explore the properties and behavior of soils, so that an adequate solution to a given problem can be found. Soil science is particularly important when treating settlement or damage problems, as many construction works are directly affected by soil mechanics, including buildings, bridges, roads, railways, tunnels and dams. Various investigations related to this field have been conducted, for instance, on soil mechanical properties [3], the permeability of fractured porous media [4-6], the consolidation of soil [7-8] and especially the compressive strength of soil.

Indeed, the soil Unconfined Compressive Strength (UCS) is an important factor used to validate the compaction ability of soil [9]. It can be determined directly in the laboratory through the unconfined compression test. However, this test is usually time-consuming and costly, which might increase construction costs. Moreover, the accuracy of such a test depends significantly on the quality of the equipment and the skill of the experimenter. It is thus necessary to find an alternative and effective way to predict the soil UCS. The use of machine learning algorithms has spread rapidly over the last decades, especially in computer science. Such an approach makes it possible to learn information from data, which is an attractive alternative to “manual” learning [10]. In civil engineering, machine learning algorithms have been applied to a wide range of real-world problems, such as landslides [11-13], floods [14], weather and climate [15-17], materials science [18-23], engineering structures [24-26] or soil properties [27]. In general, the machine learning approach is promising for the accurate and fast prediction of soil properties.

Although Random Forest (RF) is one of the most popular and effective machine learning algorithms, limited research has investigated the possibility of using RF to predict the soil UCS. In this work, an RF model was developed to investigate the feasibility of applying such a model for quick estimation of the soil UCS. For this purpose, a total of 118 samples were collected from the Long Phu 1 power plant project, and laboratory experiments were carried out to determine the soil properties. The database included six input parameters (clay content, moisture content, specific gravity, void ratio, liquid limit and plastic limit) and one output variable, the UCS. To validate the performance of RF, several assessment criteria were used, namely the correlation coefficient (R), root mean square error (RMSE) and mean absolute error (MAE). Using RF, a feature importance analysis of the input parameters in predicting the UCS was also conducted, with the aim of providing better insight into the problem.

2. DATA COLLECTION AND ANALYSIS

In this study, soil samples were collected from the Long Phu 1 power plant project, located in Soc Trang province, Vietnam. Laboratory tests on 118 soil samples were carried out to determine the soil properties used for the design and construction of the project. The results were used to generate the training (70%) and testing (30%) datasets for the development of the RF model. The datasets contained six soil properties: clay content (%), void ratio, liquid limit (%), moisture content (%), plastic limit (%), and specific gravity. These were used as input parameters of the RF model. The UCS (or qu), determined using unconfined compression tests in laboratory conditions, was used as the output parameter. A detailed description of these parameters can be found in the work of Das and Sobhan [28]. A correlation analysis of the inputs was carried out and is presented in Fig. (1). The clay content ranges from 2.4 to 63.4%, with mean and median values of 32.63 and 31.55, respectively, and a standard deviation of 13.85. The moisture content ranges from 0.61 to 75.14%, with mean and median values of 28.66 and 26.42, respectively, and a standard deviation of 13.46. The specific gravity ranges from 0.01 to 2.72, with mean and median values of 2.53 and 2.69, respectively, and a standard deviation of 0.64. The void ratio ranges from 0.017 to 2.089, with mean and median values of 0.83 and 0.78, respectively, and a standard deviation of 0.36. The liquid limit ranges from 1.6 to 74.9%, with mean and median values of 41.54 and 42.0, respectively, and a standard deviation of 14.30. The plastic limit ranges from 0.6 to 41.0%, with mean and median values of 20.64 and 20.75, respectively, and a standard deviation of 6.31. Finally, the UCS ranges from 0.078 to 4.43 kG/cm2, with mean and median values of 1.37 and 1.21, respectively, and a standard deviation of 0.87.

For all the variables, the median values were close to the mean values, suggesting that these variables could be roughly approximated by a normal distribution. The inter-correlations between the inputs, and between the input variables and the output, are depicted in Fig. (1). These results show that the moisture content is highly correlated with the void ratio. The plastic limit, at a lower level of correlation, has relatively strong relationships with the liquid limit and the void ratio. In contrast, no direct relationship was found between the UCS and any of the input variables in the database.

3. METHODS USED

3.1. Random Forest (RF)

Random Forest (RF), a well-known supervised machine learning algorithm, is a nonparametric technique derived from classification and regression trees (CART), which applies an ensemble learning method to solve problems [29]. Since its introduction by Breiman [29], RF has been widely applied in practice across a broad range of fields, such as bioinformatics [30], materials sciences [31], remote sensing [32] or land cover classification [33]. RF consists of many trees, where each tree is grown from a bootstrap sample of the training data. The samples left out of each bootstrap sample are kept for validation and give the so-called out-of-bag (OOB) predictions. At each node of a tree, the split is chosen from a random subset of the predictors. The final output of RF is the average of the results obtained by all the trees [29]. In this study, RF was applied to predict the UCS, with the number of bags used for bootstrapping set at 500 and the leaf size set at its optimal value of 20.
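The bootstrap and out-of-bag mechanism described above can be illustrated with a minimal sketch that uses only the Python standard library. It does not reproduce the model of this study: the function name `bootstrap_with_oob`, the seed, and the training-set size of 82 (roughly 70% of the 118 samples) are assumptions made for illustration. The sketch only shows how each tree's in-bag sample is drawn with replacement and which samples remain out-of-bag.

```python
import random

def bootstrap_with_oob(n_samples, n_trees, seed=0):
    """Draw one bootstrap sample per tree and record its out-of-bag indices."""
    rng = random.Random(seed)
    all_indices = set(range(n_samples))
    bags = []
    for _ in range(n_trees):
        # Sample n_samples indices with replacement (the bootstrap sample).
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Samples never drawn for this tree are its out-of-bag samples.
        out_of_bag = sorted(all_indices - set(in_bag))
        bags.append((in_bag, out_of_bag))
    return bags

# Assumed training-set size (about 70% of 118) and the 500 bags of Section 3.1.
N_TRAIN = 82
N_TREES = 500
bags = bootstrap_with_oob(n_samples=N_TRAIN, n_trees=N_TREES)
mean_oob_fraction = sum(len(oob) for _, oob in bags) / (N_TREES * N_TRAIN)
print(f"average out-of-bag fraction: {mean_oob_fraction:.3f}")
```

For a bootstrap sample of size n, the expected out-of-bag fraction is (1 - 1/n)^n, which approaches 1/e (about 0.368) for large n, so a little over one third of the training samples are available to validate each tree.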

3.2. Performance Assessment Criteria

In this study, three performance assessment criteria, namely Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Correlation Coefficient (R), were selected to evaluate the prediction capability of the proposed RF model. MAE is a statistical metric used to assess the prediction quality of a given soft computing algorithm [34, 35]. It measures the average absolute difference between the predicted and experimental data. RMSE is the square root of the average of the squared differences between the predicted and experimental data [36, 37]. R, the Pearson correlation coefficient, is a statistical measure used to quantify the linear relationship between the predicted and actual data [38-40]. The three criteria MAE, RMSE, and R are widely used in prediction problems utilizing machine learning algorithms. Lower RMSE and MAE values mean better prediction capability, whereas higher R values signify better performance [41]. The formulas of these criteria are given below:

Fig. (1). Correlation analysis between the soil unconfined compression strength and input variables in this study.

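$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|p_i - e_i\right| \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - e_i\right)^2} \tag{2}$$

$$R = \frac{\sum_{i=1}^{n}\left(p_i - \bar{p}\right)\left(e_i - \bar{e}\right)}{\sqrt{\sum_{i=1}^{n}\left(p_i - \bar{p}\right)^2 \sum_{i=1}^{n}\left(e_i - \bar{e}\right)^2}} \tag{3}$$

where n is the number of data used, $p_i$ and $\bar{p}$ are the ML predicted and mean ML predicted values, while $e_i$ and $\bar{e}$ are the experimental and mean experimental values of the UCS, respectively.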
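The three criteria can be computed directly from paired lists of experimental and predicted values. The following is a minimal sketch using only the Python standard library; the function names and the sample values are illustrative assumptions and are not data from this study.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between experimental and predicted values."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error between experimental and predicted values."""
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between experimental and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical UCS values in kG/cm2, for illustration only.
measured = [0.5, 1.0, 1.5, 2.0, 3.0]
estimated = [0.7, 1.1, 1.4, 1.8, 2.5]

print("MAE :", round(mae(measured, estimated), 3))
print("RMSE:", round(rmse(measured, estimated), 3))
print("R   :", round(pearson_r(measured, estimated), 3))
```

Note that R only measures how well the predictions follow a linear trend with the experimental values, so it is usually read together with MAE and RMSE, which measure the size of the errors.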

4. RESULTS AND DISCUSSION

4.1. Prediction Capability of RF

For UCS modeling and prediction, the RF algorithm was run 20 times, each time with a random shuffle of the training dataset (70% of the total samples), and the results of the best configuration were retained. The best configuration was the one that gave the highest value of R and the lowest values of RMSE and MAE. The results and the corresponding error values are presented below.
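The repeated random splitting described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually implemented in this study: `fit_and_score` is a hypothetical user-supplied callable that trains a model and returns its scores, the 70% share is rounded to the nearest integer, and runs are ranked by R alone for simplicity.

```python
import random

def split_indices(n_samples, train_fraction, rng):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]

def best_of_repeated_splits(n_samples, n_runs, fit_and_score, seed=0):
    """Repeat the random split n_runs times and keep the run with the highest R."""
    rng = random.Random(seed)
    best = None
    for run in range(n_runs):
        train_idx, test_idx = split_indices(n_samples, 0.7, rng)
        # fit_and_score (hypothetical) trains a model on train_idx and returns
        # a dict with the keys "R", "RMSE" and "MAE".
        scores = fit_and_score(train_idx, test_idx)
        if best is None or scores["R"] > best["scores"]["R"]:
            best = {"run": run, "train": train_idx, "test": test_idx,
                    "scores": scores}
    return best
```

A call such as `best_of_repeated_splits(118, 20, fit_and_score)` would mirror the 118 samples and 20 runs mentioned above. Because the retained run is the best of several, its scores tend to be more favorable than those of a typical split.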

The out-of-bag regression error in predicting the soil UCS is shown in Fig. (2). The error stabilized at about 200 grown trees, so growing more trees does not seem necessary. This indicates that the number of trees used was sufficient to obtain converged prediction results.

The validation of the RF algorithm is shown in Fig. (3). For the training dataset, the experimental data (continuous black line) and the predicted UCS values (red circles, discontinuous line) obtained by the RF model were in close agreement. The maximum error between the predicted and experimental UCS was about 1 kG/cm2, for only 3 samples, whereas only 4 samples exhibited an error of about 0.5 kG/cm2. The MSE was 0.152 and the RMSE was 0.390 kG/cm2. The remaining errors were close to 0, and the error distribution was rather narrow, with a mean of -0.018 and a standard deviation of 0.392. These results indicate that the RF model is effective in predicting the UCS for the training data.

Regarding the testing dataset (30%), the results predicted using RF are shown in Fig. (4). Similar to the training part, the experimental data and the predicted UCS values were in good agreement. In this case, the maximum error was close to 1.5 kG/cm2 (only 1 sample), whereas only 3 samples exhibited an error of about 1 kG/cm2. The MSE for the testing dataset was 0.211 and the RMSE was 0.460 kG/cm2. The errors were centered on 0, with a mean of -0.093 and a standard deviation of 0.457. The accuracy on the testing part was lower than on the training part, as expected for data not seen during training. The moderate gap between the two suggests that the model was not severely overfitted.
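As a quick consistency check on the reported error values, the RMSE is by definition the square root of the MSE:

```latex
\mathrm{RMSE}_{\text{train}} = \sqrt{0.152} \approx 0.390, \qquad
\mathrm{RMSE}_{\text{test}} = \sqrt{0.211} \approx 0.459
```

The training value matches the reported 0.390 kG/cm2, and the testing value agrees with the reported 0.460 kG/cm2 up to rounding of the MSE.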

Fig. (2). Out-of-bag error as a function of the number of grown trees.

Fig. (3). Comparison of experimental and predicted values of the UCS using RF model along with error distribution, mean error and standard deviations for the training dataset.

Fig. (4). Comparison of experimental and predicted values of the UCS using RF model along with error distribution, mean error and standard deviations for the testing dataset.

Fig. (5). Prediction capability of RF algorithm for the UCS in a regression form for the training and testing datasets.

Fig. (6). Out-of-Bag feature importance of 6 variables used in this study using RF algorithm.

The linear fit lines, their equations and the R values are given in Fig. (5) for the training and testing datasets. The performance of RF in predicting the compressive strength values was satisfactory, with R = 0.914 and R = 0.848 for the training and testing parts, respectively. The slopes of the two linear fits plotted in Fig. (5) were 0.68 and 0.65 for the training and testing datasets, respectively, and the corresponding intercepts were 0.47 and 0.56.
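A linear fit of this kind can be obtained by ordinary least squares. The sketch below uses only the Python standard library and assumes that the experimental values are on the horizontal axis and the predicted values on the vertical axis; the function name and the sample values are illustrative and are not data from this study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical UCS values in kG/cm2, for illustration only.
measured = [0.5, 1.0, 1.5, 2.0, 3.0]
estimated = [0.7, 1.1, 1.4, 1.8, 2.5]

slope, intercept = fit_line(measured, estimated)
print(f"predicted = {slope:.2f} * measured + {intercept:.2f}")
```

Under the same axis assumption, a slope below 1 combined with a positive intercept means the fitted line lies above the 1:1 line for low UCS values and below it for high values.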

This result demonstrates that the proposed RF model is suitable and can predict the soil compressive strength values which are, in general, close to experimental values.

4.2. Sensitivity Analysis

The RF algorithm naturally allows the significance of the input parameters to be evaluated. The predictor importance values were estimated by summing the changes in the risk due to splits on every predictor and dividing the sum by the number of branch nodes. Fig. (6) illustrates the out-of-bag feature importance of the variables used in this study. The specific gravity (X3) is the most important variable in predicting the soil UCS, as this factor is related to the density of the particles present in the soil [42]. The clay content (X1) is the second most important input parameter, followed by the liquid limit (X5) and plastic limit (X6), which are of equal importance, and then the moisture content (X2) and void ratio (X4), which also have a similar level of importance.
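The split-based estimate described above (sum of the risk changes for each predictor, divided by the number of branch nodes) can be sketched for a single tree as follows. The function name, the representation of a tree as a list of (predictor index, risk reduction) pairs, and the numbers are hypothetical; for a forest, one would assume the per-tree values are averaged over all trees.

```python
def predictor_importance(split_records, n_predictors):
    """Sum the risk reduction of the splits made on each predictor and divide
    the sum by the total number of branch nodes."""
    totals = [0.0] * n_predictors
    for predictor_index, risk_reduction in split_records:
        totals[predictor_index] += risk_reduction
    n_branch_nodes = len(split_records)
    return [total / n_branch_nodes for total in totals]

# Hypothetical branch nodes of one small tree: (predictor index, risk reduction).
# Indices 0..5 stand for X1..X6; the numbers are made up for illustration.
splits = [(2, 0.30), (0, 0.12), (2, 0.08), (4, 0.05)]
importance = predictor_importance(splits, n_predictors=6)

for name, value in zip(["X1", "X2", "X3", "X4", "X5", "X6"], importance):
    print(f"{name}: {value:.4f}")
```

Predictors that are never used for a split receive an importance of zero, and predictors used often or for large risk reductions receive the highest values.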

CONCLUSION

The soil unconfined compressive strength is one of the important mechanical properties in civil engineering. In this study, the possibility of using the Random Forest algorithm to predict the unconfined compressive strength of soil was investigated. A dataset containing 118 samples was constructed, taking the clay content, moisture content, specific gravity, void ratio, liquid limit and plastic limit as input variables and the soil unconfined compressive strength as the output. The reliability of the results was first verified by analyzing the out-of-bag error against the number of trees. The RF prediction process was then conducted, and RF was found to be a good predictor, with R and RMSE values of 0.848 and 0.460 kG/cm2 on the testing dataset and a mean error of -0.093 kG/cm2. A sensitivity analysis was carried out to reveal the importance of each input to the predicted UCS. The specific gravity was found to be the most influential feature, followed by clay content, liquid limit, plastic limit, moisture content and void ratio.

Several perspectives for this study can be envisioned: (i) collecting more UCS data so as to cover a wider range of input and output variables; (ii) analyzing the robustness of the RF algorithm with respect to random data splitting using Monte Carlo simulations [43], as it is well known that the accuracy of any ML algorithm strongly depends on the construction of the training dataset; and (iii) applying other ML algorithms or hybrid techniques to improve the prediction performance.

NOMENCLATURES

UCS  = Unconfined Compressive Strength
RF  = Random Forest
R  = Correlation Coefficient
RMSE  = Root Mean Square Error
MSE  = Mean Square Error
MAE  = Mean Absolute Error
CART  = Classification and Regression Trees
X1  = Clay Content
X2  = Moisture Content
X3  = Specific Gravity
X4  = Void Ratio
X5  = Liquid Limit
X6  = Plastic Limit

CONSENT FOR PUBLICATION

Not applicable.

AVAILABILITY OF DATA AND MATERIALS

Not applicable.

FUNDING

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 105.08-2019.03.

CONFLICT OF INTEREST

The authors declare no conflict of interest, financial or otherwise.

ACKNOWLEDGEMENTS

Declared none.

REFERENCES

[1] D.L. Rowell, Soil science: Methods & applications., Routledge, 2014.
[2] K. Terzaghi, R.B. Peck, and G. Mesri, Soil mechanics., John Wiley & Sons: New York, 1996.
[3] B.T. Pham, M.D. Nguyen, D.V. Dao, I. Prakash, H.B. Ly, T.T. Le, L.S. Ho, K.T. Nguyen, T.Q. Ngo, V. Hoang, L.H. Son, H.T.T. Ngo, H.T. Tran, N.M. Do, H. Van Le, H.L. Ho, and D. Tien Bui, "Development of artificial intelligence models for the prediction of Compression Coefficient of soil: An application of Monte Carlo sensitivity analysis", Sci. Total Environ., vol. 679, pp. 172-184, 2019.
[4] V. Monchiet, H-B. Ly, and D. Grande, "Macroscopic permeability of doubly porous materials with cylindrical and spherical macropores", Meccanica, vol. 54, no. 10, pp. 1583-1596, 2019.
[5] H.B. Ly, V. Monchiet, and D. Grande, "Computation of permeability with Fast Fourier Transform from 3-D digital images of porous microstructures", Int. J. Numer. Methods Heat Fluid Flow, vol. 26, no. 5, pp. 1328-1345, 2016.
[6] H.B. Ly, B. Le Droumaguet, V. Monchiet, and D. Grande, "Designing and modeling doubly porous polymeric materials", Eur. Phys. J. Spec. Top., vol. 224, no. 9, pp. 1689-1706, 2015.
[7] S. Horpibulsuk, A. Chinkulkijniwat, A. Cholphatsorn, J. Suebsuk, and M.D. Liu, "Consolidation behavior of soil–cement column improved ground", Comput. Geotech., vol. 43, pp. 37-50, 2012.
[8] Z. Shan, D. Ling, and H. Ding, "Exact solutions for one-dimensional consolidation of single-layer unsaturated soil", Int. J. Numer. Anal. Methods Geomech., vol. 36, no. 6, pp. 708-722, 2012.
[9] L.K. Sharma, and T.N. Singh, "Regression-based models for the prediction of unconfined compressive strength of artificially structured soil", Eng. Comput., vol. 34, no. 1, pp. 175-186, 2018.
[10] C. Qi, H-B. Ly, Q. Chen, T-T. Le, V.M. Le, and B.T. Pham, "Flocculation-dewatering prediction of fine mineral tailings using a hybrid machine learning approach", Chemosphere, vol. 244, p. 125450, 2020.
[11] B.T. Pham, "A novel intelligence approach of a sequential minimal optimization-based support vector machine for landslide susceptibility mapping", Sustainability, vol. 11, no. 22, p. 6323, 2019.
[12] T.V. Phong, "Landslide susceptibility modeling using different artificial intelligence methods: a case study at Muong Lay district, Vietnam", Geocarto Int., vol. 0, no. 0, pp. 1-24, 2019.
[13] D.V. Dao, "A spatially explicit deep learning neural network model for the prediction of landslide susceptibility", Catena, vol. 188, p. 104451, 2020.
[14] K. Khosravi, "A comparative assessment of flood susceptibility modeling using Multi-Criteria Decision-Making Analysis and Machine Learning Methods", J. Hydrol. (Amst.), vol. 573, pp. 311-323, 2019.
[15] T-T. Le, B.T. Pham, H-B. Ly, A. Shirzadi, and L.M. Le, "Development of 48-hour precipitation forecasting model using nonlinear autoregressive neural network", CIGOS 2019, Innovation for Sustainable Infrastructure: Singapore, 2020, pp. 1191-1196.
[16] B.T. Pham, "Development of advanced artificial intelligence models for daily rainfall prediction", Atmos. Res., vol. 237, p. 104845, 2020.
[17] H-B. Ly, L.M. Le, L.V. Phi, V.H. Phan, V.Q. Tran, B.T. Pham, T.T. Le, and S. Derrible, "Development of an ai model to measure traffic air pollution from multisensor and weather data", Sensors (Basel), vol. 19, no. 22, p. 4941, 2019.
[18] P.G. Asteris, M. Apostolopoulou, A.D. Skentou, and A. Moropoulou, "Application of artificial neural networks for the prediction of the compressive strength of cement-based mortars", Comput. Concr., vol. 24, no. 4, pp. 329-345, 2019.
[19] P.G. Asteris, A. Ashrafian, and M. Rezaie-Balf, "Prediction of the compressive strength of self-compacting concrete using surrogate models", Comput. Concr., vol. 24, no. 2, pp. 137-150, 2019.
[20] P.G. Asteris, P.C. Roussis, and M.G. Douvika, "Feed-forward neural network prediction of the mechanical properties of sandcrete materials", Sensors (Basel), vol. 17, no. 6, p. 1344, 2017.
[21] P.G. Asteris, and K.G. Kolovos, "Self-compacting concrete strength prediction using surrogate models", Neural Comput. Appl., vol. 31, no. 1, pp. 409-424, 2019.
[22] P.G. Asteris, and V.G. Mokos, "Concrete compressive strength using artificial neural networks", Neural Comput. Appl., no. Dec, 2019.
[23] D.V. Dao, "A sensitivity and robustness analysis of gpr and ann for high-performance concrete compressive strength prediction using a Monte Carlo Simulation", Sustainability, vol. 12, no. 3, p. 830, 2020.
[24] P.G. Asteris, "Stochastic vulnerability assessment of masonry structures: concepts, modeling and restoration aspects", Appl. Sci. (Basel), vol. 9, no. 2, p. 243, 2019.
[25] P.G. Asteris, and M. Nikoo, "Artificial bee colony-based neural network for the prediction of the fundamental period of infilled frame structures", Neural Comput. Appl., vol. 31, no. 9, pp. 4837-4847, 2019.
[26] H-B. Ly, "Development of hybrid machine learning models for predicting the critical buckling load of I-shaped cellular beams", Appl. Sci. (Basel), vol. 9, no. 24, p. 5458, 2019.
[27] M.D. Nguyen, "Development of an artificial intelligence approach for prediction of consolidation coefficient of soft soil: a sensitivity analysis", Open Constr. Build. Technol. J., vol. 13, no. 1, 2019.
[28] B.M. Das, and K. Sobhan, Principles of geotechnical engineering., Cengage Learning, 2013.
[29] L. Breiman, "Random forests", Mach. Learn., vol. 45, no. 1, pp. 5-32, 2001.
[30] A-L. Boulesteix, S. Janitza, J. Kruppa, and I.R. König, "Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics", Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 2, no. 6, pp. 493-507, 2012.
[31] H-B. Ly, E. Monteiro, T.T. Le, V.M. Le, M. Dal, G. Regnier, and B.T. Pham, "Prediction and sensitivity analysis of bubble dissolution time in 3d selective laser sintering using ensemble decision trees", Materials (Basel), vol. 12, no. 9, p. 1544, 2019.
[32] M. Pal, "Random forest classifier for remote sensing classification", Int. J. Remote Sens., vol. 26, no. 1, pp. 217-222, 2005.
[33] P.O. Gislason, J.A. Benediktsson, and J.R. Sveinsson, "Random forests for land cover classification", Pattern Recognit. Lett., vol. 27, no. 4, pp. 294-300, 2006.
[34] D.V. Dao, S.H. Trinh, H-B. Ly, and B.T. Pham, "Prediction of compressive strength of geopolymer concrete using entirely steel slag aggregates: novel hybrid artificial intelligence approaches", Appl. Sci. (Basel), vol. 9, no. 6, p. 1113, 2019.
[35] D.V. Dao, H-B. Ly, S.H. Trinh, T-T. Le, and B.T. Pham, "Artificial intelligence approaches for prediction of compressive strength of geopolymer concrete", Materials (Basel), vol. 12, no. 6, p. 983, 2019.
[36] L.M. Le, H.B. Ly, B.T. Pham, V.M. Le, T.A. Pham, D.H. Nguyen, X.T. Tran, and T.T. Le, "Hybrid artificial intelligence approaches for predicting buckling damage of steel columns under axial compression", Materials (Basel), vol. 12, no. 10, p. 1670, 2019.
[37] H-B. Ly, "Hybrid artificial intelligence approaches for predicting critical buckling load of structural members under compression considering the influence of initial geometric imperfections", Appl. Sci. (Basel), vol. 9, no. 11, p. 2258, 2019.
[38] H-B. Ly, B.T. Pham, D.V. Dao, V.M. Le, L.M. Le, and T-T. Le, "Improvement of ANFIS model for prediction of compressive strength of manufactured sand concrete", Appl. Sci. (Basel), vol. 9, no. 18, p. 3841, 2019.
[39] H-L. Nguyen, "Development of hybrid artificial intelligence approaches and a support vector machine algorithm for predicting the marshall parameters of stone matrix asphalt", Appl. Sci. (Basel), vol. 9, no. 15, p. 3172, 2019.
[40] H-L. Nguyen, "Adaptive network based fuzzy inference system with meta-heuristic optimizations for international roughness index prediction", Appl. Sci. (Basel), vol. 9, no. 21, p. 4715, 2019.
[41] R. Taylor, "Interpretation of the correlation coefficient: a basic review", J. Diagn. Med. Sonogr., vol. 6, no. 1, pp. 35-39, 1990.
[42] N.C. Consoli, G.V. Rotta, and P.D.M. Prietto, "Yielding–compressibility–strength relationship for an artificially cemented soil cured under stress", Geotechnique, vol. 56, no. 1, pp. 69-72, 2006.
[43] H-B. Ly, C. Desceliers, L. Minh Le, T.T. Le, B. Thai Pham, L. Nguyen-Ngoc, V.T. Doan, and M. Le, "Quantification of uncertainties on the critical buckling load of columns under axial compression with uncertain random materials", Materials (Basel), vol. 12, no. 11, p. 1828, 2019.