Abstract
Groundwater management decisions require robust methods for accurate predictive modeling of pollutant occurrence. In this study, random forest regression (RFR) was used to model groundwater nitrate contamination at the scale of the African continent. Compared with more conventional techniques, the key advantages of RFR are its nonparametric nature, its high predictive accuracy, and its ability to quantify variable importance. Variable importance helps to clarify the individual role and the combined effect of the explanatory variables in a predictive model. Because no systematic groundwater monitoring program exists at the continental scale, the modeling approach was tested on a groundwater nitrate contamination database obtained from a meta-analysis, in which 250 groundwater nitrate pollution studies from the African continent were compiled from the literature. A geographic information system database of 13 spatial attributes was assembled, covering land use, soil type, hydrogeology, topography, climatology, type of region, and nitrogen fertilizer application rate; these attributes served as predictors. The performance of RFR was evaluated against multiple linear regression (MLR). RFR made it possible to establish which explanatory variables influence the occurrence of nitrate pollution in groundwater (population density, rainfall, recharge, etc.). Both RFR and MLR identified population density as the most important variable explaining reported nitrate contamination. However, RFR had much higher predictive power (R² = 0.97) than the traditional linear regression model (R² = 0.64). RFR is therefore considered a very promising technique for large-scale modeling of groundwater nitrate pollution.
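To make the comparison concrete, the following minimal Python sketch contrasts a two-predictor linear regression with a small random-forest-style ensemble and scores both with R². It is an illustration under stated assumptions, not the study's implementation: the data are synthetic, the predictors `x1` and `x2` and all function names are hypothetical, the trees are deliberately simplified, and variable importance is estimated by in-sample permutation rather than from out-of-bag samples.

```python
import random
import statistics

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_linear(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (two predictors only)."""
    x1, x2 = [r[0] for r in rows], [r[1] for r in rows]
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (t - my) for a, t in zip(x1, y))
    s2y = sum((b - m2) * (t - my) for b, t in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # determinant of the centered normal equations
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return lambda r: b0 + b1 * r[0] + b2 * r[1]

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)

def grow_tree(rows, y, rng, n_try=1, max_depth=6, min_leaf=2):
    """Regression tree; each split considers n_try randomly chosen predictors."""
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return statistics.fmean(y)  # leaf node: mean response
    best = None
    for j in rng.sample(range(len(rows[0])), n_try):
        for cut in sorted({r[j] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, y) if r[j] <= cut]
            right = [t for r, t in zip(rows, y) if r[j] > cut]
            if min(len(left), len(right)) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, cut)
    if best is None:  # no admissible split on the sampled predictor(s)
        return statistics.fmean(y)
    _, j, cut = best
    lo = [(r, t) for r, t in zip(rows, y) if r[j] <= cut]
    hi = [(r, t) for r, t in zip(rows, y) if r[j] > cut]
    kids = [grow_tree([r for r, _ in part], [t for _, t in part],
                      rng, n_try, max_depth - 1, min_leaf) for part in (lo, hi)]
    return (j, cut, kids[0], kids[1])

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(node, tuple):
        j, cut, left, right = node
        node = left if row[j] <= cut else right
    return node

def grow_forest(rows, y, n_trees=30, seed=0):
    """Grow each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return statistics.fmean([predict_tree(tree, row) for tree in forest])

def permutation_importance(forest, rows, y, j, seed=1):
    """Drop in R2 after shuffling predictor j (in-sample, a simplification)."""
    column = [r[j] for r in rows]
    random.Random(seed).shuffle(column)
    permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
    base = r_squared(y, [predict_forest(forest, r) for r in rows])
    return base - r_squared(y, [predict_forest(forest, r) for r in permuted])

# Synthetic data (an assumption, not the study's data): a U-shaped effect of x1
# plus a weak linear effect of x2, evaluated on a 10 x 10 grid.
rows = [(i / 9, j / 9) for i in range(10) for j in range(10)]
y = [10 * abs(x1 - 0.5) + 2 * x2 for x1, x2 in rows]
linear = fit_linear(rows, y)
forest = grow_forest(rows, y)
r2_linear = r_squared(y, [linear(r) for r in rows])
r2_forest = r_squared(y, [predict_forest(forest, r) for r in rows])
importance = [permutation_importance(forest, rows, y, j) for j in range(2)]
print(f"linear R2 = {r2_linear:.2f}, forest R2 = {r2_forest:.2f}")
print("permutation importance (x1, x2):", [round(v, 2) for v in importance])
```

On this synthetic grid the linear fit explains only about 14% of the variance, because a straight line cannot represent the U-shaped term, whereas the tree ensemble can follow it. Both R² values here are computed on the training data and are therefore optimistic; a held-out or out-of-bag evaluation would be the safer basis for comparing models in practice.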
Acknowledgments
This work was funded by the Islamic Development Bank (IDB) under its Ph.D. Merit Scholarship Program (MSP). GIS shapefiles used to generate the generic attributes were obtained from various sources worldwide and online. In this regard, special thanks go to T. Gleeson, P. Döll, N. Moosdorf, and P. Trambauer. I would like to thank all colleagues, particularly Mr. V. Antharam, for valuable discussions on the random forest method. We also thank Dr. Lixiang Lin and two reviewers for their constructive comments on the initial version of the paper.
Cite this article
Ouedraogo, I., Defourny, P. & Vanclooster, M. Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale. Hydrogeol J 27, 1081–1098 (2019). https://doi.org/10.1007/s10040-018-1900-5
DOI: https://doi.org/10.1007/s10040-018-1900-5