Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale

Hydrogeology Journal

Abstract

Groundwater management decisions require robust methods that allow accurate predictive modeling of pollutant occurrences. In this study, random forest regression (RFR) was used to model groundwater nitrate contamination at the African continent scale. Compared to more conventional techniques, key advantages of RFR include its nonparametric nature, its high predictive accuracy, and its capability to determine variable importance. The latter can be used to better understand the individual role and the combined effect of explanatory variables in a predictive model. In the absence of a systematic groundwater monitoring program at the African continent scale, the study tested the modeling approach on a groundwater nitrate contamination database for the continent obtained from a meta-analysis; 250 groundwater nitrate pollution studies from the African continent were compiled from the literature. A geographic information system database of 13 spatial attributes was assembled, covering land use, soil type, hydrogeology, topography, climatology, type of region, and nitrogen fertilizer application rate, and these attributes were assigned as predictors. RFR performance was evaluated against multiple linear regression (MLR). With RFR, it was possible to establish which explanatory variables influence the occurrence of nitrate pollution in groundwater (population density, rainfall, recharge, etc.). Both the RFR and MLR techniques identified population density as the most important variable explaining reported nitrate contamination. However, RFR has a much higher predictive power (R² = 0.97) than a traditional linear regression model (R² = 0.64). RFR is therefore considered a very promising technique for large-scale modeling of groundwater nitrate pollution.
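The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' workflow or data. It assumes scikit-learn is available, uses purely synthetic predictors, and borrows three predictor names from the abstract only as labels. The response function, sample size, and hyperparameters are arbitrary choices made for illustration. The sketch fits a random forest and a multiple linear regression on the same training split, scores both with R² on a held-out split, and reads the forest's impurity-based variable importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_rfr_mlr(seed=0, n_samples=1000):
    """Fit RFR and MLR on synthetic data; return held-out R² and importances."""
    rng = np.random.default_rng(seed)

    # Hypothetical predictor labels; the values are uniform random numbers,
    # not real spatial attributes.
    names = ["population_density", "rainfall", "recharge"]
    X = rng.uniform(0.0, 1.0, size=(n_samples, len(names)))

    # Arbitrary nonlinear synthetic response (an assumption for illustration):
    # an oscillating effect of the first predictor, a threshold effect of the
    # second, no effect of the third, plus Gaussian noise.
    y = (
        10.0 * np.sin(4.0 * np.pi * X[:, 0])
        + 5.0 * (X[:, 1] > 0.5)
        + rng.normal(0.0, 0.5, size=n_samples)
    )

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )

    mlr = LinearRegression().fit(X_train, y_train)
    rfr = RandomForestRegressor(n_estimators=200, random_state=seed)
    rfr.fit(X_train, y_train)

    return {
        "r2_mlr": r2_score(y_test, mlr.predict(X_test)),
        "r2_rfr": r2_score(y_test, rfr.predict(X_test)),
        # Impurity-based importances; they sum to 1 across predictors.
        "importance": dict(zip(names, rfr.feature_importances_)),
    }


result = compare_rfr_mlr()
print(f"MLR held-out R2: {result['r2_mlr']:.2f}")
print(f"RFR held-out R2: {result['r2_rfr']:.2f}")
for name, score in sorted(result["importance"].items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Because the synthetic response is strongly nonlinear, the forest should score well above the linear model here. On real data the gap depends on how nonlinear the relationships are and on whether R² is computed on training or held-out samples.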

References

  • Abrahart RJ et al (2008) Practical hydroinformatics. computational intelligence and technological developments in water applications. Open Model Integration in Flood Forecasting 68

  • Aljazzar TH (2010) Adjustment of DRASTIC vulnerability index to assess groundwater vulnerability for nitrate pollution using the advection-diffusion cell. Von der Fakultät für Georessourcen und Materialtechnik der Rheinisch-Westfälischen Technischen Hochschule Aachen Ph.D. thesis, 146 pp

  • Alley WM, Healy RW, LaBaugh JW, Reilly TE (2002) Flow and storage in groundwater systems. Science 296(5575):1985–1990

    Article  Google Scholar 

  • Andrade AIASS, Stigter TY (2009) Multi-method assessment of nitrate and pesticide contamination in shallow alluvial groundwater as a function of hydrogeological setting and land use. Agric Water Manag 96(12):1751–1765

    Article  Google Scholar 

  • Anning DW, Paul AP, McKinney TS, Huntington JM, Bexfield LM, Thiros SA (2012) Predicted nitrate and arsenic concentrations in basin-fill aquifers of the southwestern United States. US Geological Survey Scientific Investigations Report 2012–5065

  • Anuraga TS, Ruiz L, Kumar MSM, Sekhar M, Leijnse A (2006) Estimating groundwater recharge using land use and soil data: a case study in South India. Agric Water Manag 84(1–2):65–76

    Article  Google Scholar 

  • Barzegar et al (2018) Mapping groundwater contamination risk of multiple aquifers using multi-model ensemble of machine learning algorithms. Sci Total Environ 621(2018):697–712. https://doi.org/10.1016/j.scitotenv.2017.11.185

    Article  Google Scholar 

  • Bauder J, Sinclair KN, Lund RE (1993) Physiographic and land use characteristics associated with nitrate-nitrogen in Montana groundwater. J Environ Qual 22(2):255–262. https://doi.org/10.2134/jeq1993.00472425002200020004x

    Article  Google Scholar 

  • BGS (2011) Depth to groundwater map. https://www.bgs.ac.uk/downloads/browse.cfm?sec=9&cat=38. Accessed 19 April 2014

  • Bonsor HC, MacDonald AM (2011) An initial estimate of depth to groundwater across Africa. British Geological Survey Open Report OR/11/067: 26pp

  • Boy-Roura M, Nolan BT, Menció A, Mas-Pla J (2013) Regression model for aquifer vulnerability assessment of nitrate pollution in the Osona region (NE Spain). J Hydrol 505:150–162

    Article  Google Scholar 

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

    Google Scholar 

  • Breiman L (2001a) Random forests. Mach Learn 45:5–32

    Article  Google Scholar 

  • Breiman L (2001b) Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci 16(3):199–231

    Article  Google Scholar 

  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton

    Google Scholar 

  • Burow KR, Nolan BT, Rupert MG, Dubrovsky NM (2010) Nitrate in groundwater of the United States, 1991−2003. Environ Sci Technol 44(13):4988–4997

    Article  Google Scholar 

  • Cameron KC, Di HJ, Moir JL (2013) Nitrogen losses from the soil/plant system: a review. Ann Appl Biol 162(2):145–173

    Article  Google Scholar 

  • Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson JC, Lawler JJ (2007) Random forests for classification in ecology. Ecology 88(11):2783–2792. https://doi.org/10.1890/07-0539.1

    Article  Google Scholar 

  • Davis DB, Sylvester-Bradley R (1995) The contribution of fertiliser nitrogen to leachable nitrogen in the UK: a review. J Sci Food Agric 68:399–406. https://doi.org/10.1002/jsfa.2740680402

    Article  Google Scholar 

  • Debernardi L, De-Luca DA, Lasahna M (2007) Correlation between nitrate concentration in groundwater and parameters affecting aquifer intrinsic vulnerability. Environ Geol 55:539–558

    Article  Google Scholar 

  • Defourny P, Kirches G, Brockmann C, Boettcher M, Peters M, Bontemps S, et al (2014) Land cover CCI product user guide version 2. 2014

  • Döll P, Fiedler K (2008) Global-scale modeling of groundwater recharge. Hydrol Earth Syst Sci 12:863–885. https://doi.org/10.5194/hess-12-863-2008,2008

  • Dubrovsky NM, Burow KR, Clark GM, Gronberg JM, Hamilton PA, Hitt KJ, Mueller DK, Munn MD, Nolan BT, Puckett LJ, Rupert MG, Short TM, Spahr NE, Sprague LA, Wilber WG (2010) The quality of our nation’s waters—nutrients in the nation’s streams and groundwater, 1992–2004. US Geological Survey Circular 1350, 174 pp

  • ESRI (1969) ArcGIS, www.arcgis.com/home. Accessed 23 June 2015

  • Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181

    Google Scholar 

  • Foster S, Pulido-Bosch A, Vallejos Á, Molina L, Llop A, MacDonald AM (2018) Impact of irrigated agriculture on groundwater-recharge salinity: a major sustainability concern in semi-arid regions. Hydrogeol J. https://doi.org/10.1007/s10040-018-1830-2

  • Fram MS, Belitz K (2011) Probability of detecting perchlorate under natural conditions in deep groundwater in California and the southwestern United States. Environ Sci Technol 45(4):1271–1277

    Article  Google Scholar 

  • Friedl MA, Brodley CE, Strahler AH (1999) Maximizing land cover classification accuracies produced by decision trees at continental to global scales. IEEE Trans Geosci Remote Sens 37(2 II):969–977

    Article  Google Scholar 

  • Gassiat C, Gleeson T, Luijendijk E (2013) The location of old groundwater in hydrogeologic basins and layered aquifer systems. Geophys Res Lett 40(12):3042–3047. https://doi.org/10.1002/grl.50599

  • Gemitzi A, Petalas C, Pisinaras V, Tsihrintzis VA (2009) Spatial prediction of nitrate pollution in groundwaters using neural networks and GIS: an application to south Rhodope aquifer (Thrace, Greece). Hydrol Process 23(3):372–383. https://doi.org/10.1002/hyp.7143

    Article  Google Scholar 

  • Genuer R, Poggi JM, Christine TM (2010) Variable selection using random forests. Pattern Recogn Lett 31(14):2225–2236

    Article  Google Scholar 

  • Gislason PO, Benediktsson JA, Sveinsson JR (2006) Random forests for land cover classification. Pattern Recogn Lett 27(4):294–300

    Article  Google Scholar 

  • Gleeson T, Moosdorf N, Hartmann J, van Beek LPH (2014) A glimpse beneath earth’s surface: global HYdrogeology MaPS (GLHYMPS) of permeability and porosity. Geophys Res Lett 41(11):3891–3898. https://doi.org/10.1002/2014GL059856

    Article  Google Scholar 

  • Golkarian A, Naghibi SA, Kalantar B, Pradhan B (2018) Groundwater potential mapping using C5. 0, random forest, and multivariate adaptive regression spline models in GIS. Environ Monit Assess 190(3):149. https://doi.org/10.1007/s10661-018-6507-8

  • Greene EA, LaMotte AE, Cullinan KA (2005) Ground-water vulnerability to nitrate contamination at multiple thresholds in the Mid-Atlantic region using spatial probability models. US Geological Survey Scientific Investigations Report 2004–5118, p 24

  • Graham MH (2003) Confronting multicollinearity in ecological multiple regression. Ecology 84(11) pp. 2809–2815. https://www.jstor.org/stable/3449952. Accessed 3 Feb 2016

  • Grömping U (2009) Variable importance assessment in regression: linear regression versus random forest. Am Stat 63(4):308–319. https://doi.org/10.1198/tast.2009.08199

    Article  Google Scholar 

  • Gurdak JJ, Qi SL (2012) Vulnerability of recently recharged groundwater in principle aquifers of the United States to nitrate contamination. Environ Sci Technol 46(11):6004–6012

    Article  Google Scholar 

  • Hansen LK, Salamon P (1990) Neural network ensembles. IEEE Trans Pattern Anal Mach Intell (10):993–1001

  • Hanson CR (2002) Nitrate concentrations in Canterbury ground water – a review of existing data. Report no. R02/17. Environment Canterbury Technical Report, 87 pp

  • Hao A, Zhang Y, Zhang E, Li Z, Yu J, Wang H, Yang J, Wang Y (2018) Review: groundwater resources and related environmental issues in China. Hydrogeol J. https://doi.org/10.1007/s10040-018-1787-1

  • Hartmann J, Moosdorf N (2012) The new global lithological map database GLiM: a representation of rock properties at the earth surface. Geochem Geophys Geosyst 13:Q12004. https://doi.org/10.1029/2012GC004370

    Article  Google Scholar 

  • Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning, 2nd edn. Springer

  • Hengl T, Hengl T, de Jesus JM, MacMillan RA, Batjes NH, Heuvelink GBM, Ribeiro E, Samuel-Rosa A, Kempen B, Leenaars JGB, Walsh MG, Gonzalez MR (2014) Soil-Grids1km – global soil information based on automated mapping. PLoS One 9:e105992. https://doi.org/10.1371/journal.pone.0105992

    Article  Google Scholar 

  • Hoyos ICP, Krakauer N, Khanbilvardi R (2015) Random forest for identification and characterization of groundwater dependent ecosystems. WIT Trans Ecol Environ 196:89–100

    Article  Google Scholar 

  • ISRIC (2014) SoilGrids – Global gridded soil information. (https://www.isric.org/explore/soilgrids, Accessed 19 July 2014). [Reference to paper: Hengl T, de Jesus JM, MacMillan RA, Batjes NH, Heuvelink GBM, et al. (2014) SoilGrids1km — global soil information based on automated mapping. PLoS ONE 9(8):e105992. https://doi.org/10.1371/journal.pone.0105992]

  • Jung Y-Y, Dong-Chan K, Won-Bae P, Kyoochul H (2015) Evaluation of multiple regression models using spatial variables to predict nitrate concentrations in volcanic aquifers. Hydrol Process 30(5):663–675

    Article  Google Scholar 

  • Kazemi G, Lehr J, Perrochet P (2006) Groundwater age. Wiley-Interscience, Hoboken, New Jersey. 325pp

    Book  Google Scholar 

  • Khalil A, Almasri MN, McKee M, Kaluarachchi JJ (2005) Applicability of statistical learning algorithms in groundwater quality modeling. Water Resour Res 41(5)

  • Kihumba AM, Longo JN, Vanclooster M (2015) Modelling nitrate pollution pressure using a multivariate statistical approach: the case of Kinshasa groundwater body. Democratic Republic of Congo. Hydrogeol J: 1–13. https://doi.org/10.1007/s10040-015-1337-z

  • Kulabako N, Nalubega M, Thunvik R (2007) Study of the impact of land use and hydrogeological settings on the shallow groundwater quality in a peri-urban area of Kampala, Uganda. Sci Total Environ 381(1):180–199. https://doi.org/10.1016/j.scitotenv.2007.03.035

    Article  Google Scholar 

  • Lapworth DJ, Nkhuwa DCW, Okotto-Okotto J, Pedley S, Stuart ME, Tijani MN, Wright J (2017) Urban groundwater quality in sub-Saharan Africa: current status and implications for water security and public health. Hydrogeol J 25(4):1093–1116. https://doi.org/10.1007/s10040-016-1516-6

  • Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22

    Google Scholar 

  • Liu CW, Wang Y-B, Jang C-S (2013) Probability-based nitrate contamination map of groundwater in Kinmen. Environ Monit Assess 185(12):10147–10156

    Article  Google Scholar 

  • Loosvelt L, Petersb J, Skriverc H, Lievensa H, Van Coillied FMB, De Baetsb B, Verhoesta NEC (2012) Random forests as a tool for estimating uncertainty at pixel-level in SAR image classification. Int J Appl Earth Obs Geoinf 19:173–184

    Article  Google Scholar 

  • Luo Y, Qiao X, Song J, Christie P, Wong M (2003) Use of a multi-layer column device for study on leachability of nitrate in sludge-amended soils. Chemosphere 52:1483–1488

    Article  Google Scholar 

  • MacDonald AM, Calow RC, MacDonald DM, Darling WG, Dochartaigh BÉÓ (2009) What impact will climate change have on rural groundwater supplies in Africa. Hydrol Sci J 64(690–703). 18pp

  • MacDonald AM, Taylor RG, Bonsor HC (2013) Groundwater in Africa – is there sufficient water to support the intensification of agriculture from “Land Grabs”? Hand book of land and water grabs in Africa, 9pp

  • Mair A, El-Kadi AI (2013) Logistic regression modeling to assess groundwater vulnerability to contamination in Hawaii, USA. J Contam Hydrol 153:1–23

    Article  Google Scholar 

  • Margat J (2010) Ressources et utilisation des eaux souterraines en Afrique. Managing Shared Aquifer Resources in Africa, Third International Conférence Tripoli 25–27 may 2008. International Hydrological Programme, Division of Water Sciences, IHP-VII Series on groundwater No.1, UNESCO, p 26–34

  • Masterson, JP, Hess KM, Walter DA, LeBlanc DR (2002) Simulated changes in the sources of ground water for public-supply wells, ponds, streams, and coastal areas on Western Cape Cod, Massachusetts. US Geological Survey Water Resources Investigations Report 02–4143

  • Mattern S, Vanclooster M (2009) Estimating travel time of recharge water through the unsaturated zone using transfer function model. Environ Fluid Mech. https://doi.org/10.1007/s10652-009-9148-1

  • Mattern S, Raouafi W, Bogaert P, Fasbender D, Vanclooster M (2012) Bayesian data fusion (BDF) of monitoring data with a statistical groundwater contamination model to map groundwater quality at the regional scale. J Water Resour Prot 4(11):929–943

  • Mendes MP, Rodriguez-Galiano V, Luque-Espinar JA, Ribeiro L, Chica- Olmo M (2016) Applying random forest to assess the vulnerability of groundwater to pollution by nitrates. geoENV 2016. The 11th International Conference onGeostatistics for Environmental Applications. Lisbon, Portugal. geoENV2016BookofAbstractsMPM

  • Moreno R, Zamora R, Molina JR, Vasquez A, Herrera MÁ (2011) Predictive modeling of microhabitats for endemic birds in south Chilean temperate forests using maximum entropy (Maxent). Eco Inform 6(6):364–370

    Article  Google Scholar 

  • Murtaugh PA (2009) Performance of several variable-selection methods applied to real ecological data. Ecol Lett 12(10):1061–1068

    Article  Google Scholar 

  • Naghibi SA, Ahmadi K, Daneshi A (2017) Application of support vector machine, random forest, and genetic algorithm optimized random forest models in groundwater potential mapping. Water Resour Manag 31(9):2761–2775. https://doi.org/10.1007/s11269-017-1660-3

    Article  Google Scholar 

  • Nelson A (2004) Population Density for Africa in 2000, 4th edn. Retrieved 1/27/2011 from UNEP/GRID Sioux Falls. https://databasin.org/datasets/4d59b959e8b040688037d2fe83a3f369. Accessed 19 April 2015

  • Nolan BT, Hitt KJ (2006) Vulnerability of shallow groundwater and drinking-water wells to nitrate in the United States. Environ Sci Technol 40(24):7834–7840. https://doi.org/10.1021/es060911u

    Article  Google Scholar 

  • Nolan BT, Hitt KJ, Ruddy BC (2002) Probability of nitrate contamination of recently recharged groundwaters in the conterminous United States. Environ Sci Technol 36(10):2138–2145. https://doi.org/10.1021/es0113854

    Article  Google Scholar 

  • Nolan BT, Fienen MN, Lorenz DL (2015) A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA. J Hydrol 531:902–911. https://doi.org/10.1016/j.jhydrol.2015.10.025

  • Nolan BT, Gronberg JM, Faunt CC, Eberts SM, Belitz K (2014) Modeling nitrate at domestic and public-supply well depths in the Central Valley, California. Environ Sci Technol 48(10):5643–5651. https://doi.org/10.1021/es405452q.

    Article  Google Scholar 

  • Norouz H, Negar AM, Attaallah N (2016) Determining vulnerable areas of Malekan Plain aquifer for nitrate, using random forest method. Journal of Environmental Studies, vol 41, no 4 (76), pp 923–942. http://www.sid.ir/En/Journal/ViewPaper.aspx?ID=550917. Accessed online 2 August 2018

  • Oliveira S, Oehler F, San-Miguel-Ayanz J, Camia A, Pereira JMC (2012) Modeling spatial patterns of fire occurrence in Mediterranean Europe using multiple regression and random forest. For Ecol Manag 275:117–129

    Article  Google Scholar 

  • Oppel S, Meirinho A, Ramírez I, Gardner B, O’Connell AF, Miller PI, Louzao, M (2012) Comparison of five modelling techniques to predict the spatial distribution and abundance of seabirds. Biol Conserv 156:94–104. https://doi.org/10.1016/j.biocon.2011.11.013

    Article  Google Scholar 

  • Ouedraogo I, Vanclooster M (2016a). A meta-analysis and statistical modelling of nitrates in groundwater at the African scale. In: Hydrology and Earth System Sciences 20(6):2353–2381

  • Ouedraogo I, Vanclooster M (2016b) Shallow groundwater poses pollution problem for Africa. SciDev.Net, 4 pp, http://hdl.handle.net/2078.1/169630

  • Ouedraogo I, Defourny P, Vanclooster M (2016) Mapping the groundwater vulnerability for pollution at the pan-African scale. In: Science of the Total Environment, 544:939–953. https://doi.org/10.1016/j.scitotenv.2015.11.135

  • Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens 26(1):217–222

    Article  Google Scholar 

  • Park N-W (2014) Using maximum entropy modeling for landslide susceptibility mapping with multiple geoenvironmental data sets. Environ Earth Sci 73(3):937–949

    Article  Google Scholar 

  • Pearson S (2015) Identifying Groundwater Vulnerability from Nitrate Contamination: Comparison of the DRASTIC model and Environment Canterbury’s method. Degree of Master of Applied Science (Environmental Management). Lincoln University. 58 pp

  • Peters J, Baets BD, Verhoest NEC, Samson R, Degroeve S, Becker PD, Huybrechts W (2007) Random forests as a tool for ecohydrological distribution modelling. Ecol Model 207(2–4):304–318

    Article  Google Scholar 

  • Potter P, Ramankutty N, Bennett EM, Donner SD (2010) Characterizing the spatial patterns of global fertilizer application and manure production. Earth Interact 14:1–22. https://doi.org/10.1175/2009EI288.1

    Article  Google Scholar 

  • Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199. https://doi.org/10.1007/s10021-005-0054-1

    Article  Google Scholar 

  • Puckett LJ, Tesoriero AJ, Dubrovsky NM (2011) Nitrogen contamination of surficial aquifers--a growing legacy. Environ Sci Technol 45(3):839–844. https://doi.org/10.1021/es1038358

    Article  Google Scholar 

  • R Development Core Team (2015) A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. http://www.r-project.org/. Last accessed 6 March 2015)

  • Ramasamy N, Krishnan P, Bernard JC, Ritter WF(2003) Modeling Nitrate Concentration in Ground Water Using Regression and Neural Networks. Department of Food and Resource Economics. College of Agriculture and Natural Resources. University of Delaware(ORES SP03–01). 10pp

  • Rankinen K, Salo T, Granlund K, Rita H (2007) Simulated nitrogen leaching, nitrogen mass field balances and their correlation on four farms in South-Western Finland during the period 2000–2005. Agric Food Sci 16:387–406

    Article  Google Scholar 

  • Ransom et al (2017). A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA. https://doi.org/10.1016/j.scitotenv.2017.05.192

  • Rawlings JO, Pantula SG, Dickey DA (1998) Applied regression analysis, a research tool. Springer, Berlin. 658p

    Book  Google Scholar 

  • Ritter A, Muñoz-Carpena R (2013) Performance evaluation of hydrological models: statistical significance for reducing subjectivity in goodness-of-fit assessments. J Hydrol 480:33–45. https://doi.org/10.1016/j.jhydrol.2012.12.004

    Article  Google Scholar 

  • Rodriguez-Galiano VF, Chica-Rivas M (2012) Evaluation of different machine learning methods for land cover mapping of a Mediterranean area using multi-seasonal Landsat images and digital terrain models. Int J Digital Earth 7(6):492–509

    Article  Google Scholar 

  • Rodriguez-Galiano VF, Chica-Olmo M, Abarca-Hernandez F, Atkinson PM, Jeganathan C (2012a) Random forest classification of Mediterranean land cover using multi-seasonal imagery and multi-seasonal texture. Remote Sens Environ 121:93–107

    Article  Google Scholar 

  • Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP (2012b) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104

    Article  Google Scholar 

  • Rodriguez-Galiano V, Mendes MP, Garcia-Soldado MJ, Chica-Olmo M, Ribeiro L (2014) Predictive modeling of groundwater nitrate pollution using random forest and multisource variables related to intrinsic and specific vulnerability: a case study in an agricultural setting (southern Spain). Sci Total Environ 476-477:189–206. https://doi.org/10.1016/j.scitotenv.2014.01.001

    Article  Google Scholar 

  • Saffigna PG, Keeney DR (1997) Nitrate and chloride in groundwater under irrigated agriculture in Central Wisconsin. Groundwater 15(2):170–177

    Article  Google Scholar 

  • Sahoo S, Russo TA, Elliott J, Foster I (2017) Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S. Water Resour Res 53:3878–3895. https://doi.org/10.1002/2016WR019933

    Article  Google Scholar 

  • Sajedi-Hosseini F, Malekian A, Choubin B, Rahmati O, Cipullo S, Coulon F, Pradhan B (2018) A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Sci Total Environ 644(2018):954–962. https://doi.org/10.1016/j.scitotenv.2018.07.054

    Article  Google Scholar 

  • Schweigert P, Pinter N, van der Ploeg R (2004) Regression analyses of weather effects on the annual concentrations of nitrate in soil and groundwater. J Plant Nutr Soil Sci 167(3):309–318

    Article  Google Scholar 

  • Sesnie SE, Gessler PE, Finegan B, Thessler S (2008) Integrating Landsat TM and SRTM-DEM derived variables with decision trees for habitat classification and change detection in complex neotropical environments. Remote Sens Environ 112(5):2145–2159

  • Sieling K, Kage H (2006) N balance as an indicator of N leaching in an oilseed rape – winter wheat – winter barley rotation. Agric Ecosyst Environ 115:261–269

  • Sophocleous M (2004) Groundwater recharge. In: Silveira L, Wohnlich S, Usunoff EL (eds), Groundwater. Encyclopedia of Life Support Systems (EOLSS), Developed under the Auspices of the UNESCO, Eolss Publishers, Oxford, UK. http://www.eolss.net. Accessed 9 September 2015

  • Spalding RF, Exner ME (1993) Occurrence of nitrate in groundwater: a review. J Environ Qual 22:392–402. https://doi.org/10.2134/jeq1993.00472425002200030002x

  • Steele BM (2000) Combining multiple classifiers: an application using spatial and remotely sensed information for land cover type mapping. Remote Sens Environ 74(3):545–556

  • Stevenson FJ, Cole MA (1999) Cycles of soil: carbon, nitrogen, phosphorus, sulfur, micronutrients, 2nd edn. Wiley, Hoboken

  • Stigter TY, Ribeiro L, Dill AMMC (2008) Building factorial regression models to explain and predict nitrate concentrations in groundwater under agricultural land. J Hydrol 357(1–2):42–56

  • Strobl C, Boulesteix AL, Zeileis A, Hothorn T (2007) Bias in random forest variable importance measures: illustrations, sources, and a solution. BMC Bioinf 8:25. https://doi.org/10.1186/1471-2105-8-25

  • Teng Y, Hu B, Zheng J, Wang J, Zhai Y, Zhu C (2018) Water quality responses to the interaction between surface water and groundwater along the Songhua River, NE China. Hydrogeol J. https://doi.org/10.1007/s10040-018-1738-x

  • Tesoriero AJ, Voss FD (1997) Predicting the probability of elevated nitrate concentrations in the Puget Sound Basin: implications for aquifer susceptibility and vulnerability. Ground Water 35(6):1029–1039

  • Thayalakumaran T, Charlesworth PB, Bristow K, van Bemmelen RJ, Jaffres J (2004) Nitrate and ferrous iron concentrations in the lower Burdekin aquifers: assessing denitrification potential. In: Singh B (ed) SuperSoil 2004 Conference, 3rd Australian New Zealand Soils Conference. The Regional Institute Ltd, Sydney, pp 1–9. https://researchoutput.csu.edu.au/en/publications/nitrate-and-ferrous-iron-concentrations-in-the-lower-burdekin-aqu, https://www.researchgate.net/publication/228513222_Nitrate_and_ferrous_iron_concentrations_in_the_lower_Burdekin_aquifers_assessing_denitrification_potenti. Accessed 17 Feb 2016

  • Trambauer P, Dutra E, Maskey S, Werner M, Pappenberger F, van Beek LPH, Uhlenbrook S (2014) Comparison of different evaporation estimates over the African continent. Hydrol Earth Syst Sci 18(1):193–212

  • UNECA, AU, AfDB (2000) The Africa Water Vision 2025: Equitable and Sustainable Use of Water for Socioeconomic Development. http://www.afdb.org/fileadmin/uploads/afdb/Documents/Generic-Documents/african%20water%20vision%202025%20to%20be%20sent%20to%20wwf5.pdf. Accessed 11 February 2016

  • UNEP (1986) Final Report: UNEP/FAO World and Africa GIS Data Base; December 1984. http://www.grid.unep.ch/data/summary.php?dataid=GNV38&category=atmosphere&dataurl=http://www.grid.unep.ch/data/download/gnv038.zip&browsen=http://www.grid.unep.ch/data/download/gnv038.gif. Accessed 17 June 2015

  • UNEP/DEWA (2014) Sanitation and Groundwater Protection – a UNEP Perspective. http://www.bgr.bund.de/EN/Themen/Wasser/Veranstaltungen/symp_sanitat-gwprotect/present_mmayi_pdf.pdf?__blob=publicationFile&v=2. Accessed 14 August 2014

  • Ward MH, deKok TM, Levallois P, Brender J, Gulis G, Nolan BT, VanDerslice J (2005) Workgroup report: drinking-water nitrate and health—recent findings and research needs. Environ Health Perspect 113(11):1607–1614. https://doi.org/10.1289/ehp.8043

  • Wheeler DC, Nolan BT, Flory AR, DellaValle CT, Ward MH (2015) Modeling groundwater nitrate concentrations in private wells in Iowa. Sci Total Environ 536:481–488. https://doi.org/10.1016/j.scitotenv.2015.07.080

  • Wick K, Heumesser C, Schmid E (2012) Groundwater nitrate contamination: factors and indicators. J Environ Manag 111:178–186

  • Xu Y, Usher B (2006) Groundwater pollution in Africa. Taylor & Francis/Balkema, the Netherlands, 353 pp

  • Yost AC et al (2008) Predictive modeling and mapping sage grouse (Centrocercus urophasianus) nesting habitat using maximum entropy and a long-term dataset from southern Oregon. Ecol Inform 3(6):375–386

  • Youssef AM, Pourghasemi HR, Pourtaghi ZS, Al-Katheeri MM (2015) Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 13(5):839–856

Acknowledgments

This work was funded by the Islamic Development Bank (IDB) under its Ph.D. Merit Scholarship Program (MSP). GIS shapefiles for generating generic attributes were obtained from various sources worldwide, some of them online. In this regard, special thanks go to T. Gleeson, P. Döll, N. Moosdorf, and P. Trambauer. We thank all colleagues, particularly Mr. V. Antharam, for valuable discussions on the random forest method. We also thank Dr. Lixiang Lin and two reviewers for their constructive comments on the initial version of the paper.

Author information

Corresponding author

Correspondence to Issoufou Ouedraogo.

About this article

Cite this article

Ouedraogo, I., Defourny, P. & Vanclooster, M. Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale. Hydrogeol J 27, 1081–1098 (2019). https://doi.org/10.1007/s10040-018-1900-5

  • DOI: https://doi.org/10.1007/s10040-018-1900-5
