Abstract
Environmental monitoring programs provide large multivariate data sets that usually cover considerable spatial and temporal variability. The apparent complexity of these data sets requires sophisticated tools for their processing. Usually, fixed schemes are followed, including the application of numerical models, which are increasingly implemented in decision support systems. However, these schemes are too rigid to detect unexpected features, such as the onset of subtle trends, non-linear relationships, or patterns that are restricted to limited sub-samples of the total data set. In this study, an alternative approach is followed. It is based on an efficient non-linear visualization of the data. Visualization is the most powerful interface between the computer and the human brain. The idea is to apply an efficient and model-free tool, that is, one that requires no prior assumptions about key properties of the data, such as dominant processes. In other words, the data processing aimed to preserve a maximum amount of information and to leave it to the expert to decide which features to analyze in more detail. A comprehensive data set from a 15-year monitoring program in the Lehstenbach watershed was used. The watershed is located in the Fichtelgebirge area, a mountainous region in southern Germany, where the land use is forestry. Stream water and groundwater were monitored at 38 sampling sites for 13 parameters. The data set was analyzed using a self-organizing map (SOM), combined with Sammon’s mapping. The 2D non-linear projection represented 89% of the variance of the data set. The visualization of the data set made it easy to detect outliers, to assess spatial versus temporal variance, and to verify a predefined classification of the sampling sites. Contamination of two of the observation wells was detected. Long-term trends of solute concentration in the catchment runoff could be differentiated from short-term dynamics, and a long-term shift in the dynamics was determined for different flow regimes individually. This analysis helped considerably to better understand the system’s behavior, to detect “hot spots,” and to organize subsequent analyses of the data in a very efficient way.
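The combination of a self-organizing map with Sammon’s mapping described above can be illustrated with a minimal sketch. It is not the implementation used in the study. It assumes a small rectangular SOM trained with online updates, a Sammon mapping fitted by plain gradient descent with step halving (started from a PCA projection), and synthetic 13-parameter data in place of the monitoring data. All function names, the grid size, and the learning schedule are hypothetical choices. The squared correlation between pairwise distances before and after projection is used here as one possible measure of how much structure the 2D map retains; how the reported 89% was computed is not stated here, so it may have been obtained differently.

```python
import numpy as np


def train_som(data, rows=6, cols=6, n_iter=2000, seed=0):
    """Train a small rectangular SOM with online updates; returns the codebook."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise codebook vectors from randomly drawn samples.
    codebook = data[rng.choice(len(data), size=rows * cols)].copy()
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac) + 0.01       # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood radius shrinks
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best-matching unit
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))           # Gaussian neighbourhood
        codebook += lr * h[:, None] * (x - codebook)
    return codebook


def pairwise_dist(x):
    """Euclidean distance matrix between all rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sammon_stress(d_high, d_low, eps=1e-12):
    """Sammon's stress: distance errors weighted by the inverse original distance."""
    iu = np.triu_indices_from(d_high, k=1)
    dh, dl = d_high[iu], d_low[iu]
    return ((dh - dl) ** 2 / (dh + eps)).sum() / (dh.sum() + eps)


def sammon(x, n_iter=300, eps=1e-12):
    """Project x to 2D by gradient descent on Sammon's stress (PCA start)."""
    d_high = pairwise_dist(x)
    scale = d_high[np.triu_indices_from(d_high, k=1)].sum() + eps
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    y = xc @ vt[:2].T                        # first two principal components
    stress = sammon_stress(d_high, pairwise_dist(y))
    step = 1.0
    for _ in range(n_iter):
        d_low = pairwise_dist(y)
        w = (d_high - d_low) / ((d_high + eps) * (d_low + eps))
        np.fill_diagonal(w, 0.0)
        grad = (-2.0 / scale) * (w.sum(axis=1)[:, None] * y - w @ y)
        while step > 1e-10:                  # halve the step until stress does not rise
            y_new = y - step * grad
            new_stress = sammon_stress(d_high, pairwise_dist(y_new))
            if new_stress <= stress:
                y, stress = y_new, new_stress
                step *= 2.0
                break
            step *= 0.5
        else:
            break                            # no acceptable step left: stop
    return y, stress


def distance_r2(x_high, y_low):
    """Squared correlation of pairwise distances before and after projection."""
    iu = np.triu_indices(len(x_high), k=1)
    r = np.corrcoef(pairwise_dist(x_high)[iu], pairwise_dist(y_low)[iu])[0, 1]
    return r ** 2


# Synthetic stand-in for the monitoring data: 300 samples, 13 parameters.
rng = np.random.default_rng(1)
latent = rng.uniform(-1.0, 1.0, size=(300, 2))
mixing = rng.normal(size=(2, 13))
samples = np.tanh(latent @ mixing) + 0.05 * rng.normal(size=(300, 13))
samples = (samples - samples.mean(axis=0)) / samples.std(axis=0)  # z-scores

codebook = train_som(samples)
projection, stress = sammon(codebook)
print(f"Sammon stress: {stress:.4f}, "
      f"distance R^2: {distance_r2(codebook, projection):.2f}")
```

In this sketch the Sammon step is applied to the SOM codebook vectors rather than to the raw samples, which keeps the number of pairwise distances small. Individual samples could then be placed on the 2D map at the position of their best-matching unit.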
Acknowledgments
The author is indebted to numerous colleagues who contributed to the data set. Among these are Jochen Bittersohl, Klaus Moritz, and Stefan Wunderlich from the former Bavarian Water Resources Agency, now the Bavarian Environmental Agency, as well as Andreas Kolb from the former Department of Hydrogeology at BITÖK, and Gunter Ilgen and his crew from the BITÖK Central Laboratory. Part of this work was financed by the German Federal Ministry of Education and Research, grant no. PT BEO 51-0339476 A-D.
Cite this article
Lischeid, G. Non-linear visualization and analysis of large water quality data sets: a model-free basis for efficient monitoring and risk assessment. Stoch Environ Res Risk Assess 23, 977–990 (2009). https://doi.org/10.1007/s00477-008-0266-y
DOI: https://doi.org/10.1007/s00477-008-0266-y