Non-linear visualization and analysis of large water quality data sets: a model-free basis for efficient monitoring and risk assessment

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Environmental monitoring programs provide large multivariate data sets that usually cover considerable spatial and temporal variability. The apparent complexity of these data sets requires sophisticated tools for their processing. Usually, fixed schemes are followed, including the application of numerical models, which are increasingly implemented in decision support systems. However, these schemes are too rigid to detect unexpected features, such as the onset of subtle trends, non-linear relationships, or patterns that are restricted to limited sub-samples of the total data set. In this study, an alternative approach is followed, based on an efficient non-linear visualization of the data. Visualization is the most powerful interface between computer and human brain. The idea is to apply an efficient and model-free tool, that is, one that requires no prior assumptions about key properties of the data, such as dominant processes. In other words, the data processing aimed to preserve a maximum amount of information and to leave it to the expert to decide which features to analyze in more detail. A comprehensive data set from a 15-year monitoring program in the Lehstenbach watershed was used. The watershed is located in the Fichtelgebirge area, a mountainous region in South Germany, where land use is forestry. Streamwater and groundwater have been monitored at 38 sampling sites for 13 parameters. The data set was analyzed using a self-organizing map (SOM) combined with Sammon’s mapping. The 2D non-linear projection represented 89% of the variance of the data set. The visualization enabled easy detection of outliers, assessment of spatial versus temporal variance, and verification of a predefined classification of the sampling sites. Contamination of two of the observation wells was detected. Long-term trends of solute concentration in the catchment runoff could be differentiated from short-term dynamics, and a long-term shift in the dynamics was determined for different flow regimes individually. This analysis helped considerably to better understand the system’s behavior, to detect “hot spots”, and to organize subsequent analyses of the data in a very efficient way.
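The combination of a self-organizing map with Sammon’s mapping named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the synthetic two-cluster data, the 3 × 3 grid, the function names (`train_som`, `sammon_stress`, `sammon_map`), and all parameter values are assumptions made for illustration, and the projection uses plain gradient descent with step halving rather than Sammon’s original update rule. The sketch also assumes the input contains at least two distinct points.

```python
import math
import random


def train_som(data, rows, cols, epochs=50, seed=0):
    """Train a small rectangular SOM; returns one codebook vector per grid node."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    codebook = [list(rng.choice(data)) for _ in grid]  # initialise from samples
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = 0.5 * (1.0 - frac)  # decaying learning rate
            radius = max(0.5, 0.5 * max(rows, cols) * (1.0 - frac))
            # Best-matching unit: node whose codebook vector is closest to x.
            bmu = min(range(len(grid)), key=lambda k: math.dist(codebook[k], x))
            for k, node in enumerate(grid):
                # Gaussian neighbourhood on the grid pulls nearby nodes along.
                h = math.exp(-math.dist(node, grid[bmu]) ** 2 / (2 * radius ** 2))
                codebook[k] = [w + lr * h * (xi - w) for w, xi in zip(codebook[k], x)]
            step += 1
    return codebook


def sammon_stress(high, low):
    """Sammon's stress between pairwise distances before and after projection."""
    num, denom = 0.0, 0.0
    for i in range(len(high)):
        for j in range(i + 1, len(high)):
            d_star = math.dist(high[i], high[j])
            if d_star == 0.0:
                continue  # identical points carry no distance information
            num += (d_star - math.dist(low[i], low[j])) ** 2 / d_star
            denom += d_star
    return num / denom


def sammon_map(high, iters=100, lr=1.0, seed=0):
    """Project points to 2D by gradient descent on Sammon's stress."""
    rng = random.Random(seed)
    n = len(high)
    low = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    scale = sum(math.dist(high[i], high[j]) for i in range(n) for j in range(i + 1, n))
    stress = sammon_stress(high, low)
    for _ in range(iters):
        trial = []
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                d_star = math.dist(high[i], high[j])
                d = math.dist(low[i], low[j])
                if i == j or d_star == 0.0 or d == 0.0:
                    continue
                coef = (d_star - d) / (d_star * d)
                gx += coef * (low[i][0] - low[j][0])
                gy += coef * (low[i][1] - low[j][1])
            # Move along the negative gradient of the stress.
            trial.append([low[i][0] + lr * 2 * gx / scale,
                          low[i][1] + lr * 2 * gy / scale])
        trial_stress = sammon_stress(high, trial)
        if trial_stress < stress:
            low, stress = trial, trial_stress  # accept the improved layout
        else:
            lr *= 0.5  # step too large: shrink it and try again
    return low, stress


# Hypothetical data: 40 samples of 3 variables in two synthetic clusters.
data_rng = random.Random(1)
samples = [[data_rng.gauss(m, 0.1) for _ in range(3)]
           for m in (0.0, 1.0) for _ in range(20)]
codebook = train_som(samples, rows=3, cols=3)
layout, stress = sammon_map(codebook)
print(f"Sammon stress of the 2D layout: {stress:.4f}")
```

In this sketch the SOM first condenses the samples into a small set of codebook vectors, and Sammon’s mapping then places those vectors in a plane so that their pairwise distances are preserved as well as possible. A lower stress value indicates that the 2D layout distorts the original distances less.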


Acknowledgments

The author is indebted to numerous colleagues who contributed to the data set. Among these are Jochen Bittersohl, Klaus Moritz, and Stefan Wunderlich from the former Bavarian Water Resources Agency, now Bavarian Environmental Agency, as well as Andreas Kolb from the former Department of Hydrogeology at BITÖK, and Gunter Ilgen and his crew from the BITÖK Central Laboratory. Part of this work has been financed by the German Federal Ministry of Education and Research, grant no. PT BEO 51-0339476 A-D.

Author information

Corresponding author

Correspondence to Gunnar Lischeid.

About this article

Cite this article

Lischeid, G. Non-linear visualization and analysis of large water quality data sets: a model-free basis for efficient monitoring and risk assessment. Stoch Environ Res Risk Assess 23, 977–990 (2009). https://doi.org/10.1007/s00477-008-0266-y

  • DOI: https://doi.org/10.1007/s00477-008-0266-y
