
On selection of kernel parameters in relevance vector machines for hydrologic applications

  • Original Paper
  • Published: 2007
  • Journal: Stochastic Environmental Research and Risk Assessment

Abstract

Recent advances in statistical learning theory have yielded tools that are improving our capabilities for analyzing large and complex datasets. Among such tools, relevance vector machines (RVMs) are finding increasing applications in hydrology because of (1) their excellent generalization properties, and (2) the probabilistic interpretation associated with the technique, which yields estimates of prediction uncertainty. RVMs combine the strengths of kernel-based methods and Bayesian theory to establish relationships between a set of input vectors and a desired output. However, a bias–variance analysis of RVM estimates revealed that careful selection of kernel parameters is of paramount importance for achieving good performance from RVMs. In this study, several analytic methods are presented for the selection of kernel parameters. These methods rely on structural properties of the data rather than the expensive re-sampling approaches commonly used in RVM applications. An analytical expression for prediction risk in leave-one-out cross-validation is derived. The effectiveness of the proposed methods is assessed first using data generated from the benchmark sinc function, and then with an example involving estimation of hydraulic conductivity values over a field from observations. It is shown that a straightforward maximization of the likelihood function can lead to misleading results. The proposed methods are found to yield robust estimates of kernel parameters.
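As context for the leave-one-out result mentioned above (stated here as an assumption about its general form, not necessarily the exact expression derived in the paper): for fixed hyperparameters the RVM predictive mean is linear in the training targets, $\hat{y} = S\,t$, and for any such linear smoother the leave-one-out prediction risk admits a closed form,

$$ R_{\mathrm{LOO}} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{t_i - \hat{y}_i}{1 - S_{ii}}\right)^2, $$

so the risk can be evaluated from a single fit rather than N re-fits.

The sketch below is likewise not the authors' code; it is a minimal NumPy illustration of the setting the abstract describes: sparse Bayesian (RVM) regression with a Gaussian kernel fitted to noisy sinc data, with the kernel width scanned over a grid so that the sensitivity of test error and sparsity to that single parameter becomes visible. The function names, width grid, noise level, and iteration count are illustrative assumptions; the hyperparameter updates follow the standard evidence (type-II maximum likelihood) re-estimation rules of sparse Bayesian learning (Tipping 2001).

```python
# Minimal sketch (not the paper's implementation): RVM regression on the
# benchmark sinc function, scanning the Gaussian kernel width to show how
# strongly the choice of kernel parameter affects accuracy and sparsity.
import numpy as np

rng = np.random.default_rng(0)

def sinc(x):
    return np.where(x == 0.0, 1.0, np.sin(x) / x)

def gauss_kernel(a, b, width):
    # K(a, b) = exp(-(a - b)^2 / (2 * width^2))
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * width ** 2))

def rvm_fit(x, t, width, n_iter=200, alpha_cap=1e9):
    """Evidence (type-II ML) updates of sparse Bayesian regression (Tipping 2001)."""
    Phi = np.column_stack([np.ones_like(x), gauss_kernel(x, x, width)])  # bias + kernel columns
    N, M = Phi.shape
    alpha = np.full(M, 1.0)          # prior precision for each weight
    beta = 1.0 / np.var(t)           # noise precision
    for _ in range(n_iter):
        A = np.diag(alpha)
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + A)   # posterior covariance of weights
        mu = beta * Sigma @ Phi.T @ t                   # posterior mean of weights
        gamma = 1.0 - alpha * np.diag(Sigma)            # "well-determined" measure per weight
        alpha = np.clip(gamma / (mu ** 2 + 1e-12), 1e-12, alpha_cap)
        resid = t - Phi @ mu
        beta = (N - gamma.sum()) / (resid @ resid + 1e-12)
    keep = alpha < alpha_cap / 10                       # surviving (relevance) basis functions
    return mu, keep

def rvm_predict(x_train, x_new, mu, width):
    Phi_new = np.column_stack([np.ones_like(x_new), gauss_kernel(x_new, x_train, width)])
    return Phi_new @ mu

# sinc benchmark: 100 noisy training points, dense noiseless test grid
x_tr = rng.uniform(-10, 10, 100)
t_tr = sinc(x_tr) + rng.normal(0.0, 0.1, x_tr.size)
x_te = np.linspace(-10, 10, 400)
t_te = sinc(x_te)

for width in [0.1, 0.5, 1.0, 2.0, 5.0]:
    mu, keep = rvm_fit(x_tr, t_tr, width)
    rmse = np.sqrt(np.mean((rvm_predict(x_tr, x_te, mu, width) - t_te) ** 2))
    print(f"width={width:4.1f}  test RMSE={rmse:.4f}  relevance vectors={keep.sum():3d}")
```

In such a scan, intermediate widths typically give the lowest test error with only a handful of relevance vectors, while very small or very large widths degrade accuracy; this sensitivity is what motivates analytic parameter-selection rules of the kind proposed in the paper, as opposed to re-sampling-based tuning.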


Author information

Corresponding author

Correspondence to Rao S. Govindaraju.

About this article

Cite this article

Tripathi, S., Govindaraju, R.S. On selection of kernel parameters in relevance vector machines for hydrologic applications. Stoch Environ Res Risk Assess 21, 747–764 (2007). https://doi.org/10.1007/s00477-006-0087-9

