Bayesian kernel projections for classification of high dimensional data

Abstract

A Bayesian multi-category kernel classification method is proposed. The algorithm classifies the projections of the data onto the principal axes of the feature space. The advantage of this approach is that the regression coefficients are identifiable and sparse, leading to large computational savings and improved classification performance. The degree of sparsity is regulated in a novel framework based on Bayesian decision theory. A Gibbs sampler is implemented to find the posterior distributions of the parameters, so that predictive probability distributions can be obtained for new data points, giving a more complete picture of the classification. The algorithm is aimed at high-dimensional data sets where the dimension of the measurements exceeds the number of observations. The applications considered in this paper are microarray, image processing and near-infrared spectroscopy data.
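To make the projection step concrete, the sketch below computes the projections of a sample onto the leading principal axes of the feature space induced by an RBF kernel, i.e. standard kernel principal component scores. This is a minimal illustration of that first stage only, not the authors' implementation: the RBF kernel, gamma, and n_components are placeholder assumptions, and the sparse Bayesian classifier that the paper fits to these projections by Gibbs sampling is not shown.

```python
# Minimal sketch (assumed RBF kernel, placeholder hyper-parameters), not the
# authors' implementation: project the data onto the leading principal axes
# of the kernel feature space (kernel PCA scores).
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF (Gaussian) kernel matrix between the rows of X and the rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_projections(X, n_components=10, gamma=1.0):
    """Project the n observations in X onto the top n_components principal
    axes of the feature space; returns an n x n_components design matrix."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    J = np.eye(n) - np.full((n, n), 1.0 / n)   # centring matrix
    Kc = J @ K @ J                             # centre the kernel in feature space
    vals, vecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))  # unit-norm principal axes
    return Kc @ alphas                         # projections onto those axes

# Example in the regime the method targets (dimension >> sample size):
# 50 observations with 2000 measurements, reduced to 5 coordinates each.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2000))
Z = kernel_projections(X, n_components=5)
print(Z.shape)   # (50, 5): low-dimensional inputs for the downstream classifier
```

A sparse Bayesian multi-category model would then be fitted to the projected design matrix Z rather than to the raw measurements, which is the source of the identifiability and computational savings described above.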

Author information

Correspondence to Katarina Domijan.

Cite this article

Domijan, K., Wilson, S.P. Bayesian kernel projections for classification of high dimensional data. Stat Comput 21, 203–216 (2011). https://doi.org/10.1007/s11222-009-9161-8
