Abstract
A Bayesian multi-category kernel classification method is proposed. The algorithm classifies the projections of the data onto the principal axes of the feature space. The advantage of this approach is that the regression coefficients are identifiable and sparse, leading to large computational savings and improved classification performance. The degree of sparsity is regulated in a novel framework based on Bayesian decision theory. A Gibbs sampler is implemented to obtain the posterior distributions of the parameters, so that predictive probability distributions can be computed for new data points, giving a more complete picture of the classification. The algorithm is aimed at high-dimensional data sets in which the dimension of the measurements exceeds the number of observations. The applications considered in this paper are microarray, image processing and near-infrared spectroscopy data.
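As an illustration of the projection step only (not the paper's Bayesian machinery), the following minimal sketch computes the principal axes of the feature space via kernel PCA and classifies the resulting projections. The RBF kernel, the toy Gaussian data, and the nearest-centroid rule standing in for the Bayesian classifier are all assumptions for the sake of the example.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.01):
    # Squared Euclidean distances between all rows of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy high-dimensional data: more features (p) than observations (n)
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.normal(size=(n, p))
y = np.array([0] * 10 + [1] * 10)
X[y == 1] += 0.5  # shift class 1 so the classes are separable

# Kernel matrix, centred in feature space
K = rbf_kernel(X, X)
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J

# Eigendecomposition gives the principal axes of the feature space
vals, vecs = np.linalg.eigh(Kc)
order = np.argsort(vals)[::-1]          # sort eigenvalues descending
vals, vecs = vals[order], vecs[:, order]

# Project the data onto the leading q principal axes
q = 5
Z = vecs[:, :q] * np.sqrt(np.maximum(vals[:q], 0.0))  # n x q scores

# Classify in the projected space (illustrative nearest-centroid rule)
centroids = np.array([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
```

In the paper, the classifier fitted to the scores `Z` is Bayesian, with sparsity-inducing priors on the regression coefficients and posteriors sampled by Gibbs; the sketch above only shows how the low-dimensional projections on which that classifier operates are formed.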
Domijan, K., Wilson, S.P. Bayesian kernel projections for classification of high dimensional data. Stat Comput 21, 203–216 (2011). https://doi.org/10.1007/s11222-009-9161-8