Abstract
A Bayesian multi-category kernel classification method is proposed. The algorithm classifies the projections of the data onto the principal axes of the feature space. The advantage of this approach is that the regression coefficients are identifiable and sparse, leading to large computational savings and improved classification performance. The degree of sparsity is regulated in a novel framework based on Bayesian decision theory. A Gibbs sampler is implemented to obtain the posterior distributions of the parameters, so that predictive probability distributions can be computed for new data points, giving a more complete picture of the classification. The algorithm is aimed at high-dimensional data sets in which the dimension of the measurements exceeds the number of observations. The applications considered in this paper are microarray, image processing and near-infrared spectroscopy data.
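As an illustration of the projection step only (not the paper's Bayesian machinery), the following minimal sketch computes the principal axes of the feature space via kernel PCA and classifies the resulting projections. The RBF kernel, the toy Gaussian data, and the nearest-centroid rule standing in for the Bayesian classifier are all assumptions for the sake of the example.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.01):
    # Squared Euclidean distances between all rows of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy high-dimensional data: more features (p) than observations (n)
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.normal(size=(n, p))
y = np.array([0] * 10 + [1] * 10)
X[y == 1] += 0.5  # shift class 1 so the classes are separable

# Kernel matrix, centred in feature space
K = rbf_kernel(X, X)
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J

# Eigendecomposition gives the principal axes of the feature space
vals, vecs = np.linalg.eigh(Kc)
order = np.argsort(vals)[::-1]          # sort eigenvalues descending
vals, vecs = vals[order], vecs[:, order]

# Project the data onto the leading q principal axes
q = 5
Z = vecs[:, :q] * np.sqrt(np.maximum(vals[:q], 0.0))  # n x q scores

# Classify in the projected space (illustrative nearest-centroid rule)
centroids = np.array([Z[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
```

In the paper, the classifier fitted to the scores `Z` is Bayesian, with sparsity-inducing priors on the regression coefficients and posteriors sampled by Gibbs; the sketch above only shows how the low-dimensional projections on which that classifier operates are formed.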
Domijan, K., Wilson, S.P. Bayesian kernel projections for classification of high dimensional data. Stat Comput 21, 203–216 (2011). https://doi.org/10.1007/s11222-009-9161-8