Review
A tutorial on kernel methods for categorization

https://doi.org/10.1016/j.jmp.2007.06.002

Abstract

The abilities to learn and to categorize are fundamental for cognitive systems, be they animals or machines, and have therefore attracted attention from engineers and psychologists alike. Modern machine learning methods and psychological models of categorization are remarkably similar, partly because the two fields share a common history in artificial neural networks and reinforcement learning. However, machine learning is now an independent and mature field that has moved beyond psychologically or neurally inspired algorithms towards providing foundations for a theory of learning that is rooted in statistics and functional analysis. Much of this research is potentially interesting for psychological theories of learning and categorization but is hardly accessible to psychologists. Here, we provide a tutorial introduction to a popular class of machine learning tools, called kernel methods. These methods are closely related to perceptrons, radial-basis-function neural networks and exemplar theories of categorization. Recent theoretical advances in machine learning are closely tied to the idea that the similarity of patterns can be encapsulated in a positive definite kernel. Such a positive definite kernel can define a reproducing kernel Hilbert space, which allows one to use powerful tools from functional analysis in the analysis of learning algorithms. We give basic explanations of some key concepts (the so-called kernel trick, the representer theorem and regularization), which may open up the possibility that insights from machine learning can feed back into psychology.
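For readers unfamiliar with these terms, the standard definitions behind them are summarized below for reference (the summary and notation are ours; see the full text for the paper's own exposition). A symmetric kernel $k$ on a stimulus space $\mathcal{X}$ is positive definite if for every $n$, all stimuli $x_1,\dots,x_n \in \mathcal{X}$ and all coefficients $c_1,\dots,c_n \in \mathbb{R}$,

$$\sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j\, k(x_i, x_j) \;\ge\; 0 .$$

Such a kernel induces a reproducing kernel Hilbert space $\mathcal{H}$ of functions on $\mathcal{X}$ in which evaluation at a point is an inner product (the reproducing property),

$$f(x) = \langle f,\, k(x,\cdot) \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H},$$

and the representer theorem states that the minimizer of a regularized empirical risk over $\mathcal{H}$ can always be written as a finite kernel expansion over the $m$ training stimuli,

$$f(x) = \sum_{i=1}^{m} \alpha_i\, k(x_i, x).$$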

Section snippets

Inner products

So what is a kernel? Kernels can be regarded as a non-linear generalization of inner products. We will take a little detour before explaining kernels and discuss the relationship between inner products, perceptrons and prototypes. This will set the stage on which kernels appear naturally to solve non-linear classification problems.
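Since this snippet is truncated, the following short Python sketch (ours, not code from the paper, with made-up data) illustrates the connection it alludes to: a prototype classifier that assigns a stimulus to the nearer of two category means can be rewritten as a linear decision rule built entirely from inner products, i.e. exactly the kind of function a perceptron computes.

```python
import numpy as np

# Hypothetical two-category data (ours, not the paper's): rows are stimuli.
rng = np.random.default_rng(0)
X_a = rng.normal(loc=[+1.0, +1.0], size=(20, 2))
X_b = rng.normal(loc=[-1.0, -1.0], size=(20, 2))

# Category prototypes: the mean stimulus of each category.
p_a, p_b = X_a.mean(axis=0), X_b.mean(axis=0)

def prototype_classify(x):
    """Assign x to the category with the nearer prototype.  Expanding the
    squared distances shows this is a linear decision rule f(x) = <w, x> + b
    with w = p_a - p_b and b = (||p_b||^2 - ||p_a||^2) / 2, i.e. a perceptron
    whose weight vector is built from the prototypes."""
    w = p_a - p_b
    b = 0.5 * (p_b @ p_b - p_a @ p_a)
    return "A" if x @ w + b > 0 else "B"

print(prototype_classify(np.array([0.8, 1.2])))    # expected: "A"
print(prototype_classify(np.array([-1.1, -0.5])))  # expected: "B"
```

Because the decision depends on the stimuli only through inner products, replacing the inner product by a kernel later turns this same construction into a non-linear classifier.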

Kernels

The next section will introduce the kernel trick, which makes it possible to work with high-dimensional (even infinite-dimensional) and flexible linearization spaces.
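As a concrete illustration of the kernel trick (again our sketch, not the paper's code), the homogeneous second-degree polynomial kernel evaluates the inner product in an explicitly constructed feature space without ever forming the feature vectors; for the Gaussian kernel the corresponding feature space is infinite-dimensional, so the trick is the only practical route.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the homogeneous degree-2 polynomial kernel
    on 2-d stimuli: all monomials of degree two (with a sqrt(2) weight on
    the cross term so that inner products match)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k_poly(x, y):
    """Kernel trick: the same inner product, computed in the original
    2-d space without ever constructing phi(x) or phi(y)."""
    return (x @ y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.isclose(phi(x) @ phi(y), k_poly(x, y)))  # True

def k_gauss(x, y, sigma=1.0):
    """Gaussian kernel.  Its feature space is infinite-dimensional, so the
    kernel trick is the only practical way to evaluate the inner product."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

print(k_gauss(x, y))  # similarity of x and y, between 0 and 1
```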

Regularization

By using the exemplar network with as many free parameters as stimuli, it is always possible to find weights such that the network classifies all training stimuli perfectly. The price for this flexibility is the danger of overfitting. A network may learn to categorize all training stimuli perfectly, but only because it has learned the stimuli by heart. Any regularity in the data is overlooked in this way, and the network will therefore not be able to generalize. An example of overfitting is
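Since the snippet is cut off here, the following hypothetical sketch (our illustration; the data, kernel width and variable names are ours) makes the point numerically: an exemplar network with one weight per training stimulus, fitted by regularized least squares, nearly interpolates noisy labels when the regularization parameter is close to zero, whereas a larger parameter sacrifices some training accuracy for smoother, better-generalizing weights.

```python
import numpy as np

rng = np.random.default_rng(1)

def labels(x):
    """Hypothetical 'true' category structure plus label noise (ours)."""
    return np.sign(np.sin(x) + 0.5 * rng.normal(size=x.shape))

# Training and test stimuli (one-dimensional, for simplicity).
X_train = rng.uniform(-3, 3, size=30)
X_test = rng.uniform(-3, 3, size=200)
y_train, y_test = labels(X_train), labels(X_test)

def gram(a, b, sigma=0.5):
    """Gaussian-kernel Gram matrix between two sets of 1-d stimuli."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

def fit(lmbda):
    """Exemplar network with one weight per training stimulus, fitted by
    regularized least squares: alpha = (K + lambda * I)^(-1) y.  Near-zero
    lambda (nearly) interpolates every noisy training label; larger lambda
    trades training error for smoother weights."""
    K = gram(X_train, X_train)
    return np.linalg.solve(K + lmbda * np.eye(len(X_train)), y_train)

def accuracy(alpha, X, y):
    """Fraction of stimuli whose predicted sign matches the label."""
    return np.mean(np.sign(gram(X, X_train) @ alpha) == y)

for lmbda in (1e-6, 1.0):
    alpha = fit(lmbda)
    print(f"lambda={lmbda:g}: "
          f"train {accuracy(alpha, X_train, y_train):.2f}, "
          f"test {accuracy(alpha, X_test, y_test):.2f}")
```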

Conclusions

We have introduced kernel methods as they are used in machine learning. The most important results here are the kernel trick and its link to reproducing kernel Hilbert spaces. Along the way we have hinted at parallels with psychological theories. First, kernel methods can be implemented as a one-layer neural network. Second, the Gaussian kernel can be interpreted as a similarity measure, and representation of the stimuli in an RKHS can be seen as representing the stimuli via their similarity to all

Acknowledgments

We would like to thank Jakob Macke, Jan Eichhorn, Florian Steinke, and Bruce Henning for comments on an earlier draft of this work.
