Information Fusion

Volume 67, March 2021, Pages 3-13

Bridging deep and multiple kernel learning: A review

https://doi.org/10.1016/j.inffus.2020.10.002

Highlights

  • Deep kernel learning is an effective method to learn complex feature representation.

  • The state-of-the-art approaches that bridge the deep learning and MKL are reviewed.

  • This work serves as a valuable reference for new practitioners and theoreticians.

Abstract

Kernel methods and deep learning are two of the most prominent machine learning techniques, and both have achieved great success in many applications. Kernel methods are powerful tools for capturing nonlinear patterns behind data. They implicitly learn high- (even infinite-) dimensional nonlinear features in a reproducing kernel Hilbert space (RKHS) while keeping the computation tractable through the kernel trick. It is commonly agreed that the success of kernel methods depends heavily on the choice of kernel. Multiple kernel learning (MKL) is one scheme that performs kernel combination and selection for a variety of learning tasks, such as classification, clustering, and dimensionality reduction. Deep learning models project input data through several layers of nonlinearity and learn different levels of abstraction. The composition of multiple layers of nonlinear functions can approximate a rich set of naturally occurring input-output dependencies. To bridge kernel methods and deep learning, deep kernel learning has proven to be an effective method for learning complex feature representations by combining the nonparametric flexibility of kernel methods with the structural properties of deep learning. This article presents a comprehensive overview of the state-of-the-art approaches that bridge MKL and deep learning. Specifically, we systematically review the typical hybrid models, training techniques, and their theoretical and practical benefits, followed by remaining challenges and future directions. We hope that our perspectives and discussions serve as valuable references for new practitioners and theoreticians seeking to apply approaches that incorporate the advantages of both paradigms and to explore new synergies.

Introduction

Kernel methods such as support vector machines (SVM), kernel Fisher discriminant analysis (KFDA), and Gaussian processes have been successfully applied to a wide variety of machine learning problems [1], [2], [3]. These methods map data points from the input space to a feature space, i.e., a higher-dimensional reproducing kernel Hilbert space (RKHS), where even relatively simple algorithms, such as linear methods, can deliver very impressive performance. The mapping is determined implicitly by a kernel function (or simply a kernel), which computes the inner product of data points in the feature space. Despite the popularity of kernel methods, there is not yet a general mechanism capable of guiding kernel learning and selection. It is well known that selecting an appropriate kernel, and thus an appropriate feature space, is very important to the success of any kernel method [4], [5], [6], [7]. However, in many real-world situations, choosing an appropriate kernel is not an easy task, since it usually requires domain knowledge that non-expert users may lack. To address this limitation, recent years have witnessed active research on learning effective kernels automatically from data. One popular technique for kernel learning and selection is multiple kernel learning (MKL) [8], [9], [10], which aims to learn a linear or nonlinear combination of a set of predefined kernels (base kernels) in order to identify a good target kernel for real applications. Compared with traditional kernel methods employing a fixed kernel, MKL offers flexibility in automated kernel learning and also reflects the fact that typical learning problems often involve multiple, heterogeneous data sources. Over the past few years, MKL has been actively investigated: a number of algorithms have been proposed to improve the learning efficiency by exploiting different optimization techniques [11], [12], [13], [14], [15] and the prediction/classification accuracy by exploring possible combinations of base kernels [16], [17], [18], [19], [20]. Many extended MKL techniques have also been proposed to improve the regular MKL method, e.g., localized MKL [21], which achieves local assignments of kernel weights at the group level; sample-adaptive MKL [22], which switches kernels on and off at the data sample level; Bayesian MKL [23], which estimates the entire posterior distribution of model weights; function approximation MKL [24,25], which uses function approximation techniques to find the optimal kernel; multiple empirical kernel learning (MEKL) [26,27], which explicitly maps data points from the input space to the empirical feature space, in which the mapped feature vectors can be explicitly represented; and two-stage MKL [24,[28], [29], [30], [31]], which first learns the optimal kernel weights according to certain criteria and then applies the learned optimal kernel to train a kernel classifier.
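To make the linear MKL combination concrete, the following minimal sketch (in Python with numpy and scikit-learn) forms a combined Gram matrix K = μ1K1 + μ2K2 + μ3K3, with nonnegative weights summing to one, and feeds it to a kernel classifier. The toy data, choice of base kernels, and fixed weights are illustrative assumptions; an actual MKL solver (e.g., SimpleMKL) would learn the weights from data jointly with the classifier.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel, linear_kernel
from sklearn.svm import SVC

# Toy data: random points with random binary labels, for illustration only.
rng = np.random.RandomState(0)
X, y = rng.randn(100, 5), rng.randint(0, 2, 100)

# Base kernels K_1, ..., K_m evaluated on the training data.
base_kernels = [
    rbf_kernel(X, X, gamma=0.5),
    polynomial_kernel(X, X, degree=3),
    linear_kernel(X, X),
]

# Linear MKL: K = sum_m mu_m * K_m with mu on the simplex.
# The weights mu are fixed here for illustration; an MKL solver
# would optimize them from data.
mu = np.array([0.5, 0.3, 0.2])
K = sum(w * Km for w, Km in zip(mu, base_kernels))

# Train a kernel classifier on the combined precomputed Gram matrix.
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))  # training accuracy with the combined kernel
```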

On the other hand, as a different subfield of machine learning, deep learning has gained tremendous interest from both academia and industry in recent years due to its success in revolutionizing many application domains ranging from vision to auditory sensory signal processing [32], [33], [34], [35], [36], [37]. Deep learning based models deal with complex tasks by learning from subtasks. In particular, several nonlinear modules are stacked in hierarchical architectures to learn multiple levels of representation from the raw input data. Each module transforms the representation at one level into a much more abstract representation at a higher level. In other words, the higher-level features are defined in terms of lower-level ones. There are different types of deep learning architectures, of which typical models are convolutional neural networks (CNN) [38], which are specially tailored for image processing (or more generally for processing translation-invariant data); recurrent neural networks (RNN) [39], which are specially designed to deal with time series and other sequential data; deep belief networks (DBN) [40], which were among the first deep learning models to be successfully trained; generative adversarial networks (GAN) [41], in which two networks compete to generate data as realistic as possible; and many more. There are several reasons or motivations that make deep learning architectures attract extensive attention: the appeal of hierarchical distributed representations, the wide range of functions that can be parameterized by composing weakly nonlinear transformations, and the potential for combining supervised and unsupervised methods.
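As a minimal illustration of this layer-wise composition, the following hypothetical PyTorch sketch stacks nonlinear modules so that each Linear + ReLU block maps the previous representation to a more abstract one. The widths, depth, and input size are arbitrary assumptions, not prescriptions from the works reviewed here.

```python
import torch
import torch.nn as nn

# A deep model as a composition of weakly nonlinear transformations.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # low-level features
    nn.Linear(256, 64), nn.ReLU(),    # mid-level abstractions
    nn.Linear(64, 10),                # task-specific output layer
)

x = torch.randn(32, 784)              # a batch of flattened inputs
logits = model(x)                     # forward pass through the hierarchy
print(logits.shape)                   # torch.Size([32, 10])
```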

In contrast to deep learning models that learn multiple levels of representation with several hidden layers, kernel methods can generally be seen as shallow models, given that their architecture has only a single hidden processing layer. Recent works in machine learning have highlighted the superiority of deep models over shallow models in terms of accuracy in several application domains [33,34]. However, training deep networks involves costly nonlinear optimizations and many heuristics for determining network structure and associated parameters that are not well founded in theory. In contrast, kernel methods in general, and SVM in particular, are characterized by strong foundations in optimization and learning theory [1], [2], [3] and are able to deal with high-dimensional data directly. However, shallow architectures such as kernel methods struggle to provide internal representations as rich as those of deep architectures. Furthermore, by decoupling data representation and learning, kernel methods seem by nature incompatible with end-to-end learning, which is the cornerstone of deep learning models and one of the main reasons for their success. Therefore, it is worthwhile to pursue hybrid approaches that incorporate the advantages of both paradigms and explore new synergies. Recently, many deep kernel algorithms have been proposed to link deep learning and kernel methods. The authors in [42] provided a brief, motivated survey of recent proposals to explicitly or implicitly combine the two frameworks. A more detailed overview of the research conducted in the literature on exploring synergies between the two paradigms can be found in [43].

In this paper, we explore and review the emerging works that bridge MKL and deep learning, in which ideas from MKL can be applied to improve deep learning methods and vice versa. It should be noted that the focus of our work is very different from [42,43]. While those works provide a general overview of deep kernel methods (or deep kernel learning), we focus on deep MKL methods, since MKL has shown great success in automated kernel learning and optimization for kernel methods. The rest of this paper is organized as follows. Section 2 gives preliminary concepts on kernel methods, MKL, and deep learning. Section 3 briefly overviews the related approaches exploiting kernel methods in synergy with deep learning. The main part of this work is Section 4, which presents the state-of-the-art approaches that bridge MKL and deep learning, including typical hybrid models, training techniques, and their theoretical and practical benefits, as well as remaining challenges and future directions. Finally, Section 5 concludes this paper.

Section snippets

Preliminaries

In this section, we introduce some preliminaries of kernel methods, MKL, and deep learning.

Combining kernel methods with deep learning

Recently, deep kernel learning has been comprehensively investigated to combine kernel methods with deep learning. Ideas from the deep learning field can be transferred to the kernel framework and vice versa. There are generally three active research directions in deep kernel learning. The first direction is to directly combine deep modules as the front-end with kernel machines as the back-end to form a synergistic model; a sketch of this pattern follows. For example, using SVM in combination with CNN has been proposed as part of a …
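As a hedged sketch of this first direction, the hypothetical Python code below uses a tiny untrained CNN as the front-end feature extractor and an RBF-kernel SVM as the back-end classifier. The network, toy images, and labels are placeholders; in practice the front-end would be pretrained or trained end-to-end on a real dataset.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

# Front-end: a small, untrained CNN feature extractor (illustrative only).
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4),          # 8 x 4 x 4 feature maps
    nn.Flatten(),                     # 128-dimensional feature vectors
)

rng = np.random.RandomState(0)
images = torch.randn(60, 1, 28, 28)   # toy image batch
labels = rng.randint(0, 2, 60)        # toy binary labels

with torch.no_grad():
    feats = cnn(images).numpy()       # deep features from the front-end

# Back-end: a kernel machine trained on the deep features.
svm = SVC(kernel="rbf", gamma="scale").fit(feats, labels)
print(svm.score(feats, labels))       # training accuracy of the hybrid
```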

Bridging deep and multiple kernel learning

Both deep learning and MKL are representation learning methodologies that have shown increasing success in widespread applications. While the former learns representations through a hierarchy of features of increasing complexity, the latter provides a principled method for the combination of base representations. Bridging these two paradigms has led to new architectures and inference strategies.
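One way to picture this bridge, in the spirit of approaches that apply MKL over multilayer deep features (e.g., the Sheng et al. work cited below), is to compute one base kernel per layer of a network and combine them MKL-style. The Python sketch below is a hypothetical illustration: the architecture, uniform weights, and toy data are assumptions, and a deep MKL method would learn the combination weights rather than fix them.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics.pairwise import rbf_kernel

# An illustrative network whose intermediate activations define the
# base representations; layer sizes are arbitrary.
layers = nn.ModuleList([
    nn.Sequential(nn.Linear(20, 32), nn.ReLU()),
    nn.Sequential(nn.Linear(32, 16), nn.ReLU()),
    nn.Sequential(nn.Linear(16, 8), nn.ReLU()),
])

X = torch.randn(50, 20)               # toy inputs

# Collect the representation produced at each depth.
reps, h = [], X
for layer in layers:
    h = layer(h)
    reps.append(h.detach().numpy())

# One base kernel per layer: K_l = k(phi_l(x), phi_l(x')).
base_kernels = [rbf_kernel(R, R) for R in reps]

# MKL-style combination across depths; uniform weights stand in for
# weights that a deep MKL method would learn from data.
mu = np.ones(len(base_kernels)) / len(base_kernels)
K = sum(w * Kl for w, Kl in zip(mu, base_kernels))
print(K.shape)                        # (50, 50) combined Gram matrix
```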

Conclusion

Deep learning and MKL are research areas of machine learning under constant development due to the fast-growing demand for data analysis in the Big Data era. Deep learning, especially, presents numerous challenges as well as opportunities and solutions in a variety of applications. More importantly, it has moved machine learning into a new stage, namely, "Smarter AI". Both deep learning and MKL are representation learning methods that have shown increasing success. The hybrid model that combines …

CRediT authorship contribution statement

Tinghua Wang: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review & editing, Project administration. Lin Zhang: Data curation, Visualization. Wenyu Hu: Formal analysis, Resources, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (No. 61966002) and the Natural Science Foundation of Jiangxi Province of China (No. 20192BAB207016).

References (138)

  • D.H. Wolpert, Stacked generalization, Neural Netw. (1992)
  • M.R. Mohammadnia-Qaraei et al., Convolutional kernel networks based on a convex combination of cosine kernels, Pattern Recognit. Lett. (2018)
  • S. Mehrkanoon et al., Deep hybrid neural-kernel networks using random Fourier features, Neurocomputing (2018)
  • S. Mehrkanoon, Deep neural-kernel blocks, Neural Netw. (2019)
  • L. Le et al., Deep embedding kernel, Neurocomputing (2019)
  • Y. Li et al., Deep neural mapping support vector machines, Neural Netw. (2017)
  • S. Poria et al., Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis, Neurocomputing (2017)
  • B. Sheng et al., Multilayer deep features with multiple kernel learning for action recognition, Neurocomputing (2020)
  • I. Lauriola et al., Enhancing deep neural networks via multiple kernel learning, Pattern Recognit. (2020)
  • K.-R. Müller et al., An introduction to kernel-based learning algorithms, IEEE Trans. Neural Netw. (2001)
  • J. Shawe-Taylor et al., Kernel Methods for Pattern Analysis (2004)
  • T. Hofmann et al., Kernel methods in machine learning, Ann. Stat. (2008)
  • T. Wang et al., An overview of kernel alignment and its applications, Artif. Intell. Rev. (2015)
  • J. Li et al., Efficient kernel selection via spectral analysis
  • T. Wang et al., Kernel learning and optimization with Hilbert-Schmidt independence criterion, Int. J. Mach. Learn. Cybern. (2018)
  • M. Gönen et al., Multiple kernel learning algorithms, J. Mach. Learn. Res. (2011)
  • S.S. Bucak et al., Multiple kernel learning for visual object recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
  • S. Niazmardi et al., Multiple kernel learning for remote sensing image classification, IEEE Trans. Geosci. Remote Sens. (2018)
  • G.R.G. Lanckriet et al., Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res. (2004)
  • S. Sonnenburg et al., Large scale multiple kernel learning, J. Mach. Learn. Res. (2006)
  • A. Rakotomamonjy et al., SimpleMKL, J. Mach. Learn. Res. (2008)
  • M. Alioscha-Perez et al., SVRG-MKL: a fast and scalable multiple kernel learning solution for features combination in multi-class classification problems, IEEE Trans. Neural Netw. Learn. Syst. (2020)
  • M. Kloft et al., lp-norm multiple kernel learning, J. Mach. Learn. Res. (2011)
  • Z. Xu et al., Smooth optimization for effective multiple kernel learning
  • M. Varma et al., More generality in efficient multiple kernel learning
  • C. Cortes et al., Learning non-linear combinations of kernels, Adv. Neural Inf. Process. Syst. (2009)
  • Y. Zhou et al., Veto-consensus multiple kernel learning
  • Y. Han et al., Localized multiple kernel learning via sample-wise alternating optimization, IEEE Trans. Cybern. (2014)
  • X. Liu et al., Sample-adaptive multiple kernel learning
  • C.-Y. Du et al., Efficient Bayesian maximum margin multiple kernel learning
  • A. Kumar, A. Niculescu-Mizil, K. Kavukcoglu, H. Daumé, A binary classification framework for two-stage multiple kernel...
  • S.S. Shiju et al., Multiple kernel learning using single stage function approximation for binary classification problems, Int. J. Syst. Sci. (2017)
  • Z. Wang et al., MultiK-MHKS: a novel multiple kernel learning algorithm, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
  • C. Cortes et al., Algorithms for learning kernels based on centered alignment, J. Mach. Learn. Res. (2012)
  • T. Wang et al., Two-stage fuzzy multiple kernel learning based on Hilbert-Schmidt independence criterion, IEEE Trans. Fuzzy Syst. (2018)
  • G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)
  • Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. (2009)
  • Y. LeCun et al., Deep learning, Nature (2015)
  • S. Pouyanfar et al., A survey on deep learning: algorithms, techniques, and applications, ACM Comput. Surv. (2019)
  • G. Nguyen et al., Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey, Artif. Intell. Rev. (2019)