Bridging deep and multiple kernel learning: A review
Introduction
Kernel methods such as support vector machines (SVM), kernel Fisher discriminant analysis (KFDA), and Gaussian processes have been successfully applied to a wide variety of machine learning problems [1], [2], [3]. These methods map data points from the input space to a feature space, i.e., a higher-dimensional reproducing kernel Hilbert space (RKHS), where even relatively simple algorithms, such as linear methods, can deliver very impressive performance. The mapping is determined implicitly by a kernel function (or simply a kernel), which computes the inner product of data points in the feature space. Despite the popularity of kernel methods, there is not yet a general mechanism capable of guiding kernel learning and selection. It is well known that selecting an appropriate kernel, and thus an appropriate feature space, is crucial to the success of any kernel method [4], [5], [6], [7]. However, in many real-world situations, choosing an appropriate kernel is not an easy task, as it may require domain knowledge that non-expert users lack. To address this limitation, recent years have witnessed active research on learning effective kernels automatically from data. One popular technique for kernel learning and selection is multiple kernel learning (MKL) [8], [9], [10], which aims to learn a linear or nonlinear combination of a set of predefined kernels (base kernels) in order to identify a good target kernel for real applications. Compared with traditional kernel methods employing a fixed kernel, MKL offers flexibility in automated kernel learning and also reflects the fact that typical learning problems often involve multiple, heterogeneous data sources.
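The "kernel trick" described above can be made concrete with a minimal NumPy sketch (illustrative only, not from any surveyed paper): the RBF kernel computes inner products in an infinite-dimensional RKHS without ever forming the feature map explicitly, and any valid kernel yields a symmetric positive semidefinite Gram matrix.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF (Gaussian) kernel: k(x, y) = exp(-gamma * ||x - y||^2).

    Implicitly corresponds to an inner product in an
    infinite-dimensional RKHS; only the Gram matrix is ever computed.
    """
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_kernel(X, X)

# Mercer's condition: the Gram matrix is symmetric positive semidefinite.
assert np.allclose(K, K.T)
assert np.linalg.eigvalsh(K).min() >= -1e-10
```

The choice of `gamma` here is an arbitrary placeholder; in practice it is exactly the kind of kernel parameter whose selection motivates MKL.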
Over the past few years, MKL has been actively investigated, and a number of algorithms have been proposed to improve learning efficiency by exploiting different optimization techniques [11], [12], [13], [14], [15] and to improve prediction/classification accuracy by exploring possible combinations of base kernels [16], [17], [18], [19], [20]. Many extended MKL techniques have also been proposed to improve the regular MKL method, e.g., localized MKL [21], which achieves local assignments of kernel weights at the group level; sample-adaptive MKL [22], which switches off kernels at the data sample level; Bayesian MKL [23], which estimates the entire posterior distribution of model weights; function approximation MKL [24], [25], which uses function approximation techniques for finding the optimal kernel; multiple empirical kernel learning (MEKL) [26], [27], which explicitly maps data points from the input space to the empirical feature space, in which the mapped feature vectors can be explicitly represented; and two-stage MKL [24], [28], [29], [30], [31], which first learns the optimal kernel weights according to certain criteria, and then applies the learned optimal kernel to train a kernel classifier.
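The two-stage scheme mentioned last can be sketched in a few lines of NumPy. This is a hedged illustration, not the algorithm of any specific cited paper: the toy labels, the choice of base kernels, and the use of (unnormalized) kernel–target alignment to set the weights are all illustrative assumptions.

```python
import numpy as np

def rbf(X, gamma):
    sq = (X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def alignment(K, T):
    # Frobenius-inner-product alignment between kernel K and target T.
    return (K * T).sum() / (np.linalg.norm(K) * np.linalg.norm(T))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = np.sign(X[:, 0])            # toy binary labels
T = np.outer(y, y)              # ideal target kernel y y^T

# Predefined base kernels: one linear, two RBF widths (arbitrary choices).
base = [X @ X.T, rbf(X, 0.5), rbf(X, 2.0)]

# Stage 1: score each base kernel against the target and normalize.
w = np.array([max(alignment(K, T), 0.0) for K in base])
w /= w.sum()

# Combined kernel; stage 2 would train any kernel classifier (e.g., an SVM)
# on K_comb instead of on a single fixed kernel.
K_comb = sum(wi * Ki for wi, Ki in zip(w, base))
```

The point of the sketch is the decoupling: kernel weights are fixed by a criterion (here, alignment) before any classifier is trained, which is what distinguishes two-stage MKL from one-stage formulations that optimize weights and classifier jointly.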
On the other hand, as a different subfield of machine learning, deep learning has gained tremendous interest from both academia and industry in recent years due to its success in revolutionizing many application domains, ranging from vision to auditory sensory signal processing [32], [33], [34], [35], [36], [37]. Deep learning based models deal with complex tasks by learning from subtasks. In particular, several nonlinear modules are stacked in hierarchical architectures to learn multiple levels of representation from the raw input data. Each module transforms the representation at one level into a more abstract representation at a higher level. In other words, the higher-level features are defined in terms of lower-level ones. There are different types of deep learning architectures, among which typical models are convolutional neural networks (CNN) [38], which are specially tailored for image processing (or, more generally, for processing translation-invariant data); recurrent neural networks (RNN) [39], which are specially designed to deal with time series and other sequential data; deep belief networks (DBN) [40], which were the first deep learning models to be trained successfully; generative adversarial networks (GAN) [41], in which two networks compete to generate data that is as realistic as possible; and many more. There are several reasons or motivations that make deep learning architectures attract extensive attention: the appeal of hierarchical distributed representations, the wide range of functions that can be parameterized by composing weakly nonlinear transformations, and the potential for combining supervised and unsupervised methods.
In contrast to deep learning models that learn multiple levels of representation with several hidden layers, kernel methods can generally be seen as shallow models, given that their architecture has only a single hidden processing layer. Recent works in machine learning have highlighted the superiority of deep models over shallow models in terms of accuracy in several application domains [33], [34]. However, training deep networks involves costly nonlinear optimizations and many heuristics for determining network structure and associated parameters that are not well founded in theory. In contrast, kernel methods in general, and SVM in particular, are characterized by strong foundations in optimization and learning theory [1], [2], [3] and are able to deal with high-dimensional data directly. However, shallow architectures such as kernel methods struggle to provide internal representations as rich as those of deep architectures. Furthermore, by decoupling data representation and learning, kernel methods seem by nature incompatible with end-to-end learning, which is the cornerstone of deep learning models and one of the main reasons for their success. Therefore, it is worthwhile to pursue hybrid approaches that incorporate the advantages of both paradigms and explore new synergies. Recently, many deep kernel algorithms have been presented that try to link deep learning and kernel methods. The authors in [42] provided a brief, motivated survey of recent proposals to explicitly or implicitly combine the two frameworks. A more detailed overview of the research conducted in the literature on exploring synergies between the two paradigms can be found in [43].
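One concrete bridge between the two worlds, used by several of the hybrid approaches surveyed later (e.g., neural-kernel networks based on random Fourier features), is to replace the implicit kernel map with an explicit finite-dimensional random-feature approximation, which turns a kernel machine into something resembling a one-hidden-layer network that can be trained end to end. The sketch below is illustrative: the dimensions and `gamma` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
gamma, D, d = 0.5, 2000, 3

# Random Fourier features: for the RBF kernel exp(-gamma * ||x - y||^2),
# sample frequencies W ~ N(0, 2*gamma) and phases b ~ Uniform[0, 2*pi).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(X):
    # Explicit map: inner products of phi(x) approximate the RBF kernel.
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.normal(size=(10, d))
K_exact = np.exp(-gamma * ((X[:, None] - X[None, :]) ** 2).sum(-1))
K_approx = phi(X) @ phi(X).T
```

Because `phi` is an explicit (randomly initialized) cosine layer, it can be composed with other differentiable modules, sidestepping the representation/learning decoupling noted above at the cost of an approximation error that shrinks as `D` grows.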
In this paper, we explore and review the emerging works that bridge MKL and deep learning, in which the ideas of MKL can be applied to improve deep learning methods and vice versa. It should be noted that the focus of our work is very different from that of [42], [43]. While they provide a general overview of deep kernel methods (or deep kernel learning), we choose to focus on deep MKL methods, given that MKL has shown great success in automated kernel learning and optimization for kernel methods. The rest of this paper is outlined as follows. Section 2 gives preliminary concepts on kernel methods, MKL, and deep learning. Section 3 briefly overviews the related approaches exploiting kernel methods in synergy with deep learning. The main part of this work is Section 4, which reviews state-of-the-art approaches that bridge MKL and deep learning, including typical hybrid models, training techniques, and their theoretical and practical benefits, as well as remaining challenges and future directions. Finally, Section 5 concludes this paper.
Preliminaries
In this section, we introduce some preliminaries of kernel methods, MKL, and deep learning.
Combining kernel methods with deep learning
Recently, deep kernel learning has been comprehensively investigated to combine kernel methods with deep learning. Ideas from the deep learning field can be transferred to the kernel framework and vice versa. There are generally three active research directions in deep kernel learning. The first direction is to directly combine deep modules as the front-end with kernel machines as the back-end to form a synergy model. For example, using SVM in combination with CNN has been proposed as part of a
Bridging deep and multiple kernel learning
Both deep learning and MKL are representation learning methodologies that have shown increasing success in widespread applications. While the former learns representations through a hierarchy of features of increasing complexity, the latter provides a principled method for the combination of base representations. Bridging these two paradigms has led to new architectures and inference strategies.
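A minimal sketch of how the two paradigms compose (purely illustrative, not the architecture of any specific surveyed model): a weighted MKL combination forms one "layer", and a further kernel is applied on top of that layer's induced feature space, using the identity ||φ(x) − φ(y)||² = K(x,x) + K(y,y) − 2K(x,y). The weights and `gamma` values are placeholder assumptions; in a real deep MKL model they would be learned.

```python
import numpy as np

def rbf_from_gram(K, gamma):
    """RBF kernel computed in the feature space of a previous kernel,
    via ||phi(x) - phi(y)||^2 = K(x,x) + K(y,y) - 2 K(x,y)."""
    d = np.diag(K)
    sq = d[:, None] + d[None, :] - 2 * K
    return np.exp(-gamma * np.maximum(sq, 0.0))

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 4))

# Layer 1: weighted combination of base kernels (the MKL step).
bases = [X @ X.T, rbf_from_gram(X @ X.T, 0.1)]
w = np.array([0.4, 0.6])                       # placeholder weights
K1 = sum(wi * Ki for wi, Ki in zip(w, bases))

# Layer 2: a kernel applied on top of layer 1's feature space (the deep step).
K2 = rbf_from_gram(K1, 1.0)
```

Stacking in this fashion yields a hierarchy of increasingly abstract similarity measures, mirroring the layer-wise representations of deep networks while keeping each layer a valid kernel.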
Conclusion
Deep learning and MKL are research areas of machine learning under constant development, driven by the fast-growing demand for data analysis in the Big Data era. In particular, deep learning presents numerous challenges as well as opportunities and solutions in a variety of applications. More importantly, it has moved machine learning to a new stage, namely, "Smarter AI". Both deep learning and MKL are representation learning methods that have shown increasing success. The hybrid model that combines
CRediT authorship contribution statement
Tinghua Wang: Conceptualization, Methodology, Investigation, Writing - original draft, Writing - review & editing, Project administration. Lin Zhang: Data curation, Visualization. Wenyu Hu: Formal analysis, Resources, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China (No. 61966002) and the Natural Science Foundation of Jiangxi Province of China (No. 20192BAB207016).
References (138)
- et al., Learning by local kernel polarization, Neurocomputing (2009)
- et al., EasyMKL: a scalable multiple kernel learning algorithm, Neurocomputing (2015)
- et al., MREKLM: a fast multiple empirical kernel learning machine, Pattern Recognit. (2017)
- et al., Two-stage multiple kernel learning with multiclass kernel polarization, Knowl.-Based Syst. (2013)
- et al., Two-stage multiple kernel learning for supervised dimensionality reduction, Pattern Recognit. (2015)
- et al., A survey on deep learning for big data, Inf. Fusion (2018)
- Deep learning in neural networks: an overview, Neural Netw. (2015)
- et al., A novel hybrid CNN-SVM classifier for recognizing handwritten digits, Pattern Recognit. (2012)
- et al., Classification of glomerular hypercellularity using convolutional features and support vector machine, Artif. Intell. Med. (2020)
- et al., Kernelized support vector machine with deep learning: an efficient approach for extreme multiclass dataset, Pattern Recognit. Lett. (2018)
- Stacked generalization, Neural Netw.
- Convolutional kernel networks based on a convex combination of cosine kernels, Pattern Recognit. Lett.
- Deep hybrid neural-kernel networks using random Fourier features, Neurocomputing
- Deep neural-kernel blocks, Neural Netw.
- Deep embedding kernel, Neurocomputing
- Deep neural mapping support vector machines, Neural Netw.
- Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis, Neurocomputing
- Multilayer deep features with multiple kernel learning for action recognition, Neurocomputing
- Enhancing deep neural networks via multiple kernel learning, Pattern Recognit.
- An introduction to kernel-based learning algorithms, IEEE Trans. Neural Netw.
- Kernel Methods for Pattern Analysis
- Kernel methods in machine learning, Ann. Stat.
- An overview of kernel alignment and its applications, Artif. Intell. Rev.
- Efficient kernel selection via spectral analysis
- Kernel learning and optimization with Hilbert-Schmidt independence criterion, Int. J. Mach. Learn. Cybern.
- Multiple kernel learning algorithms, J. Mach. Learn. Res.
- Multiple kernel learning for visual object recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell.
- Multiple kernel learning for remote sensing image classification, IEEE Trans. Geosci. Remote Sens.
- Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res.
- Large scale multiple kernel learning, J. Mach. Learn. Res.
- SimpleMKL, J. Mach. Learn. Res.
- SVRG-MKL: a fast and scalable multiple kernel learning solution for features combination in multi-class classification problems, IEEE Trans. Neural Netw. Learn. Syst.
- lp-norm multiple kernel learning, J. Mach. Learn. Res.
- Smooth optimization for effective multiple kernel learning
- More generality in efficient multiple kernel learning
- Learning non-linear combinations of kernels, Adv. Neural Inf. Process. Syst.
- Veto-consensus multiple kernel learning
- Localized multiple kernel learning via sample-wise alternating optimization, IEEE Trans. Cybern.
- Sample-adaptive multiple kernel learning
- Efficient Bayesian maximum margin multiple kernel learning
- Multiple kernel learning using single stage function approximation for binary classification problems, Int. J. Syst. Sci.
- MultiK-MHKS: a novel multiple kernel learning algorithm, IEEE Trans. Pattern Anal. Mach. Intell.
- Algorithms for learning kernels based on centered alignment, J. Mach. Learn. Res.
- Two-stage fuzzy multiple kernel learning based on Hilbert-Schmidt independence criterion, IEEE Trans. Fuzzy Syst.
- Reducing the dimensionality of data with neural networks, Science
- Learning deep architectures for AI, Found. Trends Mach. Learn.
- Deep learning, Nature
- A survey on deep learning: algorithms, techniques, and applications, ACM Comput. Surv.
- Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey, Artif. Intell. Rev.