Fast training of support vector machines on the Cell processor
Introduction
Support vector machines (SVMs) are widely used supervised learning methods that can be employed in many classification tasks (see, e.g., [1] and the references therein). In this paper we consider the problem of binary classification, where N data points (the training set) must be classified into two classes.
Formally, let us consider N vectors in m-dimensional space, $x_1, \dots, x_N \in \mathbb{R}^m$. Each vector $x_i$ is associated with a label $y_i \in \{-1, +1\}$. The set $\{(x_i, y_i)\}_{i=1}^N$ is called the training set. The classification problem is to separate the two classes with a surface in m-dimensional space that maximizes the margin between them. The separating surface is obtained by computing the solution of a quadratic programming (QP) problem of the form [2]:

$$\min_{\alpha} \; \tfrac{1}{2} \alpha^T Q \alpha - e^T \alpha \quad \text{subject to} \quad y^T \alpha = 0, \;\; 0 \le \alpha_i \le C, \;\; i = 1, \dots, N \tag{1}$$

where $e$ is the vector of all ones, $C > 0$ is a regularization parameter, and the entries $Q_{ij}$ of the symmetric positive semidefinite matrix $Q$ are defined as

$$Q_{ij} = y_i y_j K(x_i, x_j).$$
Here $K(x, z)$ is a kernel function which depends on the type of the separating surface. Examples are the polynomial kernel

$$K(x, z) = (\gamma \, x^T z + r)^d, \quad \gamma > 0,$$

and the radial basis function (RBF) kernel

$$K(x, z) = \exp(-\gamma \, \|x - z\|^2), \quad \gamma > 0.$$
An SVM is trained by solving the QP problem (1) using the vectors $x_i$ and the corresponding labels $y_i$. The solution $\alpha$ can then be used to classify any new point $x$ by computing its class as

$$\mathrm{class}(x) = \mathrm{sgn}\Big( \sum_{i=1}^{N} y_i \alpha_i K(x_i, x) + b \Big),$$

where the offset $b$ is computed during the training step as well.
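The decision function above is just the sign of a kernel expansion over the training vectors. A minimal C sketch (names and the function-pointer interface are illustrative assumptions, not the paper's API):

```c
/* Linear kernel, K(x, z) = <x, z>, used here as an example */
double linear_kernel(const double *x, const double *z, int m)
{
    double s = 0.0;
    for (int i = 0; i < m; i++)
        s += x[i] * z[i];
    return s;
}

/* Classify point x given n training vectors sv[i] (each of length m),
 * labels y[i] in {-1, +1}, coefficients alpha[i] and offset b.
 * Returns the predicted class, +1 or -1. */
int svm_classify(int n, int m, const double **sv, const int *y,
                 const double *alpha, double b, const double *x,
                 double (*kernel)(const double *, const double *, int))
{
    double s = b;
    for (int i = 0; i < n; i++)
        s += alpha[i] * y[i] * kernel(sv[i], x, m);
    return (s >= 0.0) ? +1 : -1;
}
```

In practice only the support vectors (those with $\alpha_i > 0$) contribute to the sum, which is why classification time depends on the number of support vectors rather than on N.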
The size of real-world datasets makes the solution of (1) using general-purpose QP solvers impractical. For this reason, efficient ad-hoc algorithms that take advantage of the special structure of (1) have been developed. The sequential minimal optimization (SMO) algorithm, originally proposed by Platt [3], decomposes the original QP problem into two-dimensional subproblems which can be solved analytically. The idea of SMO is to compute a solution iteratively, by optimizing two coefficients at each iteration. SMO is efficient because it does not use a costly numerical QP solver in its inner loop.
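The two-variable subproblem solved at each SMO iteration has a well-known closed form. The following is a simplified sketch of one such update (illustrative names; the full algorithm also maintains an error cache and treats the non-positive-curvature case separately):

```c
/* One analytic SMO step for a selected pair (i, j), given the kernel
 * values Kii, Kjj, Kij, the current errors Ei = f(x_i) - y_i and
 * Ej = f(x_j) - y_j, and the clipping bounds [L, H] derived from the
 * box constraints 0 <= alpha <= C. Updates *ai and *aj in place. */
void smo_pair_update(double Kii, double Kjj, double Kij,
                     double Ei, double Ej, int yi, int yj,
                     double L, double H, double *ai, double *aj)
{
    double eta = Kii + Kjj - 2.0 * Kij; /* curvature along the pair */
    if (eta <= 0.0)
        eta = 1e-12;                    /* guard; real SMO handles this separately */
    double aj_new = *aj + yj * (Ei - Ej) / eta;
    if (aj_new > H) aj_new = H;         /* clip to the feasible segment */
    if (aj_new < L) aj_new = L;
    *ai += yi * yj * (*aj - aj_new);    /* preserve the constraint y^T alpha = 0 */
    *aj = aj_new;
}
```

Because each step touches only two coefficients and needs only three kernel values, the per-iteration cost is dominated by the kernel evaluations required to update the gradient.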
Unfortunately, training times are still significant for many real-world datasets. The reason is that the matrix Q can be very large and cannot be kept entirely in memory. Thus, the SMO algorithm (like most existing SVM training algorithms) needs to recompute the values Q_ij many times, which in turn requires many evaluations of the kernel function.
In this paper we describe LIBSVMCBE, an efficient implementation of the SMO algorithm for asymmetric multi-core architectures. Specifically, LIBSVMCBE exploits the peculiar features of the Cell Broadband Engine (CBE), an asymmetric multi-core processor architecture originally developed for the consumer market (it is used inside Sony's PlayStation 3 (PS3) gaming console), but which is also used in high-end servers targeted at scientific computations. The Cell processor includes a conventional core based on the PowerPC architecture, together with specialized vector coprocessors called synergistic processor elements (SPEs). LIBSVMCBE splits the evaluation of the elements of matrix Q across the SPEs; an optimized, single-instruction multiple-data (SIMD) algorithm for computing the dot product of two vectors is used inside each SPE, so that the evaluation of kernel functions is very fast.
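The actual SPE code uses vector intrinsics, but the idea behind the SIMD dot product can be conveyed in portable C: process four elements per iteration with four independent partial sums, which is how the four 32-bit lanes of a 128-bit vector register would be used. This is a portable sketch under the simplifying assumption that the vector length is a multiple of 4, not the SPE-intrinsics version from the paper:

```c
/* SIMD-style dot product: four independent accumulators mirror the
 * four single-precision lanes of a 128-bit vector register, removing
 * the loop-carried dependency of a naive accumulation.
 * Assumes m is a multiple of 4 for simplicity. */
float dot4(const float *x, const float *z, int m)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (int i = 0; i < m; i += 4) {
        s0 += x[i]     * z[i];
        s1 += x[i + 1] * z[i + 1];
        s2 += x[i + 2] * z[i + 2];
        s3 += x[i + 3] * z[i + 3];
    }
    return (s0 + s1) + (s2 + s3);   /* horizontal reduction at the end */
}
```

Since both the polynomial kernel (via $x^T z$) and the RBF kernel (via $\|x - z\|^2 = x^T x - 2 x^T z + z^T z$) reduce to dot products, speeding up this single primitive accelerates every kernel evaluation.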
LIBSVMCBE is based on LIBSVM [4], an efficient and widely used package implementing the SMO algorithm. LIBSVM employs several heuristics to reduce the training time, such as the shrinking heuristic, which dynamically reduces the set of coefficients to be optimized, and a caching strategy to avoid recomputing recently used entries of matrix Q; still, the evaluation of the elements Q_ij is the bottleneck of LIBSVM. LIBSVMCBE removes that bottleneck by offloading the computation of Q to the vector coprocessors. We test LIBSVMCBE on several real-world datasets and show how this optimization yields significant speedups over the sequential algorithm. Our optimization is very general, because it can be applied to any SVM training and classification package which relies on multiple evaluations of the kernel function.
Organization of this paper: This paper is organized as follows. In Section 2 we review some of the existing parallelization strategies for training SVMs. In Section 3 we describe the SMO algorithm. In Section 4 we give a brief overview of the architecture of the Cell processor, and then present LIBSVMCBE, a Cell-optimized version of the LIBSVM software package. In Section 5 we evaluate LIBSVMCBE on some training datasets. Finally, conclusions and future work are presented in Section 6. We include some implementation details in Appendix A.
Related work
There have been several attempts to optimize the training and classification times of SVMs, by considering parallel approaches to the solution of the QP problem (1).
In [5] the authors describe an optimized version of SMO which makes use of graphics processing units (GPUs). Modern GPUs can be considered specialized, highly parallel processors, containing a large number (hundreds or even thousands) of relatively simple processing cores connected to a high-bandwidth memory subsystem. This kind of
The SMO algorithm
Sequential minimal optimization is sketched in Algorithm 1 (see [13] for a more detailed description). Algorithm 1 contains the main loop which optimizes two coefficients at each iteration. The selection of the indices i, j is a crucial task, as it influences the convergence speed. LIBSVM uses the second-order heuristic proposed in [13], shown in Algorithm 2. (Algorithm 1, sketch: given a small positive tolerance, initialize all coefficients α_i ← 0; then loop: select i, j using Algorithm 2; if …)
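To give the flavor of the pair selection, here is a sketch of the simpler first-order "maximal violating pair" rule; the actual Algorithm 2 uses the more effective second-order heuristic of [13], and the set definitions below follow the usual LIBSVM conventions rather than the paper's exact code:

```c
/* First-order working-set selection (maximal violating pair).
 * G[t] is the gradient of the dual objective at the current alpha,
 * y[t] in {-1, +1} the labels, C the box-constraint bound.
 * Picks i from the "up" set maximizing -y[t]*G[t] and j from the
 * "low" set minimizing it; convergence is reached when the two
 * extremes differ by less than a small tolerance. */
void select_pair(int n, const double *G, const double *alpha,
                 const int *y, double C, int *out_i, int *out_j)
{
    double gmax = -1e300, gmin = 1e300;
    int i = -1, j = -1;
    for (int t = 0; t < n; t++) {
        int in_up  = (y[t] == +1) ? (alpha[t] < C) : (alpha[t] > 0.0);
        int in_low = (y[t] == +1) ? (alpha[t] > 0.0) : (alpha[t] < C);
        double v = -y[t] * G[t];
        if (in_up && v > gmax)  { gmax = v; i = t; }
        if (in_low && v < gmin) { gmin = v; j = t; }
    }
    *out_i = i;
    *out_j = j;
}
```

The second-order heuristic refines the choice of j by also taking the curvature term K_ii + K_jj − 2K_ij into account, which typically reduces the number of iterations at the cost of a few extra kernel values per selection.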
Fast kernel evaluation on the Cell processor
The CBE is a heterogeneous multi-core processor, whose internal architecture is shown in Fig. 1. The CBE contains nine processors on a single chip, connected with a high bandwidth circular bus [14].
The power processor element (PPE) is the main processor, and is based on a 64-bit PowerPC architecture with vector/SIMD multimedia extensions. The PPE is responsible for executing the operating system, allocating resources and distributing the workload to the other computing cores. The PPE has
Experimental results
In this section we analyze the performance of LIBSVMCBE by measuring the training time on the datasets listed in Table 1: N is the number of training vectors; m is the number of elements of each vector; Density is the average fraction of nonzero elements in each training vector (100% denotes fully dense vectors); finally, Av. nonzero is the average number of nonzero elements (computed as m × Density).
chess8_12K contains 12 000 points which are randomly distributed over an 8×8 chessboard; each point
Conclusions
In this paper we described LIBSVMCBE, an optimized implementation of the SMO algorithm for the Cell processor. LIBSVMCBE is a modified version of LIBSVM which improves the most time-consuming step of the training process, that is, the evaluation of the kernel function.
LIBSVMCBE has been tested on some widely used datasets; results show speedups up to 6.35× with respect to the sequential version. High speedups are achieved on datasets with dense, high-dimensional training vectors. We remark that
Acknowledgments
The author is grateful to Gaetano Zanghirati for suggesting this problem and for many useful discussions.
Software availability: The source code of LIBSVMCBE, including the datasets and scripts used to produce the results shown in this paper, is available at http://www.cs.unibo.it/pub/marzolla/svmcell/.
References (16)
- et al., A parallel solver for large quadratic programs in training support vector machines, Parallel Computing (2003)
- Advanced support vector machines and kernel methods, Neurocomputing (2003)
- Statistical Learning Theory (1998)
- Fast training of support vector machines using sequential minimal optimization
- C.-C. Chang, C.-J. Lin, LIBSVM: a Library for Support Vector Machines
- et al., Fast support vector machine training and classification on graphics processors
- et al., Parallel sequential minimal optimization for the training of support vector machines, IEEE Transactions on Neural Networks (2006)
- et al., Parallel software for training large scale support vector machines on multiprocessor systems, Journal of Machine Learning Research (2006)
Moreno Marzolla graduated in Computer Science from the University of Venezia “Ca’ Foscari” (Italy) in 1998, and received a Ph.D. in Computer Science from the same University in 2004. From 1998 to 2001 he worked at the Italian National Institute for Nuclear Physics (INFN) as a software developer. From 2004 to 2005 he was a post-doc researcher at the University of Venezia “Ca’ Foscari”. From 2005 to 2009 he was again with INFN as a Software Engineer, working in the area of Grid Computing supported by the EGEE, OMII-Europe and EGEE-3 EU-funded projects. In November 2009, he joined the Department of Computer Science of the University of Bologna, where he is currently an assistant professor. His research interests include performance modeling of complex systems and high performance and Cloud computing.