
Neurocomputing

Volume 74, Issue 17, October 2011, Pages 3700-3707

Letters
Fast training of support vector machines on the Cell processor

https://doi.org/10.1016/j.neucom.2011.04.011

Abstract

Support vector machines (SVMs) are a widely used technique for classification, clustering and data analysis. While efficient algorithms for training SVMs are available, dealing with large datasets makes training and classification a computationally challenging problem. In this paper we exploit modern processor architectures to improve the training speed of LIBSVM, a well-known implementation of the sequential minimal optimization algorithm. We describe LIBSVMCBE, an optimized version of LIBSVM which takes advantage of the peculiar architecture of the Cell Broadband Engine. We assess the performance of LIBSVMCBE on real-world training problems, and we show how this optimization is particularly effective on large, dense datasets.

Introduction

SVMs are widely used supervised learning methods which can be employed in many classification tasks (see, e.g., [1] and references therein). In this paper we consider the problem of binary classification, where N data points (training set) must be classified into two classes.

Formally, let us consider N vectors in m-dimensional space: $x_i \in \mathbb{R}^m$, $i = 1, \dots, N$. Vector $x_i$ is associated with a label $y_i \in \{-1, 1\}$. The set $D = \{(x_i, y_i) : i = 1, \dots, N\}$ is called the training set. The classification problem is to separate the two classes with a surface in $\mathbb{R}^m$ that maximizes the margin between them. The separating surface is obtained by computing the solution $\alpha = [\alpha_1, \dots, \alpha_N]^T$ of a quadratic programming (QP) problem of the form [2]:

$$\begin{aligned} \text{minimize} \quad & f(\alpha) = \tfrac{1}{2}\,\alpha^T Q \alpha - \sum_{i=1}^{N} \alpha_i \\ \text{subject to} \quad & \sum_{i=1}^{N} y_i \alpha_i = 0 \\ & 0 \le \alpha_j \le C, \quad j = 1, \dots, N \end{aligned} \qquad (1)$$

where the entries $Q_{ij}$ of the symmetric positive semidefinite matrix $Q$ are defined as

$$Q_{ij} = y_i y_j K(x_i, x_j), \quad i, j = 1, \dots, N.$$

$K : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ is a kernel function which depends on the type of the separating surface. Examples are the polynomial kernel:

$$K(x_i, x_j; a, r, d) = (a\, x_i^T x_j + r)^d, \quad a, r \in \mathbb{R},\ d \in \mathbb{N}$$

and the radial basis function (RBF) kernel:

$$K(x_i, x_j; \gamma) = \exp(-\gamma \|x_i - x_j\|^2), \quad \gamma \in \mathbb{R}^+.$$
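As a concrete illustration, both kernels can be evaluated directly from their definitions. The sketch below uses arbitrary example vectors and parameter values (not taken from the paper):

```python
import math

def poly_kernel(x, z, a=1.0, r=1.0, d=2):
    """Polynomial kernel: K(x, z) = (a * x^T z + r)^d."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return (a * dot + r) ** d

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

x = [1.0, 0.0, 2.0]
z = [0.5, 1.0, 1.0]
print(poly_kernel(x, z))   # (1*2.5 + 1)^2 = 12.25
print(rbf_kernel(x, z))    # exp(-0.5 * 2.25), roughly 0.3247
```

Both evaluations are dominated by the dot product (or squared distance) between the two vectors, which is why vectorizing this inner loop pays off.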

An SVM is trained by solving the QP problem (1) using the vectors $x_i$ and the corresponding labels $y_i$. The solution $\alpha$ can then be used to classify any new point $z \in \mathbb{R}^m$ by computing its class $f(z)$ as:

$$f(z) = \operatorname{sgn}\left( b + \sum_{i=1}^{N} y_i \alpha_i K(x_i, z) \right)$$

where the offset $b$ is computed during the training step as well.
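Given a trained model (coefficients α_i, labels y_i, support vectors x_i, and offset b), classification is a direct sum over the support vectors. The model values below are made up purely for illustration:

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

def classify(z, svs, ys, alphas, b, kernel=rbf_kernel):
    """f(z) = sgn(b + sum_i y_i * alpha_i * K(x_i, z))."""
    s = b + sum(y * a * kernel(x, z) for x, y, a in zip(svs, ys, alphas))
    return 1 if s >= 0 else -1

# toy model: one support vector per class
svs = [[0.0, 0.0], [2.0, 2.0]]
ys = [-1, 1]
alphas = [1.0, 1.0]
b = 0.0
print(classify([1.9, 1.9], svs, ys, alphas, b))   # 1: close to the positive SV
print(classify([0.1, 0.1], svs, ys, alphas, b))   # -1: close to the negative SV
```

Note that classification, like training, is dominated by kernel evaluations, so the same vectorization applies to both phases.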

The size of real-world datasets makes the solution of (1) using general-purpose QP solvers impractical. For this reason, efficient ad-hoc algorithms that take advantage of the special structure of (1) have been developed. The sequential minimal optimization (SMO) algorithm, originally proposed by Platt [3], decomposes the original QP problem into two-dimensional subproblems which can be solved analytically. The idea of SMO is to compute a solution iteratively, by optimizing two coefficients αi,αj at each iteration. SMO is efficient because it does not use a costly numerical QP solver in its inner loop.
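The analytic solution of the two-variable subproblem follows Platt's update rule: move α_j along the second derivative η = K_ii + K_jj − 2K_ij, clip it to the feasible box, then adjust α_i to preserve the equality constraint. The function below is a hedged sketch of that rule (a textbook simplification, not LIBSVM's actual code; E_t denotes the prediction error f(x_t) − y_t):

```python
def smo_step(alpha_i, alpha_j, y_i, y_j, E_i, E_j, K_ii, K_jj, K_ij, C):
    """One analytic SMO update of the pair (alpha_i, alpha_j)."""
    eta = K_ii + K_jj - 2.0 * K_ij
    if eta <= 0:                       # degenerate case: skip this pair
        return alpha_i, alpha_j
    new_j = alpha_j + y_j * (E_i - E_j) / eta
    # box constraints [lo, hi] for the updated alpha_j
    if y_i != y_j:
        lo, hi = max(0.0, alpha_j - alpha_i), min(C, C + alpha_j - alpha_i)
    else:
        lo, hi = max(0.0, alpha_i + alpha_j - C), min(C, alpha_i + alpha_j)
    new_j = min(max(new_j, lo), hi)
    # adjust alpha_i so that sum_i y_i * alpha_i stays constant
    new_i = alpha_i + y_i * y_j * (alpha_j - new_j)
    return new_i, new_j

print(smo_step(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, 1.0, 0.0, C=1.0))  # (1.0, 1.0)
```

Because each step touches only two coefficients and is solved in closed form, the per-iteration cost is dominated by keeping the gradient up to date, which requires kernel evaluations.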

Unfortunately, training times are still significant for many real-world datasets. The reason is that matrix $Q$ can be very large and cannot be kept entirely in memory. Thus, the SMO algorithm (as well as most existing SVM training algorithms) needs to recompute the values $Q_{ij}$ many times, which in turn requires many evaluations of the kernel function.

In this paper we describe LIBSVMCBE, an efficient implementation of the SMO algorithm for asymmetric multi-core architectures. Specifically, LIBSVMCBE employs the peculiar features of the Cell Broadband Engine (CBE), an asymmetric multi-core processor architecture originally developed for the consumer market (it is used inside Sony's PlayStation®3 (PS3) gaming console), but which is also used on high-end servers targeted at scientific computations. The Cell processor includes a conventional core based on the PowerPC architecture, together with specialized vector coprocessors called synergistic processor elements (SPEs). LIBSVMCBE splits the evaluation of the elements of matrix $Q$ across the SPEs; an optimized single-instruction, multiple-data (SIMD) algorithm for computing the dot product of two vectors is used inside each SPE, so that the evaluation of kernel functions is very fast.
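The data decomposition can be mimicked in plain Python: one row of Q is partitioned into contiguous chunks of columns, one per worker, and the partial results are concatenated. Threads stand in for SPEs here, and the function names, worker count and RBF parameter are illustrative only:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def rbf_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

def q_row(i, X, y, n_workers=4):
    """Compute row i of Q, splitting the column range across n_workers workers."""
    N = len(X)
    bounds = [(k * N // n_workers, (k + 1) * N // n_workers)
              for k in range(n_workers)]

    def chunk(lo_hi):
        lo, hi = lo_hi
        return [y[i] * y[j] * rbf_kernel(X[i], X[j]) for j in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(chunk, bounds)       # one chunk per "SPE"
    return [q for part in parts for q in part]

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
y = [1, -1, -1, 1, 1]
row = q_row(0, X, y)
print(row[0])   # Q_00 = y_0 * y_0 * K(x_0, x_0) = 1.0
```

On the actual Cell hardware the chunks are shipped to the SPEs via DMA and the inner dot products use SIMD instructions; the threads above only illustrate the partitioning scheme.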

LIBSVMCBE is based on LIBSVM [4], an efficient and widely used implementation of the SMO algorithm. LIBSVM employs several heuristics to reduce the training time, such as the shrinking heuristic, which dynamically reduces the set of coefficients to be optimized, and a caching strategy that avoids recomputing recently used entries of matrix $Q$; still, the evaluation of the elements $Q_{ij}$ is the bottleneck of LIBSVM. LIBSVMCBE addresses that bottleneck by offloading the computation of $Q$ to the vector coprocessors. We test LIBSVMCBE on several real-world datasets and show how this optimization yields significant speedups over the sequential algorithm. Our optimization is very general, because it can be applied to any SVM training and classification package which relies on multiple evaluations of the kernel function.
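The caching idea can be illustrated with a toy least-recently-used cache for kernel rows. The class below is a hypothetical simplification (LIBSVM's actual cache manages a byte budget over columns of Q, not a fixed row count):

```python
from collections import OrderedDict

class KernelCache:
    """Toy LRU cache mapping a row index i to the precomputed row of Q."""
    def __init__(self, capacity, compute_row):
        self.capacity = capacity       # max number of cached rows
        self.compute_row = compute_row # callback evaluating one row of Q
        self.rows = OrderedDict()      # iteration order = recency order
        self.misses = 0

    def get(self, i):
        if i in self.rows:
            self.rows.move_to_end(i)   # mark row i as most recently used
        else:
            self.misses += 1
            if len(self.rows) >= self.capacity:
                self.rows.popitem(last=False)  # evict least recently used row
            self.rows[i] = self.compute_row(i)
        return self.rows[i]

cache = KernelCache(capacity=2, compute_row=lambda i: [i] * 4)
for i in [0, 1, 0, 2, 1]:
    cache.get(i)
print(cache.misses)   # 4: row 1 was evicted and had to be recomputed
```

When the active set of coefficients fits in the cache, kernel evaluations are amortized; when it does not, recomputation dominates, which is exactly the case the SPE offloading targets.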

Organization of this paper: This paper is organized as follows. In Section 2 we review some of the existing parallelization strategies for training SVMs. In Section 3 we describe the SMO algorithm. In Section 4 we give a brief overview of the architecture of the Cell processor, and then present LIBSVMCBE, a Cell-optimized version of the LIBSVM software package. In Section 5 we evaluate LIBSVMCBE on some training datasets. Finally, conclusions and future work are presented in Section 6. We include some implementation details in Appendix A.

Section snippets

Related work

There have been several attempts to optimize the training and classification times of SVMs, by considering parallel approaches to the solution of the QP problem (1).

In [5] the authors describe an optimized version of SMO which makes use of graphics processing units (GPUs). Modern GPUs can be considered as specialized, highly parallel processors, containing a large number (hundreds or even thousands) of relatively simple processing cores connected to a high-bandwidth memory subsystem. This kind of

The SMO algorithm

Sequential minimal optimization is sketched in Algorithm 1 (see [13] for a more detailed description). Algorithm 1 includes the main loop which is used to optimize two coefficients $\alpha_i, \alpha_j$ at each iteration. The selection of the indices $i, j$ is a crucial task, as it influences the convergence speed. LIBSVM uses the second-order heuristic proposed in [13], shown in Algorithm 2.
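LIBSVM's actual selection is the second-order heuristic of [13]; a simpler stand-in is the first-order "maximal violating pair" rule, which picks i maximizing −y_t G_t over the set I_up and j minimizing it over I_low. The sketch below implements that first-order variant (function and variable names are illustrative, not LIBSVM's):

```python
def select_pair(alpha, y, G, C):
    """First-order working set selection (maximal violating pair)."""
    def in_up(t):    # coefficients that can still increase the objective's margin
        return (alpha[t] < C and y[t] == 1) or (alpha[t] > 0 and y[t] == -1)
    def in_low(t):
        return (alpha[t] < C and y[t] == -1) or (alpha[t] > 0 and y[t] == 1)

    up = [t for t in range(len(y)) if in_up(t)]
    low = [t for t in range(len(y)) if in_low(t)]
    i = max(up, key=lambda t: -y[t] * G[t])
    j = min(low, key=lambda t: -y[t] * G[t])
    return i, j

print(select_pair([0.0, 0.0, 1.0], [1, -1, 1], [-1.0, -0.5, -0.2], C=1.0))  # (0, 1)
```

The second-order rule keeps the same choice of i but picks j by also weighing the curvature Q_ii + Q_tt − 2Q_it, which typically reduces the number of iterations at the cost of extra kernel evaluations.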

Algorithm 1

Sequential minimal optimization

τ ← 10⁻¹² {small positive constant}
for all i ∈ {1, …, N} do
 G_i ← −1
 α_i ← 0
loop
 Select i, j using Algorithm 2
 if

Fast kernel evaluation on the Cell processor

The CBE is a heterogeneous multi-core processor, whose internal architecture is shown in Fig. 1. The CBE contains nine processors on a single chip, connected with a high bandwidth circular bus [14].

The power processor element (PPE) is the main processor, and is based on a 64 bit PowerPC architecture with vector and SIMD multimedia extensions. The PPE is responsible for executing the operating system, allocating resources and distributing the workload to the other computing cores. The PPE has

Experimental results

In this section we analyze the performance of LIBSVMCBE by measuring the training time on the datasets listed in Table 1: N is the number of training vectors; m is the number of elements of each vector; Density is the average fraction of nonzero elements in each training vector (100% denotes fully dense vectors); finally, Av. nonzero is the average number of nonzero elements (computed as m×Density).

chess8_12K contains 12 000 points which are randomly distributed over an 8×8 chessboard; each point

Conclusions

In this paper we described LIBSVMCBE, an optimized implementation of the SMO algorithm for the Cell processor. LIBSVMCBE is a modified version of LIBSVM which improves the most time-consuming step of the training process, that is, the evaluation of the kernel function.

LIBSVMCBE has been tested on some widely used datasets; results show speedups up to 6.35× with respect to the sequential version. High speedups are achieved on datasets with dense, high-dimensional training vectors. We remark that

Acknowledgments

The author is grateful to Gaetano Zanghirati for suggesting this problem and for many useful discussions.

Software availability: The source code of LIBSVMCBE, including the datasets and scripts used to produce the results shown in this paper, is available at http://www.cs.unibo.it/pub/marzolla/svmcell/.


References (16)

  • G. Zanghirati et al., A parallel solver for large quadratic programs in training support vector machines, Parallel Computing (2003).
  • V.A. David Sánchez, Advanced support vector machines and kernel methods, Neurocomputing (2003).
  • V.N. Vapnik, Statistical Learning Theory (1998).
  • J.C. Platt, Fast training of support vector machines using sequential minimal optimization.
  • C.-C. Chang, C.-J. Lin, LIBSVM: a Library for Support Vector Machines, ...
  • B. Catanzaro et al., Fast support vector machine training and classification on graphics processors.
  • L. Cao et al., Parallel sequential minimal optimization for the training of support vector machines, IEEE Transactions on Neural Networks (2006).
  • L. Zanni et al., Parallel software for training large scale support vector machines on multiprocessor systems, Journal of Machine Learning Research (2006).


Moreno Marzolla graduated in Computer Science from the University of Venezia “Ca’ Foscari” (Italy) in 1998, and received a Ph.D. in Computer Science from the same University in 2004. From 1998 to 2001 he worked at the Italian National Institute for Nuclear Physics (INFN) as a software developer. From 2004 to 2005 he was a post-doc researcher at the University of Venezia “Ca’ Foscari”. From 2005 to 2009 he was again with INFN as a Software Engineer, working in the area of Grid Computing supported by the EGEE, OMII-Europe and EGEE-3 EU-funded projects. In November 2009, he joined the Department of Computer Science of the University of Bologna, where he is currently an assistant professor. His research interests include performance modeling of complex systems and high performance and Cloud computing.
