
Neurocomputing

Volume 173, Part 2, 15 January 2016, Pages 127-136

An online generalized eigenvalue version of Laplacian Eigenmaps for visual big data

https://doi.org/10.1016/j.neucom.2014.12.119

Abstract

This paper presents generalized incremental Laplacian Eigenmaps (GENILE), a novel online version of Laplacian Eigenmaps, one of the most popular manifold-based dimensionality reduction techniques, which solves the generalized eigenvalue problem. We evaluate the comparative performance of manifold-based learning techniques using both artificial and real data. Specifically, two popular artificial datasets, the Swiss roll and the S-curve, are used, in addition to the real MNIST digits, banknote and heart disease datasets, for testing and evaluating our novel method, benchmarked against a number of standard batch-based and other manifold-based learning techniques. Preliminary experimental results demonstrate consistent improvements in the classification accuracy of the proposed method in comparison with the other techniques.

Introduction

Most traditional techniques for feature extraction and dimensionality reduction come in both batch and incremental versions. Among the dimensionality reduction methods proposed in the past, manifold-based learning techniques have gained great popularity, but most of them run in batch mode, and very few incremental manifold-based approaches have been proposed. The major difficulty arises in scenarios where data arrive in multiple chunks over time: batch-mode algorithms must reprocess all previous chunks at each new input, which becomes computationally very expensive and inefficient.

A general limitation of classical dimensionality reduction methods such as PCA and MDS is their inability to produce meaningful projections for non-linear data. This has, to some extent, been addressed by methods that generate non-linear maps, such as self-organizing maps [1] and other neural network based methods [2]. These methods normally solve a non-linear optimization problem using gradient descent, which does not always reach the global optimum and often converges to a local optimum. The ideal solution is to explicitly consider the structure of the manifold on which the data reside.

A classical and widely used dimensionality reduction method is principal component analysis (PCA), used primarily for visualization and pre-processing purposes in areas such as data mining and machine learning [3], [4], information retrieval [5] and multimedia [6]. Dimensionality reduction using PCA is performed on the basis of the leading eigenvectors of the data's covariance matrix. Another classical linear method is multi-dimensional scaling, which can only capture the flat Euclidean structure and is unable to find the non-linear structure in the data [7]. In [8] the authors propose a novel distance metric learning model that embeds the standard Fisher's linear discriminant analysis technique into the classical maximum margin criterion. The basic idea is to maximize the margin between dissimilar samples while maintaining a larger mean squared distance ratio. The only disadvantage of this technique is its iterative learning nature, which makes it slower to converge than other standard state-of-the-art classification techniques. Similarly, in [9] the authors propose a variant of Fisher's linear discriminant analysis, another classical dimensionality reduction technique, which yields more than one filter for the two-class problem and also produces higher classification accuracy than other dimensionality reduction techniques. In [10] the authors propose a regularized version of linear discriminant analysis by introducing a within-cluster scatter matrix generated on the basis of the between-class and within-class scatter matrices. In [11] the authors propose a multi-feature multi-manifold learning method to address the single-sample face recognition problem: multiple discriminative features are extracted from the face image patches by a multi-manifold subspace learning criterion, which significantly improves recognition performance in comparison with other techniques. Similarly, in [12] the authors present a novel active learning algorithm named manifold optimal experimental design via dependence maximization (MODM), based on Laplacian regularized least squares. The objective function of the proposed algorithm is efficiently optimized using a sequential optimization strategy, and the results show the effectiveness of the proposed method in comparison with other standard approaches. In [13] the authors propose an L1-norm based multi-linear subspace analysis technique which proves very robust to outliers, as it suppresses their negative effect on the resulting projection matrix or projection vector; this method is suitable in scenarios where little training data is available. Most of these classical dimensionality reduction methods suffer from difficulties in designing the cost function and are often limited to relatively low-dimensional data sets. Recently, manifold-based dimensionality reduction methods have been developed, such as Isomap [14], [15], local linear embedding (LLE) [16], Laplacian Eigenmaps [17], [18], Hessian Eigenmaps [19], semidefinite embedding (SDE) [20], manifold charting [21], local tangent space alignment (LTSA) [22] and diffusion maps [23]. Due to their non-linear geometrical nature, these methods have attracted wide attention in various domains of computational intelligence.

All the above-mentioned algorithms run in batch mode and necessarily require the complete data to be available at once for processing. Such algorithms are computationally very expensive in scenarios where the data must be observed sequentially, because the previously processed data have to be reconsidered repeatedly. There are several scenarios illustrating the benefits of incremental learning and how it overcomes the high computational cost of executing a batch version. Instead of considering a single new entry, we consider the most common scenario, where the data arrive in more than one chunk, in sequence. The problem of incremental learning with data arriving in this way can be stated as follows. Let $X = [x_1, x_2, \ldots, x_{t_1}, x_{t_1+1}, x_{t_1+2}, \ldots, x_{t_1+t_2}]$ be the concatenation of two chunks of the whole dataset, received at timesteps $t_1$ and $t_2$, where $x_i \in \mathbb{R}^{t_1+t_2}$. Suppose the low-dimensional coordinates $y_i$ of the $x_i$ representing the first two chunks, containing $t_1+t_2$ training samples, have already been produced. When the third chunk, namely $[x_{t_1+t_2+1}, x_{t_1+t_2+2}, \ldots, x_{t_1+t_2+t_3}]$, arrives at timestep $t_3$, incremental learning should independently determine how to project this chunk onto the low-dimensional space.

In many scenarios it is uncommon for all the data to be available before learning; examples include social networking site data, online web transaction data, and data received through sensors. As a result, storage mechanisms have also changed completely: such data are mostly collected and stored in raw form in distributed file storage environments like Hadoop or Cassandra. Analytical programming environments such as Java, MATLAB and Revolution R extract data from these stores, ranging from terabytes to petabytes, and perform learning on these big datasets. Incremental learning techniques are best suited to these scenarios because the huge amount of transactional data cannot be learned at once. Instead, the best choice is to learn the data in chunks, or more appropriately one data point at a time, in a completely adaptive environment.

Several incremental manifold-based learning methods related to different dimensionality reduction techniques have been proposed in the literature. In [24], an incremental manifold-based learning method via tangent space alignment is presented. This method works by first updating the existing local geometrical information in view of the new input, and then using it, together with the existing points, to produce new estimates and update the whole tangent space. In [25] the authors propose a multiple-threshold based incremental learning classifier that performs sample-by-sample incremental learning and then updates a pre-defined number of threshold-based classifiers without re-training on previous data. Another important characteristic of this algorithm is that the threshold and training error of each classifier are determined optimally in closed form. In [26], Martin and Anil present an incremental version of Isomap, in which the geodesic distances are updated every time a new input arrives, leading to an incremental eigen-decomposition problem. In [27], the author does not change the cost matrix on a new arrival; since the smallest eigenvalue is taken for the projection, it is always susceptible to change, and the incremental learning problem of local linear embedding (LLE) is then handled by solving a $d_2 \times d_2$ minimization problem, where $d_2$ is the dimensionality of the low-dimensional embedding space. The author in [28] presents a generalized common framework for local linear embedding (LLE), multi-dimensional scaling (MDS), Isomap and Laplacian Eigenmaps by proposing a novel Nyström formula for new datapoints; this helps in solving the subset eigendecomposition problem and generalizes the dimensionality reduction results to novel datapoints. Similarly, in [29] the authors use Laplacian scores and Pearson correlation coefficients in the data preparation step to preserve the structure of the data in the lower-dimensional space, which reduces the computational cost and also improves the classification accuracy. In [30], a general incremental learning framework capable of dealing with one or more new samples at a time for the so-called spectral embedding methods is presented. The authors solve the dimensionality reduction problem by projecting new samples onto the latent manifold recovered from the low-dimensional embedding coordinates learned from the previous samples.

The incremental methods presented so far in the literature can be easily divided into two groups:

  • 1.

    Independent training: Calculate the low-dimensional embedding of a new chunk from a new class or an existing class like incremental subspace versions of PCA and LDA methods [31], [32].

  • 2.

    Dependent training: Calculate the low-dimensional embedding of a new chunk by using the existing adjacent information of the previous chunk. An example is the most recently proposed incremental version of Laplacian Eigenmaps, which is very much dependent on the previously processed information to compute the new information [33].

The rationale for deriving a new online version, given the existing extension of Laplacian Eigenmaps already proposed in [33], rests on four key points:

  • 1.

    For big data computations, the two positive semi-definite matrices produced for learning become very large and demand a large amount of computation, which can only be handled by incrementally learning each vector, point by point, for both matrices.

  • 2.

    If the data is online in nature and a light-weight, adaptable learning mechanism is required, our online version is again a preferable choice compared to the standard Laplacian Eigenmaps approach.

  • 3.

    It is not always necessary, as assumed in [17], [33], to consider only the minimum eigenvalues to produce low-dimensional projections for Laplacian Eigenmaps.

  • 4.

    The low-dimensional embedding can also be calculated incrementally, easily and independently, in a single pass, without using the adjacency information of the previous chunk as in [33].

Section snippets

Manifold-based learning

Let us consider the problem of observing a set of images: factors such as the viewing angle, rotation and lighting affect the pixel intensities, so the data in the high-dimensional space attain a complex non-linear structure. These changes do not occur abruptly, and so the data can reasonably be assumed to lie approximately on a (Riemannian) manifold. This is one reason why manifold-based learning techniques have gained so much attention. In this paper, our topic of discussion

The Laplacian Eigenmaps algorithm

Given $l$ points $x_1, x_2, \ldots, x_l$ in $\mathbb{R}^l$, we construct a weighted graph with one node for each point, connected by a set of edges between neighboring points. The steps involved in the execution of Laplacian Eigenmaps are stated below; a code sketch of the full pipeline is given after these steps:

  • 1.

    Step 1 [Construct an adjacency graph matrix]: Using the K-nearest neighbor algorithm on the complete dataset, create an edge between $x_i$ and $x_j$ if $i$ is among the $n$ nearest neighbors of $j$ or $j$ is among the $n$ nearest neighbors of $i$.

  • 2.

    Step 2 [Weighting the edges]: There are two different
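To make the batch procedure concrete, the following is a minimal sketch of a standard Laplacian Eigenmaps pipeline: a nearest-neighbor adjacency graph, heat-kernel edge weights (one of the two weighting options referred to in Step 2), and the generalized eigenproblem $Ly = \lambda Dy$ solved for the smallest non-trivial eigenvectors. The function and parameter names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_neighbors=10, sigma=1.0, n_components=2):
    """Minimal batch Laplacian Eigenmaps sketch (illustrative only)."""
    n = X.shape[0]
    dists = cdist(X, X)                      # pairwise Euclidean distances
    W = np.zeros((n, n))
    # Step 1: adjacency graph -- connect x_i and x_j if either point is
    # among the other's nearest neighbors.
    for i in range(n):
        idx = np.argsort(dists[i])[1:n_neighbors + 1]
        W[i, idx] = 1.0
    W = np.maximum(W, W.T)                   # symmetrise the adjacency
    # Step 2: weight each edge with the heat kernel exp(-||x_i - x_j||^2 / sigma).
    W *= np.exp(-dists ** 2 / sigma)
    # Degree matrix D (column sums of W) and Laplacian L = D - W.
    D = np.diag(W.sum(axis=0))
    L = D - W
    # Solve the generalized eigenproblem L y = lambda D y and discard the
    # trivial constant eigenvector (smallest eigenvalue, approximately zero).
    _, eigvecs = eigh(L, D)
    return eigvecs[:, 1:n_components + 1]

# Example: embed a small random 3-D point cloud into 2-D.
Y = laplacian_eigenmaps(np.random.default_rng(0).standard_normal((300, 3)))
```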

Generalized eigenproblems: incremental solutions

Ref. [34] shows that one method of finding the maximum eigenvalue of the generalized eigenproblem $Aw = \lambda Bw$ is to iteratively use $\Delta w = Aw - f(w)Bw$, $w = w + \eta \Delta w$, where $\eta$ is a learning rate or step size. In (8) the first term on the right-hand side can be considered as a standard Hebbian rule term, and the second term acts to bound the length of the vector $w$. With $f(w) = w^{T}w$, (8) becomes the continuous version of Oja's algorithm, as mentioned by Zhang in [34]. A numerical sketch of this iterative rule is given after the conditions below. The function $f(w): \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$ satisfies

  • 1.

    f(w) is locally
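Below is a numerical sketch of this iterative rule, assuming $f(w) = w^{T}w$ (the continuous Oja case quoted above) and small illustrative positive semi-definite matrices; the function name, step size and iteration count are arbitrary choices, not values taken from [34].

```python
import numpy as np
from scipy.linalg import eigh

def max_generalized_eigenvector(A, B, eta=1e-3, n_iter=100_000, seed=0):
    """Iterate dw = A w - f(w) B w with f(w) = w^T w and w <- w + eta * dw
    to approximate the principal generalized eigenvector of A w = lambda B w."""
    w = np.random.default_rng(seed).standard_normal(A.shape[0])
    for _ in range(n_iter):
        dw = A @ w - (w @ w) * (B @ w)   # Hebbian term minus length-bounding term
        w = w + eta * dw
    return w, w @ w                      # ||w||^2 estimates the largest eigenvalue

# Illustrative check against a direct solver on small random SPD matrices.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
N = rng.standard_normal((5, 5))
A = M @ M.T
B = N @ N.T + 5.0 * np.eye(5)
w, lam = max_generalized_eigenvector(A, B)
print(lam, eigh(A, B, eigvals_only=True)[-1])   # the two values should roughly agree
```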

Generalized incremental Laplacian Eigenmaps

In this section, we present our purely incremental, state-of-the-art version of the Laplacian Eigenmaps manifold-based learning technique discussed in [17].

Let $L$ be the Laplacian matrix and $D$ the diagonal matrix in which each entry is the sum of the corresponding column of $W$, as explained in [17]. As shown in [34], the optimal weights for a linear projection can be found as the solution of the generalized eigenproblem $Lw = \lambda Dw$.

Therefore we can use the method of the previous section to get $\Delta w = Lw - f(w)Dw$, $w = w + \eta \Delta w$.
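A minimal sketch of this substitution ($A = L$, $B = D$ in the rule of the previous section) is shown below; the helper name and arguments are hypothetical, and the point-by-point streaming mechanics of GENILE are deliberately omitted.

```python
import numpy as np

def genile_step(w, W, eta=1e-4):
    """One update of the previous section's rule with A = L and B = D.

    W is the symmetric edge-weight matrix of the current data, D is the
    diagonal matrix of its column sums, and L = D - W; the step applies
    dw = L w - f(w) D w with f(w) = w^T w, then returns w + eta * dw.
    """
    D = np.diag(W.sum(axis=0))          # column sums, as described in [17]
    L = D - W
    return w + eta * (L @ w - (w @ w) * (D @ w))
```

Repeated application of this step drives $w$ towards a generalized eigenvector of $(L, D)$, with $\|w\|^{2}$ approximating the corresponding eigenvalue.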

Experiment on an artificial dataset

We used the Swiss roll as the artificial dataset for our initial experiment. It consists of 20,000 datapoints, each with three dimensions. Fig. 1 shows the original inputs. Since our method is purely incremental, we divide the data into four chunks and perform dimensionality reduction on each chunk separately, reusing the learned filters $w_1$ and $w_2$ from the previous chunk for the next incoming chunk. The learning rate was set to 0.00001 and the number of
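One possible reading of this chunked scheme is sketched below, under stated assumptions: scikit-learn's make_swiss_roll is used as a stand-in for the paper's data, the chunk sizes and iteration count are scaled down, a heat-kernel kNN graph is rebuilt for each chunk, and only one of the two filters is shown, warm-started from the previous chunk as described in the text. This is an illustration, not the authors' GENILE code, and no results from the paper are reproduced.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_swiss_roll   # assumed stand-in data generator

def chunk_graph(X, n_neighbors=10, sigma=1.0):
    """Heat-kernel kNN graph: Laplacian L = D - W and degree matrix D."""
    dists = cdist(X, X)
    W = np.zeros_like(dists)
    for i in range(X.shape[0]):
        idx = np.argsort(dists[i])[1:n_neighbors + 1]
        W[i, idx] = np.exp(-dists[i, idx] ** 2 / sigma)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=0))
    return D - W, D

# Illustrative sizes: 2,000 points split into 4 equal chunks (the paper uses 20,000).
X, _ = make_swiss_roll(n_samples=2000, random_state=0)
chunks = np.array_split(X, 4)

eta = 1e-5                                     # learning rate quoted in the text
w1 = np.random.default_rng(0).standard_normal(len(chunks[0]))
coords = []
for chunk in chunks:
    L, D = chunk_graph(chunk)
    for _ in range(10_000):                    # iteration count is illustrative only
        w1 = w1 + eta * (L @ w1 - (w1 @ w1) * (D @ w1))
    coords.append(w1.copy())                   # 1-D coordinates of this chunk's points
# A second filter w2 would be learned analogously to obtain a 2-D embedding.
```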

Conclusion

This paper has presented a novel online version of Laplacian Eigenmaps, termed generalized incremental Laplacian Eigenmaps (GENILE). Experimental results show that the proposed technique can be viewed as purely incremental, since it is able to consider each datapoint separately while processing the whole dataset, in contrast to traditional incremental methods proposed in the literature, which do not work separately on each instance. Results have also demonstrated a consistently


References (40)

  • S. Haykin, Neural Networks: A Comprehensive Foundation. Macmillan, New Jersey,...
  • S. Papadimitriou, J. Sun, C. Faloutsos, Streaming pattern discovery in multiple time-series, in: VLDB '05 Proceedings...
  • A. Loizou et al.

    Developing prognosis tools to identify learning difficulties in children using machine learning technologies

    Cognit. Comput.

    (2011)
  • S. Deerwester et al.

    Indexing by latent semantic analysis

    J. Am. Soc. Inf. Sci.

    (1990)
  • B. Moghaddam et al.

    Visualization and user-modeling for browsing personal photo libraries

    Int. J. Comput. Vis.

    (2004)
  • Y. Guo, X. Ding, C. Fang, H. Xue, J., Fisher's linear discriminant embedded metric learning, Neurocomputing 143 (2014)...
  • Y. Pang et al.

    Learning regularized LDA by clustering

    IEEE Trans. Neural Netw. Learn. Syst.

    (2014)
  • Y. Pang et al.

    Robust tensor analysis with l1-norm

    IEEE Trans. Circuits Syst. Video Technol.

    (2010)
  • J. Tenenbaum et al.

    A global geometric framework for nonlinear dimensionality reduction

    Science

    (2000)
  • V. de Silva, J. Tenenbaum, Global versus local methods in non-linear dimensionality reduction, in: Advances in Neural...

    Zeeshan Khawar Malik is completing his PhD from the University of Stirling. He received his MPhil from the University of the West of Scotland and MS and BS (honors) degrees from the University of The Central Punjab, Lahore Pakistan, in 2003 and 2006, respectively. By profession, he is an Assistant Professor in the University of The Punjab, Lahore, Pakistan, on leave and currently working as a Research Associate in the University of Windsor.

    Amir Hussain obtained his BEng (with the highest 1st Class Honours) and PhD in Electronic and Electrical Engineering from the University of Strathclyde in Glasgow, in 1992 and 1997 respectively. Following a post-doctoral Research Fellowship at the University of Paisley (1996–1998) and a research Lectureship at the University of Dundee (1998–2000), he joined the University of Stirling in 2000, where he is currently a Professor of Computing Science and founding Director of the Cognitive Signal-Image Processing and Control Systems Research (COSIPRA) Laboratory. He has (co)authored/edited 10 books and around 200 papers to date in leading international journals and refereed conference proceedings. Since 2000, he has generated over £1M in research income as principal investigator, including from UK research councils, EU FP6/7, international charities and industry. He is founding Editor-in-Chief of both Springer's Cognitive Computation journal and SpringerBriefs in Cognitive Computation, Associate Editor for the IEEE Transactions on Neural Networks & Learning Systems, and serves on the Editorial Board of a number of other journals. He regularly serves as invited speaker, general and program (co)chair and organizing committee member for leading international conferences. He is Chair of the IEEE UK & Republic of Ireland (RI) Industry Applications Society Chapter.

    Jonathan Wu (M'92–SM'09) received the PhD degree in electrical engineering from the University of Wales, Swansea, UK, in 1990. He was with the National Research Council of Canada for 10 years from 1995, where he became a Senior Research Officer and a Group Leader. He is currently a Professor with the Department of Electrical and Computer Engineering, University of Windsor, Windsor, ON, Canada. He has published more than 250 peer-reviewed papers in computer vision, image processing, intelligent systems, robotics, and integrated microsystems. His current research interests include 3-D computer vision, active video object tracking and extraction, interactive multimedia, sensor analysis and fusion, and visual sensor networks. Dr. Wu holds the Tier 1 Canada Research Chair in Automotive Sensors and Information Systems. He is an Associate Editor for the IEEE Transactions on Systems, Man, and Cybernetics Part A, and the International Journal of Robotics and Automation. He has served on technical program committees and international advisory committees for many prestigious conferences.
