Abstract

In recent years, many feature extraction and dimensionality reduction algorithms have been proposed for linear and nonlinear face image data, such as locality-based graph embedding algorithms and fuzzy set algorithms. However, these algorithms are not very effective for face images because they are sensitive to overlapped (outlier) and sparse points in the database. To solve these problems, a new and effective dimensionality reduction method for face recognition is proposed: sparse graph embedding with the fuzzy set for image classification. The algorithm constructs two new fuzzy Laplacian scatter matrices by using local graph embedding and the fuzzy k-nearest neighbor rule. Finally, the optimal discriminative sparse projection matrix is obtained by adding elastic net regression. Experimental results and analysis indicate that the proposed algorithm is more effective than other algorithms on the UCI wine dataset and the ORL, Yale, and AR standard face databases.

1. Introduction

Machine learning has become one of the main research tools for small-sample learning in complex networks. The ability to learn from small samples and to reuse learned knowledge helps to improve models of key-participant discovery in practice and to effectively find key participants under different objective criteria. Manifold learning algorithms with locality-preserving ability have multiplied and become an active research field in machine learning and pattern recognition [1–6]. Among them, classical algorithms such as Laplacian eigenmaps (LE) [7, 8], isometric feature mapping (ISOMAP) [9], locally linear embedding (LLE) [10, 11], and local tangent space alignment (LTSA) [12] are widely applied in machine learning and pattern recognition. The LLE algorithm builds locally optimal coordinates with the least reconstruction cost and maps them to global coordinates. However, LLE lacks generalization ability when applied to new face samples. Therefore, neighborhood preserving embedding (NPE) [13] was proposed by He et al., which maintains the optimal local neighborhood reconstruction relationships in the data. Another representative locality-preserving method, locality preserving projection (LPP) [14, 15], is an unsupervised algorithm derived from LE, in which local information is found and retained in an embedding space. Recently, in order to unify the manifold learning framework, the linear graph embedding (LGE) algorithm was proposed by Yan et al. [16].

However, in the real world, face images are always affected by variations in expression, posture, and illumination, so LGE and its extended algorithms cannot effectively solve the above problems [17–19]. Using the fuzzy k-nearest neighbor technique of fuzzy set theory [20], much research has been completed on fuzzy image filtering, fuzzy edge detection, and fuzzy image segmentation. Its ultimate goal is to handle the uncertainty in pattern recognition and machine learning [21–24].

However, the features extracted by all the above methods are difficult to interpret, even though interpretable features play an important role in high-dimensional data analysis [25–27]. Many sparse subspace learning methods have therefore aroused great research interest in recent years [28–30]. The more effective sparse feature extraction methods include the least absolute shrinkage and selection operator (LASSO) [31], least angle regression (LARS) [32], and the elastic net [33]. Under the L1-norm, these methods can shrink the coefficients of some of the features to exactly zero [24–38].

In order to solve the above problems, we study the LGE algorithm on the basis of subspace learning, elastic net regression, and fuzzy set theory, yielding sparse graph embedding with fuzzy set for image classification. In the proposed method, the fuzzy k-nearest neighbor (FKNN) rule and sparse learning are used to improve the discriminating ability of the method in a low-dimensional space. Firstly, the membership matrix is calculated by FKNN, and its values are then introduced into the within-class and between-class Laplacian scatter matrices, respectively.

The novelty of our algorithm includes the following perspectives:

(1) The algorithm combines manifold learning with fuzzy set theory to preserve the original neighborhood relations and the intrinsic geometric structure of overlapped (outlier) data points. At the same time, it uses FKNN to construct two new fuzzy graphs, which gives it stronger discriminating ability than other graph-based algorithms.

(2) The proposed algorithm has low sensitivity to sparse points in the data because of the elastic net regression, which enhances its robustness. The sparse projections learned by the proposed algorithm admit a physical interpretation in real-world applications.

The rest of this paper is organized as follows: Section 2 reviews the related linear methods. Section 3 presents the proposed algorithm and its implementation. Section 4 verifies the discriminating power and effectiveness of the proposed algorithm experimentally on four standard databases. Section 5 concludes the paper.

2. Outlines of FKNN, LLE, and the Lasso Regression Network

Firstly, we consider a sample set $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{n \times N}$ and a linear transformation mapping the original $n$-dimensional space into a $d$-dimensional feature space, where $d \ll n$. The new feature vector $y_i \in \mathbb{R}^{d}$ is obtained from the linear transformation formula

$$y_i = V^{\top} x_i, \quad i = 1, 2, \ldots, N, \qquad (1)$$

where $V \in \mathbb{R}^{n \times d}$ is the linear transformation projection matrix. The FKNN, LLE, and lasso regression network algorithms are reviewed in turn below.

2.1. Fuzzy k-Nearest Neighbor (FKNN)

In the FKNN algorithm, the membership degrees are generated from the class assignments of the fuzzy k-nearest neighbors and the corresponding degrees of class membership. The following five steps comprise the FKNN algorithm:

Step 1: compute the Euclidean distance matrix between the feature vectors of the training sample set.

Step 2: set the diagonal elements of the Euclidean distance matrix to infinity, so that no sample is counted as its own neighbor.

Step 3: sort the distance matrix in ascending order (processing each column separately).

Step 4: collect the classes of the $k$ patterns nearest to each pattern sample (because neighborhoods are concerned, this returns a list of integers).

Step 5: compute the class memberships of pattern $x_i$ with the expression proposed in [20]:

$$\mu_{ij} = \begin{cases} 0.51 + 0.49\,(n_{ij}/k), & \text{if } x_i \text{ belongs to class } j, \\ 0.49\,(n_{ij}/k), & \text{otherwise}, \end{cases} \qquad (2)$$

where $n_{ij}$ is the number of the $k$ nearest neighbors of $x_i$ (patterns) belonging to class $j$. Hence, the membership of $x_i$ in its own class always remains at or above 0.51, while its membership in any other class remains below 0.49. Generally, $\mu_{ij}$ satisfies the following important properties:

$$\sum_{j=1}^{c} \mu_{ij} = 1, \qquad 0 < \sum_{i=1}^{N} \mu_{ij} < N, \qquad \mu_{ij} \in [0, 1]. \qquad (3)$$

Therefore, the fuzzy membership matrix $U$ can be obtained by the FKNN algorithm:

$$U = [\mu_{ij}], \quad i = 1, 2, \ldots, N,\ j = 1, 2, \ldots, c, \qquad (4)$$

where $c$ is the total class number and $N$ is the total number of samples.
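To make the five steps concrete, below is a minimal NumPy sketch of the FKNN membership computation. The function name and array layout are our own illustration; the membership rule itself follows equation (2).

```python
import numpy as np

def fknn_membership(X, labels, k=5):
    """FKNN membership matrix U of shape (c, N), following equation (2).

    X: (N, n) training samples; labels: (N,) integer class labels in 0..c-1.
    """
    N = X.shape[0]
    c = labels.max() + 1
    # Steps 1-2: Euclidean distance matrix with an infinite diagonal,
    # so a sample is never counted among its own neighbors.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    U = np.zeros((c, N))
    for i in range(N):
        # Steps 3-4: the k nearest neighbors and their class labels.
        nn = np.argsort(D[i])[:k]
        for j in range(c):
            n_ij = np.count_nonzero(labels[nn] == j)
            # Step 5: Keller's membership rule (equation (2)).
            U[j, i] = 0.49 * n_ij / k + (0.51 if labels[i] == j else 0.0)
    return U
```

Each column of `U` sums to one, consistent with the first property in equation (3).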

2.2. Locally Linear Embedding

LLE is a classic manifold learning method that preserves the original manifold structure well after dimensionality reduction. The algorithm has three main steps.

The first step is to find the $k$-nearest neighbors of each data point under the Euclidean distance.

The second step is to compute the reconstruction coefficients of each point $x_i$ by minimizing the reconstruction error, which yields the reconstruction weight matrix $W$:

$$\min_{w_i}\ \Big\| x_i - \sum_{j=1}^{k} w_{ij}\, x_{ij} \Big\|^2, \qquad \text{s.t. } \sum_{j=1}^{k} w_{ij} = 1, \qquad (5)$$

where $x_{ij}$ denotes the $j$-th nearest neighbor of $x_i$.

The reconstruction error is transformed into the following equation:

$$\varepsilon_i = w_i^{\top} G_i\, w_i, \qquad (6)$$

where $G_i$, with entries $(G_i)_{jm} = (x_i - x_{ij})^{\top}(x_i - x_{im})$, is the local Gram matrix. The optimal coefficients under the constraint $\sum_{j} w_{ij} = 1$ are obtained by solving this least squares problem in closed form: $w_i = G_i^{-1}\mathbf{1} / (\mathbf{1}^{\top} G_i^{-1} \mathbf{1})$, where $\mathbf{1}$ denotes the all-ones vector.

After iterating steps 1 and 2 over all $N$ points, we obtain the reconstruction weight matrix $W$.

The third step is to use the weight matrix $W$ to reconstruct the low-dimensional embedding $Y = [y_1, \ldots, y_N]$ by minimizing the cost in equation (7):

$$\Phi(Y) = \sum_{i=1}^{N} \Big\| y_i - \sum_{j} w_{ij}\, y_j \Big\|^2. \qquad (7)$$

By minimizing this reconstruction error function, the inherent geometric features of the embedded data are maintained. Considering the linear transformation $y_i = V^{\top} x_i$ of equation (1), the objective function simplifies to

$$\min_{V}\ \operatorname{tr}\big( V^{\top} X M X^{\top} V \big), \qquad (8)$$

where $M = (I - W)^{\top} (I - W)$.
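As an illustration of steps 1 and 2, the following minimal NumPy sketch computes the LLE reconstruction weights of equations (5) and (6). The regularization term is a standard numerical safeguard we add for the case $k > n$; the embedding of equation (7) then follows from the bottom eigenvectors of $M$.

```python
import numpy as np

def lle_weights(X, k=5, reg=1e-3):
    """Reconstruction weight matrix W (N x N) of equations (5)-(6).

    X: (N, n) data matrix. Each row of W sums to one and is nonzero
    only on the k nearest neighbors of the corresponding point.
    """
    N = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(D[i])[:k]
        Z = X[nn] - X[i]                    # neighbors centered on x_i
        G = Z @ Z.T                         # local Gram matrix G_i
        G += reg * np.trace(G) * np.eye(k)  # stabilizer when G_i is singular
        w = np.linalg.solve(G, np.ones(k))  # closed-form least squares
        W[i, nn] = w / w.sum()              # enforce the sum-to-one constraint
    return W

# The embedding Y of equation (7) is given by the d eigenvectors of
# M = (I - W).T @ (I - W) with the smallest nonzero eigenvalues.
```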

2.3. Lasso Regression Network

In this study, we use the lasso algorithm to obtain sparse optimal solutions. The algorithm constructs a penalty function to obtain a biased estimate for complex linear data. In the lasso model, the sum of squared residuals is minimized subject to a bound on the sum of the absolute values of the regression coefficients, which can shrink some regression coefficients to exactly 0.

Suppose we have $N$ observations, each consisting of a predictor vector and a response:

$$x^i = (x_{i1}, x_{i2}, \ldots, x_{ip})^{\top}, \quad y_i, \quad i = 1, 2, \ldots, N, \qquad (9)$$

where $x^i$ and $y_i$ are the predictor and response variables, respectively, and $p$ is the number of elements in $x^i$.

In order to further formulate and solve this problem, it is assumed that the response variables are conditionally independent given the observations.

In general, the regression structure is as follows:

$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j\, x_{ij} + \varepsilon_i. \qquad (10)$$

The lasso estimate is

$$(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta}\ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j\, x_{ij} \Big)^2, \qquad \text{s.t. } \sum_{j=1}^{p} |\beta_j| \le t. \qquad (11)$$

Here, $t \ge 0$ is a harmonic (tuning) parameter, and the estimate $\hat{\beta}$ is obtained for every value of $t$. Let $t_0 = \sum_{j=1}^{p} |\hat{\beta}_j^{\,0}|$, where $\hat{\beta}^{\,0}$ is the ordinary least squares estimate; whenever $t < t_0$, some of the regression coefficients shrink exactly to 0. Hence, by controlling the harmonic parameter $t$, the number of nonzero regression coefficients can be reduced.
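The shrinkage behavior of equation (11) can be seen in a small scikit-learn sketch; the synthetic data and the penalty values `alpha` (the Lagrangian counterpart of the harmonic parameter $t$) are our own illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))              # 100 observations, p = 13
beta = np.zeros(13)
beta[[0, 3, 9]] = [1.5, -2.0, 0.8]          # only 3 truly active features
y = X @ beta + 0.1 * rng.normal(size=100)

# Larger alpha corresponds to a smaller bound t in equation (11)
# and drives more coefficients exactly to zero.
for alpha in (0.01, 0.1, 0.5):
    model = Lasso(alpha=alpha).fit(X, y)
    print(alpha, np.flatnonzero(model.coef_))   # indices of surviving features
```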

3. Sparse Graph Embedding with Fuzzy Set

3.1. The Basic Idea of the Proposed Algorithm

Firstly, two new fuzzy graphs are constructed with the FKNN algorithm so as to keep data points of the same class together and data points of different classes apart. Secondly, in order to avoid singularity, the algorithm optimizes the difference between the fuzzy between-class and within-class scatter matrices under the MMC [39] framework. Finally, the optimal mapping projection matrix is acquired through the lasso regression network.

3.2. Fuzzy Within-Class Scatter Matrix with Locality Preserving

Firstly, we minimize the fuzzy within-class scatter matrix while preserving the locality of each data point, reconstructing every data point from linear combinations of the other data points. The construction is divided into four steps:

(1) The $k$-nearest neighbors of $x_i$ are obtained under the Euclidean distance.

(2) The reconstruction weight matrix $W$ is obtained by minimizing

$$\sum_{i=1}^{N} \Big\| x_i - \sum_{j} w_{ij}\, x_j \Big\|^2, \qquad (12)$$

where $w_{ij} = 0$ if $x_j$ is not a neighbor of $x_i$, and each row of $W$ sums to 1: $\sum_{j} w_{ij} = 1$. We get $W$ after repeating steps 1 and 2 for all $N$ points.

(3) To incorporate the fuzzy affinity weights, we specify the new weight matrix $\tilde{W}$:

$$\tilde{W} = U' \ast W, \qquad (13)$$

where $U'$ is the $N \times N$ matrix of fuzzy membership degrees derived from the FKNN membership matrix $U$, and "∗" denotes matrix element-wise multiplication.

(4) The reconstruction error function used with the fuzzy weight matrix $\tilde{W}$ is expressed as

$$\min\ \sum_{i=1}^{N} \Big\| y_i - \sum_{j} \tilde{w}_{ij}\, y_j \Big\|^2, \qquad (14)$$

where $x_j$ is a neighbor data point of $x_i$ and $y_i$ is the low-dimensional output of $x_i$.

The objective function can then be obtained by using equation (1) as follows:

$$\min_{V}\ \operatorname{tr}\big( V^{\top} X M_w X^{\top} V \big) = \min_{V}\ \operatorname{tr}\big( V^{\top} S_w^{F} V \big), \qquad (15)$$

where matrix $M_w = (I - \tilde{W})^{\top} (I - \tilde{W})$ and $S_w^{F} = X M_w X^{\top}$ is the fuzzy within-class scatter matrix with locality preserving.
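The following NumPy sketch assembles the fuzzy within-class scatter matrix of equation (15). The text specifies the fuzzy weighting of equation (13) only abstractly, so the pairing used here (the membership of sample $x_j$ in the class of sample $x_i$) is our assumption.

```python
import numpy as np

def fuzzy_within_scatter(X, W, U, labels):
    """S_w^F = X M_w X^T with M_w = (I - W~)^T (I - W~), equation (15).

    X: (n, N) data matrix (columns are samples); W: (N, N) LLE weights;
    U: (c, N) FKNN membership matrix; labels: (N,) integer class labels.
    """
    N = X.shape[1]
    # Assumed fuzzy pairing: weight the edge (i, j) by the membership
    # of sample j in the class of sample i (equation (13)).
    Mu = U[labels]                      # (N, N): Mu[i, j] = U[labels[i], j]
    W_f = Mu * W                        # element-wise "*" of the paper
    I = np.eye(N)
    M_w = (I - W_f).T @ (I - W_f)
    return X @ M_w @ X.T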

3.3. Fuzzy Between-Class Scatter Matrix with Locality Preserving

First, we maximize the sum of squared distances between data points that have different labels. The fuzzy between-class scatter matrix with locality preserving is therefore constructed in the following three steps:

(1) The similarity matrix $B$ is constructed as follows:

$$b_{ij} = \begin{cases} 1, & x_j \in N_k^{-}(x_i), \\ 0, & \text{otherwise}, \end{cases} \qquad (16)$$

where $x_j \in N_k^{-}(x_i)$ indicates that $x_j$ is among the $k$ nearest neighbors of the sample data point $x_i$ belonging to a different class.

(2) To incorporate the fuzzy affinity weights, we specify the new weight matrix $\tilde{B}$:

$$\tilde{B} = U' \ast B, \qquad (17)$$

where $U'$ is the $N \times N$ matrix of fuzzy membership degrees derived from the FKNN membership matrix $U$.

(3) Complete the final embedding. By considering the mapping relation in equation (1), the objective function can be simplified to

$$\max_{V}\ \operatorname{tr}\big( V^{\top} X L_b X^{\top} V \big) = \max_{V}\ \operatorname{tr}\big( V^{\top} S_b^{F} V \big), \qquad (18)$$

where $L_b = D_b - \tilde{B}$, $D_b$ is the diagonal matrix with $(D_b)_{ii} = \sum_{j} \tilde{b}_{ij}$, and $S_b^{F} = X L_b X^{\top}$ is the fuzzy between-class scatter matrix with locality preserving.
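A matching sketch for the between-class side builds the similarity matrix of equation (16) and the Laplacian of equation (18); the symmetrization step is our addition so that the Laplacian is well defined.

```python
import numpy as np

def fuzzy_between_scatter(X, U, labels, k=5):
    """S_b^F = X L_b X^T, equations (16)-(18).

    X: (n, N) data matrix; U: (c, N) FKNN memberships; labels: (N,).
    """
    N = X.shape[1]
    D = np.linalg.norm(X.T[:, None, :] - X.T[None, :, :], axis=2)
    B = np.zeros((N, N))
    for i in range(N):
        other = np.flatnonzero(labels != labels[i])   # different classes only
        nn = other[np.argsort(D[i, other])[:k]]       # equation (16)
        B[i, nn] = 1.0
    B_f = U[labels] * B                               # fuzzy weights, eq. (17)
    B_f = 0.5 * (B_f + B_f.T)                         # symmetrize (our addition)
    L_b = np.diag(B_f.sum(axis=1)) - B_f              # graph Laplacian
    return X @ L_b @ X.T
```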

3.4. The Criteria of the Proposed Algorithm

Finally, the MMC algorithm framework is used to minimize the fuzzy within-class scatter matrix and maximize the fuzzy between-class scatter matrix in order to find the optimal projection matrix $V$. In fact, such an optimal projection matrix can be obtained through the following multiobjective optimization problems, namely,

$$\min_{V}\ \operatorname{tr}\big( V^{\top} S_w^{F} V \big), \qquad (19)$$

$$\max_{V}\ \operatorname{tr}\big( V^{\top} S_b^{F} V \big). \qquad (20)$$

The above formulas can be transformed into a single constrained problem:

$$\max_{V}\ \operatorname{tr}\big( V^{\top} ( S_b^{F} - \gamma S_w^{F} ) V \big), \qquad \text{s.t. } V^{\top} V = I, \qquad (21)$$

where $\gamma$ is a constant used to balance matrix $S_b^{F}$ and matrix $S_w^{F}$.

The Lagrange multiplier method can be used in equation (21):

$$L(V, \lambda) = \operatorname{tr}\big( V^{\top} ( S_b^{F} - \gamma S_w^{F} ) V \big) - \lambda \big( V^{\top} V - I \big), \qquad (22)$$

where $\lambda$ is the Lagrangian multiplier parameter. Setting the derivative with respect to $V$ to zero, the above formula can be further simplified to obtain

$$( S_b^{F} - \gamma S_w^{F} )\, v = \lambda v, \qquad (23)$$

where $v$ is a generalised eigenvector corresponding to a generalised eigenvalue $\lambda$.

The projection matrix $V$ can be obtained as follows:

$$V = \arg\max_{V^{\top} V = I}\ \operatorname{tr}\big( V^{\top} ( S_b^{F} - \gamma S_w^{F} ) V \big), \qquad \text{s.t. } \operatorname{card}(V) \le s, \qquad (24)$$

where $\operatorname{card}(V)$ indicates the quantity of nonzero elements in matrix $V$. In this way, the optimal sparse projection matrix can be obtained from equation (24).

Finally, we need the following eigen-analysis theorem, after which the learning of the sparse projection functions is transformed into a regression framework.

Theorem 1. Let $y$ be the eigenvector of the following eigenproblem with eigenvalue $\lambda$:

$$( L_b - \gamma M_w )\, X^{\top} X\, y = \lambda y. \qquad (25)$$

If $v = X y$, then $v$ is an eigenvector of the eigenproblem in equation (23) with the same eigenvalue $\lambda$.

Proof. Since $v = X y$ and $S_b^{F} - \gamma S_w^{F} = X ( L_b - \gamma M_w ) X^{\top}$, it is easy to prove the following formula according to equation (23):

$$( S_b^{F} - \gamma S_w^{F} )\, v = X ( L_b - \gamma M_w ) X^{\top} X y = \lambda X y = \lambda v. \qquad (26)$$

Therefore, $v$ is an eigenvector of equation (23) with the same eigenvalue.
We can solve for each such $v$ with ridge regression:

$$\hat{v} = \arg\min_{v}\ \sum_{i=1}^{N} \big( v^{\top} x_i - y_i \big)^2 + \alpha \| v \|^2, \qquad (27)$$

where $\alpha > 0$, $y_i$ is the $i$-th element of $y$, and $x_i$ is the $i$-th column of $X$.

However, the ridge penalty of equation (27) does not supply a sparse matrix $V$. Following the lasso regression algorithm, we add an L1-norm penalty on $v$ to get

$$\hat{v} = \arg\min_{v}\ \sum_{i=1}^{N} \big( v^{\top} x_i - y_i \big)^2 + \lambda_1 \| v \|_1. \qquad (28)$$

So, we integrate the lasso regression and ridge regression [40]:

$$\hat{v} = \arg\min_{v}\ \sum_{i=1}^{N} \big( v^{\top} x_i - y_i \big)^2 + \alpha \| v \|^2 + \lambda_1 \| v \|_1. \qquad (29)$$

Solving equation (29) for each eigenvector $y$, we get the optimal sparse transformation matrix $V = [\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_d]$.
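Putting Theorem 1 and equation (29) together, the sketch below obtains sparse projection vectors by first solving the dense eigenproblem of equation (23) and then regressing each dense eigenvector back onto the data with scikit-learn's ElasticNet; the parameter values are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def sparse_projection_matrix(X, S_b, S_w, d, gamma=1.0, alpha=0.01, l1_ratio=0.5):
    """Sparse projection matrix V (n x d) via equations (23) and (29).

    X: (n, N) data; S_b, S_w: fuzzy scatter matrices; d: target dimension.
    """
    # Dense solution of equation (23): leading eigenvectors of S_b - gamma*S_w.
    evals, evecs = np.linalg.eigh(S_b - gamma * S_w)
    dense = evecs[:, np.argsort(evals)[::-1][:d]]     # (n, d)
    V = np.zeros((X.shape[0], d))
    for j in range(d):
        y = X.T @ dense[:, j]        # responses y_i = v_j^T x_i (Theorem 1)
        # Elastic net of equation (29): ridge for stability, L1 for sparsity.
        enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X.T, y)
        V[:, j] = enet.coef_
    return V
```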

4. Experimental Results and Analysis

In order to test and verify the effectiveness of the proposed algorithm, we compared it with PCA [41], LDA [42], MMC [39], LPP [14, 15], LLE [10, 11], SDE [38], and FLGE/MMC [4] on the UCI wine dataset and the ORL, Yale, and AR standard face databases, respectively. In our experiments, the parameter values of our algorithm are given in Table 1. Each experiment is run independently ten times for every size of the training set. We used the Euclidean distance and the nearest neighbor classifier for measurement and classification, respectively. The experiments on all four databases were conducted in the same environment (CPU: P4 4 GHz; RAM: 8 GB).

4.1. The Experiments and Interpretations in the UCI Wine Database

Now, we use the wine dataset from the UCI machine learning repository to further validate and interpret the proposed method. There are 3 classes, 13 features, and 178 instances in the wine dataset. We select 48 samples from each class; the basic statistics (variance, mean, and range) of the 13 features used in the experiment are shown in Table 2.

Both the proposed method and USSL learn sparse projections suitable for classification, and the projections have precise physical explanations, which provide a deeper understanding of the database. That is to say, the second feature (malic acid), the fourth (alcalinity of ash), the fifth (magnesium), the seventh (flavanoids), the tenth (color intensity), and the thirteenth (proline) are the most significant distinguishing features. Moreover, the proposed method consistently outperforms USSL at lower dimensions.

4.2. The Experiments in the ORL Database

There are 40 classes in the ORL face database [43]; each class contains 10 different face images, for a total of 400 face images, with variations in facial expression, posture, and illumination. The images allow the face to be tilted and rotated by up to 20 degrees, and there are scale changes of up to about 10%. Figure 1 shows the ten images of one class in the ORL face database. All images were normalized to 56 × 46 resolution.

In this experiment, the performance of PCA, MMC, LDA, LLE, LPP, SDE, FLGE/MMC, and the proposed method is tested. First of all, $l$ images ($l$ ranging from 2 to 6) from each class of the database are randomly selected as the training sample set. Secondly, the remaining face images of each class are used as the testing sample set. Finally, each configuration is run independently 50 times.

In the PCA phase of the seven algorithms (MMC, LDA, LPP, LLE, SDE, FLGE/MMC, and the proposed algorithm), 95% of the face image energy is retained for computational efficiency. Table 3 displays the standard deviations, the best average recognition rates (%), and the corresponding feature dimensions of the various algorithms on the ORL face database.

Figure 2 shows the curves obtained when four face images from each class are randomly selected as the training set in the ORL face database; the curves give the average recognition rate (%) of each algorithm at different feature dimensions.

4.3. The Experiments in the Yale Database

There are 15 classes in the Yale face database [44]; each class contains 11 different face images, for a total of 165 face images, with variations including normal, surprised, happy, sad, sleepy, and winking expressions. Figure 3 shows the 11 images of one class in the Yale face database.

In our experiment, all images in the Yale database were normalized to 50 × 40 resolution, and, as for the ORL face database, 95% of the face image energy is retained. First of all, $l$ images ($l$ ranging from 2 to 6) from each class of the database are randomly selected as the training sample set. Secondly, the remaining face images of each class are used as the testing sample set. Finally, each configuration is run independently 50 times. The curves in Figure 4 show the average recognition rate (%) of the various algorithms at different feature dimensions when six face images of each class are randomly selected as the training set in the Yale face database. Table 4 shows the standard deviations, the best average recognition rates (%), and the corresponding feature dimensions of the various algorithms on the Yale face database.

4.4. The Experiments in the AR Database

The AR face database [45] contains 126 people: 70 men and 56 women. The database consists of more than 4,000 color face images, covering occluded frontal views, varying lighting conditions, and different facial expressions. In our experiments, 120 subjects are used, including 65 men and 55 women. Their images are divided into two parts (taken two weeks apart), each part containing 20 color face images. The face portion of each image is normalized to 50 × 40 pixels. Figure 5 displays example images of one class in the AR face database.

As for the ORL and Yale face databases, 95% of the face image energy is retained. First of all, $l$ images ($l$ ranging from 2 to 6) from each class of the database are randomly selected as the training sample set. Secondly, the remaining face images of each class are used as the testing sample set. Finally, each configuration is run independently 10 times. Table 5 shows the standard deviations, the best average recognition rates (%), and the corresponding feature dimensions of the various algorithms on the AR face database.

Figure 6 shows the curves obtained when five face images from each class are randomly selected as the training set in the AR face database; the curves give the average recognition rate (%) of each algorithm at different feature dimensions.

4.5. Overall Observations and Discussion

After analyzing the experimental results on the above four databases, the following conclusions can be drawn:

(1) As can be seen from Table 2, the six most important characteristics (the second, fourth, fifth, seventh, tenth, and thirteenth) reflect the data distribution. The last characteristic (proline) plays a decisive role in the data distribution because of its wide range and large variance.

(2) It can be seen from Tables 3–5 that, compared with PCA, MMC, LDA, LLE, LPP, SDE, and FLGE/MMC, the proposed algorithm obtains a higher average recognition rate, which shows that the proposed algorithm handles overlapped (outlier) and sparse points well. The results also show that the proposed algorithm is more discriminative than the other algorithms because the fuzzy neighborhood membership degrees pull near neighbors of the same class closer together while pushing far neighbors of different classes farther apart.

(3) As shown in Figures 2, 4, and 6, under the same experimental conditions, the average recognition rate (%) of the proposed algorithm is better than that of the other algorithms. This further illustrates that the algorithm obtains optimal sparse discriminant projections, because elastic net regression yields sparse feature extraction results.

(4) Unlike PCA, MMC, and LDA, which attempt to preserve the global Euclidean structure, LPP, LLE, SDE, FLGE/MMC, and the proposed algorithm aim to discover the local geometric structure. Moreover, the sparse subspace learning methods are superior to the compared methods. The sparse projections learned by the proposed algorithm provide physical interpretation in real-world applications; therefore, the discriminant knowledge is effectively discovered.

5. Conclusions

This paper proposes a new supervised learning algorithm for feature extraction and face recognition, namely, sparse graph embedding with fuzzy set for image classification. The experimental results show that the proposed algorithm is more effective than the other techniques. For future work, we will study how to choose the optimal parameters, such as the balance parameter $\gamma$ and the regularization parameters $\alpha$ and $\lambda_1$, and perform more tests on other types of face data. Specifically, for face recognition, the sparse face subspace offers an intuitive, semantic, and insightful understanding of the extracted features. We will also extend the proposed algorithm to kernel and tensor forms via kernel and tensor methods, respectively.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the National Science Foundation of China (Grant nos. 61876213, 6177227, 61861033, 61976118, U1831127, and 71972102), the Key R&D Program Science Foundation in Colleges and Universities of Jiangsu Province (Grant nos. 18KJA520005, 19KJA360001, and 20KJA520002), the Natural Science Fund of Jiangsu Province (Grant nos. BK20201397 and BK20171494), the National Key R&D Program (Grant nos. 2017YFC0804002 and 2019YFB1404602), the Natural Science Fund of Jiangxi Province (Grant no. 20202ACBL202007), the Natural Science Foundation of Guangdong Province (Grant no. 2016A030307050), and the Special Foundation of Public Research of Guangdong Province (Grant nos. 2016A020225008 and 2017A040405062).