Abstract

Photoplethysmography (PPG) biometric recognition has recently received considerable attention and is considered a promising biometric trait. Although some promising results on PPG biometric recognition have been reported, challenges remain in noise sensitivity and robustness. To address these issues, this article presents a PPG biometric recognition framework, namely, a PPG biometric recognition model based on a sparse softmax vector and k-nearest neighbor. First, raw PPG data are rerepresented by sliding window scanning. Second, three layers of features are extracted, and the features of each layer are represented by a sparse softmax vector. In the first layer, features are extracted from the PPG data as a whole. In the second layer, the PPG data are divided into four subregions, a subfeature is extracted from each subregion, and the four subfeatures are averaged to form the second-layer features. In the third layer, the PPG data are divided into 16 subregions, a subfeature is extracted from each subregion, and the 16 subfeatures are averaged to form the third-layer features. Finally, the first-, second-, and third-layer features are concatenated into the three-layer features. Extensive experiments were conducted on three PPG datasets, and the proposed method achieves recognition rates of 99.95%, 97.21%, and 99.92% on the respective sets. The results demonstrate that the proposed method outperforms current state-of-the-art methods in terms of accuracy.

1. Introduction

As a biological signal, photoplethysmography (PPG) is difficult to steal or replicate, which gives it the advantages of inherent antispoofing and liveness detection. Moreover, it can be conveniently recorded with just a combination of light-emitting diodes and photodiodes (PDs) on almost any part of the body and is thus very cost-effective compared with other biometric traits. Gu et al. [1] were the first group to investigate PPG for user authentication, considering four feature parameters and achieving 94% accuracy. Since then, PPG biometric recognition has attracted increasing research interest and is regarded as one of the most promising biometric techniques. Many fiducial-point and nonfiducial-point approaches have been proposed for PPG biometric recognition.

In fiducial-point-based approaches, features are extracted from systolic peaks, diastolic peaks, dicrotic notches, interpulse intervals, amplitudes of peaks, etc. Given the variability of the PPG shape across physiological states, fiducial detection on raw PPG signals may fail or be incorrect. Therefore, many researchers [2–5] extended the paradigm to first derivatives (FDs) or second derivatives (SDs) of raw PPG signals and used similar points on FDs and SDs as features for recognition. For example, Kavsaoğlu et al. [2] extracted a 40-dimensional feature from raw PPG and its derivatives for 30 healthy subjects and used k-nearest neighbor (k-NN) for classification. Chakraborty and Pal [3] extracted 12-dimensional features from filtered PPG and its derivatives and used linear discriminant analysis (LDA) for classification, achieving 100% accuracy for 15 subjects. Kavsaoğlu et al. [4] extracted 20-dimensional features from a PPG signal and its second derivative, achieving a 95% recognition rate under 10-fold cross-validation. Al-Sidani et al. [5] extracted a 40-dimensional feature from filtered PPG and its derivatives and then applied the k-NN classifier, achieving a 100% recognition rate for 23 subjects. Jindal et al. [6] presented a novel two-stage technique for PPG biometric identification that clusters individual PPG sources into different groups and uses deep belief networks as classification models; the approach was tested on the TROIKA dataset and achieved an accuracy of 96.1%. Nadzri et al. [7] applied a low-pass filter to remove unwanted noise from the PPG signal and then extracted discriminant features, including systolic peaks, diastolic peaks, and dicrotic notches, from the filtered PPG signals.
Later, a Bayes network (BN), naïve Bayes (NB), radial basis function (RBF), and multilayer perceptron (MLP) were used to classify the 21 subjects using these discriminant features. Sancho et al. [8] studied several feature extractors (e.g., cycle averaging, time-domain multicycle analysis, and the Karhunen-Loève transform average) and matching metrics (Manhattan and Euclidean distances) on four different PPG databases (CapnoBase, MIMIC-II, Berry, and Nonin), achieving an optimal equal error rate (EER) of 1.0% on CapnoBase.

Many of the previous methods focused on fiducial approaches, but fiducial-point detection is error prone under some conditions. Therefore, many researchers [9–13] have investigated nonfiducial-point-based approaches. Nonfiducial approaches are more holistic: features are extracted statistically based on the overall signal morphology [10]. Spachos et al. [9] proposed a PPG biometric recognition method based on LDA/k-NN, achieving an EER of 0.5% for 14 subjects of the OpenSignal dataset. Karimian et al. [10] used a nonfiducial approach for PPG with a discrete wavelet transform (DWT) and k-NN and reported a 99.84% accuracy rate with an EER of 1.31% on the CapnoBase dataset. Yadav et al. [11] proposed a method for PPG authentication based on a continuous wavelet transform (CWT) and direct linear discriminant analysis (DLDA) and reported a 0.46% EER on the CapnoBase dataset. Farago et al. [12] presented a correlation-based nonfiducial feature-extraction technique and achieved a 98% accuracy rate for 30 subjects by correlating an individual's peak-to-peak interval with a reference PPG peak-to-peak interval. Lee et al. [13] used a discrete cosine transform (DCT) to extract features from preprocessed PPG data and fed the extracted features to machine-learning techniques, e.g., decision tree, k-NN, and random forest (RF); the accuracies of these algorithms were 93%, 98%, and 99%, respectively.

Recently, deep-learning methodologies have been applied to PPG biometric recognition. Everson et al. [14] formulated a completely personalized data-driven approach using a four-layer deep neural network with two convolutional neural network (CNN) layers followed by two long short-term memory (LSTM) layers and a dense output layer, modeling the temporal sequence inherent in the pulsatile signal representative of cardiac activity. The proposed network was evaluated on the TROIKA dataset, collected from 12 subjects engaged in physical activities, and achieved an average accuracy of 96%. Biswas et al. [15] presented a novel deep-learning framework (CorNET) to efficiently estimate heart rate (HR) and perform biometric identification using only wrist-worn, single-channel PPG signals collected in an ambulant environment; an average accuracy of 96% was achieved for 20 subjects on the TROIKA dataset. Hwang and Hatzinakos [16] proposed a novel deep-learning-based verification model using PPG signals, building a personalized data-driven network that employs a CNN with LSTM to model the time-series sequence inherent in the PPG signal.

From the current studies on PPG biometric recognition, the nonfiducial-based approach outperforms the fiducial-based approach under the same conditions, and deep-learning methods have only recently been introduced. However, there are several shortcomings in these studies: (1) most PPG biometric methods are sensitive to noise and have poor robustness, and (2) deep learning has good recognition performance, but it is hard to train on small-scale data, even with data augmentation. In addition, deep learning requires powerful computational resources and has many hyperparameters that need intricate adjustment. Inspired by [17–21], a PPG biometric recognition model based on a sparse softmax vector (SSV) and k-NN is proposed herein, and the main contributions of our work are the following.
(1) Raw PPG data are rerepresented by sliding window scanning, which maximizes the total amount of data and reduces the impact of having too little data.
(2) A three-layer feature-extraction method based on a sparse softmax vector is presented. Most sparse representation methods operate on all the raw data and ignore local data; however, local data tend to show more detailed and discriminative information than the whole. Hence, the whole data and their subregions are combined to obtain softmax vectors over three layers.
(3) To verify the recognition performance of k-NN, a variety of classifiers are also used in the experiments, such as RF, a linear discriminant classifier (LDC), and NB.

The rest of this article is organized as follows. In Section 2, the proposed method is explained. In Section 3, the experimental process is detailed and the results obtained are presented. Finally, in Section 4, conclusions and directions for future work are presented.

2. Proposed Method

To perform PPG biometric recognition, a PPG biometric recognition framework was designed. First (mentioned in Section 2.1), raw PPG data are rerepresented by sliding window scanning. Second (mentioned in Section 2.2), the final discriminative features are generated from the rerepresented PPG data. Finally (mentioned in Section 2.3), the classification procedure is performed for the final features. The flow diagram of PPG biometric recognition using the proposed framework is shown in Figure 1. In the following subsections, preprocessing of raw PPG data, base feature extraction, and three-layer feature extraction and classification will be detailed.

2.1. Preprocessing Raw PPG Data

Preprocessing in our approach means that the raw PPG data are rerepresented by sliding window scanning; the input raw PPG data of a subject form an N-dimensional sequence. Given a sliding window of dimension M, one M-dimensional vector is generated per one-sample step over the raw PPG data, so N − M + 1 vectors are produced after N − M + 1 steps. Finally, these vectors are concatenated into an (M, L) matrix, denoted by Aml; that is to say, the raw PPG data of K subjects are rerepresented as Aml. L can be computed as

L = ∑_{k=1}^{K} N_k,

where N_k is the number of window vectors produced for the subjects of class k. The flow diagram of raw PPG data rerepresented by sliding window scanning is shown in Figure 2.
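The sliding-window scan described above can be sketched as follows for a single record; this is an illustrative NumPy implementation (the function name and layout are our own, not the authors' code):

```python
import numpy as np

def sliding_window_rerepresent(signal, M):
    """Re-represent a 1-D PPG record by sliding an M-sample window one
    step at a time: an N-sample record yields N - M + 1 column vectors,
    stacked into an (M, N - M + 1) matrix."""
    signal = np.asarray(signal, dtype=float)
    N = signal.shape[0]
    cols = [signal[i:i + M] for i in range(N - M + 1)]
    return np.stack(cols, axis=1)
```

Applying the same scan to each subject's record and concatenating the resulting matrices column-wise would yield the full (M, L) matrix Aml.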

2.2. Feature Extraction
2.2.1. Basic Acquisition of New Feature

In this section, we use the alternating direction method of multipliers (ADMM) [19, 22] to solve the sparse representation problem. The sparse representation method is more robust than others for biometric recognition [17, 23], so it is used in the getting new feature (GNF) process to obtain softmax vectors.

Suppose that there are K classes of subjects, y represents a testing sample, and X = [X_1, X_2, … , X_K] represents the training samples. In terms of a sparse-representation-based classifier (SRC) [17, 24] and dictionary learning [19], the representation model can be transformed into the following minimization problem:

α̂ = argmin_α ‖y − Xα‖_2^2 + λ‖α‖_1,

where λ is a scalar constant, and ‖·‖_1 and ‖·‖_2 represent the L1-norm and L2-norm, respectively. An SRC codes a testing sample as a sparse linear combination of all training samples and assigns it to the class with the minimum representation error. Therefore, we propose to use the representation error vector over all classes to represent a sample. After solving for the representation coefficients α̂, the representation error of each class can be computed as

e_k = ‖y − X_k α̂_k‖_2,

where X_k is the sample set of class k and α̂_k is the coefficient vector associated with class k. Then, the softmax vector s = (s_1, s_2, … , s_K) can be computed by a softmax function as

s_k = exp(−e_k) / ∑_{j=1}^{K} exp(−e_j).

If the testing sample y belongs to the ith class (i ≤ K), s_i should be larger than the other atoms in the softmax vector s, which is called class discrimination. The above process of obtaining the softmax vector is named getting a new feature on dictionary X (GNFX). For convenience, the entire procedure of computing the softmax vector for a given testing sample y is defined as

s = GNFX(y),

and the solution process for the sparse representation coefficients can be found in [20, 21]. The detailed procedure for extracting the sparse softmax vector of a test sample is summarized in Algorithm 1.
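The residual-to-softmax step can be sketched as follows, assuming the per-class coefficient blocks have already been solved for; the function name and calling convention are illustrative, not the authors' code:

```python
import numpy as np

def sparse_softmax_vector(y, class_dicts, class_coeffs):
    """Turn class-wise representation errors e_k = ||y - X_k a_k||_2 into
    a softmax vector: a smaller residual gives a larger softmax entry."""
    e = np.array([np.linalg.norm(y - Xk @ ak)
                  for Xk, ak in zip(class_dicts, class_coeffs)])
    s = np.exp(-e)           # negate so small error -> large score
    return s / s.sum()       # normalize to a probability-like vector
```

The negative sign before the residual is what makes the correct class (smallest error) receive the largest softmax entry.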

Input: K-class subjects, training samples X and a testing sample y normalized with the L2-norm, parameters λ, ρ, identity matrix I
Output: softmax vector s of K classes
(1) initialize α^0 = z^0 = u^0 = 0
(2) repeat
(3)  update α^{t+1} = (X^T X + ρI)^{−1}(X^T y + ρ(z^t − u^t))
(4)  update z^{t+1} = S_{λ/ρ}(α^{t+1} + u^t), where S is the soft-thresholding operator
(5)  update u^{t+1} = u^t + α^{t+1} − z^{t+1}
(6) until convergence
(7) let α̂ = z
(8) for each k in {1, 2, 3, … , K} do
(9)  let e_k = ‖y − X_k α̂_k‖_2
(10)  let s_k = exp(−e_k)
(11) end for
(12) let s = s / ∑_{k=1}^{K} s_k
(13) output s
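The ADMM iteration in Algorithm 1 can be sketched with a generic ADMM lasso solver in the style of Boyd et al.; the formulation here minimizes 0.5·‖y − Xα‖₂² + λ‖α‖₁, and the parameter values (λ, ρ, iteration count) are illustrative defaults, not the authors' settings:

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise soft-thresholding operator S_t(v)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(X, y, lam=0.01, rho=1.0, n_iter=500):
    """ADMM sketch for min_a 0.5*||y - X a||_2^2 + lam*||a||_1:
    a ridge-type x-update, a soft-thresholding z-update, and dual ascent on u."""
    n = X.shape[1]
    A = X.T @ X + rho * np.eye(n)   # system matrix for the x-update solve
    Xty = X.T @ y
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    for _ in range(n_iter):
        x = np.linalg.solve(A, Xty + rho * (z - u))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
    return z
```

For a fixed iteration budget the cost is dominated by the linear solve, which in practice is factored once (e.g., by Cholesky) and reused across iterations.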
2.2.2. Three-Layer Feature Extraction

Inspired by the three-layer spatial pyramid model in [20, 21], a three-layer feature-extraction method is proposed herein. First, Aml is divided into X and Y, where X is the training sample matrix and Y a testing sample matrix; the class number is K. Second, the final features are generated by concatenating three layers. For the first layer, each sample of X and Y is taken as a whole to call GNFX, generating a K-dimensional sparse softmax vector. For the second layer, each sample of X and Y is divided into four subregions, GNF is called on each subregion with its corresponding subdictionary, the four resulting vectors are averaged, and a K-dimensional sparse softmax vector is generated. For the third layer, each sample of X and Y is divided into 16 subregions, and a K-dimensional sparse softmax vector is generated similarly. Finally, the sparse softmax vectors generated from X and Y are integrated into a training template and a new testing sample matrix, respectively. Figure 3 shows the process by which a sample y is divided into four subregions and a sparse softmax vector is generated. Figure 4 shows the process by which the sparse softmax matrix is generated by three-layer feature extraction for y.

2.2.3. Algorithm of Three-Layer Feature Extraction

The computing process of a three-layer sparse softmax vector is shown in detail in Algorithm 2.

Input: training samples X, a testing sample y, and K-class subjects
Output: three-layer sparse softmax vector v
(1) let s^(1) = GNFX(y)
(2) divide X into 4 subregions: X1, X2, X3, X4
(3) divide y into 4 subregions: y1, y2, y3, y4
(4) for each i in {1, 2, 3, 4} do
(5)  let s_i^(2) = GNFX_i(y_i)
(6) end for
(7) let s^(2) = (1/4) ∑_{i=1}^{4} s_i^(2)
(8) divide X into 16 subregions: X1, X2, … , X16
(9) divide y into 16 subregions: y1, y2, … , y16
(10) for each j in {1, 2, 3, … , 16} do
(11)  let s_j^(3) = GNFX_j(y_j)
(12) end for
(13) let s^(3) = (1/16) ∑_{j=1}^{16} s_j^(3)
(14) output v = [s^(1); s^(2); s^(3)]
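The layering in Algorithm 2 can be sketched as follows, with the GNF call abstracted into a caller-supplied function `gnf` (a stand-in for GNF on the corresponding subregion dictionary); the name and the equal-width splitting are our assumptions:

```python
import numpy as np

def three_layer_feature(y, gnf):
    """Concatenate layer 1 (whole sample), layer 2 (mean of 4 subregion
    softmax vectors), and layer 3 (mean of 16 subregion vectors).
    `gnf` maps a (sub)segment to a K-dimensional softmax vector."""
    s1 = gnf(y)
    s2 = np.mean([gnf(part) for part in np.array_split(y, 4)], axis=0)
    s3 = np.mean([gnf(part) for part in np.array_split(y, 16)], axis=0)
    return np.concatenate([s1, s2, s3])
```

The result is a 3K-dimensional feature: one K-dimensional softmax vector per layer, stacked end to end.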
2.3. Classification

Machine-learning algorithms in biometric recognition address two different problems: biometric identification is a multiclass classification problem, whereas biometric verification is a one-class classification problem. Although some algorithms are specific to each problem, some multiclass classifiers can be adapted to solve one-class classification problems. In general, matching uses similarity measures and machine-learning techniques to produce a decision [14], and in this study, machine-learning technology is used to make decisions. Here, the training template and test samples generated by the proposed feature-extraction method are input to a machine-learning classifier, which outputs the recognition result. The classifiers used include k-NN [4, 10, 25], NB [7, 26], RF [13, 27], and LDC [3].
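As a minimal sketch of a k-NN matcher of the kind used here (plain Euclidean distance and majority vote; not tied to any particular toolbox implementation):

```python
import numpy as np

def knn_predict(train_feats, train_labels, test_feats, k=3):
    """Plain k-NN on feature vectors: Euclidean distance to every training
    template, then a majority vote among the k nearest labels."""
    train_feats = np.asarray(train_feats)
    train_labels = np.asarray(train_labels)
    preds = []
    for x in np.asarray(test_feats):
        d = np.linalg.norm(train_feats - x, axis=1)   # distance to each template
        nearest = np.argsort(d)[:k]                   # indices of k nearest
        preds.append(int(np.bincount(train_labels[nearest]).argmax()))
    return np.array(preds)
```

In this framework the inputs would be the three-layer sparse softmax vectors of the training templates and test samples.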

3. Experiments and Results

3.1. PPG Datasets

Three public datasets are introduced here: Beth Israel Deaconess Medical Center (BIDMC), Multiparameter Intelligent Monitoring for Intensive Care (MIMIC), and CapnoBase. Table 1 summarizes the characteristics of these datasets, with a subsection devoted to a specific description of each dataset.

3.1.1. BIDMC Dataset

The BIDMC dataset [28, 29] is a dataset of electrocardiogram (ECG), pulse oximetry (PPG), and impedance pneumography respiratory signals acquired from intensive care unit (ICU) patients. The dataset comprises 53 recordings, each 8 minutes long, of ECG, PPG, and impedance pneumography signals (sampling frequency fs = 125 Hz) acquired from adult patients (aged 19 to 90+, 32 female). The patients in the dataset were randomly selected from a larger cohort admitted to medical and surgical intensive care units at the BIDMC, Boston, MA, USA.

3.1.2. MIMIC Database

The MIMIC database [30, 31] collects recordings of PLETH, ABP, RESP, etc. from ICU patients and is freely published on PhysioBank ATM. These recordings come in multiple data formats. Partial recordings of 32 patients were downloaded for this work, namely, Nos. 039, 055, 210, 211, 212, 216, 218, 219, 221, 224, 225, 230, 240, 253, 276, 281, 284, 403, 408, 410, 411, 437, 439, 444, 449, 451, 466, 471, 472, 474, 476, and 484. The recordings of each patient include .hea, .mat, .info, and plotATM.m files: .mat holds the matrix of raw signal values; .info holds the signal names and other information about the .mat file; .hea is needed to read the .mat files using applications in the WFDB Software Package or functions in the WFDB Toolbox for MATLAB (MathWorks, USA); and plotATM.m is a function that reads the .mat and .info files and plots the converted data. PLETH is the PPG signal needed in this work, and its sampling frequency is 125 Hz.

3.1.3. CapnoBase Dataset

The CapnoBase dataset [32, 33] contains raw PPG signals for 42 cases of 8-minute duration, pulse peak and artifact labels validated by an expert rater, reference CO2 signal and derived instantaneous respiratory rate for all cases, reference electrocardiogram (ECG) signal with R peak and artifact labels validated by an expert rater, and reference instantaneous heart rate derived from ECG and PPG pulse peak labels. References [4, 5] also used the dataset for PPG biometric recognition and achieved good results.

3.2. Experiments

To evaluate the performance of PPG biometric recognition, extensive experiments were performed in the MATLAB R2018b programming environment. To avoid special cases, all experiments were run multiple times, and the average recognition rates are reported. In all experiments, the data of each group are randomly taken from a continuous sequence of raw PPG data, 80% of which are used for training and 20% for testing.
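The random 80/20 split can be sketched as follows, assuming one feature vector per column of the segment matrix (the seed and helper name are illustrative):

```python
import numpy as np

def split_80_20(samples, seed=0):
    """Randomly split the columns of a segment matrix (one vector per
    column) into 80% training and 20% testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(samples.shape[1])   # shuffle column indices
    cut = int(round(0.8 * samples.shape[1]))
    return samples[:, idx[:cut]], samples[:, idx[cut:]]
```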

3.2.1. Evaluation Metrics

To evaluate the performance of the proposed method, experiments were conducted using the following metric. For the identification problem, the recognition rate is used as the evaluation criterion, which is the percentage of correctly recognized testing samples, defined as

RR = (N_c / N_t) × 100%,

where N_t is the total number of probe samples and N_c is the number of probe samples that are correctly identified.
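The recognition-rate metric can be expressed directly in code (variable names are illustrative):

```python
def recognition_rate(predicted, truth):
    """Recognition rate: correctly identified probes / total probes * 100%."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return 100.0 * correct / len(truth)
```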

3.2.2. Performance of Proposed Method

Experiments were conducted on the datasets, that is, BIDMC, MIMIC, and CapnoBase, to examine the performance of the proposed approach. All subjects for each dataset participated in the experiments. For a single subject in each dataset, PPG data points of a cycle were randomly selected to participate in the experiments, 80% of which were used for training and 20% for testing. Average recognition rates were summarized based on extracting one-, two-, and three-layer features and classifiers including k-NN, RF, LDC, and NB.

The experimental results of one-layer feature extraction are summarized in Table 2, which shows the average recognition rates of each single layer (first, second, and third) based on k-NN, RF, LDC, and NB on the three datasets. The experimental results of two-layer feature extraction are summarized in Table 3, which shows the average recognition rates of each pairwise combination of the three layers based on k-NN, RF, LDC, and NB on the three datasets. The experimental results of three-layer feature extraction are summarized in Table 4, which shows the recognition rates of the full three-layer features based on k-NN, RF, LDC, and NB on the three datasets; the same experiment was conducted five times. Table 5 compares the recognition rates of one-, two-, and three-layer feature extraction based on k-NN, RF, LDC, and NB on the three datasets.

As can be seen from Tables 2 and 3, although the recognition rates based on one or two layers are relatively good, they are not very stable. According to Table 4, the recognition rate with the extracted three-layer features is relatively stable. As shown in Table 5, the performance of three-layer feature extraction is superior to that of one or two layers for all classifiers, demonstrating that the representation generated by three-layer features is highly discriminative. Moreover, our three-layer features achieve satisfactory recognition performance with all these classifiers on the three datasets. In addition, the recognition rate of three-layer features is higher than that of two-layer features, which in turn is higher than that of one-layer features, on all three datasets. For example, the proposed method achieves recognition rates of 99.73%, 99.78%, and 99.92% for one, two, and three layers based on k-NN on the CapnoBase dataset, respectively. In short, all these results demonstrate the effectiveness of the three-layer feature-extraction method. As can also be seen from Table 5, the recognition rate of three-layer features based on k-NN is clearly better than that based on RF, LDC, and NB on the three datasets, indicating that k-NN has advantages for small-scale data. In addition, small-scale data were used in the literature [14–16], where the corresponding recognition rate was only 96%, showing that the performance of deep learning drops noticeably on small-scale data.

Furthermore, additional experiments were conducted to analyze the influence of data length on feature extraction. First, 20 subjects were randomly selected from each of the three datasets. Second, continuous data of 0.5, 1, 1.5, and 2 cycles were blindly selected for each subject, 80% of which were used for training and 20% for testing, and three-layer features were extracted from the continuous data. Figure 5 shows the influence of data length on the proposed method: the best recognition rate is achieved when the data are 1.5 or 2 cycles long. With data of 0.5-cycle length, the recognition rate fluctuates greatly, but with 1-cycle data the recognition rate clearly improves. When the data length is made even longer, the recognition rate can still improve, but the gain is not obvious and the time cost grows considerably.

3.2.3. Comparisons with the State-of-the-Art Methods

Among current research on PPG biometric recognition, several methods use their own datasets [1, 2, 4, 5, 7, 12], while others use, e.g., OpenSignal [9], Biosec [9, 16], and TROIKA [6, 14, 15]; the BIDMC and MIMIC datasets are not used in this literature. Therefore, comparisons between our method and other state-of-the-art methods on the CapnoBase dataset are summarized in Table 6. Compared with the other methods, ours need not extract feature points from raw PPG data; the data only need to be rerepresented by a sliding window, so our method has the advantages of simple operation and low time complexity. Three-layer features based on the softmax vector are extracted from blindly taken continuous data of more than one cycle length, combining macro- and microdata. The experiments show that our method achieves very high accuracy and strong robustness, and that the microdata tend to show more detailed and discriminative information than the macrodata.

3.2.4. Time-Cost Analysis

To verify the efficiency of the proposed framework, the preprocessing time (in s), feature-extraction time, and matching time during the PPG biometric recognition procedure on the CapnoBase dataset are further summarized. The results are given in Table 7. The experimental environment is an HP EliteBook 8570w notebook with an Intel(R) Core(TM) i7-3740QM CPU @ 2.70 GHz, 8.00 GB RAM, and the 64-bit Windows 7 operating system. Table 7 lists the average preprocessing time per sample, the average feature-extraction time per sample, and the average matching time per sample pair. Since other studies do not analyze time cost, only our experimental results are summarized here. From Table 7, it can be seen that the feature-extraction time of the proposed method is fast and acceptable. In addition, the proposed method is more efficient than deep-learning-based methods, whose training is well known to be time-consuming. In conclusion, the proposed method is efficient and scalable.

4. Conclusions and Directions of Future Work

In this study, a PPG biometric recognition framework is proposed. First, raw PPG data are rerepresented by sliding window scanning, which is not sensitive to noise: the raw PPG data can be blindly segmented, so no other complicated denoising operation is needed. Second, the extracted three-layer features combine global and local subfeatures, and each layer's features are represented by a sparse softmax vector. Finally, the extensive experimental results demonstrate that the method based on three-layer features and k-NN achieves high recognition rates of 99.95%, 97.21%, and 99.92% on the BIDMC, MIMIC, and CapnoBase datasets, respectively, and the three-layer features also achieve high recognition rates with RF, LDC, and NB on the three datasets. The extensive experimental results on the three datasets also demonstrate that our method outperforms several state-of-the-art methods. In particular, our method is suitable for small-scale PPG biometric recognition and consumes fewer resources. Despite the satisfactory performance achieved by our method, there is still room for improving the proposed PPG biometric recognition framework, especially for large-scale PPG applications. In future work, we will further explore the attributive information of the PPG signal to improve performance.

Data Availability

The simulated data used to support the simulation part of this study are available from the corresponding author upon request, and the real-world PPG data can be obtained from https://www.physionet.org/physiobank/database/bidmc, https://archive.physionet.org/cgi-bin/ATM?database=mimic2db, and http://www.capnobase.org/database/pulse-oximeter-ieee-tbme-benchmark.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors thank LetPub (http://www.letpub.com) for its linguistic assistance during the preparation of this manuscript. This work was supported in part by the NSFC-Xinjiang Joint Fund under Grant U1903127 and in part by the Key Research and Development Project of Shandong Province under Grant 2018GGX101032.