Abstract

Discovering shared, invariant feature representations across subjects in electrocardiogram (ECG) classification tasks is crucial for improving the generalization of models to unknown patients. Although deep neural networks have recently emerged as a means of extracting generalizable ECG features, they usually rely on labeled samples from a large number of subjects to guarantee generalization. Extracting representations that are invariant to intersubject variability from a small number of subjects remains a challenge because of individual physiological differences. To address this problem, we propose an adversarial deep neural network framework for interpatient heartbeat classification that integrates adversarial learning into a convolutional neural network to learn subject-invariant, class-discriminative features. The proposed method was evaluated on the MIT-BIH arrhythmia database, a publicly available ECG dataset collected from 47 patients. Compared with state-of-the-art methods, the proposed method achieves the highest performance in detecting supraventricular ectopic beats (SVEBs), which are very challenging to identify, and comparable performance in detecting ventricular ectopic beats (VEBs). The sensitivities for SVEBs and VEBs are 78.8% and 92.5%, respectively. The precisions for SVEBs and VEBs are 90.8% and 94.3%, respectively. With its high performance in the detection of pathological classes (i.e., SVEBs and VEBs), this work provides a promising method for ECG classification tasks when the number of patients is limited.

1. Introduction

Classifying electrocardiogram (ECG) heartbeats is essential for diagnosing cardiac diseases (e.g., cardiac arrhythmia). However, it is time consuming for cardiologists to inspect a long-term ECG manually, which makes automatic ECG analysis useful. A large number of methods have been proposed for ECG classification. Two paradigms, known as the intrapatient and interpatient paradigms, are usually adopted for evaluating ECG classification methods. In the intrapatient paradigm, the heartbeats from different patients are divided randomly into the training and evaluation sets. This evaluation paradigm is not highly reliable in the real world, since heartbeats from the same patient may be used for both training and testing, which biases the evaluation of the classifier's generalization. In practice, an automatic ECG classification system should provide an accurate diagnosis for any unknown patient (a patient not in the training set). The interpatient paradigm specifies that the heartbeats used for training and testing come from different individuals, which yields a more realistic evaluation. However, automatic interpatient ECG classification remains a challenge because of variations in ECG morphology and rhythm caused by individual physiological differences.

As illustrated in Figure 1, an ECG heartbeat mainly consists of a P wave, a QRS complex, and a T wave, which reflect the electrical activities of the depolarization and repolarization processes of the atria and ventricles. In general, a complete ECG classification system consists of three procedures: (1) ECG signal preprocessing, such as baseline wander removal and heartbeat segmentation; (2) feature extraction, mainly including morphological features [1–4], statistical features [5–7], P-QRS-T features [8–10], and wavelet features [11–13]; and (3) classification, such as support vector machine (SVM) [3, 9, 14, 15] and artificial neural network (ANN) [8, 16]. Chen et al. [9] combined projected ECG features and weighted RR interval features and then input these features into an SVM for heartbeat classification. While their method yielded a high classification performance under the intrapatient evaluation paradigm, the sensitivity and precision for detecting supraventricular ectopic beats were only 29.5% and 38.4% under the interpatient evaluation paradigm on the MIT-BIH arrhythmia database. Raj et al. [17] introduced a sparse representation technique to extract features representing ECG signals and used machine learning techniques (such as SVM and k-nearest neighbor) to classify these features, which obtained good results in detecting supraventricular ectopic beats. Mondejar et al. [4] extracted morphological features and features based on wavelets, high-order statistics, local binary patterns, and RR intervals. They proposed feeding each type of feature into a separate SVM to obtain feature-specific SVM models. The predictions of these SVM models were then combined to obtain the final prediction, which achieved good overall performance for interpatient heartbeat classification. These methods rely on expert knowledge and experience for feature engineering. Thus, the classification performance can be very sensitive to the quality of the extracted features.

Recently, studies on ECG classification have increasingly focused on deep learning because of its powerful ability for automatic feature learning and classification. When the training dataset is sufficient, deep neural networks (e.g., the convolutional neural network (CNN)) have been shown to be highly effective in classification tasks [18–22]. Hannun et al. presented a 34-layer deep CNN trained on 91232 ECG recordings collected from 53549 individuals, which achieved cardiologist-level accuracy in arrhythmia classification. However, complex models such as the CNN are prone to overfitting when the number of patients is limited (e.g., the 47 patients included in the MIT-BIH arrhythmia database), which makes it difficult to classify the heartbeats of unknown patients. Some deep-learning-based methods [23–25] have nevertheless achieved satisfactory results on small databases such as the MIT-BIH arrhythmia database for interpatient ECG classification. Li et al. [23] developed a multiscale convolutional neural network in which 3D features containing the morphological characteristic, the beat-to-beat correlation feature, and the RR interval were taken as inputs. Niu et al. [24] proposed a deep-learning framework that introduces a symbolization approach to represent the rhythm and morphology of the heartbeat and feeds the symbolic representation into a multiperspective convolutional neural network. However, current methods lack explicit mechanisms to explore ECG feature invariance across subjects. They usually assume that the proposed models can intrinsically learn generalizable features during training. This implicit learning is naturally limited by the amount of individual ECG data. Therefore, how to explicitly learn representations that are invariant to intersubject variations is a critical issue, especially when the number of patients is limited.

In this paper, we propose an adversarial ECG heartbeat classification framework based on a convolutional neural network, as illustrated in Figure 2. The framework integrates adversarial learning into a convolutional neural network, which extends deep-learning models for ECG identification tasks. The adversarial CNN is composed of an encoder, a classifier, and an adversary network. The encoder network extracts features from ECG heartbeat signals and the corresponding RR intervals. The classifier network predicts the class labels, and the adversary network identifies the subject IDs; the encoder is trained so that class label prediction is maximized while subject ID identification is minimized. Through this adversarial game, the encoder learns subject-invariant, class-discriminative features. The proposed method was evaluated on the MIT-BIH arrhythmia database, a publicly available ECG dataset collected from 47 patients. Ablation studies show that our adversarial subject-invariant feature learning significantly enhances interpatient ECG heartbeat classification accuracy compared with conventional deep-learning methods.

The main contributions of this paper are summarized as follows:

(1) Our goal is for the features learned by a deep-learning model to generalize well to unknown patients in ECG identification/classification tasks. To this end, a deep-learning-based ECG heartbeat classification framework is proposed for learning generalizable features. Specifically, we introduce an adversary loss into the convolutional neural network, which encourages the model to learn subject-invariant, class-discriminative representations from an insufficient number of subjects through the adversarial game.

(2) Experiments on the publicly available and commonly used MIT-BIH database demonstrate that the proposed method achieves state-of-the-art performance in the detection of pathological classes when the number of subjects is limited.

2. Method

2.1. Problem Description

Let $\{(x_i, y_i, s_i)\}_{i=1}^{n}$ indicate the training set, with $x_i$ denoting the original ECG heartbeat, $y_i$ denoting the class label of $x_i$, and $s_i$ denoting the subject identification (ID) number of $x_i$. The reasonable assumption here is that the ECG data $x$ are jointly dependent on the class labels $y$ and the subject IDs $s$. The task of ECG classification is to predict $y$ given $x$. In the real world, this task requires the predictions to be invariant to $s$; namely, a model that generalizes across subjects is necessary. In this study, we regard $s$ as the nuisance variable and aim to develop a convolutional neural network model that learns generalizable features across subjects that are invariant to $s$.

2.2. Data Preprocessing and Feature Extraction

All original ECG recordings are preprocessed to generate the input of the proposed adversarial convolutional neural network, as presented in Figure 2(a). First, we segment the original ECG recordings into heartbeats according to the locations of the R peaks annotated in the MIT-BIH arrhythmia database. Specifically, each heartbeat spans from 50 points after the previous R peak to 100 points after the current R peak. Because the heart rate changes constantly, a fixed starting point relative to the current R peak may introduce disturbance information (for heartbeats with a short RR interval) or lose information (for heartbeats with a wide waveform); the segmentation used here therefore captures a more robust P-QRS-T complex waveform. This segmentation results in heartbeats of different lengths, whereas CNNs cannot accept variable-length input. Therefore, in the second step, we resample all heartbeats to the same length of 128. Third, the average of all heartbeat segments is subtracted to suppress the baseline wander.
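The segmentation and resampling steps can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the segment runs from 50 samples after the previous R peak to 100 samples after the current R peak, and it uses linear interpolation for resampling, which the text does not specify. The function names `segment_beat` and `resample_linear` are hypothetical, and the mean-subtraction step is omitted.

```python
def segment_beat(signal, prev_r, cur_r, start_offset=50, end_offset=100):
    """Cut one heartbeat: from prev_r + 50 up to (not including) cur_r + 100."""
    start = prev_r + start_offset
    end = min(cur_r + end_offset, len(signal))  # clip at the end of the recording
    return signal[start:end]


def resample_linear(segment, target_len=128):
    """Resample a segment to target_len samples by linear interpolation.

    Linear interpolation is an assumption; any resampling method that yields
    a fixed length of 128 would fit the description in the text.
    """
    n = len(segment)
    if n == 0:
        raise ValueError("empty segment")
    if n == 1:
        return [float(segment[0])] * target_len
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional index into segment
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


# Toy usage: a ramp signal with R peaks at samples 100 and 400.
signal = list(range(1000))
beat = segment_beat(signal, prev_r=100, cur_r=400)
fixed = resample_linear(beat)
```

Because the segment boundaries follow the two neighbouring R peaks, the raw segment length varies with the RR interval, which is why the fixed-length resampling step is needed before the CNN.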

In addition to the preprocessed heartbeat signal, the heartbeat rhythm (RR interval information) is extracted as another part of the input, as shown in Figure 2(b). The pre-RR interval (the interval between the current R peak and the previous one) is a typical RR interval feature, which generally can distinguish arrhythmias from normal heartbeats of a person [27]. However, the pre-RR interval distribution of arrhythmic heartbeats may overlap with that of normal heartbeats as the individual basic heart rate is different, especially for the patient population. To eliminate the overlap, we extract the pre-RR ratio (the ratio of the current pre-RR interval to the average of all pre-RR intervals of the corresponding recording) to unify everyone’s basic heart rate. Furthermore, the near-pre-RR ratio (the ratio of the current pre-RR interval to the average of the previous ten pre-RR intervals) is also extracted since the individual basic heart rate changes with mood and movement state [1]. To build the input of the adversarial convolutional neural network, we duplicate these two scalar features as vectors with a length of 128 and then concatenate with the preprocessed heartbeat signal.
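The two RR ratio features can be computed from the R-peak positions of one recording, as in the sketch below. The function name `rr_ratio_features` is hypothetical. The handling of the first beats, which have fewer than ten preceding intervals, is an assumption: the sketch averages whatever previous intervals exist and falls back to the current interval when there are none.

```python
def rr_ratio_features(r_peaks, near_window=10):
    """Return (pre-RR ratio, near-pre-RR ratio) for each beat after the first.

    r_peaks: R-peak sample indices of one recording, in increasing order.
    """
    # Pre-RR interval: distance between the current R peak and the previous one.
    pre_rr = [r_peaks[i] - r_peaks[i - 1] for i in range(1, len(r_peaks))]
    mean_rr = sum(pre_rr) / len(pre_rr)  # average over the whole recording

    features = []
    for i, rr in enumerate(pre_rr):
        # Previous (up to) ten pre-RR intervals; assumption: use the current
        # interval itself when no previous interval is available.
        recent = pre_rr[max(0, i - near_window):i] or [rr]
        near_mean = sum(recent) / len(recent)
        features.append((rr / mean_rr, rr / near_mean))
    return features


# Toy usage: three regular beats followed by one premature beat.
feats = rr_ratio_features([0, 300, 600, 900, 1050])
```

In this toy recording the premature beat receives ratios well below 1, while the regular beats stay close to 1 regardless of the absolute heart rate, which is the normalizing effect described above.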

2.3. Adversarial Model Learning

The proposed adversarial ECG heartbeat classification model mainly consists of three parts: an encoder, a classifier, and an adversary subnetwork, as illustrated in Figure 2(c). The encoder network, parameterized by $\theta_e$, is used to learn representations $h$ of the input heartbeat $x$. In the implementation, a convolutional neural network serves as the encoder, which is detailed in Section 2.4. The encoder outputs the representations $h$, which are fed separately into the classifier parameterized by $\theta_c$ and the adversary network parameterized by $\theta_a$. The classifier and the adversary, each consisting of a fully connected layer with a softmax function, are used to classify the representations into heartbeat classes $y$ and subject IDs $s$, respectively. To eliminate the interference caused by $s$ that is embedded in $h$, we present an adversarial game. Here, the adversary is trained to predict subject IDs by maximizing the likelihood $q_{\theta_a}(s \mid h)$, while at the same time, the encoder is trained to conceal information regarding $s$ within $h$ by minimizing this likelihood and to retain sufficient discriminative information for the classifier to estimate class labels by maximizing $q_{\theta_c}(y \mid h)$. Overall, we train the encoder, classifier, and adversary networks jointly towards the objective:

$$\hat{\theta}_e, \hat{\theta}_c, \hat{\theta}_a = \arg\min_{\theta_e, \theta_c} \max_{\theta_a} \mathcal{L}(\theta_e, \theta_c, \theta_a), \tag{1}$$

where $\mathcal{L}$ is the cross-entropy loss function, defined by

$$\mathcal{L}(\theta_e, \theta_c, \theta_a) = \mathbb{E}\left[-\log q_{\theta_c}(y \mid h) + \lambda \log q_{\theta_a}(s \mid h)\right], \tag{2}$$

where $\lambda$ denotes the adversarial weight that trades off stronger invariance against task-discriminative performance. A higher $\lambda$ enhances invariance to subjects, whereas $\lambda < 0$ forces the encoder to learn features that are discriminative for class labels as well as subject IDs, which is not desired in our ECG classification task.
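To make the adversarial game concrete, the sketch below evaluates a per-sample loss of the form "classification cross-entropy plus an adversarial weight times the log-likelihood the adversary assigns to the true subject". The encoder and classifier would minimize this quantity, while the adversary would maximize it. The function names and the exact form of the loss are a reading of the description above, not the authors' code, and the logits are placeholders for the outputs of the two fully connected layers.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def adversarial_loss(class_logits, subject_logits, y, s, lam):
    """Loss for one sample: -log q(y | h) + lam * log q(s | h).

    Minimized w.r.t. the encoder and classifier (good class prediction, poor
    subject prediction) and maximized w.r.t. the adversary (good subject
    prediction). lam is the adversarial weight.
    """
    class_ce = -log_softmax(class_logits)[y]        # classification cross-entropy
    subject_ll = log_softmax(subject_logits)[s]     # adversary log-likelihood
    return class_ce + lam * subject_ll


# Toy usage: 5 heartbeat classes and a hypothetical 22 training subjects,
# with uninformative (uniform) predictions from both heads.
loss = adversarial_loss([0.0] * 5, [0.0] * 22, y=1, s=3, lam=0.005)
```

Setting `lam` to 0 recovers the ordinary classification loss of an encoder-classifier network, since the subject term then no longer affects the encoder.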

2.4. Convolutional Network Architecture

The ECG feature encoder is composed of seven convolution layers and three spatiotemporal attention modules in total. The specific configuration of the encoder network is shown in Table 1. Following the first convolution layer, three residual convolution blocks with average pooling shortcuts are built to facilitate the optimization of the network and improve classification accuracy. The second (last) convolution layer of each residual block uses a dilation rate of 3 to enlarge the receptive field without increasing the number of parameters. After each convolution layer, batch normalization (BN) [28] is used to accelerate model convergence by renormalizing the distribution of each training minibatch. The Rectified Linear Unit (ReLU) function [29] is applied to activate the output of each BN layer, which helps to mitigate the vanishing gradient problem. Furthermore, we introduce a spatiotemporal attention mechanism [30], consisting of spatial and temporal attention modules, which is embedded after each residual convolution block. This mechanism focuses on more informative features by assigning different weights to both the channels and the temporal segments of the feature map.
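The effect of dilation on the receptive field can be checked with a short calculation. The sketch below uses the standard relation for a one-dimensional dilated convolution, where a kernel of size k with dilation d spans d(k − 1) + 1 input samples while keeping k weights per channel pair. The kernel size of 3 and the channel counts are hypothetical, since Table 1 is not reproduced here.

```python
def dilated_extent(kernel_size, dilation):
    """Number of input samples spanned by one dilated 1D convolution kernel."""
    return dilation * (kernel_size - 1) + 1


def conv1d_weight_count(in_channels, out_channels, kernel_size):
    """Weights of a 1D convolution (bias excluded); independent of dilation."""
    return in_channels * out_channels * kernel_size


# Hypothetical layer: kernel size 3, 32 input and 32 output channels.
plain_span = dilated_extent(kernel_size=3, dilation=1)
dilated_span = dilated_extent(kernel_size=3, dilation=3)
weights = conv1d_weight_count(32, 32, 3)
```

With these illustrative values, the dilated layer covers a wider window of the heartbeat than the plain layer while the weight count is the same for both, because the dilation rate does not enter the parameter count.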

The representations learned by the encoder network are input to the classifier and the adversary for task discrimination (heartbeat class) and subject ID discrimination. The classifier and the adversary each consist of a fully connected layer with $C$ and $S$ softmax units (the number of heartbeat classes and the number of training subjects), respectively, to output normalized log-probabilities that are used to calculate the loss in equation (2).

3. Experimental Studies and Results

3.1. Dataset

The MIT-BIH arrhythmia database [31] is used for evaluating the performance of the proposed method. This database consists of 48 two-lead ambulatory ECG recordings collected from 47 individuals, where recordings 201 and 202 were obtained from the same subject. Each recording lasts about 30 minutes and is sampled at 360 Hz. According to ANSI/AAMI EC57:1998 [32], all heartbeats can be grouped into five superclasses: heartbeats originating in the sinus node (N), supraventricular ectopic beats (SVEBs or S), ventricular ectopic beats (VEBs or V), fusion beats (F), and unknown beat type (Q).

Following the AAMI-recommended practice, four paced recordings are not used. To obtain a more realistic evaluation, De Chazal et al. [33] recommended dividing the remaining 44 recordings into DS1 and DS2 sets for the training and test, respectively. This division splits the recordings by considering the identification of patients and the balance of classes, which guarantees that the heartbeats in the training and testing sets are from different patients. The detailed heartbeat distribution used in this paper is shown in Table 2.
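For reference, the recording-level division can be written down as a small lookup. The record numbers below are the DS1/DS2 lists and the four paced recordings as commonly reported for the protocol of De Chazal et al. [33]; they are not listed in this section and should be checked against that reference before use. The helper name `split_of` is hypothetical.

```python
# Record numbers as commonly reported for the De Chazal et al. division
# (assumption: verify against [33]).
DS1 = [101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122,
       124, 201, 203, 205, 207, 208, 209, 215, 220, 223, 230]   # training
DS2 = [100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210,
       212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234]   # testing
PACED = [102, 104, 107, 217]                                     # excluded


def split_of(record):
    """Return 'train', 'test', or 'excluded' for an MIT-BIH record number."""
    if record in DS1:
        return "train"
    if record in DS2:
        return "test"
    if record in PACED:
        return "excluded"
    raise KeyError(f"unknown record: {record}")
```

Assigning whole recordings, rather than individual heartbeats, to a set is what distinguishes this interpatient protocol from a random beat-level split.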

3.2. Training Setting

We randomly choose 20% of the training data as the validation data and use the remaining data as the training samples. The adversarial weight is set to 0.005 after tuning this parameter. The proposed adversarial deep-learning framework is trained using the adaptive moment estimation (Adam) optimizer [34] with an initial learning rate of 0.001. During training, the model parameters are updated iteratively on batches of 128 training samples. When the validation loss has not decreased for 10 epochs, the learning rate is reduced to 0.0001; when it has not decreased for 20 epochs, training terminates. The model that performs best on the validation data for heartbeat classification is saved.
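The learning-rate reduction and stopping rule can be illustrated independently of any deep-learning library. The sketch below replays a list of validation losses and reports the learning rate used after each epoch and the epoch of the best model. It assumes that both counters are measured from the last epoch at which the validation loss improved and that the learning rate stays at 0.0001 once reduced; the text does not state either detail. The function name `simulate_schedule` is hypothetical.

```python
def simulate_schedule(val_losses, lr_init=1e-3, lr_low=1e-4,
                      lr_patience=10, stop_patience=20):
    """Replay validation losses; return (best_epoch, learning rate per epoch)."""
    best_loss = float("inf")
    best_epoch = -1
    lr = lr_init
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # checkpoint would be saved here
        stale = epoch - best_epoch                # epochs without improvement
        if stale >= lr_patience:
            lr = lr_low                           # assumption: stays reduced
        lr_history.append(lr)
        if stale >= stop_patience:
            break                                 # terminate training
    return best_epoch, lr_history


# Toy usage: the loss improves for three epochs and then plateaus.
best_epoch, lr_history = simulate_schedule([1.0, 0.9, 0.8] + [0.85] * 30)
```

In this toy run the best model is the one from the third epoch, the learning rate drops ten epochs later, and training stops after twenty epochs without improvement, well before the list of losses is exhausted.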

3.3. Evaluation Metrics

Four typical metrics, namely accuracy (Acc), sensitivity (Sen), precision (Pre), and the F1 score, are used to measure the classification performance of the proposed method. Here, accuracy measures the overall classification performance of the proposed method, whereas sensitivity and precision are calculated for each specific class. The F1 score is the harmonic mean of precision and sensitivity (recall). These metrics are defined as

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Sen} = \frac{TP}{TP + FN}, \quad \mathrm{Pre} = \frac{TP}{TP + FP}, \quad F1 = \frac{2 \times \mathrm{Sen} \times \mathrm{Pre}}{\mathrm{Sen} + \mathrm{Pre}},$$

where TP, TN, FP, and FN refer to the numbers of true positive, true negative, false positive, and false negative samples, respectively. In practice, the accuracy metric is largely dominated by the class with the largest number of samples (class N). To better reflect the classification performance of a model for pathological classes S and V, in addition to the class-level scores $F1_S$ and $F1_V$ for these two classes, we further define the average F1 score of S and V as

$$F1_{\mathrm{avg}} = \frac{F1_S + F1_V}{2}.$$
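These metrics can be computed from a multiclass confusion matrix as in the sketch below. The helper names are hypothetical, the confusion matrix is a made-up three-class example rather than a result from this study, and overall accuracy is taken as the fraction of correctly classified beats, which is an assumption about how the accuracy metric is applied in the multiclass setting.

```python
def class_metrics(cm, k):
    """Sensitivity, precision, and F1 for class k.

    cm[i][j] is the number of beats of true class i predicted as class j.
    """
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                      # class-k beats predicted as others
    fp = sum(row[k] for row in cm) - tp       # other beats predicted as class k
    sen = tp / (tp + fn) if tp + fn else 0.0
    pre = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * sen * pre / (sen + pre) if sen + pre else 0.0
    return sen, pre, f1


def overall_accuracy(cm):
    """Fraction of correctly classified beats over all classes."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def average_f1(cm, classes):
    """Mean of the class-level F1 scores, e.g. over classes S and V."""
    return sum(class_metrics(cm, k)[2] for k in classes) / len(classes)


# Made-up example with classes ordered as N, S, V.
cm = [[90, 5, 5],
      [2, 8, 0],
      [1, 1, 8]]
acc = overall_accuracy(cm)
f1_avg = average_f1(cm, classes=[1, 2])
```

In this example the accuracy is driven mostly by the first (majority) class, whereas the averaged F1 score depends only on the two minority classes, which mirrors the motivation given above for reporting it.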

3.4. Classification Performance

Following the AAMI recommendation, we focus particularly on the classification performance of classes S and V, since these two arrhythmic classes account for much larger proportions of the training samples (2.8% and 7.0%) than the other arrhythmic classes and cover the majority of arrhythmias. The training samples of classes F and Q are very scarce (0.8% of the whole dataset), and the detection accuracy reported in the literature is usually low. Figure 3 presents the confusion matrix for the heartbeat classification results on DS2, where a darker color indicates a more accurate prediction. Overall, the proposed method achieves high ECG heartbeat classification performance on classes N, S, and V. Most instances of classes N, S, and V are correctly classified. Nevertheless, the classification of classes F and Q is unsatisfactory. This is mainly due to the considerably small number of training samples for these two classes, as seen in Table 2. Furthermore, we evaluate the record-level classification results of the proposed method on DS2, as shown in Table 3. Of the 22 recordings, 18 attain an accuracy above 90%. The classification accuracies of the other 4 recordings, 105, 202, 213, and 214, are 87.9%, 85.4%, 88.7%, and 65.2%, respectively. The overall classification performance for class V (92.5% sensitivity and 94.3% precision) is better than that for class S (78.8% sensitivity and 90.8% precision). This is partially because class S has a smaller sample size but more subclasses than class V.

3.5. Performance Comparison

Table 4 compares the interpatient heartbeat classification performance of our method with that of several other methods. As in our evaluation scheme, these methods trained their models on the DS1 set and were evaluated on DS2, which ensures a fair comparison. As mentioned above, we focus more on the classification performance for classes S and V than on the overall accuracy, which is mainly governed by class N with its very large number of instances (90% of the whole dataset). In clinical practice, a missed diagnosis is particularly serious, which is reflected by the sensitivity metric. A precise diagnosis is also necessary. Thus, the comparison focuses on the F1 scores for pathological classes S and V, which take into account both sensitivity and precision. Moreover, a single metric makes it easy to compare different methods. Thus, the average F1 score, which is the average value of $F1_S$ and $F1_V$ for pathological classes S and V, is used as the final metric.

In [3, 4, 17, 35], the traditional ECG classification pipeline is adopted, which extracts features from raw or preprocessed ECG signals based on experience and then inputs these extracted features into a classifier. Compared with these methods, the proposed method achieves an average F1 score that is 11.4%–25% higher. The methods in [23, 24] and ours use a deep-learning model for automatic feature extraction and classification, coupled with some handcrafted features. The proposed adversarial CNN outperforms [23, 24] by 17.2% and 5.8% in average F1 score, respectively. It can be observed that the proposed method achieves the highest average F1 score. On the whole, the proposed method has an advantage in detecting pathological classes, especially class S, which is challenging to identify in the MIT-BIH dataset, and it also obtains a satisfactory performance (F1 score of >90%) in detecting class V.

4. Discussion

4.1. Effects of RR Ratio Features

To explore the effect of the pre-RR ratio and the near-pre-RR ratio on classifying arrhythmias (i.e., classes N, S, V, F, and Q), box plots that show the distribution of these two RR ratios among the classes are given in Figure 4. The two RR ratios clearly distinguish pathological classes S and V from class N. Nevertheless, it is difficult to distinguish between S and V. This is reasonable because pathological ECG recordings share some characteristics, such as a rhythm that is too fast or too slow. Therefore, additional ECG feature learning by other techniques, such as the deep learning used in this paper, is necessary. Class F, which is the fusion of ventricular and normal beats, has a distribution of the two RR ratios close to that of class N. Class Q consists of unknown beats; thus, its RR ratios span a wide range. The comparison of classification performance with and without the pre-RR ratio and near-pre-RR ratio is shown in Table 5. The experimental results demonstrate that these two RR ratio features greatly improve the sensitivity and precision in detecting pathological classes S and V by providing the deep network with more prior knowledge about heart rhythms.

4.2. Regular CNN vs. Adversarial CNN

Here, the regular CNN refers to the encoder-classifier network. We remove the adversary subnetwork from the proposed framework to validate the effectiveness of adversarial learning. The same data processing, feature extraction, and experimental settings are used for the regular CNN and the proposed adversarial CNN. The comparison of classification performance is shown in Table 6. The proposed adversarial CNN is far superior to the regular CNN, except that its precision for class V is slightly lower. The regular CNN is data driven in essence. However, the ECG recordings provided in the MIT-BIH database are collected from an insufficient number of subjects. Therefore, it is challenging for the regular CNN to capture features that are robust to intersubject variability, and the learned features may be subject related. In contrast, the proposed adversarial CNN succeeds in concealing the information of subject IDs through the adversarial game. The experimental result suggests that adversarial learning significantly facilitates the learning of subject-invariant features that generalize across subjects.

4.3. Choosing the Adversarial Weight Parameter

The adversarial weight $\lambda$ makes a tradeoff between the invariance to subjects and task-discriminative performance. A large $\lambda$ encourages the encoder to learn subject-invariant information. However, increasing $\lambda$ can result in the loss of task-discriminative information. Here, we conducted several experiments to analyze the effect of different adversarial weights $\lambda$. Table 7 shows the experimental results. For class N, the sensitivity and precision for all values of $\lambda$ are higher than 90%, which can be attributed to the large number of class N samples. For classes S and V, the performance is low for the larger values of $\lambda$ tested (including 0.05 and 0.1). The overall performance is the highest at $\lambda = 0.005$, the value adopted in Section 3.2.

4.4. Visualization of Learned Features

The t-distributed stochastic neighbor embedding (t-SNE) [36] nonlinearly reduces high-dimensional data to a two-dimensional map. Here, we applied t-SNE to evaluate the proposed method visually. Each preprocessed heartbeat segment is a 256-dimensional vector (the length is 128 and the channel number is 2). Combining the RR ratio features with the heartbeat segment, 768-dimensional vectors (the two RR ratio features and the heartbeat segment are all 256-dimensional vectors) were used as the input of the proposed adversarial CNN. We extracted the outputs from different layers. The visualizations are shown in Figure 5. The sample size of class N was reduced in the figures for better visualization. It can be observed from Figures 5(a) and 5(b) that no obvious clusters exist in the input feature vectors. As the layers deepen, the clusters become apparent (Figures 5(c) and 5(f)). However, in the first three residual blocks, the samples of each class are still scattered. This means that these feature vectors fail to distinguish classes N, S, V, F, and Q well and that further nonlinear operations are required. For the feature vectors output by the global average-pooling layer (Figure 5(f)), the clustering is very apparent. Figure 5(f) demonstrates that the features extracted by the proposed method are discriminative enough to classify multiclass arrhythmias. Note that each class may contain multiple clusters. This is because each class consists of multiple subclasses in which some features differ. For example, bundle branch block beats and normal beats both belong to class N, but they have different QRS complex durations.

5. Conclusions

This paper presents a CNN-based adversarial deep-learning framework for interpatient heartbeat classification using ECG signals from a small number of subjects. The proposed framework consists of an encoder, a classifier, and an adversary network. The encoder is used to learn representations from input data generated by raw signal preprocessing and feature extraction procedures. These representations are then fed separately into the classifier and the adversary to classify heartbeats and subject IDs. The overall framework is trained by minimizing the heartbeat classification loss and maximizing the subject ID identification loss, which forces the encoder to conceal information regarding subject IDs and retain sufficient discriminative information for the task of heartbeat classification. By using adversarial learning, the proposed framework helps to eliminate interpatient variability and obtain invariant representations across subjects. Therefore, it is especially suitable for ECG classification tasks with an insufficient number of patients.

Data Availability

The MIT-BIH Arrhythmia Database used to support the findings of this study is publicly available and can be downloaded at https://physionet.org/content/mitdb/1.0.0/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant 61922075) and USTC Research Funds of the Double First-Class Initiative (YD2100002004).