Abstract
Emotion can be affected during self-isolation, and emotional regulation is important for avoiding severe mood swings. Efficient emotion recognition is a vital step toward this goal and can be realized with electroencephalography (EEG) signals. Previously, inspired by sequencing in bioinformatics, a method termed brain rhythm sequencing, which analyzes electroencephalography as a sequence of dominant rhythms, was proposed for seizure detection. In this work, with the help of similarity measure methods, asymmetric features are extracted from the sequences generated from different channel data. After evaluating all asymmetric features for emotion recognition, the optimal feature that yields the highest accuracy is identified. Therefore, the classification task can be accomplished with a small amount of channel data. In a music emotion recognition experiment and on the public DEAP dataset, the classification accuracies of various test sets are approximately 80–85% when employing an optimal feature extracted from one pair of symmetrical channels. Such performance is notable when resource usage is a concern. Further investigation revealed that emotion recognition shows strongly individual characteristics, so an appropriate solution is to include subject-dependent properties. Compared to existing works, this method benefits the design of a portable emotion-aware device for use during self-isolation, as fewer scalp sensors are needed. Hence, it would provide a novel way to realize emotional applications in the future.
Introduction
Currently, coronavirus disease 2019 (COVID-19) is causing a terrible health crisis worldwide, and many governments have imposed strict regulations to prevent infections, such as self-isolation. During isolation periods, emotions can be strongly influenced. Therefore, to avoid severe mood swings, emotional regulation is meaningful. To achieve this goal, efficiently recognizing the emotional state is a vital step. Previously, emotion recognition has been realized by various modalities, including speech [1], facial expression [2], and body posture [3]. Nonetheless, the emotions indicated by these approaches are subjective and can be easily disguised, especially when the subjects are unwilling for them to be recognized. In addition, a camera and microphone are required for constantly recording the response data; this method is unrealistic for self-isolation due to violations of personal privacy. In this regard, the electroencephalography (EEG) signal is more suitable because it is a tool that has been extensively employed to assess the electrical activities of the brain, which is the control center of emotion. Meanwhile, abnormalities in EEG can also aid in the diagnosis of COVID-19 [4]. Therefore, emotion recognition using EEG is a potential method used for achieving emotional regulation during the self-isolation due to COVID-19.
In order to achieve emotion recognition, it is necessary to extract trustworthy features from EEG. The typical EEG features can be fundamentally categorized into time-domain, frequency-domain, time–frequency domain, and others. Time-domain features apply statistical measurements to characterize EEG, such as the mean, standard deviation, kurtosis, skewness, first difference, and second difference [5]. Frequency-domain features focus on the spectral properties of EEG, such as the powers of frequency subbands and higher-order spectra (HOS) [6]. Time–frequency domain features are mainly from the time–frequency analysis (TFA), which enables frequency information to be related to the time domain. Thus, TFA can provide the features that present dynamic variations in both the time and frequency directions [7]. For example, discrete wavelet transform (DWT) decomposes the EEG into several components that correspond to various frequency subbands and simultaneously conserve time-related information [8]. Similarly, intrinsic mode functions (IMFs) acquired from empirical mode decomposition (EMD) can be denoted as the features to indicate the amplitudes, frequencies, and phases of EEG [9]. Finally, the entropy (approximate entropy (ApEn), differential entropy (DE), sample entropy (SampEn), etc.) that reveals the irregularities of EEG [10] and the connectivity (brain symmetry index (BSI), rational asymmetry (RASM), differential asymmetry (DASM), etc.) that characterize the hemispherical asymmetry of the brain [11] are also valuable features in this field.
An earlier work [12] mentioned that the signal powers with the spectra of EEG are widely used in emotion recognition. In addition, Niknazar et al. [13] claimed that the frequency subbands contain more details regarding constituent neuronal activities underlying EEG. Therefore, the characteristics in the EEG that are not evident in the full spectrum can be amplified when each subband is considered separately. Such spectra are termed brain rhythms: δ (0–4 Hz), θ (4–8 Hz), α (8–13 Hz), β (13–30 Hz), and γ (30–50 Hz) [14]. Furthermore, the existing works all concluded that their variations could help to assess emotion accordingly. For instance, γ power is sensitive to sadness and happiness [15]. θ power exhibits a negative correlation with arousal [16]. In the T3–T4 channels, a hemispherical asymmetry exists in β power when the emotion is fear, while another hemispherical asymmetry of α power appears when the emotion is sadness [17].
Generally, the rhythmic features are extracted from multichannel data. Such a large data size incurs a heavy computational burden for feature extraction and increases the hardware complexity of emotion recognition. Hence, selecting the optimal features from several representative channels is an efficient solution. This consideration is vital when designing a portable emotion-aware device for self-isolation, because fewer sensors or electrodes on the scalp make EEG measurement more convenient. Therefore, channel selection is needed, and several works have pursued this goal. Zheng and Lu [11] employed deep belief networks (DBNs) to recognize three types of emotions (positive, neutral, and negative) and explored representative channel subsets that perform comparably to full-channel data. The power spectral densities (PSDs) of five brain rhythms, RASM, DASM, and the differences between the DE of 23 pairs of channels were employed as the features. Their results indicated that the classification accuracy using 4 channels (T7, T8, FT7, and FT8) was 82.88%, that using 12 channels (C5, C6, CP5, CP6, T7, T8, FT7, FT8, P7, P8, TP7, and TP8) was 86.65%, and that using all 62 channels was 86.08%. Menezes et al. [18] extracted the PSDs of five brain rhythms from 4 channels (FP1, FP2, F3, and F4) for emotion recognition. With a support vector machine (SVM), the classification accuracies reached 71.7% for arousal and 73.8% for valence. In addition, the features of δ and θ produced better results than the others. Wang et al. [19] applied normalized mutual information (NMI) for emotion recognition. First, short-time Fourier transform (STFT) was used to obtain EEG spectrograms. Then, all spectrograms were utilized to calculate the NMI connection matrix. Finally, emotion recognition was accomplished by thresholding with connection matrix analysis. 
This approach can achieve classification accuracies of 74.41% for valence using 8-channel data and 73.64% for arousal using 10-channel data. Mohammadi et al. [20] performed DWT to investigate a minimum number of channels and the optimal rhythmic features for emotion recognition. They applied the entropies and PSDs of five brain rhythms as the features. The results revealed that five pairs of symmetrical channels (FP1–FP2, F3–F4, F7–F8, FC1–FC2, and FC5–FC6) realize the classification accuracies of 84.05% for arousal and 86.75% for valence. Zheng [21] developed group sparse canonical correlation analysis (GSCCA) for emotion recognition and utilized logarithm frequency subband powers of five brain rhythms as features to train the classification model. The results demonstrated that the higher frequency subbands (such as β and γ) are more appropriate for emotion recognition. In addition, the accuracies through 4, 12, and 20 channels were 80.20%, 83.72%, and 82.45%, respectively.
The above works mainly used PSDs, DE, etc., of five brain rhythms for emotion recognition, and the channel selection is implemented by the classification accuracies accordingly. Nevertheless, the chronological variations in brain rhythms have not yet been considered. Inspired by the knowledge of sequencing in bioinformatics, the characteristics of different species are represented as biological sequences, which can be used for data mining, analysis, and classification [22]. Then, if the brain rhythms are interpreted in a sequential format, the time–frequency characteristics of EEG can be expressed simultaneously. Such time-series data are also available for classification. To this end, a method termed brain rhythm sequencing (BRS) that analyzes the EEG as the sequence consisting of a dominant rhythm has been proposed for seizure detection in previous work [23]. Now, considering that similarity is a fundamental analysis derived from homology theory [24] and that asymmetry can be denoted by measuring the similarity between pairwise sequences, in this work, the similarity measures are operated on the brain rhythm sequences generated by symmetrical channels (e.g., FP1–FP2, F3–F4, and F7–F8). Then, the asymmetric feature that shows neuronal synchrony of the left and right hemispheres can be acquired for emotion recognition. This method provides a novel way to study brain asymmetry, where asymmetry is a vital aspect of cognitive functions, including emotion [25], and most of the existing works usually analyze asymmetry through frontal alpha asymmetry or the brain asymmetry index [26]. In addition, asymmetric features can be extracted from all pairs of symmetrical channels. After these evaluations, the best one that produces impressive accuracy is found. Therefore, high classification accuracy can be accomplished by an optimal feature extracted from only one pair of symmetrical channels. 
Such results also contribute insights to explore individual characteristics of emotion recognition. In short, the novelties of this work are as follows:
- The BRS concentrates on the chronological variations of brain rhythms, and with the help of similarity measure methods, asymmetric features can be extracted and applied for emotion recognition.
- The representative symmetrical channels of emotion recognition are studied by considering the optimal asymmetric features found, so the portable emotion-aware device can be further simplified with fewer channels of data.
- The emotional EEG recordings are acquired from a music emotion recognition (MER) experiment and public DEAP dataset [16], so the proposed method can be extensively evaluated, providing insights for exploring individual characteristics in different scenarios.
For illustration, Fig. 1 shows the system workflow of this work. First, the EEG recordings are acquired from the MER experiment and public DEAP dataset. Then, the brain rhythm sequences of different channels are generated using the reassigned smoothed pseudo Wigner-Ville distribution (RSPWVD) method. Second, the generated sequences are paired based on symmetrical channels located on the left and right hemispheres. Hence, a number of asymmetric features can be extracted from various pairs of symmetrical sequences through similarity measure methods. Subsequently, k-nearest neighbors (k-NN), support vector machine (SVM), and linear discriminant analysis (LDA) are applied to train and test the extracted features based on leave-one-trial-out (LOTO) cross-validation. Therefore, the classification accuracies of all asymmetric features are evaluated. Finally, the optimal feature and its related channels are identified by considering the highest classification accuracies. Such results are also utilized to investigate individual characteristics. Meanwhile, a comparative study with the existing works that exploit symmetrical spatial features is conducted.
The rest of this work is as follows: the “Experimental data” section describes the EEG data acquired from the self-designed MER experiment and public DEAP dataset. The “Proposed methodology” section introduces the details about the BRS and its classification method using asymmetric features. The “Results and discussion” section shows the results from the respective scenarios, with discussion and performance comparisons. The “Conclusion” section is the summary and future work.
Experimental Data
Data acquisition is the first stage in an EEG-based study. In this work, the EEG recordings from two scenarios are included. One is from the self-designed MER experiment, and the other is from the public DEAP dataset, as detailed below.
Self-Designed MER Experiment
The MER experiment, which evokes emotion through music clips, was conducted in the laboratory at the Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China. The experimental procedures involving human subjects were performed in accordance with the ethical standards of the institutional research committee. A Neuroscan 64-channel system (62 scalp channels, 2 periocular channels) was used to record the EEG. Thirty-six healthy subjects (13 women, 23 men, 22.22 ± 3.13 years) were recruited. Music clips (each 30 s) from the PMEmo dataset [27] were employed to evoke different emotional reactions in the subjects. PMEmo contains 794 songs (almost all in English) selected from three popular music charts, with annotations of arousal and valence (normalized to 0–1) rated by 457 subjects. To achieve balanced elicitation in this work, 40 stimuli were chosen based on the emotional annotations, with 10 stimuli for each category, i.e., HAHV (high arousal high valence), HALV (high arousal low valence), LAHV (low arousal high valence), and LALV (low arousal low valence). Here, the threshold between high and low was 0.5. In addition, these 40 stimuli were divided into two sessions for presentation, with a 5-min break between the two sessions. Each trial included three phases: rest (10 s), music listening (30 s), and self-assessment (20 s). Thus, the duration of one trial was 1 min, and the entire experiment required 45 min per subject (40 trials of 1 min plus the 5-min break), as depicted in Fig. 2.
In the beginning, the subjects were informed about the procedure, and they signed the consent form after their questions and doubts were fully answered. After that, a questionnaire including age, gender, body condition, and habits was collected. To protect personal privacy, the names of the subjects were denoted by S1, S2, S3, and so on. During the experiment, the subjects performed self-assessments after listening to each musical clip, and their ratings (from 1 to 9) were based on five factors: arousal, valence, liking, familiarity, and understanding. The experimental trials are labeled high or low, where the threshold is 5 [20]. Then, five test sets were obtained: set-A (arousal), set-V (valence), set-L (liking), set-F (familiarity), and set-U (understanding).
Figure 3 illustrates the system overview of the MER experiment. The subjects sat on a sofa and listened to the musical clips with earphones. A stimuli computer scheduled and presented the musical clips, while a data acquisition computer measured the EEG data. Thus, the full data of all trials were contained in the recordings after pairing the two computers. In addition, an amplifier was applied to connect the data acquisition computer and the EEG cap. For preprocessing, the raw EEG recordings were downsampled to 200 Hz and then a bandpass filter with a cut-off frequency of 0.01–50 Hz was utilized for data filtering [28]. Subsequently, the EEG artefacts (e.g., eye movements, muscle activities) were removed by employing independent component analysis (ICA) through the EEGLAB toolbox [29]. Finally, the preprocessed data were acquired for method validation.
Public DEAP Dataset
DEAP [16] is one of the best-known public datasets and has been extensively evaluated in emotion recognition. In this dataset, a 32-channel system was employed to record the EEG data from 32 healthy subjects (15 women, 17 men, 27.19 ± 4.45 years). Regarding the stimuli, 40 music videos (each 60 s) were used to evoke the emotions. Thus, the size of the emotional EEG data of each subject was 32 channels × 40 trials × 60 s. After watching each video, the subjects performed self-assessments (from 1 to 9) based on two factors: arousal and valence. Hence, the trials were also labeled and divided into two test sets, where set-A comprises high arousal (HA), A ≥ 5, and low arousal (LA), A < 5, and set-V comprises high valence (HV), V ≥ 5, and low valence (LV), V < 5. Moreover, DEAP provides preprocessed data, in which the raw recordings were downsampled to 128 Hz and filtered through a bandpass filter with a passband of 0.01–100 Hz. Therefore, the preprocessed data were used for method validation.
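The labeling rule described above can be sketched in a few lines. This is only an illustration: the function name `label_trials` and the example ratings are hypothetical, and the sample count simply multiplies the 60 s trial length by the 128 Hz rate stated in the text.

```python
def label_trials(ratings, threshold=5.0):
    """Label each self-assessment rating as 'high' (>= threshold) or 'low'.

    Mirrors the rule stated in the text for DEAP: A >= 5 is high, A < 5 is low.
    """
    return ["high" if r >= threshold else "low" for r in ratings]


# Hypothetical arousal ratings for five trials (not real DEAP values).
arousal = [7.1, 4.9, 5.0, 2.3, 8.6]
labels = label_trials(arousal)

# Samples per channel in one preprocessed trial: 60 s at 128 Hz.
samples_per_trial = 60 * 128
```

Note that a rating of exactly 5 falls into the high class under this rule.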
Proposed Methodology
To achieve emotion recognition using the BRS, signal processing by the RSPWVD is conducted first, which aims to extract the rhythmic powers of the EEG. Then, from the resulting time–frequency plane, the instantaneous powers of five brain rhythms in the same time bin are estimated, so the dominant rhythms can be determined and used for generating the sequence data. Next, the asymmetric features are extracted through the similarity measure methods of the rhythm sequences from the symmetrical channels. Finally, the classification task is achieved by the asymmetric features after training and testing based on LOTO cross-validation. The above operations are detailed in the following subsections.
Signal Processing by TFA
The sequence data need the dominant rhythms along the time scale of EEG, so it is necessary to find a particular brain rhythm in each time bin. This objective can easily be realized by considering the instantaneous power distributions in the time–frequency plane. To this end, TFA is used to extract the signal power information first. Previously, several TFA techniques have been employed for emotion recognition, such as STFT [19], DWT [20], Hilbert-Huang transform (HHT) [30], and Wigner-Ville distribution (WVD) [31]. After comparisons, WVD was chosen because it is good at tracking the sudden variations of the signal in the time domain and at preserving both the time and frequency shift information [32]. Consequently, the instantaneous power distributions in the time–frequency plane can be acquired by WVD (1):
where x(t) denotes the input signal, t and ω are time and frequency, respectively, and * refers to the complex conjugate.
However, WVD suffers from cross-terms in its resulting plane. Cross-terms cause multiple irrelevant regions, which can be regarded as the artefacts that appear in the WVD representations. Such artefacts falsely show the signal components and interfere with the power localization in the plane accordingly [33]. To eliminate the cross-terms, the smoothing version of WVD over time and frequency is needed (2):
where h(t) and g(t) are the smoothing windows applied to the frequency and time to eliminate the cross-terms, and H(ω) denotes the Fourier transform of h(t).
In this work, the smoothing window is the Hamming window, and the independent controls are equipped with the WVD. This variant is smoothed pseudo WVD (SPWVD) (3):
Furthermore, the reassignment provides the effectiveness to enhance the readability of the TFA [34], so it is conducted in SPWVD. Its principle is to rearrange the coefficients of the time–frequency distributions around new zones to produce a high-resolution result, which can be viewed as a complement to achieve the true region of the analyzed signal. For instance, relocate each value of SPWVD at any point (t, ω) to another point (\(\hat{t}\), \(\hat{\omega }\)), which is the center of gravity of the power distribution around (t, ω). Hence, the reassigned value of SPWVD at any point (\(\hat{t}\), \(\hat{\omega }\)) is the sum of all of the values reassigned to that point (4):
where:
A result using RSPWVD to process an EEG signal (F8 channel, subject S2, DEAP) is presented on the left side of Fig. 4, in which the horizontal and vertical axes are time and frequency, respectively, and the color bar indicates the variations in the signal powers.
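For reference, the distributions cited as (1), (3), and (4) above are commonly written in the following textbook forms. This is a sketch in standard notation, assuming symmetric windows and ignoring normalisation constants; the symbols \(s\), \(\tau\), \(\xi\), \(t'\), \(\omega'\), and \(\Pi\) are introduced here for illustration, and the exact expressions used in this work may differ.

\[
W_x(t,\omega) = \int_{-\infty}^{\infty} x\!\left(t+\tfrac{\tau}{2}\right) x^{*}\!\left(t-\tfrac{\tau}{2}\right) e^{-j\omega\tau}\,d\tau
\]

\[
\mathrm{SPWVD}_x(t,\omega) = \int_{-\infty}^{\infty} h(\tau) \left[\int_{-\infty}^{\infty} g(s-t)\, x\!\left(s+\tfrac{\tau}{2}\right) x^{*}\!\left(s-\tfrac{\tau}{2}\right) ds\right] e^{-j\omega\tau}\,d\tau
\]

\[
\mathrm{RSPWVD}_x(t',\omega') = \iint \mathrm{SPWVD}_x(t,\omega)\,\delta\!\left(t'-\hat{t}(t,\omega)\right)\delta\!\left(\omega'-\hat{\omega}(t,\omega)\right) dt\,d\omega
\]

With the separable smoothing kernel \(\Pi(s,\xi) = g(s)\,H(\xi)\), the center of gravity around \((t,\omega)\) can be expressed as

\[
\hat{t}(t,\omega) = \frac{\iint s\,\Pi(t-s,\omega-\xi)\,W_x(s,\xi)\,ds\,d\xi}{\iint \Pi(t-s,\omega-\xi)\,W_x(s,\xi)\,ds\,d\xi},
\qquad
\hat{\omega}(t,\omega) = \frac{\iint \xi\,\Pi(t-s,\omega-\xi)\,W_x(s,\xi)\,ds\,d\xi}{\iint \Pi(t-s,\omega-\xi)\,W_x(s,\xi)\,ds\,d\xi}.
\]

Here \(h\) acts on the lag variable \(\tau\) and therefore smooths along frequency, while \(g\) smooths along time, which matches the roles assigned to the two windows in the text.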
Generation of Rhythm Sequence
In the resulting time–frequency plane from RSPWVD, the frequency axis is divided into five parts based on the five rhythms. Meanwhile, the time axis is separated into time bins (t1, t2, t3 …), whose width is chosen with reference to the average reaction time of neurons reported in existing works. Previously, Chandra et al. [35] claimed that the average reaction time of neurons is approximately 0.14–0.2 s. Rey et al. [36] employed the TFA method to analyze the EEG and found that the average evoked power occurs at approximately 0.2 s. In addition, in two EEG-based studies, Korik et al. [37] and Azevedo et al. [38] used EEG data in 0.2 s time bins to accomplish the decoding of hand motion trajectories and seizure detection, respectively. Based on these findings, the time bin of BRS is set to 0.2 s.
Next, the dominant rhythm in each time bin is acquired by considering the instantaneous power because it has been demonstrated to be the key to emotion recognition [39]. For this aim, the five rhythmic powers in the same time bin are calculated. For instance, on the right side of Fig. 4, α power at t3 has been illustrated, which is estimated by the average of all powers located inside the boundary. In this way, all five rhythmic powers at each 0.2 s are obtained, and the dominant rhythm having the maximum instantaneous power in each time bin of the EEG can be identified, which forms the rhythm sequence accordingly. A sample is depicted in Fig. 5, where an EEG signal from the PO3 channel of subject S2 in DEAP is displayed at the bottom, and its generated sequence (20–25 s) is shown at the top.
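The sequence generation described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the function name `rhythm_sequence`, the single-letter rhythm symbols, the half-open band edges, and the toy power matrix are all assumptions made here. The input is taken to be any time–frequency power matrix with known frequency and time axes.

```python
# Rhythm bands from the text; the half-open intervals [low, high) are an assumption.
BANDS = [("D", 0.0, 4.0), ("T", 4.0, 8.0), ("A", 8.0, 13.0),
         ("B", 13.0, 30.0), ("G", 30.0, 50.0)]


def rhythm_sequence(power, freqs, times, bin_width=0.2):
    """Return the dominant-rhythm string, one symbol per time bin.

    power[i][j] is the power at freqs[i] (Hz) and times[j] (s).
    """
    sums = {}  # (bin index, band symbol) -> [power sum, cell count]
    for i, f in enumerate(freqs):
        band = next((s for s, lo, hi in BANDS if lo <= f < hi), None)
        if band is None:
            continue  # frequency outside the 0-50 Hz range
        for j, t in enumerate(times):
            # Small epsilon guards against floating-point bin-edge errors.
            b = int((t - times[0]) / bin_width + 1e-9)
            acc = sums.setdefault((b, band), [0.0, 0])
            acc[0] += power[i][j]
            acc[1] += 1
    n_bins = max(b for b, _ in sums) + 1
    symbols = []
    for b in range(n_bins):
        # Average power of each rhythm inside this time bin.
        means = {band: s / c for (bb, band), (s, c) in sums.items() if bb == b}
        symbols.append(max(means, key=means.get))
    return "".join(symbols)


# Toy plane: one frequency row per rhythm, four time columns (two 0.2 s bins).
freqs = [2.0, 6.0, 10.0, 20.0, 40.0]
times = [0.0, 0.1, 0.2, 0.3]
power = [[5, 5, 1, 1],   # delta strong in the first bin
         [1, 1, 1, 1],
         [1, 1, 9, 9],   # alpha strong in the second bin
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
sequence = rhythm_sequence(power, freqs, times)
```

With a 0.2 s bin, a 30 s segment yields 150 symbols and a 60 s trial yields 300.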
Asymmetric Features Extraction
As stated, emotion is an inner reaction controlled by the brain, which is a complex network system organized into different functional areas. Typically, functional differences appear between the left and right hemispheres on particular tasks, such as motor control, perception, memory, and emotion [40]. Such differences are ubiquitous across brain information processing. Consequently, the asymmetric features are valuable to assess emotion, and they have been considered in this work.
As seen, the proposed sequence discloses the chronological variations of the dominant rhythms on a specific channel during the emotional process. Hence, the neuronal synchrony of the left and right hemispheres can be indicated by applying similarity measures to the sequences from symmetrical channels, and the resulting values are denoted as the asymmetric features. To this end, the sequences must be paired based on scalp locations, as summarized in Table 1, in which the first column displays the scalp regions and the remaining columns list the symmetrical channels from DEAP and MER.
In Table 1, the scalp is divided into five regions: frontal, central, parietal, temporal, and occipital, and the total number of symmetrical channel pairs is 14 for DEAP and 27 for MER. The next, more important step is to measure the similarity levels between the pairwise rhythm sequences appropriately so that the asymmetric features can be extracted. To this end, seven typical similarity measure methods are considered: Jaccard index (JAC), Hamming distance (HAM), Levenshtein distance (LEV), dynamic time warping (DTW), mutual information (MUT), local sequence alignment (LSA), and global sequence alignment (GSA).
JAC is a statistical approach that measures the percentage of overlap between pairwise sequences. HAM and LEV belong to distance-based methods, in which HAM calculates the number of elements at which the pairwise sequences differ, and LEV finds the minimum number of edits (either insertion, deletion, or substitution) required to change one sequence to be the same as the other. DTW applies a time-warping function that transforms or warps the elements to align the pairwise sequences. Hence, it can generate an optimal alignment between them. MUT evaluates the interdependence interactions derived from the concept of entropy in information theory. Therefore, it can estimate the information integration of pairwise sequences to reveal their similarity levels. LSA and GSA are widely used in bioinformatics, as they are good at identifying the similarity regions between pairwise sequences, where LSA is operated by the Smith-Waterman algorithm that aligns a portion between the sequences and GSA is implemented by the Needleman-Wunsch algorithm that aims for an end-to-end alignment.
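Three of the measures above (HAM, LEV, and DTW) can be sketched on symbol strings as follows. This is a minimal illustration rather than the implementation used in this work; in particular, the 0/1 local cost used for DTW on rhythm symbols is an assumption, since the text does not state how the rhythms are encoded numerically.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def dtw(a, b):
    """Dynamic time warping cost with an assumed 0/1 mismatch cost per step."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


# Hypothetical rhythm sequences from a left and a right channel.
left, right = "AABBT", "ABBTT"
ham, lev, warp = hamming(left, right), levenshtein(left, right), dtw(left, right)
```

In this toy pair the two sequences contain the same runs with different durations, so the warping cost is zero while both edit-style distances equal 2. This shows how DTW tolerates timing shifts that the other two penalise.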
Figure 6 shows the extraction of asymmetric features. First, the brain rhythm sequences are paired based on the symmetrical channels listed in Table 1, such as FP1–FP2, AF3–AF4, P7–P8, and O1–O2. Then, the aforementioned similarity measures are performed on all pairwise sequences. Here, the total number of extracted features per subject was 3920 (14 pairs × 7 measures × 40 trials) and 7560 (27 pairs × 7 measures × 40 trials) on DEAP and MER, respectively. Such asymmetric features can achieve emotion recognition after training and testing by the classifiers, as described in the next stage.
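The pairing step and the feature counts quoted above can be checked with a short sketch. The helper `extract_features`, the stand-in measure, and the toy sequences are hypothetical. Only the pair names and the counts (14 or 27 pairs, 7 measures, 40 trials) come from the text.

```python
def mismatch_count(a, b):
    """Stand-in similarity measure: positions where two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def extract_features(sequences, pairs, measures):
    """Apply every measure to every left/right pair for one trial.

    sequences maps a channel name to its rhythm string.
    Returns {(left, right, measure name): value}.
    """
    return {(left, right, name): fn(sequences[left], sequences[right])
            for left, right in pairs
            for name, fn in measures.items()}


# Toy trial with two of the symmetrical pairs named in the text.
trial = {"FP1": "AAB", "FP2": "ABB", "O1": "GGG", "O2": "GGG"}
pairs = [("FP1", "FP2"), ("O1", "O2")]
features = extract_features(trial, pairs, {"HAM": mismatch_count})

# Feature counts per subject: pairs x measures x trials.
n_deap = 14 * 7 * 40
n_mer = 27 * 7 * 40
```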
Classification Method
After extraction, the number of each asymmetric feature (e.g., FP1–FP2 by DTW) per subject is 40. Compared with the deep learning method, the conventional classifier is more appropriate, as it can build up a classification model when the feature size is small, which also yields a good performance in the existing works [20, 41]. Therefore, k-NN, SVM, and LDA are utilized in this work.
For k-NN, k is the number of nearby instances used to decide the category of the testing data. This value is typically chosen near the square root of the training-set size and is preferably a small positive integer. In addition, an odd value avoids tied votes in two-class problems. Following this rule, k is set to 5, as the training set on DEAP and MER contains 39 trials per case. SVM creates a classification model based on a decision boundary with a maximal margin that separates the training set into two categories. Hence, the testing data can be classified by its location relative to that boundary. LDA is a linear classifier that establishes a probabilistic model for each category by considering the distribution of the input training set. Thus, the testing data are assigned to the category with the higher conditional probability.
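A minimal k-NN sketch for a single scalar feature, under the assumptions that the distance is the absolute difference and that the vote is a simple majority. The function name and the toy values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k training values closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Square-root rule of thumb for 39 training trials (about 6.24).
sqrt_rule = math.sqrt(39)

# Hypothetical asymmetric-feature values with their labels.
train_x = [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1]
train_y = ["low", "low", "low", "low", "high", "high", "high"]
pred_a = knn_predict(train_x, train_y, 0.25)
pred_b = knn_predict(train_x, train_y, 1.0)
```

With two classes and an odd k, the vote can never end in a tie, which is one practical reason for choosing k = 5 rather than 6.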
To reduce perturbations incurred by different trials and solve the overfitting risks, LOTO cross-validation is applied in training and testing. Its procedure is allocating the feature from one trial as testing data and then assigning the features from the remaining trials as the training set. This process is repeated by defining the features from various trials as the testing data until all of them are classified. After comparing the testing results with the original labels, the classification accuracy by each type of asymmetric feature is obtained, and the optimal feature that produces the best result for emotion recognition can be identified. Consequently, high classification accuracy is accomplished with an optimal asymmetric feature only.
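The LOTO procedure above can be sketched as a loop that holds out one trial at a time. The classifier is passed in as a function; the 1-nearest-neighbour stand-in below is used only to keep the example short and is not the k = 5 setting of this work. All names and values are hypothetical.

```python
def loto_accuracy(features, labels, classify):
    """Leave-one-trial-out accuracy for one asymmetric feature.

    classify(train_x, train_y, x) must return a predicted label.
    """
    correct = 0
    for i in range(len(features)):
        # Hold out trial i; train on all remaining trials.
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def nearest_neighbour(train_x, train_y, x):
    """Stand-in classifier: label of the single closest training value."""
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]


# Hypothetical feature values for six trials of one subject.
features = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = ["low", "low", "low", "high", "high", "high"]
accuracy = loto_accuracy(features, labels, nearest_neighbour)
```

For a subject with 40 trials, the loop runs 40 times with 39 training trials in each run.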
Results and Discussion
In this work, MATLAB R2021a was applied for programming the proposed methodology, and the results in various test sets are from its calculations. Then, for the results, the performances of seven similarity measure methods with three classifiers are discussed to summarize the appropriate method for measuring the similarity levels of rhythm sequences and the suitable classifiers for asymmetric features. In addition, the representative symmetrical channels used for recognizing specific emotional factors are analyzed based on the optimal features found. Note that the conditions are different between DEAP and MER, so the results and discussion are separated into two subsections. Finally, the performance comparison with the existing works that consider symmetrical spatial features is carried out.
Results From the DEAP Dataset
The average classification accuracies of the asymmetric features extracted by seven similarity measure methods are presented in Tables 2, 3, and 4, respectively, in which the first column lists the methods and the remaining columns display the accuracies of set-A and set-V using the asymmetric features extracted from the sequence data from the first 30-s (F30 s), last 30-s (L30 s), and all 60-s (A60 s) periods. Here, to calculate the classification accuracy of each subject, 40 experimental trials are classified. Thus, these results are from 40 simulation runs per subject and then averaged by 32 subjects. In addition, the best of each case is underlined.
Meanwhile, for illustration, Fig. 7 depicts a comparative histogram of the average accuracies of the three classifiers with different similarity measure methods on set-A of the DEAP dataset, where the data are from the L30 s period. As observed, the performances of SVM are similar to those of LDA, while k-NN yields better results. Similar trends can also be found in the other scenarios. The main reason may lie in the properties of the classifiers. SVM finds a maximal-margin hyperplane at the frontier between the two classes, and LDA likewise yields a linear decision boundary. Thus, both achieve classification by means of a separating hyperplane. In contrast, k-NN classifies the testing data through the known neighbors (i.e., the training set) around it. Such results suggest that the distribution of the asymmetric features fits k-NN better. The comparisons indicate that k-NN is more suitable as the classifier for training and testing the asymmetric features in this work.
In addition, when using the same classifier, the performances of the different similarity measure methods are close, with only slight variations. This indicates that there are no substantial differences among the similarity measure methods for the asymmetric features. The main reason may be that the sequences consist of only five brain rhythms and their length is either 150 (i.e., 30 s) or 300 (i.e., 60 s), so they can be viewed as short strings. Even though some of the investigated methods are distance-based and some are shape-based, they may not produce markedly different similarity levels between such strings. Here, DTW provides approximately 1–2% higher accuracy than the others. Therefore, DTW is recommended as the similarity measure method to extract the asymmetric features in this work.
Furthermore, the length of the brain rhythm sequence is the same as the length of EEG, so different lengths are evaluated to investigate the time effect in emotion recognition. Here, close results are obtained when employing 30 s and 60 s data on the respective classifiers, disclosing that the 30-s period is sufficient to realize a similar performance as 60 s. As a result, the time applied for emotion recognition can be further reduced from 60 to 30 s, which also removes the redundant data at the time scale. More importantly, the L30 s data exhibit slightly better results than the F30 s data. This may be due to the later periods containing more emotion-related information than the earlier periods. Similar findings have been reported previously. Kumar et al. [42] compared the classification accuracies on DEAP by F30 s and L30 s data, respectively. The results revealed that the L30 s period is more associated with emotion. In another work, Jatupaiboon et al. [43] assessed the accuracies of arousal and valence through the F30 s, L30 s, and A60 s data, respectively. They claimed that the L30 s data yield the best average accuracy. Thus, the aforementioned works also demonstrated that the results from the proposed methodology are reasonable.
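The three periods compared above correspond to simple slices of a 60 s rhythm sequence. A minimal sketch, assuming the 0.2 s time bin stated earlier. The function name and the toy sequence are hypothetical.

```python
def split_periods(sequence, bin_width=0.2, part_seconds=30):
    """Return the first-30-s, last-30-s, and full-length views of a sequence."""
    n_part = round(part_seconds / bin_width)  # 150 symbols for 30 s
    return {"F30": sequence[:n_part],
            "L30": sequence[-n_part:],
            "A60": sequence}


# Toy 60 s sequence: alpha-dominant first half, beta-dominant second half.
toy = "A" * 150 + "B" * 150
periods = split_periods(toy)
```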
The above analysis indicates that DTW is appropriate for the similarity measure, k-NN is suitable for the classifier, and the L30 s period is suitable for emotion recognition. With these settings, the classification accuracies using the asymmetric features extracted from various symmetrical channels are evaluated. Figure 8 illustrates the results of subject S3 from DEAP, in which a and b depict the accuracies on set-A and set-V, respectively. The deeper the red, the higher the classification accuracy. In Fig. 8, the accuracies of the asymmetric features vary with the emotional factors, even for the same subject. For example, the asymmetric feature of FC1–FC2 yields a remarkable accuracy (95%) on set-A, but it is not the best (75%) on set-V, while CP1–CP2 is more useful (80%) on set-V. Such findings further imply that the similarity levels of rhythm sequences between FC1 and FC2 and between CP1 and CP2 are sensitive to variations in arousal and valence, respectively. Consequently, the emotion recognition of subject S3 can be directly achieved by the corresponding asymmetric features.
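The per-pair evaluation described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: each symmetrical pair contributes one scalar feature per trial, accuracy is estimated by leave-one-out validation with k = 3, and the pair with the highest accuracy is kept. The validation scheme, the value of k, the function names, and the toy numbers are all hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def knn_loo_accuracy(
    features: Sequence[float], labels: Sequence[int], k: int = 3
) -> float:
    """Leave-one-out accuracy of k-NN on a single scalar feature."""
    correct = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        # Distances from the held-out trial to every other trial
        others = [
            (abs(x - f), lab)
            for j, (f, lab) in enumerate(zip(features, labels))
            if j != i
        ]
        others.sort(key=lambda item: item[0])
        votes = Counter(lab for _, lab in others[:k])
        predicted = votes.most_common(1)[0][0]
        correct += int(predicted == y)
    return correct / len(features)


def best_pair(
    per_pair: Dict[str, Sequence[float]], labels: Sequence[int], k: int = 3
) -> Tuple[str, float]:
    """Return the channel pair whose feature gives the highest accuracy."""
    scores = {pair: knn_loo_accuracy(f, labels, k) for pair, f in per_pair.items()}
    pair = max(scores, key=lambda name: scores[name])
    return pair, scores[pair]


# Hypothetical toy data: six trials, labels 0 (low) and 1 (high)
trial_labels = [0, 0, 0, 1, 1, 1]
toy_features = {
    "FC1-FC2": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # separates the two classes
    "CP1-CP2": [1.0, 5.0, 1.1, 1.2, 5.1, 0.9],  # mixes the two classes
}
optimal_pair, optimal_accuracy = best_pair(toy_features, trial_labels)
```

Running the selection separately for each subject and each test set is what allows a different optimal pair to emerge for arousal and for valence.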
Further investigations were conducted to determine the performances of asymmetric features among different subjects on the same test set. Figure 9 shows the accuracies of the asymmetric features for set-A from four subjects (S3, S5, S21, and S25) on DEAP. As observed, although the asymmetric features are extracted and classified in the same way, their performances vary by subject. For instance, the asymmetric feature of FC5–FC6 is informative only for subject S21 (Fig. 9c), while it contributes little for the others. Such differences imply that emotion recognition exhibits subject-dependent properties, consistent with earlier works [44, 45].
Taking the highest accuracy, the optimal asymmetric features are identified for all subjects of DEAP. To examine where these optimal features are located, the statistical percentages based on five scalp regions are presented in Table 5, in which the first row denotes the region and the remaining rows display the percentages on different test sets. The results reveal that the optimal features are mainly from the symmetrical channels located in the frontal or parietal regions. Regarding brain function, the frontal region regulates cognitive awareness of stimuli, and the parietal region processes perceptual information from audio and vision. Thus, the EEG recordings from such regions reflect the reactions to stimuli. This may be why frontal asymmetry has been commonly used to assess emotions [25, 26]. In addition, Table 5 implies the involvement of other regions in emotion recognition, revealing that an appropriate solution considers the representative symmetrical channels per subject, rather than a fixed feature for all cases. In this regard, the proposed methodology is valid for obtaining the optimal feature for each test set.
Results From the MER Experiment
Using DTW and k-NN, the results from the MER experiment are obtained. Here, the analysis and discussion also focus on the optimal asymmetric features found. To this end, the statistical percentages based on five regions are summarized in Table 6, in which the first row shows the region, and the remaining rows list the percentages of the respective test sets (arousal, valence, liking, familiarity, and understanding).
In Table 6, the MER results of set-A and set-V are similar to the DEAP results displayed in Table 5, as the optimal asymmetric features are also mainly from the frontal or parietal regions. Such consistency suggests that the proposed methodology is able to select the representative symmetrical channels under different experimental conditions. Moreover, regarding the three test sets (set-L, set-F, and set-U) that are not investigated in DEAP, their optimal features are primarily located in the temporal, parietal, and frontal regions, respectively. Usually, the temporal region handles sound information such as music. Therefore, it is reasonable that the data from this region can assess the feeling of liking when listening to the music. As discussed, the parietal region processes perceptual information involving audio. Hence, its data can disclose the effect of familiarity evoked by the music. The frontal region controls conscious thought in response to external stimuli, so its data can help to answer whether the lyrics or musical rhythm is understood by the subjects.
In addition, the statistical percentages in Table 6 exhibit individual characteristics. To further discuss such characteristics, Fig. 10 illustrates the optimal asymmetric features used for recognizing five emotional factors for subjects S9 and S14 of MER, where different colors correspond to the different emotional factors. As observed, the locations of the optimal features are adjacent, suggesting that the emotional reaction is a complex process that requires a group of surrounding channels. Moreover, different factors are typically recognized by particular symmetrical channels, and such properties also vary with the subjects. This may imply that there is no general model of emotion elicitation across different cases. Previously, Lim [46] claimed that emotion is related to the cultures, backgrounds, and experiences of the subjects, so emotion recognition is likely to be subjective, as the results found here suggest.
Performance Comparison
A performance comparison with the existing works is summarized in Table 7, in which the first column lists the work and the remaining columns show the number of channels applied for emotion recognition, methodology, and the classification accuracies on various cases correspondingly. In addition, the best of each case is underlined.
In Table 7, all of these works consider symmetrical spatial features to investigate the DEAP dataset. For example, Wang et al. [19] used the NMI matrix derived from the spectrograms of all pairs of symmetrical channels. Mohammadi et al. [20] applied entropies and PSDs from five pairs of symmetrical channels. Kumar et al. [42] utilized bispectrum analysis of the symmetrical channels FP1–FP2. Islam et al. [47] designed Pearson’s correlation coefficient images from all pairs of symmetrical channels. Xing et al. [48] developed a linear mixing model based on the frequency subband power features from all pairs of symmetrical channels. Ahmed et al. [49] proposed a two-dimensional vector consisting of the asymmetry in different brain regions and termed it AsMap. Cui et al. [50] exploited the regional asymmetric features located on the left and right hemispheres of the brain. From the comparisons, even though the proposed BRS method does not achieve the best accuracies, it attains impressive results through an asymmetric feature extracted from only one pair of symmetrical channels. In addition, deep learning methods such as neural networks achieve superior accuracy. However, their main limitation is that a large training dataset is needed, so all channel data are usually applied; when the dataset is smaller, it is not easy to train a neural network with outstanding performance. In this regard, the proposed methodology is more suitable for processing a smaller dataset because the number of applied channels is comparatively lower. This property reflects the trade-off between classification accuracy and the number of channels. Therefore, the appropriate approach depends on the conditions under which emotion recognition is performed.
Moreover, this work obtains superior results in the MER experiment, while most of the existing works did not include self-designed experiments. This comparison demonstrates that the proposed methodology performs stably on both the public dataset and the experimental data, indicating that it is reliable for different scenarios. In addition, the simulations in this work were run on the following hardware: central processing unit (CPU), Intel Core i5-10505 @ 3.20 GHz; random access memory (RAM), 8 GB; hard disk drive, 1 TB, 7500 revolutions per minute. On this machine, sequencing takes approximately 18 s when the EEG length is 30 s and approximately 52 s when the EEG length is 60 s. After that, for each subject, with the sequences generated from the different channel data, it takes approximately 49 s to extract the asymmetric features using seven similarity measure methods. Finally, classification through k-NN, SVM, and LDA takes approximately 31 s, including the training and testing periods. Therefore, adopting the settings of DTW, k-NN, and L30 s simplifies the whole classification process. Note that there is no strict memory requirement for the proposed methodology, although a larger memory size speeds up the simulation runs. In short, the BRS exhibits advantages in simplifying portable emotion-aware devices such as low-cost EEG headsets, which further provides a solution to recognize human emotions during self-isolation through the use of fewer electrodes or sensors.
Conclusion
In this work, the asymmetric features derived from the similarity measures of brain rhythm sequences have been proposed, which provide a potential solution to design low-cost emotion-aware devices used for self-isolation during the COVID-19 pandemic. The method validation was performed on the EEG recordings from the MER experiment and the public DEAP dataset. The results revealed that one pair of symmetrical channels is sufficient to extract an optimal feature that produces classification accuracies of approximately 80–85%. In addition, the asymmetric features found are beneficial for investigating the response mechanisms of emotion, and further investigation showed that emotion recognition exhibits strongly individual characteristics. Therefore, to achieve an impressive performance, an appropriate approach is to consider the subject-dependent properties, which can be obtained by the proposed methodology. Finally, compared to the existing works that consider symmetrical spatial features, this method contributes insights to guide emotion recognition with fewer resources. In the future, to realize emotion recognition and regulation, a hardware design embedded with the BRS will be developed.
Appendix
The source codes with an example have been uploaded to the IEEE DataPort (https://doi.org/10.21227/dzsq-b842).
References
Wen G, Li H, Huang J, Li D, Xun E. Random deep belief networks for recognizing emotions from speech signals. Comput Intell Neurosci. 2017;2017:1945630.
Albu F, Hagiescu D, Vladutu L, Puica MA. Neural network approaches for children’s emotion recognition in intelligent learning applications. In: Int Conf Educ New Learn Technol (EDULEARN). 2015;3229–3239.
Lenzoni S, Bozzoni V, Burgio F, et al. Recognition of emotions conveyed by facial expression and body postures in myotonic dystrophy (DM). Cortex. 2020;127:58–66.
Faezipour M, Faezipour M. Efficacy of smart EEG monitoring amidst the COVID-19 pandemic. Electronics. 2021;10(9):1001.
Rahman MM, Chowdhury MA, Fattah SA. An efficient scheme for mental task classification utilizing reflection coefficients obtained from autocorrelation function of EEG signal. Brain Inform. 2018;5(1):1–12.
Yuvaraj R, Murugappan M, Ibrahim NM, et al. Emotion classification in Parkinson’s disease by higher-order spectra and power spectrum features using EEG signals: a comparative study. J Integr Neurosci. 2014;13(1):89–120.
Padfield N, Zabalza J, Zhao HM, Masero V, Ren JC. EEG-based brain-computer interfaces using motor-imagery: techniques and challenges. Sensors. 2019;19(6):1423.
Li S, Lyu X, Zhao L, Chen Z, Gong A, Fu Y. Identification of emotion using electroencephalogram by tunable Q-factor wavelet transform and binary gray wolf optimization. Front Comput Neurosci. 2021;15:732763.
Zhuang N, Zeng Y, Tong L, Zhang C, Zhang H, Yan B. Emotion recognition from EEG signals using multidimensional information in EMD domain. Biomed Res Int. 2017;2017:8317357.
Patel P, Raghunandan R, Annavarapu RN. EEG-based human emotion recognition using entropy as a feature extraction measure. Brain Inform. 2021;8(1):20.
Zheng WL, Lu BL. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans Auton Ment Dev. 2015;7:162–75.
Al-Nafjan A, Hosny M, Al-Ohali Y, Al-Wabil A. Review and classification of emotion recognition based on EEG brain-computer interface system research: a systematic review. Appl Sci. 2017;7(12):1239.
Niknazar M, Mousavi SR, Vahdat BV, et al. A new framework based on recurrence quantification analysis for epileptic seizure detection. IEEE J Biomed Health Inf. 2013;17(3):572–8.
Choi SJ, Kang BG. Prototype design and implementation of an automatic control system based on a BCI. Wireless Pers Commun. 2014;79:2551–63.
Onton J, Makeig S. High-frequency broadband modulations of electroencephalographic spectra. Front Hum Neurosci. 2009;3:61.
Koelstra S, Muhl C, Soleymani M, et al. DEAP: a database for emotion analysis using physiological signals. IEEE Trans Affect Comput. 2012;3:18–31.
Park SK, Choi H, Lee KJ, Lee JY, An KO, Kim EJ. Emotion recognition based on the asymmetric left and right activation. Int J Med Med Sci. 2011;3(6):201–9.
Menezes MLR, Samara A, Galway L, et al. Towards emotion recognition for virtual environments: an evaluation of EEG features on benchmark dataset. Pers Ubiquit Comput. 2017;21:1003–13.
Wang ZM, Hu SY, Song H. Channel selection method for EEG emotion recognition using normalized mutual information. IEEE Access. 2019;7:143303–11.
Mohammadi Z, Frounchi J, Amiri M. Wavelet-based emotion recognition system using EEG signal. Neural Comput & Applic. 2017;28:1985–90.
Zheng W. Multichannel EEG-based emotion recognition via group sparse canonical correlation analysis. IEEE Trans Cogn Dev Syst. 2016;9(3):281–90.
Kuksa PP. Biological sequence classification with multivariate string kernels. IEEE/ACM Trans Comput Biol Bioinform. 2013;10(5):1201–10.
Li JW, Barma S, Mak PU, Pun SH, Vai MI. Brain rhythm sequencing using EEG signal: a case study on seizure detection. IEEE Access. 2019;7:160112–24.
Chen J, Guo M, Wang X, Liu B. A comprehensive review and comparison of different computational methods for protein remote homology detection. Brief Bioinform. 2018;19(2):231–44.
Reznik SJ, Allen JJB. Frontal asymmetry as a mediator and moderator of emotion: an updated review. Psychophysiology. 2018;55(1):12965.
Ren F, Dong Y, Wang W. Emotion recognition based on physiological signals using brain asymmetry index and echo state network. Neural Comput & Applic. 2019;31:4491–501.
Zhang K, Zhang H, Li S, Yang C, Sun L. The PMEmo dataset for music emotion recognition. In: ACM Int Conf Multimed Retriev (ICMR). 2018;135–142.
Hu X, Yu J, Song M, et al. EEG correlates of ten positive emotions. Front Hum Neurosci. 2017;11:26.
Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 2004;134(1):9–21.
Peng CJ, Chen YC, Chen CC, Chen SJ, Cagneau B, Chassagne L. An EEG-Based attentiveness recognition system using Hilbert-Huang transform and support vector machine. J Med Biol Eng. 2020;40:230–8.
Mert A, Akan A. Emotion recognition based on time-frequency distribution of EEG signals using multivariate synchrosqueezing transform. Digit Signal Process. 2018;81:106–15.
Barma S, Chen BW, Ji W, Rho S, Chou CH, Wang JF. Detection of the third heart sound based on nonlinear signal decomposition and time-frequency localization. IEEE Trans Biomed Eng. 2016;63(8):1718–27.
Sharma RR, Kalyani A, Pachori RB. An empirical wavelet transform-based approach for cross-terms-free Wigner-Ville distribution. Signal Image Video Process. 2020;14:249–56.
Djebbari A, Bereksi-Reguig F. Detection of the valvular split within the second heart sound using the reassigned smoothed pseudo Wigner-Ville distribution. Biomed Eng Online. 2013;12:37.
Chandra AM, Ghosh S, Barman S, Iqbal R, Sadhu N. Effect of exercise and heat-load on simple reaction time of university students. Int J Occup Saf Ergon. 2010;16(4):497–505.
Rey HG, Fried I, Quiroga RQ. Timing of single-neuron and local field potential responses in the human medial temporal lobe. Curr Biol. 2014;24(3):299–304.
Korik A, Siddique N, Sosnik R, Coyle D. Correlation of EEG band power and hand motion trajectory. In: Int Brain-Comp Inter Conf (IBCIC). 2014;95.
Azevedo CR, Boos CF, de Azevedo FM. Classification of epileptiform events in EEG signals using neural classifier based on SOM. In: Int Conf Electric Eng Inform Commun Technol (ICEEICT). 2015;1–5.
Kim MK, Kim M, Oh E, Kim SP. A review on the computational methods for emotional state estimation from the human EEG. Comput Math Methods Med. 2013;573734.
Tyng CM, Amin HU, Saad MNM, Malik AS. The influences of emotion on learning and memory. Front Psychol. 2017;8:1454.
Jie X, Cao R, Li L. Emotion recognition based on the sample entropy of EEG. Biomed Mater Eng. 2014;24(1):1185–92.
Kumar N, Khaund K, Hazarika SM. Bispectral analysis of EEG for emotion recognition. Procedia Comput Sci. 2016;84:31–35.
Jatupaiboon N, Pan-Ngum S, Israsena P. Real-time EEG-based happiness detection system. Sci World J. 2013;618649.
Jatupaiboon N, Pan-Ngum S, Israsena P. Subject-dependent and subject-independent emotion classification using unimodal and multimodal physiological signals. J Med Imaging Health Inform. 2015;5:1020–7.
Val-Calvo M, Álvarez-Sánchez JR, Ferrández-Vicente JM, Fernández E. Optimization of real-time EEG artifact removal and emotion estimation for human-robot interaction applications. Front Comput Neurosci. 2019;13:80.
Lim N. Cultural differences in emotion: differences in emotional arousal level between the East and the West. Integr Med Res. 2016;5(2):105–9.
Islam MR, Islam MM, Rahman MM, et al. EEG channel correlation based model for emotion recognition. Comput Biol Med. 2021;136:104757.
Xing X, Li Z, Xu T, et al. SAE+LSTM: a new framework for emotion recognition from multi-channel EEG. Front Neurorobot. 2019;13:37.
Ahmed MZI, Sinha N, Phadikar S, et al. Automated feature extraction on AsMap for emotion classification using EEG. Sensors. 2022;22(6):2346.
Cui H, Liu A, Zhang X, et al. EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network. Knowl Based Syst. 2020;205:106243.
Acknowledgements
The authors would like to thank the Guangdong Provincial Key Laboratory of Intellectual Property & Big Data, the Digital Content Processing and Security Technology of Guangzhou Key Laboratory, and the Guangxi Key Lab of Multi-source Information Mining & Security for their support.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62072122, in part by the Scientific and Technological Planning Projects of Guangdong Province under Grant 2021A0505030074, in part by the Project for Distinctive Innovation of Ordinary Universities of Guangdong Province under Grant 2018KTSCX120, in part by the Guangdong Colleges and Universities Young Innovative Talents Projects under Grant 2018KQNCX138, in part by the Special Projects in Key Fields of Ordinary Universities of Guangdong Province under Grant 2021ZDZX1087, and in part by the Guangzhou Science and Technology Plan Project under Grant 202102020857, in part by the Research Fund of Guangdong Polytechnic Normal University under Grant 2022SDKYA015, and in part by the Research Fund of Guangxi Key Lab of Multi-source Information Mining & Security under Grant MIMS22-02.
Author information
Contributions
Conceptualization: Jia Wen Li and Shovan Barma; methodology: Jia Wen Li and Shovan Barma; formal analysis and investigation: Jia Wen Li, Rong Jun Chen, Shovan Barma, Fei Chen, Sio Hang Pun, and Peng Un Mak. Funding acquisition: Jia Wen Li, Rong Jun Chen, Lei Jun Wang, Xian Xian Zeng, and Hui Min Zhao; resources: Fei Chen, Sio Hang Pun, Peng Un Mak, and Hui Min Zhao. Writing—original draft preparation: Jia Wen Li and Rong Jun Chen; writing—review and editing: Jia Wen Li, Rong Jun Chen, Fei Chen, and Jin Chang Ren; supervision: Rong Jun Chen, Shovan Barma, Fei Chen, Sio Hang Pun, Peng Un Mak, Jin Chang Ren, and Hui Min Zhao. All authors have read and agreed to the final manuscript.
Ethics declarations
Ethical Approval
All procedures performed in studies involving human participants were conducted in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Consent to Participate
Informed consent was obtained from all individual participants included in this study.
Conflict of Interest
The authors declare no competing interests.
About this article
Cite this article
Li, J.W., Chen, R.J., Barma, S. et al. An Approach to Emotion Recognition Using Brain Rhythm Sequencing and Asymmetric Features. Cogn Comput 14, 2260–2273 (2022). https://doi.org/10.1007/s12559-022-10053-z
DOI: https://doi.org/10.1007/s12559-022-10053-z