Introduction

Coronavirus disease 2019 (COVID-19) has caused a severe health crisis worldwide, and many governments have imposed strict measures, such as self-isolation, to prevent infection. During isolation periods, emotions can be strongly affected, so emotional regulation is important for avoiding severe mood swings. To achieve this goal, efficiently recognizing the emotional state is a vital step. Previously, emotion recognition has been realized through various modalities, including speech [1], facial expression [2], and body posture [3]. Nonetheless, the emotions indicated by these approaches are subjective and easily disguised, especially when subjects are unwilling to have them recognized. In addition, a camera and microphone are required to constantly record the response data, which is unrealistic for self-isolation because it violates personal privacy. In this regard, the electroencephalography (EEG) signal is more suitable, as it has been extensively employed to assess the electrical activities of the brain, the control center of emotion. Meanwhile, abnormalities in EEG can also aid in the diagnosis of COVID-19 [4]. Therefore, EEG-based emotion recognition is a potential method for achieving emotional regulation during self-isolation due to COVID-19.

To achieve emotion recognition, it is necessary to extract trustworthy features from the EEG. Typical EEG features can be fundamentally categorized into time-domain, frequency-domain, time–frequency-domain, and other features. Time-domain features apply statistical measurements to characterize the EEG, such as the mean, standard deviation, kurtosis, skewness, first difference, and second difference [5]. Frequency-domain features focus on the spectral properties of the EEG, such as the powers of frequency subbands and higher-order spectra (HOS) [6]. Time–frequency-domain features mainly come from time–frequency analysis (TFA), which relates frequency information to the time domain; thus, TFA can provide features that capture dynamic variations in both the time and frequency directions [7]. For example, the discrete wavelet transform (DWT) decomposes the EEG into several components that correspond to various frequency subbands while conserving time-related information [8]. Similarly, the intrinsic mode functions (IMFs) acquired from empirical mode decomposition (EMD) can serve as features indicating the amplitudes, frequencies, and phases of the EEG [9]. Finally, entropy measures (approximate entropy (ApEn), differential entropy (DE), sample entropy (SampEn), etc.), which reveal the irregularities of the EEG [10], and connectivity measures (brain symmetry index (BSI), rational asymmetry (RASM), differential asymmetry (DASM), etc.), which characterize the hemispherical asymmetry of the brain [11], are also valuable features in this field.
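
To make the time-domain measures concrete, the following Python sketch (function and variable names are ours, not from the paper) computes them for a single-channel EEG segment:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def time_domain_features(eeg):
    """Basic time-domain statistics of a single-channel EEG segment."""
    d1 = np.diff(eeg)        # first difference
    d2 = np.diff(eeg, n=2)   # second difference
    return {
        'mean': eeg.mean(),
        'std': eeg.std(),
        'kurtosis': kurtosis(eeg),
        'skewness': skew(eeg),
        'mean_first_diff': np.abs(d1).mean(),
        'mean_second_diff': np.abs(d2).mean(),
    }

# Example on synthetic data (200 Hz, 30 s)
print(time_domain_features(np.random.randn(200 * 30)))
```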

An earlier work [12] noted that signal powers within the EEG spectra are widely used in emotion recognition. In addition, Niknazar et al. [13] claimed that the frequency subbands contain more details regarding the constituent neuronal activities underlying the EEG; therefore, characteristics that are not evident in the full spectrum can be amplified when each subband is considered separately. Such subbands are termed brain rhythms: δ (0–4 Hz), θ (4–8 Hz), α (8–13 Hz), β (13–30 Hz), and γ (30–50 Hz) [14]. Furthermore, existing works have concluded that variations in these rhythms help assess emotion. For instance, γ power is sensitive to sadness and happiness [15], θ power exhibits a negative correlation with arousal [16], and in the T3–T4 channels, a hemispherical asymmetry exists in β power when the emotion is fear, while a hemispherical asymmetry of α power appears when the emotion is sadness [17].
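
The subband powers can be estimated in a few lines; the sketch below uses Welch's PSD as a common stand-in (the paper itself extracts rhythmic powers via RSPWVD, introduced later):

```python
import numpy as np
from scipy.signal import welch

# Brain rhythm boundaries from the text (Hz)
BANDS = {'delta': (0, 4), 'theta': (4, 8), 'alpha': (8, 13),
         'beta': (13, 30), 'gamma': (30, 50)}

def band_powers(eeg, fs):
    """Average PSD of each brain rhythm within its frequency band."""
    f, psd = welch(eeg, fs=fs, nperseg=min(len(eeg), int(2 * fs)))
    return {name: psd[(f >= lo) & (f < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

# Example: alpha-dominant synthetic signal at 200 Hz
fs, t = 200, np.arange(0, 30, 1 / 200)
sig = np.sin(2 * np.pi * 10 * t) + 0.3 * np.random.randn(len(t))
print(band_powers(sig, fs))
```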

Generally, rhythmic features are extracted from multichannel data. Such a large data size incurs a heavy computational burden for feature extraction and increases the hardware complexity of emotion recognition. Hence, selecting the optimal features from several representative channels is an efficient solution. This consideration is vital when designing a portable emotion-aware device for self-isolation, because fewer sensors or electrodes placed on the scalp support a more convenient way to measure EEG. Therefore, channel selection is needed, and several works have pursued this goal. Zheng and Lu [11] employed deep belief networks (DBNs) to recognize three types of emotions (positive, neutral, and negative) and explored representative channel subsets that perform comparably to, or better than, the full-channel data. The power spectral densities (PSDs) of five brain rhythms, RASM, DASM, and the differences between the DE of 23 pairs of channels were employed as features. The results indicated that the classification accuracy was 82.88% using 4 channels (T7, T8, FT7, and FT8), 86.65% using 12 channels (C5, C6, CP5, CP6, T7, T8, FT7, FT8, P7, P8, TP7, and TP8), and 86.08% using all 62 channels. Menezes et al. [18] extracted the PSDs of five brain rhythms from 4 channels (FP1, FP2, F3, and F4) for emotion recognition; with a support vector machine (SVM), the classification accuracies reached 71.7% for arousal and 73.8% for valence, and the features of δ and θ produced better results than the others. Wang et al. [19] applied normalized mutual information (NMI) to emotion recognition: short-time Fourier transform (STFT) was first used to obtain EEG spectrograms, all spectrograms were then utilized to calculate the NMI connection matrix, and emotion recognition was finally accomplished by thresholding with connection matrix analysis. This approach achieved classification accuracies of 74.41% for valence using 8-channel data and 73.64% for arousal using 10-channel data. Mohammadi et al. [20] performed DWT to investigate the minimum number of channels and the optimal rhythmic features for emotion recognition, applying the entropies and PSDs of five brain rhythms as features. The results revealed that five pairs of symmetrical channels (FP1–FP2, F3–F4, F7–F8, FC1–FC2, and FC5–FC6) achieve classification accuracies of 84.05% for arousal and 86.75% for valence. Zheng [21] developed group sparse canonical correlation analysis (GSCCA) for emotion recognition and utilized the logarithmic frequency subband powers of five brain rhythms as features to train the classification model. The results demonstrated that the higher frequency subbands (such as β and γ) are more appropriate for emotion recognition; in addition, the accuracies using 4, 12, and 20 channels were 80.20%, 83.72%, and 82.45%, respectively.

The above works mainly used the PSDs, DE, etc., of five brain rhythms for emotion recognition, with channel selection implemented according to the resulting classification accuracies. Nevertheless, the chronological variations in brain rhythms have not yet been considered. This idea is inspired by sequencing in bioinformatics, where the characteristics of different species are represented as biological sequences that can be used for data mining, analysis, and classification [22]. If the brain rhythms are likewise interpreted in a sequential format, the time–frequency characteristics of EEG can be expressed simultaneously, and such time-series data are also suitable for classification. To this end, a method termed brain rhythm sequencing (BRS), which analyzes the EEG as a sequence of dominant rhythms, was proposed for seizure detection in previous work [23]. Considering that similarity is a fundamental analysis derived from homology theory [24] and that asymmetry can be quantified by measuring the similarity between pairwise sequences, in this work, similarity measures are applied to the brain rhythm sequences generated from symmetrical channels (e.g., FP1–FP2, F3–F4, and F7–F8). The asymmetric feature obtained in this way reflects the neuronal synchrony of the left and right hemispheres and can be used for emotion recognition. This provides a novel way to study brain asymmetry, a vital aspect of cognitive functions including emotion [25]; most existing works analyze asymmetry through frontal alpha asymmetry or the brain asymmetry index [26]. In addition, asymmetric features can be extracted from all pairs of symmetrical channels, and after evaluation, the one producing the best accuracy is identified. Therefore, high classification accuracy can be accomplished with an optimal feature extracted from only one pair of symmetrical channels. Such results also contribute insights for exploring the individual characteristics of emotion recognition. In short, the novelties of this work are as follows:

  • The BRS concentrates on the chronological variations of brain rhythms, and with the help of similarity measure methods, asymmetric features can be extracted and applied for emotion recognition.

  • The representative symmetrical channels of emotion recognition are studied by considering the optimal asymmetric features found, so the portable emotion-aware device can be further simplified with fewer channels of data.

  • The emotional EEG recordings are acquired from a music emotion recognition (MER) experiment and public DEAP dataset [16], so the proposed method can be extensively evaluated, providing insights for exploring individual characteristics in different scenarios.

For illustration, Fig. 1 shows the system workflow of this work. First, the EEG recordings are acquired from the MER experiment and the public DEAP dataset, and the brain rhythm sequences of the different channels are generated using the reassigned smoothed pseudo Wigner-Ville distribution (RSPWVD) method. Second, the generated sequences are paired based on symmetrical channels located on the left and right hemispheres, so a number of asymmetric features can be extracted from the various pairs of symmetrical sequences through similarity measure methods. Third, k-nearest neighbors (k-NN), support vector machine (SVM), and linear discriminant analysis (LDA) are applied to train and test the extracted features based on leave-one-trial-out (LOTO) cross-validation, and the classification accuracies of all asymmetric features are evaluated. Finally, the optimal feature and its related channels are identified from the highest classification accuracies. These results are also utilized to investigate individual characteristics. Meanwhile, a comparative study with existing works that exploit symmetrical spatial features is conducted.

Fig. 1 System workflow of this work

The rest of this paper is organized as follows: the “Experimental data” section describes the EEG data acquired from the self-designed MER experiment and the public DEAP dataset. The “Proposed methodology” section introduces the details of the BRS and its classification method using asymmetric features. The “Results and discussion” section presents the results for the respective scenarios, with discussion and performance comparisons. The “Conclusion” section summarizes the work and outlines future work.

Experimental Data

Data acquisition is the first stage in an EEG-based study. In this work, the EEG recordings from two scenarios are included. One is from the self-designed MER experiment, and the other is from the public DEAP dataset, as detailed below.

Self-Designed MER Experiment

The MER experiment, which evokes emotion through music clips, was conducted in the laboratory of the Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China. The experimental procedures involving human subjects were performed in accordance with the ethical standards of the institutional research committee. A Neuroscan 64-channel system (62 scalp channels, 2 periocular channels) was applied to record the EEG. Thirty-six healthy subjects (13 women, 23 men, 22.22 ± 3.13 years) were recruited. Musical clips (each 30 s) from the PMEmo dataset [27] were employed to evoke different emotional reactions in the subjects. PMEmo contains 794 songs (almost all in English) selected from three popular music charts, with annotations of arousal and valence (normalized into 0–1) rated by 457 subjects. To achieve balanced elicitation, 40 stimuli were chosen based on the emotional annotations, with 10 stimuli for each category: HAHV (high arousal high valence), HALV (high arousal low valence), LAHV (low arousal high valence), and LALV (low arousal low valence); the threshold between high and low was 0.5. These 40 stimuli were divided into two sessions for presentation, with a break (5 min) between the sessions. Each trial included three phases: rest (10 s), music listening (30 s), and self-assessment (20 s). Thus, the duration of one trial was 1 min, and the entire experiment required 45 min per subject, as depicted in Fig. 2.

Fig. 2 Paradigm of the MER experiment

In the beginning, the subjects were informed about the procedure, and they signed the consent form after their questions and doubts were fully answered. After that, a questionnaire covering age, gender, body condition, and habits was collected. To protect personal privacy, the subjects were denoted S1, S2, S3, and so on. During the experiment, the subjects performed self-assessments after listening to each musical clip, rating five factors from 1 to 9: arousal, valence, liking, familiarity, and understanding. Each trial was labeled high or low using a threshold of 5 [20]. Then, five test sets were obtained: set-A (arousal), set-V (valence), set-L (liking), set-F (familiarity), and set-U (understanding).

Figure 3 illustrates the system overview of the MER experiment. The subjects sat on a sofa and listened to the musical clips through earphones. A stimuli computer scheduled and presented the musical clips, while a data acquisition computer recorded the EEG data; after synchronizing the two computers, the recordings contained the full data of all trials. In addition, an amplifier connected the data acquisition computer and the EEG cap. For preprocessing, the raw EEG recordings were downsampled to 200 Hz, and a bandpass filter with cut-off frequencies of 0.01–50 Hz was applied for data filtering [28]. Subsequently, EEG artefacts (e.g., eye movements, muscle activities) were removed by independent component analysis (ICA) through the EEGLAB toolbox [29]. Finally, the preprocessed data were used for method validation. A sketch of an equivalent pipeline is given below.
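
For reference, a roughly equivalent preprocessing pipeline can be sketched with the open-source MNE-Python package (the authors used EEGLAB; the file name and ICA settings below are placeholders):

```python
import mne

# Load a Neuroscan recording (file name is a placeholder)
raw = mne.io.read_raw_cnt('subject_S1.cnt', preload=True)
raw.resample(200)                       # downsample to 200 Hz
raw.filter(l_freq=0.01, h_freq=50.0)    # 0.01-50 Hz bandpass

# Remove ocular/muscular artefacts with ICA
ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)
# ica.exclude would be set after visually inspecting the components
raw = ica.apply(raw)
```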

Fig. 3 System overview of the MER experiment

Public DEAP Dataset

DEAP [16] is one of the most famous public datasets and has been extensively evaluated in emotion recognition. In this dataset, a 32-channel system was employed to record the EEG data from 32 healthy subjects (15 women, 17 men, 27.19 ± 4.45 years). Regarding the stimuli, 40 musical videos (each 60 s) were utilized to evoke the emotions. Thus, the emotional EEG size of each subject was 32 channels × 40 trials × 60 s. After watching each video, the subjects performed self-assessments (from 1 to 9) based on two factors: arousal and valence. Hence, the trials were also labeled and divided into two test sets, where set-A is high arousal (HA), A ≥ 5, and low arousal (LA), A < 5; set-V is high valence (HV), V ≥ 5, and low valence (LV), V < 5. Moreover, DEAP provides the preprocessed data, in which the raw recordings were downsampled to 128 Hz and filtered through a bandpass filter with a cut-off frequency of 0.01–100 Hz. Therefore, the preprocessed data were applied for method validation.
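
For readers working with DEAP, a minimal loading and labeling sketch follows. The preprocessed Python files store a dict with 'data' of shape 40 trials × 40 channels × 8064 samples and 'labels' of shape 40 × 4 (ratings in the order valence, arousal, dominance, liking); the file name is a placeholder:

```python
import pickle
import numpy as np

with open('s01.dat', 'rb') as f:
    subj = pickle.load(f, encoding='latin1')

eeg = subj['data'][:, :32, :]                  # keep the 32 EEG channels
valence = subj['labels'][:, 0]
arousal = subj['labels'][:, 1]

set_V = (valence >= 5).astype(int)             # 1 = HV, 0 = LV
set_A = (arousal >= 5).astype(int)             # 1 = HA, 0 = LA
```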

Proposed Methodology

To achieve emotion recognition using the BRS, signal processing by the RSPWVD is conducted first to extract the rhythmic powers of the EEG. Then, from the resulting time–frequency plane, the instantaneous powers of the five brain rhythms in each time bin are estimated, so the dominant rhythms can be determined and used to generate the sequence data. Next, the asymmetric features are extracted by applying similarity measures to the rhythm sequences from symmetrical channels. Finally, the classification task is accomplished with the asymmetric features after training and testing based on LOTO cross-validation. These operations are detailed in the following subsections.

Signal Processing by TFA

The sequence data need the dominant rhythms along the time scale of EEG, so it is necessary to find a particular brain rhythm in each time bin. This objective can easily be realized by considering the instantaneous power distributions in the time–frequency plane. To this end, TFA is used to extract the signal power information first. Previously, several TFA techniques have been employed for emotion recognition, such as STFT [19], DWT [20], Hilbert-Huang transform (HHT) [30], and Wigner-Ville distribution (WVD) [31]. After comparisons, WVD was chosen because it is good at tracking the sudden variations of the signal in the time domain and at preserving both the time and frequency shift information [32]. Consequently, the instantaneous power distributions in the time–frequency plane can be acquired by WVD (1):

$$W_{x} (t,\omega ) = \int\limits_{ - \infty }^{ + \infty } {x(t + \frac{\tau }{2})} x^{ * } (t - \frac{\tau }{2})e^{ - j\omega \tau } d\tau$$
(1)

where x(t) denotes the input signal, t and ω are time and frequency, respectively, and * refers to the complex conjugate.

However, WVD suffers from cross-terms in its resulting plane. Cross-terms create multiple irrelevant regions, which can be regarded as artefacts in the WVD representations. Such artefacts falsely indicate signal components and thereby interfere with power localization in the plane [33]. To eliminate the cross-terms, the WVD is smoothed over time and frequency with a separable kernel (2):

$$W_{h} (t,\omega ) = g(t)H( - \omega )$$
(2)

where h(t) and g(t) are the smoothing windows applied along the frequency and time directions, respectively, to eliminate the cross-terms, and H(ω) denotes the Fourier transform of h(t).

In this work, the smoothing windows are Hamming windows, and the time and frequency smoothing are controlled independently of each other. This variant is the smoothed pseudo WVD (SPWVD) (3):

$$SPW_{x} (t,\omega ) = \int\limits_{ - \infty }^{ + \infty } {h(\tau )\int\limits_{ - \infty }^{ + \infty } g (s - t)x(s + \frac{\tau }{2})} x^{ * } (s - \frac{\tau }{2})e^{ - j\omega \tau } dsd\tau$$
(3)

Furthermore, reassignment enhances the readability of the TFA [34], so it is applied to the SPWVD. Its principle is to rearrange the coefficients of the time–frequency distribution around new zones to produce a high-resolution result, which can be viewed as a complement for locating the true region of the analyzed signal. Specifically, each value of the SPWVD at a point (t, ω) is relocated to the point (\(\hat{t}\), \(\hat{\omega }\)), the center of gravity of the power distribution around (t, ω). Hence, the reassigned value of the SPWVD at any point (t′, ω′) is the sum of all the values reassigned to that point (4):

$$SPW_{x}^{(r)} (t^{\prime},\omega^{\prime};g,h) = \int\limits_{ - \infty }^{ + \infty } {\int\limits_{ - \infty }^{ + \infty } {SPW_{x} (t,\omega ;g,h)\,\delta (t^{\prime} - \hat{t}(x;t,\omega ))\,\delta (\omega^{\prime} - \hat{\omega }(x;t,\omega ))\,dt\,d\omega } }$$
(4)

where:

$$\hat{t}(x;t,\omega ) = t - \frac{{SPW_{x} (t,\omega ;\tau_{g} ,h)}}{{2\pi SPW_{x} (t,\omega ;g,h)}}$$
(5)
$$\hat{\omega }(x;t,\omega ) = \omega + j\frac{{SPW_{x} (t,\omega ;g,D_{h} )}}{{2\pi SPW_{x} (t,\omega ;g,h)}}$$
(6)

A result using RSPWVD to process an EEG signal (F8 channel, subject S2, DEAP) is presented on the left side of Fig. 4, in which the horizontal and vertical axes are time and frequency, respectively, and the color bar indicates the variations in the signal powers.

Fig. 4 Power distributions of five brain rhythms in each 0.2 s time bin on a time–frequency plane from the RSPWVD method. The EEG signal is from the F8 channel of subject S2, DEAP. The left side depicts a result using RSPWVD, and the right side illustrates the estimations of the instantaneous rhythmic powers in each 0.2 s time bin of the signal

Generation of Rhythm Sequence

In the resulting time–frequency plane from RSPWVD, the frequency axis is divided into five parts based on five rhythms. Meanwhile, the time axis is separated into various time bins (t1, t2, t3 …), which can be referenced by the average reaction time of neurons from the existing works. Previously, Chandra et al. [35] claimed that the average reaction time of neurons is approximately 0.14–0.2 s. Rey et al. [36] employed the TFA method to analyze the EEG and found that the average evoked power occurs at approximately 0.2 s. In addition, in two EEG-based studies, Korik et al. [37] and Azevedo et al. [38] exploited the 0.2 s time bin EEG data to accomplish the decoding of hand motion trajectories and seizure detection, respectively. Based on these findings, the time bin of BRS is 0.2 s.

Next, the dominant rhythm in each time bin is determined from the instantaneous power, which has been demonstrated to be key to emotion recognition [39]. To this end, the five rhythmic powers in the same time bin are calculated. For instance, on the right side of Fig. 4, the α power at t3 is illustrated; it is estimated as the average of all powers located inside the band boundary. In this way, all five rhythmic powers in each 0.2 s bin are obtained, and the dominant rhythm having the maximum instantaneous power in each time bin can be identified, forming the rhythm sequence accordingly. A sample is depicted in Fig. 5, where an EEG signal from the PO3 channel of subject S2 in DEAP is displayed at the bottom, and its generated sequence (20–25 s) is shown at the top.
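
A minimal sketch of this sequencing step, building on the pseudo-WVD sketch above (names are illustrative), is as follows:

```python
import numpy as np

BANDS = {'delta': (0, 4), 'theta': (4, 8), 'alpha': (8, 13),
         'beta': (13, 30), 'gamma': (30, 50)}

def rhythm_sequence(tfr, times, freqs, bin_s=0.2):
    """Dominant-rhythm sequence: average each band's power inside every
    0.2 s time bin and keep the band with the maximum power."""
    names = list(BANDS)
    rows = [(freqs >= lo) & (freqs < hi) for lo, hi in BANDS.values()]
    seq = []
    for start in np.arange(times[0], times[-1], bin_s):
        cols = (times >= start) & (times < start + bin_s)
        powers = [tfr[r][:, cols].mean() for r in rows]
        seq.append(names[int(np.argmax(powers))])
    return seq

# e.g., tfr, times, freqs = pseudo_wvd(eeg, fs=128)   # sketch above
# sequence = rhythm_sequence(tfr, times, freqs)
```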

Fig. 5 EEG signal analysis using BRS, where the sequence is generated by the dominant rhythm having the maximum instantaneous power in each 0.2 s time bin. The EEG signal is from the PO3 channel of subject S2, DEAP

Asymmetric Feature Extraction

As stated, emotion is an inner reaction controlled by the brain, which is a complex network system organized into different functional areas. Typically, functional differences appear between the left and right hemispheres on particular tasks, such as motor control, perception, memory, and emotion [40]. Such differences are ubiquitous across brain information processing. Consequently, the asymmetric features are valuable to assess emotion, and they have been considered in this work.

As seen, the proposed sequence discloses the chronological variations of the dominant rhythms in a specific channel during the emotional process. Hence, the neuronal synchrony of the left and right hemispheres can be indicated by measuring the similarity between the sequences from symmetrical channels; these similarity values are denoted as the asymmetric features. To this end, it is necessary to pair the sequences based on the scalp locations, as summarized in Table 1, in which the first column displays the scalp regions and the remaining columns list the details of the symmetrical channels from DEAP and MER.

Table 1 The symmetrical channels based on the five scalp regions from DEAP and MER

In Table 1, the scalp is divided into five regions: frontal, central, parietal, temporal, and occipital, so the total number of symmetrical channel pairs is 14 for DEAP and 27 for MER. Next, a more important step is to appropriately measure the similarity levels between the pairwise rhythm sequences so that the asymmetric features can be extracted correspondingly. To this end, seven typical similarity measure methods are considered: the Jaccard index (JAC), Hamming distance (HAM), Levenshtein distance (LEV), dynamic time warping (DTW), mutual information (MUT), local sequence alignment (LSA), and global sequence alignment (GSA).

JAC is a statistical approach that measures the percentage of overlap between pairwise sequences. HAM and LEV are distance-based methods: HAM counts the number of positions at which the pairwise sequences differ, and LEV finds the minimum number of edits (insertion, deletion, or substitution) required to transform one sequence into the other. DTW applies a time-warping function that transforms, or warps, the elements to align the pairwise sequences, thereby generating an optimal alignment between them. MUT evaluates interdependence, derived from the concept of entropy in information theory; it estimates the information shared by pairwise sequences to reveal their similarity levels. LSA and GSA are widely used in bioinformatics, as they are good at identifying similar regions between pairwise sequences: LSA uses the Smith-Waterman algorithm, which aligns a portion of the sequences, while GSA uses the Needleman-Wunsch algorithm, which aims for an end-to-end alignment.
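
As an illustration of two of these measures, the sketch below computes HAM and a symbolic DTW on toy rhythm sequences (the 0/1 local cost for DTW is our assumption; the paper does not specify its cost function):

```python
import numpy as np

def hamming_distance(a, b):
    """HAM: number of positions at which equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def dtw_distance(a, b):
    """Classic dynamic-programming DTW with a symbolic 0/1 local cost,
    suitable for rhythm sequences such as ['alpha', 'beta', ...]."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

left  = ['alpha', 'alpha', 'beta', 'theta']   # e.g., FP1 sequence
right = ['alpha', 'beta', 'beta', 'theta']    # e.g., FP2 sequence
print(hamming_distance(left, right), dtw_distance(left, right))
```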

Figure 6 shows the extraction of asymmetric features. First, the brain rhythm sequences are paired based on the symmetrical channels listed in Table 1, such as FP1–FP2, AF3–AF4, P7–P8, and O1–O2. Then, the aforementioned similarity measures are performed on all pairwise sequences. Here, the total number of extracted features per subject was 3920 (14 pairs × 7 measures × 40 trials) and 7560 (27 pairs × 7 measures × 40 trials) on DEAP and MER, respectively. Such asymmetric features can achieve emotion recognition after training and testing by the classifiers, as described in the next stage.
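
Continuing the sketches above, the per-subject feature table could be assembled as follows (the `sequences` mapping and the channel subset are hypothetical placeholders; `dtw_distance` and `hamming_distance` come from the previous sketch):

```python
import numpy as np

PAIRS = [('FP1', 'FP2'), ('AF3', 'AF4'), ('P7', 'P8'), ('O1', 'O2')]  # subset
MEASURES = {'DTW': dtw_distance, 'HAM': hamming_distance}

# Toy stand-in: 40 trials of 150-element (30 s / 0.2 s) rhythm sequences
RHYTHMS = ['delta', 'theta', 'alpha', 'beta', 'gamma']
rng = np.random.default_rng(0)
sequences = {ch: [list(rng.choice(RHYTHMS, size=150)) for _ in range(40)]
             for pair in PAIRS for ch in pair}

# One feature vector (40 values, one per trial) per (pair, measure)
features = {
    (l, r, name): [fn(a, b) for a, b in zip(sequences[l], sequences[r])]
    for l, r in PAIRS
    for name, fn in MEASURES.items()
}
```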

Fig. 6 Extraction of asymmetric features based on similarity measures of the brain rhythm sequences from symmetrical channels located on the left and right hemispheres

Classification Method

After extraction, each asymmetric feature (e.g., FP1–FP2 by DTW) has 40 values per subject, one per trial. Compared with deep learning methods, conventional classifiers are more appropriate here, as they can build a classification model from such a small feature set and have yielded good performance in existing works [20, 41]. Therefore, k-NN, SVM, and LDA are utilized in this work.

For k-NN, k is the number of nearby instances used for deciding the category of the testing data. This value typically approximates the square root of the training-set size and is preferably a small positive integer; keeping it odd also avoids tied votes in binary classification. Following these rules, k is set to 5, as the training set on both DEAP and MER contains 39 trials per fold. SVM creates a classification model based on a decision boundary, or maximal margin, that separates the training set into two categories; the testing data are then classified by their location relative to this boundary. LDA is a linear classifier that establishes a probabilistic model for each category by considering the distribution of the input training set; the testing data are assigned to the category with the higher conditional probability.

To reduce perturbations incurred by different trials and mitigate overfitting risks, LOTO cross-validation is applied in training and testing. In each fold, the feature value from one trial serves as the testing data, and the feature values from the remaining trials form the training set. This process is repeated with each trial as the testing data until all trials are classified. After comparing the testing results with the original labels, the classification accuracy of each type of asymmetric feature is obtained, and the optimal feature that produces the best result for emotion recognition can be identified. Consequently, high classification accuracy is accomplished with only one optimal asymmetric feature. A minimal sketch of this procedure is given below.
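
The sketch uses scikit-learn's LeaveOneOut, which is equivalent to LOTO here since there is one feature value per trial; the feature and label arrays are dummy stand-ins (in practice they would come from the feature table and the self-assessment ratings):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
values = rng.normal(size=40)            # stand-in for one asymmetric feature
labels = rng.integers(0, 2, size=40)    # stand-in for high/low ratings

X = values.reshape(-1, 1)               # shape (40, 1): one value per trial
y = labels                              # 1 = high, 0 = low

# Leave-one-trial-out: every trial is the testing data exactly once.
acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                      X, y, cv=LeaveOneOut()).mean()
print(f"LOTO accuracy: {acc:.2%}")
```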

Results and Discussion

In this work, MATLAB R2021a was used to implement the proposed methodology, and all results on the various test sets were obtained from these computations. In the following, the performances of the seven similarity measure methods with three classifiers are discussed to identify the appropriate method for measuring the similarity levels of rhythm sequences and the suitable classifier for the asymmetric features. In addition, the representative symmetrical channels for recognizing specific emotional factors are analyzed based on the optimal features found. Note that the conditions differ between DEAP and MER, so the results and discussion are separated into two subsections. Finally, a performance comparison with existing works that consider symmetrical spatial features is carried out.

Results From the DEAP Dataset

The average classification accuracies of the asymmetric features extracted by the seven similarity measure methods are presented in Tables 2, 3, and 4, in which the first column lists the methods and the remaining columns display the accuracies of set-A and set-V using the asymmetric features extracted from the sequence data of the first 30-s (F30 s), last 30-s (L30 s), and full 60-s (A60 s) periods. Here, to calculate the classification accuracy of each subject, all 40 experimental trials are classified. Thus, these results are from 40 simulation runs per subject, averaged over 32 subjects. In addition, the best result in each case is underlined.

Table 2 The average classification accuracies (mean ± standard deviation %) of the asymmetric features using k-NN, DEAP dataset
Table 3 The average classification accuracies (mean ± standard deviation %) of the asymmetric features using SVM, DEAP dataset
Table 4 The average classification accuracies (mean ± standard deviation %) of the asymmetric features using LDA, DEAP dataset

Meanwhile, for illustration, Fig. 7 depicts a comparative histogram of the average accuracies of the three classifiers with different similarity measure methods on set-A of the DEAP dataset, where the data source is the L30 s period. As observed, the performances of SVM are similar to those of LDA, while k-NN yields better results; similar trends are found in the other scenarios. The main reason may lie in the properties of the classifiers: both SVM and LDA achieve classification by placing a separating hyperplane with a specific margin between the two classes, whereas k-NN classifies through a cluster determined by the known neighbors (i.e., the training set) around the testing data. These results suggest that the distribution of the asymmetric features fits k-NN better. The comparisons thus indicate that k-NN is the more suitable classifier for training and testing the asymmetric features in this work.

Fig. 7 Average classification accuracies (set-A_L30 s, DEAP) using the asymmetric features extracted from different similarity measure methods with three classifiers (k-NN, SVM, and LDA)

In addition, when using the same classifier, the performances of the different similarity measure methods are close, as their variations are slight. This indicates that there are no substantial differences among the similarity measure methods for the asymmetric features. The main reason may be that the sequences consist of only five brain rhythms, with a length of either 150 (i.e., 30 s) or 300 (i.e., 60 s), so each can be viewed as a short string. Even though some of the investigated methods are distance-based and some are shape-based, they do not differ much when measuring the similarity between such short strings. Here, DTW provides approximately 1–2% higher accuracy than the others. Overall, DTW is slightly better, so it is recommended as the similarity measure method for extracting the asymmetric features in this work.

Furthermore, the length of the brain rhythm sequence matches the length of the EEG segment, so different lengths were evaluated to investigate the time effect in emotion recognition. Here, close results are obtained with 30 s and 60 s data for the respective classifiers, indicating that a 30-s period is sufficient to achieve performance similar to 60 s. As a result, the time required for emotion recognition can be reduced from 60 to 30 s, which also removes redundant data along the time axis. More importantly, the L30 s data exhibit slightly better results than the F30 s data. This may be because the later periods contain more emotion-related information than the earlier periods. Similar findings have been reported previously: Kumar et al. [42] compared the classification accuracies on DEAP using the F30 s and L30 s data and found that the L30 s period is more associated with emotion, and Jatupaiboon et al. [43] assessed the accuracies of arousal and valence through the F30 s, L30 s, and A60 s data and claimed that the L30 s data yield the best average accuracy. These works corroborate that the results from the proposed methodology are reasonable.

The above analysis indicates that DTW is appropriate for the similarity measure, k-NN is suitable as the classifier, and the L30 s period is proper for emotion recognition. Based on these settings, the classification accuracies using the asymmetric features extracted from the various symmetrical channels are evaluated. Figure 8 illustrates the results of subject S3 from DEAP, in which panels a and b depict the accuracies on set-A and set-V, respectively; the deeper the red, the higher the classification accuracy. In Fig. 8, the accuracies of the asymmetric features vary with the emotional factors, even for the same subject. For example, the asymmetric feature of FC1–FC2 yields a remarkable accuracy (95%) on set-A but is not the best (75%) on set-V, where CP1–CP2 is more useful (80%). Such findings further imply that the similarity levels of the rhythm sequences between FC1 and FC2 and between CP1 and CP2 are sensitive to variations in arousal and valence, respectively. Consequently, the emotion recognition of subject S3 can be directly achieved by the corresponding asymmetric features.

Fig. 8 Classification accuracies using the asymmetric features extracted from various symmetrical channels (subject S3, DEAP). The deeper red indicates a higher classification accuracy: a set-A, b set-V

Further investigations were conducted to determine the performances of the asymmetric features among different subjects on the same test set. Figure 9 shows the accuracies of the asymmetric features on set-A for four subjects (S3, S5, S21, and S25) of DEAP. As observed, although the asymmetric features are extracted and classified in the same way, their performances change with the subject. For instance, the asymmetric feature of FC5–FC6 is vital only for subject S21 (Fig. 9c), while it is not informative for the others. Such distinctness implies that emotion recognition exhibits subject-dependent properties, consistent with earlier works [44, 45].

Fig. 9 Classification accuracies of four subjects on set-A of DEAP: a S3, b S5, c S21, and d S25

By taking the highest accuracy, the optimal asymmetric feature is identified for each subject of DEAP. The locations of these optimal features are of particular interest. To this end, the statistical percentages based on the five scalp regions are presented in Table 5, in which the first row denotes the region and the remaining rows display the percentages on the different test sets. The results reveal that the optimal features mainly come from the symmetrical channels located in the frontal or parietal regions. Regarding brain function, the frontal region regulates cognitive awareness of stimuli, and the parietal region processes perceptual information from audio and vision. Thus, EEG recordings from these regions reflect the reactions to the stimuli. This may be why frontal asymmetry has been commonly used to assess emotions [25, 26]. In addition, Table 5 implies the involvement of other regions in emotion recognition, suggesting that an appropriate solution considers the representative symmetrical channels per subject rather than a fixed feature for all cases. In this regard, the proposed methodology is valid for obtaining the optimal feature for each test set.

Table 5 Statistical percentages of optimal asymmetric features based on five scalp regions from 32 subjects of DEAP

Results From the MER Experiment

Using DTW and k-NN, the results from the MER experiment are obtained. Here, the analysis and discussion also consider the optimal asymmetric features found. For this aim, the statistical percentages based on five regions are summarized in Table 6, in which the first row shows the region, and the remaining rows list the percentages of the respective test sets (arousal, valence, liking, familiarity, and understanding).

Table 6 Statistical percentages of optimal asymmetric features based on five scalp regions from 36 subjects of MER

In Table 6, the MER results on set-A and set-V are similar to the DEAP results in Table 5, as the optimal asymmetric features also mainly come from the frontal or parietal regions. Such consistency demonstrates that the proposed methodology can select representative symmetrical channels under different experimental conditions. Moreover, for the three test sets not investigated in DEAP (set-L, set-F, and set-U), the optimal features are primarily located in the temporal, parietal, and frontal regions, respectively. The temporal region typically processes auditory information such as music, so it is reasonable that data from this region can assess the feeling of liking while listening to music. As discussed, the parietal region handles perceptual information involving audio, so its data can disclose the effect of familiarity evoked by the music. The frontal region controls conscious thought about external stimuli, so its data can help indicate whether the lyrics or musical rhythm are understood by the subjects.

In addition, the statistical percentages in Table 6 exhibit individual characteristics. To discuss these characteristics further, Fig. 10 illustrates the optimal asymmetric features used for recognizing the five emotional factors for subjects S9 and S14 of MER, where different colors correspond to the various emotional factors. As observed, the locations of the optimal features are adjacent, suggesting that the emotional reaction is a complex process involving a group of neighboring channels. Moreover, different factors are typically recognized by particular symmetrical channels, and these properties also vary with the subjects. This may imply that there is no general model of emotion elicitation across different cases. Previously, Lim [46] claimed that emotion is related to the cultures, backgrounds, and experiences of the subjects, so emotion recognition is likely to be subjective, as in the results found here.

Fig. 10 The optimal asymmetric features used for recognizing five emotional factors of two subjects of MER: a S9, b S14

Performance Comparison

A performance comparison with the existing works is summarized in Table 7, in which the first column lists the works and the remaining columns show the number of channels applied for emotion recognition, the methodology, and the classification accuracies in the various cases. In addition, the best result in each case is underlined.

Table 7 Performance comparison with the existing emotion recognition works

In Table 7, all of these works consider symmetrical spatial features and investigate the DEAP dataset. For example, Wang et al. [19] used the NMI matrix derived from the spectrograms of all pairs of symmetrical channels. Mohammadi et al. [20] applied entropies and PSDs from five pairs of symmetrical channels. Kumar et al. [42] utilized bispectrum analysis of the symmetrical channels FP1–FP2. Islam et al. [47] designed Pearson’s correlation coefficient images from all pairs of symmetrical channels. Xing et al. [48] developed a linear mixing model based on the frequency subband power features from all pairs of symmetrical channels. Ahmed et al. [49] proposed a two-dimensional vector consisting of the asymmetry in different brain regions, termed AsMap. Cui et al. [50] exploited regional asymmetric features located on the left and right hemispheres of the brain. Although the proposed BRS method does not achieve the best accuracies in these comparisons, it attains impressive results with an asymmetric feature extracted from only one pair of symmetrical channels. Deep learning methods such as neural networks achieve superior accuracy, but their main limitation is the need for a large training dataset, which is why all channel data are usually applied; with a smaller dataset, it is difficult to train a neural network with outstanding performance. In this regard, the proposed methodology is more suitable for processing a smaller dataset because the number of applied channels is comparatively low. This property reflects the trade-off between classification accuracy and the number of channels. Therefore, different approaches suit different operating conditions for emotion recognition.

Moreover, this work obtains superior results in the MER experiment, while most existing works did not include self-designed experiments. This comparison demonstrates that the proposed methodology performs stably on both the public dataset and the experimental data, indicating that it is reliable across different scenarios. In addition, the simulation conditions in this work are a central processing unit (CPU): Intel Core i5-10505 @ 3.20 GHz; random access memory (RAM): 8 GB; and a hard disk drive: 1 TB, 7500 revolutions per minute. With this setup, sequencing takes approximately 18 s when the EEG length is 30 s and approximately 52 s when it is 60 s. After that, for each subject, extracting the asymmetric features from the sequences of the different channels using the seven similarity measure methods takes approximately 49 s. Finally, classification through k-NN, SVM, and LDA, including the training and testing periods, takes approximately 31 s. Therefore, the settings of DTW, k-NN, and L30 s are adopted, which simplify the whole classification process. Note that the proposed methodology has no strict memory requirement, although a larger memory size undoubtedly speeds up the simulation runs. In short, the BRS exhibits advantages in simplifying portable emotion-aware devices such as low-cost EEG headsets, which further provides a solution for recognizing human emotions during self-isolation using fewer electrodes or sensors.

Conclusion

In this work, asymmetric features derived from similarity measures of brain rhythm sequences have been proposed, providing a potential solution for designing low-cost emotion-aware devices for self-isolation during the COVID-19 pandemic. The method was validated on EEG recordings from the MER experiment and the public DEAP dataset. The results revealed that one pair of symmetrical channels is sufficient to extract an optimal feature that produces classification accuracies of up to 80–85%. In addition, the asymmetric features found are beneficial for investigating the response mechanisms of emotion, and further investigation showed that emotion recognition exhibits strong individual characteristics. Therefore, to achieve impressive performance, an appropriate approach is to consider the subject-dependent properties, which can be obtained by the proposed methodology. Finally, compared to existing works that consider symmetrical spatial features, this method contributes insights to guide emotion recognition with fewer resources. In the future, to realize emotion recognition and regulation, a hardware design embedded with the BRS will be developed.

Appendix

The source codes with an example have been uploaded to the IEEE DataPort (https://doi.org/10.21227/dzsq-b842).