1. Introduction
In neuroscience, electroencephalography (EEG) has commonly been used as an early diagnostic tool for examining various abnormal neurological or psychiatric conditions, including attention-deficit/hyperactivity disorder (ADHD), chronic pain, major depressive disorder, and obsessive–compulsive disorder. ADHD is one of the most common neurodevelopmental disabilities affecting both children (approximately …) and adults (approximately …) [1]. Individuals who have ADHD can experience various symptoms, such as social interaction difficulties, depression, anxiety, learning challenges, and sleep disruptions [2]. Insomnia, a sleep disorder, can contribute to emotional issues, frustration, and learning difficulties [3]. It can also impair decision making and poses a risk for heart-related diseases [4]. Chronic pain, persisting for over 12 weeks despite treatment, significantly impacts patients’ lives, leading to depression, presenteeism, and financial burdens on families or caregivers [5]. Major depressive disorder (MDD), the most prevalent mental disorder, adversely affects physical health, sleep, mood, and social activities, potentially leading to suicidal thoughts or attempts [6]. Obsessive–compulsive disorder (OCD), another common mental disorder, diminishes quality of life through persistent intrusive thoughts and repetitive behaviors [7,8]. The direct economic cost of mental or neurological disorders in the United States surpassed USD 210 billion in 2010, and the global economic cost of mental disorders has been estimated at USD 2.5 trillion [9,10].
Since neurological and mental disorders are increasingly prevalent and closely related to various mental and physical health conditions and to deficits in social interaction, the early detection of abnormal conditions is crucial for effective care planning, intervention, and advocacy. EEG supports early identification and has been reported to be reliable and valid for detecting neurological and mental disorders. While previous studies have successfully used EEG data for early detection, they have primarily focused on analysis and computational techniques for detecting single disorders in isolation. Moreover, despite the existence of more than six hundred neurological diseases, limited research has been conducted to identify neurological disorders within a unified framework. This study shifts the focus to understanding the patterns associated with multiple neurological disabilities by extracting significant features and analyzing them through visualization techniques. We introduce an automated pattern recognition approach that integrates visual analysis and machine learning to analyze neurological disabilities. Note that, in this study, the term ‘neurological disability’ encompasses both neurological disorders and mental disorders. To the best of our knowledge, no previous study has analyzed multiple disorders based on EEG data while integrating multiple visualization techniques.
EEG signals are multivariate time series comprising spatial and temporal information ordered over time. The signals contain diverse local and global information, enabling the identification of distinctive patterns for various health conditions and the tracking of trends in an individual’s condition. This information is particularly valuable for capturing the unique characteristics of neurological disabilities. However, extracting significant features from EEG signals is an essential yet challenging task in neurological studies. Although a few previous studies have demonstrated the potential to differentiate neurological disabilities using classification algorithms, there is room to improve performance and to advance knowledge for accurately differentiating the disorders from one another. Because ADHD shares symptoms with other psychiatric disorders, identifying ADHD through the analysis of neurophysiological signals remains challenging [11]. Therefore, this study introduces an approach that integrates machine learning and visualization to determine differences among patients with neurological disabilities. We emphasize identifying the critical EEG features associated with each neurological disability and determining their unique patterns, as this improves our understanding of the disabilities. In summary, the primary contributions of this study are as follows:
We designed an efficient feature extraction approach that uses signal processing techniques to analyze EEG data associated with multiple neurological and mental disorders, based on short-term EEG segments (1 s and 2 s).
To the best of our knowledge, this is the first study to use visualization techniques to improve the understanding of the extracted features for distinguishing the disorders.
We conducted multiple data classification analyses utilizing machine learning (ML) algorithms, including support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), and logistic regression (LR).
We also performed an extensive evaluation to determine the optimal wavelets and decomposition levels for analyzing and differentiating neurological disabilities.
The rest of the paper is organized as follows: First, we discuss prior research in neurological disorders utilizing EEG in the Previous Works section. Then, the proposed approach is explained in greater detail, emphasizing the importance of utilizing wavelet transform and integrating visualization. After presenting the research findings and implications, we conclude this paper with possible future works.
2. Previous Works
Various studies have been performed to understand neurodevelopmental disorders and analyze neurological diseases. Musser and Nigg [12] analyzed electrocardiogram (ECG) and impedance cardiography data from individuals with ADHD to explore the connection between ADHD and emotion. Of a total of 100 datasets, 50 pertained to individuals with ADHD. They found that children with and without ADHD exhibited similar facial affect behaviors; however, those with ADHD showed reduced coherence between facial affect behavior and parasympathetic function. Compared with their counterparts without ADHD, children with ADHD may have experienced conflicting emotional signals. While genetic factors are acknowledged to contribute strongly to the onset of ADHD [13], researchers emphasize the need for ongoing studies to gain a more accurate understanding of its symptoms and to facilitate more effective medical treatments. Martin et al. [14] highlighted gender differences in ADHD, exploring them through quantitative and qualitative analyses of genetic factors. Their research indicated a higher risk of ADHD among siblings of females with ADHD. In another study, Maniruzzaman et al. [15] investigated an optimal channel selection approach for predicting ADHD using machine learning techniques, including decision tree (DT), KNN, RF, LR, and SVM. The study employed a hybrid of SVM and an independent t-test to identify optimal channels, revealing six crucial channels (Fz, F8, F3, C4, C3, and F7) for predicting ADHD.
Most neurological studies using EEG have focused on building a predictive model for a single neurological disease. For instance, Khare and Acharya [16] examined EEG data to find appropriate brain regions. Specifically, they determined significant EEG channels for identifying ADHD by employing entropy, statistical measures, and nonlinear features. They identified the crucial role of frontal regions in ADHD detection and highlighted specific EEG channels (Fz, F7, Pz, P7, and Cz) as significant for ADHD analysis. Coelho et al. [17] evaluated EEG Hjorth features extracted from the parietal, frontal, central, and occipital lobes to identify patients with Parkinson’s disease. To evaluate how well patients with Parkinson’s disease were differentiated from healthy individuals, they used classifiers such as SVM, KNN, and RF. Koh et al. [18] introduced an approach to detect ADHD and conduct disorder (CD) from ECG signals using the empirical wavelet transform (EWT). They extracted entropy features and performed feature selection using analysis of variance (ANOVA). They then assessed the selected features for identifying ADHD and CD using machine learning techniques such as SVM, DT, and KNN. Yasin et al. [19] reviewed previous EEG studies on MDD and observed a growing demand for applying deep learning (DL) to EEG analysis. The study also underscored the challenges of interpreting results from neural network-based approaches. Mulaffer et al. [20] highlighted the significance of EEG features for insomnia detection by comparing them with hypnogram-based features using SVM, assessing these features on the C3 and C4 channels. Although numerous studies have sought to identify neurodevelopmental disorders, the essential features for accurate identification remain to be established. SVM has also been highlighted as an important technique for analyzing EEG signals in dyslexia detection [21,22]. Seshadri et al. [23] proposed an approach to detect learning disabilities (LD), a neurodevelopmental disorder that severely impacts children’s lives. They assessed classification performance using various ML classifiers, including DT, KNN, SVM, ensemble classifiers, Naive Bayes, linear discriminant analysis (LDA), and LR, as well as shallow and deep neural network models. They segmented the EEG data into non-overlapping 10 s windows for feature extraction. In detail, they used the discrete wavelet transform (DWT) to decompose the EEG data into levels corresponding to different EEG frequency bands (i.e., …) and extracted features such as the mean, standard deviation, median absolute deviation, variance, interquartile range, kurtosis, and skewness. They showed the effectiveness of DWT for analyzing EEG data by comparing the performance of multiple predictive models using combined features from nineteen channels.
Mumtaz et al. [24] presented a method for diagnosing MDD using EEG power features with various machine learning algorithms, such as SVM, Naive Bayes (NB), and LR. They standardized the data using z-scores and eliminated potential outliers to identify critical features. The study revealed that SVM yielded high classification results in the analysis of MDD. Suuronen et al. [25] used data collected at multiple academic institutions (i.e., the University of Turku, the University of New Mexico, and the University of Iowa) to investigate Parkinson’s disease. Sample entropy was used to extract features from different EEG frequency bands: delta ([0.5–4] Hz), theta ([4–8] Hz), low alpha ([8–10] Hz), high alpha ([10–13] Hz), and beta ([13–30] Hz). They developed a greedy feature selection procedure to determine the optimal channels for disease analysis, measuring the relevance of each channel by the area under the ROC curve. Consequently, the performance using one channel and five channels with the eyes-open (EO) datasets was … and …, respectively. Additionally, the crucial channels were located in the frontal, left-temporal, and midline-occipital regions.
Shilaskar et al. [26] proposed an approach for dyslexia detection using principal component analysis (PCA). To address data imbalance, they applied the synthetic minority oversampling technique (SMOTE). They then evaluated dyslexia detection performance using various machine learning algorithms, including SVM, LR, RF, NB, decision trees, and KNN. The results indicated that the combination of SMOTE and RF accurately identified dyslexia. In a comprehensive review, Ahire et al. [27] explored diverse approaches that use machine learning algorithms to identify dyslexia from EEG signals; their findings suggested that SVM outperformed other machine learning algorithms in this context. Zolezzi et al. [28] focused on categorizing chronic pain into three groups (high, moderate, and low) by employing approximate entropy (ApEn) and absolute band power from five frequency bands: delta ([0.1–4] Hz), theta ([4–8] Hz), alpha ([8–12] Hz), beta ([12–30] Hz), and gamma ([30–100] Hz). By analyzing multiple datasets of chronic neuropathic pain, they found that combining three bands (theta, alpha, and beta) with the ApEn feature separated the pain groups well. Aydin et al. [29] classified patients with obsessive–compulsive disorder (OCD) using SVM with 2 s EEG segments. Comparing three entropy measures (approximate entropy, sample entropy, and permutation entropy), they determined that permutation entropy yielded the highest accuracy in classifying OCD.
Researchers have also performed various multi-class classification studies for analyzing neurological disorders with EEG data. Alturki et al. [30] analyzed neurological disorders by extracting features from EEG sub-bands and applying multiple classification algorithms, such as LDA, SVM, KNN, and ANN. To extract the features, they computed five statistical measures, namely logarithmic band power (LBP), standard deviation, variance, kurtosis, and Shannon entropy (SE), from a level-four DWT decomposition with the ‘db4’ wavelet applied to 50 s EEG segments. They extensively evaluated classification accuracy in both two- and three-class scenarios, using three types of EEG datasets and considering both single-channel and multi-channel modes of diagnosis. They found that SVM produced high classification accuracy, especially when combined with the logarithmic band power feature. In a different approach, Tawhid et al. [31] proposed a method for classifying various neurological disabilities, such as autism, epilepsy, Parkinson’s disease, and schizophrenia, from EEG signals using convolutional neural networks (CNNs). As a pre-processing step, the EEG signals were segmented into 3 s intervals and transformed into 2D time–frequency spectrogram images using the short-time Fourier transform. The classification analysis, using five-fold cross validation, aimed to distinguish neurological disorders from normal subjects, and the proposed CNN model achieved an accuracy of 98.33%.
Although numerous studies have sought to identify neurological or mental disorders by analyzing EEG data, the majority have centered on individual disorders and have simply applied diverse classification algorithms. A critical examination of each channel associated with neurological disabilities is essential for discerning both their commonalities and their distinctions. Despite this importance, few studies have analyzed multiple neurological or mental disorders by evaluating their shared characteristics and variations. This study addresses this gap through a comparative analysis of various features, including EEG frequency bands, power spectral density, raw data, and wavelet features. We gauge the efficacy of these features in analyzing neurological disorders using a publicly available dataset containing diverse neurophysiological information covering conditions such as ADHD, MDD, OCD, and Parkinson’s disease. Multiple features were extracted from the dataset using short-term EEG segmentation combined with DWT. EEG signals are irregular and non-stationary, with magnitudes that change over time [31]. It is therefore important to analyze segments short enough to satisfy the stationarity assumption [32,33], which makes the choice of segmentation length challenging. Longer segments may produce misleading results [32] because of increased non-stationarity, and they typically increase the computational cost [34,35]. Studies of single neurological disorders have empirically used segmentation lengths such as 1 s, 2 s, 3 s, and 5 s [16,31,35,36,37,38]. In this study, we examined two segmentation lengths (1 s and 2 s) for identifying neurological disabilities by comparing their features, classification performance, and visual representations. To the best of our knowledge, no previous study has provided detailed comparisons of feature extraction, classification performance, and visual representation across segmentation lengths for various neurological disorders. To advance our understanding of the distinctions among neurological disorders, our study adopts visual analytics: multiple visualization methods were employed to generate graphical representations of the data, facilitating interactive visual analysis for users. Additionally, because examining neurological disorders only from a broad, whole-brain perspective is limiting, we performed an extensive region-based analysis that explores specific brain regions in detail. The procedures and steps are explained in detail in the following sections.
3. Methods
As mentioned above, the primary objective of this study is to discern neurological disabilities by combining wavelet-based feature extraction and visual analysis. Because identifying crucial features that capture the distinctive characteristics of neurological disabilities is pivotal for differentiating them, our approach examines the most pertinent channels associated with each disability and assesses the performance differences between single-channel-based and region-based predictive models. Visual representations of the features make both the extracted features and the EEG data easier to comprehend, facilitating the differentiation of the disabilities. Our proposed approach consists of four steps: pre-processing, feature extraction and selection, predictive model generation, and visual analysis. Each step is detailed in the following subsections.
3.1. Data Description
We used the publicly available TDBRAIN dataset [39], which contains resting-state raw EEG signals with relevant clinical and demographic data collected from psychiatric patients between 2001 and 2021. The dataset contains 1274 EEG recordings (sampled at 500 Hz) from psychiatric patients aged 5 to 89 years. It includes both formal diagnoses and referral indications (unofficial diagnoses). The raw EEG signals are 26-channel recordings (10–10 system, Ag/AgCl electrodes) acquired with eyes open (EO) and eyes closed (EC) for 2 min each. The data include healthy subjects and subjects with several indications, such as major depressive disorder (MDD), attention deficit hyperactivity disorder (ADHD), subjective memory complaints, obsessive–compulsive disorder, burnout, dyslexia, chronic pain, Parkinson’s, tinnitus, insomnia, and migraine. In this study, we analyzed only the subjects whose indication matched their formal diagnosis. For clarity, the data recorded under the two task conditions are referred to as EO-data and EC-data throughout the paper. We analyzed 23 channels (C3, C4, CP3, CP4, CPz, F3, F4, F7, Cz, F8, P3, P4, FC3, FC4, FCz, Fp2, Fz, O1, O2, Oz, P8, T7, and T8). Because many subjects are categorized as ‘Unknown’ in their formal diagnoses, we considered only the data associated with seven disabilities (‘burnout’, ‘dyslexia’, ‘chronic pain’, ‘MDD’, ‘OCD’, ‘ADHD’, and ‘Parkinson’s disease’) and the ‘healthy’ condition (see Table 1).
Because the dataset contains numerous disorder types, we grouped the data into two classes: ‘normal (healthy)’ and ‘abnormal (disabilities)’.
3.2. Feature Extraction and Selection
Data pre-processing plays a crucial role in eliminating noise and obtaining clean data for EEG analysis. In this study, we applied notch and band-pass filters to improve data quality. Because the EEG data contain a 50 Hz component caused by power-line noise, a notch filter was applied to remove it. A band-pass filter with a pass band of 0.5 to 100 Hz was then applied for further refinement.
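The two filtering steps can be sketched with SciPy as follows. This is a minimal illustration, not the authors’ code. The 50 Hz notch, the 0.5–100 Hz pass band, and the 500 Hz sampling rate come from the text. The notch quality factor (Q = 30), the fourth-order Butterworth design, and zero-phase (forward-backward) filtering are our assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 500.0  # TDBRAIN sampling rate in Hz (Section 3.1)


def preprocess(eeg, fs=FS, line_freq=50.0, band=(0.5, 100.0), notch_q=30.0, order=4):
    """Remove power-line noise with a notch filter, then band-pass filter.

    eeg: array of shape (n_channels, n_samples); filtering runs along the last axis.
    Both filters are applied forward and backward (zero phase).
    """
    # Second-order IIR notch centred on the power-line frequency (assumed Q = 30)
    b_notch, a_notch = iirnotch(w0=line_freq, Q=notch_q, fs=fs)
    notched = filtfilt(b_notch, a_notch, eeg, axis=-1)
    # Butterworth band-pass as second-order sections for numerical stability
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, notched, axis=-1)


# Example: a 10 Hz rhythm contaminated by 50 Hz line noise (10 s at 500 Hz)
t = np.arange(5000) / FS
noisy = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)
clean = preprocess(noisy[np.newaxis, :])[0]
```

Zero-phase filtering keeps EEG events at their original time positions. This matters because the signals are later cut into fixed-length segments.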
Feature extraction is an essential step that influences the overall classification performance. To analyze the EEG data, non-overlapping segmentation is used to split the EEG signals into distinct segments. For the multi-channel EEG data of subject $k$ ($k = 1, \dots, N$), where $N$ indicates the total number of subjects, including healthy and unhealthy subjects, each channel is divided into segments of a pre-defined length. Specifically, $S_k = \lfloor l / T \rfloor$, where $S_k$ represents the total number of segments for subject $k$, and $l$ denotes the total length (in samples) of the EEG data. The $i$-th segment of channel $c$ is represented as $x_i^c = \{x_{i,1}^c, \dots, x_{i,T}^c\}$ containing $T$ instances, where $i = 1, \dots, S_k$ and $T = w \cdot f_s$, where $w$ is the pre-defined segmentation length and $f_s$ is the sampling rate. As the suitable segmentation length for EEG data analysis is unknown, we evaluated the EEG data with two segmentation lengths ($w = 1$ s and $w = 2$ s). The class label of the $i$-th segment is $y_i \in \{0, 1\}$, where 0 and 1 represent “normal” and “abnormal”, respectively.
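The non-overlapping segmentation can be sketched in NumPy as follows. It assumes each recording is stored as a (channels × samples) array. It also assumes that trailing samples that do not fill a whole segment are discarded, since the text does not say how remainders are handled. The function names are hypothetical.

```python
import numpy as np


def segment_channel(signal, fs, seg_len_s):
    """Split one channel into non-overlapping segments of seg_len_s seconds.

    Returns an array of shape (S, T) with T = seg_len_s * fs and S = floor(l / T);
    trailing samples that do not fill a complete segment are dropped (assumption).
    """
    T = int(round(seg_len_s * fs))
    S = len(signal) // T
    return np.asarray(signal[: S * T]).reshape(S, T)


def segment_subject(eeg, fs, seg_len_s, label):
    """Segment every channel of an (n_channels, l) array.

    Every segment inherits the subject-level class label (0 = normal, 1 = abnormal).
    Returns segments of shape (n_channels, S, T) and a label vector of length S.
    """
    segments = np.stack([segment_channel(ch, fs, seg_len_s) for ch in eeg])
    labels = np.full(segments.shape[1], label, dtype=int)
    return segments, labels
```

Consider a 2 min TDBRAIN recording at 500 Hz, which has 60,000 samples per channel. With $w = 1$ s this gives $S_k = 120$ segments of $T = 500$ samples. With $w = 2$ s it gives 60 segments of 1000 samples.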
We employed four different feature extraction approaches to extract features from each segment and to compare their capability of identifying the abnormal condition. The extracted features comprise raw features, EEG frequency band information, power spectral density (PSD), and wavelet features. The raw features are extracted directly from each segment by computing the average of the z-score, the mean, variance, kurtosis, and entropy. EEG frequency band information is broadly used in analyzing EEG data. To extract EEG frequency band features, commonly used EEG frequency bands (…) were identified in each segment, and the average and standard deviation of each band were computed. For the PSD features, Welch’s periodogram [40] was employed to compute the frequency spectrum of the EEG signals using the fast Fourier transform (FFT). The resulting PSD features comprise five attributes: frequency, energy, variance, sample entropy, and permutation entropy. While the FFT is effective in identifying the underlying frequency content of EEG data, it cannot localize transient changes at specific frequencies in time. Therefore, the wavelet transform (WT) is considered because it is suitable for analyzing non-stationary data and extracts both frequency (scale) and time information. Wavelets are small waves of limited duration with zero average value [41]. WT is well suited to analyzing data at a specific time and frequency and to revealing information at different scales [42]. It extracts local information at different frequencies to reveal trends, discontinuities, and repeated patterns underlying the data. By analyzing low- and high-frequency information, data trends and deviations (representing low and high fluctuations, respectively) can be identified [43]. DWT decomposes the EEG data up to a predefined level, splitting the signal at each level into two sub-bands with different frequency ranges (i.e., low and high frequencies). The underlying wavelet transform of a segment can be written as

$W(a, b) = \int x_i^c(t)\, \psi_{a,b}(t)\, dt, \qquad \psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right),$

where $x_i^c$ represents the EEG data at segment $i$ of channel $c$, and $\psi_{a,b}$ denotes the wavelet basis functions generated from the mother wavelet $\psi$. $a$ and $b$ indicate the scale and translation factors, respectively, which control the scaling and translation of the wavelet; in the DWT, they are restricted to dyadic values ($a = 2^j$, $b = m\,2^j$). In our study, DWT was employed to extract features by decomposing the EEG data into high- and low-frequency information at each level. The high-frequency information yields detail coefficients, which capture rapid or sudden changes in the data, such as discontinuities or abrupt shifts, while the low-frequency information yields approximation coefficients, which capture slow changes. Using the coefficients, the wavelet features are computed at each level $j$ ($j = 1, \dots, l$, where $l$ denotes the pre-defined decomposition level) from the coefficient elements $d_{j,i}$ ($i = 1, \dots, n$, where $n$ denotes the length of the coefficients), their mean $\mu_j$, and the sum of the coefficients in each level. The noise level (standard deviation) is estimated as $\sigma_j = \operatorname{median}(|d_j|)/0.6745$, where $\operatorname{median}(\cdot)$ denotes the median of the coefficients; this is the robust estimator used for wideband neuronal signals under a Gaussian noise assumption [44]. We evaluated various mother wavelets and decomposition levels to examine their similarities and differences for analyzing neurological disabilities.
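The per-segment feature families can be sketched as follows. This is an illustrative approximation under stated assumptions, not the paper’s exact feature definitions, whose equations are not reproduced here. Several choices are ours:

- The raw-feature entropy is taken as the Shannon entropy of a 64-bin amplitude histogram.
- The PSD summary covers only peak frequency, energy, and PSD variance; sample and permutation entropy are omitted for brevity.
- The per-level wavelet descriptors are mean, standard deviation, sum, energy, and the median-based noise estimate.

The sketch uses PyWavelets (`pywt.wavedec`) with the ‘db4’ wavelet and six levels, matching the settings reported in the Results.

```python
import numpy as np
import pywt
from scipy.signal import welch
from scipy.stats import entropy, kurtosis


def raw_features(seg):
    """Time-domain statistics of one segment (histogram entropy is an assumed definition)."""
    z = (seg - seg.mean()) / (seg.std() + 1e-12)
    hist, _ = np.histogram(seg, bins=64)
    return {
        "mean_z": float(z.mean()),
        "mean": float(seg.mean()),
        "variance": float(seg.var()),
        "kurtosis": float(kurtosis(seg)),
        "entropy": float(entropy(hist)),  # scipy normalises counts to probabilities
    }


def psd_features(seg, fs):
    """Welch-PSD summaries: peak frequency, total energy, and PSD variance."""
    freqs, pxx = welch(seg, fs=fs, nperseg=min(len(seg), int(fs)))
    df = freqs[1] - freqs[0]
    return {
        "peak_freq": float(freqs[np.argmax(pxx)]),
        "energy": float(np.sum(pxx) * df),
        "psd_variance": float(np.var(pxx)),
    }


def wavelet_features(seg, wavelet="db4", level=6):
    """Per-level statistics of DWT coefficients.

    pywt.wavedec returns [cA_level, cD_level, ..., cD_1], i.e. the approximation
    coefficients followed by detail coefficients from coarse to fine.
    """
    coeffs = pywt.wavedec(seg, wavelet, level=level)
    names = [f"A{level}"] + [f"D{j}" for j in range(level, 0, -1)]
    feats = {}
    for name, d in zip(names, coeffs):
        feats[f"{name}_mean"] = float(np.mean(d))
        feats[f"{name}_std"] = float(np.std(d))
        feats[f"{name}_sum"] = float(np.sum(d))
        feats[f"{name}_energy"] = float(np.sum(d ** 2))
        # Robust noise estimate: median absolute coefficient divided by 0.6745
        feats[f"{name}_sigma"] = float(np.median(np.abs(d)) / 0.6745)
    return feats
```

A 1 s segment at 500 Hz has 500 samples. This is just long enough for a six-level ‘db4’ decomposition without PyWavelets warning about an excessive level.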
Once all features were extracted, we performed feature selection to discard features that do not differ significantly between classes. Specifically, we employed one-way analysis of variance (ANOVA) [45] to determine statistically significant features ($p < 0.05$). Only the statistically significant features were used for classification and visual analysis.
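The ANOVA-based selection can be sketched with SciPy’s `f_oneway`, applied column by column to a feature matrix. The function name and the array layout (segments × features, one label per segment) are assumptions for illustration.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_select(X, y, alpha=0.05):
    """Keep feature columns whose one-way ANOVA p-value is below alpha.

    X: (n_segments, n_features) feature matrix; y: class label per row.
    Returns (indices of selected columns, array of all p-values).
    """
    classes = np.unique(y)
    p_values = np.array([
        f_oneway(*[X[y == c, j] for c in classes]).pvalue
        for j in range(X.shape[1])
    ])
    return np.flatnonzero(p_values < alpha), p_values
```

With only two classes (normal vs. abnormal), the one-way ANOVA F-test is equivalent to a pooled-variance two-sample t-test ($F = t^2$).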
3.3. Data Classifications
To assess the effectiveness of the selected features in identifying unhealthy conditions, we conducted a classification analysis using machine learning (ML) algorithms. Specifically, four classification algorithms (SVM, RF, KNN, and LR) were employed to measure performance differences. The four feature types (wavelet, raw, PSD, and EEG frequency band) were tested to explore how distinctly they separate “normal” and “abnormal”. We conducted the classification analysis under two conditions: single-channel based and region based. The single-channel-based analysis examined the data from each channel independently. The region-based analysis examined five brain regions (frontal, temporal, central, parietal, and centroparietal), whose channels are detailed in Table 2; all features within each brain region were combined to investigate their capability of identifying unhealthy subjects. Performance metrics, including precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC), were compared using k-fold cross validation. With k-fold cross validation, the feature set of channel $c$, denoted as $F^c$, was divided into $n$ subsets, $F^c = \{F^c_1, \dots, F^c_n\}$. In each round, $n-1$ subsets were used for training, and the remaining subset was used to test the trained model. This process was repeated $n$ times, and the average performance scores were reported.
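The classification protocol can be sketched with scikit-learn as follows. The text specifies only the four classifiers, the four metrics, and k-fold cross validation. The hyperparameters, the feature scaling for SVM, KNN, and LR, the use of stratified folds, and the example region channel list are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed hyperparameters; scaling is added for the distance/margin-based models
CLASSIFIERS = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
SCORING = ["precision", "recall", "f1", "roc_auc"]

# Hypothetical region map for illustration only; Table 2 defines the real one
EXAMPLE_REGIONS = {"frontal": ["F3", "F4", "Fz"]}


def region_features(channel_features, region_channels):
    """Concatenate per-channel feature matrices (dict: channel -> (n, d)) for one region."""
    return np.hstack([channel_features[ch] for ch in region_channels])


def evaluate(X, y, n_folds=5, seed=0):
    """Mean cross-validated precision, recall, F1, and AUC for each classifier."""
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    results = {}
    for name, clf in CLASSIFIERS.items():
        scores = cross_validate(clf, X, y, cv=cv, scoring=SCORING)
        results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in SCORING}
    return results
```

Single-channel analysis passes one channel’s feature matrix to `evaluate`. Region-based analysis first concatenates the region’s channels with `region_features`. Stratified folds keep the normal/abnormal ratio roughly constant across folds. When several segments come from the same subject, a subject-wise split such as scikit-learn’s `GroupKFold` is an alternative.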
3.4. Visual Analysis
Conducting data classifications with different machine learning algorithms is valuable for assessing their effectiveness in distinguishing between “normal” and “abnormal”. However, imbalanced data hinder a clear differentiation between them, because performance metrics computed on imbalanced data can be unreliable and difficult to interpret. To address this issue, we used visualization techniques that represent the extracted features by mapping them into visual glyphs. Visual representations can help reveal the underlying patterns and structures of the data. Because the data used in this study consist of numerous variables, dimension reduction is applied to show the data in a lower-dimensional (2D) space. In the visualization domain, various dimension reduction techniques have been employed for presenting high-dimensional data, including PCA, LDA, multi-dimensional scaling (MDS), t-distributed stochastic neighbor embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and others [46]. Among them, t-SNE and UMAP have recently been widely used because they often separate classes better than the others. However, these methods incur computational costs to approximate the optimal positions. Therefore, we used PCA to determine the points representing the high-dimensional data. PCA computes the eigenvectors and eigenvalues of the data covariance to identify the principal components; using the first and second principal components, the data can be effectively represented in a 2D space [47].
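The PCA projection can be sketched in NumPy via the singular value decomposition of the centered feature matrix. This yields the same principal directions as the covariance eigendecomposition described above. The function name is hypothetical. Standardizing features beforehand, which the text does not mention, is often advisable when features have very different scales.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X (n_samples, n_features) onto the first two principal components.

    Returns the 2D coordinates and the fraction of variance explained by each component.
    """
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centred data are the covariance eigenvectors,
    # ordered by decreasing singular value (i.e., decreasing explained variance)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ vt[:2].T
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:2]
```

Each row of `coords` is one point in the 2D scatter plot and can be colored by its normal/abnormal label.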
We also employed parallel coordinates visualization [48] to represent all data instances. It draws each instance as a polyline through vertically arranged variables (i.e., parallel axes). Given its effectiveness in presenting multivariate data, researchers have proposed numerous variations of parallel coordinates visualization [49,50]. However, parallel coordinates visualization has a major limitation when handling large numbers of instances: visual clutter, in which many polylines overlap. To address this problem, researchers have considered reducing the number of polylines or rearranging (reordering) the axes. Because we used parallel coordinates to understand the patterns between classes, we did not address visual clutter in this study. Instead, we showed the data distribution on each axis, because it helps users understand clusters in each group (e.g., normal vs. abnormal). To represent the data distribution of each variable, angular histograms [51] were generated to illustrate the density and slopes of the underlying polylines, overlaid onto the parallel coordinates. An angular histogram presents the frequency distribution of the underlying data on each parallel axis by measuring both the frequency of the data and the direction of the polylines at each axis. Figure 1 shows an example of applying angular histograms to parallel coordinates. The histogram encodes density with color (red for high density and green for low density) based on the measured distributions. Various statistical approaches have been proposed to determine an appropriate bin size for histograms [52]. However, traditional histograms present only univariate data: a single histogram can represent only the frequency of the data points on the selected vertical axis. Therefore, no single statistical approach can determine an appropriate bin size for all variables in a parallel coordinates visualization, and a user-driven approach is commonly used to determine the histogram bin size in parallel coordinates [51,53,54]. Accordingly, the angular histogram uses a user-defined binning approach to determine the density of each distribution. Each bin is defined as a direction vector representing the slope of each polyline toward the adjacent parallel axis. The angular frequency distance is computed by measuring the frequency in each bin and evaluating the distance to the next parallel axis.
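The binning behind an angular histogram can be sketched for one pair of adjacent axes. This is a simplified sketch, not the implementation of [51]. It assumes min-max normalized axis values, unit spacing between axes, and user-chosen numbers of position and direction bins. It omits rendering. It only illustrates the idea of counting polyline directions per axis position. The function name is hypothetical.

```python
import numpy as np


def angular_histogram(a, b, n_pos_bins=10, n_angle_bins=8, spacing=1.0):
    """Count polyline directions between two adjacent parallel axes.

    a, b: values of the same instances on axis A and on the next axis B.
    Both are min-max normalised to [0, 1]; each polyline segment's direction is
    theta = arctan((b - a) / spacing), which lies in (-pi/2, pi/2).
    Returns counts of shape (n_pos_bins, n_angle_bins) (rows: position bins on
    axis A, columns: direction bins) and the direction bin edges in radians.
    """
    def normalise(v):
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    a_n, b_n = normalise(a), normalise(b)
    theta = np.arctan((b_n - a_n) / spacing)
    pos_edges = np.linspace(0.0, 1.0, n_pos_bins + 1)
    ang_edges = np.linspace(-np.pi / 2, np.pi / 2, n_angle_bins + 1)
    counts, _, _ = np.histogram2d(a_n, theta, bins=[pos_edges, ang_edges])
    return counts.astype(int), ang_edges
```

Each row of `counts` is the direction distribution of the polylines leaving one position bin on axis A. The row’s total gives the density that the color encoding reflects.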
4. Results
As mentioned above, we applied four different feature extraction approaches with two EEG segmentation lengths (1 s and 2 s) to examine the effect of segmentation length on distinguishing abnormal conditions. For DWT, we used a pre-defined decomposition level of six ($l = 6$) to extract wavelet features. Statistically significant features were then determined by applying ANOVA ($p < 0.05$). The number of significant features varied by channel. For the EEG frequency band features, we found three to nine and four to nine significant features in the EC- and EO-data, respectively. For example, we found nine significant features with the 1 s EEG segmentation in the O2 channel, but only three with the 2 s segmentation. For the FC3 channel, we found three and nine significant features with the 1 s and 2 s EEG segmentations, respectively. With the 2 s EEG segmentation, nine significant features were identified in the channels CP4, Cz, F3, FC3, Fz, O1, Oz, P3, P4, and T7; seven features in C3, C4, CP3, CPz, F4, F7, and Fp1; six features in FCz and T8; and four features in FC4. Some channels yielded the same number of features with the 1 s and 2 s segmentations: nine features in CP4, F3, and P7; eight in F8; and six in T8. We determined nine features from the channels CPz, Cz, and Fp2 in both the 1 s and 2 s segmentations. Among the various features, the standard deviation was commonly identified as significant, and the average value was also significant in some channels. These features may be critical for analyzing the conditions precisely because they capture characteristics that differentiate them. The features extracted from the … and … bands in all channels of the EC- and EO-data were also recognized as significant.
Evaluating the PSD features from the EO-data, we found that all features extracted from of the channels were significant. For the EC-data, however, only and yielded all significant features with the 1 s and 2 s segmentations, respectively. Sample entropy (SmpE) and permutation entropy (PE) were identified as significant features in all channels of the EC-data. Three raw features (kurtosis, variance, and entropy) were selected as significant in of the channels with the 1 s segmentation and with the 2 s segmentation. Two of them (kurtosis and variance) were significant in most EC- and EO-data channels, whereas the z-score was not significant in any channel. With the EO-data, kurtosis was significant in two channels (C3 and Oz) with the 1 s segmentation and in four channels (C3, F3, FCz3, and Oz) with the 2 s segmentation. For the EC-data, only the P7 channel presented significant features with the 1 s segmentation. Among the wavelet features, those from detail coefficient levels 3, 4, and 5, corresponding to 4 Hz∼32 Hz, were significant with both the 1 s and 2 s segmentations. In particular, the features () corresponding to 16 Hz∼32 Hz were selected as significant in the level three detail coefficients, whereas only the features () were identified as significant in the level one detail coefficients. The feature () was also identified as significant in the EC- and EO-data with the 2 s segmentation.
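The frequency ranges quoted above follow from the dyadic structure of the DWT: detail level k covers roughly fs/2^(k+1) to fs/2^k, and the final approximation covers everything below the last detail band. The sketch below assumes a sampling rate of 256 Hz, which is not stated in this section but is consistent with detail levels 3 to 5 spanning 4∼32 Hz and level 3 spanning 16∼32 Hz. The ranges are nominal, since wavelet filters are not ideal band-pass filters; `dwt_subbands` is a hypothetical helper.

```python
def dwt_subbands(fs, level):
    """Nominal frequency range (Hz) of each DWT coefficient array.

    Detail level k covers [fs / 2**(k + 1), fs / 2**k]; the final
    approximation covers [0, fs / 2**(level + 1)].
    """
    bands = {}
    for k in range(1, level + 1):
        bands[f"D{k}"] = (fs / 2 ** (k + 1), fs / 2 ** k)
    bands[f"A{level}"] = (0.0, fs / 2 ** (level + 1))
    return bands


# Assumed sampling rate of 256 Hz with the level six decomposition.
for name, (lo, hi) in dwt_subbands(fs=256, level=6).items():
    print(f"{name}: {lo:g}-{hi:g} Hz")
```

Under this assumption, D3, D4, and D5 cover 16∼32, 8∼16, and 4∼8 Hz, and raising the level from six to seven splits the lowest band once more.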
4.1. Single-Channel Analysis
Figure 2 shows examples of the four feature types extracted from the O2, C4, Fp1, and T7 channels of the EC-data using the ‘db4’ wavelet and level six decomposition. The raw features exhibited subtle distinctions, with minimal overlap among the disabilities. Specifically, the T7 channel showed slight differences in patterns among OCD, dyslexia, and Parkinson’s disease (see Figure 2M). In both the O2 and T7 channels, OCD was distinguishable, presenting an isolated pattern of high values compared with the other conditions (see Figure 2A,M).
Using the PSD features, we found distinct patterns separating two disabilities (dyslexia and ADHD) from the healthy condition, particularly in the Fp1 and O2 channels, whereas the C4 and T7 channels exhibited nearly identical patterns for dyslexia and ADHD. The EEG frequency band features showed a clear separation among dyslexia, Parkinson’s disease, and MDD. With the wavelet features, the Fp1 channel effectively differentiated dyslexia, ADHD, Parkinson’s disease, chronic pain, and the healthy condition. In the O2 channel, the features of OCD, ADHD, and Parkinson’s disease differed noticeably from those of the other conditions. Burnout and MDD showed highly similar patterns across most channels, except for the T7 channel. Chronic pain and Parkinson’s disease showed similar patterns in the C4 channel. MDD and Parkinson’s disease were highly similar in the Fp1 channel but showed completely different patterns in the O2, T7, and C4 channels. Among the four feature types, the wavelet features best distinguished the disabilities from the healthy condition. Specifically, the wavelet features in the Fp1 channel could serve as a good indicator for differentiating Parkinson’s disease, OCD, ADHD, dyslexia, and the healthy condition because they separated these conditions clearly. With the 2 s segmentation, the wavelet features in the C4 channel formed three clearly separable groups: dyslexia and ADHD, chronic pain, and the remaining disabilities.
With the level six decomposition (l = 6), ADHD and dyslexia showed nearly identical patterns in the O2 channel. With the level seven decomposition (l = 7), however, the disabilities in the same channel became clearly differentiated, and MDD became clearly separated from the healthy condition, which was not apparent with l = 6. In the T7 channel, features extracted from the level four and six detail coefficients and from the approximation coefficients highlighted the distinctions between Parkinson’s disease and burnout. With l = 7, we observed a clear separation among burnout, the healthy condition, and the other disabilities, particularly when using features from the approximation coefficients corresponding to 1∼4 Hz. The features from detail coefficient levels five, six, and seven also separated the disabilities into two groups: dyslexia and ADHD, and the other disabilities. In the Fp1 channel, chronic pain showed a clearly distinguishable pattern from the other disabilities, as with l = 6. Overall, the wavelet features from the level seven decomposition revealed two to three separable groups of disabilities, whereas the level six decomposition showed differences between individual disabilities.
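A per-segment wavelet feature vector for a chosen wavelet and level can be sketched with PyWavelets. This is an illustration under stated assumptions, not the study's pipeline: the 256 Hz sampling rate, the non-overlapping windows, and the summary statistics (mean, standard deviation, and energy of each coefficient array) are all hypothetical choices, and `segment` and `wavelet_features` are illustrative names.

```python
import numpy as np
import pywt


def segment(signal, fs, seconds):
    """Split a 1-D signal into non-overlapping windows of `seconds` duration."""
    n = int(fs * seconds)
    n_windows = len(signal) // n
    return signal[: n_windows * n].reshape(n_windows, n)


def wavelet_features(window, wavelet="db4", level=6):
    """Hypothetical feature set: mean, standard deviation, and energy of
    every coefficient array returned by a multilevel DWT."""
    coeffs = pywt.wavedec(window, wavelet, level=level)  # [cA_level, cD_level, ..., cD1]
    feats = []
    for c in coeffs:
        feats.extend([np.mean(c), np.std(c), np.sum(c ** 2)])
    return np.array(feats)


fs = 256  # assumed sampling rate (Hz)
eeg = np.random.default_rng(0).normal(size=10 * fs)  # 10 s of synthetic single-channel data
windows = segment(eeg, fs, seconds=2)
X_l6 = np.array([wavelet_features(w, "db4", level=6) for w in windows])
print(windows.shape, X_l6.shape)  # (5, 512) (5, 21)
```

Moving from l = 6 to l = 7 adds one more coefficient array (24 instead of 21 features here), which is one reason the two levels can group the disabilities differently.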
Figure 3 summarizes the single-channel classification results for the EC-data with both the 1 s and 2 s segmentations. The wavelet features yielded better classification performance than the other features, and RF outperformed the other classification algorithms. Although the 2 s segmentation gave slightly better results overall than the 1 s segmentation, performance varied across channels depending on the segmentation length; for example, the F3, F4, F8, and CP4 channels performed better with the 1 s segmentation. AUC also improved with increasing segmentation length. The single-channel classification used the ‘db4’ wavelet, but other wavelets can also be used to extract wavelet features. An evaluation with another wavelet (‘coif’) produced slightly different results. For example, the F1 scores for the EC-data were RF (), SVM (), KNN (), and LR (); for the EO-data, they were RF (), SVM (), KNN (), and LR (). The AUC scores were RF (), SVM (), KNN (), and LR (). Although the results varied slightly with the applied wavelet, the wavelet features still produced better classification performance than the other features.
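The comparison of RF, SVM, KNN, and LR by F1 and AUC can be sketched with scikit-learn cross-validation. This sketch assumes a binary normal-versus-abnormal task and uses synthetic data from `make_classification` in place of the wavelet features; the hyperparameters, the 5-fold split, and the feature scaling are illustrative assumptions rather than the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one channel's feature matrix and binary labels.
X, y = make_classification(n_samples=400, n_features=21, n_informative=6, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    # Distance- and margin-based models are scaled inside the pipeline,
    # so scaling is fitted on the training folds only.
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=5, scoring=["f1", "roc_auc"])
    results[name] = (scores["test_f1"].mean(), scores["test_roc_auc"].mean())
    print(f"{name}: F1 = {results[name][0]:.3f}, AUC = {results[name][1]:.3f}")
```

Swapping the feature matrix (wavelet, PSD, frequency band, or raw features) while keeping the models and folds fixed gives a like-for-like comparison of feature types.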
Comparing the classification results of the 2 s segmentation features with the level six and seven decompositions (l = 6 and 7), we found higher F1 scores with , but similar AUC scores for both levels. For the wavelet features extracted from the EC-data with ‘db4’, the average F1 scores over all channels were RF , SVM , KNN , and LR ; for the EO-data, they were RF , SVM , KNN , and LR . The average AUC scores over all channels were RF , SVM , KNN , and LR for the EC-data and RF , SVM , KNN , and LR for the EO-data. Overall, the F1 and AUC scores with the 2 s segmentation were slightly better with , and several channels (Fp1, Fp2, F3, F4, and FCz) also performed slightly better with .
4.2. Region-Based Analysis
We also conducted a region-based analysis of five brain regions (frontal, temporal, central, parietal, and occipital).
Figure 4 shows the F1 scores of the applied classification algorithms. The 2 s segmentation again showed slightly better results than the 1 s segmentation, and the wavelet features also performed better in the region-based classification. Because of the large number of results, the region-based accuracy, precision, and AUC scores are listed in Appendix A. In the region-based classification of both the EC- and EO-data, the wavelet features yielded high accuracy and precision as well as better AUC scores. Among the classification algorithms, KNN and RF performed best; in particular, RF achieved high performance with the wavelet features (see the red glyphs in Figure 4B). High F1 scores were obtained in the frontal region across the classifiers. With SVM, RF, and LR, the wavelet features achieved high F1 scores in all regions. With KNN, the wavelet and EEG frequency band features differed only slightly in the frontal region (see Figure 4C). In the frontal region, the EEG frequency band features performed comparably to the wavelet features; in all other regions, the wavelet features performed better.
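Region-based classification needs a feature matrix per region rather than per channel. One simple way to build it, assumed here for illustration, is to concatenate the per-channel feature columns of every channel in the region. The channel-to-region assignment below is hypothetical: it uses channel names that appear in the text and leaves out mixed sites (such as FC3 or CP4) whose region is ambiguous.

```python
import numpy as np

# Hypothetical channel-to-region assignment using channel names from the text.
REGIONS = {
    "frontal": ["Fp1", "Fp2", "F3", "F4", "F7", "F8", "Fz"],
    "temporal": ["T7", "T8"],
    "central": ["C3", "C4", "Cz"],
    "parietal": ["P3", "P4", "P7"],
    "occipital": ["O1", "O2", "Oz"],
}


def region_features(channel_feats, region):
    """Concatenate per-channel feature matrices of shape (n_segments, n_feats)
    column-wise for all available channels in one region."""
    present = [ch for ch in REGIONS[region] if ch in channel_feats]
    if not present:
        raise ValueError(f"no channels available for region {region!r}")
    return np.hstack([channel_feats[ch] for ch in present])


rng = np.random.default_rng(0)
feats = {ch: rng.normal(size=(100, 21)) for ch in ["O1", "O2", "Oz", "T7"]}
X_occipital = region_features(feats, "occipital")
print(X_occipital.shape)  # (100, 63)
```

Concatenation keeps channel-specific information, at the cost of a wider feature matrix for regions with many electrodes such as the frontal region.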
Additionally, we compared the classification results obtained with different wavelets (w) and decomposition levels (l).
Table 3 presents the F1 scores for the region-based classification with the 2 s segmentation using two wavelets (w = ‘coif3’ and ‘db4’) and two decomposition levels (l = 6 and 7). The two wavelets yielded similar F1 scores, with minor differences depending on the decomposition level and the classification algorithm. Overall, the frontal region showed the highest performance among the regions, and RF and KNN performed best with both the EC- and EO-data. Specifically, the occipital and frontal regions performed better with and ‘coif3’, whereas the central region performed better with and ‘coif3’. However, considering how often each combination achieved the higher score across the classification algorithms, ‘db4’ was selected for l = 6 with the EO-data, and ‘coif’ was determined as the best wavelet for l = 7 with both the EC- and EO-data.
For the frontal region in the EC-data with and ‘db4’, the AUC scores were SVM (), RF (), KNN (), and LR (); for the EO-data, they were SVM (), RF (), KNN (), and LR (). For the same region with and ‘coif3’, the AUC scores were SVM (), RF (), KNN (), and LR () for the EC-data and SVM (), RF (), KNN (), and LR () for the EO-data. With and ‘db4’, the AUC scores were SVM (), RF (), KNN (), and LR () for the EC-data and SVM (), RF (), KNN (), and LR () for the EO-data. With and ‘coif3’, the scores were SVM (), RF (), KNN (), and LR () for the EO-data and SVM (), RF (), KNN (), and LR () for the EC-data. Overall, KNN achieved better AUC scores than the other algorithms. Averaged over all classification algorithms, the AUC scores with and ‘db4’ were higher than those with and ‘coif3’ in all regions except the central region in the EC-data; in the EO-data, and ‘db4’ scored higher in the frontal, occipital, and parietal regions. With the level seven decomposition (l = 7), ‘db4’ performed better in all regions except the frontal region in the EC-data, and in all regions except the frontal and occipital regions in the EO-data.
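When comparing wavelets and levels, it can help to check how deep each wavelet can decompose a segment before every coefficient is affected by boundary effects. PyWavelets provides `pywt.dwt_max_level` for this; `pywt.wavedec` still runs above that limit but issues a warning. The sketch below assumes a 256 Hz sampling rate (not stated in this section), so its conclusions about specific levels hold only under that assumption; `level_check` is a hypothetical helper.

```python
import pywt

FS = 256  # assumed sampling rate (Hz); not stated in this section


def level_check(seconds, wavelets=("db4", "coif3"), levels=(6, 7), fs=FS):
    """For each (wavelet, level) pair, return pywt's recommended maximum
    decomposition level for the segment length and whether `level` stays within it."""
    n_samples = int(seconds * fs)
    report = {}
    for name in wavelets:
        # dec_len is the length of the decomposition filter (8 for db4, 18 for coif3).
        max_level = int(pywt.dwt_max_level(n_samples, pywt.Wavelet(name).dec_len))
        for level in levels:
            report[(name, level)] = (max_level, bool(level <= max_level))
    return report


for (name, level), (max_level, ok) in level_check(seconds=2).items():
    print(f"{name}, l = {level}: recommended max {max_level}, within limit: {ok}")
```

Because ‘coif3’ has a longer filter than ‘db4’, its recommended maximum level is lower for the same segment length, which is worth keeping in mind when interpreting wavelet-by-level comparisons.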
4.3. Visual Analysis
To understand the patterns and structures of the different features, we performed a PCA-based visualization. Figure 5 shows examples of PCA projections of the different features. For each projection, the first and second principal components were computed, and all instances were shown in 2D scatterplots with the two components mapped to the x- and y-axes, respectively. Although the projections differed depending on the features, we identified similarities among them. For instance, the EEG frequency band and wavelet features from the occipital region showed similar, widely spread patterns (see Figure 5A,D), and triangular shapes appeared for the EEG frequency band and wavelet features from the O2 channel (see Figure 5E,H). The PSD features showed no similarities with the other features. The raw features produced completely different patterns (see Figure 5C,G), possibly because of their small number of attributes and their high similarity. The visualizations of the O2 channel features showed several possible outliers (see the arrow). The raw features showed a dense region in the bottom left corner, and the two most extreme outliers, representing ADHD and Parkinson’s disease instances, each appeared in a corner of the PCA space. PCA-based visualization is useful for understanding the overall pattern of the features because the principal components capture the directions of largest variance in the data. However, PCA made it difficult to identify distinctive differences between the normal and abnormal groups, likely because the groups are highly similar along those directions. To address this limitation, we performed an additional analysis with parallel coordinates visualization.
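The 2D projection described above can be sketched with scikit-learn. Standardizing the features before PCA is an assumption (the text does not say whether the features were scaled), and the random matrix stands in for a real feature set; `pca_2d` is an illustrative name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_2d(X):
    """Standardize the features and project them onto the first two principal components.

    Returns the (n_instances, 2) coordinates (PC1 -> x-axis, PC2 -> y-axis)
    and the fraction of variance explained by each component.
    """
    Z = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(Z)
    return coords, pca.explained_variance_ratio_


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 21))  # synthetic stand-in for one feature set
coords, evr = pca_2d(X)
print(coords.shape, evr.round(3))
```

The coordinates can then be drawn as a scatterplot colored by condition. A low explained-variance ratio for the first two components is one signal that group differences may lie outside the plotted directions.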
To support comparative analysis, we generated group-based visualizations that show the normal and abnormal groups in separate parallel coordinates visualizations. Because major differences between the groups were hard to identify in the PCA projections, multiple parallel coordinates visualizations were generated to reveal them. In parallel coordinates, variables are represented as vertical parallel axes, and each instance is drawn as a polyline across them. Angular histograms are overlaid on the parallel axes to show the distribution of each variable.
Figure 6 shows multiple parallel coordinates visualizations with different features. With the raw features, the polylines alone did not reveal a clear difference between the groups, possibly because the features have similar characteristics, but the angular histograms revealed minor differences. Specifically, the angular histograms of the variables O201 and O202 showed high-density areas in the lower region for the normal group (see the arrows in Figure 6C). The visualizations with the PSD and raw features appeared similar across the groups because the feature distributions did not differ significantly (see the zoomed views of the angular histograms in the figure). In contrast, the visualizations with the EEG frequency band and wavelet features were distinctive because of their many statistically significant features, showing various distinctive variable distributions (see the angular histograms highlighted by arrows).
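An angular histogram can be sketched by binning the angles of the polyline segments that leave one axis toward the next. This follows one common formulation and is only an approximation of the study's method, which is described here only in part: each variable is assumed to be min-max normalized to [0, 1], adjacent axes are assumed to be one unit apart, and "bin size = 25" is interpreted as 25 angular bins. `angular_histogram` is a hypothetical name.

```python
import numpy as np


def angular_histogram(data, axis, n_bins=25, spacing=1.0):
    """Histogram of polyline slopes between parallel axes `axis` and `axis + 1`.

    data: (n_instances, n_vars) array. Each variable is min-max normalized to
    [0, 1] (its position on the vertical axis), and each polyline segment's
    angle is arctan(dy / spacing), where `spacing` is the horizontal distance
    between the two axes.
    """
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant variables
    norm = (data - lo) / span
    dy = norm[:, axis + 1] - norm[:, axis]
    angles = np.arctan2(dy, spacing)  # radians; 0 means a horizontal segment
    counts, edges = np.histogram(angles, bins=n_bins, range=(-np.pi / 2, np.pi / 2))
    return counts, edges


# Three instances whose segments are all horizontal fall into the middle bin.
flat = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
counts, edges = angular_histogram(flat, axis=0)
print(counts.argmax(), counts.max())  # 12 3
```

Drawing these counts as wedges at each axis, separately for the normal and abnormal groups, gives the kind of side-by-side distribution comparison described above.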
5. Discussion
The classification results indicate that the 2 s segmentation is more suitable for analyzing the disabilities and that RF outperformed the other algorithms. Comparing the decomposition levels (l = 6 or 7), we found a clear distinction between chronic pain and the healthy condition with . With , we also observed slightly better performance in most channels except five (F3, Fp1, F4, FCz, and P4), which showed the opposite result with . These results suggest that a higher decomposition level may be beneficial for analyzing certain disabilities. Confirming this, however, requires an extensive analysis of the relationship between decomposition levels and EEG channels in classification performance; because this is not a primary focus of this study, we leave it for future work. The two wavelets (‘db4’ and ‘coif3’) yielded similar classification performance regardless of the decomposition level. For the single-channel classification, RF achieved higher performance ( difference) with either wavelet. However, the 2 s segmentation with ‘coif3’ performed slightly better when analyzing the EO-data with . Furthermore, ‘coif3’ performed better than ‘db4’ in the region-based classification. Because DWT is well suited to detecting local events at different frequency scales, combining multiple wavelet features from different channels could further improve the identification of disabilities.
As discussed above, the most influential features for distinguishing the disabilities in each channel were determined with the ANOVA test. The number of features selected as significant varied with the channel, segmentation length, and feature extraction approach. For example, four EEG frequency band features were selected in the O2 channel with the 2 s segmentation for the EO-data, yielding recall scores of SVM (0.787), RF (0.807), KNN (0.780), and LR (0.784). For the EC-data, nine features were selected, yielding SVM (0.859), RF (0.857), KNN (0.843), and LR (0.858). For the wavelet features, the same number of features was selected in the O2 channel for both segmentation lengths, whereas in the C4 channel the number differed between them. With the 2 s segmentation, the recall scores were SVM (0.883), RF (0.898), KNN (0.871), and LR (0.879) for the EC-data and SVM (0.852), RF (0.868), KNN (0.844), and LR (0.852) for the EO-data. With the 1 s segmentation, they were SVM (0.813), RF (0.859), KNN (0.844), and LR (0.814) for the EC-data and SVM (0.714), RF (0.806), KNN (0.782), and LR (0.725) for the EO-data. Overall, the 2 s segmentation provided better performance, and a larger number of selected features tended to yield better performance.
To examine the neurological disabilities across age groups, we generated multiple parallel coordinates visualizations, as shown in Figure 7. The traditional parallel coordinates visualization (i.e., representing each instance as a polyline) did not reveal differences between the age groups. However, the angular histograms (bin size = 25), which present the distribution of each variable, did reveal differences. For instance, the variable ‘Cz07’ shows fat-tailed, skewed distributions, but its distribution differs across age groups: the peak appears near the centerline of the angular histogram for participants in their 20s but at the bottom edge for those in their 70s. The age-related differences were even clearer for the variable ‘C417’; for participants in their 40s, its distribution resembled a skew-normal distribution, that is, bell-shaped but leaning to one side. Although the visualization cannot reveal the primary cause of these age-dependent distributions, it highlights the need for an extensive analysis of how effectively each attribute differentiates the distributions across age groups. We performed Tukey’s Honest Significant Difference (HSD) test to examine the differences between the age groups. Most variables showed significant differences () between the age groups. For instance, the variable ‘Cz07’ in the Cz channel showed significant differences () except between the age groups 0 and 50 (), 0 and 70 (), 20 and 70 (), and 30 and 60 (). The variable ‘C417’ in the C4 channel showed significant differences () except between the age groups 20 and 30 () and 40 and 50 ().
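Pairwise age-group comparisons of a single variable can be sketched with SciPy's `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). The three groups below are synthetic and hypothetical; they only illustrate how the matrix of pairwise p-values is read, with significance judged against 0.05 as an assumed threshold.

```python
import numpy as np
from scipy.stats import tukey_hsd

rng = np.random.default_rng(0)
# Hypothetical values of one feature for three age groups.
groups = {
    "20s": rng.normal(0.0, 1.0, 60),
    "40s": rng.normal(0.1, 1.0, 60),
    "70s": rng.normal(1.0, 1.0, 60),
}
names = list(groups)
res = tukey_hsd(*groups.values())

# res.pvalue[i, j] is the adjusted p-value for group i versus group j.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        p = res.pvalue[i, j]
        print(f"{names[i]} vs {names[j]}: p = {p:.4f}, significant: {p < 0.05}")
```

Unlike running separate t-tests, Tukey’s HSD controls the family-wise error rate across all pairwise comparisons, which matters when many age groups are compared at once.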
To understand the differences between the disorders, we conducted an additional analysis by generating parallel coordinates visualizations for each condition (seven neurological disorders and one healthy condition). Figure 8 shows the eight parallel coordinates visualizations with angular histograms (bin size = 25) arranged horizontally, which reveal distinctive patterns depending on the condition. For instance, the healthy condition showed frequency distributions similar to those of Parkinson’s disease and MDD in the variables from ‘C300’ to ‘Cz06’ (from left to right in the visualization), but the remaining variables showed distinctive distributions. For example, the variables ‘Cz10’ and ‘Cz11’ in the healthy condition showed highly skewed (i.e., asymmetric) distributions (see Figure 8A). MDD and Parkinson’s disease also showed asymmetric distributions, but with different shapes (see Figure 8B and Figure 8C, respectively). More notably, Parkinson’s disease showed peaks at the bottom edge of the distributions (see Figure 8C). These findings could guide the extraction of more discriminative features for differentiating the disabilities. Chronic pain and dyslexia showed similar frequency distributions in the variables from ‘C300’ to ‘Cz06’. Burnout and OCD showed sparse frequency distributions (small histograms appearing on one side of the distribution) compared with the other conditions, possibly because the dataset contains fewer instances of these conditions.
This study has the following limitations. First, the sample sizes of the individual neurological disabilities were imbalanced. Second, we did not extensively compare each individual neurological disability with healthy subjects. Finally, we compared only two decomposition levels (i.e., six and seven) with short segmentation lengths (1 s and 2 s). A full understanding of neurological disabilities requires extensive analyses with different datasets, and various segmentation lengths should be tested to determine their effectiveness for analyzing neurological disabilities.
6. Conclusions and Future Works
This study presents a novel approach to analyzing EEG data from individuals with neurological disabilities, combining DWT, ML classification algorithms, and visualization techniques. Although various existing methods use feature extraction and machine learning, previous studies have faced challenges in accurately identifying disabilities and in improving the understanding of the features and data. To address these limitations, we introduced a DWT-based feature extraction approach for differentiating disabilities with short segmentations (i.e., 1 s and 2 s). We evaluated its effectiveness by comparing the classification performance of the wavelet features with that of features from other extraction methods (EEG frequency band, power spectral density, and raw features) in both single-channel-based and region-based classifications. We then performed a visual analysis using multiple visualization techniques. The comprehensive analysis revealed that wavelet features performed strongly in differentiating disabilities in the region-based classification, with the frontal brain region proving particularly effective. Among the classification algorithms, RF and KNN performed well in the region-based classification, with RF outperforming SVM, KNN, and LR. Across the decomposition levels and wavelets evaluated, no explicit pattern was observed. However, the ‘db4’ wavelet with the 2 s segmentation and level six decomposition yielded high classification performance in the single-channel-based classification. In the region-based classification, the ‘coif’ wavelet performed well with level seven decomposition, and ‘db4’ performed well with level six decomposition.
In future work, we will extend our study with extensive analyses comparing individual disabilities with healthy conditions across channels and in region-based classifications while considering age and gender differences. We will also conduct multiclass classification of neurological disabilities across individual channels and explore deep learning techniques for distinguishing between disabilities using the proposed features. Furthermore, we plan to test longer EEG segmentation lengths, such as 5 s, 10 s, 20 s, 30 s, and 60 s, with various wavelets to determine their effectiveness for analyzing neurological disabilities. This will allow us to examine the relationship between wavelets and decomposition levels and to uncover the distinctive characteristics of different disabilities.