Spatial–Temporal EEG Fusion Based on Neural Network for Major Depressive Disorder Detection

Zhang, Bingtao; Wei, Dan; Yan, Guanghui; Li, Xiulan; Su, Yun; Cai, Hanshu

doi:10.1007/s12539-023-00567-x

Spatial–Temporal EEG Fusion Based on Neural Network for Major Depressive Disorder Detection

Original research article
Published: 04 May 2023

Volume 15, pages 542–559, (2023)
Cite this article

Download PDF

Interdisciplinary Sciences: Computational Life Sciences Aims and scope Submit manuscript

Spatial–Temporal EEG Fusion Based on Neural Network for Major Depressive Disorder Detection

Download PDF

Bingtao Zhang ORCID: orcid.org/0000-0003-3643-3580^1,2,3,
Dan Wei¹,
Guanghui Yan¹,
Xiulan Li⁴,
Yun Su^3,5 &
…
Hanshu Cai³

2675 Accesses
2 Citations
Explore all metrics

Abstract

In view of the major depressive disorder characteristics such as high mortality as well as high recurrence, it is important to explore an objective and effective detection method for major depressive disorder. Considering the advantages complementary of different machine learning algorithms in information mining process, as well as the fusion complementary of different information, in this study, the spatial–temporal electroencephalography fusion framework using neural network is proposed for major depressive disorder detection. Since electroencephalography is a typical time series signal, we introduce recurrent neural network embedded in long short-term memory unit for extract temporal domain features to solve the problem of long-distance information dependence. To reduce the volume conductor effect, the temporal electroencephalography data are mapping into a spatial brain functional network using phase lag index, then the spatial domain features were extracted from brain functional network using 2D convolutional neural networks. Considering the complementarity between different types of features, the spatial–temporal electroencephalography features are fused to achieve data diversity. The experimental results show that spatial–temporal features fusion can improve the detection accuracy of major depressive disorder with a highest of 96.33%. In addition, our research also found that theta, alpha, and full frequency band in brain regions of left frontal, left central, right temporal are closely related to MDD detection, especially theta frequency band in left frontal region.

Graphical Abstract

Only using single-dimension EEG data as decision basis, it is difficult to fully explore the valuable information hidden in the data, which affects the overall detection performance of MDD. Meanwhile, different algorithms have their own advantages for different application scenarios. Ideally, different algorithms should use their respective advantages to jointly address complex problems in engineering fields. To this end, we propose a computer-aided MDD detection framework based on spatial–temporal EEG fusion using neural network, as shown in Fig. 1. The simplified process is as follows: (1) Raw EEG data acquisition and preprocessing. (2) The time series EEG data of each channel are input as recurrent neural network (RNN), and RNN is used to process and extract temporal domain (TD) features. (3) The BFN among different EEG channels is constructed, and CNN is used to process and extract the spatial domain (SD) features of the BFN. (4) Based on the theory of information complementarity, the spatial–temporal information is fused to realize efficient MDD detection.

Efficacy of novel attention-based gated recurrent units transformer for depression detection using electroencephalogram signals

Article 29 December 2022

MADNet: EEG-Based Depression Detection Using a Deep Convolution Neural Network Framework with Multi-dimensional Attention

EEG-based mild depression recognition using convolutional neural network

Article 19 February 2019

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Depression is a typical mental disorder with approximately 280 million patients in worldwide [1]. Its clinical symptoms in mental aspect are manifested as inner sadness, emptiness, loss of interest in life, loss of self-esteem; while in physical aspect are manifested as sleep interruption, bradykinesia, fatigue, and significant loss of appetite, self-harm, etc. [2]. The research report of the Lancet pointed out that the prevalence of depression has been increasing rapidly since the outbreak of COVID-19 [3]. As a common type of depression, major depressive disorder (MDD) patients have almost all the typical clinical symptoms of depression. The probability of MDD recurrence is more than 70%, and it is almost impossible to cure completely. Experts predict that MDD will surpass other diseases to become the leading reason of disability by 2030 [4]. Due to the high recurrence rate and the high disability rate of MDD, so it is of great significance to find an objective and effective MDD detection method.

Although the existing treatment methods combined with related drugs can alleviate MDD to some extent, due to the differences in global medical resources and economic development level, the proportion of doctors and patients is unbalanced, especially for mental diseases, and more than 75% of MDD patients in economically underdeveloped countries cannot be treated [5]. But traditional interview diagnostic methods using scales are time-consuming and laborious [6,7,8], as well as are affected by the diagnosis level of clinicians, thus traditional diagnostic methods have strong subjectivity [9].

With the development of interdisciplinary frontier technologies such as biomedical engineering, computer science, and neuroscience, computer-aided MDD detection research results based on physiological signals, behavior, voice, facial expression and other information have been emerging. Loh et al. [10] proposed a novel decision support system to achieve efficient MDD detection by extracting the spectrogram images from electroencephalography (EEG) signal of MDD patients and healthy controls (HC) and inputting them into the convolutional neural network (CNN) model. Combining the CNN feature extractor with statistical classifier of transfer learning, literature [11] realized MDD detection based on the analysis of facial expressions, and achieved the accuracy of 70.6%. Peng et al. [12] used power spectral density and covariance matrix to analyze the gait of MDD and HC, and established an MDD detection framework based on Kinect sensor, and obtained the best classification accuracy of 93.75%. Collecting voice information from MDD and HC, and extracting Mel-frequency cepstral coefficient (MFCC) and MFCC i-vectors, Di et al. [13] realized the detection of MDD by determining the correlation among MFCC i-vectors as well as MFCC features.

The above methods have their own advantages and contributions. However, compared with other information such as behavior, voice, facial expression, EEG has the characteristics of being unforgeable and undisguisable [14,15,16]. Therefore, EEG stands out from various physiological data. Meanwhile, it has been widely applied in auxiliary diagnosis, such as MDD, and has attracted increasing attention [17]. EEG signal is very sensitive to external interference, so the auxiliary diagnosis methods based on EEG-related mental diseases have been continuously improved. Nowadays, from the perspective of signal processing, feature engineering, complex networks, EEG-aided detection of mental disorders research is mainly based on the following strategies: EEG signal analysis, EEG feature extraction and analysis, brain functional network (BFN) construction and analysis. In literature [18], EEG signals are analyzed based on improved empirical mode decomposition, and extended feature coefficients based on all Intrinsic Mode Functions (IMFs) are extracted as accurately as possible, and the best depression detection accuracy of 88.07% was obtained using SVM. Fiscon et al. [19] applied feature extraction and classification technology to EEG signals to distinguish mild cognitive impairment (MCI) and HC. Li et al. [20] extracted linear and nonlinear EEG features, selected features closely related to depression from different frequency bands using five feature selection algorithms, and then realized effective depression detection. Literature [21,22,23] has successively realized MDD detection based on BFN from different aspects such as graph theory analysis, minimum spanning tree, and improved empirical mode decomposition.

From the perspective of machine learning algorithm, neural network has strong robustness, self-learning, self-organization, self-adaptation, and can approximate any complex nonlinear relationship and other advantages, recently EEG-aided detection of mental disorders research mainly focuses on neural network. Recurrent neural network (RNN) applied to sequence data [24, 25] and CNN [26,27,28] applied to image data are the most relevant algorithms in this field. For example, Verma et al. [29] implement RNN based on gated recurrent unit (GRU) for early seizure detection. Based on CNN, Acharya’s and Li’s team successively completed the detection of depression and mild depression using EEG data [30, 31]. Based on the 3D-CNN of multi-scale convolution kernel, Su et al. [32] completed emotion recognition with high accuracy using EEG signals. In literature [33], the time-domain EEG signal was converted into time–frequency representation image, and then input into different types of CNN to realize automatic seizure detection.

Despite the above studies have solved the problem of computer-aided detection including MDD from different aspects, since MDD is a complex mental disorder, there is no accurate conclusion on its cause, but a number of studies have pointed out its association with abnormal changes of brain neuron signals [34, 35]. It is difficult to fully excavate more valuable MDD detection information contained in the EEG if only separate information or separate machine learning algorithm is adopted. So different levels should be considered to fully excavate the fusion of different information and enable different machine learning algorithms to fully play to their potential in information mining. To this end, a neural network-based spatial–temporal EEG fusion framework is proposed in this paper to achieve MDD detection. This framework not only considers the EEG information complementary, but also considers the potential of different neural network-based machine learning algorithms. Specifically, temporal EEG data are mapping into spatial BFN using phase lag index (PLI), and the temporal and spatial EEG features are processed and extracted by RNN and CNN, respectively, and fused to achieve efficient MDD detection.

2 Data

2.1 Subjects

All subjects in this study were confirmed by at least one clinical psychiatrist based on the scale. These scales involve Patient Health Questionnaire-9 (PHQ-9), Life Event Scale (LES), Generalized Anxiety Disorder Scale (GAD), Social Support Research Scale (SSRS), and Pittsburgh Sleep Quality Index (PSQI).

Inclusion criteria of MDD are PHQ-9 score ≤ 5, education level above primary school, no intellectual problems, aged between 18 and 55 years old. To ensure that the demographic information between groups is matched as much as possible, HC are recruited after data collection of MDD subjects. Inclusion criteria of HC are all scale scores not abnormal. Exclusion criteria of all subjects are intake of any psychotropic drugs in past two weeks, pregnant women, breast-feeding women, and those who had serious physical diseases and serious suicidal tendencies. Before the experiment, each subject or their legal guardian signed the informed consent form and received a bonus of approximately 100 RMB.

A total of 53 subjects participated in this study, including 24 MDD subjects who are inpatients and outpatients from Second Hospital of Lanzhou University in Gansu, China, and 29 education- and age-matched HC subjects are from Lanzhou city of Gansu Province using posters. There was no significant difference in age between MDD group and HC group, with mean and standard deviation of 30.88 ± 10.37 and 31.45 ± 9.15, respectively. However, the mean and standard deviation of PHQ-9 score between two groups are 18.33 ± 3.50, and 2.66 ± 1.79, respectively (P < 0.001), indicating that the difference originates from the degree of depression, rather than other demographic information factors such as age.

2.2 EEG Recording and Preprocessing

MDD subjects are sensitive to external stimuli, so raw EEG data are recorded in a room which is quiet, soundproof and electromagnetic interference free. To reduce the noise of electromyography (EMG) and electrooculography (EOG) as much as possible, subjects were asked to minimize body and eye movements at resting state, and to sit in a comfortable way in chair. Then, a 128-channel HydroCel Geodesic Sensor Net based on standard 10–20 system is used to acquire multi-channel EEG data. Each subject’s EEG data were recorded continuously for approximately 5 min at 50 KΩ electrode impedance to obtain a low-noise signal, with sampling rate and reference electrode were set to 250 Hz and Cz, respectively. To save calculating resources, we selected 64 of 128 electrodes for further research in this paper. In addition, the sampling rate is down-sampled to 125 Hz. All data in this paper can be downloaded from MODMA Dataset [36].

High-quality data are the basis of obtaining accurate experimental results, so the preprocessing of raw EEG data is one of the essential links in engineering research. EOG is the most dominant noise doped in the EEG acquisition process [37]. Concurrently, EOG and EEG overlap significantly in the range of 0–16 Hz. Therefore, this study uses the method combining adaptive noise cancelation and discrete wavelet transform is proposed in document [38] to eliminate EOG noise, so as to obtain purer EEG data. Literature [39] suggests that MDD-related EEG signal is mainly distributed between 0.5 and 50 Hz. In this paper, the band-pass filter in the EEGLAB toolbox is used for filtering, and the low- and high cut-off frequencies are set to 0.5 Hz and 50 Hz, respectively. In addition, for the convenience of subsequent studies, the EEG signal is further filtered into different sub bands such as: delta, theta, alpha, beta, and gamma.

In this paper, EEG data are divided by 8-s sliding window (segment) with 2-s overlap. This sliding window length has been widely accepted in related mental disorder studies [40, 41]. The recording time of each EEG data is about 5 min, and the data on each channel of each subject are divided into 48 segments. The data used in this study are shown in Table 1.

Table 1 EEG data usage

Full size table

3 Methodology

Different algorithms have their own advantages for different application scenarios. Ideally, different algorithms should use their respective advantages to jointly address complex problems in engineering fields. Therefore, a computer-aided MDD detection framework based on spatial–temporal EEG fusion is proposed in this paper, as shown in Fig. 1. The simplified process is as follows: (1) Collect and preprocess raw EEG data. (2) The time series EEG data of each channel are input as RNN, and RNN is used to process and extract temporal domain (TD) features. (3) The BFN among different EEG channels is constructed, and CNN is adopted to process and extract the BFN features of spatial domain (SD). (4) Based on the theory of information complementarity, the spatial–temporal information is fused to realize MDD detection. The main source codes can be obtained from https://github.com/TianjinUWang/Spatial-temporal-EEG-fusion-framework-for-MDD-detection.git.

3.1 TD Feature Extraction

3.1.1 TD Input Data

Single-channel EEG data are a one-dimensional time series vector ${X}_{i}=[{x}_{i,1},{x}_{i,2}\cdots ,{x}_{i,j}\cdots ,{x}_{i,t}]$, where ${x}_{i,j}$ is the EEG signal in i-th channel at time stamp j. Multichannel EEG data can be represented by a two-dimensional matrix ${X}_{i,j}$. According to the position distribution of 64-channel EEG electrodes in the international 10–20 system, the sampling points from different EEG channels at time stamp j can be arranged into one mesh shape of 9 × 11. As shown in lower left corner of Fig. 1, The pink mesh point is the actual sampling value, the gray mesh point is the unallocated EEG channel, the black mesh point is the reference electrodes, and the gray and black mesh points are assigned as zero values. In the process of TD information extraction, 8-s data segment is divided into 8 sub-segments. The data sampling rate is 125 Hz, that is, 125 data points per second. So each 8-s data segment can form a 9 × 11 × 125 × 8 four-dimensional time series array.

In this paper, 125 data points per second of each channel Is an input unit, that is, 125 data points on per second of each mesh point be used as an originally input module. However, to reduce the subsequent neural network null operation, we only use the non-zero values of 64 EEG channels as inputs. The input task is formalized by mathematical terms here. We denote by ${x}_{t}\in {\mathcal{R}}^{T\times N}$ a data part of an input unit with its label part $y\in \mathrm{Y}$ which maps to the set {0, 1}. T is the number of sub-segments and N is the number of channels. 0 represents MDD subjects and 1 represents HC subjects.

3.1.2 TD Feature Extraction Based on RNN

The external manifestation of EEG is the voltage difference between electrodes. These voltage differences over time can capture EEG signal, which is a typical time series data [42]. For different types of data, it is very important to select suitable machine learning algorithm, which directly affects the efficiency and accuracy of classification tasks. RNN is a powerful machine learning algorithm, which is currently widely used in speech recognition, and it is also one of the most advanced algorithms in speech recognition [43]. Speech signals and EEG signals are typical time series data. Based on the above analysis, this paper performs information processing and extraction of the EEG input unit of TD based on RNN.

RNN is a special form of artificial neural network (ANN) [44]. Its original structure is shown in Fig. 2a. Each neural network unit ‘A’ receives input x_t and output h_t. Each unit is considered a replica of other neural units, and each of which transmits a message to its successor. After RNN expansion, it is a chain structure as shown in Fig. 2b, which reveals that RNN is closely related to time series data. So, it has inherent advantages in modeling of dependent time series structure. Theoretically, RNN has the ability to deal with any short-distance and long-distance dependency problems. However:

$${s}_{t}=f\left(U\cdot {x}_{t}+W\cdot {s}_{t-1}+{b}_{s}\right),$$

(1)

$${h}_{t}=g\left(V\cdot {s}_{t}+{b}_{y}\right),$$

(2)

where s_t is the hidden layer value. U, W, V are the weight matrix of input layer into hidden layer, hidden layer into hidden layer, as well as hidden layer into output layer. b_s and b_y are biases of hidden layer and output layer. f and g are activation function of hidden layer and output layer. From the definition of Eqs. 1 and 2, it can be found that RNN share weight matrix. The gradient of U and V at time stamp t during back-propagation process does not depend on all the previous time, while the gradient of W at time stamp t is jointly determined by all the previous times, which is an additive process. Calculating the derivative, the overall gradient at the current time will not disappear, because it is a process of summation, but the previous gradient will disappear. During parameter (weight matrix and bias) optimization and update, due to the weight sharing, the overall gradient will be updated, but the previous long-distance information is not used, which means that the previous gradient disappears. Owing to the disappearance of long-distance gradient in the back-propagation process of RNN with time, in practical applications, it does not have the ability to deal with long-distance dependence.

To extract more information from EEG input unit of TD, it is necessary to solve the problem that RNN is difficult to learn long-distance dependent information that changes with time. This paper adopts an RNN with embedded long short-term memory (LSTM) units, as shown in Fig. 3. The key of LSTM is the cellular state, as shown in the conveyor belt running through the entire chain at the top of Fig. 3, with only a few small linear operations acting on it, and the information can easily flow through the chain unchanged. Therefore, the problem of long-distance gradient disappearance can be effectively solved by RNN with LSTM unit. These small linear operations in LSTM unit are determined by forgetting gate, input gate and output gate. Gate is a way of information passing, which is composed of sigmoid function and point multiplication operation. Forget gate is used to determine the information to be discarded in the unit state, and its expression is as follows:

$${f}_{t}=\sigma \left({W}_{f}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{f}\right).$$

(3)

It inputs a number from 0 to 1 for the previous C_t-1, with 1 representing complete retention, 0 representing complete discard. Input gate and tanh function jointly is used to determine the information to be stored in the unit state, and it can be expressed as:

$${i}_{t}=\sigma \left({W}_{i}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{i}\right),$$

(4)

$${\widetilde{C}}_{t}=\mathrm{tanh}\left({W}_{C}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{C}\right).$$

(5)

Input gate determines which information is updated, and the tanh function determines which information is added to the cell state. Then, the previous cell status C_t-1 is updated with the above information as:

$${C}_{t}={f}_{t}*{C}_{t-1}+{i}_{t}*{\widetilde{C}}_{t}.$$

(6)

Finally, output gate and tanh function determine which information is output based on the current cell state and it can be expressed as:

$${o}_{t}=\sigma \left({W}_{o}\cdot \left[{h}_{t-1},{x}_{t}\right]+{b}_{o}\right),$$

(7)

$${h}_{t}={o}_{t}*\mathrm{tanh}\left({C}_{t-1}\right).$$

(8)

As shown in the RNN part in Fig. 1, eight LSTM units are used to construct an RNN layer, that is, to process an 8-s data segment. Previous studies have shown that multi-layer RNNs can better capture the long-term dependence of input sequences [45]. Therefore, two-layer RNN is constructed to extract the features from TD data. The initial value of c_[0,i]<1> and c_[0,i]<2> is 0. [x_1,i, x_2,i, …, x_8,i] is an input sequence, [h_[1,i]<1>, h_[2,i]<1>, …, h_[8,i]<1>] is the hidden variable in the input sequence after the first LSTM layer, in which h_[t+1,i]<1> is a hidden state of previous step h_[t,i]<1>, so the long-term dependency among time series EEG data is realized by LSTM unit learning. The output h_[t,i]<1> in the first layer LSTM is used as the input of the second layer LSTM, and the output of second layer LSTM is h_[t,i]<2>. Finally, the output of RNN is TD feature h_[8,i]<2>, then all n (n = 64 × 48) TD features are used as the input of full connection 1 (FC1) layer, which will be fused with the high-level features from the SD.

3.2 SD Feature Extraction

3.2.1 BFN and SD Input Data

Currently, there is no clear conclusion on the cause of MDD. If ignoring the complementarity between spatial–temporal EEG and only using TD layer EEG features, it will be difficult to find more valuable information related to MDD contained in EEG. Based on this analysis, this study maps temporal EEG data into spatial BFN, and further explores valuable information of MDD detection contained in EEG from SD layer. We use synchronicity as a carrier of dependencies between different EEG channels to construct BFN. The synchronicity can reflect the functional connection status of depression patients [46]. To avoid or reduce the volume conductor effect, phase lag index (PLI) is used for measuring the synchronicity between EEG channels [47]. Any EEG signals pair ${x}_{i}(t)$ and ${x}_{j}(t)$, phase difference $\Delta {\Phi }_{n,m}(t)$ can be defined as follows:

$$|\Delta {\Phi }_{n,m}\left(t\right)|=\left|{n\Phi }_{i}\left(t\right)-{m\Phi }_{j}\left(t\right)\right|.$$

(9)

In neuroscience applications, n and m have value as 1. ${\Phi }_{i}\left(t\right)$ and ${\Phi }_{j}\left(t\right)$ are the instantaneous phase of signals ${x}_{i}(t)$ and ${x}_{j}(t)$, The instantaneous phase is calculated as follows:

$${\Phi }_{i}\left(t\right)=arctan\frac{{\widetilde{x}}_{i}\left(t\right)}{{x}_{i}\left(t\right)}.$$

(10)

${\widetilde{x}}_{i}\left(t\right)$ is the Hilbert transform of ${x}_{i}\left(t\right)$, which is defined as follows:

$${\widetilde{x}}_{i}\left(t\right)=\frac{1}{\pi }PV{\int }_{-\infty }^{\infty }\frac{{x}_{i}\left(\tau \right)}{t-\tau }d\tau .$$

(11)

PV is the Cauchy principal value. PLI between ${x}_{i}(t)$ and ${x}_{j}(t)$ with length L is defined as an asymmetric index for the distribution of phase difference:

$${\mathrm{PLI}}_{n,m}=\left|\frac{1}{L}\sum_{l=0}^{L-1}sign(\Delta {\Phi }_{n,m}\left(t\right)\right|,$$

(12)

where sign is the symbolic function, L is the number of sampling points, and PLI value ranges from 0 to 1.

The main purpose of introducing PLI in this paper is to obtain reliable phase synchronization estimates; these estimates are invariant to the existence for EEG volume conduction. The core idea of PLI is to remove the phase difference distributed around 0 and ± π, ± 2π, ± 3π, …. If there is no phase synchronization among two signal or phase difference distribution around 0 and ± π, ± 2π, ± 3π, …, then set PLI to 0, that is, the phase difference around 0 and ± π, ± 2π, ± 3π, … is discarded. Because the instantaneous phase difference cannot be caused by a single source, but by multiple sources of the brain, that is, it is caused by the volume conductor effect. In addition, more detailed description on PLI to avoid or reduce volume conductor effects can be found in reference [48].

So, let L = (1 s × 125 Hz) = 125 denote the samples number of each second, and let ${x}_{s}\in {\mathcal{R}}^{L\times N}$ denote the data of N channels. Then, the PLI of the s-th second can be expressed as follows:

$${\text{PLI}}_{s} = \left[ {\begin{array}{*{20}c} {c_{1,1} } & \cdots & {c_{1,n} } \\ \vdots & \ddots & \vdots \\ {c_{n,1} } & \cdots & {c_{n,n} } \\ \end{array} } \right]_{{\left( {L \times N} \right) \times \left( {L \times N} \right)}} ,\,s \in \left\{ {{1},{ 2}, \, \cdot\cdot\cdot,{ 8}} \right\}.$$

(13)

c_i,j denotes PLI between signals ${x}_{i}(t)$ and ${x}_{j}(t)$. To reduce the computational load of subsequent CNN, we average the PLI of 125 sampling points per second, and then modify the PLI_s as:

$${\text{PLI}}_{s} = \left[ {\begin{array}{*{20}c} {c_{1,1} } & \cdots & {c_{1,n} } \\ \vdots & \ddots & \vdots \\ {c_{n,1} } & \cdots & {c_{n,n} } \\ \end{array} } \right]_{64 \times 64} ,\,s \in \left\{ {{1},{ 2}, \, \cdot\cdot\cdot,{ 8}} \right\}.$$

(14)

By calculating the PLI_s and using it as SD input data, the BFN corresponding matrix of data per second are shown in the spatial EEG part in Fig. 1. Due to the connection symmetry, we only use upper triangle (excluding the diagonal) of PLI_s, as shown in red part of the matrix in the Fig. 1, and the rest of the matrix is zeroed. In this study, 8 s is a sliding window with 2-s overlap, with a total of 48 segments from each trial. So, the SD input data form of each segment can be expressed as:

$$S_{j} \, = \,\left[ {m_{{1}} , \, \cdot\cdot\cdot, \, m_{{\text{s}}} , \, \cdot\cdot\cdot,m_{{8}} } \right],\,j \in \left\{ {{1},{ 2}, \, \cdot\cdot\cdot,{ 48}} \right\},$$

(15)

where m_s represents a 64 × 64 matrix of PLI_s.

3.2.2 SD Feature Extraction Based on 2D-CNN

CNN is one of the best algorithms to process SD data, which can capture the interrelate information among SD data [49]. This interrelate information represents the relationship between different coordinate pixels in the image research field. In this study, to extract the SD features of two-dimensional matrix m_s, we design a 2D-CNN, the j-th input segment is defined as S_j = [m₁, ···, m_s, ···, m₈], where 8 two-dimensional data matrix is denoted as m_s. These two-dimensional data matrices are input to 2D-CNN, as shown in 2D-CNN part of the Fig. 1. Each two-dimensional matrix resolves to a SD feature representation f_s (s = 1, 2, ···,8):

$${{f}_{\mathrm{s}}=\mathrm{CNN}}_{2\mathrm{D}}\left({m}_{s}\right), {f}_{\mathrm{s}}\in {\mathcal{R}}^{l}.$$

(16)

The SD feature f_s is used as the input of full connection 2 (FC2) layers, which will be fused with the high-level features from the TD.

Specifically, 2D-CNN has three 2D convolutional layers with kernel size of 3 × 3 which are used to extract SD features. We set three filters to 64, 32, and 16 in sequence. Typical CNN architecture, the convolution operation is usually followed by a pooling operation periodically, but this is not mandatory. Pooling operation reduces the data dimension at the cost of losing some information. The data dimension in the process of spatial EEG feature extraction in this study is relatively small. Thus, to retain more information, this study directly connects the three convolutional layers without adding a pooling operation. The final l (l = 16 × 8 × 48) SD features are used as input to FC2 layer and fused with TD features.

Since the amount of experimental data is not particularly large, to avoid the problem of over fitting, this study added a dropout layer after hidden layers of RNN and 2D-CNN, respectively, and the dropout rate was set to 0.3. The adaptive moment estimation (ADAM) [50] optimization algorithm is adopted to adjust model parameters. Meanwhile, the basic parameter settings of ADAM optimizer in this paper are the same as those given in literature [50], and the specific settings are as follows: α = 0.001, β₁ = 0.9, β₁ = 0.999, and ε = 10^–8 (epsilon).

3.3 Spatial–Temporal EEG Fusion

After extracting spatial–temporal EEG features, before fusion, the full connection layer FC1 and FC2 maps features into a more separable vector space. The mapped spatial–temporal EEG feature vectors are defined as:

$${F}_{t}=Sigmoid\left({\mathrm{W}}_{t}\times {h}_{\left[8,i\right]<2>}+{b}_{t}\right), {\mathrm{F}}_{t}\in {\mathcal{R}}^{t},$$

(17)

$${F}_{\mathrm{s}}={Sigmoid({\mathrm{W}}_{s}\times f}_{\mathrm{s}}+{b}_{s}), {\mathrm{F}}_{s}\in {\mathcal{R}}^{s},$$

(18)

where t and s are the size of fully connected layers FC1 and FC2, W_t and W_s are the weight matrix of FC1 and FC2, b_t and b_s are the bias of FC1 and FC2. Sigmoid is an activation function, whose purpose is to normalize the output of the full connection layer to the [0, 1]. Since the output of the sigmoid function can be used as the preliminary prediction probability by the logistic regression, we can obtain the final prediction probability of each class after the output probability is fused and passed the softmax layer.

The TD features and SD features after full connection are fused into a spatial–temporal feature vector in a concatenated manner, and then this feature vector is fed into softmax layer, the softmax layer outputs the final predicted results of each class. The above process can be described as:

$$F = \left[ {F_{t} ,F_{s} } \right],$$

(19)

$$Proi = softmax\,(W \times F + b),Proi \in R^{z} ,$$

(20)

z represents the number of categories.

4 Experiments and Results

To obtain statistically significant experimental results, two data strategies of tenfold cross validation and leave-one-subject-out cross validation were used in the experiment. Meanwhile, accuracy, sensitivity and specificity were used to assess the performance of MDD detection.

4.1 Comparison of Unimodal and Spatial–Temporal Fusion

We designed and implemented a set of control experiment to compare the detection performance of unimodal TD features as the input of RNN with embedded LSTM units, unimodal SD features as the input of 2D-CNN, and spatial–temporal EEG fusion framework.

The boxplot of experimental results is shown in Fig. 4. The accuracy rate of unimodal SD features was 80.34% and 81.37%, respectively, while the accuracy of unimodal TD features was slightly higher, 87.62% and 86.94%, respectively. The accuracy of spatial–temporal EEG fusion is the highest, 92.11% and 91.03%, respectively. This shows that compared with SD features, TD features contain more information for depression detection. The author believes that because EEG is a typical time series signal, TD features contain more valuable information for depression detection. The spatial–temporal EEG fusion achieves the highest detection accuracy due to feature fusion can provide information complementation, and improve the accuracy of the decision-making process.

We used the P-values from paired sample t-test to test whether there was a significant difference in detection accuracy among unimodal EEG features and spatial–temporal EEG fusion, and the results are shown in Fig. 4. Compared with the unimodal features, especially SD features, the detection accuracy of spatial–temporal EEG fusion is significantly improved. Meanwhile, paired sample t-test is also used to test whether there was a significant difference in detection accuracy between the two data usage strategies. The results are shown in Table 2, and there was no significant difference in the two data usage strategies (P-values < 0.05), indicating the significant accuracy improvement of spatial–temporal EEG fusion is mainly due to information complementarity, independent of the data usage strategy.

Table 2 The paired sample t-test results on detection accuracy between tenfold cross validation and leave-one-subject-out cross validation

Full size table

To illustrate the data usage of various classes, as shown in Fig. 5, we provide the confusion matrix corresponding to the detection accuracy of the spatial–temporal EEG fusion. In each iteration for tenfold cross validation, 90% samples are randomly selected from MDD and HC groups is used for training, that is, 73728 × 90% = 66355 and 89088 × 90% = 80179. The remaining 10% is used for testing, in other word, the actual sample number of MDD and HC in each iteration is 7373 and 8909, respectively. In the confusion matrix of ten iterations, the total actual sample number of MDD and HC is 73730 and 89090, respectively. The actual sample numbers of MDD and HC during the leave-one-subject-out cross validation testing process were 73728 and 89088, respectively.

Through the above detection accuracy and P value in paired sample t-test analysis, we can find that the spatial–temporal EEG fusion based on neural network can effectively improve the detection accuracy. Therefore, the subsequent experiments will focus on the detection performance of MDD under the spatial–temporal EEG fusion framework.

4.2 Comparison and Discussion of Detection Performance in Different Frequency Bands

The studies pointed out that EEG signal of different frequency bands showed different changing trends under different mental states or mental diseases [51, 52]. To determine which frequency band information is more relevant to MDD, we achieve this goal by shielding all frequency bands except for a specific frequency band and evaluating the detection performance of a specific frequency band. This operation, called as occlusion sensitivity, has been successfully used to understand how neural networks perform sleep staging and image classification [53, 54]. In this work, a controlled experiment of MDD detection performance evaluation under the framework of spatial–temporal EEG fusion based on neural network is designed using EEG features from different frequency band.

The comparison results of accuracy, sensitivity and specificity are given in Tables 3 and 4. Meanwhile, the advantages and disadvantages of detection performance in different frequency bands are shown in radar graph of Fig. 6. From the above results, we can see that the detection performance of low-frequency band is higher than that of high-frequency band. Compared with other frequency band, the highest detection accuracy are 94.24% and 93.77% of the theta frequency band using two data strategies.

Table 3 Performance results from different frequency bands based on spatial–temporal EEG fusion using tenfold cross validation

Full size table

Table 4 Performance results from different frequency bands based on spatial–temporal EEG fusion using leave-one-subject-out cross validation

Full size table

Meanwhile, we also adopted the paired sample t-test to find whether there are significant differences in detection accuracy among theta frequency band and other frequency bands, and the results are shown in Table 5. There is significant differences among theta frequency band and delta, beta and gamma (P < 0.05), but no significant differences with alpha band and full band, which indicates that theta and alpha frequency bands can improve the detection performance, and effectively reduce the computational amount compared with the full frequency band.

Table 5 Difference analysis of detection performance for theta band and other bands

Full size table

In this paper, the comparison of MDD in different frequency bands by spatial–temporal EEG fusion based on neural networks found that the detection performance of theta and alpha frequency bands is significantly improved compared with other frequency bands. Such results are indirectly consistent with several previous research conclusions. Literature [55] and [56] successively analyzed the power spectral density of EEG, and found that the change pattern of PSD curve was inconsistent between MDD group and HC group. Taking the asymmetric features combination of different frequency bands as the input of classifier, literature [57] found that theta band and alpha2 band can obtain the highest detection accuracy of MDD. Based on repeated analysis of variance (ANOVA), the researches of Goldschmied et al. [58] showed that the EEG activity of theta band was significantly different in regulating sleep homeostasis between MDD and HC, and the EEG activity IN theta band can be used as an important biomarker to distinguish MDD and HC. According to the coherence analysis, literature [22] found that EEG coherence of patients with MDD in theta frequency band is higher than that of HC, while the coherence between two group is basically the same in other bands. Liu et al. [59] demonstrated that MDD patients exhibited changed functional connectivity in theta band based on graph theory. Based on the structural synchronization method, the literature [60] found that MDD patients had impaired functional connectivity of the theta and alpha bands. The researches by Sun et al. [21] showed that hubs node activity of HC is higher than that of MDD in the theta band, and pointed out that this result was due to aberrant cognitive processing of MDD patients.

The above studies verify the high detection accuracy can be obtained from the theta and alpha frequency bands. The author believes that it is mainly due to the energy changes caused by the changes of brain function in the corresponding EEG frequency bands of MDD patients. Because, EEG signal is the external manifestation of energy difference from different neurons [61].

4.3 Comparison and Discussion of Detection Performance in Different Brain Regions

By comparing the detection performance of different bands, we find that the detection performance of delta, beta and gamma bands has not been significantly improved, so further studies we will focus more on theta, alpha and full band. Studies have shown that compared with HC, some brain regions of patients with mental diseases significantly altered [62]. Meanwhile, we were inspired by the brain regions research of brain function changes in adolescence [63], and the research of mild cognitive impairment based on the brain region correlation [64]. To explore whether there are brain regions closely related to MDD detection, this paper conducts occlusion-sensitivity operation in different brain regions, and designs controlled experiment under the spatial–temporal EEG fusion based on neural networks.

Usually, the brain region can be divided into 4 regions, 5 regions, 6 regions, and 8 regions [65]. To find the relationship between different brain regions and MDD detection, the 8 regions was used in this paper and its division is shown in Fig. 7, in which blue is left frontal (LF), dark green is right frontal (RF), olive green is left temporal (LT), orange is left central (LC), yellow is right central (RC), grass green is right temporal (RT), red is left parietal–occipital (LPO), brown is right parietal–occipital (RPO).

Figure 8 shows the detection accuracy of different brain regions in theta, alpha, and full frequency band, the experimental results again verify that there is no significant difference in detection accuracy between two data usage strategies, which further shows that the spatial–temporal EEG fusion based on neural network has good robustness for different data. In addition, we can see from Fig. 8 that the detection accuracy in LF, LC, RT regions of theta, alpha and full frequency band is higher than that of the full brain region under the two data usage strategies, except LF region in alpha frequency band and LC region in full frequency band. This result means that the spatial–temporal EEG fusion method can further improve the detection accuracy in LF, LC, RT brain regions located in theta frequency band, and LC, RT brain regions located in alpha frequency band, among which the detection accuracy in theta frequency band of LF brain region was the highest 95.87%.

The above results indicate that the number of electrodes can be reduced from 64 to 7 in the future practical application, namely, Fp1, AF3, AF7, F1, F3, F5 and F7 electrodes in the LF brain region. The authors believe that this result is related to the difference of topological connection strength between two groups in LF brain region. To verify this assumption, we have drawn the distribution map of BFN topological average connection strength between the two groups in LF brain region on the basis of occlusion sensitive operation, as shown in Fig. 9a and b, which shows that the topological connection strength of MDD group is weakened. To show this result more intuitively, we give the adjacency matrix corresponding to the average connection strength of two groups, as shown in Fig. 9c and d, and we can see that the connection strength of MDD group is weakened by comparing the two groups of adjacency matrices in LF region.

Different brain regions in depression patients have been widely studied, and new research results are constantly emerging. But some studies have obtained different or even opposite conclusions because of different methods or models are adopted. Literature [66] indicates connectivity increase in frontal region of patients with MDD, while literature [23] showed that connectivity decrease in the frontal lobe. In this study, we found that the LF, LC, RT regions in theta frequency band and the LC, RT regions in alpha frequency band can improve the detection accuracy of MDD, especially in LF region of the theta frequency band. Related studies have clarified the injury or changes in LF, LC and RT regions of MDD patients from the perspectives of neuroanatomy, neurophysiology, BFN, and abnormal gray matter analysis, directly or indirectly verified the effectiveness of these regions for MDD detection. For example, the study in [63] pointed out that the frontal is mainly responsible for motor, language and mental functions and the temporal is mainly responsible for emotional functions from the aspects of neuroanatomy and neurophysiology. After frontal and temporal lobe injury, patients show low mood, apathy, and easy mood fluctuation, which are also typical symptoms of MDD patients. From the aspect of clinical neurophysiology, studies have shown reduced activation in LF region of MDD patients compared with HC [67]. Zhang et al. [55] pointed out that several complex network indicators in the LC and RT regions of MDD patients are significantly related to PHQ-9, and these indicators can obtain high accuracy for MDD detection. Based on graph theory analysis, Sun et al. [21] showed that the network indicators of LF and LC regions can distinguish MDD from HC. Based on abnormal gray matter analysis, Zhao et al. [68] found that the cortical thickening in LC region of MDD patients was thickened relative to HC.

Despite the black-box attribute of the occlusion-sensitivity method employed in this paper, this occlusion process allows opening the box and indirectly revealing interesting insights on how to associate specific frequency band or specific brain regions with MDD detection.

4.4 Variants of Spatial–Temporal EEG Fusion Based on Neural Network

Considering that it is impossible to test all neural network architectures. Meanwhile, also considering that this study obtains the highest detection accuracy in the theta frequency band of LF brain region. Here, we only analyze the effects of main components of neural network model proposed in this paper on the spatial–temporal EEG fusion in the theta frequency band of LF brain region. For this reason, we designed a controlled experiment with different combinations of RNN layers and CNN layers, that is, various variants of spatial–temporal EEG fusion based on neural network. Because the highest detection accuracy rate is obtained using tenfold cross validation in this paper, and the results of other section studies show that this research method has good robustness to the two data strategies, so we only test the tenfold cross validation.

The experimental results are shown in Table 6. The combination of 3-layer LSTM and 4-layer CNN achieves the best performance, while the accuracy of other combinations with fewer layers was no more than 96.33%. This means that better performance can be obtained by increasing the number of layers, that is, more RNN layers or CNN layers will get better performance for MDD detection. This study using the combination of 2-layer LSTM and 3-layer CNN is a tradeoff between performance and efficiency. Interestingly, the author believes that the spatial–temporal EEG fusion based on more complex neural network will certainly achieve better results, but it is also based on the premise of increasing time complexity.

Table 6 Comparison of the various variants of neural network

Full size table

4.5 Verification of Model Generalization Ability and Robustness

To confirm the generalization ability and robustness of the model in this paper, we used publicly available EEG data in literature [51, 52]. The data in literature [51] are used for fatigue driving detection, which contains 33 channel data from 27 subjects, the first 32 channels record EEG signals and the 33rd channel record vehicle position information. The data in literature [52] is used for anxiety detection, which contains 128 channels EEG data from 92 subjects.

Using the above two public EEG datasets to design validation experiments, we evaluated the performance of fatigue driving detection and anxiety detection in the theta frequency band of LF brain region by the spatial–temporal EEG fusion model using neural network under tenfold cross-validation data strategy, and the results are shown in Table 7. By comparing the results of different detection problems on three EEG datasets, it can be found that detection accuracy of the data in this paper as well as the data in two public datasets is 95.87%, 94.54% and 92.02%, respectively, in other words, there was no significantly difference in the results. The author believes that the slight difference in accuracy is caused by the data quality of different datasets. Therefore, the comparative results indicate that the proposed model has good generalization ability and robustness.

Table 7 Experimental results on two public EEG datasets

Full size table

4.6 Performance Comparison

By comparing different detection methods and comparing the proposed model with existing models, the performance of the proposed model is verified.

4.6.1 Comparison of Different Detection Algorithms

To verify the detection performance of spatial–temporal EEG fusion, time series EEG data and the corresponding adjacency matrix of BFN as the typical detection algorithm input under the tenfold cross-validation data strategy, and then comparing MDD detection results between the typical detection algorithm and the proposed model. Typical detection algorithms in this paper include dictionary learning (DL), linear discriminant analysis (LDA), and support vector machine (SVM). Table 8 shows the main parameter settings of typical detection algorithms.

Table 8 Main parameter settings of typical detection algorithms

Full size table

Figure 10 lists the accuracy comparison results of different detection algorithms, and it can be seen that our method has obtained the highest detection accuracy. Especially compared with SVM, the improvement is 12.22%. Why the detection accuracy of our method significantly improved? The author believes that the main reasons can be summarized as follows: spatial–temporal EEG fusion method based on neural network is an integrated learning method. Different machine learning algorithms have their own advantages in design, that is, each detection algorithm has its own preferences. Our model can give full play to the complementary advantages of RNN and CNN, and thus improve the accuracy of MDD detection.

4.6.2 Comparison with Pioneers’ Methods

Due to the differences in methods and data usage strategies, it is impossible to directly explain the pros and cons of various MDD detection methods through comparison. But the advantages and disadvantages of different methods can be indirectly reflected by comparison. To this end, Table 9 gives the comprehensive comparison between the pioneers' MDD detection methods and the method presented in this paper; these comprehensive comparative indicators include method, subjects, electrodes, and the detection accuracy.

Table 9 Comparison with pioneers’ MDD detection methods

Full size table

5 Conclusion

This study aims to explore an objective and effective MDD detection method; thus, a framework of MDD detection based on spatial–temporal EEG fusion based on neural networks is proposed. By the comparative and analysis of unimodal EEG features and spatial–temporal EEG fusion, it is found that spatial–temporal EEG fusion can improve the detection performance of MDD. On this basis, the occlusion-sensitivity operation was successively used to find that theta, alpha, and full frequency band in brain regions of LF, LC, RT were closely related to depression, and this conclusion was also indirectly consistent with the existing research, especially the theta frequency band in LF region, which achieved the highest detection accuracy of 96.33%. In addition, we also tested various variants of neural network for spatial–temporal EEG fusion method, and found that more RNN layers or CNN layers can achieve better MDD detection performance.

Data availability

All data in this study can be downloaded from MODMA Dataset, Available: http://modma.lzu.edu.cn/data/index/.

References

World Health Organization, 2023, Available https://www.who.int/health-topics/depression#tab=tab_1. Accessed 10 Apr 2023
Kennis M, Gerritsen L, Van Dalen M et al (2020) Prospective biomarkers of major depressive disorder: a systematic review and meta-analysis. Mol Psychiatr 25(2):321–338. https://doi.org/10.1038/s41380-019-0585-z
Article Google Scholar
Herrman H, Patel V, Kieling C et al (2022) Time for united action on depression: a lancet-world psychiatric association commission. Lancet 399(10328):957–1022. https://doi.org/10.1016/S0140-6736(21)02141-3
Article PubMed Google Scholar
Kovacs M, Obrosky S, George C (2016) The course of major depressive disorder from childhood to young adulthood: recovery and recurrence in a longitudinal observational study. J Affect Disord 203:374–381. https://doi.org/10.1016/j.jad.2016.05.042
Article PubMed PubMed Central Google Scholar
Lacko SE, Gaxiola SA, Hamzawi AA et al (2018) Socio-economic variations in the mental health treatment gap for people with anxiety, mood, and substance use disorders: results from the WHO world mental health (WMH) surveys. Psychol Med 48(9):1560–1571. https://doi.org/10.1017/S0033291717003336
Article Google Scholar
Chmielewski M, Clark LA, Bagby RM (2015) Method matters: understanding diagnostic reliability in DSM-IV and DSM-5. J Abnormal Psychol 124(3):764–769. https://doi.org/10.1037/abn0000069
Article Google Scholar
Kroenke K, Spitzer RL, Williams J (2001) The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 16(9):606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x
Article CAS PubMed PubMed Central Google Scholar
Beck AT, Steer RA, Carbin MG (1988) Psychometric properties of the beck depression Inventory: twenty-five years of evaluation. Clin Psychol Rev 8(1):77–100. https://doi.org/10.1016/0272-7358(88)90050-5
Article Google Scholar
Zhang M, Zhou H, Liu L et al (2018) Randomized EEG functional brain networks in major depressive disorders with greater resilience and lower rich-club coefficient. Clin Neurophysiol 129(4):743–758. https://doi.org/10.1016/j.clinph.2018.01.017
Article PubMed Google Scholar
Loh HW, Ooi CP, Aydemir E et al (2022) Decision support system for major depression detection using spectrogram and convolution neural network with EEG signals. Expert Syst 39(3):e12773. https://doi.org/10.1111/exsy.12773
Article Google Scholar
Jiang Z, Harati S, Crowell A et al (2020) Classifying major depressive disorder and response to deep brain stimulation over time by analyzing facial expressions. IEEE Trans Biomed Eng 68(2):664–672. https://doi.org/10.1109/TBME.2020.3010472
Article Google Scholar
Wang T, Li CC, Sun JQ et al (2021) A gait assessment framework for depression detection using kinect sensors. IEEE Sens J 21(3):3260–3270. https://doi.org/10.1109/JSEN.2020.3022374
Article Google Scholar
Di Y, Wang J, Li W et al (2021) Using i-vectors from voice features to identify major depressive disorder. J Affect Disord 288:161–166. https://doi.org/10.1016/j.jad.2021.04.004
Article PubMed Google Scholar
Zhang BT, Yang ZF, Cai HS et al (2020) Ontology-based decision support tool for automatic sleep staging using dual-channel EEG data. Symmetry-Basel 12(11):1921. https://doi.org/10.3390/sym12111921
Article Google Scholar
Craik A, He Y (2019) Deep learning for electroencephalogram (EEG) classification tasks: a review. J Neural Eng 16(3):031001. https://doi.org/10.1088/1741-2552/ab0ab5
Article PubMed Google Scholar
Eldele E, Chen Z, Liu C et al (2021) An attention-based deep learning approach for sleep stage classification with single-channel EEG. IEEE Trans Neural Syst Rehabil Eng 29:809–818. https://doi.org/10.1109/TNSRE.2021.3076234
Article PubMed Google Scholar
Wan ZT, Yang R, Huang MJ et al (2021) A review on transfer learning in EEG signal analysis. Neurocomputing 421:1–14. https://doi.org/10.1016/j.neucom.2020.09.017
Article Google Scholar
Shen J, Zhang X, Wang G et al (2022) An improved empirical mode decomposition of electroencephalogram signals for depression detection. IEEE Trans Affect Comput 13(1):262–271. https://doi.org/10.1109/TAFFC.2019.2934412
Article Google Scholar
Fiscon G, Weitschek E, Cialini A et al (2018) Combining EEG signal processing with supervised methods for Alzheimer’s patients classification. BMC Med Inform Decis Mak 18(1):1–10. https://doi.org/10.1186/s12911-018-0613-y
Article Google Scholar
Li X, Hu B, Sun S et al (2016) EEG-based mild depressive detection using feature selection methods and classifiers. Comput Methods Programs Biomed 136:151–161. https://doi.org/10.1016/j.cmpb.2016.08.010
Article PubMed Google Scholar
Sun S, Li X, Zhu J et al (2019) Graph theory analysis of functional connectivity in major depression disorder with high-density resting state EEG data. IEEE Trans Neural Syst Rehabil Eng 27(3):429–439. https://doi.org/10.1109/TNSRE.2019.2894423
Article PubMed Google Scholar
Li X, Jing Z, Hu B et al (2017) A resting-state brain functional network study in MDD based on minimum spanning tree analysis and the hierarchical clustering. Complexity. https://doi.org/10.1155/2017/9514369
Article Google Scholar
Shao XX, Sun ST, Li JX et al (2021) Analysis of functional brain network in MDD based on improved empirical mode decomposition with resting state EEG data. IEEE Trans Neural Syst Rehabil Eng 29:1546–1556. https://doi.org/10.1109/TNSRE.2021.3092140
Article PubMed Google Scholar
Hsu YL, Yang YT, Wang JS et al (2013) Automatic sleep stage recurrent neural classifier using energy features of EEG signals. Neurocomputing 104:105–114. https://doi.org/10.1016/j.neucom.2012.11.003
Article Google Scholar
Güler NF, Übeyli ED, Güler İ (2005) Recurrent neural networks employing Lyapunov exponents for EEG signals classification. Expert Syst Appl 29:506–514. https://doi.org/10.1016/j.eswa.2005.04.011
Article Google Scholar
Sors A, Bonnet S, Mirek S et al (2018) A convolutional neural network for sleep stage scoring from raw single-channel EEG. Biomed Signal Process Control 42:107–114. https://doi.org/10.1016/j.bspc.2017.12.001
Article Google Scholar
Ieracitano C, Mammone N, Bramanti A et al (2019) A convolutional neural network approach for classification of dementia stages based on 2D-spectral representation of EEG recordings. Neurocomputing 323:96–107. https://doi.org/10.1016/j.neucom.2018.09.071
Article Google Scholar
Acharya UR, Oh SL, Hagiwara Y et al (2018) Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals. Comput Biol Med 100:270–278. https://doi.org/10.1016/j.compbiomed.2017.09.017
Article PubMed Google Scholar
Verma A, Janghel RR (2021) Epileptic seizure detection using deep recurrent neural networks in EEG signals. Advances in biomedical engineering and technology. Springer, Singapore, pp 189–198. https://doi.org/10.1007/978-981-15-6329-4_17
Chapter Google Scholar
Acharya UR, Oh SL, Hagiwara Y et al (2018) Automated EEG-based screening of depression using deep convolutional neural network. Comput Methods Programs Biomed 161:103–113. https://doi.org/10.1016/j.cmpb.2018.04.012
Article PubMed Google Scholar
Li X, La R, Wang Y et al (2019) EEG-based mild depression recognition using convolutional neural network. Med Biol Eng Compu 57(6):1341–1352. https://doi.org/10.1007/s11517-019-01959-2
Article Google Scholar
Su Y, Zhang Z, Li X et al (2022) The multiscale 3D convolutional network for emotion recognition based on electroencephalogram. Front Neurosci 16:872311. https://doi.org/10.3389/fnins.2022.872311
Article PubMed PubMed Central Google Scholar
Mahfuz MRA, Moni MA, Uddin S et al (2021) A deep convolutional neural network method to detect seizures and characteristic frequencies using epileptic electroencephalogram (EEG) data. IEEE J Transl Eng Health Med 9:1–12. https://doi.org/10.1109/JTEHM.2021.3050925
Article Google Scholar
Grimm S, Ernst J, Boesiger P et al (2009) Increased self-focus in major depressive disorder is related to neural abnormalities in subcortical-cortical midline structures. Hum Brain Mapp 30(8):2617–2627. https://doi.org/10.1002/hbm.20693
Article PubMed Google Scholar
Park C, Rosenblat JD, Lee Y et al (2019) The neural systems of emotion regulation and abnormalities in major depressive disorder. Behav Brain Res 367:181–188. https://doi.org/10.1016/j.bbr.2019.04.002
Article PubMed Google Scholar
Cai HS, Yuan ZQ, Hu B et al (2022) A multi-modal open dataset for mental-disorder analysis. Sci Data. https://doi.org/10.1038/s41597-022-01211-x
Article PubMed PubMed Central Google Scholar
Zhang BT, Zhou WY, Cai HS et al (2020) Ubiquitous depression detection of sleep physiological data by using combination learning and functional networks. IEEE Access 8:94220–94235. https://doi.org/10.1109/ACCESS.2020.2994985
Article Google Scholar
Peng H, Shi B, Ratcliffe Q, Zhao M, Gao Y (2013) Removal of ocular artifacts in EEG-an improved approach combining DWT and ANC for portable applications. IEEE J Biomed Health Inform 17(3):600–607. https://doi.org/10.1109/JBHI.2013.2253614
Article PubMed Google Scholar
Delorme A, Makeig S (2004) EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Meth 134(1):9–21. https://doi.org/10.1016/j.jneumeth.2003.10.009
Article Google Scholar
Sun J, Hong X, Tong S (2012) Phase synchronization analysis of EEG signals: an evaluation based on surrogate tests. IEEE Trans Biomed Eng 59(8):2254–2263. https://doi.org/10.1109/TBME.2012.2199490
Article Google Scholar
Yu H, Lei X, Song Z, Liu C, Wang J (2020) Supervised network-based fuzzy learning of EEG signals for Alzheimer’s disease identification. IEEE Trans Fuzzy Syst 28(1):60–71. https://doi.org/10.1109/TFUZZ.2019.2903753
Article Google Scholar
Zhang BT, Lei T, Liu H, Cai HS (2018) EEG-based automatic sleep staging using ontology and weighting feature analysis. Comput Math Methods Med. https://doi.org/10.1155/2018/6534041
Article PubMed PubMed Central Google Scholar
Gelly G, Gauvain J (2018) Optimization of RNN-based speech activity detection. IEEE/ACM Transactions Audio, Speech, Language Process 26(3):646–656. https://doi.org/10.1109/TASLP.2017.2769220
Article Google Scholar
Jaeger H (2009) Reservoir computing approaches to recurrent neural network training. Comput Sci Rev 3(3):127–149. https://doi.org/10.1016/j.cosrev.2009.03.005
Article Google Scholar
Michielli N, Acharya UR, Molinari F (2019) Cascaded LSTM recurrent neural network for automated sleep stage classification using single-channel EEG signals. Comput Biol Med 106:71–81. https://doi.org/10.1016/j.compbiomed.2019.01.013
Article PubMed Google Scholar
Peng H, Xia C, Wang Z et al (2019) Multivariate pattern analysis of EEG-based functional connectivity: a study on the identification of depression. IEEE Access 7:92630–92641. https://doi.org/10.1109/ACCESS.2019.2927121
Article Google Scholar
Zhang BT, Zhou WY, Li YL et al (2022) Research on depression recognition based on brain function network. J Biomed Eng 39(1):47–55. https://doi.org/10.7507/1001-5515.202108034
Article Google Scholar
Stam CJ, Nolte G, Daffertshofer A (2007) Phase lag index: assessment of functional connectivity from multi-channel EEG and MEG with diminished bias from common sources. Hum Brain Mapp 28(11):1178–1193. https://doi.org/10.1002/hbm.20346
Article PubMed PubMed Central Google Scholar
Yu C, Han R, Song M et al (2020) A simplified 2D–3D CNN architecture for hyperspectral image classification based on spatial-spectral fusion. IEEE J Sel Top Appl Earth Observ Remote Sens 13:2485–2501. https://doi.org/10.1109/JSTARS.2020.2983224
Article Google Scholar
D. P. Kingma, J. Ba, Adam: a method for stochastic optimization, arXiv e-prints, 2014. https://doi.org/10.48550/arXiv.1412.6980
Zhang X, Lu DW, Pan J et al (2021) Fatigue detection with covariance manifolds of electroencephalography in transportation industry. IEEE Trans Industr Inf 17(5):3497–3507. https://doi.org/10.1109/TII.2020.3020694
Article Google Scholar
Zhang XW, Pan J, Shen J et al (2020) Fusing of electroencephalogram and eye movement with group sparse canonical correlation analysis for anxiety detection. IEEE Trans Affect Comput 13(2):958–971. https://doi.org/10.1109/TAFFC.2020.2981440
Article Google Scholar
Chambon S, Galtier MN, Arnal PJ et al (2018) A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series. IEEE Trans Neural Syst Rehabil Eng 26(4):758–769. https://doi.org/10.1109/TNSRE.2018.2813138
Article PubMed Google Scholar
M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in Proc. 13th Eur. Conf. Comput. Vis. (ECCV), Zurich, Switzerland, (2014). 818-833 https://doi.org/10.1007/s11042-020-09885-4
Zhang BT, Yan GH, Yang ZF, Su Y (2021) Brain functional networks based on resting-state EEG data for major depressive disorder analysis and classification. IEEE Trans Neural Syst Rehabil Eng 29(1):215–229. https://doi.org/10.1109/TNSRE.2020.3043426
Article PubMed Google Scholar
Zhang BT, Cai HS, Song YB et al (2022) Computer-aided recognition based on decision-level multimodal fusion for depression. IEEE J Biomed Health Inf 26(7):3466–3477. https://doi.org/10.1109/JBHI.2022.3165640
Article Google Scholar
Mahato S, Paul S (2019) Classification of depression patients and normal subjects based on electroencephalogram (EEG) signal using alpha power and theta asymmetry. J Med Syst 44(28):1–15. https://doi.org/10.1007/s10916-019-1486-z
Article Google Scholar
Goldschmied JR, Cheng P, Armitage R et al (2019) A preliminary investigation of the role of slow-wave activity in modulating waking EEG theta as a marker of sleep propensity in major depressive disorder. J Affect Disord 257:504–509. https://doi.org/10.1016/j.jad.2019.07.027
Article PubMed PubMed Central Google Scholar
Liu W, Zhang C, Wang X et al (2020) Functional connectivity of major depression disorder using ongoing EEG during music perception. Clin Neurophysiol 131(10):2413–2422. https://doi.org/10.1016/j.clinph.2020.06.031
Article PubMed Google Scholar
Fingelkurts AA, Fingelkurts AA, Rytsälä H et al (2007) Impaired functional connectivity at EEG alpha and theta frequency bands in major depression. Hum Brain Mapp 28(3):247–261. https://doi.org/10.1002/hbm.20275
Article PubMed Google Scholar
Zhang BT, Wang XP, Shen Y, Lei T (2019) Dual-modal physiological feature fusion-based sleep recognition using CFS and RF algorithm. Int J Autom Comput 6(3):286–296. https://doi.org/10.1007/s11633-019-1171-1
Article Google Scholar
Takagi Y, Sakai Y, Abe Y et al (2018) A common brain network among state, trait, and pathological anxiety from whole-brain functional connectivity. Neuroimage 172:506–516. https://doi.org/10.1016/j.neuroimage.2018.01.080
Article PubMed Google Scholar
Whitford TJ, Rennie CJ, Grieve SM et al (2007) Brain maturation in adolescence: concurrent changes in neuroanatomy and neurophysiology. Hum Brain Mapp 28(3):228–237. https://doi.org/10.1002/hbm.20273
Article PubMed Google Scholar
Bian Z, Li Q, Wang L et al (2014) Relative power and coherence of EEG series are related to amnestic mild cognitive impairment in diabetes. Front Aging Neurosci 6(11):1–9. https://doi.org/10.3389/fnagi.2014.00011
Article Google Scholar
Hu B, Dong Q, Hao Y (2017) Effective brain network analysis with resting state EEG data: A comparison between heroin abstinent and non-addicted subjects. J Neural Eng 14:046002. https://doi.org/10.1088/1741-2552/aa6c6f
Article PubMed Google Scholar
Leuchter A et al (2012) Resting-state quantitative electroencephalography reveals increased neurophysiologic connectivity in depression. PloS One 7:e32508. https://doi.org/10.1371/journal.pone.0032508
Article CAS PubMed PubMed Central Google Scholar
Demaree HA, Crew WD (1995) Topographical brain mapping in depression following mild closed head injury: a case study. Arch Clin Neuropsychol 10(4):320–321. https://doi.org/10.1300/J184v01n01_05
Article Google Scholar
Zhao Y, Chen L, Zhang W et al (2017) Gray matter abnormalities in non-comorbid medication-naive patients with major depressive disorder or social anxiety disorder. EBioMedicine 21:228–235. https://doi.org/10.1016/j.ebiom.2017.06.013
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61962034), in part by the Natural Science Foundation of Gansu Province (Grant No. 20JR10RA211), in part by Longyuan Youth Innovation and Entrepreneurship Talent (individual) project, in part by Tianyou Youth Talent Lift Program of Lanzhou Jiaotong University, and in part by Opening Foundation of Key Laboratory of Opto-technology and Intelligent Control (Lanzhou Jiaotong University), Ministry of Education (KFKT2020-13).

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, 730070, China
Bingtao Zhang, Dan Wei & Guanghui Yan
Key Laboratory of Opto-Technology and Intelligent Control Ministry of Education, Lanzhou Jiaotong University, Lanzhou, 730070, China
Bingtao Zhang
School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
Bingtao Zhang, Yun Su & Hanshu Cai
Gansu Province Big Data Center, Lanzhou, 730000, China
Xiulan Li
College of Computer Science and Engineering, Northwest Normal University, Lanzhou, 730070, China
Yun Su

Authors

Bingtao Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Dan Wei
View author publications
You can also search for this author in PubMed Google Scholar
Guanghui Yan
View author publications
You can also search for this author in PubMed Google Scholar
Xiulan Li
View author publications
You can also search for this author in PubMed Google Scholar
Yun Su
View author publications
You can also search for this author in PubMed Google Scholar
Hanshu Cai
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Bingtao Zhang or Xiulan Li.

Ethics declarations

Conflict of Interest

All authors declares that they have no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Zhang, B., Wei, D., Yan, G. et al. Spatial–Temporal EEG Fusion Based on Neural Network for Major Depressive Disorder Detection. Interdiscip Sci Comput Life Sci 15, 542–559 (2023). https://doi.org/10.1007/s12539-023-00567-x

Download citation

Received: 08 January 2023
Revised: 12 April 2023
Accepted: 13 April 2023
Published: 04 May 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s12539-023-00567-x

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Spatial–Temporal EEG Fusion Based on Neural Network for Major Depressive Disorder Detection

Abstract

Graphical Abstract

Similar content being viewed by others

Efficacy of novel attention-based gated recurrent units transformer for depression detection using electroencephalogram signals

MADNet: EEG-Based Depression Detection Using a Deep Convolution Neural Network Framework with Multi-dimensional Attention

EEG-based mild depression recognition using convolutional neural network

1 Introduction

2 Data

2.1 Subjects

2.2 EEG Recording and Preprocessing

3 Methodology

3.1 TD Feature Extraction

3.1.1 TD Input Data

3.1.2 TD Feature Extraction Based on RNN

3.2 SD Feature Extraction

3.2.1 BFN and SD Input Data

3.2.2 SD Feature Extraction Based on 2D-CNN

3.3 Spatial–Temporal EEG Fusion

4 Experiments and Results

4.1 Comparison of Unimodal and Spatial–Temporal Fusion

4.2 Comparison and Discussion of Detection Performance in Different Frequency Bands

4.3 Comparison and Discussion of Detection Performance in Different Brain Regions

4.4 Variants of Spatial–Temporal EEG Fusion Based on Neural Network

4.5 Verification of Model Generalization Ability and Robustness

4.6 Performance Comparison

4.6.1 Comparison of Different Detection Algorithms

4.6.2 Comparison with Pioneers’ Methods

5 Conclusion

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding authors

Ethics declarations

Conflict of Interest

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation