1 Introduction

As living standards have improved, more and more families have bought vehicles. Vehicles play an important role in transportation because of their flexibility, and they have promoted the development of many industries and accelerated economic growth. According to public statistics, China had 261.5 million vehicles by the end of 2019 [1]. The growing number of vehicles also increases the number of traffic accidents, especially those caused by drowsy driving. On long-distance trips, drivers often drive for long periods to save time. Prolonged driving makes the driver tired and distracted, which may lead to fatal traffic accidents. Drivers who regularly get insufficient sleep are also prone to drowsy driving. The American Automobile Association (AAA) estimates that one-sixth (16.5%) of fatal traffic accidents and one-eighth (12.5%) of accidents that require the driver or passengers to be hospitalized are caused by drowsy driving [2]. According to a statistical analysis by the US National Highway Traffic Safety Administration (NHTSA), 91,000 police-reported crashes involved drowsy drivers in 2017. These crashes led to an estimated 50,000 injuries and nearly 800 deaths [3]. The German Road Safety Council claims that 25% of highway traffic fatalities are caused by driver drowsiness [4]. Drowsy driving can happen at any time but occurs most frequently between midnight and 6 a.m. or in the late afternoon [3]. These statistics show that drowsy driving is highly dangerous and is one of the major causes of serious traffic accidents.

Successful detection of drowsiness is a crucial step toward reducing the cost of traffic accidents to society. Many published studies have addressed driver drowsiness detection [5,6,7]. Among them, methods based on computer vision make up the vast majority. Computer-vision-based methods mainly detect the driver’s eye motion, eye blinking, head motion and head position [8, 9]. However, with the spread of COVID-19, it has become common for drivers to wear masks, which is a challenge for such methods. In addition, drowsy driving often occurs in poor lighting, which also strongly affects detection accuracy. Many studies have shown that, among the many indicators of drowsiness, electroencephalographic (EEG) signals are regarded as the gold standard. Gharagozlou et al. [10] analyzed the EEG signals of sleep-deprived drivers performing simulated driving tasks and found that the \(\alpha\) wave (8–13 Hz) can be used as an indicator of driver drowsiness. Lin et al. [11] used the deviation from the correct lane line in a driving simulator as the evaluation index and found that the \(\alpha\) wave has the largest correlation coefficient with drowsiness. In summary, EEG signals, especially the \(\alpha\) wave, are closely related to human drowsiness. In this paper, we therefore mainly detect and analyze the \(\alpha\) band of EEG signals under drowsy driving. Appropriate analysis and processing of EEG signals can effectively predict driver drowsiness.

To detect driver drowsiness promptly and accurately and to reduce traffic accidents caused by drowsy driving, we propose a vehicle driver drowsiness detection method that uses wearable EEG and convolutional neural networks (CNNs). The system architecture is shown in Fig. 1 and consists of three parts: data collection using wearable EEG, the vehicle driver drowsiness detection module and the early warning strategy. First, a wearable brain–computer interface (BCI) is used to monitor and collect the EEG signals. We collect the EEG signals under two conditions: after sleep deprivation, with the test time between 3 a.m. and 5 a.m., and after a normal night’s sleep, with the test time between 10 a.m. and 12 p.m. Second, the collected EEG signals are pre-processed. We use linear filters, fast independent component analysis (FastICA) and wavelet threshold denoising to obtain high-quality EEG signals. Then, convolutional neural networks with an Inception module and a modified AlexNet module are trained to classify the EEG signals. Finally, the early warning strategy module sounds an alarm if the vehicle driver is judged to be drowsy. The feasibility of the proposed drowsiness detection method for vehicle driving safety is demonstrated by the simulation and test results.

Fig. 1

The architecture of the vehicle driver drowsiness detection system

The contributions can be summarized as follows: (1) A vehicle driver drowsiness detection method using wearable EEG is proposed to alert vehicle drivers when they are drowsy. (2) The method uses neural networks with an Inception module and a modified AlexNet module to extract features from the EEG signals and then classify them. (3) An early warning device indicates the driver’s status. If the vehicle driver is in a normal state, the device shows a white light. If the vehicle driver is judged to be drowsy, the device shows a red light and sounds an alarm.

The rest of this paper is organized as follows. Section 2 reviews the related work. The proposed methodology is described in Sect. 3. Section 4 analyzes the simulation and test results. Finally, conclusions are provided in Sect. 5.

2 Related work

The driver plays a central role in operating the vehicle, so driver drowsiness detection and early warning can effectively reduce traffic accidents. According to the equipment and method used, drowsiness detection methods can be divided into three categories.

The first category evaluates the driver state by measuring vehicle behavior. It mainly analyzes vehicle movement, such as steering wheel movement, accelerator pedal movement, lane keeping and braking, to determine the driver’s level of alertness [12, 13]. Mortazavi et al. [14] found that a drowsy driver reacts more slowly because the brain is not fully awake, which manifests as weakened steering wheel control. Forsman et al. [15] developed a method that detects driver drowsiness at moderate levels of fatigue, which could give the driver sufficient time to reach a rest stop.

The second category is based on visual features and mainly analyzes the driver’s eye state, mouth state, expression, head position and head motion through a camera [16, 17]. The driver’s blinking frequency and eye-closure time in the drowsy state differ from those in the normal state, so many drowsiness detection methods focus on eye detection [18]. Head pose is also an important feature for judging whether the driver is drowsy. Drowsy drivers may lower their head or lean to the side [19]. To build more robust and reliable driver inattention monitoring systems, some researchers have combined several facial cues to detect drowsiness. Mbouna et al. [20] presented a method that analyzes both eye state and head pose to continuously monitor the alertness of a vehicle driver.

The third category is based on physiological signals, such as the electrocardiogram (ECG), electrooculogram (EOG), electromyogram (EMG) and electroencephalogram (EEG) [21]. Khushaba et al. [22] extracted the relevant information from ECG, EEG and EOG signals and used a fuzzy wavelet-packet-based feature extraction algorithm to classify the drowsiness state. Many studies have shown that, among the many indicators of drowsiness, EEG-based methods are the most promising and feasible [23,24,25]. Lin et al. [26] proposed a novel brain–computer interface system that acquires and analyzes EEG signals in real time to monitor driver drowsiness and warn the driver; the system obtained an average sensitivity of 88.7% and a positive predictive value of 76.9%. Chai et al. [27] presented an EEG-based driver drowsiness classification method using sparse-deep belief networks and autoregressive modeling, which achieved a classification accuracy of 90.6%. Yeo et al. [28] used support vector machines (SVM) to identify and differentiate the EEG changes that occur between alert and drowsy states and obtained an accuracy of over 90%. Gu et al. [29] surveyed the recent literature on EEG-based intelligent BCI technologies and introduced driving fatigue detection research using deep learning algorithms. Gao et al. [30] proposed an EEG-based spatial temporal convolutional neural network (ESTCNN) for driver drowsiness detection and achieved a high accuracy of 97.37%. Zeng et al. [31] developed two mental state classification models, EEG-Conv and EEG-Conv-R, for driver drowsiness detection and obtained classification accuracies of 91.788% and 92.682%, respectively.

The above literature shows that EEG signals are widely used in driver drowsiness detection. However, researchers have also found that EEG signals are very weak and susceptible to background noise. Therefore, how to extract high-quality EEG signals under drowsy driving and how to classify them accurately require further research.

3 Methodology

3.1 EEG signal acquisition

Acquisition of EEG signals is the first step. We use the programmable, open-source OpenBCI acquisition scheme. The Ag–CL dry electrode is shown in Fig. 2a; it has higher applicability than the medical wet electrode. As shown in Fig. 2b, the OpenBCI demo board collects EEG signals, converts the potential signals into digital signals through analog-to-digital conversion circuits and transmits them to the personal computer. The EEG cap, shown in Fig. 2c, consists of 8 dry electrodes with ultra-high impedance amplifiers and 2 ear clips as reference electrodes. The EEG acquisition module has a sampling rate of 256 Hz and an operating voltage of 6 V.

Fig. 2

EEG signal collection system. a The Ag–CL dry electrode. b The OpenBCI platform. c The EEG acquisition cap. d The international 10–20 system of electrode placement

EEG signals can be acquired in many ways, and the locations of the collection electrodes vary. To allow the EEG signals of different regions to be analyzed consistently, the international 10–20 system of electrode placement was formulated to standardize EEG signal collection, as shown in Fig. 2d. According to previous research [26, 32,33,34], the EEG signals of the prefrontal (Fp1, Fp2), central (C3, C4), temporal (T7, T8) and occipital (O1, O2) regions are related to the drowsy state, so electrodes are placed at these positions in this paper.

3.2 EEG signal pre-processing

As a weak physiological signal, the EEG signal is affected by many factors during acquisition, which lowers its quality. One of the challenges of EEG-based systems is contamination by artifacts, including muscle noise, eye activity and blink artifacts, and by instrumental noise such as line noise and electronic interference. It is therefore necessary to pre-process the collected EEG signals to remove these artifacts and improve the signal quality. In this paper, we use linear filters, the FastICA method and wavelet threshold denoising for this purpose. The overall EEG pre-processing procedure is shown in Fig. 3.

Fig. 3

The flowchart of EEG signal pre-processing

The sampling frequency of the acquisition device is 256 Hz, so the highest frequency that can be represented in the collected EEG signal is 128 Hz, which is much higher than the frequencies of drowsiness-related EEG activity. Linear filters are therefore applied first. Because the Butterworth filter has the flattest passband frequency response and effectively retains the useful components of the signal, a third-order Butterworth bandpass filter with cutoff frequencies of 1 Hz and 60 Hz is selected to remove unwanted low-frequency and high-frequency components. Next, we design a Butterworth notch filter centered at 50 Hz to remove power-line interference.
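The two linear filters can be illustrated with a short SciPy sketch. This is not the MATLAB code used in this paper; it only shows one way to realize a third-order 1–60 Hz Butterworth bandpass followed by a 50 Hz Butterworth band-stop. The 49–51 Hz stop-band edges, the second-order band-stop design, the zero-phase (forward-backward) filtering and the function name `bandpass_and_notch` are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256.0  # sampling rate in Hz, as stated in the text


def bandpass_and_notch(eeg, fs=FS, low=1.0, high=60.0, stop=(49.0, 51.0)):
    """Filter an array of shape (n_samples, n_channels) along the time axis."""
    # Third-order Butterworth bandpass with 1 Hz and 60 Hz cutoffs.
    sos_bp = butter(3, [low, high], btype="bandpass", fs=fs, output="sos")
    # Butterworth band-stop around 50 Hz; order and edges are assumed values.
    sos_bs = butter(2, list(stop), btype="bandstop", fs=fs, output="sos")
    # Forward-backward filtering gives zero phase distortion (an assumption,
    # the text does not say how the filters were applied).
    out = sosfiltfilt(sos_bp, eeg, axis=0)
    return sosfiltfilt(sos_bs, out, axis=0)
```

Second-order sections are used here because they are numerically more robust than transfer-function coefficients when a cutoff lies close to 0 Hz.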

Second, FastICA is applied to the eight-channel EEG signals. Independent component analysis (ICA) is a classical algorithm for blind source separation. It assumes that the source signals are statistically independent and that the observed signals are formed by instantaneous mixing of the sources. The observed signals are separated according to some prior knowledge to obtain the independent source components. ICA has been widely used in EEG artifact removal [11, 43]. The FastICA algorithm used in this paper proceeds as follows:

  • Step 1 Centralize and whiten the observed data;

  • Step 2 Initialize the separation matrix \(W\), the convergence tolerance \(\varepsilon\) and the maximum number of iterations \(p\);

  • Step 3 Update

    $$ W_{n + 1} = E\left\{ {Zg^{\prime}\left( {W_{n}^{T} Z} \right)} \right\} - E\left\{ {g^{\prime\prime}\left( {W_{n}^{T} Z} \right)} \right\}W_{n} $$
    (1)
  • Step 4 Standardize

    $$ W_{n + 1} = W_{n + 1} /\left\| {W_{n + 1} } \right\| $$
    (2)
  • Step 5 Check whether \(\left\| {W_{n + 1} - W_{n} } \right\| < \varepsilon\) holds. If it holds or the number of iterations reaches \(p\), stop. Otherwise, return to Step 3.

In Eq. (1), \(Z\) is the centered and whitened data from Step 1, \(E\) denotes the expectation (mean) and \(g\) is a non-quadratic function, which is typically chosen as:

$$ g\left( y \right) = - \frac{1}{{a_{1} }}\log \cosh \left( {a_{1} y} \right) $$
(3)

In the formula, \(1 \le a_{1} \le 2\), usually \(a_{1} = 1\).
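The five steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation used in this paper. It takes \(g(y) = -\frac{1}{a_1}\log\cosh(a_1 y)\), so that \(g' = -\tanh(a_1 y)\) and \(g'' = -a_1(1-\tanh^2(a_1 y))\). Two points are assumptions: all rows of \(W\) are updated together and re-orthonormalized by symmetric decorrelation in place of the plain norm division of Step 4, and convergence is tested in a sign-insensitive way because the update can flip the sign of a row. The function names are hypothetical.

```python
import numpy as np


def whiten(x):
    """Step 1: centre and whiten data of shape (n_channels, n_samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(x @ x.T / x.shape[1])
    return (eigvec / np.sqrt(eigval)) @ eigvec.T @ x


def sym_decorrelate(w):
    """Make the rows of w orthonormal: W <- (W W^T)^(-1/2) W."""
    eigval, eigvec = np.linalg.eigh(w @ w.T)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T @ w


def fast_ica(x, eps=1e-6, max_iter=200, a1=1.0, seed=0):
    """Return (estimated components, separation matrix W)."""
    z = whiten(x)
    n, n_samples = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))      # Step 2
    for _ in range(max_iter):
        y = w @ z
        g1 = -np.tanh(a1 * y)                             # g'(y)
        g2 = -a1 * (1.0 - np.tanh(a1 * y) ** 2)           # g''(y)
        # Step 3: E{Z g'(W^T Z)} - E{g''(W^T Z)} W, one row per component.
        w_new = g1 @ z.T / n_samples - g2.mean(axis=1, keepdims=True) * w
        w_new = sym_decorrelate(w_new)                    # Step 4 (symmetric variant)
        # Step 5: rows are unit vectors, so |<w_new, w>| -> 1 at convergence.
        change = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0))
        w = w_new
        if change < eps:
            break
    return w @ z, w
```

For an eight-channel recording stored as an array of shape (8, n_samples), `fast_ica(recording)` returns eight unit-variance, mutually uncorrelated components; their order and sign are arbitrary, as with any ICA method.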

The components separated by FastICA are only approximately independent and still contain some noise. Therefore, after signal separation, we use the wavelet threshold method to decompose and reconstruct the signal of each channel. The wavelet threshold method is based on the discrete wavelet transform (DWT). First, wavelet decomposition is performed on the signal to obtain the wavelet coefficients. Three-layer wavelet decomposition is used in this paper. Then, the coefficients are thresholded with a soft-threshold method [35]: a wavelet coefficient whose absolute value is less than the given threshold is set to zero, and a coefficient whose absolute value is greater than the threshold is shrunk toward zero, so that its magnitude becomes the difference between the original magnitude and the threshold. Finally, the inverse wavelet transform reconstructs the signal with reduced noise. The structure of the three-layer wavelet decomposition is shown in Fig. 4. We directly remove the high-frequency component D1 and reconstruct the signal from the coefficients D2, D3 and A3 to obtain a reliable EEG signal.

Fig. 4

Three-layer wavelet decomposition structure
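The decomposition, thresholding and reconstruction steps can be illustrated with a small NumPy sketch. The wavelet basis and the threshold value are not stated in the text, so the Haar wavelet and a caller-supplied `threshold` are assumptions made here for simplicity, as is applying the soft threshold to D2 and D3 only. The function names are hypothetical, and the signal length must be a multiple of 8 (a 256-sample window satisfies this).

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def haar_idwt(approx, detail):
    """Invert one Haar level."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def soft_threshold(w, t):
    """Zero coefficients with |w| < t and shrink the others toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def wavelet_denoise(x, threshold):
    """Three-level decomposition, discard D1, soft-threshold D2 and D3, rebuild."""
    a1, d1 = haar_dwt(x)
    a2, d2 = haar_dwt(a1)
    a3, d3 = haar_dwt(a2)
    d1 = np.zeros_like(d1)                  # remove the highest-frequency band
    d2 = soft_threshold(d2, threshold)
    d3 = soft_threshold(d3, threshold)
    return haar_idwt(haar_idwt(haar_idwt(a3, d3), d2), d1)
```

With a 256 Hz sampling rate, D1 covers roughly 64–128 Hz, so discarding it leaves a signal band of about 0–64 Hz.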

3.3 Classification of EEG signals based on CNN

Previously, the mainstream approach to classifying EEG signals was traditional machine learning. In recent years, with the development of deep learning, convolutional neural networks have also been applied to EEG signal classification because of their excellent performance in computer vision and natural language processing. Hajinoroozi et al. [36] used convolutional neural networks for EEG-based drowsiness detection and achieved good results. In this paper, neural networks with an Inception module and a modified AlexNet module are proposed to classify the EEG signals.

3.3.1 Convolutional neural networks with Inception module

Among convolutional networks, Inception and the residual network (ResNet) are among the best-performing and most popular architectures. ResNet is designed to address the problem of exploding or vanishing gradients in deep networks. However, the dataset used in this paper is small, and capturing features matters more than increasing network depth, so a network structure with Inception units is selected. The Inception unit was first proposed by Szegedy et al. [37] in 2015. The Inception structure increases the width of the neural network and replaces large convolution kernels with several parallel small convolution kernels. This speeds up computation, and the concatenated outputs of the parallel branches allow the next layer to select the required information adaptively through its weights.
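The idea of parallel branches whose outputs are concatenated can be sketched in NumPy for a one-dimensional, multi-channel input. This is only a simplified illustration: the kernel sizes (1, 3 and 5), the ReLU after each branch and the use of one-dimensional convolutions are assumptions, and a full Inception unit also contains a pooling branch and 1 × 1 reduction layers that are omitted here. The function names are hypothetical.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Cross-correlate x (length, c_in) with kernel (k, c_in, c_out), odd k,
    using zero padding so that the output keeps the input length."""
    k = kernel.shape[0]
    padded = np.pad(x, ((k // 2, k // 2), (0, 0)))
    # windows[t, c, j] = padded[t + j, c]
    windows = np.lib.stride_tricks.sliding_window_view(padded, k, axis=0)
    return np.einsum("tck,kco->to", windows, kernel)


def inception_block(x, kernels):
    """Apply several convolutions in parallel and concatenate along channels."""
    branches = [np.maximum(conv1d_same(x, w), 0.0) for w in kernels]
    return np.concatenate(branches, axis=1)
```

For a 256 × 8 input and three branches with 4, 6 and 2 filters, the block returns a 256 × 12 feature map, and the following layer learns how to weight the branches.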

In addition to the Inception unit, we also use batch normalization (BN) layers in the network model. The BN layer, proposed by Ioffe et al. [38], is a training optimization method whose essence is to adjust the distribution of layer inputs. When the distributions of the training data and test data differ, the generalization ability of the network is affected. The BN layer normalizes each training mini-batch and then restores it to approximately the original distribution through a learned scale and shift.

The specific operation is as follows:

Suppose that the input of a batch in a layer of the neural network is \(X = [x_{1}, x_{2}, \ldots, x_{n}]\), and let \(\gamma\) and \(\beta\) be two learnable parameters. First, compute the mean and variance of the elements in this mini-batch:

$$ \mu_{B} = \frac{1}{n}\sum\limits_{i = 1}^{n} {x_{i} } $$
(4)
$$ \sigma_{B}^{2} = \frac{1}{n}\sum\limits_{i = 1}^{n} {(x_{i} - \mu_{B} )^{2} } $$
(5)

Then normalize each sample element:

$$ x_{i}{^{\prime}} = \frac{{x_{i} - \mu_{B} }}{{\sqrt {\sigma_{B}^{2} + \varepsilon } }} $$
(6)

Finally, scaling and shifting are applied to approximate the original distribution, and the output is:

$$ y_{i} = \gamma x_{i}{^{\prime}} + \beta $$
(7)

In the BN layer, the performance of the network can be optimized by learning the two parameters \(\gamma\) and \(\beta\). A network with BN layers converges faster, mitigates the vanishing-gradient problem and is more robust. Therefore, BN layers are used in the networks in this paper to improve the capabilities of the model.
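Equations (4)–(7) translate directly into a few lines of NumPy. The sketch below shows only the training-time forward pass for a mini-batch stored as an array of shape (n, features); the value of \(\varepsilon\) and the function name are assumptions, and the running statistics used at inference time are omitted.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)                      # Eq. (4)
    var = x.var(axis=0)                      # Eq. (5), biased variance (1/n)
    x_hat = (x - mu) / np.sqrt(var + eps)    # Eq. (6)
    return gamma * x_hat + beta              # Eq. (7)
```

With \(\gamma = 1\) and \(\beta = 0\) the output has approximately zero mean and unit variance per feature; other values let the network recover whatever scale and offset suit the next layer.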

Inspired by the Inception network structure, we propose our own network structure, which is shown in Fig. 5. The model consists of five convolutional layers, two pooling layers, three Inception modules and three fully connected layers. All layers use ‘same’ padding.

Fig. 5

The proposed network structure with Inception module

3.3.2 Modified AlexNet model

Because the EEG signals are relatively weak, and the acquisition and pre-processing are difficult, the dataset size is relatively small. In order to judge whether the adopted Inception model is reasonable, a modified AlexNet model is also used for comparative analysis.

The AlexNet model was proposed by Krizhevsky et al. [39]. It has 8 layers: 5 convolutional layers and 3 fully connected layers. Traditionally, the tanh or sigmoid function was chosen as the activation function of neurons. To speed up training, the AlexNet model uses the rectified linear unit (ReLU) after each convolutional layer.

Because the output of the ReLU activation function is unbounded, the results obtained by ReLU are normalized using local response normalization (LRN). The LRN is defined as follows:

$$ b_{x,y}^{i} = a_{x,y}^{i} /\left( {k + \alpha \sum\limits_{{j = \max \left( {0,i - n/2} \right)}}^{{\min \left( {N - 1,i + n/2} \right)}} {\left( {a_{x,y}^{j} } \right)^{2} } } \right)^{\beta } $$
(8)

where \(a_{x,y}^{i}\) is the activity of a neuron computed by applying kernel i at position (x, y), \(b_{x,y}^{i}\) is the response-normalized activity, N is the total number of kernels in the layer, n is the number of ‘adjacent’ kernel maps at the same spatial position that are summed over, and \(k = 2,\alpha = 10^{ - 4} ,\beta = 0.75\).
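As an illustration of Eq. (8), the following NumPy sketch normalizes each kernel map by the squared activities of its neighbouring maps. The kernel index is assumed to be the last array axis, and the default n = 5 is taken from the original AlexNet paper rather than from this text; the function name is hypothetical.

```python
import numpy as np


def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Apply LRN to activations a of shape (..., N), kernel index last."""
    num_kernels = a.shape[-1]
    b = np.empty_like(a, dtype=float)
    for i in range(num_kernels):
        lo = max(0, i - n // 2)
        hi = min(num_kernels - 1, i + n // 2)
        # Sum of squares over the adjacent kernel maps j = lo..hi.
        scale = k + alpha * np.sum(a[..., lo:hi + 1] ** 2, axis=-1)
        b[..., i] = a[..., i] / scale ** beta
    return b
```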

The AlexNet model also uses overlapping pooling, and experiments show that overlapping pooling performs better than traditional non-overlapping pooling.

In this paper, a similar structure is used. Compared with the original AlexNet network, we reduce the size of the convolution kernels to extract more detailed features, and we also reduce the number of output nodes of the convolutional layers. In addition, the LRN layers are replaced by BN layers, each placed after a convolutional layer and before the activation layer. The group convolution operation is also removed so that training can be performed on one GPU. Furthermore, a convolutional layer has been added to increase the depth of the network and thus its representational capability. Finally, because BN layers are added, a small dropout rate is used at the fully connected layers. The network model built in this paper includes 8 convolutional layers, 4 pooling layers and 3 fully connected layers. The first three convolutional layers use ‘valid’ padding, and the subsequent layers use ‘same’ padding. The modified AlexNet model structure is shown in Fig. 6.

Fig. 6

Modified AlexNet model structure

3.4 Early warning strategy module

Reasonable warning strategies can effectively remind vehicle drivers to restore their attention. In this paper, we propose a simple and feasible early warning strategy based on the EEG classification result, which alerts the driver when drowsiness is detected so that the driver can restore attention as soon as possible. The early warning strategy is as follows.

  1. When the vehicle driver is driving normally, the EEG signal detection system does not detect an abnormal state, and the system indicator light is white.

  2. When the vehicle driver’s EEG signal is classified as drowsy for 3 s, the driver is deemed to be in a drowsy state. The indicator light turns red to remind the driver to restore attention. At this point, the driver is judged to be in the first-level drowsiness state.

  3. When the red light has been on for more than 5 s and the driver’s EEG signal is still judged to be drowsy, the driver is deemed to be at a high level of drowsiness, that is, the second-level drowsiness state. The buzzer sounds to alert the driver.

  4. If the red light has been on for less than 5 s and the EEG signal returns to the normal state, the vehicle driver has recovered attention and the indicator light returns to white.
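The four rules above can be read as a small state machine driven by one classification result per second. The sketch below is one possible reading, not the device firmware: it assumes that the drowsy seconds must be consecutive, that the red light switches on at the third consecutive drowsy second, and that any normal result resets the system to white even after the buzzer has sounded (the text does not cover that case). The class and constant names are hypothetical.

```python
RED_AFTER_S = 3          # consecutive drowsy seconds before the red light
BUZZER_AFTER_RED_S = 5   # seconds of red light before the buzzer sounds


class EarlyWarning:
    """Tracks consecutive drowsy seconds and derives the light and buzzer."""

    def __init__(self):
        self.drowsy_seconds = 0

    def update(self, is_drowsy):
        """Feed one 1-s classification result; return (light, buzzer_on)."""
        if not is_drowsy:
            # Normal state: reset and show the white light.
            self.drowsy_seconds = 0
            return "white", False
        self.drowsy_seconds += 1
        if self.drowsy_seconds < RED_AFTER_S:
            return "white", False
        # First-level drowsiness: red light; second level adds the buzzer.
        red_seconds = self.drowsy_seconds - RED_AFTER_S
        return "red", red_seconds > BUZZER_AFTER_RED_S
```

Under these assumptions the buzzer first sounds at the ninth consecutive drowsy second, that is, once the red light has been on for more than 5 s.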

4 Experimental results

This section experimentally evaluates the capability of the proposed vehicle driver drowsiness detection method using wearable EEG based on convolutional neural networks.

4.1 EEG signals acquisition program

Twenty subjects (18 males and 2 females) were selected for the collection experiment. They ranged in age from 22 to 42 years, were in good health and had no history of mental illness. Before the experiment, all subjects were informed of the experimental purposes and specific operating procedures and signed a written consent form.

To avoid the influence of blood glucose level on the EEG, data collection was carried out at least one hour after meals. All sedatives and sleeping drugs were stopped three days before the experiment. To reduce scalp resistance, subjects washed their hair the day before the experiment. The sampling frequency of the acquisition device is 256 Hz. After completing the above preparation, awake and drowsy EEG data were collected from each subject in two time periods. For the awake data, the subjects had 8 h of effective sleep before the experiment, ate breakfast at 8 a.m. and remained emotionally stable, and awake EEG data were collected between 10 a.m. and 12 p.m. For the drowsy data, the subjects stayed up late beforehand, and drowsy EEG data were collected between 3 a.m. and 5 a.m. the following day. The time interval between the collection of awake EEG data and drowsy data for each subject was one week.

The EEG data acquisition experiment is shown in Fig. 7. We collect the awake EEG signals while the subject drives a vehicle on an empty road on the school campus between 10 a.m. and 12 p.m., as shown in Fig. 7a, b. Considering the danger of collecting drowsy driving data in a real driving environment, we collect the drowsy driving data in a stationary laboratory setting between 3 a.m. and 5 a.m., as shown in Fig. 7c, d.

Fig. 7

EEG signal collection experiment

The MR688 fatigue warning system [40] was used to help verify the true state of the subjects while the EEG signals were collected.

4.2 Experimental setup

The data used in this paper are collected by OpenBCI and imported as a real-time data stream into MATLAB. The pre-processing steps, namely linear filtering, FastICA and wavelet thresholding, are all carried out in MATLAB, and the dataset is saved in ‘mat’ format.

The hardware environment of this experiment is as follows: the CPU is an Intel I5-6300HQ with a frequency of 2.6 GHz, the GPU is an NVIDIA GeForce 960 M with 4 GB of video memory, and the system memory is 16 GB of DDR4 at 2133 MHz.

The software environment includes Python 3.6, with Anaconda for package management. The GPU version of TensorFlow 1.7 is used. The NVIDIA computing platform CUDA version 7.0 is adopted, together with the supporting cuDNN 7.0, to accelerate computation.

4.3 Dataset description

The data samples used in this paper cover the drowsy state and the awake state. The acquisition equipment is an eight-channel EEG signal acquisition device with a sampling frequency of 256 Hz. A 1 s segment of the EEG signal is used as one training sample, and 1 h of EEG is collected in each of the awake and drowsy conditions for each of the 20 subjects. In the end, we obtain a total of 69,054 samples, of which 33,035 are awake samples and 36,019 are drowsy samples. Figure 8 shows the EEG of the O1 channel in the awake state and the drowsy state. It can be seen from the figure that the EEG signal is sparser in the drowsy state and shows a more regular waveform, which is caused by the increase of \(\alpha\) wave activity in the drowsy state. The collected signals are first filtered by the linear filters, then separated by FastICA and finally denoised by wavelet thresholding. The samples are then stacked to obtain an EEG dataset with dimensions of 69,054 × 256 × 8. According to the state of each EEG sample, a corresponding 69,054 × 2 label dataset is established. The labels use one-hot coding, with ‘10’ and ‘01’ representing awake and drowsy, respectively. Finally, the dataset is randomly shuffled and divided into two parts of size 50,000 × 256 × 8 and 19,054 × 256 × 8, which are used as the training set and test set, respectively.

Fig. 8

EEG of O1 channel in the awake state and drowsy state
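The construction of the sample and label arrays described in Sect. 4.3 can be sketched as follows. The sketch assumes that each pre-processed recording is available as an array of shape (n_samples, 8) and that windows do not overlap; the function names and the random seed are hypothetical.

```python
import numpy as np

FS, N_CHANNELS = 256, 8   # 1-s windows of 256 samples from 8 channels


def segment(recording):
    """Cut a (n_samples, 8) recording into non-overlapping 1-s windows."""
    n_windows = recording.shape[0] // FS
    return recording[: n_windows * FS].reshape(n_windows, FS, N_CHANNELS)


def build_dataset(awake, drowsy, n_train, seed=0):
    """Stack windows, attach one-hot labels, shuffle and split."""
    awake_w, drowsy_w = segment(awake), segment(drowsy)
    x = np.concatenate([awake_w, drowsy_w])
    y = np.zeros((x.shape[0], 2))
    y[: awake_w.shape[0], 0] = 1.0    # '10' = awake
    y[awake_w.shape[0]:, 1] = 1.0     # '01' = drowsy
    order = np.random.default_rng(seed).permutation(x.shape[0])
    x, y = x[order], y[order]
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])
```

With the 69,054 windows reported above and `n_train=50000`, this yields training and test arrays of 50,000 × 256 × 8 and 19,054 × 256 × 8.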

4.4 EEG signal pre-processing results

The EEG data obtained from the initial collection are not entirely valid EEG data; they also contain considerable noise, which would affect the subsequent experimental results. The collected EEG data therefore need to be filtered to remove the noise components that lie outside the EEG spectrum. We use a third-order Butterworth bandpass filter to initially filter the signal, and the results for each channel after linear filtering are shown in Fig. 9a. It can be seen from the figure that the filtered signal exhibits EEG characteristics, has a nonlinear trend and is suitable for analysis. Since the channels of the EEG signal influence each other, FastICA is used to separate them in order to purify the signal of each channel. The signal waveforms of the 8 EEG channels after separation are shown in Fig. 9b. As can be seen from the figure, the quality of the EEG signals improves significantly after FastICA processing. The signals processed by FastICA still contain some noise, so we use the wavelet threshold method to decompose and reconstruct the signal of each channel and extract the frequency band we need. We use three-layer wavelet decomposition to reconstruct the EEG signal, and the results are shown in Fig. 9c. The output y in the figure is the reliable EEG signal obtained after reconstruction, and its band is 0–64 Hz.

Fig. 9

EEG signal pre-processing. a EEG signal after linear filter. b EEG signal after FastICA. c EEG signal after wavelet threshold

4.5 EEG signals classification results based on CNN with Inception module

Considering the limited hardware resources, training uses mini-batches, which require less memory and allow faster computation. An extremely small batch size causes the loss curve to oscillate violently. Therefore, the batch size is set to 64. The learning rate is set to 0.0003, since a smaller learning rate allows the optimizer to locate the optimum more accurately. Dropout is added to the fully connected layers to reduce the number of active parameters and prevent overfitting. Finally, because the Inception model is relatively complex, we also add an L2 regularization term to the loss function to further prevent overfitting, and we randomly extract 15% of the data from the training set as a validation set to monitor the training status.
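Two of these choices, the 15% validation split and the L2 penalty, can be made concrete with a short NumPy sketch. The base loss is assumed here to be the mean cross-entropy and the regularization weight `lam` is an arbitrary illustrative value, since neither is stated in the text; the function names are hypothetical.

```python
import numpy as np


def train_val_split(n_samples, val_fraction=0.15, seed=0):
    """Return (train_indices, val_indices) from a random permutation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return idx[n_val:], idx[:n_val]


def l2_regularized_loss(probs, one_hot, weights, lam=1e-4):
    """Mean cross-entropy plus lam times the sum of squared weights."""
    cross_entropy = -np.mean(np.sum(one_hot * np.log(probs + 1e-12), axis=1))
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return cross_entropy + penalty
```

Applied to the 50,000 training samples, the split leaves 42,500 samples for training and 7,500 for validation.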

Figure 10a shows the training loss of the network structure with the Inception module. The training loss curve indicates roughly whether the learning rate is reasonable. As seen from the figure, the loss initially drops rapidly to a low value owing to the batch normalization layers. In subsequent iterations, the training loss fluctuates because each mini-batch is only a small part of the training set, but the overall trend is downward. Figure 10b shows the validation loss during training. As the iterations increase, the validation loss also decreases continuously and eventually stabilizes. The curves show no sign of overfitting, which suggests that the model parameters were chosen properly.

Fig. 10

Loss change of the network structure with Inception module: (a) training loss; (b) validation loss

The performance of the model is evaluated next, mainly in terms of accuracy, precision, recall, F1 score and area under the curve (AUC). The awake state is taken as the positive class. A sample whose true state is awake and that the network judges as awake is a true positive (TP). An awake sample judged as drowsy is a false negative (FN). A drowsy sample judged as awake is a false positive (FP). A drowsy sample judged as drowsy is a true negative (TN). The formulas for accuracy, precision, recall and F1 score are as follows:

$$ Accuracy = \frac{TP + TN}{{TP + TN + FP + FN}} $$
(9)
$$ Precision = \frac{TP}{{TP + FP}} $$
(10)
$$ Recall = \frac{TP}{{TP + FN}} $$
(11)
$$ F_{1} = \frac{2TP}{{2TP + FP + FN}} $$
(12)
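Equations (9)–(12) can be evaluated directly from the four confusion counts, as in the following sketch. Class index 0 is assumed to denote the awake (positive) class, matching the first position of the one-hot label ‘10’; the function names are hypothetical.

```python
import numpy as np


def confusion_counts(y_true, y_pred, positive=0):
    """Return (TP, TN, FP, FN) for integer class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score as in Eqs. (9)-(12)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```

The F1 score in Eq. (12) is the harmonic mean of precision and recall, which can be checked numerically with these functions.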

The results of model evaluation are shown in Fig. 11. It can be seen that as the iteration progresses, these indicators continue to rise, which also shows that the capabilities of the model continue to rise. Finally, when the network is close to convergence, the indices tend to be stable.

Fig. 11 Evaluation on the validation set using Inception module

The robustness of a classifier is mainly assessed with the receiver operating characteristic (ROC) curve. The abscissa and ordinate of the ROC curve are the false positive rate (FPR) and the true positive rate (TPR), respectively. They are calculated as:

$$ FPR = \frac{FP}{{TN + FP}} $$
(13)
$$ TPR = \frac{TP}{{TP + FN}} $$
(14)

The ROC curve reflects the classification performance of a classifier. However, when the ROC curves of different classifiers cross, it is difficult to judge from the curves alone which classifier is better. Therefore, the area under the ROC curve (AUC) is used as a single number that summarizes the classification ability expressed by the curve. Figure 12 shows the variation of AUC with the number of iterations during training. The AUC rises steadily and finally stabilizes at about 0.95 after approximately 20,000 iterations.
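To make the relationship between Eqs. (13) and (14) and the AUC concrete, the sketch below builds the ROC points by sweeping a decision threshold over the classifier scores and integrates them with the trapezoidal rule. It is an illustrative pure-Python version with hypothetical function names and toy data; the paper does not state how its AUC values were computed.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) for binary labels (1 = positive, 0 = negative)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    # Sort by decreasing score; lowering the threshold admits samples one group at a time.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))  # Eq. (13), Eq. (14)
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores (not data from the paper): 8 of the 9 positive/negative pairs are ranked correctly.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(roc_points(toy_labels, toy_scores))
```

For the toy data the AUC is 8/9, which equals the fraction of positive/negative pairs in which the positive sample receives the higher score.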

Fig. 12 AUC change curve on the validation set for the Inception network

After training, the accuracy of the model on the last training mini-batch is 96.87%. On the validation set, the accuracy is 95.33%, the precision is 95.57%, the recall is 95.60%, the F1 score is 95.48%, and the AUC is 0.9553. Applying the trained model to the test set of size 19,054 × 256 × 8 yields a final accuracy of 95.59% and a recall of 96.12%.

After training is completed, the network structure used in this paper is visualized to inspect the extracted features. The input of the first Inception module and the outputs of the three Inception modules are selected for visualization. For convenience, only the outputs of the first 16 filters in each layer are shown; the results are given in Fig. 13. The feature maps show that in the shallow layers the data sequences are relatively long and therefore carry more information and more features. As the network deepens, the sequences produced by the convolution kernels become shorter, and each filter extracts fewer but more representative features.

Fig. 13 Feature extraction map of network structure with Inception model

We visualize the relationship between driver drowsiness and brain region using the network structure with the Inception module. The result is shown in Fig. 14. The horizontal axis of the image represents the sequence value. The vertical axis is the channel; from top to bottom the channels are Fp1, Fp2, C3, C4, T7, T8, O1 and O2. Figure 14a shows the weights at the beginning of training, when they are randomly generated and look chaotic. As training progresses, the weight distribution gradually changes. After 20,000 iterations, Fig. 14b shows that the white dots are mostly concentrated in the bottom two rows, which means that the EEG signals of the O1 and O2 channels are most closely related to the drowsy state.

Fig. 14 Relationship between channels and drowsiness: (a) initial; (b) after 20,000 iterations

4.6 EEG signals classification results based on modified AlexNet model

The parameters of the modified AlexNet network model built in this paper are as follows: the batch size is 64 and the learning rate is set to 0.0003. Because this network model is simpler than the Inception structure, and in order to avoid insufficient representation capability, the dropout ratio is set to 0.5, that is, neurons in the fully connected layer are retained with a probability of 0.5, which is greater than the 0.2 used for the Inception structure. The regularization term in the cost function is also removed. The change of the training loss and validation loss of the network model with the number of iterations is shown in Fig. 15. The network converged after approximately 16,000 iterations. During this period, the training loss continued to decline and eventually stabilized, and the overall validation loss also continued to decline.
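The role of the retention probability in dropout can be illustrated with a short sketch. It assumes the common "inverted dropout" formulation, in which retained activations are rescaled by 1/keep_prob during training so that their expected value is unchanged; the text does not state which variant was used, and the function name is hypothetical.

```python
import random


def dropout(activations, keep_prob, rng):
    """Inverted dropout (an assumed variant): keep each unit with probability keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    # Retained units are scaled by 1/keep_prob; dropped units are set to zero.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed, only for a reproducible illustration
layer_output = [1.0] * 10
dropped = dropout(layer_output, keep_prob=0.5, rng=rng)
```

With `keep_prob=0.5`, each entry of `dropped` is either 0.0 or 2.0; at inference time dropout is disabled, which corresponds to `keep_prob=1.0`.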

Fig. 15 Loss change of the modified AlexNet module: (a) training loss; (b) validation loss

We evaluate the indicators of the model on the validation set. The accuracy, precision, recall and F1 score on the validation set are shown in Fig. 16. As training proceeds, all indicators maintain an upward trend, which indicates that the model continues to improve without overfitting. The AUC trend of the model is shown in Fig. 17. After training, the accuracy of the model on the last training mini-batch is 98.43%. On the validation set, the accuracy is 94.76%, the precision is 94.87%, the recall is 95.28%, the F1 score is 94.96%, and the AUC is 0.9504. The accuracy and recall of the final model on the test set are 94.68% and 95.32%, respectively.

Fig. 16 Evaluation on the validation set using modified AlexNet network

Fig. 17 AUC change curve on the validation set for the modified AlexNet network

Next, test data samples are selected for visualization. In the modified AlexNet network, the second, fourth, sixth and eighth convolutional layers are visualized. Because the sequences in the shallow layers are long, for convenience the first 8 filters are shown for the second layer and the first 16 filters for each of the other layers. The visualization results for the test data samples are shown in Fig. 18. As with the Inception network, the features extracted in the shallow layers of the modified AlexNet network resemble the original waveform. As the layers deepen, the extracted features become increasingly abstract and represent the features that the network actually uses for recognition.

Fig. 18 Modified AlexNet model feature extraction map

4.7 Comparison

By comparing the two models, the following conclusions can be drawn. As the number of iterations increases, both models eventually converge. The accuracy on the test set is 95.59% for the network structure with the Inception module and 94.68% for the modified AlexNet model. The network structure with the Inception module is also slightly higher in validation accuracy, precision, recall, F1 score and AUC, but the differences are small and the two models have similar capabilities. In terms of training time, the network structure with the Inception module takes 1 h and 16 s to train, while the modified AlexNet model takes only 39 min. The modified AlexNet model has fewer parameters, so it trains and converges faster.

We also compared the proposed method with other state-of-the-art methods. Lin et al. [26] proposed a one-channel BCI system using Mahalanobis distance (MD) to detect drowsiness in real time. Zhang et al. [41] used a support vector machine (SVM) classification algorithm and the fast Fourier transform (FFT) to determine the vigilance level. Li et al. [42] proposed a smartwatch-based wearable EEG system using a support vector machine-based posterior probabilistic model (SVMPPM) for driver drowsiness detection. Punsawad et al. [32] developed a single-channel EEG-based device for real-time drowsiness detection. Chai et al. [43] presented a two-class EEG-based classification using a Bayesian neural network for classifying driver fatigue. Wali et al. [44] used the discrete wavelet packet transform (DWPT) and the fast Fourier transform (FFT) to classify the driver drowsiness level. The comparison results are shown in Table 1.

Table 1 Comparison

Table 1 shows that the method proposed by Lin et al. [26] obtained an accuracy of 82.8% and a recall of 88.7%. The accuracy of the method proposed by Li et al. [42] is 88.6% when the time window is 1 min. The method proposed by Wali et al. [44] obtained an accuracy of 79.21% and a recall of 82.09%. Zhang et al. [41] showed the effect of the time window on accuracy. When the time window is set to 1 s, the average classification accuracy for the O1 EEG signal is 83.28%. As the time window lengthens, the classification accuracy of their algorithm generally rises, finally reaching an accuracy of 90.7% and a recall of 86.8%.

This is because traditional EEG signal classification methods rely on features extracted through experience and subjective observation, so a longer time window gives a higher classification accuracy, and different feature choices lead to different final accuracies. The convolutional neural network with the Inception module and the modified AlexNet model proposed in this paper both use EEG signals within a 1 s time window as training samples. We obtained final accuracies of 95.59% and 94.68%, and recalls of 96.12% and 95.32%, respectively. Punsawad et al. [32] used a 4-channel EEG-based method and obtained a classification accuracy of 90.4%. Chai et al. [43] used a 32-channel EEG-based system and obtained a classification accuracy of 88.2% and a recall of 89.7%. The comparison shows that multi-channel EEG signal classification performs better than single-channel classification, as multi-channel equipment can collect more useful information. However, as datasets grow and the number of classification categories increases, traditional classification algorithms face problems such as excessive computation time and insufficient accuracy. For deep learning methods, given the continuing improvement in computing power, we only need to produce more accurate and larger datasets, and the generalization ability and performance of the model will continue to improve. After large-scale network training, the EEG signals of different drivers can be classified and the drivers warned accordingly.

4.8 Early warning strategy module results

To verify the feasibility of the early warning strategy described in this paper, MATLAB is used for simulation. We assume that the driver’s EEG signal state sequence is as shown in Fig. 19a, where ‘1’ and ‘0’ represent the drowsy state and the awake state, respectively. Figure 19b shows the driver drowsiness level obtained from the assumed EEG signal sequence through the proposed early warning strategy. ‘0’ represents awake, ‘1’ means the driver is in the first level of drowsiness, and ‘2’ means the driver is in the second level of drowsiness.

Fig. 19 Early warning strategy simulation: (a) driver EEG state sequence; (b) driver drowsiness rating sequence

For safety reasons, the early warning system cannot be tested in a real driving environment, so we use the OpenBCI Cyton EEG detection system and the Arduino open-source electronic platform to verify the effectiveness of the above simulation. The early warning strategy experiment is shown in Fig. 20. When the driver is in the awake state, the first-level drowsiness state or the second-level drowsiness state, the corresponding response of the early warning equipment is that the white light is on, the red light is on, or the buzzer sounds, respectively. The experimental results are consistent with the simulation, which supports the reliability of the early warning strategy.
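The mapping from drowsiness level to device response described above can be written as a simple lookup. The sketch below only models the mapping itself; the names are hypothetical, and the actual Arduino pin control and OpenBCI data acquisition are not shown because they are not specified in this section.

```python
# Responses described in the text for each drowsiness level.
WARNING_RESPONSES = {
    0: "white light on",  # awake
    1: "red light on",    # first-level drowsiness
    2: "buzzer sounds",   # second-level drowsiness
}


def warning_response(level):
    """Return the device response for a drowsiness level (0, 1 or 2)."""
    try:
        return WARNING_RESPONSES[level]
    except KeyError:
        raise ValueError(f"unknown drowsiness level: {level!r}") from None


def responses_for_sequence(levels):
    """Map a drowsiness rating sequence to the sequence of device responses."""
    return [warning_response(level) for level in levels]


# Hypothetical rating sequence, not the one shown in the figure.
example = responses_for_sequence([0, 0, 1, 2, 0])
```

Here `example` is `["white light on", "white light on", "red light on", "buzzer sounds", "white light on"]`.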

Fig. 20 Early warning strategy experiment

5 Conclusions

In this study, a vehicle driver drowsiness detection method using wearable EEG based on convolutional neural networks is presented. The EEG collection module, EEG signal processing module and early warning module form a complete system that can be used for vehicle driving safety. The experimental results show that the proposed method performs well in vehicle driver drowsiness detection. Specifically, based on samples with a one-second time window, the accuracy reaches 95.59% using the neural network with the Inception module and 94.68% using the modified AlexNet model in simulation and tests. The proposed early warning strategy behaved as expected in both simulation and experiment. The simulation and test results demonstrate the feasibility of the proposed EEG-based drowsiness detection system for vehicle driving safety.

In our future research, we will focus on integrating all modules and embedding them into the development board. We will also conduct more in-depth research on EEG artifact removal, signal classification and real-time signal processing.