Background & Summary

Brain-computer interfaces (BCIs) allow communication without muscular activity, based on brain signals measured with electroencephalography (EEG). The P300 component is an event-related potential elicited during the process of decision making. P300-based BCIs (ref. 1) have been gaining attention in recent years and are now considered one of the main BCI categories (ref. 2). Compared to other BCI paradigms, P300 BCIs are relatively fast, effective for most users, and straightforward, and they require practically no user training (ref. 2).

However, since BCIs typically rely on supervised classification, a substantial amount of P300 data is needed to train them. Although continuous EEG data are relatively common in data publications (e.g., ref. 3), data sharing is rare in P300-related publications. Some P300 datasets are publicly available, e.g., a benchmark P300 speller dataset from the BCI Competition 2003 (ref. 4). In ref. 5, the authors describe an off-line analysis of P300 data for a simple BCI system and also make the related data available. Both datasets were obtained using short inter-stimulus intervals (100 and 400 ms, respectively). Their data are stored in various internal MATLAB structures, and the description of the related metadata is limited. We have already published a smaller collection of P300 datasets based on a simple LED-based protocol (ref. 6). With this exception, to the best of the authors' knowledge, there is no publicly available database that offers P300 data for further analysis together with metadata in a format widely accepted by the neuroscience community. Consequently, researchers in this area often use their own data, which limits the possibility of meaningfully comparing different studies.

The aim of this article is to describe a large collection of P300 datasets and make it available to the scientific community.

The P300 data in the presented collection were recorded during the 'Guess the number' experiment. This experiment, based on visual stimulation, was originally developed to demonstrate the benefits of BCIs to the public. The participant is asked to choose a number between 1 and 9 and to concentrate on it (i.e., this number is the target stimulus). The participant is then exposed to visual stimuli consisting of the numbers 1 to 9 appearing in random order on the monitor. During the experiment, both the EEG signal and the stimulus markers are recorded. Concurrently, the experimenters observe the averaged event-related potential (ERP) waveforms for each number and try to guess the number the participant is thinking of. Finally, their guess is verified when the participant reveals the chosen number.
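The experimenters made their guess by visually comparing the averaged waveforms. Purely as an illustration of the same principle, the comparison can be automated: average the epochs recorded for each number and pick the number whose average shows the largest mean amplitude in a P300 latency window. The 300 to 500 ms window, the function name `guess_number`, and the array layout are assumptions for this sketch, not part of the original procedure.

```python
import numpy as np


def guess_number(epochs, labels, times, window=(0.3, 0.5)):
    """Guess the target by comparing averaged ERPs per stimulus number.

    epochs : array (n_epochs, n_samples), one channel (e.g. Pz), baseline-corrected
    labels : array (n_epochs,), the number (1-9) shown in each epoch
    times  : array (n_samples,), epoch time axis in seconds
    window : assumed P300 latency window in seconds (illustrative choice)
    """
    epochs = np.asarray(epochs, dtype=float)
    labels = np.asarray(labels)
    times = np.asarray(times, dtype=float)
    in_window = (times >= window[0]) & (times <= window[1])

    scores = {}
    for label in np.unique(labels):
        erp = epochs[labels == label].mean(axis=0)          # averaged ERP for this number
        scores[int(label)] = float(erp[in_window].mean())   # mean amplitude in the window

    # The number whose average shows the strongest positivity is the guess.
    best = max(scores, key=scores.get)
    return best, scores
```

Averaging is what makes the guess possible at all: the P300 to the rarely shown target is small compared with the ongoing EEG in a single trial, but it survives averaging while unrelated activity tends to cancel out.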

Methods

Environment

The experiments were carried out in elementary and secondary schools, mainly in the Pilsen region of the Czech Republic, between autumn 2014 and spring 2015. The measurements were taken during regular school hours, typically in the morning. Each experiment was performed in a classroom arranged for an entertaining and educational health programme that included NeuroSky brain games, ECG monitoring, modeling of body muscles, etc. Unfortunately, the environment was usually quite noisy, since many children and many electrical devices were present in the room at the same time. However, no people were ever standing or moving behind the monitor or in close proximity to the participant.

Stimulation protocol

The participants were stimulated with the numbers 1 to 9 flashing on the monitor in random order. The numbers were white on a black background, as shown in Fig. 1c. The inter-stimulus interval was set to 1,500 ms.
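The actual protocol was implemented in the Presentation software (Scenario folder). The following is only a minimal Python sketch of a comparable stimulus schedule. It assumes that the numbers are shuffled within blocks of nine, so that every number appears equally often, and it treats the 1,500 ms interval as onset-to-onset spacing; both are assumptions, and `stimulus_schedule` is a hypothetical name.

```python
import random

ISI_MS = 1500            # inter-stimulus interval stated in the protocol
NUMBERS = range(1, 10)   # the stimuli: numbers 1 to 9


def stimulus_schedule(n_blocks, seed=None):
    """Return a list of (onset_ms, number) pairs.

    Assumption for illustration: the numbers are shuffled within blocks of
    nine, so every number appears equally often in random order.
    """
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = list(NUMBERS)
        rng.shuffle(block)
        sequence.extend(block)
    # Consecutive stimuli are separated by the fixed inter-stimulus interval.
    return [(i * ISI_MS, number) for i, number in enumerate(sequence)]
```

For example, `stimulus_schedule(44)` yields 396 stimuli whose last onset is at 592,500 ms, i.e., just under 10 min. Under this assumption, the chosen target would occur in one ninth of the presentations.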

Figure 1: Experiment, hardware equipment.

(a) The medium 10/20 EEG cap. (b) The BrainVision V-Amp amplifier. (c) Course of the experiment. Researchers observe event-related potentials as they are averaged in BrainVision Recorder. The notebook on the right is used to control the stimulation. The subject sitting in the chair is exposed to visual stimuli. The subject's face has been blurred to protect his/her privacy.

Hardware and software

A mobile EEG laboratory (equipment that is easy to unpack, operate, and pack again under the conditions described above) was transported to the schools to perform the experiments. The following hardware devices were used: the BrainVision standard V-Amp amplifier (Fig. 1b), a standard small or medium 10/20 EEG cap (Fig. 1a), standard reference, ground, and EOG electrodes, a monitor for presenting the numbers, and two notebooks to run the stimulation and recording software applications. To speed up the guessing task, only three electrodes (Fz, Cz, and Pz) were active. The stimulation protocol was developed and run using the Presentation software tool produced by Neurobehavioral Systems, Inc. BrainVision Recorder was used for recording and storing the raw EEG data, the metadata describing the raw data, and the stimulus data. MATLAB, EEGLAB, and ERPLAB were later used to validate the data.

Participants and experimenters

The participants were school-age children and teenagers (aged 7 to 17; average age 12.9), 138 males and 112 females. All participants and their parents were informed about the programme of the day and the experiments carried out. All participants took part in the experiment voluntarily, and many of them also took part in other brain experiments during the day. Only the gender, age, and laterality of the participants were collected; no personal or sensitive data such as names, birth dates, or identifying physical characteristics were requested or recorded. All experiments conducted on the same day took place in the same classroom.

There were usually three experimenters present. The first experimenter, a health-care professional trained in EEG, was responsible for preparing the participant for the experiment (applying the EEG cap and electrodes) and for explaining the goal of the experiment and the rules of behavior to be followed during it. This experimenter was also responsible for removing the cap and electrodes after the experiment. The second experimenter was mainly responsible for the correct functioning of the hardware and software infrastructure. The third experimenter spent most of the time explaining the nature of the experiment to the participant and to other onlookers. All experimenters usually took part in the main task: guessing the number.

Data acquisition

Before the experiment itself, the participants were informed about its goal, its course, and the equipment used. Each participant was familiarized with basic rules of behavior and asked to sit comfortably, pay attention to the stimulation, not move, and limit eye blinking. To increase alertness, the participants were instructed to silently count the total number of target stimuli presented on the monitor. During the experiment, the participants sat approximately 1.5 m in front of the monitor for as long as needed (approximately 10 min on average). Other children observing the experiment were asked not to enter the participant's field of view and not to disturb him/her in any other way.

Then the participant was technically prepared for the experiment: an EEG cap was chosen according to the size of the participant's head, the reference electrode was placed on the bridge of the nose, and the ground electrode was placed on the ear. The EOG electrode for observing eye movements was placed under the participant's eye. The reference, ground, and EOG electrodes, as well as the EEG cap, were connected to the V-Amp amplifier. The impedances of all electrodes were checked and corrected if necessary. The experiment was launched once the participant had assured the experimenters that he/she understood all the circumstances of the experiment and had selected a target number to concentrate on.

During the experiment, the experimenters regularly checked whether the participant was following the rules. If the signal was corrupted by eye blinking or other movement artifacts, the participant was asked to reduce these movements. In several cases, the experiment was terminated prematurely because of a large number of artifacts or because the participant felt unwell (nausea, headache). Normally, the experiment was stopped when the experimenters decided to make their guess or concluded that they had no chance of guessing the number from the signal. If the first guess was unsuccessful, the experimenters usually asked the participant to continue the experiment and made a second or even a third guess. After the experiment, the experimenters showed the participant his/her results, including the averaged P300 waveforms, and explained them. The explanation was always adapted to the participant's age.
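Because up to three guesses could be made per participant, the outcome of a session can be summarized by the attempt on which the guess succeeded. The following is a minimal sketch. The helper name `successful_guess` and the use of `None` for a guess that was not made are hypothetical conventions, not the encoding used in the dataset's .txt files.

```python
from typing import Optional, Sequence


def successful_guess(thought: int, guesses: Sequence[Optional[int]]) -> Optional[int]:
    """Return 1, 2 or 3 for the attempt that matched the thought number,
    or None if no attempt matched. A None entry means no further guess was made."""
    for attempt, guess in enumerate(guesses, start=1):
        if guess is None:
            break                 # the session ended before this attempt
        if guess == thought:
            return attempt
    return None
```

For instance, `successful_guess(4, [7, 4, None])` returns `2`, because the first guess (7) was wrong and the second (4) matched.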

Data Records

Data storage

The EEG/ERP Portal (EEGBase) (Data Citation 1) was used to store the experimental data and metadata. It is a web application that serves not only for the long-term storage of EEG/ERP experiments but also for their annotation, management, and sharing. The stored data are protected by a system of user accounts and defined user roles (Reader, Experimenter, Group Administrator, and Supervisor). Individual users are organized into self-managed groups. Users are required to create a personal account before uploading or downloading any experiment. Metadata are stored using metadata templates that reflect the odML terminologies (ref. 7).

Although the EEG/ERP Portal has been developed and optimized as a storage for human EEG data, it does not provide direct and permanent download links for individual datasets, and it currently does not support DOI citations. Therefore, each dataset stored in the EEG/ERP Portal is mirrored in the Harvard Dataverse (Data Citations 2–251).

Data organization

The data and metadata from 250 participants are stored in the EEG/ERP Portal and can be downloaded as the 'PROJECT DAYS P3 NUMBERS' zip package (the procedure for obtaining this package is described in the Usage Notes section). Each dataset has its own folder, which is internally organized as follows:

  1. The experimental protocol (the files generated by the Presentation software) is located in the Scenario folder.

  2. The experimental data and metadata stored in the BrainVision format (.eeg, .vhdr, and .vmrk files) and the basic experimental metadata (.txt file) are located in the Data folder:

    i. P3Numbers_yyyymmdd_gender_age_id.eeg is a binary file containing the raw EEG data,

    ii. P3Numbers_yyyymmdd_gender_age_id.vhdr is a text file containing metadata that describe the raw EEG data stored in the corresponding .eeg file,

    iii. P3Numbers_yyyymmdd_gender_age_id.vmrk is a text file containing the stimulus markers used in the experiment,

    iv. P3Numbers_yyyymmdd_gender_age_id.txt is a text file containing basic experimental metadata: gender, age, the number thought, the first, second, and third guessed numbers, laterality, and, where applicable, any additional information of interest collected on site (the field named 'other'). These metadata are presented separately because, at the time they were collected and stored, they did not fully fit the content allowed by the EEG/ERP Portal metadata templates.

  3. The license agreement (Creative Commons Attribution Non Commercial 4.0) is located in the License folder.

  4. The experimental metadata file (metadata.xml) contains a set of metadata (such as the hardware and software used) describing the experimental conditions. It is stored in the root folder of each dataset and structured according to the portal metadata template used for data storage. It reflects the EEG/ERP Portal terminology restrictions that applied to metadata content at the time the metadata were collected and stored (these restrictions no longer apply; currently, all experimental metadata can be stored in a single file).
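The .vmrk files follow the BrainVision marker format, in which each entry of the [Marker Infos] section has the form `Mk<n>=<type>,<description>,<position>,<size>,<channel>`, optionally followed by a date for 'New Segment' markers. Positions are given in data points, counted from 1. The sketch below reads these entries. `parse_markers`, `read_markers`, and `stimulus_code` are hypothetical helpers, not part of any published tool. The mapping from a stimulus description such as `S  7` to the number shown is an assumption that should be checked against the files in the Scenario folder.

```python
import re
from dataclasses import dataclass


@dataclass
class Marker:
    kind: str          # marker type, e.g. "Stimulus" or "New Segment"
    description: str   # e.g. "S  7"
    position: int      # 1-based position in data points
    size: int          # length in data points
    channel: int       # 0 = marker relates to all channels


MARKER_ENTRY = re.compile(r"^Mk\d+=(.*)$")


def parse_markers(lines):
    """Collect the Mk<n>= entries of the [Marker Infos] section."""
    markers, in_markers = [], False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            in_markers = line == "[Marker Infos]"
            continue
        if not in_markers or not line or line.startswith(";"):
            continue                      # other sections, blanks, comments
        match = MARKER_ENTRY.match(line)
        if match:
            fields = match.group(1).split(",")
            markers.append(Marker(fields[0], fields[1], int(fields[2]),
                                  int(fields[3]), int(fields[4])))
    return markers


def read_markers(path):
    """Read a .vmrk file; latin-1 decodes any byte, and marker entries are ASCII."""
    with open(path, encoding="latin-1") as f:
        return parse_markers(f)


def stimulus_code(marker):
    """Numeric code of a stimulus marker such as 'S  7', else None.
    Whether code n denotes the number n is an assumption to verify."""
    if marker.kind != "Stimulus" or not marker.description.startswith("S"):
        return None
    return int(marker.description[1:])
```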

While all the described files are organized in a hierarchical folder structure within the .zip package downloaded from the EEG/ERP Portal, this hierarchy is not preserved in the copies replicated in the Harvard Dataverse repository (Data Citations 2–251). Instead, the files are stored there in a flat structure.
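Because the Dataverse copies are flat, the files belonging to one participant have to be regrouped by name. Following the documented pattern P3Numbers_yyyymmdd_gender_age_id, the hypothetical helpers below group files by their common stem and decode that stem. Note that the id field need not equal the portal experiment ID: for instance, the file P3Numbers_20150618_f_10_001.eeg belongs to the experiment with ID 341 (Fig. 2).

```python
import re
from collections import defaultdict
from datetime import date
from pathlib import Path

# Documented naming pattern: P3Numbers_yyyymmdd_gender_age_id.<extension>
STEM = re.compile(r"^P3Numbers_(\d{8})_([A-Za-z]+)_(\d+)_([A-Za-z0-9]+)$")


def parse_stem(stem):
    """Decode a dataset stem into date, gender, age and id, or return None."""
    match = STEM.match(stem)
    if match is None:
        return None
    ymd, gender, age, ident = match.groups()
    return {
        "date": date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:])),
        "gender": gender,
        "age": int(age),
        "id": ident,          # kept as a string to preserve leading zeros
    }


def group_by_dataset(filenames):
    """Map each P3Numbers_* stem to the file extensions found for it;
    files that do not follow the pattern are ignored."""
    groups = defaultdict(list)
    for name in filenames:
        path = Path(name)
        if parse_stem(path.stem) is not None:
            groups[path.stem].append(path.suffix)
    return dict(groups)
```

A complete dataset should then list `.eeg`, `.vhdr`, `.vmrk`, and `.txt` under its stem. Files such as metadata.xml do not carry the stem in their names, so this sketch cannot assign them to a participant.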

Technical Validation

All data were saved in raw form; that is, the preprocessing methods (filtering, baseline correction, artifact rejection) applied during the experimental sessions for visualization and analysis were not applied to the stored data. The quality of the datasets varies because all measurements were performed outside the laboratory. The EEG signal of most datasets shows a declining trend. We therefore tested the hardware amplifier for possible defects and eliminated sources of interference in the classrooms as far as possible. We believe that the declining trend was caused by external interference beyond our control (see Fig. 2). This issue can easily be handled by high-pass filtering (e.g., with a cut-off frequency of 0.5 Hz). Most data were first stored on a laptop running on battery power. However, when the battery ran low (only on days when many experiments were carried out), it was necessary to switch to mains power. These data contain 50 Hz interference, which can also be removed by filtering.
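The two corrections mentioned above can be sketched with SciPy: a zero-phase Butterworth high-pass filter at 0.5 Hz for the declining trend, followed by a 50 Hz notch filter for mains interference. The filter order (2), the notch quality factor (30), and the function name `clean_channel` are illustrative assumptions. The sampling rate `fs` should be taken from the SamplingInterval entry of the corresponding .vhdr file.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch


def clean_channel(x, fs, highpass_hz=0.5, line_hz=50.0, notch_q=30.0):
    """Remove slow drift and mains interference from one EEG channel.

    x  : 1-D signal in microvolts
    fs : sampling rate in Hz
    """
    x = np.asarray(x, dtype=float)
    # Zero-phase high-pass filter: suppresses the slow declining trend.
    b, a = butter(2, highpass_hz, btype="highpass", fs=fs)
    y = filtfilt(b, a, x)
    # Zero-phase notch at the mains frequency (50 Hz in the Czech Republic).
    b, a = iirnotch(line_hz, notch_q, fs=fs)
    return filtfilt(b, a, y)
```

Forward-backward filtering (`filtfilt`) is used so that the filters do not shift ERP latencies, which matters when P300 peak times are compared across datasets. Because the filters need time to settle, the first and last few seconds of a recording are best excluded from epoching.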

Figure 2: Averaged Pz channel epochs of the target stimulus.

(a) Averaged Pz channel epochs of the target stimulus for the experiment with ID 341 (data file P3Numbers_20150618_f_10_001.eeg). Epochs were extracted in the interval from −500 ms to 1,000 ms relative to stimulus onset. Subsequently, the baseline computed from the −500 to 0 ms pre-stimulus interval was subtracted from each epoch. The declining signal trend is clearly visible even after baseline correction. The trial rows above the chart are moving averages over a smoothing window of 10 trials; the original data consist of 17 trials. The scale on the right side is in microvolts. (b) The same baseline-corrected epochs averaged after high-pass filtering with a 0.5 Hz cut-off frequency. The high-pass filter removed the declining signal trend.
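The epoching and baseline correction described in this caption can be sketched as follows. The sketch assumes a single channel (e.g., Pz) sampled at `fs` Hz and stimulus onsets given as 0-based sample indices (for example, .vmrk positions minus one). The trial moving average approximates the smoothing of the trial rows. The function names are hypothetical, and this is not the EEGLAB/ERPLAB code used to produce Fig. 2.

```python
import numpy as np


def extract_epochs(signal, onsets, fs, tmin=-0.5, tmax=1.0):
    """Cut epochs from tmin to tmax seconds around each onset (0-based sample
    index) and subtract the mean of the pre-stimulus part [tmin, 0).
    Requires tmin < 0; epochs that run past the recording are skipped."""
    signal = np.asarray(signal, dtype=float)
    start = int(round(tmin * fs))     # e.g. -500 samples at 1 kHz
    stop = int(round(tmax * fs))      # e.g. 1000 samples at 1 kHz
    epochs = []
    for onset in onsets:
        if onset + start < 0 or onset + stop > signal.size:
            continue
        epoch = signal[onset + start:onset + stop]
        baseline = epoch[:-start].mean()   # the -start samples before onset
        epochs.append(epoch - baseline)
    return np.array(epochs)


def moving_average_trials(epochs, window=10):
    """Average each run of `window` consecutive trials (sliding by one trial)."""
    epochs = np.asarray(epochs, dtype=float)
    return np.array([epochs[i:i + window].mean(axis=0)
                     for i in range(len(epochs) - window + 1)])
```

With the 17 target trials of Fig. 2a and a window of 10, this sliding version produces 8 smoothed rows. Averaging the output of `extract_epochs` over its first axis gives the target ERP.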

The technical validation of each dataset was performed separately. Two different parameters were considered:

  • The rate of eye-blinking artifacts in ERP epochs. Eye blinks severely distort the EEG signal and reduce the usability of the datasets. The percentage of epochs damaged by eye blinks was calculated for each experiment separately using the combination method described in ref. 8. The combination method iterates over all baseline-corrected ERP epochs. To detect eye blinks, it uses a combined threshold that takes into account both the maximum absolute amplitude and the correlation with a sample eye blink. The results for each dataset are available in Table 1 (available online only).

    Table 1: The technical validation of datasets
  • Whether or not the experimenters correctly guessed the number the participant was thinking of. Experiments with successful guesses are typically associated with a larger amplitude of the P300 component. The results for each dataset are available in Table 1 (available online only).
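Reference 8 describes the combination method used for the eye-blink rate in detail. The sketch below shows one plausible reading of it. An epoch is flagged only if its peak absolute amplitude exceeds a threshold and it also correlates strongly with a sample blink waveform. Both thresholds (100 µV and 0.7), the AND rule, and the function names are assumptions for illustration, not the parameters used to produce Table 1.

```python
import numpy as np


def is_blink(epoch, template, amp_threshold=100.0, corr_threshold=0.7):
    """Flag a baseline-corrected epoch (in microvolts) as blink-contaminated.

    Assumed rule: the peak absolute amplitude exceeds amp_threshold AND the
    Pearson correlation with a sample blink waveform exceeds corr_threshold.
    """
    epoch = np.asarray(epoch, dtype=float)
    template = np.asarray(template, dtype=float)
    peak = np.max(np.abs(epoch))
    corr = np.corrcoef(epoch, template)[0, 1]
    return bool(peak > amp_threshold and corr > corr_threshold)


def blink_rate(epochs, template, **kwargs):
    """Percentage of epochs flagged as containing an eye blink."""
    flags = [is_blink(e, template, **kwargs) for e in epochs]
    return 100.0 * sum(flags) / len(flags)
```

Requiring both criteria guards against two failure modes: large but blink-unlike deflections (e.g., electrode pops) fail the correlation test, and small blink-shaped fluctuations fail the amplitude test.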

Usage Notes

The experimental data and metadata can be downloaded from the EEG/ERP Portal (Data Citation 1) using the following procedure. Users must first register. When the registration form is completed, a confirmation e-mail is sent to the user, who then clicks the confirmation link it contains. After a successful login, a personalized homepage is displayed, including an overview of the user's experiments, scenarios, research group memberships, etc. To see publicly offered experiments and find the package named 'PROJECT DAYS P3 NUMBERS', the user selects the Experiments section from the main menu at the top of the homepage. When the Experiments section is loaded, the user selects the 'PROJECT DAYS P3 NUMBERS' package, chooses the license under which he/she wants to use the data (Creative Commons BY-NC), and clicks the 'Add to cart' link (see Fig. 3).

Figure 3: EEG/ERP Portal (EEGBase)—list of experiments.

List of experiments in the package 'PROJECT DAYS P3 NUMBERS' (only the first ten experiments are shown) and the 'Add to cart' link.

Once the package has been added to the cart, the user clicks the 'My cart' link at the top of the page, and the content of the cart is shown (Fig. 4). The experiments in the 'PROJECT DAYS P3 NUMBERS' package are available under the selected license. When the user completes the order (by clicking the 'Create order' button), the package becomes formally available for download (via the 'Download' link). The user then confirms the selection of experiments within the package and clicks the 'Create package' button to create a .zip package (PROJECT_DAYS_P3_NUMBERS.zip). Because the data are large, a progress bar indicates the portion of the package that has already been created. When the package has been created, it can finally be downloaded by clicking the 'Download' link.

Figure 4: EEG/ERP Portal (EEGBase)—Content of the cart.

Content of the cart that is available under the selected license.

The ordered (purchased) package can be re-downloaded at any time from the Experiments section by clicking the 'Download' link, which appears in place of the 'Add to cart' link for the package.

Since the data were stored in the BrainVision (BV) format (ref. 9), appropriate software tools are needed to read and further process them. EEGLAB, an open-source MATLAB toolbox for EEG signal processing (ref. 10), is one of the preferred options. To import data stored in the BV format into EEGLAB easily, the BVA-io plugin (available at http://sccn.ucsd.edu/wiki/EEGLAB_Extensions_and_plug-ins) must be downloaded. Another option is the EEGLoader library (available at https://github.com/stebjan/eegloader), which provides a simple interface for reading the BV format.
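For readers working in Python, the following is a minimal reader for the .vhdr/.eeg pair. It parses the INI-like header and decodes the binary samples. It assumes binary data in MULTIPLEXED orientation, little-endian INT_16, UINT_16, or IEEE_FLOAT_32 samples, and the `Ch<i>=<name>,<reference>,<resolution>,<unit>` channel entries of the BV header. The function names are hypothetical. MNE-Python's `mne.io.read_raw_brainvision` is an established alternative.

```python
import numpy as np
from pathlib import Path

DTYPES = {"INT_16": "<i2", "UINT_16": "<u2", "IEEE_FLOAT_32": "<f4"}


def parse_header(lines):
    """Parse BrainVision header lines into {section: {key: value}}."""
    sections, current = {}, None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue                                  # blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif current is not None and "=" in line:     # skips the identifier line
            key, value = line.split("=", 1)
            current[key] = value
    return sections


def decode(header, raw_bytes):
    """Turn the bytes of a MULTIPLEXED binary .eeg file into
    (channel_names, data with shape (n_channels, n_samples), fs)."""
    common = header["Common Infos"]
    if common.get("DataOrientation", "MULTIPLEXED") != "MULTIPLEXED":
        raise ValueError("only MULTIPLEXED data are handled in this sketch")
    n_ch = int(common["NumberOfChannels"])
    fs = 1e6 / float(common["SamplingInterval"])      # interval is in microseconds
    fmt = header.get("Binary Infos", {}).get("BinaryFormat", "INT_16")
    names, scale = [], []
    for i in range(1, n_ch + 1):
        # Ch<i>=<name>,<reference>,<resolution>,<unit>
        fields = header["Channel Infos"][f"Ch{i}"].split(",")
        names.append(fields[0])
        scale.append(float(fields[2]) if len(fields) > 2 and fields[2] else 1.0)
    samples = np.frombuffer(raw_bytes, dtype=DTYPES[fmt]).astype(float)
    # Multiplexed layout: all channels of sample 1, then of sample 2, ...
    data = samples.reshape(-1, n_ch).T * np.array(scale)[:, None]
    return names, data, fs


def read_brainvision(vhdr_path):
    """Read a .vhdr file and the .eeg file named in its DataFile entry."""
    vhdr_path = Path(vhdr_path)
    with open(vhdr_path, encoding="latin-1") as f:
        header = parse_header(f)
    # "$b", if present, stands for the base name of the header file.
    data_file = header["Common Infos"]["DataFile"].replace("$b", vhdr_path.stem)
    return decode(header, vhdr_path.with_name(data_file).read_bytes())
```

For example, `names, data, fs = read_brainvision("P3Numbers_20150618_f_10_001.vhdr")` would return the channel names, a channels × samples array scaled by each channel's resolution, and the sampling rate in Hz.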

Additional Information

How to cite this article: Mouček, R. et al. Event-related potential data from a guess the number brain-computer interface experiment on school children. Sci. Data 4:160121 doi: 10.1038/sdata.2016.121 (2017).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.