Version 1.04 of MindBigData, the "IMAGENET" of the Brain, is an open database
containing 70,060 brain signals of 3 seconds each. They were captured while a
single test subject, David Vivancos, looked at a random image (14,012 so far)
from the ImageNet ILSVRC2013 train dataset and thought about it, over the
course of 2018.
(Note that we previously released another open database for digits 0-9
instead of images: the "MNIST" of Brain Digits.)

All the signals have been captured using a commercial (not medical grade) EEG
device, the Emotiv Insight headset, covering a total of 5 brain (10/20)
locations.

Files available for download (for the raw EEG only the current version is shown,
since it is incremental):

We built our own tools to capture the signals, but there is no post-processing
on our side, so they come raw, exactly as read from the EEG device: 26,850,320
data points in total.

Feel free to test any machine learning, deep learning or other algorithm you
think could fit, or add the signals to your ImageNet pipeline to try to improve
your performance. We only ask that you acknowledge the source, and please let us
know your results so we can post them here!

We chose not to split the signals into training/test sets at this point, so
pick the distribution you prefer.

The database will be extended periodically with more EEG signals (last update
07/03/2018). Please feel free to forward any thoughts you may have for improving
the dataset.

FILE FORMAT:

The data is stored in a very simple text format: one CSV file for each EEG
recording related to a single image, 14,012 so far.

The naming convention is as follows, using the file
"MindBigData_Imagenet_Insight_n09835506_15262_1_20.csv" as an example:

MindBigData_Imagenet_Insight_ : relates to the EEG headset used, only the
Insight at the moment.

n09835506 : relates to the category of the image, from the synsets of
ILSVRC2013. In this example n09835506 is "ballplayer, baseball player". I also
added a "WordReport-v1.04.txt" file to the zip file, with 3 fields per row, TAB
separated: the category names, the count of images recorded with EEG and the
synset ID.

15262 : relates to the exact image from the above category. All the images are
from the ILSVRC2013_train dataset; you can download them from the Kaggle
website (imagenet_object_detection_train.tar.gz, 56.68 GB). The image in this
example is n09835506_15262.JPEG from the ILSVRC2013_train\n09835506\ folder.

_1_ : relates to the number of the EEG recording for this image. Usually there
will be only 1, but it is possible to have several brain recordings for the
same image; the second will be 2, and so on.

_20 : relates to the global session number in which the EEG signal for this
image was recorded. To avoid long recording times, only 5 images are shown in
each session, with 3 seconds of visualization and 3 seconds of black screen
between them.
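
The naming convention above can be split back into its parts with a small helper. This is only a sketch: the regular expression is an assumption derived from the single example file name, and the names RecordingName, parse_recording_name and source_image_name are hypothetical, not part of the dataset tools.

```python
import re
from typing import NamedTuple


class RecordingName(NamedTuple):
    headset: str           # e.g. "Insight"
    synset_id: str         # e.g. "n09835506"
    image_number: int      # image within the category, e.g. 15262
    recording_number: int  # 1 for the first recording of this image
    global_session: int    # global session number, e.g. 20


# Assumed pattern, derived only from the example file name in the text.
_NAME_PATTERN = re.compile(
    r"^MindBigData_Imagenet_(?P<headset>[A-Za-z0-9]+)"
    r"_(?P<synset>n\d+)_(?P<image>\d+)"
    r"_(?P<recording>\d+)_(?P<session>\d+)\.csv$"
)


def parse_recording_name(filename: str) -> RecordingName:
    """Split a CSV file name into headset, synset, image and session parts."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return RecordingName(
        headset=match["headset"],
        synset_id=match["synset"],
        image_number=int(match["image"]),
        recording_number=int(match["recording"]),
        global_session=int(match["session"]),
    )


def source_image_name(name: RecordingName) -> str:
    """Name of the ILSVRC2013_train image the recording refers to."""
    return f"{name.synset_id}_{name.image_number}.JPEG"


example = parse_recording_name(
    "MindBigData_Imagenet_Insight_n09835506_15262_1_20.csv"
)
print(example, source_image_name(example))
```

Keeping the synset ID as a string preserves the leading "n", so it can be used directly as the folder name inside ILSVRC2013_train.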

Inside the CSV file there are 5 lines of plain text, one for each EEG channel
recorded, each ending with a newline character.

The first field of each line is a text string identifying the 10/20 brain
location of the signal, with possible values
"AF3","AF4","T7","T8","Pz" for the Insight headset (see below for the brain
locations).

After that come all the raw EEG values captured, separated by commas. For this
headset the capture is done at 128 Hz, so there should be around 384 (128 x 3
secs) decimal values like "4304.61538461538" for each channel. Note that the dot
is used as the decimal point.

If you plot all the raw values for the AF3 channel (first line of the file) you
have this signal:

Note that this is the time series of the raw EEG electrical signal captured from
my brain while staring at the image mentioned above for 3 secs, without blinking
and as still as possible to avoid EMG noise.

The other EEG channels follow the same pattern, and the "time" coordinate of the
time series is shared between the 5 channels, so the first "column" of numbers
is the first time step, and so on.
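
As an illustration of the layout described above, here is a minimal sketch of a reader for one recording. It assumes each line is exactly "location,value,value,..." and strips any quotes around the location name, since the text does not say whether they appear in the files. The function names parse_eeg_text, load_eeg_csv and time_step are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

SAMPLE_RATE_HZ = 128  # Insight capture rate stated in the text


def parse_eeg_text(text: str) -> Dict[str, List[float]]:
    """Map each 10/20 location (e.g. "AF3") to its list of raw values."""
    channels: Dict[str, List[float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate a trailing empty line
        name, *values = line.split(",")
        # Assumption: quotes around the location, if present, are not wanted.
        name = name.strip().strip('"')
        channels[name] = [float(v) for v in values if v.strip()]
    return channels


def load_eeg_csv(path: str) -> Dict[str, List[float]]:
    """Read one recording file from disk."""
    return parse_eeg_text(Path(path).read_text())


def time_step(channels: Dict[str, List[float]], index: int) -> Dict[str, float]:
    """Values of every channel at one shared time step (one "column")."""
    return {name: values[index] for name, values in channels.items()}
```

Since the number of values is only "around" 384, it is probably safer to check the length of each channel after loading than to assume a fixed size.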

OPTIONAL SAMPLE SPECTROGRAM:

With the release of v1.0 we optionally include a generated spectrogram for each
of the EEG captures, for faster inclusion in your existing image-based deep
learning pipeline, but we encourage you to build your own, based on the raw EEG
data.

The sample spectrogram is created using only 3 of the 5 EEG channels available,
creating RGB PNG files: the AF3 channel is used as RED, AF4 as GREEN, and Pz as
BLUE. This is just a sample case; for example, you may want to build your own
black & white files from the raw data, generating a PNG file for each channel.

To generate the spectrogram for each raw EEG wave, the first 64 samples (1/2
second) are discarded to avoid possible communication lags, and then 128
samples are used to compute an FFT, and so on for each time step.

Each time step moves ahead by 2 samples, until we reach 256 samples (2 seconds),
so in total we cover 64 overlapped time steps. The overlap is probably too big;
you may want to try smaller ones in your pre-processing pipeline.

Notice that for this sample scenario we are using the raw wave, but it is
advisable to apply some filters first, as the EEG literature suggests.

At the end we have 64 frequency values (0 to 63 Hz, from the FFTs of 128
samples) for each of the 64 overlapped time steps.
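
The windowing described above can be sketched with NumPy as follows. This is an approximation under stated assumptions, not the exact tool used to build the included PNG files: no window function, filtering or mean removal is applied (none is mentioned in the text), and the function name sliding_fft_magnitudes and its parameter names are hypothetical.

```python
import numpy as np


def sliding_fft_magnitudes(samples, skip=64, window=128, hop=2,
                           steps=64, n_freqs=64):
    """Return an array of shape (steps, n_freqs) of FFT magnitudes.

    skip    : leading samples discarded (64 = 1/2 second at 128 Hz)
    window  : samples per FFT (128 = 1 second, so bins are 1 Hz apart)
    hop     : samples moved ahead between time steps
    steps   : number of overlapped time steps
    n_freqs : frequency bins kept (bins 0..63, i.e. 0 to 63 Hz)
    """
    x = np.asarray(samples, dtype=float)
    needed = skip + (steps - 1) * hop + window  # 318 with the defaults
    if x.size < needed:
        raise ValueError(f"need at least {needed} samples, got {x.size}")
    out = np.empty((steps, n_freqs))
    for t in range(steps):
        start = skip + t * hop
        segment = x[start:start + window]
        # rfft of 128 real samples yields 65 bins; keep the first n_freqs.
        out[t] = np.abs(np.fft.rfft(segment))[:n_freqs]
    return out
```

With the raw values sitting around 4300, the 0 Hz bin will dominate each row unless the mean is removed first, which is one reason to consider the filtering mentioned earlier.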

Notice also that the frequencies included here are not limited, and they
include some frequencies beyond what the EEG literature suggests is expected
for a brain signal, probably EMG or other signal artifacts.

Once we have the Frequency (magnitude) values, they are coded into color values
(0-255) using EEG AF3 channel as RED, AF4 as GREEN and Pz as BLUE.

To reduce the effects of outliers, the frequency value distribution is
proportionally mapped into the 0-255 value range for each color channel.
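
The exact mapping is not detailed here, so the following is only one possible reading: each channel's magnitudes are rescaled linearly into 0-255, with optional percentile clipping as an added assumption for taming outliers. The names to_color_channel and stack_rgb and the parameters low_pct and high_pct are hypothetical.

```python
import numpy as np


def to_color_channel(mags, low_pct=0.0, high_pct=100.0):
    """Linearly rescale magnitudes into 0-255 (uint8).

    low_pct / high_pct are assumed clipping percentiles; the defaults give
    plain min-max scaling, while e.g. (1, 99) clips the extreme values.
    """
    m = np.asarray(mags, dtype=float)
    lo, hi = np.percentile(m, [low_pct, high_pct])
    if hi <= lo:
        # Constant input: nothing to scale, return an all-zero channel.
        return np.zeros(m.shape, dtype=np.uint8)
    scaled = (np.clip(m, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)


def stack_rgb(af3_mags, af4_mags, pz_mags):
    """Combine three per-channel arrays into one (rows, cols, 3) RGB array."""
    channels = [to_color_channel(c) for c in (af3_mags, af4_mags, pz_mags)]
    return np.stack(channels, axis=-1)
```

Each channel is scaled independently, so colors show relative structure within a channel rather than comparable magnitudes across AF3, AF4 and Pz.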

Here is a sample spectrogram magnified 10 times (the ones included in the zip
are only 64x64 pixels):

The file names are the same as for the CSV files but ending in .png, and you can
find them in the folder
MindBigData-Imagenet-v1.0-Imgs

Beware that a few of the PNG files may be filled mostly with a single color,
reflecting a probable unexpected capture flaw in the EEG device; these are
probably worth discarding.
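
One way to spot such files automatically is to measure how much of the image is taken up by its most common color. This sketch works on an already decoded (rows, cols, 3) pixel array, loaded with the image library of your choice; the 0.9 threshold is an arbitrary assumption to tune on your own data, and the names dominant_color_fraction and looks_flawed are hypothetical.

```python
import numpy as np


def dominant_color_fraction(rgb) -> float:
    """Fraction of pixels that share the single most frequent color."""
    arr = np.asarray(rgb)
    pixels = arr.reshape(-1, arr.shape[-1])  # one row per pixel
    _, counts = np.unique(pixels, axis=0, return_counts=True)
    return float(counts.max()) / pixels.shape[0]


def looks_flawed(rgb, threshold: float = 0.9) -> bool:
    """True when one color covers at least `threshold` of the image."""
    return dominant_color_fraction(rgb) >= threshold
```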

BRAIN LOCATIONS:

Each EEG device captures the signals via different sensors, located in these
areas of my brain; the color represents the device. Note that for the "IMAGENET"
dataset only the Insight is used at the moment.

Feel free to contact us if you need any more info; we will be glad to hear your
feedback.

PREVIOUS RESEARCH :

This is a list of related past work, using other high-density EEG devices:

MindBigData, the "IMAGENET" of the Brain, is made available under the Open
Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in
individual contents of the database are licensed under the Database Contents
License: http://opendatacommons.org/licenses/dbcl/1.0/