Dataset Summary

The DEAP dataset consists of two parts:

  1. The ratings from an online self-assessment where 120 one-minute extracts of music videos were each rated by 14-16 volunteers based on arousal, valence and dominance.
  2. The participant ratings, physiological recordings and face video of an experiment where 32 volunteers watched a subset of 40 of the above music videos. EEG and physiological signals were recorded and each participant also rated the videos as above. For 22 participants frontal face video was also recorded.

For a more thorough explanation of the dataset collection and its contents, see [1].

File Listing

The following files are available (each explained in more detail below):

File name | Format | Part | Contents
Online_ratings | xls, csv, ods spreadsheet | Online self-assessment | All individual ratings from the online self-assessment.
Video_list | xls, csv, ods spreadsheet | Both parts | Names/YouTube links of the music videos used in the online self-assessment and the experiment, plus statistics of the individual ratings from the online self-assessment.
Participant_ratings | xls, csv, ods spreadsheet | Experiment | All ratings participants gave to the videos during the experiment.
Participant_questionnaire | xls, csv, ods spreadsheet | Experiment | The answers participants gave to the questionnaire before the experiment.
Face_video | Zip file | Experiment | The frontal face video recordings from the experiment for participants 1-22.
Data_original | Zip file | Experiment | The original unprocessed physiological data recordings from the experiment in BioSemi .bdf format.
Data_preprocessed | Zip file for Python and Matlab | Experiment | The preprocessed (downsampling, EOG removal, filtering, segmenting etc.) physiological data recordings from the experiment in Matlab and Python (numpy) formats.

File details

Online_ratings

This file contains all the individual video ratings collected during the online self-assessment. The file is available in OpenOffice Calc (online_ratings.ods), Microsoft Excel (online_ratings.xls), and Comma-separated values (online_ratings.csv) formats.

The ratings were collected using an online self-assessment tool as described in [1]. Participants rated arousal, valence and dominance using SAM mannequins on a discrete 9-point scale. In addition, participants rated the felt emotion using an emotion wheel (see [2]).

The table in the file has one row per individual rating and the following columns:

Column name | Description
Online_id | The video id corresponding to the same column in the video_list file.
Valence | The valence rating (integer between 1 and 9).
Arousal | The arousal rating (integer between 1 and 9).
Dominance | The dominance rating (integer between 1 and 9).
Wheel_slice | The slice selected on the emotion wheel. For some participants the emotion wheel rating was not properly recorded; in these cases the Wheel_slice value is 0. Otherwise, emotions on the wheel map to integers as follows:
  1. Pride
  2. Elation
  3. Joy
  4. Satisfaction
  5. Relief
  6. Hope
  7. Interest
  8. Surprise
  9. Sadness
  10. Fear
  11. Shame
  12. Guilt
  13. Envy
  14. Disgust
  15. Contempt
  16. Anger
Wheel_strength | The strength selected on the emotion wheel (integer between 0 = weak and 4 = strong).
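The Wheel_slice codes can be decoded with a simple lookup table. The following is a minimal Python sketch, assuming the sixteen emotions are numbered consecutively from 1 in the order listed above and that online_ratings.csv has a header row using the documented column names (both assumptions worth checking against the actual file). The helper names wheel_emotion and read_online_ratings are hypothetical.

```python
import csv
import io

# Emotion wheel slices in the order listed above (list index 0 = slice 1).
WHEEL_EMOTIONS = [
    "Pride", "Elation", "Joy", "Satisfaction",
    "Relief", "Hope", "Interest", "Surprise",
    "Sadness", "Fear", "Shame", "Guilt",
    "Envy", "Disgust", "Contempt", "Anger",
]


def wheel_emotion(wheel_slice):
    """Return the emotion name for a Wheel_slice value, or None if not recorded (0)."""
    value = int(wheel_slice)
    if value == 0:
        return None
    return WHEEL_EMOTIONS[value - 1]


def read_online_ratings(fileobj):
    """Yield one dict per rating; assumes a header row with the documented column names."""
    for row in csv.DictReader(fileobj):
        yield {
            "online_id": int(row["Online_id"]),
            "valence": int(row["Valence"]),
            "arousal": int(row["Arousal"]),
            "dominance": int(row["Dominance"]),
            "emotion": wheel_emotion(row["Wheel_slice"]),
            "wheel_strength": int(row["Wheel_strength"]),
        }


# Usage with an in-memory sample (made-up values, not taken from the dataset).
sample = io.StringIO(
    "Online_id,Valence,Arousal,Dominance,Wheel_slice,Wheel_strength\n"
    "1,7,6,5,3,2\n"
    "1,4,5,5,0,0\n"
)
ratings = list(read_online_ratings(sample))
```

Returning None for slice 0 keeps unrecorded wheel ratings distinguishable from real ones, so they can be dropped before any per-emotion statistics.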

Video_list

This file lists all the videos used in the online self-assessment and in the experiment in a table. The file is available in OpenOffice Calc (video_list.ods), Microsoft Excel (video_list.xls), and Comma-separated values (video_list.csv) formats.

The table has one row per video and the following columns:

Column name | Description
Online_id | The unique id used in the online self-assessment.
Experiment_id | If this video was selected for the experiment, the unique id used in the experiment. Blank if not selected.
Lastfm_tag | If this video was selected via last.fm affective tags, the affective tag. Blank otherwise.
Artist | The artist that recorded the song.
Title | Title of the song.
Youtube_link | The original YouTube link from which the video was downloaded. Note that due to copyright restrictions we are unable to provide the videos we used, and these links may have been removed or may be unavailable in your country.
Highlight_start | The time in seconds where the extracted one-minute highlight begins, as determined by MCA analysis. For some videos the highlight was manually overridden (for instance, when a section of the song is particularly well known).
Num_ratings | The number of volunteers that rated this video in the online self-assessment.
VAQ_Estimate | The valence/arousal quadrant this video was selected for by the experimenters. For each quadrant, 15 videos were selected via last.fm and 15 by manual selection. The quadrants are:
  1. high arousal, high valence.
  2. low arousal, high valence.
  3. low arousal, low valence.
  4. high arousal, low valence.
VAQ_Online | The valence/arousal quadrant as determined by the average ratings of the volunteers in the online self-assessment. Note that these can, and sometimes do, differ from the estimated quadrants.
AVG_x, STD_x, Q1_x, Q2_x, Q3_x | Average, standard deviation, and first, second and third quartiles of the x ratings (x = Valence, Arousal or Dominance) given by volunteers in the online self-assessment.
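The quadrant numbering used by VAQ_Estimate and VAQ_Online can be expressed as a small function. The text does not say where "high" and "low" are split, so this sketch assumes the midpoint of the 9-point scale (5), with ties counted as high; that threshold is an assumption, so compare the output with the VAQ_Online column before relying on it. The function name va_quadrant is hypothetical.

```python
def va_quadrant(valence, arousal, midpoint=5.0):
    """Map mean valence/arousal ratings to the quadrant numbering (1-4).

    The midpoint threshold, and counting ratings equal to it as "high",
    are assumptions, not documented rules.
    """
    high_valence = valence >= midpoint
    high_arousal = arousal >= midpoint
    if high_arousal and high_valence:
        return 1  # high arousal, high valence
    if not high_arousal and high_valence:
        return 2  # low arousal, high valence
    if not high_arousal and not high_valence:
        return 3  # low arousal, low valence
    return 4      # high arousal, low valence
```

For a video_list row, this would be called as va_quadrant(AVG_Valence, AVG_Arousal), assuming the AVG_x columns are named by substituting the rating name for x.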

Participant_ratings

This file contains all the participant video ratings collected during the experiment. The file is available in OpenOffice Calc (participant_ratings.ods), Microsoft Excel (participant_ratings.xls), and Comma-separated values (participant_ratings.csv) formats.

The start_time values were logged by the presentation software. Valence, arousal, dominance and liking were rated directly after each trial on a continuous 9-point scale using a standard mouse. SAM Mannequins were used to visualize the ratings for valence, arousal and dominance. For liking (i.e. how much did you like the video?), thumbs up and thumbs down icons were used. Familiarity was rated after the end of the experiment on a 5-point integer scale (from "never heard it before" to "listen to it regularly"). Familiarity ratings are unfortunately missing for participants 2, 15 and 23.

The table in the file has one row per participant video rating and the following columns:

Column name | Column contents
Participant_id | The unique id of the participant (1-32).
Trial | The trial number (i.e. the presentation order).
Experiment_id | The video id corresponding to the same column in the video_list file.
Start_time | The starting time of the trial video playback in microseconds (relative to start of experiment).
Valence | The valence rating (float between 1 and 9).
Arousal | The arousal rating (float between 1 and 9).
Dominance | The dominance rating (float between 1 and 9).
Liking | The liking rating (float between 1 and 9).
Familiarity | The familiarity rating (integer between 1 and 5). Blank if missing.
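Two details in this table are easy to trip over when loading it: Start_time is in microseconds, and Familiarity is blank for participants 2, 15 and 23. A minimal sketch that normalises one CSV row (as produced by csv.DictReader), assuming the header uses the column names above; the helper name parse_rating_row is hypothetical.

```python
def parse_rating_row(row):
    """Convert one participant_ratings CSV row (a dict of strings) to typed values."""
    familiarity = (row.get("Familiarity") or "").strip()
    return {
        "participant_id": int(row["Participant_id"]),
        "trial": int(row["Trial"]),
        "experiment_id": int(row["Experiment_id"]),
        # Start_time is logged in microseconds; convert to seconds.
        "start_time_s": float(row["Start_time"]) / 1_000_000,
        "valence": float(row["Valence"]),
        "arousal": float(row["Arousal"]),
        "dominance": float(row["Dominance"]),
        "liking": float(row["Liking"]),
        # A blank familiarity value (participants 2, 15 and 23) becomes None.
        "familiarity": int(float(familiarity)) if familiarity else None,
    }
```

Mapping blanks to None rather than 0 avoids silently mixing missing familiarity values into averages.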

Participant_questionnaire

This file contains the participants' responses to the questionnaire filled in before the experiment. The file is available in OpenOffice Calc (participant_questionnaire.ods), Microsoft Excel (participant_questionnaire.xls), and Comma-separated values (participant_questionnaire.csv) formats.

Most questions in the questionnaire were multiple-choice and are largely self-explanatory. Participant 26 unfortunately did not fill in the questionnaire. The file also contains the answers to the questions on the consent forms (can the data be used for research, can your imagery be published?).

Face_video.zip

Face_video.zip contains the frontal face videos recorded in the experiment for the first 22 participants, segmented into trials. In the zip file, sXX/sXX_trial_YY.avi corresponds to the video for trial YY of subject XX.

For participants 3, 5, 11 and 14, one or several of the last trials are missing due to technical issues (i.e. the tape ran out). Note that these videos are in the order of presentation, so the trial numbers do not correspond to the Experiment_id column in the video_list file. The mapping between trial numbers and Experiment_ids can be found in the participant_ratings file.
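Because the face videos are numbered by trial, finding the clip for a given music video means looking up the Trial and Experiment_id columns of participant_ratings. A sketch, assuming the rows are already parsed into (participant_id, trial, experiment_id) integer tuples and that XX and YY in the file names are zero-padded to two digits (the padding is an assumption; check the zip listing). The function name face_video_path is hypothetical.

```python
def face_video_path(ratings, participant_id, experiment_id):
    """Return the path inside Face_video.zip for one participant/video, or None.

    ratings: iterable of (participant_id, trial, experiment_id) tuples taken
    from participant_ratings. Two-digit zero padding is an assumption.
    """
    if participant_id > 22:
        return None  # face video was only recorded for participants 1-22
    for pid, trial, exp_id in ratings:
        if pid == participant_id and exp_id == experiment_id:
            return f"s{pid:02d}/s{pid:02d}_trial_{trial:02d}.avi"
    return None
```

For participants 3, 5, 11 and 14 the returned path may still be absent from the archive, since their last trials are missing.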

Videos were recorded from a tripod placed behind the screen in DV PAL format using a SONY DCR-HC27E camcorder. The videos were then segmented according to the trials and transcoded to a 50 fps deinterlaced video using the h264 codec. The transcoding was done using the mencoder software with the following command:

mencoder sXX.dv -ss trialYY_start_second -endpos 59.05 -nosound -of avi -ovc x264 \
  -fps 50 -vf yadif=1:1,hqdn3d -x264encopts bitrate=50:subq=5:8x8dct:frameref=2:bframes=3 \
  -noskip -ofps 50 -o sXX_trialYY.avi

The synchronisation of the video is accurate to approximately 1/25 second (barring human error). Synchronisation was achieved by displaying a red screen before and after the experiment at the same time as a marker sent to the EEG recording PC. The onset frame of this screen was then manually marked in the video recording. Individual trial starting times were then calculated from the trial starting markers in the EEG recording.
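The synchronisation procedure amounts to a simple offset calculation. The sketch below assumes that the EEG marker positions are sample indices at the 512 Hz recording rate and that the manually marked red-screen onset is a frame index in the 25 fps PAL DV source (PAL DV runs at 25 frames per second, consistent with the stated accuracy of about 1/25 second). The function name trial_start_in_video is hypothetical; its result corresponds to the trialYY_start_second placeholder in the mencoder command.

```python
EEG_RATE_HZ = 512  # sampling rate of the original .bdf recordings
DV_FPS = 25        # PAL DV frame rate (assumed to be the source timing)


def trial_start_in_video(sync_frame, sync_marker_sample, trial_marker_sample):
    """Seconds into the face video at which a trial starts.

    sync_frame: video frame index where the red synchronisation screen appears.
    sync_marker_sample: EEG sample index of the marker sent with that screen.
    trial_marker_sample: EEG sample index of the trial-start marker.
    """
    sync_time_video = sync_frame / DV_FPS
    offset = (trial_marker_sample - sync_marker_sample) / EEG_RATE_HZ
    return sync_time_video + offset
```

Anchoring on one marked frame and then using EEG timing for all later offsets means the frame-marking error (about one frame) is the dominant source of error, which matches the accuracy quoted above.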

Data_original.zip

These are the original data recordings. There are 32 .bdf files (BioSemi's data format generated by the ActiView recording software), each with 48 recorded channels at 512Hz (32 EEG channels, 12 peripheral channels, 3 unused channels and 1 status channel). The .bdf files can be read by a variety of software toolkits, including EEGLAB for Matlab and the BioSig toolkit.

The data was recorded in two separate locations: participants 1-22 were recorded in Twente and participants 23-32 in Geneva. Because of a different revision of the hardware, there are some minor differences in the format. First, the order of the EEG channels differs between the two locations. Second, the GSR measure is in a different format at each location.

The table below gives the EEG channel names (according to the 10/20 system) for both locations and the indices that can be used to convert one ordering to the other:

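Channel no. | Ch. name Twente | Ch. name Geneva | Geneva > Twente | Twente > Geneva
1 | Fp1 | Fp1 | 1 | 1
2 | AF3 | AF3 | 2 | 2
3 | F7 | F3 | 4 | 4
4 | F3 | F7 | 3 | 3
5 | FC1 | FC5 | 6 | 6
6 | FC5 | FC1 | 5 | 5
7 | T7 | C3 | 8 | 8
8 | C3 | T7 | 7 | 7
9 | CP1 | CP5 | 10 | 10
10 | CP5 | CP1 | 9 | 9
11 | P7 | P3 | 12 | 12
12 | P3 | P7 | 11 | 11
13 | Pz | PO3 | 16 | 14
14 | PO3 | O1 | 13 | 15
15 | O1 | Oz | 14 | 16
16 | Oz | Pz | 15 | 13
17 | O2 | Fp2 | 32 | 30
18 | PO4 | AF4 | 31 | 29
19 | P4 | Fz | 29 | 31
20 | P8 | F4 | 30 | 27
21 | CP6 | F8 | 27 | 28
22 | CP2 | FC6 | 28 | 25
23 | C4 | FC2 | 25 | 26
24 | T8 | Cz | 26 | 32
25 | FC6 | C4 | 22 | 23
26 | FC2 | T8 | 23 | 24
27 | F4 | CP6 | 20 | 21
28 | F8 | CP2 | 21 | 22
29 | AF4 | P4 | 18 | 19
30 | Fp2 | P8 | 17 | 20
31 | Fz | PO4 | 19 | 18
32 | Cz | O2 | 24 | 17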
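The two index columns can be used directly as gather indices. In this sketch, row i of the "Twente > Geneva" column gives the Twente channel number that holds Geneva channel i, so Geneva-ordered data is twente_data[index - 1] (and symmetrically for the other column). It assumes a NumPy array containing only the 32 EEG channels on axis 0; the constant and function names are hypothetical.

```python
import numpy as np

# Index columns from the table above (1-based, listed in channel-number order).
GENEVA_TO_TWENTE = [1, 2, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 16, 13, 14, 15,
                    32, 31, 29, 30, 27, 28, 25, 26, 22, 23, 20, 21, 18, 17, 19, 24]
TWENTE_TO_GENEVA = [1, 2, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 15, 16, 13,
                    30, 29, 31, 27, 28, 25, 26, 32, 23, 24, 21, 22, 19, 20, 18, 17]


def twente_to_geneva(eeg):
    """Reorder Twente-ordered EEG (32 channels on axis 0) into the Geneva order."""
    idx = np.asarray(TWENTE_TO_GENEVA) - 1  # convert to 0-based indices
    return eeg[idx]


def geneva_to_twente(eeg):
    """Reorder Geneva-ordered EEG (32 channels on axis 0) into the Twente order."""
    idx = np.asarray(GENEVA_TO_TWENTE) - 1  # convert to 0-based indices
    return eeg[idx]
```

When working with all 48 channels of a .bdf file, slice out the first 32 rows before reordering; the remaining channels share the same numbering at both locations.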

The remaining channel numbering is the same for both locations. However, note that the GSR measurement is in different units for the two locations: the Twente GSR measurement is skin conductance in nanosiemens (nS), whereas the Geneva GSR measurement is skin resistance in ohms. The conversion is given by:

GSR_{Geneva} = 10^{9} / GSR_{Twente}
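Since resistance is the reciprocal of conductance and 1 nS = 10^-9 S, a Twente reading of G nS corresponds to 10^9 / G ohms. A sketch of the conversion with NumPy; the function name is hypothetical, and a zero reading maps to infinity rather than raising an error.

```python
import numpy as np


def twente_gsr_to_geneva(gsr_twente_nS):
    """Convert Twente GSR (nanosiemens) to the Geneva format (ohms).

    Implements GSR_Geneva = 1e9 / GSR_Twente; zero conductance maps to inf.
    """
    gsr = np.asarray(gsr_twente_nS, dtype=float)
    with np.errstate(divide="ignore"):  # silence the warning for zero readings
        return 1e9 / gsr
```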

The following table gives the meaning of the remaining channels:

Channel number | Channel name | Channel content
33 | EXG1 | hEOG1 (to the left of left eye)
34 | EXG2 | hEOG2 (to the right of right eye)
35 | EXG3 | vEOG1 (above right eye)
36 | EXG4 | vEOG2 (below right eye)
37 | EXG5 | zEMG1 (Zygomaticus Major, +/- 1cm from left corner of mouth)
38 | EXG6 | zEMG2 (Zygomaticus Major, +/- 1cm from zEMG1)
39 | EXG7 | tEMG1 (Trapezius, left shoulder blade)
40 | EXG8 | tEMG2 (Trapezius, +/- 1cm below tEMG1)
41 | GSR1 | Galvanic skin response, left middle and ring finger
42 | GSR2 | Unused
43 | Erg1 | Unused
44 | Erg2 | Unused
45 | Resp | Respiration belt
46 | Plet | Plethysmograph, left thumb
47 | Temp | Temperature, left pinky
48 | Status | Status channel containing markers
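The preprocessed files store each EOG/EMG electrode pair as a single bipolar difference (for example hEOG1 - hEOG2). A sketch of the same derivation on the original data, assuming a NumPy array of shape (48, n_samples) with channels in the numbering above; the names BIPOLAR_PAIRS and bipolar_derivations are hypothetical.

```python
import numpy as np

# 1-based channel numbers of each electrode pair in the original .bdf files.
BIPOLAR_PAIRS = {
    "hEOG": (33, 34),
    "vEOG": (35, 36),
    "zEMG": (37, 38),
    "tEMG": (39, 40),
}


def bipolar_derivations(recording):
    """Return {name: first - second} for each pair; recording has shape (48, n_samples)."""
    return {
        name: recording[a - 1] - recording[b - 1]
        for name, (a, b) in BIPOLAR_PAIRS.items()
    }
```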

The status channel contains markers sent from the stimulus presentation PC, indicating when trials start and end. The following status markers were used:

Status code | Event duration | Event description
1 (first occurrence) | N/A | Start of experiment (participant pressed key to start)
1 (second occurrence) | 120000 ms | Start of baseline recording
1 (further occurrences) | N/A | Start of a rating screen
2 | 1000 ms | Video synchronisation screen (before first trial, before and after break, after last trial)
3 | 5000 ms | Fixation screen before beginning of trial
4 | 60000 ms | Start of music video playback
5 | 3000 ms | Fixation screen after music video playback
7 | N/A | End of experiment
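A sketch of turning the status channel into a list of labelled events. It assumes the status channel has already been reduced to the marker code, with 0 between markers. Raw BioSemi status words also carry flag bits in their upper bits, so how to mask them is an assumption to verify for your reader. An event is counted wherever the value changes to a non-zero code, so a marker held for its whole duration is reported once. The names marker_events and CODE_NAMES are hypothetical.

```python
import numpy as np

SAMPLE_RATE_HZ = 512  # sampling rate of the original recordings

CODE_NAMES = {
    2: "video synchronisation screen",
    3: "fixation before trial",
    4: "music video playback",
    5: "fixation after trial",
    7: "end of experiment",
}


def marker_events(status, rate=SAMPLE_RATE_HZ):
    """Return (time_s, code, description) for each marker onset in a decoded status channel."""
    status = np.asarray(status).astype(np.int64)
    previous = np.concatenate(([0], status[:-1]))
    onsets = np.flatnonzero((status != 0) & (status != previous))
    events = []
    ones_seen = 0
    for sample in onsets:
        code = int(status[sample])
        if code == 1:
            # Code 1 means different things depending on how often it has occurred.
            ones_seen += 1
            if ones_seen == 1:
                description = "start of experiment"
            elif ones_seen == 2:
                description = "start of baseline recording"
            else:
                description = "start of rating screen"
        else:
            description = CODE_NAMES.get(code, "unknown")
        events.append((float(sample) / rate, code, description))
    return events
```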

Data_preprocessed_matlab.zip and Data_preprocessed_python.zip

These files contain a downsampled (to 128Hz), preprocessed and segmented version of the data in Matlab (data_preprocessed_matlab.zip) and pickled python/numpy (data_preprocessed_python.zip) formats. This version of the data is well suited to those wishing to quickly test a classification or regression technique without the hassle of processing all the data first. Each zip file contains 32 .dat (python) or .mat (matlab) files, one per participant. The files were pickled with Python 2 (cPickle); in Python 3 they can be loaded with the standard pickle module, passing encoding='latin1' so that the numpy arrays decode correctly:

import pickle
with open('s01.dat', 'rb') as f:
    # latin1 lets Python 3 unpickle the numpy arrays written by Python 2
    x = pickle.load(f, encoding='latin1')

Each participant file contains two arrays:

Array name | Array shape | Array contents
data | 40 x 40 x 8064 | video/trial x channel x data
labels | 40 x 4 | video/trial x label (valence, arousal, dominance, liking)
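A sketch of common indexing on a loaded participant file. It assumes the unpickled object is a dict with keys 'data' and 'labels' (matching the array names above; check with x.keys()), and uses a zero-filled stand-in with the documented shapes instead of a real file. Splitting ratings at 5 into high/low classes is a common convention in work on this kind of data and is an assumption here; the helper names are hypothetical.

```python
import numpy as np

LABEL_NAMES = ("valence", "arousal", "dominance", "liking")


def describe_participant(x):
    """Split one participant file into EEG, peripheral signals and named labels."""
    data, labels = x["data"], x["labels"]
    eeg = data[:, :32, :]          # channels 1-32: EEG
    peripheral = data[:, 32:, :]   # channels 33-40: EOG, EMG, GSR, etc.
    ratings = {name: labels[:, i] for i, name in enumerate(LABEL_NAMES)}
    return eeg, peripheral, ratings


def binarize(ratings, threshold=5.0):
    """High (1) / low (0) classes; the threshold of 5 is an assumed convention."""
    return (np.asarray(ratings) > threshold).astype(int)


# Stand-in for a loaded file, with the documented array shapes.
rng = np.random.default_rng(0)
x = {"data": np.zeros((40, 40, 8064), dtype=np.float32),
     "labels": rng.uniform(1, 9, size=(40, 4))}
eeg, peripheral, ratings = describe_participant(x)
high_valence = binarize(ratings["valence"])
```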

The videos are in the order of Experiment_id, so not in the order of presentation. This means the first video is the same for each participant. The following table shows the channel layout and the preprocessing performed:
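Recovering the presentation order for one participant therefore needs that participant's Trial/Experiment_id pairs from participant_ratings. A sketch, assuming Experiment_id values run from 1 to 40 and that row k of the arrays holds Experiment_id k + 1 (consistent with, but not spelled out by, the text above). The function name to_presentation_order is hypothetical.

```python
import numpy as np


def to_presentation_order(arr, trial_to_experiment):
    """Reorder an array whose first axis is in Experiment_id order (1..40).

    trial_to_experiment: dict mapping Trial number -> Experiment_id for one participant.
    """
    trials = sorted(trial_to_experiment)
    idx = np.array([trial_to_experiment[t] - 1 for t in trials])  # 0-based rows
    return arr[idx]
```

The same call works for both data and labels, since both have trials on their first axis.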

Channel no. | Channel content
1 | Fp1
2 | AF3
3 | F3
4 | F7
5 | FC5
6 | FC1
7 | C3
8 | T7
9 | CP5
10 | CP1
11 | P3
12 | P7
13 | PO3
14 | O1
15 | Oz
16 | Pz
17 | Fp2
18 | AF4
19 | Fz
20 | F4
21 | F8
22 | FC6
23 | FC2
24 | Cz
25 | C4
26 | T8
27 | CP6
28 | CP2
29 | P4
30 | P8
31 | PO4
32 | O2
33 | hEOG (horizontal EOG, hEOG1 - hEOG2)
34 | vEOG (vertical EOG, vEOG1 - vEOG2)
35 | zEMG (Zygomaticus Major EMG, zEMG1 - zEMG2)
36 | tEMG (Trapezius EMG, tEMG1 - tEMG2)
37 | GSR (values from Twente converted to Geneva format (Ohm))
38 | Respiration belt
39 | Plethysmograph
40 | Temperature

Preprocessing applied to channels 1-32 (EEG):
  1. The data was downsampled to 128Hz.
  2. EOG artefacts were removed as in [1].
  3. A bandpass frequency filter from 4.0-45.0Hz was applied.
  4. The data was averaged to the common reference.
  5. The EEG channels were reordered so that they all follow the Geneva order as above.
  6. The data was segmented into 60 second trials and a 3 second pre-trial baseline removed.
  7. The trials were reordered from presentation order to video (Experiment_id) order.

Preprocessing applied to channels 33-40:
  1. The data was downsampled to 128Hz.
  2. The data was segmented into 60 second trials and a 3 second pre-trial baseline removed.
  3. The trials were reordered from presentation order to video (Experiment_id) order.
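Each trial array has 8064 samples, which at 128Hz is 63 seconds: the 60-second trial plus 3 seconds. Our reading is that the first 3 seconds (384 samples) are the pre-trial period and the remaining 7680 samples cover the video; treat that interpretation as an assumption and verify it before relying on it. The sketch below splits the two parts and applies one simple baseline correction (subtracting each channel's mean over the first 3 seconds). This correction technique is chosen here for illustration and is not necessarily what the authors did. The names split_baseline and baseline_corrected are hypothetical.

```python
import numpy as np

FS = 128                   # preprocessed sampling rate (Hz)
BASELINE_SAMPLES = 3 * FS  # 384 samples, assumed to come first
TRIAL_SAMPLES = 60 * FS    # 7680 samples


def split_baseline(data):
    """Split (trials, channels, 8064) data into baseline and trial parts."""
    if data.shape[-1] != BASELINE_SAMPLES + TRIAL_SAMPLES:
        raise ValueError("expected 8064 samples per trial")
    return data[..., :BASELINE_SAMPLES], data[..., BASELINE_SAMPLES:]


def baseline_corrected(data):
    """Subtract each channel's mean over the (assumed) 3 s baseline from the trial part."""
    baseline, trial = split_baseline(data)
    return trial - baseline.mean(axis=-1, keepdims=True)
```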

References

  1. "DEAP: A Database for Emotion Analysis using Physiological Signals", S. Koelstra, C. Muehl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, I. Patras, IEEE Transactions on Affective Computing, under review.
  2. "What are emotions? And how can they be measured?", K.R. Scherer, Social Science Information, vol. 44, no. 4, pp. 695-729, 2005.