Dataset Summary
The DEAP dataset consists of two parts:
- The ratings from an online self-assessment where 120 one-minute extracts of music videos were each rated by 14-16 volunteers based on arousal, valence and dominance.
- The participant ratings, physiological recordings and face video of an experiment where 32 volunteers watched a subset of 40 of the above music videos. EEG and physiological signals were recorded and each participant also rated the videos as above. For 22 participants frontal face video was also recorded.
For a more thorough explanation of the dataset collection and its contents, see [1].
File Listing
The following files are available (each explained in more detail below):
File name | Format | Part | Contents |
---|---|---|---|
Online_ratings | xls, csv, ods spreadsheet | Online self-assessment | All individual ratings from the online self-assessment. |
Video_list | xls, csv, ods spreadsheet | Both parts | Names and YouTube links of the music videos used in the online self-assessment and the experiment, together with statistics of the individual ratings from the online self-assessment. |
Participant_ratings | xls, csv, ods spreadsheet | Experiment | All ratings participants gave to the videos during the experiment. |
Participant_questionnaire | xls, csv, ods spreadsheet | Experiment | The answers participants gave to the questionnaire before the experiment. |
Face_video | Zip file | Experiment | The frontal face video recordings from the experiment for participants 1-22. |
Data_original | Zip file | Experiment | The original unprocessed physiological data recordings from the experiment in BioSemi .bdf format. |
Data_preprocessed | Zip file for Python and Matlab | Experiment | The preprocessed (downsampled, EOG-removed, filtered, segmented etc.) physiological data recordings from the experiment in Matlab and Python (numpy) formats. |
File details
Online_ratings
This file contains all the individual video ratings collected during the online self-assessment. The file is available in Open-Office Calc (online_ratings.ods), Microsoft Excel (online_ratings.xls), and Comma-separated values (online_ratings.csv) formats.
The ratings were collected using an online self-assessment tool as described in [1]. Participants rated arousal, valence and dominance using SAM mannequins on a discrete 9-point scale. In addition, participants also rated the felt emotion using an emotion wheel (see [2]).
The table in the file has one row per individual rating and the following columns:
Column name | Description |
---|---|
Online_id | The video id corresponding to the same column in the video_list file. |
Valence | The valence rating (integer between 1 and 9). |
Arousal | The arousal rating (integer between 1 and 9). |
Dominance | The dominance rating (integer between 1 and 9). |
Wheel_slice | The slice selected on the emotion wheel (see [2]), recorded as an integer. For some participants the emotion wheel rating was not properly recorded; in these cases the Wheel_slice value is 0. |
Wheel_strength | The strength selected on the emotion wheel (integer between 0=weak and 4=strong). |
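The per-video statistics precomputed in the video_list file can be recomputed from this table. A minimal sketch using pandas, with a small stand-in table in place of the real CSV (the column names are those documented above):

```python
import pandas as pd

# In practice: ratings = pd.read_csv("online_ratings.csv")
# A tiny stand-in with the documented columns:
ratings = pd.DataFrame({
    "Online_id": [1, 1, 1, 2, 2],
    "Valence":   [7, 8, 9, 2, 4],
    "Arousal":   [6, 6, 7, 3, 3],
    "Dominance": [5, 5, 5, 4, 4],
})

# Per-video statistics of the kind precomputed in the video_list file:
stats = ratings.groupby("Online_id").agg(
    AVG_Valence=("Valence", "mean"),
    AVG_Arousal=("Arousal", "mean"),
    AVG_Dominance=("Dominance", "mean"),
    Num_ratings=("Valence", "size"),
)
```

The same groupby can be extended with `"std"` and `"quantile"` aggregations to reproduce the STD and quartile columns.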
Video_list
This file lists all the videos used in the online self-assessment and in the experiment in a table. The file is available in Open-Office Calc (video_list.ods), Microsoft Excel (video_list.xls), and Comma-separated values (video_list.csv) formats.
The table has one row per video and the following columns:
Column name | Description |
---|---|
Online_id | The unique id used in the online self-assessment. |
Experiment_id | If this video was selected for the experiment, this lists the unique id used in the experiment. Blank if not selected. |
Lastfm_tag | If this video was selected via last.fm affective tags, this lists the affective tag. Blank otherwise. |
Artist | The artist that recorded the song. |
Title | Title of the song. |
Youtube_link | The original YouTube link from which the video was downloaded. Note that due to copyright restrictions we are unable to provide the videos we used, and these links may have been removed or may be unavailable in your country. |
Highlight_start | The time in seconds where the extracted one-minute highlight begins as determined by MCA analysis. For some videos, the highlight was manually overridden (for instance when a section of the song is particularly well-known). |
Num_ratings | The number of volunteers who rated this video in the online self-assessment. |
VAQ_Estimate | The valence/arousal quadrant this video was selected for by the experimenters, i.e. one of the four combinations of high/low arousal and high/low valence. For each quadrant, 15 videos were selected via last.fm affective tags and 15 by manual selection. |
VAQ_Online | The valence/arousal quadrant as determined by the average ratings of the volunteers in the online self-assessment. Note that these can and sometimes do differ from the estimated quadrants. |
AVG_x, STD_x, Q1_x, Q2_x, Q3_x | Average, standard deviation and first, second and third quartile of ratings x (Valence/Arousal/Dominance) by volunteers in the online self-assessment. |
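The VAQ columns can be recomputed from the AVG columns. A sketch, assuming quadrant labels of the form "HAHV" (high arousal, high valence) and thresholding at the 9-point scale midpoint; the exact label strings and threshold used in the file may differ:

```python
def vaq(avg_valence, avg_arousal):
    """Label the valence/arousal quadrant of a video from its average
    online ratings, thresholding at the assumed 9-point scale midpoint."""
    a = "HA" if avg_arousal > 5 else "LA"   # high/low arousal
    v = "HV" if avg_valence > 5 else "LV"   # high/low valence
    return a + v
```

For example, a video with AVG_Valence 7.2 and AVG_Arousal 6.1 falls in the HAHV quadrant.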
Participant_ratings
This file contains all the participant video ratings collected during the experiment. The file is available in Open-Office Calc (participant_ratings.ods), Microsoft Excel (participant_ratings.xls), and Comma-separated values (participant_ratings.csv) formats.
The start_time values were logged by the presentation software. Valence, arousal, dominance and liking were rated directly after each trial on a continuous 9-point scale using a standard mouse. SAM Mannequins were used to visualize the ratings for valence, arousal and dominance. For liking (i.e. how much did you like the video?), thumbs up and thumbs down icons were used. Familiarity was rated after the end of the experiment on a 5-point integer scale (from "never heard it before" to "listen to it regularly"). Familiarity ratings are unfortunately missing for participants 2, 15 and 23.
The table in the file has one row per participant video rating and the following columns:
Column name | Column contents |
---|---|
Participant_id | The unique id of the participant (1-32). |
Trial | The trial number (i.e. the presentation order). |
Experiment_id | The video id corresponding to the same column in the video_list file. |
Start_time | The starting time of the trial video playback in microseconds (relative to start of experiment). |
Valence | The valence rating (float between 1 and 9). |
Arousal | The arousal rating (float between 1 and 9). |
Dominance | The dominance rating (float between 1 and 9). |
Liking | The liking rating (float between 1 and 9). |
Familiarity | The familiarity rating (integer between 1 and 5). Blank if missing. |
Participant_questionnaire
This file contains the participants' responses to the questionnaire filled in before the experiment. The file is available in Open-Office Calc (participant_questionnaire.ods), Microsoft Excel (participant_questionnaire.xls), and Comma-separated values (participant_questionnaire.csv) formats.
Most questions in the questionnaire were multiple-choice and self-explanatory. Participant 26 unfortunately did not fill in the questionnaire. The file also contains the answers to the consent-form questions (whether the data may be used for research and whether the participant's imagery may be published).
Face_video.zip
Face_video.zip contains the frontal face videos recorded in the experiment for the first 22 participants, segmented into trials. In the zip file, sXX/sXX_trial_YY.avi corresponds to the video for trial YY of subject XX.
For participants 3, 5, 11 and 14, one or several of the last trials are missing due to technical issues (i.e. the tape ran out). Please note that these videos are in the order of presentation, so the trial numbers do not correspond to the Experiment_id columns in the video_list file. The mapping between trial numbers and Experiment_ids can be found in the participant_ratings file.
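The trial-to-video mapping can be extracted from the participant_ratings file directly. A sketch with pandas, using the documented column names and a small stand-in table in place of the real CSV:

```python
import pandas as pd

# In practice: ratings = pd.read_csv("participant_ratings.csv")
# A stand-in with the documented columns:
ratings = pd.DataFrame({
    "Participant_id": [1, 1, 1],
    "Trial":          [1, 2, 3],
    "Experiment_id":  [17, 3, 25],
})

def trial_to_experiment_id(ratings, participant):
    """Return {trial number: Experiment_id} for one participant, so that
    face-video trial numbers can be matched to videos."""
    rows = ratings[ratings["Participant_id"] == participant]
    return dict(zip(rows["Trial"], rows["Experiment_id"]))
```

With this mapping, sXX_trial_YY.avi can be matched to its video's row in the video_list file.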
Videos were recorded from a tripod placed behind the screen in DV PAL format using a SONY DCR-HC27E camcorder. The videos were then segmented according to the trials and transcoded to a 50 fps deinterlaced video using the h264 codec. The transcoding was done using the mencoder software with the following command:
```shell
mencoder sXX.dv -ss trialYY_start_second -endpos 59.05 -nosound -of avi \
  -ovc x264 -fps 50 -vf yadif=1:1,hqdn3d \
  -x264encopts bitrate=50:subq=5:8x8dct:frameref=2:bframes=3 \
  -noskip -ofps 50 -o sXX_trialYY.avi
```
The synchronisation of the video is accurate to approximately 1/25 second (barring human error). Synchronisation was achieved by displaying a red screen before and after the experiment at the same time as a marker sent to the EEG recording PC. The onset frame of this screen was then manually marked in the video recording. Individual trial starting times were then calculated from the trial starting markers in the EEG recording.
Data_original.zip
These are the original data recordings: 32 .bdf files (BioSemi's data format, generated by the ActiView recording software), each containing 48 channels recorded at 512 Hz (32 EEG channels, 12 peripheral channels, 3 unused channels and 1 status channel). The .bdf files can be read by a variety of software toolkits, including EEGLAB for Matlab and the BioSig toolbox.
The data was recorded in two separate locations. Participants 1-22 were recorded in Twente and participants 23-32 in Geneva. Due to a different revision of the hardware, there are some minor differences in the format. First, the order of EEG channels is different for the two locations. Second, the GSR measure is in a different format for each location.
The table below gives the EEG channel names (according to the 10/20 system) for both locations and the indices that can be used to convert one ordering to the other:
Channel no. | Ch. name Twente | Ch. name Geneva | Geneva > Twente | Twente > Geneva |
---|---|---|---|---|
1 | Fp1 | Fp1 | 1 | 1 |
2 | AF3 | AF3 | 2 | 2 |
3 | F7 | F3 | 4 | 4 |
4 | F3 | F7 | 3 | 3 |
5 | FC1 | FC5 | 6 | 6 |
6 | FC5 | FC1 | 5 | 5 |
7 | T7 | C3 | 8 | 8 |
8 | C3 | T7 | 7 | 7 |
9 | CP1 | CP5 | 10 | 10 |
10 | CP5 | CP1 | 9 | 9 |
11 | P7 | P3 | 12 | 12 |
12 | P3 | P7 | 11 | 11 |
13 | Pz | PO3 | 16 | 14 |
14 | PO3 | O1 | 13 | 15 |
15 | O1 | Oz | 14 | 16 |
16 | Oz | Pz | 15 | 13 |
17 | O2 | Fp2 | 32 | 30 |
18 | PO4 | AF4 | 31 | 29 |
19 | P4 | Fz | 29 | 31 |
20 | P8 | F4 | 30 | 27 |
21 | CP6 | F8 | 27 | 28 |
22 | CP2 | FC6 | 28 | 25 |
23 | C4 | FC2 | 25 | 26 |
24 | T8 | Cz | 26 | 32 |
25 | FC6 | C4 | 22 | 23 |
26 | FC2 | T8 | 23 | 24 |
27 | F4 | CP6 | 20 | 21 |
28 | F8 | CP2 | 21 | 22 |
29 | AF4 | P4 | 18 | 19 |
30 | Fp2 | P8 | 17 | 20 |
31 | Fz | PO4 | 19 | 18 |
32 | Cz | O2 | 24 | 17 |
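The index columns can be applied directly, assuming the "Geneva > Twente" column gives, for each position in the Twente ordering, the 1-based index of that channel in the Geneva ordering (and the other column the inverse). A sketch in numpy:

```python
import numpy as np

# 1-based indices copied from the two index columns of the table above.
GENEVA_TO_TWENTE = [1, 2, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 16, 13, 14, 15,
                    32, 31, 29, 30, 27, 28, 25, 26, 22, 23, 20, 21, 18, 17,
                    19, 24]
TWENTE_TO_GENEVA = [1, 2, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 15, 16, 13,
                    30, 29, 31, 27, 28, 25, 26, 32, 23, 24, 21, 22, 19, 20,
                    18, 17]

def to_twente_order(eeg_geneva):
    """Reorder a (32, n_samples) array from Geneva to Twente channel order:
    row i of the result is Geneva channel GENEVA_TO_TWENTE[i]."""
    return eeg_geneva[np.asarray(GENEVA_TO_TWENTE) - 1]
```

The two index lists are inverse permutations of each other, so the reverse conversion is the same operation with TWENTE_TO_GENEVA.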
The remaining channel numbering is the same for both locations. However, please note that the GSR measurement is in different units for the two locations: the Twente GSR measurement is skin conductance in nano-Siemens, whereas the Geneva GSR measurement is skin resistance in Ohm. The conversion is given by:
GSR_Geneva = 10⁹ / GSR_Twente
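In code, assuming the GSR channel has already been extracted into a numpy array, the conversion is simply:

```python
import numpy as np

def twente_gsr_to_geneva(gsr_twente):
    """Convert a Twente GSR recording to the Geneva format (Ohm),
    per GSR_Geneva = 1e9 / GSR_Twente."""
    return 1e9 / np.asarray(gsr_twente, dtype=float)
```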
The following table gives the meaning of the remaining channels:
Channel number | Channel name | Channel content |
---|---|---|
33 | EXG1 | hEOG1 (to the left of left eye) |
34 | EXG2 | hEOG2 (to the right of right eye) |
35 | EXG3 | vEOG1 (above right eye) |
36 | EXG4 | vEOG2 (below right eye) |
37 | EXG5 | zEMG1 (Zygomaticus Major, +/- 1cm from left corner of mouth) |
38 | EXG6 | zEMG2 (Zygomaticus Major, +/- 1cm from zEMG1) |
39 | EXG7 | tEMG1 (Trapezius, left shoulder blade) |
40 | EXG8 | tEMG2 (Trapezius, +/- 1cm below tEMG1) |
41 | GSR1 | Galvanic skin response, left middle and ring finger |
42 | GSR2 | Unused |
43 | Erg1 | Unused |
44 | Erg2 | Unused |
45 | Resp | Respiration belt |
46 | Plet | Plethysmograph, left thumb |
47 | Temp | Temperature, left pinky |
48 | Status | Status channel containing markers |
The status channel contains markers sent from the stimulus presentation PC, indicating when trials start and end. The following status markers were employed:
Status code | Event duration | Event Description |
---|---|---|
1 (first occurrence) | N/A | Start of experiment (participant pressed key to start) |
1 (second occurrence) | 120000 ms | Start of baseline recording |
1 (further occurrences) | N/A | Start of a rating screen |
2 | 1000 ms | Video synchronization screen (before first trial, before and after break, after last trial) |
3 | 5000 ms | Fixation screen before beginning of trial |
4 | 60000 ms | Start of music video playback |
5 | 3000 ms | Fixation screen after music video playback |
7 | N/A | End of experiment |
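Once the status channel has been loaded into a 1-D integer array (e.g. via one of the toolkits mentioned above, which are assumed to have decoded the raw BioSemi status bits), marker onsets can be recovered by looking for changes in its value. A minimal numpy sketch:

```python
import numpy as np

def marker_onsets(status, code, fs=512):
    """Return onset times (seconds) of every occurrence of a status code.

    An onset is a sample whose code differs from the previous sample.
    """
    changed = np.flatnonzero(np.diff(status) != 0) + 1
    if status[0] != 0:                      # marker active at recording start
        changed = np.insert(changed, 0, 0)
    return changed[status[changed] == code] / fs

# e.g. marker_onsets(status, 4) gives the start times of video playback
```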
Data_preprocessed_matlab.zip and Data_preprocessed_python.zip
These files contain a downsampled (to 128 Hz), preprocessed and segmented version of the data in Matlab (data_preprocessed_matlab.zip) and pickled Python/NumPy (data_preprocessed_python.zip) formats. This version of the data is well suited to quickly testing a classification or regression technique without first processing all the data. Each zip file contains 32 .dat (Python) or .mat (Matlab) files, one per participant. Sample code to load a data file under Python 2:
```python
import cPickle

x = cPickle.load(open('s01.dat', 'rb'))
```
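Under Python 3 the cPickle module no longer exists, and these Python 2 pickles generally need an explicit encoding to load. A sketch; a freshly written stand-in file replaces the real s01.dat:

```python
import pickle
import numpy as np

# Stand-in for a real participant file (in practice, use "s01.dat" from
# data_preprocessed_python.zip); protocol 2 mimics a Python 2 pickle.
with open("s01_demo.dat", "wb") as f:
    pickle.dump({"data": np.zeros((40, 40, 8064), dtype=np.float16),
                 "labels": np.zeros((40, 4))}, f, protocol=2)

# Under Python 3, pass encoding='latin1' so Python 2 pickles that contain
# numpy arrays unpickle correctly:
with open("s01_demo.dat", "rb") as f:
    x = pickle.load(f, encoding="latin1")

data, labels = x["data"], x["labels"]
```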
Each participant file contains two arrays:
Array name | Array shape | Array contents |
---|---|---|
data | 40 x 40 x 8064 | video/trial x channel x data |
labels | 40 x 4 | video/trial x label (valence, arousal, dominance, liking) |
The videos are in the order of Experiment_id, so not in the order of presentation. This means the first video is the same for each participant. The following table shows the channel layout and the preprocessing performed:
Channel no. | Channel content | Preprocessing |
---|---|---|
1 | Fp1 | Channels 1-32 (EEG): data was downsampled to 128 Hz, EOG artefacts were removed, the signal was filtered, and the data was segmented into one trial per video (8064 samples = 63 s at 128 Hz); channels from both recording locations were reordered to the single ordering listed here. |
2 | AF3 | |
3 | F3 | |
4 | F7 | |
5 | FC5 | |
6 | FC1 | |
7 | C3 | |
8 | T7 | |
9 | CP5 | |
10 | CP1 | |
11 | P3 | |
12 | P7 | |
13 | PO3 | |
14 | O1 | |
15 | Oz | |
16 | Pz | |
17 | Fp2 | |
18 | AF4 | |
19 | Fz | |
20 | F4 | |
21 | F8 | |
22 | FC6 | |
23 | FC2 | |
24 | Cz | |
25 | C4 | |
26 | T8 | |
27 | CP6 | |
28 | CP2 | |
29 | P4 | |
30 | P8 | |
31 | PO4 | |
32 | O2 | |
33 | hEOG (horizontal EOG, hEOG1 - hEOG2) | Channels 33-40 (peripheral): data was downsampled to 128 Hz and segmented into the same trials as the EEG channels. |
34 | vEOG (vertical EOG, vEOG1 - vEOG2) | |
35 | zEMG (Zygomaticus Major EMG, zEMG1 - zEMG2) | |
36 | tEMG (Trapezius EMG, tEMG1 - tEMG2) | |
37 | GSR (values from Twente converted to Geneva format (Ohm)) | |
38 | Respiration belt | |
39 | Plethysmograph | |
40 | Temperature |
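Putting the layout above together, a common starting point is to slice out the EEG channels and binarize one of the ratings. A sketch with synthetic arrays standing in for a loaded participant file; the midpoint threshold of 5 is a common choice, not something prescribed by the dataset:

```python
import numpy as np

# Synthetic stand-ins with the documented shapes (in practice, the "data"
# and "labels" arrays from a loaded participant file).
rng = np.random.default_rng(0)
data = rng.standard_normal((40, 40, 8064))  # trial x channel x sample
labels = rng.uniform(1, 9, size=(40, 4))    # valence, arousal, dominance, liking

eeg = data[:, :32, :]     # channels 1-32: EEG
gsr = data[:, 36, :]      # channel 37: GSR (0-based index 36)
valence = labels[:, 0]

# Binary high/low valence target, split at the assumed midpoint of 5.
y = (valence > 5).astype(int)
```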
References
- "DEAP: A Database for Emotion Analysis using Physiological Signals", S. Koelstra, C. Mühl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, I. Patras, IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18-31, 2012.
- "What are emotions? And how can they be measured?", K. R. Scherer, Social Science Information, vol. 44, no. 4, pp. 695-729, 2005.