Datasets
This page provides publicly available benchmark datasets for testing and evaluating detection and tracking algorithms. The datasets are free for research and educational purposes only and can be used in scientific publications on condition that the requested citation acknowledgment is respected.
i-Lids bag and vehicle detection challenge
This is a dataset for event detection in CCTV footage and is a subset of the i-Lids dataset. The events of interest in the dataset are abandoned baggage (Task 1) and parked vehicle (Task 2). The description of the tasks can be found here. Please refer to the description of the i-Lids bag and vehicle detection challenge for the submission procedure.
Sample frames (AB: abandoned baggage scenario)
AVSS AB Easy | AVSS AB Medium | AVSS AB Hard
Sample frames (PV: parked vehicle scenarios)
AVSS PV Easy | AVSS PV Medium | AVSS PV Hard | AVSS PV Night
Data details
- Location of recording: various locations in the UK
- Number of sequences: 7
- Total number of images: 35000
- Format of images: 8-bit color MOV
- Image size: 720 x 576 pixels
- Video sampling rate: 25 Hz
Ground truth
To download the ground truth data click here
Requested citation acknowledgment
i-Lids dataset for AVSS 2007
Download
To download this dataset click here
Audiovisual people dataset
This is a dataset for uni-modal and multi-modal (audio and visual) people detection and tracking. The dataset consists of three sequences recorded in different scenarios with a video camera and two microphones. Two sequences (motinas_Room160 and motinas_Room105) are recorded in rooms with reverberations. The third sequence (motinas_Chamber) is recorded in a room with reduced reverberations.
Sensor details
- The camera is placed in the centre of a bar that supports two microphones
- Distance between the microphones: 95 cm
- Microphones: Beyerdynamic MCE 530 condenser microphones
- Camera: KOBI KF-31CD analog CCD surveillance camera
Sample frames
motinas_Room160 | motinas_Room105 | motinas_Chamber
Data details
- Location of recording: Department of Electronic Engineering - Queen Mary, University of London
- Number of sequences: 3
- Total number of images: 3271
- Format of images: 8-bit color AVI
- Image size: 360 x 288 pixels
- Video sampling rate: 25 Hz
- Audio sampling rate: 44.1 kHz
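For multi-modal processing it can help to know which audio samples correspond to a given video frame. The sketch below derives this from the two sampling rates listed above (44.1 kHz audio, 25 Hz video). It assumes, without confirmation from the dataset documentation, that the audio and video streams start at the same instant and that the nominal rates are exact; the function name and the zero-based frame indexing are illustrative choices, not part of the dataset.

```python
# Nominal rates taken from the data details of the audiovisual people dataset.
VIDEO_RATE_HZ = 25
AUDIO_RATE_HZ = 44_100

# 44100 / 25 divides exactly, so every frame spans a whole number of samples.
SAMPLES_PER_FRAME = AUDIO_RATE_HZ // VIDEO_RATE_HZ


def audio_window_for_frame(frame_index: int, first_frame: int = 0) -> range:
    """Return the range of audio sample indices covering one video frame.

    Assumption: audio sample 0 is aligned with the start of `first_frame`.
    """
    if frame_index < first_frame:
        raise ValueError("frame_index precedes first_frame")
    start = (frame_index - first_frame) * SAMPLES_PER_FRAME
    return range(start, start + SAMPLES_PER_FRAME)


window = audio_window_for_frame(2)
print(SAMPLES_PER_FRAME, window.start, window.stop)
```

If the sequences turn out to have an audio/video offset, the `first_frame` argument (or an additional sample offset) would need to be adjusted accordingly.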
Ground truth
The ground truth data are provided together with the sequences in the corresponding .zip file, as a list of XML files representing the positions of the objects in the field of view.
Requested citation acknowledgment
Courtesy of EPSRC funded MOTINAS project (EP/D033772/1)
Point of contact
Murtaza Taj, murtaza.taj[at]elec.qmul.ac.uk
Download
To download this dataset click here
Single face dataset
This is a dataset for single person/face visual detection and tracking. The dataset is composed of five sequences with different illumination conditions and resolutions. Three sequences (motinas_toni, motinas_toni_change_ill and motinas_nikola_dark) are shot with a hand-held camera (JVC GR-20EK). In motinas_toni the target moves under a constant bright illumination; in motinas_toni_change_ill the illumination changes from dark to bright; the sequence motinas_nikola_dark is constantly dark. Two sequences (motinas_emilio_webcam and motinas_emilio_webcam_turning) are shot with a webcam (Logitech Quickcam) under a fairly constant illumination.
Sample frames
motinas_toni | motinas_toni_change_ill | motinas_nikola_dark
motinas_emilio_webcam | motinas_emilio_webcam_turning
Sensor details
- video camera: JVC GR-20EK and Logitech Quickcam
Data details
- Location of recording: Department of Electronic Engineering - Queen Mary, University of London
- Number of sequences: 5
- Total number of images: 3018
- Format of images: DivX 6 compression
- Image size and sampling rate: 640 x 480 pixels, 25 Hz (motinas_toni, motinas_toni_change_ill, motinas_nikola_dark)
- Image size and sampling rate: 320 x 240 pixels, 10 Hz (motinas_emilio_webcam and motinas_emilio_webcam_turning)
Target initialization
The target initialization parameters (the parameters of an ellipse around the face) are provided in the .zip files together with the sequences.
Ground truth
The ground truth data is available in the .zip files for the sequences motinas_toni and motinas_emilio_webcam. In the ground truth files each line of text describes the objects' position and size in a frame. The syntax of a line is the following:
frame number_of_objects obj_1_name x y half_width half_height angle obj_2_name x y half_width half_height angle ...
Example: first line of a ground truth file in the single object case:
1 1 man 172 77 29 36 -5
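The line syntax above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: it assumes that fields are separated by whitespace, that object names contain no spaces, and that each object contributes exactly six fields (name, x, y, half_width, half_height, angle). The class and function names are hypothetical, and the unit of the angle and the reference point of x and y are not stated in the description, so the values are stored as given.

```python
from dataclasses import dataclass
from typing import List, Tuple

FIELDS_PER_OBJECT = 6  # name, x, y, half_width, half_height, angle


@dataclass
class EllipseAnnotation:
    """One annotated object; values are kept exactly as written in the file."""
    name: str
    x: float
    y: float
    half_width: float
    half_height: float
    angle: float  # unit not specified in the dataset description


def parse_ground_truth_line(line: str) -> Tuple[int, List[EllipseAnnotation]]:
    """Parse one ground truth line into (frame number, list of objects)."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(f"expected at least 2 fields, got: {line!r}")
    frame, count = int(tokens[0]), int(tokens[1])
    fields = tokens[2:]
    if len(fields) != FIELDS_PER_OBJECT * count:
        raise ValueError(f"expected {count} objects, got: {line!r}")
    objects = []
    for i in range(count):
        name, *values = fields[FIELDS_PER_OBJECT * i:FIELDS_PER_OBJECT * (i + 1)]
        objects.append(EllipseAnnotation(name, *(float(v) for v in values)))
    return frame, objects


# The example line quoted above (single object case).
frame, objects = parse_ground_truth_line("1 1 man 172 77 29 36 -5")
print(frame, objects)
```

Splitting on arbitrary whitespace means the parser also accepts the fields of one frame spread over tabs or multiple spaces; if a file turns out to place each object on its own line, the tokens would need to be gathered per frame before calling the function.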
Requested citation acknowledgment
E. Maggio, A. Cavallaro, "Hybrid particle filter and mean shift tracker with adaptive transition model", in Proc. of IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Philadelphia, 19-23 March 2005, pp. 221-224.
Point of contact
Emilio Maggio, emilio.maggio[at]elec.qmul.ac.uk
Download
To download this dataset click here
Multiple faces dataset
This is a dataset for multiple people/faces visual detection and tracking. The dataset is composed of 3 sequences (same scenario); 4 targets repeatedly occlude each other while appearing and disappearing from the field of view of the camera. The sequence motinas_multi_face_frontal shows frontal faces only; in motinas_multi_face_turning the faces are frontal and rotated; in motinas_multi_face_fast the targets move faster than in the previous two sequences.
Sample frames
motinas_multi_face_frontal | motinas_multi_face_turning | motinas_multi_face_fast
Sensor details
- video camera: JVC GR-20EK
Data details
- Location of recording: Department of Electronic Engineering - Queen Mary, University of London
- Number of sequences: 3
- Total number of images: 2769
- Format of images: DivX 6 compression
- Image size: 640 x 480 pixels
- Sampling rate: 25 Hz
Target initialization
The target initialization parameters (the parameters of an ellipse around the face) are provided in the .zip files together with the sequences.
Requested citation acknowledgment
E. Maggio, E. Piccardo, C. Regazzoni, A. Cavallaro, "Particle PHD filter for multi-target visual tracking", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), Honolulu (USA), April 15-20, 2007.
Point of contact
Emilio Maggio, emilio.maggio[at]elec.qmul.ac.uk
Download
To download this dataset click here
Other datasets
Name: OTCBVS Dataset
Description: Videos and images recorded in and beyond the visible spectrum (faces and people)
Download: http://www.cse.ohio-state.edu/otcbvs-bench/
Name: PETS 2001 Dataset
Description: Two-view monitoring of a campus site (people and vehicles)
Download: http://www.cvg.cs.rdg.ac.uk/cgi-bin/PETSMETRICS/page.cgi?dataset
Name: PETS 2006 Dataset
Description: Person and baggage detection in a train station
Download: http://www.cvg.rdg.ac.uk/PETS2006/data.html
Name: AMI Corpora
Description: Meeting room scenarios, with two people sitting around meeting tables
Download: http://corpus.amiproject.org/amicorpus/download/download
Annotation Tool
ViPER