AVSS 2007
2007 IEEE International Conference on
Advanced Video and Signal based Surveillance
London (United Kingdom)
5-7 September 2007






This page provides publicly available benchmark datasets for testing and evaluating detection and tracking algorithms. The datasets are free for research and educational purposes only. They can be used in scientific publications provided that the requested citation acknowledgment is included.


i-Lids bag and vehicle detection challenge 

This is a dataset for event detection in CCTV footage and is a subset of the i-Lids dataset. The events of interest in the dataset are abandoned baggage (Task 1) and parked vehicles (Task 2). The description of the tasks can be found here. Please refer to the description of the i-Lids bag and vehicle detection challenge for the submission procedure.

Sample frames (AB: abandoned baggage scenario)


AVSS AB Medium


Sample frames (PV: parked vehicle scenarios)


AVSS PV Medium



Data details
- Location of recording: various locations in the UK
- Number of sequences: 7
- Total number of images: 35000
- Format of images: 8-bit color MOV
- Image size: 720 x 576 pixels
- Video sampling rate: 25 Hz
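
As a rough sanity check on the figures above, the total amount of footage can be estimated from the image count and the sampling rate. The sketch below assumes that the 35000 images cover all 7 sequences and that the 25 Hz rate is constant; the function name is hypothetical and not part of the dataset.

```python
def footage_duration_seconds(total_frames: int, frame_rate_hz: float) -> float:
    """Return the playing time, in seconds, of total_frames at frame_rate_hz."""
    if frame_rate_hz <= 0:
        raise ValueError("frame_rate_hz must be positive")
    return total_frames / frame_rate_hz


# Figures taken from the data details above (assumed to cover all sequences).
duration = footage_duration_seconds(35000, 25.0)
print(f"{duration:.0f} s (about {duration / 60:.1f} min)")
```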

Ground truth
To download the ground truth data click here

Requested citation acknowledgment
i-Lids dataset for AVSS 2007

To download this dataset click here


Audiovisual people dataset 

This is a dataset for uni-modal and multi-modal (audio and visual) people detection and tracking. The dataset consists of three sequences recorded in different scenarios with a video camera and two microphones. Two sequences (motinas_Room160 and motinas_Room105) are recorded in rooms with reverberations. The third sequence (motinas_Chamber) is recorded in a room with reduced reverberations.

Sensor details
- The camera is placed in the centre of a bar that supports two microphones
- Distance between the microphones: 95 cm
- Microphones: Beyerdynamic MCE 530 condenser microphones
- Camera: KOBI KF-31CD analog CCD surveillance camera

Sample frames




Data details
- Location of recording: Department of Electronic Engineering - Queen Mary, University of London
- Number of sequences: 3
- Total number of images:
- Format of images: 8-bit color AVI
- Image size: 360 x 288 pixels
- Video sampling rate: 25 Hz
- Audio sampling rate: 44.1 kHz

Ground truth
The ground truth data are provided together with the sequences in the corresponding .zip file, as a list of XML files representing the positions of the objects in the field of view.

Requested citation acknowledgment
Courtesy of EPSRC funded MOTINAS project (EP/D033772/1)

Point of contact
Murtaza Taj, murtaza.taj[at]elec.qmul.ac.uk

To download this dataset click here


Single face dataset

This is a dataset for single person/face visual detection and tracking. The dataset is composed of five sequences with different illumination conditions and resolutions. Three sequences (motinas_toni, motinas_toni_change_ill and motinas_nikola_dark) are shot with a hand-held camera (JVC GR-20EK). In motinas_toni the target moves under constant bright illumination; in motinas_toni_change_ill the illumination changes from dark to bright; the sequence motinas_nikola_dark is constantly dark. Two sequences (motinas_emilio_webcam and motinas_emilio_webcam_turning) are shot with a webcam (Logitech Quickcam) under fairly constant illumination.

Sample frames









Sensor details
- video camera: JVC GR-20EK and Logitech Quickcam

Data details
- Location of recording: Department of Electronic Engineering - Queen Mary, University of London
- Number of sequences: 5
- Total number of images: 3018
- Format of images: DivX 6 compression
- Image size and sampling rate: 640 x 480 pixels, 25 Hz (motinas_toni, motinas_toni_change_ill and motinas_nikola_dark)
- Image size and sampling rate: 320 x 240 pixels, 10 Hz (motinas_emilio_webcam and motinas_emilio_webcam_turning)

Target initialization
The target initialization parameters (the parameters of an ellipse around the face) are provided in the .zip files together with the sequences.

Ground truth
The ground truth data is available in the .zip files for the sequences motinas_toni and motinas_emilio_webcam. In the ground truth files each line of text describes the objects' position and size in a frame. The syntax of a line is the following:
frame  number_of_objects  obj_1_name  x  y  half_width  half_height  angle  obj_2_name  x  y  half_width  half_height  angle ...
Example: first line of a ground truth file in the single object case:
1  1  man  172  77  29  36  -5
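
The line syntax above can be read with a few lines of Python. This is a minimal sketch, not code shipped with the dataset: it assumes fields are separated by whitespace, that object names contain no spaces, and that each object contributes exactly six fields. The names Ellipse and parse_ground_truth_line are hypothetical, and the unit of the angle field is not stated on this page, so the value is stored as given.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ellipse:
    """One annotated object: centre, half-axes and angle as given in the file."""
    name: str
    x: float
    y: float
    half_width: float
    half_height: float
    angle: float  # unit not stated in the dataset description


def parse_ground_truth_line(line: str) -> Tuple[int, List[Ellipse]]:
    """Parse 'frame number_of_objects name x y half_width half_height angle ...'."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError("line must start with frame and number_of_objects")
    frame, count = int(tokens[0]), int(tokens[1])
    fields = tokens[2:]
    if len(fields) != 6 * count:
        raise ValueError(f"expected {6 * count} object fields, got {len(fields)}")
    objects = []
    for i in range(count):
        # Each object occupies six consecutive fields: a name and five numbers.
        name, *values = fields[6 * i: 6 * i + 6]
        x, y, half_width, half_height, angle = (float(v) for v in values)
        objects.append(Ellipse(name, x, y, half_width, half_height, angle))
    return frame, objects


# The example line from the description above.
frame, objects = parse_ground_truth_line("1  1  man  172  77  29  36  -5")
print(frame, objects[0])
```

Checking the field count against number_of_objects means a truncated or malformed line raises an error instead of being silently misread.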

Requested citation acknowledgment
E. Maggio, A. Cavallaro, "Hybrid particle filter and mean shift tracker with adaptive transition model", in Proc. of IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Philadelphia, 19-23 March 2005, pp. 221 - 224.

Point of contact
Emilio Maggio, emilio.maggio[at]elec.qmul.ac.uk

To download this dataset click here


Multiple faces dataset

This is a dataset for multiple people/faces visual detection and tracking. The dataset is composed of 3 sequences (same scenario); 4 targets repeatedly occlude each other while appearing and disappearing from the field of view of the camera. The sequence motinas_multi_face_frontal shows frontal faces only; in motinas_multi_face_turning the faces are frontal and rotated; in motinas_multi_face_fast the targets move faster than in the previous two sequences.

Sample frames





Sensor details
- video camera: JVC GR-20EK

Data details
- Location of recording: Department of Electronic Engineering - Queen Mary, University of London
- Number of sequences: 3
- Total number of images:
- Format of images: DivX 6 compression
- Image size: 640 x 480 pixels
- Sampling rate: 25 Hz

Target initialization
The target initialization parameters (the parameters of an ellipse around the face) are provided in the .zip files together with the sequences.

Requested citation acknowledgment
E. Maggio, E. Piccardo, C. Regazzoni, A. Cavallaro, "Particle PHD filter for multi-target visual tracking", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), Honolulu (USA), April 15-20, 2007.

Point of contact
Emilio Maggio, emilio.maggio[at]elec.qmul.ac.uk

To download this dataset click here



Other datasets

Name: OTCBVS Dataset
Videos and images recorded in and beyond the visible spectrum (faces and people)
Download: http://www.cse.ohio-state.edu/otcbvs-bench/

Name: PETS 2001 Dataset
Two view-monitoring of a campus site (people and vehicles)
Download: http://www.cvg.cs.rdg.ac.uk/cgi-bin/PETSMETRICS/page.cgi?dataset

Name: PETS 2006 Dataset
Person and baggage detection in a train station
Download: http://www.cvg.rdg.ac.uk/PETS2006/data.html

Name: AMI Corpora
Meeting room scenarios, with two people sitting around meeting tables
Download: http://corpus.amiproject.org/amicorpus/download/download

Annotation tool: ViPER


Contact email: info@avss2007.org  
© AVSS 2007