Surveillance Performance EValuation Initiative (SPEVI)

SPEVI Datasets

This page provides publicly available benchmark datasets for testing and evaluating target tracking algorithms for surveillance-related applications. The datasets are free for research and educational purposes only and may be used in scientific publications on condition that the requested citation acknowledgment is respected.

Because the accuracy of target tracking algorithms is highly data dependent, they need to be evaluated with large test corpora containing significant statistical data variability. Current test sets used for target tracking evaluation are generally composed of a limited number of data items. This limitation is due to two main reasons: (i) the generation of ground-truth data is a highly time-consuming and tedious task and (ii) audiovisual data involving people and their property (e.g., vehicles) are not easily distributed due to privacy issues. To complement the existing datasets (links at the bottom of this page), this page distributes additional data and their associated ground truth.

If you want to contribute to this dataset, please contact us at info@spevi.org

New - PFT: A protocol for evaluating video trackers

Audiovisual people dataset

This is a dataset for uni-modal and multi-modal (audio and visual) people detection and tracking. The dataset consists of three sequences recorded in different scenarios with a video camera and two microphones. Two sequences (motinas_Room160 and motinas_Room105) are recorded in rooms with reverberations. The third sequence (motinas_Chamber) is recorded in a room with reduced reverberations.

Sensor details
- The camera is placed in the centre of a bar that supports two microphones
- Distance between the microphones: 95 cm
- Microphones: Beyerdynamic MCE 530 condenser microphones
- Camera: KOBI KF-31CD analog CCD surveillance camera

Sample frames

motinas_Room160	motinas_Room105	motinas_Chamber

Data details
- Location of recording: Department of Electronic Engineering - Queen Mary, University of London
- Number of sequences: 3
- Total number of images: 3271
- Format of images: 8-bit color AVI
- Image size: 360 x 288 pixels
- Video sampling rate: 25 Hz
- Audio sampling rate: 44.1 kHz
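
The figures above are enough to relate video frames to audio samples. The following is a minimal sketch, not part of the dataset tools: it assumes that the audio and video streams start at the same instant, that frames are counted from 0, and that sound travels at roughly 343 m/s (a typical room-temperature value that is not stated on this page). All function names are hypothetical.

```python
from typing import Tuple

# Values taken from the "Sensor details" and "Data details" lists.
VIDEO_RATE_HZ = 25
AUDIO_RATE_HZ = 44_100
TOTAL_FRAMES = 3271          # summed over the three sequences
MIC_DISTANCE_M = 0.95

# Assumption: approximate speed of sound in air, not given by the dataset.
SPEED_OF_SOUND_M_S = 343.0


def audio_span_for_frame(frame_index: int) -> Tuple[int, int]:
    """Return the [start, end) audio sample range covering a 0-based video frame.

    Assumes both streams start together; integer arithmetic avoids rounding drift.
    """
    start = frame_index * AUDIO_RATE_HZ // VIDEO_RATE_HZ
    end = (frame_index + 1) * AUDIO_RATE_HZ // VIDEO_RATE_HZ
    return start, end


def total_duration_s() -> float:
    """Total footage length in seconds across all three sequences."""
    return TOTAL_FRAMES / VIDEO_RATE_HZ


def max_inter_mic_delay_samples() -> float:
    """Upper bound on the arrival-time difference between the two microphones,
    expressed in audio samples (source on the line through both microphones)."""
    return MIC_DISTANCE_M / SPEED_OF_SOUND_M_S * AUDIO_RATE_HZ


if __name__ == "__main__":
    print("audio samples for frame 0:", audio_span_for_frame(0))
    print("total duration (s):", total_duration_s())
    print("max inter-microphone delay (samples):", max_inter_mic_delay_samples())
```

If the actual recordings have an offset between the audio and video tracks, the frame-to-sample mapping would need a corresponding shift.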

Ground truth
The ground truth data are provided together with the sequences in the corresponding .zip file, as a list of XML files representing the positions of the objects in the field of view.

Requested citation acknowledgment
Courtesy of EPSRC funded MOTINAS project (EP/D033772/1)

Point of contact
Murtaza Taj, murtaza.taj[at]elec.qmul.ac.uk

Download
To download this dataset click here

Single face dataset

This is a dataset for single person/face visual detection and tracking. The dataset is composed of five sequences with different illumination conditions and resolutions. Three sequences (motinas_toni, motinas_toni_change_ill and motinas_nikola_dark) are shot with a hand-held camera (JVC GR-20EK). In motinas_toni the target moves under a constant bright illumination; in motinas_toni_change_ill the illumination changes from dark to bright; the sequence motinas_nikola_dark is constantly dark. Two sequences (motinas_emilio_webcam and motinas_emilio_webcam_turning) are shot with a webcam (Logitech Quickcam) under a fairly constant illumination.

Sample frames


motinas_toni	motinas_toni_change_ill	motinas_nikola_dark

motinas_emilio_webcam	motinas_emilio_webcam_turning

Sensor details
- video camera: JVC GR-20EK and Logitech Quickcam

Data details
- Location of recording: Department of Electronic Engineering - Queen Mary, University of London
- Number of sequences: 5
- Total number of images: 3018
- Format of images: DivX 6 compression
- Image size and sampling rate: 640 x 480 pixels, 25 Hz (motinas_toni, motinas_toni_change_ill, motinas_nikola_dark)
- Image size and sampling rate: 320 x 240 pixels, 10 Hz (motinas_emilio_webcam and motinas_emilio_webcam_turning)

Target initialization
The target initialization parameters (the parameters of an ellipse around the face) are provided in the .zip files together with the sequences.

Ground truth
The ground truth data is available in the .zip files for the sequences motinas_toni and motinas_emilio_webcam. In the ground truth files each line of text describes the objects' position and size in a frame. The syntax of a line is the following:
frame number_of_objects obj_1_name x y half_width half_height angle obj_2_name x y half_width half_height angle ...
Example: first line of a ground truth file in the single object case: 1 1 man 172 77 29 36 -5
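
The line syntax above can be read with a few lines of code. The following is a minimal sketch, not an official parser: it assumes that fields are separated by whitespace, that object names contain no spaces, and that all values after the name are numeric. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Each object contributes: name, x, y, half_width, half_height, angle.
FIELDS_PER_OBJECT = 6


@dataclass
class TargetState:
    """One object entry from a ground truth line (hypothetical container)."""
    name: str
    x: float
    y: float
    half_width: float
    half_height: float
    angle: float


def parse_ground_truth_line(line: str) -> Tuple[int, List[TargetState]]:
    """Parse 'frame number_of_objects obj_1_name x y half_width half_height angle ...'."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(f"line too short: {line!r}")
    frame = int(tokens[0])
    count = int(tokens[1])
    payload = tokens[2:]
    if len(payload) != count * FIELDS_PER_OBJECT:
        raise ValueError(
            f"expected {count * FIELDS_PER_OBJECT} object fields, got {len(payload)}"
        )
    targets = []
    for i in range(count):
        chunk = payload[i * FIELDS_PER_OBJECT:(i + 1) * FIELDS_PER_OBJECT]
        name = chunk[0]
        x, y, half_width, half_height, angle = (float(v) for v in chunk[1:])
        targets.append(TargetState(name, x, y, half_width, half_height, angle))
    return frame, targets


if __name__ == "__main__":
    # The example line given above for the single object case.
    frame, targets = parse_ground_truth_line("1 1 man 172 77 29 36 -5")
    print(frame, targets)
```

The count check rejects truncated lines early, so a malformed file fails at the offending line instead of producing shifted fields.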

Requested citation acknowledgment
E. Maggio, A. Cavallaro, "Hybrid particle filter and mean shift tracker with adaptive transition model", in Proc. of IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Philadelphia, 19-23 March 2005, pp. 221 - 224.

Point of contact
Emilio Maggio, emilio.maggio[at]elec.qmul.ac.uk

Download
To download this dataset click here

Multiple faces dataset

This is a dataset for multiple people/faces visual detection and tracking. The dataset is composed of 3 sequences (same scenario); 4 targets repeatedly occlude each other while appearing and disappearing from the field of view of the camera. The sequence motinas_multi_face_frontal shows frontal faces only; in motinas_multi_face_turning the faces are frontal and rotated; in motinas_multi_face_fast the targets move faster than in the previous two sequences.

Sample frames

motinas_multi_face_frontal	motinas_multi_face_turning	motinas_multi_face_fast

Sensor details
- video camera: JVC GR-20EK

Data details
- Location of recording: Department of Electronic Engineering - Queen Mary, University of London
- Number of sequences: 3
- Total number of images: 2769
- Format of images: DivX 6 compression
- Image size: 640 x 480 pixels
- Sampling rate: 25 Hz

Target initialization
The target initialization parameters (the parameters of an ellipse around the face) are provided in the .zip files together with the sequences.

Requested citation acknowledgment
E. Maggio, E. Piccardo, C. Regazzoni, A. Cavallaro. "Particle PHD filter for multi-target visual tracking", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), Honolulu (USA), April 15-20, 2007

Point of contact
Emilio Maggio, emilio.maggio[at]elec.qmul.ac.uk

Download
To download this dataset click here

"i-Lids (AVSS 2007)" bag and vehicle detection challenge

This is a dataset for event detection in CCTV footage and is a sub-set of the i-Lids dataset. The events of interest appearing in the dataset are abandoned baggage (Task 1) and parked vehicle (Task 2). The description of the tasks can be found here. Please refer to the description of the i-Lids bag and vehicle detection challenge for the submission procedure.

Data details
- Location of recording: various locations in the UK
- Number of sequences: 7
- Total number of images: 35000
- Format of images: 8-bit color MOV
- Image size: 720 x 576 pixels
- Video sampling rate: 25 Hz

Ground truth
To download the ground truth data click here

Requested citation acknowledgment
i-Lids dataset for AVSS 2007

Download
To download this dataset click here

The "i-Lids (AVSS 2007)" Evaluation dataset can be found here

Annotation Tool

ViPER

Other datasets

Name: cVSG Dataset
Description: The Chroma Video Segmentation Ground Truth (cVSG) is a corpus of video sequences and segmentation masks. Chroma based techniques were used to first acquire foregrounds and backgrounds separately and then combined to form video sequences. Sequences have been selected to ensure different complexities.
Download: http://www-vpu.ii.uam.es/CVSG/

Name: OTCBVS Dataset
Description: Videos and images recorded in and beyond the visible spectrum (faces and people)
Download: http://www.cse.ohio-state.edu/otcbvs-bench/

Name: PETS 2001 Dataset
Description: Two view-monitoring of a campus site (people and vehicles)
Download: http://www.cvg.cs.rdg.ac.uk/cgi-bin/PETSMETRICS/page.cgi?dataset

Name: PETS 2006 Dataset
Description: Person and baggage detection in a train station
Download: http://www.cvg.rdg.ac.uk/PETS2006/data.html

Name: VIVID PETS 2005 Dataset
Description: Aerial footage (vehicles)
Download: http://www.vividevaluation.ri.cmu.edu/datasets/datasets.html

Name: AMI Corpora
Description: Meeting room scenarios, with two people sitting around meeting tables
Download: http://corpus.amiproject.org/amicorpus/download/download

Name: PETS 2000
Description: Outdoor people and vehicle tracking (single camera)
Download: ftp://ftp.pets.rdg.ac.uk/pub/PETS2000/

Name: PETS 2002
Description: Moving People
Download: http://www.cvg.cs.rdg.ac.uk/PETS2002/pets2002-db.html

Name: VS - PETS 2003 – INMOVE
Description: Outdoor people tracking - football data (three synchronised views)
Download: http://www.cvg.cs.rdg.ac.uk/VSPETS/vspets-db.html

Name: PETS - ECCV 2004 – CAVIAR
Description: A number of video clips were recorded acting out the different scenarios of interest. These include people walking alone, meeting with others, window shopping, fighting, passing out and abandoning luggage
Download: http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/

data.html

Name: VISOR Dataset
Description: Multicamera outdoors and indoors scenarios
Download: http://imagelab.ing.unimore.it/visor/video_categories.asp

Name: NGSIM
Description: detailed vehicle trajectory data on parts of highways
Download: http://ngsim.fhwa.dot.gov/modules.php?op=modload&name=News&file=article&sid=4

Name: ETISEO Dataset
Description: Database of 8-view video sequences representative of various complexity levels
Download: http://www-sop.inria.fr/orion/ETISEO/download.htm

Name: IBM
Description: 4 outdoor clips (from PETS2001) of people and vehicles and 11 indoor clips of people.
Download: http://domino.research.ibm.com/comm/research_projects.nsf/pages/s3.performanceevaluation.html

Name: CANDELA
Description: "Indoor abandoned object" and "road intersection"
Download: http://www.multitel.be/~va/candela/

Name: Traffic datasets
Description: Traffic databases
Download: http://i21www.ira.uka.de/image_sequences/

Name: WAMOP-PETS'2005
Description: Scenes on water.
Download: http://www.vast.uccs.edu/~tboult/PETS05/

Name: DaimlerChrysler Pedestrian Classification Benchmark Dataset
Description: Collection of pedestrian and non-pedestrian images.
Download: http://www.gavrila.net/Computer_Vision/Research/Pedestrian_Detection/DC_Pedestrian_Class__Benchmark/dc_pedestrian_class__benchmark.html

Name: PETS-ICVS'2003 – Fgnet
Description: Smart meeting, that includes facial expressions, gaze and gesture/action.
Download: http://www.cvg.cs.rdg.ac.uk/PETS-ICVS/pets-icvs-db.html

Name: ViHASi dataset
Description: Virtual Human Action Silhouette Data
Download: http://dipersec.king.ac.uk/VIHASI/

Name: MuHAVi dataset
Description: Multicamera Human Action Video Data
Download: http://dipersec.king.ac.uk/MuHAVi-MAS/

Name: PLIA2
Description: a set of common household activities performed during a four-hour period following a set of instructions
Download: http://architecture.mit.edu/house_n/data/PlaceLab/PLIA2.htm

Name: KTH data set
Description: six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in four different scenarios
Download: http://www.nada.kth.se/cvap/actions/

Name: Weizmann dataset
Description: actions such as walk, run, jump, gallop sideways, bend, one-hand wave, two-hands wave, jump in place, jumping jack, skip
Download: http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html

Name: FRGC dataset
Description: The FRGC database is a collection of Biometric images with both 2D and 3D information. It has 50,000 recordings divided into training and validation partitions
Download: http://face.nist.gov/frgc/

Name: 3D_RMA
Description: This database has been acquired in the framework of the M2VTS project and contains 6 3D scans of 120 individuals
Download: http://www.sic.rma.ac.be/~beumier/DB/3d_rma.html

Name: GavabDB: face database
Description: GavabDB is a 3D face database. It contains 549 three-dimensional images of facial surfaces. These meshes correspond to 61 different individuals (45 male and 16 female) having 9 images for each person.
Download: http://gavab.escet.urjc.es/recursos_en.html

Name: IPPR Contest motion segmentation dataset
Description:
Download: http://media.ee.ntu.edu.tw/Archer_contest/

Contact email: info@spevi.org

Name: PETS 2007 Benchmark Data
Description: Multi-sensor sequences containing scenarios like: loitering; attended luggage removal (theft); unattended luggage with increasing scene complexity
Download: http://www.cvg.rdg.ac.uk/PETS2007/data.html
