See video results of recent projects!

Multi-camera networks [ the book ]

(calibration, detection, tracking, camera selection)


Target detection and tracking [ the book ]

(2D/3D faces, people, vehicles, moving objects, shadows)

Behaviour and identity recognition

(lip-reading, expressions, interactions, unusual behaviour)

Perceptual semantics in multimedia


Completed projects

  • Audio-visual semantic discovery

  • Dynamic visual scene analysis

  • Audio detection and classification of events

  • Automatic object prototyping for video annotation

  • Cognitive models for personalised presentation and retrieval of visual information

  • 3D facial scan analysis


  • MOTINAS - Multi-modal object tracking in a network of audio-visual sensors

    The goal of this project was to develop algorithms for multi-modal and multi-sensor tracking using STAC sensors (stereo microphones coupled with cameras). To evaluate the tracking scheme, we created a test corpus and its associated ground-truth data, used within the project and distributed to the research community through the website http://www.spevi.org to facilitate comparisons.

    Project webpage: http://www.elec.qmul.ac.uk/staffinfo/andrea/motinas.html
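    As a minimal illustration of multi-modal fusion (not the MOTINAS algorithm itself), two independent bearing estimates of the same target, one from the stereo microphones and one from the camera, can be combined by inverse-variance weighting, so the less noisy modality dominates the fused estimate. All numbers below are hypothetical.

```python
def fuse_estimates(z_audio, var_audio, z_video, var_video):
    """Inverse-variance (maximum-likelihood) fusion of two independent
    1-D azimuth estimates of the same target.  The fused variance is
    always smaller than either input variance."""
    w_a = 1.0 / var_audio   # weight of the audio measurement
    w_v = 1.0 / var_video   # weight of the visual measurement
    fused = (w_a * z_audio + w_v * z_video) / (w_a + w_v)
    fused_var = 1.0 / (w_a + w_v)
    return fused, fused_var

# Example: a noisy audio bearing (15 deg, variance 4) and a more
# precise visual bearing (12 deg, variance 1).
est, var = fuse_estimates(15.0, 4.0, 12.0, 1.0)
```

    The same weighting generalises to a Kalman-filter update when the estimates arrive sequentially rather than simultaneously.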

  • Smart Camera

    The goal of this internal project is to develop a camera that extends the capabilities of standard cameras by analysing the scene in order to generate a scalable content description. Such a device has a wide range of actual and potential applications, including textual scene description, video surveillance, augmented reality and Universal Multimedia Access (UMA).


  • art.live - Architecture and authoring tools prototype for Living Images and new Video Experiments

    art.live develops an innovative authoring tool that enables artists and users to easily create mixed real and virtual narrative spaces and disseminate them in real time to the public through the Internet (or any IP network). art.live is a European IST (Information Society Technologies) project, part of the 5th Framework Programme. It develops an architecture and a set of tools for the enhancement of narrative spaces; to this aim, it gathers image processing engineers, AI computer scientists and multimedia authors. The approach is implemented through techniques from artificial intelligence, signal processing, communications and human-computer interfaces.

    art.live web page:

  • MODEST - Multimedia Object Descriptors Extraction from Surveillance Tapes

    MODEST defines and develops a framework for the analysis of video sequences aimed at high-level semantic scene interpretation. The approach is based on the segmentation, tracking and indexing of moving objects in video scenes using Intelligent Physical Agents (IPA). The work is performed within the scope of the MPEG-4 and MPEG-7 standards. The final goal of the system is to provide automatic interpretations and decisions from visual observation; a human user interacting with the system may confirm the automatic decisions, which are usually alarms following event detection.

    MODEST web page:

    the MODEST video object kernel
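    The tracking step of a pipeline like this must associate the objects segmented in each new frame with the tracks already being indexed. A minimal sketch of that step, assuming objects are reduced to centroids, is greedy nearest-neighbour association with a distance gate; this is an illustrative simplification, not the MODEST agent-based design, and all names and coordinates are hypothetical.

```python
import math

def associate(tracks, detections, gate=50.0):
    """Greedily match each existing track to the nearest detected
    centroid within `gate` pixels.  Returns the matches and the set
    of unmatched detections (candidates for new tracks)."""
    assignments = {}
    free = set(range(len(detections)))
    for tid, (tx, ty) in tracks.items():
        best, best_d = None, gate
        for j in free:
            d = math.hypot(detections[j][0] - tx, detections[j][1] - ty)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            assignments[tid] = best
            free.discard(best)
    return assignments, free

# Two known tracks, three detections in the new frame; the third
# detection is far from both tracks and would start a new track.
tracks = {0: (100.0, 100.0), 1: (200.0, 50.0)}
dets = [(205.0, 52.0), (103.0, 98.0), (400.0, 400.0)]
matched, new = associate(tracks, dets)
```

    Real systems replace the greedy loop with globally optimal assignment (e.g. the Hungarian algorithm) and predict each track's position before matching.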

  • SURVEILLANCE - Analysis of video sequences to track moving objects

    SURVEILLANCE focuses on the analysis of images with the general goal of identifying, indexing and tracking moving objects and automatically recognising changes in a given scene. The main methods used are optical flow, change detection, segmentation and tracking. The project is performed under the framework of the CTI. Three groups are involved in the project: EPFL-LTS, Siemens-Cerberus and Motorola.
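    The simplest form of change detection mentioned above is background differencing: a pixel is declared "changed" when its grey level deviates from a reference background by more than a threshold. A minimal sketch (the threshold value and frame contents are hypothetical, and real systems model the background statistically rather than with a single frame):

```python
import numpy as np

def change_mask(frame, background, tau=25):
    """Per-pixel change detection by background differencing: flag a
    pixel as moving when its absolute grey-level difference from the
    reference background exceeds tau."""
    # Promote to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > tau

bg = np.zeros((4, 4), dtype=np.uint8)   # empty reference scene
fr = bg.copy()
fr[1:3, 1:3] = 200                      # a bright 2x2 "object" appears
mask = change_mask(fr, bg)
```

    The resulting binary mask is then cleaned with morphological filtering and grouped into connected components, which become the moving objects passed to the tracker.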

  • Integrated circuits for low-cost multimedia systems: compression and decompression of audio and video signals and data streams

    The objective of the project is to develop a low-cost, low-power interactive multimedia system suitable for information services that provide inexpensive, widely available audio/video upstream communication, such as interactive television and consumer playback applications. The information servers are connected to set-top boxes at the customer premises through upstream channels of 512 kbit/s - 2 Mbit/s. The audio/video signal, suitably coded and transmitted by a single user, is collected by the server, transcoded into MPEG, mixed with the main signal and other user-contributed MPEG signals, and then broadcast to all users on downstream channels with a bit-rate of 4-20 Mbit/s. We propose to overcome two drawbacks of current coders: the fixed block size and the use of unrealistic transformations in motion compensation. The former can be addressed with a quadtree structure that allows a careful treatment of the highly detailed areas of an image, using two block sizes and a three-stage motion compensation algorithm; the latter can be overcome by considering a non-translational motion field that models not only translation but also other kinds of motion such as rotation, shearing, warping and uneven stretching.

    Project web page:
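    The quadtree idea above can be sketched as a variance-driven recursive split: a block is divided into four children while its grey-level variance exceeds a threshold, so detailed areas end up covered by small blocks and flat areas keep large ones. The threshold, block sizes and test image below are hypothetical, and a real coder would split on motion-compensation residual rather than raw variance.

```python
import numpy as np

def quadtree_blocks(img, x, y, size, min_size, tau):
    """Recursively split the square block at (x, y) into four children
    while its grey-level variance exceeds tau, down to min_size.
    Returns the leaf blocks as (x, y, size) tuples."""
    block = img[y:y + size, x:x + size]
    if size > min_size and block.var() > tau:
        h = size // 2
        out = []
        for dy in (0, h):
            for dx in (0, h):
                out += quadtree_blocks(img, x + dx, y + dy, h, min_size, tau)
        return out
    return [(x, y, size)]

# Flat 16x16 image with detail only in the top-left 4x4 corner.
img = np.zeros((16, 16))
img[0:4, 0:4] = np.arange(16).reshape(4, 4) * 17.0
blocks = quadtree_blocks(img, 0, 0, 16, min_size=4, tau=1.0)
```

    With this input the detailed quadrant is split down to 4x4 blocks while the three flat quadrants stay as single 8x8 blocks, which is exactly the two-block-size behaviour the project description refers to.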





















(C) Queen Mary, University of London