Multi-Object Video Behaviour Modelling for Abnormality Detection and Differentiation

Funded by the UK EPSRC under the First Grant Scheme from 2009 to 2012

There are over 4.2 million closed-circuit television (CCTV) surveillance cameras operational in the UK and many more worldwide, collecting a colossal amount of video data for security, safety, and infrastructure and facility management purposes. A typical existing CCTV system relies on a handful of human operators in a centralised control room to monitor video feeds from hundreds of cameras. With so many cameras and so few operators, the system is ill-equipped to detect events and anomalies that require an immediate and appropriate response. Consequently, the use of existing CCTV surveillance systems is limited predominantly to post-mortem analysis. There is thus an increasing demand for automated intelligent systems that can analyse the content of vast quantities of surveillance video and trigger alarms in a timely and robust fashion. One of the most critical components of such a system is the ability to monitor object behaviour captured in the videos and to detect or predict any suspicious or abnormal behaviour that could pose a threat to public safety and security.

This project aims to develop underpinning capabilities for an innovative intelligent video analytics system for detecting abnormal video behaviour in public spaces. More specifically, the project will address three open problems:

  1. To develop a new model of spatio-temporal visual context for abnormal behaviour detection. Behaviours are inherently context-dependent, shaped by the constraints imposed by the scene layout and by the temporal nature of activities in a given scene. Consequently, the same behaviour can be deemed either normal or abnormal depending on where and when it occurs. For instance, at a road junction, object behaviours are regulated spatially by the layout of the roads and temporally by the traffic lights; on a train platform, the spatial context is defined by the layout of the platform and the temporal context by the timing of train arrivals and departures. We aim to go beyond the state-of-the-art semantic scene modelling approaches, most of which focus solely on scene layout such as entry and exit points, by developing a more comprehensive spatio-temporal model of dynamic visual context.
  2. To develop a novel multi-object behaviour model for real-time detection and differentiation of abnormalities in complex video behaviours that involve multiple objects interacting with each other (e.g. a group of people meet in front of a ticket office at a train station and then go to different platforms). The ability not only to detect but also to differentiate different types of abnormality given noisy input has not been explored to date, although it is important in practical applications. In many real-world scenarios, one type of abnormality may be deemed more critical for triggering an alarm than others. For instance, at a bank branch, the order in which a person 'enters the branch' and 'uses an ATM outside the branch' is of no significance, whereas the amount of time spent in front of the ATM (too long or too short) may be of greater interest. In a convenience shop, on the other hand, the temporal order of 'taking something', 'paying' and 'leaving the shop' is important, whilst variations in the time spent on these constituent parts of the behaviour are much less critical. The behaviour model to be developed in this project will have the novel features of distinguishing a) an abnormal behaviour pattern from a normal one that is contaminated by noise; and b) different types of abnormality, e.g. abnormal temporal order versus abnormal temporal duration (a minimal illustration of this distinction is sketched after this list).
  3. To develop a novel online adaptive learning algorithm for estimating the parameters of the behaviour model. Although primitive video abnormality detection tools are already available in many existing CCTV control systems, human operators are often reluctant to use them because there are too many parameters to tune and re-tune for different scenarios and for changing visual context. With the online adaptive learning algorithm, our abnormality detection method can be deployed in different surveillance scenarios on the fly, over long periods of time and with minimal human intervention. More importantly, with this learning algorithm our behaviour model will adapt both to changes in visual context (and therefore in the definition of normality and abnormality) and to valuable feedback from human operators on the model's abnormality detection output (see the second sketch after this list).
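To make the distinction in point 2 concrete, the following is a minimal, purely illustrative sketch of how an abnormal temporal order might be separated from an abnormal temporal duration once a normal behaviour pattern has been learned. The `Stage` class, the z-score test and the threshold are hypothetical simplifications introduced here for illustration; they are not the probabilistic multi-object behaviour model the project will develop.

```python
# Illustrative sketch only: separating order abnormalities from duration
# abnormalities against a learned "normal" behaviour pattern.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Stage:
    label: str            # e.g. 'take_item', 'pay', 'leave'
    mean_duration: float  # seconds, learned from normal examples
    std_duration: float

def classify(observed: List[Tuple[str, float]],
             normal_pattern: List[Stage],
             z_threshold: float = 3.0) -> str:
    """Return 'normal', 'abnormal_order' or 'abnormal_duration'."""
    observed_labels = [label for label, _ in observed]
    expected_labels = [stage.label for stage in normal_pattern]

    # Temporal order check: do the observed stages follow the learned order?
    if observed_labels != expected_labels:
        return 'abnormal_order'

    # Duration check: is any stage far outside its learned duration range?
    for (_, duration), stage in zip(observed, normal_pattern):
        z = abs(duration - stage.mean_duration) / max(stage.std_duration, 1e-6)
        if z > z_threshold:
            return 'abnormal_duration'

    return 'normal'

# Example: a convenience-shop pattern where order matters more than timing.
shop = [Stage('take_item', 30, 15), Stage('pay', 40, 20), Stage('leave', 10, 5)]
print(classify([('pay', 35), ('take_item', 25), ('leave', 8)], shop))  # abnormal_order
print(classify([('take_item', 28), ('pay', 45), ('leave', 9)], shop))  # normal
```

In an application-specific deployment, the two abnormality types can then be mapped to different alarm priorities, as in the bank-branch versus convenience-shop examples above.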
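Similarly, the sketch below illustrates one simple way the duration statistics of such a pattern could be adapted online, using an exponentially weighted update gated by operator feedback, so that the notion of "normal" tracks a changing scene without manual re-tuning. The function, its parameters and the forgetting factor are again hypothetical stand-ins for the online adaptive learning algorithm to be developed.

```python
# Illustrative sketch only: online adaptation of one stage's duration model
# with an exponential forgetting factor, gated by operator feedback.
def update_duration_model(mean: float, variance: float, observed: float,
                          operator_confirms_normal: bool = True,
                          forgetting: float = 0.05) -> tuple:
    """Return the updated (mean, variance) for a behaviour stage's duration."""
    if not operator_confirms_normal:
        # Confirmed abnormal examples must not redefine normality.
        return mean, variance
    delta = observed - mean
    mean += forgetting * delta
    # Exponentially weighted running estimate of the variance.
    variance = (1.0 - forgetting) * (variance + forgetting * delta * delta)
    return mean, variance

# Example: the 'pay' stage drifts towards longer durations as queues grow.
mean, variance = 40.0, 400.0
for duration in [42, 47, 55, 60, 58]:
    mean, variance = update_duration_model(mean, variance, duration)
print(round(mean, 1), round(variance ** 0.5, 1))
```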