From Behaviour Analysis to Re-Identification
The Computer Vision Lab has pioneered visual analysis of behaviour since the early 1990s. The field has become a significant focus worldwide, underpinning technologies for surveillance, retail, robotics, healthcare and HCI. Early work on face and gesture recognition was commercialised under licence to a start-up, Safehouse (1998-2010).
Recent work on behaviour recognition, semantic video search, crowd analysis and person re-identification led to a spin-out, Vision Semantics, in 2007, which has attracted investment in a joint venture from a global market leader in the banking sector. The work has also benefited UK government services and industrial competitiveness.
Recent work (2007-2013) on self-learning visual context of crowded spaces, unsupervised video behaviour profiling, abnormal behaviour recognition, semantic video search/screening, distributed multi-camera behaviour correlation, and person re-identification over distributed large spaces has led to the formation of the QMUL spin-out company Vision Semantics Limited (VSL) in 2007. VSL has developed semantic video analysis technologies that fill a gap in the market, along with multi-camera tracking systems for watchlist re-identification in distributed urban environments.
These systems have opened up opportunities to improve existing industrial video analytics products used by a wide range of customers. VSL has also developed innovative technologies for automatic passenger management and real-time crowd density analysis to meet transport safety and crowd evacuation requirements.
Early work (1996-2006) on motion analysis, face detection, tracking and recognition was licensed to a start-up video analytics company, Safehouse Technologies, which built a substantial computer vision technology product base with numerous patents granted and recognition within the industry. Safehouse established considerable inroads into the networked CCTV surveillance sector in North America and Australasia.
QMUL's Computer Vision Lab has undertaken substantial research programmes in collaboration with DSTL, MOD SA/SD, the US Army Labs, Vision Semantics and Safehouse Technologies to develop robust and scalable mathematical models and computer algorithms for automatic detection, tracking and recognition of object behaviour patterns captured from a distance by distributed CCTV cameras in public spaces. This work addresses the significant challenge of how to analyse and effectively filter massive amounts of public space video data to find “needles in haystacks” [1,6].
There has been an accelerated expansion of closed-circuit television (CCTV) camera systems in public spaces ranging from transport infrastructure, shopping centres and sports arenas to residential streets, serving as a tool for crime reduction and risk management. CCTV surveillance continues to be a repetitive, time-consuming manual task that often relies on a human operator to spot a momentary incident occurring on one of dozens of monitors. CCTV systems rely heavily on human operators to monitor activities and determine incidents, e.g. tracking a suspicious target from one camera to another in a large area of distributed space, or across disjoint views. However, there are inherent limitations to employing unsupervised human operators due to the lack of a priori knowledge of what to look for.
Consequently, most existing CCTV recordings are never replayed, or at best are retrieved only after an incident has occurred. Very little, if anything, is known about what exactly has been recorded. When a major incident occurs, the police have to review thousands of hours of video recordings to look for a single event that may last only a few seconds. Even if the precise image frames of interest are identified, the image data can often be of insufficient quality either for recognition or as evidence.
There is massive demand from commercial technology providers and end-users for activity- and behaviour-based semantic video content analysis to enable fully automated and highly selective screening and search of salient events and objects (e.g. a watchlist) in the colossal amount of video data generated from both infrastructure CCTV cameras and mobile devices. Currently there is no suitable solution on the market. Existing video analytics suffer from (a) crude signal thresholding; (b) hard-wired configuration requiring specialist setup per application domain; (c) rule-based detection systems that are inflexible and not scalable to different operational conditions; and (d) unacceptable false alarm rates and poor usability for user control.
The Computer Vision Lab has endeavoured to develop leading and innovative techniques for object tracking and re-identification, and for semantic video search/auto-screening based on behaviour profiling and anomaly detection, all scalable to large-scale public space video data. Specifically, fundamental mathematical models and scalable computer algorithms have been developed for:
- solving the re-identification problem: detection, tracking and re-identification of people on a watchlist over distributed physical sites at different times over large spaces (e.g. different stations across an underground network);
- semantic video search by unsupervised learning to detect and correlate human/vehicle behaviours at different locations and times;
- unsupervised learning of behavioural context without manual labelling to provide a fully automated mechanism for model transfer/scaling to different domains;
- "human-in-the-loop" relevance feedback to model weak association, e.g. rare behaviours, and discovering unknown behaviours of significance (alarm events) in public spaces [2,3,4,5,7].
This research has been led by Professor Shaogang Gong, founder and Chief Scientist of Vision Semantics since 2007, who also founded the QMUL Computer Vision Lab and has led the QMUL Computer Vision Group since 1993.
The research has been funded by six consecutive EPSRC/DTI grants between 1995 and 2011, including two successive MOD JGS grants between 2004 and 2011, with a further MOD grant from 2011 to 2015. The research has also been funded by three EU FP7 Security grants between 2008 and 2017, and an EU FP7 Transport grant between 2011 and 2014.
- [R1] S. Gong and T. Xiang, Visual Analysis of Behaviour: From Pixels to Semantics, 376 pages, Springer, May 2011.
- [R2] W. Zheng, S. Gong and T. Xiang, Re-identification by relative distance comparison, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 3, pp. 653-668, March 2013. [Google Scholar 20 cites, preliminary work at CVPR'11 (GS 66 cites) and BMVC'10 (GS 92 cites)]
- [R3] Wall of Fame - Most Viewed ICCV-2009 Paper (6295 views): "A Markov Clustering Topic Model for Mining Behaviour in Video" (http://www.sciweavers.org/conference/iccv-2009), also as IJCV'2012 (http://www.springerlink.com/content/4244528725h331l1/)
- [R4] T. Xiang and S. Gong, Video behaviour profiling for anomaly detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 5, pp. 893-908, May 2008. [Google Scholar 166 cites]
- [R5] T. Xiang and S. Gong, Beyond tracking: Modelling activity and understanding behaviour, International Journal of Computer Vision, Vol. 67, No. 1, pp. 21-51, 2006. [Google Scholar 178 cites]
- [R6] S. Gong and T. Xiang, Recognition of group activities using a dynamic probabilistic network, IEEE International Conference on Computer Vision, pp. 742-749, Nice, France, October 2003. [Google Scholar 193 cites]
- [R7] S. Gong, S. McKenna and A. Psarrou, Dynamic Vision: From Images to Face Recognition, 364 pages, Imperial College Press, World Scientific Publishing, May 2000. [Google Scholar 279 cites]
- [G1] EU FP7 SECURITY SmartPrevent, 2014-2016
- [G2] EU FP7 SECURITY SUNNY, 2013-2017
- [G3] EU FP7 TRANSPORT GETAWAY, 2011-2014
- [G4] EPSRC Multi-Object Behaviour Modelling, 2009-2012
- [G5] EU FP7 SECURITY SAMURAI, 2008-2011
- [G6] EPSRC/MOD JGS BEWARE, 2007-2012
- [G7] EPSRC/MOD JGS INSIGHT, 2004-2007
- [G8] EPSRC/DTI LINK ICONS, 2000-2003
- [I1] New Scientist article on "Smart CCTV Learns to Spot Suspicious Types" (http://www.newscientist.com/article/mg20427385.800-smart-cctv-learns-to-spot-suspicious-types.html)
- [I2] European Technology Marketplace (CORDIS RCN 45790) on "The Hunt for More Robust Surveillance Systems" (http://cordis.europa.eu/fetch?ACTION=D&SESSION=&DOC=1&TBL=EN_OFFR&RCN=5970&CALLER=OFFR_TM_EN)