Chen Change Loy

Research Assistant Professor

Rm 717 SHB
Department of Information Engineering
The Chinese University of Hong Kong
Shatin, NT, Hong Kong

Laboratory: Multimedia Lab

ccloy at ie cuhk edu hk
ccloy at ieee org

I am a research assistant professor in the Department of Information Engineering, the Chinese University of Hong Kong.

Previously I was a post-doctoral researcher at Vision Semantics Limited. I received my PhD (2010) in Computer Science from Queen Mary University of London. I have been involved in two European FP7 computer vision projects on security and surveillance using multi-camera CCTV systems, SAMURAI (2008-2011) and GETAWAY (2011-2014).

My research interests include computer vision and pattern recognition, particularly video analysis, active learning, random forests, non-parametric Bayesian models, and probabilistic graphical models. More ...

Google scholar profile

Sciweavers

Existing person re-identification methods conventionally rely on labelled pairwise data to learn a task-specific distance metric for ranking. The value of unlabelled gallery instances is generally overlooked. In this study, we show that it is possible to propagate the query information along the unlabelled data manifold in an unsupervised way to obtain robust ranking results. In addition, we demonstrate that the performance of existing supervised metric learning methods can be significantly boosted once integrated into the proposed manifold ranking-based framework. Extensive evaluation is conducted on three benchmark datasets.
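The propagation idea in the abstract above can be illustrated with a generic manifold ranking sketch. This is not the paper's implementation: the Gaussian affinity, the symmetric normalisation, the iterative update, and every name and parameter (`manifold_rank`, `alpha`, `sigma`, `iterations`) are assumptions chosen for illustration, following the standard manifold ranking formulation.

```python
import math


def manifold_rank(points, query_index, alpha=0.9, sigma=1.0, iterations=200):
    """Score every point by relevance to one query via iterative propagation.

    Hypothetical helper: a generic manifold ranking sketch, not the method
    evaluated in the paper.
    """
    n = len(points)
    # Gaussian affinity between every pair of points, with a zero diagonal.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    # Symmetric normalisation: S = D^(-1/2) W D^(-1/2).
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # Initially only the query carries information.
    y = [1.0 if i == query_index else 0.0 for i in range(n)]
    f = y[:]
    # Repeatedly spread scores to neighbours while retaining the query signal.
    for _ in range(iterations):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1.0 - alpha) * y[i]
             for i in range(n)]
    return f


# Toy gallery: four points in one cluster (the query is index 0) and two far away.
gallery = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (10.0, 0.0), (10.5, 0.0)]
scores = manifold_rank(gallery, query_index=0)
ranking = sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

In this toy setup the points connected to the query through nearby neighbours receive much higher scores than the distant cluster, even though no labelled pairs are used.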

A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult by sparse and imbalanced training data and by large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has a clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework offers a notable accuracy advantage for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.
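One way to picture a cumulative attribute space is the encoding sketched below. The abstract does not spell out the exact encoding, so this is an assumption: it supposes integer-valued labels and a binary vector whose first `value` entries are switched on. The function names `cumulative_attributes` and `decode_count` and the 0.5 threshold are hypothetical.

```python
def cumulative_attributes(value, max_value):
    """Encode an integer label as a binary vector whose first `value` entries are 1.

    Assumed encoding for illustration only; neighbouring labels differ in
    exactly one dimension, so the representation changes cumulatively.
    """
    return [1 if i < value else 0 for i in range(max_value)]


def decode_count(attributes, threshold=0.5):
    """Map (possibly noisy) attribute scores back to a scalar by counting active ones."""
    return sum(1 for a in attributes if a >= threshold)


def hamming(u, v):
    """Number of dimensions in which two attribute vectors differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


age_3 = cumulative_attributes(3, 5)  # [1, 1, 1, 0, 0]
age_4 = cumulative_attributes(4, 5)  # [1, 1, 1, 1, 0]
age_5 = cumulative_attributes(5, 5)  # [1, 1, 1, 1, 1]
```

Under this assumed encoding, labels that are close in value share most of their attributes, which is one intuition for why samples from neighbouring labels could help each other when training data are sparse.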

The QMUL underGround Re-IDentification (GRID) dataset contains 250 pedestrian image pairs. Each pair contains two images of the same individual seen from different camera views. In addition, there are 775 extra individual images that do not belong to any of the paired images. All images are captured from 8 disjoint camera views installed in a busy underground station. The dataset is challenging due to variations in pose, colour and lighting, as well as poor image quality caused by low spatial resolution.

State-of-the-art person re-identification methods seek robust person matching by combining various feature types. Often, these features are implicitly assigned a single vector of global weights, which are assumed to be universally good for all individuals, independent of their different appearances. In this study, we show that certain features play a more important role than others under different circumstances. Consequently, we propose a novel unsupervised approach for learning a bottom-up feature importance, so that features extracted from different individuals are weighted adaptively, driven by their unique and inherent appearance attributes. Extensive experiments on two public datasets demonstrate that attribute-sensitive feature importance facilitates more accurate person matching when it is fused together with global weights obtained using existing methods.

Learning from streams of evolving and unbounded data is an important problem, for example in visual surveillance or internet scale data. For such large and evolving real-world data, exhaustive supervision is impractical, particularly when the full space of classes is not known in advance. Joint class discovery (exploration) and boundary learning (exploitation) therefore become critical. Active learning has shown promise in jointly optimising exploration-exploitation with minimal human supervision. However, existing active learning methods either rely on heuristic multi-criteria weighting or are limited to batch processing. In this paper, we present a new unified framework for joint exploration-exploitation active learning in streams without any heuristic weighting. Extensive evaluation on classification of various image and surveillance video datasets demonstrates the superiority of our framework over existing methods.

Activity modelling and unusual event detection in a network of cameras is challenging, particularly when the camera views do not overlap. We show that it is possible to detect unusual events in multiple disjoint cameras as context-incoherent patterns, through incremental learning of time delayed dependencies between distributed local activities observed within and across camera views. Specifically, we model multi-camera activities using a Time Delayed Probabilistic Graphical Model (TD-PGM) with different nodes representing activities in different decomposed regions from different views and the directed links between nodes encoding their time delayed dependencies. To deal with visual context changes, we formulate an incremental learning method for modelling time delayed dependencies that change over time. We validate the effectiveness of the proposed approach using a synthetic dataset and videos captured from a camera network installed at a busy underground station.
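To give a feel for what a time delayed dependency between two regions means, the sketch below estimates the delay at which activity in one region best predicts activity in another. It is a deliberately simple stand-in based on lagged Pearson correlation, not the TD-PGM described above; the function name `best_delay`, the toy activity series, and the `max_delay` parameter are all hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_delay(source, target, max_delay):
    """Return (delay, correlation) for the delay at which source best matches target.

    Illustrative lagged-correlation search, assumed here as a stand-in for
    learning a time delayed dependency; it is not the paper's model.
    """
    best = (0, -1.0)
    for delay in range(max_delay + 1):
        # Pair source[t] with target[t + delay].
        x = source[:len(source) - delay]
        y = target[delay:]
        r = pearson(x, y)
        if r > best[1]:
            best = (delay, r)
    return best


# Toy activity levels: region B repeats region A's pattern two frames later.
region_a = [0, 1, 0, 0, 3, 0, 0, 2, 0, 0, 0, 1]
region_b = [0, 0] + region_a[:-2]
delay, correlation = best_delay(region_a, region_b, max_delay=4)
```

A directed link from region A to region B with a two-frame delay would be the kind of relationship this toy search recovers; a pattern that violates such learned delays is the sort of event the abstract calls context-incoherent.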

This paper systematically investigates the effectiveness of different visual feature coding schemes for facilitating the learning of time-delayed dependencies among disjoint multi-camera views. Accurate inter-camera dependency estimation across non-overlapping camera views is non-trivial, especially in crowded scenes where inter-object occlusion can be severe and frequent, and when the degree of crowdedness can change drastically over time. In contrast to existing methods that learn dependencies between disjoint cameras by solely relying on correlating universal object-independent low-level visual features or transition time statistics, we propose to use either supervised or unsupervised feature coding to establish a robust and reliable representation for estimating inter-camera activity pattern dependencies more accurately. We show comparative experiments to demonstrate the superiority of robust feature coding for learning inter-camera dependencies using benchmark multi-camera datasets of crowded public scenes.

This paper presents a multi-output regression model for crowd counting in public scenes. Existing counting-by-regression methods either learn a single model for global counting, or train a large number of separate regressors for localised density estimation. In contrast, our approach, based on a single regression model, is able to estimate people count in spatially localised regions and is more scalable, without the need for training a large number of regressors proportional to the number of local regions. In particular, the proposed model automatically learns the functional mapping between interdependent low-level features and multi-dimensional structured outputs. The model is able to discover the inherent importance of different features for people counting at different spatial locations. Extensive evaluations on an existing crowd analysis benchmark dataset and a new, more challenging dataset demonstrate the effectiveness of our approach.

Full Publication List

Datasets / Codes