Regression-based Tracking with Data Relevance Determination

Summary
This work addresses the problem of efficient visual 2D template tracking in image sequences. We adopt a discriminative approach in which the observations at each frame yield direct predictions of a parametrisation of the state (e.g. position/scale/rotation) of the tracked target. To this end, a Bayesian Mixture of Experts (BME) is trained on a dataset of image patches generated by applying artificial transformations to the template at the first frame. In contrast to other methods in the literature, we explicitly address the problem that prediction accuracy can deteriorate drastically for observations that are not similar to those in the training set; such observations are common in the case of partial occlusion or fast motion. To do so, we couple the BME with a probabilistic kernel-based classifier which, once trained, can determine the probability that a new/unseen observation will accurately predict the state of the target (the relevance of the observation in question). In addition, within the particle filtering framework, we derive a recursive scheme for maintaining an approximation of the posterior probability of the target's state, in which the probabilistic predictions of multiple observations are moderated by their corresponding relevance. We apply the algorithm to the problem of 2D template tracking and demonstrate that the proposed scheme outperforms classical methods for discriminative tracking in the case of large motions and partial occlusions.
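The idea of moderating per-observation predictions by their relevance can be sketched as follows. This is a simplified 1-D Python illustration, not the paper's exact recursive scheme: the moment-matched blend of each prediction with the prior, and the precision-weighted fusion of the blended Gaussians, are assumptions made here purely for illustration.

```python
import numpy as np

def moderated_prediction(obs_means, obs_vars, relevances, prior_mean, prior_var):
    """Fuse per-observation Gaussian state predictions, down-weighting
    observations that the relevance classifier deems unreliable.

    Observation i predicts state ~ N(obs_means[i], obs_vars[i]) with
    relevance r_i in [0, 1]. An irrelevant observation (r_i -> 0) falls
    back to the prior N(prior_mean, prior_var).
    """
    # moment-matched blend: r_i * N(m_i, v_i) + (1 - r_i) * prior
    m = relevances * obs_means + (1.0 - relevances) * prior_mean
    second = (relevances * (obs_vars + obs_means ** 2)
              + (1.0 - relevances) * (prior_var + prior_mean ** 2))
    v = second - m ** 2  # variance of the blended mixture

    # precision-weighted (product-of-Gaussians) fusion of the blends
    prec = np.sum(1.0 / v)
    fused_mean = np.sum(m / v) / prec
    return fused_mean, 1.0 / prec

# Two fully relevant observations contribute symmetrically;
# an irrelevant one simply returns the prior.
print(moderated_prediction(np.array([1.0, 3.0]), np.array([1.0, 1.0]),
                           np.array([1.0, 1.0]), 0.0, 10.0))
print(moderated_prediction(np.array([5.0]), np.array([0.1]),
                           np.array([0.0]), 0.0, 2.0))
```

With both relevances at 1 the fusion reduces to a standard product of Gaussians; with relevance 0 the observation is ignored and the prior is returned unchanged, which is the qualitative behaviour the relevance mechanism is designed to produce under occlusion.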
Examples
Here we present some videos of the results that were presented in the paper, together with a video from a new experiment. We present only videos with rapid motion and/or occlusions. No motion model (e.g. constant motion or constant acceleration) is used, as we want to test the behaviour of the algorithm when the true motion deviates significantly from the one a motion model would predict. All of these results are provided as supplementary material to CVPR 2007; a readme.txt file explains them in more detail.
As explained in the paper, we approximate the posterior of the target state with a mixture of five Gaussians. In the videos we depict the means of the Gaussians with white crosses (some crosses may lie at the same position). We draw an 11x11 window around the mean of the Gaussian at which the posterior (under our approximation) is highest.
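Selecting which mean the window is drawn around amounts to evaluating the mixture approximation at each component mean and taking the argmax. The following Python sketch illustrates this with isotropic 2-D components, an assumption made here for brevity; the components of the actual approximation need not be isotropic, and the function names are ours.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    # isotropic 2-D Gaussian density N(mean, var * I)
    d = x - mean
    return np.exp(-0.5 * float(d @ d) / var) / (2.0 * np.pi * var)

def best_component_mean(weights, means, variances):
    """Return the component mean at which the mixture density is
    highest -- the point the 11x11 window is drawn around."""
    density_at_means = [
        sum(w * gauss_pdf(m, mu, v)
            for w, mu, v in zip(weights, means, variances))
        for m in means
    ]
    return means[int(np.argmax(density_at_means))]

# dominant component at the origin wins over a light one far away
best = best_component_mean(
    weights=[0.8, 0.2],
    means=np.array([[0.0, 0.0], [10.0, 10.0]]),
    variances=[1.0, 1.0],
)
print(best)  # [0. 0.]
```

Note that the mixture maximum need not coincide exactly with any component mean; evaluating only at the means is a cheap approximation that suffices for placing a display window.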
For the presented videos we used 40-100 particles, which resulted in frame rates of 2.5-4.6 frames/sec (excluding file reading/writing) in a Matlab implementation on a 1.5 GHz Toshiba Satellite Pro.
Occlusions, changes in intensity, and small deformations
Persistent occlusions (a quarter of the target is artificially occluded)
Persistent occlusions (half of the target is artificially occluded)
Large motions of a rigid body (a quarter of the target is artificially occluded)
Persistent occlusions (half of the target is artificially occluded every second frame)
Persistent occlusions and large motion (half of the target is artificially occluded; the sequence is temporally subsampled by a factor of 2)