Fabrizio

Attention-driven Pattern Recognition

F. Smeraldi, thesis no. 2153, Swiss Federal Institute of Technology - Lausanne, Switzerland, 2000


Computer Vision systems deal with an extremely rich and complex representation of the external world, encoded in the form of static images or video streams. To make sense of the huge amount of visual input it receives, a system must be able to identify and process the most informative and interesting parts of that input. In other words, an attentional mechanism is needed that selects task-relevant information.

A natural source of inspiration for attentional mechanisms is the human visual system. This Thesis aims to introduce a new biologically inspired vision paradigm, which we refer to as Retinal Vision. It is intended as a general framework that aims to integrate low-level, biologically inspired signal conditioning, such as retinotopic sampling and the Gabor decomposition, with models of the higher-order visual processes that constitute the interface to cognitive processes in human beings, with particular reference to the saccadic system. The underlying idea is to implement attentional mechanisms that allow a cognitive task to steer visual acquisition and processing.

Humans and other primates do not explore a visual scene in a raster-like fashion. Rather, they perform large jumps, known as saccades, between the points of interest in the scene, each of which is fixated for a fraction of a second. Saccades are known to play a role in cognitive processes, as the saccadic pattern depends both on the visual scene and on the task to be performed.

In this Thesis we introduce a Saccadic Search strategy that allows efficient detection of deformable objects in static images or video streams. A description of the visual scene is constructed by computing the responses of a set of modified Gabor filters at the sparse nodes of a Log-polar retinotopic graph. A priori knowledge about the search targets is included in the form of appearance-based models, which are implemented by Support Vector Machine classifiers. These models are employed to generate a sequence of saccades that eventually centres the retinotopic grid on the patterns of interest. The process does not require the Gabor decomposition to be computed over the entire image; the resulting gain in efficiency makes the Gabor decomposition, a powerful but computationally demanding tool, suitable for an active vision scenario.
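The search loop described above can be illustrated with a minimal sketch. It is not the algorithm of the thesis: the grid parameters, the plain (unmodified) Gabor filter, and the greedy jump rule are all assumptions made for illustration, and the `score` function passed to the hypothetical `saccadic_search` merely stands in for the Support Vector Machine appearance models. What the sketch does show is the efficiency argument: filter responses are evaluated only at the sparse nodes around the current fixation, never over the whole image.

```python
import cmath
import math


def log_polar_nodes(cx, cy, n_rings=4, n_angles=8, r_min=2.0, growth=1.8):
    """Centre node plus rings whose radii grow geometrically (assumed layout)."""
    nodes = [(cx, cy)]
    for i in range(n_rings):
        r = r_min * growth ** i
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            nodes.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return nodes


def gabor_magnitude(image, x, y, wavelength=4.0, theta=0.0, sigma=2.0):
    """Magnitude of a complex Gabor response evaluated at one point only."""
    h, w = len(image), len(image[0])
    radius = int(3 * sigma)
    x0, y0 = int(round(x)), int(round(y))
    acc = 0j
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            px, py = x0 + dx, y0 + dy
            if 0 <= px < w and 0 <= py < h:
                envelope = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                phase = 2 * math.pi * (dx * math.cos(theta) + dy * math.sin(theta)) / wavelength
                acc += image[py][px] * envelope * cmath.exp(1j * phase)
    return abs(acc)


def saccadic_search(image, start, score, max_saccades=20):
    """Greedy stand-in: jump to the best-scoring node until the centre wins."""
    fixation = start
    path = [fixation]
    for _ in range(max_saccades):
        nodes = log_polar_nodes(*fixation)
        # Node 0 is the current fixation; scores are computed at the nodes only.
        scored = [(score(image, x, y), (x, y)) for x, y in nodes]
        best_score, best = max(scored, key=lambda s: s[0])
        if best_score <= scored[0][0]:
            break  # no node beats the centre: stop here
        fixation = best
        path.append(fixation)
    return path


# Toy scene (hypothetical): a flat background with one patch of vertical stripes.
size = 64
image = [[0.0] * size for _ in range(size)]
for yy in range(40, 52):
    for xx in range(40, 52):
        image[yy][xx] = 1.0 if xx % 4 < 2 else 0.0

path = saccadic_search(image, (32.0, 32.0), gabor_magnitude)
```

Each fixation here costs 33 filter evaluations (one centre node plus 4 rings of 8 nodes), regardless of image size. In a system closer to the one described, `score` would be replaced by a trained classifier applied to the vector of filter responses collected on the grid.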

We introduce the Retinal Vision paradigm through a series of applications related to the Face Authentication problem, which combines the necessary level of generality and technical complexity with an evident scientific and practical relevance. As a first application of the Saccadic Search strategy we present a Real-time Head Detection and Tracking system. To the best of our knowledge, this represents the first example of the use of the Gabor decomposition in such an active vision task. The second application we consider, Facial Feature Detection, demonstrates the suitability of our approach for a complex pattern recognition problem on static images.

The descriptiveness of the image representation, which is based on the responses of modified Gabor filters computed at the nodes of a sparse Log-polar graph, has been assessed by applying it to Face Authentication. Consistently good performance is reported on the two largest authentication-oriented image databases available.

A separate Chapter is dedicated to the study of the dimensionality of the Face Authentication problem, a subject on which widely contrasting estimates can be found in the literature. Classical compression techniques such as Principal Component Analysis and Linear Discriminant Analysis are considered together with a newly developed algorithm, Support Vector Features. These techniques, in combination with both traditional and modern classifiers such as k-Nearest Neighbours and Support Vector Machines, are used to investigate the effect of the dimensionality of the feature space on authentication performance.
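The kind of experiment outlined above can be sketched as follows: features are projected onto the first d principal components and a nearest-neighbour classifier is scored for each d. This is only an illustration under stated assumptions. The data are synthetic and hypothetical, classification replaces authentication, 1-nearest-neighbour replaces the classifiers of the thesis, and neither Linear Discriminant Analysis nor Support Vector Features is reproduced. All function names are illustrative.

```python
import math
import random


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu):
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [a - m for a, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov


def principal_axes(cov, k, iters=200):
    """Top-k axes by power iteration, re-orthogonalised against earlier axes."""
    d = len(cov)
    axes = []
    for a in range(k):
        v = [1.0 + 0.1 * (i + a) for i in range(d)]  # arbitrary deterministic start
        for _ in range(iters):
            v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            for u in axes:  # Gram-Schmidt step keeps v orthogonal to earlier axes
                dot = sum(x * y for x, y in zip(v, u))
                v = [x - dot * y for x, y in zip(v, u)]
            norm = math.sqrt(sum(x * x for x in v))
            v = [x / norm for x in v]
        axes.append(v)
    return axes


def project(row, mu, axes):
    c = [a - m for a, m in zip(row, mu)]
    return [sum(x * y for x, y in zip(c, u)) for u in axes]


def nn_accuracy(train, train_y, test, test_y):
    """Fraction of test points whose nearest training point has the same label."""
    correct = 0
    for t, y in zip(test, test_y):
        nearest = min(range(len(train)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(t, train[i])))
        correct += train_y[nearest] == y
    return correct / len(test)


def make_data(n_per_class, rng):
    """Hypothetical data: axis 0 is a high-variance nuisance, axis 1 carries
    the class, axes 2 and 3 are small noise."""
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            rows.append([rng.uniform(-10, 10),
                         (2.0 if label else -2.0) + rng.uniform(-0.5, 0.5),
                         rng.uniform(-0.5, 0.5),
                         rng.uniform(-0.5, 0.5)])
            labels.append(label)
    return rows, labels


rng = random.Random(0)
train_x, train_y = make_data(60, rng)
test_x, test_y = make_data(20, rng)
mu = mean_vector(train_x)
axes = principal_axes(covariance(train_x, mu), k=4)

accuracy_by_dim = {}
for d in range(1, 5):
    tr = [project(r, mu, axes[:d]) for r in train_x]
    te = [project(r, mu, axes[:d]) for r in test_x]
    accuracy_by_dim[d] = nn_accuracy(tr, train_y, te, test_y)
```

The synthetic data are built so that the direction of largest variance carries no class information. One would therefore expect a single principal component to be of little use and the accuracy to improve once the second component is included, which is one simple way in which estimates of the "necessary" dimensionality can depend on the compression technique chosen.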

Full thesis (PostScript)
