Multi-View Dynamic Face Models

Yongmin Li , Shaogang Gong and Heather Liddell

  1. Introduction
  2. A Sparse 3D PDM of Faces
  3. A Shape-and-Pose-Free Texture Model
  4. Model Fitting
  5. Examples
  6. Relevant Publications


Introduction

Modelling faces under large pose variation and modelling them dynamically over time in video sequences are two challenging problems in face recognition and facial analysis. To address these problems, this work presents a novel, comprehensive multi-view dynamic face model. The model consists of a 3D shape model, a shape-and-pose-free texture model, and an affine geometrical model.

A Sparse 3D PDM of Faces

The 3D shape vector of a face is defined as the 3D positions of a sparse set of landmarks. Given a set of 2D face images with known pose and known 2D landmark positions, the 3D shape vector can be estimated using linear regression. A sparse set of 44 landmarks locating the mouth, nose, eyes, and face contour was semi-automatically labelled on each face image. Figure 1 shows sample face images used to construct the model, together with the landmarks labelled on each image.


Figure 1: Sample training face images (first row) and the landmarks labelled on the images (second row).
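
To make the regression step concrete, here is a minimal sketch in Python with NumPy. It rests on assumptions that are not stated in the text: an orthographic camera, a pose rotation composed as tilt about the x-axis applied after yaw about the y-axis, and 2D landmarks already centred on the face. The function names `rotation`, `project`, and `estimate_shape3d` are hypothetical and chosen for illustration only; the regression actually used in this work may differ.

```python
import numpy as np

def rotation(tilt_deg, yaw_deg):
    """Head-pose rotation: yaw about the y-axis, then tilt about the x-axis (assumed order)."""
    a, b = np.radians(tilt_deg), np.radians(yaw_deg)
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(a), -np.sin(a)],
                   [0.0, np.sin(a), np.cos(a)]])
    ry = np.array([[np.cos(b), 0.0, np.sin(b)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(b), 0.0, np.cos(b)]])
    return rx @ ry

def project(shape3d, tilt_deg, yaw_deg):
    """Orthographic projection of (N, 3) landmarks to (N, 2) image points."""
    return shape3d @ rotation(tilt_deg, yaw_deg)[:2].T

def estimate_shape3d(landmarks2d, poses):
    """Least-squares 3D landmarks from 2D landmarks observed under known poses.

    landmarks2d: list of (N, 2) arrays, one per image, centred on the face.
    poses: list of (tilt_deg, yaw_deg) pairs, one per image.
    """
    # Each image contributes the top two rows of its rotation matrix.
    a = np.vstack([rotation(t, y)[:2] for t, y in poses])   # (2M, 3)
    b = np.vstack([pts.T for pts in landmarks2d])           # (2M, N)
    x, *_ = np.linalg.lstsq(a, b, rcond=None)                # (3, N)
    return x.T

# Usage: recover a synthetic 44-landmark shape from views at several yaw angles.
rng = np.random.default_rng(0)
true_shape = rng.normal(size=(44, 3))
poses = [(0.0, float(yaw)) for yaw in range(-40, 41, 10)]
views = [project(true_shape, t, y) for t, y in poses]
estimated = estimate_shape3d(views, poses)
```

At least two distinct poses are needed for the system to have full rank; with noisy landmarks, more views average out the labelling error.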

The 3D shape estimated from these labelled face images is shown in Figure 2, with tilt fixed at $0^\circ$ and yaw varying from $-40^\circ$ to $+40^\circ$.

Figure 2: A 3D shape vector estimated from the face images shown in Figure 1.

The Point Distribution Model (PDM) is adopted in this work to further reduce the dimensionality of the shape space. We trained the PDM on a set of 600 3D shape vectors from 12 subjects (50 per subject). Each 3D shape vector was estimated from a random selection of 20 of the 45 face images of the same subject. After training, the first 10 eigenshapes account for 95.5% of the total variance.
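
A PDM is commonly built by applying principal component analysis to the training shape vectors. The sketch below shows one way to do this with NumPy, under the assumption that each 3D shape is flattened into a row vector (44 landmarks give 132 values). The names `train_pdm`, `encode`, and `decode` are hypothetical, and the data in the usage lines is synthetic, not the training set described above.

```python
import numpy as np

def train_pdm(shapes, variance_kept=0.95):
    """Fit a linear shape model to an (M, D) array of flattened shape vectors.

    Returns the mean shape, the retained eigenshapes (one per row), and the
    cumulative fraction of variance explained by each retained mode.
    """
    mean = shapes.mean(axis=0)
    centred = shapes - mean
    # Rows of vt are the eigenshapes; squared singular values are
    # proportional to the variance along each eigenshape.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of modes whose cumulative variance reaches the target.
    k = min(int(np.searchsorted(ratio, variance_kept)) + 1, len(s))
    return mean, vt[:k], ratio[:k]

def encode(shape, mean, modes):
    """Project a flattened shape onto the eigenshapes to get shape parameters."""
    return modes @ (shape - mean)

def decode(params, mean, modes):
    """Reconstruct a flattened shape from its shape parameters."""
    return mean + modes.T @ params

# Usage on synthetic data: 600 shape vectors of dimension 132.
rng = np.random.default_rng(0)
shapes = rng.normal(size=(600, 4)) @ rng.normal(size=(4, 132))
mean, modes, ratio = train_pdm(shapes, variance_kept=0.95)
params = encode(shapes[0], mean, modes)
approx = decode(params, mean, modes)
```

The shape parameter of a face is then the low-dimensional vector returned by `encode`, and `decode` gives the closest shape the model can represent.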

A Shape-and-Pose-Free Texture Model

We present a statistical approach to modelling face textures by extracting shape-and-pose-free texture information. To decouple the covariance between shape and texture, a face image fitted by the shape model is warped to the mean shape at frontal view (with $0^\circ$ in both tilt and yaw). This is implemented by forming a triangulation from the landmarks and applying a piecewise affine transformation between each pair of corresponding triangles. Warping to the mean shape yields the shape-free texture of the given face image; warping to the frontal view additionally makes the texture representation pose-free. Figure 3 illustrates the shape-and-pose-free texture patterns of the face images shown in Figure 1.

Figure 3: Shape-and-pose-free texture patterns extracted from the face images shown in Figure 1.
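
The piecewise affine transformation can be illustrated with barycentric coordinates: a point keeps the same weights relative to the three vertices of its triangle before and after the warp, which is exactly an affine map per triangle. The following pure-Python sketch maps a point from the mean-shape frame into the fitted image. The function names and the tiny two-triangle mesh are hypothetical; the text does not say which triangulation was used.

```python
def barycentric(p, tri):
    """Barycentric weights of 2D point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2

def warp_point(p, src_tri, dst_tri):
    """Affine map taking src_tri onto dst_tri, applied to point p."""
    weights = barycentric(p, src_tri)
    x = sum(w * v[0] for w, v in zip(weights, dst_tri))
    y = sum(w * v[1] for w, v in zip(weights, dst_tri))
    return x, y

def find_triangle(p, triangles, eps=1e-9):
    """Index of the first triangle containing p, or None if p is outside the mesh."""
    for i, tri in enumerate(triangles):
        if all(w >= -eps for w in barycentric(p, tri)):
            return i
    return None

def piecewise_affine(p, mean_tris, image_tris):
    """Map a point in the mean-shape frame to the corresponding image point."""
    i = find_triangle(p, mean_tris)
    if i is None:
        return None
    return warp_point(p, mean_tris[i], image_tris[i])

# Usage: a unit square split into two triangles, stretched to twice its width.
mean_tris = [[(0, 0), (1, 0), (0, 1)], [(1, 0), (1, 1), (0, 1)]]
image_tris = [[(0, 0), (2, 0), (0, 1)], [(2, 0), (2, 1), (0, 1)]]
source_xy = piecewise_affine((0.75, 0.75), mean_tris, image_tris)
```

To build a full texture, one would typically loop over every pixel in the mean-shape frame, look up its source position this way, and sample the fitted image there, so that the output has no holes.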

We applied PCA to a set of 540 shape-and-pose-free face textures from 12 subjects. The first 12 eigenmodes account for 96.4% of the total variance.

Model Fitting

Model fitting is performed by optimising a global appearance fitting criterion, a local fitting criterion on landmarks, and a temporal fitting criterion between successive frames in a video sequence. After fitting the multi-view face model to a given image or video sequence containing faces, one obtains the following set of parameters: $\mathbf{c} = (\mathbf{s}, \mathbf{t}, \alpha, \beta, dx, dy, r)^{T}$, where $\mathbf{s}$ is the shape parameter, $\mathbf{t}$ is the shape-and-pose-free texture parameter, $(\alpha, \beta)$ is the pose in tilt and yaw estimated by an SVM (Support Vector Machine) based pose estimator, $(dx, dy)$ is the translation of the centroid of the face, and $r$ is its scale.

The model therefore provides the identity information $(\mathbf{s}, \mathbf{t})$ of a face, which is crucial to face recognition and facial analysis, and the geometrical information (the remaining parameters), which can be used for face tracking and alignment.
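
The split of the parameter vector into an identity part and a geometrical part can be sketched as a small Python container. This is only an illustration: the class name `FaceParams` and its methods are hypothetical, and the sizes used in the usage lines (10 shape modes and 12 texture modes, giving 27 entries) are an assumption based on the variance figures reported above, since the text does not state how many modes were used during fitting.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FaceParams:
    """Parameters c = (s, t, alpha, beta, dx, dy, r) of a fitted face."""
    s: List[float]      # shape parameters
    t: List[float]      # shape-and-pose-free texture parameters
    alpha: float        # tilt, in degrees
    beta: float         # yaw, in degrees
    dx: float           # horizontal translation of the face centroid
    dy: float           # vertical translation of the face centroid
    r: float            # scale

    def to_vector(self) -> List[float]:
        """Flatten to the ordering (s, t, alpha, beta, dx, dy, r)."""
        return [*self.s, *self.t, self.alpha, self.beta, self.dx, self.dy, self.r]

    @classmethod
    def from_vector(cls, c, n_shape, n_texture):
        """Rebuild the parameters from a flat vector with known block sizes."""
        if len(c) != n_shape + n_texture + 5:
            raise ValueError("vector length does not match the block sizes")
        s = list(c[:n_shape])
        t = list(c[n_shape:n_shape + n_texture])
        alpha, beta, dx, dy, r = c[n_shape + n_texture:]
        return cls(s, t, alpha, beta, dx, dy, r)

    def identity(self) -> List[float]:
        """Identity information (s, t), the part relevant to recognition."""
        return [*self.s, *self.t]

    def geometry(self) -> List[float]:
        """Geometrical information, the part relevant to tracking and alignment."""
        return [self.alpha, self.beta, self.dx, self.dy, self.r]

# Usage with illustrative values only.
params = FaceParams(s=[0.0] * 10, t=[0.0] * 12,
                    alpha=0.0, beta=-30.0, dx=4.0, dy=-2.0, r=1.2)
c = params.to_vector()
restored = FaceParams.from_vector(c, n_shape=10, n_texture=12)
```

Keeping the two parts separable in this way means a recogniser can consume only `identity()` while a tracker updates only `geometry()` from frame to frame.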


Examples

  1. The process of fitting the model to a face image.

    Figure 4: Fitting the multi-view face model to a face image. The first two images show the original face image and the fitted pattern warped onto the original image. The others are the fitting results over 8 iterations.

  2. Fitting the model to a sequence containing faces with large pose variation (nearly profile to profile). The sequence is 81 frames long.

    Figure 5: Tracking faces undergoing large pose change. The first row shows original images from sample frames at 8-frame intervals, and the second row shows the reconstructed face patterns overlaid on the original images.

  3. Fitting the model to a sequence containing faces with significant expression change. The sequence is 47 frames long.

    Figure 6: Tracking faces with significant expression change. Images are sampled at 5-frame intervals.

Relevant Publications

  1. Y. Li, S. Gong, and H. Liddell.
    Modelling faces dynamically across views and over time.
    Technical report, Queen Mary, University of London, 2001.
  2. Y. Li, S. Gong, and H. Liddell.
    Support vector regression and classification based multi-view face detection and recognition.
    In IEEE International Conference on Automatic Face & Gesture Recognition, pages 300-305, Grenoble, France, 2000.

Yongmin Li 2001-02-05