doi:10.1016/j.imavis.2005.08.003
Copyright © 2005 Elsevier B.V. All rights reserved.
Efficient particle filtering using RANSAC with application to 3D face tracking
Le Lu^a, Xiangtian Dai^a and Gregory Hager^a
^a Computational Interaction and Robotics Lab, Computer Science Department, The Johns Hopkins University, Baltimore, MD 21218, USA
Received 31 December 2004; revised 10 July 2005; accepted 23 August 2005. Available online 4 November 2005.
Abstract
Particle filtering is a very popular technique for sequential state estimation. However, in high-dimensional cases where the state dynamics are complex or poorly modeled, thousands of particles are usually required for real applications. This paper presents a hybrid sampling solution that combines RANSAC and particle filtering. In this approach, RANSAC provides proposal particles that, with high probability, represent the observation likelihood. Both conditionally independent RANSAC sampling and boosting-like conditionally dependent RANSAC sampling are explored. We show that the use of RANSAC-guided sampling reduces the necessary number of particles to dozens for a full 3D tracking problem. This method is particularly advantageous when state dynamics are poorly modeled. We show empirically that the sampling efficiency (in terms of likelihood) is much higher with the use of RANSAC. The algorithm has been applied to the problem of 3D face pose tracking with changing expression. We demonstrate the validity of our approach with several video sequences acquired in an unstructured environment.
Keywords: Random projection; RANSAC; Particle filtering; Robust 3D face tracking
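The hybrid sampling idea in the abstract can be illustrated with a deliberately small sketch. It is not the paper's RANSAC-PF algorithm: it assumes a one-parameter state (in-plane rotation about the origin), draws every particle from a RANSAC-style minimal sample of one feature correspondence, omits the dynamics-driven particles, and takes weights proportional to a truncated-Gaussian likelihood. All function names and parameter values (`sigma`, `tau`, the outlier fraction) are hypothetical choices made for illustration.

```python
import numpy as np


def rotate(points, angle):
    """Rotate an (N, 2) array of points about the origin by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return points @ np.array([[c, -s], [s, c]]).T


def log_likelihood(delta, prev_pts, curr_pts, sigma=0.02, tau=0.06):
    """Truncated-Gaussian log-likelihood of an inter-frame rotation `delta`.

    Squared residuals are capped at tau**2 so outliers cannot dominate.
    """
    r2 = np.sum((rotate(prev_pts, delta) - curr_pts) ** 2, axis=1)
    return -np.sum(np.minimum(r2, tau ** 2)) / (2.0 * sigma ** 2)


def ransac_pf_step(particles, prev_pts, curr_pts, rng):
    """One toy RANSAC-guided particle filter step for in-plane rotation.

    particles: (M,) array of equally weighted rotation angles at time t-1.
    Returns (resampled particles at time t, weighted circular-mean estimate).
    """
    m = len(particles)
    # Minimal sample: one correspondence fixes a rotation about the origin.
    idx = rng.integers(0, len(prev_pts), size=m)
    ang_prev = np.arctan2(prev_pts[idx, 1], prev_pts[idx, 0])
    ang_curr = np.arctan2(curr_pts[idx, 1], curr_pts[idx, 0])
    deltas = ang_curr - ang_prev
    proposals = particles + deltas

    # Weight each proposal by how well its motion explains all correspondences.
    # Assumption: weights are taken proportional to the likelihood only.
    logw = np.array([log_likelihood(d, prev_pts, curr_pts) for d in deltas])
    w = np.exp(logw - logw.max())
    w /= w.sum()

    # Circular mean, so angles that differ by 2*pi are treated as equal.
    estimate = np.arctan2(np.sum(w * np.sin(proposals)),
                          np.sum(w * np.cos(proposals)))
    resampled = proposals[rng.choice(m, size=m, p=w)]
    return resampled, estimate


def make_frame_pair(true_delta, rng, n=40, outlier_frac=0.3, noise=0.005):
    """Synthesize matched feature points with noise and gross outliers."""
    prev_pts = rng.uniform(-1.0, 1.0, size=(n, 2))
    curr_pts = rotate(prev_pts, true_delta) + rng.normal(0.0, noise, (n, 2))
    n_out = int(outlier_frac * n)
    curr_pts[:n_out] = rng.uniform(-1.0, 1.0, size=(n_out, 2))
    return prev_pts, curr_pts
```

A typical use would start from `particles = np.zeros(50)` and call `ransac_pf_step` once per frame pair produced by `make_frame_pair`. Because each proposal is derived from the observed correspondences, a particle drawn from an inlier already lies near a high-likelihood motion, which is the intuition behind needing only dozens of particles. A full treatment would also mix in dynamics-driven particles and account for the proposal density in the weights.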
Fig. 1. An example of RANSAC sampling of feature points to track planar motions (red dots are inliers; green squares are outliers).
Fig. 2. Feature-based tracking for a planar patch's cyclic in-plane motion sequence. Blue represents the ground truth of the rotation angles; black and magenta represent the results of RANSAC with and without outliers, respectively. The results for RANSAC-PF are: red, highest weighted particle, and green, mean state value. In most cases, the latter are closely superimposed on the blue curve.
Fig. 3. We simulate the tracking accuracy with varying numbers of particles in 2, 4 and 6 state space dimensions. Because the data are synthesized, the ground truth is known and used to observe the likelihood via an independent Gaussian process assumption. To illustrate, only the tracking results for parameter 1 (no physical meaning) are shown; similar results are obtained for the other parameters. (a) 200 particles for two parameters. (b) 200 particles for four parameters. (c) 200 particles for six parameters. (d) 800 particles for six parameters. Colors encode the same information as in Fig. 2.
Fig. 4. The RANSAC-PF algorithm.
Fig. 5. A graphical representation of RANSAC-PF where the new state Xt is a function of new observation Zt, former observation Zt−1 and state Xt−1.
Fig. 6. Comparison of multi-modal density tracking on in-plane rotation. There are three modes in the underlying density function. The synthesized sequence contains 800 frames. Six hundred particles are used for tracking, and the density function is visualized as a histogram representation of 608 bins in each frame. (a) (Conditionally independent) RANSAC particle filtering. (b) (Conditionally dependent) boosted RANSAC particle filtering. (c) Particle filtering with one diffused dynamics. (d) Particle filtering with three switched dynamics.
Fig. 7. The likelihood distribution of particles in a video sequence. There are 100 dynamics-driven particles (blue stars) and 100 RANSAC-PF guided particles (red circles).
Fig. 8. Diagram of RANSAC-PF as applied to 3D face pose tracking. (b) The graphical representation of RANSAC-PF where the new state Xt is a function of new observation Zt, former observation Zt−1 and state Xt−1.
Fig. 9. The initialization process of face tracking. (a) The initial frame is manually aligned with a 3D face model using six fiducial corners. (b) The next frame is tracked through two-view motion estimation. The RANSAC-PF tracking begins from the third frame. We show the maximum a posteriori (MAP) result as a red reprojected face mesh overlaid on the images, and the mean of weighted particles (MWP) as a black mesh. (c) MAP and MWP differ in the first frames of the RANSAC-PF tracking. (d) MAP and MWP converge quickly.
Fig. 10. (a) A head pose particle is overlaid on the current frame. (b) Some facial image feature correspondences between the current and next frames are established. (c) The particle is projected into the next frame via randomly selected image features. (d) Image features are selected spatially and uniformly via RANSAC.
Fig. 11. The tracking result comparison of RANSAC-PF under different configurations. (a) 100 RP particles and 100 DP particles. (b) 100 RP particles and 10 DP particles. (c) 200 DP particles. (d) 10 RP particles and 100 DP particles.
Fig. 12. Robustness testing for two misaligned video sequences. (a) The initial frame with the visible alignment error of subject 1. (b) Frame 195. (c) Frame 628. (d) Frame 948. (e) The initial frame with the visible alignment error of subject 2. (f) Frame 185. (g) Frame 255. (h) Frame 285.
Fig. 13. Comparison of the tracked rotations and the ground truth. The first row is a video sequence containing the rigid face motions only; the second row is a video sequence containing some moderate facial deformations. (a) Yaw. (b) Pitch. (c) Roll. (d) Yaw. (e) Pitch. (f) Roll.
Fig. 14. The entropy curves based on merging, discrete distribution and Parzen windows [5] during tracking.