doi:10.1016/j.cviu.2005.07.001
Copyright © 2005 Elsevier Inc. All rights reserved.
Tracking based motion segmentation under relaxed statistical assumptions
Department of Computer Science, Centre of Vision Research, York University, 4700 Keele Street, Toronto, Ont., Canada M3J 1P3
Received 15 August 2003; accepted 6 July 2005. Available online 3 October 2005.
Abstract
We present a novel and efficient motion segmentation and tracking algorithm that follows the shift-and-align paradigm. We introduce two statistical tests to evaluate the similarity of aligned image pixels or patches, and we use them to determine the spatial extent of each segment. The first test is fast and accurate when the noise is moderate; the second employs a more sophisticated noise model, based on the Mahalanobis distance, to handle correlated noise. Direct computation of the Mahalanobis distance is prohibitively expensive, so we apply the Sherman–Morrison–Woodbury identity and amortization to reduce the cost by several orders of magnitude. We tested both versions of the algorithm on a variety of image sequences (indoor and outdoor, real and synthetic, constant and varying lighting, stationary and moving camera, one of them with known ground truth) with very good results.
Keywords: Motion segmentation; Tracking; Varying light; Optical flow; Hypothesis testing; Mahalanobis
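The cost reduction mentioned in the abstract can be illustrated with a small sketch. The covariance model used here is an assumption made only for illustration, since the abstract does not state the authors' noise model: isotropic noise plus a low-rank term, S = sigma2·I + U·Uᵀ, with U holding k column vectors. Under that assumption the Sherman–Morrison–Woodbury identity replaces the n×n solve with a k×k solve. The function names `mahalanobis_sq_woodbury`, `mahalanobis_sq_direct`, and `solve` are hypothetical helpers, not taken from the paper.

```python
def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_sq_woodbury(r, basis, sigma2):
    """Squared Mahalanobis distance r^T S^-1 r, assuming S = sigma2*I + sum_j u_j u_j^T.

    basis is a list of k vectors u_j, each of length n. Only a k x k system is solved.
    """
    k = len(basis)
    proj = [sum(ui * ri for ui, ri in zip(u, r)) for u in basis]  # U^T r
    # Small matrix sigma2*I_k + U^T U; it does not depend on r.
    small = [[sum(a * b for a, b in zip(basis[i], basis[j]))
              + (sigma2 if i == j else 0.0)
              for j in range(k)] for i in range(k)]
    y = solve(small, proj)
    rr = sum(ri * ri for ri in r)
    # Woodbury: S^-1 = (I - U (sigma2*I_k + U^T U)^-1 U^T) / sigma2
    return (rr - sum(p * yi for p, yi in zip(proj, y))) / sigma2


def mahalanobis_sq_direct(r, basis, sigma2):
    """Reference computation that builds the full n x n covariance and solves it."""
    n = len(r)
    cov = [[sum(u[i] * u[j] for u in basis) + (sigma2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    x = solve(cov, r)
    return sum(ri * xi for ri, xi in zip(r, x))


# Hypothetical residual vector and rank-2 correlated-noise basis.
residual = [1.0, 2.0, -1.0, 0.5]
basis = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
d_fast = mahalanobis_sq_woodbury(residual, basis, 0.5)
d_slow = mahalanobis_sq_direct(residual, basis, 0.5)
```

In this sketch the k×k matrix depends only on the basis and on sigma2, so it could be built once and reused for many residual vectors. That is one plausible reading of the amortization the abstract refers to, not a description of the authors' actual scheme.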
Fig. 1. Postprocessing. (A) Input image frames. (B) Before postprocessing. (C) After postprocessing.
Fig. 2. The zoo sequence using the pixel statistic. (A) Original frames 5, 15, and 27. (B) Segments corresponding to the hippopotamus for frames 5, 15, and 27. (C) Segment corresponding to the background for frames 5, 15, and 27.
Fig. 3. The toy truck sequence with the hippopotamus using the pixel statistic. The sequence is taken by a hand-held camera that attempts to follow the truck. (A) Frames 1, 15, and 28 of the turning truck sequence. (B) Segment corresponding to the turning truck with seed region highlighted for frames 1, 15, and 28. (C) Segment corresponding to the moving background for frames 1, 15, and 28.
Fig. 4. The Hamburg taxi sequence using the pixel statistic. (A) Input sequence at frames 0, 9, and 18. (B) Segment corresponding to the turning taxi for frames 0, 9, and 18.
Fig. 5. The Yosemite sequence using the pixel statistic. (A) Input sequence at frames 2, 8, and 15. (B) Segment corresponding to the distant mountain for frames 2, 8, and 15. (C) Segment corresponding to the left cliff for frames 2, 8, and 15. (D) Segment corresponding to the middle valley for frames 2, 8, and 15.
Fig. 9. The toy truck sequence with varying light. (A) Close-up of frame 9. (B) Close-up of frame 10. (C) Close-up of frame 12. (D) Output from tracker for frames 9 and 10. (E) Scan lines across seed region in frames 9 and 10.
Fig. 11. Flower Garden sequence with the pixel-wise motion component disabled. As expected, the performance degrades. The parameters are: σn = 1.28, σa = 1e−6, σu = 0.128, σv = 0.359, σl = 0.0078, σf = 0.45. Output from tracker for frames 1, 3, and 5 with σa close to zero.
Fig. 12. Moving truck in a dynamic background under varying lighting conditions with the varying illumination components disabled. Again, the performance degrades. The parameters are: σn = 2.585, σa = 0.0437, σu = 0.397, σv = 0.268, σl = 1e−6, σf = 1e−6. (A) Output from tracker for frames 1, 11, and 21 with σl and σf close to zero. (B) Output from tracker for frames 31 and 41 with σl and σf close to zero.
Fig. 13. Residual flow between aligned images on truck segment. (A) Residual horizontal flow as gray scale image for frames 1, 15, and 28. (B) Residual vertical flow as gray scale image for frames 1, 15, and 28. (C) Motion stabilized frames with grid for frames 1, 15, and 28. (D) SD between aligned images on the truck segment.
Fig. 14. Yosemite sequence needle map and its mean square flow error (u). (A) Needle maps of Lucas and Kanade, Lucas and Kanade with preprocessing and ground truth on the cliff segment. (B) Mean square flow error (u) on the left cliff segment compared with the ground truth.
Fig. 15. Yosemite mean square flow error (v) and mean angular error (radian) with and without segmentation and alignment preprocessing. (A) Mean square flow error (v) on the left cliff segment compared with the ground truth. (B) Mean angular error (radian) on the left cliff segment compared with the ground truth.