doi:10.1016/j.imavis.2003.12.006
Copyright © 2004 Published by Elsevier Science B.V.
Improving accuracy, robustness and computational efficiency in 3D computer vision
a ARM Ltd, New Spring House, 231 Glossop Rd, Sheffield S10 2GW, UK
b Department of Electronic and Electrical Engineering, University of Sheffield, Mappin Building, Mappin St., Sheffield S1 3JD, UK
c Division of Imaging Science and Biomedical Engineering, University of Manchester, Stopford Building, Oxford Rd., Manchester M13 9PT, UK
Received 20 June 2002; revised 6 December 2003; accepted 8 December 2003. Available online 13 February 2004.
Abstract
This paper analyses the strengths and weaknesses of some of the most popular traditional and contemporary 3D vision techniques with respect to accuracy, robustness and computational efficiency. A novel technique is proposed that extends traditional stereo vision (SV) algorithms with some of the previously identified techniques, improving their robustness, accuracy and computational efficiency. The new multi-scale temporally constrained SV technique is then applied to a conventional SV algorithm, and the resulting performance improvements are demonstrated.
Author Keywords: Stereo vision; Structure from motion; Temporal stereo
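The abstract does not spell out how the temporal constraint is imposed. One common approach, assumed here purely for illustration and not necessarily the scheme used by the authors, restricts each pixel's disparity search to a small window around the disparity found for the same scene point in the previous frame. The function `disparity_candidates` and its `delta` parameter are hypothetical names.

```python
from typing import List, Optional


def disparity_candidates(prev_disparity: Optional[int], delta: int,
                         d_min: int, d_max: int) -> List[int]:
    """Candidate disparities for one pixel under a simple temporal constraint.

    prev_disparity: disparity of the same scene point in the previous frame,
        or None when no previous match exists (the full range is searched).
    delta: hypothetical half-width of the temporal search window.
    d_min, d_max: inclusive limits of the unconstrained disparity range.
    """
    if prev_disparity is None:
        # No temporal prior: fall back to an exhaustive search.
        return list(range(d_min, d_max + 1))
    # Clip the window around the previous disparity to the valid range.
    lo = max(d_min, prev_disparity - delta)
    hi = min(d_max, prev_disparity + delta)
    return list(range(lo, hi + 1))
```

With a 64-level range (0 to 63) and `delta = 2`, a pixel whose previous disparity was 10 is tested at only 5 disparities instead of 64, which suggests where the computational saving of such a scheme would come from.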
Fig. 1. Pinhole camera model of a video camera with focal length $f$. The location of the projected point $p'$ on the virtual image plane is given by the perspective projection $x_i' = f X_P / Z_P$, $y_i' = f Y_P / Z_P$.
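The perspective projection in Fig. 1 can be written as a short function. This is a minimal sketch assuming the point is already expressed in the camera frame, with the optical axis along Z and image coordinates measured from the principal point; the name `project` is illustrative only.

```python
from typing import Tuple


def project(point: Tuple[float, float, float], f: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point (X_P, Y_P, Z_P) onto the
    virtual image plane at distance f: x' = f X_P / Z_P, y' = f Y_P / Z_P."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)
```

For example, `project((1.0, 2.0, 4.0), 8.0)` gives `(2.0, 4.0)`, and doubling the depth to 8.0 halves both image coordinates, which is the depth ambiguity that stereo and structure-from-motion methods set out to resolve.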
Fig. 2. Capturing a scene for structure-from-motion analysis using either camera or object motion. In this example, either method will produce the same perceived temporal image sequence (right).
Fig. 3. An SfM system translating in the direction V with rotation Ω. The focus of expansion (FOE) can be seen in the middle of the virtual image plane.
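For the translational part of the motion in Fig. 3, the FOE follows directly from the pinhole model. If the camera translates with velocity V = (Vx, Vy, Vz) and does not rotate, every image point moves radially away from (f Vx / Vz, f Vy / Vz). The sketch below ignores the rotation Ω, whose contribution would have to be removed before the flow becomes purely radial; the function names are hypothetical.

```python
from typing import Optional, Tuple


def focus_of_expansion(V: Tuple[float, float, float],
                       f: float) -> Optional[Tuple[float, float]]:
    """Image position of the FOE for pure camera translation V (camera frame)."""
    Vx, Vy, Vz = V
    if Vz == 0:
        return None  # motion parallel to the image plane: FOE at infinity
    return (f * Vx / Vz, f * Vy / Vz)


def translational_flow(x: float, y: float, Z: float,
                       V: Tuple[float, float, float],
                       f: float) -> Tuple[float, float]:
    """Image velocity (u, v) at image point (x, y) with depth Z for pure
    camera translation V. Differentiating x = f X / Z with dX/dt = -Vx and
    dZ/dt = -Vz gives u = (x Vz - f Vx) / Z, and likewise for v."""
    Vx, Vy, Vz = V
    u = (x * Vz - f * Vx) / Z
    v = (y * Vz - f * Vy) / Z
    return (u, v)
```

For a camera moving straight along its optical axis, V = (0, 0, 1), the FOE is the image centre, which matches the situation shown in Fig. 3. The flow vanishes at the FOE and, for Vz > 0, points away from it everywhere else.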
Fig. 4. A typical stereo camera configuration. The distance between the optical centres of the cameras is the baseline length, b. Two corresponding epipolar lines are also shown.
Fig. 5. The processing stages used in the FSC SV algorithm.
Fig. 6. Combined motion-stereo matching using initial stereo matching (1) followed by stereo matching (4) via motion correspondences (2,3).
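The loop in Fig. 6 can be read as chaining correspondences. A left pixel matched to a right pixel at time t is carried forward by the motion correspondence in each image, and the two tracked positions give a predicted stereo match at t + 1. This is a minimal sketch, assuming correspondences are stored as pixel-to-pixel dictionaries and that steps (2) and (3) are the left and right motion matches respectively. In practice the predictions would still need verifying, and the function name is hypothetical.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int]  # (column, row)


def propagate_stereo_matches(
    stereo_t: Dict[Pixel, Pixel],      # left(t) -> right(t), step (1)
    motion_left: Dict[Pixel, Pixel],   # left(t) -> left(t+1), assumed step (2)
    motion_right: Dict[Pixel, Pixel],  # right(t) -> right(t+1), assumed step (3)
) -> Dict[Pixel, Pixel]:
    """Predict left(t+1) -> right(t+1) stereo matches, step (4), by following
    the motion correspondences of both halves of each stereo match.
    Matches lacking either motion link are dropped."""
    predicted: Dict[Pixel, Pixel] = {}
    for p_left, p_right in stereo_t.items():
        if p_left in motion_left and p_right in motion_right:
            predicted[motion_left[p_left]] = motion_right[p_right]
    return predicted
```

For instance, a stereo match (10, 5) -> (4, 5) whose left point moves to (12, 5) and right point moves to (7, 5) yields the predicted match (12, 5) -> (7, 5).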
Fig. 7. Example of how the MS-TCS algorithm can use variable correlation block sizes (right) to match the scene shown in the left-hand image.
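Correlation-based matching of the kind referred to in Fig. 7 compares a block around a left-image pixel with blocks along the corresponding right-image scanline. The sketch below is a generic sum-of-absolute-differences (SAD) block matcher for a rectified pair, with the block size as a parameter. It is not the MS-TCS matcher. A variable-block-size scheme would choose `half` per image region, but how MS-TCS makes that choice is not described here. All names are hypothetical.

```python
from typing import Optional, Sequence

Image = Sequence[Sequence[float]]  # row-major grey-level image


def sad(left: Image, right: Image, row: int, col_l: int, col_r: int, half: int) -> float:
    """Sum of absolute differences between the (2*half+1)^2 blocks centred
    on (row, col_l) in the left image and (row, col_r) in the right image."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_l + dx] - right[row + dy][col_r + dx])
    return total


def match_pixel(left: Image, right: Image, row: int, col: int,
                half: int, max_disp: int) -> Optional[int]:
    """Best disparity in 0..max_disp for left pixel (row, col), searching
    right-image columns col - d on the same (epipolar) row."""
    height, width = len(left), len(left[0])
    if not (half <= row < height - half and half <= col < width - half):
        return None  # the block would extend outside the left image
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        col_r = col - d
        if col_r - half < 0:
            break  # larger disparities would leave the right image too
        cost = sad(left, right, row, col, col_r, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Larger blocks average over more pixels and so resist noise in weakly textured regions, but they blur depth discontinuities. This trade-off is the usual motivation for varying the block size across a scene.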
Fig. 8. The left image of the first frame of the synthetic moving cubes sequence. The arrows show how two of the cubes move during the sequence. Each image is 512×512 pixels.
Fig. 9. Sample images from the synthetic moving room sequence. The arrows show the direction taken by the vision system through the scene. Each image is 512×512 pixels.
Fig. 10. A typical frame from the translating train sequence. The arrow shows the direction in which the train moves during the sequence. Each image is 748×560 pixels.
Fig. 11. The re-projected 3D results from the start, middle, and end frames of the cubes sequence as processed by the CCS algorithm.
Fig. 12. The re-projected 3D results from the start, middle, and end frames of the cubes sequence as processed by the TCS algorithm.
Fig. 13. The re-projected 3D results from the start, middle, and end frames of the cubes sequence as processed by the MS-TCS algorithm.
Fig. 14. The number of matches and outliers per frame produced by the CCS, TCS, and MS-TCS algorithms for the cubes sequence.
Fig. 15. The number of multiplies and additions per frame required by the CCS, TCS, and MS-TCS algorithms for the cubes sequence.
Fig. 16. The re-projected 3D results from the start, middle, and end frames of the room sequence as processed by the CCS algorithm.
Fig. 17. The re-projected 3D results from the start, middle, and end frames of the room sequence as processed by the MS-TCS algorithm.
Fig. 18. The number of matches and outliers per frame produced by the CCS and MS-TCS algorithms for the room sequence. Because this sequence contains many frames, the results are grouped into bins of 50 frames, showing the mean value for each bin together with 10th and 90th percentile markers.
Fig. 19. The number of multiplies and additions per frame required by the CCS and MS-TCS algorithms for the room sequence. Because this sequence contains many frames, the results are grouped into bins of 50 frames, showing the mean value for each bin together with 10th and 90th percentile markers.
Fig. 20. The re-projected 3D results from the start, middle, and end frames of the train sequence as processed by the CCS algorithm.
Fig. 21. The re-projected 3D results from the start, middle, and end frames of the train sequence as processed by the MS-TCS algorithm.
Fig. 22. The number of matches per frame produced by the CCS and MS-TCS algorithms for the train sequence.
Fig. 23. The number of multiplies and additions per frame required by the CCS and MS-TCS algorithms for the train sequence.