doi:10.1016/j.imavis.2005.05.006
Copyright © 2005 Elsevier B.V. All rights reserved.
Object recognition and pose estimation using color cooccurrence histograms and geometric modeling
a Computational Vision and Active Perception, Royal Institute of Technology, Stockholm, Sweden
b Centre for Autonomous Systems, Royal Institute of Technology, Stockholm, Sweden
c Electrical Engineering and Information Technology, University of Dortmund, Dortmund, Germany
Received 5 March 2004;
revised 26 April 2005;
accepted 5 May 2005.
Available online 9 August 2005.
Abstract
Robust techniques for object recognition and pose estimation are essential for robotic manipulation and object grasping. This paper presents a novel approach to object recognition and pose estimation based on color cooccurrence histograms and geometric modeling. The particular problems addressed are: (i) robust recognition of objects in natural scenes, (ii) estimation of partial pose using an appearance-based approach, and (iii) complete 6DOF model-based pose estimation and tracking using geometric models.
Our recognition scheme is based on color cooccurrence histograms embedded in a classical learning framework that facilitates a ‘winner-takes-all’ strategy across different views and scales. The hypotheses generated in the recognition stage provide the basis for estimating the orientation of the object around the vertical axis. This prior, incomplete pose information is subsequently refined by a technique that uses a geometric model of the object to estimate and continuously track its complete 6DOF pose.
The major contributions of the proposed system are its ability to initiate an object tracking process automatically, its robustness and invariance to scaling and translation, and its computational efficiency, since both recognition and pose estimation rely on the same representation of the object. The performance of the system is evaluated on a set of everyday objects in a domestic environment with changing lighting and background conditions.
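To make the representation concrete, the following is a minimal sketch of a color cooccurrence histogram (CCH) and a winner-takes-all comparison against stored models. It is not the authors' implementation: the function names, the uniform RGB quantization, the Euclidean distance test against `d_max`, and the normalized histogram intersection used as the match value are all assumptions chosen for illustration.

```python
import numpy as np

def quantize_colors(image, n_per_channel=4):
    """Map each RGB pixel (0..255) to one color label.
    Uniform quantization is an assumption of this sketch."""
    q = (np.asarray(image).astype(np.int64) * n_per_channel) // 256
    n = n_per_channel
    return q[..., 0] * n * n + q[..., 1] * n + q[..., 2]

def cooccurrence_histogram(labels, n_labels, d_max=2):
    """Count pairs of color labels whose pixels lie within d_max of each other.
    Entry [a, b] counts pairs where the pixel labeled b lies to the right of,
    or below, the pixel labeled a (each pixel pair is visited once)."""
    h, w = labels.shape
    cch = np.zeros((n_labels, n_labels), dtype=np.int64)
    for dy in range(0, d_max + 1):
        for dx in range(-d_max, d_max + 1):
            if dy == 0 and dx <= 0:
                continue  # skip the zero offset and mirrored duplicates
            if dx * dx + dy * dy > d_max * d_max:
                continue  # outside the (assumed Euclidean) distance limit
            if dy >= h or abs(dx) >= w:
                continue  # offset does not fit inside the image
            a = labels[0:h - dy, max(0, -dx):w - max(0, dx)]
            b = labels[dy:h, max(0, dx):w - max(0, -dx)]
            np.add.at(cch, (a.ravel(), b.ravel()), 1)
    return cch

def match_value(h_image, h_model):
    """Normalized histogram intersection (an assumed similarity measure)."""
    return float(np.minimum(h_image, h_model).sum() / max(h_model.sum(), 1))

def winner_takes_all(h_window, models):
    """Return the name and score of the best-matching stored model."""
    scores = {name: match_value(h_window, h) for name, h in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In use, one histogram would be computed per training view and scale, and each candidate image window would be assigned to the model with the highest match value. Because the same histograms are compared in both stages, the winning training view can also serve as a coarse hint about the object's orientation.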
Keywords: Object recognition; Pose estimation; Color cooccurrence histograms; Model based tracking
Fig. 1. Some of the objects used for robotic manipulation tasks.
Fig. 2. Left: A typical testing scene. Right: After applying the Canny edge detector.
Fig. 3. The small image shows the training image used to estimate the nearest pose of the object for the current image. (Left) the initial pose overlaid on the current image, and (right) the final pose obtained by a local refinement method.
Fig. 4. Block diagram of our model based tracking system.
Fig. 5. The experimental platform Nomadic Technologies XR4000.
Fig. 6. The original image compared with the vote matrix for an orange rice packet.
Fig. 7. (Left) The match values μ(i, T) of training images before, and (Right) after convolution with a Gaussian kernel.
Fig. 8. (First row) An example of tracking a package of raisins: a fairly textured object against a textured background. The estimated pose of the object is overlaid in white. During this experiment a 6 mm lens was used and the object was at a distance of approximately 50 cm from the camera. (Second row) A moving camera and a static object show the ability of the system to cope with significant depth changes and perspective effects.
Fig. 9. (Left) An example of object recognition using the proposed CCH scheme. (Right) CCH object segmentation performance as a function of the maximum pixel distance considered, dmax.
Fig. 10. Left: The unmodified image. Center: The background. Right: The merged image.
Fig. 11. The CCH of a training image changes significantly with the angle of the object. The size of the CCH is 50×50 bins. Dark areas indicate high counts in the corresponding CCH bin. Left: Object rotated by 0 deg. Center: Object rotated by 45 deg. Right: Object rotated by 90 deg.
Fig. 12. By separating pixel pairs with different orientation, and storing them in separate bins, mirrored images will not have the same CCH. Pixel pairs on the left side have the same orientation, opposite to the orientation of the pixel pairs on the right side.
Fig. 13. Center: The appearance of the rice package rotated by −90 deg is very similar to its appearance when rotated by +90 deg. This results in a bimodal match value graph (left). An example of a match value graph in an unambiguous case is shown on the right.
Fig. 14. (Left) Distribution of angular error, and (Right) Mean angular error as a function of variations in scale.
Fig. 15. Angular error as a function of (Left) image noise and (Right) the segmentation threshold θ.
Fig. 16. From object recognition to pose estimation, Test 1–Test 4 (from left): (i) the output of the recognition, (ii) the initial pose estimate, (iii) after a few fitting iterations, (iv) the estimated pose of the object.
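The smoothing step referred to in the caption of Fig. 7 can be illustrated with a short sketch: match values over the training views are convolved with a Gaussian kernel, and the strongest smoothed response is taken as the orientation hypothesis. The function names, the wrap-around treatment of the view angle, the kernel parameters, and the use of the maximum as the estimate are assumptions for illustration, not details confirmed by the text.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Discrete Gaussian kernel of length 2*radius + 1, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_circular(match_values, sigma=1.0, radius=2):
    """Convolve match values with a Gaussian kernel.
    The views are assumed to cover a full turn, so the signal wraps around."""
    m = np.asarray(match_values, dtype=float)
    padded = np.pad(m, radius, mode="wrap")
    # 'valid' mode returns exactly len(m) samples, centered on the originals
    return np.convolve(padded, gaussian_kernel(sigma, radius), mode="valid")

def estimate_angle(match_values, angles_deg, sigma=1.0, radius=2):
    """Return the training-view angle with the highest smoothed match value."""
    smoothed = smooth_circular(match_values, sigma, radius)
    return float(angles_deg[int(np.argmax(smoothed))]), smoothed
```

Smoothing favors views whose neighbors also match well over isolated high responses. It does not by itself resolve a bimodal case such as the one in Fig. 13, where two views far apart produce similar match values.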
Table 1.
Localisation success (LOC), window number (WINNR), window size (WINSZ) and object integrity (INT) for the segmentation scheme using X–Y-histograms (XY) and color cooccurrence histograms (CO)

Table 2.
Values show the object's pose before and after the fitting stage. Test 1–4 represent experiments shown in Fig. 16.
