doi:10.1016/j.cag.2006.09.004
Copyright © 2006 Elsevier Ltd. All rights reserved.
Virtual Environments
Graphtracker: A topology projection invariant optical tracker
a Center for Mathematics and Computer Science (CWI), Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
b Department of Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
Available online 4 December 2006.
Abstract
In this paper, we describe a new optical tracking algorithm for pose estimation of interaction devices in virtual and augmented reality. Given a 3D model of the interaction device and a number of camera images, the primary difficulty in pose reconstruction is to find the correspondence between 2D image points and 3D model points. Most previous methods solved this problem by the use of stereo correspondence. Once the correspondence problem has been solved, the pose can be estimated by determining the transformation between the 3D point cloud and the model.
Our approach is based on the projection-invariant topology of graph structures. Because the topology of a graph structure does not change under projection, we can solve the point correspondence problem with a subgraph matching algorithm between the detected 2D image graph and the model graph.
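To make the idea of topology-based correspondence concrete, the following is a minimal sketch of subgraph matching by backtracking. It is not the matching algorithm of the paper: the adjacency-set representation, the function names `subgraph_matchings` and `fixed_points`, and the toy graphs are all assumptions made for illustration. An image vertex counts as uniquely identified here when every valid matching sends it to the same model vertex.

```python
def subgraph_matchings(image_adj, model_adj):
    """Yield every injective map image vertex -> model vertex that keeps edges.

    Both graphs are dicts mapping a vertex to the set of its neighbours
    (an assumed representation; a self-loop is stored as v in adj[v]).
    """
    image_vertices = sorted(image_adj)

    def extend(mapping):
        if len(mapping) == len(image_vertices):
            yield dict(mapping)
            return
        v = image_vertices[len(mapping)]
        for candidate in model_adj:
            if candidate in mapping.values():
                continue  # keep the map injective
            if len(model_adj[candidate]) < len(image_adj[v]):
                continue  # model vertex has too few neighbours
            mapping[v] = candidate
            # Every already-mapped neighbour of v must be adjacent in the model.
            if all(mapping[n] in model_adj[candidate]
                   for n in image_adj[v] if n in mapping):
                yield from extend(mapping)
            del mapping[v]

    yield from extend({})


def fixed_points(image_adj, model_adj):
    """Return the image vertices that map to the same model vertex in all matchings."""
    matchings = list(subgraph_matchings(image_adj, model_adj))
    if not matchings:
        return {}
    first = matchings[0]
    return {v: m for v, m in first.items()
            if all(other[v] == m for other in matchings)}


# Toy model: a triangle A-B-C with a tail C-D-E (hypothetical device graph).
model = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
         "D": {"C", "E"}, "E": {"D"}}
# Detected image graph with arbitrary integer labels.
image = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}

print(fixed_points(image, model))
```

In this toy case the two triangle vertices A and B can be swapped without changing the topology, so only the three remaining vertices are identified unambiguously. Exhaustive backtracking is exponential in the worst case and is used here only because the example graphs are tiny.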
In addition to the graph tracking algorithm, we describe a number of related topics. These include a discussion on the counting of topologically different graphs, a theoretical error analysis, and a method for automatically estimating a device model. Finally, we show and discuss experimental results for the position and orientation accuracy of the tracker.
Keywords: Optical tracking; Spatial interaction; Pose estimation; Projection invariant; AR/VR
Fig. 1. A cubical input device augmented by a graph pattern of retro-reflective markers. All six faces of the cube are shown. Note that graph edges are allowed to cross over between faces of the cube and do not need to be straight lines.
Fig. 2. The Personal Space Station, a near-field VR/AR environment.
Fig. 3. The sequence of stages in the pipeline to go from a camera image to a device pose.
Fig. 4. A schematic visualization of the various stages in converting a camera image to a graph topology that can be matched (also see Fig. 3). From top-left to bottom-right the images show a visualization of the state after: image acquisition, thresholding, region detection, skeletization, end-point removal, graph detection, short edge removal, degree-two removal, and graph matching.
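The degree-two removal stage named in the caption above can be illustrated with a short sketch. This is not the authors' implementation: it assumes the detected graph is stored as a plain list of undirected edges (a multigraph), and the function name `remove_degree_two` is hypothetical. A vertex with exactly two distinct incident edges is treated as a pass-through point, and its two edges are fused into one.

```python
def remove_degree_two(edges):
    """Fuse the two edges at every pass-through vertex of an undirected multigraph.

    `edges` is a list of (u, v) pairs; parallel edges and self-loops are allowed.
    A vertex carrying only a self-loop is kept, since it has no second edge to fuse.
    """
    edges = list(edges)
    changed = True
    while changed:
        changed = False
        # Map each vertex to the indices of its incident edges
        # (a self-loop contributes its index twice).
        incident = {}
        for idx, (u, v) in enumerate(edges):
            incident.setdefault(u, []).append(idx)
            incident.setdefault(v, []).append(idx)
        for vertex, idxs in incident.items():
            if len(idxs) == 2 and idxs[0] != idxs[1]:
                a, b = idxs
                # Find the far end of each of the two incident edges.
                far_a = edges[a][0] if edges[a][1] == vertex else edges[a][1]
                far_b = edges[b][0] if edges[b][1] == vertex else edges[b][1]
                edges = [e for i, e in enumerate(edges) if i not in (a, b)]
                edges.append((far_a, far_b))
                changed = True
                break  # indices are stale now, so rebuild the incidence map
    return edges


# Vertices "x" and "c" each have degree two and are removed.
example = [("a", "x"), ("x", "b"), ("a", "b"), ("a", "c"), ("b", "c")]
print(remove_degree_two(example))
```

After both removals the example collapses to three parallel edges between `a` and `b`, which shows why a multigraph representation is needed once degree-two vertices are eliminated.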
Fig. 5. (left) A detected region in light grey with its skeleton shown in black. The top-right junction of five edges is split over three vertices. Our goal is to combine these vertices into a single vertex while maintaining edge ordering. (right) Merging vertices v1, v2, v3 results in the vertex with incident edge ordering as given in the dashed inset. The numbers indicate the order of each incident edge ei. The order of merging does not matter, for example merge(merge(v1, v2), v3) equals merge(merge(v2, v3), v1).
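The order-preserving merge described in the caption above can be sketched as an edge contraction on cyclic edge lists. This is an illustration under stated assumptions rather than the paper's procedure: each vertex is assumed to store its incident edges in a consistent rotational order, the two vertices being merged are assumed to share exactly one connecting edge, and the names `merge_vertices`, `canonical`, `s12` and `s23` are hypothetical.

```python
def merge_vertices(rotations, u, v, edge):
    """Contract `edge` between u and v, folding v into u.

    `rotations` maps each vertex to the cyclic list of its incident edge ids.
    The merged list walks u's edges after `edge`, then v's edges after `edge`,
    which preserves the rotational order around the combined vertex.
    """
    def after(rotation, e):
        i = rotation.index(e)
        return rotation[i + 1:] + rotation[:i]

    merged = after(rotations[u], edge) + after(rotations[v], edge)
    result = {k: r for k, r in rotations.items() if k not in (u, v)}
    result[u] = merged
    return result


def canonical(rotation):
    """Rotate a cyclic list so it starts at its smallest element."""
    i = rotation.index(min(rotation))
    return rotation[i:] + rotation[:i]


# A five-edge junction split over three vertices joined by short edges s12, s23.
rotations = {
    "v1": ["e1", "e2", "s12"],
    "v2": ["s12", "e3", "s23"],
    "v3": ["s23", "e4", "e5"],
}

# merge(merge(v1, v2), v3)
order_a = merge_vertices(merge_vertices(rotations, "v1", "v2", "s12"),
                         "v1", "v3", "s23")
# merge(merge(v2, v3), v1)
order_b = merge_vertices(merge_vertices(rotations, "v2", "v3", "s23"),
                         "v2", "v1", "s12")

print(canonical(order_a["v1"]))
print(canonical(order_b["v2"]))
```

Both merge orders give the same cyclic sequence of the five outer edges once the lists are rotated to a common starting point, in line with the order independence noted in the caption.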
Fig. 6. Schematic view of the perspective n-point problem with two cameras. The goal is to reconstruct the pi given the camera positions Ci, image points ui and corresponding model points mi.
Fig. 7. Expected error perpendicular to the camera plane.
Fig. 8. A number of processing steps performed to match a graph in a camera image. (top left) A raw image as captured by our infra-red cameras. (top right) The resulting image after thresholding and region detection. (bottom left) The result of running a skeletization algorithm. (bottom right) The detected graph after reading and matching the graph from the skeletized image. Five unique points are identified.
Fig. 9. When parts of the graph are occluded, some fixed points can still be detected. An interesting example is the bottom-right image; the detected subgraph matches the model in two ways. The point connecting the two self-loops can be uniquely identified by noting a fixed point. However, the points representing the loops themselves cannot be uniquely identified as the two points can be interchanged freely.
Fig. 10. Positional tracking accuracy with respect to the XZ-plane for single cameras and all four cameras combined. The vertical axis shows the distance to the XZ-plane in millimeters. The horizontal axis represents a sequence of about 200 frames. When a camera could not detect a pose the values are omitted.
Fig. 11. Positional tracking accuracy with respect to the XZ-plane for two pairs of cameras and all cameras combined.
Fig. 12. Angular tracking accuracy with respect to the XZ-plane for single cameras and all four cameras combined. The vertical axis shows the angle with the XZ-plane in degrees. The horizontal axis represents a sequence of frames.
Fig. 13. Angular tracking accuracy with respect to the XZ-plane for two pairs of cameras and all cameras combined.
Table 1.
Number of topologically different planar graphs using a fixed number of vertices of degree at least n

Table 2.
Measurement-to-plane summarized results for 1/2/4 cameras

The average distance to the XZ-plane and the RMSE are given in the first two columns. The average angle with the plane and corresponding RMSE are given in the last two columns.
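For reference, the two summary statistics named in this note can be computed as in the sketch below. It assumes that the RMSE is taken relative to the plane itself (a reference value of zero) and that distances are signed. The function name `mean_and_rmse` and the sample values are invented for illustration and are not measurements from the paper.

```python
import math


def mean_and_rmse(values, reference=0.0):
    """Return the mean error and root-mean-square error relative to `reference`."""
    errors = [v - reference for v in values]
    mean = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean, rmse


# Made-up signed distances to the plane, in millimeters.
samples = [1.0, -1.0, 1.0, -1.0]
mean, rmse = mean_and_rmse(samples)
print(mean, rmse)
```

The example shows why both columns are reported: errors of opposite sign cancel in the average, while the RMSE still reflects their magnitude.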
Table 3.
Average number of detected unique points for single cameras, stereo and projection invariant tracking during a random interaction session
