
Infinitesimal Plane-Based Pose Estimation

Published in the International Journal of Computer Vision.

Abstract

Estimating the pose of a plane given a set of point correspondences is a core problem in computer vision with many applications, including Augmented Reality (AR), camera calibration, and 3D scene reconstruction and interpretation. Despite much progress over recent years, there is still a need for a more efficient and more accurate solution, particularly in mobile applications where the run-time budget is critical. We present a new analytic solution to the problem which is far faster than current methods based on solving Pose from \(n\) Points (PnP) and is in most cases more accurate. Our approach involves a new way to exploit redundancy in the homography coefficients. It uses the fact that a noisy homography estimates the true transform between the model plane and the image better at some regions of the plane than at others. Our method locates a point where the transform is best estimated, and uses only the local transformation at that point to constrain pose. This involves solving pose with a local non-redundant 1st-order PDE. We call this framework Infinitesimal Plane-based Pose Estimation (IPPE), because one can think of it as solving pose using the transform about an infinitesimally small region on the surface. We show experimentally that IPPE leads to very accurate pose estimates. Because IPPE is analytic it is both extremely fast and allows us to fully characterise the method in terms of degeneracies, number of returned solutions, and the geometric relationship of these solutions. This characterisation is not possible with state-of-the-art PnP methods.



References

  • Ansar, A., & Daniilidis, K. (2003). Linear pose estimation from points or lines. Pattern Analysis and Machine Intelligence (PAMI), 25, 282–296.

  • Barreto, J., Roquette, J., Sturm, P., & Fonseca, F. (2009). Automatic camera calibration applied to medical endoscopy. In British machine vision conference (BMVC).

  • Bouguet, J. Y. A camera calibration toolbox for Matlab. http://www.vision.caltech.edu/bouguetj/calib_doc/. Accessed May 2013.

  • Brown, M., Majumder, A., & Yang, R. (2005). Camera-based calibration techniques for seamless multiprojector displays. Visualization and Computer Graphics, 11, 193–206.

  • Chen, P., & Suter, D. (2009). Error analysis in homography estimation by first order approximation tools: A general technique. Journal of Mathematical Imaging and Vision, 33, 281–295.

  • Collins, T., Durou, J. D., Gurdjos, P., & Bartoli, A. (2010). Single-view perspective shape-from-texture with focal length estimation: A piecewise affine approach. In 3D data processing visualization and transmission (3DPVT10).

  • Dhome, M., Richetin, M., & Lapreste, J. T. (1989). Determination of the attitude of 3D objects from a single perspective view. Pattern Analysis and Machine Intelligence (PAMI), 11, 1265–1278.

  • Faugeras, O., Luong, Q. T., & Papadopoulou, T. (2001). The geometry of multiple images: The laws that govern the formation of images of a scene and some of their applications. Cambridge, MA: MIT Press.

  • Fiore, P. D. (2001). Efficient linear solution of exterior orientation. Pattern Analysis and Machine Intelligence (PAMI), 23, 140–148.

  • Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24, 381–395.

  • Gao, X. S., Hou, X. R., Tang, J., & Cheng, H. F. (2003). Complete solution classification for the perspective-three-point problem. Pattern Analysis and Machine Intelligence (PAMI), 25(8), 930–943.

  • Geiger, A., Moosmann, F., Car, M., & Schuster, B. (2012). A toolbox for automatic calibration of range and camera sensors using a single shot. In International conference on robotics and automation (ICRA).

  • Haralick, R. M., Lee, C. N., Ottenberg, K., & Nölle, M. (1994). Review and analysis of solutions of the three point perspective pose estimation problem. International Journal of Computer Vision (IJCV), 13, 331–356.

  • Haralick, R. M., Lee, D., Ottenberg, K., & Nölle, M. (1991). Analysis and solutions of the three point perspective pose estimation problem. In Computer vision and pattern recognition (CVPR).

  • Harker, M., & O’Leary, P. (2005). Computation of homographies. In British machine vision conference (BMVC).

  • Hartley, R., & Zisserman, A. (2004). Multiple view geometry in computer vision. Cambridge: Cambridge University Press.

  • Hesch, J. A., & Roumeliotis, S. I. (2011). A direct least-squares (DLS) method for PnP. In International conference on computer vision (ICCV).

  • Hilsmann, A., Schneider, D., & Eisert, P. (2011). Template-free shape from texture with perspective cameras. In British machine vision conference (BMVC).

  • Horaud, R., Dornaika, F., Lamiroy, B., & Christy, S. (1997). Object pose: The link between weak perspective, paraperspective and full perspective. International Journal of Computer Vision (IJCV), 22, 173–189.

  • Hung, Y., Harwood, D., & Yeh, P.-S. (1984). Passive ranging to known planar point sets. Technical Report. College Park, MD: University of Maryland.

  • Kato, H., & Billinghurst, M. (1999). Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In International workshop on augmented reality (IWAR).

  • Lepetit, V., Moreno-Noguer, F., & Fua, P. (2009). EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision (IJCV), 81, 155–166.

  • Li, S., Xu, C., & Xie, M. (2012). A robust O(n) solution to the perspective-n-point problem. Pattern Analysis and Machine Intelligence (PAMI).

  • Lobay, A., & Forsyth, D. A. (2004). Recovering shape and irradiance maps from rich dense texton fields. In Computer vision and pattern recognition (CVPR).

  • Lobay, A., & Forsyth, D. A. (2006). Shape from texture without boundaries. International Journal of Computer Vision (IJCV), 67, 71–91.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), 60, 91–110.

  • Lu, C. P., Hager, G. D., & Mjolsness, E. (2000). Fast and globally convergent pose estimation from video images. Pattern Analysis and Machine Intelligence (PAMI), 22, 610–622.

  • Munoz-Salinas, R. ArUco: Augmented reality library from the University of Cordoba. http://www.uco.es/investiga/grupos/ava/node/26. Accessed May 2013.

  • Oberkampf, D., DeMenthon, D., & Davis, L. S. (1996). Iterative pose estimation using coplanar feature points. Computer Vision and Image Understanding (CVIU), 63, 495–511.

  • Ohta, Y., Maenobu, K., & Sakai, T. (1981). Obtaining surface orientation from texels under perspective projection. In International joint conferences on artificial intelligence (IJCAI).

  • Poelman, C., & Kanade, T. (1993). A paraperspective factorization method for shape and motion recovery. Technical Report.

  • Quan, L., & Lan, Z. (1999). Linear n-point camera pose determination. Pattern Analysis and Machine Intelligence (PAMI).

  • Schweighofer, G., & Pinz, A. (2006). Robust pose estimation from a planar target. Pattern Analysis and Machine Intelligence (PAMI), 28, 2024–2030.

  • Sturm, P. (2000). Algorithms for plane-based pose estimation. In Computer vision and pattern recognition (CVPR).

  • Taubin, G. (1991). Estimation of planar curves, surfaces, and nonplanar space curves defined by implicit equations with applications to edge and range image segmentation. Pattern Analysis and Machine Intelligence (PAMI), 13, 1115–1138.

  • Triggs, B. (1999). Camera pose and calibration from 4 or 5 known 3D points. In International conference on computer vision (ICCV).

  • Vedaldi, A., & Fulkerson, B. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/. Accessed May 2013.

  • Zhang, C. X., & Hu, Z. Y. (2005). A general sufficient condition of four positive solutions of the P3P problem. Journal of Computer Science and Technology, 20, 836–842.

  • Zhang, Z. (2000). A flexible new technique for camera calibration. Pattern Analysis and Machine Intelligence (PAMI), 22, 1330–1334.


Acknowledgments

This research has received funding from the EU FP7 ERC research Grant 307483 FLEXABLE. Code is available at http://www.tobycollins.net/research/IPPE.

Author information


Corresponding author

Correspondence to Toby Collins.

Additional information

Communicated by C. Schnörr.

Appendices

Appendix 1: IPPE Using the Para-Perspective and Weak-Perspective Cameras

Para-perspective projection approximates perspective projection by linearising \(\pi \) about some 3D point \(\mathbf {x}_{c}=[x_{c},y_{c},z_{c}]^{\top }\) in camera coordinates. We denote this by \(\pi _{pp}(\mathbf {x}):\mathbb {R}^{3}\rightarrow \mathbb {R}^{2}\). To reduce approximation error \(\mathbf {x}_{c}\) is chosen to be the centroid of the model’s points (Ohta et al. 1981; Poelman and Kanade 1993). \(\mathbf {x}_{c}\) can be parameterised by a 2D point \({\tilde{\mathbf {q}}}_{c}\) in normalised coordinates, scaled by a depth \(z_{c}\): \(\mathbf {x}_{c}=z_{c}[{\tilde{\mathbf {q}}}_{c}^{\top },1]^{\top }\). \(\pi _{pp}\) is then given by:

$$\begin{aligned} \pi _{pp}(\mathbf {x})=\tilde{\mathbf {q}}_{c}+z_{c}^{-1}\left[ \begin{array}{cc} \mathbf {I}_{2}&|-\tilde{\mathbf {q}}_{c}\end{array}\right] \mathbf {x}. \end{aligned}$$
(39)

Because para-perspective projection is an affine transform, \(\mathbf {H}\) is also an affine transform, computed as the best-fitting affine transform that maps \(\{\mathbf {u}_{i}\}\) to \(\{\tilde{\mathbf {q}}_{i}\}\). The Jacobian of the model-to-image transform \(w\) is therefore constant; we denote it by \(\mathbf {J}_{a}\in \mathbb {R}^{2\times 2}\). We can then estimate \(z_{c}\) (i.e. the depth of the centroid of the correspondences in camera coordinates) and the plane’s rotation using IPPE by replacing \(\pi \) with \(\pi _{pp}\). This leads to an instance of Problem (16) with substitutions \(\mathbf {J}\leftarrow \mathbf {J}_{a}\), \(\mathbf {v}\leftarrow \tilde{\mathbf {q}}_{c}\) and \(\gamma \leftarrow z_{c}^{-1}\).
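As an illustration, the constant Jacobian \(\mathbf {J}_{a}\) is simply the linear part of the least-squares affine fit between the model points and the normalised image points. Below is a minimal Python/NumPy sketch (the helper name fit_affine_jacobian and the \(n\times 2\) array layout are our own assumptions, not the paper’s notation):

```python
import numpy as np

def fit_affine_jacobian(U, Q):
    """Best-fitting affine map q = A u + t from model points U (n x 2)
    to normalised image points Q (n x 2), in the least-squares sense.
    Returns the constant Jacobian J_a = A and the translation t."""
    n = U.shape[0]
    X = np.hstack([U, np.ones((n, 1))])        # rows [u_x, u_y, 1]
    P, *_ = np.linalg.lstsq(X, Q, rcond=None)  # P is 3 x 2, stacking [A^T; t^T]
    A = P[:2].T                                # 2 x 2 affine part = J_a
    t = P[2]                                   # 2-vector translation
    return A, t

# Synthetic check: points generated by an exact affine map are recovered.
U = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
A_true = np.array([[0.9, 0.1], [-0.2, 1.1]])
t_true = np.array([0.3, -0.4])
Q = U @ A_true.T + t_true
A, t = fit_affine_jacobian(U, Q)
print(np.allclose(A, A_true), np.allclose(t, t_true))  # True True
```

With noisy correspondences the same call returns the maximum-likelihood affine fit rather than an exact recovery.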

The weak-perspective camera can be treated similarly to the para-perspective camera. The difference is that in weak-perspective projection the linearisation is performed at a 3D point on the camera’s optical axis. The weak-perspective projection function \(\pi _{wp}(\mathbf {x}):\mathbb {R}^{3}\rightarrow \mathbb {R}^{2}\) is given by:

$$\begin{aligned} \pi _{wp}(\mathbf {x})=z_{0}^{-1}\left[ \begin{array}{cc} \mathbf {I}_{2}&|\mathbf {0}\end{array}\right] \mathbf {x}, \end{aligned}$$
(40)

where \(z_{0}\) approximates the depth of the plane along the camera’s optical axis. We can estimate \(z_{0}\) and estimate the plane’s rotation using IPPE by replacing \(\pi \) with \(\pi _{wp}\). This leads to an instance of Problem (16) with substitutions \(\mathbf {J}\leftarrow \mathbf {J}_{a}\), \(\mathbf {v}\leftarrow \mathbf {0}\) and \(\gamma \leftarrow z_{0}^{-1}\).
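The linearised projection models above can be checked numerically. The following Python/NumPy sketch (function names are our own) verifies that the para-perspective projection of Eq. (39) is tangent to the true perspective projection at the linearisation point, with error decaying quadratically, and implements weak perspective assuming the linearisation point lies on the optical axis (so its projection is the origin):

```python
import numpy as np

def pi(x):
    """Exact perspective projection onto normalised image coordinates."""
    return x[:2] / x[2]

def pi_pp(x, q_c, z_c):
    """Para-perspective projection: pi linearised about x_c = z_c * [q_c, 1]."""
    L = np.hstack([np.eye(2), -q_c[:, None]])  # [I2 | -q_c]
    return q_c + (L @ x) / z_c

def pi_wp(x, z0):
    """Weak-perspective projection, linearisation point on the optical axis."""
    return x[:2] / z0

x_c = np.array([0.5, 1.0, 5.0])   # linearisation point in camera coordinates
z_c = x_c[2]
q_c = x_c[:2] / z_c

# pi_pp is exact at the linearisation point ...
print(np.allclose(pi_pp(x_c, q_c, z_c), pi(x_c)))   # True
# ... and pi_wp is exact there too when z0 equals the point's depth.
print(np.allclose(pi_wp(x_c, z_c), pi(x_c)))        # True

# The para-perspective error decays quadratically as x approaches x_c.
d = np.array([0.3, -0.2, 0.4])
e1 = np.linalg.norm(pi(x_c + d) - pi_pp(x_c + d, q_c, z_c))
e2 = np.linalg.norm(pi(x_c + d / 10) - pi_pp(x_c + d / 10, q_c, z_c))
print(e2 < e1 / 50)                                 # True
```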

Appendix 2: Proof of Eq. (17)

We prove Eq. (17) using a general form with point correspondences in \(d\)-dimensional space. \(\mathbf {U}\in \mathbb {R}^{d\times n}\) denotes the set of points in the domain space, where \(n\) is the number of points. \(\mathbf {Q}\in \mathbb {R}^{d\times n}\) denotes the corresponding set of points in the target space (of the same dimensionality \(d\)). We use \(\bar{\mathbf {U}}\) to denote \({\mathbf {U}}\) but zero-meaned (so that each row of \(\bar{\mathbf {U}}\) sums to zero). Let \(\hat{\mathbf {M}}=\left[ \begin{array}{cc} \hat{\mathbf {A}} &{} \hat{\mathbf {t}}\\ \mathbf {0}^{\top } &{} 1 \end{array}\right] \) denote the maximum likelihood homogeneous affine transform that maps \(\bar{\mathbf {U}}\) to \(\mathbf {Q}\), with \(\hat{\mathbf {A}}\in \mathbb {R}^{d\times d},\hat{\mathbf {t}}\in \mathbb {R}^{d}\). \(\hat{\mathbf {M}}\) is given by:

$$\begin{aligned} \begin{aligned} \hat{\mathbf {t}}&=\frac{1}{n}\mathbf {Q}\mathbf {1}\\ \mathrm {vec}(\hat{\mathbf {A}}^{\top })&=(\mathbf {B}^{\top }\mathbf {B})^{-1}\mathbf {B}^{\top }\mathbf {q},\quad \mathbf {B}\overset{\mathrm {def}}{=}\mathbf {I}_{d}\otimes \bar{\mathbf {U}}^{\top }, \end{aligned} \end{aligned}$$
(41)

where \(\mathbf {1}\) is the all-ones \(n\times 1\) vector and \(\mathbf {q}=\mathrm {vec}(\mathbf {Q}^{\top })\in \mathbb {R}^{dn}\) stacks the rows of \(\mathbf {Q}\) into a column vector. The transformation of a point \(\mathbf {u}\in \mathbb {R}^{d}\) in the domain according to \(\hat{\mathbf {M}}\) is given by \(f(\mathbf {u})=\hat{\mathbf {A}}\mathbf {u}+\hat{\mathbf {t}}=\mathbf {V}\,\mathrm {vec}(\hat{\mathbf {A}}^{\top })+\hat{\mathbf {t}}\), where \(\mathbf {V}\overset{\mathrm {def}}{=}\mathbf {I}_{d}\otimes \mathbf {u}^{\top }\). Suppose \(\mathbf {Q}\) is corrupted by IID zero-mean Gaussian noise with variance \(\sigma \). The uncertainty covariance matrix of \(\mathbf {q}\) is \({\varvec{\Sigma }}_{\mathbf {q}}=\sigma \mathbf {I}_{dn}\) and, using propagation of uncertainty, the uncertainty in the position of \(\mathbf {u}\) transformed according to \(\hat{\mathbf {M}}\) is given by the \(d\times d\) covariance matrix \({\varvec{\Sigma }}_{f(\mathbf {u})}\):

$$\begin{aligned} \begin{aligned} {\varvec{\Sigma }}_{f(\mathbf {u})}&={\varvec{\Sigma }}_{\hat{\mathbf {t}}}+{\varvec{\Sigma }}_{\hat{\mathbf {A}}}&(a)\\ {\varvec{\Sigma }}_{\hat{\mathbf {t}}}&=\frac{\sigma }{n}\mathbf {I}_{d}&(b)\\ {\varvec{\Sigma }}_{\hat{\mathbf {A}}}&=\sigma \mathbf {V}(\mathbf {B}^{\top }\mathbf {B})^{-1}\mathbf {V}^{\top }&(c)\\ \Leftrightarrow \left[ {\varvec{\Sigma }}_{\hat{\mathbf {A}}}\right] _{ij}&=\left\{ \begin{array}{lr} \sigma \,\mathbf {u}^{\top }(\bar{\mathbf {U}}\bar{\mathbf {U}}^{\top })^{-1}\mathbf {u} &{} i=j\\ 0 &{} i\ne j \end{array}\right.&(d) \end{aligned} \end{aligned}$$
(42)

The step from Eq. (42c) to Eq. (42d) follows from the block-diagonal structure of \((\mathbf {B}^{\top }\mathbf {B})^{-1}=\mathbf {I}_{d}\otimes (\bar{\mathbf {U}}\bar{\mathbf {U}}^{\top })^{-1}\).
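Equations (41) and (42d) can be checked numerically. The Python/NumPy sketch below (variable names are our own; row-wise stacking of \(\mathbf {Q}\) into \(\mathbf {q}\) is assumed) fits the affine transform via the normal equations and verifies that the covariance factor reduces to \(\mathbf {u}^{\top }(\bar{\mathbf {U}}\bar{\mathbf {U}}^{\top })^{-1}\mathbf {u}\) on the diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 12
U = rng.standard_normal((d, n))
Ubar = U - U.mean(axis=1, keepdims=True)   # zero-meaned model points
Q = rng.standard_normal((d, n))            # target points

# Eq. (41): t_hat is the centroid of Q; A_hat solves the normal
# equations with B = I_d (kron) Ubar^T.
t_hat = Q.mean(axis=1)
B = np.kron(np.eye(d), Ubar.T)             # (dn) x d^2, block diagonal
q = Q.reshape(-1)                          # rows of Q stacked
a = np.linalg.solve(B.T @ B, B.T @ q)
A_hat = a.reshape(d, d)

# Same A_hat from a plain least-squares fit.
A_ref = np.linalg.lstsq(Ubar.T, Q.T, rcond=None)[0].T
print(np.allclose(A_hat, A_ref))           # True

# Eq. (42d): V (B^T B)^{-1} V^T is diagonal with entry u^T (Ubar Ubar^T)^{-1} u.
u = rng.standard_normal(d)
V = np.kron(np.eye(d), u[None, :])         # d x d^2
S = V @ np.linalg.inv(B.T @ B) @ V.T
print(np.allclose(S, (u @ np.linalg.inv(Ubar @ Ubar.T) @ u) * np.eye(d)))  # True
```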

Appendix 3: Proof of Theorem 2

Proof of Lemma 1

Lemma 1 comes directly from Eq. (19). To first order we have:

$$\begin{aligned} \mathrm {arg}\underset{\mathbf {u}_{0}}{\,\mathrm {min}}\,\mathrm {trace}\left( {\varvec{\Sigma }}_{\mathbf {J}}(\mathbf {u}_{0})\right) =\mathrm {arg}\underset{\mathbf {u}_{0}}{\mathrm {\, min}}\,\left\| \frac{\partial }{\partial \hat{\mathbf {q}}}\mathrm {vec}(\mathbf {J})\right\| _{F}^{2} \end{aligned}$$
(43)

Equation (43) tells us that to minimise the uncertainty in \(\mathbf {J}\) we should find \(\mathbf {u}_{0}\) where a small change in the correspondences in the image changes \(\mathbf {J}\) the least. \(\square \)

Proof of Lemma 2

Let \(\mathbf {J}'\) denote the Jacobian of \(\mathbf {H}'\), and \(\tilde{\mathbf {q}}'_{i}=s_{q}\tilde{\mathbf {q}}_{i}+\mathbf {t}_{q}\) for some \(s_{q}\in \mathbb {R}^{+}\) and \(\mathbf {t}_{q}\in \mathbb {R}^{2}\). Recall that the centroid of \(\{{\mathbf {u}}_{i}\}\) is already at the origin, so \(\mathbf {u}'_{i}=s_{u}\mathbf {u}_{i}\) for some \(s_{u}\in \mathbb {R}^{+}\). We use \(\hat{\mathbf {q}}'\) to denote the vector of length \(2n\) that holds \(\{\tilde{\mathbf {q}}'_{i}\}\). Using the chain rule we have \(\frac{\partial \mathrm {vec}(\mathbf {J})}{\partial \hat{\mathbf {q}}'}=s_{q}\frac{\partial \mathrm {vec}(\mathbf {J})}{\partial \hat{\mathbf {q}}}\). Because \(s_{q}\in \mathbb {R}^{+}\) we have:

$$\begin{aligned} \begin{array}{ll} \mathrm {arg}\underset{\mathbf {u}_{0}}{\,\mathrm {min}}\,\mathrm {trace}\left( {\varvec{\Sigma }}_{\mathbf {J}}(\mathbf {u}_{0})\right) &{} =\mathrm {arg}\underset{\mathbf {u}_{0}}{\mathrm {\, min}}\,\left\| \frac{\partial }{\partial \hat{\mathbf {q}}}\mathrm {vec}(\mathbf {J})\right\| _{F}^{2}\\ &{} =\mathrm {arg}\underset{\mathbf {u}_{0}}{\mathrm {\, min}}\,\left\| \frac{\partial }{\partial \hat{\mathbf {q}}'}\mathrm {vec}(\mathbf {J})\right\| _{F}^{2} \end{array} \end{aligned}$$
(44)

Normalising \(\{\tilde{\mathbf {q}}_{i}\}\) therefore does not affect the solution. We then make the coordinate transform \(\mathbf {u}\leftarrow s_{u}\mathbf {u}\), and solve Problem (44) using \(\{{\mathbf {u}}'_{i}\}\) in place of \(\{{\mathbf {u}}_{i}\}\) and \(\mathbf {J}'\) in place of \(\mathbf {J}\). Suppose a solution to this is given by \(\hat{\mathbf {u}}'_{0}\). By undoing the coordinate transform, a solution to the original problem is given by \(s_{u}^{-1}\hat{\mathbf {u}}'_{0}\).

When the perspective terms of \({\mathbf {H}}'\) (\({H}'_{31}\) and \({H}'_{32}\)) are small a good approximation to \({\mathbf {J}}'\) can be made by linearising with respect to \({H}'_{31}\) and \({H}'_{32}\) about \({H}'_{31}={H}'_{32}=0\). This linearisation gives:

(45)

The approximation of \(\mathrm {vec}(\mathbf {J}')\) in Eq. (45b) is linear in \(\mathbf {u}_{0}\), and so \(\frac{\partial }{\partial \hat{\mathbf {q}}'}\mathrm {vec}({\mathbf {J}}')\) is also linear in \(\mathbf {u}_{0}\). This means \(\left\| \frac{\partial }{\partial \hat{\mathbf {q}}'}\mathrm {vec}({\mathbf {J}}')\right\| _{F}^{2}\) is of the form:

$$\begin{aligned} \left\| \frac{\partial }{\partial \hat{\mathbf {q}}'}\mathrm {vec}({\mathbf {J}}')\right\| _{F}^{2}\approx \mathbf {u}_{0}^{\top }\mathbf {Q}\mathbf {u}_{0}+\mathbf {b}^{\top }\mathbf {u}_{0}+c \end{aligned}$$
(46)

for some \(2 \times 2\) matrix \(\mathbf {Q}\) (which is either positive definite or positive semi-definite), a \(2 \times 1\) vector \(\mathbf {b}\) and a constant scalar \(c\).
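A quadratic of the form in Eq. (46) with positive-definite \(\mathbf {Q}\) has the closed-form minimiser \(\mathbf {u}_{0}^{*}=-\frac{1}{2}\mathbf {Q}^{-1}\mathbf {b}\), obtained by setting the gradient to zero. A small Python/NumPy sketch (Qm is our own name, used to avoid clashing with the point matrix \(\mathbf {Q}\) of Appendix 2):

```python
import numpy as np

def min_quadratic(Qm, b):
    """Minimiser of f(u) = u^T Qm u + b^T u + c for positive-definite Qm:
    grad f = 2 Qm u + b = 0  =>  u* = -0.5 * Qm^{-1} b."""
    return -0.5 * np.linalg.solve(Qm, b)

Qm = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite
b = np.array([1.0, -2.0])
u_star = min_quadratic(Qm, b)

# The gradient vanishes at u*.
grad = 2 * Qm @ u_star + b
print(np.allclose(grad, 0))   # True
```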

Using the chain rule we have:

$$\begin{aligned} \begin{array}{lr} \left\| \frac{\partial }{\partial \hat{\mathbf {q}}'}\mathrm {vec}({\mathbf {J}}')\right\| _{F}^{2}=\left\| \frac{\partial }{\partial \mathbf {h}'}\mathrm {vec}(\mathbf {J}')\frac{\partial \mathbf {h}'}{\partial \hat{\mathbf {q}}'}\right\| _{F}^{2}\\ \mathrm {=trace}\left( \frac{\partial }{\partial \mathbf {h}'}\mathrm {vec}(\mathbf {J}')\mathbf {C}\frac{\partial }{\partial \mathbf {h}'}\mathrm {vec}(\mathbf {J}')^{\top }\right) \\ \mathbf {h}'\overset{\mathrm {def}}{=}\mathrm {vec}(\mathbf {H}'),\quad \mathbf {C}\overset{\mathrm {def}}{=}\frac{\partial }{\partial \hat{\mathbf {q}}'}\mathbf {h}'\frac{\partial }{\partial \hat{\mathbf {q}}'}\mathbf {h}'^{\top },\quad \mathbf {C}\succ \mathbf {0}, \end{array} \end{aligned}$$
(47)

\(\mathbf {C}\) is an \(8 \times 8\) positive definite matrix that has been studied in Chen and Suter (2009). When \(\mathbf {H}'\) is approximately affine the perspective terms \(H'_{31}\) and \(H'_{32}\) and the translational terms \(H'_{13}\) and \(H'_{23}\) are negligible. When \(H'_{31}=H'_{32}=H'_{13}=H'_{23}=0\), \(\frac{\partial }{\partial \mathbf {h}'}\mathrm {vec}(\mathbf {J}')\) is given by:

(48)

It was shown that the normalisation step orthogonalises \(\frac{\partial }{\partial \hat{\mathbf {q}}'}\mathbf {h}'\) (Chen and Suter 2009). This implies \(\mathbf {C}\) is approximately a diagonal matrix and so:

$$\begin{aligned} \begin{aligned} \left\| \frac{\partial }{\partial \hat{\mathbf {q}}'}\mathrm {vec}({\mathbf {J}}')\right\| _{F}^{2}&=\mathrm {trace}\left( \frac{\partial }{\partial \hat{\mathbf {q}}'}\mathrm {vec}(\mathbf {J}')\frac{\partial }{\partial \hat{\mathbf {q}}'}\mathrm {vec}(\mathbf {J}')^{\top }\right) \\&=\mathrm {trace}\left( \frac{\partial }{\partial \mathbf {h}'}\mathrm {vec}(\mathbf {J}')\mathbf {C}\frac{\partial }{\partial \mathbf {h}'}\mathrm {vec}(\mathbf {J}')^{\top }\right) \\&\approx \sum _{ij}\left[ \frac{\partial }{\partial \mathbf {h}'}\mathrm {vec}(\mathbf {J}')\right] _{ij}^{2}\mathbf {C}_{jj}. \end{aligned} \end{aligned}$$
(49)

This is a weighted sum of the squared elements of \(\frac{\partial }{\partial \mathbf {h}'}\mathrm {vec}(\mathbf {J}')\). The weights \(\mathbf {C}_{jj}\) are positive because \(\mathbf {C}\) is positive definite. Therefore when the perspective terms of \(\mathbf {H}'\) are negligible \(\mathrm {trace}\left( {\varvec{\Sigma }}_{\mathbf {J}'}(\mathbf {u}_{0})\right) \) is minimised by \({\mathbf {u}}_{0}=\mathbf {0}\), and so \(\mathrm {trace}\left( {\varvec{\Sigma }}_{\mathbf {J}}(\mathbf {u}_{0})\right) \) is minimised by \({\mathbf {u}}_{0}=s_{u}^{-1}\mathbf {0}=\mathbf {0}\) (i.e. the centroid of \(\{\mathbf {u}_{i}\}\)). \(\square \)

Appendix 4: Proof of Lemma 3

For simplicity we centre the model’s coordinate frame at \(\mathbf {u}_{0}\), so \(\mathbf {u}_{i}\leftarrow (\mathbf {u}_{i}-\mathbf {u}_{0})\) and \(\mathbf {u}_{0}\leftarrow \mathbf {0}\). Because \(\{\mathbf {u}_{1},\mathbf {u}_{2},\mathbf {u}_{3}\}\) are non-collinear, at least two members of \(\{\mathbf {u}_{1},\mathbf {u}_{2},\mathbf {u}_{3}\}\) must be non-zero. Without loss of generality let these be \(\mathbf {u}_{1}\) and \(\mathbf {u}_{2}\).

Let \(\mathbf {v}_{i}\overset{\mathrm {def}}{=}w(\mathbf {u}_{i})\), \(i\in \{1,2,3\}\) be the position of the three points in the image (in normalised coordinates). From Eq. (28) the two embeddings of \(\mathbf {u}_{i}\) into camera coordinates are:

$$\begin{aligned} \begin{array}{l} s_{1}(\mathbf {u}_{i})=\mathbf {R}_{1}\left[ \begin{array}{c} \mathbf {u}_{i}\\ 0 \end{array}\right] +\gamma ^{-1}\left[ \begin{array}{c} \mathbf {v}_{0}\\ 1 \end{array}\right] \\ s_{2}(\mathbf {u}_{i})=\mathbf {R}_{2}\left[ \begin{array}{c} \mathbf {u}_{i}\\ 0 \end{array}\right] +\gamma ^{-1}\left[ \begin{array}{c} \mathbf {v}_{0}\\ 1 \end{array}\right] . \end{array} \end{aligned}$$
(50)

If \(s_{1}(\mathbf {u}_{i})\) and \(s_{2}(\mathbf {u}_{i})\) project \(\mathbf {u}_{i}\) to the same image point (i.e. they exist along the same line-of-sight) then pose cannot be disambiguated using the reprojection error of \(\mathbf {u}_{i}\). This is true for \(\mathbf {u}_{0}\) because \(\mathbf {u}_{0}=\mathbf {0}\Rightarrow s_{1}(\mathbf {u}_{0})=s_{2}(\mathbf {u}_{0})=\gamma ^{-1}[\mathbf {v}_{0}^{\top }1]^{\top }\). For \(\mathbf {u}_{i},i\ne 0\), we cannot disambiguate pose using reprojection error iff:

$$\begin{aligned} \begin{array}{l} \forall i\in \{1,2,3\}\,\exists s_{i}\in \mathbb {R}^{+}\,\mathrm {s.t.}\\ \mathbf {R}_{1}\left[ \begin{array}{c} \mathbf {u}_{i}\\ 0 \end{array}\right] +\gamma ^{-1}\left[ \begin{array}{c} \mathbf {v}_{0}\\ 1 \end{array}\right] = s_{i}\left( \mathbf {R}_{2}\left[ \begin{array}{c} \mathbf {u}_{i}\\ 0 \end{array}\right] +\gamma ^{-1}\left[ \begin{array}{c} \mathbf {v}_{0}\\ 1 \end{array}\right] \right) . \end{array} \end{aligned}$$
(51)

Using the decompositions of \(\mathbf {R}_{1}\) and \(\mathbf {R}_{2}\) from Eq. (24) we pre-multiply both sides of Eq. (51) by \(\mathbf {R}_{v}^{\top }\) to give:

$$\begin{aligned} \begin{array}{lr} \forall i\in \{1,2,3\}\,\exists s_{i}\in \mathbb {R}^{+}\,\mathrm {s.t.}\\ \left[ \begin{array}{c} \gamma ^{-1}\mathbf {A}\\ +\mathbf {b}^{\top } \end{array}\right] \mathbf {u}_{i}+\tilde{\mathbf {t}}= s_{i}\left( \left[ \begin{array}{c} \gamma ^{-1}\mathbf {A}\\ -\mathbf {b}^{\top } \end{array}\right] \mathbf {u}_{i}+\tilde{\mathbf {t}}\right) \\ \tilde{\mathbf {t}}\overset{\mathrm {def}}{=}\gamma ^{-1}\mathbf {R}_{v}^{\top }\mathbf {\left[ \begin{array}{c} \mathbf {v}_{0}\\ 1 \end{array}\right] } \end{array} \end{aligned}$$
(52)

We split Eq. (52) into three cases. The first case is when \(\mathbf {b}=\mathbf {0}\). In this case there is no ambiguity because from Eq. (24) \(\mathbf {b}=\mathbf {0}\Leftrightarrow \tilde{\mathbf {R}}_{1}=\tilde{\mathbf {R}}_{2}\Leftrightarrow \mathbf {R}_{1}=\mathbf {R}_{2}\). The second case is when \(\mathbf {b}\ne \mathbf {0}\) and the top two rows of the left side of Eq. (52) are non-zero: \(\gamma ^{-1}\mathbf {A}\mathbf {u}_{i}+\tilde{\mathbf {t}}_{12}\ne \mathbf {0}\). This implies \(s_{i}=1\). The third row of Eq. (52) then implies \(\mathbf {b}^{\top }\mathbf {u}_{i}=-\mathbf {b}^{\top }\mathbf {u}_{i}\). Because \(\mathbf {b}\ne \mathbf {0}\) and \(\mathbf {u}_{i}\ne \mathbf {0}\) for \(i\in \{1,2\}\), \(\mathbf {b}\) must be orthogonal to \(\mathbf {u}_{1}\) and \(\mathbf {u}_{2}\). This implies \(\mathbf {u}_{1}\) and \(\mathbf {u}_{2}\) are collinear, which is a contradiction.

The third case is when \(\mathbf {b}\ne \mathbf {0}\) and the top two rows of the left side of Eq. (52) are zero: \(\gamma ^{-1}\mathbf {A}\mathbf {u}_{i}+\tilde{\mathbf {t}}_{12}=\mathbf {0}\). By eliminating \(\tilde{\mathbf {t}}_{12}\) and cancelling \(\gamma \) this implies \(\mathbf {A}(\mathbf {u}_{2}-\mathbf {u}_{1})=\mathbf {0}\) and \(\mathbf {A}(\mathbf {u}_{3}-\mathbf {u}_{1})=\mathbf {0}\). Because \(\mathbf {u}_{2}\ne \mathbf {u}_{1}\), this implies \(\mathbf {A}\) has a non-trivial nullspace. Because \(\mathrm {rank}(\mathbf {A})\ge 1\), this implies \(\mathrm {rank}(\mathbf {A})=1\), and so \((\mathbf {u}_{2}-\mathbf {u}_{1})=\lambda (\mathbf {u}_{3}-\mathbf {u}_{1})\) for some \(\lambda \ne 0\). This implies \(\{\mathbf {u}_{1},\mathbf {u}_{2},\mathbf {u}_{3}\}\) are collinear, which is a contradiction.

To summarise, when \(\mathbf {b}= \mathbf {0}\) there is no ambiguity because both solutions to pose are the same, and when \(\mathbf {b}\ne \mathbf {0}\) Eq. (52) is false, and hence Eq. (51) is false. Therefore when \(\mathbf {b}\ne \mathbf {0}\) Eq. (28) will project either \(\mathbf {u}_{1}\), \(\mathbf {u}_{2}\) or \(\mathbf {u}_{3}\) to two different image points.
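In practice the two pose candidates can be disambiguated exactly as the lemma suggests: by comparing reprojection errors over the model points. A minimal Python/NumPy sketch (function names and the \(n\times 2\) array layout are our own, and the candidate poses are assumed given):

```python
import numpy as np

def reprojection_error(R, t, U, Q):
    """Mean reprojection error of planar model points U (n x 2) against
    normalised image points Q (n x 2) under pose (R, t)."""
    X = (np.hstack([U, np.zeros((len(U), 1))]) @ R.T) + t  # embed into camera coords
    return np.mean(np.linalg.norm(X[:, :2] / X[:, 2:3] - Q, axis=1))

def disambiguate(poses, U, Q):
    """Pick the candidate pose with the smaller reprojection error."""
    return min(poses, key=lambda p: reprojection_error(*p, U, Q))

# Example: the true pose wins over a rotated candidate.
U = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
R_true = np.eye(3)
t_true = np.array([0.1, -0.2, 4.0])
X = np.hstack([U, np.zeros((4, 1))]) + t_true
Q = X[:, :2] / X[:, 2:3]
c, s = np.cos(0.3), np.sin(0.3)
R_alt = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
best = disambiguate([(R_true, t_true), (R_alt, t_true)], U, Q)
print(best[0] is R_true)   # True
```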


About this article


Cite this article

Collins, T., Bartoli, A. Infinitesimal Plane-Based Pose Estimation. Int J Comput Vis 109, 252–286 (2014). https://doi.org/10.1007/s11263-014-0725-5
