Abstract
We present a novel method for recovering the 3D structure and scene flow from calibrated multi-view sequences. We propose a 3D point cloud parametrization of the 3D structure and scene flow that allows us to directly estimate the desired unknowns. A unified global energy functional is proposed to incorporate the information from the available sequences and simultaneously recover both depth and scene flow. The functional enforces multi-view geometric consistency and imposes brightness constancy and piecewise smoothness assumptions directly on the 3D unknowns. It inherently handles the challenges of discontinuities, occlusions, and large displacements. The main contribution of this work is the fusion of a 3D representation and an advanced variational framework that directly uses the available multi-view information. This formulation allows us to advantageously bind the 3D unknowns in time and space. Unlike optical flow and disparity, the proposed method results in a nonlinear mapping between image coordinates, giving rise to additional challenges in the optimization process. Our experiments on real and synthetic data demonstrate that the proposed method successfully recovers the 3D structure and scene flow despite the complicated nonconvex optimization problem.
Notes
The source code is publicly available.
References
Ayvaci, A., Raptis, M., & Soatto, S. (2010). Occlusion detection and motion estimation with convex optimization. NIPS (pp. 100–108).
Basha, T., Moses, Y., & Kiryati, N. (2010). Multi-view scene flow estimation: A view centered variational approach. In Proc. IEEE conf. comp. vision patt. recog. (pp. 1506–1513).
Ben-Ari, R., & Sochen, N. A. (2007). Variational stereo vision with sharp discontinuities and occlusion handling. In Proc. int. conf. comp. vision (pp. 1–7).
Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In Proc. European conf. comp. vision (pp. 25–36).
Carceroni, R. L., & Kutulakos, K. N. (2002). Multi-view scene capture by surfel sampling: from video streams to non-rigid 3d motion, shape and reflectance. International Journal of Computer Vision, 49(2–3), 175–214.
Courchay, J., Pons, J. P., Monasse, P., & Keriven, R. (2009). Dense and accurate spatio-temporal multi-view stereovision. In Asian conf. on computer vision (pp. 11–22).
Felzenszwalb, P., & Huttenlocher, D. (2006). Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1), 41–54.
Furukawa, Y., & Ponce, J. (2008). Dense 3d motion capture from synchronized video streams. In Proc. IEEE conf. comp. vision patt. recog.
Huguet, F., & Devernay, F. (2007). A variational method for scene flow estimation from stereo sequences. In Proc. int. conf. comp. vision (pp. 1–7).
Isard, M., & MacCormick, J. (2006). Dense motion and disparity estimation via loopy belief propagation. In Asian conf. on computer vision (Vol. 3852, p. 32).
Li, R., & Sclaroff, S. (2008). Multi-scale 3d scene flow from binocular stereo sequences. Computer Vision and Image Understanding, 110(1), 75–90.
Min, D. B., & Sohn, K. (2006). Edge-preserving simultaneous joint motion-disparity estimation. In Proc. international conf. patt. recog. (pp. 74–77).
Neumann, J., & Aloimonos, Y. (2002). Spatio-temporal stereo using multi-resolution subdivision surfaces. International Journal of Computer Vision, 47(1–3), 181–193.
Pock, T., Schoenemann, T., Graber, G., Bischof, H., & Cremers, D. (2008). A convex formulation of continuous multi-label problems. In Proc. European conf. comp. vision (pp. 792–805).
Pons, J., Keriven, R., & Faugeras, O. (2007). Multi-view stereo reconstruction and scene flow estimation with a global image-based matching score. International Journal of Computer Vision, 72(2), 179–193.
Robert, L., & Deriche, R. (1996). Dense depth map reconstruction: A minimization and regularization approach which preserves discontinuities. In Proc. European conf. comp. vision (pp. 439–451).
Scharstein, D., & Szeliski, R. Middlebury stereo vision research page. http://vision.middlebury.edu/stereo.
Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1–3), 7–42.
Scharstein, D., & Szeliski, R. (2003). High-accuracy stereo depth maps using structured light. In Proc. IEEE conf. comp. vision patt. recog. (pp. 195–202).
Strecha, C., Tuytelaars, T., & Gool, L. J. V. (2003). Dense matching of multiple wide-baseline views. In Proc. int. conf. comp. vision (pp. 1194–1201).
Vedula, S., Baker, S., Rander, P., Collins, R. T., & Kanade, T. (1999). Three-dimensional scene flow. In Proc. int. conf. comp. vision (pp. 722–729).
Vedula, S., Baker, S., Rander, P., Collins, R. T., & Kanade, T. (2005). Three-dimensional scene flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3), 475–480.
Vedula, S., Baker, S., Seitz, S., & Kanade, T. (2000). Shape and motion carving in 6D. In Proc. IEEE conf. comp. vision patt. recog. (Vol. 2).
Wedel, A., Brox, T., Vaudrey, T., Rabe, C., Franke, U., & Cremers, D. (2011). Stereoscopic scene flow computation for 3d motion understanding. International Journal of Computer Vision, 95(1), 29–51.
Wedel, A., Rabe, C., Vaudrey, T., Brox, T., Franke, U., & Cremers, D. (2008). Efficient dense scene flow from sparse or dense stereo data. In Proc. European conf. comp. vision (pp. 739–751).
Woodford, O. J., Torr, P. H. S., Reid, I. D., & Fitzgibbon, A. W. (2009). Global stereo reconstruction under second-order smoothness priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2115–2128.
Young, D. (1954). Iterative methods for solving partial difference equations of elliptic type. Transactions of the American Mathematical Society, 76(1), 92–111.
Zhang, Y., & Kambhamettu, C. (2000). Integrated 3d scene flow and structure recovery from multiview image sequences. In Proc. IEEE conf. comp. vision patt. recog. (Vol. 2, pp. 674–681).
Zhang, Y., & Kambhamettu, C. (2001). On 3d scene flow and structure estimation. In Proc. IEEE conf. comp. vision patt. recog. (pp. 778–785).
Acknowledgements
The authors are grateful to the A.M.N. foundation for its generous financial support.
Additional information
An earlier version of part of this work appeared in CVPR 2010 (Basha et al. 2010).
Appendices
Appendix A: Mapping Between Images
Our 3D parametrization in the presented framework introduces a nonlinear transformation of the 3D unknowns, Z and V, to each image plane. A notable challenge in the minimization of the proposed functional arises from the nontrivial mapping between the image coordinates and the reference camera coordinate system.
Using our parametrization, each pixel in the reference camera, (x, y), and its corresponding depth, Z(x, y), specify a 3D point, P (see Eq. (5)). It follows that projecting P onto the ith camera maps (x, y, Z(x, y)) to the point \(\textbf{p}_{i} = (x_{i}, y_{i})^{T}\). That is,
where \(f^{i}\) is the mapping to the corresponding ith image. More precisely, \(f^{i}\) is given by substituting Eq. (5) into Eq. (1). For example, the component \(x_{i}\) is given by:
The coefficients a, b, c, and d depend on the reference camera coordinates, (x, y):
where \(M^{i}\) is the 3×4 projection matrix of the ith camera (subscripts denote the row and column indices). The expression for \(y_{i}\) is computed analogously.
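The mapping can be sketched with the standard pinhole projection; the exact forms of Eqs. (5) and (1) are not reproduced here, and the matrix and point values below are hypothetical. The perspective division mirrors the rational dependence of \(x_{i}\) on Z described above:

```python
import numpy as np

def project(M, P):
    """Project a 3D point P onto the image of a camera with 3x4 matrix M.

    Returns pixel coordinates (x_i, y_i). The homogeneous projection
    followed by perspective division is what makes x_i a ratio of two
    expressions that are affine in the depth Z.
    """
    p = M @ np.append(P, 1.0)   # homogeneous projection
    return p[:2] / p[2]         # perspective division

# Toy example with an identity-like camera (hypothetical values):
M_i = np.hstack([np.eye(3), np.zeros((3, 1))])
P = np.array([2.0, 1.0, 4.0])   # 3D point at depth Z = 4
x_i, y_i = project(M_i, P)      # -> (0.5, 0.25)
```

Projecting \(\textbf{P} + \textbf{V}\) with the same routine yields the corresponding point at time t + 1.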
Similarly, at time step t+1, projecting \(\widehat{\textbf {P}}= \textbf {P}+ \textbf {V}\) maps (x,y,Z(x,y),V(x,y)) to \(\widehat{\textbf {p}}_{i}\), denoted by a mapping, \(\widehat{f}^{i}\):
Analogously to Eq. (18), the component \(\widehat{x}_{i}\) is given by:
where the coefficients a, b, c, and d are defined in Eq. (19).
Appendix B: Image Derivatives with Respect to the 3D Unknowns
A first step toward the numerical solution of the resulting Euler-Lagrange equations (Eq. (12) or Eq. (13)) requires computing the derivatives of the intensity functions with respect to the 3D unknowns. To produce the final expressions for these derivatives, the nonlinear relation between the 3D unknowns and the image plane has to be carefully considered (see Appendix A). This appendix shows how these computations are performed. The mathematical analysis is performed in the continuous domain; thus, the frames as well as the 3D unknowns are regarded as continuous functions. Finally, the resulting equations are discretized using standard approximations for the derivatives.
For simplicity, given a time step, t, we use the intensity functions \(I_{i}^{t}\) and \(I_{i}^{t+1}\) to abbreviate I i (p i ,t) and \(I_{i}(\widehat{\textbf {p}}_{i}, t+1)\), respectively. We next elaborate on the computation of derivatives of \(I_{i}^{t}\) and \(I_{i}^{t+1}\) with respect to Z and u, denoted by \(\partial_{Z}I_{i}^{t}\), \(\partial_{Z}I_{i}^{t+1}\) and \(\partial_{u}I_{i}^{t+1}\) (the other derivatives with respect to v and w are similarly computed).
\(I_{i}^{t}\) can be regarded as a function of the reference image coordinates, (x,y), and the corresponding depth, Z(x,y), by considering a composition of two functions: the ith intensity function and the mapping transformation, defined in Appendix A. That is,
Similarly, \(I_{i}^{t+1}\) can be regarded as a function of (x,y,Z(x,y)) and V. That is,
Considering Eqs. (17)–(20), the chain rule is applied to compute the partial derivatives:
The derivatives \(\partial_{Z}\textbf {p}_{i}^{T} = ( \partial_{Z}x_{i}, \partial_{Z}y_{i})^{T}\) are directly computed from Eqs. (18)–(19).
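As a sketch of how these derivatives arise (assuming Eq. (18) has the rational form \(x_{i} = (aZ + b)/(cZ + d)\) with coefficients independent of Z), the quotient rule gives a closed form:

```latex
\partial_Z x_i
  = \frac{a(cZ + d) - c(aZ + b)}{(cZ + d)^2}
  = \frac{ad - bc}{(cZ + d)^2},
```

and \(\partial_{Z}y_{i}\) follows analogously from the corresponding expression for \(y_{i}\).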
To compute the derivative of \(I_{i}^{t}\) with respect to \(\textbf{p}_{i}\), \((\nabla I_{i}^{t})^{T}\), we use a warping approach. As discussed in Appendix A, a nonlinear mapping relates each image plane to the reference camera. By warping \(I_{i}^{t}\) toward the reference image using the estimated Z, the values of \(I_{i}^{t}\) can be directly related to the reference image values, \(I_{0}^{t}\). Specifically, the required derivatives, \(\nabla I_{i}^{t}\), are then computed using the warped image. Let \(I_{i,w}^{t}\) be the warped image of \(I_{i}^{t}\). That is,
The warped image gradient is related to the original image by:
where J is the Jacobian matrix of the change of coordinates, \((x_{i}, y_{i}) \rightarrow (x, y)\). Therefore, the original image derivatives are obtained by multiplying Eq. (28) by \(J^{-1}\), leading to:
The Jacobian matrix, J, is obtained by computing the derivatives of \(\textbf{p}_{i}\) with respect to x and y. In particular, J involves the derivatives of Z(x, y), namely \(\partial_{x}Z\) and \(\partial_{y}Z\). Following the explanation above, \(\nabla I_{i,w}^{t+1}\) is similarly computed. In this case, the Jacobian matrix, J, additionally involves the derivatives of u, v, and w with respect to the reference camera coordinates.
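A per-pixel sketch of the unwarping step (the values are hypothetical; the transpose convention for J depends on how Eq. (28) is written, so here J is defined so that the warped gradient is exactly J times the original one):

```python
import numpy as np

def unwarp_gradient(grad_w, J):
    """Recover the original image gradient from the warped-image gradient.

    grad_w : gradient of the warped image at a pixel, shape (2,)
    J      : 2x2 Jacobian of the change of coordinates at that pixel
    With the convention grad_w = J @ grad, the original gradient is
    grad = J^{-1} @ grad_w, computed here by solving the 2x2 system.
    """
    return np.linalg.solve(J, grad_w)

# Hypothetical values: a pure 2x coordinate scaling halves gradients,
# so unwarping must halve the warped gradient again.
J = np.diag([2.0, 2.0])
grad_w = np.array([0.5, 1.0])
grad = unwarp_gradient(grad_w, J)   # -> [0.25, 0.5]
```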
Appendix C: Linearization
This appendix describes the linearization of the resulting Euler-Lagrange equations and the numerical approximations used. At each pyramid level, a linear system of equations is obtained, and small increments in the 3D unknowns, dZ and dV, are estimated. The total solution, Z + dZ and V + dV, is then used to initialize the next finer level (see Sect. 2.3.2).
Considering Eqs. (12)–(13), there are two sources of nonlinearity:

1. the nonlinearized data term;

2. the nonquadratic cost function Ψ.
Following the numerical approach suggested by Brox et al. (2004), two nested fixed-point iterations are used at each pyramid level to remove the nonlinearity.
The outer iteration removes the nonlinearity resulting from the nonlinear data term, using a fixed-point iteration on Z and V. Let k be the outer iteration index. The solution at the (k+1)th iteration is composed of the previous solution and small, unknown increments. That is, \(Z^{k+1} = Z^{k} + dZ^{k}\) and \(\textbf{V}^{k+1} = \textbf{V}^{k} + d\textbf{V}^{k}\), where \(d\textbf{V}^{k} = (du^{k}, dv^{k}, dw^{k})^{T}\).
The first step toward linearization is approximating the nonlinear expression given in Eq. (11) using a first-order Taylor expansion. We use \(\Delta_{i}^{k},~\widehat{\Delta}_{i}^{k}\) and \(\Delta_{i}^{t, k}\) to denote the expressions given in Eq. (11) using the fixed values \(Z^{k}\) and \(\textbf{V}^{k}\). That is,
where \(\textbf{p}_{i}^{k} = \mathit{Proj}(\textbf{P}^{k}, M^{i})\) and \(\textbf{P}^{k}\) is given by placing \(Z^{k}\) in Eq. (5). The expressions for \(\widehat{\textbf{p}}_{i}^{k}\) and \(\widehat{\textbf{P}}^{k}\) are analogously given. Using these notations, the first-order Taylor expansions for these expressions are given by:
Equation (31) is computed by using the first order Taylor expansion for the following expressions:
where \(\textbf{P}^{k+1} = \textbf{P}^{k} + d\textbf{P}^{k}\) is given by placing \(Z^{k} + dZ^{k}\) in Eq. (5). Similarly, \(\widehat{\textbf{P}}^{k+1} = \widehat{\textbf{P}}^{k} + d\widehat{\textbf{P}}^{k}\), where \(d\widehat{\textbf{P}}^{k} = d\textbf{P}^{k} + d\textbf{V}^{k}\). The computation of the image derivatives with respect to the 3D unknowns is detailed in Appendix B.
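As a sketch of the expansion (assuming the data term compares intensities at the mapped points, and using the derivative notation of Appendix B):

```latex
I_i\bigl(\textbf{p}_i^{k+1}, t\bigr)
  \approx I_i\bigl(\textbf{p}_i^{k}, t\bigr) + \partial_Z I_i^{t}\, dZ^{k},
\qquad
I_i\bigl(\widehat{\textbf{p}}_i^{k+1}, t+1\bigr)
  \approx I_i\bigl(\widehat{\textbf{p}}_i^{k}, t+1\bigr)
  + \partial_Z I_i^{t+1}\, dZ^{k}
  + \partial_u I_i^{t+1}\, du^{k}
  + \partial_v I_i^{t+1}\, dv^{k}
  + \partial_w I_i^{t+1}\, dw^{k}.
```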
Therefore, deriving the associated Euler-Lagrange equations with respect to the unknown increments \(dZ^{k}\) and \(du^{k}\) results in:
The dependency of the above two equations on the increments, \(dZ^{k}\) and \(du^{k}\), is obtained by substituting Eq. (31) into \(\Delta_{i}^{k+1}\), \(\Delta_{i}^{t, k+1}\), and \(\widehat{\Delta}_{i}^{k+1}\). The equations for \(dv^{k}\) and \(dw^{k}\) are similar to Eq. (35).
Applying the above approximations (Eq. (31)), the resulting Euler-Lagrange equations form a nonlinear system of equations in the unknowns \(dZ^{k}\) and \(d\textbf{V}^{k}\). The remaining nonlinearity originates from Ψ′. Therefore, an additional fixed-point iteration loop over the Ψ′ expressions is performed. Finally, after standard discretization of the derivatives, a linear system of equations results; its solution is obtained by applying the successive over-relaxation (SOR) method.
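The SOR solver itself is standard; a minimal dense sketch (the coefficients below are hypothetical, and a practical implementation would exploit the sparsity of the discretized system):

```python
import numpy as np

def sor(A, b, omega=1.5, tol=1e-10, max_iter=10_000):
    """Successive over-relaxation for A x = b (A with nonzero diagonal).

    Each sweep blends the Gauss-Seidel update with the current value via
    the relaxation factor omega; 1 < omega < 2 typically accelerates
    convergence for the diagonally dominant systems arising from
    discretized Euler-Lagrange equations.
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # Sum over already-updated and not-yet-updated neighbors
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            break
    return x

# Small diagonally dominant system (hypothetical coefficients):
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = sor(A, b)   # converges to the exact solution (1/11, 7/11)
```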
Cite this article
Basha, T., Moses, Y. & Kiryati, N. Multi-view Scene Flow Estimation: A View Centered Variational Approach. Int J Comput Vis 101, 6–21 (2013). https://doi.org/10.1007/s11263-012-0542-7