Multi-view Scene Flow Estimation: A View Centered Variational Approach

Abstract

We present a novel method for recovering the 3D structure and scene flow from calibrated multi-view sequences. We propose a 3D point cloud parametrization of the 3D structure and scene flow that allows us to directly estimate the desired unknowns. A unified global energy functional is proposed to incorporate the information from the available sequences and simultaneously recover both depth and scene flow. The functional enforces multi-view geometric consistency and imposes brightness constancy and piecewise smoothness assumptions directly on the 3D unknowns. It inherently handles the challenges of discontinuities, occlusions, and large displacements. The main contribution of this work is the fusion of a 3D representation and an advanced variational framework that directly uses the available multi-view information. This formulation allows us to advantageously bind the 3D unknowns in time and space. Different from optical flow and disparity, the proposed method results in a nonlinear mapping between the images’ coordinates, thus giving rise to additional challenges in the optimization process. Our experiments on real and synthetic data demonstrate that the proposed method successfully recovers the 3D structure and scene flow despite the complicated nonconvex optimization problem.

Notes

  1. The source code is publicly available.

References

  • Ayvaci, A., Raptis, M., & Soatto, S. (2010). Occlusion detection and motion estimation with convex optimization. In NIPS (pp. 100–108).

  • Basha, T., Moses, Y., & Kiryati, N. (2010). Multi-view scene flow estimation: A view centered variational approach. In Proc. IEEE conf. comp. vision patt. recog. (pp. 1506–1513).

  • Ben-Ari, R., & Sochen, N. A. (2007). Variational stereo vision with sharp discontinuities and occlusion handling. In Proc. int. conf. comp. vision (pp. 1–7).

  • Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In Proc. European conf. comp. vision (pp. 25–36).

  • Carceroni, R. L., & Kutulakos, K. N. (2002). Multi-view scene capture by surfel sampling: from video streams to non-rigid 3d motion, shape and reflectance. International Journal of Computer Vision, 49(2–3), 175–214.

  • Courchay, J., Pons, J. P., Monasse, P., & Keriven, R. (2009). Dense and accurate spatio-temporal multi-view stereovision. In Asian conf. on computer vision (pp. 11–22).

  • Felzenszwalb, P., & Huttenlocher, D. (2006). Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1), 41–54.

  • Furukawa, Y., & Ponce, J. (2008). Dense 3d motion capture from synchronized video streams. In Proc. IEEE conf. comp. vision patt. recog.

  • Huguet, F., & Devernay, F. (2007). A variational method for scene flow estimation from stereo sequences. In Proc. int. conf. comp. vision (pp. 1–7).

  • Isard, M., & MacCormick, J. (2006). Dense motion and disparity estimation via loopy belief propagation. In Asian conf. on computer vision (Vol. 3852, p. 32).

  • Li, R., & Sclaroff, S. (2008). Multi-scale 3d scene flow from binocular stereo sequences. Computer Vision and Image Understanding, 110(1), 75–90.

  • Min, D. B., & Sohn, K. (2006). Edge-preserving simultaneous joint motion-disparity estimation. In Proc. international conf. patt. recog. (pp. 74–77).

  • Neumann, J., & Aloimonos, Y. (2002). Spatio-temporal stereo using multi-resolution subdivision surfaces. International Journal of Computer Vision, 47(1–3), 181–193.

  • Pock, T., Schoenemann, T., Graber, G., Bischof, H., & Cremers, D. (2008). A convex formulation of continuous multi-label problems. In Proc. European conf. comp. vision (pp. 792–805).

  • Pons, J., Keriven, R., & Faugeras, O. (2007). Multi-view stereo reconstruction and scene flow estimation with a global image-based matching score. International Journal of Computer Vision, 72(2), 179–193.

  • Robert, L., & Deriche, R. (1996). Dense depth map reconstruction: A minimization and regularization approach which preserves discontinuities. In Proc. European conf. comp. vision (pp. 439–451).

  • Scharstein, D., & Szeliski, R. Middlebury stereo vision research page. http://vision.middlebury.edu/stereo.

  • Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1–3), 7–42.

  • Scharstein, D., & Szeliski, R. (2003). High-accuracy stereo depth maps using structured light. In Proc. IEEE conf. comp. vision patt. recog. (pp. 195–202).

  • Strecha, C., Tuytelaars, T., & Gool, L. J. V. (2003). Dense matching of multiple wide-baseline views. In Proc. int. conf. comp. vision (pp. 1194–1201).

  • Vedula, S., Baker, S., Rander, P., Collins, R. T., & Kanade, T. (1999). Three-dimensional scene flow. In Proc. int. conf. comp. vision (pp. 722–729).

  • Vedula, S., Baker, S., Rander, P., Collins, R. T., & Kanade, T. (2005). Three-dimensional scene flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3), 475–480.

  • Vedula, S., Baker, S., Seitz, S., & Kanade, T. (2000). Shape and motion carving in 6D. In Proc. IEEE conf. comp. vision patt. recog. (Vol. 2).

  • Wedel, A., Brox, T., Vaudrey, T., Rabe, C., Franke, U., & Cremers, D. (2011). Stereoscopic scene flow computation for 3d motion understanding. International Journal of Computer Vision, 95(1), 29–51.

  • Wedel, A., Rabe, C., Vaudrey, T., Brox, T., Franke, U., & Cremers, D. (2008). Efficient dense scene flow from sparse or dense stereo data. In Proc. European conf. comp. vision (pp. 739–751).

  • Woodford, O. J., Torr, P. H. S., Reid, I. D., & Fitzgibbon, A. W. (2009). Global stereo reconstruction under second-order smoothness priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2115–2128.

  • Young, D. (1954). Iterative methods for solving partial difference equations of elliptic type. Transactions of the American Mathematical Society, 76(1), 92–111.

  • Zhang, Y., & Kambhamettu, C. (2000). Integrated 3d scene flow and structure recovery from multiview image sequences. In Proc. IEEE conf. comp. vision patt. recog. (Vol. 2, pp. 674–681).

  • Zhang, Y., & Kambhamettu, C. (2001). On 3d scene flow and structure estimation. In Proc. IEEE conf. comp. vision patt. recog. (pp. 778–785).

Acknowledgements

The authors are grateful to the A.M.N. foundation for its generous financial support.

Author information

Corresponding author

Correspondence to Tali Basha.

Additional information

An earlier version of part of this work appeared in CVPR 2010 (Basha et al. 2010).

Appendices

Appendix A: Mapping Between Images

Our 3D parametrization in the presented framework introduces a nonlinear transformation of the 3D unknowns, Z and V, to each of the image planes. A notable challenge in minimizing the proposed functional arises from the nontrivial mapping between the images’ coordinates and the reference camera coordinate system.

Using our parametrization, each pixel in the reference camera, (x,y), and its corresponding depth, Z(x,y), specify a 3D point, P (see Eq. (5)). It follows that projecting P onto the ith camera maps (x,y,Z(x,y)) to the point \(\textbf{p}_i=(x_i,y_i)^T\). That is,

$$ \textbf {p}_i = \mathit{Proj}\bigl(\textbf {P},M^i\bigr) = f^i\bigl(x,y,Z(x,y)\bigr), $$
(17)

where \(f^i\) is the mapping to the corresponding ith image. More precisely, \(f^i\) is given by substituting Eq. (5) into Eq. (1). For example, the component \(x_i\) is given by:

$$ x_i = {a\cdot Z + b \over c\cdot Z + d}~. $$
(18)

The coefficients a,b,c and d depend on the reference camera coordinates, (x,y):

$$ \begin{aligned} a &= M^i_{{11}}\,x + M^i_{{12}}\,y + M^i_{{13}}, &\quad b &= M^i_{{14}}, \\ c &= M^i_{{31}}\,x + M^i_{{32}}\,y + M^i_{{33}}, &\quad d &= M^i_{{34}}, \end{aligned} $$
(19)

where \(M^i\) is the 3×4 projection matrix of the ith camera (subscripts denote the row and column indices). The expression for \(y_i\) is computed analogously, using the second row of \(M^i\).
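
To make the mapping concrete, here is a minimal numerical sketch (not the authors' code). It assumes a canonical reference camera, i.e. that Eq. (5) reduces to P = Z·(x, y, 1)^T; the function name `project_to_view` and the camera matrix `M_i` are illustrative.

```python
import numpy as np

def project_to_view(x, y, Z, M):
    """Map reference pixel (x, y) with depth Z to view i (Eqs. (17)-(19)).

    Sketch only: assumes a canonical reference camera, so P = Z*(x, y, 1)^T;
    M is the 3x4 projection matrix of the i-th camera.
    """
    a = M[0, 0] * x + M[0, 1] * y + M[0, 2]   # coefficients of Eq. (19)
    b = M[0, 3]
    c = M[2, 0] * x + M[2, 1] * y + M[2, 2]
    d = M[2, 3]
    e = M[1, 0] * x + M[1, 1] * y + M[1, 2]   # same pattern with the second
    f = M[1, 3]                               # row of M gives y_i
    x_i = (a * Z + b) / (c * Z + d)           # Eq. (18)
    y_i = (e * Z + f) / (c * Z + d)
    return x_i, y_i

# Illustrative camera: identity rotation, small baseline along x.
M_i = np.hstack([np.eye(3), [[0.1], [0.0], [0.0]]])
print(project_to_view(0.2, -0.1, 2.0, M_i))   # -> (0.25, -0.1)
```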

Similarly, at time step t+1, projecting \(\widehat{\textbf {P}}= \textbf {P}+ \textbf {V}\) maps \((x,y,Z(x,y),\textbf{V}(x,y))\) to \(\widehat{\textbf {p}}_{i}\), via the mapping \(\widehat{f}^{i}\):

$$ \widehat{\textbf {p}}_i = \mathit{Proj}\bigl(\,\widehat{\textbf {P}},M^i\bigr) = \widehat{f}^i\bigl(x,y,Z(x,y),\textbf {V}(x,y) \bigr). $$
(20)

Analogously to Eq. (18), the component \(\widehat{x}_{i}\) is given by:

$$ \widehat{x}_i = {a\cdot Z + M^i_{{11}}\cdot u + M^i_{{12}}\cdot v + M^i_{{13}}\cdot w+ b \over c\cdot Z + M^i_{{31}} \cdot u + M^i_{{32}}\cdot v + M^i_{{33}} \cdot w + d}, $$
(21)

where the coefficients a,b,c and d are defined in Eq. (19).
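
Under the same canonical-reference assumption as the sketch above, Eq. (21) only adds the scene flow linearly, through the rows of \(M^i\), to the numerator and denominator; a hypothetical sketch (`project_shifted` is an illustrative name):

```python
def project_shifted(x, y, Z, V, M):
    """x-component of the projection of P_hat = P + V at time t+1 (Eq. (21)).

    Sketch only: V = (u, v, w) is the scene flow at reference pixel (x, y);
    canonical reference camera assumed, as in Eq. (19) above.
    """
    u, v, w = V
    a = M[0][0] * x + M[0][1] * y + M[0][2]   # Eq. (19) coefficients
    b = M[0][3]
    c = M[2][0] * x + M[2][1] * y + M[2][2]
    d = M[2][3]
    num = a * Z + M[0][0] * u + M[0][1] * v + M[0][2] * w + b
    den = c * Z + M[2][0] * u + M[2][1] * v + M[2][2] * w + d
    return num / den   # y_hat_i is analogous, with the second row on top
```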

Appendix B: Image Derivatives with Respect to the 3D Unknowns

A first step toward the numerical solution of the resulting Euler-Lagrange equations (Eq. (12) or Eq. (13)) is computing the derivatives of the intensity functions with respect to the 3D unknowns. To produce the final expressions for these derivatives, the nonlinear relation between the 3D unknowns and the image plane has to be carefully considered (see Appendix A). This appendix shows how these computations are performed. The mathematical analysis is performed in the continuous domain; thus, the frames as well as the 3D unknowns are regarded as continuous functions. Finally, the resulting equations are discretized using standard approximations for the derivatives.

For simplicity, given a time step, t, we use the intensity functions \(I_{i}^{t}\) and \(I_{i}^{t+1}\) to abbreviate \(I_{i}(\textbf{p}_{i},t)\) and \(I_{i}(\widehat{\textbf {p}}_{i}, t+1)\), respectively. We next elaborate on the computation of the derivatives of \(I_{i}^{t}\) and \(I_{i}^{t+1}\) with respect to Z and u, denoted by \(\partial_{Z}I_{i}^{t}\), \(\partial_{Z}I_{i}^{t+1}\), and \(\partial_{u}I_{i}^{t+1}\) (the derivatives with respect to v and w are computed similarly).

\(I_{i}^{t}\) can be regarded as a function of the reference image coordinates, (x,y), and the corresponding depth, Z(x,y), by considering a composition of two functions: the ith intensity function and the mapping transformation, defined in Appendix A. That is,

$$ I_i^t\bigl(x,y,Z(x,y)\bigr) = I_i\bigl(f^i\bigl(x,y,Z(x,y)\bigr),t\bigr). $$
(22)

Similarly, \(I_{i}^{t+1}\) can be regarded as a function of (x,y,Z(x,y)) and V. That is,

$$ I_i^{t+1}(x,y,Z,\textbf {V})= I_i \bigl(\widehat{f}^i(x,y,Z,\textbf {V}),t+1\bigr). $$
(23)

Considering Eqs. (17)–(20), the chain rule is applied for computing the partial derivatives:

$$ \partial_Z I_i^{t} = \bigl(\nabla I_i^{t}\bigr)^T\, \partial_Z \textbf{p}_i, $$
(24)
$$ \partial_Z I_i^{t+1} = \bigl(\nabla I_i^{t+1}\bigr)^T\, \partial_Z \widehat{\textbf{p}}_i, $$
(25)
$$ \partial_u I_i^{t+1} = \bigl(\nabla I_i^{t+1}\bigr)^T\, \partial_u \widehat{\textbf{p}}_i. $$
(26)

The derivatives \(\partial_{Z}\textbf{p}_{i} = (\partial_{Z}x_{i}, \partial_{Z}y_{i})^{T}\) are directly computed from Eqs. (18)–(19).
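
Because \(x_i\) is a ratio of two functions that are affine in Z (Eq. (18)), \(\partial_Z x_i\) has a simple closed form by the quotient rule. A small sketch, again under the canonical-reference assumption of Appendix A's examples (`dZ_xi` is an illustrative name):

```python
def dZ_xi(x, y, Z, M):
    """Depth derivative of x_i from Eq. (18) by the quotient rule:
    d/dZ [(a*Z + b) / (c*Z + d)] = (a*d - b*c) / (c*Z + d)**2.
    Coefficients follow Eq. (19) under the (assumed) canonical
    reference camera, P = Z * (x, y, 1)^T.
    """
    a = M[0][0] * x + M[0][1] * y + M[0][2]
    b = M[0][3]
    c = M[2][0] * x + M[2][1] * y + M[2][2]
    d = M[2][3]
    return (a * d - b * c) / (c * Z + d) ** 2
```

\(\partial_Z y_i\) is obtained analogously from the second row of \(M^i\); the scalar derivative in Eq. (24) is then the dot product of the image gradient with \((\partial_Z x_i, \partial_Z y_i)^T\).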

To compute the derivative of \(I_{i}^{t}\) with respect to \(\textbf{p}_{i}\), \((\nabla I_{i}^{t})^{T}\), we use a warping approach. As discussed in Appendix A, a nonlinear mapping relates each image plane to the reference camera. By warping \(I_{i}^{t}\) toward the reference image using the estimated Z, the values of \(I_{i}^{t}\) can be directly related to the reference image values, \(I_{0}^{t}\). The required derivatives, \(\nabla I_{i}^{t}\), are then computed using the warped image. Let \(I_{i,w}^{t}\) be the warped image of \(I_{i}^{t}\). That is,

$$ I_{i,w}^t(x,y) = I_i\bigl(f^i\bigl(x,y,Z(x,y)\bigr),t\bigr). $$
(27)

The warped image gradient is related to the original image by:

$$ \bigl(\nabla I_{i,w}^t\bigr)^T = \bigl(\nabla I_i^t\bigr)^T J, $$
(28)

where J is the Jacobian matrix of the change of coordinates \((x,y)\rightarrow(x_i,y_i)\). Therefore, the original image derivatives are obtained by multiplying Eq. (28) on the right by \(J^{-1}\), leading to:

$$ \bigl(\nabla I_i^t\bigr)^T = \bigl(\nabla I_{i,w}^t\bigr)^T J^{-1}. $$
(29)

The Jacobian matrix, J, is obtained by computing the derivatives of \(\textbf{p}_{i}\) with respect to x and y. In particular, J involves the derivatives of Z(x,y), namely \(\partial_{x}Z\) and \(\partial_{y}Z\). Following the explanation above, \(\nabla I_{i,w}^{t+1}\) is similarly computed; in this case, the Jacobian matrix additionally involves the derivatives of u, v and w with respect to the reference camera coordinates.
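
A sketch of the warping step using SciPy's bilinear sampler; `warp_to_reference` is a hypothetical helper under the same canonical-reference and pixel-unit assumptions as the earlier sketches. It returns gradients of the warped image, i.e. the left-hand side of Eq. (28); recovering \(\nabla I_i\) itself would additionally require \(J^{-1}\), as in Eq. (29).

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_to_reference(I_i, Z, M):
    """Warp view i toward the reference view using the current depth Z.

    For every reference pixel (x, y), sample I_i at f^i(x, y, Z(x, y))
    (Eq. (17)) with bilinear interpolation, so the warped image lives
    in reference coordinates. Assumes (sketch only) a canonical
    reference camera and pixel units; M is the 3x4 matrix of view i.
    """
    h, w = Z.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Vectorized Eq. (18): back-project to 3D, project into view i.
    X, Y = xs * Z, ys * Z
    den = M[2, 0] * X + M[2, 1] * Y + M[2, 2] * Z + M[2, 3]
    xi = (M[0, 0] * X + M[0, 1] * Y + M[0, 2] * Z + M[0, 3]) / den
    yi = (M[1, 0] * X + M[1, 1] * Y + M[1, 2] * Z + M[1, 3]) / den
    # map_coordinates expects (row, col) = (y, x) sampling positions.
    I_w = map_coordinates(I_i, [yi, xi], order=1, mode='nearest')
    # Discrete gradient of the warped image in reference coordinates
    # (central differences); cf. the left-hand side of Eq. (28).
    gy, gx = np.gradient(I_w)
    return I_w, gx, gy
```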

Appendix C: Linearization

This appendix describes the linearization of the resulting Euler-Lagrange equations and the numerical approximations used. At each pyramid level, a linear system of equations is obtained, and small increments in the 3D unknowns, dZ and \(d\textbf{V}\), are estimated. The total solution, Z+dZ and \(\textbf{V}+d\textbf{V}\), is then used to initialize the next finer level (see Sect. 2.3.2).

Considering Eqs. (12)–(13), there are two sources of nonlinearity:

  1. the nonlinearized data term;

  2. the nonquadratic cost function Ψ.

Following the numerical approach suggested by Brox et al. (2004), two nested fixed-point iterations are used at each pyramid level to remove the nonlinearity, as sketched below.
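
Structurally, the scheme can be sketched as follows. This is only a schematic, not the authors' implementation: the callbacks `linearize`, `update_weights`, and `solve` are hypothetical placeholders standing in for the system assembly and the solver; the roles of the two loops are detailed in the rest of this appendix.

```python
import numpy as np

def nested_fixed_point(Z, V, linearize, update_weights, solve,
                       n_outer=5, n_inner=3):
    """Schematic of the two nested fixed-point loops (sketch only).

    linearize(Z, V, dZ, dV, w) -> (A, b): assemble the linear system in
        the increments from the Taylor-expanded data term (Eq. (31)),
        with the robust weights w (the Psi' terms) held fixed.
    update_weights(Z, V, dZ, dV) -> w: re-evaluate the Psi' expressions
        at the current estimate (the inner fixed point).
    solve(A, b) -> d: linear solver, e.g. SOR (see end of appendix).
    """
    for _ in range(n_outer):                 # outer: fixed point on (Z, V)
        dZ, dV = np.zeros_like(Z), np.zeros_like(V)
        for _ in range(n_inner):             # inner: fixed point on Psi'
            w = update_weights(Z, V, dZ, dV)
            A, b = linearize(Z, V, dZ, dV, w)
            d = solve(A, b)                  # increments (dZ^k, dV^k)
            dZ = d[:Z.size].reshape(Z.shape)
            dV = d[Z.size:].reshape(V.shape)
        Z, V = Z + dZ, V + dV                # accumulate, then relinearize
    return Z, V
```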

The outer iteration removes the nonlinearity of the data term, using a fixed-point iteration on Z and V. Let k be the outer iteration index. The solution at the (k+1)th iteration is the sum of the previous solution and small, unknown increments; that is, \(Z^{k+1}=Z^{k}+dZ^{k}\) and \(\textbf{V}^{k+1}=\textbf{V}^{k}+d\textbf{V}^{k}\), where \(d\textbf{V}^{k}=(du^{k},dv^{k},dw^{k})^{T}\).

The first step toward linearization is to approximate the nonlinear expressions given in Eq. (11) using a first-order Taylor expansion. We use \(\Delta_{i}^{k}\), \(\widehat{\Delta}_{i}^{k}\), and \(\Delta_{i}^{t, k}\) to denote the expressions given in Eq. (11), evaluated at the fixed values \(Z^{k}\) and \(\textbf{V}^{k}\). That is,

(30)

where \(\textbf {p}_{i}^{k} = \mathit{Proj}(\textbf {P}^{k}, M^{i})\) and \(\textbf{P}^{k}\) is obtained by substituting \(Z^{k}\) into Eq. (5). The expressions for \(\widehat{\textbf {p}}_{i}^{k}\) and \(\widehat{\textbf {P}}^{k}\) are given analogously. Using these notations, the first-order Taylor expansions of these expressions are:

$$ \begin{aligned} \Delta_i^{k+1} &\approx \Delta_i^{k} + \partial_Z \Delta_i^{k}\, dZ^{k}, \\ \widehat{\Delta}_i^{k+1} &\approx \widehat{\Delta}_i^{k} + \partial_Z \widehat{\Delta}_i^{k}\, dZ^{k} + \partial_u \widehat{\Delta}_i^{k}\, du^{k} + \partial_v \widehat{\Delta}_i^{k}\, dv^{k} + \partial_w \widehat{\Delta}_i^{k}\, dw^{k}, \\ \Delta_i^{t,k+1} &\approx \Delta_i^{t,k} + \partial_Z \Delta_i^{t,k}\, dZ^{k} + \partial_u \Delta_i^{t,k}\, du^{k} + \partial_v \Delta_i^{t,k}\, dv^{k} + \partial_w \Delta_i^{t,k}\, dw^{k}. \end{aligned} $$
(31)

Equation (31) is obtained by using the first-order Taylor expansions of the following expressions:

$$ I_i\bigl(\textbf{p}_i^{k+1},t\bigr) \approx I_i\bigl(\textbf{p}_i^{k},t\bigr) + \partial_Z I_i^{t}\, dZ^{k}, $$
(32)
$$ I_i\bigl(\,\widehat{\textbf{p}}_i^{k+1},t+1\bigr) \approx I_i\bigl(\,\widehat{\textbf{p}}_i^{k},t+1\bigr) + \partial_Z I_i^{t+1}\, dZ^{k} + \partial_u I_i^{t+1}\, du^{k} + \partial_v I_i^{t+1}\, dv^{k} + \partial_w I_i^{t+1}\, dw^{k}, $$
(33)

where the image derivatives are evaluated at the current estimate \((Z^{k},\textbf{V}^{k})\), and \(\textbf{P}^{k+1}=\textbf{P}^{k}+d\textbf{P}^{k}\) is obtained by substituting \(Z^{k}+dZ^{k}\) into Eq. (5). Similarly, \(\widehat{\textbf {P}}^{k+1}=\widehat{\textbf {P}}^{k} + d\widehat{\textbf {P}}^{k}\), where \(d\widehat{\textbf {P}}^{k}= d\textbf{P}^{k} + d\textbf {V}^{k}\). The computation of the image derivatives with respect to the 3D unknowns is detailed in Appendix B.

Therefore, deriving the associated Euler-Lagrange equations with respect to the unknown increments, \(dZ^{k}\) and \(du^{k}\), results in:

(34)
(35)

The dependence of the above two equations on the increments, \(dZ^{k}\) and \(du^{k}\), is obtained by substituting Eq. (31) into \(\Delta_{i}^{k+1}\), \(\Delta_{i}^{t, k+1}\), and \(\widehat{\Delta}_{i}^{k+1}\). The equations for \(dv^{k}\) and \(dw^{k}\) are similar to Eq. (35).

After applying the above approximations (Eq. (31)), the resulting Euler-Lagrange equations form a nonlinear system in the unknowns \(dZ^{k}\) and \(d\textbf{V}^{k}\). The remaining nonlinearity originates from Ψ′. Therefore, an additional fixed-point iteration loop over the Ψ′ expressions is performed. Finally, after standard discretization of the derivatives, a linear system of equations is obtained. The solution is computed with the successive over-relaxation (SOR) method (Young 1954).
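
For completeness, a compact dense-matrix illustration of SOR (Young 1954). The systems arising here are large and sparse, so a practical implementation would exploit the sparsity pattern; the relaxation factor `omega` below is an arbitrary illustrative choice, not a value from the paper.

```python
import numpy as np

def sor(A, b, omega=1.2, n_iter=500, tol=1e-10):
    """Successive over-relaxation for A x = b (dense sketch only).

    The sweep updates x in place: entries j < i already hold the new
    iterate, entries j > i still hold the old one, which is exactly the
    Gauss-Seidel splitting that SOR over-relaxes by omega.
    """
    x = np.zeros_like(b, dtype=float)
    D = np.diag(A)
    for _ in range(n_iter):
        x_prev = x.copy()
        for i in range(len(b)):
            sigma = A[i] @ x - D[i] * x[i]   # off-diagonal contribution
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / D[i]
        if np.linalg.norm(x - x_prev) < tol:
            break
    return x

# Quick check on a small symmetric positive-definite system:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(sor(A, b))   # close to np.linalg.solve(A, b) = (1/11, 7/11)
```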

Cite this article

Basha, T., Moses, Y. & Kiryati, N. Multi-view Scene Flow Estimation: A View Centered Variational Approach. Int J Comput Vis 101, 6–21 (2013). https://doi.org/10.1007/s11263-012-0542-7
