Abstract
Many hand-held or mixed reality devices are used with a single sensor for 3D reconstruction, although they often comprise multiple sensors. Multi-sensor depth fusion is able to substantially improve the robustness and accuracy of 3D reconstruction methods, but existing techniques are not robust enough to handle sensors that operate with diverse value ranges as well as differing noise and outlier statistics. To this end, we introduce SenFuNet, a depth fusion approach that learns sensor-specific noise and outlier statistics and combines the data streams of depth frames from different sensors in an online fashion. Our method fuses multi-sensor depth streams regardless of time synchronization and calibration and generalizes well with little training data. We conduct experiments with various sensor combinations on the real-world CoRBS and Scene3D datasets, as well as the Replica dataset. Experiments demonstrate that our fusion strategy outperforms traditional and recent online depth fusion approaches. In addition, the combination of multiple sensors yields more robust outlier handling and more precise surface reconstruction than the use of a single sensor. The source code and data are available at https://github.com/tfy14esa/SenFuNet.
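To make the pipeline in the abstract concrete, below is a minimal, hypothetical NumPy sketch of online two-sensor depth fusion: each sensor's frames are integrated into its own TSDF grid with a standard running average (in the style of Curless and Levoy), and the per-sensor grids are then blended with per-voxel confidences. All names are placeholders, and the count-based confidence is a hand-crafted stand-in for the per-voxel weights that SenFuNet predicts with a learned network; consult the linked repository for the actual implementation.

```python
import numpy as np

TRUNC = 0.05          # TSDF truncation distance in meters (illustrative)
GRID = (64, 64, 64)   # toy voxel grid resolution


class PerSensorTSDF:
    """Running-average TSDF integration for one sensor (Curless-Levoy style)."""

    def __init__(self):
        self.tsdf = np.zeros(GRID, dtype=np.float32)    # signed distances
        self.weight = np.zeros(GRID, dtype=np.float32)  # observation counts

    def integrate(self, sdf_obs, mask):
        """Fuse one frame: sdf_obs holds per-voxel signed distances,
        mask marks the voxels observed by this frame."""
        sdf_obs = np.clip(sdf_obs, -TRUNC, TRUNC)
        w_new = self.weight + mask.astype(np.float32)
        self.tsdf = np.where(
            mask,
            (self.tsdf * self.weight + sdf_obs) / np.maximum(w_new, 1e-6),
            self.tsdf,
        )
        self.weight = w_new


def fuse_sensors(a: PerSensorTSDF, b: PerSensorTSDF) -> np.ndarray:
    """Blend two per-sensor grids with per-voxel confidences. SenFuNet
    predicts these confidences with a network that has learned each
    sensor's noise and outlier statistics; here the accumulated
    observation counts serve as a crude hand-crafted stand-in."""
    alpha = a.weight / np.maximum(a.weight + b.weight, 1e-6)
    return alpha * a.tsdf + (1.0 - alpha) * b.tsdf


# Toy usage: integrate frames from two sensors with different noise levels.
rng = np.random.default_rng(0)
tof, stereo = PerSensorTSDF(), PerSensorTSDF()
mask = np.ones(GRID, dtype=bool)
for _ in range(10):
    tof.integrate(0.02 + 0.002 * rng.standard_normal(GRID), mask)
for _ in range(4):   # the second stream need not be time-synchronized
    stereo.integrate(0.02 + 0.02 * rng.standard_normal(GRID), mask)
fused = fuse_sensors(tof, stereo)
```

Because each sensor keeps its own grid, frames from either stream can be integrated whenever they arrive, which is why such a scheme does not require time synchronization between the sensors.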
Notes
1. Enforcing free space for voxels along the ray from the camera to the surface [36]. Note that outliers behind surfaces are not removed by this technique; see the sketch after these notes.
2. See the supplementary material for a definition.
3. Additionally, we tweak the original implementation to remove outliers; see the supplementary material.
4.
5.
6. Unfortunately, no suitable public real-world 3D dataset exists that comprises binocular stereo pairs, an active depth sensor, and ground-truth geometry.
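For intuition on note 1, here is a hypothetical NumPy sketch of ray-based free-space carving; the fixed-step ray march and all names are illustrative, not the exact procedure of [36]. Because the march stops at the observed surface, voxels behind it are never revisited, which is exactly why outliers behind surfaces survive this carving.

```python
import numpy as np


def carve_free_space(free_count, origin, surface_pt, voxel_size, step=0.5):
    """Record free-space observations for voxels on the ray from the camera
    origin to the observed surface point (illustrative helper). The march
    stops at the surface, so voxels *behind* it are never touched; indices
    outside the grid are simply skipped."""
    direction = surface_pt - origin
    dist = float(np.linalg.norm(direction))
    direction = direction / dist
    t = 0.0
    while t < dist:
        idx = tuple(((origin + t * direction) / voxel_size).astype(int))
        if all(0 <= i < s for i, s in zip(idx, free_count.shape)):
            free_count[idx] += 1.0  # one more "empty" observation
        t += step * voxel_size


# Toy usage on a small grid.
grid = np.zeros((64, 64, 64), dtype=np.float32)
carve_free_space(grid,
                 origin=np.array([0.1, 0.1, 0.1]),
                 surface_pt=np.array([1.5, 1.2, 0.9]),
                 voxel_size=0.05)
```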
References
Agresti, G., Minto, L., Marin, G., Zanuttigh, P.: Deep learning for confidence information in stereo and ToF data fusion. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 697–705 (2017)
Agresti, G., Minto, L., Marin, G., Zanuttigh, P.: Stereo and ToF data fusion by learning from synthetic data. Inf. Fusion 49, 161–173 (2019)
Ali, M.K., Rajput, A., Shahzad, M., Khan, F., Akhtar, F., Börner, A.: Multi-sensor depth fusion framework for real-time 3d reconstruction. IEEE Access 7, 136471–136480 (2019)
Božič, A., Palafox, P., Thies, J., Dai, A., Nießner, M.: TransformerFusion: monocular RGB scene reconstruction using transformers. arXiv preprint arXiv:2107.02191 (2021)
Bylow, E., Olsson, C., Kahl, F.: Robust online 3d reconstruction combining a depth sensor and sparse feature points. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 3709–3714 (2016)
Bylow, E., Maier, R., Kahl, F., Olsson, C.: Combining depth fusion and photometric stereo for fine-detailed 3d models. In: Felsberg, M., Forssén, P.-E., Sintorn, I.-M., Unger, J. (eds.) SCIA 2019. LNCS, vol. 11482, pp. 261–274. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20205-7_22
Cao, Y.P., Kobbelt, L., Hu, S.M.: Real-time high-accuracy three-dimensional reconstruction with consumer RGB-D cameras. ACM Trans. Graph. (TOG) 37(5), 1–16 (2018)
Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5410–5418 (2018)
Choe, J., Im, S., Rameau, F., Kang, M., Kweon, I.S.: VolumeFusion: deep depth fusion for 3d scene reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 16086–16095, October 2021
Choi, O., Lee, S.: Fusion of time-of-flight and stereo for disambiguation of depth measurements. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7727, pp. 640–653. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37447-0_49
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 303–312 (1996)
Dai, A., Nießner, M., Zollhöfer, M., Izadi, S., Theobalt, C.: BundleFusion: real-time globally consistent 3d reconstruction using on-the-fly surface reintegration. ACM Trans. Graph. (ToG) 36(4), 1 (2017)
Dal Mutto, C., Zanuttigh, P., Cortelazzo, G.M.: Probabilistic ToF and stereo data fusion based on mixed pixels measurement models. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2260–2272 (2015)
Deng, Y., Xiao, J., Zhou, S.Z.: ToF and stereo data fusion using dynamic search range stereo matching. IEEE Trans. Multimedia 24, 2739–2751 (2021)
Dong, W., Wang, Q., Wang, X., Zha, H.: PSDF fusion: probabilistic signed distance function for on-the-fly 3d data fusion and scene reconstruction. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 701–717 (2018)
Duan, Y., Pei, M., Wang, Y.: Probabilistic depth map fusion of Kinect and stereo in real-time. In: 2012 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 2317–2322. IEEE (2012)
Duan, Y., Pei, M., Wang, Y., Yang, M., Qin, I., Jia, Y.: A unified probabilistic framework for real-time depth map fusion. J. Inf. Sci. Eng. 31(4), 1309–1327 (2015)
Evangelidis, G.D., Hansard, M., Horaud, R.: Fusion of range and stereo data for high-resolution scene-modeling. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2178–2192 (2015)
Golodetz, S., Cavallari, T., Lord, N.A., Prisacariu, V.A., Murray, D.W., Torr, P.H.: Collaborative large-scale dense 3d reconstruction with online inter-agent pose optimisation. IEEE Trans. Visual. Comput. Graph. 24(11), 2895–2905 (2018)
Gu, P., et al.: A 3d reconstruction method using multisensor fusion in large-scale indoor scenes. Complexity 2020 (2020)
Handa, A., Whelan, T., McDonald, J., Davison, A.J.: A benchmark for RGB-D visual odometry, 3d reconstruction and SLAM. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1524–1531. IEEE (2014)
Hirschmüller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)
Huang, J., Huang, S.S., Song, H., Hu, S.M.: DI-Fusion: online implicit 3d reconstruction with deep priors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8932–8941 (2021)
Izadi, S., et al.: KinectFusion: real-time 3d reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th annual ACM Symposium on User Interface Software and Technology, pp. 559–568. ACM (2011)
Kähler, O., Prisacariu, V.A., Ren, C.Y., Sun, X., Torr, P.H.S., Murray, D.W.: Very high frame rate volumetric integration of depth images on mobile devices. IEEE Trans. Vis. Comput. Graph. 21(11), 1241–1250 (2015)
Kazhdan, M., Hoppe, H.: Screened poisson surface reconstruction. ACM Trans. Graph. (ToG) 32(3), 1–13 (2013)
Kim, Y.M., Theobalt, C., Diebel, J., Kosecka, J., Miscusik, B., Thrun, S.: Multi-view image and ToF sensor fusion for dense 3d reconstruction. In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pp. 1542–1549. IEEE (2009)
Lefloch, D., Weyrich, T., Kolb, A.: Anisotropic point-based fusion. In: 2015 18th International Conference on Information Fusion (Fusion), pp. 2121–2128. IEEE (2015)
Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3d surface construction algorithm. ACM SIGGRAPH Comput. Graph. 21(4), 163–169 (1987)
Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)
Maddern, W., Newman, P.: Real-time probabilistic fusion of sparse 3d lidar and dense stereo. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2181–2188. IEEE (2016)
Savva, M., et al.: Habitat: a platform for embodied AI research. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Marin, G., Zanuttigh, P., Mattoccia, S.: Reliable fusion of ToF and stereo depth driven by confidence measures. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 386–401. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_24
Martins, D., Van Hecke, K., De Croon, G.: Fusion of stereo and still monocular depth estimates in a self-supervised learning context. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 849–856. IEEE (2018)
Murez, Z., van As, T., Bartolozzi, J., Sinha, A., Badrinarayanan, V., Rabinovich, A.: Atlas: end-to-end 3D scene reconstruction from posed images. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 414–431. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_25
Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. In: ISMAR, vol. 11, pp. 127–136 (2011)
Newcombe, R.A., Lovegrove, S.J., Davison, A.J.: DTAM: dense tracking and mapping in real-time. In: ICCV (2011)
Nießner, M., Zollhöfer, M., Izadi, S., Stamminger, M.: Real-time 3d reconstruction at scale using voxel hashing. ACM Trans. Graph. (TOG) 32 (2013). https://doi.org/10.1145/2508363.2508374
Oleynikova, H., Taylor, Z., Fehr, M., Siegwart, R., Nieto, J.I.: Voxblox: incremental 3d euclidean signed distance fields for on-board MAV planning. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017, Vancouver, BC, Canada, 24–28 September 2017, pp. 1366–1373. IEEE (2017). https://doi.org/10.1109/IROS.2017.8202315
Park, K., Kim, S., Sohn, K.: High-precision depth estimation with the 3d lidar and stereo fusion. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 2156–2163. IEEE (2018)
Patil, V., Van Gansbeke, W., Dai, D., Van Gool, L.: Don’t forget the past: recurrent depth estimation from monocular video. IEEE Robot. Autom. Lett. 5(4), 6813–6820 (2020)
Poggi, M., Mattoccia, S.: Deep stereo fusion: combining multiple disparity hypotheses with deep-learning. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 138–147. IEEE (2016)
Pu, C., Song, R., Tylecek, R., Li, N., Fisher, R.B.: SDF-MAN: semi-supervised disparity fusion with multi-scale adversarial networks. Remote Sens. 11(5), 487 (2019)
Qiu, J., et al.: DeepLiDAR: deep surface normal guided depth prediction for outdoor scene from sparse lidar data and single color image. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019, pp. 3313–3322. Computer Vision Foundation/IEEE (2019). https://doi.org/10.1109/CVPR.2019.00343
Rozumnyi, D., Cherabier, I., Pollefeys, M., Oswald, M.R.: Learned semantic multi-sensor depth map fusion. In: International Conference on Computer Vision Workshops (ICCVW), Workshop on 3D Reconstruction in the Wild, Seoul, South Korea (2019)
Schönberger, J.L., Zheng, E., Frahm, J.-M., Pollefeys, M.: Pixelwise view selection for unstructured multi-view stereo. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 501–518. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_31
Schöps, T., Sattler, T., Pollefeys, M.: BAD SLAM: bundle adjusted direct RGB-D SLAM. In: CVPR (2019)
Steinbrucker, F., Kerl, C., Cremers, D., Sturm, J.: Large-scale multi-resolution surface reconstruction from RGB-D sequences. In: 2013 IEEE International Conference on Computer Vision, pp. 3264–3271 (2013)
Straub, J., et al.: The replica dataset: a digital replica of indoor spaces. arXiv preprint arXiv:1906.05797 (2019)
Sucar, E., Liu, S., Ortiz, J., Davison, A.: iMAP: implicit mapping and positioning in real-time. In: Proceedings of the IEEE International Conference on Computer Vision (2021)
Sun, J., Xie, Y., Chen, L., Zhou, X., Bao, H.: NeuralRecon: real-time coherent 3d reconstruction from monocular video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15598–15607 (2021)
Van Baar, J., Beardsley, P., Pollefeys, M., Gross, M.: Sensor fusion for depth estimation, including ToF and thermal sensors. In: 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pp. 472–478. IEEE (2012)
Wasenmüller, O., Meyer, M., Stricker, D.: CoRBS: comprehensive RGB-D benchmark for SLAM using Kinect v2. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–7. IEEE (2016)
Weder, S., Schönberger, J.L., Pollefeys, M., Oswald, M.R.: RoutedFusion: learning real-time depth map fusion. arXiv preprint arXiv:2001.04388 (2020)
Weder, S., Schönberger, J.L., Pollefeys, M., Oswald, M.R.: NeuralFusion: online depth fusion in latent space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3162–3172 (2021)
Yan, Z., Tian, Y., Shi, X., Guo, P., Wang, P., Zha, H.: Continual neural mapping: learning an implicit scene representation from sequential observations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15782–15792, October 2021
Yang, S., et al.: Noise-resilient reconstruction of panoramas and 3d scenes using robot-mounted unsynchronized commodity RGB-D cameras. ACM Trans. Graph. (TOG) 39(5), 1–15 (2020)
Yang, S., Li, B., Liu, M., Lai, Y.K., Kobbelt, L., Hu, S.M.: HeteroFusion: dense scene reconstruction integrating multi-sensors. IEEE Trans. Visual. Comput. Graph. 26(11), 3217–3230 (2019)
Zhou, Q.Y., Koltun, V.: Dense scene reconstruction with points of interest. ACM Trans. Graph. (ToG) 32(4), 1–8 (2013)
Zhu, Z., et al.: NICE-SLAM: neural implicit scalable encoding for SLAM. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12786–12796 (2022)
Zollhöfer, M., et al.: State of the art on 3d reconstruction with RGB-D cameras. In: Computer Graphics Forum, vol. 37, pp. 625–652. Wiley Online Library (2018)
Acknowledgements
This work was supported by the Google Focused Research Award 2019-HE-318, 2019-HE-323, 2020-FS-351, 2020-HS-411, as well as by research grants from FIFA and Toshiba. We further thank Hugo Sellerberg for helping with video editing.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sandström, E. et al. (2022). Learning Online Multi-sensor Depth Fusion. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_6
DOI: https://doi.org/10.1007/978-3-031-19824-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3
eBook Packages: Computer Science, Computer Science (R0)