Abstract
Many hand-held or mixed reality devices are used with a single sensor for 3D reconstruction, although they often comprise multiple sensors. Multi-sensor depth fusion is able to substantially improve the robustness and accuracy of 3D reconstruction methods, but existing techniques are not robust enough to handle sensors that operate with diverse value ranges as well as differing noise and outlier statistics. To this end, we introduce SenFuNet, a depth fusion approach that learns sensor-specific noise and outlier statistics and combines the data streams of depth frames from different sensors in an online fashion. Our method fuses multi-sensor depth streams regardless of time synchronization and calibration and generalizes well with little training data. We conduct experiments with various sensor combinations on the real-world CoRBS and Scene3D datasets, as well as the Replica dataset. Experiments demonstrate that our fusion strategy outperforms traditional and recent online depth fusion approaches. In addition, the combination of multiple sensors yields more robust outlier handling and more precise surface reconstruction than the use of a single sensor. The source code and data are available at https://github.com/tfy14esa/SenFuNet.
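To make the pipeline in the abstract concrete, below is a minimal, hypothetical NumPy sketch of online two-sensor depth fusion: each sensor's frames are integrated into its own TSDF grid with a standard running average (in the style of Curless and Levoy), and the per-sensor grids are then blended with per-voxel confidences. All names are placeholders, and the count-based confidence is a hand-crafted stand-in for the per-voxel weights that SenFuNet predicts with a learned network; consult the linked repository for the actual implementation.

```python
import numpy as np

TRUNC = 0.05          # TSDF truncation distance in meters (illustrative)
GRID = (64, 64, 64)   # toy voxel grid resolution


class PerSensorTSDF:
    """Running-average TSDF integration for one sensor (Curless-Levoy style)."""

    def __init__(self):
        self.tsdf = np.zeros(GRID, dtype=np.float32)    # signed distances
        self.weight = np.zeros(GRID, dtype=np.float32)  # observation counts

    def integrate(self, sdf_obs, mask):
        """Fuse one frame: sdf_obs holds per-voxel signed distances,
        mask marks the voxels observed by this frame."""
        sdf_obs = np.clip(sdf_obs, -TRUNC, TRUNC)
        w_new = self.weight + mask.astype(np.float32)
        self.tsdf = np.where(
            mask,
            (self.tsdf * self.weight + sdf_obs) / np.maximum(w_new, 1e-6),
            self.tsdf,
        )
        self.weight = w_new


def fuse_sensors(a: PerSensorTSDF, b: PerSensorTSDF) -> np.ndarray:
    """Blend two per-sensor grids with per-voxel confidences. SenFuNet
    predicts these confidences with a network that has learned each
    sensor's noise and outlier statistics; here the accumulated
    observation counts serve as a crude hand-crafted stand-in."""
    alpha = a.weight / np.maximum(a.weight + b.weight, 1e-6)
    return alpha * a.tsdf + (1.0 - alpha) * b.tsdf


# Toy usage: integrate frames from two sensors with different noise levels.
rng = np.random.default_rng(0)
tof, stereo = PerSensorTSDF(), PerSensorTSDF()
mask = np.ones(GRID, dtype=bool)
for _ in range(10):
    tof.integrate(0.02 + 0.002 * rng.standard_normal(GRID), mask)
for _ in range(4):   # the second stream need not be time-synchronized
    stereo.integrate(0.02 + 0.02 * rng.standard_normal(GRID), mask)
fused = fuse_sensors(tof, stereo)
```

Because each sensor keeps its own grid, frames from either stream can be integrated whenever they arrive, which is why such a scheme does not require time synchronization between the sensors.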
Notes
1. Enforcing free space for voxels along the ray from the camera to the surface [36]. Note that outliers behind surfaces are not removed by this technique; see the sketch after these notes.
2. See the supplementary material for a definition.
3. Additionally, we tweak the original implementation to remove outliers; see the supplementary material.
4.
5.
6. Unfortunately, no suitable public real-world 3D dataset exists that comprises binocular stereo pairs, an active depth sensor, and ground-truth geometry.
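For intuition on note 1, here is a hypothetical NumPy sketch of ray-based free-space carving; the fixed-step ray march and all names are illustrative, not the exact procedure of [36]. Because the march stops at the observed surface, voxels behind it are never revisited, which is exactly why outliers behind surfaces survive this carving.

```python
import numpy as np


def carve_free_space(free_count, origin, surface_pt, voxel_size, step=0.5):
    """Record free-space observations for voxels on the ray from the camera
    origin to the observed surface point (illustrative helper). The march
    stops at the surface, so voxels *behind* it are never touched; indices
    outside the grid are simply skipped."""
    direction = surface_pt - origin
    dist = float(np.linalg.norm(direction))
    direction = direction / dist
    t = 0.0
    while t < dist:
        idx = tuple(((origin + t * direction) / voxel_size).astype(int))
        if all(0 <= i < s for i, s in zip(idx, free_count.shape)):
            free_count[idx] += 1.0  # one more "empty" observation
        t += step * voxel_size


# Toy usage on a small grid.
grid = np.zeros((64, 64, 64), dtype=np.float32)
carve_free_space(grid,
                 origin=np.array([0.1, 0.1, 0.1]),
                 surface_pt=np.array([1.5, 1.2, 0.9]),
                 voxel_size=0.05)
```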
References
Agresti, G., Minto, L., Marin, G., Zanuttigh, P.: Deep learning for confidence information in stereo and ToF data fusion. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 697–705 (2017)
Agresti, G., Minto, L., Marin, G., Zanuttigh, P.: Stereo and ToF data fusion by learning from synthetic data. Inf. Fusion 49, 161–173 (2019)
Ali, M.K., Rajput, A., Shahzad, M., Khan, F., Akhtar, F., Börner, A.: Multi-sensor depth fusion framework for real-time 3d reconstruction. IEEE Access 7, 136471–136480 (2019)
Božič, A., Palafox, P., Thies, J., Dai, A., Nießner, M.: TransformerFusion: monocular RGB scene reconstruction using transformers. arXiv preprint arXiv:2107.02191 (2021)
Bylow, E., Olsson, C., Kahl, F.: Robust online 3d reconstruction combining a depth sensor and sparse feature points. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 3709–3714 (2016)
Bylow, E., Maier, R., Kahl, F., Olsson, C.: Combining depth fusion and photometric stereo for fine-detailed 3d models. In: Felsberg, M., Forssén, P.-E., Sintorn, I.-M., Unger, J. (eds.) SCIA 2019. LNCS, vol. 11482, pp. 261–274. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20205-7_22
Cao, Y.P., Kobbelt, L., Hu, S.M.: Real-time high-accuracy three-dimensional reconstruction with consumer RGB-D cameras. ACM Trans. Graph. (TOG) 37(5), 1–16 (2018)
Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5410–5418 (2018)
Choe, J., Im, S., Rameau, F., Kang, M., Kweon, I.S.: VolumeFusion: deep depth fusion for 3d scene reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 16086–16095, October 2021
Choi, O., Lee, S.: Fusion of time-of-flight and stereo for disambiguation of depth measurements. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7727, pp. 640–653. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37447-0_49
Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pp. 303–312 (1996)
Dai, A., Nießner, M., Zollhöfer, M., Izadi, S., Theobalt, C.: BundleFusion: real-time globally consistent 3d reconstruction using on-the-fly surface reintegration. ACM Trans. Graph. (ToG) 36(4), 1 (2017)
Dal Mutto, C., Zanuttigh, P., Cortelazzo, G.M.: Probabilistic ToF and stereo data fusion based on mixed pixels measurement models. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2260–2272 (2015)
Deng, Y., Xiao, J., Zhou, S.Z.: ToF and stereo data fusion using dynamic search range stereo matching. IEEE Trans. Multimedia 24, 2739–2751 (2021)
Dong, W., Wang, Q., Wang, X., Zha, H.: PSDF fusion: probabilistic signed distance function for on-the-fly 3d data fusion and scene reconstruction. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 701–717 (2018)
Duan, Y., Pei, M., Wang, Y.: Probabilistic depth map fusion of Kinect and stereo in real-time. In: 2012 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 2317–2322. IEEE (2012)
Duan, Y., Pei, M., Wang, Y., Yang, M., Qin, I., Jia, Y.: A unified probabilistic framework for real-time depth map fusion. J. Inf. Sci. Eng. 31(4), 1309–1327 (2015)
Evangelidis, G.D., Hansard, M., Horaud, R.: Fusion of range and stereo data for high-resolution scene-modeling. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2178–2192 (2015)
Golodetz, S., Cavallari, T., Lord, N.A., Prisacariu, V.A., Murray, D.W., Torr, P.H.: Collaborative large-scale dense 3d reconstruction with online inter-agent pose optimisation. IEEE Trans. Visual. Comput. Graph. 24(11), 2895–2905 (2018)
Gu, P., et al.: A 3d reconstruction method using multisensor fusion in large-scale indoor scenes. Complexity 2020 (2020)
Handa, A., Whelan, T., McDonald, J., Davison, A.J.: A benchmark for RGB-D visual odometry, 3d reconstruction and SLAM. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1524–1531. IEEE (2014)
Hirschmüller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 328–341 (2008)
Huang, J., Huang, S.S., Song, H., Hu, S.M.: DI-Fusion: online implicit 3d reconstruction with deep priors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8932–8941 (2021)
Izadi, S., et al.: KinectFusion: real-time 3d reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th annual ACM Symposium on User Interface Software and Technology, pp. 559–568. ACM (2011)
Kähler, O., Prisacariu, V.A., Ren, C.Y., Sun, X., Torr, P.H.S., Murray, D.W.: Very high frame rate volumetric integration of depth images on mobile devices. IEEE Trans. Vis. Comput. Graph. 21(11), 1241–1250 (2015)
Kazhdan, M., Hoppe, H.: Screened poisson surface reconstruction. ACM Trans. Graph. (ToG) 32(3), 1–13 (2013)
Kim, Y.M., Theobalt, C., Diebel, J., Kosecka, J., Miscusik, B., Thrun, S.: Multi-view image and ToF sensor fusion for dense 3d reconstruction. In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pp. 1542–1549. IEEE (2009)
Lefloch, D., Weyrich, T., Kolb, A.: Anisotropic point-based fusion. In: 2015 18th International Conference on Information Fusion (Fusion), pp. 2121–2128. IEEE (2015)
Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3d surface construction algorithm. ACM SIGGRAPH Comput. Graph. 21(4), 163–169 (1987)
Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)
Maddern, W., Newman, P.: Real-time probabilistic fusion of sparse 3d lidar and dense stereo. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2181–2188. IEEE (2016)
Savva, M., et al.: Habitat: a platform for embodied AI research. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Marin, G., Zanuttigh, P., Mattoccia, S.: Reliable fusion of ToF and stereo depth driven by confidence measures. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 386–401. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_24
Martins, D., Van Hecke, K., De Croon, G.: Fusion of stereo and still monocular depth estimates in a self-supervised learning context. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 849–856. IEEE (2018)
Murez, Z., van As, T., Bartolozzi, J., Sinha, A., Badrinarayanan, V., Rabinovich, A.: Atlas: end-to-end 3D scene reconstruction from posed images. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 414–431. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_25
Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. In: ISMAR, vol. 11, pp. 127–136 (2011)
Newcombe, R.A., Lovegrove, S.J., Davison, A.J.: DTAM: dense tracking and mapping in real-time. In: ICCV (2011)
Nießner, M., Zollhöfer, M., Izadi, S., Stamminger, M.: Real-time 3d reconstruction at scale using voxel hashing. ACM Trans. Graph. (TOG) 32 (2013). https://doi.org/10.1145/2508363.2508374
Oleynikova, H., Taylor, Z., Fehr, M., Siegwart, R., Nieto, J.I.: Voxblox: incremental 3d euclidean signed distance fields for on-board MAV planning. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017, Vancouver, BC, Canada, 24–28 September 2017, pp. 1366–1373. IEEE (2017). https://doi.org/10.1109/IROS.2017.8202315
Park, K., Kim, S., Sohn, K.: High-precision depth estimation with the 3d lidar and stereo fusion. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 2156–2163. IEEE (2018)
Patil, V., Van Gansbeke, W., Dai, D., Van Gool, L.: Don’t forget the past: recurrent depth estimation from monocular video. IEEE Robot. Autom. Lett. 5(4), 6813–6820 (2020)
Poggi, M., Mattoccia, S.: Deep stereo fusion: combining multiple disparity hypotheses with deep-learning. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 138–147. IEEE (2016)
Pu, C., Song, R., Tylecek, R., Li, N., Fisher, R.B.: SDF-MAN: semi-supervised disparity fusion with multi-scale adversarial networks. Remote Sens. 11(5), 487 (2019)
Qiu, J., et al.: DeepLiDAR: deep surface normal guided depth prediction for outdoor scene from sparse lidar data and single color image. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019, pp. 3313–3322. Computer Vision Foundation/IEEE (2019). https://doi.org/10.1109/CVPR.2019.00343
Rozumnyi, D., Cherabier, I., Pollefeys, M., Oswald, M.R.: Learned semantic multi-sensor depth map fusion. In: International Conference on Computer Vision Workshops (ICCVW), Workshop on 3D Reconstruction in the Wild, Seoul, South Korea (2019)
Schönberger, J.L., Zheng, E., Frahm, J.-M., Pollefeys, M.: Pixelwise view selection for unstructured multi-view stereo. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 501–518. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_31
Schöps, T., Sattler, T., Pollefeys, M.: BAD SLAM: bundle adjusted direct RGB-D SLAM. In: CVPR (2019)
Steinbrucker, F., Kerl, C., Cremers, D., Sturm, J.: Large-scale multi-resolution surface reconstruction from RGB-D sequences. In: 2013 IEEE International Conference on Computer Vision, pp. 3264–3271 (2013)
Straub, J., et al.: The replica dataset: a digital replica of indoor spaces. arXiv preprint arXiv:1906.05797 (2019)
Sucar, E., Liu, S., Ortiz, J., Davison, A.: iMAP: implicit mapping and positioning in real-time. In: Proceedings of the IEEE International Conference on Computer Vision (2021)
Sun, J., Xie, Y., Chen, L., Zhou, X., Bao, H.: NeuralRecon: real-time coherent 3d reconstruction from monocular video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15598–15607 (2021)
Van Baar, J., Beardsley, P., Pollefeys, M., Gross, M.: Sensor fusion for depth estimation, including ToF and thermal sensors. In: 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pp. 472–478. IEEE (2012)
Wasenmüller, O., Meyer, M., Stricker, D.: CoRBS: comprehensive RGB-D benchmark for SLAM using Kinect v2. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–7. IEEE (2016)
Weder, S., Schönberger, J.L., Pollefeys, M., Oswald, M.R.: RoutedFusion: learning real-time depth map fusion. arXiv preprint arXiv:2001.04388 (2020)
Weder, S., Schönberger, J.L., Pollefeys, M., Oswald, M.R.: NeuralFusion: online depth fusion in latent space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3162–3172 (2021)
Yan, Z., Tian, Y., Shi, X., Guo, P., Wang, P., Zha, H.: Continual neural mapping: learning an implicit scene representation from sequential observations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15782–15792, October 2021
Yang, S., et al.: Noise-resilient reconstruction of panoramas and 3d scenes using robot-mounted unsynchronized commodity RGB-D cameras. ACM Trans. Graph. (TOG) 39(5), 1–15 (2020)
Yang, S., Li, B., Liu, M., Lai, Y.K., Kobbelt, L., Hu, S.M.: HeteroFusion: dense scene reconstruction integrating multi-sensors. IEEE Trans. Visual. Comput. Graph. 26(11), 3217–3230 (2019)
Zhou, Q.Y., Koltun, V.: Dense scene reconstruction with points of interest. ACM Trans. Graph. (ToG) 32(4), 1–8 (2013)
Zhu, Z., et al.: NICE-SLAM: neural implicit scalable encoding for SLAM. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12786–12796 (2022)
Zollhöfer, M., et al.: State of the art on 3d reconstruction with RGB-D cameras. In: Computer Graphics Forum, vol. 37, pp. 625–652. Wiley Online Library (2018)
Acknowledgements
This work was supported by the Google Focused Research Award 2019-HE-318, 2019-HE-323, 2020-FS-351, 2020-HS-411, as well as by research grants from FIFA and Toshiba. We further thank Hugo Sellerberg for helping with video editing.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sandström, E. et al. (2022). Learning Online Multi-sensor Depth Fusion. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_6
DOI: https://doi.org/10.1007/978-3-031-19824-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3
eBook Packages: Computer Science, Computer Science (R0)