Scale Estimation of Monocular SfM for a Multi-modal Stereo Camera

  • Conference paper
Computer Vision – ACCV 2018 (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11363)

Abstract

This paper proposes a novel method for estimating the absolute scale of monocular SfM for a multi-modal stereo camera. In computer vision and robotics, scale estimation for monocular SfM has been widely investigated as a way to simplify systems. This paper addresses the scale estimation problem for a stereo camera system in which the two cameras capture images in different spectral bands (e.g., RGB and far-infrared (FIR)), whose feature points are difficult to match directly using descriptors. Furthermore, the number of matching points between FIR images can be comparatively small, owing to the low resolution of FIR images and the lack of thermal texture in scenes. To cope with these difficulties, the proposed method estimates the scale parameter using batch optimization based on the epipolar constraint of a small number of feature correspondences between the invisible-light (FIR) images. The accuracy and numerical stability of the proposed method are verified through experiments on synthetic and real images.
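The abstract says that the scale is recovered from the epipolar constraint between FIR images. The sketch below illustrates one way such a constraint can depend on the scale; it is not the authors' implementation, and every name in it (`skew`, `rodrigues`, `fir_relative_pose_terms`, `estimate_scale`, `project_fir`, `R_rf`, `t_rf`) is hypothetical. It assumes that the RGB camera poses come from monocular SfM as world-to-camera rotations and translations `(R_k, t_k)` whose translations are known only up to a global scale `s`, that the RGB-to-FIR extrinsics `(R_rf, t_rf)` are fixed and metric (e.g., from calibration), and that FIR matches are given in normalized homogeneous coordinates. Under these assumptions the FIR relative translation is affine in `s`, so the algebraic epipolar residual is linear in `s` and this simplified cost has a closed-form least-squares minimizer. The paper's batch optimization may use a different error measure and solver.

```python
import numpy as np


def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rodrigues(w):
    """Rotation matrix for the axis-angle vector w (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K


def fir_relative_pose_terms(R_i, t_i, R_j, t_j, R_rf, t_rf):
    """Return R_ij, a, b such that the metric FIR relative translation is s * a + b.

    (R_k, t_k): world-to-RGB pose of frame k from monocular SfM, translation up to scale s.
    (R_rf, t_rf): assumed fixed metric RGB-to-FIR extrinsics, x_fir = R_rf @ x_rgb + t_rf.
    """
    R_ij = R_rf @ R_j @ R_i.T @ R_rf.T
    a = R_rf @ t_j - R_ij @ (R_rf @ t_i)
    b = t_rf - R_ij @ t_rf
    return R_ij, a, b


def estimate_scale(pairs, R_rf, t_rf):
    """Closed-form scale minimizing the summed squared algebraic epipolar residual.

    pairs: iterable of (R_i, t_i, R_j, t_j, y_i, y_j); y_i, y_j are (N, 3) matched
    FIR points in normalized homogeneous coordinates (rows correspond).
    """
    num, den = 0.0, 0.0
    for R_i, t_i, R_j, t_j, y_i, y_j in pairs:
        R_ij, a, b = fir_relative_pose_terms(R_i, t_i, R_j, t_j, R_rf, t_rf)
        # y_j^T [s a + b]_x R_ij y_i = s * p + q for every match.
        p = np.einsum("nk,kl,nl->n", y_j, skew(a) @ R_ij, y_i)
        q = np.einsum("nk,kl,nl->n", y_j, skew(b) @ R_ij, y_i)
        num += p @ q
        den += p @ p
    # Setting d/ds sum (s p + q)^2 = 0 gives s = -sum(p q) / sum(p^2).
    return -num / den


# Synthetic check with a hypothetical rig and scene (noise-free matches).
rng = np.random.default_rng(0)
true_s = 2.5
R_rf = rodrigues(np.array([0.01, -0.02, 0.03]))
t_rf = np.array([0.1, 0.0, 0.0])  # assumed metric RGB-to-FIR baseline
X = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 8.0], size=(50, 3))
poses = [(rodrigues(rng.normal(scale=0.1, size=3)), rng.normal(scale=0.3, size=3))
         for _ in range(4)]  # metric world-to-RGB poses


def project_fir(R, t, X):
    """Project world points into the FIR camera as normalized homogeneous points."""
    x_fir = (R_rf @ (R @ X.T + t[:, None]) + t_rf[:, None]).T
    return x_fir / x_fir[:, 2:3]


pairs = []
for (R_i, t_i), (R_j, t_j) in zip(poses[:-1], poses[1:]):
    # Monocular SfM recovers translations only up to scale: t_sfm = t_metric / true_s.
    pairs.append((R_i, t_i / true_s, R_j, t_j / true_s,
                  project_fir(R_i, t_i, X), project_fir(R_j, t_j, X)))

s_hat = estimate_scale(pairs, R_rf, t_rf)
print(f"estimated scale {s_hat:.6f}, true scale {true_s}")
```

In this model the constant term `b = (I - R_ij) t_rf` vanishes when the rig does not rotate between two frames, so such frame pairs carry no scale information in this sketch. With noisy matches one would likely replace the algebraic residual by a geometric error (for example, the Sampson distance) and use a robust nonlinear solver.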

Acknowledgements

This research is supported by the Hori Sciences & Arts Foundation, the New Energy and Industrial Technology Development Organization (NEDO) and JSPS KAKENHI Grant Number 18K18071.

Author information

Correspondence to Shinya Sumikura.

Electronic supplementary material

The following electronic supplementary material accompanies the paper.

Supplementary material 1 (PDF, 15031 KB)

Supplementary material 2 (MP4, 30669 KB)

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sumikura, S., Sakurada, K., Kawaguchi, N., Nakamura, R. (2019). Scale Estimation of Monocular SfM for a Multi-modal Stereo Camera. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11363. Springer, Cham. https://doi.org/10.1007/978-3-030-20893-6_18

  • DOI: https://doi.org/10.1007/978-3-030-20893-6_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20892-9

  • Online ISBN: 978-3-030-20893-6

  • eBook Packages: Computer Science, Computer Science (R0)