
An Implicit Neural Representation for the Image Stack: Depth, All in Focus, and High Dynamic Range

Published: 05 December 2023

Abstract

In everyday photography, physical limitations of camera sensors and lenses frequently lead to a variety of degradations in captured images such as saturation or defocus blur. A common approach to overcome these limitations is to resort to image stack fusion, which involves capturing multiple images with different focal distances or exposures. For instance, to obtain an all-in-focus image, a set of multi-focus images is captured. Similarly, capturing multiple exposures allows for the reconstruction of a high-dynamic-range image. In this paper, we present a novel approach that combines neural fields with an expressive camera model to achieve a unified reconstruction of an all-in-focus high-dynamic-range image from an image stack. Our approach is composed of a set of specialized implicit neural representations tailored to address specific sub-problems along our pipeline: we use neural implicits to predict flow, which overcomes misalignments arising from lens breathing; depth and an all-in-focus image, which account for depth of field; and a tonemapping, which handles sensor responses and saturation. All components are trained using a physically inspired supervision structure with a differentiable thin lens model at its core. An important benefit of our approach is its ability to handle these tasks simultaneously or independently, providing flexible post-editing capabilities such as refocusing and exposure adjustment. By sampling the three primary factors in photography within our framework (focal distance, aperture, and exposure time), we conduct a thorough exploration to gain valuable insights into their significance and impact on overall reconstruction quality. Through extensive validation, we demonstrate that our method outperforms existing approaches in both depth-from-defocus and all-in-focus image reconstruction tasks. Moreover, our approach exhibits promising results in each of these three dimensions, showcasing its potential to enhance captured image quality and provide greater control in post-processing.
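To make the role of the differentiable thin lens model concrete, the following sketch (not the authors' code; the function name, parameter names, and default values are illustrative assumptions) computes the per-pixel circle of confusion that a standard thin-lens defocus model predicts from scene depth, focal length, f-number, and focal distance.

```python
# Minimal sketch of a thin-lens circle-of-confusion computation, assuming
# metric units and a simple pinhole-plus-thin-lens camera; all names and
# default values below are hypothetical, not the paper's implementation.
import numpy as np

def circle_of_confusion(depth, focal_length=0.05, f_number=2.8, focus_dist=1.5):
    """Blur-circle diameter on the sensor (metres) under the thin-lens model."""
    aperture = focal_length / f_number  # entrance-pupil diameter
    return (aperture * np.abs(depth - focus_dist) / depth
            * focal_length / (focus_dist - focal_length))

depths = np.array([0.5, 1.5, 5.0])      # scene depths in metres
print(circle_of_confusion(depths))      # ~[1.2e-3, 0.0, 4.3e-4]
```

Points away from the 1.5 m focal plane receive a larger blur kernel, while points on it stay sharp; re-blurring a predicted all-in-focus image with such depth-dependent kernels and comparing against the captured stack is the kind of physically inspired supervision the abstract describes.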


Supplemental Material

papers_480s4-file3.mp4 (mp4, 203 MB)



Published in

ACM Transactions on Graphics, Volume 42, Issue 6
December 2023, 1565 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3632123

Copyright © 2023 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 5 December 2023 in ACM Transactions on Graphics (TOG), Volume 42, Issue 6

Qualifiers

research-article
