Abstract
In everyday photography, physical limitations of camera sensors and lenses frequently lead to a variety of degradations in captured images, such as saturation or defocus blur. A common approach to overcoming these limitations is image stack fusion, which involves capturing multiple images with different focal distances or exposures. For instance, to obtain an all-in-focus image, a set of multi-focus images is captured; similarly, capturing multiple exposures allows for the reconstruction of high-dynamic-range content. In this paper, we present a novel approach that combines neural fields with an expressive camera model to achieve a unified reconstruction of an all-in-focus, high-dynamic-range image from an image stack. Our approach is composed of a set of specialized implicit neural representations, each tailored to a specific sub-problem along our pipeline: we use neural implicits to predict flow, which overcomes misalignments arising from lens breathing; depth and all-in-focus images, which account for depth of field; and tonemapping, which handles sensor responses and saturation - all trained using a physically inspired supervision structure with a differentiable thin lens model at its core. An important benefit of our approach is its ability to handle these tasks simultaneously or independently, providing flexible post-editing capabilities such as refocusing and exposure adjustment. By sampling the three primary factors in photography (focal distance, aperture, and exposure time) within our framework, we conduct a thorough exploration to gain valuable insights into their significance and impact on overall reconstruction quality. Through extensive validation, we demonstrate that our method outperforms existing approaches in both depth-from-defocus and all-in-focus image reconstruction. Moreover, our approach exhibits promising results along each of these three dimensions, showcasing its potential to enhance captured image quality and provide greater control in post-processing.
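To make the physically inspired supervision concrete, below is a minimal sketch (not the authors' implementation) of the two differentiable building blocks the abstract names: a thin lens model that maps scene depth to a defocus blur size, and a simple exposure model that maps HDR radiance to a saturating LDR observation. The standard circle-of-confusion formula and the fixed gamma curve stand in for the paper's learned components; all parameter names and values are illustrative assumptions.

```python
import jax.numpy as jnp
from jax import grad

def coc_diameter(depth, focus_dist, focal_length, f_number):
    """Thin-lens circle-of-confusion diameter, differentiable in depth.

    c = (A * f / (s - f)) * |d - s| / d, with aperture diameter A = f / N,
    focus distance s, object distance d (all in metres).
    """
    aperture = focal_length / f_number
    return (aperture * focal_length / (focus_dist - focal_length)
            * jnp.abs(depth - focus_dist) / depth)

def ldr_observation(hdr_radiance, exposure_time, gamma=2.2):
    """Scene radiance -> saturating LDR pixel value.

    A fixed gamma curve stands in for a learned tonemapping network;
    clipping models sensor saturation.
    """
    exposed = jnp.clip(hdr_radiance * exposure_time, 0.0, 1.0)
    return exposed ** (1.0 / gamma)

# Gradients flow through both maps, e.g. d(coc)/d(depth) at 2 m for an
# (assumed) 50 mm lens focused at 1.5 m with f/2.8:
d_coc = grad(coc_diameter)(2.0, 1.5, 0.05, 2.8)
```

Because both maps are differentiable, observations captured at varying focus distance, f-number, and exposure time can supervise the underlying depth, all-in-focus, and HDR predictions directly by gradient descent, which is the role the differentiable thin lens model plays in the pipeline described above.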