Joint inpainting of depth and reflectance with visibility estimation☆
Introduction
Image-based 3D reconstruction of static and dynamic scenes (Seitz et al., 2006, Herbort and Wöhler, 2011, Stoykova et al., 2007) is nowadays one of the main challenges in computer vision. In recent years, many efforts have been devoted to elaborating configurations and approaches, possibly involving multiple sensors, with the final goal of generating plausible and detailed 3D models of scenes. To this end, typical optical cameras are often combined with non-visual sensors. The intermediate outputs of these hybrid systems, prior to the final scene rendering, are generally depth or depth + color (RGB-D) images. Among the non-visual sensors we find Time-of-Flight (ToF) cameras (Kolb et al., 2010), which acquire low-resolution co-registered depth and color images at low cost, and the well-known Kinect (Zhang, 2012), capable of extracting depth information by exploiting structured light. Another possibility is represented by lidar devices, which are used in a variety of applications and output point clouds with measures of distance and reflectivity of the sensed surfaces.
This work lies in this context and is particularly driven by the exploitation of data acquired by Mobile Mapping Systems (MMS), such as the one of Paparoditis et al. (2012). MMSs are vehicles equipped with high-resolution cameras and at least one lidar sensor: their compact dimensions allow them to be driven through regular streets and acquire data of urban scenes. The acquired data consist of a set of calibrated and geolocated images, together with coherent lidar point clouds. The interest in such data comes from the possibility of having, at a relatively small processing cost, the combination of depth and color information, without having to perform explicit (error-prone) reconstructions. Having a good depth estimate at each pixel, for example, enables depth-image-based rendering algorithms, e.g. Zinger et al., 2010, Chen et al., 2005, Schmeing and Jiang, 2011. Similarly, the availability of depth information allows the insertion of virtual elements into the image, such as pedestrians or vehicles generated by a traffic simulation (Brédif, 2013). While MMS data sets do not directly include depth images aligned with the available color images, it is easy, by exploiting the known geometry, to project the lidar point clouds onto each image. This operation produces initial depth images, which present three main issues (see Fig. 1, where three parts of an input depth image are shown, together with the corresponding image parts).
- 1.
Undersampling: since lidar and image acquisitions differ deeply in geometry and characteristics, the resulting depth images turn out to be irregular. No points are present in the sky or on reflective surfaces. Moreover, the point density, which depends on the variable distances between the camera image plane and the positions of the lidar sensor, is generally significantly smaller than the pixel resolution. We can therefore speak of sparse input depth images (see for example Fig. 1a, showing the low density of lidar points on the ground).
- 2.
Visibility (hidden parts appear): since points that are not visible from the image viewpoint (hidden points) can occasionally be “seen” by the moving lidar sensor, erroneous values referring to such points can appear in the input depth image. This occurs even when a Z-buffer approach (Greene et al., 1993) is used, i.e. when only the closest depth value is kept for each pixel (in case multiple values end up in the same pixel location). E.g., Fig. 1b shows depth values from the building behind appearing as foreground points.
- 3.
Occlusions (visible parts disappear): for the same reason as above, i.e. the different acquisition timing and geometry of the image and lidar sensors, surfaces normally visible from the image viewpoint do not get a corresponding depth. This can happen when the lidar sensor suffers occlusions at a given instant, or because of scene dynamics. E.g., in Fig. 1c, a moving bus that is not present at the moment the image is shot appears in the depth image.
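The Z-buffer projection mentioned in issue 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes lidar points already expressed in the camera frame and a pinhole model with hypothetical intrinsics `fx, fy, cx, cy`. Multiple points falling into the same pixel are resolved by keeping the smallest depth.

```python
import numpy as np

def project_zbuffer(points, refl, fx, fy, cx, cy, h, w):
    """Project 3D points (camera frame) into a sparse depth/reflectance
    image pair, keeping only the closest depth per pixel (Z-buffer)."""
    depth = np.full((h, w), np.inf)
    reflec = np.zeros((h, w))
    for (X, Y, Z), r in zip(points, refl):
        if Z <= 0:                      # point behind the camera plane
            continue
        u = int(round(fx * X / Z + cx))  # pinhole projection, column
        v = int(round(fy * Y / Z + cy))  # pinhole projection, row
        if 0 <= v < h and 0 <= u < w and Z < depth[v, u]:
            depth[v, u] = Z              # keep the nearest surface
            reflec[v, u] = r
    mask = np.isfinite(depth)            # sparse support of the inputs
    depth[~mask] = 0.0
    return depth, reflec, mask
```

Note that, as discussed above, the Z-buffer only resolves conflicts within a single pixel: a hidden point that lands on a pixel where no closer point projects still survives, which is precisely the visibility problem the paper addresses.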
While a variety of methods in the literature deal with the first issue, i.e. aim at upscaling an irregular input depth image, possibly with the guidance of a corresponding color image, little work has addressed the last two issues. In this paper, while inpainting the input depth image, we also tackle the visibility problem. Moreover, we treat at the same time an additional input: a sparse reflectance image derived in the same way as the input depth image (i.e., by naively projecting the lidar point cloud, considering the reflectance information carried by each point). We will show that the simultaneous use of a reflectance image, inpainted jointly with the depth, improves the quality of the produced depth image itself. To jointly inpaint depth and reflectance and concurrently evaluate the visibility of each point (i.e. establish whether a point is reliable or, being non-visible, must be discarded), we formulate an optimization problem with three variables to estimate: depth, reflectance, and a visibility attribute per pixel. The inpainting process is also guided by the available color image, by means of a twofold coupled total variation (TV) regularizer.
The remainder of the paper is organized as follows. In Section 2, we present our approach and discuss related work, in particular on the image-guided depth inpainting problem. In Sections 3 and 4 we describe the model used and the resulting primal-dual optimization algorithm, respectively. Finally, in Section 5 we bring experimental evidence of the effectiveness of the proposed approach.
Section snippets
Problem addressed and related work
Fig. 2 depicts the scheme of the proposed approach. Given an MMS data set consisting of a lidar point cloud and a set of camera images, we choose among the latter a reference color image w, and we obtain the input depth and reflectance images by re-projecting the lidar points according to the image geometry. The two lidar-originated images are sparse images with irregular sampling and need to be inpainted. We propose to do that jointly, while simultaneously estimating the visibility of the projected lidar points.
Model
Let the “full” image support be the complete pixel domain, and the sparse image support the subset where the input images are defined (i.e., where at least one lidar point ends up after projection). Given an input depth image, an input reflectance image, and the luminance component of their corresponding color image (defined on the complete domain), the goal is to fully inpaint the depth and reflectance input images, and concurrently estimate a visibility attribute per pixel.
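The shape of such a functional can be sketched as follows. This is an illustrative form, not the paper's exact Eq. (7): the notation (u, r for the inpainted depth and reflectance, v ∈ [0,1] for the per-pixel visibility, u₀, r₀ for the sparse inputs on the support Ω_s, TV_w for a color-guided total variation, and the weights μ, β, λ_u, λ_r) is hypothetical, and the paper's actual formulation is arranged so that the resulting problem is convex, which the naive bilinear coupling written here is not.

```latex
\min_{u,\,r,\,v\in[0,1]}\;
\sum_{x\in\Omega_s} v(x)\,\bigl|u(x)-u_0(x)\bigr|
\;+\;\mu \sum_{x\in\Omega_s} v(x)\,\bigl|r(x)-r_0(x)\bigr|
\;+\;\beta \sum_{x\in\Omega_s} \bigl(1-v(x)\bigr)
\;+\;\lambda_u\,\mathrm{TV}_w(u)
\;+\;\lambda_r\,\mathrm{TV}_w(r)
```

The intended roles of the terms are: the ℓ1 data-fidelity terms attach u and r to the sparse lidar inputs wherever they are judged visible; driving v(x) toward 0 discards an unreliable (hidden) point at cost β; and the coupled TV terms propagate depth and reflectance into empty pixels while aligning their discontinuities with the color image edges.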
Algorithm
The optimization problem (7) turns out to be convex, but not smooth, due to the ℓ1-type data-fidelity terms and the total variation regularization term. Chambolle and Pock (2011) proposed a primal-dual first-order algorithm to solve such problems. In Section 4.1 we provide the necessary definitions for the algorithm, which is subsequently described in Section 4.2.
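To illustrate the family of solvers referred to here, the following is a minimal instance of the Chambolle-Pock primal-dual scheme applied to plain TV-ℓ2 denoising, min_u (λ/2)‖u − f‖² + TV(u). It is a sketch, not the paper's algorithm (which handles three coupled variables, ℓ1 data terms, and color guidance): the dual variable is projected onto the unit ball, the primal update is the closed-form prox of the quadratic term, and the step sizes satisfy τσ‖∇‖² ≤ 1 with ‖∇‖² ≤ 8 in 2D.

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary conditions."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[0, :] = px[0, :]
    d[1:-1, :] = px[1:-1, :] - px[:-2, :]
    d[-1, :] = -px[-2, :]
    d[:, 0] += py[:, 0]
    d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    d[:, -1] += -py[:, -2]
    return d

def tv_denoise_cp(f, lam=8.0, n_iter=200):
    """Chambolle-Pock iterations for min_u lam/2 ||u-f||^2 + TV(u)."""
    u = f.copy()
    ubar = u.copy()
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    tau = sigma = 1.0 / np.sqrt(8.0)   # tau * sigma * ||grad||^2 <= 1
    for _ in range(n_iter):
        # dual ascent + projection onto {|p| <= 1} (prox of TV's conjugate)
        gx, gy = grad(ubar)
        px += sigma * gx
        py += sigma * gy
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2))
        px /= norm
        py /= norm
        # primal descent: closed-form prox of the quadratic fidelity term
        u_old = u
        u = (u + tau * div(px, py) + tau * lam * f) / (1.0 + tau * lam)
        # over-relaxation step of the Chambolle-Pock scheme
        ubar = 2 * u - u_old
    return u
```

The same template extends to ℓ1 data terms by replacing the quadratic prox with soft-thresholding, which is the kind of substitution the non-smooth terms in (7) require.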
Experimental results
The algorithm presented in Section 4 is evaluated on a new data set acquired in an urban scenario by a Mobile Mapping System (MMS), composed of lidar measures and camera images. With this data set, we provide a qualitative evaluation of our algorithm in comparison with other methods, by showing the reconstructed depth and reflectance images, and we assess the quality of the visibility estimation task, which is a crucial characteristic of our algorithm. Moreover, we also provide a
Conclusion
In this paper we presented a novel strategy to jointly inpaint depth and reflectance images with the guidance of a co-registered color image, and by simultaneously estimating a visibility attribute for each pixel. The problem studied and the proposed approach are particularly suited for data sets acquired by Mobile Mapping Systems (MMS): vehicles that can easily image urban scenes by means of optical cameras and lidar sensors. By projecting the 3D lidar points onto a chosen reference image, we
References (33)
- et al., Fusion of range and color images for denoising and resolution enhancement with a non-local filter, Comput. Vis. Image Understand. (2010)
- et al., Street environment change detection from mobile laser scanning point clouds, ISPRS J. Photogram. Remote Sensing (2015)
- et al., Free-viewpoint depth image based rendering, J. Visual Commun. Image Represent. (2010)
- et al., Visibility estimation and joint inpainting of lidar depth maps
- Image-based rendering of LOD1 3D city models for traffic-augmented immersive street-view navigation, ISPRS Ann. Photogram. Remote Sensing Spatial Inform. Sci. (2013)
- et al., A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vis. (2011)
- et al., A noise-aware filter for real-time depth upsampling
- et al., Aspects of total variation regularized L1 function approximation, SIAM J. Appl. Math. (2005)
- et al., A nonlinear primal-dual method for total variation-based image restoration, SIAM J. Sci. Comput. (1999)
- et al., Efficient depth image based rendering with edge dependent depth filter and interpolation
- An application of Markov random fields to range sensing
- Image guided depth upsampling using anisotropic total generalized variation
- Pixel weighted average strategy for depth sensor data fusion
- Vision meets robotics: the KITTI dataset, Int. J. Robot. Res.
- Hierarchical Z-buffer visibility
- Image and sparse laser fusion for dense scene reconstruction
☆ This study has been carried out with financial support from the French State, managed by the French National Research Agency (ANR) in the frame of the Investments for the Future Programme IdEx Bordeaux (ANR-10-IDEX-03-02). J.-F. Aujol also acknowledges the support of the Institut Universitaire de France.