Joint inpainting of depth and reflectance with visibility estimation

https://doi.org/10.1016/j.isprsjprs.2017.01.005

Abstract

This paper presents a novel strategy to generate, from 3-D lidar measurements, dense depth and reflectance images coherent with given color images, while also estimating a visibility attribute for each pixel of the input images. 3-D lidar measurements carry multiple types of information, e.g. relative distances to the sensor (from which depths can be computed) and reflectances. When projecting a lidar point cloud onto a reference image plane, we generally obtain sparse images, due to undersampling. Moreover, lidar and image sensor positions typically differ during acquisition; therefore points belonging to objects that are hidden from the image view point may appear in the lidar images. The proposed algorithm estimates the complete depth and reflectance images, while concurrently excluding those hidden points. It consists of solving a joint (depth and reflectance) variational image inpainting problem, with an extra variable, estimated concurrently, that handles the selection of visible points. As regularizers, two coupled total variation terms are included to match, two by two, the depth, reflectance, and color image gradients. We compare our algorithm with other image-guided depth upsampling methods and show that, when dealing with real data, it produces better inpainted images by solving the visibility issue.

Introduction

Image-based 3D reconstruction of static and dynamic scenes (Seitz et al., 2006, Herbort and Wöhler, 2011, Stoykova et al., 2007) is one of the main challenges in computer vision nowadays. In recent years many efforts have been made to develop configurations and approaches, possibly involving multiple sensors, with the final goal of generating plausible and detailed 3D models of scenes. To this end, typical optical cameras are often combined with non-visual sensors. The intermediate outputs of these hybrid systems, prior to the final scene rendering, are in general depth or depth + color images (RGB-D). Among the non-visual sensors, we can find Time-of-Flight (ToF) cameras (Kolb et al., 2010), which acquire low-resolution co-registered depth and color images at low cost, and the well-known Kinect (Zhang, 2012), capable of extracting depth information by exploiting structured light. Another possibility is represented by lidar devices, which are used in a variety of applications and provide as output point clouds containing distance and reflectivity measurements of the sensed surfaces.

This work lies in the context described above and is particularly driven by the exploitation of data acquired by Mobile Mapping Systems (MMS), such as the one of Paparoditis et al. (2012). MMS are vehicles equipped with high-resolution cameras and at least one lidar sensor: their compact dimensions allow them to be driven through regular streets to acquire data of urban scenes. The acquired data consist of a set of calibrated and geolocated images, together with coherent lidar point clouds. The interest in such data comes from the possibility of having available, at a relatively small processing cost, the combination of depth and color information, without having to perform explicit (error-prone) reconstructions. Having a good depth estimate at each pixel, for example, would enable depth-image-based rendering algorithms, e.g. Zinger et al. (2010), Chen et al. (2005), Schmeing and Jiang (2011). Similarly, the availability of depth information allows the insertion of virtual elements into the image, such as pedestrians or vehicles generated by a traffic simulation (Brédif, 2013). While MMS data sets do not directly include depth images aligned with the available color images, it is easy, by exploiting the known geometry, to project the lidar point clouds onto each image. This operation produces initial depth images that present three main issues (see Fig. 1, where three parts of an input depth image are shown, together with the corresponding image parts).

  • 1.

    Undersampling: since lidar and image acquisitions differ deeply in terms of geometry and characteristics, the resulting depth images turn out to be irregular. No points are present in the sky or on reflective surfaces. Moreover, the point density, which depends on the variable distances between the camera image plane and the positions of the lidar sensor, is generally significantly smaller than the pixel resolution. We can therefore speak of sparse input depth images (see for example Fig. 1a, showing the low density of lidar points on the ground).

  • 2.

    Visibility (hidden parts appear): since points that are not visible from the image view point (hidden points) can occasionally be “seen” by the moving lidar sensor, erroneous values referring to such points can appear in the input depth image. This occurs even when a Z-buffer approach (Greene et al., 1993) is used, i.e. when only the closest depth value is kept for each pixel (in case multiple values end up at the same pixel location); a minimal sketch of this projection-with-Z-buffer step is given after this list. E.g., Fig. 1b shows that depth values from the building behind appear as foreground points.

  • 3.

    Occlusions (visible parts disappear): for the same reason as above, i.e. the different acquisition timing and geometry between image and lidar sensors, surfaces normally visible from the image view point do not get a corresponding depth. This can happen when the lidar sensor suffers occlusions at a given instant or because of the scene dynamics. E.g., in Fig. 1c, a moving bus that was not present at the moment of the image shot appears in the depth image.
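
To make the projection step concrete, the following Python sketch projects a lidar point cloud onto a calibrated pinhole camera and keeps, for each pixel, only the closest depth, i.e. the simple Z-buffer policy mentioned in issue 2. It is an illustrative sketch only, not the authors' pipeline; the point-cloud layout, the camera parameters (K, R, t) and the function name are assumptions. Note that this policy alone cannot reject points that are hidden from the image view point but were seen by the lidar.

import numpy as np

def project_lidar_to_sparse_images(points, reflectance, K, R, t, height, width):
    """Illustrative sketch (not the paper's pipeline): project lidar points
    (N x 3 array, world coordinates) with per-point reflectance (N,) onto a
    pinhole camera with intrinsics K (3x3), rotation R (3x3), translation t (3,).
    Keeps the closest depth per pixel (Z-buffer); pixels receiving no lidar
    point remain NaN, yielding sparse depth and reflectance images."""
    # World -> camera coordinates
    pts_cam = (R @ points.T + t.reshape(3, 1)).T
    z = pts_cam[:, 2]
    keep = z > 0  # only points in front of the camera
    pts_cam, z, refl = pts_cam[keep], z[keep], reflectance[keep]

    # Perspective projection to (rounded) pixel coordinates
    uvw = (K @ pts_cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, refl = u[inside], v[inside], z[inside], refl[inside]

    depth = np.full((height, width), np.nan)
    refl_img = np.full((height, width), np.nan)
    # Z-buffer: visit points from farthest to closest, so that the closest
    # depth (and its reflectance) is the one finally stored in each pixel.
    for i in np.argsort(-z):
        depth[v[i], u[i]] = z[i]
        refl_img[v[i], u[i]] = refl[i]
    return depth, refl_img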

While there is a variety of methods in the literature dealing with the first issue, i.e. aiming at upscaling an irregular input depth image, possibly with the guidance of a corresponding color image, little work has been done to address the last two issues. In this paper, while inpainting the input depth image, we also intend to tackle the visibility problem. Moreover, we treat at the same time an additional input: a sparse reflectance image derived in the same way as the input depth image (i.e., by naively projecting the lidar point cloud, considering the reflectance information carried by each point). We will show that the simultaneous use of a reflectance image, which is inpainted jointly with the depth, improves the quality of the produced depth image itself. To jointly inpaint depth and reflectance and concurrently evaluate the visibility of each point (i.e., establish whether a point is reliable or must be discarded because it is not visible), we formulate an optimization problem with three variables to estimate: depth, reflectance, and a per-pixel visibility attribute. The inpainting process is also guided by the available color image, by means of a twofold coupled total variation (TV) regularizer.

The remainder of the paper is organized as follows. In Section 2, we present our approach and discuss related work, in particular on the image-guided depth inpainting problem. In Sections 3 and 4 we describe, respectively, the model used and the resulting primal-dual optimization algorithm. Finally, in Section 5 we bring experimental evidence of the effectiveness of the proposed approach.

Section snippets

Problem addressed and related work

Fig. 2 depicts the scheme of the proposed approach. Given an MMS data set consisting of a lidar point cloud and a set of camera images, we choose among the latter a reference color image (w), and we obtain input depth (u_S) and reflectance (r_S) images by re-projecting the lidar points according to the image geometry. The two lidar-originated images are sparse images with irregular sampling and need to be inpainted. We propose to do that jointly and simultaneously estimate the visibility of the

Model

Let Ω ⊂ ℝ² be the “full” image support, and Ω_S ⊂ Ω the sparse image support where the input images are defined (i.e., there is at least one lidar point ending up there after projection). Given an input depth image u_S: Ω_S → ℝ, an input reflectance image r_S: Ω_S → ℝ, and the luminance component of their corresponding color image w: Ω → ℝ (defined in the complete domain), the goal is to fully inpaint the depth and reflectance input images to obtain u: Ω → ℝ and r: Ω → ℝ, and concurrently estimate a visibility
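
The paper's precise formulation is the convex problem (7). Purely as an illustrative sketch (the weights α, β, λ₁, λ₂ and the exact form of the coupling are assumptions, not the authors' Eq. (7)), a joint inpainting energy with a visibility variable can be written as

\[
\min_{u,\,r,\,v}\; F(u,v \mid u_S) + G(r,v \mid r_S) + R(u,r \mid w), \qquad v:\Omega_S \to [0,1],
\]

with, for instance, an ℓ1 data term that the visibility variable can switch off at a fixed price,

\[
F(u,v \mid u_S) = \int_{\Omega_S} v\,\lvert u - u_S\rvert\,dx + \alpha \int_{\Omega_S} (1 - v)\,dx
\]

(and analogously for G(r,v | r_S)), and a color-guided coupled total variation regularizer such as

\[
R(u,r \mid w) = \lambda_1 \int_{\Omega} \sqrt{\lvert\nabla u\rvert^2 + \beta\,\lvert\nabla w\rvert^2}\,dx
              + \lambda_2 \int_{\Omega} \sqrt{\lvert\nabla r\rvert^2 + \beta\,\lvert\nabla w\rvert^2}\,dx .
\]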

Algorithm

The optimization problem (7) turns out to be convex, but not smooth, due to the ℓ1-type data-fidelity terms, F(u,v|u_S) and G(r,v|r_S), and the total variation regularization term R(u,r|w). Chambolle and Pock (2011) proposed a first-order primal-dual algorithm to solve such problems. In Section 4.1 we provide the necessary definitions for the algorithm, which is subsequently described in Section 4.2.
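
For reference, the generic scheme of Chambolle and Pock (2011), for a problem of the form min_x F(Kx) + G(x) with F, G convex and K a linear operator, iterates

\[
y^{n+1} = \operatorname{prox}_{\sigma F^{*}}\!\big(y^{n} + \sigma K \bar{x}^{n}\big), \qquad
x^{n+1} = \operatorname{prox}_{\tau G}\!\big(x^{n} - \tau K^{*} y^{n+1}\big), \qquad
\bar{x}^{n+1} = x^{n+1} + \theta\,(x^{n+1} - x^{n}),
\]

with step sizes satisfying στ‖K‖² ≤ 1 and typically θ = 1. In TV-regularized problems, K is the discrete gradient, the proximal operator of the dual of the TV term reduces to a pointwise projection onto a unit ball, and the proximal operator of an ℓ1 data term is a soft-thresholding toward the data; the specific instantiation for problem (7) is the one described in Section 4.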

Experimental results

The algorithm presented in Section 4 is evaluated on a new data set acquired in an urban scenario by a Mobile Mapping System (MMS), composed of lidar measurements and camera images. With this data set, we provide a qualitative evaluation of our algorithm in comparison with other methods, by showing the reconstructed depth and reflectance images, and we assess the quality of the visibility estimation task, which is a crucial characteristic of our algorithm. Moreover, we also provide a

Conclusion

In this paper we presented a novel strategy to jointly inpaint depth and reflectance images with the guidance of a co-registered color image, and by simultaneously estimating a visibility attribute for each pixel. The problem studied and the proposed approach are particularly suited for data sets acquired by Mobile Mapping Systems (MMS): vehicles that can easily image urban scenes by means of optical cameras and lidar sensors. By projecting the 3D lidar points onto a chosen reference image, we

References (33)

  • B. Huhle et al.

    Fusion of range and color images for denoising and resolution enhancement with a non-local filter

    Comput. Vis. Image Understand.

    (2010)
  • W. Xiao et al.

    Street environment change detection from mobile laser scanning point clouds

    ISPRS J. Photogram. Remote Sensing

    (2015)
  • S. Zinger et al.

    Free-viewpoint depth image based rendering

    J. Visual Commun. Image Represent.

    (2010)
  • M. Bevilacqua et al.

    Visibility estimation and joint inpainting of lidar depth maps

  • M. Brédif

    Image-based rendering of LOD1 3D city models for traffic-augmented immersive street-view navigation

    ISPRS Ann. Photogram. Remote Sensing Spatial Inform. Sci.

    (2013)
  • A. Chambolle et al.

    A first-order primal-dual algorithm for convex problems with applications to imaging

    J. Math. Imaging Vis.

    (2011)
  • D. Chan et al.

    A noise-aware filter for real-time depth upsampling

  • T.F. Chan et al.

    Aspects of total variation regularized L1 function approximation

    SIAM J. Appl. Math.

    (2005)
  • T.F. Chan et al.

    A nonlinear primal-dual method for total variation-based image restoration

    SIAM J. Sci. Comput.

    (1999)
  • W.-Y. Chen et al.

    Efficient depth image based rendering with edge dependent depth filter and interpolation

  • J. Diebel et al.

    An application of Markov random fields to range sensing

  • D. Ferstl et al.

    Image guided depth upsampling using anisotropic total generalized variation

  • F. Garcia et al.

    Pixel weighted average strategy for depth sensor data fusion

  • A. Geiger et al.

    Vision meets robotics: the KITTI dataset

    Int. J. Robot. Res.

    (2013)
  • N. Greene et al.

    Hierarchical Z-buffer visibility

  • A. Harrison et al.

    Image and sparse laser fusion for dense scene reconstruction


This study has been carried out with financial support from the French State, managed by the French National Research Agency (ANR) in the frame of the Investments for the Future Programme IdEx Bordeaux (ANR-10-IDEX-03-02). J.-F. Aujol also acknowledges the support of the Institut Universitaire de France.
