Elsevier

Pattern Recognition

Volume 105, September 2020, 107391

Optical flow-based structure-from-motion for the reconstruction of epithelial surfaces

https://doi.org/10.1016/j.patcog.2020.107391

Highlights

  • Novel optical flow-based Structure-from-Motion for scenes with few textures.

  • High accuracy optical flow estimation using a new illumination-invariant descriptor.

  • Determination of large homologous point groups using dense optical flow.

  • Dense 3D point cloud generation without any Multi-view Stereo step.

  • Robust surface reconstruction for different medical scenes and image modalities.

Abstract

This paper details a novel optical flow-based structure-from-motion (SfM) approach for the reconstruction of surfaces with few textures from video sequences acquired under strong illumination changes. An original image search and grouping strategy allows each 3D scene point to be reconstructed from a large set of 2D homologous points extracted from a reference image and from the overlapping images acquired from different viewpoints. A variational optical flow scheme with a descriptor-based data term leads to a robust, accurate and dense determination of homologous points between image pairs. Thus, unlike classical SfM, which is suited to textured scenes, the proposed dense point cloud reconstruction algorithm requires neither a feature point tracking method nor any multi-view stereo technique. The performance of the proposed SfM approach is assessed on phantoms with known ground truth and on very complex patient data from various medical examinations and image modalities.

Introduction

Multiview 3D techniques aim to reconstruct scenes with an extended field of view (FoV) from sequences of 2D images with a limited FoV. The intrinsic camera parameters are usually either obtained through an offline calibration or estimated directly from the images used to reconstruct the scene [1]. Multiview 3D techniques recover a scene in several steps. The acquired sequences are first preprocessed to correct image distortions or to remove poor-quality images (e.g., blurred data). The 3D scene structure is reconstructed in the second step, referred to as structure-from-motion (SfM). Depending on the image content, this step is among the most challenging of the whole process. In the SfM step, 3D geometrical structures are obtained using triangulation techniques applied to groups of homologous points seen (preferably) in numerous 2D images [2], and the 3D point positions are then refined by bundle adjustment [3]. The quality of the homologous point determination (matching) is a key issue in SfM. Almost all SfM methods in the literature determine homologous points using feature detection and matching algorithms (such as SIFT [4] or SURF [5]). The SfM step delivers sparse 3D point clouds, since feature-based methods detect only a limited number of points in the images of most scenes. Multi-view stereo (MVS) techniques are a classical additional step used to increase the density of the 3D point clouds. Patched-MVS [6], CMPMVS [7], and MVS [8] are state-of-the-art MVS methods. In the next step, a mesh generation algorithm (such as the Poisson surface reconstruction algorithm in [9]) uses the dense 3D point cloud to approximate surfaces with triangular facets. These meshes are usually refined to obtain the final surface [10]. Finally, mapping the 2D image textures onto the meshed surface yields a visually coherent scene rendering [11].
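
The triangulation of a group of homologous points can be illustrated with the standard linear (DLT) multi-view triangulation. The sketch below assumes known 3×4 projection matrices and noise-free homologous points, stacks two linear equations per view and takes the SVD null vector. It is a textbook stand-in, not the paper's implementation; a real pipeline would refine such estimates by bundle adjustment [3].

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    projections: list of 3x4 camera matrices P_k
    points_2d:   list of (x, y) homologous image points, one per view
    Returns the 3D point in Euclidean coordinates.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view gives two linear constraints on the homogeneous point X:
        # x * (P[2] @ X) - P[0] @ X = 0  and  y * (P[2] @ X) - P[1] @ X = 0
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.asarray(rows)
    # Least-squares solution: right singular vector of A with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a Euclidean 3D point with camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Synthetic check: three cameras translated along x, identity intrinsics (hypothetical setup).
K = np.eye(3)
cams = [K @ np.hstack([np.eye(3), np.array([[-t], [0.0], [0.0]])]) for t in (0.0, 0.5, 1.0)]
X_true = np.array([0.2, -0.1, 4.0])
obs = [project(P, X_true) for P in cams]
X_est = triangulate_dlt(cams, obs)
print(np.round(X_est, 6))
```

With noisy observations the same function returns an algebraic least-squares estimate, which is why it is usually only an initialization.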

Feature-based SfM methods have been used to recover the surfaces of objects ranging from a few centimeters in diameter up to one kilometer across (see [12]). SfM-based 3D reconstruction has also been used to reconstruct large monuments [13] or even complete city districts [14] with high accuracy. However, there is a class of medical scenes for which feature-based SfM approaches are not a suitable solution.

The epithelium (the tissue that covers the external surface of the human body or lines the inner wall of all hollow organs) is visualized by cameras in various medical examinations. In dermatology, gastroenterology and cystoscopy, the epithelial surfaces (respectively the skin, the inner stomach wall and the inner bladder wall) are scanned by a camera to search for lesions or to assess their evolution. All these medical applications share a common feature: the images are acquired close to the tissue to ensure a high image resolution.

Due to these acquisition conditions, the FoV of the images is very limited. Small FoVs hinder diagnosis since, on the one hand, cancerous lesions on the skin or in the bladder have to be seen in their entirety and, on the other hand, a urologist or a gastroenterologist cannot mentally visualize the endoscope position in hollow organs. Extending the FoV with mosaicing algorithms enables the simultaneous visualization of complete lesions and of anatomical landmarks that help endoscopists localize the instrument within the organ. In the last two decades, 2D image mosaicing algorithms have been proposed in endoscopy [15], [16]. 2D mosaics increase the FoV but have two major drawbacks. First, the 3D organs are projected onto a 2D plane defined by the image taken as the reference for mosaicing. When moving away from this reference image in the mosaic plane, the projection distortions become strong and result both in a loss of image resolution and in an incorrect organ representation at the borders of the mosaics, which therefore remain of limited size. Second, 2D mosaics do not match the mental 3D representation of the organ that endoscopists or dermatologists have. Obtaining extended-FoV 3D mosaics using SfM techniques can therefore be of high interest in dermatology and endoscopy.

However, in medical examinations, both the acquisition conditions and the scene characteristics differ significantly from those of the applications for which SfM has proven effective. First, 3D points are reconstructed more accurately when their homologous points are acquired from very different viewpoints. In classical SfM applications (e.g., reconstruction of manufactured part or monument surfaces), the acquisition conditions are controlled in the sense that scene parts can effectively be acquired from very different viewpoints. In dermatology, and more particularly in endoscopy, the camera trajectory is difficult to control, and obtaining images of the same organ part from very different viewpoints is a difficult task. Second, images of natural scenes or manufactured parts usually include image primitives (corners, line segments, etc.), contrasted textures and/or large color variations. In contrast, color variations are very small in dermatology, while in gastroscopy most images contain very few, weakly contrasted textures and structures.
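
The first point (accuracy improves with viewpoint diversity) is commonly quantified by the triangulation, or parallax, angle between the two viewing rays through a 3D point: the smaller the angle, the more weakly the depth is constrained. The sketch below computes this angle for two hypothetical camera configurations; the numbers are illustrative only and are not taken from the paper's data.

```python
import numpy as np

def triangulation_angle_deg(center_a, center_b, point):
    """Angle in degrees between the viewing rays from two camera centres to a 3D point."""
    ray_a = np.asarray(point, float) - np.asarray(center_a, float)
    ray_b = np.asarray(point, float) - np.asarray(center_b, float)
    cos_angle = ray_a @ ray_b / (np.linalg.norm(ray_a) * np.linalg.norm(ray_b))
    # Clipping guards against rounding slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

point = np.array([0.0, 0.0, 10.0])
# Hypothetical wide baseline (e.g., walking around a monument)
# vs. narrow baseline (small camera displacement, as along an endoscope path).
wide = triangulation_angle_deg([-5.0, 0.0, 0.0], [5.0, 0.0, 0.0], point)
narrow = triangulation_angle_deg([-0.1, 0.0, 0.0], [0.1, 0.0, 0.0], point)
print(f"wide baseline: {wide:.2f} deg, narrow baseline: {narrow:.2f} deg")
```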

As shown in Fig. 1(a) for two pyloric antrum images, only a few homologous points were found when combining the SIFT algorithm [4] with the RANSAC outlier rejection method [17]. Besides the lack of textures, homologous point determination is also impeded by strong illumination changes between two acquisitions and by inhomogeneous lighting due to viewpoint changes. Specular reflections also favor false point correspondences. Such few and partially wrong matches are not appropriate for a 3D reconstruction using SfM approaches.
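
The outlier rejection step can be sketched with a minimal RANSAC [17] loop. The excerpt does not state which geometric model was used together with SIFT, so this sketch assumes, purely for illustration, a 2D affine transform between matched points (three correspondences per minimal sample). A fundamental matrix or a homography would be handled by the same loop with a different minimal solver. The threshold, iteration count and synthetic data are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src (N, 2) to dst (N, 2); returns a 3x2 matrix."""
    A = np.hstack([src, np.ones((len(src), 1))])  # rows [x, y, 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def apply_affine(M, pts):
    """Apply a 3x2 affine matrix to (N, 2) points."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

def ransac_affine(src, dst, n_iters=500, threshold=2.0, seed=0):
    """Return (model, inlier_mask) for the largest consensus set found."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(len(src), size=3, replace=False)  # minimal sample
        M = fit_affine(src[sample], dst[sample])
        residuals = np.linalg.norm(apply_affine(M, src) - dst, axis=1)
        mask = residuals < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    # Refit on all inliers of the best consensus set.
    return fit_affine(src[best_mask], dst[best_mask]), best_mask

# Synthetic matches: 40 correct correspondences followed by 10 gross mismatches.
rng = np.random.default_rng(1)
src = rng.uniform(0, 200, size=(50, 2))
true_M = np.array([[1.02, 0.05], [-0.04, 0.98], [12.0, -7.0]])
dst = apply_affine(true_M, src)
dst[40:] += rng.uniform(30, 80, size=(10, 2))  # corrupt the last 10 matches
model, inliers = ransac_affine(src, dst)
print(inliers.sum(), "inliers kept")
```

RANSAC can only discard wrong matches; when, as in Fig. 1(a), very few correct matches exist in the first place, the surviving set stays too small for SfM.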

As shown in Fig. 1(b), the optical flow (OF) approach described in this work is usable for scenes with few textures and structures. Although dense optical flow (DOF) provides numerous homologous points between two images, DOF matching techniques have rarely been used in SfM so far.

To understand why, consider the following situation. Suppose that $I_i$ and $I_j$ (with $j \neq i \pm 1$) are two images of a video sequence that are not consecutive in time but share a common scene part. If the images are well structured or textured, feature detectors and descriptors (e.g., SIFT) can effectively determine the homologous points between $I_i$ and $I_j$, both in quantity and in quality. The advantage of feature matching methods is that the points found by the detector (keypoints) and their descriptors are often invariant to geometric and photometric changes. Thus, point tracks determined by feature matching are often highly accurate (the keypoints can be localized with subpixel accuracy). In contrast, if an OF-based tracking method is used to find homologous points between $I_i$ and $I_j$, flow fields $F_{k,k+1}$ (with $k = i, i+1, \ldots, j-1$) have to be computed for the consecutive image pairs $(I_k, I_{k+1})$ from $I_i$ to $I_j$. Starting from a point $A_i$ in $I_i$, the tracked sequence of points $(A_i, A_{i+1}, \ldots, A_j)$ is determined with $A_k = A_{k-1} + F_{k-1,k}(A_{k-1})$ for $k = i+1, \ldots, j$, and $A_j$ is taken as the homologous point of $A_i$. This way of tracking homologous points raises two issues: (i) even if a very accurate OF method providing a dense flow field between images is used, it cannot reach the subpixel accuracy of feature matching methods, and (ii) although the errors affecting the OF vectors linking points in consecutive images are small, they accumulate along the sequence and can quickly become large as the length of the point track increases. Therefore, $A_i$ and $A_j$ are often wrong homologous points when the temporal distance $|j - i|$ is large. This lack of accuracy explains why DOF is rarely used in SfM approaches.
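
The tracking recursion $A_k = A_{k-1} + F_{k-1,k}(A_{k-1})$ and its error accumulation can be reproduced in a few lines. The sketch below assumes that dense flow fields are stored as (H, W, 2) arrays of (dx, dy) displacements and samples them by bilinear interpolation. The per-pair error level `sigma` is a hypothetical value chosen only to show how independent errors add up along a track; it is not a measured property of any OF method.

```python
import numpy as np

def sample_flow(flow, point):
    """Bilinearly interpolate a dense flow field (H, W, 2) at a subpixel point (x, y)."""
    h, w, _ = flow.shape
    x = np.clip(point[0], 0, w - 1.001)
    y = np.clip(point[1], 0, h - 1.001)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * flow[y0, x0] + ax * (1 - ay) * flow[y0, x0 + 1]
            + (1 - ax) * ay * flow[y0 + 1, x0] + ax * ay * flow[y0 + 1, x0 + 1])

def chain_track(flows, start):
    """Track a point through consecutive flows: A_k = A_{k-1} + F_{k-1,k}(A_{k-1})."""
    track = [np.asarray(start, float)]
    for flow in flows:
        track.append(track[-1] + sample_flow(flow, track[-1]))
    return np.array(track)

h, w, n_pairs = 64, 256, 100
true_flow = np.zeros((h, w, 2))
true_flow[..., 0] = 1.5  # constant 1.5 px shift to the right between frames
rng = np.random.default_rng(0)
sigma = 0.1  # hypothetical per-pair flow error, in pixels
offsets = rng.normal(0.0, sigma, size=(n_pairs, 2))  # one error vector per image pair
noisy_flows = [true_flow + o for o in offsets]

start = (10.0, 32.0)
exact = chain_track([true_flow] * n_pairs, start)
noisy = chain_track(noisy_flows, start)
# Independent per-pair errors add up like a random walk:
# the RMS drift grows roughly like sigma * sqrt(k).
drift = np.linalg.norm(noisy - exact, axis=1)
for k in (1, 10, 100):
    print(f"after {k:3d} pairs: drift = {drift[k]:.3f} px")
```

In this toy setting the drift after k pairs is exactly the sum of the first k error vectors, which makes the accumulation mechanism explicit.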

The proposed SfM approach is based on the observation that, in scenes where feature detectors are unusable, DOF may be the only option for establishing point correspondences. The overall aim of this paper is to show that a DOF-based approach can lead to an efficient surface reconstruction solution for scenes in which feature-based methods cannot be used. The described solution relies on two contributions. The paper first shows how a dense point correspondence can be established even in complex scenes with few textures and strong illumination changes, as in Fig. 1(b). It then proposes an image grouping strategy that yields numerous and large homologous point sets, enabling a robust surface reconstruction. Compared with point tracking through consecutive images of a sequence, the proposed image grouping strategy avoids the accumulated errors that lead to inaccurate or false correspondences. Moreover, unlike feature-based SfM methods, the proposed DOF-based SfM method directly provides dense 3D point clouds, which makes an additional MVS step unnecessary.
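
The contrast between chained tracking and pairwise matching can be sketched as follows. The actual image search and grouping strategy is detailed in Section 3.2 and is not reproduced here. This simplified, hypothetical version assumes that flow fields have already been computed directly between one reference image and each overlapping image, all of the same size, so every correspondence in a group comes from a single flow field and carries no error accumulated through intermediate frames. The function name, `stride` and `min_views` are illustrative choices.

```python
import numpy as np

def homologous_groups(flows_from_ref, ref_index, stride=8, min_views=3):
    """Group homologous points using flows computed directly from one reference image.

    flows_from_ref: dict {image_index: (H, W, 2) flow field from the reference image
                    to that image}; all images are assumed to share the same size.
    min_views:      hypothetical minimum number of views kept per group.
    Returns {(x, y) reference pixel: [(image_index, (x', y')), ...]}.
    """
    h, w, _ = next(iter(flows_from_ref.values())).shape
    groups = {}
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            group = [(ref_index, (float(x), float(y)))]
            for j, flow in flows_from_ref.items():
                dx, dy = flow[y, x]
                xj, yj = x + dx, y + dy
                # One flow field per correspondence (no chaining); keep it
                # only if the displaced point lands inside image j.
                if 0 <= xj <= w - 1 and 0 <= yj <= h - 1:
                    group.append((j, (float(xj), float(yj))))
            if len(group) >= min_views:
                groups[(x, y)] = group
    return groups

# Toy flows: constant horizontal displacements of 4, 8 and 40 px.
h, w = 48, 64
flows = {}
for j, shift in ((1, 4.0), (2, 8.0), (3, 40.0)):
    flow = np.zeros((h, w, 2))
    flow[..., 0] = shift
    flows[j] = flow
groups = homologous_groups(flows, ref_index=0)
print(len(groups), "groups;", groups[(0, 0)])
```

Each resulting group can be passed directly to a multi-view triangulation, so the group size, rather than the track length, determines how many views constrain a 3D point.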

Section 2 presents previous contributions relating to the reconstruction of medical scenes. Section 3 focuses on the two main contributions enabling SfM methods to be robust, namely the OF method for finding numerous homologous points between two images and the strategy for finding numerous 2D homologous image points for each 3D scene point to be reconstructed. Results are first given in Section 4 to compare the performance of a state-of-the-art SfM method based on feature detection (COLMAP [18]) with that of the proposed DOF-based SfM approach. Epithelial surface construction examples are then given for three medical examinations (gastroscopy, cystoscopy and dermatology) to show the large scene variability which can be handled by the proposed SfM scheme. A conclusion and perspectives are presented in Section 5.


Related work

A straightforward way to tackle the lack of feature points would be to use active stereo-vision systems that project light patterns onto the surfaces to be reconstructed [19]. An active vision method was developed to show the feasibility of 3D bladder mosaicing [20]. However, such a solution requires hardware changes that are too significant for endoscope manufacturers, who prefer passive vision solutions that leave the instruments unchanged. Moreover, active vision solutions are

Dense optical flow for SfM

This section begins by briefly describing a robust illumination-invariant OF method that delivers accurate correspondences even with weakly structured and textured images. Then, Section 3.2 details the image grouping strategy, which maximizes the sizes of the homologous point sets by computing the DOF only between image pairs (i.e., without tracking homologous points along a sequence of more than two images). This image grouping strategy is integrated into the incremental SfM pipeline given
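
The descriptor used in the paper's data term is not described in this excerpt. As a plainly named stand-in that illustrates why descriptor-based data terms tolerate illumination changes, the sketch below computes a census transform. This classical descriptor encodes only whether each neighbour is darker than the centre pixel, so it is unchanged by any strictly increasing intensity mapping. It is not the authors' descriptor, and the window radius and simulated lighting change are hypothetical.

```python
import numpy as np

def census_transform(image, radius=1):
    """Census descriptor: for each pixel, a bit vector telling which neighbours
    in a (2r+1)x(2r+1) window are darker than the centre pixel.
    Returns an (H, W, (2r+1)^2 - 1) boolean array; borders use edge padding."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, radius, mode="edge")
    h, w = img.shape
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue  # skip the centre pixel itself
            neighbour = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            bits.append(neighbour < img)
    return np.stack(bits, axis=-1)

def hamming_cost(desc_a, desc_b):
    """Per-pixel data cost: number of differing census bits."""
    return np.count_nonzero(desc_a != desc_b, axis=-1)

rng = np.random.default_rng(0)
frame = rng.uniform(0, 1, size=(32, 32))
# Simulated global illumination change: gain, offset and gamma (strictly increasing).
relit = 0.4 * frame ** 0.7 + 0.2
cost = hamming_cost(census_transform(frame), census_transform(relit))
print("max census cost after relighting:", cost.max())
```

Real endoscopic lighting varies spatially and includes specular reflections, so this kind of invariance only holds to the extent that the intensity change is locally monotonic. In a variational OF scheme, such a per-pixel cost (or a continuous relaxation of it) would replace the brightness-constancy term.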

Results and discussion

This section successively quantifies the accuracy of the DOF-based SfM scheme (Section 4.1), demonstrates the differences between feature-based and OF-based approaches when textures are missing (Section 4.2), and highlights the robustness of the proposed method, which can deal with very different scenes and acquisition conditions (Section 4.3). The code (computation of homologous point groups usable as input by any SfM approach) and all data used in this section are provided as supplementary material.
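
The accuracy assessment on phantoms relies on comparing reconstructed points with a known ground-truth surface. The metric actually used in Section 4.1 is not given in this excerpt. A common choice, sketched below, is the distance from each reconstructed point to its nearest ground-truth point. The sketch assumes that both clouds are already expressed in the same frame and at the same scale; SfM reconstructions are only defined up to a similarity transform, so a registration step would normally come first. The planar test data are hypothetical.

```python
import numpy as np

def cloud_to_reference_errors(reconstructed, reference, chunk=1024):
    """Distance from each reconstructed 3D point to its nearest reference point.

    Brute force, processed in chunks to bound memory; for large clouds a k-d tree
    (e.g., scipy.spatial.cKDTree) would be the usual choice.
    """
    rec = np.asarray(reconstructed, float)
    ref = np.asarray(reference, float)
    errors = np.empty(len(rec))
    for start in range(0, len(rec), chunk):
        block = rec[start:start + chunk]
        d2 = ((block[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1)
        errors[start:start + chunk] = np.sqrt(d2.min(axis=1))
    return errors

# Hypothetical example: reference points on the plane z = 0 (grid spacing 0.05),
# reconstruction offset by 0.05 along z.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 21), np.linspace(0, 1, 21)), -1).reshape(-1, 2)
reference = np.hstack([grid, np.zeros((len(grid), 1))])
reconstructed = reference[::7] + np.array([0.0, 0.0, 0.05])
err = cloud_to_reference_errors(reconstructed, reference)
print(f"mean {err.mean():.3f}, RMS {np.sqrt((err ** 2).mean()):.3f}, max {err.max():.3f}")
```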


Conclusion: global discussion and perspectives

An algorithm can be considered robust when it provides appropriate results for very different scene contents and acquisition conditions. In this paper, surface reconstruction tests were presented on very different data. Surfaces with almost no textures (gastroscopy), with rather few textures (cystoscopy) and with more textures (dermatology) were successfully reconstructed with the proposed DOF-based SfM method. These surfaces were reconstructed for hardly controllable camera trajectories

Declaration of Competing Interest

None.

Acknowledgments

This work was partially funded by the Agence Nationale de la Recherche in the frame of the EMMIE (Endoscopie MultiModale pour les lésions Inflammatoires de l’Estomac) project (ANR-15-CE17-0015).


References (43)

  • J. Schönberger et al.

    Pixelwise view selection for unstructured multi-view stereo

    European Conference on Computer Vision (ECCV)

    (2016)
  • M.M. Kazhdan et al.

    Poisson surface reconstruction

    Eurographics Symposium on Geometry Processing

    (2006)
  • H. Vu et al.

    High accuracy and visibility-consistent dense multiview stereo

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • M. Waechter et al.

    Let there be color! Large-scale texturing of 3D reconstructions

    ECCV

    (2014)
  • M. James et al.

    Straightforward reconstruction of 3D surfaces and topography with a camera: accuracy and geoscience application

    J. Geophys. Res.

    (2012)
  • J.-M. Frahm et al.

    Building Rome on a cloudless day

    ECCV

    (2010)
  • D.J. Crandall et al.

    SfM with MRFs: discrete-continuous optimization for large-scale structure from motion

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • A. Behrens et al.

    Local and global panoramic imaging for fluorescence bladder endoscopy

    Int. Conf. of the IEEE Engineering in Medicine and Biology Society

    (2009)
  • M.A. Fischler et al.

    Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography

    Commun. ACM

    (1981)
  • J.L. Schönberger et al.

    Structure-from-motion revisited

    CVPR

    (2016)
  • N. Shevchenko et al.

    A high resolution bladder wall map: Feasibility study

    34th Int. Conf. of the IEEE Engineering in Medicine and Biology Society

    (2012)

Tan-Binh Phan received the B.S. degree in mathematics from the University of Science of Ho Chi Minh City, Vietnam, and the M.S. degree in mathematics from the University of Tours, France, in 2016 and 2017, respectively. He is currently a researcher at the Centre de Recherche en Automatique (CRAN UMR 7039, CNRS/Université de Lorraine), Vandœuvre-lès-Nancy, France, where he is working towards a Ph.D. in the field of image processing. His research topic is structure from motion applied to endoscopic video sequences for building 3D mosaics representing extended fields of view of the inner epithelial walls of hollow organs (stomach in gastroscopy and bladder in cystoscopy).

Dinh-Hoan Trinh received the Ph.D. degree in applied mathematics (signal and image processing) from the Laboratoire Analyse, Géométrie et Applications (LAGA), Université Paris 13, France, in 2013. From 2007 to 2009, he was a researcher with the Institute of Mathematics, Vietnam Academy of Science and Technology. From 2014 to 2016, he was a principal R&D engineer at the Viettel Group, Vietnam. From 2016 to September 2019, he was a postdoctoral researcher at the Centre de Recherche en Automatique de Nancy (CRAN UMR 7039 CNRS/Lorraine University). Currently, he is a member of the VIBOT team (ERL CNRS 6000), Imagerie et Vision Artificielle (ImViA) laboratory, Université de Bourgogne, France. His research interests include data mining, image processing, medical imaging, computer vision, and machine learning.

Didier Wolf received the Ph.D. degree in electrical engineering from the Institut National Polytechnique de Lorraine, Nancy, France, in 1986. Currently, he is University Professor at the Université de Lorraine (UL), where he teaches in the signal processing field. He is the Director of the Centre de Recherche en Automatique de Nancy (CRAN UMR 7039 CNRS/Nancy University), where he also belongs to the biomedical engineering team. His main research interests include image processing, signal processing, and medical imaging techniques applied in the field of oncology.

Christian Daul received the Ph.D. degree in computer vision from the Université Louis Pasteur, Strasbourg, France, in 1994. From 1990 to 1995, he was with the Laboratoire des Sciences de l’Image, de l’Informatique et de la Télédétection (current laboratory name: ICube, UMR 7357 CNRS/Université de Strasbourg) before joining the Institute of Applied Mathematics of Kaiserslautern (ITWM, Fraunhofer Institut für Techno- und Wirtschaftsmathematik, Germany), where he was a member of the Image Processing Group. Since October 1999, he has been with the Centre de Recherche en Automatique de Nancy (CRAN UMR 7039 CNRS/Université de Lorraine), Vandœuvre-lès-Nancy, France, where he works in the area of medical imaging (mainly in urology, gastroenterology and dermatology). His main research interests include image segmentation, data registration, 3D data reconstruction and 2D/3D endoscopic image mosaicing. He is University Professor at the Université de Lorraine (UL, Ecole Européenne d’Ingénieurs en Génie des Matériaux), Nancy, France, where he teaches in the signal processing field.
