
SEpi-3D: soft epipolar 3D shape measurement with an event camera for multipath elimination


Abstract

Multipath in 3D imaging occurs when one pixel receives light from multiple reflections, which causes errors in the measured point cloud. In this paper, we propose the soft epipolar 3D (SEpi-3D) method to eliminate multipath in temporal space with an event camera and a laser projector. Specifically, we align the projector and event camera rows onto the same epipolar planes with stereo rectification; we capture an event flow synchronized with the projector frame to construct a mapping relationship between event timestamps and projector pixels; and we develop a multipath-eliminating method that utilizes the temporal information from the event data together with the epipolar geometry. Experiments show that the RMSE decreases by 6.55 mm on average in the tested multipath scenes, and the percentage of error points decreases by 7.04%.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Multipath interference has always been a difficult problem for three-dimensional sensors. Geometric shapes such as pits, humps, corners, and arcs may produce multiple reflections. Specular or partially specular surfaces make the direct reflection and the multiple reflections comparable in intensity, causing the 3D sensor to measure the target object's actual shape incorrectly [1]. Much research has been devoted to using spatial or temporal information to distinguish the light from the direct and indirect paths [2–14].

The spatial methods can be categorized into three principles: light transport matrix, adaptive projection pattern, and homogeneous coding. The light transport matrix method eliminates multipath interference from the retrieved light transport coefficients [2–4]. The light transport coefficients of each pixel are obtained by projecting a large number of Fourier patterns. By applying epipolar constraints to the light transport coefficients, the direct and multiple paths of the reflection can be distinguished. However, since each pattern recovers only one frequency-domain component, the number of projected patterns depends on the scene's complexity; a typical scene often requires many patterns, for example 1,036,800 shots [2]. Jiang et al. determine the position of the measured object in the field of view from a rough measurement and encode and project only that region to reduce the number of projections [3]. However, tens of thousands of projection patterns are still needed for better measurement results. Later, Jiang et al. reduced the number of projected patterns based on the Nyquist sampling theorem but still required 25,600 images [4].

Adaptive projection methods eliminate multipath interference by adjusting the projected pattern to avoid multiple reflections. Assuming multiple reflections are often caused by part of the projected pattern, Jiang et al. [5] and Zhao et al. [6] project one line at a time to determine which lines generate multipath. They then project complementary projection patterns to avoid inter-regional multipath interference. However, the number of projected patterns ranges from hundreds to thousands because the method requires the number of patterns to equal the projector's resolution. Moreover, the number of projection patterns for 3D reconstruction also increases proportionally when multiple reflective surfaces exist in the scene. Xu et al. [7] project a single-column fringe pattern with a DMD projector, forming an active binocular system. Although the system can distinguish multipath line by line, more than 1000 patterns are projected, depending on the required spatial resolution.

Homogeneous coding eliminates multipath interference by distinguishing light transport coefficients with pattern coding. O’Toole et al. [8] built a new 3D imaging system using a rolling shutter camera and laser scanning projector. By aligning each line of the camera and the projector, the system can eliminate reflection and ambient light outside the epipolar plane. However, the poses of the projector and the camera must meet the requirement of the epipolar geometry on the two displacement axes and the three rotation axes. Such strict conditions make hardware assembly difficult.

In summary, the spatial methods often require thousands to millions of modulating patterns to distinguish direct and indirect paths from purely spatial information. In comparison, the temporal method is used in non-line-of-sight imaging. The temporal method captures the temporal intensity waveform with a high-temporal-resolution sensor such as an APD or SPAD [9–11], enabling a short capture time. The reflections from different paths are then distinguished by separating the peaks of the captured waveform. The same method is used in direct time-of-flight imaging, such as lidar and SPAD-array 3D imaging. However, high-temporal-resolution sensors were not applied to structured light 3D imaging until recent years, so no multipath-eliminating method utilizing temporal information was previously possible. Recently, works with event cameras in structured light have been developed for higher dynamic range or higher framerate [12–15].

Inspired by the temporal method used in non-line-of-sight imaging and direct time-of-flight imaging, we propose the soft epipolar 3D (SEpi-3D) method. The method uses the asynchronous output characteristics of the event camera to implement a multipath-resistant 3D imaging system. By calibrating the extrinsic parameters of the projector and the event camera, their corresponding rows are aligned to the epipolar planes in software rather than in hardware. With the projector's scanning direction information and the projector calibration parameters, the event stream is filtered along the projector's scanning line. After the event accumulation map corresponding to each projection pattern is obtained, the final 3D point cloud without multipath interference is obtained through phase calculation, disparity matching, and projective transformation. The experimental results show that the system can effectively eliminate multipath interference with a small number of projected patterns and without rigid hardware alignment.

Section 2 explains the principle of the multipath effect in 3D imaging and the detailed steps of the SEpi-3D method. Section 3 displays a comparison experiment result between the regular fringe projection profilometry (FPP) method [16] and the SEpi-3D method. Section 4 concludes the paper and shows the problems and possible future improvements.

2. Principle

2.1 Epipolar imaging for multipath elimination

In 3D imaging, multipath interference arises when the same pixel receives multiple reflections of the actively projected light between surfaces in complex scenes. As shown in Fig. 1, the blue path is the multipath generated by the secondary reflection; the yellow path is the direct path generated by the direct reflection; the green path is the mixed path of the direct path and the multipath reflected from the same scene point. The yellow pixel and the blue pixel emit light modulated by different parts of the pattern. The green pixel then receives the mixed light, causing the captured pattern to differ from the directly projected pattern. If the components of the received mixed light can be separated, it is possible to eliminate multipath interference in 3D imaging. The epipolar constraint is commonly used to differentiate multipath. As shown in Fig. 1, the yellow and green pixels are located on the same epipolar plane, whereas the blue pixel is not. Therefore, if the projector pixels on different epipolar planes are projected at different times, the direct component received by the green pixel can be distinguished.

Fig. 1. Solving multipath interference with epipolar imaging. The yellow and blue pixels are on and off the epipolar line. The yellow pixel illuminates the scene point directly, while the blue pixel illuminates the scene point through multiple reflections. The green pixel is the camera pixel that receives the mixed light reflected from the scene point and is also on the corresponding epipolar line.

Laser projection delivers pixels sequentially in time, which enables real-time separation of multipath. It can be seen from Fig. 1 that when the unrectified projecting plane of the projector is not parallel to the baseline, the scanning line forms a certain angle with the epipolar line. The hardware epipolar constraint method [8] aligns the pixel plane of the projector with the imaging plane by carefully controlling the camera's and the projector's three rotation axes and two displacement axes. Installation and adjustment are very demanding in this way because five of the six degrees of freedom are strictly constrained, so a method that does not rely on strict hardware alignment is desired.

2.2 Soft epipolar 3D imaging (SEpi-3D)

We propose the SEpi-3D method to eliminate multipath interference with software-wise epipolar rectification, which aligns the scanning line of the projector and the imaging row in software. Figure 2 shows the flowchart of the SEpi-3D method. When the alignment is performed in software rather than in hardware, the laser projector and camera planes can be aligned through epipolar calibration; however, the scanning direction of the laser spot on the rectified plane is then not guaranteed to be horizontal. Therefore, we introduce the event camera into the system. Compared with a traditional frame-based camera, the event camera has higher temporal resolution and higher dynamic range. The event camera is a novel type of sensor that outputs a binary event, together with the X-Y coordinate and a timestamp, only for pixels whose light variation exceeds a preset threshold. This feature gives the event camera a high equivalent framerate with low data throughput. By using the temporal information and the asynchronous output characteristic of the event camera, the scanning line can be tracked even though it is not perfectly horizontal.

Fig. 2. Flow chart for the soft epipolar 3D (SEpi-3D) method. (a) Acquiring projector and camera intrinsic and extrinsic for calculating soft epipolar constraint between the scan line and imaging pixel; (b) Capturing event stream while projecting patterns synchronously; (c) The reprojected scan line on the imaging plane is calculated with soft epipolar constraint at a particular timestamp, and disparity gating is applied; (d) The event stream is filtered according to the reprojected scan line at each timestamp; (e) Events are accumulated into a set of modulated patterns by the projecting interval; (f) The accumulated event patterns are decoded into a phase map, from which the disparity and depth maps are calculated.

For stereo vision, the left and right images remapped after epipolar calibration are aligned line by line, which significantly reduces the complexity of finding corresponding feature points for binocular matching. For removing multipath interference, the epipolar calibration of the projector and the camera also aims at aligning the lines, making it possible to distinguish the direct path on the same line from the multipath reflections on different lines. As shown in Fig. 2, the proposed multipath removal method consists of two parts. The first part is spatial and temporal alignment, i.e., stereo rectification and acquisition of an event stream synchronized with the projector. The second part is soft epipolar 3D imaging, i.e., multipath elimination by the soft epipolar constraint and point cloud calculation from accumulated event patterns.

2.2.1 Spatial alignment with epipolar constraint

The intrinsic calibration of the event camera can be realized by displaying a flashing checkerboard or dot pattern on an LCD screen. First, the LCD screen is placed in several poses to obtain multiple event-accumulation images. Then, Zhang's calibration method [17] is applied to obtain the focal length, optical center, and distortion parameters. Second, the intrinsic calibration of the projector can use a traditional checkerboard or circle plate, since the scanning projector itself triggers event output. The checkerboard corners or circle centers in the accumulated images can be extracted by setting an appropriate threshold so that the event camera only generates events in the white regions. The projector's intrinsic parameters (focal length, optical center, and distortion) are acquired from the phase calculated from the accumulated images, treating the projector as an inverse camera. Then, the extrinsic rotation matrix $\mathbf{R}$ and translation vector $\mathbf{T}$ from the projector to the camera coordinate system are obtained by capturing the same calibration board.
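For illustration, the calibration on event-accumulation images can be carried out with standard tools. The following Python sketch is ours (the paper's implementation is in MATLAB); it assumes the accumulated event frames have already been binarized into 8-bit images, and the board dimensions are placeholder values.

```python
import cv2
import numpy as np

def calibrate_from_event_frames(accumulated_frames, pattern_size=(9, 6), square_size=20.0):
    """Zhang's calibration on binarized event-accumulation images of a flashing
    checkerboard. pattern_size (inner corners) and square_size (mm) are illustrative."""
    # 3D corner coordinates in the board frame (Z = 0 plane)
    objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

    obj_points, img_points = [], []
    for img in accumulated_frames:                     # 8-bit grayscale accumulation images
        found, corners = cv2.findChessboardCorners(img, pattern_size)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    h, w = accumulated_frames[0].shape[:2]
    # Returns reprojection error, intrinsic matrix, distortion coefficients, and poses
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, (w, h), None, None)
    return K, dist
```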

The spatial alignment of the event camera and laser projector is based on the intrinsic parameters and the extrinsic parameters $\mathbf {R}, \mathbf {T}$ of the calibrated projector and camera. The new rotation matrices $\mathbf {R}_l, \mathbf {R}_r$ and the new projection matrices $\mathbf {P}_l, \mathbf {P}_r$ of the projector and the camera are constructed so that the rectified projector and camera planes satisfy

$$\begin{cases} \mathbf{x}_l\cdot(\mathbf{t}\times \mathbf{x}_l)=0 \\ \mathbf{x}_r\cdot(\mathbf{t}\times \mathbf{x}_r)=0 \\ (\mathbf{t}\times \mathbf{x}_l)\cdot(\mathbf{t}\times \mathbf{x}_r)=1 \end{cases},$$
where $\mathbf {x}_l$ and $\mathbf {x}_r$ are the x-axis normal vectors of the projector and camera coordinate systems, and $\mathbf {t}$ is the baseline normal vector. In order to obtain the image after epipolar calibration, it is necessary to construct pixel coordinate maps $\mathbf {M}_l, \mathbf {M}_r$ between the original rotation and projection matrices and the new rotation matrices $\mathbf {R}_l, \mathbf {R}_r$ and projection matrices $\mathbf {P}_l, \mathbf {P}_r$. We denote the new focal lengths as $f_l, f_r$. For a scene point on an epipolar plane to be projected onto the same row, the new focal lengths should satisfy $f=f_l=f_r$.
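For illustration, this rectification step can be sketched with OpenCV's stereo-rectification routines. Treating the projector as the "left" view and assuming both planes share one image size are simplifications of ours, not the authors' code.

```python
import cv2

def rectify_maps(K_proj, dist_proj, K_cam, dist_cam, image_size, R, T):
    """Build the rectifying rotations R_l, R_r, projection matrices P_l, P_r, and the
    pixel maps M_l, M_r of Section 2.2.1, treating the projector as the 'left' view.
    image_size is (width, height); both views are assumed to share it."""
    R_l, R_r, P_l, P_r, Q, _, _ = cv2.stereoRectify(
        K_proj, dist_proj, K_cam, dist_cam, image_size, R, T, alpha=0)

    # M_l, M_r: per-pixel maps between the original and rectified planes
    M_l = cv2.initUndistortRectifyMap(K_proj, dist_proj, R_l, P_l, image_size, cv2.CV_32FC1)
    M_r = cv2.initUndistortRectifyMap(K_cam, dist_cam, R_r, P_r, image_size, cv2.CV_32FC1)

    # P_l and P_r share the same focal length after rectification (f = f_l = f_r),
    # so points on one epipolar plane fall on the same row of both rectified views.
    return R_l, R_r, P_l, P_r, Q, M_l, M_r
```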

2.2.2 Temporal alignment with scan line synchronization

The temporal alignment of the event camera and laser projector is based on synchronization. The 2D galvanometer projects the laser points onto the scene row by row, and the driver board controls the row and column scanning frequencies of the galvanometer. Taking a projected image with $H \cdot W$ pixels as an example, if the image frame rate is $F$, the column scanning frequency is $H \cdot F$, the row scanning frequency is $F$, and the laser modulation frequency is $H \cdot W \cdot F$. The event camera receives the scan pattern returned from the scene. Because the event camera has no concept of frames, synchronization is achieved from the first-pixel projection signal of the laser and the timestamps returned by the event camera. When the laser and the galvanometer project the first pixel, a pulse signal is generated synchronously. On receiving the pulse signal, the computer reads the current timestamp $t_0$ from the event stream $(x, y, p, t)$. Since the projected pixel frequency is $H \cdot W \cdot F$, the timestamp of the $N$th pixel is $t=t_0+N /(H \cdot W \cdot F)$.
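A minimal sketch of this timestamp bookkeeping is shown below; variable names are illustrative, and timestamps are assumed to be in seconds on the same clock as $t_0$.

```python
def projector_pixel_index(t_event, t0, H, W, F):
    """Map an event timestamp to the index N of the projector pixel drawn at
    that instant (Section 2.2.2). One pixel lasts 1 / (H * W * F) seconds."""
    pixel_period = 1.0 / (H * W * F)          # duration of one projected pixel
    N = int((t_event - t0) / pixel_period)    # pixels elapsed since the sync pulse
    return N % (H * W)                        # wrap to the current frame

# Example (illustrative): 1280x720 projection at 60 fps, event 1 ms after the sync pulse
# projector_pixel_index(t0 + 1e-3, t0, H=720, W=1280, F=60)
```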

2.2.3 Multipath elimination by narrowing ambiguous region

The soft epipolar constraint considers the laser scan line direction on the camera plane. For an ideal epipolar scanning system, regardless of the distance of the scene points, the corresponding point of a particular projector pixel in the camera can only appear on the corresponding line. However, since the scanning direction of the projector does not strictly follow the direction of the epipolar line, the laser scanning track still crosses multiple epipolar lines. The scanning direction of the projector on the rectified projector plane can be calculated from the rotation matrix $\mathbf {R}_l$, which can be decomposed into rotations $\mathbf {R}_x, \mathbf {R}_y, \mathbf {R}_z$ about the corresponding axes. We decompose $\mathbf {R}_l$ in the order

$$\mathbf{R}_l=\mathbf{R}_z(\theta_z)\mathbf{R}_y(\theta_y)\mathbf{R}_x(\theta_x) \qquad.$$

Since Eq. (1) only restricts the relative rotation about the baseline direction, we can make $\mathbf {R}_x(\theta _x)$ in $\mathbf {R}_l$ an identity matrix to simplify the rotation for the scanning direction. The rotation $\mathbf {R}_x(\theta _x)$ is instead applied on the camera side in $\mathbf {R}_r$ to satisfy Eq. (1). For $\mathbf {R}_y(\theta _y)$, as shown in Fig. 3(a), the projector coordinates of the two endpoints of a scan line at height $h$ on the imaging plane are

$$\begin{cases} A^{'} =({-}w\cdot\cos\theta_y,h,f_l+w\cdot\sin\theta_y) \\ B^{'} =(w\cdot\cos\theta_y,h,f_l-w\cdot\sin\theta_y) \end{cases} ,$$
where $w, h$ can be calculated from the pixel coordinates of the scan line and the projection matrix $\mathbf {P}_l$. The coordinates of the scan line endpoints projected onto the plane after the rotation $\mathbf {R}_y(\theta _y)$ are
$$\begin{cases} A^{\prime\prime} =(\frac{-f_lw\cdot\cos\theta_y}{f_l+w\cdot\sin\theta_y},\frac{f_lh}{f_l+w\cdot\sin\theta_y},f_l ) \\ B^{\prime\prime} =(\frac{f_lw\cdot\cos\theta_y}{f_l-w\cdot\sin\theta_y},\frac{f_lh}{f_l-w\cdot\sin\theta_y},f_l ) \end{cases} .$$

The parameters $k, b$ of the scanning line $y = k\cdot x + b$ on the plane after the rotation $\mathbf {R}_y(\theta _y)$ are

$$\begin{cases} k=\frac{h\cdot \tan\theta_y}{f_l} \\ b=h \end{cases}.$$

Then for the $\mathbf {R}_z(\theta _z)$, as shown in Fig. 3(b), the coordinates of the rotated endpoints will be

$$\begin{cases} A^{\prime\prime\prime} =\mathbf{R}_z(\theta_z)\cdot A^{\prime\prime} \\ B^{\prime\prime\prime} =\mathbf{R}_z(\theta_z)\cdot B^{\prime\prime} \end{cases},$$
where $\mathbf {R}_z(\theta _z)$ is
$$\mathbf{R}_z(\theta_z)=\begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0\\ \sin\theta_z & \cos\theta_z & 0\\ 0 & 0 & 1\\ \end{bmatrix}\qquad.$$

Therefore, the $k', b'$ of the scanning line $y = k'\cdot x + b'$ on the plane after epipolar calibration are

$$\begin{cases} k'=\frac{k\cdot \cos\theta_z+\sin\theta_z}{\cos\theta_z-k\cdot \sin\theta_z} \\ b'=\frac{b}{\cos\theta_z-k\cdot \sin\theta_z} \end{cases}.$$

Then we have the temporal epipolar constraint for the projector pixel and camera pixel on the rectified focal plane as

$$\mathbf{P}^{{-}1}_l\frac{Z}{f_l}\begin{bmatrix} x_{p}(t) \\k'x_{p}(t)+b' \\f_l \end{bmatrix}+\mathbf{T}= \mathbf{P}^{{-}1}_r\frac{Z}{f_r}\begin{bmatrix} x_{c} (t) \\y_{c} (t) \\f_r \end{bmatrix} \qquad ,$$
where $x_{p}(t)$ and $x_{c}(t), y_{c}(t)$ are the coordinates of the projected scene point at timestamp $t$ in the rectified projector and camera coordinate systems, $Z$ is the z-coordinate of the scene point, and $\mathbf {T}$ is the baseline vector. This means that, within the minimum resolution time interval of the event camera, if one scan line of the projector covers a width range of $x$ on the rectified projector plane, then the height range of the direct path on the rectified camera plane will be $| k'x |$, as shown in Fig. 4(a). The above scan line calculation does not involve distortion parameters, so it is a close estimation when the camera and projector distortion is minimal. The undistorted scan line can be acquired using the mapping relationships $\mathbf{M}_l, \mathbf{M}_r$. Distortion causes the scan line to skew and makes it harder to determine the region of the direct path.
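For illustration, the scan-line geometry above reduces to two scalar parameters per line. The following sketch evaluates Eqs. (5) and (8); it assumes the tilt angles $\theta_y, \theta_z$ have already been extracted from the decomposition of $\mathbf{R}_l$ in Eq. (2).

```python
import numpy as np

def scanline_on_rectified_plane(h, theta_y, theta_z, f_l):
    """Slope k' and intercept b' of a projector scan line of height h after the
    soft epipolar rectification, following Eqs. (5) and (8).

    h       : height of the scan line on the projector imaging plane
    theta_y : rotation of R_l about the y-axis (radians)
    theta_z : rotation of R_l about the z-axis (radians)
    f_l     : rectified projector focal length
    """
    # Eq. (5): line parameters after the R_y(theta_y) rotation
    k = h * np.tan(theta_y) / f_l
    b = h
    # Eq. (8): line parameters after the additional R_z(theta_z) rotation
    kp = (k * np.cos(theta_z) + np.sin(theta_z)) / (np.cos(theta_z) - k * np.sin(theta_z))
    bp = b / (np.cos(theta_z) - k * np.sin(theta_z))
    return kp, bp
```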

Fig. 3. Relationship between tilt angle and scan line direction. $A^{'}B^{'}$ and $A^{'''}B^{'''}$ are the scan lines on the projection plane before and after rectification. Tilt angles $\theta _y$ and $\theta _z$ cause the direction of scan line $A^{'''}B^{'''}$ to mismatch the direction of the horizontal epipolar line on the rectified projection plane.

Fig. 4. Ambiguous region for multipath elimination. (a) Restricted by epipolar plane only; (b) Restricted by epipolar plane and disparity range. The red grid areas are the ambiguous regions calculated using the soft epipolar constraint. By applying disparity gating, the ambiguous region can be narrowed down to a region along the reprojected scan line with reference disparity $d$.

We can introduce the disparity constraint to further reduce the ambiguous region so that only part of the camera rows will be in the ambiguous range. As shown in Fig. 4(b), if the reference depth is $Z_0$ and the range of depth is $\Delta Z$, according to the relationship between disparity and depth, the reference disparity on the rectified camera plane will be $d$, and the disparity range will be $\Delta d$. The relationship between disparity and depth is

$$z={-}\frac{T_{x} f}{d-(c_{x} -c_{x'})} \qquad,$$
where $T_{x}$ is the baseline length, and $c_{x}$ and $c_{x'}$ are the x-coordinates of the camera's and projector's optical centers. The current projection pixel of the projector is then determined from the timestamp, and multipath filtering is performed on the event stream according to the projected pixel's epipolar and disparity constraints. First, the coordinates of the current projection pixel on the projector plane are calculated from the synchronization signal and the projection frequency. Given the index $N$ of the current projected point within one frame and the height $H$ and width $W$ of the frame, its position on the unrectified plane is
$$\begin{cases} X=\left \lfloor \frac{N \bmod \left ( W\cdot H \right ) }{W} \right \rfloor-c_x \\ Y=(N \bmod W)-c_y \end{cases} ,$$
where $c_x, c_y$ is the pixel coordinate of the projector’s optical center. Then according to the mapping $M_l$, the coordinate of the projected point on the rectified plane is
$$(X',Y')=M_l(X,Y) \qquad.$$

With $(X',Y')$ calculated, $k',b'$ can be determined with Eq. (5) and Eq. (8). Therefore, the range of the direct path on the rectified imaging plane is

$$\begin{cases} k'(x+d-\frac{\Delta d}{2})+b' \lt y \lt k'(x+d+\frac{\Delta d}{2})+b' \qquad(k'\gt 0) \\ k'(x+d+\frac{\Delta d}{2})+b' \lt y \lt k'(x+d-\frac{\Delta d}{2})+b' \qquad(k'\leq0) \end{cases}.$$

All events outside this interval range are filtered out, as shown in Fig. 5.
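A simplified sketch of the whole gating step (Eqs. (10)-(13)) is shown below. The helper functions, the row-major pixel-index decoding, and the distortion-free assumption are ours for illustration; the paper's actual implementation is in MATLAB.

```python
import numpy as np

def keep_direct_path_events(events, t0, H, W, F, to_rectified, scanline_params, d_ref, delta_d):
    """Gate an event stream with the soft epipolar constraint and disparity range (Eq. (13)).

    events          : iterable of (x, y, p, t) tuples in rectified camera coordinates
    t0              : timestamp of the frame synchronization pulse (same unit as t)
    H, W, F         : projector rows, columns, and frame rate
    to_rectified    : helper mapping an unrectified projector pixel (X, Y) onto the
                      rectified plane, i.e. the mapping M_l (assumed available)
    scanline_params : helper returning (k', b') for the rectified scan line through a
                      given rectified projector row (wraps Eqs. (5) and (8))
    d_ref, delta_d  : reference disparity and disparity range from Eq. (10)
    """
    pixel_period = 1.0 / (H * W * F)                    # duration of one projected pixel
    kept = []
    for x, y, p, t in events:
        N = int((t - t0) / pixel_period) % (H * W)      # projector pixel index at this instant
        X, Y = N % W, N // W                            # row-major decoding (a simplification of Eq. (11))
        Xr, Yr = to_rectified(X, Y)                     # rectified projector coordinates
        k_p, b_p = scanline_params(Yr)                  # scan line slope/intercept on the rectified plane
        lo = k_p * (x + d_ref - delta_d / 2) + b_p      # bounds of Eq. (13)
        hi = k_p * (x + d_ref + delta_d / 2) + b_p
        if min(lo, hi) < y < max(lo, hi):               # covers both k' > 0 and k' <= 0
            kept.append((x, y, p, t))
    return np.array(kept)
```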

Fig. 5. An example of multipath interference being filtered out. The events in the red box are multipath interference. The events between the yellow lines satisfy the epipolar and disparity constraints.

2.2.4 Event accumulation and phase decoding

After multipath filtering is performed on the event stream, the events must be accumulated to obtain the projected pattern returned by the scene. The starting point of the accumulation is set to the timestamp of the synchronization signal, and the accumulation time is set to the period of one frame. Depending on the projection pattern, the phase recovery algorithm also differs. We use the phaseshift method with Gray code, the most widely used method in industrial applications. Figure 6 shows the comparison before and after applying the multipath filtering algorithm. The phaseshift method recovers the wrapped phase from N phaseshift patterns with

$$\phi _{wrap} ={-}\arctan \left [ \frac{\sum_{m=0}^{N-1}I_{m}\sin \left ( \frac{2m\pi }{N} \right ) }{\sum_{m=0}^{N-1}I_{m}\cos \left ( \frac{2m\pi }{N} \right ) } \right ] \qquad,$$
where $I_m$ is the image value of the $m$-th phaseshift pattern. Then, the unwrapping period is recovered from $M$ Gray code patterns with
$$\begin{cases} G_{0}=C_{0}\oplus \mathbf{0} \\ G_{m}=C_{m}\oplus G_{m-1} \qquad(m=1,\ldots,M-1) \end{cases} ,$$
where $C_{m}$ is the image value of the $m$-th Gray code pattern, and $G_{m}$ is the $m$-th bit of the Gray code. The unwrapped phase is then obtained by superimposing the wrapped phase with the unwrapping period:
$$\phi _{unwrap} =\phi _{wrap} + \sum_{m=0}^{M-1} \left ( G_{m}\cdot 2^{m+1} \cdot \pi \right ) \qquad.$$
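For illustration, the accumulation and decoding chain of Eqs. (14)-(16) can be sketched as follows. The binarization threshold and the Gray-code bit ordering are conventions assumed here rather than specified by the paper.

```python
import numpy as np

def accumulate_events(events, shape):
    """Accumulate the filtered events of one projection interval into an image."""
    img = np.zeros(shape, np.float32)
    for x, y, p, t in events:
        img[int(y), int(x)] += 1.0          # simple event-count accumulation
    return img

def decode_phase(shift_imgs, gray_imgs):
    """Wrapped phase from N phaseshift patterns (Eq. (14)) plus Gray-code
    unwrapping (Eqs. (15)-(16))."""
    N = len(shift_imgs)
    s = sum(I * np.sin(2 * np.pi * m / N) for m, I in enumerate(shift_imgs))
    c = sum(I * np.cos(2 * np.pi * m / N) for m, I in enumerate(shift_imgs))
    phi_wrap = -np.arctan2(s, c)                             # Eq. (14)

    # Eq. (15): Gray-to-binary conversion, G_0 = C_0, G_m = C_m xor G_{m-1}
    C = [(I > I.mean()).astype(np.uint8) for I in gray_imgs]  # threshold is an assumption
    G = [C[0]]
    for m in range(1, len(C)):
        G.append(np.bitwise_xor(C[m], G[m - 1]))

    # Fringe order from the binary bits (MSB-first weighting assumed), then add 2*pi per period
    order = sum(g.astype(np.int64) << (len(G) - 1 - m) for m, g in enumerate(G))
    return phi_wrap + 2.0 * np.pi * order
```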

Fig. 6. Comparison of multipath filtering. (a) Event accumulation graph for a pattern affected by multipath; (b) Event accumulation graph for a pattern with the effects of multipath removed; (c) Unwrapped phase map affected by multipath; (d) Unwrapped phase map unaffected by multipath.

After obtaining the multipath-free phase map, phase values are matched line by line on the rectified image, and the disparity is obtained from the difference between the pixel coordinates on the projector plane and the pixel coordinates of the corresponding points on the phase map. The disparity is then converted to point cloud coordinates through

$$\begin{bmatrix} X\\ Y\\ Z\\ W \end{bmatrix} =\mathbf{Q} \begin{bmatrix} x\\ y\\ d\\ 1 \end{bmatrix} \qquad,$$
where the matrix $\mathbf {Q}$ is constructed by
$$\mathbf{Q}=\begin{bmatrix} 1 & 0 & 0 & -c_{x}\\ 0 & 1 & 0 & -c_{y}\\ 0 & 0 & 0 & f\\ 0 & 0 & -\frac{1}{T_{x}} & \frac{c_{x}-c_{x^{'}}}{T_{x}} \end{bmatrix} \qquad.$$
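A short sketch of the reprojection in Eqs. (17)-(18), building $\mathbf{Q}$ directly from the rectified parameters, is shown below; the function name and vectorized layout are our illustration.

```python
import numpy as np

def disparity_to_points(disp, f, cx, cy, cx_p, Tx):
    """Reproject a disparity map to 3D points with the Q matrix of Eq. (18).

    disp : disparity map on the rectified camera plane
    f    : rectified focal length; cx, cy: camera optical center
    cx_p : projector optical center x-coordinate; Tx: baseline length
    """
    Q = np.array([[1.0, 0.0, 0.0, -cx],
                  [0.0, 1.0, 0.0, -cy],
                  [0.0, 0.0, 0.0, f],
                  [0.0, 0.0, -1.0 / Tx, (cx - cx_p) / Tx]])
    h, w = disp.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, disp, np.ones_like(disp)], axis=-1)   # homogeneous [x, y, d, 1]
    XYZW = pix @ Q.T                                              # apply Eq. (17) per pixel
    return XYZW[..., :3] / XYZW[..., 3:4]                         # divide by W to get (X, Y, Z)
```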

3. Experiments

As shown in Fig. 7, our system consists of a laser scanning projector (Sony MP-CL1A) and an event camera (SilkyEvCam). The MP-CL1A is a laser projector based on a two-dimensional MEMS galvanometer; it has an HDMI input resolution of 1280×720, projects at a frame rate of 60 fps, and uses RGB color lasers. To effectively eliminate multipath interference, we select a high-temporal-resolution event camera, the SilkyEvCam by Prophesee, with 640×480 spatial resolution and 1 µs temporal resolution. In the experiments, we place the event camera and the projector side by side with a baseline of 15 cm. In order to make the fields of view of the event camera and the projector overlap as much as possible, a 16 mm C-mount lens is used on the event camera, giving a field of view of 32$^{\circ }$×25$^{\circ }$. We select a fringe period of 18 pixels, 18 phase shift patterns with a step of 1 pixel, and 7 Gray code patterns to alleviate the quantization effect and calculate the phase map correctly [18].

Fig. 7. System setup. (a) System setup diagram; (b) Actual setup. 1. The projector is modified from a commercial laser projector MP-CL1A. The projected pattern is transferred to the projector control board through the HDMI interface. The projecting frame synchronizing signal is sampled from the modulating waveform for the MEMS galvo mirror. 2. The event camera SilkyEvCam captures the event stream stimulated by the raster scanning pattern. 3. The whole algorithm is implemented in MATLAB, and its runtime is 0.32 s on a computer with an Intel i7-1165G7 CPU @ 2.8 GHz.

In order to verify the effectiveness of multipath elimination, three common objects are chosen, as shown in Fig. 8. The three objects are placed 40-80 cm away from the camera. In Fig. 9, Fig. 10, and Fig. 11, the first row of (a)-(c) shows the results calculated with the FPP method, and the second row of (a)-(c) shows the FPP results after applying the SEpi-3D method.

Fig. 8. RGB images of the tested objects. (a) Plastic tray; (b) Metal part; (c) Ceramic bowl.

Fig. 9. Multipath elimination of the tray scene. (a) Accumulated pattern; (b) Depth map; (c) Point cloud.

Fig. 10. Multipath elimination of the metal part scene. (a) Accumulated pattern; (b) Depth map; (c) Point cloud.

Fig. 11. Multipath elimination of the bowl scene. (a) Accumulated pattern; (b) Depth map; (c) Point cloud.

As shown in Fig. 8(a), the tray is made of red plastic with a partially specular surface. It can be seen in Fig. 9(a) that the inter-reflection of the inner corner causes an area where the stripe pattern is indistinguishable. A noticeable pattern can be identified in the second image of Fig. 9(a) after the SEpi-3D method. We can also see that the multipath distortion in Fig. 9(b)(c) is eliminated compared to the first row.

As shown in Fig. 8(b), the metal part has a 90-degree corner and several holes and arcs on its surface. Metal often exhibits severe specular reflection and can cause strong multipath distortion. In the FPP result of Fig. 10(a), the pattern details are entirely overwhelmed by the multipath interference, whereas in the SEpi-3D result of Fig. 10(a) the details are well preserved. Consequently, no hole or corner can be identified in the FPP depth map and point cloud of Fig. 10(b)(c), while the holes and corners are visible in the SEpi-3D results of Fig. 10(b)(c). However, part of the lower surface is missing, probably because the intensity of the direct reflection from the metal surface is too low; in other words, the light originally received at this part may consist mainly of indirect light.

As shown in Fig. 8(c), the bowl is ceramic and has a specular surface. Unlike the tray and the metal part, the bowl has a spherical shape that can lead to multipath interference in all directions. In Fig. 11(a), the multipath interference occurs inside the bowl and on the concentric circles of the bowl. Comparing the FPP and SEpi-3D results in Fig. 11(b)(c), most of the distortion caused by multipath interference is eliminated by the SEpi-3D method. However, some distortions near the horizontal line remain because the epipolar constraint cannot distinguish multipath on the same epipolar line.

A numerical experiment is conducted to quantify the effectiveness of the SEpi-3D method. The event camera first captures the original scene, and then the surface is sprayed with white paint. The influence of multipath interference is suppressed by changing the surface to a diffuse reflection surface, as shown in Fig. 12. The same event camera then captures the depth map of the painted scene from the same viewpoint to avoid comparison errors. This depth map, calculated by averaging 10 repeated captures, is adopted as the ground truth. The depth maps reconstructed by the FPP and SEpi-3D methods are then compared with this ground truth. As shown in Fig. 13, (a)-(c) are the difference maps of the depth captured with the FPP and the SEpi-3D methods. The RMSE and the percentage of pixels with errors larger than 5 mm for the FPP and SEpi-3D methods are shown in Table 1. The result shows that the RMSE decreases by 6.55 mm on average in the tested multipath scenes, and the percentage of error points decreases by 7.04%.
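For reference, the two metrics reported in Table 1 can be computed from a reconstructed depth map and the ground truth as follows (a sketch; the 5 mm threshold follows the text, and the masking of invalid pixels is an assumption of ours).

```python
import numpy as np

def depth_errors(depth, depth_gt, threshold_mm=5.0):
    """RMSE and percentage of pixels with |error| > threshold against the ground truth."""
    valid = np.isfinite(depth) & np.isfinite(depth_gt)      # ignore pixels without a measurement
    err = depth[valid] - depth_gt[valid]
    rmse = np.sqrt(np.mean(err ** 2))
    pct_bad = 100.0 * np.mean(np.abs(err) > threshold_mm)
    return rmse, pct_bad
```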

Fig. 12. Method for obtaining the reference depth map. (a) Without paint; (b) With paint. The FPP and SEpi-3D depth map are captured in the event camera without the white paint where multipath interference occurs. The ground truth is captured by the same event camera in the same position with the white paint and calculated with the FPP method, where multipath interference does not occur due to the diffuse surface.

Fig. 13. Difference map between ground truth and depth map. (a) The tray scene; (b) The metal scene; (c) The bowl scene.


Table 1. RMSE and percentage of errors of the multipath scenes.

4. Conclusions

In this paper, we construct an event-camera soft epipolar scanning system that can resist multipath interference with dozens of projection patterns and without a rigid hardware setup to meet the epipolar constraint. This is achieved by combining software-wise epipolar rectification with the temporal information of the event stream. The experiments verify the effectiveness of the system against multipath interference. In future work, we will improve the physical alignment between the galvo mirror and the event camera with a customized scanning waveform to further narrow the ambiguous range. The noise level of our current setup is relatively high due to the discrete event output and event triggering noise. Such noise could be reduced by using a grayscale event camera with high temporal resolution. Also, with higher temporal resolution, events in the same row could be distinguished by their temporal information; combined with the disparity constraint, it would then be possible to eliminate multipath on the same epipolar line and further improve the effectiveness of multipath elimination.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. O’Toole, J. Mather, and K. N. Kutulakos, “3d shape and indirect appearance by structured light transport,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2014), pp. 3246–3253.

2. H. Jiang, H. Zhai, Y. Xu, X. Li, and H. Zhao, “3d shape measurement of translucent objects based on fourier single-pixel imaging in projector-camera system,” Opt. Express 27(23), 33564–33574 (2019). [CrossRef]  

3. H. Jiang, S. Zhu, H. Zhao, B. Xu, and X. Li, “Adaptive regional single-pixel imaging based on the fourier slice theorem,” Opt. Express 25(13), 15118–15130 (2017). [CrossRef]  

4. H. Jiang, Y. Li, H. Zhao, X. Li, and Y. Xu, “Parallel single-pixel imaging: A general method for direct–global separation and 3d shape reconstruction under strong global illumination,” Int. J. Comput. Vis. 129(4), 1060–1086 (2021). [CrossRef]  

5. H. Jiang, Y. Zhou, and H. Zhao, “Using adaptive regional projection to measure parts with strong reflection,” in AOPC 2017: 3D Measurement Technology for Intelligent Manufacturing, vol. 10458 (SPIE, 2017), pp. 345–350.

6. H. Zhao, Y. Xu, H. Jiang, and X. Li, “3d shape measurement in the presence of strong interreflections by epipolar imaging and regional fringe projection,” Opt. Express 26(6), 7117–7131 (2018). [CrossRef]  

7. Y. Xu, H. Zhao, H. Jiang, Y. Wang, and X. Li, “3d shape measurement in the presence of interreflections by light stripe triangulation with additional geometric constraints,” in Optical Measurement Systems for Industrial Inspection XI, vol. 11056 (SPIE, 2019), pp. 927–932.

8. M. O’Toole, S. Achar, S. G. Narasimhan, and K. N. Kutulakos, “Homogeneous codes for energy-efficient illumination and imaging,” ACM Trans. Graph. 34(4), 1–13 (2015). [CrossRef]  

9. M. O’Toole, D. B. Lindell, and G. Wetzstein, “Confocal non-line-of-sight imaging based on the light-cone transform,” Nature 555(7696), 338–341 (2018). [CrossRef]  

10. S. Hernandez-Marin, A. M. Wallace, and G. J. Gibson, “Bayesian analysis of lidar signals with multiple returns,” IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2170–2180 (2007). [CrossRef]  

11. M. O’Toole, F. Heide, D. B. Lindell, K. Zang, S. Diamond, and G. Wetzstein, “Reconstructing transient images from single-photon sensors,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2017), pp. 1539–1547.

12. G. Wang, C. Feng, X. Hu, and H. Yang, “Temporal matrices mapping-based calibration method for event-driven structured light systems,” IEEE Sens. J. 21(2), 1799–1808 (2020). [CrossRef]  

13. X. Huang, Y. Zhang, and Z. Xiong, “High-speed structured light based 3d scanning using an event camera,” Opt. Express 29(22), 35864–35876 (2021). [CrossRef]  

14. M. Muglikar, G. Gallego, and D. Scaramuzza, “Esl: Event-based structured light,” in 2021 International Conference on 3D Vision (3DV), (IEEE, 2021), pp. 1165–1174.

15. N. Matsuda, O. Cossairt, and M. Gupta, “Mc3d: Motion contrast 3d scanning,” in 2015 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2015), pp. 1–10.

16. C. Zuo, S. Feng, L. Huang, T. Tao, W. Yin, and Q. Chen, “Phase shifting algorithms for fringe projection profilometry: A review,” Opt. Lasers Eng. 109, 23–59 (2018). [CrossRef]  

17. Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330–1334 (2000). [CrossRef]  

18. L. Ekstrand and S. Zhang, “Three-dimensional profilometry with nearly focused binary phase-shifting algorithms,” Opt. Lett. 36(23), 4518–4520 (2011). [CrossRef]  
