Article

Visual EKF-SLAM from Heterogeneous Landmarks †

by Jorge Othón Esparza-Jiménez 1,*, Michel Devy 2,‡ and José L. Gordillo 1,‡

1 Center for Robotics and Intelligent Systems, Tecnológico de Monterrey, Monterrey 64849, Mexico
2 CNRS, LAAS, Université de Toulouse, 7 Avenue du Colonel Roche, Toulouse F-31400, France
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published in Esparza-Jimenez, J.O.; Devy, M.; Gordillo, J.L. EKF-based SLAM fusing heterogeneous landmarks. In Proceedings of the IEEE 2014 17th International Conference on Information Fusion (FUSION), Salamanca, Spain, 7–10 July 2014; pp. 1–8.
‡ These authors contributed equally to this paper.
Sensors 2016, 16(4), 489; https://doi.org/10.3390/s16040489
Submission received: 31 December 2015 / Revised: 24 March 2016 / Accepted: 30 March 2016 / Published: 7 April 2016
(This article belongs to the Special Issue Sensors for Robots)

Abstract

Many applications require the localization of a moving object, e.g., a robot, using sensory data acquired from embedded devices. Simultaneous localization and mapping (SLAM) from vision performs both the spatial and temporal fusion of these data on a map while a camera moves in an unknown environment. Such a SLAM process executes two interleaved functions: the front-end detects and tracks features from images, while the back-end interprets features as landmark observations and estimates both the landmarks and the robot positions with respect to a selected reference frame. This paper describes a complete visual SLAM solution that combines point and line landmarks on a single map. The proposed method has an impact on both the back-end and the front-end. The contributions comprise the use of heterogeneous landmark-based EKF-SLAM (the management of a map composed of both point and line landmarks) and, from this perspective, the comparison between landmark parametrizations and the evaluation of how heterogeneity improves the accuracy of the camera localization, the development of a front-end active-search process for linear landmarks integrated into SLAM and the experimentation methodology.


1. Introduction

Simultaneous localization and mapping (SLAM) is an essential functionality required on a moving object for many applications where the localization or the motion estimation of this object must be determined from sensory data acquired by embedded sensors. The object is typically a robot or a vehicle, the position of which is required to deal with robust navigation in a cluttered environment. A SLAM module could also be required on smart tools (phones, glasses) to offer new services, e.g., augmented reality [1,2,3].
The robot or smart tool could be equipped with a global navigation satellite system (GNSS) receiver for outdoor applications to directly obtain a position with respect to the Earth reference frame [4]; at present, indoor localization with respect to a building reference frame could also be provided using ultra-wide band (UWB) [5], WiFi [6] or RF devices [7], on the condition that a hotspot or antenna network has been previously installed and calibrated. However, direct localization is not always available (e.g., occlusions, bad propagation, multiple paths); so generally, these measurements are combined, using loose or tight fusion strategies, with motion estimates provided by an inertial measurement unit (IMU), integrating successive accelerometer and gyro data [8,9,10]. Nevertheless, even GPS-IMU fusion could fail or be too inaccurate. Depending on the context, a priori knowledge could be exploited; a map matching function can be sufficient, as in the GPS-based navigation systems available on commercial vehicles.
Considering mobile robots or the emerging autonomous vehicles, it is necessary to also make use of data acquired on the environment with embedded exteroceptive sensors, e.g., laser range finders [11], 3D sensors (ToF cameras [12], Kinect [13]) or vision with many possible modalities (mono, stereo, omni). Here, only visual SLAM is considered, due to the fact that it could be integrated both in low-cost unmanned ground and aerial vehicles and on smart tools equipped with cameras. Many visual SLAM methods have been proposed during the last decade [3,14].
A SLAM method combines two interleaved functionalities shown in Figure 1: the front-end detects and tracks features from images acquired from the moving robot, while the back-end, interpreting these feature and landmark observations, estimates both the landmark and robot positions with respect to the selected reference frame.
The back-end can be based either on estimation (Kalman [15], information [16], particle filters [17]) or optimization (bundle adjustment) frameworks [18]. The most classic landmarks are 3D points, detected as interest points (SIFT [19], SURF [20], FAST [21]), matched by using their descriptors (binary robust independent elementary features (BRIEF) [22]), or tracked by using the Kanade-Lucas-Tomasi (KLT) feature tracker [23] or an active-search strategy [24]. Generally, the set of 3D points extracted from an image does not give any semantic information, unlike 3D lines, which generally correspond to sharp 3D edges in the environment. This is the reason why segment-based SLAM, from either an estimation [25] or optimization [26] back-end, has been proposed. The main challenge of these methods concerns the front-end, i.e., the robustness of the line detection and tracking in successive images.
The initialization of such landmarks with their minimal Euclidean parameters requires more than one observation. One way to solve this problem was delayed initialization [3,27], in which a landmark was added to the map only once it was known in the Euclidean space. This does not allow the use of landmarks that are very far from the robot. An alternative solution is to add them to the map as soon as they are observed (i.e., undelayed initialization), which has been proposed for point [28,29] and line [25] landmarks. The pros and cons of several representations for 3D points and 3D lines have been analyzed in [30].
This article is devoted to the analysis of a visual SLAM solution using a heterogeneous map, a more complete approach in which points and lines are both included from features extracted in images acquired by a camera moving in the environment, with undelayed initialization. Therefore, the contributions of the proposed method comprise the use of heterogeneous landmark-based SLAM (the management of a map composed of heterogeneous landmarks) and, from this perspective, the comparison between landmark parametrizations, the development of a front-end active-search process for linear landmarks integrated into SLAM (the processing of linear landmarks by the inclusion of detection and tracking algorithms taken from the state of the art) and the experimentation methodology. These contributions correspond to the gray blocks in Figure 1. Only the first two contributions were covered in [31], which dealt with the information fusion back-end for the map computation. The present article extends this concept to its application on real images by developing the proposed front-end. Even if optimization-based methods make it possible to avoid the possible divergence of estimation-based methods due to the linearization of the observation model, the fusion here is performed with an extended Kalman filter, a very light approach that can be integrated into a dedicated architecture for use on small aerial vehicles.
Section 2 covers the techniques and parametrizations for the initialization and update of point and line landmarks on the map. Section 3 describes the detection and tracking methods selected and developed for the front-end. Section 4 focuses on experiments, including a simulation part that compares landmark representations, a real image part that recalls the integration of segments with the already existing point-based front-end and an integral experiment. The results of the experiments are presented and discussed in Section 5. Finally, Section 6 offers some conclusions.

2. SLAM Back-End

The SLAM back-end deals with the initialization of observed features as landmarks on the map and the estimation or update of both the landmark and robot positions with respect to a selected reference frame. This section deepens the description of each landmark parametrization and of the initialization and update algorithms; it derives from the theoretical study presented in [31].
The undelayed landmark initialization (ULI) is presented in [30] for different point and line parametrizations. The main idea is to model the uncertainty derived from unmeasured degrees of freedom by a Gaussian prior that covers unbounded limits, in a manner that can be handled by the EKF.
The implications of uncertainty are different for point and line landmarks. For points, depth is initially unknown, and uncertainty is present along the visual ray until infinity. In the case of infinite straight lines, uncertainty is present in two degrees of freedom, corresponding to a depth that should be covered up to infinity and all possible orientations.

2.1. 3D Point Parametrizations

In this section, the Euclidean point is described as a reference, followed by the parametrizations used for initialization purposes (i.e., the homogeneous point and the anchored homogeneous point). Each description includes camera projection and back-projection.
Point parametrizations used for initialization are shown in Figure 2.

2.1.1. Euclidean Point

A Euclidean point is parametrized by its Cartesian coordinates.
$$\mathcal{L}_{EP} = \mathbf{p} = \begin{pmatrix} x & y & z \end{pmatrix}^T \in \mathbb{R}^3$$
The projection to the camera frame is performed as follows:
$$\underline{\mathbf{u}} = \mathbf{K}\,\mathbf{R}^T(\mathbf{p} - \mathbf{T}) \in \mathbb{P}^2$$
where,
$$\mathbf{K} = \begin{bmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$
is the intrinsic parameter matrix, and $\mathbf{R}$ and $\mathbf{T}$ are the rotation matrix and translation vector that define the camera frame $\mathcal{C}$. Homogeneous coordinates are represented by underlined vectors, like $\underline{\mathbf{u}}$.
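For illustration, a minimal Python/NumPy sketch of this projection follows; the intrinsic parameters and camera pose are hypothetical values chosen only for the example.

```python
import numpy as np

def project_euclidean_point(p, K, R, T):
    """Project a 3D Euclidean point p into the image: u_h = K R^T (p - T).
    Returns the homogeneous pixel u_h and its dehomogenized form u."""
    u_h = K @ R.T @ (p - T)          # homogeneous image point (3-vector)
    u = u_h[:2] / u_h[2]             # dehomogenize to pixel coordinates
    return u_h, u

# Example with hypothetical intrinsics and pose
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])   # alpha_u, alpha_v, u0, v0
R = np.eye(3)                           # camera aligned with the world frame
T = np.zeros(3)                         # camera at the origin
p = np.array([0.5, 0.2, 4.0])           # 3D point in front of the camera
print(project_euclidean_point(p, K, R, T)[1])   # pixel coordinates
```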

2.1.2. Homogeneous Point

A homogeneous point is a four-vector composed of the 3D vector $\mathbf{m}$ and the scalar ρ, as introduced in [32].
$$\mathcal{L}_{HP} = \underline{\mathbf{p}} = \begin{pmatrix} \mathbf{m} \\ \rho \end{pmatrix} = \begin{pmatrix} m_x & m_y & m_z & \rho \end{pmatrix}^T \in \mathbb{P}^3 \subset \mathbb{R}^4$$
The vector m gives the direction from the origin O to the point p, while ρ serves as a scale factor for providing the magnitude for each coordinate of the point.
The conversion from homogeneous to Euclidean coordinates is given by the following equation:
$$\mathbf{p} = \frac{\mathbf{m}}{\rho}$$
Depending on the characteristics of the parameters $\mathbf{m}$ and ρ, there are three different canonical representations for a homogeneous point: the original Euclidean point corresponds to ρ = 1, inverse-depth has $m_z = 1$ and inverse-distance has $\|\mathbf{m}\| = 1$.
In the camera frame, m is the director vector of the optical ray, and ρ has a linear dependence on the inverse of the distance d defined from the optical center to the point:
$$\rho = \frac{\|\mathbf{m}\|}{d}$$
The unbounded distance of a point along the optical ray, from zero to infinity, can then be expressed in the bounded interval $\rho \in \left[0, \|\mathbf{m}\|/d_{min}\right]$ in parameter space.
The frame transformation of a homogeneous point is performed according to the next equation:
$$\underline{\mathbf{p}} = \mathbf{H}\,\underline{\mathbf{p}}^{\mathcal{C}} = \begin{bmatrix} \mathbf{R} & \mathbf{T} \\ \mathbf{0} & 1 \end{bmatrix} \underline{\mathbf{p}}^{\mathcal{C}}$$
where super-index C indicates the frame to which the point is referred and matrix H specifies the frame to which the point is transformed.
The projection of a homogeneous point into the image frame is performed with the following equation:
$$\underline{\mathbf{u}} = \mathbf{K}\,\mathbf{R}^T(\mathbf{m} - \mathbf{T}\rho) \in \mathbb{P}^2$$
By expressing a homogeneous point in the camera frame, the projected image point is $\underline{\mathbf{u}} = \mathbf{K}\,\mathbf{m}^{\mathcal{C}}$, where the super-index $\mathcal{C}$ indicates the frame in which the point is expressed. In this case, $\rho^{\mathcal{C}}$ is not measurable. Back-projection is then:
$$\mathbf{m}^{\mathcal{C}} = \mathbf{K}^{-1}\underline{\mathbf{u}}$$
The complete homogeneous point parametrization is the following:
$$\mathcal{L}_{HP} = \underline{\mathbf{p}} = \begin{pmatrix} \mathbf{m} \\ \rho \end{pmatrix} = \mathbf{H} \begin{pmatrix} \mathbf{K}^{-1}\underline{\mathbf{u}} \\ \rho^{\mathcal{C}} \end{pmatrix} = \begin{pmatrix} \mathbf{R}\,\mathbf{K}^{-1}\underline{\mathbf{u}} + \mathbf{T}\rho^{\mathcal{C}} \\ \rho^{\mathcal{C}} \end{pmatrix}$$
where $\rho^{\mathcal{C}}$ must be given as a prior and represents the inverse-distance from the origin of coordinates, that is, the scalar value that makes $\|\mathbf{m}\| = 1$.

2.1.3. Anchored Homogeneous Point

Linearity is supposed to be improved by the addition of an anchor that serves as a reference to the optical center at the initialization time of the landmark. The landmark is then composed of seven elements that include Cartesian coordinates of the anchor, the point with respect to the anchor and an inverse-distance scalar.
$$\mathcal{L}_{AHP} = \begin{pmatrix} \mathbf{p}_0 \\ \mathbf{m} \\ \rho \end{pmatrix} = \begin{pmatrix} x_0 & y_0 & z_0 & m_x & m_y & m_z & \rho \end{pmatrix}^T \in \mathbb{R}^7$$
The conversion from the anchored homogeneous point to Euclidean coordinates can be obtained by the following equation:
$$\mathbf{p} = \mathbf{p}_0 + \frac{\mathbf{m}}{\rho}$$
The projection and frame transformation process is given below:
$$\underline{\mathbf{u}} = \mathbf{K}\,\mathbf{R}^T\left(\mathbf{m} - (\mathbf{T} - \mathbf{p}_0)\rho\right) \in \mathbb{P}^2$$
The anchor is chosen to be the position of the optical center at the initialization time, given by $\mathbf{T}$. That way, the term multiplying the unmeasured degree of freedom ρ (i.e., $(\mathbf{T} - \mathbf{p}_0)\rho$) is small after initialization. This helps to decouple the uncertainty of the most uncertain parameter ρ. The complete anchored homogeneous point parametrization for back-projection and transformation is the following:
$$\mathcal{L}_{AHP} = \begin{pmatrix} \mathbf{p}_0 \\ \mathbf{m} \\ \rho \end{pmatrix} = \begin{pmatrix} \mathbf{T} \\ \mathbf{R}\,\mathbf{K}^{-1}\underline{\mathbf{u}} \\ \rho^{\mathcal{C}} \end{pmatrix}$$
where $\rho^{\mathcal{C}}$ must be given as the prior.
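A minimal Python/NumPy sketch of this anchored homogeneous point initialization, and of the conversion back to Euclidean coordinates, is given below; the inverse-distance prior is passed in as an assumed value.

```python
import numpy as np

def init_ahp(u_pix, rho_prior, K, R, T):
    """Build an anchored homogeneous point L_AHP = (p0, m, rho) from a pixel.
    p0 is the current optical center, m = R K^{-1} u_h, rho is the prior."""
    u_h = np.array([u_pix[0], u_pix[1], 1.0])     # homogeneous pixel
    p0 = T                                        # anchor: optical center at init time
    m = R @ np.linalg.inv(K) @ u_h                # ray direction in the world frame
    return np.concatenate([p0, m, [rho_prior]])   # 7-vector

def ahp_to_euclidean(L):
    """Recover the Euclidean point: p = p0 + m / rho."""
    p0, m, rho = L[:3], L[3:6], L[6]
    return p0 + m / rho
```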

2.2. 3D Line Parametrizations

The line parametrization includes the projection to the image frame and back-projection to 3D.
The Plücker line and the anchored homogeneous-points line are shown in Figure 3.

2.2.1. Plücker Line

Plücker coordinates form a six-vector that represents a line in $\mathbb{P}^3$ defined by two homogeneous points $\underline{\mathbf{a}} = (\mathbf{a}^T\ a)^T$ and $\underline{\mathbf{b}} = (\mathbf{b}^T\ b)^T$:
$$\mathcal{L}_{PL} = \begin{pmatrix} \mathbf{n} \\ \mathbf{v} \end{pmatrix} = \begin{pmatrix} n_x & n_y & n_z & v_x & v_y & v_z \end{pmatrix}^T \in \mathbb{P}^5 \subset \mathbb{R}^6,$$
where $\mathbf{n} = \mathbf{a} \times \mathbf{b}$, $\mathbf{v} = a\,\mathbf{b} - b\,\mathbf{a}$, $\mathbf{n}, \mathbf{v} \in \mathbb{R}^3$, with the Plücker constraint $\mathbf{n}^T\mathbf{v} = 0$.
In terms of geometry, $\mathbf{n}$ is the vector normal to the plane π containing the line and the origin, and $\mathbf{v}$ is the director vector from $\underline{\mathbf{a}}$ to $\underline{\mathbf{b}}$. The Euclidean orthogonal distance from the line to the origin is given by $\|\mathbf{n}\|/\|\mathbf{v}\|$. Hence, $\|\mathbf{v}\|$ is the inverse-distance, analogous to ρ of homogeneous points. The Plücker line geometrical representation is shown in Figure 3a.
Expressions for transformation and inverse-transformation of Plücker coordinates from and to the camera frame are as shown next:
$$\mathcal{L}_{PL} = \mathcal{H} \cdot \mathcal{L}_{PL}^{\mathcal{C}} = \begin{bmatrix} \mathbf{R} & [\mathbf{T}]_\times \mathbf{R} \\ \mathbf{0} & \mathbf{R} \end{bmatrix} \begin{pmatrix} \mathbf{n}^{\mathcal{C}} \\ \mathbf{v}^{\mathcal{C}} \end{pmatrix}$$
$$\mathcal{L}_{PL}^{\mathcal{C}} = \mathcal{H}^{-1} \cdot \mathcal{L}_{PL} = \begin{bmatrix} \mathbf{R}^T & -\mathbf{R}^T [\mathbf{T}]_\times \\ \mathbf{0} & \mathbf{R}^T \end{bmatrix} \begin{pmatrix} \mathbf{n} \\ \mathbf{v} \end{pmatrix}$$
The transformation and projection process in terms of R , T , n and v is as follows:
$$\mathbf{l} = \mathcal{K} \cdot \mathbf{R}^T\left(\mathbf{n} - [\mathbf{T}]_\times \mathbf{v}\right)$$
where the intrinsic projection Plücker matrix $\mathcal{K}$ is defined as:
$$\mathcal{K} = \begin{bmatrix} \alpha_v & 0 & 0 \\ 0 & \alpha_u & 0 \\ -\alpha_v u_0 & -\alpha_u v_0 & \alpha_u \alpha_v \end{bmatrix}$$
When Plücker coordinates are expressed in the camera frame, projection is obtained by:
$$\mathbf{l} = \mathcal{K} \cdot \mathbf{n}^{\mathcal{C}}$$
The range and orientation of the line are included in $\mathbf{v}^{\mathcal{C}}$ and are not measurable.
For Plücker line back-projection, the vectors $\mathbf{n}^{\mathcal{C}}$ and $\mathbf{v}^{\mathcal{C}}$ are obtained with these expressions:
$$\mathbf{n}^{\mathcal{C}} = \mathcal{K}^{-1} \cdot \mathbf{l}, \qquad \mathbf{v}^{\mathcal{C}} = \beta_1 \mathbf{e}_1 + \beta_2 \mathbf{e}_2$$
where $\beta_1, \beta_2 \in \mathbb{R}$ and $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{n}^{\mathcal{C}}$ are mutually orthogonal.
Defining $\boldsymbol{\beta} = (\beta_1, \beta_2)^T \in \mathbb{R}^2$ and $\mathbf{E} = [\mathbf{e}_1\ \mathbf{e}_2]$, the vector $\mathbf{v}^{\mathcal{C}}$ can also be expressed as:
$$\mathbf{v}^{\mathcal{C}} = \mathbf{E} \cdot \boldsymbol{\beta},$$
where $\mathbf{v}^{\mathcal{C}} \in \pi^{\mathcal{C}}$ for any value of $\boldsymbol{\beta}$. Plücker line back-projection is shown in Figure 3b.
The complete Plücker line parametrization for back projection and transformation is given in the following equation:
$$\mathcal{L}_{PL} = \mathcal{H} \begin{pmatrix} \mathbf{n}^{\mathcal{C}} \\ \mathbf{v}^{\mathcal{C}} \end{pmatrix} = \mathcal{H} \begin{pmatrix} \mathcal{K}^{-1}\mathbf{l} \\ \mathbf{E}\boldsymbol{\beta} \end{pmatrix} = \begin{pmatrix} \mathbf{R}\,\mathcal{K}^{-1}\mathbf{l} + [\mathbf{T}]_\times \mathbf{R}\,\mathbf{E}\boldsymbol{\beta} \\ \mathbf{R}\,\mathbf{E}\boldsymbol{\beta} \end{pmatrix}$$
where $\boldsymbol{\beta}$ must be provided as a prior.
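The back-projection above can be sketched as follows; the construction of the basis $\mathbf{E}$ is one possible choice satisfying the orthogonality conditions and is not necessarily the one used by the authors.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def plucker_intrinsic(K):
    """Plücker intrinsic matrix built from the pinhole intrinsics."""
    au, av = K[0, 0], K[1, 1]
    u0, v0 = K[0, 2], K[1, 2]
    return np.array([[av, 0.0, 0.0],
                     [0.0, au, 0.0],
                     [-av * u0, -au * v0, au * av]])

def backproject_plucker(l, beta, K, R, T):
    """Back-project an image line l into a Plücker line (n, v) in the world frame."""
    Kp = plucker_intrinsic(K)
    n_c = np.linalg.inv(Kp) @ l                   # plane normal in the camera frame
    # Build an orthonormal basis (e1, e2) of the plane orthogonal to n_c
    e1 = np.cross(n_c, np.array([1.0, 0.0, 0.0]))
    if np.linalg.norm(e1) < 1e-9:                 # n_c parallel to the x-axis
        e1 = np.cross(n_c, np.array([0.0, 1.0, 0.0]))
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(n_c, e1)
    e2 /= np.linalg.norm(e2)
    v_c = beta[0] * e1 + beta[1] * e2             # director vector from the prior beta
    # Frame transformation: n = R n_c + [T]_x R v_c,  v = R v_c
    n = R @ n_c + skew(T) @ (R @ v_c)
    v = R @ v_c
    return np.concatenate([n, v])                 # 6-vector Plücker line
```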

2.2.2. Anchored Homogeneous Points Line

A line can also be represented by the end points defining it. With the application of the anchored homogeneous point parametrization, shown in Figure 3c, an anchored homogeneous-points line is an eleven-vector defined as follows:
$$\mathcal{L}_{AHPL} = \begin{pmatrix} \mathbf{p}_0 & \mathbf{m}_1 & \rho_1 & \mathbf{m}_2 & \rho_2 \end{pmatrix}^T \in \mathbb{R}^{11}$$
For each point, the transformation and projection of a pinhole camera is as previously stated in Equation (4).
A homogeneous 2D line is obtained by the cross product of two points lying on it, $\mathbf{l} = \underline{\mathbf{u}}_1 \times \underline{\mathbf{u}}_2$, giving:
$$\mathbf{l} = \mathcal{K}\,\mathbf{R}^T\left(\mathbf{m}_1 \times \mathbf{m}_2 - (\mathbf{T} - \mathbf{p}_0) \times (\rho_1 \mathbf{m}_2 - \rho_2 \mathbf{m}_1)\right)$$
In comparison to the result obtained for Plücker coordinates, the product $\mathbf{m}_1 \times \mathbf{m}_2$ is a vector orthogonal to the plane π, analogous to the Plücker sub-vector $\mathbf{n}$. Furthermore, the term $\rho_1\mathbf{m}_2 - \rho_2\mathbf{m}_1$ is a vector that gives the direction between the points of the line, therefore related to the Plücker sub-vector $\mathbf{v}$.
The complete anchored homogeneous point line parametrization for back projection and transformation is the following:
$$\mathcal{L}_{AHPL} = \begin{pmatrix} \mathbf{p}_0 \\ \mathbf{m}_1 \\ \rho_1 \\ \mathbf{m}_2 \\ \rho_2 \end{pmatrix} = \begin{pmatrix} \mathbf{T} \\ \mathbf{R}\,\mathbf{K}^{-1}\underline{\mathbf{u}}_1 \\ \rho_1^{\mathcal{C}} \\ \mathbf{R}\,\mathbf{K}^{-1}\underline{\mathbf{u}}_2 \\ \rho_2^{\mathcal{C}} \end{pmatrix}$$
where $\rho_1^{\mathcal{C}}$ and $\rho_2^{\mathcal{C}}$ must be given as priors.
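A corresponding sketch for building the eleven-vector from two detected end-points, with hypothetical inverse-distance priors:

```python
import numpy as np

def init_ahpl(u1_pix, u2_pix, rho1_prior, rho2_prior, K, R, T):
    """Build the 11-vector L_AHPL = (p0, m1, rho1, m2, rho2) from two end-points."""
    Kinv = np.linalg.inv(K)
    p0 = T                                              # shared anchor: optical center
    m1 = R @ Kinv @ np.array([u1_pix[0], u1_pix[1], 1.0])
    m2 = R @ Kinv @ np.array([u2_pix[0], u2_pix[1], 1.0])
    return np.concatenate([p0, m1, [rho1_prior], m2, [rho2_prior]])
```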

2.3. Landmark Initialization

The process of the initialization of a landmark consists of the detection of a feature in the image, retro-projection to 3D and inclusion into the map. Three important concepts are involved in landmark initialization: the 3D landmark $\mathbf{x}$ itself, the 2D measurement $\mathbf{z}$ of the landmark in the image and the unmeasured degree of freedom $\boldsymbol{\pi}$. All of these are modeled as Gaussian variables, with notation $\mathcal{N}(\mu, \sigma^2)$. Thus, the cited concepts are expressed as $\mathbf{x} \sim \mathcal{N}(\bar{\mathbf{x}}, \mathbf{P})$, $\mathbf{z} \sim \mathcal{N}(\bar{\mathbf{z}}, \mathbf{R})$ and $\boldsymbol{\pi} \sim \mathcal{N}(\bar{\boldsymbol{\pi}}, \boldsymbol{\Pi})$, respectively. The 3D landmarks $\mathbf{x}$ considered for this study are points and lines, already described in Section 2.1 and Section 2.2. The parametrizations for landmark 2D measurements $\mathbf{z}$ and unmeasured degrees of freedom $\boldsymbol{\pi}$, as well as the description of the initialization algorithm, are covered in the following sections.

2.3.1. Landmark 2D Measurements in the Image

Points are represented as a two-vector containing Cartesian coordinates in pixel space, leading to the following:
$$\mathbf{u} = \begin{pmatrix} u \\ v \end{pmatrix} \sim \mathcal{N}(\bar{\mathbf{u}}, \mathbf{U}),$$
where U is the covariance matrix of the position of the point.
In homogeneous coordinates,
$$\underline{\mathbf{u}} = \begin{pmatrix} \mathbf{u} \\ 1 \end{pmatrix} \sim \mathcal{N}(\underline{\bar{\mathbf{u}}}, \underline{\mathbf{U}}) = \mathcal{N}\!\left( \begin{pmatrix} \bar{\mathbf{u}} \\ 1 \end{pmatrix}, \begin{bmatrix} \mathbf{U} & \mathbf{0} \\ \mathbf{0} & 0 \end{bmatrix} \right)$$
Lines can be expressed by a four-vector that represents the coordinates of their end-points, also with a Gaussian probability density function.
$$\mathbf{s} = \begin{pmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \end{pmatrix} \sim \mathcal{N}(\bar{\mathbf{s}}, \mathbf{S}) = \mathcal{N}\!\left( \begin{pmatrix} \bar{\mathbf{u}}_1 \\ \bar{\mathbf{u}}_2 \end{pmatrix}, \begin{bmatrix} \mathbf{U} & \mathbf{0} \\ \mathbf{0} & \mathbf{U} \end{bmatrix} \right)$$
The probability density function for infinite lines like Plücker, $\mathcal{N}(\bar{\mathbf{l}}, \mathbf{L})$, is composed of the homogeneous line representation and the covariance matrix defined as follows:
$$\bar{\mathbf{l}} = \underline{\bar{\mathbf{u}}}_1 \times \underline{\bar{\mathbf{u}}}_2, \qquad \mathbf{L} = [\underline{\bar{\mathbf{u}}}_1]_\times\, \underline{\mathbf{U}}\, [\underline{\bar{\mathbf{u}}}_1]_\times^T + [\underline{\bar{\mathbf{u}}}_2]_\times\, \underline{\mathbf{U}}\, [\underline{\bar{\mathbf{u}}}_2]_\times^T$$
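This propagation can be sketched as follows, using the skew-symmetric (cross-product) matrices as Jacobians of the cross product.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def line_pdf_from_endpoints(u1, u2, U):
    """Mean and covariance of the homogeneous line l = u1_h x u2_h,
    given end-points u1, u2 with a common 2x2 pixel covariance U."""
    u1_h = np.append(u1, 1.0)
    u2_h = np.append(u2, 1.0)
    U_h = np.zeros((3, 3))
    U_h[:2, :2] = U                               # homogeneous covariance (rank 2)
    l_bar = np.cross(u1_h, u2_h)
    L = skew(u1_h) @ U_h @ skew(u1_h).T + skew(u2_h) @ U_h @ skew(u2_h).T
    return l_bar, L
```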

2.3.2. Unmeasured Degrees of Freedom

The uncertainty in 3D points and lines coming from projection is represented by the inverse-distance variables $\rho^{\mathcal{C}}$ and $\boldsymbol{\beta}^{\mathcal{C}}$, which are modeled as Gaussian variables. The origin of each of these priors must lie inside the 2σ region of its probability density function.
For points and end-point-based lines, the minimum distance must match the upper 2σ boundary, hence:
$$\bar{\rho} - n\sigma_\rho = 0, \quad 0 \le n < 2, \qquad \bar{\rho} + 2\sigma_\rho = 1/d_{min}$$
Then, n = 1 leads to:
$$\bar{\rho} = \frac{1}{3\,d_{min}} \quad \text{and} \quad \sigma_\rho = \frac{1}{3\,d_{min}}$$
The probability density function of the prior for a point-based line is defined as $\mathbf{t}^{\mathcal{C}} \sim \mathcal{N}(\bar{\mathbf{t}}, \mathbf{T})$, with:
$$\bar{\mathbf{t}} = \begin{pmatrix} \bar{\rho} \\ \bar{\rho} \end{pmatrix}, \qquad \mathbf{T} = \begin{bmatrix} \sigma_\rho^2 & 0 \\ 0 & \sigma_\rho^2 \end{bmatrix}$$
The Plücker line prior $\boldsymbol{\beta}^{\mathcal{C}} \sim \mathcal{N}(\bar{\boldsymbol{\beta}}, \mathbf{B})$ takes the following values:
$$\bar{\boldsymbol{\beta}} = \begin{pmatrix} 1/(3\,d_{min}) \\ 0 \end{pmatrix}, \qquad \mathbf{B} = \begin{bmatrix} \left(1/(3\,d_{min})\right)^2 & 0 \\ 0 & \left(1/(2\,d_{min})\right)^2 \end{bmatrix}$$
This initializes lines in front of the camera.
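For concreteness, a small sketch that builds these priors from a chosen minimum distance; the numerical value of d_min below is only an example.

```python
import numpy as np

def point_inverse_distance_prior(d_min):
    """Gaussian prior on rho: mean and sigma both equal to 1/(3 d_min)."""
    rho_bar = 1.0 / (3.0 * d_min)
    sigma_rho = 1.0 / (3.0 * d_min)
    return rho_bar, sigma_rho

def ahpl_prior(d_min):
    """Prior t^C ~ N(t_bar, T) for the two end-point inverse distances."""
    rho_bar, sigma_rho = point_inverse_distance_prior(d_min)
    return np.array([rho_bar, rho_bar]), np.diag([sigma_rho**2, sigma_rho**2])

def plucker_beta_prior(d_min):
    """Prior beta^C ~ N(beta_bar, B) that initializes lines in front of the camera."""
    beta_bar = np.array([1.0 / (3.0 * d_min), 0.0])
    B = np.diag([(1.0 / (3.0 * d_min))**2, (1.0 / (2.0 * d_min))**2])
    return beta_bar, B

print(point_inverse_distance_prior(1.5))   # e.g. d_min = 1.5 m -> (0.222, 0.222)
```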

2.3.3. Undelayed Landmark Initialization Algorithm

The ULI algorithm was presented in [30] for the construction of landmark-based stochastic maps, including a single type of landmark $\mathcal{L}$. The approach presented in this paper includes heterogeneous parametrizations of landmarks $\mathcal{L}^*$ on the same map, where $\mathcal{L}^*$ can be a point or a line. The resulting algorithm is composed of the following steps:
  • Identify the mapped magnitudes $\mathbf{x} \sim \mathcal{N}(\bar{\mathbf{x}}, \mathbf{P})$.
  • Identify the measurements $\mathbf{z} \sim \mathcal{N}(\bar{\mathbf{z}}, \mathbf{R})$, where $\mathbf{z}$ is either a point or a line (i.e., $\mathbf{u}$ or $\mathbf{s}$, respectively).
  • Define a Gaussian prior $\boldsymbol{\pi} \sim \mathcal{N}(\bar{\boldsymbol{\pi}}, \boldsymbol{\Pi})$ for the unmeasured degree of freedom. $\boldsymbol{\pi}$ can be $\rho^{\mathcal{C}}$, $\mathbf{t}^{\mathcal{C}}$ or $\boldsymbol{\beta}^{\mathcal{C}}$.
  • Back-project the Gaussian measurement, and get the landmark mean and Jacobians:
    $$\bar{\mathcal{L}^*} = g(\bar{\mathcal{C}}, \bar{\mathbf{z}}, \bar{\boldsymbol{\pi}}), \qquad \mathbf{G}_{\mathcal{C}} = \left.\frac{\partial g}{\partial \mathcal{C}}\right|_{\bar{\mathcal{C}}, \bar{\mathbf{z}}, \bar{\boldsymbol{\pi}}}, \quad \mathbf{G}_{\mathbf{z}} = \left.\frac{\partial g}{\partial \mathbf{z}}\right|_{\bar{\mathcal{C}}, \bar{\mathbf{z}}, \bar{\boldsymbol{\pi}}}, \quad \mathbf{G}_{\boldsymbol{\pi}} = \left.\frac{\partial g}{\partial \boldsymbol{\pi}}\right|_{\bar{\mathcal{C}}, \bar{\mathbf{z}}, \bar{\boldsymbol{\pi}}}$$
    where g() is the back-projection and transformation function for the corresponding landmark, and $\mathcal{C} = (\mathbf{T}, \mathbf{Q})$ is the camera frame expressed in terms of its position $\mathbf{T}$ and orientation $\mathbf{Q}$ in quaternion nomenclature.
  • Compute the landmark's co- and cross-variances:
    $$\mathbf{P}_{\mathcal{L}^*\mathcal{L}^*} = \mathbf{G}_{\mathcal{C}} \mathbf{P}_{\mathcal{C}\mathcal{C}} \mathbf{G}_{\mathcal{C}}^T + \mathbf{G}_{\mathbf{z}} \mathbf{R} \mathbf{G}_{\mathbf{z}}^T + \mathbf{G}_{\boldsymbol{\pi}} \boldsymbol{\Pi} \mathbf{G}_{\boldsymbol{\pi}}^T, \qquad \mathbf{P}_{\mathcal{L}^*\mathbf{x}} = \mathbf{G}_{\mathcal{C}} \mathbf{P}_{\mathcal{C}\mathbf{x}} = \mathbf{G}_{\mathcal{C}} \begin{bmatrix} \mathbf{P}_{\mathcal{C}\mathcal{C}} & \mathbf{P}_{\mathcal{C}\mathbf{M}} \end{bmatrix}$$
  • Augment the SLAM map (a sketch of this step follows the list):
    $$\bar{\mathbf{x}} \leftarrow \begin{pmatrix} \bar{\mathbf{x}} \\ \bar{\mathcal{L}^*} \end{pmatrix}, \qquad \mathbf{P} \leftarrow \begin{bmatrix} \mathbf{P} & \mathbf{P}_{\mathcal{L}^*\mathbf{x}}^T \\ \mathbf{P}_{\mathcal{L}^*\mathbf{x}} & \mathbf{P}_{\mathcal{L}^*\mathcal{L}^*} \end{bmatrix}$$
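The following Python/NumPy sketch illustrates the last two steps under the assumption that the back-projection function g, its Jacobians evaluated at the means and the camera-state indices are supplied by the chosen landmark parametrization; the function names are hypothetical.

```python
import numpy as np

def uli_augment(x_bar, P, z_bar, R_z, pi_bar, Pi, g, G_C, G_z, G_pi, cam_idx):
    """Undelayed landmark initialization: back-project the measurement and
    augment the state mean and covariance with the new landmark block."""
    L_bar = g(x_bar[cam_idx], z_bar, pi_bar)             # landmark mean
    P_CC = P[np.ix_(cam_idx, cam_idx)]                   # camera covariance block
    P_Cx = P[cam_idx, :]                                  # camera/state cross-covariance
    # Landmark covariance and cross-covariance with the whole state
    P_LL = G_C @ P_CC @ G_C.T + G_z @ R_z @ G_z.T + G_pi @ Pi @ G_pi.T
    P_Lx = G_C @ P_Cx
    # Augment the mean and covariance
    x_new = np.concatenate([x_bar, L_bar])
    P_new = np.block([[P, P_Lx.T],
                      [P_Lx, P_LL]])
    return x_new, P_new
```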

2.4. Landmark Update

The purpose of the landmark update process is to recalculate the parameters of the elements on the map (i.e., the robot and landmark poses), given the observation of the already mapped landmarks in the current frame. This process starts by projecting all of the observable landmarks to the image plane and selecting those with higher uncertainty for correction. For points, the observation function h() applies a homogeneous-to-Euclidean transformation h2e() after the projection process previously explained has been performed, as follows:
$$\mathbf{z} = h2e(\underline{\mathbf{u}}) = \begin{pmatrix} u_1/u_3 \\ u_2/u_3 \end{pmatrix} \in \mathbb{R}^2$$
The innovation mean $\mathbf{y}$ and covariance $\mathbf{Y}$ are then obtained as shown next:
$$\mathbf{y} = \mathbf{z} - h(\bar{\mathbf{x}}), \qquad \mathbf{Y} = \mathbf{R} + \mathbf{H}\,\mathbf{P}\,\mathbf{H}^T,$$
where $\mathbf{R} = \mathbf{U}$ is the measurement noise covariance and $\mathbf{H} = \left.\frac{\partial h}{\partial \mathbf{x}}\right|_{\bar{\mathbf{x}}}$ is the Jacobian of the observation function.
For lines, the innovation function computes the orthogonal distances from the detected end-points u i to a line l, as shown in Figure 4, leading to the following:
$$\mathbf{z} = \begin{pmatrix} \mathbf{l}^T \underline{\mathbf{u}}_1 / \sqrt{l_1^2 + l_2^2} \\ \mathbf{l}^T \underline{\mathbf{u}}_2 / \sqrt{l_1^2 + l_2^2} \end{pmatrix} \in \mathbb{R}^2$$
Since the EKF innovation is the difference between the actual measurement and the expectation, and $\mathbf{z}$ is the orthogonal distance previously described, the line innovation function is:
$$\mathbf{y} = \mathbf{0} - \mathbf{z},$$
as the desired orthogonal distance from the predicted line to the matched end-points is zero.
A landmark is found consistent if the squared Mahalanobis distance $MD^2$ of the innovation is smaller than a threshold $MD^2_{th}$:
$$MD^2 = \mathbf{y}^T \mathbf{Y}^{-1} \mathbf{y} < MD^2_{th}$$
When this is true, the landmark is updated:
$$\text{Kalman gain:}\quad \mathbf{K} = \mathbf{P}\,\mathbf{H}^T\,\mathbf{Y}^{-1} \qquad \text{State update:}\quad \bar{\mathbf{x}} \leftarrow \bar{\mathbf{x}} + \mathbf{K}\,\mathbf{y} \qquad \text{Covariance update:}\quad \mathbf{P} \leftarrow \mathbf{P} - \mathbf{K}\,\mathbf{H}\,\mathbf{P}$$
Point and line parametrizations are modeled as Gaussian variables in [25,29,30], validating the use of Mahalanobis distance as compared to a chi-squared distribution. Kalman gain is assumed to be optimal. Since this process is intended to be developed as a light approach that could be integrated into a dedicated architecture on small vehicles, the selected covariance update formula is used instead of the Joseph form, which has such a high complexity that it may compromise performance. Successful results of this formulation are presented in [30].
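As an illustration, a condensed Python/NumPy sketch of this gated update for a single landmark is given below; the observation function h and its Jacobian H are assumed to be supplied by the projection models above, and the gating threshold is only an example value (e.g., a chi-squared quantile for a two-dimensional measurement).

```python
import numpy as np

def ekf_landmark_update(x_bar, P, z, R_z, h, H, md2_th=9.21):
    """Project the landmark, gate the innovation with the squared Mahalanobis
    distance, and apply the EKF correction if the observation is consistent."""
    y = z - h(x_bar)                        # innovation (for lines: 0 - z_dist)
    Y = R_z + H @ P @ H.T                   # innovation covariance
    md2 = float(y.T @ np.linalg.inv(Y) @ y)
    if md2 >= md2_th:                       # inconsistent observation: reject
        return x_bar, P, False
    K = P @ H.T @ np.linalg.inv(Y)          # Kalman gain
    x_bar = x_bar + K @ y                   # state update
    P = P - K @ H @ P                       # covariance update (simple form)
    return x_bar, P, True
```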

3. SLAM Front-End

To obtain the geometrical representation of landmarks given by the SLAM back-end, it is necessary to process the information coming from the sensors embedded in the moving agent (i.e., cameras mounted on the mobile robot). The front-end deals with the detection of new landmarks and the matching of already existent ones in subsequent images.
This section covers the image processing algorithms used for detecting and matching points and lines. Points have been widely studied and implemented as SLAM landmarks [3,24,33,34]. In the case of lines, a different front-end strategy was integrated for the detection and tracking of line segments.

3.1. Point Landmarks

A point landmark is modeled as an appearance descriptor composed of a patch of pixels around the point in the image. Once detected, the patch is used for the matching of the feature on incoming images.

3.1.1. Point Detection

An active-search approach [24,34] can ensure that the point landmarks are equally distributed in the image by dividing it into a number of equal regions, in which it is expected to have a landmark (Figure 5a). At each iteration, an empty region is randomly selected, and a corner point is chosen to be the strongest Harris point [35] (Figure 5b). This point is used for the landmark initialization, and its appearance and the current position and orientation of the camera C 0 = ( T 0 , R 0 ) are saved. The appearance of the point is given by the patch of pixels surrounding it, as seen in Figure 5c.

3.1.2. Point Matching

When there are point landmarks already mapped, the matching process searches for a point landmark x in the frame captured at camera pose C i . This point has been initialized in the frame captured with camera pose C 0 (Figure 6a,b).
The saved appearance patch of the landmark is warped by applying a homography transformation. This transformation takes into account the rotation and translation in the camera position and orientation, with respect to its pose when the landmark was detected. The transformed coordinates of pixel j of the patch at camera pose i (i.e., q i j ) is computed as follows:
$$\underline{\mathbf{q}}_{ij} = \mathbf{H}\,\underline{\mathbf{q}}_{0j}$$
where:
$$\mathbf{H} = \mathbf{K} \begin{bmatrix} \mathbf{R} & \mathbf{T} \\ \mathbf{0} & 1 \end{bmatrix} \mathbf{K}^{-1}$$
and $\underline{\mathbf{q}}_{0j}$ are the coordinates of pixel j of the original patch, where $\mathbf{R}$ and $\mathbf{T}$ are the rotation matrix and translation vector between camera pose $\mathcal{C}_0$ and camera pose $\mathcal{C}_i$. Once the patch is warped, it is cropped to a square in order to maintain the same dimensions as the original. This warping process is shown in Figure 6c.
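A sketch of the patch warping is given below. Since the exact homography used here depends on the full relative pose, the helper below uses the common rotation-only approximation H ≈ K R K⁻¹ (an assumption that is reasonable for distant patches or small translations, not necessarily the authors' exact formulation), together with a simple inverse-mapping warp.

```python
import numpy as np

def warp_patch(patch, H, out_size):
    """Warp a patch with homography H (nearest-neighbour, inverse mapping);
    the result is already cropped to out_size x out_size."""
    Hi = np.linalg.inv(H)
    out = np.zeros((out_size, out_size), dtype=patch.dtype)
    for v in range(out_size):
        for u in range(out_size):
            q = Hi @ np.array([u, v, 1.0])            # map back to the source patch
            x, y = q[0] / q[2], q[1] / q[2]
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < patch.shape[1] and 0 <= yi < patch.shape[0]:
                out[v, u] = patch[yi, xi]
    return out

def rotation_homography(K, R_rel):
    """Rotation-only approximation of the warping homography (an assumption)."""
    return K @ R_rel @ np.linalg.inv(K)
```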
The matching process performs the projection of point landmark x into the image at camera pose C i to get a point expected position u i , given the current pose i of the camera:
$$\mathbf{u}_i = h(\mathcal{C}_i, \mathbf{x})$$
The 2D covariance matrix $\mathbf{U}$ of this point is obtained from the 3D covariance matrix $\mathbf{P}_{\mathcal{L}\mathcal{L}}$ corresponding to landmark $\mathcal{L}$, as follows:
$$\mathbf{U} = \begin{bmatrix} \mathbf{U}_{RF} & \mathbf{U}_{SF} & \mathbf{U}_{\mathcal{L}} \end{bmatrix} \mathbf{P}_{\mathcal{L}\mathcal{L}} \begin{bmatrix} \mathbf{U}_{RF} & \mathbf{U}_{SF} & \mathbf{U}_{\mathcal{L}} \end{bmatrix}^T$$
where U RF , U SF and U L are the Jacobians of the projection u with respect to the robot frame, the sensor frame and the landmark, respectively.
Then, the zero mean normalized cross-correlation (ZNCC) test [36] is applied to the warped patch and a region of pixels surrounding the expected point in the image (Figure 6d). The rectangular search region is based on the projection mean u and the covariance ellipse U. The mean is the center of the search box, and the square roots of the diagonals of the covariance are the standard deviations, σ u and σ v . The search region goes ± 3 σ at each side of the center. If the ZNCC score is over a threshold, the point is said to be matched.
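The ZNCC score and the ±3σ window scan can be sketched as follows; the matching threshold is illustrative.

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalized cross-correlation between two equally-sized patches.
    Returns a score in [-1, 1]; values near 1 indicate a strong match."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def search_match(image, warped_patch, u_mean, sigma_u, sigma_v, threshold=0.9):
    """Scan a +/- 3-sigma window around the predicted point and keep the best ZNCC."""
    h, w = warped_patch.shape
    best_score, best_uv = -1.0, None
    u0, v0 = int(round(u_mean[0])), int(round(u_mean[1]))
    for dv in range(-int(3 * sigma_v), int(3 * sigma_v) + 1):
        for du in range(-int(3 * sigma_u), int(3 * sigma_u) + 1):
            u, v = u0 + du, v0 + dv
            win = image[v - h // 2: v - h // 2 + h, u - w // 2: u - w // 2 + w]
            if win.shape != warped_patch.shape:
                continue                        # window falls outside the image
            s = zncc(win, warped_patch)
            if s > best_score:
                best_score, best_uv = s, (u, v)
    return best_uv if best_score >= threshold else None
```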

3.2. Line Landmarks

Many methods have been proposed to extract lines in the image processing community, generally starting from the detection of intensity discontinuities (gradient, Laplacian, Canny filter). The first method, introduced in the 1980s, was the chaining method [37], based on the polygonal approximation of extracted contours. This method is efficient, but the result is too dependent on its parametrization (gradient threshold, contour thinning). This is why the Hough transform became so popular [38]; a recent variant, the kernel-based Hough transform [39], has been implemented and evaluated for line detection. However, this approach works with infinite lines rather than segments, which makes its performance less than optimal for the intended purposes. In [40], the Dseg algorithm is proposed, close to the chaining method, but using an iterative filtering approach to integrate a contour point in the processed segment. The Dseg algorithm was compared to the chaining method, the Hough transform and the line segment detector (LSD) [41]; it was found that it allows the extraction of a greater number of segments of various lengths.
This section covers the description of a front-end line segment active-search process, developed for segment-based SLAM. The selected techniques for working with segments are the LSD for the detection and moving edges (ME) [42] for matching.
An active-search approach was developed and implemented in order to handle the line segment landmarks, similar to the one previously described for points.

3.2.1. Segment Detection

The process starts by building a grid that divides the image into rectangular cells. A 3 × 3 grid was chosen, as shown in Figure 7a. There are two different ways of detecting lines: the one applied in the first frame of the sequence, and the one used in all other frames. In the first case, the segment detection algorithm is run for the whole image, and the longest segments found are selected. The cells containing a whole or partial segment are marked as “occupied”. This is shown in Figure 7b. The detection process, applied in a subsequent frame (Figure 7c), departs from the assumption that there are line landmarks already on the map. Once the EKF back-end computes the 3D position of the robot and the landmarks seen so far, landmarks are projected to the image. The projection of line landmark x into the image at camera pose C i to get a line segment’s expected position s i , given the current pose i of the camera, is performed as follows:
$$\mathbf{s}_i = h(\mathcal{C}_i, \mathbf{x})$$
Each projection is taken into account for updating the grid. The occupied cells are not considered, and one empty cell is chosen randomly. The image patch delimited by this cell is used to run the segment detector and to find the longest segment on it for initializing a new landmark. The line detected is extended to the other cells, and they are marked as “occupied” when this is the case. This process is shown in Figure 7d.
The patch of the cells where lines were detected in the present frame is saved for the line matching process, as well as the current position and orientation of the camera C 0 = ( T 0 , R 0 ) . Each line is defined by its end points, and only the pixels surrounding it are used for matching, as can be seen in Figure 7e.
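A sketch of the grid bookkeeping behind this active-search strategy follows; the 3 × 3 layout matches the description above, while the sampling density along segments is an arbitrary choice.

```python
import numpy as np

def occupied_cells(segments, img_w, img_h, rows=3, cols=3):
    """Mark the grid cells crossed by any projected or detected segment."""
    occ = np.zeros((rows, cols), dtype=bool)
    for (x1, y1), (x2, y2) in segments:
        for t in np.linspace(0.0, 1.0, 50):          # sample points along the segment
            x, y = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
            c = min(int(x * cols / img_w), cols - 1)
            r = min(int(y * rows / img_h), rows - 1)
            occ[r, c] = True
    return occ

def pick_empty_cell(occ, rng=np.random.default_rng()):
    """Randomly select one empty cell where a new segment will be searched."""
    empty = np.argwhere(~occ)
    return tuple(empty[rng.integers(len(empty))]) if len(empty) else None
```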

3.2.2. Segment Matching

When there are line landmarks already mapped, the matching process searches for a line landmark x in the frame captured at camera pose C i . This line has been initialized in the frame captured with camera pose C 0 (Figure 8a,b).
The saved appearance patch of the landmark is warped by applying a homography transformation. This transformation takes into account the rotation and translation in the camera position and orientation, with respect to its pose when the landmark was detected. The transformed coordinates of pixel j of the patch at camera pose i (i.e., q i j ) are computed applying Equation (17). This warping process is shown in Figure 8c.
The matching process performs the projection of the line landmark from Equation (18) to estimate its position of line segment s i in the image. For updating the current position of each line landmark, their estimated position in the image is used for initializing the matching algorithm, and a search for a match is performed in the region surrounding the line estimation (Figure 8d). The errors between the estimated position and the position of the match found are used by the back-end to update the state and uncertainties of the landmarks and robot poses on the map.
To perform the tracking of the points that make up part of line segments, the moving edges algorithm is implemented as discussed in [43].
The algorithm consists of searching in image $I^{t+1}$ for the point $\mathbf{p}^{t+1}$ on line $l(\mathbf{r})^{t+1}$ that corresponds to the point $\mathbf{p}^t$ on line $l(\mathbf{r})^t$. The search for a match is performed in the direction δ normal to the line $l(\mathbf{r})^t$. For each point $\mathbf{p}^t$, a search interval $\{Q_j, j \in [-J, J]\}$ is defined. Each sample $Q_j$ is evaluated by the criterion $\zeta_j$. This evaluation consists in computing the convolution value between an image patch at the neighborhood ν of $Q_j$ and the mask $M_\delta$, which is a function of the orientation of line $l(\mathbf{r})^t$. The algorithm is shown in Figure 9.
Thus, the position of point $\mathbf{p}^{t+1}$ on line $l(\mathbf{r})^{t+1}$ in image $I^{t+1}$ is given by:
$$Q_{j^*} = \arg\max_{j \in [-J, J]} \zeta_j \quad \text{with} \quad \zeta_j = \left| I^t_{\nu(\mathbf{p}^t)} * M_\delta + I^{t+1}_{\nu(Q_j)} * M_\delta \right|$$
A list of k points is produced, from which the segment extremities $\mathbf{s} = (\mathbf{u}_1, \mathbf{u}_2)$ are extracted.
One way to express the measurement z of the matched segment with respect to the expected prediction of it is to compute the orthogonal distance of the matched end points u 1 and u 2 to the predicted line l, as shown in Equation (16) and Figure 4.
By defining line measurement in this way, the matching can be accomplished regardless of which points of the corresponding line were detected by the tracker and of the segment length.
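A simplified sketch of the normal-direction search for one sample point is given below; the oriented mask is assumed to be provided (e.g., a small gradient-like kernel), which is a simplification of the masks used in the original moving-edges implementation.

```python
import numpy as np

def me_search_point(img_prev, img_next, p, normal, mask, J=6):
    """Search along the line normal for the sample Q_j maximizing the criterion
    zeta_j = |conv(prev image at p) + conv(next image at Q_j)| with the oriented mask."""
    half = mask.shape[0] // 2

    def conv_at(img, q):
        y, x = int(round(q[1])), int(round(q[0]))
        win = img[y - half: y + half + 1, x - half: x + half + 1]
        return float(np.sum(win * mask)) if win.shape == mask.shape else None

    p = np.asarray(p, dtype=float)
    normal = np.asarray(normal, dtype=float)
    score_prev = conv_at(img_prev, p)            # contribution of the previous image
    score_prev = 0.0 if score_prev is None else score_prev
    best_q, best_score = p, -np.inf
    for j in range(-J, J + 1):
        q = p + j * normal
        c = conv_at(img_next, q)
        if c is None:                            # candidate falls outside the image
            continue
        score = abs(score_prev + c)
        if score > best_score:
            best_score, best_q = score, q
    return best_q                                # estimated position p_{t+1}
```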

4. Experiments

This section includes the experimental part that tests the three main contributions of this article. The first part deals with the back-end and consists of a comparative evaluation between different landmark parametrizations. A set of simulations tests the benefits of the combination of point and line landmarks in the same map. The following part deals with the implementation of the developed segment-based SLAM front-end that includes the line segment active-search process presented in this paper. Finally, a complete heterogeneous landmark-based SLAM experiment that integrates the contributions to back-end and front-end is included.

4.1. Simulation of the Back-End for Heterogeneous SLAM

The point and line parametrizations previously presented have been tested independently in previous studies, such as [25,29,30]. This section offers a comparison of different heterogeneous approaches, including combinations of distinct landmarks on the same map. The purpose is to show the benefits of working with a heterogeneous parametrization that combines points and lines in a single map. The combinations performed are enumerated below:
  • Anchored homogeneous point (AHP)
  • Plücker line (PL)
  • Anchored homogeneous-points line (AHPL)
  • AHP + PL
  • AHP + AHPL
The MATLAB® EKF-SLAM toolbox [44] was extended with the heterogeneous functionality to perform the simulations.
Figure 10a shows the simulation environment. It consists of a house composed of 23 lines and an array of 16 points distributed uniformly among the walls.
The robot performs two different trajectories. The first one is a circular path of 5 m in diameter, with a pose step of 8 cm and 0.09°. The second is a motion of 70 steps of 4 cm each, taken towards the scene. The linear noise is 0.5 cm and the angular noise 0.05°.
Besides the heterogeneous landmark capability of the toolbox, the transparency of the objects in the scene was also considered. By default, objects in the simulation environment of the toolbox are transparent, so landmarks are visible on almost every image frame. To work in a more realistic manner, an aspect graph was implemented to only observe visible surfaces of the house at each camera pose.
Both transparent and opaque object visualizations are shown in Figure 11. An example of a heterogeneous map constructed after a complete turn of the robot around the house is shown in Figure 10b; the parametrization used is AHP + PL. For the approaching trajectory, the final heterogeneous map constructed is shown in Figure 10c; the parametrization used is AHP + AHPL. These figures display the estimated line landmarks in green and the point landmarks in blue. The real, predicted and estimated robot trajectories are displayed in blue, red and green, respectively.
For the circular path, a trajectory of five turns, considering transparent and opaque objects, was performed for each parametrization.

4.2. Integration of Line Segment Active-Search to the SLAM Front-End

The line-based SLAM front-end that was developed and that implements the line segment active-search presented in this article is covered in this section. The LSD and ME algorithms were applied to an image sequence showing a piece of furniture inside a room.
Figure 12 shows the operation of this segment-based front-end of an EKF-SLAM process. Infinite thin lines represent the estimated position of the landmark in the current image, while thicker segments show the match found. It can be observed that the match corresponds to the estimation in most of the cases.

4.3. Heterogeneous SLAM Experiment

The complete heterogeneous SLAM solution was tested with the experiment described in this section.
The mobile robot used was a CRS Robotics F3 system, with a Microsoft LifeCam Studio camera mounted on it, along with a 9DOF Razor IMU that provided the robot state estimation at each frame. A total of 401 images with a 1280 × 720 resolution form the sequence. From this robot, it was possible to get the ground truth information with a repeatability of ±0.05 mm. The robot described a total trajectory of 0.4653 m. To get a prediction of the motion, the information provided by the IMU was used in a constant acceleration motion model. The inclusion of this additional sensor made it possible to cope with the inherent scale ambiguity of monocular systems.
Figure 13 presents certain frames of the sequence, showing the landmarks used to update the state of the map. The AHP and AHPL were the parametrizations selected for the experiments, as they were the ones that provided better simulation results.

5. Results and Discussion

For the analysis, root mean square error (RMSE) evaluation is used to compute errors. At each instant k, the estimated position $(x_k, y_k, z_k)$ is compared to the ground truth position $(\hat{x}_k, \hat{y}_k, \hat{z}_k)$:
$$\epsilon_k = \sqrt{(x_k - \hat{x}_k)^2 + (y_k - \hat{y}_k)^2 + (z_k - \hat{z}_k)^2}$$
From the previous results, the mean and standard deviation of the error are computed as follows:
$$\mu_\epsilon = \frac{1}{N}\sum_{k=1}^{N} \epsilon_k$$
$$\sigma_\epsilon = \sqrt{\frac{1}{N}\sum_{k=1}^{N} \left(\epsilon_k - \mu_\epsilon\right)^2}$$
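These statistics amount to the following computation (a sketch with synthetic trajectories).

```python
import numpy as np

def position_error_stats(est, gt):
    """Per-frame Euclidean error between estimated and ground-truth positions,
    plus its mean and standard deviation over the trajectory."""
    eps = np.linalg.norm(est - gt, axis=1)        # epsilon_k for each frame k
    return eps, eps.mean(), eps.std()

# Example with synthetic N x 3 trajectories
gt = np.zeros((100, 3))
est = gt + 0.01 * np.random.default_rng(0).standard_normal((100, 3))
_, mu, sigma = position_error_stats(est, gt)
print(mu, sigma)
```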
Simulation of the back-end for heterogeneous SLAM was intended to compare the different parametrizations and to show the benefits of landmark heterogeneity.
The parametrization with the highest error is the Plücker line. Anchored parametrizations achieved the best performance, for both points and lines. Line parametrizations improve with the addition of points. Even for the anchored cases, which already have relatively good performance when working independently, heterogeneity improves the results, in such a way that the combination of AHP and AHPL is the one with the least error along the simulated trajectories.
The position error of the robot in the case of a circular trajectory with transparent and opaque objects is shown in Figure 14; these results are summarized in Table 1. For the case of the approaching trajectory, the results are shown and summarized in Figure 15 and Table 2.
The complete SLAM experiment integrating the back-end with the developed front-end is used to compare the heterogeneous approach with a classic point-based SLAM applied to the same sequence. The ground truth and estimated trajectories for each SLAM approach tested are shown in Figure 16.
Figure 17 presents results in terms of the robot position estimation error, comparing the IMU estimation to both point and heterogeneous SLAM. As can be observed, the heterogeneous approach results in lower errors, as previously presented in the simulation part. Near Frame 200, there was a change in the motion direction of the robot, which can be seen as a peak in the position estimation error graph. Even in this case, heterogeneous SLAM achieved a better performance than the point-based SLAM. Table 3 shows a comparative summary of the errors from the three cases.

6. Conclusions

The purpose of this paper is to prove the benefits of including heterogeneous landmarks when building a map from an EKF-based visual SLAM method. Several authors have shown interest in the use of heterogeneous landmarks. For the front-end, the interest in joint tracking of points and lines is found in [45], while for the back-end, a theoretical study is presented in [31], and preliminary results of an EKF-based SLAM method based on heterogeneous landmarks are presented in [46]. The experiments performed by those authors have shown that the robot localization or the SLAM stability can be improved by combining several landmark types, i.e., points and lines. The use of just monocular vision provides only partial observations of landmarks by features extracted from images; here, undelayed initialization of landmarks is used, as was proposed initially by Solà et al. [25,29] for points and lines. The use of simulated data has shown how the choice of the landmark representation has an impact on the accuracy of the map. The best ones, considering the construction of a map with heterogeneous landmarks, are anchored homogeneous points and anchored homogeneous-points lines. These parametrizations were used in a complete heterogeneous SLAM experiment that produced better results than the classic point-based case, by reducing the camera position estimation error.
Another contribution of this paper is the method proposed for a segment-based SLAM front-end. This method relies on the line segment active-search process presented in this article and on state-of-the-art line detection and matching processes. The methods that compose the resulting front-end were discussed by first recalling their theoretical background and then presenting some experimental evaluations on image sequences that show the stability of the process.
Finally, a complete heterogeneous landmark-based SLAM experiment was presented, integrating the contributions to the back-end and the front-end and confirming the results obtained independently.
In future work, constraints will be exploited in the map, typically when points and lines are extracted from known portions of the scene.

Acknowledgments

This study was performed during a long-term stay of Jorge Othón Esparza-Jiménez, PhD student in Tecnológico de Monterrey, Mexico, in the Robot, Action and Perception team of LAAS-CNRS in France. The authors would like to thank Mexico’s National Council of Science and Technology (Consejo Nacional de Ciencia y Tecnología (CONACyT)), the Focus Group on Robotics at Tecnológico de Monterrey and the Laboratorio de Robótica del Área Noreste y Centro de México (Robotics Laboratory of Northeastern and Central Mexico) founded by CONACyT-Tecnológico de Monterrey, for supporting this research. This study contributes to the French FUI project AIR-COBOT. The authors also acknowledge Susan Fortenbaugh for the English language revision of the paper.

Author Contributions

Michel Devy and Jorge Othón Esparza-Jiménez conceived and designed the experiments; Jorge Othón Esparza-Jiménez performed the experiments; Jorge Othón Esparza-Jiménez analyzed the data; J. L. Gordillo contributed materials and equipment; Michel Devy contributed analysis tools; Michel Devy and J. L. Gordillo supervised the work; Jorge Othón Esparza-Jiménez wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Carmigniani, J.; Furht, B.; Anisetti, M.; Ceravolo, P.; Damiani, E.; Ivkovic, M. Augmented reality technologies, systems and applications. Multimedia Tools Appl. 2011, 51, 341–377.
  2. Chekhlov, D.; Gee, A.P.; Calway, A.; Mayol-Cuevas, W. Ninja on a plane: Automatic discovery of physical planes for augmented reality using visual slam. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; pp. 1–4.
  3. Davison, A.J. Real-time simultaneous localisation and mapping with a single camera. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; pp. 1403–1410.
  4. Dow, J.M.; Neilan, R.; Rizos, C. The international GNSS service in a changing landscape of global navigation satellite systems. J. Geodesy 2009, 83, 191–198.
  5. Zhang, C.; Kuhn, M.; Merkl, B.; Fathy, A.E.; Mahfouz, M. Accurate UWB indoor localization system utilizing time difference of arrival approach. In Proceedings of the 2006 IEEE Radio and Wireless Symposium, San Diego, CA, USA, 17–19 October 2006.
  6. Lim, C.H.; Wan, Y.; Ng, B.P.; See, C.M.S. A real-time indoor WiFi localization system utilizing smart antennas. IEEE Trans. Consum. Electron. 2007, 53, 618–622.
  7. Bulusu, N.; Heidemann, J.; Estrin, D. GPS-less low-cost outdoor localization for very small devices. IEEE Pers. Commun. 2000, 7, 28–34.
  8. Yi, J.; Zhang, J.; Song, D.; Jayasuriya, S. IMU-based localization and slip estimation for skid-steered mobile robots. In Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Diego, CA, USA, 29 October–2 November 2007; pp. 2845–2850.
  9. Karras, G.C.; Kyriakopoulos, K.J. Localization of an underwater vehicle using an IMU and a laser-based vision system. In Proceedings of the IEEE Mediterranean Conference on Control & Automation (MED’07), Athens, Greece, 21–24 June 2007; pp. 1–6.
  10. Ndjeng, A.N.; Gruyer, D.; Glaser, S.; Lambert, A. Low cost IMU–Odometer–GPS ego localization for unusual maneuvers. Inf. Fusion 2011, 12, 264–274.
  11. Surmann, H.; Nüchter, A.; Hertzberg, J. An autonomous mobile robot with a 3D laser range finder for 3D exploration and digitalization of indoor environments. Robot. Auton. Syst. 2003, 45, 181–198.
  12. Hussmann, S.; Liepert, T. Robot vision system based on a 3D-ToF camera. In Proceedings of the IEEE Instrumentation and Measurement Technology Conference Proceedings (IMTC 2007), Warsaw, Poland, 1–3 May 2007; pp. 1–5.
  13. Endres, F.; Hess, J.; Engelhard, N.; Sturm, J.; Cremers, D.; Burgard, W. An evaluation of the RGB-D SLAM system. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, USA, 14–18 May 2012; pp. 1691–1696.
  14. Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2007), Nara, Japan, 13–16 November 2007; pp. 225–234.
  15. Dissanayake, M.; Newman, P.; Clark, S.; Durrant-Whyte, H.F.; Csorba, M. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Trans. Robot. Autom. 2001, 17, 229–241.
  16. Walter, M.R.; Eustice, R.M.; Leonard, J.J. Exactly sparse extended information filters for feature-based SLAM. Int. J. Robot. Res. 2007, 26, 335–359.
  17. Sim, R.; Elinas, P.; Griffin, M.; Little, J.J. Vision-based SLAM using the Rao-Blackwellised particle filter. In Proceedings of the IJCAI Workshop on Reasoning with Uncertainty in Robotics, Edinburgh, UK, 30 July–5 August 2005; Volume 14, pp. 9–16.
  18. Strasdat, H.; Montiel, J.; Davison, A.J. Real-time monocular slam: Why filter? In Proceedings of the 2010 IEEE International Conference on Robotics and Automation (ICRA), Anchorage, Alaska, 3–8 May 2010; pp. 2657–2664.
  19. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157.
  20. Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Computer Vision–ECCV 2006; Springer: Berlin, Germany, 2006; pp. 404–417.
  21. Rosten, E.; Drummond, T. Machine learning for high-speed corner detection. In Computer Vision–ECCV 2006; Springer: Berlin, Germany, 2006; pp. 430–443.
  22. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. Brief: Binary robust independent elementary features. In Computer Vision–ECCV 2010; Springer: Berlin, Germany, 2010; pp. 778–792.
  23. Tomasi, C.; Kanade, T. Detection and Tracking of Point Features; Technical Report CMU-CS-91-132; Carnegie Mellon Univ.: Pittsburgh, PA, USA, April 1991.
  24. Davison, A.J. Active search for real-time vision. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, 17–20 October 2005; Volume 1, pp. 66–73.
  25. Sola, J.; Vidal-Calleja, T.; Devy, M. Undelayed initialization of line segments in monocular SLAM. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), St. Louis, MO, USA, 11–15 October 2009; pp. 1553–1558.
  26. Lee, J.H.; Zhang, G.; Lim, J.; Suh, I.H. Place recognition using straight lines for vision-based SLAM. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 6–10 May 2013; pp. 3799–3806.
  27. Lemaire, T.; Lacroix, S.; Sola, J. A practical 3D bearing-only SLAM algorithm. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), Edmonton, AB, Canada, 2–6 August 2005; pp. 2449–2454.
  28. Montiel, J.M.M. Unified inverse depth parametrization for monocular SLAM. In Proceedings of the Robotics: Science and Systems (RSS), Philadelphia, PA, USA, 16–19 August 2006.
  29. Sola, J.; Monin, A.; Devy, M.; Lemaire, T. Undelayed initialization in bearing only SLAM. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Edmonton, AB, Canada, 2–6 August 2005.
  30. Sola, J.; Vidal-Calleja, T.; Civera, J.; Montiel, J.M.M. Impact of landmark parametrization on monocular EKF-SLAM with points and lines. Int. J. Comput. Vis. 2012, 97, 339–368.
  31. Esparza-Jimenez, J.O.; Devy, M.; Gordillo, J. EKF-based SLAM fusing heterogeneous landmarks. In Proceedings of the 2014 17th International Conference on Information Fusion (FUSION), Salamanca, Spain, 7–10 July 2014; pp. 1–8.
  32. Smith, D. History of Modern Mathematics; Cornell University Library Historical Math Monographs; John Wiley & Sons: Hoboken, NJ, USA, 1906.
  33. Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067.
  34. Civera, J.; Grasa, O.G.; Davison, A.J.; Montiel, J. 1-Point RANSAC for extended Kalman filtering: Application to real-time structure from motion and visual odometry. J. Field Robot. 2010, 27, 609–631.
  35. Harris, C.; Stephens, M. A combined corner and edge detector. In Proceedings of the Alvey Vision Conference, Manchester, UK, 31 August–2 September 1988.
  36. Di Stefano, L.; Mattoccia, S.; Tombari, F. ZNCC-based template matching using bounded partial correlation. Pattern Recognit. Lett. 2005, 26, 2129–2134.
  37. Etemadi, A. Robust segmentation of edge data. In Proceedings of the IET International Conference on Image Processing and its Applications, Maastricht, The Netherlands, 7–9 April 1992; pp. 311–314.
  38. Ballard, D.H. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognit. 1981, 13, 111–122.
  39. Fernandes, L.A.; Oliveira, M.M. Real-time line detection through an improved Hough transform voting scheme. Pattern Recognit. 2008, 41, 299–314.
  40. Berger, C.; Lacroix, S. DSeg: Direct Line Segments Detection; Technical Report; LAAS/CNRS: Toulouse, France, September 2009.
  41. Von Gioi, R.G.; Jakubowicz, J.; Morel, J.M.; Randall, G. LSD: A line segment detector. Image Process. Line 2012, 2, 35–55.
  42. Bouthemy, P. A maximum likelihood framework for determining moving edges. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 499–511.
  43. Comport, A.I.; Marchand, E.; Pressigout, M.; Chaumette, F. Real-time markerless tracking for augmented reality: the virtual visual servoing framework. IEEE Trans. Vis. Comput. Graph. 2006, 12, 615–628.
  44. Sola, J.; Marquez, D.; Codol, J.; Vidal-Calleja, T. An EKF-SLAM toolbox for MATLAB. Available online: https://github.com/joansola/slamtb (accessed on 24 September 2013).
  45. Rosten, E.; Drummond, T. Fusing points and lines for high performance tracking. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, 17–21 October 2005; Volume 2, pp. 1508–1515.
  46. Vidal, T.; Berger, C.; Sola, J.; Lacroix, S. Environment modeling for cooperative aerial/ground robotic systems. In Robotics Research; Springer: Berlin, Germany, 2011; pp. 681–696.
Figure 1. General block diagram of the SLAM solution. The blocks in gray represent the contributions of the present method.
Figure 2. Point parametrizations used for initialization. (a) Homogeneous point parametrization; (b) anchored homogeneous point parametrization.
Figure 3. Line parametrizations used for initialization. (a) Plücker line geometrical representation; (b) Plücker line back-projection; (c) anchored homogeneous-points line parametrization.
Figure 4. Measurement of the orthogonal distances from the detected end-points to the expected (or predicted) line.
Figure 5. Point detection process. (a) Image divided into regions; one is randomly selected; (b) selected region, processed to find a corner point; (c) appearance of the point.
Figure 6. Point matching process. (a) Frame captured at camera pose C 0 , where point x was initialized; (b) frame captured at camera pose C i , where point x is intended to be matched; (c) patch warping process, showing the original, the transformed and the cropped patches; (d) projection of landmark point x into image point ui on the image at camera pose C i . The search area is indicated by the rectangle surrounding the point.
Figure 7. Line detection process. (a) Initial image divided in a 3 × 3 grid of cells. (b) Line segments detected. Only the ones that are longer than a threshold are selected. The cells that contain them are marked as “occupied”. (c) Later image of the sequence. The map already contains line landmarks. (d) Existing landmarks are projected into the image, and the occupation grid is updated. Occupied grids are marked with a textured pattern. The first image also shows the randomly-selected empty cell and the segment detected on it. The second image shows the result of extending the detected segment. Cells occupied by this segment are colored in blue. (e) Appearance of the detected lines. Each line is parametrized by its end points, the pixels surrounding the line are used for matching.
Figure 8. Line matching process. (a) Frame captured at camera pose C 0 , where line x was initialized; (b) frame captured at camera pose C i , where line x is intended to be matched; (c) patch warping process, showing the original and the transformed patches; (d) projection of landmark line x into image segment si on the image at camera pose C i . The search area is indicated by the rectangle surrounding the segment.
Figure 9. Moving edges (ME) algorithm to find correspondent points in image sequences. (a) Calculation of the direction δ normal to the line l(r)t; (b) search sampling {Qj, j ∈ [−J, J]} along the normal direction; (c) Mδ mask at 180° and (d) Mδ mask at 45°.
Figure 10. Environment world used for the simulation experiments of heterogeneous point and line SLAM. (a) Initial state of the environment world; (b) environment after performing circular trajectory with anchored homogeneous point (AHP) + Plücker line (PL) parametrization; (c) environment after performing approaching trajectory with AHP + AHPL parametrization.
Figure 11. Different visualization modes of objects in the simulation environment. (a) Sensor view considering transparent objects; (b) sensor view considering opaque objects.
Figure 12. Front end operation of segment-based EKF-SLAM in an indoor sequence, showing the prediction and observation of landmarks. Infinite thin lines represent the estimated position of the landmark in the current image, while thicker segments show the match found.
Figure 13. Point and line heterogeneous EKF-SLAM frames. (a,c,e) The front-end part; infinite thin lines represent the estimated position of the landmark in the current image, while thicker segments show the match found; circles represent the prediction of points, and points themselves are the match found; (b,d,f) The maps created on each frame, with the landmarks’ estimation, the current pose of the camera represented by the small reference frame and the trajectory described.
Figure 14. Robot position estimation errors for the circular trajectory.
Figure 15. Robot position estimation errors for the approach trajectory.
Figure 16. Ground truth, point SLAM estimated and heterogeneous SLAM estimated trajectories.
Figure 17. Robot position estimation error comparison between IMU, point SLAM and heterogeneous SLAM approaches.
Table 1. Robot position error mean and standard deviation, for transparent and opaque object visualization, for the different heterogeneous landmark parametrizations in the circular trajectory in the simulated environment.

| | | AHP | PL | AHPL | AHP + PL | AHP + AHPL |
| Mean | Transparent objects | 0.05856 | 0.1532 | 0.05861 | 0.1109 | 0.04982 |
| Mean | Opaque objects | 0.06332 | 0.08806 | 0.04561 | 0.06634 | 0.04114 |
| Standard Deviation | Transparent objects | 0.02426 | 0.07139 | 0.02268 | 0.05733 | 0.02025 |
| Standard Deviation | Opaque objects | 0.0316 | 0.03961 | 0.0236 | 0.02298 | 0.02368 |
Table 2. Robot position error mean and standard deviation for the different heterogeneous landmark parametrizations in the approach trajectory in the simulated environment.

| | AHP | PL | AHPL | AHP + PL | AHP + AHPL |
| Mean | 0.038 | 0.053 | 0.044 | 0.040 | 0.038 |
| Standard Deviation | 0.024 | 0.023 | 0.024 | 0.025 | 0.023 |
Table 3. Robot position error mean, maximum, minimum and standard deviation, for IMU estimation, point and heterogeneous SLAM solutions in the real environment.

| | Mean | Max | Min | SD |
| IMU | 0.0018 | 0.0047 | 0.0002 | 0.0007 |
| Point SLAM | 0.0011 | 0.1500 | 0.0000 | 0.0012 |
| Heterogeneous SLAM | 0.0008 | 0.0107 | 0.0000 | 0.0011 |
