This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Simultaneous Localization and Mapping (SLAM) is perhaps the most fundamental problem to solve in robotics in order to build truly autonomous mobile robots. The sensors have a large impact on the algorithm used for SLAM. Early SLAM approaches focused on the use of range sensors as sonar rings or lasers. However, cameras have become more and more used, because they yield a lot of information and are well adapted for embedded systems: they are light, cheap and power saving. Unlike range sensors which provide range and angular information, a camera is a projective sensor which measures the bearing of images features. Therefore depth information (range) cannot be obtained in a single step. This fact has propitiated the emergence of a new family of SLAM algorithms: the Bearing-Only SLAM methods, which mainly rely in especial techniques for features system-initialization in order to enable the use of bearing sensors (as cameras) in SLAM systems. In this work a novel and robust method, called Concurrent Initialization, is presented which is inspired by having the complementary advantages of the Undelayed and Delayed methods that represent the most common approaches for addressing the problem. The key is to use concurrently two kinds of feature representations for both undelayed and delayed stages of the estimation. The simulations results show that the proposed method surpasses the performance of previous schemes.

bearing-sensorSLAMrobotics1.Introduction

Simultaneous Localization and Mapping (SLAM) is perhaps the most fundamental problem to solve in robotics in order to build truly autonomous mobile robots. SLAM treats of the way how a mobile robot can operate in an a priori unknown environment using only onboard sensors to simultaneously building a map of its surroundings which uses to track its position.

The robot’s sensors have a large impact on the algorithm used for SLAM. Early SLAM approaches focused on the use of range sensors as sonar rings or lasers e.g., [1]. Nevertheless there are some disadvantages with the use of range sensors in SLAM: correspondence or data association is difficult; they are expensive and some of them are limited to 2D maps and computational overhead due to large number of features (see [2,3] for a complete review).

The aforementioned issues have propitiated that recent work moves towards the use of cameras as the primary sensing modality. Cameras have become more and more interesting for the robotic research community, because they yield a lot of information for data association, although this problem remains latent. Cameras are well adapted for embedded systems: they are light, cheap and power saving. Using vision, a robot can localize itself using common objects as landmarks.

On the other hand, while range sensors (i.e., laser) provide range and angular information, a camera is a projective sensor which measures the bearing of images features. Therefore depth information (range) cannot be obtained in a single frame. This fact has propitiated the emergence of a new family of SLAM methods: The Bearing-Only SLAM methods, which mainly rely in especial techniques for features system-initialization in order to enable the use of bearing sensors (as cameras) in SLAM systems.

In this context, a camera connected to a computer becomes a position sensor which could be applied to different fields such as robotics (motion estimation for generally moving robots humanoids), wearable robotics (motion estimation for camera equipped devices worn by humans), tele-presence (head motion estimation using an outward-looking camera), or television (camera motion estimation for live augmented reality) [4].

Usually the Bearing-Only SLAM has been associated with vision-based SLAM systems, possibly because cameras are by far the most popular bearing sensor used in robotics. In that sense, the use of alternative bearing sensors (i.e., auditory sensing) for performing SLAM has been much less explored. Nevertheless in an authors’ previous work [5], a Sound-Based SLAM system is proposed where sound sources are used as map features and thus showing the viability on the inclusion of the hearing sense in SLAM and the use of alternative bearing sensors.

In recent years several important improvements and variants to this kind of methods have appeared [6,7]. Also different schemes for increasing the number of features managed into the map have appeared [8]. Nevertheless the initialization process of new features is still the most important problem for addressing in Bearing-Only SLAM in order to improve the robustness.

In this work a novel and robust method called Concurrent Initialization is presented which is inspired by having the complementary advantages of the Undelayed and Delayed methods, which represent the most common approaches for addressing the problem of initializing new features in bearing-only SLAM.

2.Related Work

Bearing-Only SLAM has received most attention in the current decade. Therefore many of the related approaches are actually very recent. In [9] Deans proposes a combination of a global optimization BA, (Bundle Adjustment) for feature initialization and Kalman Filter for state estimation. In this method, due to the limitation on the baseline on which features can be initialized and depending on the camera motion and the landmark location, some features cannot be initialized. Strelow proposes in [10] a similar method but mixing camera and inertial sensors measurements in an Iterated Extended Kalman Filter (IEKF). In [11] Bailey proposes a variant of constrained initialization for bearing-only SLAM, where past vehicle pose estimates are retained in the SLAM state so that feature initialization can be deferred until their estimates become well-conditioned. In that sense the past poses of the robot are stacked in the map, together with associated measures, until base-line is sufficient to permit a Gaussian initialization. The criteria used for determining whether the estimation is well-conditioned (Gaussian) is the Kullback distance. The complexity of the proposed sampling method to evaluate this distance is very high.

Also there are some works that use other estimation techniques (apart to the EKF) in Bearing-Only SLAM like the Particle Filters (PF) based methods. In a series of papers, Kwok uses Particle Filters: in [12] variations to standard PF are proposed to remedy the sample impoverishment problem in bearing-only SLAM. In [13] initial state of features is approximated using a sum of Gaussians, which defines a set of hypothesis for the position of the landmark, and includes all the features inside the map from the beginning. On successive observations, sequential radio test (SRT) based on likelihoods is used to prune bad hypothesis. The way these hypotheses are initialized is not detailed, and convergence and consistency issues are not discussed. In [14] Kwok extends the algorithm using a Gaussian Sum Filter, with an approach similar to the proposed in [15] for bearing-only tracking. This method is perhaps the first undelayed feature initialization method. The main drawback of this approach is that the number of required filters can grow exponentially, and therefore computational load grows exponentially with the number of landmarks.

Some of the most notably advances on Bearing-Only SLAM have been presented by Davison [4,16], who shows the feasibility of real-time SLAM with a single camera, using the well-established EKF estimation framework. In this work a Bayesian partial-initialization scheme for incorporating new landmarks are proposed where a separate Particle Filter is used for estimating the feature depth which is not correlated with the rest of the map. In that sense it maintains a set of depth hypotheses uniformly distributed along the viewing ray of a new landmark, with a particle filter in one dimension. Each new observation is used to update the distribution of possible depths, until the variance range is small enough to consider a Gaussian estimation, in this point the estimation is added to the map as a three-dimensional entity. Until this initialization occurs, the ray estimation is maintained in the system`s single Extended Kalman Filter. A drawback of this approach is that the initial distribution of particles has to cover all possible depth values for a landmark, this fact makes it difficult to use when the number of detected features is large or when there are far features in the scene. As a result, its application in large environments is not straightforward, as it would require a huge number of particles.

Jensfelt in [17] presents a method where the idea is to let the SLAM estimation lag behind N frames and using these N frames to determine which points are good landmarks and find an estimate of their 3D location. Mainly the focus in this work is on the management of the features to achieve real-time performance in extraction, matching and loop detection.

In [18] Sola presents a method based on Federate Kalman Filtering technique. With initial Probability Distribution Function (PDF) for the features, a geometric sum of Gaussians is defined. The method is an approximation of the Gaussian Sum Filter (GSF), which was used in [14], that permits undelayed initialization with simply an additive growth of the problem size. A drawback of this approach is that it does not cope with features at very large depths. In [19] Sola presents a similar method to the presented one in [18] but features are initialized with a delayed method.

In [20] a FastSLAM [21] based approach is proposed by Eade. Here the pose of the robot is represented by particles and a set of Kalman Filters refines the estimation of the features. When the inverse depth collapses, the feature is converted to a fully initialized standard Euclidean representation. This approach for features initialization seems appropriate within a FastSLAM implementation, but it lacks for a more general framework.

Montiel et al. in [22] presented a method, where the transition from partially to fully initialized features does not need to be explicitly tackled, making it suitable for direct use in EKF framework for sparse mapping. In this approach the features are initialized in the first frame observed with an initial fixed inverse depth and uncertainty, determined heuristically to cover range from nearby to infinity, therefore distant points can be coded. Due to the clarity and scalability, this approach is a good option to be implemented.

On the other hand, experiments show that initial fixed parameters can affect the robustness of the method, especially when an initial metric reference is used in order to recover/set the scale of the map. This fact motivated the authors to develop in [23] a delayed version of the above method. In this case, initial depth and uncertainty of each feature are dynamically estimated prior to add this new landmark in the stochastic map. The works [22] and [23] are analyzed in the next sections. Table 1 shows a summary of the above methods.

3.Problem Statement3.1.Sensor motion model

Let us consider a bearing sensor, with a limited field of view, moving freely in 2DOF. The sensor state x̂v is defined by:
(1)x^v=[xv,yv,θv,vx,vy,vθ]Twhere [xv, yv, θv] represents the center position and orientation of the sensor and [vx, vy, vθ] denoting linear and angular velocity.

At every step it is assumed an unknown linear and angular acceleration with zero mean and known covariance Gaussian processes, aW and αW, producing an impulse of linear and angular velocity:
(2)n=[VxWVyWVθW]=[axWΔtayWΔtaθWΔt]

An Extended Kalman Filter propagates the sensor pose and velocity estimates, as well as feature estimates.

3.2.Features definition and measurement model

The complete state x̂ that includes the features ŷ is made of:
(4)x^=[x^v,y^1,y^2,...y^n]Twhere a feature ŷ represents a feature i defined by the 4-dimension state vector:
(5)y^i=[xi,yi,θi,ρi]Twhich models a 2-D point located at:
(6)[xiyi]+1ρim(θi)where xi, yi is the sensor center coordinates when the feature was first observed; and θi represents the azimuth (respect to the world reference W) for the directional unitary vector m(θi). The point depth di along the ray is coded by its inverse ρi = 1/di (Figure 1).

The use of an inverse depth parameterization for bearing-only SLAM can improve the linearity of the measurement equation even for small changes in the sensor position (corresponding to small changes in the parallax angle), this fact allows a Gaussian distribution to cover uncertainty in depth which spans a depth range from nearby to infinity. It is well known the relevance of a good uncertainty Gaussian representation in a scheme based in EKF [24].

Figure 2 shows a simulation for a point reconstruction from noisy bearing measurements at different parallax (upper plot), using both, the Euclidean and the Inverse Depth parameterization. The location of the vehicle is known. A Gaussian error σθ = 1° (degrees) is introduced in bearings. Lower plots show the evolution of the likelihood for depth and inverse depth as the parallax in the observation grows: in (a), the estimates of depth likelihood converge to a Gaussian-like shape, but the initial estimates are highly non-Gaussian, with heavy tails. In contrast, likelihoods of inverse depth (b) (abscissa in inverse meters) are nearly Gaussian, even for low parallax. Therefore it can be clearly appreciated how the uncertainty can be represented by a Gaussian using the inverse depth parameterization over whole parallax range, whereas the Euclidean representation converges to a Gaussian-like shape only to the final estimates. For the Euclidean representation, the parallax needed for the likelihood converges to a Gaussian-like shape, depending on the sensor noise, and thus the noisier the sensor the more parallax is needed for convergence.

The different locations of the sensor, along with the location of the already mapped features, are used to predict the feature angle hi (angle describing the direction of the feature in the sensor coordinate frame). The measurement model is defined by:
(7)hi=atan2(1ρisinθi+yi−yv,1ρicosθi+xi−xv)

atan2 is a two-argument function that computes the arctangent of y/x given y and x, within a range of [−π, π]. At this stage it is assumed that the bearing sensor is capable of tracking and discriminating between the landmarks, in other words, the data association problem is obviated.

In implementation using real data, features search could be constrained to regions around the predicted hi. These regions are defined by the innovation covariance matrix Si = HiPk+1H′i + R where Hi is the Jacobian of the sensor model with respect to the state, Pk+1 is the prior state covariance, and measurements z are assumed corrupted by zero mean Gaussian noise with covariance R.

As it was stated before, depth information cannot be obtained in a single measurement when bearing sensors are used. To infer the depth of a feature, the sensor must observe it repeatedly as the sensor freely moves through its environment, estimating the angle from the feature to its center. The difference between angle measurements is the feature parallax. Actually, parallax is the key that allows to estimating features depth. In the case of indoor sequences, centimeters are enough to produce parallax, on the other hand, the more distant the feature, the more the sensor has to travel to produce parallax. Therefore, in order to incorporate new features to the map, special techniques for features system-initialization are needed in order to enable the use of bearing sensors in SLAM systems.

Let us consider two methods, which represent the main approaches (undelayed and delayed) for addressing the initialization problem.

3.3.ID-Undelayed initialization

For the Inverse depth (ID) undelayed method presented in [22], transition from partially to fully initialized features do not need to be explicitly tackled; this means that the feature is added to the map in its final representation since the first frame was observed. The initialization includes both the feature state initial values and the covariance assignment. The initial uncertainty region covers a huge range depth [dmin, ∞] as Gaussian because the low linearization errors, due to the inverse depth parameterization. Once initialized, the feature is processed with the standard EKF prediction-update loop.

Using the inverse depth parameterization, while the feature is observed at low parallax, the feature will be used mainly to determine the sensor orientation but the feature depth will be kept quite uncertain; if the sensor translation is able to produce a parallax big enough then the feature depth estimation will be improved.

For the ID-Undelayed method (Figure 1), a new feature ŷnew (Equation 5) is initialized, when is detected the first time k, as follows:
(8)[xiyiθi]=[xvyvθv+zθ]where xv, yv, θv are taken directly from the current state x̂v and zθ is the initial bearing measurement. The initial value for ρi is derived heuristically to cover in its 95% acceptance region, a working space from infinity to a predefined close distance dmin expressed as inverse depth:
(9)[1dmin,0]so:ρi=ρmin2σρ=ρmin4ρmin=1dmin

In [22] the parameters are set as dmin = 1, ρi = 0.5, σρ = 0.25. The new system state x̂ is conformed simply adding the new feature ŷi to the final of the vector state:
(10)x^=[x^vy^1y^2]x^new=[x^vy^1y^2y^i]

The state covariance after feature initialization is defined by:
(11)Pnew=J[P(k)000Rj000σρ2]J′being J the Jacobian for the initialization function.

3.4.ID-Delayed initialization

In experiments using the undelayed initialization, it often happens that the inverse depth becomes negative after a Kalman update, due to the observation noise that predominates over the update of the depth, but there are simple solutions to solve this problem. Moreover, when an initial metric reference is used in order to recover/set the scale of the map (very relevant for robotics applications), initial fixed parameters (inverse depth and uncertainty) must be tuned in order to ensure convergence.

Figure 3 illustrates the SLAM process using the ID-Undelayed following a simple straight trajectory for 5 features, (4 of them near to the vehicle and the other more distant). In upper plots (a,b and c) the initial parameters (ρo = 0.01, σρ = 0.005) are set in order to initialize the features at the middle of the distance between the vehicle and the more distant feature, and therefore far enough respect to the nearby features (plot a). In this case we found in almost every case a huge drift in the estimates (plot c), and it can be appreciated that initializing the features far away from the sensor fails if there are some closest landmarks. If other values (ρo = 0.5, σρ = 0.25) are used (near to the sensor), (central plots d, e and f) then the percentage of convergence is poor (approx 50%) (plot f), due to the influence of the distant feature. In this case if the distant feature is removed from the map (not illustrated here) then a percentage of convergence of 80% is achieved. In the last series (lower plots g, h and i) we combine both initial values: ρo = 0.01, σρ = 0.005 for the distant landmark and ρo = 0.5, σρ = 0.25 for the all the nearby ones. In this case an effectiveness of 90% was achieved for the algorithm.

The issues mentioned above suggest us that the initial inverse depth and their associated initial uncertainty of the new features added to the map could be treated before to be added to the system state instead of use fixed initial depth and uncertainty. In [23] a delayed version of an undelayed method is proposed. In this case, initial depth and uncertainty of each feature are dynamically estimated prior to adding the new landmark in the stochastic map.

For the ID-Delayed method, a new feature ŷnew (Equation 5) is initialized as follows:

When a feature is detected the first time k, some part of the current state x̂ and covariance matrix P together with the sensor measurement are stored, this data λ (called candidate points) is composed by:
(12)λi=[x1,y1,θ1,z1,σ1x,σ1y,σ1θ]

The values x1, y1 and θ1 represent the current robot position; σ1x, σ1y and σ1θ represent their associated variances taken from the state covariance matrix Pk and z1 is the first bearing measurement to the landmark. In subsequent instants k, the feature is tracked until a minimum parallax threshold αmin is reached. Figure 4 shows that a few degrees of parallax are enough to reduce the uncertainty in the estimation.

The parallax α is estimated using:

The base-line b.

λi, using its associated data (x1, y1, θ1, z1, σ1x, σ1y, σ1θ).

The current state (xv, yv, θv, z, σxv, σyv, σθv).

For each candidate point λi, every time that a new bearing measurement z is available, the parallax angle α can be estimated as (Figure 4):
(13)α=π−(β+γ)

The angle β is determined by the directional unitary vector h1 and the vector b1 defines the base-line b in the direction of the sensor trajectory.

The angle γ is determined in a similar way as β but using the directional unitary vector h2 and the vector b2 defining the base line in the opposite direction of the sensor trajectory by:
(14)β=cos−1(h1⋅b1‖h1‖‖b1‖)γ=cos−1(h2⋅b2‖h2‖‖b2‖)where (h1 · b1) is the dot product between h1 and b1. The directional vector h1, expressed in the absolute frame W, points from the sensor location to the direction when the landmark was observed for the first time, and is computed using the data stored in λi denoting the bearing zi. The directional vector h2 expressed in the absolute frame W is computed in a similar way as h1 but using the current sensor position x̂v and the current measurement zi.
(15)h1=[cos(θ1+z1)sin(θ1+z1)]h2=[cos(θv+zi)sin(θv+zi)]

b1 is the vector representing the robot base-line between the robot center position x1, y1 stored in λi where the point was first detected and the current sensor center (xv, yv). b2 is equal to b1 but pointing to the opposite direction. The base-line b is the module of b2 or b1:
(16)b=‖b1‖=‖b2‖b1=[(xv−x1),(yv−y1)]b2=[(x1−xv),(y1−yv)]

If α > αmin then λi is initialized as a new feature ŷi. The threshold αmin can be established depending on the accuracy of the bearing sensor. Depth uncertainty is reasonably well minimized when α = 10°.

For a new feature ŷi, values of xi, yi, θi are defined in the same way as Equation 8. For the delayed approach the dynamical estimation of ρi is derived from:
(17)ρi=sinαbsinβ

The variance σρ for the inverse depth ρ is calculated now from the initialization process, instead of a variance predefined heuristically as it was made in the undelayed method, therefore the covariance for the new feature ŷnew is derived from the error diagonal covariance matrix Ri measurement and the state covariance matrix P.
(18)Ri=diag[σ1x,σ1y,σ1θ,σz2,σz2]

For reasons of simplicity Ri is defined as a diagonal matrix (cross-covariances are not taken into account) and is now conformed by the error variance of the standard deviation of the bearing sensor σz (one for each bearing estimation z1 and z) and the variances stored in λi (σ1x, σ1y and σ1θ). Note that the value of σz is constant and is not stored previously in Equation 12.

The new state covariance matrix, after initialization, is:
(19)Pnew=J[Pk00Rj]J′

Note that unlike the ID-Undelayed method there is not an implicit initial uncertainty in depth σρ (Equation 11). In the ID-Delayed method the complete covariance for the new feature is fully estimated by the initialization process.

3.5.Undelayed vs. Delayed

In an Undelayed approach, when a feature is added to the map the first time that it has been observed, its depth is modeled with a huge uncertainty. In that sense, this new feature does not provide any depth information. However, at this stage the benefit of the Undelayed approach is that features provide information about the sensor orientation from the beginning.

On the other hand, it can be useful to wait until the sensor movement produces some degrees of parallax, (gathering depth information) in order to improve robustness, especially when an initial metric reference is used for recovering scale. Moreover, when cameras are used in real cluttered environments, the delay can be used for efficiently reject weak features, thus initializing only the best features as new landmarks to the map.

4.Concurrent Initialization

This section presents a novel and robust method, called Concurrent Initialization, for initializing new features in bearing-sensor-based SLAM systems. The method takes advantage of both, undelayed and delayed approaches. When a feature is detected for the first time, it is immediately initialized in the map (undelayed) as a directional vector which contributes since the beginning to the estimation of the sensor orientation. After that, while the sensor moves freely through its environment, the incoming measurements are incorporated via an uncorrelated linear Kalman filter in order to estimate the depth of the feature. If the sensor movement produces enough parallax, then the feature will be updated with the estimated depth, and thus also contributing to the estimation of the sensor location. Very far features will not produce parallax, and will remain in the form of a directional vector in the map, but contributing to the estimation of sensor orientation. Figure 5 illustrates the concurrent initialization process.

4.1.Undelayed stage

When a feature is detected for the first time k, it is initialized immediately in the map as a new landmark ŷL(i) which is composed by the 3-dimension state vector:
(20)y^L(i)=[xi,yi,θi]Twhere
(21)[xiyiθi]=[xvyvθv+zθ]

xv, yv, θv are taken directly from the current state x̂v and zθ is the initial bearing measurement. ŷL(i) defines a directional vector, expressed in the absolute frame W, which represents the direction of the landmark from the sensor, when it was observed for the first time. The covariance matrix P is updated in the same manner as Equation 19 but using the proper Jacobian J.

Parallel to the system state (represented by the state vector x̂), a state vector x̂can is used for estimating (via an extra linear Kalman Filter) the feature depth of each landmark ŷL. The state x̂can is not directly correlated with the map.

Every time a new feature ŷL(i) is initialized in x̂, the state x̂can is augmented as:
(22)x^can=[λ1λ2]x^can_new=[λ1λ2λi]where λi is a 3-dimension vector composed by:
(23)λi=[αi,Δαi,ρi]′

For each λi, αi is the estimated parallax, Δαi is the rate of change in parallax and ρi is the estimated inverse depth.

The three initial values of λi are set to zero, and the initial values of Rc have been heuristically determined as: Rc = diag(.01, .01, 1).

4.2.Delayed stage

While the sensor moves through its environment, it can observe repeatedly a landmark ŷL(i), at each iteration generating a new angle measurement z. All these new measurements are successively added to the linear Kalman Filter (responsible for estimating x̂can) in order to infer the landmark depth. For each new measurement zi of a feature ŷL(i) an iteration of the filter is executed.

The state transition model for each λi is:
(25)x^can_i(k+1)=[αi(k+1)Δαi(k+1)ρi(k+1)]=[αi(k)+Δαi(k)Δαi(k)ρi(k)]

The measurement prediction model is directly obtained from the state. On the other hand, the measurements zcan used to update the filter are a function of: (i) the feature ŷL(i), (ii) the sensor state x̂v and (iii) the current measurement zi.
(26)zcan=[zαzρ]=fz(y^L(i),x^v,zi)

zα and zρ are estimated in the same manner as Equation 13 and Equation 17 respectively. Only note that in Equation 16, x1 and y1 are taken from ŷL(i), so x1 = xi and y1 = yi.

The implicit uncertainties in the estimation of the function fz are used to compose the error measurement covariance matrix Rcan:
(27)Rcan=∇fz(Pt)∇fz′where Δfz is the Jacobian of fz with respect to zcan. Pt is formed by:
(28)Pt=[Px^vPx^vy^L(i)0Py^L(i)x^vPy^L(i)000σz2]

All the components of Pt, except σz (the error variance of the bearing sensor) are taken directly from the covariance matrix P of the system state x̂. Px̂v is the submatrix of P corresponding to the covariance of the sensor state x̂v. PŷL(i) is the covariance of the feature ŷL(i). Px̂vŷL(i) and PŷL(i)x̂v are the correlations between x̂v and ŷL(i).

Rcan is used in the Kalman update equations for estimating the innovation covariance matrix Si.

4.3.Updating depth

Features expressed in the form of ŷL(i) are very useful to estimate the sensor orientation θv. In [25] a visual compass is proposed based in this fact. Besides, the depth of features are needed for estimating the sensor location [xv,yv]. For near features ŷL(i), a small sensor translation is enough to produce some parallax and thus to infer depth.

The state x̂can encloses the parallax αi and inverse depth ρi estimations for each feature ŷL(i). Figure 6 shows the evolution of parameters αi (upper plot) and d = 1/ρi (lower plot) and its uncertainty for the feature estimated by the linear Kalman filter, (left plot) feature at a distance of d = 1,000 units and (right plot) for a distance of d = 50 units. The boundary uncertainties at 3σ are indicated in blue color. The filtered values are depicted in red color. Also note in green color the raw measurements zcan (taken from Equation 26). In these graphics it can be clearly appreciated how the estimation of depth d is directly influenced by the parallax; for the near feature, about 100 steps are needed to producing parallax and thus d converges rapidly to its real value. Also note that the uncertainty is rapidly minimized. On the other hand, the distant feature produces small parallax. For a low parallax the sensor noise predominates and therefore produces very fluctuant raw measurements. Even so, after several steps, the filter estimates the depth reasonable well with a high related uncertainty.

A minimum parallax threshold αmin is used for updating a feature ŷL(i) as ŷi. Distant features will not produce parallax and therefore will remain expressed as ŷL(i), but contributing to the estimation of sensor orientation θv.

On the other hand, if αi > αmin then:
(29)y^L(i)=[xi,yi,θi]T→y^i=[xi,yi,θi,ρi]Twhere ρi is taken directly from x̂can. The covariance matrix P is transformed by the corresponding Jacobian:
(30)Pnew=J[P00qσρi2]J′being σρy the variance of the inverse depth estimation for ŷL(i) and taken from Pcan. The constant q is used to increase the initial uncertainty of ρi in order to improve filter consistency. In experiments q is set to 100.
(31)J=[I0000∂y^i∂y^L(i)0100I0]

When a feature ŷL(i) is updated as ŷi, then its corresponding values will be removed from the linear Kalman filter responsible for estimating the state x̂can.

4.4.Measurement

At any time, the map can include both kind of features ŷL(i) and ŷi. Thus each kind of feature has its own measurement prediction model:

- For features ŷi measurement Equation 7 is used. A value of 2 to 6 times the real error of the bearing sensor is considered for the measurement process.

- Features ŷL(i) are supposed to be very far from the sensor and therefore it is assumed that its corresponding bearing measurement zi will remain almost constant. The measurement prediction model is simply:
(32)hL(i)=θi−θv

- For near features ŷL(i), the bearing measurement zi will change rapidly. Due to this fact the standard deviation of the bearing sensor σz for features ŷL(i), (in update Kalman equations) is multiplied by a high value c, (c = 10e10 were used in experiments). An interesting issue, to be treated for further work, could be the dynamical estimation of parameter c.

Both kinds of measurement prediction models must be used together, if several measurements zi are made at the same time for both kinds of features. For example, consider that the bearing sensor takes measurements of the features ŷL(2) and ŷ3 simultaneously, then:
(33)v=[z2z3]−[hL(2)h3]R=[cσz200σz2](34)∇H=[∂hL(2)∂x^v0∂hL(2)∂y^L(2)00∂h3∂x^v00∂h3∂y^30]where v is the innovation, R is the error covariance matrix and ∇H is the Jacobian measurement model.

5.Experiments

Several simulations have been executed in order test the performance of the proposed method in relation to other approaches. Simulations are extremely helpful when different methods are compared among others, because numerous variables (inherent to real environments) are obviated (e.g., data association), or become constants (i.e., synthetic map), therefore the cores of the methods are compared.

5.1.Initialization process of distant and near features

Figure 7 shows the evolution in the initialization process of two features maps with the concurrent initialization: a distant one (600 units of depth) and a near one (50 units of depth). The only information given a priori to the system was the scale reference (the three points in yellow) which was introduced with an associated uncertainty close to zero in the covariance matrix R. Taking into account that there is not an additional sensory input (e.g., odometry), at every step an unknown linear and angular acceleration is introduced with zero mean and known-covariance Gaussian processes (Section 3.1). In this case, aWx = 4 m/s2, aWy = 4 m/s2 and aWθ = 2 rad/s2 were used. The only input sensor of the system is a noisy sensor capable of measuring the bearing of features, with a limited field of view of 110° (emulating a 2-DOF camera). A standard deviation of σ = 1° is considered for the sensor readings.

At the begin of the sequence (plot a) both features has been initialized as a directional vector ŷL(i), defining a ray (in cyan color). Note that three feature points (in yellow color) have been previously added to the map as a priori known landmarks in order to define/set the scale of the world. Around step 100 (plot b), the small displacement of the sensor to the right produces enough parallax to estimating the depth of the near feature. Observe that the near feature ŷL(i) has been transformed to a ŷi feature (blue color). Also note that some uncertainty (especially in depth) remains at the moment of the transformation. By the last step, at 250 iterations, (plot c) the uncertainty in the near feature has been full minimized, on the other hand, the movement of the sensor has not produced enough parallax and therefore, the distant feature remains in the form of ŷL(i) but still contributing to the estimation of the sensor orientation.

5.2.Comparative study

In order to show the performance of the Concurrent method proposed in this article, a comparative study between the ID-Undelayed method [22] and the ID-Delayed [23] method is presented.

Figure 8 illustrates the environment setup used in the study. For all the tests, the bearing sensor is moved over a semi-cycled U-like shape trajectory, since our main goal is to observe the effect of the initialization process of new features in the estimation of both map and sensor location, instead of the closing loop problem. About 100 landmarks (in green) simulate the environment of the sensor. Figure 8 also shows both the map and sensor trajectory estimates after a single run of 2,000 steps of the Concurrent method. The features map and their uncertainties are indicated in blue. Note the evolution of uncertainty in both sensor and features location. Also note the typical (in SLAM systems) drift in both trajectory and map estimations as the sensor moves far away from its initial location.

The average NEES (normalised estimation error squared [26]) over N Monte Carlo runs of the filter was used in order to evaluate the consistency of the methods, as it is proposed in [27]. The NEES is estimated as follows:
(35)εk=[xk−x^k|k]TPk|k−1[xk−x^k|k]where xk is the true state and the average NEES is computed as:
(36)ε¯k=1N∑i=1Nεk(i)

Four different tests were realized to comparing the methods under diverse conditions. Table 2 shows the values for the linear and angular acceleration (aWx, aWy and aWθ in 4m/s2) and the time between “frames” (Δt in seconds) used for each test. In this case, a higher Δt implies bigger displacements of the robot from frame to frame and therefore more linearization errors due to large changes in parallax.

In experiments ρini = 0.05 and σρ = 0.025 were used for the ID-Undelayed method and αmin = 10° were used for both ID-Delayed and Concurrent methods. The NEES was estimated over the 3-dimensional robot pose. The average NEES for each method was estimated using N=20 Monte Carlo runs. In simulations the sensor was moved 3 meters every Δt = 1 second for the straight sections and 4.5 meters every Δt = 1 seconds for the curve sections of the trajectory.

MATLAB code was run using a 1.73 GHz Pentium M laptop. Table 3 shows the execution time of the three methods for conditions Δt = 1/30 and Δt = 1/120. As it would be expected, for Δt = 1/120 the execution time was around four times longer than Δt = 1/30, for the same trajectory. Execution time of the Concurrent method was estimated using a diagnostic version of the algorithm, marked with an asterisk (*) in Table 3. This version uses a 5n Kalman filter (instead of 3n) which estimates two extra diagnostic parameters. This means that the real execution time of the Concurrent method should be somewhat faster.

For some runs of the algorithms, filter divergence could occur. Table 3 also shows the failed attempts, this means the number of times that filter divergence appeared before a method reach N = 20 positives runs (with convergence), for each test. The average NEES was only estimated using positive runs.

Figure 9 shows the evolution of the average NEES (ε̄k) for tests (a), (b), (c), (d) for each method. Note that test (b) and (d) need four times more steps than test (a) and (c) in order to complete the same trajectory due to their particular value of Δt.

As it would be expected, the whole methods become optimistic after a certain time [27]. Nevertheless, it can be observed for tests (a), (b) and (c) that ε̄k achieved for Concurrent method (red) tends to be lower than ε̄ k achieved for both ID-Undelayed (blue) and ID-Delayed (green) methods. This behavior is more evident in test (a) and (c), where lower noise is injected into the sensor motion model. The difference between (a) and (c) is the time between “frames” (Δt) used for each test. The above results suggest, that if adequate noise is injected, then Concurrent method seems to be less sensitive to large jumps in parallax (and linearization errors), and therefore could be suitable for application where a high “frame-rate” is not available. Regarding to the ID-Undelayed method, it can be observed the effect of the parameter Δt over the magnitude of ε̄k, although its form is somewhat similar for all tests. In test (a) and (c) where Δt is higher (1/30), ε̄k reaches around 600 units, while in test (b) and (d) where Δt is lower (1/120) ε̄k reaches around 200 units. Theoretically the ε̄k for the ID-Undelayed method would be more favorable as Δt→0. At this point, it is important to remember that implementation of Concurrent method implies the estimation of an extra linear Kalman filter of dimension 3n, where n is the number of features ŷL(i) in the state x̂. On the other hand, the Concurrent method is lees sensitive to the parameter Δt. At least for this experimental setup, the average NEES for the Concurrent method, when Δt = 1/30 (94 s of execution time), is even lower than the ε̄k for the ID-Undelayed, when Δt = 1/120 (302 s of execution time). The ID-Delayed method represents the most efficient alternative in computational cost with medium performance in average NEES terms. However the ID-Delayed method shows to be the least robust method in terms of convergence (Table 3). The problem of convergence for the ID-Delayed method is notorious in tests (a) and (c) where Δt is higher (1/30), these results indicate a frame-rate dependency of the method. On the other hand, for the realized tests, the Concurrent method shows the best results in terms of convergence.

6.Conclusions

This work proposes a novel and robust approach for initializing new features in SLAM systems based in bearing sensors. First, an overview of the problem is given and the most relevant related work is presented. Most of the methods presented in the literature can be classified into two categories: Undelayed and Delayed Methods. In that sense, an analysis of two representative methods of the aforementioned taxonomy is also presented. Undelayed methods provide information of orientation since a feature is first detected, on the other hand, Delayed methods await until some depth information is gathered, improving convergence.

The proposed approach in this article, the concurrent initialization method, takes the best of both, Undelayed and Delayed approaches. The key is to use two kinds of feature representations concurrently for both undelayed and delayed stages of the estimation. The simulations results, based in the average NEES test, showed that the Concurrent method can maintain the filter consistency satisfactorily. Moreover, observing the percentage of convergence, the Concurrent method appears also to be robust.

It is important to mention that the complexity of the Concurrent method is higher than other methods (i.e., ID-Undelayed method) and the additional Kalman filter certainly implies an increase of computational requirements per frame. On the other hand, the Concurrent method appears to be less sensitive to the linearization errors induced for large jumps in parallax (much time between frames). In that sense the Concurrent method could be even more efficient in computational terms than other methods because it seems to work properly at low frame rate. This attribute also makes it suitable for applications where a high frame rate is not available for different reasons. The concurrent initialization method could be a robust alternative to bearing sensor based SLAM systems.

This research was conducted in the Automatic Control Department, Universitat Politecnica de Catalunya (UPC) and supported by the Spanish Ministry of Science and Innovation, UbROB Project, CICYT number DPI2007-61452.