1. Introduction
At present, the influential simultaneous localization and mapping (SLAM) systems with monocular vision include ORB-SLAM2 [
1], LSD-SLAM [
2], DSO [
3], and SVO [
4]. Among them, ORB-SLAM2 [
4] has high adaptability and strong robustness to indoor and outdoor environments. However, the algorithm itself is based on pure vision. When there are no visual features, the accuracy and robustness of the pose estimation will decrease rapidly, and the algorithm may fail. Therefore, in the follow-up development of visual SLAM, in order to overcome the shortcomings of pure vision, the strategy of multisensor fusion is adopted.
Accordingly, sensors with scale measurement capabilities and monocular vision sensors are used to perform fusion vision SLAM to increase accuracy and robustness. Relatively stable and reliable solutions can be obtained with lasers [
5]; however, at present, this method is generally only suitable for large-scale scenarios, such as unmanned driving, and is unsuitable for applications with limited costs. The inertial measurement unit (IMU) has become a generally accepted option. However, it exhibits a non-negligible cumulative error if run for a long period of time [
6], especially in visually restricted conditions without texture or under weak illumination, in which case the visual mile cannot be used to correct the IMU error. In [
7], the scale observability of a monocular visual inertial odometer on a ground mobile robot was analyzed in detail. When the robot moved at a constant speed, due to the lack of acceleration excitation, the constraint on the scale was lost, resulting in a gradual increase in the scale uncertainty and positioning error.
A wheel speed inertial odometer was integrated with a monocular visual odometer using the extended Kalman filter (EKF) [
8,
9], assuming the robot to be running on an ideal plane, and 3-DOF pose measurement was performed. The wheel speed inertial odometer uses wheel speed measurement for EKF status prediction; the angular speed measurement is integrated for dead reckoning; and the visual odometer method is used for an EKF measurement update. In the above-mentioned EKF-based loose coupling method, when the visual odometer cannot accurately calculate the pose due to insufficient visual characteristics, some proportion of its output in the filter will fall, resulting in ineffective visual observation. Consequently, the accuracy is reduced. VINS-Mono used tightly coupled nonlinear optimization [
10,
11]. By combining the IMU pre-integral measurement and visual measurement in a tightly coupled form to achieve the maximum posterior probability, we use the nonlinear optimization method to reach the optimal state. An open-source visual inertia SLAM algorithm VINS-Fusion [
12,
13] was developed based on VINS-Mono, supporting multiple sensor combinations (binocular camera + IMU; monocular camera + IMU; binocular camera only); GPS is used to improve the accuracy of the global path. In pursuit of better performance, wheel speed is used in SLAM system [
14]—the wheel speed sensor and the visual odometer were integrated tightly. In this way, they make least-squares rather than optimization. However, they do not consider the unreliability of wheel speed. When a robot moves on uneven surfaces, or a wheel slip occurs, an incorrect wheel speed measurement will significantly affect the scale accuracy, and can potentially lead to system failure. In [
15], a batch optimization framework with formulated edges was proposed. Moreover, tightly coupled nonlinear optimization methods were used to integrate vision, IMU, and wheel speed, and to perform pose. The error cost function was composed of vision errors, inertial measurement errors, and wheel speed sensor dead reckoning errors. In addition, assuming that the vehicle was moving on an approximate plane, a “soft” plane constraint term was added to the error cost function. While the robot is running with fixed acceleration or going straight, we do not know how long it moves in real time or what the direction is. Therefore, the encoder and soft plane constraints will make the task easier.
According to whether the visual feature points are added to the state vector of the filter or the optimization equation, the visual inertial odometry method based on filtering can be divided into two types: loose coupling and tight coupling. Further, the loose coupling method is divided into two parts: the visual odometer and state estimator. The visual odometer tracks the feature points and calculates the current camera pose, and the state estimator uses the IMU to measure and adjust the camera pose obtained by the visual odometer. As the visual features are invisible to the state estimator, they cannot be adjusted based on the IMU measurement, and the poor position accuracy of the visual features causes the accuracy of the visual odometer to decline. As a result, the correlation between measurements is not fully utilized for pose measurement, thereby reducing the accuracy. In the tight coupling method, the carrier pose and position of the feature point are used as variables to be estimated, and the filter method or optimized method is used for estimation.
Thus far, for the existing visual inertia SLAM algorithm, when the robot is moving at a constant speed, or purely rotating, and encounters scenes with insufficient visual features, the problems of low accuracy and poor robustness arise. Tightly coupled wheel speed information can help solve this problem. However, there are few studies on multi-sensor fusion SLAM for wheeled mobile robots based on vision, inertia, and wheel speed measurements that are tightly coupled and optimized. There is no complete solution similar to ORB-SLAM. Therefore, research on tightly coupled monocular visual odometer combined with wheel speed measurement is significant. Therefore, we use IMU and wheel odometry to make a monocular visual odometry system to gain the scale reconstruction.
The main novelty and contributions of this paper include:
A multi-sensor fusion SLAM algorithm using monocular vision, inertial measurement, and wheel speed measurement is proposed. The wheel speed and IMU angular velocity pre-integration of the wheel odometer can avoid the repeated integration caused by the linearization point change in the iterative optimization process.
Based on the state initialization of the wheel odometer and IMU, the initial state value required by the state estimator can be quickly and reliably calculated in both stationary and moving states.
The method in this paper solves the problem of poor pose estimation accuracy caused by the weak observability of monocular visual inertial SLAM, and further improves the robustness of pose estimation.
2. Multi-Sensor State Estimation Based on Tight Coupling Optimization
2.1. SLAM Based on Multi-Sensor Fusion
The multi-sensor fusion state estimator in this study uses monocular vision, IMU, and wheel odometer measurements based on feature point optical flow tracking [
16]. None of these sensors can measure the absolute pose. Therefore, the multi-sensor fusion state estimator, as an odometer algorithm, has an unavoidable cumulative error. Therefore, we used key frame selection, loop detection [
17], and back-end optimization algorithms based on the VINS-Mono framework and applied them to the multi-sensor fusion state estimator to form a complete SLAM system.
Figure 1 shows a system block diagram of the SLAM method.
In a multi-sensor state estimation process, the main data processing and analysis processes include raw sensor input, data pre-processing (including calibration compensation, time alignment, and pre-fused wheel odometer), state initialization, state estimation problem solving, loopback detection, and backend optimization. The processes of optimal state estimator, wheel odometer pre-integration, and initialization reflect the contributions and innovations of this paper. The rest is our flexible use of existing methods.
Section 2.2,
Section 2.3,
Section 2.4,
Section 2.5,
Section 2.6 and
Section 2.7 analyze the core multi-sensor fusion state estimation problem and the pre-integration process of IMU and wheel speed.
Section 2.8 analyzes the initialization process of the state.
2.2. Variables to Be Estimated and Observation
Considering that the IMU zero offset always exists, when defining the variable
to be estimated, the IMU zero offset of each key frame needs to be included. Therefore, in the k key frame, we have:
For the IMU, is the state, denotes the position, is the attitude (quaternion form), denotes the speed, and and are the accelerometer and gyroscope bias, respectively. is the key frames and is the feature points. Further, is the inverse depth of (the inverse of the Z-axis coordinate).
Additionally, pre-fusion wheel odometer observations are added based on VINS-Mono. The original wheel odometer is defined as:
where {
} is the coordinate of the wheel odometer relative to the starting point and
is the angle of rotation relative to the starting direction. Therefore, the observation
used to constrain the
:
where visual feature point observation
, containing all feature points
observed under the i key frame;
is IMU pre-integration of accelerated speed and angular speed; and
is pre-fusion wheel odometer.
2.3. Optimal Estimation Problem
The maximum posterior estimation and Bayes’ theorem shows:
Here,
is the conditional probability of observing the occurrence of
under the existed state
, which can be obtained by the observation equation and the covariance.
is the prior probability (edge probability) of
. According to formula (3), we have:
We use the least-squares method to find the maximum posterior estimate, and use Mahalanobis distance to measure the difference between the residual and the covariance matrix.
Here,
is the Mahalanobis distance and
is the covariance matrix.
is the prior information from marginalization;
is the visual residual;
is the wheel odometer residual; and
is the IMU residual. In addition, the Huber loss function
[
18] is used to improve the robustness, where
is
2.4. Visual Measurement Constraints
The specific process is as follows: while the feature point
is first observed in the key frame i, it is recorded and tracked. Its spatial pose is defined as a function of the key frame i pose
and the inverse depth
of the feature point. While the feature point
is observed again in the key frame j, a visual residual term is generated. The residual term
represents the error of the feature point
l in the position of j. It is also called the re-projection error, which is a function of the key frame i pose
.
In the formula, represents the rotation matrix from coordinate system to coordinate system; is the position where the feature point l is projected onto the unit ball in the key frame j; is the back projection function, which can project the pixel coordinates into the camera coordinate system is the position projected on the unit ball in the key frame j; to compare the error with , it needs to be transformed into the camera coordinate system Cj of the key frame j; and are the two orthogonal base vectors on the tangent plane of the unit ball and the projection lines with the feature points in the orthogonal direction.
2.5. IMU Constraints
In the visual odometer method based on bundle adjustment, the state of the carrier is optimized and visual measurement is used to constrain. The IMU measurement between frames is added as a constraint on the optimization framework, thereby improving robustness.
2.5.1. IMU Pre-Integration
To reduce the complicated operation caused by reintegration, we used the IMU pre-integration method [
19] to fuse IMU measurements between two consecutive key frames.
2.5.2. Residual Term
The IMU pre-integration processes the IMU measurement for a continuous period based on the given IMU zero offset and obtains the relative pose constraint between the initial and end states of the time period. The IMU pre-integration residual term is defined as:
The random distribution of the residual term conforms to and is obtained by updating the covariance equation. is three-dimensional small perturbation and represents the error term. The IMU pre-integration provides constraints on the variables to be optimized contained in the two key frames before and after. In the process of nonlinear optimization, the essence of the constraint is to provide the direction and gradient of the variable to be optimized by calculating the Jacobian matrix of the residual. As the direction of gravity is obtained during the initialization of the visual inertial odometer, the gravity acceleration is not used as a variable to be optimized.
2.6. Wheeled Odometer Constraints
On ground mobile robots, a wheel speed meter is typically used to perform dead reckoning to obtain continuous relative poses of the robot. The continuous position and reliable scale estimation of the wheel odometer make it suitable for tasks such as path planning and navigation.
2.6.1. Two-Dimensional Wheel Odometer Algorithm
The two-dimensional wheel odometer has an unavoidable cumulative, error but can provide a continuous carrier trajectory. As the wheel speed meter measures the average wheel speed over a period of time, the chassis speed measurement measures the average speed during this time. The position and attitude update methods of the wheel speed odometer include Euler points, median points, and higher-order Runge–Kutta methods. As the sampling speed of the wheel speed meter is high (1 kHz), the Euler integration method is used to reduce the calculation time of the main control microcontroller. This is achieved while assuming that the chassis moves with fixed direction and speed in the original direction during the period and rotates to a new direction at the end of the time period.
The initial state of the wheel odometer is
. Given the previous state
of the wheel odometer, the current chassis speed measurement
, and the time difference
, we can obtain the new wheel odometer state
as:
2.6.2. Wheel Odometer Pre-Integration
In this paper, we propose a pre-integration method for wheel odometer. We use IMU and wheel speed to measure the relative pose between two key frames. The data of the two sensors makes pre-fusion . Thereafter, only the is measured, according to the wheel odometer kinematics equation, and a continuous calculation and integration is made to find the displacement.
The incremental update equation of the wheel odometer is:
where
denotes wheel odometer with noise measurement; the initial state value
.
is the pre-credit term.
The nominal weight of the wheel odometer pre-integration item can be incrementally updated based on the pre-fusion wheel odometer measurement:
The initial value of nominal weight: .
According to the definition of the error amount of the pre-integration term, an incremental update formula of the error amount of the pre-integration term is:
is anti-symmetric matrix of Lie algebra SO (3). The nominal value of the wheel odometer pre-integration term depends on the pre-fusion wheel odometer measurement and the gyroscope zero offset. For the variable to be optimized, the zero bias of the gyroscope needs to be continuously adjusted in the pose measurement process to reduce the residual error. Therefore, in the optimization process, the partial derivatives and of the nominal value of the pre-integral term of the wheel odometer with respect to the zero offset of the gyroscope need to be used.
According to the incremental update of the error value of the pre-integration term of the wheel odometer, the Jacobian matrix of the error value between the two frames before and after can be obtained as:
According to the definition of the nominal value of the pre-integration item of the wheel odometer, the Jacobian matrix of the error value is the nominal Jacobian matrix of
. Therefore, according to the chain-derivation rule,
, the update equation of the nominal value of the pre-integration term of the wheel odometer with respect to the zero-biased Jacobian matrix is:
The initial value of the Jacobian matrix: .
2.6.3. Residual Term
Definition: In the least-squares problem of robot pose measurement, the wheel odometer residual term
represents the error distance between the frame-to-frame relative displacement
and the key frame displacement
in the variable to be optimized, where
is the observation, and
is the estimator.
The wheel odometer residual does not include the errors and with respect to the rotation and gyro zero offset. This is as these terms are already defined in the residual term of the IMU pre-integration. The IMU pre-integration uses the original IMU measurement as the angular velocity input, which provides higher rotational integration accuracy than the wheel odometer pre-integration measured with a lower frequency pre-fused wheel odometer.
To use the variable
xk,
xk+1 to represent the wheel odometer pre-integration residual term,
needs to be transformed:
We obtain the residual term expressed using only the variables to be optimized and the wheel odometer pre-integration:
Here, and are known constants.
As a maximum posterior problem, the robot pose measurement is the same as least-squares by introducing a covariance matrix of the residuals to transform the residuals with dimensions into a unified probability representation. The wheeled odometer residual obeys the covariance matrix of the wheeled odometer pre-integration, . Here, represents the displacement covariance in the wheel odometer pre-integration covariance matrix , .
Regarding the Jacobian matrix, according to the definition of the wheel odometer residual
, in the optimization process, the residual value
will change with the adjustment of the previous key frame poses
and
, and the poses
and
of the next key frame, and the gyroscope zero offset
of the previous frame. To provide the necessary gradient direction for optimization, the system needs to be linearized in the current state
and the ratio between the increment of the residual
and the increment of the variable to be optimized is calculated. Thus, the Jacobian matrix
is defined:
Here, . As the increment is small, using the quaternion definition will produce additional degrees of freedom. The increment for the rotation state in the formula is defined as the shaft angle representation.
As the wheel odometer residual is only related to some variables in the previous key frame state
and next key frame state
, the value of the Jacobian matrix
is:
2.7. Marginalization and Prior Constraints
The state of the key frame and its related observations are constantly removed from the optimization equation. If all observations related to the removed key frames are directly discarded, the constraints of state estimation will be reduced, and the loss of valid information will lead to a decrease in accuracy. Here, a marginalization algorithm is used, while removing the key frames, retaining the removed observations to constrain the optimization variables. According to [
6], the use of the Gauss–Newton method can be understood as adding an increment to the variable to be optimized; the objective function is the smallest.
If the residual function
is linearized at x, and the Jacobian matrix
is obtained, the nonlinear least-squares problem becomes a linear least-squares problem:
Here,
. Taking the derivative of this formula with respect to
be 0, we obtain:
Let ; thus, we obtain the incremental equation , where H is called the Hessian matrix.
Divide the variable x to be optimized into the part
, which needs to be removed, and the part
,
, which needs to be retained, then the incremental equation becomes:
The Schur method is used to eliminate the element to obtain the solution of
:
Intercepting the second row of the above matrix, we obtain:
In the above formula, only is unknown, and no information in H and b is lost. This process removes the rows and columns related to from the incremental equation, marginalizes the state that needs to be removed, and retains the historical observation constraints on the state . When the next image frame arrives, the prior information in the above Formula (31) will be used as a prior constraint term to construct a nonlinear least-squares problem.
2.8. State Initialization
VINS-Mono uses multiple steps to initialize the state: gyroscope zero offset correction, initializing gravity, speed, and scale coefficients, and modifying the direction of gravity. The disadvantage of this method is that it depends on sufficient visual measurement of parallax and sufficient acceleration excitation. When there is no abnormal situation, such as skidding, the wheel odometer has better accuracy and reliability over a short distance and a short duration. Compared with monocular vision, there is no scale uncertainty, and it is easier to initialize the keyframe pose, velocity, and gravity directions.
2.8.1. Gyro Zero Offset Initialization
As the gyroscope and wheel odometer measurements are on the same rigid body, the rotations of the two are the same. The relative rotation between the two key frames can be obtained through IMU pre-integration and wheel odometer pre-integration, respectively:
and
. The rotation term of the pre-integration of the wheeled odometer above is also obtained through the gyro integration and has no reference value. Therefore, during the initialization process, the gyroscope pre-integration will use the heading angle for rotation integration. Rotation term
is a function of the gyroscope bias
. If the error between
and
is used as a constraint, the gyroscope bias
can be estimated. Assuming that the gyro bias
of each key frame during the initialization process is the same
, the construction of the least-squares problem is as follows:
Linearizing the rotation transform at
, we obtain:
Here,
is the partial derivative of the inter-frame rotation
.
is the gyroscope zero bias. The objective function of the least-squares problem is written as:
Considering only the imaginary part of the quaternion, we obtain:
The above Formula (35) conforms to the format of
, and the Cholesky decomposition can be used to find the least-squares solution:
If the robot rotates rapidly during the gyro work offset initialization process, the elasticity of the wheel, the rigid connection between the wheel and the IMU, the misalignment of the wheel odometer clock and the IMU clock, and a calibration error of the wheel odometer rotation scale factor may lead to poor gyro work offset initialization results.
2.8.2. Initialization of Key Frame Speed and Gravity
As the Mecanum wheel will tremble during movement, and the wheeled odometer algorithm can only obtain the heading angle information, it is difficult to obtain accurate relative rotation between key frames through wheeled odometer integration. In the previous step, the zero offset of the gyroscope has been initialized, and the relative rotation between all key frames can be obtained through IMU pre-integration. As the rotation is known, the key frame speed and gravity can be calculated by solving linear equations. Decomposing the position term
and speed term
in the IMU pre-integration and transforming it into the form of matrix multiplication
, we obtain:
Here, is the IMU pre-integration measurement between the key frames and . The variable to be estimated related to the key frames and is defined as , and represents the constraint between the measurement and the . The s in the represents the distance of the wheel odometer with respect to the actual distance, that is, the X-axis and Y-axis scale factors of the wheel odometer . If the IMU excitation is sufficient, it can be used to calibrate the scale factor. is defined here for the reliability of initialization.
To reduce initialization errors and improve reliability, multiple key frame measurements need to be used as constraints to calculate the key frame speed and gravity direction. By combining the multiple linear equations above, we can obtain the least-squares problem:
Here,
x is the variable to be estimated, and
is the optimal estimated value of
x. The program uses Cholesky decomposition to solve the least-squares problem:
In the formula, the matrix H is obtained by inserting all into empty columns at the corresponding positions of the unrelated variables and summing them, and is obtained by combining all .
4. Discussion
Aiming to solve the problems of low accuracy and poor robustness of the visual inertial SLAM algorithm, we designed and implemented a tightly coupled monocular visual inertial pose estimation algorithm that integrates wheel speed information. The state estimator, based on tightly coupled multi-sensor information, is used as the core of the algorithm, and the wheel odometer pre-integration that integrates wheel speed and IMU angular velocity avoids repeated integration in the iterative optimization process. The pose of the robot in the sliding window and the visual feature points are the state to be estimated. Then, the state is initialized based on the wheel odometer and IMU, so that the initial value of the state can be calculated quickly and reliably in both stationary and moving states. Finally, the residual constraint is added to the state estimator, and the optimal robot pose is solved through nonlinear optimization. Compared with the existing SLAM algorithm for many experiments, the SLAM algorithm in this paper can achieve higher pose accuracy and robustness in environments with more extreme situations. Experimental comparison results show that the SLAM algorithm proposed in this paper is feasible and effective.
In future research, our algorithm can be improved in the following ways. First, a motion capture system or D-GPS can be employed to obtain ground truth for quantitative pose comparison analysis. Second multiple monocular cameras with non-common view relationships can be used, which can improve robustness. When some cameras are blocked, the other cameras can still provide reliable visual measurement. Further, based on the robust robot pose measurement, using binocular or depth cameras to build a dense map of the environment can meet the requirements of robot navigation and obstacle avoidance.