CN116957958A - VIO front end improvement method based on inertia prior correction image gray scale - Google Patents
VIO front end improvement method based on inertia prior correction image gray scale
- Publication number: CN116957958A (application CN202310751164.7A)
- Authority: CN (China)
- Prior art keywords: image, similarity, camera, correction, gray scale
- Prior art date: 2023-06-25
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- H04N23/71 — Circuitry for evaluating the brightness variation
- H04N23/76 — Circuitry for compensating brightness variation in the scene by influencing the image signals
- G06T2207/30241 — Trajectory (indexing scheme for image analysis or enhancement)
- G06T2207/30244 — Camera pose (indexing scheme for image analysis or enhancement)
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a VIO front-end improvement method based on inertial prior correction of image gray scale. First, image information acquired by a camera is read in, and feature extraction and tracking are performed on the image. After VIO initialization is complete, feature points with known depth are taken as tracked points, the camera pose change is predicted from the IMU pre-integration result, and the pixel position of each tracked point in the current camera image is obtained as a prior estimate. A structural-similarity metric is computed between the pixel neighborhood of the predicted position and the corresponding neighborhood of the tracked point, and a gray-level correction strategy for the image region is selected according to the result. The predicted position of the tracked point is then used as the initial value in the L-K optical-flow pyramid to obtain more accurately tracked feature-point pairs, and the screened keyframes enter back-end optimization as the visual front-end geometric constraints to produce the pose estimation result. The trajectory accuracy solved by the method is improved by up to 55.7% over the VINS-Mono algorithm.
Description
Technical Field
The invention belongs to the technical field of VIO for vehicle platforms, relates to a positioning method in tunnels, and in particular relates to a VIO front-end improvement method based on inertial prior correction of image gray scale.
Background
Feature-point methods are divided by matching approach into optical-flow matching and feature-descriptor matching. The optical-flow method is more computationally efficient than descriptor matching, but it is more sensitive to illumination change and rapid motion; a severe illumination change can leave the VIO without enough visual geometric constraints in the optimization and cause the system to diverge. Enhancing the images at the VIO visual front end is therefore of great significance.
Many image-enhancement methods focus on improving the visual front end. Classical Retinex-theory algorithms, including single-scale and multi-scale enhancement, decompose the original image into an illumination component and a reflection component and take the reflection component as the enhancement result. The dark-channel-prior defogging algorithm restores a clear image from the dark-channel map, and combining it with guided filtering improves its effect. The most classical and widely used approach is histogram equalization. Gao et al. convert the image into HSV space, process the brightness information in the V channel, enhance the image with a weighted fusion of gamma correction and histogram equalization, and improve the number of extracted feature points and the matching success rate after processing with an improved random sample consensus (RANSAC) algorithm. In 2021, Yin Shengnan et al. improved the automatic color enhancement (ACE) algorithm; their fast ACE algorithm made great progress in both the number of extracted feature points and the number of successful matches. Deep-learning approaches instead adaptively change the camera exposure time and gain to increase the number of high-quality feature points. However, these techniques are often suited only to a single scene and suffer from problems such as high power consumption. Previous image-enhancement strategies process a single frame to improve feature-point quality and quantity, and do not effectively exploit the persistence of feature points across consecutive frames to improve the stability of feature tracking.
Disclosure of Invention
In order to solve the above problems, the invention discloses a VIO front-end improvement method based on inertial prior correction of image gray scale, which uses the IMU prior to predict image feature-point positions and corrects the image gray scale according to the predicted points, making the visual geometric constraints of the VIO front end more robust and reliable.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a VIO front end improvement method based on inertia prior correction image gray scale comprises the following steps:
(1) Reading in image information acquired by a camera and inertial information acquired by an IMU;

Because the IMU acquisition frequency (100 Hz) is higher than the camera acquisition frequency (10 Hz), the VIO system must align the image information with the inertial information by timestamp and package the data;
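By way of illustration, a minimal sketch of this alignment step, assuming a simple list-based buffer interface (the function and field names are illustrative, not part of the patent):

```python
def package_measurements(image_stamps, imu_samples):
    """Package each camera frame with the IMU samples between it and the
    previous frame. imu_samples: list of (t, accel[3], gyro[3]) sorted by t;
    image_stamps: sorted camera timestamps (~10 Hz vs ~100 Hz IMU)."""
    packages, j = [], 0
    for k in range(1, len(image_stamps)):
        t0, t1 = image_stamps[k - 1], image_stamps[k]
        window = []
        while j < len(imu_samples) and imu_samples[j][0] <= t1:
            if imu_samples[j][0] > t0:      # keep samples in (t0, t1]
                window.append(imu_samples[j])
            j += 1
        packages.append({"frame_time": t1, "imu": window})
    return packages
```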
(2) Initializing the odometer according to the IMU pre-integration and image features;

Before the odometer formally operates, the map-point positions and depths must be initialized: the extrinsic rotation parameter is estimated from the inertial integration result and the visual geometric constraint, the gyroscope bias is estimated using the rotation constraint, and the gravity direction, velocity and initial scale are estimated using the translation constraint. The rotation matrix between the world coordinate system and the initial camera coordinate system is then solved, and the trajectory is aligned to the world coordinate system (the heading angle is unobservable).
(3) Predicting map point positions by using the IMU pre-integration result;

The camera pose change between two moments is obtained by preliminary estimation, pre-integrating the IMU over the interval between the two image frames; the corresponding formulas are:

$$p^{w}_{b_{k+1}} = p^{w}_{b_k} + v^{w}_{b_k}\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}(\hat a_t - b_{a_t} - n_a) - g^{w}\right)\mathrm{d}t^2 \tag{1}$$

$$v^{w}_{b_{k+1}} = v^{w}_{b_k} + \int_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}(\hat a_t - b_{a_t} - n_a) - g^{w}\right)\mathrm{d}t \tag{2}$$

$$q^{w}_{b_{k+1}} = q^{w}_{b_k} \otimes \int_{t\in[t_k,t_{k+1}]}\tfrac{1}{2}\,\Omega(\hat\omega_t - b_{\omega_t} - n_\omega)\,q^{b_k}_{t}\,\mathrm{d}t \tag{3}$$

where $p^{w}_{b_{k+1}}$, $v^{w}_{b_{k+1}}$, $q^{w}_{b_{k+1}}$ represent the position, velocity and attitude of the carrier at time k+1 in the world coordinate system; $R^{w}_{t}$ is the rotation matrix at time t in the world coordinate system; $\hat a_t$ and $\hat\omega_t$ are the accelerometer and gyroscope measurements at time t; $b_{a_t}$ and $b_{\omega_t}$ are the accelerometer and gyroscope zero biases at time t; $n_a$, $n_\omega$ are the accelerometer and gyroscope noise; $g^{w}$ is the gravity in the world frame; $\otimes$ denotes quaternion multiplication and $\Omega(\cdot)$ the quaternion update matrix of the angular rate.
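A numerical sketch of this propagation, discretized with first-order Euler integration (the quaternion helpers and variable names are a standard discretization assumed for illustration, not quoted from the patent):

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_to_rot(q):
    """Rotation matrix R_t^w from a unit quaternion [w, x, y, z]."""
    w, x, y, z = q
    return np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                     [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                     [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def propagate(p, v, q, imu_window, b_a, b_w, g_w=np.array([0.0, 0.0, 9.81])):
    """Euler-integrate the IMU samples (t, accel, gyro) between two frames
    to predict position p, velocity v and attitude q at the next frame."""
    t_prev = imu_window[0][0]
    for t, a_m, w_m in imu_window[1:]:
        dt = t - t_prev
        R = quat_to_rot(q)
        a_w = R @ (np.asarray(a_m) - b_a) - g_w   # world-frame acceleration (eqs. 1-2)
        p = p + v * dt + 0.5 * a_w * dt**2        # position update
        v = v + a_w * dt                          # velocity update
        w = np.asarray(w_m) - b_w                 # bias-corrected angular rate
        q = quat_mul(q, np.concatenate(([1.0], 0.5 * w * dt)))  # eq. 3, first order
        q = q / np.linalg.norm(q)
        t_prev = t
    return p, v, q
```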
If there is a feature point of known depth $P_{pre}=[X,Y,Z]^{T}$ in the previous frame image, its pixel location in the current frame can be predicted by:

$$s\,[u,v,1]^{T} = \pi_c\left(R_{pc}\,(R_{pre}\cdot P_{pre} + t_{pre}) + t_{pc}\right) \tag{4}$$

where u, v are the predicted pixel coordinates; s is a scale factor; $\pi_c(\cdot)$ is the camera projection function, including the camera intrinsics and distortion operations; $R_{pre}$ and $t_{pre}$ are the pose estimate of the camera in the world coordinate system at the previous-frame moment; $R_{pc}$ and $t_{pc}$ represent the camera pose change between the two moments.
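A sketch of equation (4) for an undistorted pinhole model (the intrinsic matrix K and the distortion-free projection are illustrative assumptions; a full $\pi_c$ would also apply the lens-distortion model):

```python
import numpy as np

def predict_pixel(P_pre, R_pre, t_pre, R_pc, t_pc, K):
    """Predict a known-depth point's pixel in the current frame (eq. 4)."""
    P_w = R_pre @ P_pre + t_pre      # point expressed via the previous-frame pose
    P_c = R_pc @ P_w + t_pc          # apply the IMU-predicted pose change
    uv_s = K @ P_c                   # pinhole projection (pi_c without distortion)
    return uv_s[:2] / uv_s[2]        # divide by the scale factor s -> (u, v)
```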
(4) Calculating the similarity of neighborhood pixels according to the predicted position and the known position of the map point;

Neighborhood pixel patches of the same size are selected at the corresponding positions of the known-depth feature point and the predicted feature point, the gray-scale, contrast and structural similarity of the two regions are calculated, and an image similarity metric is obtained, with the following formula:
$$\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\lambda} \tag{5}$$

where l(x, y) denotes the gray-scale similarity of x and y; c(x, y) the contrast similarity; s(x, y) the structural similarity; and α, β, λ, the respective weights of the three similarity terms, are typically taken as α = β = λ = 1.
l(x, y), c(x, y) and s(x, y) are calculated as follows:

$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \tag{6}$$

$$c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{7}$$

$$s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \tag{8}$$

where $\mu_x$, $\mu_y$ are the average gray values of the predicted-point and known-point neighborhoods; $\sigma_x$, $\sigma_y$, $\sigma_{xy}$ are the standard deviations and covariance of the two frames' neighborhoods; and $C_1$, $C_2$, $C_3$, the parameters controlling the magnitudes of the three similarities, are set as:

$$C_1 = (k_1 L)^2,\quad C_2 = (k_2 L)^2,\quad C_3 = C_2/2$$

where $k_1 = 0.01$, $k_2 = 0.03$, and L takes the image gray-scale maximum 255. The higher the similarity of the feature-point neighborhood images, the higher the feature-tracking success rate; predicted points with low structural similarity, including low-contrast, texture-poor points, can be screened out by this result. Based on experimental experience, when s(x, y) falls below 0.3 the corresponding predicted point is eliminated and does not participate in the visual geometric constraint.
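A sketch of the SSIM screening on two equal-size gray patches (numpy only; the 0.3 structural-similarity threshold follows the text, while the function interface and the common default $C_3 = C_2/2$ are assumptions):

```python
import numpy as np

def ssim_components(x, y, k1=0.01, k2=0.03, L=255.0):
    """Gray-scale, contrast and structural similarity of two patches (eqs. 6-8)."""
    C1, C2 = (k1 * L) ** 2, (k2 * L) ** 2
    C3 = C2 / 2.0                          # common SSIM default, assumed here
    mu_x, mu_y = x.mean(), y.mean()
    sig_x, sig_y = x.std(), y.std()
    sig_xy = ((x - mu_x) * (y - mu_y)).mean()
    l = (2 * mu_x * mu_y + C1) / (mu_x**2 + mu_y**2 + C1)
    c = (2 * sig_x * sig_y + C2) / (sig_x**2 + sig_y**2 + C2)
    s = (sig_xy + C3) / (sig_x * sig_y + C3)
    return l, c, s

def keep_prediction(patch_pred, patch_known, s_min=0.3):
    """Reject a predicted point whose structural similarity is below 0.3."""
    _, _, s = ssim_components(np.asarray(patch_pred, float),
                              np.asarray(patch_known, float))
    return s >= s_min
```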
(5) After screening suitable feature point pairs, the SSIM result is evaluated and adaptive gray correction is applied to the region around the predicted point in the current image, which can be divided into three cases: a large illumination change, poor illumination conditions, and a contrast change;

The lighting conditions vary greatly:

When the neighborhood-pixel similarity of the predicted point and the known feature point is evaluated, successful tracking generally yields SSIM results whose gray-scale, contrast and structural similarities are all close to 1. To ensure that the information within the neighborhood pixels differs only in gray scale, gamma correction is applied to the image when the contrast and structural similarity results are large but the gray-scale similarity is small. For the large-illumination-change case, the correction result is computed by a gamma correction whose exponent is set from the neighborhood mean gray values, so that the gray-level difference between the two frames is obtained in exponential form. After correction, to avoid a layering artifact at the region edge, the corrected region is value-adjusted by formula (10):
where ∘ denotes elementwise multiplication of corresponding matrix entries; G is a Gaussian matrix whose size matches the feature-point neighborhood, with σ taken as 1/2 of the matrix size and the values multiplied by a gain k so that the maximum value in the matrix is 1; the max and min functions correspond to the two directions of illumination change, the intent being to avoid layering while preserving the brighter (or darker) pixels in the region.
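A sketch of the large-illumination-change branch. The exponent that maps one neighborhood mean onto the other, gamma = log(mu_known/255) / log(mu_pred/255), and the Gaussian-blended max/min adjustment are assumptions consistent with the description above, not formulas quoted from the patent:

```python
import numpy as np

def gaussian_weight(size):
    """Gaussian matrix G (sigma = half the matrix size), scaled by the
    gain k so that its maximum value is 1."""
    sigma = size / 2.0
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    G = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return G / G.max()

def correct_large_change(region, mu_pred, mu_known):
    """Gamma-correct the predicted-point region toward the known patch's
    mean gray level, then Gaussian-blend to avoid edge layering."""
    region = np.asarray(region, dtype=float)
    gamma = np.log(mu_known / 255.0) / np.log(mu_pred / 255.0)  # assumed exponent
    corrected = 255.0 * (region / 255.0) ** gamma
    kG = gaussian_weight(region.shape[0])
    blended = kG * corrected + (1 - kG) * region  # elementwise product, fades at edges
    # max preserves brighter pixels when brightening, min darker ones when darkening
    if mu_known > mu_pred:
        return np.maximum(blended, region)
    return np.minimum(blended, region)
```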
Poor lighting conditions:

During image capture by the camera, if the two images are both too bright or too dark, the adopted gamma correction is expressed by the following formula:
where M represents the median gray value of the corresponding image. Similarly, the corrected region must also be adjusted by formula (10).
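A sketch of the poor-illumination branch, using the common median-to-mid-gray adaptive gamma, gamma = log(0.5) / log(M/255); this specific exponent is an assumption, since the text states only that the exponent is driven by the median M:

```python
import numpy as np

def correct_poor_lighting(image):
    """Median-driven gamma correction: pull an over-dark or over-bright
    image toward mid-gray. The exponent form is assumed, not quoted."""
    img = np.asarray(image, dtype=float)
    M = min(max(float(np.median(img)), 1.0), 254.0)  # median, clamped from 0 and 255
    gamma = np.log(0.5) / np.log(M / 255.0)
    return 255.0 * (img / 255.0) ** gamma
```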
Contrast change condition:

When the illumination conditions of the camera change, the contrast of the acquired image often changes along with the gray level. Therefore, when the contrast similarity of the neighborhood pixels is low but the structural similarity is high, gamma correction is performed according to the contrast; only the contrast-reduction case is corrected, the criterion being whether σ_x is greater than σ_y, and the corresponding gamma correction formula differs slightly:
where c represents the contrast similarity of pixels within the region.
In summary, a gamma-correction strategy is selected according to the different results of the SSIM calculation.
As a further improvement of the invention, in step (1) the image information collected by the camera and the inertial information collected by the IMU are read in; because the IMU acquisition frequency (100 Hz) is higher than the camera acquisition frequency (10 Hz), the VIO system must align the data by timestamp and package it.
As a further improvement of the invention, in step (2) the odometer is initialized from the IMU pre-integration and the image features. Before the odometer formally operates, the map-point positions and depths must be initialized: the extrinsic rotation parameter is estimated from the inertial integration result and the visual geometric constraint, the gyroscope bias is estimated using the rotation constraint, and the gravity direction, velocity and initial scale are estimated using the translation constraint. The rotation matrix between the world coordinate system and the initial camera coordinate system is solved, and the trajectory is aligned to the world coordinate system (the heading angle is unobservable).
As a further improvement of the invention, the method in step (3) preliminarily estimates the camera pose change between two moments by pre-integrating the IMU over the interval between the two image frames, and predicts the map-point positions.
As a further improvement of the invention, in step (4) the neighborhood of the predicted map-point position is selected according to the selection rule, and the neighborhood-pixel similarity between the predicted position and the known position is calculated.
As a further improvement of the invention, in step (5), after suitable feature point pairs are screened, the SSIM result is evaluated and adaptive gray correction is applied to the region around the predicted point in the current image, divided into three cases: a large illumination change, poor illumination conditions, and a contrast change.
The beneficial effects of the invention are as follows:
according to the VIO front end improvement method based on the inertia priori correction image gray scale, the IMU priori predicts the image characteristic point positions, and corrects and screens the image gray scale according to the predicted point SSIM result, so that visual geometric constraint of the VIO front end is more robust and reliable, accurate positioning is achieved, and compared with the VINS-Mono in a data set and actual measurement data, the accuracy of the algorithm estimated track is improved and can reach 55.7% at most.
Drawings
FIG. 1 is a flow chart of gamma correction of the present invention;
FIG. 2 is a prediction point neighborhood selection rule;
FIG. 3 is a graph of gray scale correction effects;
fig. 4 is an experimental facility installation diagram.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be understood as being merely illustrative of the invention and not limiting the scope of the invention.
The invention relates to a VIO front-end improvement method based on inertial prior correction of image gray scale, which comprises the following steps:
(1) Reading in image information acquired by a camera and inertial information acquired by an IMU;

Because the IMU acquisition frequency (100 Hz) is higher than the camera acquisition frequency (10 Hz), the VIO system must align the image information with the inertial information by timestamp and package the data;
(2) Initializing the odometer according to the IMU pre-integration and image features;

Before the odometer formally operates, the map-point positions and depths must be initialized: the extrinsic rotation parameter is estimated from the inertial integration result and the visual geometric constraint, the gyroscope bias is estimated using the rotation constraint, and the gravity direction, velocity and initial scale are estimated using the translation constraint. The rotation matrix between the world coordinate system and the initial camera coordinate system is then solved, and the trajectory is aligned to the world coordinate system (the heading angle is unobservable).
(3) Predicting map point positions by using the IMU pre-integration result;

The camera pose change between two moments is obtained by preliminary estimation, pre-integrating the IMU over the interval between the two image frames; the corresponding formulas are given in (1)–(3) above, where $p^{w}_{b_{k+1}}$, $v^{w}_{b_{k+1}}$, $q^{w}_{b_{k+1}}$ represent the position, velocity and attitude of the carrier at time k+1 in the world coordinate system; $R^{w}_{t}$ is the rotation matrix at time t in the world coordinate system; $\hat a_t$ and $\hat\omega_t$ are the accelerometer and gyroscope measurements at time t; $b_{a_t}$ and $b_{\omega_t}$ are the accelerometer and gyroscope zero biases at time t; $n_a$, $n_\omega$ are the accelerometer and gyroscope noise; and $g^{w}$ is the gravity in the world frame.
If there is a feature point of known depth $P_{pre}=[X,Y,Z]^{T}$ in the previous frame image, its pixel position in the current frame is predicted by:

$$s\,[u,v,1]^{T} = \pi_c\left(R_{pc}\,(R_{pre}\cdot P_{pre} + t_{pre}) + t_{pc}\right) \tag{4}$$

where u, v are the predicted pixel coordinates; s is a scale factor; $\pi_c(\cdot)$ is the camera projection function, including the camera intrinsics and distortion operations; $R_{pre}$ and $t_{pre}$ are the pose estimate of the camera in the world coordinate system at the previous-frame moment; $R_{pc}$ and $t_{pc}$ represent the camera pose change between the two moments.
(4) Calculating the similarity of neighborhood pixels according to the predicted position and the known position of the map point;

The two corresponding neighborhood patches are selected and the neighborhood-pixel similarity is calculated; the image similarity metric is:
$$\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\lambda} \tag{5}$$

where l(x, y) denotes the gray-scale similarity of x and y; c(x, y) the contrast similarity; s(x, y) the structural similarity; and α, β, λ, the respective weights of the three similarity terms, are typically taken as α = β = λ = 1.
l(x, y), c(x, y) and s(x, y) are calculated as in formulas (6)–(8) above, where $\mu_x$, $\mu_y$ are the average gray values of the predicted-point and known-point neighborhoods; $\sigma_x$, $\sigma_y$, $\sigma_{xy}$ are the standard deviations and covariance of the two frames' neighborhoods; and $C_1$, $C_2$, $C_3$ are set as:

$$C_1 = (k_1 L)^2,\quad C_2 = (k_2 L)^2,\quad C_3 = C_2/2$$

where $k_1 = 0.01$, $k_2 = 0.03$, and L takes the image gray-scale maximum 255. The higher the similarity of the feature-point neighborhood images, the higher the feature-tracking success rate; predicted points with low structural similarity, including low-contrast, texture-poor points, can be screened out by this result. When s(x, y) is smaller than a certain value, the structural similarity of the corresponding regions is low and the matching success rate is low. Based on experimental experience, when s(x, y) falls below 0.3 the corresponding predicted point is eliminated and does not participate in the visual geometric constraint.
(5) After screening suitable feature point pairs, the SSIM result is evaluated and adaptive gray correction is applied to the region around the predicted point in the current image, which can be divided into three cases: a large illumination change, poor illumination conditions, and a contrast change;

The lighting conditions vary greatly:

When the neighborhood-pixel similarity of the predicted point and the known feature point is evaluated, successful tracking generally yields SSIM results whose gray-scale, contrast and structural similarities are all close to 1. To ensure that the information within the neighborhood pixels differs only in gray scale, gamma correction is applied to the image when the contrast and structural similarity results are large but the gray-scale similarity is small. For the large-illumination-change case, the correction result is computed by a gamma correction whose exponent is set from the neighborhood mean gray values, so that the gray-level difference between the two frames is obtained in exponential form. After correction, to avoid a layering artifact at the region edge, the corrected region is value-adjusted by formula (10):
where ∘ denotes elementwise multiplication of corresponding matrix entries; G is a Gaussian matrix whose size matches the feature-point neighborhood, with σ taken as 1/2 of the matrix size and the values multiplied by a gain k so that the maximum value in the matrix is 1; the max and min functions correspond to the two directions of illumination change, the intent being to avoid layering while preserving the brighter (or darker) pixels in the region.
Poor lighting conditions:

During image capture by the camera, if the two images are both too bright or too dark, the adopted gamma correction is expressed by the following formula:
where M represents the median gray value of the corresponding image. Similarly, the corrected region must also be adjusted by formula (10).
Contrast change condition:

When the illumination conditions of the camera change, the contrast of the acquired image often changes along with the gray level. Therefore, when the contrast similarity of the neighborhood pixels is low but the structural similarity is high, gamma correction is performed according to the contrast; only the contrast-reduction case is corrected, the criterion being whether σ_x is greater than σ_y, and the corresponding gamma correction formula differs slightly:
where c represents the contrast similarity of pixels within the region.
In summary, a gamma-correction strategy is selected according to the different results of the SSIM calculation.
The technical scheme of the invention is verified as effective and accurate by experiments in real environments, comparing the trajectory accuracy computed by the improved algorithm against the original VINS-Mono algorithm. First, the trajectory-estimation accuracy of the algorithm is evaluated on the TUM dataset; the evaluation strategy is to compare the dataset ground truth with the algorithm's output trajectory using the evo trajectory-accuracy evaluation tool and compute the RMSE. The computer configuration used is an Intel Core i7-11700 2.50 GHz processor with 16 GB of memory.
The other group of tests used field-measured data. The experimental site is the Sipailou campus parking lot of Southeast University in Nanjing, and the trajectory accuracy is evaluated against GNSS/INS integrated navigation positioning results, accurate to within 10 cm, as ground truth. A Hikvision industrial camera is used (model: MV-CA016-10UC, resolution: 1440×1080), and the system runs on an Intel Core i7-11700 2.50 GHz CPU with a GTX 1060 graphics card and 16 GB of memory.
Dataset test results are shown in Table 1 below:

Table 1: Comparison of trajectory-estimation RMSE on the TUM datasets

On the field-measured data, the trajectory-estimation results are shown in Table 2 below:

Table 2: Comparison of trajectory-estimation RMSE on field-measured data
As can be seen from Tables 1 and 2, the trajectory accuracy obtained by the improved algorithm shows a clear improvement over the VINS-Mono algorithm. In the dataset tests, the improvement in trajectory accuracy across the 10 datasets reaches up to 57.9%. On the field-measured data, the trajectory-estimation accuracy of the improved algorithm improves by 31.0%. The test results show that the VIO front-end improvement method based on inertial prior correction of image gray scale improves trajectory-estimation accuracy substantially over the VINS-Mono algorithm, with a maximum improvement of 57.9%.
It should be noted that the foregoing merely illustrates the technical idea of the present invention and is not intended to limit the scope of the present invention, and that a person skilled in the art may make several improvements and modifications without departing from the principles of the present invention, which fall within the scope of the claims of the present invention.
Claims (6)
1. A VIO front-end improvement method based on inertial prior correction of image gray scale, characterized by comprising the following steps:
(1) Reading in image information acquired by a camera and inertial information acquired by an IMU;

the VIO system takes as input the images acquired by the camera and the acceleration and angular-velocity values acquired by the IMU, and finally outputs the system pose;
(2) Initializing the odometer according to the IMU pre-integration and image features;

the image data is used as input; a global structure reconstruction yields the system's visual pose estimate, which is aligned with the IMU data by timestamp; the scale is obtained by joint optimization estimation, completing system initialization;
(3) Predicting map point positions by using the IMU pre-integration result;

the camera pose change between two moments is preliminarily estimated by pre-integrating the IMU over the interval between the two image frames, the corresponding formulas being:

$$p^{w}_{b_{k+1}} = p^{w}_{b_k} + v^{w}_{b_k}\Delta t_k + \iint_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}(\hat a_t - b_{a_t} - n_a) - g^{w}\right)\mathrm{d}t^2 \tag{1}$$

$$v^{w}_{b_{k+1}} = v^{w}_{b_k} + \int_{t\in[t_k,t_{k+1}]}\left(R^{w}_{t}(\hat a_t - b_{a_t} - n_a) - g^{w}\right)\mathrm{d}t \tag{2}$$

$$q^{w}_{b_{k+1}} = q^{w}_{b_k} \otimes \int_{t\in[t_k,t_{k+1}]}\tfrac{1}{2}\,\Omega(\hat\omega_t - b_{\omega_t} - n_\omega)\,q^{b_k}_{t}\,\mathrm{d}t \tag{3}$$

where $p^{w}_{b_{k+1}}$, $v^{w}_{b_{k+1}}$, $q^{w}_{b_{k+1}}$ represent the position, velocity and attitude of the carrier at time k+1 in the world coordinate system; $R^{w}_{t}$ is the rotation matrix at time t in the world coordinate system; $\hat a_t$ and $\hat\omega_t$ are the accelerometer and gyroscope measurements at time t; $b_{a_t}$ and $b_{\omega_t}$ are the accelerometer and gyroscope zero biases at time t; $n_a$, $n_\omega$ are the accelerometer and gyroscope noise; and $g^{w}$ is the gravity in the world frame;

if there is a feature point of known depth $P_{pre}=[X,Y,Z]^{T}$ in the previous frame image, its pixel position in the current frame is predicted by:

$$s\,[u,v,1]^{T} = \pi_c\left(R_{pc}\,(R_{pre}\cdot P_{pre} + t_{pre}) + t_{pc}\right) \tag{4}$$

where u, v are the predicted pixel coordinates; s is a scale factor; $\pi_c(\cdot)$ is the camera projection function, including the camera intrinsics and distortion operations; $R_{pre}$ and $t_{pre}$ are the pose estimate of the camera in the world coordinate system at the previous-frame moment; $R_{pc}$ and $t_{pc}$ represent the camera pose change between the two moments;
(4) Calculating the neighborhood pixel similarity according to the predicted position and the known position of the map point;

selecting neighborhood pixel patches of the same size at the positions of the known-depth feature point and the predicted feature point, and calculating the image similarity of the two regions;

the image similarity metric is:

$$\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\lambda} \tag{5}$$

where l(x, y) denotes the gray-scale similarity of x and y; c(x, y) the contrast similarity; s(x, y) the structural similarity; and α, β, λ (α ≥ 0, β ≥ 0, λ ≥ 0), the respective weights of the three similarity terms, are taken as α = β = λ = 1;
l(x, y), c(x, y) and s(x, y) are calculated as follows:

$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \tag{6}$$

$$c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{7}$$

$$s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \tag{8}$$

where $\mu_x$, $\mu_y$ represent the average gray values of the predicted-point and known-point neighborhoods; $\sigma_x$, $\sigma_y$, $\sigma_{xy}$ represent the standard deviations and covariance of the two frames' neighborhoods; and $C_1$, $C_2$, $C_3$ are set as:

$$C_1 = (k_1 L)^2,\quad C_2 = (k_2 L)^2,\quad C_3 = C_2/2$$

where $k_1 = 0.01$, $k_2 = 0.03$, and L takes the image gray-scale maximum 255; the higher the similarity of the feature-point neighborhood images, the higher the feature-tracking success rate, and predicted points with low structural similarity, including low-contrast, texture-poor points, can be removed according to the result: when s(x, y) is smaller than a certain value, the structural similarity of the corresponding regions is low and the matching success rate is low; based on experimental experience, the corresponding predicted point is eliminated when s(x, y) is below 0.3 and does not participate in the visual geometric constraint;
(5) After screening suitable feature point pairs, evaluating the SSIM result and applying adaptive gray correction to the region around the predicted point in the current image, handling three cases respectively: a large illumination change, poor illumination conditions, and a contrast change;

the large-illumination-change case:

when the neighborhood-pixel similarity of the predicted point and the known feature point is evaluated, successful tracking generally yields SSIM results whose gray-scale, contrast and structural similarities are all close to 1; to ensure that the information within the neighborhood pixels differs only in gray scale, gamma correction is applied to the image when the contrast and structural similarity results are large but the gray-scale similarity is small; for the large-illumination-change case, the correction result is computed by a gamma correction whose exponent is set from the neighborhood mean gray values, so that the gray-level difference between the two frames is obtained in exponential form; after correction, to avoid a layering artifact at the region edge, the corrected region is value-adjusted by formula (10):
where ∘ denotes elementwise multiplication of corresponding matrix entries; G is a Gaussian matrix whose size matches the feature-point neighborhood, with σ taken as 1/2 of the matrix size and the values multiplied by a gain k so that the maximum value in the matrix is 1; the max and min functions correspond to the two directions of illumination change, the intent being to avoid layering while preserving the brighter (or darker) pixels in the region;

the poor-illumination case:

during image capture by the camera, if the two images are both too bright or too dark, the adopted gamma correction is expressed by the following formula:

where M represents the median gray value of the corresponding image; similarly, the corrected region must also be adjusted by formula (10);

the contrast-change case:

when the illumination conditions of the camera change, the contrast of the acquired image often changes along with the gray level; therefore, when the contrast similarity of the neighborhood pixels is low but the structural similarity is high, gamma correction is performed according to the contrast, but only the contrast-reduction case is corrected; the criterion is whether $\sigma_x$ is greater than $\sigma_y$, and the corresponding gamma correction formula is:
where c represents the contrast similarity of pixels within the region.
2. The VIO front-end improvement method based on inertial prior correction of image gray scale according to claim 1, wherein in step (1) the image information collected by the camera and the inertial information collected by the IMU are read in; because the IMU collection frequency (100 Hz) is higher than the camera collection frequency (10 Hz), the data must be timestamp-aligned and packaged.
3. The VIO front-end improvement method based on inertial prior correction of image gray scale according to claim 1, wherein in step (2) the odometer is initialized according to the IMU pre-integration and image features; the map-point position and depth information must be initialized before the odometer formally operates; the extrinsic rotation parameter is estimated from the inertial integration result and the visual geometric constraint, the gyroscope bias is estimated using the rotation constraint, and the gravity direction, velocity and initial scale are estimated using the translation constraint; the pitch and roll angles between the world coordinate system and the initial camera coordinate system are solved from the gravity direction, converted into a rotation matrix, and the trajectory is aligned to the world coordinate system.
4. The VIO front-end improvement method based on inertial prior correction of image gray scale according to claim 1, wherein the method in step (3) obtains the camera pose change between two moments by pre-integrating the IMU data over the time range of the two image frames, and predicts, according to the system pose change, the pixel position in the next frame of a feature point of known depth in the current frame.
5. The VIO front-end improvement method based on inertial prior correction of image gray scale according to claim 4, wherein in step (4) the neighborhood-image similarity metric SSIM is calculated from the map point's predicted pixels and known pixels, obtaining the gray-scale, contrast and structural similarity results.
6. The VIO front-end improvement method based on inertial prior correction of image gray scale according to claim 5, wherein in step (5), after suitable feature point pairs are screened, the SSIM result is evaluated and divided into three cases according to the different results: a large illumination change, poor illumination conditions, and a contrast change; different strategies are selected according to the result to process the image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310751164.7A CN116957958A (en) | 2023-06-25 | 2023-06-25 | VIO front end improvement method based on inertia prior correction image gray scale |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310751164.7A CN116957958A (en) | 2023-06-25 | 2023-06-25 | VIO front end improvement method based on inertia prior correction image gray scale |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116957958A true CN116957958A (en) | 2023-10-27 |
Family
ID=88455625
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310751164.7A Withdrawn CN116957958A (en) | 2023-06-25 | 2023-06-25 | VIO front end improvement method based on inertia prior correction image gray scale |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116957958A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118279187A (en) * | 2024-02-01 | 2024-07-02 | 海南大学 | Information filling defogging method and system for foggy image |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112862768A (en) * | 2021-01-28 | 2021-05-28 | 重庆邮电大学 | Adaptive monocular VIO (visual image analysis) initialization method based on point-line characteristics |
CN116205947A (en) * | 2023-01-03 | 2023-06-02 | 哈尔滨工业大学 | Binocular-inertial fusion pose estimation method based on camera motion state, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
XU Jinle et al., "VIO front-end improvement method based on inertial prior correction of image gray scale" (基于惯性先验校正图像灰度的VIO前端改良方法), GNSS World of China (全球定位系统), vol. 48, no. 3, 15 June 2023, pages 1 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118279187A (en) * | 2024-02-01 | 2024-07-02 | 海南大学 | Information filling defogging method and system for foggy image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108492316A (en) | A kind of localization method and device of terminal | |
CN106780576A (en) | A kind of camera position and orientation estimation method towards RGBD data flows | |
CN111798485B (en) | Event camera optical flow estimation method and system enhanced by IMU | |
CN109325444B (en) | Monocular texture-free three-dimensional object posture tracking method based on three-dimensional geometric model | |
CN112288726B (en) | Method for detecting foreign matters on belt surface of underground belt conveyor | |
CN105913455A (en) | Local image enhancement-based object tracking method | |
CN116883588A (en) | Method and system for quickly reconstructing three-dimensional point cloud under large scene | |
Zhao et al. | A robust stereo feature-aided semi-direct SLAM system | |
CN111369570A (en) | Multi-target detection tracking method for video image | |
CN111899345B (en) | Three-dimensional reconstruction method based on 2D visual image | |
CN116957958A (en) | VIO front end improvement method based on inertia prior correction image gray scale | |
CN116643291A (en) | SLAM method for removing dynamic targets by combining vision and laser radar | |
CN115861352A (en) | Monocular vision, IMU and laser radar data fusion and edge extraction method | |
CN106023256B (en) | State observation method towards augmented reality auxiliary maintaining System planes intended particle filter tracking | |
CN112417948B (en) | Method for accurately guiding lead-in ring of underwater vehicle based on monocular vision | |
CN116740332B (en) | Method for positioning center and measuring angle of space target component on satellite based on region detection | |
CN111089586B (en) | All-day star sensor star point extraction method based on multi-frame accumulation algorithm | |
CN110599407B (en) | Human body noise reduction method and system based on multiple TOF cameras in downward inclination angle direction | |
CN103473753A (en) | Target detection method based on multi-scale wavelet threshold denoising | |
CN117870659A (en) | Visual inertial integrated navigation algorithm based on dotted line characteristics | |
CN113723432B (en) | Intelligent identification and positioning tracking method and system based on deep learning | |
CN116665097A (en) | Self-adaptive target tracking method combining context awareness | |
CN110047103A (en) | Mixed and disorderly background is removed from image to carry out object detection | |
CN115615421A (en) | Unmanned aerial vehicle positioning method and system based on binocular vision inertial odometer | |
CN114283167A (en) | Cleaning area detection method based on vision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | | Application publication date: 20231027 |