CN110378997A - Dynamic scene mapping and localization method based on ORB-SLAM2
- Publication number: CN110378997A
- Application number: CN201910481714.1A
- Authority: CN (China)
- Prior art keywords: pixel, dynamic, keyframe, voxel, pose
- Legal status: Granted
Classifications
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05 — Geographic models
Description
Technical Field
The invention relates to the technical field of robot simultaneous localization and mapping (SLAM), and in particular to a dynamic scene mapping and localization method based on ORB-SLAM2.
Background
SLAM (Simultaneous Localization and Mapping) has long been a hot topic in computer vision and robotics and has attracted the attention of many high-tech companies. SLAM builds a map of an unknown environment while localizing within that map in real time. Modern visual SLAM frameworks such as ORB-SLAM2 and LSD-SLAM are very mature, and state-of-the-art visual SLAM (V-SLAM) systems achieve high-precision localization; however, most of these systems assume that the operating environment is static, which limits their application.
Existing algorithms for building static maps in dynamic scenes each have shortcomings. DynaSLAM does not run in real time, and it removes only the pixels of predefined object classes, so it cannot remove undefined moving objects or parts of predefined objects that its detector misses. The StaticFusion system cannot remove a person who remains still for a long time. None of these algorithms can build a clean static map in real time in a complex dynamic environment.
Summary of the Invention
In view of the above problems in the prior art, the object of the present invention is to provide a dynamic scene mapping and localization method based on ORB-SLAM2. The method removes dynamic pixels: by combining an object detection method with the depth image of each new keyframe, it quickly detects moving objects in the camera's image information and builds a clean static-background octree map in complex dynamic environments.
To achieve the above task, the present invention adopts the following technical solution:

A dynamic scene mapping and localization method based on ORB-SLAM2, comprising the following steps:
Step 1, local map tracking

Initialize the camera pose using the image information captured by the camera carried by the robot; during initialization, the first frame captured by the camera is taken as a keyframe. After the initial pose is obtained, track the local map to optimize the camera pose and generate new keyframes.

Step 2, dynamic pixel removal

Use an object detection algorithm to detect predefined dynamic objects in the color image of the new keyframe, then combine the depth image of the new keyframe to identify further dynamic pixels; the dynamic pixels detected by either method are removed.

Step 3, sparse mapping

For keyframes whose dynamic pixels have been removed, optimize the robot pose of the keyframe, add new map points, and maintain the quality and size of the keyframe set.

Step 4, loop closure detection

Perform loop closure detection on each new keyframe; once a loop closure is detected, perform pose graph optimization.

Step 5, octree map construction

Use an octree to divide the map points into voxels and store these voxels in an octree structure to build the octree map; whether a voxel is occupied is judged by computing its occupancy probability, and occupied voxels are visualized in the octree map.
Further, tracking the local map to optimize the camera pose and generate new keyframes includes:

The local map refers to the 3D points observed by keyframes whose distance and viewing angle are close to the current frame. More matched 3D points are obtained by reprojection, and the camera pose is optimized by minimizing the error, after which new keyframes are generated:

Project the 3D points of the local map onto the current frame to obtain 3D-2D feature matches;

Restrict the region in which 2D matching points are searched in the current frame to reduce mismatches; then construct a least-squares problem from the error between each pixel in the current frame and the position obtained by projecting the corresponding 3D point with the currently estimated camera pose, minimize it, and find the best camera pose for localization;

Decide whether to generate a new keyframe according to preset conditions.
Further, identifying dynamic pixels by combining the depth image of the new keyframe in step 2 includes:

Project the pixels remaining after the object detection algorithm has removed the predefined objects into world coordinates to create 3D points; divide the 3D points into several clusters, and select M pixels uniformly from each cluster. For each such pixel, project it into the N keyframes nearest to the new keyframe and compare, to detect whether the pixel is dynamic:

Back-project pixel u to the 3D point p_w in world coordinates using the depth from the keyframe's depth image and the robot pose of the new keyframe;

Project the 3D point p_w onto the color image of the j-th keyframe near the new keyframe;

If the pixel u' of the j-th keyframe has a valid depth value z' in the corresponding depth image, back-project u' to the 3D point p_w' in world coordinates;

Judge whether pixel u is dynamic by comparing the distance d between p_w' and p_w with a set threshold d_mth:
Search the pixels within a square around u' so that d takes its minimum value d_min; if d_min is greater than the threshold d_mth, pixel u is preliminarily judged to be dynamic, otherwise it is preliminarily judged to be static.
Further, suppose that among the preliminary judgment results of pixel u over all nearby keyframes, the number of static results is N_s, the number of dynamic results is N_d, and the number of invalid results is N_i. The final attribute of pixel u is determined as follows:

If (N_s ≥ N_d, N_s > 0), pixel u is a static pixel;

If (N_d > N_s, N_d > 0), pixel u is a dynamic pixel;

If (N_s = N_d = 0), pixel u is invalid.
Further, the dynamic pixels detected successively by these two methods are removed; after dynamic pixels have been identified by combining the depth image of the new keyframe, the removal method is:

Among the M pixels uniformly selected from a cluster, suppose the number of static pixels is N_s', the number of dynamic pixels is N_d', and the number of invalid pixels is N_i'. The final attribute of the cluster is determined as follows:

If (N_s' ≥ N_d'), the cluster is a static cluster and is retained;

If (N_d' > N_s'), the cluster is a dynamic cluster and is removed.
Further, optimizing the robot pose of the keyframe, adding new map points, and maintaining the quality and size of the keyframe set includes:

Compute the BoW vector of the current keyframe and update the map points of the current keyframe;

Apply sliding-window local BA to optimize the robot pose, where the object of the optimization is the pose of the current frame;

Detect and remove redundant keyframes: if 90% of the pixels of a keyframe can be observed by more than three other keyframes, the keyframe is considered redundant and deleted.
Further, judging whether a voxel is occupied by computing its occupancy probability, and visualizing occupied voxels in the octree map, includes:

Let Z_t denote the observation of voxel n at time t, and let L(n|Z_{1:t}) denote the log-odds of voxel n given the observations up to time t; then at time t+1 the log-odds of voxel n is obtained from:

L(n|Z_{1:t+1}) = L(n|Z_{1:t}) + L(n|Z_{t+1})   (Equation 8)

If voxel n is observed as occupied at time t, L(n|Z_t) is τ, otherwise it is 0; the increment τ is a predetermined value.

Define p ∈ [0,1] as the probability that the voxel is occupied, and l ∈ R as the log-odds value; l is computed from p by the logit transform:

l = log( p / (1 − p) )   (Equation 9)

The inverse of this transform is:

p = 1 / (1 + exp(−l))   (Equation 10)

The occupancy probability p of a voxel is computed by substituting its log-odds value into Equation 10; only when the occupancy probability p is greater than a predetermined threshold is voxel n considered occupied and visualized in the octree map.
The present invention has the following technical features:

1. Speed

The algorithm uses the CornerNet-Squeeze object detection algorithm to detect dynamic objects and the Mini Batch K-Means variant of k-means to cluster the depth information of the image, and is faster than existing algorithms. CornerNet-Squeeze needs only 34 ms to process one image, faster than algorithms such as YOLOv3 (test hardware: 1080 Ti GPU + Intel Core i7-7700K CPU). For clustering data sets larger than ten thousand points, Mini Batch K-Means is about three times faster than k-means itself, with little difference in quality.

In addition, the octree map structure also shortens map update time.

2. A very clean static map can be built in complex environments

The algorithm combines the object detection method with the log-odds voxel update of the octree map to detect and remove dynamic pixels.
Brief Description of the Drawings

Figure 1 is the flowchart of the method of the present invention;

Figure 2 shows the local BA optimization process;

Figure 3 shows the octree map model.
Detailed Description

ORB-SLAM2 is a complete SLAM solution for monocular, stereo, and RGB-D cameras. It supports map reuse, loop closure detection, and relocalization; however, it assumes that the operating environment is static, which limits its application.

The algorithm of the present invention builds on the ORB-SLAM2 algorithm and can build a clean static octree map in a dynamic environment quickly and in real time. It consists of five main steps: local map tracking, dynamic pixel removal, sparse mapping, loop closure detection, and octree map construction; the overall flowchart is shown in Figure 1. The details are as follows:
Step 1, local map tracking

Initialize the camera pose using the image information captured by the camera carried by the robot; during initialization, the first frame captured by the camera is taken as a keyframe. After the initial pose is obtained, track the local map to optimize the camera pose and generate new keyframes. The specific steps are as follows:

Step 1.1: Extract and match ORB feature points on the raw RGB-D image information captured by a Kinect2 (a color image and a depth image), then track and initialize the camera pose (i.e., localize) through three modes in turn: the motion model, the keyframe model, and the relocalization model. During initialization the first frame is set as a keyframe. This step 1.1 is the same as in ORB-SLAM2.
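As a concrete illustration of step 1.1, the sketch below extracts and matches ORB features between two frames with OpenCV. It is a minimal example under stated assumptions, not the patent's implementation; the file names and the nfeatures value are placeholders.

```python
# Minimal ORB feature extraction and matching with OpenCV, as used for
# frame-to-frame tracking in step 1.1. File names and nfeatures are placeholders.
import cv2

img1 = cv2.imread('frame_prev.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('frame_curr.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming-distance brute-force matching with cross-checking to reject
# one-sided matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f'{len(matches)} ORB matches')
```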
Step 1.2: After the initial pose is obtained, track the local map. The local map refers to the 3D points observed by keyframes (local keyframes) whose distance and viewing angle are close to the current frame (the image the camera has just captured), with the distance range set to 4 m and the angle range to 1 rad. More matched 3D points are obtained by reprojection, and the camera pose is optimized by minimizing the error:

(1) Definitions:

The transformation matrix from the camera coordinate system c to the robot coordinate system r: T_rc.

The transformation matrix from the robot coordinate system r to the world coordinate system w (i.e., the robot pose): T_wr.

The projection of the 3D point P_c associated with an image C to the 2D point u on that image:

u = π(P_c)   (Equation 1)

The back-projection of a 2D point u on an image C, combined with its corresponding depth z, to the 3D point P_c of that image:

P_c = π⁻¹(u, z)   (Equation 2)
(2) Feature matching by reprojection:

Let the robot pose (i.e., the pose of the current frame) be T_wr. Projecting the 3D points P_i^w of the local map onto the current frame yields 3D-2D feature matches.

(3) Camera pose optimization:

In dynamic scenes, feature matching produces a large number of mismatches. To address this, the present invention restricts the region in which 2D matching points (pixels) are searched in the current frame to a circle with a radius of 3 pixels, reducing mismatches. The pixel u_i in the current frame is then compared with the position u_i' obtained by projecting the corresponding 3D point with the currently estimated camera pose,

u_i' = π(T_rc⁻¹ T_wr⁻¹ P_i^w)   (Equation 3)

and the resulting errors are assembled into a least-squares problem that is minimized to find the best camera pose for localization (a code sketch of this optimization is given after step (4) below):

T_wr* = argmin_{T_wr} Σ_i ‖u_i − u_i'‖²
(4) Finally, decide whether to generate a new keyframe according to preset conditions; these conditions are the same as in the ORB-SLAM2 algorithm.
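The sketch below illustrates the reprojection-error minimization of step (3), parameterizing the pose as a rotation vector plus translation and solving with a robust least-squares routine. The intrinsics, the Huber scale, and all variable names are illustrative assumptions, not values from the patent.

```python
# Sketch of camera pose refinement by minimizing reprojection error (Eq. 3).
# Intrinsics and loss scale are assumed example values.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5  # assumed pinhole intrinsics

def project(points_c):
    """Pinhole projection pi(P_c) of Nx3 camera-frame points (Eq. 1)."""
    z = points_c[:, 2]
    return np.stack([fx * points_c[:, 0] / z + cx,
                     fy * points_c[:, 1] / z + cy], axis=1)

def residuals(pose, points_w, pixels):
    """Reprojection errors u_i - u_i' for pose = [rotvec(3), t(3)] (world->camera)."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    points_c = points_w @ R.T + pose[3:]
    return (project(points_c) - pixels).ravel()

def refine_pose(pose0, points_w, pixels):
    """Robust least squares; the Huber loss down-weights remaining mismatches."""
    return least_squares(residuals, pose0, args=(points_w, pixels),
                         loss='huber', f_scale=2.0).x
```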
Step 2, dynamic pixel removal

When building a static map in a dynamic scene, the identification and deletion of dynamic pixels is the most critical part. Because only keyframes are used to construct the octree map, dynamic pixel removal is applied only to the keyframe newly selected in the previous step (the new keyframe).

This step first uses the object detection method to detect predefined dynamic objects in the color image of the new keyframe, then combines the depth image of the new keyframe to identify further dynamic pixels; the dynamic pixels detected by either method are removed. The steps are as follows:

Step 2.1: For predefined objects such as people, tables, and chairs, this scheme uses the CornerNet-Squeeze object detection algorithm from CornerNet-Lite to detect predefined dynamic objects in the color image of the new keyframe; if a dynamic object is detected, its pixels are removed.

The CornerNet-Squeeze object detection algorithm needs only 34 ms to process one image, faster than algorithms such as YOLOv3 (test hardware: 1080 Ti GPU + Intel Core i7-7700K CPU).

Step 2.2: Some undefined dynamic objects, such as books and boxes, or parts of predefined objects that the object detector misses, such as a person's hand, may remain. This scheme detects such dynamic pixels on the color image processed in the previous step, combined with the depth image of the new keyframe, as follows:
2.2.1 Project the pixels remaining after the object detection algorithm has removed the predefined objects into world coordinates to create 3D points.

2.2.2 Divide the 3D points into several clusters using a clustering algorithm

Divide the 3D points into several clusters; the number of clusters k is determined by the number of 3D points s_p: k = s_p / n_pt, where n_pt is the average number of points per cluster to be tuned and s_p is the size of the point cloud. M pixels are then selected uniformly from each cluster.

Because the number of pixels is very large and clustering must be as fast as possible, this scheme uses Mini Batch K-Means, a variant of K-means. Reducing n_pt gives a better approximation but also increases the computational burden; this scheme sets n_pt to 6000 to balance computational cost and accuracy.

Because this scheme focuses on removing dynamic pixels and building a static map rather than tracking dynamic objects, each cluster is assumed to be a rigid body, which means that pixels in the same cluster share the same motion attributes. The scheme therefore only needs to detect which clusters are dynamic; to speed up dynamic cluster detection, M = 100 pixels are selected uniformly from each cluster.

In the subsequent steps, the dynamic and static attributes of the selected pixels are judged; if a cluster contains more dynamic pixels than static pixels, it is determined to be dynamic and removed, otherwise it is determined to be static and retained.
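The clustering and per-cluster sampling of steps 2.2.1-2.2.2 can be sketched as below with scikit-learn's MiniBatchKMeans. The array name points_w and the random seed are illustrative assumptions; n_pt = 6000 and M = 100 follow the values given in the text.

```python
# Sketch of depth-point clustering (k = s_p / n_pt) and uniform per-cluster
# sampling of M pixels, using Mini Batch K-Means as described above.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def cluster_and_sample(points_w, n_pt=6000, M=100, seed=0):
    rng = np.random.default_rng(seed)
    k = max(1, len(points_w) // n_pt)                  # k = s_p / n_pt
    labels = MiniBatchKMeans(n_clusters=k, random_state=seed).fit_predict(points_w)
    samples = {}
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        # Uniformly pick M point indices per cluster for the dynamic/static vote.
        samples[c] = rng.choice(idx, size=min(M, len(idx)), replace=False)
    return labels, samples
```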
2.2.4 Judging whether a pixel is dynamic

This scheme detects whether a pixel is dynamic by projecting each of the M pixels selected in step 2.2.2 into the N = 6 keyframes nearest to the new keyframe (the nearby keyframes) and comparing. The specific steps are as follows:

(1) Use the depth z from the depth image of the new keyframe n and its robot pose T_wr^n to back-project pixel u to the 3D point p_w in world coordinates:

p_w = T_wr^n T_rc π⁻¹(u, z)   (Equation 4)

(2) Project the 3D point p_w onto the color image of the j-th keyframe near the new keyframe:

u' = π(T_rc⁻¹ (T_wr^j)⁻¹ p_w)   (Equation 5)

where T_wr^j is the robot pose of the j-th nearby keyframe.

(3) If the pixel u' of the j-th keyframe has a valid depth value z' in the corresponding depth image, back-project u' to the 3D point p_w' in world coordinates:

p_w' = T_wr^j T_rc π⁻¹(u', z')   (Equation 6)

(4) Judge whether pixel u is dynamic by comparing the distance d between p_w' and p_w with a set threshold d_mth:

d = ‖p_w' − p_w‖   (Equation 7)

Because both the depth images and the poses of the keyframes contain errors, u' may not be the pixel that actually corresponds to u. The scheme therefore searches the pixels within a square around u' (the side length S is set empirically to 10 pixels) for the one that minimizes d, giving d_min. If d_min is greater than the threshold d_mth (d_mth is set to grow linearly with the depth value z'), pixel u is preliminarily judged to be dynamic; otherwise it is preliminarily judged to be static. In the remaining cases, either no valid depth value is found in the square search region or u' falls outside the image; pixel u is then judged to be invalid.
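A self-contained sketch of this per-keyframe check follows, assuming the camera frame coincides with the robot frame (so T_rc is the identity) and 4x4 homogeneous pose matrices. The intrinsics and the slope of the depth-dependent threshold d_mth are illustrative assumptions; the patent only states that d_mth grows linearly with z'.

```python
# Sketch of the preliminary dynamic/static/invalid vote of pixel u against one
# nearby keyframe j (Eq. 4-7), with T_rc assumed to be the identity.
import numpy as np

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5  # assumed pinhole intrinsics
S = 10                                       # square search window, pixels

def back_project(u, z, T_c2w):
    """Eq. 4/6: pixel (col, row) plus depth -> 3D point in world coordinates."""
    p_c = np.array([(u[0] - cx) * z / fx, (u[1] - cy) * z / fy, z, 1.0])
    return (T_c2w @ p_c)[:3]

def project(p_w, T_w2c):
    """Eq. 5: world point -> pixel in a keyframe with world-to-camera pose T_w2c."""
    p_c = T_w2c @ np.append(p_w, 1.0)
    return np.array([fx * p_c[0] / p_c[2] + cx, fy * p_c[1] / p_c[2] + cy])

def check_pixel(u, z, T_new_c2w, T_j_w2c, T_j_c2w, depth_j, slope=0.025):
    p_w = back_project(u, z, T_new_c2w)
    col, row = np.round(project(p_w, T_j_w2c)).astype(int)
    h, w = depth_j.shape
    if not (0 <= col < w and 0 <= row < h):
        return 'invalid'                     # u' falls outside the image
    d_min, z_best = np.inf, None
    for r in range(max(0, row - S // 2), min(h, row + S // 2 + 1)):
        for c in range(max(0, col - S // 2), min(w, col + S // 2 + 1)):
            z2 = depth_j[r, c]
            if z2 <= 0:                      # no valid depth in this cell
                continue
            d = np.linalg.norm(back_project((c, r), z2, T_j_c2w) - p_w)  # Eq. 7
            if d < d_min:
                d_min, z_best = d, z2
    if z_best is None:
        return 'invalid'                     # no valid depth in the whole window
    d_mth = slope * z_best                   # threshold grows linearly with z'
    return 'dynamic' if d_min > d_mth else 'static'
```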
Since the result from a single keyframe is not reliable enough and may be invalid, this scheme applies the above preliminary judgment steps (1)-(4) to all nearby keyframes of the new keyframe (6 keyframes are chosen in this embodiment). Finally, the status of pixel u is decided by voting: suppose that among the preliminary judgment results of pixel u over all nearby keyframes, the number of static results is N_s, the number of dynamic results is N_d, and the number of invalid results is N_i. The final attribute of pixel u is determined as follows:

If (N_s ≥ N_d, N_s > 0), pixel u is a static pixel;

If (N_d > N_s, N_d > 0), pixel u is a dynamic pixel;

If (N_s = N_d = 0), pixel u is invalid.
2.2.5 Judging whether a cluster is dynamic

This step uses the same voting method as the previous step to determine the attribute of a cluster. Among the M pixels uniformly selected from a cluster, suppose the number of static pixels is N_s', the number of dynamic pixels is N_d', and the number of invalid pixels is N_i'. The final attribute of the cluster is determined as follows (a compact sketch of both votes follows the rules):

If (N_s' ≥ N_d'), the cluster is a static cluster and is retained;

If (N_d' > N_s'), the cluster is a dynamic cluster and is removed.
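Both voting rules can be sketched compactly as below; ties are resolved in favor of static, matching the order of the rules above.

```python
# Sketch of the two majority votes: per-pixel over the N nearby keyframes
# (step 2.2.4) and per-cluster over its M sampled pixels (step 2.2.5).
from collections import Counter

def vote_pixel(results):
    """results: 'static' / 'dynamic' / 'invalid' votes from the nearby keyframes."""
    c = Counter(results)
    if c['static'] == 0 and c['dynamic'] == 0:
        return 'invalid'
    return 'static' if c['static'] >= c['dynamic'] else 'dynamic'

def vote_cluster(pixel_attrs):
    """pixel_attrs: final attributes of the M sampled pixels of one cluster."""
    c = Counter(pixel_attrs)
    return 'static' if c['static'] >= c['dynamic'] else 'dynamic'
```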
Step 3, sparse mapping

The main purpose of sparse mapping is to receive and process the keyframes whose dynamic pixels have been removed, optimize the robot pose of each such keyframe, add new map points, and maintain the quality and size of the keyframe set. The specific steps are as follows:

Step 3.1, processing newly introduced keyframes

Compute the BoW vector of the current keyframe and update the map points of the current keyframe.

Step 3.2, local BA

Apply sliding-window local BA to optimize the robot pose. The optimization framework is shown in Figure 2; the object of the optimization is the pose of the current frame, and the following participate in the optimization:

(1) The poses of all keyframes connected to the current keyframe within the sliding window; to balance time and accuracy, this scheme sets the window size n to 7;

(2) The two black map points (as drawn in Figure 2) created before the keyframes in the sliding window;

(3) The two white map points created after the keyframes in the sliding window; these white map points are not treated as variables and are kept fixed. After local BA, the pose of the new keyframe is optimized and new map points are created; this new keyframe is then used to build the octree map.

Step 3.3, local keyframe culling

To control the reconstruction density and the complexity of BA optimization, this step also detects and removes redundant keyframes. By reprojection, if 90% of the pixels of a keyframe can be observed by more than three other keyframes, the keyframe is considered redundant and deleted (a sketch of this rule follows).
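A minimal sketch of the redundancy rule, assuming hypothetical kf.map_points and mp.observers containers that record which keyframes observe each map point:

```python
# Sketch of redundant keyframe detection: cull a keyframe when 90% of its
# observations are also seen by more than three other keyframes.
# kf.map_points and mp.observers are assumed (hypothetical) data structures.
def is_redundant(kf, min_other_observers=3, ratio=0.9):
    if not kf.map_points:
        return False
    covered = sum(
        1 for mp in kf.map_points
        if sum(1 for other in mp.observers if other is not kf) > min_other_observers
    )
    return covered >= ratio * len(kf.map_points)
```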
Step 4, loop closure detection

The bag-of-words model DBoW2 is used to perform loop closure detection on each new keyframe. Once a loop closure is detected, pose graph optimization is performed. This process is the same as in the ORB-SLAM2 algorithm; the specific pose graph optimization procedure is prior art and is not described further here.

Step 5, octree map construction

Use an octree to divide the map points created and optimized above into voxels (small cubes) and store these voxels in an octree structure to build the octree map; whether a voxel is occupied is judged by computing its occupancy probability, and occupied voxels are visualized in the octree map.

As shown in Figure 3, building the octree map means continuously updating whether these voxels are occupied. The octree map represents occupancy in probabilistic form, rather than marking a map point only as 0 for free or 1 for occupied; this is described below using log-odds values. The specific steps are as follows:
Step 5.1: Let Z_t denote the observation of voxel n at time t (obtained by reprojection), and let L(n|Z_{1:t}) denote the log-odds of voxel n given the observations up to time t. Then at time t+1 the log-odds of voxel n is obtained from:

L(n|Z_{1:t+1}) = L(n|Z_{1:t}) + L(n|Z_{t+1})   (Equation 8)

If voxel n is observed as occupied at time t, L(n|Z_t) is τ, otherwise it is 0; the increment τ is a predetermined value. The formula expresses that the log-odds of a voxel increases when it is repeatedly observed to be occupied, and decreases otherwise.

Step 5.2: Define p ∈ [0,1] as the probability that the voxel is occupied, and l ∈ R as the log-odds value; l can be computed from p by the logit transform:

l = log( p / (1 − p) )   (Equation 9)

The inverse of this transform is:

p = 1 / (1 + exp(−l))   (Equation 10)

The occupancy probability p of a voxel is computed by substituting the log-odds value obtained in the previous step into Equation 10; only when the occupancy probability p is greater than a predetermined threshold is voxel n considered occupied and visualized in the octree map.
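A minimal sketch of this update, implementing Equations 8-10 over a dictionary of voxel log-odds, is given below. The text says a missed observation contributes 0 but also that the log-odds decreases otherwise; following the latter, the sketch subtracts τ on a free observation, which is an assumption, as are the values of τ and the occupancy threshold.

```python
# Sketch of the log-odds occupancy update (Eq. 8) and the logit/inverse-logit
# pair (Eq. 9-10). TAU and OCC_THRESHOLD are assumed illustrative values.
import math

TAU = 0.85            # log-odds increment per occupied observation (assumed)
OCC_THRESHOLD = 0.7   # occupancy probability above which a voxel is visualized

log_odds = {}         # voxel key -> accumulated log-odds l

def update_voxel(key, observed_occupied):
    """Eq. 8: add tau for an occupied observation; subtract it for a free one."""
    delta = TAU if observed_occupied else -TAU
    log_odds[key] = log_odds.get(key, 0.0) + delta

def occupancy(key):
    """Eq. 10: p = 1 / (1 + exp(-l))."""
    return 1.0 / (1.0 + math.exp(-log_odds.get(key, 0.0)))

def is_occupied(key):
    return occupancy(key) > OCC_THRESHOLD
```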
In other words, a voxel that has been observed as occupied many times is considered a stably occupied voxel. In this way, the scheme handles map construction in dynamic environments well; in complex situations, the octree map helps reinforce the removal of dynamic pixels and minimizes the influence of dynamic objects.
Publication history

- Application CN201910481714.1A filed 2019-06-04
- Published as CN110378997A on 2019-10-25
- Granted as CN110378997B on 2023-01-20 (legal status: Active)