CN110378997A - Dynamic scene mapping and localization method based on ORB-SLAM2
- Publication number: CN110378997A
- Application number: CN201910481714.1A
- Authority: CN (China)
- Prior art keywords: pixel, dynamic, keyframe, voxel, pose
- Legal status: Granted
Classifications
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05 — Geographic models
Description
Technical Field
The invention relates to the technical field of robot simultaneous localization and mapping (SLAM), and in particular to a dynamic scene mapping and localization method based on ORB-SLAM2.
Background
SLAM (Simultaneous Localization and Mapping) has long been a hot topic in computer vision and robotics and has attracted the attention of many high-tech companies. SLAM builds a map of an unknown environment while localizing within that map in real time. Modern visual SLAM frameworks such as ORB-SLAM2 and LSD-SLAM are very mature, and state-of-the-art visual SLAM (V-SLAM) systems achieve high-precision localization; however, most of these systems assume that the operating environment is static, which limits their application.
Existing algorithms for building static maps in dynamic scenes each have shortcomings. DynaSLAM does not run in real time, and it removes only the pixels of predefined object classes, so it cannot remove undefined moving objects or parts of predefined objects that its detector misses. The StaticFusion system cannot remove a person who remains still for a long time. None of these algorithms can build a clean static map in real time in a complex dynamic environment.
Summary of the Invention
In view of the above problems in the prior art, the object of the present invention is to provide a dynamic scene mapping and localization method based on ORB-SLAM2. The method removes dynamic pixels: by combining an object detection method with the depth image of each new keyframe, it quickly detects moving objects in the camera's image information and builds a clean static-background octree map in complex dynamic environments.
To achieve the above task, the present invention adopts the following technical solution:

A dynamic scene mapping and localization method based on ORB-SLAM2, comprising the following steps:
Step 1, local map tracking

Initialize the camera pose using the image information captured by the camera carried by the robot; during initialization, the first frame captured by the camera is taken as a keyframe. After the initial pose is obtained, track the local map to optimize the camera pose and generate new keyframes.

Step 2, dynamic pixel removal

Use an object detection algorithm to detect predefined dynamic objects in the color image of the new keyframe, then combine the depth image of the new keyframe to identify further dynamic pixels; the dynamic pixels detected by either method are removed.

Step 3, sparse mapping

For keyframes whose dynamic pixels have been removed, optimize the robot pose of the keyframe, add new map points, and maintain the quality and size of the keyframe set.

Step 4, loop closure detection

Perform loop closure detection on each new keyframe; once a loop closure is detected, perform pose graph optimization.

Step 5, octree map construction

Use an octree to divide the map points into voxels and store these voxels in an octree structure to build the octree map; whether a voxel is occupied is judged by computing its occupancy probability, and occupied voxels are visualized in the octree map.
Further, tracking the local map to optimize the camera pose and generate new keyframes includes:

The local map refers to the 3D points observed by keyframes whose distance and viewing angle are close to the current frame. More matched 3D points are obtained by reprojection, and the camera pose is optimized by minimizing the error, after which new keyframes are generated:

Project the 3D points of the local map onto the current frame to obtain 3D-2D feature matches;

Restrict the region in which 2D matching points are searched in the current frame to reduce mismatches; then construct a least-squares problem from the error between each pixel in the current frame and the position obtained by projecting the corresponding 3D point with the currently estimated camera pose, minimize it, and find the best camera pose for localization;

Decide whether to generate a new keyframe according to preset conditions.
Further, identifying dynamic pixels by combining the depth image of the new keyframe in step 2 includes:

Project the pixels remaining after the object detection algorithm has removed the predefined objects into world coordinates to create 3D points; divide the 3D points into several clusters, and select M pixels uniformly from each cluster. For each such pixel, project it into the N keyframes nearest to the new keyframe and compare, to detect whether the pixel is dynamic:

Back-project pixel u to the 3D point p_w in world coordinates using the depth from the keyframe's depth image and the robot pose of the new keyframe;

Project the 3D point p_w onto the color image of the j-th keyframe near the new keyframe;

If the pixel u' of the j-th keyframe has a valid depth value z' in the corresponding depth image, back-project u' to the 3D point p_w' in world coordinates;

Judge whether pixel u is dynamic by comparing the distance d between p_w' and p_w with a set threshold d_mth:
Search the pixels within a square around u' so that d takes its minimum value d_min; if d_min is greater than the threshold d_mth, pixel u is preliminarily judged to be dynamic, otherwise it is preliminarily judged to be static.
Further, suppose that among the preliminary judgment results of pixel u over all nearby keyframes, the number of static results is N_s, the number of dynamic results is N_d, and the number of invalid results is N_i. The final attribute of pixel u is determined as follows:

If (N_s ≥ N_d, N_s > 0), pixel u is a static pixel;

If (N_d > N_s, N_d > 0), pixel u is a dynamic pixel;

If (N_s = N_d = 0), pixel u is invalid.
Further, the dynamic pixels detected successively by these two methods are removed; after dynamic pixels have been identified by combining the depth image of the new keyframe, the removal method is:

Among the M pixels uniformly selected from a cluster, suppose the number of static pixels is N_s', the number of dynamic pixels is N_d', and the number of invalid pixels is N_i'. The final attribute of the cluster is determined as follows:

If (N_s' ≥ N_d'), the cluster is a static cluster and is retained;

If (N_d' > N_s'), the cluster is a dynamic cluster and is removed.
Further, optimizing the robot pose of the keyframe, adding new map points, and maintaining the quality and size of the keyframe set includes:

Compute the BoW vector of the current keyframe and update the map points of the current keyframe;

Apply sliding-window local BA to optimize the robot pose, where the object of the optimization is the pose of the current frame;

Detect and remove redundant keyframes: if 90% of the pixels of a keyframe can be observed by more than three other keyframes, the keyframe is considered redundant and deleted.
Further, judging whether a voxel is occupied by computing its occupancy probability, and visualizing occupied voxels in the octree map, includes:

Let Z_t denote the observation of voxel n at time t, and let L(n|Z_{1:t}) denote the log-odds of voxel n given the observations up to time t; then at time t+1 the log-odds of voxel n is obtained from:

L(n|Z_{1:t+1}) = L(n|Z_{1:t}) + L(n|Z_{t+1})   (Equation 8)

If voxel n is observed as occupied at time t, L(n|Z_t) is τ, otherwise it is 0; the increment τ is a predetermined value.

Define p ∈ [0,1] as the probability that the voxel is occupied, and l ∈ R as the log-odds value; l is computed from p by the logit transform:

l = log( p / (1 − p) )   (Equation 9)

The inverse of this transform is:

p = 1 / (1 + exp(−l))   (Equation 10)

The occupancy probability p of a voxel is computed by substituting its log-odds value into Equation 10; only when the occupancy probability p is greater than a predetermined threshold is voxel n considered occupied and visualized in the octree map.
The present invention has the following technical features:

1. Speed

The algorithm uses the CornerNet-Squeeze object detection algorithm to detect dynamic objects and the Mini Batch K-Means variant of k-means to cluster the depth information of the image, and is faster than existing algorithms. CornerNet-Squeeze needs only 34 ms to process one image, faster than algorithms such as YOLOv3 (test hardware: 1080 Ti GPU + Intel Core i7-7700K CPU). For clustering data sets larger than ten thousand points, Mini Batch K-Means is about three times faster than k-means itself, with little difference in quality.

In addition, the octree map structure also shortens map update time.

2. A very clean static map can be built in complex environments

The algorithm combines the object detection method with the log-odds voxel update of the octree map to detect and remove dynamic pixels.
Brief Description of the Drawings

Figure 1 is the flowchart of the method of the present invention;

Figure 2 shows the local BA optimization process;

Figure 3 shows the octree map model.
Detailed Description

ORB-SLAM2 is a complete SLAM solution for monocular, stereo, and RGB-D cameras. It supports map reuse, loop closure detection, and relocalization; however, it assumes that the operating environment is static, which limits its application.

The algorithm of the present invention builds on the ORB-SLAM2 algorithm and can build a clean static octree map in a dynamic environment quickly and in real time. It consists of five main steps: local map tracking, dynamic pixel removal, sparse mapping, loop closure detection, and octree map construction; the overall flowchart is shown in Figure 1. The details are as follows:
Step 1, local map tracking

Initialize the camera pose using the image information captured by the camera carried by the robot; during initialization, the first frame captured by the camera is taken as a keyframe. After the initial pose is obtained, track the local map to optimize the camera pose and generate new keyframes. The specific steps are as follows:

Step 1.1: Extract and match ORB feature points on the raw RGB-D image information captured by a Kinect2 (a color image and a depth image), then track and initialize the camera pose (i.e., localize) through three modes in turn: the motion model, the keyframe model, and the relocalization model. During initialization the first frame is set as a keyframe. This step 1.1 is the same as in ORB-SLAM2.
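As a concrete illustration of step 1.1, the sketch below extracts and matches ORB features between two frames with OpenCV. It is a minimal example under stated assumptions, not the patent's implementation; the file names and the nfeatures value are placeholders.

```python
# Minimal ORB feature extraction and matching with OpenCV, as used for
# frame-to-frame tracking in step 1.1. File names and nfeatures are placeholders.
import cv2

img1 = cv2.imread('frame_prev.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('frame_curr.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming-distance brute-force matching with cross-checking to reject
# one-sided matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f'{len(matches)} ORB matches')
```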
Step 1.2: After the initial pose is obtained, track the local map. The local map refers to the 3D points observed by keyframes (local keyframes) whose distance and viewing angle are close to the current frame (the image the camera has just captured), with the distance range set to 4 m and the angle range to 1 rad. More matched 3D points are obtained by reprojection, and the camera pose is optimized by minimizing the error:

(1) Definitions:

The transformation matrix from the camera coordinate system c to the robot coordinate system r: T_rc.

The transformation matrix from the robot coordinate system r to the world coordinate system w (i.e., the robot pose): T_wr.

The projection of the 3D point P_c associated with an image C to the 2D point u on that image:

u = π(P_c)   (Equation 1)

The back-projection of a 2D point u on an image C, combined with its corresponding depth z, to the 3D point P_c of that image:

P_c = π⁻¹(u, z)   (Equation 2)
(2) Feature matching by reprojection:

Let the robot pose (i.e., the pose of the current frame) be T_wr. Projecting the 3D points P_i^w of the local map onto the current frame yields 3D-2D feature matches.

(3) Camera pose optimization:

In dynamic scenes, feature matching produces a large number of mismatches. To address this, the present invention restricts the region in which 2D matching points (pixels) are searched in the current frame to a circle with a radius of 3 pixels, reducing mismatches. The pixel u_i in the current frame is then compared with the position u_i' obtained by projecting the corresponding 3D point with the currently estimated camera pose,

u_i' = π(T_rc⁻¹ T_wr⁻¹ P_i^w)   (Equation 3)

and the resulting errors are assembled into a least-squares problem that is minimized to find the best camera pose for localization (a code sketch of this optimization is given after step (4) below):

T_wr* = argmin_{T_wr} Σ_i ‖u_i − u_i'‖²
(4) Finally, decide whether to generate a new keyframe according to preset conditions; these conditions are the same as in the ORB-SLAM2 algorithm.
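The sketch below illustrates the reprojection-error minimization of step (3), parameterizing the pose as a rotation vector plus translation and solving with a robust least-squares routine. The intrinsics, the Huber scale, and all variable names are illustrative assumptions, not values from the patent.

```python
# Sketch of camera pose refinement by minimizing reprojection error (Eq. 3).
# Intrinsics and loss scale are assumed example values.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5  # assumed pinhole intrinsics

def project(points_c):
    """Pinhole projection pi(P_c) of Nx3 camera-frame points (Eq. 1)."""
    z = points_c[:, 2]
    return np.stack([fx * points_c[:, 0] / z + cx,
                     fy * points_c[:, 1] / z + cy], axis=1)

def residuals(pose, points_w, pixels):
    """Reprojection errors u_i - u_i' for pose = [rotvec(3), t(3)] (world->camera)."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    points_c = points_w @ R.T + pose[3:]
    return (project(points_c) - pixels).ravel()

def refine_pose(pose0, points_w, pixels):
    """Robust least squares; the Huber loss down-weights remaining mismatches."""
    return least_squares(residuals, pose0, args=(points_w, pixels),
                         loss='huber', f_scale=2.0).x
```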
Step 2, dynamic pixel removal

When building a static map in a dynamic scene, the identification and deletion of dynamic pixels is the most critical part. Because only keyframes are used to construct the octree map, dynamic pixel removal is applied only to the keyframe newly selected in the previous step (the new keyframe).

This step first uses the object detection method to detect predefined dynamic objects in the color image of the new keyframe, then combines the depth image of the new keyframe to identify further dynamic pixels; the dynamic pixels detected by either method are removed. The steps are as follows:

Step 2.1: For predefined objects such as people, tables, and chairs, this scheme uses the CornerNet-Squeeze object detection algorithm from CornerNet-Lite to detect predefined dynamic objects in the color image of the new keyframe; if a dynamic object is detected, its pixels are removed.

The CornerNet-Squeeze object detection algorithm needs only 34 ms to process one image, faster than algorithms such as YOLOv3 (test hardware: 1080 Ti GPU + Intel Core i7-7700K CPU).

Step 2.2: Some undefined dynamic objects, such as books and boxes, or parts of predefined objects that the object detector misses, such as a person's hand, may remain. This scheme detects such dynamic pixels on the color image processed in the previous step, combined with the depth image of the new keyframe, as follows:
2.2.1 Project the pixels remaining after the object detection algorithm has removed the predefined objects into world coordinates to create 3D points.

2.2.2 Divide the 3D points into several clusters using a clustering algorithm

Divide the 3D points into several clusters; the number of clusters k is determined by the number of 3D points s_p: k = s_p / n_pt, where n_pt is the average number of points per cluster to be tuned and s_p is the size of the point cloud. M pixels are then selected uniformly from each cluster.

Because the number of pixels is very large and clustering must be as fast as possible, this scheme uses Mini Batch K-Means, a variant of K-means. Reducing n_pt gives a better approximation but also increases the computational burden; this scheme sets n_pt to 6000 to balance computational cost and accuracy.

Because this scheme focuses on removing dynamic pixels and building a static map rather than tracking dynamic objects, each cluster is assumed to be a rigid body, which means that pixels in the same cluster share the same motion attributes. The scheme therefore only needs to detect which clusters are dynamic; to speed up dynamic cluster detection, M = 100 pixels are selected uniformly from each cluster.

In the subsequent steps, the dynamic and static attributes of the selected pixels are judged; if a cluster contains more dynamic pixels than static pixels, it is determined to be dynamic and removed, otherwise it is determined to be static and retained.
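The clustering and per-cluster sampling of steps 2.2.1-2.2.2 can be sketched as below with scikit-learn's MiniBatchKMeans. The array name points_w and the random seed are illustrative assumptions; n_pt = 6000 and M = 100 follow the values given in the text.

```python
# Sketch of depth-point clustering (k = s_p / n_pt) and uniform per-cluster
# sampling of M pixels, using Mini Batch K-Means as described above.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def cluster_and_sample(points_w, n_pt=6000, M=100, seed=0):
    rng = np.random.default_rng(seed)
    k = max(1, len(points_w) // n_pt)                  # k = s_p / n_pt
    labels = MiniBatchKMeans(n_clusters=k, random_state=seed).fit_predict(points_w)
    samples = {}
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        # Uniformly pick M point indices per cluster for the dynamic/static vote.
        samples[c] = rng.choice(idx, size=min(M, len(idx)), replace=False)
    return labels, samples
```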
2.2.4 Judging whether a pixel is dynamic

This scheme detects whether a pixel is dynamic by projecting each of the M pixels selected in step 2.2.2 into the N = 6 keyframes nearest to the new keyframe (the nearby keyframes) and comparing. The specific steps are as follows:

(1) Use the depth z from the depth image of the new keyframe n and its robot pose T_wr^n to back-project pixel u to the 3D point p_w in world coordinates:

p_w = T_wr^n T_rc π⁻¹(u, z)   (Equation 4)

(2) Project the 3D point p_w onto the color image of the j-th keyframe near the new keyframe:

u' = π(T_rc⁻¹ (T_wr^j)⁻¹ p_w)   (Equation 5)

where T_wr^j is the robot pose of the j-th nearby keyframe.

(3) If the pixel u' of the j-th keyframe has a valid depth value z' in the corresponding depth image, back-project u' to the 3D point p_w' in world coordinates:

p_w' = T_wr^j T_rc π⁻¹(u', z')   (Equation 6)

(4) Judge whether pixel u is dynamic by comparing the distance d between p_w' and p_w with a set threshold d_mth:

d = ‖p_w' − p_w‖   (Equation 7)

Because both the depth images and the poses of the keyframes contain errors, u' may not be the pixel that actually corresponds to u. The scheme therefore searches the pixels within a square around u' (the side length S is set empirically to 10 pixels) for the one that minimizes d, giving d_min. If d_min is greater than the threshold d_mth (d_mth is set to grow linearly with the depth value z'), pixel u is preliminarily judged to be dynamic; otherwise it is preliminarily judged to be static. In the remaining cases, either no valid depth value is found in the square search region or u' falls outside the image; pixel u is then judged to be invalid.
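A self-contained sketch of this per-keyframe check follows, assuming the camera frame coincides with the robot frame (so T_rc is the identity) and 4x4 homogeneous pose matrices. The intrinsics and the slope of the depth-dependent threshold d_mth are illustrative assumptions; the patent only states that d_mth grows linearly with z'.

```python
# Sketch of the preliminary dynamic/static/invalid vote of pixel u against one
# nearby keyframe j (Eq. 4-7), with T_rc assumed to be the identity.
import numpy as np

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5  # assumed pinhole intrinsics
S = 10                                       # square search window, pixels

def back_project(u, z, T_c2w):
    """Eq. 4/6: pixel (col, row) plus depth -> 3D point in world coordinates."""
    p_c = np.array([(u[0] - cx) * z / fx, (u[1] - cy) * z / fy, z, 1.0])
    return (T_c2w @ p_c)[:3]

def project(p_w, T_w2c):
    """Eq. 5: world point -> pixel in a keyframe with world-to-camera pose T_w2c."""
    p_c = T_w2c @ np.append(p_w, 1.0)
    return np.array([fx * p_c[0] / p_c[2] + cx, fy * p_c[1] / p_c[2] + cy])

def check_pixel(u, z, T_new_c2w, T_j_w2c, T_j_c2w, depth_j, slope=0.025):
    p_w = back_project(u, z, T_new_c2w)
    col, row = np.round(project(p_w, T_j_w2c)).astype(int)
    h, w = depth_j.shape
    if not (0 <= col < w and 0 <= row < h):
        return 'invalid'                     # u' falls outside the image
    d_min, z_best = np.inf, None
    for r in range(max(0, row - S // 2), min(h, row + S // 2 + 1)):
        for c in range(max(0, col - S // 2), min(w, col + S // 2 + 1)):
            z2 = depth_j[r, c]
            if z2 <= 0:                      # no valid depth in this cell
                continue
            d = np.linalg.norm(back_project((c, r), z2, T_j_c2w) - p_w)  # Eq. 7
            if d < d_min:
                d_min, z_best = d, z2
    if z_best is None:
        return 'invalid'                     # no valid depth in the whole window
    d_mth = slope * z_best                   # threshold grows linearly with z'
    return 'dynamic' if d_min > d_mth else 'static'
```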
Since the result from a single keyframe is not reliable enough and may be invalid, this scheme applies the above preliminary judgment steps (1)-(4) to all nearby keyframes of the new keyframe (6 keyframes are chosen in this embodiment). Finally, the status of pixel u is decided by voting: suppose that among the preliminary judgment results of pixel u over all nearby keyframes, the number of static results is N_s, the number of dynamic results is N_d, and the number of invalid results is N_i. The final attribute of pixel u is determined as follows:

If (N_s ≥ N_d, N_s > 0), pixel u is a static pixel;

If (N_d > N_s, N_d > 0), pixel u is a dynamic pixel;

If (N_s = N_d = 0), pixel u is invalid.
2.2.5 Judging whether a cluster is dynamic

This step uses the same voting method as the previous step to determine the attribute of a cluster. Among the M pixels uniformly selected from a cluster, suppose the number of static pixels is N_s', the number of dynamic pixels is N_d', and the number of invalid pixels is N_i'. The final attribute of the cluster is determined as follows (a compact sketch of both votes follows the rules):

If (N_s' ≥ N_d'), the cluster is a static cluster and is retained;

If (N_d' > N_s'), the cluster is a dynamic cluster and is removed.
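Both voting rules can be sketched compactly as below; ties are resolved in favor of static, matching the order of the rules above.

```python
# Sketch of the two majority votes: per-pixel over the N nearby keyframes
# (step 2.2.4) and per-cluster over its M sampled pixels (step 2.2.5).
from collections import Counter

def vote_pixel(results):
    """results: 'static' / 'dynamic' / 'invalid' votes from the nearby keyframes."""
    c = Counter(results)
    if c['static'] == 0 and c['dynamic'] == 0:
        return 'invalid'
    return 'static' if c['static'] >= c['dynamic'] else 'dynamic'

def vote_cluster(pixel_attrs):
    """pixel_attrs: final attributes of the M sampled pixels of one cluster."""
    c = Counter(pixel_attrs)
    return 'static' if c['static'] >= c['dynamic'] else 'dynamic'
```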
Step 3, sparse mapping

The main purpose of sparse mapping is to receive and process the keyframes whose dynamic pixels have been removed, optimize the robot pose of each such keyframe, add new map points, and maintain the quality and size of the keyframe set. The specific steps are as follows:

Step 3.1, processing newly introduced keyframes

Compute the BoW vector of the current keyframe and update the map points of the current keyframe.

Step 3.2, local BA

Apply sliding-window local BA to optimize the robot pose. The optimization framework is shown in Figure 2; the object of the optimization is the pose of the current frame, and the following participate in the optimization:

(1) The poses of all keyframes connected to the current keyframe within the sliding window; to balance time and accuracy, this scheme sets the window size n to 7;

(2) The two black map points (as drawn in Figure 2) created before the keyframes in the sliding window;

(3) The two white map points created after the keyframes in the sliding window; these white map points are not treated as variables and are kept fixed. After local BA, the pose of the new keyframe is optimized and new map points are created; this new keyframe is then used to build the octree map.

Step 3.3, local keyframe culling

To control the reconstruction density and the complexity of BA optimization, this step also detects and removes redundant keyframes. By reprojection, if 90% of the pixels of a keyframe can be observed by more than three other keyframes, the keyframe is considered redundant and deleted (a sketch of this rule follows).
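A minimal sketch of the redundancy rule, assuming hypothetical kf.map_points and mp.observers containers that record which keyframes observe each map point:

```python
# Sketch of redundant keyframe detection: cull a keyframe when 90% of its
# observations are also seen by more than three other keyframes.
# kf.map_points and mp.observers are assumed (hypothetical) data structures.
def is_redundant(kf, min_other_observers=3, ratio=0.9):
    if not kf.map_points:
        return False
    covered = sum(
        1 for mp in kf.map_points
        if sum(1 for other in mp.observers if other is not kf) > min_other_observers
    )
    return covered >= ratio * len(kf.map_points)
```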
Step 4, loop closure detection

The bag-of-words model DBoW2 is used to perform loop closure detection on each new keyframe. Once a loop closure is detected, pose graph optimization is performed. This process is the same as in the ORB-SLAM2 algorithm; the specific pose graph optimization procedure is prior art and is not described further here.

Step 5, octree map construction

Use an octree to divide the map points created and optimized above into voxels (small cubes) and store these voxels in an octree structure to build the octree map; whether a voxel is occupied is judged by computing its occupancy probability, and occupied voxels are visualized in the octree map.

As shown in Figure 3, building the octree map means continuously updating whether these voxels are occupied. The octree map represents occupancy in probabilistic form, rather than marking a map point only as 0 for free or 1 for occupied; this is described below using log-odds values. The specific steps are as follows:
Step 5.1: Let Z_t denote the observation of voxel n at time t (obtained by reprojection), and let L(n|Z_{1:t}) denote the log-odds of voxel n given the observations up to time t. Then at time t+1 the log-odds of voxel n is obtained from:

L(n|Z_{1:t+1}) = L(n|Z_{1:t}) + L(n|Z_{t+1})   (Equation 8)

If voxel n is observed as occupied at time t, L(n|Z_t) is τ, otherwise it is 0; the increment τ is a predetermined value. The formula expresses that the log-odds of a voxel increases when it is repeatedly observed to be occupied, and decreases otherwise.

Step 5.2: Define p ∈ [0,1] as the probability that the voxel is occupied, and l ∈ R as the log-odds value; l can be computed from p by the logit transform:

l = log( p / (1 − p) )   (Equation 9)

The inverse of this transform is:

p = 1 / (1 + exp(−l))   (Equation 10)

The occupancy probability p of a voxel is computed by substituting the log-odds value obtained in the previous step into Equation 10; only when the occupancy probability p is greater than a predetermined threshold is voxel n considered occupied and visualized in the octree map.
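A minimal sketch of this update, implementing Equations 8-10 over a dictionary of voxel log-odds, is given below. The text says a missed observation contributes 0 but also that the log-odds decreases otherwise; following the latter, the sketch subtracts τ on a free observation, which is an assumption, as are the values of τ and the occupancy threshold.

```python
# Sketch of the log-odds occupancy update (Eq. 8) and the logit/inverse-logit
# pair (Eq. 9-10). TAU and OCC_THRESHOLD are assumed illustrative values.
import math

TAU = 0.85            # log-odds increment per occupied observation (assumed)
OCC_THRESHOLD = 0.7   # occupancy probability above which a voxel is visualized

log_odds = {}         # voxel key -> accumulated log-odds l

def update_voxel(key, observed_occupied):
    """Eq. 8: add tau for an occupied observation; subtract it for a free one."""
    delta = TAU if observed_occupied else -TAU
    log_odds[key] = log_odds.get(key, 0.0) + delta

def occupancy(key):
    """Eq. 10: p = 1 / (1 + exp(-l))."""
    return 1.0 / (1.0 + math.exp(-log_odds.get(key, 0.0)))

def is_occupied(key):
    return occupancy(key) > OCC_THRESHOLD
```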
In other words, a voxel that has been observed as occupied many times is considered a stably occupied voxel. In this way, the scheme handles map construction in dynamic environments well; in complex situations, the octree map helps reinforce the removal of dynamic pixels and minimizes the influence of dynamic objects.
Publication history

- Application CN201910481714.1A filed 2019-06-04
- Published as CN110378997A on 2019-10-25
- Granted as CN110378997B on 2023-01-20 (legal status: Active)