
CN113916245A - Semantic map construction method based on instance segmentation and VSLAM - Google Patents


Info

Publication number: CN113916245A (application); CN113916245B (grant)
Authority: CN (China)
Prior art keywords: instance, RGB, point cloud, map, plane
Prior art date: 2021-10-09
Legal status: Granted; currently Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202111176088.9A
Other languages: Chinese (zh)
Other versions: CN113916245B
Inventors: 陈建驱, 徐昱琳, 李铭扬, 李璇, 杨傲雷
Current Assignee: University of Shanghai for Science and Technology (the listed assignees may be inaccurate)
Original Assignee: University of Shanghai for Science and Technology
Application filed by: University of Shanghai for Science and Technology
Priority: CN202111176088.9A, filed 2021-10-09
Publication of CN113916245A: 2022-01-11
Publication of CN113916245B (grant): 2024-07-19

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30 - Map- or contour-matching
    • G01C21/32 - Structuring or formatting of map data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of mobile robot positioning and navigation, and discloses a semantic map construction method based on instance segmentation and VSLAM, comprising the following steps. S1: matching the acquired RGB images with the depth maps to obtain RGB-D frames; S2: performing pose estimation on all RGB-D frames frame by frame and finding the key frames among them; S3: performing instance segmentation on the RGB image of every RGB-D key frame to obtain an instance segmentation result for each key frame; S4: converting the RGB-D key frames processed in step S3 into point clouds; S5: creating an instance set in each RGB-D key frame point cloud and fusing the instance sets of all RGB-D key frame point clouds into a blank initial instance map to obtain the instance map; S6: fusing the background point clouds of all RGB-D key frame point clouds to obtain a global background point cloud; S7: optimizing and fusing the instance map and the global background point cloud to obtain the semantic map. The semantic map constructed by this method is highly accurate and can be built in real time.

Description

Semantic map construction method based on instance segmentation and VSLAM
Technical Field
The invention relates to the technical field of mobile robot positioning and navigation, in particular to a semantic map construction method based on instance segmentation and VSLAM.
Background
With the development of society, the human demand for robots keeps increasing. An important characteristic of a mobile robot is that it can sense and understand its environment through sensors, execute tasks autonomously or semi-autonomously, and learn about the environment to a certain extent. SLAM aims to reconstruct the three-dimensional structure of an unknown environment in real time while simultaneously localizing the mobile robot, and can be classified into visual SLAM (VSLAM) and laser SLAM. The map constructed by traditional VSLAM carries only basic information such as the color and texture of the environment and cannot provide higher-level semantic information, such as the semantic category of a given spatial point in the point cloud. The robot therefore cannot understand the environment and cannot perform more advanced tasks. For example, for a mobile robot to perform an automatic grasping task, the robot must know what the target is and where it is.
Therefore, semantic information needs to be added to the map; such a map is called a semantic map. A semantic map contains both the geometric and the semantic information of the environment. In addition to raising the intelligence level of the robot, a semantic map also allows the user to interact with the robot directly; for example, the user can ask how many cups there are in the map.
Traditional semantic maps are mainly constructed with machine-learning algorithms such as support vector machines and conditional random fields; the semantic maps built by these methods have low accuracy and are difficult to run in real time. Since 2012 the field of deep learning has developed by leaps and bounds, and several methods for constructing semantic maps with deep learning have appeared. However, the semantic maps built by these methods are either at a low semantic level and difficult to use directly, or the computation is too heavy to be deployed on a mobile robot. Therefore, a new deep-learning-based semantic map construction algorithm is needed.
Disclosure of Invention
Aiming at the problems and the defects in the prior art, the invention provides a semantic map construction method based on example segmentation and VSLAM.
In order to realize the purpose of the invention, the technical scheme adopted by the invention is as follows:
a semantic map construction method based on instance segmentation and VSLAM comprises the following steps:
s1: matching the RGB image and the depth map acquired from the RGB-D camera according to the time stamp to obtain RGB-D frames, wherein each RGB-D frame consists of the RGB image and the depth map with the same time stamp;
s2: performing pose estimation on all RGB-D frames frame by frame using a VSLAM algorithm, finding the key frames among all RGB-D frames according to the pose estimation result, and recording them as RGB-D key frames;
s3: carrying out example segmentation on the RGB images in all the RGB-D key frames frame by adopting a trained example segmentation model to obtain an example segmentation result of each RGB-D key frame, wherein the example segmentation result comprises semantic categories and sequence numbers of each example in the RGB images;
s4: converting the RGB-D key frame processed in the step S3 into a point cloud;
s5: creating an example set in each RGB-D key frame point cloud, inserting the example set in the first RGB-D key frame point cloud into a blank initial example map according to the sequence of time stamps, and then sequentially fusing the example sets in the remaining RGB-D key frame point clouds into the initial example map to obtain an example map;
s6: creating background point clouds in each RGB-D key frame point cloud, and then fusing the background point clouds in all the RGB-D key frame point clouds to obtain a global background point cloud;
s7: and respectively optimizing the example map and the global background point cloud, and then fusing the optimized example map and the optimized global background point cloud to obtain the semantic map.
According to the semantic mapping method based on example segmentation and VSLAM, preferably, the specific operation of creating the example set in each RGB-D keyframe point cloud in step S5 is as follows:
s51: traversing each three-dimensional point in the point cloud aiming at an RGB-D key frame point cloud, and extracting each instance point cloud according to instance semantic categories and instance sequence numbers corresponding to the three-dimensional points;
s52: for an instance point cloud, performing cluster segmentation on it with a Euclidean distance clustering segmentation algorithm, and then calculating the length, width, height and center coordinates of the clustered instance point cloud to obtain its three-dimensional bounding box, thereby completing the construction of the instance corresponding to that instance point cloud;
s53: and combining all the examples constructed by the RGB-D key frame point cloud to obtain an example set of the RGB-D key frame.
According to the above semantic map construction method based on instance segmentation and VSLAM, the attributes used to represent an instance in step S52 are preferably: the instance's semantic category, sequence number, geometric information, point cloud and observation dictionary. The geometric information comprises the length, width, height and center coordinates of the instance. The key-value pairs of the observation dictionary map the instance's sequence number to its observation count, where the observation count is the number of times the instance appears in all RGB-D key frames.
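For illustration, the following is a minimal Python sketch of such an instance record; the field and property names are assumptions for this sketch and are not taken from the patent.

from dataclasses import dataclass, field
from typing import Dict
import numpy as np

@dataclass
class Instance:
    """Minimal sketch of the instance attributes described above (names are illustrative)."""
    category: int                      # semantic category predicted by the segmentation model
    seq_id: int                        # instance sequence number
    # geometric information: axis-aligned 3D bounding box (min/max corners imply length, width, height, center)
    min_xyz: np.ndarray = field(default_factory=lambda: np.zeros(3))
    max_xyz: np.ndarray = field(default_factory=lambda: np.zeros(3))
    points: np.ndarray = field(default_factory=lambda: np.empty((0, 3)))  # instance point cloud (N x 3)
    observations: Dict[int, int] = field(default_factory=dict)            # sequence number -> times observed

    @property
    def center(self) -> np.ndarray:
        return (self.min_xyz + self.max_xyz) / 2.0

    @property
    def size(self) -> np.ndarray:
        """Length, width and height of the bounding box."""
        return self.max_xyz - self.min_xyz

    def volume(self) -> float:
        return float(np.prod(np.maximum(self.size, 0.0)))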
According to the above semantic map construction method based on instance segmentation and VSLAM, the specific operation of sequentially fusing the instance sets of the remaining RGB-D key frame point clouds into the initial instance map in step S5 is preferably as follows: for any instance A in the instance set of an RGB-D key frame, find the instance B in the initial instance map that has the largest IoU with instance A, and judge whether instance A and instance B need to be fused; if they need to be fused, fuse instance A and instance B into an instance C and insert instance C into the initial instance map; if they do not need to be fused, insert instance A directly into the initial instance map. More preferably, the condition for judging whether instance A and instance B need to be fused is: if the IoU of instance A and instance B is greater than 0.2 and their semantic categories are the same, or the IoU of instance A and instance B is greater than 0.5, instance A and instance B are judged to need fusing. Here IoU equals the volume of the intersection of instance A and instance B divided by the volume of their union, i.e.

IoU = \frac{V_{ab}}{V_a + V_b - V_{ab}}

where V_{ab} is the volume of the intersection of instance A and instance B, V_a is the volume of instance A, and V_b is the volume of instance B.
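A minimal sketch of this fusion criterion for axis-aligned 3D bounding boxes, reusing the assumed Instance record sketched above; the 0.2 and 0.5 thresholds are the ones stated in the text.

import numpy as np

def box_iou_3d(a: Instance, b: Instance) -> float:
    """Volume IoU of two axis-aligned 3D boxes: intersection volume over union volume."""
    inter_min = np.maximum(a.min_xyz, b.min_xyz)
    inter_max = np.minimum(a.max_xyz, b.max_xyz)
    v_ab = float(np.prod(np.maximum(inter_max - inter_min, 0.0)))  # intersection volume Vab
    v_union = a.volume() + b.volume() - v_ab                       # Va + Vb - Vab
    return v_ab / v_union if v_union > 0 else 0.0

def should_fuse(a: Instance, b: Instance) -> bool:
    """Fusion rule from the text: IoU > 0.2 with the same category, or IoU > 0.5 regardless of category."""
    iou = box_iou_3d(a, b)
    return (iou > 0.2 and a.category == b.category) or iou > 0.5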
According to the above semantic map construction method based on instance segmentation and VSLAM, the semantic category and sequence number of instance C are preferably taken from whichever of instance A and instance B has the larger observation count; the geometric information of instance C is the extremum of the geometric information of instance A and instance B (the extremum may be a minimum or a maximum; for example, if minX is the minimum X coordinate of an instance in the world coordinate system, then the minX of instance C is the smaller of the minX of instance A and the minX of instance B, i.e. C.minX = MIN(A.minX, B.minX); likewise, if maxZ is the maximum Z coordinate of an instance in the world coordinate system, then the maxZ of instance C is the larger of the maxZ of instance A and the maxZ of instance B, i.e. C.maxZ = MAX(A.maxZ, B.maxZ)); the point cloud of instance C is the union of the point clouds of instance A and instance B.
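A sketch of the corresponding merge into instance C, continuing the same assumed Instance record; the category and sequence number follow the rule above and are taken from the more frequently observed of the two instances.

def fuse_instances(a: Instance, b: Instance) -> Instance:
    """Merge a key-frame instance A and a map instance B into a new instance C."""
    # category and sequence number come from whichever instance was observed more often
    donor = a if sum(a.observations.values()) >= sum(b.observations.values()) else b
    c = Instance(category=donor.category, seq_id=donor.seq_id)
    # geometric information takes the extremes of both boxes, e.g. C.minX = MIN(A.minX, B.minX)
    c.min_xyz = np.minimum(a.min_xyz, b.min_xyz)
    c.max_xyz = np.maximum(a.max_xyz, b.max_xyz)
    # the point cloud of C is the union (concatenation) of both point clouds
    c.points = np.vstack([a.points, b.points])
    # accumulate the observation counts of both instances
    for k in set(a.observations) | set(b.observations):
        c.observations[k] = a.observations.get(k, 0) + b.observations.get(k, 0)
    return c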
According to the above semantic map construction method based on instance segmentation and VSLAM, the method for optimizing the instance map in step S7 is preferably: first perform non-maximum suppression on the instance map, and then perform point cloud alignment to obtain the optimized instance map. The method for optimizing the global background point cloud is: process the global background point cloud with a voxel filtering algorithm, and then perform point cloud alignment to obtain the optimized global background point cloud.
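Voxel filtering is a standard downsampling step; a minimal numpy sketch follows, in which the 5 cm leaf size is an assumed value rather than one given in the patent.

import numpy as np

def voxel_filter(points: np.ndarray, leaf_size: float = 0.05) -> np.ndarray:
    """Downsample an N x 3 point cloud by averaging all points that fall into the same voxel."""
    voxel_idx = np.floor(points / leaf_size).astype(np.int64)
    # group points by voxel and keep one averaged point per occupied voxel
    _, inverse, counts = np.unique(voxel_idx, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]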
According to the above semantic map construction method based on instance segmentation and VSLAM, the specific steps of the non-maximum suppression are preferably: for the instances in the instance map, compute the PIoU value of every pair of instances, and fuse the two instances if their PIoU is greater than a preset threshold Th. The PIoU is computed as follows: compute the volumes Vm and Vn of instances M and N, compute the volume Vmn of their intersection, compute the ratios Pm and Pn of Vmn to Vm and Vn respectively, and take the larger of Pm and Pn as the PIoU of instances M and N:

PIoU = \max\left(\frac{V_{mn}}{V_m}, \frac{V_{mn}}{V_n}\right)
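A sketch of the PIoU test and a greedy merging pass, continuing the assumed Instance record and fuse_instances sketch above; the default threshold value is an assumption, since the text only names a preset threshold Th.

def piou(m: Instance, n: Instance) -> float:
    """PIoU = max(Vmn / Vm, Vmn / Vn), the larger of the two overlap ratios."""
    inter_min = np.maximum(m.min_xyz, n.min_xyz)
    inter_max = np.minimum(m.max_xyz, n.max_xyz)
    v_mn = float(np.prod(np.maximum(inter_max - inter_min, 0.0)))
    v_m, v_n = m.volume(), n.volume()
    if v_m <= 0 or v_n <= 0:
        return 0.0
    return max(v_mn / v_m, v_mn / v_n)

def nms_merge(instances: list, th: float = 0.5) -> list:
    """Greedy non-maximum suppression: fuse any pair whose PIoU exceeds the threshold Th."""
    merged = list(instances)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if piou(merged[i], merged[j]) > th:
                    merged[j] = fuse_instances(merged[i], merged[j])
                    del merged[i]
                    changed = True
                    break
            if changed:
                break
    return merged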
According to the above semantic map construction method based on instance segmentation and VSLAM, the specific operations of the point cloud alignment are preferably: fit the largest plane in the point cloud with a random sample consensus (RANSAC) algorithm to obtain a fitted plane; construct a rotation matrix from the normal vector of the fitted plane, rotate the point cloud by this rotation matrix until the normal vector is aligned with the z axis, and translate the fitted plane along the z axis onto the xy plane of the world coordinate system; then, taking the xy plane as a partition plane, count the points on each side of the partition plane and decide whether to flip the point cloud: if there are fewer points above the partition plane than below it, flip the point cloud about the xy plane, which completes the point cloud alignment.
According to the above semantic map construction method based on example segmentation and VSLAM, preferably, the construction method of the rotation matrix is as follows:
(1) fit the largest plane in the point cloud with a random sample consensus algorithm to obtain a fitted plane whose equation is ax + by + cz + d = 0; denote the normal vector of the fitted plane as p1 = (x1, y1, z1), the unit vector of the Z axis as p2 = (x2, y2, z2) = (0, 0, 1), and the origin as p3 = (x3, y3, z3) = (0, 0, 0); the three points p1, p2, p3 define a plane p1-p2-p3, which is the plane of the rotation from p1 to p2, and the normal vector n of plane p1-p2-p3 is the axis of the rotation from p1 to p2; since n is perpendicular to both p1p2 and p1p3, n can be obtained from the cross product of p1p2 and p1p3, as shown below:

n = \begin{vmatrix} i & j & k \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} = (a, b, c)

a = (y_2 - y_1)(z_3 - z_1) - (y_3 - y_1)(z_2 - z_1)
b = (z_2 - z_1)(x_3 - x_1) - (z_3 - z_1)(x_2 - x_1)
c = (x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)
wherein i represents a direction vector on the X-axis, j represents a direction vector on the Y-axis, and k represents a direction vector on the Z-axis;
(2) compute the rotation angle θ of the rotation from p1 to p2 about the axis n; θ is obtained from the dot product as follows:

p_1 \cdot p_2 = |p_1|\,|p_2| \cos\theta

\theta = \arccos\left(\frac{p_1 \cdot p_2}{|p_1|\,|p_2|}\right)
(3) compute the rotation matrix from Rodrigues' formula:

R = \cos\theta \, I + (1 - \cos\theta)\, n n^{T} + \sin\theta \, n^{\wedge}

where R is the rotation matrix, θ is the rotation angle, I is the identity matrix, T denotes the transpose, and the symbol ^ denotes the operator that converts a vector into its antisymmetric (skew-symmetric) matrix.
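For illustration, a numpy sketch of this construction, assuming the plane normal (a, b, c) has already been estimated (e.g. by RANSAC); it follows the cross-product and Rodrigues steps above and adds a guard for the degenerate case where the normal is already (anti)parallel to the Z axis.

import numpy as np

def rotation_normal_to_z(plane_normal: np.ndarray) -> np.ndarray:
    """Rotation matrix R such that R @ p1 aligns the fitted plane normal p1 with the Z axis."""
    p1 = plane_normal / np.linalg.norm(plane_normal)
    p2 = np.array([0.0, 0.0, 1.0])                # unit vector of the Z axis
    axis = np.cross(p1, p2)                       # rotation axis n (cross product of the construction above)
    s = np.linalg.norm(axis)                      # equals sin(theta) for unit vectors
    if s < 1e-12:                                 # p1 already (anti)parallel to Z
        return np.eye(3) if p1[2] > 0 else np.diag([1.0, -1.0, -1.0])
    n = axis / s
    cos_t = float(np.clip(np.dot(p1, p2), -1.0, 1.0))
    n_hat = np.array([[0.0, -n[2], n[1]],
                      [n[2], 0.0, -n[0]],
                      [-n[1], n[0], 0.0]])        # skew-symmetric matrix n^
    return cos_t * np.eye(3) + (1.0 - cos_t) * np.outer(n, n) + s * n_hat

# usage sketch: rotate a point cloud so the fitted floor normal points along +Z
# R = rotation_normal_to_z(np.array([a, b, c]))   # (a, b, c) from the fitted plane ax + by + cz + d = 0
# aligned = points @ R.T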
According to the semantic mapping method based on example segmentation and VSLAM, preferably, the specific operation of creating the background point cloud in each RGB-D keyframe point cloud in step S6 is as follows: and removing all example point clouds in the point cloud aiming at one RGB-D key frame point cloud to obtain the background point cloud in the RGB-D key frame point cloud.
According to the above semantic map construction method based on instance segmentation and VSLAM, the VSLAM algorithm in step S2 is preferably ORB-SLAM2.
According to the semantic mapping method based on example segmentation and VSLAM, the example segmentation model in step S3 is preferably a YOLACT model.
According to the above semantic map construction method based on instance segmentation and VSLAM, the specific operation of converting an RGB-D key frame into a point cloud in step S4 is preferably: for an RGB-D key frame, traverse each of its pixels, judge whether the pixel is valid according to its depth measurement, and, if it is valid, convert the pixel from the pixel coordinate system into a three-dimensional point in the world coordinate system; the three-dimensional points of all valid pixels in the world coordinate system are combined to form the point cloud of the RGB-D key frame.
According to the semantic map construction method based on example segmentation and VSLAM, preferably, the specific operation of converting the pixel point from the pixel coordinate system to the three-dimensional point in the world coordinate system is as follows:
(A) first convert the pixel into a three-dimensional point in the camera coordinate system according to the pinhole camera model. Let the pixel coordinate of pixel P be P_uv and its coordinate in the camera coordinate system be P_c; the pinhole camera model then gives:

Z P_{uv} = Z \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} P_c = K P_c

where f_x, f_y are the focal lengths of the camera, c_x, c_y are the offsets of the origin of the physical imaging coordinate system relative to the pixel coordinate system, Z is the depth of P, and K is the camera intrinsic matrix;

(B) let the coordinate of pixel P in the world coordinate system be P_w; P_c is the result of transforming P_w from the world coordinate system into the camera coordinate system according to the current camera pose. The camera pose is described by the rotation matrix R and the translation vector t of the camera relative to the world coordinate system, so that

Z P_{uv} = K (R P_w + t) = K T P_w

This formula describes the projection from the world coordinate of pixel P to its pixel coordinate, where R and t are the camera extrinsics, T is the homogeneous transformation matrix composed of R and t, K is the camera intrinsic matrix, and P_w is the world coordinate; multiplying P_w first by T and then by K yields the pixel coordinate of P. Conversely, reversing this process gives the transformation of pixel P from the pixel coordinate system to the world coordinate system:

P_w = T^{-1} K^{-1} Z P_{uv}
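For illustration, a numpy sketch of both directions of this transform, assuming the intrinsic matrix K and the camera pose (R, t) are known; the function names are illustrative.

import numpy as np

def pixel_to_world(u: float, v: float, depth: float,
                   K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Back-project pixel (u, v) with depth Z into the world frame: Pw = R^-1 (Z K^-1 Puv - t)."""
    p_uv = np.array([u, v, 1.0])
    p_c = depth * np.linalg.inv(K) @ p_uv        # point in the camera frame, Pc = Z * K^-1 * Puv
    return np.linalg.inv(R) @ (p_c - t)          # invert Pc = R Pw + t to get the world point

def world_to_pixel(p_w: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray):
    """Forward projection Z * Puv = K (R Pw + t); returns (u, v, Z)."""
    p_c = R @ p_w + t
    uvz = K @ p_c
    return uvz[0] / uvz[2], uvz[1] / uvz[2], uvz[2]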
compared with the prior art, the invention has the following positive beneficial effects:
(1) according to the method, semantic information is introduced through an end-to-end instance segmentation model, and instance segmentation can directly segment instances in the image.
(2) The method constructs the instance-level semantic map, is more practical compared with the pixel-level semantic map constructed by the existing method, and can be directly used on high-level tasks.
(3) The invention adopts a distributed architecture: modules with low computational cost are deployed on the mobile robot, while computation-heavy modules such as the deep learning model are deployed on a remote server. This greatly increases the generality of the invention; for example, the program can run in real time even on a mobile robot with limited computing resources.
(4) According to the method, when an example corresponding to the example point cloud is constructed, an observation dictionary is introduced into the attribute of the example, the observation dictionary is used for recording the class of the example judged by the example segmentation model and the occurrence frequency of the example in all RGB-D key frames, and the observation dictionary represents the confidence degree of whether the example is correctly modeled, so that the observation dictionary is used as auxiliary information when a high-level task (such as a grabbing task) of the robot is executed. Due to the introduction of the observation dictionary, the information of the semantic map is richer.
(5) Due to the precision problem of the example segmentation algorithm, the segmentation results of the same three-dimensional object predicted by different frames in the example segmentation may have larger difference, so that repeated examples cannot be completely fused when the example sets of different RGB-D key frames are fused. According to the method, the non-maximum value inhibition optimization processing is carried out on the example map, the redundant modeling example in the example map can be eliminated, and the construction quality of the example map can be improved; moreover, the purpose and effect of the invention to perform point cloud alignment processing on the example map is to align the constructed map to the world coordinate system.
(6) The invention can reduce the number of point clouds in the map by carrying out voxel filtering processing on the global background point clouds, thereby reducing the size of the occupied space of the map in the memory and reducing the calculation amount required when the map is processed.
Drawings
FIG. 1 is a schematic diagram of a semantic map construction method based on example segmentation and VSLAM in accordance with the present invention;
FIG. 2 is a pseudo code for fusing the sample sets in all RGB-D keyframe point clouds to the initial sample map in embodiment 1 of the present invention;
FIG. 3 is pseudo code for non-maxima suppression of an example map in embodiment 1 of the present invention;
FIG. 4 is a semantic mapping system based on example segmentation and VSLAM of the present invention.
Detailed Description
The present invention is further illustrated by the following specific examples, which are not intended to limit the scope of the invention.
Example 1:
a semantic mapping method based on instance segmentation and VSLAM, as shown in fig. 1, includes the following steps:
s1: and matching the RGB image and the depth map acquired from the RGB-D camera according to the time stamp to obtain RGB-D frames, wherein each RGB-D frame consists of the RGB image and the depth map with the same time stamp.
S2: and performing pose estimation on all RGB-D frames frame by adopting a VSLAM algorithm, finding out key frames in all RGB-D frames according to the pose estimation result, and recording the key frames as RGB-D key frames. Wherein, the VSLAM algorithm is preferably ORB-SLAM 2.
S3: and carrying out example segmentation on the RGB images in all the RGB-D key frames frame by adopting a trained example segmentation model to obtain an example segmentation result of each RGB-D key frame, wherein the example segmentation result comprises the semantic category and the sequence number of each example in the RGB images. Wherein the example segmentation model is preferably a YOLACT model.
S4: and converting the RGB-D key frame processed in the step S3 into a point cloud.
S5: creating an instance set in each RGB-D key frame point cloud, inserting the instance set in the first RGB-D key frame point cloud into a blank initial instance map according to the sequence of time stamps, and then sequentially fusing the instance sets in the remaining RGB-D key frame point clouds into the initial instance map to obtain the instance map (the pseudo code for fusing the instance sets in all the RGB-D key frame point clouds into the initial instance map is shown in FIG. 2).
S6: and creating background point clouds in each RGB-D key frame point cloud, and then fusing the background point clouds in all the RGB-D key frame point clouds to obtain a global background point cloud. Preferably, the specific operation of creating the background point cloud in each RGB-D keyframe point cloud is: and removing all example point clouds in the point cloud aiming at one RGB-D key frame point cloud to obtain the background point cloud in the RGB-D key frame point cloud.
S7: and respectively optimizing the example map and the global background point cloud, and then fusing the optimized example map and the optimized global background point cloud to obtain the semantic map. Preferably, the example map optimization method comprises the following steps: the non-maximum suppression processing is performed on the example map (the pseudo code for non-maximum suppression is shown in fig. 3), and then the point cloud alignment processing is performed to obtain the optimized example map. The optimization method of the global background point cloud comprises the following steps: processing the global background point cloud by adopting a voxel filtering algorithm (the voxel filtering algorithm is an algorithm known in the field), and then performing point cloud alignment processing to obtain the optimized global background point cloud.
The specific operation of converting the RGB-D key frame into a point cloud in step S4 is as follows:
traversing each pixel point of the RGB-D key frame aiming at the RGB-D key frame, judging whether the pixel point is effective or not according to the depth measurement value of each pixel point, if the pixel point is effective, converting the pixel point from a pixel coordinate system into three-dimensional points under a world coordinate system, and combining the three-dimensional points of all effective pixel points under the world coordinate system together to obtain the point cloud of the RGB-D key frame.
The specific operation of converting the pixel point from the pixel coordinate system to the three-dimensional point in the world coordinate system is as follows:
(A) first convert the pixel into a three-dimensional point in the camera coordinate system according to the pinhole camera model. Let the pixel coordinate of pixel P be P_uv and its coordinate in the camera coordinate system be P_c; the pinhole camera model then gives formula I:

Z P_{uv} = Z \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} P_c = K P_c    (formula I)

where f_x, f_y are the focal lengths of the camera, c_x, c_y are the offsets of the origin of the physical imaging coordinate system relative to the pixel coordinate system, Z is the depth of P, and K is the camera intrinsic matrix;

(B) let the coordinate of pixel P in the world coordinate system be P_w; P_c is the result of transforming P_w from the world coordinate system into the camera coordinate system according to the current camera pose. The camera pose is described by the rotation matrix R and the translation vector t of the camera relative to the world coordinate system, so that

Z P_{uv} = K (R P_w + t) = K T P_w    (formula II)

Formula II describes the projection from the world coordinate of pixel P to its pixel coordinate, where R and t are the camera extrinsics, T is the homogeneous transformation matrix composed of R and t, K is the camera intrinsic matrix, and P_w is the world coordinate; multiplying P_w first by T and then by K yields the pixel coordinate of P. Conversely, reversing this process gives the transformation of pixel P from the pixel coordinate system to the world coordinate system, shown as formula III:

P_w = T^{-1} K^{-1} Z P_{uv}    (formula III)
The specific operation of creating an instance set in each RGB-D keyframe point cloud in step S5 is:
s51: traversing each three-dimensional point in the point cloud aiming at an RGB-D key frame point cloud, and extracting each instance point cloud according to instance semantic categories and instance sequence numbers corresponding to the three-dimensional points. The semantic information of the three-dimensional points is represented by the colors of the three-dimensional points; the second channel value of the color of each three-dimensional point is the semantic category and the third channel value is the instance number.
S52: for an example point cloud, performing clustering segmentation processing on the example point cloud by using an Euclidean distance clustering segmentation algorithm (the algorithm is a known algorithm in the field), and then calculating the length, width, height and center coordinates of the example point cloud after clustering segmentation to obtain a three-dimensional surrounding frame of the example point cloud, namely completing the example construction corresponding to the example point cloud. The representation attributes of the constructed instance are: semantic type, sequence number, geometric information, point cloud and observation dictionary of the instance; the geometric information includes length, width, height, center coordinates of the instance; the key value pair of the observation dictionary is the serial number and the observation times of the example, and the observation times are the times of the example appearing in all RGB-D key frames; the observation dictionary is used for recording the types and the occurrence times of the instances judged by the instance segmentation model, and is an important reference for the fusion and the use of the subsequent instances. The example segmentation model can perform discontinuous segmentation on the same example, so that the obtained example point cloud is divided into a plurality of blocks, and discrete small block-shaped point clouds exist around the object due to example segmentation errors; the discrete and multiple pieces of example point clouds can greatly influence the construction of the example map, so that the method for clustering and partitioning the example point clouds by adopting the Euclidean distance clustering and partitioning algorithm is favorable for obtaining accurate examples.
S53: and combining all the examples constructed by the RGB-D key frame point cloud to obtain an example set of the RGB-D key frame.
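For illustration, a Python sketch of the extraction and clustering in steps S51-S52 under the color encoding described above (semantic category in the second channel, instance number in the third); the Euclidean clustering here is a simple KD-tree region growing used as a stand-in for the standard algorithm, and the tolerance and minimum cluster size are assumed values.

import numpy as np
from scipy.spatial import cKDTree

def extract_instance_point_clouds(points: np.ndarray, colors: np.ndarray):
    """Group 3D points by the (semantic category, instance number) stored in color channels 2 and 3."""
    instances = {}
    for p, c in zip(points, colors):
        key = (int(c[1]), int(c[2]))          # (semantic category, instance sequence number)
        instances.setdefault(key, []).append(p)
    return {k: np.asarray(v) for k, v in instances.items()}

def euclidean_cluster(points: np.ndarray, tolerance: float = 0.05, min_size: int = 50) -> np.ndarray:
    """Euclidean-distance clustering via KD-tree region growing; returns the largest cluster."""
    tree = cKDTree(points)
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            idx = frontier.pop()
            for nb in tree.query_ball_point(points[idx], tolerance):
                if nb in unvisited:
                    unvisited.remove(nb)
                    cluster.append(nb)
                    frontier.append(nb)
        if len(cluster) >= min_size:
            clusters.append(points[cluster])
    # keep the largest cluster as the instance body; small clusters are treated as segmentation noise
    return max(clusters, key=len) if clusters else np.empty((0, 3))

def bounding_box(points: np.ndarray):
    """Axis-aligned 3D bounding box: (min corner, max corner, center)."""
    mn, mx = points.min(axis=0), points.max(axis=0)
    return mn, mx, (mn + mx) / 2.0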
The specific operation of sequentially fusing the instance sets of the remaining RGB-D key frame point clouds into the initial instance map in step S5 is as follows:

For any instance A in the instance set of an RGB-D key frame, find the instance B in the initial instance map that has the largest IoU with instance A, and judge whether instance A and instance B need to be fused; if they need to be fused, fuse instance A and instance B into an instance C and insert instance C into the initial instance map; if they do not need to be fused, insert instance A directly into the initial instance map. The semantic category and sequence number of instance C are taken from whichever of instance A and instance B has the larger observation count; the geometric information of instance C is the extremum of the geometric information of instance A and instance B (the extremum may be a minimum or a maximum; for example, if minX is the minimum X coordinate of an instance in the world coordinate system, then the minX of instance C is the smaller of the minX of instance A and the minX of instance B, i.e. C.minX = MIN(A.minX, B.minX); likewise, if maxZ is the maximum Z coordinate of an instance in the world coordinate system, then the maxZ of instance C is the larger of the maxZ of instance A and the maxZ of instance B, i.e. C.maxZ = MAX(A.maxZ, B.maxZ)); the point cloud of instance C is the union of the point clouds of instance A and instance B.

More preferably, the condition for judging whether instance A and instance B need to be fused is: if the IoU of instance A and instance B is greater than 0.2 and their semantic categories are the same, or the IoU of instance A and instance B is greater than 0.5, instance A and instance B are judged to need fusing. Here IoU equals the volume of the intersection of instance A and instance B divided by the volume of their union, i.e.

IoU = \frac{V_{ab}}{V_a + V_b - V_{ab}}

where V_{ab} is the volume of the intersection of instance A and instance B, V_a is the volume of instance A, and V_b is the volume of instance B.
The specific operation of performing the non-maximum suppression processing on the example map in step S7 is:
For the instances in the instance map, compute the PIoU value of every pair of instances, and fuse the two instances if their PIoU is greater than a preset threshold Th. The PIoU is computed as follows: compute the volumes Vm and Vn of instances M and N, compute the volume Vmn of their intersection, compute the ratios Pm and Pn of Vmn to Vm and Vn respectively, and take the larger of Pm and Pn as the PIoU of instances M and N, as shown in formula IV:

PIoU = \max\left(\frac{V_{mn}}{V_m}, \frac{V_{mn}}{V_n}\right)    (formula IV)
the specific operation of the point cloud alignment processing in step S7 is:
Fit the largest plane in the point cloud with a random sample consensus (RANSAC) algorithm to obtain a fitted plane; construct a rotation matrix from the normal vector of the fitted plane, rotate the point cloud by this rotation matrix until the normal vector is aligned with the z axis, and translate the fitted plane along the z axis onto the xy plane of the world coordinate system; then, taking the xy plane as a partition plane, count the points on each side of the partition plane and decide whether to flip the point cloud: if there are fewer points above the partition plane than below it, flip the point cloud about the xy plane, which completes the point cloud alignment.
Preferably, the method for constructing the rotation matrix comprises the following steps:
(1) fit the largest plane in the point cloud with a random sample consensus algorithm to obtain a fitted plane whose equation is ax + by + cz + d = 0; denote the normal vector of the fitted plane as p1 = (x1, y1, z1), the unit vector of the Z axis as p2 = (x2, y2, z2) = (0, 0, 1), and the origin as p3 = (x3, y3, z3) = (0, 0, 0); the three points p1, p2, p3 define a plane p1-p2-p3, which is the plane of the rotation from p1 to p2, and the normal vector n of plane p1-p2-p3 is the axis of the rotation from p1 to p2; since n is perpendicular to both p1p2 and p1p3, n can be obtained from the cross product of p1p2 and p1p3, as shown in formula V:

n = \begin{vmatrix} i & j & k \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} = (a, b, c)

a = (y_2 - y_1)(z_3 - z_1) - (y_3 - y_1)(z_2 - z_1)
b = (z_2 - z_1)(x_3 - x_1) - (z_3 - z_1)(x_2 - x_1)
c = (x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)    (formula V)
Wherein i represents a direction vector on the X-axis, j represents a direction vector on the Y-axis, and k represents a direction vector on the Z-axis;
(2) compute the rotation angle θ of the rotation from p1 to p2 about the axis n; θ is obtained from the dot product, as shown in formula VI:

p_1 \cdot p_2 = |p_1|\,|p_2| \cos\theta

\theta = \arccos\left(\frac{p_1 \cdot p_2}{|p_1|\,|p_2|}\right)    (formula VI)
(3) compute the rotation matrix from Rodrigues' formula, as shown in formula VII:

R = \cos\theta \, I + (1 - \cos\theta)\, n n^{T} + \sin\theta \, n^{\wedge}    (formula VII)

where R is the rotation matrix, θ is the rotation angle, I is the identity matrix, T denotes the transpose, and the symbol ^ denotes the operator that converts a vector into its antisymmetric (skew-symmetric) matrix.
Example 2:
the embodiment provides a system for implementing the semantic mapping method based on instance segmentation and VSLAM described in embodiment 1, and as shown in fig. 4, the system adopts ROS as an implementation framework. The system is realized by a plurality of nodes, including a data source node, a SLAM node, an instance segmentation node, a graph building node and an interaction node. The data source node is used for reading and acquiring an RGB image and a depth map from the RGB-D camera or the data set, and matching the acquired RGB image and the depth map according to the time stamp to obtain an RGB-D frame; the SLAM node is used for carrying out pose estimation on all RGB-D frames frame by frame and finding out key frames in all the RGB-D frames according to a pose estimation result; the example segmentation node is used for carrying out example segmentation on the RGB images in all the RGB-D key frames frame by frame to obtain an example segmentation result of each RGB-D key frame; the map building node is used for extracting the examples and the background point clouds in all RGB-D key frames according to the example segmentation result and the estimated pose to obtain an example map and a global background point cloud, and then fusing the example map and the optimized global background point cloud to obtain a semantic map; the interactive node is responsible for the interaction of the user with the system.
The above description only illustrates the preferred embodiments of the present invention and is not to be construed as limiting the invention; all modifications, equivalents and improvements made within the spirit and scope of the present invention are intended to be covered by it.

Claims (10)

1. A semantic map construction method based on instance segmentation and VSLAM is characterized by comprising the following steps:
s1: matching the RGB image and the depth map acquired from the RGB-D camera according to the time stamp to obtain RGB-D frames, wherein each RGB-D frame consists of the RGB image and the depth map with the same time stamp;
s2: performing pose estimation on all RGB-D frames frame by adopting a VSLAM algorithm, selecting a key frame in the RGB-D frames according to a pose estimation result, and recording the key frame as an RGB-D key frame;
s3: carrying out example segmentation on the RGB images in all the RGB-D key frames frame by adopting a trained example segmentation model to obtain an example segmentation result of each RGB-D key frame, wherein the example segmentation result comprises semantic categories and sequence numbers of each example in the RGB images;
s4: converting the RGB-D key frame processed in the step S3 into a point cloud;
s5: creating an example set in each RGB-D key frame point cloud, inserting the example set in the first RGB-D key frame point cloud into a blank initial example map according to the sequence of time stamps, and then sequentially fusing the example sets in the remaining RGB-D key frame point clouds into the initial example map to obtain an example map;
s6: creating a background point cloud in each RGB-D key frame point cloud, and then fusing the background point clouds in all the RGB-D key frame point clouds to obtain a global background point cloud;
s7: and respectively optimizing the example map and the global background point cloud, and then fusing the optimized example map and the optimized global background point cloud to obtain the semantic map.
2. The method for semantic mapping according to claim 1, wherein the specific operations of creating the instance set in each RGB-D key frame point cloud in step S5 are as follows:
s51: traversing each three-dimensional point in the point cloud aiming at an RGB-D key frame point cloud, and extracting each instance point cloud according to instance semantic categories and instance sequence numbers corresponding to the three-dimensional points;
s52: for an example point cloud, carrying out clustering segmentation processing on the example point cloud by adopting an Euclidean distance clustering segmentation algorithm, and then calculating the length, width, height and center coordinates of the example point cloud after clustering segmentation to obtain a three-dimensional surrounding frame of the example point cloud, namely completing the construction of an example corresponding to the example point cloud;
s53: and combining all the examples constructed by the RGB-D key frame point cloud to obtain an example set of the RGB-D key frame.
3. The method for semantic mapping according to example segmentation and VSLAM as claimed in claim 2, wherein the representation attributes of the example in step S52 are: semantic type, sequence number, geometric information, point cloud and observation dictionary of the instance; the geometric information includes length, width, height, center coordinates of the instance; and the key value pair of the observation dictionary is the serial number of the example and the observation times, and the observation times are the times of the example appearing in all RGB-D key frames.
4. The semantic map construction method based on instance segmentation and VSLAM as claimed in claim 3, wherein the specific operation of sequentially fusing the instance sets in the remaining RGB-D key frame point clouds into the initial instance map in step S5 is as follows:
aiming at any instance A in an RGB-D key frame instance set, finding an instance B with the maximum IoU of the instance A in an initial instance map, and judging whether the instance A and the instance B need to be fused or not; judging that if the instance A and the instance B need to be fused, fusing the instance A and the instance B to obtain an instance C, and inserting the instance C into the initial instance map; judging that the instance A is directly inserted into the initial instance map if the instance A and the instance B do not need to be fused; the condition for judging whether the instance A and the instance B need to be fused is as follows: if IoU >0.2 and the semantic category is the same for instance A and instance B, or IoU >0.5 for instance A and instance B, then it is determined that instance A and instance B need to be fused.
5. The method for semantic mapping according to claim 4, wherein the semantic category and the sequence number of the instance C are consistent with those of the instances A and B with the largest observation times; the geometric information of the example C is an extremum of the geometric information of the example A and the example B; the point cloud of the example C is the sum of the point clouds of the example A and the example B.
6. The semantic mapping method according to claim 1, wherein the example map is optimized in step S7 by: firstly, carrying out non-maximum value inhibition processing on an example map, and then carrying out point cloud alignment processing to obtain an optimized example map; the method for optimizing the global background point cloud comprises the following steps: and processing the global background point cloud by adopting a voxel filtering algorithm, and then performing point cloud alignment processing to obtain the optimized global background point cloud.
7. The method of claim 6, wherein the non-maximum suppression process comprises the following specific steps: calculating the PIoU value of any two instances in the instance map, and fusing the two instances if the PIoU value is greater than a preset threshold Th; the PIoU is calculated as follows: calculating the volumes Vm and Vn of instances M and N, calculating the volume Vmn of the intersection of instances M and N, calculating the ratios Pm and Pn of Vmn relative to Vm and Vn, and selecting the maximum of Pm and Pn, which is the PIoU value of instances M and N.
8. The method for semantic mapping based on instance segmentation and VSLAM according to claim 6, wherein the point cloud alignment process is specifically operated as follows: fitting the maximum plane in the point cloud by adopting a random sampling consistency algorithm to obtain a fitted plane; constructing a rotation matrix according to the normal vector of the fitting plane, rotating the point cloud according to the rotation matrix until the normal vector is aligned with the z axis, and translating the fitting plane to an xy plane in a world coordinate system along the z axis; and then, taking the xy plane as a partition plane, calculating the number of the point clouds on two sides of the partition plane, judging whether to turn the point clouds or not according to the number of the point clouds on two sides of the partition plane, and if the number of the point clouds above the partition plane is smaller than that below the partition plane, turning the point clouds along the xy plane, namely finishing the point cloud alignment treatment.
9. The method of claim 8, wherein the rotation matrix is constructed by:
(1) fitting the largest plane in the point cloud with a random sample consensus algorithm to obtain a fitted plane whose equation is ax + by + cz + d = 0; denoting the normal vector of the fitted plane as p1 = (x1, y1, z1), the unit vector of the Z axis as p2 = (x2, y2, z2) = (0, 0, 1), and the origin as p3 = (x3, y3, z3) = (0, 0, 0); the three points p1, p2, p3 define a plane p1-p2-p3, which is the plane of the rotation from p1 to p2, and the normal vector n of plane p1-p2-p3 is the axis of the rotation from p1 to p2; since n is perpendicular to both p1p2 and p1p3, n is obtained from the cross product of p1p2 and p1p3, as shown below:

n = \begin{vmatrix} i & j & k \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} = (a, b, c)

a = (y_2 - y_1)(z_3 - z_1) - (y_3 - y_1)(z_2 - z_1)
b = (z_2 - z_1)(x_3 - x_1) - (z_3 - z_1)(x_2 - x_1)
c = (x_2 - x_1)(y_3 - y_1) - (x_3 - x_1)(y_2 - y_1)
wherein i represents a direction vector on the X-axis, j represents a direction vector on the Y-axis, and k represents a direction vector on the Z-axis;
(2) calculating the rotation angle θ of the rotation from p1 to p2 about the axis n; θ is obtained from the dot product as follows:

p_1 \cdot p_2 = |p_1|\,|p_2| \cos\theta

\theta = \arccos\left(\frac{p_1 \cdot p_2}{|p_1|\,|p_2|}\right)
(3) calculating the rotation matrix from Rodrigues' formula:

R = \cos\theta \, I + (1 - \cos\theta)\, n n^{T} + \sin\theta \, n^{\wedge}

where R is the rotation matrix, θ is the rotation angle, I is the identity matrix, T denotes the transpose, and the symbol ^ denotes the operator that converts a vector into its antisymmetric (skew-symmetric) matrix.
10. The semantic mapping method based on example segmentation and VSLAM according to claim 1, wherein the specific operation of creating the background point cloud in each RGB-D keyframe frame point cloud in step S6 is: and removing all example point clouds in the frame point cloud aiming at one RGB-D key frame point cloud to obtain the background point cloud in the RGB-D key frame point cloud.
Application CN202111176088.9A (priority date 2021-10-09, filing date 2021-10-09): Semantic map construction method based on instance segmentation and VSLAM. Status: Active; granted as CN113916245B (en).

Priority Applications (1)

Application Number: CN202111176088.9A; Priority Date: 2021-10-09; Filing Date: 2021-10-09; Title: Semantic map construction method based on instance segmentation and VSLAM (granted as CN113916245B)

Applications Claiming Priority (1)

Application Number: CN202111176088.9A; Priority Date: 2021-10-09; Filing Date: 2021-10-09; Title: Semantic map construction method based on instance segmentation and VSLAM

Publications (2)

Publication Number: CN113916245A (en); Publication Date: 2022-01-11
Publication Number: CN113916245B (en); Publication Date: 2024-07-19

Family

ID=79238663

Family Applications (1)

Application Number: CN202111176088.9A; Title: Semantic map construction method based on instance segmentation and VSLAM; Priority Date: 2021-10-09; Filing Date: 2021-10-09; Status: Active, granted as CN113916245B (en)

Country Status (1)

Country: CN; Publication: CN113916245B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130314403A1 (en) * 2012-05-22 2013-11-28 Hon Hai Precision Industry Co., Ltd. Method for splicing point clouds together
US20190226852A1 (en) * 2016-09-09 2019-07-25 Nanyang Technological University Simultaneous localization and mapping methods and apparatus
US20190370983A1 (en) * 2017-02-10 2019-12-05 SZ DJI Technology Co., Ltd. System and method for real-time location tracking of a drone
WO2020125495A1 (en) * 2018-12-17 2020-06-25 中国科学院深圳先进技术研究院 Panoramic segmentation method, apparatus and device
CN109724603A (en) * 2019-01-08 2019-05-07 北京航空航天大学 An Indoor Robot Navigation Method Based on Environmental Feature Detection
CN109816686A (en) * 2019-01-15 2019-05-28 山东大学 Robot semantic SLAM method, processor and robot based on object instance matching
CN110243370A (en) * 2019-05-16 2019-09-17 西安理工大学 A 3D Semantic Map Construction Method for Indoor Environment Based on Deep Learning
CN110728751A (en) * 2019-06-19 2020-01-24 武汉科技大学 A Construction Method of Indoor 3D Point Cloud Semantic Map
CN111179426A (en) * 2019-12-23 2020-05-19 南京理工大学 Construction method of 3D semantic map of robot indoor environment based on deep learning
CN111060888A (en) * 2019-12-31 2020-04-24 芜湖哈特机器人产业技术研究院有限公司 Mobile robot repositioning method fusing ICP and likelihood domain model
CN111402336A (en) * 2020-03-23 2020-07-10 中国科学院自动化研究所 Semantic S L AM-based dynamic environment camera pose estimation and semantic map construction method
CN111563442A (en) * 2020-04-29 2020-08-21 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
刘志超;顾广杰;: "多站激光扫描点云配准算法研究", 海洋测绘, no. 05, 25 September 2018 (2018-09-25) *
刘忠泽;陈慧岩;崔星;熊光明;王羽纯;陶溢;: "无人平台越野环境下同步定位与地图创建", 兵工学报, no. 12, 15 December 2019 (2019-12-15) *
张东;黄腾;: "基于平面特征的地面雷达点云配准算法", 测绘科学, vol. 40, no. 11, 30 November 2015 (2015-11-30) *
赵矿军;: "基于RGB-D摄像机的室内三维彩色点云地图构建", 哈尔滨商业大学学报(自然科学版), no. 01, 15 February 2018 (2018-02-15) *
黄疆坪;丛杨;高宏伟;唐延东;于海斌;: "基于字典选择的机器人在线场景语义浓缩", 科学通报, no. 2, 20 December 2013 (2013-12-20), pages 112 - 120 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612494A (en) * 2022-03-11 2022-06-10 南京理工大学 Design method of mobile robot vision odometer in dynamic scene
CN114882182A (en) * 2022-04-22 2022-08-09 东南大学 Semantic map construction method based on vehicle-road cooperative sensing system

Also Published As

Publication Number: CN113916245B (en); Publication Date: 2024-07-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant