
CN109544677B - Indoor scene main structure reconstruction method and system based on depth image key frame - Google Patents


Info

Publication number
CN109544677B
CN109544677B (application CN201811278361.7A; published as CN109544677A)
Authority
CN
China
Prior art keywords: depth image, point, frame, camera, coordinate system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811278361.7A
Other languages
Chinese (zh)
Other versions
CN109544677A (en)
Inventor
周元峰
高凤毅
张彩明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN201811278361.7A
Publication of CN109544677A
Application granted
Publication of CN109544677B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20024 - Filtering details
    • G06T2207/20028 - Bilateral filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A method and system for reconstructing the main structure of an indoor scene based on depth image key frames. A depth image is acquired from a depth camera and processed into corresponding point cloud data, from which normal vectors are obtained; the camera pose matrix of the current frame is calculated; whether the current frame depth image is a key frame depth image is judged, and if so, it is added to the key frame sequence; for each frame depth image added to the key frame sequence, a main-structure plane equation set is calculated and converted from the camera coordinate system to the world coordinate system based on the camera pose matrix; the newly added main-structure plane equation set is registered and fused with the current main-structure plane equation set; and once all frames in the key frame sequence have been processed, the main structure of the indoor scene is reconstructed from the finally fused main-structure plane equation set.

Description

Indoor scene main structure reconstruction method and system based on depth image key frame
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a method and a system for reconstructing a main structure of an indoor scene based on a depth image key frame.
Background
The Kinect sensor is an RGB-D sensor that captures the color and depth values of the environment simultaneously. Its fast acquisition, high precision, and low price have led to applications in many fields. Object recognition, scene segmentation, and three-dimensional reconstruction of indoor scenes based on depth images have likewise become research hotspots.
The depth map obtained by the Kinect camera can be mapped into space to form a set of discrete three-dimensional point clouds; camera pose estimation and registration fusion are carried out on these point clouds, and the reconstruction of a three-dimensional model of the indoor scene is then progressively refined.
Resolving the main structure of an indoor scene from a single-frame depth image provides a basis for later locating beds, tables, doors, and windows in the scene. In recent years several researchers have proposed methods for analyzing the main structure of indoor scenes, but because the limited field of view of a single frame cannot reliably identify the true main structure of a scene, part of the analysis results are wrong, which in turn degrades later object recognition.
In recent years, three-dimensional reconstruction based on various depth cameras, images, and unordered point clouds has become a very popular research direction in computer graphics and computer vision. SLAM (Simultaneous Localization and Mapping) has become a research hotspot in robotics; its aim is to build a map of the surrounding physical environment while tracking the user or robot. Scanning indoor environments with a Kinect sensor to obtain depth data and modelling them has been studied by many scholars. Henry P, Krainin M, Herbst E, et al. (University of Washington and Microsoft Research), "RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments" [C], developed a visual SLAM system based on SIFT (Scale-Invariant Feature Transform) feature matching for localization and the TORO (Tree-based netwORk Optimizer) optimization algorithm to build a 3D point cloud map. Newcombe R A, Izadi S, Hilliges O, et al. "KinectFusion: Real-time dense surface mapping and tracking" [C] // Mixed and Augmented Reality (ISMAR), 2011 10th IEEE International Symposium on. IEEE, 2011: 127-136. Kähler O, Prisacariu V A, Ren C Y, et al. "Very high frame rate volumetric integration of depth images on mobile devices" [J]. IEEE Transactions on Visualization and Computer Graphics, 2015, 21(11): 1241-1250, present a voxel-hashing-based indoor scene reconstruction algorithm that reduces memory consumption through voxel block hashing and uses GPU acceleration to achieve very high real-time performance.
Several researchers have proposed methods for analyzing the main structure of indoor scenes. Lee D C, Hebert M, Kanade T. "Geometric reasoning for single image structure recovery" [C] // Computer Vision and Pattern Recognition (CVPR 2009). IEEE, 2009: 2136-2143; Taylor C J, Cowley A. "Parsing indoor scenes using RGB-D imagery" [C] // Robotics: Science and Systems, 2013, 8: 401-408, start from RGB image segmentation, fuse the plane segmentation results, group the images, and finally cast the extraction of the wall distribution as an optimal-labelling problem solved by dynamic programming; Zhou Y, Xu H, Pan X, et al. "Matching main structures of indoor scenes from a single RGB-D image" [J]. Journal of Advanced Mechanical Design, Systems, and Manufacturing, 2016, 10(8): JAMDSM0102, use a clustering-based method to perform plane fitting and generate the final main structure. Because of the limitations of a single-frame image, all of the above methods can extract an erroneous main structure.
Disclosure of Invention
In order to overcome the defects of the prior art, the embodiments of the application provide an indoor scene main structure reconstruction method based on depth image key frames: extraction of the main structure from a single-frame depth image is extended to multiple frames, the transformation matrix of each frame is calculated using the camera pose estimation of the KinectFusion algorithm, the main-structure plane equations of multiple key frames are registered and fused, and a three-dimensional model of the indoor scene's main structure is generated.
In a first aspect, a method for reconstructing the main structure of an indoor scene based on depth image key frames is provided.
The method for reconstructing the main structure of an indoor scene based on depth image key frames comprises the following steps:
step (1): acquiring a depth image: acquiring a depth image from a depth camera, processing the depth image to obtain corresponding point cloud data, and obtaining normal vectors from the point cloud data;
step (2): based on the point cloud data and normal vectors, searching for the pixel points of the previous frame depth image that correspond to the pixel points of the current frame depth image through a projection algorithm, and calculating the camera pose matrix of the current frame by minimizing the distance from the pixel points of the current frame depth image to the tangent planes of the corresponding pixel points of the previous frame depth image;
step (3): judging whether the current frame depth image is a key frame depth image, and if so, adding it to the key frame sequence;
step (4): calculating a main-structure plane equation set for each frame depth image added to the key frame sequence, and converting the main-structure plane equations from the camera coordinate system to the world coordinate system based on the camera pose matrix;
step (5): registering and fusing the newly added main-structure plane equation set with the current main-structure plane equation set;
step (6): reconstructing the main structure of the indoor scene from the finally fused main-structure plane equation set once all frames in the key frame sequence have been processed.
Optionally, in some possible implementations, the step (1) includes:
a step (101): calibrating the Kinect depth camera to obtain internal parameters of the Kinect depth camera;
a step (102): scanning an indoor scene by using a Kinect depth camera to obtain a depth image;
step (103): performing noise reduction on the obtained depth image with a bilateral filtering algorithm, calculating the point-cloud three-dimensional coordinate corresponding to each pixel of the noise-reduced depth image from the internal parameters of the Kinect depth camera, thereby obtaining three-dimensional point cloud data, denoted $V_i$, where i represents the i-th frame;
step (104): computing the normal vector $N_i(x,y)$ of a pixel (x, y):
$N_i(x,y) = (V_i(x+1,y) - V_i(x,y)) \times (V_i(y+1,x) - V_i(x,y))$;
where $\times$ denotes the cross product, $V_i(x+1,y)$ is the three-dimensional point cloud data of pixel (x+1, y) of the i-th frame image, $V_i(x,y)$ is the three-dimensional point cloud data of pixel (x, y) of the i-th frame image, and $V_i(y+1,x)$ is the three-dimensional point cloud data of pixel (y+1, x) of the i-th frame image;
the normal vectors of all pixel points in the current frame are then computed in the same way.
Optionally, in some possible implementations, the step (2) includes:
step (201): determining the relation between corresponding pixel points of two frames of depth images at adjacent moments by using a projection method;
step (202): measuring the accuracy of the current relative pose by the total error over all corresponding pixel points;
step (203): judging whether the set number of iterations has been reached; if so, proceeding to step (204), and if not, repeating steps (201)-(202);
a step (204): and obtaining an optimal relative pose matrix.
Optionally, in some possible implementations, the step (201) includes:
a camera samples objects in the same environment at two positions; $O_k$ denotes the origin of the camera coordinate system at time k, and $O_{k-1}$ denotes the origin of the camera coordinate system at time k-1;
a point in the world coordinate system at time k-1 is converted through the rotation matrix into a three-dimensional point coordinate and its normal vector in the camera coordinate system at time k-1; the three-dimensional point in the camera coordinate system at time k-1 is then converted with the camera internal parameters into a pixel point P in the image coordinate system at time k-1;
the pixel point P with the same pixel coordinate at time k is found, and its three-dimensional point coordinate and normal vector are calculated with the rotation matrix at time k;
it is judged whether the two three-dimensional point coordinates corresponding to pixel point P at time k-1 and time k are consistent;
it is judged whether the two normal vectors corresponding to pixel point P at time k-1 and time k are consistent;
if the two three-dimensional point coordinates corresponding to pixel point P are consistent and the two normal vectors corresponding to pixel point P are consistent, then the pixel point P at time k-1 and at time k are corresponding points.
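As an illustration of step (201), the sketch below (an assumption-laden reading, not the patent's code) transforms a point of frame k into the camera of frame k-1, projects it with an intrinsic matrix K to a pixel, and accepts the point stored there as the correspondence only if the 3D distance and the normal angle are consistent; dist_thresh and angle_thresh_deg are illustrative values.

```python
import numpy as np

def find_correspondence(p_k, n_k, T_k, T_km1, V_km1, N_km1, K,
                        dist_thresh=0.05, angle_thresh_deg=20.0):
    """p_k, n_k: point/normal of frame k (camera frame); T_k, T_km1: 4x4 poses;
    V_km1, N_km1: (h, w, 3) vertex/normal maps of frame k-1; K: 3x3 intrinsics."""
    # Transform the frame-k point into world, then into the frame k-1 camera.
    p_w = T_k[:3, :3] @ p_k + T_k[:3, 3]
    T_km1_inv = np.linalg.inv(T_km1)
    p_cam = T_km1_inv[:3, :3] @ p_w + T_km1_inv[:3, 3]
    if p_cam[2] <= 0:
        return None
    # Project to pixel coordinates in frame k-1.
    u = int(round(K[0, 0] * p_cam[0] / p_cam[2] + K[0, 2]))
    v = int(round(K[1, 1] * p_cam[1] / p_cam[2] + K[1, 2]))
    h, w = V_km1.shape[:2]
    if not (0 <= u < w and 0 <= v < h):
        return None
    q = V_km1[v, u]                      # candidate corresponding point (camera k-1)
    nq = N_km1[v, u]
    # Consistency checks in world coordinates: 3D distance and normal angle.
    q_w = T_km1[:3, :3] @ q + T_km1[:3, 3]
    n_w = T_k[:3, :3] @ n_k
    nq_w = T_km1[:3, :3] @ nq
    if np.linalg.norm(p_w - q_w) > dist_thresh:
        return None
    if np.degrees(np.arccos(np.clip(n_w @ nq_w, -1.0, 1.0))) > angle_thresh_deg:
        return None
    return q_w, nq_w
```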
Optionally, in some possible implementations, the step (202) of calculating the total error includes:
the error between corresponding pixel points $P_1$ and $P_2$ is the distance from $P_1$ to the tangent at $P_2$;
the total error $E(T_{g,k})$ over all corresponding pixel points is calculated as

$$E(T_{g,k}) = \sum_{\Omega_k(u) \neq \mathrm{null}} \left\| \left( T_{g,k}\,\dot{V}_k(u) - \hat{V}_{g,k-1}(\hat{u}) \right)^{T} \hat{N}_{g,k-1}(\hat{u}) \right\|_2 \qquad (1)$$

where $\Omega_k(u) \neq \mathrm{null}$ indicates that a point u in the point cloud at time k has a corresponding point; $T_{g,k}$ is a 4 x 4 pose matrix representing the absolute pose of the camera at time k in the world coordinate system, which is defined as the camera coordinate system of the first frame; $\dot{V}_k(u)$ is the vertex coordinate of point u in the image frame at time k; $\hat{V}_{g,k-1}(\hat{u})$ is the vertex coordinate of the corresponding point of u in the image frame at time k-1; $\hat{N}_{g,k-1}(\hat{u})$ is the normal vector of the corresponding point; and $\hat{u}$ is the corresponding point of u in the image at time k-1.
Optionally, in some possible implementations, step (204): the optimal solution x is obtained by solving the linear system of equations of formula (2) by least squares, from which the optimal relative pose matrix $T_{g,k}$ is obtained:

$$\sum_{\Omega_k(u) \neq \mathrm{null}} \left( G^{T}(u)\,\hat{N}_{g,k-1}(\hat{u}) \right) \left( \hat{N}_{g,k-1}^{T}(\hat{u})\,G(u) \right) x = \sum_{\Omega_k(u) \neq \mathrm{null}} G^{T}(u)\,\hat{N}_{g,k-1}(\hat{u})\,\hat{N}_{g,k-1}^{T}(\hat{u}) \left( \hat{V}_{g,k-1}(\hat{u}) - \tilde{V}_{g,k}^{z-1}(u) \right) \qquad (2)$$

where
$x = (\beta, \gamma, \alpha, t_x, t_y, t_z)^{T} \in R^{6}$,
$\tilde{V}_{g,k}^{z-1}(u) = \tilde{T}_{g,k}^{z-1}\,\dot{V}_k(u)$,
$\tilde{T}_{g,k}^{z-1}$ represents the pose matrix obtained by the (z-1)-th iteration, and $\dot{V}_k(u)$ is the homogeneous representation of the point cloud coordinates of pixel u at time k;
$T_{g,k}$ is the camera pose matrix at time k, through which points in the camera coordinate system at time k are converted into points of the global coordinate system; the pose matrix $T_{g,k}$ comprises a rotation matrix $R_{g,k}$ and a translation matrix $t_{g,k}$; the parameters $\beta$, $\alpha$, $\gamma$ of the rotation matrix represent the degrees of rotation about the three coordinate axes X, Y and Z, and the parameters $t_x$, $t_y$, $t_z$ of the translation matrix represent the distances moved along the three coordinate axes X, Y and Z;
$\hat{N}_{g,k}(u)$ and $\hat{V}_{g,k}(u)$ denote the normal map and point cloud map of pixel u in the world coordinate system at time k;
$\hat{N}_{g,k-1}(\hat{u})$ and $\hat{V}_{g,k-1}(\hat{u})$ denote the normal map and point cloud map, in the world coordinate system at time k-1, of the corresponding point $\hat{u}$ of pixel u;
$\Omega_k(u)$ denotes the set of corresponding points of u found by the projection method at time k-1;
$\hat{u}$ is the pixel coordinate obtained by transforming the three-dimensional point corresponding to u in the coordinate system at time k into the coordinate system at time k-1 and projecting it into the pixel coordinate system at time k-1;
$\tilde{T}_{g,k}^{z-1}$ denotes the camera pose matrix solved at the (z-1)-th iteration, $\dot{V}_k(u)$ is the homogeneous representation of the point cloud coordinates of pixel u at time k, and $\tilde{V}_{g,k}^{z-1}(u)$ is the three-dimensional point coordinate, in the world coordinate system, of pixel u of the depth map at time k after z-1 iterations;
$G(u)$ is a 3 x 6 matrix whose first three columns are the skew-symmetric matrix $[\tilde{V}_{g,k}^{z-1}(u)]_{\times}$ and whose last three columns are the 3 x 3 identity matrix; $G^{T}(u)$ is the transpose of $G(u)$; $I_{3\times 3}$ is the 3 x 3 identity matrix.
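To make the linearized system of formula (2) concrete, the sketch below (a non-authoritative illustration, not the patent's code) accumulates the 6 x 6 normal equations from G(u) = [[p]_x I] and the corresponding normals, solves for x = (beta, gamma, alpha, tx, ty, tz), and assembles the small-angle incremental transform; all function and variable names are assumptions.

```python
import numpy as np

def skew(p):
    """Skew-symmetric matrix [p]_x such that [p]_x @ w == np.cross(p, w)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def solve_point_to_plane(src_pts, dst_pts, dst_normals):
    """src_pts: frame-k points already mapped by the previous pose estimate;
    dst_pts/dst_normals: their correspondences from frame k-1 (world frame)."""
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for p, q, n in zip(src_pts, dst_pts, dst_normals):
        G = np.hstack((skew(p), np.eye(3)))      # 3x6 matrix [ [p]_x  I ]
        a = G.T @ n                              # 6-vector (N^T G)^T
        A += np.outer(a, a)
        b += a * (n @ (q - p))
    x = np.linalg.solve(A, b)                    # (beta, gamma, alpha, tx, ty, tz)
    beta, gamma, alpha, tx, ty, tz = x
    # Small-angle incremental rotation assembled from x (small-rotation assumption).
    R_inc = np.array([[1.0, alpha, -gamma],
                      [-alpha, 1.0, beta],
                      [gamma, -beta, 1.0]])
    t_inc = np.array([tx, ty, tz])
    return R_inc, t_inc
```

The per-point term a = G(u)^T N is exactly the left factor of formula (2), so summing outer(a, a) and a * N^T (q - p) reproduces its left- and right-hand sides before solving.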
Optionally, in some possible implementations, the step (3) includes:
judging whether the rotation transformation between the frame to be added to the key frame sequence and the current frame of the key frame sequence is greater than a threshold $\theta_0$; if so, the frame is added to the key frame sequence, otherwise it is not added;
$T_{g,i}^{-1} \cdot T_{g,k} > \theta_0 \qquad (7)$
the current frame in the key frame sequence is taken as the k-th frame and the frame to be newly added is taken as the i-th frame; if formula (7) is satisfied, the i-th frame is a key frame and is added to the key frame sequence; if formula (7) is not satisfied, the i-th frame is not added to the key frame sequence, and it is judged whether the rotation transformation of the (i+1)-th frame is greater than the threshold $\theta_0$; if 10 consecutive newly arriving frames do not exceed $\theta_0$, the 10th frame is added to the key frame sequence.
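A possible reading of the key-frame test of formula (7) is sketched below: the relative pose between the last key frame and the candidate frame is computed and its rotation angle is compared against theta_0, with the fallback of forcing a key frame after 10 consecutive rejections as described above. Interpreting the matrix comparison as a rotation-angle check, and the numeric threshold, are assumptions.

```python
import numpy as np

def rotation_angle(T_rel):
    """Rotation angle (radians) of a 4x4 relative pose matrix."""
    R = T_rel[:3, :3]
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_theta)

def select_keyframes(poses, theta_0=np.deg2rad(15.0), max_skip=10):
    """poses: list of 4x4 camera pose matrices T_{g,i}, one per frame.
    theta_0 is an assumed example value, not taken from the patent."""
    keyframes = [0]                  # the first frame starts the sequence
    skipped = 0
    for i in range(1, len(poses)):
        T_rel = np.linalg.inv(poses[keyframes[-1]]) @ poses[i]
        if rotation_angle(T_rel) > theta_0 or skipped + 1 >= max_skip:
            keyframes.append(i)      # large rotation, or 10th consecutive frame
            skipped = 0
        else:
            skipped += 1
    return keyframes
```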
Optionally, in some possible implementations, the specific step of calculating the main structure plane equation set for each depth image added to the sequence of key frames in step (4) is:
a step (401): clustering normal vectors of all points to generate a plurality of core normal vectors;
step (402): denoising the clustering result, and removing points which have a direction difference of more than 30 degrees with the core normal vector in the same type of clustering result;
step (403): establishing a distance histogram for the projection distance of each type of point on the core normal vector according to the clustering result;
a step (404): and searching the maximum plane from large to small in the distance histograms of each type, and performing weighted least square fitting on the three-dimensional point cloud in the maximum plane to obtain a main structure plane of the frame depth image.
Optionally, in some possible implementations, the obtaining the main structure plane of the frame depth image includes:
$\min\; w_{u,v}\,(A x_{u,v} + B y_{u,v} + C z_{u,v} + D)^2, \quad (u,v) \in h_f$,
where $h_f$ denotes the points of the denoised core region; $w_{u,v}$ is a weight that gives a point more influence in the plane fit the closer the normal vector of point (u, v) is to the core normal; and A, B, C and D are the parameters of the plane to be fitted.
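The weighted least-squares plane fit of step (404) can be sketched as follows (not the patent's implementation): under the normalization A^2 + B^2 + C^2 = 1, the optimal normal is the eigenvector of the weighted covariance matrix with the smallest eigenvalue, and the weights are assumed to come from how close each point's normal is to the core normal.

```python
import numpy as np

def fit_plane_weighted(points, weights):
    """points: (n, 3) array of 3D points; weights: (n,) nonnegative weights.
    Returns (A, B, C, D) with A^2 + B^2 + C^2 = 1."""
    w = weights / weights.sum()
    centroid = (w[:, None] * points).sum(axis=0)
    centered = points - centroid
    cov = (w[:, None] * centered).T @ centered        # weighted 3x3 covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]                            # smallest-eigenvalue direction
    D = -normal @ centroid                            # plane passes through centroid
    return normal[0], normal[1], normal[2], D

# Example weighting (an assumption): cosine similarity to the core normal.
# w = np.clip(point_normals @ core_normal, 0.0, None)
# A, B, C, D = fit_plane_weighted(points, w)
```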
Optionally, in some possible implementations, the specific step of converting the main structure plane equation to the world coordinate system in step (4) is:
$A_{k,i}\,x + B_{k,i}\,y + C_{k,i}\,z + D_{k,i} = 0$;
$A_{k,i}^{2} + B_{k,i}^{2} + C_{k,i}^{2} = 1$;
$i \in R^{+},\; 1 \le i \le n$;
where $A_{k,i}, B_{k,i}, C_{k,i}, D_{k,i} \in R^{n}$ and n denotes the number of main-structure planes in the frame depth image;
a plane equation is determined by the plane normal vector and any point on the plane; the plane equation in the camera coordinate system is converted into the world coordinate system to obtain its plane parameter equation in the world coordinate system;
the normal vector is converted from the camera coordinate system to the world coordinate system using the rotation matrix $R_{g,k}$:
$[A_{g,k}, B_{g,k}, C_{g,k}]^{T} = R_{g,k}\,[A_k, B_k, C_k]^{T} \qquad (3)$
a point in the camera coordinate system is converted to the world coordinate system using the rotation matrix $R_{g,k}$ and the translation matrix $t_{g,k}$:
$[X_{g,k}, Y_{g,k}, Z_{g,k}]^{T} = R_{g,k}\,[-D_k A_k, -D_k B_k, -D_k C_k]^{T} + t_{g,k} \qquad (4)$
where $[-D_k A_k, -D_k B_k, -D_k C_k]^{T}$ is a point located on the plane;
$D_{g,k} = -[A_{g,k}, B_{g,k}, C_{g,k}]\,[X_{g,k}, Y_{g,k}, Z_{g,k}]^{T} \qquad (5)$
according to equations (3) and (5), the plane equation in the camera coordinate system at time k is converted into the following plane parameter equation in the world coordinate system:
$A_{g,k}\,x + B_{g,k}\,y + C_{g,k}\,z + D_{g,k} = 0 \qquad (6)$
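A small sketch of the coordinate conversion of equations (3)-(6): the plane normal is rotated by R_{g,k}, the on-plane point (-D A, -D B, -D C) is transformed by R_{g,k} and t_{g,k}, and the world-frame constant term is recomputed. Function names are illustrative.

```python
import numpy as np

def plane_camera_to_world(plane_cam, R_gk, t_gk):
    """plane_cam: (A, B, C, D) in the camera frame with A^2 + B^2 + C^2 = 1.
    Returns (A_g, B_g, C_g, D_g) in the world frame."""
    n_cam = np.asarray(plane_cam[:3])
    D = plane_cam[3]
    n_world = R_gk @ n_cam                        # equation (3)
    p_cam = -D * n_cam                            # a point lying on the plane
    p_world = R_gk @ p_cam + t_gk                 # equation (4)
    D_world = -n_world @ p_world                  # equation (5)
    return (*n_world, D_world)                    # plane of equation (6)
```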
optionally, in some possible implementations, the specific step of step (5) is:
step (501): computing the cosine distance between the normal vector of each plane equation of the newly added frame and the normal vectors of all plane equations of the current fused set, taken pairwise, and judging whether formula (8) is satisfied:
$\mathrm{abs}(N_{cur,i} \cdot N_{new,k}) > \theta_1 \qquad (8)$
where $N_{cur,i}$ is the normal vector of the i-th plane equation of the current key frame sequence, $N_{new,k}$ is the normal vector of the k-th plane equation of the newly added key frame, and $\theta_1$ is a set threshold;
step (502): if two planes satisfy formula (8), judging from the equation parameter D of the two planes whether they are the same plane; if formula (9) is satisfied, the two planes are the same plane; if formula (9) is not satisfied, the two planes are judged to be parallel, and the method proceeds directly to step (503);
$\mathrm{abs}(D_i - D_k) < \theta_2 \qquad (9)$
where $D_i$ is the constant term of the plane parameter equation of the i-th plane satisfying formula (8), $D_k$ is the constant term of the plane parameter equation of the k-th plane satisfying formula (8), and $\theta_2$ is a set threshold;
step (503): registered-plane weighted fusion:
the registered planes of the input frame and of the current plane equations are weighted and fused using formula (10):
$P_{merge} = w\,P_{cur,i} + (1 - w)\,P_{new,k} \qquad (10)$
where $P_{cur,i}$ is the plane parameter equation obtained from the fusion of the current key frame sequence, $P_{new,k}$ is the plane parameter equation of the newly added key frame, w is the fusion weight, and $P_{merge}$ is the fused plane parameter equation.
step (504): generating a new plane set:
$P_{now} = \{P_{merge}, P_{cur,n}, P_{new,m}\}$
where $P_{cur,n}$ denotes the planes of the current key frame sequence that are not registered with any plane of the newly added key frame, and $P_{new,m}$ denotes the planes of the newly added key frame that are not registered with any plane of the current key frame sequence.
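The registration and fusion of step (5) might be organized as in the sketch below (an assumption, not the patent's code): each plane of the new key frame is matched against the current fused set with the normal test of formula (8) and the constant-term test of formula (9), matched pairs are blended with formula (10), and unmatched planes from both sets are carried over. Renormalizing the fused normal is an added assumption; theta_1, theta_2 and w follow the experimental values quoted later in the text (0.93, 0.5, 0.4).

```python
import numpy as np

def fuse_planes(current, new, theta_1=0.93, theta_2=0.5, w=0.4):
    """current, new: lists of plane parameter arrays (A, B, C, D) with unit normals."""
    fused = []
    used_new = set()
    for p_cur in current:
        merged = None
        for k, p_new in enumerate(new):
            if k in used_new:
                continue
            # formula (8): nearly parallel normals; formula (9): similar offsets
            if abs(p_cur[:3] @ p_new[:3]) > theta_1 and abs(p_cur[3] - p_new[3]) < theta_2:
                merged = w * p_cur + (1.0 - w) * p_new          # formula (10)
                merged[:3] /= np.linalg.norm(merged[:3])        # keep a unit normal
                used_new.add(k)
                break
        fused.append(merged if merged is not None else p_cur)
    # planes of the new key frame that matched nothing in the current set
    fused.extend(p for k, p in enumerate(new) if k not in used_new)
    return fused
```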
Optionally, in some possible implementations, the specific step of step (6) is:
according to the fused and registered plane set $P_{now}$, the intersection line between every two planes and the intersection point between every three planes are computed, so as to construct the main structure of the indoor scene.
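Step (6) reduces to elementary plane geometry; the sketch below (illustrative only) computes the corner point shared by three planes by solving a 3 x 3 linear system and the intersection line of two planes from the cross product of their normals, rejecting near-parallel configurations with an assumed tolerance.

```python
import numpy as np

def intersect_three_planes(p1, p2, p3, eps=1e-6):
    """Each plane is (A, B, C, D). Returns the common point or None."""
    N = np.array([p1[:3], p2[:3], p3[:3]])
    d = -np.array([p1[3], p2[3], p3[3]])
    if abs(np.linalg.det(N)) < eps:
        return None                       # planes do not meet in a single point
    return np.linalg.solve(N, d)

def intersect_two_planes(p1, p2, eps=1e-6):
    """Returns (point_on_line, unit_direction) of the intersection line, or None."""
    n1, n2 = np.asarray(p1[:3]), np.asarray(p2[:3])
    direction = np.cross(n1, n2)
    if np.linalg.norm(direction) < eps:
        return None                       # parallel planes have no intersection line
    # A point on the line: both plane equations hold and direction . X = 0.
    A = np.vstack((n1, n2, direction))
    b = np.array([-p1[3], -p2[3], 0.0])
    point = np.linalg.solve(A, b)
    return point, direction / np.linalg.norm(direction)
```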
In a second aspect, an indoor scene main structure reconstruction system based on a depth image key frame is provided;
The indoor scene main structure reconstruction system based on depth image key frames comprises: a depth camera and a processor;
the depth camera is used for acquiring a depth image of an indoor scene;
the processor is configured to acquire a depth image from the depth camera, process the depth image to obtain corresponding point cloud data, and obtain normal vectors from the point cloud data; based on the point cloud data and normal vectors, search for the pixel points of the previous frame depth image that correspond to the pixel points of the current frame depth image through a projection algorithm, and calculate the camera pose matrix of the current frame by minimizing the distance from the pixel points to the tangent planes of the corresponding pixel points; judge whether the current frame depth image is a key frame depth image, and if so, add it to the key frame sequence; calculate a main-structure plane equation set for each frame depth image added to the key frame sequence; convert the main-structure plane equations into the world coordinate system based on the camera pose matrix; register and fuse the newly added main-structure plane equation set with the current main-structure plane equation set; and reconstruct the main structure of the indoor scene from the finally fused main-structure plane equation set once all frames in the key frame sequence have been processed.
In a third aspect, an electronic device is provided;
An electronic device, comprising: a memory, a processor, and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of any of the above methods.
In a fourth aspect, a computer-readable storage medium is presented;
a computer readable storage medium having computer instructions embodied thereon, which, when executed by a processor, perform the steps of any of the above methods.
Compared with the prior art, the beneficial effects of the embodiments of the application are as follows:
corresponding points of the indoor-scene three-dimensional reconstruction frames are registered based on the KinectFusion algorithm to obtain the camera pose of each frame, and the camera pose change between key frames is obtained. If the camera pose change between a subsequent frame and the previous key frame exceeds the threshold, that frame is added to the key frame sequence; if the change stays below $\theta_0$, a key frame is taken at a fixed interval instead. The normal vector of each point in the key-frame depth image is used as the input for clustering, the main-structure planes of the indoor scene are extracted from the clustering result, registration is performed according to the key-frame poses, the main-structure plane equations are fused, and the complete main structure of the indoor scene is reconstructed through plane intersection. On the basis of single-frame extraction the method is extended to multiple frames: adjacent key frames are registered and fused based on the rotation matrices of the KinectFusion algorithm, and finally a three-dimensional main-structure reconstruction model of the indoor scene is generated from the key frame sequence.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is the KinectFusion algorithm flow;
FIG. 2(a) shows two consecutive samples of the same scene;
FIG. 2(b) is a schematic diagram of a projection method point cloud corresponding point search and point-to-surface error mechanism;
FIG. 3(a) is an RGB map extracted from the main structure plane;
FIG. 3(b) is a main structure diagram extracted from the main structure plane;
FIG. 3(c) is a graph of the results of main structure plane extraction;
FIG. 4 is a schematic plan view of an intersection;
FIGS. 5(a)-5(g) are partial key frames of Bedroom-0091;
FIG. 5(h) is the KinectFusion reconstruction result;
FIG. 5(i) is the main structure point cloud reconstruction result;
FIG. 5(j) is the reconstruction result of an embodiment of the present application;
FIGS. 6(a)-6(g) are partial key frames of Bedroom-0097;
FIG. 6(h) is the KinectFusion reconstruction result;
FIG. 6(i) is the main structure point cloud reconstruction result;
FIG. 6(j) is the reconstruction result of an embodiment of the present application;
fig. 7 is a flowchart of an embodiment of the present application.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used in the examples herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
With the popularization of depth cameras, depth images are widely applied in many fields of image processing because they contain depth information. Analyzing the main structure of an indoor scene from input depth images is a challenging task in image analysis and computer vision; continuous depth image input can reduce erroneous extraction of the main structure from a single depth image and optimize the extracted main structure. The embodiments of the application combine indoor scene main structure extraction with visual SLAM to generate a complete three-dimensional model of the indoor scene's main structure. The camera pose of each frame is obtained by registering corresponding points of the KinectFusion-based indoor-scene three-dimensional reconstruction frames, and the camera pose change between key frames is obtained. If the camera pose change between a subsequent frame and the previous key frame exceeds a threshold, the frame is added to the key frame sequence; if the camera pose change is small, a key frame is taken at a fixed interval. The normal vector of each point in the key-frame depth image is used as input for clustering, the main-structure planes of the indoor scene are extracted from the clustering result, registration is performed according to the key-frame poses, the main-structure plane equations are fused, and the complete main structure of the indoor scene is reconstructed through plane intersection.
The KinectFusion algorithm realizes 3D scene reconstruction by matching, positioning and fusing the depth data acquired by the Kinect. Its algorithm flow is shown in FIG. 1 and FIG. 7 and mainly comprises four parts:
a) depth data processing, which converts the raw depth data of the sensor into a 3D point cloud and obtains the three-dimensional coordinates and normal vectors of the vertices in the point cloud;
b) ICP (Iterative Closest Point) matching of the current-frame 3D point cloud with the predicted 3D point cloud generated from the current model, from which the pose of the current-frame camera is calculated;
c) point cloud fusion, which fuses the 3D point cloud of the current frame into the existing model using the TSDF point cloud fusion algorithm [Curless B, Levoy M. A volumetric method for building complex models from range images [C] // Proceedings of ACM SIGGRAPH. New York, USA: ACM, 1996: 303-312];
d) scene rendering, which predicts with a ray tracing method, from the existing model and the current camera pose, the environment point cloud observed by the current camera; this is fed back to the user on the one hand and provided to b) for ICP matching on the other.
The ICP (Iterative Closest Point, i.e., nearest-point iteration) positioning method in KinectFusion
In the ICP positioning link, when matching the current frame 3D point cloud with the predicted 3D point cloud, the method is realized by the following steps:
(1) The relationship between corresponding points is determined using a projective method. The process is illustrated with a 2-dimensional ICP: the black curve in FIG. 2(a) is an object in the environment, which the camera samples and predicts at two successive positions; $O_k$ and $O_{k-1}$ are respectively the origins of the coordinate systems of the current camera and the previous-frame camera. First the two point clouds at times k and k-1 are converted into the coordinate system of the current frame k and then projected through the camera center $O_k$ onto the image plane; points of the two clouds with the same projection on the image plane are corresponding points, such as $P_1$ and $P_2$ in FIG. 2(b). The algorithm also screens the corresponding points by the Euclidean distance and the normal angle between them.
(2) The accuracy of the current relative pose is measured with a point-to-plane error metric. In FIG. 2(b), in the 2-dimensional case, the error between $P_1$ and $P_2$ is the distance d from $P_1$ to the tangent at $P_2$. The total error between all corresponding points is given by:

$$E(T_{g,k}) = \sum_{\Omega_k(u) \neq \mathrm{null}} \left\| \left( T_{g,k}\,\dot{V}_k(u) - \hat{V}_{g,k-1}(\hat{u}) \right)^{T} \hat{N}_{g,k-1}(\hat{u}) \right\|_2 \qquad (1)$$

where $\Omega_k(u) \neq \mathrm{null}$ indicates that a point u in the current point cloud has a corresponding point, $T_{g,k}$ is a 4 x 4 pose matrix representing the absolute pose of the current-frame camera in the world coordinate system, which is defined as the camera coordinate system of the first frame, $\dot{V}_k(u)$ is the vertex coordinate of point u in the current frame, $\hat{V}_{g,k-1}(\hat{u})$ is the vertex coordinate of the corresponding point of u in the predicted frame, and $\hat{N}_{g,k-1}(\hat{u})$ is the normal vector of the corresponding point. FIG. 3(a) is an RGB map used for main structure plane extraction; FIG. 3(b) is the extracted main structure map; FIG. 3(c) is the result of main structure plane extraction.
(3) The optimal relative pose $T_{g,k}$ is obtained by optimizing formula (1).
The optimization problem is converted into a least-squares problem by linearization, and the optimal solution x is calculated by solving the linear system of equations of formula (2):

$$\sum_{\Omega_k(u) \neq \mathrm{null}} \left( G^{T}(u)\,\hat{N}_{g,k-1}(\hat{u}) \right) \left( \hat{N}_{g,k-1}^{T}(\hat{u})\,G(u) \right) x = \sum_{\Omega_k(u) \neq \mathrm{null}} G^{T}(u)\,\hat{N}_{g,k-1}(\hat{u})\,\hat{N}_{g,k-1}^{T}(\hat{u}) \left( \hat{V}_{g,k-1}(\hat{u}) - \tilde{V}_{g,k}^{z-1}(u) \right) \qquad (2)$$

where
$x = (\beta, \gamma, \alpha, t_x, t_y, t_z)^{T} \in R^{6}$,
$\tilde{V}_{g,k}^{z-1}(u) = \tilde{T}_{g,k}^{z-1}\,\dot{V}_k(u)$,
$\tilde{T}_{g,k}^{z-1}$ represents the pose matrix obtained by the (z-1)-th iteration, $\dot{V}_k(u)$ is the homogeneous representation of the point cloud coordinates of pixel u at time k, and $G(u) = \left[\,[\tilde{V}_{g,k}^{z-1}(u)]_{\times} \;\; I_{3\times 3}\,\right]$ is the 3 x 6 matrix whose first three columns are the skew-symmetric matrix of $\tilde{V}_{g,k}^{z-1}(u)$ and whose last three columns are the 3 x 3 identity matrix.
(4) Steps (1) to (2) are iterated ten times.
1.1 Main Structure plane extraction based on Single frame depth image
In the embodiment of the application, an indoor scene main structure is extracted according to each input frame of depth image, and a plane parameter equation is adopted to express a main structure plane in the frame. The method mainly comprises the following steps:
1) clustering is carried out according to the normal vector, and the density of each point and the distance between each point and a point with larger density are calculated according to the cosine distance to obtain a clustering center;
2) denoising the clustering result, and removing points which have too large difference with the direction of the core normal vector in the same type of clustering result;
3) establishing a distance histogram for the projection distance of each type of points in the core normal direction according to the clustering result;
4) searching each class's distance histogram for large planes from large to small, and performing weighted least-squares fitting on the three-dimensional point cloud of each large plane to obtain the parameter equation of the main-structure plane in the frame:
$\min\; w_{u,v}\,(A x_{u,v} + B y_{u,v} + C z_{u,v} + D)^2, \quad (u,v) \in h_f$,
where $h_f$ denotes the points of the denoised core region;
2 indoor scene main structure reconstruction
2.1 transformation of the coordinate System
By preprocessing the single-frame depth images, the transformation matrix $T_{g,k}$ from each frame image to the world coordinate system and the parameter equations of the indoor scene's main-structure planes in each frame image are obtained: $A_{k,i}\,x + B_{k,i}\,y + C_{k,i}\,z + D_{k,i} = 0$, $A_{k,i}^{2} + B_{k,i}^{2} + C_{k,i}^{2} = 1$, $i \in R^{+},\; 1 \le i \le n$, $A_{k,i}, B_{k,i}, C_{k,i}, D_{k,i} \in R^{n}$, where n denotes the number of main-structure planes in the frame. A plane equation is determined by the plane normal vector and a point on the plane, and the plane equation in the camera coordinate system is converted into the world coordinate system to obtain its plane parameter equation in the world coordinate system.
The normal vector is converted from the camera coordinate system to the world coordinate system using $R_{g,k}$:
$[A_{g,k}, B_{g,k}, C_{g,k}]^{T} = R_{g,k}\,[A_k, B_k, C_k]^{T} \qquad (3)$
A point in the camera coordinate system is converted to the world coordinate system using $R_{g,k}$ and $t_{g,k}$:
$[X_{g,k}, Y_{g,k}, Z_{g,k}]^{T} = R_{g,k}\,[-D_k A_k, -D_k B_k, -D_k C_k]^{T} + t_{g,k} \qquad (4)$
$[-D_k A_k, -D_k B_k, -D_k C_k]^{T}$ is a point located on the plane.
$D_{g,k} = -[A_{g,k}, B_{g,k}, C_{g,k}]\,[X_{g,k}, Y_{g,k}, Z_{g,k}]^{T} \qquad (5)$
According to formulas (3) and (5), the plane equation in the camera coordinate system of frame k is converted into the following plane parameter equation in the world coordinate system:
$A_{g,k}\,x + B_{g,k}\,y + C_{g,k}\,z + D_{g,k} = 0 \qquad (6)$
2.2 Key frame selection
In order to accelerate main structure extraction and reconstruction, the embodiment of the application selects a key frame sequence for reconstructing the main structure of the indoor scene. Whether a frame is added to the key frame sequence is determined by judging whether the rotation transformation between two frames is greater than a threshold $\theta_0$:
$T_{g,i}^{-1} \cdot T_{g,k} > \theta_0 \qquad (7)$
Let the current frame in the key frame sequence be the k-th frame and the newly arriving frame be the i-th frame. If formula (7) is satisfied, the i-th frame is a key frame and is added to the key frame sequence. If it is not satisfied, the newly arriving frame is not added, and it is judged whether the rotation transformation of the next frame is greater than the user-set threshold $\theta_0$. If 10 consecutive newly arriving frames do not exceed $\theta_0$, the 10th frame is added to the key frame sequence.
2.3 planar registration fusion
1) The cosine distances between the plane normal vectors of the input frame and of all currently fused planes are computed pairwise, and it is judged whether the following formula is satisfied:
$\mathrm{abs}(N_{cur,i} \cdot N_{new,k}) > \theta_1 \qquad (8)$
where $N_{cur,i}$ is the normal vector of a plane equation of the current key frame sequence, $N_{new,k}$ is the normal vector of the k-th plane equation of the newly added key frame, and $\theta_1$ takes the value 0.93 in the experiments.
If the normal vectors of two planes satisfy formula (8), the two planes are judged to be parallel.
2) In a hexahedral scene it is very likely that two wall surfaces are parallel; in that case their normal directions are essentially the same. If two planes satisfy formula (8), whether they are the same plane is judged from the equation parameter D of the two planes:
$\mathrm{abs}(D_i - D_k) < \theta_2 \qquad (9)$
where $D_i$ and $D_k$ are the constant terms of the plane parameter equations, in the i-th and k-th frames respectively, of the planes satisfying formula (8), and $\theta_2$ takes the value 0.5 in the experiments.
3) Registered-plane weighted fusion
The registered planes (judged to be the same plane by the above computation) of the input frame and of the current plane equations are weighted and fused using formula (10):
$P_{merge} = w\,P_{cur,i} + (1 - w)\,P_{new,k} \qquad (10)$
where $P_{cur,i}$ is the plane parameter equation obtained from the fusion of the current key frame sequence, $P_{new,k}$ is the plane parameter equation of the newly added key frame, and w is the fusion weight, taking the value 0.4 in the experiments.
4) Generating a new plane set
$P_{now} = \{P_{merge}, P_{cur,n}, P_{new,m}\}$
where $P_{cur,n}$ denotes the planes of the current key frame sequence that are not registered with any plane of the newly added key frame, and $P_{new,m}$ denotes the planes of the newly added key frame that are not registered with any plane of the current key frame sequence.
2.4 plane intersection
According to the obtained fused and registered plane set $P_{now}$, intersection lines and intersection points are computed, and the directions of the intersection lines pointing toward the viewpoint are selected to reconstruct the main structure of the indoor scene; in FIG. 4 the small sun marker is the position of the viewpoint.
Algorithm: indoor scene main structure reconstruction
Input: depth image sequence $S = \{S_i \mid i = 1, 2, \ldots, n\}$
Output: fused and registered plane parameter equation set $P_{now}$
1. Acquire a depth image from the Kinect, perform noise reduction, and compute the point cloud and normal information;
2. Obtain the camera pose $T_{g,i}$ of each frame with the KinectFusion algorithm;
3. Judge according to formula (7) whether the current frame is added to the key frame sequence (the key frame sequence is a subset of the input image sequence S);
4. For each frame depth image added to the key frame sequence, compute the main-structure plane equation set $P_{cur}$ and convert the plane equations into the world coordinate system according to formula (6);
5. Register and fuse the newly added plane equation set with the previous plane equation set according to formulas (8), (9) and (10);
6. Reconstruct the main structure of the indoor scene from the finally fused plane equation set once all frames in the key frame sequence have been processed.
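Tying the algorithm box above together, the following end-to-end sketch (an assumption-laden illustration) wires up the hypothetical helpers sketched earlier (depth_to_point_cloud, compute_normals, rotation_angle, plane_camera_to_world, fuse_planes); estimate_pose and extract_main_structure_planes are placeholders standing in for the KinectFusion ICP of step 2 and the clustering-based plane extraction of step 4, and theta_0 is an assumed example value.

```python
import numpy as np

def reconstruct_main_structure(depth_frames, K, theta_0=np.deg2rad(15.0)):
    planes_now = []                       # fused plane parameter set P_now
    prev_pose = np.eye(4)                 # world frame = first camera frame
    last_key_pose = None
    skipped = 0
    for depth in depth_frames:
        # Step 1: point cloud and normals from the depth frame.
        V = depth_to_point_cloud(depth, K[0, 0], K[1, 1], K[0, 2], K[1, 2])
        N = compute_normals(V)
        # Step 2: camera pose (placeholder for the KinectFusion ICP).
        pose = estimate_pose(V, N, prev_pose)
        prev_pose = pose
        # Step 3: key-frame test, reading formula (7) as a rotation-angle check.
        if last_key_pose is not None:
            rel = np.linalg.inv(last_key_pose) @ pose
            if rotation_angle(rel) <= theta_0 and skipped + 1 < 10:
                skipped += 1
                continue
        last_key_pose, skipped = pose, 0
        # Steps 4-5: extract main-structure planes, move them to world, fuse.
        frame_planes = extract_main_structure_planes(V, N)   # clustering + fitting
        world_planes = [np.asarray(plane_camera_to_world(p, pose[:3, :3], pose[:3, 3]))
                        for p in frame_planes]
        planes_now = fuse_planes(planes_now, world_planes) if planes_now else world_planes
    return planes_now                      # step 6 intersects these planes
```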
4 Experimental results
The experimental data come from the NYU v2 dataset, which is widely used by researchers. We reconstruct the main structure of the indoor scene from Bedroom-0091 and Bedroom-0097 and compare the method with a point-cloud-based indoor scene main structure reconstruction, as shown in FIGS. 5(a)-5(j) and 6(a)-6(j). As can be seen from the result figures, for the cabinet at the corner in Bedroom-0091, the occlusion by the cabinet makes the wall plane extend leftwards, so the main structure extraction does not obtain a left plane; the point cloud fusion method cannot eliminate the redundant main-structure planes, which affects the subsequent rendering result, whereas the redundant part of the earlier main-structure fitting is easily eliminated by the plane intersection of the method in the embodiment of the application. The ceiling in the upper right corner of Bedroom-0097 is too small in area to show up clearly in the KinectFusion and point cloud main structure reconstructions, but the ceiling plane can be explicitly reconstructed by the main structure extraction and reconstruction method of the embodiment of the application.
Compared with extracting the main structure of an indoor scene from a single-frame image, the method can extract the main structure more reliably and avoids erroneous extraction caused by the occlusion of large planes; at the same time, reconstructing the indoor scene's main structure from plane parameters yields a complete main structure and provides a basic frame for understanding the indoor scene and adding objects to it one by one at a later stage. Compared with point-cloud-based reconstruction of the indoor scene's main structure, the plane-equation-based reconstruction of the embodiment of the application is clearly more efficient.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. The method for reconstructing the main structure of the indoor scene based on the key frame of the depth image is characterized by comprising the following steps:
step (1): acquiring a depth image: acquiring a depth image from a depth camera, processing the depth image to obtain corresponding point cloud data, and obtaining a normal vector according to the point cloud data;
step (2): based on point cloud data and normal vectors, searching corresponding pixel points of the current frame depth image in the previous frame depth image through a projection algorithm, and calculating a camera pose matrix of the current frame by minimizing the distance from the pixel points of the current frame depth image to the tangent plane of the corresponding pixel points of the previous frame depth image;
step (3): judging whether the current frame depth image is a key frame depth image, and if so, adding the current frame depth image to the key frame sequence;
step (4): calculating a main-structure plane equation set for each frame depth image added to the key frame sequence; converting the main-structure plane equations from the camera coordinate system to the world coordinate system based on the camera pose matrix;
step (5): registering and fusing the newly added main-structure plane equation set with the current main-structure plane equation set;
step (6): reconstructing the main structure of the indoor scene from the finally fused main-structure plane equation set until all frames in the key frame sequence have been processed;
the step (2) comprises the following steps:
step (201): determining the relation between corresponding pixel points of two frames of depth images at adjacent moments by using a projection method;
the step (201) comprises the following steps:
a camera samples objects in the same environment at two positions; $O_k$ denotes the origin of the camera coordinate system at time k, and $O_{k-1}$ denotes the origin of the camera coordinate system at time k-1;
a point in the world coordinate system at time k-1 is converted through the rotation matrix into a three-dimensional point coordinate and its normal vector in the camera coordinate system at time k-1; the three-dimensional point in the camera coordinate system at time k-1 is then converted with the camera internal parameters into a pixel point P in the image coordinate system at time k-1;
the pixel point P with the same pixel coordinate at time k is found, and its three-dimensional point coordinate and normal vector are calculated with the rotation matrix at time k;
it is judged whether the two three-dimensional point coordinates corresponding to pixel point P at time k-1 and time k are consistent;
it is judged whether the two normal vectors corresponding to pixel point P at time k-1 and time k are consistent;
if the two three-dimensional point coordinates corresponding to pixel point P are consistent and the two normal vectors corresponding to pixel point P are consistent, the pixel point P at time k-1 and at time k are corresponding points;
the specific step of calculating the main structure plane equation set for each frame depth image added into the key frame sequence in the step (4) is as follows:
a step (401): clustering normal vectors of all points to generate a plurality of core normal vectors;
step (402): denoising the clustering result, and removing points which have a direction difference of more than 30 degrees with the core normal vector in the same type of clustering result;
step (403): establishing a distance histogram for the projection distance of each type of point on the core normal vector according to the clustering result;
a step (404): and searching the maximum plane from large to small in the distance histograms of each type, and performing weighted least square fitting on the three-dimensional point cloud in the maximum plane to obtain a main structure plane of the frame depth image.
2. The method for reconstructing a main structure of an indoor scene based on a key frame of a depth image as claimed in claim 1, wherein the step (1) comprises the steps of:
a step (101): calibrating the Kinect depth camera to obtain internal parameters of the Kinect depth camera;
a step (102): scanning an indoor scene by using a Kinect depth camera to obtain a depth image;
step (103): performing noise reduction on the obtained depth image with a bilateral filtering algorithm, calculating the point-cloud three-dimensional coordinate corresponding to each pixel of the noise-reduced depth image from the internal parameters of the Kinect depth camera, thereby obtaining three-dimensional point cloud data, denoted $V_i$, where i represents the i-th frame;
step (104): computing the normal vector $N_i(x,y)$ of a pixel (x, y):
$N_i(x,y) = (V_i(x+1,y) - V_i(x,y)) \times (V_i(y+1,x) - V_i(x,y))$;
where $\times$ denotes the cross product, $V_i(x+1,y)$ is the three-dimensional point cloud data of pixel (x+1, y) of the i-th frame image, $V_i(x,y)$ is the three-dimensional point cloud data of pixel (x, y) of the i-th frame image, and $V_i(y+1,x)$ is the three-dimensional point cloud data of pixel (y+1, x) of the i-th frame image;
the normal vectors of all pixel points in the current frame are then computed in the same way.
3. The method for reconstructing a main structure of an indoor scene based on a key frame of a depth image as claimed in claim 1, wherein the step (2) further comprises:
step (202): measuring the accuracy of the current relative pose by the total error over all corresponding pixel points;
step (203): judging whether the set number of iterations has been reached; if so, proceeding to step (204), and if not, repeating steps (201)-(202);
a step (204): and obtaining an optimal relative pose matrix.
4. The method as claimed in claim 3, wherein the total error of step (202) is calculated by:
the error between corresponding pixel points $P_1$ and $P_2$ is the distance from $P_1$ to the tangent at $P_2$;
the total error $E(T_{g,k})$ over all corresponding pixel points is calculated as

$$E(T_{g,k}) = \sum_{\Omega_k(u) \neq \mathrm{null}} \left\| \left( T_{g,k}\,\dot{V}_k(u) - \hat{V}_{g,k-1}(\hat{u}) \right)^{T} \hat{N}_{g,k-1}(\hat{u}) \right\|_2 \qquad (1)$$

where $\Omega_k(u) \neq \mathrm{null}$ indicates that a point u in the point cloud at time k has a corresponding point; $T_{g,k}$ is a 4 x 4 pose matrix representing the absolute pose of the camera at time k in the world coordinate system, which is defined as the camera coordinate system of the first frame; $\dot{V}_k(u)$ is the vertex coordinate of point u in the image frame at time k; $\hat{V}_{g,k-1}(\hat{u})$ is the vertex coordinate of the corresponding point of u in the image frame at time k-1; $\hat{N}_{g,k-1}(\hat{u})$ is the normal vector of the corresponding point; and $\hat{u}$ is the corresponding point of u in the image at time k-1;
step (204): the optimal solution x is obtained by solving the linear system of equations of formula (2) by least squares, from which the optimal relative pose matrix $T_{g,k}$ is obtained:

$$\sum_{\Omega_k(u) \neq \mathrm{null}} \left( G^{T}(u)\,\hat{N}_{g,k-1}(\hat{u}) \right) \left( \hat{N}_{g,k-1}^{T}(\hat{u})\,G(u) \right) x = \sum_{\Omega_k(u) \neq \mathrm{null}} G^{T}(u)\,\hat{N}_{g,k-1}(\hat{u})\,\hat{N}_{g,k-1}^{T}(\hat{u}) \left( \hat{V}_{g,k-1}(\hat{u}) - \tilde{V}_{g,k}^{z-1}(u) \right) \qquad (2)$$

where
$x = (\beta, \gamma, \alpha, t_x, t_y, t_z)^{T} \in R^{6}$,
$\tilde{V}_{g,k}^{z-1}(u) = \tilde{T}_{g,k}^{z-1}\,\dot{V}_k(u)$,
$\tilde{T}_{g,k}^{z-1}$ represents the pose matrix obtained by the (z-1)-th iteration, and $\dot{V}_k(u)$ is the homogeneous representation of the point cloud coordinates of pixel u at time k;
$T_{g,k}$ is the 4 x 4 pose matrix through which points in the camera coordinate system at time k are converted into points of the global coordinate system; the pose matrix $T_{g,k}$ comprises a rotation matrix $R_{g,k}$ and a translation matrix $t_{g,k}$; the parameters $\beta$, $\alpha$, $\gamma$ of the rotation matrix represent the degrees of rotation about the three coordinate axes X, Y and Z, and the parameters $t_x$, $t_y$, $t_z$ of the translation matrix represent the distances moved along the three coordinate axes X, Y and Z;
$\hat{N}_{g,k}(u)$ and $\hat{V}_{g,k}(u)$ denote the normal map and point cloud map of pixel u in the world coordinate system at time k;
$\hat{N}_{g,k-1}(\hat{u})$ and $\hat{V}_{g,k-1}(\hat{u})$ denote the normal map and point cloud map, in the world coordinate system at time k-1, of the corresponding point $\hat{u}$ of pixel u;
$\Omega_k(u)$ denotes the set of corresponding points of u found by the projection method at time k-1;
$\hat{u}$ is the pixel coordinate obtained by transforming the three-dimensional point corresponding to u in the coordinate system at time k into the coordinate system at time k-1 and projecting it into the pixel coordinate system at time k-1;
$\tilde{T}_{g,k}^{z-1}$ denotes the camera pose matrix solved at the (z-1)-th iteration, $\dot{V}_k(u)$ is the homogeneous representation of the point cloud coordinates of pixel u at time k, and $\tilde{V}_{g,k}^{z-1}(u)$ is the three-dimensional point coordinate, in the world coordinate system, of pixel u of the depth map at time k after z-1 iterations;
$G(u)$ is a 3 x 6 matrix whose first three columns are the skew-symmetric matrix $[\tilde{V}_{g,k}^{z-1}(u)]_{\times}$ and whose last three columns are the 3 x 3 identity matrix; $G^{T}(u)$ is the transpose of $G(u)$; $I_{3\times 3}$ is the 3 x 3 identity matrix.
5. The method for reconstructing a main structure of an indoor scene based on a key frame of a depth image as claimed in claim 1, wherein the step (3) comprises the steps of:
judging whether the rotation transformation between the frame to be added to the key frame sequence and the current frame of the key frame sequence is greater than a threshold $\theta_0$; if so, the frame to be added is added to the key frame sequence, otherwise it is not added;
$T_{g,i}^{-1} \cdot T_{g,k} > \theta_0 \qquad (7)$
the current frame in the key frame sequence is taken as the k-th frame and the newly added frame is taken as the i-th frame; if formula (7) is satisfied, the i-th frame is a key frame and is added to the key frame sequence; if formula (7) is not satisfied, the i-th frame is not added to the key frame sequence, and it is judged whether the rotation transformation of the (i+1)-th frame is greater than the threshold $\theta_0$; if 10 consecutive newly added frames do not exceed $\theta_0$, the 10th frame is added to the key frame sequence.
6. Indoor scene main structure reconstruction system based on depth image key frame, characterized by includes: a depth camera and a processor;
the depth camera is used for acquiring a depth image of an indoor scene;
the processor is used for acquiring a depth image from the depth camera, processing the depth image to obtain corresponding point cloud data, and obtaining normal vectors from the point cloud data; based on the point cloud data and normal vectors, searching for the pixel points of the previous frame depth image that correspond to the pixel points of the current frame depth image through a projection algorithm, and calculating the camera pose matrix of the current frame by minimizing the distance from the pixel points to the tangent planes of the corresponding pixel points; judging whether the current frame depth image is a key frame depth image, and if so, adding it to the key frame sequence; calculating a main-structure plane equation set for each frame depth image added to the key frame sequence; converting the main-structure plane equations into the world coordinate system based on the camera pose matrix; registering and fusing the newly added main-structure plane equation set with the current main-structure plane equation set; and reconstructing the main structure of the indoor scene from the finally fused main-structure plane equation set until all frames in the key frame sequence have been processed;
the step of calculating the camera pose matrix of the current frame comprises:
step (201): determining the relation between corresponding pixel points of two frames of depth images at adjacent moments by using a projection method;
the step (201) comprises the following steps:
using the camera to sample objects in the same environment at two positions, where O_k denotes the origin of the camera coordinate system at time k and O_{k-1} denotes the origin of the camera coordinate system at time k-1;
converting a point in a world coordinate system at the moment k-1 into a three-dimensional point coordinate and a normal vector thereof in a camera coordinate system at the moment k-1 through a rotation matrix; then, converting three-dimensional points in a camera coordinate system at the moment k-1 into pixel points P in an image coordinate system at the moment k-1 by using camera internal parameters;
finding a pixel point P with the same pixel coordinate at the moment k, and calculating the three-dimensional point coordinate and the normal vector of the pixel point P by using the rotation matrix at the moment k;
judging whether the coordinates of the two three-dimensional points corresponding to the pixel point P are consistent at the time k-1 and the time k;
judging whether two normal vectors corresponding to the pixel point P are consistent at the time k-1 and the time k;
if the coordinates of the two three-dimensional points corresponding to pixel point P are consistent and the two normal vectors corresponding to pixel point P are consistent, then pixel point P at time k-1 and at time k forms a corresponding point pair;
the specific steps of calculating the main structure plane equation set for each frame depth image added into the key frame sequence are as follows:
step (401): clustering the normal vectors of all points to generate a number of core normal vectors;
step (402): denoising the clustering result by removing, from each class, the points whose normal direction differs from the core normal vector by more than 30 degrees;
step (403): building, for each class of the clustering result, a distance histogram of the projection distances of its points onto the core normal vector;
step (404): searching the distance histogram of each class for the largest plane in descending order of size, and performing weighted least-squares fitting on the three-dimensional point cloud within that plane to obtain a main structure plane of the frame depth image.
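The following compact sketch illustrates steps (401)-(404): normals are clustered into core directions (k-means is used here purely as an example; the claim does not fix the clustering method), points deviating more than 30 degrees from their core normal are dropped, projection distances along each core normal are histogrammed, and the most populated bin is refined with a weighted least-squares plane fit. The bin width, the minimum point count, and the distance-based weights are illustrative assumptions, and only the single largest plane per class is extracted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_main_planes(points, normals, n_clusters=6, bin_width=0.05,
                        angle_thresh_deg=30.0, min_points=100):
    """
    points  : (N, 3) point cloud of one key-frame depth image.
    normals : (N, 3) unit normal vectors of those points.
    Returns a list of planes as (unit_normal, d) with n.x + d = 0.
    """
    # Step (401): cluster normals to obtain core normal vectors.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(normals)
    planes = []
    for c in range(n_clusters):
        core = km.cluster_centers_[c]
        core = core / np.linalg.norm(core)
        members = points[km.labels_ == c]
        member_normals = normals[km.labels_ == c]
        # Step (402): drop points whose normal deviates > 30 degrees from the core.
        keep = (member_normals @ core) > np.cos(np.deg2rad(angle_thresh_deg))
        members = members[keep]
        if len(members) < min_points:
            continue
        # Step (403): histogram of projection distances along the core normal.
        dist = members @ core
        n_bins = max(1, int(np.ptp(dist) / bin_width) + 1)
        hist, edges = np.histogram(dist, bins=n_bins)
        # Step (404): take the most populated bin as the dominant plane and
        # refine it with a weighted least-squares fit (weights from the distance
        # to the bin centre -- an illustrative choice, not fixed by the claim).
        b = np.argmax(hist)
        in_bin = (dist >= edges[b]) & (dist <= edges[b + 1])
        pts = members[in_bin]
        centre = 0.5 * (edges[b] + edges[b + 1])
        w = 1.0 / (1e-3 + np.abs(pts @ core - centre))
        centroid = (w[:, None] * pts).sum(0) / w.sum()
        cov = ((pts - centroid) * w[:, None]).T @ (pts - centroid)
        _, eigvec = np.linalg.eigh(cov)
        n = eigvec[:, 0]                       # smallest-eigenvalue direction
        n = n if n @ core > 0 else -n          # keep orientation consistent
        planes.append((n, -float(n @ centroid)))
    return planes
```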
7. An electronic device, comprising: a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of any one of claims 1 to 5.
8. A computer-readable storage medium having computer instructions embodied thereon, which when executed by a processor, perform the steps of the method of any one of claims 1 to 5.
CN201811278361.7A 2018-10-30 2018-10-30 Indoor scene main structure reconstruction method and system based on depth image key frame Active CN109544677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811278361.7A CN109544677B (en) 2018-10-30 2018-10-30 Indoor scene main structure reconstruction method and system based on depth image key frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811278361.7A CN109544677B (en) 2018-10-30 2018-10-30 Indoor scene main structure reconstruction method and system based on depth image key frame

Publications (2)

Publication Number Publication Date
CN109544677A (en) 2019-03-29
CN109544677B (en) 2020-12-25

Family

ID=65845972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811278361.7A Active CN109544677B (en) 2018-10-30 2018-10-30 Indoor scene main structure reconstruction method and system based on depth image key frame

Country Status (1)

Country Link
CN (1) CN109544677B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223336B (en) * 2019-05-27 2023-10-17 上海交通大学 Plane fitting method based on TOF camera data
CN110675360B (en) * 2019-08-02 2022-04-01 杭州电子科技大学 Real-time plane detection and extraction method based on depth image
CN112106112A (en) * 2019-09-16 2020-12-18 深圳市大疆创新科技有限公司 Point cloud fusion method, device and system and storage medium
CN112541950A (en) * 2019-09-20 2021-03-23 杭州海康机器人技术有限公司 Method and device for calibrating external parameter of depth camera
CN110706332B (en) * 2019-09-25 2022-05-17 北京计算机技术及应用研究所 Scene reconstruction method based on noise point cloud
CN110675390A (en) * 2019-09-27 2020-01-10 广东博智林机器人有限公司 Building quality global detection method and device
CN110874851A (en) * 2019-10-25 2020-03-10 深圳奥比中光科技有限公司 Method, device, system and readable storage medium for reconstructing three-dimensional model of human body
CN110793441B (en) * 2019-11-05 2021-07-27 北京华捷艾米科技有限公司 High-precision object geometric dimension measuring method and device
CN111145238B (en) * 2019-12-12 2023-09-22 中国科学院深圳先进技术研究院 Three-dimensional reconstruction method and device for monocular endoscopic image and terminal equipment
CN111144349B (en) * 2019-12-30 2023-02-24 中国电子科技集团公司信息科学研究院 Indoor visual relocation method and system
CN113223079A (en) * 2020-02-06 2021-08-06 阿里巴巴集团控股有限公司 Visual positioning method and device and electronic equipment
CN111311664B (en) * 2020-03-03 2023-04-21 上海交通大学 Combined unsupervised estimation method and system for depth, pose and scene flow
CN111340949B (en) * 2020-05-21 2020-09-18 超参数科技(深圳)有限公司 Modeling method, computer device and storage medium for 3D virtual environment
CN111539899A (en) * 2020-05-29 2020-08-14 深圳市商汤科技有限公司 Image restoration method and related product
CN111986086B (en) * 2020-08-27 2021-11-09 贝壳找房(北京)科技有限公司 Three-dimensional image optimization generation method and system
CN112232274A (en) * 2020-11-03 2021-01-15 支付宝(杭州)信息技术有限公司 Depth image model training method and device
CN112348958A (en) * 2020-11-18 2021-02-09 北京沃东天骏信息技术有限公司 Method, device and system for acquiring key frame image and three-dimensional reconstruction method
CN113096185B (en) * 2021-03-29 2023-06-06 Oppo广东移动通信有限公司 Visual positioning method, visual positioning device, storage medium and electronic equipment
CN113240789B (en) 2021-04-13 2023-05-23 青岛小鸟看看科技有限公司 Virtual object construction method and device
CN113160390B (en) * 2021-04-28 2022-07-22 北京理工大学 Three-dimensional dense reconstruction method and system
CN113327318B (en) * 2021-05-18 2022-07-29 禾多科技(北京)有限公司 Image display method, image display device, electronic equipment and computer readable medium
CN113902846B (en) * 2021-10-11 2024-04-12 岱悟智能科技(上海)有限公司 Indoor three-dimensional modeling method based on monocular depth camera and mileage sensor
CN113989116B (en) * 2021-10-25 2024-08-02 西安知微传感技术有限公司 Point cloud fusion method and system based on symmetry plane
CN114943765A (en) * 2022-05-24 2022-08-26 奥比中光科技集团股份有限公司 Indoor attitude estimation method and device
CN116824070B (en) * 2023-08-31 2023-11-24 江西求是高等研究院 Real-time three-dimensional reconstruction method and system based on depth image
CN117649495B (en) * 2024-01-30 2024-05-28 山东大学 Indoor three-dimensional point cloud map generation method and system based on point cloud descriptor matching

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942832A (en) * 2014-04-11 2014-07-23 浙江大学 Real-time indoor scene reconstruction method based on on-line structure analysis
CN105205858A (en) * 2015-09-18 2015-12-30 天津理工大学 Indoor scene three-dimensional reconstruction method based on single depth vision sensor
CN107240129A (en) * 2017-05-10 2017-10-10 同济大学 Object and indoor small scene based on RGB D camera datas recover and modeling method
WO2018095789A1 (en) * 2016-11-22 2018-05-31 Lego A/S System for acquiring a 3d digital representation of a physical object

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875437B (en) * 2016-12-27 2020-03-17 北京航空航天大学 RGBD three-dimensional reconstruction-oriented key frame extraction method
CN107833253B (en) * 2017-09-22 2020-08-04 北京航空航天大学青岛研究院 RGBD three-dimensional reconstruction texture generation-oriented camera attitude optimization method
CN107862735B (en) * 2017-09-22 2021-03-05 北京航空航天大学青岛研究院 RGBD three-dimensional scene reconstruction method based on structural information

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942832A (en) * 2014-04-11 2014-07-23 浙江大学 Real-time indoor scene reconstruction method based on on-line structure analysis
CN105205858A (en) * 2015-09-18 2015-12-30 天津理工大学 Indoor scene three-dimensional reconstruction method based on single depth vision sensor
WO2018095789A1 (en) * 2016-11-22 2018-05-31 Lego A/S System for acquiring a 3d digital representation of a physical object
CN107240129A (en) * 2017-05-10 2017-10-10 同济大学 Object and indoor small scene based on RGB D camera datas recover and modeling method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on RGBD data-driven 3D modeling methods for indoor scenes; 宫钰嵩; 《中国优秀硕士学位论文全文数据库 信息科技辑》; 2016-03-15 (No. 03); pp. I138-7412 *
Stereo and kinect fusion for continuous 3D reconstruction and visual odometry; Ozgur Yilmaz et al.; 《2013 International Conference on Electronics, Computer and Computation (ICECCO)》; 2013-12-31; pp. 115-118 *
A robust skeleton extraction method for 3D point clouds; 王晓洁 et al.; 《中国科学:信息科学》; 2017-07-20; Vol. 47 (No. 7); pp. 832-845 *

Also Published As

Publication number Publication date
CN109544677A (en) 2019-03-29

Similar Documents

Publication Publication Date Title
CN109544677B (en) Indoor scene main structure reconstruction method and system based on depth image key frame
CN111563442B (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN112258618B (en) Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
KR102647351B1 (en) Modeling method and modeling apparatus using 3d point cloud
CN107093205B (en) A kind of three-dimensional space building window detection method for reconstructing based on unmanned plane image
Lin et al. Topology aware object-level semantic mapping towards more robust loop closure
CN108898676B (en) Method and system for detecting collision and shielding between virtual and real objects
Xu et al. Reconstruction of scaffolds from a photogrammetric point cloud of construction sites using a novel 3D local feature descriptor
CN112927353A (en) Three-dimensional scene reconstruction method based on two-dimensional target detection and model alignment, storage medium and terminal
CN115035260A (en) Indoor mobile robot three-dimensional semantic map construction method
CN108805201A (en) Destination image data set creation method and its device
CN112651944A (en) 3C component high-precision six-dimensional pose estimation method and system based on CAD model
Bu et al. Semi-direct tracking and mapping with RGB-D camera for MAV
Yuan et al. 3D point cloud recognition of substation equipment based on plane detection
CN108961385A (en) A kind of SLAM patterning process and device
Pan et al. Optimization algorithm for high precision RGB-D dense point cloud 3D reconstruction in indoor unbounded extension area
Yin et al. [Retracted] Virtual Reconstruction Method of Regional 3D Image Based on Visual Transmission Effect
CN115601430A (en) Texture-free high-reflection object pose estimation method and system based on key point mapping
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN108694348B (en) Tracking registration method and device based on natural features
Guo et al. Efficient planar surface-based 3D mapping method for mobile robots using stereo vision
Nakagawa et al. Topological 3D modeling using indoor mobile LiDAR data
CN111915632A (en) Poor texture target object truth value database construction method based on machine learning
CN116416608A (en) Monocular three-dimensional real-time detection algorithm based on key points
Chang et al. Using line consistency to estimate 3D indoor Manhattan scene layout from a single image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant