CN103247075B

CN103247075B - Indoor environment three-dimensional reconstruction method based on a variational mechanism

Info

Publication number: CN103247075B
Application number: CN201310173608.XA
Authority: CN
Inventors: 贾松敏; 王可; 李雨晨; 李秀智
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2013-05-13
Filing date: 2013-05-13
Publication date: 2015-08-19
Anticipated expiration: 2033-05-13
Also published as: CN103247075A

Abstract

The invention belongs to the intersection of computer vision and intelligent robotics and discloses a large-scale indoor scene reconstruction method based on a variational mechanism, comprising: step 1, obtaining the calibration parameters of the camera and establishing a distortion correction model; step 2, establishing the camera pose description and the camera projection model; step 3, estimating the camera pose with a monocular SLAM algorithm based on SFM; step 4, establishing a depth map estimation model based on a variational mechanism and solving the model; step 5, establishing a keyframe selection mechanism to update the three-dimensional scene. The invention uses an RGB camera to acquire environmental data and, building on a high-precision monocular localization algorithm, proposes a depth map generation method based on a variational mechanism. It achieves fast, large-scale indoor three-dimensional scene reconstruction and effectively addresses the cost and real-time problems of three-dimensional reconstruction algorithms.

Description

Indoor environment three-dimensional reconstruction method based on a variational mechanism

Technical field

The invention belongs to the intersection of computer vision and intelligent robotics. It relates to indoor environment three-dimensional reconstruction, and in particular to a large-scale indoor scene reconstruction method based on a variational mechanism.

Technical background

With the deepening of research on simultaneous localization and mapping (SLAM), three-dimensional modeling of the surrounding environment has gradually become a research focus in this field and has attracted the attention of many scholars. In 2007, G. Klein et al. first proposed the concept of Parallel Tracking and Mapping (PTAM) in the field of augmented reality (AR) to solve the problem of real-time environment modeling. PTAM splits camera tracking and map generation into two independent threads. While using the FAST corner method to detect and update feature points, it applies local and global bundle adjustment (BA) to continuously update the camera pose and the three-dimensional feature point map. The method builds a three-dimensional map of the surroundings from a sparse point cloud, but such a map lacks an intuitive three-dimensional description of the environment. Pollefeys et al. achieved three-dimensional reconstruction of large-scale outdoor scenes through multi-sensor fusion, but the method suffers from high computational complexity and sensitivity to noise. Some tentative progress has also been made in real-time tracking and dense environment model reconstruction, but it is limited to the reconstruction of simple objects and reaches high precision only under particular constraints. Richard A. Newcombe et al. used an SFM (Structure from Motion) based SLAM algorithm to obtain a sparse spatial feature point cloud, applied multi-scale radial basis interpolation and the implicit surface polygonization method from computer graphics to construct an initial three-dimensional mesh map, and updated the mesh vertex coordinates by combining a scene flow constraint with a high-precision TV-L1 optical flow algorithm so as to approximate the real scene. This algorithm can obtain a high-precision environment model, but because of its high complexity, processing one frame still takes several seconds even when accelerated by two graphics processing units (GPUs).

Summary of the invention

In view of the above problems in the prior art, the invention provides a fast three-dimensional reconstruction method based on a variational mechanism to achieve three-dimensional modeling in complex indoor environments. The method reduces the amount of data to be processed while preserving environmental information, and can achieve fast, large-scale indoor three-dimensional scene reconstruction. It effectively addresses the cost and real-time problems of three-dimensional reconstruction algorithms and improves reconstruction accuracy.

The technical solution adopted by the invention is as follows:

The PTAM algorithm is used to estimate the camera pose. At each keyframe, a suitable image sequence is selected to construct a variational depth map estimation energy function, and a primal-dual algorithm is used to optimize this energy function, yielding the environment depth map at the current keyframe. Because the algorithm constructs the energy function from neighboring frames, effectively exploits the correlation between the coordinate systems of particular viewpoints, and incorporates the camera perspective projection relation, the data term contains multi-view imaging constraints, which reduces the computational complexity of solving the model. Under a unified computing architecture, the invention uses graphics acceleration hardware to parallelize the optimization, effectively improving real-time performance.

A method for indoor environment three-dimensional reconstruction based on a variational mechanism, characterized by comprising the following steps:

Step 1: obtain the calibration parameters of the camera and establish a distortion correction model.

In computer vision applications, the geometric model of camera imaging establishes the mapping between image pixels and three-dimensional points in space. The geometric parameters of the camera model must be obtained through experiment and computation, and the process of solving for these parameters is called camera calibration. Camera calibration is a key step in the invention, since the precision of the calibration parameters directly affects the accuracy of the final three-dimensional map.

The camera calibration procedure is:

(1) Print a chessboard template. The invention uses an A4 sheet with a chessboard spacing of 0.25 cm.

(2) Photograph the chessboard from multiple angles. The chessboard should fill the frame as much as possible, every corner of the chessboard must lie within the frame, and 6 template images are taken in total.

(3) Detect the feature points in the images, i.e. each intersection of the black squares of the chessboard.

(4) Compute the intrinsic parameters of the camera, as follows:

The calibration parameters of the RGB camera are mainly the camera intrinsics. The intrinsic matrix K of the camera is:

K = \begin{bmatrix} f_{u} & 0 & u_{0} \\ 0 & f_{v} & v_{0} \\ 0 & 0 & 1 \end{bmatrix}

In the formula, u and v are the coordinate axes of the camera image plane, (u_0, v_0) are the coordinates of the image plane center, and (f_u, f_v) are the focal lengths of the camera.

According to the calibration parameters, a point in the RGB image is mapped to a three-dimensional point as follows: the coordinates P_3D = (x, y, z) in the camera coordinate system of an RGB image point p = (u, v) are:

\begin{cases} x = (u - u_{0}) z / f_{u} \\ y = (v - v_{0}) z / f_{v} \\ z = d \end{cases}

In the formula, d is the depth value of point p in the depth image.
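The pixel-to-point mapping above can be sketched in a few lines of Python. This is a minimal illustration only; the function names and the intrinsic values are hypothetical and are not taken from the patent.

```python
def backproject(u, v, d, fu, fv, u0, v0):
    """Map pixel (u, v) with depth d to camera-frame coordinates (x, y, z)."""
    z = d
    x = (u - u0) * z / fu
    y = (v - v0) * z / fv
    return x, y, z


def project(x, y, z, fu, fv, u0, v0):
    """Inverse mapping: camera-frame point back to pixel coordinates."""
    return fu * x / z + u0, fv * y / z + v0


# Hypothetical intrinsics for a 640 x 480 image (illustrative values only).
fu, fv, u0, v0 = 500.0, 500.0, 320.0, 240.0
point = backproject(420.0, 140.0, 2.0, fu, fv, u0, v0)
pixel = project(*point, fu, fv, u0, v0)
```

Projecting the back-projected point returns the original pixel, which is a quick consistency check on a set of calibration parameters.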

In the invention, the camera coordinate system is shown in Fig. 2: downward is the positive y-axis, forward is the positive z-axis, and to the right is the positive x-axis. The initial position of the camera is set as the origin of the world coordinate system, and the X, Y and Z directions of the world coordinate system are defined the same as those of the camera.

The FOV (Field of View) camera correction model is:

u_{d} = \begin{bmatrix} u_{0} \\ v_{0} \end{bmatrix} + \begin{bmatrix} f_{u} & 0 \\ 0 & f_{v} \end{bmatrix} \frac{r_{d}}{r_{u}} x_{u}

r_{d} = \frac{1}{ω} \arctan (2 r_{u} \tan \frac{ω}{2})

r_{u} = \frac{\tan (r_{d} ω)}{2 \tan \frac{ω}{2}}

In the formula, x_u is the pixel coordinate in the z = 1 plane, u_d is the pixel coordinate in the original image, and ω is the FOV camera distortion coefficient.
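As a rough sketch of how the FOV model can be evaluated, the Python below implements the two radius formulas and the mapping to u_d. It assumes that r_u is the Euclidean norm of x_u and r_d the corresponding distorted radius, which the text does not state explicitly; the function names are hypothetical.

```python
import math


def distort_radius(r_u, omega):
    """r_d = (1 / omega) * arctan(2 * r_u * tan(omega / 2))."""
    return math.atan(2.0 * r_u * math.tan(omega / 2.0)) / omega


def undistort_radius(r_d, omega):
    """r_u = tan(r_d * omega) / (2 * tan(omega / 2))."""
    return math.tan(r_d * omega) / (2.0 * math.tan(omega / 2.0))


def distort_point(x_u, y_u, omega, fu, fv, u0, v0):
    """Map a point in the z = 1 plane to a pixel in the original (distorted) image."""
    r_u = math.hypot(x_u, y_u)  # assumption: r_u is the norm of x_u
    if r_u < 1e-12:
        # On the optical axis r_d / r_u tends to 2 * tan(omega / 2) / omega.
        scale = 2.0 * math.tan(omega / 2.0) / omega
    else:
        scale = distort_radius(r_u, omega) / r_u
    return u0 + fu * scale * x_u, v0 + fv * scale * y_u
```

The two radius functions are inverses of each other, so `undistort_radius(distort_radius(r, w), w)` should return `r` up to rounding.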

Step 2: establish the camera pose description and the camera projection model.

In the established world coordinate system, the camera pose can be expressed as the matrix:

T_{cw} = \begin{bmatrix} R_{cw} & t_{cw} \\ 0 & 1 \end{bmatrix}

In the formula, the subscript "cw" denotes the transformation from the world coordinate system to the current camera coordinate system, T_cw ∈ SE(3), and SE(3) is the space of rigid-body rotation and translation transformations. T_cw can be represented by the six-tuple μ = (μ_1, μ_2, μ_3, μ_4, μ_5, μ_6), namely:

T_{cw} = \exp (\hat{μ})

\hat{μ} = \begin{bmatrix} 0 & μ_{6} & - μ_{5} & μ_{1} \\ - μ_{6} & 0 & μ_{4} & μ_{2} \\ μ_{5} & - μ_{4} & 0 & μ_{3} \\ 0 & 0 & 0 & 0 \end{bmatrix}

In the formula, μ_1, μ_2, μ_3 are the translations of the Kinect in the global coordinate system, and μ_4, μ_5, μ_6 are the rotations about the coordinate axes in the local coordinate system.

The camera pose T_cw establishes the transformation between the point cloud coordinates p_c in the current coordinate system and the world coordinates p_w, namely:

p_{c} = T_{cw} p_{w}

In the current coordinate system, the projection of a three-dimensional point onto the z = 1 plane is defined as:

π(p) = (x/z, y/z)^{T}

In the formula, p ∈ R^3 is a three-dimensional point and x, y, z are its coordinates. Given the depth value d of the current coordinate point, back-projection determines the current three-dimensional point p; the relation can be expressed as:

π^{-1}(u, d) = d K^{-1} u
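The pose parameterization and projection can be illustrated with a small pure-Python sketch. It assumes that the rotation block of the 4 x 4 matrix is antisymmetric and evaluates the matrix exponential with a truncated Taylor series, which is adequate for small 4 x 4 inputs but is not how a production system would do it. All function names are hypothetical.

```python
def hat(mu):
    """4x4 matrix built from the six-tuple mu = (m1, ..., m6)."""
    m1, m2, m3, m4, m5, m6 = mu
    return [[0.0, m6, -m5, m1],
            [-m6, 0.0, m4, m2],
            [m5, -m4, 0.0, m3],
            [0.0, 0.0, 0.0, 0.0]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def expm(m, terms=30):
    """Matrix exponential by truncated Taylor series: sum of m^n / n!."""
    result = [[float(i == j) for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(4)]
                  for i in range(4)]
    return result


def transform(T, p):
    """p_c = T_cw p_w for a 3D point p_w."""
    ph = list(p) + [1.0]
    return [sum(T[i][k] * ph[k] for k in range(4)) for i in range(3)]


def pi(p):
    """Projection onto the z = 1 plane: (x / z, y / z)."""
    return p[0] / p[2], p[1] / p[2]


T_cw = expm(hat((0.1, 0.0, 0.0, 0.0, 0.0, 0.0)))  # pure translation along x
p_c = transform(T_cw, (0.0, 0.0, 2.0))
```

With a pure translation the exponential reduces to the identity rotation plus the translation vector, which makes the example easy to check by hand.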

Step 3: estimate the camera pose with a monocular SLAM algorithm based on SFM.

At present, monocular visual SLAM algorithms mainly comprise filter-based SLAM algorithms and SFM (Structure from Motion) based algorithms. The invention uses the PTAM algorithm to localize the camera. PTAM is an SFM-based monocular visual SLAM method that divides the system into two independent threads, camera tracking and map building. In the camera tracking thread, the system acquires the texture of the current environment with the camera, builds a four-level Gaussian image pyramid, extracts features from the current image with the FAST-10 corner detector, and establishes data association between corner features by block matching. On this basis, a pose estimation model is built from the current projection error to localize the camera accurately, and the current three-dimensional point cloud map is generated by combining the feature matches with triangulation. The camera pose estimation procedure is:

(1) Initialization of the sparse map

The PTAM algorithm uses a standard stereo camera model to build the initial map of the current environment and then continuously updates the three-dimensional map with newly added keyframes. During map initialization, two independent keyframes are selected manually. Using the FAST corner matches between the images, the essential matrix F between the keyframes is estimated with the five-point algorithm based on random sample consensus (RANSAC), and the three-dimensional coordinates of the current feature points are computed. At the same time, suitable spatial points are selected with RANSAC to fit the current consensus plane, which determines the global world coordinate system and completes the map initialization.

(2) Camera pose estimation

The system acquires the texture of the current environment with the camera, builds a four-level Gaussian image pyramid, extracts features from the current image with the FAST-10 corner detector, and establishes data association between corner features by block matching. On this basis, a pose estimation model is built from the current projection error; its mathematical description is as follows:

ξ = \underset{ξ}{\arg\min} \sum_{j} Obj (\frac{| e_{j} |}{σ_{j}}, σ_{T})

e_{j} = \begin{pmatrix} u_{j} \\ v_{j} \end{pmatrix} - K π (\exp (\hat{ξ}) p_{j})

In the formula, e_j is the projection error, Obj(·, σ_T) is the Tukey biweight objective function, σ_T is the unbiased estimate of the standard deviation of the feature point matches, ξ is the six-tuple representation of the current pose, and \hat{ξ} is the antisymmetric matrix formed from ξ.

According to the above pose estimation model, 50 feature matches at the top level of the image pyramid are selected to estimate the initial camera pose. The algorithm then combines this initial pose with an epipolar search to establish sub-pixel matches between corner features across the image pyramid, and substitutes these matches into the pose estimation model to relocalize the camera accurately.
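The robust objective used in the pose estimation model can be sketched as follows. The code uses the common textbook form of the Tukey biweight cost and assumes that σ_T plays the role of the cut-off of that function; the text does not spell out this detail, so treat it as an illustrative choice. The function names are hypothetical.

```python
def tukey_biweight(r, c):
    """Tukey biweight cost of a normalised residual r with cut-off c."""
    if abs(r) > c:
        return c * c / 6.0          # outliers contribute a constant cost
    t = 1.0 - (r / c) ** 2
    return (c * c / 6.0) * (1.0 - t ** 3)


def reprojection_cost(observed, predicted, sigma, sigma_t):
    """Sum of Obj(|e_j| / sigma_j, sigma_T) over matched features.

    observed, predicted: lists of (u, v) pixel pairs;
    sigma: per-feature standard deviations sigma_j.
    """
    total = 0.0
    for (u, v), (pu, pv), s in zip(observed, predicted, sigma):
        e = ((u - pu) ** 2 + (v - pv) ** 2) ** 0.5   # |e_j|
        total += tukey_biweight(e / s, sigma_t)      # sigma_T used as cut-off (assumption)
    return total
```

Because the cost saturates for large residuals, a single badly matched corner cannot dominate the pose estimate, which is the reason for choosing a robust objective over a plain sum of squares.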

(3) Camera pose optimization

After initialization, the map-building thread waits for new keyframes. If the number of frames between the camera and the current keyframe exceeds a threshold and camera tracking quality is best, the add-keyframe procedure is executed automatically. The system then applies Shi-Tomasi evaluation to all FAST corners in the new keyframe to obtain corners with salient features, selects the nearest keyframe, establishes feature point correspondences by epipolar search and block matching, relocalizes the camera accurately with the pose estimation model, and projects the matched points into space to generate the current global three-dimensional map of the environment.

To maintain the global map, while the map-building thread waits for new keyframes, the system uses local and global Levenberg-Marquardt bundle adjustment to optimize the consistency of the current map. The mathematical description of this bundle adjustment is:

\{\{ξ_{1} .. ξ_{N}\}, \{p_{1} .. p_{M}\}\} = \underset{\{\{ξ\}, \{p\}\}}{\arg\min} \sum_{i = 1}^{N} \sum_{j \in S_{i}} Obj (\frac{| e_{ji} |}{σ_{ji}}, σ_{T})

In the formula, σ_ji is the unbiased estimate of the standard deviation of the FAST feature point matches in the i-th keyframe, ξ_i is the six-tuple representation of the pose of the i-th keyframe, and p_j is a point in the global map.

Step 4: establish a depth map estimation model based on a variational mechanism and solve the model.

Given the accurate pose estimate from PTAM, the invention builds a depth estimation model with a variational mechanism, following the multi-view reconstruction approach. Based on the assumptions of illumination invariance and depth map smoothness, the method sets up an L1-type data penalty term and a variational regularization term: the data penalty term is built under the illumination invariance assumption, and the variational regularization term ensures the smoothness of the current depth map. The mathematical model is as follows:

E_{d} = \int_{Ω} (E_{data} + λ E_{reg}) dx

In the formula, λ is the weight coefficient between the data penalty term E_data and the variational regularization term E_reg, and Ω is the domain of the depth map.

The current keyframe is selected as the reference frame I_r of the depth map estimation algorithm. Using its adjacent image sequence I = {I_1, I_2, ..., I_n} together with the projection model, the data penalty term E_data is established; its mathematical description is:

E_{data} = \frac{1}{| I (r) |} \sum_{I_{i} \in I} | I_{r} (x) - I_{i} (x') |

In the formula, |I(r)| is the number of image frames in the current image sequence that overlap with the reference frame, and x' is the projection in I_i of the reference frame pixel x at depth d, namely:

x' = π (K T_{r}^{i} π^{-1} (x, d))
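The data penalty term can be illustrated with a toy Python sketch: a reference pixel is back-projected at a candidate depth, moved into a neighboring frame with a rigid transform, projected again, and compared photometrically. The sketch assumes images stored as nested lists, a 3 x 4 or 4 x 4 transform T_ri from the reference frame to frame i, and nearest-neighbour lookup; all of these are simplifications, and the names are hypothetical.

```python
def warp_pixel(u, v, d, K, T_ri):
    """Reference pixel (u, v) at depth d -> pixel coordinates in frame I_i."""
    fu, fv, u0, v0 = K
    # Back-project: p = d * K^{-1} * (u, v, 1)^T
    p = ((u - u0) * d / fu, (v - v0) * d / fv, d)
    # Rigid transform from the reference frame into frame i
    q = [sum(T_ri[r][c] * pc for c, pc in enumerate(p)) + T_ri[r][3]
         for r in range(3)]
    # Apply the intrinsics and divide by depth
    return fu * q[0] / q[2] + u0, fv * q[1] / q[2] + v0


def data_term(I_r, frames, poses, u, v, d, K):
    """Mean absolute photometric error over the frames that see the warped pixel."""
    errors = []
    for I_i, T_ri in zip(frames, poses):
        up, vp = warp_pixel(u, v, d, K, T_ri)
        col, row = int(round(up)), int(round(vp))   # nearest-neighbour lookup
        if 0 <= row < len(I_i) and 0 <= col < len(I_i[0]):
            errors.append(abs(I_r[v][u] - I_i[row][col]))
    return sum(errors) / len(errors) if errors else float("inf")


# Toy 1 x 5 images: the second view is shifted by one pixel at the true depth.
K = (2.0, 2.0, 0.0, 0.0)                 # hypothetical (fu, fv, u0, v0)
T = [[1.0, 0.0, 0.0, 1.0],               # translation of 1 along x
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
I_ref = [[0, 10, 20, 30, 40]]
I_1 = [[0, 0, 10, 20, 30]]
cost_true = data_term(I_ref, [I_1], [T], 1, 0, 2.0, K)
cost_wrong = data_term(I_ref, [I_1], [T], 1, 0, 1.0, K)
```

In this toy setup the photometric error vanishes at the correct depth and grows at the wrong one, which is the signal the depth estimation exploits.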

Under the depth map smoothness assumption, in order to preserve discontinuities at boundaries in the image, a weighted Huber operator is introduced to build the variational regularization term; its mathematical description is:

E_{reg} = g (u) \| \nabla d (u) \|_{α}

In the formula, ∇d is the gradient of the depth map, g(u) is the pixel gradient weight coefficient, g(u) = exp(-a ||∇I_r(u)||), and the Huber operator ||x||_α is defined as:

\| x \|_{α} = \begin{cases} \frac{\| x \|^{2}}{2 α}, & \| x \| \leq α \\ \| x \| - \frac{α}{2}, & \text{otherwise} \end{cases}

In the formula, α is a constant.
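The Huber operator is simple enough to write out directly. The sketch below is illustrative only; the function names are hypothetical and the gradient vector is passed as a plain tuple.

```python
import math


def huber(x, alpha):
    """Huber norm of a vector x: quadratic below alpha, linear above."""
    n = math.sqrt(sum(c * c for c in x))
    if n <= alpha:
        return n * n / (2.0 * alpha)
    return n - alpha / 2.0


def weighted_huber(g, grad_d, alpha):
    """Regularization term g(u) * ||grad d(u)||_alpha at one pixel."""
    return g * huber(grad_d, alpha)
```

The two branches meet at ||x|| = α with value α/2, so the penalty is continuous: small depth gradients are smoothed quadratically, while large ones (likely object boundaries) are penalized only linearly.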

According to the Legendre-Fenchel transform, the energy function can be expressed as:

g \| \nabla d \|_{α} = \langle g \nabla d, q \rangle - δ (q) - \frac{α}{2} \| q \|^{2}

In the formula,

δ (q) = \begin{cases} \frac{α}{2}, & α < \| q \| \leq 1 \\ \infty, & \text{otherwise} \end{cases}

The Huber operator gives the three-dimensional reconstruction process a smoothness guarantee while preserving discontinuous boundaries in the depth map, improving the quality of the three-dimensional map.

Because the above mathematical model is complex to solve and computationally expensive, an auxiliary variable h is introduced to set up a convex optimization model, which is optimized by alternating descent. The procedure is as follows:

(1) With h fixed, solve:

\underset{q}{\arg\max} \{ \underset{d}{\arg\min} E_{d, q} \}

E_{d, q} = \int_{Ω} ( \langle g \nabla d, q \rangle + \frac{1}{2 θ} (d - h)^{2} - δ (q) - \frac{α}{2} \| q \|^{2} ) dx

In the formula, θ is the constant coefficient of the quadratic term, and g is the gradient weight coefficient of the variational regularization term.

According to the Lagrangian extremum method, the condition for the above energy function to reach an extremum is:

\frac{\partial E_{d, q}}{\partial q} = g \nabla d - α q = 0

\frac{\partial E_{d, q}}{\partial d} = g \, \mathrm{div} \, q + \frac{1}{θ} (d - h) = 0

In the formula, div q is the divergence of q.

Combined with the discretized description of the partial derivatives, the above extremum conditions can be expressed as:

\frac{q^{n + 1} - q^{n}}{ε_{q}} = g \nabla d - α q^{n + 1}

\frac{d^{n + 1} - d^{n}}{ε_{d}} = g \, \mathrm{div} \, q + \frac{1}{θ} (d^{n + 1} - h)

A primal-dual algorithm can then be used to iteratively optimize the energy function, namely:

q^{n + 1} = \frac{(q^{n} + ε_{q} g \nabla d^{n}) / (1 + ε_{q} α)}{\max (1, (q^{n} + ε_{q} g \nabla d^{n}) / (1 + ε_{q} α))}

d^{n + 1} = \frac{d^{n} + ε_{d} (g \, \mathrm{div} \, q^{n + 1} + h^{n} / θ)}{1 + ε_{d} / θ}

In the formula, ε_q and ε_d are constants, the gradient step sizes of the maximization and the minimization respectively.
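The two update formulas can be tried out on a one-dimensional signal. The sketch below is a toy illustration under several assumptions that are not in the text: g is a constant scalar, the gradient is a forward difference with a zero at the last sample, the divergence is the matching backward difference, and the argument of max(1, ·) is taken as the magnitude of the scaled dual variable so that the update projects q onto |q| ≤ 1. Parameter values and function names are hypothetical.

```python
def grad(d):
    """Forward differences, zero at the last sample."""
    return [d[i + 1] - d[i] for i in range(len(d) - 1)] + [0.0]


def div(q):
    """Backward differences; the negative adjoint of grad."""
    n = len(q)
    out = []
    for i in range(n):
        left = q[i - 1] if i > 0 else 0.0
        right = q[i] if i < n - 1 else 0.0
        out.append(right - left)
    return out


def primal_dual(h, g=1.0, alpha=0.01, theta=0.5,
                eps_q=0.25, eps_d=0.25, iters=300):
    """Alternate the dual (q) and primal (d) updates for a fixed h."""
    d = list(h)              # d^0 = h^0
    q = [0.0] * len(h)       # q^0 = 0
    for _ in range(iters):
        # Dual ascent step followed by projection onto |q| <= 1 (assumption)
        q = [(qi + eps_q * g * gi) / (1.0 + eps_q * alpha)
             for qi, gi in zip(q, grad(d))]
        q = [qi / max(1.0, abs(qi)) for qi in q]
        # Primal descent step
        d = [(di + eps_d * (g * dqi + hi / theta)) / (1.0 + eps_d / theta)
             for di, dqi, hi in zip(d, div(q), h)]
    return d


noisy = [0.0, 0.2, 0.0, 0.2, 0.0, 1.0, 1.2, 1.0, 1.2, 1.0]
smooth = primal_dual(noisy)
```

On this toy input the small oscillations are flattened while the large step between the two plateaus survives, which is the qualitative behaviour expected from a Huber-regularized model.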

(2) With d fixed, solve:

\underset{h}{\arg\min} E_{h}

E_{h} = \int_{Ω} ( \frac{θ}{2} (d - h)^{2} + \frac{λ}{| I (r) |} \sum_{i = 0}^{n} | I_{i} (x) - I_{ref} (x, h) | ) dx

In solving the above energy function, in order to reduce the complexity of the algorithm while preserving detail in the reconstruction, the invention divides the depth range [d_min, d_max] into S sample planes and obtains the optimal solution of the current energy function by exhaustive search. The step size is chosen as:

d_{inc}^{k} = \frac{S d_{\min} d_{\max}}{(S - k) d_{\min} + d_{\max}}

In the formula, d_inc^k is the interval between the k-th and (k-1)-th sample planes.
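The per-pixel exhaustive search can be sketched as below. The sketch evaluates the step formula exactly as written, but for simplicity it searches over uniformly spaced candidate depths, which is an assumption rather than the sampling the patent describes. The coefficient of the quadratic coupling term is passed in as a single hypothetical parameter `quad_weight`, and `cost` stands in for the averaged photometric error at a candidate depth.

```python
def step(k, S, d_min, d_max):
    """d_inc^k evaluated exactly as written in the step-size formula."""
    return S * d_min * d_max / ((S - k) * d_min + d_max)


def best_h(d, cost, candidates, quad_weight, lam):
    """Exhaustive search: argmin over h of quad_weight * (d - h)^2 + lam * cost(h)."""
    return min(candidates,
               key=lambda h: quad_weight * (d - h) ** 2 + lam * cost(h))


S, d_min, d_max = 5, 0.5, 2.5
# Uniformly spaced sample planes (an assumption made for this illustration).
candidates = [d_min + k * (d_max - d_min) / (S - 1) for k in range(S)]


def photometric(h):
    """Toy photometric cost with its minimum at depth 2.0."""
    return abs(h - 2.0)


h_loose = best_h(1.0, photometric, candidates, quad_weight=0.1, lam=1.0)
h_tight = best_h(1.0, photometric, candidates, quad_weight=10.0, lam=1.0)
```

With a weak coupling the search follows the photometric minimum, and with a strong coupling it stays at the current estimate d, which shows how the quadratic term ties h to d as the alternation proceeds.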

Step 5: establish a keyframe selection mechanism to update the three-dimensional scene.

To eliminate redundant information, improve the clarity and real-time performance of the reconstruction, and reduce the computational burden, the invention estimates the three-dimensional scene only at keyframes, and updates and maintains the generated scene. When a new KeyFrame is added, its data are transformed into the world coordinate system according to the pose relation p_c = T_cw p_w, completing the update of the scene data.

Using the data penalty term of the depth estimation model established in step 4, an evaluation function of the information overlap between the current frame and the keyframe is established, namely:

N = \sum_{x \in R^{2}} c (x)

In formula, for constant.

If N is less than 0.7 of the image size, the current frame is determined to be a new keyframe.
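The keyframe test reduces to a pixel count and a threshold. In the sketch below, c(x) is assumed to be a 0/1 indicator of whether pixel x still overlaps the current keyframe; the text does not define c(x) here, so this is an illustrative reading only, and the function name is hypothetical.

```python
def is_new_keyframe(overlap_mask, ratio=0.7):
    """overlap_mask[r][c] holds c(x): 1 where the pixel overlaps the keyframe, else 0."""
    n = sum(sum(row) for row in overlap_mask)            # N = sum of c(x)
    image_size = len(overlap_mask) * len(overlap_mask[0])
    return n < ratio * image_size


# 10 x 10 toy masks: 50 overlapping pixels versus 80 overlapping pixels.
half = [[1] * 5 + [0] * 5 for _ in range(10)]
most = [[1] * 8 + [0] * 2 for _ in range(10)]
```

With these masks `is_new_keyframe(half)` triggers a new keyframe while `is_new_keyframe(most)` does not.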

The beneficial effects of the invention are as follows: the invention uses an RGB camera to acquire environmental data. Building on a high-precision monocular localization algorithm, it proposes a depth map generation method based on a variational mechanism, achieves fast, large-scale indoor three-dimensional scene reconstruction, and effectively addresses the cost and real-time problems of three-dimensional reconstruction algorithms.

Brief description of the drawings

Fig. 1 is the flow chart of the indoor three-dimensional scene reconstruction method based on the variational model;

Fig. 2 is a schematic diagram of the camera coordinate system;

Fig. 3 shows the three-dimensional reconstruction results of the application example of the invention.

Embodiment

Fig. 1 is the flow chart of the indoor three-dimensional scene reconstruction method based on the variational model, which comprises the following steps:

Step 2: establish the camera pose description and the camera projection model.

An application example of the invention is given below.

The RGB camera used in this example is a Point Grey Flea2 with an image resolution of 640 × 480, a maximum frame rate of 30 fps, a horizontal field of view of 65° and a focal length of approximately 3.5 mm. The PC is equipped with a GTS450 GPU and a quad-core i5 CPU.

In the experiment, environment depth information is obtained with the color camera, and the camera localizes itself accurately with the pose estimation algorithm. After a keyframe is added, the 20 frames around the keyframe are selected as the input of the depth estimation algorithm. In the depth estimation algorithm, d^0 = h^0 and q^0 = 0 are set, the initial input of the current depth map is computed, and E_{d,q} and E_h are optimized iteratively until convergence. During the iterations, θ should be reduced continuously, which increases the weight of the quadratic term and effectively improves the convergence speed. The final experimental result is shown in Fig. 3. The experiment shows that the method effectively achieves dense three-dimensional reconstruction of the environment, further demonstrating its feasibility.

Claims

1. A method for indoor environment three-dimensional reconstruction based on a variational mechanism, characterized by comprising the following steps:

Step 1: obtain the calibration parameters of the camera and establish a distortion correction model;

The camera calibration procedure is:

(1) print a chessboard template;

(2) photograph the chessboard from multiple angles; the chessboard should fill the frame as much as possible, every corner of the chessboard must lie within the frame, and 6 template images are taken in total;

(3) detect the feature points in the images, i.e. each intersection of the black squares of the chessboard;

(4) compute the intrinsic parameters, as follows:

The calibration parameters of the RGB camera are mainly the camera intrinsics, and the intrinsic matrix K of the camera is:

K = \begin{bmatrix} f_{u} & 0 & u_{0} \\ 0 & f_{v} & v_{0} \\ 0 & 0 & 1 \end{bmatrix}

In the formula, u and v are the coordinate axes of the camera image plane, (u_0, v_0) are the coordinates of the image plane center, and (f_u, f_v) are the focal lengths of the camera;

\begin{cases} x = (u - u_{0}) z / f_{u} \\ y = (v - v_{0}) z / f_{v} \\ z = d \end{cases}

In the formula, d is the depth value of point p in the depth image;

In the camera coordinate system, downward is the positive y-axis, forward is the positive z-axis, and to the right is the positive x-axis; the initial position of the camera is set as the origin of the world coordinate system, and the X, Y and Z directions of the world coordinate system are defined the same as those of the camera;

The FOV camera correction model is:

u_{d} = \begin{bmatrix} u_{0} \\ v_{0} \end{bmatrix} + \begin{bmatrix} f_{u} & 0 \\ 0 & f_{v} \end{bmatrix} \frac{r_{d}}{r_{u}} x_{u}

r_{d} = \frac{1}{ω} \arctan (2 r_{u} \tan \frac{ω}{2})

r_{u} = \frac{\tan (r_{d} ω)}{2 \tan \frac{ω}{2}}

In the formula, x_u is the pixel coordinate in the z = 1 plane, u_d is the pixel coordinate in the original image, and ω is the FOV camera distortion coefficient;

Step 2: establish the camera pose description and the camera projection model, as follows:

T_{cw} = \begin{bmatrix} R_{cw} & t_{cw} \\ 0 & 1 \end{bmatrix}

In the formula, cw denotes the transformation from the world coordinate system to the current camera coordinate system, T_cw ∈ SE(3), and SE(3) is the space of rigid-body rotation and translation transformations; T_cw can be represented by the six-tuple μ = (μ_1, μ_2, μ_3, μ_4, μ_5, μ_6), namely:

T_{cw} = \exp (\hat{μ})

\hat{μ} = \begin{bmatrix} 0 & μ_{6} & - μ_{5} & μ_{1} \\ - μ_{6} & 0 & μ_{4} & μ_{2} \\ μ_{5} & - μ_{4} & 0 & μ_{3} \\ 0 & 0 & 0 & 0 \end{bmatrix}

In the formula, μ_1, μ_2, μ_3 are the translations of the Kinect in the global coordinate system, and μ_4, μ_5, μ_6 are the rotations about the coordinate axes in the local coordinate system;

p_{c} = T_{cw} p_{w}

π(p) = (x/z, y/z)^{T}

In the formula, p ∈ R^3 is a three-dimensional point and x, y, z are its coordinates; given the depth value d of the current coordinate point, back-projection determines the current three-dimensional point p, and the relation can be expressed as:

π^{-1}(u, d) = d K^{-1} u

Step 3: estimate the camera pose with a monocular SLAM algorithm based on SFM;

Step 4: establish a depth map estimation model based on a variational mechanism and solve the model;

Step 5: establish a keyframe selection mechanism to update the three-dimensional scene, as follows:

The three-dimensional scene is estimated at keyframes, and the generated scene is updated and maintained; when a new KeyFrame is added, its data are transformed into the world coordinate system according to the pose relation p_c = T_cw p_w, completing the update of the scene data;

N = \sum_{x \in R^{2}} c (x)

In formula, for constant;

2. The method for indoor environment three-dimensional reconstruction based on a variational mechanism according to claim 1, characterized in that estimating the camera pose in step 3 with the monocular SLAM algorithm based on SFM further comprises the following steps:

(1) Initialization of the sparse map

The PTAM algorithm uses a standard stereo camera model to build the initial map of the current environment and then continuously updates the three-dimensional map with newly added keyframes; during map initialization, two independent keyframes are selected manually, the FAST corner matches between the images are used to estimate the essential matrix F between the keyframes with the five-point algorithm based on random sample consensus, and the three-dimensional coordinates of the current feature points are computed; at the same time, suitable spatial points are selected with the RANSAC algorithm to fit the current consensus plane, which determines the global world coordinate system and completes the map initialization;

(2) Camera pose estimation

The system acquires the texture of the current environment with the camera, builds a four-level Gaussian image pyramid, extracts features from the current image with the FAST-10 corner detector, and establishes data association between corner features by block matching; on this basis, a pose estimation model is built from the current projection error, and its mathematical description is as follows:

ξ = \underset{ξ}{\arg\min} \sum_{j} Obj (\frac{| e_{j} |}{σ_{j}}, σ_{T})

e_{j} = \begin{pmatrix} u_{j} \\ v_{j} \end{pmatrix} - K π (\exp (\hat{ξ}) p_{j})

In the formula, e_j is the projection error, Obj(·, σ_T) is the Tukey biweight objective function, σ_T is the unbiased estimate of the standard deviation of the feature point matches, ξ is the six-tuple representation of the current pose, and \hat{ξ} is the antisymmetric matrix formed from ξ;

According to the above pose estimation model, 50 feature matches at the top level of the image pyramid are selected to estimate the initial camera pose; the algorithm then combines this initial pose with an epipolar search to establish sub-pixel matches between corner features across the image pyramid, and substitutes these matches into the pose estimation model to relocalize the camera accurately;

(3) Camera pose optimization

After initialization, the map-building thread waits for new keyframes; if the number of frames between the camera and the current keyframe exceeds a threshold and camera tracking quality is best, the add-keyframe procedure is executed automatically; the system then applies Shi-Tomasi evaluation to all FAST corners in the new keyframe to obtain corners with salient features, selects the nearest keyframe, establishes feature point correspondences by epipolar search and block matching, relocalizes the camera accurately with the pose estimation model, and projects the matched points into space to generate the current global three-dimensional map of the environment;

To maintain the global map, while the map-building thread waits for new keyframes, the system uses local and global Levenberg-Marquardt bundle adjustment to optimize the global consistency of the current map; the mathematical description of this bundle adjustment is:

\{\{ξ_{2} .. ξ_{N}\}, \{p_{1} .. p_{M}\}\} = \underset{\{\{ξ\}, \{p\}\}}{\arg\min} \sum_{i = 1}^{N} \sum_{j \in S_{i}} Obj (\frac{| e_{ji} |}{σ_{ji}}, σ_{T})

3. The method for indoor environment three-dimensional reconstruction based on a variational mechanism according to claim 1, characterized in that the depth map estimation model based on a variational mechanism in step 4 is established and solved as follows:

In the depth map estimation model based on a variational mechanism, the data penalty term is built under the illumination invariance assumption, and the variational regularization term ensures the smoothness of the current depth map; the mathematical model is as follows:

E_{d} = \int_{Ω} (E_{data} + λ E_{reg}) dx

In the formula, λ is the weight coefficient between the data penalty term E_data and the variational regularization term E_reg, and Ω is the domain of the depth map;

E_{data} = \frac{1}{| I (r) |} \sum_{I_{i} \in I} | I_{r} (x) - I_{i} (x') |

x' = π (K T_{r}^{i} π^{-1} (x, d))

E_{reg} = g (u) \| \nabla d (u) \|_{α}

In the formula, ∇d is the gradient of the depth map, g(u) is the pixel gradient weight coefficient, g(u) = exp(-a ||∇I_r(u)||), and the Huber operator ||x||_α is defined as:

\| x \|_{α} = \begin{cases} \frac{\| x \|^{2}}{2 α}, & \| x \| \leq α \\ \| x \| - \frac{α}{2}, & \text{otherwise} \end{cases}

In the formula, α is a constant;

According to the Legendre-Fenchel transform, the energy function is transformed into:

g \| \nabla d \|_{α} = \langle g \nabla d, q \rangle - δ (q) - \frac{α}{2} \| q \|^{2}

In the formula,

δ (q) = \begin{cases} \frac{α}{2}, & α < \| q \| \leq 1 \\ \infty, & \text{otherwise} \end{cases}

Because the above mathematical model is complex to solve and computationally expensive, an auxiliary variable h is introduced to set up a convex optimization model, which is optimized by alternating descent; the procedure is as follows:

(1) With h fixed, solve:

\underset{q}{\arg\max} \{ \underset{d}{\arg\min} E_{d, q} \}

E_{d, q} = \int_{Ω} ( \langle g \nabla d, q \rangle + \frac{1}{2 θ} (d - h)^{2} - δ (q) - \frac{α}{2} \| q \|^{2} ) dx

In the formula, g is the gradient weight coefficient of the variational regularization term, and θ is the constant coefficient of the quadratic term;

\frac{\partial E_{d, q}}{\partial q} = g \nabla d - α q = 0

\frac{\partial E_{d, q}}{\partial d} = g \, \mathrm{div} \, q + \frac{1}{θ} (d - h) = 0

In the formula, div q is the divergence of q;

\frac{q^{n + 1} - q^{n}}{ε_{q}} = g \nabla d - α q^{n + 1}

\frac{d^{n + 1} - d^{n}}{ε_{d}} = g \, \mathrm{div} \, q + \frac{1}{θ} (d^{n + 1} - h)

A primal-dual algorithm is used to iteratively optimize the energy function, namely:

q^{n + 1} = \frac{(q^{n} + ε_{q} g \nabla d^{n}) / (1 + ε_{q} α)}{\max (1, (q^{n} + ε_{q} g \nabla d^{n}) / (1 + ε_{q} α))}

d^{n + 1} = \frac{d^{n} + ε_{d} (g \, \mathrm{div} \, q^{n + 1} + h^{n} / θ)}{1 + ε_{d} / θ}

In the formula, ε_q and ε_d are constants, the gradient step sizes of the maximization and the minimization respectively;

(2) With d fixed, solve:

\underset{h}{\arg\min} E_{h}

E_{h} = \int_{Ω} ( \frac{θ}{2} (d - h)^{2} + \frac{λ}{| I (r) |} \sum_{i = 0}^{n} | I_{i} (x) - I_{ref} (x, h) | ) dx

In solving the above energy function, in order to reduce the complexity of the algorithm while preserving detail in the reconstruction, the depth range [d_min, d_max] is divided into S sample planes, and the optimal solution of the current energy function is obtained by exhaustive search; the step size is chosen as:

d_{inc}^{k} = \frac{S d_{\min} d_{\max}}{(S - k) d_{\min} + d_{\max}}

In the formula, d_inc^k is the interval between the k-th and (k-1)-th sample planes.