
CN100470452C - Method and system for implementing three-dimensional enhanced reality - Google Patents

Method and system for implementing three-dimensional enhanced reality Download PDF

Info

Publication number
CN100470452C
CN100470452C CNB200610101229XA CN200610101229A
Authority
CN
China
Prior art keywords
mark
frame
video
dimensional
prime
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB200610101229XA
Other languages
Chinese (zh)
Other versions
CN101101505A (en)
Inventor
杨旭波
曹达
齐泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CNB200610101229XA priority Critical patent/CN100470452C/en
Publication of CN101101505A publication Critical patent/CN101101505A/en
Application granted granted Critical
Publication of CN100470452C publication Critical patent/CN100470452C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and a system for implementing 3D augmented reality. In the method, a video frame capture module captures video frames of a 2D visual code marker in the real environment; augmented reality processing is performed on the captured video frames to obtain virtual graphics frames; and a virtual-real synthesis module composites the virtual graphics frames with the captured video frames to obtain synthesized augmented reality video frames. The invention better supports implementing augmented reality technology on handheld mobile computing devices with relatively limited resources and expands the application field of augmented reality technology.

Description

Method and system for implementing three-dimensional enhanced reality
Technical field
The present invention relates to augmented reality technology, and in particular to a method and system for implementing three-dimensional augmented reality.
Background technology
Augmented reality (AR) is a technique that uses virtual objects to enhance a real scene. It preserves the actual physical environment as the user's main perceptual and interactive environment, while overlaying virtually generated information such as text, two-dimensional images, and three-dimensional models onto the objects of the physical environment shown on the display screen. This annotates and explains the real physical environment the user is in, or enhances and emphasizes certain effects of the actual environment. For example, a user wearing dedicated augmented reality display glasses while touring the Forbidden City in person can, when viewing a relic, see not only the relic and its surroundings but also introductory information about the relic presented in additional multimedia form by the augmented reality technology. Augmented reality gives the user an experience in which virtual objects and the actual environment are merged; it can effectively help the user understand the surrounding environment and interact with it.
Document " H.Billinghurst, M.Marker Tracking and HMD Calibration for avideo-based Augmented Reality Conferencing System.In Proceedings of the 2ndInternational Workshop on Augmented Reality.San Francisco, USA, October.1999 " in a kind of augmented reality algorithm based on mark has been proposed, and design an open source software bag (ARToolkit) thus.The ARToolkit vision technique that uses a computer calculates relative position relation between true shooting scene and the label symbol.The main algorithm flow process of ARToolkit is: the video frame image of input captured in real time, but convert thereof into the black and white binary map by preset threshold; The pairing connected region of black surround color of mark, the as a token of candidate target of thing black surround in the search scene; Obtain the outline line of each connected region, if can extract four crossing straight flanges, then as possible mark; The corner characteristics that utilizes four straight flanges to find carries out deformation and corrects, and calculates a homography matrix (homography) conversion that mark is transformed into front view; Utilize this homography to sample at the black surround interior zone of mark, the sampling template is generally 16x16, obtains 256 sampled points altogether and constitutes a vector of samples; This vector of samples and the mark that leaves the mark database in advance in are compared one by one, and the vector that respective point constitutes on the calculation flag thing and the normalized vector dot product of vector of samples obtain a confidence value; If confidence value is lower than a threshold value, just being used as is that the match is successful, otherwise is exactly that the match is successful.Find corresponding dummy object according to the mark that the match is successful, dummy object is carried out conversion by the current relative orientation of camera and mark, make it to match with mark.
Because the content of the central region of the markers ARToolkit adopts is arbitrary, with no unified standard, the following main problems arise: the pattern matching computation is slow, and its running time grows as the marker database grows; the false match rate is high; and each virtual object must be directly associated with a concrete marker pattern, making the virtual-real correspondence inconvenient both to establish and to maintain.
On the other hand, a two-dimensional visual code is a coding pattern similar to a bar code. As shown in Fig. 1, such a code can be attached to an article as a unique identifier serving both to carry information and to identify the object. Mobile platforms such as mobile phones are well suited to two-dimensional visual coding: phones are now in universal use, and camera phones are increasingly popular. The camera integrated into a mobile computing device such as a phone can act as a sensor, letting the user promptly detect objects bearing two-dimensional visual codes in the surrounding environment, decode them to obtain the associated numeric identifiers, and retrieve the corresponding information by those numbers. Some two-dimensional visual coding software has already been developed commercially.
Document " Michael Rohs.Real-World Interaction with Camera-Phones.2ndInternational Symposium on Ubiquitous Computing Systems, Tokyo, Japan, November 2004 " a kind of two-dimensional visualization coding (Visual Code) technology proposed, its coding pattern as shown in Figure 2, mention by camera that mobile phone is with in the document and discern this coding pattern, calculate some relative motion information of mobile phone and this coding pattern simultaneously, the camera of mobile phone and the movable informations such as relative inclination of coding can be used as a mobile phone and a mutual parameter of numerical information.
The goal of the above document is to obtain the code carried in the image and to use partial motion information of the phone for two-dimensional interaction; what code detection yields is therefore the number corresponding to the code, together with the tilt of the phone relative to the coding pattern. When the camera of a mobile computing device such as a phone captures the coding pattern of Fig. 2, the unique 76-bit number corresponding to this two-dimensional coding pattern can be parsed out; this number is then used to retrieve two-dimensional pictures and text information, which are displayed on the screen of the device. Such two-dimensional visual coding, however, does not support an augmented reality function with three-dimensional information like that of ARToolkit.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and system for implementing three-dimensional augmented reality, supplying a fast and reliable means of establishing the mapping relationships between virtual and real objects.
To solve the above technical problem, the object of the invention is achieved through the following technical solution. A method for implementing three-dimensional augmented reality is provided, the method specifically comprising the steps of:
A: obtaining a video frame of a two-dimensional visual code marker;
B: performing augmented reality processing on said video frame to obtain a virtual graphics frame;
C: compositing said virtual graphics frame with the video frame to obtain a synthesized video frame.
Preferably, the marker in said step A contains a pattern, said pattern being arranged according to a coding rule.
Preferably, the following steps precede said step A:
A1: initializing the 3D graphics system environment, the camera intrinsic parameters, and the marker file;
A2: grabbing a video frame image and scaling the image so that it can serve as a texture map in the graphics system;
A3: drawing the texture map as the background and storing it in the frame buffer provided by the graphics system;
A4: performing image processing on the video frame grabbed in step A2;
A5: comparing the image-processed marker of said video frame image with the markers specified in the marker file and judging whether the error is below a set value; if so, judging that a marker has been found;
A6: recording the vertex coordinates of this marker in the marker coordinate system and the image coordinate system, and computing the code value corresponding to this marker.
Preferably, said step A4 specifically comprises the steps of:
A41: performing grayscale conversion and binarization on the grabbed video frame;
A42: performing image labeling on the grabbed video frame;
A43: performing contour extraction on the grabbed video frame.
Preferably, said step B specifically comprises the steps of:
B1: obtaining the corresponding marker vertex coordinates in the marker coordinate system and the image coordinate system, and computing the transformation matrix from the marker coordinate system to the camera coordinate system;
B2: sampling the coding pattern inside the two-dimensional visual code to derive the code value.
Preferably, the following step follows said step B2:
B3: retrieving the three-dimensional model corresponding to this code and obtaining the vertex array of said three-dimensional model.
Preferably, the following steps follow said step B3:
B4: obtaining the product of the vertices in the vertex array and the transformation matrix, said product being the coordinate array of the three-dimensional graphic in the camera coordinate system;
B5: drawing the three-dimensional graphic corresponding to this coordinate array and storing it in the frame buffer to generate the virtual graphics frame.
The present invention also provides a system for implementing three-dimensional augmented reality, comprising: a video frame capture module, a video tracking module, a virtual graphics system module, a virtual-real synthesis module, and a video display module.
The video frame capture module captures video frames of a two-dimensional visual code marker and sends them to the video tracking module.
The video tracking module operates on the obtained marker video frames and derives from the result the transformation matrix from the marker coordinate system to the camera coordinate system; it also samples the coding pattern of the two-dimensional visual code to obtain the marker code value, retrieves the three-dimensional model corresponding to that code value, and from the product of the model's vertex array and the transformation matrix obtains the coordinate array of the three-dimensional graphic in the camera coordinate system.
The virtual graphics system module draws the corresponding three-dimensional graphic according to the obtained coordinate array of the graphic in the camera coordinate system, stores the graphic in the frame buffer, and generates the virtual graphics frame.
The virtual-real synthesis module composites the obtained virtual graphics frame with the video frame of the two-dimensional visual code marker to obtain the synthesized video frame.
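To make the data flow between these modules concrete, a per-frame loop might be sketched as below; every type and function name here is an illustrative assumption, not part of the claimed system:

/* Hypothetical per-frame pipeline for the five modules above. */
typedef struct Frame   Frame;                      /* a video or graphics frame  */
typedef struct { float m[4][4]; } Matrix4;         /* marker-to-camera transform */

Frame *capture_frame(void);                                   /* video frame capture module */
int    track_marker(const Frame *f, Matrix4 *T, unsigned *c); /* video tracking module      */
Frame *render_virtual(const Matrix4 *T, unsigned code);       /* virtual graphics module    */
Frame *composite(const Frame *real, const Frame *virt);       /* virtual-real synthesis     */
void   display(const Frame *f);                               /* video display module       */

void ar_frame_loop(void)
{
    for (;;) {
        Frame *video = capture_frame();
        Matrix4 T;
        unsigned code;
        if (track_marker(video, &T, &code)) {          /* marker found: augment     */
            Frame *virt = render_virtual(&T, code);
            display(composite(video, virt));
        } else {
            display(video);                            /* no marker: show raw video */
        }
    }
}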
From the above technical solution it can be seen that, because the present invention incorporates two-dimensional visual coding into three-dimensional augmented reality technology, the following beneficial effects are obtained:
1. Standard two-dimensional visual code images are introduced into existing three-dimensional augmented reality technology as the markers used for tracking, replacing the arbitrary-pattern markers adopted by ARToolkit in the prior art. This improves the speed and reliability of the tracking algorithm, lowers the false match rate, and accelerates pattern matching.
2. On the basis of existing two-dimensional visual coding, the computation and extraction of relative three-dimensional transformation information is introduced, together with retrieval of the corresponding three-dimensional media information and three-dimensionally registered augmented reality synthesis. This technique not only identifies the two-dimensional visual code but also obtains its corresponding three-dimensional spatial position, so that the three-dimensional model retrieved through the code is augmented and displayed on the coded graphic in real time, thereby realizing the augmented reality function.
3. This scheme better supports implementing augmented reality technology on handheld mobile computing devices with relatively limited computational resources, expanding the application field of augmented reality technology. By unifying a standard coding system, it provides a convenient, easy-to-use, and standardized means of establishing the mapping relationships between virtual and real objects in augmented reality, laying a new technical foundation for the birth of novel augmented reality applications.
Description of drawings
Fig. 1 shows some two-dimensional visual code markers and their corresponding code values;
Fig. 2 shows the coding pattern of the two-dimensional visual code (Visual Code);
Fig. 3 is a flow chart of the method for implementing three-dimensional augmented reality;
Fig. 4 is a schematic diagram of the functional modules of the system for implementing three-dimensional augmented reality;
Fig. 5 is a flow chart of the augmented reality algorithm based on two-dimensional visual code markers;
Fig. 6 is a schematic diagram of scanning pixels in alternating directions during grayscale processing.
Embodiment
Referring to Fig. 3, the present invention provides a method for implementing three-dimensional augmented reality, realized mainly by the following steps:
301: initialize the 3D graphics system environment, the camera intrinsic parameters, and the marker file;
The goal of graphics system environment initialization is to set up a drawing environment that supports two-dimensional and three-dimensional graphics simultaneously. It includes obtaining and setting the display mode, setting the display parameter list, creating the display device and the display surface, setting the display surface parameters, and setting the viewpoint position, the view plane, and so on.
The camera intrinsic parameters are the camera's inherent internal parameters, such as focal length and distortion. These parameters determine the camera's projective transformation matrix and depend only on the camera's own properties, so for a given camera they are constant. They are obtained in advance by an independent camera calibration program; here this group of parameters is simply read into memory.
The marker file is a documentation file recording the marker information of the environment; at program initialization this information is read from the file into memory.
302: grab a video frame image with the camera and scale it so that it is suitable as a texture map in the graphics system; then draw it as the background and store it in the frame buffer provided by the graphics system;
303: perform image processing on the video frame image grabbed in step 302;
This specifically comprises grayscale conversion and binarization of the image, followed by labeling and contour extraction.
For grayscale conversion and binarization: to obtain the gray value of a color image, the International Telecommunication Union (ITU) standard luminance formula is
Y = 0.2126*red + 0.7152*green + 0.0722*blue
Because the blue component has the least influence on image sharpness and contrast, it can be omitted, and the simplified formula grey = (red + green)/2 can replace the standard luminance formula. Computing gray values and binarizing must process every pixel, so the computation is relatively complex and inefficient; the gray value given by the simplified formula stays close to the ITU standard luminance formula while being considerably faster to compute. After the gray values are obtained, the image is further binarized; a fixed threshold, however, cannot faithfully reproduce the image's gray levels, because instability in the camera image's brightness makes the brightness of the coding marker uneven.
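As a sketch (assuming 8-bit color channels), the two conversions compare as follows:

#include <stdint.h>

/* ITU standard luminance, for comparison. */
static uint8_t grey_itu(uint8_t red, uint8_t green, uint8_t blue)
{
    return (uint8_t)(0.2126 * red + 0.7152 * green + 0.0722 * blue);
}

/* Simplified formula used in this scheme: drop blue, average red and green.
   Integer-only, which is cheap on the fixed-point processors of handheld devices. */
static uint8_t grey_simplified(uint8_t red, uint8_t green)
{
    return (uint8_t)(((unsigned)red + (unsigned)green) / 2);
}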
To address this problem, an adaptive threshold determination method is adopted. Its basic idea, shown in Fig. 6, is to scan the pixels line by line in alternating directions; with this scan order the brightness of the coding marker stays even, even when the camera image brightness is unstable.
Referring to Fig. 6, the scan path in the figure resembles a "serpentine" path. Under uneven illumination, scanning left-to-right and scanning right-to-left produce different binarization results; adopting this "snake-like" scan path is equivalent to averaging over a region, which mitigates the effect of uneven illumination.
The moving average threshold g_s(n) is computed according to the formula

g_s(n) = g_s(n-1) * (1 - 1/s) + p_n

where p_n is the gray value of the n-th pixel, s = w/8 is the width of the moving average (w being the image width), and g_s is initialized to c*s/2 (c being the maximum gray value). The binarization result of each pixel is then determined by:

T(n) = 1, if p_n < (g_s(n)/s) * (1 - t/100), where t = 15;
T(n) = 0, otherwise.
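A minimal sketch of this adaptive binarization with the serpentine scan, assuming an 8-bit grayscale buffer in row-major order, is:

#include <stdint.h>

/* Adaptive-threshold binarization with a serpentine scan.
   src: grayscale image of w*h bytes; dst: 0/1 output of the same size. */
void binarize_adaptive(const uint8_t *src, uint8_t *dst, int w, int h)
{
    const int    s  = w / 8;                 /* moving-average width s = w/8   */
    const int    t  = 15;                    /* threshold percentage t = 15    */
    const double c  = 255.0;                 /* maximum gray value             */
    double       gs = c * s / 2.0;           /* initial value g_s = c*s/2      */

    for (int y = 0; y < h; y++) {
        int reverse = y & 1;                 /* alternate the scan direction   */
        for (int i = 0; i < w; i++) {
            int x = reverse ? (w - 1 - i) : i;
            double pn = src[y * w + x];
            gs = gs * (1.0 - 1.0 / s) + pn;  /* g_s(n) = g_s(n-1)(1-1/s) + p_n */
            dst[y * w + x] = (pn < (gs / s) * (1.0 - t / 100.0)) ? 1 : 0;
        }
    }
}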
The next step is region search and labeling of the image. The main work of this step is to find adjacent regions of black pixels, count the regions, and give each region a sequence number. The image is scanned line by line, assigning a label to each region as it is found, using the "serpentine" scanning method described above. The concrete algorithm is as follows (taking the left-to-right pass as an example):
for (i = leftmost pixel; i < image width; i++) {
    if (p_n < (g_s(n)/s) * (1 - t/100)) {             // current pixel is black
        top  = label of the pixel directly above the current pixel;
        left = label of the pixel to the left of the current pixel;
        if (top != background color && left != background color) {
            label of current pixel = top;
            search the equals list of top;
            if (left is not in the equals list of top)
                link the equals list of left to the equals list of top;
        } else if (top != background color) {
            label of current pixel = top;
        } else if (left != background color) {
            label of current pixel = left;
        } else {
            currentcluster++;
            equals[currentcluster] = currentcluster;
            label of current pixel = currentcluster;  // new label
        }
    }
}
In the right-to-left pass the procedure is analogous, except that the labels of the pixel directly above the current pixel and of the pixel to its right are compared to determine the label of the current point.
A problem that can arise during region search is that two regions carrying different labels in fact belong to the same region. In this case the two regions are recorded in the same equals list and merged; the work of this part is precisely to connect such regions together and assign each merged region a new label.
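The equals lists play the role of the equivalence-merging step in classic two-pass connected-component labeling; a compact union-find sketch of that merging (an illustration, not the literal data structure of this scheme) is:

#define MAX_LABELS 4096

static int equals[MAX_LABELS];              /* equals[i]: parent of label i */

static void labels_init(void)
{
    for (int i = 0; i < MAX_LABELS; i++) equals[i] = i;
}

/* Find the representative label of x, compressing the path as we go. */
static int find_label(int x)
{
    while (equals[x] != x) {
        equals[x] = equals[equals[x]];      /* path compression */
        x = equals[x];
    }
    return x;
}

/* Record that labels a and b belong to the same region. */
static void merge_labels(int a, int b)
{
    int ra = find_label(a), rb = find_label(b);
    if (ra != rb) equals[rb] = ra;
}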
To determine the position of the coding marker in the image, the guide bars and the corresponding cornerstones must be found first. The method of finding candidate guide bars is as follows: if the axis ratio of a region falls within a range around the given ideal value, the region is taken as a candidate guide bar; then the positions of the second guide bar and the three cornerstones are determined from the size and orientation of the region; if the corresponding positions in the current image indeed exhibit these features, a real guide bar has been found.
304: compare the image-processed marker, via a feature matching algorithm, with the legal markers in the marker file to find a legal two-dimensional visual code marker;
Specifically, every marker satisfying the size and shape requirements is compared, via the feature matching algorithm, with the legal markers in the marker file. If the error is within the allowed range, a legal marker is judged to have been found, and the position of this marker in the image coordinate system is recorded; the image coordinate system is a two-dimensional system whose X axis is the horizontal direction of the image and whose Y axis is the vertical direction. Using the fact that two-dimensional visual codes are all square, the positions of the marker's vertices are recorded in order, together with the image and pixel value information inside the marker, and the code value corresponding to this marker is computed. If no legal marker is found, the program exits directly.
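As an illustration of how a code value can be read out once a legal marker has been found, the sketch below samples a grid of cells inside the marker through the homography H derived in the following steps and packs the binarized cells into a number; the grid size, bit order, and helper names are assumptions, since the coding rule itself is left open by this scheme:

/* Sample an n x n cell grid inside the marker and pack it into a code value.
   H maps marker-plane coordinates (x, y, 1) to image pixels as in Eq 1-1;
   binary_at() is an assumed lookup into the binarized image T(n). */
typedef struct { double m[3][3]; } Homography;
extern int binary_at(double u, double v);

unsigned long long read_code(const Homography *H, int n)
{
    unsigned long long code = 0;
    for (int row = 0; row < n; row++) {
        for (int col = 0; col < n; col++) {
            double x = (col + 0.5) / n;     /* cell center in marker coords */
            double y = (row + 0.5) / n;
            double w = H->m[2][0]*x + H->m[2][1]*y + H->m[2][2];
            double u = (H->m[0][0]*x + H->m[0][1]*y + H->m[0][2]) / w;
            double v = (H->m[1][0]*x + H->m[1][1]*y + H->m[1][2]) / w;
            code = (code << 1) | (unsigned long long)binary_at(u, v);
        }
    }
    return code;                            /* e.g. n = 8 gives a 64-bit value */
}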
305~309: The video tracking module obtains the corresponding marker vertex coordinates in the marker coordinate system and the image coordinate system and computes the transformation matrix from the marker coordinate system to the camera coordinate system. The imaging of the marker on the image plane is equivalent to transforming points from the three-dimensional marker coordinate system into the camera coordinate system and then projecting them onto the image plane to form the marker's two-dimensional image. The video tracking module samples the pattern of the code portion of the two-dimensional visual code to derive the code value, uses the recognized code value to retrieve from the virtual-real correspondence database the three-dimensional model corresponding to this code, and obtains the model's vertex array; finally it multiplies the vertex coordinates in the vertex array by the transformation matrix to obtain the coordinates of the three-dimensional graphic in the camera coordinate system. The virtual graphics system module draws the three-dimensional graphic corresponding to this coordinate array, stores it in the frame buffer, and generates the virtual graphics frame. The virtual-real synthesis module composites the obtained virtual graphics frame with the video frame, captured by the video frame capture module, of the two-dimensional visual code marker in the real environment into a synthesized video frame of the augmented real environment.
Once the coordinates of the corresponding marker vertices in the marker coordinate system and the image coordinate system have been obtained, the product of the projection matrix and T can be solved from the simultaneous equations; and since the projection matrix depends entirely on the camera's intrinsic parameters, the transformation matrix T can be computed.
For marker-based augmented reality, a marker in fact defines a marker coordinate system, and the three-dimensional figures on the marker plane are all built in this marker coordinate system. The transformation from the marker coordinate system to the camera coordinate system is essentially the same as the transformation process mentioned above. Assuming the marker coordinate system is equivalent to a coordinate system with Z = 0, we have:
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{pmatrix}
\begin{pmatrix} x_w \\ y_w \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} p_{11} & p_{12} & p_{14} \\ p_{21} & p_{22} & p_{24} \\ p_{31} & p_{32} & p_{34} \end{pmatrix}
\begin{pmatrix} x_w \\ y_w \\ 1 \end{pmatrix}
= H \begin{pmatrix} x_w \\ y_w \\ 1 \end{pmatrix}
\tag{Eq 1-1}
\]
where p_{ij} denotes the element in row i, column j of the perspective projection matrix mentioned above, and H is a 3x3 projection matrix transforming two-dimensional marker coordinates into two-dimensional image coordinates, called a homography. After normalization, Eq 1-1 can be rewritten as:
\[
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\]
where \(\lambda\) is a scale factor.
With such a matrix, the mapping can be written out as equations:
\[
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}
\tag{Eq 1-2}
\]
where h_{ij} denotes the element in row i, column j of H. Converting these into linear form gives:
\[
x'(h_{31}x + h_{32}y + h_{33}) = h_{11}x + h_{12}y + h_{13}
\]
\[
y'(h_{31}x + h_{32}y + h_{33}) = h_{21}x + h_{22}y + h_{23}
\]
In matrix form:
\[
\begin{pmatrix}
x & y & 1 & 0 & 0 & 0 & -x'x & -x'y & -x' \\
0 & 0 & 0 & x & y & 1 & -y'x & -y'y & -y'
\end{pmatrix} h = 0
\tag{Eq 1-3}
\]
Here h = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33})^T is a 9x1 vector whose elements correspond one-to-one to the elements of H. After normalization, setting h_{33} = 1 leaves 8 unknowns in h, so 8 equations are needed to solve it. This requires 4 vertices (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4) and their corresponding points (x_1', y_1'), (x_2', y_2'), (x_3', y_3'), (x_4', y_4') in the image plane, where the coordinates (x_1, y_1), ..., (x_4, y_4) in the marker coordinate system are known. With the complete information of these 4 vertices, the following matrix equation is obtained:
\[
A h = \begin{pmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1'x_1 & -x_1'y_1 & -x_1' \\
0 & 0 & 0 & x_1 & y_1 & 1 & -y_1'x_1 & -y_1'y_1 & -y_1' \\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2'x_2 & -x_2'y_2 & -x_2' \\
0 & 0 & 0 & x_2 & y_2 & 1 & -y_2'x_2 & -y_2'y_2 & -y_2' \\
x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3'x_3 & -x_3'y_3 & -x_3' \\
0 & 0 & 0 & x_3 & y_3 & 1 & -y_3'x_3 & -y_3'y_3 & -y_3' \\
x_4 & y_4 & 1 & 0 & 0 & 0 & -x_4'x_4 & -x_4'y_4 & -x_4' \\
0 & 0 & 0 & x_4 & y_4 & 1 & -y_4'x_4 & -y_4'y_4 & -y_4'
\end{pmatrix} h = 0
\tag{Eq 1-4}
\]
To solve for h, the system is rearranged as follows (h now collecting the 8 unknowns h_{11}, ..., h_{32}):
\[
C h = \begin{pmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1'x_1 & -x_1'y_1 \\
0 & 0 & 0 & x_1 & y_1 & 1 & -y_1'x_1 & -y_1'y_1 \\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2'x_2 & -x_2'y_2 \\
0 & 0 & 0 & x_2 & y_2 & 1 & -y_2'x_2 & -y_2'y_2 \\
x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3'x_3 & -x_3'y_3 \\
0 & 0 & 0 & x_3 & y_3 & 1 & -y_3'x_3 & -y_3'y_3 \\
x_4 & y_4 & 1 & 0 & 0 & 0 & -x_4'x_4 & -x_4'y_4 \\
0 & 0 & 0 & x_4 & y_4 & 1 & -y_4'x_4 & -y_4'y_4
\end{pmatrix} h
= \begin{pmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ x_3' \\ y_3' \\ x_4' \\ y_4' \end{pmatrix} = B
\tag{Eq 1-5}
\]
Then h = C^{-1}B, so the entire solution for h reduces to inverting C and one matrix multiplication. Once h is obtained, the coordinates of points on the corresponding image plane can be obtained from Eq 1-2.
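A sketch of this solve in plain C, building C and B from the four vertex correspondences of Eq 1-5 and using Gauss-Jordan elimination in place of an explicit inverse (any linear solver would do), is:

#include <math.h>

/* Solve for the homography h (with h33 = 1) from 4 point correspondences.
   mx, my: marker coordinates; ix, iy: image coordinates.
   Returns 0 if the point configuration is degenerate. */
int solve_homography(const double mx[4], const double my[4],
                     const double ix[4], const double iy[4], double h[9])
{
    double M[8][9];                              /* augmented matrix [C | B] */
    for (int k = 0; k < 4; k++) {
        double r0[9] = { mx[k], my[k], 1, 0, 0, 0,
                         -ix[k]*mx[k], -ix[k]*my[k], ix[k] };
        double r1[9] = { 0, 0, 0, mx[k], my[k], 1,
                         -iy[k]*mx[k], -iy[k]*my[k], iy[k] };
        for (int j = 0; j < 9; j++) { M[2*k][j] = r0[j]; M[2*k+1][j] = r1[j]; }
    }
    for (int col = 0; col < 8; col++) {          /* Gauss-Jordan with pivoting */
        int piv = col;
        for (int r = col + 1; r < 8; r++)
            if (fabs(M[r][col]) > fabs(M[piv][col])) piv = r;
        if (fabs(M[piv][col]) < 1e-12) return 0;
        for (int j = 0; j < 9; j++) {
            double tmp = M[col][j]; M[col][j] = M[piv][j]; M[piv][j] = tmp;
        }
        for (int r = 0; r < 8; r++) {
            if (r == col) continue;
            double f = M[r][col] / M[col][col];
            for (int j = col; j < 9; j++) M[r][j] -= f * M[col][j];
        }
    }
    for (int i = 0; i < 8; i++) h[i] = M[i][8] / M[i][i];
    h[8] = 1.0;                                  /* normalization h33 = 1 */
    return 1;
}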
So far, the H obtained can only perform 2D-2D transformations, because Eq 1-1 imposed a restriction: the Z component in the marker coordinate system is assumed to be 0 throughout, which for the moment prevents defining three-dimensional objects in the marker coordinate system. To realize the 3D-2D transformation from the marker coordinate system to the image plane coordinate system, the values of p_{13}, p_{23}, and p_{33} in Eq 1-1 must be obtained.
As can be seen from Eq 1-1, H is a reduced form of the perspective projection matrix for the case Z = 0, so H can equally be written in the following form:
\[
H = \begin{pmatrix} f_u r_{11} & f_u r_{12} & f_u t_1 \\ f_v r_{21} & f_v r_{22} & f_v t_2 \\ r_{31} & r_{32} & t_3 \end{pmatrix}
\tag{Eq 1-6}
\]
Since R is a rotation matrix, its orthogonality yields:
\[
r_{11}^2 + r_{21}^2 + r_{31}^2 = 1 \tag{Eq 1-7}
\]
\[
r_{12}^2 + r_{22}^2 + r_{32}^2 = 1 \tag{Eq 1-8}
\]
\[
r_{11}r_{12} + r_{21}r_{22} + r_{31}r_{32} = 0 \tag{Eq 1-9}
\]
Substituting Eq 1-6 into Eq 1-9 gives:
\[
h_{11}h_{12}/f_u^2 + h_{21}h_{22}/f_v^2 + h_{31}h_{32} = 0 \tag{Eq 1-10}
\]
Similarly, substituting Eq 1-6 into Eq 1-7 and Eq 1-8 gives:
\[
\lambda^2 (h_{11}^2/f_u^2 + h_{21}^2/f_v^2 + h_{31}^2) = 1 \tag{Eq 1-11}
\]
\[
\lambda^2 (h_{12}^2/f_u^2 + h_{22}^2/f_v^2 + h_{32}^2) = 1 \tag{Eq 1-12}
\]
Eliminating \(\lambda^2\) between Eq 1-11 and Eq 1-12 gives:
\[
(h_{11}^2 - h_{12}^2)/f_u^2 + (h_{21}^2 - h_{22}^2)/f_v^2 + h_{31}^2 - h_{32}^2 = 0 \tag{Eq 1-13}
\]
Combining Eq 1-10 and Eq 1-13 then yields the formulas for f_u and f_v:
\[
f_u = \sqrt{\frac{h_{11}h_{12}(h_{21}^2 - h_{22}^2) - h_{21}h_{22}(h_{11}^2 - h_{12}^2)}
{-h_{31}h_{32}(h_{21}^2 - h_{22}^2) + h_{21}h_{22}(h_{31}^2 - h_{32}^2)}}
\tag{Eq 1-14}
\]
\[
f_v = \sqrt{\frac{h_{11}h_{12}(h_{21}^2 - h_{22}^2) - h_{21}h_{22}(h_{11}^2 - h_{12}^2)}
{-h_{31}h_{32}(h_{11}^2 - h_{12}^2) + h_{11}h_{12}(h_{31}^2 - h_{32}^2)}}
\tag{Eq 1-15}
\]
Once the camera intrinsic parameters f_u and f_v have been obtained, Eq 1-11 or Eq 1-12 gives the value of \(\lambda\):
\[
\lambda = \frac{1}{\sqrt{h_{11}^2/f_u^2 + h_{21}^2/f_v^2 + h_{31}^2}}
\]
Then, again from Eq 1-1 and Eq 1-6, all the camera extrinsic parameters can be computed:
\[
r_{11} = \lambda h_{11}/f_u, \quad r_{12} = \lambda h_{12}/f_u, \quad r_{13} = r_{21}r_{32} - r_{31}r_{22}, \quad t_1 = \lambda h_{13}/f_u
\]
\[
r_{21} = \lambda h_{21}/f_v, \quad r_{22} = \lambda h_{22}/f_v, \quad r_{23} = r_{31}r_{12} - r_{11}r_{32}, \quad t_2 = \lambda h_{23}/f_v
\]
\[
r_{31} = \lambda h_{31}, \quad r_{32} = \lambda h_{32}, \quad r_{33} = r_{11}r_{22} - r_{21}r_{12}, \quad t_3 = \lambda h_{33}
\]
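A direct transcription of these formulas into C (a sketch; h is assumed row-major, with f_u and f_v already known from Eq 1-14 and Eq 1-15) is:

#include <math.h>

/* Recover the rotation R and translation t from the homography h
   and the intrinsics fu, fv, following Eq 1-11 and the formulas above. */
void decompose_homography(const double h[3][3], double fu, double fv,
                          double R[3][3], double t[3])
{
    double lambda = 1.0 / sqrt(h[0][0]*h[0][0]/(fu*fu) +
                               h[1][0]*h[1][0]/(fv*fv) +
                               h[2][0]*h[2][0]);

    R[0][0] = lambda * h[0][0] / fu;  R[0][1] = lambda * h[0][1] / fu;
    R[1][0] = lambda * h[1][0] / fv;  R[1][1] = lambda * h[1][1] / fv;
    R[2][0] = lambda * h[2][0];       R[2][1] = lambda * h[2][1];

    /* Third rotation column: cross product of the first two columns. */
    R[0][2] = R[1][0]*R[2][1] - R[2][0]*R[1][1];
    R[1][2] = R[2][0]*R[0][1] - R[0][0]*R[2][1];
    R[2][2] = R[0][0]*R[1][1] - R[1][0]*R[0][1];

    t[0] = lambda * h[0][2] / fu;
    t[1] = lambda * h[1][2] / fv;
    t[2] = lambda * h[2][2];
}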
Having obtained all the camera intrinsic and extrinsic parameters, the 3D-2D transformation from the marker coordinate system to the image plane is realized by:
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{pmatrix}
\begin{pmatrix} x_w \\ y_w \\ z_w \\ 1 \end{pmatrix}
\]
310: the video display module obtains the synthesized video frame of the augmented reality environment and outputs it to the display screen.
In this scheme, OpenGL ES is used as the three-dimensional graphics API (application programming interface). OpenGL ES is a compact subset of OpenGL for advanced embedded graphics processing; it can support 3D game functionality and runs on a wide range of platforms, including those with only fixed-point processors. To display the three-dimensional model in front of the background, all drawing uses OpenGL ES: the background is used as a texture and attached to a rectangle of the same size as the screen. The code that sets the relevant arrays and performs the drawing is:
glVertexPointer(3, GL_SHORT, 0, vertices);       // set the vertex coordinate array
glTexCoordPointer(2, GL_BYTE, 0, texCoords);     // set the texture coordinate array
glDrawElements(GL_TRIANGLES, 2*3, GL_UNSIGNED_BYTE, triangles);  // draw the background rectangle (two triangles)
The position of the three-dimensional model in the image coordinate system was obtained above; to display the model it suffices to place the obtained position coordinates into the corresponding OpenGL ES vertex array. The code that displays the three-dimensional model is:
glVertexPointer(3, GL_FLOAT, 0, iVertexPointer);  // set the 3D object's vertex array
glDrawElements(GL_TRIANGLES, faceNum*3, GL_UNSIGNED_SHORT, iIndex);  // draw the 3D object
The present invention also provides a system for implementing three-dimensional augmented reality, comprising: a video frame capture module, a video tracking module, a virtual graphics system module, a virtual-real synthesis module, and a video display module.
The video frame capture module captures video frames of the two-dimensional visual code marker in the real environment and sends them to the video tracking module.
The video tracking module operates on the obtained video frames to derive the relative pose of the environment and the camera.
The virtual graphics system module draws the corresponding three-dimensional graphic according to the obtained coordinate array of the graphic in the camera coordinate system.
The virtual-real synthesis module composites the obtained virtual graphics frame with the video frame of the two-dimensional visual code marker in the real environment into the synthesized video frame of the augmented real environment.
Referring to Fig. 4 and Fig. 5: Fig. 4 is a schematic diagram of the functional modules of the system for implementing three-dimensional augmented reality, and Fig. 5 is the flow chart of the augmented reality algorithm based on two-dimensional visual code markers. The present invention is described below in conjunction with Fig. 4 and Fig. 5.
401: the video frame capture module captures a video frame of the two-dimensional visual code marker in the real environment and sends it to the video tracking module;
Before the video frame capture module captures the video frame of the two-dimensional visual code marker in the real environment, a legal two-dimensional visual code marker must be found. Referring to Fig. 5, the concrete steps 501-504 are the same as steps 301-304 described above: initialization of the 3D graphics system environment, the camera intrinsic parameters, and the marker file; grabbing a frame and drawing it as the background texture; image processing of the grabbed frame (grayscale conversion, adaptive-threshold binarization with the serpentine scan of Fig. 6, region search and labeling, and contour extraction, including location of the guide bars and cornerstones); and comparison of the image-processed marker, via the feature matching algorithm, with the legal markers in the marker file, recording the marker's vertex positions and computing its code value.
402: the video tracking module obtains the video frame of the real environment from the video frame capture module and operates on it;
Referring to 505-506 in Fig. 5, the concrete steps are as follows: the video tracking module obtains the corresponding marker vertex coordinates in the marker coordinate system and the image coordinate system and computes the transformation matrix from the marker coordinate system to the camera coordinate system; it samples the pattern of the code portion of the two-dimensional visual code to derive the code value, retrieves from the virtual-real correspondence database the three-dimensional model corresponding to this code, obtains the model's vertex array, and finally multiplies the vertex coordinates in the array by the transformation matrix to obtain the coordinates of the three-dimensional graphic in the camera coordinate system. The computation of the transformation matrix, namely the homography of Eq 1-1 through Eq 1-5 and the recovery of the camera intrinsic and extrinsic parameters in Eq 1-6 through Eq 1-15, is exactly as described in steps 305-309 above.
403: the virtual graphics system module draws the three-dimensional graphic corresponding to this coordinate array, stores the graphic in the frame buffer, and generates the virtual graphics frame;
Referring to 507-508 in Fig. 5: the virtual graphics system module draws the three-dimensional graphic corresponding to code C according to the transformation matrix T, stores this graphic in the frame buffer, and generates the virtual graphics frame.
404: the virtual-real synthesis module composites the obtained virtual graphics frame with the video frame, captured by the video frame capture module, of the two-dimensional visual code marker in the real environment into a synthesized video frame of the augmented real environment;
405: the video display module obtains the synthesized video frame of the augmented reality environment and outputs it to the display.
As in step 310 above, OpenGL ES serves as the three-dimensional graphics API in this scheme: the background is drawn as a texture attached to a screen-sized rectangle, and the three-dimensional model is displayed by placing the obtained position coordinates into the corresponding OpenGL ES vertex array, using the same glVertexPointer, glTexCoordPointer, and glDrawElements calls listed in step 310.
The method provided by the present invention for attaching two-dimensional visual coding to three-dimensional augmented reality has been described in detail above. The coding rule of the two-dimensional visual code in the above scheme is not unique: by changing the number of code bits, the shapes of the images in the code, or the recognition features and feature counts of the coded image, a numeric code with a new coding rule is obtained. Equally, adopting another three-dimensional graphics application programming interface, adopting other virtual graphics generation and drawing methods, or changing the front-to-back order in which the virtual graphics frame and the video frame are composited can all achieve the object of the invention. Specific examples have been used herein to explain the principle and embodiments of the present invention; the above description of the embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea and algorithms of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this description should not be construed as limiting the present invention.

Claims (8)

1. A method for implementing three-dimensional augmented reality, characterized in that it specifically comprises the steps of:
A: obtaining a video frame of a two-dimensional visual code marker;
B: performing augmented reality processing on said video frame to obtain a virtual graphics frame;
C: compositing said virtual graphics frame with the video frame to obtain a synthesized video frame.
2. The method for implementing three-dimensional augmented reality according to claim 1, characterized in that the two-dimensional visual code marker in said step A contains a pattern, said pattern being arranged according to a coding rule.
3. The method for implementing three-dimensional augmented reality according to claim 1, characterized in that the following steps precede said step A:
A1: initializing the 3D graphics system environment, the camera intrinsic parameters, and the marker file;
A2: grabbing a video frame image of the marker and scaling the image so that it can serve as a texture map in the graphics system;
A3: drawing the texture map as the background and storing it in the frame buffer provided by the graphics system;
A4: performing image processing on the video frame image grabbed in step A2;
A5: comparing the image-processed marker of said video frame image with the markers specified in the marker file and judging whether the error is below a set value; if so, judging that a marker has been found;
A6: recording the vertex coordinates of this marker in the marker coordinate system and the image coordinate system, and computing the code value corresponding to this marker.
4. The method for implementing three-dimensional augmented reality according to claim 3, characterized in that said step A4 specifically comprises the steps of:
A41: performing grayscale conversion and binarization on the grabbed video frame;
A42: performing image labeling on the grabbed video frame;
A43: performing contour extraction on the grabbed video frame.
5. The method for implementing three-dimensional augmented reality according to claim 1 or 3, characterized in that said step B specifically comprises the steps of:
B1: obtaining the corresponding marker vertex coordinates in the marker coordinate system and the image coordinate system, and computing the transformation matrix from the marker coordinate system to the camera coordinate system;
B2: sampling the coding pattern inside the two-dimensional visual code to derive the code value.
6. The method for implementing three-dimensional augmented reality according to claim 5, characterized in that the following step follows said step B2:
B3: retrieving the three-dimensional model corresponding to this code and obtaining the vertex array of said three-dimensional model.
7. The method for implementing three-dimensional augmented reality according to claim 6, characterized in that the following steps follow said step B3:
B4: obtaining the product of the vertices in the vertex array and the transformation matrix, said product being the coordinate array of the three-dimensional graphic in the camera coordinate system;
B5: drawing the three-dimensional graphic corresponding to this coordinate array and storing it in the frame buffer to generate the virtual graphics frame.
8. A system for implementing three-dimensional augmented reality, characterized in that the system comprises: a video frame capture module, a video tracking module, a virtual graphics system module, and a virtual-real synthesis module;
the video frame capture module is configured to capture video frames of the two-dimensional visual code marker and send the video frames to the video tracking module;
the video tracking module is configured to process the captured marker video frames and obtain, from the result, the transformation matrix from the marker coordinate system to the camera coordinate system; it is further configured to sample the coding pattern of the two-dimensional visual code to obtain the marker code value, retrieve the three-dimensional model corresponding to the code value, and obtain the coordinate array of the three-dimensional graphic in the camera coordinate system from the product of the vertex array of the three-dimensional model and the transformation matrix;
the virtual graphics system module is configured to draw the corresponding three-dimensional graphic according to the coordinate array of the three-dimensional graphic in the camera coordinate system, store the three-dimensional graphic in the frame buffer, and generate the virtual graphic frame;
the virtual-real synthesis module is configured to composite the virtual graphic frame with the video frame of the two-dimensional visual code marker to obtain a composited video frame.
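Read as software architecture, claim 8 describes a four-stage per-frame pipeline. The wiring below is a sketch under assumed interfaces (`grab`, `track`, `draw`, `blend`); none of these method names appear in the patent:

```python
class AugmentedRealityPipeline:
    """Minimal composition of the four modules named in claim 8."""

    def __init__(self, capture, tracker, renderer, compositor):
        self.capture = capture        # video frame capture module
        self.tracker = tracker        # video tracking module
        self.renderer = renderer      # virtual graphics system module
        self.compositor = compositor  # virtual-real synthesis module

    def step(self):
        frame = self.capture.grab()
        pose, code_value = self.tracker.track(frame)
        if pose is None:
            return frame              # no marker found: show the raw frame
        virtual = self.renderer.draw(code_value, pose)
        return self.compositor.blend(frame, virtual)
```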
CNB200610101229XA 2006-07-07 2006-07-07 Method and system for implementing three-dimensional enhanced reality Active CN100470452C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200610101229XA CN100470452C (en) 2006-07-07 2006-07-07 Method and system for implementing three-dimensional enhanced reality

Publications (2)

Publication Number Publication Date
CN101101505A CN101101505A (en) 2008-01-09
CN100470452C true CN100470452C (en) 2009-03-18

Family

ID=39035809

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200610101229XA Active CN100470452C (en) 2006-07-07 2006-07-07 Method and system for implementing three-dimensional enhanced reality

Country Status (1)

Country Link
CN (1) CN100470452C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402790A (en) * 2010-08-20 2012-04-04 株式会社泛泰 Terminal device and method for augmented reality
CN102479251A (en) * 2010-11-29 2012-05-30 株式会社泛泰 Mobile terminal and method for providing augmented reality using augmented reality database

Families Citing this family (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090189894A1 (en) * 2008-01-27 2009-07-30 Petrov Julian Methods and systems for analyzing a remoting system to determine where to render three dimensional data
CN101702233B (en) * 2009-10-16 2011-10-05 电子科技大学 Three-dimension locating method based on three-point collineation marker in video frame
US8416263B2 (en) * 2010-03-08 2013-04-09 Empire Technology Development, Llc Alignment of objects in augmented reality
CN101833896B (en) * 2010-04-23 2011-10-19 西安电子科技大学 Geographic information guide method and system based on augment reality
CN101923791A (en) * 2010-05-31 2010-12-22 华中师范大学 Method for learning Chinese by combining reality enhancing technique and plane reading material
KR101383982B1 (en) * 2010-08-06 2014-04-10 주식회사 비즈모델라인 Apparatus and method for augmented reality
KR101330811B1 (en) * 2010-08-25 2013-11-18 주식회사 팬택 Apparatus and Method for augmented reality using instant marker
CN101976464B (en) * 2010-11-03 2013-07-31 北京航空航天大学 Multi-plane dynamic augmented reality registration method based on homography matrix
US9264515B2 (en) * 2010-12-22 2016-02-16 Intel Corporation Techniques for mobile augmented reality applications
KR101308184B1 (en) * 2011-01-13 2013-09-12 주식회사 팬택 Augmented reality apparatus and method of windows form
US20120201472A1 (en) * 2011-02-08 2012-08-09 Autonomy Corporation Ltd System for the tagging and augmentation of geographically-specific locations using a visual data stream
WO2012115657A1 (en) * 2011-02-25 2012-08-30 Empire Technology Development Llc Augmented reality presentations
US8738754B2 (en) 2011-04-07 2014-05-27 International Business Machines Corporation Systems and methods for managing computing systems utilizing augmented reality
US8913086B2 (en) 2011-04-07 2014-12-16 International Business Machines Corporation Systems and methods for managing errors utilizing augmented reality
WO2011137764A2 (en) * 2011-05-11 2011-11-10 华为终端有限公司 Method and system for implementing augmented reality applications
EP2724262A4 (en) 2011-06-21 2015-04-01 Ericsson Telefon Ab L M Caching support for visual search and augmented reality in mobile networks
CN102915234A (en) * 2011-08-04 2013-02-06 中国移动通信集团公司 Method and device for realizing program interface in application program
US8581738B2 (en) * 2011-08-25 2013-11-12 Sartorius Stedim Biotech Gmbh Assembling method, monitoring method, and augmented reality system used for indicating correct connection of parts
CN102958114B (en) * 2011-08-27 2017-10-03 中兴通讯股份有限公司 The method for accessing augmented reality user's context
CN102509348B (en) * 2011-09-26 2014-06-25 北京航空航天大学 Method for showing actual object in shared enhanced actual scene in multi-azimuth way
CN102346660A (en) * 2011-10-13 2012-02-08 苏州梦想人软件科技有限公司 System and method for mixing real and virtual objects
JP6121647B2 (en) * 2011-11-11 2017-04-26 ソニー株式会社 Information processing apparatus, information processing method, and program
CN102739872A (en) * 2012-07-13 2012-10-17 苏州梦想人软件科技有限公司 Mobile terminal, and augmented reality method used for mobile terminal
CN102908772B (en) * 2012-10-16 2015-02-18 东南大学 Upper limb rehabilitation training system by using augmented reality technology
US9508146B2 (en) * 2012-10-31 2016-11-29 The Boeing Company Automated frame of reference calibration for augmented reality
CN102945637A (en) * 2012-11-29 2013-02-27 河海大学 Augmented reality based embedded teaching model and method
EP2929485B1 (en) 2012-12-07 2019-10-09 Hewlett-Packard Development Company, L.P. Matching a feature of captured visual data
CN103873840B (en) 2012-12-12 2018-08-31 联想(北京)有限公司 Display methods and display equipment
KR101932537B1 (en) * 2013-01-08 2019-03-15 한화테크윈 주식회사 Method and Apparatus for displaying the video on 3D map
TWI506563B (en) * 2013-01-28 2015-11-01 Tencent Tech Shenzhen Co Ltd A method and apparatus for enhancing reality of two - dimensional code
US20140210857A1 (en) * 2013-01-28 2014-07-31 Tencent Technology (Shenzhen) Company Limited Realization method and device for two-dimensional code augmented reality
CN103198476B (en) * 2013-03-12 2015-07-01 西北工业大学 Image detection method of thick line type cross ring mark
CN104062758B (en) * 2013-03-19 2017-02-08 联想(北京)有限公司 Image display method and display equipment
CN104102678B (en) * 2013-04-15 2018-06-05 腾讯科技(深圳)有限公司 The implementation method and realization device of augmented reality
CN103544724A (en) * 2013-05-27 2014-01-29 华夏动漫集团有限公司 System and method for realizing fictional cartoon character on mobile intelligent terminal by augmented reality and card recognition technology
WO2015062164A1 (en) * 2013-10-31 2015-05-07 The Chinese University Of Hong Kong Method for optimizing localization of augmented reality-based location system
CN103677211B (en) * 2013-12-09 2016-07-06 华为软件技术有限公司 Realize the device and method of augmented reality application
CN104157007B (en) * 2014-03-03 2018-02-27 腾讯科技(北京)有限公司 The method and device of Video processing
CN104951260B (en) * 2014-03-31 2017-10-31 云南北方奥雷德光电科技股份有限公司 The implementation method of the mixed interface based on Qt under embedded Linux platform
CN104166966B (en) * 2014-08-14 2017-02-01 马明飞 Achieving method for image enhancement reality application
CN104634323B (en) * 2015-02-15 2016-10-12 四川川大智胜软件股份有限公司 A kind of multistage shooting tri-dimensional photographing system and method
RU2606874C1 (en) * 2015-12-02 2017-01-10 Виталий Витальевич Аверьянов Method of augmented reality environment generating device controlling
CN105635712B (en) * 2015-12-30 2018-01-19 视辰信息科技(上海)有限公司 Video real time recording method and recording arrangement based on augmented reality
CN107564089B (en) * 2017-08-10 2022-03-01 腾讯科技(深圳)有限公司 Three-dimensional image processing method, device, storage medium and computer equipment
CN107590453B (en) * 2017-09-04 2019-01-11 腾讯科技(深圳)有限公司 Processing method, device and equipment, the computer storage medium of augmented reality scene
CN107885331A (en) * 2017-11-09 2018-04-06 北京易讯理想科技有限公司 A kind of exchange method that Audio conversion is realized based on augmented reality
CN108921889A (en) * 2018-05-16 2018-11-30 天津大学 A kind of indoor 3-D positioning method based on Augmented Reality application
CN108805903B (en) * 2018-05-24 2021-07-27 讯飞幻境(北京)科技有限公司 AR engine-based multi-mark point identification method and device
CN110555879B (en) * 2018-05-31 2023-09-08 京东方科技集团股份有限公司 Space positioning method, device, system and computer readable medium thereof
CN110120101B (en) * 2019-04-30 2021-04-02 中国科学院自动化研究所 Cylinder augmented reality method, system and device based on three-dimensional vision
CN112529985A (en) * 2019-09-17 2021-03-19 北京字节跳动网络技术有限公司 Image processing method and device
CN113673283A (en) * 2020-05-14 2021-11-19 惟亚(上海)数字科技有限公司 Smooth tracking method based on augmented reality

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System. Hirokazu Kato, Mark Billinghurst. Proceedings of the 2nd International Workshop on Augmented Reality, 1999. *

Also Published As

Publication number Publication date
CN101101505A (en) 2008-01-09

Similar Documents

Publication Publication Date Title
CN100470452C (en) Method and system for implementing three-dimensional enhanced reality
Mohring et al. Video see-through AR on consumer cell-phones
RU2215326C2 (en) Image-based hierarchic presentation of motionless and animated three-dimensional object, method and device for using this presentation to visualize the object
KR101469398B1 (en) Text-based 3d augmented reality
RU2216781C2 (en) Image-based method for presenting and visualizing three-dimensional object and method for presenting and visualizing animated object
US6914599B1 (en) Image processing apparatus
CN104537705B (en) Mobile platform three dimensional biological molecular display system and method based on augmented reality
CN112954292B (en) Digital museum navigation system and method based on augmented reality
CN101702233B (en) Three-dimension locating method based on three-point collineation marker in video frame
US8340433B2 (en) Image processing apparatus, electronic medium, and image processing method
CN107689050B (en) Depth image up-sampling method based on color image edge guide
CN109711246B (en) Dynamic object recognition method, computer device and readable storage medium
CN101520904A (en) Reality augmenting method with real environment estimation and reality augmenting system
CN108305291B (en) Monocular vision positioning and attitude determination method utilizing wall advertisement containing positioning two-dimensional code
KR20150105479A (en) Realization method and device for two-dimensional code augmented reality
US20150138193A1 (en) Method and device for panorama-based inter-viewpoint walkthrough, and machine readable medium
JPH0935061A (en) Image processing method
CN104156998A (en) Implementation method and system based on fusion of virtual image contents and real scene
CN114240981A (en) Mark identification method and device
JP4649559B2 (en) 3D object recognition apparatus, 3D object recognition program, and computer-readable recording medium on which the same is recorded
JP6017343B2 (en) Database generation device, camera posture estimation device, database generation method, camera posture estimation method, and program
GB2571953A (en) Single view tracking of cylindrical objects
CN201374082Y (en) Augmented reality system based on image unique point extraction and random tree classification
Pramod et al. Techniques in Virtual Reality
Möhring et al. Video see-through and optical tracking with consumer cell phones

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant