WO2022080553A1

WO2022080553A1 - Device and method for constructing 3d map for providing augmented reality on basis of pose information and depth information

Info

Publication number: WO2022080553A1
Application number: PCT/KR2020/015392
Authority: WO
Inventors: 장준환; 박우출; 양진욱; 윤상필; 최민수; 이준석; 송수호; 구본재
Original assignee: 한국전자기술연구원
Priority date: 2020-10-15
Filing date: 2020-11-05
Publication date: 2022-04-21
Also published as: KR102472568B1; KR20220050253A

Abstract

A device for configuring a three-dimensional map for providing augmented reality, of the present invention, comprises: a pose acquisition unit which, when two frames including a first frame and a second frame of an image captured while moving location are inputted, acquires pose information from the two input frames; a depth map calculation unit which derives a depth map from the two frames by using a deep learning model; and a 3D map generation unit which generates a 3D map on the basis of the pose information and the depth map.

Description

Apparatus and method for configuring a 3D map for providing augmented reality based on pose information and depth information

The present invention relates to a 3D map construction technology, and more particularly, to an apparatus and a method for configuring a 3D map for providing augmented reality based on pose information and depth map information.

Virtual reality (VR) refers to a specific environment or situation that is created artificially using a computer or the like and resembles reality but is not real, or to the technology itself. Augmented reality (AR) is a field of virtual reality and is a computer graphics technique that synthesizes virtual objects or information into the actual environment so that they appear to exist in the original environment. In other words, augmented reality is a technology that superimposes virtual objects on the real world that users see with their eyes. It is also called mixed reality (MR) because the real world and a virtual world carrying additional information are combined in real time and displayed as a single image. Augmented reality complements the real world with a virtual world: it uses a virtual environment created with computer graphics, but the real environment plays the leading role, and the computer graphics serve to provide additional information needed for the real environment. By overlaying a 3D virtual image on the actual image the user is viewing, the distinction between the real environment and the virtual screen becomes blurred.

Virtual reality technology immerses the user in the virtual environment, making it impossible to see the real environment. However, augmented reality technology, in which the real environment and virtual objects are mixed, allows users to see the real environment, providing better realism and additional information.

An object of the present invention is to provide an apparatus and a method for configuring a 3D map for providing augmented reality based on pose information and depth map information.

An apparatus for constructing a three-dimensional map for providing augmented reality according to a preferred embodiment of the present invention for achieving the above object includes: a pose acquisition unit which, when two frames including a first frame and a second frame of an image captured while moving position are input, obtains pose information from the two input frames; a depth map calculation unit which derives a depth map from the two frames using a deep learning model; and a 3D map generation unit which generates a 3D map based on the pose information and the depth map.

The pose acquisition unit extracts a feature point representing the same object from each of the two frames, and sequentially derives pose information and a pose matrix based on the coordinate change of the extracted feature point.

The depth calculation unit derives a transformation matrix using a known camera matrix and the pose matrix, generates a simulated second frame that simulates the second frame from the first frame using the transformation matrix, and derives, through the deep learning model, the depth map according to the coordinate difference between the pixels of the second frame and the pixels of the simulated second frame.

The depth calculation unit derives a transformation matrix using a known camera matrix and a pose matrix derived from a first frame for learning and a second frame for learning, and uses the transformation matrix to generate, from the first frame for learning, a simulated second frame for learning that simulates the second frame for learning. It then trains a prototype of the model on the correlation between depth and the coordinate difference between pixels of the second frame for learning and pixels of the simulated second frame for learning that represent the same object in the real world, thereby generating a deep learning model that derives a depth map according to the coordinate difference between the pixels of the second frame for learning and the pixels of the simulated second frame for learning.

A method for constructing a three-dimensional map for providing augmented reality according to a preferred embodiment of the present invention for achieving the above object includes: obtaining, by a pose acquisition unit, pose information from two input frames when the two frames, including a first frame and a second frame of an image captured while moving position, are input; deriving, by a depth map calculation unit, a depth map from the two frames using a deep learning model; and generating, by a 3D map generation unit, a 3D map based on the pose information and the depth map.

The step of obtaining the pose information from the two frames includes: extracting, by the pose acquisition unit, a feature point representing the same object from each of the two frames; and sequentially deriving, by the pose acquisition unit, pose information and a pose matrix based on the coordinate change of the extracted feature point.

The step of deriving the depth map from the two frames includes: deriving, by the depth calculation unit, a transformation matrix using a known camera matrix and the pose matrix; generating, by the depth calculation unit, a simulated second frame that simulates the second frame from the first frame of the two frames using the transformation matrix; and deriving, by the depth calculation unit, a depth map through the deep learning model according to the coordinate difference between the pixels of the second frame and the pixels of the simulated second frame.

Before the step of obtaining the pose information from the two frames, the method further includes: deriving, by the depth calculation unit, a transformation matrix using a known camera matrix and a pose matrix derived from a first frame for learning and a second frame for learning; generating, by the depth calculation unit, a simulated second frame for learning that simulates the second frame for learning from the first frame for learning using the transformation matrix; and generating, by the depth calculation unit, a deep learning model that derives a depth map according to the coordinate difference between pixels of the second frame for learning and pixels of the simulated second frame for learning, by training a prototype of the model on the correlation between depth and the coordinate difference between the pixels of the two frames that represent the same object in the real world.

According to the present invention, a 3D map for providing augmented reality may be configured based on pose information and depth map information. Accordingly, according to the present invention, a virtual object can be registered with an image captured using a 3D map. Since the 3D map of the present invention provides precise 3D coordinates, precise registration is possible when registering a virtual object to an image. Accordingly, it is possible to provide augmented reality with higher realism.

FIG. 1 is a diagram for explaining the configuration of an apparatus for configuring a 3D map for providing augmented reality based on pose information and depth information according to an embodiment of the present invention.

FIG. 2 is a diagram for explaining a detailed configuration of a control unit according to an embodiment of the present invention.

FIG. 3 is a diagram for explaining a method of deriving a pose matrix according to an embodiment of the present invention.

FIG. 4 is a diagram for explaining a method of deriving a transformation matrix using a pose matrix and a camera matrix according to an embodiment of the present invention.

FIG. 5 is a diagram for explaining a method of generating a simulated frame using a transformation matrix according to an embodiment of the present invention.

FIG. 6 is a diagram for explaining a method for learning a correlation between pixels of two frames according to an embodiment of the present invention.

FIG. 7 is a flowchart illustrating a method for configuring a 3D map for providing augmented reality based on pose information and depth information according to an embodiment of the present invention.

Prior to the detailed description of the present invention, the terms or words used in the present specification and claims described below should not be construed as being limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that inventors may appropriately define the concepts of terms in order to explain their own invention in the best way. Therefore, the embodiments described in this specification and the configurations shown in the drawings are only the most preferred embodiments of the present invention and do not represent all of its technical spirit, so it should be understood that there may be various equivalents and variations that could replace them at the time of the present application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In this case, it should be noted that in the accompanying drawings, the same components are denoted by the same reference numerals as much as possible. In addition, detailed descriptions of well-known functions and configurations that may obscure the gist of the present invention will be omitted. For the same reason, some components are exaggerated, omitted, or schematically illustrated in the accompanying drawings, and the size of each component does not fully reflect the actual size.

First, an apparatus for configuring a 3D map for providing augmented reality based on pose information and depth information according to an embodiment of the present invention will be described. FIG. 1 is a diagram for explaining the configuration of this apparatus. Referring to FIG. 1, an apparatus 10 (hereinafter abbreviated as 'augmented reality device') for configuring a 3D map for providing augmented reality based on pose information and depth information according to an embodiment of the present invention includes a camera unit 11, a communication unit 12, a sensor unit 13, an audio unit 14, an input unit 15, a display unit 16, a storage unit 17, and a control unit 18.

The camera unit 11 is for capturing an image. In particular, the camera unit 11 according to an embodiment of the present invention may be a stereo camera. To this end, the camera unit 11 may include two lenses and two image sensors. Each image sensor receives light reflected from a subject and converts it into an electrical signal, and may be implemented based on a Charge-Coupled Device (CCD), a Complementary Metal-Oxide Semiconductor (CMOS), or the like. The camera unit 11 may further include one or more analog-to-digital converters, and may convert an electrical signal output from the image sensor into a digital sequence and output it to the control unit 18.

The communication unit 12 is for communication with other devices. The communication unit 12 may include a radio frequency (RF) transmitter (Tx) that up-converts and amplifies the frequency of the transmitted signal, and an RF receiver (Rx) that low-noise amplifies the received signal and down-converts the frequency. In addition, the communication unit 12 may include a modem that modulates a transmitted signal and demodulates a received signal.

The sensor unit 13 is for measuring inertia. The sensor unit 13 includes an Inertial Measurement Unit (IMU), a Doppler Velocity Log (DVL), an Attitude and Heading Reference System (AHRS), and the like. The sensor unit 13 measures inertial information including the position and speed of rotation and movement of the augmented reality device 10 and provides the measured inertial information of the augmented reality device 10 to the control unit 18 .

The audio unit 14 includes a speaker SPK for outputting an audio signal and a microphone MIC for receiving an audio signal. Under the control of the control unit 18, the audio unit 14 may output an audio signal through the speaker SPK or transmit an audio signal input through the microphone MIC to the control unit 18.

The input unit 15 receives a user's key manipulation for controlling the augmented reality device 10, generates an input signal, and transmits it to the control unit 18. The input unit 15 may include various types of keys for controlling the augmented reality device 10. When the display unit 16 is formed of a touch screen, the functions of the various keys can be performed on the display unit 16, and when all functions can be performed with the touch screen alone, the input unit 15 may be omitted.

The display unit 16 visually provides the user with menus of the augmented reality device 10, input data, function setting information, and various other information. The display unit 16 outputs a boot screen, a standby screen, a menu screen, and the like of the augmented reality device 10. In particular, the display unit 16 outputs a 3D map according to an embodiment of the present invention to the screen. The display unit 16 may be formed of a liquid crystal display (LCD), an organic light emitting diode (OLED), an active matrix organic light emitting diode (AMOLED), or the like. Meanwhile, the display unit 16 may be implemented as a touch screen. In this case, the display unit 16 includes a touch sensor that detects a user's touch input. The touch sensor may be a touch sensing sensor of a capacitive overlay, pressure, resistive overlay, or infrared beam type, or may be a pressure sensor. In addition to the above sensors, any type of sensor device capable of sensing contact or pressure of an object may be used as the touch sensor of the present invention. The touch sensor may detect a user's touch input, generate a detection signal including input coordinates indicating the touched position, and transmit it to the control unit 18. In particular, when the display unit 16 is formed of a touch screen, some or all of the functions of the input unit 15 may be performed through the display unit 16.

The storage unit 17 serves to store programs and data necessary for the operation of the augmented reality device 10 . In particular, the storage unit 17 may store a camera matrix, a pose matrix, and the like. Various types of data stored in the storage unit 17 may be deleted, changed, or added according to a user's operation of the augmented reality device 10 .

The control unit 18 may control the overall operation of the augmented reality device 10 and the signal flow between internal blocks of the augmented reality device 10, and may perform a data processing function. In addition, the control unit 18 controls the various functions of the augmented reality device 10. The control unit 18 may include a central processing unit (CPU), a baseband processor (BP), an application processor (AP), a graphics processing unit (GPU), a digital signal processor (DSP), and the like.

Next, the detailed configuration of the above-described control unit 18 will be described in more detail. FIG. 2 is a diagram for explaining a detailed configuration of a control unit according to an embodiment of the present invention. FIG. 3 is a diagram for explaining a method of deriving a pose matrix according to an embodiment of the present invention. FIG. 4 is a diagram for explaining a method of deriving a transformation matrix using a pose matrix and a camera matrix according to an embodiment of the present invention. FIG. 5 is a diagram for explaining a method of generating a simulated frame using a transformation matrix according to an embodiment of the present invention. FIG. 6 is a diagram for explaining a method for learning a correlation between pixels of two frames according to an embodiment of the present invention.

Referring to FIG. 2, the control unit 18 includes a pose acquisition unit 110 (Device Pose Acquisition), a depth calculation unit 120 (Depth Map Acquisition), and a 3D map generation unit 130 (3D Map Reconstruction).

The pose acquisition unit 110 is for acquiring the pose of the augmented reality device 10. The pose acquisition unit 110 extracts a feature point representing the same object from each of a plurality of frames of an image captured through the camera unit 11 while moving position, and calculates pose information from the coordinate change of the extracted feature point. For example, the first frame F1 and the second frame F2 of FIG. 3 represent images captured while moving position. The feature point P of FIG. 3 moves from a position P(t-1) in the first frame F1 to a position P(t) in the second frame F2. Accordingly, from the extent to which the feature point P has moved, it can be seen that the augmented reality device 10 has moved backward. In this way, the pose acquisition unit 110 derives pose information (position and rotation information) by calculating changes in feature points, and expresses the pose information as a matrix to derive a pose matrix.
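As a rough illustration of the last step above (expressing pose information as a matrix), the following sketch builds a 4x4 homogeneous pose matrix from a rotation and a translation. The specification does not state the layout of the pose matrix, so the 4x4 form, the restriction to rotation about the vertical axis, and the names `pose_matrix` and `apply_pose` are all assumptions made for illustration. The estimation of rotation and translation from feature point changes is not shown.

```python
import math


def pose_matrix(yaw_rad, translation):
    """Build a 4x4 homogeneous pose matrix (assumed layout).

    yaw_rad: rotation about the vertical (y) axis, in radians.
    translation: (tx, ty, tz) displacement of the device.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c, 0.0, s, tx],
        [0.0, 1.0, 0.0, ty],
        [-s, 0.0, c, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_pose(pm, point):
    """Transform a 3D point by the pose matrix (rotation, then translation)."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(pm[r][k] * vec[k] for k in range(4)) for r in range(3))


# Hypothetical example: no rotation, 0.5 units of motion along the negative z axis.
pm = pose_matrix(0.0, (0.0, 0.0, -0.5))
moved = apply_pose(pm, (1.0, 2.0, 3.0))
```

Keeping rotation and translation in one matrix lets a later stage combine the pose with the camera matrix by a single matrix product.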

The depth calculation unit 120 is for acquiring a depth map using a deep learning model (DLM). The deep learning model (DLM) is generated by the following method using learning data including a first frame for learning and a second frame for learning. As shown in FIG. 4, the depth calculation unit 120 first derives a transformation matrix (TM) using a pose matrix (PM) and a camera matrix (CM). Here, the pose matrix PM is a matrix expressing the pose information derived by the pose acquisition unit 110 from the first frame F1 for learning and the second frame F2 for learning. The camera matrix CM is known as an internal parameter of the camera unit 11. Then, as shown in FIG. 5, the depth calculation unit 120 transforms the first frame F1 for learning using the transformation matrix TM to generate a simulated second frame F2' for learning that simulates the second frame F2 for learning. Then, as shown in FIG. 6, the depth calculation unit 120 trains a prototype of the model (deep learning) on the correlation between depth and the coordinate difference between pixels of the second frame F2 for learning and pixels of the simulated second frame F2' for learning that represent the same object in the real world. This generates a deep learning model (DLM) that derives a depth map according to the coordinate difference between the pixels of the second frame F2 for learning and the pixels of the simulated second frame F2' for learning.

After the deep learning model (DLM) has been generated as described above, when the first frame F1 and the second frame F2 of an image captured through the camera unit 11 while moving position are input, the depth calculation unit 120 derives a transformation matrix TM using the pose matrix PM and the camera matrix CM, as shown in FIG. 4. Then, as shown in FIG. 5, the depth calculation unit 120 transforms the first frame F1 using the transformation matrix TM to generate a simulated second frame F2' that simulates the second frame F2. Next, the depth calculation unit 120 inputs the second frame F2 and the simulated second frame F2' to the deep learning model DLM. The deep learning model (DLM) then derives a depth map according to the coordinate difference between the pixels of the second frame F2 and the pixels of the simulated second frame F2'.
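The specification does not give the algebraic form of the transformation matrix. One common choice in the pinhole camera model, assumed here purely for illustration, is the 3x4 product TM = CM · PM, where CM is the 3x3 intrinsic matrix and PM is the 3x4 [R | t] part of the pose. The sketch below uses that assumed form to compute where a point seen from the first camera position would land in the second frame. The intrinsics, the pose values, and the function names are hypothetical. Warping a whole frame this way also needs a depth (or a planar assumption) for each pixel, which the sketch sidesteps by starting from a 3D point.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transformation_matrix(camera_matrix, pose_3x4):
    """Assumed form: TM = CM . PM, a 3x4 projection matrix."""
    return matmul(camera_matrix, pose_3x4)


def project(tm, point_3d):
    """Project a 3D point to pixel coordinates using the 3x4 matrix."""
    column = [[point_3d[0]], [point_3d[1]], [point_3d[2]], [1.0]]
    u, v, w = (row[0] for row in matmul(tm, column))
    return (u / w, v / w)


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
CM = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
# Hypothetical pose: no rotation, points shift by +0.1 along x in the second view.
PM = [[1.0, 0.0, 0.0, 0.1], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

TM = transformation_matrix(CM, PM)
pixel_in_f2 = project(TM, (0.0, 0.0, 2.0))
```

With these values, a point on the optical axis at depth 2 lands 25 pixels to the right of the principal point in the simulated view.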

The 3D map generation unit 130 generates a 3D map for a frame of a captured image using the pose information obtained by the pose acquisition unit 110 and the depth map derived by the depth calculation unit 120. That is, the position and rotation information of the augmented reality device 10 between the first frame F1 and the second frame F2 can be known from the pose information, and the depth between the first frame F1 and the second frame F2 can be known through the depth map, so the 3D map generation unit 130 may convert the 2D coordinates of the pixels of the corresponding frame into 3D coordinates using the position and rotation information and the depth.
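The 2D-to-3D conversion described above can be sketched with the standard pinhole back-projection: a pixel (u, v) with depth d maps to camera coordinates ((u - cx)·d/fx, (v - cy)·d/fy, d), which the pose then carries into a common map frame. This is a minimal sketch under that assumption. The specification does not spell out the formula, and the function names, the toy depth map, and the intrinsics below are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Convert a pixel and its depth to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_world(point_cam, rotation, translation):
    """Apply the pose to a camera-frame point: X_world = R . X_cam + t."""
    return tuple(
        sum(rotation[r][k] * point_cam[k] for k in range(3)) + translation[r]
        for r in range(3)
    )


def build_point_cloud(depth_map, intrinsics, rotation, translation):
    """Turn every pixel with a valid depth into a 3D map point."""
    fx, fy, cx, cy = intrinsics
    cloud = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d > 0:  # skip pixels without a valid depth estimate
                point_cam = backproject(u, v, d, fx, fy, cx, cy)
                cloud.append(to_world(point_cam, rotation, translation))
    return cloud


# Toy 2x2 depth map; 0.0 marks a pixel with no depth.
depth_map = [[2.0, 2.0], [0.0, 4.0]]
intrinsics = (1.0, 1.0, 0.0, 0.0)  # hypothetical fx, fy, cx, cy
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = build_point_cloud(depth_map, intrinsics, identity, (0.0, 0.0, 1.0))
```

Accumulating the point clouds of successive frames in the same map frame is one plausible way to grow the 3D map as the device moves.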

Next, a method for configuring a 3D map for providing augmented reality based on pose information and depth information according to an embodiment of the present invention will be described. FIG. 7 is a flowchart illustrating this method. In the embodiment of FIG. 7, it is assumed that, as described above, a model prototype has already been trained using learning data including a first frame for learning and a second frame for learning, so that the deep learning model (DLM) that derives the depth map according to the difference between the pixel coordinates of the two frames has already been generated.

Referring to FIG. 7, in step S110 the pose acquisition unit 110 receives the first and second frames F1 and F2 of an image captured by the camera unit 11 while moving position. Then, in step S120, the pose acquisition unit 110 extracts a feature point P from each of the first and second frames F1 and F2, calculates the change in the extracted feature point P, and derives pose information and a pose matrix. For example, the feature point P of FIG. 3 moves from a position P(t-1) in the first frame F1 to a position P(t) in the second frame F2.

Next, in step S130, as shown in FIG. 4, the depth calculation unit 120 derives a transformation matrix TM using the pose matrix PM, which the pose acquisition unit 110 derived from the first frame F1 and the second frame F2, and the known camera matrix CM of the camera unit 11.

Subsequently, in step S140, as shown in FIG. 5, the depth calculation unit 120 transforms the first frame F1 using the transformation matrix TM to generate a simulated second frame F2' that simulates the second frame F2.

Next, in step S150, the depth calculation unit 120 inputs the second frame F2 and the simulated second frame F2' to the deep learning model DLM. Then, in step S160, the deep learning model (DLM) derives a depth map according to the coordinate difference between the pixels of the second frame F2 and the pixels of the simulated second frame F2'.
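For intuition about why pixel coordinate differences carry depth information, the classical pinhole relation for a purely sideways camera motion can be used: depth = focal length × baseline / pixel shift. This closed-form relation is not the method of the specification, which learns the correlation with a deep learning model and compares F2 with the simulated frame F2' rather than with F1. The sketch below only shows the kind of inverse relationship such a model could pick up. The function name and the numeric values are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Classical depth from pixel shift for a sideways camera translation.

    disparity_px: horizontal pixel shift of the same object between two views.
    focal_px: focal length in pixels.
    baseline: distance the camera moved, in scene units.
    """
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: treat the point as very far away
    return focal_px * baseline / disparity_px


# Hypothetical values: 500 px focal length, camera moved 0.1 units sideways.
near = depth_from_disparity(25.0, 500.0, 0.1)
far = depth_from_disparity(5.0, 500.0, 0.1)
```

Nearby objects shift by many pixels and distant objects by few, so a larger coordinate difference corresponds to a smaller depth.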

Next, in step S170, the 3D map generation unit 130 derives a 3D map for a frame of the captured image using the pose information obtained by the pose acquisition unit 110 and the depth map derived by the depth calculation unit 120. That is, the position and rotation information of the augmented reality device 10 between the first frame F1 and the second frame F2 can be known from the pose information, and the depth between the first frame F1 and the second frame F2 can be known through the depth map, so the 3D map generation unit 130 may convert the 2D coordinates of the pixels of the corresponding frame into 3D coordinates using the position and rotation information and the depth.

According to the present invention, a virtual object can be registered with an image captured by using the 3D map derived as described above. Since the 3D map of the present invention provides precise 3D coordinates, precise registration is possible when registering virtual objects. Accordingly, it is possible to provide augmented reality with higher realism.

Meanwhile, the method according to the embodiment of the present invention described above may be implemented in the form of a program readable by various computer means and recorded in a computer-readable recording medium. Here, the recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software. For example, the recording medium includes magnetic media such as hard disks, floppy disks, and magnetic tapes, optical recording media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that generated by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above using several preferred embodiments, these examples are illustrative and not restrictive. As such, those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications can be made in accordance with the doctrine of equivalents without departing from the spirit of the present invention and the scope of rights set forth in the appended claims.

Claims

1. An apparatus for constructing a three-dimensional map for providing augmented reality, comprising:

a pose acquisition unit which, when two frames including a first frame and a second frame of an image captured while moving position are input, obtains pose information from the two input frames;

a depth map calculation unit which derives a depth map from the two frames using a deep learning model; and

a three-dimensional map generation unit which generates a three-dimensional map based on the pose information and the depth map.
2. The apparatus of claim 1,

wherein the pose acquisition unit extracts a feature point representing the same object from each of the two frames, and sequentially derives pose information and a pose matrix based on the coordinate change of the extracted feature point.
3. The apparatus of claim 2,

wherein the depth calculation unit:

derives a transformation matrix using a known camera matrix and the pose matrix;

generates a simulated second frame that simulates the second frame from the first frame using the transformation matrix; and

derives, through the deep learning model, the depth map according to the coordinate difference between the pixels of the second frame and the pixels of the simulated second frame.
4. The apparatus of claim 1,

wherein the depth calculation unit:

derives a transformation matrix using a known camera matrix and a pose matrix derived from a first frame for learning and a second frame for learning;

generates, using the transformation matrix, a simulated second frame for learning that simulates the second frame for learning from the first frame for learning; and

generates a deep learning model that derives a depth map according to the coordinate difference between pixels of the second frame for learning and pixels of the simulated second frame for learning, by training a prototype of the model on the correlation between depth and the coordinate difference between the pixels of the two frames that represent the same object in the real world.
5. A method for constructing a three-dimensional map for providing augmented reality, comprising:

obtaining, by a pose acquisition unit, pose information from two input frames when the two frames, including a first frame and a second frame of an image captured while moving position, are input;

deriving, by a depth map calculation unit, a depth map from the two frames using a deep learning model; and

generating, by a 3D map generation unit, a 3D map based on the pose information and the depth map.
6. The method of claim 5,

wherein the step of obtaining the pose information from the two frames comprises:

extracting, by the pose acquisition unit, a feature point representing the same object in each of the two frames; and

sequentially deriving, by the pose acquisition unit, pose information and a pose matrix based on the coordinate change of the extracted feature point.
7. The method of claim 6,

wherein the step of deriving the depth map from the two frames comprises:

deriving, by the depth calculation unit, a transformation matrix using a known camera matrix and the pose matrix;

generating, by the depth calculation unit, a simulated second frame that simulates the second frame from the first frame of the two frames using the transformation matrix; and

deriving, by the depth calculation unit, a depth map through the deep learning model according to the coordinate difference between the pixels of the second frame and the pixels of the simulated second frame.
8. The method of claim 1,

further comprising, before the step of obtaining the pose information from the two frames:

deriving, by the depth calculation unit, a transformation matrix using a known camera matrix and a pose matrix derived from a first frame for learning and a second frame for learning;

generating, by the depth calculation unit, a simulated second frame for learning that simulates the second frame for learning from the first frame for learning using the transformation matrix; and

generating, by the depth calculation unit, a deep learning model that derives a depth map according to the coordinate difference between pixels of the second frame for learning and pixels of the simulated second frame for learning, by training a prototype of the model on the correlation between depth and the coordinate difference between the pixels of the two frames that represent the same object in the real world.