WO2018134897A1 - Position and posture detection device, AR display device, position and posture detection method, and AR display method - Google Patents
- Publication number
- WO2018134897A1 (application PCT/JP2017/001426)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- display
- unit
- orientation
- content
- map information
- Prior art date
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01B—MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
- G01B11/00—Measuring arrangements characterised by the use of optical techniques
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
Definitions
- the present invention relates to a position / orientation detection technique and an AR (Augmented Reality) display technique.
- the present invention relates to a position / orientation detection technique and an AR display technique of an apparatus including an imaging unit.
- Patent Document 1 (Japanese Patent Laid-Open No. 2004-133867) discloses a navigation device in which, when a user designates a three-dimensional object with a cursor on a three-dimensional map image, the image portion of the actual building corresponding to that object is extracted from the image captured by the camera at that time and registered as texture image data of the three-dimensional object; thereafter, a rendering processing unit texture-maps the registered texture image data as the surface texture and draws it on the three-dimensional map image (summary excerpt).
- Patent Document 2 (Japanese Patent Laid-Open No. 2004-151867) discloses a configuration comprising an image feature storage unit that stores an image feature Fa of a recognition target, an image feature detection unit that detects an image feature Fb from a preview image, a posture estimation unit that estimates an initial posture of the recognition target based on the matching result of the image features Fa and Fb, a tracking point selection unit that selects tracking points Fe from the image feature Fa based on the initial posture, a template generation unit that generates a template image of the recognition target based on the estimation result of the initial posture, a matching unit that matches the template image against the preview image with respect to the tracking points Fe, and a posture tracking unit that tracks the posture of the recognition target in the preview image based on the tracking points Fe that were successfully matched.
- Patent documents: JP 2009-276266 A; Japanese Unexamined Patent Publication No. 2016-066187.
- AR display is a technique for superimposing information such as images and data related to the real scene (actual scene) viewed by a user onto that scene.
- For this, it is necessary to specify the position of the user and the direction (line-of-sight direction) with high accuracy.
- In Patent Document 1, the user must designate the object on which related information is to be superimposed in the actual scene, which takes time and effort. In addition, a three-dimensional map is displayed by pasting images of the scenery around the vehicle onto a three-dimensional model as texture images, so a three-dimensional map and a three-dimensional model corresponding to the scene around the vehicle are essential. Since image processing is performed using a three-dimensional map and a three-dimensional model corresponding to the real space, the amount of information to be processed increases.
- Patent Document 2 tracks the change in posture of the recognition target using an image. Although the posture change of the object (the recognition target) can be detected, the position on the user side cannot be detected.
- the present invention has been made in view of the above circumstances, and an object of the present invention is to provide a position detection technique for detecting the position and orientation on the user side with high accuracy from a small amount of information with a simple configuration. Problems, configurations, and effects other than those described above will be clarified by the following description of embodiments.
- In order to solve the above problems, the present invention provides a position and orientation detection apparatus comprising: a photographing unit that photographs a predetermined photographing range including two or more objects; an object detection unit that identifies the pixel position of each of the objects in a photographed image captured by the photographing unit; a direction calculation unit that calculates, for each object, an object direction, which is the direction of the object with respect to the photographing unit, using the pixel position, the map information of the object, and the focal length of the photographing unit; and a position/orientation calculation unit that calculates the position and orientation of the photographing unit using the object direction of each object and the map information.
- The object detection unit extracts, from two-dimensional map information storing information on the positions and shapes of a plurality of objects in a predetermined area, the two-dimensional map information corresponding to the photographing range, and specifies the pixel positions using that two-dimensional map information.
- The present invention also provides an AR display device that displays content on a display having transparency and reflectivity in association with an object in the scene behind the display, the AR display device comprising: the position and orientation detection device described above; a display content generation unit that generates the content to be displayed on the display; a superimposing unit that determines the display position of the generated content on the display using the position and orientation of the photographing unit determined by the position/orientation detection device and the pixel position of the object specified by the object detection unit; and a display unit that displays the generated content at the display position determined by the superimposing unit on the display.
- According to the present invention, the position and orientation on the user side can be detected with high accuracy, from a small amount of information, and with a simple configuration.
- FIG. 1 is a functional block diagram of the position and orientation detection apparatus of the first embodiment.
- FIG. 2(a) is a block diagram of the photographing unit of the first embodiment, and FIG. 2(b) is a hardware block diagram of the position and orientation detection apparatus of the first embodiment.
- FIG. 3(a) is a block diagram of the map server system of the first embodiment, and FIG. 3(b) is a block diagram of the content server system of the second embodiment.
- FIGS. 4(a) to 4(f) are explanatory diagrams illustrating the position and orientation detection method of the first embodiment, and FIG. 5 is a flowchart of the position and orientation detection processing of the first embodiment.
- FIGS. 6(a) and 6(b) are explanatory diagrams illustrating pattern matching.
- FIGS. 7(a) to 7(f) are explanatory diagrams illustrating how to determine the representative point in the first embodiment.
- FIGS. 8(a) and 8(b) are explanatory diagrams illustrating a modification of the position and orientation calculation method of the first embodiment.
- FIGS. 9(a) to 9(c) are explanatory diagrams illustrating another modification of the position and orientation calculation method of the first embodiment.
- FIG. 10 is a functional block diagram of the AR display device of the second embodiment.
- (a) and (b) are explanatory diagrams illustrating the display unit of the second embodiment and the display position of related content; a flowchart of the AR display processing of the second embodiment follows.
- (a) and (b) are explanatory diagrams illustrating a display example of the second embodiment.
- (a) and (b) are explanatory diagrams illustrating a display example of the second embodiment.
- (a) and (b) are explanatory diagrams illustrating a display example of the second embodiment.
- (a) and (b) are explanatory diagrams illustrating a display example of the second embodiment.
- (a) and (b) are explanatory diagrams illustrating a display example of the second embodiment.
- the first embodiment is a position / orientation detection apparatus including an imaging unit.
- The position/orientation detection apparatus according to the present embodiment uses the image acquired by the imaging unit to detect the position and orientation of the apparatus itself, including the imaging unit.
- FIG. 1 is a functional block diagram of a position / orientation detection apparatus 100 according to the present embodiment.
- The position/orientation detection apparatus 100 according to the present embodiment includes a control unit 101, a photographing unit 102, a rough detection sensor 103, an evaluation object extraction unit 104, an object detection unit 105, a direction calculation unit 106, a position/orientation calculation unit 107, a map management unit 108, and a gateway 110.
- The position/orientation detection apparatus 100 of the present embodiment also includes, as units that temporarily hold information, a captured image holding unit 121, a positioning information holding unit 122, a two-dimensional map information holding unit 123, and a map information holding unit 124.
- the control unit 101 monitors the operation status of each part of the position / orientation detection apparatus 100 and controls the entire position / orientation detection apparatus 100.
- It is composed of a circuit or the like; alternatively, its functions may be realized by a CPU executing a program stored in advance.
- the photographing unit 102 photographs a scene in the real space (actual scene) and acquires a photographed image.
- the captured image is stored in the captured image holding unit 121.
- A shooting range including at least two objects is photographed.
- the photographing unit 102 includes a lens 131 that is an image forming optical system, and an image sensor 132 that converts the formed image into an electric signal.
- The imaging device 132 is a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor, a CCD (Charge-Coupled Device) image sensor, or the like.
- Reference numeral 133 denotes an optical axis of the lens.
- the photographed image holding unit 121 may hold a plurality of photographed images as necessary. This is to extract temporal changes and to select a captured image with a good shooting state.
- data obtained by performing various types of image processing on the acquired data may be stored as a captured image instead of the data itself acquired by the imaging unit 102.
- the image processing performed here is, for example, removal of lens distortion, adjustment of color and brightness, and the like.
- the coarse detection sensor 103 detects the position and orientation in the real space of the position / orientation detection apparatus 100 including the imaging unit 102 with coarse accuracy, and stores the detected position and orientation in the positioning information holding unit 122 as positioning information.
- For position detection, GPS (Global Positioning System) is used; for example, the coarse detection sensor 103 is a GPS receiver.
- an electronic compass is used to detect the posture.
- The electronic compass is composed of a combination of two or more magnetic sensors. If the magnetic sensors support three axes, direction detection in three-dimensional space is possible. If the measurement plane is limited to the horizontal plane, a biaxial magnetic sensor may be used, allowing a lower-cost electronic compass.
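- As a minimal illustration of how a heading can be derived from a two-axis magnetic sensor (a simplified sketch that ignores tilt compensation and magnetic declination; the function and variable names are hypothetical, not taken from the patent):

```python
import math

def compass_heading(mx: float, my: float) -> float:
    """Heading in degrees clockwise from magnetic north, given the
    horizontal magnetometer components mx (north) and my (east)."""
    return math.degrees(math.atan2(my, mx)) % 360.0

# Example: a field vector pointing roughly east
print(compass_heading(0.0, 25.0))  # -> 90.0
```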
- the positioning information held by the positioning information holding unit 122 is not limited to the positioning information obtained by the coarse detection sensor 103.
- it may be positioning information obtained by a position / orientation calculation unit 107, which will be described later, or both.
- the positioning information is used for extraction of two-dimensional map information and map information described later, calculation of position and orientation, and the like.
- The map management unit 108 acquires two-dimensional map information and map information over a range that is necessary and sufficient for the processing of each unit of the position/orientation detection apparatus 100, including the visual field range of the photographing unit 102, and stores them in the two-dimensional map information holding unit 123 and the map information holding unit 124, respectively.
- The acquisition is performed, for example, from a server or storage device that holds such information, over a network or the like via the gateway 110.
- the acquisition range is determined based on the positioning information.
- the 2D map information held in the 2D map information holding unit 123 is object information of each object included in a predetermined area.
- the object information includes the position (position in the map), shape (appearance), feature point, and the like of each object in the area.
- 2D map information is created from images taken with a camera, for example.
- At the time of shooting, position information of the shooting range (the map absolute position) is simultaneously acquired as information for specifying the predetermined area.
- the photographed image is analyzed, and the object and its feature point are extracted.
- the pixel position in the image of the extracted object is specified and set as the map position.
- the appearance shape of the object is acquired by using, for example, Google Street View.
- The original shooting range is treated as one two-dimensional map, and for each two-dimensional map, the map absolute position of the shooting range is associated with the position, shape, and feature points of each object in the area to form the two-dimensional map information. Since each piece of two-dimensional map information has a map absolute position, the image can be deformed in accordance with the optical characteristics of the photographing unit 102.
- an object registered in the two-dimensional map information is referred to as a registered object.
- the two-dimensional map information may further include attribute information for each registered object.
- the attribute information includes, for example, the type of registered object.
- the type of registered object is, for example, a building, a road, a signboard, or the like.
- The two-dimensional map information may also be acquired by analyzing an image obtained by removing distortion from the photographed image based on information about the camera and adjusting its color and brightness.
- the coordinates or addresses of each object in the real space are registered as map information.
- As the map information, for example, Google Maps provided by Google Inc. can be used.
- As the coordinates, for example, latitude and longitude are used.
- Height information may also be included, in which case three-dimensional position measurement is possible.
- a local coordinate system based on any actual location may be used.
- a data structure in which the position measurement accuracy is increased by specializing in a limited area may be used.
- the evaluation object extraction unit 104 extracts object candidates to be processed (hereinafter referred to as evaluation object candidates) from each registered object registered in the two-dimensional map information. For example, all registered objects in the two-dimensional map information may be set as evaluation object candidates.
- When object attribute information is stored, registered objects whose attribute information matches a predetermined condition may be extracted as evaluation object candidates; for example, only building objects are extracted.
- For the extraction, the 2D map information held in the 2D map information holding unit 123 is used.
- The minimum necessary group of two-dimensional maps is stored in the two-dimensional map information holding unit 123.
- the evaluation object extraction unit 104 may further narrow down 2D map information for extracting evaluation object candidates using the positioning information. For example, using the positioning information, the visual field range of the photographing unit 102 in the real space is calculated. Then, only the two-dimensional map information that matches the visual field range is scanned to extract evaluation object candidates.
- the object detection unit 105 identifies the position (pixel position) of each extracted evaluation object candidate in the captured image.
- An evaluation object candidate whose position is specified in the captured image is set as an evaluation object.
- the positions of at least two evaluation objects are specified.
- the position in the captured image is specified by pattern matching.
- a template image used for pattern matching is created using shape information of two-dimensional map information.
- Using the specified pixel position, the object detection unit 105 calculates the horizontal distance of each evaluation object from the origin of the photographed image.
- the direction calculation unit 106 calculates the direction (object direction) of each evaluation object.
- As the object direction, for example, the angle from the direction of the optical axis 133 of the lens 131 of the photographing unit 102 is obtained.
- the angle is calculated using the pixel position calculated by the object detection unit 105, the horizontal distance, the map information of the evaluation object, and the focal length of the lens 131 of the photographing unit 102.
- the direction calculation unit 106 may correct the angle error due to distortion of the lens 131 and calculate the direction.
- the angle error due to distortion is calculated from the relationship between the angle of view of the lens 131 and the image height.
- the relationship between the angle of view and the image height necessary for the calculation is acquired in advance.
- the position / orientation calculation unit 107 calculates the position and orientation of the photographing unit 102 using the object direction of each evaluation object calculated by the direction calculation unit 106 and the map information.
- the position to be calculated is a coordinate in the same coordinate system as the map information.
- the calculated posture is the direction of the optical axis 133.
- The posture can be represented by, for example, an azimuth angle and an elevation angle, or by pitch, yaw, and roll angles.
- the gateway 110 is a communication interface.
- the position / orientation detection apparatus 100 transmits / receives data to / from, for example, a server connected to a network via the gateway 110.
- For example, two-dimensional map information and map information are downloaded from a server connected to the Internet, or, as will be described later, generated two-dimensional map information is uploaded to the server.
- FIG. 3A shows an example of the configuration of the server (map server) system 620 from which the two-dimensional map information is acquired.
- The map server system 620 includes a map server 621 that controls its operation, a map information storage unit 622 that stores map information, a 2D map information storage unit 623 that stores 2D map information, and a communication I/F 625.
- the map server 621 receives a request from the map management unit 108, and transmits map information and two-dimensional map information in the requested range from each storage unit to the request source.
- each object 624 included in the two-dimensional map information may be held independently.
- each object 624 is held in association with the two-dimensional map information including the object 624.
- The map server 621 may also have a function of extracting the feature points of the objects 624, analyzing the two-dimensional map information 210 transmitted from the position/orientation detection apparatus 100, extracting objects from it, and registering the two-dimensional map information 210 as necessary.
- The map server system 620 that manages the map information and the two-dimensional map information is not limited to a single system; such information may be divided and managed across a plurality of server systems on the network.
- the position / orientation detection apparatus 100 of the present embodiment includes a CPU 141, a memory 142, a storage device 143, an input / output interface (I / F) 144, and a communication I / F 145.
- Each of the above functions is realized by the CPU 141 loading a program stored in advance in the storage device 143 into the memory 142 and executing it. All or some of the functions may instead be realized by hardware such as a circuit, for example an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).
- various data used for the processing of each function and various data generated during the processing are stored in the memory 142 or the storage device 143.
- the captured image holding unit 121, the positioning information holding unit 122, the two-dimensional map information holding unit 123, and the map information holding unit 124 are constructed in, for example, the memory 142 provided in the position and orientation detection device 100.
- Each holding unit may be realized by a separately provided memory 142, or some or all of them may be integrated into a single memory 142; they may also be divided according to the required capacity and speed. Note that the two-dimensional map information is used particularly in the processing for extracting objects, so it is desirable to construct the two-dimensional map information holding unit 123 that holds this information in a memory area that can be accessed at relatively high speed.
- FIG. 4 (a) to 4 (f) are diagrams for explaining the position detection processing of the present embodiment, and FIG. 5 is a processing flow.
- initial processing, shooting processing, object detection processing, direction calculation processing, and position / posture calculation processing are performed in this order.
- In the initial processing, positioning information is acquired by the rough detection sensor 103 as the approximate position and orientation of the photographing unit 102, and the two-dimensional map information and map information used for the processing are acquired.
- the rough detection sensor 103 acquires the rough position and the rough posture of the position / orientation detection apparatus 100 as positioning information (step S1101).
- the acquired positioning information is stored in the positioning information holding unit 122.
- the map management unit 108 acquires the two-dimensional map information 210 and the map information 230 necessary for processing, and registers them in the two-dimensional map information holding unit 123 and the map information holding unit 124, respectively (step S1102).
- Specifically, the map management unit 108 calculates the coordinates of the visual field range of the imaging unit 102 using the positioning information. Then, the two-dimensional map information whose map absolute position corresponds to the calculated coordinates is acquired, together with the map information corresponding to those coordinates.
- the photographing unit 102 photographs an actual scene including two or more objects, and obtains a photographed image 220 (step S1103).
- object detection processing is performed.
- the object is detected by specifying the pixel position of the evaluation object in the captured image 220.
- the evaluation object extraction unit 104 accesses the two-dimensional map information holding unit 123 and acquires the two-dimensional map information 210. Then, as shown in FIG. 4A, registered objects in the acquired two-dimensional map information 210 are extracted (step S1104) and set as evaluation object candidates 211, 212, and 213.
- FIG. 4A illustrates a case where three evaluation object candidates 211, 212, and 213 are extracted.
- the object detection unit 105 generates a template image for each extracted evaluation object candidate 211, 212, 213 (step S1106). Then, the captured image 220 is scanned with the generated template image, and an evaluation object is specified in the captured image 220 (step S1107). The object detection unit 105 identifies the evaluation object in the captured image 220 for each of the extracted evaluation object candidates 211, 212, and 213, and repeats until at least two evaluation objects are detected (step S1105).
- FIG. 4B shows an example in which two evaluation objects 221 and 222 are detected in the captured image 220.
- the evaluation object 221 corresponds to the evaluation object candidate 211
- the evaluation object 222 corresponds to the evaluation object candidate 212, and is detected.
- no evaluation object corresponding to the evaluation object candidate 213 has been detected.
- the object detection unit 105 calculates the horizontal distances PdA and PdB of the evaluation objects 221 and 222 from the origin of the captured image 220, respectively (step S1108).
- the horizontal distances PdA and PdB are the number of pixels on the image sensor 132.
- the origin of the captured image 220 indicated by a black dot in FIG. 4B is the point of the image sensor 132 that coincides with the direction in which the image capturing unit 102 faces, that is, the center of the optical axis 133 of the lens 131.
- the horizontal distance is a horizontal distance between the representative point in each evaluation object 221 and 222 and the origin of the captured image.
- the representative point is, for example, a rectangular center point that constitutes the evaluation objects 221 and 222.
- the midpoint of the shape associated with each of the evaluation objects 221 and 222 may be used as the representative point. Details of how to determine the representative points will be described later.
- the object detection unit 105 refers to the map information 230 of FIG. 4C and acquires the map information of the evaluation objects 221 and 222 (step S1109).
- Specifically, the object detection unit 105 uses the map absolute position of the two-dimensional map information 210 on which the evaluation objects 221 and 222 are based and the map positions of the evaluation object candidates 211 and 212 to associate them with the objects (real objects) 231 and 232 in the map information 230, and acquires the map information ((XA, YA), (XB, YB)) of the associated real objects 231 and 232.
- In FIG. 4C, a two-dimensional map is illustrated for convenience, but a three-dimensional map may be used.
- Next, the direction calculation unit 106 performs the direction calculation process; that is, it calculates the object direction of each of the evaluation objects 221 and 222 (step S1110). As shown in FIGS. 4D and 4E, the direction calculation unit 106 calculates, as the object directions, the angles θA and θB with respect to the optical axis direction of the lens 131 of the photographing unit 102.
- FIG. 4D is a diagram for explaining a method of estimating the existence direction of the photographing unit 102 using the horizontal distance PdA in the photographed image 220 of the evaluation object 221 corresponding to the real object 231.
- FIG. 4E is a diagram for explaining a method of estimating the object direction using the horizontal distance PdB of the evaluation object 222 corresponding to the real object 232.
- The image of the real object 231 is formed on the image sensor 132 through the lens 131. Therefore, the angle θA formed by the real object 231 and the optical axis 133 of the lens 131 of the photographing unit 102 is calculated geometrically and optically by the following expression (1) using the position PdA on the image sensor.
- f is the focal length of the lens 131.
- An angle θB formed by the real object 232 and the optical axis 133 of the lens 131 of the photographing unit 102 shown in FIG. 4E is similarly calculated by the following equation (2).
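- The bodies of expressions (1) and (2) are not reproduced in this text. Under the usual pinhole-camera geometry implied by the surrounding description, a consistent form would be the following, where p is a pixel pitch (introduced here as an assumption) that converts the pixel counts PdA and PdB into physical distances on the image sensor 132:

```latex
\theta_A = \arctan\!\left(\frac{P_{dA}\, p}{f}\right), \qquad
\theta_B = \arctan\!\left(\frac{P_{dB}\, p}{f}\right)
```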
- If the lens 131 has distortion, errors are added to the calculated angles θA and θB. For this reason, it is desirable to use a lens 131 with little distortion. In addition, it is desirable to acquire the relationship between the angle of view of the lens 131 and the image height in advance and to correct the angle error caused by the distortion of the lens 131.
- the position / orientation calculation unit 107 performs position / orientation calculation processing.
- That is, the map information A (XA, YA) and B (XB, YB) of the matched evaluation objects 221 and 222 and the object directions θA and θB are referred to, and the existence range of the photographing unit 102 in the real space is obtained for each.
- Then, the position and orientation of the photographing unit 102 are specified from the plural existence ranges (step S1111).
- the position to be calculated is map information of the photographing unit 102.
- the calculated posture is the direction of the optical axis 133 of the lens 131.
- The optical axis 133 of the lens 131 forms an angle θA with the real object 231 and forms an angle θB with the real object 232.
- The locus of positions that satisfy these two conditions is the locus of points at which the circumferential (inscribed) angle is constant, and it becomes a circle 241 passing through the real object 231 and the real object 232.
- the radius R of the circle 241 is calculated by the following equation (3) using the positions A (XA, YA) and B (XB, YB) of the real object 231 and the real object 232 in the real space.
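- Expression (3) is likewise not reproduced here. By the inscribed-angle theorem, a form consistent with the surrounding description (writing the angle subtended at the camera between the two object directions as θA + θB, i.e., assuming the objects lie on opposite sides of the optical axis) would be:

```latex
R = \frac{\sqrt{(X_A - X_B)^2 + (Y_A - Y_B)^2}}{2\,\sin(\theta_A + \theta_B)}
```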
- the locus where the imaging unit 102 exists can be specified as an arc AB indicated by a solid line of a circle 241.
- On the remaining portion of the circle 241, the left-right relationship between θA and θB is reversed. For this reason, it does not hold as a locus on which the photographing unit 102 exists.
- the position / orientation calculation unit 107 specifies the direction of the optical axis 133 of the lens 131 of the photographing unit 102 using the direction detected by the rough detection sensor 103.
- the position of the imaging unit 102 existing on the arc AB is uniquely determined.
- the detected position of the imaging unit 102 is the principal point of the lens 131 provided in the imaging unit 102.
- the detection position is not limited to the principal point. Any position in the photographing unit 102 may be used.
- the position and orientation of the photographing unit 102 can be detected.
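- A minimal numerical sketch of this step (a simplified two-dimensional illustration, not the patent's own implementation; all names are hypothetical): given the map positions A and B, the signed angles θA and θB of each object relative to the optical axis, and the approximate optical-axis heading from the coarse detection sensor, the camera position can be found as the intersection of the two back-projected sight lines.

```python
import math

def locate_camera(A, B, theta_a, theta_b, heading):
    """Estimate the camera position from two landmarks.

    A, B      : (x, y) map positions of the real objects.
    theta_a/b : signed angles [rad] of each object measured from the
                optical axis (e.g. positive = left of the axis).
    heading   : optical-axis direction [rad], counter-clockwise from
                the map x-axis, taken from the coarse detection sensor.
    """
    # Direction pointing from each object back toward the camera
    # (opposite of the camera->object bearing, which is heading + theta).
    dAx, dAy = -math.cos(heading + theta_a), -math.sin(heading + theta_a)
    dBx, dBy = -math.cos(heading + theta_b), -math.sin(heading + theta_b)
    # The camera lies on the line through A with direction (dAx, dAy)
    # and on the line through B with direction (dBx, dBy).
    # Solve A + t*dA = B + s*dB for the intersection point.
    det = dAx * (-dBy) - dAy * (-dBx)
    if abs(det) < 1e-9:
        raise ValueError("sight lines are nearly parallel")
    rx, ry = B[0] - A[0], B[1] - A[1]
    t = (rx * (-dBy) - ry * (-dBx)) / det
    return (A[0] + t * dAx, A[1] + t * dAy)

# Example with made-up values: camera at the origin facing along +x.
pos = locate_camera(A=(8.66, 5.0), B=(18.79, -6.84),
                    theta_a=math.radians(30), theta_b=math.radians(-20),
                    heading=0.0)
print(pos)  # approximately (0.0, 0.0)
```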
- the position / orientation detection apparatus 100 repeats the above processing at predetermined time intervals, and always detects the latest position and orientation of the imaging unit 102.
- the captured image acquisition process in step S1103 may be the first.
- the position / orientation detection apparatus 100 starts the above process when the photographing unit 102 acquires a photographed image.
- Note that the evaluation object candidates 211 and 212 may each be registered in different pieces of two-dimensional map information 210.
- FIGS. 6 (a) to 6 (c) show a template image 310
- FIG. 6B shows a captured image 220.
- the pattern matching process is a process for determining whether or not there is the same image as the template image in a certain image. If there is the same image, the pixel position of the image can be specified.
- the object detection unit 105 generates a plurality of template images 311 to 316 having different inclinations and sizes as shown in FIG.
- a case where six types of template images 311 to 316 are generated is illustrated. If there is no need to distinguish, the template image 311 is used as a representative.
- the object detection unit 105 scans the captured image 220 in the direction indicated by the arrow in FIG. 6B for each generated template image 311 and evaluates the degree of similarity.
- As a method for comparing the similarity of images, there is a method of taking the difference between them and evaluating its histogram.
- 0 is obtained as a difference if they completely match, and a value far from 0 is obtained as the degree of matching decreases.
- the evaluation result of the similarity between the template image 311 and the applied area in the captured image 220 is recorded. This is repeated for each of the template images 311 to 316. The entire area of the captured image 220 is evaluated with respect to all prepared template images 311, and an area with the smallest difference is determined as a matching area.
- a region 225 in the captured image 220 is a region where the similarity evaluation with the template image 311 is maximized. Therefore, the object detection unit 105 determines that an evaluation object exists in the captured image 220, and specifies the region 225 as the region (pixel position) of the evaluation object.
- the object detection unit 105 can simultaneously acquire the pixel position of the evaluation object from the evaluation result. Using this, the horizontal distance PdA can be determined.
- When the orientation of the evaluation object is specified in advance and the template image to be used is thereby determined, only that single template image needs to be extracted and used. Further, when the approximate area in which the evaluation object exists in the captured image 220 is known, the evaluation may be performed only for that area.
- In this way, the matching region, that is, the pixel position of the evaluation object in the captured image, can be easily specified.
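- The pattern matching step can be sketched with a standard template-matching routine such as OpenCV's matchTemplate (a generic illustration, not the patent's own code; it scans the captured image with each template using a normalized squared-difference score, where 0 means a perfect match):

```python
import cv2
import numpy as np

def best_match(captured: np.ndarray, templates: list, threshold: float = 0.1):
    """Return ((x, y), template_index) of the best-matching template,
    or None if no template scores below the squared-difference threshold."""
    best = None
    for idx, tpl in enumerate(templates):
        result = cv2.matchTemplate(captured, tpl, cv2.TM_SQDIFF_NORMED)
        min_val, _, min_loc, _ = cv2.minMaxLoc(result)
        if min_val < threshold and (best is None or min_val < best[0]):
            best = (min_val, min_loc, idx)
    return None if best is None else (best[1], best[2])
```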
- When the conditions under which the two images were obtained differ, the images cannot be exactly the same.
- For this reason, countermeasures for image information such as brightness and color, and countermeasures for the distortion of the object that depends on the shooting direction, may be applied to the captured image.
- Distortion of the object due to the photographing direction means, for example, the deformation by which a plane that looks rectangular when the observer faces it directly in three-dimensional space appears almost trapezoidal when viewed from an oblique direction.
- As a countermeasure for brightness and color, for example, the two-dimensional map information 210 and the captured image 220 are converted to grayscale, and their brightness is equalized based on their respective histograms.
- As a countermeasure for distortion of the object, for example, as shown in FIG. 6A, the object is virtually deformed three-dimensionally when the template image 311 is generated from it. In other words, in order to detect an object even when it is deformed as described above, a shape conversion that turns a rectangle into a trapezoid is applied to the object to obtain the template images 311 to 316.
- a template image is generated by performing shape conversion corresponding to each of the assumed shooting directions, and template matching is performed. Thereby, it can respond to arbitrary imaging directions.
- a template image is generated by applying deformation such as rotation and enlargement / reduction to the object.
- the pattern matching process described above is a general technique and has an advantage that it can be processed at high speed.
- The representative point is determined within the template image used when specifying the evaluation objects 221 and 222.
- For example, the center of the outer shape of the object is set as the representative point 331.
- Alternatively, as shown in FIG. 7D, a rectangle indicated by a broken line may be defined from the outermost shape, and the center of that rectangle set as the representative point 331.
- A part of the object may also be enclosed by a rectangle, and the center of that rectangle used as the representative point 331.
- Among these, the method of defining a rectangle from the outermost shape and setting its center as the representative point 331 is simple and most desirable.
- The outermost shape need not be used; any corner of the object may be used instead.
- The representative point 331 is used when calculating the horizontal distance of the evaluation object from the origin in the position and orientation detection process. For this reason, it is desirable that the representative point 331 be set so that it can be accurately associated with the map information.
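- As an illustration of the simplest variant (the center of the bounding rectangle of the matched object region; a hedged sketch with hypothetical names):

```python
import numpy as np

def representative_point(mask: np.ndarray):
    """Center (x, y) of the bounding rectangle of the non-zero pixels
    in a binary object mask, in pixel coordinates."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask contains no object pixels")
    return ((xs.min() + xs.max()) / 2.0, (ys.min() + ys.max()) / 2.0)
```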
- As described above, the position/orientation detection apparatus 100 of the present embodiment includes the photographing unit 102 that photographs a predetermined photographing range including two or more objects, the object detection unit 105 that identifies the pixel position of each object in the photographed image captured by the photographing unit 102, the direction calculation unit 106 that calculates, for each object, the object direction (the direction of the object with respect to the photographing unit 102) using the pixel position, the map information of the object, and the focal length of the photographing unit, and the position/orientation calculation unit 107 that calculates the position and orientation of the photographing unit using the object direction of each object and the map information.
- The object detection unit 105 extracts, from two-dimensional map information in which information on the positions and shapes of a plurality of objects in a predetermined area is stored, the two-dimensional map information corresponding to the shooting range, and specifies the pixel positions using that two-dimensional map information.
- In this way, the position and orientation of the photographing unit 102 are calculated using an image actually taken by the photographing unit 102. At this time, two objects are extracted from the captured image and the calculation is performed using them. For this reason, image processing using a three-dimensional map or a three-dimensional model corresponding to the real space is unnecessary. Moreover, the method does not depend on the accuracy of an external GPS or the like. Therefore, according to the present embodiment, the position and orientation of the photographing unit 102 can be detected with high accuracy with a simple configuration and a small amount of information processing. As a result, the position and orientation of various devices whose relative position and relative direction with respect to the imaging unit 102 are known can also be estimated with high accuracy.
- the position and orientation of the photographing unit 102 included in the computer can be obtained by a pattern matching process that the computer is good at and simple geometric calculation. That is, the position / orientation detection apparatus 100 that detects its own position and orientation can be realized with a small amount of information.
- the position and orientation detection by the above method is repeated at predetermined time intervals. Accordingly, the position and orientation of the position / orientation detection apparatus 100 including the imaging unit 102 can be identified with high accuracy and constantly with a small amount of information.
- the position / orientation calculation unit 107 calculates the position and orientation of the photographing unit 102 using the two evaluation objects 221 and 222.
- the calculation of the position and orientation by the position and orientation calculation unit 107 is not limited to this method. For example, three evaluation objects may be used.
- In this case, the map information ((XA, YA), (XB, YB), (XC, YC)) of the real objects 231, 232, and 233 corresponding to the three evaluation objects, and the angles (θA, θB, θC) of those objects with respect to the optical axis 133 of the lens 131, are acquired.
- From the information on the real object 231 and the real object 232, the circle 241 on which the photographing unit 102 exists is determined.
- Similarly, a circle 242 on which the photographing unit 102 exists is determined from the information on the real object 232 and the real object 233.
- The intersection of the circles 241 and 242 is the position of the photographing unit 102, and the direction that satisfies the angles with respect to the optical axis 133 of the lens 131 is the direction of the photographing unit 102, that is, its posture.
- In this way, the position and orientation of the photographing unit 102 can be obtained from the detection results of the evaluation objects alone.
- To improve accuracy, it is desirable that the circle 241 and the circle 242 be as far apart as possible. Therefore, in the example of FIG. 8A, it is desirable to determine the position of the photographing unit 102 using the circle 241 and the circle 242, rather than the circle 241 and the circle passing through the real object 231 and the real object 233.
- Note that the number of evaluation objects used when specifying the position and orientation of the imaging unit 102 is not limited to three; three or more may be used.
- N is an integer of 1 or more.
- the position and orientation of the photographing unit 102 can be specified if three or more objects can be detected.
- the position and orientation detection accuracy is further improved.
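- When three or more evaluation objects are available, the position and heading can also be estimated jointly by minimizing the discrepancy between predicted and measured object directions. The following is a generic least-squares sketch of that idea (not the patent's own method; it assumes SciPy is available, and all names are hypothetical):

```python
import math
from scipy.optimize import least_squares

def resect(landmarks, thetas, x0):
    """Estimate (x, y, heading) from N >= 3 landmark bearings.

    landmarks : list of (X, Y) map positions of the real objects.
    thetas    : measured angles [rad] of each object from the optical axis.
    x0        : initial guess (x, y, heading), e.g. from the coarse sensor.
    """
    def residuals(p):
        x, y, heading = p
        res = []
        for (X, Y), theta in zip(landmarks, thetas):
            predicted = math.atan2(Y - y, X - x) - heading
            # wrap the angular error into (-pi, pi]
            res.append((predicted - theta + math.pi) % (2 * math.pi) - math.pi)
        return res

    return least_squares(residuals, x0).x  # [x, y, heading]
```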
- the position / orientation calculation unit 107 may calculate the position and orientation of the photographing unit 102 using two evaluation objects and information about the traffic infrastructure around the photographing unit 102, for example. This method will be described with reference to FIG. Here, the case where the information on the road 234 is used as the information on the traffic infrastructure will be described as an example. In addition, the photographing unit 102 is assumed to be mounted on a moving body traveling on the road 234.
- The shape of traffic infrastructure such as the road 234 is defined by standards.
- For this reason, the road 234 on which the moving body carrying the imaging unit 102 is traveling can be recognized from the captured image, and the traveling direction (the optical-axis direction of the imaging unit 102) can be obtained.
- the obtained information on the optical axis direction is used in place of the result of the rough detection sensor 103 of the above embodiment, and the position and orientation of the photographing unit 102 are determined.
- route information such as bridges and intersections and position information such as signs and traffic lights may be used.
- Alternatively, the position and orientation of the photographing unit 102 may be detected using the two-dimensional map information 210 with the road 234 itself as an evaluation object, which reduces the number of other evaluation objects required.
- In either case, the position and orientation can be specified without using the orientation information of the coarse detection sensor 103, so accuracy can be improved.
- FIG. 9A to FIG. 9C are diagrams for explaining this modification. Here, two or more evaluation objects are used.
- the appearance of the real object 231 corresponding to the evaluation object 221 has a rectangular shape 251 as shown in FIG.
- the external shape obtained from the captured image 220 is a trapezoid 252 as shown in FIG.
- the positional relationship between the real object 231 and the photographing unit 102 in the real space can be specified from the deformation amount.
- When pattern-matching the evaluation object in the captured image, the object detection unit 105 generates template images by applying deformation parameters such as scaling, deformation, rotation, and distortion to the shape 251 viewed from the front.
- the evaluation object 221 is pattern-matched with the generated template image.
- Then, the normal (indicated by an arrow in the figure) of the front shape 261 of the real object 231 corresponding to the evaluation object 221 is obtained using the deformation parameters of the template image used for pattern matching.
- That is, the direction of the normal of the planar front shape 261 can be determined from the amount of deformation of the shape in the captured image 220, and using this, the direction of the photographing unit 102 (the direction of the optical axis 133) can be obtained.
- the direction of the building surface in the real space may be obtained from the map information 230 or the like.
- Since the position and orientation detection method according to this modification does not directly use the values of the coarse detection sensor 103, such as the GPS or the electronic compass, the position and orientation can be obtained with high accuracy without being affected by the accuracy of the coarse detection sensor 103.
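- One standard way to recover the facade orientation from the observed deformation of a known rectangular front face is a plane-based pose estimate, for example with OpenCV's solvePnP. The sketch below is a generic alternative to the deformation-parameter method described above, not the patent's implementation; the corner coordinates and camera intrinsics are made-up values:

```python
import cv2
import numpy as np

# Known front shape 251: a W x H rectangle in its own plane (z = 0), in metres.
W, H = 20.0, 10.0
object_points = np.array([[0, 0, 0], [W, 0, 0], [W, H, 0], [0, H, 0]], dtype=np.float64)

# Matched corners of the trapezoid 252 in the captured image (pixels).
image_points = np.array([[410, 300], [620, 330], [615, 470], [405, 450]], dtype=np.float64)

# Assumed intrinsics of the photographing unit 102 (focal length and principal point, in pixels).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)

# Normal of the building facade expressed in the camera frame; comparing it
# with the camera's optical axis gives the viewing direction of the
# photographing unit relative to the building face.
facade_normal_in_camera = R[:, 2]
print(facade_normal_in_camera)
```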
- each position and orientation calculation method described above is preferably selected according to the required accuracy.
- In the above, the case where the position and orientation of the photographing unit 102 are calculated using the horizontal distance of each evaluation object has been described.
- the vertical distance may also be measured and the height direction of the object may be estimated.
- In the above, the map management unit 108 uses the positioning information acquired by the rough detection sensor 103 to extract the two-dimensional map information and map information necessary for processing and stores them in the holding units.
- However, the method is not limited to this.
- For example, the position and orientation calculated by the position/orientation calculation unit 107 may be used to extract the necessary two-dimensional map information and map information.
- In that case, the map management unit 108 can extract necessary and sufficient two-dimensional map information with higher accuracy, and the processing accuracy is improved.
- In addition, the position and orientation can be calculated even in places where a GPS signal cannot be received.
- the position / orientation detection apparatus 100 may further include a two-dimensional map generation unit 109.
- the two-dimensional map generation unit 109 analyzes the captured image 220 acquired by the imaging unit 102 using the position and orientation of the imaging unit 102, and generates two-dimensional map information 210. At this time, the pixel position of the object in the captured image 220 and the appearance of the object are used. That is, the pixel position of each object detected by the object detection unit 105 is set as a map position, and the shape used for pattern matching is set as an object shape.
- the two-dimensional map information 210 may be used as it is.
- the two-dimensional map generation unit 109 associates the position of the object in the real space with the appearance of the photographed object, and generates two-dimensional map information.
- Thereby, an unknown object that is not registered in the two-dimensional map, for example a newly constructed building, can be newly registered in the two-dimensional map information by associating its appearance with its position information.
- Unknown objects include, for example, buildings that are not registered in the two-dimensional map information 210 or the map information 230, and buildings whose appearance has changed due to renovation.
- When a plurality of pieces of two-dimensional map information 210 including the same object are acquired, learning may be performed using the information stored in the server via the network and the two-dimensional map information 210 stored in the two-dimensional map information holding unit 123 to calculate the feature points of the object.
- The result is then used for object pattern matching. With a method that matches on feature points, the object can be detected even in cases where an obstruction exists between the object and the imaging unit 102.
- the position of the object may be specified using a plurality of captured images 220 having different acquisition times. Using the position of the target object in the captured image 220 and the position and orientation of the imaging unit 102, it is possible to specify an area where an object in real space may exist.
- the specified area is a straight line.
- If the position/orientation detection apparatus 100 moves and a similar process is performed after a predetermined time, another straight line is obtained as the object existence region.
- the intersection of the two straight lines obtained by the two processes is the position where the object actually exists (position in real space, latitude, longitude, coordinates, etc.).
- the position of the object candidate in the real space may be calculated from the change in the position and orientation of the photographing unit 102 and the change in the horizontal distance of the object candidate.
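- This triangulation is the same two-line intersection used earlier for locating the camera, with the roles swapped: each capture gives a ray from the now-known camera position toward the object (bearing = camera heading plus the object angle), and the object sits where two such rays cross. A simplified two-dimensional sketch with hypothetical names:

```python
import math

def triangulate_object(cam1, bearing1, cam2, bearing2):
    """Intersect two sight rays, each given by a known camera position
    (x, y) and the absolute bearing [rad] toward the object."""
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    det = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
    if abs(det) < 1e-9:
        raise ValueError("rays are nearly parallel; wait for more motion")
    rx, ry = cam2[0] - cam1[0], cam2[1] - cam1[1]
    t = (rx * (-d2[1]) - ry * (-d2[0])) / det
    return (cam1[0] + t * d1[0], cam1[1] + t * d1[1])
```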
- 2D map information 210 can be updated in real time by additionally registering the obtained object information in the existing 2D map information 210.
- the appearance can be updated if the object is already registered in the two-dimensional map and the position is known. If the object moves, it can be updated to the latest position information.
- the position / orientation detection apparatus 100 obtains the position of the object in the real space using information used for detecting the position / orientation. Therefore, there is no need to calculate again to generate the two-dimensional map information. For this reason, it is possible to keep the two-dimensional map information up-to-date while reducing the calculation cost.
- A 3D map data model may also be used.
- This is map data that represents the actual state of each building by its position and height, so the shape of the building in the real world is known.
- an image taken from the side surface may be pasted on map data having a three-dimensional shape to obtain two-dimensional map information.
- Google Street View can be used for images taken from the side.
- an image taken from the side may be pasted on 3D map data, and a virtual camera may be placed in the map data model for CG rendering.
- the present embodiment is an AR display device 500 including the position and orientation detection device 100 of the first embodiment.
- In the present embodiment, a case where the position/orientation detection apparatus 100 is mounted on an automobile and AR display is performed on the windshield of the automobile will be described as an example.
- FIG. 10 is a functional block diagram of the AR display device 500 of the present embodiment.
- The AR display device 500 of the present embodiment includes the position/orientation detection device 100, a display content generation unit 501, a display content selection unit 502, a superimposing unit 503, a display unit 504, an instruction receiving unit 506, an extraction unit 507, and a content holding unit 511.
- the case where the control unit 101 and the gateway 110 are shared with the position / orientation detection apparatus 100 will be described as an example.
- the content holding unit 511 is a memory that temporarily holds content including AR content, and is configured by a memory that can be accessed at high speed.
- the normal content and the AR content are collectively referred to as content unless it is particularly necessary to distinguish them.
- the position / orientation detection apparatus 100 basically has the same configuration as the position / orientation detection apparatus 100 of the first embodiment. However, in the position / orientation detection apparatus 100 of the present embodiment, the object detection unit 105 may be configured to detect and hold the pixel positions of all the evaluation objects extracted by the evaluation object extraction unit 104. This information is used when the superimposing unit 503, which will be described later, determines the position where the content is superimposed. In addition, the control unit 101 of the position / orientation detection apparatus 100 controls the operation of each unit of the entire AR display device 500.
- the gateway 110 functions as a communication interface for the AR display device 500.
- the instruction receiving unit 506 receives an operation input from the user (driver) 530.
- selection of content or a condition of content to be displayed is received.
- the reception is performed via an existing operation device such as an operation button, a touch panel, a keyboard, or a mouse.
- the extraction unit 507 acquires content that may be used in the processing of the AR display device 500 from the content stored in the server or other storage device.
- the acquired content is stored in the content holding unit 511. Since the information processed by the AR display device 500 is mainly peripheral information, the processing cost is reduced by acquiring and processing information limited to information that may be processed.
- The content to be acquired may be determined in consideration of information such as the traveling direction and speed of the vehicle on which the AR display device 500 is mounted. For example, when the speed of the moving body is v and the time required to download the information and store it in the holding unit is T, information within at least a circle of radius Tv centered on the photographing unit 102 is acquired and stored in the content holding unit 511. Further, when the travel route of the vehicle is specified, information around the route may be extracted and stored in the content holding unit 511.
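- A minimal sketch of that prefetch-radius rule (hypothetical names and units; a real implementation would likely add a safety margin, shown here as a parameter):

```python
def prefetch_radius(speed_mps: float, download_time_s: float, margin: float = 1.2) -> float:
    """Radius [m] of the area whose content should be prefetched so that
    it is stored in the content holding unit before the vehicle reaches it."""
    return speed_mps * download_time_s * margin

# Example: 20 m/s (72 km/h) vehicle, 15 s to download and store the content.
print(prefetch_radius(20.0, 15.0))  # -> 360.0
```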
- FIG. 3B is a diagram for explaining the content server system 610.
- advertisements, contents, and AR contents are held and provided as information.
- the content server system 610 includes a content server 611, a content storage unit 612, an AR content storage unit 613, an advertisement storage unit 614, and a communication I / F 615.
- the advertisement is the text, image, video, etc. of the advertisement provided by the advertiser.
- the content is information according to the service content. For example, a moving image for entertainment, a game, etc. are mentioned.
- the AR content is content that is assumed to be AR-superposed, and includes meta information such as a display position and a posture in the real space in addition to the content itself.
- the meta information may be defined in advance or may be dynamically updated according to an instruction from the AR display device 500.
- the content server 611 outputs information held in each holding unit to the outside via the communication I / F 615 in response to an instruction received via the communication I / F 615. Further, the information received via the communication I / F 615 is held in the corresponding holding unit.
- the content server 611 may integrate the above-described information and provide the AR display device 500 through a network. As an example of integrating each information, for example, an advertisement is inserted into entertainment content.
- the AR content may be selected and extracted by a method similar to the method by which the position / orientation detection apparatus 100 extracts the two-dimensional map information and the map information.
- the content that the user desires to view may be instructed via the instruction receiving unit 506 and extracted according to the instruction.
- The display content selection unit 502 selects the content to be displayed from the content held in the content holding unit 511 according to the content display conditions. The display conditions may be determined in advance or may be designated by the user via the instruction receiving unit 506.
- the instruction receiving unit 506 may be a motion recognition device that recognizes the user's motion.
- For example, a recognition device can be used, such as gesture recognition that detects the user's movement with a camera, voice recognition using a microphone, or gaze recognition that detects the line of sight.
- A voice recognition device can handle a variety of operations because relatively detailed instructions can be given. With a gaze recognition device, the operation is hard for others nearby to perceive, and consideration can be given to the surrounding environment.
- The operator's voice may be registered in advance so that the user who made the voice input can be identified.
- The contents accepted by voice recognition may then be limited depending on the identified user.
- the display unit 504 displays the content to be displayed on the windshield according to the instruction of the superimposing unit 503.
- the display unit 504 of the present embodiment includes a projector 521 and a display (projection area) 522 as shown in FIG.
- the display 522 is realized by combining optical components having transparency and reflectivity, and is disposed on the windshield.
- the scene in real space behind the display 522 is seen through the display 522.
- the video (image) generated by the projector 521 is reflected by the display 522.
- the user 530 views an image in which a scene in real space that has passed through the display 522 and an image reflected by the display 522 are superimposed.
- if the display 522 covers the entire area of the windshield, it can cover the forward field of view of the user 530, and content can be displayed over a wide range of the actual scene spreading ahead.
- such a configuration of the display unit 504 is known as a HUD (Head-Up Display).
- the display content generation unit 501 generates display content to be displayed on the windshield as a display destination from the selected content.
- the content is generated in a display mode suitable for superimposed display on a scene that is seen by the user's eyes through the windshield. For example, the size, color, brightness, etc. are determined.
- the display mode is determined in accordance with a user instruction or in accordance with a predetermined rule.
- the superimposing unit 503 determines the display position on the display 522 of the display content generated by the display content generating unit 501. First, the superimposing unit 503 specifies the display position (arrangement position) of the object on the display 522. Based on the display position of the object, the display position of the related content of the object is determined. The display position of the object is calculated using the position and orientation of the image capturing unit 102 detected by the position / orientation detection apparatus 100 and the pixel position of each object on the captured image 220 captured by the image capturing unit 102.
- the superimposing unit 503 of the present embodiment holds in advance the geometric relationship between a reference location of the automobile (in-vehicle reference position) and the position and orientation of the photographing unit 102, and the geometric relationship between the in-vehicle reference position and the average visual field range 531 of the user (driver) 530.
- using these relationships, the position on the display 522 corresponding to the pixel position of the corresponding evaluation object in the captured image is calculated.
- the display position of the related content may be set at the intersection of the line-of-sight direction in which the user 530 looks at the evaluation object 223 and the display 522.
- the photographing unit 102 photographs the evaluation object 223. From the photographing position of the evaluation object 223 in the photographed image, the direction in which the evaluation object 223 exists with respect to the photographing unit 102 is known.
- the relative position of the evaluation object 223 with respect to the photographing unit 102 is obtained.
- a line of sight 551 for viewing the evaluation object 223 from the eye position of the user 530 is obtained.
- a point 552 where the line of sight 551 intersects the display 522 may be a display position of related content.
- the related content may also be displayed offset by a designated amount with respect to the evaluation object 223.
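- A simple geometric sketch of this step, under the assumption that the display 522 can be approximated by a flat plane and that the eye position, object position, and display plane are all expressed in a common in-vehicle coordinate frame (the helper below is illustrative, not the patent's implementation):

```python
import numpy as np

def display_intersection(eye_pos, object_pos, plane_point, plane_normal):
    """Point 552 where the line of sight 551 (eye -> evaluation object 223)
    crosses the display 522, modeled here as an infinite plane.

    All arguments are 3-vectors in a common in-vehicle coordinate frame.
    Returns None if the sight line is parallel to the display plane.
    """
    eye = np.asarray(eye_pos, dtype=float)
    obj = np.asarray(object_pos, dtype=float)
    p0 = np.asarray(plane_point, dtype=float)
    n = np.asarray(plane_normal, dtype=float)

    direction = obj - eye                      # line of sight 551
    denom = np.dot(n, direction)
    if abs(denom) < 1e-9:                      # sight line parallel to the display
        return None
    t = np.dot(n, p0 - eye) / denom
    if t <= 0:                                 # display lies behind the eye
        return None
    return eye + t * direction                 # display position for the content
```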
- like the position and orientation detection device 100, the AR display device 500 is realized with a CPU 141, a memory 142, a storage device 143, an input / output interface (I/F) 144, and a communication I/F 145.
- each function is realized by the CPU 141 loading a program stored in advance in the storage device 143 into the memory 142 and executing it. All or some of the functions may instead be realized by hardware or circuits such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).
- various data used for the processing of each function and various data generated during the processing are stored in the memory 142 or the storage device 143.
- the content holding unit 511 is constructed in the memory 142 or the like, for example.
- FIG. 12 is a processing flow of the AR display processing of the present embodiment.
- the AR display process may be performed in synchronization with the position / orientation detection process performed by the position / orientation detection apparatus 100, or may be configured to be performed independently.
- the instruction receiving unit 506 receives in advance the conditions (display conditions) of the content to be displayed.
- the position / orientation detection apparatus 100 detects the position and orientation of the photographing unit 102 (step S2101). Note that the position / orientation detection apparatus 100 determines the position and orientation of the imaging unit 102 by the same method as in the first embodiment. This process is executed at predetermined time intervals.
- the display content selection unit 502 selects the content to be displayed from the contents held in the content holding unit 511 (step S2102). Here, it is determined whether or not to display each content according to the display condition. Only content that matches the display conditions is displayed. The selection may be performed for each object to be superimposed, for example.
- the display content generation unit 501, the superimposition unit 503, and the display unit 504 repeat the following processing for all selected display contents (step S2103).
- the display content generation unit 501 determines the display mode of the selected content (step S2104).
- the superimposing unit 503 determines the display position of the selected display content (step S2105). At this time, the superimposing unit 503 uses the position and orientation of the imaging unit 102 detected by the position / orientation detection apparatus 100 in step S2101 and the position of each object in the captured image 220.
- the display unit 504 displays the content in the display mode determined by the display content generation unit 501 at the position calculated by the superimposing unit 503 (step S2106). The above processing is repeated for all contents.
- the superimposing unit 503 determines the display position using the latest position and orientation of the photographing unit 102 at that time and the position information of each object in the captured image 220.
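- The loop of FIG. 12 could be sketched as follows; the unit objects and their method names are hypothetical stand-ins for the units described above, intended only to show the order of steps S2101 to S2106:

```python
def ar_display_cycle(pose_detector, selector, generator, superimposer, display, display_conditions):
    """One cycle of the AR display processing of FIG. 12 (steps S2101-S2106).

    pose_detector, selector, generator, superimposer, and display stand for the
    position / orientation detection apparatus 100, the display content selection
    unit 502, the display content generation unit 501, the superimposing unit 503,
    and the display unit 504; their interfaces here are assumptions.
    """
    # S2101: detect the position and orientation of the photographing unit 102
    camera_pose, object_pixel_positions = pose_detector.detect()

    # S2102: keep only the content that matches the display conditions
    selected = selector.select(display_conditions)

    # S2103-S2106: repeat for all selected display contents
    for content in selected:
        mode = generator.decide_display_mode(content)                        # S2104
        pos = superimposer.decide_display_position(content, camera_pose,
                                                   object_pixel_positions)   # S2105
        display.show(content, mode, pos)                                     # S2106
```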
- the AR display device 500 is an AR display device that displays content on the display 522, which has transparency and reflectivity, in association with objects in the scene behind the display 522.
- it includes the position and orientation detection apparatus 100 according to the first embodiment, the display content generation unit 501 that generates the content to be displayed on the display 522, the superimposing unit 503 that determines the display position of the generated content using the position and orientation of the photographing unit 102 determined by the position and orientation detection apparatus 100, and the display unit 504 that displays the generated content at the determined display position.
- the position of the object is specified in the captured image acquired by the imaging unit 102, and the display position of the related content is determined using the specified position. Therefore, if only the amount of deviation between the photographing range 541 by the photographing unit 102 and the user's viewpoint is corrected, the display position of the object on the display can be accurately specified.
- content can be displayed without using the relative position between the object and the AR display device 500.
- the position and orientation of the photographing unit 102 are acquired by the position and orientation detection apparatus 100 of the first embodiment. That is, the value of the coarse detection sensor 103 such as GPS or an electronic compass is not directly used. As a result, high accuracy can be obtained regardless of the accuracy of the coarse detection sensor 103. Therefore, it is possible to provide the AR display device 500 that can superimpose the AR with high accuracy.
- in the above description the display 522 has transparency, but the display 522 may instead be opaque.
- in that case, the content may be synthesized with the captured image 220 captured by the photographing unit 102 and the composite image may be displayed.
- for example, the brightness of the captured image 220 may be reduced and the content superimposed on the resulting dark image.
- the case where the AR display device 500 is mounted on an automobile has been described as an example.
- the usage form of the AR display device 500 is not limited to this.
- it may instead take a form that the user wears.
- a small AR display device 500 can be provided.
- An example of such a configuration is HMD (Head Mounted Display).
- the position / orientation detection apparatus 100 is also mounted on the HMD.
- a position and orientation detection apparatus 100 for detecting the position of the HMD may be provided separately from the HMD.
- in that case, the position / orientation detection apparatus 100 photographs the HMD. Then, treating the HMD as an object in the method of generating two-dimensional map information, the position (latitude and longitude, or coordinates) of the HMD in real space is calculated, and the display position of the content is determined using this position information. Thereby, the content can be displayed at a desired position with higher accuracy.
- since the AR display device 500 can detect its absolute position and orientation even when mounted on a moving body, the content can be displayed at a desired position with high accuracy even though the scene viewed by the user constantly changes.
- the AR display device 500 can calculate the position and orientation of the photographing unit 102 with high accuracy and can superimpose AR with high accuracy. If high-precision AR superimposition becomes possible, the expression accuracy of AR content will increase and the expressive power will be enriched.
- the AR display device 500 does not have to include the photographing unit 102 in itself.
- image data captured by another image capturing device (for example, a navigation device or a drive recorder in the case of an in-vehicle device) may be used instead.
- the AR display device 500 of the present embodiment may further include an eye tracking device 508 as shown in FIG.
- the eye tracking device 508 is a device that tracks the user's line of sight.
- the eye tracking device 508 is used to estimate the user's field of view and display the content in consideration of the direction of the user's line of sight. For example, the content is displayed on the display 522 in an area specified by the user's line-of-sight direction and field of view.
- the superimposing unit 503 of the present embodiment calculates the position for displaying the content from the relative position between the photographing unit 102 and the display 522 and the user's line-of-sight direction and field of view calculated by the eye tracking device 508.
- the user's line-of-sight direction calculated by the eye tracking device 508 is used to correct the position for displaying the content.
- an object in the direction in which the user is facing in real space can be identified from the line-of-sight direction of the user detected by the eye tracking device 508 and the position and orientation of the photographing unit 102.
- This object is an object that the user is watching. By using this, for example, it is possible to select and display content related to the object being watched by the user.
- the output of the eye tracking device 508 and the detection result of the object detection unit 105 are input to the display content selection unit 502.
- the display content selection unit 502 uses the user's line-of-sight direction and the pixel position of each object to identify the object that the user is watching, and selects content related to that object.
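- One hedged sketch of this selection: assuming the gaze direction has already been mapped into the coordinates of the captured image 220, the gazed object can be taken as the detected object nearest to the gaze point (the threshold and helper names are assumptions):

```python
import math

def find_gazed_object(gaze_pixel, object_pixel_positions, max_dist_px=50.0):
    """Pick the detected object whose pixel position is closest to the point the
    user is looking at, both expressed in captured-image coordinates.

    gaze_pixel:             (u, v) gaze point mapped into the captured image 220
    object_pixel_positions: dict {object_id: (u, v)} from the object detection unit 105
    max_dist_px:            tolerance (an assumed threshold, not given in the text)
    """
    best_id, best_dist = None, max_dist_px
    for obj_id, (u, v) in object_pixel_positions.items():
        d = math.hypot(u - gaze_pixel[0], v - gaze_pixel[1])
        if d < best_dist:
            best_id, best_dist = obj_id, d
    return best_id  # None if nothing is within the tolerance
```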
- FIG. 14 (a) shows an example of the display position.
- a building 711, a sign 712, and a guide plate 713 are illustrated as an example of an object (real object) that exists in real space. These real objects have known positions and appearances.
- the superimposing unit 503 determines the display position so that the related contents 811, 812, and 813 are displayed on the display 522 based on the pixel position of the evaluation object corresponding to each real object.
- FIG. 14A shows an example of superimposing display on a real object.
- the display content generation unit 501 and the superimposition unit 503 may classify the display content according to the meta information, and determine the display mode and the display position according to the classification result. For example, the display mode and the display position may be determined based on the update frequency, the required timing, the importance level, and the like. A display example in this case is shown in FIG.
- for example, the display mode and the display position may be determined so that content is displayed in a distant display area 731, such as the sky, or in a near-side display area 732 where the bonnet is visible.
- alternatively, the display mode and the display position are determined so that content is displayed in a road display area 733, which is a display area along the road.
- the display mode and the display position are determined so as to be displayed in the side display area 734 positioned on the side with respect to the user's traveling direction.
- the display mode and the display position may also be determined so that content is displayed in an aerial display area 735 in which no traffic signals or signs are visible.
- the display mode and the display position may be determined so that the content is displayed in the display area (front display area) 736 in front of the windshield.
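- A possible sketch of such a classification rule; the meta-information keys and the mapping below are illustrative assumptions, since the patent only names update frequency, required timing, and importance as examples of criteria:

```python
# Assumed meta-information keys; the mapping rules are illustrative only.
AREA_FAR, AREA_NEAR, AREA_ROAD, AREA_SIDE, AREA_AERIAL, AREA_FRONT = (
    "distant_731", "near_732", "road_733", "side_734", "aerial_735", "front_736")

def choose_display_area(meta: dict) -> str:
    """Map content meta information to one of the display areas 731-736."""
    if meta.get("importance") == "high":
        return AREA_FRONT                       # front display area 736
    if meta.get("needs_tracking"):
        return AREA_SIDE                        # side display area 734
    if meta.get("category") == "road_info":
        return AREA_ROAD                        # road display area 733
    if meta.get("update_rate", "low") == "low":
        return AREA_FAR                         # static content -> small change region
    return AREA_AERIAL                          # default: aerial display area 735
```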
- the far display area 731 and the near display area 732 are hardly affected by movement, and the scene there hardly changes. That is, they are regions (small change regions) in which the appearance of the real space changes little as the automobile moves. In these regions, the processing by the superimposing unit 503 for calculating the display position according to the real space can be reduced. Therefore, if static content is displayed in a small change region, the AR superimposition processing can be reduced efficiently without giving the user a sense of incongruity. Thereby, the processing load of the AR display device 500 can be reduced.
- the side display area 734 remains in view even when an object flows past and away from the front as the vehicle moves. Therefore, by displaying information that needs to be tracked in the side display area 734, the user can keep track of it even after the automobile has moved. Also, by displaying important information in the aerial display area 735, high readability can be obtained without impairing the visibility of the real-space scene. In addition, by displaying content in the front display area 736, the user can view it without greatly moving the line of sight away from the front.
- AR overlay display with high visibility and readability for the user is possible. Moreover, this AR superimposed display can be realized without any instruction from the user.
- the classification of display content is not limited to that based on meta information. For example, it may be performed according to a predetermined time for maintaining the display of the content.
- the operation mode of the moving body may be determined, and the display mode and / or display position may be determined according to the determination result. For example, content is displayed in the front display area 736 only in the automatic operation mode.
- the display content generation unit 501 and / or the superimposition unit 503 are configured to receive a signal indicating whether or not the automatic operation mode is set, for example, from the ECU or the like of the moving body.
- the automatic operation mode of the moving object is a mode in which the user does not need to actively operate and the moving object automatically operates. At this time, the user does not need to pay attention to the actual surrounding scene. However, during actual driving, it may be necessary for the user to actively drive depending on the surrounding traffic environment, time, place, etc., and there are cases where attention must be paid to the actual surrounding scene.
- if content were displayed somewhere other than straight ahead, the user's line of sight would turn away from the front. With this configuration, the content is displayed in front of the user, so it can be viewed without removing the line of sight from the front. Even when the content is viewed during automatic driving, the user's line of sight remains forward. For this reason, even when the automatic driving mode is canceled and attention must be returned to the actual scene, attention can be drawn forward smoothly.
- the display mode and display position of the content displayed on the display 522 may also be changed depending on the level of attention required of the user (alert level).
- the display content generation unit 501 and/or the superimposition unit 503 receive signals such as the user's consciousness level, fatigue level, and driving state from the moving body or from sensors attached to the user. They may further receive information such as the surrounding traffic conditions grasped by the navigation system mounted in the automobile or by the vehicle itself.
- the display content generation unit 501 and / or the superimposition unit 503 combine these to determine the alert level.
- when little attention is required, the displayed content is not restricted. As the alert level increases and more attention is required, the display mode and/or the display position are determined so that the displayed content is simplified.
- the amount of information in the content decreases as more attention becomes necessary, but its readability increases. Since the need for gazing at the content decreases, the user can pay attention to other things while still obtaining information from the content.
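- As an illustrative sketch (the weighting and thresholds are assumptions; the patent only states that the signals are combined into an alert level and that the displayed content is simplified accordingly):

```python
def determine_alert_level(consciousness, fatigue, traffic_density):
    """Combine driver-state and traffic signals (each normalized to [0, 1]) into a
    single alert level 0-2. The weighting is an assumption."""
    score = (1.0 - consciousness) + fatigue + traffic_density
    if score > 2.0:
        return 2      # high alert: strongest simplification
    if score > 1.0:
        return 1
    return 0          # low alert: no restriction on the displayed content

def simplify_content(content: dict, alert_level: int) -> dict:
    """Reduce the amount of displayed information as the required attention increases."""
    if alert_level == 0:
        return content                                          # full content
    if alert_level == 1:
        return {"title": content["title"], "icon": content.get("icon")}
    return {"icon": content.get("icon")}                        # icon only
```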
- the retreat area 737 is an area that does not interfere with driving, for example, a peripheral part of the windshield away from the area directly in front of the driver.
- the retreat area 737 may be set smaller than the original display area.
- the display in the retreat area 737 is performed, for example, when the automatic operation mode is switched to the normal operation mode.
- the image may be continuously changed when retracted or reduced.
- the display content generation unit 501 and/or the superimposition unit 503 may be configured to detect in advance, using a navigation system or the like, a situation such as switching to the normal operation mode, and to give advance notice of the retraction.
- the advance notice is performed, for example, by displaying a warning text in the warning area 743.
- instead of displaying warning text, a change in color tone, a countdown display, or an audio warning may be used.
- the content display may be completely canceled when the alert level reaches a predetermined level or when a predetermined condition is satisfied.
- the case where the predetermined condition is satisfied is, for example, a case where the automatic operation mode is canceled.
- in that case, the display content generation unit 501 and/or the superimposition unit 503 increase the transparency of the displayed content, so that it appears to the user to fade out and only the actual scene in front remains visible. In addition, before the transparency is raised, an advance notice may be displayed under the control of the display content generation unit 501 and/or the superimposition unit 503.
- the display restriction may be canceled as shown in FIG.
- the case where the predetermined condition in this case is satisfied is, for example, a case where the automatic operation mode is set.
- in the automatic operation mode, for example, the display content generation unit 501 and/or the superimposition unit 503 cancel the display restriction, enlarge the front display area 736, and determine the display mode and the display position so that content is displayed there.
- meta information may be used so that entertainment content that requires close viewing, such as documents or video, is displayed on a large screen.
- the display area change control may be performed by detecting the driving location, traffic conditions, and user conditions.
- the user's driving skill level may be added to the criteria for this control.
- the content display area and the display method can be changed according to the environmental condition, the driving condition, and the like.
- since the AR display device 500 of the present modification can switch the content display in this way, both attention to the road ahead and readability can be achieved.
- FIG. 16A is a diagram for explaining a method of associating a real object with content.
- FIG. 16A illustrates, for example, a case where the image is displayed in the display area 741 facing the user.
- a ribbon-like pull-out effect 751, in which the content appears to be drawn out from the real object 714, is displayed.
- the superimposing unit 503 first obtains the coordinates of the display position of the real object 714 on the display 522 from the direction of the user's line of sight and the relative position of the AR display device 500 and the real object 714. Then, a ribbon-like image is generated so as to connect the display position of the real object 714 and the display coordinates of the display area 741.
- the pull-out effect 751 may instead be a string-like line.
- the pull-out effect 751 may be translucent. When it is translucent, the association can be made clear without obscuring the actual scene.
- the string may be tied around the real object 714 and the display area 741 so as not to disturb the field of view.
- if the content to be displayed is limited to information that the user is interested in, a decrease in the readability of other AR content can be suppressed.
- the user's line-of-sight direction detected by the eye tracking device 508 is used.
- the degree of interest of the user is specified based on the degree of coincidence between the viewing direction of the user and the display position of the real object.
- the degree of interest of the user may be determined by an active selection operation by the user. In this case, the intention of the user can be reflected.
- this method may be used to accumulate user selection results and use them for other processing.
- a reference marker 761 may be used as another method for associating a real object with content.
- the display content generation unit 501 and / or the superimposition unit 503 superimpose the reference marker 761 on the real object 714. Then, the content related to the real object 714 is displayed in an arbitrary information display area 742. At this time, the reference marker 761 is also displayed in the information display area 742.
- the configuration in which AR content related to an object is displayed using the above-described pull-out effect 751 is particularly useful when the object itself carries information.
- suppose, for example, that the real object 715 is a signboard: the signboard itself carries information, and the associated content provides further information.
- in this case, the display content generation unit 501 and/or the superimposition unit 503 determine the display mode and the display position so that the content related to the real object 715 is displayed in an information display area 742 set at an arbitrary position, and display the pull-out effect 751 between the real object 715 and the information display area 742. The user can thereby obtain more information than by viewing the signboard alone.
- in FIG. 17A, an image obtained by photographing the guide plate 713 is displayed in the display area 744 as content related to the guide plate 713.
- the display area 744 may be at an arbitrary position, but the size is larger than that of the guide plate 713. This makes it easier for the user to grasp the information on the guide plate 713.
- a reference marker 761 may be displayed on both.
- the object detection unit 105 detects the guide plate 713 from the captured image captured by the imaging unit 102. Then, the display content selection unit 502 selects an image of the guide board 713 as the display content. In addition, the display content generation unit 501 and / or the superposition unit 503 determine a display mode and a display position so as to realize the display.
- the AR content to be displayed may be hierarchized. As shown in FIG. 17B, a group of related contents 821 to 825 are displayed in the same display area 745. Here, a case where five contents are displayed is illustrated. However, the number of contents displayed in one display area 745 is arbitrary.
- a set of contents to be displayed is selected and generated by the display content selection unit 502.
- the display content selection unit 502 generates a set of contents using, for example, meta information.
- Each content includes a company name, an icon indicating the company, a product description, a price display, a reference URL, and the like.
- the display content selection unit 502, the display content generation unit 501, and the superimposition unit 503 determine the selection timing, the display mode, and the display position so that these contents are displayed according to the user's selection and timing.
- the content display is often updated in real time. For this reason, if the amount of information to be displayed at a time is large, the user's understanding may be hindered. In order to avoid this, the contents to be presented to the user at once are simplified and sequentially displayed. In such a case, this hierarchical set of contents is used. Thereby, the information included in the content can be provided with good readability.
- the content included in the content set may be tailored as a story, and the story may be advanced in time series.
- content with mascots may be displayed.
- the content with a mascot includes a content 831 and a mascot 841.
- a popular mascot 841 is added and displayed.
- the eye tracking device 508 determines whether or not the user's line of sight has remained in the display area 746 of the content 831 for a predetermined period.
- if so, the display content generation unit 501 and the superimposition unit 503 display the mascot 841 in the display area 746. In this way, the mascot 841 is displayed only when the user reads the content 831, which effectively attracts the user's interest to the content 831. As a result, the viewing rate of the content 831 increases, and high-value-added content display can be realized with good readability for the user.
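- A minimal sketch of this gaze-dwell trigger, assuming the eye tracking device 508 delivers a gaze point in display coordinates and that the dwell threshold is a configurable value (both assumptions):

```python
import time

class GazeDwellTrigger:
    """Show the mascot 841 only after the user's gaze has stayed inside the
    display area 746 of the content 831 for a predetermined period."""

    def __init__(self, dwell_seconds=2.0):
        self.dwell_seconds = dwell_seconds      # assumed threshold
        self._enter_time = None

    def update(self, gaze_point, area_rect):
        """gaze_point: (x, y) on the display; area_rect: (x0, y0, x1, y1).
        Returns True while the mascot should be shown."""
        x, y = gaze_point
        x0, y0, x1, y1 = area_rect
        inside = x0 <= x <= x1 and y0 <= y <= y1
        now = time.monotonic()
        if not inside:
            self._enter_time = None             # gaze left the area: reset
            return False
        if self._enter_time is None:
            self._enter_time = now
        return (now - self._enter_time) >= self.dwell_seconds
```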
- the display content may be transmitted to the outside and stored.
- the transmission destination may be an information processing terminal 910 as shown in FIG. It may also be another storage device connected to the network. Transmission is performed via the gateway 110. If the information is transmitted to and stored in another storage device, the storage capacity of the information processing terminal 910 is not strained. With this configuration, the displayed information can be reused; the displayed content can be utilized later, which increases convenience for the user.
- the display content selection unit 502 transmits the designated content from among the selected content.
- the designation may be received via the above-mentioned motion recognition device.
- an identifier for designating the content is given by, for example, the display content selection unit 502 or the display content generation unit 501.
- the display content selection unit 502 or the display content generation unit 501 adds an identifier by using meta information, sequentially assigning predetermined characters and numbers, or the like.
- the content may be one that decorates an object, and the decoration may be something with high entertainment value.
- Fig. 10 shows a display example in this case.
- contents 816 and 817 are respectively displayed at positions corresponding to a real building 716 and a car 717, and these are decorated.
- this can liven up the atmosphere of the drive.
- the display content selection unit 502 selects the contents 816 and 817 to be displayed.
- in the above description, the AR display device 500 is mounted on a moving body such as an automobile, but it is not limited to this. It may be mounted on an HMD and used by a pedestrian. Alternatively, a normal screen or the like may be used as the display 522, with content superimposed on other video, for indoor use.
- the present invention is not limited to the above-described embodiments, and includes various modifications.
- the above-described embodiments have been described in detail for easy understanding of the present invention, and are not necessarily limited to those having all the configurations described.
- a part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment.
- SYMBOLS 100 Position and orientation detection apparatus, 101: Control part, 102: Imaging
Landscapes
- Engineering & Computer Science (AREA)
- Radar, Positioning & Navigation (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Remote Sensing (AREA)
- Computer Graphics (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Automation & Control Theory (AREA)
- Studio Devices (AREA)
- Processing Or Creating Images (AREA)
Abstract
The purpose of the present invention is to detect with a simple configuration the position and posture of a user with high accuracy from a small amount of information. Provided is a position and posture detection device comprising: an image-capture unit 102 for image-capturing a predetermined image-capture range including two or more objects; an object detection unit 105 for specifying a pixel position of each of the objects in the image that is image-captured by the image-capture unit 102; an orientation calculation unit 106 for calculating an orientation of each of the objects with respect to the image-capture unit 102 by using the pixel position, map information on each of the objects, and a focal distance of the image-capture unit; and a position and posture calculation unit 107 for calculating the position and posture of the image-capture unit 102 by using the orientation of each of the objects and the map information. The object detection unit 105 extracts two-dimensional map information corresponding to the image-capture range from two-dimensional map information, in which information on positions and shapes of a plurality of objects within a predetermined region are stored, and specifies a pixel position by using the two-dimensional map information.
Description
The present invention relates to a position / orientation detection technique and an AR (Augmented Reality) display technique. In particular, the present invention relates to a position / orientation detection technique and an AR display technique of an apparatus including an imaging unit.
There is Patent Document 1 as background art in this technical field. Japanese Patent Laid-Open No. 2004-133867 describes that “when a user designates a three-dimensional object with a cursor on a three-dimensional map image, an actual building corresponding to the three-dimensional object is captured from the image taken by the camera at that time. The extracted image portion is extracted as texture image data and registered as texture image data of the three-dimensional object.After that, the rendering processing unit texture-maps the registered texture image data as the surface texture. A navigation device is disclosed that is drawn on a three-dimensional map image (summary excerpt).
Moreover, there is Patent Document 2 as another background art. Japanese Patent Laid-Open No. 2004-151867 discloses that “the image feature that stores the image feature Fa to be recognized, the image feature detection unit that detects the image feature Fb from the preview image to be recognized, and the recognition based on the matching result of the image features Fa and Fb”. A posture estimation unit that estimates an initial posture of a target, a tracking point selection unit that selects a tracking point Fe from an image feature Fa based on the initial posture, and a template that generates a template image of a recognition target based on the estimation result of the initial posture A generating unit; a matching unit that matches the template image and the preview image with respect to the tracking point Fe; and a posture tracking unit that tracks the posture of the recognition target in the preview image based on the tracking point Fe that has been successfully matched. (Summary Extract) "is disclosed.
AR display is a technique for superimposing and displaying information such as images and data related to a real scene (real scene) viewed by a user. In order to superimpose and display related information in a real scene without deviation, it is necessary to specify the position on the user side and the direction (line-of-sight direction) with high accuracy.
In the technology disclosed in Patent Document 1, the user designates the object in the actual scene on which related information is to be superimposed, which takes time and effort. In addition, a three-dimensional map is displayed by pasting an image of the scene around the vehicle onto a three-dimensional model as a texture, so a three-dimensional map and a three-dimensional model corresponding to the scene around the vehicle are essential. Since image processing handles a three-dimensional map and a three-dimensional model corresponding to the real space, the amount of information to be processed is large.
The technique disclosed in Patent Document 2 tracks the posture change of a recognition target using images. Although the posture change of the recognition target, that is, the object, can be detected, the position on the user side cannot be detected.
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a position detection technique for detecting the position and orientation on the user side with high accuracy from a small amount of information with a simple configuration. Problems, configurations, and effects other than those described above will be clarified by the following description of embodiments.
The present invention provides a position and orientation detection apparatus comprising: a photographing unit that photographs a predetermined photographing range including two or more objects; an object detection unit that identifies the pixel position of each object in the image photographed by the photographing unit; a direction calculation unit that calculates an object direction, that is, the direction of each object with respect to the photographing unit, using the pixel position, map information of each object, and the focal length of the photographing unit; and a position and orientation calculation unit that calculates the position and orientation of the photographing unit using the object direction of each object and the map information, wherein the object detection unit extracts, from two-dimensional map information storing the positions and shapes of a plurality of objects within a predetermined area, the two-dimensional map information corresponding to the photographing range, and identifies the pixel position using the extracted two-dimensional map information.
There is also provided an AR display device that displays content on a display having transparency and reflectivity in association with an object in the scene behind the display, the AR display device comprising: the above position and orientation detection device; a display content generation unit that generates the content to be displayed on the display; a superimposing unit that determines the display position of the generated content on the display using the position and orientation of the photographing unit determined by the position and orientation detection device and the pixel position of the object identified by the object detection unit; and a display unit that displays the generated content at the display position on the display determined by the superimposing unit.
According to the present invention, the position and orientation on the user side can be detected with high accuracy from a small amount of information with a simple configuration.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. Hereinafter, in this specification, those having the same function are denoted by the same reference numerals unless otherwise specified, and repeated description is omitted. The present invention is not limited to the embodiment described here.
<< First Embodiment >>
The first embodiment is a position / orientation detection apparatus including an imaging unit. The position / orientation detection apparatus according to the present embodiment uses the image acquired by the imaging unit to detect the position and orientation of itself, including the imaging unit.
First, the functional configuration of the position / orientation detection apparatus 100 of this embodiment will be described. FIG. 1 is a functional block diagram of the position / orientation detection apparatus 100 of the present embodiment. As shown in the figure, the position / orientation detection apparatus 100 according to the present embodiment includes a control unit 101, a photographing unit 102, a rough detection sensor 103, an evaluation object extraction unit 104, an object detection unit 105, a direction calculation unit 106, a position / orientation calculation unit 107, a map management unit 108, and a gateway 110.
The position / orientation detection apparatus 100 of the present embodiment further includes a captured image holding unit 121 that temporarily holds information, a positioning information holding unit 122, a two-dimensional map information holding unit 123, and a map information holding unit 124.
The control unit 101 monitors the operation status of each part of the position / orientation detection apparatus 100 and controls the entire position / orientation detection apparatus 100. It is composed of, for example, a circuit or the like, or it may be realized by a CPU executing a program held in advance, as will be described later.
The photographing unit 102 photographs a scene in the real space (actual scene) and acquires a photographed image. The captured image is stored in the captured image holding unit 121. In the present embodiment, a shooting range including at least two or more objects is shot.
The configuration of the photographing unit 102 is shown in FIG. 2(a). As shown in the figure, the photographing unit 102 includes a lens 131, which is an image forming optical system, and an image sensor 132 that converts the formed image into an electric signal. The image sensor 132 is a CMOS (complementary metal-oxide-semiconductor) image sensor, a CCD (Charge-Coupled Device) image sensor, or the like. Reference numeral 133 denotes the optical axis of the lens.
The photographed image holding unit 121 may hold a plurality of photographed images as necessary. This is to extract temporal changes and to select a captured image with a good shooting state. In order to provide versatility, data obtained by performing various types of image processing on the acquired data may be stored as a captured image instead of the data itself acquired by the imaging unit 102. The image processing performed here is, for example, removal of lens distortion, adjustment of color and brightness, and the like.
The coarse detection sensor 103 detects the position and orientation in the real space of the position / orientation detection apparatus 100 including the imaging unit 102 with coarse accuracy, and stores the detected position and orientation in the positioning information holding unit 122 as positioning information.
For detecting the position, for example, GPS (Global Positioning System) is used. When using GPS, the coarse detection sensor 103 is a GPS receiver.
For example, an electronic compass is used to detect the posture. The electronic compass is composed of a combination of two or more magnetic sensors. A magnetic sensor supporting three axes enables direction detection in three-dimensional space. If the measurement plane is limited to the horizontal plane, a biaxial magnetic sensor may be used to reduce the cost of the electronic compass.
Note that the positioning information held by the positioning information holding unit 122 is not limited to the positioning information obtained by the coarse detection sensor 103. For example, it may be positioning information obtained by a position / orientation calculation unit 107, which will be described later, or both. The positioning information is used for extraction of two-dimensional map information and map information described later, calculation of position and orientation, and the like.
The map management unit 108 acquires two-dimensional map information and map information in a range necessary and sufficient for processing by each unit of the position / orientation detection apparatus 100, including the visual field range of the photographing unit 102, and stores them in the two-dimensional map information holding unit 123 and the map information holding unit 124, respectively.
The acquisition is performed via the gateway 110 via a network or the like from a server or storage device that holds such information, for example. The acquisition range is determined based on the positioning information.
The 2D map information held in the 2D map information holding unit 123 is object information of each object included in a predetermined area. The object information includes the position (position in the map), shape (appearance), feature point, and the like of each object in the area.
2D map information is created from images taken with a camera, for example. At the time of shooting, position information (map absolute position) of the shooting range is simultaneously acquired as information for specifying a predetermined area. Then, the photographed image is analyzed, and the object and its feature point are extracted. Then, the pixel position in the image of the extracted object is specified and set as the map position. The appearance shape of the object is acquired by using, for example, Google Street View.
The original shooting range is treated as one two-dimensional map, and for each two-dimensional map, the map absolute position of the shooting range is associated with the in-map position, shape, and feature points of each object in that area to form the two-dimensional map information. Since each piece of two-dimensional map information has a map absolute position, the image can be deformed in accordance with the optical characteristics of the photographing unit 102.
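A minimal data-structure sketch of this two-dimensional map information; the class and field names are illustrative, not taken from the patent:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class RegisteredObject:
    """One registered object in a two-dimensional map."""
    object_id: str
    map_position: Tuple[int, int]        # pixel position within the source image
    shape: object                        # appearance template, e.g. an image patch
    feature_points: List[Tuple[int, int]] = field(default_factory=list)
    attributes: dict = field(default_factory=dict)   # e.g. {"type": "building"}

@dataclass
class TwoDimensionalMap:
    """Two-dimensional map information for one shooting range."""
    map_absolute_position: Tuple[float, float]       # e.g. latitude, longitude
    objects: List[RegisteredObject] = field(default_factory=list)
```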
Hereinafter, an object registered in the two-dimensional map information is referred to as a registered object. Note that the two-dimensional map information may further include attribute information for each registered object. The attribute information includes, for example, the type of registered object. The type of registered object is, for example, a building, a road, a signboard, or the like.
Note that the two-dimensional map information may be acquired by analyzing an image obtained by removing distortion based on information related to the camera and modulating the color and brightness of the photographed image.
In the map information held in the map information holding unit 124, the coordinates or address of each object in real space are registered. As the map information, for example, Google Maps can be used. To reduce the amount of data, it is desirable to use a list that associates each object with its coordinates. For the coordinates, for example, latitude and longitude are used, and height information may also be included; in that case, three-dimensional position measurement becomes possible. A local coordinate system based on an arbitrary existing location may also be used; in that case, the data may be structured to increase position measurement accuracy within a limited area.
The evaluation object extraction unit 104 extracts object candidates to be processed (hereinafter referred to as evaluation object candidates) from each registered object registered in the two-dimensional map information. For example, all registered objects in the two-dimensional map information may be set as evaluation object candidates. When object attribute information is stored, a registered object whose attribute information matches a predetermined condition may be extracted as an evaluation object candidate. For example, only building objects are extracted.
As the two-dimensional map information, the information held in the two-dimensional map information holding unit 123 is used. The two-dimensional map information holding unit 123 holds only the minimum necessary group of two-dimensional maps. However, at the time of extraction, the evaluation object extraction unit 104 may use the positioning information to further narrow down the two-dimensional map information from which evaluation object candidates are extracted. For example, the visual field range of the photographing unit 102 in real space is calculated using the positioning information, and only the two-dimensional map information that matches that visual field range is scanned to extract evaluation object candidates.
The object detection unit 105 identifies the position (pixel position) of each extracted evaluation object candidate in the captured image. An evaluation object candidate whose position is specified in the captured image is set as an evaluation object. Here, the positions of at least two evaluation objects are specified. The position in the captured image is specified by pattern matching. A template image used for pattern matching is created using shape information of two-dimensional map information.
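The patent does not prescribe a particular matching algorithm; as one hedged example, normalized cross-correlation template matching (here via OpenCV) could locate an evaluation object candidate in the captured image:

```python
import cv2

def locate_evaluation_object(captured_gray, template_gray, threshold=0.7):
    """Find the pixel position of one evaluation object candidate in the captured
    image by template matching; the template is rendered from the shape
    information in the two-dimensional map.

    Returns (x, y) of the best match, or None if the score is below the threshold
    (the threshold value is an assumption).
    """
    result = cv2.matchTemplate(captured_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return None
    h, w = template_gray.shape[:2]
    return (max_loc[0] + w // 2, max_loc[1] + h // 2)   # center of the matched region
```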
Also, the object detection unit 105 calculates the distance in the horizontal direction (horizontal distance) from the photographed image origin of each evaluation object using the specified pixel position.
The direction calculation unit 106 calculates the direction (object direction) of each evaluation object. As the object direction, for example, an angle from the direction of the optical axis 133 of the lens 131 of the photographing unit 102 is obtained. The angle is calculated using the pixel position calculated by the object detection unit 105, the horizontal distance, the map information of the evaluation object, and the focal length of the lens 131 of the photographing unit 102.
At this time, the direction calculation unit 106 may correct the angle error due to distortion of the lens 131 and calculate the direction. The angle error due to distortion is calculated from the relationship between the angle of view of the lens 131 and the image height. The relationship between the angle of view and the image height necessary for the calculation is acquired in advance.
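Under a standard pinhole-camera approximation (ignoring the lens distortion that the direction calculation unit 106 may additionally correct), the object direction could be computed as sketched below; the parameter names are illustrative:

```python
import math

def object_direction_deg(pixel_x, image_center_x, pixel_pitch_mm, focal_length_mm):
    """Horizontal angle of an evaluation object measured from the optical axis 133.

    pixel_x:         horizontal pixel position of the evaluation object
    image_center_x:  pixel position where the optical axis meets the image sensor
    pixel_pitch_mm:  physical size of one pixel on the image sensor 132
    focal_length_mm: focal length of the lens 131
    """
    offset_mm = (pixel_x - image_center_x) * pixel_pitch_mm   # horizontal distance on the sensor
    return math.degrees(math.atan2(offset_mm, focal_length_mm))
```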
The position / orientation calculation unit 107 calculates the position and orientation of the photographing unit 102 using the object direction of each evaluation object calculated by the direction calculation unit 106 and the map information. The calculated position is expressed as coordinates in the same coordinate system as the map information, and the calculated posture is the direction of the optical axis 133. The posture can be represented by an azimuth and an elevation angle, or by pitch, yaw, and roll angles. Positioning information may also be used for calculating the position and posture.
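One possible numerical formulation of this step, not prescribed by the patent: treat it as a small bearing-only resection and refine the pose starting from the coarse measurement. A weak prior toward the initial pose keeps the problem well-posed when only two evaluation objects are visible:

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_pose_2d(object_xy, measured_angles_rad, initial_pose, prior_weight=1e-3):
    """Estimate the 2D position (x, y) and heading theta of the photographing unit 102
    from the object directions (angles measured from the optical axis 133) and the
    map coordinates of the evaluation objects.

    object_xy:           (N, 2) array of map coordinates of N >= 2 evaluation objects
    measured_angles_rad: (N,) array of object directions relative to the optical axis
    initial_pose:        (x0, y0, theta0), e.g. from the coarse detection sensor 103
    prior_weight:        weight of a weak prior toward the coarse measurement (assumption)
    """
    object_xy = np.asarray(object_xy, dtype=float)
    measured = np.asarray(measured_angles_rad, dtype=float)
    x0 = np.asarray(initial_pose, dtype=float)

    def residuals(pose):
        x, y, theta = pose
        bearings = np.arctan2(object_xy[:, 1] - y, object_xy[:, 0] - x)  # absolute bearings
        diff = (bearings - theta) - measured
        ang_err = np.arctan2(np.sin(diff), np.cos(diff))                 # wrap to (-pi, pi]
        return np.concatenate([ang_err, prior_weight * (pose - x0)])

    return least_squares(residuals, x0=x0).x   # refined (x, y, theta)
```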
The gateway 110 is a communication interface. The position / orientation detection apparatus 100 transmits / receives data to / from, for example, a server connected to a network via the gateway 110. In the present embodiment, for example, two-dimensional map information and map information are downloaded from a server connected to the Internet, or the generated two-dimensional map information is uploaded to the server as will be described later.
Here, FIG. 3(a) shows an example of the configuration of the server (map server) system 620 from which the two-dimensional map information is acquired. As shown in the figure, the map server system 620 includes a map server 621 that controls operations, a map information storage unit 622 that stores map information, a two-dimensional map information storage unit 623 that stores two-dimensional map information, and a communication I/F 625.
The map server 621 receives a request from the map management unit 108, and transmits map information and two-dimensional map information in the requested range from each storage unit to the request source.
Note that, for example, information on the actual position, posture, and shape of each object 624 included in the two-dimensional map information may be held independently. In this case, each object 624 is held in association with the two-dimensional map information including the object 624.
The map server 621 may, as necessary, also have functions for extracting feature points of the objects 624, analyzing the two-dimensional map information 210 transmitted from the position / orientation detection apparatus 100 and extracting objects from it, and registering the two-dimensional map information 210.
なお、地図情報および二次元マップ情報を管理するマップサーバシステム620は、1つのシステムに限定されない。これらの情報は、ネットワーク上の複数のサーバシステムに分割して管理されていてもよい。
Note that the map server system 620 that manages the map information and the two-dimensional map information is not limited to one system. Such information may be divided and managed in a plurality of server systems on the network.
As shown in FIG. 2(b), the position / orientation detection apparatus 100 of the present embodiment is realized by an information processing apparatus including a CPU 141, a memory 142, a storage device 143, an input / output interface (I/F) 144, and a communication I/F 145. For example, each function is realized by the CPU 141 loading a program held in advance in the storage device 143 into the memory 142 and executing it. All or part of the functions may instead be realized by hardware or circuits such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).
Various data used for the processing of each function and various data generated during the processing are stored in the memory 142 or the storage device 143. The captured image holding unit 121, the positioning information holding unit 122, the two-dimensional map information holding unit 123, and the map information holding unit 124 are constructed, for example, in the memory 142 of the position / orientation detection apparatus 100.
Each holding unit may be realized by a separately provided memory 142, or some or all of them may be integrated into a single memory 142, partitioned according to the required capacity and speed. The two-dimensional map information is used in particular in the processing for extracting objects, so the two-dimensional map information holding unit 123 that holds it is preferably constructed in a memory area that can be accessed at relatively high speed.
Next, the details and flow of the position / orientation detection processing for the photographing unit 102 performed by each unit of the position / orientation detection apparatus 100 will be described. FIGS. 4(a) to 4(f) are diagrams for explaining the position detection processing of the present embodiment, and FIG. 5 shows the processing flow.
In this embodiment, initial processing, photographing processing, object detection processing, direction calculation processing, and position / orientation calculation processing are performed in this order.
In the initial processing, the rough detection sensor 103 acquires positioning information as the approximate position and orientation of the photographing unit 102, and the two-dimensional map information and map information used for the processing are acquired.
Specifically, the rough detection sensor 103 first acquires the rough position and rough orientation of the position / orientation detection apparatus 100 as positioning information (step S1101). The acquired positioning information is stored in the positioning information holding unit 122.
Next, the map management unit 108 acquires the two-dimensional map information 210 and the map information 230 necessary for the processing and registers them in the two-dimensional map information holding unit 123 and the map information holding unit 124, respectively (step S1102). Here, the map management unit 108 calculates the coordinates of the field-of-view range of the photographing unit 102 using the positioning information, acquires the two-dimensional map information whose map absolute position corresponds to the calculated coordinates, and also acquires the map information corresponding to the calculated coordinates.
Next, in the photographing processing, the photographing unit 102 photographs an actual scene including two or more objects and acquires a captured image 220 (step S1103).
Object detection processing is then performed. Here, objects are detected by specifying the pixel positions of the evaluation objects in the captured image 220.
Specifically, the evaluation object extraction unit 104 accesses the two-dimensional map information holding unit 123 and acquires the two-dimensional map information 210. Then, as shown in FIG. 4(a), it extracts the registered objects in the acquired two-dimensional map information 210 (step S1104) and sets them as evaluation object candidates 211, 212, and 213.
At this time, all registered objects may be extracted as evaluation object candidates, or evaluation object candidates may be selected and extracted from all registered objects according to a predetermined condition. FIG. 4(a) illustrates a case where three evaluation object candidates 211, 212, and 213 are extracted.
Next, the object detection unit 105 generates template images for each of the extracted evaluation object candidates 211, 212, and 213 (step S1106). It then scans the captured image 220 with the generated template images and identifies evaluation objects in the captured image 220 (step S1107). The object detection unit 105 performs this identification for each of the extracted evaluation object candidates 211, 212, and 213, and repeats it until at least two evaluation objects are detected (step S1105).
FIG. 4(b) shows an example in which two evaluation objects 221 and 222 are detected in the captured image 220. Here, it is assumed that the evaluation object 221 is detected as corresponding to the evaluation object candidate 211 and the evaluation object 222 as corresponding to the evaluation object candidate 212, and that no evaluation object corresponding to the evaluation object candidate 213 is detected. In the present embodiment it is sufficient to detect two evaluation objects, so the detection processing ends once two evaluation objects have been detected.
When the object detection unit 105 detects the evaluation objects 221 and 222 in the captured image 220, it calculates the horizontal distances PdA and PdB of the evaluation objects 221 and 222 from the origin of the captured image 220, respectively (step S1108).
The horizontal distances PdA and PdB are expressed as numbers of pixels on the image sensor 132. The origin of the captured image 220, indicated by the black dot in FIG. 4(b), is the point of the image sensor 132 that coincides with the direction in which the photographing unit 102 faces, that is, with the center of the optical axis 133 of the lens 131. The horizontal distance is the horizontal distance between a representative point in each of the evaluation objects 221 and 222 and the origin of the captured image. The representative point is, for example, the center of the rectangle constituting the evaluation object 221 or 222; the midpoint of the shape associated with each evaluation object may also be used. Details such as how the representative point is determined are described later.
Then, the object detection unit 105 refers to the map information 230 of FIG. 4(c) and acquires the map information of each of the evaluation objects 221 and 222 (step S1109). Here, the object detection unit 105 associates the objects (real objects) 231 and 232 in the map information 230 with the evaluation objects, using the map absolute position of the two-dimensional map information 210 from which the evaluation objects 221 and 222 originate and the in-map positions of the evaluation object candidates 211 and 212. It then acquires the map information ((XA, YA), (XB, YB)) of the associated real objects 231 and 232.
Note that FIG. 4(c) is drawn as a two-dimensional map for convenience, but a three-dimensional map may be used.
Next, the direction calculation unit 106 performs the direction calculation processing. That is, the direction calculation unit 106 calculates the object direction of each of the evaluation objects 221 and 222 (step S1110). As shown in FIGS. 4(d) and 4(e), the direction calculation unit 106 calculates, as the object directions, the angles θA and θB with respect to the optical-axis direction of the lens 131 of the photographing unit 102.
FIG. 4(d) is a diagram for explaining a method of estimating the direction of the photographing unit 102 using the horizontal distance PdA, in the captured image 220, of the evaluation object 221 corresponding to the real object 231. FIG. 4(e) is a diagram for explaining a method of estimating the object direction using the horizontal distance PdB of the evaluation object 222 corresponding to the real object 232.
The image of the real object 231 is formed on the image sensor 132 through the lens 131. Therefore, the angle θA formed by the direction of the real object 231 and the optical axis 133 of the lens 131 of the photographing unit 102 is calculated geometrically from the position PdA on the image sensor by the following expression (1), where f is the focal length of the lens 131.
Similarly, the angle θB formed by the direction of the real object 232 and the optical axis 133 of the lens 131 of the photographing unit 102, shown in FIG. 4(e), is calculated by the following expression (2).
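Expressions (1) and (2) are not reproduced in this text. Under an ideal distortion-free pinhole model, however, the geometry described above reduces to an arctangent of the sensor offset over the focal length. The sketch below illustrates that relation, assuming the pixel offset is converted to a physical distance on the sensor via the pixel pitch (the pitch and focal-length values in the example call are hypothetical).

```python
import math

def object_direction_rad(pd_pixels: float, pixel_pitch_mm: float, focal_length_mm: float) -> float:
    """Angle between the lens optical axis 133 and the direction of an object,
    from the horizontal offset of its representative point on the image sensor.
    Assumes an ideal (distortion-free) pinhole model: pd_pixels is the offset in
    pixels from the image origin (optical-axis center), pixel_pitch_mm the sensor
    pixel pitch, and focal_length_mm the focal length f of the lens 131."""
    offset_mm = pd_pixels * pixel_pitch_mm
    return math.atan2(offset_mm, focal_length_mm)

# Example (hypothetical pitch 0.004 mm, focal length 4.0 mm):
# theta_a = object_direction_rad(PdA, 0.004, 4.0)
# theta_b = object_direction_rad(PdB, 0.004, 4.0)
```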
Note that if the lens 131 has distortion, errors are introduced into the calculated angles θA and θB. For this reason, it is desirable to use a lens 131 with little distortion, and to acquire the relationship between the angle of view and the image height of the lens 131 in advance so that the angular error caused by the distortion can be corrected.
Finally, the position / orientation calculation unit 107 performs the position / orientation calculation processing. Here, it refers to the map information A (XA, YA) and B (XB, YB) of the matched evaluation objects 221 and 222 and to the object directions θA and θB, determines for each the range in real space in which the photographing unit 102 can exist, and then specifies the position and orientation of the photographing unit 102 in real space from these ranges (step S1111). The calculated position is the map-information position of the photographing unit 102, and the calculated orientation is the direction of the optical axis 133 of the lens 131.
This calculation method will be described with reference to FIG. 4(f). The optical axis 133 of the lens 131 forms the angle θA with the direction of the real object 231 and the angle θB with the direction of the real object 232. The locus of positions satisfying these two conditions is a locus of points with a constant circumferential (inscribed) angle, which is the circle 241 passing through the real object 231 and the real object 232.
The radius R of the circle 241 is calculated by the following expression (3) using the real-space positions A (XA, YA) and B (XB, YB) of the real object 231 and the real object 232.
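Expression (3) is not reproduced in this text. By the inscribed-angle theorem, the radius follows from the chord length AB and the angle subtended at the camera. The sketch below is a minimal illustration, assuming the two objects lie on opposite sides of the optical axis so that the subtended angle is θA + θB.

```python
import math

def circle_radius(ax, ay, bx, by, theta_a, theta_b):
    """Radius R of the circle 241 of camera positions that see A and B under a
    fixed angle. Assumes A and B lie on opposite sides of the optical axis, so
    the angle subtended at the camera is theta_a + theta_b (in radians)."""
    chord_ab = math.hypot(bx - ax, by - ay)     # distance between A and B
    subtended = theta_a + theta_b
    return chord_ab / (2.0 * math.sin(subtended))
```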
From the relationship of θA and θB to the optical axis 133, the locus on which the photographing unit 102 can exist is narrowed down to the arc AB drawn as a solid line on the circle 241. On the arc AB drawn as a broken line, the left-right relationship of θA and θB is reversed, so it cannot be a locus on which the photographing unit 102 exists.
Next, the position / orientation calculation unit 107 specifies the direction of the optical axis 133 of the lens 131 of the photographing unit 102 using the direction detected by the rough detection sensor 103. Once the direction of the optical axis 133 is specified, the position of the photographing unit 102 on the arc AB is uniquely determined.
The detected position of the photographing unit 102 is taken to be the principal point of the lens 131 of the photographing unit 102. Of course, the detected position is not limited to the principal point; any position within the photographing unit 102 may be used.
As described above, the position and orientation of the photographing unit 102 can be detected. The position / orientation detection apparatus 100 of the present embodiment repeats the above processing at predetermined time intervals and thus always detects the latest position and orientation of the photographing unit 102. In the above processing flow, the captured-image acquisition of step S1103 may come first; in that case, the position / orientation detection apparatus 100 starts the above processing when the photographing unit 102 acquires a captured image.
In the above processing flow, the case where the evaluation object candidates 211 and 212 are registered in a single piece of two-dimensional map information 210 has been described as an example. However, the evaluation object candidates 211 and 212 may be registered in different pieces of two-dimensional map information 210.
Next, the pattern matching processing using template images, performed when identifying the evaluation objects 221 and 222, will be described with reference to FIGS. 6(a) to 6(c). FIG. 6(a) shows the template images 310, and FIG. 6(b) shows the captured image 220.
Pattern matching is processing that determines whether an image identical to a template image exists within a given image; if such an image exists, its pixel position can be specified.
The shape of a given surface of an object changes depending on the direction from which the photographing unit 102 photographs it. Therefore, as shown in FIG. 6(a), the object detection unit 105 generates a plurality of template images 311 to 316 with different inclinations and sizes. Here, a case where six template images 311 to 316 are generated is illustrated; when there is no need to distinguish them, they are represented by the template image 311.
Next, for each generated template image 311, the object detection unit 105 scans the captured image 220 in the direction indicated by the arrows in FIG. 6(b) and evaluates the degree of similarity.
One example of a method for comparing image similarity is to take the difference between the images and evaluate its histogram. With this method, a difference of 0 is obtained for a perfect match, and values further from 0 are obtained as the degree of match decreases. While changing the region of the captured image 220 to which the template image 311 is applied, the similarity between the template image 311 and the applied region is evaluated and recorded. This is repeated for each of the template images 311 to 316. The entire captured image 220 is evaluated against all prepared template images 311, and the region with the smallest difference is determined to be the matching region.
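A minimal OpenCV sketch of this scan-and-compare step is shown below. It uses a normalized squared-difference score in place of the histogram-of-differences evaluation described above, which is an assumption about the exact metric; the overall flow (scan every template, keep the region with the smallest difference) is the same.

```python
import cv2
import numpy as np

def best_match(captured_gray: np.ndarray, templates: list[np.ndarray]):
    """Scan the captured image 220 with each template image and return the index
    of the best template, the top-left pixel position of the matched region, and
    its score (smaller squared difference = better match)."""
    best = (None, None, float("inf"))
    for idx, templ in enumerate(templates):
        scores = cv2.matchTemplate(captured_gray, templ, cv2.TM_SQDIFF_NORMED)
        min_val, _max_val, min_loc, _max_loc = cv2.minMaxLoc(scores)
        if min_val < best[2]:
            best = (idx, min_loc, min_val)
    return best  # (template index, (x, y) of matched region, score)
```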
In FIG. 6(b), the region 225 in the captured image 220 is the region whose similarity to the template image 311 is highest. The object detection unit 105 therefore determines that an evaluation object exists in the captured image 220 and identifies the region 225 as the region (pixel position) of the evaluation object.
When this method is used, the object detection unit 105 can obtain the pixel position of the evaluation object at the same time from the evaluation result, and the horizontal distance PdA can be determined from it.
Note that when the orientation of the evaluation object is specified in advance and the template image to be used is known, it is sufficient to extract that single template image and perform the matching only with it. Likewise, when the approximate region of the captured image 220 in which the evaluation object exists is known, the matching may be performed only for that region.
If the two-dimensional map information 210 from which the template image 311 is created and the captured image 220 were exactly the same image, the matching region, that is, the pixel position of the evaluation object in the captured image, could be identified easily. In practice, however, the two are captured under different conditions, such as lighting and shadows, and cannot be exactly the same image.
Therefore, prior to template matching, countermeasures concerning image information such as brightness and color, and countermeasures concerning the distortion of objects caused by the photographing direction, may be applied to the captured image. The distortion of an object caused by the photographing direction means, for example, the deformation by which an object that has a rectangular plane when the observer faces it directly appears roughly trapezoidal when viewed obliquely in three-dimensional space.
As a countermeasure concerning image information, for example, the two-dimensional map information 210 and the captured image 220 are converted to grayscale and their brightness is equalized based on their respective histograms.
As a countermeasure for object distortion, for example, as shown in FIG. 6(a), the object is virtually deformed three-dimensionally when the template image 311 is generated from it. That is, to be able to detect an object even when it is deformed as described above, the object is subjected to a shape conversion that turns a rectangle into a trapezoid, yielding the template images 311 to 316.
Before template matching, the photographing direction of the object in the captured image 220 cannot be specified. Therefore, template images are generated by applying the shape conversion corresponding to every assumed photographing direction, and template matching is performed with all of them, so that any photographing direction can be handled.
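The sketch below illustrates one way to realize the rectangle-to-trapezoid shape conversion used to prepare such view-dependent templates; the warp geometry and the example shrink ratios are hypothetical choices, not values specified in this description.

```python
import cv2
import numpy as np

def trapezoid_template(front_view: np.ndarray, shrink_ratio: float) -> np.ndarray:
    """Warp a front-view object image into the roughly trapezoidal shape it would
    have when photographed obliquely. shrink_ratio (0..1) controls how much the
    right vertical edge is shortened relative to the left one."""
    h, w = front_view.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    inset = h * (1.0 - shrink_ratio) / 2.0
    dst = np.float32([[0, 0], [w, inset], [w, h - inset], [0, h]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(front_view, M, (w, h))

# Templates for several assumed photographing directions, e.g.:
# templates = [trapezoid_template(obj_img, r) for r in (1.0, 0.8, 0.6)]
```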
As another countermeasure for differences in photographing conditions, template images are generated by applying transformations such as rotation and scaling to the object. This removes the influence of differences in object size due to photographing distance and of rotation about the optical axis of the camera used for photographing, and thereby improves matching performance.
The pattern matching processing described above is a general technique and has the advantage that it can be performed at high speed.
Next, how the representative points of the evaluation objects 221 and 222 are determined when calculating the horizontal distances PdA and PdB will be described. In this embodiment, the representative point is determined within the template image used when identifying the evaluation objects 221 and 222.
FIGS. 7(a) to 7(f) show template images 321 to 325. The white circle in each template image indicates the representative point 331.
As shown in FIG. 7(a), in the case of the template image 321 in which the object faces the front, the center of the outline is set as the representative point 331. In the case of template images in which the object is inclined, as shown in FIGS. 7(b) and 7(c), a rectangle indicated by the broken line is defined from the outermost shape and its center is set as the representative point 331.
In the case of the template image 324 of an object with a partial protrusion, a rectangle indicated by the broken line may be defined from the outermost shape and its center set as the representative point 331, as shown in FIG. 7(d); alternatively, as shown in FIG. 7(e), a rectangle may be defined over part of the object and its center set as the representative point 331. In the case of the template image 325 of a triangular object, a rectangle enclosing the outermost shape is defined and its center is set as the representative point 331, as shown in FIG. 7(f).
As described above, the method of defining a rectangle from the outermost shape and using its center as the representative point 331 is simple and most desirable. Of course, as long as pattern matching is possible, the outermost shape need not be used, as shown in FIG. 7(e); for example, an arbitrary corner of the object may be used.
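A minimal sketch of the bounding-rectangle approach, assuming the template's object region is available as a binary (white-on-black) mask:

```python
import cv2
import numpy as np

def representative_point(template_mask: np.ndarray) -> tuple[float, float]:
    """Center of the axis-aligned rectangle that bounds the object's outermost
    contour in a template image (binary mask assumed, object in white)."""
    contours, _hierarchy = cv2.findContours(template_mask, cv2.RETR_EXTERNAL,
                                            cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return (x + w / 2.0, y + h / 2.0)
```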
As described above, the representative point 331 is used in the position / orientation detection processing when calculating the horizontal distance of the evaluation object from the origin. For this reason, the representative point 331 is preferably configured so that it can be accurately associated with the map information.
As described above, the position / orientation detection apparatus 100 of the present embodiment includes the photographing unit 102 that photographs a predetermined photographing range including two or more objects; the object detection unit 105 that specifies the pixel position of each of the objects in the captured image photographed by the photographing unit 102; the direction calculation unit 106 that calculates, for each object, the object direction, which is the direction of that object with respect to the photographing unit 102, using the pixel position, the map information of each object, and the focal length of the photographing unit; and the position / orientation calculation unit 107 that calculates the position and orientation of the photographing unit using the object direction of each object and the map information. The object detection unit 105 extracts, from two-dimensional map information in which the positions and shapes of a plurality of objects within a predetermined area are stored, the two-dimensional map information corresponding to the photographing range, and specifies the pixel positions using that two-dimensional map information.
Thus, according to the present embodiment, the position and orientation of the photographing unit 102 are calculated using an image actually photographed by the photographing unit 102. Two objects are extracted from the captured image and the calculation is performed using them. Image processing using a three-dimensional map or three-dimensional model corresponding to the real space is therefore unnecessary, and the method does not depend on the accuracy of external GPS or the like. Accordingly, the position and orientation of the photographing unit 102 can be detected with high accuracy with a simple configuration and a small amount of information processing, and the position and orientation of various devices whose relative position and direction with respect to the photographing unit 102 are known can likewise be estimated with high accuracy.
According to this embodiment, the position and orientation of the photographing unit 102 can be obtained by pattern matching processing, at which computers excel, and simple geometric calculation. That is, the position / orientation detection apparatus 100, which detects its own position and orientation, can be realized with a small amount of information.
Furthermore, according to this embodiment, the position / orientation detection by the above method is repeated at predetermined time intervals. The position and orientation of the position / orientation detection apparatus 100 including the photographing unit 102 can therefore be identified at all times, stably, with a small amount of information and with high accuracy.
In the above embodiment, the position / orientation calculation unit 107 calculates the position and orientation of the photographing unit 102 using the two evaluation objects 221 and 222. However, the calculation of the position and orientation by the position / orientation calculation unit 107 is not limited to this method; for example, three evaluation objects may be used.
The procedure for detecting the position and orientation of the photographing unit 102 using three evaluation objects will be described with reference to FIG. 8(a).
By the same procedure as above, the map information ((XA, YA), (XB, YB), (XC, YC)) of the real objects 231, 232, and 233 corresponding to the three evaluation objects and their angles (θA, θB, θC) with respect to the optical axis 133 of the lens 131 are acquired.
From the information on the real object 231 and the real object 232, the circle 241 on which the photographing unit 102 exists is determined. Similarly, from the information on the real object 232 and the real object 233, the circle 242 on which the photographing unit 102 exists is determined. The intersection of the circles 241 and 242 is the position of the photographing unit 102, and the direction that satisfies the angles with respect to the optical axis 133 of the lens 131 is the orientation, that is, the posture, of the photographing unit 102.
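A minimal sketch of intersecting the two circles, assuming their centers and radii have already been determined. Note that both circles pass through the real object 232, so one of the returned intersection points coincides with that object's position; the other candidate corresponds to the photographing unit 102.

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles given centers (x, y) and radii.
    Returns a list of 0, 1 or 2 candidate points."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []                      # no intersection (or identical centers)
    a = (r1**2 - r2**2 + d**2) / (2 * d)
    h = math.sqrt(max(r1**2 - a**2, 0.0))
    mx = x1 + a * (x2 - x1) / d        # foot of the perpendicular on the center line
    my = y1 + a * (y2 - y1) / d
    p1 = (mx + h * (y2 - y1) / d, my - h * (x2 - x1) / d)
    p2 = (mx - h * (y2 - y1) / d, my + h * (x2 - x1) / d)
    return [p1] if h == 0 else [p1, p2]
```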
If three evaluation objects are used in this way, the position and orientation of the photographing unit 102 can be obtained from the evaluation-object detection results alone.
Note that if three evaluation objects lying on the same circle are selected, the position of the photographing unit 102 cannot be specified by the above method. In that case, three evaluation objects that do not lie on the same circle are selected.
To obtain the position of the photographing unit 102 with high accuracy, it is desirable that the circles 241 and 242 be as far apart as possible. For this reason, in the example of FIG. 8(a), for instance, it is desirable to determine the position of the photographing unit 102 using the circles 241 and 242 rather than the circle 241 and the circle passing through the real objects 231 and 233.
The number of evaluation objects used to specify the position and orientation of the photographing unit 102 is not limited to three, and more than three may be used.
Let N (N being an integer of 1 or more) be the number of evaluation objects used. In this case, up to N(N-1)/2 candidate arcs on which the photographing unit 102 may exist are obtained. Since all of these loci pass through the position of the photographing unit 102, its position can be obtained by finding the intersection of the arcs. Because the intersection of two arcs lies on the chord of the circles forming them, this is equivalent to finding the intersection of the straight lines corresponding to the chords. In other words, if there are two or more chords, it suffices to solve the linear equations for the intersection of straight lines instead of the quadratic equation for the intersection of a circle and a straight line, which reduces the amount of calculation.
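One way to realize this linear solve is to use the radical axes of the candidate circles (the lines carrying their common chords) and solve the resulting overdetermined system by least squares. The sketch below is a minimal illustration under that assumption, given that the circles' centers and radii are already known.

```python
import numpy as np

def camera_position_from_circles(circles):
    """Least-squares estimate of the point shared by a set of circles.
    circles: list of ((cx, cy), r). Each pair of circles contributes one linear
    equation (their radical axis, i.e. the line through their common chord);
    the returned point is the one that best satisfies all of them."""
    rows, rhs = [], []
    for i in range(len(circles)):
        (x1, y1), r1 = circles[i]
        for j in range(i + 1, len(circles)):
            (x2, y2), r2 = circles[j]
            rows.append([2.0 * (x2 - x1), 2.0 * (y2 - y1)])
            rhs.append((x2**2 + y2**2 - r2**2) - (x1**2 + y1**2 - r1**2))
    solution, _res, _rank, _sv = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return tuple(solution)  # estimated (x, y) of the photographing unit
```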
If there are three or more straight lines, two or more intersection points are obtained. If there is a point at which they all coincide, that point is the position of the photographing unit 102. In practice they may not all coincide because of matching accuracy and the errors introduced when rounding to pixel coordinates. In that case, obvious noise points are excluded from the obtained candidate points for the position of the photographing unit 102, and the position is determined by applying statistical processing to the remaining candidates, such as taking the most frequent point or the average point. Selecting the most frequent point requires no new coordinate calculation; selecting the average position is a simple calculation and fast, and a simple average minimizes the scatter with respect to the true solution. It is desirable to choose according to the required accuracy and processing speed.
Thus, with this configuration, the position and orientation of the photographing unit 102 can be specified as long as three or more objects can be detected. In this case, the position and orientation can be specified without using the positioning information from the rough detection sensor 103, which further improves the detection accuracy of the position and orientation.
The position / orientation calculation unit 107 may also calculate the position and orientation of the photographing unit 102 using two evaluation objects together with information on, for example, the traffic infrastructure around the photographing unit 102. This method will be described with reference to FIG. 8(b), taking as an example the case where information on a road 234 is used as the traffic-infrastructure information, and assuming that the photographing unit 102 is mounted on a moving body traveling on the road 234.
The shapes of traffic infrastructure such as the road 234 are defined by standards. By using the elements that constitute the road 234 as object information, the orientation can be detected with high accuracy. That is, using existing methods, the photographing unit 102 can recognize the road 234 on which the moving body carrying it is traveling and obtain the traveling direction (the optical-axis direction of the photographing unit 102). The obtained optical-axis direction is used in place of the result of the rough detection sensor 103 of the above embodiment to determine the position and orientation of the photographing unit 102.
Although the case where the road 234 is used as the traffic-infrastructure information has been given as an example, the information is not limited to this; for example, route information such as bridges and intersections, or position information of signs, traffic lights, and the like may be used.
The road 234 itself may also be used as an evaluation object, and the position and orientation of the photographing unit 102 may be detected using the two-dimensional map information 210; this reduces the number of evaluation objects needed.
According to this method, the position and orientation can be specified without using the orientation information of the rough detection sensor 103, so the accuracy can be improved.
Next, as another modification, a position / orientation calculation method that uses the deformation of an object will be described. FIGS. 9(a) to 9(c) are diagrams for explaining this modification. Here, two or more evaluation objects are used.
For example, assume that the appearance of the real object 231 corresponding to the evaluation object 221, when viewed from the front, has the rectangular shape 251 shown in FIG. 9(c), and that the appearance obtained in the captured image 220 is the trapezoid 252 shown in FIG. 9(b). In this case, the positional relationship in real space between the real object 231 and the photographing unit 102 can be specified from the amount of deformation, as shown in FIG. 9(a).
When pattern matching the evaluation object in the captured image, the object detection unit 105 generates template images from the front-view shape 251 by applying deformation parameters such as scaling, deformation, rotation, and distortion.
The evaluation object 221 is pattern-matched with the generated template images. Using the deformation parameters of the template image used in the matching, the normal (indicated by the arrow in the figure) of the front surface 261 of the real object 231 corresponding to the evaluation object 221 is obtained.
In this way, if the shape 251 of the evaluation object 221 as seen from the front in real space is known, the direction of the normal of the planar front surface 261 can be obtained from the amount of deformation of the shape in the captured image 220, and from this the orientation of the photographing unit 102 (the direction of the optical axis 133) can be obtained.
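One concrete way to recover the front-surface normal from the observed deformation, different in detail from the template-parameter approach described above, is a planar pose estimate from the detected corner positions. The sketch below assumes the real-space width and height of the front face (for example from the map information), the four detected corners, and the camera intrinsic matrix K are all available; these inputs are assumptions, not items specified in this description.

```python
import cv2
import numpy as np

def front_face_normal(face_w_m: float, face_h_m: float,
                      image_corners: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Estimate the normal of the object's planar front face in the camera frame
    from its deformed outline in the captured image.
    face_w_m, face_h_m: real-space width/height of the front face;
    image_corners: 4x2 array of detected corners (TL, TR, BR, BL);
    K: 3x3 camera intrinsic matrix."""
    object_points = np.float32([[0, 0, 0], [face_w_m, 0, 0],
                                [face_w_m, face_h_m, 0], [0, face_h_m, 0]])
    _ok, rvec, _tvec = cv2.solvePnP(object_points, np.float32(image_corners), K, None)
    R, _jac = cv2.Rodrigues(rvec)
    # The face lies in the z = 0 plane of the object frame, so its normal is
    # [0, 0, 1] there; rotate it into the camera frame.
    return R @ np.float32([0, 0, 1])
```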
When the evaluation object 221 is a building, the orientation of the building's surface in real space may be obtained from the map information 230 or the like.
According to the method of this modification, the position and orientation can be obtained even when the number of extracted objects is two. As described above, this position / orientation detection method does not directly use the values of the rough detection sensor 103, such as GPS or an electronic compass, so the position and orientation can be obtained with high accuracy regardless of the accuracy of the rough detection sensor 103.
Each of the position / orientation calculation methods described above is preferably selected according to the required accuracy.
In the above embodiment and the modifications, the case where the position and orientation of the photographing unit 102 are calculated using the horizontal distance of each evaluation object has been described. In addition, the vertical distance may also be measured so that the height direction of the object is estimated as well.
Prior to the evaluation-object extraction processing, the map management unit 108 uses the positioning information acquired by the rough detection sensor 103 to extract the two-dimensional map information and map information necessary for the processing and stores them in the respective holding units. However, the processing is not limited to this.
For example, once the position / orientation calculation unit 107 has calculated the position and orientation of the photographing unit 102, this information may be used to extract the necessary two-dimensional map information and map information. The map management unit 108 can then extract the necessary and sufficient two-dimensional map information with higher accuracy, which improves the processing accuracy. Moreover, since position information does not have to be acquired from the rough detection sensor 103 at all times, the position and orientation can be calculated even in places where GPS signals cannot be received.
The position / orientation detection apparatus 100 of the present embodiment may further include a two-dimensional map generation unit 109.
The two-dimensional map generation unit 109 analyzes the captured image 220 acquired by the photographing unit 102, using the position and orientation of the photographing unit 102, and generates two-dimensional map information 210. In doing so it uses the pixel positions of the objects in the captured image 220 and the appearance of the objects: the pixel position of each object detected by the object detection unit 105 becomes the in-map position, and the shape used for pattern matching becomes the object shape.
Since the captured image 220 is two-dimensional information, it may be used as the two-dimensional map information 210 as it is. The two-dimensional map generation unit 109 associates the position of each object in real space with the photographed appearance of the object and generates the two-dimensional map information.
By including the two-dimensional map generation unit 109, unknown objects that are not yet registered in the two-dimensional map, for example newly constructed buildings, can be newly registered as two-dimensional map information by associating their appearance with their position information.
When registering objects that are not yet known and are contained in the captured image 220 into the two-dimensional map information 210, techniques such as artificial intelligence or deep learning may be used in addition to pattern matching. Objects that are not yet known include, for example, buildings not registered in the two-dimensional map information 210 or the map information 230, and buildings whose appearance has changed through renovation.
In this case, for example, a plurality of pieces of two-dimensional map information 210 containing the same object are acquired. Learning is then performed using information accumulated on a server over the network and the two-dimensional map information 210 accumulated in the two-dimensional map information holding unit 123, and the feature points of the object are calculated. The result is used for the pattern matching of the object. With a method that matches on feature points, the matching success rate can be increased even when an occluding object exists between the object and the photographing unit 102.
The position of an object may also be specified using a plurality of captured images 220 acquired at different times. Using the position of the target object in the captured image 220 and the position and orientation of the photographing unit 102, the region in real space in which the object may exist can be specified; this region is a straight line. When the position / orientation detection apparatus 100 moves and the same processing is performed after a predetermined time, another straight line is obtained as the region in which the object exists. The intersection of the two straight lines obtained by the two rounds of processing is the position at which the object actually exists (its position in real space, latitude, longitude, coordinates, and so on). Alternatively, the position of the object candidate in real space may be calculated from the change in the position and orientation of the photographing unit 102 and the change in the horizontal distance of the object candidate.
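A minimal sketch of intersecting the two bearing lines, assuming the camera position and the absolute bearing toward the object (optical-axis azimuth plus the object angle θ) are known at both shooting times:

```python
import numpy as np

def intersect_bearings(p1, heading1_rad, p2, heading2_rad):
    """Intersect two bearing lines in the map plane.
    p1, p2: camera positions (x, y) at the two shooting times; heading*_rad:
    absolute direction toward the object at each time. Returns the object's
    position, or None if the two bearings are (nearly) parallel."""
    d1 = np.array([np.cos(heading1_rad), np.sin(heading1_rad)])
    d2 = np.array([np.cos(heading2_rad), np.sin(heading2_rad)])
    A = np.column_stack([d1, -d2])          # p1 + t*d1 = p2 + s*d2
    if abs(np.linalg.det(A)) < 1e-9:
        return None
    t, _s = np.linalg.solve(A, np.array(p2) - np.array(p1))
    return tuple(np.array(p1) + t * d1)
```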
By additionally registering the obtained object information in the existing two-dimensional map information 210, the two-dimensional map information 210 can be updated in real time.
If an object is already registered in the two-dimensional map and its position is known, its appearance can be updated; if the object is one that moves, its position information can be updated to the latest value.
The two-dimensional map generation unit 109 obtains the position of an object in real space using the information that the position / orientation detection apparatus 100 used to detect the position and orientation. No additional calculation is therefore needed to generate the two-dimensional map information, and the two-dimensional map information can be kept up to date at low computational cost.
A three-dimensional map data model may be used in generating the two-dimensional map information. In the case of map data that represents buildings by their positions and heights, the shapes of the buildings in the real world are known. To associate appearance data with the side faces of a building, images photographed from the side may be pasted onto the three-dimensional map data to obtain two-dimensional map information; for example, Google Street View imagery can be used as the images photographed from the side.
When generating the two-dimensional map information, images photographed from the side may also be pasted onto the three-dimensional map data and a virtual camera placed within the map-data model for CG rendering. With such a method, two-dimensional map information can be generated efficiently by making use of existing map data.
<<Second Embodiment>>
Next, a second embodiment to which the present invention is applied will be described. The present embodiment is an AR display device 500 including the position / orientation detection apparatus 100 of the first embodiment. In the following, the case where the position / orientation detection apparatus 100 is mounted on an automobile and AR display is performed on the automobile's windshield will be described as an example.
FIG. 10 is a functional block diagram of the AR display device 500 of the present embodiment. As shown in the figure, the AR display device 500 of the present embodiment includes the position / orientation detection apparatus 100, a display content generation unit 501, a display content selection unit 502, a superimposing unit 503, a display unit 504, an instruction receiving unit 506, and an extraction unit 507. Here, the case where the control unit 101 and the gateway 110 are shared with the position / orientation detection apparatus 100 is described as an example.
The AR display device 500 also includes a content holding unit 511. The content holding unit 511 is a memory that temporarily holds content including AR content, and is configured as a memory that can be accessed at high speed. In the following, normal content and AR content are collectively referred to as content unless they need to be distinguished.
The position / orientation detection apparatus 100 basically has the same configuration as the position / orientation detection apparatus 100 of the first embodiment. In the present embodiment, however, the object detection unit 105 may be configured to detect and hold the pixel positions of all the evaluation objects extracted by the evaluation object extraction unit 104; this information is used when the superimposing unit 503, described later, determines the position at which to superimpose content. The control unit 101 of the position / orientation detection apparatus 100 controls the operation of each unit of the entire AR display device 500, and the gateway 110 functions as the communication interface of the AR display device 500.
The instruction receiving unit 506 receives operation inputs from the user (driver) 530. In the present embodiment it receives, for example, the selection of content and the conditions for the content to be displayed. Inputs are received via existing operation devices such as operation buttons, a touch panel, a keyboard, or a mouse; a device that allows voice input, instructions by blinking, and the like may also be provided.
The extraction unit 507 acquires, from the content stored on servers or other storage devices, content that may be used in the processing of the AR display device 500. The acquired content is stored in the content holding unit 511. Since the information processed by the AR display device 500 is mainly information about the surroundings, the processing cost is reduced by acquiring and processing only information that may actually be processed.
The content to be acquired may also be determined in consideration of information such as the traveling direction and speed of the automobile in which the AR display device 500 is mounted. For example, if v is the speed of the moving body and T is the time needed to download the information and store it in the holding unit, information within at least a circle of radius Tv centered on the photographing unit 102 is acquired and stored in the content holding unit 511. When the travel route of the automobile is specified, information around the route may be extracted and stored in the content holding unit 511.
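A minimal sketch of this prefetch rule, assuming each content item carries a registered map position; the dictionary field name is a hypothetical structure, not one defined in this description.

```python
import math

def prefetch_contents(contents, camera_xy, speed_mps, download_time_s):
    """Select contents to cache in the content holding unit 511: everything whose
    registered position lies within radius T*v of the photographing unit, where v
    is the vehicle speed and T the time needed to download and store the data.
    `contents` is a list of dicts with an 'xy' field (hypothetical structure)."""
    radius = speed_mps * download_time_s
    cx, cy = camera_xy
    return [c for c in contents
            if math.hypot(c["xy"][0] - cx, c["xy"][1] - cy) <= radius]
```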
Next, an example of the configuration of the server (content server) from which the content held in the content holding unit 511 is acquired will be described.
FIG. 3(b) is a diagram for explaining the content server system 610. In the present embodiment, for example, advertisements, content, and AR content are held and provided as information.
The content server system 610 includes a content server 611, a content storage unit 612, an AR content storage unit 613, an advertisement storage unit 614, and a communication I/F 615.
An advertisement is advertisement text, an image, a video, or the like provided by an advertiser. Content is information corresponding to the service, for example a video for entertainment or a game. AR content is content intended to be superimposed in AR and includes, in addition to the content itself, meta information such as a display position and orientation in real space. The meta information may be defined in advance or may be updated dynamically in response to an instruction from the AR display device 500.
In response to an instruction received via the communication I/F 615, the content server 611 outputs the information held in each storage unit to the outside via the communication I/F 615. Information received via the communication I/F 615 is stored in the corresponding storage unit. When outputting, the content server 611 may integrate the above pieces of information and provide them to the AR display device 500 over the network, for example by inserting an advertisement into entertainment content.
The AR content may be selected and extracted by a method similar to the method by which the position and orientation detection device 100 extracts the two-dimensional map information and the map information. Content that the user wishes to view, on the other hand, may be designated via the instruction receiving unit 506 and extracted in accordance with that instruction.
The display content selection unit 502 selects the content to be displayed from the content held in the content holding unit 511 according to the content display conditions. The display conditions may be determined in advance or may be designated by the user via the instruction receiving unit 506.
In this case, the instruction receiving unit 506 may be a motion recognition device that recognizes the user's actions.
As the motion recognition device, for example, gesture recognition that detects the user's movement with a camera, voice recognition using a microphone, or gaze recognition that detects the line of sight can be used. In particular, while the user is driving, hand movements are restricted by the driving task, so a motion recognition device that can accept instructions without hand movement, such as gaze or voice, is desirable. A voice recognition device can accept relatively detailed instructions and therefore supports a wide variety of operations. A gaze recognition device is hard for others nearby to perceive even while it is being operated, so it is considerate of the surrounding environment.
In the case of voice recognition, the operator's voice may be registered in advance so that the user who gave the voice input can be identified, and the contents accepted by voice recognition may be restricted depending on the user.
This allows the user to actively select the content to be displayed, so the content the user desires can be appropriately selected and displayed.
The display unit 504 displays the content to be displayed on the windshield in accordance with instructions from the superimposing unit 503.
As shown in FIG. 11(a), the display unit 504 of the present embodiment includes a projector 521 and a display (projection area) 522. The display 522 is realized by combining optical components having both transparency and reflectivity, and is arranged on the windshield. The real-space scene behind the display 522 passes through the display 522, while the video (image) generated by the projector 521 is reflected by the display 522. The user 530 therefore sees an image in which the real-space scene transmitted through the display 522 and the video reflected by the display 522 are superimposed.
If the display 522 covers the entire windshield, it covers the field of view of the user 530 looking forward, and content can be superimposed over a wide range of the actual scene spreading out ahead.
If the display 522 is a component separate from the windshield, a dedicated design optimized for the AR display device 500 becomes possible, which simplifies the design. One example of such a configuration is a HUD (Head-Up Display).
The display content generation unit 501 generates, from the selected content, the display content to be shown on the windshield, which is the display destination. In the present embodiment, the content is rendered in a display mode suitable for superimposition on the scene seen by the user's eyes through the windshield; for example, its size, color, and brightness are determined. The display mode is determined according to the user's instructions or according to predetermined rules.
The superimposing unit 503 determines the display position on the display 522 of the display content generated by the display content generation unit 501. The superimposing unit 503 first identifies the display position (placement position) of the object on the display 522, and then determines the display position of the content related to that object based on the object's display position. The display position of the object is calculated using the position and orientation of the imaging unit 102 detected by the position and orientation detection device 100 and the pixel position of each object in the captured image 220 captured by the imaging unit 102.
The superimposing unit 503 of the present embodiment holds in advance the geometric relationship between a location used as the position reference of the automobile (in-vehicle reference position) and the position and orientation of the imaging unit 102, and the geometric relationship between the in-vehicle reference position and the average visual field range 531 of the user (driver) 530.
For example, when related content is to be superimposed on an object existing in real space (a real object), the position on the display 522 corresponding to the pixel position of the corresponding evaluation object in the captured image is calculated using the relationships above.
The display position of the related content may be set at the intersection of the display 522 and the line-of-sight direction in which the user 530 views the evaluation object 223. First, the imaging unit 102 captures the evaluation object 223. From the position of the evaluation object 223 in the captured image, the direction in which the evaluation object 223 lies with respect to the imaging unit 102 is known.
As shown in FIG. 11(a), when the distance between the user 530 and the evaluation object 223 is relatively short, the position of the evaluation object 223 relative to the imaging unit 102 is obtained. By subtracting the known relative position of the eyes of the user 530 with respect to the imaging unit 102, the line of sight 551 from the user's eye position toward the evaluation object 223 is obtained. The point 552 at which the line of sight 551 intersects the display 522 may be used as the display position of the related content.
As shown in FIG. 11(b), when the distance between the user 530 and the evaluation object 223 is sufficiently large compared with the distance between the user 530 and the imaging unit 102, the straight line 553 connecting the imaging unit 102 and the evaluation object 223 and the line of sight 551 from the user's eye position toward the evaluation object 223 are approximately parallel. In this case, the point 552 at which a straight line passing through the user 530 and parallel to the straight line 553 intersects the display 522 may be used as the display position of the related content.
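The intersection described for FIG. 11(a) can be written as a standard ray-plane test. The sketch below is not taken from the embodiment; it assumes that the eye position, the object position, and the display (windshield) plane are all expressed in a common vehicle-fixed 3D frame, and all function and variable names are illustrative.

```python
import numpy as np

def display_point(eye_pos, object_pos, plane_point, plane_normal):
    """Intersection of the eye-to-object sight line with the display plane.

    eye_pos     : 3D eye position of the driver in the vehicle frame
    object_pos  : 3D position of the evaluation object (camera-relative position
                  minus the known eye-to-camera offset, as described above)
    plane_point : any point on the display (windshield) plane
    plane_normal: unit normal of the display plane
    """
    eye_pos, object_pos = np.asarray(eye_pos, float), np.asarray(object_pos, float)
    plane_point, plane_normal = np.asarray(plane_point, float), np.asarray(plane_normal, float)
    direction = object_pos - eye_pos                     # line of sight 551
    denom = direction.dot(plane_normal)
    if abs(denom) < 1e-9:
        return None                                      # sight line parallel to the display
    t = (plane_point - eye_pos).dot(plane_normal) / denom
    if t <= 0:
        return None                                      # object is not in front of the display
    return eye_pos + t * direction                       # intersection point 552
```

For the far case of FIG. 11(b), the same function can be reused by casting the camera-to-object direction from the eye position, since the two lines are approximately parallel.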
If the display position of the related content is specified in this way, the related content appears to the user 530 to be displayed superimposed on the evaluation object 223. Naturally, the related content may instead be displayed offset from the evaluation object 223 by a specified amount.
Details of the determination of the display mode by the display content generation unit 501 and of the display position by the superimposing unit 503 will be described later.
Like the position and orientation detection device 100, the AR display device 500 is realized by an information processing device including a CPU 141, a memory 142, a storage device 143, an input/output interface (I/F) 144, and a communication I/F 145. For example, it is realized by the CPU 141 loading a program held in advance in the storage device 143 into the memory 142 and executing it. All or some of the functions may instead be realized by hardware or circuits such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field-Programmable Gate Array).
Various data used in the processing of each function and various data generated during processing are stored in the memory 142 or the storage device 143. The content holding unit 511 is constructed, for example, in the memory 142.
Next, the flow of the AR display processing performed by the AR display device 500 of the present embodiment will be described. FIG. 12 is the processing flow of the AR display processing of the present embodiment. The AR display processing may be performed in synchronization with the position and orientation detection processing of the position and orientation detection device 100, or may be configured to be performed independently. In the following, the synchronized case is described as an example, that is, the case where the AR display processing is executed each time the position and orientation detection device 100 detects the position and orientation of the imaging unit 102. The instruction receiving unit 506 receives the conditions for the content to be displayed (display conditions) in advance.
First, the position and orientation detection device 100 detects the position and orientation of the imaging unit 102 (step S2101). The position and orientation detection device 100 determines the position and orientation of the imaging unit 102 by the same method as in the first embodiment. This processing is executed at predetermined time intervals.
The display content selection unit 502 selects the content to be displayed from the content held in the content holding unit 511 (step S2102). Here, whether or not to display each piece of content is determined according to the display conditions, and only content that matches the display conditions is selected for display. The selection may be performed, for example, for each object on which content is to be superimposed.
Then, the display content generation unit 501, the superimposing unit 503, and the display unit 504 repeat the following processing for all selected display content (step S2103).
The display content generation unit 501 determines the display mode of the selected content (step S2104).
The superimposing unit 503 determines the display position of the selected display content (step S2105). At this time, the superimposing unit 503 uses the position and orientation of the imaging unit 102 detected by the position and orientation detection device 100 in step S2101 and the position of each object in the captured image 220.
The display unit 504 displays the content at the position calculated by the superimposing unit 503, in the display mode determined by the display content generation unit 501 (step S2106). The above processing is repeated for all content.
When the AR display processing from step S2102 onward is performed independently of the position and orientation detection processing, the superimposing unit 503 determines the display position using the latest position and orientation of the imaging unit 102 and the latest position information of each object in the captured image 220 available at that time.
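Steps S2101 to S2106 can be summarized in code form. The following sketch assumes hypothetical detector, overlay, and display interfaces and an illustrative display-condition check; it is an outline of the loop described above, not the embodiment's implementation.

```python
from dataclasses import dataclass

@dataclass
class Content:
    object_id: str      # evaluation object this content is attached to
    payload: str        # text, image reference, etc.
    meta: dict          # display conditions, classification, ...

def matches(content: Content, display_conditions: dict) -> bool:
    """Hypothetical display-condition check (step S2102)."""
    return all(content.meta.get(k) == v for k, v in display_conditions.items())

def ar_display_cycle(detector, content_pool, conditions, style_rules, overlay, display):
    # S2101: detect the camera position/orientation and the object pixel positions
    camera_pose, object_pixels = detector.detect()
    # S2102: keep only content matching the display conditions
    selected = [c for c in content_pool if matches(c, conditions)]
    # S2103-S2106: per-content loop
    for content in selected:
        appearance = style_rules.decide(content)                        # S2104: size, colour, brightness
        position = overlay.display_position(camera_pose,                # S2105: pixel position -> display 522
                                             object_pixels.get(content.object_id))
        if position is not None:
            display.draw(content, appearance, position)                 # S2106
```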
As described above, the AR display device 500 of the present embodiment is an AR display device that displays content on a display 522 having transparency and reflectivity, in association with objects in the scene behind the display 522. It includes the position and orientation detection device 100 of the first embodiment; a display content generation unit 501 that generates the content to be displayed on the display 522; a superimposing unit 503 that determines the display position of the generated content on the display 522 using the position and orientation of the imaging unit 102 determined by the position and orientation detection device 100 and the pixel position of the object identified by the object detection unit 105; and a display unit 504 that displays the generated content at the display position on the display 522 determined by the superimposing unit 503.
Thus, according to the present embodiment, the position of the object is identified in the captured image acquired by the imaging unit 102, and the display position of the related content is determined from it. Therefore, by correcting only the offset between the imaging range 541 of the imaging unit 102 and the user's viewpoint, the display position of the object on the display can be identified accurately.
That is, according to the present embodiment, content can be displayed without using the relative position between the object and the AR display device 500. The present embodiment therefore realizes highly accurate AR superimposed display that does not include an error between the object's actual position and the AR display device 500, while keeping the calculation cost low.
Furthermore, the position and orientation of the imaging unit 102 are obtained by the position and orientation detection device 100 of the first embodiment; the values of the coarse detection sensor 103, such as a GPS or an electronic compass, are not used directly. As a result, they can be obtained with high accuracy regardless of the accuracy of the coarse detection sensor 103, and an AR display device 500 capable of highly accurate AR superimposition can be provided.
In the above embodiment, the case where the display 522 is transparent has been described as an example. However, the display 522 may be opaque. In this case, a video in which the content is composited onto the captured image 220 captured by the imaging unit 102 may be displayed. The brightness of the captured image 220 may, for example, be reduced at this time, so that the real scene is overwritten with a darkened video. This makes it possible, for example, to present images darker than the actual scene.
In the above embodiment, the case where the AR display device 500 is mounted on an automobile has been described as an example, but the usage form of the AR display device 500 is not limited to this. For example, it may be worn and used by the user, in which case a compact AR display device 500 can be provided. One example of such a configuration is an HMD (Head-Mounted Display). In this case, the position and orientation detection device 100 is also mounted on the HMD.
Furthermore, a position and orientation detection device 100 for detecting the position of the HMD may be provided separately from the HMD. This position and orientation detection device 100 captures an image of the HMD and, treating the HMD as an object, calculates the position (latitude/longitude, coordinates) of the HMD in real space using the same technique as that used to generate the two-dimensional map information. The display position of the content is then determined using this position information, so that the content can be displayed at the desired position with even higher accuracy.
As described above, the AR display device 500 of this modification can detect its absolute position and orientation even when mounted on a moving body, so content can be displayed at the desired position with high accuracy even when the scene viewed by the user is constantly changing.
The AR display device 500 can calculate the position and orientation of the imaging unit 102 with high accuracy and perform AR superimposition with high accuracy. When highly accurate AR superimposition becomes possible, the precision with which AR content can be expressed increases and its expressive power is enriched.
The AR display device 500 does not need to include the imaging unit 102 itself. For example, image data captured by another imaging device (in the case of in-vehicle equipment, for example, a navigation device or a drive recorder) may be used as the captured image.
The AR display device 500 of the present embodiment may further include an eye tracking device 508, as shown in FIG. 13.
The eye tracking device 508 is a device that tracks the user's line of sight. In this modification, the eye tracking device 508 is used to estimate the user's field of view, and the content is displayed in consideration of the direction of the user's gaze. For example, the content is displayed in the region of the display 522 identified by the user's gaze direction and field of view.
The superimposing unit 503 of the present embodiment calculates the position at which to display the content from the relative position between the imaging unit 102 and the display 522 and from the user's gaze direction and field of view calculated by the eye tracking device 508. That is, the gaze direction calculated by the eye tracking device 508 is used to correct the content display position. This makes it possible to superimpose the content so that it appears to exist at the intended position in real space, and to superimpose the content on the real-space scene with high accuracy.
From the user's gaze direction detected by the eye tracking device 508 and the position and orientation of the imaging unit 102, the object in the direction the user is facing in real space can be identified. This object is the object the user is gazing at. Using this, for example, content related to the object the user is gazing at can be selected and displayed.
In this case, the output of the eye tracking device 508 and the detection result of the object detection unit 105 are input to the display content selection unit 502. Using the user's gaze direction and the pixel position of each object, the display content selection unit 502 identifies the object the user is gazing at and selects content related to that object.
With this configuration, content related to an object the user is interested in can be displayed in association with that object without any instruction from the instruction receiving unit 506. The eye tracking device 508 thus extends the ways in which the user can provide input.
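One way to realize this gaze-based selection is to compare the gaze point, once mapped into the captured-image frame, with the pixel positions reported by the object detection unit 105. A minimal sketch follows; the distance threshold and the mapping of the gaze point into image coordinates are assumptions.

```python
import math

def gazed_object(gaze_px, object_pixels, max_dist_px=60.0):
    """Return the id of the detected object closest to the gaze point in image coordinates.

    gaze_px       : (u, v) gaze point mapped into the captured-image frame
    object_pixels : dict mapping object id -> (u, v) pixel position from the object detection unit
    max_dist_px   : reject matches farther than this threshold (illustrative tuning value)
    """
    best_id, best_dist = None, max_dist_px
    for obj_id, (u, v) in object_pixels.items():
        dist = math.hypot(u - gaze_px[0], v - gaze_px[1])
        if dist < best_dist:
            best_id, best_dist = obj_id, dist
    return best_id
```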
Display examples based on the display mode and display position determined by the display content generation unit 501 and the superimposing unit 503 are described below.
FIG. 14(a) shows an example of display positions. Here, a building 711, a sign 712, and a guide board 713 are illustrated as examples of objects existing in real space (real objects). The actual positions and appearances of these real objects are known.
In the above embodiment, the superimposing unit 503 determines the display positions so that the related content 811, 812, and 813 is displayed based on the pixel positions, on the display 522, of the evaluation objects corresponding to the respective real objects. FIG. 14(a) shows an example in which the content is superimposed on the real objects.
The display content generation unit 501 and the superimposing unit 503 may classify the display content according to its meta information and determine the display mode and display position according to the classification result, for example according to update frequency, the timing at which the content becomes necessary, or importance (a sketch of one such mapping is given after the list of regions below). A display example for this case is shown in FIG. 14(b).
For example, when the meta information indicates that the content is static content that does not depend on the user's position, the display mode and display position are determined so that it is shown in a distant display region 731, such as the sky, or in a display region in which the bonnet is visible (near display region) 732.
When the content is judged to be information about objects the vehicle will encounter or about the direction of travel, the display mode and display position are determined so that it is shown, for example, in the road display region 733, a display region along the road.
When the content is judged to be information that needs to be followed, the display mode and display position are determined so that it is shown, for example, in the side display region 734 located to the side of the user's direction of travel.
When the content is judged to be important, the display mode and display position are determined so that it is shown in the aerial display region 735, where no traffic signals or signs are visible.
The display mode and display position may also be determined so that content is displayed in the display region directly in front of the windshield (front display region) 736.
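A possible encoding of this meta-information-driven region assignment is sketched below. The classification keys are assumptions used only for illustration, while the region numerals follow FIG. 14(b).

```python
# Display regions of FIG. 14(b), identified here by their reference numerals.
DISTANT_731, NEAR_732, ROAD_733, SIDE_734, AERIAL_735, FRONT_736 = 731, 732, 733, 734, 735, 736

def choose_region(meta: dict) -> int:
    """Map content meta information to one of the display regions."""
    if meta.get("static", False):             # position-independent content
        return DISTANT_731                    # (or NEAR_732, depending on layout)
    if meta.get("route_related", False):      # upcoming objects / direction of travel
        return ROAD_733
    if meta.get("needs_following", False):    # information the user must keep track of
        return SIDE_734
    if meta.get("important", False):          # high-priority content
        return AERIAL_735
    return FRONT_736
```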
Because the automobile is moving, the scene changes constantly. However, the distant region 731 and the near region 732 are hardly affected by the movement or by the scene and change little. That is, the distant region 731 and the near region 732 are regions in which the apparent change of the real space accompanying the movement of the automobile is small (small-change regions). Displaying static content that does not depend on the user's position in a small-change region eliminates the need to correct the content image in response to changes in the scene, reducing the processing load.
In particular, for content that is updated infrequently in response to the user's movement, the processing performed by the superimposing unit 503 to calculate the display position in accordance with the real space can be reduced. Displaying static content in the small-change regions therefore keeps the calculation cost down.
Even when multiple pieces of AR content are displayed, preferentially displaying AR content with a low update frequency in the small-change regions efficiently reduces the AR superimposition processing without giving the user a sense of incongruity, thereby reducing the processing load of the AR display device 500.
Displaying information that will be needed later in the road display region 733 makes it easier for the user to grasp it intuitively. The side display region 734 remains in place even when an object has flowed past and left the forward view as the vehicle moves; displaying information that needs to be followed in the side display region 734 therefore allows the user to keep following it even as the automobile moves. Displaying important information in the aerial display region 735 provides high readability without impairing the visibility of the real-space scene. Displaying content in the front display region 736 allows the user to view it without moving the line of sight far from the front.
As described above, automatically determining the display mode and display position from the meta information enables AR superimposed display with high visibility and readability for the user, and this can be achieved without any instruction from the user.
The classification of display content is not limited to classification by meta information. For example, it may be performed according to a predetermined length of time for which the display of the content is to be maintained.
The driving mode of the moving body may also be determined, and the display mode and/or display position determined according to the result. For example, content is displayed in the front display region 736 only in the automated driving mode. In this case, the display content generation unit 501 and/or the superimposing unit 503 are configured to receive a signal indicating whether the automated driving mode is active, for example from the ECU of the moving body.
The automated driving mode of the moving body is a mode in which the moving body drives itself without the user having to drive actively. In this mode the user does not need to pay attention to the actual surrounding scene. During actual driving, however, the surrounding traffic environment, the time, the location, and so on may require the user to drive actively and to pay attention to the actual surroundings.
For example, in the automated driving mode, when content is viewed on a hand-held information terminal, the line of sight turns toward the hand and away from the road ahead. With this configuration, the content is displayed directly in front of the user, so the content can be viewed without taking the line of sight off the road. Even while viewing content during automated driving, the user's line of sight remains forward. Therefore, even if the automated driving mode is released and the user's attention must be drawn to the actual scene, attention can be directed forward smoothly.
As another form, the display mode and display position of the content shown on the display 522 may be changed according to the level of attention required of the user.
In this case, the display content generation unit 501 and/or the superimposing unit 503 receive signals such as the user's level of alertness, degree of fatigue, and driving state from sensors attached to the moving body or to the user. They may further receive information such as the surrounding traffic conditions known to the automobile or to the navigation system mounted on it. The display content generation unit 501 and/or the superimposing unit 503 combine these to determine the attention level.
For example, when no attention is required, as in the automated driving mode, the displayed content is not restricted. As the attention level rises and attention becomes necessary, the display mode and/or display position are determined so that the displayed content is simplified. With this configuration, as attention becomes more necessary, the amount of information in the content decreases but its readability increases. The need to gaze at the content therefore decreases, and the user can obtain information from the content while still paying attention elsewhere.
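As one possible reading of this behavior, the sketch below reduces the fields of a content record as the attention level rises; the level scale and the field names are assumptions, not part of the embodiment.

```python
def simplify_for_attention(content, attention_level):
    """Reduce the displayed information as the required attention level rises.

    attention_level: 0 = no attention required (e.g. automated driving),
                     1 = moderate, 2 = high, 3 = display suppressed.
    The field names ("title", "body") are illustrative only.
    """
    if attention_level >= 3:
        return None                                   # stop displaying entirely
    if attention_level == 2:
        return {"title": content["title"]}            # headline only, large and readable
    if attention_level == 1:
        return {"title": content["title"], "body": content["body"]}
    return content                                    # unrestricted display
```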
For example, as shown in FIG. 15(a), when the attention level reaches or exceeds a predetermined level, the content is displayed in a retreat region 737. The retreat region 737 is a region that does not interfere with driving, for example a part of the windshield outside the area directly ahead. The retreat region 737 may be set smaller than the original display region.
Display in the retreat region 737 is performed, for example, when the mode switches from the automated driving mode to the normal driving mode.
To prevent the user from being confused by a sudden change in the image being viewed, the retreat or reduction may be performed as a continuous transition. Furthermore, the display content generation unit 501 and/or the superimposing unit 503 may be configured to detect in advance, via the navigation system or the like, a situation in which the mode will switch to the normal driving mode, and to announce the upcoming retreat.
The announcement is made, for example, by displaying a warning message in the warning region 743. Instead of a warning message, a change in color tone, a countdown display, or an audio warning may be used.
The display of content may be stopped entirely when the attention level reaches or exceeds a predetermined level, or when a predetermined condition is satisfied, for example when the automated driving mode is released.
In that case, the display content generation unit 501 and/or the superimposing unit 503 increase the transparency of the displayed content, so that to the user the displayed content appears to fade out and only the actual scene ahead remains visible. Before increasing the transparency, an advance notice may be displayed under the control of the display content generation unit 501 and/or the superimposing unit 503, as described above.
Conversely, when the attention level falls below the predetermined level, or when a predetermined condition is satisfied, for example when the automated driving mode is engaged, the display restriction may be released as shown in FIG. 15(b).
The display content generation unit 501 and/or the superimposing unit 503 release the display restriction, for example while in the automated driving mode, and determine the display mode and display position so that the front display region 736 is enlarged and the content is displayed there. At this time, the meta information may be used to display entertainment content that requires sustained attention, such as documents or videos, on a large screen.
The control that changes the display regions may be performed by detecting the driving location, the traffic conditions, and the user's state. The user's level of driving skill may also be added to the criteria.
In this way, in this modification of the present embodiment, the display region and display method of the content can be changed according to environmental conditions, driving conditions, and the like. Since the AR display device 500 of this modification can switch the content display, it can achieve both forward attention and readability.
A display example using the user's gaze direction when the eye tracking device 508 is provided is also described. FIG. 16(a) is a diagram for explaining a method of associating a real object with content.
Content related to a real object 714 located at an angle to the imaging unit 102 is displayed in an arbitrary display region. FIG. 16(a) illustrates, for example, the case where it is displayed in a display region 741 directly facing the user. Here, to associate the real object 714 with the display region 741, for example, a ribbon-like leader effect 751 is displayed, as if the content were being drawn out of the real object 714.
The superimposing unit 503 first obtains the coordinates of the display position of the real object 714 on the display 522 from the direction of the user's gaze and the relative positions of the AR display device 500 and the real object 714. It then generates a ribbon-like image that connects the display position of the real object 714 with the display coordinates of the display region 741.
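The leader effect 751 only needs to connect two display-coordinate points. A minimal sketch that generates such a connector as a slightly curved polyline is shown below; the quadratic Bezier curve, the sag amount, and the sample count are chosen purely for illustration, since the text above only requires that the two positions be visually connected.

```python
def leader_polyline(object_xy, panel_xy, sag_px=40.0, samples=16):
    """Points of a simple curved leader line (effect 751) from the object's
    on-display position to the information panel, both given as 2D display coordinates."""
    (x0, y0), (x1, y1) = object_xy, panel_xy
    # Control point offset downward so the ribbon sags slightly instead of being a straight segment.
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0 + sag_px
    points = []
    for i in range(samples + 1):
        t = i / samples
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        points.append((x, y))
    return points
```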
The leader effect 751 may instead be a string-like line; the thinner the string, the less it obstructs the real-space scene. The leader effect 751 may also be translucent, in which case the association can be made clear without interfering with the actual scene. The string may connect the real object 714 and the display region 741 by a detour so as not to obstruct the field of view.
Such a display method may be applied depending on the content to be displayed. For example, applying it only after judging that the content to be displayed is information the user is interested in suppresses any loss of readability for other AR content.
For this judgment, for example, the user's gaze direction detected by the eye tracking device 508 is used: the degree of the user's interest is estimated from the degree of agreement between the gaze direction and the display position of the real object. The degree of interest may also be determined from an active selection operation by the user, in which case the user's intention can be reflected. This method may also be used to accumulate the user's selection results for use in other processing.
As another method of associating a real object with content, a reference marker 761 may be used. The display content generation unit 501 and/or the superimposing unit 503 superimpose the reference marker 761 on the real object 714, and the content related to the real object 714 is displayed in an arbitrary information display region 742. The reference marker 761 is also displayed in the information display region 742 at the same time.
With the above configuration, content can be displayed with good readability.
The configuration in which AR content related to an object is displayed using the leader effect 751 described above is particularly useful when the object itself carries information, for example when the real object 715 is a signboard, as shown in FIG. 16(b).
A signboard contains information in itself, and the content carries further information in addition to it. The display content generation unit 501 and/or the superimposing unit 503 determine the display mode and display position so that the content related to the real object 715 is displayed in an information display region 742 set at an arbitrary position. At this time, the display content generation unit 501 and/or the superimposing unit 503 display the leader effect 751 between the real object 715 and the information display region 742.
With this configuration, the user can obtain more information than by merely looking at the signboard.
For example, an image obtained by photographing an object may be used as the AR content to be displayed. As shown in FIG. 17(a), an image of the guide board 713 is displayed in a display region 744 as content related to the guide board 713. The display region 744 may be at an arbitrary position, but its size is made larger than the guide board 713, which makes it easier for the user to grasp the information on the guide board 713. To associate the guide board 713 with the display region 744, a reference marker 761 may be displayed on both.
In this case, the object detection unit 105 detects the guide board 713 in the image captured by the imaging unit 102, the display content selection unit 502 selects the image of the guide board 713 as the display content, and the display content generation unit 501 and/or the superimposing unit 503 determine the display mode and display position so as to realize the above display.
The AR content to be displayed may also be layered. As shown in FIG. 17(b), a group of related content items 821 to 825 is displayed in a single display region 745. Here, the case where five content items are displayed is illustrated, but the number of content items displayed in one display region 745 is arbitrary.
The set of content items to be displayed is selected and generated by the display content selection unit 502, which generates the set using, for example, the meta information.
For example, suppose the content items 821 to 825 are advertisements for a certain product. Each item consists of a company name, an icon representing the company, a product description, a price, a reference URL, and so on. Rather than displaying all of these at once, the display content selection unit 502, the display content generation unit 501, and the superimposing unit 503 determine the selection timing, display mode, and display position so that they are displayed in accordance with the user's selections and at appropriate times.
The content display is often updated in real time, so a large amount of information displayed at once can hinder the user's understanding. To avoid this, the content presented to the user at any one time is simplified and shown in sequence. The layered set of content items is used in such cases, allowing the information contained in the content to be presented with good readability.
Furthermore, the content items in a content set may be arranged as a story that advances in chronological order, or given entertainment value, for example by branching the story according to the user's route. These behaviors are realized, for example, by the selections of the display content selection unit 502 based on the meta information.
Content with a mascot may also be displayed. Mascot-attached content consists of content 831 and a mascot 841. For example, when the user's attention is to be drawn to advertisement content 831, a popular mascot 841 is added to it and displayed.
For example, the eye tracking device 508 determines whether the user's gaze has remained in the display region 746 of the content 831 for a predetermined period. If it has, the display content generation unit 501 and the superimposing unit 503 display the mascot 841 in the display region 746. This enables an operation in which the mascot 841 is displayed only when the user has read the content 831, which effectively attracts the user's interest to the content 831.
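The dwell-based trigger described here can be sketched as a small state machine over gaze samples; the two-second threshold and the rectangular region representation are assumptions for illustration.

```python
import time

class DwellTrigger:
    """Fire once when the gaze stays inside a display region for `dwell_s` seconds.
    The region is an axis-aligned rectangle (x, y, w, h) in display coordinates."""

    def __init__(self, region, dwell_s=2.0):
        self.region = region
        self.dwell_s = dwell_s
        self.entered_at = None
        self.fired = False

    def update(self, gaze_xy, now=None):
        """Feed one gaze sample; returns True the moment the dwell threshold is reached."""
        now = time.monotonic() if now is None else now
        x, y, w, h = self.region
        inside = x <= gaze_xy[0] <= x + w and y <= gaze_xy[1] <= y + h
        if not inside:
            self.entered_at = None
            return False
        if self.entered_at is None:
            self.entered_at = now
        if not self.fired and now - self.entered_at >= self.dwell_s:
            self.fired = True          # e.g. show the mascot 841 in region 746
            return True
        return False
```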
For example, if the mascot 841 is a game character and can be used in the game after being displayed, the viewing rate of the content 831 increases. In this way, content display that is readable for the user and carries high added value can be realized.
In addition to being shown on the display 522, the display content may be transmitted to and stored outside the device. The destination may be, for example, an information processing terminal 910 as shown in FIG. 18(a), or another storage device connected to the network; transmission is performed via the gateway 110. Sending information to another storage device for accumulation avoids consuming the capacity of the information processing terminal 910. This configuration makes the displayed information reusable: the displayed content can be used later, which increases convenience for the user.
It may also be possible to choose whether or not to accumulate the information. In this case, an instruction is received, for example, via the instruction receiving unit 506, and the display content selection unit 502, for example, transmits the designated content from among the selected content.
The selection instruction may be received via the motion recognition device described above. For example, when it is received by voice, a number, a simple character, or the like may be added to each displayed content item as an identifier so that the item is easy to designate. The identifier is assigned, for example, by the display content selection unit 502 or the display content generation unit 501, which adds it using the meta information, by assigning predetermined characters or numbers in order, or by similar means.
The content may also be something that decorates an object, and the decoration may have a strong entertainment character.
A display example for this case is shown in FIG. 10. In this example, content 816 and 817 is displayed at positions corresponding to a real building 716 and a car 717, respectively, decorating them.
For example, if the content 816 and 817 is provided in a design with a sense of unity matched to the destination, the mood can be built up already during the drive. To create a consistent world view, the navigation guidance voice can be switched or voice output suppressed at this time. Replacing the actual scene with the content 816 and 817 in this way enhances the entertainment value. The content 816 and 817 to be displayed is selected by the display content selection unit 502.
The present embodiment has been described taking as an example the case where the AR display device 500 is mounted on a moving body such as an automobile, but it is not limited to this. For example, it may be mounted on an HMD and used by a pedestrian.
Furthermore, an ordinary screen or the like may be used as the display 522, superimposed on other video, and used indoors.
The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments are described in detail in order to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Further, other configurations can be added to, deleted from, or substituted for part of the configuration of each embodiment.
DESCRIPTION OF SYMBOLS
100: position and orientation detection device, 101: control unit, 102: imaging unit, 103: coarse detection sensor, 104: evaluation object extraction unit, 105: object detection unit, 106: direction calculation unit, 107: position and orientation calculation unit, 108: map management unit, 109: two-dimensional map generation unit, 110: gateway, 121: captured image holding unit, 122: positioning information holding unit, 123: two-dimensional map information holding unit, 124: map information holding unit, 131: lens, 132: image sensor, 133: optical axis, 141: CPU, 142: memory, 143: storage device, 145: communication I/F,
210: two-dimensional map information, 211: evaluation object candidate, 212: evaluation object candidate, 213: evaluation object candidate, 220: captured image, 221: evaluation object, 222: evaluation object, 223: evaluation object, 225: area, 230: map information, 231: real object, 232: real object, 233: real object, 234: road, 241: circle, 242: circle, 251: shape, 252: trapezoid, 261: front shape,
310: template image, 311: template image, 312: template image, 313: template image, 314: template image, 315: template image, 316: template image, 321: template image, 322: template image, 323: template image, 324: template image, 325: template image, 331: representative point,
500: AR display device, 501: display content generation unit, 502: display content selection unit, 503: superimposition unit, 504: display unit, 506: instruction reception unit, 507: extraction unit, 508: eye tracking device, 511: content holding unit, 521: projector, 522: display, 530: user, 531: field-of-view range, 541: imaging range, 551: line of sight, 552: intersection, 553: straight line,
610: content server system, 611: content server, 612: content storage unit, 613: AR content storage unit, 614: advertisement storage unit, 615: communication I/F, 620: system, 620: map server system, 621: map server, 622: map information storage unit, 623: two-dimensional map information storage unit, 624: object, 625: communication I/F,
711: building, 712: sign, 713: guide board, 714: real object, 715: real object, 716: building, 717: car, 731: distant area, 732: near area, 733: road display area, 734: side display area, 735: aerial display area, 736: front display area, 737: retraction area, 741: display area, 742: information display area, 743: warning area, 744: display area, 745: same display area, 745: display area, 746: display area, 751: pull-out effect, 761: reference marker,
811: related content, 812: related content, 813: related content, 816: content, 817: content, 821: content, 822: content, 823: content, 824: content, 825: content, 831: content, 841: mascot, 910: information processing terminal
Claims (16)
- A position and orientation detection device comprising: an imaging unit that captures a predetermined imaging range including two or more objects; an object detection unit that identifies a pixel position of each of the objects in a captured image captured by the imaging unit; a direction calculation unit that calculates an object direction, which is the direction of each of the objects with respect to the imaging unit, using the pixel position, map information of each of the objects, and a focal length of the imaging unit; and a position and orientation calculation unit that calculates a position and orientation of the imaging unit using the object direction of each object and the map information, wherein the object detection unit extracts two-dimensional map information corresponding to the imaging range from two-dimensional map information in which positions and shapes of a plurality of objects within a predetermined area are stored, and identifies the pixel positions using the extracted two-dimensional map information.
- The position and orientation detection device according to claim 1, wherein the position and orientation calculation unit calculates the position and orientation using positioning information indicating an approximate position of the imaging unit.
- The position and orientation detection device according to claim 1, wherein the position and orientation calculation unit calculates the position and orientation using the object directions of three of the objects.
- The position and orientation detection device according to claim 1, wherein the position and orientation calculation unit calculates the position and orientation using the object directions of two of the objects and the map information of a predetermined element within the imaging range.
- The position and orientation detection device according to claim 1, wherein the position and orientation calculation unit calculates the position and orientation using the object directions of two of the objects and an amount of deformation of the shapes of those objects in the captured image.
- The position and orientation detection device according to claim 1, further comprising a coarse detection sensor that detects an approximate position and orientation of the imaging unit, wherein the object detection unit acquires the two-dimensional map information using the approximate position and orientation detected by the coarse detection sensor.
- The position and orientation detection device according to claim 1, further comprising a two-dimensional map generation unit that generates the two-dimensional map information using a captured image captured by the imaging unit, wherein the two-dimensional map generation unit calculates a real-space position of each object in the captured image using the pixel position of the object in the captured image identified by the object detection unit and the position and orientation of the imaging unit calculated by the position and orientation calculation unit, and generates the two-dimensional map information by associating the appearance of each object with its real-space position.
- An AR display device that displays content on a display having transparency and reflectivity, in association with an object in the scene behind the display, the AR display device comprising: the position and orientation detection device according to claim 1; a display content generation unit that generates the content to be displayed on the display; a superimposition unit that determines a display position of the generated content on the display using the position and orientation of the imaging unit determined by the position and orientation detection device and the pixel position of the object identified by the object detection unit; and a display unit that displays the generated content at the display position on the display determined by the superimposition unit.
- An AR display device that displays content on a display having transparency and reflectivity, in association with an object in the scene behind the display, the AR display device comprising: an object detection unit that identifies a pixel position of each of the objects in a captured image obtained by capturing a predetermined imaging range including two or more of the objects; a direction calculation unit that calculates an object direction, which is the direction of each of the objects with respect to the imaging device that acquired the captured image, using the pixel position, map information of each of the objects, and a focal length of the imaging device; a position and orientation calculation unit that calculates a position and orientation of the imaging device using the object direction of each object and the map information; a display content generation unit that generates the content to be displayed on the display; a superimposition unit that determines a display position of the generated content on the display using the position and orientation of the imaging device calculated by the position and orientation calculation unit and the pixel position of the object identified by the object detection unit; and a display unit that displays the generated content at the display position on the display determined by the superimposition unit, wherein the object detection unit extracts two-dimensional map information corresponding to the imaging range from two-dimensional map information in which positions and shapes of a plurality of objects within a predetermined area are stored, and identifies the pixel positions using the extracted two-dimensional map information.
- The AR display device according to claim 8 or 9, further comprising a display content selection unit that selects the content to be displayed on the display in accordance with a predetermined display condition.
- The AR display device according to claim 8 or 9, further comprising an eye tracking device that outputs a line-of-sight direction and a field of view of a user, wherein the superimposition unit determines the display position so that the content is displayed in an area on the display specified by the line-of-sight direction and the field of view.
- The AR display device according to claim 8 or 9, wherein the superimposition unit determines the display position of the content in accordance with meta information held in advance by the content.
- The AR display device according to claim 8 or 9, wherein the AR display device is mounted on a moving body.
- The AR display device according to claim 13, wherein the superimposition unit determines the display position on the display in accordance with a driving mode of the moving body.
- A position and orientation detection method comprising: capturing, with an imaging unit, an imaging range that is a predetermined range including two or more objects to obtain a captured image; identifying a pixel position of each of the objects in the captured image; calculating an object direction, which is the direction of each of the objects with respect to the imaging unit, using the pixel position, map information of each of the objects, and a focal length of the imaging unit; and calculating a position and orientation of the imaging unit using the object direction of each object and the map information, wherein the pixel positions are identified by extracting two-dimensional map information corresponding to the imaging range from two-dimensional map information in which positions and shapes of a plurality of objects within a predetermined area are stored, and using the extracted two-dimensional map information.
- An AR display method for displaying content on a display having transparency and reflectivity, in association with an object in the scene behind the display, the method comprising: generating the content to be displayed on the display; calculating, using a captured image that includes the object and is captured by an imaging unit, a pixel position of the object in the captured image and a position and orientation of the imaging unit, and determining a display position of the content on the display using the pixel position and the position and orientation of the imaging unit; and displaying the content at the determined display position on the display.
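To make the geometry recited in claims 1 and 15 concrete, the sketch below is a minimal, hypothetical illustration (Python, not taken from the publication) of one way an object direction could be derived from its pixel position and the focal length of an undistorted pinhole camera, and of how bearings to objects with known map coordinates could then constrain the camera position and yaw by nonlinear least squares; the planar formulation, the intrinsics, and the solver choice are all assumptions, not necessarily the method described here.

```python
import numpy as np
from scipy.optimize import least_squares


def bearing_from_pixel(u: np.ndarray, cx: float, fx: float) -> np.ndarray:
    """Horizontal angle (rad) of an object relative to the optical axis, from its
    pixel column u and the intrinsics (principal point cx, focal length fx, both
    in pixels). Counterclockwise positive: objects left of the image centre get
    positive angles. Assumes an undistorted pinhole camera."""
    return np.arctan2(cx - u, fx)


def resect_pose(map_xy: np.ndarray, bearings: np.ndarray, guess: np.ndarray) -> np.ndarray:
    """Estimate the camera state (x, y, yaw) from bearings to known map points.

    map_xy   : (N, 2) map coordinates of the detected objects (N >= 3, or N == 2
               when combined with extra constraints such as coarse positioning).
    bearings : (N,) measured angles of the objects w.r.t. the optical axis.
    guess    : (3,) initial (x, y, yaw), e.g. from a coarse detection sensor.
    """
    def residuals(p):
        x, y, yaw = p
        predicted = np.arctan2(map_xy[:, 1] - y, map_xy[:, 0] - x) - yaw
        diff = predicted - bearings
        return np.arctan2(np.sin(diff), np.cos(diff))   # wrap to [-pi, pi]

    return least_squares(residuals, guess).x


# Hypothetical check: synthesize bearings from a known pose and recover it.
landmarks = np.array([[10.0, 50.0], [40.0, 60.0], [30.0, 20.0]])   # map positions
true_pose = np.array([25.0, 5.0, np.deg2rad(80.0)])                # x, y, yaw
bearings = np.arctan2(landmarks[:, 1] - true_pose[1],
                      landmarks[:, 0] - true_pose[0]) - true_pose[2]
estimate = resect_pose(landmarks, bearings, guess=np.array([20.0, 0.0, 1.0]))
# estimate is approximately true_pose; in practice the bearings would come from
# bearing_from_pixel(u, cx, fx) for each detected object's pixel column.
```

In this planar sketch, three bearings fully determine (x, y, yaw), which loosely mirrors the three-object case of claim 3; with only two bearings, an additional constraint such as coarse positioning information (claim 2) or a further known element in the imaging range (claim 4) would be needed.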
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/001426 WO2018134897A1 (en) | 2017-01-17 | 2017-01-17 | Position and posture detection device, ar display device, position and posture detection method, and ar display method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/001426 WO2018134897A1 (en) | 2017-01-17 | 2017-01-17 | Position and posture detection device, ar display device, position and posture detection method, and ar display method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018134897A1 true WO2018134897A1 (en) | 2018-07-26 |
Family
ID=62908970
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/001426 WO2018134897A1 (en) | 2017-01-17 | 2017-01-17 | Position and posture detection device, ar display device, position and posture detection method, and ar display method |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018134897A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110069136A (en) * | 2019-04-29 | 2019-07-30 | 努比亚技术有限公司 | A kind of wearing state recognition methods, equipment and computer readable storage medium |
CN110399039A (en) * | 2019-07-03 | 2019-11-01 | 武汉子序科技股份有限公司 | A kind of actual situation scene fusion method based on eye-tracking |
CN111665943A (en) * | 2020-06-08 | 2020-09-15 | 浙江商汤科技开发有限公司 | Pose information display method and device |
CN112711982A (en) * | 2020-12-04 | 2021-04-27 | 科大讯飞股份有限公司 | Visual detection method, equipment, system and storage device |
TWI731624B (en) * | 2020-03-18 | 2021-06-21 | 宏碁股份有限公司 | Method for estimating position of electronic device, electronic device and computer device |
WO2022161140A1 (en) * | 2021-01-27 | 2022-08-04 | 上海商汤智能科技有限公司 | Target detection method and apparatus, and computer device and storage medium |
US20230249618A1 (en) * | 2017-09-22 | 2023-08-10 | Maxell, Ltd. | Display system and display method |
JP7578165B2 (en) | 2018-08-29 | 2024-11-06 | トヨタ自動車株式会社 | Vehicle display device |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006287435A (en) * | 2005-03-31 | 2006-10-19 | Pioneer Electronic Corp | Information processing apparatus, system thereof, method thereof, program thereof, and recording medium with the program recorded thereon |
JP2007121528A (en) * | 2005-10-26 | 2007-05-17 | Fujifilm Corp | System and method for renewing map creation |
JP2008287379A (en) * | 2007-05-16 | 2008-11-27 | Hitachi Ltd | Road sign data input system |
JP2010066042A (en) * | 2008-09-09 | 2010-03-25 | Toshiba Corp | Image irradiating system and image irradiating method |
JP2011053163A (en) * | 2009-09-04 | 2011-03-17 | Clarion Co Ltd | Navigation device and vehicle control device |
JP2011169808A (en) * | 2010-02-19 | 2011-09-01 | Equos Research Co Ltd | Driving assist system |
JP2012035745A (en) * | 2010-08-06 | 2012-02-23 | Toshiba Corp | Display device, image data generating device, and image data generating program |
JP2014009993A (en) * | 2012-06-28 | 2014-01-20 | Navitime Japan Co Ltd | Information processing system, information processing device, server, terminal device, information processing method, and program |
JP2015217798A (en) * | 2014-05-16 | 2015-12-07 | 三菱電機株式会社 | On-vehicle information display control device |
JP2016070716A (en) * | 2014-09-29 | 2016-05-09 | 三菱電機株式会社 | Information display control system and information display control method |
JP2016090557A (en) * | 2014-10-31 | 2016-05-23 | 英喜 菅沼 | Positioning system for movable body |
- 2017-01-17 WO PCT/JP2017/001426 patent/WO2018134897A1/en active Application Filing
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006287435A (en) * | 2005-03-31 | 2006-10-19 | Pioneer Electronic Corp | Information processing apparatus, system thereof, method thereof, program thereof, and recording medium with the program recorded thereon |
JP2007121528A (en) * | 2005-10-26 | 2007-05-17 | Fujifilm Corp | System and method for renewing map creation |
JP2008287379A (en) * | 2007-05-16 | 2008-11-27 | Hitachi Ltd | Road sign data input system |
JP2010066042A (en) * | 2008-09-09 | 2010-03-25 | Toshiba Corp | Image irradiating system and image irradiating method |
JP2011053163A (en) * | 2009-09-04 | 2011-03-17 | Clarion Co Ltd | Navigation device and vehicle control device |
JP2011169808A (en) * | 2010-02-19 | 2011-09-01 | Equos Research Co Ltd | Driving assist system |
JP2012035745A (en) * | 2010-08-06 | 2012-02-23 | Toshiba Corp | Display device, image data generating device, and image data generating program |
JP2014009993A (en) * | 2012-06-28 | 2014-01-20 | Navitime Japan Co Ltd | Information processing system, information processing device, server, terminal device, information processing method, and program |
JP2015217798A (en) * | 2014-05-16 | 2015-12-07 | 三菱電機株式会社 | On-vehicle information display control device |
JP2016070716A (en) * | 2014-09-29 | 2016-05-09 | 三菱電機株式会社 | Information display control system and information display control method |
JP2016090557A (en) * | 2014-10-31 | 2016-05-23 | 英喜 菅沼 | Positioning system for movable body |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230249618A1 (en) * | 2017-09-22 | 2023-08-10 | Maxell, Ltd. | Display system and display method |
US12198399B2 (en) * | 2017-09-22 | 2025-01-14 | Maxell, Ltd. | Display system and display method |
JP7578165B2 (en) | 2018-08-29 | 2024-11-06 | トヨタ自動車株式会社 | Vehicle display device |
CN110069136A (en) * | 2019-04-29 | 2019-07-30 | 努比亚技术有限公司 | A kind of wearing state recognition methods, equipment and computer readable storage medium |
CN110069136B (en) * | 2019-04-29 | 2022-10-11 | 中食安泓(广东)健康产业有限公司 | Wearing state identification method and equipment and computer readable storage medium |
CN110399039A (en) * | 2019-07-03 | 2019-11-01 | 武汉子序科技股份有限公司 | A kind of actual situation scene fusion method based on eye-tracking |
TWI731624B (en) * | 2020-03-18 | 2021-06-21 | 宏碁股份有限公司 | Method for estimating position of electronic device, electronic device and computer device |
CN111665943A (en) * | 2020-06-08 | 2020-09-15 | 浙江商汤科技开发有限公司 | Pose information display method and device |
CN111665943B (en) * | 2020-06-08 | 2023-09-19 | 浙江商汤科技开发有限公司 | Pose information display method and device |
CN112711982A (en) * | 2020-12-04 | 2021-04-27 | 科大讯飞股份有限公司 | Visual detection method, equipment, system and storage device |
WO2022161140A1 (en) * | 2021-01-27 | 2022-08-04 | 上海商汤智能科技有限公司 | Target detection method and apparatus, and computer device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018134897A1 (en) | Position and posture detection device, ar display device, position and posture detection method, and ar display method | |
US11373357B2 (en) | Adjusting depth of augmented reality content on a heads up display | |
US10029700B2 (en) | Infotainment system with head-up display for symbol projection | |
US8773534B2 (en) | Image processing apparatus, medium recording image processing program, and image processing method | |
EP2208021B1 (en) | Method of and arrangement for mapping range sensor data on image sensor data | |
US8395490B2 (en) | Blind spot display apparatus | |
JP5443134B2 (en) | Method and apparatus for marking the position of a real-world object on a see-through display | |
JP6176541B2 (en) | Information display device, information display method, and program | |
US20120224060A1 (en) | Reducing Driver Distraction Using a Heads-Up Display | |
US20140285523A1 (en) | Method for Integrating Virtual Object into Vehicle Displays | |
CN108460734A (en) | The system and method that vehicle driver's supplementary module carries out image presentation | |
KR101573576B1 (en) | Image processing method of around view monitoring system | |
EP3942794A1 (en) | Depth-guided video inpainting for autonomous driving | |
JP2007080060A (en) | Object specification device | |
US12198238B2 (en) | Method and arrangement for producing a surroundings map of a vehicle, textured with image information, and vehicle comprising such an arrangement | |
JPWO2016031229A1 (en) | Road map creation system, data processing device and in-vehicle device | |
JP2004265396A (en) | Image forming system and image forming method | |
JP5086824B2 (en) | TRACKING DEVICE AND TRACKING METHOD | |
CN115176457A (en) | Image processing apparatus, image processing method, program, and image presentation system | |
JP2009077022A (en) | Driving support system and vehicle | |
CN113011212B (en) | Image recognition method and device and vehicle | |
CN111241946B (en) | Method and system for increasing FOV (field of view) based on single DLP (digital light processing) optical machine | |
CN119105717A (en) | Content display method, electronic equipment and medium | |
CN111243102B (en) | Method and system for improving and increasing FOV (field of view) based on diffusion film transformation | |
US20240426623A1 (en) | Vehicle camera system for view creation of viewing locations |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17892855; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 17892855; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: JP