WO2022220306A1 - Video display system, information processing device, information processing method, and program - Google Patents
Video display system, information processing device, information processing method, and program
- Publication number
- WO2022220306A1 (PCT/JP2022/018087)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- display
- video
- unit
- information
- Prior art date
Links
- 230000010365 information processing Effects 0.000 title claims description 38
- 238000003672 processing method Methods 0.000 title claims description 11
- 238000003384 imaging method Methods 0.000 claims abstract description 33
- 238000004364 calculation method Methods 0.000 claims abstract description 20
- 238000012545 processing Methods 0.000 claims description 84
- 230000033001 locomotion Effects 0.000 claims description 72
- 230000006870 function Effects 0.000 claims description 44
- 239000003550 marker Substances 0.000 claims description 20
- 230000005540 biological transmission Effects 0.000 claims description 19
- 230000000007 visual effect Effects 0.000 claims description 13
- 238000010276 construction Methods 0.000 claims description 4
- 238000010191 image analysis Methods 0.000 claims description 3
- 230000008859 change Effects 0.000 abstract description 12
- 238000010586 diagram Methods 0.000 description 102
- 238000004891 communication Methods 0.000 description 75
- 238000001514 detection method Methods 0.000 description 63
- 230000006835 compression Effects 0.000 description 46
- 238000007906 compression Methods 0.000 description 46
- 238000000926 separation method Methods 0.000 description 38
- 238000000034 method Methods 0.000 description 28
- 239000011521 glass Substances 0.000 description 19
- 230000008569 process Effects 0.000 description 16
- 230000005236 sound signal Effects 0.000 description 14
- 230000000694 effects Effects 0.000 description 9
- 238000006243 chemical reaction Methods 0.000 description 8
- 230000003287 optical effect Effects 0.000 description 8
- 238000013507 mapping Methods 0.000 description 5
- 230000008901 benefit Effects 0.000 description 4
- 239000000470 constituent Substances 0.000 description 4
- 210000003128 head Anatomy 0.000 description 4
- 238000007689 inspection Methods 0.000 description 4
- 230000009467 reduction Effects 0.000 description 4
- 230000009471 action Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 3
- 238000004590 computer program Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 239000000284 extract Substances 0.000 description 3
- 210000000887 face Anatomy 0.000 description 3
- 230000004807 localization Effects 0.000 description 3
- 230000002093 peripheral effect Effects 0.000 description 3
- 230000002194 synthesizing effect Effects 0.000 description 3
- 238000003491 array Methods 0.000 description 2
- 230000006837 decompression Effects 0.000 description 2
- 201000003152 motion sickness Diseases 0.000 description 2
- 238000001454 recorded image Methods 0.000 description 2
- 210000001525 retina Anatomy 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 206010028813 Nausea Diseases 0.000 description 1
- 230000001133 acceleration Effects 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000004424 eye movement Effects 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 238000007654 immersion Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 206010025482 malaise Diseases 0.000 description 1
- 230000000873 masking effect Effects 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008693 nausea Effects 0.000 description 1
- 238000012805 post-processing Methods 0.000 description 1
- 238000003825 pressing Methods 0.000 description 1
- 238000007493 shaping process Methods 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/4316—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G5/00—Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
- G09G5/36—Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
- G09G5/37—Details of the operation on graphic patterns
- G09G5/377—Details of the operation on graphic patterns for mixing or overlaying two or more graphic patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0641—Shopping interfaces
- G06Q30/0643—Graphical representation of items or shoppers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q90/00—Systems or methods specially adapted for administrative, commercial, financial, managerial or supervisory purposes, not involving significant data processing
- G06Q90/20—Destination assistance within a business structure or complex
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/235—Processing of additional data, e.g. scrambling of additional data or processing content descriptors
- H04N21/2353—Processing of additional data, e.g. scrambling of additional data or processing content descriptors specifically adapted to content descriptors, e.g. coding, compressing or processing of metadata
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/4728—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8106—Monomedia components thereof involving special audio data, e.g. different tracks for different languages
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/816—Monomedia components thereof involving special video data, e.g 3D video
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/183—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a single remote source
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Definitions
- the present disclosure relates to a video display system, an observation device, an information processing device, an information processing method, and a program.
- Patent Literature 1 discloses a head-mounted display capable of presenting (that is, displaying) an image of content and an image of the outside world.
- In the head-mounted display disclosed in Patent Literature 1, the discomfort the user feels when switching between the content video and the external-world video is reduced by adjusting the brightness of at least one of the two videos.
- Display devices such as head-mounted displays can be used, as an application that takes advantage of their high degree of immersion, to simulate the experience of being at a certain location by viewing video captured at that remote location. In such uses, the display device is required to present an appropriate image.
- The present disclosure has been made in view of the above, and aims to provide a video display system and the like capable of displaying an appropriate video.
- One aspect of the video display system according to the present disclosure is a video display system for displaying a display video on a display device, and comprises an observation device and a VR device. The observation device includes: a shooting unit that generates a wide-viewing-angle video; a data acquisition unit that acquires data relating to at least one of the position and the direction, within the wide-viewing-angle video, of a gaze target that the user of the display device should gaze at, as well as cue information for notifying a change in the state of the observation system; a metadata construction unit that combines the data from the data acquisition unit with other information into metadata; and a transmission unit that transmits the wide-viewing-angle video together with the metadata. The VR device includes: a receiving unit that receives the wide-viewing-angle video, the data, and the cue information; a display state estimation unit that estimates at least one of the position and the orientation of the display device within the wide-viewing-angle video; a difference calculation unit that calculates, from the difference between the estimated position and orientation of the display device and the position and direction of the gaze target, at least one of a relative position, which is the position of the gaze target relative to the display device, and a relative direction, which is the direction of the gaze target relative to the display device; a presentation unit that presents at least one of the calculated relative position and relative direction, an instruction based on the cue information, and the state of the observation system to the user of the display device; a video generation unit that generates, from the received wide-viewing-angle video, the display video including the portion corresponding to the position and orientation estimated by the display state estimation unit; and the display device that displays the display video.
- One aspect of the information processing device according to the present disclosure is an information processing device used in a video display system that displays, on a display device, a display video that is at least part of a wide-viewing-angle video. It comprises: a receiving unit that receives metadata based on data, obtained by accepting an input, relating to at least one of the position and the direction, within the wide-viewing-angle video, of a gaze target that the user of the display device should gaze at; and a difference calculation unit that calculates and outputs, based on the difference between at least one of the estimated position and orientation of the display device within the wide-viewing-angle video and at least one of the position and direction of the gaze target indicated by the metadata, at least one of a relative position, which is the position of the gaze target relative to the display device, and a relative direction, which is the direction of the gaze target relative to the display device.
- One aspect of the information processing method according to the present disclosure is an information processing method for displaying, on a display device, a display video that is at least part of a wide-viewing-angle video. Metadata is received that is based on data, obtained by accepting an input, relating to at least one of the position and the direction, within the wide-viewing-angle video, of a gaze target that the user of the display device should gaze at. Then, based on the difference between at least one of the estimated position and orientation of the display device within the wide-viewing-angle video and at least one of the position and direction of the gaze target indicated by the metadata, at least one of a relative position, which is the position of the gaze target relative to the display device, and a relative direction, which is the direction of the gaze target relative to the display device, is calculated and output.
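- As a minimal illustration of the difference calculation described above, the following Python sketch computes the relative direction of the gaze target as seen from the current orientation of the display device and turns it into a coarse hint for the user. The function names and thresholds are hypothetical and are not taken from the disclosure; yaw angles are assumed to be measured clockwise, like clock positions (12 o'clock = 0°, 3 o'clock = 90°).

```python
def relative_direction(display_yaw_deg: float, target_yaw_deg: float) -> float:
    """Signed angle (degrees) from the display's current viewing direction to
    the gaze target, normalized to the range (-180, 180]."""
    diff = (target_yaw_deg - display_yaw_deg) % 360.0
    return diff - 360.0 if diff > 180.0 else diff

def direction_hint(rel_deg: float) -> str:
    """Map a relative direction to a coarse instruction for the user
    (with yaw measured clockwise, positive angles are to the right)."""
    if abs(rel_deg) <= 30.0:
        return "ahead"
    if abs(rel_deg) >= 150.0:
        return "behind you"
    return "to your right" if rel_deg > 0 else "to your left"

# The user faces the 3 o'clock direction (90 deg) while the gaze target lies
# at 9 o'clock (270 deg): the correct hint is "behind you", not the "left"
# that a guide facing 12 o'clock would naturally say.
rel = relative_direction(90.0, 270.0)
print(rel, direction_hint(rel))  # -> 180.0 behind you
```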
- According to the present disclosure, a video display system or the like capable of displaying an appropriate video is provided.
- FIG. 1 is a diagram for explaining a conventional example.
- FIG. 2 is a diagram for explaining a conventional example.
- FIG. 3 is a diagram for explaining a conventional example.
- FIG. 4 is a diagram for explaining a conventional example.
- FIG. 5 is a diagram for explaining a conventional example.
- FIG. 6 is a diagram for explaining a conventional example.
- FIG. 7 is a diagram for explaining a conventional example.
- FIG. 8 is a diagram for explaining a conventional example.
- FIG. 9 is a diagram for explaining a conventional example.
- FIG. 10 is a diagram for explaining a conventional example.
- FIG. 11 is a diagram for explaining a conventional example.
- FIG. 12 is a diagram for explaining a conventional example.
- FIG. 13 is a diagram for explaining a conventional example.
- FIG. 14 is a diagram showing a schematic configuration of a video display system according to the embodiment.
- FIG. 15 is a diagram showing an example of video displayed in the video display system according to the embodiment.
- FIG. 16 is a block diagram showing the functional configuration of the video display system according to the embodiment.
- FIG. 17 is a more detailed block diagram showing the functional configuration of the observation device according to the embodiment.
- FIG. 18 is a more detailed block diagram showing the functional configuration of the display device according to the embodiment.
- FIG. 19 is a flow chart showing the operation of the video display system according to the embodiment.
- FIG. 20 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 21 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 22 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 23 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 24 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 25 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 26 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 27 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 28 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 29 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 30 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 31 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 32 is a conceptual diagram explaining generation of a display image according to the embodiment.
- FIG. 33 is a diagram illustrating an example of the functional configuration of the video display system according to the embodiment.
- FIG. 34 is a diagram illustrating an example of a functional configuration of an observation system according to an embodiment;
- FIG. 35 is a diagram illustrating an example of the functional configuration of the VR system according to the embodiment;
- FIG. 36 is a diagram illustrating a configuration example of metadata according to the embodiment.
- FIG. 37 is a diagram illustrating a configuration example of metadata according to the embodiment.
- FIG. 38 is a diagram illustrating another configuration example of metadata according to the embodiment.
- FIG. 39 is a diagram illustrating an example of the operation flow of the video display system according to the embodiment.
- FIG. 40 is a diagram for explaining the result of the operation of the video display system in the example.
- FIG. 41 is a diagram for explaining the result of the operation of the video display system in the example.
- FIG. 42 is a diagram for explaining the result of the operation of the video display system in the example.
- FIG. 43 is a diagram for explaining the result of the operation of the video display system in the example.
- FIG. 44 is a diagram illustrating an example of the application of the video display system in the example.
- FIG. 45 is a diagram explaining an example of the application of the image display system in the embodiment.
- FIG. 46 is a diagram for explaining another example of the moving method of the video display system in the embodiment.
- FIG. 47 is a diagram for explaining a configuration example in which the video display system according to the embodiment is realized using the cloud.
- FIG. 48 is a diagram for explaining a configuration example in which the video display system according to the embodiment is realized using the cloud.
- FIG. 49 is a diagram for explaining a configuration example in which the video display system according to the embodiment is realized using the cloud.
- HMD: head-mounted display
- Cameras capable of shooting 360-degree (omnidirectional) images are used as observation devices.
- the image captured by the observation device is a wide-viewing-angle image and forms a three-dimensional image space.
- On the display device side, the user can cut out and display, from the images constituting the three-dimensional image space, the image of the visual field range in an arbitrary direction. If the display device has a function for detecting the direction in which the user is facing, the part of the image corresponding to that direction can be cut out from the 3D image space and displayed, so that a single camera image can serve many users and provide each of them with a suitable viewing experience.
- However, such an instruction is given within a three-dimensional image space, and the user may not understand where in the space to direct their attention. For example, suppose the user is looking in the 3 o'clock direction while a guide facing the 12 o'clock direction gives a voice instruction such as "Look to your left" intending the user to look in the 9 o'clock direction; the user will look toward the 12 o'clock direction (to the left of 3 o'clock) instead of toward 9 o'clock. In this way, when the guide issues an instruction to gaze at a gaze target on the premise that the user is looking straight ahead, the user may not be able to identify the gaze target intended by the guide.
- In order to suppress such situations in which the gaze target cannot be identified, an object of the present disclosure is to provide a video display system capable of presenting to the user the direction corresponding to the gaze target.
- In the following, a 360-degree wide-viewing-angle image is captured by the observation device, but an image captured over any angle range may be used. Such a wide-viewing-angle image need only have a viewing angle wider than at least the viewing angle of the image displayed to the user on the display device side.
- Although the present disclosure describes a video display system that assumes image movement within a horizontal plane, it is also applicable to image movement on a plane intersecting the horizontal plane, that is, movement including a vertical component.
- FIG. 1 is a diagram for explaining a conventional example.
- VR tourism first-person experience
- 360° camera shooting services include FirstAirlines (https://firstairlines.jp/index.html) and Tabisuke (https://www.tokyotravelpartners.jp/kaigotabisuke-2/).
- 3D CG (computer graphics) services include Google Earth VR and Boulevard (https://www.blvrd.com/).
- FIG. 2 is a diagram for explaining a conventional example.
- There is also a service in which images shot at the site are displayed on a display device such as a television and viewed from a third-person viewpoint (also called a third-person experience).
- the third-person experience is characterized by the provision of specialized services for users guided by experts, and the possibility of monetization if it matches individual tastes.
- FIG. 3 is a diagram for explaining a conventional example.
- When deploying VR tourism, the basic configuration requires a main body of the VR system, a controller 311, a computer or smartphone 313, a network and cloud 314, an observation system 315, and so on.
- Conventionally, the main body of the VR system was only of the HMD type, which is heavy and covers much of the face, but the small eyeglass-type VR glasses form factor is easy to use for long periods and is becoming more widely used.
- The main body of the VR system comes in an all-in-one type, which contains all the functions it needs, and a tethered type, which entrusts some functions to a computer or smartphone.
- the controller is used for menu selection, movement in the VR space, and the like.
- the computer or smart phone may have only communication functions or may form part of the VR system.
- The network and cloud 314 connect the observation system and the VR system, and some functions of the observation system or the VR system may be implemented on a computer system in the cloud.
- The observation system uses a 360-degree camera with a wireless function, or a 360-degree camera, 180-degree camera, or wide-angle camera connected wirelessly or by wire to a smartphone or computer. Through these devices, the user 312 can see the guide as well as the buildings and scenery of the sightseeing destination in the VR space.
- VR sightseeing using a 360° camera is used as an example, but it is possible to use a 180° camera, etc., as long as participants using VR glasses can change their viewpoint.
- Alternatively, a virtual camera may be used in a virtual space composed of computer graphics, with the guide also wearing VR glasses and entering the virtual space, and sightseeing can be realized by reproducing video in that virtual space. The present invention can also be applied to such uses.
- a typical example of the above is a VR trip to an area or space that general travelers cannot easily go to, such as a trip to the moon.
- FIG. 4 is a diagram for explaining a conventional example.
- FIG. 4 shows schematic configurations of conventional VR sightseeing services shot with a 360° camera (without a guide: upper part, hereinafter conventional example 1; with a guide: middle part, hereinafter conventional example 2) and of a conventional sightseeing service using Zoom (registered trademark), an example of a third-person experience (lower part, hereinafter conventional example 3).
- voice, voice data, and voice information include not only conversation but also music and, in some cases, audio signals including ultrasonic waves outside the audible band.
- In the upper row (conventional example 1), the observation system (tourist destination) side sends out pre-recorded video, or the VR system side operates a 360° camera, robot, or drone, and the VR video can be viewed on the VR system side.
- In the middle row (conventional example 2), there are a guide and a camera operator on the observation system side, and VR video from a 360° camera or the like can be enjoyed as VR on the VR system.
- In the lower row (conventional example 3), 2D video is sent from the observation system side through a multi-party remote conversation service such as Zoom using audio and video, and the images of the sightseeing spot can be enjoyed at a remote location.
- FIG. 5 is a diagram for explaining a conventional example.
- the overall system configuration of Conventional Example 2 will be described.
- Conventional example 1 differs from conventional example 2 in that pre-recorded VR images are used or operations are performed from the VR system side, and the differences between them are also explained.
- the observation system of Conventional Example 2 is composed of a camera for VR photography, for example, a 360° camera, and a communication device for transmitting the photographed information to a remote location.
- A 360° camera for VR shooting synthesizes (stitches) the videos from multiple cameras shooting in different directions into one moving image, maps it onto a plane by, for example, equirectangular projection (ERP), compresses it appropriately as an ERP image, and sends it from the communication device to the remote VR system together with the audio data captured by the microphone.
- 360-degree cameras may be mounted on robots, drones, and the like.
- The 360° camera, or a robot or drone equipped with one, is operated by a photographer or a guide, and in some cases operation instructions may be received from the VR system side. Note that the three-dimensional image space here means that the images making up the space need not be images that let the user perceive depth; the images finally displayed may be planar images that form a virtual three-dimensional image space.
- On the VR system side, the received planar video (ERP image) is converted into a spherical video, a portion is cut out according to the observer's direction and position, and it is displayed on the VR display device.
- If the received video is 2D, it is displayed as 2D; in most cases it is viewed on a 2D display device such as a tablet, smartphone, or TV.
- When operating the observation system from the VR system side, if the observation system is to operate in conjunction with the orientation and position of the VR system, operations are performed with a mouse, tablet, joystick, or keyboard, or by selecting menus and icons on the screen.
- For this purpose, it is necessary for the VR system to send appropriate control data to the observation system, and for the observation system to send its status, such as direction and position, to the VR system.
- FIG. 6 is a diagram for explaining a conventional example. Using the comparison between the 360° video and the normal video shown in FIG. 6, the resolution when viewing the 360° video in the VR system will be described.
- The resolution of the video clipped out for VR display is only 1067 × 600 (about twice that of SD video). Since a VR system using a 2K × 2K panel per eye displays this on a square panel, the image is stretched further by about a factor of two in the vertical direction, resulting in a very low-resolution picture.
- In this case the clipped VR display resolution is 2133 × 1200; in terms of data volume this is 1.23 times the area of Full HD (1920 × 1080), but because the vertical direction is stretched about twofold, the picture quality is roughly that of Full HD.
- In this case the clipped VR resolution is 2933 × 1650, which is comparable to the resolution of the VR system's display.
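- The viewport figures quoted above can be reproduced with a simple back-of-the-envelope calculation: the part of the 360° frame that fills the headset's field of view is roughly the source resolution scaled by the ratio of the displayed angle to the captured 360°. The sketch below assumes a field of view of about 100° applied to both axes and source resolutions of 3840 × 2160, 7680 × 4320, and 10560 × 5940; these exact parameters are assumptions chosen to match the numbers in this description, not values stated in it.

```python
def viewport_resolution(src_w: int, src_h: int, fov_deg: float = 100.0) -> tuple[int, int]:
    """Approximate pixel resolution of the part of a 360-degree frame that
    covers fov_deg out of the full 360 degrees, applied to both axes."""
    ratio = fov_deg / 360.0
    return round(src_w * ratio), round(src_h * ratio)

# Assumed source resolutions (not stated in the description).
for name, (w, h) in {"4K": (3840, 2160), "8K": (7680, 4320), "11K": (10560, 5940)}.items():
    print(name, viewport_resolution(w, h))
# 4K (1067, 600)   8K (2133, 1200)   11K (2933, 1650)
```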
- FIG. 7 is a diagram for explaining a conventional example. Configuration examples of main functions of conventional examples 1 and 2 will be described for each function.
- the observation system 751 of the conventional examples 1 and 2 includes a VR shooting means 762 (VR shooting camera) for performing VR shooting, and a VR video processing means 758 that processes the video shot by the VR shooting means 762 and converts it into an image suitable for transmission.
- VR video compression means 756 that compresses the VR video processed by the VR video processing means 758 and converts it into a data rate and video signal format suitable for transmission
- audio input means 763 that consists of a microphone for inputting guide and peripheral audio.
- audio compression means 760 for converting the audio signal input by the audio input means 763 into a data rate and audio signal format suitable for transmission; graphics generation means 759 for generating auxiliary information as graphics; multiplexing means 757 for converting the video signal, audio signal, and graphics information compressed or generated by the VR video compression means 756, the graphics generation means 759, and the audio compression means 760 into a signal suitable for transmission;
- communication means 754 for sending that signal to the VR systems 701 and receiving communication audio signals from a plurality of VR systems 701; separation means 755 for extracting the compressed audio signal from the communication audio signal received by the communication means 754; audio decoding means 761 for decoding the compressed audio signal from the separation means 755 into an audio signal; and audio output means 764 for outputting the audio signal decoded by the audio decoding means 761 as sound.
- VR video processing means 758, VR video compression means 756, and graphics generation means 759 are implemented within the GPU, and audio compression means 760, multiplexing means 757, separation means 755, and audio decoding means 761 are implemented within the CPU.
- the CPU and GPU may be implemented as one processor, but the functional configuration and operation are the same.
- The VR shooting means 762 is, for example, a 360° camera and is composed of a plurality of cameras shooting in different directions. The captured video is mapped onto a plane by, for example, equirectangular projection (ERP) and output as an ERP image.
- ERP: equirectangular projection
- the VR systems 701 of the conventional examples 1 and 2 receive communication observation signals sent from the observation system 751 or use speech input by the VR system 701 as communication speech information.
- VR display means 704 for outputting the VR video from the VR display control means 708 for viewing with both eyes
- rotation detection means 703 for detecting the front-back and left-right tilt of the VR display means 704, or the direction of the user's gaze
- The respective outputs of the rotation detection means 703 and of the position detection means 702, which detects positions in the front-back and height directions, are sent to the VR control means 707, and based on the output of the VR control means 707, the video displayed on the VR display means 704 and the sound output under the control of the audio reproduction control means by the audio reproduction means 709 are appropriately controlled.
- The compressed audio information separated by the separation means 715 is decoded by the audio decoding means 713 and sent as audio information to the audio reproduction control means 709. There, balance control in the left-right, front-back, and height directions is performed, and in some cases frequency-characteristic adjustment, delay processing, or synthesis of an alarm generated by the VR system 701.
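- As a rough illustration of the left/right balance control mentioned above, the sketch below applies a constant-power pan to a mono audio frame according to the sound source's azimuth relative to the listener. The function and parameter names are hypothetical; the actual audio reproduction control means may perform far more elaborate processing (front/back and height balance, frequency shaping, delay).

```python
import math

def pan_stereo(samples: list[float], azimuth_deg: float) -> tuple[list[float], list[float]]:
    """Constant-power pan of a mono signal: azimuth 0 = straight ahead,
    +90 = fully right, -90 = fully left (values beyond +/-90 are clamped)."""
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = math.radians((az + 90.0) / 2.0)   # 0..90 deg -> fully left..fully right
    left_gain, right_gain = math.cos(theta), math.sin(theta)
    return [s * left_gain for s in samples], [s * right_gain for s in samples]

# A guide speaking from about 60 degrees to the user's right is reproduced
# louder in the right channel than in the left.
left, right = pan_stereo([0.5, 0.25, -0.1], 60.0)
```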
- the graphics generation means 712 also generates graphics for display such as the system menu and warnings of the VR system 701 , which are superimposed on the VR image and displayed by the VR display means 704 .
- the VR system 701 is provided with voice input means 706 for inputting the voice of the user of the VR system 701. Voice information from the voice input means 706 is compressed by voice compression means 714 and sent to multiplexing means 717 as compressed voice information. Then, it is sent from the communication means 716 to the observation system 751 as voice information for communication.
- a typical example of the 360° camera 801 combines two imaging systems, that is, an ultra-wide-angle lens 854, a shutter 853, and an imaging device 852, and shoots images of 360° up and down, front and back.
- the VR imaging camera 804 is illustrated as having two or more imaging systems, since there are cases in which two or more imaging systems are combined in order to capture a higher quality image.
- the shooting system may be configured by combining independent cameras.
- GPU: Graphics Processing Unit
- The main configuration of the 360° camera 801 is as follows: the VR shooting camera 804 made up of the multiple imaging systems described above; a GPU 803 that mainly processes video data and graphics; a CPU 802 that performs general data processing, input/output processing, and overall control of the 360° camera 801; an EEPROM (Electrically Erasable Programmable ROM) 813 for storing the programs that run on the CPU 802 and GPU 803; a RAM 814 for storing the data used by the CPU 802 and GPU 803; an SD card (registered trademark) 821, a removable memory for storing video, audio, and programs; a wireless communication element 820 for WiFi (registered trademark) and Bluetooth (registered trademark) communication, used to exchange data with the outside and to receive operations from the outside; buttons and a display element 808 for operation and display; a battery 807 and a power-supply control element 812; an audio input unit consisting of a plurality of microphones (microphone group 819) or a microphone terminal 825, a microphone amplifier 818, and an ADC 817; an audio output unit consisting of a speaker 826 or a headphone terminal 824, an amplifier 823, and a DAC 822; a video system bus that mainly connects the VR shooting camera 804 and the CPU 802 and is used for reading digital video data; a memory bus that connects the EEPROM 813, RAM 814, and SD card 821 to the GPU 803 and CPU 802 for exchanging data with memory; a system bus to which the CPU 802, GPU 803, wireless communication element 820, audio input unit, and audio output unit are connected for control and data transfer; an I/O bus that performs control and low-speed data exchange for the buttons and display element 808, the power-supply control element 812, the audio input unit, the audio output unit, and the VR shooting camera 804 (not shown); and several bus converters 815 and 816 connecting these buses. A motion/position detection unit 860 is further connected to the I/O bus. Whether particular processing is performed by the GPU 803 or the CPU 802 may differ from this example, and the bus configuration may also differ, but the functional configuration and operation described later are the same.
- The VR shooting camera 804 has: a lens 854 for shooting a wide-angle image; an image sensor 852 that converts the light collected by the lens 854 into an electrical signal; a shutter 853, located between the lens 854 and the image sensor 852, that blocks light; an aperture (not shown here), located in the same position as the shutter 853, that controls the intensity of light from the lens 854; and an ADC 851 that converts the analog electrical signal from the image sensor 852 into a digital video signal. Each of these is controlled by the CPU 802 through the I/O bus and reports its status to the CPU 802.
- The buttons include a power switch 806 for turning the power ON/OFF, a shooting start/end button 811 for starting and stopping shooting, a shooting mode selection button 809 for changing the shooting mode (which may not be provided), and a zoom button 810 for zooming by moving the lens 854 or by digitally controlling the angle of view.
- the power control element 812 which may be integrated with the battery 807, stabilizes the voltage, manages the battery capacity, etc., and supplies power to all components (not shown). In addition, it supplies power to HMD/VR glasses through USB or AV output.
- The functions realized by the GPU 803, such as image processing, are generally realized by dedicated hardware and programs, whereas the functions realized by the CPU 802 are generally realized by general-purpose hardware and programs.
- the GPU 803 is used to implement a VR image processing unit 842, a VR image compression unit 841, a graphics generation unit 843, and the like.
- the CPU 802 is used to implement a memory control unit 835 , a multiplexing unit 832 , an audio compression unit 833 , an audio decoding unit 834 and a separation unit 831 .
- FIG. 9 is a diagram for explaining a conventional example. Based on FIG. 9, an implementation example of a VR system 901 will be described as a typical implementation example of the observation system of Conventional Example 2.
- the VR system 901 is assumed to consist of a computer or smart phone 951 and an HMD or VR glasses 902 connected thereto.
- The HMD or VR glasses 902 may also be used alone; in that case, the functions of the CPU and GPU of both devices can be considered unified, with the peripheral functions also integrated.
- The main components of the computer/smartphone 951 in the VR system 901 are: a high-speed communication element 970 such as WiFi or Ethernet (registered trademark) for connecting with the observation system; a GPU 954 that mainly processes video data and graphics; a CPU 965 that performs general data processing and overall control of the computer/smartphone 951; a RAM 961 used for storing data; a power switch 963 and a power-supply control element 964 that supplies power to each part; an AV output 952 for outputting video and audio signals to the HMD/VR glasses 902; an I/F such as a USB 953 for controlling the HMD/VR glasses 902 and acquiring data from them; a memory bus that connects the RAM 961 and a nonvolatile memory 962 so that the CPU 965 and the GPU 954 can access them; a system bus through which the CPU 965 and the GPU 954 access the AV output 952, the USB 953, and the communication element 970; a bus connection (bus converter 960) connecting the system bus and the memory bus; and a display device, input devices, and the like (not shown).
- the GPU 954 is used to implement a motion/position detection processing unit 955, a VR control unit 956, a VR display control unit 957, a VR video decoding unit 958, a graphics generation unit 959, and the like.
- the CPU 965 is used to realize an audio decoder 966 , an audio reproduction controller 967 , a multiplexer 968 and a separator 969 .
- the AV output 952 and USB 953 can be replaced with a high-speed bidirectional I/F such as USB Type-C (registered trademark).
- the HMD/VR glasses 902 side is also connected with the same I/F, or is connected with a converter that converts the I/F.
- the CPU 965 or GPU 954 performs appropriate image compression and sends the image to the HMD/VR glasses 902 via the USB 953 in order to appropriately compress the data amount.
- The main components of the HMD/VR glasses 902 in the VR system 901 are: an audio input unit consisting of a microphone 906 for inputting voice, a microphone amplifier 917, and an ADC 918; an audio output unit consisting of a speaker 907 or headphone terminal 908, an amplifier 919, and a DAC 920; a VR display unit consisting of two sets of lenses 904 and display elements 905 for the user to view VR video; a movement/position detection unit and an orientation detection unit consisting of a gyro sensor, a camera, an ultrasonic microphone, or the like; a power switch 921 and a power-supply control element 924 for power-supply control; the EEPROM 913, RAM 914, and SD card mentioned above, and a memory bus connecting them to the GPU 910 and CPU 915 for exchanging data with memory; a wireless communication element 927; an AV input 925 for receiving video and audio signals from the computer/smartphone 951; an I/F such as a USB 926 for receiving control signals from the computer/smartphone 951 and for sending video and audio signals and movement/position data; a CPU 915 that mainly performs audio compression (realized by the audio compression unit 916), controls switches and power supplies, and controls the entire HMD/VR glasses 902; a GPU 910 that mainly performs video display processing to adjust the video for the VR display unit (realized by the video display processing unit 912) and motion/position detection processing (realized by the motion/position detection processing unit); an I/O bus for control and low-speed data exchange; and several bus converters 922 connecting the buses. Whether particular processing is performed by the GPU 910 or by the CPU 915 may differ from this example, and the bus configuration may also differ, but the functional configuration and operation described later are the same.
- Because the video data from the AV input 925 is large in volume and high-speed, it is shown here as being loaded directly into the GPU 910 when the system bus does not have sufficient speed.
- The video captured by the camera of the motion/position sensor 903 may be sent to the display elements as information for the user to check the surroundings of the HMD/VR glasses 902, or sent to the computer/smartphone 951 via the USB 926 so that the user's surroundings can be monitored for dangerous situations.
- the power control element 924 receives power supply from the USB 926 or AV input 925, stabilizes the voltage, manages the battery capacity, etc., and supplies power to all components (not shown).
- the battery 923 may be provided inside or outside and connected to the power control element 924 .
- the state of the buttons and cursor of the controller (not shown) is acquired by the CPU 915 through the wireless communication device 927 and used for button operations, movement, and application operations in the VR space.
- The position and orientation of the controller are detected by a camera or ultrasonic sensor of the motion/position detection unit and, after appropriate processing by the motion/position sensor, are used for control by the CPU 915 and sent to the computer/smartphone 951 via the USB 926, where they are used for the graphics drawing and image processing executed by programs on the CPU or GPU. Since this basic operation is not directly related to the present invention, its detailed description is omitted.
- FIG. 10 is a diagram for explaining a conventional example. An implementation example of an integrated VR system 1001 in which HMD/VR glasses are provided with functions for VR in a computer/smartphone will be described.
- the functions of the computer/smartphone and HMD/VR glasses are integrated, and the functions of the CPU and GPU are realized with one CPU and GPU, respectively.
- The communication element 1033 typically performs wireless communication such as WiFi, and since there is no power cable the system has a battery 1026. It also has an I/F with a general-purpose computer, such as a USB 1034, for charging the battery 1026 and for initial setup.
- The integrated VR system 1001 does not need an AV output, an AV input, or a USB connection between a computer/smartphone and HMD/VR glasses, so high-definition, low-delay transmission of AV information and efficient control are possible. On the other hand, because of the power, heat, and space limitations imposed by its size, it may not be possible to use a high-performance CPU 1027 and GPU 1006, and the VR functions may be limited.
- Similarly to the configurations described above, the integrated VR system 1001 includes a RAM 1019, an EEPROM 1020, a bus converter 1021, a motion/position sensor 1022, a power switch 1023, a volume button 1024, and a power-supply control element 1025.
- Video display processing 1012 , motion/position detection processing 1013 , VR control 1014 , VR display control 1015 , motion/position detection 1016 , VR video decoding 1017 , and graphics generation 1018 are realized using the GPU 1006 .
- audio compression 1028 , audio decoding 1029 , audio reproduction control 1030 , multiplexing 1031 and separation 1032 are implemented using the CPU 1027 .
- FIG. 11 is a diagram for explaining a conventional example. Based on FIG. 11, a more detailed configuration of the VR video processing unit 1103 that processes the video captured by the VR camera 1151 of the observation system of the conventional examples 1 and 2 will be described.
- The VR shooting camera 1151 has a plurality of cameras cm, typically cameras cm with ultra-wide-angle lenses, for capturing 360° images vertically and horizontally, and the video captured by each camera cm is input to the VR video processing unit 1103, which is implemented by a program on the GPU 1101 or by a dedicated circuit.
- In the VR video processing unit 1103, the images captured by the cameras cm are first input to the stitching processing unit 1105, which evaluates the shooting direction of each camera cm and the captured video and synthesizes and connects the multiple input videos so as to form one continuous spherical video. The spherical video data output from the stitching processing unit 1105 is mapped onto a plane by the VR video mapping unit 1104 using, for example, equirectangular projection (ERP), output from the VR video processing unit 1103 as an ERP image, and passed to the next stage, the VR video compression unit 1102.
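- Equirectangular projection itself is a simple linear mapping from longitude (yaw) and latitude (pitch) to plane coordinates. The following sketch, which is only illustrative and not the camera's actual processing, shows where a given viewing direction lands in an ERP image:

```python
def direction_to_erp(yaw_deg: float, pitch_deg: float, width: int, height: int) -> tuple[int, int]:
    """Map a viewing direction (yaw: -180..180, pitch: -90..90 degrees) to pixel
    coordinates in an equirectangular (ERP) image of size width x height.
    The centre of the image corresponds to yaw = 0, pitch = 0."""
    u = (yaw_deg + 180.0) / 360.0       # 0..1 across the full horizontal turn
    v = (90.0 - pitch_deg) / 180.0      # 0..1 from zenith (top) to nadir (bottom)
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return x, y

# The forward direction maps to the centre of the ERP frame.
print(direction_to_erp(0.0, 0.0, 3840, 1920))  # -> (1920, 960)
```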
- ERP: equirectangular projection
- The connection between the video system bus and the cameras is illustrated as if each camera were connected to the bus individually, but in the VR shooting camera 1151 the video captured by the cameras may be collected into one signal in a time-division manner, sent to the video system bus, and input to the VR video processing unit 1103. Alternatively, since there are two cameras cm, the GPU 1101 may be configured to receive the outputs of the two cameras directly instead of via the bus, with the VR video processing unit 1103 receiving and processing the two captured images in parallel.
- FIG. 12 is a diagram for explaining a conventional example. A more detailed configuration of the VR display control unit 1204 of the VR system of the conventional examples 1 and 2 will be described with reference to FIG.
- the VR display control unit 1204 is realized by a program or a dedicated circuit in the GPU 1201 of the computer/smartphone, and is composed of the mapping unit 1206 and the display VR video conversion unit 1205.
- The communication device 1261 receives the communication data sent from the observation system, the separation unit 1232 of the CPU 1231 separates out the compressed video, the GPU 1201 receives it via the memory bus, and the video is decoded by the VR video decoding unit 1207 into a planar video (ERP image).
- The planar image is converted into a 360° spherical image by the mapping unit 1206 of the VR display control unit 1204, and then, in the next stage, the display VR video conversion unit 1205 cuts out the part to be displayed on the VR display means 1202 based on the control information output by the VR control unit 1203.
- The center of the ERP image corresponds to the front, that is, the origin of the 360° spherical image.
- The initial VR image displayed on the VR display means 1202 is centered on the origin; according to the capability of the VR display means 1202, the image for the right eye is cut out shifted slightly to the right and the image for the left eye slightly to the left, using default values for the height direction, and the two images are displayed on the right-eye and left-eye display elements. From here, the cutout position changes as the VR system rotates left or right or looks up or down.
- images from a 360° camera do not change when the VR system is moved, but in the case of CG-generated images, the position changes when the VR system is moved or the controller is operated.
- the initial value for clipping from the 360° spherical image may be from the previous clipping position, but in general, a function is provided to return to the initial position.
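- The cut-out performed by the display VR image conversion unit can be sketched as follows (an illustrative approximation, not the disclosed implementation): for each pixel of the viewport, a ray is rotated by the yaw/pitch of the VR system and the corresponding ERP pixel is sampled. The field of view, output size, and nearest-neighbour sampling are assumptions chosen for brevity.

```python
import math
import numpy as np

def cut_out_view(erp, yaw_deg, pitch_deg, fov_deg=90.0, out_w=320, out_h=240):
    """Sample a perspective view out of an ERP image (H x W x 3 numpy array)
    for the viewing direction (yaw_deg, pitch_deg).  Nearest-neighbour
    sampling keeps the sketch short; a real renderer would interpolate."""
    h, w = erp.shape[:2]
    f = 0.5 * out_w / math.tan(math.radians(fov_deg) / 2)  # focal length in pixels
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    out = np.empty((out_h, out_w, 3), dtype=erp.dtype)
    for j in range(out_h):
        for i in range(out_w):
            # Ray through viewport pixel (i, j) in the display's camera frame.
            x, y, z = i - out_w / 2, out_h / 2 - j, f
            # Rotate the ray by pitch (about the x-axis), then yaw (about the y-axis).
            y, z = (y * math.cos(pitch) + z * math.sin(pitch),
                    -y * math.sin(pitch) + z * math.cos(pitch))
            x, z = (x * math.cos(yaw) + z * math.sin(yaw),
                    -x * math.sin(yaw) + z * math.cos(yaw))
            n = math.sqrt(x * x + y * y + z * z)
            u = (math.atan2(x, z) / (2 * math.pi) + 0.5) * (w - 1)
            v = (0.5 - math.asin(y / n) / math.pi) * (h - 1)
            out[j, i] = erp[int(round(v)), int(round(u))]
    return out
```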
- FIG. 13 is a diagram for explaining a conventional example. An operation example of the conventional example 2 will be described with reference to FIG.
- voice is input by the voice input unit (microphone group, microphone terminal, microphone amplifier, ADC) (S1325), and voice compression is performed by the voice compression unit (S1326).
- The multiple cameras (lens, shutter, image sensor, ADC) of the VR camera capture a moving image (S1321), and the stitching processing unit of the VR image processing unit stitches the captured images into a spherical image centered on the direction of camera 1.
- the VR image mapping unit generates an ERP image by equirectangular projection or the like (S1323), and the VR image compression unit appropriately compresses it (S1324).
- the compressed ERP image and audio information are multiplexed by the multiplexing unit (S1327) into a transmittable format, and sent (transmitted) to the VR system by the wireless communication element (S1328).
- graphics information may be superimposed on video before video compression, or may be multiplexed with video and audio as graphics information, but this is omitted here.
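- The multiplexing in step S1327 can be pictured with the toy container below; an actual system would use a standard multiplexing format (such as MPEG-TS or MP4), so the length-prefixed layout and field names here are assumptions for illustration only. The inverse function corresponds to the separation performed on the VR system side.

```python
import json
import struct

def multiplex(video_frame: bytes, audio_frame: bytes, metadata: dict) -> bytes:
    """Pack one compressed video frame, one compressed audio frame, and a
    metadata dict into a single transmittable packet (toy format)."""
    meta = json.dumps(metadata).encode("utf-8")
    header = struct.pack("!III", len(meta), len(video_frame), len(audio_frame))
    return header + meta + video_frame + audio_frame

def demultiplex(packet: bytes):
    """Inverse of multiplex(): separate metadata, video, and audio again."""
    meta_len, video_len, audio_len = struct.unpack("!III", packet[:12])
    p = 12
    metadata = json.loads(packet[p:p + meta_len]); p += meta_len
    video = packet[p:p + video_len]; p += video_len
    audio = packet[p:p + audio_len]
    return video, audio, metadata
```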
- the information sent from the observation system is received by the communication element (S1301) and sent to the separation unit.
- the separating unit separates the transmitted compressed video information and compressed audio information (S1302).
- the compressed audio information separated by the separating unit is sent to the audio decoding unit and decoded (S1303) to become uncompressed audio information.
- The audio information is sent from the audio decoding unit to the audio reproduction control unit, where audio processing is performed based on the position/direction information of the VR observation system sent from the VR control unit of the GPU via the system bus (S1304).
- Audio information that has undergone audio processing is sent to the audio output section (DAC, amplifier, speaker and headphone jack) of the HMD/VR glasses via the system bus, AV output or USB, and output as audio.
- Audio processing includes left/right and spatial volume balance control, frequency characteristic change, delay, spatial movement, similar processing for specific sound sources, addition of sound effects, and the like.
- The video data from the separation unit of the CPU of the computer/smartphone is sent to the VR video decoding unit of the GPU via the memory bus, decoded by the VR video decoding unit (S1307), and the decoded ERP image is input to the mapping unit.
- the mapping unit maps the ERP image to a 360° spherical image (S1308)
- The display VR image conversion unit cuts out an appropriate portion from the 360° spherical image based on the position and orientation information of the VR system from the VR control unit (S1309), and the cut-out video is displayed as a VR video by the VR display unit (display element, lens) (S1310).
- Graphics are either separated at the same time as the video and audio and superimposed on the VR video by the VR display control unit, or generated in the VR system and superimposed on the VR video, but this is omitted here.
- A video display system according to one aspect of the present disclosure is a video display system for displaying a display video on a display device, and includes: an observation device having an imaging unit that generates a wide-viewing-angle video, a data acquisition unit that acquires data relating to at least one of the position and direction, within the wide-viewing-angle video, of a gaze target that the user of the display device is to gaze at, together with cue information for informing of changes in the state of the observation system, and a transmission unit that transmits the wide-viewing-angle video together with metadata based on the acquired data and cue information; and a VR device having a receiving unit that receives the wide-viewing-angle video, the data, and the cue information, a display state estimation unit that estimates at least one of the position and direction of the display device within the wide-viewing-angle video, a difference calculation unit that calculates, based on the difference between at least one of the estimated position and direction of the display device within the wide-viewing-angle video and at least one of the position and direction of the gaze target on the metadata, at least one of a relative position, which is the position of the gaze target relative to at least one of the position and direction of the display device within the wide-viewing-angle video, and a relative direction, which is the corresponding relative direction of the gaze target, a video generation unit that generates a display video including the part of the received wide-viewing-angle video corresponding to the visual field portion according to at least one of the estimated position and direction of the display device and according to the instruction by the cue information and the state of the observation system, and a display device that displays the display video.
- Such a video display system can use the metadata to calculate, for the gaze target that the user of the display device is to gaze at, at least one of the relative position, which is the position of the gaze target relative to at least one of the position and direction of the display device, and the relative direction, which is the corresponding direction of the gaze target. Since at least one of the relative position and the relative direction is presented to the user, the user is kept from losing sight of the gaze target even when the user can move. Therefore, according to the video display system, it is possible to display an appropriate video from the viewpoint of suppressing the inconvenience of the user losing sight of the position of the gaze target when the user can move.
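- As an illustration only, the metadata carrying the gaze-target data and the cue information could be organized as below; the field names and value types are hypothetical, since the disclosure specifies the content of the metadata but not its encoding.

```python
# Hypothetical field names; the disclosure only specifies that the metadata
# carries gaze-target position/direction data and cue information.
metadata = {
    "gaze_target": {
        "position": {"x": 12.5, "y": -3.0},   # position of the gaze target in the space
        "direction_deg": 90.0,                # direction as seen from the observation system
    },
    "cue": {
        "type": "look",                       # e.g. "look", "gather", "move"
        "issued_at_ms": 1234567,              # timestamp of the guide's instruction
    },
    "observation_state": {"moving": False},   # change of state of the observation system
}
```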
- a camera for capturing an image or an image generation unit for generating an image by calculation is further provided, and the wide viewing angle image is an image captured by the camera or an image calculated by the image generation unit.
- Also, the presentation unit may generate and output graphics indicating information based on at least one of the calculated relative position and relative direction and on the cue information, and the video generation unit may present at least one of the relative position and the relative direction by superimposing the output graphics on a part of the image.
- At least one of the relative position and the relative direction can be presented to the user by graphics.
- Also, the data receiving unit may receive input of data regarding the direction of the gaze target, the display state estimating unit may estimate the direction of the display device within the wide-viewing-angle video, and the graphics may display, on the display video, an arrow pointing in the relative direction.
- In this way, at least one of the relative position and the relative direction can be presented to the user by graphics that display, on the display video, an arrow pointing in the relative direction.
- Also, the data receiving unit may receive input of data regarding the direction of the gaze target, the display state estimating unit may estimate the direction of the display device within the wide-viewing-angle video, and the graphics may display a mask, which is an image for covering at least part of the display video other than the relative-direction side.
- the relative direction can be presented to the user by graphics displaying a mask, which is an image for covering at least part of the display image other than the relative direction side.
- Also, the data receiving unit may receive input of data regarding the position of the gaze target, the display state estimating unit may estimate the position of the display device within the wide-viewing-angle video, and the graphics may display a map indicating the relative position on the display video.
- In this way, the relative position can be presented to the user by graphics that display a map indicating the relative position on the display video.
- an input interface for use in inputting data may be further provided, and the data acquisition unit may acquire data input via the input interface.
- Metadata can be constructed from the data input via the input interface.
- Also, the data acquisition unit may acquire at least one of the movement start timing and the movement end timing input via the input interface.
- At least one of the movement start and end timings input via the input interface can be obtained.
- Also, the image forming the wide-viewing-angle video may be an image output by a photographing unit that photographs a real space, and the input interface may include an indication marker that is held in the real space by an operator of the input interface and that indicates at least one of the position and direction of the gaze target by its movement, and an image analysis unit that receives at least one of the position and direction of the gaze target indicated by the indication marker by analyzing an image, output by the photographing unit, that contains the indication marker.
- At least one of the relative position and the relative direction can be calculated by indicating at least one of the position and direction of the gaze target by the movement of the indication marker.
- Also, an information processing device that includes at least part of the functions provided by the observation device and the VR device, is connected to the observation device and the VR device via a network, and performs part of the processing of the observation device or the VR device may be provided.
- a video display system can be realized by the observation device, the VR device, and the information processing device.
- Also, the information processing device may include a receiving unit that receives the wide-viewing-angle video and, as metadata, the data and the cue information from the observation device, a video generation unit that generates a display video by adding information generated by the presentation unit, according to at least one of the position and direction of the gaze target on the metadata and to the cue information, to the part of the image corresponding to the visual field portion according to the position and direction of the display device within the wide-viewing-angle video estimated by the display state estimation unit from the received wide-viewing-angle video and to at least one of the relative position and the relative direction, and a transmission unit that transmits the partial image corresponding to the visual field portion together with the metadata.
- a video display system can be realized by the observation device, the VR device, and the information processing device configured as described above.
- Also, the information processing device may include a receiving unit that receives the wide-viewing-angle video and, as metadata, the data and the cue information from the observation device, a presentation unit that generates information for presenting, to the user of the display device, information according to at least one of the position and direction of the gaze target on the metadata and to the cue information, a metadata construction unit that constructs metadata from the information generated by the presentation unit, and a transmission unit that transmits the metadata constructed by the metadata construction unit, together with the wide-viewing-angle video and other information received by the receiving unit, to the VR device.
- a video display system can be realized by the observation device, the VR device, and the information processing device configured as described above.
- Also, the information processing device may include a receiving unit that receives the wide-viewing-angle video and, as metadata, the data and the cue information from the observation device, and that receives data about the orientation of the display device from the display device, a difference calculation unit that calculates a relative movement direction, which is the direction of movement of the imaging unit relative to the orientation of the display device, based on the cue information and the difference from the movement information regarding the movement of the imaging unit, a presentation unit that generates and outputs graphics indicating the calculated relative movement direction, which present information according to the relative movement direction and the cue information to the user of the display device by being superimposed on the part of the image, out of the wide-viewing-angle video, corresponding to the visual field portion according to the estimated orientation of the display device, a video generation unit that generates a display video by modifying the graphics based on the data about the orientation of the display device and superimposing them on the wide-viewing-angle video, and a transmission unit that transmits the display video and other information.
- a video display system can be realized by the observation device, the VR device, and the information processing device configured as described above.
- the information processing device may be provided on a cloud connected to a wide area network and connected to the observation device and the VR device via the wide area network.
- a video display system can be realized by an observation device, a VR device, and an information processing device connected to the observation device and the VR device via a wide area network and provided on the cloud.
- the cue information may be information indicating that at least one of the movement direction of the observation device or the position and direction of the gaze target to be gazed at by the user of the display device is changed.
- information indicating that at least one of the movement direction of the observation device and the position and direction of the gaze target to be gazed at by the user of the display device is changed can be used as the cue information.
- Also, an information processing device according to one aspect of the present disclosure is an information processing device used in a video display system for displaying, on a display device, a display video of at least part of a wide-viewing-angle video, and includes a receiving unit that receives metadata based on data, obtained by receiving an input, relating to at least one of the position and direction within the video of a gaze target that the user of the display device is to gaze at, and a difference calculation unit that calculates and outputs, based on the difference between at least one of the position and direction of the gaze target on the metadata and at least one of the estimated position and direction of the display device within the wide-viewing-angle video, at least one of a relative position, which is the position of the gaze target relative to at least one of the position and direction of the display device within the wide-viewing-angle video, and a relative direction, which is the corresponding relative direction of the gaze target.
- Also, an information processing method according to one aspect of the present disclosure is an information processing method for displaying, on a display device, a display video of at least part of a wide-viewing-angle video, and includes receiving metadata based on data, obtained by receiving an input, relating to at least one of the position and direction within the wide-viewing-angle video of a gaze target that the user of the display device is to gaze at, and calculating and outputting, based on the difference between at least one of the estimated position and direction of the display device within the wide-viewing-angle video and at least one of the position and direction of the gaze target on the metadata, at least one of a relative position, which is the position of the gaze target relative to the display device, and a relative direction, which is the corresponding relative direction of the gaze target.
- Such an information processing method can produce the same effects as the video display system described above.
- a program according to one aspect of the present disclosure is a program for causing a computer to execute the information processing method described above.
- Such a program can use a computer to achieve the same effects as the video display system described above.
- each figure is not necessarily a strict illustration.
- substantially the same configurations are denoted by the same reference numerals, and overlapping descriptions are omitted or simplified.
- FIG. 14 is a diagram showing a schematic configuration of a video display system according to the embodiment.
- FIG. 15 is a diagram showing an example of an image displayed in the image display system according to the embodiment.
- The video display system 500 of the present embodiment is implemented by an observation device 300, a server device 200 connected via a network 150, and a display device 100 connected via the network 150.
- The observation device 300 is a device that acquires and holds video. More specifically, the observation device 300 holds video obtained by shooting as wide-viewing-angle video information, and is configured to supply it to the display device 100 so that part of the video can be viewed on the display device 100.
- the observation device 300 is a so-called omnidirectional camera capable of capturing images of 360 degrees around it.
- the observation device 300 may be, for example, a photographing device 300a that is held by hand, or an observation device 300b that is fixed with a tripod or the like. In the case of the photographing device 300a held in hand, it is easy to photograph while moving around. These types are hereinafter referred to as the observation device 300 without particular distinction.
- the observation device 300 has an optical element such as a fisheye lens, and can photograph a wide viewing angle region, such as 180 degrees, with a single sensor array.
- a 360-degree wide-viewing-angle image can be captured using a plurality of combinations of optical elements and sensor arrays that are arranged to complement each other in different wide-viewing-angle regions.
- processing is performed in which the images captured by each of the plurality of sensor arrays are superimposed by specifying the elements corresponding to each other. This results in an image that can be converted back and forth between a plane and a sphere, such as an equirectangular view.
- The inside of the spherical image is hereinafter also called a three-dimensional image space.
- Two three-dimensional image spaces may be generated between which a shift corresponding to human parallax exists.
- Such two three-dimensional image spaces may be generated from one three-dimensional image space by simulation or the like, or may be generated by two cameras with a parallax shift.
- the network 150 is a communication network for connecting the observation device 300, the server device 200, and the display device 100 so that they can communicate with each other.
- a communication network such as the Internet is used as the network 150 here, it is not limited to this.
- The connection between the observation device 300 and the network 150, the connection between the server device 200 and the network 150, and the connection between the display device 100 and the network 150 may each be made by wireless communication or by wired communication.
- the server device 200 is a device for performing information processing and the like, and is implemented using, for example, a processor and memory.
- the server device 200 may be implemented by an edge computer or by a cloud computer.
- One server device 200 may be provided for one video display system 500 , or one server device 200 may be provided for a plurality of video display systems 500 .
- the server device 200 may perform various processes in a plurality of video display systems 500 in parallel.
- the server device 200 is not an essential component in the video display system 500 .
- By distributing each functional unit of the server device 200 (to be described later) between the observation device 300 and the display device 100, it is possible to realize a video display system including only the observation device 300 and the display device 100.
- In particular, if the display device 100 is implemented by an information processing terminal, such as a smartphone, that also serves as a display panel, the functional units of the server device 200 can easily be implemented using the processor or the like of that information processing terminal.
- Also, part of the functions of the observation device 300 or the display device 100 can be reduced, making it possible to divert an existing observation device or display device.
- a video display system can be easily realized.
- Each functional unit of the server device 200 will be described later using FIG. 16 and the like.
- the display device 100 supports the two lens barrels separated from each other by engaging the temple portions extending from the left and right with the auricles, respectively.
- Each lens barrel of the display device 100 has a built-in display panel, and, for example, as shown in FIG. 15, videos with a parallax shift between them are projected toward the left and right eyes of the user.
- (L) of FIG. 15 shows an image of one frame of the video for the left eye, and (R) shows the image of the same frame of the video for the right eye.
- the display device 100 may not be a terminal dedicated to such video display.
- the display device of the present disclosure can also be realized by a display panel provided in a smartphone, a tablet terminal, a PC, or the like.
- FIG. 16 is a block diagram showing the functional configuration of the video display system according to the embodiment. As shown in FIG. 16 and as described with reference to FIG. 14, the video display system 500 includes the display device 100, the server device 200, and the observation device 300.
- the display device 100 has a display section 101 and a display state estimation section 102 .
- the display unit 101 is a functional unit that uses a backlight, a liquid crystal panel, an organic EL, a micro LED, and the like to output an optical signal according to image information.
- the display unit 101 controls the output optical signal so that an image is formed on the retina of the user's eye via optical elements such as a lens and an optical panel. As a result, the user can visually recognize the image with the image formed on the retina.
- the display unit 101 continuously outputs the above images in the time domain, so that continuous images, that is, video images can be visually recognized. In this manner, the display unit 101 displays images to the user of the display device 100 .
- the display state estimating unit 102 is a functional unit for estimating in which position and in which direction the user is viewing the video using the display device 100 in the three-dimensional image space. It can also be said that the display state estimation unit 102 estimates at least one of the position and direction in the three-dimensional image space of the display device 100 .
- the display state estimator 102 is implemented by various sensors such as an acceleration sensor and a gyro sensor built in appropriate positions of the display device 100 .
- For example, the display state estimating unit 102 estimates the position of the display device 100 in the three-dimensional image space by estimating in which direction and by how much the position has been changed with respect to a reference position preset in the display device 100.
- Similarly, the display state estimating unit 102 estimates the direction of the display device 100 in the three-dimensional image space by estimating in which direction and by what angle the posture has been changed with respect to a reference direction preset in the display device 100.
- The display device 100 is supported by the user's head (the auricles and the nasal root), and therefore moves together with the user's head.
- Therefore, by using the position and direction of the display device 100, it is possible to cut out and display, from the wide-viewing-angle video, the visual field portion corresponding to that position and direction. That is, according to the position and direction of the display device 100 estimated by the display state estimation unit 102, the viewing area in the three-dimensional image space toward which the user's head is facing can be displayed as the desired portion.
- the direction of the display device 100 estimated here is the direction along the normal direction of the display panel of the display device 100 . Since the display panel is arranged to face the user's eyes, the user's eyes are normally positioned in the normal direction of the display panel. Therefore, the direction of the display device 100 matches the direction connecting the user's eyes and the display panel.
- the direction of the display device 100 and the direction of the user's line of sight may deviate due to the user's eye movement.
- the display device 100 is equipped with a sensor (eye tracker) or the like that detects the line of sight of the user, the detected line of sight of the user may be used as the direction of the display device 100 .
- the eye tracker is another example of the display state estimator.
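- The display state estimation from the built-in sensors can be sketched as below; this is a simplification that integrates only gyroscope readings relative to the preset reference direction, and a practical HMD would additionally fuse accelerometer and other sensor data, so the class name and update scheme are assumptions for illustration.

```python
class DisplayStateEstimator:
    """Minimal sketch: track yaw/pitch of the display device by integrating
    gyroscope angular velocity relative to a preset reference direction."""

    def __init__(self):
        self.yaw_deg = 0.0    # 0 = the preset reference direction
        self.pitch_deg = 0.0

    def update(self, gyro_yaw_dps, gyro_pitch_dps, dt_s):
        # Integrate angular velocity (deg/s) over the elapsed time (s).
        self.yaw_deg = (self.yaw_deg + gyro_yaw_dps * dt_s + 180.0) % 360.0 - 180.0
        self.pitch_deg = max(-90.0, min(90.0, self.pitch_deg + gyro_pitch_dps * dt_s))
        return self.yaw_deg, self.pitch_deg
```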
- The display device 100 also includes a power supply, various input switches, a display panel drive circuit, input/output wired and wireless communication modules, audio signal processing circuits such as signal converters and amplifiers, and a microphone and speaker for voice input/output. These detailed configurations will be described later.
- the server device 200 has a reception unit 201 , a difference calculation unit 202 , a presentation unit 203 and a video generation unit 204 .
- the receiving unit 201 is a processing unit that receives (acquires) various signals from the observation device 300 described later.
- the receiving unit 201 receives the wide-viewing-angle image captured by the observation device 300 .
- the receiving unit 201 also receives metadata acquired by the observation device 300 .
- the receiving unit 201 receives information about the position and direction of the display device 100 estimated by the display device 100 .
- The difference calculation unit 202 is a processing unit that calculates the relative position, which is the position of the gaze target 301 relative to the position of the display device 100, and the relative direction, which is the direction of the gaze target 301 relative to the position and direction of the display device 100. A detailed operation of the difference calculation unit 202 will be described later.
- the presentation unit 203 is a processing unit that presents the relative position and relative direction calculated by the difference calculation unit 202 to the user of the display device 100 .
- the presentation unit 203 causes the image generation unit 204 to perform the above-described presentation by including content indicating the relative movement direction in the display image generated by the image generation unit 204.
- Presentation is not limited to being included in the display video. For example, the information may be presented as sound arriving from a predetermined direction corresponding to at least one of the relative position and the relative direction within a three-dimensional sound field, or, when the user holds devices such as vibration devices in both hands, it may be presented by vibrating the device on the side corresponding to at least one of the relative position and the relative direction.
- a detailed operation of the presentation unit 203 will be described later together with a detailed operation of the difference calculation unit 202 .
- The video generation unit 204 cuts out, from the received wide-viewing-angle video, the part of the image corresponding to the viewing area according to the position and direction of the display device 100 estimated by the display state estimation unit 102, and generates a display video that includes content indicating at least one of the calculated relative position and relative direction. Detailed operations of the video generation unit 204 will be described later together with detailed operations of the difference calculation unit 202 and the presentation unit 203.
- Server device 200 also has a communication module for transmitting the generated display image to display device 100 .
- the observation device 300 has a storage unit 301 , an input interface 302 , a position input unit 303 , a data acquisition unit 304 , a metadata acquisition unit 305 and a transmission unit 306 .
- the observation device 300 also has a photographing unit (not shown) that is a functional part related to image photography and that is configured integrally with other functional components of the observation device 300 .
- However, the imaging unit may be separate from the other functional components of the observation device 300 and connected to them by wired or wireless communication.
- the imaging unit includes an optical element, a sensor array, an image processing circuit, and the like. The imaging unit outputs, for example, the luminance value of light of each pixel received on the sensor array via an optical element as 2D luminance value data.
- the image processing circuit performs post-processing such as noise removal of luminance value data, and also performs processing such as stitching for generating a three-dimensional image space from 2D image data.
- In the present embodiment, a video of a three-dimensional image space formed from actual images captured by the imaging unit is displayed using the display device, but the video may instead be a fictitious video formed by a technique such as computer graphics. Therefore, the imaging unit is not an essential component.
- the storage unit 301 is a storage device that stores image information of the three-dimensional image space generated by the imaging unit (images forming the three-dimensional image space).
- the storage unit 301 is implemented using a semiconductor memory or the like.
- the input interface 302 is a functional unit that is used when input is made by a guide who provides VR sightseeing guidance, etc. within the 3D image space.
- the input interface 302 includes a stick that can be tilted in each direction of 360 degrees corresponding to the moving direction of the imaging unit 301, and a physical sensor that detects the tilting direction.
- the guide can input the direction of the target of gaze to the system by tilting the stick in that direction.
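- A minimal sketch of how such a stick input could be converted into a gaze-target direction is shown below; the tilt ranges, the dead zone, and the choice of the imaging unit's moving direction as the 0° reference are assumptions made for illustration.

```python
import math

def stick_to_direction_deg(tilt_x, tilt_y):
    """Convert the tilt of the guide's input stick (tilt_x to the right,
    tilt_y forward, both in -1..1) into a gaze-target direction in degrees,
    where 0 deg is the moving direction of the imaging unit (assumption)."""
    if abs(tilt_x) < 0.1 and abs(tilt_y) < 0.1:   # dead zone: no input
        return None
    return math.degrees(math.atan2(tilt_x, tilt_y)) % 360.0

# Tilting the stick straight to the right indicates a gaze target at 90 deg.
print(stick_to_direction_deg(1.0, 0.0))  # -> 90.0
```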
- Alternatively, the input interface may be one that includes a pointing marker, such as a fluorescent marker attached to the tip of a pointer held by the guide, that points to the direction of the gaze target by its movement, and an image analysis unit that receives the direction of the gaze target indicated by the pointing marker by analyzing the image, output by the imaging unit, that contains the pointing marker.
- The input interface 302 is not an essential component; the present embodiment can be realized even if only the position input unit 303 described later is provided.
- the position input unit 303 is a functional unit for inputting the position of the gaze target.
- The position input unit 303 is realized, for example, by executing a dedicated application on an information terminal such as a smartphone carried by the guide. Map information of the entire space corresponding to the three-dimensional image space is displayed on the screen of the information terminal, and by selecting a predetermined position on this map information, the selected position is input to the system as the position of the gaze target.
- the position input unit 303 is an example of an input interface for inputting the position of the gaze target.
- The data acquisition unit 304 is a functional unit that acquires data regarding the position and direction of the gaze target from the input interface 302, the position input unit 303, and the like.
- The data acquisition unit 304 is connected to at least one of the input interface 302 and the position input unit 303, and acquires, from these functional units, physical quantities corresponding to data regarding the position and direction of the gaze target.
- the data acquisition unit 304 is an example of a data reception unit that receives input of data regarding the direction of the gaze target that the user of the display device 100 of the present embodiment is to gaze.
- the metadata acquisition unit 305 is a functional unit that acquires metadata by converting the data on the position and direction of the gaze target acquired by the data acquisition unit 304 into metadata to be added to the captured video data.
- the acquired metadata may include various data used in the video display system 500 in addition to data regarding the position and direction of the gaze target.
- the metadata acquisition unit 305 is an example of a metadata configuration unit that configures metadata capable of reading multiple pieces of data from one piece of information by collecting multiple pieces of data into one.
- the transmission unit 306 is a communication module that transmits the captured video (wide-viewing-angle video) stored in the storage unit 301 and the acquired metadata.
- the transmission unit 306 communicates with the reception unit 201 of the server device 200 to transmit the stored video and the acquired metadata and cause the reception unit to receive them.
- FIG. 17 is a more detailed block diagram showing the functional configuration of the observation device according to the embodiment.
- FIG. 18 is a more detailed block diagram showing the functional configuration of the display device according to the embodiment. 17 and 18 show the peripheral functional configurations of the observation device 300 and the display device 100 in more detail. Some of the functions shown in these drawings may be realized by the configuration of server device 200.
- The cue information input means 51 corresponds to the input interface 302 and the position input unit 303; the operator of the observation device 300 or the guide inputs the position and direction of the gaze target using physically operated switches, a tablet, a smartphone, or the like. Further, the cue information input means 51 designates, by the cue data, the target to be moved to from among a plurality of targets.
- the cue information input means 51 may obtain cue information from video obtained from the VR video processing means 67 or audio information obtained from the audio input means 71 .
- the voice input means 71 is another example of an input interface.
- the VR video processing means 67 is connected to the VR imaging means 69 corresponding to the imaging unit.
- The cue information obtained from the cue information input means 51 is sent to the position/orientation detection/storage means 53, where it is processed together with the position and orientation of the observation device 300; in some cases its state is stored and appropriate data is generated. The resulting data is sent to the multiplexing means 61 as metadata, multiplexed with video, audio, and graphics, and then sent to the display device 100 via the server device 200 by the communication means 55.
- the observation device 300 includes separation means 57 , VR video compression means 59 , audio compression means 63 , audio decoding means 65 and audio output means 73 in addition to the above.
- The communication means 39 receives the communication information from the observation device 300, and the separation means 37 separates the metadata and sends it to the position/direction/cue determination means 31.
- The position/orientation/cue determination means 31 extracts the cue data from the metadata, performs predetermined processing on it, and sends the result to the graphics generation means 33 so that the cue information is displayed as a figure, which the VR display means 15 shows as part of the VR video.
- the audio reproduction control means 25 generates guide audio for guidance, appropriately processes the reproduced audio, and the like.
- The relative position with respect to target A differs depending on the position of the display device 100, and when the cue information is indicated by graphics, an arrow pointing in the appropriate direction toward target A is displayed. More specifically, when the target A is on the left side, a left-pointing arrow is displayed. When controlled by voice, an announcement such as "Please look to the left" is played. In this way, the content of the cue data is compared with the position/orientation of the display device 100, and appropriate processing is performed.
- The display device 100 also includes position detection means 11, rotation detection means 13, audio reproduction means 17, audio input means 19, VR control means 21, audio compression means 27, audio decoding means 35, and multiplexing means 41. Each component shown in FIG. 3 is implemented by one or more combinations of the components shown in FIGS. 17 and 18.
- FIG. 19 is a flow chart showing the operation of the video display system according to the embodiment.
- In the video display system 500, a video is captured by the imaging unit and stored in the storage unit 301, and metadata including data regarding the position and direction of the gaze target is acquired via the input interface 302, the position input unit 303, the data acquisition unit 304, and the metadata acquisition unit 305.
- the metadata is received by the server device 200 together with the captured and stored video via the transmission unit 306 and the reception unit 201 (S101).
- the display state estimation unit 102 of the display device 100 continuously estimates the position and direction of the display device 100 .
- Display device 100 transmits the direction of display device 100 estimated by display state estimation section 102 to server device 200 .
- the server device 200 receives the estimated position and direction of the display device 100 (S102). Note that the order of steps S101 and S102 may be changed.
- The server device 200 determines whether or not there is an input designating the position and direction of the gaze target, based on whether or not data relating to the position and direction of the gaze target is included in the received metadata (S103). If it is determined that there is such an input (Yes in S103), the server device 200 starts an operation for presenting the relative position and relative direction to the user of the display device 100.
- The difference calculation unit 202 calculates, as the relative position, the position of the gaze target relative to the position viewed by the user (that is, the position of the display device 100), based on the position and direction of the display device 100 and the data on the position and direction of the gaze target on the metadata. Further, the difference calculation unit 202 calculates, as the relative direction, the direction of the gaze target relative to the direction in which the user is looking (that is, the direction of the display device 100), based on the same information (S104).
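- The difference calculation in step S104 can be illustrated by the following sketch (not the disclosed implementation): the relative direction is the wrapped angular difference between the gaze-target direction and the display-device direction, and the relative position is the gaze-target position expressed in the display device's own frame. The compass-style angle convention (measured clockwise from the forward axis) is an assumption.

```python
import math

def relative_direction_deg(display_dir_deg, target_dir_deg):
    """Direction of the gaze target relative to the direction the display
    device is facing, wrapped to -180..180 deg (negative = to the user's
    left, positive = to the right)."""
    return (target_dir_deg - display_dir_deg + 180.0) % 360.0 - 180.0

def relative_position(display_pos, target_pos, display_dir_deg):
    """Position of the gaze target relative to the display device, expressed
    in the device frame (x = to the user's right, y = ahead of the user).
    World frame: x = east, y = north; heading measured clockwise from north."""
    dx, dy = target_pos[0] - display_pos[0], target_pos[1] - display_pos[1]
    a = math.radians(display_dir_deg)
    return (dx * math.cos(a) - dy * math.sin(a),
            dx * math.sin(a) + dy * math.cos(a))

# User faces north (0 deg); a target due east is 90 deg to the right.
print(relative_direction_deg(0.0, 90.0))          # -> 90.0
print(relative_position((0, 0), (1, 0), 0.0))     # -> (1.0, 0.0)
```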
- FIG. 20 is a conceptual diagram explaining generation of a display image according to the embodiment.
- In FIG. 20, (a) shows a partial image cut out from the three-dimensional image space, (b) shows graphics 99 generated by the presentation unit 203, and (c) shows the display video generated by superimposing them.
- FIG. 21 is a conceptual diagram explaining generation of a display image according to the embodiment.
- (a) shows an image captured by the imaging unit
- (b) shows a display image viewed by the user.
- the guide is shown in (a), but this guide does not appear on the video. Therefore, in (b), an image without a guide is shown.
- Here, the guide instructs the user facing the guide to gaze at the gaze target in the front direction, for example by saying "What can be seen in the front?".
- In (b) of FIG. 21, a schematic diagram representing the direction in which the user is looking is shown below the image viewed by the user.
- the upper side of the paper surface is the front direction in the three-dimensional image space, which is recognized by the guide as the direction the user is looking. That is, the user shown in FIG. 21 is looking straight ahead.
- the keyword "front” uttered by the guide is acquired by the voice input means 71 or the like, and this is acquired as data regarding the direction of the gaze target.
- an arrow 99a is superimposed to generate a display image.
- Since "front" as the gaze target direction and "front" as the direction of the display device 100 match, the arrow 99a simply points in the "front" direction.
- the sound is reproduced as shown in FIG. 21(b).
- Here, since the directions match, the voice saying "What can be seen in the front?" is reproduced together with the display video.
- FIG. 22 is a conceptual diagram explaining generation of a display image according to the embodiment.
- (a) shows an image captured by the imaging unit
- (b) shows a display image viewed by the user.
- (a) shows a guide, but this guide does not appear on the video. Therefore, in (b), an image without a guide is shown.
- Here, the guide instructs the user facing the guide to gaze at the gaze target in the right direction, for example by saying "What is visible on the right hand side?".
- a schematic diagram representing the direction viewed by the user is shown below the image viewed by the user.
- the upper side of the paper surface is the front direction in the three-dimensional image space, which is recognized by the guide as the direction the user is looking. That is, the user shown in FIG. 22 is looking straight ahead.
- the keyword "right hand" uttered by the guide is acquired by the voice input means 71 or the like, and this is acquired as data relating to the gaze target direction.
- Then, since "right hand" as the gaze target direction does not match "front" as the direction of the display device 100, the difference is calculated and an arrow 99a pointing in the "right hand" direction is superimposed to generate the display video.
- FIG. 23 is a conceptual diagram explaining generation of a display image according to the embodiment.
- (a) shows an image captured by the imaging unit
- (b) shows a display image viewed by the user.
- the guide is shown in (a), but this guide may or may not appear on the video.
- the image of the visual field portion without the guide is shown.
- Here, the guide instructs the user facing the guide to gaze at the gaze target in the right direction, for example by saying "What is visible on the right hand side?".
- a schematic diagram representing the direction in which the user is viewing is shown below the image viewed by the user.
- the upper side of the paper surface is the front direction in the three-dimensional image space, which is recognized by the guide as the direction the user is looking. That is, the user shown in FIG. 23 is looking in the right direction.
- the keyword "right hand” uttered by the guide is acquired by the voice input means 71 or the like, and this is acquired as data relating to the gaze target direction.
- the "right hand” here is a direction closer to the front side than the right hand direction viewed by the user.
- Such a detailed gaze target direction is input using the input interface 302 or the like.
- Then, an arrow 99a is superimposed to generate the display video. The direction of this arrow is obtained by calculating the difference between the "right hand near the front" as the gaze target direction and the "right hand" as the direction of the display device 100, so the arrow 99a points in the "left hand" direction.
- FIG. 24 is a conceptual diagram explaining generation of a display image according to the embodiment.
- (a) shows an image captured by the imaging unit
- (b) shows a display image viewed by the user.
- the guide is shown in (a), but this guide does not appear on the video. Therefore, in (b), an image without a guide is shown.
- Here, the guide instructs the user facing the guide to gaze at the gaze target located on the right-hand side, for example by saying "What is visible on the right hand side?".
- a schematic diagram representing the direction in which the user is viewing is shown below the image viewed by the user.
- the upper side of the paper surface is the front direction in the three-dimensional image space, which is recognized by the guide as the direction the user is looking.
- the user shown in FIG. 24 is looking forward.
- the position of the target of gaze input by the guide is acquired by the position input unit 303 or the like, and this is acquired as data regarding the position of the target of gaze.
- a display image is generated by superimposing the map 99b.
- an arrow pointing to a position corresponding to the position of the target of gaze is attached within the map. Since the position of the gaze target and the position of the display device 100 do not match, the map 99b has an arrow pointing to the vicinity of the 4 o'clock direction.
- In the map 99b, the user's position (that is, the position of the display device 100) is at the central portion.
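- The arrow drawn in the map 99b can be derived from the relative position as sketched below, where the relative position is converted into a clock-face direction (12 o'clock = straight ahead); this conversion is an illustrative assumption consistent with the "4 o'clock" example above.

```python
import math

def clock_direction(rel_right, rel_forward):
    """Convert a relative position (x = to the user's right, y = ahead of the
    user) into a clock-face direction, 12 o'clock being straight ahead."""
    angle = math.degrees(math.atan2(rel_right, rel_forward)) % 360.0
    hour = round(angle / 30.0) % 12
    return 12 if hour == 0 else hour

# A gaze target behind the user and to the right comes out near 4 o'clock,
# matching the example described above.
print(clock_direction(0.87, -0.5))  # -> 4
```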
- FIG. 25 is a conceptual diagram explaining generation of a display image according to the embodiment.
- (a) shows an image captured by the imaging unit
- (b) shows a display image viewed by the user.
- FIG. 25, (a) shows a guide, but this guide may or may not appear on the video.
- the image of the visual field portion without the guide is shown.
- Here, the guide instructs the user facing the guide to gaze at the gaze target located on the right-hand side, for example by saying "What is visible on the right hand side?".
- In FIG. 25(b), a schematic diagram representing the direction in which the user is looking is shown below the image viewed by the user.
- the upper side of the paper surface is the front direction in the three-dimensional image space, which is recognized by the guide as the direction the user is looking. That is, the user shown in FIG. 25 is looking in the right direction.
- the position of the target of gaze input by the guide is acquired by the position input unit 303 or the like, and this is acquired as data regarding the position of the target of gaze.
- a display image is generated by superimposing the map 99b.
- an arrow pointing to a position corresponding to the position of the target of gaze is attached within the map. Since the position of the gaze target and the position of the display device 100 do not match, the map 99b has an arrow indicating the vicinity of the 2 o'clock direction.
- FIG. 26 is a conceptual diagram explaining generation of a display video according to the embodiment. Since FIG. 26 shows another example of the operation in the same situation as FIG. 21, description of the situation is omitted. As shown in FIG. 26(b), a display video is generated on which the arrow 99a is not superimposed. Since "front" as the direction of the gaze target coincides with "front" as the direction of the display device 100, the arrow 99a would simply point in the "front" direction; it is not displayed here because presenting an arrow pointing straight ahead to a user who is already facing that way would be redundant. On the other hand, the sound is reproduced as shown in FIG. 26(b). Here, since "front" as the direction of the gaze target coincides with "front" as the direction of the display device 100, the voice saying "What can be seen in the front?" is reproduced together with the display video.
- FIG. 27 is a conceptual diagram explaining generation of a display video according to the embodiment. Since FIG. 27 shows another example of the operation in the same situation as FIG. 22, description of the situation is omitted. As shown in FIG. 27(b), a display video is generated by superimposing a mask 99c instead of the arrow 99a. Since "right hand" as the gaze target direction and "front" as the direction of the display device 100 do not match, the difference between them is calculated, and the mask 99c is superimposed to guide the user to visually recognize the "right hand" direction.
- By using, as the graphics 99, the mask 99c that covers part of the side opposite to the relative-direction side, a large change is given to the displayed image.
- While the direction of the arrow 99a is visually easy to understand, the change it makes on the image may be hard to notice.
- This example compensates for that drawback: the user moves the line of sight from the area covered by the mask 99c toward the remaining visible image, and since the direction of this line-of-sight movement corresponds to the relative direction, there is the advantage that the relative direction can easily be recognized naturally.
- the term "covering” as used herein also includes covering with a semi-transmissive image in which the area to be covered is partially transparent.
- FIG. 28 is a conceptual diagram explaining generation of a display image according to the embodiment. Since FIG. 28 shows another example of the operation in the same situation as in FIGS. 22 and 27, description of the situation is omitted. As shown in FIG. 28(b), instead of the arrow 99a and mask 99c, a resolution reduction filter 99d is superimposed to generate a display image. The resolution reduction filter 99d coarsens the image of the portion where the filter is superimposed, like so-called mosaic processing. Then, as with the mask 99c, the user moves the line of sight in order to visually recognize a clearer image portion from an area where visual recognition has become difficult due to the low resolution, and the direction of the line of sight movement corresponds to the relative direction. There is an advantage that it is easy to recognize the relative direction naturally.
- FIG. 28 shows a situation in which the user is visually recognizing the right direction. Since this right-hand direction corresponds to the direction of the target of gaze, the resolution-lowering filter 99d is not superimposed on this area, and the image remains clear.
- FIG. 28(c) shows a situation in which the user is looking further to the right (that is, toward the back with respect to the original front). Since the display video is generated by superimposing the same resolution-lowering filter 99d in the back direction as well, there is the advantage that the right-hand direction to be visually recognized is easier for the user to identify. The configurations of FIGS. 28(c) and 28(d) are also applicable to the example using the mask 99c (the example of FIG. 27).
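- The resolution-lowering filter 99d can be approximated by simple block averaging (mosaic) applied outside the region corresponding to the gaze-target direction, as in the sketch below; the block size and the use of a horizontal band to mark the region kept sharp are assumptions made for illustration.

```python
import numpy as np

def lower_resolution_outside(view, keep_x0, keep_x1, block=16):
    """Mosaic the whole display image except the horizontal band
    [keep_x0, keep_x1) that corresponds to the gaze-target direction."""
    out = view.copy()
    h, w = view.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            if keep_x0 <= x < keep_x1:
                continue                      # keep the target region sharp
            tile = view[y:y + block, x:x + block]
            avg = tile.mean(axis=(0, 1)).astype(view.dtype)
            out[y:y + block, x:x + block] = avg   # replace the tile by its average color
    return out
```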
- FIG. 29 is a conceptual diagram explaining generation of a display video according to the embodiment. Since FIG. 29 shows another example of the operation in the same situation as FIGS. 21 and 26, description of the situation is omitted. As shown in (b) of FIG. 29, a display video is generated on which the arrow 99a is not superimposed. Since "front" as the direction of the gaze target coincides with "front" as the direction of the display device 100, the arrow 99a would simply point in the "front" direction; it is not displayed here because presenting an arrow pointing straight ahead to a user who is already facing that way would be redundant. On the other hand, the sound is reproduced as shown in FIG. 29(b).
- FIG. 30 is a conceptual diagram explaining generation of a display image according to the embodiment. Since FIG. 30 shows another example of the operation in the same situation as in FIGS. 22, 27 and 28, description of the situation is omitted. As shown in FIG. 30(b), a display image is generated on which the arrow 99a is not superimposed. On the other hand, the sound is reproduced as shown in FIG. 30(b).
- the "right hand” as the direction of the gaze target does not match the "front” as the direction of the display device 100, the difference between the "front” and the "right hand” is calculated. A voice saying “I can see it” is reproduced together with the displayed image.
- the sound reproduced here is stereoscopic sound perceived by the user as sound coming from the user's "right hand", as in the example of FIG.
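- The directional reproduction of the guide's voice can be approximated, in the simplest case, by constant-power stereo panning driven by the relative direction, as sketched below; an actual implementation of stereoscopic (three-dimensional) sound would use head-related transfer functions, so this is an illustrative simplification.

```python
import math

def pan_gains(relative_dir_deg):
    """Left/right channel gains for a sound source at the given relative
    direction (0 = straight ahead, +90 = to the user's right)."""
    pan = math.sin(math.radians(max(-90.0, min(90.0, relative_dir_deg))))
    left = math.sqrt((1.0 - pan) / 2.0)
    right = math.sqrt((1.0 + pan) / 2.0)
    return left, right

# A gaze target straight to the right is heard only in the right channel.
print(pan_gains(90.0))  # -> (0.0, 1.0)
```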
- FIG. 31 is a conceptual diagram explaining generation of a display image according to the embodiment.
- (a) shows an image captured by the imaging unit
- (b) shows a display image viewed by the user.
- (a) shows a guide, but this guide may or may not appear on the video.
- the image of the visual field portion without the guide is shown.
- Here, the guide instructs the user facing the guide to gaze at the gaze target in the right direction, for example by saying "What is visible on the right hand side?".
- In FIG. 31(b), a schematic diagram representing the direction viewed by the user is shown below the image viewed by the user.
- the upper side of the paper surface is the front direction in the three-dimensional image space, which is recognized by the guide as the direction the user is looking. That is, the user shown in FIG. 31 is in a state of looking in the back direction.
- the keyword "right hand” uttered by the guide is acquired by the voice input means 71 or the like, and this is acquired as data relating to the gaze target direction. Then, the sound is reproduced as shown in FIG. 31(b).
- the "right hand” as the direction of the gaze target does not match the "back” as the direction of the display device 100, the difference between the "right hand” and the “back” is calculated. A voice saying “I can see it” is reproduced together with the displayed image.
- FIG. 32 is a conceptual diagram explaining generation of a display image according to the embodiment.
- In FIG. 32, (a) shows a map of the points included in the three-dimensional image space, with points A, B, C, and G on the map, and (b) to (d) show the display video that each user is viewing.
- The point A shown in (a) corresponds to the user position (position of the display device 100) shown in (b), the point B shown in (a) corresponds to the user position (position of the display device 100) shown in (c), the point C shown in (a) corresponds to the user position (position of the display device 100) shown in (d), and the point G shown in (a) corresponds to the position of the gaze target.
- schematic diagrams representing directions viewed by users are shown below images viewed by respective users.
- Here, the upper side of the paper surface is the front direction in the three-dimensional image space, which is recognized by the guide as the direction the user is looking. That is, the user shown in (b) of FIG. 32 is looking in the front direction, the user shown in (c) of FIG. 32 is looking in the front direction, and the user shown in (d) of FIG. 32 is looking in the right direction.
- Here, the voice input means 71 or the like acquires the keyword "Please gather at XX (place name of point G, etc.)" uttered by the guide, and this is acquired as data relating to the position of the gaze target. Then, sounds are reproduced as shown in (b) to (d) of FIG. 32.
- Since the position of the gaze target and the position of each display device 100 differ, the respective relative positions are calculated, and the voice saying "Please gather at XX" is reproduced together with the display video. As in the examples of FIGS. 29 and 30, the sound reproduced here is stereoscopic sound perceived by the user as coming from the direction of the relative position.
- With the display device 100, the user can grasp at least one of the relative position and the relative direction, and can therefore easily recognize at least one of the position and the direction of the gaze target.
- With the video display system 500, it is thus possible to cause the display device 100 to display an appropriate image.
- (Example) A more detailed description will be given below based on an example of the embodiment.
- The two use cases mainly described are shooting with a 360° camera and a 3DCG space.
- In the former case, a photographer with a 360° camera is at a remote location and sends 360° video plus metadata; the viewer's VR device generates CG based on the metadata and the like and synthesizes it with the received 360° video for viewing.
- VR sightseeing (real-time sightseeing)
- VR sightseeing (real time + recording)
- VR factory tour / site inspection: guided by a leader (instructions to the camera operator are required)
- VR real exhibition inspection: a leader (a privileged leader is designated for each group) leads a guided tour.
- For 360° video: when shooting and transmitting 360° video in real time, if the location is the same and only the direction each user is looking in differs, the guide who is filming can issue a cue when the guide wants viewers to look in a particular direction, for example the direction in which the guide appears while taking a selfie. That cue is transmitted in some way, and on the viewer side, when the cue arrives, the center of the viewpoint can be forcibly moved there (the cutout position of the video can be changed). At this time, in order to avoid VR sickness, processing such as switching the display during the movement is performed (for example, motion sickness may also occur if the viewpoint is moved slowly). Alternatively, when a cue arrives, an arrow or a target may be displayed on the viewer side to prompt the user to turn in that direction.
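- A minimal sketch of these two viewer-side reactions to a cue (forcing the cutout position to the cued direction with a hard switch rather than a slow pan, or merely prompting with an arrow) is shown below; the Viewer class and its methods are hypothetical placeholders, not an API defined by this embodiment.

```python
class Viewer:
    """Hypothetical stand-in for the viewer-side renderer."""
    def __init__(self) -> None:
        self.cutout_center_deg = 0.0
        self.arrow_deg = None

    def set_cutout_center(self, deg: float) -> None:
        self.cutout_center_deg = deg % 360.0

    def show_arrow(self, toward_deg: float) -> None:
        self.arrow_deg = toward_deg % 360.0


def handle_cue(viewer: Viewer, cue_direction_deg: float, force_move: bool) -> None:
    if force_move:
        # Switch the displayed direction in one step (e.g. with a short fade)
        # instead of panning, which is one way to reduce VR sickness.
        viewer.set_cutout_center(cue_direction_deg)
    else:
        # Leave the viewpoint alone and prompt the user with an arrow or target.
        viewer.show_arrow(toward_deg=cue_direction_deg)


v = Viewer()
handle_cue(v, cue_direction_deg=90.0, force_move=False)  # arrow prompt only
handle_cue(v, cue_direction_deg=90.0, force_move=True)   # jump the cutout position
```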
- In the 3DCG space case, when the CG avatar corresponding to the guide instructs viewers to look in a specific direction at a specific place for an explanation or the like, the viewer is forcibly turned in that direction, guided by a guide displayed in the space, or, when the current position is significantly different from the specified location and the specified object cannot be seen, moved (warped) to that location. Since a sudden warp is startling (and may lead to nausea), a warning is displayed on the viewer side, or an unusual warp button (escape button) or the like is displayed so that the user takes an action before the move (the direction is also adjusted when moving).
- the process corresponds to the third process described above, in which the avatar is warped to the location where the guide is and is directed in the direction of the guide.
- Important elements in the present invention include (i) a method for detecting, recording, and transmitting directions and bearings, and (ii) a method for generating, transmitting, and acting on cues, such as directing participants to look in specific directions or gathering them at specific places (forcing, alerting, and so on).
- The former has two sub-elements: detection of direction and bearing, and transmission of direction and bearing.
- The latter further includes three sub-elements: cue generation, cue transmission, and the method of controlling the VR space according to the cue.
- For cue generation, it is envisaged to use switches or menus, the display of specific objects such as a guide's flag, specific light pulses, specific keywords such as those used by tourist guides (directions such as front or right, names of buildings, actions such as gathering), utterances that begin with a specific keyword when giving an instruction such as "Everyone, let's turn to the right", or pulses that humans cannot recognize, such as audio watermarks.
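- As a rough sketch of keyword-based cue generation (the keyword table, the leading phrase, and the returned codes are illustrative assumptions, not values defined by this embodiment), a transcript of the guide's speech could be scanned as follows:

```python
# Directions are encoded as clockwise degrees from the front; "gather" has no direction.
CUE_KEYWORDS = {
    "front": ("direction", 0),
    "right": ("direction", 90),
    "back": ("direction", 180),
    "left": ("direction", 270),
    "gather": ("gather", None),
}

def detect_cue(transcript: str):
    """Return a (cue_type, value) pair, or None if no cue is recognized."""
    text = transcript.lower()
    # Only utterances that start with a specific keyword are treated as instructions.
    if not text.startswith("everyone"):
        return None
    for word, cue in CUE_KEYWORDS.items():
        if word in text:
            return cue
    return None

print(detect_cue("Everyone, let's turn to the right"))  # ('direction', 90)
print(detect_cue("This building was built in 1750"))    # None
```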
- The content of the cue is assumed to be the same as that of the direction and bearing information.
- FIG. 33 is a diagram showing an example of the functional configuration of the video display system according to the embodiment.
- FIG. 34 is a diagram showing an example of the functional configuration of the observation system according to the embodiment.
- FIG. 35 is a diagram showing an example of the functional configuration of the VR system according to the embodiment.
- An implementation example of a 360° camera will be described with reference to FIGS. 33 and 34.
- the observation system 3351 (360° camera 3401) of the embodiment of the present invention is almost the same as the implementation example of the 360° camera of Conventional Example 2, and differences will be explained.
- An input button serving as a cue information input unit (cue information input means 3352) and a position/orientation detection/storage unit 3402 implemented as a program of the CPU 802 are added. Metadata based on the cue information input from the input button (data input unit) and on the position/orientation detected by the position/orientation detection/storage unit (metadata conversion unit) 3402 is generated by the position/orientation detection/storage unit, sent to the multiplexing unit 832, multiplexed, and sent to the VR system 3301 via the wireless communication element (transmitter) 820.
- Cue information can be input by selecting a pre-specified direction such as right or left with the input button, or by selecting a destination from a plurality of pre-specified locations using numbers or a menu, and it is also used to give the timing to start moving.
- As for the buttons, there are designation of right, left, front, and back with the buttons, designation of start/end with the direction change start/end buttons 811, designation of targets with the number keys, designation with a touch panel, and the like.
- The cue information can be obtained not only from the input button but also by analyzing the captured video with the position/orientation analysis units 3403 and 3404 indicated by the dashed lines: by comparing the video with a pre-specified image to detect the position and direction of the movement destination; by analyzing the guide's hand gestures and body gestures to detect the direction of the destination and the timing of the movement; or by having the guide operate a button on a pointer-like stick with an LED so that the light-emitting element emits light in pulses, and detecting the light-emission pattern to determine the direction and position of the destination or to select one of a plurality of predetermined destinations.
- Alternatively, the voice from the microphone is analyzed, the destination, the direction of movement, and the timing of movement are determined from the words uttered by the guide, converted into appropriate metadata by the position/direction detection function executed by the CPU 802, and sent to the VR system 3301.
- An implementation example of the VR system 3301 (display device, HMD/VR glasses, and computer/smartphone 3501) of the embodiment of the present invention will be described with reference to FIGS. 33 and 35.
- position/direction/cue determination units (viewpoint determination units) 3502 and 3503 are added as programs to the CPU 965 and GPU 954 of the computer/smartphone 3501 of the VR system of Conventional Example 2.
- The position/direction/cue determination unit 3503 of the CPU 965 receives, via the communication element (receiving unit), cue data as metadata from the observation system 3351, together with the position/direction of the observation system 3351, the guide, or the target. When the sound is to be changed accordingly, the audio reproduction unit realized by the program of the CPU 965 generates the guide sound, appropriately processes the reproduced sound, and so on. When VR video or graphics are to be changed by the cue data, the position/direction/cue determination unit 3503 of the CPU 965 sends the metadata to the GPU 954 via the system bus.
- the GPU 954 processes the metadata received by the position/orientation/cue determination unit 3502 and sends information to a graphics generation unit (graphics generation unit) 959 to display graphics based on the cue data.
- the graphics data from the graphics generator 959 is superimposed on the VR video and displayed by the VR display controller 957 .
- Alternatively, information based on the cue data is sent from the position/orientation/cue determination unit 3502 to the VR control unit 956, the information from the motion/position sensor 903 is detected by the motion/position detection processing unit (detection unit) 955, and the VR display control unit (display control unit) 957 appropriately processes the VR video together with the state of the position and orientation of the VR system; the video data is then sent from the AV output 952 to the AV input 925, and the video display processing unit 912 performs display on the display element (display unit) 905. Each of the above audio, graphics, and VR video processes may be implemented independently with no other processing performed, or multiple processes may be implemented and the processing to be used selected during operation of the VR system or the observation system.
- the position/azimuth detection processing of the observation system 3351 may be realized by a computer system such as the cloud, which is located between the observation system and the VR system.
- In that case, no metadata is sent from the observation system 3351, or data input by the operator is sent as metadata, and the computer system such as the cloud detects the position/direction or movement of the observation system, the guide, or the target and sends it to the VR system as metadata.
- As a result, even an existing 360° camera can exhibit the effects of this embodiment.
- Whether some processing is performed by the GPU 954 or the CPU 965 may differ from this example, and the bus configuration may also differ from this example, but there is no difference in the functional configuration and operation described later.
- In this respect, this embodiment is almost the same as the conventional example, and a small integrated VR system can be realized by implementing the functions with the CPU 965 and the GPU 954 integrated into one.
- In the embodiment, FIG. 33 represents the observation system of FIG. 34 and the VR system of FIG. 35 as functional blocks, showing the two systems not as actual connections but as data and control flows.
- The VR imaging camera 804 in FIG. 34 corresponds to the VR imaging means 762 in FIG. 33,
- the VR video processing section corresponds to the VR video processing means 758,
- the VR video compression section corresponds to the VR video compression means 756, and the microphone group, microphone terminal, microphone amplifier, and ADC correspond to the audio input means 763,
- the audio compression section corresponds to the audio compression means 760,
- the input button corresponds to the cue information input means 3352, and the motion/position detection unit, the position/orientation detection/storage unit, and the position/orientation analysis units of the GPU and CPU correspond to the position/orientation detection/storage means 3353,
- the multiplexing unit corresponds to the multiplexing means 2652, and the wireless communication element corresponds to the communication means 754,
- the separation part corresponds to the separation means 755,
- the audio decoding part corresponds to the audio decoding means 761, and
- the DAC, the amplifier, the headphone element, and the speaker correspond to the audio output means 764.
- The video system bus, memory bus, system bus, I/O bus, bus conversion, RAM, EEPROM, SD card, power switch, power control element, battery, display element, shooting mode selection button, zoom button, and shooting start/end button are not shown because they are not directly related to the operation of the present invention.
- The communication element in FIG. 35 corresponds to the communication means 716 in FIG. 33,
- the separation unit corresponds to the separation means 715,
- the audio decoding unit corresponds to the audio decoding means 713,
- the audio reproduction control unit corresponds to the audio reproduction control means 709,
- the DAC, amplifier, speaker, and headphone terminal correspond to the audio reproduction means 705,
- the VR video decoding unit corresponds to the VR video decoding means 710,
- the graphics generation unit corresponds to the graphics generation means 712,
- the position/direction/cue determination units in the CPU and GPU correspond to the position/direction/cue determination means 3302,
- the motion/position sensor and the motion/position detection unit correspond to the position detection means and the rotation detection means,
- the motion/position detection processing section and the VR control section correspond to the VR control means 707,
- the VR display control section corresponds to the VR display control means 708, and the video display processing section, display element, and lens correspond to the VR display means 704,
- the microphone, the microphone amplifier, and the ADC correspond to the audio input means 706,
- the audio compression section corresponds to the audio compression means 714, and
- the multiplexing section corresponds to the multiplexing means 717.
- The video system bus, memory bus, system bus, I/O bus, bus conversion, RAM, EEPROM, non-volatile memory, power switch, power control element, battery, volume button, AV output, AV input, and USB are not shown, either because they are not directly related to the operation of the present invention or because they are described as a single system.
- a wireless communication element is necessary for communication with the controller, but is omitted from FIG. 33 because the controller is omitted.
- cue information input means 3352 is provided.
- The cue information input means 3352 uses switches, tablets, smartphones, or the like physically operated by the operator or guide of the observation system 3351 to input the timing of the start and end of movement, the position or direction of movement, or the position and direction of the destination (target).
- The cue data may also designate, from among multiple targets, the target to move to.
- the cue information input means 3352 may obtain cue information from video obtained from the VR video processing means 758 or audio information obtained from the audio input means, as indicated by the dashed line.
- the cue information obtained from the cue information input means 3352 is sent to the position/orientation detection/storage means 3353, where it is processed together with the position and orientation of the observation system 3351.
- There, the state is stored and formed into suitable data, sent to the multiplexing means 2652 as metadata, and, after being multiplexed together with video, audio, and graphics, sent to the VR system 3301 by the communication means 754.
- In the VR system 3301, the communication means 716 receives the communication information from the observation system 3351, and the separation means 715 separates the metadata and sends it to the position/direction/cue determination means 3302.
- The position/direction/cue determination means 3302 extracts the cue data from the metadata and performs predetermined processing: it sends the cue information to the graphics generation means 712 so that the cue information is displayed as graphics superimposed on the VR video by the VR display means 704, or it sends the cue information to the VR control means 707 so that the VR display control means 708 appropriately processes the VR video together with the state of the position and orientation of the VR system 3301 and displays it on the VR display means 704.
- the audio reproduction control means 709 generates the guide audio, appropriately processes the reproduced audio, and the like.
- the cue data shows the data "move to target A", depending on the position of the VR system, the position of target A will be different. is displayed. Specifically, when target A is on the left side, a left-pointing arrow is displayed. Please,” an announcement flows. In this manner, appropriate processing is performed by comparing the content of the cue data with the position/orientation of the VR system.
- FIGS. 36 and 37 are diagrams showing configuration examples of metadata according to the embodiment. A configuration example of metadata in this embodiment will be described.
- the type of metadata contains a predetermined code or character string indicating that it is the metadata of the present invention.
- The version number is a number used when the metadata structure is changed (for example, a major version and a minor version), and is used with the idea of guaranteeing compatibility within the same major version.
- If the function code is 0, it indicates that the metadata information is invalid; otherwise it indicates the type of information in the metadata.
- 0001 indicates that the format describes the reference position, camera, guide and target positions, moving directions and velocities.
- 0002 indicates graphics data
- 0003 indicates information of the VR system
- 0011 indicates cue data sent from the observation system in addition to the information of 0001
- 0021 indicates cue data that, for example, includes a movement target.
- Metadata includes multiple parameters related to cue data, such as the type and size of cue data.
- As for the parameters relating to cue data, as shown in FIG. 37, eight types of cue data, for example 0 to 7, are prepared, and a numerical value specifying which cue data applies is entered.
- multiple targets can be selectively specified, so parameters for specifying targets are included.
- a different numerical value is set for each of the plurality of targets, and the target can be specified and selected by the numerical value.
- the direction can be specified in units of 1°.
- The reference position is position data that serves as a reference for the other position data, and is represented by, for example, X (east-west distance), Y (north-south distance), and Z (height-direction distance), or by latitude, longitude, and altitude; it is determined in advance for the entire system, including the units. If the reference position is 0, it indicates that the position at the time the entire system was reset is used as the reference. Whether the position of the camera and the position of the guide are absolute coordinates or relative coordinates from the reference position is also determined in advance.
- The movement direction and speed indicate the movement status of the guide or the observation system; when there is cue data, they indicate how it will move from now on.
- The number of targets indicates the number of destinations visited during the tour. When the number of targets is 0, it indicates that there are no targets.
- The verification code is a code for verifying that the metadata is correct during transmission, and uses, for example, a CRC.
- The order of items, the content of items, and the values of each item of the metadata may differ from this configuration example while still providing the same functions.
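- To make the layout concrete, the following is a minimal sketch of packing such metadata for transmission; the field names, field sizes, the chosen codes, and the use of CRC-32 are assumptions made for illustration only, and the actual format is the one defined in FIGS. 36 to 38.

```python
import struct
import zlib
from dataclasses import dataclass

@dataclass
class CueMetadata:
    metadata_type: bytes   # predetermined code identifying this metadata
    version: int           # e.g. major/minor version packed into one value
    function_code: int     # 0 = invalid; otherwise the type of information
    cue_type: int          # which of the prepared cue types (e.g. 0 to 7)
    target_id: int         # designated target (0 = no target)
    direction_deg: int     # direction in units of 1 degree
    ref_x: float           # reference position, east-west distance
    ref_y: float           # reference position, north-south distance
    ref_z: float           # reference position, height-direction distance

    def pack(self) -> bytes:
        body = struct.pack(
            "<4sHHBBHfff",
            self.metadata_type, self.version, self.function_code,
            self.cue_type, self.target_id, self.direction_deg,
            self.ref_x, self.ref_y, self.ref_z,
        )
        # Verification code so the receiver can check the data; CRC-32 here
        # is only an example of such a code.
        return body + struct.pack("<I", zlib.crc32(body))

payload = CueMetadata(b"VRMD", 0x0100, 0x0011, 1, 2, 90, 0.0, 0.0, 0.0).pack()
print(len(payload))  # packed size in bytes including the verification code
```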
- FIG. 38 is a diagram showing another configuration example of metadata according to the embodiment.
- Metadata in a state where a target is designated is shown for this example.
- FIG. 39 is a diagram showing an example of the operation flow of the video display system according to the embodiment. The operation of the embodiment of the present invention will be explained.
- In the observation system, the input from the input button or the like is treated as cue information by the cue information input means (S3932), and the position/direction detection/storage means confirms whether or not there is valid cue information (S3933). If there is valid cue information (Yes in S3933), metadata is generated from the input cue information by the position/direction detection/storage means (S3934), multiplexed with the video, audio, and graphics information by the multiplexing means (S3927), and sent to the VR system by the communication means (S3928). If there is no valid cue information (No in S3933), no processing is performed (S3935).
- Alternatively, the cue information is extracted by the cue information input means from the input audio information or from the VR video input from the imaging means (S3930), converted into metadata by the position/orientation detection/storage means (S3931), multiplexed with the video, audio, and graphics information by the multiplexing means (S3927), and sent to the VR system by the communication means (S3928).
- In the VR system, the separation means separates the metadata from the information received by the communication means (S3901, S3902), the metadata is analyzed by the position/direction/cue determination means in the metadata analysis step (S3906), and the cue information is obtained.
- the cue information is sent to the graphics generation means, the VR control means, or the audio control means.
- the graphics generating means generates graphics based on the cue information.
- the VR control means controls the VR video based on the cue information (S3907, S3908, S3909, S3910).
- voice information is added or controlled based on cue information by voice control means (S3903, S3904, S3905). Which of the above processes is performed depends on the settings of the VR system or the entire system.
- step S3921 corresponds to step S1321
- step S3922 corresponds to step S1322
- step S3923 corresponds to step S1323
- step S3924 corresponds to step S1324
- step S3925 corresponds to step S1325
- step S3926 corresponds to step S1326.
- FIG. 40 is a diagram explaining the result of the operation of the video display system in the example.
- the example of FIG. 40 shows a case where arrow display graphics are superimposed.
- the cue data shows the data "turn right", depending on the position of the VR system
- the target may be to the left of the user of the VR system.
- an arrow pointing to the left is displayed.
- If the position and direction of the target are known with fairly high accuracy in the observation system, that information can be input and sent as metadata to the VR system, where it is compared with the direction of the user of the VR system.
- If the target is in front of the user within a predetermined error range, an arrow pointing to the front of the user of the VR system is shown; if the target is to the right, an arrow pointing to the right is shown; and if the target is behind the user within a separately defined error range, a backward arrow is shown.
- the azimuth determination means receives the position/azimuth of the observation system, guide, or target as metadata, sends the information to the graphics generation means to display it as a graphic, and superimposes it on the VR image by the VR display means.
- the leftmost example in the figure shows how an arrow pointing to the front is displayed.
- the center left example shows how an arrow is displayed to point to the right.
- The arrow changes depending on the direction the user of the VR system is facing: for example, an arrow pointing left is displayed if the user is facing right, and the arrow points in other directions accordingly when the user is facing left or backward (in the figure, an arrow pointing to the left is displayed when the user faces to the right).
- the arrow changes direction accordingly, as described above.
- a state in the case of MAP display is shown.
- an arrow pointing to the right is displayed, or a star mark is used to indicate the destination.
- the rightmost example shows the case of MAP display.
- the arrow and the MAP are shown to be appropriately rotated according to the orientation.
- FIG. 41 is a diagram for explaining the result of the operation of the video display system in the example.
- the example of FIG. 41 shows the case of processing a VR video.
- the cue data shows the data "turn to the right"
- there may be a target on the left side or behind the user of the VR system and when controlling the image , eg mask and resolution control positions change.
- the determination of the direction is the same as in the case of the arrow.
- In the center-left example, areas other than the right side are masked to encourage the user to turn to the right.
- The middle example shows how the resolution is lowered except on the right side to encourage the user to look to the right.
- the center right example shows how the mask (center left example) and resolution (middle example) return to the original display when facing right.
- In the rightmost example, when the user faces right from the beginning, nothing extra is displayed, or the area other than the center is masked or reduced in resolution. If the user was facing left, a small part of the right might be masked, and if the user was looking back, the left might be masked.
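- A rough sketch of these two kinds of video processing (masking and resolution reduction outside the direction to be looked at) is given below using NumPy; the keep ratio, the darkening factor, and the assumption that the "right side" is simply the right-hand part of the displayed frame are simplifications made for this example.

```python
import numpy as np

def mask_except_right(frame: np.ndarray, keep_fraction: float = 0.3) -> np.ndarray:
    """Darken everything except the right-hand part of the frame to
    encourage the user to turn to the right."""
    out = frame.copy()
    keep_from = int(out.shape[1] * (1.0 - keep_fraction))
    out[:, :keep_from] //= 4          # crude mask: darken the masked region
    return out

def lower_resolution_except_right(frame: np.ndarray, factor: int = 8) -> np.ndarray:
    """Reduce the apparent resolution everywhere except the right-hand part."""
    out = frame.copy()
    h = out.shape[0]
    keep_from = int(out.shape[1] * 0.7)
    coarse = out[:, :keep_from][::factor, ::factor]
    upscaled = np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)
    out[:, :keep_from] = upscaled[:h, :keep_from]
    return out

frame = np.full((1080, 1920, 3), 200, dtype=np.uint8)   # placeholder video frame
masked = mask_except_right(frame)
pixelated = lower_resolution_except_right(frame)
```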
- FIG. 42 is a diagram for explaining the result of the operation of the video display system in the example.
- The example of FIG. 42 shows the case where an audio guide is reproduced. Specifically, even if the cue data contains the data "turn to the right", depending on the position of the VR system, the target may be on the left side of the user of the VR system, in which case an announcement such as "Please turn to the left" is played. The determination of the direction is the same as in the case of the arrow.
- As for the voice, when the building in front is being explained, the voice is played so that the guide's voice is heard from the front.
- When the building on the right is being explained, the guide's voice is heard from the right; when the target is behind, the voice is heard from behind; and when the target is on the left, the voice is heard from the left (in the figure, the guide's voice is heard from the right while the building on the right is being explained).
- Depending on the orientation of the user, the guide's voice saying "right hand" may be heard from the left of the user of the VR system, since what is on the guide's right hand may actually be on the user's left. If this is confusing, the "right" in the voice may be replaced with "left", or graphics such as an arrow may be combined with the voice to indicate the direction.
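- A very simplified sketch of placing the guide's voice in the direction of the relative position is shown below (simple stereo panning with NumPy; an actual system would more likely use HRTF-based stereophonic rendering, and all names here are assumptions made for the example):

```python
import numpy as np

def pan_guide_voice(mono: np.ndarray, relative_deg: float) -> np.ndarray:
    """Pan a mono guide voice so that it appears to come from the relative
    direction of the target: 0 = front, 90 = right, 180 = back, 270 = left."""
    right_gain = 0.5 + 0.5 * np.sin(np.radians(relative_deg))
    left_gain = 1.0 - right_gain
    return np.stack([mono * left_gain, mono * right_gain], axis=-1)

voice = np.zeros(48000, dtype=np.float32)        # 1 second of placeholder audio
from_right = pan_guide_voice(voice, 90.0)        # heard from the user's right
from_left = pan_guide_voice(voice, 270.0)        # heard from the user's left
```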
- FIG. 43 is a diagram for explaining the result of the operation of the video display system in the example.
- The example of FIG. 43 shows a case where multiple users are dispersed within the same image space. Specifically, when there are multiple VR system users and they are at locations different from the guide's location in the VR space, the VR images that those users see are pre-recorded ones, or there are multiple observation systems. In this situation, in order for the guide to gather the participants at the guide's location, when the cue data for gathering is indicated by voice, a button, or a map position, arrows or other prompts are displayed or reproduced for each user of the VR system by processing the VR video, or by using voice, vibration of the controller, or the like.
- VR system A faces north and the front is A Shrine, so the arrow is displayed pointing to the front.
- VR system B faces north and the left hand is A shrine, so the arrow is on the left. Since VR system C is facing east and the A shrine is on the right, an arrow is displayed pointing to the right.
- MAP display may be performed.
- When moving, the controller may be used to move using the so-called warp function. As a similar function, the user may move by selecting a "move to guide position" function from a menu.
- This function is also effective when multiple VR systems are distributed in a wide VR space such as VR conferences, VR events, and watching VR sports.
- FIG. 44 is a diagram explaining an example of the application of the video display system in the embodiment.
- a video display system can be used in the example of a VR exhibition. For example, when a guide or organizer announces that they will gather at booth 2-3, the participants in a plurality of VR systems can move from their respective positions.
- FIG. 45 is a diagram explaining an example of the application of the video display system in the embodiment.
- a video display system can be used as an example of a VR mall. For example, when a guide or organizer announces that they will gather at atrium A8, participants in multiple VR systems can move from their positions.
- Instead of sending VR video from the observation system, the same approach can be applied to VR sightseeing in a VR space composed of CG, to VR conferences (multiple breakout sessions gathering into a whole session), to VR sports viewing (gathering from multiple viewing locations), to VR events (gathering at the main venue from multiple different event locations), and so on.
- the same method can be used not only for gatherings, but also when traveling as a group to different venues. In this case, instead of the host or guide issuing cues, the group members should issue cues.
- FIG. 46 is a diagram for explaining another example of the moving method of the video display system in the embodiment.
- the movement start method differs depending on the situation of the VR screen viewed by the user of the VR system.
- One method is to use a controller function (for example, the warp function).
- This method has no problem at short distances, but is troublesome at long distances.
- FIG. 47 is a diagram for explaining a configuration example in which the image display system according to the embodiment is realized using the cloud.
- the graphics, VR video, sound, and vibration of the controller are controlled according to the position, orientation, and cue information of the observation system 4761 on the cloud and the position and orientation of the VR system.
- By providing the position/direction/cue detection/storage means 4740 in the cloud, the position/direction of the observation system 4761 is read on the cloud from the data separated by the separation means 4742 from the data sent from the observation system, or is read from the VR video sent from the observation system 4761, and graphics such as arrows are generated accordingly by the graphics generation means 4736.
- The position and orientation of the observation system and the position and orientation of the VR system are judged by the VR control means 4707, and the VR display control means 4708 synthesizes graphics with the VR video or processes the VR video, and controls audio reproduction.
- The functions provided on the cloud are not limited to the configuration shown in FIG. 47; which functions are placed on the cloud may be chosen according to the configuration and functions of the connected observation system or VR system, as long as the functions and operations as a whole remain approximately the same. As an example, if the observation system does not detect its own position and direction, but the position and direction of the observation system are detected on the cloud and sent to the VR system as graphics superimposed on the video, there are restrictions on changing the graphics according to the position and orientation of the VR system, but no special functions are required of the VR system. Also, in a configuration in which the VR system is provided with position/orientation control means for correcting graphics according to the position/orientation of the VR system and with graphics generation means, it is possible to change the graphics according to the position/orientation of the VR system.
- The cue information input means 4762, multiplexing means 4763, communication means 4764, separation means 4765, VR video compression means 4766, audio compression means 4767, audio decoding means 4768, VR video processing means 4769, VR imaging means 4770, audio input means 4771, and audio output means 4772 correspond to the position detection means 702, rotation detection means 703, VR display means 704, audio reproduction means 705, audio input means 706, VR control means 707, VR display control means 708, audio reproduction control means 709, VR video decoding means 710, graphics generation means 712, audio decoding means 713, audio compression means 714, separation means 715, communication means 716, multiplexing means 717, position/direction/cue determination means 3302, cue information input means 3352, position/direction detection/storage means 3353, communication means 754, separation means 755, VR video compression means 756, multiplexing means 757, VR video processing means 758, graphics generation means 759, audio compression means 760, audio decoding means 761, VR imaging means 762, audio input means 763, and audio output means 764 described above, respectively, in one-to-one, many-to-one, one-to-many, or many-to-many relationships.
- FIG. 48 is a diagram for explaining a configuration example in which the image display system according to the embodiment is realized using the cloud.
- the observation system position/direction/cue detection and storage means 4740 may be realized by a computer system such as a cloud located between the observation system 4761 and the VR system 4701 .
- In this case, metadata indicating the direction is not sent from the observation system, or cue information data input by the operator is sent as metadata.
- the position/direction or movement of the observation system, guide, or target is detected from the video, audio, or metadata sent from the observation system 4861 and sent as metadata to the VR system. As a result, even an existing 360° camera can exhibit the effects of this embodiment.
- The position/direction determination means 4915 on the VR system side, and the control of the VR video and audio based on it, may also be realized by a computer system such as the cloud located between the VR system and the observation system.
- the same processing can be performed at one place, and it becomes easy to give the same effect to a plurality of VR systems at the same time, and the effect of the present invention can be given to existing systems.
- the configuration of FIG. 48 is an example in which the position and direction of the VR system are not sent to the cloud side.
- the VR display control means can perform processing such as changing the resolution of the VR image, masking, and changing the localization of the sound according to the output of the position/direction/cue detection/storage means.
- FIG. 49 is a diagram for explaining a configuration example in which the video display system according to the embodiment is realized using the cloud.
- the position and orientation of the observation system on the cloud are read from the metadata separated by the separation means 4912 from the data sent from the observation system, and graphics such as arrows are generated by the graphics generation means 4916 accordingly.
- Alternatively, graphics are generated from the metadata separated by the separation means, and the VR control means judges the position/direction of the observation system together with the position and direction of the VR system obtained from the position detection means and the rotation detection means.
- Display and audio output suitable for the position and orientation of the VR system are possible. Also, although not shown here, it is possible to appropriately control the controller of the VR system and notify the user of the VR system of the direction and position by vibration or the like.
- The position and orientation information of the VR system detected by the position detection means and rotation detection means of the VR system is used as metadata, multiplexed with other information by the multiplexing means, and sent to the computer system on the cloud by the communication means.
- This function is almost included in general VR systems.
- each component may be realized by executing a software program suitable for each component.
- Each component may be realized by reading and executing a software program recorded in a recording medium such as a hard disk or a semiconductor memory by a program execution unit such as a CPU or processor.
- each component may be realized by hardware.
- each component may be a circuit (or integrated circuit). These circuits may form one circuit as a whole, or may be separate circuits. These circuits may be general-purpose circuits or dedicated circuits.
- the present disclosure is useful for displaying an appropriate image on a display device.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Marketing (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Tourism & Hospitality (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Strategic Management (AREA)
- Computer Graphics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Primary Health Care (AREA)
- Human Resources & Organizations (AREA)
- Library & Information Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Geometry (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Development Economics (AREA)
- Controls And Circuits For Display Device (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Telephonic Communication Services (AREA)
- Studio Devices (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Description
In recent years, display devices have been developed that a user wears on the head so that a display unit is placed in front of the eyes, allowing the displayed image to be viewed as if on a large screen. Such a display device is called a head-mounted display (HMD) and has the characteristic of optically presenting an image as a large screen. Some HMDs also display video shifted by the parallax corresponding to the user's right and left eyes, so that the user can perceive the viewed video stereoscopically. With the recent improvement in communication quality, video captured by an observation device placed at a remote location can be viewed almost in real time with a delay of several milliseconds to several tens of milliseconds, making it possible to experience a place as if one were there without actually visiting it. Using this technology, virtual sightseeing experiences such as sightseeing trips, exhibition visits, inspections, factory tours, and visits to art museums, museums, zoos, and aquariums (hereinafter also referred to as pseudo-tourism or VR (virtual reality) tourism) have also come to be realized.
An overview of the present disclosure is as follows.
[Configuration]
First, an overview of the video display system according to the embodiment will be described with reference to FIGS. 14 and 15. FIG. 14 is a diagram showing the schematic configuration of the video display system according to the embodiment, and FIG. 15 is a diagram showing an example of video displayed in the video display system according to the embodiment.
Next, the operation of the video display system 500 configured as described above will be described with reference to FIGS. 19 to 32. FIG. 19 is a flowchart showing the operation of the video display system according to the embodiment.
A more detailed description will be given below based on an example of the embodiment. This example mainly describes two use cases of VR tourism: shooting with a 360° camera, and a 3DCG space. In the former, a photographer with a 360° camera is at a remote location and sends 360° video plus metadata, and the viewer's VR device generates CG based on the metadata and the like and synthesizes it with the received 360° video for viewing. This case is further subdivided into uses such as VR sightseeing (real-time sightseeing), VR sightseeing (real time + recording), VR factory tours and site inspections guided by a leader (instructions to the camera operator are required), and VR real exhibition inspections in which a leader (a privileged leader designated for each group) leads a guided tour.
Although the embodiment and the like have been described above, the present disclosure is not limited to the above embodiment and the like.
13 rotation detection means
15 VR display means
17 audio reproduction means
19, 71 audio input means
21 VR control means
23 VR display control means
25 audio reproduction control means
27, 63 audio compression means
29 VR video decoding means
31 position/direction/cue determination means
33 graphics generation means
35, 65 audio decoding means
37, 57 separation means
39, 55 communication means
41, 61 multiplexing means
51 cue information input means
53 position/direction detection/storage means
59 VR video compression means
67 VR video processing means
69 VR imaging means
73 audio output means
99 graphics
99a arrow
99b map
99c mask
99d resolution reduction filter
100 display device
101 display unit
102 display state detection unit
150 network
200 server device
201 receiving unit
202 difference calculation unit
203 presentation unit
204 video generation unit
300 observation device
300a, 300b imaging device
301 storage unit
302 input interface
303 position input unit
304 data acquisition unit
305 metadata acquisition unit
306 transmission unit
500 video display system
Claims (19)
- A video display system for displaying a display video on a display device, the video display system comprising: an observation device including an imaging unit that generates a wide-viewing-angle video, a data acquisition unit that acquires data relating to at least one of a position and a direction of a gaze target that a user of the display device is caused to gaze at within the wide-viewing-angle video and cue information for notifying of a change in a state of the observation system, a metadata construction unit that forms the data from the data acquisition unit into metadata together with other information, and a transmission unit that transmits the wide-viewing-angle video together with the metadata; and a VR device including a receiving unit that receives the wide-viewing-angle video, the data, and the cue information, a display state estimation unit that estimates at least one of a position and a direction of the display device within the wide-viewing-angle video, a difference calculation unit that calculates, based on a difference between the estimated at least one of the position and the direction of the display device within the wide-viewing-angle video and the at least one of the position and the direction of the gaze target in the metadata, at least one of a relative position, which is the position of the gaze target relative to at least one of the position and the direction of the display device within the wide-viewing-angle video, and a relative direction, which is the relative direction of the gaze target, a presentation unit that presents, to the user of the display device, information on at least one of the calculated relative position and relative direction as well as the instruction given by the cue information and the state of the observation system, a video generation unit that generates, from the received wide-viewing-angle video, the display video including a partial image corresponding to a visual field portion according to information on at least one of the position and the direction of the display device within the wide-viewing-angle video estimated by the display state estimation unit as well as the instruction given by the cue information and the state of the observation system, and the display device that displays the display video.
- The video display system according to claim 1, further comprising a camera that captures video or an image generation unit that generates an image by computation, wherein the wide-viewing-angle video is video captured by the camera or an image computed by the image generation unit.
- The video display system according to claim 1 or 2, wherein the presentation unit generates and outputs graphics indicating information based on at least one of the calculated relative position and relative direction and on the cue information, and causes the video generation unit to present at least one of the relative position and the relative direction by superimposing the output graphics on the partial image.
- The video display system according to claim 3, wherein the data reception unit receives input of data relating to the direction of the gaze target, the display state estimation unit estimates the direction of the display device within the wide-viewing-angle video, and the graphics display, on the display video, an arrow pointing in the relative direction.
- The video display system according to claim 3, wherein the data reception unit receives input of data relating to the direction of the gaze target, the display state estimation unit estimates the direction of the display device within the wide-viewing-angle video, and the graphics display a mask, which is an image for covering at least a part of the display video other than the relative-direction side.
- The video display system according to claim 3, wherein the data reception unit receives input of data relating to the position of the gaze target, the display state estimation unit estimates the position of the display device within the wide-viewing-angle video, and the graphics display, on the display video, a map indicating the relative position.
- The video display system according to any one of claims 1 to 6, further comprising an input interface used for inputting the data, wherein the data acquisition unit acquires the data input via the input interface.
- The video display system according to any one of claims 3 to 5, further comprising an input interface for designating at least one of the start and end timing of movement of the user within the wide-viewing-angle video, wherein the data acquisition unit acquires at least one of the start and end timing of the movement input via the input interface.
- The video display system according to claim 6, wherein the images constituting the wide-viewing-angle video are images output by an imaging unit that captures a real space, and the input interface includes a pointing marker that is held in the real space by an operator of the input interface and indicates at least one of the position and the direction of the gaze target by its movement, and an image analysis unit that receives at least one of the position and the direction of the gaze target indicated by the pointing marker by analyzing an image including the pointing marker output by the imaging unit.
- The video display system according to any one of claims 1 to 9, comprising an information processing device that has at least a part of the functions of the observation device and the VR device, is connected to the observation device and the VR device via a network, and performs a part of the processing of the observation device or the VR device.
- The video display system according to claim 10, wherein the information processing device comprises: a receiving unit that receives the wide-viewing-angle video, the data, and the cue information from the observation device as the metadata; a presentation unit that generates information for presenting, to the user of the display device, at least one of the position and the direction of the gaze target in the metadata and information according to the cue information; a video generation unit that generates the display video by adding the information generated by the presentation unit to a partial image of the received wide-viewing-angle video corresponding to a visual field portion according to information on at least one of the position and the direction of the display device within the wide-viewing-angle video estimated by the display state estimation unit; and a transmission unit that transmits the wide-viewing-angle video, the partial image corresponding to the visual field portion according to information on at least one of the relative position and the relative direction, and the metadata.
- The video display system according to claim 10, wherein the information processing device comprises: a receiving unit that receives the wide-viewing-angle video, the data, and the cue information from the observation device as the metadata; a presentation unit that generates information for presenting, to the user of the display device, at least one of the position and the direction of the gaze target in the metadata and information according to the cue information; a metadata construction unit that generates the metadata from the information generated by the presentation unit; and a transmission unit that transmits, to the VR device, the metadata generated by the metadata construction unit, the wide-viewing-angle video received by the receiving unit, and other information.
- The video display system according to claim 10, wherein the information processing device comprises: a receiving unit that receives the wide-viewing-angle video, the data, and the cue information from the observation device as the metadata, and receives data relating to the orientation of the display device from the display device; a difference calculation unit that calculates, based on the difference between the orientation of the display device and movement information relating to the movement of the imaging unit and on the cue information, a relative movement direction, which is the movement direction of the imaging unit relative to the orientation of the display device; a presentation unit that generates and outputs graphics indicating the calculated relative movement direction, the graphics being superimposed on a partial video of the wide-viewing-angle video corresponding to a visual field portion according to the estimated orientation of the display device so as to present, to the user of the display device, the relative movement direction and information according to the cue information; a video generation unit that corrects the graphics based on the data relating to the orientation of the display device and generates the display video by superimposing the graphics on the wide-viewing-angle video; and a transmission unit that transmits the display video and other information.
- The video display system according to any one of claims 10 to 13, wherein the information processing device is provided on a cloud connected to a wide-area network and is connected to the observation device and the VR device via the wide-area network.
- The video display system according to any one of claims 1 to 14, wherein the cue information is information indicating that the movement direction of the observation device or at least one of the position and the direction of the gaze target that the user of the display device is caused to gaze at changes.
- An information processing device used in a video display system for causing a display device to display a display video of at least a part of a wide-viewing-angle video, the information processing device comprising: a receiving unit that receives metadata based on data obtained by receiving input, the data relating to at least one of a position and a direction of a gaze target that a user of the display device is caused to gaze at in the wide-viewing-angle video; and a difference calculation unit that calculates and outputs, based on a difference from the at least one of the position and the direction of the gaze target in the metadata, at least one of a relative position, which is the position of the gaze target relative to at least one of the position and the direction of the display device within the wide-viewing-angle video, and a relative direction, which is the relative direction of the gaze target.
- The information processing device according to claim 16, further comprising a presentation unit that generates and outputs graphics indicating at least one of the calculated relative position and relative direction, the graphics being superimposed on a partial image of the images constituting the wide-viewing-angle video corresponding to a visual field portion according to at least one of the estimated position and direction of the display device within the wide-viewing-angle video, thereby presenting at least one of the relative position and the relative direction to the user of the display device.
- An information processing method for causing a display device to display a display video of at least a part of a wide-viewing-angle video, the method comprising: receiving metadata based on data obtained by receiving input, the data relating to at least one of a position and a direction of a gaze target that a user of the display device is caused to gaze at in the wide-viewing-angle video; and calculating and outputting, based on a difference between at least one of the estimated position and direction of the display device within the wide-viewing-angle video and at least one of the position and the direction of the gaze target in the metadata, at least one of a relative position, which is the position of the gaze target relative to the orientation of the display device, and a relative direction, which is the relative direction of the gaze target.
- A program for causing a computer to execute the information processing method according to claim 18.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280028039.7A CN117121473A (zh) | 2021-04-16 | 2022-04-18 | 影像显示系统、信息处理装置、信息处理方法及程序 |
EP22788229.7A EP4325842A4 (en) | 2021-04-16 | 2022-04-18 | VIDEO DISPLAY SYSTEM, INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM |
US18/286,354 US20240205513A1 (en) | 2021-04-16 | 2022-04-18 | Video display system, information processing device, information processing method, and recording medium |
JP2023514693A JPWO2022220306A5 (ja) | 2022-04-18 | 映像表示システム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163176004P | 2021-04-16 | 2021-04-16 | |
US63/176,004 | 2021-04-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022220306A1 true WO2022220306A1 (ja) | 2022-10-20 |
Family
ID=83640677
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/018088 WO2022220307A1 (ja) | 2021-04-16 | 2022-04-18 | 映像表示システム、観測装置、情報処理方法、及び、プログラム |
PCT/JP2022/018087 WO2022220306A1 (ja) | 2021-04-16 | 2022-04-18 | 映像表示システム、情報処理装置、情報処理方法、及び、プログラム |
PCT/JP2022/018086 WO2022220305A1 (ja) | 2021-04-16 | 2022-04-18 | 映像表示システム、情報処理方法、及び、プログラム |
PCT/JP2022/018089 WO2022220308A1 (ja) | 2021-04-16 | 2022-04-18 | 映像表示システム及び映像表示方法 |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/018088 WO2022220307A1 (ja) | 2021-04-16 | 2022-04-18 | 映像表示システム、観測装置、情報処理方法、及び、プログラム |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/018086 WO2022220305A1 (ja) | 2021-04-16 | 2022-04-18 | 映像表示システム、情報処理方法、及び、プログラム |
PCT/JP2022/018089 WO2022220308A1 (ja) | 2021-04-16 | 2022-04-18 | 映像表示システム及び映像表示方法 |
Country Status (5)
Country | Link |
---|---|
US (3) | US20240196062A1 (ja) |
EP (4) | EP4325842A4 (ja) |
JP (3) | JP7486110B2 (ja) |
CN (4) | CN117121473A (ja) |
WO (4) | WO2022220307A1 (ja) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015170461A1 (ja) * | 2014-05-07 | 2015-11-12 | 日本電気株式会社 | 画像処理装置、画像処理方法およびコンピュータ可読記録媒体 |
WO2016009864A1 (ja) * | 2014-07-18 | 2016-01-21 | ソニー株式会社 | 情報処理装置、表示装置、情報処理方法、プログラム、および情報処理システム |
JP2016090773A (ja) | 2014-11-04 | 2016-05-23 | 株式会社ソニー・コンピュータエンタテインメント | ヘッドマウントディスプレイおよび輝度調整方法 |
JP2019503612A (ja) * | 2015-12-22 | 2019-02-07 | トムソン ライセンシングThomson Licensing | 乖離したカメラの照準方向を制御するための方法及び機器 |
JP2019521547A (ja) * | 2016-05-02 | 2019-07-25 | フェイスブック,インク. | コンテンツを提示するためのシステムおよび方法 |
JP2019197939A (ja) * | 2018-05-07 | 2019-11-14 | 株式会社Nttドコモ | 情報処理装置及び情報処理システム |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012010418A (ja) | 1996-10-03 | 2012-01-12 | Masanobu Kujirada | 実況映像提供システム |
JP2001352534A (ja) | 2000-06-08 | 2001-12-21 | Casio Comput Co Ltd | 放映配信システムおよびそのプログラム記録媒体 |
JP2002169901A (ja) * | 2000-12-01 | 2002-06-14 | I Academy:Kk | インターネットを利用した集合参加型教育システム |
JP2004007561A (ja) | 2002-04-15 | 2004-01-08 | Sumitomo Electric Ind Ltd | テレビ会議システム、それに含まれる端末装置、及びデータ配信方法 |
JP4434219B2 (ja) * | 2007-02-26 | 2010-03-17 | 株式会社デンソー | 画像サーバ |
US20100283829A1 (en) | 2009-05-11 | 2010-11-11 | Cisco Technology, Inc. | System and method for translating communications between participants in a conferencing environment |
CA2720886A1 (en) * | 2010-11-12 | 2012-05-12 | Crosswing Inc. | Customizable virtual presence system |
JP6026088B2 (ja) * | 2011-08-09 | 2016-11-16 | 株式会社トプコン | 遠隔操作システム |
JP2015005967A (ja) * | 2013-05-20 | 2015-01-08 | 株式会社ニコン | 電子機器及びプログラム |
WO2016065519A1 (en) * | 2014-10-27 | 2016-05-06 | SZ DJI Technology Co., Ltd. | Uav flight display |
KR102144515B1 (ko) * | 2015-01-07 | 2020-08-14 | 삼성전자주식회사 | 마스터 기기, 슬레이브 기기 및 그 제어 방법 |
JP6392150B2 (ja) * | 2015-03-18 | 2018-09-19 | 株式会社東芝 | 講演支援装置、方法およびプログラム |
US9911238B2 (en) * | 2015-05-27 | 2018-03-06 | Google Llc | Virtual reality expeditions |
US9794514B1 (en) * | 2016-06-03 | 2017-10-17 | Avaya Inc. | Positional sensitive interaction functionality |
KR101917860B1 (ko) * | 2016-07-28 | 2018-11-12 | 주식회사 빅스 | 무인 비행체의 최적 경로 탐색 방법, 최적 경로 탐색 서버 및 시스템 |
JP2018112809A (ja) * | 2017-01-10 | 2018-07-19 | セイコーエプソン株式会社 | 頭部装着型表示装置およびその制御方法、並びにコンピュータープログラム |
JP6929674B2 (ja) * | 2017-03-22 | 2021-09-01 | 株式会社東京エネシス | 環境画像表示システム及び環境画像表示方法 |
JP6915442B2 (ja) * | 2017-08-10 | 2021-08-04 | 株式会社Ihi | 移動体の運動情報表示システム |
JP2019040555A (ja) * | 2017-08-29 | 2019-03-14 | ソニー株式会社 | 情報処理装置、情報処理方法、およびプログラム |
US10950135B2 (en) * | 2017-11-09 | 2021-03-16 | Accenture Global Solutions Limited | Customized virtual reality learning environment |
EP3547691A4 (en) * | 2018-02-02 | 2019-11-13 | Ntt Docomo, Inc. | INFORMATION PROCESSING DEVICE |
JP6981305B2 (ja) * | 2018-02-27 | 2021-12-15 | トヨタ自動車株式会社 | 情報処理装置、画像配信システム、情報処理方法、及びプログラム |
JP2019075075A (ja) * | 2018-03-28 | 2019-05-16 | 株式会社自律制御システム研究所 | 無人航空機の飛行計画経路を設定するためのシステム及びプログラム |
JP6974247B2 (ja) * | 2018-04-27 | 2021-12-01 | エスゼット ディージェイアイ テクノロジー カンパニー リミテッドSz Dji Technology Co., Ltd | 情報処理装置、情報提示指示方法、プログラム、及び記録媒体 |
JP7353821B2 (ja) * | 2019-06-24 | 2023-10-02 | キヤノン株式会社 | 画像処理装置、その制御方法、プログラム |
CN110673734B (zh) * | 2019-09-30 | 2023-12-01 | 京东方科技集团股份有限公司 | 虚拟旅游方法、客户端、服务器端、系统及图像采集设备 |
JP6727388B1 (ja) * | 2019-11-28 | 2020-07-22 | 株式会社ドワンゴ | 授業システム、視聴端末、情報処理方法およびプログラム |
JP6855616B2 (ja) * | 2020-04-01 | 2021-04-07 | キヤノン株式会社 | 操作装置、移動装置、およびその制御システム |
-
2022
- 2022-04-18 US US18/286,357 patent/US20240196062A1/en active Pending
- 2022-04-18 CN CN202280028039.7A patent/CN117121473A/zh active Pending
- 2022-04-18 WO PCT/JP2022/018088 patent/WO2022220307A1/ja active Application Filing
- 2022-04-18 CN CN202280028041.4A patent/CN117223050A/zh active Pending
- 2022-04-18 JP JP2023514695A patent/JP7486110B2/ja active Active
- 2022-04-18 WO PCT/JP2022/018087 patent/WO2022220306A1/ja active Application Filing
- 2022-04-18 CN CN202280028058.XA patent/CN117280696A/zh active Pending
- 2022-04-18 EP EP22788229.7A patent/EP4325842A4/en active Pending
- 2022-04-18 JP JP2023514694A patent/JPWO2022220307A1/ja active Pending
- 2022-04-18 EP EP22788228.9A patent/EP4325476A4/en active Pending
- 2022-04-18 US US18/286,353 patent/US20240196045A1/en active Pending
- 2022-04-18 JP JP2023514692A patent/JPWO2022220305A1/ja active Pending
- 2022-04-18 WO PCT/JP2022/018086 patent/WO2022220305A1/ja active Application Filing
- 2022-04-18 EP EP22788231.3A patent/EP4325867A4/en active Pending
- 2022-04-18 EP EP22788230.5A patent/EP4325843A4/en active Pending
- 2022-04-18 US US18/286,354 patent/US20240205513A1/en active Pending
- 2022-04-18 WO PCT/JP2022/018089 patent/WO2022220308A1/ja active Application Filing
- 2022-04-18 CN CN202280028040.XA patent/CN117121474A/zh active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015170461A1 (ja) * | 2014-05-07 | 2015-11-12 | 日本電気株式会社 | 画像処理装置、画像処理方法およびコンピュータ可読記録媒体 |
WO2016009864A1 (ja) * | 2014-07-18 | 2016-01-21 | ソニー株式会社 | 情報処理装置、表示装置、情報処理方法、プログラム、および情報処理システム |
JP2016090773A (ja) | 2014-11-04 | 2016-05-23 | 株式会社ソニー・コンピュータエンタテインメント | ヘッドマウントディスプレイおよび輝度調整方法 |
JP2019503612A (ja) * | 2015-12-22 | 2019-02-07 | トムソン ライセンシングThomson Licensing | 乖離したカメラの照準方向を制御するための方法及び機器 |
JP2019521547A (ja) * | 2016-05-02 | 2019-07-25 | フェイスブック,インク. | コンテンツを提示するためのシステムおよび方法 |
JP2019197939A (ja) * | 2018-05-07 | 2019-11-14 | 株式会社Nttドコモ | 情報処理装置及び情報処理システム |
Non-Patent Citations (1)
Title |
---|
See also references of EP4325842A4 |
Also Published As
Publication number | Publication date |
---|---|
CN117223050A (zh) | 2023-12-12 |
CN117121474A (zh) | 2023-11-24 |
WO2022220307A1 (ja) | 2022-10-20 |
WO2022220305A1 (ja) | 2022-10-20 |
US20240196062A1 (en) | 2024-06-13 |
JPWO2022220305A1 (ja) | 2022-10-20 |
EP4325843A4 (en) | 2024-10-16 |
EP4325843A1 (en) | 2024-02-21 |
EP4325476A1 (en) | 2024-02-21 |
EP4325476A4 (en) | 2024-10-09 |
JPWO2022220307A1 (ja) | 2022-10-20 |
EP4325867A1 (en) | 2024-02-21 |
WO2022220308A1 (ja) | 2022-10-20 |
CN117121473A (zh) | 2023-11-24 |
JPWO2022220308A1 (ja) | 2022-10-20 |
US20240196045A1 (en) | 2024-06-13 |
JP7486110B2 (ja) | 2024-05-17 |
CN117280696A (zh) | 2023-12-22 |
EP4325842A4 (en) | 2024-10-16 |
EP4325842A1 (en) | 2024-02-21 |
EP4325867A4 (en) | 2024-09-25 |
JPWO2022220306A1 (ja) | 2022-10-20 |
US20240205513A1 (en) | 2024-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9858643B2 (en) | Image generating device, image generating method, and program | |
WO2020210213A1 (en) | Multiuser asymmetric immersive teleconferencing | |
WO2017086263A1 (ja) | 情報処理装置および画像生成方法 | |
US10681276B2 (en) | Virtual reality video processing to compensate for movement of a camera during capture | |
US20130176403A1 (en) | Heads up display (HUD) sensor system | |
JP2015149634A (ja) | 画像表示装置および方法 | |
US20150156481A1 (en) | Heads up display (hud) sensor system | |
CN112272817B (zh) | 用于在沉浸式现实中提供音频内容的方法和装置 | |
WO2020059327A1 (ja) | 情報処理装置、情報処理方法、及びプログラム | |
WO2022209129A1 (ja) | 情報処理装置、情報処理方法、およびプログラム | |
EP3665656B1 (en) | Three-dimensional video processing | |
KR102140077B1 (ko) | 서버, 사용자 단말 장치 및 그 제어 방법 | |
WO2019034804A2 (en) | THREE-DIMENSIONAL VIDEO PROCESSING | |
WO2018042658A1 (ja) | 携帯情報端末、頭部装着表示システム、及びその音声出力制御方法 | |
WO2022220306A1 (ja) | 映像表示システム、情報処理装置、情報処理方法、及び、プログラム | |
JP2017208808A (ja) | 仮想空間を提供する方法、プログラム及び記録媒体 | |
CN117826982A (zh) | 一种基于用户位姿计算的实时音效交互系统 | |
JP6921204B2 (ja) | 情報処理装置および画像出力方法 | |
WO2020054585A1 (ja) | 情報処理装置、情報処理方法及びプログラム | |
WO2023248832A1 (ja) | 遠隔視認システム、現地撮像システム | |
US20240022688A1 (en) | Multiuser teleconferencing with spotlight feature | |
JP2023130045A (ja) | 情報処置装置、情報処理装置の制御方法、プログラム、記録媒体、およびシステム | |
JP2021186215A (ja) | パフォーマンスイベント実施方法及び該パフォーマンスイベント実施方法において用いられる中継装置 | |
JP2006129472A (ja) | 画像処理装置および画像処理方法、並びにプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22788229 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023514693 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18286354 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022788229 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2022788229 Country of ref document: EP Effective date: 20231116 |