KR101979432B1

KR101979432B1 - Apparatus and method for predicting user viewpoint using location information of sound source in 360 VR contents

Info

Publication number: KR101979432B1
Application number: KR1020170166031A
Authority: KR
Inventors: 이예훈; 최승호; 김동호; 정은영; 윤덕규
Original assignee: 서울과학기술대학교 산학협력단
Priority date: 2017-12-05
Filing date: 2017-12-05
Publication date: 2019-05-16

Abstract

Disclosed are an apparatus and a method for predicting a user's viewpoint using location information of a sound source in 360-degree VR content. According to a specific embodiment of the present invention, the user's viewpoint location is predicted using both a video media presentation description (MPD) and an audio MPD, so the prediction can be more accurate than when the viewpoint location is predicted using the existing video MPD alone. In addition, because audio from the real environment is reflected stereoscopically in the virtual reality and provided to the user, the user can perceive the same sense of direction, distance, and space in the virtual reality as in the real environment, so the virtual reality service can be provided realistically. Immersion in and interest in the virtual reality service can therefore be further improved.

Description

The present invention relates to an apparatus and method for predicting a user's viewpoint using sound source location information in 360-degree VR (Virtual Reality) content, and more particularly to a technique that compresses and codes the video and audio of the user's viewpoint using DASH (Dynamic Adaptive Streaming over Hyper-text transport protocol (HTTP)) segment tiles, thereby improving the accuracy of user viewpoint prediction and providing a realistic virtual reality service.

Recently, with the development of devices such as smartphones, public interest in VR (Virtual Reality) technology has been increasing. VR technology raises the fidelity with which a simulated entity is represented so as to bridge the gap between reality and a virtual system, and it has recently drawn attention as one of the technologies that can overcome the limitations of existing technologies.

Such 360 VR content is provided over a network by means of a DASH MPD. DASH is an adaptive bit-rate streaming technique that enables media data to be streamed over the Internet from web servers using HTTP (Hyper-Text Transport Protocol).

Here, the MPD assigns an adaptation set to each of the audio and video streams within a period and assigns a description set for each resolution within the adaptation set. Segments in units of seconds are separated for each assigned adaptation set and description set and then stored in the HTTP server (10).

Meanwhile, tiles are obtained by spatially dividing one frame of video. Each tile is then compression-coded with the High Efficiency Video Codec (HEVC) and transmitted at a resolution that may differ from tile to tile.

Accordingly, when the DASH client (20) parses the MPD (Media Presentation Description) provided from the HTTP engine and then requests the content, the HTTP server first provides a segment of the lowest resolution and thereafter provides segments adaptively according to the network conditions and parameters. When the network conditions are good, a high-quality segment is requested; when they are poor, a low-quality segment is requested.
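The adaptive behaviour described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Representation` class, the `choose_representation` function, the bitrate ladder, and the 0.8 safety factor are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """One quality level of a stream (hypothetical structure for illustration)."""
    rep_id: str
    bandwidth_bps: int  # bitrate needed to play this level smoothly


def choose_representation(reps, measured_bps=None, safety=0.8):
    """Pick the highest level that fits within the measured throughput.

    With no measurement yet (the first request), fall back to the lowest
    level, mirroring the start-up behaviour described in the text.
    The safety factor is an assumed margin, not a value from the source.
    """
    ordered = sorted(reps, key=lambda r: r.bandwidth_bps)
    if measured_bps is None:
        return ordered[0]
    budget = measured_bps * safety
    fitting = [r for r in ordered if r.bandwidth_bps <= budget]
    # If even the lowest level does not fit, still return the lowest one.
    return fitting[-1] if fitting else ordered[0]


# Illustrative bitrate ladder (made-up numbers).
LADDER = [
    Representation("low", 1_000_000),
    Representation("mid", 4_000_000),
    Representation("high", 12_000_000),
]
```

Calling `choose_representation(LADDER)` before any throughput has been measured selects the lowest level; later calls with a measured throughput move up or down the ladder.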

However, 360 VR content produced using such VR technology has run into the limitation that it consumes far more bandwidth than conventional 2D content.

Accordingly, as shown in FIG. 1, various methods have been developed that use the tiling of the High Efficiency Video Codec to predict the viewpoint containing the user's region of interest (ROI), transmit the tiles of the viewpoint in high quality, and transmit the remaining tiles in low quality, thereby reducing bandwidth. In these methods the user's viewpoint is predicted using images of moving objects. For example, when a car is moving, the tiles containing the moving car are transmitted in high quality and the remaining background tiles are transmitted in low quality.
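The bandwidth saving from viewpoint-based tiling can be illustrated with a small sketch. The grid size, the `(col, row)` tile indexing, and the per-tile bitrates are assumptions chosen for the example; the source does not specify them.

```python
def assign_tile_quality(cols, rows, viewpoint_tiles):
    """Return a rows x cols grid labelling each tile 'high' or 'low'.

    viewpoint_tiles is an iterable of (col, row) pairs predicted to lie
    inside the user's viewpoint (hypothetical indexing convention).
    """
    wanted = set(viewpoint_tiles)
    return [
        ["high" if (col, row) in wanted else "low" for col in range(cols)]
        for row in range(rows)
    ]


def estimate_bitrate(grid, high_bps, low_bps):
    """Sum the per-tile bitrates for a quality grid."""
    return sum(high_bps if q == "high" else low_bps for row in grid for q in row)
```

With a 4 x 2 grid, two viewpoint tiles, and assumed rates of 2 Mbit/s and 250 kbit/s per tile, the total is 5.5 Mbit/s instead of the 16 Mbit/s needed to send every tile in high quality.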

However, there has been no technique that reflects the sound source at the user's viewpoint in a virtual reality service so that the user can perceive the direction, distance, and spatial impression of the sound source.

An object of the present invention is to provide an apparatus and method for predicting a user's viewpoint using sound source location information in 360-degree VR content that can provide accurate user viewpoint prediction by adding stereoscopic audio, as well as realistic video, to the virtual reality service.

Another object of the present invention is to provide an apparatus and method for predicting a user's viewpoint using sound source location information in 360-degree VR content that can improve immersion in and interest in the virtual reality service by means of the stereophonic sound provided according to the present invention.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages of the present invention that are not mentioned can be understood from the following description and will become clearer from the embodiments of the present invention. It will also be readily apparent that the objects and advantages of the present invention can be realized by the means set out in the claims and combinations thereof.

To achieve the above objects, according to one aspect of the present invention, an apparatus for predicting a user's viewpoint using sound source location information in 360-degree VR content comprises:

a DASH client that, when the media type received through a network is a DASH MPD (Dynamic Adaptive Streaming over Hyper-text transport protocol Media Presentation Description), generates and delivers an MPD (Media Presentation Description) request command together with a URL (uniform resource locator) included in the DASH MPD; and an HTTP engine that collects DASH segments received through the network in accordance with the MPD request command, separates them in units of seconds, and delivers the separated DASH segments and the MPD to the DASH client.

Preferably, the MPD includes an audio MPD and a video MPD, and the audio MPD may include an SRD and an SLID.

Preferably, the SRD (Spatial Representation Description) includes source_id (a positive integer in decimal form that provides an identifier for the content source and temporarily defines a coordinate system), object_x (a positive integer in decimal form indicating the horizontal position of the top-left corner of the media asset in the coordinate system), object_y (a positive integer in decimal form indicating the vertical position of the top-left corner of the media asset in the coordinate system), object_width (a positive integer in decimal form indicating the width of the media asset in the coordinate system), object_height (a positive integer in decimal form indicating the height of the media asset in the coordinate system), total_width (a conditional positive integer in decimal form indicating the width extent of all media assets in the coordinate system), total_height (a conditional positive integer in decimal form indicating the height extent of all media assets in the coordinate system), and spatial_set_id (a conditional positive integer in decimal form providing an identifier for a group of media assets). The SLID (Sound Localization Information Description) may include sound_R (right phase information of the sound source in the 360 VR content), sound_L (left phase information of the sound source in the 360 VR content), sound_spatial_hori (horizontal angle information for visualizing the sound source in the 360 VR content and presenting it to the user), and sound_spatial_verti (vertical angle information for visualizing the sound source in the 360 VR content and presenting it to the user).
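The two descriptors can be pictured as simple records. The sketch below uses the field names given in the text; the Python types, the treatment of the three conditional SRD fields as optional, and the choice of degrees for the angles are assumptions, since the source does not state them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SRD:
    """Spatial descriptor of a media asset, using the field names from the text."""
    source_id: int
    object_x: int
    object_y: int
    object_width: int
    object_height: int
    # The text calls these three fields "conditional", so they are optional here.
    total_width: Optional[int] = None
    total_height: Optional[int] = None
    spatial_set_id: Optional[int] = None

    def __post_init__(self):
        # The text describes positive integers; zero is accepted here as an
        # assumption so that an asset can start at the coordinate origin.
        for name, value in vars(self).items():
            if value is not None and value < 0:
                raise ValueError(f"{name} must not be negative, got {value}")


@dataclass(frozen=True)
class SLID:
    """Sound localization descriptor; numeric types and units are assumed."""
    sound_R: float             # right phase information of the sound source
    sound_L: float             # left phase information of the sound source
    sound_spatial_hori: float  # horizontal angle (degrees assumed)
    sound_spatial_verti: float # vertical angle (degrees assumed)
```

For example, `SRD(0, 1920, 0, 1920, 1080, 3840, 2160)` would describe the right half of the top row of a 3840 x 2160 frame, with `spatial_set_id` left unset.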

Preferably, the DASH client may be configured to predict the user viewpoint location based on the received SRD and SLID and to request the tile at the predicted user viewpoint location from the HTTP engine, and the HTTP engine may be configured to deliver the user viewpoint tile to the DASH client in response to that request.
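The text does not say how the SRD and SLID values are turned into a predicted tile. One possible mapping, shown purely as a sketch, assumes an equirectangular frame split into a regular grid, a horizontal angle of 0 degrees at the frame centre with a range of [-180, 180), and a vertical angle of +90 degrees at the top edge. The function names `angle_to_tile` and `tile_rect` are hypothetical.

```python
def angle_to_tile(hori_deg, verti_deg, cols, rows):
    """Map a sound-source direction to a (col, row) tile index.

    Assumes an equirectangular layout: horizontal angle 0 is the frame
    centre, vertical angle +90 is the top edge.
    """
    # Wrap the horizontal angle into [-180, 180) and clamp the vertical one.
    h = (hori_deg + 180.0) % 360.0 - 180.0
    v = max(-90.0, min(90.0, verti_deg))
    u = (h + 180.0) / 360.0   # 0 at the left edge, approaching 1 at the right
    w = (90.0 - v) / 180.0    # 0 at the top edge, 1 at the bottom edge
    col = min(int(u * cols), cols - 1)
    row = min(int(w * rows), rows - 1)
    return col, row


def tile_rect(col, row, total_width, total_height, cols, rows):
    """Return (object_x, object_y, object_width, object_height) for a tile."""
    tile_w = total_width // cols
    tile_h = total_height // rows
    return col * tile_w, row * tile_h, tile_w, tile_h
```

The resulting rectangle uses the same quantities as the SRD fields, so the client could use it to decide which tile to request in high quality.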

According to another aspect of the present invention, a method for predicting a user's viewpoint using sound source location information in 360-degree VR content may comprise: a step in which a DASH client delivers an MPD request command to an HTTP engine together with a URL (uniform resource locator) received through a network; a step in which the HTTP engine receives media data including audio and video MPDs and DASH segments in response to the MPD request command, separates the DASH segments in units of seconds, and delivers them to the DASH client; a step in which the DASH client predicts the user viewpoint location based on the received video MPD and audio MPD and then requests the tile at the predicted user viewpoint location from the HTTP engine; and a step of processing the received DASH segments based on the user viewpoint location tile provided by the HTTP engine and displaying them on a screen. Here, the audio MPD includes an SRD and an SLID. The SRD includes source_id (a positive integer in decimal form that provides an identifier for the content source and temporarily defines a coordinate system), object_x (a positive integer in decimal form indicating the horizontal position of the top-left corner of the media asset in the coordinate system), object_y (a positive integer in decimal form indicating the vertical position of the top-left corner of the media asset in the coordinate system), object_width (a positive integer in decimal form indicating the width of the media asset in the coordinate system), object_height (a positive integer in decimal form indicating the height of the media asset in the coordinate system), total_width (a conditional positive integer in decimal form indicating the width extent of all media assets in the coordinate system), total_height (a conditional positive integer in decimal form indicating the height extent of all media assets in the coordinate system), and spatial_set_id (a conditional positive integer in decimal form providing an identifier for a group of media assets). The SLID may include sound_R (right phase information of the sound source in the 360 VR content), sound_L (left phase information of the sound source in the 360 VR content), sound_spatial_hori (horizontal angle information for visualizing the sound source in the 360 VR content and presenting it to the user), and sound_spatial_verti (vertical angle information for visualizing the sound source in the 360 VR content and presenting it to the user).
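The prediction step uses both the video MPD and the audio MPD, but the text does not describe how the two cues are combined. The sketch below shows one hypothetical way to do it: a weighted blend of a per-tile motion score with a bonus for the tile that contains the sound source. The function name, the score scale of 0 to 1, and the default weight are all assumptions.

```python
def fuse_tile_scores(motion_scores, sound_tile, audio_weight=0.5):
    """Blend video-based tile scores with a sound-source cue.

    motion_scores maps (col, row) -> score in [0, 1] from a video-based
    predictor (assumed to exist); sound_tile is the tile containing the
    sound source. Returns the tile with the highest blended score.
    """
    fused = {}
    for tile, score in motion_scores.items():
        bonus = 1.0 if tile == sound_tile else 0.0
        fused[tile] = (1.0 - audio_weight) * score + audio_weight * bonus
    # A sound source in a tile with no motion score still gets the audio bonus.
    fused.setdefault(sound_tile, audio_weight)
    return max(fused, key=fused.get)
```

Setting `audio_weight` to 0 reduces this to a purely video-based prediction, which makes the effect of the audio cue easy to compare.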

According to the present invention, the user's viewpoint location information is predicted using both the video MPD and the audio MPD, so the accuracy of viewpoint location prediction can be improved beyond that obtained by predicting the user's viewpoint location information using the existing video MPD alone.

According to the present invention, audio from the real environment is reflected stereoscopically in the virtual reality and provided to the user, so the user can perceive the sense of direction, distance, and space in the virtual reality as in the real environment. The virtual reality service can thus be provided realistically, and immersion in and interest in the virtual reality service can be further improved.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description given below, serve to further the understanding of the technical idea of the invention; the invention should therefore not be construed as limited to the matters shown in those drawings.
FIG. 1 is an exemplary view showing the resolution of the user viewpoint region and of the remaining region based on the tiles of conventional 360 VR content.
FIG. 2 is a diagram showing the configuration of a communication system to which an embodiment of the present invention is applied.
FIG. 3 is a diagram showing the configuration of an apparatus for predicting a user's viewpoint using sound source location information in 360-degree VR content according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in more detail with reference to the drawings.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in a variety of different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of the scope of the invention, which is defined only by the scope of the claims.

The terms used in this specification are briefly explained first, and the present invention is then described in detail.

The terms used in the present invention have been selected, as far as possible, from general terms currently in wide use, taking into account their functions in the present invention; however, they may vary according to the intention of those skilled in the art, precedents, the emergence of new technologies, and the like. In certain cases there are also terms arbitrarily selected by the applicant, in which case their meaning is described in detail in the corresponding part of the description. Therefore, the terms used in the present invention should be defined based on their meaning and the overall content of the present invention, not simply on their names.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, the term "unit" as used in the specification refers to software or to a hardware component such as an FPGA or ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium and may be configured to run on one or more processors.

Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functions provided within the components and "units" may be combined into a smaller number of components and "units" or further divided into additional components and "units".

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can easily carry them out. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings.

FIGS. 2 and 3 discussed below are merely examples and should not be construed as limiting the scope of the present disclosure in any way. Those skilled in the art will appreciate that the principles of the present disclosure may be implemented in any suitably configured wireless communication system.

FIG. 2 shows a communication system to which an embodiment of the present invention is applied. Referring to FIG. 2, the system (100) includes a heterogeneous network (102) that facilitates communication among the various components in the system (100). For example, the network (102) may carry Internet Protocol (IP) packets, frame relay frames, asynchronous transfer mode (ATM) cells, or other information between network addresses. The network (102) may also be a heterogeneous network that includes broadcasting networks such as cable and satellite communication links. The network (102) may include one or more local area networks (LANs); metropolitan area networks (MANs); wide area networks (WANs); all or part of a global network such as the Internet; or any other communication system or systems at one or more locations.

In various embodiments, the heterogeneous network (102) includes a broadcast network (102a) and a broadband network (102b). The broadcast network (102a) is generally for broadcasting media data to the client devices (106-115) in one direction, for example from one or more servers (104-105) toward the client devices (106-115). The broadcast network (102a) may include any number of broadcast links and devices, such as satellite, wireless, wired, and fiber-optic network links and devices.

The broadband network (102b) is generally for broadband access to media data by the client devices (106-115) in two directions, for example back and forth between one or more servers (104-105) and the client devices (106-115). The broadband network (102b) may include any number of broadband links and devices, such as Internet, wireless, wired, and fiber-optic network links and devices.

The network (102) facilitates communication between the servers (104-105) and the various client devices (106-115). Each of the servers (104-105) includes any suitable computing or processing device that can provide computing services to one or more client devices. Each of the servers (104-105) may include, for example, one or more processing devices, one or more memories storing instructions and data, and one or more network interfaces that facilitate communication over the network (102). For example, the servers (104-105) may include servers that broadcast media data over the broadcast network within the network (102) using HTTP. In another example, the servers (104-105) may include servers that broadcast media data over the broadcast network within the network (102) using DASH.

Each client device (106-115) represents any suitable computing or processing device that interacts with at least one server or other computing device(s) over the network (102). In this example, the client devices (106-115) may include a desktop computer (106), a mobile phone or smartphone (108), a PDA (personal digital assistant) (110), a laptop computer (112), a tablet computer (114), and a set-top box and/or television (115). However, any other or additional client devices may be used in the communication system (100).

In this example, some client devices (108-114) communicate with the network (102) indirectly. For example, the client devices (108-110) communicate via one or more base stations (116), such as cellular base stations or eNodeBs. The client devices (112-115) communicate via one or more wireless access points (118), such as IEEE 802.11 wireless access points. These are for illustration only; each client device may communicate with the network (102) directly or may communicate with it indirectly via any suitable intermediate device(s) or network(s). As described in more detail below, any or all of the client devices (106-115) may include a structure for receiving and presenting media data using HTTP and DASH.

The communication system (100) to which an embodiment of the present invention is applied may include any number of each component in any suitable configuration. In general, computing and communication systems come in a wide variety of configurations, and FIG. 2 does not limit the scope of the present disclosure to any particular configuration. Although FIG. 2 shows one operating environment in which the various features disclosed in this patent document can be used, those features may be used in any other suitable system.

FIG. 3 shows a client device (200) for providing 360 VR content using HTTP and DASH according to an embodiment of the present invention. The client device (200) may represent one or more of the client devices (106-115) shown in FIG. 2.

In the present invention, HTTP defines a new framework for delivering time-continuous multimedia such as audio and video, as well as other static content such as widgets and files. DASH is an adaptive bit-rate streaming technique that allows a receiving entity to stream media data provided from HTTP servers over the Internet.

Here, as shown in FIG. 3, the client device (200) may include an HTTP engine (210), a DASH client (220), and a 360 VR engine (230).

When the content type information received through the network (102b) is a DASH MPD, the DASH client (220) delivers the received media data to the MPD parser (221), and the MPD parser (221) receives and processes the DASH MPD files. Referring to the associated DASH MPD, the MPD parser (221) identifies, from the 'media sync' elements, the presentation time of the first access unit of the first segment in the DASH MPD file.

The MPD parser (221) then sends requests (for example, "GET" requests) to the HTTP engine (210) together with the URL (uniform resource locator) received through the network (102), and receives media data such as DASH segments in response.
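The parse-then-request flow can be sketched with the Python standard library. The sample manifest, the URLs, and the function names are made up for illustration, and the manifest uses the standard DASH `Representation` element; the request object is only built here, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# XML namespace used by DASH MPD documents.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# A tiny made-up manifest for illustration only.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet contentType="video">
      <Representation id="v-low" bandwidth="1000000"><BaseURL>video/low/</BaseURL></Representation>
      <Representation id="v-high" bandwidth="8000000"><BaseURL>video/high/</BaseURL></Representation>
    </AdaptationSet>
    <AdaptationSet contentType="audio">
      <Representation id="a-main" bandwidth="128000"><BaseURL>audio/main/</BaseURL></Representation>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_xml):
    """Return (content_type, id, bandwidth, base_url) for each representation."""
    root = ET.fromstring(mpd_xml)
    found = []
    for aset in root.findall("mpd:Period/mpd:AdaptationSet", NS):
        for rep in aset.findall("mpd:Representation", NS):
            base = rep.findtext("mpd:BaseURL", default="", namespaces=NS)
            found.append((aset.get("contentType"), rep.get("id"),
                          int(rep.get("bandwidth")), base))
    return found


def build_segment_request(mpd_url, base_url, segment_name):
    """Build (but do not send) an HTTP GET request for one segment."""
    url = urljoin(urljoin(mpd_url, base_url), segment_name)
    return urllib.request.Request(url, method="GET")
```

In a real client the request would be passed to `urllib.request.urlopen` or an equivalent HTTP layer, which plays the role the text assigns to the HTTP engine.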

Meanwhile, at the request of the DASH client (220), the HTTP engine (210) receives the audio and video MPDs and DASH segments provided from the servers (104, 105), separates the received DASH segments into DASH segments in units of seconds, and delivers them to the DASH client (220). Here, the MPD includes an audio MPD and a video MPD for the audio stream and the video stream, respectively, of the frames within one period. Each MPD includes an adaptation set and a description set contained in that adaptation set, and, according to the present invention, the description set of the audio MPD further includes SRD (Spatial Related Description) data and an SLID (Sound Location Information Description).

Here, the SRD (Spatial Related Description) and the SLID (Sound Location Information Description) are as shown in the following tables.

[Table 1]

[Table 2]

As shown in Table 1, the SRD includes: source_id (a positive integer in decimal form that provides an identifier for the content source and temporarily defines the coordinate system); object_x (a positive integer in decimal form indicating the horizontal position of the upper-left corner of the associated media asset in the coordinate system); object_y (a positive integer in decimal form indicating the vertical position of the upper-left corner of the associated media asset in the coordinate system); object_width (a positive integer in decimal form indicating the width of the associated media asset in the coordinate system); object_height (a positive integer in decimal form indicating the height of the associated media asset in the coordinate system); total_width (a conditional positive integer in decimal form indicating the width of the extent of all media assets in the coordinate system); total_height (a conditional positive integer in decimal form indicating the height of the extent of all media assets in the coordinate system); and spatial_set_id (a conditional positive integer in decimal form that provides an identifier for a group of media assets).
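The SRD fields listed above can be held in a small record type. The following Python sketch assumes the comma-separated value layout used by the MPEG-DASH SRD descriptor (the fields in the order given above, with the three conditional fields optional at the end); the class name `SRD` and the function `parse_srd` are illustrative names, not part of the patent. Zero is accepted for coordinates such as object_x, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SRD:
    source_id: int
    object_x: int
    object_y: int
    object_width: int
    object_height: int
    # Conditional fields: may be absent from the descriptor value.
    total_width: Optional[int] = None
    total_height: Optional[int] = None
    spatial_set_id: Optional[int] = None


def parse_srd(value: str) -> SRD:
    """Parse a comma-separated SRD value string into an SRD record."""
    parts = [int(p) for p in value.split(",")]
    if not 5 <= len(parts) <= 8:
        raise ValueError(f"expected 5 to 8 SRD fields, got {len(parts)}")
    if any(p < 0 for p in parts):
        raise ValueError("SRD fields must not be negative")
    return SRD(*parts)
```

For example, `parse_srd("0,960,0,960,540,1920,1080")` describes a 960x540 region whose upper-left corner is at (960, 0) inside a 1920x1080 coordinate system, with spatial_set_id left unset.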

Referring also to Table 2, the SLID includes: sound_R (right phase information of the sound source in the 360 VR content); sound_L (left phase information of the sound source in the 360 VR content); sound_spatial_hori (horizontal angle information for visualizing the sound source in the 360 VR content and presenting it to the user); and sound_spatial_verti (vertical angle information for visualizing the sound source in the 360 VR content and presenting it to the user).
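The four SLID fields can likewise be sketched as a record type. The text does not specify how the SLID is serialized or which units it uses, so the comma-separated layout, the use of floating-point values, and the interpretation of the angles as degrees are all assumptions of this Python sketch; `SLID` and `parse_slid` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLID:
    sound_R: float             # right phase information of the sound source
    sound_L: float             # left phase information of the sound source
    sound_spatial_hori: float  # horizontal angle (assumed to be in degrees)
    sound_spatial_verti: float # vertical angle (assumed to be in degrees)


def parse_slid(value: str) -> SLID:
    """Parse a hypothetical 'R,L,hori,verti' value string into an SLID."""
    parts = [float(p) for p in value.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 SLID fields, got {len(parts)}")
    return SLID(*parts)
```

Under these assumptions, `parse_slid("0.25,0.75,90,-10")` yields a source located 90 degrees horizontally and 10 degrees below the horizon.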

Provided with the audio MPD, which includes the SRD and the SLID, and the conventional video MPD, the DASH client 220 predicts the position of the user's viewpoint based on the audio MPD and the video MPD, and requests the predicted viewpoint tile from the HTTP engine 210.
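The text does not spell out how the viewpoint is computed from the two MPDs. Purely as an illustration, the following Python sketch shows one simple possibility: map the sound source angles to a tile of an equirectangular frame and express that tile as an SRD-style value string that could accompany a tile request. Everything here is an assumption of the sketch: the 4x2 tile grid, the 3840x1920 frame size, the angle conventions (horizontal angle 0 at the frame center and increasing to the right, vertical angle positive upward), the use of audio information only, and the function names `angle_to_tile` and `tile_srd_value`.

```python
def angle_to_tile(hori_deg: float, verti_deg: float,
                  cols: int = 4, rows: int = 2) -> tuple:
    """Return (col, row) of the tile containing the given sound direction."""
    # Wrap the horizontal angle so that -180 maps to the left edge of the frame.
    u = ((hori_deg + 180.0) % 360.0) / 360.0
    # Clamp the vertical angle; +90 (straight up) maps to the top edge.
    v = (90.0 - max(-90.0, min(90.0, verti_deg))) / 180.0
    col = min(int(u * cols), cols - 1)
    row = min(int(v * rows), rows - 1)
    return col, row


def tile_srd_value(col: int, row: int, cols: int = 4, rows: int = 2,
                   total_width: int = 3840, total_height: int = 1920,
                   source_id: int = 0) -> str:
    """Describe a tile as 'source_id,x,y,w,h,total_w,total_h'."""
    w = total_width // cols
    h = total_height // rows
    return f"{source_id},{col * w},{row * h},{w},{h},{total_width},{total_height}"
```

A client following this sketch would call `angle_to_tile` with sound_spatial_hori and sound_spatial_verti and then request the tile whose SRD matches `tile_srd_value`. A fuller predictor would also weigh the video MPD, as the text describes.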

In response, the HTTP engine 210 delivers the predicted user viewpoint tile to the DASH client 220. The DASH client 220 then processes the DASH segments based on the viewpoint tile received from the HTTP engine 210 and, when processing of the DASH segments is complete, passes the processed DASH segments to the 360 VR engine 230.

The 360 VR engine 230 receives the DASH segments and decodes them using the appropriate decoders. The 360 VR engine 230 passes the decoding result to the presentation engine 240, and the presentation engine 240 renders the media data and presents it to the user on a display. In non-limiting examples, the presentation engine 240 may overlay personalized advertisement information at a time and position synchronized with the display of the associated media, and/or may provide picture-in-picture (PIP) data of streamed broadband media content, located at a corner of the display and at a time synchronized with the associated portion of the displayed broadcast media data.

Accordingly, because the user's viewpoint position information is predicted using both the video MPD and the audio MPD, the viewpoint position can be predicted more accurately than when it is predicted using the conventional video MPD. In addition, because real-environment audio is reflected stereoscopically in the virtual reality and provided to the user, the user can perceive the same sense of direction, distance, and space in the virtual reality as in the real environment, so that a realistic virtual reality service can be provided. Immersion in and interest for the virtual reality service can thereby be further improved.

According to another aspect of the present invention, a method for predicting a user's viewpoint using sound source location information in 360-degree VR content may include: delivering, by a DASH client, an MPD request command to an HTTP engine together with a URL (uniform resource locator) received through a network; receiving, by the HTTP engine in response to the MPD request command, media data including audio and video MPDs and DASH segments, separating the DASH segments in units of seconds, and delivering them to the DASH client; predicting, by the DASH client, a user viewpoint position based on the received video MPD and audio MPD, and then requesting the predicted user viewpoint position tile from the HTTP engine; and processing the received DASH segments based on the user viewpoint position tile provided by the HTTP engine and displaying them on a screen. Here, the audio MPD includes an SRD and an SLID. The SRD includes source_id (a positive integer in decimal form that provides an identifier for the content source and temporarily defines the coordinate system), object_x (a positive integer in decimal form indicating the horizontal position of the upper-left corner of the associated media asset in the coordinate system), object_y (a positive integer in decimal form indicating the vertical position of the upper-left corner of the associated media asset in the coordinate system), object_width (a positive integer in decimal form indicating the width of the associated media asset in the coordinate system), object_height (a positive integer in decimal form indicating the height of the associated media asset in the coordinate system), total_width (a conditional positive integer in decimal form indicating the width of the extent of all media assets in the coordinate system), total_height (a conditional positive integer in decimal form indicating the height of the extent of all media assets in the coordinate system), and spatial_set_id (a conditional positive integer in decimal form that provides an identifier for a group of media assets). The SLID may include sound_R (right phase information of the sound source in the 360 VR content), sound_L (left phase information of the sound source in the 360 VR content), sound_spatial_hori (horizontal angle information for visualizing the sound source in the 360 VR content and presenting it to the user), and sound_spatial_verti (vertical angle information for visualizing the sound source in the 360 VR content and presenting it to the user). Each step of the above method is a function performed by the HTTP engine 210, the DASH client 220, and the 360 VR engine 230 described above, so a detailed description is omitted.

Thus, according to the present invention, because the user's viewpoint position information is predicted using both the video MPD and the audio MPD, the viewpoint position can be predicted more accurately than when it is predicted using the conventional video MPD. In addition, because real-environment audio is reflected stereoscopically in the virtual reality and provided to the user, the user can perceive the same sense of direction, distance, and space in the virtual reality as in the real environment, so that a realistic virtual reality service can be provided. Immersion in and interest for the virtual reality service can thereby be further improved.

Although the embodiments have been described above with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and by their equivalents.

The apparatus and method for predicting a user's viewpoint using sound source location information in 360-degree VR content predict the user's viewpoint position information using both the video MPD and the audio MPD, and can therefore predict the viewpoint position more accurately than when the conventional video MPD is used. Because real-environment audio is reflected stereoscopically in the virtual reality and provided to the user, the user can perceive the same sense of direction, distance, and space in the virtual reality as in the real environment, so that a realistic virtual reality service can be provided and immersion in and interest for the service can be further improved. The apparatus and method can bring significant progress in the accuracy and reliability of operation and, further, in performance efficiency. Since a system providing a virtual reality service has sufficient prospects for commercial sale or business use and can clearly be practiced in reality, the invention has industrial applicability.

Claims

A DASH client that, when the type of media received through a network is a DASH (Dynamic Adaptive Streaming over Hypertext Transport Protocol) MPD (Media Presentation Description), forwards an MPD request command together with a URL (uniform resource locator) included in the DASH MPD; and
an HTTP engine that collects DASH segments received through the network based on the MPD request command, separates the DASH segments in units of seconds, and delivers the separated DASH segments and the MPD to the DASH client,
wherein the MPD includes an audio MPD and a video MPD,
the audio MPD includes a Spatial Representation Description (SRD) and a Sound Localization Information Description (SLID), and
the SLID includes
sound_R (right phase information of the sound source in the 360 VR content), sound_L (left phase information of the sound source in the 360 VR content), sound_spatial_hori (horizontal angle information for visualizing the sound source in the 360 VR content and presenting it to the user), and sound_spatial_verti (vertical angle information for visualizing the sound source in the 360 VR content and presenting it to the user).

delete

The apparatus of claim 1, wherein the SRD (Spatial Representation Description) includes at least one of
source_id (a positive integer in decimal form that provides an identifier for the content source and temporarily defines a coordinate system), object_x (a positive integer in decimal form indicating the horizontal position of the upper-left corner of the associated media asset in the coordinate system), object_y (a positive integer in decimal form indicating the vertical position of the upper-left corner of the associated media asset in the coordinate system), object_width (a positive integer in decimal form indicating the width of the associated media asset in the coordinate system), object_height (a positive integer in decimal form indicating the height of the associated media asset in the coordinate system), total_width (a conditional positive integer in decimal form indicating the width of the extent of all media assets in the coordinate system), total_height (a conditional positive integer in decimal form indicating the height of the extent of all media assets in the coordinate system), and spatial_set_id (a conditional positive integer in decimal form that provides an identifier for a group of media assets).

delete

The apparatus of claim 1, wherein the DASH client
predicts the user's viewpoint position based on the received SRD and SLID and requests the predicted user viewpoint position tile from the HTTP engine, and
wherein the HTTP engine delivers the user viewpoint tile to the DASH client in response to the request for the predicted user viewpoint position tile.

Transmitting, by a DASH client, an MPD request command to an HTTP engine together with a uniform resource locator (URL) received through a network;
receiving, by the HTTP engine in response to the MPD request command, media data including audio and video MPDs and DASH segments, separating the DASH segments in units of seconds, and delivering them to the DASH client;
predicting, by the DASH client, a user viewpoint position based on the received video MPD and audio MPD, and then requesting the predicted user viewpoint position tile from the HTTP engine; and
processing the received DASH segments based on the user viewpoint position tile provided by the HTTP engine and displaying the processed DASH segments on a screen,
wherein the MPD includes an audio MPD and a video MPD, the audio MPD includes
a Spatial Representation Description (SRD) and a Sound Localization Information Description (SLID), and the SLID includes
sound_R (right phase information of the sound source in the 360 VR content), sound_L (left phase information of the sound source in the 360 VR content), sound_spatial_hori (horizontal angle information for visualizing the sound source in the 360 VR content and presenting it to the user), and sound_spatial_verti (vertical angle information for visualizing the sound source in the 360 VR content and presenting it to the user).

delete

8. The method of claim 7, wherein the SRD includes at least one of
source_id (a positive integer in decimal form that provides an identifier for the content source and temporarily defines a coordinate system), object_x (a positive integer in decimal form indicating the horizontal position of the upper-left corner of the associated media asset in the coordinate system), object_y (a positive integer in decimal form indicating the vertical position of the upper-left corner of the associated media asset in the coordinate system), object_width (a positive integer in decimal form indicating the width of the associated media asset in the coordinate system), object_height (a positive integer in decimal form indicating the height of the associated media asset in the coordinate system), total_width (a conditional positive integer in decimal form indicating the width of the extent of all media assets in the coordinate system), total_height (a conditional positive integer in decimal form indicating the height of the extent of all media assets in the coordinate system), and spatial_set_id (a conditional positive integer in decimal form that provides an identifier for a group of media assets).

delete