TWI653882B - Video device and encoding/decoding method for 3d objects thereof
- Publication number: TWI653882B
- Application number: TW106140711A
- Authority: TW (Taiwan)
Abstract
The present invention provides a video device that includes: a memory; a processor configured to obtain a three-dimensional object and encode a plurality of view images corresponding to a plurality of viewing angles of the three-dimensional object to generate a video stream; and a graphics processing unit configured to decode the video stream to obtain the plurality of view images of the three-dimensional object corresponding to the plurality of viewing angles and store them in the memory. The graphics processing unit determines a relative viewing angle between a user and the three-dimensional object according to sensing information from a display device worn by the user, and computes an output image corresponding to the relative viewing angle from the view images stored in the memory. The graphics processing unit further transmits the output image to the display device for playback.
Description
The present invention relates to display devices, and more particularly to a video device and a method of encoding and decoding three-dimensional objects for such a device.
In recent years, with advances in technology, virtual reality (VR) and augmented reality (AR) devices and related applications have become increasingly popular. In general, a virtual reality device includes two displays, each of which presents an image to the user's left eye or right eye, respectively, so that a 3D image is formed. Virtual reality video content may cover a 360-degree field of view (FOV) and have a frame rate of 60 Hz to 90 Hz. Moreover, such virtual reality video content usually contains three-dimensional objects, so the required file size is considerable. For a conventional virtual reality device, the design bottlenecks are therefore the transmission bandwidth and the storage space for the video files. In other words, because the video file produced by encoding the three-dimensional objects is too large, the transmission bandwidth of a conventional virtual reality device becomes insufficient and image quality must be sacrificed, resulting in a poor user experience.
Therefore, a video device and a three-dimensional object encoding/decoding method are needed to reduce the encoding overhead.
The present invention provides a video device that includes: a memory; a processor configured to obtain a three-dimensional object and encode a plurality of view images corresponding to a plurality of viewing angles of the three-dimensional object to generate a video stream; and a graphics processing unit configured to decode the video stream to obtain the plurality of view images of the three-dimensional object corresponding to the plurality of viewing angles and store them in the memory. The graphics processing unit determines a relative viewing angle between a user and the three-dimensional object according to sensing information from a display device worn by the user, and computes an output image corresponding to the relative viewing angle from the view images stored in the memory. The graphics processing unit further transmits the output image to the display device for playback.
In an embodiment of the invention, the plurality of viewing angles of the three-dimensional object are arranged equidistantly in both the vertical and horizontal directions.
In an embodiment of the invention, the processor arranges the plurality of view images of the three-dimensional object into a two-dimensional image and performs spatial encoding on it to generate the video stream, and the metadata of the video stream records the view image corresponding to each viewing angle.
In an embodiment of the invention, after the graphics processing unit obtains the plurality of view images, it further performs background removal and image cropping on the view images and then stores the processed view images in the memory.
In an embodiment of the invention, the graphics processing unit determines, from the plurality of view images, a first view image closest to the relative viewing angle, obtains one or more view images adjacent to it, and computes the output image from the first view image and the adjacent view images.
The present invention further provides a three-dimensional object encoding/decoding method for use in a video device, wherein the video device includes a processor and a graphics processing unit. The method includes: using the processor to obtain a three-dimensional object and encode a plurality of view images corresponding to a plurality of viewing angles of the three-dimensional object to generate a video stream; using the graphics processing unit to decode the video stream to obtain the plurality of view images of the three-dimensional object corresponding to the plurality of viewing angles; using the graphics processing unit to determine a relative viewing angle between a user and the three-dimensional object according to sensing information from a display device worn by the user; using the graphics processing unit to compute an output image corresponding to the relative viewing angle from the plurality of view images; and using the graphics processing unit to transmit the output image to the display device for playback.
10‧‧‧Video system
100‧‧‧Video device
110‧‧‧Processor
120‧‧‧Graphics processing unit
130‧‧‧Memory
140‧‧‧Storage device
150‧‧‧Transmission interface
160‧‧‧Transmission line
200‧‧‧Display device
210‧‧‧Processor
220‧‧‧Memory
230‧‧‧Transmission interface
240‧‧‧Left-eye display panel
241‧‧‧Left-eye lens
250‧‧‧Right-eye display panel
251‧‧‧Right-eye lens
260‧‧‧Sensing device
270‧‧‧Housing
20‧‧‧User
30‧‧‧Three-dimensional object
31‧‧‧Center point
32‧‧‧Point
300‧‧‧Spherical region
50A, 50B‧‧‧Output images
S410-S470‧‧‧Steps
FIG. 1 is a block diagram showing a video system in accordance with an embodiment of the present invention.
FIGS. 2A-2F are schematic diagrams showing view images of a three-dimensional object in accordance with an embodiment of the present invention.
FIGS. 3A and 3B are schematic diagrams showing the relative viewing angle between a user and a three-dimensional object in accordance with an embodiment of the present invention.
FIG. 4 is a flow chart of a three-dimensional object encoding/decoding method in accordance with an embodiment of the present invention.
To make the above objects, features, and advantages of the present invention more comprehensible, a preferred embodiment is described in detail below with reference to the accompanying drawings.
第1圖係顯示依據本發明一實施例中之視訊系統的方塊圖。在一實施例中,視訊系統10包括一視訊裝置100及一顯示裝置200,其中視訊裝置100及顯示裝置200係透過一傳輸線160連接。 1 is a block diagram showing a video system in accordance with an embodiment of the present invention. In one embodiment, the video system 10 includes a video device 100 and a display device 200. The video device 100 and the display device 200 are connected through a transmission line 160.
在一實施例中,視訊裝置100可為一運算裝置,例如是一個人電腦、智慧型手機、平板電腦等裝置,但本發明並不限於此。視訊裝置100係用以計算欲顯示的3D影像(例如包括三維物件)。然而計算而得的三維物件並無法直接進行顯示。因此,3D影像需編碼為一視訊流,並且經由繪圖處理繪製產生欲播放的輸出影像。其中輸出影像係經由傳輸線160傳送至顯示裝置200進行播放,使得使用者可透過顯示裝置200觀看到虛擬實境及3D場景。 In an embodiment, the video device 100 can be an computing device, such as a personal computer, a smart phone, a tablet, etc., but the invention is not limited thereto. The video device 100 is used to calculate a 3D image to be displayed (for example, including a three-dimensional object). However, the calculated three-dimensional object cannot be directly displayed. Therefore, the 3D image needs to be encoded into a video stream, and the output image to be played is generated by drawing processing. The output image is transmitted to the display device 200 via the transmission line 160 for playback, so that the user can view the virtual reality and the 3D scene through the display device 200.
視訊裝置100係包括一處理器110、一繪圖處理單元120、一記憶體130、一儲存裝置140、以及一傳輸介面150。處理器110例如是一中央處理器(central processing unit,CPU),且記憶體130例如是一靜態隨機存取記憶體(SRAM)或一動態隨機存取記憶體(DRAM)等等的揮發性記憶體。舉例來說,處理器110係用以計算一三維物件,並將該三維物件之複數個視角所相應的視角影像編碼為一視訊流(或視訊檔案)。 The video device 100 includes a processor 110, a graphics processing unit 120, a memory 130, a storage device 140, and a transmission interface 150. The processor 110 is, for example, a central processing unit (CPU), and the memory 130 is, for example, a volatile memory of a static random access memory (SRAM) or a dynamic random access memory (DRAM). body. For example, the processor 110 is configured to calculate a three-dimensional object and encode the corresponding view image of the three-dimensional object into a video stream (or video file).
The graphics processing unit 120 performs the corresponding rendering and video decoding according to instructions of the application executed by the processor 110 to generate the output images; the details are described later.
The storage device 140 stores the operating system, various applications, and display drivers executed on the device. The storage device 140 is, for example, a non-volatile memory such as a hard disk drive, a solid-state drive, or a flash memory, but the invention is not limited thereto. The processor 110 can load the operating system and applications stored in the storage device 140 into the memory 130 and execute them.
The transmission interface 150 supports high-speed multimedia data transmission and may be, for example, a multimedia transmission interface such as a High-Definition Multimedia Interface (HDMI) or a DisplayPort interface, but the invention is not limited thereto.
In one embodiment, the graphics processing unit 120 includes a graphics processing pipeline (not shown) that performs tessellation on vertex data and primitive data, and then sequentially performs vertex processing, geometry processing, and pixel processing on the tessellated data, where textures are applied during pixel processing, followed by pixel rendering.
In addition, the graphics processing unit 120 can also perform video decoding, for example decoding the video stream of the three-dimensional object generated by the processor 110 to obtain the view images corresponding to the plurality of viewing angles of the three-dimensional object.
The display device 200 may be, for example, a virtual reality display device such as a head-mounted display (HMD) for playing the video content from the video device 100.
In some embodiments, the display device 200 also has video decoding capability. For example, it can receive a video stream from the video device 100 through its transmission interface 230, decode the video stream into video data (for example, stereoscopic images or multi-view images), and play the video data on its left-eye display panel 240 and right-eye display panel 250. In other words, in this embodiment, the display device 200 can perform the video decoding processing that would otherwise be performed in the video device 100.
In one embodiment, the display device 200 includes a processor 210, a memory 220, a transmission interface 230, a left-eye display panel 240 and a corresponding left-eye lens 241, a right-eye display panel 250 and a corresponding right-eye lens 251, a sensing device 260, and a housing 270. The processor 210 may be, for example, a central processing unit (CPU) or a microprocessor, but the invention is not limited thereto. The memory 220 may be, for example, a volatile memory such as a static random access memory (SRAM) or a dynamic random access memory (DRAM), used to store temporary data during video decoding and to serve as an image buffer.
The transmission interface 230 supports high-speed multimedia data transmission and may be, for example, a multimedia transmission interface such as a High-Definition Multimedia Interface (HDMI) or a DisplayPort interface, but the invention is not limited thereto.
The left-eye display panel 240 and the right-eye display panel 250 respectively receive and play the video content from the processor 210, for example a left-eye image and a right-eye image. The left-eye display panel 240 and the right-eye display panel 250 may be, for example, liquid crystal display (LCD), light-emitting diode (LED), or organic light-emitting diode (OLED) panels, but the invention is not limited thereto. The left-eye display panel 240 and the right-eye display panel 250, together with the corresponding left-eye lens 241 and right-eye lens 251, project the left-eye image and the right-eye image to the user's left eye and right eye, respectively, so the user can see the corresponding stereoscopic image and experience virtual reality (VR).
The sensing device 260 detects the movement and orientation of the user's head and generates sensing data. The sensing data is reported to the processor 210, so the processor 210 can compute the field of view (FOV) of the user's eyes from the sensing data and determine, from this viewing angle, which view image of the three-dimensional object should be displayed. For example, the sensing device 260 may include a gyroscope and an accelerometer, but the invention is not limited thereto.
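As a rough illustration of how such sensing data might feed the later angle computation, the short sketch below (not part of the patent; the function name gaze_direction and the yaw/pitch convention are assumptions) converts a head orientation reported as yaw and pitch angles into a unit gaze-direction vector.

```python
import math

def gaze_direction(yaw_deg, pitch_deg):
    """Convert head yaw/pitch (degrees) from an IMU into a unit gaze vector.

    Assumes yaw is measured in the horizontal plane and pitch is the
    elevation above that plane, matching the (theta, phi) convention
    used later for the relative viewing angle.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    x = math.cos(pitch) * math.cos(yaw)
    y = math.cos(pitch) * math.sin(yaw)
    z = math.sin(pitch)
    return (x, y, z)
```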
The components 210-260 of the display device 200 are disposed in the housing 270, and the housing 270 may include a strap or other auxiliary parts (not shown) so that the user can wear the display device 200 on the head and watch the displayed images through it.
For example, the processor 210 can receive the video content from the video device 100 through the transmission interface 230 and play it on the left-eye display panel 240 and the right-eye display panel 250.
In the present invention, the way three-dimensional objects are encoded in the above video stream differs from conventional methods. More specifically, the three-dimensional objects rendered by the graphics processing unit 120 are composed of a plurality of polygons and textures. Conventional image/video compression methods are applied directly to the texture images and the object shapes. However, for virtual reality applications the required amount of computation is much higher than for ordinary video compression, mainly because in a virtual reality device the user's eyes are very close to the display panel, so the user perceives much more image detail. If image quality is sacrificed because of transmission bandwidth constraints, the user can easily notice it, resulting in a poor user experience.
Another limitation of conventional methods is usability. User-generated image content requires specific software and processing. Although existing images can be used to simplify the workflow, additional intermediate steps are still needed to produce a 3D model, and an image-rendering step is still required to produce the view images for virtual reality.
In the present invention, the video device 100 simplifies the process of rendering three-dimensional objects in virtual reality images, for example by replacing the original three-dimensional object with a plurality of view images (two-dimensional images) of the object at different viewing angles. It should be understood that at any given moment, a user viewing the three-dimensional object only sees one view image of it. As the relative viewing angle between the user and the three-dimensional object changes, the view image of the three-dimensional object seen by the user changes accordingly.
More specifically, the video device 100 acts, for example, as both the encoder and the decoder. When the processor 110 renders a three-dimensional object, it first encodes a plurality of pre-built view images of the object (for example, two-dimensional images in bitmap or JPEG format). For example, the view images may first be arranged into an integrated image on which spatial (intra-frame) encoding is performed, or each two-dimensional image may be encoded separately. Therefore, the video stream produced by encoding (which can be stored as a compressed video file) already contains the view images for each viewing angle of the three-dimensional object, and the information about each viewing angle and its corresponding view image is stored in the metadata or header information of the generated video stream. As a result, the graphics processing unit 120 in the video device 100 does not need to compute a three-dimensional model and then render that model into the output image, as conventional methods do.
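One possible realization of this packing step is sketched below. It is not the patent's reference implementation; the square grid layout, the JSON metadata schema, and names such as pack_views are assumptions made for illustration. It tiles the per-view images into one integrated image and records which tile belongs to which (theta, phi) pair, mirroring the metadata described above.

```python
import json
import math
import numpy as np  # assumed available; view images are HxWx3 uint8 arrays

def pack_views(views):
    """Tile view images into one atlas and record per-view placement.

    `views` maps (theta_deg, phi_deg) -> image array of identical shape.
    Returns (atlas, metadata_json); the metadata records, for every
    viewing angle, where its view image sits inside the atlas.
    """
    angles = sorted(views)
    h, w, c = views[angles[0]].shape
    cols = math.ceil(math.sqrt(len(angles)))
    rows = math.ceil(len(angles) / cols)
    atlas = np.zeros((rows * h, cols * w, c), dtype=np.uint8)
    entries = []
    for i, (theta, phi) in enumerate(angles):
        row, col = divmod(i, cols)
        atlas[row * h:(row + 1) * h, col * w:(col + 1) * w] = views[(theta, phi)]
        entries.append({"theta": theta, "phi": phi,
                        "x": col * w, "y": row * h, "w": w, "h": h})
    metadata = json.dumps({"tile_width": w, "tile_height": h, "views": entries})
    return atlas, metadata
```

The resulting atlas would then be handed to an ordinary intra-frame video encoder, with the JSON string carried as stream metadata or header information.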
More specifically, the size of the integrated image depends on the number of viewing angles of the three-dimensional object and on the resolution of the original view images. The more viewing angles there are, or the higher the resolution of the view images, the larger the integrated image becomes. However, for a small number of viewing angles, the video stream produced from the integrated image is much smaller than the video stream that would be produced by encoding the three-dimensional object itself.
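For a rough sense of scale, the snippet below computes the atlas dimensions for an assumed configuration (34 view images of 512x512 pixels, packed with the square grid assumed in the sketch above); the numbers are illustrative and do not come from the patent.

```python
import math

views, tile = 34, 512                 # assumed view count and tile resolution
cols = math.ceil(math.sqrt(views))    # 6 columns
rows = math.ceil(views / cols)        # 6 rows
print(cols * tile, rows * tile)       # 3072 3072 pixels before compression
```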
Following the above embodiment, the graphics processing unit 120 in the video device 100 acts, for example, as the decoder. It reads the metadata of the video stream generated by the processor 110 and can determine that the current video stream uses the encoding scheme in which a plurality of view images replace the three-dimensional object. The graphics processing unit 120 then decodes the video stream to obtain the plurality of view images of the three-dimensional object and stores them in the memory 130.
In some embodiments, after obtaining the view images of the three-dimensional object, the graphics processing unit 120 may further apply additional image processing to them, such as background removal or image cropping.
The graphics processing unit 120 then obtains the sensing data of the display device 200 from the sensing device 260, determines a relative viewing angle between the user and the three-dimensional object, and selects the corresponding view image from the plurality of view images according to the relative viewing angle as the output image. In some embodiments, the graphics processing unit 120 further renders the selected view image together with the corresponding background and textures to produce the output image (for example, including a left-eye image and a right-eye image).
In some embodiments, instead of obtaining the sensing data of the display device 200 from the sensing device 260, the graphics processing unit 120 performs a stereoscopic projection method to identify the direction of each viewing angle and compute the relative viewing angle; however, with this method the graphics processing unit 120 requires higher computing power to read the related image data.
It should be noted that when the processor 110 encodes the view images of the three-dimensional object, a reasonable number of viewing angles is chosen in order to keep the length of the resulting video stream small. When the graphics processing unit 120 selects a view image according to the relative viewing angle between the user and the three-dimensional object, the relative viewing angle does not necessarily coincide exactly with one of the encoded viewing angles. Therefore, taking memory bandwidth into account, the graphics processing unit 120 reads the view image closest to the relative viewing angle and one or more adjacent view images, and interpolates between them to produce the output image.
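A minimal sketch of this selection-and-interpolation step follows. It only illustrates the idea; the choice to blend the two nearest views linearly, and helper names such as angular_distance and blend_views, are assumptions rather than the patent's method.

```python
def angular_distance(a, b):
    """Distance between two (theta_deg, phi_deg) pairs, with theta wrap-around."""
    dtheta = min(abs(a[0] - b[0]), 360 - abs(a[0] - b[0]))
    dphi = abs(a[1] - b[1])
    return (dtheta ** 2 + dphi ** 2) ** 0.5

def blend_views(views, relative_angle):
    """Pick the two encoded views closest to `relative_angle` and blend them.

    `views` maps (theta_deg, phi_deg) -> decoded view image (float array).
    The blend weight is inversely proportional to angular distance, so the
    closer view dominates; with an exact match the nearest view is returned.
    """
    ranked = sorted(views, key=lambda ang: angular_distance(ang, relative_angle))
    first, second = ranked[0], ranked[1]
    d1 = angular_distance(first, relative_angle)
    d2 = angular_distance(second, relative_angle)
    if d1 == 0:
        return views[first]
    w1 = d2 / (d1 + d2)  # closer view gets the larger weight
    return w1 * views[first] + (1 - w1) * views[second]
```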
FIGS. 2A-2F are schematic diagrams showing view images of a three-dimensional object in accordance with an embodiment of the present invention.
FIGS. 2A-2F show the view images corresponding to different viewing angles of a three-dimensional object 280. For example, the three-dimensional object 280 is a cube whose six faces have different colors. FIG. 2A and FIG. 2B show the top view and the bottom view of the three-dimensional object 280, respectively.
FIGS. 2C-2F show oblique views of the three-dimensional object 280 from four different directions. For example, from different viewing angles diagonally above the three-dimensional object 280, the user sees different view images, as shown in FIGS. 2C-2F, and each of these view images is a two-dimensional image.
It should be noted that, for ease of explanation, only six view images are used in this embodiment to represent the different viewing angles of the three-dimensional object. Those skilled in the art will understand that the number of view images of a three-dimensional object can be adjusted according to the actual situation.
FIGS. 3A and 3B are schematic diagrams showing the relative viewing angle between a user and a three-dimensional object in accordance with an embodiment of the present invention.
As shown in FIG. 3A, the user 20 wears the display device 200 and his or her line of sight is aimed at the three-dimensional object 30. For example, a spherical region 300 is defined around the center point 31 of the three-dimensional object 30, and the point 32 is defined as the 0-degree angle in the horizontal plane. The angle between the user's line of sight and the horizontal plane through the center point 31 (that is, the elevation angle) is φ, and the horizontal angle is θ. Therefore, the relative viewing angle between the user and the three-dimensional object at this moment can be defined as (θ, φ). The user 20 then sees, through the display device 200, the output image 50A corresponding to the relative viewing angle (θ, φ).
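The sketch below shows one way such a (θ, φ) pair could be derived from the user's eye position and the object's center point; the function name relative_viewing_angle and the choice of z as the vertical axis are illustrative assumptions, not text from the patent.

```python
import math

def relative_viewing_angle(eye_pos, center):
    """Return (theta_deg, phi_deg) of the eye position relative to the object center.

    theta is the horizontal angle measured in the x-y plane from the
    reference direction (the patent's point 32), and phi is the elevation
    above that plane, assuming z is the vertical axis.
    """
    dx = eye_pos[0] - center[0]
    dy = eye_pos[1] - center[1]
    dz = eye_pos[2] - center[2]
    horizontal = math.hypot(dx, dy)
    theta = math.degrees(math.atan2(dy, dx)) % 360.0
    phi = math.degrees(math.atan2(dz, horizontal))
    return theta, phi
```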
As shown in FIG. 3B, when the head of the user 20 moves upward (for example, the user stands up) while the line of sight remains aimed at the three-dimensional object 30, the angle between the user's line of sight and the horizontal plane through the center point 31 (that is, the elevation angle) increases to φ', while the horizontal angle remains θ. Therefore, the relative viewing angle between the user and the three-dimensional object at this moment can be defined as (θ, φ'). The user 20 then sees, through the display device 200, the output image 50B corresponding to the relative viewing angle (θ, φ').
It should be noted that the output image 50A or 50B seen by the user 20 may be obtained by interpolating between the view image closest to the relative viewing angle and its adjacent view images.
Tables 1-1 and 1-2 show different combinations of horizontal angle and elevation angle in an embodiment of the present invention.
In the embodiments of Tables 1-1 and 1-2, a viewing angle of the three-dimensional object is set every 45 degrees along both the horizontal and vertical axes, with the horizontal angle θ ranging from 0 to 360 degrees and the elevation angle φ ranging from -90 to 90 degrees.
More specifically, an elevation angle φ of 90 or -90 degrees means that the user views the three-dimensional object from directly above or directly below. For the elevations the user looks from more often, the elevation angle can be set to 45, 0, and -45 degrees, and the horizontal angle can be set every 45 degrees. This yields the combinations of relative viewing angles in Tables 1-1 and 1-2: for example, the set C1 of relative viewing angles has a horizontal angle of 0 degrees, the set C2 has a horizontal angle of 45 degrees, and so on. The information about these relative viewing angles is recorded in the metadata of the video stream or image file when the corresponding view images are encoded by the processor 110.
In some embodiments, different numbers of view images can be set for different viewing angles. For example, the views from directly above and directly below the three-dimensional object are rarely seen by the user, whereas the oblique views from above and below (for example at elevation angles of 45 and -45 degrees) are seen much more often. Accordingly, only one view image each needs to be set for the directly-above and directly-below views, while the oblique views above and below can be divided into more viewing angles; for example, at an elevation angle of 45 degrees the horizontal angle can be divided into 12 angles, one every 30 degrees. Similarly, at an elevation angle of -45 degrees the horizontal angle can also be divided into 12 angles, one every 30 degrees.
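To make this non-uniform sampling concrete, the sketch below generates such a grid of viewing angles; the per-elevation counts and the function name view_angle_grid are assumptions chosen to match the example above, not a specification from the patent.

```python
def view_angle_grid(horizontal_counts=None):
    """Build a list of (theta_deg, phi_deg) viewing angles.

    `horizontal_counts` maps an elevation angle to how many equally spaced
    horizontal angles are sampled at that elevation; a single view is used
    at the +/-90 degree poles. Defaults follow the example in the text.
    """
    if horizontal_counts is None:
        horizontal_counts = {90: 1, 45: 12, 0: 8, -45: 12, -90: 1}
    angles = []
    for phi, count in horizontal_counts.items():
        step = 360.0 / count
        for i in range(count):
            angles.append((i * step, float(phi)))
    return angles

# Example: 1 + 12 + 8 + 12 + 1 = 34 view images in total.
print(len(view_angle_grid()))
```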
In some embodiments, the three-dimensional object may have a specific standard shape. For example, a cube can be represented by view images from 6 viewing angles, and an icosahedron can be represented by view images from 20 viewing angles, where the viewing angles are spaced equidistantly in the vertical direction and equidistantly in the horizontal direction.
FIG. 4 is a flow chart of a three-dimensional object encoding/decoding method in accordance with an embodiment of the present invention.
In step S410, the processor 110 obtains a three-dimensional object. For example, the three-dimensional object may be a three-dimensional model computed by the video device 100 or an actual three-dimensional object.
In step S420, the processor 110 creates a plurality of view images of the three-dimensional object, where each view image corresponds to a viewing angle from which the three-dimensional object is observed.
In step S430, the processor 110 encodes the plurality of view images to generate a video stream, where the metadata of the video stream records the viewing angle corresponding to each view image.
In step S440, the graphics processing unit 120 decodes the video stream to obtain the plurality of view images. For example, the graphics processing unit 120 decodes the video stream according to its metadata to obtain the view images. In addition, the graphics processing unit 120 may optionally perform image processing on the view images, such as background removal or image cropping, and store the processed view images in the memory 130.
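As a counterpart to the packing sketch given earlier, the following sketch shows how a decoder might use the stream metadata to slice the decoded integrated image back into per-angle view images; it reads the JSON schema assumed for pack_views and is likewise only an illustration.

```python
import json

def unpack_views(atlas, metadata):
    """Recover the per-view images from a decoded atlas frame.

    `metadata` is the JSON string produced by the assumed pack_views()
    sketch; the result maps (theta_deg, phi_deg) back to its image tile.
    """
    info = json.loads(metadata)
    views = {}
    for entry in info["views"]:
        x, y, w, h = entry["x"], entry["y"], entry["w"], entry["h"]
        views[(entry["theta"], entry["phi"])] = atlas[y:y + h, x:x + w]
    return views
```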
In step S450, the graphics processing unit 120 computes a relative viewing angle between a user and the three-dimensional object. For example, the graphics processing unit 120 may compute the relative viewing angle from the sensing information provided by the sensing device 260, or it may compute the relative viewing angle using a stereoscopic projection method. In some embodiments, the computation of the relative viewing angle may be performed by the processor 110, which then notifies the graphics processing unit 120 of the relative viewing angle.
In step S460, the graphics processing unit 120 determines, from the plurality of view images, a first view image corresponding to the relative viewing angle.
In step S470, the graphics processing unit 120 generates an output image according to the first view image and transmits the output image to the display device 200 for playback. For example, the output image may include a left-eye image and a right-eye image, which are played on the left-eye display panel 240 and the right-eye display panel 250, respectively, and focused onto the user's left eye and right eye through the left-eye lens 241 and the right-eye lens 251, so that the user can view a virtual reality image or stereoscopic image through the display device 200.
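Putting the earlier sketches together, the pseudo-pipeline below shows how steps S440 through S470 could fit end to end; the codec, sensor, and hmd objects are placeholders for whatever decoder, sensing interface, and display transport the system actually uses, and are purely assumptions.

```python
def playback_loop(stream, metadata, sensor, codec, hmd):
    """Illustrative S440-S470 loop built from the sketches above (all assumed APIs).

    codec.decode(stream) -> decoded atlas frame (placeholder)
    sensor.read()        -> (eye_position, object_center) (placeholder)
    hmd.show(image)      -> send the output image to the display (placeholder)
    """
    atlas = codec.decode(stream)              # S440: decode the video stream
    views = unpack_views(atlas, metadata)     # S440: recover per-angle view images
    while True:
        eye_pos, center = sensor.read()       # S450: sensing information
        rel = relative_viewing_angle(eye_pos, center)
        output = blend_views(views, rel)      # S460: nearest view plus neighbors
        hmd.show(output)                      # S470: play on the display device
```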
In summary, the present invention provides a video system and a three-dimensional object encoding/decoding method in which a three-dimensional object in virtual reality is represented by its view images at different viewing angles. On the encoding side, the view images of the three-dimensional object are encoded instead of the three-dimensional object itself, which greatly reduces the size of the resulting video stream and saves the memory bandwidth required for computation on both the encoding and decoding sides. In addition, the decoding side only needs to decode the video stream according to its metadata to obtain the view images of the three-dimensional object, and can then select the view image corresponding to the relative viewing angle between the user and the three-dimensional object as the output image. Therefore, for a virtual reality device, the computation required on both the encoding and decoding sides is greatly reduced, and the memory bandwidth required for that computation is also saved.
Although the present invention has been disclosed above by way of a preferred embodiment, it is not intended to limit the scope of the invention. Those of ordinary skill in the art may make various changes and modifications without departing from the spirit and scope of the invention. The scope of protection of the present invention is therefore defined by the appended claims.
Claims (8)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW106140711A (TWI653882B) | 2017-11-23 | 2017-11-23 | Video device and encoding/decoding method for 3d objects thereof |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TWI653882B | 2019-03-11 |
| TW201926996A | 2019-07-01 |
Family ID: 66590812