US20210227236A1 - Scalability of multi-directional video streaming - Google Patents
- Publication number
- US20210227236A1 (Application US 17/221,299)
- Authority
- US
- United States
- Prior art keywords
- version
- projection
- projection format
- image
- enhancement layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
- H04N19/29—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving scalability at the object level, e.g. video object layer [VOL]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/167—Position within a video image, e.g. region of interest [ROI]
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G5/00—Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
- G09G5/14—Display of multiple viewports
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44004—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving video buffer management, e.g. video decoder buffer or video display buffer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/162—User input
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
Definitions
- The present disclosure relates to coding techniques for multi-directional imaging applications.
- Some modern imaging applications capture image data from multiple directions about a camera. Some cameras pivot during image capture, which allows a camera to capture image data across an angular sweep that expands the camera's effective field of view. Some other cameras have multiple imaging systems that capture image data in several different fields of view. In either case, an aggregate image may be created that merges image data captured from these multiple views.
- A variety of rendering applications are available for multi-directional content.
- One rendering application involves extraction and display of a subset of the content contained in a multi-directional image.
- A viewer may employ a head mounted display and change the orientation of the display to identify a portion of the multi-directional image in which the viewer is interested.
- A viewer may employ a stationary display and identify a portion of the multi-directional image in which the viewer is interested through user interface controls.
- A display device extracts a portion of image content from the multi-directional image (called a "viewport" for convenience) and displays it. The display device would not display other portions of the multi-directional image that are outside the area occupied by the viewport.
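The viewport extraction step described above can be sketched as follows. This is an illustrative sketch only; the function name, coordinate convention, and toy frame are assumptions for illustration and are not part of the patent disclosure.

```python
# Hypothetical sketch: extract a rectangular viewport from a decoded
# multi-directional frame stored as a list of pixel rows. Only the viewport
# region is returned; content outside the viewport is not displayed.

def extract_viewport(frame, offset_x, offset_y, width, height):
    """Return the sub-rectangle of `frame` occupied by the viewport."""
    return [row[offset_x:offset_x + width]
            for row in frame[offset_y:offset_y + height]]

# Toy 4x8 "frame" of pixel values (pixel value 10*row + column)
frame = [[10 * r + c for c in range(8)] for r in range(4)]
viewport = extract_viewport(frame, offset_x=2, offset_y=1, width=3, height=2)
# viewport is the 2x3 region starting at (x=2, y=1)
```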
- FIG. 1 illustrates a system according to an aspect of the present disclosure.
- FIG. 2 figuratively illustrates a rendering application for a sink terminal according to an aspect of the present disclosure.
- FIG. 3 illustrates an exemplary partitioning scheme in which a frame is partitioned into non-overlapping tiles.
- FIG. 4 illustrates a coded data stream that may be developed from coding of a single tile 410 , according to an aspect of the present disclosure.
- FIG. 5 illustrates a method according to an aspect of the present disclosure.
- FIG. 6 illustrates a method according to an aspect of the present disclosure.
- FIG. 7 illustrates example data flows of FIG. 6 .
- FIG. 8 illustrates a frame of omnidirectional video that may be coded by a source terminal.
- FIG. 9 illustrates a frame of omnidirectional video that may be coded by a source terminal.
- FIG. 10 is a simplified block diagram of an example video distribution system.
- FIG. 11 illustrates a frame 1100 of multi-directional video with a moving viewport.
- FIG. 12 is a functional block diagram of a coding system according to an aspect of the present disclosure.
- FIG. 13 is a functional block diagram of a decoding system according to an aspect of the present disclosure.
- FIG. 14 illustrates an exemplary multi-directional image projection format according to one aspect.
- FIG. 15 illustrates an exemplary multi-directional image projection format according to another aspect.
- FIG. 16 illustrates another exemplary multi-directional projection image format 1630 .
- FIG. 17 illustrates an exemplary prediction reference pattern.
- FIG. 18 illustrates two exemplary multi-directional projections for combining.
- FIG. 19 illustrates an exemplary system for creating a residual from two different multi-directional projections.
- Aggregate source image data at a transmitter exceeds the data that is needed to display a rendering of a viewport at a receiver.
- Coding techniques for transmitting source data may account for a current viewport of the receiving rendering device.
- However, these coding techniques incur coding and transmission latency and coding inefficiency.
- In an aspect, first streams of coded video data are received from a source.
- The first streams include coded data for each of a plurality of tiles representing a multi-directional video, where each tile corresponds to a predetermined spatial region of the multi-directional video, and at least one tile of the plurality of tiles in the first streams contains a current viewport location at a receiver.
- The techniques include decoding the first streams corresponding to the at least one tile containing the current viewport location, and displaying the decoded content for the current viewport location.
- When the viewport location at the receiver changes to include a new tile of the plurality of tiles, the techniques include retrieving first streams for the new tile, decoding the retrieved first streams, displaying the decoded content for the changed viewport location, and transmitting information representing the changed viewport location to the source.
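The receiver-side steps above (decode and display streams for the tile containing the viewport; when the viewport moves to a new tile, retrieve that tile's streams and report the change to the source) can be sketched as a minimal loop. All names, and the simplification to a single viewport tile, are assumptions for illustration.

```python
# Illustrative sketch of one iteration of the receive loop described above.
# `fetched` records retrieval requests; `reports` records viewport-change
# messages sent back to the source.

def receiver_step(current_tiles, viewport_tile, fetched, reports):
    """Return the set of tiles to decode/display for this iteration."""
    if viewport_tile not in current_tiles:
        fetched.append(viewport_tile)   # retrieve first streams for the new tile
        reports.append(viewport_tile)   # transmit changed viewport to the source
        current_tiles = {viewport_tile}
    return current_tiles

fetched, reports = [], []
tiles = {5}                                          # viewport starts in tile 5
tiles = receiver_step(tiles, 5, fetched, reports)    # viewport unchanged
tiles = receiver_step(tiles, 6, fetched, reports)    # viewport moves to tile 6
```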
- FIG. 1 illustrates a system 100 according to an aspect of the present disclosure.
- The system 100 is shown as including a source terminal 110 and a sink terminal 120 interconnected by a network 130.
- The source terminal 110 may transmit a coded representation of omnidirectional video to the sink terminal 120.
- The sink terminal 120 may receive the coded video, decode it, and display a selected portion of the decoded video.
- FIG. 1 illustrates the source terminal 110 as a multi-directional camera that captures image data of a local environment before coding it.
- Alternatively, the source terminal 110 may receive omni-directional video from an external source (not shown), such as a streaming service or storage device.
- The sink terminal 120 may determine a viewport location in a three-dimensional space represented by the multi-directional image.
- The sink terminal 120 may select a portion of decoded video to be displayed, for example, based on the terminal's orientation in free space.
- FIG. 1 illustrates the sink terminal 120 as a head mounted display but, in other aspects, the sink terminal 120 may be another type of display device, such as a stationary flat panel display, smartphone, tablet computer, gaming device or portable media player. Different types of user controls may be provided with each such display type through which a viewer identifies the viewport.
- The sink terminal's device type is immaterial to the present discussion unless otherwise noted herein.
- The network 130 represents any number of computer and/or communication networks that extend from the source terminal 110 to the sink terminal 120.
- The network 130 may include one or a combination of circuit-switched and/or packet-switched communication networks.
- The network 130 may communicate data between the source terminal 110 and the sink terminal 120 by any number of wireline and/or wireless communication media.
- The architecture and operation of the network 130 are immaterial to the present discussion unless otherwise noted herein.
- FIG. 1 illustrates a communication configuration in which coded video data is transmitted in a single direction from the source terminal 110 to the sink terminal 120.
- Aspects of the present disclosure find application with communication equipment that exchanges coded video data in a bidirectional fashion, from terminal 110 to terminal 120 and also from terminal 120 to terminal 110.
- The principles of the present disclosure find application with both unidirectional and bidirectional exchange of video.
- FIG. 2 figuratively illustrates a rendering application for a sink terminal 200 according to an aspect of the present disclosure.
- Omnidirectional video is represented as if it exists along a spherical surface 210 provided about the sink terminal 200.
- The terminal 200 may select a portion of the video (called a "viewport" for convenience) and display the selected portion.
- Over time, the terminal 200 may select different portions from the video.
- FIG. 2 illustrates the viewport changing from a first location 230 to a second location 240 along the surface 210 .
- The source terminal 110 may code video data according to an ITU-T/ISO MPEG coding protocol such as H.265 (HEVC), H.264 (AVC), or the upcoming H.266 (VVC) standard, an AOM coding protocol such as AV1, or a predecessor coding protocol.
- Such protocols parse individual frames of video into spatial arrays, called "pixel blocks" herein, and may code the pixel blocks in a regular coding order such as a raster scan order.
- FIG. 3 illustrates an exemplary partitioning scheme in which a frame 300 is partitioned into non-overlapping tiles 310.0-310.11.
- Where the frame 300 represents omnidirectional content (e.g., image content in a perfect 360° field of view), the image content will be continuous across the opposing left and right edges 320, 322 of the frame 300.
- The tiles described here may be a special case of the tiles used in some coding standards, such as HEVC.
- The tiles used herein may be "motion constrained tile sets," where all frames are segmented using the exact same tile partitioning, and each tile in every frame is only permitted to use prediction from co-located tiles in other frames. Filtering in the decoder loop may also be disallowed across tiles, providing decoding independence between tiles.
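Given a uniform grid of non-overlapping tiles like the one in FIG. 3, a receiver must determine which tile indices a rectangular viewport touches. The helper below is a sketch under assumed conventions (raster-order tile numbering, a uniform grid, pixel-unit viewport coordinates); none of these names appear in the patent.

```python
# Hypothetical helper: map a rectangular viewport to the raster-order indices
# of the non-overlapping tiles it overlaps, for a frame split into a uniform
# cols x rows grid of tiles.

def tiles_touched(vp_x, vp_y, vp_w, vp_h, frame_w, frame_h, cols, rows):
    tile_w, tile_h = frame_w // cols, frame_h // rows
    first_col, last_col = vp_x // tile_w, (vp_x + vp_w - 1) // tile_w
    first_row, last_row = vp_y // tile_h, (vp_y + vp_h - 1) // tile_h
    return sorted(r * cols + c
                  for r in range(first_row, last_row + 1)
                  for c in range(first_col, last_col + 1))

# A 1200x600 frame split into 4 columns x 3 rows of tiles; this viewport
# straddles the boundary between the first two tiles of the top row.
ids = tiles_touched(vp_x=250, vp_y=50, vp_w=200, vp_h=100,
                    frame_w=1200, frame_h=600, cols=4, rows=3)
```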
- FIG. 4 illustrates a coded data stream that may be developed from coding of a single tile 410 , according to an aspect of the present disclosure.
- The coded tile 410 may be coded in several representations 420-450, labeled "tier 0," "tier 1," "tier 2," and "tier 3" respectively, each corresponding to a predetermined bandwidth constraint.
- A tier 0 coding may be generated for a 500 kbps representation,
- a tier 1 coding may be generated for a 2 Mbps representation,
- a tier 2 coding may be generated for a 4 Mbps representation, and
- a tier 3 coding may be generated for an 8 Mbps representation.
- The number of tiers and the selection of target bandwidth may be tuned to suit individual application needs.
- The coded tile 410 also may contain a number of differential codings 460-480, each coded differentially with respect to the coded data of the tier 0 representation and each having a bandwidth tied to the bandwidth of another bandwidth tier.
- The differential codings 470, 480 may have data rates that match the differences between the data rates of their base tiers 440, 450 and the data rate of the tier 0 coding 420.
- Elements of the differential codings 460, 470, 480 may be coded predictively using content from a corresponding chunk of the tier 0 coding as a prediction reference; in such an embodiment, the differential codings 460, 470, 480 may be generated as enhancement layers according to a scalable coding protocol in which tier 0 serves as a base layer for those encodings.
- The codings 420-480 of the tile are shown as partitioned into individual chunks (e.g., chunks 420.1-420.N for tier 0 420, chunks 430.1-430.N for tier 1 430, etc.). Each chunk may be referenced by its own network identifier.
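The tier/chunk structure above might be indexed as sketched below. The identifier scheme is invented for illustration; the rate table echoes the example tiers above, and the differential rates follow the stated rule that a differential coding's rate matches the difference between its base tier's rate and the tier 0 rate.

```python
# Sketch: address each chunk of each tier of a tile by its own network
# identifier (the path scheme here is an assumption, not from the patent).

def chunk_id(tile, tier, chunk, differential=False):
    kind = "diff" if differential else "tier"
    return f"/tiles/{tile}/{kind}{tier}/chunk{chunk}.bin"

# Example tier rates from the text (kbps), and the implied differential rates:
tier_kbps = {0: 500, 1: 2000, 2: 4000, 3: 8000}
diff_kbps = {t: tier_kbps[t] - tier_kbps[0] for t in (1, 2, 3)}
```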
- FIG. 5 illustrates a method 500 according to an aspect of the present disclosure.
- The source terminal 110 may transmit high quality coding for tiles included in a current viewport (msg. 510) and low quality coding for other tiles (msg. 520) to the sink terminal 120.
- The sink terminal 120 may then decode and render data of the current viewport (box 530). If the viewport does not move to include different tiles (box 540), terminal 120 repeats decoding and rendering the current tiles (back to box 530). Alternately, if the viewport moves such that the tiles included in the viewport change, then the change in the viewport is reported back to the source terminal 110 (msg. 550).
- The source terminal 110 then repeats by sending high quality coding for the tiles of the new viewport location (back to msg. 510), and low quality coding for tiles that do not include the new viewport location (msg. 520).
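The source-side selection in method 500 reduces to a per-tile quality decision: a high tier for viewport tiles, a low tier for the rest. The sketch below is an illustrative assumption (tier numbers and names are not from the patent).

```python
# Hedged sketch of the source-side selection: assign a high-quality tier to
# tiles inside the current viewport and a low-quality tier to all other tiles.

def select_tiers(all_tiles, viewport_tiles, high_tier=3, low_tier=0):
    return {tile: (high_tier if tile in viewport_tiles else low_tier)
            for tile in all_tiles}

# 12 tiles as in FIG. 3; the viewport currently spans tiles 5 and 6.
plan = select_tiers(all_tiles=range(12), viewport_tiles={5, 6})
```

When the sink reports a viewport change (msg. 550), the source simply recomputes this plan with the new viewport tiles before sending the next chunks.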
- The operations illustrated in FIG. 5 are expected to provide low latency rendering of new viewports of multi-directional video in the presence of communication latencies between a source terminal 110 and a sink terminal 120.
- A sink terminal 120 may buffer the data locally. If/when a viewport changes to a spatial location that coincides with one of the formerly non-viewed viewports, the locally-buffered video may be decoded and displayed. The decoding and display can occur without incurring the latencies involved with round-trip communication from the sink terminal 120 to the source terminal 110, which would be needed if data of the non-viewed viewport(s) were not prefetched to the sink device 120.
- A sink terminal 120 may identify a location of the current viewport by identifying a spatial location within the multiview image at which the viewport is located, for example, by identifying its location within a coordinate space defined for the image (see FIG. 2).
- A sink terminal 120 may identify the tile(s) of a multi-directional image ( FIG. 3 ) in which its current viewport is located and request chunk(s) from the appropriate tiers ( FIG. 4 ) based on this identification.
- FIG. 6 illustrates a method 600 of exemplary tile download according to an aspect of the present disclosure.
- FIG. 6 illustrates download operations that may occur for a tile that is not being viewed initially but to which the viewport moves during operation.
- A sink terminal 120 may issue requests for the tile at a tier 0 level of service, which are downloaded to the terminal 120 from a source terminal 110.
- FIG. 6 illustrates a request 610 for a chunk Y of the tile, from the tier 0 level of service.
- The terminal 110 may provide content of the chunk Y in a response message 630.
- The request and response messages 610, 630 for the chunk Y may be interleaved with other requests and responses exchanged by the source and sink terminals 110 (shown in phantom), 120 relating to chunks of other tiles, including both the tile in which the viewport is located and other tiles that are not being viewed.
- At some point, the viewport changes (box 620) from a prior tile to the tile that was requested in msg. 610.
- The viewport may change either while a request (msg. 610) for chunk Y is pending or after the content of chunk Y has been received (msg. 630).
- The example of FIG. 6 illustrates the viewport change (box 620) as occurring while msg. 610 is pending.
- The terminal 120 may determine, from a history of prior requests, that a chunk Y at a tier 0 service level either has been requested or already has been received and is stored locally at the terminal 120.
- The terminal 120 may estimate whether there is time to request additional data of chunk Y (a differential tier) before the chunk Y must be rendered. If so, the terminal 120 may issue a request for chunk Y of the new tile using a differential tier (msg. 640).
- If the differential tier data is received in time (msg. 650), the sink terminal 120 may render chunk Y (box 660) using content developed from the content provided in messages 630 and 650. If not, the sink terminal 120 may render chunk Y (box 660) using content developed from the tier 0 level of service (msg. 630).
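The "is there time?" estimate above can be sketched as a simple deadline check. All timing quantities, names, and the bandwidth model below are assumptions for illustration, not values from the patent.

```python
# Illustrative sketch of the decision before msg. 640: request the
# differential tier for chunk Y only if its expected download (round-trip
# latency plus transfer time) completes before the render deadline.

def should_fetch_differential(now_ms, render_deadline_ms, rtt_ms,
                              chunk_bits, bandwidth_bps):
    download_ms = rtt_ms + 1000.0 * chunk_bits / bandwidth_bps
    return now_ms + download_ms <= render_deadline_ms

# A 1.5 Mbit differential-tier chunk over a 10 Mbps link with 50 ms RTT
# takes about 200 ms, so it fits a 400 ms deadline but not a 100 ms one.
ok = should_fetch_differential(now_ms=0, render_deadline_ms=400, rtt_ms=50,
                               chunk_bits=1_500_000, bandwidth_bps=10_000_000)
```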
- FIG. 7 illustrates a rendering timeline of chunks that may occur according to the foregoing aspects of the present disclosure.
- FIG. 7 includes a data stream for a prior tile 710, for example, the tile of the viewport location prior to the change of the viewport location in box 620 of FIG. 6.
- FIG. 7 also includes a data stream for a new tile 720, for example, the tile that includes the new viewport location after box 620 of FIG. 6.
- Data for the prior tile 710 includes chunks Y−3 to Y+1;
- data for the new tile 720 includes chunks Y−3 to Y+4.
- Chunks Y−3 to Y−1 for the prior tile are shown having been retrieved at a relatively high level of service or quality (shown as tier 3) and, prior to a viewport switch, being rendered.
- If a viewport switch occurs from the prior tile 710 to the new tile 720 in the midst of chunk Y−1, a tier 0 level of service may be rendered for tile 720 at chunk Y−1. This may occur, for example, if a sink device 120 estimates that insufficient time exists to download a differential tier for new tile 720 at chunk Y−1, or if the sink device 120 requested a differential tier for the chunk but it was not received in time to be rendered.
- FIG. 7 illustrates rendering of tile 720 at chunks Y to Y+2 using data from both tier 0 and from differential tiers. This may occur, for example, if a sink device 120 had already requested the tier 0 levels of service for the chunks Y to Y+2 prior to the viewport switch and (for example, see request 610 in FIG. 6 ), after the switch, the sink device retrieved differential tiers for those chunks Y to Y+2 (for example, see response 650 in FIG. 6 ).
- FIG. 7 illustrates rendering of tile 720 from tier 3 starting from chunk Y+3.
- A switch from differential tiers to higher quality tiers may occur for chunks for which download requests are made after the viewport switch occurs.
- A sink terminal 120 may determine what tiers to request for the new tile from its operating state and the transmission latency in the system.
- The transitional period may include rendering the new viewport location from a lower quality of service (such as tier 0 for chunk Y−1 in FIG. 7).
- The transitional period may also include rendering the new viewport location from an enhanced lower quality of service (such as tier 0 enhanced by the differential tier for chunks Y to Y+2 in FIG. 7).
- FIG. 8 illustrates a frame 800 of omnidirectional video that may be coded by a source terminal 110 .
- The frame 800 is illustrated as having been parsed into a plurality of tiles 810.0-810.n.
- Each tile may be coded in raster scan order.
- Content of tile 810.0 may be coded separately from content of tile 810.1,
- and content of tile 810.1 may be coded separately from content of tile 810.2.
- Tiles 810.1-810.n may be coded in multiple tiers, producing discrete encoded data that may be segmented by both tier and tile.
- Encoded data may also be segmented into time chunks.
- Encoded data may be segmented into discrete segments for each time chunk, tile, and tier.
- A sink terminal 120 may extract a viewport 830 from the frame 800, after it is coded by the source terminal 110 ( FIG. 1 ), transmitted to the sink terminal 120, and decoded.
- The sink terminal 120 may display the viewport 830 locally.
- The sink terminal 120 may transmit to the source terminal 110 viewport information, such as data identifying a location of the viewport 830 within an area of the frame 800.
- The sink terminal 120 may transmit offset data, shown as offset-x and offset-y from origin 820, identifying a location of the viewport 830 within the area of the frame 800.
- A size and/or shape of the viewport 830 may be included in the viewport information sent to source terminal 110.
- Source terminal 110 may then use the received viewport information to select which discrete portions of encoded data to transmit to sink terminal 120.
- Viewport 830 spans tiles 810.5 and 810.6.
- A first tier may be sent for tiles 810.5 and 810.6, and
- a second tier may be sent for the remaining tiles that do not include any portion of the viewport.
- The first tier may be sent to sink terminal 120 for tiles 810.5 and 810.6, and
- the second tier, providing lower quality video, may be sent for some or all of the other tiles.
- A lower quality tier may be provided for all tiles. In another aspect, a lower quality tier may be provided for only a portion of the frame 800. For example, a lower quality tier may be provided only for 180 degrees of view centered on the current viewport (instead of 360 degrees), or the lower quality tier may be provided only in areas of frame 800 where the viewport is likely to move next.
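The 180-degree restriction above can be sketched as an angular filter over tile columns. The column model (tiles as equal-width columns of an equirectangular frame) and all names are assumptions for illustration.

```python
# Hypothetical sketch: restrict the low-quality tier to 180 degrees of view
# centered on the current viewport. Tiles are modeled as equal-width columns
# of an equirectangular frame, each spanning 360/cols degrees.

def angular_distance(a, b):
    """Shortest angular distance between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def low_tier_columns(viewport_center_deg, cols, span_deg=180):
    col_width = 360.0 / cols
    half = span_deg / 2.0
    return sorted(c for c in range(cols)
                  if angular_distance((c + 0.5) * col_width,
                                      viewport_center_deg) <= half)

# 8 tile columns; viewport centered at 0 degrees keeps the 4 columns whose
# centers fall within +/-90 degrees (the list wraps around the seam).
cols_in_range = low_tier_columns(viewport_center_deg=0.0, cols=8)
```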
- Frame 800 may be encoded according to a layered coding protocol, where one tier is coded as a base layer, and other tiers are encoded as enhancement layers of the base layer.
- An enhancement layer may be predicted from one or more lower layers. For example, a first enhancement layer may be predicted from the base layer, and a second, higher enhancement layer may be predicted from either the base layer or from the first, lower enhancement layer.
- An enhancement layer may be differentially or predictively coded from one or more lower layers.
- Non-enhancement layers, such as a base layer, may be encoded independently of other layers. Reconstruction at a decoder of a differentially coded layer will require both the encoded data segment of the differentially coded layer and the segment(s) of the layer(s) from which it is predicted.
- Accordingly, sending a predictively coded layer may include sending both the discrete encoded data segment of the predictively coded layer and the discrete encoded data segment(s) of the layer(s) used as a prediction reference.
- With differential layered coding of frame 800, a lower base layer may be sent to sink terminal 120 for all tiles, while discrete data segments for a higher differential layer (that is coded using predictions from the base layer) may be sent only for tiles 810.5 and 810.6, as the viewport 830 is included in those tiles.
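Because a predictively coded layer can only be reconstructed together with its reference layers, the sender must resolve the layer dependency chain before choosing which segments to transmit. The sketch below is an assumed, generic dependency walk; the dependency map and names are illustrative.

```python
# Hypothetical dependency resolution: sending a predictively coded layer
# requires also sending every layer it references, transitively.

def segments_to_send(layer, deps):
    """`deps` maps a layer to the list of layers it uses as prediction
    references. Returns all layers whose segments must be sent."""
    needed, stack = set(), [layer]
    while stack:
        current = stack.pop()
        if current not in needed:
            needed.add(current)
            stack.extend(deps.get(current, []))
    return sorted(needed)

# Base layer 0; enhancement layer 1 predicted from 0; layer 2 predicted from 1.
deps = {1: [0], 2: [1]}
send = segments_to_send(2, deps)   # layer 2 pulls in layers 1 and 0
```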
- FIG. 9 illustrates a frame 900 of omnidirectional video that may be coded by a source terminal 110 .
- the frame 900 is illustrated as having been parsed into a plurality of tiles 810 . 0 - 810 . n .
- Frame 900 may represent a different video time from frame 800 , for example a frame 900 may be a later time in the timeline of the video.
- the viewport of sink terminal 120 may have moved to the location of viewport 930 , which may be identified by offset-x′ and offset-y′ from origin 820 .
- the viewport of sink terminal 120 moves from the location of viewport 830 in FIG. 8 to the location of viewport 930 in FIG.
- the sink terminal sends the new viewport information to source terminal 110 .
- sink terminal 120 may change which discrete segments of encoded video are sent to sink terminal, such that a first layer may be sent for tiles that include a portion of the viewport, while a second layer may be sent for tiles that do not include a portion of the viewport.
- In FIG. 9, pixels of tiles 810.0 and 810.1 are included in viewport 930, and hence a first layer may be sent for these tiles, while a second layer may be sent for the tiles that do not include a portion of the viewport.
- FIG. 10 is a simplified block diagram of an example video distribution system 1000 suitable for use with the present invention, including when multi-directional video is pre-encoded and stored on a server.
- the system 1000 may include a distribution server system 1010 and a client device 1020 connected via a communication network 1030 .
- the distribution system 1000 may provide coded multi-directional video data to the client 1020 in response to client requests.
- the client 1020 may decode the coded video data and render it on a display.
- the distribution server 1010 may include a storage system 1040 on which pre-encoded multi-directional videos are stored in a variety of tiers for download by the client device 1020 .
- the distribution server 1010 may store several coded representations of a video content item, shown as tiers 1, 2, and 3, which have been coded with different coding parameters.
- the video content item includes a manifest file containing pointers to chunks of encoded video data for each tier.
- Tiers 1 and 2 differ by average bit rate, with Tier 2 enabling a higher-quality reconstruction of the video content item at a higher average bitrate than that provided by Tier 1.
- the difference in bitrate and quality may be induced by differences in coding parameters—e.g., coding complexity, frame rates, frame size and the like.
- Tier 3 may be an enhancement layer of Tier 1 which, when decoded in combination with Tier 1, improves quality beyond what the Tier 1 representation provides when decoded by itself.
- Each video tier 1-3 may be parsed into a plurality of chunks CH1.1-CH1.N, CH2.1-CH2.N, and CH3.1-CH3.N.
- Manifest file 1050 may include pointers to each chunk of encoded video data for each tier. The different chunks may be retrieved from storage and delivered to the client 1020 over a channel defined in the network 1030 .
- Channel stream 1040 represents an aggregation of transmitted chunks from multiple tiers.
- a multi-directional video may be spatially segmented into tiles.
- FIG. 10 depicts the chunks available for the various tiers of one tile.
- Manifest 1050 may additionally cover other tiles (not depicted in FIG. 10), for example by providing metadata and pointers to the storage locations of encoded data chunks for each tier of each such tile.
- FIG. 10 illustrates three encoded video tiers 1, 2, and 3 for one tile, each tier coded into N chunks (1 to N) with different coding parameters.
- this example illustrates the chunks of each tier as temporally-aligned so that chunk boundaries define respective time periods (t 1 , t 2 , t 3 , . . . , t N ) of video content.
- Chunk boundaries may provide preferred points for stream switching between the tiers. Stream switching may be facilitated, for example, by resetting motion prediction coding state at switching points.
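A minimal sketch of such a manifest for one tile may help: each tier lists N temporally aligned chunks, and a switch to another tier takes effect at the next chunk boundary. The field names and chunk paths below are illustrative, not defined by the patent.

```python
# Hypothetical manifest for one tile, modeled loosely on manifest 1050:
# three tiers, each with four temporally-aligned chunks.
manifest = {
    "tile_0": {
        "tier1": [f"tile0/t1/chunk{i}.bin" for i in range(1, 5)],
        "tier2": [f"tile0/t2/chunk{i}.bin" for i in range(1, 5)],
        "tier3": [f"tile0/t3/chunk{i}.bin" for i in range(1, 5)],  # enh. of tier1
    }
}

def chunk_for(manifest, tile, tier, period):
    """Look up the chunk covering time period t_period (1-based)."""
    return manifest[tile][tier][period - 1]

def switch_point(period_start_times, wall_time):
    """Stream switching is preferred at chunk boundaries: return the first
    boundary at or after wall_time, or None if past the last boundary."""
    for start in period_start_times:
        if start >= wall_time:
            return start
    return None
```

For example, a client deciding at time 1.2 (mid-chunk) to change tiers would plan the switch for the boundary at time 2.0.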
- Times A, B, C, and D are depicted in FIG. 10 in part to assist in illustrating a moving viewport in an aspect of this disclosure.
- Times A, B, C, and D are positioned along the streaming timeline of the media chunks referenced by manifest 1050 .
- Times A, B, and D may correspond to the beginning of time period t 1 , t 2 , and t 3 , respectively, while time C may correspond to a time somewhere in the middle of time period t 2 , between the beginning of t 2 and the beginning of t 3 .
- multi-directional image data may include depth maps and/or occlusion information.
- Depth maps and/or occlusion information may be included as separate channel(s) and manifest 1050 may include references to these separate channel(s) for depth maps and/or occlusion information.
- FIG. 11 illustrates a frame 1100 of multi-directional video with a moving viewport.
- frame 1100 is illustrated as having been parsed into a plurality of tiles 1110 . 0 - 1110 . n .
- FIG. 11 depicts viewport location 1130, which may correspond to a first location of a viewport in client 1020 at a first time, and viewport location 1140, which may correspond to a second location of the same viewport at a second time.
- client 1020 may extract a viewport image from the high-quality reconstruction of tier 2.
- client 1020 may extract a viewport image from the reconstructed combination of tier 1 and enhancement layer tier 3 when the viewport moves into a new spatial tile, and then return to a steady state by extracting a viewport image from tier 2 once tier 2 is again available at client 1020 .
- An example of this is illustrated in Tables 1 and 2 for a viewport of client 1020 that jumps from viewport location 1130 to viewport location 1140 exactly at time C.
- Client 1020's requests for tiers of tiles are listed in Table 1, and the tiers from which a viewport image is extracted are listed in Table 2.
- Tier 2, being the higher-quality tier, may be requested by client 1020 from server 1010 for tile 1110.0 at time A, as indicated in Table 1.
- For tiles not included in the viewport at location 1130 (tiles 1110.1-1110.n), client 1020 requests the lower-quality, more highly compressed tier 1 instead.
- tier 1 chunks are requested for time period t 1 at time A for all tiles other than tile 1110 . 0 .
- the viewport is then extracted from the reconstruction of tier 2 by client 1020 starting at time A.
- the viewport has not yet moved, so the same tiers are requested by client 1020 for the same tiles as at time A, but the requests are for the specific chunks corresponding to time period t 2 .
- the viewport of client 1020 may jump from viewport location 1130 to location 1140 .
- At time C, somewhere between the beginning and end of t2, lower-quality tier 1 has already been requested for the new viewport location, tile 1110.5, so a viewport can be extracted immediately from tier 1 when the viewport moves.
- Enhancement layer tier 3 may then be requested and, as soon as it is available, the combination of tier 1 and enhancement layer tier 3 can be used for extracting a viewport image at client 1020.
- Finally, client 1020 may return to a steady state by requesting tier 2 for tiles containing the viewport location and tier 1 for tiles not containing the viewport location.
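The recovery sequence of Tables 1 and 2 reduces to a small selection rule. The function below is a sketch with assumed names (the patent defines no such API): render from tier 2 when available, fall back to tier 1 plus the tier 3 enhancement while recovering, and use tier 1 alone immediately after a viewport jump.

```python
def extraction_tier(has_tier1, has_tier2, has_tier3):
    """Pick the best available representation for a viewport tile."""
    if has_tier2:
        return "tier2"               # steady state: high-quality tier
    if has_tier1 and has_tier3:
        return "tier1+tier3"         # base plus enhancement while recovering
    if has_tier1:
        return "tier1"               # immediately after the viewport jump
    return None

# Tile 1110.5 after the viewport jumps at time C:
assert extraction_tier(True, False, False) == "tier1"        # right at C
assert extraction_tier(True, False, True) == "tier1+tier3"   # tier 3 arrives
assert extraction_tier(True, True, True) == "tier2"          # next boundary
```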
- FIG. 12 is a functional block diagram of a coding system 1200 according to an aspect of the present disclosure.
- the system 1200 may include an image source 1210 , an image processing system 1220 , a video coder 1230 , a video decoder 1240 , a reference picture store 1250 and a predictor 1260 .
- the image source 1210 may generate image data as a multi-directional image, containing image data of a field of view that extends around a reference point in multiple directions.
- the image processing system 1220 may perform image processing operations to condition the image for coding.
- the image processing system 1220 may generate different versions of source data to facilitate encoding the source data into multiple layers of coded data.
- image processing system 1220 may generate multiple different projections of source video aggregated from multiple cameras.
- image processing system 1220 may generate multiple resolutions of source video: a higher layer with higher spatial resolution and a lower layer with lower spatial resolution.
- the video coder 1230 may generate a multi-layered coded representation of its input image data, typically by exploiting spatial and/or temporal redundancies in the image data.
- the video coder 1230 may output a coded representation of the input data that consumes less bandwidth than the original source video when transmitted and/or stored.
- Video coder 1230 may output data in discrete time chunks corresponding to temporal portions of the source image data and, in some aspects, separate time chunks of encoded data may be decoded independently of other time chunks.
- Video coder 1230 may also output data in discrete layers, and in some aspects, separate layers may be transmitted independently of other layers.
- the video decoder 1240 may invert coding operations performed by the video encoder 1230 to obtain a reconstructed picture from the coded video data.
- the coding processes applied by the video coder 1230 are lossy processes, which cause the reconstructed picture to possess various errors when compared to the original picture.
- the video decoder 1240 may reconstruct pictures of select coded pictures, which are designated as “reference pictures,” and store the decoded reference pictures in the reference picture store 1250 . In the absence of transmission errors, the decoded reference pictures may replicate decoded reference pictures obtained by a decoder (not shown in FIG. 12 ).
- the predictor 1260 may select prediction references for new input pictures as they are coded. For each portion of the input picture being coded (called a "pixel block" for convenience), the predictor 1260 may select a coding mode and identify a portion of a reference picture that may serve as a prediction reference for the pixel block being coded.
- the coding mode may be an intra-coding mode, in which case the prediction reference may be drawn from a previously-coded (and decoded) portion of the picture being coded.
- the coding mode may be an inter-coding mode, in which case the prediction reference may be drawn from another previously-coded and decoded picture.
- prediction references may be pixel blocks previously decoded from another layer, typically a layer lower than the layer currently being encoded.
- a function such as an image warp function may be applied to a reference image in one projection format at a first layer to predict a pixel block in a different projection format at a second layer.
- a differentially coded enhancement layer may be coded with restricted prediction references to enable seeking or layer/tier switching into the middle of an encoded enhancement layer chunk.
- predictor 1260 may restrict the prediction references of every frame in an enhancement layer to be frames of a base layer or other lower layer. When every frame of an enhancement layer is predicted without reference to other frames of that enhancement layer, a decoder may switch to the enhancement layer at any frame efficiently, because previous enhancement layer frames will never be needed as prediction references.
- predictor 1260 may require that every Nth frame (such as every other frame) within a chunk be predicted only from a base layer or other lower layer, to enable seeking to every Nth frame within an encoded data chunk.
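This restriction can be sketched as a predicate an encoder might consult per frame (hypothetical helpers, not an API from the patent): frames at multiples of N use only lower-layer references, so a decoder can seek or switch tiers at those frames.

```python
def is_safe_switch_frame(frame_index, n):
    """Frames at multiples of n are predicted only from the lower layer."""
    return frame_index % n == 0

def allowed_references(frame_index, n):
    """References the predictor may consider for an enhancement-layer frame."""
    refs = ["lower_layer"]                       # always permitted
    if not is_safe_switch_frame(frame_index, n):
        refs.append("previous_enhancement")      # only on non-key frames
    return refs
```

With n = 2 this reproduces the every-other-frame pattern of FIG. 17: even frames are safe switching points, odd frames may additionally reference prior enhancement-layer frames.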
- the predictor 1260 may furnish the prediction data to the video coder 1230 .
- the video coder 1230 may code input video data differentially with respect to prediction data furnished by the predictor 1260 .
- prediction operations and the differential coding operate on a pixel block-by-pixel block basis.
- Prediction residuals, which represent pixel-wise differences between the input pixel blocks and the prediction pixel blocks, may be subject to further coding operations to reduce bandwidth further.
- the coded video data output by the video coder 1230 should consume less bandwidth than the input data when transmitted and/or stored.
- the coding system 1200 may output the coded video data to an output device 1270 , such as a transceiver, that may transmit the coded video data across a communication network 130 ( FIG. 1 ).
- the coding system 1200 may output coded data to a storage device (not shown) such as an electronic-, magnetic- and/or optical storage medium.
- the transceiver 1270 also may receive viewport information from a decoding terminal ( FIG. 7 ) and provide the viewport information to controller 1280 .
- Controller 1280 may control the image processor 1220 and the overall video coding process, including video coder 1230 and transceiver 1270.
- Viewport information received by transceiver 1270 may include a viewport location and/or a preferred projection format.
- controller 1280 may control transceiver 1270 based on viewport information to send certain coded layer(s) for certain spatial tiles, while sending a different coded layer(s) for other tiles.
- controller 1280 may control the allowable prediction references in certain frames of certain layers.
- controller 1280 may control the projection format(s) or scaled layers produced by image processor 1220 based on the received viewport information.
- FIG. 13 is a functional block diagram of a decoding system 1300 according to an aspect of the present disclosure.
- the decoding system 1300 may include a transceiver 1310 , a buffer 1315 , a video decoder 1320 , an image processor 1330 , a video sink 1340 , a reference picture store 1350 , a predictor 1360 , and a controller 1370 .
- the transceiver 1310 may receive coded video data from a channel and route it to buffer 1315 before sending it to video decoder 1320 .
- the coded video data may be organized into chunks of time and spatial tiles, and may include different coded layers for different tiles.
- the video data buffered in buffer 1315 may span the video time of multiple chunks.
- the video decoder 1320 may decode the coded video data with reference to prediction data supplied by the predictor 1360 .
- the video decoder 1320 may output decoded video data in a representation determined by a source image processor (such as image processor 1220 of FIG. 12 ) of a coding system that generated the coded video.
- the image processor 1330 may extract video data from the decoded video according to the viewport orientation currently in force at the decoding system.
- the image processor 1330 may output the extracted viewport data to the video sink device 1340 .
- Controller 1370 may control the image processor 1330, the video decoding process including video decoder 1320, and transceiver 1310.
- the video sink 1340 may consume decoded video generated by the decoding system 1300 .
- Video sinks 1340 may be embodied by, for example, display devices that render decoded video.
- video sinks 1340 may be embodied by computer applications, for example, gaming applications, virtual reality applications and/or video editing applications, that integrate the decoded video into their content.
- a video sink may process the entire multi-directional field of view of the decoded video for its application but, in other applications, a video sink 1340 may process a selected sub-set of content from the decoded video. For example, when rendering decoded video on a flat panel display, it may be sufficient to display only a selected subset of the multi-directional video.
- decoded video may be rendered in a multi-directional format, for example, in a planetarium.
- the transceiver 1310 also may send viewport information provided by the controller 1370 , such as a viewport location and/or a preferred projection format, to the source of encoded video, such as terminal 1200 of FIG. 12 .
- controller 1370 may provide new viewport information to transceiver 1310 to send on to the encoded video source.
- In response to new viewport information, layers that were missing for certain previously received but not yet decoded tiles of encoded video may be received by transceiver 1310 and stored in buffer 1315. Decoder 1320 may then decode these tiles using the replacement layers instead of the layers that had previously been received based on the old viewport location.
- Controller 1370 may determine viewport information based on a viewport location.
- the viewport information may include just a viewport location, and the encoded video source may then use the location to identify which encoded layers to provide to decoding system 1300 for specific spatial tiles.
- viewport information sent from the decoding system may include specific requests for specific layers of specific tiles, leaving much of the viewport location mapping in the decoding system.
- viewport information may include a request for a particular projection format based on the viewport location.
- the principles of the present disclosure find application with a variety of projection formats of multi-directional images.
- FIG. 14 illustrates an exemplary multi-directional image projection format according to one aspect.
- the multi-directional image 1430 may be generated by a camera 1410 that pivots along an axis.
- the camera 1410 may capture image content as it pivots along a predetermined angular distance 1420 (preferably, a full 360°) and may merge the captured image content into a 360° image.
- the capture operation may yield a multi-directional image 1430 that represents a multi-directional field of view having been partitioned along a slice 1422 that divides a cylindrical field of view into a two dimensional array of data.
- pixels on either edge 1432 , 1434 of the image 1430 represent adjacent image content even though they appear on different edges of the multi-directional image 1430 .
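The wrap-around adjacency at the slice can be illustrated with a modular neighbor lookup (a toy sketch, not from the patent): a horizontal step past either edge of the equirectangular image wraps to the opposite edge.

```python
def wrapped_neighbor(x, dx, width):
    """Horizontal neighbor of column x in an equirectangular image, wrapping
    across the edges (such as edges 1432 and 1434) that meet at the slice."""
    return (x + dx) % width

width = 3840  # assumed image width for illustration
assert wrapped_neighbor(3839, 1, width) == 0      # right edge -> left edge
assert wrapped_neighbor(0, -1, width) == 3839     # left edge -> right edge
```

A coder that exploits this adjacency (e.g., for motion search across the seam) would apply the same modular arithmetic when fetching reference pixels.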
- FIG. 15 illustrates an exemplary multi-directional image projection format according to another aspect.
- a camera 1510 may possess image sensors 1512 - 1516 that capture image data in different fields of view from a common reference point.
- the camera 1510 may output a multi-directional image 1530 in which image content is arranged according to a cube map capture operation 1520 in which the sensors 1512 - 1516 capture image data in different fields of view 1521 - 1526 (typically, six) about the camera 1510 .
- the image data of the different fields of view 1521 - 1526 may be stitched together according to a cube map layout 1530 .
- In the example illustrated in FIG. 15, pixels from the front image 1532 that are adjacent to the pixels from each of the left, right, top, and bottom images 1531, 1533, 1535, 1536 represent image content that is adjacent to the content of the respective adjoining sub-images.
- pixels from the right and back images 1533 , 1534 that are adjacent to each other represent adjacent image content. Further, content from a terminal edge 1538 of the back image 1534 is adjacent to content from an opposing terminal edge 1539 of the left image.
- the image 1530 also may have regions 1537 . 1 - 1537 . 4 that do not belong to any image.
- the representation illustrated in FIG. 15 often is called a “cube map” image.
- Coding of cube map images may occur in several ways.
- the cube map image 1530 may be coded directly, which includes coding of null regions 1537 . 1 - 1537 . 4 that do not have image content.
- the encoding techniques of FIG. 3 may be applied to cube map image 1530 .
- the cube map image 1530 may be repacked to eliminate null regions 1537 . 1 - 1537 . 4 prior to coding, shown as image 1540 .
- the techniques described in FIG. 3 may also be applied to a packed image frame 1540 . After decode, the decoded image data may be unpacked prior to display.
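The repacking step can be sketched as copying the six faces of the 4x3 cross layout into a compact 3x2 grid before coding, and reversing the copy after decode. The face placement below is one plausible packing chosen for illustration, not the layout mandated by the patent.

```python
CROSS = {   # face -> (col, row) in the 4x3 cross layout of image 1530
    "left": (0, 1), "front": (1, 1), "right": (2, 1), "back": (3, 1),
    "top": (1, 0), "bottom": (1, 2),
}
PACKED = {  # face -> (col, row) in a compact 3x2 layout like image 1540
    "left": (0, 0), "front": (1, 0), "right": (2, 0),
    "back": (0, 1), "top": (1, 1), "bottom": (2, 1),
}

def make_canvas(cols, rows, face):
    """Blank image of cols x rows faces, each face x face pixels."""
    return [[0] * (cols * face) for _ in range(rows * face)]

def blit_face(dst, src, face, dst_cr, src_cr):
    """Copy one face-sized square from src to dst."""
    (dc, dr), (sc, sr) = dst_cr, src_cr
    for y in range(face):
        for x in range(face):
            dst[dr * face + y][dc * face + x] = src[sr * face + y][sc * face + x]

def repack(cross, face):
    """Drop the null regions by copying the six faces into a 3x2 grid."""
    packed = make_canvas(3, 2, face)
    for name in PACKED:
        blit_face(packed, cross, face, PACKED[name], CROSS[name])
    return packed

# Demo: fill each face of a tiny 4x3 cross with a distinct value, then repack.
face = 2
values = {"left": 1, "front": 2, "right": 3, "back": 4, "top": 5, "bottom": 6}
cross = make_canvas(4, 3, face)
for name, (c, r) in CROSS.items():
    for y in range(face):
        for x in range(face):
            cross[r * face + y][c * face + x] = values[name]
packed = repack(cross, face)
```

The packed image covers 6 face-areas instead of the cross's 12, so no bits are spent coding null regions; unpacking applies the inverse copies.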
- FIG. 16 illustrates another exemplary multi-directional projection image format 1630 .
- the frame format of FIG. 16 may be generated by another type of omnidirectional camera 1600 , called a panoramic camera.
- a panoramic camera typically is composed of a pair of fish eye lenses 1612 , 1614 and associated imaging devices (not shown), each arranged to capture image data in a hemispherical view of view. Images captured from the hemispherical fields of view may be stitched together to represent image data in a full 360° field of view.
- FIG. 16 illustrates a multi-directional image 1630 that contains image content 1631 , 1632 from the hemispherical views 1622 , 1624 of the camera and which are joined at a seam 1635 .
- the techniques described hereinabove also find application with multi-directional image data in such formats 1630 .
- cameras such as the cameras 1410 , 1510 , and 1610 in FIGS. 14-16 , may capture depth or occlusion information in addition to visible light.
- depth and occlusion information may be stored as separate data channels in multi-directional projection formats such as images 1430, 1530, 1540, and 1630.
- depth and occlusion information may be included as a separate data channel in a manifest, such as manifest 1050 of FIG. 10 .
- FIG. 17 illustrates an exemplary prediction reference pattern.
- Video sequence 1700 includes a base layer 1720 and enhancement layer 1710 , each layer comprising a series of corresponding frames.
- Base layer 1720 includes an intra-coded frame L 0 .I 0 followed by predicted frames L 0 .P 1 -L 0 .P 7 .
- Enhancement layer 1710 includes predicted frames L 1 .P 0 -L 1 .P 7 .
- Intra-coded frame L0.I0 may be coded without prediction from any other frame. Predicted frames may be coded by predicting their pixel blocks from portions of the reference frames indicated by solid arrows in FIG. 17.
- L0.P1 is predicted only from frame L0.I0 as a reference, L0.P1 may be a reference for L0.P2, L0.P2 may be a reference for L0.P3, and so on, as indicated by the arrows inside base layer 1720.
- the frames of enhancement layer 1710 may be predicted using only corresponding base layer reference frames, such that L 0 .I 0 may be a prediction reference for L 1 .P 0 , L 0 .P 1 may be a prediction reference for L 1 .P 1 , and so on.
- enhancement layer 1710 frames may also be predicted from previous enhancement layer frames, as indicated by optional dashed arrows in FIG. 17 .
- frame L 1 .P 7 may be predicted from either L 0 .P 7 or L 1 .P 6 .
- Prediction references within enhancement layer 1710 may be limited such that only a subset of enhancement layer frames may use other enhancement layer frames as prediction references, and this subset may follow a pattern. In the example of FIG. 17, every other frame of enhancement layer 1710 (L1.P0, L1.P2, L1.P4, and L1.P6) is predicted only from the corresponding base layer frame, while the alternate frames (L1.P1, L1.P3, L1.P5, L1.P7) may be predicted from either base layer frames or previous enhancement layer frames.
- Tier switching to enhancement layer 1710 may be facilitated at the frames that are predicted only from lower layers, because prior frames of the enhancement layer need not have been decoded for use as reference frames.
- Enhancement layer frames that are predicted only from lower layer frames may be considered safe-switching frames, sometimes called key frames, because previous frames from the enhancement layer need not be available to correctly decode these safe switching frames.
- a sink terminal may switch to a new layer or new tier on non-safe-switching frames when some decoded quality drift may be tolerated.
- When a non-safe-switching frame is decoded without access to the reference frames used for its prediction, quality gradually degrades as errors from the incorrect predictions accumulate, producing what may be called quality drift.
- Error concealment techniques may be used to mitigate the quality drift due to switching at non-safe-switching enhancement layer frames.
- Example error concealment techniques include predicting from a frame similar to the missing reference frame, and periodic intra-refresh mechanisms. By tolerating some quality drift caused by switching at non-safe-switching frames, the latency can be reduced between moving a viewport and presenting images of the new viewport location.
- FIG. 18 illustrates two exemplary multi-directional projections for combining. Images of the same scene may be encoded in a plurality of projection formats.
- a multi-directional scene is encoded as a first image with a first projection format, such as an image 1810 in equirectangular projection format
- the same scene is encoded as a second image in a second projection format, such as image 1820 in a cube map projection format.
- Region of interest 1812 projected onto equirectangular image 1810 and region of interest 1822 projected onto cube map image 1820 may both correspond to the same region of interest in the scene projected into images 1810 and 1820 .
- Cube Map image 1820 may include null regions 1837 . 1 - 1837 . 4 and cube faces left, front, right, back, top and bottom 1831 - 1836 .
- multiple projection formats may be combined to form a better reconstruction of a region of interest (ROI) than can be produced from a single projection format.
- ROI region of interest
- a reconstructed region of interest, ROI combo may be produced from a weighted sum of the encoded projections or may be produced from a filtered sum of the encoded projections.
- the region of interest in the scene of FIG. 18 may be reconstructed as:
- ROI_combo = f(ROI_1, ROI_2)
- first region of interest image ROI 1 may be, for example, the equirectangular region of interest image from ROI 1812
- second region of interest image ROI 2 may be, for example, the cube map region of interest image from ROI 1822 . If f( ) is a weighted sum,
- ROI_combo = alpha*ROI_1 + beta*ROI_2
- a projection format conversion function may be used, as in:
- ROI_combo = alpha*PConv(ROI_1) + beta*ROI_2
- PConv( ) is a function that converts an image in a first projection format into a second projection format.
- PConv( ) may simply be an up-sample or a down-sample function.
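The weighted combination above can be sketched directly. In this sketch, `pconv` is a toy nearest-neighbor resample standing in for a real projection-format conversion, and the alpha/beta weights are arbitrary illustrative values.

```python
def pconv(roi, out_w, out_h):
    """Toy stand-in for PConv: nearest-neighbor resample of a 2D pixel list."""
    in_h, in_w = len(roi), len(roi[0])
    return [[roi[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def combine_rois(roi1, roi2, alpha, beta):
    """ROI_combo = alpha * PConv(ROI_1) + beta * ROI_2, pixel-wise."""
    h, w = len(roi2), len(roi2[0])
    conv = pconv(roi1, w, h)                  # bring ROI_1 into ROI_2's format
    return [[alpha * conv[y][x] + beta * roi2[y][x] for x in range(w)]
            for y in range(h)]

# 1x1 ROI from one projection combined with a 2x2 ROI of the same scene
# from another projection, with equal weights.
roi_combo = combine_rois([[10]], [[20, 20], [20, 20]], alpha=0.5, beta=0.5)
```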
- the best projection format for encoding an entire multi-directional scene may be different from the best projection format for encoding only a region of interest, such as for encoding in an enhancement layer.
- a multi-tiered encoding of the scene of FIG. 18 may include encoding the entirety of equirectangular image 1810 in a first tier, and encoding only the ROI 1822 of cube map image 1820 in a second tier.
- ROI 1822 may be encoded by encoding the entire front face 1832 as a tile of cube map image 1820 .
- this second tier may be encoded as an enhancement layer over the first tier base layer, as depicted in FIG. 19 .
- FIG. 19 illustrates an exemplary system for creating a residual from two different multi-directional projections.
- a base layer ROI image 1910 in a projection format P 1 may be converted to a projection format P 2 by conversion process 1902 to create a prediction of the ROI image 1920 in projection format P 2 .
- the prediction image from conversion process 1902 is subtracted from the actual P 2 ROI image 1920 at adder 1904 to produce a P 2 residual ROI, which may then be encoded as a P 2 projection enhancement layer over a P 1 base layer.
- the base layer may encode the entire scene in projection P 1
- the enhancement layer may encode only a region of interest within the scene in projection P 2 .
- a first tier may be encoded as a base layer comprising the entire equirectangular image 1810
- a second tier may be encoded as an enhancement layer comprising a subset of cube map image 1820 such as a single tile or region of interest.
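The residual path of FIG. 19 can be sketched end to end. Here `pconv_p1_to_p2` is a placeholder for conversion process 1902 (assumed to be an identity for this toy example), and the pixel values are invented for illustration.

```python
def pconv_p1_to_p2(roi):
    """Placeholder for conversion process 1902 (identity in this sketch)."""
    return [row[:] for row in roi]

def make_residual(base_roi_p1, actual_roi_p2):
    """Encoder side: subtract the converted P1 prediction from the P2 ROI."""
    pred = pconv_p1_to_p2(base_roi_p1)
    return [[a - p for a, p in zip(ar, pr)]
            for ar, pr in zip(actual_roi_p2, pred)]

def reconstruct(base_roi_p1, residual):
    """Decoder side: add the residual back onto the converted prediction."""
    pred = pconv_p1_to_p2(base_roi_p1)
    return [[p + r for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, residual)]

base = [[100, 102], [98, 101]]      # base-layer ROI in projection P1
actual = [[103, 104], [99, 105]]    # actual ROI in projection P2
res = make_residual(base, actual)
assert reconstruct(base, res) == actual   # round trip of the sketch
```

In a real codec the residual would itself be transform-coded and quantized, so the round trip would be approximate rather than exact.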
- Video decoders and/or controllers can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on camera devices, personal computers, notebook computers, tablet computers, smartphones or computer servers. Such computer programs include processor instructions and typically are stored in physical storage media such as electronic-, magnetic-, and/or optically-based storage devices, where they are read by a processor and executed.
- Decoders commonly are packaged in consumer electronics devices, such as smartphones, tablet computers, gaming systems, DVD players, portable media players and the like; and they also can be packaged in consumer software applications such as video games, media players, media editors, and the like. And, of course, these components may be provided as hybrid systems that distribute functionality across dedicated hardware components and programmed general-purpose processors, as desired.
Abstract
Description
- The present disclosure relates to coding techniques for multi-directional imaging applications.
- Some modern imaging applications capture image data from multiple directions about a camera. Some cameras pivot during image capture, which allows a camera to capture image data across an angular sweep that expands the camera's effective field of view. Some other cameras have multiple imaging systems that capture image data in several different fields of view. In either case, an aggregate image may be created that merges image data captured from these multiple views.
- A variety of rendering applications are available for multi-directional content. One rendering application involves extraction and display of a subset of the content contained in a multi-directional image. For example, a viewer may employ a head mounted display and change the orientation of the display to identify a portion of the multi-directional image in which the viewer is interested. Alternatively, a viewer may employ a stationary display and identify a portion of the multi-directional image in which the viewer is interested through user interface controls. In these rendering applications, a display device extracts a portion of image content from the multi-directional image (called a “viewport” for convenience) and displays it. The display device would not display other portions of the multi-directional image that are outside an area occupied by the viewport.
-
FIG. 1 illustrates a system according to an aspect of the present disclosure. -
FIG. 2 figuratively illustrates a rendering application for a sink terminal according to an aspect of the present disclosure. -
FIG. 3 illustrates an exemplary partitioning scheme in which a frame is partitioned into non-overlapping tiles. -
FIG. 4 illustrates a coded data stream that may be developed from coding of a single tile 410, according to an aspect of the present disclosure. -
FIG. 5 illustrates a method according to an aspect of the present disclosure. -
FIG. 6 illustrates a method according to an aspect of the present disclosure. -
FIG. 7 illustrates example data flows of FIG. 6. -
FIG. 8 illustrates a frame of omnidirectional video that may be coded by a source terminal. -
FIG. 9 illustrates a frame of omnidirectional video that may be coded by a source terminal. -
FIG. 10 is a simplified block diagram of an example video distribution system. -
FIG. 11 illustrates a frame 1100 of multi-directional video with a moving viewport. -
FIG. 12 is a functional block diagram of a coding system according to an aspect of the present disclosure. -
FIG. 13 is a functional block diagram of a decoding system according to an aspect of the present disclosure. -
FIG. 14 illustrates an exemplary multi-directional image projection format according to one aspect. -
FIG. 15 illustrates an exemplary multi-directional image projection format according to another aspect. -
FIG. 16 illustrates another exemplary multi-directional projection image format 1630. -
FIG. 17 illustrates an exemplary prediction reference pattern. -
FIG. 18 illustrates two exemplary multi-directional projections for combining. -
FIG. 19 illustrates an exemplary system for creating a residual from two different multi-directional projections. - In communication applications, aggregate source image data at a transmitter exceeds the data that is needed to display a rendering of a viewport at a receiver. Coding techniques for transmitting source data may account for a current viewport of the receiving rendering device. However, when accounting for a moving viewport, these coding techniques incur coding and transmission latency and coding inefficiency.
- Aspects of the present disclosure provide techniques for reducing latency and improving image quality of a viewport extracted from multi-directional video communications. According to such techniques, first streams of coded video data are received from a source. The first streams include coded data for each of a plurality of tiles representing a multi-directional video, where each tile corresponds to a predetermined spatial region of the multi-directional video, and at least one tile of the plurality of tiles in the first streams contains a current viewport location at a receiver. The techniques include decoding the first streams corresponding to the at least one tile containing the current viewport location, and displaying the decoded content for the current viewport location. When the viewport location at the receiver changes to include a new tile of the plurality of tiles, the techniques include retrieving first streams for the new tile, decoding the retrieved streams, displaying the decoded content for the changed viewport location, and transmitting information representing the changed viewport location to the source.
-
FIG. 1 illustrates a system 100 according to an aspect of the present disclosure. There, the system 100 is shown as including a source terminal 110 and a sink terminal 120 interconnected by a network 130. The source terminal 110 may transmit a coded representation of omnidirectional video to the sink terminal 120. The sink terminal 120 may receive the coded video, decode it, and display a selected portion of the decoded video. -
FIG. 1 illustrates the source terminal 110 as a multi-directional camera that captures image data of a local environment before coding it. In another aspect, the source terminal 110 may receive omni-directional video from an external source (not shown), such as a streaming service or storage device. - The
sink terminal 120 may determine a viewport location in a three-dimensional space represented by the multi-directional image. The sink terminal 120 may select a portion of decoded video to be displayed, for example, based on the terminal's orientation in free space. FIG. 1 illustrates the sink terminal 120 as a head mounted display but, in other aspects, the sink terminal 120 may be another type of display device, such as a stationary flat panel display, smartphone, tablet computer, gaming device or portable media player. Different types of user controls may be provided with each such display type through which a viewer identifies the viewport. The sink terminal's device type is immaterial to the present discussion unless otherwise noted herein. - The
network 130 represents any number of computer and/or communication networks that extend from the source terminal 110 to the sink terminal 120. The network 130 may include one or a combination of circuit-switched and/or packet-switched communication networks. The network 130 may communicate data between the source terminal 110 and the sink terminal 120 by any number of wireline and/or wireless communication media. The architecture and operation of the network 130 is immaterial to the present discussion unless otherwise noted herein. -
FIG. 1 illustrates a communication configuration in which coded video data is transmitted in a single direction from the source terminal 110 to the sink terminal 120. Aspects of the present disclosure find application with communication equipment that exchange coded video data in a bidirectional fashion, from terminal 110 to terminal 120 and also from terminal 120 to terminal 110. The principles of the present disclosure find application with both unidirectional and bidirectional exchange of video. -
FIG. 2 figuratively illustrates a rendering application for a sink terminal 200 according to an aspect of the present disclosure. There, omnidirectional video is represented as if it exists along a spherical surface 210 provided about the sink terminal 200. Based on the orientation of the sink terminal 200, the terminal 200 may select a portion of the video (called a “viewport” for convenience) and display the selected portion. As the orientation of the sink terminal 200 changes, the terminal 200 may select different portions from the video. For example, FIG. 2 illustrates the viewport changing from a first location 230 to a second location 240 along the surface 210. - Aspects of the present disclosure may apply video compression techniques according to any of a number of coding protocols. For example, the source terminal 110 (
FIG. 1 ) may code video data according to an ITU-T/ISO MPEG coding protocol such as H.265 (HEVC), H.264 (AVC), and the upcoming H.266 (VVC) standard, an AOM coding protocol such as AV1, or a predecessor coding protocol. Typically, such protocols parse individual frames of video into spatial arrays of video, called “pixel blocks” herein, and may code the pixel blocks in a regular coding order such as a raster scan order. - In an aspect, individual frames of multi-directional content may be parsed into individual spatial regions, herein called “tiles”, and coded as independent data streams.
FIG. 3 illustrates an exemplary partitioning scheme in which a frame 300 is partitioned into non-overlapping tiles 310.0-310.11. In a case where the frame 300 represents omnidirectional content (e.g., it represents image content in a perfect 360° field of view), the image content will be continuous across the opposing left and right edges of the frame. - In an aspect, the tiles described here may be a special case of the tiles used in some standards, such as HEVC. In this aspect, the tiles used herein may be “motion constrained tile sets,” where all frames are segmented using the exact same tile partitioning, and each tile in every frame is only permitted to use prediction from co-located tiles in other frames. Filtering in the decoder loop may also be disallowed across tiles, providing decoding independency between tiles.
-
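The tile partitioning above can be sketched in code. Below is a minimal, hypothetical mapping from a pixel coordinate to a raster-order tile index, with horizontal wraparound to model the continuity across the left and right edges of omnidirectional content; the grid shape and frame size are illustrative assumptions, not values from the disclosure.

```python
# Sketch (illustrative geometry): map a pixel coordinate to the raster-order
# index of the non-overlapping tile that contains it. For omnidirectional
# frames the content is continuous across the left and right edges, so x
# wraps modulo the frame width.

def tile_index(x, y, frame_w, frame_h, cols, rows):
    x %= frame_w                      # 360-degree horizontal wraparound
    tile_w, tile_h = frame_w // cols, frame_h // rows
    col = min(x // tile_w, cols - 1)
    row = min(y // tile_h, rows - 1)
    return row * cols + col

# A 4x3 tile grid over a 3840x1920 equirectangular frame:
print(tile_index(0, 0, 3840, 1920, 4, 3))       # top-left tile -> 0
print(tile_index(3900, 700, 3840, 1920, 4, 3))  # wraps to x=60, row 1 -> 4
```

Because of the wraparound, a viewport that straddles the frame's right edge maps onto tiles in the leftmost column, which matters when deciding which tiles to request.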
FIG. 4 illustrates a coded data stream that may be developed from coding of a single tile 410, according to an aspect of the present disclosure. The coded tile 410 may be coded in several representations 420-450, labeled “tier 0,” “tier 1,” “tier 2,” and “tier 3” respectively, each corresponding to a predetermined bandwidth constraint. For example, a tier 0 coding may be generated for a 500 kbps representation, a tier 1 coding may be generated for a 2 Mbps representation, a tier 2 coding may be generated for a 4 Mbps representation, and a tier 3 coding may be generated for an 8 Mbps representation. In practice, the number of tiers and the selection of target bandwidth may be tuned to suit individual application needs. - The
coded tile 410 also may contain a number of differential codings 460-480, each coded differentially with respect to the coded data of the tier 0 representation and each having a bandwidth tied to the bandwidth of another bandwidth tier. Thus, in an example where the tier 0 coding is generated at a 500 Kbps representation and the tier 1 coding is generated at a 2 Mbps representation, the tier 1 differential coding 460 may be coded at a 1.5 Mbps representation (1.5 Mbps=2 Mbps-500 Kbps). The other differential codings may have bandwidths derived similarly from their counterpart base tiers and the tier 0 coding 420. In an aspect, elements of the differential codings 460-480 may use elements of the tier 0 coding as a prediction reference; in such an embodiment, tier 0 serves as a base layer for those encodings. - The codings 420-480 of the tile are shown as partitioned into individual chunks (e.g., chunks 420.1-420.N for
tier 0 420, chunks 430.1-430.N for tier 1 430, etc.). Each chunk may be referenced by its own network identifier. During operation, a client device 120 (FIG. 1) may select individual chunks for download and request the chunks from a source terminal 110 (FIG. 1). -
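The bandwidth arithmetic behind the differential codings (1.5 Mbps = 2 Mbps - 500 Kbps in the example above) generalizes to the other tiers. A sketch, using the example tier bitrates from the text; the table and function names are illustrative:

```python
# Bandwidth arithmetic for the differential codings, using the example tier
# bitrates from the text (tier 0: 500 Kbps, tier 1: 2 Mbps, tier 2: 4 Mbps,
# tier 3: 8 Mbps).

TIER_KBPS = {0: 500, 1: 2000, 2: 4000, 3: 8000}

def differential_kbps(tier, base=0):
    """Bitrate of a coding of `tier` coded differentially against `base`."""
    return TIER_KBPS[tier] - TIER_KBPS[base]

print(differential_kbps(1))  # 1500 Kbps (1.5 Mbps = 2 Mbps - 500 Kbps)
print(differential_kbps(3))  # 7500 Kbps
```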
FIG. 5 illustrates a method 500 according to an aspect of the present disclosure. According to the method 500, source terminal 110 may transmit high quality coding for tiles included in a current viewport (msg. 510) and low quality coding for other tiles (msg. 520) to sink terminal 120. Sink terminal 120 may then decode and render data of the current viewport (box 530). If the viewport does not move to include different tiles (box 540), terminal 120 repeats decoding and rendering the current tiles (back to box 530). Alternately, if the viewport moves such that the tiles included in the viewport change, then the change in the viewport is reported back to the source terminal 110 (msg. 550). The source terminal 110 then repeats by sending high quality coding for the tiles of the new viewport location (back to msg. 510) and low quality coding for tiles that do not include the new viewport location (msg. 520). - The operations illustrated in
FIG. 5 are expected to provide low latency rendering of new viewports of multi-directional video in the presence of communication latencies between a source terminal 110 and a sink terminal 120. By transmitting low quality codings of tiles that do not belong to a current viewport, a sink terminal 120 may buffer the data locally. If/when a viewport changes to a spatial location that coincides with one of the formerly non-viewed viewports, the locally-buffered video may be decoded and displayed. The decoding and display can occur without incurring latencies involved with round-trip communication from the sink terminal 120 to the source terminal 110, which would be needed if data of the non-viewed viewport(s) were not prefetched to the sink device 120. - In an embodiment, a
sink terminal 120 may identify a location of its current viewport by identifying a spatial location within the multiview image at which the viewport is located, for example, by identifying its location within a coordinate space defined for the image (see FIG. 2). In another aspect, a sink terminal 120 may identify tile(s) of a multi-directional image (FIG. 3) in which its current viewport is located and request chunk(s) for those tiles (FIG. 4) based on this identification. -
FIG. 6 illustrates a method 600 of exemplary tile download according to an aspect of the present disclosure. FIG. 6 illustrates download operations that may occur for a tile that is not being viewed initially but to which the viewport moves during operation. Thus, a sink terminal 120 may issue requests for the tile at a tier 0 level of service, which are downloaded to the terminal 120 from a source terminal 110. FIG. 6 illustrates a request 610 for a chunk Y of the tile, from the tier 0 level of service. The terminal 110 may provide content of the chunk Y in a response message 630. The request and response messages 610, 630 may occur while the tile is not being viewed. - In the example of
FIG. 6, the viewport changes (box 620) from a prior tile to the tile that was requested in msg. 610. The viewport may change either while a request (msg. 610) for chunk Y is pending or after the content of chunk Y has been received (msg. 630). The example of FIG. 6 illustrates the viewport change (box 620) as occurring while msg. 610 is pending. In response to the viewport change, the terminal 120 may determine, from a history of prior requests, that a chunk Y at a tier 0 service level either has been requested or already has been received and is stored locally at the terminal 120. The terminal 120 may estimate whether there is time to request additional data of chunk Y (a differential tier) before the chunk Y must be rendered. If so, the terminal 120 may issue a request for chunk Y of the new tile using a differential tier (msg. 640). - If the
source terminal 110 provides the media content of the differential tier (msg. 650) before the chunk Y must be rendered, the sink terminal 120 may render chunk Y (box 660) using content developed from the content provided in messages 630 and 650. If not, the sink terminal 120 may render chunk Y (box 660) using content developed from the tier 0 level of service (msg. 630). -
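The sink terminal's estimate of whether a differential-tier request can complete before chunk Y's render deadline might look like the following sketch; the function name, parameters, and fixed safety margin are assumptions for illustration:

```python
# Sketch (hypothetical names and margin): request a differential tier for a
# chunk only if the estimated round trip plus transfer time fits before the
# chunk's render deadline.

def should_request_differential(ms_until_render, est_rtt_ms,
                                chunk_bits, est_bandwidth_bps, margin_ms=50):
    transfer_ms = 1000.0 * chunk_bits / est_bandwidth_bps
    return est_rtt_ms + transfer_ms + margin_ms <= ms_until_render

# A 1.5 Mbit differential chunk over a 10 Mbps link with an 80 ms RTT needs
# about 230 ms, so it fits a 500 ms deadline but not a 200 ms one:
print(should_request_differential(500, 80, 1_500_000, 10_000_000))  # True
print(should_request_differential(200, 80, 1_500_000, 10_000_000))  # False
```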
FIG. 7 illustrates a rendering timeline of chunks that may occur according to the foregoing aspects of the present disclosure. FIG. 7 includes a data stream for a prior tile 710, for example for the tile of a viewport location prior to the change of the viewport location as in box 620 of FIG. 6, and FIG. 7 includes a data stream for a new tile 720, for example for the tile that includes the new viewport location after box 620 of FIG. 6. Data for prior tile 710 includes chunks Y−3 to Y+1, and data for the new tile includes chunks Y−3 to Y+4. In this example, chunks Y−3 to Y−1 for the prior tile are shown having been retrieved at a relatively high level of service or quality (shown as tier 3) and, prior to a viewport switch, being rendered. When a viewport switch occurs from the prior tile 710 to the new tile 720 in the midst of chunk Y−1, a tier 0 level of service may be rendered for tile 720 at chunk Y−1. This may occur, for example, if a sink device 120 estimates that insufficient time exists to download a differential tier for new tile 720 at chunk Y−1, or if the sink device 120 requested a differential tier for the chunk but it was not received in time to be rendered. - The example of
FIG. 7 illustrates rendering of tile 720 at chunks Y to Y+2 using data from both tier 0 and from differential tiers. This may occur, for example, if a sink device 120 had already requested the tier 0 levels of service for the chunks Y to Y+2 prior to the viewport switch (for example, see request 610 in FIG. 6) and, after the switch, the sink device retrieved differential tiers for those chunks Y to Y+2 (for example, see response 650 in FIG. 6). - The example of
FIG. 7 illustrates rendering of tile 720 from tier 3 starting from chunk Y+3. - A switch from differential tiers to higher quality tiers (e.g., tier 3) may occur for chunks for which download requests are made after the viewport switch occurs. Thus, when a viewport changes from one tile to another, a
sink terminal 120 may determine what tiers to request for the new tile from its operating state and the transmission latency in the system. In some cases there will be a transitional period after the viewport moves and before the sink terminal can render the new viewport location at a high quality of service (such as tier 3 for chunk Y+3 and later in FIG. 7). The transitional period may include rendering the new viewport location from a lower quality of service (such as tier 0 for chunk Y−1 in FIG. 7). The transitional period may also include rendering the new viewport location from an enhanced lower quality of service (such as tier 0 enhanced by the differential tier for chunks Y to Y+2 in FIG. 7). -
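The transitional fallback order implied by FIG. 7 (render tier 3 if buffered, else tier 0 plus a differential tier, else tier 0 alone) can be sketched as follows; the chunk names and coding labels are illustrative:

```python
# Sketch of the fallback order in FIG. 7. `downloaded` maps a chunk name to
# the set of codings buffered for the new tile; labels are illustrative.

def representation_to_render(downloaded, chunk):
    avail = downloaded.get(chunk, set())
    if "tier3" in avail:
        return "tier3"                 # full high-quality coding
    if {"tier0", "diff"} <= avail:
        return "tier0+diff"            # base enhanced by a differential tier
    if "tier0" in avail:
        return "tier0"                 # base quality only
    return None                        # nothing buffered for this chunk

downloaded = {"Y-1": {"tier0"}, "Y": {"tier0", "diff"}, "Y+3": {"tier3"}}
print(representation_to_render(downloaded, "Y-1"))  # tier0
print(representation_to_render(downloaded, "Y"))    # tier0+diff
print(representation_to_render(downloaded, "Y+3"))  # tier3
```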
FIG. 8 illustrates a frame 800 of omnidirectional video that may be coded by a source terminal 110. There, the frame 800 is illustrated as having been parsed into a plurality of tiles 810.0-810.n. Each tile may be coded in raster scan order. Thus, content of tile 810.0 may be coded separately from content of tile 810.1, and content of tile 810.1 may be coded separately from content of tile 810.2, and so on. Furthermore, tiles 810.1-810.n may be coded in multiple tiers, producing discrete encoded data that may be segmented by both tier and tile. In one aspect, encoded data may also be segmented into time chunks. Hence, encoded data may be segmented into discrete segments for each time chunk, tile, and tier. - As discussed, a sink terminal 120 (
FIG. 1) may extract a viewport 830 from the frame 800, after it is coded by the source terminal 110 (FIG. 1), transmitted to the sink terminal 120, and decoded. The sink terminal 120 may display the viewport 830 locally. The sink terminal 120 may transmit to the source terminal 110 viewport information, such as data identifying a location of the viewport 830 within an area of the frame 800. For example, the sink terminal 120 may transmit offset data, shown as offset-x and offset-y from origin 820, identifying a location of the viewport 830 within the area of the frame 800. In an aspect, a size and/or shape of the viewport 830 may be included in the viewport information sent to source terminal 110. Source terminal 110 may then use the received viewport information to select which discrete portions of encoded data to transmit to sink terminal 120. In the example of FIG. 8, viewport 830 spans tiles 810.5 and 810.6. Hence, a first tier may be sent for tiles 810.5 and 810.6, while a second tier may be sent for the remaining tiles that do not include any portion of the viewport. For example, when the first tier provides higher quality video and the second tier provides more efficient coding (higher compression), the first tier may be sent to sink terminal 120 for tiles 810.5 and 810.6, while the second tier providing lower quality video may be sent for some or all of the other tiles. - In an aspect, a lower quality tier may be provided for all tiles. In another aspect, a lower quality tier may be provided for only a portion of the
frame 800. For example, a lower quality tier may be provided only for 180 degrees of view centered on the current viewport (instead of 360 degrees), or the lower quality tier may be provided only in areas of frame 800 where the viewport is likely to move next. - In an aspect,
frame 800 may be encoded according to a layered coding protocol, where one tier is coded as a base layer and other tiers are encoded as enhancement layers of the base layer. An enhancement layer may be predicted from one or more lower layers. For example, a first enhancement layer may be predicted from the base layer, and a second, higher enhancement layer may be predicted from either the base layer or from the first, lower enhancement layer. - An enhancement layer may be differentially or predictively coded from one or more lower layers. Non-enhancement layers, such as a base layer, may be encoded independently of other layers. Reconstruction at a decoder of a differentially coded layer will require both the encoded data segment of the differentially coded layer and the segment(s) of the layer(s) from which it is predicted. In the case of a predictively coded layer, sending that layer may include sending both the discrete encoded data segment of the predictively coded layer, and also sending the discrete encoded data segment of the layer(s) used as a prediction reference. In an example of differential layered coding of
frame 800, a lower base layer may be sent to sink terminal 120 for all tiles, while discrete data segments for a higher differential layer (that is coded using predictions from the base layer) may be sent only for tiles 810.5 and 810.6, as the viewport 830 is included in those tiles. -
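A sketch of how a terminal might map the offset-x/offset-y viewport coordinates described above to the set of raster-order tile indices the viewport overlaps; the tile geometry and grid width are illustrative assumptions:

```python
# Sketch (illustrative tile geometry): list the raster-order tile indices
# overlapped by a viewport, given its offset from the frame origin and size.

def tiles_for_viewport(off_x, off_y, vp_w, vp_h, tile_w, tile_h, cols):
    c0, c1 = off_x // tile_w, (off_x + vp_w - 1) // tile_w
    r0, r1 = off_y // tile_h, (off_y + vp_h - 1) // tile_h
    return [r * cols + c
            for r in range(r0, r1 + 1)
            for c in range(c0, c1 + 1)]

# A viewport straddling two horizontally adjacent tiles in a 4-column grid,
# analogous to viewport 830 spanning tiles 810.5 and 810.6:
print(tiles_for_viewport(1500, 700, 600, 400, 960, 640, 4))  # [5, 6]
```

The source would send the higher tier for exactly the returned indices and the lower tier for the rest.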
FIG. 9 illustrates a frame 900 of omnidirectional video that may be coded by a source terminal 110. There, as in frame 800 of FIG. 8, the frame 900 is illustrated as having been parsed into a plurality of tiles 810.0-810.n. Frame 900 may represent a different video time from frame 800; for example, frame 900 may be a later time in the timeline of the video. At this later time, the viewport of sink terminal 120 may have moved to the location of viewport 930, which may be identified by offset-x′ and offset-y′ from origin 820. When the viewport of sink terminal 120 moves from the location of viewport 830 in FIG. 8 to the location of viewport 930 in FIG. 9, the sink terminal sends the new viewport information to source terminal 110. In response, source terminal 110 may change which discrete segments of encoded video are sent to sink terminal 120, such that a first layer may be sent for tiles that include a portion of the viewport, while a second layer may be sent for tiles that do not include a portion of the viewport. In the example of FIG. 9, pixels of tiles 810.0 and 810.1 are included in viewport 930 and hence a first layer may be sent for these tiles, while a second layer may be sent for the tiles that do not include a portion of the viewport. -
FIG. 10 is a simplified block diagram of an example video distribution system 1000 suitable for use with the present invention, including when multi-directional video is pre-encoded and stored on a server. The system 1000 may include a distribution server system 1010 and a client device 1020 connected via a communication network 1030. The distribution system 1000 may provide coded multi-directional video data to the client 1020 in response to client requests. The client 1020 may decode the coded video data and render it on a display. - The
distribution server 1010 may include a storage system 1040 on which pre-encoded multi-directional videos are stored in a variety of tiers for download by the client device 1020. The distribution server 1010 may store several coded representations of a video content item, shown as tiers. - In the example of
FIG. 10, Tiers 1 and 2 differ in average bitrate, with Tier 2 enabling a higher quality reconstruction of the video content item at a higher average bitrate compared to that provided by Tier 1. The difference in bitrate and quality may be induced by differences in coding parameters, e.g., coding complexity, frame rates, frame size and the like. Tier 3 may be an enhancement layer of Tier 1 which, when decoded in combination with Tier 1, may improve the quality of the Tier 1 representation beyond that obtained by decoding Tier 1 by itself. Each video tier 1-3 may be parsed into a plurality of chunks CH1.1-CH1.N, CH2.1-CH2.N, and CH3.1-CH3.N. Manifest file 1050 may include pointers to each chunk of encoded video data for each tier. The different chunks may be retrieved from storage and delivered to the client 1020 over a channel defined in the network 1030. Channel stream 1040 represents an aggregation of transmitted chunks from multiple tiers. Furthermore, as explained above with regard to FIGS. 4 and 5, a multi-directional video may be spatially segmented into tiles. FIG. 10 depicts the chunks available for the various tiers of one tile. Manifest 1050 may additionally include other tiles (not depicted in FIG. 10), such as by providing metadata and pointers to multiple tiers, including storage locations of encoded data chunks for each of the various tiers. - The example of
FIG. 10 illustrates three encoded video tiers. - Times A, B, C, and D are depicted in
FIG. 10 in part to assist in illustrating a moving viewport in an aspect of this disclosure. Times A, B, C, and D are positioned along the streaming timeline of the media chunks referenced by manifest 1050. Specifically, Times A, B, and D may correspond to the beginnings of time periods t1, t2, and t3, respectively, while time C may correspond to a time somewhere in the middle of time period t2, between the beginning of t2 and the beginning of t3. - In an aspect, multi-directional image data may include depth maps and/or occlusion information. Depth maps and/or occlusion information may be included as separate channel(s), and manifest 1050 may include references to these separate channel(s) for depth maps and/or occlusion information.
-
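One hypothetical way to organize a manifest like 1050 is per tile and per tier, with an ordered list of chunk pointers, so a client can address any (tile, tier, chunk) segment independently. The URLs and naming scheme below are invented for illustration and are not the disclosure's format:

```python
# Hypothetical manifest layout: per tile, per tier, an ordered list of chunk
# URLs. A client resolves a (tile, tier, chunk) triple to one download.

manifest = {
    "tile0": {
        tier: [f"https://cdn.example/tile0/tier{tier}/ch{i}.mp4"
               for i in range(1, 4)]
        for tier in (1, 2, 3)
    },
}

def chunk_url(tile, tier, chunk_number):
    return manifest[tile][tier][chunk_number - 1]   # chunks numbered from 1

print(chunk_url("tile0", 2, 2))  # https://cdn.example/tile0/tier2/ch2.mp4
```

Separate depth-map or occlusion channels could be added as additional keys alongside the tiers.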
FIG. 11 illustrates a frame 1100 of multi-directional video with a moving viewport. There, frame 1100 is illustrated as having been parsed into a plurality of tiles 1110.0-1110.n. Superimposed upon frame 1100 is viewport location 1130, which may correspond to a first location of a viewport in client 1020 at a first time, and viewport location 1140, which may correspond to a second location of the same viewport at a second time. - In an aspect, in steady state when a viewport is not moving,
client 1020 may extract a viewport image from the high reconstruction quality of tier 2. During a transitional period, client 1020 may extract a viewport image from the reconstructed combination of tier 1 and enhancement layer tier 3 when the viewport moves into a new spatial tile, and then return to a steady state by extracting a viewport image from tier 2 once tier 2 is again available at client 1020. An example of this is illustrated in Tables 1 and 2 for a viewport of client 1020 that jumps from viewport location 1130 to viewport location 1140 right at time C. The tiers of tiles requested by client 1020 are listed in Table 1, and the tiers from which a viewport image is extracted are listed in Table 2. -
TABLE 1
Requests for tiles

                             Time A             Time B             Time C    Time D
Tier 1 tiles requested       All tiles except   All tiles except   None      All tiles except
(1 MB/sec)                   1110.0             1110.0                       1110.5
Tier 2 tiles requested       1110.0             1110.0             None      1110.5
(2 MB/sec)
Tier 3 tiles requested       None               None               1110.5    None
(enhancement of Tier 1)
-
TABLE 2
Viewport extraction

                             Time A        Time B        Time C             Time D
Viewport location            Tile 1110.0   Tile 1110.0   Tile 1110.5        Tile 1110.5
Extracted for viewport       Tier 2        Tier 2        Tier 1; then       Tier 2
                                                         Tier 1 + Tier 3
- Under the initial steady state condition during time period t1, the viewport is not moving and
viewport location 1130 is fully contained in tile 1110.0. Tier 2, being the higher quality tier, may be requested by client 1020 from server 1010 for tile 1110.0 at time A, as indicated in Table 1. For tiles not included in the viewport at location 1130 (tiles 1110.1-1110.n), the lower quality and more highly compressed tier 1 is requested instead. Hence, tier 1 chunks are requested for time period t1 at time A for all tiles other than tile 1110.0. The viewport is then extracted from the reconstruction of tier 2 by client 1020 starting at time A. - At time B, the viewport has not yet moved, so the same tiers are requested by
client 1020 for the same tiles as at time A, but the requests are for the specific chunks corresponding to time period t2. At time C, the viewport of client 1020 may jump from viewport location 1130 to location 1140. At time C, somewhere between the beginning and end of t2, lower quality tier 1 has already been requested for the new location of the viewport, tile 1110.5. So, a viewport can be extracted immediately from tier 1 at time C when the viewport moves. At time C, tier 3 may be requested, and as soon as it is available, the combination of tier 1 and enhancement layer tier 3 can be used for extracting a viewport image at client 1020. At time D, client 1020 may go back to a steady state by requesting tier 2 for tiles containing the viewport location, and tier 1 for tiles not containing the viewport location. -
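The request pattern of Table 1 can be sketched as a simple policy: in steady state, request tier 2 for the viewport tile and tier 1 for all others; immediately after a viewport move, request only the tier 3 enhancement for the new tile, since its tier 1 chunks are already buffered. Function and variable names are illustrative:

```python
# Sketch of the request policy behind Table 1. Steady state: tier 2 for the
# viewport tile, tier 1 elsewhere. Just after a viewport move: only the
# tier 3 enhancement for the new tile (its tier 1 data is already buffered).

def tile_requests(all_tiles, viewport_tile, just_moved):
    if just_moved:
        return {viewport_tile: "tier3"}
    reqs = {t: "tier1" for t in all_tiles}
    reqs[viewport_tile] = "tier2"
    return reqs

tiles = [f"1110.{i}" for i in range(6)]
print(tile_requests(tiles, "1110.0", False)["1110.0"])  # tier2 (time A)
print(tile_requests(tiles, "1110.5", True))             # time C: only tier3
```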
FIG. 12 is a functional block diagram of a coding system 1200 according to an aspect of the present disclosure. The system 1200 may include an image source 1210, an image processing system 1220, a video coder 1230, a video decoder 1240, a reference picture store 1250 and a predictor 1260. The image source 1210 may generate image data as a multi-directional image, containing image data of a field of view that extends around a reference point in multiple directions. The image processing system 1220 may perform image processing operations to condition the image for coding. In one aspect, the image processing system 1220 may generate different versions of source data to facilitate encoding the source data into multiple layers of coded data. For example, image processing system 1220 may generate multiple different projections of source video aggregated from multiple cameras. In another example, image processing system 1220 may generate resolutions of source video for a high layer with a higher spatial resolution and a lower layer with a lower spatial resolution. The video coder 1230 may generate a multi-layered coded representation of its input image data, typically by exploiting spatial and/or temporal redundancies in the image data. The video coder 1230 may output a coded representation of the input data that consumes less bandwidth than the original source video when transmitted and/or stored. Video coder 1230 may output data in discrete time chunks corresponding to a temporal portion of source image data, and in some aspects, separate time chunks of encoded data may be decoded independently of other time chunks. Video coder 1230 may also output data in discrete layers, and in some aspects, separate layers may be transmitted independently of other layers. - The
video decoder 1240 may invert coding operations performed by the video encoder 1230 to obtain a reconstructed picture from the coded video data. Typically, the coding processes applied by the video coder 1230 are lossy processes, which cause the reconstructed picture to possess various errors when compared to the original picture. The video decoder 1240 may reconstruct pictures of select coded pictures, which are designated as “reference pictures,” and store the decoded reference pictures in the reference picture store 1250. In the absence of transmission errors, the decoded reference pictures may replicate decoded reference pictures obtained by a decoder (not shown in FIG. 12). - The
predictor 1260 may select prediction references for new input pictures as they are coded. For each portion of the input picture being coded (called a “pixel block” for convenience), the predictor 1260 may select a coding mode and identify a portion of a reference picture that may serve as a prediction reference for the pixel block being coded. The coding mode may be an intra-coding mode, in which case the prediction reference may be drawn from a previously-coded (and decoded) portion of the picture being coded. Alternatively, the coding mode may be an inter-coding mode, in which case the prediction reference may be drawn from another previously-coded and decoded picture. In one aspect of layered coding, prediction references may be pixel blocks previously decoded from another layer, typically a layer lower than the layer currently being encoded. In the case of two layers that encode two different projection formats of multi-directional video, a function such as an image warp function may be applied to a reference image in one projection format at a first layer to predict a pixel block in a different projection format at a second layer. - In another aspect of a layered coding system, a differentially coded enhancement layer may be coded with restricted prediction references to enable seeking or layer/tier switching into the middle of an encoded enhancement layer chunk. In a first aspect,
predictor 1260 may restrict the prediction references of every frame in an enhancement layer to be frames of a base layer or other lower layer. When every frame of an enhancement layer is predicted without reference to other frames of the enhancement layer, a decoder may switch to the enhancement layer at any frame efficiently, because previous enhancement layer frames will never be needed as prediction references. In a second aspect, predictor 1260 may require that every Nth frame (such as every other frame) within a chunk be predicted only from a base layer or other lower layer to enable seeking to every Nth frame within an encoded data chunk. - When an appropriate prediction reference is identified, the
predictor 1260 may furnish the prediction data to the video coder 1230. The video coder 1230 may code input video data differentially with respect to prediction data furnished by the predictor 1260. Typically, prediction operations and the differential coding operate on a pixel block-by-pixel block basis. Prediction residuals, which represent pixel-wise differences between the input pixel blocks and the prediction pixel blocks, may be subject to further coding operations to reduce bandwidth further. - As indicated, the coded video data output by the
video coder 1230 should consume less bandwidth than the input data when transmitted and/or stored. The coding system 1200 may output the coded video data to an output device 1270, such as a transceiver, that may transmit the coded video data across a communication network 130 (FIG. 1). Alternatively, the coding system 1200 may output coded data to a storage device (not shown) such as an electronic-, magnetic- and/or optical storage medium. - The
transceiver 1270 also may receive viewport information from a decoding terminal (FIG. 7) and provide the viewport information to controller 1280. Controller 1280 may control the image processor 1220 and the video coding process overall, including video coder 1230 and transceiver 1270. Viewport information received by transceiver 1270 may include a viewport location and/or a preferred projection format. In one aspect, controller 1280 may control transceiver 1270 based on viewport information to send certain coded layer(s) for certain spatial tiles, while sending different coded layer(s) for other tiles. In another aspect, controller 1280 may control the allowable prediction references in certain frames of certain layers. In yet another aspect, controller 1280 may control the projection format(s) or scaled layers produced by image processor 1220 based on the received viewport information. -
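The enhancement-layer prediction restriction described above for the coding system of FIG. 12 (every Nth frame within a chunk predicting only from a lower layer, so a decoder can switch in mid-chunk) can be sketched as a validation check; the reference labels and list encoding are illustrative assumptions:

```python
# Sketch of a check for the restriction described above: within a chunk,
# every Nth enhancement-layer frame may reference only the lower layer, so
# a decoder can join the enhancement layer at any such frame. `refs[i]`
# lists the layers frame i predicts from ("lower" or "enh").

def allows_midchunk_switch(refs, n):
    return all(set(refs[i]) <= {"lower"} for i in range(0, len(refs), n))

# Frames 0 and 2 predict only from the lower layer; frames 1 and 3 may also
# reference earlier enhancement-layer frames:
refs = [["lower"], ["lower", "enh"], ["lower"], ["enh"]]
print(allows_midchunk_switch(refs, 2))  # True
print(allows_midchunk_switch(refs, 1))  # False
```

With n = 1 the check corresponds to the first aspect, where no enhancement-layer frame may reference another enhancement-layer frame at all.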
FIG. 13 is a functional block diagram of a decoding system 1300 according to an aspect of the present disclosure. The decoding system 1300 may include a transceiver 1310, a buffer 1315, a video decoder 1320, an image processor 1330, a video sink 1340, a reference picture store 1350, a predictor 1360, and a controller 1370. The transceiver 1310 may receive coded video data from a channel and route it to buffer 1315 before sending it to video decoder 1320. The coded video data may be organized into chunks of time and spatial tiles, and may include different coded layers for different tiles. The video data buffered in buffer 1315 may span the video time of multiple chunks. The video decoder 1320 may decode the coded video data with reference to prediction data supplied by the predictor 1360. The video decoder 1320 may output decoded video data in a representation determined by a source image processor (such as image processor 1220 of FIG. 12) of a coding system that generated the coded video. The image processor 1330 may extract video data from the decoded video according to the viewport orientation currently in force at the decoding system. The image processor 1330 may output the extracted viewport data to the video sink device 1340. Controller 1370 may control the image processor 1330, the video decoding process including video decoder 1320, and transceiver 1310. - The
video sink 1340, as indicated, may consume decoded video generated by the decoding system 1300. Video sinks 1340 may be embodied by, for example, display devices that render decoded video. In other applications, video sinks 1340 may be embodied by computer applications, for example, gaming applications, virtual reality applications and/or video editing applications, that integrate the decoded video into their content. In some applications, a video sink may process the entire multi-directional field of view of the decoded video for its application but, in other applications, a video sink 1340 may process a selected sub-set of content from the decoded video. For example, when rendering decoded video on a flat panel display, it may be sufficient to display only a selected subset of the multi-directional video. In another application, decoded video may be rendered in a multi-directional format, for example, in a planetarium. - The
transceiver 1310 also may send viewport information provided by the controller 1370, such as a viewport location and/or a preferred projection format, to the source of encoded video, such as terminal 1200 of FIG. 12. When the viewport location changes, controller 1370 may provide new viewport information to transceiver 1310 to send on to the encoded video source. In response to the new viewport information, missing layers for certain previously received but not yet decoded tiles of encoded video may be received by transceiver 1310 and stored in buffer 1315. Decoder 1320 may then decode these tiles using these replacement layers (which were previously missing) instead of the layers that had previously been received based on the old viewport location. -
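The buffer-upgrade behavior described above can be sketched as follows; the data layout (chunk indices, tile names) is hypothetical and only illustrates identifying which enhancement layers must still be fetched after a viewport change.

```python
# Sketch: after a viewport move, chunks that are buffered but not yet
# decoded may lack enhancement layers for newly visible tiles. Compute,
# per buffered chunk, the tiles whose enhancement layer must be requested.

def missing_layers(buffered, needed_tiles):
    """buffered maps chunk -> set of tiles whose enhancement layer is on
    hand; return chunk -> sorted tiles still to be fetched."""
    requests = {}
    for chunk, have in buffered.items():
        missing = set(needed_tiles) - have
        if missing:
            requests[chunk] = sorted(missing)
    return requests

# Old viewport covered tiles t0/t1; the new viewport needs t1/t2.
buffered = {3: {"t0", "t1"}, 4: {"t0", "t1"}}
requests = missing_layers(buffered, needed_tiles={"t1", "t2"})
```

Only the newly needed tile ("t2") is requested for each still-buffered chunk; layers already on hand are reused.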
Controller 1370 may determine viewport information based on a viewport location. In one example, the viewport information may include just a viewport location, and the encoded video source may then use the location to identify which encoded layers to provide to decoding system 1300 for specific spatial tiles. In another example, viewport information sent from the decoding system may include specific requests for specific layers of specific tiles, leaving much of the viewport location mapping in the decoding system. In yet another example, viewport information may include a request for a particular projection format based on the viewport location. - The principles of the present disclosure find application with a variety of projection formats of multi-directional images. In an aspect, one may convert between the various projection formats of
FIGS. 14-16 using a suitable projection conversion function. -
FIG. 14 illustrates an exemplary multi-directional image projection format according to one aspect. The multi-directional image 1430 may be generated by a camera 1410 that pivots along an axis. During operation, the camera 1410 may capture image content as it pivots along a predetermined angular distance 1420 (preferably, a full 360°) and may merge the captured image content into a 360° image. The capture operation may yield a multi-directional image 1430 that represents a multi-directional field of view having been partitioned along a slice 1422 that divides a cylindrical field of view into a two-dimensional array of data. In the multi-directional image 1430, pixels on either edge of the image represent adjacent image content even though they appear on different edges of the multi-directional image 1430. -
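The wraparound property noted above can be illustrated with a small sketch (not from the patent): yaw angles wrap modulo 360°, so content just past the right edge of the equirectangular image continues at the left edge.

```python
# Sketch: map a yaw angle in degrees to a column index of an
# equirectangular image; a full turn wraps back to the first column.

def yaw_to_column(yaw_deg, image_width):
    """Map a yaw angle in degrees to an equirectangular column index."""
    return int((yaw_deg % 360.0) / 360.0 * image_width)

width = 720                               # one column per half degree of yaw
left_edge = yaw_to_column(0.0, width)
wrapped = yaw_to_column(360.0, width)     # a full turn lands back at column 0
```

The last column (yaw just below 360°) and column 0 hold adjacent scene content even though they sit on opposite edges of the image.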
FIG. 15 illustrates an exemplary multi-directional image projection format according to another aspect. In the aspect of FIG. 15, a camera 1510 may possess image sensors 1512-1516 that capture image data in different fields of view from a common reference point. The camera 1510 may output a multi-directional image 1530 in which image content is arranged according to a cube map capture operation 1520 in which the sensors 1512-1516 capture image data in different fields of view 1521-1526 (typically, six) about the camera 1510. The image data of the different fields of view 1521-1526 may be stitched together according to a cube map layout 1530. In the example illustrated in FIG. 15, six sub-images corresponding to a left view 1521, a front view 1522, a right view 1523, a back view 1524, a top view 1525 and a bottom view 1526 may be captured, stitched and arranged within the multi-directional picture 1530 according to "seams" of image content between the respective views 1521-1526. Thus, as illustrated in FIG. 15, pixels from the front image 1532 that are adjacent to pixels from each of the left, right, top, and bottom images represent image content that is adjacent to the content of the front image. Adjacency also holds across the picture's outer edges; for example, content from a terminal edge 1538 of the back image 1534 is adjacent to content from an opposing terminal edge 1539 of the left image. The image 1530 also may have regions 1537.1-1537.4 that do not belong to any image. The representation illustrated in FIG. 15 often is called a "cube map" image. - Coding of cube map images may occur in several ways. In one coding application, the
cube map image 1530 may be coded directly, which includes coding of null regions 1537.1-1537.4 that do not have image content. The encoding techniques of FIG. 3 may be applied to cube map image 1530. - In other coding applications, the
cube map image 1530 may be repacked to eliminate null regions 1537.1-1537.4 prior to coding, shown as image 1540. The techniques described in FIG. 3 may also be applied to a packed image frame 1540. After decode, the decoded image data may be unpacked prior to display. -
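The repacking step described above can be sketched as follows. This is illustrative only: the dense 3x2 arrangement chosen here is an assumption (the patent does not specify one), and face images are stood in by single characters.

```python
# Sketch: copy the six faces of a cross-shaped 4x3 cube-map layout
# (which contains four null regions) into a dense 3x2 layout so that
# no null pixels need to be coded.

FACES_4X3 = {            # (row, col) positions in the 4x3 cross layout
    "top":   (0, 1), "left": (1, 0), "front":  (1, 1),
    "right": (1, 2), "back": (1, 3), "bottom": (2, 1),
}
PACKED_3X2 = {           # one possible dense 3x2 arrangement (an assumption)
    "left": (0, 0), "front": (0, 1), "right":  (0, 2),
    "top":  (1, 0), "back":  (1, 1), "bottom": (1, 2),
}

def repack(cross):
    """Move each face from the 4x3 cross layout into the 3x2 packed layout."""
    packed = {}
    for face, pos in FACES_4X3.items():
        packed[PACKED_3X2[face]] = cross[pos]
    return packed

# Toy "face pixels": the first letter of each face name.
cross = {pos: face[0] for face, pos in FACES_4X3.items()}
packed = repack(cross)
```

Unpacking after decode is the inverse mapping; the decoder only needs the agreed face-to-position table to restore the cross layout.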
FIG. 16 illustrates another exemplary multi-directional projection image format 1630. The frame format of FIG. 16 may be generated by another type of omnidirectional camera 1600, called a panoramic camera. A panoramic camera typically is composed of a pair of fisheye lenses, each capturing image data in a hemispherical field of view. FIG. 16 illustrates a multi-directional image 1630 that contains image content from the two hemispherical views joined at a seam 1635. The techniques described hereinabove also find application with multi-directional image data in such formats 1630. - In an aspect, cameras, such as the
cameras of FIGS. 14-16, may capture depth or occlusion information in addition to visible light. In some cases, depth and occlusion information may be stored as separate data channels in multi-projection formats such as images 1430, 1530, 1540, and 1630. In other cases, depth and occlusion information may be included as a separate data channel in a manifest, such as manifest 1050 of FIG. 10. -
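One way a manifest entry might list depth and occlusion as data channels separate from the texture channel, as described above, is sketched below. All field names and values here are assumptions for illustration, not the patent's manifest schema.

```python
# Hypothetical manifest entry: one chunk with texture, depth, and
# occlusion carried as separate data channels.
manifest_entry = {
    "chunk": 7,
    "channels": {
        "texture":   {"projection": "equirectangular", "track": "chunk7_tex"},
        "depth":     {"projection": "equirectangular", "track": "chunk7_dep"},
        "occlusion": {"projection": "equirectangular", "track": "chunk7_occ"},
    },
}

# A client that only renders visible light can ignore the extra channels.
texture_track = manifest_entry["channels"]["texture"]["track"]
```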
FIG. 17 illustrates an exemplary prediction reference pattern. Video sequence 1700 includes a base layer 1720 and an enhancement layer 1710, each layer comprising a series of corresponding frames. Base layer 1720 includes an intra-coded frame L0.I0 followed by predicted frames L0.P1-L0.P7. Enhancement layer 1710 includes predicted frames L1.P0-L1.P7. Intra-coded frame L0.I0 may be coded without prediction from any other frame. Predicted frames may be coded by predicting pixel blocks of the frame from portions of reference frames indicated by solid arrows in FIG. 17, where the arrow head points to a reference frame that may be used as a prediction reference for a frame touching the tail of the arrow. For example, predicted frames in a base layer may be predicted using only a previous base layer frame as a prediction reference. As depicted in FIG. 17, L0.P1 is predicted only from frame L0.I0 as a reference, L0.P1 may be a reference for L0.P2, L0.P2 may be a reference for L0.P3, and so on, as indicated by the arrows inside base layer 1720. The frames of enhancement layer 1710 may be predicted using only corresponding base layer reference frames, such that L0.I0 may be a prediction reference for L1.P0, L0.P1 may be a prediction reference for L1.P1, and so on. - In an aspect,
enhancement layer 1710 frames may also be predicted from previous enhancement layer frames, as indicated by optional dashed arrows in FIG. 17. For example, frame L1.P7 may be predicted from either L0.P7 or L1.P6. Prediction references within enhancement layer 1710 may be limited such that only a subset of enhancement layer frames may use other enhancement layer frames as a prediction reference, and this subset of enhancement layer frames may follow a pattern. In the example of FIG. 17, every other frame of enhancement layer 1710 (L1.P0, L1.P2, L1.P4, and L1.P6) is predicted only from the corresponding base layer frame, while alternate frames (L1.P1, L1.P3, L1.P5, L1.P7) may be predicted from either base layer frames or previous enhancement layer frames. Tier switching to enhancement layer 1710 may be facilitated at the frames that are predicted only from lower layers because prior frames of the enhancement layer need not be previously decoded for use as reference frames. Enhancement layer frames that are predicted only from lower layer frames may be considered safe-switching frames, sometimes called key frames, because previous frames from the enhancement layer need not be available to correctly decode these safe-switching frames. - In an aspect, a sink terminal may switch to a new layer or new tier on non-safe-switching frames when some decoded quality drift may be tolerated. A non-safe-switching frame may be decoded without having access to the reference frames used for its prediction, and quality gradually gets worse as errors from incorrect predictions accumulate into what may be called quality drift. Error concealment techniques may be used to mitigate the quality drift due to switching at non-safe-switching enhancement layer frames. Example error concealment techniques include predicting from a frame similar to the missing reference frame, and periodic intra-refresh mechanisms.
By tolerating some quality drift caused by switching at non-safe-switching frames, the latency between moving a viewport and presenting images of the new viewport location can be reduced.
-
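The safe-switching pattern of FIG. 17 can be sketched as follows, under the stated assumption that every Nth enhancement-layer frame is predicted only from the base layer; those frames are safe entry points, and the distance to the next one bounds the switch latency.

```python
# Sketch: with every Nth enhancement-layer frame predicted only from
# lower layers, frames at multiples of N are safe-switching frames.

def is_safe_switch(frame_index, n=2):
    """True if the enhancement frame references only lower layers."""
    return frame_index % n == 0

def next_safe_switch(frame_index, n=2):
    """First safe-switching frame at or after frame_index."""
    while not is_safe_switch(frame_index, n):
        frame_index += 1
    return frame_index
```

A decoder that cannot tolerate drift waits until `next_safe_switch`; a decoder that tolerates drift (with error concealment) may switch immediately at a non-safe frame instead.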
FIG. 18 illustrates two exemplary multi-directional projections for combining. Images of the same scene may be encoded in a plurality of projection formats. In the example of FIG. 18, a multi-directional scene is encoded as a first image with a first projection format, such as an image 1810 in equirectangular projection format, and the same scene is encoded as a second image in a second projection format, such as image 1820 in a cube map projection format. Region of interest 1812 projected onto equirectangular image 1810 and region of interest 1822 projected onto cube map image 1820 may both correspond to the same region of interest in the scene. Cube map image 1820 may include null regions 1837.1-1837.4 and cube faces left, front, right, back, top and bottom 1831-1836. - In one aspect, multiple projection formats may be combined to form a better reconstruction of a region of interest (ROI) than can be produced from a single projection format. A reconstructed region of interest, ROIcombo, may be produced from a weighted sum of the encoded projections or may be produced from a filtered sum of the encoded projections. For example, the region of interest in the scene of
FIG. 18 may be reconstructed as: -
ROIcombo = f(ROI1, ROI2) - where f( ) is a function for combining two region of interest images; first region of interest image ROI1 may be, for example, the equirectangular region of interest image from
ROI 1812, and second region of interest image ROI2 may be, for example, the cube map region of interest image from ROI 1822. If f( ) is a weighted sum, -
ROIcombo = alpha*ROI1 + beta*ROI2 - where alpha and beta are predetermined constants, and alpha+beta=1. In cases where pixel locations do not exactly correspond in the projection formats being combined, a projection format conversion function may be used, as in:
-
ROIcombo = alpha*PConv(ROI1) + beta*ROI2 - where PConv( ) is a function that converts an image in a first projection format into a second projection format. For example, PConv( ) may simply be an up-sample or a down-sample function.
- In another aspect, the best projection format for encoding an entire multi-directional scene, such as for encoding a base layer, may be different than the best projection format for encoding only a region of interest, such as for encoding in an enhancement layer. Hence a multi-tiered encoding of the scene of
FIG. 18 may include encoding the entirety of equirectangular image 1810 in a first tier, and encoding only the ROI 1822 of cube map image 1820 in a second tier. For example, ROI 1822 may be encoded by encoding the entire front face 1832 as a tile of cube map image 1820. In a further aspect, this second tier may be encoded as an enhancement layer over the first-tier base layer, as depicted in FIG. 19. -
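The weighted combination ROIcombo = alpha*PConv(ROI1) + beta*ROI2 described above can be sketched as follows. This is a toy sketch: images are flat lists of pixel values, and PConv is stood in by a trivial nearest-neighbor up-sampler; a real PConv would resample between projection formats.

```python
# Sketch: combine two co-registered ROI images from different projections
# with a weighted sum, converting ROI1 into ROI2's sampling grid first.

def pconv_upsample(pixels, factor=2):
    """Toy projection conversion: nearest-neighbor up-sample of 1-D pixels."""
    return [p for p in pixels for _ in range(factor)]

def combine_roi(roi1, roi2, alpha=0.5, beta=0.5):
    """Weighted sum of two ROI images (alpha + beta == 1)."""
    converted = pconv_upsample(roi1)
    assert len(converted) == len(roi2), "ROIs must align after conversion"
    return [alpha * a + beta * b for a, b in zip(converted, roi2)]

roi_equirect = [10, 20]          # low-resolution ROI in projection P1
roi_cubemap = [12, 12, 22, 22]   # ROI in projection P2
combined = combine_roi(roi_equirect, roi_cubemap)
```

With alpha = beta = 0.5 each output pixel is the average of the converted P1 sample and the P2 sample, so the combined ROI draws detail from both encodings.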
FIG. 19 illustrates an exemplary system for creating a residual from two different multi-directional projections. A base layer ROI image 1910 in a projection format P1 may be converted to a projection format P2 by conversion process 1902 to create a prediction of the ROI image 1920 in projection format P2. The prediction image from conversion process 1902 is subtracted from the actual P2 ROI image 1920 at adder 1904 to produce a P2 residual ROI, which may then be encoded as a P2 projection enhancement layer over a P1 base layer. In an aspect, the base layer may encode the entire scene in projection P1, while the enhancement layer may encode only a region of interest within the scene in projection P2. This aspect may be beneficial, for example, when projection P1 is preferred for encoding the entire scene, while projection P2 is preferred for encoding a particular region of interest. For example, with respect to FIG. 18, a first tier may be encoded as a base layer comprising the entire equirectangular image 1810, while a second tier may be encoded as an enhancement layer comprising a subset of cube map image 1820 such as a single tile or region of interest. - The foregoing discussion has described operation of the aspects of the present disclosure in the context of video coders and decoders. Commonly, these components are provided as electronic devices. Video decoders and/or controllers can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on camera devices, personal computers, notebook computers, tablet computers, smartphones or computer servers. Such computer programs include processor instructions and typically are stored in physical storage media such as electronic-, magnetic-, and/or optically-based storage devices, where they are read by a processor and executed.
Decoders commonly are packaged in consumer electronics devices, such as smartphones, tablet computers, gaming systems, DVD players, portable media players and the like; and they also can be packaged in consumer software applications such as video games, media players, media editors, and the like. And, of course, these components may be provided as hybrid systems that distribute functionality across dedicated hardware components and programmed general-purpose processors, as desired.
Claims (19)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/221,299 US20210227236A1 (en) | 2018-09-14 | 2021-04-02 | Scalability of multi-directional video streaming |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/132,219 US10999583B2 (en) | 2018-09-14 | 2018-09-14 | Scalability of multi-directional video streaming |
US17/221,299 US20210227236A1 (en) | 2018-09-14 | 2021-04-02 | Scalability of multi-directional video streaming |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/132,219 Continuation US10999583B2 (en) | 2018-09-14 | 2018-09-14 | Scalability of multi-directional video streaming |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210227236A1 true US20210227236A1 (en) | 2021-07-22 |
Family
ID=67997700
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/132,219 Active US10999583B2 (en) | 2018-09-14 | 2018-09-14 | Scalability of multi-directional video streaming |
US17/221,299 Pending US20210227236A1 (en) | 2018-09-14 | 2021-04-02 | Scalability of multi-directional video streaming |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/132,219 Active US10999583B2 (en) | 2018-09-14 | 2018-09-14 | Scalability of multi-directional video streaming |
Country Status (3)
Country | Link |
---|---|
US (2) | US10999583B2 (en) |
CN (1) | CN112703737A (en) |
WO (1) | WO2020055655A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10291910B2 (en) * | 2016-02-12 | 2019-05-14 | Gopro, Inc. | Systems and methods for spatially adaptive video encoding |
KR102361314B1 (en) * | 2016-07-19 | 2022-02-10 | 한국전자통신연구원 | Method and apparatus for providing 360 degree virtual reality broadcasting services |
US11461942B2 (en) * | 2018-12-21 | 2022-10-04 | Koninklijke Kpn N.V. | Generating and signaling transition between panoramic images |
WO2020186478A1 (en) * | 2019-03-20 | 2020-09-24 | Beijing Xiaomi Mobile Software Co., Ltd. | Method and device for transmitting viewpoint switching capabilities in a vr360 application |
US11606574B2 (en) * | 2019-05-31 | 2023-03-14 | Apple Inc. | Efficient coding of source video sequences partitioned into tiles |
EP3996376A4 (en) * | 2019-07-03 | 2023-08-09 | Sony Group Corporation | Information processing device, information processing method, reproduction processing device, and reproduction processing method |
US11381817B2 (en) * | 2019-09-24 | 2022-07-05 | At&T Intellectual Property I, L.P. | Viewport-based transcoding for immersive visual streams |
CN115462078A (en) * | 2020-05-26 | 2022-12-09 | 华为技术有限公司 | Video transmission method, device and system |
CN113301355B (en) * | 2020-07-01 | 2023-04-28 | 阿里巴巴集团控股有限公司 | Video transmission, live broadcast and playing method, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140245367A1 (en) * | 2012-08-10 | 2014-08-28 | Panasonic Corporation | Method for providing a video, transmitting device, and receiving device |
US20180109817A1 (en) * | 2016-10-17 | 2018-04-19 | Mediatek Inc. | Deriving And Signaling A Region Or Viewport In Streaming Media |
US20190158815A1 (en) * | 2016-05-26 | 2019-05-23 | Vid Scale, Inc. | Methods and apparatus of viewport adaptive 360 degree video delivery |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012012584A1 (en) * | 2010-07-21 | 2012-01-26 | Dolby Laboratories Licensing Corporation | Systems and methods for multi-layered frame-compatible video delivery |
US9699437B2 (en) * | 2014-03-03 | 2017-07-04 | Nextvr Inc. | Methods and apparatus for streaming content |
KR102013403B1 (en) | 2015-05-27 | 2019-08-22 | 구글 엘엘씨 | Spherical video streaming |
US10545263B2 (en) | 2015-07-13 | 2020-01-28 | The Climate Corporation | Systems and methods for generating computer-based representations of probabilities of precipitation occurrences and intensities |
CN108476324B (en) | 2015-10-08 | 2021-10-29 | 皇家Kpn公司 | Method, computer and medium for enhancing regions of interest in video frames of a video stream |
US10582201B2 (en) | 2016-05-19 | 2020-03-03 | Qualcomm Incorporated | Most-interested region in an image |
CN109891850B (en) | 2016-09-09 | 2023-04-04 | Vid拓展公司 | Method and apparatus for reducing 360 degree view adaptive streaming media delay |
US10917564B2 (en) * | 2016-10-12 | 2021-02-09 | Qualcomm Incorporated | Systems and methods of generating and processing files for partial decoding and most interested regions |
-
2018
- 2018-09-14 US US16/132,219 patent/US10999583B2/en active Active
-
2019
- 2019-09-05 WO PCT/US2019/049678 patent/WO2020055655A1/en active Application Filing
- 2019-09-05 CN CN201980059922.0A patent/CN112703737A/en active Pending
-
2021
- 2021-04-02 US US17/221,299 patent/US20210227236A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2020055655A1 (en) | 2020-03-19 |
CN112703737A (en) | 2021-04-23 |
US10999583B2 (en) | 2021-05-04 |
US20200092571A1 (en) | 2020-03-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STPP | Information on status: patent application and granting procedure in general |
Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
|
AS | Assignment |
Owner name: APPLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TOURAPIS, ALEXANDROS;ZHANG, DAZHONG;YUAN, HANG;AND OTHERS;SIGNING DATES FROM 20180913 TO 20180917;REEL/FRAME:056957/0527 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |