
WO2024183506A1 - Data processing method, apparatus, computer device, storage medium, and program product for immersive media - Google Patents


Info

Publication number
WO2024183506A1
WO2024183506A1 PCT/CN2024/074627 CN2024074627W WO2024183506A1 WO 2024183506 A1 WO2024183506 A1 WO 2024183506A1 CN 2024074627 W CN2024074627 W CN 2024074627W WO 2024183506 A1 WO2024183506 A1 WO 2024183506A1
Authority
WO
WIPO (PCT)
Prior art keywords
media
track
code stream
replaceable
current
Prior art date
Application number
PCT/CN2024/074627
Other languages
English (en)
French (fr)
Inventor
胡颖
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2024183506A1

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H04N21/2335 Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4398 Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments

Definitions

  • The present application relates to the field of audio and video technology, and in particular to an immersive media data processing method, an immersive media data processing device, a computer device, a computer-readable storage medium, and a computer program product.
  • Immersive media can be encoded into replaceable (interchangeable) code streams to meet different presentation requirements. For example, two code streams with the same content but different encoding quality can replace each other, as can two code streams with the same content but different encoding types. When multiple replaceable code streams exist, corresponding indications need to be provided to the decoding side to guide the decoding and presentation of the immersive media.
  • The embodiments of the present application provide a data processing method, apparatus, computer device, storage medium, and program product for immersive media, which can clearly indicate the replaceable relationship between code streams and improve the presentation effect of immersive media.
  • An embodiment of the present application provides an immersive media data processing method, which is executed by a computer device, and the method includes:
  • acquiring a media file of immersive media, where the immersive media includes N replaceable code streams;
  • the media file includes relationship indication information, which is used to indicate the replaceable relationship between the N code streams, where N is an integer greater than 1; and
  • decoding the media file according to the relationship indication information to present the immersive media.
  • An embodiment of the present application provides another immersive media data processing method, which is executed by a computer device, and the method includes:
  • encoding the immersive media to obtain N replaceable code streams, where N is an integer greater than 1;
  • generating relationship indication information according to the replaceable relationship between the N code streams, where the relationship indication information is used to indicate the replaceable relationship between the N code streams; and
  • encapsulating the relationship indication information and the N code streams to obtain a media file of the immersive media.
  • An embodiment of the present application provides an immersive media data processing device, the device comprising:
  • an acquisition unit configured to acquire a media file of immersive media;
  • the immersive media includes N replaceable code streams;
  • the media file includes relationship indication information, which is used to indicate the replaceable relationship between the N code streams, where N is an integer greater than 1; and
  • a processing unit configured to decode the media file according to the relationship indication information to present the immersive media.
  • An embodiment of the present application provides another immersive media data processing device, the device comprising:
  • a coding unit configured to encode the immersive media to obtain N replaceable code streams; and
  • a processing unit configured to generate relationship indication information according to the replaceable relationship between the N code streams, where the relationship indication information is used to indicate the replaceable relationship between the N code streams;
  • the processing unit is further configured to encapsulate the relationship indication information and the N code streams to obtain a media file of the immersive media.
  • An embodiment of the present application provides a computer device, the computer device comprising:
  • a processor suitable for executing a computer program; and
  • a computer-readable storage medium storing a computer program which, when executed by the processor, implements the above-mentioned immersive media data processing method.
  • An embodiment of the present application provides a computer-readable storage medium, which stores a computer program.
  • The computer program is loaded by a processor to execute the immersive media data processing method described above.
  • An embodiment of the present application provides a computer program product, which includes a computer program or computer instructions; when the computer program or computer instructions are executed by a processor, the above-mentioned immersive media data processing method is implemented.
  • FIG. 1a is a schematic diagram of 6DoF provided by an exemplary embodiment of the present application.
  • FIG. 1b is a schematic diagram of 3DoF provided by an exemplary embodiment of the present application.
  • FIG. 1c is a schematic diagram of 3DoF+ provided by an exemplary embodiment of the present application.
  • FIG. 2 is an architecture diagram of a data processing system provided by an exemplary embodiment of the present application.
  • FIG. 3a is a schematic diagram of an encapsulation result based on single-track encapsulation provided by an exemplary embodiment of the present application.
  • FIG. 3b is a schematic diagram of an encapsulation result of component-based multi-track encapsulation provided by an exemplary embodiment of the present application.
  • FIG. 3c is a schematic diagram of an encapsulation result of slice-based multi-track encapsulation provided by an exemplary embodiment of the present application.
  • FIG. 3d is a schematic diagram of another encapsulation result of slice-based multi-track encapsulation provided by an exemplary embodiment of the present application.
  • FIG. 4a is a flow chart of data processing of immersive media provided by an exemplary embodiment of the present application.
  • FIG. 4b is a schematic diagram of an encapsulation result of immersive media provided by an exemplary embodiment of the present application.
  • FIG. 5 is a flow chart of an immersive media data processing method provided by an exemplary embodiment of the present application.
  • FIG. 6a is a flow chart of another immersive media data processing method provided by an exemplary embodiment of the present application.
  • FIG. 6b is a schematic diagram of the content of a media file provided by an exemplary embodiment of the present application.
  • FIG. 7 is a schematic diagram of the content of another media file provided by an exemplary embodiment of the present application.
  • FIG. 8a is a schematic structural diagram of an immersive media data processing device provided by an exemplary embodiment of the present application.
  • FIG. 8b is a schematic structural diagram of another immersive media data processing device provided by an exemplary embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • The terms “first”, “second”, etc. are used to distinguish identical or similar items that have basically the same role and function. It should be understood that there is no logical or temporal dependency among “first”, “second”, and “nth”, and that these terms do not limit quantity or execution order.
  • The term “at least one” means one or more, and “multiple” means two or more; for example, multiple code streams refers to two or more code streams, and at least one media track refers to one or more media tracks.
  • Immersive media refers to media files that can provide immersive media content, so that viewers immersed in the media content can obtain visual, auditory, and other sensory experiences as in the real world. According to the degree of freedom (DoF) of viewers when watching media content, immersive media can be divided into 6DoF immersive media, 3DoF immersive media, and 3DoF+ immersive media.
  • 6DoF means that viewers of immersive media can freely translate along the X-axis, Y-axis, and Z-axis; for example, viewers can move freely in three-dimensional 360-degree virtual reality (VR) content.
  • FIG. 1b is a schematic diagram of 3DoF provided in an embodiment of the present application. As shown in FIG. 1b, 3DoF means that the viewer of the immersive media is fixed at a center point in three-dimensional space, and the viewer's head rotates about the X-axis, Y-axis, and Z-axis to view the picture provided by the media content.
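  • The difference between 6DoF and 3DoF described above can be sketched in a few lines of Python. This is only an illustration: the names `ViewerPose` and `constrain_pose` are hypothetical, and 3DoF+ is left out because this excerpt does not describe its constraints.

```python
from dataclasses import dataclass


@dataclass
class ViewerPose:
    # Translation along the X, Y and Z axes.
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    # Head rotation about the X, Y and Z axes, in degrees.
    rot_x: float = 0.0
    rot_y: float = 0.0
    rot_z: float = 0.0


def constrain_pose(pose: ViewerPose, mode: str) -> ViewerPose:
    """Return the pose a viewer can actually take under the given DoF mode."""
    if mode == "6DoF":
        # Free translation and rotation: nothing is constrained.
        return pose
    if mode == "3DoF":
        # The viewer is fixed at the center point; only head rotation is kept.
        return ViewerPose(0.0, 0.0, 0.0, pose.rot_x, pose.rot_y, pose.rot_z)
    # 3DoF+ is not modeled here (assumption: its limits are defined elsewhere).
    raise ValueError(f"unsupported mode: {mode}")
```

  • For instance, `constrain_pose(ViewerPose(x=1.0, rot_y=20.0), "3DoF")` keeps the rotation but resets the position to the center point.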
  • Immersive media includes temporal immersive media and non-temporal immersive media.
  • The signals in temporal immersive media have a temporal sequence, while the signals in non-temporal immersive media do not.
  • Immersive media includes, but is not limited to, volumetric media, volumetric video media, multi-view video media, subtitle media, and audio media.
  • Volumetric media refers to media with three-dimensional content, such as point cloud media (a typical 6DoF immersive media).
  • Immersive media can be encoded into multiple replaceable (interchangeable) code streams.
  • Different replaceable code streams have a replaceable relationship, that is, a relationship in which they can replace each other. Based on this relationship, the N replaceable code streams are allowed to replace each other during presentation.
  • Different replaceable code streams can have the same content but different qualities, or the same content but different encoding types.
  • For example, code streams of different resolutions obtained by encoding point cloud media have a replaceable relationship.
  • As another example, the code stream obtained by encoding the point cloud media with a lossy encoding method and the code stream obtained with a lossless encoding method are replaceable code streams.
  • Point cloud refers to a set of irregularly distributed discrete points in space that express the spatial structure and surface properties of a three-dimensional object or scene.
  • Each point in a point cloud includes at least geometric data, which is used to represent the three-dimensional position information of the point.
  • A point in a point cloud may also include one or more sets of attribute data; each set of attribute data reflects one attribute of the point, such as color, material, or other information.
  • Each point in a point cloud has the same number of sets of attribute data.
  • Point clouds can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes, and therefore have a wide range of applications, including virtual reality (VR) games, computer-aided design (CAD), geographic information systems (GIS), autonomous navigation systems (ANS), digital cultural heritage, free viewpoint broadcasting, three-dimensional immersive remote presentation, three-dimensional reconstruction of biological tissues and organs, and other scenarios.
  • Point clouds can be obtained mainly in the following ways: computer generation, three-dimensional (3D) laser scanning, 3D photogrammetry, etc.
  • Specifically, point clouds can be obtained by capturing real-world visual scenes with acquisition equipment (a set of cameras, or camera equipment with multiple lenses and sensors).
  • Point clouds of static real-world three-dimensional objects or scenes can be obtained through 3D laser scanning, which can capture millions of points per second;
  • point clouds of dynamic real-world three-dimensional objects or scenes can be obtained through 3D photogrammetry, which can capture tens of millions of points per second.
  • In the medical field, point clouds of biological tissues and organs can be obtained through magnetic resonance imaging (MRI), computed tomography (CT), and electromagnetic positioning information.
  • Point clouds can also be generated directly by computers based on virtual three-dimensional objects and scenes.
  • Point cloud media includes a point cloud sequence composed of one or more point cloud frames in order, and each point cloud frame is composed of the geometric data and attribute data of one or more points in the point cloud.
  • A point in the point cloud may include one or more sets of attribute data, and each set of attribute data reflects one attribute of the point.
  • For example, a point in the point cloud may have a set of color attribute data that reflects the color of the point (such as red or yellow); as another example, a point may have a set of reflectivity attribute data that reflects the laser reflection intensity of the point.
  • When a point has multiple sets of attribute data, the types of these sets can be the same or different.
  • For example, a point in the point cloud can have one set of color attribute data and one set of reflectivity attribute data; as another example, a point can have two sets of color attribute data that reflect the color of the point at different times.
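  • The point cloud structure described above (geometric data plus one or more sets of attribute data per point, grouped into frames) can be modeled roughly as follows. This is a minimal sketch; the class names and the attribute keys such as `color_0` are illustrative assumptions, not part of the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Point:
    # Geometric data: the three-dimensional position of the point.
    geometry: Tuple[float, float, float]
    # Attribute data: one entry per attribute set (keys are hypothetical names).
    attributes: Dict[str, tuple] = field(default_factory=dict)


@dataclass
class PointCloudFrame:
    points: List[Point]

    def attribute_set_count(self) -> int:
        """Number of attribute sets per point; all points must agree."""
        counts = {len(p.attributes) for p in self.points}
        if len(counts) > 1:
            raise ValueError("points carry different numbers of attribute sets")
        return counts.pop() if counts else 0


# A frame whose points each have one color set and one reflectivity set.
frame = PointCloudFrame([
    Point((0.0, 0.0, 0.0), {"color_0": (255, 0, 0), "reflectivity_0": (0.8,)}),
    Point((1.0, 0.5, 2.0), {"color_0": (255, 255, 0), "reflectivity_0": (0.3,)}),
])
```

  • A point cloud sequence is then simply an ordered list of such frames.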
  • A track is a collection of media data in the process of media file encapsulation, and consists of multiple time-ordered samples.
  • A media file may contain one or more tracks.
  • For example, a video media file may include, but is not limited to, a video media track, an audio media track, and a subtitle media track.
  • Metadata information can also be included in the media file as a media type, in the form of a metadata track.
  • Metadata information is a general term for information related to the presentation of immersive media, and may include descriptive information about the media content of the immersive media.
  • Temporal immersive media is included in the media file of the immersive media in the form of tracks, and such a track may also be called a media track.
  • A sample is an encapsulation unit in the process of media file encapsulation.
  • A track is composed of many samples.
  • For example, a video media track can be composed of many samples, and a sample is usually a video frame.
  • Likewise, a track of temporal immersive media contains one or more samples, and each sample can contain one or more tactile signals of the temporal immersive media.
  • The sample entry is used to indicate metadata information related to all samples in the track.
  • For example, the sample entry of a video media track usually contains metadata information related to initialization of the decoding device.
  • As another example, the sample entry of a volumetric media track may contain relationship indication information for indicating the replaceable relationship between code streams.
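  • The relationship between a track, its sample entry, and its samples can be sketched as a small in-memory model. The classes below are hypothetical stand-ins and do not reproduce the actual ISOBMFF box syntax; the sample entry is shown as a plain dictionary of track-wide metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Sample:
    decode_time: int   # position of the sample in the track's time sequence
    payload: bytes     # encapsulated media data, e.g. one coded frame


@dataclass
class Track:
    track_id: int
    # Metadata that applies to all samples in the track.
    sample_entry: Dict[str, Any] = field(default_factory=dict)
    samples: List[Sample] = field(default_factory=list)

    def add_sample(self, sample: Sample) -> None:
        """Append a sample, keeping the samples in time order."""
        if self.samples and sample.decode_time <= self.samples[-1].decode_time:
            raise ValueError("samples must be added in increasing time order")
        self.samples.append(sample)


track = Track(track_id=1, sample_entry={"decoder_config": b"\x00"})
track.add_sample(Sample(decode_time=0, payload=b"frame-0"))
track.add_sample(Sample(decode_time=1, payload=b"frame-1"))
```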
  • An item is an encapsulation unit of non-temporal media data in the process of media file encapsulation.
  • For example, a static picture can be encapsulated as an item.
  • Non-temporal immersive media can be encapsulated into one or more items.
  • An item can also be called a media item.
  • ISOBMFF (ISO Base Media File Format)
  • ISOBMFF is an encapsulation standard for media files, and a typical ISOBMFF file is an MP4 file.
  • DASH (Dynamic Adaptive Streaming over HTTP) is an adaptive bitrate technology that enables high-quality streaming media to be delivered over the Internet through conventional HTTP web servers.
  • MPD (Media Presentation Description) is the DASH manifest that describes the media presentation.
  • Representation refers to a combination of one or more media components in DASH.
  • For example, a video file of a certain resolution can be regarded as a Representation.
  • As another example, a video file of a certain temporal level can be regarded as a Representation.
  • Adaptation Set refers to a set of one or more video streams in DASH.
  • An Adaptation Set can contain multiple Representations.
  • An Adaptation Set can also be referred to as Adaption.
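  • To illustrate how an Adaptation Set groups several Representations, the sketch below picks one Representation for a given bandwidth budget. The selection policy (highest bandwidth that fits, otherwise the lowest available) is an assumption made for this example and is not prescribed by the application; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Representation:
    rep_id: str
    bandwidth: int  # required bandwidth in bits per second


@dataclass
class AdaptationSet:
    representations: List[Representation]


def pick_representation(aset: AdaptationSet, available_bps: int) -> Optional[Representation]:
    """Pick the best Representation that fits the available bandwidth."""
    fitting = [r for r in aset.representations if r.bandwidth <= available_bps]
    if fitting:
        return max(fitting, key=lambda r: r.bandwidth)
    # Nothing fits: fall back to the cheapest one, or None for an empty set.
    return min(aset.representations, key=lambda r: r.bandwidth, default=None)


aset = AdaptationSet([
    Representation("low", 1_000_000),
    Representation("mid", 4_000_000),
    Representation("high", 12_000_000),
])
```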
  • An embodiment of the present application provides a data processing solution for immersive media, which includes a processing flow at the encoding end of the immersive media and a processing flow at the decoding end of the immersive media.
  • At the encoding end: according to the replaceable relationship between the N code streams of the immersive media, generate relationship indication information, where the relationship indication information is used to indicate the replaceable relationship between the N code streams, and encapsulate the relationship indication information and the N code streams to obtain a media file of the immersive media.
  • At the decoding end: obtain a media file of immersive media, where the immersive media includes N replaceable code streams, the media file includes relationship indication information used to indicate the replaceable relationship between the N code streams, and N is an integer greater than 1;
  • the media file is then decoded according to the relationship indication information to present the immersive media.
  • The embodiment of the present application can add relationship indication information to the media file of the immersive media during the encoding process of the immersive media.
  • The relationship indication information indicates the replaceable relationship between the multiple replaceable code streams of the immersive media. Based on this relationship, the decoding end can be guided to decode the immersive media accurately, thereby ensuring the presentation accuracy of the immersive media and improving its presentation effect.
  • The data processing system 20 of the immersive media may include a service device 201 and a decoding device 202.
  • The service device 201 may be used as the encoding end of the immersive media; the service device 201 may be a terminal device or a server.
  • The decoding device 202 may be used as the decoding end of the immersive media; the decoding device 202 may be a terminal device or a server.
  • A communication connection may be established between the service device 201 and the decoding device 202.
  • The terminal device may be a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a car terminal, a smart TV, etc., but is not limited thereto.
  • The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • The specific process by which the service device 201 and the decoding device 202 perform the data processing of the immersive media is as follows:
  • The data processing performed by the service device 201 mainly includes the following:
  • The data processing performed by the decoding device 202 mainly includes the following:
  • The transmission of immersive media between the service device 201 and the decoding device 202 can be based on various transmission protocols (or transmission signaling).
  • The transmission protocols here may include, but are not limited to: the DASH (Dynamic Adaptive Streaming over HTTP) protocol, the HLS (HTTP Live Streaming, dynamic bit rate adaptive transmission) protocol, SMTP (Smart Media Transport Protocol), TCP (Transmission Control Protocol), etc.
  • Scene capture of immersive media refers to obtaining immersive media by capturing real-world visual scenes through a capture device associated with the service device 201. The capture device provides immersive media acquisition services for the service device 201 and may include, but is not limited to, any of the following: a camera device, a sensor device, and a scanning device. The camera device may include an ordinary camera, a stereo camera, a light field camera, etc.
  • The sensor device may include a laser device, a radar device, etc.
  • The scanning device may include a three-dimensional laser scanning device, etc.
  • The capture device associated with the service device 201 may be a hardware component provided in the service device 201, for example a camera or sensor of a terminal; it may also be a hardware device connected to the service device 201, such as a camera connected to the service device 201.
  • Device generation of immersive media means that the service device 201 generates immersive media based on virtual objects (such as virtual three-dimensional objects and virtual three-dimensional scenes obtained through three-dimensional modeling).
  • The above immersive media can be point cloud media or other media, such as multi-view video media, volumetric video media, audio media, tactile media, subtitle media, etc.
  • Tactile media refers to immersive media of the tactile type, that is, media files that can provide consumers with a tactile sensory experience as in the real world.
  • The service device 201 can encode the immersive media to obtain N replaceable code streams of the immersive media, where N is an integer greater than 1.
  • For example, when the immersive media is point cloud media, the point cloud media can be encoded using a point cloud compression (PCC) method to obtain N replaceable code streams of the point cloud media.
  • G-PCC (Geometry-based Point Cloud Compression) refers to point cloud coding based on geometric structure.
  • The service device 201 generates relationship indication information according to the replaceable relationship between the N code streams of the immersive media.
  • The replaceable relationship refers to a relationship in which any two of the code streams can replace each other.
  • The generated relationship indication information is used to indicate the replaceable relationship between the N code streams.
  • The service device 201 may encapsulate the relationship indication information and the N code streams of the immersive media to obtain a media file of the immersive media.
  • The encapsulation process of the N code streams of the immersive media may include the following methods:
  • Any of the N code streams of the immersive media may be encapsulated into one or more media tracks.
  • That is, any of the N code streams can be encapsulated using a single-track encapsulation method (one code stream is encapsulated into one media track) or a multi-track encapsulation method (one code stream is encapsulated into multiple media tracks).
  • With single-track encapsulation, one media track is obtained by encapsulating a code stream.
  • The media track includes a sample entry and at least one sample, and each sample includes parameter information, geometric data, and attribute data.
  • FIG. 3a is a schematic diagram of the encapsulation result based on single-track encapsulation.
  • As shown in FIG. 3a, the media track 310 obtained by encapsulating the point cloud code stream stores samples 312 and 313 of the geometric point cloud, and its sample entry is 311.
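  • Single-track encapsulation as described above can be sketched as follows: every coded frame becomes one sample carrying parameter information, geometric data, and attribute data, and all samples share one sample entry. The dictionary layout and the function name are assumptions made for illustration only.

```python
from typing import Any, Dict, List


def encapsulate_single_track(
    coded_frames: List[Dict[str, bytes]],
    sample_entry: Dict[str, Any],
) -> Dict[str, Any]:
    """Encapsulate one code stream into one media track."""
    samples = []
    for frame in coded_frames:
        # Each sample keeps all three parts of the coded frame together.
        samples.append({
            "parameters": frame["parameters"],
            "geometry": frame["geometry"],
            "attributes": frame["attributes"],
        })
    if not samples:
        raise ValueError("a media track needs at least one sample")
    return {"sample_entry": dict(sample_entry), "samples": samples}


single_track = encapsulate_single_track(
    [
        {"parameters": b"p0", "geometry": b"g0", "attributes": b"a0"},
        {"parameters": b"p1", "geometry": b"g1", "attributes": b"a1"},
    ],
    sample_entry={"codec": "hypothetical-pcc"},
)
```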
  • With multi-track encapsulation, multiple media tracks are obtained by encapsulating a code stream.
  • The multi-track encapsulation method can include component-based multi-track encapsulation and slice-based multi-track encapsulation.
  • In component-based multi-track encapsulation, the media tracks into which the code stream is encapsulated include a geometry track and an attribute track.
  • The geometry track includes a sample entry and at least one sample, and each sample includes parameter information and geometry data;
  • the attribute track includes a sample entry and at least one sample, and each sample includes parameter information and attribute data.
  • FIG. 3b is a schematic diagram of the encapsulation result of component-based multi-track encapsulation.
  • As shown in FIG. 3b, the point cloud media is encapsulated into one geometry component track 321 and two attribute component tracks 322 and 323, and different attribute component tracks contain different attribute data, such as attribute 1 data 324 and attribute 2 data 325.
  • The geometry component track is associated with both attribute component tracks, as shown by the dotted arrows.
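  • Component-based multi-track encapsulation can be sketched in the same style: geometry data goes into one geometry track, each attribute goes into its own attribute track, and the geometry track records which attribute tracks it is associated with. The dictionary layout, the `associated_tracks` field, and the function name are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple


def encapsulate_by_component(
    coded_frames: List[Dict[str, Any]],
    attribute_names: List[str],
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Split one code stream into a geometry track and per-attribute tracks."""
    geometry_track = {
        "kind": "geometry",
        "associated_tracks": list(attribute_names),  # the dotted arrows in the figure
        "samples": [],
    }
    attribute_tracks = {
        name: {"kind": "attribute", "attribute": name, "samples": []}
        for name in attribute_names
    }
    for frame in coded_frames:
        geometry_track["samples"].append(
            {"parameters": frame["parameters"], "geometry": frame["geometry"]}
        )
        for name in attribute_names:
            attribute_tracks[name]["samples"].append(
                {"parameters": frame["parameters"], "attribute": frame["attributes"][name]}
            )
    return geometry_track, attribute_tracks


geometry_track, attribute_tracks = encapsulate_by_component(
    [{"parameters": b"p0", "geometry": b"g0",
      "attributes": {"attribute_1": b"a1", "attribute_2": b"a2"}}],
    ["attribute_1", "attribute_2"],
)
```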
  • In slice-based multi-track encapsulation, the code stream can be encapsulated into slice-based media tracks, including a slice base track and multiple slice tracks.
  • Each sample in the slice base track includes a geometry header and an attribute header; each sample in a slice track includes one or more slices.
  • Each slice includes a geometry header, geometry data, an attribute header, and attribute data.
  • FIG. 3c is a schematic diagram of the encapsulation result of slice-based multi-track encapsulation. It includes a slice base track 331 and two slice tracks 332 and 333; slice track 332 includes slices 1 and 2, slice track 333 includes slice 3, and the slice base track is associated with both slice tracks, as shown by the dotted arrows.
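  • The slice-based layout can be sketched as follows, using the slice-to-track assignment of FIG. 3c (slices 1 and 2 in track 332, slice 3 in track 333). The dictionary layout, the `slice_to_track` mapping, and the function name are assumptions for illustration; slice payloads are opaque bytes here.

```python
from typing import Any, Dict, List, Tuple


def encapsulate_by_slice(
    coded_frames: List[Dict[str, Any]],
    slice_to_track: Dict[int, int],
) -> Tuple[Dict[str, Any], Dict[int, Dict[str, Any]]]:
    """Split one code stream into a slice base track and several slice tracks."""
    track_ids = sorted(set(slice_to_track.values()))
    base_track = {"kind": "slice_base", "associated_tracks": track_ids, "samples": []}
    slice_tracks = {tid: {"kind": "slice", "samples": []} for tid in track_ids}
    for frame in coded_frames:
        # The base track sample carries only the headers.
        base_track["samples"].append({
            "geometry_header": frame["geometry_header"],
            "attribute_header": frame["attribute_header"],
        })
        # Distribute the frame's slices to the tracks they are assigned to.
        per_track: Dict[int, list] = {tid: [] for tid in track_ids}
        for slice_id, slice_data in frame["slices"].items():
            per_track[slice_to_track[slice_id]].append((slice_id, slice_data))
        for tid in track_ids:
            slice_tracks[tid]["samples"].append({"slices": per_track[tid]})
    return base_track, slice_tracks


base_track, slice_tracks = encapsulate_by_slice(
    [{"geometry_header": b"gh", "attribute_header": b"ah",
      "slices": {1: b"s1", 2: b"s2", 3: b"s3"}}],
    slice_to_track={1: 332, 2: 332, 3: 333},
)
```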
  • Alternatively, the slice tracks may include geometry tracks and attribute tracks; each slice in a geometry track includes a geometry header and geometry data, and each slice in an attribute track includes an attribute header and attribute data.
  • The geometry track and the attribute track are associated, and the slice base track may be associated with at least one geometry slice track.
  • FIG. 3d shows a schematic diagram of another encapsulation result of slice-based multi-track encapsulation. It includes one slice base track 341, two geometry component tracks 342 and 344, and two attribute component tracks 343 and 345.
  • The geometry data and the attribute data are encapsulated in slices of different tracks. A geometry component track (such as 344) and an attribute component track (such as 345) are associated, in that the data in their samples comes from the same slice (such as slice 3); the slice base track 341 is associated with each of the geometry component tracks 342 and 344, as shown by the dotted arrows.
  • any code stream of immersive media can be encapsulated in a single track or in multiple tracks, which is not limited in this application.
  • the relationship indication information can be added to the corresponding media track to form a media file of the immersive media.
  • the relationship indication information can be added to the sample entry of the corresponding media track.
  • the setting of the relationship indication information may include the following:
  • relationship indication information can be added to any of the multiple media tracks to indicate that the combination of the media track and other media tracks corresponds to one code stream.
  • relationship indication information can be added to the media track to indicate the replaceable relationship between the codestream to which the media track belongs and other codestreams in the N codestreams.
  • relationship indication information can be added to the same media track to indicate that the media track is shared by different codestreams.
  • any of the N bitstreams of the immersive media can be encapsulated into one or more media items.
  • the relationship indication information can be added to the corresponding media item to form a media file of the immersive media. Adding the relationship indication information to the corresponding media item is similar to adding the relationship indication information in the above-mentioned media track, and will not be described in detail here.
  • the service device 201 may send the media file to the decoding device 202 .
  • the decoding device 202 may obtain the media file of the immersive media and description information for media presentation through the service device 201 , where the description information includes relevant information of the media file of the immersive media.
  • the decoding device 202 can obtain relationship indication information from the media file, and then select the required presentation code stream based on the replaceable relationship indicated by the relationship indication information, specifically organize the media track/media item corresponding to the code stream, and then decode the media track/media item to present the immersive media.
  • the immersive media can be transmitted in a streaming manner.
  • the transmission signaling (such as DASH, SMT) includes description information of the relationship indication information.
  • the media file segments (including one or more media tracks/one or more media items) of the immersive media to be decoded can be determined for decoding processing, thereby presenting the immersive media.
  • the present application also provides a flow chart of an immersive media data processing method.
  • the flow chart of the immersive media data processing method includes the following contents:
  • the real-world visual scene A can be sampled through a collection device (such as a group of cameras or a camera device with multiple lenses and sensors) to obtain the source data B of the immersive media corresponding to the real-world visual scene.
  • the source data B is a frame sequence composed of a large number of point cloud frames
  • the acquired immersive media is encoded to obtain a code stream E, which contains N replaceable code streams
  • relationship indication information can be generated based on the replaceable relationship between the N code streams, and then the code stream E and the relationship indication information are packaged to obtain a media file corresponding to the immersive media.
  • the code stream of the immersive media can be packaged into one or more media tracks (or media items), and the relationship indication information is added to the corresponding media track (or media item), thereby forming a media file of the immersive media.
  • the service device 201 can synthesize one or more encoded bit streams into a media file F for file playback, or a sequence (Fs) of initialization segments and media segments for streaming transmission according to a specific media container file format; wherein the media container file format may refer to the ISO basic media file format specified in the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 14496-12.
  • the service device 201 may also generate description information of the relationship indication information according to the replaceable relationship between the N code streams, and the description information of the relationship indication information may be sent to the decoding device 202 through transmission signaling, and the decoding device 202 may decide whether to use the transmission signaling to obtain the media file of the immersive media according to the transmission mode of the media file.
  • the transmission signaling may be in the form of a signaling description file.
  • the media file of the immersive media sent by the service device 201 is received, and the media file may include: a media file F' for file playback, or a sequence Fs' of initialization segments and media segments for streaming; then the media file is decapsulated to obtain a code stream E'; then the relationship indication information is obtained from the media file, and the code stream required to be presented is determined from the N code streams based on the replaceable relationship indicated by the relationship indication information, and the code stream required to be presented is decoded to obtain the immersive media D'; wherein, the decoding device 202 can obtain the initialization segment and media segment Fs' for streaming based on the transmission signaling.
  • the decoding of the code stream is specifically the decoding of the media track/media item corresponding to the code stream.
  • the data processing technology of immersive media involved in this application can be implemented based on cloud technology; for example, using a cloud server as a service device.
  • Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and network in a wide area network or a local area network to achieve data calculation, storage, processing, and sharing.
  • the data processing technology of immersive media provided in this application can be applied to point cloud compression related products and various links such as service device ends, playback device ends, and intermediate nodes in immersive systems.
  • the service device may obtain the immersive media, encode it to obtain N replaceable code streams, and encapsulate the N code streams together with relationship indication information (used to indicate the replaceable relationship between the code streams) to obtain a media file of the immersive media; the decoding device may then obtain the media file of the immersive media and obtain the relationship indication information from it.
  • the replaceable relationship indicated by the relationship indication information is used to determine, from the N code streams of the immersive media, the code stream to be presented, which is then decoded to present the immersive media. It can be seen that relationship indication information can be added to the media file during the encoding process of the immersive media. In this way, the replaceable relationship between the code streams is clearly indicated, which guides the decoding end to decode and present the immersive media more accurately and improves the presentation effect of the immersive media.
  • sequential immersive media can be encapsulated into one or more media tracks, and the media tracks include component tracks.
  • the replaceable relationship between code streams can be indicated by the replaceable relationship between component tracks.
  • Component tracks that can be replaced with each other can constitute a track replaceable group.
  • the syntax of the replaceable information structure (V3CAlternativeInfoStruct) is shown in Table 1 below.
  • Quality ranking identification field (quality_ranking_flag): When the value is 1, it indicates that there is a replaceable relationship in quality between the component tracks in the track replaceable group; when the value is 0, it indicates that there is no replaceable relationship in quality between the component tracks in the track replaceable group.
  • Coding type identification field (codec_type_flag): When the value is 1, it indicates that there is a replaceable relationship in coding type between the component tracks in the track replaceable group; when the value is 0, it indicates that there is no replaceable relationship in coding type between the component tracks in the track replaceable group.
  • Quality ranking field: used to indicate quality ranking information. The smaller the value of the quality ranking field, the higher the quality of the corresponding component track.
  • Coding type field (codec_type): used to indicate the coding type of the corresponding component track.
  • the mutually replaceable component tracks belong to the same track replaceable group of the volumetric video. Only one of the component tracks in the track replaceable group can be indexed by the corresponding atlas track or atlas fragment track.
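To make the field semantics concrete, here is a minimal sketch of how a player might choose the single component track to use from a track replaceable group. It is not the patent's procedure: the selection policy, the `supported_codecs` parameter, and the field name `quality_ranking` are assumptions (the text names only `quality_ranking_flag`, `codec_type_flag`, and `codec_type`), and the bit layout of Table 1 is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlternativeInfo:
    quality_ranking_flag: int
    codec_type_flag: int
    quality_ranking: Optional[int] = None  # assumed field name; smaller = higher quality
    codec_type: Optional[int] = None

def pick_component_track(group, supported_codecs):
    """group maps track_id -> AlternativeInfo for one track replaceable group.

    Returns the id of the one track to use, or None if no track is decodable.
    """
    candidates = {
        tid: info for tid, info in group.items()
        if not info.codec_type_flag or info.codec_type in supported_codecs
    }
    if not candidates:
        return None

    def rank(item):
        tid, info = item
        if info.quality_ranking_flag and info.quality_ranking is not None:
            return (0, info.quality_ranking, tid)
        return (1, 0, tid)  # tracks without a ranking sort last

    return min(candidates.items(), key=rank)[0]
```

Only one track id is returned because only one component track of the group can be indexed by the corresponding atlas track.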
  • the component track contains various data of a video frame, such as geometric data, attribute data, etc.; the atlas track contains images, for example, a video frame is encapsulated as an atlas track in the form of an image.
  • for the definition and syntax of the track replaceable group, refer to Table 2 below.
  • the track replaceable group can be indicated by a track group type data box (TrackGroupTypeBox), the type of which is 'valg' and is included in a track group data box (TrackGroupBox).
  • Zero or more TrackGroupTypeBoxes can be set in a media track.
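The following sketch reads the fixed part of a TrackGroupTypeBox as laid out in ISO/IEC 14496-12 (32-bit size, four-character type, 8-bit version plus 24-bit flags, 32-bit track_group_id) and checks whether two tracks fall in the same 'valg' group. The function names are hypothetical, 64-bit box sizes are not handled, and the V3C-specific payload defined in Table 2 is returned unparsed because its layout is not reproduced in this text.

```python
import struct

def parse_track_group_type_box(data: bytes):
    """Parse the fixed header of a TrackGroupTypeBox.

    Returns (box_type, track_group_id, remaining_payload).
    """
    if len(data) < 16:
        raise ValueError("box too short")
    size, box_type, _version_flags, track_group_id = struct.unpack(">I4sII", data[:16])
    if size < 16 or size > len(data):
        raise ValueError("unsupported or inconsistent box size")
    return box_type.decode("ascii"), track_group_id, data[16:size]

def same_replaceable_group(box_a: bytes, box_b: bytes) -> bool:
    """Tracks are in one group when group type and track_group_id both match."""
    type_a, id_a, _ = parse_track_group_type_box(box_a)
    type_b, id_b, _ = parse_track_group_type_box(box_b)
    return type_a == type_b == "valg" and id_a == id_b
```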
  • for non-temporal immersive media, there may be a replaceable relationship between its component items.
  • when the non-temporal immersive media is non-temporal volumetric media, V3CAlternativeEntityToGroupBox is used to indicate the difference information (such as quality difference information) between the replaceable component items.
  • among mutually replaceable component items, only one can be indexed by the corresponding atlas item or atlas slice item.
  • for the definition and syntax of the item replaceable group, refer to Table 3 below.
  • the item replaceable group is indicated by an entity group data box (EntityToGroupBox) whose type is 'valy'; zero, one, or more entity group data boxes can be set in a component item.
  • corresponding data boxes can be set according to corresponding syntax and included in media files to indicate the replaceable relationship between items/tracks.
  • however, in some cases the replaceable relationship between bitstreams cannot be clearly indicated based only on the replaceable relationship between items or tracks, which affects the presentation of immersive media.
  • for example, media file 1 contains 2 bitstreams, each encapsulated in a multi-track manner: track 1 (track1) and track 2 (track2) correspond to bitstream 1, and track 1 and track 3 (track3) correspond to bitstream 2. When the geometric information in bitstream 1 and bitstream 2 is completely consistent, that is, encoded in exactly the same way, media file 1 includes only one geometric track, namely track 1, and track 2 and track 3 are replaceable with each other.
  • in this case, once the replaceable relationship between track 2 and track 3 and the shared geometric track are indicated, the replaceable relationship between the bitstreams can be known.
  • however, if media file 2 includes code stream 3, whose geometric track is not repeated (i.e., not shared by multiple code streams), then indicating only the replaceable relationship between track 2 and track 3 cannot indicate the replaceable relationship between code stream 3 and the other code streams. In this case, the indication of the replaceable relationship between the code streams is not clear enough.
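The media file 1 case can be sketched as follows: when only a shared track and a track-level replaceable group are known, each bitstream is recovered as the shared tracks plus one choice from every replaceable group. The function name and the track labels are illustrative assumptions.

```python
from itertools import product

def enumerate_bitstreams(shared_tracks, replaceable_groups):
    """Each bitstream = all shared tracks + one track from every replaceable group."""
    streams = []
    for choice in product(*replaceable_groups):
        streams.append(sorted(set(shared_tracks) | set(choice)))
    return streams

# Media file 1: track1 is the shared geometric track, track2/track3 are replaceable.
file1 = enumerate_bitstreams(["track1"], [["track2", "track3"]])
# file1 == [["track1", "track2"], ["track1", "track3"]]
```

This inference has nothing to work with for a code stream whose tracks are neither shared nor listed in a track-level group, which is the gap that a code-stream-level indication is meant to close.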
  • the embodiment of the present application expands the relationship indication information to indicate the replaceable relationship at the code stream level, supports the use in various file encapsulation scenarios, flexibly organizes the media tracks/media items corresponding to the code streams, clearly indicates the replaceable relationship between code streams, and has high versatility.
  • FIG. 5 is a flow chart of an immersive media data processing method provided in an embodiment of the present application.
  • the immersive media data processing method can be executed by a decoding device 202 in an immersive media data processing system.
  • the method includes the following steps S501-S502:
  • the immersive media includes N replaceable code streams;
  • the media file includes relationship indication information,
  • the relationship indication information is used to indicate the replaceable relationship between N code streams, where N is an integer greater than 1.
  • the immersive media can be a timing immersive media or a non-timing immersive media; according to the signal characteristics of the immersive media, the immersive media can be a point cloud media or other media, such as: any one of multi-view video media, audio media, subtitle media, tactile media, volumetric video media, etc.
  • the N code streams of the immersive media have a replaceable relationship. Based on the replaceable relationship, any two code streams can be replaced with each other, that is, they are replaceable code streams. Therefore, each of the N code streams can also be called a replaceable code stream.
  • the immersive media includes 3 replaceable code streams, namely code stream 1, code stream 2 and code stream 3, and the 3 code streams are replaceable versions of the same content with different qualities.
  • Any code stream can be a binary code stream or other base code streams (such as quaternary code streams, hexadecimal code streams, etc.). This application does not impose any restrictions on this.
  • the setting of the relationship indication information in the media file and the content indicated by the relationship indication information are explained by taking the case where the time-sequential immersive media is encapsulated as a media track in the media file and the non-time-sequential immersive media is encapsulated as a media item in the media file.
  • the immersive media is time-sequential immersive media.
  • the N code streams of the immersive media are encapsulated as M media tracks in the media file, where M is an integer and is greater than or equal to N.
  • that is, the number of media tracks M ≥ the number of replaceable code streams N.
  • the M media tracks include a corresponding number of different media tracks corresponding to the N code streams, and all media tracks corresponding to the N code streams are included in one media file.
  • the immersive media includes three replaceable code streams, namely code stream 1, code stream 2, and code stream 3.
  • the media file contains eight media tracks, namely, three media tracks into which code stream 1 is encapsulated, four media tracks into which code stream 2 is encapsulated, and one media track into which code stream 3 is encapsulated. There are no identical media tracks in the media tracks of the various code streams, that is, the various code streams do not share media tracks.
  • the relationship indication information can be set in the media track, specifically in the sample entry of the media track. It can be regarded as the replaceable information metadata of the media track to indicate the replaceable relationship between the code streams.
  • any code stream among the N code streams is represented as code stream i
  • any two code streams among the N code streams can be represented as code stream i and code stream j respectively, where i and j are positive integers and both i and j are less than or equal to N.
  • the setting of relationship indication information may include the following (1.1)-(1.3):
  • Code stream i is encapsulated into one media track Mi among M media tracks, and relationship indication information is set in media track Mi.
  • a single-track encapsulation method can be used at the encoding end to encapsulate it into a single media track Mi.
  • M media tracks include the media track Mi
  • the media track Mi can be used to represent the code stream i.
  • Relationship indication information is set in the media track Mi, and can be used to indicate the replaceable relationship between the code stream i, to which the media track Mi belongs, and other code streams.
  • other code streams refer to code streams other than code stream i among the N code streams.
  • Codestream i is encapsulated into multiple media tracks among M media tracks, and relationship indication information is set in media track Mi ;
  • media track Mi refers to any media track among the multiple media tracks into which codestream i is encapsulated.
  • the encoding end may also use a multi-track encapsulation method to encapsulate it into multiple media tracks.
  • the M media tracks include multiple media tracks corresponding to the code stream i.
  • the encapsulated multiple media tracks can be combined to represent the code stream i.
  • Each media track in the multiple media tracks belongs to the code stream i.
  • the relationship indication information can be set in any media track in the multiple media tracks, that is, the media track Mi.
  • when the relationship indication information is set in the media track Mi among the multiple media tracks, it not only indicates the replaceable relationship between the codestream i to which the media track Mi belongs and other codestreams, but can also indicate the association relationship between the media track Mi and the other media tracks corresponding to the codestream i; here, the other media tracks are the media tracks other than the media track Mi among the multiple media tracks into which the codestream i is encapsulated.
  • the association relationship is used to indicate that the media track Mi and other media tracks belong to the same codestream i.
  • the above association relationship may refer to a combination relationship between multiple media tracks belonging to the same codestream, and the codestream i can be fully represented by the combination between the media track Mi and other media tracks of the codestream i, and the relationship indication information may include an indication of the association relationship.
  • Codestream i is encapsulated into the first plurality of media tracks among the M media tracks; codestream j is encapsulated into the second plurality of media tracks among the M media tracks; if the first plurality of media tracks and the second plurality of media tracks both include the media track Mij , the relationship indication information is also used to indicate the shared ownership relationship of the media track Mij .
  • the number of media tracks into which code stream i is encapsulated and the number of media tracks into which code stream j is encapsulated may be the same or different, but each of the two numbers is greater than 1.
  • code stream i is encapsulated into 3 media tracks out of 8 media tracks
  • code stream j is encapsulated into 2 media tracks out of 8 media tracks.
  • different code streams may be encapsulated into different media tracks.
  • however, the media tracks of different code streams may also contain the same media track.
  • for such an identical media track, only one copy needs to be retained in the media file. In this way, there are no duplicate media tracks among the M media tracks, which can effectively save storage resources.
  • the three media tracks corresponding to bitstream i include geometry track 1, and the two media tracks corresponding to bitstream j include geometry track 2; the geometry data in bitstream i and bitstream j are obtained in exactly the same encoding method, then bitstream i and bitstream j have the same geometry track, that is, geometry track 1 and geometry track 2 belong to the same media track, then only one geometry track (that is, geometry track 1 or 2) can be retained in the media file.
  • the relationship indication information may be set in the media track Mij to indicate the shared ownership relationship of the media track Mij .
  • the shared ownership relationship is used to indicate that the media track Mij is a media track shared by the codestream i and the codestream j. That is, the media track Mij belongs to both the codestream i and the codestream j.
  • Such a media track may also be referred to as a shared media track. It is understandable that any shared media track may be shared by at least two codestreams; for example, a media track in a media file may be shared by three codestreams. The number of shared media tracks included in the M media tracks may be 0 or greater than 0.
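A small sketch of the bookkeeping described above: given which media tracks each code stream is encapsulated into, it counts the distinct tracks M kept in the file and lists the shared media tracks. The function name and track labels are hypothetical.

```python
from collections import defaultdict

def summarize_tracks(stream_tracks):
    """stream_tracks maps code stream id -> list of track names.

    Returns (M, shared), where M is the number of distinct tracks kept in the
    file and shared maps each shared track to the streams that own it.
    """
    owners = defaultdict(set)
    for stream_id, tracks in stream_tracks.items():
        for track in tracks:
            owners[track].add(stream_id)
    shared = {t: sorted(s) for t, s in owners.items() if len(s) > 1}
    return len(owners), shared

# Code stream i uses 3 tracks, code stream j uses 2, and the geometry track is identical.
m, shared = summarize_tracks({
    "i": ["geometry", "attr_a", "attr_b"],
    "j": ["geometry", "attr_c"],
})
# m == 4 and shared == {"geometry": ["i", "j"]}
```

With no sharing at all, as in the earlier example of three code streams encapsulated into 3, 4, and 1 tracks, the same function gives M = 8 and an empty shared map.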
  • the relationship indication information set on the media track Mij can indicate the following multiple relationships: the replaceable relationship between the codestream to which the media track Mij belongs and other codestreams, the association relationship between the media track Mij and other media tracks corresponding to the codestream i/codestream j, and the shared ownership relationship of the media track Mij. If the media track Mi mentioned in (1.2) above is a media track shared between different codestreams, the relationship indication information set on the media track Mi can also indicate the above multiple relationships.
  • any one or more of the following relationships can be indicated: the replaceable relationship between the code stream to which the media track belongs and other code streams, the association relationship between the media track and other media tracks belonging to the same code stream, and the shared ownership relationship of the media track.
  • the indication based on the above relationship can clearly indicate the replaceable relationship between the code streams, so as to facilitate the flexible and accurate organization of the media tracks corresponding to the replaceable code streams, and decode the media tracks corresponding to the code streams to be presented to present the immersive media.
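On the decoding side, the relationships above could be used roughly as follows: each media track reports the code streams it belongs to, the player regroups the tracks per code stream, and then decodes only the tracks of the code stream it wants to present. This is a sketch under assumptions; the input dictionary stands in for whatever the relationship indication information actually carries, and the function names are hypothetical.

```python
def tracks_by_bitstream(track_memberships):
    """track_memberships maps track_id -> list of code stream ids it belongs to."""
    result = {}
    for track_id, stream_ids in track_memberships.items():
        for stream_id in stream_ids:
            result.setdefault(stream_id, []).append(track_id)
    return {sid: sorted(tids) for sid, tids in result.items()}

def tracks_to_decode(track_memberships, wanted_stream):
    """Return the media tracks that together represent the chosen code stream."""
    return tracks_by_bitstream(track_memberships).get(wanted_stream, [])

# Track 1 is shared by code streams 1 and 2; tracks 2 and 3 belong to one each.
memberships = {1: [1, 2], 2: [1], 3: [2]}
```

For instance, `tracks_to_decode(memberships, 2)` returns tracks 1 and 3, the shared track plus the track specific to code stream 2.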
  • the multiple media tracks that need to be played together belong to the same playback track group.
  • media tracks can be played jointly.
  • Media tracks belonging to the same playback track group may belong to the same code stream or to different code streams.
  • for code streams that adopt multi-track encapsulation, a combination of multiple media tracks is required to fully represent the code stream.
  • that is, among the M media tracks there are multiple media tracks that need to be played jointly.
  • These multiple media tracks that need to be played jointly belong to the same code stream and can be classified into a playback track group.
  • the combination of media tracks can be indicated by the playback track group.
  • taking the case where the immersive media is time-sequential volumetric media (such as volumetric video) as an example, the playback track group of volumetric video has the definition and syntax shown in Table 4 below.
  • for media tracks of volumetric video, only certain specific combinations of media tracks should be played together, and a playback track group of volumetric video can be used to indicate the combination of media tracks required for joint playback.
  • the TrackGroupBox (track group data box) of the media track contains a PlayoutTrackGroupBox (extended from TrackGroupTypeBox in ISO/IEC 14496-12, i.e., playback track group data box) with a unique track_group_id (track group identifier, used to indicate the identifier of the playback track group).
  • PlayoutTrackGroupBox indicates that the corresponding media track is a media track in a playback track group.
  • the joint quality level can be selectively defined to indicate media content of different qualities.
  • Quality ranking identification field: when the value is 1, it indicates that all media tracks of the playback track group of the volumetric video have a joint quality ranking; when the value is 0, it indicates that all media tracks of the playback track group of the volumetric video do not have a joint quality ranking.
  • Quality ranking field: used to indicate the joint quality ranking of all media tracks in a playback track group of a volumetric video. The smaller the value of the quality ranking field, the higher the joint quality ranking.
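The joint quality ranking could be used along these lines to pick which playback track group to play. The class, its field names, and the policy of ignoring unranked groups are assumptions for illustration; the syntax of Table 4 is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class PlayoutGroup:
    track_group_id: int
    track_ids: list
    quality_ranking_flag: int
    quality_ranking: int = 0   # meaningful only when the flag is 1

def best_playout_group(groups):
    """Return the ranked group with the smallest (best) joint quality ranking."""
    ranked = [g for g in groups if g.quality_ranking_flag == 1]
    if not ranked:
        return None
    return min(ranked, key=lambda g: g.quality_ranking)
```

The tracks in the returned group's `track_ids` are the ones to be played jointly.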
  • the immersive media is non-sequential immersive media; N code streams are encapsulated as P media items in the media file; P is an integer and P is greater than or equal to N.
  • a code stream of non-sequential immersive media can be encapsulated as one or more media items in a media file, P media items include a corresponding number of different media items corresponding to N code streams, and the P media items are included in one media file.
  • the relationship indication information can be set in the media item.
  • any one of the N bitstreams is represented as bitstream i
  • any two of the N bitstreams can be represented as bitstream i and bitstream j, respectively, where i and j are positive integers and both are less than or equal to N.
  • the setting of the relationship indication information may include the following (2.1)-(2.3):
  • the code stream i is encapsulated into a media item Pi among P media items, and the relationship indication information is set in the media item Pi .
  • the codestream i is encapsulated as a single media item Pi and included in P media items.
  • the media item Pi can be used to represent the codestream i.
  • the relationship indication information can be set in the media item Pi and can be used to indicate the replaceable relationship between the codestream i, to which the media item Pi belongs, and other codestreams, where the other codestreams refer to the codestreams other than the codestream i in the N codestreams.
  • the code stream i is encapsulated into multiple media items among P media items, and the relationship indication information is set in the media item Pi ;
  • the media item Pi refers to any media item among the multiple media items into which the code stream i is encapsulated.
  • the relationship indication information may be set in any media item among the multiple media items, that is, in the media item P i .
  • the relationship indication information is also used to indicate the association relationship between the media item Pi and other media items corresponding to the code stream i.
  • other media items refer to the media items other than the media item Pi in the multiple media items into which the code stream i is encapsulated; the association relationship is used to indicate that the media item Pi and other media items belong to the same code stream i.
  • the above association relationship may refer to the combination relationship between multiple media items corresponding to the same code stream, and the combination of the media item Pi and the other media items belonging to the code stream i (i.e., all media items belonging to the same code stream) can completely represent the code stream i.
  • the relationship indication information includes an indication of the association relationship.
  • the media item Pi with the relationship indication information may belong only to the codestream i, or may belong to the codestream i and at least one codestream other than the codestream i among the N codestreams. If the media item Pi belongs to at least two codestreams, the relationship indication information set in the media item Pi also has the relationship indication in the following (2.3).
  • Codestream i is encapsulated into a first plurality of media items among the P media items; codestream j is encapsulated into a second plurality of media items among the P media items. If both the first plurality of media items and the second plurality of media items include media item Pij, the relationship indication information is further used to indicate the shared ownership relationship of media item Pij; the shared ownership relationship is used to indicate that media item Pij is a media item shared by codestream i and codestream j.
  • for bitstream i and bitstream j, the number of media items into which bitstream i is encapsulated and the number of media items into which bitstream j is encapsulated may be the same or different.
  • for example, bitstream i is encapsulated into 3 media items
  • and bitstream j is encapsulated into 2 media items.
  • the media items obtained by encapsulating different bitstreams may contain the same media item.
  • for media items repeated between different bitstreams, only one copy needs to be retained in the media file, which can effectively save storage resources.
  • for example, a media file includes 7 media items, 3 of which correspond to bitstream i and 5 of which correspond to bitstream j.
  • the media items corresponding to bitstream i and the media items corresponding to bitstream j both contain media item x. Only one copy of media item x is retained in the media file, and it belongs to both bitstream i and bitstream j.
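The item count in this example follows from counting the shared media item once. Writing $I_i$ and $I_j$ for the sets of media items of bitstream i and bitstream j (notation introduced here only for illustration):

```latex
\[
|I_i \cup I_j| \;=\; |I_i| + |I_j| - |I_i \cap I_j| \;=\; 3 + 5 - 1 \;=\; 7
\]
```

The single shared item x is the intersection, which is why the file holds 7 media items rather than 8.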
  • the relationship indication information may be set in the media item Pij and may be used to indicate the shared ownership relationship of the media item Pij.
  • the shared ownership relationship is used to indicate that the media item Pij is a media item shared by the code stream i and the code stream j. That is, the media item Pij belongs to both the code stream i and the code stream j.
  • Such media items may also be referred to as shared media items. It is understandable that any shared media item may be shared by at least two of the N code streams, and the number of shared media items included in the P media items may be 0 or greater than 0.
  • the relationship indication information set on the media item Pij can indicate the following multiple relationships: the replaceable relationship between the bitstream to which the media item Pij belongs and other bitstreams, the association relationship between the media item Pij and other media items corresponding to bitstream i/bitstream j, and the shared ownership relationship of the media item Pij. When the media item Pi mentioned in (2.2) above is a media item shared by different code streams, the above relationships can also be indicated by the relationship indication information set on the media item Pi.
  • any one or more of the following relationships can be indicated: the replaceable relationship between the code stream to which the media item belongs and other code streams, the association relationship between the media item and other media items belonging to the same code stream, and the shared ownership relationship of the media item.
  • the indication based on the above relationships can clearly indicate the replaceable relationship between the code streams, so as to facilitate the flexible and accurate organization of the media items corresponding to the replaceable code streams, and decode them to present the immersive media.
  • the multiple media items that need to be jointly played belong to the same playback entity group.
  • the playback entity group can indicate the combination of media items that are played together.
  • the playback entity group for non-sequential volumetric media has the definition shown in Table 5 below.
  • the playback entity group of volumetric media can be used to indicate the combination of these jointly played media items.
  • the playback entity group is represented by a playback entity group data box PlayoutEntityToGroupBox of the 'eply' type, which can be set in the media item.
  • its joint quality level can be selectively defined to indicate media content of different qualities.
  • Quality ranking flag field: when the value is 1, all media items of the playback entity group of the volumetric media have a joint quality ranking; when the value is 0, they do not have a joint quality ranking.
  • Quality ranking field: indicates the joint quality ranking of all media items in a playback entity group of the volumetric media. The smaller the value of the quality ranking field, the higher the joint quality ranking.
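The ordering rule for the quality ranking field can be illustrated with a minimal Python sketch. The class `PlayoutEntityGroup`, its attribute names, and the helper `best_ranked_group` are hypothetical and only model the two fields described above; they are not the syntax of Table 5.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlayoutEntityGroup:
    group_id: int                           # hypothetical identifier of the 'eply' group
    entity_ids: List[int]                   # media items that are played together
    quality_ranking_flag: int               # 1: the group carries a joint quality ranking
    quality_ranking: Optional[int] = None   # smaller value means higher joint quality


def best_ranked_group(groups: List[PlayoutEntityGroup]) -> Optional[PlayoutEntityGroup]:
    """Return the group with the highest joint quality ranking, or None if none is ranked."""
    ranked = [g for g in groups
              if g.quality_ranking_flag == 1 and g.quality_ranking is not None]
    if not ranked:
        return None
    # The smallest quality_ranking value denotes the highest joint quality.
    return min(ranked, key=lambda g: g.quality_ranking)
```

Groups whose flag is 0 are skipped because they carry no joint ranking to compare.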
  • N code streams with an alternative relationship belong to the same alternative group, and different code streams in the same alternative group can be replaced with each other when presented;
  • the relationship indication information includes an alternative information data box (AlternativeInfoBox).
  • the alternative information data box has a definition as shown in Table 6 below.
  • the replaceable information data box is a newly added data box of type 'alif', which can be set in the sample entry of the media item or media track. That is, a media item or a sample entry of a media track may include a replaceable information data box.
  • the number of replaceable information data boxes may be greater than or equal to 0, that is, 0, 1 or more replaceable information data boxes may be set in a media track/media item, which is determined according to the characteristics of the media track/media item. For example, if the media track track1 belongs to two code streams, then 2 replaceable information data boxes may be set.
  • the replaceable information data box can be used to indicate the information of the replaceable group to which the code stream of the media track/media item belongs: a. If the replaceable information data box is set in the current media track, the replaceable information data box contains the information of the replaceable group to which the code stream of the current media track belongs. The current media track refers to the media track being decoded. b. If the replaceable information data box is set in the current media item, the replaceable information data box contains the information of the replaceable group to which the code stream of the current media item belongs. The current media item refers to the media item being decoded.
  • If the immersive media is sequential immersive media, there may be one or more media tracks in the media file with replaceable information data boxes. If the immersive media is non-sequential immersive media, there may be one or more media items in the media file with replaceable information data boxes. For any media track or media item being decoded in the media file, if there is a replaceable information data box, the replaceable information data box indicates the information of the replaceable group to which the corresponding code stream belongs, so as to indicate the replaceable relationship between the corresponding code streams.
  • the alternative information data box includes an alternative group identification flag field (alternative_group_id_flag) and an alternative group identification field (alternative_group_id). If the alternative information data box is set in the current media track, the alternative group identification flag field is used to indicate whether the alternative information data box in the current media track indicates the alternative group identifier to which the code stream corresponding to the current media track belongs.
  • the alternative information data box can be specifically set in the sample entry of the current media track.
  • When the value of the alternative group identification flag field is the first preset value (such as "0"), the alternative information data box in the current media track indicates the alternative group identifier to which the code stream corresponding to the current media track belongs.
  • When the value of the alternative group identification flag field is the second preset value (such as "1"), the alternative information data box in the current media track does not indicate the alternative group identifier to which the code stream corresponding to the current media track belongs.
  • the alternative group identifier flag field is the first preset value (such as "0")
  • the alternative group identifier exists in the track header data box (TrackHeaderBox) of the current media track.
  • the alternative group identifier field (alternative_group_id) is used to indicate the alternative group identifier to which the corresponding code stream of the current media track belongs; the alternative group identifiers corresponding to different code streams of the same alternative group are the same, and the alternative group identifier can be a value, such as 1.
  • The replaceable group identification flag field is used to indicate whether the replaceable information data box in the current media item indicates the replaceable group identifier to which the corresponding code stream of the current media item belongs. The flag field indicates different contents depending on its value.
  • When the value of the replaceable group identification flag field is a first preset value (such as "0"), the replaceable information data box in the current media item indicates the replaceable group identifier to which the corresponding code stream of the current media item belongs; when the value is a second preset value (such as "1"), the replaceable information data box in the current media item does not indicate that identifier. The replaceable group identification field is used to indicate the replaceable group identifier to which the corresponding code stream of the current media item belongs. Different code streams of the same replaceable group have the same replaceable group identifier, which can be a numeric value, such as 1, or a character string, such as aabbxx.
  • a media track/media item in which a replaceable information data box is set may indicate the replaceable group identifier to which the code stream corresponding to the media track/media item belongs in the replaceable information data box based on the replaceable group identifier flag field and the replaceable group identifier field contained in the replaceable information data box.
  • If the values of the replaceable group identifier field in the replaceable information data boxes of different media tracks/media items are the same, the corresponding code streams belong to the same replaceable group.
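As a rough illustration of how equal identifier values tie code streams into one replaceable group, the following Python sketch buckets media tracks/media items by the value carried in their replaceable information data box. The function name and the `(media_id, group_id)` tuple layout are assumptions made for this example; `None` stands for a box that does not carry an identifier.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple, Union

GroupId = Union[int, str]  # the identifier may be a number or a character string


def group_by_alternative_group(
    entries: Iterable[Tuple[str, Optional[GroupId]]]
) -> Dict[GroupId, List[str]]:
    """Map each alternative_group_id to the tracks/items whose box carries it."""
    groups = defaultdict(list)
    for media_id, group_id in entries:
        if group_id is None:
            # The box does not indicate a group identifier; nothing to group on.
            continue
        groups[group_id].append(media_id)
    return dict(groups)
```

Tracks or items that end up in the same bucket belong to code streams that can replace one another at presentation time.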
  • the relationship indication information is also used to indicate the shared ownership relationship of the current media track or the shared ownership relationship of the current media item.
  • the indication of the shared ownership relationship includes the following two methods: one is to indicate based on the field in the replaceable information data box, and the other is to indicate based on the number of replaceable information data boxes.
  • Method 1: indication based on a field in the replaceable information data box.
  • the alternative information data box includes a multi-alternative bitstream flag field (multi_alternative_bitstream_flag) and a bitstream number field (num_bitstream).
  • The multiple alternative code stream flag field (multi_alternative_bitstream_flag) is used to indicate whether the current media track belongs to multiple code streams.
  • the replaceable information data box may be specifically provided in the sample entry of the current media track.
  • When the value of the multiple replacement codestream flag field is a first preset value (such as "0"), the current media track belongs to only one codestream; the current media track is a component of one codestream.
  • the current media track can represent a codestream alone, or be combined with other media tracks to represent a codestream.
  • When the value of the multiple replacement codestream flag field is a second preset value (such as "1"), the current media track belongs to multiple codestreams.
  • the current media track is also a component of multiple codestreams.
  • the current media track is shared by at least two codestreams.
  • For example, the current media track track1 is both one of the media tracks corresponding to codestream 1 and one of the media tracks corresponding to codestream 2.
  • the codestream number field (num_bitstream) is used to indicate the number of codestreams to which the current media track belongs. That is, when the value of the multiple replacement codestream flag field indicates that the current media track belongs to multiple codestreams, the number of codestreams to which the current media track belongs can be indicated by the codestream number field (num_bitstream).
  • The value of the bitstream number field equals the number of bitstreams to which the current media track belongs. For example, if the current media track belongs to K (greater than 1) bitstreams, the value of the bitstream number field can be K.
  • If multi_alternative_bitstream_flag in the alternative information data box set in the current media track track1 is 1 and num_bitstream is equal to 3, the track track1 belongs to 3 bitstreams, that is, track1 is a media track shared by the 3 bitstreams.
  • The multi-alternative bitstream flag field is used to indicate whether the current media item belongs to multiple bitstreams. When the value of the multi-alternative bitstream flag field (multi_alternative_bitstream_flag) is a first preset value (such as "0"), the current media item belongs to only one bitstream and is a component of that bitstream; when the value is a second preset value (such as "1"), the current media item belongs to multiple bitstreams and is a component of multiple bitstreams at the same time.
  • the bitstream number field (num_bitstream) is used to indicate the number of bitstreams to which the current media item belongs, and the value of the bitstream number field is the same as the number of bitstreams to which the current media item belongs.
  • The multiple replacement codestream flag field and the codestream quantity field can indicate whether the current media track or the current media item is shared by multiple codestreams, thereby clarifying the sharing relationship of the current media track/current media item and the number of codestreams to which it belongs.
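Method 1 can be sketched in Python as follows. The class `SharedOwnership` and the function `bitstream_count` are hypothetical names; the sketch assumes the example preset values used in the text ("0" for a single code stream, "1" for multiple code streams) and assumes that num_bitstream is only meaningful when the flag is 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedOwnership:
    multi_alternative_bitstream_flag: int  # 0: one code stream, 1: multiple code streams
    num_bitstream: int = 1                 # only read when the flag is 1 (assumption)


def bitstream_count(info: SharedOwnership) -> int:
    """Number of code streams that the current media track/item belongs to."""
    if info.multi_alternative_bitstream_flag == 0:
        return 1
    if info.num_bitstream < 2:
        # A shared track/item is shared by at least two code streams.
        raise ValueError("num_bitstream must be at least 2 when the flag is 1")
    return info.num_bitstream
```

With the example from the text, a flag of 1 and num_bitstream equal to 3 yields a count of 3.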
  • Method 2: indication based on the number of replaceable information data boxes.
  • If the current media track contains only one replaceable information data box, the current media track belongs to only one code stream; if the current media track contains multiple replaceable information data boxes, the current media track belongs to multiple code streams, and the number of replaceable information data boxes in the current media track should be equal to the number of code streams to which the current media track belongs.
  • the number of alternative information data boxes in the current media track is the same as the number of code streams to which the current media track belongs, indicating the number of code streams to which the current media track belongs.
  • By adding multiple alternative information data boxes (AlternativeInfoBox) to the current media track, it can be indicated that the current media track has a shared ownership relationship, that is, the current media track can be shared by multiple code streams.
  • A current media track that belongs to only one of the N replaceable code streams contains at most one alternative information data box (AlternativeInfoBox). Here, at most one means 0 or 1.
  • When the current media track belongs to one code stream, it may be one of the multiple media tracks corresponding to a multi-track encapsulated code stream, rather than the single media track corresponding to a single-track encapsulated code stream. In that case, it may not be necessary to set an alternative information data box in the current media track.
  • If the current media item contains only one replaceable information data box, the current media item belongs to only one code stream; if the current media item contains multiple replaceable information data boxes, the current media item belongs to multiple code streams, and the number of replaceable information data boxes in the current media item should be equal to the number of code streams to which the current media item belongs. For example, if the current media item contains two replaceable information data boxes, the current media item belongs to two code streams.
  • In Method 2, the replaceable information data box does not include the multiple replacement code stream flag field and/or the code stream quantity field of Method 1.
  • the shared ownership relationship of the current media track/current media item is indicated based on the number of replaceable information data boxes, so that the shared media track/media item can be identified, and corresponding information can be set in different replaceable information data boxes of the current media track/current media item to further indicate the association relationship between the current media track/current media item and the corresponding media track/corresponding media item.
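Method 2 reduces to counting boxes, which the following Python sketch illustrates. The function name is hypothetical, and returning `None` for zero boxes is a design choice of this example: the text only says that a track in a single code stream may carry 0 or 1 box, so a count of zero is treated here as "not shared, number not signalled by boxes".

```python
from typing import Optional, Tuple


def shared_ownership_from_box_count(num_boxes: int) -> Tuple[bool, Optional[int]]:
    """Derive (is_shared, num_bitstreams) from the number of AlternativeInfoBox instances."""
    if num_boxes < 0:
        raise ValueError("the number of boxes cannot be negative")
    # More than one box means the track/item is shared by several code streams.
    is_shared = num_boxes > 1
    # With at least one box, the box count equals the number of code streams.
    num_bitstreams = num_boxes if num_boxes >= 1 else None
    return is_shared, num_bitstreams
```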
  • the relationship indication information is further used to indicate the association relationship between the current media track and other media tracks belonging to the same code stream as the current media track, or to indicate the association relationship between the current media item and other media items belonging to the same code stream as the current media item.
  • The replaceable information data box includes a component reference type field (components_ref_type), which is used to indicate the association between the current media track and other media tracks belonging to the same code stream as the current media track, or between the current media item and other media items belonging to the same code stream as the current media item.
  • a replaceable information data box is set in the current media track.
  • When components_ref_type is a first preset value (such as "0"), the current media track is associated with other media tracks belonging to the same code stream as the current media track through a track reference, and the replaceable information data box also includes a track reference type field (track_ref_type), which is used to indicate the type of the track reference.
  • the track reference is used to associate the current media track with other media tracks.
  • the type of the track reference is indicated by the track reference type field (track_ref_type). If the track reference type fields of different media tracks have the same value, it can indicate that the media tracks are associated with each other.
  • When components_ref_type is a second preset value (such as "1"), the current media track is associated with other media tracks belonging to the same code stream through a track group, and the replaceable information data box also includes a track group type field (track_group_type) and a track group identification field (track_group_id). The track group type field is used to indicate the type of the track group to which the current media track belongs, and the track group identification field is used to indicate the identifier of the track group to which the current media track belongs.
  • the track group contains multiple media tracks.
  • The track group can consist of the media tracks corresponding to the same code stream; the media tracks contained in each track group can be combined to represent one code stream, and M media tracks can correspond to N track groups.
  • the type of the track group to which the current media track belongs is indicated by the track group type field (track_group_type) contained in the replaceable data box, and the identifier of the track group to which the current media track belongs is indicated by the track group identification field (track_group_id).
  • the identifier and type of the same track group are the same, and the identifier can be a number or a string to indicate that the media tracks in the track group are interrelated.
  • A replaceable information data box can be set in the current media item.
  • When components_ref_type is a third preset value (such as "2"), the current media item is associated with other media items belonging to the same code stream through an item reference, and the replaceable information data box also includes an item reference type field (item_ref_type), which is used to indicate the type of the item reference.
  • the item reference is used to associate the current media item with other media items.
  • the type of the item reference is indicated by the item reference type field (item_ref_type).
  • If the item reference type fields of different media items have the same value, the media items can be associated with each other.
  • When components_ref_type is a fourth preset value (such as "3"), the current media item is associated with other media items belonging to the same code stream through an entity group, and the replaceable information data box also includes an entity group type field (entity_group_type) and an entity group identification field (entity_group_id). The entity group type field is used to indicate the type of the entity group to which the current media item belongs; the entity group identification field is used to indicate the identifier of the entity group to which the current media item belongs.
  • the entity group includes one or more media items.
  • an entity group includes all media items corresponding to a code stream, so that the code stream is represented by the entity group.
  • The type of the entity group to which the current media item belongs is indicated by the entity group type field (entity_group_type) contained in the replaceable information data box, and the identifier of the entity group to which the current media item belongs is indicated by the entity group identification field (entity_group_id).
  • the identifier and type of the same entity group are the same to indicate the association relationship between the media items in the entity group.
  • the current media track can be further associated with other media tracks, or the current media item can be associated with other media items based on the values of some fields, so as to clearly indicate the combination of media tracks or media items corresponding to the same code stream. Furthermore, in the replaceable information data box, in combination with the indication of the association relationship and the shared ownership relationship, the replaceable relationship between the media track combination can be indicated, and then the replaceable relationship at the code stream level can be indicated.
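The four association mechanisms selected by components_ref_type can be summarized in a small Python sketch. The enum `ComponentsRefType` and the function `association_fields` are hypothetical; the numeric values 0 to 3 are the example values given in the text, and the returned names are the fields that the replaceable information data box additionally carries in each case.

```python
from enum import IntEnum
from typing import Tuple


class ComponentsRefType(IntEnum):
    TRACK_REFERENCE = 0   # media tracks linked through a track reference
    TRACK_GROUP = 1       # media tracks linked through a track group
    ITEM_REFERENCE = 2    # media items linked through an item reference
    ENTITY_GROUP = 3      # media items linked through an entity group


_FIELDS = {
    ComponentsRefType.TRACK_REFERENCE: ("track_ref_type",),
    ComponentsRefType.TRACK_GROUP: ("track_group_type", "track_group_id"),
    ComponentsRefType.ITEM_REFERENCE: ("item_ref_type",),
    ComponentsRefType.ENTITY_GROUP: ("entity_group_type", "entity_group_id"),
}


def association_fields(components_ref_type: int) -> Tuple[str, ...]:
    """Return the extra fields expected in the box for a components_ref_type value."""
    # ComponentsRefType(...) raises ValueError for values outside 0..3.
    return _FIELDS[ComponentsRefType(components_ref_type)]
```

A reader of the box would first inspect components_ref_type and then parse only the fields returned for that value.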
  • the replaceable information data box further includes a multi-components flag field (multi_components_flag).
  • the multi-component flag field (multi_components_flag) is used to indicate whether the code stream to which the current media track belongs is encapsulated into multiple media tracks.
  • the replaceable information data box can be set in the sample entry of the current media track.
  • The multi-component flag field indicates how the code stream to which the current media track belongs is encapsulated, and therefore the component attribute of the current media track: the current media track is either one of the multiple media tracks into which the code stream is encapsulated, or the single media track into which the code stream is encapsulated.
  • When the value of the multi-component flag field (multi_components_flag) is the first preset value (such as "0"), the code stream to which the current media track belongs is encapsulated into one media track, and the current media track is that media track.
  • In this case, the current media track is a component of a code stream encapsulated in a single track, and the corresponding code stream can be represented by the current media track alone.
  • When the value of the multi-component flag field (multi_components_flag) is the second preset value (such as "1"), the code stream to which the current media track belongs is encapsulated into multiple media tracks.
  • the current media track is any one of the multiple media tracks to which the codestream to which it belongs is encapsulated. That is, the codestream to which the current media track belongs adopts a multi-track encapsulation method, and the current media track is one of the multiple media tracks to which it is encapsulated. At this time, the current media track needs to be combined with other media tracks belonging to the same codestream to represent the codestream.
  • The multi-component flag field (multi_components_flag) is used to indicate whether the code stream to which the current media item belongs is encapsulated into multiple media items. When the value of the multi-component flag field is the first preset value (such as "0"), the code stream to which the current media item belongs is encapsulated into one media item; the current media item is the media item obtained by encapsulating the code stream, and the corresponding code stream can be represented by the current media item alone.
  • When the value of the multi-component flag field is the second preset value (such as "1"), the code stream to which the current media item belongs is encapsulated into multiple media items, and the current media item is any one of those media items.
  • In this case, the current media item can be combined with other media items belonging to the same code stream to represent the code stream.
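The effect of multi_components_flag on how a code stream is reassembled can be sketched in Python. The class `TrackInfo` and the function `tracks_for_bitstream` are hypothetical, and the sketch assumes that the tracks of a multi-track code stream are associated through a track group (components_ref_type equal to 1); the same pattern would apply to media items and entity groups.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackInfo:
    track_id: int
    multi_components_flag: int            # 0: single-track encapsulation, 1: multi-track
    track_group_id: Optional[int] = None  # assumed association through a track group


def tracks_for_bitstream(current: TrackInfo, all_tracks: List[TrackInfo]) -> List[int]:
    """Return the IDs of the tracks that together represent the current track's code stream."""
    if current.multi_components_flag == 0:
        # Single-track encapsulation: the current track alone represents the code stream.
        return [current.track_id]
    if current.track_group_id is None:
        raise ValueError("a multi-track code stream needs an association to its other tracks")
    # Multi-track encapsulation: combine all tracks of the same track group.
    return sorted(t.track_id for t in all_tracks
                  if t.track_group_id == current.track_group_id)
```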
  • Alternative group identification flag field (alternative_group_id_flag): If the alternative information data box is set in the current media track, it is used to indicate whether the alternative information data box in the current media track indicates the alternative group identifier to which the code stream corresponding to the current media track belongs. If the alternative information data box is set in the current media item, it is used to indicate whether the alternative information data box in the current media item indicates the alternative group identifier to which the code stream corresponding to the current media item belongs.
  • For example, when the value of alternative_group_id_flag is 1, the alternative information data box in the current media track indicates the alternative group identifier to which the code stream corresponding to the current media track belongs; when the value of alternative_group_id_flag is 0, the alternative information data box in the current media track does not indicate that identifier.
  • Alternative group identifier field (alternative_group_id): If the alternative information data box is set in the current media track, it is used to indicate the alternative group identifier to which the corresponding code stream of the current media track belongs. If the alternative information data box is set in the current media item, it is used to indicate the alternative group identifier to which the corresponding code stream of the current media item belongs. Schematically, the above alternative group identifier can be 1.
  • Multi-alternative bitstream flag field (multi_alternative_bitstream_flag): If the alternative information data box is set in the current media track, it is used to indicate whether the current media track belongs to multiple bitstreams; if the alternative information data box is set in the current media item, it is used to indicate whether the current media item belongs to multiple bitstreams.
  • Bitstream number field (num_bitstream): If the replaceable information data box is set in the current media track, it is used to indicate the number of bitstreams to which the current media track belongs; if the replaceable information data box is set in the current media item, it is used to indicate the number of bitstreams to which the current media item belongs.
  • For example, when multi_alternative_bitstream_flag is 1, the current media track belongs to multiple bitstreams, and the number of bitstreams is indicated by num_bitstream.
  • Component reference field (components_ref_type): If the replaceable information data box is set in the current media track, it is used to indicate the association between the current media track and other media tracks belonging to the same code stream as the current media track. For example, when components_ref_type is 0, the current media track is associated with other media tracks belonging to the same code stream through a track reference, and the track reference type field (track_ref_type) indicates the type of the track reference.
  • When components_ref_type is 1, the current media track is associated with other media tracks belonging to the same code stream through a track group; the type of the track group is indicated by the track group type field (track_group_type), and the identifier of the track group is indicated by the track group identifier field (track_group_id). If the replaceable information data box is set in the current media item, components_ref_type is used to indicate the association between the current media item and other media items belonging to the same code stream as the current media item.
  • When components_ref_type is 2, the current media item is associated with other media items belonging to the same code stream through an item reference, and the type of the item reference is indicated by the item reference type field (item_ref_type).
  • When components_ref_type is 3, the current media item is associated with other media items belonging to the same code stream through an entity group; the type of the entity group is indicated by the entity group type field (entity_group_type), and the identifier of the entity group is indicated by the entity group identifier field (entity_group_id).
  • Multi-component flag field (multi_components_flag): If the alternative information data box is set in the current media track, it is used to indicate whether the code stream of the current media track is encapsulated into multiple media tracks. If the replaceable information data box is set in the current media item, it is used to indicate whether the code stream of the current media item is encapsulated into multiple media items.
  • the codestream number field (num_bitstream) and the component reference field (components_ref_type) may be defined.
  • the components_ref_type may indicate the association between the current media track and other media tracks belonging to the same codestream as the current media track.
  • The multi-component flag field (multi_components_flag) may be further used to indicate whether the codestream to which the current media track belongs is encapsulated into multiple media tracks.
  • the component reference field may be defined to further indicate the association between the current media track and other media tracks belonging to the same codestream as the current media track.
  • each replaceable information data box has a syntax as shown in Table 8 below.
  • the replaceable data box includes an alternative group identification flag field (alternative_group_id_flag), a multi-component flag field (multi_components_flag) and a component reference field (components_ref_type), and the relevant introduction can be found in the above description, which will not be repeated here.
  • the replaceable data box does not include a multi-alternative stream flag field (multi_alternative_bitstream_flag) and the judgment content under the corresponding indication.
  • the replaceable information data box shown in Table 8 above is set in the current media track, and the multi-component flag field (multi_components_flag) can be directly used to indicate whether the code stream to which the current media track belongs is encapsulated into multiple media tracks.
  • The component reference field (components_ref_type) is further defined to indicate the association between the current media track and other media tracks belonging to the same code stream as the current media track.
  • If the immersive media is point cloud media, the code stream of the point cloud media is a point cloud code stream, and the corresponding replaceable information data box can be further simplified by limiting the type.
  • If the code stream to which the current media track/current media item belongs is a point cloud code stream, and the point cloud code stream is encapsulated in a multi-track encapsulation manner, the value of the multi-component flag field is a second preset value (such as "1").
  • That is, when the point cloud media is encapsulated in a multi-track manner, the value of multi_components_flag is 1 and the value of components_ref_type is 1.
  • multiple media tracks corresponding to a point cloud code stream can be organized through a specific track group. Based on the above content, the judgment when the field values in the replaceable information data box are different can be omitted, thereby simplifying the content in the replaceable data box, improving the efficiency of organizing the media tracks corresponding to the multiple replaceable code streams, and saving the resources spent on searching for media tracks/media projects.
  • the media files of immersive media are obtained in different ways.
  • a decoding device can receive a complete media file of immersive media, which encapsulates multiple replaceable code streams and relationship indication information.
  • If the immersive media is transmitted in a streaming transmission mode, obtaining the media files of the immersive media includes the following steps: obtaining transmission signaling of the immersive media, wherein the transmission signaling includes description information of the relationship indication information; and obtaining the media files of the immersive media according to the transmission signaling.
  • the transmission signaling may be DASH signaling, MPD signaling, etc., and the transmission signaling can be acquired by the decoding device in the form of a signaling description file.
  • the description information is used to define the N code streams with a replaceable relationship indicated by the relationship indication information.
  • the description information includes N preselection identifiers (Preselection)
  • each preselection identifier is used to represent one of the N code streams
  • each preselection identifier corresponds to one or more adaptation sets (Adaptation), and an adaptation set represents a media track or a media item in the code stream represented by that preselection identifier
  • the transmission signaling is DASH signaling
  • the preselection identifier Preselection included in the description information can be defined by the Preselection tool in the DASH signaling for different components of the same code stream (such as different media tracks/different media items), and the Preselection is used to represent one of the N code streams.
  • the N preselection identifiers are different to represent different code streams, for example, the preselection identifier Preselection1 corresponds to code stream 1, and the preselection identifier Preselection2 corresponds to code stream 2.
  • the code stream represented by a preselection identifier can be represented by the combination of the adaptation sets (Adaptation) corresponding to that preselection identifier.
  • An adaptation set contains one identifier.
  • the number of adaptation sets (Adaptation) or the number of representations (Representation) corresponding to a preselection identifier is equal to the number of media tracks/media items of the code stream represented by the preselection identifier.
  • if the preselection identifier corresponds to a single representation/adaptation set, the code stream represented by the preselection identifier consists of one media track or one media item.
  • the decoding device can request the corresponding media file segments according to its own performance and the presentation requirements for the immersive media, then decapsulate and decode the obtained media file segments, and present the immersive media.
  • the performance of the decoding device includes but is not limited to: the encoding methods supported by the decoding device, the bandwidth supported by the decoding device, the processing power of the central processing unit (CPU) of the decoding device, the rendering power of the graphics processing unit (GPU) of the decoding device, etc.
  • the presentation requirements include but are not limited to: presentation clarity, presentation resolution, bit rate, size, viewing angle, viewing direction, etc.
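The selection described above can be illustrated with a minimal Python sketch. It assumes each replaceable code stream is summarized by a codec name and a bitrate; the `StreamVersion` type, the `pick_stream` helper, and all field names and values are hypothetical and do not come from the application. The sketch only considers supported encoding methods and bandwidth, not CPU, GPU, or viewing-related requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamVersion:
    """One replaceable code stream; all field names are hypothetical."""
    name: str
    codec: str           # encoding type of this version
    bitrate_kbps: int    # bandwidth needed to stream this version


def pick_stream(versions, supported_codecs, bandwidth_kbps):
    """Return the highest-bitrate version the device can decode and afford."""
    usable = [
        v for v in versions
        if v.codec in supported_codecs and v.bitrate_kbps <= bandwidth_kbps
    ]
    if not usable:
        return None
    return max(usable, key=lambda v: v.bitrate_kbps)


versions = [
    StreamVersion("stream1", "codecA", 8000),
    StreamVersion("stream2", "codecA", 4000),
    StreamVersion("stream3", "codecB", 2000),
]
choice = pick_stream(versions, supported_codecs={"codecA"}, bandwidth_kbps=5000)
print(choice.name if choice else "no usable stream")  # prints: stream2
```

Because the versions are replaceable, the device is free to pick whichever one its capabilities allow; a real client would weigh further criteria such as resolution or viewing direction.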
  • S502 Decode the media file according to the relationship indication information to present immersive media.
  • decoding the media file according to the relationship indication information to present the immersive media may include the following steps: first, according to the replaceable relationship indicated by the relationship indication information, determining the code stream to be presented from the N replaceable code streams; and then decoding and presenting the code stream to be presented.
  • the decoding device can obtain a complete media file.
  • the immersive media is a time-series immersive media
  • the media file contains M media tracks corresponding to N code streams
  • the relationship indication information is set in the corresponding media track.
  • the decoding device can first decapsulate the media file to obtain M media tracks, and then determine the code stream to be presented for decoding according to the replaceable relationship indicated by the relationship indication information set in the media track.
  • determining the code stream to be presented here specifically means using the relationship indication information to screen out all media tracks that represent that code stream, which may be a combination of multiple media tracks or a single media track.
  • the corresponding media track can also be determined based on the device performance of the decoding device and the presentation requirements of the immersive media.
  • the code stream to be presented is decoded, specifically, the selected media track is decoded, so as to present the immersive media. It can be understood that if the code stream to be presented is represented by a media item, then the object of decoding is specifically the media item.
  • the decoding device may obtain the media file of the immersive media according to the transmission signaling.
  • the media file is obtained in the form of a fragment.
  • the fragment of the media file includes one or more media tracks, which can represent the code streams required to be presented in N code streams.
  • the one or more media tracks and the relationship indication information in the media tracks can be obtained, and then the media tracks can be further decoded and the immersive media can be presented according to the relationship indication information. If the fragment of the media file includes a media item, the media item can be decoded to present the immersive media.
  • the data processing method of immersive media can obtain a media file of immersive media, the immersive media includes N replaceable code streams, the media file includes relationship indication information, and the relationship indication information is used to indicate the replaceable relationship between the N code streams, N is an integer greater than 1; according to the relationship indication information, the media file is decoded to present the immersive media.
  • the relationship indication information can clearly indicate the replaceable relationship between any two code streams of the immersive media, and then guide the accurate decoding and presentation of the immersive media based on the replaceable relationship, thereby improving the presentation effect of the immersive media.
  • the relationship indication information can be set in the corresponding media track/media item, which can not only indicate the replaceable relationship at the code stream level, but also further support the combination relationship between multiple media tracks (or multiple media items) corresponding to the same code stream and/or the shared ownership relationship of media tracks (or media items) shared by different code streams.
  • the media track/media item corresponding to the code stream to be presented can be accurately obtained according to the relationship indication information, thereby realizing the presentation of any version of the immersive media content and improving the presentation effect of the immersive media.
  • FIG. 6a is a flow chart of an immersive media data processing method provided in an embodiment of the present application.
  • the immersive media data processing method may be executed by a service device 202 in an immersive media data processing system.
  • the method includes the following steps S601-S602:
  • S601 Encode the immersive media to obtain N replaceable code streams.
  • the service device may encode the immersive media using different encoding methods to obtain N replaceable code streams of the immersive media.
  • the N code streams have the same content and different encoding types.
  • the immersive media may also be encoded according to different quality standards to obtain N replaceable code streams of the immersive media.
  • the N code streams have the same content and different qualities.
  • the N replaceable code streams can be regarded as different versions of code streams, and any two of the N code streams have a replaceable relationship. Based on the replaceable relationship, different code streams are allowed to replace each other during presentation.
  • the replaceability here includes but is not limited to any one or more of the following: quality replaceability, encoding type replaceability, and content replaceability.
  • a repeated geometric track can be omitted in the media file 610, that is, only one geometric track Track1 is included in the media file, and relationship indication information can be added to Track1 to identify it as the shared media track.
  • a media file of immersive media can be formed.
  • relationship indication information may be added to the media item Mi.
  • the relationship indication information may be used to indicate the replaceable relationship between the code stream i and other code streams.
  • the other code streams refer to the code streams other than the code stream i among the N code streams.
  • relationship indication information may be added to any of the multiple media items, and the relationship indication information may also be used to indicate the association relationship between the media item and other media items of the code stream i, and the relationship indication information includes an indication of the association relationship.
  • the relationship indication information added to the corresponding media item may not only be used to indicate the replaceable relationship between the code stream i to which the media item belongs and other code streams, but may also indicate the association relationship between the media item and other media items belonging to the code stream i, so as to indicate that the combination of multiple media items may represent the code stream i.
  • the immersive media is encoded to obtain N code streams of the immersive media, and the N code streams have a replaceable relationship.
  • Relationship indication information can be generated based on the replaceable relationship, and the relationship indication information and the N code streams are encapsulated to obtain a media file of the immersive media. It can be seen that in the encoding process of the immersive media, relationship indication information can be added to the media file to indicate the replaceable relationship between different code streams. In this way, the replaceable relationship is indicated at the code stream level through the relationship indication information.
  • the replaceable relationship between any two replaceable code streams can be clearly indicated through the relationship indication information set in the corresponding media track/media item.
  • the scheme is sufficiently compatible with any number of replaceable code streams of the immersive media, and is highly versatile and scalable.
  • the decoding end can flexibly organize the media tracks/media items corresponding to the code stream, accurately select the corresponding media tracks/media items, and then guide the decoding and accurate presentation of the immersive media, thereby improving the presentation effect of the immersive media.
  • the service device can obtain the immersive media and encode the immersive media to obtain N replaceable code streams.
  • the service device generates relationship indication information according to the replaceable relationship between the N code streams.
  • the service device encapsulates the relationship indication information and the N code streams to obtain a media file of the immersive media.
  • the immersive media is point cloud media
  • the bitstream of the immersive media is the point cloud bitstream
  • N is 3.
  • Encoding the point cloud media can obtain three point cloud bitstreams, which are recorded as bitstream 1, bitstream 2, and bitstream 3, and the three point cloud bitstreams are replaceable bitstreams with the same content but different qualities.
  • the geometric data of bitstream 1 and bitstream 2 are obtained in exactly the same encoding method, and bitstream 1 and bitstream 2 are encapsulated in a component-based multi-track encapsulation method.
  • a certain attribute (such as reflectivity) in bitstream 1 and bitstream 2 is also obtained in exactly the same encoding method, and bitstream 3 is encapsulated in a single-track encapsulation method.
  • A schematic diagram of the encapsulation result of the media tracks contained in the media file is shown in Figure 7.
  • stream 1 and stream 2 share Track1 geometry track (GeometryTrack) and Track4 attribute track (AttributeTrack), and stream 1 and stream 2 have different color attribute tracks (AttributeTrack(color)).
  • the combination of Track1, Track2 and Track4 can represent stream 1, the combination of Track1, Track3 and Track4 can represent stream 2, and Track5 can represent stream 3.
  • Track5 is the media track obtained by encapsulating stream 3 in the single-track encapsulation manner.
  • the resulting track contains geometry and attribute data (Geometry+Attributes Track).
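The track layout of this example can be reproduced with a short Python sketch of the de-duplication idea: each identical component is stored once and remembers which code streams own it. The `build_tracks` helper, the byte strings standing in for encoded component data, and the dictionary layout are all illustrative assumptions, not the encapsulation procedure defined in the application.

```python
def build_tracks(component_order, multi_track_streams, single_track_streams):
    """Assign track ids so that identical components are stored only once.

    component_order: component names in the order tracks should be numbered.
    multi_track_streams: {stream id: {component name: encoded payload}}.
    single_track_streams: ids of streams encapsulated into a single track.
    """
    tracks = {}    # track id -> {"component": name, "streams": [owning stream ids]}
    seen = {}      # (component name, payload) -> track id already assigned
    next_id = 1
    for component in component_order:
        for stream_id, components in multi_track_streams.items():
            key = (component, components[component])
            if key not in seen:
                seen[key] = next_id
                tracks[next_id] = {"component": component, "streams": []}
                next_id += 1
            tracks[seen[key]]["streams"].append(stream_id)
    for stream_id in single_track_streams:
        tracks[next_id] = {"component": "geometry+attributes", "streams": [stream_id]}
        next_id += 1
    return tracks


# Placeholder payloads: streams 1 and 2 share geometry and reflectance data.
multi = {
    1: {"geometry": b"G", "color": b"C-high", "reflectance": b"R"},
    2: {"geometry": b"G", "color": b"C-low", "reflectance": b"R"},
}
tracks = build_tracks(["geometry", "color", "reflectance"], multi, [3])
for track_id, info in tracks.items():
    print(f"Track{track_id}: {info['component']} -> streams {info['streams']}")
```

With these inputs the sketch yields five tracks: Track1 (geometry) and Track4 (reflectance) owned by streams 1 and 2, Track2 and Track3 as the two color tracks, and Track5 for stream 3.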
  • For Track1, two alternative information data boxes (AlternativeInfoBox) are set to indicate that Track1 is shared by two code streams (i.e., code stream 1 and code stream 2), and the values of the track group identification field track_group_id in the two data boxes are different.
  • a play track group data box PlayoutTrackGroupBox is also set to indicate that it needs to be played in conjunction with other media tracks.
  • the PlayoutTrackGroupBox contains the track group type field track_group_type and the track group identification field track_group_id, and the values are the same as the values of the corresponding fields in the alternative information data box.
  • the values of the track group identification field of the media tracks belonging to the same track group are the same.
  • Whether a single-track encapsulation method or a multi-track encapsulation method is adopted can be known by the value of the multi-component flag field multi_components_flag (such as a value of 0 indicates single-track encapsulation, and a value of 1 indicates multi-track encapsulation).
  • PlayoutTrackGroupBox is set, which also indicates that they need to be played in conjunction with other media tracks.
  • the specific way of combining is indicated by track_group_id.
  • the track_group_id of Track1, Track2 and Track4 is the same, indicating that these tracks need to be played in conjunction.
  • Track1, Track3 and Track4 need to be played in conjunction in the same way.
  • Track5 only sets one alternative information data box AlternativeInfoBox, and the alternative group identifier is the same as the alternative group identifier contained in the AlternativeInfoBox in Track1,
  • the service device can transmit the media files of the immersive media to the decoding device.
  • the transmission of media files includes the following two transmission methods:
  • the service device may directly transmit the complete media file F to the decoding device, where the media file includes the relationship indication information.
  • the service device may adopt a streaming transmission method to transmit one or more segments Fs of the media file (eg, one or more media tracks of the media file) to the decoding device.
  • During streaming transmission, the service device generates description information of the relationship indication information according to the replaceable relationship between the code streams. The description information can define the N code streams with a replaceable relationship indicated by the relationship indication information. The service device then sends the description information to the decoding device through transmission signaling, which can be in the form of a signaling description file.
  • the decoding device can determine the replaceable relationship between the code streams according to the description information of the relationship indication information, and then obtain the code stream to be presented according to the transmission signaling.
  • the service device can generate a signaling description file based on the sharing and replaceable relationship between the geometric track and the attribute track.
  • the signaling description file contains description information of the relationship indication information.
  • the existing Preselection tool in the DASH signaling can be used to define different components of the same code stream (corresponding to the media track here) as a Preselection, and add the same encoding identifier @gpccId to the same point cloud content to represent different versions of the point cloud code stream.
  • the tracks Track1 to Track5 contained in the media file correspond to the adaptation set/representation Adaptation1/Representation1 to Adaptation5/Representation5, respectively.
  • the description information of the relationship indication information is as follows:
  • Adaptation1 is the adaptation set corresponding to track1
  • Adaptation2 is the adaptation set corresponding to track2
  • Adaptation3 is the adaptation set corresponding to track3
  • Adaptation4 is the adaptation set corresponding to track4
  • Adaptation5 is the adaptation set corresponding to track5.
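As an illustration of how a client might read such description information, the Python sketch below parses a simplified, invented XML fragment. It is not the listing from the application, it omits XML namespaces, and it is not a conformant MPD. `preselectionComponents` is the DASH attribute that lists the components of a Preselection; carrying `gpccId` on the Preselection element is an assumption based on the @gpccId identifier mentioned above.

```python
import xml.etree.ElementTree as ET

# Invented, simplified fragment (no namespaces, not a conformant MPD).
MPD_FRAGMENT = """
<Period>
  <AdaptationSet id="1"/>
  <AdaptationSet id="2"/>
  <AdaptationSet id="3"/>
  <AdaptationSet id="4"/>
  <AdaptationSet id="5"/>
  <Preselection id="1" preselectionComponents="1 2 4" gpccId="1"/>
  <Preselection id="2" preselectionComponents="1 3 4" gpccId="1"/>
  <Preselection id="3" preselectionComponents="5" gpccId="1"/>
</Period>
"""


def parse_preselections(xml_text):
    """Map each Preselection id to the adaptation set ids it combines."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for pre in root.iter("Preselection"):
        components = [int(x) for x in pre.get("preselectionComponents").split()]
        result[int(pre.get("id"))] = components
    return result


preselections = parse_preselections(MPD_FRAGMENT)
print(preselections)  # {1: [1, 2, 4], 2: [1, 3, 4], 3: [5]}
```

Each Preselection then stands for one version of the point cloud code stream, and its component list names the adaptation sets (tracks) that must be combined to present it.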
  • the decoding device receives a media file of the immersive media, wherein the media file includes relationship indication information.
  • the decoding device decodes the media file according to the relationship indication information to present the immersive media.
  • the decoding device may receive the complete media file F, or obtain a segment Fs of the media file based on the transmission signaling.
  • the point cloud file F1 in the above example of the media file is used as an example for description.
  • the decoding device receives a complete point cloud file F1, which contains all media tracks corresponding to N replaceable code streams.
  • the decoding device can first decapsulate the point cloud file to obtain the media tracks contained therein, and then, based on the data box information set in the media tracks, it can be learned that there are three options for representing the code stream: (1) Track1+Track2+Track4; (2) Track1+Track3+Track4; (3) Track5.
  • the code stream to be presented can be selected, specifically, the track in the point cloud file F1 is selected, and then, based on the corresponding metadata information in the point cloud file, the selected track is decoded to present the point cloud media.
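The way the three options can be derived from the data box information is sketched below in Python. The `AltInfo` record is a simplified stand-in for an AlternativeInfoBox: `multi_components_flag` and `track_group_id` are field names from the text, while `alternative_group_id` and all numeric values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AltInfo:
    """Simplified stand-in for one AlternativeInfoBox (values are made up)."""
    alternative_group_id: int              # hypothetical name for the group identifier
    multi_components_flag: int             # 0: single-track, 1: multi-track
    track_group_id: Optional[int] = None   # set only for multi-track streams


# Boxes carried by Track1..Track5 in the example; shared tracks carry two boxes.
TRACK_BOXES = {
    1: [AltInfo(100, 1, 11), AltInfo(100, 1, 12)],
    2: [AltInfo(100, 1, 11)],
    3: [AltInfo(100, 1, 12)],
    4: [AltInfo(100, 1, 11), AltInfo(100, 1, 12)],
    5: [AltInfo(100, 0)],
}


def stream_options(track_boxes):
    """Return the track combinations that each represent one code stream."""
    groups = defaultdict(list)   # (alternative group, track group) -> track ids
    single = []                  # single-track streams, one track each
    for track_id in sorted(track_boxes):
        for box in track_boxes[track_id]:
            if box.multi_components_flag == 1:
                key = (box.alternative_group_id, box.track_group_id)
                groups[key].append(track_id)
            else:
                single.append([track_id])
    return sorted(list(groups.values()) + single)


options = stream_options(TRACK_BOXES)
print(options)  # [[1, 2, 4], [1, 3, 4], [5]]
```

Tracks that share a `track_group_id` form one code stream, and a track whose box has `multi_components_flag` equal to 0 is a code stream on its own; all options in the same replaceable group can stand in for one another.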
  • when the decoding device needs to switch between different versions of the code stream, it can directly decode the code stream to be presented, which makes switching more efficient and improves the presentation efficiency when switching immersive media.
  • the decoding device first receives the signaling description file and parses the signaling description file to obtain the description information of the relationship indication information. Based on the description information, it can be known that the decoding device has the following options for representing the bitstream:
  • Representation1 is the representation corresponding to track1
  • Representation2 is the representation corresponding to track2
  • Representation3 is the representation corresponding to track3
  • Representation4 is the representation corresponding to track4
  • Representation5 is the representation corresponding to track5.
  • Representation1+Representation2+Representation4 corresponds to bitstream 1
  • Representation1+Representation3+Representation4 corresponds to bitstream 2
  • Representation5 corresponds to bitstream 3.
  • the decoding device can request the corresponding transport stream (corresponding to one or more tracks in the point cloud file, i.e., file fragments) Fs based on the transmission signaling according to the device performance and presentation requirements. Then, the decoding device can decapsulate the received file fragments and decode the media tracks to finally present the point cloud media. In this way, the decoding device does not need to receive all the media files, but can accurately obtain the required code stream based on the transmission signaling, thereby reducing the resource consumption of the immersive media presentation.
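One way to picture the request logic is the following Python sketch, which uses the representation-to-bitstream mapping listed above. The `plan_switch` helper is hypothetical; it only computes which representations a client would keep, start, or stop requesting when it moves from one replaceable bitstream to another.

```python
# Mapping taken from the example: bitstream id -> representation ids.
STREAM_TO_REPRESENTATIONS = {1: {1, 2, 4}, 2: {1, 3, 4}, 3: {5}}


def plan_switch(current_stream, target_stream):
    """Return (keep, add, drop) representation ids for a bitstream switch."""
    current = STREAM_TO_REPRESENTATIONS[current_stream]
    target = STREAM_TO_REPRESENTATIONS[target_stream]
    keep = sorted(current & target)   # shared tracks stay in use
    add = sorted(target - current)    # newly required tracks
    drop = sorted(current - target)   # tracks no longer needed
    return keep, add, drop


print(plan_switch(1, 2))  # ([1, 4], [3], [2])
print(plan_switch(2, 3))  # ([], [5], [1, 3, 4])
```

Switching between bitstream 1 and bitstream 2 changes only the color representation, because the geometry and the shared attribute representation are common to both.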
  • a service device may obtain immersive media, and encode the immersive media to obtain multiple replaceable code streams; then, relationship indication information is generated based on the replaceable relationship between the code streams, and the relationship indication information and the code streams are encapsulated to obtain a media file of the immersive media, and the media file is transmitted to a decoding device.
  • the decoding device may receive a media file, decode the immersive media based on the relationship indication information contained in the media file, and present the immersive media.
  • the replaceable relationship between different code streams of the immersive media can be effectively indicated through the relationship indication information, thereby guiding the decoding end to accurately present the immersive media based on its own needs, thereby improving the presentation accuracy and presentation effect of the immersive media.
  • the embodiment of the present application is more concise and efficient by encapsulating all media tracks corresponding to N code streams of immersive media in one media file, and indicating the replaceable relationship between each code stream based on relationship indication information.
  • for identical media tracks, this scheme retains in the media file only one media track shared by at least two code streams, and indicates the relationship between that media track and the code streams through the relationship indication information. This saves storage resources and allows the media file corresponding to a code stream to be found accurately for decoding. Furthermore, compared with an approach that encapsulates different code streams in one media file, omits repeated media tracks, and indicates the replaceable relationship between media tracks, this scheme uses the relationship indication information to indicate the replaceable relationship at the code stream level rather than at the track level or item level.
  • the corresponding media track/media item can be accurately selected based on the relationship indication information, and then the corresponding bitstream can be decoded to present the immersive media, which has a better presentation effect and better versatility.
  • FIG 8a is a schematic diagram of the structure of an immersive media data processing device provided in an embodiment of the present application.
  • the immersive media data processing device can be set in the computer device provided in the embodiment of the present application, and the computer device can be the decoding device mentioned in the above method embodiment.
  • the immersive media data processing device shown in Figure 8a can be a computer program (including program code) running in the computer device, and the immersive media data processing device can be used to execute some or all of the steps in the method embodiment shown in Figure 5.
  • the immersive media data processing device may include the following units:
  • the acquisition unit 801 is used to acquire a media file of the immersive media; the immersive media includes N replaceable code streams; the media file includes relationship indication information, the relationship indication information is used to indicate the replaceable relationship between the N code streams, and N is an integer greater than 1;
  • the processing unit 802 is configured to decode the media file according to the relationship indication information to present the immersive media.
  • the immersive media is sequential immersive media; N code streams are encapsulated as M media tracks in the media file; M is an integer and M is greater than or equal to N; and the relationship indication information is set in the media track.
  • any one of the N code streams is represented as code stream i, where i is a positive integer and i is less than or equal to N; code stream i is encapsulated into a media track Mi among the M media tracks; and relationship indication information is set in the media track Mi.
  • any one of the N code streams is represented as code stream i, where i is a positive integer and i is less than or equal to N; code stream i is encapsulated into multiple media tracks among M media tracks; relationship indication information is set in media track Mi ; media track Mi refers to any one of the multiple media tracks into which code stream i is encapsulated.
  • the relationship indication information is further used to indicate the association relationship between the media track Mi and other media tracks among the multiple media tracks except the media track Mi ; the association relationship is used to indicate that the media track Mi and other media tracks belong to the same code stream i.
  • any two code streams among N code streams are respectively represented as code stream i and code stream j, where i and j are positive integers and both are less than or equal to N; code stream i is encapsulated into a first plurality of media tracks among M media tracks; code stream j is encapsulated into a second plurality of media tracks among M media tracks; if the first plurality of media tracks and the second plurality of media tracks both include media track Mij , the relationship indication information is also used to indicate a shared ownership relationship of media track Mij ; the shared ownership relationship is used to indicate that media track Mij is a media track shared by code stream i and code stream j.
  • the immersive media is non-sequential immersive media; N code streams are encapsulated as P media items in the media file; P is an integer and P is greater than or equal to N; and the relationship indication information is set in the media item.
  • any one of the N code streams is represented as code stream i, where i is a positive integer and i is less than or equal to N; code stream i is encapsulated into multiple media items among P media items; relationship indication information is set in media item Pi ; media item Pi refers to any one of the multiple media items into which code stream i is encapsulated.
  • the relationship indication information is further used to indicate the association relationship between the media item Pi and other media items among the multiple media items except the media item Pi ; the association relationship is used to indicate that the media item Pi and other media items belong to the same code stream i.
  • the relationship indication information is also used to indicate the association relationship between the media item Pi and other media items corresponding to the bitstream i;
  • other media items refer to media items other than media item Pi among the multiple media items into which codestream i is encapsulated; the association relationship is used to indicate that media item Pi and other media items belong to the same codestream i.
  • any two code streams among N code streams are respectively represented as code stream i and code stream j, where i and j are positive integers and both are less than or equal to N; code stream i is encapsulated into a first plurality of media items among P media items; code stream j is encapsulated into a second plurality of media items among P media items; if the first plurality of media items and the second plurality of media items both include media item Pij, the relationship indication information is also used to indicate a shared ownership relationship of media item Pij; the shared ownership relationship is used to indicate that media item Pij is a media item shared by code stream i and code stream j.
  • N code streams are encapsulated as M media tracks in the media file; M is an integer and M is greater than or equal to N; wherein, when multiple media tracks in the M media tracks need to be played together, the multiple media tracks that need to be played together belong to the same playback track group;
  • the N code streams are encapsulated as P media items in the media file; P is an integer and P is greater than or equal to N; wherein, when there are multiple media items among the P media items that need to be played jointly, the multiple media items that need to be played jointly belong to the same playback entity group.
  • the N code streams with a replaceable relationship belong to the same replaceable group, and different code streams in the same replaceable group are allowed to replace each other when presented;
  • the relationship indication information includes a replaceable information data box;
  • the replaceable information data box contains information about the replaceable group to which the corresponding code stream of the current media track belongs;
  • the replaceable information data box contains information about the replaceable group to which the corresponding bitstream of the current media item belongs;
  • the current media track refers to the media track being decoded; and the current media item refers to the media item being decoded.
  • the replaceable information data box includes a replaceable group identification flag field and a replaceable group identification field;
  • the replaceable group identification flag field is used to indicate whether the replaceable information data box in the current media track indicates the replaceable group identifier to which the code stream corresponding to the current media track belongs; when the value of the replaceable group identification flag field is a first preset value, it means that the replaceable information data box in the current media track indicates the replaceable group identifier to which the code stream corresponding to the current media track belongs; when the value of the replaceable group identification flag field is a second preset value, it means that the replaceable information data box in the current media track does not indicate the replaceable group identifier to which the code stream corresponding to the current media track belongs; the replaceable group identification field is used to indicate the replaceable group identifier to which the code stream corresponding to the current media track belongs;
  • the replaceable group identification flag field is used to indicate whether the replaceable information data box in the current media item indicates the replaceable group identifier to which the code stream corresponding to the current media item belongs; when the value of the replaceable group identification flag field is a first preset value, it means that the replaceable information data box in the current media item indicates the replaceable group identifier to which the code stream corresponding to the current media item belongs; when the value of the replaceable group identification flag field is a second preset value, it means that the replaceable information data box in the current media item does not indicate the replaceable group identifier to which the code stream corresponding to the current media item belongs; the replaceable group identification field is used to indicate the replaceable group identifier to which the code stream corresponding to the current media item belongs.
  • the relationship indication information is also used to indicate the shared ownership relationship of the current media track or the shared ownership relationship of the current media item;
  • the replaceable information data box includes a multiple replacement code stream flag field and a code stream quantity field;
  • the multiple replacement code stream flag field is used to indicate whether the current media track belongs to multiple code streams; when the value of the multiple replacement code stream flag field is a first preset value, it indicates that the current media track belongs to only one code stream; when the value of the multiple replacement code stream flag field is a second preset value, it indicates that the current media track belongs to multiple code streams; the code stream quantity field is used to indicate the number of code streams to which the current media track belongs;
  • the multiple replacement code stream flag field is used to indicate whether the current media item belongs to multiple code streams; when the value of the multiple replacement code stream flag field is a first preset value, it indicates that the current media item belongs to only one code stream; when the value of the multiple replacement code stream flag field is a second preset value, it indicates that the current media item belongs to multiple code streams; the code stream quantity field is used to indicate the number of code streams to which the current media item belongs.
  • the relationship indication information is also used to indicate the shared ownership relationship of the current media track or the shared ownership relationship of the current media item
  • if the current media track contains only one replaceable information data box, it indicates that the current media track belongs to only one code stream; if the current media track contains multiple replaceable information data boxes, it indicates that the current media track belongs to multiple code streams, and the number of replaceable information data boxes in the current media track should be equal to the number of code streams to which the current media track belongs; or,
  • if the current media item contains only one replaceable information data box, it indicates that the current media item belongs to only one code stream; if the current media item contains multiple replaceable information data boxes, it indicates that the current media item belongs to multiple code streams, and the number of replaceable information data boxes in the current media item should be equal to the number of code streams to which the current media item belongs.
  • the relationship indication information is further used to indicate the association relationship between the current media track and other media tracks belonging to the same code stream as the current media track, or to indicate the association relationship between the current media item and other media items belonging to the same code stream as the current media item;
  • the replaceable information data box includes a component reference type field, which is used to indicate the association between the current media track and other media tracks belonging to the same code stream as the current media track, or to indicate the association between the current media item and other media items belonging to the same code stream as the current media item;
  • when the value of the component reference type field is a first preset value, it indicates that the current media track is associated with other media tracks belonging to the same code stream as the current media track through a track reference, and the replaceable information data box further includes a track reference type field, which is used to indicate the type of the track reference;
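  • when the value of the component reference type field is a second preset value, it indicates that the current media track is associated with other media tracks belonging to the same code stream as the current media track through a track group, and the replaceable information data box further includes a track group type field and a track group identifier field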
  • the track group type field is used to indicate the type of the track group to which the current media track belongs
  • the track group identifier field is used to indicate the identifier of the track group to which the current media track belongs
  • when the value of the component reference type field is a third preset value, it indicates that the current media item is associated with other media items belonging to the same code stream as the current media item through an item reference, and the replaceable information data box further includes an item reference type field, which is used to indicate the type of the item reference;
  • when the value of the component reference type field is a fourth preset value, it indicates that the current media item is associated with other media items belonging to the same code stream as the current media item through an entity group, and the replaceable information data box further includes an entity group type field and an entity group identification field; the entity group type field is used to indicate the type of the entity group to which the current media item belongs, and the entity group identification field is used to indicate the identifier of the entity group to which the current media item belongs.
  • the replaceable information data box further includes a multi-component flag field
  • the multi-component flag field is used to indicate whether the code stream belonging to the current media track is encapsulated into multiple media tracks; when the value of the multi-component flag field is a first preset value, it indicates that the code stream belonging to the current media track is encapsulated into one media track, and the current media track is the media track to which the code stream belonging to the current media track is encapsulated; when the value of the multi-component flag field is a second preset value, it indicates that the code stream belonging to the current media track is encapsulated into multiple media tracks, and the current media track is any one of the multiple media tracks to which the code stream belonging to the current media track is encapsulated;
  • the multi-component flag field is used to indicate whether the code stream to which the current media item belongs is encapsulated into multiple media items; when the value of the multi-component flag field is a first preset value, it means that the code stream is encapsulated into one media item, and the current media item is that media item; when the value of the multi-component flag field is a second preset value, it means that the code stream is encapsulated into multiple media items, and the current media item is any one of those media items;
  • the value of the multi-component flag field is the second preset value.
  • the immersive media is transmitted in a streaming transmission manner
  • the acquisition unit 801 is specifically configured to: acquire transmission signaling of the immersive media; and acquire a media file of the immersive media according to the transmission signaling.
  • the transmission signaling includes description information of the relationship indication information, the description information is used to define N code streams with replaceable relationships indicated by the relationship indication information; the description information includes N pre-selected identifiers, each pre-selected identifier is used to represent one of the N code streams; each pre-selected identifier has the same coding identifier; each pre-selected identifier corresponds to one or more adaptation sets, and one adaptation set represents a media track or a media item in the code stream represented by each pre-selected identifier; or each pre-selected identifier corresponds to one or more representations, and one representation represents a media track or a media item in the code stream represented by each pre-selected identifier.
  • the processing unit 802 is specifically used to: determine the code stream to be presented from the N replaceable code streams according to the replaceable relationship indicated by the relationship indication information; decode and present the code stream to be presented; wherein the immersive media includes any one or more of the following: volumetric media, volumetric video media, multi-view video media, subtitle media, and audio media.
  • FIG 8b is a schematic diagram of the structure of an immersive media data processing device provided in an embodiment of the present application.
  • the immersive media data processing device can be set in the computer device provided in the embodiment of the present application, and the computer device can be the service device mentioned in the above method embodiment.
  • the immersive media data processing device shown in Figure 8b can be a computer program (including program code) running in a computer device, and the immersive media data processing device can be used to execute some or all of the steps in the method embodiment shown in Figure 6a.
  • the immersive media data processing device may include the following units:
  • the encoding unit 811 is used to encode the immersive media to obtain N replaceable code streams;
  • the processing unit 812 is used to generate relationship indication information according to the replaceable relationship between the N code streams, and the relationship indication information is used to indicate the replaceable relationship between the N code streams;
  • the processing unit 812 is further configured to encapsulate the relationship indication information and the N code streams to obtain a media file of the immersive media.
  • the encoding end of the immersive media can encode the immersive media to obtain N code streams of the immersive media, and the N code streams have an interchangeable relationship.
  • relationship indication information for indicating the interchangeable relationship can be generated, and the relationship indication information and the N code streams can be encapsulated to obtain a media file of the immersive media. It can be seen that in the encoding process of the immersive media, relationship indication information can be added to the media file to indicate the interchangeable relationship between different code streams. In this way, the purpose of indicating the interchangeable relationship at the code stream level is achieved through the relationship indication information. Based on the relationship indication information, the decoding of the immersive media can be guided to be accurately presented, thereby improving the presentation effect of the immersive media.
  • the embodiment of the present application also provides a structural diagram of a computer device, and the structural diagram of the computer device can be seen in Figure 9;
  • the computer device may include: a processor 901, an input device 902, an output device 903 and a memory 904.
  • the processor 901, the input device 902, the output device 903 and the memory 904 are connected via a bus.
  • the memory 904 is used to store a computer program, and the computer program includes program instructions.
  • the processor 901 is used to execute the program instructions stored in the memory 904.
  • the computer device may be the above-mentioned decoding device; in this embodiment, the processor 901 executes the above-mentioned immersive media data processing method by running the executable program code in the memory 904 .
  • the embodiment of the present application also provides a computer-readable storage medium, and a computer program is stored in the computer-readable storage medium, and the computer program includes program instructions.
  • when the processor executes the above program instructions, it can execute the method in the embodiments corresponding to Figures 5 and 6a above, which is therefore not repeated here.
  • the program instructions can be deployed on a computer device, or executed on multiple computer devices located at one location, or, executed on multiple computer devices distributed in multiple locations and interconnected by a communication network.
  • a computer program product comprising a computer program, the computer program being stored in a computer-readable storage medium.
  • a processor of a computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program, so that the computer device can execute the method in the embodiments corresponding to FIG. 5 and FIG. 6a above, and therefore, will not be described in detail here.
  • the storage medium can be a disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM), etc.
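The fields of the replaceable information data box listed above can be pictured as a small record. The following is only a sketch under stated assumptions: the class and attribute names, the enum numbering, and the `"demo"` type codes are hypothetical, since the text speaks only of "preset values" and does not give a binary layout here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ComponentReference(Enum):
    # Placeholder numbering: the text only refers to first..fourth "preset values".
    TRACK_REFERENCE = 1
    TRACK_GROUP = 2
    ITEM_REFERENCE = 3
    ENTITY_GROUP = 4


@dataclass
class ReplaceableInfoBox:
    replaceable_group_id: Optional[int]          # None when the box does not carry it
    multi_component: bool = False                # code stream spans several tracks/items
    component_reference: Optional[ComponentReference] = None
    reference_type: Optional[str] = None         # used with TRACK_REFERENCE / ITEM_REFERENCE
    group_type: Optional[str] = None             # used with TRACK_GROUP / ENTITY_GROUP
    group_id: Optional[int] = None


def describe_association(box: ReplaceableInfoBox) -> str:
    """Say how this track/item is tied to the other components of its code stream."""
    if not box.multi_component:
        return "code stream is carried by this track/item alone"
    ref = box.component_reference
    if ref is None:
        raise ValueError("a multi-component box needs a component reference type")
    if ref in (ComponentReference.TRACK_REFERENCE, ComponentReference.ITEM_REFERENCE):
        return f"{ref.name.lower()} of type {box.reference_type}"
    return f"{ref.name.lower()} {box.group_type} with id {box.group_id}"


def code_stream_count(boxes: List[ReplaceableInfoBox]) -> int:
    """In the one-box-per-code-stream variant, the box count is the code stream count."""
    return len(boxes)


# A track shared by two code streams carries two boxes ("demo" is a made-up type code).
boxes = [
    ReplaceableInfoBox(1, True, ComponentReference.TRACK_GROUP, group_type="demo", group_id=7),
    ReplaceableInfoBox(1, True, ComponentReference.TRACK_REFERENCE, reference_type="demo"),
]
```

Modeling the four association mechanisms as one enum keeps the track case (track reference, track group) and the item case (item reference, entity group) in a single structure, which mirrors how one component reference type field selects among them.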

Abstract

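Embodiments of the present application provide a data processing method and apparatus for immersive media, a computer device, a storage medium, and a program product. The method includes: obtaining a media file of immersive media, where the immersive media includes N replaceable code streams, the media file includes relationship indication information, the relationship indication information is used to indicate the replaceable relationship among the N code streams, and N is an integer greater than 1; and decoding the media file according to the relationship indication information to present the immersive media.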

Description

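Data processing method and apparatus for immersive media, computer device, storage medium, and program product
This application claims priority to Chinese Patent Application No. 202310247101.8, filed with the Chinese Patent Office on March 7, 2023 and entitled "Data processing method for immersive media and related device".
Technical Field
The present application relates to the field of audio and video technologies, and specifically to a data processing method for immersive media, a data processing apparatus for immersive media, a computer device, a computer-readable storage medium, and a computer program product.
Background
Immersive media can be encoded into mutually replaceable code streams to meet different presentation requirements. For example, two code streams with the same content but different encoding quality can replace each other; likewise, two code streams with the same content but different encoding types can replace each other. For multiple replaceable code streams, a corresponding indication must be provided to the decoding side to guide the decoding and presentation of the immersive media.
However, existing coding standards for immersive media do not indicate replaceable code streams clearly enough, which degrades the presentation of the immersive media.
Summary
Embodiments of the present application provide a data processing method and apparatus for immersive media, a computer device, a storage medium, and a program product, which can clearly indicate the replaceable relationship between code streams and improve the presentation of immersive media.
In one aspect, an embodiment of the present application provides a data processing method for immersive media, performed by a computer device, the method including:
obtaining a media file of the immersive media, where the immersive media includes N replaceable code streams, the media file includes relationship indication information, the relationship indication information is used to indicate the replaceable relationship among the N code streams, and N is an integer greater than 1; and
decoding the media file according to the relationship indication information to present the immersive media.
In one aspect, an embodiment of the present application provides another data processing method for immersive media, performed by a computer device, the method including:
encoding the immersive media to obtain N replaceable code streams;
generating relationship indication information according to the replaceable relationship among the N code streams, the relationship indication information being used to indicate the replaceable relationship among the N code streams; and
encapsulating the relationship indication information and the N code streams to obtain a media file of the immersive media.
In one aspect, an embodiment of the present application provides a data processing apparatus for immersive media, the apparatus including:
an acquisition unit, configured to obtain a media file of the immersive media, where the immersive media includes N replaceable code streams, the media file includes relationship indication information, the relationship indication information is used to indicate the replaceable relationship among the N code streams, and N is an integer greater than 1; and
a processing unit, configured to decode the media file according to the relationship indication information to present the immersive media.
In one aspect, an embodiment of the present application provides another data processing apparatus for immersive media, the apparatus including:
an encoding unit, configured to encode the immersive media to obtain N replaceable code streams;
a processing unit, configured to generate relationship indication information according to the replaceable relationship among the N code streams, the relationship indication information being used to indicate the replaceable relationship among the N code streams;
the processing unit being further configured to encapsulate the relationship indication information and the N code streams to obtain a media file of the immersive media.
In one aspect, an embodiment of the present application provides a computer device, including:
a processor, adapted to execute a computer program; and
a computer-readable storage medium storing a computer program that, when executed by the processor, implements the foregoing data processing method for immersive media.
In one aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that is loaded by a processor to perform the foregoing data processing method for immersive media.
In one aspect, an embodiment of the present application provides a computer program product including a computer program or computer instructions that, when executed by a processor, implement the foregoing data processing method for immersive media.
Brief Description of the Drawings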
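To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1a is a schematic diagram of 6DoF according to an exemplary embodiment of the present application;
FIG. 1b is a schematic diagram of 3DoF according to an exemplary embodiment of the present application;
FIG. 1c is a schematic diagram of 3DoF+ according to an exemplary embodiment of the present application;
FIG. 2 is an architecture diagram of a data processing system according to an exemplary embodiment of the present application;
FIG. 3a is a schematic diagram of an encapsulation result based on single-track encapsulation according to an exemplary embodiment of the present application;
FIG. 3b is a schematic diagram of an encapsulation result of component-based multi-track encapsulation according to an exemplary embodiment of the present application;
FIG. 3c is a schematic diagram of an encapsulation result of slice-based multi-track encapsulation according to an exemplary embodiment of the present application;
FIG. 3d is a schematic diagram of another encapsulation result of slice-based multi-track encapsulation according to an exemplary embodiment of the present application;
FIG. 4a is a flowchart of data processing for immersive media according to an exemplary embodiment of the present application;
FIG. 4b is a schematic diagram of an encapsulation result of immersive media according to an exemplary embodiment of the present application;
FIG. 5 is a schematic flowchart of a data processing method for immersive media according to an exemplary embodiment of the present application;
FIG. 6a is a schematic flowchart of another data processing method for immersive media according to an exemplary embodiment of the present application;
FIG. 6b is a schematic diagram of the content of a media file according to an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of the content of another media file according to an exemplary embodiment of the present application;
FIG. 8a is a schematic structural diagram of a data processing apparatus for immersive media according to an exemplary embodiment of the present application;
FIG. 8b is a schematic structural diagram of another data processing apparatus for immersive media according to an exemplary embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present application.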
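Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
In the present application, terms such as "first" and "second" are used to distinguish between identical or similar items whose roles and functions are basically the same. It should be understood that "first", "second", and "n-th" have no logical or temporal dependency on one another and do not limit quantity or execution order. The term "at least one" means one or more, and "multiple" means two or more; for example, multiple code streams means two or more code streams, and at least one media track means one or more media tracks.
Other technical terms involved in the present application are introduced below:
1. Immersive media
Immersive media refers to media files that can provide immersive media content, so that a viewer immersed in the media content obtains sensory experiences, such as visual and auditory experiences, as in the real world. The two Chinese terms 沉浸媒体 and 沉浸式媒体 are used interchangeably for immersive media. According to the viewer's degrees of freedom when consuming the media content, immersive media can be classified into 6DoF (Degree of Freedom) immersive media, 3DoF immersive media, and 3DoF+ immersive media.
As shown in FIG. 1a, 6DoF means that the viewer of the immersive media can translate freely along the X, Y, and Z axes; for example, the viewer can walk freely within three-dimensional 360-degree virtual reality (VR) content.
Similar to 6DoF, there are also 3DoF and 3DoF+ production techniques. FIG. 1b is a schematic diagram of 3DoF according to an embodiment of the present application. As shown in FIG. 1b, 3DoF means that the viewer is fixed at the center point of a three-dimensional space, and the viewer's head rotates about the X, Y, and Z axes to view the pictures provided by the media content. FIG. 1c is a schematic diagram of 3DoF+ according to an embodiment of the present application. As shown in FIG. 1c, 3DoF+ means that when the virtual scene provided by the immersive media has certain depth information, the viewer's head can, on the basis of 3DoF, move within a limited space to view the pictures provided by the media content.
According to temporal characteristics, immersive media includes timed immersive media and non-timed immersive media. The signals in timed immersive media have a temporal order, whereas the signals in non-timed immersive media do not.
According to signal characteristics, immersive media includes but is not limited to volumetric media, volumetric video media, multi-view video media, subtitle media, and audio media. Volumetric media refers to media with three-dimensional content; for example, volumetric media may be point cloud media (a typical kind of 6DoF immersive media).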
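2. Replaceable relationship
Immersive media can be encoded into multiple replaceable code streams. Different replaceable code streams have a replaceable relationship with one another, meaning that they can substitute for each other. Because of this relationship, the N replaceable code streams are allowed to replace one another at presentation time. Different replaceable code streams may have the same content but different quality, or the same content but different encoding types. For example, code streams of different resolutions obtained by encoding point cloud media have a replaceable relationship. As another example, a code stream obtained by lossy encoding of point cloud media and a code stream obtained by lossless encoding are replaceable code streams of each other.
3. Point cloud and point cloud media
A point cloud is a set of irregularly distributed discrete points in space that expresses the spatial structure and surface attributes of a three-dimensional object or scene. Each point in a point cloud includes at least geometry data, which represents the three-dimensional position of the point. Depending on the application scenario, a point may also include one or more sets of attribute data, each reflecting one attribute of the point, such as color, material, or other information. Typically, every point in a point cloud has the same number of sets of attribute data.
Point clouds can express the spatial structure and surface attributes of three-dimensional objects or scenes flexibly and conveniently, and are therefore widely used, for example in virtual reality (VR) games, computer aided design (CAD), geographic information systems (Geography Information System, GIS), autonomous navigation systems (ANS), digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive telepresence, and three-dimensional reconstruction of biological tissues and organs.
Point clouds are mainly obtained by computer generation, three-dimensional (3D) laser scanning, 3D photogrammetry, and similar means. Specifically, a point cloud may be obtained by capturing a real-world visual scene with a capture device (a group of cameras, or a camera device with multiple lenses and sensors). 3D laser scanning can obtain point clouds of static real-world three-dimensional objects or scenes, at a rate of millions of points per second; 3D photography can obtain point clouds of dynamic real-world three-dimensional objects or scenes, at a rate of tens of millions of points per second. In addition, in the medical field, point clouds of biological tissues and organs can be obtained from magnetic resonance imaging (MRI), computed tomography (CT), and electromagnetic positioning information. Point clouds can also be generated directly by a computer from virtual three-dimensional objects and scenes. As large-scale point cloud data continues to accumulate, efficient storage, transmission, publishing, sharing, and standardization of point cloud data have become key to point cloud applications.
Point cloud media includes a point cloud sequence composed of one or more point cloud frames in order. Each point cloud frame consists of the geometry data and attribute data of one or more points in the point cloud. A point may include one or more sets of attribute data, each reflecting one attribute of the point. For example, a point may have a set of color attribute data reflecting its color (such as red or yellow); as another example, a point may have a set of reflectance attribute data reflecting its laser reflection intensity. When a point has multiple sets of attribute data, the sets may be of the same type or of different types. For example, a point may have one set of color attribute data and one set of reflectance attribute data; or a point may have two sets of color attribute data that reflect its color at different times.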
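4. Track
A track is a collection of media data in the media file encapsulation process; a track consists of multiple timed samples. A media file may contain one or more tracks. For example, a video media file may contain, but is not limited to, a video media track, an audio media track, and a subtitle media track. In particular, metadata information can also be treated as a media type and included in the media file in the form of a metadata track. Metadata information is a general term for information related to the presentation of the immersive media and may include description information of the media content of the immersive media. In the embodiments of the present application, timed immersive media is included in the media file of the immersive media in the form of tracks, and a track may also be called a media track.
5. Sample
A sample is the encapsulation unit in the media file encapsulation process. A track consists of many samples; for example, a video media track may consist of many samples, and a sample is usually one video frame. In the embodiments of the present application, as described above, timed immersive media may be included in its media file in the form of tracks; a track contains one or more samples, and each sample may contain one or more haptic signals of the timed immersive media.
6. Sample entry
A sample entry indicates metadata information related to all samples in a track. For example, the sample entry of a video media track usually contains metadata information related to decoder initialization. As another example, the sample entry of a volumetric media track may contain relationship indication information that indicates the replaceable relationship between code streams.
7. Item
An item is the encapsulation unit for non-timed media data in the media file encapsulation process. For example, a static picture can be encapsulated as one item. In the embodiments of the present application, non-timed immersive media can be encapsulated as one or more items, and an item may also be called a media item.
8. ISO Based Media File Format (ISOBMFF)
ISOBMFF is an encapsulation standard for media files; a typical ISOBMFF file is an MP4 file.
9. Dynamic Adaptive Streaming over HTTP (DASH)
DASH is an adaptive bitrate technology that enables high-quality streaming media to be delivered over the Internet through conventional HTTP web servers.
10. Media Presentation Description (MPD) in DASH: the MPD describes the media segment information in a media file.
11. Representation:
A Representation is a combination of one or more media components in DASH; for example, a video file of a certain resolution can be regarded as a Representation. Likewise, a video file of a certain temporal level can be regarded as a Representation.
12. Adaptation Sets: an Adaptation Set is a collection of one or more video streams in DASH, and one Adaptation Set may contain multiple Representations. In the embodiments of the present application, an adaptation set may be abbreviated as Adaption.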
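Based on the foregoing description, an embodiment of the present application provides a data processing solution for immersive media, which includes a processing flow at the encoding end and a processing flow at the decoding end.
(I) The processing flow at the encoding end is roughly as follows:
① Encode the immersive media to obtain N replaceable code streams of the immersive media, where N is an integer greater than 1;
② Generate relationship indication information according to the replaceable relationship among the N code streams, the relationship indication information being used to indicate the replaceable relationship among the N code streams;
③ Encapsulate the relationship indication information and the N code streams to obtain a media file of the immersive media.
(II) The processing flow at the decoding end is roughly as follows:
① Obtain the media file of the immersive media, where the immersive media includes N replaceable code streams, the media file includes relationship indication information, the relationship indication information is used to indicate the replaceable relationship among the N code streams, and N is an integer greater than 1;
② Decode the media file according to the relationship indication information to present the immersive media.
As the above solution shows, relationship indication information can be added to the media file during encoding of the immersive media. It indicates the replaceable relationship among the multiple replaceable code streams of the immersive media, and on that basis the decoding end can be guided to decode the immersive media accurately, which ensures accurate presentation and improves the presentation of the immersive media.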
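The two flows above can be sketched end to end as follows. This is a toy illustration, not the method's actual codec or file format: `MediaFile`, `encode`, `build_relation_info`, `encapsulate`, and `decode_for_presentation` are hypothetical names, and the "encoder" merely tags the source bytes with a variant name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MediaFile:
    code_streams: Dict[str, bytes]         # code stream name -> encoded payload
    relation_info: Dict[int, List[str]]    # replaceable group id -> member names


def encode(source: bytes, variants: List[str]) -> Dict[str, bytes]:
    """Stand-in encoder: tags the source once per variant (no real codec)."""
    if len(variants) < 2:
        raise ValueError("the scheme requires N > 1 replaceable code streams")
    return {name: name.encode() + b":" + source for name in variants}


def build_relation_info(streams: Dict[str, bytes], group_id: int = 1) -> Dict[int, List[str]]:
    """All code streams produced from one source are mutually replaceable."""
    return {group_id: sorted(streams)}


def encapsulate(streams: Dict[str, bytes], relation_info: Dict[int, List[str]]) -> MediaFile:
    return MediaFile(dict(streams), relation_info)


def decode_for_presentation(media_file: MediaFile, preferred: str) -> bytes:
    """Pick one member of the replaceable group, then undo the stand-in encoding."""
    members = next(iter(media_file.relation_info.values()))
    chosen = preferred if preferred in members else members[0]
    payload = media_file.code_streams[chosen]
    return payload.split(b":", 1)[1]


# Encoding end: steps 1-3. Decoding end: steps 1-2.
streams = encode(b"frame-data", ["high", "low"])
media_file = encapsulate(streams, build_relation_info(streams))
restored = decode_for_presentation(media_file, preferred="low")
```

The point of the sketch is the division of labor: the relationship information travels inside the file, so the decoder can choose among the N code streams without inspecting their payloads.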
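Based on the foregoing description, a data processing system suitable for implementing the embodiments of the present application is introduced below with reference to FIG. 2. As shown in FIG. 2, the immersive media data processing system 20 may include a service device 201 and a decoding device 202. The service device 201 may serve as the encoding end of the immersive media and may be a terminal device or a server. The decoding device 202 may serve as the decoding end and may likewise be a terminal device or a server. A communication connection may be established between the service device 201 and the decoding device 202. The terminal may be a smartphone, tablet computer, laptop computer, desktop computer, smart speaker, smart watch, in-vehicle terminal, smart TV, or the like, but is not limited thereto. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
The specific flow in which the service device 201 and the decoding device 202 process the immersive media data is as follows. The service device 201 mainly performs the following data processing:
(1) acquisition of the immersive media;
(2) encoding and file encapsulation of the immersive media.
The decoding device 202 mainly performs the following data processing:
(3) file decapsulation and decoding of the immersive media;
(4) presentation of the immersive media.
In addition, transmission of the immersive media takes place between the service device 201 and the decoding device 202. The transmission may be based on various transport protocols (or transport signaling), including but not limited to: DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), SMTP (Smart Media Transport Protocol), and TCP (Transmission Control Protocol).
The data processing of the immersive media is described in detail below:
(1) Acquisition of the immersive media.
The service device 201 may acquire the immersive media, which may be obtained in two ways: scene capture or device generation.
Scene-captured immersive media is obtained by capturing a real-world visual scene with a capture device associated with the service device 201. The capture device provides the acquisition service for the service device 201 and may include, but is not limited to, any of the following: a camera device, a sensing device, or a scanning device. Camera devices may include ordinary cameras, stereo cameras, light field cameras, and the like. Sensing devices may include laser devices, radar devices, and the like. Scanning devices may include three-dimensional laser scanning devices and the like. The capture device associated with the service device 201 may be a hardware component built into the service device 201, such as the camera or sensor of a terminal, or a hardware apparatus connected to the service device 201, such as a camera connected to it.
Device-generated immersive media is generated by the service device 201 from virtual objects (for example, virtual three-dimensional objects and virtual three-dimensional scenes obtained through three-dimensional modeling). The immersive media may be point cloud media or other media, such as multi-view video media, volumetric video media, audio media, haptic media, or subtitle media. Haptic media refers to immersive media whose media type is haptic, that is, media files that can provide consumers with a real-world tactile sensory experience.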
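(2) Encoding and file encapsulation of the immersive media.
① The service device 201 may encode the immersive media to obtain N replaceable code streams, N being an integer greater than 1. In one implementation, the immersive media is point cloud media, and point cloud compression (Point Cloud Compression, PCC) may be used to encode the acquired point cloud media into N replaceable code streams. For example, G-PCC (Geometry-based Point Cloud Compression) is used to encode the geometry data and attribute data in the acquired point cloud media, yielding different versions of geometry code streams and attribute code streams.
② The service device 201 generates relationship indication information according to the replaceable relationship among the N code streams. The replaceable relationship means that any two code streams can replace each other. The generated relationship indication information indicates the replaceable relationship among the N code streams.
Further, after generating the relationship indication information, the service device 201 may encapsulate the relationship indication information and the N code streams of the immersive media to obtain the media file of the immersive media.
Encapsulation of the N code streams may be performed in the following ways:
a. If the immersive media is timed immersive media, any of its N code streams may be encapsulated into one or more media tracks.
Any of the N code streams may be encapsulated using single-track encapsulation (one code stream into one media track) or multi-track encapsulation (one code stream into multiple media tracks). Taking point cloud media as an example, there are three ways to encapsulate a point cloud code stream (that is, a code stream of point cloud media): (1) single-track encapsulation; (2) component-based multi-track encapsulation; (3) slice-based multi-track encapsulation.
(1) Single-track encapsulation
Encapsulating a code stream with single-track encapsulation yields one media track. The media track contains one sample entry and at least one sample, and each sample contains parameter information, geometry data, and attribute data. For example, in the single-track encapsulation result shown in FIG. 3a, the media track 310 obtained by encapsulating the point cloud code stream stores samples 312 and 313 of the geometry-based point cloud, and its sample entry is 311.
(2) Multi-track encapsulation
Encapsulating a code stream with multi-track encapsulation yields multiple media tracks. Depending on the encapsulation unit, multi-track encapsulation includes component-based multi-track encapsulation and slice-based multi-track encapsulation.
With component-based multi-track encapsulation, the media tracks into which the code stream is encapsulated include a geometry track and attribute tracks. The geometry track contains one sample entry and at least one sample, each sample containing parameter information and geometry data; an attribute track contains one sample entry and at least one sample, each sample containing parameter information and attribute data. For example, FIG. 3b shows the result of component-based multi-track encapsulation: the point cloud media is encapsulated into one geometry component track 321 and two attribute component tracks 322 and 323. Different attribute component tracks contain different attribute data, such as attribute 1 data 324 and attribute 2 data 325 in FIG. 3b, and the geometry component track is associated with both attribute component tracks, as shown by the dashed arrows.
With slice-based multi-track encapsulation, the code stream may be encapsulated into slice-based media tracks, including one slice base track and multiple slice tracks. Each sample in the slice base track contains a geometry header and an attribute header; each sample in a slice track contains one or more slices. In one implementation, each slice contains a geometry slice header, geometry data, an attribute slice header, and attribute data. For example, FIG. 3c shows the result of slice-based multi-track encapsulation, which contains one slice base track 331 and two slice tracks 332 and 333. Slice track 332 includes slices 1 and 2, slice track 333 includes slice 3, and the slice base track is associated with both slice tracks, as shown by the dashed arrows.
In another implementation, the slice tracks may include geometry tracks and attribute tracks; each slice in a geometry track contains a geometry slice header and geometry data, and each slice in an attribute track contains an attribute slice header and attribute data. Geometry tracks and attribute tracks are associated with each other, and the slice base track may be associated with at least one geometry slice track. For example, FIG. 3d shows another result of slice-based multi-track encapsulation, which includes one slice base track 341, two geometry component tracks 342 and 344, and two attribute component tracks 343 and 345. Geometry data and attribute data are encapsulated in slices of different tracks. A geometry component track (such as 344) and an attribute component track (such as 345) are associated in that the data in their samples comes from the same slice (such as slice 3); the slice base track 341 is associated with each of the geometry component tracks 342 and 344, as shown by the dashed arrows.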
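The difference between single-track and component-based multi-track encapsulation described above can be illustrated with a minimal sketch. The names `PointCloudFrame`, `single_track`, and `component_tracks` are hypothetical, sample entries are omitted, and tracks are modeled as plain lists of sample tuples rather than ISOBMFF boxes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PointCloudFrame:
    params: bytes                          # parameter information
    geometry: bytes                        # geometry data
    attributes: Dict[str, bytes]           # attribute name -> attribute data


def single_track(frames: List[PointCloudFrame]) -> Dict[str, list]:
    """One track; each sample carries parameters, geometry, and all attributes."""
    return {"track": [(f.params, f.geometry, f.attributes) for f in frames]}


def component_tracks(frames: List[PointCloudFrame]) -> Dict[str, list]:
    """One geometry track plus one attribute track per attribute name."""
    tracks: Dict[str, list] = {"geometry": []}
    for f in frames:
        tracks["geometry"].append((f.params, f.geometry))
        for name, data in f.attributes.items():
            tracks.setdefault(f"attribute:{name}", []).append((f.params, data))
    return tracks


frames = [
    PointCloudFrame(b"p", b"g0", {"color": b"c0", "reflectance": b"r0"}),
    PointCloudFrame(b"p", b"g1", {"color": b"c1", "reflectance": b"r1"}),
]
multi = component_tracks(frames)
```

With two attributes per frame, `component_tracks` yields one geometry track and two attribute tracks, matching the shape of the FIG. 3b example (one geometry component track, two attribute component tracks).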
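Any code stream of the immersive media may use single-track or multi-track encapsulation, which is not limited in the present application. After the media tracks are obtained, the relationship indication information may be added to the corresponding media tracks to form the media file of the immersive media. For example, the relationship indication information may be added to the sample entry of the corresponding media track. Specifically, the relationship indication information may be set in the following ways:
① If any of the N code streams is encapsulated into multiple media tracks, the relationship indication information may be added to any one of those media tracks to indicate that this media track, combined with the other media tracks, corresponds to one code stream.
② If any of the N code streams is encapsulated into one media track, the relationship indication information may be added to that media track to indicate the replaceable relationship between the code stream to which the media track belongs and the other code streams among the N code streams.
③ If at least two of the N code streams are each encapsulated into multiple media tracks, and the media tracks corresponding to any two code streams contain the same media track, the relationship indication information may be added to that same media track to indicate that it is shared by different code streams.
b. If the immersive media is non-timed immersive media, any of its N code streams may be encapsulated into one or more media items. The relationship indication information may likewise be added to the corresponding media items to form the media file of the immersive media. Adding relationship indication information to media items is similar to adding it to media tracks as described above and is not detailed here.
After obtaining the media file of the immersive media, the service device 201 may send the media file to the decoding device 202.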
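(3) File decapsulation and decoding of the immersive media.
The decoding device 202 may obtain, through the service device 201, the media file of the immersive media and description information for media presentation, the description information containing information about the media file.
The decoding process of the decoding device 202 is the inverse of the encoding process of the service device 201. The decoding device 202 decapsulates the media file according to the file format requirements of the immersive media to obtain the encapsulated replaceable code streams, determines one code stream from among them, and decodes it to restore the immersive media.
In one implementation, during decoding, the decoding device 202 may obtain the relationship indication information from the media file and then select the code stream to be presented according to the replaceable relationship it indicates. Specifically, the decoding device organizes the media tracks or media items corresponding to that code stream and decodes them to present the immersive media.
In another implementation, the immersive media may be transmitted by streaming. In this case, the decoding device 202 may obtain transport signaling (such as DASH or SMT) that contains description information of the relationship indication information. Based on the transport signaling, it can determine the media file segments of the immersive media to be decoded (including one or more media tracks or one or more media items) and decode them to present the immersive media.
(4) Presentation of the immersive media.
The decoding device 202 may render the decoded code stream. Depending on the temporal characteristics, for timed immersive media it renders the decoded media tracks, and for non-timed immersive media it renders the decoded media items. For example, if the immersive media is volumetric video media comprising three media tracks, then after each media track is decoded, the content of the volumetric video media can be rendered from the geometry data and attribute data contained in the media tracks, so that the volumetric video media can be played.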
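In one embodiment, the present application further provides a flowchart of data processing for immersive media. Referring to FIG. 4a, the flow includes the following:
Service device 201: first, a capture device (for example, a group of cameras or a camera device with multiple lenses and sensors) samples a real-world visual scene A to obtain source data B of the immersive media corresponding to that scene. For example, if the immersive media is point cloud media, source data B is a frame sequence composed of a large number of point cloud frames. The acquired immersive media is then encoded to obtain a code stream E, which contains N replaceable code streams. Next, relationship indication information is generated based on the replaceable relationship among the N code streams, and the code stream E and the relationship indication information are encapsulated to obtain the media file of the immersive media. In one implementation, during encapsulation, the code streams may be encapsulated into one or more media tracks (or media items), and the relationship indication information is added to the corresponding media tracks (or media items) to form the media file. The service device 201 may, according to a specific media container file format, compose one or more encoded bitstreams into a media file F for file playback, or into a sequence (Fs) of an initialization segment and media segments for streaming. The media container file format may be the ISO base media file format specified in International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 14496-12.
In one implementation, the service device 201 may also generate description information of the relationship indication information according to the replaceable relationship among the N code streams. This description information may be sent to the decoding device 202 through transport signaling, and the decoding device 202 may decide, depending on how the media file is transmitted, whether to use the transport signaling to obtain the media file. Optionally, the transport signaling may take the form of a signaling description file.
Decoding device 202: first, it receives the media file sent by the service device 201, which may include a media file F' for file playback, or a sequence Fs' of an initialization segment and media segments for streaming. It then decapsulates the media file to obtain a code stream E'. Next, it obtains the relationship indication information from the media file, determines the code stream to be presented from the N code streams based on the indicated replaceable relationship, and decodes that code stream to obtain the immersive media D'. The decoding device 202 may obtain the streamed initialization segment and media segments Fs' based on the transport signaling. Decoding a code stream specifically means decoding the media tracks or media items corresponding to it.
In a specific implementation, the decoding device may also determine, based on the viewing requirements of the current object (including viewing position and viewing direction), the media file or media segment sequence needed to present the immersive media, and decode it to obtain the immersive media needed for presentation. Finally, based on the viewing (viewport) direction of the current object, the decoded immersive media is rendered to obtain media frames A', which are presented, according to their presentation times, on the screen of a head-mounted display carried by the decoding device or of any other display device. Note that the viewport of the current object may be determined by various types of sensors (for example, head-tracking, position-tracking, or eye-tracking sensors). In viewport-based transmission, the current viewing position and viewing direction are also passed to a strategy module that determines which tracks to receive.
It can be understood that the immersive media data processing technology of the present application can be implemented on the basis of cloud technology, for example with a cloud server as the service device. Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources within a wide area network or local area network to realize the computation, storage, processing, and sharing of data. The technology provided in the present application can be applied to point cloud compression products and to all parts of an immersive system, including the service device side, the playback device side, and intermediate nodes.
In the embodiments of the present application, the service device may acquire immersive media, encode it to obtain N replaceable code streams, and encapsulate the N code streams together with the relationship indication information (which indicates the replaceable relationship between code streams) to obtain the media file. The decoding device may then obtain the media file and, according to the replaceable relationship indicated by the relationship indication information in it, determine the code stream to be presented from the N code streams, decode it, and present the immersive media. Thus, relationship indication information can be added to the media file during encoding so that the replaceable relationship between code streams is indicated explicitly, which effectively guides the decoding end to decode and present the immersive media more accurately and improves the presentation.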
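In currently available implementations, timed immersive media may be encapsulated into one or more media tracks, the media tracks include component tracks, and the replaceable relationship between code streams may be indicated through the replaceable relationship between component tracks. Mutually replaceable component tracks may form a track replaceable group. For example, taking volumetric video as the timed immersive media, when a component track of the volumetric video has replaceable component tracks, the replaceable information structure of the volumetric video (V3CAlternativeInfoStruct) may be used to indicate the differences among the replaceable component tracks in a track replaceable group. The syntax of the replaceable information structure is given in Table 1.
Table 1
The fields in Table 1 have the following meanings:
Quality ranking flag field (quality_ranking_flag): a value of 1 indicates that the component tracks in the track replaceable group are replaceable in terms of quality; a value of 0 indicates that the replaceable component tracks are not replaceable in terms of quality.
Codec type flag field (codec_type_flag): a value of 1 indicates that the component tracks in the track replaceable group are replaceable in terms of encoding type; a value of 0 indicates that the component tracks in the track replaceable group are not replaceable in terms of encoding type.
Quality ranking field (quality_ranking): indicates quality ranking information; the smaller the value, the higher the quality of the corresponding component track.
Codec type field (codec_type): indicates the encoding type of the corresponding component track.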
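A player could use these four fields to choose one component track from a track replaceable group. The sketch below is an assumption-laden illustration: `AlternativeInfo` and `pick_component_track` are hypothetical names, the integer codec type values are invented, and a flag value of 0 is modeled by leaving the corresponding field as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class AlternativeInfo:
    track_id: int
    quality_ranking: Optional[int] = None  # None when quality_ranking_flag == 0
    codec_type: Optional[int] = None       # None when codec_type_flag == 0


def pick_component_track(group: List[AlternativeInfo],
                         supported_codecs: Set[int]) -> Optional[int]:
    """Keep decodable tracks, then prefer the smallest quality_ranking (highest quality)."""
    candidates = [a for a in group
                  if a.codec_type is None or a.codec_type in supported_codecs]
    if not candidates:
        return None
    # Tracks without a ranking sort after all ranked tracks.
    best = min(candidates,
               key=lambda a: (a.quality_ranking is None, a.quality_ranking or 0))
    return best.track_id


# Hypothetical group: codec type values 1 and 2 are placeholders.
group = [
    AlternativeInfo(track_id=11, quality_ranking=2, codec_type=1),
    AlternativeInfo(track_id=12, quality_ranking=1, codec_type=2),
    AlternativeInfo(track_id=13, quality_ranking=3, codec_type=1),
]
```

Filtering by codec before ranking by quality reflects that a higher-quality alternative is useless if the device cannot decode it; other policies (for example, bandwidth-aware selection) would be equally compatible with the fields.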
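Volumetric video content may be encoded into different versions. Different replaceable content is indicated by the alternate group mechanism defined in ISO/IEC 14496-12 (the alternate_group field in the track header box TrackHeaderBox). If the atlas tracks of different volumetric videos have the same alternate_group value, the volumetric video content corresponding to those atlas tracks is mutually replaceable.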
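Grouping tracks by their alternate_group value can be sketched as follows. The function name and the dictionary input are hypothetical; the only format fact relied on is that, in ISO/IEC 14496-12, an alternate_group of 0 means no alternate-group information is given for the track.

```python
from collections import defaultdict
from typing import Dict, List


def alternate_groups(track_alternate_group: Dict[int, int]) -> Dict[int, List[int]]:
    """Map each non-zero alternate_group value to the track IDs that share it."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for track_id, group in track_alternate_group.items():
        if group != 0:                     # 0: no relation to other tracks signaled
            groups[group].append(track_id)
    # A group with a single member has nothing to be replaced by.
    return {g: sorted(ids) for g, ids in groups.items() if len(ids) > 1}


# Hypothetical atlas tracks: 2 and 3 are alternatives of each other.
replaceable = alternate_groups({1: 0, 2: 5, 3: 5, 4: 6})
```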
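Given the replaceable relationship among component tracks of a volumetric video, mutually replaceable component tracks belong to the same track replaceable group of the volumetric video, and only one component track in the group can be referenced by the corresponding atlas track or atlas tile track. A component track contains various data of a video frame, such as geometry data and attribute data; an atlas track contains images, for example a video frame encapsulated in image form as an atlas track. The definition and syntax of the track replaceable group are given in Table 2.
Table 2

The track replaceable group may be indicated by a track group type box (TrackGroupTypeBox) of type 'valg', which is contained in the track group box (TrackGroupBox); zero or more TrackGroupTypeBox instances may be set in one media track.
For non-timed immersive media, a replaceable relationship may exist among its component items. For example, if the non-timed immersive media is non-timed volumetric media and a replaceable relationship exists among its component items, V3CAlternativeEntityToGroupBox is used to indicate difference information (such as quality difference information) among the replaceable component items. Among mutually replaceable component items, only one can be referenced by the corresponding atlas item or atlas tile item. The definition and syntax of the item replaceable group are given in Table 3.
Table 3
The item replaceable group is indicated by an entity group box (EntityToGroupBox) of type 'valy', and zero, one, or more such entity group boxes may be set in one component item.
Based on the above examples of track replaceable groups in timed immersive media and item replaceable groups in non-timed immersive media, the corresponding boxes can be set according to the respective syntax and included in the media file to indicate the replaceable relationship among items or tracks.
However, in some special scenarios, the replaceable relationship among items or tracks alone cannot explicitly indicate the replaceable relationship between code streams, which affects the presentation of the immersive media. For example, see the code stream encapsulation diagram in FIG. 4b. As shown in (1) of FIG. 4b, media file 1 contains two code streams (bitstreams), each encapsulated into multiple media tracks using multi-track encapsulation: track 1 (track1) and track 2 (track2) correspond to code stream 1, and track 1 and track 3 (track3) correspond to code stream 2. When the geometry information in code stream 1 and code stream 2 is identical, that is, encoded in exactly the same way, media file 1 contains only one geometry track, track 1, and track 2 and track 3 are replaceable with each other. In this case, indicating the replaceable relationship between track 2 and track 3 together with the shared geometry track is enough to learn the replaceable relationship between the code streams.
However, as shown in (2) of FIG. 4b, if media file 2 contains code stream 3 and the geometry track of code stream 3 is not duplicated (that is, not shared by multiple code streams), indicating only the replaceable relationship between track 2 and track 3 cannot indicate the replaceable relationship between code stream 3 and the other code streams, so the indication of the replaceable relationship between code streams is not clear enough.
It can be seen that indicating only the replaceable relationship among tracks cannot cover the replaceable relationship between code streams in every scenario, and the indication is not explicit enough. Therefore, in the encoding process, the embodiments of the present application extend the relationship indication information to indicate the replaceable relationship at the code stream level. This supports use in various file encapsulation scenarios, allows the media tracks or media items corresponding to code streams to be organized flexibly, indicates the replaceable relationship between code streams clearly, and is highly general.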
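The gap described for FIG. 4b can be made concrete with a small sketch. The helper name is hypothetical, and the track IDs 4 and 5 assumed for code stream 3 are invented for illustration (the text does not number them).

```python
from typing import FrozenSet, List, Set


def infer_code_streams(shared_tracks: Set[int],
                       alternative_tracks: Set[int]) -> List[FrozenSet[int]]:
    """Track-level view: each alternative track plus the shared tracks forms one code stream."""
    return [frozenset(shared_tracks) | {t} for t in sorted(alternative_tracks)]


# Track-level signaling: track 1 is shared, tracks 2 and 3 replace each other.
inferred = infer_code_streams({1}, {2, 3})

# Actual content of the file; tracks 4 and 5 for code stream 3 are assumed IDs.
actual = {
    "code stream 1": frozenset({1, 2}),
    "code stream 2": frozenset({1, 3}),
    "code stream 3": frozenset({4, 5}),
}
not_covered = [name for name, tracks in actual.items() if tracks not in inferred]
```

Here `not_covered` contains only code stream 3: track-level alternatives recover code streams 1 and 2, but nothing ties code stream 3 to them, which is the case the code-stream-level indication is meant to address.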
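Note that the embodiments of the present application may add several descriptive fields at the system layer, including field extensions at the file encapsulation level and at the signaling message level, to support the implementation steps in the embodiments. The following takes extensions of existing ISOBMFF boxes and DASH signaling as examples. Referring to FIG. 5, FIG. 5 is a schematic flowchart of a data processing method for immersive media according to an embodiment of the present application. The method may be performed by the decoding device 202 in the immersive media data processing system and includes the following steps S501-S502:
S501: obtain a media file of the immersive media; the immersive media includes N replaceable code streams; the media file includes relationship indication information, which is used to indicate the replaceable relationship among the N code streams, N being an integer greater than 1.
According to temporal characteristics, the immersive media may be timed or non-timed immersive media; according to signal characteristics, it may be point cloud media or other media, such as any of multi-view video media, audio media, subtitle media, haptic media, and volumetric video media. The N code streams have a replaceable relationship, based on which any two code streams can replace each other, that is, they are replaceable code streams of each other; each of the N code streams may therefore also be called a replaceable code stream. For example, the immersive media includes three replaceable code streams, code stream 1, code stream 2, and code stream 3, which are replaceable versions with the same content but different quality. Any code stream may be a binary code stream or a code stream in another base (such as quaternary or hexadecimal), which is not limited in the present application.
The following describes how the relationship indication information is set in the media file and what it indicates, for timed immersive media encapsulated as media tracks and for non-timed immersive media encapsulated as media items.
(I) The immersive media is timed immersive media, and the N code streams are encapsulated in the media file as M media tracks, where M is an integer greater than or equal to N.
Because one code stream of timed immersive media may be encapsulated into one or more media tracks, the number of media tracks M is greater than or equal to the number of replaceable code streams N. The M media tracks include the respective numbers of different media tracks corresponding to the N code streams, and all media tracks corresponding to the N code streams are contained in one media file. For example, the immersive media contains three replaceable code streams, code stream 1, code stream 2, and code stream 3, and the media file contains eight media tracks: three into which code stream 1 is encapsulated, four for code stream 2, and one for code stream 3. No media track is common to the code streams, that is, the code streams do not share media tracks.
The relationship indication information may be set in a media track, specifically in its sample entry, and may be regarded as replaceable-information metadata of the media track that indicates the replaceable relationship between code streams. For ease of description, any one of the N code streams is denoted code stream i, and any two of the N code streams are denoted code stream i and code stream j, where i and j are positive integers less than or equal to N.
Depending on the encapsulation method, the relationship indication information may be set as in (1.1)-(1.3) below: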
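(1.1) Code stream i is encapsulated into one media track Mi among the M media tracks, and the relationship indication information is set in media track Mi.
For code stream i of timed immersive media, the encoding end may use single-track encapsulation to encapsulate it as a single media track Mi. The M media tracks contain media track Mi, which can represent code stream i. The relationship indication information set in media track Mi can indicate the replaceable relationship between code stream i, to which media track Mi belongs, and the other code streams, that is, the code streams among the N code streams other than code stream i.
(1.2) Code stream i is encapsulated into multiple media tracks among the M media tracks, and the relationship indication information is set in media track Mi, where media track Mi is any one of the media tracks into which code stream i is encapsulated.
For code stream i of timed immersive media, the encoding end may also use multi-track encapsulation to encapsulate it as multiple media tracks. The M media tracks contain the multiple media tracks corresponding to code stream i; combined, these media tracks represent code stream i, and each of them belongs to code stream i. The relationship indication information may be set in any one of these media tracks, namely media track Mi.
In a feasible implementation, when the relationship indication information is set in media track Mi among the multiple media tracks, it can indicate not only the replaceable relationship between code stream i and the other code streams but also the association between media track Mi and the other media tracks corresponding to code stream i, where the other media tracks are the media tracks into which code stream i is encapsulated other than media track Mi. The association indicates that media track Mi and the other media tracks belong to the same code stream i. It may be a combination relationship among multiple media tracks belonging to the same code stream: combining media track Mi with the other media tracks of code stream i fully represents code stream i. The relationship indication information may contain an indication of this association.
(1.3) Code stream i is encapsulated into a first plurality of media tracks among the M media tracks, and code stream j is encapsulated into a second plurality of media tracks among the M media tracks. If both the first plurality and the second plurality contain media track Mij, the relationship indication information is further used to indicate the shared ownership relationship of media track Mij.
For any two code streams i and j among the N code streams that use multi-track encapsulation, the number of media tracks for code stream i and the number for code stream j may be the same or different, but both are greater than 1. For example, code stream i is encapsulated into three of eight media tracks, and code stream j into two of the eight. Moreover, because some information may be identical across code streams, the media tracks obtained by encapsulating different code streams may include identical media tracks. Only one copy of identical media tracks needs to be kept in the media file, so that the M media tracks contain no duplicates, which saves storage resources. For example, the three media tracks of code stream i include geometry track 1, and the two media tracks of code stream j include geometry track 2. If the geometry data in code stream i and code stream j is obtained with exactly the same encoding, code stream i and code stream j have the same geometry track, that is, geometry track 1 and geometry track 2 are the same media track, and the media file may keep only one of them (geometry track 1 or 2).
The relationship indication information may be set in media track Mij to indicate its shared ownership relationship, which means that media track Mij is shared by code stream i and code stream j. That is, media track Mij belongs to both code stream i and code stream j; such a media track may also be called a shared media track. It can be understood that any shared media track may be shared by at least two code streams, for example a media track in the media file shared by three code streams, and the number of shared media tracks among the M media tracks may be zero or greater than zero.
In addition, media track Mij is one of the multiple media tracks corresponding to code stream i or code stream j. Therefore, the relationship indication information set in media track Mij can indicate several relationships: the replaceable relationship between the code streams to which media track Mij belongs and other code streams, the association between media track Mij and the other media tracks belonging to code stream i or code stream j, and the shared ownership relationship of media track Mij. Likewise, when media track Mi mentioned in (1.2) is shared between different code streams, the relationship indication information set in media track Mi can indicate these relationships as well.
Thus, the relationship indication information set in a media track can indicate any one or more of the following: the replaceable relationship between the code stream to which the media track belongs and other code streams, the association between the media track and other media tracks belonging to the same code stream, and the shared ownership relationship of the media track. These indications make the replaceable relationship between code streams clear, so that the media tracks corresponding to replaceable code streams can be organized flexibly and accurately, and the media tracks of the code stream to be presented can be decoded to present the immersive media.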
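Once each track's membership is known, the association and shared ownership relationships can be derived mechanically. The sketch below assumes, purely for illustration, that the relationship indication information on each track has already been resolved to a list of code stream labels; the function name and labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize(track_to_streams: Dict[int, List[str]]) -> Tuple[Dict[str, List[int]], List[int]]:
    """Return (code stream -> its tracks, tracks shared by more than one code stream)."""
    stream_tracks: Dict[str, List[int]] = defaultdict(list)
    for track_id in sorted(track_to_streams):
        for stream in track_to_streams[track_id]:
            stream_tracks[stream].append(track_id)      # association within one code stream
    shared = [t for t in sorted(track_to_streams)
              if len(track_to_streams[t]) > 1]          # shared ownership
    return dict(stream_tracks), shared


# Layout of media file 1 in FIG. 4b: track 1 is the geometry track shared by both code streams.
layout = {1: ["stream1", "stream2"], 2: ["stream1"], 3: ["stream2"]}
stream_tracks, shared = summarize(layout)
```

For this layout, each code stream resolves to two tracks and track 1 is reported as shared, which is the information a decoder needs to assemble either alternative.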
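In one implementation, when multiple media tracks among the M media tracks need to be played jointly, those media tracks belong to the same playout track group.
Media tracks may be played jointly according to different joint-playback requirements. Media tracks in the same playout track group may belong to the same code stream or to different code streams. For example, a code stream that uses multi-track encapsulation needs a combination of multiple media tracks to be represented completely, so these media tracks among the M media tracks need to be played jointly. They belong to the same code stream and can be grouped into one playout track group, which indicates the combination of media tracks. Taking timed volumetric media (such as volumetric video) as an example, the playout track group of volumetric video has the definition and syntax shown in Table 4.
Table 4
For the media tracks of volumetric video, only certain specific combinations of media tracks should be played jointly; the playout track group of the volumetric video can be used to indicate the combinations of media tracks required for joint playback.
For each media track in a playout track group, its TrackGroupBox (track group box) contains a PlayoutTrackGroupBox (playout track group box, extended from TrackGroupTypeBox in ISO/IEC 14496-12) carrying a unique track_group_id (track group identifier, which identifies the playout track group). The PlayoutTrackGroupBox indicates that the corresponding media track is one of the media tracks that make up a playout track group. For all media tracks of a playout track group, a joint quality ranking can optionally be defined to indicate media content of different quality. The fields in the syntax of Table 4 have the following meanings:
Quality ranking flag field (quality_ranking_flag): a value of 1 indicates that all media tracks of the playout track group of the volumetric video have a joint quality ranking; a value of 0 indicates that they do not.
Quality ranking field (quality_ranking): indicates the joint quality ranking of all media tracks in a playout track group of the volumetric video. The smaller the value, the higher the joint quality.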
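Choosing among playout track groups by joint quality can be sketched as below. `PlayoutGroup` and `best_playout_group` are hypothetical names, the group and track IDs are invented, and a `quality_ranking_flag` of 0 is modeled by leaving `quality_ranking` as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlayoutGroup:
    track_group_id: int
    track_ids: List[int]
    quality_ranking: Optional[int] = None  # None when quality_ranking_flag == 0


def best_playout_group(groups: List[PlayoutGroup]) -> Optional[PlayoutGroup]:
    """Among groups that declare a joint quality ranking, the smallest value wins."""
    ranked = [g for g in groups if g.quality_ranking is not None]
    return min(ranked, key=lambda g: g.quality_ranking) if ranked else None


groups = [
    PlayoutGroup(track_group_id=100, track_ids=[1, 2], quality_ranking=2),
    PlayoutGroup(track_group_id=101, track_ids=[1, 3], quality_ranking=1),
    PlayoutGroup(track_group_id=102, track_ids=[4]),     # no joint ranking declared
]
chosen = best_playout_group(groups)
```

Groups without a ranking are skipped rather than guessed at, since the text gives no basis for comparing them with ranked groups.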
(二)沉浸媒体为非时序沉浸媒体;N个码流在媒体文件中被封装为P个媒体项目;P为整数且P大于或等于N。
非时序沉浸媒体的一个码流在媒体文件中可被封装为一个或多个媒体项目,P个媒体项目包含N个码流分别对应的相应数量的不同媒体项目,且P个媒体项目包含于一个媒体文件中。
关系指示信息可设置于媒体项目中。为便于描述,N个码流中的任一个码流表示为码流i,N个码流中的任意两个码流则可分别表示为码流i和码流j,i、j为正整数且i、j均小于或等于N。基于封装方式的不同,关系指示信息的设置可包括以下(2.1)-(2.3):
(2.1)码流i被封装至P个媒体项目中的一个媒体项目Pi中,关系指示信息设置于媒体项目Pi中。
码流i被封装为单个媒体项目Pi,且包含于P个媒体项目中。该媒体项目Pi可用于表示码流i。在此情况下,关系指示信息可设置于该媒体项目中,并可用于指示媒体轨道Mi所属码流i和其他码流之间的可替换关系,此处的其他码流是指N个码流中除码流i之外的码流。
(2.2)码流i被封装至P个媒体项目中的多个媒体项目中,关系指示信息设置于媒体项目Pi中;媒体项目Pi是指码流i被封装至的多个媒体项目中的任一个媒体项目。
在码流i被封装为多个媒体项目的情况下,关系指示信息可设置于该多个媒体项目中的任一媒体项目,即媒体项目Pi中。
在一个可行的实现方式中,关系指示信息还用于指示媒体项目Pi与码流i对应的其他媒体项目之间的关联关系。其中,其他媒体项目是指码流i被封装至的多个媒体项目中除媒体项目Pi之外的媒体项目;关联关系用于表示媒体项目Pi与其他媒体项目属于同一个码流i。上述关联关系可以是指同一码流对应的多个媒体项目之间的组合关系,媒体项目Pi与同属于码流i的其他媒体项目之间组合(即包括属于同一码流的所有项目媒体)可完整地表示码流i。关系指示信息包含该关联关系的指示。
可选地,设置有关系指示信息的媒体项目Pi可以仅属于码流i,也可以既属于码流i,还属于N个码流中除码流i之外的至少一个码流。若媒体项目Pi属于至少两个码流,设置于媒体项目Pi中的关系指示信息还具有如下(2.3)中的关系指示。
(2.3)码流i被封装至P个媒体项目中的第一多个媒体项目中;码流j被封装至P个媒体项目中的第一多个媒体项目中。若第一多个媒体项目与第二多个媒体项目均包含媒体项目Pij,则关系指示信息还用于指示媒体项目Pij的共享归属关系;共享归属关系用于表示媒体项目Pij是码流i和码流j共享的媒体项目。
对于码流i和码流j,码流i被封装至的媒体项目的数量和码流j被封装至的媒体项目的数量可相同或不同。示意性地,码流i被封装至3个媒体项目中,码流j被封装至2个媒体项目中。并且,由于不同码流之间的一些信息可能完全一致,不同码流封装得到的各个媒体轨道中可包含相同的媒体项目对于不同码流之间重复的媒体项目,在媒体文件中同样可保留一个,这样能够有效节省存储资源。例如,媒体文件包括7个媒体项目,码流i对应其中的3个媒体项目以及码流j对应其中的5个媒体项目,码流i对应的媒体项目和码流j对应的媒体项目中均包含媒体项目x,在媒体文件中仅保留一个该媒体项目x且归属于码流i和码流j。
关系指示信息可设置于该媒体项目Pij中,并可用于指示媒体项目Pij的共享归属关系。该共享归属关系用于表示媒体项目Pij是码流i和码流j共享的媒体轨道。即媒体项目Pij既属于码流i,又属于码流j,此类媒体项目也可称为共享媒体项目。可理解的是,对于任一共享媒体项目,可以被N个码流中的至少两个码流共享,M个媒体项目中包括的共享媒体项目的数量可以为0或者大于0。
另外,由于媒体项目Pij也是码流i或码流j对应的多个媒体项目中的一个媒体项目。因此,通过设置于该媒体项目Pij的关系指示信息可指示以下多种关系:媒体项目Pij所属码流与其他码流之间的可替换关系,媒体项目Pij与码流i/码流j对应的其他媒体项目之间的关联关系,媒体项目Pij的共享归属关 系。如上述(2.2)中提及的媒体项目Pi为不同码流之间共享的媒体轨道时,通过设置于媒体项目Pi的关系指示信息也可指示上述关系。
通过在媒体项目中设置的关系指示信息,可指示出以下任一种或多种关系:媒体项目所属码流与其他码流之间的可替换关系、媒体项目与属于同一码流的其他媒体项目之间的关联关系、媒体项目的共享归属关系。基于以上关系的指示可清晰地指示出码流之间的可替换关系,从而便于灵活、准确地组织可替换的码流对应的媒体项目,并解码以呈现沉浸媒体。
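The 7-item example above can be checked with a small sketch. It is only an illustration under stated assumptions: the `MediaItem` class, the `package_items` helper, and the item names are hypothetical, and two items are treated as identical simply because they carry the same identifier.

```python
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    item_id: str
    # Bitstreams this item belongs to (its shared-ownership relationship).
    bitstreams: set = field(default_factory=set)


def package_items(bitstream_items):
    """Merge per-bitstream item lists, keeping one copy of each repeated item."""
    items = {}
    for bitstream, item_ids in bitstream_items.items():
        for item_id in item_ids:
            items.setdefault(item_id, MediaItem(item_id)).bitstreams.add(bitstream)
    return items


# Bitstream i has 3 items, bitstream j has 5, and item "x" is common to both.
packaged = package_items({
    "i": ["x", "i1", "i2"],
    "j": ["x", "j1", "j2", "j3", "j4"],
})
shared = sorted(item.item_id for item in packaged.values() if len(item.bitstreams) > 1)
print(len(packaged), shared)  # 7 ['x']
```

Keeping the owning bitstreams on the item itself mirrors the idea that the shared item carries the indication of which bitstreams it belongs to.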
在一种可行的实现方式中,当P个媒体项目中存在多个媒体项目需要被联合播放时,需要被联合播放的多个媒体项目属于同一个播放实体组。
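Different media items may be combined for playback depending on the joint-playback requirement, and the media items in one playout entity group may belong to the same bitstream or to different bitstreams. For example, when several of the P media items together fully represent a bitstream, those media items need to be played jointly, and the media items that need to be played jointly can be grouped into one playout entity group. The playout entity group thus indicates which media items are combined for joint playback. Taking non-timed volumetric media as an example of non-timed immersive media, the playout entity group of non-timed volumetric media is defined as shown in Table 5 below.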
表5
对于容积媒体的媒体项目,仅有某些特定组合的媒体项目应被联合播放时,可使用容积媒体的播放实体组来指示这些联合播放的媒体项目的组合。该播放实体组通过'eply'类型的播放实体组数据盒PlayoutEntityToGroupBox表示,可设置于媒体项目中。对于一个播放实体组的所有媒体项目,可以有选择地定义其联合质量等级,用以指示不同质量的媒体内容。其中,对于上述表5中的语法部分所涉及的各个字段的含义如下:
质量等级标识字段(quality_ranking_flag):取值为1时表示容积媒体的播放实体组的所有媒体项目具有联合质量等级;取值为0时表示容积媒体的播放实体组的所有媒体项目不具有联合质量等级。
质量等级字段(quality_ranking):用于指示一个容积媒体的播放实体组内所有媒体项目的联合质量等级,该质量等级字段取值越小,表明联合质量等级越高。
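As a minimal sketch of how a player might use these two fields, the following picks the group with the best joint quality. The `PlayoutGroup` class and `best_quality_group` helper are hypothetical names introduced for illustration; only the rule that a smaller quality_ranking means higher quality comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayoutGroup:
    group_id: int
    quality_ranking_flag: int               # 1: a joint quality ranking is present
    quality_ranking: Optional[int] = None   # smaller value means higher quality


def best_quality_group(groups):
    """Return the ranked group with the smallest quality_ranking, or None."""
    ranked = [g for g in groups if g.quality_ranking_flag == 1]
    if not ranked:
        return None
    return min(ranked, key=lambda g: g.quality_ranking)


groups = [
    PlayoutGroup(group_id=1, quality_ranking_flag=1, quality_ranking=3),
    PlayoutGroup(group_id=2, quality_ranking_flag=1, quality_ranking=1),
    PlayoutGroup(group_id=3, quality_ranking_flag=0),
]
print(best_quality_group(groups).group_id)  # 2
```

Groups without a joint ranking are skipped rather than guessed at, since the flag says no ranking is defined for them.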
在一个实施例中,具备可替换关系的N个码流属于同一个可替换组,同一个可替换组中的不同码流在呈现时允许相互替换;关系指示信息包括可替换信息数据盒(AlternativeInfoBox)。对于可替换信息数据盒具有如下表6所示的定义。
表6
该可替换信息数据盒是新增的类型为'alif'的数据盒,可设置在媒体项目或者媒体轨道的样本入口中, 即媒体项目中或媒体轨道的样本入口中可包含可替换信息数据盒。对于可替换信息数据盒的数量可以是大于或等于0,即在一个媒体轨道/媒体项目中可设置0个、1个或者多个可替换信息数据盒,这依据媒体轨道/媒体项目的特点来决定。例如媒体轨道track1属于两个码流,那么可设置2个可替换信息数据盒。
可替换信息数据盒可用于指示媒体轨道/媒体项目对应码流所属可替换组的信息:a、若可替换信息数据盒设于当前媒体轨道中,则可替换信息数据盒中包含该当前媒体轨道对应码流所属的可替换组的信息。其中,当前媒体轨道是指正在被解码的媒体轨道。b、若可替换信息数据盒设于当前媒体项目中,则可替换信息数据盒中包含该当前媒体项目对应码流所属的可替换组的信息。其中,当前媒体项目是指正在被解码的媒体项目。
可理解的是,若沉浸媒体为时序沉浸媒体,媒体文件中可能存在一个或多个媒体轨道中设置有可替换信息数据盒。若沉浸媒体为非时序沉浸媒体,媒体文件中可能存在一个或多个媒体项目中设置有可替换信息数据盒。对于媒体文件中任一正在被解码的媒体轨道或者媒体项目,若设有可替换信息数据盒,则通过可替换信息数据盒来指示相应码流所属可替换组的信息,以指示相应码流之间的可替换关系。
在一种可实现的方式中,可替换信息数据盒中包含可替换组标识标志字段(alternative_group_id_flag)和可替换组标识字段(alternative_group_id)。若可替换信息数据盒设于当前媒体轨道中,则可替换组标识标志字段用于指示当前媒体轨道中的可替换信息数据盒中是否指示当前媒体轨道对应码流所属的可替换组标识符。
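Specifically, the alternative information box may be set in the sample entry of the current media track. When the alternative group identifier flag field takes the first preset value (e.g. "1"), the alternative information box in the current media track indicates the identifier of the alternative group to which the bitstream corresponding to the current media track belongs. When the field takes the second preset value (e.g. "0"), the alternative information box in the current media track does not indicate that identifier; optionally, in that case the alternative group identifier is present in the track header box (TrackHeaderBox) of the current media track. The alternative group identifier field (alternative_group_id) indicates the identifier of the alternative group to which the bitstream corresponding to the current media track belongs; different bitstreams of the same alternative group have the same alternative group identifier, which may be a number, for example 1.
If the alternative information box is set in the current media item, the alternative group identifier flag field indicates whether the alternative information box in the current media item indicates the identifier of the alternative group to which the bitstream corresponding to the current media item belongs. When the field takes the first preset value (e.g. "1"), the box indicates that identifier; when it takes the second preset value (e.g. "0"), the box does not indicate it. The alternative group identifier field indicates the identifier of the alternative group to which the bitstream corresponding to the current media item belongs; different bitstreams of the same alternative group have the same identifier, which may be a number, for example 1, or a character string, for example aabbxx.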
对于可替换的码流,设置了可替换信息数据盒的媒体轨道/媒体项目,基于可替换信息数据盒中包含的可替换组标识标志字段以及可替换组标识字段,可指示所设置的可替换信息数据盒中指示该媒体轨道/媒体项目对应码流所属的可替换组标识符。不同的媒体轨道/媒体项目中的可替换信息数据盒中可替换组标识字段的取值相同,以表示对应码流所属可替换组标识符相同。
在一个实施例中,关系指示信息还用于指示当前媒体轨道的共享归属关系或当前媒体项目的共享归属关系。对于共享归属关系的指示,包括以下两种方式,一种是基于可替换信息数据盒中的字段来指示,另一种是基于可替换信息数据盒的数量来指示。
方式一、基于可替换信息数据盒中的字段来指示。
可替换信息数据盒包括多替换码流标志字段(multi_alternative_bitstream_flag)和码流数量字段(num_bitstream)。
若可替换信息数据盒设于当前媒体轨道中,则多替换码流标志字段用于指示当前媒体轨道是否属 于多个码流。上述可替换信息数据盒具体可设于当前媒体轨道的样本入口中。
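When the multi-alternative-bitstream flag field takes the first preset value (e.g. "0"), the current media track belongs to only one bitstream and is a component of that bitstream; it may represent the bitstream on its own or in combination with other media tracks. When the field takes the second preset value (e.g. "1"), the current media track belongs to multiple bitstreams: it is a component of several bitstreams at once and is shared by at least two of them. For example, the current media track track1 is one of the media tracks of bitstream 1 and also one of the media tracks of bitstream 2. The bitstream count field (num_bitstream) indicates the number of bitstreams to which the current media track belongs. That is, when the multi-alternative-bitstream flag field indicates that the current media track belongs to multiple bitstreams, the number of those bitstreams is given by num_bitstream, whose value equals that number; if the current media track belongs to K (K > 1) bitstreams, num_bitstream takes the value K. For example, if the alternative information box in the current media track track1 has multi_alternative_bitstream_flag equal to 1 and num_bitstream equal to 3, then track1 belongs to 3 bitstreams, that is, track1 is a media track shared by 3 bitstreams.
If the alternative information box is set in the current media item, the multi-alternative-bitstream flag field (multi_alternative_bitstream_flag) indicates whether the current media item belongs to multiple bitstreams. When the field takes the first preset value (e.g. "0"), the current media item belongs to only one bitstream and is a component of that bitstream; when it takes the second preset value (e.g. "1"), the current media item belongs to multiple bitstreams and is a component of several bitstreams at once. The bitstream count field (num_bitstream) indicates the number of bitstreams to which the current media item belongs, and its value equals that number.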
通过多替换码流标志字段以及码流数量字段,可指示出当前媒体轨道或者当前媒体项目是否被多个码流共享,从而明确该当前媒体轨道/当前媒体项目的共享归属关系,以及当前媒体轨道/当前媒体项目归属码流的数量。
方式二、基于可替换信息数据盒的数量指示。
若当前媒体轨道中仅包含一个可替换信息数据盒,则指示当前媒体轨道仅属于一个码流;若当前媒体轨道包含多个可替换信息数据盒,则指示当前媒体轨道属于多个码流,该当前媒体轨道中的可替换信息数据盒的数量应等于当前媒体轨道所属的码流的数量。
当前媒体轨道中的可替换信息数据盒的数量和当前媒体轨道所归属的码流的数量是相同的,以指示当前媒体轨道所属码流的数量。通过在该当前媒体轨道中添加多个可替换信息数据盒(AlternativeInfoBox)可指示该当前媒体轨道具有共享归属关系,即当前媒体轨道可被多个码流共享。对于仅属于可替换的N个码流中的一个码流的当前媒体轨道,该当前媒体轨道至多包含一个可替换信息数据盒(AlternativeInfoBox),此处至多包含一个是指包含0个或者1个,这是因为:当前媒体轨道虽然属于一个码流,但可能是多轨封装的码流对应的多个媒体轨道中的一个,而并不是单轨封装的码流对应的单个媒体轨道,此时当前媒体轨道中可能并不需设置可替换信息数据盒。
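If the current media item contains only one alternative information box, the current media item belongs to only one bitstream; if it contains multiple alternative information boxes, the current media item belongs to multiple bitstreams, and the number of alternative information boxes in the current media item should equal the number of bitstreams to which it belongs. For example, if the current media item contains 2 alternative information boxes, the current media item belongs to 2 bitstreams.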
在此种方式下,可替换信息数据盒不包含方式一中的多替换码流标志字段和/或码流数量字段。基于可替换信息数据盒的数量多少来指示当前媒体轨道/当前媒体项目的共享归属关系,可以标识被共享的媒体轨道/媒体项目,也便于在当前媒体轨道/当前媒体项目的不同可替换信息数据盒中设置相应的信息,来进一步指示当前媒体轨道/当前媒体项目与相应媒体轨道/相应媒体项目之间的关联关系。
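The two ways of indicating shared ownership can be compared in a short sketch. It assumes each alternative information box has already been parsed into a plain dictionary; the helper names are hypothetical, and only the field names come from the text.

```python
def bitstream_count_by_fields(box):
    """Approach 1: read the count from fields inside a single box."""
    if box["multi_alternative_bitstream_flag"] == 1:
        return box["num_bitstream"]
    return 1


def bitstream_count_by_boxes(boxes):
    """Approach 2: the number of boxes equals the number of owning bitstreams.

    An empty list returns None: a track of a multi-track bitstream may carry
    no box at all, so zero boxes says nothing about ownership.
    """
    return len(boxes) if boxes else None


# track1 shared by 3 bitstreams, expressed with approach 1.
print(bitstream_count_by_fields(
    {"multi_alternative_bitstream_flag": 1, "num_bitstream": 3}))  # 3

# A media item carrying 2 boxes, expressed with approach 2.
print(bitstream_count_by_boxes([{"alternative_group_id": 1},
                                {"alternative_group_id": 1}]))  # 2
```

Approach 2 leaves room to put per-bitstream association details in each box, at the cost of repeating the common fields.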
在一个实施例中,关系指示信息还用于指示当前媒体轨道及与当前媒体轨道属于同一码流的其他媒体轨道之间的关联关系,或用于指示当前媒体项目及与当前媒体项目属于同一码流的其他媒体项目之间的关联关系。
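Based on the association relationship, the current media track and the other media tracks of the same bitstream can be associated with each other to represent that bitstream. The alternative information box includes a component reference type field (components_ref_type), which indicates how the current media track is associated with the other media tracks belonging to the same bitstream, or how the current media item is associated with the other media items belonging to the same bitstream.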
(1)可替换信息数据盒设置于当前媒体轨道中。
①若组件参考类型字段(components_ref_type)的取值为第一预设数值(如“0”),则表示当前媒体轨道通过轨道参考关联至与当前媒体轨道属于同一码流的其他媒体轨道,且可替换信息数据盒还包括:轨道参考类型字段(track_ref_type),该轨道参考类型字段用于指示轨道参考的类型。
其中,轨道参考是用于关联当前媒体轨道和其他媒体轨道的方式,该轨道参考的类型由轨道参考类型字段(track_ref_type)指示。不同媒体轨道的轨道参考类型字段取值相同的内容,则可以表示媒体轨道之间相互关联。
②若组件参考类型字段(components_ref_type)的取值为第二预设数值(如“1”),则表示当前媒体轨道通过轨道组关联至与当前媒体轨道属于同一码流的其他媒体轨道,且可替换信息数据盒还包括:轨道组类型字段(track_group_type)和轨道组标识字段(track_group_id),该轨道组类型字段用于指示当前媒体轨道所属轨道组的类型,该轨道组标识字段用于指示当前媒体轨道所属轨道组的标识符。
该轨道组包含多个媒体轨道。示意性地,按照码流归属,上述轨道组可以是同一码流对应的媒体轨道,每个轨道组中包含的媒体轨道可组合表示一个码流,M个媒体轨道可对应N个轨道组。当前媒体轨道所属的轨道组的类型由可替换数据盒中包含的轨道组类型字段(track_group_type)指示,当前媒体轨道所属的轨道组的标识符由轨道组标识字段(track_group_id)来指示。同一轨道组的标识符和类型相同,该标识符可以是数字或字符串,以表示该轨道组中的媒体轨道之间是相互关联的。
(2) The alternative information box is set in the current media item.
①若组件参考类型字段(components_ref_type)的取值为第三预设数值(如“2”),则表示当前媒体项目通过项目参考的方式关联至与当前媒体项目属于同一码流的其他媒体项目,且可替换信息数据盒还包括:项目参考类型字段(item_ref_type),该项目参考类型字段用于指示项目参考的类型。
其中,项目参考是用于关联当前媒体项目和其他媒体项目的方式,项目参考的类型由项目参考类型字段(item_ref_type)指示。不同媒体项目的项目参考类型字段取值相同,以表示媒体项目之间可相互关联。
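② If the component reference type field (components_ref_type) takes the fourth preset value (e.g. "3"), the current media item is associated, through an entity group, with the other media items belonging to the same bitstream, and the alternative information box further includes an entity group type field (entity_group_type) and an entity group identifier field (entity_group_id). The entity group type field indicates the type of the entity group to which the current media item belongs, and the entity group identifier field indicates the identifier of that entity group.
The entity group contains one or more media items. For example, following bitstream ownership, an entity group may contain all media items corresponding to one bitstream, so that the entity group represents that bitstream. The type of the entity group to which the current media item belongs is indicated by the entity group type field (entity_group_type) in the alternative information box, and its identifier by the entity group identifier field (entity_group_id). Media items in the same entity group carry the same identifier and type, which expresses the association relationship among them.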
通过组件参考类型字段(components_ref_type)的指示,能够进一步基于部分字段的取值来关联当前媒体轨道与其他媒体轨道,或者当前媒体项目与其他媒体项目,从而清晰地指示出同一码流对应的媒体轨道的组合或媒体项目的组合。进一步地,在可替换信息数据盒中结合对关联关系以及共享归属关系的指示,能够指示出媒体轨道组合之间的可替换关系,进而指示出码流级别的可替换关系。
在一个实施例中,可替换信息数据盒还包括多组件标志字段(multi_components_flag)。
若可替换信息数据盒设于当前媒体轨道中,则多组件标志字段(multi_components_flag)用于指示当前媒体轨道所属码流是否被封装至多个媒体轨道中。可替换信息数据盒可设于当前媒体轨道的样本入口中。通过多组件标志字段可得知当前媒体轨道所属码流的封装方式,进而能够得知当前媒体轨道归属于码流的成分属性,该成分属性是指当前媒体轨道是码流封装至的多个媒体轨道中的一个媒体轨道,或者码流封装至的单个媒体轨道。
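When the multi-component flag field (multi_components_flag) takes the first preset value (e.g. "0"), the bitstream to which the current media track belongs is encapsulated into a single media track, and the current media track is that track. The current media track is then the sole component of a single-track-encapsulated bitstream and can represent the bitstream on its own. When the field takes the second preset value (e.g. "1"), the bitstream to which the current media track belongs is encapsulated into multiple media tracks, and the current media track is any one of them. In other words, the bitstream uses multi-track encapsulation, and the current media track must be combined with the other media tracks of the same bitstream to represent it.
If the alternative information box is set in the current media item, the multi-component flag field (multi_components_flag) indicates whether the bitstream to which the current media item belongs is encapsulated into multiple media items. When the field takes the first preset value (e.g. "0"), the bitstream is encapsulated into a single media item; the current media item is that item and can represent the bitstream on its own. When the field takes the second preset value (e.g. "1"), the bitstream is encapsulated into multiple media items, and the current media item is any one of them; in this case the current media item is combined with the other media items of the same bitstream to represent it.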
基于上述介绍,可替换信息数据盒中语法表示可如下表7所示。
表7

对于如上述表7所示的可替换信息数据盒中的各个字段的含义如下:
可替换组标识标志字段(alternative_group_id_flag):若可替换信息数据盒设于当前媒体轨道中,则用于指示当前媒体轨道中的可替换信息数据盒中是否指示该当前媒体轨道对应码流所属的可替换组标识符。若可替换信息数据盒设于当前媒体项目中,则用于指示当前媒体项目中的可替换信息数据盒中是否指示当前媒体项目对应码流所属的可替换组标识符。示意性的,当alternative_group_id_flag取值为1时,表示当前媒体轨道中的可替换信息数据盒中指示当前媒体轨道对应码流所属的可替换组标识符,当alternative_group_id_flag取值为0时,表示当前媒体轨道中的可替换信息数据盒中不指示当前媒体轨道对应码流所属的可替换组标识符。
可替换组标识字段(alternative_group_id):若可替换信息数据盒设于当前媒体轨道中,则用于指示当前媒体轨道对应码流所属的可替换组标识符。若可替换信息数据盒设于当前媒体项目中,则用于指示当前媒体项目对应码流所属的可替换组标识符。示意性地,上述可替换组标识符可以为1。
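Multi-alternative-bitstream flag field (multi_alternative_bitstream_flag): if the alternative information box is set in the current media track, this field indicates whether the current media track belongs to multiple bitstreams; if the box is set in the current media item, it indicates whether the current media item belongs to multiple bitstreams.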
码流数量字段(num_bitstream):若可替换信息数据盒设于当前媒体轨道中,则用于指示当前媒体轨道所属码流的数量;若可替换信息数据盒设于当前媒体项目中,则用于指示当前媒体项目所属码流的数量。
示意性地,multi_alternative_bitstream_flag取值为1,当前媒体轨道属于多个码流,且码流的数量由num_bitstream指示。
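Component reference type field (components_ref_type): if the alternative information box is set in the current media track, this field indicates how the current media track is associated with the other media tracks belonging to the same bitstream. For example, a components_ref_type of 0 means that the current media track is associated with the other media tracks of the same bitstream through a track reference, whose type is indicated by the track reference type field (track_ref_type). A components_ref_type of 1 means the association is through a track group, whose type is indicated by the track group type field (track_group_type) and whose identifier is indicated by the track group identifier field (track_group_id). If the alternative information box is set in the current media item, this field indicates how the current media item is associated with the other media items belonging to the same bitstream. For example, a components_ref_type of 2 means the association is through an item reference, whose type is indicated by the item reference type field (item_ref_type). A components_ref_type of 3 means the association is through an entity group, whose type is indicated by the entity group type field (entity_group_type) and whose identifier is indicated by the entity group identifier field (entity_group_id).
Multi-component flag field (multi_components_flag): if the alternative information box is set in the current media track, this field indicates whether the bitstream to which the current media track belongs is encapsulated into multiple media tracks. If the box is set in the current media item, it indicates whether the bitstream to which the current media item belongs is encapsulated into multiple media items.
For example, when multi_alternative_bitstream_flag indicates that the current media track belongs to multiple bitstreams (multi_alternative_bitstream_flag == 1), the bitstream count field (num_bitstream) and the component reference type field (components_ref_type) are defined, and for each bitstream to which the current media track belongs, components_ref_type indicates how the current media track is associated with the other media tracks of that bitstream. When multi_alternative_bitstream_flag indicates that the current media track belongs to one bitstream, the multi-component flag field (multi_components_flag) further indicates whether that bitstream is encapsulated into multiple media tracks. If it is (multi_components_flag == 1), components_ref_type is defined to indicate how the current media track is associated with the other media tracks of the same bitstream.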
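The conditional structure just described can be sketched as a small parser. This is not the syntax of Table 7, which is not reproduced in this text. The field widths (one byte per flag and per components_ref_type, 32 bits per identifier, four-character codes for types), the field order, and the helper names are all assumptions made for illustration; the sketch follows the illustrative reading in which alternative_group_id_flag equal to 1 means the identifier is present.

```python
import io
import struct


def _u8(buf):
    return struct.unpack(">B", buf.read(1))[0]


def _u32(buf):
    return struct.unpack(">I", buf.read(4))[0]


def _fourcc(buf):
    return buf.read(4).decode("ascii")


def _read_component_ref(buf):
    """Read components_ref_type and the fields that depend on its value."""
    ref = {"components_ref_type": _u8(buf)}
    kind = ref["components_ref_type"]
    if kind == 0:    # track reference
        ref["track_ref_type"] = _fourcc(buf)
    elif kind == 1:  # track group
        ref["track_group_type"] = _fourcc(buf)
        ref["track_group_id"] = _u32(buf)
    elif kind == 2:  # item reference
        ref["item_ref_type"] = _fourcc(buf)
    elif kind == 3:  # entity group
        ref["entity_group_type"] = _fourcc(buf)
        ref["entity_group_id"] = _u32(buf)
    else:
        raise ValueError(f"unknown components_ref_type {kind}")
    return ref


def parse_alternative_info(payload):
    """Parse an AlternativeInfoBox payload laid out with the assumed widths."""
    buf = io.BytesIO(payload)
    box = {"alternative_group_id_flag": _u8(buf)}
    if box["alternative_group_id_flag"] == 1:
        box["alternative_group_id"] = _u32(buf)
    box["multi_alternative_bitstream_flag"] = _u8(buf)
    if box["multi_alternative_bitstream_flag"] == 1:
        # Shared track/item: one association entry per owning bitstream.
        box["num_bitstream"] = _u8(buf)
        box["components"] = [_read_component_ref(buf)
                             for _ in range(box["num_bitstream"])]
    else:
        box["multi_components_flag"] = _u8(buf)
        box["components"] = ([_read_component_ref(buf)]
                             if box["multi_components_flag"] == 1 else [])
    return box


# A track shared by two bitstreams, tied to each through a track group.
# The track group type "tgrp" and the ids 100 and 101 are made-up values.
payload = (bytes([1]) + struct.pack(">I", 1) + bytes([1, 2])
           + bytes([1]) + b"tgrp" + struct.pack(">I", 100)
           + bytes([1]) + b"tgrp" + struct.pack(">I", 101))
parsed = parse_alternative_info(payload)
print(parsed["num_bitstream"], [c["track_group_id"] for c in parsed["components"]])
```

A real implementation would take the widths and ordering from the box definition itself.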
在另一种实现方式中,若通过可替换信息数据盒的数量来指示当前媒体轨道/当前媒体项目的共享归属关系,那么对于每个可替换信息数据盒具有如下表8所示的语法。
表8
与表7对比可知,该可替换数据盒中包含可替换组标识标志字段(alternative_group_id_flag)、多组件标志字段(multi_components_flag)以及组成参考字段(components_ref_type),其相关介绍可参见上述描述,在此不做赘述。但可替换数据盒中不包含多替换流标志字段(multi_alternative_bitstream_flag)以及相应指示下的判断内容。示意性地,如上述表8所示的可替换信息数据盒设置于当前媒体轨道中,可直接通过多组件标志字段(multi_components_flag)来指示该当前媒体轨道所属码流是否被封装至多个媒体轨道中,进而在该当前媒体轨道所属码流被封装至多个媒体轨道中的情况下(multi_components_flag==1),进一步定义组成参考字段(components_ref_type),以指示该当前媒体轨道及与当前媒体轨道属于同一码流的其他媒体轨道的方式。
可选地,若沉浸媒体为点云媒体,点云媒体的码流为点云码流,相应的可替换信息数据盒可通过限定类型的方式进一步简化。若当前媒体轨道/当前媒体项目所属码流为点云码流,且点云码流采用多轨封装方式进行封装,则该多组件标志字段的取值为第二预设数值(如“1”)。也就是说,当点云媒体采用多轨封装方式时,可直接认为multi_components_flag的取值为1。进一步地,当可替换的多个点云码流存在共享的媒体轨道时,可直接认为components_ref_type=1。从而通过特定的轨道组可组织一个点云码流对应的多个媒体轨道。基于上述内容,可省略可替换信息数据盒中字段取值不同时的判断,从而简化可替换数据盒中的内容,提高组织可替换的多个码流对应的媒体轨道的效率,节省查找媒体轨道/媒体项目所花费的资源。
依据沉浸媒体的不同传输方式,沉浸媒体的媒体文件的获取方式不同。作为一种实现方式,解码设备可接收到完整的沉浸媒体的媒体文件,该媒体文件封装了可替换的多个码流以及关系指示信息。作为另一种实现方式,沉浸媒体采用流化传输方式进行传输,获取沉浸媒体的媒体文件包括以下步骤:获取沉浸媒体的传输信令,该传输信令中包含关系指示信息的描述信息;根据该传输信令获取沉浸媒体的媒体文件。
其中,传输信令可以是DASH信令、MPD信令等等,该传输信令能够以信令描述文件的形式被解码设备获取到。描述信息用于定义关系指示信息所指示的具备可替换关系的N个码流。
可选地,描述信息包括N个预选标识(Preselection),每个预选标识分别用于表示N个码流中的一个码流;各个预选标识具备相同的编码标识(如@gpccId=1);每个预选标识对应一个或多个自适应集合(Adaptation),一个自适应集合代表每个预选标识所表示的码流中的一个媒体轨道或一个媒体项目;或者,每个预选标识对应一个或多个表示(Representation),一个表示代表每个预选标识所表示的码流中的一个媒体轨道或一个媒体项目。
示意性地,传输信令为DASH信令,描述信息包括的预选标识Preselection可以利用DASH信令中Preselection工具对同一码流的不同成分(如不同媒体轨道/不同媒体项目)定义得到,且该Preselection用于表示N个码流中的一个码流。N个预选标识不同以表示不同码流,例如预选标识Preselection1对应码流码流1,预选标识Preselection2对应码流码流2。不同预选标识所表示的码流之间的可替换关系,可通过相同的编码标识(@gpccId=1)指示。对于预选标识所表示的码流,则可通过预选标识对应的自适应集合(Adaptation)的组合表示。一个自适应集合包含一个标识。一个预选标识对应的自适应集合(Adaptation)的数量或者表示(Representation)的数量,与该预选标识所表示的码流的媒体轨道/媒体项目的数量相等的。示意性地,若预选标识对应一个表示Representation/自适应集合Adaptation,那么说明该预选标识所表示的码流的组成成分包含一个媒体轨道或者一个媒体项目。
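After obtaining the transmission signaling, the decoding device can request the corresponding segments of the media file according to its own capabilities and its presentation requirements for the immersive media, then decapsulate and decode the obtained segments and present the immersive media. The capabilities of the decoding device include, but are not limited to, the coding formats it supports, the bandwidth it supports, the processing capability of its central processing unit (CPU), and the rendering capability of its graphics processing unit (GPU). The presentation requirements include, but are not limited to, presentation clarity, presentation resolution, bit rate, size, viewing angle, and viewing orientation.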
S502,根据关系指示信息,对媒体文件进行解码处理,以呈现沉浸媒体。
在一个实施例中,根据关系指示信息,对媒体文件进行解码处理,以呈现沉浸媒体可以包括以下步骤:首先按照关系指示信息所指示的可替换关系,从可替换的N个码流中确定所需呈现的码流;然后对该所需呈现的码流进行解码并呈现。
作为一种实现方式,解码设备可获取到完整的媒体文件。示意性的,沉浸媒体为时序沉浸媒体,该媒体文件中包含N个码流对应的M个媒体轨道,且关系指示信息设置在相应媒体轨道中。解码设备可先对媒体文件进行解封装,得到M个媒体轨道,然后根据设置在媒体轨道中的关系指示信息所指示的可替换关系,从中确定所需呈现的码流进行解码,此处确定所需呈现的码流具体是基于关系指示信息筛选出可表示该码流的所有媒体轨道,可能是多个媒体轨道的组合或者单个媒体轨道。此外,还可结合根据解码设备的设备性能以及对沉浸媒体的呈现需求来确定对应的媒体轨道。接着对所需呈现的码流进行解码,具体是对选择出的媒体轨道进行解码,从而呈现出该沉浸媒体。可理解的是,若所需呈现的码流是通过媒体项目来表示的,那么解码的对象具体是该媒体项目。
作为另一种可实现的方式,解码设备可根据传输信令获取到沉浸媒体的媒体文件。该沉浸媒体的 媒体文件以片段的形式被获取到,示意性地,该媒体文件的片段包括一个或多个媒体轨道,可表示N个码流中所需呈现的码流,通过对该媒体文件的片段进行解封装,可得到该一个或多个媒体轨道以及媒体轨道中的关系指示信息,进而依据关系指示信息可进一步解码媒体轨道并呈现沉浸媒体。若媒体文件的片段包括媒体项目,则可解码媒体项目进而呈现沉浸媒体。
本申请实施例提供的沉浸媒体的数据处理方法,可获取沉浸媒体的媒体文件,该沉浸媒体包括可替换的N个码流,该媒体文件包括关系指示信息,且关系指示信息用于指示N个码流之间的可替换关系,N为大于1的整数;根据关系指示信息,对媒体文件进行解码处理,以呈现沉浸媒体。通过关系指示信息能够清晰地指示沉浸媒体的任意两个码流之间的可替换关系,进而基于可替换关系指导沉浸媒体的准确解码及呈现,提升沉浸媒体的呈现效果。在此基础上,基于对码流的封装,关系指示信息可设置于相应的媒体轨道/媒体项目中,不仅可指示出码流级别的可替换关系,还可进一步支持同一码流对应的多个媒体轨道(或者多个媒体项目)之间的组合关系和/或被不同码流共享的媒体轨道(或媒体项目)的共享归属关系。基于上述关系的指示,对于任意码流均能够依据关系指示信息准确地获取到所需呈现的码流对应的媒体轨道/媒体项目,进而实现对沉浸媒体的任意版本的内容的呈现,进而提升沉浸媒体的呈现效果。
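Referring to Figure 6a, Figure 6a is a schematic flowchart of a data processing method for immersive media provided by an embodiment of this application. The method may be performed by the service device 202 in the immersive media data processing system and includes the following steps S601-S603: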
S601,对沉浸媒体进行编码处理,得到可替换的N个码流。
在一种实现方式中,服务设备可采用不同的编码方式对沉浸媒体进行编码处理,得到沉浸媒体的可替换的N个码流。此时N个码流具有相同内容以及不同编码类型。在另一种实现方式中,也可按照不同的质量标准对沉浸媒体进行编码处理,得到沉浸媒体的可替换的N个码流。此时N个码流具有相同内容以及不同质量。可替换的N个码流可视为不同版本的码流,N个码流中的任意两个码流之间具备可替换关系,基于该可替换关系,不同码流在呈现时允许相互替换。此处可替换包括但不限于以下任一种或多种:质量的可替换、编码类型的可替换及内容的可替换。
S602,根据N个码流之间的可替换关系,生成关系指示信息。
S603,对关系指示信息和N个码流进行封装处理,得到沉浸媒体的媒体文件。
所生成的关系指示信息可用于指示N个码流之间的可替换关系。服务设备可对N个码流中的每个码流进行封装处理,并依据封装处理得到的组成成分(如媒体轨道/媒体项目)之间的关系以及与码流的归属关系来添加相应的关系指示信息。在一种实现方式中,沉浸媒体为时序沉浸媒体,服务设备可将可替换的N个码流封装为M个媒体轨道(M>=N),其中,每个码流可被封装为M个媒体轨道中的一个或多个媒体轨道。
为便于描述,N个码流中的任一码流可表示为码流i,基于对码流i的不同封装方式,关系指示信息在编码端的设置以及关系指示信息的内容如下(1)-(3)所示。
(1)当码流i被封装为一个媒体轨道Mi时,可在该媒体轨道Mi中添加关系指示信息,该关系指示信息可用于指示码流i与其他码流之间的可替换关系,其他码流是指N个码流中除码流i之外的码流。
(2)当码流i被封装为多个媒体轨道时,可在多个媒体轨道中的任一媒体轨道中添加关系指示信息,且该关系指示信息还可用于指示该媒体轨道与码流i的其他媒体轨道之间的关联关系,该关系指示信息包含该关联关系的指示。此时,在相应媒体轨道中添加的关系指示信息不仅可用于指示该媒体轨道所属的码流i与其他码流之间的可替换关系,还可指示该媒体轨道与同属于码流i的其他媒体轨道之间的关联关系,以表明多个媒体轨道的组合可表示码流i。
(3)当N个码流中至少两个可替换码流均被封装为多个媒体轨道,并且不同码流所封装的多个媒体轨道中包含相同的媒体轨道时,对于不同码流之间重复的媒体轨道,在媒体文件中可仅保留一个媒体轨道,在被至少两个码流共享的媒体轨道中可添加关系指示信息,以指示该媒体轨道同时属于多个码流。示例性地,如图6b所示的媒体文件610,其中,码流1和码流2中的几何信息完全一致时,即几何信息采用完全相同的编码方式,从而在媒体文件610中可省略一个重复的几何轨道,即在媒体文件中仅包含一个几何轨道track1,并且在该几何轨道track1中可添加关系指示信息,以标识几何轨道 track1为共享的媒体轨道。
可理解的是,如上述关系指示信息设置于被多个码流共享的媒体轨道中,除了可指示媒体轨道被多个码流共享,还可指示该媒体轨道与其他媒体轨道同属于码流i。基于码流i与其他码流之间的可替换关系,能够得知媒体轨道的组合与表示其他码流的媒体轨道组合或者单个媒体轨道之间可相互替换。如上述图6b所示的媒体文件610中,轨道track2和track3之间具备可替换关系,并且基于关系指示信息还能够得知轨道1+轨道2、轨道1+轨道3、轨道4+轨道5中任意两个组合之间的可替换关系,从而达到码流级别的可替换关系的指示目的,实现码流之间的可替换关系更加准确清晰地指示。
通过将码流封装为媒体轨道,并按照上述方式在相应媒体轨道中添加关系指示信息,从而可形成沉浸媒体的媒体文件。
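The encoder-side packaging described in (1)-(3) can be sketched as follows. This is a simplified illustration, not the method of the application: it assumes identical components can be recognized by hashing their encoded payload, it attaches one alternative-information entry per owning bitstream to every track (the text allows the information to sit in any one track of a multi-track bitstream), and the function name, payload bytes, and track group ids are hypothetical.

```python
import hashlib


def package_tracks(bitstreams, alternative_group_id=1):
    """Package components into tracks, keeping one track per distinct payload."""
    tracks = {}  # payload digest -> track record
    for group_id, components in enumerate(bitstreams.values(), start=1):
        for kind, payload in components:
            digest = hashlib.sha256(kind.encode() + b"\0" + payload).hexdigest()
            track = tracks.setdefault(digest, {
                "track_id": len(tracks) + 1,
                "kind": kind,
                "alternative_info": [],
            })
            # One entry per bitstream that owns this track.
            track["alternative_info"].append({
                "alternative_group_id": alternative_group_id,
                "multi_components_flag": int(len(components) > 1),
                "track_group_id": group_id,
            })
    return sorted(tracks.values(), key=lambda t: t["track_id"])


# Two bitstreams whose geometry is encoded identically, as in the shared
# geometry track example; only the attribute components differ.
tracks = package_tracks({
    "bitstream1": [("geometry", b"geom-v1"), ("attribute", b"attr-a")],
    "bitstream2": [("geometry", b"geom-v1"), ("attribute", b"attr-b")],
})
for t in tracks:
    print(t["track_id"], t["kind"], [e["track_group_id"] for e in t["alternative_info"]])
```

The shared geometry track ends up with two entries, which is how a reader can tell that it belongs to both bitstreams.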
在另一种实现方式中,沉浸媒体为非时序沉浸媒体,服务设备可将可替换的N个码流封装为P个媒体项目,其中,每个码流可被封装为P个媒体项目中的一个或多个媒体项目。对于非时序沉浸媒体的码流i,同理有如下(4)-(6)关系指示信息的设置以及内容指示。
(4)当码流i被封装为一个媒体项目Mi时,可在该媒体项目Mi中添加关系指示信息,该关系指示信息可用于指示码流i与其他码流之间的可替换关系,其他码流是指N个码流中除码流i之外的码流。
(5)当码流i被封装为多个媒体项目时,可在多个媒体项目中的任一媒体项目中添加关系指示信息,且该关系指示信息还可用于指示该媒体项目与码流i的其他媒体项目之间的关联关系,该关系指示信息包含该关联关系的指示。此时,在相应媒体项目中添加的关系指示信息不仅可用于指示该媒体项目所属的码流i与其他码流之间的可替换关系,还可指示该媒体项目与同属于码流i的其他媒体项目之间的关联关系,以表明多个媒体项目的组合可表示码流i。
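(6) When at least two alternative bitstreams among the N bitstreams are each encapsulated into multiple media items, and the media items of different bitstreams include identical media items, only one copy of each repeated media item may be kept in the media file, and relationship indication information is added to that media item to indicate that it belongs to multiple bitstreams at once.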
在本申请实施例中,对沉浸媒体进行编码处理,可得到沉浸媒体的N个码流,且N个码流之间具备可替换关系。根据该可替换关系可生成关系指示信息,并对该关系指示信息以及N个码流进行封装,可得到沉浸媒体的媒体文件。可见,在对沉浸媒体的编码过程中,可在媒体文件中添加关系指示信息,以指示不同码流之间的可替换关系,这样,通过关系指示信息便达到指示码流级别的可替换关系的目的。通过码流级别的可替换关系的指示,无论媒体文件中所封装的可替换的码流的数量是多少,码流采用的封装方式是何种,均能够通过设置在相应媒体轨道/媒体项目中的关系指示信息,清晰地指示出可替换的任意两个码流之间的可替换关系。如此,对于沉浸媒体的可替换的码流的数量具备足够的兼容性,通用性强,且可扩展性强。在解码端基于该关系指示信息还能够可灵活地组织码流对应的媒体轨道/媒体项目,准确地选择对应的媒体轨道/媒体项目,进而指导沉浸媒体的解码准确呈现,提升沉浸媒体的呈现效果。
下面通过一个完整的例子对本申请提供的沉浸媒体的数据处理方法进行详细说明:
1.服务设备可以获取沉浸媒体,并对沉浸媒体进行编码处理,得到可替换的N个码流。
2.服务设备根据N个码流之间的可替换关系,生成关系指示信息。
3.服务设备对关系指示信息和N个码流进行封装处理,得到沉浸媒体的媒体文件。
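Assume here that the immersive media is point cloud media, the bitstreams are point cloud bitstreams, and N is 3. Encoding the point cloud media yields 3 point cloud bitstreams, denoted bitstream 1, bitstream 2, and bitstream 3, which are alternative bitstreams with the same content and different quality. The geometry data of bitstream 1 and bitstream 2 is obtained with exactly the same encoding, and both bitstreams use component-based multi-track encapsulation. One attribute of bitstream 1 and bitstream 2 (for example, reflectance) is also obtained with exactly the same encoding, and bitstream 3 uses single-track encapsulation. The geometry tracks of bitstream 1 and bitstream 2 are therefore identical media tracks, as are their reflectance attribute tracks, so the media file keeps only one geometry track and one reflectance attribute track. Figure 7 shows the resulting media tracks in the media file. Bitstream 1 and bitstream 2 share the geometry track Track1 (GeometryTrack) and the attribute track Track4 (AttributeTrack), and have different color attribute tracks (AttributeTrack(color)): Track1, Track2, and Track4 together represent bitstream 1; Track1, Track3, and Track4 together represent bitstream 2; and Track5 represents bitstream 3. Track5 is the track containing both geometry and attribute data (Geometry+Attributes Track) obtained by single-track encapsulation of bitstream 3.
The alternative information metadata carried by the media tracks, and the use of the related boxes, are shown in Table 9 below.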
表9
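Based on the settings in Table 9 above, two alternative information boxes (AlternativeInfoBox) are set for Track1 to indicate that Track1 is shared by 2 bitstreams (bitstream 1 and bitstream 2), and the track group identifier field track_group_id takes a different value in each of the two boxes. A playout track group box (PlayoutTrackGroupBox) is also set, indicating that the track must be played jointly with other media tracks; it contains the track group type field track_group_type and the track group identifier field track_group_id, whose values match those of the corresponding fields in the alternative information box. Media tracks belonging to the same track group have the same track group identifier value. Whether single-track or multi-track encapsulation is used can be learned from the multi-component flag field multi_components_flag (0 indicates single-track encapsulation and 1 indicates multi-track encapsulation). PlayoutTrackGroupBox is likewise set for Track2, Track3, and Track4, indicating that they must be played jointly with other media tracks, and the specific combination is indicated by track_group_id: for example, Track1, Track2, and Track4 carry the same track_group_id, indicating that these tracks are played jointly, and the same applies to Track1, Track3, and Track4. Track5 has only one AlternativeInfoBox, whose alternative group identifier is the same as that in the AlternativeInfoBox of Track1, indicating that the corresponding bitstreams are alternatives of one another.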
4.服务设备可将沉浸媒体的媒体文件传输给解码设备。对于媒体文件的传输包含以下两种传输方式:
A、服务设备可直接传输完整的媒体文件F至解码设备,该媒体文件包含关系指示信息。
B、服务设备可采用流化传输方式,传输媒体文件的一个或多个片段Fs(例如包含媒体文件的一个或多个媒体轨道)至解码设备。
在流化传输的过程中,服务设备根据码流之间的可替换关系,生成关系指示信息的描述信息,该描述信息可定义该关系指示信息所指示的具备可替换关系的N个码流,然后将该关系指示信息的描述信息通过传输信令发送给解码设备,该传输信令的形式可以是一个信令描述文件。解码设备可根据该关系指示信息的描述信息确定码流之间的可替换关系,再根据传输信令获取需呈现的码流。
如上述示例中,服务设备可根据几何轨道和属性轨道之间的共享以及可替换关系,生成信令描述文件。该信令描述文件中包含关系指示信息的描述信息。以DASH为例,可以利用DASH信令中现有的Preselection工具,将同一个码流的不同成分(此处对应媒体轨道)定义为一个Preselection,并给相同的点云内容添加相同的编码标识@gpccId表示不同版本的点云码流。具体来说,媒体文件中所包含的轨道Track1~Track5分别对应自适应集合/表示Adaptation1/Representation1~Adaptation5/Representation5。关系指示信息的描述信息具体如下:
Preselection1:Adaptation1+Adaptation2+Adaptation4;@gpccId=1
Preselection2:Adaptation1+Adaptation3+Adaptation4;@gpccId=1
Preselection3:Adaptation5;@gpccId=1
其中,Adaptation1为track1对应的自适应集合,Adaptation2为track2对应的自适应集合,Adaptation3为track3对应的自适应集合,Adaptation4为track4对应的自适应集合,Adaptation5为track5对应的自适应集合。预选择标识Preselection对应一个码流,不同预选择标识Preselection的@gpccId=1表示不同码流之间可替换。
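The description information above can be modelled in a few lines to show what a client learns from it. The dictionaries below are a hypothetical in-memory form of the three Preselections, not parsed MPD XML, and the helper names are introduced only for this sketch.

```python
from collections import defaultdict

preselections = [
    {"id": "Preselection1", "gpccId": 1,
     "components": ["Adaptation1", "Adaptation2", "Adaptation4"]},
    {"id": "Preselection2", "gpccId": 1,
     "components": ["Adaptation1", "Adaptation3", "Adaptation4"]},
    {"id": "Preselection3", "gpccId": 1,
     "components": ["Adaptation5"]},
]


def alternative_groups(preselections):
    """Group Preselections by @gpccId; one group holds mutually alternative bitstreams."""
    groups = defaultdict(list)
    for p in preselections:
        groups[p["gpccId"]].append(p["id"])
    return dict(groups)


def shared_components(preselections):
    """Find adaptation sets referenced by more than one Preselection."""
    owners = defaultdict(set)
    for p in preselections:
        for component in p["components"]:
            owners[component].add(p["id"])
    return {c: sorted(ids) for c, ids in owners.items() if len(ids) > 1}


print(alternative_groups(preselections))
print(shared_components(preselections))
```

With this data, all three Preselections fall into one alternative group, and Adaptation1 and Adaptation4 show up as shared between Preselection1 and Preselection2.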
5.解码设备接收到沉浸媒体的媒体文件,媒体文件中包含关系指示信息。
6.解码设备根据关系指示信息对媒体文件进行解码以呈现沉浸媒体。
基于媒体文件的传输方式不同,解码设备可接收到完整的媒体文件F,或者基于传输信令获取到媒体文件的片段Fs。以媒体文件上述示例的点云文件F1为例进行说明。
(1)解码设备接收到完整的点云文件F1,该点云文件F1中包含可替换的N个码流对应的所有媒体轨道。解码设备可先对点云文件进行解封装,得到其中包含的媒体轨道,然后依据设置于媒体轨道中的数据盒信息可得知有以下3种表示码流的选择:①Track1+Track2+Track4;②Track1+Track3+Track4;③Track5。接着可根据解码设备的性能以及对点云媒体的呈现需求,并结合点云文件中关系指示信息所指示的可替换关系,选择所需呈现的码流,具体是选择点云文件F1中的轨道,再依据点云文件中相应的元数据信息,对所选择的轨道进行解码以呈现点云媒体。在此方式下,由于预先已获取到的完整的媒体文件,在解码设备需切换不同版本的码流时,可直接解码所需呈现的码流,实现更加高效地切换,从而提高沉浸媒体切换时的呈现效率。
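How the three choices fall out of the track metadata can be sketched as below. The dictionary is a hypothetical, already-parsed view of the boxes in the example file; the track group ids 100 and 101 are made-up values, since the text only requires that tracks of the same bitstream carry the same track_group_id.

```python
from collections import defaultdict

tracks = {
    "Track1": {"alternative_group_id": 1, "track_group_ids": [100, 101]},
    "Track2": {"track_group_ids": [100]},
    "Track3": {"track_group_ids": [101]},
    "Track4": {"track_group_ids": [100, 101]},
    "Track5": {"alternative_group_id": 1, "track_group_ids": []},
}


def candidate_bitstreams(tracks):
    """List the track combinations that each represent one bitstream."""
    groups = defaultdict(list)
    standalone = []
    for name, meta in tracks.items():
        if meta["track_group_ids"]:
            for group_id in meta["track_group_ids"]:
                groups[group_id].append(name)
        else:
            # A track outside every track group represents a bitstream alone.
            standalone.append([name])
    return [groups[g] for g in sorted(groups)] + standalone


for combo in candidate_bitstreams(tracks):
    print("+".join(combo))
```

This prints Track1+Track2+Track4, Track1+Track3+Track4, and Track5, matching the three choices listed above.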
(2)解码设备先接收到信令描述文件,并解析该信令描述文件,得到关系指示信息的描述信息,基于该描述信息可知解码设备对于码流的表示包括以下几种选择:
Representation1+Representation2+Representation4;
Representation1+Representation3+Representation4;
Representation5。
其中,Representation1为track1对应的表示,Representation2为track2对应的表示,Representation3为track3对应的表示,Representation4为track4对应的表示,Representation5为track5对应的表示。Representation1+Representation2+Representation4对应码流码流1,Representation1+Representation3+Representation4对应码流码流2,Representation5对应码流码流3。
解码设备可根据设备性能以及呈现需求,基于传输信令请求相应的传输流(对应点云文件中的一个或多个轨道,即文件片段)Fs。然后,解码设备可对接收到的文件片段进行解封装,并对媒体轨道进行解码,最终呈现点云媒体。在此种方式下,解码设备无需接收全部的媒体文件,而是依据传输信令来精准地获取所需呈现的码流,从而能够减少沉浸媒体一次呈现的资源消耗。
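A rough sketch of this selection step follows. The `bandwidth` and `codec` attributes, their values, and the rule of preferring the highest bandwidth that fits are assumptions for illustration; the text names device capability and presentation requirements as inputs but does not fix a selection rule.

```python
def choose_preselection(preselections, max_bandwidth, supported_codecs):
    """Return the highest-bandwidth candidate the device can handle, or None."""
    usable = [
        p for p in preselections
        if p["bandwidth"] <= max_bandwidth and p["codec"] in supported_codecs
    ]
    return max(usable, key=lambda p: p["bandwidth"]) if usable else None


# Hypothetical bandwidth figures (kbit/s) for the three alternative bitstreams.
preselections = [
    {"id": "Preselection1", "bandwidth": 8000, "codec": "gpcc",
     "representations": ["Representation1", "Representation2", "Representation4"]},
    {"id": "Preselection2", "bandwidth": 5000, "codec": "gpcc",
     "representations": ["Representation1", "Representation3", "Representation4"]},
    {"id": "Preselection3", "bandwidth": 2000, "codec": "gpcc",
     "representations": ["Representation5"]},
]

chosen = choose_preselection(preselections, max_bandwidth=6000,
                             supported_codecs={"gpcc"})
print(chosen["id"], chosen["representations"])
```

Only the representations of the chosen candidate would then be requested, which is what lets the device avoid fetching the whole media file.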
应理解的是,对于非时序沉浸媒体,上述实施例的示例中的媒体轨道替换为媒体项目,同样适用。
在本申请实施例中,服务设备可以获取沉浸媒体,并对沉浸媒体进行编码处理,得到可替换的多个码流;然后依据码流之间的可替换关系生成关系指示信息,再对关系指示信息和码流进行封装得到沉浸媒体的媒体文件,并将该媒体文件传输至解码设备。解码设备可接收媒体文件,根据媒体文件中所包含的关系指示信息解码沉浸媒体并呈现该沉浸媒体。可见,在对沉浸媒体进行编码的过程中(具体是在封装媒体文件的过程中),通过在沉浸媒体的媒体文件中添加关系指示信息,便可通过关系指示信息有效指示沉浸媒体的不同码流之间的可替换关系,从而指导解码端基于自身需求准确地呈现沉浸媒体,提升沉浸媒体的呈现准确性以及呈现效果。
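Compared with encapsulating mutually alternative bitstreams into different media files (for example, media file F1 containing the media tracks of bitstream 1 and media file F2 containing the media tracks of bitstream 2), the embodiments of this application encapsulate all media tracks of the N bitstreams of the immersive media in one media file and use the relationship indication information to indicate the alternative relationships among the bitstreams, which is simpler and more efficient. Compared with encapsulating different bitstreams in one media file in such a way that media tracks common to several bitstreams appear repeatedly in the file, this solution keeps only one copy of each identical media track, shared by at least two bitstreams, and uses the relationship indication information to indicate which bitstreams the track belongs to; this both saves storage and allows the media tracks of a bitstream to be located accurately for decoding. Further, compared with encapsulating different bitstreams in one media file, omitting repeated media tracks, and indicating alternative relationships between media tracks, this solution indicates alternative relationships at the bitstream level rather than at the track or item level. As a result, however many bitstreams are encapsulated in the media file, the corresponding media tracks or media items can be selected accurately from the relationship indication information and decoded to present the immersive media, giving a good presentation effect and better generality.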
接下来对本申请实施例涉及的沉浸媒体的数据处理装置进行相关阐述。
请参见图8a,图8a是本申请实施例提供的一种沉浸媒体的数据处理装置的结构示意图,该沉浸媒体的数据处理装置可以设置于本申请实施例提供的计算机设备中,计算机设备可以是上述方法实施例中提及的解码设备。图8a所示的沉浸媒体的数据处理装置可以是运行于计算机设备中的一个计算机程序(包括程序代码),该沉浸媒体的数据处理装置可以用于执行图5所示的方法实施例中的部分或全部步骤。请参见图8a,该沉浸媒体的数据处理装置可以包括如下单元:
获取单元801,用于获取沉浸媒体的媒体文件;沉浸媒体包括可替换的N个码流;媒体文件包括关系指示信息,关系指示信息用于指示N个码流之间的可替换关系,N为大于1的整数;
处理单元802,用于根据关系指示信息,对媒体文件进行解码处理,以呈现沉浸媒体。
在一个实施例中,沉浸媒体为时序沉浸媒体;N个码流在媒体文件中被封装为M个媒体轨道;M为整数且M大于或等于N;关系指示信息设置于媒体轨道中。
在一个实施例中,N个码流中的任一个码流表示为码流i,i为正整数且i小于或等于N;码流i被封装至M个媒体轨道中的一个媒体轨道Mi中;关系指示信息设置于媒体轨道Mi中。
在一个实施例中,N个码流中的任一个码流表示为码流i,i为正整数且i小于或等于N;码流i被封装至M个媒体轨道中的多个媒体轨道中;关系指示信息设置于媒体轨道Mi中;媒体轨道Mi是指码流i被封装至的多个媒体轨道中的任一个媒体轨道。
在一个实施例中,所述关系指示信息还用于指示所述媒体轨道Mi与所述多个媒体轨道中除所述媒体轨道Mi之外的其他媒体轨道之间的关联关系;关联关系用于表示媒体轨道Mi与其他媒体轨道属于同一个码流i。
在一个实施例中,N个码流中的任意两个码流分别表示为码流i和码流j,i、j为正整数且i、j均小于或等于N;码流i被封装至M个媒体轨道中的第一多个媒体轨道中;码流j被封装至M个媒体轨道中的第二多个媒体轨道中;若所述第一多个媒体轨道与所述第二多个媒体轨道均包含媒体轨道Mij,则关系指示信息还用于指示媒体轨道Mij的共享归属关系;共享归属关系用于表示媒体轨道Mij是码流i和码流j共享的媒体轨道。
在一个实施例中,沉浸媒体为非时序沉浸媒体;N个码流在媒体文件中被封装为P个媒体项目;P为整数且P大于或等于N;关系指示信息设置于媒体项目中。
在一个实施例中,N个码流中的任一个码流表示为码流i,i为正整数且i小于或等于N;码流i被封装至P个媒体项目中的多个媒体项目中;关系指示信息设置于媒体项目Pi中;媒体项目Pi是指码流i被封装至的多个媒体项目中的任一个媒体项目。
在一个实施例中,所述关系指示信息还用于指示所述媒体项目Pi与所述多个媒体项目中除所述媒体项目Pi之外的其他媒体项目之间的关联关系;关联关系用于表示媒体项目Pi与其他媒体项目属于同一个码流i。
在一个实施例中,关系指示信息还用于指示媒体项目Pi与码流i对应的其他媒体项目之间的关联关系;
其中,其他媒体项目是指码流i被封装至的多个媒体项目中除媒体项目Pi之外的媒体项目;关联关系用于表示媒体项目Pi与其他媒体项目属于同一个码流i。
在一个实施例中,N个码流中的任意两个码流分别表示为码流i和码流j,i、j为正整数且i、j均小于或等于N;码流i被封装至P个媒体项目中的第一多个媒体项目中;码流j被封装至P个媒体项目中的第二多个媒体项目中;若所述第一多个媒体项目与所述第二多个媒体项目均包含媒体项目Pij,则关系指示信息还用于指示媒体项目Pij的共享归属关系;共享归属关系用于表示媒体项目Pij是码流i和码流j共享的媒体项目。
在一个实施例中,若沉浸媒体为时序沉浸媒体,则N个码流在媒体文件中被封装为M个媒体轨道;M为整数且M大于或等于N;其中,当M个媒体轨道中存在多个媒体轨道需要被联合播放时, 需要被联合播放的多个媒体轨道属于同一个播放轨道组;
若沉浸媒体为非时序沉浸媒体,则N个码流在媒体文件中被封装为P个媒体项目;P为整数且P大于或等于N;其中,当P个媒体项目中存在多个媒体项目需要被联合播放时,需要被联合播放的多个媒体项目属于同一个播放实体组。
在一个实施例中,具备可替换关系的N个码流属于同一个可替换组,同一个可替换组中的不同码流在呈现时允许相互替换;关系指示信息包括可替换信息数据盒;
若可替换信息数据盒设于当前媒体轨道中,则可替换信息数据盒中包含当前媒体轨道对应码流所属的可替换组的信息;
若可替换信息数据盒设于当前媒体项目中,则可替换信息数据盒中包含当前媒体项目对应码流所属的可替换组的信息;
其中,当前媒体轨道是指正在被解码的媒体轨道;当前媒体项目是指正在被解码的媒体项目。
在一个实施例中,可替换信息数据盒中包含可替换组标识标志字段和可替换组标识字段;
若可替换信息数据盒设于当前媒体轨道中,则可替换组标识标志字段用于指示当前媒体轨道中的可替换信息数据盒中是否指示当前媒体轨道对应码流所属的可替换组标识符;当可替换组标识标志字段的取值为第一预设数值时,表示当前媒体轨道中的可替换信息数据盒中指示当前媒体轨道对应码流所属的可替换组标识符;当可替换组标识标志字段的取值为第二预设数值时,表示当前媒体轨道中的可替换信息数据盒中不指示当前媒体轨道对应码流所属的可替换组标识符;可替换组标识字段用于指示当前媒体轨道对应码流所属的可替换组标识符;
若可替换信息数据盒设于当前媒体项目中,则可替换组标识标志字段用于指示当前媒体项目中的可替换信息数据盒中是否指示当前媒体项目对应码流所属的可替换组标识符;当可替换组标识标志字段的取值为第一预设数值时,表示当前媒体项目中的可替换信息数据盒中指示当前媒体项目对应码流所属的可替换组标识符;当可替换组标识标志字段的取值为第二预设数值时,表示当前媒体项目中的可替换信息数据盒中不指示当前媒体项目对应码流所属的可替换组标识符;可替换组标识字段用于指示当前媒体项目对应码流所属的可替换组标识符。
在一个实施例中,关系指示信息还用于指示当前媒体轨道的共享归属关系或当前媒体项目的共享归属关系;可替换信息数据盒包括多替换码流标志字段和码流数量字段;
若可替换信息数据盒设于当前媒体轨道中,则多替换码流标志字段用于指示当前媒体轨道是否属于多个码流;当多替换码流标志字段的取值为第一预设数值时,表示当前媒体轨道仅属于一个码流;当多替换码流标志字段的取值为第二预设数值时,表示当前媒体轨道属于多个码流;码流数量字段用于指示当前媒体轨道所属的码流的数量;
若可替换信息数据盒设于当前媒体项目中,则多替换码流标志字段用于指示当前媒体项目是否属于多个码流;当多替换码流标志字段的取值为第一预设数值时,表示当前媒体项目仅属于一个码流;当多替换码流标志字段的取值为第二预设数值时,表示当前媒体项目属于多个码流;码流数量字段用于指示当前媒体项目所属的码流的数量。
在一个实施例中,关系指示信息还用于指示当前媒体轨道的共享归属关系或当前媒体项目的共享归属关系;
若当前媒体轨道中仅包含一个可替换信息数据盒,则指示当前媒体轨道仅属于一个码流;若当前媒体轨道包含多个可替换信息数据盒,则指示当前媒体轨道属于多个码流,当前媒体轨道中的可替换信息数据盒的数量应等于当前媒体轨道所属的码流的数量;或者,
若当前媒体项目中仅包含一个可替换信息数据盒,则指示当前媒体项目仅属于一个码流;若当前媒体项目包含多个可替换信息数据盒,则指示当前媒体项目属于多个码流;当前媒体项目中的可替换信息数据盒的数量应等于当前媒体项目所属的码流的数量。
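In one embodiment, the relationship indication information is further used to indicate the association relationship between the current media track and the other media tracks belonging to the same bitstream, or between the current media item and the other media items belonging to the same bitstream;
The alternative information box includes a component reference type field, which indicates how the current media track is associated with the other media tracks belonging to the same bitstream, or how the current media item is associated with the other media items belonging to the same bitstream;
When the component reference type field takes the first preset value, the current media track is associated, through a track reference, with the other media tracks belonging to the same bitstream, and the alternative information box further includes a track reference type field, which indicates the type of the track reference;
When the component reference type field takes the second preset value, the current media track is associated, through a track group, with the other media tracks belonging to the same bitstream, and the alternative information box further includes a track group type field and a track group identifier field; the track group type field indicates the type of the track group to which the current media track belongs, and the track group identifier field indicates the identifier of that track group;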
当组件参考类型字段的取值为第三预设数值时,表示当前媒体项目通过项目参考的方式关联至与当前媒体项目属于同一码流的其他媒体项目,且可替换信息数据盒还包括项目参考类型字段,项目参考类型字段用于指示项目参考的类型;
当组件参考类型字段的取值为第四预设数值时,表示当前媒体项目通过实体组的方式关联至与当前媒体项目属于同一码流的其他媒体项目,且可替换信息数据盒还包括实体组类型字段和实体组标识字段,实体组类型字段用于指示当前媒体项目所属实体组的类型;实体组标识字段用于指示当前媒体项目所属实体组的标识符。
在一个实施例中,可替换信息数据盒还包括多组件标志字段;
若可替换信息数据盒设于当前媒体轨道中,则多组件标志字段用于指示当前媒体轨道所属码流是否被封装至多个媒体轨道中;当多组件标志字段的取值为第一预设数值时,表示当前媒体轨道所属码流被封装至一个媒体轨道中,当前媒体轨道是其所属码流封装至的媒体轨道;当多组件标志字段的取值为第二预设数值时,表示当前媒体轨道所属码流被封装至多个媒体轨道中,当前媒体轨道为其所属码流被封装至的多个媒体轨道中的的任一个媒体轨道;
若可替换信息数据盒设于当前媒体项目中,则多组件标志字段用于指示当前媒体项目所属码流是否被封装至多个媒体项目中;当多组件标志字段的取值为第一预设数值时,表示当前媒体项目所属码流被封装至一个媒体项目中,当前媒体项目是其所属码流封装至的媒体项目;当多组件标志字段的取值为第二预设数值时,表示当前媒体项目所属码流被封装至多个媒体项目中,当前媒体项目为其所属码流被封装至的多个媒体项目中的的任一个媒体项目;
其中,若当前媒体轨道/当前媒体项目所属码流为点云码流,且点云码流采用多轨封装方式进行封装,则多组件标志字段的取值为第二预设数值。
在一个实施例中,沉浸媒体采用流化传输方式进行传输,获取单元801,具体用于:获取沉浸媒体的传输信令;根据传输信令获取沉浸媒体的媒体文件。
在一个实施例中,传输信令中包含关系指示信息的描述信息,描述信息用于定义关系指示信息所指示的具备可替换关系的N个码流;描述信息包括N个预选标识,每个预选标识分别用于表示N个码流中的一个码流;各个预选标识具备相同的编码标识;每个预选标识对应一个或多个自适应集合,一个自适应集合代表每个预选标识所表示的码流中的一个媒体轨道或一个媒体项目;或者,每个预选标识对应一个或多个表示,一个表示代表每个预选标识所表示的码流中的一个媒体轨道或一个媒体项目。
在一个实施例中,处理单元802,具体用于:按照关系指示信息所指示的可替换关系,从可替换的N个码流中确定所需呈现的码流;对所需呈现的码流进行解码并呈现;其中,沉浸媒体包括以下任一种或多种:体积媒体、容积视频媒体、多视角视频媒体、字幕媒体以及音频媒体。
请参见图8b,图8b是本申请实施例提供的一种沉浸媒体的数据处理装置的结构示意图,该沉浸媒体的数据处理装置可以设置于本申请实施例提供的计算机设备中,计算机设备可以是上述方法实施例中提及的服务设备。图8b所示的沉浸媒体的数据处理装置可以是运行于计算机设备中的一个计算机程序(包括程序代码),该沉浸媒体的数据处理装置可以用于执行图6a所示的方法实施例中的部分或全部步骤。请参见图8b,该沉浸媒体的数据处理装置可以包括如下单元:
编码单元811,用于对沉浸媒体进行编码处理,得到可替换的N个码流;
处理单元812,用于根据N个码流之间的可替换关系,生成关系指示信息,关系指示信息用于指 示N个码流之间的可替换关系;
处理单元812,还用于对关系指示信息和N个码流进行封装处理,得到沉浸媒体的媒体文件。
在本申请实施例中,沉浸媒体的编码端可对沉浸媒体进行编码处理,可得到沉浸媒体的N个码流,且N个码流之间具备可替换关系。根据该可替换关系可生成用于指示该可替换关系的关系指示信息,并对该关系指示信息以及N个码流进行封装,可得到沉浸媒体的媒体文件。可见,在对沉浸媒体的编码过程中,可在媒体文件中添加关系指示信息,以指示不同码流之间的可替换关系,这样,通过关系指示信息便达到指示码流级别的可替换关系的目的。基于该关系指示信息能够指导沉浸媒体的解码准确呈现,提升沉浸媒体的呈现效果。
接下来对本申请实施例提供的解码设备和服务设备进行相关阐述。
进一步地,本申请实施例还提供了一种计算机设备的结构示意图,该计算机设备的结构示意图可参见图9;该计算机设备可以包括:处理器901、输入设备902,输出设备903和存储器904。上述处理器901、输入设备902、输出设备903和存储器904通过总线连接。存储器904用于存储计算机程序,计算机程序包括程序指令,处理器901用于执行存储器904存储的程序指令。
在一个实施例中,该计算机设备可以是上述解码设备;在此实施例中,处理器901通过运行存储器904中的可执行程序代码,执行如上所述的沉浸媒体的数据处理方法。
此外,这里需要指出的是:本申请实施例还提供了一种计算机可读存储介质,且计算机可读存储介质中存储有计算机程序,且该计算机程序包括程序指令,当处理器执行上述程序指令时,能够执行前文图5和图6a所对应实施例中的方法,因此,这里将不再进行赘述。对于本申请所涉及的计算机可读存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。作为示例,程序指令可以被部署在一个计算机设备上,或者在位于一个地点的多个计算机设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个计算机设备上执行。
根据本申请的一个方面,提供了一种计算机程序产品,该计算机程序产品包括计算机程序,该计算机程序存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机程序,处理器执行该计算机程序,使得该计算机设备可以执行前文图5和图6a所对应实施例中的方法,因此,这里将不再进行赘述。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。
以上所揭露的仅为本申请一种较佳实施例而已,当然不能以此来限定本申请之权利范围,本领域普通技术人员可以理解实现上述实施例的全部或部分流程,并依本申请权利要求所作的等同变化,仍属于本申请所涵盖的范围。

Claims (27)

  1. 一种沉浸媒体的数据处理方法,由计算机设备执行,包括:
    获取沉浸媒体的媒体文件;所述沉浸媒体包括可替换的N个码流;所述媒体文件包括关系指示信息,所述关系指示信息用于指示所述N个码流之间的可替换关系,N为大于1的整数;及,
    根据所述关系指示信息,对所述媒体文件进行解码处理,以呈现所述沉浸媒体。
  2. 如权利要求1所述的方法,其中,所述沉浸媒体为时序沉浸媒体;所述N个码流在所述媒体文件中被封装为M个媒体轨道;M为整数且M大于或等于N;
    所述关系指示信息设置于所述媒体轨道中。
  3. 如权利要求2所述的方法,其中,所述N个码流中的任一个码流表示为码流i,i为正整数且i小于或等于N;所述码流i被封装至所述M个媒体轨道中的一个媒体轨道Mi中;
    所述关系指示信息设置于所述媒体轨道Mi中。
  4. 如权利要求2所述的方法,其中,所述N个码流中的任一个码流表示为码流i,i为正整数且i小于或等于N;所述码流i被封装至所述M个媒体轨道中的多个媒体轨道中;
    所述关系指示信息设置于媒体轨道Mi中;所述媒体轨道Mi是指所述多个媒体轨道中的任一个媒体轨道。
  5. 如权利要求4所述的方法,其中,所述关系指示信息还用于指示所述媒体轨道Mi与所述多个媒体轨道中除所述媒体轨道Mi之外的其他媒体轨道之间的关联关系;所述关联关系用于表示所述媒体轨道Mi与所述其他媒体轨道属于同一个码流i。
  6. 如权利要求2所述的方法,其中,所述N个码流中的任意两个码流分别表示为码流i和码流j,i、j为正整数且i、j均小于或等于N;所述码流i被封装至所述M个媒体轨道中的第一多个媒体轨道中;所述码流j被封装至所述M个媒体轨道中的第二多个媒体轨道中;
    若所述第一多个媒体轨道与所述第二多个媒体轨道均包含媒体轨道Mij,则所述关系指示信息还用于指示所述媒体轨道Mij是所述码流i和所述码流j共享的媒体轨道。
  7. 如权利要求1所述的方法,其中,所述沉浸媒体为非时序沉浸媒体;所述N个码流在所述媒体文件中被封装为P个媒体项目;P为整数且P大于或等于N;
    所述关系指示信息设置于所述媒体项目中。
  8. 如权利要求7所述的方法,其中,所述N个码流中的任一个码流表示为码流i,i为正整数且i小于或等于N;所述码流i被封装至所述P个媒体项目中的一个媒体项目Pi中;
    所述关系指示信息设置于所述媒体项目Pi中。
  9. 如权利要求7所述的方法,其中,所述N个码流中的任一个码流表示为码流i,i为正整数且i小于或等于N;所述码流i被封装至所述P个媒体项目中的多个媒体项目中;
    所述关系指示信息设置于媒体项目Pi中;所述媒体项目Pi是指所述多个媒体项目中的任一个媒体项目。
  10. 如权利要求9所述的方法,其中,所述关系指示信息还用于指示所述媒体项目Pi与所述多个媒体项目中除所述媒体项目Pi之外的其他媒体项目之间的关联关系;所述关联关系用于表示所述媒体项目Pi与所述其他媒体项目属于同一个码流i。
  11. 如权利要求7所述的方法,其中,所述N个码流中的任意两个码流分别表示为码流i和码流j,i、j为正整数且i、j均小于或等于N;所述码流i被封装至所述P个媒体项目中的第一多个媒体项目中;所述码流j被封装至所述P个媒体项目中的第二多个媒体项目中;
    若所述第一多个媒体项目与所述第二多个媒体项目均包含媒体项目Pij,则所述关系指示信息还用于指示所述媒体项目Pij是所述码流i和所述码流j共享的媒体项目。
  12. 如权利要求1所述的方法,其中,
    若所述沉浸媒体为时序沉浸媒体,则所述N个码流在所述媒体文件中被封装为M个媒体轨道;M为整数且M大于或等于N;其中,当所述M个媒体轨道中存在多个媒体轨道需要被联合播放时,需要被联合播放的多个媒体轨道属于同一个播放轨道组;
    若所述沉浸媒体为非时序沉浸媒体,则所述N个码流在所述媒体文件中被封装为P个媒体项目;P为整数且P大于或等于N;其中,当所述P个媒体项目中存在多个媒体项目需要被联合播放时,需要被联合播放的多个媒体项目属于同一个播放实体组。
  13. 如权利要求1所述的方法,其中,具备可替换关系的所述N个码流属于同一个可替换组,同一个可替换组中的不同码流在呈现时允许相互替换;所述关系指示信息包括可替换信息数据盒;
    若所述可替换信息数据盒设于当前媒体轨道中,则所述可替换信息数据盒中包含所述当前媒体轨道对应码流所属的可替换组的信息;
    若所述可替换信息数据盒设于当前媒体项目中,则所述可替换信息数据盒中包含所述当前媒体项目对应码流所属的可替换组的信息;
    其中,所述当前媒体轨道是指正在被解码的媒体轨道;所述当前媒体项目是指正在被解码的媒体项目。
  14. 如权利要求13所述的方法,其中,所述可替换信息数据盒中包含可替换组标识标志字段和可替换组标识字段;
    若所述可替换信息数据盒设于所述当前媒体轨道中,则所述可替换组标识标志字段用于指示所述当前媒体轨道中的可替换信息数据盒中是否指示所述当前媒体轨道对应码流所属的可替换组标识符;其中,
    当所述可替换组标识标志字段的取值为第一预设数值时,表示所述当前媒体轨道中的可替换信息数据盒中指示所述当前媒体轨道对应码流所属的可替换组标识符;
    当所述可替换组标识标志字段的取值为第二预设数值时,表示所述当前媒体轨道中的可替换信息数据盒中不指示所述当前媒体轨道对应码流所属的可替换组标识符;
    所述可替换组标识字段用于指示所述当前媒体轨道对应码流所属的可替换组标识符;
    若所述可替换信息数据盒设于所述当前媒体项目中,则所述可替换组标识标志字段用于指示所述当前媒体项目中的可替换信息数据盒中是否指示所述当前媒体项目对应码流所属的可替换组标识符;其中,
    当所述可替换组标识标志字段的取值为第一预设数值时,表示所述当前媒体项目中的可替换信息数据盒中指示所述当前媒体项目对应码流所属的可替换组标识符;
    当所述可替换组标识标志字段的取值为第二预设数值时,表示所述当前媒体项目中的可替换信息数据盒中不指示所述当前媒体项目对应码流所属的可替换组标识符;
    所述可替换组标识字段用于指示所述当前媒体项目对应码流所属的可替换组标识符。
  15. 如权利要求13所述的方法,其中,所述关系指示信息还用于指示所述当前媒体轨道的共享归属关系或所述当前媒体项目的共享归属关系;所述可替换信息数据盒包括多替换码流标志字段和码流数量字段;
    若所述可替换信息数据盒设于所述当前媒体轨道中,则所述多替换码流标志字段用于指示所述当前媒体轨道是否属于多个码流;其中,
    当所述多替换码流标志字段的取值为第一预设数值时,表示所述当前媒体轨道仅属于一个码流;
    当所述多替换码流标志字段的取值为第二预设数值时,表示所述当前媒体轨道属于多个码流;
    所述码流数量字段用于指示所述当前媒体轨道所属的码流的数量;
    若所述可替换信息数据盒设于所述当前媒体项目中,则所述多替换码流标志字段用于指示所述当前媒体项目是否属于多个码流;其中,
    当所述多替换码流标志字段的取值为第一预设数值时,表示所述当前媒体项目仅属于一个码流;
    当所述多替换码流标志字段的取值为第二预设数值时,表示所述当前媒体项目属于多个码流;
    所述码流数量字段用于指示所述当前媒体项目所属的码流的数量。
  16. 如权利要求13所述的方法,其中,所述关系指示信息还用于指示所述当前媒体轨道的共享归属关系或所述当前媒体项目的共享归属关系;
    若所述当前媒体轨道中仅包含一个可替换信息数据盒,则指示所述当前媒体轨道仅属于一个码流; 若所述当前媒体轨道包含多个可替换信息数据盒,则指示所述当前媒体轨道属于多个码流,所述当前媒体轨道中的可替换信息数据盒的数量等于所述当前媒体轨道所属的码流的数量;或者,
    若所述当前媒体项目中仅包含一个可替换信息数据盒,则指示所述当前媒体项目仅属于一个码流;若所述当前媒体项目包含多个可替换信息数据盒,则指示所述当前媒体项目属于多个码流;所述当前媒体项目中的可替换信息数据盒的数量等于所述当前媒体项目所属的码流的数量。
  17. 如权利要求13-16任一项所述的方法,其中,所述关系指示信息还用于指示所述当前媒体轨道及与所述当前媒体轨道属于同一码流的其他媒体轨道之间的关联关系,或用于指示所述当前媒体项目及与所述当前媒体项目属于同一码流的其他媒体项目之间的关联关系;
    所述可替换信息数据盒包括组件参考类型字段,所述组件参考类型字段用于指示当前媒体轨道及与所述当前媒体轨道属于同一码流的其他媒体轨道之间的关联方式,或用于指示当前媒体项目及与所述当前媒体项目属于同一码流的其他媒体项目之间的关联方式;其中,
    当所述组件参考类型字段的取值为第一预设数值时,表示所述当前媒体轨道通过轨道参考关联至与所述当前媒体轨道属于同一码流的其他媒体轨道,且所述可替换信息数据盒还包括轨道参考类型字段,所述轨道参考类型字段用于指示所述轨道参考的类型;
    当所述组件参考类型字段的取值为第二预设数值时,表示所述当前媒体轨道通过轨道组关联至与所述当前媒体轨道属于同一码流的其他媒体轨道,且所述可替换信息数据盒还包括轨道组类型字段和轨道组标识字段,所述轨道组类型字段用于指示所述当前媒体轨道所属轨道组的类型;所述轨道标识字段用于指示所述当前媒体轨道所属轨道组的标识符;
    当所述组件参考类型字段的取值为第三预设数值时,表示所述当前媒体项目通过项目参考的方式关联至与所述当前媒体项目属于同一码流的其他媒体项目,且所述可替换信息数据盒还包括项目参考类型字段,所述项目参考类型字段用于指示所述项目参考的类型;
    当所述组件参考类型字段的取值为第四预设数值时,表示所述当前媒体项目通过实体组的方式关联至与所述当前媒体项目属于同一码流的其他媒体项目,且所述可替换信息数据盒还包括实体组类型字段和实体组标识字段,所述实体组类型字段用于指示所述当前媒体项目所属实体组的类型;所述实体组标识字段用于指示所述当前媒体项目所属实体组的标识符。
  18. 如权利要求13所述的方法,其中,所述可替换信息数据盒还包括多组件标志字段;
    若所述可替换信息数据盒设于所述当前媒体轨道中,则所述多组件标志字段用于指示所述当前媒体轨道所属码流是否被封装至多个媒体轨道中;其中,
    当所述多组件标志字段的取值为第一预设数值时,表示当前媒体轨道所属码流被封装至一个媒体轨道中,所述当前媒体轨道是其所属码流封装至的媒体轨道;
    当所述多组件标志字段的取值为第二预设数值时,表示所述当前媒体轨道所属码流被封装至多个媒体轨道中,所述当前媒体轨道为其所属码流被封装至的多个媒体轨道中的的任一个媒体轨道;
    若所述可替换信息数据盒设于所述当前媒体项目中,则所述多组件标志字段用于指示所述当前媒体项目所属码流是否被封装至多个媒体项目中;其中,
    当所述多组件标志字段的取值为第一预设数值时,表示当前媒体项目所属码流被封装至一个媒体项目中,所述当前媒体项目是其所属码流封装至的媒体项目;
    当所述多组件标志字段的取值为第二预设数值时,表示所述当前媒体项目所属码流被封装至多个媒体项目中,所述当前媒体项目为其所属码流被封装至的多个媒体项目中的的任一个媒体项目;
    其中,若所述当前媒体轨道/所述当前媒体项目所属码流为点云码流,且所述点云码流采用多轨封装方式进行封装,则所述多组件标志字段的取值为第二预设数值。
  19. 如权利要求1所述的方法,其中,所述沉浸媒体采用流化传输方式进行传输,所述获取沉浸媒体的媒体文件,包括:
    获取所述沉浸媒体的传输信令;
    根据所述传输信令,获取所述沉浸媒体的媒体文件。
  20. 如权利要求19所述的方法,其中,所述传输信令中包含所述关系指示信息的描述信息,所述描述信息用于定义所述关系指示信息所指示的具备可替换关系的N个码流;
    所述描述信息包括N个预选标识,每个预选标识分别用于表示所述N个码流中的一个码流;各个 预选标识具备相同的编码标识;
    每个预选标识对应一个或多个自适应集合,一个自适应集合代表每个预选标识所表示的码流中的一个媒体轨道或一个媒体项目;或者,每个预选标识对应一个或多个表示,一个表示代表每个预选标识所表示的码流中的一个媒体轨道或一个媒体项目。
  21. 如权利要求1所述的方法,其中,所述根据所述关系指示信息,对所述媒体文件进行解码处理,以呈现所述沉浸媒体,包括:
    按照所述关系指示信息所指示的可替换关系,从可替换的N个码流中确定所需呈现的码流;
    对所需呈现的码流进行解码并呈现;
    其中,所述沉浸媒体包括以下任一种或多种:体积媒体、容积视频媒体、多视角视频媒体、字幕媒体以及音频媒体。
  22. 一种沉浸媒体的数据处理方法,由计算机设备执行,包括:
    对沉浸媒体进行编码处理,得到可替换的N个码流;
    根据所述N个码流之间的可替换关系,生成关系指示信息,所述关系指示信息用于指示所述N个码流之间的可替换关系;及,
    对所述关系指示信息和所述N个码流进行封装处理,得到所述沉浸媒体的媒体文件。
  23. 一种沉浸媒体的数据处理装置,包括:
    获取单元,用于获取沉浸媒体的媒体文件;所述沉浸媒体包括可替换的N个码流;所述媒体文件包括关系指示信息,所述关系指示信息用于指示所述N个码流之间的可替换关系,N为大于1的整数;及,
    处理单元,用于根据所述关系指示信息对所述媒体文件进行解码处理以呈现所述沉浸媒体。
  24. 一种沉浸媒体的数据处理装置,包括:
    编码单元,用于对沉浸媒体进行编码处理,得到可替换的N个码流;
    处理单元,用于根据所述N个码流之间的可替换关系生成关系指示信息,所述关系指示信息用于指示所述N个码流之间的可替换关系;及,
    所述处理单元,还用于对所述关系指示信息和所述N个码流进行封装处理,得到所述沉浸媒体的媒体文件。
  25. 一种计算机设备,包括:
    处理器,适用于执行计算机程序;
    计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,所述计算机程序被所述处理器执行时,执行如权利要求1-22任一项所述的沉浸媒体的数据处理方法。
  26. 一种计算机可读存储介质,所述计算机存储介质存储有计算机程序,所述计算机程序被处理器执行时,执行如权利要求1-22任一项所述的沉浸媒体的数据处理方法。
  27. 一种计算机程序产品,所述计算机程序产品包括计算机程序或计算机指令,所述计算机程序或计算机指令被处理器执行实现如权利要求1-22任一项所述沉浸媒体的数据处理方法。
PCT/CN2024/074627 2023-03-07 2024-01-30 沉浸媒体的数据处理方法、装置、计算机设备、存储介质及程序产品 WO2024183506A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310247101.8 2023-03-07
CN202310247101.8A CN116347118A (zh) 2023-03-07 2023-03-07 一种沉浸媒体的数据处理方法及相关设备

Publications (1)

Publication Number Publication Date
WO2024183506A1 (zh) 2024-09-12

Family

ID=86881630

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/074627 WO2024183506A1 (zh) 2023-03-07 2024-01-30 Data processing method and apparatus for immersive media, computer device, storage medium, and program product

Country Status (2)

Country Link
CN (1) CN116347118A (zh)
WO (1) WO2024183506A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116347118A (zh) * 2023-03-07 2023-06-27 腾讯科技(深圳)有限公司 Data processing method for immersive media and related device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
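CN111435991A (zh) * 2019-01-11 2020-07-21 上海交通大学 Group-based point cloud bitstream encapsulation method and system
US20210377581A1 (en) * 2018-06-27 2021-12-02 Canon Kabushiki Kaisha Method, device, and computer program for transmitting media content
WO2022149849A1 (en) * 2021-01-05 2022-07-14 Samsung Electronics Co., Ltd. V3c video component track alternatives
CN114930813A (zh) * 2020-01-08 2022-08-19 Lg电子株式会社 Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
CN115088258A (zh) * 2021-01-15 2022-09-20 中兴通讯股份有限公司 Multi-track based immersive media playout
CN115623183A (zh) * 2021-07-12 2023-01-17 腾讯科技(深圳)有限公司 Data processing method, apparatus, and device for volumetric media, and storage medium
WO2023024839A1 (zh) * 2021-08-23 2023-03-02 腾讯科技(深圳)有限公司 Media file encapsulation and decapsulation method, apparatus, device, and storage medium
CN115733576A (zh) * 2021-08-26 2023-03-03 腾讯科技(深圳)有限公司 Encapsulation and decapsulation method and apparatus for point cloud media file, and storage medium
CN116347118A (zh) * 2023-03-07 2023-06-27 腾讯科技(深圳)有限公司 Data processing method for immersive media and related device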

Also Published As

Publication number Publication date
CN116347118A (zh) 2023-06-27

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 24766229; Country of ref document: EP; Kind code of ref document: A1