WO2023169003A1 - Point cloud media decoding method, point cloud media encoding method, and apparatus
- Publication number: WO2023169003A1
- Application number: PCT/CN2022/135732
- Authority: WO, WIPO (PCT)
- Prior art keywords: point cloud, attribute, indicate, slice, type
Classifications
- H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
- H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
- H04N19/136: Incoming video signal characteristics or properties
- H04N19/174: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
- H04N19/184: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
- H04N19/186: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
- H04N19/20: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
- H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
- G06T9/00: Image coding
Definitions
- This application belongs to the field of audio and video technology, and specifically relates to a point cloud media encoding method, a point cloud media decoding method, a point cloud media encoding device, a point cloud media decoding device, a computer-readable medium, an electronic device and a computer program product.
- A point cloud is a set of discrete points randomly distributed in space that expresses the spatial structure and surface properties of a three-dimensional object or scene. After large-scale point cloud data is obtained through point cloud acquisition equipment, the point cloud data can be encoded and encapsulated for transmission and presentation to users. During encoding, transmission, decoding and consumption, point cloud media generally suffers from defects such as a large amount of transmitted data and data redundancy. Therefore, how to improve the encoding and decoding flexibility of point cloud media is an issue that needs to be solved urgently.
- A method for decoding point cloud media includes:
- obtaining a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks;
- decapsulating the point cloud sample to obtain at least one compression unit, where the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit, and the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice; the geometry header is used to indicate a parameter set of geometric information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometric information, and the attribute slice is point cloud slice data used to indicate attribute information;
- selecting the target compression unit according to the type field, and decoding the target compression unit to obtain point cloud data.
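- As an illustration of the selection step, the following is a minimal sketch, not the syntax defined in the application. The names `UnitType`, `CompressionUnit` and `select_target_units`, and the numeric type codes, are assumptions made for this example only.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List, Set


class UnitType(IntEnum):
    # Numeric values are placeholders, not taken from the application.
    GEOMETRY_HEADER = 0
    ATTRIBUTE_HEADER = 1
    GEOMETRY_SLICE = 2
    ATTRIBUTE_SLICE = 3


@dataclass
class CompressionUnit:
    unit_type: UnitType  # stands in for the type field of the data box
    payload: bytes       # still-compressed data of the unit


def select_target_units(units: Iterable[CompressionUnit],
                        wanted: Set[UnitType]) -> List[CompressionUnit]:
    """Keep only the units whose type field is in `wanted`, preserving order."""
    return [u for u in units if u.unit_type in wanted]


# A decapsulated sample, represented here as a flat list of units.
sample = [
    CompressionUnit(UnitType.GEOMETRY_HEADER, b"gh"),
    CompressionUnit(UnitType.ATTRIBUTE_HEADER, b"ah"),
    CompressionUnit(UnitType.GEOMETRY_SLICE, b"gs"),
    CompressionUnit(UnitType.ATTRIBUTE_SLICE, b"as"),
]

# Example: a client that only needs geometry skips all attribute units.
geometry_only = select_target_units(
    sample, {UnitType.GEOMETRY_HEADER, UnitType.GEOMETRY_SLICE}
)
print([u.unit_type.name for u in geometry_only])
```

Only the selected units would then be handed to the decoder, which is where the flexibility described above comes from.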
- A point cloud media encoding method includes:
- obtaining point cloud source data, where the point cloud source data includes multiple point cloud frames;
- encoding the point cloud frame to obtain at least one compression unit;
- encapsulating the at least one compression unit to obtain a point cloud media file, where the point cloud media file includes point cloud samples encapsulated in one or more tracks, and the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit;
- the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice; the geometry header is used to indicate a parameter set of geometric information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometric information, and the attribute slice is point cloud slice data used to indicate attribute information.
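- One embodiment encapsulates point cloud samples in a TLV (Tag-Length-Value) format (Figure 4). The sketch below shows the general TLV idea only: the 1-byte tag, the 4-byte big-endian length, and the function names `pack_units` and `unpack_units` are assumptions for illustration and are not the field widths or syntax of the application.

```python
import struct
from typing import List, Tuple

# Assumed layout per unit: 1-byte type tag, 4-byte big-endian length, payload.
_HEADER = struct.Struct(">BI")


def pack_units(units: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (type_code, payload) pairs into one TLV-style sample."""
    out = bytearray()
    for type_code, payload in units:
        out += _HEADER.pack(type_code, len(payload))
        out += payload
    return bytes(out)


def unpack_units(sample: bytes) -> List[Tuple[int, bytes]]:
    """Parse a TLV-style sample back into (type_code, payload) pairs."""
    units = []
    offset = 0
    while offset < len(sample):
        type_code, length = _HEADER.unpack_from(sample, offset)
        offset += _HEADER.size
        if offset + length > len(sample):
            raise ValueError("truncated compression unit")
        units.append((type_code, sample[offset:offset + length]))
        offset += length
    return units


# Round trip with two hypothetical units (type codes are placeholders).
blob = pack_units([(0, b"gh"), (2, b"geom")])
print(unpack_units(blob))
```

Because each unit carries its own tag and length, a reader can skip units it does not need without parsing their payloads.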
- A point cloud media decoding device includes:
- an acquisition module used to acquire point cloud media files, where the point cloud media files include point cloud samples encapsulated in one or more tracks;
- a decapsulation module configured to decapsulate the point cloud sample to obtain at least one compression unit, where the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit, and the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice; the geometry header is used to indicate a parameter set of geometric information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometric information, and the attribute slice is point cloud slice data used to indicate attribute information;
- a decoding module used to select the target compression unit according to the type field, and decode the target compression unit to obtain point cloud data.
- A point cloud media encoding device includes:
- an acquisition module used to acquire point cloud source data, where the point cloud source data includes multiple point cloud frames;
- an encoding module used to encode the point cloud frame to obtain at least one compression unit;
- an encapsulation module configured to encapsulate the at least one compression unit to obtain a point cloud media file, where the point cloud media file includes point cloud samples encapsulated in one or more tracks, and the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit;
- the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice; the geometry header is used to indicate a parameter set of geometric information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometric information, and the attribute slice is point cloud slice data used to indicate attribute information.
- A computer-readable medium has computer-readable instructions stored thereon. When the computer-readable instructions are executed by a processor, the encoding and decoding methods of point cloud media in the above technical solution are implemented.
- An electronic device includes: a processor; and a memory for storing computer-readable instructions of the processor; wherein the processor is used to perform the encoding and decoding methods of point cloud media in the above technical solution by executing the computer-readable instructions.
- A computer program product or computer program includes computer-readable instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, so that the computer device performs the encoding and decoding methods of point cloud media in the above technical solution.
- Figure 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiment of the present application can be applied;
- Figure 2 shows a schematic diagram of the point cloud media encoding and decoding process in an application scenario according to the embodiment of the present application
- Figure 3 shows a flow chart of the steps of the point cloud media decoding method in one embodiment of the present application
- Figure 4 shows the syntax structure of encapsulating point cloud samples based on the TLV (Tag-Length-Value) format in one embodiment of the present application
- Figure 5 shows an exemplary structure of encapsulating a geometry code stream and an attribute code stream in a single track according to one embodiment of the present application
- Figure 6 shows an exemplary structure of encapsulating geometry code streams and attribute code streams in multiple tracks according to one embodiment of the present application
- Figure 7 shows the syntax structure of a compression unit based on TLV format encapsulation in one embodiment of the present application
- Figure 8 shows the syntax structure of encapsulating point cloud samples based on G-PCC compression mode in one embodiment of the present application
- Figure 9 shows the syntax structure of a point cloud sample that provides specific indication information of a point cloud patch in one embodiment of the present application
- Figure 10 shows the syntax structure corresponding to the specific parameters of the codec in the media file data box of the subsample according to one embodiment of the present application
- Figure 11 shows the syntax structure of metadata information indicating the number of attributes in multi-track encapsulation mode in one embodiment of the present application
- Figure 12 shows the syntax structure of metadata information of extended attribute types in multi-track encapsulation mode in one embodiment of the present application
- Figure 13 shows a flow chart of the steps of the point cloud media encoding method in one embodiment of the present application
- Figure 14 shows a flow chart of point cloud data encoding and decoding in a multi-track encapsulated streaming media transmission application scenario according to an embodiment of the present application
- Figure 15 shows a flow chart of point cloud data encoding and decoding in the application scenario of single-track encapsulated local point cloud media playback according to the embodiment of the present application
- Figure 16 schematically shows a structural block diagram of a point cloud decoding device provided by an embodiment of the present application
- Figure 17 schematically shows a structural block diagram of a point cloud encoding device provided by an embodiment of the present application.
- Figure 18 schematically shows a system structural block diagram of an electronic device suitable for implementing embodiments of the present application.
- Example embodiments will now be described more fully with reference to the accompanying drawings.
- Example embodiments may, however, be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art.
- The "plurality" mentioned in this document means two or more.
- "And/or" describes the relationship between related objects and indicates that three relationships are possible. For example, A and/or B can mean: A exists alone, A and B exist simultaneously, or B exists alone.
- The character "/" generally indicates that the related objects are in an "or" relationship.
- This application involves user-related data such as point cloud media transmission content, decoding content, and consumption content. User permission or consent is required, and the collection, use and processing of the relevant data need to comply with the relevant laws, regulations and standards of the relevant countries and regions.
- Immersive media: media content that can bring an immersive experience to consumers. Immersive media can be divided into 3DoF media, 3DoF+ media and 6DoF media according to the user's degrees of freedom when consuming the media content. Point cloud media is a typical 6DoF media.
- DoF: Degree of Freedom. In this application, it refers to the degrees of freedom of movement and content interaction that are supported while a user views immersive media.
- 3DoF: three degrees of freedom, which refers to the three degrees of freedom for the user's head to rotate around the x, y, and z axes.
- 3DoF+: in addition to the three rotational degrees of freedom, the user also has limited freedom of movement along the x, y, and z axes.
- 6DoF: in addition to the three rotational degrees of freedom, the user can also move freely along the x, y, and z axes.
- Point cloud is a set of discrete points randomly distributed in space that expresses the spatial structure and surface properties of a three-dimensional object or scene. Each point in the point cloud has at least three-dimensional position information. Depending on the application scenario, it may also have color, material or other information. Typically, each point in a point cloud has the same number of additional attributes.
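- The definition above can be pictured as a simple data structure. This is a sketch under stated assumptions: the `Point` class, the attribute names `color` and `reflectance`, and the helper `has_uniform_attributes` are hypothetical and are not part of the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Point:
    # Every point has at least a three-dimensional position.
    position: Tuple[float, float, float]
    # Optional per-point attributes, keyed by an illustrative attribute name.
    attributes: Dict[str, Tuple[float, ...]] = field(default_factory=dict)


def has_uniform_attributes(points: List[Point]) -> bool:
    """Check that every point carries the same set of attribute names."""
    if not points:
        return True
    names = set(points[0].attributes)
    return all(set(p.attributes) == names for p in points)


cloud = [
    Point((0.0, 0.0, 0.0), {"color": (255, 0, 0), "reflectance": (0.4,)}),
    Point((1.0, 2.0, 3.0), {"color": (0, 255, 0), "reflectance": (0.9,)}),
]
print(has_uniform_attributes(cloud))
```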
- PCC: Point Cloud Compression.
- G-PCC: Geometry-based Point Cloud Compression, that is, point cloud compression based on a geometric model.
- Sample: the encapsulation unit in the media file encapsulation process. A media file consists of many samples. Taking video media as an example, a sample of video media is usually a video frame.
- DASH: Dynamic Adaptive Streaming over HTTP, an adaptive bitrate streaming technology that enables high-quality streaming media to be delivered over the Internet through traditional HTTP web servers.
- MPD: Media Presentation Description, the media presentation description signaling in DASH, used to describe media segment information.
- Representation: in DASH, a combination of one or more media components. For example, a video file of a certain resolution can be regarded as a Representation.
- Adaptation Set: in DASH, a collection of one or more video streams. An Adaptation Set can contain multiple Representations.
- Media Segment: a playable segment that conforms to a certain media format. During playback, it may need to be used together with zero or more preceding segments and an initialization segment.
- In terms of compression method, point cloud media can be divided into point cloud media compressed based on traditional video coding methods (Video-based Point Cloud Compression, VPCC) and point cloud media compressed based on geometric features (Geometry-based Point Cloud Compression, GPCC).
- In a point cloud media file, the three-dimensional position information is usually called the geometry component, and the attribute information is called the attribute component. A point cloud media file has only one geometry component, but it can have one or more attribute components.
- Point clouds can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes, so they are widely used. Their main application scenarios fall into two categories. 1) Machine-perceived point clouds, used for example in Computer Aided Design (CAD), Autonomous Navigation Systems (ANS), real-time inspection systems, Geographic Information Systems (GIS), visual sorting robots, and rescue and disaster relief robots. 2) Human-eye-perceived point clouds, used for example in virtual reality (VR) games, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive communication, and three-dimensional immersive interaction.
- The main ways to obtain point clouds are computer generation, 3D laser scanning, 3D photogrammetry, and so on.
- Computers can generate point clouds of virtual three-dimensional objects and scenes.
- 3D scanning can obtain point clouds of static real-world three-dimensional objects or scenes, at a rate on the order of millions of points per second.
- 3D photography can obtain point clouds of dynamic real-world three-dimensional objects or scenes, at a rate on the order of tens of millions of points per second.
- In the medical field, point clouds of biological tissues and organs can be obtained from MRI, CT, and electromagnetic positioning information.
- Figure 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiment of the present application can be applied.
- The system architecture 100 includes a plurality of terminals that can communicate with each other through, for example, a network 150.
- For example, the system architecture 100 may include a first terminal 110 and a second terminal 120 interconnected through the network 150.
- The first terminal 110 and the second terminal 120 perform one-way data transmission.
- For example, the first terminal 110 may encode point cloud data (e.g., point cloud data collected by the terminal 110) for transmission to the second terminal 120 through the network 150. The encoded point cloud data is transmitted in the form of one or more encoded point cloud code streams. The second terminal 120 can receive the encoded point cloud data from the network 150, decode it to restore the point cloud data, and display the point cloud content according to the restored point cloud data.
- The system architecture 100 may also include a third terminal 130 and a fourth terminal 140 that perform bidirectional transmission of encoded point cloud data, which may occur, for example, during a video conference.
- For bidirectional transmission, each of the third terminal 130 and the fourth terminal 140 may encode point cloud data (e.g., point cloud data collected by that terminal) for transmission through the network 150 to the other of the third terminal 130 and the fourth terminal 140.
- Each of the third terminal 130 and the fourth terminal 140 may also receive the encoded point cloud data transmitted by the other terminal, decode it to restore the point cloud data, and display the point cloud content on an accessible display device based on the restored point cloud data.
- the first terminal 110 , the second terminal 120 , the third terminal 130 and the fourth terminal 140 may be servers, personal computers and smart phones, but the principles disclosed in this application may not be limited thereto. Embodiments disclosed herein are suitable for use with laptops, tablets, media players, and/or dedicated video conferencing devices.
- Network 150 represents any number of networks that convey encoded point cloud data between first terminal 110, second terminal 120, third terminal 130, and fourth terminal 140, including, for example, wired and/or wireless communication networks.
- Communication network 150 may exchange data in circuit-switched and/or packet-switched channels.
- the network may include telecommunications networks, local area networks, wide area networks, and/or the Internet. For purposes of this application, unless explained below, the architecture and topology of network 150 may be immaterial to the operations disclosed herein.
- The server in the embodiments of this application may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides cloud computing services.
- The terminal can be a smartphone, tablet, laptop, desktop computer, smart speaker, smart watch, vehicle terminal, smart TV, etc., but is not limited to these.
- The terminal and the server can be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
- After point cloud media is encoded, the encoded data stream needs to be encapsulated and transmitted to the user.
- Correspondingly, on the playback side, the point cloud file needs to be decapsulated first and then decoded, and finally the decoded data stream is presented.
- Figure 2 shows a schematic diagram of the point cloud media encoding and decoding process in an application scenario according to the embodiment of the present application.
- the real-world visual scene A can be captured by collecting point cloud data through the collection device 210.
- the collection device 210 may be, for example, a set of cameras or a camera device with multiple lenses and sensors.
- the collection result is point cloud source data B, which is a frame sequence composed of a large number of point cloud frames.
- One or more point cloud frames may be encoded by the encoder 220 to obtain an encoded G-PCC bit stream, which may specifically include an encoded geometry bit stream and an attribute bit stream E.
- the file encapsulator 230 can encapsulate one or more encoded bit streams according to a specific media container file format to obtain a media file F for file playback or a series of initialization segments and media segments Fs for streaming transmission.
- the media container file format may be, for example, the ISO basic media file format specified in ISO/IEC 14496-12 [ISOBMFF].
- File encapsulator 230 may also encapsulate metadata in media files F or media fragments Fs.
- In some cases, the media file F output by the file encapsulator 230 is the same as the media file F′ input to the file decapsulator 240.
- the file decapsulator can extract the encoded bit stream E' and parse the metadata by processing the media file F' or processing the received media fragments F's.
- the decoder 250 may decode the G-PCC bit stream into a decoded signal D' and generate point cloud data according to the decoded signal D'.
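- The data flow of Figure 2 (B to E to F, then F′ to E′ to D′) can be sketched end to end as follows. This is only a toy illustration: `zlib` and JSON stand in for the G-PCC codec, and a simple length-prefixed byte layout stands in for the ISOBMFF container. None of these stand-ins, nor the function names, come from the application.

```python
import json
import zlib
from typing import List

Frame = List[List[int]]  # one frame = list of [x, y, z] points (toy model)


def encode(frames: List[Frame]) -> List[bytes]:
    """B -> E: compress each frame (zlib is a placeholder for G-PCC)."""
    return [zlib.compress(json.dumps(f).encode("utf-8")) for f in frames]


def encapsulate(bitstreams: List[bytes]) -> bytes:
    """E -> F: length-prefixed concatenation (placeholder for ISOBMFF)."""
    out = bytearray()
    for bs in bitstreams:
        out += len(bs).to_bytes(4, "big") + bs
    return bytes(out)


def decapsulate(media_file: bytes) -> List[bytes]:
    """F' -> E': split the file back into per-frame bit streams."""
    bitstreams, offset = [], 0
    while offset < len(media_file):
        size = int.from_bytes(media_file[offset:offset + 4], "big")
        offset += 4
        bitstreams.append(media_file[offset:offset + size])
        offset += size
    return bitstreams


def decode(bitstreams: List[bytes]) -> List[Frame]:
    """E' -> D': decompress each bit stream back into a frame."""
    return [json.loads(zlib.decompress(bs).decode("utf-8")) for bs in bitstreams]


source = [[[0, 0, 0], [1, 2, 3]], [[4, 5, 6]]]
restored = decode(decapsulate(encapsulate(encode(source))))
print(restored == source)
```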
- Based on the current viewing position, viewing direction, or viewport determined by various types of sensors (e.g., sensors disposed on the head), the point cloud data may be rendered by the renderer 260 and displayed on the screen of a head-mounted display or any other display device.
- the current viewing position or viewing direction can also be used for decoding optimization.
- the current viewing position and viewing direction are also passed to the policy module, which can be used to determine which track to receive.
- streaming transmission technology is usually used to handle the transmission of media resources between the server and the client.
- Common media streaming transmission technologies include DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), SMT (Smart Media Transport) and other technologies.
- DASH is an adaptive bitrate streaming technology that enables high-quality streaming media to be delivered over the Internet through traditional HTTP web servers.
- DASH breaks the content into a series of small HTTP-based file fragments, each fragment contains a short length of playable content, and the total length of the content may be several hours (such as a movie or live sports event).
- Content will be cut into multiple bitrate alternatives to provide multiple bitrate versions for selection.
- When media content is played by a DASH client, the client automatically selects which alternative to download and play based on current network conditions. The client selects for playback the highest-bitrate segment that can be downloaded in time, thus avoiding playback stutters or rebuffering events. Because of this, the DASH client can seamlessly adapt to changing network conditions and provide a high-quality playback experience with less lag and rebuffering.
- DASH uses existing HTTP web server infrastructure. It allows devices such as Internet TVs, TV set-top boxes, desktop computers, smartphones, tablets and other devices to consume multimedia content (such as videos, TV, radio, etc.) transmitted through the Internet, and can cope with changing Internet reception conditions.
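- The client-side selection described above can be illustrated with a minimal sketch. The function name `select_representation`, the `safety_factor` margin, and the example bitrate ladder are assumptions made for illustration; they are not part of DASH or of this application.

```python
def select_representation(bitrates_bps, measured_throughput_bps, safety_factor=0.8):
    """Return the highest bitrate that fits within the usable throughput.

    Falls back to the lowest available bitrate when none fits, so that
    playback can continue at reduced quality instead of stalling.
    """
    if not bitrates_bps:
        raise ValueError("at least one bitrate alternative is required")
    # Keep some headroom (hypothetical margin) below the measured throughput.
    budget = measured_throughput_bps * safety_factor
    candidates = [b for b in bitrates_bps if b <= budget]
    return max(candidates) if candidates else min(bitrates_bps)


# Hypothetical bitrate ladder with four alternatives of the same content.
ladder = [1_000_000, 2_500_000, 5_000_000, 8_000_000]
chosen = select_representation(ladder, measured_throughput_bps=4_000_000)
print(chosen)
```

- A real client would typically re-run such a selection for every segment, using a smoothed throughput estimate and the current buffer level.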
- Figure 3 shows a step flow chart of the point cloud media decoding method in one embodiment of the present application.
- This method can be applied to the server, terminal (or client running on the terminal) and intermediate nodes of the point cloud media system.
- the embodiment of the present application takes an electronic device installed with a point cloud media decoding device to execute the point cloud media decoding method as an example.
- the point cloud media decoding method includes the following steps S310 to S330.
- In step S310, a point cloud media file is obtained.
- the point cloud media file includes point cloud samples encapsulated in one or more tracks.
- the point cloud media file may be a media file or media segment obtained after encoding and encapsulation processing as shown in Figure 2.
- the media file or media segment carries a point cloud code stream to be transmitted.
- the data source can encapsulate the point cloud code stream into a single track based on the geometric parameter information, attribute parameter information and point cloud slice parameter information contained in the point cloud code stream, or it can re-encapsulate a single-track point cloud media file into a point cloud media file containing multiple tracks.
- the data source is an electronic device that produces point cloud media files, such as a server. If the electronic device that executes the point cloud media decoding method is a terminal, the terminal can obtain the point cloud media file from the server. If the electronic device that executes the point cloud media decoding method is the server, the electronic device can obtain the point cloud media file directly.
- a track refers to a volumetric visual track used to carry a coded geometry bitstream or a coded attribute bitstream, or a volumetric visual track that carries both a coded geometry bitstream and a coded attribute bitstream.
- each point cloud sample can correspond to a complete point cloud frame.
- In step S320, the point cloud sample is decapsulated to obtain at least one compression unit; the media file data box of the point cloud sample includes a type field used to indicate the type of compression unit.
- the type of compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice; the geometry header is used to indicate the parameter set of geometric information, the attribute header is used to indicate the parameter set of attribute information, the geometry slice is the point cloud slice data used to indicate the geometric information, and the attribute slice is the point cloud slice data used to indicate attribute information.
- the media file data box may be a data box based on the ISO base media file format ISOBMFF (ISO Base Media File Format).
- FIG. 4 shows the syntax structure of encapsulating point cloud samples based on TLV format in one embodiment of the present application.
- Each point cloud sample consists of one or more compression units G-PCC unit.
- Figure 5 shows an exemplary structure of encapsulating a geometry code stream and an attribute code stream in a single track according to an embodiment of the present application.
- in single-track encapsulation mode, a point cloud sample can include one or more compression units encapsulated in TLV format. For example, it can include the parameter set unit Parameter Set TLV, the geometry information unit Geometry TLV, and the attribute information unit Attribute TLV in Figure 5.
- the TLV code stream format, namely the type-length-value bytestream format, refers to a structure composed of a data type (Type), a data length (Length) and a data value (Value). For details of the TLV code stream format, refer to the standard ISO/IEC 23090-9.
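- To make the type-length-value structure concrete, the following is a minimal parsing sketch. It assumes a 1-byte type followed by a 4-byte big-endian payload length; these widths and the helper name `iter_tlv_units` are illustrative assumptions, and ISO/IEC 23090-9 remains the authoritative definition of the layout.

```python
import struct


def iter_tlv_units(buf):
    """Yield (tlv_type, payload) pairs from a TLV-encoded byte string.

    Assumed layout per unit (illustrative): 1-byte type,
    4-byte big-endian length, then `length` payload bytes.
    """
    offset = 0
    while offset < len(buf):
        if offset + 5 > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from(">BI", buf, offset)
        offset += 5
        if offset + length > len(buf):
            raise ValueError("truncated TLV payload")
        yield tlv_type, buf[offset:offset + length]
        offset += length


# Two units: type 0 with a 3-byte payload, type 2 with an empty payload.
stream = bytes([0]) + (3).to_bytes(4, "big") + b"abc" + bytes([2]) + (0).to_bytes(4, "big")
units = list(iter_tlv_units(stream))
print(units)
```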
- Figure 6 shows an exemplary structure of encapsulating geometry code streams and attribute code streams in multiple tracks according to one embodiment of the present application.
- ftyp represents the file type and describes the version of the specification that the point cloud sample complies with
- moov represents the metadata information of the point cloud sample
- mdat represents the specific media data carried in the point cloud sample.
- G-PCC geometry track contains at least one compression unit G-PCC unit, which carries a single G-PCC component data unit instead of a geometry and attribute data unit or a multiplexing of different attribute data units.
- G-PCC attribute tracks should not multiplex different attribute substreams, such as color and reflectivity.
- Figure 7 shows the syntax structure of a compression unit encapsulated based on TLV format in one embodiment of the present application.
- tlv_type is a type field used to indicate the type of compression unit.
- Table 1 shows the semantic description of different values of the compression unit type field in an embodiment of the present application.
- the type field with different values can be used to indicate different compression unit types.
- For example, the type of the compression unit may be the sequence parameter set SPS (Sequence Parameter Set) or the geometry parameter set GPS (Geometry Parameter Set).
- embodiments of this application also provide the following semantic description of the compression unit type field.
- the type of the compression unit is a geometry header, and the geometry header is used to indicate the parameter set of geometric information.
- the value of the type field is 8
- the attribute header is used to indicate the parameter set of attribute information.
- the type of compression unit is geometric slices, and geometric slices are used to indicate point cloud slice data of geometric information.
- the type of compression unit is attribute slice
- the attribute slice is used to indicate point cloud slice data of attribute information.
- the compression unit G-PCC unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice.
- the compression unit G-PCC unit in the same point cloud sample corresponds to the same point cloud frame and has the same rendering time.
- In step S330, the target compression unit is selected according to the type field, and the target compression unit is decoded to obtain point cloud data.
- the electronic device can use different field values to indicate that the compression unit to be decoded is a geometry header, an attribute header, a geometry slice, or an attribute slice, so that, according to the needs of point cloud media consumption, part of the file content can be selectively decoded without decoding the entire file content. This improves the flexibility of point cloud data consumption, improves the decoding efficiency of point cloud data, and reduces the consumption of computing resources.
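- Selecting target compression units by type field can be sketched as a simple filter. The numeric values in `UnitType` are hypothetical placeholders chosen only for this illustration (the actual type field values are those given in the tables of this application), and `select_target_units` is an assumed helper name.

```python
from enum import IntEnum


class UnitType(IntEnum):
    # Hypothetical numbering, for illustration only.
    GEOMETRY_HEADER = 0
    ATTRIBUTE_HEADER = 1
    GEOMETRY_SLICE = 2
    ATTRIBUTE_SLICE = 3


def select_target_units(units, wanted_types):
    """Keep only the compression units whose type field is in wanted_types."""
    return [(unit_type, payload) for unit_type, payload in units if unit_type in wanted_types]


# A sample that carries all four kinds of compression unit.
sample_units = [
    (UnitType.GEOMETRY_HEADER, b"gh"),
    (UnitType.ATTRIBUTE_HEADER, b"ah"),
    (UnitType.GEOMETRY_SLICE, b"gs"),
    (UnitType.ATTRIBUTE_SLICE, b"as"),
]

# A consumer that only needs the shape of the object skips all attribute units.
geometry_only = select_target_units(
    sample_units, {UnitType.GEOMETRY_HEADER, UnitType.GEOMETRY_SLICE}
)
print([payload for _, payload in geometry_only])
```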
- the media file data box of the point cloud sample may also include:
- the component header number field num_component_headers is used to indicate the number of geometric header parameter sets and attribute header parameter sets included in the point cloud sample.
- the slice number field num_slices is used to indicate the number of point cloud slices included in the point cloud sample, that is, the number of geometric slices and attribute slices.
- when the value of the component header quantity field is 0, it indicates that the geometry header parameter set and the attribute header parameter set are given in the decoder configuration information.
- Figure 8 shows the syntax structure of encapsulating point cloud samples based on G-PCC compression mode in one embodiment of the present application.
- the component header number field num_component_headers is used to indicate the number of geometric headers and attribute header parameter sets contained in the current point cloud frame. If the value of this field is 0, it means that the corresponding geometry header and attribute header parameter sets are given in the decoder configuration information.
- the slice number field num_slices is used to indicate the number of point cloud slices contained in the current point cloud frame.
- header_type is used to indicate that the type of the parameter set is a geometry header or an attribute header.
- a value of 1 for this field indicates that the parameter set is a geometry header parameter set; a value of 2 indicates that the parameter set is an attribute header parameter set.
- the header length field header_length is used to indicate the length of the parameter set.
- the header data field header is used to indicate the data in the parameter set.
- the parsing of this field follows the definition of the parameter set in the corresponding encoding standard.
- the following fields of indication information can be provided.
- the slice type field slice_type is used to indicate the type of point cloud slice, which may specifically include point cloud geometric slices and point cloud attribute slices corresponding to different attribute information.
- the slice length field slice_length is used to indicate the length of the point cloud slice.
- Point cloud slices contain corresponding point cloud slice headers and data information.
- the slice data field slice is used to indicate the data in the point cloud slice.
- the parsing of this field follows the definition of point cloud header and data information in the corresponding encoding standard.
- when the value of the slice type field is the first value, it indicates that the type of the point cloud slice is a point cloud geometric slice; when the value of the slice type field is the second value, it indicates that the type of the point cloud slice is a point cloud color attribute slice; when the value of the slice type field is the third value, it indicates that the type of the point cloud slice is a point cloud reflectance attribute slice; when the value of the slice type field is the fourth value, it indicates that the type of the point cloud slice is a point cloud hybrid attribute slice that includes color attributes and reflectance attributes.
- a value of 0 in the slice type field represents a point cloud geometric slice
- a value of 1 represents a point cloud color attribute slice
- a value of 2 represents a point cloud reflectance attribute slice
- a value of 3 represents a point cloud hybrid attribute slice (i.e., a point cloud attribute slice containing color and reflectance).
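- The sample layout described above (component headers followed by point cloud slices) can be sketched as a small parser. The `header_type` and `slice_type` values follow the semantics given in the text; the field widths (1-byte counts and types, 4-byte big-endian lengths) and the function name `parse_sample` are assumptions for illustration, since the bit widths are defined by the syntax figure rather than by this sketch.

```python
import struct

HEADER_TYPES = {1: "geometry header", 2: "attribute header"}
SLICE_TYPES = {0: "geometry", 1: "color", 2: "reflectance", 3: "color+reflectance"}


def parse_sample(buf):
    """Parse one sample: num_component_headers, headers, num_slices, slices."""
    pos = 0

    def read(fmt):
        nonlocal pos
        values = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        return values

    def read_bytes(count):
        nonlocal pos
        if pos + count > len(buf):
            raise ValueError("payload runs past the end of the sample")
        data = buf[pos:pos + count]
        pos += count
        return data

    headers, slices = [], []
    (num_component_headers,) = read(">B")  # 0 means headers live in the decoder configuration
    for _ in range(num_component_headers):
        header_type, header_length = read(">BI")
        headers.append((HEADER_TYPES[header_type], read_bytes(header_length)))
    (num_slices,) = read(">B")
    for _ in range(num_slices):
        slice_type, slice_length = read(">BI")
        slices.append((SLICE_TYPES[slice_type], read_bytes(slice_length)))
    return headers, slices


# One geometry header, then a geometry slice and a color attribute slice.
raw = (
    b"\x01" + b"\x01" + (2).to_bytes(4, "big") + b"GH"
    + b"\x02"
    + b"\x00" + (3).to_bytes(4, "big") + b"geo"
    + b"\x01" + (3).to_bytes(4, "big") + b"col"
)
headers, slices = parse_sample(raw)
print(headers, slices)
```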
- the electronic device can indicate the component data of the point cloud sample based on the field values, so the component headers or point cloud slices can be selectively and partially decoded, which further improves the decoding flexibility and decoding efficiency of point cloud media and further reduces the consumption of computing resources.
- the point cloud slice includes a header and data information; the media file data box of the point cloud sample also includes:
- the geometric slice header length field is used to indicate the length of the header when the point cloud slice is a geometric slice
- the geometric slice data length field is used to indicate the length of the data information when the point cloud slice is a geometric slice
- the attribute slice header length field is used to indicate the length of the header when the point cloud slice is an attribute slice
- the attribute slice data length field is used to indicate the length of the data information when the point cloud slice is an attribute slice
- the geometric slice header field is used to indicate the point cloud slice header when the point cloud slice is a geometric slice
- the geometric slice data field is used to indicate the data information when the point cloud slice is a geometric slice
- the attribute slice header field is used to indicate the point cloud slice header when the point cloud slice is an attribute slice
- the attribute slice data field is used to indicate the data information when the point cloud slice is an attribute slice.
- Figure 9 shows a syntax structure of a point cloud sample that provides specific indication information of a point cloud patch in one embodiment of the present application.
- the geometric slice header length field geo_slice_header_length and the geometric slice data length field geo_slice_data_length shown in Figure 10 can be used to indicate the length of the geometric slice header and the length of the geometric information respectively; at the same time, the geometric slice header field geo_slice_header and the geometric slice data field geo_slice_data can be used to indicate the data of the geometric slice header and of the geometric information respectively.
- the attribute slice header length field attr_slice_header_length and the attribute slice data length field attr_slice_data_length shown in Figure 10 can be used to indicate the length of the attribute slice header and the length of the attribute information respectively; at the same time, the attribute slice header field attr_slice_header and the attribute slice data field attr_slice_data can be used to indicate the data of the attribute slice header and of the attribute information respectively.
- sub-samples can be further divided within the sample to achieve partial access.
- the media file data box of the sub-sample may include a sub-sample identification field, which is a flag bit used to indicate the type of sub-sample.
- When the value of the subsample identification field is 0, it indicates that the subsample is a subsample based on a compression unit, that is, a subsample is composed of at least one compression unit in the point cloud sample.
- When the value of the subsample identification field is 1, it indicates that the subsample is a subsample based on a tile, that is, a subsample is composed of a continuous unit sequence containing one or more compression units corresponding to a tile, or a subsample consists of a sequence of contiguous units containing one or more compression units for each parameter set, tile list, or frame boundary marker.
- the media file data box of the subsample may include the following fields related to codec_specific_parameters:
- the geometric header identification field is used to indicate whether the subsample is a geometric header parameter set
- the attribute header identification field is used to indicate whether the subsample is an attribute header parameter set
- the geometric slice identification field is used to indicate whether the subsample is a point cloud geometric slice
- the attribute slice identification field is used to indicate whether the subsample is a point cloud attribute slice.
- the media file data box of the subsample may also include an attribute type field, which is used to indicate the type of point cloud attribute when the subsample is a point cloud attribute slice.
- when the value of the attribute type field is the first value, it indicates that the type of the point cloud attribute is a color attribute; when the value of the attribute type field is the second value, it indicates that the type of the point cloud attribute is a reflectance attribute; when the value of the attribute type field is the third value, it indicates that the type of the point cloud attribute is both a color attribute and a reflectance attribute.
- Figure 10 shows the syntax structure corresponding to codec specific parameters in the media file data box of the subsample according to one embodiment of the present application.
- Geometry header identification field geo_header_flag a value of 1 indicates that the subsample is a geometry head parameter set; a value of 0 indicates that the subsample is not a geometry head parameter set.
- Attribute header identification field attr_header_flag a value of 1 indicates that the subsample is an attribute header parameter set; a value of 0 indicates that the subsample is not an attribute header parameter set.
- the geometric slice identification field geo_slice_flag has a value of 1 indicating that the subsample is a point cloud geometric slice; a value of 0 indicates that the subsample is not a point cloud geometric slice.
- Attribute slice identification field attr_slice_flag a value of 1 indicates that the subsample is a point cloud attribute slice; a value of 0 indicates that the subsample is not a point cloud attribute slice.
- the four flag bits of the geometry header identification field geo_header_flag, the attribute header identification field attr_header_flag, the geometry slice identification field geo_slice_flag, and the attribute slice identification field attr_slice_flag cannot be 0 at the same time.
- the attribute type field attr_type indicates the type of the point cloud attribute in the point cloud attribute slice.
- a value of 0 indicates that the point cloud attribute slice only contains color attributes; a value of 1 indicates that the point cloud attribute slice only contains reflectance attributes; a value of 2 indicates that the point cloud attribute slice contains both color attributes and reflectance attributes.
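- The subsample flags and the rule that the four flags cannot all be 0 can be illustrated with a pack/unpack sketch. The bit positions used here (four flags in the top bits of a 32-bit word, followed by an 8-bit attr_type) are a hypothetical layout chosen only for illustration; the actual layout of codec_specific_parameters is the one given in the syntax figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsampleInfo:
    geo_header_flag: int
    attr_header_flag: int
    geo_slice_flag: int
    attr_slice_flag: int
    attr_type: int = 0  # 0: color, 1: reflectance, 2: color and reflectance

    def validate(self):
        flags = (self.geo_header_flag, self.attr_header_flag,
                 self.geo_slice_flag, self.attr_slice_flag)
        if any(flag not in (0, 1) for flag in flags):
            raise ValueError("each flag must be 0 or 1")
        if not any(flags):
            raise ValueError("the four flags cannot all be 0")
        if self.attr_slice_flag and self.attr_type not in (0, 1, 2):
            raise ValueError("attr_type must be 0, 1 or 2")


def pack_codec_specific_parameters(info):
    """Pack into a 32-bit word using a hypothetical bit layout."""
    info.validate()
    return ((info.geo_header_flag << 31) | (info.attr_header_flag << 30)
            | (info.geo_slice_flag << 29) | (info.attr_slice_flag << 28)
            | (info.attr_type << 20))


def unpack_codec_specific_parameters(word):
    info = SubsampleInfo((word >> 31) & 1, (word >> 30) & 1,
                         (word >> 29) & 1, (word >> 28) & 1,
                         (word >> 20) & 0xFF)
    info.validate()
    return info


# An attribute slice subsample carrying both color and reflectance.
original = SubsampleInfo(0, 0, 0, 1, attr_type=2)
word = pack_codec_specific_parameters(original)
print(hex(word), unpack_codec_specific_parameters(word))
```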
- different component information (geometric data, attribute data) in the point cloud media file can be encapsulated in different tracks based on the multi-track encapsulation mode. On this basis, the relevant fields of point cloud samples and their subsamples have corresponding value range constraints.
- the point cloud media file includes a first point cloud sample encapsulated in a geometry track, which is a track used to encapsulate geometric data.
- in the media file data box of the first point cloud sample, the slice type field has the first value, and the slice type field with the first value is used to indicate that the type of the point cloud slice in the first point cloud sample is a point cloud geometric slice; in the media file data box of the subsample of the first point cloud sample, the value range of the attribute slice identification field does not include the second value, and the attribute slice identification field with the second value is used to indicate that the subsample is a point cloud attribute slice.
- if the slice type field slice_type exists in the point cloud sample of the geometry track, this field can only take the value 0.
- the attribute slice identification field attr_slice_flag in the subsample definition of the geometry track is not allowed to have a value of 1.
- the point cloud media file includes a second point cloud sample encapsulated in an attribute track, which is a track used to encapsulate attribute data. In the media file data box of the second point cloud sample, the value of the header type field is the third value, and the header type field whose value is the third value is used to indicate that the type of the parameter set is an attribute header. In the media file data box of the second point cloud sample, the value range of the slice type field does not include the first value, and the slice type field whose value is the first value is used to indicate that the point cloud slice in the second point cloud sample is a point cloud geometric slice. In the media file data box of the subsample of the second point cloud sample, the value range of the geometric header identification field does not include the second value, and the geometric header identification field whose value is the second value is used to indicate that the subsample is a geometric header parameter set. In the media file data box of the subsample of the second point cloud sample, the value range of the geometric patch
- if the header type field header_type exists in the point cloud sample, this field can only have a value of 2. If the slice type field slice_type exists in the sample, its value cannot be 0.
- the geometric header identification field geo_header_flag and the geometric slice identification field geo_slice_flag field in the subsample definition are not allowed to have a value of 1.
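- The value range constraints for geometry tracks and attribute tracks can be expressed as a small checker. The field values (slice_type 0 for geometric slices, header_type 2 for attribute headers, flag value 1 meaning "is") come from the text above; the function names and the list-based inputs are assumptions for illustration.

```python
def check_geometry_track(slice_types, attr_slice_flags):
    """Return a list of constraint violations for a geometry track."""
    errors = []
    if any(value != 0 for value in slice_types):
        errors.append("slice_type must be 0 in a geometry track")
    if any(flag == 1 for flag in attr_slice_flags):
        errors.append("attr_slice_flag must not be 1 in a geometry track")
    return errors


def check_attribute_track(header_types, slice_types, geo_header_flags, geo_slice_flags):
    """Return a list of constraint violations for an attribute track."""
    errors = []
    if any(value != 2 for value in header_types):
        errors.append("header_type must be 2 in an attribute track")
    if any(value == 0 for value in slice_types):
        errors.append("slice_type must not be 0 in an attribute track")
    if any(flag == 1 for flag in geo_header_flags):
        errors.append("geo_header_flag must not be 1 in an attribute track")
    if any(flag == 1 for flag in geo_slice_flags):
        errors.append("geo_slice_flag must not be 1 in an attribute track")
    return errors


# A well-formed geometry track and an attribute track that wrongly carries a geometric slice.
print(check_geometry_track(slice_types=[0, 0], attr_slice_flags=[0, 0]))
print(check_attribute_track(header_types=[2], slice_types=[1, 0],
                            geo_header_flags=[0], geo_slice_flags=[0, 1]))
```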
- the point cloud media file includes point cloud samples encapsulated in multiple tracks, and the media file data box of the point cloud sample includes metadata information corresponding to the track; the metadata information includes:
- the component type field is used to indicate the component type of the point cloud sample encapsulated in the track.
- the component type includes attribute components used to represent attribute data and geometric components used to represent geometric data;
- the attribute quantity field is used to indicate the number of attribute components encapsulated in the track
- the attribute type field is used to indicate the type of attribute component encapsulated in the track.
- different values of the attribute type field can be used to represent different attribute component types.
- when the value of the attribute type field is the first value, it indicates that the type of the attribute component is the color attribute; when the value of the attribute type field is the second value, it indicates that the type of the attribute component is the reflectance attribute.
- Figure 11 shows the syntax structure of metadata information indicating the number of attributes in multi-track encapsulation mode in one embodiment of the present application.
- the component type field gpcc_type indicates the type of component in the track.
- the attribute number field attr_num indicates the number of attribute components contained in the track.
- the attribute type field attr_type indicates the type of attribute component contained in the track.
- a value of 0 indicates that the component type is a color attribute; a value of 1 indicates that the component type is a reflectance attribute.
- the attribute number field can be combined with the attribute type field to indicate the type of each attribute component encapsulated in the track.
- Table 2 shows the semantic description of different values of the component type field in an embodiment of the present application.
- the point cloud media file includes point cloud samples encapsulated in multiple tracks, and the media file data box of the point cloud sample includes metadata information corresponding to the track; the metadata information includes:
- the component type field is used to indicate the component type of the point cloud sample encapsulated in the track.
- the component type includes attribute components used to represent attribute data and geometric components used to represent geometric data;
- the attribute type field is used to indicate the type of attribute component encapsulated in the track.
- different values of the attribute type field can be used to represent different attribute component types.
- when the value of the attribute type field is the first value, it indicates that the type of the attribute component is a color attribute; when the value of the attribute type field is the second value, it indicates that the type of the attribute component is a reflectance attribute; when the value of the attribute type field is the third value, it indicates that the type of the attribute component includes both color attributes and reflectance attributes.
- Figure 12 shows the syntax structure of metadata information of extended attribute types in multi-track encapsulation mode in one embodiment of the present application.
- the component type field gpcc_type indicates the type of component in the track.
- the attribute type field attr_type indicates the type of attribute component contained in the track.
- a value of 0 indicates that the component type is a color attribute; a value of 1 indicates that the component type is a reflectance attribute; a value of 2 indicates that the component type contains both color attributes and reflectance attributes.
- the attribute type field can be used to separately indicate the type of each attribute component encapsulated in the track.
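- The two signalling styles described above (an attribute count plus one attr_type per component, versus a single extended attr_type) carry the same information for color and reflectance. The following sketch converts between them; the helper names are hypothetical, and only the attr_type values 0, 1 and 2 are taken from the text.

```python
COLOR, REFLECTANCE, COLOR_AND_REFLECTANCE = 0, 1, 2


def to_extended_attr_type(attr_types):
    """Collapse a per-component attr_type list into a single extended attr_type."""
    kinds = set(attr_types)
    if kinds == {COLOR}:
        return COLOR
    if kinds == {REFLECTANCE}:
        return REFLECTANCE
    if kinds == {COLOR, REFLECTANCE}:
        return COLOR_AND_REFLECTANCE
    raise ValueError("unsupported attribute combination")


def to_attr_list(extended_attr_type):
    """Expand an extended attr_type into the per-component list; attr_num is its length."""
    mapping = {COLOR: [COLOR], REFLECTANCE: [REFLECTANCE],
               COLOR_AND_REFLECTANCE: [COLOR, REFLECTANCE]}
    return list(mapping[extended_attr_type])


per_component = to_attr_list(COLOR_AND_REFLECTANCE)
print(len(per_component), per_component, to_extended_attr_type(per_component))
```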
- the data source can send streaming media transmission signaling corresponding to the point cloud media file to the data receiver.
- the data receiver parses the streaming media transmission signaling sent by the data source to obtain the component descriptor carried in the signaling. The component descriptor is used to indicate the type information and attribute information of the point cloud component encapsulated in the track; the point cloud media file sent by the data source is obtained according to the component descriptor.
- the data receiver here can be an electronic device that needs to receive the point cloud media file, such as a terminal. After receiving the point cloud media file, the terminal can decode the point cloud media file to obtain the point cloud data.
- Streaming media transmission signaling is a message transmitted between the data source and the data receiver for coordinating the communication process.
- it can be DASH signaling based on the DASH protocol, or it can also be SMT signaling.
- In single-track encapsulation mode, DASH signaling includes an Adaptation Set with one or more Representations. Each Representation represents an independent point cloud code stream. If a Representation consists of multiple Media Segments, the DASH signaling also includes an Initialization Media Segment.
- the Initialization Media Segment contains the GPCC decoder configuration record GPCCDecoderConfigurationRecord with the G-PCC parameter set, such as SPS, GPS and APS defined in the standard ISO/IEC 23090-9.
- each G-PCC component in DASH signaling is represented as a separate Adaptation Set, which can be called Component Adaptation Set.
- the adaptation set containing geometric information is the main GPCC adaptation set that serves as the G-PCC content access point.
- the main GPCC adaptation set contains a single initialization segment Initialization Segment at the Adaptation Set level or multiple initialization segments Initialization Segment at the representation level (each representation corresponds to an initialization segment Initialization Segment).
- the Initialization Segment should contain the specified G-PCC parameter set, which is necessary to initialize the G-PCC decoder.
- a component descriptor GPCCComponent descriptor shall signal each point cloud component present in the Adaptation Set's representation Representation.
- the component descriptor includes:
- the component type field component@type is used to indicate that the type of point cloud component is a geometric component or an attribute component
- the component attribute quantity field component@attr_num is used to indicate the number of attribute components
- the component attribute type field component@attr_type is used to indicate the type of the attribute component.
- Table 3 shows the semantic interpretation of component descriptors in one embodiment of the present application.
- the component descriptor indicates the component type, number of component attributes, and component attribute type of the point cloud component carried in the point cloud media file, which can provide the data receiver with indication information for partially receiving or partially decoding the point cloud media file. This can improve the transmission and decoding efficiency of point cloud media and reduce the consumption of bandwidth resources and computing resources.
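- How a data receiver might use the component descriptor for partial reception can be sketched as follows. The dictionary representation of the parsed signaling, the string values "geom" and "attr" for the component type, and the function name `choose_adaptation_sets` are assumptions for illustration; the attr_type values 0 (color) and 1 (reflectance) follow the text.

```python
def choose_adaptation_sets(adaptation_sets, wanted_attr_types):
    """Return the ids of the adaptation sets a receiver should request.

    Geometry is always requested; an attribute set is requested only if it
    carries at least one of the wanted attribute types.
    """
    selected = []
    wanted = set(wanted_attr_types)
    for adaptation_set in adaptation_sets:
        component = adaptation_set["component"]
        if component["type"] == "geom":
            selected.append(adaptation_set["id"])
        elif component["type"] == "attr" and wanted & set(component["attr_type"]):
            selected.append(adaptation_set["id"])
    return selected


# Hypothetical parsed signaling: one geometry set and two attribute sets.
parsed_sets = [
    {"id": "geo", "component": {"type": "geom", "attr_num": 0, "attr_type": []}},
    {"id": "color", "component": {"type": "attr", "attr_num": 1, "attr_type": [0]}},
    {"id": "refl", "component": {"type": "attr", "attr_num": 1, "attr_type": [1]}},
]

# A viewer that renders color only does not need to download reflectance.
print(choose_adaptation_sets(parsed_sets, wanted_attr_types=[0]))
```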
- Figure 13 shows a step flow chart of the point cloud media encoding method in one embodiment of the present application.
- This method can be applied to the server, terminal (or client running on the terminal) and intermediate nodes of the point cloud media system.
- the embodiment of the present application takes an electronic device installed with a point cloud media encoding device to execute the point cloud media encoding method as an example.
- the point cloud media encoding method includes the following steps S1310 to S1330.
- In step S1310, point cloud source data is obtained, and the point cloud source data includes multiple point cloud frames.
- Point cloud source data includes point cloud videos (images and/or videos) representing objects and/or environments located in various 3D spaces (eg, 3D spaces representing real environments, 3D spaces representing virtual environments, etc.).
- the data source may use one or more cameras (for example, an infrared camera capable of acquiring depth information, an RGB camera capable of extracting color information corresponding to the depth information, etc.), a projector (for example, an infrared pattern projector used to acquire depth information), LiDAR, and other acquisition devices to capture point cloud source data.
- the shape of the geometric structure composed of points in the 3D space can be extracted from the depth information of the point cloud source data, and the attributes of each point can be extracted from the color information of the point cloud source data, so as to obtain the point cloud source data.
- a point cloud video can include one or more point cloud frames, and one point cloud frame can represent one frame of point cloud image.
- point cloud video data may be captured based on at least one of inward-facing technology and outward-facing technology.
- Inward-facing technology refers to a technology that captures images of a central object with one or more cameras (or camera sensors) arranged around the central object. Inward-facing techniques can be used to generate point cloud content that provides the user with 360-degree images of key objects (e.g., VR/AR content that provides the user with 360-degree images of key objects such as characters, players, objects, or actors).
- point cloud content e.g., VR/AR that provides the user with 360-degree images of key objects such as characters, players, objects, or actors).
- Outward-facing technology refers to a technology that uses one or more cameras (or camera sensors) arranged around the central object to capture the environment of the central object rather than the image of the central object.
- Point cloud content that provides the surrounding environment as it appears from the user's perspective may be generated using outward-facing techniques (e.g., content representing the external environment that may be provided to a user of a self-driving vehicle).
- The data source can calibrate one or more cameras to set the global coordinate system prior to the capture operation.
- the data source may generate point cloud content by compositing arbitrary images and/or videos with images and/or videos captured via the capture techniques described above.
- The data source may perform post-processing on the captured images and/or videos, for example removing unwanted areas (such as the background), identifying the space to which the captured images and/or videos are connected, and filling spatial holes when they are present.
- The data source can generate a piece of point cloud content by performing coordinate transformations on the points of the point cloud video obtained from each camera.
- The data source can perform coordinate transformations on points based on the coordinates of each camera location. Therefore, the data source can generate point cloud content that represents a broad spatial extent, or point cloud content with a high density of points.
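The per-camera coordinate transformation described above can be sketched as follows. This is only a minimal illustration under stated assumptions: each camera pose is taken to be a rigid transform (a 3x3 rotation plus a translation) obtained from calibration, and the helper names `rotation_z`, `to_global`, and `merge_cameras` are hypothetical and do not come from the application.

```python
import math

def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix about the z axis (illustrative pose only)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def to_global(points, rotation, translation):
    """Map camera-local points into the global frame: p_global = R * p + t."""
    out = []
    for p in points:
        out.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return out

def merge_cameras(captures):
    """Merge captures given as (points, rotation, translation), one per camera."""
    merged = []
    for points, rotation, translation in captures:
        merged.extend(to_global(points, rotation, translation))
    return merged
```

Merging the transformed points of several cameras yields either a wider spatial extent (cameras looking at different regions) or a denser cloud (cameras looking at the same region).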
- In step S1320, the point cloud frame is encoded to obtain at least one compression unit.
- In step S1330, encapsulation processing is performed on the at least one compression unit to obtain a point cloud media file.
- The point cloud media file includes point cloud samples encapsulated in one or more tracks; the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit.
- The type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice.
- The geometry header is used to indicate the parameter set of geometry information.
- The attribute header is used to indicate the parameter set of attribute information.
- The geometry slice is point cloud slice data used to indicate geometry information.
- The attribute slice is point cloud slice data used to indicate attribute information.
- the point cloud media file may be a media file or media segment obtained after encoding and encapsulation processing as shown in Figure 2.
- the media file or media segment carries a point cloud code stream to be transmitted.
- The data source can encapsulate the point cloud code stream into a single track based on the geometry parameter information, attribute parameter information, and point cloud slice parameter information contained in the point cloud code stream, or it can re-encapsulate a single-track point cloud media file into a point cloud media file containing multiple tracks.
- the track can be a volumetric visual track used to carry the encoded geometry bitstream or the encoded attribute bitstream, or it can be a volumetric visual track that carries both the encoded geometry bitstream and the encoded attribute bitstream.
- When the point cloud code stream is encapsulated in a single track, each point cloud sample can correspond to one complete point cloud frame.
- The media file data box may be a data box based on the ISO Base Media File Format (ISOBMFF). For details of ISOBMFF, refer to the standard ISO/IEC 14496-12.
- In single-track encapsulation mode, a point cloud sample can include one or more compression units encapsulated in the TLV format, for example the parameter set unit Parameter Set TLV, the geometry information unit Geometry TLV, and the attribute information unit Attribute TLV in Figure 5.
- The TLV code stream format, namely the Type-length-value bytestream format, refers to a structure composed of the data type (Type), the data length (Length), and the data value (Value). For details of the TLV code stream format, refer to the standard ISO/IEC 23090-9.
- In multi-track encapsulation mode, the code stream data of each point cloud component is mapped to a separate track. There are two types of G-PCC component tracks: G-PCC geometry tracks and G-PCC attribute tracks. Each point cloud sample in a track contains at least one compression unit (G-PCC unit), which carries a single G-PCC component data unit rather than a multiplexing of geometry and attribute data units or of different attribute data units.
- G-PCC attribute tracks should not multiplex different attribute substreams, such as color and reflectance.
- For the data structure obtained by encapsulating the geometry code stream and attribute code stream in multiple tracks, refer to Figure 6.
- ftyp represents the file type and describes the version of the specification that the point cloud sample complies with;
- moov box represents the metadata information of the point cloud sample;
- mdat represents the specific media data carried in the point cloud sample.
- Field assignment can be performed on the media file data box of the point cloud sample; the assignment can follow the TLV format shown in Figure 7.
- When the compression unit is a sequence parameter set SPS (Sequence Parameter Set), the type field can be set to 0.
- When the compression unit is a geometry parameter set GPS (Geometry Parameter Set), the type field can be set to 1.
- When the compression unit is a geometry data unit, the type field can be set to 2.
- When the compression unit is an attribute parameter set APS (Attribute Parameter Set), the type field can be set to 3.
- When the compression unit is an attribute data unit, the type field can be set to 4.
- When the compression unit is a tile inventory, the type field can be set to 5.
- When the compression unit is a frame boundary marker, the type field can be set to 6.
- Embodiments of the present application can also assign the type field based on the following semantic descriptions of compression unit types.
- When the compression unit is a geometry header, the type field can be set to 7; the geometry header is used to indicate a parameter set of geometry information.
- When the compression unit is an attribute header, the type field can be set to 8; the attribute header is used to indicate a parameter set of attribute information.
- When the compression unit is a geometry slice, the type field can be set to 9; the geometry slice is used to indicate point cloud slice data of geometry information.
- When the compression unit is an attribute slice, the type field can be set to 10; the attribute slice is used to indicate point cloud slice data of attribute information.
- the compression unit G-PCC unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice.
- the compression unit G-PCC unit in the same point cloud sample corresponds to the same point cloud frame and has the same rendering time.
- By providing the type field of the compression unit in the media file data box, different field values can be used to indicate that the compression unit to be decoded is a geometry header, an attribute header, a geometry slice, or an attribute slice, so that part of the file content can be decoded selectively according to the consumption needs of the point cloud media, without decoding the entire file content.
- Figure 14 shows a flow chart of point cloud data encoding and decoding in a multi-track encapsulated streaming media transmission application scenario according to an embodiment of the present application.
- the server serves as the data source for producing point cloud media files. It can encode and send the point cloud data to the user's client (or the terminal running the client), and decode the point cloud media files through the client. The point cloud data can then be obtained for user consumption.
- the specific point cloud data encoding and decoding process may include the following steps.
- Step S1401 The server encapsulates the point cloud code stream into a multi-track point cloud media file F1 based on the geometric parameter information, attribute parameter information and point cloud slice parameter information contained in the point cloud code stream.
- the point cloud media file F1 may include, for example, three tracks: Track1, Track2, and Track3.
- Step S1402 The server converts the file F1 into multiple segments in a streaming media transmission scenario according to the DASH standard.
- Step S1403 The server generates MPD signaling information and sends it to the client.
- Step S1404 The client parses the component descriptor in the MPD signaling.
- the component descriptor includes an attribute quantity field attr_num and an attribute type field attr_type. Based on the field values of the component descriptor, the number and corresponding types of attribute components included in the Representation can be determined.
- Step S1405: The client requests the corresponding Representations for consumption according to its own bandwidth and needs.
- client 1 can request geometric data and color attribute data for consumption
- client 2 can request geometric data, color, and reflectivity attribute data for consumption.
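The client-side choice in steps S1404 and S1405 can be sketched as follows. This is a hedged illustration, not the signaling defined by the application: the `Representation` record, its string labels for components and attributes, and the `select_representations` helper are hypothetical stand-ins for what a client would derive from `attr_num` and `attr_type` in the component descriptor.

```python
from dataclasses import dataclass

@dataclass
class Representation:
    rep_id: str
    component: str      # "geometry" or "attribute" (hypothetical labels)
    attr_types: tuple   # attribute labels carried; len(attr_types) plays the role of attr_num
    bandwidth: int      # bits per second advertised for this Representation

def select_representations(reps, wanted_attrs, max_bandwidth):
    """Always keep geometry, then add wanted attribute Representations that fit the budget."""
    chosen = [r for r in reps if r.component == "geometry"]
    used = sum(r.bandwidth for r in chosen)
    for rep in reps:
        if rep.component != "attribute":
            continue
        # Skip Representations carrying attributes the client does not need.
        if not set(rep.attr_types) <= set(wanted_attrs):
            continue
        if used + rep.bandwidth <= max_bandwidth:
            chosen.append(rep)
            used += rep.bandwidth
    return [r.rep_id for r in chosen]

REPS = [
    Representation("geo", "geometry", (), 1000),
    Representation("color", "attribute", ("color",), 2000),
    Representation("refl", "attribute", ("reflectance",), 500),
]
```

With these example Representations, a client interested only in color requests `geo` and `color`, while a client interested in both attributes requests all three, which mirrors client 1 and client 2 above.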
- Figure 15 shows a flow chart of point cloud data encoding and decoding in the application scenario of single-track encapsulated local point cloud media playback according to an embodiment of the present application.
- The server, as the data source that produces point cloud media files, can encode the point cloud data and send it to the user's client. After the client decodes the point cloud media file, the point cloud data is obtained for user consumption.
- the specific point cloud data encoding and decoding process may include the following steps.
- Step S1501 The server encapsulates the point cloud code stream into a single-track point cloud media file F1 based on the geometric parameter information, attribute parameter information and point cloud slice parameter information contained in the point cloud code stream.
- sub-samples of the point cloud samples can be divided to achieve partial access.
- a subsample is a subsample based on a compression unit, that is, a subsample is composed of at least one compression unit in a point cloud sample.
- The media file data box of the sub-sample may include a sub-sample identification field, which is a flag used to indicate the type of the sub-sample.
- When the value of the sub-sample identification field is 0, the sub-sample is a compression-unit-based sub-sample, that is, a sub-sample consists of at least one compression unit in the point cloud sample.
- The media file data box of the sub-sample may include the following fields related to the codec-specific parameters codec_specific_parameters:
- the geometry header identification field, used to indicate whether the sub-sample is a geometry header parameter set;
- the attribute header identification field, used to indicate whether the sub-sample is an attribute header parameter set;
- the geometry slice identification field, used to indicate whether the sub-sample is a point cloud geometry slice;
- the attribute slice identification field, used to indicate whether the sub-sample is a point cloud attribute slice;
- the attribute type field, used to indicate the type of the point cloud attribute when the sub-sample is a point cloud attribute slice.
- Geometry header identification field geo_header_flag: a value of 1 indicates that the sub-sample is a geometry header parameter set; a value of 0 indicates that it is not.
- Attribute header identification field attr_header_flag: a value of 1 indicates that the sub-sample is an attribute header parameter set; a value of 0 indicates that it is not.
- Geometry slice identification field geo_slice_flag: a value of 1 indicates that the sub-sample is a point cloud geometry slice; a value of 0 indicates that it is not.
- Attribute slice identification field attr_slice_flag: a value of 1 indicates that the sub-sample is a point cloud attribute slice; a value of 0 indicates that it is not.
- The four flags geo_header_flag, attr_header_flag, geo_slice_flag, and attr_slice_flag cannot all be 0 at the same time.
- The attribute type field attr_type indicates the type of the point cloud attribute in the point cloud attribute slice.
- A value of 0 indicates that the point cloud attribute slice contains only color attributes; a value of 1 indicates that it contains only reflectance attributes; a value of 2 indicates that it contains both color attributes and reflectance attributes.
- Step S1502 The server transmits the point cloud media file F1 to the client.
- Step S1503 The client parses the media file data box of the point cloud media file F1 and obtains the sub-sample division information contained in the point cloud sample.
- Step S1504 The client selectively decodes and consumes the point cloud samples in the point cloud media file F1 according to the sub-sample division information.
- the data type contained in the compression unit corresponding to the subsample can be determined based on the geometry header identification field geo_header_flag, attribute header identification field attr_header_flag, geometric slice identification field geo_slice_flag, attribute slice identification field attr_slice_flag, attribute type field attr_type and other information in the media file data box.
- client 1 can partially decode geometric data and color attribute data for consumption.
- Client 2 can completely decode geometric data, color attribute data, and reflectivity attribute data for consumption.
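The selective decoding in step S1504 can be sketched as a filter over the sub-sample division information. The flag names and the `attr_type` values (0 color, 1 reflectance, 2 both) come from the text; the `SubSampleInfo` record and the `should_decode` helper are hypothetical, and how a client treats a mixed color-and-reflectance slice is an assumption left to the caller through the set of accepted `attr_type` values.

```python
from dataclasses import dataclass

COLOR, REFLECTANCE, COLOR_AND_REFLECTANCE = 0, 1, 2  # attr_type values from the text

@dataclass
class SubSampleInfo:
    geo_header_flag: int
    attr_header_flag: int
    geo_slice_flag: int
    attr_slice_flag: int
    attr_type: int = 0  # only meaningful when attr_slice_flag == 1

def should_decode(info, accepted_attr_types):
    """Return True if a client accepting `accepted_attr_types` should decode this sub-sample."""
    if info.geo_header_flag or info.geo_slice_flag:
        return True  # geometry is needed by every client in this sketch
    if info.attr_header_flag:
        return bool(accepted_attr_types)  # needed as soon as any attribute is decoded
    if info.attr_slice_flag:
        return info.attr_type in accepted_attr_types
    return False

SUBSAMPLES = [
    SubSampleInfo(1, 0, 0, 0),
    SubSampleInfo(0, 1, 0, 0),
    SubSampleInfo(0, 0, 1, 0),
    SubSampleInfo(0, 0, 0, 1, COLOR),
    SubSampleInfo(0, 0, 0, 1, REFLECTANCE),
]

client1 = [should_decode(s, {COLOR}) for s in SUBSAMPLES]
client2 = [should_decode(s, {COLOR, REFLECTANCE, COLOR_AND_REFLECTANCE}) for s in SUBSAMPLES]
```

In this example client 1 skips only the reflectance slice, while client 2 decodes every sub-sample.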
- Embodiments of the present application can define, in the media file data box, the component information of track samples and sub-samples in single-track encapsulation mode, the component information of track samples and sub-samples in multi-track encapsulation mode, and the component indication information of tracks in multi-track encapsulation mode.
- This enables the client to transmit, decapsulate, and decode only the required point cloud data according to the component type, achieving partial access and partial transmission, improving the transmission efficiency and decoding efficiency of point cloud data, and saving bandwidth and computing resources as far as possible.
- Figure 16 schematically shows a structural block diagram of a point cloud media decoding device provided by an embodiment of the present application.
- the point cloud media decoding device 1600 may include:
- the acquisition module 1610 is used to acquire point cloud media files, which include point cloud samples encapsulated in one or more tracks;
- the decapsulation module 1620 is used to decapsulate the point cloud sample to obtain at least one compression unit;
- the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit, and the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice.
- The geometry header is used to indicate the parameter set of geometry information.
- The attribute header is used to indicate the parameter set of attribute information.
- The geometry slice is point cloud slice data used to indicate geometry information.
- The attribute slice is point cloud slice data used to indicate attribute information;
- the decoding module 1630 is used to select the target compression unit according to the type field, and decode the target compression unit to obtain point cloud data.
- Figure 17 schematically shows a structural block diagram of a point cloud media encoding device provided by an embodiment of the present application.
- the point cloud media encoding device 1700 may include:
- the acquisition module 1710 is used to acquire point cloud source data, where the point cloud source data includes multiple point cloud frames;
- the encoding module 1720 is used to encode the point cloud frames to obtain at least one compression unit;
- the encapsulating module 1730 is used to encapsulate the at least one compression unit to obtain a point cloud media file.
- The point cloud media file includes point cloud samples encapsulated in one or more tracks; the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit.
- The type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice.
- The geometry header is used to indicate the parameter set of geometry information, and the attribute header is used to indicate the parameter set of attribute information.
- The geometry slice is point cloud slice data used to indicate geometry information.
- The attribute slice is point cloud slice data used to indicate attribute information.
- Figure 18 schematically shows a system structural block diagram of an electronic device used to implement an embodiment of the present application.
- The system 1800 includes a central processing unit (CPU) 1801, which can perform various appropriate actions and processing according to a program stored in the read-only memory (ROM) 1802 or a program loaded from the storage part 1808 into the random access memory (RAM) 1803.
- The random access memory 1803 also stores various programs and data required for system operation.
- The central processing unit 1801, the read-only memory 1802, and the random access memory 1803 are connected to each other through a bus 1804.
- The input/output interface 1805 (I/O interface) is also connected to the bus 1804.
- The following components are connected to the input/output interface 1805: an input part 1806 including a keyboard, a mouse, etc.; an output part 1807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, etc.; a storage part 1808 including a hard disk, etc.; and a communication part 1809 including a network interface card such as a LAN card, a modem, etc. The communication part 1809 performs communication processing via a network such as the Internet.
- A drive 1810 is also connected to the input/output interface 1805 as needed.
- Removable media 1811 such as magnetic disks, optical disks, magneto-optical disks, semiconductor memories, etc., are installed on the drive 1810 as needed so that computer readable instructions read therefrom are installed into the storage portion 1808 as needed.
- the processes described in the respective method flow charts may be implemented as computer software programs.
- embodiments of the present application include a computer program product including computer-readable instructions carried on a computer-readable medium, the computer-readable instructions containing program code for performing the method illustrated in the flowchart.
- the computer-readable instructions may be downloaded and installed from the network via communications portion 1809 and/or installed from removable media 1811 .
- When the computer-readable instructions are executed by the central processing unit 1801, the various functions defined in the system of the present application are performed.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
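A method for decoding point cloud media, comprising: acquiring a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks; decapsulating the point cloud samples to obtain at least one compression unit, wherein the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit, the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header is used to indicate a parameter set of geometry information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometry information, and the attribute slice is point cloud slice data used to indicate attribute information; and selecting a target compression unit according to the type field and decoding the target compression unit to obtain point cloud data.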
Description
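This application claims priority to the Chinese patent application No. 202210243282.2, entitled "Encoding and Decoding Method for Point Cloud Media and Related Products", filed with the China Patent Office on March 11, 2022, the entire contents of which are incorporated herein by reference.
This application belongs to the field of audio and video technology, and specifically relates to a point cloud media encoding method, a point cloud media decoding method, a point cloud media encoding apparatus, a point cloud media decoding apparatus, a computer-readable medium, an electronic device, and a computer program product.
A point cloud is a set of irregularly distributed discrete points in space that expresses the spatial structure and surface attributes of a three-dimensional object or scene. After large-scale point cloud data is acquired by a point cloud acquisition device, the point cloud data can be encoded and encapsulated for transmission and presentation to users. During encoding, transmission, decoding, and consumption, point cloud media commonly suffers from large transmission volumes and data redundancy, so improving the encoding and decoding flexibility of point cloud media is a problem that urgently needs to be solved.
Summary of the Invention
A method for decoding point cloud media, the method comprising:
acquiring a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks;
decapsulating the point cloud samples to obtain at least one compression unit, wherein the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit, the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header is used to indicate a parameter set of geometry information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometry information, and the attribute slice is point cloud slice data used to indicate attribute information;
selecting a target compression unit according to the type field, and decoding the target compression unit to obtain point cloud data.
A method for encoding point cloud media, the method comprising:
acquiring point cloud source data, the point cloud source data including multiple point cloud frames;
encoding the point cloud frames to obtain at least one compression unit;
encapsulating the at least one compression unit to obtain a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks, wherein the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit, the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header is used to indicate a parameter set of geometry information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometry information, and the attribute slice is point cloud slice data used to indicate attribute information.
An apparatus for decoding point cloud media, the apparatus comprising:
an acquisition module, used to acquire a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks;
a decapsulation module, used to decapsulate the point cloud samples to obtain at least one compression unit, wherein the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit, the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header is used to indicate a parameter set of geometry information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometry information, and the attribute slice is point cloud slice data used to indicate attribute information;
a decoding module, used to select a target compression unit according to the type field and decode the target compression unit to obtain point cloud data.
An apparatus for encoding point cloud media, the apparatus comprising:
an acquisition module, used to acquire point cloud source data, the point cloud source data including multiple point cloud frames;
an encoding module, used to encode the point cloud frames to obtain at least one compression unit;
an encapsulation module, used to encapsulate the at least one compression unit to obtain a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks, wherein the media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit, the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header is used to indicate a parameter set of geometry information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometry information, and the attribute slice is point cloud slice data used to indicate attribute information.
A computer-readable medium storing computer-readable instructions which, when executed by a processor, implement the point cloud media encoding and decoding methods of the above technical solutions.
An electronic device, comprising: a processor; and a memory used to store computer-readable instructions of the processor; wherein the processor is configured to perform the point cloud media encoding and decoding methods of the above technical solutions by executing the computer-readable instructions.
A computer program product or computer program, comprising computer-readable instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, causing the computer device to perform the point cloud media encoding and decoding methods of the above technical solutions.
The accompanying drawings are incorporated into and constitute a part of this specification, illustrate embodiments consistent with this application, and together with the specification serve to explain the principles of this application. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.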
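Figure 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of embodiments of this application can be applied;
Figure 2 shows a schematic flow diagram of point cloud media encoding and decoding in an application scenario according to an embodiment of this application;
Figure 3 shows a flowchart of the steps of the point cloud media decoding method in an embodiment of this application;
Figure 4 shows the syntax structure for encapsulating point cloud samples based on the TLV (Type-Length-Value) format in an embodiment of this application;
Figure 5 shows an exemplary structure for encapsulating the geometry code stream and attribute code stream in a single track in an embodiment of this application;
Figure 6 shows an exemplary structure for encapsulating the geometry code stream and attribute code stream in multiple tracks in an embodiment of this application;
Figure 7 shows the syntax structure of a compression unit encapsulated based on the TLV format in an embodiment of this application;
Figure 8 shows the syntax structure for encapsulating point cloud samples based on the G-PCC compression mode in an embodiment of this application;
Figure 9 shows the point cloud sample syntax structure that provides specific indication information for point cloud slices in an embodiment of this application;
Figure 10 shows the syntax structure corresponding to the codec-specific parameters in the media file data box of a sub-sample in an embodiment of this application;
Figure 11 shows the syntax structure of the metadata information indicating the number of attributes in multi-track encapsulation mode in an embodiment of this application;
Figure 12 shows the syntax structure of the metadata information extending the attribute type in multi-track encapsulation mode in an embodiment of this application;
Figure 13 shows a flowchart of the steps of the point cloud media encoding method in an embodiment of this application;
Figure 14 shows a flowchart of point cloud data encoding and decoding in a multi-track encapsulated streaming media transmission application scenario according to an embodiment of this application;
Figure 15 shows a flowchart of point cloud data encoding and decoding in the application scenario of single-track encapsulated local point cloud media playback according to an embodiment of this application;
Figure 16 schematically shows a structural block diagram of the point cloud decoding apparatus provided by an embodiment of this application;
Figure 17 schematically shows a structural block diagram of the point cloud encoding apparatus provided by an embodiment of this application;
Figure 18 schematically shows a system structural block diagram of an electronic device suitable for implementing an embodiment of this application.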
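Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete and will fully convey the concepts of the example embodiments to those skilled in the art.
In addition, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of this application. However, those skilled in the art will appreciate that the technical solutions of this application may be practiced without one or more of the specific details, or with other methods, components, apparatuses, steps, and so on. In other cases, well-known methods, apparatuses, implementations, or operations are not shown or described in detail to avoid obscuring aspects of this application.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically independent entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are merely illustrative; they need not include all contents and operations/steps, nor must they be performed in the order described. For example, some operations/steps can be split and others can be merged or partially merged, so the actual order of execution may change according to the actual situation.
"Multiple" as used herein means two or more. "And/or" describes the relationship between associated objects and indicates that three relationships are possible; for example, A and/or B can mean that A exists alone, that A and B exist together, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
The specific implementations of this application involve user-related data such as the transmitted, decoded, and consumed content of point cloud media. When the embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The relevant terms and abbreviations used in the embodiments of this application are explained as follows.
Immersive media: media content that can provide consumers with an immersive experience. According to the degrees of freedom the user has when consuming the media content, immersive media can be divided into 3DoF media, 3DoF+ media, and 6DoF media. Point cloud media is a typical kind of 6DoF media.
DoF: Degree of Freedom. In this application, it refers to the degrees of freedom of movement and content interaction supported when a user views immersive media.
3DoF: three degrees of freedom, namely the three degrees of freedom of the user's head rotating about the x, y, and z axes.
3DoF+: on top of three degrees of freedom, the user also has the freedom of limited movement along the x, y, and z axes.
6DoF: on top of three degrees of freedom, the user also has the freedom of unrestricted movement along the x, y, and z axes.
Point cloud: a set of irregularly distributed discrete points in space that expresses the spatial structure and surface attributes of a three-dimensional object or scene. Each point in a point cloud has at least three-dimensional position information and, depending on the application scenario, may also have color, material, or other information. Usually, every point in a point cloud has the same number of additional attributes.
PCC: Point Cloud Compression.
G-PCC: Geometry-based Point Cloud Compression, point cloud compression based on a geometric model.
Sample: the encapsulation unit in the media file encapsulation process. A media file consists of many samples. Taking video media as an example, one sample of video media is usually one video frame.
DASH: dynamic adaptive streaming over HTTP, an adaptive bitrate streaming technology that enables high-quality streaming media to be delivered over the Internet through conventional HTTP web servers.
MPD: media presentation description, the media presentation description signaling in DASH, used to describe media segment information.
Representation: in DASH, a combination of one or more media components; for example, a video file of a certain resolution can be regarded as a Representation.
Adaptation Sets: in DASH, a collection of one or more video streams; one Adaptation Set can contain multiple Representations.
Media Segment: a playable segment that conforms to a certain media format. During playback it may need to be used together with zero or more preceding segments and an initialization segment.
In terms of encoding method, point cloud media can be divided into point cloud media compressed with a conventional video coding approach (Video-based Point Cloud Compression, VPCC) and point cloud media compressed based on geometric features (Geometry-based Point Cloud Compression, GPCC). In the file encapsulation of point cloud media, the three-dimensional position information is usually called the geometry component of the point cloud media file, and the attribute information is called the attribute component of the point cloud media file. A point cloud media file has only one geometry component but can have one or more attribute components.
Point clouds can flexibly and conveniently express the spatial structure and surface attributes of three-dimensional objects or scenes and are therefore widely used. Their main application scenarios fall into two categories. 1) Machine-perceived point clouds, such as computer aided design (CAD), autonomous navigation systems (ANS), real-time inspection systems, geographic information systems (GIS), visual sorting robots, and rescue and disaster relief robots. 2) Human-perceived point clouds, such as virtual reality (VR) games, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive communication, and three-dimensional immersive interaction.
Point clouds are mainly acquired through computer generation, 3D laser scanning, 3D photogrammetry, and similar means. Computers can generate point clouds of virtual three-dimensional objects and scenes. 3D scanning can obtain point clouds of static real-world three-dimensional objects or scenes, at a rate of millions of points per second. 3D photography can obtain point clouds of dynamic real-world three-dimensional objects or scenes, at a rate of tens of millions of points per second. In addition, in the medical field, point clouds of biological tissues and organs can be obtained from MRI, CT, and electromagnetic positioning information. These technologies reduce the cost and time of point cloud data acquisition and improve the accuracy of the data. The change in how point cloud data is acquired makes it possible to obtain large amounts of point cloud data. As large-scale point cloud data continues to accumulate, efficient storage, transmission, publication, sharing, and standardization of point cloud data have become key to point cloud applications.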
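Figure 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of embodiments of this application can be applied.
As shown in Figure 1, the system architecture 100 includes multiple terminals that can communicate with each other through, for example, a network 150. For example, the system architecture 100 may include a first terminal 110 and a second terminal 120 interconnected through the network 150. In the embodiment of Figure 1, the first terminal 110 and the second terminal 120 perform unidirectional data transmission.
For example, the first terminal 110 can encode point cloud data (such as point cloud data captured by the terminal 110) for transmission to the second terminal 120 through the network 150, with the encoded point cloud data transmitted in the form of one or more encoded point cloud code streams. The second terminal 120 can receive the encoded point cloud data from the network 150, decode it to recover the point cloud data, and display point cloud content according to the recovered point cloud data.
In an embodiment of this application, the system architecture 100 may include a third terminal 130 and a fourth terminal 140 that perform bidirectional transmission of encoded point cloud data, which may occur, for example, during a video conference. For bidirectional data transmission, each of the third terminal 130 and the fourth terminal 140 can encode point cloud data (such as point cloud data captured by the terminal) for transmission through the network 150 to the other of the two terminals. Each of the third terminal 130 and the fourth terminal 140 can also receive the encoded point cloud data transmitted by the other terminal, decode it to recover the point cloud data, and display point cloud content on an accessible display apparatus according to the recovered point cloud data.
In the embodiment of Figure 1, the first terminal 110, the second terminal 120, the third terminal 130, and the fourth terminal 140 may be servers, personal computers, and smartphones, but the principles disclosed in this application are not limited thereto. The embodiments disclosed in this application are applicable to laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. The network 150 represents any number of networks that convey encoded point cloud data among the first terminal 110, the second terminal 120, the third terminal 130, and the fourth terminal 140, including, for example, wired and/or wireless communication networks. The communication network 150 can exchange data in circuit-switched and/or packet-switched channels. The network may include telecommunication networks, local area networks, wide area networks, and/or the Internet. For the purposes of this application, the architecture and topology of the network 150 may be immaterial to the operations disclosed in this application unless explained below.
The server in the embodiments of this application may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The terminal may be, but is not limited to, a smartphone, tablet computer, laptop computer, desktop computer, smart speaker, smart watch, in-vehicle terminal, smart TV, and so on. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
After the point cloud media is encoded, the encoded data stream needs to be encapsulated and transmitted to the user. Correspondingly, on the point cloud media player side, the point cloud file first needs to be decapsulated and then decoded, and finally the decoded data stream is presented.
Figure 2 shows a schematic flow diagram of point cloud media encoding and decoding in an application scenario according to an embodiment of this application.
A real-world visual scene A can be captured by acquiring point cloud data with an acquisition device 210, which may be, for example, a set of cameras or a camera device with multiple lenses and sensors. The acquisition result is point cloud source data B, which is a frame sequence composed of a large number of point cloud frames. The encoder 220 can encode one or more point cloud frames to obtain an encoded G-PCC bitstream, which may specifically include an encoded geometry bitstream and attribute bitstream E. The file encapsulator 230 can encapsulate one or more encoded bitstreams according to a specific media container file format to obtain a media file F for file playback, or a series of initialization segments and media segments Fs for streaming. In some embodiments of this application, the media container file format may be, for example, the ISO Base Media File Format specified in ISO/IEC 14496-12 [ISOBMFF]. The file encapsulator 230 can also encapsulate metadata in the media file F or the media segments Fs.
The media file F output by the file encapsulator 230 is identical to the media file F' input to the file decapsulator 240. By processing the media file F' or the received media segments F's, the file decapsulator can extract the encoded bitstream E' and parse the metadata. The decoder 250 can decode the G-PCC bitstream into a decoded signal D' and generate point cloud data from the decoded signal D'. Where applicable, based on the current viewing position, viewing direction, or viewport determined by various types of sensors (for example, head-mounted sensors), the renderer 260 can render the point cloud data and display it on a head-mounted display or the screen of any other display device. Besides being used by the player to access the appropriate part of the decoded point cloud data, the current viewing position or viewing direction can also be used for decoding optimization. In the viewport-dependent content delivery 270, the current viewing position and viewing direction are also passed to a strategy module, which can be used to determine the tracks to be received.
In point cloud media transmission, streaming technology is usually used to handle the transfer of media resources between the server and the client. Common media streaming technologies include DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), and SMT (Smart Media Transport).
Taking DASH as an example, DASH is an adaptive bitrate streaming technology that enables high-quality streaming media to be delivered over the Internet through conventional HTTP web servers. DASH breaks the content into a series of small HTTP-based file segments, each containing a short interval of playable content, while the total length of the content may be several hours (for example, a movie or a live sports broadcast). The content is made available as alternative segments at multiple bitrates. When the media content is played by a DASH client, the client automatically selects which alternative to download and play according to the current network conditions. The client selects the highest-bitrate segment that can be downloaded in time for playback, thereby avoiding stalls or rebuffering events. As a result, a DASH client can adapt seamlessly to changing network conditions and provide a high-quality playback experience with fewer stalls and rebuffering events.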
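The bitrate choice described for DASH clients can be sketched as below. This is a deliberately simplified illustration, not the adaptation logic of any particular player: the `pick_bitrate` helper and its `safety` margin are assumptions introduced here, and real clients usually also consider buffer level and throughput history.

```python
def pick_bitrate(available_bps, throughput_bps, safety=0.8):
    """Pick the highest advertised bitrate that fits within a fraction of measured throughput.

    Falls back to the lowest bitrate when nothing fits, so playback can continue.
    """
    budget = throughput_bps * safety
    fitting = [b for b in available_bps if b <= budget]
    return max(fitting) if fitting else min(available_bps)

LADDER = [500_000, 1_000_000, 3_000_000]  # example bitrates in bits per second
```

For instance, with a measured throughput of 2 Mbit/s the sketch selects the 1 Mbit/s alternative, since 3 Mbit/s would exceed the 1.6 Mbit/s budget.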
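DASH uses existing HTTP web server infrastructure. It allows devices such as Internet TVs, TV set-top boxes, desktop computers, smartphones, and tablets to consume multimedia content (such as video, TV, and radio) delivered over the Internet, and can cope with varying Internet reception conditions.
The technical solutions provided by this application, including the point cloud media encoding method, the point cloud media decoding method, the point cloud media encoding apparatus, the point cloud media decoding apparatus, the computer-readable storage medium, the electronic device, and the computer program product, are described in detail below with reference to specific implementations. The technical solutions of the embodiments of this application can be applied to the server, the player, and intermediate nodes of an immersive media system.
Figure 3 shows a flowchart of the steps of the point cloud media decoding method in an embodiment of this application. The method can be applied to the server, the terminal (or a client running on the terminal), and intermediate nodes of a point cloud media system. The embodiments of this application take as an example an electronic device, on which a point cloud media decoding apparatus is installed, performing the point cloud media decoding method. As shown in Figure 3, the point cloud media decoding method includes the following steps S310 to S330.
In step S310, a point cloud media file is acquired; the point cloud media file includes point cloud samples encapsulated in one or more tracks.
The point cloud media file may be a media file or media segment obtained after the encoding and encapsulation processing shown in Figure 2; the media file or media segment carries the point cloud code stream to be transmitted.
In an embodiment of this application, the data source can encapsulate the point cloud code stream into a single track based on the geometry parameter information, attribute parameter information, and point cloud slice parameter information contained in the point cloud code stream, or it can re-encapsulate a single-track point cloud media file into a point cloud media file containing multiple tracks. The data source is an electronic device that produces point cloud media files, such as a server. If the electronic device performing the point cloud media decoding method is a terminal, the terminal can obtain the point cloud media file from the server; if the electronic device performing the point cloud media decoding method is the server itself, the electronic device can obtain the point cloud media file directly.
A track is a volumetric visual track used to carry the encoded geometry bitstream or the encoded attribute bitstream, or a volumetric visual track that carries both the encoded geometry bitstream and the encoded attribute bitstream.
When the point cloud code stream is encapsulated in a single track, each point cloud sample can correspond to one complete point cloud frame.
In step S320, the point cloud samples are decapsulated to obtain at least one compression unit. The media file data box of the point cloud sample includes a type field used to indicate the type of the compression unit; the type of the compression unit includes any one of a geometry header, an attribute header, a geometry slice, and an attribute slice; the geometry header is used to indicate a parameter set of geometry information, the attribute header is used to indicate a parameter set of attribute information, the geometry slice is point cloud slice data used to indicate geometry information, and the attribute slice is point cloud slice data used to indicate attribute information.
The media file data box may be a data box based on the ISO Base Media File Format (ISOBMFF). For details of ISOBMFF, refer to the standard ISO/IEC 14496-12.
When the G-PCC code stream is carried in a single track, simple ISOBMFF encapsulation can be used by storing the G-PCC code stream in the single track without further processing.
Figure 4 shows the syntax structure for encapsulating point cloud samples based on the TLV format in an embodiment of this application. Each point cloud sample consists of one or more compression units (G-PCC units).
Figure 5 shows an exemplary structure for encapsulating the geometry code stream and attribute code stream in a single track in an embodiment of this application. As shown in Figure 5, in single-track encapsulation mode, a point cloud sample can include one or more compression units encapsulated in the TLV format, for example the parameter set unit Parameter Set TLV, the geometry information unit Geometry TLV, and the attribute information unit Attribute TLV in Figure 5.
The TLV code stream format, namely the Type-length-value bytestream format, refers to a structure composed of the data type (Type), the data length (Length), and the data value (Value). For details of the TLV code stream format, refer to the standard ISO/IEC 23090-9.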
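A sample made of TLV-encapsulated compression units can be walked as sketched below. The field widths are an assumption made for this illustration (an 8-bit type followed by a 32-bit big-endian payload length); the authoritative layout is the one in ISO/IEC 23090-9 and in Figure 7, and the helper names `encode_tlv` and `iter_tlv` are hypothetical.

```python
import struct

TLV_HEADER = struct.Struct(">BI")  # assumed layout: 1-byte type, 4-byte big-endian length

def encode_tlv(tlv_type, payload):
    """Serialize one compression unit as type, length, value."""
    return TLV_HEADER.pack(tlv_type, len(payload)) + payload

def iter_tlv(buf):
    """Yield (tlv_type, payload) for each unit in a buffer of concatenated TLV units."""
    offset = 0
    while offset < len(buf):
        if offset + TLV_HEADER.size > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, length = TLV_HEADER.unpack_from(buf, offset)
        offset += TLV_HEADER.size
        if offset + length > len(buf):
            raise ValueError("truncated TLV payload")
        yield tlv_type, buf[offset:offset + length]
        offset += length

# A toy sample holding a geometry parameter set (1) and a geometry data unit (2).
sample = encode_tlv(1, b"gps") + encode_tlv(2, b"geometry-bytes")
units = list(iter_tlv(sample))
```

Because the length precedes the value, a reader can skip any unit whose type it does not need without parsing its payload.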
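Figure 6 shows an exemplary structure for encapsulating the geometry code stream and attribute code stream in multiple tracks in an embodiment of this application. Here, ftyp represents the file type and describes the version of the specification that the point cloud sample complies with; moov represents the metadata information of the point cloud sample; mdat represents the specific media data carried in the point cloud sample.
As shown in Figure 6, in multi-track encapsulation mode, the code stream data of each point cloud component is mapped to a separate track. There are two types of G-PCC component tracks: G-PCC geometry tracks and G-PCC attribute tracks. Each point cloud sample in a track contains at least one compression unit (G-PCC unit), which carries a single G-PCC component data unit rather than a multiplexing of geometry and attribute data units or of different attribute data units. G-PCC attribute tracks should not multiplex different attribute substreams, such as color and reflectance.
Figure 7 shows the syntax structure of a compression unit encapsulated based on the TLV format in an embodiment of this application. Here, tlv_type is the type field used to indicate the type of the compression unit. Table 1 gives the semantic descriptions of the different values of the compression unit type field in an embodiment of this application.
Table 1
tlv_type | Description |
0 | Sequence parameter set |
1 | Geometry parameter set |
2 | Geometry data unit |
3 | Attribute parameter set |
4 | Attribute data unit |
5 | Tile inventory |
6 | Frame boundary marker |
7 | Geometry header |
8 | Attribute header |
9 | Geometry slice |
10 | Attribute slice |
As shown in Table 1, different values of the type field can be used to indicate different compression unit types.
When the type field is 0, the type of the compression unit is the sequence parameter set SPS (Sequence Parameter Set).
When the type field is 1, the type of the compression unit is the geometry parameter set GPS (Geometry Parameter Set).
When the type field is 2, the type of the compression unit is the geometry data unit.
When the type field is 3, the type of the compression unit is the attribute parameter set APS (Attribute Parameter Set).
When the type field is 4, the type of the compression unit is the attribute data unit.
When the type field is 5, the type of the compression unit is the tile inventory.
When the type field is 6, the type of the compression unit is the frame boundary marker.
For details of the above compression unit types, refer to the standard ISO/IEC 23090-9.
To provide more specific indication information for point cloud components, so that point cloud media files can be transmitted and decoded more flexibly according to actual point cloud consumption needs, the embodiments of this application also provide the following semantic descriptions of the compression unit type field.
When the type field is 7, the type of the compression unit is the geometry header, which is used to indicate a parameter set of geometry information.
When the type field is 8, the type of the compression unit is the attribute header, which is used to indicate a parameter set of attribute information.
When the type field is 9, the type of the compression unit is the geometry slice, which is used to indicate point cloud slice data of geometry information.
When the type field is 10, the type of the compression unit is the attribute slice, which is used to indicate point cloud slice data of attribute information.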
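The values in Table 1 can be captured in an enumeration, as in the following sketch. The numeric values are those listed in Table 1; the Python names `TlvType` and `is_geometry_related` are illustrative only and are not identifiers from the application or from ISO/IEC 23090-9.

```python
from enum import IntEnum

class TlvType(IntEnum):
    """tlv_type values as listed in Table 1."""
    SEQUENCE_PARAMETER_SET = 0
    GEOMETRY_PARAMETER_SET = 1
    GEOMETRY_DATA_UNIT = 2
    ATTRIBUTE_PARAMETER_SET = 3
    ATTRIBUTE_DATA_UNIT = 4
    TILE_INVENTORY = 5
    FRAME_BOUNDARY_MARKER = 6
    GEOMETRY_HEADER = 7
    ATTRIBUTE_HEADER = 8
    GEOMETRY_SLICE = 9
    ATTRIBUTE_SLICE = 10

def is_geometry_related(value):
    """Return True for the additional unit types (7 and 9) that carry geometry information."""
    return TlvType(value) in (TlvType.GEOMETRY_HEADER, TlvType.GEOMETRY_SLICE)
```

Constructing `TlvType` from an integer outside 0 to 10 raises `ValueError`, which gives a parser a simple way to reject unknown unit types.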
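In the embodiments of this application, the compression unit (G-PCC unit) contains any one of a geometry header, an attribute header, a geometry slice, and an attribute slice. The compression units in the same point cloud sample correspond to the same point cloud frame and have the same presentation time.
In step S330, a target compression unit is selected according to the type field, and the target compression unit is decoded to obtain point cloud data.
By providing the type field of the compression unit in the media file data box, the electronic device can use different field values to indicate that the compression unit to be decoded is a geometry header, an attribute header, a geometry slice, or an attribute slice. Part of the file content can therefore be decoded selectively according to the consumption needs of the point cloud media, without decoding the entire file content. This improves the flexibility of point cloud data consumption, can significantly improve the decoding efficiency of point cloud data, and reduces the consumption of computing resources.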
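Step S330 can be sketched as a filter applied before decoding. The type values 7 to 10 are those given for the geometry header, attribute header, geometry slice, and attribute slice; the `select_units` and `decode_selected` helpers and the stand-in `decode` callback are hypothetical, since the actual decoder is outside the scope of this description.

```python
GEOMETRY_HEADER, ATTRIBUTE_HEADER, GEOMETRY_SLICE, ATTRIBUTE_SLICE = 7, 8, 9, 10

def select_units(units, wanted_types):
    """Keep only the (type, payload) units whose type field is in `wanted_types`."""
    return [(t, payload) for t, payload in units if t in wanted_types]

def decode_selected(units, wanted_types, decode):
    """Decode the target compression units only; everything else is skipped untouched."""
    return [decode(t, payload) for t, payload in select_units(units, wanted_types)]

units = [
    (GEOMETRY_HEADER, b"gh"),
    (ATTRIBUTE_HEADER, b"ah"),
    (GEOMETRY_SLICE, b"gs"),
    (ATTRIBUTE_SLICE, b"as"),
]

# A geometry-only consumer decodes two of the four units.
geometry_only = decode_selected(
    units, {GEOMETRY_HEADER, GEOMETRY_SLICE}, lambda t, p: (t, len(p))
)
```

The skipped units are never passed to the decoder, which is where the savings in computation come from.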
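In an embodiment of this application, the media file data box of the point cloud sample may further include:
a component header count field num_component_headers, used to indicate the number of geometry header parameter sets and attribute header parameter sets included in the point cloud sample;
a slice count field num_slices, used to indicate the number of point cloud slices included in the point cloud sample, that is, the number of geometry slices and attribute slices.
In an embodiment of this application, when the component header count field is 0, the geometry header parameter sets and attribute header parameter sets are given as decoder configuration information.
Figure 8 shows the syntax structure for encapsulating point cloud samples based on the G-PCC compression mode in an embodiment of this application.
The component header count field num_component_headers indicates the number of geometry header and attribute header parameter sets contained in the current point cloud frame. If the field is 0, the corresponding geometry header and attribute header parameter sets are given in the decoder configuration information.
The slice count field num_slices indicates the number of point cloud slices contained in the current point cloud frame.
For the component header part, fields carrying the following indication information can be provided.
The header type field header_type indicates whether the parameter set is a geometry header or an attribute header. A value of 1 indicates a geometry header parameter set; a value of 2 indicates an attribute header parameter set.
The header length field header_length indicates the length of the parameter set.
The header data field header indicates the data in the parameter set. Parsing of this field follows the definition of the parameter set in the corresponding coding standard.
For the point cloud slice part, fields carrying the following indication information can be provided.
The slice type field slice_type indicates the type of the point cloud slice, which may specifically include point cloud geometry slices and point cloud attribute slices corresponding to different attribute information.
The slice length field slice_length indicates the length of the point cloud slice. A point cloud slice contains the corresponding point cloud slice header and data information.
The slice data field slice indicates the data in the point cloud slice. Parsing of this field follows the definitions of the point cloud slice header and data information in the corresponding coding standard.
In an embodiment of this application, when the slice type field takes a first value, the point cloud slice is a point cloud geometry slice; when it takes a second value, the point cloud slice is a point cloud color attribute slice; when it takes a third value, the point cloud slice is a point cloud reflectance attribute slice; when it takes a fourth value, the point cloud slice is a point cloud mixed attribute slice that includes both color and reflectance attributes.
For example, a slice type field of 0 indicates a point cloud geometry slice; 1 indicates a point cloud color attribute slice; 2 indicates a point cloud reflectance attribute slice; 3 indicates a point cloud mixed attribute slice (that is, a point cloud attribute slice containing both color and reflectance).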
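The sample fields described above can be exercised with a small round-trip sketch. The field names and the `header_type` and `slice_type` values come from the text; the field widths (8-bit counts and types, 32-bit big-endian lengths) and the field order are assumptions made for this illustration, since the actual layout is the one defined by Figure 8.

```python
import struct

HEADER_TYPES = {1: "geometry_header", 2: "attribute_header"}
SLICE_TYPES = {0: "geometry", 1: "color", 2: "reflectance", 3: "color_and_reflectance"}

def _read_items(buf, offset, count):
    """Read `count` records of (type: u8, length: u32, data) starting at `offset`."""
    items = []
    for _ in range(count):
        item_type, length = struct.unpack_from(">BI", buf, offset)
        offset += 5
        data = buf[offset:offset + length]
        if len(data) != length:
            raise ValueError("record payload is truncated")
        items.append((item_type, data))
        offset += length
    return items, offset

def parse_sample(buf):
    """Parse num_component_headers, num_slices, then the header and slice records."""
    num_component_headers, num_slices = struct.unpack_from(">BB", buf, 0)
    headers, offset = _read_items(buf, 2, num_component_headers)
    slices, _ = _read_items(buf, offset, num_slices)
    return {
        "headers": [(HEADER_TYPES[t], d) for t, d in headers],
        "slices": [(SLICE_TYPES[t], d) for t, d in slices],
    }

def build_sample(headers, slices):
    """Inverse of parse_sample, used here only to produce test input."""
    out = struct.pack(">BB", len(headers), len(slices))
    for item_type, data in list(headers) + list(slices):
        out += struct.pack(">BI", item_type, len(data)) + data
    return out

parsed = parse_sample(build_sample([(1, b"gh"), (2, b"ah")], [(0, b"geo"), (1, b"col")]))
```

A sample built with no headers corresponds to the case where `num_component_headers` is 0 and the parameter sets are taken from the decoder configuration information instead.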
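By providing count fields and indication fields related to component headers or point cloud slices, the electronic device can indicate the component data of a point cloud sample through field values. Component headers or point cloud slices can therefore be partially and selectively decoded, further improving the decoding flexibility and decoding efficiency of point cloud media and further reducing the cost of computing resources.
In an embodiment of this application, a point cloud slice includes a slice header and data information; the media file data box of the point cloud sample further includes:
a geometry slice header length field, used to indicate the length of the slice header when the point cloud slice is a geometry slice;
a geometry slice data length field, used to indicate the length of the data information when the point cloud slice is a geometry slice;
an attribute slice header length field, used to indicate the length of the slice header when the point cloud slice is an attribute slice;
an attribute slice data length field, used to indicate the length of the data information when the point cloud slice is an attribute slice;
a geometry slice header field, used to indicate the point cloud slice header when the point cloud slice is a geometry slice;
a geometry slice data field, used to indicate the data information when the point cloud slice is a geometry slice;
an attribute slice header field, used to indicate the point cloud slice header when the point cloud slice is an attribute slice;
an attribute slice data field, used to indicate the data information when the point cloud slice is an attribute slice.
Figure 9 shows the point cloud sample syntax structure that provides specific indication information for point cloud slices in an embodiment of this application.
When the point cloud slice is a geometry slice, the geometry slice header length field geo_slice_header_length and the geometry slice data length field geo_slice_data_length shown in Figure 9 can be used to indicate the length of the geometry slice header and the length of the geometry information, respectively, and the geometry slice header field geo_slice_header and the geometry slice data field geo_slice_data shown in Figure 9 can be used to indicate the geometry slice header and the data of the geometry information, respectively.
When the point cloud slice is an attribute slice, the attribute slice header length field attr_slice_header_length and the attribute slice data length field attr_slice_data_length shown in Figure 9 can be used to indicate the length of the attribute slice header and the length of the attribute information, respectively, and the attribute slice header field attr_slice_header and the attribute slice data field attr_slice_data shown in Figure 9 can be used to indicate the attribute slice header and the data of the attribute information, respectively.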
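In an embodiment of this application, a point cloud frame sample can be further divided into subsamples to allow partial access. When a point cloud sample includes one or more subsamples, the media file data box of the subsample may include a subsample identification field, which is a flag indicating the subsample type.
When the subsample identification field takes the value 0, the subsample is a compression-unit-based subsample, that is, one subsample consists of at least one compression unit in the point cloud sample.
When the subsample identification field takes the value 1, the subsample is a tile-based subsample. In this case one subsample consists either of a contiguous sequence of one or more compression units corresponding to one tile, or of a contiguous sequence of one or more compression units containing each parameter set, tile inventory, or frame boundary marker.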
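In an embodiment of this application, the media file data box of the subsample may include the following fields related to the codec-specific parameters codec_specific_parameters:
a geometry header flag field, indicating whether the subsample is a geometry header parameter set;
an attribute header flag field, indicating whether the subsample is an attribute header parameter set;
a geometry slice flag field, indicating whether the subsample is a point cloud geometry slice; and
an attribute slice flag field, indicating whether the subsample is a point cloud attribute slice.
In an embodiment of this application, the media file data box of the subsample may further include an attribute type field, which indicates the type of the point cloud attribute when the subsample is a point cloud attribute slice.
In an embodiment of this application, when the attribute type field takes a first value, the point cloud attribute is the color attribute;
when the attribute type field takes a second value, the point cloud attribute is the reflectance attribute;
when the attribute type field takes a third value, the point cloud attribute is both the color attribute and the reflectance attribute.
Figure 10 shows the syntax structure corresponding to the codec-specific parameters in the media file data box of the subsample in an embodiment of this application.
The geometry header flag field geo_header_flag: the value 1 indicates that the subsample is a geometry header parameter set; the value 0 indicates that it is not.
The attribute header flag field attr_header_flag: the value 1 indicates that the subsample is an attribute header parameter set; the value 0 indicates that it is not.
The geometry slice flag field geo_slice_flag: the value 1 indicates that the subsample is a point cloud geometry slice; the value 0 indicates that it is not.
The attribute slice flag field attr_slice_flag: the value 1 indicates that the subsample is a point cloud attribute slice; the value 0 indicates that it is not.
In the embodiments of this application, the four flags geo_header_flag, attr_header_flag, geo_slice_flag, and attr_slice_flag must not all be 0 at the same time.
The attribute type field attr_type indicates the type of the point cloud attribute in a point cloud attribute slice. The value 0 indicates that the slice contains only the color attribute; 1 indicates that it contains only the reflectance attribute; and 2 indicates that it contains both the color attribute and the reflectance attribute.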
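The flag semantics above can be sketched as a pack/unpack pair over a 32-bit codec_specific_parameters value. The bit positions chosen here (the four flags in the top four bits, attr_type in the next two bits, the rest reserved) and the names `SubSampleInfo`, `pack_params`, and `unpack_params` are assumptions for illustration only; Figure 10 is not reproduced in the text, so the actual layout may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubSampleInfo:
    geo_header_flag: int
    attr_header_flag: int
    geo_slice_flag: int
    attr_slice_flag: int
    attr_type: int = 0  # 0 = color, 1 = reflectance, 2 = both; used with attr_slice_flag == 1


def _flags(info: SubSampleInfo):
    return (info.geo_header_flag, info.attr_header_flag,
            info.geo_slice_flag, info.attr_slice_flag)


def pack_params(info: SubSampleInfo) -> int:
    """Pack the fields into a 32-bit integer (assumed bit layout)."""
    if not any(_flags(info)):
        raise ValueError("the four flags must not all be 0")
    if info.attr_type not in (0, 1, 2):
        raise ValueError("attr_type must be 0, 1, or 2")
    value = 0
    for bit, flag in zip((31, 30, 29, 28), _flags(info)):
        value |= (flag & 1) << bit
    value |= info.attr_type << 26
    return value


def unpack_params(value: int) -> SubSampleInfo:
    """Inverse of pack_params, with the same all-zero check."""
    info = SubSampleInfo((value >> 31) & 1, (value >> 30) & 1,
                         (value >> 29) & 1, (value >> 28) & 1,
                         (value >> 26) & 3)
    if not any(_flags(info)):
        raise ValueError("the four flags must not all be 0")
    return info
```

Rejecting the all-zero combination in both directions mirrors the constraint that the four flags cannot be 0 simultaneously.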
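In an embodiment of this application, the different components of a point cloud media file (geometry data and attribute data) may be encapsulated in different tracks based on a multi-track encapsulation mode. In that case, the fields of the point cloud sample and its subsamples are subject to corresponding value constraints.
In an embodiment of this application, the point cloud media file includes a first point cloud sample encapsulated in a geometry track, a geometry track being a track used to encapsulate geometry data. In the media file data box of the first point cloud sample, the slice type field takes the first value, which indicates that the point cloud slices in the first point cloud sample are point cloud geometry slices. In the media file data box of a subsample of the first point cloud sample, the value range of the attribute slice flag field excludes the second value, which would indicate that the subsample is a point cloud attribute slice.
For example, for a track that encapsulates only geometry data, if the slice type field slice_type is present in its point cloud samples, the field can only take the value 0. In addition, the attribute slice flag field attr_slice_flag in its subsample definition must not take the value 1.
In an embodiment of this application, the point cloud media file includes a second point cloud sample encapsulated in an attribute track, an attribute track being a track used to encapsulate attribute data. In the media file data box of the second point cloud sample, the header type field takes the third value, which indicates that the parameter set is an attribute header. In the media file data box of the second point cloud sample, the value range of the slice type field excludes the first value, which would indicate that the point cloud slices in the second point cloud sample are point cloud geometry slices. In the media file data box of a subsample of the second point cloud sample, the value range of the geometry header flag field excludes the second value, which would indicate that the subsample is a geometry header parameter set, and the value range of the geometry slice flag field excludes the second value, which would indicate that the subsample is a point cloud geometry slice.
For example, for a track that encapsulates only attribute data, if the header type field header_type is present in its point cloud samples, the field can only take the value 2, and if the slice type field slice_type is present in its samples, the field cannot take the value 0. In addition, the geometry header flag field geo_header_flag and the geometry slice flag field geo_slice_flag in its subsample definition must not take the value 1.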
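The value constraints for geometry-only and attribute-only tracks can be expressed as a simple validator. The following is a sketch using the example values from the text (slice_type 0 for geometry, header_type 2 for attribute headers, flag value 1 for "is of this kind"); the function name `check_track_constraints` and the dictionary representation of subsamples are hypothetical.

```python
def check_track_constraints(track_kind, header_types, slice_types, subsamples):
    """Return a list of constraint violations for one track.

    track_kind   -- "geometry" or "attribute"
    header_types -- header_type values found in the track's samples
    slice_types  -- slice_type values found in the track's samples
    subsamples   -- list of dicts holding the subsample flag fields
    """
    violations = []
    if track_kind == "geometry":
        if any(t != 0 for t in slice_types):
            violations.append("geometry track: slice_type must be 0")
        if any(s.get("attr_slice_flag", 0) == 1 for s in subsamples):
            violations.append("geometry track: attr_slice_flag must not be 1")
    elif track_kind == "attribute":
        if any(t != 2 for t in header_types):
            violations.append("attribute track: header_type must be 2")
        if any(t == 0 for t in slice_types):
            violations.append("attribute track: slice_type must not be 0")
        for flag in ("geo_header_flag", "geo_slice_flag"):
            if any(s.get(flag, 0) == 1 for s in subsamples):
                violations.append(f"attribute track: {flag} must not be 1")
    else:
        raise ValueError(f"unknown track kind: {track_kind!r}")
    return violations
```

A file writer could run such a check before finalizing each track, and a reader could use it to reject malformed files early.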
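In an embodiment of this application, the point cloud media file includes point cloud samples encapsulated in multiple tracks, and the media file data box of the point cloud sample includes metadata information corresponding to the track. The metadata information includes:
a component type field, indicating the component type of the point cloud samples encapsulated in the track, the component types including an attribute component representing attribute data and a geometry component representing geometry data;
an attribute count field, indicating the number of attribute components encapsulated in the track; and
an attribute type field, indicating the type of the attribute component encapsulated in the track.
On this basis, different values of the attribute type field can represent different attribute component types. When the attribute type field takes a first value, the attribute component is the color attribute; when it takes a second value, the attribute component is the reflectance attribute.
Figure 11 shows the syntax structure of the metadata information that indicates the attribute count in the multi-track encapsulation mode in an embodiment of this application.
The component type field gpcc_type indicates the type of the component in the track.
The attribute count field attr_num indicates the number of attribute components contained in the track.
The attribute type field attr_type indicates the type of the attribute component contained in the track. The value 0 indicates the color attribute; the value 1 indicates the reflectance attribute.
Because the metadata information indicates the attribute count, the attribute count field and the attribute type field can be combined to indicate the type of each attribute component encapsulated in the track.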
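Table 2 gives the semantics of the different values of the component type field in an embodiment of this application.
Table 2
gpcc_type value | Description |
---|---|
1 | Reserved |
2 | Geometry data |
3 | Reserved |
4 | Attribute data |
5..31 | Reserved |
As shown in Table 2, when the component type field gpcc_type takes the value 2, the point cloud component is a geometry component representing geometry data; when gpcc_type takes the value 4, the point cloud component is an attribute component representing attribute data; the other values of gpcc_type (such as 1, 3, and 5 to 31) are reserved.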
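To make the combination of gpcc_type, attr_num, and attr_type concrete, the following sketch interprets the track metadata using the values from Table 2 and the attribute type values 0 (color) and 1 (reflectance). The function `describe_track` is hypothetical, and it assumes that attr_num is simply the number of attr_type entries listed for the track.

```python
GPCC_TYPE = {2: "geometry", 4: "attribute"}   # other values are reserved (Table 2)
ATTR_TYPE = {0: "color", 1: "reflectance"}


def describe_track(gpcc_type, attr_types=()):
    """Interpret track-level component metadata.

    attr_types holds one attr_type value per attribute component,
    so attr_num is assumed to equal len(attr_types).
    """
    if gpcc_type not in GPCC_TYPE:
        raise ValueError(f"gpcc_type {gpcc_type} is reserved")
    component = GPCC_TYPE[gpcc_type]
    if component == "geometry":
        return {"component": component, "attr_num": 0, "attributes": []}
    attributes = [ATTR_TYPE[t] for t in attr_types]
    return {"component": component, "attr_num": len(attributes), "attributes": attributes}
```

For instance, `describe_track(4, [0])` describes an attribute track that carries a single color attribute component.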
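In an embodiment of this application, the point cloud media file includes point cloud samples encapsulated in multiple tracks, and the media file data box of the point cloud sample includes metadata information corresponding to the track. The metadata information includes:
a component type field, indicating the component type of the point cloud samples encapsulated in the track, the component types including an attribute component representing attribute data and a geometry component representing geometry data; and
an attribute type field, indicating the type of the attribute component encapsulated in the track.
On this basis, different values of the attribute type field can represent different attribute component types. When the attribute type field takes a first value, the attribute component is the color attribute; when it takes a second value, the attribute component is the reflectance attribute; and when it takes a third value, the attribute component includes both the color attribute and the reflectance attribute.
Figure 12 shows the syntax structure of the metadata information with an extended attribute type in the multi-track encapsulation mode in an embodiment of this application.
The component type field gpcc_type indicates the type of the component in the track.
The attribute type field attr_type indicates the type of the attribute component contained in the track. The value 0 indicates the color attribute; 1 indicates the reflectance attribute; and 2 indicates that the component contains both the color attribute and the reflectance attribute.
By extending the value range of the attribute type field in the metadata information, the attribute type field alone can indicate the type of each attribute component encapsulated in the track.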
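In an embodiment of this application, the data source may send the data receiver streaming media transmission signaling corresponding to the point cloud media file. The data receiver parses the signaling to obtain the component descriptor it carries, the component descriptor indicating the type information and attribute information of the point cloud components encapsulated in the tracks, and then obtains the point cloud media file sent by the data source according to the component descriptor. The data receiver here may be an electronic device that needs to receive the point cloud media file, such as a terminal; after receiving the point cloud media file, the terminal can decode it to obtain point cloud data.
Streaming media transmission signaling consists of messages exchanged between the data source and the data receiver to coordinate the communication process. It may be, for example, DASH signaling based on the DASH protocol, or SMT signaling.
In the single-track encapsulation mode, the DASH signaling includes an Adaptation Set with one or more Representations, each of which represents an independent point cloud bitstream. If a Representation consists of multiple Media Segments, the DASH signaling also includes an Initialization Media Segment.
The Initialization Media Segment contains a GPCCDecoderConfigurationRecord carrying the G-PCC parameter sets, for example the SPS, GPS, and APS defined in ISO/IEC 23090-9.
In the multi-track encapsulation mode, each G-PCC component in the DASH signaling is represented as a separate Adaptation Set, which may be called a Component Adaptation Set. The Adaptation Set containing the geometry information is the main GPCC Adaptation Set and serves as the access point to the G-PCC content. The main GPCC Adaptation Set contains either a single Initialization Segment at the Adaptation Set level or multiple Initialization Segments at the Representation level (one Initialization Segment per Representation). An Initialization Segment shall contain the specified G-PCC parameter sets, which are required to initialize the G-PCC decoder.
At the Adaptation Set level, one GPCCComponent descriptor shall be signaled for each point cloud component present in the Representations of the Adaptation Set.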
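In an embodiment of this application, the component descriptor includes:
a component type field, component@type, indicating whether the point cloud component is a geometry component or an attribute component;
a component attribute count field, component@attr_num, indicating the number of attribute components; and
a component attribute type field, component@attr_type, indicating the type of the attribute component.
Table 3 gives the semantics of the component descriptor in an embodiment of this application.
Table 3
In the streaming media transmission signaling, the component descriptor indicates the component type, the component attribute count, and the component attribute type of the point cloud components carried in the point cloud media file. This gives the data receiver the information it needs to receive or decode only part of the point cloud media file, which improves the transmission and decoding efficiency of point cloud media and reduces the consumption of bandwidth and computing resources.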
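Figure 13 shows a flowchart of the steps of a point cloud media encoding method in an embodiment of this application. The method can be applied at a server, a terminal (or a client running on the terminal), an intermediate node, or another stage of a point cloud media system. The embodiments of this application use, as an example, an electronic device on which a point cloud media encoding apparatus is installed. As shown in Figure 13, the encoding method includes the following steps S1310 to S1330.
In step S1310, point cloud source data is obtained, the point cloud source data including multiple point cloud frames.
The point cloud source data includes point cloud video (images and/or video) representing objects and/or environments located in various 3D spaces (for example, a 3D space representing a real environment or a 3D space representing a virtual environment).
In an embodiment of this application, the data source may capture the point cloud source data with acquisition devices such as one or more cameras (for example, an infrared camera capable of acquiring depth information, or an RGB camera capable of extracting the color information corresponding to the depth information), a projector (for example, an infrared pattern projector used to acquire depth information), or LiDAR. The shape of the geometric structure formed by the points in 3D space can be extracted from the depth information of the point cloud source data, and the attributes of each point can be extracted from its color information, so that the point cloud source data is acquired.
Taking point cloud video data as an example, a point cloud video may include one or more point cloud frames, and one point cloud frame may represent one point cloud image. In an embodiment of this application, the point cloud video data may be captured based on at least one of an inward-facing technique and an outward-facing technique.
The inward-facing technique captures images of a central object with one or more cameras (or camera sensors) placed around that object. It can be used to generate point cloud content that gives the user a 360-degree image of a key object (for example, VR/AR content that gives the user a 360-degree image of a key object such as a character, a player, an object, or an actor).
The outward-facing technique captures, with one or more cameras (or camera sensors) placed around a central object, images of the environment of the central object rather than of the central object itself. It can be used to generate point cloud content that presents the surrounding environment as seen from the user's point of view (for example, content representing the external environment that can be provided to the user of a self-driving vehicle).
When point cloud content is generated from the capture operations of one or more cameras, each camera has its own coordinate system, so the data source may calibrate the cameras before the capture operation to establish a global coordinate system. In addition, the data source may generate point cloud content by compositing arbitrary images and/or video with the images and/or video captured by the techniques above. The data source may also post-process the captured images and/or video, for example by removing unwanted regions (such as the background), identifying the space to which the captured images and/or video are connected, and filling spatial holes where they exist.
The data source may generate a piece of point cloud content by applying a coordinate transformation to the points of the point cloud video acquired from each camera, based on the coordinates of each camera position. The data source can thereby generate point cloud content that represents a wide spatial range, or point cloud content with a high density of points.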
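The per-camera coordinate transformation can be sketched as a rigid transform applied to each captured point, followed by concatenation of the results. This is a minimal illustration under stated assumptions: each camera's pose is assumed to be known from calibration as a 3x3 rotation matrix and a translation vector, and the function names `to_global` and `merge_cameras` are hypothetical.

```python
def to_global(points, rotation, translation):
    """Map points from one camera's frame into the global frame.

    Applies p_global = R * p_camera + t to every point, where
    rotation is a 3x3 matrix (list of rows) and translation a 3-vector.
    """
    transformed = []
    for x, y, z in points:
        transformed.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
            for i in range(3)
        ))
    return transformed


def merge_cameras(captures):
    """Merge the captures of several cameras into one point list.

    captures is an iterable of (points, rotation, translation) tuples,
    one per camera.
    """
    merged = []
    for points, rotation, translation in captures:
        merged.extend(to_global(points, rotation, translation))
    return merged
```

Merging several cameras in this way yields either broader spatial coverage (cameras looking at different regions) or higher point density (cameras looking at the same region).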
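In step S1320, the point cloud frames are encoded to obtain at least one compression unit.
In step S1330, the at least one compression unit is encapsulated to obtain a point cloud media file. The point cloud media file includes point cloud samples encapsulated in one or more tracks. The media file data box of the point cloud sample includes a type field indicating the type of the compression unit, the type being any one of a geometry header, an attribute header, a geometry slice, and an attribute slice. The geometry header indicates a parameter set of geometry information, the attribute header indicates a parameter set of attribute information, the geometry slice is point cloud slice data carrying geometry information, and the attribute slice is point cloud slice data carrying attribute information.
The point cloud media file may be a media file or media segment obtained after encoding and encapsulation as shown in Figure 2, and it carries the point cloud bitstream to be transmitted.
In an embodiment of this application, the data source may encapsulate the point cloud bitstream into a single track according to the geometry parameter information, attribute parameter information, and point cloud slice parameter information contained in the bitstream, or it may re-encapsulate a single-track point cloud media file into a point cloud media file containing multiple tracks.
A track may be a volumetric visual track that carries either a coded geometry bitstream or a coded attribute bitstream, or a volumetric visual track that carries both.
When the point cloud bitstream is encapsulated in a single track, each point cloud sample may correspond to one complete point cloud frame.
The media file data box may be a box based on the ISO Base Media File Format (ISOBMFF). For details of ISOBMFF, see ISO/IEC 14496-12.
When the G-PCC bitstream is carried in a single track, simple ISOBMFF encapsulation can be used by storing the G-PCC bitstream in that track without further processing.
In the single-track encapsulation mode, a point cloud sample may include one or more compression units encapsulated in TLV format, for example the Parameter Set TLV, Geometry TLV, and Attribute TLV units shown in Figure 5.
The TLV bytestream format (Type-length-value bytestream format) is a structure made up of the type of the data (Type), the length of the data (Length), and the value of the data (Value). For details of the TLV bytestream format, see ISO/IEC 23090-9.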
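A type-length-value stream can be written and read with a few lines of code. The sketch below assumes an 8-bit type and a 32-bit big-endian length, which is an assumption made for illustration; the normative field widths are those of ISO/IEC 23090-9 and should be checked there. The helper names `tlv_encode` and `tlv_decode` are hypothetical.

```python
import struct


def tlv_encode(unit_type: int, payload: bytes) -> bytes:
    """Serialize one unit as type (1 byte), length (4 bytes), value."""
    return struct.pack(">BI", unit_type, len(payload)) + payload


def tlv_decode(stream: bytes):
    """Yield (type, payload) pairs from a concatenation of TLV units."""
    pos = 0
    while pos < len(stream):
        if pos + 5 > len(stream):
            raise ValueError("truncated TLV header")
        unit_type, length = struct.unpack_from(">BI", stream, pos)
        pos += 5
        payload = stream[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated TLV payload")
        pos += length
        yield unit_type, payload
```

Because every unit states its own length, a reader can skip over units it does not need by advancing past the payload without interpreting it.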
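In the multi-track encapsulation mode, the bitstream data of each point cloud component is mapped to a separate track. There are two types of G-PCC component tracks: G-PCC geometry tracks and G-PCC attribute tracks. Each point cloud sample in a track contains at least one compression unit (G-PCC unit) that carries a single G-PCC component data unit, rather than a multiplex of geometry and attribute data units or of different attribute data units. A G-PCC attribute track shall not multiplex different attribute substreams, such as color and reflectance. For the data structure obtained by encapsulating the geometry bitstream and the attribute bitstreams in multiple tracks, see Figure 6, in which ftyp indicates the file type and describes the version of the specification to which the point cloud sample conforms, the moov box carries the metadata of the point cloud sample, and mdat carries the actual media data of the point cloud sample.
In an embodiment of this application, the fields of the media file data box of the point cloud sample can be assigned according to the geometry information and attribute information of the encoded point cloud source data. The assignment can follow the syntax structure of TLV-encapsulated compression units shown in Figure 7 and the semantics of the compression unit type field values shown in Table 1.
Different values can be written to the type field for different types of compression unit. Taking Table 1 as an example:
When the compression unit is a Sequence Parameter Set (SPS), the value 0 may be written to the corresponding type field.
When the compression unit is a Geometry Parameter Set (GPS), the value 1 may be written to the corresponding type field.
When the compression unit is a geometry data unit, the value 2 may be written to the corresponding type field.
When the compression unit is an Attribute Parameter Set (APS), the value 3 may be written to the corresponding type field.
When the compression unit is an attribute data unit, the value 4 may be written to the corresponding type field.
When the compression unit is a tile inventory, the value 5 may be written to the corresponding type field.
When the compression unit is a frame boundary marker, the value 6 may be written to the corresponding type field.
For details of the compression unit types above, see ISO/IEC 23090-9.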
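To provide more specific indication information about point cloud components, so that point cloud media files can be transmitted and decoded more flexibly according to actual consumption needs, the embodiments of this application may also assign the following compression unit type field values according to the semantics.
When the compression unit is a geometry header, the value 7 may be written to the corresponding type field; the geometry header indicates a parameter set of geometry information.
When the compression unit is an attribute header, the value 8 may be written to the corresponding type field; the attribute header indicates a parameter set of attribute information.
When the compression unit is a geometry slice, the value 9 may be written to the corresponding type field; the geometry slice carries point cloud slice data of geometry information.
When the compression unit is an attribute slice, the value 10 may be written to the corresponding type field; the attribute slice carries point cloud slice data of attribute information.
In the embodiments of this application, a compression unit (G-PCC unit) contains any one of a geometry header, an attribute header, a geometry slice, and an attribute slice. The G-PCC units in the same point cloud sample correspond to the same point cloud frame and have the same presentation time.
By providing a type field for compression units in the media file data box, different field values can indicate whether a compression unit to be decoded is a geometry header, an attribute header, a geometry slice, or an attribute slice. Part of the file content can therefore be decoded selectively, according to how the point cloud media is to be consumed, without decoding the entire file. This makes consumption of point cloud data more flexible, improves decoding efficiency, and reduces the consumption of computing resources.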
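The type values listed above (0 to 6 from Table 1, 7 to 10 as extended in this embodiment) can be collected in an enumeration, and selective decoding then reduces to filtering units by type. The sketch below is illustrative only; the names `UnitType`, `GEOMETRY_ONLY`, and `select_units` are hypothetical, and units are modelled simply as (type, payload) pairs.

```python
from enum import IntEnum


class UnitType(IntEnum):
    SPS = 0
    GPS = 1
    GEOMETRY_DATA_UNIT = 2
    APS = 3
    ATTRIBUTE_DATA_UNIT = 4
    TILE_INVENTORY = 5
    FRAME_BOUNDARY_MARKER = 6
    GEOMETRY_HEADER = 7      # values 7..10 are the extension described in this embodiment
    ATTRIBUTE_HEADER = 8
    GEOMETRY_SLICE = 9
    ATTRIBUTE_SLICE = 10


# A consumer that only needs the shape of the object can skip all attribute units.
GEOMETRY_ONLY = {UnitType.GEOMETRY_HEADER, UnitType.GEOMETRY_SLICE}


def select_units(units, wanted):
    """Return the (type, payload) pairs whose type is in the wanted set."""
    return [(UnitType(t), payload) for t, payload in units if t in wanted]
```

Only the selected payloads would then be handed to the decoder, which is where the saving in computation comes from.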
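Figure 14 shows a flowchart of point cloud data encoding and decoding in a streaming application scenario with multi-track encapsulation according to an embodiment of this application. As shown in Figure 14, the server, acting as the data source that produces the point cloud media file, encodes the point cloud data and sends it to the user's client (or the terminal running the client); the client decodes the point cloud media file to obtain point cloud data for the user to consume. The encoding and decoding process may include the following steps.
Step S1401: The server encapsulates the point cloud bitstream into a multi-track point cloud media file F1 according to the geometry parameter information, attribute parameter information, and point cloud slice parameter information contained in the bitstream.
The point cloud media file F1 may include, for example, three tracks: Track1, Track2, and Track3.
In the media file data box GPCCComponentInfoBox of Track1, gpcc_type=2, which means that the point cloud component encapsulated in the track is a geometry component representing geometry data.
In the media file data box GPCCComponentInfoBox of Track2, gpcc_type=4, attr_num=1, and attr_type=0, which means that the point cloud component encapsulated in the track is an attribute component representing attribute data, that the track contains only one attribute component, and that the attribute type of that component is the color attribute.
In the media file data box GPCCComponentInfoBox of Track3, gpcc_type=4, attr_num=1, and attr_type=1, which means that the point cloud component encapsulated in the track is an attribute component representing attribute data, that the track contains only one attribute component, and that the attribute type of that component is the reflectance attribute.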
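Step S1402: The server converts the file F1 into multiple segments for the streaming scenario according to the DASH standard.
Step S1403: The server generates the MPD signaling information and sends it to the client.
Step S1404: The client parses the component descriptors in the MPD signaling.
The component descriptor includes the attribute count field attr_num and the attribute type field attr_type. From the field values of the component descriptor, the client can determine the number of attribute components contained in a Representation and their types.
Step S1405: The client requests the appropriate Representations for consumption according to its own bandwidth and needs.
Based on the values in the component descriptors, the client can selectively request different point cloud data from the server, avoiding transmission and decoding of the entire point cloud media file. This improves transmission and decoding efficiency and reduces the consumption of bandwidth and computing resources. For example, client 1 may request the geometry data and the color attribute data for consumption, while client 2 requests the geometry data together with the color and reflectance attribute data.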
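The client-side selection in steps S1404 and S1405 can be sketched as a filter over already-parsed component descriptors. The dictionaries below stand in for the parsed descriptors of the three tracks of file F1; their keys, the identifiers `track1` to `track3`, and the function `pick_adaptation_sets` are hypothetical, and MPD parsing itself is left out. Attribute type values follow the text (0 = color, 1 = reflectance).

```python
# Parsed component descriptors, one per Adaptation Set (assumed representation).
ADAPTATION_SETS = [
    {"id": "track1", "type": "geometry", "attr_types": []},
    {"id": "track2", "type": "attribute", "attr_types": [0]},   # attr_num = 1, color
    {"id": "track3", "type": "attribute", "attr_types": [1]},   # attr_num = 1, reflectance
]


def pick_adaptation_sets(adaptation_sets, wanted_attr_types):
    """Return the ids of the Adaptation Sets a client should request.

    Geometry is always requested; an attribute set is requested only if it
    carries at least one of the wanted attribute types.
    """
    chosen = []
    for aset in adaptation_sets:
        if aset["type"] == "geometry":
            chosen.append(aset["id"])
        elif any(t in wanted_attr_types for t in aset["attr_types"]):
            chosen.append(aset["id"])
    return chosen
```

With this data, a client interested only in color requests `track1` and `track2`, while a client that also wants reflectance requests all three.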
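Figure 15 shows a flowchart of point cloud data encoding and decoding in a local playback scenario with single-track encapsulation according to an embodiment of this application. As shown in Figure 15, the server, acting as the data source that produces the point cloud media file, encodes the point cloud data and sends it to the user's client; the client decodes the point cloud media file to obtain point cloud data for the user to consume. The encoding and decoding process may include the following steps.
Step S1501: The server encapsulates the point cloud bitstream into a single-track point cloud media file F1 according to the geometry parameter information, attribute parameter information, and point cloud slice parameter information contained in the bitstream.
Point cloud samples in the file F1 can be divided into subsamples to allow partial access. The subsamples are compression-unit-based, that is, one subsample consists of at least one compression unit in the point cloud sample.
When a point cloud sample includes one or more subsamples, the media file data box of the subsample may include a subsample identification field, which is a flag indicating the subsample type. When the field takes the value 0, the subsample is a compression-unit-based subsample, that is, one subsample consists of at least one compression unit in the point cloud sample.
The media file data box of the subsample may include the following fields related to the codec-specific parameters codec_specific_parameters:
a geometry header flag field, indicating whether the subsample is a geometry header parameter set;
an attribute header flag field, indicating whether the subsample is an attribute header parameter set;
a geometry slice flag field, indicating whether the subsample is a point cloud geometry slice;
an attribute slice flag field, indicating whether the subsample is a point cloud attribute slice; and
an attribute type field, indicating the type of the point cloud attribute when the subsample is a point cloud attribute slice.
For the geometry header flag field geo_header_flag, the value 1 indicates that the subsample is a geometry header parameter set, and the value 0 indicates that it is not.
For the attribute header flag field attr_header_flag, the value 1 indicates that the subsample is an attribute header parameter set, and the value 0 indicates that it is not.
For the geometry slice flag field geo_slice_flag, the value 1 indicates that the subsample is a point cloud geometry slice, and the value 0 indicates that it is not.
For the attribute slice flag field attr_slice_flag, the value 1 indicates that the subsample is a point cloud attribute slice, and the value 0 indicates that it is not.
The four flags geo_header_flag, attr_header_flag, geo_slice_flag, and attr_slice_flag must not all be 0 at the same time.
The attribute type field attr_type indicates the type of the point cloud attribute in a point cloud attribute slice. The value 0 indicates that the slice contains only the color attribute; 1 indicates that it contains only the reflectance attribute; and 2 indicates that it contains both the color attribute and the reflectance attribute.
Step S1502: The server transmits the point cloud media file F1 to the client.
Step S1503: The client parses the media file data box of the point cloud media file F1 to obtain the subsample division information of the point cloud samples.
Step S1504: The client selectively decodes and consumes the point cloud samples in the point cloud media file F1 according to the subsample division information.
From the geometry header flag field geo_header_flag, the attribute header flag field attr_header_flag, the geometry slice flag field geo_slice_flag, the attribute slice flag field attr_slice_flag, the attribute type field attr_type, and related information in the media file data box, the client can determine the type of data contained in the compression units of each subsample. It can therefore selectively decode and consume the point cloud samples according to its own needs, using the subsample division information contained in the sample (the subsample mechanism itself divides a sample into separate data blocks, and the fields defined in this application identify what each block contains). For example, client 1 may decode only the geometry data and the color attribute data for consumption, while client 2 decodes the geometry data, the color attribute data, and the reflectance attribute data in full.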
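The selective decoding in step S1504 can be sketched as follows. The example assumes that subsamples are laid out back to back in the sample and that each carries a size together with the flag fields described above; the helper `sub`, the mapping `ATTRS_IN_TYPE`, and the function `select_subsamples` are hypothetical names used only for illustration.

```python
# Attributes carried by a slice for each attr_type value described in the text.
ATTRS_IN_TYPE = {0: {"color"}, 1: {"reflectance"}, 2: {"color", "reflectance"}}


def sub(size, geo_header_flag=0, attr_header_flag=0,
        geo_slice_flag=0, attr_slice_flag=0, attr_type=0):
    """Build one subsample description (size in bytes plus the flag fields)."""
    return {"size": size, "geo_header_flag": geo_header_flag,
            "attr_header_flag": attr_header_flag, "geo_slice_flag": geo_slice_flag,
            "attr_slice_flag": attr_slice_flag, "attr_type": attr_type}


def select_subsamples(sample, subsamples, wanted_attrs):
    """Return the byte chunks a client should decode.

    Geometry headers and slices are always kept. Attribute headers are kept
    when any attribute is wanted. Attribute slices are kept when they carry
    at least one wanted attribute.
    """
    picked = []
    offset = 0
    for s in subsamples:
        chunk = sample[offset:offset + s["size"]]
        offset += s["size"]
        is_geometry = s["geo_header_flag"] or s["geo_slice_flag"]
        is_needed_header = s["attr_header_flag"] and bool(wanted_attrs)
        is_needed_slice = s["attr_slice_flag"] and bool(ATTRS_IN_TYPE[s["attr_type"]] & wanted_attrs)
        if is_geometry or is_needed_header or is_needed_slice:
            picked.append(chunk)
    return picked
```

In the scenario of the text, client 1 would pass `{"color"}` and client 2 would pass `{"color", "reflectance"}`.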
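By extending fields at both the file encapsulation level and the transmission signaling level, the embodiments of this application define, in the media file data box, the component information in track samples and subsamples for the single-track encapsulation mode, the component information in track samples and subsamples for the multi-track encapsulation mode, and the component indication information of tracks for the multi-track encapsulation mode. The client can thus transmit, decapsulate, and decode only the point cloud data it needs, by component type. This achieves partial access and partial transmission, improves the transmission and decoding efficiency of point cloud data, and saves bandwidth and computing resources.
It should be noted that although the steps of the methods in this application are described in the drawings in a particular order, this does not require or imply that the steps must be performed in that order, or that all of the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one, and/or one step may be split into multiple steps.
The following describes apparatus embodiments of this application, which can be used to perform the point cloud media encoding and decoding methods in the embodiments above.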
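Figure 16 schematically shows a structural block diagram of a point cloud media decoding apparatus provided by an embodiment of this application. As shown in Figure 16, the point cloud media decoding apparatus 1600 may include:
an obtaining module 1610, configured to obtain a point cloud media file, the point cloud media file including point cloud samples encapsulated in one or more tracks;
a decapsulation module 1620, configured to decapsulate the point cloud samples to obtain at least one compression unit, where the media file data box of the point cloud sample includes a type field indicating the type of the compression unit, the type of the compression unit is any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header indicates a parameter set of geometry information, the attribute header indicates a parameter set of attribute information, the geometry slice is point cloud slice data carrying geometry information, and the attribute slice is point cloud slice data carrying attribute information; and
a decoding module 1630, configured to select a target compression unit according to the type field and decode the target compression unit to obtain point cloud data.
Figure 17 schematically shows a structural block diagram of a point cloud media encoding apparatus provided by an embodiment of this application. As shown in Figure 17, the point cloud media encoding apparatus 1700 may include:
an obtaining module 1710, configured to obtain point cloud source data, the point cloud source data including multiple point cloud frames;
an encoding module 1720, configured to encode the point cloud frames to obtain at least one compression unit; and
an encapsulation module 1730, configured to encapsulate the at least one compression unit to obtain a point cloud media file, where the point cloud media file includes point cloud samples encapsulated in one or more tracks, the media file data box of the point cloud sample includes a type field indicating the type of the compression unit, the type of the compression unit is any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header indicates a parameter set of geometry information, the attribute header indicates a parameter set of attribute information, the geometry slice is point cloud slice data carrying geometry information, and the attribute slice is point cloud slice data carrying attribute information.
The details and beneficial effects of the point cloud media encoding apparatus and the point cloud media decoding apparatus provided in the embodiments of this application have been described in the corresponding method embodiments, to which reference may be made.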
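Figure 18 schematically shows a system block diagram of an electronic device for implementing the embodiments of this application.
It should be noted that the system 1800 of the electronic device shown in Figure 18 is only an example and places no limitation on the functions or scope of use of the embodiments of this application.
As shown in Figure 18, the system 1800 includes a central processing unit (CPU) 1801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1802 or a program loaded from a storage portion 1808 into a random access memory (RAM) 1803. The RAM 1803 also stores the various programs and data required for system operation. The CPU 1801, the ROM 1802, and the RAM 1803 are connected to one another through a bus 1804. An input/output (I/O) interface 1805 is also connected to the bus 1804.
The following components are connected to the I/O interface 1805: an input portion 1806 including a keyboard, a mouse, and the like; an output portion 1807 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage portion 1808 including a hard disk and the like; and a communication portion 1809 including a network interface card such as a LAN card or a modem. The communication portion 1809 performs communication processing over a network such as the Internet. A drive 1810 is also connected to the I/O interface 1805 as needed. A removable medium 1811, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1810 as needed, so that computer-readable instructions read from it can be installed into the storage portion 1808 as needed.
In particular, according to the embodiments of this application, the processes described in the method flowcharts can be implemented as computer software programs. For example, the embodiments of this application include a computer program product that includes computer-readable instructions carried on a computer-readable medium, the instructions containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer-readable instructions can be downloaded and installed from a network through the communication portion 1809 and/or installed from the removable medium 1811. When the computer-readable instructions are executed by the CPU 1801, the various functions defined in the system of this application are performed.
Other embodiments of this application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed in this application.
It should be understood that this application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.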
Claims (22)
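- A point cloud media decoding method, performed by an electronic device, the method comprising: obtaining a point cloud media file, the point cloud media file comprising point cloud samples encapsulated in one or more tracks; decapsulating the point cloud samples to obtain at least one compression unit, wherein a media file data box of the point cloud sample comprises a type field indicating a type of the compression unit, the type of the compression unit is any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header indicates a parameter set of geometry information, the attribute header indicates a parameter set of attribute information, the geometry slice is point cloud slice data carrying geometry information, and the attribute slice is point cloud slice data carrying attribute information; and selecting a target compression unit according to the type field and decoding the target compression unit to obtain point cloud data.
- The point cloud media decoding method according to claim 1, wherein the media file data box of the point cloud sample further comprises: a component header count field, indicating the number of parameter sets included in the point cloud sample; and a slice count field, indicating the number of point cloud slices included in the point cloud sample.
- The point cloud media decoding method according to claim 2, wherein when the component header count field takes the value 0, the parameter sets are decoder configuration information.
- The point cloud media decoding method according to claim 2, wherein the media file data box of the point cloud sample further comprises: a header type field, indicating whether the parameter set is a geometry header or an attribute header; a header length field, indicating the length of the parameter set; and a header data field, carrying the data of the parameter set.
- The point cloud media decoding method according to claim 2, wherein the media file data box of the point cloud sample further comprises: a slice type field, indicating the type of the point cloud slice; a slice length field, indicating the length of the point cloud slice; and a slice data field, carrying the data of the point cloud slice.
- The point cloud media decoding method according to claim 5, wherein: when the slice type field takes a first value, the point cloud slice is a point cloud geometry slice; when the slice type field takes a second value, the point cloud slice is a point cloud color attribute slice; when the slice type field takes a third value, the point cloud slice is a point cloud reflectance attribute slice; and when the slice type field takes a fourth value, the point cloud slice is a point cloud mixed attribute slice including the color attribute and the reflectance attribute.
- The point cloud media decoding method according to claim 2, wherein the point cloud slice comprises a slice header and data information, and the media file data box of the point cloud sample further comprises: a geometry slice header length field, indicating the length of the slice header when the point cloud slice is a geometry slice; a geometry slice data length field, indicating the length of the data information when the point cloud slice is a geometry slice; an attribute slice header length field, indicating the length of the slice header when the point cloud slice is an attribute slice; an attribute slice data length field, indicating the length of the data information when the point cloud slice is an attribute slice; a geometry slice header field, carrying the slice header when the point cloud slice is a geometry slice; a geometry slice data field, carrying the data information when the point cloud slice is a geometry slice; an attribute slice header field, carrying the slice header when the point cloud slice is an attribute slice; and an attribute slice data field, carrying the data information when the point cloud slice is an attribute slice.
- The point cloud media decoding method according to any one of claims 1 to 7, wherein the point cloud sample comprises one or more subsamples, and a media file data box of the subsample comprises a subsample identification field; when the subsample identification field takes the value 0, the subsample consists of at least one compression unit in the point cloud sample.
- The point cloud media decoding method according to claim 8, wherein the media file data box of the subsample further comprises: a geometry header flag field, indicating whether the subsample is a geometry header parameter set; an attribute header flag field, indicating whether the subsample is an attribute header parameter set; a geometry slice flag field, indicating whether the subsample is a point cloud geometry slice; and an attribute slice flag field, indicating whether the subsample is a point cloud attribute slice.
- The point cloud media decoding method according to claim 9, wherein the media file data box of the subsample further comprises: an attribute type field, indicating the type of the point cloud attribute when the subsample is a point cloud attribute slice.
- The point cloud media decoding method according to claim 10, wherein: when the attribute type field takes a first value, the point cloud attribute is the color attribute; when the attribute type field takes a second value, the point cloud attribute is the reflectance attribute; and when the attribute type field takes a third value, the point cloud attribute is the color attribute and the reflectance attribute.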
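- The point cloud media decoding method according to claim 8, wherein the point cloud media file further comprises a first point cloud sample encapsulated in a geometry track, the geometry track being a track used to encapsulate geometry data; in the media file data box of the first point cloud sample, the slice type field takes a first value, the slice type field taking the first value indicating that the point cloud slices in the first point cloud sample are point cloud geometry slices; and in the media file data box of a subsample of the first point cloud sample, the value range of the attribute slice flag field excludes a second value, the attribute slice flag field taking the second value indicating that the subsample is a point cloud attribute slice.
- The point cloud media decoding method according to claim 8, wherein the point cloud media file further comprises a second point cloud sample encapsulated in an attribute track, the attribute track being a track used to encapsulate attribute data; in the media file data box of the second point cloud sample, the header type field takes a third value, the header type field taking the third value indicating that the parameter set is an attribute header; in the media file data box of the second point cloud sample, the value range of the slice type field excludes a first value, the slice type field taking the first value indicating that the point cloud slices in the second point cloud sample are point cloud geometry slices; in the media file data box of a subsample of the second point cloud sample, the value range of the geometry header flag field excludes a second value, the geometry header flag field taking the second value indicating that the subsample is a geometry header parameter set; and in the media file data box of a subsample of the second point cloud sample, the value range of the geometry slice flag field excludes the second value, the geometry slice flag field taking the second value indicating that the subsample is a point cloud geometry slice.
- The point cloud media decoding method according to any one of claims 1 to 7, wherein the point cloud media file comprises point cloud samples encapsulated in multiple tracks, and the media file data box of the point cloud sample comprises metadata information corresponding to the track; the metadata information comprises: a component type field, indicating the component type of the point cloud component encapsulated in the track, the component types including an attribute component representing attribute data and a geometry component representing geometry data; an attribute count field, indicating the number of attribute components encapsulated in the track; and an attribute type field, indicating the type of the attribute component encapsulated in the track.
- The point cloud media decoding method according to claim 14, wherein: when the attribute type field takes a first value, the attribute component is the color attribute; when the attribute type field takes a second value, the attribute component is the reflectance attribute; and when the attribute type field takes a third value, the attribute component includes the color attribute and the reflectance attribute.
- The point cloud media decoding method according to any one of claims 1 to 7, wherein obtaining the point cloud media file comprises: parsing streaming media transmission signaling sent by a data source to obtain a component descriptor carried in the streaming media transmission signaling, the component descriptor indicating type information and attribute information of the point cloud components encapsulated in the tracks; and obtaining, according to the component descriptor, the point cloud media file sent by the data source.
- The point cloud media decoding method according to claim 16, wherein the component descriptor comprises: a component type field, indicating whether the point cloud component is a geometry component or an attribute component; a component attribute count field, indicating the number of attribute components; and a component attribute type field, indicating the type of the attribute component.
- A point cloud media encoding method, performed by an electronic device, the method comprising: obtaining point cloud source data, the point cloud source data comprising multiple point cloud frames; encoding the point cloud frames to obtain at least one compression unit; and encapsulating the at least one compression unit to obtain a point cloud media file, the point cloud media file comprising point cloud samples encapsulated in one or more tracks, wherein a media file data box of the point cloud sample comprises a type field indicating a type of the compression unit, the type of the compression unit is any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header indicates a parameter set of geometry information, the attribute header indicates a parameter set of attribute information, the geometry slice is point cloud slice data carrying geometry information, and the attribute slice is point cloud slice data carrying attribute information.
- A point cloud media decoding apparatus, comprising: an obtaining module, configured to obtain a point cloud media file, the point cloud media file comprising point cloud samples encapsulated in one or more tracks; a decapsulation module, configured to decapsulate the point cloud samples to obtain at least one compression unit, wherein a media file data box of the point cloud sample comprises a type field indicating a type of the compression unit, the type of the compression unit is any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header indicates a parameter set of geometry information, the attribute header indicates a parameter set of attribute information, the geometry slice is point cloud slice data carrying geometry information, and the attribute slice is point cloud slice data carrying attribute information; and a decoding module, configured to select a target compression unit according to the type field and decode the target compression unit to obtain point cloud data.
- A point cloud media encoding apparatus, comprising: an obtaining module, configured to obtain point cloud source data, the point cloud source data comprising multiple point cloud frames; an encoding module, configured to encode the point cloud frames to obtain at least one compression unit; and an encapsulation module, configured to encapsulate the at least one compression unit to obtain a point cloud media file, the point cloud media file comprising point cloud samples encapsulated in one or more tracks, wherein a media file data box of the point cloud sample comprises a type field indicating a type of the compression unit, the type of the compression unit is any one of a geometry header, an attribute header, a geometry slice, and an attribute slice, the geometry header indicates a parameter set of geometry information, the attribute header indicates a parameter set of attribute information, the geometry slice is point cloud slice data carrying geometry information, and the attribute slice is point cloud slice data carrying attribute information.
- A computer-readable medium storing computer-readable instructions that, when executed by a processor, implement the method according to any one of claims 1 to 18.
- An electronic device, comprising: a processor; and a memory, configured to store computer-readable instructions; wherein the processor is configured to execute the computer-readable instructions so that the electronic device performs the method according to any one of claims 1 to 18.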
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/235,685 US20230396808A1 (en) | 2022-03-11 | 2023-08-18 | Method and apparatus for decoding point cloud media, and method and apparatus for encoding point cloud media |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210243282.2A CN116781913A (zh) | 2022-03-11 | 2022-03-11 | 点云媒体的编解码方法及相关产品 |
CN202210243282.2 | 2022-03-11 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/235,685 Continuation US20230396808A1 (en) | 2022-03-11 | 2023-08-18 | Method and apparatus for decoding point cloud media, and method and apparatus for encoding point cloud media |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023169003A1 true WO2023169003A1 (zh) | 2023-09-14 |
Family
ID=87937145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/135732 WO2023169003A1 (zh) | 2022-03-11 | 2022-12-01 | 点云媒体的解码方法、点云媒体的编码方法及装置 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230396808A1 (zh) |
CN (1) | CN116781913A (zh) |
WO (1) | WO2023169003A1 (zh) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210112282A1 (en) * | 2018-08-08 | 2021-04-15 | Panasonic Intellectual Property Corporation Of America | Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device |
US20210321139A1 (en) * | 2020-04-07 | 2021-10-14 | Qualcomm Incorporated | High-level syntax design for geometry-based point cloud compression |
US20210319571A1 (en) * | 2020-04-14 | 2021-10-14 | Lg Electronics Inc. | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
WO2021210867A1 (ko) * | 2020-04-12 | 2021-10-21 | 엘지전자 주식회사 | 포인트 클라우드 데이터 송신 장치, 포인트 클라우드 데이터 송신 방법, 포인트 클라우드 데이터 수신 장치 및 포인트 클라우드 데이터 수신 방법 |
WO2021209044A1 (zh) * | 2020-04-16 | 2021-10-21 | 上海交通大学 | 多媒体数据收发方法、系统、处理器和播放器 |
US20210329052A1 (en) * | 2020-04-13 | 2021-10-21 | Lg Electronics Inc. | Point cloud data transmission apparatus, point cloud data transmission method, point cloud data reception apparatus and point cloud data reception method |
CN114097229A (zh) * | 2019-07-03 | 2022-02-25 | Lg 电子株式会社 | 点云数据发送设备、点云数据发送方法、点云数据接收设备和点云数据接收方法 |
-
2022
- 2022-03-11 CN CN202210243282.2A patent/CN116781913A/zh active Pending
- 2022-12-01 WO PCT/CN2022/135732 patent/WO2023169003A1/zh unknown
-
2023
- 2023-08-18 US US18/235,685 patent/US20230396808A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US20230396808A1 (en) | 2023-12-07 |
CN116781913A (zh) | 2023-09-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22930626 Country of ref document: EP Kind code of ref document: A1 |