WO2023015530A1

WO2023015530A1 - Point cloud encoding and decoding methods, encoder, decoder, and computer readable storage medium

Info

Publication number: WO2023015530A1
Application number: PCT/CN2021/112338
Authority: WO
Inventors: 元辉; 王婷婷; 刘昊; 李明
Original assignee: Oppo广东移动通信有限公司
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2023-02-16
Also published as: CN117378204A

Abstract

Disclosed in the embodiments of the present application are point cloud encoding and decoding methods, an encoder, a decoder, and a computer readable storage medium, the method comprising: on the basis of a point cloud to be encoded, determining a point cloud to be processed in a block to be processed; on the basis of the distance between two points in the point cloud to be processed, determining the edge weight between the two points to thereby determine a graph filter, the distance between the two points being related to the edge weight between the two points; on the basis of the graph filter, performing graph filter sampling on the point cloud to be processed in order to obtain a first point cloud having undergone graph filter sampling; and, on the basis of the first point cloud, performing compensation on a reference point cloud to thereby encode the point cloud to be encoded.

Description

Point cloud encoding and decoding method, encoder, decoder and computer-readable storage medium

technical field

The embodiments of the present application relate to the technical field of point cloud encoding and decoding, and in particular, to a point cloud encoding and decoding method, an encoder, a decoder, and a computer-readable storage medium.

Background technique

During inter-frame geometric coding, there is a process of global motion compensation and/or local motion compensation. In the process of global motion estimation, the target point cloud after equal interval sampling still contains a large number of points, resulting in a high time complexity for using the LMS method to solve the global motion transformation matrix (Mat_GM), and the target point cloud after sampling has a certain high frequency and low frequency noise information. When performing local motion estimation, all points contained in the hierarchical nodes of the octree are used to estimate local motion information, and the number of points used in the hierarchical nodes is large, resulting in high time complexity of motion estimation.

Contents of the invention

Embodiments of the present application provide a point cloud encoding and decoding method, an encoder, a decoder, and a computer-readable storage medium, which can reduce time complexity in a motion compensation process.

The technical solutions of the embodiments of the present application can be implemented as follows:

In the first aspect, the embodiment of the present application provides a point cloud encoding method, including:

Based on the point cloud to be encoded, determine the point cloud to be processed in the block to be processed;

Based on the distance between two points in the point cloud to be processed, determine the edge weight between the two points, so as to determine the graph filter; the distance between the two points and the edge weight between the two points related;

Based on the image filter, image filtering sampling is performed on the point cloud to be processed to obtain a first point cloud after image filtering sampling;

Compensating the reference point cloud based on the first point cloud, and then encoding the point cloud to be encoded.

In a second aspect, the embodiment of the present application provides a point cloud decoding method, including:

Analyzing the code stream to obtain geometric encoding results and motion-related information; the motion-related information is obtained by calculating the first point cloud after image filtering and sampling of the point cloud to be processed in the point cloud to be encoded, and the reference point cloud;

Get the decoded reference point cloud;

Based on the motion-related information and the reference point cloud, motion compensation is performed on the reference point cloud, and the geometric code stream is decoded to obtain decoded information.

In a third aspect, the embodiment of the present application provides an encoder, including:

The first determination part is configured to determine, based on the point cloud to be encoded, the point cloud to be processed in the block to be processed; based on the distance between two points in the point cloud to be processed, determine the edge weight between the two points , so as to determine the graph filter;

The filtering part is configured to perform image filtering and sampling on the point cloud to be processed based on the image filter, to obtain a first point cloud after image filtering and sampling;

The encoding part is configured to compensate the reference point cloud based on the first point cloud, and then encode the point cloud to be encoded.

In the fourth aspect, the embodiment of the present application also provides an encoder, including:

The first memory is used to store executable point cloud coding instructions;

The first processor is configured to implement the point cloud encoding method described by the encoder when executing the executable point cloud encoding instruction stored in the first memory.

In the fifth aspect, the embodiment of the present application provides a decoder, including:

The decoding part is configured to analyze the code stream, and obtain geometric encoding results and motion-related information; the motion-related information is the first point cloud after image filtering and sampling of the point cloud to be processed in the point cloud to be encoded, and the reference point cloud calculated;

an acquisition part configured to acquire the decoded reference point cloud;

The decoding part is further configured to perform motion compensation on the reference point cloud based on the motion-related information and the reference point cloud, and decode the geometric code stream to obtain decoded information.

In a sixth aspect, the embodiment of the present application also provides a decoder, including:

The second memory is used to store executable point cloud decoding instructions;

The second processor is configured to implement the point cloud decoding method described in the decoder when executing the executable point cloud decoding instruction stored in the second memory.

In the seventh aspect, the embodiment of the present application provides a computer-readable storage medium, which stores executable point cloud encoding instructions, and is used to cause the first processor to implement the point cloud encoding method described in the encoder; or, Executable point cloud decoding instructions are stored for causing the second processor to implement the point cloud decoding method described in the decoder.

The embodiment of the present application provides a point cloud encoding and decoding method, an encoder, a decoder, and a computer-readable storage medium. The encoder can perform global motion compensation and local motion compensation on the reference point cloud during the point cloud encoding process. In the process, the point cloud to be processed is first filtered and sampled to reduce the number of points participating in the midpoint of the point cloud to be processed, so that the calculation time can be reduced in the process of calculating motion information or global motion transformation matrix using the filtered and sampled point cloud Complexity, thereby improving the encoding time efficiency.

Description of drawings

FIG. 1 is a block diagram of an exemplary encoding system provided by an embodiment of the present application;

FIG. 2 is a block diagram of an exemplary decoding system provided by an embodiment of the present application;

Fig. 3 is the flowchart of a kind of point cloud encoding method provided by the embodiment of the present application;

FIG. 4 is a schematic diagram 1 of an exemplary octree division provided by an embodiment of the present application;

FIG. 5 is a schematic diagram 2 of an exemplary octree division provided by the embodiment of the present application;

FIG. 6 is an exemplary edge weight connection diagram provided by an embodiment of the present application;

FIG. 7 is a schematic diagram 3 of an exemplary octree division provided by the embodiment of the present application;

FIG. 8 is a schematic diagram of an exemplary local motion compensation provided by an embodiment of the present application;

FIG. 9 is an exemplary coding flowchart provided by the embodiment of the present application;

FIG. 10 is a flowchart of a point cloud decoding method provided by an embodiment of the present application;

Fig. 11 is an exemplary decoding flowchart provided by the embodiment of the present application;

FIG. 12 is a structural diagram 1 of an encoder provided in an embodiment of the present application;

FIG. 13 is a second structural diagram of an encoder provided in the embodiment of the present application.

FIG. 14 is a structural diagram 1 of a decoder provided by an embodiment of the present application;

FIG. 15 is the second structural diagram of a decoder provided in the embodiment of the present application.

Detailed ways

In order to understand the characteristics and technical contents of the embodiments of the present application in more detail, the implementation of the embodiments of the present application will be described in detail below in conjunction with the accompanying drawings. The attached drawings are only for reference and description, and are not intended to limit the embodiments of the present application.

In order to facilitate the understanding of the embodiments of the present application, firstly, the relevant concepts involved in the embodiments of the present application are briefly introduced as follows:

Point cloud refers to a set of discrete point sets randomly distributed in space, expressing the spatial structure and surface properties of 3D objects or 3D scenes.

Point cloud data (Point Cloud Data) is a specific record form of point cloud, and the points in the point cloud can include point location information and point attribute information. For example, the point position information may be three-dimensional coordinate information of the point. The location information of a point may also be referred to as geometric information of a point. For example, the attribute information of a point may include color information and/or reflectivity and the like. For example, the color information may be information on any color space. For example, the color information may be (RGB). For another example, the color information may be luminance and chrominance (YcbCr, YUV) information. For example, Y represents brightness (Luma), Cb (U) represents blue color difference, Cr (V) represents red color, and U and V are expressed as chromaticity (Chroma) for describing color difference information. For example, according to the point cloud obtained according to the principle of laser measurement, the points in the point cloud may include the three-dimensional coordinate information of the point and the laser reflection intensity (reflectance) of the point. For another example, in the point cloud obtained according to the principle of photogrammetry, the points in the point cloud may include three-dimensional coordinate information and color information of the point. For another example, combining the principles of laser measurement and photogrammetry to obtain a point cloud, the points in the point cloud may include the three-dimensional coordinate information of the point, the laser reflection intensity (reflectance) of the point, and the color information of the point.

Ways to obtain point cloud data may include but not limited to at least one of the following: (1) Generated by computer equipment. The computer device can generate point cloud data according to virtual three-dimensional objects and virtual three-dimensional scenes. (2) 3D (3-Dimension, three-dimensional) laser scanning acquisition. Point cloud data of static real-world 3D objects or 3D scenes can be obtained through 3D laser scanning, and millions of point cloud data can be obtained per second; (3) 3D photogrammetry acquisition. Through 3D photography equipment (that is, a group of cameras or camera equipment with multiple lenses and sensors) to collect the visual scene of the real world to obtain the point cloud data of the visual scene of the real world, through 3D photography can obtain dynamic real world three-dimensional objects Or point cloud data of a 3D scene. (4) Obtain point cloud data of biological tissues and organs through medical equipment. In the medical field, point cloud data of biological tissues and organs can be obtained through magnetic resonance imaging (Magnetic Resonance Imaging, MRI), electronic computer tomography (Computed Tomography, CT), electromagnetic positioning information and other medical equipment.

Point clouds can be divided into dense point clouds and sparse point clouds according to the way of acquisition.

According to the time series type of data, point cloud is divided into:

The first type of static point cloud: that is, the object is stationary, and the device for obtaining the point cloud is also stationary;

The second type of dynamic point cloud: the object is moving, but the device for obtaining the point cloud is still;

The third type of dynamic acquisition of point clouds: the equipment for acquiring point clouds is in motion.

According to the purpose of point cloud, it can be divided into two categories:

Category 1: Machine perception point cloud, which can be used in scenarios such as autonomous navigation systems, real-time inspection systems, geographic information systems, visual sorting robots, and emergency rescue robots;

Category 2: Human eyes perceive point clouds, which can be used in point cloud application scenarios such as digital cultural heritage, free viewpoint broadcasting, 3D immersive communication, and 3D immersive interaction.

Point cloud data can be used to form point cloud media, which can be a media file. The point cloud media may include multiple media frames, and each media frame in the point cloud media is composed of point cloud data. Point cloud media can flexibly and conveniently express the spatial structure and surface properties of 3D objects or 3D scenes, so it is widely used. After encoding the point cloud media, the encoded code stream is encapsulated to form an encapsulated file, which can be used for transmission to the user. Correspondingly, on the point cloud media player side, it is necessary to decapsulate the encapsulated file first, then decode it, and finally present the decoded data stream. Package files may also be referred to as point cloud files.

As of now, point clouds can be encoded through the Point Cloud Encoding Framework.

The point cloud coding framework can be a geometry-based point cloud compression (Geometry Point Cloud Compression, G-PCC) codec framework or a video-based point cloud compression (Video Point Cloud Compression) provided by the Moving Picture Experts Group (MPEG). Cloud Compression, V-PCC) codec framework, or the AVS-PCC codec framework provided by Audio Video Standard (AVS). The G-PCC codec framework can be used to compress the first static point cloud and the third type of dynamically acquired point cloud, and the V-PCC codec framework can be used to compress the second type of dynamic point cloud. The G-PCC codec framework is also called point cloud codec TMC13, and the V-PCC codec framework is also called point cloud codec TMC2. The codec framework applicable to the embodiment of the present application will be described below using the G-PCC codec framework.

Fig. 1 is a schematic block diagram of a coding framework 100 provided by an embodiment of the present application.

As shown in FIG. 1 , the encoding framework 100 can acquire position information (geometric information) and attribute information of point clouds from the acquisition device. The encoding of point cloud includes geometric encoding and attribute encoding. In one embodiment, the geometry encoding process includes: performing preprocessing on the original point cloud such as coordinate transformation, quantization and removing duplicate points, etc.; based on the reference point cloud, performing motion compensation and constructing an octree before encoding to form a geometric code stream. The attribute coding process includes: by given the reconstruction information of the position information of the input point cloud and the real value of the attribute information, select one of the three prediction modes for point cloud prediction, quantify the predicted results, and perform arithmetic coding to form property stream.

As shown in Figure 1, position coding can be realized by the following units:

Coordinate transformation (Tanmsform coordinates) unit 101, quantization and removal of repeated points (Quantize and remove points) unit 102, global motion compensation unit 1031 and local motion compensation unit 1032, octree analysis (Analyze octree) unit 103, geometric reconstruction ( Reconstruct geometry) unit 104 and a first arithmetic encoding (Arithmetic encode) unit 105.

The coordinate transformation unit 101 can be used to transform the world coordinates of points in the point cloud into relative coordinates. For example, subtracting the minimum values of the xyz coordinate axes from the geometric coordinates of the point is equivalent to a DC operation to transform the coordinates of the points in the point cloud from world coordinates to relative coordinates. Quantization and removal of duplicate points unit 102 can reduce the number of coordinates by quantization; original different points may be given the same coordinates after quantization, based on this, repeated points can be deleted through de-duplication operations; for example, with the same quantization position and Multiple clouds with different attribute information can be merged into one cloud through attribute transformation. In some embodiments of the present application, the Quantize and Remove Duplicate Points unit 102 is an optional unit module. The global motion compensation unit 1031 uses the global motion matrix to perform motion compensation on the first reference point cloud to determine the corresponding second reference point cloud. An octree is constructed, and then the current point cloud is predicted and encoded based on the second reference point cloud or the first reference point cloud. The octree analysis unit 103 may use an octree encoding method to encode the position information of the quantized points. For example, divide the point cloud in the form of an octree, first determine the bounding box of the point cloud according to the maximum and minimum coordinate values of the point cloud, and construct the root node of the octree (including all points in the point cloud); then Carry out eight equal divisions to the bounding box of the point cloud to obtain eight child nodes; if a child node is empty (does not contain points), then mark the node as 0, stop division; if a child node is not empty (contains points ), then mark the node as 1, then continue to divide the node into eight equal parts until each child node contains at most 1 point, such a child node is called a leaf node; in this way, the geometric information of the point cloud will be Expressed as a binary code stream. In the process of octree division by the octree analysis unit 103, local motion compensation can be performed on the reference nodes in the reference point cloud through the local motion compensation unit 1032, and according to the octree of the reference nodes after local motion compensation The division mode (node occupancy information) predicts the octree division mode of the current node to obtain the context information of the current node. The first arithmetic coding unit 105 combines the node occupancy information output by the octree analysis unit 103 or the local motion compensation unit 1032 with the context information to generate a geometric code stream; the geometric code stream can also be called a geometric bit stream ( geometry bitstream).

Attribute coding can be achieved by the following units:

Color space transformation (Transform colors) unit 110, attribute transformation (Transfer attributes) unit 111, region adaptive layered transformation (Region Adaptive Hierarchical Transform, RAHT) unit 112, prediction change (predicting transform) unit 113 and lifting transform (lifting transform) ) unit 114, a quantization (Quantize) unit 115, and a second arithmetic coding unit 116.

The color space transformation unit 110 can be used to transform the RGB color space of points in the point cloud into YCbCr format or other formats. The attribute transformation unit 111 can be used to transform attribute information of points in the point cloud to minimize attribute distortion. For example, the attribute transformation unit 111 can be used to obtain the real value of the attribute information of the point. For example, the attribute information may be color information of dots. After the real value of the attribute information of the point is obtained through the transformation by the attribute transformation unit 111, any prediction unit can be selected to predict the point in the point cloud. The unit for predicting points in the point cloud may include: at least one of RAHT 112, predicting transform unit 113, and lifting transform unit 114. In other words, any one of the RAHT 112, the predicting transform unit 113 and the lifting transform unit 114 can be used to predict the attribute information of the points in the point cloud, so as to obtain the predicted values of the attribute information of the points, Furthermore, the residual value of the attribute information of the point can be obtained based on the predicted value of the attribute information of the point. For example, the residual value of the point's attribute information may be the actual value of the point's attribute information minus the predicted value of the point's attribute information.

The prediction transformation unit 113 can also be used to generate a level of detail (LOD), sequentially predict the attribute information of points in the LOD, and calculate the prediction residual for subsequent quantization and coding.

Fig. 2 is a schematic block diagram of a decoding framework 200 provided by an embodiment of the present application.

As shown in FIG. 2 , the decoding framework 200 can obtain the code stream of the point cloud from the encoding device, and obtain the position information and attribute information of the points in the point cloud by parsing the code. The decoding of point cloud includes position decoding and attribute decoding.

In one embodiment, the position decoding process includes: performing arithmetic decoding on the geometric code stream; and using the reference point cloud to reconstruct the position information of the point, and performing coordinate transformation to obtain the geometric coordinate value of the point.

The attribute decoding process includes: obtaining the residual value of the attribute information of the point cloud by parsing the attribute code stream; dequantizing the residual value of the attribute information of the point to obtain the residual value of the attribute information of the dequantized point value; based on the reconstruction information of the point's position information obtained in the position decoding process, one of the three prediction modes is selected for point cloud prediction, and the reconstruction value of the point's attribute information is obtained; the color space is performed on the reconstruction value of the point's attribute information Inverse transform to get the decoded point cloud.

As shown in Figure 2, position decoding can be realized by the following units: first arithmetic decoding unit 201, global motion compensation unit 2021, octree analysis (synthesize octree) unit 202, local motion compensation unit 2022, geometric reconstruction (Reconstruct geometry) A unit 203 and an inverse transform coordinates unit 204 . Attribute encoding can be realized by the following units: second arithmetic decoding unit 210, inverse quantize unit 211, RAHT unit 212, predicting transform unit 213, lifting transform unit 214, and color space inverse transform (inverse transform colors) Unit 215.

It should be noted that decompression is an inverse process of compression. Similarly, the functions of each unit in the decoding framework 200 may refer to the functions of corresponding units in the encoding framework 100 . Additionally, decoding framework 200 may include more, fewer or different functional components than in FIG. 3 .

It should be noted that the point cloud encoding method proposed in the embodiment of the present application acts on the global motion compensation unit and the local motion compensation unit in FIG. 1 , and the point cloud decoding method acts on the global motion compensation unit and the local motion compensation unit in FIG. 2 .

Based on the background of the above introduction, the point cloud encoding method provided by the embodiment of the present application is introduced below.

As shown in Figure 3, the embodiment of the present application provides a point cloud encoding method, which is applied to an encoder, and the method may include:

S101. Based on the point cloud to be encoded, determine the point cloud to be processed in the block to be processed.

In the embodiment of the present application, during the encoding process of a 3D image model, the 3D image model to be encoded is a point cloud in space, and the point cloud may be geometric information and attribute information of the 3D image model. In the process of encoding the 3D image model, the geometric information of the point cloud to be encoded and the attribute information corresponding to each point are encoded separately. The inter-frame prediction based on octree geometric information coding during geometric information coding is realized by using the point cloud coding method provided by the embodiment of the present application.

It should be noted that, in the embodiment of the present application, the first frame in a video sequence to be encoded can be predicted through intra-frame prediction, and the frames before the first frame use inter-frame prediction.

In the embodiment of the present application, when performing point cloud encoding on the current point cloud to be encoded, for the point cloud to be encoded, the encoder can combine image filtering processing in the process of performing global motion compensation and/or local motion compensation, and follow The frequency information of the point is used to sample the point cloud to obtain the filtered and sampled point cloud. Based on the filtered and sampled point cloud, the motion estimation is performed to obtain the motion information. The motion compensation is performed on the reference point cloud using the motion information, and the compensated point cloud is used. The current point cloud is predicted and encoded with reference to the point cloud.

Among them, the encoder can filter and sample the global point cloud obtained by calculating the global motion transformation matrix, and then encode the point cloud to be processed based on the obtained point cloud to be processed. In the process of octree division, the encoder can also perform image filtering and sampling on the current node in the point cloud to be encoded and the corresponding reference node in the reference point cloud, determine the motion information, and finally perform local motion on the reference node based on the motion information Compensation, and then perform predictive coding on the current node. The encoder may also implement the local motion compensation described above in the subsequent process of performing octree division after performing the above global motion compensation, which is not limited in this embodiment of the present application.

It should be noted that in the process of global motion compensation, the determination process of the block to be processed is as follows: the encoder determines the object point cloud within the preset range that satisfies the Z direction based on the point cloud to be encoded (object after downsampling) point cloud); and divide the object point cloud into blocks to obtain multiple blocks to be processed. Among them, the point cloud in each block to be processed is the point cloud to be processed.

In the embodiment of this application, the encoder divides the current point cloud into object and road according to the thresholds top_z_boundary and bottom_z_boundary of the z coordinates of the point cloud to be encoded (that is, the current point cloud) (z<bottom_z_boundary or z>top_z_boundary is object, and the rest For the road, down-sample the object to be coded point cloud to obtain the target point cloud, then divide the target point cloud into blocks to obtain multiple blocks to be processed, so as to obtain the corresponding point cloud to be processed in each block to be processed.

In some embodiments of the present application, the encoder can perform image filtering on all blocks to be processed, and perform sampling based on frequency information to obtain the first point cloud after image filtering and sampling, thereby reducing the number of points in the point cloud. After the number, the corresponding point of the first point cloud is found from the first reference point cloud, and the global motion transformation matrix (Mat_GM) is solved by the LMS algorithm.

It should be noted that, in the process of local motion compensation, the determination process of the point cloud block to be processed is as follows: the encoder performs eight-point analysis on one of the first reference point cloud and the second reference point cloud, and the point cloud to be encoded. Tree division, obtain the size information of the reference node at the current level of the reference block (cuboid), and the size information of the current block (cuboid) of the current node at the current level; the size information of the current reference block and the size information of the current block represent the current reference When the block and the current block belong between the largest prediction unit (LPU) and the smallest prediction unit (minPU), the points in the current reference node and the current node are determined as point clouds to be processed.

In this embodiment of the application, the encoder needs to perform octree division on the first reference point cloud or the second reference point cloud, and the point cloud to be encoded. In the process of octree division, the current node's Coordinate information and coordinate information of the current reference node, and the size of the block divided by the current level, that is, the size information of the current reference block at the current level (such as the volume of a cuboid), and the size information of the current block of the current node at the current level .

In the embodiment of the present application, when the node size of the octree is between the LPU maximum prediction unit and the minPU minimum prediction unit (the LPU and the minPU are both PUs), according to the current node of the current point cloud to be encoded, and the first The first reference point cloud or the reference node of the second reference point cloud calculates the local motion vector MV (ie motion information), and applies the MV to the reference node for local motion estimation, and then performs local motion compensation to obtain the compensated reference node. That is to say, when the size information of the reference block and the size information of the current block indicate that the current block and the reference block belong between the largest prediction unit (LPU) and the smallest prediction unit (minPU), the reference node and the current node are determined to be processed point cloud. Based on the current node and reference node of the current point cloud to be encoded, in the process of solving the local motion vector MV, after performing graph filtering on the current node and reference node of the current point cloud to be encoded, and then estimating the local motion vector MV, it can reduce Participate in determining the number of motion information points, and then reduce the time complexity when calculating motion information.

The following describes the octree division by taking the point cloud to be encoded as an example. The principle of the division process of the reference point cloud is the same and will not be repeated here.

For the point cloud to be encoded, the encoder finds the maximum and minimum values of the coordinates corresponding to each coordinate component in the three dimensions of x, y and z, namely {x _max , x _min }, {y _max , y _min } Calculate d=ceil(log ₂ (max{max(x _max ,y _max ) _, z _max })) with _{ z max , z min }, where the ceil() function represents rounding up. W=2 ^d is used as the side length of the bounding box of the cube, and the cube is made with the origin as the starting point (lower left corner), and all points in the point cloud are included in the bounding box. Then, the encoder divides the bounding box into 8 sub-cubes, as shown in Figure 4, and continues to divide the sub-cubes that are not empty (including points in the point cloud) until the leaf node obtained by the division is 1 ×1×1 unit cube (or a cube with a preset size) to stop dividing, the tree structure of the octree is shown in FIG. 5 .

In the process of the above-mentioned octree division, the encoder can obtain the occupancy information of a current node (a cube, that is, a block) (indicating whether the point in the contained point cloud is empty or not), and the current node Neighborhood information (the number and details of adjacent nodes, etc.).

In the embodiment of the present application, the encoder can also slice the point cloud data to be encoded. After coordinate conversion and voxelization, first perform global motion compensation and then enter the process of octree division. When the tree is divided, each level of the octree has at least one node. For a node, that is, the current node, the encoder can obtain the coordinate information of the current node in the division process based on the geometric information, as well as the current node division. current level. Wherein, the level can be understood as the number of layers of the tree, and at each current level, after determining whether to perform local motion information compensation processing on the current node, continue to divide the octree at the next level.

In the embodiment of the present application, the encoder can obtain the coordinate information of the current node and the current level corresponding to the current node in real time during the process of dividing the octree. There is only one node in the first level, and each subsequent level is realized by dividing each node of the previous level into 8 nodes.

S102. Based on the distance between two points in the point cloud to be processed, determine an edge weight between the two points, so as to determine a graph filter.

In the embodiment of this application, the point cloud block to be processed contains multiple points, that is, the point cloud to be processed contains multiple points, and the encoder can determine the distance between two points, and determine the distance between the two points according to the coordinate information of the point. distance between.

In some embodiments of the present application, the encoder may determine the edge weight between the two points based on the distance between the two points and the weight segmentation function, and then construct a graph filter based on the edge weight between the two points. Among them, the edge weight distance between two points is related to the distance between the two points.

In some embodiments of the present application, the encoder determines the edge weights between the two points based on the distance between the two points in the point cloud to be processed, thereby obtaining the adjacency matrix formed by the edge weights in the point cloud to be processed; based on the adjacency matrix that determines the graph filter.

It should be noted that the encoder obtains the edge weights between two points, and the edge weights between points form an adjacency matrix; according to the adjacency matrix, a graph filter can be constructed.

It should be noted that, in the embodiment of the present application, the function of the image filter is to filter the input image signal to obtain an output image signal including frequency, and then perform a sampling operation on the output image signal.

In some embodiments of the present application, if the distance between two points in the point cloud to be processed is less than or equal to the preset distance threshold, the encoder performs processing based on the preset parameter threshold and the distance between the two points to obtain the two points Edge weight between points; if the distance between two points in the point cloud block to be processed is greater than the preset distance threshold, the edge weight between the two points is zero. Among them, the distance between two points is related to the edge weight between the two points.

It should be noted that the algorithm of the adjacency matrix and the embodiment of the application can be tried in many ways, not limited to the following examples.

Exemplarily, the weight segmentation function may be as shown in formula (1), as follows:

Take point i and point j as examples for illustration.

Among them, w _{i, j} is the weight of the edge of the line between point i and point j, that is, the weight of the edge, x _i and x _j represent the three-dimensional coordinates of point i and point j, σ is the preset parameter threshold, ε is the preset Set the distance threshold.

In the embodiment of the present application, the preset distance threshold is determined based on the preset KNN search parameter and the side length of the block where the point cloud to be processed is located; wherein, the preset KNN search parameter is between 0-1; the preset parameter threshold is A multiple of the preset distance threshold, where the value of the multiple is not limited.

Exemplarily, σ=ε×2.

In the embodiment of this application, the encoder uses the KNN algorithm to calculate the distance between two points in the block, and sets ε as the radius of the KNN search, and only calculates the distance between the two points in the points after the KNN search, which effectively reduces time complexity. ε can be set to n≤1 times the side length of the block, and the parameter KNNthresh (preset KNN search parameter) is added to the cfg file, which can be set to a number between 0-1. Then ε=motion.motion_block_size (block size)×KNNthresh.

Exemplarily, the global motion compensation process is taken as an example for illustration. There are many points in the point cloud sequence, and the pc_world_target(object) obtained through the above calculation is usually large, which makes the construction of the graph very complicated. Therefore, the encoder divides pc_world_target into multiple blocks, and constructs a graph in each block. The size of the block is consistent with the default parameter motion.motion_block_size in TMC13v12InterEMv2.0, for example, 64, which is not limited in the embodiment of the present application. If there are N points in the point cloud to be processed, the number of points in each block is N, N is the number of points in the block, and the weight w _{i, j} of the edge between points is related to the distance between points. As shown in Figure 6 (N=5, nodes are 1, 2, 3, 4 and 5), when the distance between two points in a block in pc_world_target is less than or equal to ε, use an edge to connect these two points, the The edge weight of an edge is related to the distance between two points, and the smaller the distance between two points, the larger the value w _{i, j} of the edge weight. When the distance between two points in a block in pc_world_target is greater than ε, the two points are not connected, that is, w _i,j =0. As shown in FIG. 6 , when the distance between point 1 and point 5 is greater than ε, then w _1,5 =w _5,1 =0.

In some embodiments of the present application, the encoder performs weighted summation according to the adjacency matrix to obtain a weighted degree matrix (D); inverts the weighted degree matrix and multiplies it with the adjacency matrix to obtain a graph transformation operator (A) ; Construct a polynomial based on the graph transformation operator, the preset graph filter parameters (h _l ) and the length (L) of the graph filter to determine the graph filter.

Exemplarily, the input of the graph filter is a graph signal (ie, the three-dimensional coordinates of the point), and the output is another graph signal (ie, the three-dimensional coordinates of the point).

The encoder performs weighted summation according to the adjacency matrix to obtain a weighted degree matrix (D), as shown in formula (2):

D _i,i = Σw _i,j (2)

Among them, D is the weighted degree matrix of the graph (weighted degree matrix), which is a diagonal matrix. w _{i, j} is the weight of the edge of the line between point i and point j, that is, the edge weight.

Exemplarily, the encoder inverts the weighted degree matrix and multiplies it with the adjacency matrix to obtain a graph transformation operator (A), as shown in formula (3):

A＝D ^- 1W (3)

Among them, A refers to the graph shift operator, D is the weighted degree matrix of the graph, and W is the adjacency matrix of the graph.

It should be noted that the formula (3) is only an example, and the embodiment of the present application does not need to limit the way of obtaining A.

Based on the graph transformation operator, the preset graph filtering parameters (h _l ) and the length (L) of the graph filter to construct a linear, translation-invariant shift-invariant graph filter as a polynomial (4):

Wherein, h _l is a parameter of the graph filter (ie, a preset graph filter parameter), and L is a length of the graph filter.

Exemplarily, let h ₀ =1, h ₁ =-1. The length L of the graph filter is 2, and the graph filter is shown in formula (5):

h(A)=IA=ID ^-1 W (5)

The output graph signal of the graph filter of the encoder is y=h(A)×S, where, y∈R ^N×3 , h(A)∈R ^N×N , S∈R ^N×3 , A∈R ^{N × N} . N is the number of points in the block, and S is the input graph signal (3D coordinates of points in a block).

S103. Based on the graph filter, perform graph filtering and sampling on the point cloud to be processed to obtain a first point cloud after graph filtering and sampling.

After obtaining the image filter, the encoder performs image filtering and sampling on all point clouds to be processed, and obtains the first point cloud after image filtering and sampling.

It should be noted that, considering that the number of points in different blocks is different, it is not necessary to perform image filtering processing when the number of points in a block is not large, or when there is only one point in a block, it will not be performed Image filtering. Therefore, the encoder can also determine whether to perform image filtering according to the number of points in the block to be processed. If the number of points is 1, the point is directly filtered out; if the number of points is less than the preset number of points, no image filtering is performed; If the number of points is greater than or equal to the preset number of points, image filtering is performed.

Exemplarily, the preset number of points is 6, which is not limited in this embodiment of the present application.

Based on the number of points contained in each block, the processing methods are different. The processing methods of the points in the block to be processed are shown in Table 1 below:

Table 1

In some embodiments of the present application, the encoder performs graph filtering on the point cloud to be processed based on the graph filter to obtain the second point cloud with frequency information; performs frequency domain sampling on the points of the second point cloud according to the frequency information to obtain The point cloud block sampled by graph filtering is the first point cloud. The point cloud composed of all the point cloud blocks sampled by graph filtering is called the first point cloud.

In some embodiments of the present application, the process of the encoder sampling the point cloud block containing frequency information (that is, the second point cloud) is: the encoder performs two-dimensional sampling of the three-dimensional coordinates of the points in the point cloud block containing frequency information. Norm sorting to obtain the frequency value corresponding to each point after sorting; according to the preset filtering ratio, remove the lowest frequency and highest frequency from the frequency value corresponding to each point after sorting to obtain the remaining frequency value; the remaining frequency Points corresponding to values in the second point cloud constitute the first point cloud after sampling.

It should be noted that the encoder can use many extraction methods for the frequency extraction process of the points in the second point cloud, not limited to the two-norm extraction method mentioned in the embodiment of the present application.

In the embodiment of the present application, for example, when the number of points in the block is N≥6, image filtering is performed on all points in the block, and sampling is performed based on the frequency information of the points to remove high-frequency points and low-frequency points. The input of the graph filter is the three-dimensional coordinates of all points in the block, and the output graph signal is the three-dimensional coordinates with frequency information. The encoder sorts the obtained bi-norm of the output image signal as the frequency value of the point. The encoder removes points with too high or too low frequency from it, and finally gets the points after graph filtering. Perform the above operations on all blocks in pc_world_target, and finally get the remaining point cloud pc_reserve after image filtering.

It should be noted that a parameter ratioFilteredOut is added to the cfg file, which indicates the ratio of filtering out the second point cloud midpoint after sorting, that is, the preset filtering ratio.

Exemplarily, the default value set is ratioFilteredOut=0.4 (therefore high frequency and low frequency each need to remove 40% of the ratio), that is, a total of 80% of points in a block are filtered out, leaving only 20% of points.

It should be noted that the preset filtering ratio can be the same ratio that both high frequency and low frequency need to be filtered out, or different ratios can be set for high frequency and low frequency, which is not limited in this embodiment of the present application.

It can be understood that after the encoder is processed by graph filtering, the number of points in the point cloud to be processed can be reduced, and the time complexity for subsequent motion estimation can be reduced, thereby improving the encoding efficiency.

In some embodiments of the application, since the encoder is based on a graph filter, it performs graph filtering on the point cloud to be processed to obtain a second point cloud with frequency information. Therefore, the encoder can remove some points based on the second point cloud after image filtering before performing subsequent processing. In this way, high-frequency and low-frequency noise can be filtered out, making the calculation process more accurate.

Regarding the encoder based on the graph filter, the point cloud to be processed is subjected to graph filtering, and why the second point cloud with frequency information can be obtained will be verified by the following theory.

Graph transformation operator A eigendecomposition is A=VΛV ^-1 , the diagonal elements of Λ are the eigenvalues λ ₁ ,...,λ _N arranged in descending order, and V is the combination of eigenvectors, each The eigenvalues represent different frequencies of the graph signal, and the larger the eigenvalue, the lower the frequency of the corresponding graph signal. Equation (6) shows the eigendecomposition of graph transformation operators.

Assuming that h(A)=I-A is used as the graph filter, then based on formula (6), the graph filter is formula (7) as follows:

If λ _i in formula (6) corresponds to the low frequency component of the image signal, then 1-λ _i corresponds to the high frequency component of the image signal. If λ _i in formula (6) corresponds to the high-frequency component of the image signal, then 1-λ _i corresponds to the low-frequency component of the image signal. Furthermore, since the graph Fourier transform of the graph signal S is formula (8):

in,

is the frequency domain transformation result of S.

The inverse Fourier transform of the graph signal S is formula (9):

After the graph Fourier transform, the graph signal (point coordinates) is converted to the frequency domain, and after filtering, the frequency domain information can be converted into another filtered graph signal through the graph Fourier inverse transform; The graph signal carries frequency information. The image signals output after image filtering are sorted according to frequency from high to low. According to formula (7), formula (8) and formula (9), formula (10) is obtained. Formula (10) is the process of graph Fourier transform first and then graph inverse Fourier transform.

After the graph filter is used to filter S, the output graph signal of y is obtained,

In the frequency domain, 1- _λi is represented in the frequency domain, therefore, the output image signal of y obtained by formula (10) contains frequency information, that is, the encoder performs image filtering on the point cloud to be processed based on the image filter, A point cloud block with frequency information can be obtained, that is, a second point cloud.

S104. Compensate the reference point cloud based on the first point cloud, and then encode the point cloud to be encoded.

After the encoder obtains the filtered and sampled first point cloud, it can perform encoding processing and subsequent implementation on the point cloud to be encoded based on the first point cloud.

Since there are two implementations of full motion compensation and/or partial motion compensation in this application, the encoding process also has different processing.

Method 1: In the case of global motion compensation:

The encoder is based on the first reference point cloud, and the implementation of encoding the point cloud to be encoded is:

The encoder determines the first sub-reference point cloud corresponding to the first point cloud from the first reference point cloud of the acquired reference frame; wherein, the first point cloud is obtained after block division and image filtering sampling of the object point cloud the global point cloud; based on the first reference point cloud and the first point cloud, calculate the global motion transformation matrix of two adjacent point clouds, and write the global motion transformation matrix into the code stream; based on the global motion transformation matrix, the first reference Perform global motion compensation on the point cloud to obtain the second reference point cloud; respectively perform octree division and arithmetic coding on the second reference point cloud and the point cloud to be encoded to obtain the first geometric encoding result, and write the first geometric encoding result to input stream. Wherein, the first reference point cloud is an initial point cloud corresponding to the reference point cloud.

In the embodiment of the present application, the encoder performs octree division on the second reference point cloud and the point cloud to be encoded respectively, and uses the octree division result of the reference node of the second reference point cloud as the current point cloud to be encoded The predicted text of the octree division of the node is then arithmetically encoded on the octree division result of the current node of the point cloud to be encoded to obtain the first geometric encoding result.

Exemplarily, the encoder performs the above operations on all blocks in pc_world_target in the case of global motion compensation, and finally obtains the remaining point cloud pc_reserve (first point cloud) after image filtering. Based on the point cloud pc_reserve, determine the global motion transformation matrix.

Method 2: Local Motion Compensation

Local motion compensation can be performed alone or after global motion compensation.

The encoder encodes the point cloud to be encoded based on the first reference point cloud (performed alone) or the second reference point cloud (performed after global motion compensation).

The following is an example of the first reference point cloud for illustration. The realization of the second reference point cloud is the same, so it will not be repeated. The realization is as follows:

The encoder determines the motion information from the preset candidate motion information list according to the reference node sampled by graph filtering and the current node sampled by graph filter in the first point cloud; Local motion compensation is performed on the reference node to obtain a compensated reference node; arithmetic coding is performed based on the compensated reference node and the current node to obtain a second geometric coding result.

In some embodiments of the present application, the encoder traverses each candidate motion information in the preset candidate motion information list, performs local motion estimation on points in the filtered reference node, and obtains the estimated compensation corresponding to each candidate motion information After the reference node; determine the minimum distance between each point in the current node after the graph filter sampling in the first point cloud, and each point in the reference node after the estimated compensation; the current node after the graph filter sampling The sum of the minimum distances of each point of is used as the distortion of each candidate motion information; the candidate motion information corresponding to the minimum distortion among the distortions of each candidate motion information is determined as the motion information. The encoder performs local motion compensation on the reference nodes in the first reference point cloud according to the motion information.

In the embodiment of this application, when the size information of the reference node and the current node is between the LPU maximum prediction unit and the minPU minimum prediction unit, the local motion vector MV, that is, the motion information, is calculated according to the reference node and the current node, and the reference point cloud The MV is applied to the reference nodes in (the first reference point cloud or the second reference point cloud), and local motion compensation is performed to obtain the compensated reference nodes.

In some embodiments of the present application, the current node is taken as an example for illustration, and the principle of division of reference nodes is the same. When the octree is divided, the encoder can judge whether the current node conforms to the direct encoding mode. If it conforms to the direct encoding mode, it directly encodes the coordinate information of the points in the current node; if it does not conform to the direct encoding mode, it continues to perform the octree Partitioning, the nodes are arithmetically coded according to the occupancy information and context information (ie neighborhood information) of the current node. When an octree depth division reaches a leaf node, encodes the number of points contained in the leaf node.

Exemplarily, as shown in FIG. 7, starting from the root node for the current point cloud and the reference point cloud, a node is divided into an octree, and a indicates that when the block size (power of 2) is at a level greater than the LPU, Each node is divided into 8 sub-nodes, which are hierarchically divided in turn. b indicates that when the node is between the LPU and the minPU, if the node is not the last leaf node and conforms to the direct encoding mode, the coordinate information of the point is directly encoded. c indicates that if the node is the last leaf node, the number of points contained in the leaf node is encoded.

In some embodiments of the present application, both global motion compensation and local motion compensation can be performed, so that the reference node used by local motion compensation is the reference node of the first reference point cloud after global motion compensation, so that the encoder treats encoding The realization of the point cloud encoding is that after the completion of the first method, the obtained second reference point cloud and the point cloud to be encoded are divided into octrees, and the size information of the reference block of the reference node at the current level is obtained, and the current node is at Size information of the current block at the current level.

In the embodiment of this application, since there are two types of point clouds corresponding to the reference point cloud, one is the first reference point cloud and the other is the second reference point cloud, therefore, when performing local motion compensation, the encoder can Which one is the current reference node is determined based on the first reference point cloud and the second reference point cloud. The second reference point cloud is a point cloud after global motion compensation of the first reference point cloud.

That is to say, the reference node is a node determined from the first reference point cloud using the first search window; or a node determined from the second reference point cloud using the second search window.

In this embodiment of the application, the encoder uses the first search window to determine the first node corresponding to the current node from the first reference point cloud, and uses the second search window to determine the node corresponding to the current node from the second reference point cloud. second node. The encoder determines the first minimum distance between each point in the current node and each point in the first node, and determines the second minimum distance between each point in the current node and the point in the second node . The sum of the first minimum distance of each point in the current node is the first distortion, and the sum of the second minimum distance of each point in the current node is the second distortion. If the first distortion is smaller than the second distortion, then determine the reference node corresponding to the current node from the first reference point cloud, that is, the point set w; The reference node corresponding to the current node.

Exemplarily, when the node size is between LPU and minPU, the local motion vector MV is calculated. First, find the search window win_V (first search window) and win_W (second search window) of the current node in the first reference point cloud and the second reference point cloud, and find the point in the current node and the first search window and the distortion of the point in the second search window, if the distortion between the point in the current node and win_V is small, then calculate the MV compensated reference node in win_V, that is, find the node in the first search window that is consistent with the current point cloud The point's corresponding point set w (reference node). Otherwise, calculate MV in win_W.

As shown in Figure 8, in the process of determining the MV, whether the reference node corresponding to the current node of the point cloud to be encoded is determined from the search window of the first reference point cloud or from the search window of the second reference point cloud can be determined. Based on the current node and the reference node, the best motion information MV is determined from the preset candidate motion information list, and local motion compensation is performed based on the motion information MV.

In some embodiments of the present application, based on the motion information, the encoder may perform local motion compensation on the current reference node to obtain the situation of the compensated reference node. When the node size is between LPU and minPU, it is also necessary to determine whether local motion compensation is required for the reference node based on whether the reference node can be split. Whether the reference node can be split can be determined by comparing the first cost and the second cost. Wherein, the first cost is the minimum distortion (cost_NoSplit) between the reference node and the current node after local compensation is performed using the best motion information assuming no split. The second cost is to divide the reference node and the current node into octrees respectively, use the corresponding nodes to calculate 8 local motion vectors MV, and use the best motion information to locally compensate the 8 sub-nodes divided by the current reference node. After that, the 8 minimum distortions corresponding to the 8 sub-nodes divided by the current node are obtained. After adding the 8 distortions, the cost_Split when splitting is obtained. If the first cost is less than the second cost, it is not divided. In the case of no division, it is determined that local motion compensation needs to be performed on the reference node; if the second cost is less than or equal to the first cost, it is divided. In the case of division In this case, local motion compensation does not need to be performed on the current reference node, but local motion compensation is performed on its child nodes.

It can be understood that when the encoder performs local motion compensation on some nodes of the reference point cloud (the first reference point cloud or the second reference point cloud), the image filtering and sampling operation can reduce the calculation amount and improve the coding efficiency.

Exemplarily, as shown in Figure 9, the encoder determines the global motion transformation matrix according to the current point cloud and the first reference point cloud, uses the global motion transformation matrix to perform global motion compensation on the first reference point cloud, and encodes the global motion transformation matrix Matrix, write code stream, that is, Mat_GM bit stream. The encoder performs octree division on the compensated second reference point cloud (corresponding to the reference point cloud) and the current point cloud. During the division process, when the divided block size is between LPU and minPU, the encoder can also After local motion compensation is performed on the current reference node at the current level by using the best motion information (that is, the local motion vector), the compensated reference node is obtained, and the best motion information is encoded and written into the code stream, that is, the MV bit stream. When encoding the current node of the current point cloud, the encoder uses the occupancy information and neighborhood information of the current node, as well as the occupancy information and neighborhood information of the reference node after compensation, and performs arithmetic coding (in Arithmetic encoder) to obtain the coded bit stream. The encoder continues to encode the child nodes of the current node of the current point cloud (that is, the FIFO queue stores the node queue) until the encoding of the current point cloud is completed.

It can be understood that, in the process of point cloud inter-frame encoding, when compensating the global motion and local motion of the first reference point cloud, the encoder first performs image filtering and sampling on the point cloud to be processed to reduce the number of points in the point cloud to be processed. In this way, in the process of calculating the motion information or the global motion transformation matrix, the time complexity of calculation can be reduced, thereby improving the time efficiency of encoding.

The point cloud decoding method provided by the embodiment of the present application is introduced below.

As shown in Figure 10, the embodiment of the present application provides a point cloud decoding method, which is applied to a decoder, and the method may include:

S201. Analyze the code stream to obtain geometric encoding results and motion-related information; the motion-related information is obtained by calculating the first point cloud after image filtering and sampling of the point cloud to be processed in the point cloud to be encoded, and the reference point cloud.

S202. Obtain a decoded reference point cloud.

S203. Based on the motion-related information and the reference point cloud, perform motion compensation on the reference node in the point cloud to be decoded, decode the geometric code stream, and obtain decoding information.

In the embodiment of this application, the decoder can parse the code stream, and parse out the geometric encoding result and motion-related information from the code stream, wherein the motion-related information is obtained by the encoder through image filtering and sampling of the point cloud to be processed in the point cloud to be coded. After the first point cloud is calculated with the reference point cloud. Because the embodiment of the present application may use at least one of global motion compensation and local motion compensation during encoding. Therefore, the motion-related information acquired by the decoder during parsing may be at least one of the global motion transformation matrix and motion information. Correspondingly, the geometry encoding result may also be at least one of the first geometry encoding result (corresponding to global motion compensation) and the second geometry encoding result (corresponding to local motion compensation).

In the embodiment of the present application, when the decoder decodes the current node, it indicates that the reference point cloud has been decoded. In this way, the decoder can decode the current node in the point cloud to be decoded based on the motion-related information and the reference point cloud, so as to obtain the decoding information of the current node.

In some embodiments of the present application, the global motion compensation and the local motion compensation can be decoded in the following two manners.

method one:

In the case of global motion compensation: the geometry encoding result includes: the first geometry encoding result; the motion-related information includes: the global motion transformation matrix.

The decoder performs global motion compensation on the reference point cloud (the first reference point cloud) based on the global motion transformation matrix to obtain the compensated reference point cloud (the second reference point cloud); based on the compensated reference point cloud and the first geometry Encoding result, get decoding information.

Method 2:

In the case of local motion compensation: the geometric encoding result includes: the second geometric encoding result; the motion related information includes: motion information (ie local motion vector).

The decoder divides the reference point cloud (that is, the first reference point cloud or the second reference point cloud) into an octree to determine the reference node; based on the motion information, it performs local motion compensation on the reference node to obtain the compensated reference node; based on The decoded information is obtained from the compensated reference node and the second geometric encoding result.

It should be noted that the process and principle of the octree division are consistent with those of the encoder, and will not be repeated here.

In the embodiment of the present application, the decoding information may be the decoded point cloud corresponding to the current node, and when the decoder completes decoding of all nodes, the decoded point cloud is obtained.

Exemplarily, as shown in Figure 11, the decoder can parse out the global motion transformation matrix and the local motion vector from the coded bit stream, so that the decoder can first use the global motion transformation matrix of the global motion transformation matrix in the process of decoding the current node The vector Mat_GM performs global motion compensation on the reference point cloud to obtain the second reference point cloud, and then divides the second reference point cloud into an octree. When the divided block size is between LPU and minPU, the decoder can use the second After local motion compensation is performed on the reference node of the current level of the reference point cloud, the compensated reference node point cloud of the reference node is obtained, and then the compensated reference node and the coded bit stream corresponding to the current node are decoded to obtain the decoded point cloud After the point cloud corresponding to the current node, continue to decode the next node until the decoding of all nodes is completed, and the decoded point cloud is obtained.

It can be understood that the motion information or global motion transformation matrix used by the decoder is in the process of performing global motion compensation and local motion compensation on the reference frame during the point cloud inter-frame coding, and the point cloud to be processed is image-filtered first, and then Reduce the number of points to be processed, which can reduce the time complexity of the calculation.

Based on the implementation basis of the foregoing embodiments, as shown in FIG. 12, the embodiment of the present application provides an encoder 1, including:

The first determination part 10 is configured to determine the point cloud to be processed in the block to be processed based on the point cloud to be encoded; based on the distance between two points in the point cloud to be processed, determine the edge between the two points weights, thereby determining the graph filter;

The filtering part 11 is configured to obtain a first point cloud after image filtering and sampling based on the image filter sampling;

The encoding part 12 is configured for sampling, and obtains the first point cloud after image filtering and sampling.

In some embodiments of the present application, the first determining part 10 is further configured to determine the edge weight between the two points based on the distance between the two points in the point cloud to be processed, so as to obtain the An adjacency matrix composed of edge weights in the point cloud to be processed; based on the adjacency matrix, the graph filter is determined; the distance between the two points is related to the edge weights between the two points.

In some embodiments of the present application, the first determining part 10 is further configured to perform weighted summation according to the adjacency matrix to obtain a weighted degree matrix; inverse the weighted degree matrix, and combine it with the A graph transformation operator is obtained by multiplying adjacency matrices; a polynomial is constructed based on the graph transformation operator, preset graph filtering parameters, and length of the graph filter to determine the graph filter.

In some embodiments of the present application, the first determining part 10 is further configured to, if the distance between two points in the point cloud to be processed is less than or equal to a preset distance threshold, based on the preset parameter threshold and The distance between the two points is processed to obtain the edge weight between the two points; if the distance between the two points in the point cloud to be processed is greater than the preset distance threshold, the edge weight between the two points to zero.

In some embodiments of the present application, the preset distance threshold is determined based on the preset KNN search parameter and the side length of the block to be processed where the point cloud to be processed is located; wherein, the preset KNN search parameter is Between 0-1;

The preset parameter threshold is a multiple of the preset distance threshold.

In some embodiments of the present application, the filtering part 11 is further configured to perform image filtering on the point cloud to be processed based on the image filter to obtain a second point cloud with frequency information; Frequency-domain sampling is performed on the second point cloud to obtain the first point cloud after image filtering and sampling.

In some embodiments of the present application, the filtering part 11 is further configured to perform two-norm sorting on the three-dimensional coordinates of the points in the second point cloud with frequency information, and obtain the corresponding Frequency value; according to the preset filtering ratio, the frequency value corresponding to each point after the sorting is removed with the lowest frequency and the highest frequency to obtain the remaining frequency value; the remaining frequency value corresponds to the point cloud to be processed The points constitute the first point cloud.

In some embodiments of the present application, the first determining part 10 is configured to determine, based on the point cloud to be encoded, an object point cloud within a preset range satisfying the Z direction; block the object point cloud Divide to obtain a plurality of blocks to be processed, and the point cloud in each block to be processed is the point cloud to be processed.

In some embodiments of the present application, the encoding part 12 is configured to calculate the global motion transformation matrix of two adjacent point clouds based on the first reference point cloud and the first point cloud, and convert the The global motion transformation matrix is written into the code stream; the first reference point cloud is the initial point cloud corresponding to the reference point cloud; based on the global motion transformation matrix, global motion compensation is performed on the first reference point cloud to obtain The second reference point cloud; perform octree division on the reference node of the second reference point cloud and the current node of the point cloud to be encoded, and divide the octree division result of the reference node of the second reference point cloud As the predicted text of the octree division of the current node of the point cloud to be encoded, arithmetic encoding is performed on the octree division result of the current node of the point cloud to be encoded to obtain the first geometric encoding result, and the The result of the first geometric encoding is written into the code stream.

In some embodiments of the present application, the first determining part 10 is further configured to perform octree division on the obtained first reference point cloud and the point cloud to be coded respectively, and obtain the reference node at the current level The size information of the reference block, and the size information of the current block of the current node at the current level; the size information of the reference block and the size information of the current block indicate that the reference block and the current block belong to the largest prediction unit and the smallest prediction unit time, determine the reference node and the current node as the point cloud to be processed in the block to be processed.

The encoding part 12 is further configured to determine motion information from a preset candidate motion information list according to the reference node sampled by the graph filter and the current node sampled by the graph filter; based on the motion information , performing local motion compensation on the reference nodes in the first reference point cloud to obtain the compensated reference nodes; performing arithmetic coding based on the compensated reference nodes and the current node to obtain the second geometric code of the current node result.

In some embodiments of the present application, the encoding part 12 is further configured to traverse each candidate motion information in the preset candidate motion information list, and perform Local motion estimation, obtaining an estimated and compensated reference node corresponding to each candidate motion information; determining each point in the current node after image filtering and sampling in the first point cloud and the estimated and compensated reference node The minimum distance between each point in the graph; the sum of the minimum distances of each point in the current node after the graph is filtered and sampled, as the distortion of each candidate motion information; the each candidate motion information The candidate motion information corresponding to the minimum distortion among the distortions is determined as the motion information.

In some embodiments of the present application, the reference node is a node determined from the first reference point cloud using the first search window; or a node determined from the second reference point cloud using the second search window.

It is understandable that the encoder can perform image filtering on the point cloud to be processed in the process of global motion compensation and local motion compensation for the reference point cloud in the process of point cloud encoding, and reduce the number of participating points to be processed. In this way, the time complexity of calculation can be reduced, thereby improving the time efficiency of encoding.

In practical applications, as shown in Figure 13, the embodiment of the present application also provides an encoder, including:

The first memory 13 is used to store executable point cloud coding instructions;

The first processor 14 is configured to implement the point cloud encoding method when executing the executable point cloud encoding instruction stored in the first memory 13 .

Wherein, the first processor 14 can be implemented by software, hardware, firmware or a combination thereof, and can use circuits, single or multiple application specific integrated circuits (ASIC), single or multiple general integrated circuits, single or multiple a microprocessor, a single or multiple programmable logic devices, or a combination of the aforementioned circuits or devices, or other suitable circuits or devices, so that the first processor 14 can execute the point cloud encoding method in the aforementioned embodiments corresponding steps.

Each component in the embodiment of the present application may be integrated into one processing unit, or each unit may physically exist separately, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.

If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment is essentially or It is said that the part that contributes to the prior art or the whole or part of the technical solution can be embodied in the form of a software product, the computer software product is stored in a storage medium, and includes several instructions to make a computer device (which can It is a personal computer, a server, or a network device, etc.) or a processor (processor) that executes all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes: magnetic random access memory (FRAM, ferromagnetic random access memory), read-only memory (ROM, Read Only Memory), programmable read-only memory (PROM, Programmable Read-Only Memory), erasable Programmable Read-Only Memory (EPROM, Erasable Programmable Read-Only Memory), Electrically Erasable Programmable Read-Only Memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), Flash Memory (Flash Memory), Magnetic Surface Memory, Optical Disk , or CD-ROM (Compact Disc Read-Only Memory) and other media that can store program codes, the embodiments of the present application are not limited.

The embodiment of the present application also provides a computer-readable storage medium, which stores executable point cloud coding instructions, and is used to cause the first processor to execute the point cloud coding method provided in the embodiment of the present application.

Based on the implementation basis of the foregoing embodiments, as shown in FIG. 14 , the embodiment of the present application provides a decoder 2, including:

The decoding part 20 is configured to analyze the code stream, and obtain geometric encoding results and motion-related information; the motion-related information is the first point cloud after image filtering and sampling of the point cloud to be processed in the point cloud to be encoded, and the reference point obtained by cloud computing;

The obtaining part 21 is configured to obtain the decoded reference point cloud;

The decoding part 20 is further configured to perform motion compensation on the reference point cloud based on the motion related information and the reference point cloud, and decode the geometric code stream to obtain decoded information.

In some embodiments of the present application, the geometric encoding result includes: a first geometric encoding result; the motion-related information includes: a global motion transformation matrix;

The decoding part 20 is further configured to perform global motion compensation on the reference point cloud based on the global motion transformation matrix to obtain a compensated reference point cloud; based on the compensated reference point cloud and the first A geometrically encoded result to obtain the decoded information.

In some embodiments of the present application, the geometric encoding result includes: a second geometric encoding result; the motion-related information includes: motion information;

The decoding part 20 is further configured to perform octree division on the reference point cloud to determine a reference node; based on the motion information, perform local motion compensation on the reference node to obtain a compensated reference node; The decoded information is obtained from the compensated reference node and the second geometric encoding result.

It can be understood that the motion information or global motion transformation matrix used by the decoder is in the process of point cloud encoding, in the process of global motion compensation and local motion compensation of the reference point cloud, image filtering sampling is performed on the point cloud to be processed first, Reduce the number of participating pending points, so that the time complexity of calculation can be reduced.

In practical applications, as shown in Figure 15, the embodiment of the present application also provides a decoder, including:

The second memory 23 is used to store executable point cloud decoding instructions;

The second processor 24 is configured to implement the point cloud decoding method of the decoder when executing the executable point cloud decoding instruction stored in the second memory 23 .

The embodiment of the present application also provides a computer-readable storage medium storing executable point cloud decoding instructions, which are used to cause the second processor to execute the point cloud decoding method provided in the embodiment of the present application.

Those skilled in the art can appreciate that the units and process steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.

Finally, it should be noted that the above content is only a specific embodiment of the application, but the scope of protection of the application is not limited thereto. Any person familiar with the technical field can easily think of Any changes or substitutions shall fall within the scope of protection of this application. Therefore, the protection scope of the present application should be determined by the protection scope of the claims.

Industrial Applicability

The embodiment of the present application provides a point cloud encoding and decoding method, an encoder, a decoder, and a computer-readable storage medium. The encoder can perform global motion compensation and local motion compensation on the reference point cloud during the point cloud encoding process. In the process, the point cloud to be processed is first filtered and sampled to reduce the number of points participating in the point cloud to be processed. In this way, the calculation time and complexity can be reduced in the process of calculating motion information or global motion transformation matrix using the filtered and sampled point cloud. degree, thereby improving the encoding time efficiency.

Claims

A point cloud encoding method, comprising:

Based on the point cloud to be encoded, determine the point cloud to be processed in the block to be processed;

Based on the distance between two points in the point cloud to be processed, determine an edge weight between the two points, thereby determining a graph filter;

Based on the image filter, image filtering sampling is performed on the point cloud to be processed to obtain a first point cloud after image filtering sampling;

Compensating the reference point cloud based on the first point cloud, and then encoding the point cloud to be encoded.
The method according to claim 1, wherein said determination of edge weights between said two points based on the distance between two points in said point cloud to be processed, thereby determining a graph filter, comprises:

Based on the distance between two points in the point cloud to be processed, determine the edge weight between the two points, thereby obtaining the adjacency matrix formed by the edge weights in the point cloud to be processed; The distance is related to the edge weight between said two points;

Based on the adjacency matrix, the graph filter is determined.
The method according to claim 2, wherein said determining said graph filter based on said adjacency matrix comprises:

performing weighted summation according to the adjacency matrix to obtain a weighted degree matrix;

Inverting the weighted degree matrix and multiplying it with the adjacency matrix to obtain a graph transformation operator;

The graph filter is determined by constructing a polynomial based on the graph transform operator, preset graph filter parameters, and length of the graph filter.
The method according to any one of claims 1 to 3, wherein the determining the edge weight between the two points based on the distance between the two points in the point cloud to be processed comprises:

If the distance between two points in the point cloud to be processed is less than or equal to the preset distance threshold, then process based on the preset parameter threshold and the distance between the two points to obtain the edge weight between the two points;

If the distance between two points in the point cloud to be processed is greater than a preset distance threshold, the edge weight between the two points is zero.
The method according to claim 4, wherein,

The preset distance threshold is determined based on a preset KNN search parameter and the side length of the block to be processed; wherein, the preset KNN search parameter is between 0-1;

The preset parameter threshold is a multiple of the preset distance threshold.
The method according to any one of claims 1 to 5, wherein, based on the graph filter, performing graph filtering and sampling on the point cloud to be processed to obtain the first point cloud after graph filtering and sampling, include:

performing graph filtering on the point cloud to be processed based on the graph filter to obtain a second point cloud with frequency information;

Perform frequency-domain sampling on the second point cloud to obtain the first point cloud after image filtering and sampling.
The method according to claim 6, wherein said performing frequency-domain sampling on said second point cloud to obtain said first point cloud after image filtering and sampling comprises:

Performing two-norm sorting on the three-dimensional coordinates of the points in the second point cloud with frequency information to obtain a frequency value corresponding to each point after sorting;

According to the preset filtering ratio, the lowest frequency and the highest frequency are removed from the frequency values corresponding to each point after sorting to obtain the remaining frequency values; the points corresponding to the remaining frequency values constitute the first point cloud.
The method according to any one of claims 1 to 7, wherein, said point cloud to be encoded based on the point cloud to be encoded, determining the point cloud to be processed in the block to be processed, comprising:

Based on the point cloud to be encoded, determine an object point cloud within a preset range satisfying the Z direction;

The point cloud of the object is divided into blocks to obtain a plurality of blocks to be processed, and the point cloud in each block to be processed is the point cloud to be processed.
The method according to claim 8, wherein said compensating the reference point cloud based on the first point cloud, and then encoding the point cloud to be encoded comprises:

Based on the first reference point cloud and the first point cloud, calculate the global motion transformation matrix of two adjacent point clouds, and write the global motion transformation matrix into the code stream; the first reference point cloud is the reference The initial point cloud corresponding to the point cloud;

Based on the global motion transformation matrix, perform global motion compensation on the first reference point cloud to obtain a second reference point cloud;

Perform octree division on the second reference point cloud and the point cloud to be encoded respectively, and use the octree division result of the reference node of the second reference point cloud as the octree division result of the current node of the point cloud to be encoded The predicted text of the fork tree division, and then perform arithmetic encoding on the octree division result of the current node of the point cloud to be encoded to obtain the first geometric encoding result, and write the first geometric encoding result into the code stream.
The method according to any one of claims 1 to 7, wherein the point cloud to be encoded based on the point cloud to be encoded, determining the point cloud to be processed in the block to be processed comprises:

Carry out octree division on the first reference point cloud and the point cloud to be encoded respectively, obtain the size information of the reference block of the reference node at the current level, and the size information of the current block of the current node at the current level; the first The reference point cloud is an initial point cloud corresponding to the reference point cloud;

When the size information of the reference block and the size information of the current block indicate that the reference block and the current block belong between the largest prediction unit and the smallest prediction unit, determine the reference node and the current node as the to-be-processed The point cloud to be processed is described in the block.
The method according to claim 10, wherein said compensating the reference point cloud based on the first point cloud, and then encoding the point cloud to be encoded comprises:

Determining motion information from a preset candidate motion information list according to the reference node sampled by graph filtering and the current node sampled by graph filtering in the first point cloud;

Based on the motion information, perform local motion compensation on reference nodes in the first reference point cloud to obtain compensated reference nodes;

performing arithmetic coding based on the compensated reference node and the current node to obtain a second geometric coding result.
The method according to claim 11, wherein the motion is determined from the preset candidate motion information list according to the reference node sampled by graph filtering and the current node sampled by graph filtering in the first point cloud. information, including:

Traverse each candidate motion information in the preset candidate motion information list, perform local motion estimation on the points in the reference node after the graph filtering and sampling, and obtain the estimated and compensated reference node corresponding to each candidate motion information node;

determining the minimum distance between each point in the current node sampled by graph filtering in the first point cloud and each point in the estimated compensated reference node;

taking the sum of the minimum distances of each point in the current node after the graph filtering and sampling as the distortion of each candidate motion information;

The candidate motion information corresponding to the minimum distortion among the distortions of each candidate motion information is determined as the motion information.
A method according to any one of claims 9 to 12, wherein,

The reference node is a node determined by using the first search window in the first reference point cloud; or a node determined by the second search window from the second reference point cloud.
A point cloud decoding method, comprising:

Analyzing the code stream to obtain geometric encoding results and motion-related information; the motion-related information is obtained by calculating the first point cloud after image filtering and sampling of the point cloud to be processed in the point cloud to be encoded, and the reference point cloud;

Get the decoded reference point cloud;

Based on the motion-related information and the reference point cloud, motion compensation is performed on the reference point cloud, and the geometric code stream is decoded to obtain decoded information.
The method according to claim 14, wherein the geometric encoding result comprises: a first geometric encoding result; the motion-related information comprises: a global motion transformation matrix;

The step of performing motion compensation on the reference point cloud based on the motion-related information and the reference point cloud, decoding the geometric code stream, and obtaining decoding information includes:

Based on the global motion transformation matrix, perform global motion compensation on the reference point cloud to obtain a compensated reference point cloud;

The decoding information is obtained based on the compensated reference point cloud and the first geometric encoding result.
The method according to claim 14, wherein the geometric encoding result comprises: a second geometric encoding result; the motion-related information comprises: motion information;

The step of performing motion compensation on the reference point cloud based on the motion-related information and the reference point cloud, decoding the geometric code stream, and obtaining decoding information includes:

Carry out octree division to described reference point cloud, determine reference node;

performing local motion compensation on the reference node based on the motion information to obtain a compensated reference node;

The decoding information is obtained based on the compensated reference node and the second geometric encoding result.
An encoder comprising:

The first determination part is configured to determine, based on the point cloud to be encoded, the point cloud to be processed in the block to be processed; based on the distance between two points in the point cloud to be processed, determine the edge weight between the two points , so as to determine the graph filter;

The filtering part is configured to perform image filtering and sampling on the point cloud to be processed based on the image filter, to obtain a first point cloud after image filtering and sampling;

The encoding part is configured to compensate the reference point cloud based on the first point cloud, and then encode the point cloud to be encoded.
An encoder comprising:

The first memory is used to store executable point cloud coding instructions;

The first processor is configured to implement the method according to any one of claims 1 to 13 when executing the executable point cloud encoding instructions stored in the first memory.
A decoder comprising:

The decoding part is configured to analyze the code stream, and obtain geometric encoding results and motion-related information; the motion-related information is the first point cloud after image filtering and sampling of the point cloud to be processed in the point cloud to be encoded, and the reference point cloud calculated;

an acquisition part configured to acquire the decoded reference point cloud;

The decoding part is further configured to perform motion compensation on the reference point cloud based on the motion-related information and the reference point cloud, and decode the geometric code stream to obtain decoded information.
A decoder comprising:

The second memory is used to store executable point cloud decoding instructions;

The second processor is configured to implement the method according to any one of claims 14 to 16 when executing the executable point cloud decoding instruction stored in the second memory.
A computer-readable storage medium, storing executable point cloud coding instructions, used to cause the first processor to execute the method according to any one of claims 1 to 13; or storing executable point cloud decoding The instruction is used to cause the second processor to implement the method described in any one of claims 14 to 16 when executed.