
CN118575200A - Adaptive geometry filtering for mesh compression - Google Patents

Adaptive geometry filtering for mesh compression

Info

Publication number
CN118575200A
CN118575200A (application CN202380017864.1A)
Authority
CN
China
Prior art keywords
vertex
vertices
filter coefficients
filter
groups
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202380017864.1A
Other languages
Chinese (zh)
Inventor
张翔
许晓中
黄超
田军
刘杉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent America LLC
Original Assignee
Tencent America LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 18/485,770 (published as US20240135594A1)
Application filed by Tencent America LLC filed Critical Tencent America LLC
Publication of CN118575200A
Legal status: Pending

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method and apparatus includes computer code configured to cause one or more processors to determine a plurality of vertices in an input mesh and group the plurality of vertices into a plurality of vertex groups. Grouping the respective vertices into respective groups may be based on the topological distance of the respective vertices. In an embodiment, the one or more processors may further determine a set of filter coefficients for the plurality of vertex groups; and signaling the plurality of vertex groups and the set of filter coefficients.

Description

Adaptive geometry filtering for mesh compression
RELATED APPLICATIONS
The present application claims the benefit of priority from U.S. provisional application No. 63/416,382, filed on October 14, 2022, and U.S. application No. 18/485,770, filed on October 12, 2023, the disclosures of which are incorporated herein by reference in their entireties.
Technical Field
Embodiments of the present disclosure relate to video encoding and decoding. In particular, embodiments of the present disclosure relate to encoding and decoding meshes, including adaptive geometry filtering in mesh compression.
Background
Advanced three-dimensional (3D) world representations are achieving more immersive forms of interaction and communication. In order to achieve realism in 3D representations, 3D models are becoming more and more complex and large amounts of data are associated with the creation and use of these 3D models. 3D meshes are widely used for 3D modeling of immersive content.
The 3D mesh may comprise several polygons describing the surface of a volumetric object. Dynamic mesh sequences may require a large amount of data because there may be a large amount of time-varying information. Therefore, efficient compression techniques are needed to store and transmit such content.
Previously, mesh compression standards were established, namely Interpolation Coding (IC), MeshGrid, and Frame-based Animated Mesh Compression (FAMC), to handle dynamic meshes with constant connectivity and time-varying geometry and vertex attributes. However, these standards do not take into account time-varying attribute maps and connectivity information.
Furthermore, it is also challenging for volume acquisition techniques to generate a dynamic mesh with constant connectivity, especially under real-time constraints. Existing standards do not support this type of dynamic mesh content.
As another example, the GL Transmission Format (glTF) is a standard developed by the Khronos Group for the efficient transmission and loading of 3D scenes and models by applications. glTF aims to minimize both the size of 3D assets and the runtime processing required to unpack them. An extension to glTF 2.0 based on the Google Draco technology provides geometry compression to reduce the size of glTF models and scenes.
Disclosure of Invention
According to one embodiment, a method and apparatus includes computer code configured to cause one or more processors to: determining a plurality of vertices in an input mesh, the input mesh representing volumetric data of at least one three-dimensional (3D) visual content; grouping the plurality of vertices into a plurality of vertex groups, wherein grouping the respective vertices into the respective groups is based on a topological distance of the respective vertices; determining a set of filter coefficients for a plurality of vertex groups; and signaling the plurality of vertex groups and the set of filter coefficients.
According to one embodiment, a method and apparatus includes computer code configured to cause one or more processors to: receiving an encoded bitstream associated with the mesh, wherein the encoded bitstream includes information about vertices in the mesh and filter coefficients associated with the vertices; obtaining a plurality of vertex groups included in a mesh from an encoded bitstream; obtaining a set of filter coefficients for a plurality of vertex groups from the encoded bitstream; generating a reconstructed mesh based on the information about vertices in the mesh; and generating a refined reconstructed mesh using the reconstructed mesh, the plurality of vertex groups, and the set of filter coefficients for the plurality of vertex groups.
Drawings
Further features, properties and various advantages of the disclosed subject matter will become more apparent from the following detailed description and drawings, in which:
fig. 1 is a schematic diagram of a simplified block diagram of a communication system according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a simplified block diagram of a streaming system according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of a simplified block diagram of a video encoder and decoder according to an embodiment of the present disclosure.
Fig. 4A-B are exemplary illustrations of UV parameterized mapping from 3D mesh segments to 2D charts according to embodiments of the present disclosure.
Fig. 5 is a flowchart of a process for encoding adaptive filter coefficients according to an embodiment of the present disclosure.
Fig. 6 is a flowchart of a process for generating a reconstructed grid using signal adaptive filter coefficients according to an embodiment of the present disclosure.
FIG. 7 is an exemplary diagram of a computer system suitable for implementing embodiments.
Detailed Description
The proposed features discussed below may be used separately or combined in any order. Furthermore, the embodiments may be implemented by processing circuitry (e.g., one or more processors or one or more integrated circuits). In one example, one or more processors execute a program stored in a non-transitory computer-readable medium.
Fig. 1 shows a simplified block diagram of a communication system 100 according to an embodiment of the present disclosure. Communication system 100 may include at least two terminals 102 and 103 that are connected to each other via a network 105. For unidirectional transmission of data, the first terminal 103 may encode video data at a local location for transmission over the network 105 to the other terminal 102. The second terminal 102 may receive encoded video data of another terminal from the network 105, decode the encoded data, and display the restored video data. Unidirectional data transmission may be common in applications such as media services.
Fig. 1 shows a second pair of terminals 101 and 104 for supporting bi-directional transmission of encoded video, such as may occur during a video conference. For bi-directional transmission of data, each terminal 101 and 104 may encode video data captured at a local location for transmission over the network 105 to the other terminal. Each terminal 101 and 104 may also receive encoded video data transmitted by other terminals, decode the encoded data, and display the recovered video data on a local display device.
In fig. 1, terminals 101, 102, 103, and 104 may be represented as servers, personal computers, and smart phones, but the principles of the present disclosure are not so limited. Embodiments of the present disclosure may be applied to notebook computers, tablet computers, media players, and/or dedicated video conferencing devices. Network 105 represents any number of networks that transmit encoded video data between terminals 101, 102, 103, and 104, including wired and/or wireless communication networks, for example. The communication network 105 may exchange data in circuit-switched and/or packet-switched channels. Representative networks include telecommunication networks, local area networks, wide area networks, and/or the internet. For purposes of this discussion, the structure and topology of the network 105 may be irrelevant to the operation of the present disclosure unless explained below.
Fig. 2 illustrates the application of video encoders and decoders in a streaming environment. The disclosed subject matter is equally applicable to other video applications including, for example, video conferencing, digital television, and the storage of compressed video on digital media including CDs (Compact Discs), DVDs (Digital Versatile Discs), flash memory storage, and the like.
The streaming media system may include a capture subsystem 203, which may include a video source 201, such as a digital camera, that creates an uncompressed video sample stream 213. The sample stream 213, which has a high data volume when compared to encoded video bitstreams, may be processed by an encoder 202 coupled to the video source 201. The encoder 202 may include hardware, software, or a combination thereof to enable or implement aspects of the disclosed subject matter as described in more detail below. The encoded video bitstream 204, which has a smaller data volume than the sample stream, may be stored on a streaming server 205 for future use. One or more streaming clients 212 and 207 may access the streaming server 205 to retrieve copies 208 and 206 of the encoded video bitstream 204. A client 212 may include a video decoder 211 that decodes the incoming copy of the encoded video bitstream 208 and creates an outgoing video sample stream 210 that may be rendered on a display 209 or another rendering device (not depicted). In some streaming media systems, the video bitstreams 204, 206, and 208 may be encoded according to certain video encoding/compression standards. Examples of those standards are noted above and further described herein.
According to an exemplary embodiment, which is described further below, the term "mesh" denotes a combination of one or more polygons that describe the surface of a volumetric object. Each polygon is defined by its vertices in three-dimensional space and information on how the vertices are connected, referred to as connectivity information. Optionally, vertex attributes (e.g., color, normal, etc.) may be associated with the mesh vertices. The mesh may also be parameterized with mapping information that associates the mesh surface with two-dimensional attribute maps. Such a mapping may be described by a set of parametric coordinates (referred to as UV coordinates or texture coordinates) associated with the mesh vertices. Two-dimensional attribute maps are used to store high-resolution attribute information such as texture, normals, displacements, and the like. According to an exemplary embodiment, such information may be used for various purposes such as texture mapping and rendering.
However, dynamic mesh sequences may require a large amount of data because they may contain a large amount of time-varying information. For example, in a "static mesh" or "static mesh sequence" the information of the mesh does not change from frame to frame, whereas a "dynamic mesh" or "dynamic mesh sequence" indicates movement of the vertices represented by the mesh from frame to frame. Therefore, efficient compression techniques are needed to store and transmit such content. Previously, the Moving Picture Experts Group (MPEG) developed the mesh compression standards IC, MeshGrid, and FAMC to handle dynamic meshes with constant connectivity and time-varying geometry and vertex attributes. However, these standards do not take into account time-varying attribute maps and connectivity information. Digital Content Creation (DCC) tools typically generate such dynamic meshes. In addition, it is challenging for volume acquisition techniques to generate a dynamic mesh with constant connectivity, especially under real-time constraints, and existing standards do not support such content. In accordance with the exemplary embodiments herein, aspects of a new mesh compression standard are described to directly handle lossy and lossless compression of dynamic meshes with time-varying connectivity information and, optionally, time-varying attribute maps, for various applications such as real-time communications, storage, free-viewpoint video, Augmented Reality (AR), and Virtual Reality (VR). In addition, functionalities such as random access and scalable/progressive coding are also considered.
FIG. 3 illustrates an example framework 300 for dynamic mesh compression, e.g., a two-dimensional atlas sampling-based approach. Each frame of the input mesh 301 may be pre-processed through a series of operations, such as tracking, re-meshing, parameterization, and voxelization. These operations may be encoder-only, that is, they may not be part of the decoding process, and this possibility may be signaled by a flag in the metadata, e.g., 0 for encoder-only and 1 otherwise. Subsequently, a mesh with a two-dimensional UV atlas 302 may be obtained, wherein each vertex of the mesh has one or more associated UV coordinates in the two-dimensional atlas. The mesh may then be converted into multiple maps, including geometry maps and attribute maps, by sampling on the two-dimensional atlas. These two-dimensional maps may then be encoded by video/image codecs, e.g., HEVC, VVC, AV1, AVS3, and the like. On the decoder 303 side, the mesh may be reconstructed from the decoded two-dimensional maps. Any post-processing and filtering may also be applied to the reconstructed mesh 304. Note that, for the purpose of three-dimensional mesh reconstruction, other metadata may be signaled to the decoder side. Note that chart boundary information (including the UV and XYZ coordinates of the boundary vertices) may be predicted, quantized, and entropy coded in the bitstream. The quantization step size may be configured at the encoder side to trade off between quality and bit rate.
In some implementations, the three-dimensional mesh may be divided into a plurality of segments (or patches/charts), and, according to example embodiments, one or more three-dimensional mesh segments may be considered a "three-dimensional mesh". Each segment consists of a set of connected vertices associated with their geometry, attribute, and connectivity information. As shown in example 400 of volumetric data in FIG. 4A, the UV parameterization process 402 of mapping from a three-dimensional mesh segment onto a two-dimensional chart (e.g., the two-dimensional UV atlas 302 block described above) maps one or more mesh segments 401 onto a two-dimensional chart 403 in a two-dimensional UV atlas 404. Each vertex (v_n) in the mesh segment is assigned two-dimensional UV coordinates in the two-dimensional UV atlas. Note that a vertex (v_n) in the two-dimensional chart corresponds to a vertex in the three-dimensional mesh segment, forming a connected component. The geometry, attribute, and connectivity information of each vertex can also be inherited from its three-dimensional counterpart. For example, it can be indicated that vertex v4 is directly connected to vertices v0, v5, v1, and v3, and corresponding information can be indicated for each of the other vertices. Furthermore, according to an exemplary embodiment, such a two-dimensional textured mesh may further indicate information, e.g., color information, patch by patch, e.g., with the triangle v2, v5, v3 as one "patch".
For example, further describing features of example 400 of FIG. 4A, reference is made to example 450 of FIG. 4B, in which a three-dimensional mesh segment 451 may also be mapped onto multiple independent two-dimensional charts 451 and 452. In this case, a vertex in three dimensions may correspond to multiple vertices in the two-dimensional UV atlas. As shown in FIG. 4B, the same three-dimensional mesh segment is mapped into multiple two-dimensional charts in the two-dimensional UV atlas, rather than a single chart as in FIG. 4A. For example, three-dimensional vertices v1 and v4 have two-dimensional corresponding points v1, v1' and v4, v4', respectively. Thus, a general two-dimensional UV atlas of a three-dimensional mesh may consist of multiple charts, where each chart may contain multiple (typically greater than or equal to 3) vertices associated with their three-dimensional geometry, attribute, and connectivity information.
Fig. 4B shows an example 453 illustrating a derived triangulation in a chart with boundary vertices B0, B1, B2, B3, B4, B5, B6, B7. After this information is obtained, any triangulation method may be used to establish connections between the vertices (including boundary vertices and sampled vertices). For example, for each vertex, the nearest two vertices may be found; or, over all vertices, triangles may be generated continuously until a certain minimum number of triangles is reached. As shown in example 453, there are various regularly shaped, repeating triangles and various irregularly shaped triangles, generally closest to the boundary vertices, that have their own unique dimensions and may or may not share edges with other triangles. The connectivity information may also be reconstructed by explicit signaling: if a polygon cannot be recovered by an implicit rule, the encoder may, according to an exemplary embodiment, signal the connectivity information in the bitstream.
Boundary vertices B0, B1, B2, B3, B4, B5, B6, and B7 are defined in the two-dimensional UV space. A boundary edge may be determined by checking whether the edge appears in only one triangle. According to an exemplary embodiment, the following information about the boundary vertices is significant and should be signaled in the bitstream: geometry information, i.e., the three-dimensional XYZ coordinates (even though the vertices are currently in two-dimensional UV parametric form), and the two-dimensional UV coordinates.
For the case where a three-dimensional boundary vertex corresponds to multiple vertices in the two-dimensional UV atlas, the mapping from three-dimensional XYZ to two-dimensional UV may be one-to-many, as shown in FIG. 4B. Thus, a UV-to-XYZ (or UV2XYZ) index may be used to indicate the mapping function. UV2XYZ may be a one-dimensional index array mapping each two-dimensional UV vertex to a three-dimensional XYZ vertex.
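By way of a brief, non-limiting illustration (the array contents below are hypothetical and only mirror the v1/v1' and v4/v4' example above), such a UV2XYZ index can be held as a plain one-dimensional array:

```python
# Hypothetical UV2XYZ index for a chart whose 2D vertices are ordered
# [v0, v1, v1', v2, v3, v4, v4', v5]: duplicated UV vertices (v1', v4')
# point back to the same 3D vertex as their counterparts.
uv2xyz = [0, 1, 1, 2, 3, 4, 4, 5]

def xyz_of(uv_index):
    """Return the 3D vertex index that a 2D UV vertex maps to."""
    return uv2xyz[uv_index]

print(xyz_of(2), xyz_of(6))  # both duplicated UV vertices resolve to 3D vertices 1 and 4
```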
As described above, dynamic mesh sequences may require a large amount of data because they may contain a large amount of time-varying information. The sample-based approach used in the related art may introduce some artifacts on the reconstructed mesh geometry, which may reduce visual quality. Therefore, there is a need to develop efficient algorithms to reduce such artifacts and to improve the quality of the reconstructed mesh.
Embodiments of the present disclosure may be used alone or in combination in any order. Furthermore, each of the methods (or embodiments), encoder, and decoder may be implemented by a processing circuit (e.g., one or more processors or one or more integrated circuits). In one example, one or more processors execute a program stored in a non-transitory computer readable medium.
Embodiments of the present disclosure relate to various methods for filtering reconstructed geometries in mesh compression. Those skilled in the art will appreciate that the methods disclosed herein may be applied alone or in any combination. It should also be noted that these methods can be applied to static grids where there is only one grid frame, or the grid content does not change over time.
An advantage of the adaptive geometry filtering disclosed herein is that it improves the quality of the reconstructed mesh by signaling sets of filter coefficients. The filter coefficients may be trained on the encoder side based on the reconstructed mesh and the corresponding reference mesh, where each set of filter coefficients may be trained on a group of similar vertices. The filter coefficients may then be applied to each vertex and its neighbors.
More specifically, according to an embodiment, let V^rec denote the vertices of the reconstructed mesh and V^ref the vertices of the corresponding reference mesh. Note that the reconstructed mesh and the reference mesh have the same number of vertices (N) and the same topology, and the vertices in V^rec and V^ref have a one-to-one correspondence. The reference mesh may be the original mesh or a mesh warped from the original mesh, depending on the encoder configuration. The adaptive geometry filtering according to embodiments of the present disclosure may be applied on the encoder side as follows:
As an initial step, vertices may be classified into groups.
Let V^rec_{i,j} be the jth vertex in the ith group of the reconstructed mesh, where j = 1, 2, …, G_i and i = 1, 2, …, N_G. N_G is the number of groups, and G_i is the number of vertices in the ith group. Similarly, let V^ref_{i,j} be the jth vertex in the ith group of the corresponding reference mesh.
In the same or a next step, for the ith group of vertices, a set of filter coefficients (c_i) may be calculated by minimizing a defined error function based on the vertices in the group and their neighbors.
In the next or same step, for the ith group of vertices, the calculated filter coefficients (c_i) may be quantized, optimized, and encoded in the bitstream, and later dequantized by the decoder.
Let ĉ_i denote the corresponding dequantized filter coefficients. Then, for the ith group of vertices, each vertex is updated given the dequantized coefficients ĉ_i.
Thus, according to one embodiment, adaptive geometric filtering may include determining a plurality of vertices in an input mesh and grouping the plurality of vertices into a plurality of vertex groups. Then, a set of filter coefficients for the plurality of vertex groups may be determined. The plurality of vertex groups and the set of filter coefficients may then be encoded and signaled. In some embodiments, they may be encoded in the same bitstream. In some implementations, the set of filter coefficients may be encoded and signaled as metadata.
The adaptive geometry filtering according to embodiments of the present disclosure may be applied on the decoder side as follows:
Vertices may be classified into groups using exactly the same method as at the encoder. In some embodiments, the grouping method is agreed upon by the encoder and decoder. In some implementations, the agreed-upon method may include grouping respective vertices into respective groups based on topological distances of the respective vertices.
Then, for the ith group of vertices, the filter coefficients may be decoded from the bitstream and dequantized in the same way as at the encoder.
In the same or a different step, for the ith group of vertices, each vertex may be updated given the dequantized coefficients ĉ_i, in the same way as at the encoder. Note that the same methods presented in this disclosure may be applied to a 3D point cloud, where the vertices of the mesh correspond to the points of the point cloud. The same methods can also be applied to other attributes of the mesh (and of the point cloud), such as color, normals, etc.
Thus, according to one embodiment, adaptive geometric filtering applied at a decoder may include receiving an encoded bitstream associated with a mesh, wherein the encoded bitstream includes information about vertices in the mesh and filter coefficients associated with the vertices, and obtaining, from the encoded bitstream, a plurality of vertex groups included in the mesh and a set of filter coefficients for the plurality of vertex groups. The reconstructed mesh may then be constructed based on the information about the vertices in the mesh, which may be refined using the plurality of vertex groups and the set of filter coefficients for the plurality of vertex groups.
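As a minimal decoder-side sketch, assuming the single-coefficient Laplacian form of the filter described further below (the function and variable names are illustrative and are not part of any specified syntax):

```python
import numpy as np

def refine_mesh(v_rec, neighbors, group_of, c_hat):
    """Apply per-group adaptive geometry filtering to a reconstructed mesh.

    v_rec     : (N, 3) array of reconstructed vertex positions
    neighbors : list of index lists; neighbors[j] holds the neighbors of vertex j
    group_of  : length-N array giving the group index of each vertex
    c_hat     : dict mapping group index -> dequantized filter coefficient
    """
    v_out = v_rec.copy()
    for j in range(len(v_rec)):
        c = c_hat.get(group_of[j])
        if c is None or not neighbors[j]:            # filtering disabled or isolated vertex
            continue
        avg = v_rec[neighbors[j]].mean(axis=0)
        v_out[j] = v_rec[j] + c * (avg - v_rec[j])   # blend toward the neighbor average
    return v_out

# Toy usage: four vertices of a quad, all in one group, coefficient 0.5.
v_rec = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
neighbors = [[1, 2], [0, 3], [0, 3], [1, 2]]
refined = refine_mesh(v_rec, neighbors, np.zeros(4, dtype=int), {0: 0.5})
```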
Vertex grouping
The vertex grouping may classify the vertices of the reconstructed mesh into groups based on different features, as long as the same process can be done at the decoder side.
In one embodiment, vertices may be grouped into regions based on the topology of the reconstructed mesh, where each group contains approximately the same number of vertices connected by edges. In some embodiments, a distance that measures the structural separation between two points, e.g., the topological distance (the number of edges between two points) or the Euclidean distance in the metric space, may be used for grouping.
In the same or another embodiment, vertices may be grouped based on various features that may be derived from the reconstructed mesh. These features may include, but are not limited to, the following: curvature of the vertex, vertex normal direction, level of detail layer of the vertex, etc.
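The disclosure does not mandate a particular grouping algorithm, only that the encoder and decoder apply the same one. One possible topology-based grouping, sketched below under that assumption, is a multi-source breadth-first search that assigns each vertex to the nearest of a few agreed-upon seed vertices, measured in edge hops:

```python
from collections import deque

def group_by_topology(num_vertices, edges, seeds):
    """Assign each vertex to the group of the seed reachable in the fewest edges."""
    adj = [[] for _ in range(num_vertices)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    group_of = [-1] * num_vertices
    queue = deque()
    for g, s in enumerate(seeds):
        group_of[s] = g
        queue.append(s)
    while queue:                      # multi-source BFS over the mesh graph
        v = queue.popleft()
        for w in adj[v]:
            if group_of[w] == -1:
                group_of[w] = group_of[v]
                queue.append(w)
    return group_of

# Toy usage: a path of six vertices split between two seed vertices.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(group_by_topology(6, edges, seeds=[0, 5]))   # [0, 0, 0, 1, 1, 1]
```

Any other deterministic criterion (curvature, normal direction, level-of-detail layer, etc.) could be substituted, provided both sides use it identically.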
Vertex neighbors
Depending on the implementation, there may be different methods of defining the neighbors of each vertex V^rec_{i,j}. Criteria for defining neighbors may include, but are not limited to, topological distance, Euclidean distance, and the like.
In one embodiment, only vertices directly connected to V^rec_{i,j} are considered neighbors; in other words, the topological distance between V^rec_{i,j} and its neighbors is 1. In the same or another embodiment, all connected vertices whose topological distance to V^rec_{i,j} is less than a threshold may be considered neighbors of V^rec_{i,j}. In the same or another embodiment, all vertices whose Euclidean distance to V^rec_{i,j} is less than a threshold may be considered neighbors of V^rec_{i,j}.
In some embodiments, the neighbors of V^rec_{i,j} may then be reordered in ascending order of their Euclidean distance to V^rec_{i,j}. Note that V^rec_{i,j} and its neighbors do not necessarily belong to the same group.
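A small sketch of these two steps, assuming the topological-distance criterion and the Euclidean reordering just described (the hop limit and helper names are illustrative):

```python
import numpy as np

def neighbors_within_hops(adj, v, max_hops):
    """Collect all vertices within max_hops edges of vertex v (excluding v itself)."""
    frontier, seen = {v}, {v}
    for _ in range(max_hops):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return sorted(seen - {v})

def reorder_by_distance(positions, v, nbrs):
    """Reorder the neighbors of v in ascending Euclidean distance to v."""
    dist = np.linalg.norm(positions[nbrs] - positions[v], axis=1)
    return [nbrs[k] for k in np.argsort(dist)]

# Toy usage on the quad mesh from the earlier sketch.
adj = [[1, 2], [0, 3], [0, 3], [1, 2]]
positions = np.array([[0.0, 0, 0], [1, 0, 0], [0, 2, 0], [1, 1, 0]])
nbrs = neighbors_within_hops(adj, 0, max_hops=2)      # [1, 2, 3]
print(reorder_by_distance(positions, 0, nbrs))        # nearest first: [1, 3, 2]
```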
Filter and coefficient derivation
Let {V^rec_{i,j,k}} be the reordered neighbors of V^rec_{i,j}, where k = 1, 2, …, T-1; that is, V^rec_{i,j,k} is the kth neighbor of V^rec_{i,j}, and T-1 is the number of neighbors. In some embodiments, and by way of example only herein, V^rec_{i,j,k} = V^rec_{i,j} when k = 0 in the following equations.
The filter coefficients and error functions to be minimized may be defined in a number of different ways, depending on the implementation.
Adaptive Laplacian filter with point-to-point loss function
In one embodiment, the adaptive Laplacian filter may be applied to the ith group of vertices with a single coefficient (c_i), as follows:
V^filtered_{i,j} = V^rec_{i,j} + c_i · ( (1/(T-1)) · Σ_{k=1..T-1} V^rec_{i,j,k} − V^rec_{i,j} ),
where V^filtered_{i,j} denotes the vertex updated from V^rec_{i,j}.
On the encoder side, the coefficient c_i can be derived by minimizing the following point-to-point error function:
E(c_i) = Σ_{j=1..G_i} ‖ V^filtered_{i,j} − V^ref_{i,j} ‖²,
which can be solved by a suitable method, for example, the least squares method.
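Under the Laplacian form sketched above, the point-to-point minimization has a closed form: with d_j denoting the neighbor average minus V^rec_{i,j} and r_j = V^ref_{i,j} − V^rec_{i,j}, the optimal coefficient is c_i = (Σ_j d_j·r_j) / (Σ_j d_j·d_j). A sketch of that computation, assuming the averaging Laplacian above (helper names are illustrative):

```python
import numpy as np

def laplacian_coefficient(v_rec, v_ref, neighbors, group):
    """Closed-form least-squares c_i for one group under the point-to-point loss."""
    num = den = 0.0
    for j in group:
        d = v_rec[neighbors[j]].mean(axis=0) - v_rec[j]   # Laplacian direction d_j
        r = v_ref[j] - v_rec[j]                           # residual toward the reference
        num += float(d @ r)
        den += float(d @ d)
    return num / den if den > 0.0 else 0.0

# Toy usage: one group covering all four vertices of the quad.
v_rec = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
v_ref = v_rec + 0.1                     # pretend the reference is slightly shifted
neighbors = [[1, 2], [0, 3], [0, 3], [1, 2]]
c0 = laplacian_coefficient(v_rec, v_ref, neighbors, group=[0, 1, 2, 3])
```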
Adaptive Laplacian filter with point-to-face loss function
In the same or another embodiment, the same adaptive Laplacian filter as in the previous embodiment may be applied, but the filter coefficients are optimized by minimizing a point-to-face error function:
E(c_i) = Σ_{j=1..G_i} (Ṽ^filtered_{i,j})ᵀ · Q^ref_{i,j} · Ṽ^filtered_{i,j},
where Q^ref_{i,j} is a 4×4 quadric matrix derived from the jth vertex in the ith group of the reference mesh, and Ṽ^filtered_{i,j} is the homogeneous vector of V^filtered_{i,j} with the expansion dimension set to 1. The minimization of this loss function may also be solved by the least squares method.
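Because the single-coefficient Laplacian filter makes the homogeneous filtered vertex affine in c_i (it equals a_j + c_i·b_j with a_j = [V^rec_{i,j}, 1] and b_j = [d_j, 0]), the quadric loss is quadratic in c_i and, for symmetric Q^ref_{i,j}, is minimized in closed form by c_i = −(Σ_j b_jᵀ Q^ref_{i,j} a_j) / (Σ_j b_jᵀ Q^ref_{i,j} b_j). A sketch under those assumptions:

```python
import numpy as np

def laplacian_coefficient_p2f(v_rec, neighbors, group, Q):
    """Closed-form c_i for one group under the point-to-face (quadric) loss.

    Q : mapping from vertex index j -> symmetric 4x4 quadric matrix Q^ref_{i,j}.
    """
    num = den = 0.0
    for j in group:
        d = v_rec[neighbors[j]].mean(axis=0) - v_rec[j]
        a = np.append(v_rec[j], 1.0)      # homogeneous vertex, expansion dimension = 1
        b = np.append(d, 0.0)             # the direction has no homogeneous component
        num += float(b @ Q[j] @ a)
        den += float(b @ Q[j] @ b)
    return -num / den if den > 0.0 else 0.0

# Toy usage with an (arbitrary) identity quadric for every vertex of the quad.
v_rec = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
nbrs = [[1, 2], [0, 3], [0, 3], [1, 2]]
Q = {j: np.eye(4) for j in range(4)}
c0 = laplacian_coefficient_p2f(v_rec, nbrs, group=[0, 1, 2, 3], Q=Q)
```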
Adaptive Wiener filter with point-to-point loss function
In the same or another embodiment, an adaptive Wiener filter may be applied:
V^filtered_{i,j} = Σ_{k=0..T-1} c_{i,k} · V^rec_{i,j,k}.
Note that for each group of vertices, the derived filter coefficients (c_i) are of length T, and they can be derived by minimizing the point-to-point error function
E(c_i) = Σ_{j=1..G_i} ‖ V^filtered_{i,j} − V^ref_{i,j} ‖²,
which may also be solved by any suitable method, for example, least squares.
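For the Wiener filter the unknown is a length-T vector rather than a scalar, so the point-to-point minimization becomes an ordinary linear least-squares problem with one equation per vertex and coordinate. A sketch under the neighbor ordering described above (the padding rule and names are illustrative):

```python
import numpy as np

def wiener_coefficients(v_rec, v_ref, ordered_nbrs, group, T):
    """Least-squares length-T coefficients c_i for one group, point-to-point loss.

    ordered_nbrs[j] lists the neighbors of vertex j in ascending Euclidean
    distance; the support of vertex j is the vertex itself plus its first T-1
    neighbors, matching k = 0, 1, ..., T-1 in the filter definition.
    """
    rows, targets = [], []
    for j in group:
        support = [j] + list(ordered_nbrs[j][:T - 1])
        if len(support) < T:                   # pad short neighborhoods (illustrative)
            support += [j] * (T - len(support))
        P = v_rec[support]                     # (T, 3) support positions
        for dim in range(3):                   # the same coefficients serve x, y and z
            rows.append(P[:, dim])
            targets.append(v_ref[j, dim])
    A, b = np.asarray(rows), np.asarray(targets)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

# Toy usage: reuse the quad mesh with T = 3 (the vertex plus its two neighbors).
v_rec = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
nbrs = [[1, 2], [0, 3], [0, 3], [1, 2]]
c = wiener_coefficients(v_rec, v_rec + 0.05, nbrs, group=[0, 1, 2, 3], T=3)
```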
Adaptive Wiener filter with point-to-face loss function
In the same or another embodiment, the same adaptive Wiener filter as in the previous embodiment can be applied, but the filter coefficients can be optimized by minimizing a point-to-face error function:
E(c_i) = Σ_{j=1..G_i} (Ṽ^filtered_{i,j})ᵀ · Q^ref_{i,j} · Ṽ^filtered_{i,j},
where Q^ref_{i,j} is a 4×4 quadric matrix derived from the jth vertex in the ith group of the reference mesh, and Ṽ^filtered_{i,j} is the homogeneous vector of V^filtered_{i,j} with the expansion dimension set to 1. The minimization of this loss function may also be solved by any suitable method, such as least squares.
Thus, as disclosed herein, the encoder may use any combination of filter functions and loss functions. In an embodiment, the encoder and decoder may choose the same filter function and loss function. In an embodiment, the encoder may signal a flag indicating whether each group uses the same filter function, the same loss function, different filter functions, or different loss functions. If the groups can use different filter or loss functions, the encoder can signal additional flags or information indicating the filter function and/or loss function of each group. In some embodiments, this signaling precedes the signaling of all vertex groups; in other embodiments, the signaling precedes each corresponding vertex group.
Filter coefficient optimization
The filter coefficients may be fine-tuned prior to encoding based on the estimated rate-distortion performance. The distortion term (D) may be estimated by a point-to-point or point-to-face loss function described herein. The rate term (R) may be estimated from a distribution of coefficients.
In one embodiment, the filter may be disabled for the ith group of vertices if the distortion term is greater than a threshold, where the threshold may be a function of the quantization step size of the filter coefficients.
In the same or another embodiment, the filter may be disabled for the ith group of vertices if D + λ·R is greater than a threshold, where λ is a trade-off parameter, which may be a function of the quantization step size of the filter coefficients.
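A compact sketch of the two disable rules above; the threshold and λ are encoder-side configuration choices and are shown here only as plain parameters:

```python
import numpy as np

def keep_group_filter(v_filtered, v_ref, coeff_bits, lam, threshold):
    """Return True if the trained filter should stay enabled for one vertex group.

    The distortion term D is the point-to-point error of the filtered vertices
    against the reference; R is the estimated bit cost of the quantized
    coefficients. The group filter is disabled when D + lam * R exceeds the
    threshold (setting lam = 0 reproduces the distortion-only rule).
    """
    D = float(np.sum((v_filtered - v_ref) ** 2))
    return D + lam * coeff_bits <= threshold
```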
Filter coefficient coding
The quantized and optimized filter coefficients may then be entropy encoded in the bitstream in different ways.
In one embodiment, a binary flag is first encoded to indicate whether the filter is applied to the ith group. If the flag is true, the coefficients are encoded; otherwise, the coefficients are not encoded. In arithmetic coding, this binary flag may or may not be context coded.
For each coefficient value, the coding may be performed using fixed length coding, exp-Golomb coding, or the like.
In one embodiment, a coefficient value may be encoded as follows. A binary flag is encoded to indicate whether the value is equal to 0. If that flag is false, another binary flag is encoded to indicate whether the value is equal to 1. If that flag is also false, the value minus 2 is encoded by Exp-Golomb coding.
In some implementations, some coefficients may be predicted from other coefficients prior to encoding. For example, if an adaptive Wiener filter is applied, the first coefficient may be predicted from the other coefficients.
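A sketch of the binarization described above, using order-0 Exp-Golomb codes for values of 2 or greater; the 0/1 flag polarity and the handling of coefficient signs are illustrative choices and are not mandated here:

```python
def exp_golomb(value):
    """Order-0 Exp-Golomb code of a non-negative integer, as a bit string."""
    v = value + 1
    return "0" * (v.bit_length() - 1) + format(v, "b")

def code_coefficient(q):
    """Binarize one quantized coefficient magnitude: a flag for q == 0, then a
    flag for q == 1, otherwise the Exp-Golomb code of q - 2."""
    if q == 0:
        return "1"
    if q == 1:
        return "01"
    return "00" + exp_golomb(q - 2)

print([code_coefficient(q) for q in range(5)])   # ['1', '01', '001', '00010', '00011']
```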
Fig. 5 is a flow chart of a process 500 of mesh compression according to an embodiment of the present disclosure.
At operation 505, a plurality of vertices in an input mesh may be determined. The input mesh may represent volumetric data of at least one three-dimensional (3D) visual content.
At operation 510, a plurality of vertices may be grouped into a plurality of vertex groups. In an embodiment, grouping respective vertices into respective groups may be based on a topological distance of the respective vertices.
In an embodiment, grouping the respective vertices based on the topological distance may include determining neighboring vertices of the respective vertex, wherein the neighboring vertices are vertices connected to the respective vertex by at least one edge; and grouping the respective vertices into the first group such that the vertices in the first group are connected to the same number of vertices by edges.
In an embodiment, determining adjacent vertices of the respective vertices may include determining adjacent vertices directly connected to the respective edges in the input mesh; or determining a plurality of edges connected to adjacent vertices of the respective vertex, the number of the plurality of edges being less than or equal to a first threshold.
At operation 515, a set of filter coefficients for the plurality of vertex groups may be determined.
In an embodiment, determining the set of filter coefficients may include determining the set of filter coefficients based on the first loss function and the first filter function, wherein the set of filter coefficients includes one or more filter coefficients for each of the plurality of vertex groups.
In an embodiment, the first loss function may comprise one of a point-to-point loss function or a point-to-face loss function. In the same or a different embodiment, the first filter function may comprise one of a Laplacian filter or a Wiener filter.
In some implementations, after determining the set of filter coefficients, process 500 may include disabling a first filter coefficient of a first group of the plurality of vertex groups based on the first filter coefficient having a distortion above a distortion threshold.
At operation 520, a plurality of vertex groups and a set of filter coefficients may be signaled.
In some implementations, operation 520 may include signaling a binary flag for each of the plurality of vertex groups, the binary flag indicating whether the filter coefficients of the respective group are enabled; and signaling one or more coefficients associated with the respective group based on the binary flag indicating that the respective group of filter coefficients is enabled. Operation 520 may further comprise signaling a second flag indicating whether the same first filter function is used to determine the set of filter coefficients for each of the plurality of vertex groups. In some implementations, operation 520 may further include signaling a third flag indicating whether the same first loss function is used to determine the set of filter coefficients for each of the plurality of vertex groups.
Fig. 6 is a flow chart of a process 600 of mesh reconstruction according to an embodiment of the present disclosure.
In operation 605, an encoded bitstream associated with a mesh may be received, wherein the encoded bitstream includes information of vertices in the mesh and filter coefficients associated with the vertices. In operation 610, a plurality of vertex groups included in a mesh may be obtained from an encoded bitstream. In operation 615, a set of filter coefficients for a plurality of vertex groups may be obtained from the encoded bitstream. In operation 620, a reconstructed mesh may be generated based on the information of the vertices in the mesh. In operation 625, a refined reconstructed mesh is generated using the reconstructed mesh, the plurality of vertex groups, and the set of filter coefficients for the plurality of vertex groups.
The proposed methods may be used separately or combined in any order. The proposed methods can be used for any polygonal mesh, although only triangular meshes are used in the presentation of the various embodiments. As described above, it is assumed that the input mesh may contain one or more instances, that a sub-mesh is a part of the input mesh having one or more instances, and that multiple instances may be grouped to form a sub-mesh.
The techniques described above may be implemented as computer software using computer readable instructions and stored physically in one or more computer readable media or by one or more hardware processors of a particular configuration. For example, FIG. 7 illustrates a computer system 700 suitable for implementing certain embodiments of the disclosed subject matter.
The computer software may be encoded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that may be executed, directly or by way of interpretation, microcode execution, and the like, by a computer Central Processing Unit (CPU), Graphics Processing Unit (GPU), or the like.
The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smartphones, gaming devices, internet of things devices, and the like.
The components of computer system 700 shown in fig. 7 are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Nor should the configuration of components be construed as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiment of computer system 700.
The computer system 700 may include some human interface input devices. Such a human interface input device may be responsive to input by one or more human users via, for example, tactile input (e.g., key strokes, swipes, data glove movements), audio input (e.g., voice, clapping), visual input (e.g., gestures), olfactory input (not shown). The human interface device may also be used to capture certain media that is not necessarily directly related to the conscious input of a person, such as audio (e.g., speech, music, ambient sound), images (e.g., scanned images, photo images obtained from still image cameras), video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).
The human interface input device may include one or more of the following (only one is depicted per each): keyboard 701, mouse 702, trackpad 703, touch screen 710, joystick 705, microphone 706, scanner 708, camera 707.
The computer system 700 may also include some human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (for example, tactile feedback via the touch screen 710 or joystick 705, but there can also be tactile feedback devices that do not serve as input devices), audio output devices (such as speakers 709 and headphones (not shown)), visual output devices (such as screens 710, including Cathode Ray Tube (CRT) screens, Liquid Crystal Display (LCD) screens, plasma screens, and Organic Light-Emitting Diode (OLED) screens, each with or without touch-screen input capability, some of which may be capable of outputting two-dimensional visual output or more than three-dimensional output by means such as stereographic output; virtual reality glasses (not shown); holographic displays and smoke tanks (not shown)), and printers (not shown).
The computer system 700 may also include human-accessible storage devices and their associated media, such as optical media including CD/DVD ROM/RW 720 with CD/DVD 711 or similar media, thumb drive 722, removable hard disk drive or solid-state drive 723, legacy magnetic media such as magnetic tapes and floppy disks (not shown), specialized Read-Only Memory (ROM)/Application-Specific Integrated Circuit (ASIC)/Programmable Logic Device (PLD) based devices, and the like.
It should also be appreciated by those skilled in the art that the term "computer-readable medium" as used in connection with the presently disclosed subject matter does not include transmission media, carrier waves or other transitory signals.
Computer system 700 may also include an interface 799 to one or more communication networks 798. The networks 798 may, for example, be wireless, wired, or optical. The networks 798 may further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks 798 include local area networks such as Ethernet and wireless Local Area Networks (LANs), cellular networks including Global System for Mobile Communications (GSM), 3G, 4G, 5G, and Long-Term Evolution (LTE) networks, TV wired or wireless wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV, and vehicular and industrial networks including the Controller Area Network Bus (CANBus). Certain networks 798 commonly require external network interface adapters that attach to certain general-purpose data ports or peripheral buses 750 and 751 (such as, for example, Universal Serial Bus (USB) ports of the computer system 700); other interfaces are commonly integrated into the core of the computer system 700 by attachment to a system bus as described below (for example, an Ethernet interface in a PC computer system or a cellular network interface in a smartphone computer system). Using any of these networks 798, computer system 700 can communicate with other entities. Such communication can be unidirectional, receive-only (for example, broadcast TV), unidirectional send-only (for example, CANbus to certain CANbus devices), or bidirectional, for example, to other computer systems using local or wide-area digital networks. As described above, certain protocols and protocol stacks can be used on each of those networks and network interfaces.
The aforementioned human interface devices, human accessible storage devices, and network interfaces may be attached to core 740 of computer system 700.
The core 740 may include one or more Central Processing Units (CPUs) 741, Graphics Processing Units (GPUs) 742, graphics adapters 717, specialized programmable processing units in the form of Field-Programmable Gate Arrays (FPGAs) 743, hardware accelerators 744 for certain tasks, and so forth. These devices, along with Read-Only Memory (ROM) 745, Random Access Memory (RAM) 746, and internal mass storage 747 such as internal non-user-accessible hard drives and Solid-State Drives (SSDs), may be connected through a system bus 748. In some computer systems, the system bus 748 may be accessible in the form of one or more physical plugs to enable extensions by additional CPUs, GPUs, and the like. Peripheral devices may be attached either directly to the core's system bus 748 or through a peripheral bus 749. Architectures for a peripheral bus include Peripheral Component Interconnect (PCI), USB, and the like.
CPU 741, GPU 742, FPGA 743, and accelerator 744 may execute certain instructions that, in combination, may constitute the computer code described above. That computer code may be stored in ROM 745 or RAM 746. Transitional data may also be stored in RAM 746, whereas permanent data may be stored, for example, in the internal mass storage 747. Fast storage and retrieval to any of the memory devices may be enabled through the use of cache memory, which may be closely associated with one or more CPUs 741, GPUs 742, mass storage 747, ROM 745, RAM 746, and the like.
The computer readable medium may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.
By way of example, and not limitation, a computer system having the architecture 700, and in particular the core 740, may provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage described above, as well as certain storage of the core 740 that is of a non-transitory nature, such as core-internal mass storage 747 or ROM 745. Software implementing various embodiments of the present disclosure may be stored in such devices and executed by the core 740. A computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core 740, and in particular the processors therein (including CPUs, GPUs, FPGAs, and the like), to perform particular processes or particular portions of particular processes described herein, including defining data structures stored in RAM 746 and modifying such data structures according to the software-defined processes. In addition or as an alternative, the computer system may provide functionality as a result of logic hardwired or otherwise embodied in a circuit (e.g., accelerator 744), which may operate in place of or together with software to execute particular processes or particular portions of particular processes described herein. Reference to software may encompass logic, and vice versa, where appropriate. Reference to a computer-readable medium may encompass a circuit (such as an Integrated Circuit (IC)) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware and software.
While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of this disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within the spirit and scope of the present disclosure.

Claims (20)

1. A method for grid compression, the method performed by at least one processor, the method comprising:
determining a plurality of vertices in an input mesh, the input mesh representing volumetric data of at least one three-dimensional 3D visual content;
grouping the plurality of vertices into a plurality of vertex groups, wherein grouping respective vertices into respective groups is based on a topological distance of the respective vertices;
determining a set of filter coefficients for the plurality of vertex groups; and
Signaling the plurality of vertex groups and the set of filter coefficients.
2. The method of claim 1, wherein grouping the respective vertices based on the topological distance comprises:
Determining adjacent vertices of the respective vertex, wherein the adjacent vertex is a vertex connected to the respective vertex by at least one edge; and
The respective vertices are grouped into a first group such that the vertices in the first group are connected to the same number of vertices by edges.
3. The method of claim 2, wherein determining the neighboring vertex of the respective vertex comprises one of:
Determining the adjacent vertices directly connected to respective edges in the input mesh; or (b)
A plurality of edges connected to the adjacent vertices of the respective vertex are determined, the number of the plurality of edges being less than or equal to a first threshold.
4. The method of claim 1, wherein determining the set of filter coefficients comprises:
determining the set of filter coefficients based on a first loss function and a first filter function;
wherein the set of filter coefficients includes one or more filter coefficients for each of the plurality of vertex groups; and
Wherein each filter coefficient in the set of filter coefficients is associated with one or more vertices in the plurality of vertex groups.
5. The method of claim 4, wherein the first loss function comprises one of a point-to-point loss function or a point-to-face loss function; and
Wherein the first filter function comprises one of a laplacian filter or a wiener filter.
6. The method of claim 1, wherein after determining the set of filter coefficients, the method further comprises:
disabling a first filter coefficient of a first set of the plurality of vertex groups based on the first filter coefficient having distortion above a distortion threshold.
7. The method of claim 1, wherein the signaling the plurality of vertex groups and the filter coefficients comprises:
Signaling a binary flag for each of the plurality of vertex groups, the binary flag indicating whether filter coefficients in the respective group are enabled; and
One or more filter coefficients associated with the respective set are signaled based on the binary flag indicating the respective set of filter coefficients is enabled.
8. The method of claim 7, wherein the method further comprises:
a second flag is signaled, the second flag being used to indicate whether the same first filter function is used to determine the set of filter coefficients for each of the plurality of vertex groups.
9. The method of claim 7, wherein the method further comprises:
A third flag is signaled, the third flag being used to indicate whether the same first loss function is used to determine the set of filter coefficients for each of the plurality of vertex groups.
10. An apparatus for an adaptive geometry filter for mesh compression, the apparatus comprising:
at least one memory configured to store program code; and
At least one processor configured to read the program code and operate in accordance with instructions of the program code, the program code comprising:
a first determination code configured to cause the at least one processor to determine more than one vertex in an input mesh, the input mesh representing volumetric data of at least one three-dimensional (3D) visual content;
a first grouping code configured to cause the at least one processor to group the one or more vertices into one or more vertex groups, wherein the grouping of the respective vertices into respective groups is based on a topological distance of the respective vertices;
a second determining code configured to cause the at least one processor to determine a set of filter coefficients for the more than one vertex group; and
A first signaling code configured to cause the at least one processor to signal the set of filter coefficients and the one or more vertex groups.
11. The apparatus of claim 10, wherein the first group code further comprises:
third determining code configured to cause the at least one processor to determine neighboring vertices of the respective vertex, wherein neighboring vertices are vertices connected to the respective vertex by at least one edge; and
And second grouping code configured to cause the at least one processor to group the respective vertices into a first group such that vertices in the first group are connected to the same number of vertices by edges.
12. The apparatus of claim 11, wherein the third determination code further comprises one of:
Fourth determining code configured to cause the at least one processor to determine the adjacent vertices directly connected to the respective edge in the input mesh; or (b)
Fifth determining code configured to cause the at least one processor to determine a plurality of edges connected to the adjacent vertices of the respective vertex, the number of the plurality of edges being less than or equal to a first threshold.
13. The apparatus of claim 10, wherein the second determination code further comprises:
A sixth determining code configured to cause the at least one processor to determine the set of filter coefficients based on the first loss function and the first filter function,
Wherein the set of filter coefficients includes one or more filter coefficients for each of the more than one vertex groups, and
Wherein each coefficient in the set of filter coefficients is associated with one or more vertices in the more than one vertex set.
14. The apparatus of claim 10, wherein after determining the set of filter coefficients, the program code further comprises:
disabling code configured to cause the at least one processor to disable a first filter coefficient of a first set of the more than one vertex groups based on the first filter coefficient having distortion above a distortion threshold.
15. The apparatus of claim 10, wherein the first signaling code further comprises:
A second signaling code configured to cause the at least one processor to signal a binary flag for each of the more than one vertex groups, the binary flag indicating whether the filter coefficients of the respective group are enabled; and
A third signaling code configured to cause the at least one processor to signal one or more coefficients associated with the respective group based on the binary flag indicating that the respective group of filter coefficients is enabled.
16. The apparatus of claim 10, wherein the program code further comprises:
A fourth signaling code configured to cause the at least one processor to signal a second flag, the second flag indicating whether the same first filter function is used to determine the set of filter coefficients for each of the more than one vertex groups.
17. The apparatus of claim 15, wherein the program code further comprises:
A fifth signaling code configured to cause the at least one processor to signal a third flag indicating whether the same first loss function is used to determine the set of filter coefficients for each of the more than one vertex groups.
18. The apparatus of claim 15, wherein the program code further comprises:
a sixth signaling code configured to cause the at least one processor to signal a third flag indicating whether the same first loss function is used to determine the set of filter coefficients for each of the more than one vertex groups.
19. A non-transitory computer-readable medium storing instructions, the instructions comprising: one or more instructions that, when executed by one or more processors of a device for mesh compression of a geometric filter, cause the one or more processors to:
determining more than one vertex in an input mesh, the input mesh representing volumetric data of at least one three-dimensional 3D visual content;
Grouping the more than one vertex into more than one vertex group, wherein the grouping the respective vertex into a respective group is based on a topological distance of the respective vertex;
determining a set of filter coefficients for the more than one vertex set; and
The set of more than one vertex groups and the filter coefficients are signaled.
20. The non-transitory computer-readable medium of claim 19, wherein the grouping the respective vertices based on the topological distance comprises:
Determining adjacent vertices of the respective vertex, wherein adjacent vertices are vertices connected to the respective vertex by at least one edge; and
The respective vertices are grouped into a first group such that the vertices in the first group are connected to the same number of vertices by edges.
CN202380017864.1A 2022-10-14 2023-10-13 Adaptive geometry filtering for mesh compression Pending CN118575200A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/416,382 2022-10-14
US18/485,770 US20240135594A1 (en) 2022-10-14 2023-10-12 Adaptive geometry filtering for mesh compression
US18/485,770 2023-10-12
PCT/US2023/035090 WO2024081393A1 (en) 2022-10-14 2023-10-13 Adaptive geometry filtering for mesh compression

Publications (1)

Publication Number Publication Date
CN118575200A (en) 2024-08-30

Family

ID=92478451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202380017864.1A Pending CN118575200A (en) 2022-10-14 2023-10-13 Adaptive geometry filtering for mesh compression

Country Status (1)

Country Link
CN (1) CN118575200A (en)

Similar Documents

Publication Publication Date Title
CN115336243A (en) Point cloud coding method and device based on haar
KR102757650B1 (en) A 2D UV atlas sampling-based method for dynamic mesh compression
US20230306701A1 (en) Parallel approach to dynamic mesh alignment
US20240135594A1 (en) Adaptive geometry filtering for mesh compression
CN118575200A (en) Adaptive geometry filtering for mesh compression
US20240078713A1 (en) Texture coordinate prediction in mesh compression
US20250069275A1 (en) On compression of a mesh with multiple texture maps
US20230334714A1 (en) Coding of boundary uv2xyz index for mesh compression
US20240185471A1 (en) Texture coordinate compression using chart partition
US20230306649A1 (en) Predictive coding of boundary uv2xyz index for mesh compression
US20230308669A1 (en) Predictive coding of boundary uv information for mesh compression
US20230306647A1 (en) Geometry filtering for mesh compression
CN118475958A (en) Texture coordinate compression using graph partitioning
CN118556256A (en) Multiple sub-trellis codes
WO2023200599A1 (en) Improvements on coding of boundary uv2xyz index for mesh compression
CN119137947A (en) Grouping of duplicate vertices in position compression
CN117178294A (en) Grid compression based on atlas sampling using graphs with general topology
CN116250009A (en) Fast block generation for video-based point cloud coding

Legal Events

Date Code Title Description
PB01 Publication