CN118505909A - Sketch-assisted incomplete point cloud completion method and system - Google Patents
Sketch-assisted incomplete point cloud completion method and system
- Publication number: CN118505909A
- Application number: CN202410957907.0A
- Authority: CN (China)
- Prior art keywords: point cloud, sketch, features, point, incomplete
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06N3/045—Combinations of networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/09—Supervised learning
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T9/002—Image coding using neural networks
- G06T2207/10028—Range image; Depth image; 3D point clouds
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention discloses a sketch-assisted method and system for completing incomplete point clouds. The method uses an auxiliary sketch as guidance, combines scanned point cloud data with an interactively drawn sketch, effectively fuses the information of the two modalities (point cloud and sketch) in a local latent space, and outputs three-dimensional point cloud data with more complete geometric information. The invention adopts a weakly supervised setting: the auxiliary sketch provides a supervision signal for the training process by using a differentiable renderer on the complete point cloud to measure fidelity in image space. By fusing the sketch information with the information of the incomplete point cloud, multi-modal information fusion is realized, and a complete point cloud that is more reliable and better matches the user's intention can be generated.
Description
Technical Field
The invention relates to the technical fields of computer vision and three-dimensional point cloud model completion, in particular to a method and a system for sketch-assisted incomplete point cloud completion.
Background
3D data is used in many different fields, including autonomous driving, robotics, and others. Point clouds have a very uniform structure that avoids the irregularity and complexity of mesh composition. In practical applications, however, the collected point cloud data are often incomplete due to occlusion between objects, differences in the reflectivity of target surface materials, and limitations in the resolution and viewing angle of vision sensors. The resulting loss of geometric and semantic information harms subsequent 3D tasks, so completing the full point cloud from incomplete data and restoring the original shape is of great significance for downstream tasks.
Over the years, researchers have tried many deep-learning approaches to this problem. Early attempts at point cloud completion tried to migrate mature methods from 2D completion tasks to 3D point clouds through voxelization and three-dimensional convolution, at high computational cost. Since PointNet and PointNet++ appeared, directly processing three-dimensional coordinates has become the mainstream of point-cloud-based 3D analysis, and encoder-decoder architectures have gradually become the standard way to complete incomplete point clouds.
However, most existing point cloud completion methods rely on single-modality information and infer the missing part directly from a shape prior. Because the information in a single-modality incomplete point cloud is limited, completion carries large uncertainty, and the inherent sparsity of point cloud data makes it hard to distinguish intentionally empty regions of the model from missing ones. Humans, by contrast, are very good at understanding two-dimensional and three-dimensional models and can judge the incomplete part of a point cloud from visual concepts; a sketch is a convenient, quick, and easily acquired medium for expressing interactive intent, and a sketch drawn by the user can supply the missing information well. We therefore designed a sketch-assisted point cloud completion method that accepts multi-modal input.
Disclosure of Invention
The invention aims to provide a sketch-assisted incomplete point cloud completion method that addresses the defects of the prior art, namely the limitations of current single-modality point cloud models that infer the missing part directly from a shape prior. The user expresses the completion intent with a sketch as the medium, the network acquires key information about the missing point cloud from the sketch, and completion of the missing point cloud is realized through an effective cross-modal, cross-layer fusion framework.
The aim of the invention is realized by the following technical scheme: a sketch-assisted incomplete point cloud completion method comprises the following steps:
(1) Based on the existing incomplete point cloud, obtaining sketch input from the user, which interactively supplements the outline information of the incomplete point cloud;
(2) Inputting the incomplete point cloud and the sketch into encoders of their respective modalities, and extracting the encoded features of the two modalities, namely sketch features and point cloud features;
(3) Fusing the encoded features of the two modalities obtained in step (2);
(4) Learning and reconstructing a complete point cloud: decoding the fused features with a decoder that simultaneously maintains global and local features to complete the point cloud.
Further, step (1) specifically consists of obtaining the sketch input of a user, where the sketch can be collected through the user's hand drawing; the view of the current incomplete point cloud model is displayed in real time, and the user directly sketches the outline of the missing part to be completed on the view.
Further, in step (2), according to their different data forms, the incomplete point cloud is input into a DGCNN encoder and the hand-drawn sketch is input into a ResNet encoder for feature extraction.
Specifically, the feature extraction uses two modality-specific feature extractors, one capturing local features of the sketch, summarized over $N_s$ pixels, and one capturing local features of the point cloud, summarized over $N_x$ points; ResNet is used as the sketch encoder, which gives the network a higher convergence speed and ensures the feature extraction. The partial point cloud input is denoted $X \in \mathbb{R}^{N_x \times 3}$, the sketch input is denoted $S$, and the complete point cloud is denoted $Y$; the point cloud completion task is to predict a complete point cloud given the incomplete point cloud and the sketch. The point cloud encoder extracts features from the partial shape $X$ while maintaining their locality; it adopts the DGCNN framework, which consists of a series of graph convolution layers interleaved with pooling operations that reduce the cardinality of the point cloud:

$$x_i' = \mathop{\square}\limits_{j \in \mathcal{N}(i)} h_\Theta(x_i, x_j)$$

where $x_i'$ is the encoded feature of point $x_i$: a local graph is constructed from each point and its surrounding neighbors, a convolution is applied to every edge of the graph, and the feature of the center point is obtained by weighted averaging; $\square$ denotes a channel-wise symmetric aggregation operation, and $h_\Theta$ is a nonlinear learnable function whose result is taken as the feature of the center point $x_i$. The pooling operation enlarges the receptive field so that more global information is contained, while reducing the complexity of the cross-attention operation in the subsequent fusion of the two modalities; the sketch is encoded as $H_S$ and the incomplete point cloud is encoded as $H_X$.
Further, step (3) specifically consists of fusing the sketch features and the point cloud features produced by the encoders through a cross-attention mechanism and a self-attention mechanism.
Further, the fusion through a cross-attention mechanism and a self-attention mechanism is specifically as follows: local information collected from the two modalities is fused through an attention mechanism, which is well suited to finding the correspondence between features of point cloud regions and sketch regions; the attention layers of the framework use the multi-head attention mechanism of the Transformer;
(6.1) In the cross-attention mechanism, the incomplete point cloud features are projected to form a query vector, and the sketch features are projected to form a key vector and a value vector; with these three vectors, the attention mechanism fuses the incomplete point cloud features with the sketch features extracted from the associated regions by the feature extractor, realizing feature fusion between the inputs of different modalities;
The query vector of the point cloud is obtained as the product of the point cloud features and a weight matrix, and the key and value vectors of the sketch are obtained as the products of the sketch features and weight matrices; softmax normalization is then applied on the basis of these three vectors, and the features are fused;
(6.2) A self-attention layer is added after the cross-attention fusion in the framework, realizing a permutation-invariant transformation of the features with a global receptive field so as to correct data that were not correctly captured in the sketch; the principle of the self-attention layer is the same as in (6.1), except that the input features differ: the self-attention layer computes the query, key, and value vectors from the same mixed features;
(6.3) The framework completes the fusion of the overall features by combining cross-attention layers and self-attention layers, realizing feature fusion of the two modalities' data; at the end of the whole fusion module, a special cross-attention layer is used that combines information from the end and the beginning of the fusion module, so that high-level features participate across layers in the fusion of low-level features.
Specifically, feature fusion between different modal inputs is realized in the step (6.1); the characteristic fusion expression is as follows:
;
;
wherein H X and H S are respectively the coding feature vectors of the point cloud, W is a weight matrix, Respectively a query vector, a key vector and a value vector; is the transpose of the key vector and, In order to query the dimensions of the vector,Encoding a query vector of data for the point cloud; Encoding key vectors of the data for the point cloud; Encoding a value vector of data for the point cloud; And The weight matrix of the query vector, the key vector and the value vector, respectively.
Specifically, the decoder estimates the positions of the points to be completed and fuses them with the points acquired by farthest point sampling, so that the framework focuses on the missing part of the point cloud; farthest point sampling samples the shape uniformly, its initial point is chosen at random, which ensures different sampling results each time, and the distance used by farthest point sampling is the Euclidean distance, which measures the absolute distance between two points in a multidimensional space;
The feature domain is upsampled by the decoder, and feature fusion is performed so that higher-level features can be fused; the specific operation is realized by an attention-based mechanism. Given the features $H$ provided by the encoder, each of the $K_n$ branches computes:

$$H_n = \mathrm{MLP}_n(H)$$

$$Y_n = \mathrm{softmax}\!\left(\frac{H_n H_n^{\top}}{\sqrt{d}}\right) H_n W_P$$

where $\mathrm{MLP}_n$ is a multi-layer perceptron with different weights for each branch, which projects the features into the $n$-th of the $K_n$ subspaces and generates the self-attention weights for the resampling process, and $W_P \in \mathbb{R}^{d \times 3}$ is a projection matrix into three-dimensional space; finally, the outputs of all decoder branches are concatenated with the farthest-point-sampled part to generate a complete point cloud:

$$\hat{Y} = \mathrm{concat}\big(Y_1, \ldots, Y_{K_n}, \mathrm{FPS}(X)\big)$$

where $\hat{Y}$ is the predicted complete point cloud;
Farthest point sampling is adopted, and the FPS-sampled points are fused with the points estimated by the decoder, which preserves the fidelity of the existing partial point cloud and lets the framework attend only to the missing part of the point cloud; the whole system completes flexibly by adjusting the mixing ratio of sampled points to estimated points as required; a loss function based on the L1 chamfer distance between the generated shape and the ground-truth shape, $\mathcal{L}_{CD}$, is used for supervised training:

$$\mathcal{L}_{CD}(\hat{Y}, Y) = \frac{1}{|\hat{Y}|}\sum_{y \in \hat{Y}} \min_{y' \in Y} \lVert y - y' \rVert + \frac{1}{|Y|}\sum_{y' \in Y} \min_{y \in \hat{Y}} \lVert y - y' \rVert$$

where the first summation is the sum of the minimum distances from each point $y$ in $\hat{Y}$ to $Y$, and the second summation is the sum of the minimum distances from each point $y'$ in $Y$ to $\hat{Y}$; a large distance means a large difference between the two point clouds, and the distance is inversely related to the completion quality; the actually input sketch contains supplementary information about the point cloud, so completion of the missing point cloud can be accomplished.
Specifically, the distance used by farthest point sampling is the Euclidean distance:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

The Euclidean distance is a basic distance measure that quantifies the absolute distance between two points in a multidimensional space; it computes the distance between points $p$ and $q$, where $p_i$ is the $i$-th coordinate of $p$ and $q_i$ is the $i$-th coordinate of $q$.
The invention also provides a sketch-assisted incomplete point cloud completion system, which comprises the following modules:
Information acquisition module: based on the existing incomplete point cloud, obtains the sketch input of the user, which interactively supplements the outline information of the incomplete point cloud;
Feature acquisition module: inputs the incomplete point cloud and the sketch into encoders of their respective modalities and extracts the encoded features of the two modalities, namely sketch features and point cloud features;
Feature fusion module: fuses the encoded features of the two modalities obtained by the feature acquisition module;
Point cloud completion module: learns and reconstructs a complete point cloud, decoding the fused features with a decoder that simultaneously maintains global and local features to complete the point cloud.
The beneficial effects of the invention are as follows:
Existing single-modality incomplete point cloud completion methods can only complete by means of a shape prior and cannot realize targeted, accurate completion according to the user's intent. By contrast, the invention obtains global structure information by analyzing the user's sketch and realizes multi-modal information fusion between the sketch information and the information of the incomplete point cloud, thereby generating a more reliable complete point cloud that better matches the user's intention.
Drawings
FIG. 1 is a flow chart of a point cloud completion method provided by the invention;
FIG. 2 is a network block diagram of the point cloud completion method provided by the invention;
FIG. 3 is a block diagram of a point cloud completion method provided by the invention;
FIG. 4 is a cross-attention schematic of the present invention;
FIG. 5 is a point cloud completion result of the invention for the airplane category;
FIG. 6 is a point cloud completion result of the invention for the car category.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
The invention provides a sketch-assisted incomplete point cloud completion method.
As shown in FIG. 1, the invention comprises the following steps:
(1) Based on the existing incomplete point cloud, obtaining sketch input from the user, which interactively supplements the outline information of the incomplete point cloud;
(2) Inputting the incomplete point cloud and the sketch into encoders of their respective modalities, and extracting the encoded features of the two modalities, namely sketch features and point cloud features;
(3) Fusing the encoded features of the two modalities obtained in step (2);
(4) Learning and reconstructing a complete point cloud: decoding the fused features with a decoder that simultaneously maintains global and local features to complete the point cloud.
S1: as shown in FIG. 2, the sketch input of the user is acquired through hand drawing; the view of the current incomplete point cloud model is displayed in real time, and the user directly draws the sketch outline of the missing part to be completed on the view.
S2: as shown in FIG. 3, according to their different data forms, the incomplete point cloud is input into the encoder of the DGCNN network and the hand-drawn sketch is input into the encoder of the ResNet network for feature extraction.
S21: two modality-specific feature extractors are used, one capturing local features of the sketch, summarized as a small number of pixels, and the other capturing local features of the point cloud, summarized as a small number of points; ResNet is used as the sketch encoder, which gives the network a higher convergence speed and ensures the feature extraction effect.
S22: the partial point cloud input is denoted $X \in \mathbb{R}^{N_x \times 3}$, the sketch input is denoted $S$, and the complete point cloud is denoted $Y$; the point cloud completion task is to predict a complete point cloud given the incomplete point cloud and the sketch. The point cloud encoder extracts features from the partial shape $X$ while maintaining a certain degree of locality; it adopts the DGCNN framework, which consists of a series of graph convolution layers interleaved with pooling operations that reduce the cardinality of the point cloud:

$$x_i' = \mathop{\square}\limits_{j \in \mathcal{N}(i)} h_\Theta(x_i, x_j) \tag{1}$$

where $x_i'$ is the encoded feature of point $x_i$, obtained by constructing a local graph from each point and its surrounding neighbors, applying a convolution to every edge of the graph, and computing the feature of the center point by weighted averaging; $\square$ denotes a channel-wise symmetric aggregation operation, and $h_\Theta$ is a nonlinear learnable function.

S23: the result is taken as the feature of the center point $x_i$. The pooling operation enlarges the receptive field to contain more global information while reducing the complexity of the cross-attention operation in the subsequent fusion of the two modalities; the sketch is encoded as $H_S$ and the incomplete point cloud is encoded as $H_X$.
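To make the encoder concrete, the following is a minimal sketch of one edge-convolution layer in the sense of equation (1), written in PyTorch. It assumes the standard DGCNN choices (edge features built from the center point and the offset to each of its k nearest neighbors, with max as the symmetric aggregation); the layer width and neighborhood size k are illustrative, not values given by the patent.

```python
import torch

def knn(x, k):
    # x: (B, N, 3) point coordinates; returns indices of the k nearest neighbors
    dist = torch.cdist(x, x)                                  # (B, N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[..., 1:]   # drop the self-match

class EdgeConv(torch.nn.Module):
    """One DGCNN-style graph convolution: h_Theta(x_i, x_j) then max-aggregation."""
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(2 * in_dim, out_dim), torch.nn.ReLU())

    def forward(self, feats, coords):
        # feats: (B, N, C) per-point features; coords: (B, N, 3) used to build the graph
        idx = knn(coords, self.k)                              # (B, N, k)
        B, N, C = feats.shape
        nbrs = torch.gather(
            feats.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, C))         # (B, N, k, C) neighbor feats
        center = feats.unsqueeze(2).expand_as(nbrs)
        edge = torch.cat([center, nbrs - center], dim=-1)      # edge feature (x_i, x_j - x_i)
        return self.mlp(edge).max(dim=2).values                # symmetric max-aggregation
```

Stacking a few such layers with interleaved pooling, as the patent describes, progressively reduces the point cardinality while enlarging the receptive field.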
S3: as shown in FIG. 4, the acquired features are fused, that is, the sketch features and the point cloud features produced by the encoders are fused through a cross-attention mechanism and a self-attention mechanism.
The attention-based fusion module collects local information from the two modalities and fuses it; the attention mechanism is well suited to finding the correspondence between features of point cloud regions and sketch regions, so it is used in the feature fusion module, and the attention layers of the framework use the multi-head attention mechanism of the Transformer.
S31: in the cross-attention mechanism, the incomplete point cloud features are projected to form a query vector, and the sketch features are projected to form a key vector and a value vector; with these three vectors, the attention mechanism fuses the incomplete point cloud features with the sketch features extracted from the associated regions by the feature extractor, realizing feature fusion between the inputs of different modalities:
$$Q = H_X W_Q,\quad K = H_S W_K,\quad V = H_S W_V \tag{2}$$

$$\mathrm{CrossAttention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \tag{3}$$

where $H_X$ and $H_S$ are the encoded feature vectors of the point cloud and the sketch respectively, $W$ denotes a weight matrix, and $Q$, $K$, $V$ are the query, key, and value vectors respectively; $K^{\top}$ is the transpose of the key vector, $d_k$ is the dimension of the query vector, $Q$ is the query vector of the encoded point cloud data, $K$ is the key vector of the encoded sketch data, $V$ is the value vector of the encoded sketch data, and $W_Q$, $W_K$, $W_V$ are the weight matrices of the query, key, and value vectors respectively.
S32: the query vector of the point cloud is obtained as the product of the point cloud features and a weight matrix, and the key and value vectors of the sketch are obtained as the products of the sketch features and weight matrices; softmax normalization is then applied on the basis of these three vectors, and the features are fused.
S33: a self-attention layer is added after the cross-attention fusion in the framework, realizing a permutation-invariant transformation of the features with a global receptive field so as to correct data that were not correctly captured in the sketch; the principle of the self-attention layer is the same as in equations (2) and (3), except that the input features differ: the self-attention layer computes the query, key, and value vectors from the same mixed features.
S34: the framework completes the fusion of the overall features by combining cross-attention layers and self-attention layers, realizing feature fusion of the two modalities' data; at the end of the whole sequence, a special cross-attention layer is used that combines information from the end and the beginning of the sequence, so that high-level features can participate across layers in the fusion of low-level features, giving better flexibility in choosing the required level of abstraction.
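As an illustration of S31 through S34, the sketch below composes one cross-attention layer (point cloud queries against sketch keys and values, as in equations (2) and (3)) with one self-attention layer over the mixed features, using PyTorch's built-in multi-head attention. The model width, head count, and residual/LayerNorm wiring are assumptions rather than details taken from the patent.

```python
import torch

class CrossSelfFusion(torch.nn.Module):
    """One fusion stage: cross-attention (point cloud -> sketch), then self-attention."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.cross = torch.nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn = torch.nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = torch.nn.LayerNorm(d_model)
        self.norm2 = torch.nn.LayerNorm(d_model)

    def forward(self, h_x, h_s):
        # h_x: (B, Nx, d) encoded point cloud; h_s: (B, Ns, d) encoded sketch
        # Cross-attention: queries from the point cloud, keys/values from the sketch.
        mixed, _ = self.cross(query=h_x, key=h_s, value=h_s)
        mixed = self.norm1(h_x + mixed)
        # Self-attention over the mixed features (the same tensor serves as Q, K and V).
        refined, _ = self.self_attn(mixed, mixed, mixed)
        return self.norm2(mixed + refined)
```

Several such stages can be chained, with the final cross-attention layer fed from both the last mixed features and the original encoder outputs, matching the end-to-beginning combination described in S34.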
S4: decoding the features. The joint feature embedding is decoded to learn and reconstruct a complete point cloud, using a decoder that simultaneously maintains global and local features.
S41: the decoder estimates the positions of some points, which are concatenated with a sampled version of the input partial point cloud obtained by farthest point sampling, so that only the missing part of the point cloud is estimated. Farthest point sampling guarantees that the sample is drawn uniformly over the shape; its initial point is chosen at random, which ensures different sampling results each time, and the distance used is the Euclidean distance, which measures the absolute distance between two points in a multidimensional space:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{4}$$

The Euclidean distance is a basic distance measure that quantifies the absolute distance between two points in a multidimensional space; it computes the distance between points $p$ and $q$, where $p_i$ is the $i$-th coordinate of $p$ and $q_i$ is the $i$-th coordinate of $q$.
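A plain-PyTorch farthest point sampling routine matching S41, with a random initial point and the Euclidean distance of equation (4), is sketched below; it favors clarity over speed (production code usually batches this or uses a CUDA kernel), and the sample size in the usage comment is illustrative.

```python
import torch

def farthest_point_sampling(points, m):
    """points: (N, 3) tensor; returns indices of m points spread over the shape."""
    n = points.shape[0]
    chosen = torch.empty(m, dtype=torch.long)
    chosen[0] = torch.randint(n, (1,)).item()     # random initial point, as in S41
    dist = torch.full((n,), float("inf"))
    for i in range(1, m):
        # Euclidean distance from every point to the most recently chosen point
        d = torch.norm(points - points[chosen[i - 1]], dim=1)
        dist = torch.minimum(dist, d)             # distance to the chosen set so far
        chosen[i] = torch.argmax(dist)            # pick the farthest remaining point
    return chosen

# Usage sketch: keep, say, 1024 trusted input points alongside the decoder's estimates.
# sampled = points[farthest_point_sampling(points, 1024)]
```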
S42: the feature domain is upsampled by the decoder; the latent space in which feature fusion is performed is more local, which reduces complexity and allows higher-level features to be fused. The specific operation is realized by an attention-based mechanism: given the features $H$ provided by the encoder, each of the $K_n$ branches computes

$$H_n = \mathrm{MLP}_n(H) \tag{5}$$

$$Y_n = \mathrm{softmax}\!\left(\frac{H_n H_n^{\top}}{\sqrt{d}}\right) H_n W_P \tag{6}$$

where $\mathrm{MLP}_n$ is a multi-layer perceptron with different weights for each branch, which projects the features into the $n$-th of the $K_n$ subspaces and generates the self-attention weights for the resampling process, and $W_P \in \mathbb{R}^{d \times 3}$ is a projection matrix into three-dimensional space; finally, the outputs of all decoder branches are concatenated with the farthest-point-sampled part to generate a complete point cloud.
S43: finally, the outputs of all decoder branches are concatenated with the farthest-point-sampled part to generate the complete point cloud; the completion results are shown in FIG. 5 and FIG. 6:

$$\hat{Y} = \mathrm{concat}\big(Y_1, \ldots, Y_{K_n}, \mathrm{FPS}(X)\big) \tag{7}$$

where $\hat{Y}$ is the predicted complete point cloud.
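Since the exact form of equations (5) to (7) is only partly recoverable from the text, the following sketch is one plausible reading: a per-branch MLP projects the fused features, self-attention weights resample them, a shared matrix W_P projects to 3-D, and the branch outputs are concatenated with the FPS points. All layer sizes and the branch count are assumptions.

```python
import torch

class BranchDecoder(torch.nn.Module):
    """K branches, each projecting fused features and resampling them into 3-D points."""
    def __init__(self, d_model=256, k_branches=4):
        super().__init__()
        self.branches = torch.nn.ModuleList(
            torch.nn.Sequential(torch.nn.Linear(d_model, d_model), torch.nn.ReLU(),
                                torch.nn.Linear(d_model, d_model))
            for _ in range(k_branches))
        self.w_p = torch.nn.Linear(d_model, 3, bias=False)    # projection into 3-D space

    def forward(self, h, fps_points):
        # h: (B, N, d) fused features; fps_points: (B, M, 3) trusted input samples
        outs = []
        for mlp in self.branches:
            h_n = mlp(h)                                      # eq. (5): subspace projection
            attn = torch.softmax(
                h_n @ h_n.transpose(1, 2) / h_n.shape[-1] ** 0.5, dim=-1)
            outs.append(self.w_p(attn @ h_n))                 # eq. (6): resample, map to 3-D
        return torch.cat(outs + [fps_points], dim=1)          # eq. (7): concat with FPS part
```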
FPS denotes farthest point sampling; concatenating the FPS-sampled points with the points estimated by the decoder preserves the fidelity of the existing partial point cloud and realizes the scheme of estimating only the missing part. The mixing ratio of sampled points to estimated points can be adjusted as required, which improves the flexibility of the whole system. A loss function based on the L1 chamfer distance between the generated shape and the ground-truth shape, $\mathcal{L}_{CD}$, is used for supervised training:

$$\mathcal{L}_{CD}(\hat{Y}, Y) = \frac{1}{|\hat{Y}|}\sum_{y \in \hat{Y}} \min_{y' \in Y} \lVert y - y' \rVert + \frac{1}{|Y|}\sum_{y' \in Y} \min_{y \in \hat{Y}} \lVert y - y' \rVert \tag{8}$$

where the first summation is the sum of the minimum distances from each point $y$ in $\hat{Y}$ to $Y$, and the second summation is the sum of the minimum distances from each point $y'$ in $Y$ to $\hat{Y}$; a larger distance means a larger difference between the two point clouds, and a smaller distance means a better completion. The multi-modal completion problem is addressed as weakly supervised learning: the sketch given as input actually contains supplementary information about the point cloud, so it can effectively assist the completion.
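Equation (8) can be evaluated directly from pairwise distances; below is a minimal PyTorch sketch of the L1 chamfer distance, adequate for small point clouds (large-scale training typically uses a fused CUDA implementation). The variable names in the usage comment are illustrative.

```python
import torch

def chamfer_l1(pred, gt):
    """L1 chamfer distance of eq. (8): pred (B, N, 3), gt (B, M, 3)."""
    d = torch.cdist(pred, gt)                     # (B, N, M) pairwise Euclidean distances
    return (d.min(dim=2).values.mean(dim=1).mean()    # pred -> gt term
            + d.min(dim=1).values.mean(dim=1).mean())  # gt -> pred term

# Usage sketch: loss between a predicted and a ground-truth cloud.
# loss = chamfer_l1(y_hat, y_gt); loss.backward()
```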
The invention also provides a sketch-assisted incomplete point cloud completion system, which comprises the following modules:
Information acquisition module: based on the existing incomplete point cloud, obtains the sketch input of the user, which interactively supplements the outline information of the incomplete point cloud;
Feature acquisition module: inputs the incomplete point cloud and the sketch into encoders of their respective modalities and extracts the encoded features of the two modalities, namely sketch features and point cloud features;
Feature fusion module: fuses the encoded features of the two modalities obtained by the feature acquisition module;
Point cloud completion module: learns and reconstructs a complete point cloud, decoding the fused features with a decoder that simultaneously maintains global and local features to complete the point cloud completion.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.
Claims (10)
1. A sketch-assisted incomplete point cloud completion method, characterized by comprising the following steps:
(1) Based on the existing incomplete point cloud, obtaining sketch input from the user, which interactively supplements the outline information of the incomplete point cloud;
(2) Inputting the incomplete point cloud and the sketch into encoders of their respective modalities, and extracting the encoded features of the two modalities, namely sketch features and point cloud features;
(3) Fusing the encoded features of the two modalities obtained in step (2);
(4) Learning and reconstructing a complete point cloud: decoding the fused features with a decoder that simultaneously maintains global and local features to complete the point cloud.
2. The sketch-assisted incomplete point cloud completion method according to claim 1, characterized in that step (1) specifically consists of obtaining the sketch input of a user, where the sketch can be collected through the user's hand drawing; the view of the current incomplete point cloud model is displayed in real time, and the user directly sketches the outline of the missing part to be completed on the view.
3. The sketch-assisted incomplete point cloud completion method according to claim 1, characterized in that in step (2), according to their different data forms, the incomplete point cloud is input into a DGCNN encoder and the sketch is input into a ResNet encoder for feature extraction.
4. The sketch-assisted incomplete point cloud completion method according to claim 3, characterized in that the feature extraction uses two modality-specific feature extractors, one capturing local features of the sketch, summarized over $N_s$ pixels, and one capturing local features of the point cloud, summarized over $N_x$ points; ResNet is used as the sketch encoder, which gives the network a higher convergence speed and ensures the feature extraction; the partial point cloud input is denoted $X \in \mathbb{R}^{N_x \times 3}$, the sketch input is denoted $S$, and the complete point cloud is denoted $Y$; the point cloud completion task is to predict a complete point cloud given the incomplete point cloud and the sketch; the point cloud encoder extracts features from the partial shape $X$ while maintaining their locality, adopting the DGCNN framework, which consists of a series of graph convolution layers interleaved with pooling operations that reduce the cardinality of the point cloud:

$$x_i' = \mathop{\square}\limits_{j \in \mathcal{N}(i)} h_\Theta(x_i, x_j)$$

where $x_i'$ is the encoded feature of point $x_i$: a local graph is constructed from each point and its surrounding neighbors, a convolution is applied to every edge of the graph, and the feature of the center point is obtained by weighted averaging; $\square$ denotes a channel-wise symmetric aggregation operation, and $h_\Theta$ is a nonlinear learnable function whose result is taken as the feature of the center point $x_i$; the pooling operation enlarges the receptive field so that more global information is contained, while reducing the complexity of the cross-attention operation in the subsequent fusion of the two modalities; the sketch is encoded as $H_S$ and the incomplete point cloud is encoded as $H_X$.
5. The sketch-assisted incomplete point cloud completion method according to claim 1, characterized in that step (3) specifically consists of fusing the sketch features and the point cloud features produced by the encoders through a cross-attention mechanism and a self-attention mechanism.
6. The sketch-assisted incomplete point cloud completion method according to claim 5, characterized in that the fusion through a cross-attention mechanism and a self-attention mechanism is specifically as follows: local information collected from the two modalities is fused through an attention mechanism, which is well suited to finding the correspondence between features of point cloud regions and sketch regions; the attention layers of the framework use the multi-head attention mechanism of the Transformer;
(6.1) In the cross-attention mechanism, the incomplete point cloud features are projected to form a query vector, and the sketch features are projected to form a key vector and a value vector; with these three vectors, the attention mechanism fuses the incomplete point cloud features with the sketch features extracted from the associated regions by the feature extractor, realizing feature fusion between the inputs of different modalities;
The query vector of the point cloud is obtained as the product of the point cloud features and a weight matrix, and the key and value vectors of the sketch are obtained as the products of the sketch features and weight matrices; softmax normalization is then applied on the basis of these three vectors, and the features are fused;
(6.2) A self-attention layer is added after the cross-attention fusion in the framework, realizing a permutation-invariant transformation of the features with a global receptive field so as to correct data that were not correctly captured in the sketch; the principle of the self-attention layer is the same as in (6.1), except that the input features differ: the self-attention layer computes the query, key, and value vectors from the same mixed features;
(6.3) The framework completes the fusion of the overall features by combining cross-attention layers and self-attention layers, realizing feature fusion of the two modalities' data; at the end of the whole fusion module, a special cross-attention layer is used that combines information from the end and the beginning of the fusion module, so that high-level features participate across layers in the fusion of low-level features.
7. The sketch-assisted incomplete point cloud completion method according to claim 6, characterized in that feature fusion between the inputs of different modalities is realized in step (6.1); the feature fusion expressions are:

$$Q = H_X W_Q,\quad K = H_S W_K,\quad V = H_S W_V$$

$$\mathrm{CrossAttention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

where $H_X$ and $H_S$ are the encoded feature vectors of the point cloud and the sketch respectively, $W$ denotes a weight matrix, and $Q$, $K$, $V$ are the query, key, and value vectors respectively; $K^{\top}$ is the transpose of the key vector, $d_k$ is the dimension of the query vector, $Q$ is the query vector of the encoded point cloud data, $K$ is the key vector of the encoded sketch data, $V$ is the value vector of the encoded sketch data, and $W_Q$, $W_K$, $W_V$ are the weight matrices of the query, key, and value vectors respectively.
8. The sketch-assisted incomplete point cloud completion method according to claim 7, characterized in that the decoder estimates the positions of the points to be completed and fuses them with the points acquired by farthest point sampling, so that it focuses on the missing part of the point cloud; farthest point sampling samples the shape uniformly, its initial point is chosen at random, which ensures different sampling results each time, and the distance used by farthest point sampling is the Euclidean distance, which measures the absolute distance between two points in a multidimensional space;
The feature domain is upsampled by the decoder, and feature fusion is performed so that higher-level features can be fused; the specific operation is realized by an attention-based mechanism. Given the features $H$ provided by the encoder, each of the $K_n$ branches computes:

$$H_n = \mathrm{MLP}_n(H)$$

$$Y_n = \mathrm{softmax}\!\left(\frac{H_n H_n^{\top}}{\sqrt{d}}\right) H_n W_P$$

where $\mathrm{MLP}_n$ is a multi-layer perceptron with different weights for each branch, which projects the features into the $n$-th of the $K_n$ subspaces and generates the self-attention weights for the resampling process, and $W_P \in \mathbb{R}^{d \times 3}$ is a projection matrix into three-dimensional space; finally, the outputs of all decoder branches are concatenated with the farthest-point-sampled part to generate a complete point cloud:

$$\hat{Y} = \mathrm{concat}\big(Y_1, \ldots, Y_{K_n}, \mathrm{FPS}(X)\big)$$

where $\hat{Y}$ is the predicted complete point cloud;
Farthest point sampling is adopted, and the FPS-sampled points are fused with the points estimated by the decoder, which preserves the fidelity of the existing partial point cloud and lets the framework attend only to the missing part of the point cloud; the whole system completes flexibly by adjusting the mixing ratio of sampled points to estimated points as required; a loss function based on the L1 chamfer distance between the generated shape and the ground-truth shape, $\mathcal{L}_{CD}$, is used for supervised training:

$$\mathcal{L}_{CD}(\hat{Y}, Y) = \frac{1}{|\hat{Y}|}\sum_{y \in \hat{Y}} \min_{y' \in Y} \lVert y - y' \rVert + \frac{1}{|Y|}\sum_{y' \in Y} \min_{y \in \hat{Y}} \lVert y - y' \rVert$$

where the first summation is the sum of the minimum distances from each point $y$ in $\hat{Y}$ to $Y$, and the second summation is the sum of the minimum distances from each point $y'$ in $Y$ to $\hat{Y}$; a large distance means a large difference between the two point clouds, and the distance is inversely related to the completion quality; the actually input sketch contains supplementary information about the point cloud, so completion of the missing point cloud can be accomplished.
9. The sketch-assisted incomplete point cloud completion method according to claim 8, characterized in that the distance used by farthest point sampling is the Euclidean distance:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

The Euclidean distance is a basic distance measure that quantifies the absolute distance between two points in a multidimensional space; it computes the distance between points $p$ and $q$, where $p_i$ is the $i$-th coordinate of $p$ and $q_i$ is the $i$-th coordinate of $q$.
10. A sketch-assisted incomplete point cloud completion system, characterized in that the system comprises the following modules:
Information acquisition module: based on the existing incomplete point cloud, obtains the sketch input of the user, which interactively supplements the outline information of the incomplete point cloud;
Feature acquisition module: inputs the incomplete point cloud and the sketch into encoders of their respective modalities and extracts the encoded features of the two modalities, namely sketch features and point cloud features;
Feature fusion module: fuses the encoded features of the two modalities obtained by the feature acquisition module;
Point cloud completion module: learns and reconstructs a complete point cloud, decoding the fused features with a decoder that simultaneously maintains global and local features to complete the point cloud.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410957907.0A | 2024-07-17 | 2024-07-17 | Sketch-assisted incomplete point cloud completion method and system |

Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN118505909A | 2024-08-16 |
| CN118505909B | 2024-10-11 |
Family

ID=92246876

Family Applications (1)

| Application Number | Status | Priority Date | Filing Date |
|---|---|---|---|
| CN202410957907.0A (CN118505909B) | Active | 2024-07-17 | 2024-07-17 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN118505909B (en) |
Citations (7)

Patent Citations:

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN113160068A | 2021-02-23 | 2021-07-23 | Point cloud completion method and system based on image |
| CN113160327A | 2021-04-09 | 2021-07-23 | Method and system for realizing point cloud completion |
| WO2023241097A1 | 2022-06-16 | 2023-12-21 | Semantic instance reconstruction method and apparatus, device, and medium |
| CN115131245A | 2022-06-30 | 2022-09-30 | Point cloud completion method based on attention mechanism |
| CN115619685A | 2022-11-08 | 2023-01-17 | Transformer method for tracking structure for image restoration |
| CN116503825A | 2023-04-07 | 2023-07-28 | Semantic scene completion method based on fusion of image and point cloud in automatic driving scene |
| CN117274764A | 2023-11-22 | 2023-12-22 | Multi-mode feature fusion three-dimensional point cloud completion method |

Non-Patent Citations (3):
- LONG YANG: "Shape-controllable geometric completion of point cloud models" (点云模型的形状可控几何补全), The Visual Computer, 6 February 2016.
- 孙嘉徽: "Three-dimensional image reconstruction system based on virtual reality technology" (虚拟现实技术的三维图像重建系统), Modern Electronics Technique, no. 09, 1 May 2020.
- 贝子勒, 赵杰煜: "A point cloud repair model based on deep learning" (一种基于深度学习的点云修复模型), Wireless Communication Technology, no. 02, 15 June 2020.
Also Published As

| Publication Number | Publication Date |
|---|---|
| CN118505909B | 2024-10-11 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |