[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Next Article in Journal
Face Stability Analysis for Tunnels under Steady Unsaturated Seepage and Inhomogeneity Conditions
Previous Article in Journal
Enhancement of Robot Position Control for Dual-User Operation of Remote Robot System with Force Feedback
You seem to have javascript disabled. Please note that many of the page functionalities won't work as expected without javascript enabled.
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

A Multi-Scale Covariance Matrix Descriptor and an Accurate Transformation Estimation for Robust Point Cloud Registration

by
Fengguang Xiong
1,2,3,*,
Yu Kong
1,
Xinhe Kuang
4,
Mingyue Hu
1,
Zhiqiang Zhang
1,
Chaofan Shen
1 and
Xie Han
1
1
School of Computer Science and Technology, North University of China, Taiyuan 030051, China
2
Shanxi Provincial Key Laboratory of Machine Vision and Virtual Reality, North University of China, Taiyuan 030051, China
3
Shanxi Visual Information Processing and Intelligent Robot Engineering Research Center, North University of China, Taiyuan 030051, China
4
Sydney Smart Technology College, Northeastern University, Qinhuangdao 066004, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(20), 9375; https://doi.org/10.3390/app14209375
Submission received: 9 September 2024 / Revised: 3 October 2024 / Accepted: 13 October 2024 / Published: 14 October 2024

Abstract

:
This paper presents a robust point cloud registration method based on a multi-scale covariance matrix descriptor and an accurate transformation estimation. Compared with state-of-the-art feature descriptors, such as FPH, 3DSC, spin image, etc., our proposed multi-scale covariance matrix descriptor is superior for dealing with registration problems in a higher noise environment since the mean operation in generating the covariance matrix can filter out most of the noise-damaged samples or outliers and also make itself robust to noise. Compared with transformation estimation, such as feature matching, clustering, ICP, RANSAC, etc., our transformation estimation is able to find a better optimal transformation between a pair of point clouds since our transformation estimation is a multi-level point cloud transformation estimator including feature matching, coarse transformation estimation based on clustering, and a fine transformation estimation based on ICP. Experiment findings reveal that our proposed feature descriptor and transformation estimation outperforms state-of-the-art feature descriptors and transformation estimation, and registration effectiveness based on our registration framework of point cloud is extremely successful in the Stanford 3D Scanning Repository, the SpaceTime dataset, and the Kinect dataset, where the Stanford 3D Scanning Repository is known for its comprehensive collection of high-quality 3D scans, and the SpaceTime dataset and the Kinect dataset are captured by a SpaceTime Stereo scanner and a low-cost Microsoft Kinect scanner, respectively.

1. Introduction

More and more fields, including 3D reconstruction [1,2], simultaneous localization and mapping (SLAM) [3], autonomous driving [4], virtual reality [5], and others, have used point clouds obtained by laser scanning or stereo matching of images. However, only a part of the 3D object’s point cloud may be obtained in a single scan due to the scanning device’s range of view limitation. If insufficient acquired point clouds are provided, several point clouds of the 3D object must be combined to measure the entire 3D object, which requires all partial point clouds to be aligned into a global coordinate frame [6]. The registration between point clouds is a necessary operation before further use of point clouds, such as segmentation, recognition, classification, retrieval of point clouds, etc.
Minimizing the distance between a source point cloud and a target point cloud is the aim of point cloud registration, which then converts the pair of point clouds into a global coordinate frame. Nevertheless, without an accurate initial orientation or position, the fine transformation matrix estimation may only converge to a local minimum rather than a global optimum during the distance-minimization process. Therefore, a coarse transformation estimation should be carried out initially to obtain the correct initial transformation parameters, which include a rotation matrix and a translation vector with six degrees of freedom. Finding geometric features and the correspondence between these features are two crucial jobs that must be completed for point clouds in any initial orientation or position during coarse registration. However, the aforementioned tasks may be hindered by some significant obstacles, such as noise and outliers, various types of occlusions, varying densities, low overlap, and so on.
In this paper, we propose a local descriptor based on a multi-scale covariance matrix for point cloud registration which is robust to noise and invariant to rigid transformation, and we also put forward an accurate transformation estimation method by multi-level estimating point cloud transformations. The following is a summary of the contributions.
(1) A multi-scale covariance matrix feature descriptor is proposed, which can describe the geometrical relationship, eigenvalue variation gradient, and surface variation gradient between a keypoint and its neighboring points. The feature descriptor can filter out most of the noise samples by the mean operation in the process of producing covariance and make it very robust to noise.
(2) A method for detecting boundary points is put forward, which can efficiently extract points on the border. By utilizing this approach, nearby keypoints may be successfully eliminated, improving the descriptiveness of the keypoint by our proposed feature descriptor.
(3) An accurate transformation estimation between a pair of point clouds is brought upon. In this method, we obtain corresponding keypoint pairs by feature matching, estimate an initial transformation using a coarse transformation estimation method based on clustering via these correspondences, and finally obtain an accurate transformation by a fine transformation estimation, such as ICP’s variants.

2. Related Work

2.1. Traditional Methods

To expand the viability and effectiveness of traditional point cloud registration, many studies have proposed some methods to tackle these challenges, including coarse registration methods and fine registration methods.
Coarse registration is a rapid estimation of an initial matrix without knowing any initial positions between the source and target point clouds. The course registration approach is always composed of four stages: keypoint detection, feature descriptor generation, and feature matching and transformation estimation. Among the preceding processes, generating feature descriptors is unquestionably the most significant, and numerous studies on feature descriptors, particularly local feature descriptors, have been presented for point cloud registration. Johnson and Hebert et al. [7] presented a spin image descriptor. Each keypoint described by a spin image is insensitive to rigid transformation, and keypoint correspondences between a source point and a target point cloud can be constructed using the spin image’s correspondences. Frome et al. [8] suggested a 3D shape context (3DSC) descriptor. As the local reference axis, they use the normal at a keypoint. After that, the spherical neighborhood is partitioned evenly along the azimuth and elevation dimensions but logarithmically along the radial dimension. The weighted number of points that fall within each bin is used to construct the 3DSC descriptor. Rusu et al. [9] offered the Point Feature Histogram (PFH) descriptor. A Darboux frame is defined initially for each pair of points in the neighborhood of a keypoint. They then compute four measures based on the angles between the normals of the points and the distance vector between them and add these measures for all pairs of points to form a 16-bin histogram. Point clouds are finally registered and aligned into a consistent global point cloud using corresponding feature point pairs. Salti et al. [10] put forward the SHOT descriptor which encodes the histograms of surface normals at various spatial positions in the reference point neighborhood. Yulan Guo et al. [11] presented the RoPS descriptor which employs the rotation and projection mechanism to encode the local surfaces. Radu Bogdan Rusu et al. suggested Fast Point Feature Histograms (FPFHs) [12] which modify PFH descriptors’ mathematical expressions and perform a rigorous analysis of their robustness and complexity for the problem of 3D registration for overlapping point cloud views. Yu Zou et al. [13] put forward BRoPH, which is generated directly from the point cloud by turning the description of the 3D point cloud into a series binarization of 2D image patches. Yuhe Zhang et al. [14] proposed KDD, which encodes the information of the whole 3D space around the feature point via kernel density estimation and provides the strategy for selecting different matching metrics for datasets with diverse levels of resolution qualities.
Fine registration is a more precise and refined registration based on coarse registration, and the essential idea is to immediately acquire the correspondence between a source point cloud and a target point cloud. The ICP (Iterative Closest Point) technique [15,16] is the most extensively used fine registration method. It is based on the quaternions-based approach and reduce point-to-point distances in overlapping areas between point clouds. However, ICP often needs adequate starting transformation settings, and its iteration process is lengthy. As a result, various ICP variations [17,18,19,20,21,22,23,24,25] have been developed to solve these issues.

2.2. Learning-Based Methods

A number of deep learning approaches, including PCRNet [26], REGTR [27], PREDATOR [28], and others, have recently been suggested for registration. Two categories can be used to categorize learning-based registration: feature learning-based methods and end-to-end learning-based approaches.
In contrast to the conventional point cloud registration techniques, feature learning-based approaches learn a robust feature correspondence search using a deep neural network. The transformation matrix is then finally computed via a one-step estimate (e.g., RANSAC) without the need for iterations. In order to identify soft correspondences between the overlap of two point clouds and extract contextual information for the purpose of learning more unique feature descriptors, PREDATOR uses an attention mechanism. Self-attention and cross-attention are used by REGTR to directly forecast a set of final point correspondences that are consistent. The goal of all these techniques is to estimate robust correspondences using the learned distinctive feature, and they all use deep learning as a tool for feature extraction. Gojcic et al. [29] propose to describe 3DSmoothNet for point cloud registration, a comprehensive approach to match 3D point clouds with a Siamese deep learning architecture with fully convolutional layers via a voxelized smoothed density value (SDV) representation. Xuyang Bai et al. [30] put forward PointDSC, which is a revolutionary deep neural network that explicitly uses spatial consistency to prune correspondences between outliers during point cloud registration.
End-to-end neural networks are used in end-to-end learning-based approaches to overcome the registration challenge. Two point clouds are the network’s input, and the transformation matrix that aligns the two point clouds is its output. The network not only can extract features of point clouds but can also estimate transformation. The network of the feature learning-based approach is distinct from the transformation estimation network of the end-to-end learning-based approach and concentrates on feature learning. After extracting global features with PointNet, PCRNet joins these features and feeds them into the MLP network for regression modification. To learn pose-invariant point-to-distribution parameter correspondences, DeepGMR [31] makes use of a neural network. The GMM optimization module then uses these correspondences to estimate the transformation matrix. For internal likelihood prediction, DGR [32] proposes a six-dimensional convolutional network design and uses a weighted Procrustes module to estimate the transformation.
However, 3D point clouds with labeled tags are commonly used in deep learning methods, but the creation of the labeled tags is time consuming, making them difficult to use for some practical applications. Exceptions include point cloud registration technology based on deep learning, which still has problems, such as inaccurate error solving, certain randomness and instability of registration results, high algorithm complexity and long computation time, and limited generalization ability. Specifically, the issues with 3DSmoonNet includes high efficiency in processing large-scale point clouds, strong hardware dependency, and low generalization ability. The shortcomings of PointDSC include severe performance dependence on initial matching, high computational complexity, etc.

3. Methodology

Our registration method includes three steps: keypoint detection with boundary removal, feature descriptor generator, and accurate transformation estimation. In the process of keypoint detection with boundary removal, we can detect keypoints from the point cloud that are not located around the boundary, allowing us to describe complete neighborhood information of keypoints and generate a good feature descriptor for each keypoint. In the process of feature descriptor generation, we use our multi-scale covariance matrix descriptor to extract local features of keypoints, making it more robust to noise and invariant to rigid transformation. During the accurate transformation estimation, we use feature matching, coarse transformation estimation based on clustering, and fine transformation estimation based on an ICP’s variant to estimate an accurate transformation between a pair of point clouds. The framework of our point cloud registration is shown in Figure 1.

3.1. Keypoint Detection with Bounday Removal

We put forward a method for keypoint detection with boundary removal that detects firstly keypoints and boundary points, respectively, and then removes keypoints close to boundary points. The keypoint detection with boundary removal makes sure detected keypoints are of high quality, which can be better matched with the keypoints on another point cloud.

3.1.1. Keypoint Detector

The goal of the keypoint detector is to choose a subset of conspicuous points with good repeatability from the surface of a point cloud. The discovered keypoints should be very robust to noise and highly invariant to rigid transformation.
Given a point cloud P = { p 1 , p 2 , , p N } R 3 and p i = [ x i , y i , z i ] T P , the mean p ¯ and covariance matrix C of P can be computed as follows:
p ¯ = 1 N i = 1 N p i
C = 1 N i = 1 N ( p i p ¯ ) ( p i p ¯ ) T
then, applying eigenvalue decomposition on the covariance matrix C yields the following results:
C V = E V
where V is the matrix of eigenvectors, E is the diagonal eigenvalues matrix of C, and each column in V corresponds to a diagonal eigenvalue in the diagonal matrix E.
Following that, using Formula (4), point cloud P is aligned with the principal axes established by V, which is known as Principal Component Analysis (PCA).
P = V ( P p )
where P is a normalized point cloud of point cloud P.
For each point p i in point cloud P , a local neighborhood Nbhd ( p i ) is used to prune from P using a sphere with a radius r (so-called support radius) centered at p i . Let X and Y stand for the x and y components of Nbhd ( p i ) as follows:
X = { x 1 , x 2 , , x l }
Y = { y 1 , y 2 , , y l }
where l is the length of Nbhd ( p i ). Then, a surface variation index ϑ is defined as follows:
ϑ = max X m i n ( X ) max Y m i n ( Y )
where the local neighborhood’s geometric variation around the point p i is reflected by the surface variation index ϑ , which is the ratio between the local neighborhood’s first two principal axes centered at p i . ϑ is 1 for a symmetric local point set, such as a plane or sphere, and more than 1 for an asymmetric local point set. The point in P that has a surface variation index higher than ε ϑ is regarded as a keypoint and is defined as follows:
ϑ > ε ϑ  

3.1.2. Boundary Point Detector

We project a point’s neighboring points onto the point’ tangent plane, as seen in Figure 2, and it can be observed that the majority of a boundary point’s surrounding points projected on the tangent plane are located on one side of the boundary point. There will, therefore, be a notch in the line connecting the projection points and the boundary point when these surrounding points are projected onto the boundary point’s tangent plane. One can ascertain if a point is a boundary point or not by measuring the angle of the notch.
The following illustrates how to decide whether a point p i in a point cloud P is a boundary point.
  • Find p i s surrounding points centered at p i on a sphere with radius r and designate them as N b h d ( p i ) .
    Nbhd ( p i ) = { p j | p j P , p j p i r }
  • Project each point p j in N b h d ( p i ) to a tangent plane centered at p i with the following formula:
    q j = p j n ·   p i p j × n
    where n is the normal of p i , and designate these projection points as Nbhd ( q j ) .
  • Find the nearest point to p i , designate it as q u , and then construct a local reference frame ( p i , u , v , w ) in which w axis is p i ’s normal n , u axis is p i q u / p i q u , and v axis is u × w .
  • Find the included angle by calculating the angle between the vector p i q j and u axis, respectively, along a clockwise manner. Then, designate the set of the included angle as S = ( α 1 , α 2 , , α k ) , where k represents the number of all the included angles.
  • Assign S = ( α 1 , α 2 , , α k ) after sorting S in ascending order and then compute the difference between adjacent included angles as follows:
    L i = α i + 1 α i
  • Find maximum difference L m a x and determine p i is a boundary point if L m a x > L t h . L t h is a threshold in this case, whose value is modifiable to determine whether a point is a boundary point or not.

3.1.3. Boundary Keypoint Removal

The process of boundary keypoints removal is depicted as follows:
  • Traverse all points in the point cloud P and detect its boundary points into a set named B o u n d a r i e , following the above boundary point detector.
  • Traverse all points in the point cloud P and detect its entire keypoints into a set named K p t s following the above keypoint detector.
  • For every keypoint k p in K p t s , find its nearest distance d m i n to a boundary point in B o u n d a r i e s and remove k p from K p t s if d m i n < 5   m r .

3.2. Feature Descriptor Generation

We put forward a multi-scale covariance matrix descriptor that gathers geometric relation, surface variation gradient, and eigenvalue variation gradient from a keypoint and its neighborhood within several support radii and encodes such information into a covariance matrix.

3.2.1. Covariance Matrix Descriptor

In probability theory and statistics, covariance is a measure of the joint variability of two random variables, and it is used as a descriptor in image processing. A set of random variables relates to a set of observable properties that may be detected from a keypoint and its neighborhood in the context given by the covariance matrix descriptor, such as normal values, curvature values, 3D coordinates, and so on. As a result, the first step of generating a covariance matrix descriptor is to construct a variable selection function p , r for a given keypoint p as shown below.
p , r = { φ p i , p i   s . t . p p i r }
where p i is a neighboring point of p, φ p i is a random variable feature vector obtained between p and each of its neighboring points, and r is the support radius of the neighborhood of p.
To make selected random variables in the feature vector robust to noise and invariant to rigid transformation, these random variables should be computed relative to the keypoint p. As a result, we define φ p i as follows:
φ p i = ( cos ( α ( p i p ) ) , cos ( β ( p i p ) ) , cos ( γ ( p i p ) ) , ϑ ( p i p ) , λ 1 ( p i p ) , λ 2 ( p i p ) , λ 3 ( p i p ) ) T
where cos ( α p i p ) represents the cosine between the segment from p to p i and the normal vector n of p; cos ( β p i p ) represents the cosine between the segment from p to p i and the normal vector n i of p i ; cos ( γ p i p ) represents the cosine between the normal vector n i of p i and the normal vector n of p; ϑ p i p represents the surface variation gradient between p and p i ; and λ 1 p i p , λ 2 p i p , and   λ 3 p i p represent the gradient of eigenvalue variation of two covariance matrices constructed by two neighborhoods centered at p and p i , respectively. It should be noted that we utilize the cosine rather than the angle of α p i p , β p i p , and   γ p i p to save computation time. From (14) to (20), the calculation formulae are provided, and the three geometric relations α p i p , β p i p , and γ p i p are shown in Figure 3.
cos ( α p i p ) = ( p i p ) ·   n i ( p i p )
cos ( β p i p ) = ( p i p ) ·   n ( p i p )
cos ( γ p i p ) = n i ·   n
ϑ p i p = ϑ p i ϑ p
λ 1 p i p = λ 1 p i λ 1 p
λ 2 p i p = λ 2 p i λ 2 p
λ 3 p i p = λ 3 p i λ 3 p
where ( · ) represents the dot product between two points and · represents the Euclidean distance between two points.
Following the definition of a variable selection function p , r for the keypoint p, the covariance matrix descriptor of a keypoint p is defined as follows:
C r p , r = 1 N   i = 1 N ( φ p i u φ p i ) ( φ p i u φ p i ) T
where N represents the number of neighboring points of p within its support radius r and u φ p i is the mean of these vectors. Our proposed feature descriptor offers a more flexible representation since it considers 3D points as samples of a distribution and, by construction, subtracts the mean of this sample distribution; therefore, in case of noise, it will be naturally attenuated.

3.2.2. Multi-Scale Covariance Matrix Descriptor

We adjust the value of the support radius r to generate covariance descriptors at various scales for a keypoint. A keypoint’s multi-scale covariance descriptor is defined as follows:
  C M p = { C r p , r ,   r { r 1 , , r s } }
Compared to a covariance matrix descriptor with a fixed radius, a multi-scale covariance descriptor for a keypoint can improve its description.

3.3. Accurate Transformation Estimation

An accurate transformation estimation includes feature matching, coarse transformation estimation, and fine transformation estimation. By feature matching, we can find coarse correspondences between a pair of point clouds, and we can obtain a coarse transformation matrix between the pair of point clouds by coarse transformation estimation. After applying the coarse transformation matrix to the source point cloud which is a transformable point cloud in the pair of point clouds, we use a fine transformation estimation to estimate a fine matrix between the transformed source point cloud and the target point cloud, which is fixed point cloud in the original pair of point clouds. Through these three processes, the transformation between a pair of point clouds will become increasingly accurate.

3.3.1. Feature Matching

The goal of feature matching is to find feature correspondences between two sets of multi-scale covariance matrix descriptors and then find keypoint correspondences between the source point cloud and the target point cloud. To establish feature correspondences, we must determine the similarity of two multi-scale covariance matrix descriptors followed by the feature correspondences between two sets of multi-scale covariance matrix descriptors.
  • Similarity between two multi-scale covariance matrix descriptors.
    The distance between two covariance matrix descriptors can be used to represent their similarity. The distance is closer, and the similarity is greater. Because the covariance matrix is a symmetric positive definite matrix, the geodesic distance between two covariance matrices can be used to measure their similarity.
    In recent years, various geodesic distance formulas have been proposed. In this paper, Jensen–Bregman LogDet Divergence [33] is employed, and it is defined as follows:
    d J B L D X ,   Y = log X + Y 2 1 2 log X Y
    where · is the matrix determinant and   log ( · ) is the natural logarithm. This metric has been proven to be more efficient in terms of speed without sacrificing application performance [28], which is why we use it to compare the similarity of two covariance matrix descriptors in this paper.
    Additionally, the similarity between two multi-scaled covariance matrix descriptors can be defined using formula (24) as follows:
    d C M 1 ,   C M 2 = 1 s   i = r 1 r s d J B L D C i 1 ,   C i 2
    where C M 1 and C M 2 correspond to two covariance matrix descriptors relating to different radius scales ranging from r 1 to r s and d J B L D C i 1 ,   C i 2 represents the similarity of two covariance matrix descriptors with a fixed scale r i . Formula (24) considers similarity under multi-scales and utilizes the mean of similarity under multi-scales as the similarity of multi-scaled covariance matrix descriptors. A covariance matrix descriptor stated in the rest of this paper refers to a multi-scale covariance matrix descriptor unless otherwise specified.
  • Correspondence between two sets of covariance matrix descriptors.
    We present a bidirectional nearest neighbor distance ratio (BNNDR) to detect correspondences from two sets of covariance matrix descriptors, which differs from existing techniques, such as the nearest neighbor (NN) and the nearest neighbor distance ratio (NNDR). The detailed matching procedure via the BNNDR is as follows.
    First, assume that D S = { d S i } and D T = { d T j } are the sets of covariance matrix descriptors from a source point P S and a target point cloud P T , respectively.
    Next, each covariance matrix descriptor d T i in D T is matched against all the covariance matrix descriptors in D S to yield the first and second nearest covariance matrix descriptors to d T i : d S j and d S j , respectively.
    d S j = m i n d S j D S d ( d T i , d S j )
    d S j = m i n d S j D S \ d S j d ( d T i , d S j )
    where D S \ d S j is the set D S , excluding the covariance matrix descriptor d S j , and then the NNDR r d i s t is defined as follows:
    r d i s t = d ( d T i , d S j ) d ( d T i , d S j )
    if the ratio r d i s t is less than a threshold ε r , ( d T i , d S j ) can be thought of as a candidate-matched feature pair, and ( k p T i , k p S j ) can be also thought of as a candidate-matched keypoint pair, with keypoint k p T i corresponding to d S i and keypoint k p S j corresponding to d S j .
Then, d S j is also matched against all the covariance matrix descriptors in D T to generate its corresponding feature descriptor d T i following the preceding procedures, and the matched keypoint pair ( k p S j , k p T i ) can be acquired by the corresponding covariance matrix descriptor pair.
Finally, ( d T i , d S j ) is a matched feature pair if k p T i is the same keypoint as k p T i ; otherwise, ( d T i , d S j ) is an unmatched feature pair.

3.3.2. Coarse Transformation Estimation

So far, a correspondence set C is constructed between P S and P T . The goal of coarse transformation estimation is to find an optimal rigid transformation T o from C . The RANSAC algorithm and its variants [29,30] are popular approaches to solving the problem. However, they face challenges due to a lack of robustness and a high time consumption. To estimate T o , we put forward a coarse transformation estimation method based on clustering. The steps are outlined below in detail.
  • Generate a T i from a correspondence C i .
    In theory, a correspondence C i is inadequate to estimate a rigid transformation between a pair of point clouds. Yet, if a correspondence is oriented with a local reference frame (LRF), estimating a rigid transformation is adequate [34]. We have constructed the LRF for each point during the keypoint detection with boundary removal, and we can utilize the LRFs to estimate transformations. The following are the formulas for estimating a rigid transformation:
    R i = L R F S j T L R F T i
    t i = k p T i k p S j R i
    where ( k p T i , k p S j ) is a matched keypoint pair, L R F S j and L R F T i are local reference frames of k p S j and k p T i , respectively, and R i and t i signify a rotation matrix and a translation vector. The advantage of correspondence over traditional RANSAC is that the computational complex is lowered from O ( n 3 ) to O ( n ) , where n is the number of correspondences.
  • Estimate an optimal rigid transformation based on clustering.
    Each rotation matrix R i is transformed to Euler angle rotations and concatenated with the corresponding translation vector t i to generate a six-dimensional vector before clustering. Each 6D vector denotes six degrees of freedom, consisting of three rotations and three translations along the x, y, and z axes. The k-means clustering algorithm is then used to cluster these 6D vectors and construct κ clusters, with each cluster center being a transformation T that includes a rotation and translation. The next step is to extract a translation vector and a rotation matrix from the cluster center. Then, we transform the source point cloud P S using the transformation described above, naming it P S . For every point in P S , we find its nearest point in P T , and we consider a point p S i in P S to be an inlier if the distance between p S i and its nearest point p T j in p T j is less than a threshold, i.e.,
    p S i p T j < ϵ d
    and, after that, the definition of the ratio of inliers is shown as follows:
    r t i n l i e r s = n i n l i n e s / n s
    in which n i n l i n e s represents the number of inliers and n s represents the number of points in P S . Finally, we choose an optimal T as T o , whose r t i n l i e r s is the highest.

3.3.3. Fine Transformation Estimation

Following the completion of the preceding two phases, we acquire an initial transformation T o between the source and target point clouds and then transform the source point using the transformation T o before fine registration. We apply a recent ICP variant [21] to complete fine transformation estimation, which can register accurately the source point cloud to the target point cloud.

4. Experiments

The experiments involve the comparison of performance and registration effectiveness between our proposed feature descriptor based on multi-scale covariance matrix and state-of-the-art feature descriptors, with the goal of testing the performance of our proposed feature descriptor and registration approach.

4.1. Experiments of Keypoint Detection with Boundary Removal

The task of this section is to validate whether our keypoint detection with boundary removed can detect boundary points in a point cloud and can obtain keypoints far away from the boundary.

4.1.1. Dataset

To conduct these experiments, we initially employed partial point clouds of Armadillo, Bunny, and Buddha et al. from the Stanford 3D Scanning Repository [35] for our research. For Armadillo, we included 11 partial point clouds in our collection, numbered from ArmadilloBack_0 to ArmadilloBack_330 with 60-degree intervals between them. For Bunny, we provided six partial point clouds in our dataset, labeled Bun000, Bun045, Bun090, Bun180, Bun270, and Bun315. For Happy Buddha, we chose 15 partial point clouds from our dataset and called them HappyStandRight_0 to HappyStandRight_336 with 24-degree intervals between them. Samples of point clouds from our dataset are shown in Figure 4.

4.1.2. Results of Keypoint Detection with Boundary Removal

To determine if a location is a boundary point, we choose neighboring points within a radius of 4 mr (the support radius) and assign different L t h values. Figure 5 depicts the boundary points of the point clouds indicated in Figure 4 under various L t h , with the red points representing boundary points and the green points representing the original point clouds in Figure 4. Figure 5 clearly shows that the number of boundary points is the minimum, and the boundary points are discrete when L t h is set to π. There are more boundary locations when L m a x is between π/2 and π/4. When L t h is set to π/3 or π/4, the boundary points overlap and create many clusters. In subsequent studies, we set L t h to π/3, recognizing that too many boundary points reduce the time efficiency of finding keypoints and that too few boundary points reduce the effectiveness of keypoint detection.
After acquiring boundary points for a point cloud, we may use our offered keypoint detector to extract original keypoints and delete boundary keypoints. In Figure 6, columns (a) and (c) illustrate keypoints detected by our keypoint detector for the various point clouds indicated in Figure 4, with red points representing keypoints and green points representing original points. After retrieving the original keypoints, we shall eliminate any keypoints that are less than 5 mr from the nearest boundary point. Columns (b) and (d) in Figure 6 show the current keypoints after deleting the boundary keypoints from the original keypoints. When compared to keypoints in columns (a), (b), (c), and (d) in Figure 6, it is obvious that our approach for keypoint detection with boundary removal can yield keypoints on non-boundaries, which is critical for generating feature descriptors and matching point pairs between a pair of point clouds.

4.2. Experiments on the Performance of Descriptors

4.2.1. Dataset

We chose four source point clouds from the Stanford 3D Scanning Repository for these experiments to evaluate the performance of our proposed feature descriptor in these experiments. Our target point clouds are created by superimposing Gaussian noise on our source point clouds (at three distinct levels of Gaussian noise, with standard deviations of 0.1 mr, 0.3 mr, and 0.5 mr).

4.2.2. Evaluation Metrics of Performance of Descriptors

Our proposed multi-scale covariance matrix descriptor includes only one parameter: random variables in the vector φ . As a result, the descriptor’s performance against various variables in the vector φ must be evaluated. In the meantime, a performance comparison of our proposed feature descriptor with state-of-the-art feature descriptors is required.
Our experimental findings are shown using the recall versus 1-precision curve (RP curve). The recall versus 1-precision curve is a frequent statistic in the literature for evaluating a descriptor [36]. It is created in the following manner. Given a target point cloud, a source point cloud, and a ground truth matrix, keypoints from the target and source point clouds are detected and described as target feature descriptors and source feature descriptors, respectively, using our proposed multi-scale covariance matrix descriptor algorithm. The total number of positives is used to calculate the number of target feature descriptors. To determine the closest feature descriptor, a target feature descriptor is compared to all source feature descriptors. If the distance between the target feature descriptor and the nearest source feature descriptor is less than a certain threshold, the pair is considered matched. Furthermore, a matched feature pair is regarded as a valid positive if the distance between their physical places is short enough; otherwise, it is considered a false positive. The RP can be calculated using these numbers.
r e c a l l = t h e   n u m b e r   o f   c o r r e c t   p o s i t i v e s t o t a l   n u m b e r   o f   p o s i t i v e s
1 p r e c i s i o n = t h e   n u m b e r   o f   f a l s e   p o s i t i v e s t o t a l   n u m b e r   o f   m a t c h e d   f e a t u r e   p a i r s

4.2.3. Comparison between Different Feature Vectors for Covariance Descriptor

Random variables in the feature vector play a significant part in the generation of our proposed covariance matrix descriptor, which impacts not only the encapsulation ability of information in the covariance matrix but also the length of the feature vector. The following feature vectors are concatenated by random variables in this paper:
F 3 = ( cos α p i p , cos β p i p , cos ( γ p i p ) ) F 4 = ( σ p i p , λ 1 p i p , λ 2 p i p , λ 3 ( p i p ) ) F 7 = ( cos ( α p i p ) , cos ( β p i p ) , cos ( γ p i p ) , σ p i p , λ 1 p i p , λ 2 p i p , λ 3 ( p i p ) ) F 10 = ( x p i , y p i , z p i , cos ( α p i p ) , cos ( β p i p ) , cos ( γ p i p ) , σ p i p , λ 1 p i p , λ 2 p i p , λ 3 ( p i p ) ) F 13 = ( n x p i , n y p i , n z p i , x p i , y p i , z p i , cos ( α p i p ) , cos ( β p i p ) , cos ( γ p i p ) , σ p i p , λ 1 p i p , λ 2 p i p , λ 3 ( p i p ) )
F 7 is the feature vector that is employed in the covariance matrix descriptor described in this paper. Other feature vectors are generated by increasing or reducing random variables based on F 7 . The relative values of the keypoints and their neighboring ones are represented by the values of each variable in the feature vectors F 3 , F 4 , and F 7 , and these random variables exhibit rotation translation invariance and noise robustness. The x, y, and z coordinates of neighboring points that are not invariant to rotation and translation are added to the feature vector F 10 . On the basis of F 10 ., the eigenvector F 13 . adds the normal vector information of neighboring points. These normal vectors lack rotation translation invariance as well.
We find keypoints from a source point cloud using our proposed keypoint detection method with boundary removal in these experiments. Simultaneously, to ensure that the entire number of positives is known, we must construct keypoints of the target point cloud that match to keypoints of the source point cloud. The following description depicts the method of producing corresponding keypoints of target point clouds:
  • Transform the source point cloud’s keypoints using the ground truth matrix;
  • For every keypoint in transformed keypoints, find the closest point to it in the target point cloud and designate the point as a keypoint of the target point cloud.
In the process of generating the covariance matrix descriptor, the support radius is set to 6 mr, 7 mr, 8 mr, and 9 mr, correspondingly. The average recall and 1-precision are determined under three Gaussian noise situations. Figure 6 depicts the experimental results, where Figure 7a–c represent RP curves at 0.1 mr, 0.3 mr, and 0.5 mr, respectively.
Figure 6 shows that under the three noises, the performance of the covariance matrix descriptor formed by the feature vectors F 3 , F 4 , and F 7 is better than the performance of the covariance matrix formed by the feature vectors F 10 and F 13 , and the performance of the covariance matrix formed by the feature vectors F 10 and F 13 will decline rapidly as the noise deviation increases. This is due to two factors. First, the random variables in F 3 , F 4 , and F 7 have rotation and translation invariance, but the random variables in F 10 and F 13 do not have rotation and translation invariance, such as absolute spatial coordinates x, y, z, and so on. Second, the random variables F 3 , F 4 , and F 7 include the geometric structure, surface variation gradient, and eigenvalue variation gradient, which are all robust to noise, whereas the random variables F 10 and F 13 include normal vector and spatial coordinates that are not robust to noise.
In the meantime, when the performance of the covariance matrix descriptor constructed from the feature vectors F 3 and F 4 is compared, it can be seen that the performance of the covariance matrix descriptor formed by F 4 is better than the performance of the covariance matrix descriptor formed by F 3 , which is due to the fact that surface variation gradient and eigenvalue variation gradient are more stable and more robust to noise than geometric structure.
Additionally, by comparing the performance of the covariance matrix descriptor constructed by the feature vector F 3 , F 4 , and F 7 , it can also be shown that the performance of the covariance matrix descriptor will be enhanced by adding more random variables with strong robustness to noise and invariance to rotation and translation to the feature vector. Kaiser et al. [37] obtain a similar result and recommend that the number of random variables in the feature vector be limited to roughly 15. In this study, the length of the random variables is 7. Despite the limited number of random variables, the reduced dimensionality of the covariance matrix descriptor can increase time efficiency and minimize space overhead. Finally, Figure 6 shows that the performance of the covariance matrix descriptor formed by F 7 drops slightly as noise increases from 0.1 mr to 0.5 mr, confirming that the feature vector used in this paper has good noise robustness and rotation and translation invariance.

4.2.4. Comparison between Different Feature Descriptors

We compare the performance of our proposed covariance matrix descriptor to the performance of state-of-the-art feature descriptors like spin image [7], 3DSC [8], PFH [9], RoPS [11], FHPH [12], BRoPH [13], and KDD [14], which are regarded as classical feature descriptors in the literature for successful surface matching and are implemented on Point Cloud Library (PCL) [38,39]. Any required parameter (support radius, bin size) for the compared feature descriptors is set according to their original recommendations or similar values for our proposed feature descriptor, with the goal of providing the most fair comparison possible.
Figure 5 clearly shows that the spin image descriptor performs worse than the other three descriptors under three different noise circumstances. The fundamental explanation for this is that the spin image descriptor is sensitive to non-uniform sampling [40], whereas the keypoints in this set of trials are not produced via uniform sampling. Simultaneously, when noise levels rise, FPH’s performance falls significantly. There are two reasons for this. First, the PFH establishes a Darboux coordinate system and computes four features between two surrounding points that are rotation and translation invariant and noise resistant. Second, PFH combines three of its four feature variables to build a histogram with 16 intervals, which further increases PFH’s robustness to noise.
Meanwhile, Figure 8 shows that the performance of KDD is better than that of our proposed feature descriptor in the case of 0.1 mr noise, but as the noise increases, particularly when the noise is 0.3 mr and 0.5 mr, the performance of our proposed feature descriptor clearly outperforms all of the other feature descriptors. Our proposed feature descriptor is based on a covariance matrix, which is noise resistant for two reasons. First, the feature vectors that comprise the covariance matrix are noise robust and rotation and translation insensitive. Second, the mean operation in the process of producing covariance filters out most of the noise-damaged samples, directly improving the robustness of the feature descriptor to noise.

4.3. Experiments on Registration Effectiveness

4.3.1. Dataset

To verify the registration of point clouds using our proposed covariance matrix descriptor, we conduct experiments on partial point clouds from the Stanford 3D Scanning Repository, the SpaceTime [10,37,38] dataset, and the Kinect dataset [10,37,38]. In the Stanford 3D Scanning Repository, we test registration effectiveness for eleven of Armadillo’s incomplete point clouds, six of Bunny’s incomplete point clouds, fifteen of Happy Buddha’s incomplete point clouds, and fifteen of Dragon’s partial point clouds. In the SpaceTime dataset, we test registration effectiveness between two models and fifteen scenes, where models and scenes are regarded as source point clouds and target point clouds, respectively. In the Kinect dataset, we test registration effectiveness between partial views of six models and fifteen scenes, where partial views of six models and scenes are regarded as source point clouds and target point clouds, respectively. Some of the point clouds taken from these datasets are presented in Figure 9.

4.3.2. Evaluation Metrics of Registration Effectiveness

In [41], two metrics—the rotation error θ r and the translation error θ t —are utilized for a quantitative evaluation of the registration’s performance. The error between the real matrix R A and the ground truth rotation matrix R G T is represented by θ r . The inaccuracy between the translation vector t A and the ground truth vector t G T is represented by θ t . Here is how they are defined.
θ r = c o s 1 t r a c e R A R G T 1 1 2 180 π
θ t = t A t G T d r e s
where R G T and t G T are known in advance time and d r e s is 1 mr. It should be mentioned how R A and t A can be obtained. A coarse transformation matrix R t C between a source point cloud and a target point cloud can be estimated following coarse registration, and it is presented as formula (36). A transformation matrix called R t F i will be produced throughout each iteration of the fine registration procedure. Therefore, each iterative transformation matrix can be multiplied to generate a fine transformation matrix, as indicated by formula (37). Ultimately, the final transformation matrix R t A is obtained by multiplying R t F by R t C , as expressed in formula (38).
R t C = R c t C 0 1
R t F = R t F n R t F n 1   R t F 1 = R F t F 0 1
R t A = R t F R t C = R F t F 0 1 R c t C 0 1 = R F R C R F t C + t F 0 1
where R t A includes a rotate matrix R A and a translator vector t A , R c and t C stand for the rotation matrix and translation vector of R t C , respectively, and R F and t F stand for the rotation matrix and translation vector of R t F , respectively.

4.3.3. Settings of Experimental Parameters

Table 1 lists all parameters used in our experiments. ε ϑ is a threshold for a surface variation index, and a point is defined as a keypoint when its surface variation index is greater than ε ϑ ; L t h is a threshold for the maximum difference between adjacent included angles, and a point is defined as a boundary point when its maximum difference is greater than L t h . d m i n is a threshold for the nearest distance between a keypoint and a boundary point, and a keypoint is regarded as a boundary point when its distance to its nearest boundary point is lower than d m i n .   r 1 , , r s is the supported radius used to generate a multi-scale feature descriptor proposed in our paper. ε r is a threshold for the distance ratio between a pair of the first nearest descriptor and a pair of the second nearest descriptor; a pair of descriptors can be regarded as a matched pair of descriptors when the distance ratio between a pair of the first nearest descriptor and a pair of the second nearest descriptor is less than ε r . κ is the number of clusters in coarse transformer estimation, and we set it to seven in this paper; ϵ d is a threshold for determining whether a point is an inlier, and a transformed point by a transformer matrix in a source point cloud is regarded as an inlier when its distance to its nearest point in the target point cloud is less than ϵ d .

4.3.4. Comparison between Registration Effectiveness of Transformation Estimation

Based on our registration framework and our proposed feature descriptor, we compare the registration effectiveness between our accurate transformation estimation and state-of-the-art transformation estimation such as feature matching, clustering, ICP, RANSAC, etc., on the above datasets. Table 2 shows the registration effectiveness of kinds of transformation estimation based on formulas (36) and (37). It is clearly shown that our proposed transformation estimation outperforms other transformation estimations followed by feature matching + clustering and RANSAC, and transformation estimation based on ICP obtains the lowest registration effectiveness in all transformation estimation methods. The core reason that transformation estimation based on ICP obtains the lowest registration effectiveness is that the initial transformation is unknown. So, when the initial transformation is obtained by feature matching, we then obtain a transformation between a pair of point clouds, and it will improve registration effectiveness. At the same time, according to a comparison of the registration effectiveness of transformation estimation between our proposed transformation estimation and feature matching + clustering and feature matching + ICP, three steps in our proposed transformation estimation were found to gradually improve registration effectiveness. The transformation estimation based on feature matching can not only obtain the transformation of the point cloud but also obtain a series of correspondences between source keypoints and target keypoints between source feature descriptors and target feature descriptors. These correspondences are applied to coarse transformation estimation based on clustering, which will help us obtain the initial transformation of point cloud registration. Finally, a fine transformation estimation (transformation estimation based on ICP) via the initial transformation can further obtain a more optimal transformation of point cloud registration.

4.3.5. Comparison between Registration Effectiveness of Point Clouds

As a result, we compare our feature descriptor to state-of-the-art feature descriptors using our registration framework. The average registration errors of point clouds in our dataset are shown in Table 3. It clearly shows that the average registration error for point clouds in the Standford 3D Scanning Repository via our proposed feature descriptor is moderate, lagged by BRoPH and KDD, and the average registration error for point clouds in the SpaceTime dataset via our proposed feature descriptor is also moderate, lagged by 3DSC, BRoPH, and KDD. However, in the Kinect dataset, pairwise registration based on our proposed feature descriptor obtains the least registration error, which means it achieves the best registration effectiveness. The deep-seated reason is that the sampling quality or noise level of the three datasets is different, and our proposed feature descriptor has a greater robustness to noise. The point clouds in the 3D Scanning Repository are scanned with a Cyberware 3030 MS scanner, which is a swept-stripe laser triangulation range scanner with a higher sampling quality; the point clouds in the SpaceTime dataset are captured by a SpaceTime Stereo scanner with a high sampling quality, and point clouds are scanned by a low-cost Microsoft Kinect scanner with a low sampling quality. Different sampling qualities of the scanner decide the noise level of captured point clouds, with higher sampling quality resulting in less noise, and vice versa. Point clouds in the 3D Scanning Repository and the SpaceTime dataset have lower noise, so feature descriptors such as 3DSC, BRoPH, and KDD are more advantageous, and the registration effectiveness based on these feature descriptors are naturally higher than the registration effectiveness based on our proposed feature descriptor. However, with the increase in noise in point clouds, the advantage of our proposed feature descriptor in point cloud registration also emerges, which is due that the mean operation in the process of producing covariance filters out of most of the noise-damaged samples, making our feature descriptor very robust to noise and achieving the highest registration performance in the Kinect dataset.

4.3.6. Comparison between Registration Efficiency of Point Cloud

Based on our registration framework and our proposed feature descriptor, we compare the average registration time between our accurate transformation estimation and state-of-the-art transformation estimation, such as feature matching, clustering, ICP, RANSAC, etc., on the above three datasets. Table 4 shows the average registration time based on the kinds of transformation estimations. Meanwhile, we compare the average registration time between our feature descriptor and state-of-the-art feature descriptors via our registration framework and our accurate transformation estimation. The average registration time of point clouds based on the kinds of feature descriptors on the above three datasets is shown in Table 5.
The results in Table 4 show that the registration time of our proposed transformation estimation approach is only slightly lower than that of feature matching and ICP-based estimation methods. Although our transformation estimate technique takes somewhat longer to register than feature matching-based estimation methods, it is due to the fact that our method relies on feature matching to create initial matched point pairs. Although this phase takes some time, it lays the groundwork for more precise transformation estimates in the next stage. On the other hand, although the ICP-based transformation estimation method outperforms our method in registration time, as shown in Table 2, this method has significant errors in estimating the transformation matrix. This means that although the ICP method has advantages in time efficiency, there are shortcomings in accuracy. In addition, it can be observed in Table 4 that when feature matching-based transformation methods are combined with ICP methods or clustering methods, their computation time increases, resulting in lower time efficiency than our methods. This may be because these combination methods require more computing resources and steps when processing complex data. Finally, the transformation approach based on clustering/RANSAC requires a significant number of iterative operations, which takes longer than our method. This further demonstrates that our accurate transformation estimation approach is time efficient while retaining excellent accuracy. Based on the findings in Table 2, we can infer that our transformation estimation approach maintains high accuracy while being time efficient.
According to Table 5, the registration approach based on our proposed feature descriptors takes just slightly longer than the registration approach based on PFH/FPFH. The registration approach based on PFH has the shortest registration time, which might be owing to the relatively easy and quick calculation of PFH descriptors. In contrast, registration methods based on 3DSC have the longest registration times, typically due to higher descriptor dimensions, resulting in more complex and time-consuming generation and matching processes. As a result, it is clear that one of the most important elements influencing registration time is the descriptor dimension. Generally speaking, the larger the dimension of a descriptor, the more processing and storage resources it demands, resulting in a longer creation and matching time. Combined with the results shown in Table 3, it is clear that our feature descriptor achieves a relatively rapid registration time while being low dimensional, resulting in a good balance of efficiency and accuracy.

5. Conclusions

In this paper, we propose a feature descriptor based on a multi-scale covariance matrix that describes the geometrical relationship, the eigenvalue variation gradient, and the surface variation gradient, and we use it to extract local features of keypoints detected by our keypoint detection method. We, thereafter, present an accurate transformation estimation method by multi-level estimating point cloud transformations to obtain the best optimal transformation between a pair of point clouds. Experiment findings reveal that our proposed feature descriptor is more resistant to noise than PFH, 3DSC, spin image, etc., our accurate transformation estimation outperforms state-of-the-art transformation estimation such as feature matching, clustering, ICP, RANSAC, etc., and our registration effectiveness is extremely successful in the Kinect dataset with noisy data. Of course, the approach we propose has certain limitations, which are as follows.
(1) Too many parameters and thresholds must be supplied in our proposed approach. For example, r (the support radius is used to detect keypoints), ε ϑ (a threshold for a surface variation index), L t h (a threshold for the maximum difference between adjacent included angles), d m i n (a threshold for the nearest distance between a keypoint and a boundary point), et al., must be provided. Setting these parameter values/thresholds is a hard and time-consuming operation, and the quality of the settings will influence the registration outcomes. As a result, the next phase of work should concentrate on research in this area, such as establishing multiple parameter values/threshold acquisition standards and adaptively choosing parameter values/thresholds for 3D point cloud registration.
(2) The rigid point cloud registration approach assumes that objects do not undergo deformation during the registration process, while the non-rigid point cloud registration involves objects that undergo deformation, that is, objects may undergo shape or size changes during the registration process due to various factors such as pressure, temperature, etc. Therefore, the method used in this paper cannot be directly applied to non-rigid body registration due to its inherent assumptions and limitations. But if non-rigid bodies are decomposed into multiple small rigid bodies, non-rigid body registration can be defined as a non-uniform Euclidean transformation between multiple small rigid bodies and the approach proposed in this paper also have practical applications. Therefore, future work needs to research how to decompose non-rigid point cloud registrations into multiple local rigid point cloud registrations and apply the method proposed in this paper to the non-rigid point cloud registrations.
(3) Apply our proposed point cloud registration approach in this paper to the registration of indoor and outdoor scenes. At present, our point clouds used in this paper only involve single, small objects and do not involve large point clouds, such as indoor and outdoor scenes. Compared with a single small object, the point cloud involved in indoor and outdoor scenes grows exponentially, and the computational complexity also increases exponentially. These will be challenges for our point cloud registration approach proposed in this paper.

Author Contributions

Conceptualization, F.X.; methodology, F.X.; software, F.X. and Y.K.; validation, X.K. and Z.Z.; formal analysis, M.H.; writing—original draft preparation, F.X. and Y.K.; writing—review and editing, X.K. and C.S.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant Number 62272426, in part by the Shanxi Province Science and Technology Major Special Plan “Unveiling and Leading” Project under Grant Number 202201150401021, and in part by the Shanxi Provincial Natural Science Foundation under Grant Number 202203021212138.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author ([email protected]).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ding, C.; Dai, Y.; Feng, X.; Zhou, Y.; Li, Q. Stereo vision SLAM-based 3D reconstruction on UAV development platforms. J. Electron. Imaging 2023, 32, 013041. [Google Scholar] [CrossRef]
  2. Bai, L.; Li, Y.; Zhou, Z. Visualization pipeline of autonomous driving scenes based on FCCR-3D reconstruction. J. Electron. Imaging 2022, 31, 033023. [Google Scholar] [CrossRef]
  3. Kong, X.; Liu, S.; Taher, M.; Davison, A.J. vmap: Vectorised object mapping for neural field slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  4. Mao, J.; Shi, S.; Wang, X.; Li, H. 3D object detection for autonomous driving: A comprehensive survey. Int. J. Comput. Vis. 2023, 131, 1909–1963. [Google Scholar] [CrossRef]
  5. Shi, C.; Miao, X.; Liu, H.; Han, Y.; Wang, Y.; Gao, W.; Liu, G.; Li, S.; Lin, Y.; Wei, X.; et al. How to promote the sustainable development of virtual reality technology for training in construction filed: A tripartite evolutionary game analysis. PLoS ONE 2023, 18, e0290957. [Google Scholar] [CrossRef]
  6. Xu, G.; Pang, Y.; Bai, Z.; Wang, Y.; Lu, Z. A fast point clouds registration algorithm for laser scanners. Appl. Sci. 2021, 11, 3426. [Google Scholar] [CrossRef]
  7. Johnson, A.E.; Hebert, M. Surface matching for object recognition in complex three-dimensional scenes. Image Vis. Comput. 1998, 16, 635–651. [Google Scholar] [CrossRef]
  8. Frome, A.; Huber, D.; Kolluri, R.; Bülow, T.; Malik, J. Recognizing objects in range data using regional point descriptors. In Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, 11–14 May 2004; pp. 224–237. [Google Scholar]
  9. Rusu, R.B. Semantic 3D object maps for everyday manipulation in human living environments. KI-Künstliche Intell. 2010, 24, 345–348. [Google Scholar] [CrossRef]
  10. Salti, S.; Tombari, F.; Di Stefano, L. Shot: Unique Signatures of Histograms for Surface and Texture Description. Comput. Vis. Image Underst. 2014, 125, 251–264. [Google Scholar] [CrossRef]
  11. Guo, Y.; Sohel, F.; Bennamoun, M.; Lu, M.; Wan, J. Rotational Projection Statistics for 3D Local Surface Description and Object Recognition. Int. J. Comput. Vis. 2013, 105, 63–86. [Google Scholar] [CrossRef]
  12. Rusu, R.B.; Blodow, N.; Beetz, M. Fast Point Feature Histograms (FPFH) for 3D Registration. In Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 3212–3217. [Google Scholar]
  13. Zou, Y.; Wang, X.; Zhang, T.; Liang, B.; Song, J.; Liu, H. BRoPH: An efficient and compact binary descriptor for 3D point clouds. Pattern Recognit. 2018, 76, 522–536. [Google Scholar] [CrossRef]
  14. Zhang, Y.; Li, C.; Guo, B.; Guo, C.; Zhang, S. KDD: A kernel density based descriptor for 3D point clouds. Pattern Recognit. 2021, 111, 107691. [Google Scholar] [CrossRef]
  15. Chen, Y.; Medioni, G. Object modelling by registration of multiple range images. Image Vis. Comput. 1992, 10, 145–155. [Google Scholar] [CrossRef]
  16. Besl, P.J.; McKay, D.N. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256. [Google Scholar] [CrossRef]
  17. Sharp, G.C.; Lee, S.W.; Wehe, D.K. ICP registration using invariant features. IEEE Trans. Pattern Anal. 2002, 24, 90–102. [Google Scholar] [CrossRef]
  18. Fitzgibbon, A.W. Robust registration of 2D and 3D point sets. Image Vis. Comput. 2003, 21, 1145–1153. [Google Scholar] [CrossRef]
  19. Chetverikov, D.; Svirko, D.; Stepanov, D.; Krsek, P. The trimmed iterative closest point algorithm. In Proceedings of the 2002 International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002; Volume 3, pp. 545–548. [Google Scholar]
  20. Dong, J.; Peng, Y.; Ying, S.; Hu, Z. LieTrICP: An improvement of trimmed iterative closest point algorithm. Neurocomputing 2014, 140, 67–76. [Google Scholar] [CrossRef]
  21. Yang, J.; Li, H.; Jia, Y. Go-ICP: Solving 3D registration efficiently and globally optimally. In Proceedings of the IEEE Conference on Computer Vision, Portland, OR, USA, 23–28 June 2013; pp. 1457–1464. [Google Scholar]
  22. Simon, D.A.; Hebert, M.; Kanade, T. Techniques for fast and accurate intrasurgical registration. Comput. Aided Surg. 1995, 1, 17–29. [Google Scholar] [CrossRef]
  23. Qiu, D.; May, S.; Nüchter, A. GPU-accelerated nearest neighbor search for 3D registration. In Proceedings of the International Conference on Computer Vision Systems, Liege, Belgium, 13–15 October 2009; pp. 194–203. [Google Scholar]
  24. Uhlenbrock, R.; Kim, K.; Hoffmann, H.; Dolne, J. Rapid 3D registration using local subtree caching in iterative closest point (ICP) algorithm. In Proceedings of the International Society for Optics and Photonics, San Diego, CA, USA, 9–10 August 2017; pp. 135–139. [Google Scholar]
  25. Tazir, M.L.; Gokhool, T.; Checchin, P.; Malaterre, L.; Trassoudaine, L. Cluster ICP: Towards Sparse to Dense Registration. In Proceedings of the International Conference on Intelligent Autonomous Systems, Singapore, 1–3 March 2018; pp. 730–747. [Google Scholar]
  26. Sarode, V.; Li, X.; Goforth, H.; Aoki, Y.; Srivatsan, R.A.; Lucey, S.; Choset, H. Pcrnet: Point cloud registration network using pointnet encoding. arXiv 2019, arXiv:1908.07906. [Google Scholar]
  27. Yew, Z.J.; Lee, G.H. Regtr: End-to-end point cloud correspondences with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6677–6686. [Google Scholar]
  28. Huang, S.; Gojcic, Z.; Usvyatsov, M.; Wieser, A.; Schindler, K. Predator: Registration of 3d point clouds with low overlap. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4267–4276. [Google Scholar]
  29. Gojcic, Z.; Zhou, C.; Wegner, J.D.; Wieser, A. The perfect match: 3D point cloud matching with smoothed densities. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5540–5549. [Google Scholar]
  30. Bai, X.; Luo, Z.; Zhou, L.; Chen, H.; Li, L.; Hu, Z.; Fu, H.; Tai, C.-L. Pointdsc: Robust point cloud registration using deep spatial consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 15859–15869. [Google Scholar]
  31. Yuan, W.; Eckart, B.; Kim, K.; Jampani, V.; Fox, D.; Kautz, J. Deepgmr: Learning latent gaussian mixture models for registration. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part V 16. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 733–750. [Google Scholar]
  32. Choy, C.; Dong, W.; Koltun, V. Deep global registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2514–2523. [Google Scholar]
  33. Rice, J.A. Mathematical Statistics and Data Analysis; Cengage Learning: Belmont, CA, USA, 2006. [Google Scholar]
  34. Mian, A.; Bennamoun, M.; Owens, R. A Novel Representation and Feature Matching Algorithm for Automatic Pairwise Registration of Range Images. Int. J. Comput. Vis. 2006, 66, 19–40. [Google Scholar] [CrossRef]
  35. Curless, B.; Levoy, M. A volumetric method for building complex models from range images. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 4–9 August 1996; pp. 303–312. [Google Scholar]
  36. Hou, T.; Qin, H. Efficient computation of scale-space features for deformable shape correspondences. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Greece, 5–11 September 2010; pp. 384–397. [Google Scholar]
  37. Kaiser, M.; Xu, X.; Kwolek, B.; Sural, S.; Rigoll, G. Towards using covariance matrix pyramids as salient point descriptors in 3D point clouds. Neurocomputing 2013, 120, 101–112. [Google Scholar] [CrossRef]
  38. Aldoma, A.; Marton, Z.; Tombari, F.; Wohlkinger, W.; Potthast, C.; Zeisl, B.; Rusu, R.B.; Gedikli, S.; Vincze, M. Tutorial: Point Cloud Library: Three-Dimensional Object Recognition and 6 DOF Pose Estimation. IEEE Robot. Autom. Mag. 2012, 19, 80–91. [Google Scholar] [CrossRef]
  39. Holz, D.; Ichim, A.; Tombari, F.; Rusu, R.B.; Behnke, S. Registration with the Point Cloud Library: A Modular Framework for Aligning in 3-D. IEEE Robot. Autom. Mag. 2015, 22, 110–124. [Google Scholar] [CrossRef]
  40. Mian, A.; Bennamoun, M.; Owens, R. On the Repeatability and Quality of Keypoints for Local Feature-based 3D Object Retrieval from Cluttered Scenes. Int. J. Comput. Vis. 2010, 89, 348–361. [Google Scholar] [CrossRef]
  41. Yang, J.; Zhang, Q.; Cao, Z. Multi-attribute statistics histograms for accurate and robust pairwise registration of range images. Neurocomputing 2017, 251, 54–67. [Google Scholar] [CrossRef]
Figure 1. The framework of our point cloud registration.
Figure 1. The framework of our point cloud registration.
Applsci 14 09375 g001
Figure 2. Distribution of a boundary and a non-boundary point with their neighboring points.
Figure 2. Distribution of a boundary and a non-boundary point with their neighboring points.
Applsci 14 09375 g002
Figure 3. Geometric relations α, β, and γ between a keypoint p and one of its neighboring points.
Figure 3. Geometric relations α, β, and γ between a keypoint p and one of its neighboring points.
Applsci 14 09375 g003
Figure 4. Samples of point clouds from our dataset.
Figure 4. Samples of point clouds from our dataset.
Applsci 14 09375 g004
Figure 5. Boundary points under various differences between adjacent included angles.
Figure 5. Boundary points under various differences between adjacent included angles.
Applsci 14 09375 g005
Figure 6. Keypoints on different point clouds. (a) Keypoints illustrator 1 with boundary point retained. (b) Keypoints illustrator 1 with boundary point removed. (c) Keypoints illustrator 2 with boundary point retained. (d) Keypoints illustrator 2 with boundary point removed.
Figure 6. Keypoints on different point clouds. (a) Keypoints illustrator 1 with boundary point retained. (b) Keypoints illustrator 1 with boundary point removed. (c) Keypoints illustrator 2 with boundary point retained. (d) Keypoints illustrator 2 with boundary point removed.
Applsci 14 09375 g006
Figure 7. Performance of covariance matrix descriptor formed by different feature vectors under different noise conditions.
Figure 7. Performance of covariance matrix descriptor formed by different feature vectors under different noise conditions.
Applsci 14 09375 g007
Figure 8. Performance comparison between our proposed covariance matrix descriptor and the state-of-art feature descriptors under different noise conditions.
Figure 8. Performance comparison between our proposed covariance matrix descriptor and the state-of-art feature descriptors under different noise conditions.
Applsci 14 09375 g008
Figure 9. The datasets used in the experiments.
Figure 9. The datasets used in the experiments.
Applsci 14 09375 g009
Table 1. Parameters used in our experiments.
Table 1. Parameters used in our experiments.
Parameter NameParameter ValueDescription
r4 mrThe support radius used to detect keypoints
ε ϑ 1.5A threshold for a surface variation index
L t h π / 3 A threshold for the maximum difference between adjacent included angles
d m i n 5 mrA threshold for the nearest distance between a keypoint and a boundary point
{ r 1 , , r s } {3 mr, 5 mr, 7 mr, 9 mr}A multi-support radius used to generate a multi-scale feature descriptor
ε r 0.8A threshold for the distance ratio between a pair of the first nearest descriptor and a pair of the second nearest descriptor
κ7The number of clusters in coarse transformer estimation
ϵ d 1 mrA threshold for determining whether a point is an inlier
Table 2. Average registration errors for point cloud registration based on different transformation estimations.
Table 2. Average registration errors for point cloud registration based on different transformation estimations.
DatasetMethodAverage θ r Average θ t Average θ
Stanford 3D Scanning RepositoryBased on our proposed transformation estimation0.94821.95681.4525
Based on feature matching 9.43997.32468.3823
Based on clustering4.34316.54985.4465
Based on ICP9.87699.67819.7775
Based on feature matching + clustering1.45012.09061.7704
Based on feature matching + ICP7.91396.23057.0722
Based on RANSAC3.77982.19092.9854
SpaceTime datasetBased on our proposed transformation estimation1.15841.76491.4617
Based on feature matching 8.23086.54237.3866
Based on clustering4.56714.73484.6510
Based on ICP8.55217.22617.8891
Based on feature matching + clustering2.19892.10912.154
Based on feature matching + ICP3.22394.01213.618
Based on RANSAC3.32013.22713.2736
Kinect datasetBased on our proposed transformation estimation1.37392.78912.0815
Based on feature matching 10.891110.091910.4915
Based on clustering6.22378.32117.2724
Based on ICP12.300210.080911.1906
Based on feature matching + clustering2.32713.16332.7452
Based on feature matching + ICP4.02123.77513.8982
Based on RANSAC3.94323.58723.7652
Table 3. Average registration errors for point cloud registration based on different feature descriptors.
Table 3. Average registration errors for point cloud registration based on different feature descriptors.
DatasetMethodAverage θ r Average θ t Average θ
Stanford 3D
Scanning Repository
Based on our proposed feature descriptor0.94821.95681.4525
Based on 3DSC2.14932.78552.4674
Based on spin image5.93444.45025.1923
Based on PFH3.55374.65894.1063
Based on RoPS3.21452.17652.6955
Based on FPFH4.23054.73274.4816
Based on BRoPH0.78911.39071.0899
Based on KDD0.58531.77861.1820
SpaceTime datasetBased on our proposed feature descriptor1.15841.76491.4617
Based on 3DSC0.77231.34021.0563
Based on spin image2.64782.97882.8133
Based on PFH2.09262.85222.4724
Based on RoPS3.03442.20792.6212
Based on FPFH2.56983.29032.9301
Based on BRoPH0.98331.27491.1291
Based on KDD0.84781.51941.1836
Kinect datasetBased on our proposed feature descriptor1.37392.78912.0815
Based on 3DSC3.24554.32643.7860
Based on spin image6.34214.89115.6166
Based on PFH5.32444.98175.1531
Based on RoPS5.32014.59274.9564
Based on FPFH5.01345.09235.0529
Based on BRoPH1.45053.01172.2311
Based on KDD2.33154.82993.5807
Table 4. Average registration time for point cloud registration based on different transformation estimations.
Table 4. Average registration time for point cloud registration based on different transformation estimations.
MethodRegistration Time (ms)
Based on our proposed transformation estimation4085
Based on feature matching 1422
Based on clustering3448
Based on ICP3252
Based on feature matching + clustering6066
Based on feature matching + ICP5247
Based on RANSAC5395
Table 5. Average registration time for point cloud registration based on different feature descriptors.
Table 5. Average registration time for point cloud registration based on different feature descriptors.
MethodDimensionRegistration Time (ms)
Based on our proposed feature descriptor494085
Based on 3DSC198014,530
Based on spin image4007004
Based on PFH1253153
Based on RoPS1358606
Based on FPFH332257
Based on BRoPH2569905
Based on KDD1285714
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xiong, F.; Kong, Y.; Kuang, X.; Hu, M.; Zhang, Z.; Shen, C.; Han, X. A Multi-Scale Covariance Matrix Descriptor and an Accurate Transformation Estimation for Robust Point Cloud Registration. Appl. Sci. 2024, 14, 9375. https://doi.org/10.3390/app14209375

AMA Style

Xiong F, Kong Y, Kuang X, Hu M, Zhang Z, Shen C, Han X. A Multi-Scale Covariance Matrix Descriptor and an Accurate Transformation Estimation for Robust Point Cloud Registration. Applied Sciences. 2024; 14(20):9375. https://doi.org/10.3390/app14209375

Chicago/Turabian Style

Xiong, Fengguang, Yu Kong, Xinhe Kuang, Mingyue Hu, Zhiqiang Zhang, Chaofan Shen, and Xie Han. 2024. "A Multi-Scale Covariance Matrix Descriptor and an Accurate Transformation Estimation for Robust Point Cloud Registration" Applied Sciences 14, no. 20: 9375. https://doi.org/10.3390/app14209375

APA Style

Xiong, F., Kong, Y., Kuang, X., Hu, M., Zhang, Z., Shen, C., & Han, X. (2024). A Multi-Scale Covariance Matrix Descriptor and an Accurate Transformation Estimation for Robust Point Cloud Registration. Applied Sciences, 14(20), 9375. https://doi.org/10.3390/app14209375

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop