Article

A Generalized Voronoi Diagram-Based Segment-Point Cyclic Line Segment Matching Method for Stereo Satellite Images

1
School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China
2
School of Geography Geomatics and Planning, Jiangsu Normal University, Xuzhou 221116, China
3
College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(23), 4395; https://doi.org/10.3390/rs16234395
Submission received: 11 October 2024 / Revised: 21 November 2024 / Accepted: 22 November 2024 / Published: 24 November 2024
(This article belongs to the Section Remote Sensing Image Processing)
Figure 1. Feature point matching and LSM. Matching points are derived from the affine scale-invariant feature transform (ASIFT) algorithm, and matched line segments are sourced from the proposed LSM method [7]. The matched feature points are connected by lines; the points and line segments in the left image are marked in green, while those in the right image are marked in yellow.
Figure 2. Outline of the proposed LSM method.
Figure 3. An illustration of PHOW feature extraction.
Figure 4. Line segment retrieval based on polar constraints. The red line segments indicate the target line segments in the left view. The green line segments represent the candidate line segments, and the yellow line segments represent the other line segments in the right view. The purple dashed lines indicate the polar lines.
Figure 5. Schematic illustration of line segment matching based on the soft voting classifier. In the line graph of the right subfigure, the red curve indicates that CAR takes values in [0, 1] when the line segment crossing angle θ ∈ [0, π/6].
Figure 6. Voronoi diagrams. (a) Left view, (b) right view. First row: ASIFT match points; feature points are represented by red dots, and corresponding locations are marked with the same numerical labels. Second row: Voronoi diagram plotted on the satellite maps, where red dots indicate anchor points from ASIFT and blue edges denote the Voronoi diagram edges. Third row: Delaunay triangulation with the Voronoi diagram, where red dots represent anchor points, red lines indicate Voronoi edges, and blue lines denote Delaunay edges. The Voronoi diagram differences between the left and right images are highlighted with green boxes.
Figure 7. Local point-line homography geometry. C and C′ denote the camera positions; p and p′ denote a pair of ASIFT matching points. The red line segment is the line segment in the left view; the green and yellow line segments denote its candidate corresponding line segments in the right view. The blue solid lines denote Voronoi edges, and the blue dots denote Voronoi seed points.
Figure 8. The 14 pairs of test images from the benchmark dataset. Numbers 1–14 represent different pairs of test images.
Figure 9. The 18 pairs of test areas from the WV-2 dataset. Numbers 1–18 represent different pairs of test images.
Figure 10. The 15 pairs of test areas from the WV-3 dataset. Numbers 1–15 represent different pairs of test images.
Figure 11. Precision, Recall, F1-score, and total number of matched line segments for the proposed method with CAR weights of [0.1, 0.2, …, 0.9].
Figure 12. LSM results of the proposed method under different CAR classifier weights. (a–d) Results for ω₂ values of 0.1, 0.3, 0.5, and 0.7, respectively (only the left images are shown). Correctly matched line segments are highlighted in green; incorrect line segments are highlighted in red.
Figure 13. Statistics of the LILH, SLEM, and proposed methods on the 14 test images of the benchmark dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 14. Actual matching results of the proposed method on the benchmark dataset. (a–f) The left images of representative matching results. Mismatches are marked in red; correct matches are marked in green. Precision and Recall values are provided for each image.
Figure 15. Statistics of the LILH, SLEM, and proposed methods on the 18 test areas of the WV-2 dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 16. Actual matching results of the proposed method on the WV-2 dataset. (a–h) The left images of representative matching results. Mismatches are labeled in red; correct matches are labeled in green. Precision values are provided for each image.
Figure 17. Intermediate results of the proposed method. The left figure shows the discrete matching of line segment points, with red dots indicating matched line segment points in the left view, green dots indicating matched line segment points in the right view, and yellow boxes indicating close-ups. The right figure shows the matching result for the corresponding line segment in the left figure, labeled in red. The blue line segments indicate the LSD line segments to be matched.
Figure 18. Statistics of the LILH, SLEM, and proposed methods on the 15 test areas of the WV-3 dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 19. Actual matching results of the proposed method on the WV-3 dataset. (a–h) The left images of representative matching results. Mismatches are labeled in red; correct matches are labeled in green. Precision values are provided for each image.
Figure 20. Actual runtime of the LILH, SLEM, and proposed methods on the WV-2 and WV-3 datasets. (a) Runtime on the WV-2 dataset, (b) runtime on the WV-3 dataset.

Abstract

Matched line segments are crucial geometric elements for reconstructing the desired 3D structure in stereo satellite imagery, owing to their advantages in spatial representation, complex shape description, and geometric computation. However, existing line segment matching (LSM) methods face significant challenges in effectively addressing co-linear interference and the misdirection of parallel line segments. To address these issues, this study proposes a “continuous–discrete–continuous” cyclic LSM method, based on the Voronoi diagram, for stereo satellite images. Initially, to compute the discrete line-point matching rate, line segments detected by the line segment detector (LSD) are discretized using the Bresenham algorithm, and the pyramid histogram of visual words (PHOW) feature is assigned to the resulting line segment points. Next, to obtain continuous matched line segments, the method combines the line segment crossing angle rate with the line-point matching rate, utilizing a soft voting classifier. Finally, local point-line homography models are constructed based on the Voronoi diagram, filtering out misdirected parallel line segments and yielding the final matched line segments. Extensive experiments on a challenging benchmark dataset and on the WorldView-2 and WorldView-3 satellite image datasets demonstrate that the proposed method outperforms several state-of-the-art LSM methods. Specifically, the proposed method achieves F1-scores that are 6.22%, 12.60%, and 18.35% higher than those of the best-performing existing LSM method on the three datasets, respectively.

1. Introduction

Three-dimensional reconstruction based on stereo satellite images is an active research field, and researchers have achieved remarkable 3D reconstruction results by extracting feature points from the images [1,2]. However, compared with feature points, line segments can express richer geometric information, support more complex computation and analysis, and effectively represent multi-dimensional data such as shapes, motions, and paths. Therefore, line segments have important applications in rendering, collision detection, and spatial modeling. In the field of photogrammetry, the length, orientation, and constructive capabilities of line segments enable them to provide fundamental geometric information for important photogrammetric tasks such as geometric surveying and structural assessment. Further, for urban satellite images, buildings in urban scenes usually contain rich geometric structures, and line features of buildings carry more semantic and structural information than point features [3], as shown in Figure 1. In this way, line segment matching (LSM) can serve as a fundamental step in recovering 3D structure from binocular or multi-view images, which is crucial for all aspects of 3D reconstruction, including structural recovery, planar reconstruction, and vectorization. Recently, researchers have begun to explore the use of line segment features to better realize the cross-frame matching task [4,5,6].
Furthermore, a wide range of literature has been published on LSM, which can be categorized into three groups based on feature computation: matching by local feature, matching by combining geometric features, and matching based on deep learning.

1.1. Matching by Local Feature

The basic idea of LSM with local features is to build line segment descriptors from the texture and structure information of the line segment's neighborhood, then calculate the descriptor distances between a line segment in the left image and all (or local) line segments in the right image, taking the closest distance as the best match. The set of line segment correspondences obtained by computing descriptor distances alone is generally low in accuracy and often requires additional auxiliary information to reduce the rate of mismatches. To this end, Zhang and Koch [8] proposed a line band descriptor combining local texture and geometric constraints that can partially overcome affine transformation errors. Lopez et al. [9] constructed descriptors using multiscale information and also introduced geometric constraints into LSM. Li et al. [10] introduced a line–junction–line (LJL) structure based on multiscale descriptors consisting of intersecting lines and their intersections, where orientation histograms are grouped as descriptors. Li and Yao [11] subsequently improved LJL to make it affine- and scale-invariant by recovering the affine transformations applied to the intersecting lines and their intersections, thus eliminating the need for multiscale information. To further capture the structural information around lines, López et al. [9] established several types of geometric alignments of neighboring line segments based on local appearance and geometric properties; the similarity metric is then optimized iteratively so that the set of matched lines grows robustly. To improve the central symmetry of line segments, Zhang et al. [12] used the midpoint, angle, and length as the minimal representation of line segments, and established a non-centrality suppression mechanism to filter fragmented line segments caused by line segment crossing. In the same year, Wang et al. [13] proposed a two-stage LSM method, which first performs coarse matching to compute the projection transformation model and then recomputes the descriptors for fine matching, adapting to the salient structures in multi-temporal remote sensing images.
In general, a certain number of corresponding line segments can be obtained by local feature similarity matching, but the false matching rate remains high due to the low descriptive ability of local descriptors for line segments. For this reason, researchers have been trying to perform LSM by combining geometric features.

1.2. Matching by Combining Geometric Features

Combined geometric features are often represented as a set of line segments (or a set of line segments and points) that construct descriptors rather than being computed individually. These kinds of methods usually establish coarse matching relationships by means of detecting the correspondence of a set of candidate line segment combinations (or a set of candidate line segment and point combinations). The line segments are then directly matched with combining strict monoclinicity constraints, polyline geometry, and other topological geometric relationships. However, in most cases, these geometric constraints cannot directly obtain line segment correspondences, instead being used only in pre-processing or post-processing procedures, such as constructing hypothetical line pair correspondences for further descriptor-based matching [14].
Random Sample Consensus (RANSAC) and its variants are common methods for eliminating mismatches using combinatorial geometric features. Wang et al. [15] suggested using intersections of line segments for RANSAC, which usually leads to an increase in the number of outliers. A better solution is to represent the line segments as linear equations, thus avoiding inaccuracies at midpoints or endpoints. Yammine et al. [16] proposed a variant of RANSAC called PROSAC to speed up the estimation process by exploiting descriptor similarity scores. One should note that line segments cannot be used directly in RANSAC due to their geometric complexity. Subsequently, Wang et al. [13] used endpoints or midpoints of line segments as inputs for RANSAC. However, endpoints or midpoints do not provide a stable solution, as line segments may be broken. Combined feature matching methods mainly take advantage of easy-to-model image motions and deformations, but it is difficult to exploit the homography of large image regions induced by the two views [17]. The reason may be that local geometric constraints are difficult to utilize effectively if a weakly textured region does not have enough feature points. Accordingly, a coplanar projection invariant was developed by Jia et al. [3,18] that utilizes coplanar line segment intersections to fuse the local geometric features of line segments and neighboring points, so that the problems of broken line segments and low-texture regions can be overcome. Chen et al. [19] classified line segments into structured line pairs, unstructured line pairs, and individual line pairs with hierarchical geometric constraints to overcome viewpoint variations. Zheng et al. [5] proposed a robust LSM method based on global projection transform modeling, combining a smooth projection transform model and high-quality point matching for line matching based on point correspondences, which provides a strict projection at the intersections of the line pairs.
Filtering mismatches caused by line segment breakage has also attracted researchers' interest. Chen et al. [19] designed a line segment merging operation to convert fragmented line segments into more easily matched ones; merging line segments for matching effectively exploits the geometric relationships between them. To fully utilize redundant information and exploit the covariance of fragmented line segments, Wang et al. [14] proposed a bilinear-matrix-based method to record the matching results of line segment pairs, which combines covariance constraints and redundant line segments to check the matching results. Later, Wang et al. [20] proposed another bilinear descriptor with covariance constraints to check one-to-many and many-to-one correspondences, building on the previous work. Shen et al. [21] transformed the LSM problem into an intuitive point-to-point alignment problem by means of a Gaussian mixture model and used a guided endpoint drift method to compensate for line segment breaks, with satisfactory results.
The combination of line orientation and homography transformation is also an effective means to improve the precision of LSM. Wei et al. [4] extracted line segments from both views, recovered their orientations, and initialized the candidate line segments with the homography constraint; a weighted random wandering method was then employed to find reliable matches. Later, Wei et al. [4] used homography, polar line, and trifocal tensor constraints to constrain line and point candidates between views; the different geometric constraints can then be unified into a two-view or multi-view line-point correlation graph. In contrast, to simplify the joint features and improve the matching efficiency of line segments, Guo et al. [6] proposed a one-point-one-line LSM method which constructs a scanning trajectory in the right image based on polar geometry and depth constraints and determines the corresponding line segments by matching scores.
Overall, all of the above combinatorial geometric feature matching methods combine multiple features into higher-order geometric invariants to improve the describability of line segments. Although this strategy works in many cases, it is complex and computationally expensive, relying on accurate and uniformly distributed additional matching points (or line segments); it is also unreliable when the matching points (or line segments) are sparse or severely deviated, because local geometric constraints are not applicable if there are not enough feature points (or line segments) around the line segments. To overcome this problem, we build topologically consistent Voronoi diagrams between the left and right views, using the key matching points as seed points. The benefits are as follows: (1) The topological invariance of point and line combinations can be maintained between the left and right views based on the Voronoi diagram, and it is robust to changes in viewing angle and illumination; (2) The line segment attribution region based on the matched Voronoi diagram model can be built and is robust to line segment lengths and breaks, with local geometric constraints remaining in place.
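As a minimal illustration of the attribution-region idea (pure Python; the sampling-based cell assignment and function names are our own illustrative choices, not the paper's implementation): a point belongs to the Voronoi cell of its nearest seed, so assigning a line segment to a cell reduces to nearest-anchor queries along the segment, which is naturally robust to segment length and breakage.

```python
import math

def voronoi_cell(point, anchors):
    """Index of the anchor whose Voronoi cell contains `point`.

    By definition, a point lies in the Voronoi cell of its nearest seed,
    so cell membership is a nearest-anchor query."""
    return min(range(len(anchors)),
               key=lambda i: math.dist(point, anchors[i]))

def segment_region(seg, anchors, samples=10):
    """Assign a line segment to the anchor cell covering most of it,
    by sampling points along the segment (robust to breaks and length)."""
    (x0, y0), (x1, y1) = seg
    votes = {}
    for s in range(samples):
        t = s / (samples - 1)
        c = voronoi_cell((x0 + t * (x1 - x0), y0 + t * (y1 - y0)), anchors)
        votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get)
```

A broken half of a segment still samples into the same cell as the full segment, which is why the attribution region tolerates fragmented detections.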

1.3. Matching Based on Deep Learning

Recently, deep learning approaches have been applied to LSM and performed well on some challenging images [22,23]. These methods also primarily belong to the feature class of methods, except that they extract line segment features through neural networks. Ma et al. [24] proposed an end-to-end approach which is able to predict the allocation matrices of two feature sets through solving the relaxed optimal transmission problem with convolutional neural networks and graph convolutional networks. To get rid of the geometric constraints of line segments, Li et al. [25] mapped line feature correspondences into tangential vectors to the sphere and then encoded the angular and positional variations of image lines using these vectors. Further, it is possible to learn the spatial regularity of the vectors tangent to the sphere and acquire point-line correspondences for structured and unstructured scenes. In order to utilize both point and line features in a single graph neural network, Pautrat et al. [26] proposed a graph neural network called GlueStick which utilizes the connectivity information between the nodes to better glue them together so as to jointly match key points and line segments.
However, existing deep learning-based LSM methods rely heavily on training samples, so the feature learning process of complex models may encode scene-specific information due to domain differences. In this way, the sensitivity to environmental changes, as well as the limited generalization ability of the neural network itself, restricts their wide application to large satellite remote sensing images. To this end, to select corresponding line segments for satellite images with different shooting conditions and different coverage scenarios, we adopt the soft voting classifier from the field of ensemble learning, in an unsupervised manner, to integrate the matching rate of discrete line segment points and the line segment angle rate.
Nevertheless, LSM is still considered an open research problem, primarily because of two inherent difficulties: parallel misdirection and the uncertainty of line segment endpoints.
(1) Parallel misdirection: The distinctiveness of line segment descriptors in individual-segment matching methods is greatly reduced under large scale and perspective changes, leading to a large number of indistinguishable ambiguous matches. Moreover, the “parallel misdirected line segments” arising from similar and repetitive textures in urban architectural scenes are close to the true polylines in spatial location, texture features, angle features, and relative position, making such ambiguous matches difficult to resolve in the absence of geometric constraints. Jia et al. [18] and Wei et al. [4] imposed geometric constraints on line matching by constructing potential homography matrices from line segments and pairs of line intersections. Inspired by this, we similarly use point-line geometric features to constrain LSM. Differently, the matching points are used as anchors to construct topologically consistent Voronoi diagrams between the left and right views [27]. In this way, the geometric features are built by means of Voronoi diagrams, and the filtering of “parallel misdirected line segments” can be subject to both geometric and topological constraints.
(2) Uncertainty of line segment endpoints: The uncertainty of segment endpoints often leads to broken segments and length inconsistencies, usually manifested as “co-linear interferential line segments”, i.e., line segments that are co-linear with the actual corresponding line segments. The endpoints of co-linear interferential line segments therefore do not reliably correspond to each other, and short line segments in one image may correspond to long line segments in the other. In satellite remote sensing images, atmospheric radiation, changes in illumination, and shifts in viewing angle aggravate this problem. To address it, Shen et al. [21] modeled line segments with point-to-point representations using a Gaussian-uniform mixture model and compensated for breakage by allowing endpoints to move along the line with a directed endpoint drift strategy. Instead, we sidestep the uncertainty of segment endpoints: broken and variable-length segments are essentially composed of discrete segment points, so we utilize the discrete line points to calculate the line point matching rate (LPMR) and combine it with the crossing angle rate (CAR) to match candidate line segments through soft voting.
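To make the combination concrete, the sketch below shows one plausible form of the soft vote (pure Python). The linear CAR mapping, which falls from 1 to 0 as the crossing angle grows from 0 to π/6 (the value range shown in Figure 5), and the weights w1 and w2 are illustrative assumptions, not the paper's exact definitions.

```python
import math

def crossing_angle_rate(theta):
    """Hypothetical CAR: map a crossing angle theta in [0, pi/6] onto
    [0, 1], with 1 meaning the candidate segments are parallel."""
    return max(0.0, 1.0 - theta / (math.pi / 6))

def soft_vote(lpmr, theta, w1=0.7, w2=0.3):
    """Weighted soft-voting score combining the line-point matching
    rate (LPMR) with the crossing angle rate (CAR); weights are
    illustrative."""
    return w1 * lpmr + w2 * crossing_angle_rate(theta)
```

A candidate in the right view would then be accepted when its score is the maximum over all candidates (and, typically, above some threshold).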
Based on the above observations and comparisons, the proposed method consists of stereo rectification of satellite images, discrete point matching of line segments, LSM based on a soft voting classifier, and calculation of localized point-line homography models based on a Voronoi diagram [28]. (1) Stereo rectification of satellite images: Affine-compensated rational polynomial coefficient (RPC) stereo rectification is applied to the satellite images. (2) Discrete point matching of line segments: This step transforms continuous line segments into discrete points based on line segment detector (LSD) line segments, then acquires the matching rate of the line segment points based on the pyramid histogram of visual words (PHOW) feature. (3) LSM based on soft voting classifier: This step achieves the inverse conversion from discrete points back to continuous line segments; the pre-matching results are obtained by combining LPMR and CAR under the soft voting classifier. (4) Calculation of localized point-line homography models based on the Voronoi diagram: The ASIFT matching points are used as anchor points to build topologically consistent Voronoi diagrams between the left and right views, and homography models are then fitted based on local point and line structures. To obtain the final LSM results, similar line segments are filtered from the pre-matching results by minimizing the local homography model error [7]. In addition, to enable the proposed LSM to be used for stereo satellite images, stereo rectification is employed as preprocessing for line segment matching of stereo satellite images [29]. An overview of the proposed LSM method is shown in Figure 2.
In conclusion, the main contributions of this study are as follows:
(1) A new LSM method applicable to stereo satellite images is proposed, which is able to overcome the interference caused by illumination changes, viewing angle changes, texture differences, and partial occlusions, and obtains accurately matched line segments in sufficient numbers in high-resolution satellite images. Moreover, the proposed method can serve the 3D structure reconstruction of satellite images.
(2) A soft voting-based LSM method with a segment-point cycle is proposed, which utilizes a continuous–discrete–continuous matching mechanism and effectively overcomes the “co-linear interference” caused by broken segments and length inconsistencies.
(3) We propose a Voronoi diagram-based localized point-line homography model capable of addressing the issue of “parallel misdirection”. This model leverages the topological consistency of Voronoi diagrams along with the geometric consistency of point-line combinations to obtain reliably matched line segments on large-scale architectural and road surfaces, even in cases with similar or repetitive textures.

2. Method

The proposed method consists of stereo rectification of satellite images, discrete point matching of line segments, LSM based on soft voting classifier, and computation of a localized point-line homography model. Here is a detailed description of the proposed method.

2.1. Stereo Rectification of Satellite Image

For satellite images, the epipolar geometry between an image pair is established by RPC [29]. RPC is useful for finding conjugate points between images in image space and constructing stereo-rectified images. However, satellite image RPC requires supplementary correction for epipolar deviation and drift. To this end, we use the ASIFT algorithm to obtain matches on the initially epipolar-corrected images and perform outlier filtering and affine matrix fitting with the RANSAC algorithm. Finally, the affine matrix is applied to the initially epipolar-corrected images to achieve the additional correction.
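As a rough sketch of the outlier-filtering and affine-fitting step, the snippet below implements a minimal RANSAC affine estimator in NumPy. The function names, iteration count, and inlier threshold are illustrative assumptions rather than the paper's exact configuration; in practice a library routine (e.g., OpenCV's affine estimation) would typically be used.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine A mapping src -> dst: dst ~ A @ [x, y, 1]^T."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A.T

def ransac_affine(src, dst, iters=300, thresh=1.0, seed=0):
    """RANSAC: repeatedly fit an affine from 3 random correspondences,
    keep the model with the most inliers, then refit on all inliers."""
    rng = np.random.default_rng(seed)
    ones = np.ones((len(src), 1))
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        pred = np.hstack([src, ones]) @ A.T
        inliers = np.linalg.norm(pred - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit_affine(src[best], dst[best]), best
```

The final refit over all inliers is what yields the affine matrix that is then applied to the initially epipolar-corrected image.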

2.2. Discrete Point Matching of Line Segments

The line segments detected in the left and right images by the LSD method proposed by Gioi et al. [30] are taken as inputs to our method. Discrete point matching of line segments is a “continuous→discrete” process. It starts by acquiring the pixel coordinates covered by each line segment via the Bresenham algorithm, taking the segment's endpoints as the start and end points. Then, the PHOW feature is used to extract point descriptors along the line segments.

2.2.1. Line Segment Discretization

Bresenham's algorithm is a classic computer graphics method for rasterizing lines; it works by incrementing or decrementing pixel coordinates along the line segment to be drawn, producing an accurate dot-matrix plot [31].
The core idea of the Bresenham method is to determine the next pixel point that should be selected by iterating step by step along the main direction of the line (X-axis or Y-axis). The assumption is that the line starts at x 0 , y 0 , ends at x 1 , y 1 , and has slope m between 0 and 1 (i.e., Δ y Δ x , where Δ x = x 1 x 0 , Δ y = y 1 y 0 ).
Then, the currently localized pixel point is x k , y k , and the X-coordinate of the next pixel point is x k + 1 = x k + 1 . It needs to be considered whether the Y-coordinate of the next pixel point is y k or y k + 1 . Thus, two candidate coordinates c l o w e r and c u p p e r are formed with c l o w e r = x k + 1 , y k and c u p p e r = x k + 1 , y k + 1 .
Next, when the X-coordinate is x k + 1 , the Y-coordinate is calculated as follows:
y = m x k + 1 + b
where b is the Y-axis intercept. The points to be selected are determined by calculating the distance d · between the two candidate points c l o w e r and c u p p e r and the ideal straight line.
(1) The equation of d c l o w e r is
d c l o w e r = y y k = m x k + 1 + b y k
(2) The equation of d c u p p e r is
d c u p p e r = y k + 1 y = y k + 1 m x k + 1 b
To determine which pixel is closer to the ideal straight line, the difference is calculated as follows:
d c l o w e r d c u p p e r = 2 m x k + 1 2 y k + 2 b 1
We organize to obtain
d c l o w e r d c u p p e r = 2 Δ y Δ x x k + 1 2 y k + 2 b 1
We organize to obtain
Δ x d c l o w e r d c u p p e r = 2 Δ y · x k 2 Δ x · y k + c
where c is a constant.
Based on these equations, a decision variable D k is defined to represent the deviation of the current point from the ideal line. Specifically,
D k = 2 Δ y × x k 2 Δ x × y k + c
where c does not affect the final judgment and the key is to compare the sign of D k . By constantly updating the D k value, we can determine whether the Y-coordinate of the next pixel should be y k or y k + 1 , i.e., if D k < 0 , then x k + 1 , y k is closer to the line and the point is chosen; If D k 0 , then x k + 1 , y k + 1 is closer to the line and the point is chosen. In this way, the decision process is calculated as follows:
Initializing decision variable D 0 ,
D 0 = 2 Δ y Δ x
If D k < 0 , it means that the ideal line is below or coincides with the current point, the Y-coordinate should be kept unchanged, and y k + 1 = y k , while updating the following decision variables:
D k + 1 = D k + 2 Δ y
If $D_k \ge 0$, the ideal line passes closer to the upper candidate pixel, so the upper pixel is selected as the next pixel point, $y_{k+1} = y_k + 1$, and the decision variable is updated at the same time:
$D_{k+1} = D_k + 2(\Delta y - \Delta x)$
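The decision process above can be sketched in code; the following is a minimal illustration of the Bresenham-style rasterization, restricted for brevity to the first octant ($0 \le m \le 1$, $x_0 < x_1$), and is not the authors' implementation:

```python
def bresenham_points(x0, y0, x1, y1):
    """Discretize a line segment into pixel points with the incremental
    decision variable D_k (first octant only: 0 <= slope <= 1, x0 < x1)."""
    dx, dy = x1 - x0, y1 - y0
    d = 2 * dy - dx              # D_0 = 2*dy - dx
    y = y0
    points = []
    for x in range(x0, x1 + 1):
        points.append((x, y))
        if d < 0:                # line closer to the lower pixel: keep y
            d += 2 * dy
        else:                    # line closer to the upper pixel: step y
            y += 1
            d += 2 * (dy - dx)
    return points
```

A full discretization handles the remaining octants by symmetry, swapping the roles of the axes and the sign of the step.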

2.2.2. PHOW Feature Extraction

The PHOW (Pyramid Histogram Of visual Words) feature extraction method is commonly used for image classification and object recognition, among other tasks. It combines image pyramid construction, Scale-Invariant Feature Transform (SIFT) feature extraction, visual vocabulary construction, and multi-scale fusion. The specific process is as follows:
Image pyramid construction: The original image is scaled to multiple scales to constitute an image pyramid. Typically, the image is progressively downsampled at a certain scale (e.g., 0.5×) to produce a number of images with different resolutions.
SIFT feature extraction: At each scale of the image, SIFT feature extraction is performed at the locations of the segment points. Each segment point is represented by a 128-dimensional SIFT descriptor that captures the gradient and texture information of its local region, so that a set of target SIFT features is obtained at each scale of the image for subsequent visual vocabulary construction.
Visual vocabulary construction: Visual vocabulary construction is a key step in the PHOW extraction, where a large number of local features are quantized into a finite number of visual vocabularies by means of a clustering method (usually K-means). The specific process is as follows:
(1) SIFT features are extracted for line segment point locations from images across all scales. These feature vectors constitute a large collection of features.
(2) K-means clustering is performed on the collected feature set, partitioning all features into K clusters. Each cluster center represents a "visual vocabulary", i.e., a visual word, and the clustering distance is calculated as follows:
$d(x_i, \mu_j) = \lVert x_i - \mu_j \rVert_2$
where $x_i$ is a SIFT feature vector and $\mu_j$ is the $j$-th clustering center. The clustering center update rule is as follows:
$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$
where $C_j$ is the set of all feature points belonging to the $j$-th cluster and $|C_j|$ is the number of feature points in the cluster.
After the clustering centers are obtained, each SIFT feature vector is mapped to the nearest clustering center to quantize it into the corresponding visual word. For each feature $x_i$, we determine its corresponding visual vocabulary index $ind_s$ at the $s$-th scale:
$ind_s(x_i) = \arg\min_j \lVert x_i - \mu_j \rVert_2$
(3) Once all features are quantized as visual vocabulary indexes, a bag-of-words histogram is constructed for the image at each scale. This histogram records the frequency of occurrence of each visual word in the image and is represented as a K-dimensional vector, where the number of features equals the number of line segment pixel points in the image:
$v_{s,j} = \frac{1}{N_s} \sum_{i=1}^{N_s} \delta\left( ind_s(x_i), j \right)$
where $v_{s,j}$ denotes the frequency of the $j$-th visual word at the $s$-th scale, $N_s$ is the number of features at that scale, and $\delta(\cdot)$ is the Kronecker delta function, which is 1 when $ind_s(x_i) = j$ and 0 otherwise.
Multi-scale fusion: To capture the multi-scale properties of images, the bag-of-words histograms at all scales are fused into a final feature representation. Maximum pooling is used to take, for each visual word, the maximum frequency over all scales, $V_j = \max_{s=1}^{S} v_{s,j}$. The fused feature vector is then normalized to obtain the PHOW feature $\tilde{V} = V / \lVert V \rVert_2$, which reduces the effect of differences in brightness and contrast between images on the feature representation. To resist the influence of satellite image illumination and architectural distortion on feature extraction, we sum over windows of multiple sizes, namely $8 \times 8$, $12 \times 12$, and $16 \times 16$, to compute the PHOW feature at the final location:
$\tilde{V}(x, y) = \tilde{V}_{8 \times 8}(x, y) + \tilde{V}_{12 \times 12}(x, y) + \tilde{V}_{16 \times 16}(x, y)$
The process is illustrated in Figure 3.
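As an illustration of the quantization and multi-scale fusion steps ($ind_s$, $v_{s,j}$, and $V_j$ above), a minimal NumPy sketch, assuming the cluster centers $\mu_j$ have already been obtained by K-means, might look as follows (function names are illustrative, not from the paper):

```python
import numpy as np

def bow_histogram(features, centers):
    """Quantize each feature to its nearest visual word (ind_s) and
    build the normalized bag-of-words histogram v_s."""
    # pairwise distances ||x_i - mu_j||_2, shape (N, K)
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    idx = dists.argmin(axis=1)                    # ind_s(x_i)
    hist = np.bincount(idx, minlength=len(centers)).astype(float)
    return hist / max(len(features), 1)           # frequencies v_{s,j}

def phow_vector(features_per_scale, centers):
    """Max-pool the per-scale histograms (V_j = max_s v_{s,j})
    and L2-normalize the result."""
    hists = np.stack([bow_histogram(f, centers) for f in features_per_scale])
    v = hists.max(axis=0)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

In practice the features would be 128-dimensional SIFT descriptors at the line segment points; the sketch works for any feature dimension.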

2.2.3. Matching of Discrete Line Segment Points

After obtaining the PHOW features of the line segment points, it is possible to compute the corresponding discrete points of line segments between the left and right images with Euclidean distance. Additionally, under the constraint of the polar lines, multiple points matching the target line segment can be obtained (as shown in Figure 4).
Using the left line segment as the reference, the PHOW-based Euclidean distance $d_i^{phow}$ between the left line segment PHOW feature $f_L$ and the $i$-th candidate PHOW feature $f_R^i$ of the right line segment along the polar line, within the disparity range $[disp_{min}, disp_{max}]$, is calculated cyclically:
$d_i^{phow} = \lVert f_L - f_R^i \rVert_2 = \sqrt{\sum_{d=1}^{D} \left( f_L^d - f_R^{i,d} \right)^2}$
where $D$ ($D = 128$) denotes the dimension of the PHOW feature vector and $i$ is the index of the candidate point of the right line segment. The minimum disparity value $disp_{min}$ and the maximum disparity value $disp_{max}$ are obtained from the distances between the ASIFT matching points in the polar direction.
After calculating the PHOW-based Euclidean distances of all candidate points of the right line segment, we find the minimum distance $d_{min}^{phow}$ and the second minimum distance $d_{sec}^{phow}$:
$d_{min}^{phow} = \min_i d_i^{phow}, \quad d_{sec}^{phow} = \min_{j \neq \arg\min_i d_i^{phow}} d_j^{phow}$
The selection of correctly matched points is subject to the simultaneous fulfillment of two conditions: a. the feature distance between the left line segment point and the candidate point is the minimum; b. the ratio of the minimum feature distance to the second smallest feature distance does not exceed $\tau_d$ ($\tau_d = 0.8$), as follows:
$\text{s.t.} \quad d_{min}^{phow} = \min_i d_i^{phow}, \quad d_{min}^{phow} / d_{sec}^{phow} \le \tau_d$
We repeat the above process until all points of the left line segment are traversed.
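A minimal sketch of the ratio-test matching of a single line segment point, assuming the candidate PHOW features along the polar line have already been collected, might look as follows (names are illustrative, not from the paper):

```python
import numpy as np

def match_point(f_left, cand_feats, tau_d=0.8):
    """Match one left line-segment point against candidate right points:
    Euclidean PHOW distance plus a Lowe-style ratio test
    (conditions a and b above). Returns the accepted index or None."""
    d = np.linalg.norm(cand_feats - f_left, axis=1)   # d_i^phow
    order = np.argsort(d)
    if len(d) < 2:
        return order[0] if len(d) == 1 else None
    d_min, d_sec = d[order[0]], d[order[1]]
    if d_sec > 0 and d_min / d_sec <= tau_d:
        return order[0]                               # unambiguous match
    return None                                       # ambiguous: reject
```

The function would be called once per discretized point of the left line segment, with `cand_feats` restricted to candidates inside the disparity range.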

2.3. LSM Based on Soft Voting Classifier

After obtaining the corresponding discrete line segment points, these points usually come from multiple line segments, so the problem to be solved is how to select the true corresponding line segment from the candidate line segments on the right. Using a soft voting strategy, we simultaneously consider the line segment point matching rate and the angle between line segments, and therefore construct an LPMR classifier and a CAR classifier.
In a soft voting classifier, each classifier $C_i$ outputs, for the candidate right line segments, the probability that each right line segment belongs to the correct match:
$P_i = \left( p_{i,1}, p_{i,2}, \ldots, p_{i,K} \right)$
where $p_{i,j}$ denotes the probability that classifier $C_i$ predicts that the right line segment $j$ is a correct match, and $K$ denotes the number of candidate right line segments. In the proposed method, $i$ takes the values 1 and 2, which denote the LPMR and CAR classifiers, respectively.

2.3.1. LPMR Classifier

Line segments are composed of multiple line points, so the search for corresponding line segments in the proposed method is also realized by voting over matched line points. Line point matching is mapped to line segment matching by the soft voting classifier. Therefore, $C_1$ is specified as the LPMR. For the right line segment $j$, the following formula is used:
$p_{1,j} = \frac{num_{matched}}{num_{all}}$
where $num_{matched}$ denotes the number of matched points and $num_{all}$ denotes the number of line segment points.

2.3.2. CAR Classifier

The CAR classifier is able to incorporate the cross angle between line segments into the soft voting mechanism for line matching, provided that the disturbance of image rotation is overcome by polar alignment correction.
Moreover, we quantize the cross angle by a trigonometric function to keep its degree of effect on the same scale as the LPMR classifier. In addition, under the effect of stereo rectification, the larger the cross angle between the line segments to be matched, the lower the likelihood that they are matching line segments. Therefore, the maximum cross angle should not exceed $\pi/6$, empirically. In this way, the CAR is calculated as follows:
$p_{2,j} = \sin\left( \frac{\pi}{2} - 3\theta \right), \quad \theta \in \left[ 0, \frac{\pi}{6} \right]$
where $\theta$ denotes the cross angle between the left and right line segments. The derivative of this function is
$p'_{2,j} = -3\cos\left( \frac{\pi}{2} - 3\theta \right) = -3\sin(3\theta), \quad \theta \in \left[ 0, \frac{\pi}{6} \right]$
As the cross angle $\theta \in [0, \pi/6]$ increases, the magnitude of this (negative) derivative also increases, so the CAR decreases ever faster. This trend is consistent with the idea of voting on the cross angle of line segments: under the polar line constraint, a candidate should become less and less likely to be the corresponding line segment as the cross angle grows.
After that, the final probability that the right line segment belongs to the correct match is calculated by means of a weighted average of the probability distributions of each classifier.
$P_j = \sum_{i=1}^{n} \omega_i \cdot p_{i,j}$
where $P_j$ denotes the comprehensive matching probability (CMP) that the right line segment $j$ is a correct match, $\omega_i$ denotes the weight of classifier $C_i$, $\sum_{i=1}^{n} \omega_i = 1$, and $n = 2$. A schematic diagram of the soft voting process is shown in Figure 5.
In summary, compared to the conventional LSM with two endpoints as the basic unit, the soft voting classifier-based line segment matching employs a strategy of discretization followed by continuity. Obviously, discrete matching is robust to line segment lengths, breaks, and differences in the positions of line segment endpoints because it is no longer constrained by the two endpoints.
However, it is still difficult for soft voting-based LSM to distinguish "parallel misdirected line segments". To limit their influence, it is useful to keep the potential corresponding line segments. For this reason, the right line segments whose CMP is not lower than $\gamma$ ($\gamma = 0.8$) times the maximum are selected as the set of pre-matched line segments:
$LS = \left\{ j \mid P_j \ge \gamma \cdot P_{max}, \ j = 1, 2, \ldots, K \right\}, \quad P_{max} = \max_j P_j$
where $K$ denotes the number of candidate line segments. In this way, the set of pre-matched line segments $LS = \{ ls'_1, ls'_2, \ldots, ls'_T \}$ corresponding to the current left line segment $ls$ is obtained, where $T$ ($T \le K$) denotes the number of pre-matched right line segments, from which mismatched line segments still need to be filtered.
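The two classifiers and their weighted combination can be sketched as follows, assuming the CAR takes the form $\sin(\pi/2 - 3\theta)$ on $[0, \pi/6]$; the clipping of $\theta$ and the variable names are our assumptions, not the authors' implementation:

```python
import math

def soft_vote(num_matched, num_all, thetas, w_lpmr=0.5, w_car=0.5, gamma=0.8):
    """Combine the LPMR and CAR classifiers by weighted averaging (P_j)
    and keep candidates within gamma of the best score as the
    pre-matched set LS. Angles are in radians, clipped to [0, pi/6]."""
    probs = []
    for nm, na, theta in zip(num_matched, num_all, thetas):
        p_lpmr = nm / na                                  # p_{1,j}
        t = min(max(theta, 0.0), math.pi / 6)
        p_car = math.sin(math.pi / 2 - 3 * t)             # p_{2,j}
        probs.append(w_lpmr * p_lpmr + w_car * p_car)     # P_j
    p_max = max(probs)
    return [j for j, p in enumerate(probs) if p >= gamma * p_max]
```

With equal weights (0.5 each, the setting found best in the experiments), a candidate with a perfect cross angle but a low point matching rate can still be outvoted by a candidate that scores well on both criteria.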

2.4. Calculation of Localized Point-Line Homography Model

Some of the parallel line segments $ls'_t \in LS$ in the set of pre-matched segments are so close in angle and feature similarity that local features alone can no longer separate them. To this end, we combine "local point-line homography geometric features" to quantify the similarity between line segments by exploiting the planar properties of point-line combinations.

2.4.1. Voronoi Diagram

Local planes are first formed between the non-collinear ASIFT matching points and line segments, and then the mismatched line segments in the set of pre-matched line segments are filtered by the homography distance between the planes. The Voronoi diagrams are constructed using the ASIFT matching points as seed points in the left and right images, respectively (as shown in Figure 6).
The Voronoi diagram creation process is as follows (we take the left image as an example; the right image is handled in the same way): Based on the ASIFT matching method, we obtain the sets of matching points $S_L$ and $S_R$, where $p_L^i$ is a two-dimensional point with coordinates $(x_L^i, y_L^i)$ in the ASIFT matching point set $S_L = \{ p_L^1, p_L^2, \ldots, p_L^n \}$ on the left image. The goal of the Voronoi diagram $V(S_L)$ is to partition the left image into $n$ regions, each of which corresponds to a seed point $p_L^i$, such that all pixels inside a region are closer to $p_L^i$ than to any other seed point.
The Voronoi diagram $V(S_L)$ can be represented as follows:
$V_L = \left\{ V(p_L^1), V(p_L^2), \ldots, V(p_L^n) \right\}$
where $V(p_L^i)$ denotes the $i$-th convex polygonal Voronoi region; the polygons cover the whole left image and do not overlap each other. For the seed point $p_L^i$, its corresponding Voronoi region satisfies the following relationship:
$V(p_L^i) = \left\{ p_L \in \mathbb{R}^2 \mid d(p_L, p_L^i) \le d(p_L, p_L^j), \ \forall j \neq i \right\}$
where $p_L = (x_L, y_L)$ is a pixel point in the left image and $d(p_L, p_L^i)$ denotes the Euclidean distance from point $p_L$ to $p_L^i$.
Voronoi edges are represented as follows: a Voronoi edge is the common boundary of two Voronoi regions $V(p_L^i)$ and $V(p_L^j)$ and lies on the perpendicular bisector of the two neighboring seed points. Thus, a Voronoi edge can be represented as the set of points whose distances to $p_L^i$ and $p_L^j$ are equal:
$E_L^{ij} = \left\{ p_L \in \mathbb{R}^2 \mid d(p_L, p_L^i) = d(p_L, p_L^j) \right\}$
The intersections of three or more Voronoi regions in a Voronoi diagram are called Voronoi vertices, which are points in the plane equidistant from three or more seed points. Since the dual graph of a Voronoi diagram is the Delaunay triangulation, i.e., two neighboring seed points of the Voronoi diagram form an edge in the Delaunay triangulation, it is usually more efficient to create a Voronoi diagram via Delaunay triangulation than to compute it directly. By computing the circumcenter of each Delaunay triangle and connecting the circumcenters of neighboring triangles, a complete Voronoi diagram is eventually formed, as shown in the third row of Figure 6.
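Since every interior pixel of a Voronoi region is, by definition, nearest to that region's seed, assigning a discretized line segment to a region reduces to nearest-seed lookups. A minimal NumPy sketch follows; the majority-vote assignment over the segment's pixel points is a hypothetical simplification, not the paper's exact procedure:

```python
import numpy as np

def nearest_seed(p, seeds):
    """Index of the Voronoi region containing point p: by definition,
    the region of the seed (ASIFT matching point) closest to p."""
    return int(np.linalg.norm(seeds - np.asarray(p, float), axis=1).argmin())

def segment_region(segment_points, seeds):
    """Assign a discretized line segment to one Voronoi region by
    majority vote over its pixel points (illustrative policy)."""
    votes = [nearest_seed(p, seeds) for p in segment_points]
    return max(set(votes), key=votes.count)
```

In practice the diagram itself would be built via Delaunay triangulation as described above; the lookup here only uses the defining distance property.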

2.4.2. Localized Point-Line Homography Model

According to the properties of Voronoi diagrams, i.e., different Voronoi regions do not overlap, and the distance from every interior point of a region to its own seed point is less than its distance to any other seed point, we are able to identify, for each line segment, a Voronoi region and the ASIFT matching point closest to it.
The ASIFT matching points are used as anchor points to form the virtual surface by combining the line segments corresponding to the Voronoi region. Then, using the homography model error of the virtual surface, the matched line segments with the smallest homography model error are filtered out from the set of pre-matched line segments as the final matched line segments, as shown in Figure 7.
Homography model fitting: For a pair of matching points $(p_i, p'_i)$ between the left and right views, $p_i = (x_i, y_i, 1)^T$ is the homogeneous coordinate of a point on plane $\pi_L$ and $p'_i = (x'_i, y'_i, 1)^T$ is the homogeneous coordinate of the corresponding point on plane $\pi_R$. According to the definition of the homography transformation, for each point pair $(p_i, p'_i)$, the following equation is given:
$p'_i = H p_i$
Expanding this matrix equation yields the following system of linear equations:
$\begin{pmatrix} x'_i \\ y'_i \\ 1 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix}$
Dividing by the third row to remove the scale factor, we obtain
$x'_i = \frac{h_{11} x_i + h_{12} y_i + h_{13}}{h_{31} x_i + h_{32} y_i + h_{33}}, \quad y'_i = \frac{h_{21} x_i + h_{22} y_i + h_{23}}{h_{31} x_i + h_{32} y_i + h_{33}}$
The above system of equations is rearranged to obtain a system of linear equations:
$h_{11} x_i + h_{12} y_i + h_{13} - x'_i h_{31} x_i - x'_i h_{32} y_i - x'_i h_{33} = 0$
$h_{21} x_i + h_{22} y_i + h_{23} - y'_i h_{31} x_i - y'_i h_{32} y_i - y'_i h_{33} = 0$
We represent this as a matrix equation of the form $A h = 0$, where $h$ contains the vector of parameters of $H$ to be solved:
$A = \begin{pmatrix} x_i & y_i & 1 & 0 & 0 & 0 & -x'_i x_i & -x'_i y_i & -x'_i \\ 0 & 0 & 0 & x_i & y_i & 1 & -y'_i x_i & -y'_i y_i & -y'_i \end{pmatrix}$
$h = \left( h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33} \right)^T$
The system of equations $A h = 0$ is solved by the least squares method, i.e., by minimizing the following objective function:
$\min_h \lVert A h \rVert^2, \quad \text{s.t.} \ h^T h = 1$
The solution $h$ is the eigenvector of $A^T A$ corresponding to the smallest eigenvalue. Moreover, it is common to normalize the last element $h_{33}$ of $H$ to 1 to remove the scale ambiguity.
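The constrained least-squares solution above can be sketched with NumPy; the right singular vector for the smallest singular value of $A$ equals the eigenvector of $A^T A$ with the smallest eigenvalue. This is a minimal Direct Linear Transform illustration, not the authors' implementation:

```python
import numpy as np

def fit_homography(src, dst):
    """Stack two rows of A per correspondence (the linear system above)
    and take the right singular vector of the smallest singular value.
    Needs at least 4 point pairs in general position."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.asarray(rows, float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]        # normalize so that h33 = 1
```

With exactly four non-degenerate pairs the null space of $A$ is one-dimensional and the homography is determined uniquely.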
Homography model error calculation: Borrowing the random sampling idea of RANSAC, a homography matrix $H_t$ can be uniquely determined for the $t$-th line segment by combining one pair of ASIFT matching anchor points with three pairs of randomly selected line segment point matches, forming four point pairs (the minimum number of point pairs needed to fit a homography matrix).
Multiple iterations of fitting the homography matrix $H_t$ yield the optimal homography matrix $\hat{H}_t$:
$\hat{H}_t = \arg\min_H \sum_{j=1}^{J} \lVert p'_j - H p_j \rVert^2$
where $p_j$ and $p'_j$ are the matched points of the $t$-th line segment in the left and right views and $J$ denotes the number of matched line points.
In turn, the homography matrix $\hat{H}_t$ can be applied to all matching points of the current matched segment pair $(ls, ls'_t)$, $ls'_t \in LS$; the mapping error of $\hat{H}_t$ on the points of the current matched segment is then taken as the quantized value of the degree of matching of the current segment.
In this way, the line segment $\hat{t}$ with the smallest error in the set of pre-matched line segments $LS = \{ ls'_1, ls'_2, \ldots, ls'_T \}$ is obtained as the final matched line segment:
$Err_t = \sum_{j=1}^{J} \lVert \hat{H}_t p_j - p'_j \rVert^2, \quad \hat{t} = \arg\min_t Err_t$
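A sketch of the final selection by homography transfer error $Err_t$ follows; the function names are illustrative, and the fitted matrices $\hat{H}_t$ and matched point sets are assumed given:

```python
import numpy as np

def transfer_error(H, pts_left, pts_right):
    """Sum of squared distances ||H p_j - p'_j||^2 after
    dehomogenization (Err_t in the text)."""
    P = np.column_stack([pts_left, np.ones(len(pts_left))])   # homogeneous
    Q = (H @ P.T).T
    Q = Q[:, :2] / Q[:, 2:3]                                  # back to 2D
    return float(((Q - np.asarray(pts_right)) ** 2).sum())

def best_segment(H_list, matches):
    """Pick the pre-matched segment t with the smallest homography
    transfer error (the arg-min over Err_t)."""
    errs = [transfer_error(H, pl, pr) for H, (pl, pr) in zip(H_list, matches)]
    return int(np.argmin(errs))
```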

3. Results

3.1. Experimental Configurations

In this section, three challenging datasets are used to evaluate the methods. The first one is the benchmark dataset, which is frequently used for performing LSM evaluations. The line segments of the benchmark dataset are detected by the LSD detector and labeled as ground-truth matches by Li et al. [32]. We select 14 pairs of images from the benchmark dataset that contain a number of independent or composite representative line matching problems. These problems include changes in illumination, scale, rotation and viewing angle, image blurring, occlusion, and texture differences, as shown in Figure 8.
The second dataset has two stereo satellite images taken by the WorldView-2 (WV-2) satellite. The satellite images have a ground sample distance (GSD) of 0.5 m and cover about 64 km² in Washington, D.C., United States. Eighteen test areas in WV-2 are selected for the experiment, containing large buildings, low buildings, and mixed areas of buildings and vegetation with problems of illumination, viewing angle, texture difference, and occlusion. Moreover, since the LSM results are used for 3D structure reconstruction of satellite images, affine-compensated RPC stereo rectification is applied to the satellite images, with an average size of 590 px × 460 px, as shown in Figure 9.
The third dataset is composed of two stereo satellite images from the WorldView-3 (WV-3) satellite provided by the Johns Hopkins University Applied Physics Laboratory (JHUAPL) [33]. The satellite images have a GSD of 0.31 m and cover about 100 km² near San Fernando, Argentina. We select 15 test areas in WV-3 for the experiment, containing large buildings, densely distributed low buildings, flat ground, and roads. Similarly, affine-compensated RPC stereo rectification is applied to the WV-3 satellite images, with an average stereo image size of 650 px × 620 px, as shown in Figure 10.
In addition, the LSD method proposed by Gioi et al. [30] is uniformly used to extract the line segments. All experiments are performed on a personal computer equipped with an Intel Core i7-8700 (3.20 GHz) CPU and 16 GB of RAM, and the experimental process is not accelerated with a GPU.

3.1.1. Evaluation Metrics

The popular LSM methods we select for comparison with the proposed method are the LILH method proposed by Jia et al. [18] and the SLEM method proposed by Zheng et al. [5]. Qualitative and quantitative comparisons of the experimental results are performed, in which the quantitative comparison uses the most popular evaluation metrics: Precision, Recall, and F1-score, calculated as follows.
Precision: In the matching method, Precision indicates the percentage of actual correct matches in the matching results returned by the method. The formula is
$Precision = \frac{TP}{TP + FP}$
where $TP$ (True Positive) indicates the number of correct matches obtained by the method and $FP$ (False Positive) indicates the number of incorrect matches obtained by the method. $TP + FP$ indicates all results predicted as matches by the method (both correct and incorrect matches).
Recall: Recall represents the proportion of all possible correct matches that are successfully recognized by the method. The formula is
$Recall = \frac{TP}{TP + FN}$
where $FN$ (False Negative) denotes the number of matches that the method incorrectly identifies as mismatches, i.e., the number of samples falsely identified as negative. $TP + FN$ denotes the total number of actual correct matches.
F1-score: The F1-score is the harmonic mean of Precision and Recall, which evaluates the method's precision and recall jointly. The formula is
$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}$
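The three metrics can be computed directly from the TP, FP, and FN counts; a small self-contained sketch:

```python
def f1_metrics(tp, fp, fn):
    """Precision, Recall, and F1-score as defined above,
    with zero-division guards for empty result sets."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```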

3.1.2. Parameters

The parameter settings involved in the experiments of the proposed method are shown in Table 1.
In the experiments, τ d of Equation (18) is used to select the line segment points that are correctly matched between the left and right views, and the determination of this value is inspired by Lowe et al. [34] who utilized a similar threshold to obtain correctly matched points.
The function of γ in Equation (24) is to obtain other possible corresponding line segments as candidates in the range of maximum matching rate γ . The goal of this approach is to provide more options for screening the correct matching line segments for the localized point-line homography model, since the matching rate of “parallel misdirected line segments” varies very little. The value of γ affects the number of candidate line segments, which in turn influences the amount of computation of the local point and line homography feature distances. The smaller the γ value, the more candidate line segments need to be screened, and the larger the computation amount. The value of γ is set to 0.8 with reference to Lowe et al. [34].
The weights $\omega_1$ and $\omega_2$ in Equation (23), which denote the weights of the LPMR and CAR classifiers, respectively, satisfy $\omega_1 + \omega_2 = 1$. We repeat the experiments with the CAR weight $\omega_2$ taking the values $0.1, 0.2, \ldots, 0.9$ to determine the values of $\omega_1$ and $\omega_2$. The results of the repeated experiments on WV-2 are shown in Figure 11 and Figure 12.
According to Figure 11, as the CAR weight increases, Precision, Recall, and F1-score all first increase and then decrease, reaching their maxima at a CAR weight of 0.5; all three metrics decline increasingly quickly once the CAR weight exceeds 0.5. This is not difficult to understand, since the soft voting classifier becomes more and more sensitive to the cross angle as the CAR weight increases. However, the cross angles of corresponding line segments in the left and right views of satellite images commonly differ, and the larger the CAR weight, the more demanding the constraint on the cross angle. Therefore, as the CAR weight increases, many potential true matched line segments are increasingly likely to be ignored.

3.2. Experiments on the Benchmark Dataset

The results of the quantitative comparison of the LILH, SLEM, and proposed methods on the benchmark dataset are shown in Figure 13.
Based on the quantitative comparison of these three methods, it can be found that, in terms of Precision, the proposed method is generally ahead of LILH and SLEM. However, with respect to Recall, the proposed method falls behind SLEM in Test Images 1, 2, 3, 4, 6, and 9 of Figure 13b, which share the common characteristic of a higher total number of LSD line segments. The reason is that the proposed method employs a more stringent line segment screening strategy, i.e., a soft voting mechanism combined with a Voronoi diagram structure to select the correct corresponding line segments. This is not contrary to our original intention for LSM, as we prefer to have more correct line segments among the matched segments for the 3D structural reconstruction of the scene, even if fewer segments are matched. This also explains why the Precision of the proposed algorithm is higher than its Recall on most of the images.
In addition, the Recall of all three methods is very low in Test Images 7, 8, 9, 11, and 13 of Figure 13b. This indicates that few correctly matched line segments are obtained by the three methods compared to the actual number of matched line segments.
As shown in Figure 8, the difficulty of Test Images 7 and 8 is that the images contain a large number of weakly textured regions, which leads to fewer line segment features. The difficulty of Test Images 9 and 13 is that the images contain many regions with similar and repetitive textures, which produce many similar line segments whose feature distances cannot be effectively distinguished. The challenge of Test Image 11 is the large angular orientation difference between the left and right images. Remarkably, even in these extreme cases, the Recall of the proposed method is higher than that of the other two methods.
We involve PHOW-based multi-scale features in the matching rate calculation, so the proposed method outperforms LILH and SLEM, which rely on geometric constraints, under weak texture conditions. Regions with similar textures are more prone to "parallel misdirected line segments", and it is clear that the local point-line homography model with the Voronoi diagram structure is superior to LILH, which is based on a line segment intersection homography structure, and to SLEM, which is constrained by a global projection transformation model (as also shown in the comparison experiments on WV-2 and WV-3).
To further compare the performance of the three methods on the benchmark dataset, we summarize the average values of Precision, Recall, F1-score, total number of matches, and number of correct matches for 14 pairs of images in Table 2.
According to Table 2, the proposed method has the highest average values of Precision, Recall, and F1-score among the three compared methods, with the average Precision being 22.84% and 7.15% higher than LILH and SLEM, respectively, the average Recall being 35.93% and 4.86% higher than LILH and SLEM, respectively, and the average F1-score being 31.95% and 6.22% higher than LILH and SLEM, respectively.
The actual matching results of the proposed method on the benchmark dataset are shown in Figure 14. The proposed method is able to overcome the illumination variations (e.g., Figure 14a,b), orientation variations (e.g., Figure 14c,d), and the interference of weak textures (e.g., Figure 14e,f).

3.3. Experiments on the WV-2 Dataset

The quantitative comparison of the matching results of LILH, SLEM, and proposed methods on the WV-2 dataset is shown in Figure 15. Quantitative comparison results on the WV-2 dataset are similar to the benchmark dataset. In terms of Precision and F1-score, the proposed method outperforms LILH and SLEM, but the proposed method does not fully outperform the SLEM method in terms of Recall metrics, which is consistent with the results of the benchmark dataset. Nevertheless, the average Recall of the proposed method still outperforms the SLEM (see Table 3). According to Table 3, in the experiments on the WV-2 dataset, the proposed method has the highest mean values of Precision, Recall, and F1-score among the three compared methods. The average Precision is 41.03% and 27.78% higher than LILH and SLEM, respectively, the average Recall is 113.68% and 0.81% higher than LILH and SLEM, respectively, and the average F1-score is 89.56% and 12.60% higher than LILH and SLEM, respectively.
The actual matching results of the proposed method on the WV-2 dataset are shown in Figure 16. Figure 16a–c cover mixed areas of buildings and vegetation, Figure 16d–e include large buildings, and Figure 16f–h contain low buildings. Furthermore, the mismatched line segments of the proposed method are mainly concentrated in the visual difference regions between the left and right views. Such regions include differences in building sidewall surfaces due to viewpoint shifts (e.g., Figure 16a,d), changes in forested areas on the ground due to temporal changes (e.g., Figure 16a–c), and differences in building shadows due to changes in illumination (e.g., Figure 16b,c).
For the purpose of further verifying the effectiveness of soft voting classifier-based LSM strategy and Voronoi diagram-based localized point-line homography model of the proposed method, we select two sets of representative line segment matching results and their interval results to validate the effectiveness of the above strategies, as shown in Figure 17.
According to the first row of Figure 17, the points of the left red line segment are matched to the points of three right green line segments through discrete point matching of line segments based on the PHOW feature. However, with the localized point-line homography model based on the Voronoi diagram, only one most correct corresponding right line segment is determined with the left line segment. This is due to the ability of the localized point-line homography model to fit complex multi-surface structures, especially building edge regions, providing effective constraints on parallel misdirected line segments.
According to the second row of Figure 17, the points of the left red line segment are not continuous due to breakage, but they still match the points of two right green line segments under the effect of the PHOW features. Then, with the help of the soft voting classifier, the left line segment determines the most correct corresponding right line segment. This shows the effectiveness of the proposed method in the task of LSM for complex urban satellite images.

3.4. Experiments on the WV-3 Dataset

The quantitative comparison of the matching results of LILH, SLEM, and the proposed method on the WV-3 dataset is shown in Figure 18. According to Figure 18, the proposed method outperforms LILH and SLEM in all three metrics on the WV-3 dataset. Moreover, in Test Areas 2, 7, 9, 11, and 13, the Precision and Recall of the LILH and SLEM methods are slightly lower than the other regions.
It is observed that all these regions contain flat ground and roads. In flat areas, the line segments all undergo nearly the same deformation. Thus, the geometric differences between the line segments are so small that the homography and projection models of LILH and SLEM are not sufficient to tell similar line segments apart. It follows that, to cope with large-scale satellite remote sensing images of complex ground scenes, local features (e.g., PHOW features) cannot easily be abandoned.
According to Table 4, which reports the experimental results on the WV-3 dataset, the proposed method has the highest mean values of Precision, Recall, and F1-score among the three compared methods. The average Precision of the proposed method is 33.34% and 19.27% higher than LILH and SLEM, respectively, the average Recall is 55.00% and 17.59% higher than LILH and SLEM, respectively, and the average F1-score is 44.67% and 18.35% higher than LILH and SLEM, respectively.
The actual matching results of the proposed method on the WV-3 dataset are shown in Figure 19, in which Figure 19a–c cover large buildings, Figure 19d–f include flat ground and roads, and Figure 19g,h contain densely distributed low buildings. Furthermore, the proposed method is able to obtain a large number of corresponding line segments, whether in roads with complex lines (e.g., Figure 19d,f), large buildings with many parallel misdirected line segments (e.g., Figure 19b,c), or dense buildings with broken line segment points (e.g., Figure 19g,h).

3.5. Complexity Analysis

The proposed method consists of several stages: line segment discretization, PHOW feature extraction, soft voting classifier-based LSM, and calculation of the localized point-line homography model.
(1) In the stage of line segment discretization, the time complexity of the Bresenham-based line segment growth method is O n . Since the number of iterations of its main loop is equal to the length of the line segment, in each iteration, the method performs the computation of constant time, including error update and coordinate selection.
(2) The time complexity of PHOW feature extraction consists of SIFT feature extraction O(n²), bag-of-words model construction O(k·m·n), and histogram construction O(n), where k is the number of clustering centers and m is the feature dimension. Since the PHOW features are extracted only for line segment points, k is a constant and m is fixed at 128. Hence, the time complexity of PHOW feature extraction is O(n²) + O(k·m·n) + O(n) = O(n²).
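The O(k·m·n) bag-of-words step amounts to assigning each of the n line-point descriptors to the nearest of k cluster centers in m dimensions and counting the assignments. Below is a minimal sketch of that quantization (a hypothetical helper, shown with toy 2-D descriptors in place of 128-D SIFT vectors):

```python
def bow_histogram(descriptors, centers):
    """Quantize descriptors against k cluster centers into a k-bin histogram.

    The nested loops make the cost explicit: n descriptors x k centers x
    m dimensions per squared distance, i.e., O(k * m * n) overall.
    """
    hist = [0] * len(centers)
    for d in descriptors:                      # n descriptors
        best, best_dist = 0, float("inf")
        for i, c in enumerate(centers):        # k centers
            dist = sum((a - b) ** 2 for a, b in zip(d, c))  # m dimensions
            if dist < best_dist:
                best, best_dist = i, dist
        hist[best] += 1
    return hist
```

With two centers [0, 0] and [10, 10] and descriptors [1, 1], [9, 9], [0, 2], the histogram is [2, 1]: two descriptors fall in the first cell and one in the second.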
(3) In the soft voting classifier-based LSM, the soft voting classifier consists of an LPMR classifier and a CAR classifier. For the LPMR classifier, the line segment length L and the PHOW feature dimension m are constants, and only the number of line segments n is a variable, so its time complexity is O(L·m·n) = O(n). For the CAR classifier, the calculation of the intersection angle rate depends only on the two endpoints of each line segment and the number of candidate line segments, so its time complexity is also O(n). Thus, the time complexity of the soft voting classifier-based LSM is O(n) + O(n) = O(n).
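The soft vote itself is a weighted sum of the two classifier scores with weights ω_1 and ω_2. The sketch below assumes a linear CAR fall-off over [0, π/6]; this is an illustrative assumption consistent with Figure 5, not the paper's exact formula:

```python
import math

def car_score(theta, theta_max=math.pi / 6):
    """Intersection-angle rate: 1 for a zero intersection angle, falling
    to 0 at theta_max (linear fall-off assumed for illustration)."""
    return max(0.0, 1.0 - theta / theta_max)

def soft_vote(lpmr, theta, w1=0.5, w2=0.5):
    """Combine the LPMR score and the CAR score with weights w1 and w2,
    mirroring the structure (not the exact form) of Equation (23)."""
    return w1 * lpmr + w2 * car_score(theta)
```

A candidate with an LPMR score of 0.8 and a perfectly aligned intersection angle then votes 0.9 under equal weights; the candidate with the highest combined vote is taken as the pre-match.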
(4) In the calculation of the localized point-line homography model, the time complexity of ASIFT point matching is O(n²) [7]. Constructing the Voronoi diagram takes O(n log n). The localized homography model is then fitted using RANSAC, usually over T iterations, to increase the probability of finding the correct model, at a cost of O(T·n). Finally, computing the homography model error to exclude mismatched line segments takes O(n). Therefore, the time complexity of the Voronoi diagram-based localized point-line homography stage is O(n²) + O(n log n) + O(T·n) + O(n) = O(n²).
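Associating a line segment with its local homography reduces to finding the ASIFT anchor whose Voronoi cell contains the segment, since a point always lies in the cell of its nearest seed. The brute-force nearest-seed query below is a hypothetical stand-in for a lookup in the explicitly constructed O(n log n) diagram:

```python
def nearest_anchor(point, anchors):
    """Return the index of the anchor whose Voronoi cell contains `point`.

    A point belongs to the Voronoi cell of its nearest seed, so an
    exhaustive nearest-neighbour search gives the same answer as a cell
    lookup in an explicitly built Voronoi diagram.
    """
    return min(
        range(len(anchors)),
        key=lambda i: (point[0] - anchors[i][0]) ** 2
        + (point[1] - anchors[i][1]) ** 2,
    )
```

In the method proper, the homography estimated from the matches inside that anchor's cell is then used to score the segment; here the lookup only illustrates the cell association.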
In summary, the overall time complexity of the proposed method is O(n) + O(n²) + O(n) + O(n²) = O(n²).
Furthermore, we measured the actual running times of the LILH, SLEM, and proposed methods on the WV-2 and WV-3 datasets, as shown in Figure 20. According to Figure 20, LILH has the longest running time among the three methods, and the running time of SLEM is similar to that of the proposed method. In addition, the running times on the WV-3 dataset are much longer than on the WV-2 dataset because the number of LSD line segments in WV-3 images exceeds that in WV-2 images.

4. Discussion

First of all, image texture variance affects the performance of the LSM method. More texture is available in the WV-3 images than in the WV-2 images, so the LSD line segments are more densely distributed. As a result, the proposed method obtains more corresponding line segments on the WV-3 data, and its Recall values are higher than in the WV-2 experiments. However, while abundant texture information enables the method to obtain more corresponding line segments, it also introduces more interference into the matching. Most of the mismatches of the proposed method are concentrated in regions with dense line segments, as shown in Figure 19b,d.
Moreover, experiments on the benchmark dataset and the stereo satellite image datasets reveal that LSM of stereo satellite images is more susceptible to interference from parallel misdirection. This stems from the nature of urban satellite images themselves, which contain a large number of artificial buildings. It also highlights the advantage of the localized point-line homography model proposed in this paper, which is able to overcome the effects of complex surface structures.
Finally, although line segments, being one dimension higher than feature points, can provide more structural and geometric information, the experiments also show that the running time of LSM algorithms is much longer than that of feature point matching when reliable LSM results are required. Consequently, improving the efficiency of LSM methods is a worthwhile direction for future research.

5. Conclusions

The results of LSM of stereo satellite images serve as an important geometric element for 3D structural reconstruction, because matched line segments have advantages in spatial representation, complex shape description, and geometric computation. Unfortunately, existing line segment matching methods still struggle to effectively solve the problems of parallel misdirection and line segment breakage. In this study, we propose a "continuous–discrete–continuous" point-line cyclic LSM method based on a Voronoi diagram. It first obtains discrete line segment points using the Bresenham algorithm and computes the discrete matching rate of line points based on PHOW features and LPMR. Then, the pre-matching results of continuous line segments are acquired by a soft voting classifier combined with CAR. Finally, the local point-line homography model is constructed based on the Voronoi diagram to filter out parallel misdirected line segments, yielding the final LSM results. The experimental results on the benchmark, WV-2 satellite image, and WV-3 satellite image datasets show that the segment-point cyclic matching mechanism effectively overcomes the collinear interference caused by broken line segments and varying lengths, and that the local point-line homography model solves the parallel-line problem caused by similar and repetitive textures. Moreover, the key evaluation metrics of the proposed method exceed those of state-of-the-art LSM methods.
However, a small number of line segments still fail to match correctly on flat roads and large building surfaces. This could be attributed to the fact that the geometric differences between the local homography models of different line segments in such areas are too small to separate similar line segments. Accordingly, in future research, the mutual relationship between stereo matching and LSM will be explored to solve the LSM problem in regions with "weak geometric structure and weak texture", and the application of LSM to 3D structure reconstruction of satellite images will also be investigated.

Author Contributions

Conceptualization, L.Z.; Data curation, H.W.; Funding acquisition, L.Z. and F.G.; Methodology, L.Z.; Project administration, Y.Z.; Resources, L.Z.; Software, L.Z. and F.G.; Validation, F.G.; Visualization, Y.Z., H.W. and B.Z.; Writing—original draft, L.Z.; Writing—review and editing, L.Z., F.G., Y.Z., H.W. and B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant (No. 62401235, 62101219) and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 23KJB520010.

Data Availability Statement

The benchmark dataset presented in this study is openly available in [32]. The WorldView-2 dataset presented in this article is not readily available due to copyright restrictions. The WorldView-3 dataset presented in this study is openly available in [33]. The code is available at https://github.com/zhao1i/LSM.git (accessed on 10 October 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Arce, S.; Vernon, C.A.; Hammond, J.; Newell, V.; Janson, J.; Franke, K.W.; Hedengren, J.D. Automated 3D reconstruction using optimized view-planning algorithms for iterative development of structure-from-motion models. Remote Sens. 2020, 12, 2169. [Google Scholar] [CrossRef]
  2. Zhao, L.; Liu, Y.; Men, C.; Men, Y. Double propagation stereo matching for urban 3-d reconstruction from satellite imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–17. [Google Scholar] [CrossRef]
  3. Jia, Q.; Gao, X.; Fan, X.; Luo, Z.; Li, H.; Chen, Z. Novel coplanar line-points invariants for robust line matching across views. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VIII 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 599–611. [Google Scholar]
  4. Wei, D.; Zhang, Y.; Li, C. Robust line segment matching via reweighted random walks on the homography graph. Pattern Recognit. 2021, 111, 107693. [Google Scholar] [CrossRef]
  5. Zheng, X.; Yuan, Z.; Dong, Z.; Dong, M.; Gong, J.; Xiong, H. Smoothly varying projective transformation for line segment matching. ISPRS J. Photogramm. Remote Sens. 2022, 183, 129–146. [Google Scholar] [CrossRef]
  6. Guo, H.; Wei, D.; Zhang, Y.; Wan, Y.; Zheng, Z.; Yao, Y.; Liu, X.; Li, Z. The One-Point-One-Line geometry for robust and efficient line segment correspondence. ISPRS J. Photogramm. Remote Sens. 2024, 210, 80–96. [Google Scholar] [CrossRef]
  7. Yu, G.; Morel, J.M. ASIFT: An algorithm for fully affine invariant comparison. Image Process. Line 2011, 1, 11–38. [Google Scholar] [CrossRef]
  8. Zhang, L.; Koch, R. An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency. J. Vis. Commun. Image Represent. 2013, 24, 794–805. [Google Scholar] [CrossRef]
  9. López, J.; Santos, R.; Fdez-Vidal, X.R.; Pardo, X.M. Two-view line matching algorithm based on context and appearance in low-textured images. Pattern Recognit. 2015, 48, 2164–2184. [Google Scholar] [CrossRef]
  10. Li, K.; Yao, J.; Lu, X.; Li, L.; Zhang, Z. Hierarchical line matching based on line–junction–line structure descriptor and local homography estimation. Neurocomputing 2016, 184, 207–220. [Google Scholar] [CrossRef]
  11. Li, K.; Yao, J. Line segment matching and reconstruction via exploiting coplanar cues. ISPRS J. Photogramm. Remote Sens. 2017, 125, 33–49. [Google Scholar] [CrossRef]
  12. Zhang, H.; Luo, Y.; Qin, F.; He, Y.; Liu, X. Elsd: Efficient line segment detector and descriptor. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2969–2978. [Google Scholar]
  13. Wang, Z.; Nie, J.; Li, D.; Feng, X.; Wu, Y.; Xu, G. A Two-Stage Line Matching Method for Multi-temporal Remote Sensing Images. In Proceedings of the 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP), Nanjing, China, 22–24 October 2021; pp. 170–174. [Google Scholar]
  14. Wang, J.; Zhu, Q.; Liu, S.; Wang, W. Robust line feature matching based on pair-wise geometric constraints and matching redundancy. ISPRS J. Photogramm. Remote Sens. 2021, 172, 41–58. [Google Scholar] [CrossRef]
  15. Wang, L.; Neumann, U.; You, S. Wide-baseline image matching using line signatures. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 1311–1318. [Google Scholar]
  16. Yammine, G.; Wige, E.; Simmet, F.; Niederkorn, D.; Kaup, A. Novel similarity-invariant line descriptor and matching algorithm for global motion estimation. IEEE Trans. Circuits Syst. Video Technol. 2014, 24, 1323–1335. [Google Scholar] [CrossRef]
  17. Wang, Q.; Zhang, W.; Liu, X.; Zhang, Z.; Baig, M.H.A.; Wang, G.; He, L.; Cui, T. Line matching of wide baseline images in an affine projection space. Int. J. Remote Sens. 2020, 41, 632–654. [Google Scholar] [CrossRef]
  18. Jia, Q.; Fan, X.; Gao, X.; Yu, M.; Li, H.; Luo, Z. Line matching based on line-points invariant and local homography. Pattern Recognit. 2018, 81, 471–483. [Google Scholar] [CrossRef]
  19. Chen, M.; Yan, S.; Qin, R.; Zhao, X.; Fang, T.; Zhu, Q.; Ge, X. Hierarchical line segment matching for wide-baseline images via exploiting viewpoint robust local structure and geometric constraints. ISPRS J. Photogramm. Remote Sens. 2021, 181, 48–66. [Google Scholar] [CrossRef]
  20. Wang, J.; Liu, S.; Zhang, P. A New Line Matching Approach for High-Resolution Line Array Remote Sensing Images. Remote Sens. 2022, 14, 3287. [Google Scholar] [CrossRef]
  21. Shen, L.; Zhu, J.; Xin, Q.; Huang, X.; Jin, T. Robust line segment mismatch removal using point-pair representation and Gaussian-uniform mixture formulation. ISPRS J. Photogramm. Remote Sens. 2023, 203, 314–327. [Google Scholar] [CrossRef]
  22. Lange, M.; Raisch, C.; Schilling, A. Wld: A Wavelet and Learning Based Line Descriptor for Line Feature Matching; The Eurographics Association: Tübingen, Germany, 2020. [Google Scholar]
  23. Abdellali, H.; Frohlich, R.; Vilagos, V.; Kato, Z. L2d2: Learnable line detector and descriptor. In Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK, 1–3 December 2021; pp. 442–452. [Google Scholar]
  24. Ma, Q.; Jiang, G.; Wu, J.; Cai, C.; Lai, D.; Bai, Z.; Chen, H. WGLSM: An end-to-end line matching network based on graph convolution. Neurocomputing 2021, 453, 195–208. [Google Scholar] [CrossRef]
  25. Li, H.; Chen, K.; Zhao, J.; Wang, J.; Kim, P.; Liu, Z.; Liu, Y.H. Learning to identify correct 2D-2D line correspondences on sphere. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11743–11752. [Google Scholar]
  26. Pautrat, R.; Suárez, I.; Yu, Y.; Pollefeys, M.; Larsson, V. Gluestick: Robust image matching by sticking points and lines together. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 9706–9716. [Google Scholar]
  27. Maruani, N.; Klokov, R.; Ovsjanikov, M.; Alliez, P.; Desbrun, M. Voromesh: Learning watertight surface meshes with voronoi diagrams. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 14565–14574. [Google Scholar]
  28. Khan, M.A.; Iqbal, N.; Jamil, H.; Kim, D.H. An optimized ensemble prediction model using AutoML based on soft voting classifier for network intrusion detection. J. Netw. Comput. Appl. 2023, 212, 103560. [Google Scholar] [CrossRef]
  29. De Franchis, C.; Meinhardt-Llopis, E.; Michel, J.; Morel, J.M.; Facciolo, G. On stereo-rectification of pushbroom images. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 5447–5451. [Google Scholar]
  30. Von Gioi, R.G.; Jakubowicz, J.; Morel, J.M.; Randall, G. LSD: A fast line segment detector with a false detection control. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 32, 722–732. [Google Scholar] [CrossRef]
  31. Gaol, F.L. Bresenham Algorithm: Implementation and Analysis in Raster Shape. J. Comput. 2013, 8, 69–78. [Google Scholar] [CrossRef]
  32. Li, K.; Yao, J.; Lu, M.; Heng, Y.; Wu, T.; Li, Y. Line segment matching: A benchmark. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–9. [Google Scholar]
  33. Bosch, M.; Kurtz, Z.; Hagstrom, S.; Brown, M. A multiple view stereo benchmark for satellite imagery. In Proceedings of the 2016 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 18–20 October 2016; pp. 1–9. [Google Scholar]
  34. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
Figure 1. Feature point matching and LSM. Matching points are derived from the affine scale-invariant feature transform (ASIFT) algorithm [7], and matching line segments are sourced from the proposed LSM method. The matched feature points are connected by lines; the points and line segments in the left image are marked in green, while those in the right image are marked in yellow.
Figure 2. Outline of the proposed LSM method.
Figure 3. An illustration of PHOW feature extraction.
Figure 4. Line segment retrieval based on epipolar constraints. The red line segments indicate the target line segments in the left view. The green line segments represent the candidate line segments, and the yellow line segments represent the other line segments in the right view. The purple dashed lines indicate the epipolar lines.
Figure 5. Schematic illustration of line segment matching based on the soft voting classifier. In the line graph of the right subfigure, the red curve indicates that CAR takes values in [0, 1] when the line segment intersection angle θ ∈ [0, π/6].
Figure 6. Voronoi diagrams. (a) Left view, (b) Right view. First row: ASIFT matching point results. Feature points are represented by red dots, and corresponding locations are marked with the same numerical labels. Second row: Voronoi diagram plotted on the satellite images, where red dots indicate anchor points from ASIFT and blue edges denote the Voronoi edges. Third row: Delaunay triangulation with the Voronoi diagram, where red dots represent anchor points, red lines indicate Voronoi edges, and blue lines denote Delaunay edges. Additionally, the Voronoi diagram differences between the left and right images are highlighted with green boxes.
Figure 7. Local point-line homography geometry. In the figure, C and C′ denote the camera centers, and p and p′ denote a pair of ASIFT matching points. The red line segment represents the target line segment in the left view, while the green and yellow line segments denote the candidate corresponding line segments in the right view. The blue solid lines denote the Voronoi edges, and the blue dots denote the Voronoi seed points.
Figure 8. The 14 pairs of test images from the benchmark dataset. Numbers 1–14 represent different pairs of test images.
Figure 9. The 18 pairs of test areas from the WV-2 dataset. Numbers 1–18 represent different pairs of test images.
Figure 10. The 15 pairs of test areas from the WV-3 dataset. Numbers 1–15 represent different pairs of test images.
Figure 11. Precision, Recall, F1-score, and total number of matched line segments for the proposed method with CAR weights taking values of 0.1, 0.2, …, 0.9.
Figure 12. LSM results of the proposed method with different CAR classifier weight values. (a–d) The results for ω_2 values of 0.1, 0.3, 0.5, and 0.7, respectively (only the left images are shown). Correctly matched line segments are highlighted in green, while incorrect line segments are highlighted in red.
Figure 13. Statistics of the LILH, SLEM, and proposed methods on the 14 test images of the benchmark dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 14. Actual matching results of the proposed method on the benchmark dataset. (af) The left images of representative matching results for the benchmark dataset. Mismatches are marked in red, while correct matches are marked in green. Precision and Recall values are also provided for each image.
Figure 15. Statistics of LILH, SLEM, and proposed methods in the 18 test areas of the WV-2 dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 16. Actual matching results of the proposed method on the WV-2 dataset. (ah) The left images of representative matching results for the WV-2 dataset. Mismatches are labeled in red, correct matches are labeled in green. Precision values are also provided for each image.
Figure 17. Intermediate results of the proposed method. The (left) figure represents the discrete matching of line segment points, with red dots indicating the matched line segment points in the left view, green dots indicating the matched line segment points in the right view, and yellow boxes indicating close-ups. The (right) figure represents the matching results of the corresponding line segment in the left figure, and the corresponding line segment is labeled in red. The blue line segments indicate the LSD line segments to be matched.
Figure 18. Statistics of LILH, SLEM, and proposed methods in the 15 test areas of the WV-3 dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 19. Actual matching results of the proposed method on the WV-3 dataset. (a–h) The left images of representative matching results for the WV-3 dataset. Mismatches are labeled in red, correct matches are labeled in green. Precision values are also provided for each image.
Figure 20. Actual runtime of LILH, SLEM, and proposed methods on WV-2 dataset and WV-3 dataset. (a) Runtime on the WV-2 dataset, (b) Runtime on the WV-3 dataset.
Table 1. Parameter settings in the experiment.

| Parameter | Description | Value |
|---|---|---|
| τ_d | Ratio of the minimum feature distance to the secondary feature distance in PHOW feature matching (Equation (18)) | 0.8 |
| ω_1 | LPMR classifier weight (Equation (23)) | 0.5 |
| ω_2 | CAR classifier weight (Equation (23)) | 0.5 |
| γ | Range of maximum possible matches for pre-matched line segment filtering (Equation (24)) | 0.8 |
Table 2. Average values of Precision, Recall, and F1-score, total number of matches, and number of correct matches for the three methods on the benchmark dataset.

| Method | Precision (%) | Recall (%) | F1-Score (%) | Total Matches | Correct Matches |
|---|---|---|---|---|---|
| LILH | 76.28 | 57.74 | 64.03 | 249.71 | 190.48 |
| SLEM | 87.45 | 74.85 | 79.54 | 277.00 | 242.24 |
| Proposed | 93.70 | 78.49 | 84.49 | 371.14 | 347.72 |
Table 3. Average values of Precision, Recall, and F1-score, total number of matches, and number of correct matches for the three methods on the WV-2 dataset.

| Method | Precision (%) | Recall (%) | F1-Score (%) | Total Matches | Correct Matches |
|---|---|---|---|---|---|
| LILH | 65.85 | 35.61 | 43.88 | 131.45 | 86.56 |
| SLEM | 72.68 | 75.48 | 73.87 | 248.43 | 180.56 |
| Proposed | 92.87 | 76.09 | 83.18 | 197.40 | 183.33 |
Table 4. Average values of Precision, Recall, and F1-score, total number of matches, and number of correct matches for the three methods on the WV-3 dataset.

| Method | Precision (%) | Recall (%) | F1-Score (%) | Total Matches | Correct Matches |
|---|---|---|---|---|---|
| LILH | 67.91 | 53.13 | 59.57 | 615.33 | 417.87 |
| SLEM | 75.92 | 70.03 | 72.82 | 727.34 | 552.20 |
| Proposed | 90.55 | 82.35 | 86.18 | 717.32 | 649.53 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
