Search Results (373)

Search Parameters:
Keywords = stereo matching

20 pages, 8045 KiB  
Article
Estimation of Wind Turbine Blade Icing Volume Based on Binocular Vision
by Fangzheng Wei, Zhiyong Guo, Qiaoli Han and Wenkai Qi
Appl. Sci. 2025, 15(1), 114; https://doi.org/10.3390/app15010114 - 27 Dec 2024
Viewed by 230
Abstract
Icing on wind turbine blades in cold and humid weather has become a detrimental factor limiting their efficient operation, and traditional methods for detecting blade icing have various limitations. Therefore, this paper proposes a non-contact ice volume estimation method based on binocular vision and improved image processing algorithms. The method employs a stereo matching algorithm that combines dynamic windows, multi-feature fusion, and reordering, integrating gradient, color, and other information to generate matching costs. It utilizes a cross-based support region for cost aggregation and generates the final disparity map through a Winner-Take-All (WTA) strategy and multi-step optimization. Subsequently, combining image processing techniques and three-dimensional reconstruction methods, the geometric shape of the ice is modeled, and its volume is estimated using numerical integration methods. Experimental results on volume estimation show that for ice blocks with regular shapes, the errors between the measured and actual volumes are 5.28%, 8.35%, and 4.85%, respectively; for simulated icing on wind turbine blades, the errors are 5.06%, 6.45%, and 9.54%, respectively. The results indicate that the volume measurement errors under various conditions are all within 10%, meeting the experimental accuracy requirements for measuring the volume of ice accumulation on wind turbine blades. This method provides an accurate and efficient solution for detecting blade icing without the need to modify the blades, making it suitable for wind turbines already in operation. However, in practical applications, it may be necessary to consider the impact of illumination and environmental changes on visual measurements. Full article
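The abstract above names two concrete numerical steps: Winner-Take-All (WTA) disparity selection over an aggregated cost volume, and ice volume estimation by numerical integration. The following minimal sketch illustrates both in NumPy; it is not the authors' code, and the array shapes, pixel ground area, and thickness values are illustrative assumptions.

```python
import numpy as np

def wta_disparity(cost_volume: np.ndarray) -> np.ndarray:
    """Winner-Take-All: for each pixel, keep the disparity with the lowest aggregated cost.
    cost_volume has shape (H, W, D)."""
    return np.argmin(cost_volume, axis=2).astype(np.float32)

def integrate_volume(thickness_map: np.ndarray, cell_area_m2: float) -> float:
    """Approximate the ice volume by a rectangle rule: sum of per-pixel thickness (m)
    times the ground area covered by one pixel (m^2)."""
    return float(np.nansum(thickness_map) * cell_area_m2)

# Hypothetical usage with random data.
disparity = wta_disparity(np.random.rand(480, 640, 64))
print(disparity.shape)                                   # (480, 640)
print(integrate_volume(np.random.rand(480, 640) * 0.05,  # assumed thickness in metres
                       cell_area_m2=1e-4))
```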
Show Figures

Figure 1: Flowchart of the proposed algorithm.
Figure 2: Census transformation process.
Figure 3: Comparison of improved algorithms and traditional algorithms under the influence of noise.
Figure 4: Comparison between the traditional Census transform and the improved Census transform.
Figure 5: The process of fusing corner and edge information into a combined encoding.
Figure 6: Cross construction.
Figure 7: Cost aggregation diagram.
Figure 8: Middlebury stereo evaluation dataset. (a) The left image of the stereo pair and the ground truth map of the Cones dataset; (b) the left image of the stereo pair and the ground truth map of the Teddy dataset.
Figure 9: Schematic diagram of the experimental setup’s spatial layout.
Figure 10: Experimental results. (a1–a3) Photographs of the physical objects on the left; (b1–b3) disparity maps; (c1–c3) point cloud diagrams.
Figure 11: Experimental results. (a1–a3) Photographs of the physical objects on the left; (b1–b3) disparity maps; (c1–c3) point cloud diagrams.
24 pages, 6629 KiB  
Article
UnDER: Unsupervised Dense Point Cloud Extraction Routine for UAV Imagery Using Deep Learning
by John Ray Bergado and Francesco Nex
Remote Sens. 2025, 17(1), 24; https://doi.org/10.3390/rs17010024 - 25 Dec 2024
Viewed by 237
Abstract
Extraction of dense 3D geographic information from ultra-high-resolution unmanned aerial vehicle (UAV) imagery unlocks a great number of mapping and monitoring applications. This is facilitated by a step called dense image matching, which tries to find pixels corresponding to the same object within overlapping images captured by the UAV from different locations. Recent developments in deep learning utilize deep convolutional networks to perform this dense pixel correspondence task. A common theme in these developments is to train the network in a supervised setting using available dense 3D reference datasets. However, in this work we propose a novel unsupervised dense point cloud extraction routine for UAV imagery, called UnDER. We propose a novel disparity-shifting procedure to enable the use of a stereo matching network pretrained on an entirely different typology of image data in the disparity-estimation step of UnDER. Unlike previously proposed disparity-shifting techniques for forming cost volumes, the goal of our procedure was to address the domain shift between the images that the network was pretrained on and the UAV images, by using prior information from the UAV image acquisition. We also developed a procedure for occlusion masking based on disparity consistency checking that uses the disparity image space rather than the object space proposed in a standard 3D reconstruction routine for UAV data. Our benchmarking results demonstrated significant improvements in quantitative performance, reducing the mean cloud-to-cloud distance by approximately 1.8 times the ground sampling distance (GSD) compared to other methods. Full article
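As a rough illustration of the occlusion masking via disparity consistency checking mentioned above (not the UnDER implementation), the sketch below marks a pixel as consistent only when its left-image disparity agrees, within a threshold, with the disparity at the pixel it maps to in the right image; the threshold and the disparity sign convention are assumptions.

```python
import numpy as np

def occlusion_mask(disp_left: np.ndarray, disp_right: np.ndarray, eps: float = 1.0) -> np.ndarray:
    """True where the left and right disparities agree within eps (i.e., not occluded)."""
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))
    # A left pixel at column x with disparity d corresponds to column x - d in the right image.
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    disp_back = np.take_along_axis(disp_right, x_right, axis=1)
    return np.abs(disp_left - disp_back) <= eps

# Trivially consistent toy pair: every pixel passes the check.
print(occlusion_mask(np.full((4, 8), 2.0), np.full((4, 8), 2.0)).all())  # True
```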
Show Figures

Figure 1: An overview of the proposed UnDER framework consisting of three main steps: image rectification, disparity estimation, and triangulation. UnDER accepts the following as an input: undistorted UAV image pairs, camera interior and exterior orientation parameters, a disparity estimation network. UnDER produces, as a final output, a dense point cloud corresponding to the overlapping area of the image pairs.
Figure 2: An overview of the parallax attention stereo matching network used in the disparity estimation step of UnDER.
Figure 3: Comparison of self-attention and parallax attention. The similarity of the selected pixel (green) to other pixels (in different colors) is measured in the same feature map (self-attention), or in a feature map extracted from a paired right image (parallax attention).
Figure 4: Reference figure for defining disparity shifting. It shows the image planes of a stereo pair, a basis depth for deriving the disparity shift, the projection centers of the two cameras, the image points of the left principal point in both image planes, the corresponding object point lying on the basis depth, and the disparity of the left principal point.
Figure 5: Reference figure for the disparity consistency check. It shows how the occlusion mask is calculated by comparing output disparity maps by switching the base image in the image pairs. Images I′ and I″ are correspondingly captured at two different locations of the camera projection center, Z′ and Z″, and M is the output mask.
Figure 6: Dataset-1 of the UseGeo dataset: full extent of the dataset, a sample undistorted image, and a corresponding subset of the reference LiDAR point cloud (left to right). The area of the sample image is located in the yellow box annotated on the extent of Dataset-1.
Figure 7: The UAV-Nunspeet dataset: full extent of the dataset, a sample undistorted image, and a corresponding subset of the point cloud derived from Pix4D. The area of the sample image is located in the yellow box annotated on the extent of the dataset.
Figure 8: Subset of the UAV-Zeche-Zollern dataset: the extent of the subset and the corresponding reference Pix4D point cloud.
Figure 9: Plot showing the effect of varying the disparity shift ratio (δ) values used in the disparity-estimation step of the point cloud extraction routine. Each solid curve corresponds to a different δ value. The horizontal axis shows the base images used in each multi-stereo pair. The left vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. The dashed curve shows the mean baseline length of the image pairs used in the multi-stereo. The right vertical axis provides the range of values of the mean baseline length.
Figure 10: Plot showing the effect of varying the disparity difference threshold (ϵ) values used in the occlusion-masking step of the point cloud extraction routine. Each curve corresponds to a different ϵ value. The horizontal axis shows the base images used in each multi-stereo pair. The vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. A zoomed-in portion of the graph is included, to further highlight the differences in the setups with increasing ϵ.
Figure 11: Plot showing the effect of using a multi-stereo setup compared to a single-stereo setup in the triangulation step of the point cloud extraction routine. The first solid curve corresponds to the single-stereo setup while the second solid curve corresponds to the multi-stereo setup. The horizontal axis shows the base images used in each single-stereo or multi-stereo pair. The left vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. The dashed curve shows the mean absolute difference in κ values of the images used in each single-stereo and multi-stereo pair. The right vertical axis displays the range of the mean differences in κ angles.
Figure 12: A subset of the UseGeo Dataset-1 showing the UseGeo DIM point cloud and the mean cloud-to-cloud (C2C) distances of UnDER-P and UnDER-FN+FPCfilter (left to right) with respect to the reference LiDAR point cloud. The bottom row shows a zoomed-in portion of the subset from the top row, indicated by the yellow box. All C2C distances greater than 0.1 m are displayed in red, all C2C distances less than 0.02 m are displayed as blue, and everything in between is displayed in a gradient of green to yellow.
Figure 13: Histogram of mean C2C distance values of UseGeo DIM, UnDER-P, and UnDER-FN+FPCfilter. Values beyond 0.5 m were truncated for better visualization.
22 pages, 6639 KiB  
Article
Reliable Disparity Estimation Using Multiocular Vision with Adjustable Baseline
by Victor H. Diaz-Ramirez, Martin Gonzalez-Ruiz, Rigoberto Juarez-Salazar and Miguel Cazorla
Sensors 2025, 25(1), 21; https://doi.org/10.3390/s25010021 - 24 Dec 2024
Viewed by 227
Abstract
Accurate estimation of three-dimensional (3D) information from captured images is essential in numerous computer vision applications. Although binocular stereo vision has been extensively investigated for this task, its reliability is conditioned by the baseline between cameras. A larger baseline improves the resolution of disparity estimation but increases the probability of matching errors. This research presents a reliable method for disparity estimation through progressive baseline increases in multiocular vision. First, a robust rectification method for multiocular images is introduced, satisfying epipolar constraints and minimizing induced distortion. This method can improve rectification error by 25% for binocular images and 80% for multiocular images compared to well-known existing methods. Next, a dense disparity map is estimated by stereo matching from the rectified images with the shortest baseline. Afterwards, the disparity map for the subsequent images with an extended baseline is estimated within a short optimized interval, minimizing the probability of matching errors and further error propagation. This process is iterated until the disparity map for the images with the longest baseline is obtained. The proposed method increases disparity estimation accuracy by 20% for multiocular images compared to a similar existing method. The proposed approach enables accurate scene characterization and spatial point computation from disparity maps with improved resolution. The effectiveness of the proposed method is verified through exhaustive evaluations using well-known multiocular image datasets and physical scenes, achieving superior performance over similar existing methods in terms of objective measures. Full article
(This article belongs to the Collection Robotics and 3D Computer Vision)
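The core idea of the abstract above, estimating disparity for a longer baseline only within a short interval around the disparity scaled from the previous, shorter baseline, can be sketched as follows. This is an illustration under assumed data structures (a per-pixel cost function and dense disparity arrays), not the published method.

```python
import numpy as np

def refine_with_longer_baseline(disp_short, baseline_short, baseline_long, cost_fn, radius=2):
    """Scale the short-baseline disparity to the longer baseline and search only a
    small interval around it. cost_fn(y, x, d) is an assumed per-pixel matching cost
    for the long-baseline pair."""
    scale = baseline_long / baseline_short
    disp_long = np.zeros_like(disp_short, dtype=np.float64)
    h, w = disp_short.shape
    for y in range(h):
        for x in range(w):
            center = disp_short[y, x] * scale
            candidates = np.arange(center - radius, center + radius + 1.0)
            costs = [cost_fn(y, x, d) for d in candidates]
            disp_long[y, x] = candidates[int(np.argmin(costs))]
    return disp_long

# Toy usage: a cost that prefers disparity 10 everywhere.
toy_cost = lambda y, x, d: abs(d - 10.0)
print(refine_with_longer_baseline(np.full((2, 3), 5.0), 1.0, 2.0, toy_cost)[0, 0])  # 10.0
```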
Show Figures

Figure 1: Optical setup of a multiocular vision system.
Figure 2: Block diagram of the proposed PSO-based method for multiocular image rectification.
Figure 3: Diagram of the proposed method for disparity estimation with an adjustable baseline.
Figure 4: Stereo image rectification results. (a) Unrectified test images. Rectified images obtained using: (b) Fusiello et al. [42], (c) Juarez-Salazar et al. [27], (d) DSR [41], and (e) the proposed method.
Figure 5: Constructed laboratory platform for experiments. (a) Frontal view of the multiocular camera. (b) Side view of the multiocular camera. (c) Test scene captured by the experimental multiocular platform.
Figure 6: Multiocular image rectification results from a real scene captured with the experimental platform shown in Figure 5. (a) Unrectified input images. Rectified images obtained using: (b) the method of Yang et al. [44] and (c) the proposed method.
Figure 7: Disparity estimation results for multiocular images obtained with the proposed approach and the method by Li et al. [15]. (a) Reference image of the input multiocular image set. (b) Ground truth disparity map of the reference image with the largest baseline. Estimated disparity maps obtained with the proposed method for: (c) cameras 1 and 5; (d) cameras 1, 3, and 5; (e) cameras 1, 2, 3, and 5; (f) all images. (g) Estimated disparity obtained with the method by Li et al. [15].
Figure 8: Three-dimensional reconstruction results obtained with the proposed approach in real scenes captured with the experimental platform shown in Figure 5. (a) Reference images of the captured scenes. (b) Estimated disparity map obtained with the proposed approach between cameras 1 and 4. (c–e) Different perspective views of the reconstructed three-dimensional scenes.
Figure 9: Reprojection errors obtained with the estimated intrinsic parameters obtained using the calibration methods: (a) DLT. (b) Distorted pinhole. (c) Zhang’s method.
23 pages, 31563 KiB  
Article
Comparative Analysis of Deep Learning-Based Stereo Matching and Multi-View Stereo for Urban DSM Generation
by Mario Fuentes Reyes, Pablo d’Angelo and Friedrich Fraundorfer
Remote Sens. 2025, 17(1), 1; https://doi.org/10.3390/rs17010001 - 24 Dec 2024
Viewed by 362
Abstract
The creation of digital surface models (DSMs) from aerial and satellite imagery is often the starting point for different remote sensing applications. The two main approaches used for this task are stereo matching and multi-view stereo (MVS). The former needs stereo-rectified pairs as inputs and produces results in the disparity domain; the latter works with images from various perspectives and produces results in the depth domain. So far, both approaches have proven successful in producing accurate DSMs, especially with deep learning. Nonetheless, comparing the two is difficult because of differences in the input data, in the domain of the directly generated results, and in the evaluation metrics. In this manuscript, we processed synthetic and real optical data to be compatible with both stereo and MVS algorithms and applied these data to learning-based algorithms for both solutions. We designed the experimental setting to make the comparison between the algorithms as fair as possible. In particular, we looked at urban areas with high object densities and sharp boundaries, which pose challenges such as occlusions and depth discontinuities. Results show good performance overall for all experiments, with specific differences in the reconstructed objects, which we describe qualitatively and quantitatively. Moreover, we consider an additional case that fuses the results into a DSM using confidence estimation, showing a further improvement and opening up a possibility for further research. Full article
(This article belongs to the Section Urban Remote Sensing)
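The confidence-based fusion mentioned at the end of the abstract above can be illustrated with a small sketch: per pixel, the stack of height estimates is sorted by confidence, the least-confident fraction (rem%) is discarded, and the median of the remainder becomes the fused DSM value. Array shapes and the rem_pct parameter name are assumptions, not the paper's implementation.

```python
import numpy as np

def confidence_fusion(heights: np.ndarray, confidences: np.ndarray, rem_pct: float = 50.0) -> np.ndarray:
    """heights, confidences: (N, H, W) stacks from N reconstructions. Per pixel, drop the
    rem_pct least confident estimates and return the median of the rest."""
    n = heights.shape[0]
    keep = max(1, int(round(n * (1.0 - rem_pct / 100.0))))
    order = np.argsort(-confidences, axis=0)              # most confident first
    sorted_heights = np.take_along_axis(heights, order, axis=0)
    return np.median(sorted_heights[:keep], axis=0)

fused = confidence_fusion(np.random.rand(6, 4, 4) * 50.0, np.random.rand(6, 4, 4))
print(fused.shape)  # (4, 4)
```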
Show Figures

Figure 1: Selected geometry for SyntCities samples. All images lie on the same epipolar line with different baselines. There are 6 available views for each region on the surface. Baseline distances are given with respect to V1.
Figure 2: Dublin digital surface model obtained by merging all provided point clouds and used as ground truth. Blue areas are low objects and red areas are high objects.
Figure 3: Pipeline used to generate the Dublin dataset for both cases: Dublin_stereo and Dublin_MVS.
Figure 4: Dublin_stereo dataset samples. (a,d) are the left views for the corresponding (b,e) right views; (c,f) are the ground truth aligned with the left views. Bar scale for disparities is in pixels.
Figure 5: Dublin_MVS dataset samples. (a,c) are the reference views for the corresponding (b,d) ground truth. Bar scale for depth is in meters.
Figure 6: Selected geometry for Dublin samples. Images lie on a flight path with an approximate baseline of 100 m, but not on the same epipolar line.
Figure 7: Pipeline used to fuse the results of the predicted disparity/depth maps. In the case of the Stereo and MVS_Stereo methods, more results are available, but they use the same available information as the MVS_Full case. All results then follow the same steps, which include height conversion, orthorectification, and fusion.
Figure 8: Pipeline for confidence-based fusion. After estimating confidence maps along with the height maps obtained from the reconstruction algorithms, a stack of height maps is sorted based on the respective confidence values, and then we compute the median to get the final DSM.
Figure 9: DSMs and error maps for a SyntCities sample. For the reference image (e) with ground truth (a), we show the DSMs computed using the models Stereo_SC (b), MVS_Full_SC (c), and MVS_Stereo_SC (d). The respective 1 m error maps (e1m) for the same models are shown in (f–h). Scale bars for the DSMs and error maps are given as a reference and use meters as the unit. Errors are clipped to a maximum of 1 m. Regions in black correspond to pixels left undefined by the algorithms.
Figure 10: SyntCities computed DSMs, 3D view. For the same perspective given for the ground truth (a), we show the results for the models Stereo_SC (b), MVS_Full_SC (c), and MVS_Stereo_SC (d). It covers the same area as Figure 9. Height values are displayed in blue to red color from low to high.
Figure 11: DSMs and error maps for a Dublin sample. For ground truth (a), we show the DSMs computed using the models Stereo_Du (b), MVS_Full_Du (c), and MVS_Stereo_Du (d). The respective 1 m error maps (e1m) for the same models are shown in (f–h). Scale bars in meters for the DSMs and error maps are given as a reference. Errors are clipped to a maximum of 3 m. Regions in black correspond to pixels left undefined by the algorithms. The corresponding orthorectified RGB is not shown, as this was not provided in the original dataset for this region. Instead, we show an oblique image captured close to this region in (e). This image is not aligned with the results.
Figure 12: Dublin computed DSMs, 3D view. For the same perspective given for the ground truth (a), we show the results for the models Stereo_Du (b), MVS_Full_Du (c), and MVS_Stereo_Du (d). It covers the same area as Figure 11.
Figure 13: Dublin DSMs created with confidence-based fusion, Stereo case. We show cases for mean fusion without confidence (a), with rem% = 25 (b), and with rem% = 50 (c). Similar cases are presented for the median in (d–f). Scale bar for the error is given in meters. Yellow rectangles highlight areas with significant differences.
Figure 14: Generated DSMs for a Dublin region in a 3D representation, Stereo case. The region is the same as for Figure 13. We show three DSMs: ground truth, median fusion (not confidence based), and median fusion with rem% = 50. Changes are highlighted with the white rectangles.
18 pages, 43610 KiB  
Article
Reliable and Effective Stereo Matching for Underwater Scenes
by Lvwei Zhu, Ying Gao, Jiankai Zhang, Yongqing Li and Xueying Li
Remote Sens. 2024, 16(23), 4570; https://doi.org/10.3390/rs16234570 - 5 Dec 2024
Viewed by 578
Abstract
Stereo matching plays a vital role in underwater environments, where accurate depth estimation is crucial for applications such as robotics and marine exploration. However, underwater imaging presents significant challenges, including noise, blurriness, and optical distortions that hinder effective stereo matching. This study develops two specialized stereo matching networks: UWNet and its lightweight counterpart, Fast-UWNet. UWNet utilizes self- and cross-attention mechanisms alongside an adaptive 1D-2D cross-search to enhance cost volume representation and refine disparity estimation through a cascaded update module, effectively addressing underwater imaging challenges. Due to the need for timely responses in underwater operations by robots and other devices, real-time processing speed is critical for task completion. Fast-UWNet addresses this challenge by prioritizing efficiency, eliminating the reliance on the time-consuming recurrent updates commonly used in traditional methods. Instead, it directly converts the cost volume into a set of disparity candidates and their associated confidence scores. Adaptive interpolation, guided by content and confidence information, refines the cost volume to produce the final accurate disparity. This streamlined approach achieves an impressive inference speed of 0.02 s per image. Comprehensive tests conducted in diverse underwater settings demonstrate the effectiveness of both networks, showcasing their ability to achieve reliable depth perception. Full article
(This article belongs to the Special Issue Artificial Intelligence and Big Data for Oceanography)
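Fast-UWNet is described above as converting the cost volume directly into a set of disparity candidates with confidence scores instead of running recurrent updates. The sketch below shows one simple way to express that idea (the k lowest-cost disparities with a softmax confidence); it is an illustration, not the network's actual head.

```python
import numpy as np

def disparity_candidates(cost_volume: np.ndarray, k: int = 3):
    """cost_volume: (H, W, D). Returns the k lowest-cost disparities per pixel and a
    softmax 'confidence' over their (negated) costs."""
    idx = np.argsort(cost_volume, axis=2)[:, :, :k]
    costs = np.take_along_axis(cost_volume, idx, axis=2)
    logits = -costs
    conf = np.exp(logits - logits.max(axis=2, keepdims=True))
    conf /= conf.sum(axis=2, keepdims=True)
    return idx.astype(np.float32), conf

cands, conf = disparity_candidates(np.random.rand(2, 2, 32))
print(cands.shape, conf.sum(axis=2))  # (2, 2, 3) and an array of ones
```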
Show Figures

Figure 1: Visualization of the style-transferred images: (a) the original input image, (b) the input style image, (c) the style-transferred image after initial AdaIN processing, and (d) the image after non-local mean filtering.
Figure 2: The overview of UWNet. The feature extractor processes the input stereo image pair to obtain feature information and contextual features. The lowest-resolution features are refined through the coordination of self-attention and cross-attention mechanisms to achieve more accurate feature representations. These processed features are then used to compute the cost volume through a 1D-2D cross-search strategy. Finally, the contextual features and the cost volume are fed into a cascaded iterative refinement module to generate the final disparity map.
Figure 3: 1D-2D cross-search.
Figure 4: Overview of our Fast-UWNet. The core ideas are cost propagation and the affinity constraint. Disparity candidates and their confidence are generated based on cost propagation, and the affinity constraint effectively filters these candidates for improved accuracy.
Figure 5: The k-th propagation process.
Figure 6: Visualization of confidence maps.
Figure 7: The operational paradigm of the affinity constraint. Each yellow circle represents a pixel point in the feature map, and the relationship with neighboring pixels is calculated from the central pixel point outwards.
Figure 8: Visualization on the underwater dataset.
Figure 9: Visualization on the SQUID dataset.
Figure 10: Performance comparison between Fast-ACVNet and Fast-UWNet during training shows a clear trend: our method consistently outperforms the other in terms of EPE.
Figure 11: Visualization on the Scene Flow dataset.
20 pages, 4856 KiB  
Article
Enhancing the Ground Truth Disparity by MAP Estimation for Developing a Neural-Net Based Stereoscopic Camera
by Hanbit Gil, Sehyun Ryu and Sungmin Woo
Sensors 2024, 24(23), 7761; https://doi.org/10.3390/s24237761 - 4 Dec 2024
Viewed by 533
Abstract
This paper presents a novel method to enhance ground truth disparity maps generated by Semi-Global Matching (SGM) using Maximum a Posteriori (MAP) estimation. SGM, while not producing visually appealing outputs like neural networks, offers high disparity accuracy in valid regions and avoids the generalization issues often encountered with neural network-based disparity estimation. However, SGM struggles with occlusions and textureless areas, leading to invalid disparity values. Our approach, though relatively simple, mitigates these issues by interpolating invalid pixels using surrounding disparity information and Bayesian inference, improving both the visual quality of disparity maps and their usability for training neural network-based commercial depth-sensing devices. Experimental results validate that our enhanced disparity maps preserve SGM’s accuracy in valid regions while improving the overall performance of neural networks on both synthetic and real-world datasets. This method provides a robust framework for advanced stereoscopic camera systems, particularly in autonomous applications. Full article
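As a toy illustration of the MAP interpolation described above (not the authors' exact model), the sketch below fills an invalid SGM pixel by combining a Gaussian prior built from valid neighbouring disparities with an assumed photometric likelihood and taking the posterior maximum.

```python
import numpy as np

def map_fill(y, x, disp, valid, match_likelihood, d_range, half_win=8):
    """Fill the invalid pixel (y, x): Gaussian prior from valid neighbours times an
    assumed photometric likelihood, then take the posterior maximum."""
    win = disp[max(0, y - half_win):y + half_win + 1, max(0, x - half_win):x + half_win + 1]
    vwin = valid[max(0, y - half_win):y + half_win + 1, max(0, x - half_win):x + half_win + 1]
    neighbours = win[vwin]
    if neighbours.size == 0:
        return np.nan
    mu, sigma = neighbours.mean(), neighbours.std() + 1e-3
    prior = np.exp(-0.5 * ((d_range - mu) / sigma) ** 2)
    likelihood = np.array([match_likelihood(y, x, d) for d in d_range])
    return d_range[int(np.argmax(prior * likelihood))]

# Toy usage: neighbours all say 20, the (assumed) photometric term peaks at 22.
d_range = np.arange(64, dtype=float)
print(map_fill(5, 5, np.full((11, 11), 20.0), np.ones((11, 11), bool),
               lambda y, x, d: np.exp(-abs(d - 22.0) / 4.0), d_range))  # 20.0
```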
Show Figures

Figure 1: The proposed framework for enhancing SGM disparity map.
Figure 2: Example of left (a) and RGB (b) images.
Figure 3: (a) Disparity map generated by SGM for the images in Figure 1. (b) Enlarged view of (a). Dark blue pixels indicate “invalid” regions. The numbers shown represent disparity values for each grouped region.
Figure 4: (a) Prior probability, (b) Likelihood, and (c) Posterior distribution of an invalid pixel from Figure 3.
Figure 5: Preprocessing steps for the proposed method: (a) Original cropped patch, (b) Standardized patch, (c) Mask, and (d) Masked patch.
Figure 6: (a) Left masked cropped patch. (b) Right cropped candidate patches.
Figure 7: Disparity map comparisons on the synthetic Driving dataset across different scenes. (a) Ground truth, (b) SGM (U_th = 10), (c) SGM (U_th = 0), (d) Linear Interpolation, (e) Nearest Interpolation, (f) PDE, (g) ShCNN [56], (h) GMCNN [55], (i) MADF [58], (j) Chen [57], and (k) The proposed. Invalid regions are shown in darkish blue.
Figure 8: Example captured images of real-world indoor scenes.
Figure 9: Disparity map comparisons across different real-world scenes. (a) Input left images, (b) SGM (U_th = 10), (c) Linear Interpolation, (d) Nearest Interpolation, (e) PDE, (f) Shepard inpainting [56], (g) GMCNN [55], (h) MADF [58], (i) Chen [57], and (j) The proposed. The insets highlight areas with significant differences, particularly in challenging regions with occlusions and textureless surfaces. Invalid regions are shown in darkish blue.
Figure 10: Basic CNN-based model for disparity estimation.
Figure 11: ResNet-based model for disparity estimation. Note that the additional low-scale intermediate feature maps are used to capture structural information of disparity at the original size.
Figure 12: Vision Transformer-based model for disparity estimation. This model replaces the Encoder of the baseline model with a Vision Transformer (ViT) and modifies the Decoder from Figure 10 accordingly.
Figure 13: Disparity map comparisons across various scenes using different models. (a) GT, (b) CNN-based, (c) ResNet-based, (d) ViT-based, (e) PSMNet [50]. The first row of each scene is trained with the original SGM GT, and the second row with the proposed GT. Invalid regions are shown in darkish red.
Figure 14: Relationship between patch size S_p and both error and invalid pixel ratios for various prior window sizes S_w at a fixed intensity threshold I_th = 0.1. (a) shows how smaller values of S_w and larger values of S_p tend to minimize error, with an optimal configuration observed around S_w = 17 × 17 and S_p = 24 × 4. (b) demonstrates that smaller values of S_w generally lead to higher invalid pixel ratios, while smaller values of S_p help in reducing invalid pixel ratios. This indicates a trade-off in parameter selection between minimizing error and reducing invalid pixel ratios.
Figure 15: 3D visualization of error based on prior window size S_w and patch size S_p, with point size indicating the reciprocal of invalid pixel ratios.
29 pages, 30892 KiB  
Article
A Generalized Voronoi Diagram-Based Segment-Point Cyclic Line Segment Matching Method for Stereo Satellite Images
by Li Zhao, Fengcheng Guo, Yi Zhu, Haiyan Wang and Bingqian Zhou
Remote Sens. 2024, 16(23), 4395; https://doi.org/10.3390/rs16234395 - 24 Nov 2024
Viewed by 407
Abstract
Matched line segments are crucial geometric elements for reconstructing the desired 3D structure in stereo satellite imagery, owing to their advantages in spatial representation, complex shape description, and geometric computation. However, existing line segment matching (LSM) methods face significant challenges in effectively addressing co-linear interference and the misdirection of parallel line segments. To address these issues, this study proposes a “continuous–discrete–continuous” cyclic LSM method, based on the Voronoi diagram, for stereo satellite images. Initially, to compute the discrete line-point matching rate, line segments are discretized using the Bresenham algorithm, and the pyramid histogram of visual words (PHOW) feature is assigned to the line segment points which are detected using the line segment detector (LSD). Next, to obtain continuous matched line segments, the method combines the line segment crossing angle rate with the line-point matching rate, utilizing a soft voting classifier. Finally, local point-line homography models are constructed based on the Voronoi diagram, filtering out misdirected parallel line segments and yielding the final matched line segments. Extensive experiments on the challenging benchmark, WorldView-2 and WorldView-3 satellite image datasets, demonstrate that the proposed method outperforms several state-of-the-art LSM methods. Specifically, the proposed method achieves F1-scores that are 6.22%, 12.60%, and 18.35% higher than those of the best-performing existing LSM method on the three datasets, respectively. Full article
(This article belongs to the Section Remote Sensing Image Processing)
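Two ingredients named in the abstract above, discretizing line segments with the Bresenham algorithm and combining a line-point matching rate with a crossing-angle rate through soft voting, can be sketched as follows. The weighting and function names are illustrative assumptions rather than the paper's formulation.

```python
def bresenham(x0, y0, x1, y1):
    """Integer pixel coordinates along the segment (x0, y0)-(x1, y1)."""
    points = []
    dx, sx = abs(x1 - x0), (1 if x0 < x1 else -1)
    dy, sy = -abs(y1 - y0), (1 if y0 < y1 else -1)
    err = dx + dy
    while True:
        points.append((x0, y0))
        e2 = 2 * err
        if e2 >= dy:
            if x0 == x1:
                break
            err += dy
            x0 += sx
        if e2 <= dx:
            if y0 == y1:
                break
            err += dx
            y0 += sy
    return points

def soft_vote(point_match_rate, crossing_angle_rate, w_angle=0.3):
    """Weighted soft vote of the two rates; both are assumed to lie in [0, 1]."""
    return (1.0 - w_angle) * point_match_rate + w_angle * crossing_angle_rate

print(len(bresenham(0, 0, 10, 4)), soft_vote(0.8, 0.6))  # 11 discrete points, combined score 0.74
```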
Show Figures

Figure 1: Feature point matching and LSM. Matching points are derived from the affine scale-invariant feature transform (ASIFT) algorithm, and matching line segments are sourced from the proposed LSM method [7]. The matched feature points are connected by lines, and the points and line segments in the left image are marked in green, while those in the right image are marked in yellow.
Figure 2: Outline of the proposed LSM method.
Figure 3: An illustration of PHOW feature extraction.
Figure 4: Line segment retrieval based on polar constraints. The red line segments indicate the target line segments in the left view. The green line segments represent the candidate line segments, and the yellow line segments represent the other line segments in the right view. The purple dashed lines indicate the polar lines.
Figure 5: Schematic illustration of line segment matching based on the soft voting classifier. In the line graph of the right subfigure, the red curve indicates that CAR takes values in the range [0, 1] when the line segment intersection angle θ ∈ [0, π/6].
Figure 6: Voronoi diagrams. (a) Left view, (b) Right view. First row: results of ASIFT match point display. Feature points are represented by red dots, and the corresponding locations are marked with the same numerical labels. Second row: Voronoi diagram plotted on the satellite maps, where red dots indicate anchor points from ASIFT and blue edges denote the Voronoi diagram edges. Third row: Delaunay triangulation with Voronoi diagram, where red dots represent anchor points, red lines indicate Voronoi edges, and blue lines denote Delaunay edges. Additionally, the Voronoi diagram differences between the left and right images are highlighted with green boxes.
Figure 7: Local point-line homography geometry. In the figure, C and C′ denote the camera point locations, and p and p′ denote a pair of ASIFT matching points. The red line segment represents the target line segment in the left view; the green and yellow line segments denote the candidate corresponding line segments in the right view. The blue solid lines denote the Voronoi edges, and the blue dots denote the Voronoi seed points.
Figure 8: The 14 pairs of test images from the benchmark dataset. Numbers 1–14 represent different pairs of test images.
Figure 9: The 18 pairs of test areas from the WV-2 dataset. Numbers 1–18 represent different pairs of test images.
Figure 10: The 15 pairs of test areas from the WV-3 dataset. Numbers 1–15 represent different pairs of test images.
Figure 11: Precision, Recall, F1-score, and total number of matched line segments for the proposed method with CAR weights taking values of [0.1, 0.2, …, 0.9].
Figure 12: LSM results of the proposed method with different CAR classifier weight values. (a–d) The results for ω2 values of 0.1, 0.3, 0.5, and 0.7, respectively (only the left images are shown). Correctly matched line segments are highlighted in green, while incorrect line segments are highlighted in red.
Figure 13: Statistics of the LILH, SLEM, and proposed methods on the 14 test images of the benchmark dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 14: Actual matching results of the proposed method on the benchmark dataset. (a–f) The left images of representative matching results for the benchmark dataset. Mismatches are marked in red, while correct matches are marked in green. Precision and Recall values are also provided for each image.
Figure 15: Statistics of the LILH, SLEM, and proposed methods in the 18 test areas of the WV-2 dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 16: Actual matching results of the proposed method on the WV-2 dataset. (a–h) The left images of representative matching results for the WV-2 dataset. Mismatches are labeled in red, correct matches are labeled in green. Precision values are also provided for each image.
Figure 17: Intermediate results of the proposed method. The left figure shows the discrete matching of line segment points, with red dots indicating the matched line segment points in the left view, green dots indicating the matched line segment points in the right view, and yellow boxes indicating close-ups. The right figure shows the matching result for the corresponding line segment in the left figure, with the corresponding line segment labeled in red. The blue line segments indicate the LSD line segments to be matched.
Figure 18: Statistics of the LILH, SLEM, and proposed methods in the 15 test areas of the WV-3 dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 19: Actual matching results of the proposed method on the WV-3 dataset. (a–h) The left images of representative matching results for the WV-3 dataset. Mismatches are labeled in red, correct matches are labeled in green. Precision values are also provided for each image.
Figure 20: Actual runtime of the LILH, SLEM, and proposed methods on the WV-2 and WV-3 datasets. (a) Runtime on the WV-2 dataset, (b) Runtime on the WV-3 dataset.
18 pages, 6146 KiB  
Article
A Near-Infrared Imaging System for Robotic Venous Blood Collection
by Zhikang Yang, Mao Shi, Yassine Gharbi, Qian Qi, Huan Shen, Gaojian Tao, Wu Xu, Wenqi Lyu and Aihong Ji
Sensors 2024, 24(22), 7413; https://doi.org/10.3390/s24227413 - 20 Nov 2024
Viewed by 917
Abstract
Venous blood collection is a widely used medical diagnostic technique, and with rapid advancements in robotics, robotic venous blood collection has the potential to replace traditional manual methods. The success of this robotic approach is heavily dependent on the quality of vein imaging. In this paper, we develop a vein imaging device based on the simulation analysis of vein imaging parameters and propose a U-Net+ResNet18 neural network for vein image segmentation. The U-Net+ResNet18 neural network integrates the residual blocks from ResNet18 into the encoder of the U-Net to form a new neural network. ResNet18 is pre-trained using the Bootstrap Your Own Latent (BYOL) framework, and its encoder parameters are transferred to the U-Net+ResNet18 neural network, enhancing the segmentation performance of vein images with limited labelled data. Furthermore, we optimize the AD-Census stereo matching algorithm by developing a variable-weight version, which improves its adaptability to image variations across different regions. Results show that, compared to U-Net, the BYOL+U-Net+ResNet18 method achieves an 8.31% reduction in Binary Cross-Entropy (BCE), a 5.50% reduction in Hausdorff Distance (HD), a 15.95% increase in Intersection over Union (IoU), and a 9.20% increase in the Dice coefficient (Dice), indicating improved image segmentation quality. The average error of the optimized AD-Census stereo matching algorithm is reduced by 25.69%, yielding a clearly improved stereo matching performance. Future research will explore the application of the vein imaging system in robotic venous blood collection to facilitate real-time puncture guidance. Full article
(This article belongs to the Section Sensors and Robotics)
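The AD-Census cost that the paper optimizes with variable weights can be illustrated with a short sketch: an absolute-difference term and a Census Hamming-distance term are each normalized into [0, 1) and blended with a per-pixel weight. The lambda values and the specific weighting scheme below are assumptions, not the paper's.

```python
import numpy as np

def rho(cost, lam):
    """AD-Census style normalisation: maps a non-negative cost into [0, 1)."""
    return 1.0 - np.exp(-cost / lam)

def ad_census_cost(ad_cost, census_hamming, w_ad, lam_ad=10.0, lam_census=30.0):
    """ad_cost: absolute intensity/colour difference; census_hamming: Hamming distance
    between Census bit strings; w_ad: per-pixel weight in [0, 1] balancing the terms."""
    return w_ad * rho(ad_cost, lam_ad) + (1.0 - w_ad) * rho(census_hamming, lam_census)

# Toy usage: the same raw costs weighted as in a textureless region (low AD weight)
# and as in a textured region (high AD weight).
print(ad_census_cost(5.0, 12.0, w_ad=0.2), ad_census_cost(5.0, 12.0, w_ad=0.8))
```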
Show Figures

Figure 1: Schematic diagram of arm vein imaging.
Figure 2: Simulated NIR propagation through arm tissue. (a) Radial two-dimensional cross-section of the local arm model. The black rectangles represent the skin, subcutaneous tissue, and muscle layers, from top to bottom, while the circle represents the radial cross-section of the vein. (b) The ratio of photon densities at x = 2.00 mm. (c) The ratio of photon densities at y = 3.80 mm. (d) The simulation of photon density variation at an incident light wavelength of 850 nm. (e) Rectangular light source and light-receiving plane model. (f) Circular light source and light-receiving plane model. (g) The ratio of illuminance to mean illuminance on the x-axis.
Figure 3: Vein imaging device.
Figure 4: Schematic diagram of the vein imaging system for robotic venipuncture.
Figure 5: (a) U-Net+ResNet18 neural network. (b) Neural network pre-training and model parameter migration.
Figure 6: Cross-based cost aggregation. (a) Cross-based regions and support regions; the cross shadows represent the cross-based regions, and the other shadows represent the support regions. (b) Horizontal aggregation; the blue arrows represent the aggregation direction. (c) Vertical aggregation.
Figure 7: Vein image random transformation. (a) Original NIR vein image. (b,c) The vein image after random transformation.
Figure 8: The variation of the loss function with epoch.
Figure 9: NIR vein image segmentation results. (a) Original NIR vein images. (b) Segmentation results using the Hessian matrix. (c) Segmentation results using the BYOL+U-Net+ResNet18 method. (d) Image binarization effect. (e) The labels corresponding to the original images.
Figure 10: Variation of each neural network model metric with epochs. (a) Variation of BCE with epochs. (b) Variation of IoU with epochs. (c) Variation of Dice with epochs. (d) Variation of HD with epochs.
Figure 11: Vein centerline extraction. (a) Pre-processed NIR greyscale map of veins. (b) Vein centerline extracted by the algorithm proposed in this paper. (c) The image after connecting and eliminating small connected regions using the contour connection algorithm (see the red circles).
Figure 12: Comparison of results of stereo matching algorithms. (a) Left image. (b) Right image. (c) Disparity map of the AD-Census algorithm. (d) Disparity map of the optimized AD-Census algorithm.
Figure 13: Vein image visualization process. (a) Original vein image collected by the camera. (b) Vein centerline extraction results. (c) Vein image segmentation results. (d) Disparity map.
16 pages, 4359 KiB  
Article
Adaptive Kernel Convolutional Stereo Matching Recurrent Network
by Jiamian Wang, Haijiang Sun and Ping Jia
Sensors 2024, 24(22), 7386; https://doi.org/10.3390/s24227386 - 20 Nov 2024
Viewed by 516
Abstract
Among binocular stereo matching techniques, the most advanced methods currently use an iterative structure based on GRUs. Methods in this class have shown high performance on both high-resolution images and standard benchmarks. However, simply replacing cost aggregation with GRU iterations leaves the original cost volume used for disparity calculation lacking non-local geometric and contextual information. To address this, this paper proposes a new GRU-iteration-based adaptive kernel convolution deep recurrent network architecture for stereo matching. It introduces a kernel convolution-based adaptive multi-scale pyramid pooling (KAP) module that fully considers the spatial correlation between pixels, and adds a new matching attention (MAR) module to refine the matching cost volume before it enters the iterative network for updates, enhancing the pixel-level representation ability of the image and improving the overall generalization ability of the network. The proposed AKC-Stereo network improves on the base network: on the Scene Flow dataset, the EPE of AKC-Stereo reaches 0.45, an improvement of 0.02 over the base network, and on the KITTI 2015 dataset, AKC-Stereo outperforms the base network by 5.6% on the D1-all metric. Full article
(This article belongs to the Section Sensor Networks)
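The matching correlation volume that AKC-Stereo builds by a grouping correlation method (see Figure 2 below) can be illustrated with a minimal NumPy sketch: feature channels are split into groups, and each group contributes one correlation value per disparity. Shapes and the group count are illustrative assumptions, not the network's configuration.

```python
import numpy as np

def groupwise_correlation(feat_l, feat_r, max_disp, groups=8):
    """feat_l, feat_r: (C, H, W) feature maps; returns a (groups, max_disp, H, W) volume
    where each group contributes one correlation value per disparity."""
    c, h, w = feat_l.shape
    fl = feat_l.reshape(groups, c // groups, h, w)
    volume = np.zeros((groups, max_disp, h, w), dtype=np.float32)
    for d in range(max_disp):
        fr = np.roll(feat_r, d, axis=2)          # shift right-image features by d pixels
        fr[:, :, :d] = 0.0                       # columns with no valid correspondence
        volume[:, d] = (fl * fr.reshape(groups, c // groups, h, w)).mean(axis=1)
    return volume

vol = groupwise_correlation(np.random.rand(32, 8, 16), np.random.rand(32, 8, 16), max_disp=4)
print(vol.shape)  # (8, 4, 8, 16)
```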
Show Figures

Figure 1: (a) Performance comparison between the AKC-Stereo network proposed in this article and the base network on the KITTI 2015 dataset as the number of iterations changes; num_steps is set to 20,000, the abscissa is the number of iterations, and the ordinate is the EPE endpoint error. The orange part represents the performance of the base network, IGEV-Stereo, as the number of iterations changes, and the blue part represents the performance of the AKC-Stereo network. (b) Effect of smaller batch sizes on the performance of the proposed AKC-Stereo network (orange lines), RAFT-Stereo (green lines), and IGEV-Stereo (yellow lines) on the KITTI 2015 dataset with equal training rounds. The abscissa is the batch size, and the ordinate is the EPE endpoint error.
Figure 2: The overall structure diagram of the proposed AKC-Stereo. AKC-Stereo first constructs a multi-scale adaptive feature extractor (KAP), then computes the matching correlation volume by a grouping correlation method, and then preliminarily refines it using the matching attention (MAR) module. Finally, the refined correlation volume is fused with the context-encoded features obtained through residual blocks, and the combined data are fed into the GRU iteration for further optimization through iterative updates.
Figure 3: (a) The sampling search window with a normal dilation rate of 3 for feature extraction; (b) the positions to which the sampling points should be shifted after adding offsets; (c) the more precise sampling search window obtained after the offsets are applied.
Figure 4: Some adaptive search windows with varying dilation rates for a 3 × 3 convolutional kernel. Specifically, (a) shows the adaptive search window with a dilation rate of 1; (b) displays the adaptive search window with a dilation rate of 3; (c) presents the adaptive search window with a dilation rate of 4; and (e) shows the fusion of features into four feature maps by taking three layers of (a–c), three layers of (b–d), three layers of (a,b,d), and three layers of (a,c,d), respectively, after which a residual connection operation is performed to obtain the final feature map.
Figure 5: The concrete structure diagram of the KAP module.
Figure 6: The pink cube on the left is the 3D correlation volume of size W × H × D calculated by correlation. The cubes in the middle, positioned above and below, represent the disparity-level attention calculated for the W_{d_i} × H_{d_i} plane and the epipolar-line-level attention calculated for the W_{H_x} × d_{H_x} plane, respectively. The blue cube on the right illustrates the refined correlation volume after being processed by the MAR matching attention module. This refined correlation volume is then fed into the GRU for iterative updates to further optimize the disparity map.
Figure 7: Qualitative results for Middlebury. The first column shows the original images (left images) in the dataset, and the second and third columns show the results of IGEV-Stereo and the AKC-Stereo network proposed in this article, respectively. The proposed network exhibits better results for detailed parts as well as regions with a background.
Figure 8: Qualitative results on the KITTI 2015 dataset. The first column shows the left image of the original image in the dataset, the second column shows the results of the baseline IGEV-Stereo, and the third column shows the results of the AKC-Stereo network. From the direction indicated by the arrows, it can be seen that the AKC-Stereo network proposed in this article performs exceptionally well in areas with high light reflection, such as signs, and areas with thin structures, such as railings.
21 pages, 7841 KiB  
Article
Research on a Method for Measuring the Pile Height of Materials in Agricultural Product Transport Vehicles Based on Binocular Vision
by Wang Qian, Pengyong Wang, Hongjie Wang, Shuqin Wu, Yang Hao, Xiaoou Zhang, Xinyu Wang, Wenyan Sun, Haijie Guo and Xin Guo
Sensors 2024, 24(22), 7204; https://doi.org/10.3390/s24227204 - 11 Nov 2024
Viewed by 665
Abstract
The advancement of unloading technology in combine harvesting is crucial for the intelligent development of agricultural machinery. Accurately measuring material pile height in transport vehicles is essential, as uneven accumulation can lead to spillage and voids, reducing loading efficiency. Relying solely on manual observation for measuring stack height can decrease harvesting efficiency and pose safety risks due to driver distraction. This research applies binocular vision to agricultural harvesting, proposing a novel method that uses a stereo matching algorithm to measure material pile height during harvesting. By comparing distance measurements taken in both empty and loaded states, the method determines stack height. A linear regression model processes the stack height data, enhancing measurement accuracy. A binocular vision system was established, applying Zhang’s calibration method on the MATLAB (R2019a) platform to correct camera parameters, achieving a calibration error of 0.15 pixels. The study implemented block matching (BM) and semi-global block matching (SGBM) algorithms using the OpenCV (4.8.1) library on the PyCharm (2020.3.5) platform for stereo matching, generating disparity, and pseudo-color maps. Three-dimensional coordinates of key points on the piled material were calculated to measure distances from the vehicle container bottom and material surface to the binocular camera, allowing for the calculation of material pile height. Furthermore, a linear regression model was applied to correct the data, enhancing the accuracy of the measured pile height. The results indicate that by employing binocular stereo vision and stereo matching algorithms, followed by linear regression, this method can accurately calculate material pile height. The average relative error for the BM algorithm was 3.70%, and for the SGBM algorithm, it was 3.35%, both within the acceptable precision range. While the SGBM algorithm was, on average, 46 ms slower than the BM algorithm, both maintained errors under 7% and computation times under 100 ms, meeting the real-time measurement requirements for combine harvesting. In practical operations, this method can effectively measure material pile height in transport vehicles. The choice of matching algorithm should consider container size, material properties, and the balance between measurement time, accuracy, and disparity map completeness. This approach aids in manual adjustment of machinery posture and provides data support for future autonomous master-slave collaborative operations in combine harvesting. Full article
(This article belongs to the Special Issue AI, IoT and Smart Sensors for Precision Agriculture)
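As a rough illustration of the measurement pipeline described in the abstract (rectified stereo pair, SGBM disparity, reprojection to 3D, and an empty-versus-loaded depth comparison), the OpenCV sketch below may help; the SGBM parameter values, the ROI handling, and the empty_depth_m input are illustrative assumptions rather than the authors' settings.

```python
import cv2
import numpy as np

def measure_pile_height(img_left, img_right, Q, roi, empty_depth_m):
    """Pile-height estimate from a rectified stereo pair (minimal sketch).

    Q is the 4x4 reprojection matrix from cv2.stereoRectify; roi is a
    (y0, y1, x0, x1) window over the pile surface; empty_depth_m is the
    camera-to-container-bottom distance measured once in the empty state.
    All parameter values below are illustrative, not the paper's settings.
    """
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,          # must be a multiple of 16
        blockSize=7,
        P1=8 * 3 * 7 ** 2,
        P2=32 * 3 * 7 ** 2,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    disp = sgbm.compute(img_left, img_right).astype(np.float32) / 16.0
    points_3d = cv2.reprojectImageTo3D(disp, Q)        # metric XYZ per pixel

    y0, y1, x0, x1 = roi
    z = points_3d[y0:y1, x0:x1, 2]
    z = z[np.isfinite(z) & (z > 0)]
    surface_depth_m = float(np.median(z))              # camera-to-pile distance
    return empty_depth_m - surface_depth_m             # pile height
```

A BM variant would only swap cv2.StereoSGBM_create for cv2.StereoBM_create; a sketch of the linear correction applied to the returned heights appears after Figure 14 below.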
Figure 1: Principle of triangulation.
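For reference, the standard rectified-stereo triangulation relation illustrated by the figure, with focal length f, baseline B, disparity d, and principal point (c_x, c_y), is the textbook one below (not an equation quoted from the paper):

```latex
Z = \frac{f\,B}{d}, \qquad
X = \frac{(u - c_x)\,Z}{f}, \qquad
Y = \frac{(v - c_y)\,Z}{f}
```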
Figure 2">
Figure 2: Zhang's calibration steps.
Figure 3">
Figure 3: Corner extraction results for the checkerboard. (a) Calibration paper; (b) calibration plate.
Figure 4: Relative position between the binocular camera and the calibration board. (a) Calibration paper; (b) calibration plate.
Figure 5: Reprojection errors of the chessboard calibration. (a) Calibration paper; (b) calibration plate.
Figure 6: Epipolar correction. (a) Before correction; (b) after correction.
Figure 7: Basic workflow of stereo matching.
Figure 8: Method for measuring the height of piled materials.
Figure 9: The process of measuring the piled height of potatoes.
Figure 10: Images under no-load conditions. (a) Left image; (b) right image; (c) BM disparity map; (d) BM pseudo-color map; (e) SGBM disparity map; (f) SGBM pseudo-color map.
Figure 11: Images of three different load conditions.
Figure 12: Distance measurement results between the surface of stacked potatoes and the stereo camera under three different conditions. (a) State 1; (b) state 2; (c) state 3.
Figure 13: Regression model and evaluation metrics. (a) BM measurement values and calibrated values; (b) SGBM measurement values and calibrated values; (c) residual plot of the BM regression model; (d) residual plot of the SGBM regression model.
Figure 14: Comparison of pile heights and errors before and after calibration.
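The regression correction applied to the measured heights can be sketched in a few lines of NumPy; the numbers below are made up purely for illustration and are not the paper's data or fitted coefficients.

```python
import numpy as np

# Measured vs. reference pile heights (mm); made-up numbers, illustration only.
measured = np.array([182.0, 240.0, 305.0, 361.0])
reference = np.array([175.0, 231.0, 298.0, 352.0])

# Fit h_true ~ a * h_measured + b and apply the correction.
a, b = np.polyfit(measured, reference, deg=1)
corrected = a * measured + b
```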
">
17 pages, 13227 KiB  
Article
Robot Localization Method Based on Multi-Sensor Fusion in Low-Light Environment
by Mengqi Wang, Zengzeng Lian, María Amparo Núñez-Andrés, Penghui Wang, Yalin Tian, Zhe Yue and Lingxiao Gu
Electronics 2024, 13(22), 4346; https://doi.org/10.3390/electronics13224346 - 6 Nov 2024
Viewed by 740
Abstract
When robots perform localization in indoor low-light environments, factors such as weak and uneven lighting can degrade image quality. This degradation results in a reduced number of feature extractions by the visual odometry front end and may even cause tracking loss, thereby impacting the algorithm’s positioning accuracy. To enhance the localization accuracy of mobile robots in indoor low-light environments, this paper proposes a visual inertial odometry method (L-MSCKF) based on the multi-state constraint Kalman filter. Addressing the challenges of low-light conditions, we integrated Inertial Measurement Unit (IMU) data with stereo vision odometry. The algorithm includes an image enhancement module and a gyroscope zero-bias correction mechanism to facilitate feature matching in stereo vision odometry. We conducted tests on the EuRoC dataset and compared our method with other similar algorithms, thereby validating the effectiveness and accuracy of L-MSCKF. Full article
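The image enhancement module referred to above combines homomorphic filtering with CLAHE (the figure captions below compare the two and their combination). The sketch here shows one way such a module can be put together; the Gaussian transfer function, the cutoff sigma, and the default gains are assumptions chosen only to mirror the high-frequency gain, low-frequency gain, and contrast-threshold parameters discussed in Figure 2, not the authors' exact filter.

```python
import cv2
import numpy as np

def enhance_low_light(gray, gain_high=1.6, gain_low=0.3, clip_limit=4.0):
    """Low-light enhancement sketch: homomorphic filtering, then CLAHE.

    gain_high, gain_low, and clip_limit mirror the high-frequency gain,
    low-frequency gain, and contrast threshold discussed for the module;
    the Gaussian transfer function and cutoff below are assumptions.
    """
    rows, cols = gray.shape
    log_img = np.log1p(gray.astype(np.float32) / 255.0)

    # Frequency-domain transfer function: boost high frequencies, damp low ones.
    u = np.arange(rows) - rows / 2.0
    v = np.arange(cols) - cols / 2.0
    V, U = np.meshgrid(v, u)
    d2 = U ** 2 + V ** 2
    sigma = 30.0                                        # cutoff, illustrative
    H = (gain_high - gain_low) * (1.0 - np.exp(-d2 / (2.0 * sigma ** 2))) + gain_low

    spec = np.fft.fftshift(np.fft.fft2(log_img))
    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(spec * H)))
    homomorphic = np.expm1(filtered)
    homomorphic = cv2.normalize(homomorphic, None, 0, 255, cv2.NORM_MINMAX)
    homomorphic = homomorphic.astype(np.uint8)

    # Contrast-limited adaptive histogram equalization on the filtered image.
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(8, 8))
    return clahe.apply(homomorphic)
```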
Figure 1: Algorithm procedure.
Figure 2">
Figure 2: Selection of image enhancement algorithm parameters. (a) Low-frequency gain fixed at 0.5, sharpening coefficient at 1, and contrast threshold at 4. (b) High-frequency gain fixed at 1.6, sharpening coefficient at 1, and contrast threshold at 4. (c) High-frequency gain set to 1.6, low-frequency gain to 0.3, and contrast threshold to 4. (d) High-frequency gain fixed at 1.6, low-frequency gain at 0.3, and sharpening coefficient at 1.5.
Figure 3: Comparison of feature point extraction. (a) Feature point extraction on the original image. (b) After CLAHE processing. (c) After homomorphic filtering. (d) After both CLAHE and homomorphic filtering.
Figure 4: Estimation of gyroscope bias coefficients on the MH02 and V203 sequences. (a) Variation in gyroscope bias for L-MSCKF and MSCKF-VIO on the MH02 sequence. (b) Estimated gyroscope bias values for L-MSCKF and MSCKF-VIO on the V203 sequence.
Figure 5: Trajectories of the algorithm on sequences V103 and V203 of the EuRoC dataset. (a) Trajectory on the V103 sequence. (b) X, Y, and Z triaxial values on the V103 sequence. (c) Trajectory on the V203 sequence. (d) X, Y, and Z triaxial values on the V203 sequence.
Figure 6: Comparison of absolute trajectory errors of each algorithm on the low-light sequence V203.
Figure 7: Comparison of the computational efficiency of each algorithm. (a) Average CPU usage, as a percentage of the total available CPU, for each algorithm running the same experiment. (b) Total running time of each algorithm on the same dataset.
17 pages, 3301 KiB  
Article
Stereo and LiDAR Loosely Coupled SLAM Constrained Ground Detection
by Tian Sun, Lei Cheng, Ting Zhang, Xiaoping Yuan, Yanzheng Zhao and Yong Liu
Sensors 2024, 24(21), 6828; https://doi.org/10.3390/s24216828 - 24 Oct 2024
Viewed by 811
Abstract
In many robotic applications, creating a map is crucial, and 3D maps provide a method for estimating the positions of other objects or obstacles. Most of the previous research processes 3D point clouds through projection-based or voxel-based models, but both approaches have certain limitations. This paper proposes a hybrid localization and mapping method using stereo vision and LiDAR. Unlike the traditional single-sensor systems, we construct a pose optimization model by matching ground information between LiDAR maps and visual images. We use stereo vision to extract ground information and fuse it with LiDAR tensor voting data to establish coplanarity constraints. Pose optimization is achieved through a graph-based optimization algorithm and a local window optimization method. The proposed method is evaluated using the KITTI dataset and compared against the ORB-SLAM3, F-LOAM, LOAM, and LeGO-LOAM methods. Additionally, we generate 3D point cloud maps for the corresponding sequences and high-definition point cloud maps of the streets in sequence 00. The experimental results demonstrate significant improvements in trajectory accuracy and robustness, enabling the construction of clear, dense 3D maps. Full article
(This article belongs to the Section Navigation and Positioning)
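The ground information used for the coplanarity constraints is extracted from stereo with u-/v-disparity processing (see Figure 3 below). The following sketch shows only the basic v-disparity idea: accumulate a row-versus-disparity histogram, fit the dominant line, and keep pixels near it as ground. The least-squares fit and thresholds are assumptions; the paper's pipeline additionally removes large obstacles via the u-disparity map before fitting.

```python
import numpy as np

def ground_mask_from_v_disparity(disparity, max_disp=128, min_votes=50, tol=2.0):
    """Rough ground detection from a disparity map via v-disparity (sketch).

    For a roughly planar road, ground pixels trace an oblique line in the
    row-vs-disparity (v-disparity) histogram; pixels close to that line are
    kept as ground. The line fit and thresholds here are assumptions.
    """
    h, w = disparity.shape
    valid = (disparity > 0) & (disparity < max_disp)
    rows, cols = np.nonzero(valid)
    d = disparity[rows, cols].astype(np.int64)

    v_disp = np.zeros((h, max_disp), dtype=np.int64)
    np.add.at(v_disp, (rows, d), 1)                    # accumulate histogram

    # Keep, per image row, the dominant disparity bin and fit a line to it.
    ys = [v for v in range(h) if v_disp[v].max() >= min_votes]
    ds = [int(np.argmax(v_disp[v])) for v in ys]
    slope, intercept = np.polyfit(ys, ds, deg=1)       # d_ground(v) = slope*v + b

    expected = slope * np.arange(h)[:, None] + intercept
    return valid & (np.abs(disparity - expected) < tol)
```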
Figure 1: Pose optimization based on ground information. T, p, and q represent the transformation matrix, points on the plane, and points off the plane, respectively.
Figure 2">
Figure 2: The stereo sensor model and the coordinate systems used [34].
Figure 3: Region of interest extraction. (a) Left image. (b) Right image. (c) Disparity image. (d) v-disparity. (e) u-disparity; (d,e) are derived from (c). (f) Large obstacles removed by removing peak values from (e). (g) v-disparity based on (f); the red line is the disparity profile of the ground plane. (h) Detected ground plane and region of interest (RoI); the RoI is in the red box. (i) City 3D reconstruction; green represents ground.
Figure 4: Graph-structure optimization. P represents the nodes of visual points, and X represents the pose of the frame. “Ground” denotes the ground information extracted from the 3D reconstruction.
Figure 5: Trajectory estimates on the KITTI dataset. (a) 00. (b) 01. (c) 05. (d) 07. (e) 08. (f) 09.
Figure 6: High-definition point clouds for some streets in the 00 sequence. (a) 00. (b) 01. (c) 05. (d) 07. (e) 08. (f) 09.
Figure 7: 3D reconstruction based on road constraints, where green represents the road. (a) 00. (b) 01. (c) 05. (d) 07. (e) 08. (f) 09.
Figure 8: High-definition point clouds for some streets in the 00 sequence. The image in the top left corner is a 3D reconstruction of the entire city, and the other images depict details of its streets (a–e).
15 pages, 8542 KiB  
Article
The Adversarial Robust and Generalizable Stereo Matching for Infrared Binocular Based on Deep Learning
by Bowen Liu, Jiawei Ji, Cancan Tao, Jujiu Li and Yingxun Wang
J. Imaging 2024, 10(11), 264; https://doi.org/10.3390/jimaging10110264 - 22 Oct 2024
Viewed by 829
Abstract
Despite the considerable success of deep learning methods in stereo matching for binocular images, the generalizability and robustness of these algorithms, particularly under challenging conditions such as occlusions or degraded infrared textures, remain uncertain. This paper presents a novel deep-learning-based depth optimization method that obviates the need for large infrared image datasets and adapts seamlessly to any specific infrared camera. Moreover, this adaptability extends to standard binocular images, allowing the method to work effectively on both infrared and visible light stereo images. We further investigate the role of infrared textures in a deep learning framework, demonstrating their continued utility for stereo matching even in complex lighting environments. To compute the matching cost volume, we apply the multi-scale census transform to the input stereo images. A stacked sand leak subnetwork is subsequently employed to address the matching task. Our approach substantially improves adversarial robustness while maintaining accuracy, reducing the end-point error (EPE) by nearly half compared with state-of-the-art methods in quantitative evaluations on widely used autonomous driving datasets. Furthermore, the proposed method exhibits superior generalization capabilities, transitioning from simulated datasets to real-world datasets without the need for fine-tuning. Full article
(This article belongs to the Special Issue Deep Learning in Computer Vision)
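Since the matching cost here is built from a census transform, a compact NumPy sketch of a single-scale census transform and its Hamming-distance cost is given below; the window size, bit packing, and popcount trick are illustrative choices, and a multi-scale variant would repeat the transform at several window sizes or resolutions and sum the costs.

```python
import numpy as np

def census_transform(img, win=5):
    """Census transform with a win x win window (minimal sketch).

    Each pixel becomes a bit string recording whether each neighbor is darker
    than the center, so the matching cost (a Hamming distance) is robust to
    monotonic intensity changes such as those in low-texture infrared imagery.
    """
    h, w = img.shape
    r = win // 2
    pad = np.pad(img.astype(np.int32), r, mode="edge")
    code = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = pad[r + dy: r + dy + h, r + dx: r + dx + w]
            code = (code << np.uint64(1)) | (neighbor < img).astype(np.uint64)
    return code

def census_cost(code_l, code_r, d, n_bits=24):
    """Hamming cost between left pixels and right pixels shifted by disparity d.

    n_bits = win * win - 1 (24 for a 5 x 5 window); it is also used as the
    worst-case cost for columns that have no counterpart in the right image.
    """
    h, w = code_l.shape
    cost = np.full((h, w), n_bits, dtype=np.int32)
    x = code_l[:, d:] ^ code_r[:, : w - d]
    # Popcount of each uint64 via its eight bytes (NumPy-only trick).
    bits = np.unpackbits(x.view(np.uint8).reshape(h, w - d, 8), axis=-1)
    cost[:, d:] = bits.sum(axis=-1)
    return cost
```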
Figure 1: Workflow.
Figure 2">
Figure 2: Comparison of performance on standard datasets.
Figure 3: Illustration of the adversarial patch attack. The first row shows an image from KITTI 2015 with its ground truth, attacked by the PGD method. The remaining rows show the results of our method, LEAStereo, PSMNet, and GANet; the first and third of these rows are depth maps, and the second and fourth rows are the error maps of those depth maps against the ground truth. In the depth maps produced by our method, the colors run from light to dark (white, green, pink, red, blue, dark), corresponding to near to far.
Figure 4: Depth prediction map given by the baseline and our D. w/o b. method.
Figure 5: Depth prediction results in a real complex environment captured by an infrared camera.
Figure A1: Sample of the real infrared binocular data captured for training the model.
Figure A2: Laser depth map used for dataset annotation.
Figure A3: Sample of the annotated infrared stereo dataset (ground truth).
Figure A4: The PGD attack applied (the pixels are generated randomly).
19 pages, 29661 KiB  
Article
High-Precision Disparity Estimation for Lunar Scene Using Optimized Census Transform and Superpixel Refinement
by Zhen Liang, Hongfeng Long, Zijian Zhu, Zifei Cao, Jinhui Yi, Yuebo Ma, Enhai Liu and Rujin Zhao
Remote Sens. 2024, 16(21), 3930; https://doi.org/10.3390/rs16213930 - 22 Oct 2024
Viewed by 561
Abstract
High-precision lunar scene 3D data are essential for lunar exploration and the construction of scientific research stations. Currently, most existing data from orbital imagery offers resolutions up to 0.5–2 m, which is inadequate for tasks requiring centimeter-level precision. To overcome this, our research focuses on using in situ stereo vision systems for finer 3D reconstructions directly from the lunar surface. However, the scarcity and homogeneity of available lunar surface stereo datasets, combined with the Moon’s unique conditions—such as variable lighting from low albedo, sparse surface textures, and extensive shadow occlusions—pose significant challenges to the effectiveness of traditional stereo matching techniques. To address the dataset gap, we propose a method using Unreal Engine 4 (UE4) for high-fidelity physical simulation of lunar surface scenes, generating high-resolution images under realistic and challenging conditions. Additionally, we propose an optimized cost calculation method based on Census transform and color intensity fusion, along with a multi-level super-pixel disparity optimization, to improve matching accuracy under harsh lunar conditions. Experimental results demonstrate that the proposed method exhibits exceptional robustness and accuracy in our soon-to-be-released multi-scene lunar dataset, effectively addressing issues related to special lighting conditions, weak textures, and shadow occlusion, ultimately enhancing disparity estimation accuracy. Full article
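The "Census transform and color intensity fusion" cost mentioned in the abstract is commonly realized with an AD-Census-style exponential normalization, which maps each raw cost into [0, 1) before summing so that neither term dominates. The snippet below sketches that fusion only; the lambda values are illustrative, and whether the paper uses exactly this weighting is an assumption.

```python
import numpy as np

def fused_matching_cost(cost_census, cost_color, lam_census=30.0, lam_color=10.0):
    """Fuse Census and color-intensity costs into a single matching cost.

    The 1 - exp(-c / lambda) normalization maps each raw cost into [0, 1)
    so that neither term dominates; the lambda values are illustrative.
    """
    c_census = 1.0 - np.exp(-cost_census / lam_census)
    c_color = 1.0 - np.exp(-cost_color / lam_color)
    return c_census + c_color
```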
Figure 1: Lunar terrain simulation model. (a) Lunar surface terrain mesh refinement model. The red area is 10 km × 10 km, with each square grid measuring 25.6 cm. The blue area is 25 km × 25 km, with each square grid measuring 2.56 m. The largest area is 40 km × 40 km, with each square grid measuring 256 m. (b) Lunar terrain camera rendering.
Figure 2">
Figure 2: Wireframe view of scene objects.
Figure 3: Lighting model comparison. (a) Original lighting model. (b) Our improved lighting model.
Figure 4: Example from the lunar scene dataset created. (a) Left-camera image of the lunar scene. (b) Right-camera image of the lunar scene. (c) Ground-truth disparity map for the left-camera image. (d) Ground-truth depth map for the left-camera image.
Figure 5: Example results of the traditional stereo matching algorithms BM and SAD in lunar scenes. (a) Image data collected in our high-fidelity lunar scene physics simulation; this is the left image of a stereo pair. (b) Disparity map obtained using the BM algorithm, showing noticeable matching errors and disparity gaps, particularly in regions with weak textures and shadow occlusions. (c) Disparity map generated by the SAD algorithm, also exhibiting significant errors and disparity voids in texture-sparse and shadow-affected areas.
Figure 6: Workflow of the proposed method. On the far left is the input stereo image pair of the lunar scene. (a) The proposed optimized Census transform multi-feature fusion cost calculation. (b) The multi-layer superpixel disparity optimization.
Figure 7: Comparison of the optimized Census transform and the original Census transform in weakly textured areas. (a) Effect of noise on the original Census transform. (b) Effect of noise on the optimized Census transform.
Figure 8: Disparity estimation results for test images of the lunar research station scene dataset. (b–f) show the disparity estimates of the different methods in scenes 1–5, respectively. (a) Original image. (b) Disparity image obtained by SGM. (c) Disparity image obtained by StereoBM. (d) Disparity image obtained by AD-Census. (e) Disparity image obtained by MC-CNN. (f) Disparity image obtained by our method. (g) Ground-truth disparity maps.
Figure 9: Disparity estimation results for test images of the lunar research station scene dataset. (b–f) show the disparity estimates of the different methods in scenes 1–5, respectively. (a) Original image. (b) Disparity image obtained by SGM. (c) Disparity image obtained by StereoBM. (d) Disparity image obtained by AD-Census. (e) Disparity image obtained by MC-CNN. (f) Disparity image obtained by our method. (g) Ground-truth disparity maps.
Figure 10: Visual comparison of different methods on example images. The first and second rows show the results of two dataset pairs under different combination experiments, denoted Scene 1 and Scene 2, respectively. (a) Left image of the original pair. (b–e) Disparity maps of Census, Opt_Census, Census + SDO, and Opt_Census + SDO, respectively. (f) Ground-truth disparity maps.
Figure 11: Disparity estimation results for Yutu-2 real lunar scene dataset images. (a) The left-camera image from the stereo pair captured by the Yutu-2 panoramic camera. (b–e) Disparity maps estimated by the stereo matching algorithms SGM, StereoBM, AD-Census, and MC-CNN. (f) Disparity map estimated by our proposed method.
23 pages, 3934 KiB  
Article
A Multi-Scale Covariance Matrix Descriptor and an Accurate Transformation Estimation for Robust Point Cloud Registration
by Fengguang Xiong, Yu Kong, Xinhe Kuang, Mingyue Hu, Zhiqiang Zhang, Chaofan Shen and Xie Han
Appl. Sci. 2024, 14(20), 9375; https://doi.org/10.3390/app14209375 - 14 Oct 2024
Cited by 1 | Viewed by 834
Abstract
This paper presents a robust point cloud registration method based on a multi-scale covariance matrix descriptor and an accurate transformation estimation. Compared with state-of-the-art feature descriptors such as FPH, 3DSC, and spin images, the proposed multi-scale covariance matrix descriptor handles registration under higher noise better, since the averaging involved in building the covariance matrix filters out most noise-damaged samples and outliers and makes the descriptor itself robust to noise. Compared with transformation estimation approaches such as feature matching, clustering, ICP, and RANSAC, our transformation estimation finds a better transformation between a pair of point clouds because it is a multi-level estimator combining feature matching, coarse transformation estimation based on clustering, and fine transformation estimation based on ICP. Experimental findings show that the proposed feature descriptor and transformation estimation outperform state-of-the-art alternatives, and that registration based on our framework is highly successful on the Stanford 3D Scanning Repository, the SpaceTime dataset, and the Kinect dataset; the Stanford 3D Scanning Repository is known for its comprehensive collection of high-quality 3D scans, while the SpaceTime and Kinect datasets were captured by a SpaceTime Stereo scanner and a low-cost Microsoft Kinect scanner, respectively. Full article
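To make the descriptor concrete: a covariance descriptor stacks a k-dimensional feature vector for every point in a keypoint's neighborhood and summarizes them with the k × k covariance matrix, and a multi-scale variant concatenates such matrices computed at several neighborhood radii. The sketch below is a generic illustration with an assumed feature layout and a log-Euclidean distance; the specific features and metric used in the paper may differ.

```python
import numpy as np

def covariance_descriptor(features):
    """Covariance descriptor of a keypoint neighborhood (minimal sketch).

    features: (N, k) array with one k-dimensional feature vector per
    neighboring point (e.g. relative coordinates, normal angles, curvature).
    Which features the paper stacks, and at which radii for the multi-scale
    version, is described there; the layout here is illustrative.
    """
    cov = np.cov(features, rowvar=False)
    return cov + 1e-6 * np.eye(cov.shape[0])   # regularize so it stays SPD

def spd_log(m):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T

def descriptor_distance(c1, c2):
    """Log-Euclidean distance between two covariance descriptors."""
    return np.linalg.norm(spd_log(c1) - spd_log(c2), ord="fro")
```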
Figure 1: The framework of our point cloud registration.
Figure 2">
Figure 2: Distribution of a boundary point and a non-boundary point with their neighboring points.
Figure 3: Geometric relations α, β, and γ between a keypoint p and one of its neighboring points.
Figure 4: Samples of point clouds from our dataset.
Figure 5: Boundary points under various differences between adjacent included angles.
Figure 6: Keypoints on different point clouds. (a) Keypoint illustration 1 with boundary points retained. (b) Keypoint illustration 1 with boundary points removed. (c) Keypoint illustration 2 with boundary points retained. (d) Keypoint illustration 2 with boundary points removed.
Figure 7: Performance of covariance matrix descriptors formed from different feature vectors under different noise conditions.
Figure 8: Performance comparison between our proposed covariance matrix descriptor and state-of-the-art feature descriptors under different noise conditions.
Figure 9: The datasets used in the experiments.