Search Results (373)

Search Parameters:
Keywords = stereo matching

20 pages, 8045 KiB  
Article
Estimation of Wind Turbine Blade Icing Volume Based on Binocular Vision
by Fangzheng Wei, Zhiyong Guo, Qiaoli Han and Wenkai Qi
Appl. Sci. 2025, 15(1), 114; https://doi.org/10.3390/app15010114 - 27 Dec 2024
Viewed by 230
Abstract
Icing on wind turbine blades in cold and humid weather has become a detrimental factor limiting their efficient operation, and traditional methods for detecting blade icing have various limitations. Therefore, this paper proposes a non-contact ice volume estimation method based on binocular vision and improved image processing algorithms. The method employs a stereo matching algorithm that combines dynamic windows, multi-feature fusion, and reordering, integrating gradient, color, and other information to generate matching costs. It utilizes a cross-based support region for cost aggregation and generates the final disparity map through a Winner-Take-All (WTA) strategy and multi-step optimization. Subsequently, combining image processing techniques and three-dimensional reconstruction methods, the geometric shape of the ice is modeled, and its volume is estimated using numerical integration methods. Experimental results on volume estimation show that for ice blocks with regular shapes, the errors between the measured and actual volumes are 5.28%, 8.35%, and 4.85%, respectively; for simulated icing on wind turbine blades, the errors are 5.06%, 6.45%, and 9.54%, respectively. The results indicate that the volume measurement errors under various conditions are all within 10%, meeting the experimental accuracy requirements for measuring the volume of ice accumulation on wind turbine blades. This method provides an accurate and efficient solution for detecting blade icing without the need to modify the blades, making it suitable for wind turbines already in operation. However, in practical applications, it may be necessary to consider the impact of illumination and environmental changes on visual measurements. Full article
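The abstract above names two concrete numerical steps: Winner-Take-All (WTA) disparity selection over an aggregated cost volume, and ice volume estimation by numerical integration. The following minimal sketch illustrates both in NumPy; it is not the authors' code, and the array shapes, pixel ground area, and thickness values are illustrative assumptions.

```python
import numpy as np

def wta_disparity(cost_volume: np.ndarray) -> np.ndarray:
    """Winner-Take-All: for each pixel, keep the disparity with the lowest aggregated cost.
    cost_volume has shape (H, W, D)."""
    return np.argmin(cost_volume, axis=2).astype(np.float32)

def integrate_volume(thickness_map: np.ndarray, cell_area_m2: float) -> float:
    """Approximate the ice volume by a rectangle rule: sum of per-pixel thickness (m)
    times the ground area covered by one pixel (m^2)."""
    return float(np.nansum(thickness_map) * cell_area_m2)

# Hypothetical usage with random data.
disparity = wta_disparity(np.random.rand(480, 640, 64))
print(disparity.shape)                                   # (480, 640)
print(integrate_volume(np.random.rand(480, 640) * 0.05,  # assumed thickness in metres
                       cell_area_m2=1e-4))
```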
Show Figures

Figure 1: Flowchart of the proposed algorithm.
Figure 2: Census transformation process.
Figure 3: Comparison of improved algorithms and traditional algorithms under the influence of noise.
Figure 4: Comparison between the traditional Census transform and the improved Census transform.
Figure 5: The process of fusing corner and edge information into a combined encoding.
Figure 6: Cross construction.
Figure 7: Cost aggregation diagram.
Figure 8: Middlebury stereo evaluation dataset. (a) The left image of the stereo pair and the ground truth map of the Cones dataset; (b) the left image of the stereo pair and the ground truth map of the Teddy dataset.
Figure 9: Schematic diagram of the experimental setup’s spatial layout.
Figure 10: Experimental results. (a1–a3) Photographs of the physical objects on the left; (b1–b3) disparity maps; (c1–c3) point cloud diagrams.
Figure 11: Experimental results. (a1–a3) Photographs of the physical objects on the left; (b1–b3) disparity maps; (c1–c3) point cloud diagrams.
24 pages, 6629 KiB  
Article
UnDER: Unsupervised Dense Point Cloud Extraction Routine for UAV Imagery Using Deep Learning
by John Ray Bergado and Francesco Nex
Remote Sens. 2025, 17(1), 24; https://doi.org/10.3390/rs17010024 - 25 Dec 2024
Viewed by 237
Abstract
Extraction of dense 3D geographic information from ultra-high-resolution unmanned aerial vehicle (UAV) imagery unlocks a great number of mapping and monitoring applications. This is facilitated by a step called dense image matching, which tries to find pixels corresponding to the same object within overlapping images captured by the UAV from different locations. Recent developments in deep learning utilize deep convolutional networks to perform this dense pixel correspondence task. A common theme in these developments is to train the network in a supervised setting using available dense 3D reference datasets. However, in this work we propose a novel unsupervised dense point cloud extraction routine for UAV imagery, called UnDER. We propose a novel disparity-shifting procedure to enable the use of a stereo matching network pretrained on an entirely different typology of image data in the disparity-estimation step of UnDER. Unlike previously proposed disparity-shifting techniques for forming cost volumes, the goal of our procedure was to address the domain shift between the images that the network was pretrained on and the UAV images, by using prior information from the UAV image acquisition. We also developed a procedure for occlusion masking based on disparity consistency checking that uses the disparity image space rather than the object space proposed in a standard 3D reconstruction routine for UAV data. Our benchmarking results demonstrated significant improvements in quantitative performance, reducing the mean cloud-to-cloud distance by approximately 1.8 times the ground sampling distance (GSD) compared to other methods. Full article
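As a rough illustration of the occlusion masking via disparity consistency checking mentioned above (not the UnDER implementation), the sketch below marks a pixel as consistent only when its left-image disparity agrees, within a threshold, with the disparity at the pixel it maps to in the right image; the threshold and the disparity sign convention are assumptions.

```python
import numpy as np

def occlusion_mask(disp_left: np.ndarray, disp_right: np.ndarray, eps: float = 1.0) -> np.ndarray:
    """True where the left and right disparities agree within eps (i.e., not occluded)."""
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))
    # A left pixel at column x with disparity d corresponds to column x - d in the right image.
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    disp_back = np.take_along_axis(disp_right, x_right, axis=1)
    return np.abs(disp_left - disp_back) <= eps

# Trivially consistent toy pair: every pixel passes the check.
print(occlusion_mask(np.full((4, 8), 2.0), np.full((4, 8), 2.0)).all())  # True
```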
Show Figures

Figure 1: An overview of the proposed UnDER framework consisting of three main steps: image rectification, disparity estimation, and triangulation. UnDER accepts the following as an input: undistorted UAV image pairs, camera interior and exterior orientation parameters, a disparity estimation network. UnDER produces, as a final output, a dense point cloud corresponding to the overlapping area of the image pairs.
Figure 2: An overview of the parallax attention stereo matching network used in the disparity estimation step of UnDER.
Figure 3: Comparison of self-attention and parallax attention. The similarity of the selected pixel (green) to other pixels (in different colors) is measured in the same feature map (self-attention), or in a feature map extracted from a paired right image (parallax attention).
Figure 4: Reference figure for defining disparity shifting. It shows the image planes of a stereo pair, a basis depth for deriving the disparity shift, the projection centers of the two cameras, the image points of the left principal point in both image planes, the corresponding object point lying on the basis depth, and the disparity of the left principal point.
Figure 5: Reference figure for the disparity consistency check. It shows how the occlusion mask is calculated by comparing output disparity maps by switching the base image in the image pairs. Images I′ and I″ are correspondingly captured at two different locations of the camera projection center, Z′ and Z″, and M is the output mask.
Figure 6: Dataset-1 of the UseGeo dataset: full extent of the dataset, a sample undistorted image, and a corresponding subset of the reference LiDAR point cloud (left to right). The area of the sample image is located in the yellow box annotated on the extent of Dataset-1.
Figure 7: The UAV-Nunspeet dataset: full extent of the dataset, a sample undistorted image, and a corresponding subset of the point cloud derived from Pix4D. The area of the sample image is located in the yellow box annotated on the extent of the dataset.
Figure 8: Subset of the UAV-Zeche-Zollern dataset: the extent of the subset and the corresponding reference Pix4D point cloud.
Figure 9: Plot showing the effect of varying the disparity shift ratio (δ) values used in the disparity-estimation step of the point cloud extraction routine. Each solid curve corresponds to a different δ value. The horizontal axis shows the base images used in each multi-stereo pair. The left vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. The dashed curve shows the mean baseline length of the image pairs used in the multi-stereo. The right vertical axis provides the range of values of the mean baseline length.
Figure 10: Plot showing the effect of varying the disparity difference threshold (ϵ) values used in the occlusion-masking step of the point cloud extraction routine. Each curve corresponds to a different ϵ value. The horizontal axis shows the base images used in each multi-stereo pair. The vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. A zoomed-in portion of the graph is included, to further highlight the differences in the setups with increasing ϵ.
Figure 11: Plot showing the effect of using a multi-stereo setup compared to a single-stereo setup in the triangulation step of the point cloud extraction routine. The first solid curve corresponds to the single-stereo setup while the second solid curve corresponds to the multi-stereo setup. The horizontal axis shows the base images used in each single-stereo or multi-stereo pair. The left vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. The dashed curve shows the mean absolute difference in κ values of the images used in each single-stereo and multi-stereo pair. The right vertical axis displays the range of the mean differences in κ angles.
Figure 12: A subset of the UseGeo Dataset-1 showing the UseGeo DIM point cloud and the mean cloud-to-cloud (C2C) distances of UnDER-P and UnDER-FN+FPCfilter (left to right) with respect to the reference LiDAR point cloud. The bottom row shows a zoomed-in portion of the subset from the top row, indicated by the yellow box. All C2C distances greater than 0.1 m are displayed in red, all C2C distances less than 0.02 m are displayed as blue, and everything in between is displayed in a gradient of green to yellow.
Figure 13: Histogram of mean C2C distance values of UseGeo DIM, UnDER-P, and UnDER-FN+FPCfilter. Values beyond 0.5 m were truncated for better visualization.
22 pages, 6639 KiB  
Article
Reliable Disparity Estimation Using Multiocular Vision with Adjustable Baseline
by Victor H. Diaz-Ramirez, Martin Gonzalez-Ruiz, Rigoberto Juarez-Salazar and Miguel Cazorla
Sensors 2025, 25(1), 21; https://doi.org/10.3390/s25010021 - 24 Dec 2024
Viewed by 227
Abstract
Accurate estimation of three-dimensional (3D) information from captured images is essential in numerous computer vision applications. Although binocular stereo vision has been extensively investigated for this task, its reliability is conditioned by the baseline between cameras. A larger baseline improves the resolution of disparity estimation but increases the probability of matching errors. This research presents a reliable method for disparity estimation through progressive baseline increases in multiocular vision. First, a robust rectification method for multiocular images is introduced, satisfying epipolar constraints and minimizing induced distortion. This method can improve rectification error by 25% for binocular images and 80% for multiocular images compared to well-known existing methods. Next, a dense disparity map is estimated by stereo matching from the rectified images with the shortest baseline. Afterwards, the disparity map for the subsequent images with an extended baseline is estimated within a short optimized interval, minimizing the probability of matching errors and further error propagation. This process is iterated until the disparity map for the images with the longest baseline is obtained. The proposed method increases disparity estimation accuracy by 20% for multiocular images compared to a similar existing method. The proposed approach enables accurate scene characterization and spatial point computation from disparity maps with improved resolution. The effectiveness of the proposed method is verified through exhaustive evaluations using well-known multiocular image datasets and physical scenes, achieving superior performance over similar existing methods in terms of objective measures. Full article
(This article belongs to the Collection Robotics and 3D Computer Vision)
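The core idea of the abstract above, estimating disparity for a longer baseline only within a short interval around the disparity scaled from the previous, shorter baseline, can be sketched as follows. This is an illustration under assumed data structures (a per-pixel cost function and dense disparity arrays), not the published method.

```python
import numpy as np

def refine_with_longer_baseline(disp_short, baseline_short, baseline_long, cost_fn, radius=2):
    """Scale the short-baseline disparity to the longer baseline and search only a
    small interval around it. cost_fn(y, x, d) is an assumed per-pixel matching cost
    for the long-baseline pair."""
    scale = baseline_long / baseline_short
    disp_long = np.zeros_like(disp_short, dtype=np.float64)
    h, w = disp_short.shape
    for y in range(h):
        for x in range(w):
            center = disp_short[y, x] * scale
            candidates = np.arange(center - radius, center + radius + 1.0)
            costs = [cost_fn(y, x, d) for d in candidates]
            disp_long[y, x] = candidates[int(np.argmin(costs))]
    return disp_long

# Toy usage: a cost that prefers disparity 10 everywhere.
toy_cost = lambda y, x, d: abs(d - 10.0)
print(refine_with_longer_baseline(np.full((2, 3), 5.0), 1.0, 2.0, toy_cost)[0, 0])  # 10.0
```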
Show Figures

Figure 1: Optical setup of a multiocular vision system.
Figure 2: Block diagram of the proposed PSO-based method for multiocular image rectification.
Figure 3: Diagram of the proposed method for disparity estimation with an adjustable baseline.
Figure 4: Stereo image rectification results. (a) Unrectified test images. Rectified images obtained using: (b) Fusiello et al. [42], (c) Juarez-Salazar et al. [27], (d) DSR [41], and (e) the proposed method.
Figure 5: Constructed laboratory platform for experiments. (a) Frontal view of the multiocular camera. (b) Side view of the multiocular camera. (c) Test scene captured by the experimental multiocular platform.
Figure 6: Multiocular image rectification results from a real scene captured with the experimental platform shown in Figure 5. (a) Unrectified input images. Rectified images obtained using: (b) the method of Yang et al. [44] and (c) the proposed method.
Figure 7: Disparity estimation results for multiocular images obtained with the proposed approach and the method by Li et al. [15]. (a) Reference image of the input multiocular image set. (b) Ground truth disparity map of the reference image with the largest baseline. Estimated disparity maps obtained with the proposed method for: (c) cameras 1 and 5; (d) cameras 1, 3, and 5; (e) cameras 1, 2, 3, and 5; (f) all images. (g) Estimated disparity obtained with the method by Li et al. [15].
Figure 8: Three-dimensional reconstruction results obtained with the proposed approach in real scenes captured with the experimental platform shown in Figure 5. (a) Reference images of the captured scenes. (b) Estimated disparity map obtained with the proposed approach between cameras 1 and 4. (c–e) Different perspective views of the reconstructed three-dimensional scenes.
Figure 9: Reprojection errors obtained with the estimated intrinsic parameters obtained using the calibration methods: (a) DLT. (b) Distorted pinhole. (c) Zhang’s method.
23 pages, 31563 KiB  
Article
Comparative Analysis of Deep Learning-Based Stereo Matching and Multi-View Stereo for Urban DSM Generation
by Mario Fuentes Reyes, Pablo d’Angelo and Friedrich Fraundorfer
Remote Sens. 2025, 17(1), 1; https://doi.org/10.3390/rs17010001 - 24 Dec 2024
Viewed by 362
Abstract
The creation of digital surface models (DSMs) from aerial and satellite imagery is often the starting point for different remote sensing applications. The two main approaches used for this task are stereo matching and multi-view stereo (MVS). The former needs stereo-rectified pairs as inputs and produces results in the disparity domain; the latter works with images from various perspectives and produces results in the depth domain. So far, both approaches have proven successful in producing accurate DSMs, especially with deep learning. Nonetheless, comparing the two is difficult because of differences in the input data, in the domain of the directly generated results, and in the evaluation metrics. In this manuscript, we processed synthetic and real optical data to be compatible with both stereo and MVS algorithms and applied these data to learning-based algorithms for both solutions. We designed the experimental setting to make the comparison between the algorithms as fair as possible. In particular, we looked at urban areas with high object densities and sharp boundaries, which pose challenges such as occlusions and depth discontinuities. Results show good performance overall for all experiments, with specific differences in the reconstructed objects, which we describe qualitatively and quantitatively. Moreover, we consider an additional case that fuses the results into a DSM using confidence estimation, showing a further improvement and opening up a possibility for further research. Full article
(This article belongs to the Section Urban Remote Sensing)
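The confidence-based fusion mentioned at the end of the abstract above can be illustrated with a small sketch: per pixel, the stack of height estimates is sorted by confidence, the least-confident fraction (rem%) is discarded, and the median of the remainder becomes the fused DSM value. Array shapes and the rem_pct parameter name are assumptions, not the paper's implementation.

```python
import numpy as np

def confidence_fusion(heights: np.ndarray, confidences: np.ndarray, rem_pct: float = 50.0) -> np.ndarray:
    """heights, confidences: (N, H, W) stacks from N reconstructions. Per pixel, drop the
    rem_pct least confident estimates and return the median of the rest."""
    n = heights.shape[0]
    keep = max(1, int(round(n * (1.0 - rem_pct / 100.0))))
    order = np.argsort(-confidences, axis=0)              # most confident first
    sorted_heights = np.take_along_axis(heights, order, axis=0)
    return np.median(sorted_heights[:keep], axis=0)

fused = confidence_fusion(np.random.rand(6, 4, 4) * 50.0, np.random.rand(6, 4, 4))
print(fused.shape)  # (4, 4)
```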
Show Figures

Figure 1: Selected geometry for SyntCities samples. All images lie on the same epipolar line with different baselines. There are 6 available views for each region on the surface. Baseline distances are given with respect to V1.
Figure 2: Dublin digital surface model obtained by merging all provided point clouds and used as ground truth. Blue areas are low objects and red areas are high objects.
Figure 3: Pipeline used to generate the Dublin dataset for both cases: Dublin_stereo and Dublin_MVS.
Figure 4: Dublin_stereo dataset samples. (a,d) are the left views for the corresponding (b,e) right views; (c,f) are the ground truth aligned with the left views. Bar scale for disparities is in pixels.
Figure 5: Dublin_MVS dataset samples. (a,c) are the reference views for the corresponding (b,d) ground truth. Bar scale for depth is in meters.
Figure 6: Selected geometry for Dublin samples. Images lie on a flight path with an approximate baseline of 100 m, but not on the same epipolar line.
Figure 7: Pipeline used to fuse the results of the predicted disparity/depth maps. In the case of the Stereo and MVS_Stereo methods, more results are available, but they use the same available information as the MVS_Full case. All results then follow the same steps, which include height conversion, orthorectification, and fusion.
Figure 8: Pipeline for confidence-based fusion. After estimating confidence maps along with the height maps obtained from the reconstruction algorithms, a stack of height maps is sorted based on the respective confidence values, and then we compute the median to get the final DSM.
Figure 9: DSMs and error maps for a SyntCities sample. For the reference image (e) with ground truth (a), we show the DSMs computed using the models Stereo_SC (b), MVS_Full_SC (c), and MVS_Stereo_SC (d). The respective 1 m error maps (e1m) for the same models are shown in (f–h). Scale bars for the DSMs and error maps are given as a reference and use meters as the unit. Errors are clipped to a maximum of 1 m. Regions in black correspond to pixels left undefined by the algorithms.
Figure 10: SyntCities computed DSMs, 3D view. For the same perspective given for the ground truth (a), we show the results for the models Stereo_SC (b), MVS_Full_SC (c), and MVS_Stereo_SC (d). It covers the same area as Figure 9. Height values are displayed in blue to red color from low to high.
Figure 11: DSMs and error maps for a Dublin sample. For ground truth (a), we show the DSMs computed using the models Stereo_Du (b), MVS_Full_Du (c), and MVS_Stereo_Du (d). The respective 1 m error maps (e1m) for the same models are shown in (f–h). Scale bars in meters for the DSMs and error maps are given as a reference. Errors are clipped to a maximum of 3 m. Regions in black correspond to pixels left undefined by the algorithms. The corresponding orthorectified RGB is not shown, as this was not provided in the original dataset for this region. Instead, we show an oblique image captured close to this region in (e). This image is not aligned with the results.
Figure 12: Dublin computed DSMs, 3D view. For the same perspective given for the ground truth (a), we show the results for the models Stereo_Du (b), MVS_Full_Du (c), and MVS_Stereo_Du (d). It covers the same area as Figure 11.
Figure 13: Dublin DSMs created with confidence-based fusion, Stereo case. We show cases for mean fusion without confidence (a), with rem% = 25 (b), and with rem% = 50 (c). Similar cases are presented for the median in (d–f). Scale bar for the error is given in meters. Yellow rectangles highlight areas with significant differences.
Figure 14: Generated DSMs for a Dublin region in a 3D representation, Stereo case. The region is the same as for Figure 13. We show three DSMs: ground truth, median fusion (not confidence based), and median fusion with rem% = 50. Changes are highlighted with the white rectangles.
18 pages, 43610 KiB  
Article
Reliable and Effective Stereo Matching for Underwater Scenes
by Lvwei Zhu, Ying Gao, Jiankai Zhang, Yongqing Li and Xueying Li
Remote Sens. 2024, 16(23), 4570; https://doi.org/10.3390/rs16234570 - 5 Dec 2024
Viewed by 578
Abstract
Stereo matching plays a vital role in underwater environments, where accurate depth estimation is crucial for applications such as robotics and marine exploration. However, underwater imaging presents significant challenges, including noise, blurriness, and optical distortions that hinder effective stereo matching. This study develops two specialized stereo matching networks: UWNet and its lightweight counterpart, Fast-UWNet. UWNet utilizes self- and cross-attention mechanisms alongside an adaptive 1D-2D cross-search to enhance cost volume representation and refine disparity estimation through a cascaded update module, effectively addressing underwater imaging challenges. Due to the need for timely responses in underwater operations by robots and other devices, real-time processing speed is critical for task completion. Fast-UWNet addresses this challenge by prioritizing efficiency, eliminating the reliance on the time-consuming recurrent updates commonly used in traditional methods. Instead, it directly converts the cost volume into a set of disparity candidates and their associated confidence scores. Adaptive interpolation, guided by content and confidence information, refines the cost volume to produce the final accurate disparity. This streamlined approach achieves an impressive inference speed of 0.02 s per image. Comprehensive tests conducted in diverse underwater settings demonstrate the effectiveness of both networks, showcasing their ability to achieve reliable depth perception. Full article
(This article belongs to the Special Issue Artificial Intelligence and Big Data for Oceanography)
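Fast-UWNet is described above as converting the cost volume directly into a set of disparity candidates with confidence scores instead of running recurrent updates. The sketch below shows one simple way to express that idea (the k lowest-cost disparities with a softmax confidence); it is an illustration, not the network's actual head.

```python
import numpy as np

def disparity_candidates(cost_volume: np.ndarray, k: int = 3):
    """cost_volume: (H, W, D). Returns the k lowest-cost disparities per pixel and a
    softmax 'confidence' over their (negated) costs."""
    idx = np.argsort(cost_volume, axis=2)[:, :, :k]
    costs = np.take_along_axis(cost_volume, idx, axis=2)
    logits = -costs
    conf = np.exp(logits - logits.max(axis=2, keepdims=True))
    conf /= conf.sum(axis=2, keepdims=True)
    return idx.astype(np.float32), conf

cands, conf = disparity_candidates(np.random.rand(2, 2, 32))
print(cands.shape, conf.sum(axis=2))  # (2, 2, 3) and an array of ones
```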
Show Figures

Figure 1: Visualization of the style-transferred images: (a) the original input image, (b) the input style image, (c) the style-transferred image after initial AdaIN processing, and (d) the image after non-local mean filtering.
Figure 2: The overview of UWNet. The feature extractor processes the input stereo image pair to obtain feature information and contextual features. The lowest-resolution features are refined through the coordination of self-attention and cross-attention mechanisms to achieve more accurate feature representations. These processed features are then used to compute the cost volume through a 1D-2D cross-search strategy. Finally, the contextual features and the cost volume are fed into a cascaded iterative refinement module to generate the final disparity map.
Figure 3: 1D-2D cross-search.
Figure 4: Overview of our Fast-UWNet. The core ideas are cost propagation and the affinity constraint. Disparity candidates and their confidence are generated based on cost propagation, and the affinity constraint effectively filters these candidates for improved accuracy.
Figure 5: The k-th propagation process.
Figure 6: Visualization of confidence maps.
Figure 7: The operational paradigm of the affinity constraint. Each yellow circle represents a pixel point in the feature map, and the relationship with neighboring pixels is calculated from the central pixel point outwards.
Figure 8: Visualization on the underwater dataset.
Figure 9: Visualization on the SQUID dataset.
Figure 10: Performance comparison between Fast-ACVNet and Fast-UWNet during training shows a clear trend: our method consistently outperforms the other in terms of EPE.
Figure 11: Visualization on the Scene Flow dataset.
20 pages, 4856 KiB  
Article
Enhancing the Ground Truth Disparity by MAP Estimation for Developing a Neural-Net Based Stereoscopic Camera
by Hanbit Gil, Sehyun Ryu and Sungmin Woo
Sensors 2024, 24(23), 7761; https://doi.org/10.3390/s24237761 - 4 Dec 2024
Viewed by 533
Abstract
This paper presents a novel method to enhance ground truth disparity maps generated by Semi-Global Matching (SGM) using Maximum a Posteriori (MAP) estimation. SGM, while not producing visually appealing outputs like neural networks, offers high disparity accuracy in valid regions and avoids the generalization issues often encountered with neural network-based disparity estimation. However, SGM struggles with occlusions and textureless areas, leading to invalid disparity values. Our approach, though relatively simple, mitigates these issues by interpolating invalid pixels using surrounding disparity information and Bayesian inference, improving both the visual quality of disparity maps and their usability for training neural network-based commercial depth-sensing devices. Experimental results validate that our enhanced disparity maps preserve SGM’s accuracy in valid regions while improving the overall performance of neural networks on both synthetic and real-world datasets. This method provides a robust framework for advanced stereoscopic camera systems, particularly in autonomous applications. Full article
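As a toy illustration of the MAP interpolation described above (not the authors' exact model), the sketch below fills an invalid SGM pixel by combining a Gaussian prior built from valid neighbouring disparities with an assumed photometric likelihood and taking the posterior maximum.

```python
import numpy as np

def map_fill(y, x, disp, valid, match_likelihood, d_range, half_win=8):
    """Fill the invalid pixel (y, x): Gaussian prior from valid neighbours times an
    assumed photometric likelihood, then take the posterior maximum."""
    win = disp[max(0, y - half_win):y + half_win + 1, max(0, x - half_win):x + half_win + 1]
    vwin = valid[max(0, y - half_win):y + half_win + 1, max(0, x - half_win):x + half_win + 1]
    neighbours = win[vwin]
    if neighbours.size == 0:
        return np.nan
    mu, sigma = neighbours.mean(), neighbours.std() + 1e-3
    prior = np.exp(-0.5 * ((d_range - mu) / sigma) ** 2)
    likelihood = np.array([match_likelihood(y, x, d) for d in d_range])
    return d_range[int(np.argmax(prior * likelihood))]

# Toy usage: neighbours all say 20, the (assumed) photometric term peaks at 22.
d_range = np.arange(64, dtype=float)
print(map_fill(5, 5, np.full((11, 11), 20.0), np.ones((11, 11), bool),
               lambda y, x, d: np.exp(-abs(d - 22.0) / 4.0), d_range))  # 20.0
```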
Show Figures

Figure 1: The proposed framework for enhancing SGM disparity map.
Figure 2: Example of left (a) and RGB (b) images.
Figure 3: (a) Disparity map generated by SGM for the images in Figure 1. (b) Enlarged view of (a). Dark blue pixels indicate “invalid” regions. The numbers shown represent disparity values for each grouped region.
Figure 4: (a) Prior probability, (b) Likelihood, and (c) Posterior distribution of an invalid pixel from Figure 3.
Figure 5: Preprocessing steps for the proposed method: (a) Original cropped patch, (b) Standardized patch, (c) Mask, and (d) Masked patch.
Figure 6: (a) Left masked cropped patch. (b) Right cropped candidate patches.
Figure 7: Disparity map comparisons on the synthetic Driving dataset across different scenes. (a) Ground truth, (b) SGM (U_th = 10), (c) SGM (U_th = 0), (d) Linear Interpolation, (e) Nearest Interpolation, (f) PDE, (g) ShCNN [56], (h) GMCNN [55], (i) MADF [58], (j) Chen [57], and (k) The proposed. Invalid regions are shown in darkish blue.
Figure 8: Example captured images of real-world indoor scenes.
Figure 9: Disparity map comparisons across different real-world scenes. (a) Input left images, (b) SGM (U_th = 10), (c) Linear Interpolation, (d) Nearest Interpolation, (e) PDE, (f) Shepard inpainting [56], (g) GMCNN [55], (h) MADF [58], (i) Chen [57], and (j) The proposed. The insets highlight areas with significant differences, particularly in challenging regions with occlusions and textureless surfaces. Invalid regions are shown in darkish blue.
Figure 10: Basic CNN-based model for disparity estimation.
Figure 11: ResNet-based model for disparity estimation. Note that the additional low-scale intermediate feature maps are used to capture structural information of disparity at the original size.
Figure 12: Vision Transformer-based model for disparity estimation. This model replaces the Encoder of the baseline model with a Vision Transformer (ViT) and modifies the Decoder from Figure 10 accordingly.
Figure 13: Disparity map comparisons across various scenes using different models. (a) GT, (b) CNN-based, (c) ResNet-based, (d) ViT-based, (e) PSMNet [50]. The first row of each scene is trained with the original SGM GT, and the second row with the proposed GT. Invalid regions are shown in darkish red.
Figure 14: Relationship between patch size S_p and both error and invalid pixel ratios for various prior window sizes S_w at a fixed intensity threshold I_th = 0.1. (a) shows how smaller values of S_w and larger values of S_p tend to minimize error, with an optimal configuration observed around S_w = 17 × 17 and S_p = 24 × 4. (b) demonstrates that smaller values of S_w generally lead to higher invalid pixel ratios, while smaller values of S_p help in reducing invalid pixel ratios. This indicates a trade-off in parameter selection between minimizing error and reducing invalid pixel ratios.
Figure 15: 3D visualization of error based on prior window size S_w and patch size S_p, with point size indicating the reciprocal of invalid pixel ratios.
29 pages, 30892 KiB  
Article
A Generalized Voronoi Diagram-Based Segment-Point Cyclic Line Segment Matching Method for Stereo Satellite Images
by Li Zhao, Fengcheng Guo, Yi Zhu, Haiyan Wang and Bingqian Zhou
Remote Sens. 2024, 16(23), 4395; https://doi.org/10.3390/rs16234395 - 24 Nov 2024
Viewed by 407
Abstract
Matched line segments are crucial geometric elements for reconstructing the desired 3D structure in stereo satellite imagery, owing to their advantages in spatial representation, complex shape description, and geometric computation. However, existing line segment matching (LSM) methods face significant challenges in effectively addressing co-linear interference and the misdirection of parallel line segments. To address these issues, this study proposes a “continuous–discrete–continuous” cyclic LSM method, based on the Voronoi diagram, for stereo satellite images. Initially, to compute the discrete line-point matching rate, line segments are discretized using the Bresenham algorithm, and the pyramid histogram of visual words (PHOW) feature is assigned to the line segment points which are detected using the line segment detector (LSD). Next, to obtain continuous matched line segments, the method combines the line segment crossing angle rate with the line-point matching rate, utilizing a soft voting classifier. Finally, local point-line homography models are constructed based on the Voronoi diagram, filtering out misdirected parallel line segments and yielding the final matched line segments. Extensive experiments on the challenging benchmark, WorldView-2 and WorldView-3 satellite image datasets, demonstrate that the proposed method outperforms several state-of-the-art LSM methods. Specifically, the proposed method achieves F1-scores that are 6.22%, 12.60%, and 18.35% higher than those of the best-performing existing LSM method on the three datasets, respectively. Full article
(This article belongs to the Section Remote Sensing Image Processing)
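Two ingredients named in the abstract above, discretizing line segments with the Bresenham algorithm and combining a line-point matching rate with a crossing-angle rate through soft voting, can be sketched as follows. The weighting and function names are illustrative assumptions rather than the paper's formulation.

```python
def bresenham(x0, y0, x1, y1):
    """Integer pixel coordinates along the segment (x0, y0)-(x1, y1)."""
    points = []
    dx, sx = abs(x1 - x0), (1 if x0 < x1 else -1)
    dy, sy = -abs(y1 - y0), (1 if y0 < y1 else -1)
    err = dx + dy
    while True:
        points.append((x0, y0))
        e2 = 2 * err
        if e2 >= dy:
            if x0 == x1:
                break
            err += dy
            x0 += sx
        if e2 <= dx:
            if y0 == y1:
                break
            err += dx
            y0 += sy
    return points

def soft_vote(point_match_rate, crossing_angle_rate, w_angle=0.3):
    """Weighted soft vote of the two rates; both are assumed to lie in [0, 1]."""
    return (1.0 - w_angle) * point_match_rate + w_angle * crossing_angle_rate

print(len(bresenham(0, 0, 10, 4)), soft_vote(0.8, 0.6))  # 11 discrete points, combined score 0.74
```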
Show Figures

Figure 1: Feature point matching and LSM. Matching points are derived from the affine scale-invariant feature transform (ASIFT) algorithm, and matching line segments are sourced from the proposed LSM method [7]. The matched feature points are connected by lines, and the points and line segments in the left image are marked in green, while those in the right image are marked in yellow.
Figure 2: Outline of the proposed LSM method.
Figure 3: An illustration of PHOW feature extraction.
Figure 4: Line segment retrieval based on polar constraints. The red line segments indicate the target line segments in the left view. The green line segments represent the candidate line segments, and the yellow line segments represent the other line segments in the right view. The purple dashed lines indicate the polar lines.
Figure 5: Schematic illustration of line segment matching based on the soft voting classifier. In the line graph of the right subfigure, the red curve indicates that CAR takes values in the range [0, 1] when the line segment intersection angle θ ∈ [0, π/6].
Figure 6: Voronoi diagrams. (a) Left view, (b) Right view. First row: results of ASIFT match point display. Feature points are represented by red dots, and the corresponding locations are marked with the same numerical labels. Second row: Voronoi diagram plotted on the satellite maps, where red dots indicate anchor points from ASIFT and blue edges denote the Voronoi diagram edges. Third row: Delaunay triangulation with Voronoi diagram, where red dots represent anchor points, red lines indicate Voronoi edges, and blue lines denote Delaunay edges. Additionally, the Voronoi diagram differences between the left and right images are highlighted with green boxes.
Figure 7: Local point-line homography geometry. In the figure, C and C′ denote the camera point locations, and p and p′ denote a pair of ASIFT matching points. The red line segment represents the target line segment in the left view; the green and yellow line segments denote the candidate corresponding line segments in the right view. The blue solid lines denote the Voronoi edges, and the blue dots denote the Voronoi seed points.
Figure 8: The 14 pairs of test images from the benchmark dataset. Numbers 1–14 represent different pairs of test images.
Figure 9: The 18 pairs of test areas from the WV-2 dataset. Numbers 1–18 represent different pairs of test images.
Figure 10: The 15 pairs of test areas from the WV-3 dataset. Numbers 1–15 represent different pairs of test images.
Figure 11: Precision, Recall, F1-score, and total number of matched line segments for the proposed method with CAR weights taking values of [0.1, 0.2, …, 0.9].
Figure 12: LSM results of the proposed method with different CAR classifier weight values. (a–d) The results for ω2 values of 0.1, 0.3, 0.5, and 0.7, respectively (only the left images are shown). Correctly matched line segments are highlighted in green, while incorrect line segments are highlighted in red.
Figure 13: Statistics of the LILH, SLEM, and proposed methods on the 14 test images of the benchmark dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 14: Actual matching results of the proposed method on the benchmark dataset. (a–f) The left images of representative matching results for the benchmark dataset. Mismatches are marked in red, while correct matches are marked in green. Precision and Recall values are also provided for each image.
Figure 15: Statistics of the LILH, SLEM, and proposed methods in the 18 test areas of the WV-2 dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 16: Actual matching results of the proposed method on the WV-2 dataset. (a–h) The left images of representative matching results for the WV-2 dataset. Mismatches are labeled in red, correct matches are labeled in green. Precision values are also provided for each image.
Figure 17: Intermediate results of the proposed method. The left figure shows the discrete matching of line segment points, with red dots indicating the matched line segment points in the left view, green dots indicating the matched line segment points in the right view, and yellow boxes indicating close-ups. The right figure shows the matching result for the corresponding line segment in the left figure, with the corresponding line segment labeled in red. The blue line segments indicate the LSD line segments to be matched.
Figure 18: Statistics of the LILH, SLEM, and proposed methods in the 15 test areas of the WV-3 dataset. (a) Precision, (b) Recall, (c) F1-score.
Figure 19: Actual matching results of the proposed method on the WV-3 dataset. (a–h) The left images of representative matching results for the WV-3 dataset. Mismatches are labeled in red, correct matches are labeled in green. Precision values are also provided for each image.
Figure 20: Actual runtime of the LILH, SLEM, and proposed methods on the WV-2 and WV-3 datasets. (a) Runtime on the WV-2 dataset, (b) Runtime on the WV-3 dataset.
18 pages, 6146 KiB  
Article
A Near-Infrared Imaging System for Robotic Venous Blood Collection
by Zhikang Yang, Mao Shi, Yassine Gharbi, Qian Qi, Huan Shen, Gaojian Tao, Wu Xu, Wenqi Lyu and Aihong Ji
Sensors 2024, 24(22), 7413; https://doi.org/10.3390/s24227413 - 20 Nov 2024
Viewed by 917
Abstract
Venous blood collection is a widely used medical diagnostic technique, and with rapid advancements in robotics, robotic venous blood collection has the potential to replace traditional manual methods. The success of this robotic approach is heavily dependent on the quality of vein imaging. In this paper, we develop a vein imaging device based on the simulation analysis of vein imaging parameters and propose a U-Net+ResNet18 neural network for vein image segmentation. The U-Net+ResNet18 neural network integrates the residual blocks from ResNet18 into the encoder of the U-Net to form a new neural network. ResNet18 is pre-trained using the Bootstrap Your Own Latent (BYOL) framework, and its encoder parameters are transferred to the U-Net+ResNet18 neural network, enhancing the segmentation performance of vein images with limited labelled data. Furthermore, we optimize the AD-Census stereo matching algorithm by developing a variable-weight version, which improves its adaptability to image variations across different regions. Results show that, compared to U-Net, the BYOL+U-Net+ResNet18 method achieves an 8.31% reduction in Binary Cross-Entropy (BCE), a 5.50% reduction in Hausdorff Distance (HD), a 15.95% increase in Intersection over Union (IoU), and a 9.20% increase in the Dice coefficient (Dice), indicating improved image segmentation quality. The average error of the optimized AD-Census stereo matching algorithm is reduced by 25.69%, yielding a clearly improved stereo matching performance. Future research will explore the application of the vein imaging system in robotic venous blood collection to facilitate real-time puncture guidance. Full article
(This article belongs to the Section Sensors and Robotics)
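The AD-Census cost that the paper optimizes with variable weights can be illustrated with a short sketch: an absolute-difference term and a Census Hamming-distance term are each normalized into [0, 1) and blended with a per-pixel weight. The lambda values and the specific weighting scheme below are assumptions, not the paper's.

```python
import numpy as np

def rho(cost, lam):
    """AD-Census style normalisation: maps a non-negative cost into [0, 1)."""
    return 1.0 - np.exp(-cost / lam)

def ad_census_cost(ad_cost, census_hamming, w_ad, lam_ad=10.0, lam_census=30.0):
    """ad_cost: absolute intensity/colour difference; census_hamming: Hamming distance
    between Census bit strings; w_ad: per-pixel weight in [0, 1] balancing the terms."""
    return w_ad * rho(ad_cost, lam_ad) + (1.0 - w_ad) * rho(census_hamming, lam_census)

# Toy usage: the same raw costs weighted as in a textureless region (low AD weight)
# and as in a textured region (high AD weight).
print(ad_census_cost(5.0, 12.0, w_ad=0.2), ad_census_cost(5.0, 12.0, w_ad=0.8))
```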
Show Figures

Figure 1: Schematic diagram of arm vein imaging.
Figure 2: Simulated NIR propagation through arm tissue. (a) Radial two-dimensional cross-section of the local arm model. The black rectangles represent the skin, subcutaneous tissue, and muscle layers, from top to bottom, while the circle represents the radial cross-section of the vein. (b) The ratio of photon densities at x = 2.00 mm. (c) The ratio of photon densities at y = 3.80 mm. (d) The simulation of photon density variation at an incident light wavelength of 850 nm. (e) Rectangular light source and light-receiving plane model. (f) Circular light source and light-receiving plane model. (g) The ratio of illuminance to mean illuminance on the x-axis.
Figure 3: Vein imaging device.
Figure 4: Schematic diagram of the vein imaging system for robotic venipuncture.
Figure 5: (a) U-Net+ResNet18 neural network. (b) Neural network pre-training and model parameter migration.
Figure 6: Cross-based cost aggregation. (a) Cross-based regions and support regions; the cross shadows represent the cross-based regions, and the other shadows represent the support regions. (b) Horizontal aggregation; the blue arrows represent the aggregation direction. (c) Vertical aggregation.
Figure 7: Vein image random transformation. (a) Original NIR vein image. (b,c) The vein image after random transformation.
Figure 8: The variation of the loss function with epoch.
Figure 9: NIR vein image segmentation results. (a) Original NIR vein images. (b) Segmentation results using the Hessian matrix. (c) Segmentation results using the BYOL+U-Net+ResNet18 method. (d) Image binarization effect. (e) The labels corresponding to the original images.
Figure 10: Variation of each neural network model metric with epochs. (a) Variation of BCE with epochs. (b) Variation of IoU with epochs. (c) Variation of Dice with epochs. (d) Variation of HD with epochs.
Figure 11: Vein centerline extraction. (a) Pre-processed NIR greyscale map of veins. (b) Vein centerline extracted by the algorithm proposed in this paper. (c) The image after connecting and eliminating small connected regions using the contour connection algorithm (see the red circles).
Figure 12: Comparison of results of stereo matching algorithms. (a) Left image. (b) Right image. (c) Disparity map of the AD-Census algorithm. (d) Disparity map of the optimized AD-Census algorithm.
Figure 13: Vein image visualization process. (a) Original vein image collected by the camera. (b) Vein centerline extraction results. (c) Vein image segmentation results. (d) Disparity map.
16 pages, 4359 KiB  
Article
Adaptive Kernel Convolutional Stereo Matching Recurrent Network
by Jiamian Wang, Haijiang Sun and Ping Jia
Sensors 2024, 24(22), 7386; https://doi.org/10.3390/s24227386 - 20 Nov 2024
Viewed by 516
Abstract
Among binocular stereo matching techniques, the most advanced methods currently use an iterative structure based on GRUs. Methods in this class have shown high performance on both high-resolution images and standard benchmarks. However, simply replacing cost aggregation with GRU iterations leaves the original cost volume used for disparity calculation lacking non-local geometric and contextual information. To address this, this paper proposes a new GRU-iteration-based adaptive kernel convolution deep recurrent network architecture for stereo matching. It introduces a kernel convolution-based adaptive multi-scale pyramid pooling (KAP) module that fully considers the spatial correlation between pixels, and adds a new matching attention (MAR) module to refine the matching cost volume before it enters the iterative network for updates, enhancing the pixel-level representation ability of the image and improving the overall generalization ability of the network. The proposed AKC-Stereo network improves on the base network: on the Scene Flow dataset, the EPE of AKC-Stereo reaches 0.45, an improvement of 0.02 over the base network, and on the KITTI 2015 dataset, AKC-Stereo outperforms the base network by 5.6% on the D1-all metric. Full article
(This article belongs to the Section Sensor Networks)
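The matching correlation volume that AKC-Stereo builds by a grouping correlation method (see Figure 2 below) can be illustrated with a minimal NumPy sketch: feature channels are split into groups, and each group contributes one correlation value per disparity. Shapes and the group count are illustrative assumptions, not the network's configuration.

```python
import numpy as np

def groupwise_correlation(feat_l, feat_r, max_disp, groups=8):
    """feat_l, feat_r: (C, H, W) feature maps; returns a (groups, max_disp, H, W) volume
    where each group contributes one correlation value per disparity."""
    c, h, w = feat_l.shape
    fl = feat_l.reshape(groups, c // groups, h, w)
    volume = np.zeros((groups, max_disp, h, w), dtype=np.float32)
    for d in range(max_disp):
        fr = np.roll(feat_r, d, axis=2)          # shift right-image features by d pixels
        fr[:, :, :d] = 0.0                       # columns with no valid correspondence
        volume[:, d] = (fl * fr.reshape(groups, c // groups, h, w)).mean(axis=1)
    return volume

vol = groupwise_correlation(np.random.rand(32, 8, 16), np.random.rand(32, 8, 16), max_disp=4)
print(vol.shape)  # (8, 4, 8, 16)
```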
Show Figures

Figure 1: (a) Performance comparison between the AKC-Stereo network proposed in this article and the base network on the KITTI 2015 dataset as the number of iterations changes; num_steps is set to 20,000, the abscissa is the number of iterations, and the ordinate is the EPE endpoint error. The orange part represents the performance of the base network, IGEV-Stereo, as the number of iterations changes, and the blue part represents the performance of the AKC-Stereo network. (b) Effect of smaller batch sizes on the performance of the proposed AKC-Stereo network (orange lines), RAFT-Stereo (green lines), and IGEV-Stereo (yellow lines) on the KITTI 2015 dataset with equal training rounds. The abscissa is the batch size, and the ordinate is the EPE endpoint error.
Figure 2: The overall structure diagram of the proposed AKC-Stereo. AKC-Stereo first constructs a multi-scale adaptive feature extractor (KAP), then computes the matching correlation volume by a grouping correlation method, and then preliminarily refines it using the matching attention (MAR) module. Finally, the refined correlation volume is fused with the context-encoded features obtained through residual blocks, and the combined data are fed into the GRU iteration for further optimization through iterative updates.
Figure 3: (a) The sampling search window with a normal dilation rate of 3 for feature extraction; (b) the positions to which the sampling points should be shifted after adding offsets; (c) the more precise sampling search window obtained after the offsets are applied.
Figure 4: Some adaptive search windows with varying dilation rates for a 3 × 3 convolutional kernel. Specifically, (a) shows the adaptive search window with a dilation rate of 1; (b) displays the adaptive search window with a dilation rate of 3; (c) presents the adaptive search window with a dilation rate of 4; and (e) shows the fusion of features into four feature maps by taking three layers of (a–c), three layers of (b–d), three layers of (a,b,d), and three layers of (a,c,d), respectively, after which a residual connection operation is performed to obtain the final feature map.
Figure 5: The concrete structure diagram of the KAP module.
Figure 6: The pink cube on the left is the 3D correlation volume of size W × H × D calculated by correlation. The cubes in the middle, positioned above and below, represent the disparity-level attention calculated for the W_{d_i} × H_{d_i} plane and the epipolar-line-level attention calculated for the W_{H_x} × d_{H_x} plane, respectively. The blue cube on the right illustrates the refined correlation volume after being processed by the MAR matching attention module. This refined correlation volume is then fed into the GRU for iterative updates to further optimize the disparity map.
Figure 7: Qualitative results for Middlebury. The first column shows the original images (left images) in the dataset, and the second and third columns show the results of IGEV-Stereo and the AKC-Stereo network proposed in this article, respectively. The proposed network exhibits better results for detailed parts as well as regions with a background.
Figure 8: Qualitative results on the KITTI 2015 dataset. The first column shows the left image of the original image in the dataset, the second column shows the results of the baseline IGEV-Stereo, and the third column shows the results of the AKC-Stereo network. From the direction indicated by the arrows, it can be seen that the AKC-Stereo network proposed in this article performs exceptionally well in areas with high light reflection, such as signs, and areas with thin structures, such as railings.
21 pages, 7841 KiB  
Article
Research on a Method for Measuring the Pile Height of Materials in Agricultural Product Transport Vehicles Based on Binocular Vision
by Wang Qian, Pengyong Wang, Hongjie Wang, Shuqin Wu, Yang Hao, Xiaoou Zhang, Xinyu Wang, Wenyan Sun, Haijie Guo and Xin Guo
Sensors 2024, 24(22), 7204; https://doi.org/10.3390/s24227204 - 11 Nov 2024
Viewed by 665
Abstract
The advancement of unloading technology in combine harvesting is crucial for the intelligent development of agricultural machinery. Accurately measuring material pile height in transport vehicles is essential, as uneven accumulation can lead to spillage and voids, reducing loading efficiency. Relying solely on manual observation for measuring stack height can decrease harvesting efficiency and pose safety risks due to driver distraction. This research applies binocular vision to agricultural harvesting, proposing a novel method that uses a stereo matching algorithm to measure material pile height during harvesting. By comparing distance measurements taken in both empty and loaded states, the method determines stack height. A linear regression model processes the stack height data, enhancing measurement accuracy. A binocular vision system was established, applying Zhang’s calibration method on the MATLAB (R2019a) platform to correct camera parameters, achieving a calibration error of 0.15 pixels. The study implemented block matching (BM) and semi-global block matching (SGBM) algorithms using the OpenCV (4.8.1) library on the PyCharm (2020.3.5) platform for stereo matching, generating disparity, and pseudo-color maps. Three-dimensional coordinates of key points on the piled material were calculated to measure distances from the vehicle container bottom and material surface to the binocular camera, allowing for the calculation of material pile height. Furthermore, a linear regression model was applied to correct the data, enhancing the accuracy of the measured pile height. The results indicate that by employing binocular stereo vision and stereo matching algorithms, followed by linear regression, this method can accurately calculate material pile height. The average relative error for the BM algorithm was 3.70%, and for the SGBM algorithm, it was 3.35%, both within the acceptable precision range. While the SGBM algorithm was, on average, 46 ms slower than the BM algorithm, both maintained errors under 7% and computation times under 100 ms, meeting the real-time measurement requirements for combine harvesting. In practical operations, this method can effectively measure material pile height in transport vehicles. The choice of matching algorithm should consider container size, material properties, and the balance between measurement time, accuracy, and disparity map completeness. This approach aids in manual adjustment of machinery posture and provides data support for future autonomous master-slave collaborative operations in combine harvesting. Full article
(This article belongs to the Special Issue AI, IoT and Smart Sensors for Precision Agriculture)
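As a rough illustration of the measurement pipeline described in the abstract (rectified stereo pair, SGBM disparity, reprojection to 3D, and an empty-versus-loaded depth comparison), the OpenCV sketch below may help; the SGBM parameter values, the ROI handling, and the empty_depth_m input are illustrative assumptions rather than the authors' settings.

```python
import cv2
import numpy as np

def measure_pile_height(img_left, img_right, Q, roi, empty_depth_m):
    """Pile-height estimate from a rectified stereo pair (minimal sketch).

    Q is the 4x4 reprojection matrix from cv2.stereoRectify; roi is a
    (y0, y1, x0, x1) window over the pile surface; empty_depth_m is the
    camera-to-container-bottom distance measured once in the empty state.
    All parameter values below are illustrative, not the paper's settings.
    """
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,          # must be a multiple of 16
        blockSize=7,
        P1=8 * 3 * 7 ** 2,
        P2=32 * 3 * 7 ** 2,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    disp = sgbm.compute(img_left, img_right).astype(np.float32) / 16.0
    points_3d = cv2.reprojectImageTo3D(disp, Q)        # metric XYZ per pixel

    y0, y1, x0, x1 = roi
    z = points_3d[y0:y1, x0:x1, 2]
    z = z[np.isfinite(z) & (z > 0)]
    surface_depth_m = float(np.median(z))              # camera-to-pile distance
    return empty_depth_m - surface_depth_m             # pile height
```

A BM variant would only swap cv2.StereoSGBM_create for cv2.StereoBM_create; a sketch of the linear correction applied to the returned heights appears after Figure 14 below.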
Figure 1: Principle of triangulation.
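For reference, the standard rectified-stereo triangulation relation illustrated by the figure, with focal length f, baseline B, disparity d, and principal point (c_x, c_y), is the textbook one below (not an equation quoted from the paper):

```latex
Z = \frac{f\,B}{d}, \qquad
X = \frac{(u - c_x)\,Z}{f}, \qquad
Y = \frac{(v - c_y)\,Z}{f}
```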
Figure 2">
Figure 2: Zhang's calibration steps.
Figure 3">
Figure 3: Corner extraction results for the checkerboard. (a) Calibration paper; (b) calibration plate.
Figure 4: Relative position between the binocular camera and the calibration board. (a) Calibration paper; (b) calibration plate.
Figure 5: Reprojection errors of the chessboard calibration. (a) Calibration paper; (b) calibration plate.
Figure 6: Epipolar correction. (a) Before correction; (b) after correction.
Figure 7: Basic workflow of stereo matching.
Figure 8: Method for measuring the height of piled materials.
Figure 9: The process of measuring the piled height of potatoes.
Figure 10: Images under no-load conditions. (a) Left image; (b) right image; (c) BM disparity map; (d) BM pseudo-color map; (e) SGBM disparity map; (f) SGBM pseudo-color map.
Figure 11: Images of three different load conditions.
Figure 12: Distance measurement results between the surface of stacked potatoes and the stereo camera under three different conditions. (a) State 1; (b) state 2; (c) state 3.
Figure 13: Regression model and evaluation metrics. (a) BM measurement values and calibrated values; (b) SGBM measurement values and calibrated values; (c) residual plot of the BM regression model; (d) residual plot of the SGBM regression model.
Figure 14: Comparison of pile heights and errors before and after calibration.
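The regression correction applied to the measured heights can be sketched in a few lines of NumPy; the numbers below are made up purely for illustration and are not the paper's data or fitted coefficients.

```python
import numpy as np

# Measured vs. reference pile heights (mm); made-up numbers, illustration only.
measured = np.array([182.0, 240.0, 305.0, 361.0])
reference = np.array([175.0, 231.0, 298.0, 352.0])

# Fit h_true ~ a * h_measured + b and apply the correction.
a, b = np.polyfit(measured, reference, deg=1)
corrected = a * measured + b
```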
">
17 pages, 13227 KiB  
Article
Robot Localization Method Based on Multi-Sensor Fusion in Low-Light Environment
by Mengqi Wang, Zengzeng Lian, María Amparo Núñez-Andrés, Penghui Wang, Yalin Tian, Zhe Yue and Lingxiao Gu
Electronics 2024, 13(22), 4346; https://doi.org/10.3390/electronics13224346 - 6 Nov 2024
Viewed by 740
Abstract
When robots perform localization in indoor low-light environments, factors such as weak and uneven lighting can degrade image quality. This degradation results in a reduced number of feature extractions by the visual odometry front end and may even cause tracking loss, thereby impacting the algorithm’s positioning accuracy. To enhance the localization accuracy of mobile robots in indoor low-light environments, this paper proposes a visual inertial odometry method (L-MSCKF) based on the multi-state constraint Kalman filter. Addressing the challenges of low-light conditions, we integrated Inertial Measurement Unit (IMU) data with stereo vision odometry. The algorithm includes an image enhancement module and a gyroscope zero-bias correction mechanism to facilitate feature matching in stereo vision odometry. We conducted tests on the EuRoC dataset and compared our method with other similar algorithms, thereby validating the effectiveness and accuracy of L-MSCKF. Full article
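The image enhancement module referred to above combines homomorphic filtering with CLAHE (the figure captions below compare the two and their combination). The sketch here shows one way such a module can be put together; the Gaussian transfer function, the cutoff sigma, and the default gains are assumptions chosen only to mirror the high-frequency gain, low-frequency gain, and contrast-threshold parameters discussed in Figure 2, not the authors' exact filter.

```python
import cv2
import numpy as np

def enhance_low_light(gray, gain_high=1.6, gain_low=0.3, clip_limit=4.0):
    """Low-light enhancement sketch: homomorphic filtering, then CLAHE.

    gain_high, gain_low, and clip_limit mirror the high-frequency gain,
    low-frequency gain, and contrast threshold discussed for the module;
    the Gaussian transfer function and cutoff below are assumptions.
    """
    rows, cols = gray.shape
    log_img = np.log1p(gray.astype(np.float32) / 255.0)

    # Frequency-domain transfer function: boost high frequencies, damp low ones.
    u = np.arange(rows) - rows / 2.0
    v = np.arange(cols) - cols / 2.0
    V, U = np.meshgrid(v, u)
    d2 = U ** 2 + V ** 2
    sigma = 30.0                                        # cutoff, illustrative
    H = (gain_high - gain_low) * (1.0 - np.exp(-d2 / (2.0 * sigma ** 2))) + gain_low

    spec = np.fft.fftshift(np.fft.fft2(log_img))
    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(spec * H)))
    homomorphic = np.expm1(filtered)
    homomorphic = cv2.normalize(homomorphic, None, 0, 255, cv2.NORM_MINMAX)
    homomorphic = homomorphic.astype(np.uint8)

    # Contrast-limited adaptive histogram equalization on the filtered image.
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(8, 8))
    return clahe.apply(homomorphic)
```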
Figure 1: Algorithm procedure.
Figure 2">
Figure 2: Selection of image enhancement algorithm parameters. (a) Low-frequency gain fixed at 0.5, sharpening coefficient at 1, and contrast threshold at 4. (b) High-frequency gain fixed at 1.6, sharpening coefficient at 1, and contrast threshold at 4. (c) High-frequency gain set to 1.6, low-frequency gain to 0.3, and contrast threshold to 4. (d) High-frequency gain fixed at 1.6, low-frequency gain at 0.3, and sharpening coefficient at 1.5.
Figure 3: Comparison of feature point extraction. (a) Feature point extraction on the original image. (b) After CLAHE processing. (c) After homomorphic filtering. (d) After both CLAHE and homomorphic filtering.
Figure 4: Estimation of gyroscope bias coefficients on the MH02 and V203 sequences. (a) Variation in gyroscope bias for L-MSCKF and MSCKF-VIO on the MH02 sequence. (b) Estimated gyroscope bias values for L-MSCKF and MSCKF-VIO on the V203 sequence.
Figure 5: Trajectories of the algorithm on sequences V103 and V203 of the EuRoC dataset. (a) Trajectory on the V103 sequence. (b) X, Y, and Z triaxial values on the V103 sequence. (c) Trajectory on the V203 sequence. (d) X, Y, and Z triaxial values on the V203 sequence.
Figure 6: Comparison of absolute trajectory errors of each algorithm on the low-light sequence V203.
Figure 7: Comparison of the computational efficiency of each algorithm. (a) Average CPU usage, as a percentage of the total available CPU, for each algorithm running the same experiment. (b) Total running time of each algorithm on the same dataset.
17 pages, 3301 KiB  
Article
Stereo and LiDAR Loosely Coupled SLAM Constrained Ground Detection
by Tian Sun, Lei Cheng, Ting Zhang, Xiaoping Yuan, Yanzheng Zhao and Yong Liu
Sensors 2024, 24(21), 6828; https://doi.org/10.3390/s24216828 - 24 Oct 2024
Viewed by 811
Abstract
In many robotic applications, creating a map is crucial, and 3D maps provide a method for estimating the positions of other objects or obstacles. Most of the previous research processes 3D point clouds through projection-based or voxel-based models, but both approaches have certain limitations. This paper proposes a hybrid localization and mapping method using stereo vision and LiDAR. Unlike the traditional single-sensor systems, we construct a pose optimization model by matching ground information between LiDAR maps and visual images. We use stereo vision to extract ground information and fuse it with LiDAR tensor voting data to establish coplanarity constraints. Pose optimization is achieved through a graph-based optimization algorithm and a local window optimization method. The proposed method is evaluated using the KITTI dataset and compared against the ORB-SLAM3, F-LOAM, LOAM, and LeGO-LOAM methods. Additionally, we generate 3D point cloud maps for the corresponding sequences and high-definition point cloud maps of the streets in sequence 00. The experimental results demonstrate significant improvements in trajectory accuracy and robustness, enabling the construction of clear, dense 3D maps. Full article
(This article belongs to the Section Navigation and Positioning)
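The ground information used for the coplanarity constraints is extracted from stereo with u-/v-disparity processing (see Figure 3 below). The following sketch shows only the basic v-disparity idea: accumulate a row-versus-disparity histogram, fit the dominant line, and keep pixels near it as ground. The least-squares fit and thresholds are assumptions; the paper's pipeline additionally removes large obstacles via the u-disparity map before fitting.

```python
import numpy as np

def ground_mask_from_v_disparity(disparity, max_disp=128, min_votes=50, tol=2.0):
    """Rough ground detection from a disparity map via v-disparity (sketch).

    For a roughly planar road, ground pixels trace an oblique line in the
    row-vs-disparity (v-disparity) histogram; pixels close to that line are
    kept as ground. The line fit and thresholds here are assumptions.
    """
    h, w = disparity.shape
    valid = (disparity > 0) & (disparity < max_disp)
    rows, cols = np.nonzero(valid)
    d = disparity[rows, cols].astype(np.int64)

    v_disp = np.zeros((h, max_disp), dtype=np.int64)
    np.add.at(v_disp, (rows, d), 1)                    # accumulate histogram

    # Keep, per image row, the dominant disparity bin and fit a line to it.
    ys = [v for v in range(h) if v_disp[v].max() >= min_votes]
    ds = [int(np.argmax(v_disp[v])) for v in ys]
    slope, intercept = np.polyfit(ys, ds, deg=1)       # d_ground(v) = slope*v + b

    expected = slope * np.arange(h)[:, None] + intercept
    return valid & (np.abs(disparity - expected) < tol)
```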
Figure 1: Pose optimization based on ground information. T, p, and q represent the transformation matrix, points on the plane, and points off the plane, respectively.
Figure 2">
Figure 2: The stereo sensor model and the coordinate systems used [34].
Figure 3: Region of interest extraction. (a) Left image. (b) Right image. (c) Disparity image. (d) v-disparity. (e) u-disparity; (d,e) are derived from (c). (f) Large obstacles removed by removing peak values from (e). (g) v-disparity based on (f); the red line is the disparity profile of the ground plane. (h) Detected ground plane and region of interest (RoI); the RoI is in the red box. (i) City 3D reconstruction; green represents ground.
Figure 4: Graph-structure optimization. P represents the nodes of visual points, and X represents the pose of the frame. “Ground” denotes the ground information extracted from the 3D reconstruction.
Figure 5: Trajectory estimates on the KITTI dataset. (a) 00. (b) 01. (c) 05. (d) 07. (e) 08. (f) 09.
Figure 6: High-definition point clouds for some streets in the 00 sequence. (a) 00. (b) 01. (c) 05. (d) 07. (e) 08. (f) 09.
Figure 7: 3D reconstruction based on road constraints, where green represents the road. (a) 00. (b) 01. (c) 05. (d) 07. (e) 08. (f) 09.
Figure 8: High-definition point clouds for some streets in the 00 sequence. The image in the top left corner is a 3D reconstruction of the entire city, and the other images depict details of its streets (a–e).
15 pages, 8542 KiB  
Article
The Adversarial Robust and Generalizable Stereo Matching for Infrared Binocular Based on Deep Learning
by Bowen Liu, Jiawei Ji, Cancan Tao, Jujiu Li and Yingxun Wang
J. Imaging 2024, 10(11), 264; https://doi.org/10.3390/jimaging10110264 - 22 Oct 2024
Viewed by 829
Abstract
Despite the considerable success of deep learning methods in stereo matching for binocular images, the generalizability and robustness of these algorithms, particularly under challenging conditions such as occlusions or degraded infrared textures, remain uncertain. This paper presents a novel deep-learning-based depth optimization method that obviates the need for large infrared image datasets and adapts seamlessly to any specific infrared camera. Moreover, this adaptability extends to standard binocular images, allowing the method to work effectively on both infrared and visible light stereo images. We further investigate the role of infrared textures in a deep learning framework, demonstrating their continued utility for stereo matching even in complex lighting environments. To compute the matching cost volume, we apply the multi-scale census transform to the input stereo images. A stacked sand leak subnetwork is subsequently employed to address the matching task. Our approach substantially improves adversarial robustness while maintaining accuracy, reducing the end-point error (EPE) by nearly half compared with state-of-the-art methods in quantitative evaluations on widely used autonomous driving datasets. Furthermore, the proposed method exhibits superior generalization capabilities, transitioning from simulated datasets to real-world datasets without the need for fine-tuning. Full article
(This article belongs to the Special Issue Deep Learning in Computer Vision)
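Since the matching cost here is built from a census transform, a compact NumPy sketch of a single-scale census transform and its Hamming-distance cost is given below; the window size, bit packing, and popcount trick are illustrative choices, and a multi-scale variant would repeat the transform at several window sizes or resolutions and sum the costs.

```python
import numpy as np

def census_transform(img, win=5):
    """Census transform with a win x win window (minimal sketch).

    Each pixel becomes a bit string recording whether each neighbor is darker
    than the center, so the matching cost (a Hamming distance) is robust to
    monotonic intensity changes such as those in low-texture infrared imagery.
    """
    h, w = img.shape
    r = win // 2
    pad = np.pad(img.astype(np.int32), r, mode="edge")
    code = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = pad[r + dy: r + dy + h, r + dx: r + dx + w]
            code = (code << np.uint64(1)) | (neighbor < img).astype(np.uint64)
    return code

def census_cost(code_l, code_r, d, n_bits=24):
    """Hamming cost between left pixels and right pixels shifted by disparity d.

    n_bits = win * win - 1 (24 for a 5 x 5 window); it is also used as the
    worst-case cost for columns that have no counterpart in the right image.
    """
    h, w = code_l.shape
    cost = np.full((h, w), n_bits, dtype=np.int32)
    x = code_l[:, d:] ^ code_r[:, : w - d]
    # Popcount of each uint64 via its eight bytes (NumPy-only trick).
    bits = np.unpackbits(x.view(np.uint8).reshape(h, w - d, 8), axis=-1)
    cost[:, d:] = bits.sum(axis=-1)
    return cost
```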
Figure 1: Workflow.
Figure 2">
Figure 2: Comparison of performance on standard datasets.
Figure 3: Illustration of the adversarial patch attack. The first row shows an image from KITTI 2015 with its ground truth, attacked by the PGD method. The remaining rows show the results of our method, LEAStereo, PSMNet, and GANet; the first and third of these rows are depth maps, and the second and fourth rows are the error maps of those depth maps against the ground truth. In the depth maps produced by our method, the colors run from light to dark (white, green, pink, red, blue, dark), corresponding to near to far.
Figure 4: Depth prediction map given by the baseline and our D. w/o b. method.
Figure 5: Depth prediction results in a real complex environment captured by an infrared camera.
Figure A1: Sample of the real infrared binocular data captured for training the model.
Figure A2: Laser depth map used for dataset annotation.
Figure A3: Sample of the annotated infrared stereo dataset (ground truth).
Figure A4: The PGD attack applied (the pixels are generated randomly).
19 pages, 29661 KiB  
Article
High-Precision Disparity Estimation for Lunar Scene Using Optimized Census Transform and Superpixel Refinement
by Zhen Liang, Hongfeng Long, Zijian Zhu, Zifei Cao, Jinhui Yi, Yuebo Ma, Enhai Liu and Rujin Zhao
Remote Sens. 2024, 16(21), 3930; https://doi.org/10.3390/rs16213930 - 22 Oct 2024
Viewed by 561
Abstract
High-precision lunar scene 3D data are essential for lunar exploration and the construction of scientific research stations. Currently, most existing data from orbital imagery offers resolutions up to 0.5–2 m, which is inadequate for tasks requiring centimeter-level precision. To overcome this, our research focuses on using in situ stereo vision systems for finer 3D reconstructions directly from the lunar surface. However, the scarcity and homogeneity of available lunar surface stereo datasets, combined with the Moon’s unique conditions—such as variable lighting from low albedo, sparse surface textures, and extensive shadow occlusions—pose significant challenges to the effectiveness of traditional stereo matching techniques. To address the dataset gap, we propose a method using Unreal Engine 4 (UE4) for high-fidelity physical simulation of lunar surface scenes, generating high-resolution images under realistic and challenging conditions. Additionally, we propose an optimized cost calculation method based on Census transform and color intensity fusion, along with a multi-level super-pixel disparity optimization, to improve matching accuracy under harsh lunar conditions. Experimental results demonstrate that the proposed method exhibits exceptional robustness and accuracy in our soon-to-be-released multi-scene lunar dataset, effectively addressing issues related to special lighting conditions, weak textures, and shadow occlusion, ultimately enhancing disparity estimation accuracy. Full article
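The "Census transform and color intensity fusion" cost mentioned in the abstract is commonly realized with an AD-Census-style exponential normalization, which maps each raw cost into [0, 1) before summing so that neither term dominates. The snippet below sketches that fusion only; the lambda values are illustrative, and whether the paper uses exactly this weighting is an assumption.

```python
import numpy as np

def fused_matching_cost(cost_census, cost_color, lam_census=30.0, lam_color=10.0):
    """Fuse Census and color-intensity costs into a single matching cost.

    The 1 - exp(-c / lambda) normalization maps each raw cost into [0, 1)
    so that neither term dominates; the lambda values are illustrative.
    """
    c_census = 1.0 - np.exp(-cost_census / lam_census)
    c_color = 1.0 - np.exp(-cost_color / lam_color)
    return c_census + c_color
```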
Figure 1: Lunar terrain simulation model. (a) Lunar surface terrain mesh refinement model. The red area is 10 km × 10 km, with each square grid measuring 25.6 cm. The blue area is 25 km × 25 km, with each square grid measuring 2.56 m. The largest area is 40 km × 40 km, with each square grid measuring 256 m. (b) Lunar terrain camera rendering.
Figure 2">
Figure 2: Wireframe view of scene objects.
Figure 3: Lighting model comparison. (a) Original lighting model. (b) Our improved lighting model.
Figure 4: Example from the lunar scene dataset created. (a) Left-camera image of the lunar scene. (b) Right-camera image of the lunar scene. (c) Ground-truth disparity map for the left-camera image. (d) Ground-truth depth map for the left-camera image.
Figure 5: Example results of the traditional stereo matching algorithms BM and SAD in lunar scenes. (a) Image data collected in our high-fidelity lunar scene physics simulation; this is the left image of a stereo pair. (b) Disparity map obtained using the BM algorithm, showing noticeable matching errors and disparity gaps, particularly in regions with weak textures and shadow occlusions. (c) Disparity map generated by the SAD algorithm, also exhibiting significant errors and disparity voids in texture-sparse and shadow-affected areas.
Figure 6: Workflow of the proposed method. On the far left is the input stereo image pair of the lunar scene. (a) The proposed optimized Census transform multi-feature fusion cost calculation. (b) The multi-layer superpixel disparity optimization.
Figure 7: Comparison of the optimized Census transform and the original Census transform in weakly textured areas. (a) Effect of noise on the original Census transform. (b) Effect of noise on the optimized Census transform.
Figure 8: Disparity estimation results for test images of the lunar research station scene dataset. (b–f) show the disparity estimates of the different methods in scenes 1–5, respectively. (a) Original image. (b) Disparity image obtained by SGM. (c) Disparity image obtained by StereoBM. (d) Disparity image obtained by AD-Census. (e) Disparity image obtained by MC-CNN. (f) Disparity image obtained by our method. (g) Ground-truth disparity maps.
Figure 9: Disparity estimation results for test images of the lunar research station scene dataset. (b–f) show the disparity estimates of the different methods in scenes 1–5, respectively. (a) Original image. (b) Disparity image obtained by SGM. (c) Disparity image obtained by StereoBM. (d) Disparity image obtained by AD-Census. (e) Disparity image obtained by MC-CNN. (f) Disparity image obtained by our method. (g) Ground-truth disparity maps.
Figure 10: Visual comparison of different methods on example images. The first and second rows show the results of two dataset pairs under different combination experiments, denoted Scene 1 and Scene 2, respectively. (a) Left image of the original pair. (b–e) Disparity maps of Census, Opt_Census, Census + SDO, and Opt_Census + SDO, respectively. (f) Ground-truth disparity maps.
Figure 11: Disparity estimation results for Yutu-2 real lunar scene dataset images. (a) The left-camera image from the stereo pair captured by the Yutu-2 panoramic camera. (b–e) Disparity maps estimated by the stereo matching algorithms SGM, StereoBM, AD-Census, and MC-CNN. (f) Disparity map estimated by our proposed method.
23 pages, 3934 KiB  
Article
A Multi-Scale Covariance Matrix Descriptor and an Accurate Transformation Estimation for Robust Point Cloud Registration
by Fengguang Xiong, Yu Kong, Xinhe Kuang, Mingyue Hu, Zhiqiang Zhang, Chaofan Shen and Xie Han
Appl. Sci. 2024, 14(20), 9375; https://doi.org/10.3390/app14209375 - 14 Oct 2024
Cited by 1 | Viewed by 834
Abstract
This paper presents a robust point cloud registration method based on a multi-scale covariance matrix descriptor and an accurate transformation estimation. Compared with state-of-the-art feature descriptors such as FPH, 3DSC, and spin images, the proposed multi-scale covariance matrix descriptor handles registration under higher noise better, since the averaging involved in building the covariance matrix filters out most noise-damaged samples and outliers and makes the descriptor itself robust to noise. Compared with transformation estimation approaches such as feature matching, clustering, ICP, and RANSAC, our transformation estimation finds a better transformation between a pair of point clouds because it is a multi-level estimator combining feature matching, coarse transformation estimation based on clustering, and fine transformation estimation based on ICP. Experimental findings show that the proposed feature descriptor and transformation estimation outperform state-of-the-art alternatives, and that registration based on our framework is highly successful on the Stanford 3D Scanning Repository, the SpaceTime dataset, and the Kinect dataset; the Stanford 3D Scanning Repository is known for its comprehensive collection of high-quality 3D scans, while the SpaceTime and Kinect datasets were captured by a SpaceTime Stereo scanner and a low-cost Microsoft Kinect scanner, respectively. Full article
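To make the descriptor concrete: a covariance descriptor stacks a k-dimensional feature vector for every point in a keypoint's neighborhood and summarizes them with the k × k covariance matrix, and a multi-scale variant concatenates such matrices computed at several neighborhood radii. The sketch below is a generic illustration with an assumed feature layout and a log-Euclidean distance; the specific features and metric used in the paper may differ.

```python
import numpy as np

def covariance_descriptor(features):
    """Covariance descriptor of a keypoint neighborhood (minimal sketch).

    features: (N, k) array with one k-dimensional feature vector per
    neighboring point (e.g. relative coordinates, normal angles, curvature).
    Which features the paper stacks, and at which radii for the multi-scale
    version, is described there; the layout here is illustrative.
    """
    cov = np.cov(features, rowvar=False)
    return cov + 1e-6 * np.eye(cov.shape[0])   # regularize so it stays SPD

def spd_log(m):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T

def descriptor_distance(c1, c2):
    """Log-Euclidean distance between two covariance descriptors."""
    return np.linalg.norm(spd_log(c1) - spd_log(c2), ord="fro")
```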
Figure 1: The framework of our point cloud registration.
Figure 2">
Figure 2: Distribution of a boundary point and a non-boundary point with their neighboring points.
Figure 3: Geometric relations α, β, and γ between a keypoint p and one of its neighboring points.
Figure 4: Samples of point clouds from our dataset.
Figure 5: Boundary points under various differences between adjacent included angles.
Figure 6: Keypoints on different point clouds. (a) Keypoint illustration 1 with boundary points retained. (b) Keypoint illustration 1 with boundary points removed. (c) Keypoint illustration 2 with boundary points retained. (d) Keypoint illustration 2 with boundary points removed.
Figure 7: Performance of covariance matrix descriptors formed from different feature vectors under different noise conditions.
Figure 8: Performance comparison between our proposed covariance matrix descriptor and state-of-the-art feature descriptors under different noise conditions.
Figure 9: The datasets used in the experiments.