Search Results (68)

Search Parameters:
Keywords = three-dimensional (3D) sparse imaging

20 pages, 9980 KiB  
Article
TGNF-Net: Two-Stage Geometric Neighborhood Fusion Network for Category-Level 6D Pose Estimation
by Xiaolong Zhao, Feihu Yan, Guangzhe Zhao and Caiyong Wang
Information 2025, 16(2), 113; https://doi.org/10.3390/info16020113 - 6 Feb 2025
Viewed by 449
Abstract
The main goal of six-dimensional pose estimation is to accurately ascertain the location and orientation of an object in three-dimensional space, which has a wide range of applications in the field of artificial intelligence. Due to the relative sparseness of the point cloud data captured by the depth camera, the ability of models to fully understand the shape, structure, and other features of the object is hindered. Consequently, the model exhibits weak generalization when faced with objects with significant shape differences in a new scene. The deep integration of feature levels and the mining of local and global information can effectively alleviate the influence of these factors. To solve these problems, we propose a new Two-Stage Geometric Neighborhood Fusion Network for category-level 6D pose estimation (TGNF-Net) to estimate objects that have not appeared in the training phase. It strengthens the fusion capacity of feature points within a specific range of neighborhoods, making the feature points more sensitive to both local and global geometric information. Our approach includes a neighborhood information fusion module, which can effectively utilize neighborhood information to enrich the feature set of different modal data and overcome the problem of heterogeneity between image and point cloud data. In addition, we design a two-stage geometric information embedding module, which can effectively fuse geometric information over a multi-scale range into keypoint features. This enhances the robustness of the model and enables it to exhibit stronger generalization capabilities when faced with unknown or complex scenes. These two strategies enhance the expression of features and make NOCS coordinate predictions more accurate. Extensive experiments show that our approach is superior to other classical methods on the CAMERA25, REAL275, HouseCat6D, and Omni6DPose datasets.
(This article belongs to the Section Artificial Intelligence)
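The cross-modal neighborhood fusion described in the abstract can be illustrated with a minimal sketch: cosine similarity between RGB and point cloud features selects, for each feature point, its most similar neighbors from the other modality, and the averaged neighbors are concatenated back onto the original features. The function name, the neighborhood size k, and the mean-then-concatenate fusion are illustrative assumptions, not the paper's exact module design.

```python
import numpy as np

def fuse_neighborhood(f_rgb, f_depth, k=8):
    """Sketch of cross-modal neighborhood fusion (illustrative only).

    f_rgb:   (N, C) RGB feature points
    f_depth: (M, C) point cloud (depth) feature points
    Returns features enriched with the mean of their k most similar
    neighbors from the other modality.
    """
    # Cosine similarity between the two modalities.
    a = f_rgb / np.linalg.norm(f_rgb, axis=1, keepdims=True)
    b = f_depth / np.linalg.norm(f_depth, axis=1, keepdims=True)
    sim = a @ b.T                                                     # (N, M)

    # For each RGB feature, gather its k most similar point cloud features (Nei-pc).
    nei_pc = f_depth[np.argsort(-sim, axis=1)[:, :k]].mean(axis=1)    # (N, C)
    # For each point cloud feature, gather its k most similar RGB features (Nei-rgb).
    nei_rgb = f_rgb[np.argsort(-sim.T, axis=1)[:, :k]].mean(axis=1)   # (M, C)

    f_rgb_fused = np.concatenate([f_rgb, nei_pc], axis=1)             # (N, 2C)
    f_depth_fused = np.concatenate([f_depth, nei_rgb], axis=1)        # (M, 2C)
    return f_rgb_fused, f_depth_fused
```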
Graphical abstract

Figure 1. Comparison between different networks. Unlike ordinary feature fusion (a), our method (b) fuses the neighborhood information of different modalities into the feature points. The red/green/gray cubes represent RGB/point cloud/neighborhood features, respectively.
Figure 2. Framework of TGNF-Net. RGB-D images are cropped and segmented, and F_rgb and F_depth are obtained through the feature extraction network. (i) Neighborhood Information Aggregation module: the similarity between F_rgb and F_depth is computed; for a given RGB feature point, F_rgb gathers the most similar neighborhood point cloud feature points Nei-pc, and for a given point cloud feature point, F_depth gathers the most similar neighborhood RGB feature points Nei-rgb. Nei-pc and Nei-rgb are fused into F_rgb and F_depth, respectively, to obtain F_rgbd. A set of category-shared learnable queries Q is initialized to represent the keypoint detector. (ii) Two-Stage Geometric Information Embedding module: in stage I, local features in the point cloud and global features among the feature points are extracted for each feature point; in stage II, the NNS algorithm doubles the number of detected point cloud and internal feature points.
Figure 3. Two-stage geometric information embedding process. The NNS algorithm is used to double the number of points.
Figure 4. Visual comparison of our method with other methods on the REAL275 dataset. Green bounding boxes represent the ground truth and red bounding boxes the prediction; green points indicate small errors and red points large errors.
Figure 5. Visual comparison of our method with other methods on the HouseCat6D dataset. Green bounding boxes represent the ground truth and red bounding boxes the prediction.
Figure 6. Visual comparison of our method with other methods on the Omni6DPose dataset. Green bounding boxes represent the ground truth and red bounding boxes the prediction.
Figure 7. Comparison of our method with AG-Pose and Query6DoF on different objects: precision trends at different cm-deg thresholds on the CAMERA25 dataset.
Figure 8. Visual comparison of our method with the other two modules. Green bounding boxes represent the ground truth and red bounding boxes the prediction.
22 pages, 10897 KiB  
Article
Array Three-Dimensional SAR Imaging via Composite Low-Rank and Sparse Prior
by Zhiliang Yang, Yangyang Wang, Chudi Zhang, Xu Zhan, Guohao Sun, Yuxuan Liu and Yuru Mao
Remote Sens. 2025, 17(2), 321; https://doi.org/10.3390/rs17020321 - 17 Jan 2025
Cited by 1 | Viewed by 464
Abstract
Array three-dimensional (3D) synthetic aperture radar (SAR) imaging has been used for 3D modeling of urban buildings and diagnosis of target scattering characteristics, and represents one of the significant directions in SAR development in recent years. However, sparse-driven 3D imaging methods usually capture only the sparse features of the imaging scene, which can result in the loss of the structural information of the target and cause bias effects, affecting the imaging quality. To address this issue, we propose a novel array 3D SAR imaging method based on a composite sparse and low-rank prior (SLRP), which can achieve high-quality imaging even with limited observation data. Firstly, an imaging optimization model based on the composite SLRP is established, which captures both sparse and low-rank features simultaneously by combining non-convex regularization functions and an improved nuclear norm (INN), reducing bias effects during the imaging process and improving imaging accuracy. Then, a framework that integrates variable splitting and alternating minimization (VSAM) is presented to solve the imaging optimization problem, which is suitable for high-dimensional imaging scenes. Finally, the performance of the method is validated through extensive simulation and real-data experiments. The results indicate that the proposed method can significantly improve imaging quality with limited observational data.
(This article belongs to the Special Issue SAR Images Processing and Analysis (2nd Edition))
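In generic and simplified form, a composite sparse plus low-rank imaging model of the kind described above can be written as the optimization problem below; here Y is the echo data, A the measurement operator, X the 3D reflectivity to be recovered, g(.) a non-convex sparsity penalty standing in for the paper's regularizer, and the nuclear-norm term a stand-in for the improved nuclear norm (INN), with trade-off weights lambda_1 and lambda_2. These symbols are our assumptions; the paper solves its exact model by variable splitting and alternating minimization (VSAM).

```latex
\min_{\mathbf{X}} \; \tfrac{1}{2}\,\|\mathbf{Y}-\mathbf{A}\mathbf{X}\|_F^2
\;+\; \lambda_1\, g(\mathbf{X})
\;+\; \lambda_2\, \|\mathbf{X}\|_{*}
```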
Graphical abstract

Figure 1. Imaging model and flowchart. (a) Geometric diagram of array 3D SAR imaging. (b) Process flowchart of the proposed method.
Figure 2. The low rank of the SAR image.
Figure 3. Aircraft target imaging results under 100% SR. (a) RI. (b) MFW. (c) L1. (d) SCAD. (e) LR-L1. (f) The proposed method.
Figure 4. Aircraft target imaging results under 75% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) The proposed method.
Figure 5. Aircraft target imaging results under 50% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 6. Aircraft target imaging results under 10 dB SNR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 7. Aircraft target imaging results under 5 dB SNR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 8. The target scene. (a) The metal hammer. (b) The metal stiletto.
Figure 9. The imaging result of the hammer under 100% SR. (a) RI. (b) MFW. (c) L1. (d) SCAD. (e) LR-L1. (f) CP-VSAM.
Figure 10. The imaging result of the hammer under 75% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 11. The imaging result of the hammer under 50% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 12. The imaging result of the stiletto under 100% SR. (a) RI. (b) MFW. (c) L1. (d) SCAD. (e) LR-L1. (f) CP-VSAM.
Figure 13. The imaging result of the stiletto under 75% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 14. The imaging result of the stiletto under 50% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 15. Aircraft target imaging results under 0 dB SNR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 16. Aircraft target imaging results under -5 dB SNR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
16 pages, 2102 KiB  
Article
Semantic Segmentation Method for High-Resolution Tomato Seedling Point Clouds Based on Sparse Convolution
by Shizhao Li, Zhichao Yan, Boxiang Ma, Shaoru Guo and Hongxia Song
Agriculture 2025, 15(1), 74; https://doi.org/10.3390/agriculture15010074 - 31 Dec 2024
Viewed by 525
Abstract
Semantic segmentation of three-dimensional (3D) plant point clouds at the stem-leaf level is foundational and indispensable for high-throughput tomato phenotyping systems. However, existing semantic segmentation methods often suffer from issues such as low precision and slow inference speed. To address these challenges, we propose an innovative encoding-decoding structure incorporating voxel sparse convolution (SpConv) and attention-based feature fusion (VSCAFF) to enhance semantic segmentation of point clouds from high-resolution tomato seedling images. Tomato seedling point clouds from the Pheno4D dataset, labeled into the semantic classes 'leaf', 'stem', and 'soil', are used for the semantic segmentation. To reduce the number of parameters and thereby further improve inference speed, the SpConv module is designed around the residual concatenation of a skeleton convolution kernel and a regular convolution kernel. The attention-based feature fusion module assigns attention weights to the voxel diffusion features and the point features, avoiding the ambiguity caused by the diffusion module when points with different semantics share the same characteristics, while also suppressing noise. Finally, to address the class bias in model training caused by the uneven distribution of point cloud classes, a composite loss function of Lovász-Softmax and weighted cross-entropy is introduced to supervise training and improve performance. The results show that the mIoU of VSCAFF is 86.96%, outperforming PointNet, PointNet++, and DGCNN. VSCAFF achieves an IoU of 99.63% for the soil class, 64.47% for the stem class, and 96.72% for the leaf class. Its inference latency of 35 ms is lower than that of PointNet++ and DGCNN. These results demonstrate that VSCAFF offers high accuracy and inference speed for semantic segmentation of high-resolution tomato point clouds and can provide technical support for high-throughput automatic phenotypic analysis of tomato plants.
(This article belongs to the Section Digital Agriculture)
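The composite loss couples a Lovász-Softmax term with weighted cross-entropy to counter class imbalance. A minimal sketch of the weighted cross-entropy part, assuming inverse-frequency class weights, is shown below; the Lovász-Softmax term is omitted here, and how the two terms are weighted in the paper is not reproduced.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_freq, eps=1e-8):
    """Weighted cross-entropy over N points and C classes (sketch).

    probs:      (N, C) softmax probabilities
    labels:     (N,)   integer class labels
    class_freq: (C,)   per-class point counts in the training set
    """
    # Inverse-frequency class weights, normalized to sum to 1.
    w = 1.0 / (class_freq + eps)
    w = w / w.sum()
    # Per-point negative log-likelihood, scaled by its class weight.
    nll = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.mean(w[labels] * nll))

# Composite form (sketch): total_loss = lovasz_softmax_term + weighted_cross_entropy(...)
```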
Figure 1. The raw tomato seedling point cloud and the point cloud labeled into the semantic classes 'leaf', 'stem', and 'soil'.
Figure 2. The structure of the network.
Figure 3. Encoding-decoding architecture based on SpConv.
Figure 4. Three kinds of convolution kernel structures.
Figure 5. Attention-based feature fusion method.
Figure 6. Semantic segmentation of the tomato plant point clouds. Note 1: GT represents ground truth. Note 2: four seedling point clouds scanned on four different days are shown.
13 pages, 7953 KiB  
Article
TAMC: Textual Alignment and Masked Consistency for Open-Vocabulary 3D Scene Understanding
by Juan Wang, Zhijie Wang, Tomo Miyazaki, Yaohou Fan and Shinichiro Omachi
Sensors 2024, 24(19), 6166; https://doi.org/10.3390/s24196166 - 24 Sep 2024
Viewed by 806
Abstract
Three-dimensional (3D) Scene Understanding achieves environmental perception by extracting and analyzing point cloud data, with wide applications including virtual reality, robotics, etc. Previous methods align the 2D image features from a pre-trained CLIP model with 3D point cloud features to obtain open-vocabulary scene understanding ability. We believe that existing methods have two deficiencies: (1) the 3D feature extraction process ignores the challenges of real scenarios, i.e., point cloud data are very sparse and even incomplete; (2) the training stage lacks direct text supervision, leading to inconsistency with the inference stage. To address the first issue, we employ a Masked Consistency training policy. Specifically, during the alignment of 3D and 2D features, we mask some 3D features to force the model to understand the entire scene using only partial 3D features. For the second issue, we generate pseudo-text labels and align them with the 3D features during the training process. In particular, we first generate a description for each 2D image belonging to the same 3D scene and then use a summarization model to fuse these descriptions into a single description of the scene. Subsequently, we align 2D-3D features and 3D-text features simultaneously during training. Extensive experiments demonstrate the effectiveness of our method, which outperforms state-of-the-art approaches.
(This article belongs to the Special Issue Object Detection via Point Cloud Data)
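At inference, open-vocabulary segmentation reduces to matching per-point 3D features against prompt text embeddings by cosine similarity, as stated in the abstract and Figure 3 caption. A minimal sketch is below; the feature shapes and function name are assumptions.

```python
import numpy as np

def open_vocab_segment(point_feats, text_feats):
    """Assign each 3D point to the class whose text embedding is most similar.

    point_feats: (N, D) per-point 3D features from the trained encoder
    text_feats:  (C, D) text embeddings of prompts like 'an XX in a scene'
    Returns (N,) class indices.
    """
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return np.argmax(p @ t.T, axis=1)  # cosine similarity, then argmax over classes
```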
Figure 1. Comparison of previous methods and ours. (a) The workflow of methods with a class limitation, which need labeled 3D point cloud data of seen classes and evaluate the model on unlabeled classes. In comparison, both (b) OpenScene [7] and (c) our method follow the open-vocabulary setting, meaning that no annotated data are required during training. The difference is that the proposed training process includes supervision from the pseudo-text label, and the masking training policy makes the extracted 3D features more robust, resulting in higher accuracy.
Figure 2. Caption generation. Multi-view images are fed into the image captioning model G_cap to generate the corresponding captions t_i of a scene with N images; the text-summarization model G_sum then summarizes (t_1, ..., t_i, ..., t_N) into a scene-level caption t.
Figure 3. Method overview. (a) Training: given a 3D point cloud, a set of posed images, and a scene-level caption, a 3D encoder eps_theta^3D is trained to produce a masked point-wise 3D feature F_Mask^3D with two losses, L1 for the F_Mask^3D-F^2D loss and L2 for the F_Mask^3D-F^Text loss (refer to Section 3.4). (b) Testing: cosine similarity between per-point features and text features is used to perform open-vocabulary 3D Scene Understanding tasks. 'an XX in a scene' serves as the input text prompt, where 'XX' is the query text, taken from the dataset classes during the segmentation task.
Figure 4. Qualitative results on ScanNet [26]. From left to right: 3D input and related 2D image, (a) the result of the baseline method (OpenScene [7]), (b) the proposed method, (c) the ground-truth segmentation.
15 pages, 2064 KiB  
Article
Research on the Depth Image Reconstruction Algorithm Using the Two-Dimensional Kaniadakis Entropy Threshold
by Xianhui Yang, Jianfeng Sun, Le Ma, Xin Zhou, Wei Lu and Sining Li
Sensors 2024, 24(18), 5950; https://doi.org/10.3390/s24185950 - 13 Sep 2024
Viewed by 780
Abstract
Photon-counting light detection and ranging (LiDAR), especially Geiger-mode avalanche photodiode (Gm-APD) LiDAR, can obtain three-dimensional images of a scene with single-photon sensitivity, but background noise limits its imaging quality. To solve this problem, a depth image estimation method based on a two-dimensional (2D) Kaniadakis entropy thresholding method is proposed, which transforms a weak-signal extraction problem into a denoising problem for point cloud data. The characteristics of signal peak aggregation in the data and the spatio-temporal correlation between target image elements in the point cloud-intensity data are exploited. Through extensive simulations and outdoor target-imaging experiments under different signal-to-background ratios (SBRs), the effectiveness of the method under low-SBR conditions is demonstrated. When the SBR is 0.025, the proposed method reaches a target recovery rate of 91.7%, which is better than existing typical methods such as the Peak-picking method, the Cross-Correlation method, and the sparse Poisson intensity reconstruction algorithm (SPIRAL), which achieve target recovery rates of 15.7%, 7.0%, and 18.4%, respectively. Additionally, compared with SPIRAL, the reconstruction recovery ratio is improved by 73.3%. The proposed method greatly improves the integrity of the target in high-background-noise environments and provides a basis for feature extraction and target recognition.
(This article belongs to the Special Issue Application of LiDAR Remote Sensing and Mapping)
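The Kaniadakis (kappa-) entropy generalizes Shannon entropy through the kappa-logarithm ln_k(x) = (x^k - x^(-k)) / (2k). A minimal one-dimensional sketch of entropy-based threshold selection over a photon-count histogram is given below; the paper's method is two-dimensional (it adds a spatial-neighborhood dimension), so this only illustrates the entropy criterion, and the Kapur-style background/signal split used here is our assumption.

```python
import numpy as np

def kappa_log(x, kappa=0.5):
    """Kaniadakis kappa-logarithm; reduces to ln(x) as kappa -> 0."""
    return (x**kappa - x**(-kappa)) / (2.0 * kappa)

def kappa_entropy(p, kappa=0.5, eps=1e-12):
    """Kaniadakis entropy S = -sum_i p_i * ln_kappa(p_i) of a distribution p."""
    p = p[p > eps]
    return float(-np.sum(p * kappa_log(p, kappa)))

def kaniadakis_threshold(hist, kappa=0.5, eps=1e-12):
    """Choose the time-bin threshold maximizing background + signal entropy (1D sketch)."""
    p = hist.astype(float) / (hist.sum() + eps)
    best_t, best_s = 1, -np.inf
    for t in range(1, len(p) - 1):
        pb, pf = p[:t].sum(), p[t:].sum()
        if pb < eps or pf < eps:
            continue
        s = kappa_entropy(p[:t] / pb, kappa) + kappa_entropy(p[t:] / pf, kappa)
        if s > best_s:
            best_t, best_s = t, s
    return best_t
```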
Figure 1. Flowchart of the proposed algorithm.
Figure 2. (a) Distribution of the probability density of the signal and noise when the signal position is at the 300th time bin. (b) Counting histogram of 0.1 s of data with signal and noise.
Figure 3. Signal detection rate for extracting different numbers of wave peaks under different SBRs.
Figure 4. (a) The standard depth image. (b) The standard intensity image.
Figure 5. Depth images obtained using different reconstruction methods. (a-d) SBR is 0.01; (e-h) SBR is 0.02; (i-l) SBR is 0.04; (m-p) SBR is 0.06; (q-t) SBR is 0.08. (a,e,i,m,q) Peak-picking method; (b,f,j,n,r) Cross-Correlation method; (c,g,k,o,s) SPIRAL method; (d,h,l,p,t) proposed method.
Figure 6. The relationship between the SBR of the scene and the SNR of the reconstructed image using different methods.
Figure 7. Photos of the experimental targets.
Figure 8. The number of echo photons per pixel of the scenes. (a) Building at a distance of 730 m with an SBR of 0.078. (b) Building at a distance of 730 m with an SBR of 0.053. (c) Building at a distance of 730 m with an SBR of 0.031. (d) Building at a distance of 1400 m with an SBR of 0.031. (e) Building at a distance of 1400 m with an SBR of 0.025.
Figure 9. Truth depth images and depth images obtained using different methods for the 730 m building. (a-e) SBR is 0.078; (f-j) SBR is 0.053; (k-o) SBR is 0.031. (a,f,k) Truth depth image; (b,g,l) Peak-picking method; (c,h,m) Cross-Correlation method; (d,i,n) SPIRAL method; (e,j,o) proposed method.
Figure 10. Truth depth images and depth images obtained using different methods for the 1400 m building. (a-e) SBR is 0.031; (f-j) SBR is 0.025. (a,f) Truth depth image; (b,g) Peak-picking method; (c,h) Cross-Correlation method; (d,i) SPIRAL method; (e,j) proposed method.
Figure 11. Intensity images with different signal-to-background ratios. (a) A 730 m building intensity image with an SBR of 0.078; (b) a 730 m building intensity image with an SBR of 0.053; (c) a 730 m building intensity image with an SBR of 0.031; (d) a 1400 m building intensity image with an SBR of 0.031; (e) a 1400 m building intensity image with an SBR of 0.025.
22 pages, 13810 KiB  
Article
An Underwater Stereo Matching Method: Exploiting Segment-Based Method Traits without Specific Segment Operations
by Xinlin Xu, Huiping Xu, Lianjiang Ma, Kelin Sun and Jingchuan Yang
J. Mar. Sci. Eng. 2024, 12(9), 1599; https://doi.org/10.3390/jmse12091599 - 10 Sep 2024
Viewed by 1136
Abstract
Stereo matching technology, enabling the acquisition of three-dimensional data, holds profound implications for marine engineering. In underwater images, irregular object surfaces and the absence of texture information make it difficult for stereo matching algorithms that rely on discrete disparity values to accurately capture the 3D details of underwater targets. This paper proposes a stereo matching method based on a Markov random field (MRF) energy function with 3D labels to fit the inclined surfaces of underwater objects. Through the integration of a cross-based patch alignment approach with two label optimization stages, the proposed method exhibits traits akin to segment-based stereo matching methods, enabling it to handle images with sparse textures effectively. Experiments conducted on both simulated UW-Middlebury datasets and real deteriorated underwater images, assessed through the acquired disparity maps and the three-dimensional reconstruction of the underwater targets, demonstrate the superiority of our method over classical and state-of-the-art methods.
(This article belongs to the Special Issue Underwater Observation Technology in Marine Environment)
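For context, in 3D-label stereo matching each pixel p is assigned a plane label f_p = (a_p, b_p, c_p) so that disparity varies continuously over inclined surfaces, and labels are chosen by minimizing an MRF energy of the usual data-plus-smoothness form, sketched below in generic notation. The paper's specific data cost, cross-based patch construction, and two-stage label optimization are not reproduced here.

```latex
d_p(f_p) = a_p\,x_p + b_p\,y_p + c_p, \qquad
E(f) = \sum_{p} \rho_{\text{data}}\!\big(p,\, d_p(f_p)\big)
     + \lambda \sum_{(p,q)\in\mathcal{N}} \rho_{\text{smooth}}\!\big(f_p,\, f_q\big)
```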
Figure 1. The pipeline of our method, which is founded on the construction of two units, the fixed grid and the adaptive region, corresponding to different matching stages for generating proposals.
Figure 2. The relation between a cross-based patch and a center grid. The cross-based patch constructed with the center p of grid C_ij as the anchor is covered by grid C_ij.
Figure 3. Illustration of the process of enhancing the cross-based patch. Binary labels "0" and "1" are employed to signify whether the pixels are located within the small speckle.
Figure 4. The propagation process and expansion process occurring in a center grid C_ij.
Figure 5. Summary of the proposed methodology.
Figure 6. UW-Middlebury dataset.
Figure 7. Effects of different optimization stages on error rate.
Figure 8. Effect of optimization stages on running time. Prop_r and Prop_c signify the use of the two optimization stages for conducting disparity estimation for Reindeer and Cones, respectively.
Figure 9. Visual effect of the optimization stages. The images illustrate the changes in disparity during the coarse-to-fine matching stage, with and without the propagation process. Frames 1 and 3 show the method's performance in boundary regions, whereas frame 2 highlights its effectiveness in areas with repetitive textures.
Figure 10. Visualization analysis of disparity results from diverse algorithms on the dataset captured by the Hawaii Institute [3,21,24,35].
Figure 11. Visualization analysis of disparity results from diverse algorithms on the dataset captured by our institute [3,21,24,35].
18 pages, 6500 KiB  
Article
NSVDNet: Normalized Spatial-Variant Diffusion Network for Robust Image-Guided Depth Completion
by Jin Zeng and Qingpeng Zhu
Electronics 2024, 13(12), 2418; https://doi.org/10.3390/electronics13122418 - 20 Jun 2024
Cited by 1 | Viewed by 1098
Abstract
Depth images captured by low-cost three-dimensional (3D) cameras are subject to low spatial density, requiring depth completion to improve 3D imaging quality. Image-guided depth completion aims at predicting dense depth images from extremely sparse depth measurements captured by depth sensors with the guidance of aligned Red-Green-Blue (RGB) images. Recent approaches have achieved remarkable improvement, but performance degrades severely when the input sparse depth is corrupted. To enhance robustness to input corruption, we propose a novel depth completion scheme based on a normalized spatial-variant diffusion network incorporating measurement uncertainty, which introduces the following contributions. First, we design a normalized spatial-variant diffusion (NSVD) scheme that iteratively applies spatially varying filters to the sparse depth, conditioned on its certainty measure, to exclude depth corruption from the diffusion. In addition, we integrate the NSVD module into the network design to enable end-to-end training of the filter kernels and depth reliability, which further improves structural detail preservation via the guidance of RGB semantic features. Furthermore, we apply the NSVD module hierarchically at multiple scales, which ensures global smoothness while preserving visually salient details. The experimental results validate the advantages of the proposed network over existing approaches, with enhanced performance and noise robustness for depth completion in real-use scenarios.
(This article belongs to the Special Issue Image Sensors and Companion Chips)
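The core of a normalized spatial-variant diffusion step can be sketched as a confidence-weighted local average: each depth value is re-estimated from its neighbors, weighted jointly by a spatially varying kernel and each neighbor's certainty, so that corrupted samples contribute little. The window size, kernel source, and update form below are illustrative assumptions, not the trained network's learned kernels.

```python
import numpy as np

def nsvd_step(depth, certainty, kernels, radius=1, eps=1e-8):
    """One normalized spatial-variant diffusion iteration (illustrative sketch).

    depth:     (H, W) current depth estimate
    certainty: (H, W) per-pixel reliability in [0, 1]
    kernels:   (H, W, (2*radius+1)**2) spatially varying weights (e.g., from RGB features)
    """
    H, W = depth.shape
    k = 2 * radius + 1
    pad_d = np.pad(depth, radius, mode="edge")
    pad_c = np.pad(certainty, radius, mode="edge")
    out = np.zeros_like(depth, dtype=float)
    for i in range(H):
        for j in range(W):
            nd = pad_d[i:i + k, j:j + k].ravel()            # neighboring depths
            nc = pad_c[i:i + k, j:j + k].ravel()            # neighboring certainties
            w = kernels[i, j] * nc                          # kernel weight x certainty
            out[i, j] = np.sum(w * nd) / (np.sum(w) + eps)  # normalized average
    return out
```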
Figure 1. Example in the NYUv2 dataset [12]. (a) RGB image input, (b) sparse depth input, and depth estimation with (c) PNCNN [6] using single depth, (d) MiDaS [7] using single RGB, (e) NLSPN [11], and (f) the proposed NSVDNet using both RGB and depth. As highlighted in the black rectangles, (f) NSVDNet generates more accurate structural details than (e) NLSPN due to the uncertainty-aware diffusion scheme. The results are evaluated using the RMSE metric, where (f) NSVDNet achieves the smallest RMSE, indicating improved accuracy.
Figure 2. An overview of the NSVDNet architecture, which predicts a dense depth from a disturbed sparse depth with RGB guidance. NSVDNet is composed of the depth-dominant branch, which estimates the initial dense depth from the sparse sensor depth, and the RGB-dominant branch, which generates the semantic structural features. The two branches are fused in the hierarchical NSVD modules, where the initial dense depth is diffused with spatial-variant diffusion kernels constructed from RGB features.
Figure 3. Depth completion with different algorithms, tested on the NYUv2 dataset. As highlighted in the red rectangles, the proposed NSVDNet achieves more accurate depth completion results with detail preservation and noise robustness.
Figure 4. Comparison of depth completion with the original sparse depth and a noisy sparse depth with 50% outliers, tested on the NYUv2 dataset. The comparison between results with original and noisy inputs demonstrates the robustness of the proposed method to input corruption. The selected patches are enlarged in the colored rectangles.
Figure 5. Generalization ability evaluation on the TetrasRGBD dataset with outliers. The certainty maps explain the robustness of NSVDNet to input corruptions.
Figure 6. Generalization ability evaluation on the TetrasRGBD dataset with real sensor data, where the proposed NSVDNet generates more accurate depth estimation than competitive methods, including PNCNN [38] and NLSPN [11].
16 pages, 3250 KiB  
Article
Iterative Adaptive Based Multi-Polarimetric SAR Tomography of the Forested Areas
by Shuang Jin, Hui Bi, Qian Guo, Jingjing Zhang and Wen Hong
Remote Sens. 2024, 16(9), 1605; https://doi.org/10.3390/rs16091605 - 30 Apr 2024
Cited by 2 | Viewed by 1502
Abstract
Synthetic aperture radar tomography (TomoSAR) is an extension of synthetic aperture radar (SAR) imaging. It introduces the synthetic aperture principle into the elevation direction to achieve three-dimensional (3-D) reconstruction of the observed target. Compressive sensing (CS) is a favorable technology for sparse elevation recovery. However, for the non-sparse elevation distribution of forested areas, CS-based reconstruction requires orthogonal bases to first represent the elevation reflectivity sparsely. The iterative adaptive approach (IAA) is a non-parametric algorithm that enables super-resolution reconstruction with minimal snapshots, eliminates the need for hyperparameter optimization, and requires fewer iterations. This paper introduces IAA to tomographic inversion of forested areas and proposes a novel multi-polarimetric-channel joint 3-D imaging method. The proposed method relies on the consistent support of the elevation distribution across polarimetric channels and uses the L2-norm to constrain the IAA-based 3-D reconstruction of each polarimetric channel. Compared with typical spectral estimation (SE)-based algorithms, the proposed method suppresses elevation sidelobes and ambiguity and, hence, improves the quality of the recovered 3-D image. Compared with the wavelet-based CS algorithm, it reduces computational cost and avoids the influence of orthogonal basis selection. In addition, in comparison to the IAA, it demonstrates greater accuracy in identifying the support of the elevation distribution in forested areas. Experimental results based on BioSAR 2008 data validate the proposed method.
(This article belongs to the Special Issue Advances in Synthetic Aperture Radar Data Processing and Application)
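For reference, a generic single-channel iterative adaptive approach (IAA) estimates the elevation power spectrum by alternating a power update with a covariance rebuild; a minimal single-snapshot sketch for one azimuth-range pixel is given below. The steering-matrix construction, iteration count, and initialization are assumptions, and the paper's L2-constrained multi-polarimetric joint version is not reproduced here.

```python
import numpy as np

def iaa_spectrum(y, A, n_iter=15, eps=1e-10):
    """Single-snapshot IAA power estimate along elevation (generic sketch).

    y: (M,)    complex measurements across the M elevation apertures
    A: (M, K)  steering matrix over K candidate elevation positions
    Returns (K,) estimated powers along elevation.
    """
    M, K = A.shape
    p = np.abs(A.conj().T @ y) ** 2 / M**2             # beamforming initialization
    for _ in range(n_iter):
        R = (A * p) @ A.conj().T + eps * np.eye(M)     # R = A diag(p) A^H
        Rinv_y = np.linalg.solve(R, y)
        Rinv_A = np.linalg.solve(R, A)
        num = A.conj().T @ Rinv_y                      # a_k^H R^-1 y
        den = np.einsum("mk,mk->k", A.conj(), Rinv_A)  # a_k^H R^-1 a_k
        p = np.abs(num / den) ** 2                     # updated powers
    return p
```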
Figure 1. TomoSAR imaging geometry.
Figure 2. Scattering of a forested area. (a) Scattering mechanism. (b) Scattering distribution.
Figure 3. Schematic diagram of the backscattering coefficients of the HH, HV, and VV polarimetric channels.
Figure 4. Elevation aperture positions in the BioSAR 2008 dataset.
Figure 5. Implementation process of TomoSAR 3-D imaging of the forested areas based on the proposed method.
Figure 6. Polarimetric SAR image of the surveillance area (the yellow area numbers 1 and 2 represent the two slices selected for the experiments).
Figure 7. Amplitude and phase results after data preprocessing for the (a) HH, (b) HV, and (c) VV polarimetric channels.
Figure 8. The incoherent sum of the results for all polarization channels (Slice 1). (a) BF. (b) Capon. (c) MUSIC. (d) Wavelet-based L1. (e) IAA. (f) The proposed method. The white line represents the LiDAR DSM.
Figure 9. The incoherent sum of the results for all polarization channels (Slice 2). (a) BF. (b) Capon. (c) MUSIC. (d) Wavelet-based L1. (e) IAA. (f) The proposed method. The white line represents the LiDAR DSM.
Figure 10. 3-D point cloud map of the entire surveillance region reconstructed by the proposed method.
26 pages, 19577 KiB  
Article
Enhancing Building Point Cloud Reconstruction from RGB UAV Data with Machine-Learning-Based Image Translation
by Elisabeth Johanna Dippold and Fuan Tsai
Sensors 2024, 24(7), 2358; https://doi.org/10.3390/s24072358 - 8 Apr 2024
Cited by 1 | Viewed by 1696
Abstract
The performance of three-dimensional (3D) point cloud reconstruction is affected by dynamic features such as vegetation. Vegetation can be detected by near-infrared (NIR)-based indices; however, the sensors providing multispectral data are resource intensive. To address this issue, this study proposes a two-stage framework to first improve the performance of 3D point cloud generation of buildings with a two-view SfM algorithm, and second, to reduce noise caused by vegetation. The proposed framework can also overcome the lack of near-infrared data when identifying vegetation areas for reducing interferences in the SfM process. The first stage includes cross-sensor training, model selection and the evaluation of image-to-image RGB to color infrared (CIR) translation with Generative Adversarial Networks (GANs). The second stage includes feature detection with multiple feature detector operators, feature removal with respect to the NDVI-based vegetation classification, masking, matching, pose estimation and triangulation to generate sparse 3D point clouds. The materials utilized in both stages are a publicly available RGB-NIR dataset, and satellite and UAV imagery. The experimental results indicate that the cross-sensor and category-wise validation achieves an accuracy of 0.9466 and 0.9024, with kappa coefficients of 0.8932 and 0.9110, respectively. The histogram-based evaluation demonstrates that the predicted NIR band is consistent with the original NIR data of the satellite test dataset. Finally, the test on the UAV RGB and artificially generated NIR with a segmentation-driven two-view SfM proves that the proposed framework can effectively translate RGB to CIR for NDVI calculation. Furthermore, the artificially generated NDVI is able to segment and classify vegetation. As a result, the generated point cloud is less noisy, and the 3D model is enhanced.
(This article belongs to the Section Sensing and Imaging)
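The vegetation masking step rests on the standard NDVI definition, NDVI = (NIR - Red) / (NIR + Red), thresholded to flag vegetated pixels (0.6 is used for the satellite example in Figure 10 and 0.5 for the UAV experiment in Figure 18). A minimal sketch, assuming float image bands of equal size:

```python
import numpy as np

def ndvi_mask(nir, red, threshold=0.5, eps=1e-8):
    """Compute NDVI from NIR and red bands and mask vegetated pixels."""
    ndvi = (nir - red) / (nir + red + eps)
    return ndvi, ndvi > threshold   # boolean mask: True where vegetation

# Detected features falling inside the mask would be discarded before matching
# and triangulation, so vegetation contributes less noise to the sparse point cloud.
```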
Figure 1. Two-stage framework separated into the machine learning technique (gray) and the application in three steps (green). First (gray), the CIR image is generated from RGB with image-to-image translation. Then (light green), the NDVI is calculated with the generated NIR and red bands. Next (medium green), the NDVI segmentation and classification are used to match the detected features accordingly. Finally (dark green), pose estimation and triangulation are used to generate a sparse 3D point cloud.
Figure 2. First stage of the two-stage workflow. (a) Image-to-image translation in five steps for RGB2CIR simulation: input and pre-processing (orange), training and testing (green), and verification and validation (yellow). (b) Image-to-image translation training.
Figure 3. Second stage of the framework: the segmentation-driven two-view SfM algorithm. The processing steps are grouped by color: NDVI-related processing (green), input and feature detection (orange), feature processing (yellow), and output (blue).
Figure 4. Pleiades VHR satellite imagery, with the nadir view in true color (RGB). The location of the study target is marked in orange and used for validation (see Section 3.2.3).
Figure 5. The validation target captured by the Pleiades VHR satellite. (a) The target stadium; (b) the geolocation of the target (marked in orange in Figure 4); (c) the target ground truth (GT) CIR image and GT NDVI of the target building and its vicinity.
Figure 6. Morphological changes on the image covering the target and image tiles. (a) Original cropped CIR image of Pleiades satellite imagery (1024 x 1024 x 3). A single tile, the white rectangle in (a), is shown as (e). (b-d) and (f-i) are the morphed images of (a) and (e), respectively.
Figure 7. Training over 200 epochs for model selection. The generator loss (loss GEN) is plotted in orange and, in contrast, the FID calculation results in blue.
Figure 8. Training Pix2Pix for model selection with FID. The epochs with the best FID and CM are marked for every test run, except overall, with colored bars, respectively. The numbers are summarized in Table 5.
Figure 9. CIR pansharpening on the target. The high-resolution panchromatic image is used to increase the resolution of the composite CIR image while preserving spectral information. From top to bottom: (a) panchromatic, (b) color infrared created from multispectral bands, and (c) pansharpened color infrared.
Figure 10. Example of vegetation feature removal to the north of the stadium. (a) CIR images; (b) NDVI image with legend; (c) identified SURF features (yellow asterisks) within densely vegetated areas (green) using 0.6 as the threshold.
Figure 11. Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI (incl. legend) of the main target (a stadium) and its vicinity.
Figure 12. Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI generated from a pansharpened RGB satellite sub-image.
Figure 13. Histogram and visual inspection of the CIR and NDVI simulated using MS and PAN images of the target stadium. (a-c) Ground truth (GT) and NDVI predicted using one 256 x 256 tile from MS Pleiades, with their histograms. (d-f) Ground truth of the CIR, NIR and NDVI, and predicted NIR and NDVI images from nine tiles of the PAN Pleiades images, with histograms for NDVI comparison.
Figure 14. Histogram and visual inspection of MS (I-III) and PAN (IV-VI) examples of Zhubei City.
Figure 15. Prediction of CIR, NIR and calculated NDVI of a UAV scene: (a) RGB, (b) predicted CIR image, (c) the extracted NIR band of (b), and (d) NDVI calculated with the NIR and red bands. A close-up view of the area marked with an orange box in (a) is displayed as two 256 x 256 tiles in RGB (e) and the predicted CIR (f).
Figure 16. Direct comparison between results without (a) and with vegetation segmentation (b). Areas of low density are shown in blue, areas of high density in red.
Figure 17. Two-view SfM 3D sparse point cloud without NDVI-based vegetation removal on the target CSRSR. (a) Sparse point cloud with no further coloring; (b) point cloud colored by elevation; (c) density analysis and (d) the corresponding histogram. Table (e) lists the accumulated number of points over the three operators (SURF, ORB and FAST) and the initial and manually cleaned and processed point cloud.
Figure 18. Two-view SfM reconstructed 3D sparse point cloud with the vegetation segmentation and removal process based on the simulated NDVI of the target building. (a) Sparse point cloud with no further coloring; (b) point cloud colored by elevation; (c) density analysis and (d) the histogram. (e) lists the accumulated number of points over the three operators (SURF, ORB and FAST) after segmentation, with an NDVI of 0.5 as the threshold to mask vegetation in SURF and ORB, and the initial and manually cleaned point cloud.
13 pages, 3903 KiB  
Article
Binocular Visual Measurement Method Based on Feature Matching
by Zhongyang Xie and Chengyu Yang
Sensors 2024, 24(6), 1807; https://doi.org/10.3390/s24061807 - 11 Mar 2024
Cited by 3 | Viewed by 1351
Abstract
To address the issues of low measurement accuracy and unstable results when using binocular cameras to detect objects with sparse surface textures, weak surface textures, occluded surfaces, low-contrast surfaces, and surfaces with intense lighting variations, a three-dimensional measurement method based on an improved feature matching algorithm is proposed. Initially, features are extracted from the left and right images obtained by the binocular camera. The extracted feature points serve as seed points, and a one-dimensional search space is established accurately based on the disparity continuity and epipolar constraints. The optimal search range and seed point quantity are obtained using the particle swarm optimization algorithm. The zero-mean normalized cross-correlation coefficient is employed as the similarity measure for region growing. Subsequently, the left and right images are matched based on the grayscale information of the feature regions, and seed point matching is performed within each matching region. Finally, the obtained matching pairs are used to calculate the three-dimensional information of the target object using the triangulation formula. The proposed algorithm significantly enhances matching accuracy while reducing algorithm complexity. Experimental results on the Middlebury dataset show an average relative error of 0.75% and an average measurement time of 0.82 s. The error matching rate of the proposed image matching algorithm is 2.02%, and the PSNR is 34 dB. The algorithm improves the measurement accuracy for objects with sparse or weak textures, demonstrating robustness against brightness variations and noise interference.
(This article belongs to the Section Optical Sensors)
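Two pieces of the described pipeline are compact enough to sketch: the zero-mean normalized cross-correlation (ZNCC) used as the region-growing similarity measure, and the standard triangulation of depth from disparity for a rectified stereo pair, Z = f * B / d. Patch extraction and variable names below are illustrative.

```python
import numpy as np

def zncc(patch_l, patch_r, eps=1e-8):
    """Zero-mean normalized cross-correlation between two same-sized patches."""
    a = patch_l - patch_l.mean()
    b = patch_r - patch_r.mean()
    return float(np.sum(a * b) / (np.sqrt(np.sum(a**2) * np.sum(b**2)) + eps))

def depth_from_disparity(disparity, focal_length, baseline):
    """Triangulate depth Z = f * B / d for a rectified binocular pair."""
    return focal_length * baseline / disparity
```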
Figure 1. Three-dimensional measurement workflow.
Figure 2. Epipolar line constraint.
Figure 3. Disparity gradient constraint.
Figure 4. Region growing process. (a) Image of the seed point pixel; (b) image after seed point diffusion.
Figure 5. Schematic diagram of binocular camera measurement.
Figure 6. Experimental results for the bowling stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
Figure 7. Experimental results for the lampshade stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
Figure 8. Experimental results for the plastic stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
Figure 9. Experimental results for the wood stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
21 pages, 6240 KiB  
Article
Similarity Measurement and Retrieval of Three-Dimensional Voxel Model Based on Symbolic Operator
by Zhenwen He, Xianzhen Liu and Chunfeng Zhang
ISPRS Int. J. Geo-Inf. 2024, 13(3), 89; https://doi.org/10.3390/ijgi13030089 - 11 Mar 2024
Viewed by 1841
Abstract
Three-dimensional voxel models are widely applied in various fields such as 3D imaging, industrial design, and medical imaging. The advancement of 3D modeling techniques and measurement devices has made the generation of three-dimensional models more convenient. The exponential increase in the number of 3D models presents a significant challenge for model retrieval. Currently, these models are numerous and typically represented as point clouds or meshes, resulting in sparse data and high feature dimensions within the retrieval database. Traditional methods for 3D model retrieval suffer from high computational complexity and slow retrieval speeds. To address this issue, this paper combines space-filling curves with octree structures and proposes a novel approach for representing three-dimensional voxel model sequence data features, along with a similarity measurement method based on symbolic operators. This approach enables rapid dimensionality reduction of the three-dimensional model database and efficient similarity calculation, expediting retrieval. Full article
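To make the general idea of sequence-based voxel retrieval concrete, the toy Python sketch below linearizes a binary voxel grid along a space-filling order and compares the resulting sequences with a normalized Hamming distance. Morton (Z-order) indexing is used here only as a simple stand-in for the Hilbert curve, and the occupancy symbols and distance are deliberately simplistic; this is not the paper's VSO representation or its symbolic-operator similarity measure.

# Toy sketch of the general idea only (not the paper's VSO method): linearize a
# binary voxel grid with a space-filling order and compare the symbol sequences.
import numpy as np

def morton_index(x: int, y: int, z: int, bits: int) -> int:
    """Interleave the bits of (x, y, z) into a single Z-order index."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (3 * b)
        idx |= ((y >> b) & 1) << (3 * b + 1)
        idx |= ((z >> b) & 1) << (3 * b + 2)
    return idx

def voxels_to_sequence(grid: np.ndarray) -> np.ndarray:
    """Order the occupancy values of an n^3 grid along the space-filling curve."""
    n = grid.shape[0]
    bits = n.bit_length() - 1
    seq = np.zeros(n ** 3, dtype=np.uint8)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                seq[morton_index(x, y, z, bits)] = grid[x, y, z]
    return seq

def sequence_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized Hamming distance between two equal-length symbol sequences."""
    return float(np.count_nonzero(a != b)) / a.size

rng = np.random.default_rng(1)
model_a = (rng.random((8, 8, 8)) > 0.7).astype(np.uint8)
model_b = model_a.copy()
model_b[0, 0, :] ^= 1                      # perturb one edge of the model
print("distance:", sequence_distance(voxels_to_sequence(model_a),
                                     voxels_to_sequence(model_b)))

A Hilbert ordering preserves spatial locality better than the Z-order used in this toy example, which is presumably why the paper adopts it.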
Show Figures

Figure 1. Three-dimensional model retrieval based on VSO architecture.
Figure 2. The Hilbert curve.
Figure 3. The three-dimensional Hilbert curve.
Figure 4. Hilbert space-filling curve-based voxelization and octree storage of 3D models.
Figure 5. (a) Voxel granularity 4 × 4 × 4; (b) voxel granularity 8 × 8 × 8.
Figure 6. (a) Sixteen filling states of voxels; (b) representation of 3D models based on symbolic operators.
Figure 7. (a) Symbol sequence of length 16; (b) symbol sequence of length 128.
Figure 8. Symbol sequence for the bathtub class model.
Figure 9. Feature representation processes based on symbolic operators.
Figure 10. Similarity measurement based on VSO.
Figure 11. ModelNet10 classification confusion matrix.
Figure 12. Confusing night_stand and dresser models.
Figure 13. The section represents the characteristic dimensions of the method.
15 pages, 3959 KiB  
Article
Sub-Bin Delayed High-Range Accuracy Photon-Counting 3D Imaging
by Hao-Meng Yin, Hui Zhao, Ming-Yang Yang, Yong-An Liu, Li-Zhi Sheng and Xue-Wu Fan
Photonics 2024, 11(2), 181; https://doi.org/10.3390/photonics11020181 - 16 Feb 2024
Viewed by 1279
Abstract
The range accuracy of single-photon-array three-dimensional (3D) imaging systems is limited by the time resolution of the array detectors. We introduce a method for achieving super-resolution in 3D imaging through sub-bin delayed scanning acquisition and fusion. Its central concept involves the generation of multiple sub-bin difference histograms through sub-bin shifting. These coarse time-resolution histograms are then fused by multiplicative averaging to produce finely time-resolved histograms. Finally, the arrival times of the reflected photons with sub-bin resolution are extracted from the resulting fused high-time-resolution count distribution. Compared with sub-bin delaying alone, the added fusion step performs better in reducing both the broadening error caused by coarse discrete sampling and the background noise error. The effectiveness of the proposed method is examined at different target distances, pulse widths, and sub-bin scales. The simulation results indicate that small-scale sub-bin delays contribute to superior reconstruction outcomes for the proposed method. Specifically, implementing a sub-bin delay of 0.1 times the temporal resolution for a 100 ps echo pulse width reduces the system ranging error by three orders of magnitude. Furthermore, Monte Carlo simulations are used to model a low signal-to-background noise ratio (0.05) scenario characterised by sparsely reflected photons. The proposed method demonstrates a commendable capability to simultaneously achieve wide-ranging super-resolution and denoising. This is evidenced by the detailed depth distribution information and a substantial 95.60% reduction in the mean absolute error of the reconstruction results, confirming the effectiveness of the proposed method in noisy scenarios. Full article
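The sketch below simulates one plausible reading of the sub-bin delayed acquisition and multiplicative fusion described above: several coarse time-correlated single-photon counting (TCSPC) histograms are acquired with the time origin shifted in sub-bin steps, up-sampled onto a fine grid, re-aligned, and multiplied, so that only the region where all shifted coarse bins overlap remains large. The bin width, delay step, pulse count, noise model, and alignment details are illustrative assumptions rather than the authors' exact procedure.

# Rough simulation sketch of sub-bin delayed acquisition with multiplicative fusion.
import numpy as np

bin_width = 1.0          # coarse TDC bin width [ns] (assumed)
sub_delay = 0.1          # sub-bin delay step [ns] (assumed)
n_delays = int(round(bin_width / sub_delay))   # 10 delayed acquisitions
n_bins = 100             # coarse histogram length (100 ns range window)
t_true = 38.37           # true photon arrival time [ns]
rng = np.random.default_rng(2)

def coarse_histogram(delay: float, pulses: int = 20000, sigma: float = 0.1) -> np.ndarray:
    """Coarse TCSPC histogram of a Gaussian echo plus uniform background,
    acquired with the time origin shifted by `delay`."""
    signal = rng.normal(t_true, sigma, pulses)
    background = rng.uniform(0.0, n_bins * bin_width, pulses // 10)
    arrivals = np.concatenate([signal, background])
    edges = np.arange(n_bins + 1) * bin_width - delay
    counts, _ = np.histogram(arrivals, bins=edges)
    return counts

# Fuse: up-sample each coarse histogram onto the 0.1 ns grid, shift it back into
# the common time frame, and multiply, so only the sub-bin overlap stays large.
fine = np.ones(n_bins * n_delays)
for k in range(n_delays):
    coarse = coarse_histogram(k * sub_delay).astype(np.float64) + 1.0  # avoid zeros
    upsampled = np.repeat(coarse, n_delays)
    fine *= np.roll(upsampled, -k)

t_est = (np.argmax(fine) + 0.5) * sub_delay
print(f"true arrival {t_true:.2f} ns, estimated {t_est:.2f} ns")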
Show Figures

Figure 1. (a) Diagram of single-photon 3D imaging system (PC, personal computer; TDC, time-to-digital conversion); (b) the principle of time-correlated single-photon counting.
Figure 2. Distribution of quantisation error and quantisation centroid error with different gate start times at 5.7 m target distance.
Figure 3. Example of the proposed method for double delaying.
Figure 4. Delay of 0.1 ns for acquiring coarse histograms at 6 m target distance.
Figure 5. Counts over time using different fusion methods. (a) Theoretical photon-count distribution. (b) Direct measurement histogram distribution. (c) Additive fusion-count distribution. (d) Multiplicative fusion-count distribution.
Figure 6. Distribution of ranging error with theoretical distance for (a) t_sub = 1/2 ns, (b) t_sub = 1/4 ns, (c) t_sub = 1/5 ns, and (d) t_sub = 1/10 ns.
Figure 7. Reconstructed target distances at reflected echo pulse width τ of (a) 20 ps, (b) 50 ps, and (c) 100 ps.
Figure 8. Distribution of calculated and theoretical distances for pulse width τ of (a) 20 ps, (b) 50 ps, and (c) 100 ps.
Figure 9. (a,e) Ground truth. (b,f) Direct measurement reconstruction results for different SBR levels. (c,g) Subtractive dither reconstruction results for different SBR levels. (d,h) Proposed method reconstruction results for different SBR levels.
Figure 10. (a,e,i,m) Ground truth. (b,f,j,n) Direct measurement reconstruction results for different acquisition pulse numbers. (c,g,k,o) Subtractive dither reconstruction results for different acquisition pulse numbers. (d,h,l,p) Proposed method reconstruction results for different acquisition pulse numbers.
19 pages, 7992 KiB  
Article
A Deep Learning Approach for Improving Two-Photon Vascular Imaging Speeds
by Annie Zhou, Samuel A. Mihelic, Shaun A. Engelmann, Alankrit Tomar, Andrew K. Dunn and Vagheesh M. Narasimhan
Bioengineering 2024, 11(2), 111; https://doi.org/10.3390/bioengineering11020111 - 24 Jan 2024
Cited by 2 | Viewed by 1958
Abstract
A potential method for tracking neurovascular disease progression over time in preclinical models is multiphoton fluorescence microscopy (MPM), which can image cerebral vasculature with capillary-level resolution. However, obtaining high-quality, three-dimensional images with traditional point scanning MPM is time-consuming and limits sample sizes for chronic studies. Here, we present a convolutional neural network-based (PSSR Res-U-Net architecture) algorithm for fast upscaling of low-resolution or sparsely sampled images and combine it with a segmentation-less vectorization process for 3D reconstruction and statistical analysis of vascular network structure. In doing so, we also demonstrate that the use of semi-synthetic training data can replace the expensive and arduous process of acquiring low- and high-resolution training pairs without compromising vectorization outcomes, and thus open the possibility of utilizing such approaches for other MPM tasks where collecting training data is challenging. We applied our approach to images with large fields of view from a mouse model and show that our method generalizes across imaging depths, disease states and other differences in neurovasculature. Our pretrained models and lightweight architecture can be used to reduce MPM imaging time by up to fourfold without any changes in underlying hardware, thereby enabling deployability across a range of settings. Full article
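One way to picture the semi-synthetic training data mentioned above is to degrade an acquired high-resolution frame with shot-like noise and then downscale it, so that the pair (degraded low-resolution frame, original frame) can be used for supervised training. The sketch below illustrates only that idea; the Poisson/Gaussian noise levels, the 4× factor, and the synthetic "vessel" image are made-up values, and this is not the authors' PSSR pipeline.

# Rough sketch of generating a semi-synthetic low-resolution counterpart of a
# high-resolution two-photon frame (assumed reading of the approach).
import numpy as np

def make_semisynthetic_lr(hr: np.ndarray, factor: int = 4,
                          peak: float = 40.0, read_sigma: float = 2.0,
                          seed: int = 0) -> np.ndarray:
    """Return a (H/factor, W/factor) noisy low-resolution counterpart of `hr`."""
    rng = np.random.default_rng(seed)
    img = hr.astype(np.float64)
    img = img / img.max()                                    # normalize to [0, 1]
    noisy = rng.poisson(img * peak) / peak                   # Poisson (shot) noise
    noisy += rng.normal(0.0, read_sigma / peak, img.shape)   # additive Gaussian noise
    h, w = img.shape
    lr = noisy[:h - h % factor, :w - w % factor]
    lr = lr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.clip(lr, 0.0, 1.0)

# Example: a 512x512 synthetic "vessel" image becomes a 128x128 training input,
# while the original frame serves as the ground-truth target.
yy, xx = np.mgrid[0:512, 0:512]
hr_frame = np.exp(-((xx - 256) ** 2) / (2 * 15 ** 2)) * 255   # bright vertical "vessel"
lr_frame = make_semisynthetic_lr(hr_frame)
print(hr_frame.shape, "->", lr_frame.shape)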
(This article belongs to the Special Issue AI and Big Data Research in Biomedical Engineering)
Show Figures

Figure 1. Structure and analysis pipeline. Low-resolution images (128 × 128 pixels) are acquired using two-photon microscopy. A deep learning (PSSR Res-U-Net)-based upscaling process generates high-resolution images (512 × 512 pixels), which take much longer to acquire, from low-resolution images. Segmentation-less vascular vectorization (SLAVV) generates 3D renderings and calculates network statistics from an upscaled image stack.
Figure 2. Generating and evaluating semi-synthetic training data. (a) Examples of semi-synthetic training images created using different types of added noise prior to downscaling: no noise (downscaling only), Poisson, Gaussian, and additive Gaussian. Acquired low-resolution (LR, 128 × 128 pixels) and high-resolution (HR, 512 × 512 pixels) ground truth images are shown for reference. (b) Resulting test image output from models trained using each noise method, with the acquired low-resolution image as model input and the acquired high-resolution image as ground truth for comparison. All models were trained with 3399 image pairs, with the Gaussian and additive Gaussian models further tested on 24,069 image pairs (7×) to assess performance. (c) Boxplot comparison of PSNR and SSIM values for each noise-method image in (b), measured against the ground truth image. Values are plotted for an image stack of 222 images. (d) Comparison of test images from models trained using real-world acquired vs. semi-synthetic data, with the real acquired low-resolution image as model input and the acquired high-resolution image as ground truth for reference. All models were trained with 234 image pairs, a large reduction from the noise-model comparison, due to the limited availability of real-world pairs. (e) Boxplot comparison of PSNR and SSIM values for real acquired vs. semi-synthetic model outputs corresponding to (d), measured against the high-resolution ground truth image. All values are plotted for an image stack of 222 images.
Figure 3. Comparison of performance between bilinear upscaling, a single-frame model, and a multi-frame model for semi-synthetic and real acquired test images. All models were trained with 24,069 image pairs. (a) Semi-synthetic test images from bilinear upscaling and models trained using single- vs. multi-frame data. The acquired low-resolution image (model input) and acquired high-resolution image (ground truth) are shown for reference. (b) PSNR and SSIM plots corresponding to the semi-synthetic test results from (a). (c) Real-world test images from bilinear upscaling and models trained using single- vs. multi-frame data. (d) PSNR and SSIM plots corresponding to the real-world test image results from (c).
Figure 4. Maximum-intensity projections (x-y) of ischemic infarct images consisting of 2 × 4 tiles with 213 slices (final dimensions 1.18 mm × 2.10 mm × 0.636 mm, pixel dimensions 1.34 μm × 1.36 μm × 3 μm) for a semi-synthetic low-resolution image, bilinear upscaled image, single- and multi-frame output images, and acquired high-resolution image. The black hole in the bottom-left corner represents the infarct itself.
Figure 5. Comparison of vectorization results using different upscaling methods against a ground truth image. (a) Blender rendering of vectorized images using VessMorphoVis [26] for visual comparison between single- and multi-frame results and an acquired high-resolution image. We performed manual curation for this vectorization process. (b) Vectorized image statistics for the automated curation process with known ground truth (simulated from the manually curated high-resolution image). CDFs are shown for metrics of length, radius, z-direction, and inverse tortuosity for original (OG), simulated original (sOG), bilinear upscaled (BL), and PSSR single- and multi-frame (SF, MF, respectively) images. Pearson's correlation values (r²) were calculated between the original image and each simulated or upscaled image for each metric. (c) Statistics regarding maximum accuracy (%) achieved with vectorization or thresholding and % error in median length and radius for each method.
17 pages, 7346 KiB  
Article
W-Band FMCW MIMO System for 3-D Imaging Based on Sparse Array
by Wenyuan Shao, Jianmin Hu, Yicai Ji, Wenrui Zhang and Guangyou Fang
Electronics 2024, 13(2), 369; https://doi.org/10.3390/electronics13020369 - 16 Jan 2024
Cited by 4 | Viewed by 1656
Abstract
Multiple-input multiple-output (MIMO) technology is widely used in the field of security imaging. However, existing imaging systems have shortcomings such as numerous array units, high hardware costs, and low imaging resolutions. In this paper, a sparse array-based frequency modulated continuous wave (FMCW) millimeter wave imaging system, operating in the W-band, is presented. In order to reduce the number of transceiver units of the system and lower the hardware cost, a linear sparse array with a periodic structure was designed using the MIMO technique. The system operates at 70~80 GHz, and the high operating frequency band and 10 GHz bandwidth provide good imaging resolution. The system consists of a one-dimensional linear array, a motion control system, and hardware for signal generation and image reconstruction. The channel calibration technique was used to eliminate inherent errors. The system combines mechanical and electrical scanning, and uses FMCW signals to extract distance information. The three-dimensional (3-D) fast imaging algorithm in the wave number domain was utilized to quickly process the detection data. The 3-D imaging of the target in the near-field was obtained, with an imaging resolution of 2 mm. The imaging ability of the system was verified through simulations and experiments. Full article
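For context on how an FMCW system recovers range, the sketch below dechirps a single ideal echo and reads the beat frequency off an FFT, using the relation R = c * f_b * T / (2 * B). The 70~80 GHz sweep (B = 10 GHz) and the 0.3 m target match the values reported above, while the sweep duration and sampling rate are assumptions; the wavenumber-domain 3-D reconstruction and the sparse MIMO array processing used by the actual system are not shown.

# Illustrative FMCW range extraction sketch (not the system's processing chain).
import numpy as np

c = 3e8
B = 10e9            # sweep bandwidth [Hz] (70~80 GHz)
T = 1e-3            # sweep duration [s], assumed
fs = 2e6            # beat-signal sampling rate [Hz], assumed
R_true = 0.30       # target range [m]

t = np.arange(0, T, 1 / fs)
tau = 2 * R_true / c                    # round-trip delay
f_beat = B / T * tau                    # beat frequency of the dechirped signal
beat = np.cos(2 * np.pi * f_beat * t)   # ideal dechirped (mixed-down) echo

spectrum = np.abs(np.fft.rfft(beat * np.hanning(t.size)))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
R_est = c * freqs[np.argmax(spectrum)] * T / (2 * B)

print(f"range resolution c/(2B) = {c / (2 * B) * 1e3:.0f} mm, "
      f"estimated range = {R_est * 1e3:.1f} mm")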
(This article belongs to the Special Issue Radar Signal Processing Technology)
Show Figures

Figure 1. Schematic diagram model of radar echo signal.
Figure 2. Schematic diagram of a single module array and EPC distribution.
Figure 3. Schematic diagram of the working principle of the millimeter wave MIMO imaging system.
Figure 4. Schematic diagram of point-target imaging simulation scene.
Figure 5. (a) Simulation results of point targets at 0.3 m; (b) simulation results of two-dimensional points-matrix target at 0.3 m.
Figure 6. Single-point target and two-dimensional point-target matrix echo curves at 0.3 m. (a) Azimuth direction echo curve of single-point target; (b) mechanical scanning direction echo curve of single-point target; (c) azimuth direction echo curve of points-matrix target; (d) mechanical scanning direction echo curve of points-matrix target.
Figure 7. Schematic diagram of radar system structure.
Figure 8. Photograph of the imaging system.
Figure 9. Schematic diagram of the imaging module structure.
Figure 10. Schematic diagram of calibration equipment structure.
Figure 11. (a) Photo of the experimental scene, including three metal spherical shells at 0.3 m; (b) imaging results of three metal shells.
Figure 12. Echo curves of three metal spherical shells at 0.3 m. (a) Azimuth direction echo curve; (b) mechanical scanning direction echo curve.
Figure 13. (a) Optical picture of the resolution board; (b) imaging results before calibration; (c) imaging results after calibration.
Figure 14. (a) Optical photo of the pistol model; (b) optical photo of the dagger.
Figure 15. (a) Photo of the experimental scene, including the imaging system and human models containing dangerous articles hidden under clothing; (b) imaging results of the pistol model; (c) imaging results of the dagger.
30 pages, 22271 KiB  
Article
A Novel Approach for Simultaneous Localization and Dense Mapping Based on Binocular Vision in Forest Ecological Environment
by Lina Liu, Yaqiu Liu, Yunlei Lv and Xiang Li
Forests 2024, 15(1), 147; https://doi.org/10.3390/f15010147 - 10 Jan 2024
Cited by 3 | Viewed by 1823
Abstract
The three-dimensional reconstruction of forest ecological environments from low-altitude remote sensing photography by Unmanned Aerial Vehicles (UAVs) provides a powerful basis for fine surveying of forest resources and for forest management. A stereo vision system, D-SLAM, is proposed to realize simultaneous localization and dense mapping for UAVs in complex forest ecological environments. The system takes binocular images as input and 3D dense maps as target outputs, while 3D sparse maps and camera poses can also be obtained. The tracking thread utilizes temporal cues to match sparse map points for zero-drift localization. The relative motion amount and the data association between frames are used as constraints for selecting new keyframes, and a binocular spatial-cue compensation strategy is proposed to increase tracking robustness. The dense mapping thread uses a Linear Attention Network (LANet) to predict reliable disparity maps in ill-posed regions, which are transformed into depth maps for constructing dense point cloud maps. Evaluations on three datasets, EuRoC, KITTI and Forest, show that the proposed system runs at 30 ordinary frames and 3 keyframes per second on Forest, achieves a high localization accuracy of several centimeters in Root Mean Squared Absolute Trajectory Error (RMS ATE) on EuRoC, and attains average Relative Root Mean Squared Errors (RMSE) of 0.64 for t_rel and 0.2 for R_rel on KITTI, outperforming most mainstream models in tracking accuracy and robustness. Moreover, the advantage of dense mapping compensates for the shortcomings of sparse mapping in most Simultaneous Localization and Mapping (SLAM) systems, and the proposed system meets the requirements of real-time localization and dense mapping in the complex ecological environment of forests. Full article
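As a generic illustration of the dense-mapping step described above, the sketch below converts a disparity map into depth using the stereo baseline and focal length and back-projects every valid pixel into a 3D point in the camera frame. The camera intrinsics, baseline, and flat synthetic disparity map are made-up example values, and this is plain stereo back-projection rather than the D-SLAM dense-mapping thread, which predicts disparities with LANet and assembles keyframe clouds into a global map.

# Minimal sketch: disparity map -> depth -> 3D point cloud in the camera frame.
import numpy as np

def disparity_to_point_cloud(disparity: np.ndarray, fx: float, fy: float,
                             cx: float, cy: float, baseline: float) -> np.ndarray:
    """Return an (N, 3) array of 3D points for all pixels with positive disparity."""
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0
    z = fx * baseline / disparity[valid]          # depth from disparity
    x = (u[valid] - cx) * z / fx                  # back-project along camera rays
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Example with a synthetic disparity map (a fronto-parallel plane at 8 px disparity).
disp = np.full((480, 640), 8.0)
cloud = disparity_to_point_cloud(disp, fx=700.0, fy=700.0,
                                 cx=320.0, cy=240.0, baseline=0.12)
print(cloud.shape, "depth of plane [m]:", cloud[0, 2])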
(This article belongs to the Special Issue Modeling and Remote Sensing of Forests Ecosystem)
Show Figures

Figure 1. Disparity maps in the Forest dataset.
Figure 2. Bag in the Forest dataset.
Figure 3. Left images in the bag.
Figure 4. Right images in the bag.
Figure 5. The D-SLAM system consists of four main parallel threads, tracking, local mapping, dense mapping, and loop closing, where the acronyms are defined as follows: Preprocessing (Pre-process), Local Bundle Adjustment (Local BA), Full Bundle Adjustment (Full BA), Special Euclidean group (SE3), Point cloud (Pcd), Linear Attention Network (LANet), and bags of binary words for fast place recognition in image sequences (DBoW2).
Figure 6. Compensation of spatial cues in the right image frame, where KF represents the key frame, CF represents the current frame, and RF-1 represents the previous frame of the right image.
Figure 7. Network structure of LANet. LANet consists of five main parts: feature extraction ResNet, Attention Module (AM), construction of matching cost, Three-Dimensional Convolutional Neural Network aggregation (3D CNN aggregation), and disparity prediction. The AM consists of two parts, the Spatial Attention Module (SAM) and the Channel Attention Module (CAM); the 3D CNN aggregation has two structures: the basic structure is used for ablation experiments to test the performance of the various parts of the network, and the stacked hourglass structure is used to optimize the network.
Figure 8. Linear mapping layers.
Figure 9. Visualization of disparity maps on Forest. The yellow and green boxes mark regions with significant disparity contrast produced by the various methods.
Figure 10. Estimated trajectory (dark blue) and GT (red) for 9 sequences on EuRoC.
Figure 11. Error graph for the 05 sequence of KITTI.
Figure 12. Comparison of projection trajectories for the 08 sequence of KITTI.
Figure 13. Dense mapping effect for the 01 sequence of KITTI. (a) Left RGB image; (b) visual disparity map; (c) feature point tracking map; (d) estimated trajectory map; (e) sparse point cloud map; (f) dense point cloud map.
Figure 14. Local dense point cloud maps from different views for the KITTI 01 sequence.
Figure 15. Dense mapping process on Forest, where (a) is the left RGB image, (b) is the visual disparity map, (c) is the feature point tracking map, (d) is the estimated trajectory map, which contains a loop closure; the localization of the front end and the accuracy of the back-end mapping are improved after loop-closing correction; (e) is the sparse point cloud map, in which the blue boxes represent the keyframes, the green box represents the current frame, the red box represents the start frame, the red points represent the reference map points, and the black points represent all map points generated by keyframes; and (f) is the overall effect of the dense point cloud map.
Figure 16. Local dense point cloud at different angles on Forest.