Search Results (68)

Search Parameters:
Keywords = three-dimensional (3D) sparse imaging

20 pages, 9980 KiB  
Article
TGNF-Net: Two-Stage Geometric Neighborhood Fusion Network for Category-Level 6D Pose Estimation
by Xiaolong Zhao, Feihu Yan, Guangzhe Zhao and Caiyong Wang
Information 2025, 16(2), 113; https://doi.org/10.3390/info16020113 - 6 Feb 2025
Viewed by 449
Abstract
The main goal of six-dimensional pose estimation is to accurately ascertain the location and orientation of an object in three-dimensional space, which has a wide range of applications in the field of artificial intelligence. Due to the relative sparseness of the point cloud data captured by the depth camera, the ability of models to fully understand the shape, structure, and other features of the object is hindered. Consequently, the model exhibits weak generalization when faced with objects with significant shape differences in a new scene. The deep integration of feature levels and the mining of local and global information can effectively alleviate the influence of these factors. To solve these problems, we propose a new Two-Stage Geometric Neighborhood Fusion Network for category-level 6D pose estimation (TGNF-Net) to estimate objects that have not appeared in the training phase. It strengthens the fusion capacity of feature points within a specific range of neighborhoods, making the feature points more sensitive to both local and global geometric information. Our approach includes a neighborhood information fusion module, which can effectively utilize neighborhood information to enrich the feature set of different modal data and overcome the problem of heterogeneity between image and point cloud data. In addition, we design a two-stage geometric information embedding module, which can effectively fuse geometric information over a multi-scale range into keypoint features. This enhances the robustness of the model and enables it to exhibit stronger generalization capabilities when faced with unknown or complex scenes. These two strategies enhance the expression of features and make NOCS coordinate predictions more accurate. Extensive experiments show that our approach is superior to other classical methods on the CAMERA25, REAL275, HouseCat6D, and Omni6DPose datasets.
(This article belongs to the Section Artificial Intelligence)
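The cross-modal neighborhood fusion described in the abstract can be illustrated with a minimal sketch: cosine similarity between RGB and point cloud features selects, for each feature point, its most similar neighbors from the other modality, and the averaged neighbors are concatenated back onto the original features. The function name, the neighborhood size k, and the mean-then-concatenate fusion are illustrative assumptions, not the paper's exact module design.

```python
import numpy as np

def fuse_neighborhood(f_rgb, f_depth, k=8):
    """Sketch of cross-modal neighborhood fusion (illustrative only).

    f_rgb:   (N, C) RGB feature points
    f_depth: (M, C) point cloud (depth) feature points
    Returns features enriched with the mean of their k most similar
    neighbors from the other modality.
    """
    # Cosine similarity between the two modalities.
    a = f_rgb / np.linalg.norm(f_rgb, axis=1, keepdims=True)
    b = f_depth / np.linalg.norm(f_depth, axis=1, keepdims=True)
    sim = a @ b.T                                                     # (N, M)

    # For each RGB feature, gather its k most similar point cloud features (Nei-pc).
    nei_pc = f_depth[np.argsort(-sim, axis=1)[:, :k]].mean(axis=1)    # (N, C)
    # For each point cloud feature, gather its k most similar RGB features (Nei-rgb).
    nei_rgb = f_rgb[np.argsort(-sim.T, axis=1)[:, :k]].mean(axis=1)   # (M, C)

    f_rgb_fused = np.concatenate([f_rgb, nei_pc], axis=1)             # (N, 2C)
    f_depth_fused = np.concatenate([f_depth, nei_rgb], axis=1)        # (M, 2C)
    return f_rgb_fused, f_depth_fused
```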
Graphical abstract

Figure 1. Comparison between different networks. Unlike ordinary feature fusion (a), our method (b) fuses the neighborhood information of different modalities into the feature points. The red/green/gray cubes represent RGB/point cloud/neighborhood features, respectively.
Figure 2. Framework of TGNF-Net. RGB-D images are cropped and segmented, and F_rgb and F_depth are obtained through the feature extraction network. (i) Neighborhood Information Aggregation module: the similarity between F_rgb and F_depth is computed; for a given RGB feature point, F_rgb gathers the most similar neighborhood point cloud feature points Nei-pc, and for a given point cloud feature point, F_depth gathers the most similar neighborhood RGB feature points Nei-rgb. Nei-pc and Nei-rgb are fused into F_rgb and F_depth, respectively, to obtain F_rgbd. A set of category-shared learnable queries Q is initialized to represent the keypoint detector. (ii) Two-Stage Geometric Information Embedding module: in stage I, local features in the point cloud and global features among the feature points are extracted for each feature point; in stage II, the NNS algorithm doubles the number of detected point cloud and internal feature points.
Figure 3. Two-stage geometric information embedding process. The NNS algorithm is used to double the number of points.
Figure 4. Visual comparison of our method with other methods on the REAL275 dataset. Green bounding boxes represent the ground truth and red bounding boxes the prediction; green points indicate small errors and red points large errors.
Figure 5. Visual comparison of our method with other methods on the HouseCat6D dataset. Green bounding boxes represent the ground truth and red bounding boxes the prediction.
Figure 6. Visual comparison of our method with other methods on the Omni6DPose dataset. Green bounding boxes represent the ground truth and red bounding boxes the prediction.
Figure 7. Comparison of our method with AG-Pose and Query6DoF on different objects: precision trends at different cm-deg thresholds on the CAMERA25 dataset.
Figure 8. Visual comparison of our method with the other two modules. Green bounding boxes represent the ground truth and red bounding boxes the prediction.
22 pages, 10897 KiB  
Article
Array Three-Dimensional SAR Imaging via Composite Low-Rank and Sparse Prior
by Zhiliang Yang, Yangyang Wang, Chudi Zhang, Xu Zhan, Guohao Sun, Yuxuan Liu and Yuru Mao
Remote Sens. 2025, 17(2), 321; https://doi.org/10.3390/rs17020321 - 17 Jan 2025
Cited by 1 | Viewed by 464
Abstract
Array three-dimensional (3D) synthetic aperture radar (SAR) imaging has been used for 3D modeling of urban buildings and diagnosis of target scattering characteristics, and represents one of the significant directions in SAR development in recent years. However, sparse-driven 3D imaging methods usually capture only the sparse features of the imaging scene, which can result in the loss of the structural information of the target and cause bias effects, affecting the imaging quality. To address this issue, we propose a novel array 3D SAR imaging method based on a composite sparse and low-rank prior (SLRP), which can achieve high-quality imaging even with limited observation data. Firstly, an imaging optimization model based on the composite SLRP is established, which captures both sparse and low-rank features simultaneously by combining non-convex regularization functions and an improved nuclear norm (INN), reducing bias effects during the imaging process and improving imaging accuracy. Then, a framework that integrates variable splitting and alternating minimization (VSAM) is presented to solve the imaging optimization problem, which is suitable for high-dimensional imaging scenes. Finally, the performance of the method is validated through extensive simulation and real-data experiments. The results indicate that the proposed method can significantly improve imaging quality with limited observational data.
(This article belongs to the Special Issue SAR Images Processing and Analysis (2nd Edition))
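In generic and simplified form, a composite sparse plus low-rank imaging model of the kind described above can be written as the optimization problem below; here Y is the echo data, A the measurement operator, X the 3D reflectivity to be recovered, g(.) a non-convex sparsity penalty standing in for the paper's regularizer, and the nuclear-norm term a stand-in for the improved nuclear norm (INN), with trade-off weights lambda_1 and lambda_2. These symbols are our assumptions; the paper solves its exact model by variable splitting and alternating minimization (VSAM).

```latex
\min_{\mathbf{X}} \; \tfrac{1}{2}\,\|\mathbf{Y}-\mathbf{A}\mathbf{X}\|_F^2
\;+\; \lambda_1\, g(\mathbf{X})
\;+\; \lambda_2\, \|\mathbf{X}\|_{*}
```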
Graphical abstract

Figure 1. Imaging model and flowchart. (a) Geometric diagram of array 3D SAR imaging. (b) Process flowchart of the proposed method.
Figure 2. The low rank of the SAR image.
Figure 3. Aircraft target imaging results under 100% SR. (a) RI. (b) MFW. (c) L1. (d) SCAD. (e) LR-L1. (f) The proposed method.
Figure 4. Aircraft target imaging results under 75% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) The proposed method.
Figure 5. Aircraft target imaging results under 50% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 6. Aircraft target imaging results under 10 dB SNR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 7. Aircraft target imaging results under 5 dB SNR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 8. The target scene. (a) The metal hammer. (b) The metal stiletto.
Figure 9. The imaging result of the hammer under 100% SR. (a) RI. (b) MFW. (c) L1. (d) SCAD. (e) LR-L1. (f) CP-VSAM.
Figure 10. The imaging result of the hammer under 75% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 11. The imaging result of the hammer under 50% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 12. The imaging result of the stiletto under 100% SR. (a) RI. (b) MFW. (c) L1. (d) SCAD. (e) LR-L1. (f) CP-VSAM.
Figure 13. The imaging result of the stiletto under 75% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 14. The imaging result of the stiletto under 50% SR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 15. Aircraft target imaging results under 0 dB SNR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
Figure 16. Aircraft target imaging results under -5 dB SNR. (a) MFW. (b) L1. (c) SCAD. (d) LR-L1. (e) CP-VSAM.
16 pages, 2102 KiB  
Article
Semantic Segmentation Method for High-Resolution Tomato Seedling Point Clouds Based on Sparse Convolution
by Shizhao Li, Zhichao Yan, Boxiang Ma, Shaoru Guo and Hongxia Song
Agriculture 2025, 15(1), 74; https://doi.org/10.3390/agriculture15010074 - 31 Dec 2024
Viewed by 525
Abstract
Semantic segmentation of three-dimensional (3D) plant point clouds at the stem-leaf level is foundational and indispensable for high-throughput tomato phenotyping systems. However, existing semantic segmentation methods often suffer from issues such as low precision and slow inference speed. To address these challenges, we propose an innovative encoding-decoding structure incorporating voxel sparse convolution (SpConv) and attention-based feature fusion (VSCAFF) to enhance semantic segmentation of point clouds from high-resolution tomato seedling images. Tomato seedling point clouds from the Pheno4D dataset, labeled into the semantic classes 'leaf', 'stem', and 'soil', are used for the semantic segmentation. To reduce the number of parameters and thereby further improve inference speed, the SpConv module is designed around the residual concatenation of a skeleton convolution kernel and a regular convolution kernel. The attention-based feature fusion module assigns attention weights to the voxel diffusion features and the point features, avoiding the ambiguity caused by the diffusion module when points with different semantics share the same characteristics, while also suppressing noise. Finally, to address the class bias in model training caused by the uneven distribution of point cloud classes, a composite loss function of Lovász-Softmax and weighted cross-entropy is introduced to supervise training and improve performance. The results show that the mIoU of VSCAFF is 86.96%, outperforming PointNet, PointNet++, and DGCNN. VSCAFF achieves an IoU of 99.63% for the soil class, 64.47% for the stem class, and 96.72% for the leaf class. Its inference latency of 35 ms is lower than that of PointNet++ and DGCNN. These results demonstrate that VSCAFF offers high accuracy and inference speed for semantic segmentation of high-resolution tomato point clouds and can provide technical support for high-throughput automatic phenotypic analysis of tomato plants.
(This article belongs to the Section Digital Agriculture)
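The composite loss couples a Lovász-Softmax term with weighted cross-entropy to counter class imbalance. A minimal sketch of the weighted cross-entropy part, assuming inverse-frequency class weights, is shown below; the Lovász-Softmax term is omitted here, and how the two terms are weighted in the paper is not reproduced.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_freq, eps=1e-8):
    """Weighted cross-entropy over N points and C classes (sketch).

    probs:      (N, C) softmax probabilities
    labels:     (N,)   integer class labels
    class_freq: (C,)   per-class point counts in the training set
    """
    # Inverse-frequency class weights, normalized to sum to 1.
    w = 1.0 / (class_freq + eps)
    w = w / w.sum()
    # Per-point negative log-likelihood, scaled by its class weight.
    nll = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return float(np.mean(w[labels] * nll))

# Composite form (sketch): total_loss = lovasz_softmax_term + weighted_cross_entropy(...)
```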
Figure 1. The raw tomato seedling point cloud and the point cloud labeled into the semantic classes 'leaf', 'stem', and 'soil'.
Figure 2. The structure of the network.
Figure 3. Encoding-decoding architecture based on SpConv.
Figure 4. Three kinds of convolution kernel structures.
Figure 5. Attention-based feature fusion method.
Figure 6. Semantic segmentation of the tomato plant point clouds. Note 1: GT represents ground truth. Note 2: four seedling point clouds scanned on four different days are shown.
13 pages, 7953 KiB  
Article
TAMC: Textual Alignment and Masked Consistency for Open-Vocabulary 3D Scene Understanding
by Juan Wang, Zhijie Wang, Tomo Miyazaki, Yaohou Fan and Shinichiro Omachi
Sensors 2024, 24(19), 6166; https://doi.org/10.3390/s24196166 - 24 Sep 2024
Viewed by 806
Abstract
Three-dimensional (3D) Scene Understanding achieves environmental perception by extracting and analyzing point cloud data, with wide applications including virtual reality, robotics, etc. Previous methods align the 2D image features from a pre-trained CLIP model with 3D point cloud features to obtain open-vocabulary scene understanding ability. We believe that existing methods have two deficiencies: (1) the 3D feature extraction process ignores the challenges of real scenarios, i.e., point cloud data are very sparse and even incomplete; (2) the training stage lacks direct text supervision, leading to inconsistency with the inference stage. To address the first issue, we employ a Masked Consistency training policy. Specifically, during the alignment of 3D and 2D features, we mask some 3D features to force the model to understand the entire scene using only partial 3D features. For the second issue, we generate pseudo-text labels and align them with the 3D features during the training process. In particular, we first generate a description for each 2D image belonging to the same 3D scene and then use a summarization model to fuse these descriptions into a single description of the scene. Subsequently, we align 2D-3D features and 3D-text features simultaneously during training. Extensive experiments demonstrate the effectiveness of our method, which outperforms state-of-the-art approaches.
(This article belongs to the Special Issue Object Detection via Point Cloud Data)
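At inference, open-vocabulary segmentation reduces to matching per-point 3D features against prompt text embeddings by cosine similarity, as stated in the abstract and Figure 3 caption. A minimal sketch is below; the feature shapes and function name are assumptions.

```python
import numpy as np

def open_vocab_segment(point_feats, text_feats):
    """Assign each 3D point to the class whose text embedding is most similar.

    point_feats: (N, D) per-point 3D features from the trained encoder
    text_feats:  (C, D) text embeddings of prompts like 'an XX in a scene'
    Returns (N,) class indices.
    """
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return np.argmax(p @ t.T, axis=1)  # cosine similarity, then argmax over classes
```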
Figure 1. Comparison of previous methods and ours. (a) The workflow of methods with a class limitation, which need labeled 3D point cloud data of seen classes and evaluate the model on unlabeled classes. In comparison, both (b) OpenScene [7] and (c) our method follow the open-vocabulary setting, meaning that no annotated data are required during training. The difference is that the proposed training process includes supervision from the pseudo-text label, and the masking training policy makes the extracted 3D features more robust, resulting in higher accuracy.
Figure 2. Caption generation. Multi-view images are fed into the image captioning model G_cap to generate the corresponding captions t_i of a scene with N images; the text-summarization model G_sum then summarizes (t_1, ..., t_i, ..., t_N) into a scene-level caption t.
Figure 3. Method overview. (a) Training: given a 3D point cloud, a set of posed images, and a scene-level caption, a 3D encoder eps_theta^3D is trained to produce a masked point-wise 3D feature F_Mask^3D with two losses, L1 for the F_Mask^3D-F^2D loss and L2 for the F_Mask^3D-F^Text loss (refer to Section 3.4). (b) Testing: cosine similarity between per-point features and text features is used to perform open-vocabulary 3D Scene Understanding tasks. 'an XX in a scene' serves as the input text prompt, where 'XX' is the query text, taken from the dataset classes during the segmentation task.
Figure 4. Qualitative results on ScanNet [26]. From left to right: 3D input and related 2D image, (a) the result of the baseline method (OpenScene [7]), (b) the proposed method, (c) the ground-truth segmentation.
15 pages, 2064 KiB  
Article
Research on the Depth Image Reconstruction Algorithm Using the Two-Dimensional Kaniadakis Entropy Threshold
by Xianhui Yang, Jianfeng Sun, Le Ma, Xin Zhou, Wei Lu and Sining Li
Sensors 2024, 24(18), 5950; https://doi.org/10.3390/s24185950 - 13 Sep 2024
Viewed by 780
Abstract
Photon-counting light detection and ranging (LiDAR), especially Geiger-mode avalanche photodiode (Gm-APD) LiDAR, can obtain three-dimensional images of a scene with single-photon sensitivity, but background noise limits its imaging quality. To solve this problem, a depth image estimation method based on a two-dimensional (2D) Kaniadakis entropy thresholding method is proposed, which transforms a weak-signal extraction problem into a denoising problem for point cloud data. The characteristics of signal peak aggregation in the data and the spatio-temporal correlation between target image elements in the point cloud-intensity data are exploited. Through extensive simulations and outdoor target-imaging experiments under different signal-to-background ratios (SBRs), the effectiveness of the method under low-SBR conditions is demonstrated. When the SBR is 0.025, the proposed method reaches a target recovery rate of 91.7%, which is better than existing typical methods such as the Peak-picking method, the Cross-Correlation method, and the sparse Poisson intensity reconstruction algorithm (SPIRAL), which achieve target recovery rates of 15.7%, 7.0%, and 18.4%, respectively. Additionally, compared with SPIRAL, the reconstruction recovery ratio is improved by 73.3%. The proposed method greatly improves the integrity of the target in high-background-noise environments and provides a basis for feature extraction and target recognition.
(This article belongs to the Special Issue Application of LiDAR Remote Sensing and Mapping)
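The Kaniadakis (kappa-) entropy generalizes Shannon entropy through the kappa-logarithm ln_k(x) = (x^k - x^(-k)) / (2k). A minimal one-dimensional sketch of entropy-based threshold selection over a photon-count histogram is given below; the paper's method is two-dimensional (it adds a spatial-neighborhood dimension), so this only illustrates the entropy criterion, and the Kapur-style background/signal split used here is our assumption.

```python
import numpy as np

def kappa_log(x, kappa=0.5):
    """Kaniadakis kappa-logarithm; reduces to ln(x) as kappa -> 0."""
    return (x**kappa - x**(-kappa)) / (2.0 * kappa)

def kappa_entropy(p, kappa=0.5, eps=1e-12):
    """Kaniadakis entropy S = -sum_i p_i * ln_kappa(p_i) of a distribution p."""
    p = p[p > eps]
    return float(-np.sum(p * kappa_log(p, kappa)))

def kaniadakis_threshold(hist, kappa=0.5, eps=1e-12):
    """Choose the time-bin threshold maximizing background + signal entropy (1D sketch)."""
    p = hist.astype(float) / (hist.sum() + eps)
    best_t, best_s = 1, -np.inf
    for t in range(1, len(p) - 1):
        pb, pf = p[:t].sum(), p[t:].sum()
        if pb < eps or pf < eps:
            continue
        s = kappa_entropy(p[:t] / pb, kappa) + kappa_entropy(p[t:] / pf, kappa)
        if s > best_s:
            best_t, best_s = t, s
    return best_t
```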
Figure 1. Flowchart of the proposed algorithm.
Figure 2. (a) Distribution of the probability density of the signal and noise when the signal position is at the 300th time bin. (b) Counting histogram of 0.1 s of data with signal and noise.
Figure 3. Signal detection rate for extracting different numbers of wave peaks under different SBRs.
Figure 4. (a) The standard depth image. (b) The standard intensity image.
Figure 5. Depth images obtained using different reconstruction methods. (a-d) SBR is 0.01; (e-h) SBR is 0.02; (i-l) SBR is 0.04; (m-p) SBR is 0.06; (q-t) SBR is 0.08. (a,e,i,m,q) Peak-picking method; (b,f,j,n,r) Cross-Correlation method; (c,g,k,o,s) SPIRAL method; (d,h,l,p,t) proposed method.
Figure 6. The relationship between the SBR of the scene and the SNR of the reconstructed image using different methods.
Figure 7. Photos of the experimental targets.
Figure 8. The number of echo photons per pixel of the scenes. (a) Building at a distance of 730 m with an SBR of 0.078. (b) Building at a distance of 730 m with an SBR of 0.053. (c) Building at a distance of 730 m with an SBR of 0.031. (d) Building at a distance of 1400 m with an SBR of 0.031. (e) Building at a distance of 1400 m with an SBR of 0.025.
Figure 9. Truth depth images and depth images obtained using different methods for the 730 m building. (a-e) SBR is 0.078; (f-j) SBR is 0.053; (k-o) SBR is 0.031. (a,f,k) Truth depth image; (b,g,l) Peak-picking method; (c,h,m) Cross-Correlation method; (d,i,n) SPIRAL method; (e,j,o) proposed method.
Figure 10. Truth depth images and depth images obtained using different methods for the 1400 m building. (a-e) SBR is 0.031; (f-j) SBR is 0.025. (a,f) Truth depth image; (b,g) Peak-picking method; (c,h) Cross-Correlation method; (d,i) SPIRAL method; (e,j) proposed method.
Figure 11. Intensity images with different signal-to-background ratios. (a) A 730 m building intensity image with an SBR of 0.078; (b) a 730 m building intensity image with an SBR of 0.053; (c) a 730 m building intensity image with an SBR of 0.031; (d) a 1400 m building intensity image with an SBR of 0.031; (e) a 1400 m building intensity image with an SBR of 0.025.
22 pages, 13810 KiB  
Article
An Underwater Stereo Matching Method: Exploiting Segment-Based Method Traits without Specific Segment Operations
by Xinlin Xu, Huiping Xu, Lianjiang Ma, Kelin Sun and Jingchuan Yang
J. Mar. Sci. Eng. 2024, 12(9), 1599; https://doi.org/10.3390/jmse12091599 - 10 Sep 2024
Viewed by 1136
Abstract
Stereo matching technology, enabling the acquisition of three-dimensional data, holds profound implications for marine engineering. In underwater images, irregular object surfaces and the absence of texture information make it difficult for stereo matching algorithms that rely on discrete disparity values to accurately capture the 3D details of underwater targets. This paper proposes a stereo matching method based on a Markov random field (MRF) energy function with 3D labels to fit the inclined surfaces of underwater objects. Through the integration of a cross-based patch alignment approach with two label optimization stages, the proposed method exhibits traits akin to segment-based stereo matching methods, enabling it to handle images with sparse textures effectively. Experiments conducted on both simulated UW-Middlebury datasets and real deteriorated underwater images, assessed through the acquired disparity maps and the three-dimensional reconstruction of the underwater targets, demonstrate the superiority of our method over classical and state-of-the-art methods.
(This article belongs to the Special Issue Underwater Observation Technology in Marine Environment)
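For context, in 3D-label stereo matching each pixel p is assigned a plane label f_p = (a_p, b_p, c_p) so that disparity varies continuously over inclined surfaces, and labels are chosen by minimizing an MRF energy of the usual data-plus-smoothness form, sketched below in generic notation. The paper's specific data cost, cross-based patch construction, and two-stage label optimization are not reproduced here.

```latex
d_p(f_p) = a_p\,x_p + b_p\,y_p + c_p, \qquad
E(f) = \sum_{p} \rho_{\text{data}}\!\big(p,\, d_p(f_p)\big)
     + \lambda \sum_{(p,q)\in\mathcal{N}} \rho_{\text{smooth}}\!\big(f_p,\, f_q\big)
```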
Figure 1. The pipeline of our method, which is founded on the construction of two units, the fixed grid and the adaptive region, corresponding to different matching stages for generating proposals.
Figure 2. The relation between a cross-based patch and a center grid. The cross-based patch constructed with the center p of grid C_ij as the anchor is covered by grid C_ij.
Figure 3. Illustration of the process of enhancing the cross-based patch. Binary labels "0" and "1" are employed to signify whether the pixels are located within the small speckle.
Figure 4. The propagation process and expansion process occurring in a center grid C_ij.
Figure 5. Summary of the proposed methodology.
Figure 6. UW-Middlebury dataset.
Figure 7. Effects of different optimization stages on error rate.
Figure 8. Effect of optimization stages on running time. Prop_r and Prop_c signify the use of the two optimization stages for conducting disparity estimation for Reindeer and Cones, respectively.
Figure 9. Visual effect of the optimization stages. The images illustrate the changes in disparity during the coarse-to-fine matching stage, with and without the propagation process. Frames 1 and 3 show the method's performance in boundary regions, whereas frame 2 highlights its effectiveness in areas with repetitive textures.
Figure 10. Visualization analysis of disparity results from diverse algorithms on the dataset captured by the Hawaii Institute [3,21,24,35].
Figure 11. Visualization analysis of disparity results from diverse algorithms on the dataset captured by our institute [3,21,24,35].
18 pages, 6500 KiB  
Article
NSVDNet: Normalized Spatial-Variant Diffusion Network for Robust Image-Guided Depth Completion
by Jin Zeng and Qingpeng Zhu
Electronics 2024, 13(12), 2418; https://doi.org/10.3390/electronics13122418 - 20 Jun 2024
Cited by 1 | Viewed by 1098
Abstract
Depth images captured by low-cost three-dimensional (3D) cameras are subject to low spatial density, requiring depth completion to improve 3D imaging quality. Image-guided depth completion aims at predicting dense depth images from extremely sparse depth measurements captured by depth sensors with the guidance of aligned Red-Green-Blue (RGB) images. Recent approaches have achieved remarkable improvement, but performance degrades severely when the input sparse depth is corrupted. To enhance robustness to input corruption, we propose a novel depth completion scheme based on a normalized spatial-variant diffusion network incorporating measurement uncertainty, which introduces the following contributions. First, we design a normalized spatial-variant diffusion (NSVD) scheme that iteratively applies spatially varying filters to the sparse depth, conditioned on its certainty measure, to exclude depth corruption from the diffusion. In addition, we integrate the NSVD module into the network design to enable end-to-end training of the filter kernels and depth reliability, which further improves structural detail preservation via the guidance of RGB semantic features. Furthermore, we apply the NSVD module hierarchically at multiple scales, which ensures global smoothness while preserving visually salient details. The experimental results validate the advantages of the proposed network over existing approaches, with enhanced performance and noise robustness for depth completion in real-use scenarios.
(This article belongs to the Special Issue Image Sensors and Companion Chips)
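The core of a normalized spatial-variant diffusion step can be sketched as a confidence-weighted local average: each depth value is re-estimated from its neighbors, weighted jointly by a spatially varying kernel and each neighbor's certainty, so that corrupted samples contribute little. The window size, kernel source, and update form below are illustrative assumptions, not the trained network's learned kernels.

```python
import numpy as np

def nsvd_step(depth, certainty, kernels, radius=1, eps=1e-8):
    """One normalized spatial-variant diffusion iteration (illustrative sketch).

    depth:     (H, W) current depth estimate
    certainty: (H, W) per-pixel reliability in [0, 1]
    kernels:   (H, W, (2*radius+1)**2) spatially varying weights (e.g., from RGB features)
    """
    H, W = depth.shape
    k = 2 * radius + 1
    pad_d = np.pad(depth, radius, mode="edge")
    pad_c = np.pad(certainty, radius, mode="edge")
    out = np.zeros_like(depth, dtype=float)
    for i in range(H):
        for j in range(W):
            nd = pad_d[i:i + k, j:j + k].ravel()            # neighboring depths
            nc = pad_c[i:i + k, j:j + k].ravel()            # neighboring certainties
            w = kernels[i, j] * nc                          # kernel weight x certainty
            out[i, j] = np.sum(w * nd) / (np.sum(w) + eps)  # normalized average
    return out
```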
Figure 1. Example in the NYUv2 dataset [12]. (a) RGB image input, (b) sparse depth input, and depth estimation with (c) PNCNN [6] using single depth, (d) MiDaS [7] using single RGB, (e) NLSPN [11], and (f) the proposed NSVDNet using both RGB and depth. As highlighted in the black rectangles, (f) NSVDNet generates more accurate structural details than (e) NLSPN due to the uncertainty-aware diffusion scheme. The results are evaluated using the RMSE metric, where (f) NSVDNet achieves the smallest RMSE, indicating improved accuracy.
Figure 2. An overview of the NSVDNet architecture, which predicts a dense depth from a disturbed sparse depth with RGB guidance. NSVDNet is composed of the depth-dominant branch, which estimates the initial dense depth from the sparse sensor depth, and the RGB-dominant branch, which generates the semantic structural features. The two branches are fused in the hierarchical NSVD modules, where the initial dense depth is diffused with spatial-variant diffusion kernels constructed from RGB features.
Figure 3. Depth completion with different algorithms, tested on the NYUv2 dataset. As highlighted in the red rectangles, the proposed NSVDNet achieves more accurate depth completion results with detail preservation and noise robustness.
Figure 4. Comparison of depth completion with the original sparse depth and a noisy sparse depth with 50% outliers, tested on the NYUv2 dataset. The comparison between results with original and noisy inputs demonstrates the robustness of the proposed method to input corruption. The selected patches are enlarged in the colored rectangles.
Figure 5. Generalization ability evaluation on the TetrasRGBD dataset with outliers. The certainty maps explain the robustness of NSVDNet to input corruptions.
Figure 6. Generalization ability evaluation on the TetrasRGBD dataset with real sensor data, where the proposed NSVDNet generates more accurate depth estimation than competitive methods, including PNCNN [38] and NLSPN [11].
16 pages, 3250 KiB  
Article
Iterative Adaptive Based Multi-Polarimetric SAR Tomography of the Forested Areas
by Shuang Jin, Hui Bi, Qian Guo, Jingjing Zhang and Wen Hong
Remote Sens. 2024, 16(9), 1605; https://doi.org/10.3390/rs16091605 - 30 Apr 2024
Cited by 2 | Viewed by 1502
Abstract
Synthetic aperture radar tomography (TomoSAR) is an extension of synthetic aperture radar (SAR) imaging. It introduces the synthetic aperture principle into the elevation direction to achieve three-dimensional (3-D) reconstruction of the observed target. Compressive sensing (CS) is a favorable technology for sparse elevation recovery. However, for the non-sparse elevation distribution of forested areas, CS-based reconstruction requires orthogonal bases to first represent the elevation reflectivity sparsely. The iterative adaptive approach (IAA) is a non-parametric algorithm that enables super-resolution reconstruction with minimal snapshots, eliminates the need for hyperparameter optimization, and requires fewer iterations. This paper introduces IAA to tomographic inversion of forested areas and proposes a novel multi-polarimetric-channel joint 3-D imaging method. The proposed method relies on the consistent support of the elevation distribution across polarimetric channels and uses the L2-norm to constrain the IAA-based 3-D reconstruction of each polarimetric channel. Compared with typical spectral estimation (SE)-based algorithms, the proposed method suppresses elevation sidelobes and ambiguity and, hence, improves the quality of the recovered 3-D image. Compared with the wavelet-based CS algorithm, it reduces computational cost and avoids the influence of orthogonal basis selection. In addition, in comparison to the IAA, it demonstrates greater accuracy in identifying the support of the elevation distribution in forested areas. Experimental results based on BioSAR 2008 data validate the proposed method.
(This article belongs to the Special Issue Advances in Synthetic Aperture Radar Data Processing and Application)
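For reference, a generic single-channel iterative adaptive approach (IAA) estimates the elevation power spectrum by alternating a power update with a covariance rebuild; a minimal single-snapshot sketch for one azimuth-range pixel is given below. The steering-matrix construction, iteration count, and initialization are assumptions, and the paper's L2-constrained multi-polarimetric joint version is not reproduced here.

```python
import numpy as np

def iaa_spectrum(y, A, n_iter=15, eps=1e-10):
    """Single-snapshot IAA power estimate along elevation (generic sketch).

    y: (M,)    complex measurements across the M elevation apertures
    A: (M, K)  steering matrix over K candidate elevation positions
    Returns (K,) estimated powers along elevation.
    """
    M, K = A.shape
    p = np.abs(A.conj().T @ y) ** 2 / M**2             # beamforming initialization
    for _ in range(n_iter):
        R = (A * p) @ A.conj().T + eps * np.eye(M)     # R = A diag(p) A^H
        Rinv_y = np.linalg.solve(R, y)
        Rinv_A = np.linalg.solve(R, A)
        num = A.conj().T @ Rinv_y                      # a_k^H R^-1 y
        den = np.einsum("mk,mk->k", A.conj(), Rinv_A)  # a_k^H R^-1 a_k
        p = np.abs(num / den) ** 2                     # updated powers
    return p
```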
Figure 1. TomoSAR imaging geometry.
Figure 2. Scattering of a forested area. (a) Scattering mechanism. (b) Scattering distribution.
Figure 3. Schematic diagram of the backscattering coefficients of the HH, HV, and VV polarimetric channels.
Figure 4. Elevation aperture positions in the BioSAR 2008 dataset.
Figure 5. Implementation process of TomoSAR 3-D imaging of the forested areas based on the proposed method.
Figure 6. Polarimetric SAR image of the surveillance area (the yellow area numbers 1 and 2 represent the two slices selected for the experiments).
Figure 7. Amplitude and phase results after data preprocessing for the (a) HH, (b) HV, and (c) VV polarimetric channels.
Figure 8. The incoherent sum of the results for all polarization channels (Slice 1). (a) BF. (b) Capon. (c) MUSIC. (d) Wavelet-based L1. (e) IAA. (f) The proposed method. The white line represents the LiDAR DSM.
Figure 9. The incoherent sum of the results for all polarization channels (Slice 2). (a) BF. (b) Capon. (c) MUSIC. (d) Wavelet-based L1. (e) IAA. (f) The proposed method. The white line represents the LiDAR DSM.
Figure 10. 3-D point cloud map of the entire surveillance region reconstructed by the proposed method.
26 pages, 19577 KiB  
Article
Enhancing Building Point Cloud Reconstruction from RGB UAV Data with Machine-Learning-Based Image Translation
by Elisabeth Johanna Dippold and Fuan Tsai
Sensors 2024, 24(7), 2358; https://doi.org/10.3390/s24072358 - 8 Apr 2024
Cited by 1 | Viewed by 1696
Abstract
The performance of three-dimensional (3D) point cloud reconstruction is affected by dynamic features such as vegetation. Vegetation can be detected by near-infrared (NIR)-based indices; however, the sensors providing multispectral data are resource intensive. To address this issue, this study proposes a two-stage framework to first improve the performance of 3D point cloud generation of buildings with a two-view SfM algorithm, and second, to reduce noise caused by vegetation. The proposed framework can also overcome the lack of near-infrared data when identifying vegetation areas for reducing interferences in the SfM process. The first stage includes cross-sensor training, model selection and the evaluation of image-to-image RGB to color infrared (CIR) translation with Generative Adversarial Networks (GANs). The second stage includes feature detection with multiple feature detector operators, feature removal with respect to the NDVI-based vegetation classification, masking, matching, pose estimation and triangulation to generate sparse 3D point clouds. The materials utilized in both stages are a publicly available RGB-NIR dataset, and satellite and UAV imagery. The experimental results indicate that the cross-sensor and category-wise validation achieves an accuracy of 0.9466 and 0.9024, with kappa coefficients of 0.8932 and 0.9110, respectively. The histogram-based evaluation demonstrates that the predicted NIR band is consistent with the original NIR data of the satellite test dataset. Finally, the test on the UAV RGB and artificially generated NIR with a segmentation-driven two-view SfM proves that the proposed framework can effectively translate RGB to CIR for NDVI calculation. Furthermore, the artificially generated NDVI is able to segment and classify vegetation. As a result, the generated point cloud is less noisy, and the 3D model is enhanced.
(This article belongs to the Section Sensing and Imaging)
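The vegetation masking step rests on the standard NDVI definition, NDVI = (NIR - Red) / (NIR + Red), thresholded to flag vegetated pixels (0.6 is used for the satellite example in Figure 10 and 0.5 for the UAV experiment in Figure 18). A minimal sketch, assuming float image bands of equal size:

```python
import numpy as np

def ndvi_mask(nir, red, threshold=0.5, eps=1e-8):
    """Compute NDVI from NIR and red bands and mask vegetated pixels."""
    ndvi = (nir - red) / (nir + red + eps)
    return ndvi, ndvi > threshold   # boolean mask: True where vegetation

# Detected features falling inside the mask would be discarded before matching
# and triangulation, so vegetation contributes less noise to the sparse point cloud.
```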
Figure 1. Two-stage framework separated into the machine learning technique (gray) and the application in three steps (green). First (gray), the CIR image is generated from RGB with image-to-image translation. Then (light green), the NDVI is calculated with the generated NIR and red bands. Next (medium green), the NDVI segmentation and classification are used to match the detected features accordingly. Finally (dark green), pose estimation and triangulation are used to generate a sparse 3D point cloud.
Figure 2. First stage of the two-stage workflow. (a) Image-to-image translation in five steps for RGB2CIR simulation: input and pre-processing (orange), training and testing (green), and verification and validation (yellow). (b) Image-to-image translation training.
Figure 3. Second stage of the framework: the segmentation-driven two-view SfM algorithm. The processing steps are grouped by color: NDVI-related processing (green), input and feature detection (orange), feature processing (yellow), and output (blue).
Figure 4. Pleiades VHR satellite imagery, with the nadir view in true color (RGB). The location of the study target is marked in orange and used for validation (see Section 3.2.3).
Figure 5. The validation target captured by the Pleiades VHR satellite. (a) The target stadium; (b) the geolocation of the target (marked in orange in Figure 4); (c) the target ground truth (GT) CIR image and GT NDVI of the target building and its vicinity.
Figure 6. Morphological changes on the image covering the target and image tiles. (a) Original cropped CIR image of Pleiades satellite imagery (1024 x 1024 x 3). A single tile, the white rectangle in (a), is shown as (e). (b-d) and (f-i) are the morphed images of (a) and (e), respectively.
Figure 7. Training over 200 epochs for model selection. The generator loss (loss GEN) is plotted in orange and, in contrast, the FID calculation results in blue.
Figure 8. Training Pix2Pix for model selection with FID. The epochs with the best FID and CM are marked for every test run, except overall, with colored bars, respectively. The numbers are summarized in Table 5.
Figure 9. CIR pansharpening on the target. The high-resolution panchromatic image is used to increase the resolution of the composite CIR image while preserving spectral information. From top to bottom: (a) panchromatic, (b) color infrared created from multispectral bands, and (c) pansharpened color infrared.
Figure 10. Example of vegetation feature removal to the north of the stadium. (a) CIR images; (b) NDVI image with legend; (c) identified SURF features (yellow asterisks) within densely vegetated areas (green) using 0.6 as the threshold.
Figure 11. Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI (incl. legend) of the main target (a stadium) and its vicinity.
Figure 12. Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI generated from a pansharpened RGB satellite sub-image.
Figure 13. Histogram and visual inspection of the CIR and NDVI simulated using MS and PAN images of the target stadium. (a-c) Ground truth (GT) and NDVI predicted using one 256 x 256 tile from MS Pleiades, with their histograms. (d-f) Ground truth of the CIR, NIR and NDVI, and predicted NIR and NDVI images from nine tiles of the PAN Pleiades images, with histograms for NDVI comparison.
Figure 14. Histogram and visual inspection of MS (I-III) and PAN (IV-VI) examples of Zhubei City.
Figure 15. Prediction of CIR, NIR and calculated NDVI of a UAV scene: (a) RGB, (b) predicted CIR image, (c) the extracted NIR band of (b), and (d) NDVI calculated with the NIR and red bands. A close-up view of the area marked with an orange box in (a) is displayed as two 256 x 256 tiles in RGB (e) and the predicted CIR (f).
Figure 16. Direct comparison between results without (a) and with vegetation segmentation (b). Areas of low density are shown in blue, areas of high density in red.
Figure 17. Two-view SfM 3D sparse point cloud without NDVI-based vegetation removal on the target CSRSR. (a) Sparse point cloud with no further coloring; (b) point cloud colored by elevation; (c) density analysis and (d) the corresponding histogram. Table (e) lists the accumulated number of points over the three operators (SURF, ORB and FAST) and the initial and manually cleaned and processed point cloud.
Figure 18. Two-view SfM reconstructed 3D sparse point cloud with the vegetation segmentation and removal process based on the simulated NDVI of the target building. (a) Sparse point cloud with no further coloring; (b) point cloud colored by elevation; (c) density analysis and (d) the histogram. (e) lists the accumulated number of points over the three operators (SURF, ORB and FAST) after segmentation, with an NDVI of 0.5 as the threshold to mask vegetation in SURF and ORB, and the initial and manually cleaned point cloud.
13 pages, 3903 KiB  
Article
Binocular Visual Measurement Method Based on Feature Matching
by Zhongyang Xie and Chengyu Yang
Sensors 2024, 24(6), 1807; https://doi.org/10.3390/s24061807 - 11 Mar 2024
Cited by 3 | Viewed by 1351
Abstract
To address the issues of low measurement accuracy and unstable results when using binocular cameras to detect objects with sparse surface textures, weak surface textures, occluded surfaces, low-contrast surfaces, and surfaces with intense lighting variations, a three-dimensional measurement method based on an improved feature matching algorithm is proposed. Initially, features are extracted from the left and right images obtained by the binocular camera. The extracted feature points serve as seed points, and a one-dimensional search space is established accurately based on the disparity continuity and epipolar constraints. The optimal search range and seed point quantity are obtained using the particle swarm optimization algorithm. The zero-mean normalized cross-correlation coefficient is employed as the similarity measure for region growing. Subsequently, the left and right images are matched based on the grayscale information of the feature regions, and seed point matching is performed within each matching region. Finally, the obtained matching pairs are used to calculate the three-dimensional information of the target object using the triangulation formula. The proposed algorithm significantly enhances matching accuracy while reducing algorithm complexity. Experimental results on the Middlebury dataset show an average relative error of 0.75% and an average measurement time of 0.82 s. The error matching rate of the proposed image matching algorithm is 2.02%, and the PSNR is 34 dB. The algorithm improves the measurement accuracy for objects with sparse or weak textures, demonstrating robustness against brightness variations and noise interference.
(This article belongs to the Section Optical Sensors)
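Two pieces of the described pipeline are compact enough to sketch: the zero-mean normalized cross-correlation (ZNCC) used as the region-growing similarity measure, and the standard triangulation of depth from disparity for a rectified stereo pair, Z = f * B / d. Patch extraction and variable names below are illustrative.

```python
import numpy as np

def zncc(patch_l, patch_r, eps=1e-8):
    """Zero-mean normalized cross-correlation between two same-sized patches."""
    a = patch_l - patch_l.mean()
    b = patch_r - patch_r.mean()
    return float(np.sum(a * b) / (np.sqrt(np.sum(a**2) * np.sum(b**2)) + eps))

def depth_from_disparity(disparity, focal_length, baseline):
    """Triangulate depth Z = f * B / d for a rectified binocular pair."""
    return focal_length * baseline / disparity
```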
Figure 1. Three-dimensional measurement workflow.
Figure 2. Epipolar line constraint.
Figure 3. Disparity gradient constraint.
Figure 4. Region growing process. (a) Image of the seed point pixel; (b) image after seed point diffusion.
Figure 5. Schematic diagram of binocular camera measurement.
Figure 6. Experimental results for the bowling stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
Figure 7. Experimental results for the lampshade stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
Figure 8. Experimental results for the plastic stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
Figure 9. Experimental results for the wood stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
21 pages, 6240 KiB  
Article
Similarity Measurement and Retrieval of Three-Dimensional Voxel Model Based on Symbolic Operator
by Zhenwen He, Xianzhen Liu and Chunfeng Zhang
ISPRS Int. J. Geo-Inf. 2024, 13(3), 89; https://doi.org/10.3390/ijgi13030089 - 11 Mar 2024
Viewed by 1841
Abstract
Three-dimensional voxel models are widely applied in various fields such as 3D imaging, industrial design, and medical imaging. The advancement of 3D modeling techniques and measurement devices has made the generation of three-dimensional models more convenient. The exponential increase in the number of 3D models presents a significant challenge for model retrieval. Currently, these models are numerous and typically represented as point clouds or meshes, resulting in sparse data and high feature dimensions within the retrieval database. Traditional methods for 3D model retrieval suffer from high computational complexity and slow retrieval speeds. To address this issue, this paper combines space-filling curves with octree structures and proposes a novel approach for representing three-dimensional voxel model sequence data features, along with a similarity measurement method based on symbolic operators. This approach enables rapid dimensionality reduction of the three-dimensional model database and efficient similarity calculation, expediting retrieval. Full article
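To make the general idea of sequence-based voxel retrieval concrete, the toy Python sketch below linearizes a binary voxel grid along a space-filling order and compares the resulting sequences with a normalized Hamming distance. Morton (Z-order) indexing is used here only as a simple stand-in for the Hilbert curve, and the occupancy symbols and distance are deliberately simplistic; this is not the paper's VSO representation or its symbolic-operator similarity measure.

# Toy sketch of the general idea only (not the paper's VSO method): linearize a
# binary voxel grid with a space-filling order and compare the symbol sequences.
import numpy as np

def morton_index(x: int, y: int, z: int, bits: int) -> int:
    """Interleave the bits of (x, y, z) into a single Z-order index."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (3 * b)
        idx |= ((y >> b) & 1) << (3 * b + 1)
        idx |= ((z >> b) & 1) << (3 * b + 2)
    return idx

def voxels_to_sequence(grid: np.ndarray) -> np.ndarray:
    """Order the occupancy values of an n^3 grid along the space-filling curve."""
    n = grid.shape[0]
    bits = n.bit_length() - 1
    seq = np.zeros(n ** 3, dtype=np.uint8)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                seq[morton_index(x, y, z, bits)] = grid[x, y, z]
    return seq

def sequence_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized Hamming distance between two equal-length symbol sequences."""
    return float(np.count_nonzero(a != b)) / a.size

rng = np.random.default_rng(1)
model_a = (rng.random((8, 8, 8)) > 0.7).astype(np.uint8)
model_b = model_a.copy()
model_b[0, 0, :] ^= 1                      # perturb one edge of the model
print("distance:", sequence_distance(voxels_to_sequence(model_a),
                                     voxels_to_sequence(model_b)))

A Hilbert ordering preserves spatial locality better than the Z-order used in this toy example, which is presumably why the paper adopts it.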
Show Figures

Figure 1. Three-dimensional model retrieval based on VSO architecture.
Figure 2. The Hilbert curve.
Figure 3. The three-dimensional Hilbert curve.
Figure 4. Hilbert space-filling curve-based voxelization and octree storage of 3D models.
Figure 5. (a) Voxel granularity 4 × 4 × 4; (b) voxel granularity 8 × 8 × 8.
Figure 6. (a) Sixteen filling states of voxels; (b) representation of 3D models based on symbolic operators.
Figure 7. (a) Symbol sequence of length 16; (b) symbol sequence of length 128.
Figure 8. Symbol sequence for the bathtub class model.
Figure 9. Feature representation processes based on symbolic operators.
Figure 10. Similarity measurement based on VSO.
Figure 11. ModelNet10 classification confusion matrix.
Figure 12. Confusing night_stand and dresser models.
Figure 13. The section represents the characteristic dimensions of the method.
15 pages, 3959 KiB  
Article
Sub-Bin Delayed High-Range Accuracy Photon-Counting 3D Imaging
by Hao-Meng Yin, Hui Zhao, Ming-Yang Yang, Yong-An Liu, Li-Zhi Sheng and Xue-Wu Fan
Photonics 2024, 11(2), 181; https://doi.org/10.3390/photonics11020181 - 16 Feb 2024
Viewed by 1279
Abstract
The range accuracy of single-photon-array three-dimensional (3D) imaging systems is limited by the time resolution of the array detectors. We introduce a method for achieving super-resolution in 3D imaging through sub-bin delayed scanning acquisition and fusion. Its central concept involves the generation of multiple sub-bin difference histograms through sub-bin shifting. These coarse time-resolution histograms are then fused by multiplicative averaging to produce finely time-resolved histograms. Finally, the arrival times of the reflected photons with sub-bin resolution are extracted from the resulting fused high-time-resolution count distribution. Compared with sub-bin delaying alone, the added fusion step performs better in reducing both the broadening error caused by coarse discrete sampling and the background noise error. The effectiveness of the proposed method is examined at different target distances, pulse widths, and sub-bin scales. The simulation results indicate that small-scale sub-bin delays contribute to superior reconstruction outcomes for the proposed method. Specifically, implementing a sub-bin delay of 0.1 times the temporal resolution for a 100 ps echo pulse width reduces the system ranging error by three orders of magnitude. Furthermore, Monte Carlo simulations are used to model a low signal-to-background noise ratio (0.05) scenario characterised by sparsely reflected photons. The proposed method demonstrates a commendable capability to simultaneously achieve wide-ranging super-resolution and denoising. This is evidenced by the detailed depth distribution information and a substantial 95.60% reduction in the mean absolute error of the reconstruction results, confirming the effectiveness of the proposed method in noisy scenarios. Full article
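The sketch below simulates one plausible reading of the sub-bin delayed acquisition and multiplicative fusion described above: several coarse time-correlated single-photon counting (TCSPC) histograms are acquired with the time origin shifted in sub-bin steps, up-sampled onto a fine grid, re-aligned, and multiplied, so that only the region where all shifted coarse bins overlap remains large. The bin width, delay step, pulse count, noise model, and alignment details are illustrative assumptions rather than the authors' exact procedure.

# Rough simulation sketch of sub-bin delayed acquisition with multiplicative fusion.
import numpy as np

bin_width = 1.0          # coarse TDC bin width [ns] (assumed)
sub_delay = 0.1          # sub-bin delay step [ns] (assumed)
n_delays = int(round(bin_width / sub_delay))   # 10 delayed acquisitions
n_bins = 100             # coarse histogram length (100 ns range window)
t_true = 38.37           # true photon arrival time [ns]
rng = np.random.default_rng(2)

def coarse_histogram(delay: float, pulses: int = 20000, sigma: float = 0.1) -> np.ndarray:
    """Coarse TCSPC histogram of a Gaussian echo plus uniform background,
    acquired with the time origin shifted by `delay`."""
    signal = rng.normal(t_true, sigma, pulses)
    background = rng.uniform(0.0, n_bins * bin_width, pulses // 10)
    arrivals = np.concatenate([signal, background])
    edges = np.arange(n_bins + 1) * bin_width - delay
    counts, _ = np.histogram(arrivals, bins=edges)
    return counts

# Fuse: up-sample each coarse histogram onto the 0.1 ns grid, shift it back into
# the common time frame, and multiply, so only the sub-bin overlap stays large.
fine = np.ones(n_bins * n_delays)
for k in range(n_delays):
    coarse = coarse_histogram(k * sub_delay).astype(np.float64) + 1.0  # avoid zeros
    upsampled = np.repeat(coarse, n_delays)
    fine *= np.roll(upsampled, -k)

t_est = (np.argmax(fine) + 0.5) * sub_delay
print(f"true arrival {t_true:.2f} ns, estimated {t_est:.2f} ns")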
Show Figures

Figure 1. (a) Diagram of single-photon 3D imaging system (PC, personal computer; TDC, time-to-digital conversion); (b) the principle of time-correlated single-photon counting.
Figure 2. Distribution of quantisation error and quantisation centroid error with different gate start times at 5.7 m target distance.
Figure 3. Example of the proposed method for double delaying.
Figure 4. Delay of 0.1 ns for acquiring coarse histograms at 6 m target distance.
Figure 5. Counts over time using different fusion methods. (a) Theoretical photon-count distribution. (b) Direct measurement histogram distribution. (c) Additive fusion-count distribution. (d) Multiplicative fusion-count distribution.
Figure 6. Distribution of ranging error with theoretical distance for (a) t_sub = 1/2 ns, (b) t_sub = 1/4 ns, (c) t_sub = 1/5 ns, and (d) t_sub = 1/10 ns.
Figure 7. Reconstructed target distances at reflected echo pulse width τ of (a) 20 ps, (b) 50 ps, and (c) 100 ps.
Figure 8. Distribution of calculated and theoretical distances for pulse width τ of (a) 20 ps, (b) 50 ps, and (c) 100 ps.
Figure 9. (a,e) Ground truth. (b,f) Direct measurement reconstruction results for different SBR levels. (c,g) Subtractive dither reconstruction results for different SBR levels. (d,h) Proposed method reconstruction results for different SBR levels.
Figure 10. (a,e,i,m) Ground truth. (b,f,j,n) Direct measurement reconstruction results for different acquisition pulse numbers. (c,g,k,o) Subtractive dither reconstruction results for different acquisition pulse numbers. (d,h,l,p) Proposed method reconstruction results for different acquisition pulse numbers.
19 pages, 7992 KiB  
Article
A Deep Learning Approach for Improving Two-Photon Vascular Imaging Speeds
by Annie Zhou, Samuel A. Mihelic, Shaun A. Engelmann, Alankrit Tomar, Andrew K. Dunn and Vagheesh M. Narasimhan
Bioengineering 2024, 11(2), 111; https://doi.org/10.3390/bioengineering11020111 - 24 Jan 2024
Cited by 2 | Viewed by 1958
Abstract
A potential method for tracking neurovascular disease progression over time in preclinical models is multiphoton fluorescence microscopy (MPM), which can image cerebral vasculature with capillary-level resolution. However, obtaining high-quality, three-dimensional images with traditional point scanning MPM is time-consuming and limits sample sizes for chronic studies. Here, we present a convolutional neural network-based (PSSR Res-U-Net architecture) algorithm for fast upscaling of low-resolution or sparsely sampled images and combine it with a segmentation-less vectorization process for 3D reconstruction and statistical analysis of vascular network structure. In doing so, we also demonstrate that the use of semi-synthetic training data can replace the expensive and arduous process of acquiring low- and high-resolution training pairs without compromising vectorization outcomes, and thus open the possibility of utilizing such approaches for other MPM tasks where collecting training data is challenging. We applied our approach to images with large fields of view from a mouse model and show that our method generalizes across imaging depths, disease states and other differences in neurovasculature. Our pretrained models and lightweight architecture can be used to reduce MPM imaging time by up to fourfold without any changes in underlying hardware, thereby enabling deployability across a range of settings. Full article
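One way to picture the semi-synthetic training data mentioned above is to degrade an acquired high-resolution frame with shot-like noise and then downscale it, so that the pair (degraded low-resolution frame, original frame) can be used for supervised training. The sketch below illustrates only that idea; the Poisson/Gaussian noise levels, the 4× factor, and the synthetic "vessel" image are made-up values, and this is not the authors' PSSR pipeline.

# Rough sketch of generating a semi-synthetic low-resolution counterpart of a
# high-resolution two-photon frame (assumed reading of the approach).
import numpy as np

def make_semisynthetic_lr(hr: np.ndarray, factor: int = 4,
                          peak: float = 40.0, read_sigma: float = 2.0,
                          seed: int = 0) -> np.ndarray:
    """Return a (H/factor, W/factor) noisy low-resolution counterpart of `hr`."""
    rng = np.random.default_rng(seed)
    img = hr.astype(np.float64)
    img = img / img.max()                                    # normalize to [0, 1]
    noisy = rng.poisson(img * peak) / peak                   # Poisson (shot) noise
    noisy += rng.normal(0.0, read_sigma / peak, img.shape)   # additive Gaussian noise
    h, w = img.shape
    lr = noisy[:h - h % factor, :w - w % factor]
    lr = lr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.clip(lr, 0.0, 1.0)

# Example: a 512x512 synthetic "vessel" image becomes a 128x128 training input,
# while the original frame serves as the ground-truth target.
yy, xx = np.mgrid[0:512, 0:512]
hr_frame = np.exp(-((xx - 256) ** 2) / (2 * 15 ** 2)) * 255   # bright vertical "vessel"
lr_frame = make_semisynthetic_lr(hr_frame)
print(hr_frame.shape, "->", lr_frame.shape)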
(This article belongs to the Special Issue AI and Big Data Research in Biomedical Engineering)
Show Figures

Figure 1. Structure and analysis pipeline. Low-resolution images (128 × 128 pixels) are acquired using two-photon microscopy. A deep learning (PSSR Res-U-Net)-based upscaling process generates high-resolution images (512 × 512 pixels), which take much longer to acquire, from low-resolution images. Segmentation-less vascular vectorization (SLAVV) generates 3D renderings and calculates network statistics from an upscaled image stack.
Figure 2. Generating and evaluating semi-synthetic training data. (a) Examples of semi-synthetic training images created using different types of added noise prior to downscaling: no noise (downscaling only), Poisson, Gaussian, and additive Gaussian. Acquired low-resolution (LR, 128 × 128 pixels) and high-resolution (HR, 512 × 512 pixels) ground truth images are shown for reference. (b) Resulting test image output from models trained using each noise method, with the acquired low-resolution image as model input and the acquired high-resolution image as ground truth for comparison. All models were trained with 3399 image pairs, with the Gaussian and additive Gaussian models further tested on 24,069 image pairs (7×) to assess performance. (c) Boxplot comparison of PSNR and SSIM values for each noise-method image in (b), measured against the ground truth image. Values are plotted for an image stack of 222 images. (d) Comparison of test images from models trained using real-world acquired vs. semi-synthetic data, with the real acquired low-resolution image as model input and the acquired high-resolution image as ground truth for reference. All models were trained with 234 image pairs, a large reduction from the noise-model comparison, due to the limited availability of real-world pairs. (e) Boxplot comparison of PSNR and SSIM values for real acquired vs. semi-synthetic model outputs corresponding to (d), measured against the high-resolution ground truth image. All values are plotted for an image stack of 222 images.
Figure 3. Comparison of performance between bilinear upscaling, a single-frame model, and a multi-frame model for semi-synthetic and real acquired test images. All models were trained with 24,069 image pairs. (a) Semi-synthetic test images from bilinear upscaling and models trained using single- vs. multi-frame data. The acquired low-resolution image (model input) and acquired high-resolution image (ground truth) are shown for reference. (b) PSNR and SSIM plots corresponding to the semi-synthetic test results from (a). (c) Real-world test images from bilinear upscaling and models trained using single- vs. multi-frame data. (d) PSNR and SSIM plots corresponding to the real-world test image results from (c).
Figure 4. Maximum-intensity projections (x-y) of ischemic infarct images consisting of 2 × 4 tiles with 213 slices (final dimensions 1.18 mm × 2.10 mm × 0.636 mm, pixel dimensions 1.34 μm × 1.36 μm × 3 μm) for a semi-synthetic low-resolution image, bilinear upscaled image, single- and multi-frame output images, and acquired high-resolution image. The black hole in the bottom-left corner represents the infarct itself.
Figure 5. Comparison of vectorization results using different upscaling methods against a ground truth image. (a) Blender rendering of vectorized images using VessMorphoVis [26] for visual comparison between single- and multi-frame results and an acquired high-resolution image. We performed manual curation for this vectorization process. (b) Vectorized image statistics for the automated curation process with known ground truth (simulated from the manually curated high-resolution image). CDFs are shown for metrics of length, radius, z-direction, and inverse tortuosity for original (OG), simulated original (sOG), bilinear upscaled (BL), and PSSR single- and multi-frame (SF, MF, respectively) images. Pearson's correlation values (r²) were calculated between the original image and each simulated or upscaled image for each metric. (c) Statistics regarding maximum accuracy (%) achieved with vectorization or thresholding and % error in median length and radius for each method.
17 pages, 7346 KiB  
Article
W-Band FMCW MIMO System for 3-D Imaging Based on Sparse Array
by Wenyuan Shao, Jianmin Hu, Yicai Ji, Wenrui Zhang and Guangyou Fang
Electronics 2024, 13(2), 369; https://doi.org/10.3390/electronics13020369 - 16 Jan 2024
Cited by 4 | Viewed by 1656
Abstract
Multiple-input multiple-output (MIMO) technology is widely used in the field of security imaging. However, existing imaging systems have shortcomings such as numerous array units, high hardware costs, and low imaging resolutions. In this paper, a sparse array-based frequency modulated continuous wave (FMCW) millimeter wave imaging system, operating in the W-band, is presented. In order to reduce the number of transceiver units of the system and lower the hardware cost, a linear sparse array with a periodic structure was designed using the MIMO technique. The system operates at 70~80 GHz, and the high operating frequency band and 10 GHz bandwidth provide good imaging resolution. The system consists of a one-dimensional linear array, a motion control system, and hardware for signal generation and image reconstruction. The channel calibration technique was used to eliminate inherent errors. The system combines mechanical and electrical scanning, and uses FMCW signals to extract distance information. The three-dimensional (3-D) fast imaging algorithm in the wave number domain was utilized to quickly process the detection data. The 3-D imaging of the target in the near-field was obtained, with an imaging resolution of 2 mm. The imaging ability of the system was verified through simulations and experiments. Full article
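For context on how an FMCW system recovers range, the sketch below dechirps a single ideal echo and reads the beat frequency off an FFT, using the relation R = c * f_b * T / (2 * B). The 70~80 GHz sweep (B = 10 GHz) and the 0.3 m target match the values reported above, while the sweep duration and sampling rate are assumptions; the wavenumber-domain 3-D reconstruction and the sparse MIMO array processing used by the actual system are not shown.

# Illustrative FMCW range extraction sketch (not the system's processing chain).
import numpy as np

c = 3e8
B = 10e9            # sweep bandwidth [Hz] (70~80 GHz)
T = 1e-3            # sweep duration [s], assumed
fs = 2e6            # beat-signal sampling rate [Hz], assumed
R_true = 0.30       # target range [m]

t = np.arange(0, T, 1 / fs)
tau = 2 * R_true / c                    # round-trip delay
f_beat = B / T * tau                    # beat frequency of the dechirped signal
beat = np.cos(2 * np.pi * f_beat * t)   # ideal dechirped (mixed-down) echo

spectrum = np.abs(np.fft.rfft(beat * np.hanning(t.size)))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
R_est = c * freqs[np.argmax(spectrum)] * T / (2 * B)

print(f"range resolution c/(2B) = {c / (2 * B) * 1e3:.0f} mm, "
      f"estimated range = {R_est * 1e3:.1f} mm")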
(This article belongs to the Special Issue Radar Signal Processing Technology)
Show Figures

Figure 1. Schematic diagram model of radar echo signal.
Figure 2. Schematic diagram of a single module array and EPC distribution.
Figure 3. Schematic diagram of the working principle of the millimeter wave MIMO imaging system.
Figure 4. Schematic diagram of point-target imaging simulation scene.
Figure 5. (a) Simulation results of point targets at 0.3 m; (b) simulation results of two-dimensional points-matrix target at 0.3 m.
Figure 6. Single-point target and two-dimensional point-target matrix echo curves at 0.3 m. (a) Azimuth direction echo curve of single-point target; (b) mechanical scanning direction echo curve of single-point target; (c) azimuth direction echo curve of points-matrix target; (d) mechanical scanning direction echo curve of points-matrix target.
Figure 7. Schematic diagram of radar system structure.
Figure 8. Photograph of the imaging system.
Figure 9. Schematic diagram of the imaging module structure.
Figure 10. Schematic diagram of calibration equipment structure.
Figure 11. (a) Photo of the experimental scene, including three metal spherical shells at 0.3 m; (b) imaging results of three metal shells.
Figure 12. Echo curves of three metal spherical shells at 0.3 m. (a) Azimuth direction echo curve; (b) mechanical scanning direction echo curve.
Figure 13. (a) Optical picture of the resolution board; (b) imaging results before calibration; (c) imaging results after calibration.
Figure 14. (a) Optical photo of the pistol model; (b) optical photo of the dagger.
Figure 15. (a) Photo of the experimental scene, including the imaging system and human models containing dangerous articles hidden under clothing; (b) imaging results of the pistol model; (c) imaging results of the dagger.
30 pages, 22271 KiB  
Article
A Novel Approach for Simultaneous Localization and Dense Mapping Based on Binocular Vision in Forest Ecological Environment
by Lina Liu, Yaqiu Liu, Yunlei Lv and Xiang Li
Forests 2024, 15(1), 147; https://doi.org/10.3390/f15010147 - 10 Jan 2024
Cited by 3 | Viewed by 1823
Abstract
The three-dimensional reconstruction of forest ecological environments from low-altitude remote sensing photography by Unmanned Aerial Vehicles (UAVs) provides a powerful basis for fine surveying of forest resources and for forest management. A stereo vision system, D-SLAM, is proposed to realize simultaneous localization and dense mapping for UAVs in complex forest ecological environments. The system takes binocular images as input and 3D dense maps as target outputs, while 3D sparse maps and camera poses can also be obtained. The tracking thread utilizes temporal cues to match sparse map points for zero-drift localization. The relative motion amount and the data association between frames are used as constraints for selecting new keyframes, and a binocular spatial-cue compensation strategy is proposed to increase tracking robustness. The dense mapping thread uses a Linear Attention Network (LANet) to predict reliable disparity maps in ill-posed regions, which are transformed into depth maps for constructing dense point cloud maps. Evaluations on three datasets, EuRoC, KITTI and Forest, show that the proposed system runs at 30 ordinary frames and 3 keyframes per second on Forest, achieves a high localization accuracy of several centimeters in Root Mean Squared Absolute Trajectory Error (RMS ATE) on EuRoC, and attains average Relative Root Mean Squared Errors (RMSE) of 0.64 for t_rel and 0.2 for R_rel on KITTI, outperforming most mainstream models in tracking accuracy and robustness. Moreover, the advantage of dense mapping compensates for the shortcomings of sparse mapping in most Simultaneous Localization and Mapping (SLAM) systems, and the proposed system meets the requirements of real-time localization and dense mapping in the complex ecological environment of forests. Full article
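As a generic illustration of the dense-mapping step described above, the sketch below converts a disparity map into depth using the stereo baseline and focal length and back-projects every valid pixel into a 3D point in the camera frame. The camera intrinsics, baseline, and flat synthetic disparity map are made-up example values, and this is plain stereo back-projection rather than the D-SLAM dense-mapping thread, which predicts disparities with LANet and assembles keyframe clouds into a global map.

# Minimal sketch: disparity map -> depth -> 3D point cloud in the camera frame.
import numpy as np

def disparity_to_point_cloud(disparity: np.ndarray, fx: float, fy: float,
                             cx: float, cy: float, baseline: float) -> np.ndarray:
    """Return an (N, 3) array of 3D points for all pixels with positive disparity."""
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0
    z = fx * baseline / disparity[valid]          # depth from disparity
    x = (u[valid] - cx) * z / fx                  # back-project along camera rays
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Example with a synthetic disparity map (a fronto-parallel plane at 8 px disparity).
disp = np.full((480, 640), 8.0)
cloud = disparity_to_point_cloud(disp, fx=700.0, fy=700.0,
                                 cx=320.0, cy=240.0, baseline=0.12)
print(cloud.shape, "depth of plane [m]:", cloud[0, 2])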
(This article belongs to the Special Issue Modeling and Remote Sensing of Forests Ecosystem)
Show Figures

Figure 1. Disparity maps in the Forest dataset.
Figure 2. Bag in the Forest dataset.
Figure 3. Left images in the bag.
Figure 4. Right images in the bag.
Figure 5. The D-SLAM system consists of four main parallel threads, tracking, local mapping, dense mapping, and loop closing, where the acronyms are defined as follows: Preprocessing (Pre-process), Local Bundle Adjustment (Local BA), Full Bundle Adjustment (Full BA), Special Euclidean group (SE3), Point cloud (Pcd), Linear Attention Network (LANet), and bags of binary words for fast place recognition in image sequences (DBoW2).
Figure 6. Compensation of spatial cues in the right image frame, where KF represents the key frame, CF represents the current frame, and RF-1 represents the previous frame of the right image.
Figure 7. Network structure of LANet. LANet consists of five main parts: feature extraction ResNet, Attention Module (AM), construction of matching cost, Three-Dimensional Convolutional Neural Network aggregation (3D CNN aggregation), and disparity prediction. The AM consists of two parts, the Spatial Attention Module (SAM) and the Channel Attention Module (CAM); the 3D CNN aggregation has two structures: the basic structure is used for ablation experiments to test the performance of the various parts of the network, and the stacked hourglass structure is used to optimize the network.
Figure 8. Linear mapping layers.
Figure 9. Visualization of disparity maps on Forest. The yellow and green boxes mark regions with significant disparity contrast produced by the various methods.
Figure 10. Estimated trajectory (dark blue) and GT (red) for 9 sequences on EuRoC.
Figure 11. Error graph for the 05 sequence of KITTI.
Figure 12. Comparison of projection trajectories for the 08 sequence of KITTI.
Figure 13. Dense mapping effect for the 01 sequence of KITTI. (a) Left RGB image; (b) visual disparity map; (c) feature point tracking map; (d) estimated trajectory map; (e) sparse point cloud map; (f) dense point cloud map.
Figure 14. Local dense point cloud maps from different views for the KITTI 01 sequence.
Figure 15. Dense mapping process on Forest, where (a) is the left RGB image, (b) is the visual disparity map, (c) is the feature point tracking map, (d) is the estimated trajectory map, which contains a loop closure; the localization of the front end and the accuracy of the back-end mapping are improved after loop-closing correction; (e) is the sparse point cloud map, in which the blue boxes represent the keyframes, the green box represents the current frame, the red box represents the start frame, the red points represent the reference map points, and the black points represent all map points generated by keyframes; and (f) is the overall effect of the dense point cloud map.
Figure 16. Local dense point cloud at different angles on Forest.