Search Results (66)

Search Parameters:
Keywords = three-dimensional (3D) sparse imaging

16 pages, 2102 KiB  
Article
Semantic Segmentation Method for High-Resolution Tomato Seedling Point Clouds Based on Sparse Convolution
by Shizhao Li, Zhichao Yan, Boxiang Ma, Shaoru Guo and Hongxia Song
Agriculture 2025, 15(1), 74; https://doi.org/10.3390/agriculture15010074 - 31 Dec 2024
Viewed by 232
Abstract
Semantic segmentation of three-dimensional (3D) plant point clouds at the stem-leaf level is foundational and indispensable for high-throughput tomato phenotyping systems. However, existing semantic segmentation methods often suffer from low precision and slow inference. To address these challenges, we propose an innovative encoding-decoding structure incorporating voxel sparse convolution (SpConv) and attention-based feature fusion (VSCAFF) to enhance semantic segmentation of point clouds from high-resolution tomato seedling images. Tomato seedling point clouds from the Pheno4D dataset, labeled into the semantic classes 'leaf', 'stem', and 'soil', are used for the segmentation task. To reduce the number of parameters and thereby improve inference speed, the SpConv module is designed around the residual concatenation of a skeleton convolution kernel and a regular convolution kernel. The attention-based feature fusion module assigns attention weights to the voxel diffusion features and the point features, avoiding the ambiguity that arises when points with different semantics receive identical characteristics from the diffusion module, while also suppressing noise. Finally, to counter the class bias caused by the uneven distribution of point cloud classes during training, a composite loss combining Lovász-Softmax and weighted cross-entropy is introduced to supervise the model and improve its performance. The results show that VSCAFF achieves an mIoU of 86.96%, outperforming PointNet, PointNet++, and DGCNN. Its IoU reaches 99.63% for the soil class, 64.47% for the stem class, and 96.72% for the leaf class, and its inference latency of 35 ms is lower than that of PointNet++ and DGCNN. These results demonstrate that VSCAFF combines high accuracy with fast inference for semantic segmentation of high-resolution tomato point clouds and can provide technical support for high-throughput automatic phenotypic analysis of tomato plants.
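As a rough illustration of the attention-based fusion described in this abstract, the sketch below gates per-point features against voxel-diffused features with a learned weight. It is a minimal, hypothetical PyTorch version: the MLP gate, channel width, and tensor shapes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AttentionFeatureFusion(nn.Module):
    """Gate per-point features against voxel-diffused features with a learned weight."""

    def __init__(self, channels: int):
        super().__init__()
        # Small MLP that predicts a per-point gate from the concatenated features.
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, point_feat: torch.Tensor, voxel_feat: torch.Tensor) -> torch.Tensor:
        # point_feat, voxel_feat: (N_points, C)
        w = self.gate(torch.cat([point_feat, voxel_feat], dim=-1))  # (N_points, 1)
        # The gate decides how much diffused voxel context each point keeps,
        # suppressing points whose diffused features are ambiguous or noisy.
        return w * voxel_feat + (1.0 - w) * point_feat


# Example: 2048 points with 64-dimensional features from the two branches.
fusion = AttentionFeatureFusion(64)
fused = fusion(torch.randn(2048, 64), torch.randn(2048, 64))
print(fused.shape)  # torch.Size([2048, 64])
```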
(This article belongs to the Section Digital Agriculture)
Figure 1: The raw tomato seedling point cloud and the point cloud labeled into the semantic classes 'leaf', 'stem', and 'soil'.
Figure 2: The structure of the network.
Figure 3: Encoding-decoding architecture based on SpConv.
Figure 4: Three kinds of convolution kernel structures.
Figure 5: Attention-based feature fusion method.
Figure 6: Semantic segmentation of the tomato plant point clouds. Note 1: GT represents ground truth. Note 2: Four seedling point clouds scanned on four discrete days are shown.
13 pages, 7953 KiB  
Article
TAMC: Textual Alignment and Masked Consistency for Open-Vocabulary 3D Scene Understanding
by Juan Wang, Zhijie Wang, Tomo Miyazaki, Yaohou Fan and Shinichiro Omachi
Sensors 2024, 24(19), 6166; https://doi.org/10.3390/s24196166 - 24 Sep 2024
Viewed by 677
Abstract
Three-dimensional (3D) scene understanding achieves environmental perception by extracting and analyzing point cloud data, with wide applications including virtual reality and robotics. Previous methods align the 2D image features from a pre-trained CLIP model with the 3D point cloud features to obtain open-vocabulary scene understanding ability. We believe that existing methods have two deficiencies: (1) the 3D feature extraction process ignores the challenges of real scenarios, i.e., point cloud data are very sparse and even incomplete; (2) the training stage lacks direct text supervision, leading to inconsistency with the inference stage. To address the first issue, we employ a Masked Consistency training policy: during the alignment of 3D and 2D features, we mask some 3D features to force the model to understand the entire scene using only partial 3D features. For the second issue, we generate pseudo-text labels and align them with the 3D features during training. Specifically, we first generate a description for each 2D image belonging to the same 3D scene and then use a summarization model to fuse these descriptions into a single description of the scene. Subsequently, we align 2D-3D features and 3D-text features simultaneously during training. Extensive experiments demonstrate the effectiveness of our method, which outperforms state-of-the-art approaches.
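The two training signals described here (masked 2D-3D consistency and 3D-text alignment) can be sketched as cosine-based losses. The following is a hypothetical PyTorch sketch; the masking strategy, the mean pooling of the scene feature, and the 512-dimensional features are simplifying assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def masked_alignment_losses(feat_3d, feat_2d, text_feat, mask_ratio=0.3):
    """Mask part of the per-point 3D features, then align the remainder with
    (a) pixel-aligned 2D features and (b) a scene-level pseudo-text embedding."""
    n_points, _ = feat_3d.shape
    # Randomly zero out a fraction of the per-point 3D features.
    keep = torch.rand(n_points) > mask_ratio
    masked_3d = feat_3d * keep.unsqueeze(-1)

    # (a) 2D-3D consistency: cosine distance to the corresponding 2D features.
    loss_2d = 1.0 - F.cosine_similarity(masked_3d, feat_2d, dim=-1).mean()

    # (b) 3D-text alignment: pull the scene-averaged 3D feature toward the
    # summarized caption embedding.
    scene_feat = masked_3d.mean(dim=0, keepdim=True)
    loss_text = 1.0 - F.cosine_similarity(scene_feat, text_feat, dim=-1).mean()
    return loss_2d, loss_text


l2d, ltxt = masked_alignment_losses(torch.randn(4096, 512),
                                    torch.randn(4096, 512),
                                    torch.randn(1, 512))
print(float(l2d), float(ltxt))
```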
(This article belongs to the Special Issue Object Detection via Point Cloud Data)
Figure 1: Comparison of previous methods and ours. (a) The workflow of methods with a class limitation, which need labeled 3D point cloud data of seen classes and evaluate the model on unlabeled classes. In contrast, both (b) OpenScene [7] and (c) our method follow the open-vocabulary setting, meaning that no annotated data are required during training. The difference is that the proposed training process includes supervision from the pseudo-text label, and the masking training policy makes the extracted 3D features more robust, resulting in higher accuracy.
Figure 2: Caption generation. Multi-view images are fed into the image captioning model G_cap to generate corresponding captions t_i of a scene with N images; the text-summarization model G_sum then summarizes (t_1, ..., t_i, ..., t_N) into a scene-level caption t.
Figure 3: Method overview. (a) Training: given a 3D point cloud, a set of posed images, and a scene-level caption, a 3D encoder ε_θ^3D is trained to produce a masked point-wise 3D feature F_Mask^3D with two losses, L_1 for the F_Mask^3D-F^2D alignment and L_2 for the F_Mask^3D-F^Text alignment (refer to Section 3.4). (b) Testing: cosine similarity between per-point features and text features is used to perform open-vocabulary 3D scene understanding tasks. The prompt 'an XX in a scene' serves as the text input, where 'XX' is the query text (a dataset class during the segmentation task).
Figure 4: Qualitative results on ScanNet [26]. From left to right: 3D input and related 2D image, (a) the result of the baseline method (OpenScene [7]), (b) the proposed method, (c) the ground truth segmentation.
15 pages, 2064 KiB  
Article
Research on the Depth Image Reconstruction Algorithm Using the Two-Dimensional Kaniadakis Entropy Threshold
by Xianhui Yang, Jianfeng Sun, Le Ma, Xin Zhou, Wei Lu and Sining Li
Sensors 2024, 24(18), 5950; https://doi.org/10.3390/s24185950 - 13 Sep 2024
Viewed by 661
Abstract
Photon-counting light detection and ranging (LiDAR), especially Geiger-mode avalanche photodiode (Gm-APD) LiDAR, can obtain three-dimensional images of a scene with single-photon sensitivity, but background noise limits its imaging quality. To solve this problem, a depth image estimation method based on a two-dimensional (2D) Kaniadakis entropy thresholding method is proposed, which transforms a weak-signal extraction problem into a denoising problem for point cloud data. The method exploits the peak aggregation of the signal in the data and the spatio-temporal correlation between target image elements in the point cloud intensity data. Through extensive simulations and outdoor target-imaging experiments under different signal-to-background ratios (SBRs), the effectiveness of the method under low-SBR conditions is demonstrated. When the SBR is 0.025, the proposed method reaches a target recovery rate of 91.7%, outperforming typical existing methods such as the Peak-picking method, the Cross-Correlation method, and the sparse Poisson intensity reconstruction algorithm (SPIRAL), which achieve target recovery rates of 15.7%, 7.0%, and 18.4%, respectively. Compared with SPIRAL, the reconstruction recovery ratio is improved by 73.3%. The proposed method greatly improves the completeness of the target under high-background-noise environments and thus provides a basis for feature extraction and target recognition.
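For intuition about entropy-based thresholding, the sketch below implements a one-dimensional Kaniadakis (kappa) entropy criterion that picks the threshold maximizing the summed entropy of the two classes. The paper's criterion is two-dimensional and operates on point-cloud intensity data, and the kappa value here is an arbitrary assumption, so treat this only as a simplified illustration of the idea.

```python
import numpy as np


def kaniadakis_entropy(p, kappa=0.5):
    """Kaniadakis kappa-entropy S_k = -sum((p^(1+k) - p^(1-k)) / (2k)), ignoring empty bins."""
    p = p[p > 0]
    return -np.sum((p ** (1.0 + kappa) - p ** (1.0 - kappa)) / (2.0 * kappa))


def kaniadakis_threshold(hist, kappa=0.5):
    """Pick the threshold that maximizes the summed kappa-entropy of the two classes
    (a 1-D sketch; the paper builds a 2-D criterion that also uses spatial context)."""
    p = hist.astype(float) / hist.sum()
    best_t, best_score = 1, -np.inf
    for t in range(1, len(p) - 1):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        score = kaniadakis_entropy(p[:t] / w0, kappa) + kaniadakis_entropy(p[t:] / w1, kappa)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Example: separate a weak signal class from a dominant noise class in an intensity histogram.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(40, 10, 20000), rng.normal(180, 20, 2000)])
hist, _ = np.histogram(np.clip(values, 0, 255), bins=256, range=(0, 256))
print(kaniadakis_threshold(hist))
```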
(This article belongs to the Special Issue Application of LiDAR Remote Sensing and Mapping)
Figure 1: Flowchart of the proposed algorithm.
Figure 2: (a) Distribution of the probability density of the signal and noise when the signal position is at the 300th time bin. (b) Counting histogram of 0.1 s of data with signal and noise.
Figure 3: Signal detection rate for extracting different numbers of wave peaks under different SBRs.
Figure 4: (a) The standard depth image. (b) The standard intensity image.
Figure 5: Depth images obtained using different reconstruction methods. (a-d) SBR 0.01; (e-h) SBR 0.02; (i-l) SBR 0.04; (m-p) SBR 0.06; (q-t) SBR 0.08; (a,e,i,m,q) Peak-picking method; (b,f,j,n,r) Cross-Correlation method; (c,g,k,o,s) SPIRAL method; (d,h,l,p,t) proposed method.
Figure 6: The relationship between the SBR of the scene and the SNR of the reconstructed image using different methods.
Figure 7: Photos of the experimental targets.
Figure 8: The number of echo photons per pixel of the scenes. (a) Building at 730 m with an SBR of 0.078. (b) Building at 730 m with an SBR of 0.053. (c) Building at 730 m with an SBR of 0.031. (d) Building at 1400 m with an SBR of 0.031. (e) Building at 1400 m with an SBR of 0.025.
Figure 9: Ground-truth depth images and depth images obtained using different methods for the 730 m building. (a-e) SBR 0.078; (f-j) SBR 0.053; (k-o) SBR 0.031; (a,f,k) ground-truth depth image; (b,g,l) Peak-picking method; (c,h,m) Cross-Correlation method; (d,i,n) SPIRAL method; (e,j,o) proposed method.
Figure 10: Ground-truth depth images and depth images obtained using different methods for the 1400 m building. (a-e) SBR 0.031; (f-j) SBR 0.025; (a,f) ground-truth depth image; (b,g) Peak-picking method; (c,h) Cross-Correlation method; (d,i) SPIRAL method; (e,j) proposed method.
Figure 11: Intensity images with different signal-to-background ratios. (a) 730 m building with an SBR of 0.078; (b) 730 m building with an SBR of 0.053; (c) 730 m building with an SBR of 0.031; (d) 1400 m building with an SBR of 0.031; (e) 1400 m building with an SBR of 0.025.
22 pages, 13810 KiB  
Article
An Underwater Stereo Matching Method: Exploiting Segment-Based Method Traits without Specific Segment Operations
by Xinlin Xu, Huiping Xu, Lianjiang Ma, Kelin Sun and Jingchuan Yang
J. Mar. Sci. Eng. 2024, 12(9), 1599; https://doi.org/10.3390/jmse12091599 - 10 Sep 2024
Viewed by 952
Abstract
Stereo matching technology, which enables the acquisition of three-dimensional data, has profound implications for marine engineering. In underwater images, irregular object surfaces and the absence of texture information make it difficult for stereo matching algorithms that rely on discrete disparity values to accurately capture the 3D details of underwater targets. This paper proposes a stereo method based on a Markov random field (MRF) energy function with 3D labels to fit the inclined surfaces of underwater objects. By integrating a cross-based patch alignment approach with two label optimization stages, the proposed method exhibits traits akin to segment-based stereo matching methods, enabling it to handle images with sparse textures effectively. In experiments on both the simulated UW-Middlebury datasets and real degraded underwater images, our method demonstrates superiority over classical and state-of-the-art methods, as shown by the acquired disparity maps and the three-dimensional reconstructions of the underwater targets.
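A 3D label in this sense assigns a slanted disparity plane rather than a single discrete disparity. The hypothetical sketch below shows how such a label can be evaluated, using a plain sum-of-absolute-differences patch cost as a stand-in for the paper's MRF energy and cross-based aggregation; all values are illustrative.

```python
import numpy as np


def plane_disparity(label, x, y):
    """A 3-D label (a, b, c) encodes a slanted disparity plane d(x, y) = a*x + b*y + c,
    so disparity varies continuously inside a region instead of taking one discrete value."""
    a, b, c = label
    return a * x + b * y + c


def patch_cost(left, right, label, patch_xy):
    """Sum of absolute differences for a patch warped into the right image by the
    plane label (nearest-neighbor sampling; a sketch, not the paper's cost)."""
    h, w = left.shape
    cost = 0.0
    for (x, y) in patch_xy:
        d = plane_disparity(label, x, y)
        xr = int(round(x - d))
        if 0 <= xr < w:
            cost += abs(float(left[y, x]) - float(right[y, xr]))
        else:
            cost += 255.0  # penalty for matching outside the image
    return cost


# Example: evaluate a fronto-parallel label (a = b = 0) vs. a slanted one on random data.
rng = np.random.default_rng(1)
L, R = rng.integers(0, 256, (64, 64)), rng.integers(0, 256, (64, 64))
patch = [(x, y) for x in range(30, 34) for y in range(30, 34)]
print(patch_cost(L, R, (0.0, 0.0, 5.0), patch), patch_cost(L, R, (0.1, -0.05, 5.0), patch))
```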
(This article belongs to the Special Issue Underwater Observation Technology in Marine Environment)
Figure 1: The pipeline of our method, which is built on two units, the fixed grid and the adaptive region, corresponding to different matching stages for generating proposals.
Figure 2: The relation between a cross-based patch and a center grid. The cross-based patch constructed with the center p of grid C_ij as the anchor is covered by grid C_ij.
Figure 3: Illustration of the process of enhancing the cross-based patch. Binary labels, "0" and "1", are employed to signify whether the pixels are located within the small speckle.
Figure 4: The propagation process and expansion process occurring in a center grid C_ij.
Figure 5: Summary of the proposed methodology.
Figure 6: UW-Middlebury dataset.
Figure 7: Effects of different optimization stages on error rate.
Figure 8: Effect of optimization stages on running time. Prop_r and Prop_c signify the use of the two optimization stages for disparity estimation on Reindeer and Cones, respectively.
Figure 9: Visual effect of the optimization stages. The images illustrate the changes in disparity during the coarse-to-fine matching stage, with and without the propagation process. Frames 1 and 3 show the method's performance in boundary regions, whereas frame 2 highlights its effectiveness in areas with repetitive textures.
Figure 10: Visualization analysis of disparity results from diverse algorithms on the dataset captured by the Hawaii Institute [3,21,24,35].
Figure 11: Visualization analysis of disparity results from diverse algorithms on the dataset captured by our institute [3,21,24,35].
18 pages, 6500 KiB  
Article
NSVDNet: Normalized Spatial-Variant Diffusion Network for Robust Image-Guided Depth Completion
by Jin Zeng and Qingpeng Zhu
Electronics 2024, 13(12), 2418; https://doi.org/10.3390/electronics13122418 - 20 Jun 2024
Cited by 1 | Viewed by 921
Abstract
Depth images captured by low-cost three-dimensional (3D) cameras have low spatial density, requiring depth completion to improve 3D imaging quality. Image-guided depth completion aims at predicting dense depth images from extremely sparse depth measurements captured by depth sensors, with the guidance of aligned Red-Green-Blue (RGB) images. Recent approaches have achieved remarkable improvements, but performance degrades severely when the input sparse depth is corrupted. To enhance robustness to input corruption, we propose a novel depth completion scheme based on a normalized spatial-variant diffusion network that incorporates measurement uncertainty, with the following contributions. First, we design a normalized spatial-variant diffusion (NSVD) scheme that iteratively applies spatially varying filters to the sparse depth, conditioned on its certainty measure, to exclude corrupted depth from the diffusion. In addition, we integrate the NSVD module into the network design to enable end-to-end training of the filter kernels and depth reliability, which further improves structural detail preservation via the guidance of RGB semantic features. Furthermore, we apply the NSVD module hierarchically at multiple scales, which ensures global smoothness while preserving visually salient details. The experimental results validate the advantages of the proposed network over existing approaches, with enhanced performance and noise robustness for depth completion in real-use scenarios.
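The core NSVD idea, iteratively filtering sparse depth with spatially varying kernels normalized by a certainty map, can be sketched as follows. This hypothetical NumPy version uses uniform 3x3 kernels and a simple confidence update; in the network, the kernels would come from RGB features and everything would be trained end to end.

```python
import numpy as np


def normalized_spatial_variant_diffusion(depth, certainty, kernels, n_iters=3):
    """Certainty-normalized, spatially varying diffusion: each pixel averages its 3x3
    neighborhood with its own kernel, weighting neighbors by their certainty so that
    corrupted or empty samples are excluded.
    depth, certainty: (H, W); kernels: (H, W, 3, 3) per-pixel filter weights."""
    h, w = depth.shape
    d, c = depth.copy(), certainty.copy()
    for _ in range(n_iters):
        d_pad = np.pad(d, 1, mode="edge")
        c_pad = np.pad(c, 1, mode="edge")
        d_new, c_new = np.zeros_like(d), np.zeros_like(c)
        for y in range(h):
            for x in range(w):
                k = kernels[y, x]
                nb_d = d_pad[y:y + 3, x:x + 3]
                nb_c = c_pad[y:y + 3, x:x + 3]
                wgt = k * nb_c
                norm = wgt.sum() + 1e-8
                d_new[y, x] = (wgt * nb_d).sum() / norm   # certainty-normalized average
                c_new[y, x] = norm / (k.sum() + 1e-8)      # propagate confidence
        d, c = d_new, c_new
    return d, c


# Example: uniform kernels on a 25%-sampled grid; learned, RGB-driven kernels are assumed elsewhere.
rng = np.random.default_rng(5)
H, W = 32, 32
sparse, cert = np.zeros((H, W)), np.zeros((H, W))
sparse[::4, ::4] = 1.0 + rng.random((8, 8))
cert[::4, ::4] = 1.0
dense, conf = normalized_spatial_variant_diffusion(sparse, cert, np.ones((H, W, 3, 3)) / 9.0)
print(dense.shape, conf.mean())
```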
(This article belongs to the Special Issue Image Sensors and Companion Chips)
Figure 1: Example from the NYUv2 dataset [12]. (a) RGB image input, (b) sparse depth input, depth estimation with (c) PNCNN [6] using single depth, (d) MiDaS [7] using single RGB, (e) NLSPN [11], and (f) the proposed NSVDNet using both RGB and depth. As highlighted in the black rectangles, (f) NSVDNet generates more accurate structural details than (e) NLSPN due to the uncertainty-aware diffusion scheme. The results are evaluated using the RMSE metric, where (f) NSVDNet achieves the smallest RMSE, indicating improved accuracy.
Figure 2: Overview of the NSVDNet architecture for predicting a dense depth from a disturbed sparse depth with RGB guidance. NSVDNet is composed of the depth-dominant branch, which estimates the initial dense depth from the sparse sensor depth, and the RGB-dominant branch, which generates the semantic structural features. The two branches are fused in the hierarchical NSVD modules, where the initial dense depth is diffused with spatial-variant diffusion kernels constructed from RGB features.
Figure 3: Depth completion with different algorithms, tested on the NYUv2 dataset. As highlighted in the red rectangles, the proposed NSVDNet achieves more accurate depth completion results with detail preservation and noise robustness.
Figure 4: Comparison of depth completion with the original sparse depth and a noisy sparse depth with 50% outliers, tested on the NYUv2 dataset. The comparison between results with original and noisy inputs demonstrates the proposed method's robustness to input corruption. The selected patches are enlarged in the colored rectangles.
Figure 5: Generalization ability evaluation on the TetrasRGBD dataset with outliers. The certainty maps explain the robustness of NSVDNet to input corruptions.
Figure 6: Generalization ability evaluation on the TetrasRGBD dataset with real sensor data, where the proposed NSVDNet generates more accurate depth estimation than competitive methods, including PNCNN [38] and NLSPN [11].
16 pages, 3250 KiB  
Article
Iterative Adaptive Based Multi-Polarimetric SAR Tomography of the Forested Areas
by Shuang Jin, Hui Bi, Qian Guo, Jingjing Zhang and Wen Hong
Remote Sens. 2024, 16(9), 1605; https://doi.org/10.3390/rs16091605 - 30 Apr 2024
Cited by 1 | Viewed by 1232
Abstract
Synthetic aperture radar tomography (TomoSAR) is an extension of synthetic aperture radar (SAR) imaging that introduces the synthetic aperture principle into the elevation direction to achieve three-dimensional (3-D) reconstruction of the observed target. Compressive sensing (CS) is a favorable technique for sparse elevation recovery. However, the elevation distribution of forested areas is not sparse, so reconstructing it with CS requires orthogonal bases that first represent the elevation reflectivity sparsely. The iterative adaptive approach (IAA) is a non-parametric algorithm that enables super-resolution reconstruction with minimal snapshots, eliminates the need for hyperparameter optimization, and requires few iterations. This paper introduces IAA to tomographic inversion of forested areas and proposes a novel multi-polarimetric-channel joint 3-D imaging method. The proposed method relies on the consistent support of the elevation distribution across polarimetric channels and uses the L2-norm to constrain the IAA-based 3-D reconstruction of each channel. Compared with typical spectral estimation (SE)-based algorithms, the proposed method suppresses elevation sidelobes and ambiguity and hence improves the quality of the recovered 3-D image. Compared with the wavelet-based CS algorithm, it reduces computational cost and avoids the influence of orthogonal basis selection. In addition, compared with plain IAA, it identifies the support of the elevation distribution in forested areas more accurately. Experimental results based on BioSAR 2008 data validate the proposed method.
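For reference, a single-snapshot, single-channel IAA iteration for elevation-spectrum estimation can be sketched as below; the multi-polarimetric L2-norm coupling proposed in the paper is not included. The steering matrix, baseline geometry, and regularization constant are illustrative assumptions.

```python
import numpy as np


def iaa_spectrum(y, A, n_iters=10):
    """Single-snapshot IAA sketch: y is the (M,) multibaseline measurement vector and
    A is the (M, K) steering matrix over K candidate elevations. Returns the estimated
    reflectivity power per candidate elevation."""
    M, K = A.shape
    # Initialize with the matched-filter (beamforming) spectrum.
    p = np.abs(A.conj().T @ y) ** 2 / (np.sum(np.abs(A) ** 2, axis=0) ** 2)
    for _ in range(n_iters):
        R = (A * p) @ A.conj().T + 1e-6 * np.eye(M)        # R = A diag(p) A^H (regularized)
        Rinv = np.linalg.inv(R)
        num = A.conj().T @ (Rinv @ y)                      # a_k^H R^{-1} y
        den = np.einsum("mk,mn,nk->k", A.conj(), Rinv, A)  # a_k^H R^{-1} a_k
        p = np.abs(num / den) ** 2
    return p


# Example: two scatterers (ground and canopy) observed over an 8-track baseline.
rng = np.random.default_rng(6)
M, K = 8, 128
z = np.linspace(-20.0, 40.0, K)              # candidate elevations (m)
kz = np.linspace(0.0, 0.6, M)[:, None]       # elevation wavenumbers per track (assumed)
A = np.exp(1j * kz * z[None, :])
y = A[:, 30] * 1.0 + A[:, 90] * 0.6 + 0.05 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
print(z[np.argmax(iaa_spectrum(y, A))])      # strongest estimated elevation
```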
(This article belongs to the Special Issue Advances in Synthetic Aperture Radar Data Processing and Application)
Figure 1: TomoSAR imaging geometry.
Figure 2: Scattering of a forested area. (a) Scattering mechanism, (b) scattering distribution.
Figure 3: Schematic diagram of the backscattering coefficients of the HH, HV, and VV polarimetric channels.
Figure 4: Elevation aperture positions in the BioSAR 2008 dataset.
Figure 5: Implementation process of TomoSAR 3-D imaging of the forested areas based on the proposed method.
Figure 6: Polarimetric SAR image of the surveillance area (the yellow areas numbered 1 and 2 represent the two slices selected for the experiment).
Figure 7: Amplitude and phase results after data preprocessing for the (a) HH, (b) HV, and (c) VV polarimetric channels.
Figure 8: The incoherent sum of the results for all polarization channels (Slice 1). (a) BF. (b) Capon. (c) MUSIC. (d) Wavelet-based L1. (e) IAA. (f) The proposed method. The white line represents the LiDAR DSM.
Figure 9: The incoherent sum of the results for all polarization channels (Slice 2). (a) BF. (b) Capon. (c) MUSIC. (d) Wavelet-based L1. (e) IAA. (f) The proposed method. The white line represents the LiDAR DSM.
Figure 10: 3-D point cloud map of the entire surveillance region reconstructed by the proposed method.
26 pages, 19577 KiB  
Article
Enhancing Building Point Cloud Reconstruction from RGB UAV Data with Machine-Learning-Based Image Translation
by Elisabeth Johanna Dippold and Fuan Tsai
Sensors 2024, 24(7), 2358; https://doi.org/10.3390/s24072358 - 8 Apr 2024
Cited by 1 | Viewed by 1535
Abstract
The performance of three-dimensional (3D) point cloud reconstruction is affected by dynamic features such as vegetation. Vegetation can be detected by near-infrared (NIR)-based indices; however, sensors providing multispectral data are resource intensive. To address this issue, this study proposes a two-stage framework that first improves the 3D point cloud generation of buildings with a two-view SfM algorithm and then reduces the noise caused by vegetation. The proposed framework can also overcome the lack of near-infrared data when identifying vegetation areas to reduce interference in the SfM process. The first stage includes cross-sensor training, model selection, and the evaluation of image-to-image RGB to color-infrared (CIR) translation with Generative Adversarial Networks (GANs). The second stage includes feature detection with multiple feature detector operators, feature removal with respect to the NDVI-based vegetation classification, masking, matching, pose estimation, and triangulation to generate sparse 3D point clouds. The materials used in both stages are a publicly available RGB-NIR dataset plus satellite and UAV imagery. The experimental results indicate that the cross-sensor and category-wise validation achieves accuracies of 0.9466 and 0.9024, with kappa coefficients of 0.8932 and 0.9110, respectively. The histogram-based evaluation demonstrates that the predicted NIR band is consistent with the original NIR data of the satellite test dataset. Finally, the test on UAV RGB imagery and artificially generated NIR with a segmentation-driven two-view SfM shows that the proposed framework can effectively translate RGB to CIR for NDVI calculation. Further, the artificially generated NDVI is able to segment and classify vegetation. As a result, the generated point cloud is less noisy and the 3D model is enhanced.
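The second-stage vegetation filtering reduces to computing NDVI from the (generated) NIR and red bands and discarding features above a threshold. Below is a minimal sketch under the assumption that keypoints are given as pixel coordinates; the threshold values 0.5 to 0.6 follow the ranges mentioned for the figures.

```python
import numpy as np


def ndvi(nir, red, eps=1e-6):
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel."""
    return (nir - red) / (nir + red + eps)

def drop_vegetation_features(keypoints, nir, red, threshold=0.5):
    """Discard detected feature points that fall in vegetated areas (NDVI above the
    threshold) before matching, so vegetation does not corrupt the two-view SfM cloud."""
    v = ndvi(nir.astype(float), red.astype(float))
    return [(x, y) for (x, y) in keypoints if v[int(y), int(x)] <= threshold]


# Example with random bands and a few detected keypoints (pixel coordinates).
rng = np.random.default_rng(2)
nir_band, red_band = rng.random((256, 256)), rng.random((256, 256))
kps = [(10.3, 20.7), (100.0, 50.5), (200.2, 200.9)]
print(len(drop_vegetation_features(kps, nir_band, red_band, threshold=0.6)))
```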
(This article belongs to the Section Sensing and Imaging)
Figure 1: Two-stage framework separated into the machine-learning technique (gray) and the application in three steps (green). First (gray), the CIR image is generated from RGB with image-to-image translation. Then (light green), the NDVI is calculated from the generated NIR and red bands. Next (medium green), the NDVI segmentation and classification is used to match the detected features accordingly. Finally (dark green), pose estimation and triangulation are used to generate a sparse 3D point cloud.
Figure 2: First stage of the two-stage workflow. (a) Image-to-image translation in five steps for RGB2CIR simulation: input and pre-processing (orange), training and testing (green), and verification and validation (yellow). (b) Image-to-image translation training.
Figure 3: Second stage of the framework: the segmentation-driven two-view SfM algorithm. The processing steps are grouped by color: NDVI-related processing (green), input and feature detection (orange), feature processing (yellow), and output (blue).
Figure 4: Pleiades VHR satellite imagery, with the nadir view in true color (RGB). The location of the study target is marked in orange and used for validation (see Section 3.2.3).
Figure 5: The validation target captured by the Pleiades VHR satellite. (a) The target stadium; (b) the geolocation of the target (marked in orange in Figure 4); (c) the target ground truth (GT) CIR image and GT NDVI of the target building and its vicinity.
Figure 6: Morphological changes on the image covering the target and image tiles. (a) Original cropped CIR image of Pleiades satellite imagery (1024 × 1024 × 3). A single tile, the white rectangle in (a), is shown as (e). (b-d) and (f-i) are the morphed images of (a) and (e), respectively.
Figure 7: Training over 200 epochs for model selection. The generator loss (loss GEN) is plotted in orange and, in contrast, the FID calculation results in blue.
Figure 8: Training Pix2Pix for model selection with FID. The epochs with the best FID and CM are marked with colored bars for every test run except overall. The numbers are summarized in Table 5.
Figure 9: CIR pansharpening on the target. The high-resolution panchromatic image is used to increase the resolution of the composite CIR image while preserving spectral information. From top to bottom: (a) panchromatic, (b) color infrared created from multi-spectral bands, and (c) pansharpened color infrared.
Figure 10: Example of vegetation feature removal to the north of the stadium. (a) CIR images; (b) NDVI image with legend; (c) identified SURF features (yellow asterisks) within densely vegetated areas (green) using 0.6 as the threshold.
Figure 11: Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI (incl. legend) of the main target (a stadium) and vicinity.
Figure 12: Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI generated from a pansharpened RGB satellite sub-image.
Figure 13: Histogram and visual inspection of the CIR and NDVI simulated using MS and PAN images of the target stadium. (a-c) Ground truth (GT) and NDVI predicted using one 256 × 256 tile from MS Pleiades, with their histograms. (d-f) Ground truth of CIR, NIR, NDVI and predicted NIR and NDVI images from nine tiles of the PAN Pleiades images, with histograms for NDVI comparison.
Figure 14: Histogram and visual inspection of MS (I-III) and PAN (IV-VI) examples of Zhubei city.
Figure 15: Prediction of CIR, NIR and calculated NDVI of a UAV scene: (a) RGB, (b) predicted CIR image, (c) the extracted NIR band of (b), and (d) NDVI calculated from the NIR and red bands. A close-up view of the area marked with an orange box in (a) is displayed as two 256 × 256 tiles in RGB (e) and the predicted CIR (f).
Figure 16: Direct comparison between without (a) and with (b) vegetation segmentation. Areas of low density are shown in blue, areas of high density in red.
Figure 17: Two-view SfM 3D sparse point cloud without NDVI-based vegetation removal on the target CSRSR. (a) Sparse point cloud with no further coloring; (b) point cloud colored by elevation; (c) density analysis and (d) the corresponding histogram. Table (e) lists the accumulated number of points over the three operators (SURF, ORB and FAST) and the initial and manually cleaned and processed point cloud.
Figure 18: Two-view SfM reconstructed 3D sparse point cloud with the vegetation segmentation and removal process based on the simulated NDVI of the target building. (a) Sparse point cloud with no further coloring; (b) point cloud colored by elevation; (c) density analysis and (d) the histogram. (e) lists the accumulated number of points over the three operators (SURF, ORB and FAST) after segmentation, with 0.5 NDVI as the threshold to mask vegetation in SURF and ORB, and the initial and manually cleaned point cloud.
13 pages, 3903 KiB  
Article
Binocular Visual Measurement Method Based on Feature Matching
by Zhongyang Xie and Chengyu Yang
Sensors 2024, 24(6), 1807; https://doi.org/10.3390/s24061807 - 11 Mar 2024
Cited by 2 | Viewed by 1219
Abstract
To address the low measurement accuracy and unstable results obtained when using binocular cameras to detect objects with sparse or weak surface textures, occluded surfaces, low-contrast surfaces, and surfaces with intense lighting variations, a three-dimensional measurement method based on an improved feature matching algorithm is proposed. Initially, features are extracted from the left and right images obtained by the binocular camera. The extracted feature points serve as seed points, and a one-dimensional search space is established accurately based on the disparity continuity and epipolar constraints. The optimal search range and seed point quantity are obtained using the particle swarm optimization algorithm. The zero-mean normalized cross-correlation coefficient is employed as the similarity measure for region growing. Subsequently, the left and right images are matched based on the grayscale information of the feature regions, and seed point matching is performed within each matching region. Finally, the obtained matching pairs are used to calculate the three-dimensional information of the target object using the triangulation formula. The proposed algorithm significantly enhances matching accuracy while reducing algorithmic complexity. Experimental results on the Middlebury dataset show an average relative error of 0.75% and an average measurement time of 0.82 s. The mismatching rate of the proposed image matching algorithm is 2.02%, and the PSNR is 34 dB. The algorithm improves measurement accuracy for objects with sparse or weak textures and is robust against brightness variations and noise interference.
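Two of the building blocks named here, the zero-mean normalized cross-correlation (ZNCC) similarity and the triangulation of depth from disparity, are standard and can be sketched directly; the focal length, baseline, and patch size in the example are assumed values, not the paper's settings.

```python
import numpy as np


def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-mean normalized cross-correlation between two equally sized patches,
    used as the similarity measure driving region growing."""
    a = patch_a.astype(float) - patch_a.mean()
    b = patch_b.astype(float) - patch_b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulation for a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


# Example: a strongly correlated patch pair and a 12.5 px disparity.
rng = np.random.default_rng(3)
p = rng.random((7, 7))
print(zncc(p, p * 2.0 + 0.1))                    # affine gain/offset, ZNCC close to 1
print(depth_from_disparity(12.5, 1200.0, 0.12))  # about 11.5 m
```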
(This article belongs to the Section Optical Sensors)
Figure 1: Three-dimensional measurement workflow.
Figure 2: Epipolar line constraint.
Figure 3: Disparity gradient constraint.
Figure 4: Region growing process. (a) Image of the seed point pixel; (b) image after seed point diffusion.
Figure 5: Schematic diagram of binocular camera measurement.
Figure 6: Experimental results for the bowling stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
Figure 7: Experimental results for the lampshade stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
Figure 8: Experimental results for the plastic stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
Figure 9: Experimental results for the wood stereo image pair. (a) Left view; (b) right view; (c) Siamese network algorithm; (d) generative adversarial network algorithm; (e) region matching algorithm; (f) DP algorithm; (g) SIFT matching algorithm; (h) proposed algorithm.
21 pages, 6240 KiB  
Article
Similarity Measurement and Retrieval of Three-Dimensional Voxel Model Based on Symbolic Operator
by Zhenwen He, Xianzhen Liu and Chunfeng Zhang
ISPRS Int. J. Geo-Inf. 2024, 13(3), 89; https://doi.org/10.3390/ijgi13030089 - 11 Mar 2024
Viewed by 1707
Abstract
Three-dimensional voxel models are widely applied in fields such as 3D imaging, industrial design, and medical imaging. Advances in 3D modeling techniques and measurement devices have made generating three-dimensional models more convenient, and the exponential increase in the number of 3D models presents a significant challenge for model retrieval. These models are numerous and typically represented as point clouds or meshes, resulting in sparse data and high feature dimensions within the retrieval database, and traditional methods for 3D model retrieval suffer from high computational complexity and slow retrieval speeds. To address this issue, this paper combines space-filling curves with octree structures and proposes a novel approach for representing three-dimensional voxel models as sequence-data features, along with a similarity measurement method based on symbolic operators. This approach enables rapid dimensionality reduction of the three-dimensional model database and efficient similarity calculation, thereby expediting retrieval.
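As a loose illustration of turning a voxel model into a symbol sequence and comparing sequences, the sketch below quantizes 2x2x2 block occupancy into symbols and scores similarity with a simple per-symbol distance. The raster ordering and the distance used here are stand-ins: the paper orders blocks along a 3-D Hilbert curve with octree storage and defines its own symbolic operators.

```python
import numpy as np


def symbol_sequence(voxels, n_symbols=16):
    """Partition a binary voxel grid into 2x2x2 blocks, quantize each block's
    occupancy (0..8) into one of n_symbols symbols, and flatten the blocks into
    a 1-D sequence (plain raster order stands in for the Hilbert ordering)."""
    n = voxels.shape[0]
    blocks = voxels.reshape(n // 2, 2, n // 2, 2, n // 2, 2).sum(axis=(1, 3, 5))
    symbols = np.floor(blocks / 8.0 * (n_symbols - 1)).astype(int)
    return symbols.ravel()


def sequence_similarity(seq_a, seq_b, n_symbols=16):
    """Stand-in symbolic distance: mean absolute symbol difference, normalized to
    [0, 1] and turned into a similarity score (1 = identical)."""
    return 1.0 - np.abs(seq_a - seq_b).mean() / (n_symbols - 1)


# Example: a solid cube vs. a slightly eroded copy of it.
g = np.zeros((32, 32, 32), dtype=int)
g[8:24, 8:24, 8:24] = 1
h = g.copy()
h[8:10, :, :] = 0
print(sequence_similarity(symbol_sequence(g), symbol_sequence(h)))
```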
Figure 1: Three-dimensional model retrieval based on the VSO architecture.
Figure 2: The Hilbert curve.
Figure 3: The three-dimensional Hilbert curve.
Figure 4: Hilbert space-filling-curve-based voxelization and octree storage of 3D models.
Figure 5: (a) Voxel granularity 4 × 4 × 4; (b) voxel granularity 8 × 8 × 8.
Figure 6: (a) Sixteen filling states of voxels; (b) representation of 3D models based on symbolic operators.
Figure 7: (a) Symbol sequence of length 16; (b) symbol sequence of length 128.
Figure 8: Symbol sequence for the bathtub class model.
Figure 9: Feature representation processes based on symbolic operators.
Figure 10: Similarity measurement based on VSO.
Figure 11: ModelNet10 classification confusion matrix.
Figure 12: Confusing night_stand and dresser models.
Figure 13: The section represents the characteristic dimensions of the method.
15 pages, 3959 KiB  
Article
Sub-Bin Delayed High-Range Accuracy Photon-Counting 3D Imaging
by Hao-Meng Yin, Hui Zhao, Ming-Yang Yang, Yong-An Liu, Li-Zhi Sheng and Xue-Wu Fan
Photonics 2024, 11(2), 181; https://doi.org/10.3390/photonics11020181 - 16 Feb 2024
Viewed by 1198
Abstract
The range accuracy of single-photon-array three-dimensional (3D) imaging systems is limited by the time resolution of the array detectors. We introduce a method for achieving super-resolution in 3D imaging through sub-bin delayed scanning acquisition and fusion. Its central idea is to generate multiple sub-bin-shifted histograms and then fuse these coarse time-resolution histograms with multiplied averages to produce a finely time-resolved count distribution, from which the arrival times of the reflected photons are extracted with sub-bin resolution. Compared with sub-bin delaying alone, adding the fusion step performs better in reducing the broadening error caused by coarsened discrete sampling and the error from background noise. The effectiveness of the proposed method is examined at different target distances, pulse widths, and sub-bin scales. The simulation results indicate that small-scale sub-bin delays contribute to superior reconstruction outcomes for the proposed method; specifically, a sub-bin temporal resolution delay of a factor of 0.1 for a 100 ps echo pulse width reduces the system ranging error by three orders of magnitude. Furthermore, Monte Carlo simulations are used to model a low signal-to-background noise ratio (0.05) scenario characterized by sparsely reflected photons. The proposed method demonstrates a commendable capability to simultaneously achieve wide-range super-resolution and denoising, as evidenced by the detailed depth distribution information and a substantial 95.60% reduction in the mean absolute error of the reconstruction results, confirming its effectiveness in noisy scenarios.
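The acquisition-and-fusion idea can be sketched as follows: histogram the same photon stream several times with the gate delayed by one sub-bin each time, map the coarse histograms onto a common fine grid, and fuse them multiplicatively. This NumPy sketch uses a geometric-mean fusion and illustrative bin sizes as assumptions, not the paper's exact procedure.

```python
import numpy as np


def fuse_subbin_histograms(arrival_times, bin_width, n_shifts):
    """Acquire n_shifts coarse histograms, each with the gate delayed by one extra
    sub-bin (bin_width / n_shifts), map them onto a common fine grid, and fuse them
    multiplicatively so the peak narrows to sub-bin width."""
    fine_width = bin_width / n_shifts
    t_max = arrival_times.max() + bin_width
    n_fine = int(np.ceil(t_max / fine_width))
    fused = np.ones(n_fine)
    for s in range(n_shifts):
        delay = s * fine_width
        edges = np.arange(-delay, t_max + 2 * bin_width, bin_width)
        coarse, _ = np.histogram(arrival_times, bins=edges)
        # Absolute fine bin j falls inside coarse bin (j + s) // n_shifts of this
        # delayed acquisition, so shift the upsampled histogram accordingly.
        fine = np.repeat(coarse, n_shifts)[s:s + n_fine]
        fused *= np.maximum(fine, 1e-3)  # avoid zeroing everything on empty bins
    return fused ** (1.0 / n_shifts)     # geometric-mean style "multiplied average"


# Example: a 50 ps-wide echo at 40 ns, 1 ns coarse bins, 10 sub-bin delays.
rng = np.random.default_rng(4)
times = np.concatenate([rng.normal(40.0, 0.05, 2000),   # signal photons (ns)
                        rng.uniform(0.0, 80.0, 500)])   # background photons (ns)
fused = fuse_subbin_histograms(times, bin_width=1.0, n_shifts=10)
print(np.argmax(fused) * 0.1)  # estimated arrival time in ns, close to 40.0
```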
Figure 1: (a) Diagram of the single-photon 3D imaging system (PC, personal computer; TDC, time-to-digital conversion); (b) the principle of time-correlated single-photon counting.
Figure 2: Distribution of the quantisation error and quantisation centroid error with differing gate start time at a 5.7 m target distance.
Figure 3: Example of the proposed method for double delaying.
Figure 4: Delay of 0.1 ns for acquiring coarse histograms at a 6 m target distance.
Figure 5: Counts over time using different fusion methods. (a) Theoretical photon-count distribution. (b) Direct measurement histogram distribution. (c) Additive fusion count distribution. (d) Multiplicative fusion count distribution.
Figure 6: Distribution of ranging error with theoretical distance for (a) t_sub = 1/2 ns, (b) t_sub = 1/4 ns, (c) t_sub = 1/5 ns, and (d) t_sub = 1/10 ns.
Figure 7: Reconstructed target distances at reflected echo pulse widths τ of (a) 20 ps, (b) 50 ps, and (c) 100 ps.
Figure 8: Distribution of calculated and theoretical distances for pulse widths τ of (a) 20 ps, (b) 50 ps, and (c) 100 ps.
Figure 9: (a,e) Ground truth. (b,f) Direct measurement reconstruction results for different SBR levels. (c,g) Subtractive dither reconstruction results for different SBR levels. (d,h) Proposed method reconstruction results for different SBR levels.
Figure 10: (a,e,i,m) Ground truth. (b,f,j,n) Direct measurement reconstruction results for different acquisition pulse numbers. (c,g,k,o) Subtractive dither reconstruction results for different acquisition pulse numbers. (d,h,l,p) Proposed method reconstruction results for different acquisition pulse numbers.
Full article ">
19 pages, 7992 KiB  
Article
A Deep Learning Approach for Improving Two-Photon Vascular Imaging Speeds
by Annie Zhou, Samuel A. Mihelic, Shaun A. Engelmann, Alankrit Tomar, Andrew K. Dunn and Vagheesh M. Narasimhan
Bioengineering 2024, 11(2), 111; https://doi.org/10.3390/bioengineering11020111 - 24 Jan 2024
Cited by 1 | Viewed by 1811
Abstract
A potential method for tracking neurovascular disease progression over time in preclinical models is multiphoton fluorescence microscopy (MPM), which can image cerebral vasculature with capillary-level resolution. However, obtaining high-quality, three-dimensional images with traditional point-scanning MPM is time-consuming and limits sample sizes for chronic studies. Here, we present a convolutional neural network-based algorithm (PSSR Res-U-Net architecture) for fast upscaling of low-resolution or sparsely sampled images and combine it with a segmentation-less vectorization process for 3D reconstruction and statistical analysis of vascular network structure. In doing so, we also demonstrate that semi-synthetic training data can replace the expensive and arduous process of acquiring low- and high-resolution training pairs without compromising vectorization outcomes, opening the possibility of using such approaches for other MPM tasks where collecting training data is challenging. We applied our approach to images with large fields of view from a mouse model and show that our method generalizes across imaging depths, disease states, and other differences in neurovasculature. Our pretrained models and lightweight architecture can reduce MPM imaging time by up to fourfold without any changes to the underlying hardware, enabling deployment across a range of settings.
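Semi-synthetic training pairs of the kind described here are produced by degrading real high-resolution frames instead of acquiring matched low-resolution scans. Below is a minimal sketch, with the noise types and sigma chosen for illustration rather than taken from the paper.

```python
import numpy as np


def make_semisynthetic_pair(hr_image, scale=4, noise="gaussian", sigma=0.05, rng=None):
    """Degrade a real high-resolution frame (add noise, then block-average downscale)
    to create a matched low-/high-resolution training pair without re-imaging."""
    rng = rng or np.random.default_rng()
    img = hr_image.astype(float)
    if noise == "gaussian":
        noisy = np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
    elif noise == "poisson":
        noisy = rng.poisson(np.clip(img, 0.0, 1.0) * 255.0) / 255.0
    else:  # "none": downscaling only
        noisy = img
    h, w = noisy.shape
    lr = noisy[:h - h % scale, :w - w % scale].reshape(
        h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    return lr, hr_image


# Example: a 512x512 frame becomes a 128x128 noisy input paired with the original target.
hr = np.random.rand(512, 512)
lr, target = make_semisynthetic_pair(hr, scale=4, noise="poisson")
print(lr.shape, target.shape)  # (128, 128) (512, 512)
```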
(This article belongs to the Special Issue AI and Big Data Research in Biomedical Engineering)
Figure 1: Structure and analysis pipeline. Low-resolution images (128 × 128 pixels) are acquired using two-photon microscopy. A deep learning (PSSR Res-U-Net)-based upscaling process generates high-resolution images (512 × 512 pixels), which take much longer to acquire, from low-resolution images. Segmentation-less vascular vectorization (SLAVV) generates 3D renderings and calculates network statistics from an upscaled image stack.
Figure 2: Generating and evaluating semi-synthetic training data. (a) Examples of semi-synthetic training images created using different types of added noise prior to downscaling: no noise (downscaling only), Poisson, Gaussian, and additive Gaussian. Acquired low-resolution (LR, 128 × 128 pixels) and high-resolution (HR, 512 × 512 pixels) ground truth images are shown for reference. (b) Resulting test image output from models trained using each noise method, with the acquired low-resolution image for model input and the acquired high-resolution image as ground truth for comparison. All models were trained with 3399 image pairs, with the Gaussian and additive Gaussian models further tested on 24,069 image pairs (7×) to further test performance. (c) Boxplot comparison of PSNR and SSIM values for each noise-method image in (b) measured against the ground truth image. Values are plotted for an image stack of 222 images. (d) Comparison of test images from models trained using real-world acquired vs. semi-synthetic data, with the real acquired low-resolution image for model input and the acquired high-resolution image as ground truth for reference. All models were trained with 234 image pairs, a large reduction from the noise model comparison, due to the limited availability of real-world pairs. (e) Boxplot comparison of PSNR and SSIM values for real acquired vs. semi-synthetic model outputs corresponding to (d), measured against the high-resolution ground truth image. All values are plotted for an image stack of 222 images.
Figure 3: Comparison of performance between bilinear upscaling, a single-frame model, and a multi-frame model for semi-synthetic and real acquired test images. All models were trained with 24,069 image pairs. (a) Semi-synthetic test images from bilinear upscaling and models trained using single- vs. multi-frame data. The acquired low-resolution image (model input) and acquired high-resolution image (ground truth) are shown for reference. (b) PSNR and SSIM plots corresponding to the semi-synthetic test results from (a). (c) Real-world test images from bilinear upscaling and models trained using single- vs. multi-frame data. (d) PSNR and SSIM plots corresponding to the real-world test image results from (c).
Figure 4: Maximum-intensity projections (x-y) of ischemic infarct images consisting of 2 × 4 tiles with 213 slices (final dimensions 1.18 mm × 2.10 mm × 0.636 mm, pixel dimensions 1.34 μm × 1.36 μm × 3 μm) for a semi-synthetic low-resolution image, bilinear upscaled image, single- and multi-frame output images, and the acquired high-resolution image. The black hole in the bottom-left corner represents the infarct itself.
Figure 5: Comparison of vectorization results using different upscaling methods against a ground truth image. (a) Blender rendering of vectorized images using VessMorphoVis [26] for visual comparison between single- and multi-frame results and an acquired high-resolution image. We performed manual curation for this vectorization process. (b) Vectorized image statistics for the automated curation process with known ground truth (simulated from the manually curated high-resolution image). CDFs are shown for metrics of length, radius, z-direction, and inverse tortuosity for original (OG), simulated original (sOG), bilinear upscaled (BL), and PSSR single- and multi-frame (SF, MF) images. Pearson's correlation values (r^2) were calculated between the original image and each simulated or upscaled image for each metric. (c) Statistics regarding the maximum accuracy (%) achieved with vectorization or thresholding and the % error in median length and radius for each method.
Full article ">
17 pages, 7346 KiB  
Article
W-Band FMCW MIMO System for 3-D Imaging Based on Sparse Array
by Wenyuan Shao, Jianmin Hu, Yicai Ji, Wenrui Zhang and Guangyou Fang
Electronics 2024, 13(2), 369; https://doi.org/10.3390/electronics13020369 - 16 Jan 2024
Cited by 4 | Viewed by 1479
Abstract
Multiple-input multiple-output (MIMO) technology is widely used in the field of security imaging. However, existing imaging systems have shortcomings such as numerous array units, high hardware costs, and low imaging resolutions. In this paper, a sparse array-based frequency modulated continuous wave (FMCW) millimeter wave imaging system, operating in the W-band, is presented. In order to reduce the number of transceiver units of the system and lower the hardware cost, a linear sparse array with a periodic structure was designed using the MIMO technique. The system operates at 70~80 GHz, and the high operating frequency band and 10 GHz bandwidth provide good imaging resolution. The system consists of a one-dimensional linear array, a motion control system, and hardware for signal generation and image reconstruction. The channel calibration technique was used to eliminate inherent errors. The system combines mechanical and electrical scanning, and uses FMCW signals to extract distance information. The three-dimensional (3-D) fast imaging algorithm in the wave number domain was utilized to quickly process the detection data. The 3-D imaging of the target in the near-field was obtained, with an imaging resolution of 2 mm. The imaging ability of the system was verified through simulations and experiments. Full article
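The range-extraction step mentioned in this abstract rests on the standard FMCW dechirp relation: a point target at range R produces a beat frequency f_b = 2·S·R/c for chirp slope S = B/T, and an FFT of the dechirped signal yields the range profile. The sketch below simulates a single point target at 0.3 m under assumed sweep timing, sampling rate, and noise level (not the published system parameters) and shows only the range dimension, not the full wavenumber-domain 3-D reconstruction.

```python
# Sketch: beat-frequency range extraction for a dechirped FMCW signal.
import numpy as np

c = 3e8                 # speed of light, m/s
B = 10e9                # sweep bandwidth, Hz (70-80 GHz sweep)
T = 100e-6              # sweep duration, s (assumed)
fs = 20e6               # beat-signal sampling rate, Hz (assumed)
slope = B / T           # chirp slope, Hz/s

# Simulate the (real-valued) beat signal of a single point target at 0.3 m.
R_true = 0.3
f_beat = 2 * slope * R_true / c
t = np.arange(int(fs * T)) / fs
beat = np.cos(2 * np.pi * f_beat * t) + 0.05 * np.random.randn(t.size)

# Range profile: windowed FFT, with frequency bins mapped back to range.
spec = np.abs(np.fft.rfft(beat * np.hanning(t.size)))
ranges = np.fft.rfftfreq(t.size, d=1 / fs) * c / (2 * slope)
print(f"estimated range: {ranges[np.argmax(spec)]:.3f} m, "
      f"nominal range resolution c/(2B) = {c / (2 * B) * 100:.1f} cm")
```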
(This article belongs to the Special Issue Radar Signal Processing Technology)
Figure 1
Schematic diagram of the radar echo signal model.
Figure 2
Schematic diagram of a single module array and EPC distribution.
Figure 3
Schematic diagram of the working principle of the millimeter wave MIMO imaging system.
Figure 4
Schematic diagram of the point-target imaging simulation scene.
Figure 5
(a) Simulation results of point targets at 0.3 m; (b) simulation results of the two-dimensional points-matrix target at 0.3 m.
Figure 6
Single-point target and two-dimensional point-target matrix echo curves at 0.3 m. (a) Azimuth direction echo curve of the single-point target; (b) mechanical scanning direction echo curve of the single-point target; (c) azimuth direction echo curve of the points-matrix target; (d) mechanical scanning direction echo curve of the points-matrix target.
Figure 7
Schematic diagram of the radar system structure.
Figure 8
Photograph of the imaging system.
Figure 9
Schematic diagram of the imaging module structure.
Figure 10
Schematic diagram of the calibration equipment structure.
Figure 11
(a) Photo of the experimental scene, including three metal spherical shells at 0.3 m; (b) imaging results of the three metal shells.
Figure 12
Echo curves of the three metal spherical shells at 0.3 m. (a) Azimuth direction echo curve; (b) mechanical scanning direction echo curve.
Figure 13
(a) Optical picture of the resolution board; (b) imaging results before calibration; (c) imaging results after calibration.
Figure 14
(a) Optical photo of the pistol model; (b) optical photo of the dagger.
Figure 15
(a) Photo of the experimental scene, including the imaging system and human models containing dangerous articles hidden under clothing; (b) imaging results of the pistol model; (c) imaging results of the dagger.
30 pages, 22271 KiB  
Article
A Novel Approach for Simultaneous Localization and Dense Mapping Based on Binocular Vision in Forest Ecological Environment
by Lina Liu, Yaqiu Liu, Yunlei Lv and Xiang Li
Forests 2024, 15(1), 147; https://doi.org/10.3390/f15010147 - 10 Jan 2024
Cited by 2 | Viewed by 1708
Abstract
The three-dimensional reconstruction of forest ecological environments by low-altitude remote sensing photography from Unmanned Aerial Vehicles (UAVs) provides a powerful basis for the fine surveying of forest resources and forest management. A stereo vision system, D-SLAM, is proposed to realize simultaneous localization and dense mapping for UAVs in complex forest ecological environments. The system takes binocular images as input and 3D dense maps as target outputs, while the 3D sparse maps and the camera poses are also obtained. The tracking thread utilizes temporal cues to match sparse map points for zero-drift localization. The relative motion and data association between frames are used as constraints for new keyframe selection, and a binocular spatial-cue compensation strategy is proposed to increase tracking robustness. The dense mapping thread uses a Linear Attention Network (LANet) to predict reliable disparity maps in ill-posed regions, which are transformed into depth maps for constructing dense point cloud maps. Evaluations on three datasets, EuRoC, KITTI and Forest, show that the proposed system runs at 30 ordinary frames and 3 keyframes per second on Forest, with a localization accuracy of several centimeters in Root Mean Squared Absolute Trajectory Error (RMS ATE) on EuRoC and Relative Root Mean Squared Errors (RMSE) averaging 0.64 and 0.2 for t_rel and R_rel on KITTI, outperforming most mainstream models in terms of tracking accuracy and robustness. Moreover, the dense mapping compensates for the shortcomings of the sparse mapping produced by most Simultaneous Localization and Mapping (SLAM) systems, and the proposed system meets the requirements of real-time localization and dense mapping in the complex ecological environment of forests. Full article
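The dense-mapping step described in this abstract converts predicted disparity maps into depth and then into a point cloud via the standard stereo relation depth = f·b/d. The sketch below shows that geometric step only; the intrinsics, baseline, and random disparity map are placeholders rather than values from the Forest stereo rig, and keyframe fusion is omitted.

```python
# Sketch: disparity map -> depth map -> dense point cloud (camera frame).
import numpy as np

def disparity_to_pointcloud(disparity, fx, fy, cx, cy, baseline, min_disp=0.5):
    h, w = disparity.shape
    valid = disparity > min_disp                     # reject near-zero disparities
    depth = np.zeros_like(disparity, dtype=np.float64)
    depth[valid] = fx * baseline / disparity[valid]  # stereo depth from disparity

    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel grid (x right, y down)
    x = (u - cx) * depth / fx                        # pinhole back-projection
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)[valid]   # (N, 3) point cloud

# Usage with a stand-in disparity map (replace with a LANet prediction).
disp = np.random.uniform(1.0, 64.0, size=(480, 752)).astype(np.float32)
cloud = disparity_to_pointcloud(disp, fx=458.0, fy=457.0,
                                cx=367.0, cy=248.0, baseline=0.11)
print(cloud.shape)
```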
(This article belongs to the Special Issue Modeling and Remote Sensing of Forests Ecosystem)
Figure 1
Disparity maps in the Forest dataset.
Figure 2
Bag in the Forest dataset.
Figure 3
Left images in the bag.
Figure 4
Right images in the bag.
Figure 5
The D-SLAM system consists of four main parallel threads: tracking, local mapping, dense mapping, and loop closing. The acronyms are defined as follows: Preprocessing (Pre-process), Local Bundle Adjustment (Local BA), Full Bundle Adjustment (Full BA), Special Euclidean group (SE3), Point cloud (Pcd), Linear Attention Network (LANet), and bags of binary words for fast place recognition in image sequences (DBoW2).
Figure 6
Compensation of the spatial cue in the right image frame, where KF represents the keyframe, CF represents the current frame, and RF-1 represents the previous frame of the right image.
Figure 7
Network structure of LANet. LANet consists of five main parts: the feature-extraction ResNet, the Attention Module (AM), matching cost construction, Three-Dimensional Convolutional Neural Network aggregation (3D CNN aggregation), and disparity prediction. The AM consists of two parts, a Spatial Attention Module (SAM) and a Channel Attention Module (CAM); the 3D CNN aggregation has two structures, a basic structure used in the ablation experiments to test the performance of the various parts of the network and a stacked hourglass structure used to optimize the network.
Figure 8
Linear mapping layers.
Figure 9
Visualization of disparity maps on the Forest dataset. The yellow or green boxes mark regions with significant disparity contrast generated by the various methods.
Figure 10
Estimated trajectory (dark blue) and GT (red) for the 9 sequences of EuRoC.
Figure 11
Error graph for the 05 sequence of KITTI.
Figure 12
Comparison of projection trajectories for the 08 sequence of KITTI.
Figure 13
Dense mapping results for the 01 sequence of KITTI: (a) left RGB image, (b) visual disparity map, (c) feature point tracking map, (d) estimated trajectory map, (e) sparse point cloud map, and (f) dense point cloud map.
Figure 14
Local dense point cloud maps from different views for the KITTI 01 sequence.
Figure 15
Dense mapping process on Forest, where (a) is the left RGB image, (b) is the visual disparity map, (c) is the feature point tracking map, (d) is the estimated trajectory map, which contains a loop closure (the front-end localization and the back-end mapping accuracy improve after loop-closure correction), (e) is the sparse point cloud map, in which the blue boxes represent keyframes, the green box the current frame, the red box the start frame, the red points the reference map points, and the black points all map points generated by keyframes, and (f) is the overall dense point cloud map.
Figure 16
Localized dense point clouds at different angles on Forest.
30 pages, 6445 KiB  
Article
Infrared Dim and Small Target Detection Based on Superpixel Segmentation and Spatiotemporal Cluster 4D Fully-Connected Tensor Network Decomposition
by Wenyan Wei, Tao Ma, Meihui Li and Haorui Zuo
Remote Sens. 2024, 16(1), 34; https://doi.org/10.3390/rs16010034 - 21 Dec 2023
Cited by 1 | Viewed by 1377
Abstract
The detection of infrared dim and small targets in complex backgrounds is very challenging because of the low signal-to-noise ratio of the targets and the drastic changes in background. Low-rank sparse decomposition based on the structural characteristics of infrared images has attracted the attention of many scholars because of its good interpretability. To address the sensitivity to sliding window size, the insufficient use of temporal information, and the inaccurate tensor rank estimation of existing methods, a four-dimensional tensor model based on superpixel segmentation and statistical clustering is proposed for infrared dim and small target detection (ISTD). First, superpixel segmentation is introduced to eliminate the algorithm's dependence on the choice of sliding window size. Second, based on improved structure tensor theory, the image pixels are statistically clustered into three types, corner region, flat region, and edge region, and are assigned different weights to reduce the influence of background edges. Next, in order to better use spatiotemporal correlation, a Four-Dimensional Fully-Connected Tensor Network (4D-FCTN) model is proposed in which 3D patches with the same feature type are rearranged into the same group to form a four-dimensional tensor. Finally, the FCTN decomposition method is used to decompose the clustered tensor into low-dimensional tensors, with the alternating direction method of multipliers (ADMM) used to separate the low-rank background part and the sparse target part. We validated our model across various datasets, employing cutting-edge methodologies to assess its effectiveness in terms of detection precision and reduction of background interference. A comparative analysis corroborated the superiority of our proposed approach over prevailing techniques. Full article
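To make the low-rank background / sparse target split concrete, the sketch below runs a matrix robust-PCA iteration (ADMM-style singular value thresholding plus soft thresholding). It is a deliberately simplified 2D stand-in for the paper's clustered 4D fully-connected tensor network model: the parameter schedule follows common inexact-ALM defaults, and the toy frame with a single injected bright pixel is an assumption for illustration.

```python
# Sketch: split a data matrix D into low-rank background L + sparse target S.
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca_admm(D, lam=None, max_iter=500, tol=1e-7):
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_two = np.linalg.norm(D, 2)
    Y = D / max(norm_two, np.abs(D).max() / lam)   # dual variable initialization
    mu, rho = 1.25 / norm_two, 1.5
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        # Sparse (target) update: element-wise soft thresholding.
        S = soft_threshold(D - L + Y / mu, lam / mu)
        # Low-rank (background) update: singular value thresholding.
        U, s, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * soft_threshold(s, 1.0 / mu)) @ Vt
        # Dual ascent and penalty update.
        Z = D - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(Z) <= tol * np.linalg.norm(D):
            break
    return L, S

# Toy frame: smooth rank-1 background plus one dim "target" pixel.
bg = np.outer(np.linspace(1, 2, 64), np.linspace(1, 2, 64))
frame = bg.copy()
frame[20, 31] += 0.8
L_hat, S_hat = rpca_admm(frame)
print(np.unravel_index(np.abs(S_hat).argmax(), S_hat.shape))  # expected: (20, 31)
```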
Figure 1
Depiction of tensor network decomposition.
Figure 2
Clustering and separated target images obtained using different patch sizes.
Figure 3
Illustration of the sensitivity of the detection results to the sliding window size: (a) two original images with similar complexity; (b) results of superpixel segmentation; (c) ROC curves obtained with sliding windows of different sizes.
Figure 4
The whole process of sliding window selection for adaptive superpixel segmentation.
Figure 5
Intra-cluster low-rank properties of different clusters: (a) the original image; (b) the singular value curve of the point feature group tensor; (c) the singular value curve of the line feature group tensor; (d) the singular value curve of the flat feature group tensor.
Figure 6
Overall procedure of the Cluster 4D-FCTN model.
Figure 7
Images (a–e) represent the five real sequences employed in the experiments.
Figure 8
ROC curves for Sequences 1 and 4 with respect to varying parameters. The first row corresponds to Sequence 1, while the second row represents Sequence 4. Column 1 pertains to the patch size $N_1$, Column 2 encompasses different $\lambda_1$ values, Column 3 shows various $L$ values, and Column 4 concerns the temporal size $N_3$.
Figure 9
Different scenes (a–f) and results (g–l).
Figure 10
Scenarios and outcomes related to noise. (a–f) show images with added white Gaussian noise of standard deviation 10, and (g–l) the corresponding detection results of the C4D-FCTN method; (m–r) show images with added white Gaussian noise of standard deviation 10, and (s–x) the corresponding detection results of the C4D-FCTN method.
Figure 11
Detection results of the different testing methods on Sequences 1 to 5, shown as three-dimensional surfaces.
Figure 12
Detection outcomes for Sequences 1 to 5 employing seven different testing methodologies. The target is marked by a purple box, while a detection failure is indicated by a dotted box.
Figure 13
ROC curves of the compared and proposed methods on different sequences.
Figure 14
Detection results of the ablation study, showing ROC curves for 3DST-TCTN, 4D-FCTN, and C4D-FCTN on Sequences 1 to 5.
29 pages, 9402 KiB  
Article
RBF-Based Camera Model Based on a Ray Constraint to Compensate for Refraction Error
by Jaehyun Kim, Chanyoung Kim, Seongwook Yoon, Taehyeon Choi and Sanghoon Sull
Sensors 2023, 23(20), 8430; https://doi.org/10.3390/s23208430 - 12 Oct 2023
Viewed by 1386
Abstract
A camera equipped with a transparent shield can be modeled using the pinhole camera model and residual error vectors defined by the difference between the estimated ray from the pinhole camera model and the actual three-dimensional (3D) point. To calculate the residual error vectors, we employ sparse calibration data consisting of 3D points and their corresponding 2D points on the image. However, observation noise and the sparsity of the 3D calibration points make it challenging to determine the residual error vectors. To address this, we first fit Gaussian Process Regression (GPR), which operates robustly against data noise, to the residual error vectors observed from the sparse calibration data to obtain dense residual error vectors. Subsequently, to improve performance in areas left unobserved due to data sparsity, we use an additional constraint: the 3D points on the estimated ray should be projected to one 2D image point, called the ray constraint. Finally, we optimize the radial basis function (RBF)-based regression model to reduce the residual error vector differences with GPR at a predetermined dense set of 3D points while reflecting the ray constraint. The proposed RBF-based camera model reduces the error of the estimated rays by 6% on average and the reprojection error by 26% on average. Full article
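The first stage of this calibration, fitting GPR to sparse residual error vectors so that they can be queried densely, can be sketched with scikit-learn as below. The kernel choice, the synthetic calibration points, and the noise levels are assumptions for illustration, not the configuration used in the paper (the RBF here is the GP kernel, distinct from the paper's RBF regression model).

```python
# Sketch: fit GPR to sparse 3D residual error vectors, then query them densely.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Sparse calibration points (N x 3) and observed residual error vectors (N x 3),
# i.e. the offset between the pinhole-model prediction and the true 3D point.
pts = rng.uniform([-1.0, -1.0, 1.0], [1.0, 1.0, 9.0], size=(200, 3))
residuals = 0.01 * np.stack([np.sin(pts[:, 0]), np.cos(pts[:, 1]),
                             0.1 * pts[:, 2]], axis=1)
residuals += rng.normal(0.0, 1e-3, residuals.shape)      # observation noise

# One GPR with a 3-component target; WhiteKernel absorbs the observation noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-6)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(pts, residuals)

# Query dense residual error vectors on a grid of unobserved 3D points.
grid = np.stack(np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5),
                            np.linspace(1, 9, 9), indexing="ij"),
                axis=-1).reshape(-1, 3)
dense_residuals = gpr.predict(grid)                       # shape (225, 3)
print(dense_residuals.shape)
```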
(This article belongs to the Section Optical Sensors)
Figure 1
Two models depict a camera behind a transparent shield. The left model traces the precise ray path through the shield but requires the shield information. The right model, a generalized camera model, simply maps from the 2D point to the ray without detailing the ray path. We adopt the right model, especially when the shield shape is unknown.
Figure 2
Calibration procedure for the proposed camera model. We determined the parameters of the pinhole camera model to reduce the reprojection error on the image plane for the calibration data. Next, we calculated the sparse 3D residual error vectors observed when applying the calibration data to the pinhole camera model. Then, we fit Gaussian Process Regression (GPR) to the sparse residual error vectors to obtain dense residual error vectors. Finally, to improve performance in regions unobserved due to 3D data sparsity, we propose an additional objective for the residual error vectors, $L_{Ray}$, in the last row. Following the indices in the figure, we give one example of the ray constraint. (1) We obtain a ray from one image point $\mathbf{x}$ using the pinhole camera model, $\pi^{-1}(\mathbf{x})$, and select two points $\mathbf{w}_1$ and $\mathbf{w}_2$ from the observed regions near and far from the camera. (2) We correct $\mathbf{w}_1$ and $\mathbf{w}_2$ to $\mathbf{w}_1'$ and $\mathbf{w}_2'$ using backward residual error vectors, which are accurate because they lie in a well-observed region. (3) We obtain the ray through the corrected points $\mathbf{w}_1'$ and $\mathbf{w}_2'$, which is the backward-projected ray of the proposed camera model. If the forward residual error vector is correct, the 3D points on the modified ray are projected again to the same 2D point $\mathbf{x}$. (4) To perform the forward projection, we apply the forward residual error vector to the sampled points on the corrected ray. (5)-correct: when the forward residual error vector is correct because it is obtained from the well-observed region, the projected 2D point is the same as the original 2D point $\mathbf{x}$. (5)-wrong: when the forward residual error vector is wrong because it is obtained from the unobserved region, the reconstructed 2D point differs from the original 2D point $\mathbf{x}$. To create accurate forward residual error vectors in the unobserved region, we calculate the inconsistency, $L_{Ray}$ in the figure, which measures the error of the forward residual error vectors. Finally, the RBF-based regression model is optimized with two objectives: the first reduces the error-vector difference with GPR, and the second is the ray constraint $L_{Ray}$.
Figure 3
Forward and backward projections in the proposed RBF-based camera model. The left side shows the forward projection, where the forward projection model estimates the projected point for a 3D point. The right side shows the backward projection, which outputs an estimated ray for a 2D point. The forward projection applies the residual error vector to the input 3D point to correct it, and the pinhole camera model then performs the forward projection. In the backward projection, the input 2D point is first back-projected to a ray by the pinhole camera model; two points on the ray are then selected, one near and one far from the camera, and the backward residual error vectors are applied to obtain two modified 3D points. The ray through the two modified 3D points is the estimated 3D ray. For both projections, the residual error vectors are obtained from the regression model.
Figure 4
(Left) Sampled 3D points $\{\tilde{\mathbf{w}}_i\}_{i}^{L}$ in Equation (15). The positions were chosen at uniform intervals along the z-axis. (Right) Control points of the RBF regression model obtained from the sampled 3D points in the left figure. The nearest point to each centroid of the k-means algorithm was used as a control point, where k is 80% of the number of sampled 3D points.
Figure 5
Ray constraint: following the residual camera model, we correct the j-th ray of the pinhole camera model, $l(\mathbf{x}_j)$, to the corrected ray $\hat{l}(\mathbf{x}_j)$ using two points sampled on the pinhole-model ray. These sampled points come from the well-observed region, where the residual error vectors are accurate. We then sample 3D points $\bar{\mathbf{w}}_p$ on the corrected ray and apply the forward residuals $\mathbf{r}_{G}^{(f)}(\bar{\mathbf{w}}_p)$ to obtain the corrected points. The ray constraint requires these forward-corrected points to lie on the ray from the pinhole camera model, $l(\mathbf{x}_j)$.
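The consistency measure described in the caption above can be sketched as follows: sample points along a ray, apply a forward residual model, project them with the pinhole model, and measure how far the result lands from the original pixel. The intrinsics, the zero residual field, and the depth samples below are placeholders; the sketch only illustrates the check itself, not the paper's fitted GPR/RBF models.

```python
# Sketch: ray-constraint inconsistency of a forward residual model.
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])               # assumed pinhole intrinsics

def project(K, pts3d):
    """Pinhole forward projection of (N, 3) camera-frame points to pixels."""
    uvw = pts3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def ray_constraint_error(x_j, ray_origin, ray_dir, forward_residual,
                         depths=np.linspace(1.0, 9.0, 9)):
    """Mean reprojection inconsistency along a ray; larger values mean the
    forward residual field is inconsistent with the pinhole ray of x_j."""
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    pts = ray_origin + depths[:, None] * ray_dir      # sampled points on the ray
    corrected = pts + forward_residual(pts)           # apply forward residuals
    reproj = project(K, corrected)                    # should reproduce x_j
    return np.linalg.norm(reproj - x_j, axis=1).mean()

# Toy usage: a zero residual field is perfectly consistent with the pinhole ray.
x_j = np.array([400.0, 260.0])
d = np.linalg.inv(K) @ np.array([x_j[0], x_j[1], 1.0])    # back-projected ray
err = ray_constraint_error(x_j, np.zeros(3), d, lambda p: np.zeros_like(p))
print(f"inconsistency: {err:.6f} px")                      # ~0 for zero residuals
```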
Figure 6
Visualization of the residual error vectors in the proposed RBF-based camera model. We examined the forward residual error vectors by measuring the difference from the ground-truth residual error vectors 7 m from the camera along the z-axis when calibration data exist only at 1 and 9 m; the accuracy of the proposed residual error vectors at 7 m is much better than that of the other methods. The left panel, 'Choi et al. 2023', shows the residual error vectors from Choi et al. [14], which have a large error. When the RBF regression model is used with $L_{Ref}$ only, which aims to approximate the result of the Gaussian Process Regression, the residual error vectors have a lower error than that of Choi et al. [14], as shown in the center panel. After applying $L_{Ray}$ (the ray constraint), the error is further reduced.
Figure 7
Transparent shields used in the experiments. We used three types of transparent shields: 'plane', 'sphere', and 'dirty plane' shapes. The red dot in each sub-figure marks the position of the camera. For each type, we used two different shield shapes, listed in each row. The first and second columns show transparent shields with planar and spherical shapes, the third column shows the 'dirty plane' shield, and the last column presents an enlarged view of it. The 'dirty plane' shield resembles a plane but has uneven thickness; its uneven outer surfaces simulate deformation caused by a harsh outer environment.
Figure 8
(Left) Calibration 3D points, where observations exist only at 1 and 9 m along the z-axis. (Right) Test 3D points used for the experiments, ranging from 1 to 9 m along the z-axis and denser along the x- and y-axes.
Figure 9
Results across noise levels. Each column corresponds to a type of transparent shield: plane, sphere, and 'dirty plane'. 'Verbiest et al. 2020' and 'Choi et al. 2023' refer to Verbiest et al. [13] and Choi et al. [14], respectively. The first row shows the reprojection error. 'RBF w. $L_{Ref}$' denotes the proposed RBF-based camera model optimized solely with the reference objective in Equation (15); 'RBF w. $L_{Ref}$ and $L_{Ray}$' denotes the proposed camera model optimized with both objectives, as discussed in Section 3.3.3. Using both objectives consistently yields the smallest reprojection errors across all shield types. The second row shows a significant reduction in the error between forward and backward projection when both objectives are employed. The last row evaluates the distance between the estimated ray and the ground-truth 3D point, which concerns the backward model only, so the model using both objectives is not included in this analysis; 'RBF w. $L_{Ref}$' outperforms the other methods.
Figure 10
Results at lower noise levels between 0 and 0.2. 'Miraldo et al. 2012', 'Verbiest et al. 2020', and 'Choi et al. 2023' refer to Miraldo et al. [11], Verbiest et al. [13], and Choi et al. [14], respectively. Miraldo et al. [11] reported low errors at low noise levels, but these errors increased significantly when the noise level reached 0.1 or 0.2. This can be attributed to the optimization objective of Miraldo et al. [11], which lacks a constraint enforcing the orthogonality of the moment and direction vectors in the Plücker coordinate system; as the noise level increases, the angle between these vectors deviates further from the ideal 90 degrees, as elucidated by Choi et al. [14], resulting in higher errors. With the exception of the Miraldo et al. [11] model, the proposed RBF-based camera model consistently outperforms the other methods, in line with the results in Figure 9.
Figure 11
Results for data sparsity along the x- and y-axes at a low noise level. 'Miraldo et al. 2012', 'Verbiest et al. 2020', and 'Choi et al. 2023' refer to Miraldo et al. [11], Verbiest et al. [13], and Choi et al. [14], respectively. When the shield shape is simple, the performance of the proposed RBF-based camera model closely matches that of Miraldo et al. [11]; for the more complex 'dirty plane' shield, the proposed model consistently outperforms the other methods. The proposed RBF-based camera model also remains robust as data sparsity increases. The reprojection error of the proposed camera model depends on the distance to the ground-truth 3D point; since the proposed ray constraint requires an accurately estimated ray, the accuracy of the estimated ray directly affects the reprojection error.
Figure 12
Results for data sparsity along the x- and y-axes with large observation noise. 'Verbiest et al. 2020' and 'Choi et al. 2023' refer to Verbiest et al. [13] and Choi et al. [14], respectively. The proposed RBF-based camera model is effective in terms of reprojection error and the error between forward and backward projections, indicating that the proposed ray constraint remains robust even with sparse data. For simpler shield shapes such as the plane and sphere, the proposed reference constraint does not yield a significant improvement over Choi et al. [14]; in contrast, for the complex 'dirty plane' shield, it is consistently effective regardless of data sparsity.
Figure 13
Sparser data along the x- and y-axes used in the experiment in Section 4.6. The red dot in each sub-figure marks the position of the camera. In this analysis, we deliberately omitted some of the additional 3D points to evaluate the model's robustness to data sparsity. The calibration data are shown from left to right with the complete dataset and then with 50%, 33%, and 25% of the data points remaining.
Figure 14
Results for the $\alpha_{Ray}$ parameter discussed in Section 3.3.3. The optimal value of $\alpha_{Ray}$ is $1 \times 10^4$, indicating that the ray constraint carries more weight than the reference objective. When the weight is increased to $1 \times 10^6$, performance deteriorates, suggesting that the reference objective $L_{Ref}$ prevents the residual error vectors from deviating too far from the observed errors of the pinhole camera model. We used $1 \times 10^4$ for all other experiments.
Figure 15
Results when the reference model is replaced with the interpolation method of Choi et al. [14]. 'Choi et al. 2023' refers to Choi et al. [14]. We denote by $L_{Ref-interp}$ the RBF regression model optimized using Choi et al.'s approach as the reference model. When only $L_{Ref-interp}$ is applied without the ray constraint ($L_{Ray}$), the performance closely matches Choi et al.'s model [14]. When $L_{Ray}$ is incorporated alongside $L_{Ref-interp}$, the reprojection error and the error between the forward and backward models decrease significantly. Notably, for the 'dirty plane' shield, using GPR as the reference model yields superior results.
Figure 16
Results for data sparsity along the z-axis. 'Choi et al. 2023' refers to Choi et al. [14]. In each figure, the x-axis represents the level of data sparsity: moving along it, the observations become sparser, from 1 to 9 m, to observations at 1, 5, and 9 m only, and finally to 1 and 9 m only along the z-axis. When the observations are dense, the ray constraint $L_{Ray}$ has little impact; as the data become sparser, its effectiveness increases accordingly. $L_{Ref}$, on the other hand, is effective when the shape of the shield is complex.
Figure 17
Results for different calibration data configurations. The x-axis represents the configuration of the calibration data: observations spanning 1 to 9 m, 1 and 9 m only, 1 and 5 m only, and 5 and 9 m only along the z-axis. For the plane-shaped shield, the results for the 1 and 9 m and the 5 and 9 m configurations are similar; in all other cases, the 1 and 9 m configuration consistently yields better results. This suggests that the ray constraint is more effective when the unobserved region (2 to 8 m) lies between well-observed regions (1 and 9 m). The result also reflects the accuracy of the back-projected ray, which is more precise for observations at 1 and 9 m, especially when the shield is complex.