Search Results (67)

Search Parameters:
Keywords = multiple RGB-D sensors

19 pages, 28445 KiB  
Article
Masonry and Pictorial Surfaces Study by Laser Diagnostics: The Case of the Diana’s House in Ostia Antica
by Valeria Spizzichino, Luisa Caneve, Antonella Docci, Massimo Francucci, Massimiliano Guarneri, Daniela Tarica and Claudia Tempesta
Appl. Sci. 2025, 15(4), 2172; https://doi.org/10.3390/app15042172 - 18 Feb 2025
Viewed by 301
Abstract
The aim of the present research is to validate the combined use, through data fusion, of a Laser Induced Fluorescence (LIF) scanning system and a radar scanner (RGB-ITR, Red Green Blue Imaging Topological Radar system) as a single tool addressing the need for non-invasive, rapid, and low-cost techniques for both diagnostic and operational purposes. The integrated system has been applied to the House of Diana complex in Ostia Antica. The main diagnostic objective of this research was to trace the materials used in different phases of restoration, from antiquity to modernity, on both masonry and pictorial surfaces, to reconstruct the history of the building. Due to the significant interest in this insula, other studies have recently been carried out on the House of Diana, but they once again highlighted the necessity of multiple approaches and non-invasive methods capable of providing quasi-real-time answers, delivering point-by-point information on very large surfaces to overcome the limits related to the representativeness of sampling. The data acquired by the RGB-ITR system are quantitative, allowing for morphological and 3-colour analysis of the investigated artwork. In this work, the sensor has been used to create coloured 3D models useful for structural assessments and for locating different classes of materials. The LIF maps, which integrate knowledge about the original constituent materials and previous conservation interventions, have been used as additional layers of the tridimensional models. Therefore, the method can direct possible new investigations and restoration actions, piecing together the history of the House of Diana to build for it a safer future.
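The abstract and the figure captions below outline the core data-fusion idea: per-point LIF spectra (eight detection wavelengths) are clustered into material classes (Figure 5 uses K-means with 7 groups) and the resulting map is attached to the RGB-ITR 3D model as an extra layer. A minimal sketch of that idea, assuming a co-registered point cloud and per-point spectra; all array names, shapes, and the normalisation step are illustrative assumptions, not the authors' pipeline.

```python
# Illustrative only: cluster per-point LIF spectra into material classes and
# attach the labels to a coloured point cloud as an extra attribute layer.
# Array names and shapes are assumptions, not the authors' data format.
import numpy as np
from sklearn.cluster import KMeans

def lif_material_layer(lif_spectra: np.ndarray, n_materials: int = 7) -> np.ndarray:
    """Cluster LIF spectra (N points x 8 wavelengths) into material classes."""
    # Normalise each spectrum so clustering reflects spectral shape, not brightness.
    norm = lif_spectra / (np.linalg.norm(lif_spectra, axis=1, keepdims=True) + 1e-9)
    return KMeans(n_clusters=n_materials, n_init=10, random_state=0).fit_predict(norm)

def fuse_layers(points_xyz, rgb, lif_labels):
    """Stack geometry, native colour, and the LIF material map into one record per point."""
    return np.column_stack([points_xyz, rgb, lif_labels])

# Synthetic example: 10,000 co-registered points with 8-channel LIF spectra.
pts, col = np.random.rand(10_000, 3), np.random.rand(10_000, 3)
spectra = np.random.rand(10_000, 8)
model = fuse_layers(pts, col, lif_material_layer(spectra))
print(model.shape)  # (10000, 7): x, y, z, r, g, b, material class
```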
Show Figures

Figure 1: External view of the insula of the House of Diana in Ostia Antica from the South-West.
Figure 2: Plan of the House of Diana during the different building phases, from the earliest phase, first half of the 2nd century AD (top), to the intermediate phase, last quarter of the 2nd century AD (center), to the latest phase, second half of the 3rd century AD (bottom) (Studio 3R, PA-OANT Photo Archive inv. 11,560–11,562).
Figure 3: RGB-ITR system (on the left) and LIF scanning system (on the right) during the measurement campaign in the House of Diana.
Figure 4: (a) A picture of the west wall compared to (b) the raw fluorescence images collected at the eight selected wavelengths.
Figure 5: K-means clustering into 7 groups performed on the LIF data collected on the west wall of Room A.
Figure 6: Spatial distribution of the polymeric acrylic protective on the west wall of Room A detected by LIF and overlaid on the 3D model produced by the RGB-ITR. A short movie can be found in the Supplementary Materials attached to the paper.
Figure 7: Spatial distribution of the polymeric consolidant on the ceiling of Room A detected by LIF and overlaid on the 3D model produced by the RGB-ITR. A short movie can be found in the Supplementary Materials attached to the paper.
Figure 8: Result of the RGB-ITR data processing: (a) raw data acquired by the RGB-ITR scanner; (b) the enhanced image after the calibration procedure.
Figure 9: West wall (top) and south wall (bottom) of Room A. Comparison between the picture (on the left) and the false-colour LIF image (on the right).
Figure 10: Fluorescence images at 8 different wavelengths collected by the LIF sensor. The lighter an area, the more intense the fluorescence signal (corresponding to a higher concentration of fluorescent material in that area).
Figure 11: Results of the processing of LIF data of the north-west corner of Room B, compared with a picture (on the left): in false colours (in the center), derived with the same procedure used for Figure 9 (right), and in greyscale (on the right) to highlight the consolidating treatment on the decorations.
Figure 12: (Top) A screenshot of the HR 3D model with native colours produced by the RGB-ITR. (Bottom) The same model after the data fusion processing, using a LIF map as an additional texture layer.
20 pages, 4820 KiB  
Article
Skeletal Data Matching and Merging from Multiple RGB-D Sensors for Room-Scale Distant Interaction with Multiple Surfaces
by Adrien Coppens and Valerie Maquil
Electronics 2025, 14(4), 790; https://doi.org/10.3390/electronics14040790 - 18 Feb 2025
Viewed by 209
Abstract
Using a commodity RGB-D sensor is a popular and cost-effective way to enable interaction at room scale, as such a device supports body tracking functionality at a reasonable price point. Even though the capabilities of such devices might be enough for applications like entertainment systems where a person plays in front of a television, this type of sensor is unfortunately sensitive to occlusions from objects or other people, who might be in the way in more sophisticated room-scale set-ups. One may use multiple RGB-D sensors and aggregate the collected data to address the occlusion problem, increase the tracking range, and improve accuracy. However, doing so requires calibration information about the sensors themselves and about their placement relative to the interactable surfaces. Another challenging consequence of relying on multiple sensors is the need to perform skeleton matching and merging based on their respective body tracking data (e.g., so that skeletons from different sensors but belonging to the same person are recognised as such). The present contribution focuses on approaches to tackling these issues. Ultimately, it contributes a working human interaction tracking system, leveraging multiple RGB-D sensors to provide unobtrusive and occlusion-resilient understanding capabilities. This constitutes a suitable basis for room-scale experiences such as those based on wall-sized displays.
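The skeleton matching and merging step described in the abstract (and illustrated in Figure 2 below) can be approximated as an assignment problem. A minimal sketch, assuming skeletons are already expressed in a common world frame and share the same joint ordering; the cost metric, the 0.3 m gating threshold, and the simple joint averaging are illustrative choices, not the algorithm published in the paper.

```python
# Hypothetical sketch: pair skeletons from two sensors by mean joint distance
# with the Hungarian algorithm, merge matched pairs by averaging their joints,
# and keep skeletons seen by only one sensor. Threshold and merge rule are
# assumptions for illustration.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_merge(skels_a, skels_b, max_dist=0.3):
    """skels_a, skels_b: lists of (J, 3) arrays of joint positions in metres."""
    if not skels_a or not skels_b:
        return list(skels_a) + list(skels_b)
    # Cost = mean per-joint Euclidean distance between two skeletons.
    cost = np.array([[np.linalg.norm(a - b, axis=1).mean() for b in skels_b]
                     for a in skels_a])
    rows, cols = linear_sum_assignment(cost)
    merged, used_a, used_b = [], set(), set()
    for i, j in zip(rows, cols):
        if cost[i, j] <= max_dist:                 # plausible same-person pair
            merged.append(0.5 * (skels_a[i] + skels_b[j]))
            used_a.add(i)
            used_b.add(j)
    # Keep isolated skeletons visible to only one sensor.
    merged += [s for i, s in enumerate(skels_a) if i not in used_a]
    merged += [s for j, s in enumerate(skels_b) if j not in used_b]
    return merged
```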
Show Figures

Figure 1: A partial representation of our circular multi-display set-up, on which two Azure Kinect sensors are attached to provide human behaviour tracking capabilities. The needs for calibrating the display set-up and the multi-camera arrangement are indicated by the yellow and purple dashed arrows, respectively.
Figure 2: The skeleton matching problem. Skeletons from two distinct sensors must be matched to create a combined set of skeletons. Some skeletons are isolated since only one sensor can currently view them, while the majority of the people in the room are tracked by both sensors, creating overlapping skeletons. Reproduced from [1]. (a) Skeletons from two (green- and pink-coded) sensors. Notice the isolated skeletons: a green one in the top right and two pink ones at the bottom. (b) The merged skeletons following our skeletal data fusion algorithm, showing the successful pairing of overlapping skeletons but also the inclusion of isolated ones.
Figure 3: Partial point cloud creation (right) from a depth image (left). A selection of three spots from the depth image is depicted by blue circles, which are mirrored on the image plane in the right image. Because these points are similar in colour on the depth image, they are also similar in terms of distance from the sensor (shown in the right picture as blue lines of comparable length). By combining the distance information with a projection of the image points (using the sensor's intrinsic parameters), it is possible to generate points, forming a resulting point cloud. Reproduced from [1].
Figure 4: Unity-based display arrangement calibration, with orange point clouds overlaid on the virtual display configuration. (a) A slight misalignment between the orange point cloud and the screen borders. (b) Proper alignment of the orange point cloud and the multi-display arrangement.
Figure 5: Calibration results with our initial choices of (simple) methods. Points are coloured depending on which camera they originated from. (a) Using the skeleton-based approach. (b) Using the chequerboard approach.
Figure 6: Calibration results using the ICP approach (raw and filtered points), with green and red (full) point clouds corresponding to two separate sensors. (a) With raw point clouds. (b) With filtered point clouds.
21 pages, 12015 KiB  
Article
Segment Any Leaf 3D: A Zero-Shot 3D Leaf Instance Segmentation Method Based on Multi-View Images
by Yunlong Wang and Zhiyong Zhang
Sensors 2025, 25(2), 526; https://doi.org/10.3390/s25020526 - 17 Jan 2025
Viewed by 371
Abstract
Exploring the relationships between plant phenotypes and genetic information requires advanced phenotypic analysis techniques for precise characterization. However, the diversity and variability of plant morphology challenge existing methods, which often fail to generalize across species and require extensive annotated data, especially for 3D datasets. This paper proposes a zero-shot 3D leaf instance segmentation method using RGB sensors. It extends the 2D segmentation model SAM (Segment Anything Model) to 3D through a multi-view strategy. RGB image sequences captured from multiple viewpoints are used to reconstruct 3D plant point clouds via multi-view stereo. HQ-SAM (High-Quality Segment Anything Model) segments leaves in 2D, and the segmentation is mapped to the 3D point cloud. An incremental fusion method based on confidence scores aggregates results from different views into a final output. Evaluated on a custom peanut seedling dataset, the method achieved point-level precision, recall, and F1 scores over 0.9 and object-level mIoU and precision above 0.75 under two IoU thresholds. The results show that the method achieves state-of-the-art segmentation quality while offering zero-shot capability and generalizability, demonstrating significant potential in plant phenotyping.
(This article belongs to the Section Smart Agriculture)
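The mapping of 2D HQ-SAM masks onto the reconstructed point cloud and the confidence-based fusion across views can be pictured as project-and-vote. A rough sketch under simplifying assumptions (pinhole cameras, per-pixel confidence maps, a plain highest-score vote); the data layout and the voting rule are placeholders rather than the paper's incremental merging procedure.

```python
# Hypothetical sketch: project every 3D point into each calibrated view, read
# the instance id under its pixel, accumulate confidence per (point, id), and
# keep the id with the highest accumulated confidence.
import numpy as np

def project(points_w, K, R, t):
    """Project Nx3 world points into pixels with an assumed pinhole model."""
    cam = points_w @ R.T + t                    # world -> camera frame
    uv = cam[:, :2] / cam[:, 2:3]               # perspective division
    pix = uv @ K[:2, :2].T + K[:2, 2]           # focal lengths + principal point
    return pix, cam[:, 2]

def fuse_instance_labels(points_w, views):
    """views: dicts with 'K', 'R', 't', 'mask' (HxW int, 0 = background), 'conf' (HxW float)."""
    n = len(points_w)
    votes = {}                                  # (point index, mask id) -> accumulated confidence
    for view in views:
        pix, depth = project(points_w, view["K"], view["R"], view["t"])
        h, w = view["mask"].shape
        u = np.round(pix[:, 0]).astype(int)
        v = np.round(pix[:, 1]).astype(int)
        visible = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        for i in np.flatnonzero(visible):
            label = int(view["mask"][v[i], u[i]])
            if label:
                votes[(i, label)] = votes.get((i, label), 0.0) + float(view["conf"][v[i], u[i]])
    labels, best = np.zeros(n, dtype=int), np.zeros(n)
    for (i, label), score in votes.items():
        if score > best[i]:
            best[i], labels[i] = score, label
    return labels
```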
Show Figures

Figure 1: Overview of method. The input image sequence undergoes multi-view stereo (MVS) reconstruction, which is based on feature point matching algorithms to align images from multiple viewpoints and reconstruct a 3D point cloud. HQ-SAM-based image segmentation is then applied to extract instance segmentation from each image. The 2D grouping information is mapped into 3D space through a querying process, and the results are incrementally merged to generate the final instance segmentation.
Figure 2: Point cloud denoising process: (a) the original point cloud reconstructed using COLMAP; (b) the point cloud after voxel downsampling; (c) fitting the turntable using RANSAC and adjusting the coordinate system; (d) the point cloud after pass-through filtering; (e) the point cloud after color filtering; (f) the point cloud after statistical filtering.
Figure 3: Mask filtering process. (a) The original RGB image; (b) the automatic masks generated by HQ-SAM; (c) the masks after saturation filtering; (d) the masks after overlap removal; (e) the masks after shape filtering. All samples used the same parameters. The (b) process averaged 7.95 s on a GeForce RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA), while the (c–e) processes averaged 1.15 s, 6.19 s, and 2.50 s on an Intel(R) Xeon(R) Silver 4210 CPU (Intel Corporation, Santa Clara, CA, USA).
Figure 4: Illustration of different views. (a) The angular relationship between different viewpoints and the pixel plane. A and B are two different camera viewpoints. Green indicates that the viewpoint is better for the surface at this location, while red represents a worse viewpoint; (b) the calculation function for the viewpoint factor.
Figure 5: Illustration of merging conflict groups.
Figure 6: Visualization of leaf instance segmentation. The point cloud quality of sample 1 is low, and there are many holes present. Sample 11 has higher point cloud quality.
Figure 7: The qualitative instance segmentation comparison between three methods. For consistency with the visualizations of PlantNet and PSegNet, the results from our method were downsampled to a similar point cloud size of approximately 4096 points, and stem areas were included.
Figure 8: Leaf instance segmentation quantity analysis: (a) analysis of the instance count with buds compared to the ground truth; (b) analysis of the instance count without buds compared to the ground truth. The size of each bubble in the bubble chart represents the sample size, and the red dashed line indicates the reference fitting line y = x.
Figure 9: Visualization of instance segmentation on the DTU dataset.
Figure 10: The variations in metrics and point cloud quantity with the number of merges.
13 pages, 2243 KiB  
Article
IGAF: Incremental Guided Attention Fusion for Depth Super-Resolution
by Athanasios Tragakis, Chaitanya Kaul, Kevin J. Mitchell, Hang Dai, Roderick Murray-Smith and Daniele Faccio
Sensors 2025, 25(1), 24; https://doi.org/10.3390/s25010024 - 24 Dec 2024
Viewed by 495
Abstract
Accurate depth estimation is crucial for many fields, including robotics, navigation, and medical imaging. However, conventional depth sensors often produce low-resolution (LR) depth maps, making detailed scene perception challenging. To address this, enhancing LR depth maps to high-resolution (HR) ones has become essential, guided by HR-structured inputs like RGB or grayscale images. We propose a novel sensor fusion methodology for guided depth super-resolution (GDSR), a technique that combines LR depth maps with HR images to estimate detailed HR depth maps. Our key contribution is the Incremental Guided Attention Fusion (IGAF) module, which effectively learns to fuse features from RGB images and LR depth maps, producing accurate HR depth maps. Using IGAF, we build a robust super-resolution model and evaluate it on multiple benchmark datasets. Our model achieves state-of-the-art results compared to all baseline models on the NYU v2 dataset for ×4, ×8, and ×16 upsampling. It also outperforms all baselines in a zero-shot setting on the Middlebury, Lu, and RGB-D-D datasets. Code, environments, and models are available on GitHub.
(This article belongs to the Special Issue Convolutional Neural Network Technology for 3D Imaging and Sensing)
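The abstract describes fusing RGB and LR depth features with learned attention. The hedged PyTorch sketch below shows one generic way such a guided fusion block can be wired (feature extraction, element-wise product as naive fusion, attention-weighted mixing); channel counts and the exact layer arrangement are assumptions and do not reproduce the published IGAF/SAF/FWF modules.

```python
# Generic guided-attention fusion block, offered only as an illustration of the
# idea sketched in the abstract and Figure 3; not the published IGAF module.
import torch
import torch.nn as nn

class GuidedAttentionFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.rgb_conv = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2))
        self.depth_conv = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2))
        # Attention weights learned from the naively fused (multiplied) features.
        self.attn = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.out = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        r = self.rgb_conv(rgb_feat)
        d = self.depth_conv(depth_feat)
        fused = r * d                            # naive fusion by element-wise product
        w = self.attn(fused)                     # learn where RGB guidance is reliable
        return self.out(w * r + (1.0 - w) * d)   # attention-weighted mix of the two streams

# Example: fuse two 64-channel feature maps of size 64 x 64.
block = GuidedAttentionFusion(64)
out = block(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```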
Show Figures

Figure 1: Overview of the proposed multi-modal architecture for the guided depth super resolution estimation.
Figure 2: The proposed multi-modal architecture utilizes information from both an LR depth map and an HR RGB image. Firstly, each modality passes through a convolutional layer followed by a LeakyReLU activation. The model utilizes the IGAF modules to combine information from the two modalities by fusing the relevant information on each stream and ignoring information that is unrelated to the depth maps. Finally, after the third IGAF module, the depth maps are refined and added using a global skip connection from the original upsampled LR depth maps. The RGB modality is used to provide guidance to estimate an HR depth map given an LR one.
Figure 3: The IGAF module. The module is responsible for both feature extraction and modality fusion. Each modality passes through a feature extraction stage (FWF) before the initial naive fusion by an element-wise multiplication. An SAF block follows, which fuses the result of the multiplication with the extracted features of the RGB stream creating an initial structural guidance. The second SAF block incrementally fuses this extracted structural guidance with the depth stream. The output of each SAF block is generated by learning attention weights and subsequently performing a cross-multiplication operation between the two input sequences, resulting in fused and salient processed information.
Figure 4: Overview of the FWF module. The two modules are separated and not combined into one larger module because the propagation of shallower features through the skip connections as seen in Figure 3 boosts the performance of the model. The FE module is a series of convolutional layers, a channel attention process, and two skip connections. The WF module uses linearly increasing dilation rates in convolutional layers to extract multi-resolution features.
Figure 5: Qualitative comparison between our model and SUFT [24]. The visualizations shown are for the ×8 case. Our model creates more complete depth maps as seen in (c) for rows 1 and 2. In (c), row 3 shows that our model creates sharper edges with minimal bleeding. Also, in (c), row 4 the proposed model creates less smoothing with less bleeding. (Colormap chosen for better visualization. Better seen in full-screen, with zoom-in options.)
26 pages, 28365 KiB  
Article
Three-Dimensional Geometric-Physical Modeling of an Environment with an In-House-Developed Multi-Sensor Robotic System
by Su Zhang, Minglang Yu, Haoyu Chen, Minchao Zhang, Kai Tan, Xufeng Chen, Haipeng Wang and Feng Xu
Remote Sens. 2024, 16(20), 3897; https://doi.org/10.3390/rs16203897 - 20 Oct 2024
Cited by 1 | Viewed by 1023
Abstract
Environment 3D modeling is critical for the development of future intelligent unmanned systems. This paper proposes a multi-sensor robotic system for environmental geometric-physical modeling and the corresponding data processing methods. The system is primarily equipped with a millimeter-wave cascaded radar and a multispectral camera to acquire the electromagnetic characteristics and material categories of the target environment and simultaneously employs light detection and ranging (LiDAR) and an optical camera to achieve a three-dimensional spatial reconstruction of the environment. Specifically, the millimeter-wave radar sensor adopts a multiple input multiple output (MIMO) array and obtains 3D synthetic aperture radar images through 1D mechanical scanning perpendicular to the array, thereby capturing the electromagnetic properties of the environment. The multispectral camera, equipped with nine channels, provides rich spectral information for material identification and clustering. Additionally, LiDAR is used to obtain a 3D point cloud, combined with the RGB images captured by the optical camera, enabling the construction of a three-dimensional geometric model. By fusing the data from four sensors, a comprehensive geometric-physical model of the environment can be constructed. Experiments conducted in indoor environments demonstrated excellent spatial-geometric-physical reconstruction results. This system can play an important role in various applications, such as environment modeling and planning.
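One of the fusion steps described above, combining the LiDAR point cloud with the RGB images once the sensors are registered, amounts to projecting each 3D point through the camera model and sampling a colour. A minimal sketch assuming a pinhole camera with intrinsics K and extrinsics [R|t]; the calibration values and array conventions are placeholders, not the FUSEN system's actual parameters.

```python
# Illustrative LiDAR-to-camera colourisation under an assumed pinhole model.
import numpy as np

def colourise_point_cloud(points_lidar, image_rgb, K, R, t):
    """Return (M, 6) array of x, y, z, r, g, b for points that land inside the image."""
    cam = points_lidar @ R.T + t                 # LiDAR frame -> camera frame
    in_front = cam[:, 2] > 0.1                   # keep points in front of the camera
    cam = cam[in_front]
    pix = cam @ K.T                              # homogeneous pixel coordinates
    u = np.round(pix[:, 0] / pix[:, 2]).astype(int)
    v = np.round(pix[:, 1] / pix[:, 2]).astype(int)
    h, w, _ = image_rgb.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    colours = image_rgb[v[ok], u[ok]] / 255.0    # sample RGB at the projected pixel
    return np.hstack([points_lidar[in_front][ok], colours])
```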
Show Figures

Graphical abstract
Figure 1: In-house-developed FUSEN system hardware configuration. (a) System physical diagram; (b) System architecture diagram.
Figure 2: System attitude control. (a1,b1,c1) Electric translation stage moves 0 cm, 10 cm, and 20 cm; (a2,b2,c2) Turntable rotates 0°, 45°, and 90°; (a3,b3,c3) Pitch stage tilts 30°, 45°, and 60°.
Figure 3: Sensor selection. (a) 77G Millimeter-wave Cascade RF Board; (b) Vision Star pixel-level mosaic imaging spectrometer; (c) Mid-70 LiDAR; (d) JHUMs series USB3.0 industrial camera.
Figure 4: FUSEN workflow diagram.
Figure 5: Millimeter-wave cascaded radar antenna array measurement scheme.
Figure 6: Schematic diagram of the antenna array equivalent channel. (a) Schematic diagram of real aperture size distribution; (b) Schematic diagram of equivalent aperture size; (c) Schematic diagram of overlapping equivalent aperture size; (d) Interval of equivalent aperture (in m).
Figure 7: Comparison of the 9-channel multispectral camera with an RGB camera. (a) RGB 3-band filters; (b) Multispectral camera 9-band filters.
Figure 8: Multispectral inversion flowchart.
Figure 9: The preliminary registration of the RGB camera and LiDAR. (a) Experimental scene of RGB camera and LiDAR registration; (b) Preliminary registration results of RGB camera and LiDAR.
Figure 10: LiDAR and RGB camera data fusion flow diagram.
Figure 11: The accurate registration of the RGB camera and LiDAR by the proposed method. (a) Point clouds pixelated; (b) Image edge extraction.
Figure 12: Comparison of results. (a) The registration by the hand–eye calibration method; (b) The accurate registration by the proposed method.
Figure 13: Experiment on obtaining the registration matrix of the millimeter-wave radar and LiDAR. (a) Horizontal–vertical imaging results; (b) 3D point cloud results; (c) Millimeter-wave point cloud (pcd); (d) Point cloud fusion results.
Figure 14: Experiment on verifying the registration matrix of the millimeter-wave radar and LiDAR. (a) Experimental results; (b) Horizontal–vertical plane (1.52 m); (c) Horizontal–vertical plane (1.22 m); (d) Horizontal–vertical plane projection; (e) Horizontal-range plane imaging results; (f) Corner reflector millimeter-wave point cloud; (g) Millimeter-wave point cloud (pcd); (h) LiDAR point cloud.
Figure 15: Millimeter-wave radar and LiDAR fusion results. (a) Front view of fusion result; (b) Side view of fusion result.
Figure 16: Schematic diagram of data collection at different locations.
Figure 17: Millimeter-wave imaging results of targets of different materials. (a1) Optical picture of the metal plate at angle 1; (b1) Millimeter-wave imaging result of the metal plate at angle 1; (c1) Optical picture of the metal plate at angle 2; (d1) Millimeter-wave imaging result of the metal plate at angle 2. The meanings of (a2,b2,c2,d2) and (a3,b3,c3,d3) are the same as above, but for glass and cement, respectively.
Figure 18: Scattering intensity as a function of view angle.
Figure 19: Geometric-physical reconstruction result for a small scene. (a) RGB image; (b) Multispectral image inversion pseudo-color image; (c) Multispectral recognition results; (d) Millimeter-wave imaging results; (e) Geometric-physical reconstruction results; (f) Electromagnetic scattering characteristic curve matching results.
Figure 20: Geometric-physical reconstruction result for a large scene.
Figure 21: Material recognition results. The single column represents the optical scene, and the double column represents the recognition result.
Figure 22: Confusion matrix. (a) Precision confusion matrix; (b) Recall confusion matrix.
Figure A1: Reconstruction result of an entire wall of the office.
Figure A2: Reconstruction result of an entire wall of the corridor.
Figure A3: Reconstruction result of an entire wall of the building.
Figure A4: Reconstruction result of a small wall.
Figure A5: Reconstruction result of a small wall.
18 pages, 9438 KiB  
Article
High-Throughput and Accurate 3D Scanning of Cattle Using Time-of-Flight Sensors and Deep Learning
by Gbenga Omotara, Seyed Mohamad Ali Tousi, Jared Decker, Derek Brake and G. N. DeSouza
Sensors 2024, 24(16), 5275; https://doi.org/10.3390/s24165275 - 14 Aug 2024
Viewed by 1263
Abstract
We introduce a high-throughput 3D scanning system designed to accurately measure cattle phenotypes. This scanner employs an array of depth sensors, i.e., time-of-flight (ToF) sensors, each controlled by dedicated embedded devices. The sensors generate high-fidelity 3D point clouds, which are automatically stitched using a point cloud segmentation approach through deep learning. The deep learner combines raw RGB and depth data to identify correspondences between the multiple 3D point clouds, thus creating a single and accurate mesh that reconstructs the cattle geometry on the fly. In order to evaluate the performance of our system, we implemented a two-fold validation process. Initially, we quantitatively tested the scanner for its ability to determine accurate volume and surface area measurements in a controlled environment featuring known objects. Next, we explored the impact and need for multi-device synchronization when scanning moving targets (cattle). Finally, we performed qualitative and quantitative measurements on cattle. The experimental results demonstrate that the proposed system is capable of producing high-quality meshes of untamed cattle with accurate volume and surface area measurements for livestock studies.
(This article belongs to the Section Physical Sensors)
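The multi-view stitching summarised above (and detailed in Figure 5 below) chains pairwise registrations so that every camera's fragment ends up in a single world frame. A schematic sketch in which pairwise_icp is a stub standing in for the real Colored ICP step; the function names and identity initialisation are assumptions for illustration only.

```python
# Illustrative multi-view alignment: compose pairwise transforms between
# adjacent cameras so all fragments are expressed in camera 1's frame.
import numpy as np

def pairwise_icp(source_pts, target_pts):
    """Placeholder for a real registration routine (e.g., Colored ICP).
    Returns a 4x4 homogeneous transform mapping source into the target frame."""
    return np.eye(4)

def to_homogeneous(points):
    return np.hstack([points, np.ones((len(points), 1))])

def merge_views(views):
    """views: list of (N_i, 3) point arrays from cameras 1..N, adjacent cameras overlapping."""
    H_to_world = [np.eye(4)]                         # camera 1 is the world frame
    for j in range(1, len(views)):
        H_adj = pairwise_icp(views[j], views[j - 1])     # transform from camera j to camera j-1
        H_to_world.append(H_to_world[j - 1] @ H_adj)     # chain: camera j to camera 1
    aligned = [(to_homogeneous(v) @ H.T)[:, :3] for v, H in zip(views, H_to_world)]
    return np.vstack(aligned)                        # single, aligned point cloud
```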
Show Figures

Figure 1: (a) A schematic representation of the scanning system. (b) A real-life view of the camera frame and the system components.
Figure 2: Overview of the software pipeline: The pipeline begins with the acquisition of RGBD data, which undergo a segmentation and filtering step to eliminate the background pixels and noise in both depth and RGB space. The filtered data are subsequently backprojected into 3D space and then stitched to form a unified 3D model. A mesh is then constructed over the 3D point cloud. Finally, we measure our traits of interest, volume, and surface area.
Figure 3: Schematic layout of Server–Client: In this configuration, the Client sends a capture request to 10 Server programs. Each Server program performs the image acquisition request from the Client, and the captured data are transmitted to a storage device.
Figure 4: Mask R-CNN Architecture [9]: Mask R-CNN builds upon two existing Faster R-CNN heads as detailed in [10,11]. The left and right panels illustrate the heads for the ResNet C4 and FPN backbones, respectively, with an added mask branch. Spatial resolution and channels are indicated by the numbers, while arrows represent conv, deconv, or FC layers, inferred from the context (conv layers maintain spatial dimensions, whereas deconv layers increase them). All conv layers are 3 × 3, except for the output conv which is 1 × 1. Deconv layers are 2 × 2 with a stride of 2, and ReLU [12] is used in hidden layers. On the left, ‘res5’ refers to the fifth stage of ResNet, which has been modified so that the first conv layer operates on a 7 × 7 RoI with a stride of 1 (instead of 14 × 14 with a stride of 2 as in [10]). On the right, ‘×4’ indicates a stack of four consecutive conv layers.
Figure 5: Multi-view point cloud registration: (a) Given N = 6 point clouds, we perform a simple pairwise registration of point cloud fragments of the scanned cattle. (b) We use the Colored ICP algorithm to solve for the coordinate transformation from camera coordinate frame j to camera coordinate frame i (denoted as ^iH_j). Each view is aligned into the coordinate frame of its adjacent camera. We fix the coordinate frame of Camera 1 (V1) as the world coordinate frame and then align all views with respect to coordinate frame 1. (c) This results in a well-aligned point cloud of the scanned cattle.
Figure 6: Comparison of 3D point cloud capture quality with and without synchronization using a large box with known dimensions. The left image displays the results without synchronization (0 μs), capturing a total of 17,098 points. The right image shows the same box captured with synchronization (160 μs) with all other settings the same, resulting in a total of 38,631 points, illustrating the significant improvement in data acquisition quality. (a) Large box, 0 s delay, n = 17,098. (b) Large box, 160 μs delay, n = 38,631.
Figure 7: Results of scanning a cylindrical object in multiple orientations, highlighting the scanner's accuracy across diverse poses. The horizontal axis displays the predicted volumes and surface areas obtained in each test. Given that the same object was used throughout, the ground truth volume and surface area remain constant. This plot demonstrates the scanner's precision, as evidenced by the close alignment of the predicted values with the consistent ground truths, illustrating the system's reliability in varying orientations. (a) Surface area calculation results. (b) Volume calculation results.
Figure 8: Regression analysis of predicted versus known surface area and volume for multiple static objects. The plots display the correlation between the scanner's predicted values and the actual measurements for a cylinder, small box, medium box, and large box, all placed in the same pose across 10 consecutive scans. The high R² values of 0.997 for surface area and 0.999 for volume demonstrate the scanner's accuracy and consistency in various object dimensions and shapes under controlled conditions. (a) Surface area calculation results. (b) Volume calculation results.
Figure 9: Performance of the scanner under direct sunlight, using a standard box to simulate outdoor livestock scanning conditions. The graphs show the mean and standard deviation of volume and surface area measurements across 10 consecutive scans. The results here illustrate the slight impact of sunlight on the scanner's infrared sensors, affecting measurement accuracy. (a) Surface area calculation results from data collected in sunlight. (b) Volume calculation results from data collected in sunlight.
Figure 10: Segmentation of cattle using combined RGB and depth models via Mask R-CNN: The figure shows an RGBD image of cattle segmented using both RGB and depth data. Results from each model are integrated using a voting arbitrator, resulting in a well-defined segmentation in both modalities.
Figure 11: Poisson-reconstructed meshes of cattle from which we compute the surface area and volume estimates.
22 pages, 1932 KiB  
Review
Smart Nursing Wheelchairs: A New Trend in Assisted Care and the Future of Multifunctional Integration
by Zhewen Zhang, Peng Xu, Chengjia Wu and Hongliu Yu
Biomimetics 2024, 9(8), 492; https://doi.org/10.3390/biomimetics9080492 - 14 Aug 2024
Cited by 3 | Viewed by 2036
Abstract
As a significant technological innovation in the fields of medicine and geriatric care, smart care wheelchairs offer a novel approach to providing high-quality care services and improving the quality of care. The aim of this review article is to examine the development, applications and prospects of smart nursing wheelchairs, with particular emphasis on their assistive nursing functions, multiple-sensor fusion technology, and human–machine interaction interfaces. First, we describe the assistive functions of nursing wheelchairs, including position changing, transferring, bathing, and toileting, which significantly reduce the workload of nursing staff and improve the quality of care. Second, we summarize the existing multiple-sensor fusion technology for smart nursing wheelchairs, including LiDAR, RGB-D, ultrasonic sensors, etc. These technologies give wheelchairs autonomy and safety, better meeting patients’ needs. We also discuss the human–machine interaction interfaces of intelligent care wheelchairs, such as voice recognition, touch screens, and remote controls. These interfaces allow users to operate and control the wheelchair more easily, improving usability and maneuverability. Finally, we emphasize the importance of multifunctional-integrated care wheelchairs that integrate assistive care, navigation, and human–machine interaction functions into a comprehensive care solution for users. Looking ahead, we anticipate that smart nursing wheelchairs will play an increasingly important role in medicine and geriatric care. By integrating advanced technologies such as enhanced artificial intelligence, intelligent sensors, and remote monitoring, we expect to further improve patients’ quality of care and quality of life.
Show Figures

Figure 1: The required care services for disabled older people based on the dimensions of their living care needs, their primary care needs, their health needs, and the top five entries for each dimension, including items in the psychological well-being dimension [7,8,10].
Figure 2: Typical assisted-lifting robots. (a) ReChair. (b) HLPR Chair. (c) The AgileLife Patient Transfer System and Strong Arm. (d) The Piggyback Transfer Robot. (e) Integration of an electric care bed and an electric reclining wheelchair. (f) Transferring the patient assisted by slipmat.
Figure 3: Installation of the I-Support system in clinical environment for experimental validation. The devices constituting the overall system are presented. (a) Amphiro b1 water flow and temperature sensor. (b) General aspect of the system showing the motorized chair, the soft robotic arm, and the installation of the Kinect sensors (for audio–gestural communication). (c) Air temperature, humidity, and illumination sensors by Cube Sensors. (d) Smartwatch for user identification and activity tracking.
Figure 4: (a) Robotics Care’s om Poseidon. (b) The multi-functional bathing robot. (c) The actual prototype of intelligent bath care system. (d) Assistive walker with passive sit-to-stand mechanism. (e) Self-reliance transfer support robot for home-based care. (f) Principle prototype of the intelligent toilet wheelchair.
Figure 5: Future trends predicted by ChatGPT on multifunctional intelligent care wheelchairs.
15 pages, 5816 KiB  
Article
Automated Destination Renewal Process for Location-Based Robot Errands
by Woo-Jin Lee and Sang-Seok Yun
Appl. Sci. 2024, 14(13), 5671; https://doi.org/10.3390/app14135671 - 28 Jun 2024
Viewed by 704
Abstract
In this paper, we propose a new approach for service robots to perform delivery tasks in indoor environments, including map-building and the automatic renewal of destinations for navigation. The first step involves converting the available floor plan (i.e., CAD drawing) of a new space into a grid map that the robot can navigate. The system then segments the space in the map and generates movable initial nodes through a generalized Voronoi graph (GVG) thinning process. As the second step, we perform room segmentation from the grid map of the indoor environment and classify each space. Next, when the delivery object is recognized while searching the set space using the laser and RGB-D sensor, the system automatically updates the destination to a position that makes it easier to grasp the object, taking into consideration geometric relationships with surrounding obstacles. Also, the system enables the robot to autonomously explore the space where the user’s errand can be performed by hierarchically linking recognized objects and spatial information. Experiments related to map generation, estimating space from the recognized objects, and destination node updates were conducted using CAD drawings of actual buildings with multiple floors and rooms, and the performance of each stage of the process was evaluated. From the quantitative evaluation of each stage, the proposed system confirmed the potential of partial automation in performing location-based robot services.
(This article belongs to the Section Robotics and Automation)
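The room segmentation stage (see Figure 3 below) is described as erosion of the grid map followed by grouping of free cells. A small sketch of that general morphological recipe, assuming a boolean occupancy grid; the erosion depth and the nearest-seed assignment rule are illustrative choices, not the paper's exact procedure.

```python
# Illustrative morphological room segmentation on an occupancy grid: erode the
# free space until rooms separate, label the surviving blobs as room seeds, and
# assign every free cell to its nearest seed.
import numpy as np
from scipy import ndimage

def segment_rooms(free_space: np.ndarray, erosion_steps: int = 5) -> np.ndarray:
    """free_space: 2D bool grid (True = traversable). Returns int room labels (0 = occupied)."""
    seeds = ndimage.binary_erosion(free_space, iterations=erosion_steps)
    seed_labels, n_rooms = ndimage.label(seeds)              # group the eroded free cells
    if n_rooms == 0:
        return np.zeros(free_space.shape, dtype=int)
    # Assign every cell the label of its closest seed, then mask occupied cells.
    _, (iy, ix) = ndimage.distance_transform_edt(seed_labels == 0, return_indices=True)
    rooms = seed_labels[iy, ix]
    rooms[~free_space] = 0
    return rooms

# Toy example: two rooms connected by a narrow doorway.
grid = np.zeros((20, 40), dtype=bool)
grid[2:18, 2:18] = True          # room A
grid[2:18, 22:38] = True         # room B
grid[9:11, 18:22] = True         # doorway
print(np.unique(segment_rooms(grid)))  # e.g., [0 1 2]
```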
Show Figures

Figure 1: The overall flowchart of the automated destination renewal process.
Figure 2: Example of obstacle update when above the threshold: (a) before the laser scan, (b) after the laser scan.
Figure 3: Example of morphological segmentation: (a) initial map, (b) eroded map, (c) grouped free cells, (d) segmented map.
Figure 4: Example of GVG generated from the grid map.
Figure 5: Hierarchical organization based on the connectivity of the indoor environment.
Figure 6: Example of grid map creation based on a CAD drawing: (a) original CAD drawing, (b) image after removing dimensions and annotations, (c) image transformed into a grid map.
Figure 7: Results of space segmentation for each floor in the engineering building.
Figure 8: Results of grid map update using sensed obstacles: (a) segmented grid map of a single space, (b) grid map after the update, (c) actual room image with furniture.
Figure 9: Experimental results of space classification: (a) conference room, (b) corridor, (c) pantry, (d) hospital room, (e) office, and (f) reception.
Figure 10: Experimental results of space segmentation and classification in each space.
Figure 11: Example of node update of the errand destination according to obstacle change: (a) map without a chair, (b) map with a chair.
18 pages, 6500 KiB  
Article
NSVDNet: Normalized Spatial-Variant Diffusion Network for Robust Image-Guided Depth Completion
by Jin Zeng and Qingpeng Zhu
Electronics 2024, 13(12), 2418; https://doi.org/10.3390/electronics13122418 - 20 Jun 2024
Cited by 1 | Viewed by 996
Abstract
Depth images captured by low-cost three-dimensional (3D) cameras are subject to low spatial density, requiring depth completion to improve 3D imaging quality. Image-guided depth completion aims at predicting dense depth images from extremely sparse depth measurements captured by depth sensors with the guidance of aligned Red–Green–Blue (RGB) images. Recent approaches have achieved a remarkable improvement, but the performance will degrade severely due to the corruption in input sparse depth. To enhance robustness to input corruption, we propose a novel depth completion scheme based on a normalized spatial-variant diffusion network incorporating measurement uncertainty, which introduces the following contributions. First, we design a normalized spatial-variant diffusion (NSVD) scheme to apply spatially varying filters iteratively on the sparse depth conditioned on its certainty measure for excluding depth corruption in the diffusion. In addition, we integrate the NSVD module into the network design to enable end-to-end training of filter kernels and depth reliability, which further improves the structural detail preservation via the guidance of RGB semantic features. Furthermore, we apply the NSVD module hierarchically at multiple scales, which ensures global smoothness while preserving visually salient details. The experimental results validate the advantages of the proposed network over existing approaches with enhanced performance and noise robustness for depth completion in real-use scenarios.
(This article belongs to the Special Issue Image Sensors and Companion Chips)
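The core of the NSVD scheme, as summarised above, is diffusing sparse depth while conditioning on a certainty measure so corrupted samples are excluded. The sketch below shows the classic normalised, certainty-weighted filtering idea with a fixed Gaussian kernel; the published network instead learns spatially varying kernels and the certainty end-to-end, so treat this only as an illustration of the weighting principle.

```python
# Illustrative normalised, certainty-weighted diffusion: convolve both the
# confidence-weighted depth and the confidence itself, then divide, so missing
# or low-certainty samples do not drag down their neighbours.
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_diffusion(sparse_depth, certainty, sigma=2.0, iterations=3):
    """sparse_depth: HxW with zeros where no measurement; certainty: HxW in [0, 1]."""
    depth, conf = sparse_depth.astype(float), certainty.astype(float)
    for _ in range(iterations):
        num = gaussian_filter(depth * conf, sigma)     # diffuse confident depth
        den = gaussian_filter(conf, sigma) + 1e-8      # diffuse confidence mass
        depth = num / den                              # normalise out the gaps
        conf = np.clip(gaussian_filter(conf, sigma), 0.0, 1.0)
    return depth

# Toy example: a gradient depth map sampled at ~5% of its pixels.
rng = np.random.default_rng(0)
gt = np.tile(np.linspace(1.0, 3.0, 64), (64, 1))
mask = rng.random(gt.shape) < 0.05
dense = normalized_diffusion(gt * mask, mask.astype(float))
```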
Show Figures

Figure 1: Example in NYUv2 dataset [12]. (a) RGB image input, (b) sparse depth input, depth estimation with (c) PNCNN [6] using single depth, (d) MiDaS [7] using single RGB, (e) NLSPN [11], and (f) proposed NSVDNet using both RGB and depth. As highlighted in the black rectangles, (f) NSVDNet generates more accurate structural details than (e) NLSPN due to the uncertainty-aware diffusion scheme. The results are evaluated using the RMSE metric, where (f) NSVDNet achieves the smallest RMSE, indicating improved accuracy.
Figure 2: An overview of NSVDNet architecture to predict a dense depth from a disturbed sparse depth with RGB guidance. NSVDNet is composed of the depth-dominant branch, which estimates the initial dense depth from the sparse sensor depth, and the RGB-dominant branch, which generates the semantic structural features. The two branches are fused in the hierarchical NSVD modules, where the initial dense depth is diffused with spatial-variant diffusion kernels constructed from RGB features.
Figure 3: Depth completion with different algorithms, tested on the NYUv2 dataset. As highlighted in the red rectangles, the proposed NSVDNet achieves more accurate depth completion results with detail preservation and noise robustness.
Figure 4: Comparison of depth completion with original sparse depth and noisy sparse depth with 50% outliers, tested on the NYUv2 dataset. The comparison between results with original and noisy inputs demonstrates the robustness to input corruption for the proposed method. The selected patches are enlarged in the colored rectangles.
Figure 5: Generalization ability evaluation tests on the TetrasRGBD dataset with outliers. The certainty maps explain the robustness of NSVDNet to input corruptions.
Figure 6: Generalization ability evaluation tests on the TetrasRGBD dataset with real sensor data, where the proposed NSVDNet generates more accurate depth estimation than competitive methods, including PNCNN [38] and NLSPN [11].
26 pages, 19577 KiB  
Article
Enhancing Building Point Cloud Reconstruction from RGB UAV Data with Machine-Learning-Based Image Translation
by Elisabeth Johanna Dippold and Fuan Tsai
Sensors 2024, 24(7), 2358; https://doi.org/10.3390/s24072358 - 8 Apr 2024
Cited by 1 | Viewed by 1594
Abstract
The performance of three-dimensional (3D) point cloud reconstruction is affected by dynamic features such as vegetation. Vegetation can be detected by near-infrared (NIR)-based indices; however, the sensors providing multispectral data are resource intensive. To address this issue, this study proposes a two-stage framework to firstly improve the performance of the 3D point cloud generation of buildings with a two-view SfM algorithm, and secondly, reduce noise caused by vegetation. The proposed framework can also overcome the lack of near-infrared data when identifying vegetation areas for reducing interference in the SfM process. The first stage includes cross-sensor training, model selection and the evaluation of image-to-image RGB to color infrared (CIR) translation with Generative Adversarial Networks (GANs). The second stage includes feature detection with multiple feature detector operators, feature removal with respect to the NDVI-based vegetation classification, masking, matching, pose estimation and triangulation to generate sparse 3D point clouds. The materials utilized in both stages are a publicly available RGB-NIR dataset, and satellite and UAV imagery. The experimental results indicate that the cross-sensor and category-wise validation achieves an accuracy of 0.9466 and 0.9024, with a kappa coefficient of 0.8932 and 0.9110, respectively. The histogram-based evaluation demonstrates that the predicted NIR band is consistent with the original NIR data of the satellite test dataset. Finally, the test on the UAV RGB and artificially generated NIR with a segmentation-driven two-view SfM proves that the proposed framework can effectively translate RGB to CIR for NDVI calculation. Further, the artificially generated NDVI is able to segment and classify vegetation. As a result, the generated point cloud is less noisy, and the 3D model is enhanced.
(This article belongs to the Section Sensing and Imaging)
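The second stage relies on NDVI computed from the predicted NIR band to discard features on vegetation before matching. A short sketch of that step, assuming float image bands and keypoints given as (x, y) pixel coordinates; the threshold value is only indicative of the 0.5–0.6 range mentioned in the figure captions.

```python
# Illustrative NDVI-based vegetation filtering of detected keypoints.
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), both bands as float arrays."""
    return (nir - red) / (nir + red + 1e-8)

def remove_vegetation_features(keypoints_xy: np.ndarray, ndvi_map: np.ndarray,
                               threshold: float = 0.5) -> np.ndarray:
    """Keep only keypoints whose pixel has NDVI below the vegetation threshold."""
    cols = np.round(keypoints_xy[:, 0]).astype(int)
    rows = np.round(keypoints_xy[:, 1]).astype(int)
    keep = ndvi_map[rows, cols] < threshold
    return keypoints_xy[keep]

# Example with synthetic bands and random keypoints.
h, w = 256, 256
nir_band, red_band = np.random.rand(h, w), np.random.rand(h, w)
kps = np.random.rand(500, 2) * [w - 1, h - 1]
filtered = remove_vegetation_features(kps, ndvi(nir_band, red_band))
```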
Show Figures

Figure 1

Figure 1
<p>Two-stage framework separated into the machine learning technique (gray) and the application in three steps (green). Firstly (gray), the CIR image is generated from RGB with image-to-image translation. Then (light green), the NDVI is calculated with the generated NIR and red band. Afterwards, (medium green), the NDVI segmentation and classification is used to match the detected features accordingly. Finally (dark green), pose estimation and triangulation are used to generate a sparse 3D point cloud.</p>
Full article ">Figure 2
<p>First stage of the two-stage workflow. (<b>a</b>) Image-to-image translation in 5 steps for RGB2CIR simulation. In general, input and pre-processing (orange), training and testing (green) and verification and validation (yellow) (<b>b</b>) Image-to-image translation training.</p>
Full article ">Figure 3
<p>Framework second stage: segmentation-driven two-view SfM algorithm. The processing steps are grouped by color, the NDVI related processing (green), the input, feature detection (orange), feature processing (yellow) and the output (blue).</p>
Full article ">Figure 4
<p>Pleiades VHR satellite imagery, with the nadir view in true color (RGB). The location of the study target is marked in orange and used for validation (see <a href="#sec3dot2dot3-sensors-24-02358" class="html-sec">Section 3.2.3</a>).</p>
Full article ">Figure 5
<p>The target for validation captured by Pleiades VHR satellite. (<b>a</b>) The target stadium; (<b>b</b>) the geolocation of the target (marked in orange in <a href="#sensors-24-02358-f004" class="html-fig">Figure 4</a>); (<b>c</b>) the target ground truth (GT) CIR image. GT NDVI of the target building and its vicinity.</p>
Full article ">Figure 6
<p>Morphological changes on the image covering the target and image tiles. (<b>a</b>) Original cropped CIR image of Pleiades Satellite Imagery (1024 × 1024 × 3). A single tile, the white rectangle in (<b>a</b>), is shown as (<b>e</b>). (<b>b</b>–<b>d</b>) and (<b>f</b>–<b>i</b>) are the morphed images of (<b>a</b>) and (<b>e</b>), respectively.</p>
Full article ">Figure 7
<p>Training over 200 epochs for model selection. The generator loss (loss GEN) plotted in orange and, in contrast, FID calculation results in blue.</p>
Full article ">Figure 8
<p>Training Pix2Pix for model selection with FID. The epochs with the best FID and CM are marked for every test run, expect overall, with colored bars respectivly. The numbers are summarized in <a href="#sensors-24-02358-t005" class="html-table">Table 5</a>.</p>
Full article ">Figure 9
<p>CIR pansharpening on the target. The high-resolution panchromatic image is used to increase the resolution of the composite CIR image while preserving spectral information. From top to bottom, (<b>a</b>) panchromatic, (<b>b</b>) color infrared created from multi-spectral bands, and (<b>c</b>) pansharpened color infrared are shown.</p>
Full article ">Figure 10
<p>Example of vegetation feature removal to the north of the stadium. (<b>a</b>) CIR images; (<b>b</b>) NDVI image with legend; (<b>c</b>) identified SURF features (yellos asterisks) within dense vegetated areas (green) using 0.6 as the threshold.</p>
Full article ">Figure 11
<p>Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI (incl. legend) of the main target (a stadium) and vicinity.</p>
Full article ">Figure 12
<p>Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI generated from a pansharpened RGB satellite sub-image.</p>
Full article ">Figure 13
<p>Histogram and visual inspection of the CIR and NDVI simulated using MS and PAN images on the target stadium. (<b>a</b>–<b>c</b>) Ground truth (GT) and NDVI predicted using one tile with the size of 256 × 256 from MS Pleiades and their histograms. (<b>d</b>–<b>f</b>) Ground truth of CIR, NIR, NDVI and predicted NIR and NDVI images from nine tiles of the PAN Pleiades images and histograms for NDVI comparison.</p>
Full article ">Figure 14
<p>Histogram and visual inspection of MS (<b>I</b>–<b>III</b>) and PAN (<b>IV</b>–<b>VI</b>) examples of Zhubei city.</p>
Full article ">Figure 15
<p>Prediction of CIR, NIR and calculated NDVI of a UAV scene: (<b>a</b>) RGB, (<b>b</b>) predicted CIR image, (<b>c</b>) the extracted NIR band of (<b>b</b>), and (<b>d</b>) calculated NDVI with NIR and red band. A close-up view of the area marked with an orange box in (<b>a</b>) is displayed as two 256 × 256 tiles in RGB (<b>e</b>) and the predicted CIR (<b>f</b>).</p>
Full article ">Figure 15 Cont.
<p>Prediction of CIR, NIR and calculated NDVI of a UAV scene: (<b>a</b>) RGB, (<b>b</b>) predicted CIR image, (<b>c</b>) the extracted NIR band of (<b>b</b>), and (<b>d</b>) calculated NDVI with NIR and red band. A close-up view of the area marked with an orange box in (<b>a</b>) is displayed as two 256 × 256 tiles in RGB (<b>e</b>) and the predicted CIR (<b>f</b>).</p>
Full article ">Figure 16
<p>Direct comparison between without (<b>a</b>) and with vegetation segmentation (<b>b</b>). Areas of low density shown in blue, areas of high density shown in red.</p>
Full article ">Figure 17
<p>Two−view SfM 3D sparse point cloud without the application of NDVI−based vegetation removal on the target CSRSR. (<b>a</b>) Sparse point cloud with no further coloring; (<b>b</b>) point cloud colored by elevation; (<b>c</b>) density analysis and the corresponding histogram (<b>d</b>). In addition, Table (<b>e</b>) shows the accumulated number of points over the three operators (SURF, ORB and FAST) and the initial and manually cleaned and processed point cloud.</p>
Full article ">Figure 18
<p>Two-view SfM reconstructed 3D sparse point cloud with vegetation segmentation and removal process based on simulated NDVI of the target building. (<b>a</b>) Sparse point cloud with no further coloring; (<b>b</b>) point cloud colored by elevation; (<b>c</b>) density analysis and (<b>d</b>) the histogram. In addition, (<b>e</b>) lists the accumulated number of points over the three operators (SURF, ORB and FAST) after segmentation, with 0.5 NDVI as the threshold to mask vegetation in SURF and ORB, and the initial and manually cleaned point cloud.</p>
Full article ">
17 pages, 30409 KiB  
Article
Data Fusion of RGB and Depth Data with Image Enhancement
by Lennard Wunsch, Christian Görner Tenorio, Katharina Anding, Andrei Golomoz and Gunther Notni
J. Imaging 2024, 10(3), 73; https://doi.org/10.3390/jimaging10030073 - 21 Mar 2024
Cited by 2 | Viewed by 2467
Abstract
Since 3D sensors became popular, imaged depth data are easier to obtain in the consumer sector. In applications such as defect localization on industrial objects or mass/volume estimation, precise depth data is important and, thus, benefits from the usage of multiple information sources. [...] Read more.
Since 3D sensors became popular, imaged depth data are easier to obtain in the consumer sector. In applications such as defect localization on industrial objects or mass/volume estimation, precise depth data is important and, thus, benefits from the usage of multiple information sources. A combination of RGB images and depth images can not only improve our understanding of objects, allowing one to gain more information about them, but also enhance data quality. Combining different camera systems through data fusion can yield higher-quality data, since the disadvantages of one sensor can be compensated by another. Data fusion itself consists of data preparation and data registration. A challenge in data fusion is the differing resolutions of the sensors; therefore, up- and downsampling algorithms are needed. This paper compares multiple up- and downsampling methods, such as different direct interpolation methods, joint bilateral upsampling (JBU), and Markov random fields (MRFs), in terms of their potential to create RGB-D images and improve the quality of depth information. In contrast to the literature, in which imaging systems are adjusted to acquire data of the same section simultaneously, the laboratory setup in this study was based on conveyor-based optical sorting processes, and therefore, the data were acquired at different times and at different spatial locations, making data assignment and data cropping necessary. To evaluate the results, the root mean square error (RMSE), signal-to-noise ratio (SNR), correlation (CORR), universal quality index (UQI), and contour offset are monitored. JBU outperformed the other upsampling methods, achieving a mean RMSE = 25.22, mean SNR = 32.80, mean CORR = 0.99, and mean UQI = 0.97. Full article
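The quality measures named in this abstract (RMSE, SNR, CORR, UQI) are standard full-reference metrics and can be computed directly from a fused depth map and a reference. The NumPy sketch below only illustrates those textbook formulas; the variable names are placeholders and the global single-window form of UQI is used here, whereas UQI is often evaluated over sliding windows and averaged.

```python
import numpy as np

def depth_quality_metrics(fused, reference):
    """Standard full-reference quality measures between two depth maps."""
    f = np.asarray(fused, dtype=float).ravel()
    r = np.asarray(reference, dtype=float).ravel()
    rmse = np.sqrt(np.mean((f - r) ** 2))
    snr = 10.0 * np.log10(np.sum(r ** 2) / np.sum((f - r) ** 2))   # in dB
    corr = np.corrcoef(f, r)[0, 1]                                  # Pearson correlation
    # Universal Quality Index (Wang & Bovik), global form:
    # Q = 4 * cov * mean_f * mean_r / ((var_f + var_r) * (mean_f^2 + mean_r^2))
    mf, mr = f.mean(), r.mean()
    vf, vr = f.var(), r.var()
    cov = np.mean((f - mf) * (r - mr))
    uqi = (4 * cov * mf * mr) / ((vf + vr) * (mf ** 2 + mr ** 2) + 1e-12)
    return {"RMSE": rmse, "SNR": snr, "CORR": corr, "UQI": uqi}
```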
(This article belongs to the Section Image and Video Processing)
Show Figures

Figure 1
<p>Setup used for 3D and RGB imaging.</p>
Full article ">Figure 2
<p>Example objects utilized: (<b>a</b>) stone-A, (<b>b</b>) stone-B, and the (<b>c</b>) validation coin.</p>
Full article ">Figure 3
<p>Results of stone one (object). (<b>a</b>) False color images of depth shadows imaged by the 3D laser line scanner (height legend: yellow &gt; 0 mm, and purple = 0 mm) and (<b>b</b>) the improved depth map.</p>
Full article ">Figure 4
<p>Flowchart of the data acquisition process.</p>
Full article ">Figure 5
<p>Flowchart of the data fusion process.</p>
Full article ">Figure 6
<p>RGB-D point cloud of the validation coin fused by JBU.</p>
Full article ">Figure 7
<p>RGB-D point cloud after data fusion, for example, stones-A (<b>a</b>) and -B (<b>b</b>) fused by JBU.</p>
Full article ">Figure 8
<p>Results of object stone-A. False color image depth maps during the fusion process (<b>a</b>,<b>c</b>,<b>e</b>,<b>g</b>) and the edge correlation of the depth map and RGB image (<b>b</b>,<b>d</b>,<b>f</b>,<b>h</b>). (<b>a</b>,<b>b</b>) show the process result of the cropped depth data, with minimized depth shadows; (<b>c</b>,<b>d</b>) show the process result of the depth data after the synthesis; (<b>e</b>,<b>f</b>) show the process result after JBU data fusion; (<b>g</b>,<b>h</b>) show the process result after the feature extraction process.</p>
Full article ">
15 pages, 1207 KiB  
Article
From Movements to Metrics: Evaluating Explainable AI Methods in Skeleton-Based Human Activity Recognition
by Kimji N. Pellano, Inga Strümke and Espen A. F. Ihlen
Sensors 2024, 24(6), 1940; https://doi.org/10.3390/s24061940 - 18 Mar 2024
Cited by 4 | Viewed by 1503
Abstract
The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human–computer interaction. This paper tackles a well-known gap in the field, which is the lack of testing in the applicability [...] Read more.
The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human–computer interaction. This paper tackles a well-known gap in the field, which is the lack of testing in the applicability and reliability of XAI evaluation metrics in the skeleton-based HAR domain. We have tested established XAI metrics, namely faithfulness and stability, on Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM), to address this problem. This study introduces a perturbation method that produces variations within the error tolerance of motion sensor tracking, ensuring that the resultant skeletal data points remain within the plausible output range of human movement as captured by the tracking device. We used the NTU RGB+D 60 dataset and the EfficientGCN architecture for HAR model training and testing. The evaluation involved systematically perturbing the 3D skeleton data by applying controlled displacements at different magnitudes to assess the impact on XAI metric performance across multiple action classes. Our findings reveal that faithfulness may not consistently serve as a reliable metric across all classes for the EfficientGCN model, indicating its limited applicability in certain contexts. In contrast, stability proves to be a more robust metric, showing dependability across different perturbation magnitudes. Additionally, CAM and Grad-CAM yielded almost identical explanations, leading to closely similar metric outcomes. This suggests a need for the exploration of additional metrics and the application of more diverse XAI methods to broaden the understanding and effectiveness of XAI in skeleton-based HAR. Full article
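The perturbation described here (and illustrated in the entry's Figure 1) displaces each 3D joint by a fixed magnitude r in a random direction given by spherical coordinates. The sketch below is a reconstruction of that idea, not the authors' code; the uniform-on-the-sphere sampling of the polar angle is an assumption on my part.

```python
import numpy as np

def perturb_joint(p, r, rng=None):
    """Displace a 3D joint p = (x, y, z) by magnitude r in a random direction.

    A random azimuthal angle theta and polar angle phi pick a direction on a
    sphere of radius r, so the perturbed joint stays within a fixed distance
    of the original tracking output.
    """
    rng = rng or np.random.default_rng()
    theta = rng.uniform(0.0, 2.0 * np.pi)        # azimuthal angle
    phi = np.arccos(rng.uniform(-1.0, 1.0))      # polar angle, uniform over the sphere
    offset = r * np.array([np.sin(phi) * np.cos(theta),
                           np.sin(phi) * np.sin(theta),
                           np.cos(phi)])
    return np.asarray(p, dtype=float) + offset
```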
Show Figures

Figure 1
<p>Illustration of perturbing a point P(x, y, z) in 3D space to a new position P′(x′, y′, z′) using spherical coordinates. The perturbation magnitude is represented by <span class="html-italic">r</span>, with azimuthal angle <math display="inline"><semantics> <mi>θ</mi> </semantics></math> and polar angle <math display="inline"><semantics> <mi>ϕ</mi> </semantics></math>.</p>
Full article ">Figure 2
<p>The EfficientGCN pipeline showing the variables for calculating faithfulness and stability. Perturbation is performed in the Data Preprocess stage.</p>
Full article ">Figure 3
<p>(<b>left</b>) CAM, Grad-CAM, and baseline random attributions for a data instance in ‘standing up’ (class 8), averaged for all frames and normalized. The color gradient denotes the score intensity: blue indicates 0, and progressing to red indicates a score of 1; (<b>right</b>) the numerical values of the attribution scores, with k denoting the body point number.</p>
Full article ">Figure 4
<p>Evaluation metric outcomes for ‘Writing’ (Class 11, i.e., the weakest class), showing CAM (blue), Grad-CAM (orange), and the random (green) methods for (<b>a</b>) PGI, (<b>b</b>) PGU, (<b>c</b>) RISb, (<b>d</b>) RISj, (<b>e</b>) RISv, (<b>f</b>) ROS, and (<b>g</b>) RRS. The <span class="html-italic">y</span>-axis measures the metric values, while the <span class="html-italic">x</span>-axis shows the perturbation magnitude. CAM and Grad-CAM graphs overlap due to extremely similar metric outcomes.</p>
Full article ">Figure 5
<p>Evaluation metric outcomes for ‘Jump Up’ (Class 26, i.e., the strongest class), showing CAM (blue), Grad-CAM (orange), and the random (green) methods for (<b>a</b>) PGI, (<b>b</b>) PGU, (<b>c</b>) RISb, (<b>d</b>) RISj, (<b>e</b>) RISv, (<b>f</b>) ROS, and (<b>g</b>) RRS. The <span class="html-italic">y</span>-axis measures the metric values, while the <span class="html-italic">x</span>-axis shows the perturbation magnitude. CAM and Grad-CAM graphs overlap due to extremely similar metric outcomes.</p>
Full article ">Figure A1
<p>Evaluation metric outcomes for ‘checking time on watch’ (Class 32), showing CAM (blue), Grad-CAM (orange), and the random (green) methods for (<b>a</b>) PGI, (<b>b</b>) PGU, (<b>c</b>) RISb, (<b>d</b>) RISj, (<b>e</b>) RISv, (<b>f</b>) ROS, and (<b>g</b>) RRS. Despite increasing perturbation magnitudes, CAM and Grad-CAM exhibit only marginally better performance compared with the random method in terms of PGI. This suggests that the expected correlation between increasing the perturbation magnitude of important features and significant changes in prediction output may not consistently apply to this particular case. Conversely, for PGU, CAM and Grad-CAM demonstrate more effective identification of unimportant features compared with the random method. As perturbation magnitude increases, the random method results in a significantly larger discrepancy between the prediction probabilities of the original and perturbed data.</p>
Full article ">Figure A2
<p>Evaluation metric outcomes for ‘clapping’ (Class 9), showing CAM (blue), Grad-CAM (orange), and the random (green) methods for (<b>a</b>) PGI, (<b>b</b>) PGU, (<b>c</b>) RISb, (<b>d</b>) RISj, (<b>e</b>) RISv, (<b>f</b>) ROS, and (<b>g</b>) RRS. Similar to class 32, the PGI results for class 9 show only a slight difference in performance between CAM and Grad-CAM versus the random method, even as perturbation magnitude increases. The PGU results still echo those in class 32, with CAM and Grad-CAM outperforming the random method in distinguishing unimportant features.</p>
Full article ">
16 pages, 9434 KiB  
Article
Omnidirectional-Sensor-System-Based Texture Noise Correction in Large-Scale 3D Reconstruction
by Wenya Xie and Xiaoping Hong
Sensors 2024, 24(1), 78; https://doi.org/10.3390/s24010078 - 22 Dec 2023
Viewed by 1186
Abstract
The evolution of cameras and LiDAR has propelled the techniques and applications of three-dimensional (3D) reconstruction. However, due to inherent sensor limitations and environmental interference, the reconstruction process often entails significant texture noise, such as specular highlight, color inconsistency, and object occlusion. Traditional [...] Read more.
The evolution of cameras and LiDAR has propelled the techniques and applications of three-dimensional (3D) reconstruction. However, due to inherent sensor limitations and environmental interference, the reconstruction process often entails significant texture noise, such as specular highlight, color inconsistency, and object occlusion. Traditional methodologies struggle to mitigate such noise, particularly in large-scale scenes, due to the voluminous data produced by imaging sensors. In response, this paper introduces an omnidirectional-sensor-system-based texture noise correction framework for large-scale scenes, which consists of three parts. Initially, we obtain a colored point cloud with luminance values by organizing the LiDAR points and RGB images. Next, we apply a voxel hashing algorithm during geometry reconstruction to accelerate computation and save computer memory. Finally, we propose the key innovation of our paper, the frame-voting rendering and neighbor-aided rendering mechanisms, which effectively eliminate the aforementioned texture noise. In the experiments, a processing rate of one million points per second demonstrates real-time applicability, and the texture-optimization outputs exhibit a significant reduction in texture noise. These results indicate that our framework has advanced performance in correcting multiple types of texture noise in large-scale 3D reconstruction. Full article
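The frame-voting idea in this abstract can be pictured as follows: each 3D point is observed in several frames, and those observations vote on its final color so that view-dependent artefacts such as specular highlights are rejected. The snippet below is a hedged reconstruction of that concept (a per-point vote based on median luminance), not the authors' exact mechanism; the luminance-plus-RGB observation format is assumed for illustration.

```python
import numpy as np

def frame_vote_color(observations):
    """Vote on one point's color from its observations in several frames.

    observations : array of shape (n_frames, 4) holding (L, R, G, B) per frame,
                   where L is the CIELAB luminance of the sampled pixel.
    A specular highlight inflates L in only a minority of frames, so the
    observation whose luminance is closest to the median wins the vote.
    """
    obs = np.asarray(observations, dtype=float)
    winner = np.argmin(np.abs(obs[:, 0] - np.median(obs[:, 0])))
    return obs[winner, 1:]   # the voted RGB color
```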
(This article belongs to the Special Issue Sensing and Processing for 3D Computer Vision: 2nd Edition)
Show Figures

Figure 1
<p>(<b>a</b>) Specular highlight phenomenon. (<b>b</b>) The position of the highlight areas in the image changes with the variation of the sensor pose. In the image, the red box indicates the most prominent highlight noise, and the green box indicates the door, which serves as a positional reference.</p>
Full article ">Figure 2
<p>Color inconsistency phenomenon. P1–P3 are three consecutive images in terms of position. (<b>a</b>) Normal situation with consistent color between frames. (<b>b</b>) Inconsistent color between frames caused by variations in the intensity of the light source or changes in its relative position to the sensor.</p>
Full article ">Figure 3
<p>Pipeline of the whole process, consisting of data organization, geometry reconstruction, and texture optimization.</p>
Full article ">Figure 4
<p>Process flow of data organization. (<b>a</b>) RGB image. (<b>b</b>) CIELAB color space image transformed from RGB image, which facilitates luminance evaluation in the subsequent section of our work. (<b>c</b>) LiDAR point cloud. (<b>d</b>) Fusion of LiDAR point cloud with RGB image. (<b>e</b>) Fusion of LiDAR point cloud with CIELAB color space image.</p>
Full article ">Figure 5
<p>Voxel hashing schematic. The mapping between point coordinates and voxel block indices is achieved through a hash table, thereby efficiently allocating points while making reasonable use of computer storage resources.</p>
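The voxel-hashing scheme sketched in this figure maps a point's integer voxel coordinate to a bucket through a hash function so that only occupied voxel blocks consume memory. The minimal dictionary-based sketch below is an illustration of that general technique; the voxel size and the three large hash primes (common in the spatial-hashing literature) are assumptions, not values from the paper.

```python
from collections import defaultdict

def voxel_coord(point, voxel_size=0.05):
    """Integer voxel coordinate containing a 3D point (voxel_size in metres)."""
    return tuple(int(c // voxel_size) for c in point)

def voxel_hash(vx, vy, vz, table_size=2**20):
    """Spatial hash of an integer voxel coordinate using three large primes."""
    return ((vx * 73856093) ^ (vy * 19349663) ^ (vz * 83492791)) % table_size

# The hash table maps a bucket index to the voxel blocks (and their points)
# that fall into it, so unoccupied space is never allocated.
buckets = defaultdict(dict)
for p in [(0.01, 0.02, 0.03), (0.012, 0.021, 0.031), (1.0, 1.0, 1.0)]:
    v = voxel_coord(p)
    buckets[voxel_hash(*v)].setdefault(v, []).append(p)
```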
Full article ">Figure 6
<p>Motivation for proposing the neighbor-aided rendering mechanism: points are randomly distributed among voxels; thus, some voxels lack sufficient points for self-optimization.</p>
Full article ">Figure 7
<p>Neighbor-aided rendering mechanism. The figure illustrates the configuration of a voxel block and the interconnections between adjacent voxels.</p>
Full article ">Figure 8
<p>Sensor setup for data collection.</p>
Full article ">Figure 9
<p>Input data. The dataset consists of four spots, and each spot consists of five specified poses.</p>
Full article ">Figure 10
<p>Highlight noise correction in scene 1 using frame-voting rendering. Regions (<b>a</b>)–(<b>c</b>) show the specular highlight phenomenon on the screen and wall surfaces in the scene.</p>
Full article ">Figure 11
<p>Elimination of object occlusion in scene 2 with frame-voting rendering. (<b>a</b>) Comparison diagram of the elimination of misimaging caused by table occlusion. (<b>b</b>) Comparison diagram of the elimination of misimaging caused by chair occlusion.</p>
Full article ">Figure 12
<p>Enhanced outcome with neighbor-aided optimization. Regions A–C exhibit pronounced contrast. (<b>a</b>) Demonstration area of the original point cloud containing numerous types of texture noise. (<b>b</b>) The result optimized using only frame-voting rendering. (<b>c</b>) The result optimized further with neighbor-aided rendering.</p>
Full article ">Figure 13
<p>Comparing results of highlight removal method. (<b>a</b>) Projection of raw model (input). The white boxes indicate areas with noise that should be corrected. The red box indicates area that should not be corrected (lights). (<b>b</b>) Projection of texture optimized model (ours). (<b>c</b>) Yang et al. (2010) [<a href="#B2-sensors-24-00078" class="html-bibr">2</a>]. (<b>d</b>) Shen et al. (2013) [<a href="#B3-sensors-24-00078" class="html-bibr">3</a>]. (<b>e</b>) Fu et al. (2019) [<a href="#B4-sensors-24-00078" class="html-bibr">4</a>]. (<b>f</b>) Jin et al. (2023) [<a href="#B8-sensors-24-00078" class="html-bibr">8</a>].</p>
Full article ">
24 pages, 8567 KiB  
Article
Multi-Sensor Fusion Simultaneous Localization Mapping Based on Deep Reinforcement Learning and Multi-Model Adaptive Estimation
by Ching-Chang Wong, Hsuan-Ming Feng and Kun-Lung Kuo
Sensors 2024, 24(1), 48; https://doi.org/10.3390/s24010048 - 21 Dec 2023
Cited by 6 | Viewed by 3156
Abstract
In this study, we designed a multi-sensor fusion technique based on deep reinforcement learning (DRL) mechanisms and multi-model adaptive estimation (MMAE) for simultaneous localization and mapping (SLAM). The LiDAR-based point-to-line iterative closest point (PLICP) and RGB-D camera-based ORBSLAM2 methods were utilized to estimate [...] Read more.
In this study, we designed a multi-sensor fusion technique based on deep reinforcement learning (DRL) mechanisms and multi-model adaptive estimation (MMAE) for simultaneous localization and mapping (SLAM). The LiDAR-based point-to-line iterative closest point (PLICP) and RGB-D camera-based ORBSLAM2 methods were utilized to estimate the localization of mobile robots. Residual-value anomaly detection was combined with a Proximal Policy Optimization (PPO)-based DRL model to optimally adjust the weights among the different localization algorithms. Two kinds of indoor simulation environments were established using the Gazebo simulator to validate the localization performance of the proposed multi-model adaptive estimation. The experimental results confirmed that the proposed method can effectively fuse the localization information from multiple sensors and enable mobile robots to obtain higher localization accuracy than the traditional PLICP and ORBSLAM2. It was also found that the proposed method increases the localization stability of mobile robots in complex environments. Full article
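At its core, the fusion step described here amounts to blending the PLICP and ORBSLAM2 position estimates with weights that reflect how trustworthy each source currently appears (e.g., judged from its residuals). The sketch below shows that weighted-combination idea in its simplest form; the inverse-residual weighting rule is a stand-in assumption for the PPO-trained MMAE weight adjustment described in the paper.

```python
import numpy as np

def fuse_estimates(x_plicp, x_orb, res_plicp, res_orb, eps=1e-6):
    """Blend two pose estimates with weights inversely related to their residuals.

    x_plicp, x_orb     : position estimates (e.g., np.array([x, y])) from each front end
    res_plicp, res_orb : recent residual magnitudes used as a proxy for reliability
    In the paper, the weights come from a PPO-trained policy within an MMAE scheme;
    here a simple inverse-residual rule stands in for that learned adjustment.
    """
    w_plicp = 1.0 / (abs(res_plicp) + eps)
    w_orb = 1.0 / (abs(res_orb) + eps)
    return (w_plicp * np.asarray(x_plicp) + w_orb * np.asarray(x_orb)) / (w_plicp + w_orb)
```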
(This article belongs to the Section Sensors and Robotics)
Show Figures

Figure 1
<p>Architecture diagram of the proposed multi-sensor fusion-based simultaneous localization and mapping (SLAM) localization system.</p>
Full article ">Figure 2
<p>PLICP flow chart.</p>
Full article ">Figure 3
<p>UKF trajectory of PLICP. (<b>a</b>) Full trajectory. (<b>b</b>) <span class="html-italic">x</span>-axis trajectory. (<b>c</b>) <span class="html-italic">y</span>-axis trajectory.</p>
Full article ">Figure 4
<p>UKF trajectory of ORBSLAM2. (<b>a</b>) Full trajectory. (<b>b</b>) <span class="html-italic">x</span>-axis trajectory. (<b>c</b>) <span class="html-italic">y</span>-axis trajectory.</p>
Full article ">Figure 5
<p>Two-Tailed validation.</p>
Full article ">Figure 6
<p>Original multi-model estimation schematic.</p>
Full article ">Figure 7
<p>Mobile robot oscillates when turning a corner. (<b>a</b>) <span class="html-italic">x</span>-axis trajectory of PLICP. (<b>b</b>) <span class="html-italic">x</span>-axis residual of PLICP. (The red circles mark the turning section.)</p>
Full article ">Figure 8
<p>Multi-model adaptive estimation architecture.</p>
Full article ">Figure 9
<p><span class="html-italic">x</span>-axis trajectory.</p>
Full article ">Figure 10
<p><span class="html-italic">y</span>-axis trajectory.</p>
Full article ">Figure 11
<p><span class="html-italic">x</span>-axis residual value. (<b>a</b>) <span class="html-italic">x</span>-axis difference. (<b>b</b>) <span class="html-italic">x</span>-axis residual of PLICP. (<b>c</b>) <span class="html-italic">x</span>-axis residual of ORBSLAM2.</p>
Full article ">Figure 12
<p><span class="html-italic">y</span>-axis residual value. (<b>a</b>) <span class="html-italic">y</span>-axis difference. (<b>b</b>) <span class="html-italic">y</span>-axis residual of PLICP. (Red circle refer to the unstable response) (<b>c</b>) <span class="html-italic">y</span>-axis residual of ORBSLAM2. (Red circle refer to the stable response).</p>
Full article ">Figure 13
<p>Mobile robot oscillates when turning a corner. (<b>a</b>) <span class="html-italic">z</span>-value of PLICP. (<b>b</b>) <span class="html-italic">z</span>-value of ORBSLAM2.</p>
Full article ">Figure 14
<p>y-axis weight adjustment.</p>
Full article ">Figure 15
<p>Overall trajectory of PLICP failure events.</p>
Full article ">Figure 16
<p>DRL structure.</p>
Full article ">Figure 17
<p>Result of reward function.</p>
Full article ">Figure 18
<p>Simulation Scene 1.</p>
Full article ">Figure 19
<p>Occupancy grid mapping for Simulation Scene 1.</p>
Full article ">Figure 20
<p>Minimal error map for original PLICP area. (The green point is the start point of the mobile robot, the red point is the end point of the mobile robot, and the red box is the main area that causes the error).</p>
Full article ">Figure 21
<p>Minimal error map of PLICP area after improvement. (The green point is the start point of the mobile robot, the red point is the end point of the mobile robot).</p>
Full article ">Figure 22
<p>Simulation Scene 2.</p>
Full article ">Figure 23
<p>Occupancy grid mapping for Simulation Scene 2.</p>
Full article ">Figure 24
<p>Minimal error map for original PLICP area in Scene 2. (The green point is the start point of the mobile robot, the red point is the end point of the mobile robot, and the red box is the main area that causes the error).</p>
Full article ">Figure 25
<p>Minimal error map of PLICP area after improvement in Scene 2. (The green point is the start point of the mobile robot, the red point is the end point of the mobile robot).</p>
Full article ">
17 pages, 7251 KiB  
Article
Depth Map Super-Resolution Based on Semi-Couple Deformable Convolution Networks
by Botao Liu, Kai Chen, Sheng-Lung Peng and Ming Zhao
Mathematics 2023, 11(21), 4556; https://doi.org/10.3390/math11214556 - 5 Nov 2023
Cited by 2 | Viewed by 1566
Abstract
Depth images obtained from lightweight, real-time depth estimation models and consumer-oriented sensors typically have low-resolution issues. Traditional interpolation methods for depth image up-sampling result in a significant information loss, especially in edges with discontinuous depth variations (depth discontinuities). To address this issue, this [...] Read more.
Depth images obtained from lightweight, real-time depth estimation models and consumer-oriented sensors typically have low-resolution issues. Traditional interpolation methods for depth image up-sampling result in a significant information loss, especially in edges with discontinuous depth variations (depth discontinuities). To address this issue, this paper proposes a semi-coupled deformable convolution network (SCD-Net) based on the idea of guided depth map super-resolution (GDSR). The method employs a semi-coupled feature extraction scheme to learn unique and similar features between RGB images and depth images. We utilize Coordinate Attention (CA) to suppress redundant information in the RGB features. Finally, a deformable convolution module is employed to restore the original resolution of the depth image. The model is tested on NYUv2, Middlebury, Lu, and a Real-sense real-world dataset created using an Intel Real-sense D455 structured-light camera. The super-resolution accuracy of SCD-Net at multiple scales is much higher than that of traditional methods and superior to recent state-of-the-art (SOTA) models, which demonstrates the effectiveness and flexibility of our model on GDSR tasks. In particular, our method further solves the problem of RGB texture being over-transferred in GDSR tasks. Full article
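The "semi-coupled" feature extraction mentioned in this abstract can be pictured as two branches (RGB guidance and depth) that share part of their convolutional weights while keeping branch-specific layers for modality-unique detail. The PyTorch snippet below is only a schematic of that sharing idea under assumed layer sizes; it is not the SCD-Net architecture itself.

```python
import torch
import torch.nn as nn

class SemiCoupledBlock(nn.Module):
    """Two-branch block: one shared (coupled) conv plus per-branch (private) convs."""
    def __init__(self, channels=32):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)        # common structure
        self.private_rgb = nn.Conv2d(channels, channels, 3, padding=1)   # RGB-specific detail
        self.private_depth = nn.Conv2d(channels, channels, 3, padding=1) # depth-specific detail
        self.act = nn.ReLU(inplace=True)

    def forward(self, f_rgb, f_depth):
        # The shared kernel is applied to both branches; the private kernels keep
        # modality-specific information (e.g., RGB texture vs. depth discontinuities).
        f_rgb = self.act(self.shared(f_rgb) + self.private_rgb(f_rgb))
        f_depth = self.act(self.shared(f_depth) + self.private_depth(f_depth))
        return f_rgb, f_depth
```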
Show Figures

Figure 1
<p>Overview of SCD-Net.</p>
Full article ">Figure 2
<p>Statistics of the Real-sense dataset. (<b>a</b>) The scenes and the corresponding hierarchical content structures of the Real-sense dataset. (<b>b</b>) Example from the NYUv2 dataset (from left to right: RGB, raw depth, and GT). (<b>c</b>) Measure from the Real-sense dataset (from left to right: RGB, raw depth, and GT). Red and blue borders indicate the missing depth value and invalid boundary for NYU image data, respectively.</p>
Full article ">Figure 3
<p>Method procedure.</p>
Full article ">Figure 4
<p>Semi-couple feature extractor.</p>
Full article ">Figure 5
<p>Resampling at R = 3. (The elements in (<b>a</b>) are uniformly distributed into the channels corresponding to the colors in (<b>b</b>).)</p>
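Read as a space-to-depth rearrangement, the resampling at R = 3 shown here redistributes each 3 × 3 neighborhood of a feature map into R² = 9 channels. The NumPy sketch below illustrates that standard rearrangement under this interpretation; it is not claimed to be the paper's exact operator.

```python
import numpy as np

def space_to_depth(x, r=3):
    """Rearrange an (H, W) map into an (H//r, W//r, r*r) tensor.

    Each r x r block of pixels becomes one spatial location with r*r channels,
    matching the uniform redistribution of elements illustrated for R = 3.
    """
    h, w = x.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    return (x.reshape(h // r, r, w // r, r)
             .transpose(0, 2, 1, 3)
             .reshape(h // r, w // r, r * r))

# Example: a 6 x 6 map becomes 2 x 2 spatial positions with 9 channels each.
tiles = space_to_depth(np.arange(36).reshape(6, 6), r=3)
```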
Full article ">Figure 6
<p>Deformable convolution module.</p>
Full article ">Figure 7
<p>Deformable kernel.</p>
Full article ">Figure 8
<p>Visual comparison of ×8 depth map SR results on the NYUv2 dataset. The depth maps are visualized using the JET color bar. (<b>a</b>) RGB, (<b>b</b>) DJF [<a href="#B47-mathematics-11-04556" class="html-bibr">47</a>], (<b>c</b>) DJFR [<a href="#B54-mathematics-11-04556" class="html-bibr">54</a>], (<b>d</b>) DKN [<a href="#B24-mathematics-11-04556" class="html-bibr">24</a>], (<b>e</b>) the proposed SCD-Net, and (<b>f</b>) GT images. (The area marked with red squares is enlarged for display.)</p>
Full article ">Figure 9
<p>Visual comparison of ×8 depth map SR results for Real-sense. (<b>a</b>) RGB image, (<b>b</b>) low-resolution depth map, (<b>c</b>) ground truth, (<b>d</b>) DKN [<a href="#B23-mathematics-11-04556" class="html-bibr">23</a>] trained on the Real-sense dataset, (<b>e</b>) SCD-Net trained on the NYUv2 dataset, (<b>f</b>) SCD-Net trained on the Real-sense dataset. (The area marked with red squares is enlarged for display.)</p>
Full article ">