Search Results (67)

Search Parameters:
Keywords = multiple RGB-D sensors

19 pages, 28445 KiB  
Article
Masonry and Pictorial Surfaces Study by Laser Diagnostics: The Case of the Diana’s House in Ostia Antica
by Valeria Spizzichino, Luisa Caneve, Antonella Docci, Massimo Francucci, Massimiliano Guarneri, Daniela Tarica and Claudia Tempesta
Appl. Sci. 2025, 15(4), 2172; https://doi.org/10.3390/app15042172 - 18 Feb 2025
Viewed by 301
Abstract
The aim of the present research is to validate the combined use, through data fusion, of a Laser Induced Fluorescence (LIF) scanning system and a radar scanner (RGB-ITR, Red Green Blue Imaging Topological Radar system) as a single tool addressing the need for non-invasive, rapid, and low-cost techniques for both diagnostic and operational purposes. The integrated system has been applied to the House of Diana complex in Ostia Antica. The main diagnostic objective of this research was to trace the materials used in different phases of restoration, from antiquity to modernity, on both masonry and pictorial surfaces, to reconstruct the history of the building. Due to the significant interest in this insula, other studies have recently been carried out on the House of Diana, but they once again highlighted the necessity of multiple approaches and non-invasive methods capable of providing quasi-real-time answers, delivering point-by-point information on very large surfaces to overcome the limits related to the representativeness of sampling. The data acquired by the RGB-ITR system are quantitative, allowing for morphological and 3-colour analysis of the investigated artwork. In this work, the sensor has been used to create coloured 3D models useful for structural assessments and for locating different classes of materials. The LIF maps, which integrate knowledge about the original constituent materials and previous conservation interventions, have been used as additional layers of the tridimensional models. Therefore, the method can direct possible new investigations and restoration actions, piecing together the history of the House of Diana to build for it a safer future.
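The abstract and the figure captions below outline the core data-fusion idea: per-point LIF spectra (eight detection wavelengths) are clustered into material classes (Figure 5 uses K-means with 7 groups) and the resulting map is attached to the RGB-ITR 3D model as an extra layer. A minimal sketch of that idea, assuming a co-registered point cloud and per-point spectra; all array names, shapes, and the normalisation step are illustrative assumptions, not the authors' pipeline.

```python
# Illustrative only: cluster per-point LIF spectra into material classes and
# attach the labels to a coloured point cloud as an extra attribute layer.
# Array names and shapes are assumptions, not the authors' data format.
import numpy as np
from sklearn.cluster import KMeans

def lif_material_layer(lif_spectra: np.ndarray, n_materials: int = 7) -> np.ndarray:
    """Cluster LIF spectra (N points x 8 wavelengths) into material classes."""
    # Normalise each spectrum so clustering reflects spectral shape, not brightness.
    norm = lif_spectra / (np.linalg.norm(lif_spectra, axis=1, keepdims=True) + 1e-9)
    return KMeans(n_clusters=n_materials, n_init=10, random_state=0).fit_predict(norm)

def fuse_layers(points_xyz, rgb, lif_labels):
    """Stack geometry, native colour, and the LIF material map into one record per point."""
    return np.column_stack([points_xyz, rgb, lif_labels])

# Synthetic example: 10,000 co-registered points with 8-channel LIF spectra.
pts, col = np.random.rand(10_000, 3), np.random.rand(10_000, 3)
spectra = np.random.rand(10_000, 8)
model = fuse_layers(pts, col, lif_material_layer(spectra))
print(model.shape)  # (10000, 7): x, y, z, r, g, b, material class
```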
Show Figures

Figure 1: External view of the insula of the House of Diana in Ostia Antica from the South-West.
Figure 2: Plan of the House of Diana during the different building phases, from the earliest phase, first half of the 2nd century AD (top), to the intermediate phase, last quarter of the 2nd century AD (center), to the latest phase, second half of the 3rd century AD (bottom) (Studio 3R, PA-OANT Photo Archive inv. 11,560–11,562).
Figure 3: RGB-ITR system (on the left) and LIF scanning system (on the right) during the measurement campaign in the House of Diana.
Figure 4: (a) A picture of the west wall compared to (b) the raw fluorescence images collected at the eight selected wavelengths.
Figure 5: K-means clustering into 7 groups performed on the LIF data collected on the west wall of Room A.
Figure 6: Spatial distribution of the polymeric acrylic protective on the west wall of Room A detected by LIF and overlaid on the 3D model produced by the RGB-ITR. A short movie can be found in the Supplementary Materials attached to the paper.
Figure 7: Spatial distribution of the polymeric consolidant on the ceiling of Room A detected by LIF and overlaid on the 3D model produced by the RGB-ITR. A short movie can be found in the Supplementary Materials attached to the paper.
Figure 8: Result of the RGB-ITR data processing: (a) raw data acquired by the RGB-ITR scanner; (b) the enhanced image after the calibration procedure.
Figure 9: West wall (top) and south wall (bottom) of Room A. Comparison between the picture (on the left) and the false-colour LIF image (on the right).
Figure 10: Fluorescence images at 8 different wavelengths collected by the LIF sensor. The lighter an area, the more intense the fluorescence signal (corresponding to a higher concentration of fluorescent material in that area).
Figure 11: Results of the processing of LIF data of the north-west corner of Room B, compared with a picture (on the left): in false colours (in the center), derived with the same procedure used for Figure 9 (right), and in greyscale (on the right) to highlight the consolidating treatment on the decorations.
Figure 12: (Top) A screenshot of the HR 3D model with native colours produced by the RGB-ITR. (Bottom) The same model after the data fusion processing, using a LIF map as an additional texture layer.
20 pages, 4820 KiB  
Article
Skeletal Data Matching and Merging from Multiple RGB-D Sensors for Room-Scale Distant Interaction with Multiple Surfaces
by Adrien Coppens and Valerie Maquil
Electronics 2025, 14(4), 790; https://doi.org/10.3390/electronics14040790 - 18 Feb 2025
Viewed by 209
Abstract
Using a commodity RGB-D sensor is a popular and cost-effective way to enable interaction at room scale, as such a device supports body tracking functionality at a reasonable price point. Even though the capabilities of such devices might be enough for applications like entertainment systems where a person plays in front of a television, this type of sensor is unfortunately sensitive to occlusions from objects or other people, who might be in the way in more sophisticated room-scale set-ups. One may use multiple RGB-D sensors and aggregate the collected data to address the occlusion problem, increase the tracking range, and improve accuracy. However, doing so requires calibration information about the sensors themselves and about their placement relative to the interactable surfaces. Another challenging consequence of relying on multiple sensors is the need to perform skeleton matching and merging based on their respective body tracking data (e.g., so that skeletons from different sensors but belonging to the same person are recognised as such). The present contribution focuses on approaches to tackling these issues. Ultimately, it contributes a working human interaction tracking system, leveraging multiple RGB-D sensors to provide unobtrusive and occlusion-resilient understanding capabilities. This constitutes a suitable basis for room-scale experiences such as those based on wall-sized displays.
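The skeleton matching and merging step described in the abstract (and illustrated in Figure 2 below) can be approximated as an assignment problem. A minimal sketch, assuming skeletons are already expressed in a common world frame and share the same joint ordering; the cost metric, the 0.3 m gating threshold, and the simple joint averaging are illustrative choices, not the algorithm published in the paper.

```python
# Hypothetical sketch: pair skeletons from two sensors by mean joint distance
# with the Hungarian algorithm, merge matched pairs by averaging their joints,
# and keep skeletons seen by only one sensor. Threshold and merge rule are
# assumptions for illustration.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_merge(skels_a, skels_b, max_dist=0.3):
    """skels_a, skels_b: lists of (J, 3) arrays of joint positions in metres."""
    if not skels_a or not skels_b:
        return list(skels_a) + list(skels_b)
    # Cost = mean per-joint Euclidean distance between two skeletons.
    cost = np.array([[np.linalg.norm(a - b, axis=1).mean() for b in skels_b]
                     for a in skels_a])
    rows, cols = linear_sum_assignment(cost)
    merged, used_a, used_b = [], set(), set()
    for i, j in zip(rows, cols):
        if cost[i, j] <= max_dist:                 # plausible same-person pair
            merged.append(0.5 * (skels_a[i] + skels_b[j]))
            used_a.add(i)
            used_b.add(j)
    # Keep isolated skeletons visible to only one sensor.
    merged += [s for i, s in enumerate(skels_a) if i not in used_a]
    merged += [s for j, s in enumerate(skels_b) if j not in used_b]
    return merged
```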
Show Figures

Figure 1: A partial representation of our circular multi-display set-up, on which two Azure Kinect sensors are attached to provide human behaviour tracking capabilities. The needs for calibrating the display set-up and the multi-camera arrangement are indicated by the yellow and purple dashed arrows, respectively.
Figure 2: The skeleton matching problem. Skeletons from two distinct sensors must be matched to create a combined set of skeletons. Some skeletons are isolated since only one sensor can currently view them, while the majority of the people in the room are tracked by both sensors, creating overlapping skeletons. Reproduced from [1]. (a) Skeletons from two (green- and pink-coded) sensors. Notice the isolated skeletons: a green one in the top right and two pink ones at the bottom. (b) The merged skeletons following our skeletal data fusion algorithm, showing the successful pairing of overlapping skeletons but also the inclusion of isolated ones.
Figure 3: Partial point cloud creation (right) from a depth image (left). A selection of three spots from the depth image is depicted by blue circles, which are mirrored on the image plane in the right image. Because these points are similar in colour on the depth image, they are also similar in terms of distance from the sensor (shown in the right picture as blue lines of comparable length). By combining the distance information with a projection of the image points (using the sensor's intrinsic parameters), it is possible to generate points, forming a resulting point cloud. Reproduced from [1].
Figure 4: Unity-based display arrangement calibration, with orange point clouds overlaid on the virtual display configuration. (a) A slight misalignment between the orange point cloud and the screen borders. (b) Proper alignment of the orange point cloud and the multi-display arrangement.
Figure 5: Calibration results with our initial choices of (simple) methods. Points are coloured depending on which camera they originated from. (a) Using the skeleton-based approach. (b) Using the chequerboard approach.
Figure 6: Calibration results using the ICP approach (raw and filtered points), with green and red (full) point clouds corresponding to two separate sensors. (a) With raw point clouds. (b) With filtered point clouds.
21 pages, 12015 KiB  
Article
Segment Any Leaf 3D: A Zero-Shot 3D Leaf Instance Segmentation Method Based on Multi-View Images
by Yunlong Wang and Zhiyong Zhang
Sensors 2025, 25(2), 526; https://doi.org/10.3390/s25020526 - 17 Jan 2025
Viewed by 371
Abstract
Exploring the relationships between plant phenotypes and genetic information requires advanced phenotypic analysis techniques for precise characterization. However, the diversity and variability of plant morphology challenge existing methods, which often fail to generalize across species and require extensive annotated data, especially for 3D datasets. This paper proposes a zero-shot 3D leaf instance segmentation method using RGB sensors. It extends the 2D segmentation model SAM (Segment Anything Model) to 3D through a multi-view strategy. RGB image sequences captured from multiple viewpoints are used to reconstruct 3D plant point clouds via multi-view stereo. HQ-SAM (High-Quality Segment Anything Model) segments leaves in 2D, and the segmentation is mapped to the 3D point cloud. An incremental fusion method based on confidence scores aggregates results from different views into a final output. Evaluated on a custom peanut seedling dataset, the method achieved point-level precision, recall, and F1 scores over 0.9 and object-level mIoU and precision above 0.75 under two IoU thresholds. The results show that the method achieves state-of-the-art segmentation quality while offering zero-shot capability and generalizability, demonstrating significant potential in plant phenotyping.
(This article belongs to the Section Smart Agriculture)
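The mapping of 2D HQ-SAM masks onto the reconstructed point cloud and the confidence-based fusion across views can be pictured as project-and-vote. A rough sketch under simplifying assumptions (pinhole cameras, per-pixel confidence maps, a plain highest-score vote); the data layout and the voting rule are placeholders rather than the paper's incremental merging procedure.

```python
# Hypothetical sketch: project every 3D point into each calibrated view, read
# the instance id under its pixel, accumulate confidence per (point, id), and
# keep the id with the highest accumulated confidence.
import numpy as np

def project(points_w, K, R, t):
    """Project Nx3 world points into pixels with an assumed pinhole model."""
    cam = points_w @ R.T + t                    # world -> camera frame
    uv = cam[:, :2] / cam[:, 2:3]               # perspective division
    pix = uv @ K[:2, :2].T + K[:2, 2]           # focal lengths + principal point
    return pix, cam[:, 2]

def fuse_instance_labels(points_w, views):
    """views: dicts with 'K', 'R', 't', 'mask' (HxW int, 0 = background), 'conf' (HxW float)."""
    n = len(points_w)
    votes = {}                                  # (point index, mask id) -> accumulated confidence
    for view in views:
        pix, depth = project(points_w, view["K"], view["R"], view["t"])
        h, w = view["mask"].shape
        u = np.round(pix[:, 0]).astype(int)
        v = np.round(pix[:, 1]).astype(int)
        visible = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        for i in np.flatnonzero(visible):
            label = int(view["mask"][v[i], u[i]])
            if label:
                votes[(i, label)] = votes.get((i, label), 0.0) + float(view["conf"][v[i], u[i]])
    labels, best = np.zeros(n, dtype=int), np.zeros(n)
    for (i, label), score in votes.items():
        if score > best[i]:
            best[i], labels[i] = score, label
    return labels
```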
Show Figures

Figure 1: Overview of method. The input image sequence undergoes multi-view stereo (MVS) reconstruction, which is based on feature point matching algorithms to align images from multiple viewpoints and reconstruct a 3D point cloud. HQ-SAM-based image segmentation is then applied to extract instance segmentation from each image. The 2D grouping information is mapped into 3D space through a querying process, and the results are incrementally merged to generate the final instance segmentation.
Figure 2: Point cloud denoising process: (a) the original point cloud reconstructed using COLMAP; (b) the point cloud after voxel downsampling; (c) fitting the turntable using RANSAC and adjusting the coordinate system; (d) the point cloud after pass-through filtering; (e) the point cloud after color filtering; (f) the point cloud after statistical filtering.
Figure 3: Mask filtering process. (a) The original RGB image; (b) the automatic masks generated by HQ-SAM; (c) the masks after saturation filtering; (d) the masks after overlap removal; (e) the masks after shape filtering. All samples used the same parameters. The (b) process averaged 7.95 s on a GeForce RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA), while the (c–e) processes averaged 1.15 s, 6.19 s, and 2.50 s on an Intel(R) Xeon(R) Silver 4210 CPU (Intel Corporation, Santa Clara, CA, USA).
Figure 4: Illustration of different views. (a) The angular relationship between different viewpoints and the pixel plane. A and B are two different camera viewpoints. Green indicates that the viewpoint is better for the surface at this location, while red represents a worse viewpoint; (b) the calculation function for the viewpoint factor.
Figure 5: Illustration of merging conflict groups.
Figure 6: Visualization of leaf instance segmentation. The point cloud quality of sample 1 is low, and there are many holes present. Sample 11 has higher point cloud quality.
Figure 7: The qualitative instance segmentation comparison between three methods. For consistency with the visualizations of PlantNet and PSegNet, the results from our method were downsampled to a similar point cloud size of approximately 4096 points, and stem areas were included.
Figure 8: Leaf instance segmentation quantity analysis: (a) analysis of the instance count with buds compared to the ground truth; (b) analysis of the instance count without buds compared to the ground truth. The size of each bubble in the bubble chart represents the sample size, and the red dashed line indicates the reference fitting line y = x.
Figure 9: Visualization of instance segmentation on the DTU dataset.
Figure 10: The variations in metrics and point cloud quantity with the number of merges.
13 pages, 2243 KiB  
Article
IGAF: Incremental Guided Attention Fusion for Depth Super-Resolution
by Athanasios Tragakis, Chaitanya Kaul, Kevin J. Mitchell, Hang Dai, Roderick Murray-Smith and Daniele Faccio
Sensors 2025, 25(1), 24; https://doi.org/10.3390/s25010024 - 24 Dec 2024
Viewed by 495
Abstract
Accurate depth estimation is crucial for many fields, including robotics, navigation, and medical imaging. However, conventional depth sensors often produce low-resolution (LR) depth maps, making detailed scene perception challenging. To address this, enhancing LR depth maps to high-resolution (HR) ones has become essential, guided by HR-structured inputs like RGB or grayscale images. We propose a novel sensor fusion methodology for guided depth super-resolution (GDSR), a technique that combines LR depth maps with HR images to estimate detailed HR depth maps. Our key contribution is the Incremental Guided Attention Fusion (IGAF) module, which effectively learns to fuse features from RGB images and LR depth maps, producing accurate HR depth maps. Using IGAF, we build a robust super-resolution model and evaluate it on multiple benchmark datasets. Our model achieves state-of-the-art results compared to all baseline models on the NYU v2 dataset for ×4, ×8, and ×16 upsampling. It also outperforms all baselines in a zero-shot setting on the Middlebury, Lu, and RGB-D-D datasets. Code, environments, and models are available on GitHub.
(This article belongs to the Special Issue Convolutional Neural Network Technology for 3D Imaging and Sensing)
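The abstract describes fusing RGB and LR depth features with learned attention. The hedged PyTorch sketch below shows one generic way such a guided fusion block can be wired (feature extraction, element-wise product as naive fusion, attention-weighted mixing); channel counts and the exact layer arrangement are assumptions and do not reproduce the published IGAF/SAF/FWF modules.

```python
# Generic guided-attention fusion block, offered only as an illustration of the
# idea sketched in the abstract and Figure 3; not the published IGAF module.
import torch
import torch.nn as nn

class GuidedAttentionFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.rgb_conv = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2))
        self.depth_conv = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2))
        # Attention weights learned from the naively fused (multiplied) features.
        self.attn = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.out = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        r = self.rgb_conv(rgb_feat)
        d = self.depth_conv(depth_feat)
        fused = r * d                            # naive fusion by element-wise product
        w = self.attn(fused)                     # learn where RGB guidance is reliable
        return self.out(w * r + (1.0 - w) * d)   # attention-weighted mix of the two streams

# Example: fuse two 64-channel feature maps of size 64 x 64.
block = GuidedAttentionFusion(64)
out = block(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64, 64])
```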
Show Figures

Figure 1: Overview of the proposed multi-modal architecture for the guided depth super resolution estimation.
Figure 2: The proposed multi-modal architecture utilizes information from both an LR depth map and an HR RGB image. Firstly, each modality passes through a convolutional layer followed by a LeakyReLU activation. The model utilizes the IGAF modules to combine information from the two modalities by fusing the relevant information on each stream and ignoring information that is unrelated to the depth maps. Finally, after the third IGAF module, the depth maps are refined and added using a global skip connection from the original upsampled LR depth maps. The RGB modality is used to provide guidance to estimate an HR depth map given an LR one.
Figure 3: The IGAF module. The module is responsible for both feature extraction and modality fusion. Each modality passes through a feature extraction stage (FWF) before the initial naive fusion by an element-wise multiplication. An SAF block follows, which fuses the result of the multiplication with the extracted features of the RGB stream creating an initial structural guidance. The second SAF block incrementally fuses this extracted structural guidance with the depth stream. The output of each SAF block is generated by learning attention weights and subsequently performing a cross-multiplication operation between the two input sequences, resulting in fused and salient processed information.
Figure 4: Overview of the FWF module. The two modules are separated and not combined into one larger module because the propagation of shallower features through the skip connections as seen in Figure 3 boosts the performance of the model. The FE module is a series of convolutional layers, a channel attention process, and two skip connections. The WF module uses linearly increasing dilation rates in convolutional layers to extract multi-resolution features.
Figure 5: Qualitative comparison between our model and SUFT [24]. The visualizations shown are for the ×8 case. Our model creates more complete depth maps as seen in (c) for rows 1 and 2. In (c), row 3 shows that our model creates sharper edges with minimal bleeding. Also, in (c), row 4 the proposed model creates less smoothing with less bleeding. (Colormap chosen for better visualization. Better seen in full-screen, with zoom-in options.)
26 pages, 28365 KiB  
Article
Three-Dimensional Geometric-Physical Modeling of an Environment with an In-House-Developed Multi-Sensor Robotic System
by Su Zhang, Minglang Yu, Haoyu Chen, Minchao Zhang, Kai Tan, Xufeng Chen, Haipeng Wang and Feng Xu
Remote Sens. 2024, 16(20), 3897; https://doi.org/10.3390/rs16203897 - 20 Oct 2024
Cited by 1 | Viewed by 1023
Abstract
Environment 3D modeling is critical for the development of future intelligent unmanned systems. This paper proposes a multi-sensor robotic system for environmental geometric-physical modeling and the corresponding data processing methods. The system is primarily equipped with a millimeter-wave cascaded radar and a multispectral camera to acquire the electromagnetic characteristics and material categories of the target environment and simultaneously employs light detection and ranging (LiDAR) and an optical camera to achieve a three-dimensional spatial reconstruction of the environment. Specifically, the millimeter-wave radar sensor adopts a multiple input multiple output (MIMO) array and obtains 3D synthetic aperture radar images through 1D mechanical scanning perpendicular to the array, thereby capturing the electromagnetic properties of the environment. The multispectral camera, equipped with nine channels, provides rich spectral information for material identification and clustering. Additionally, LiDAR is used to obtain a 3D point cloud, combined with the RGB images captured by the optical camera, enabling the construction of a three-dimensional geometric model. By fusing the data from four sensors, a comprehensive geometric-physical model of the environment can be constructed. Experiments conducted in indoor environments demonstrated excellent spatial-geometric-physical reconstruction results. This system can play an important role in various applications, such as environment modeling and planning.
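One of the fusion steps described above, combining the LiDAR point cloud with the RGB images once the sensors are registered, amounts to projecting each 3D point through the camera model and sampling a colour. A minimal sketch assuming a pinhole camera with intrinsics K and extrinsics [R|t]; the calibration values and array conventions are placeholders, not the FUSEN system's actual parameters.

```python
# Illustrative LiDAR-to-camera colourisation under an assumed pinhole model.
import numpy as np

def colourise_point_cloud(points_lidar, image_rgb, K, R, t):
    """Return (M, 6) array of x, y, z, r, g, b for points that land inside the image."""
    cam = points_lidar @ R.T + t                 # LiDAR frame -> camera frame
    in_front = cam[:, 2] > 0.1                   # keep points in front of the camera
    cam = cam[in_front]
    pix = cam @ K.T                              # homogeneous pixel coordinates
    u = np.round(pix[:, 0] / pix[:, 2]).astype(int)
    v = np.round(pix[:, 1] / pix[:, 2]).astype(int)
    h, w, _ = image_rgb.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    colours = image_rgb[v[ok], u[ok]] / 255.0    # sample RGB at the projected pixel
    return np.hstack([points_lidar[in_front][ok], colours])
```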
Show Figures

Graphical abstract
Figure 1: In-house-developed FUSEN system hardware configuration. (a) System physical diagram; (b) System architecture diagram.
Figure 2: System attitude control. (a1,b1,c1) Electric translation stage moves 0 cm, 10 cm, and 20 cm; (a2,b2,c2) Turntable rotates 0°, 45°, and 90°; (a3,b3,c3) Pitch stage tilts 30°, 45°, and 60°.
Figure 3: Sensor selection. (a) 77G Millimeter-wave Cascade RF Board; (b) Vision Star pixel-level mosaic imaging spectrometer; (c) Mid-70 LiDAR; (d) JHUMs series USB3.0 industrial camera.
Figure 4: FUSEN workflow diagram.
Figure 5: Millimeter-wave cascaded radar antenna array measurement scheme.
Figure 6: Schematic diagram of the antenna array equivalent channel. (a) Schematic diagram of real aperture size distribution; (b) Schematic diagram of equivalent aperture size; (c) Schematic diagram of overlapping equivalent aperture size; (d) Interval of equivalent aperture (in m).
Figure 7: Comparison of the 9-channel multispectral camera with an RGB camera. (a) RGB 3-band filters; (b) Multispectral camera 9-band filters.
Figure 8: Multispectral inversion flowchart.
Figure 9: The preliminary registration of the RGB camera and LiDAR. (a) Experimental scene of RGB camera and LiDAR registration; (b) Preliminary registration results of RGB camera and LiDAR.
Figure 10: LiDAR and RGB camera data fusion flow diagram.
Figure 11: The accurate registration of the RGB camera and LiDAR by the proposed method. (a) Point clouds pixelated; (b) Image edge extraction.
Figure 12: Comparison of results. (a) The registration by the hand–eye calibration method; (b) The accurate registration by the proposed method.
Figure 13: Experiment on obtaining the registration matrix of the millimeter-wave radar and LiDAR. (a) Horizontal–vertical imaging results; (b) 3D point cloud results; (c) Millimeter-wave point cloud (pcd); (d) Point cloud fusion results.
Figure 14: Experiment on verifying the registration matrix of the millimeter-wave radar and LiDAR. (a) Experimental results; (b) Horizontal–vertical plane (1.52 m); (c) Horizontal–vertical plane (1.22 m); (d) Horizontal–vertical plane projection; (e) Horizontal-range plane imaging results; (f) Corner reflector millimeter-wave point cloud; (g) Millimeter-wave point cloud (pcd); (h) LiDAR point cloud.
Figure 15: Millimeter-wave radar and LiDAR fusion results. (a) Front view of fusion result; (b) Side view of fusion result.
Figure 16: Schematic diagram of data collection at different locations.
Figure 17: Millimeter-wave imaging results of targets of different materials. (a1) Optical picture of the metal plate at angle 1; (b1) Millimeter-wave imaging result of the metal plate at angle 1; (c1) Optical picture of the metal plate at angle 2; (d1) Millimeter-wave imaging result of the metal plate at angle 2. The meanings of (a2,b2,c2,d2) and (a3,b3,c3,d3) are the same as above, but for glass and cement, respectively.
Figure 18: Scattering intensity as a function of view angle.
Figure 19: Geometric-physical reconstruction result for a small scene. (a) RGB image; (b) Multispectral image inversion pseudo-color image; (c) Multispectral recognition results; (d) Millimeter-wave imaging results; (e) Geometric-physical reconstruction results; (f) Electromagnetic scattering characteristic curve matching results.
Figure 20: Geometric-physical reconstruction result for a large scene.
Figure 21: Material recognition results. The single column represents the optical scene, and the double column represents the recognition result.
Figure 22: Confusion matrix. (a) Precision confusion matrix; (b) Recall confusion matrix.
Figure A1: Reconstruction result of an entire wall of the office.
Figure A2: Reconstruction result of an entire wall of the corridor.
Figure A3: Reconstruction result of an entire wall of the building.
Figure A4: Reconstruction result of a small wall.
Figure A5: Reconstruction result of a small wall.
18 pages, 9438 KiB  
Article
High-Throughput and Accurate 3D Scanning of Cattle Using Time-of-Flight Sensors and Deep Learning
by Gbenga Omotara, Seyed Mohamad Ali Tousi, Jared Decker, Derek Brake and G. N. DeSouza
Sensors 2024, 24(16), 5275; https://doi.org/10.3390/s24165275 - 14 Aug 2024
Viewed by 1263
Abstract
We introduce a high-throughput 3D scanning system designed to accurately measure cattle phenotypes. This scanner employs an array of depth sensors, i.e., time-of-flight (ToF) sensors, each controlled by dedicated embedded devices. The sensors generate high-fidelity 3D point clouds, which are automatically stitched using a point cloud segmentation approach through deep learning. The deep learner combines raw RGB and depth data to identify correspondences between the multiple 3D point clouds, thus creating a single and accurate mesh that reconstructs the cattle geometry on the fly. In order to evaluate the performance of our system, we implemented a two-fold validation process. Initially, we quantitatively tested the scanner for its ability to determine accurate volume and surface area measurements in a controlled environment featuring known objects. Next, we explored the impact and need for multi-device synchronization when scanning moving targets (cattle). Finally, we performed qualitative and quantitative measurements on cattle. The experimental results demonstrate that the proposed system is capable of producing high-quality meshes of untamed cattle with accurate volume and surface area measurements for livestock studies.
(This article belongs to the Section Physical Sensors)
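The multi-view stitching summarised above (and detailed in Figure 5 below) chains pairwise registrations so that every camera's fragment ends up in a single world frame. A schematic sketch in which pairwise_icp is a stub standing in for the real Colored ICP step; the function names and identity initialisation are assumptions for illustration only.

```python
# Illustrative multi-view alignment: compose pairwise transforms between
# adjacent cameras so all fragments are expressed in camera 1's frame.
import numpy as np

def pairwise_icp(source_pts, target_pts):
    """Placeholder for a real registration routine (e.g., Colored ICP).
    Returns a 4x4 homogeneous transform mapping source into the target frame."""
    return np.eye(4)

def to_homogeneous(points):
    return np.hstack([points, np.ones((len(points), 1))])

def merge_views(views):
    """views: list of (N_i, 3) point arrays from cameras 1..N, adjacent cameras overlapping."""
    H_to_world = [np.eye(4)]                         # camera 1 is the world frame
    for j in range(1, len(views)):
        H_adj = pairwise_icp(views[j], views[j - 1])     # transform from camera j to camera j-1
        H_to_world.append(H_to_world[j - 1] @ H_adj)     # chain: camera j to camera 1
    aligned = [(to_homogeneous(v) @ H.T)[:, :3] for v, H in zip(views, H_to_world)]
    return np.vstack(aligned)                        # single, aligned point cloud
```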
Show Figures

Figure 1: (a) A schematic representation of the scanning system. (b) A real-life view of the camera frame and the system components.
Figure 2: Overview of the software pipeline: The pipeline begins with the acquisition of RGBD data, which undergo a segmentation and filtering step to eliminate the background pixels and noise in both depth and RGB space. The filtered data are subsequently backprojected into 3D space and then stitched to form a unified 3D model. A mesh is then constructed over the 3D point cloud. Finally, we measure our traits of interest, volume, and surface area.
Figure 3: Schematic layout of Server–Client: In this configuration, the Client sends a capture request to 10 Server programs. Each Server program performs the image acquisition request from the Client, and the captured data are transmitted to a storage device.
Figure 4: Mask R-CNN Architecture [9]: Mask R-CNN builds upon two existing Faster R-CNN heads as detailed in [10,11]. The left and right panels illustrate the heads for the ResNet C4 and FPN backbones, respectively, with an added mask branch. Spatial resolution and channels are indicated by the numbers, while arrows represent conv, deconv, or FC layers, inferred from the context (conv layers maintain spatial dimensions, whereas deconv layers increase them). All conv layers are 3 × 3, except for the output conv which is 1 × 1. Deconv layers are 2 × 2 with a stride of 2, and ReLU [12] is used in hidden layers. On the left, ‘res5’ refers to the fifth stage of ResNet, which has been modified so that the first conv layer operates on a 7 × 7 RoI with a stride of 1 (instead of 14 × 14 with a stride of 2 as in [10]). On the right, ‘×4’ indicates a stack of four consecutive conv layers.
Figure 5: Multi-view point cloud registration: (a) Given N = 6 point clouds, we perform a simple pairwise registration of point cloud fragments of the scanned cattle. (b) We use the Colored ICP algorithm to solve for the coordinate transformation from camera coordinate frame j to camera coordinate frame i (denoted as ^iH_j). Each view is aligned into the coordinate frame of its adjacent camera. We fix the coordinate frame of Camera 1 (V1) as the world coordinate frame and then align all views with respect to coordinate frame 1. (c) This results in a well-aligned point cloud of the scanned cattle.
Figure 6: Comparison of 3D point cloud capture quality with and without synchronization using a large box with known dimensions. The left image displays the results without synchronization (0 μs), capturing a total of 17,098 points. The right image shows the same box captured with synchronization (160 μs) with all other settings the same, resulting in a total of 38,631 points, illustrating the significant improvement in data acquisition quality. (a) Large box, 0 s delay, n = 17,098. (b) Large box, 160 μs delay, n = 38,631.
Figure 7: Results of scanning a cylindrical object in multiple orientations, highlighting the scanner's accuracy across diverse poses. The horizontal axis displays the predicted volumes and surface areas obtained in each test. Given that the same object was used throughout, the ground truth volume and surface area remain constant. This plot demonstrates the scanner's precision, as evidenced by the close alignment of the predicted values with the consistent ground truths, illustrating the system's reliability in varying orientations. (a) Surface area calculation results. (b) Volume calculation results.
Figure 8: Regression analysis of predicted versus known surface area and volume for multiple static objects. The plots display the correlation between the scanner's predicted values and the actual measurements for a cylinder, small box, medium box, and large box, all placed in the same pose across 10 consecutive scans. The high R² values of 0.997 for surface area and 0.999 for volume demonstrate the scanner's accuracy and consistency in various object dimensions and shapes under controlled conditions. (a) Surface area calculation results. (b) Volume calculation results.
Figure 9: Performance of the scanner under direct sunlight, using a standard box to simulate outdoor livestock scanning conditions. The graphs show the mean and standard deviation of volume and surface area measurements across 10 consecutive scans. The results here illustrate the slight impact of sunlight on the scanner's infrared sensors, affecting measurement accuracy. (a) Surface area calculation results from data collected in sunlight. (b) Volume calculation results from data collected in sunlight.
Figure 10: Segmentation of cattle using combined RGB and depth models via Mask R-CNN: The figure shows an RGBD image of cattle segmented using both RGB and depth data. Results from each model are integrated using a voting arbitrator, resulting in a well-defined segmentation in both modalities.
Figure 11: Poisson-reconstructed meshes of cattle from which we compute the surface area and volume estimates.
22 pages, 1932 KiB  
Review
Smart Nursing Wheelchairs: A New Trend in Assisted Care and the Future of Multifunctional Integration
by Zhewen Zhang, Peng Xu, Chengjia Wu and Hongliu Yu
Biomimetics 2024, 9(8), 492; https://doi.org/10.3390/biomimetics9080492 - 14 Aug 2024
Cited by 3 | Viewed by 2036
Abstract
As a significant technological innovation in the fields of medicine and geriatric care, smart care wheelchairs offer a novel approach to providing high-quality care services and improving the quality of care. The aim of this review article is to examine the development, applications and prospects of smart nursing wheelchairs, with particular emphasis on their assistive nursing functions, multiple-sensor fusion technology, and human–machine interaction interfaces. First, we describe the assistive functions of nursing wheelchairs, including position changing, transferring, bathing, and toileting, which significantly reduce the workload of nursing staff and improve the quality of care. Second, we summarize the existing multiple-sensor fusion technology for smart nursing wheelchairs, including LiDAR, RGB-D, ultrasonic sensors, etc. These technologies give wheelchairs autonomy and safety, better meeting patients’ needs. We also discuss the human–machine interaction interfaces of intelligent care wheelchairs, such as voice recognition, touch screens, and remote controls. These interfaces allow users to operate and control the wheelchair more easily, improving usability and maneuverability. Finally, we emphasize the importance of multifunctional-integrated care wheelchairs that integrate assistive care, navigation, and human–machine interaction functions into a comprehensive care solution for users. Looking ahead, we anticipate that smart nursing wheelchairs will play an increasingly important role in medicine and geriatric care. By integrating advanced technologies such as enhanced artificial intelligence, intelligent sensors, and remote monitoring, we expect to further improve patients’ quality of care and quality of life.
Show Figures

Figure 1: The required care services for disabled older people based on the dimensions of their living care needs, their primary care needs, their health needs, and the top five entries for each dimension, including items in the psychological well-being dimension [7,8,10].
Figure 2: Typical assisted-lifting robots. (a) ReChair. (b) HLPR Chair. (c) The AgileLife Patient Transfer System and Strong Arm. (d) The Piggyback Transfer Robot. (e) Integration of an electric care bed and an electric reclining wheelchair. (f) Transferring the patient assisted by slipmat.
Figure 3: Installation of the I-Support system in clinical environment for experimental validation. The devices constituting the overall system are presented. (a) Amphiro b1 water flow and temperature sensor. (b) General aspect of the system showing the motorized chair, the soft robotic arm, and the installation of the Kinect sensors (for audio–gestural communication). (c) Air temperature, humidity, and illumination sensors by Cube Sensors. (d) Smartwatch for user identification and activity tracking.
Figure 4: (a) Robotics Care’s om Poseidon. (b) The multi-functional bathing robot. (c) The actual prototype of intelligent bath care system. (d) Assistive walker with passive sit-to-stand mechanism. (e) Self-reliance transfer support robot for home-based care. (f) Principle prototype of the intelligent toilet wheelchair.
Figure 5: Future trends predicted by ChatGPT on multifunctional intelligent care wheelchairs.
15 pages, 5816 KiB  
Article
Automated Destination Renewal Process for Location-Based Robot Errands
by Woo-Jin Lee and Sang-Seok Yun
Appl. Sci. 2024, 14(13), 5671; https://doi.org/10.3390/app14135671 - 28 Jun 2024
Viewed by 704
Abstract
In this paper, we propose a new approach for service robots to perform delivery tasks in indoor environments, including map-building and the automatic renewal of destinations for navigation. The first step involves converting the available floor plan (i.e., CAD drawing) of a new space into a grid map that the robot can navigate. The system then segments the space in the map and generates movable initial nodes through a generalized Voronoi graph (GVG) thinning process. As the second step, we perform room segmentation from the grid map of the indoor environment and classify each space. Next, when the delivery object is recognized while searching the set space using the laser and RGB-D sensor, the system automatically updates the destination to a position that makes it easier to grasp the object, taking into consideration geometric relationships with surrounding obstacles. Also, the system enables the robot to autonomously explore the space where the user’s errand can be performed by hierarchically linking recognized objects and spatial information. Experiments related to map generation, estimating space from the recognized objects, and destination node updates were conducted using CAD drawings of actual buildings with multiple floors and rooms, and the performance of each stage of the process was evaluated. From the quantitative evaluation of each stage, the proposed system confirmed the potential of partial automation in performing location-based robot services.
(This article belongs to the Section Robotics and Automation)
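The room segmentation stage (see Figure 3 below) is described as erosion of the grid map followed by grouping of free cells. A small sketch of that general morphological recipe, assuming a boolean occupancy grid; the erosion depth and the nearest-seed assignment rule are illustrative choices, not the paper's exact procedure.

```python
# Illustrative morphological room segmentation on an occupancy grid: erode the
# free space until rooms separate, label the surviving blobs as room seeds, and
# assign every free cell to its nearest seed.
import numpy as np
from scipy import ndimage

def segment_rooms(free_space: np.ndarray, erosion_steps: int = 5) -> np.ndarray:
    """free_space: 2D bool grid (True = traversable). Returns int room labels (0 = occupied)."""
    seeds = ndimage.binary_erosion(free_space, iterations=erosion_steps)
    seed_labels, n_rooms = ndimage.label(seeds)              # group the eroded free cells
    if n_rooms == 0:
        return np.zeros(free_space.shape, dtype=int)
    # Assign every cell the label of its closest seed, then mask occupied cells.
    _, (iy, ix) = ndimage.distance_transform_edt(seed_labels == 0, return_indices=True)
    rooms = seed_labels[iy, ix]
    rooms[~free_space] = 0
    return rooms

# Toy example: two rooms connected by a narrow doorway.
grid = np.zeros((20, 40), dtype=bool)
grid[2:18, 2:18] = True          # room A
grid[2:18, 22:38] = True         # room B
grid[9:11, 18:22] = True         # doorway
print(np.unique(segment_rooms(grid)))  # e.g., [0 1 2]
```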
Show Figures

Figure 1: The overall flowchart of the automated destination renewal process.
Figure 2: Example of obstacle update when above the threshold: (a) before the laser scan, (b) after the laser scan.
Figure 3: Example of morphological segmentation: (a) initial map, (b) eroded map, (c) grouped free cells, (d) segmented map.
Figure 4: Example of GVG generated from the grid map.
Figure 5: Hierarchical organization based on the connectivity of the indoor environment.
Figure 6: Example of grid map creation based on a CAD drawing: (a) original CAD drawing, (b) image after removing dimensions and annotations, (c) image transformed into a grid map.
Figure 7: Results of space segmentation for each floor in the engineering building.
Figure 8: Results of grid map update using sensed obstacles: (a) segmented grid map of a single space, (b) grid map after the update, (c) actual room image with furniture.
Figure 9: Experimental results of space classification: (a) conference room, (b) corridor, (c) pantry, (d) hospital room, (e) office, and (f) reception.
Figure 10: Experimental results of space segmentation and classification in each space.
Figure 11: Example of node update of the errand destination according to obstacle change: (a) map without a chair, (b) map with a chair.
18 pages, 6500 KiB  
Article
NSVDNet: Normalized Spatial-Variant Diffusion Network for Robust Image-Guided Depth Completion
by Jin Zeng and Qingpeng Zhu
Electronics 2024, 13(12), 2418; https://doi.org/10.3390/electronics13122418 - 20 Jun 2024
Cited by 1 | Viewed by 996
Abstract
Depth images captured by low-cost three-dimensional (3D) cameras are subject to low spatial density, requiring depth completion to improve 3D imaging quality. Image-guided depth completion aims at predicting dense depth images from extremely sparse depth measurements captured by depth sensors with the guidance of aligned Red–Green–Blue (RGB) images. Recent approaches have achieved a remarkable improvement, but the performance will degrade severely due to the corruption in input sparse depth. To enhance robustness to input corruption, we propose a novel depth completion scheme based on a normalized spatial-variant diffusion network incorporating measurement uncertainty, which introduces the following contributions. First, we design a normalized spatial-variant diffusion (NSVD) scheme to apply spatially varying filters iteratively on the sparse depth conditioned on its certainty measure for excluding depth corruption in the diffusion. In addition, we integrate the NSVD module into the network design to enable end-to-end training of filter kernels and depth reliability, which further improves the structural detail preservation via the guidance of RGB semantic features. Furthermore, we apply the NSVD module hierarchically at multiple scales, which ensures global smoothness while preserving visually salient details. The experimental results validate the advantages of the proposed network over existing approaches with enhanced performance and noise robustness for depth completion in real-use scenarios.
(This article belongs to the Special Issue Image Sensors and Companion Chips)
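The core of the NSVD scheme, as summarised above, is diffusing sparse depth while conditioning on a certainty measure so corrupted samples are excluded. The sketch below shows the classic normalised, certainty-weighted filtering idea with a fixed Gaussian kernel; the published network instead learns spatially varying kernels and the certainty end-to-end, so treat this only as an illustration of the weighting principle.

```python
# Illustrative normalised, certainty-weighted diffusion: convolve both the
# confidence-weighted depth and the confidence itself, then divide, so missing
# or low-certainty samples do not drag down their neighbours.
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_diffusion(sparse_depth, certainty, sigma=2.0, iterations=3):
    """sparse_depth: HxW with zeros where no measurement; certainty: HxW in [0, 1]."""
    depth, conf = sparse_depth.astype(float), certainty.astype(float)
    for _ in range(iterations):
        num = gaussian_filter(depth * conf, sigma)     # diffuse confident depth
        den = gaussian_filter(conf, sigma) + 1e-8      # diffuse confidence mass
        depth = num / den                              # normalise out the gaps
        conf = np.clip(gaussian_filter(conf, sigma), 0.0, 1.0)
    return depth

# Toy example: a gradient depth map sampled at ~5% of its pixels.
rng = np.random.default_rng(0)
gt = np.tile(np.linspace(1.0, 3.0, 64), (64, 1))
mask = rng.random(gt.shape) < 0.05
dense = normalized_diffusion(gt * mask, mask.astype(float))
```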
Show Figures

Figure 1: Example in NYUv2 dataset [12]. (a) RGB image input, (b) sparse depth input, depth estimation with (c) PNCNN [6] using single depth, (d) MiDaS [7] using single RGB, (e) NLSPN [11], and (f) proposed NSVDNet using both RGB and depth. As highlighted in the black rectangles, (f) NSVDNet generates more accurate structural details than (e) NLSPN due to the uncertainty-aware diffusion scheme. The results are evaluated using the RMSE metric, where (f) NSVDNet achieves the smallest RMSE, indicating improved accuracy.
Figure 2: An overview of NSVDNet architecture to predict a dense depth from a disturbed sparse depth with RGB guidance. NSVDNet is composed of the depth-dominant branch, which estimates the initial dense depth from the sparse sensor depth, and the RGB-dominant branch, which generates the semantic structural features. The two branches are fused in the hierarchical NSVD modules, where the initial dense depth is diffused with spatial-variant diffusion kernels constructed from RGB features.
Figure 3: Depth completion with different algorithms, tested on the NYUv2 dataset. As highlighted in the red rectangles, the proposed NSVDNet achieves more accurate depth completion results with detail preservation and noise robustness.
Figure 4: Comparison of depth completion with original sparse depth and noisy sparse depth with 50% outliers, tested on the NYUv2 dataset. The comparison between results with original and noisy inputs demonstrates the robustness to input corruption for the proposed method. The selected patches are enlarged in the colored rectangles.
Figure 5: Generalization ability evaluation tests on the TetrasRGBD dataset with outliers. The certainty maps explain the robustness of NSVDNet to input corruptions.
Figure 6: Generalization ability evaluation tests on the TetrasRGBD dataset with real sensor data, where the proposed NSVDNet generates more accurate depth estimation than competitive methods, including PNCNN [38] and NLSPN [11].
26 pages, 19577 KiB  
Article
Enhancing Building Point Cloud Reconstruction from RGB UAV Data with Machine-Learning-Based Image Translation
by Elisabeth Johanna Dippold and Fuan Tsai
Sensors 2024, 24(7), 2358; https://doi.org/10.3390/s24072358 - 8 Apr 2024
Cited by 1 | Viewed by 1594
Abstract
The performance of three-dimensional (3D) point cloud reconstruction is affected by dynamic features such as vegetation. Vegetation can be detected by near-infrared (NIR)-based indices; however, the sensors providing multispectral data are resource intensive. To address this issue, this study proposes a two-stage framework to firstly improve the performance of the 3D point cloud generation of buildings with a two-view SfM algorithm, and secondly, reduce noise caused by vegetation. The proposed framework can also overcome the lack of near-infrared data when identifying vegetation areas for reducing interference in the SfM process. The first stage includes cross-sensor training, model selection and the evaluation of image-to-image RGB to color infrared (CIR) translation with Generative Adversarial Networks (GANs). The second stage includes feature detection with multiple feature detector operators, feature removal with respect to the NDVI-based vegetation classification, masking, matching, pose estimation and triangulation to generate sparse 3D point clouds. The materials utilized in both stages are a publicly available RGB-NIR dataset, and satellite and UAV imagery. The experimental results indicate that the cross-sensor and category-wise validation achieves an accuracy of 0.9466 and 0.9024, with a kappa coefficient of 0.8932 and 0.9110, respectively. The histogram-based evaluation demonstrates that the predicted NIR band is consistent with the original NIR data of the satellite test dataset. Finally, the test on the UAV RGB and artificially generated NIR with a segmentation-driven two-view SfM proves that the proposed framework can effectively translate RGB to CIR for NDVI calculation. Further, the artificially generated NDVI is able to segment and classify vegetation. As a result, the generated point cloud is less noisy, and the 3D model is enhanced.
(This article belongs to the Section Sensing and Imaging)
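The second stage relies on NDVI computed from the predicted NIR band to discard features on vegetation before matching. A short sketch of that step, assuming float image bands and keypoints given as (x, y) pixel coordinates; the threshold value is only indicative of the 0.5–0.6 range mentioned in the figure captions.

```python
# Illustrative NDVI-based vegetation filtering of detected keypoints.
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), both bands as float arrays."""
    return (nir - red) / (nir + red + 1e-8)

def remove_vegetation_features(keypoints_xy: np.ndarray, ndvi_map: np.ndarray,
                               threshold: float = 0.5) -> np.ndarray:
    """Keep only keypoints whose pixel has NDVI below the vegetation threshold."""
    cols = np.round(keypoints_xy[:, 0]).astype(int)
    rows = np.round(keypoints_xy[:, 1]).astype(int)
    keep = ndvi_map[rows, cols] < threshold
    return keypoints_xy[keep]

# Example with synthetic bands and random keypoints.
h, w = 256, 256
nir_band, red_band = np.random.rand(h, w), np.random.rand(h, w)
kps = np.random.rand(500, 2) * [w - 1, h - 1]
filtered = remove_vegetation_features(kps, ndvi(nir_band, red_band))
```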
Show Figures

Figure 1

Figure 1
<p>Two-stage framework separated into the machine learning technique (gray) and the application in three steps (green). Firstly (gray), the CIR image is generated from RGB with image-to-image translation. Then (light green), the NDVI is calculated with the generated NIR and red band. Afterwards, (medium green), the NDVI segmentation and classification is used to match the detected features accordingly. Finally (dark green), pose estimation and triangulation are used to generate a sparse 3D point cloud.</p>
Full article ">Figure 2
<p>First stage of the two-stage workflow. (<b>a</b>) Image-to-image translation in 5 steps for RGB2CIR simulation. In general, input and pre-processing (orange), training and testing (green) and verification and validation (yellow) (<b>b</b>) Image-to-image translation training.</p>
Full article ">Figure 3
<p>Framework second stage: segmentation-driven two-view SfM algorithm. The processing steps are grouped by color, the NDVI related processing (green), the input, feature detection (orange), feature processing (yellow) and the output (blue).</p>
Full article ">Figure 4
<p>Pleiades VHR satellite imagery, with the nadir view in true color (RGB). The location of the study target is marked in orange and used for validation (see <a href="#sec3dot2dot3-sensors-24-02358" class="html-sec">Section 3.2.3</a>).</p>
Full article ">Figure 5
<p>The target for validation captured by Pleiades VHR satellite. (<b>a</b>) The target stadium; (<b>b</b>) the geolocation of the target (marked in orange in <a href="#sensors-24-02358-f004" class="html-fig">Figure 4</a>); (<b>c</b>) the target ground truth (GT) CIR image. GT NDVI of the target building and its vicinity.</p>
Full article ">Figure 6
<p>Morphological changes on the image covering the target and image tiles. (<b>a</b>) Original cropped CIR image of Pleiades Satellite Imagery (1024 × 1024 × 3). A single tile, the white rectangle in (<b>a</b>), is shown as (<b>e</b>). (<b>b</b>–<b>d</b>) and (<b>f</b>–<b>i</b>) are the morphed images of (<b>a</b>) and (<b>e</b>), respectively.</p>
Full article ">Figure 7
<p>Training over 200 epochs for model selection. The generator loss (loss GEN) plotted in orange and, in contrast, FID calculation results in blue.</p>
Full article ">Figure 8
<p>Training Pix2Pix for model selection with FID. The epochs with the best FID and CM are marked for every test run, expect overall, with colored bars respectivly. The numbers are summarized in <a href="#sensors-24-02358-t005" class="html-table">Table 5</a>.</p>
Full article ">Figure 9
<p>CIR pansharpening on the target. The high-resolution panchromatic image is used to increase the resolution of the composite CIR image while preserving spectral information. From top to bottom, (<b>a</b>) panchromatic, (<b>b</b>) color infrared created from multi-spectral bands, and (<b>c</b>) pansharpened color infrared are shown.</p>
Full article ">Figure 10
<p>Example of vegetation feature removal to the north of the stadium. (<b>a</b>) CIR images; (<b>b</b>) NDVI image with legend; (<b>c</b>) identified SURF features (yellos asterisks) within dense vegetated areas (green) using 0.6 as the threshold.</p>
Full article ">Figure 11
<p>Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI (incl. legend) of the main target (a stadium) and vicinity.</p>
Full article ">Figure 12
<p>Comparison between the prediction and the ground truth (GT) of the CIR, NIR and NDVI generated from a pansharpened RGB satellite sub-image.</p>
Full article ">Figure 13
<p>Histogram and visual inspection of the CIR and NDVI simulated using MS and PAN images on the target stadium. (<b>a</b>–<b>c</b>) Ground truth (GT) and NDVI predicted using one tile with the size of 256 × 256 from MS Pleiades and their histograms. (<b>d</b>–<b>f</b>) Ground truth of CIR, NIR, NDVI and predicted NIR and NDVI images from nine tiles of the PAN Pleiades images and histograms for NDVI comparison.</p>
Full article ">Figure 14
<p>Histogram and visual inspection of MS (<b>I</b>–<b>III</b>) and PAN (<b>IV</b>–<b>VI</b>) examples of Zhubei city.</p>
Full article ">Figure 15
<p>Prediction of CIR, NIR and calculated NDVI of a UAV scene: (<b>a</b>) RGB, (<b>b</b>) predicted CIR image, (<b>c</b>) the extracted NIR band of (<b>b</b>), and (<b>d</b>) calculated NDVI with NIR and red band. A close-up view of the area marked with an orange box in (<b>a</b>) is displayed as two 256 × 256 tiles in RGB (<b>e</b>) and the predicted CIR (<b>f</b>).</p>
Full article ">Figure 15 Cont.
<p>Prediction of CIR, NIR and calculated NDVI of a UAV scene: (<b>a</b>) RGB, (<b>b</b>) predicted CIR image, (<b>c</b>) the extracted NIR band of (<b>b</b>), and (<b>d</b>) calculated NDVI with NIR and red band. A close-up view of the area marked with an orange box in (<b>a</b>) is displayed as two 256 × 256 tiles in RGB (<b>e</b>) and the predicted CIR (<b>f</b>).</p>
Full article ">Figure 16
<p>Direct comparison between without (<b>a</b>) and with vegetation segmentation (<b>b</b>). Areas of low density shown in blue, areas of high density shown in red.</p>
Full article ">Figure 17
<p>Two−view SfM 3D sparse point cloud without the application of NDVI−based vegetation removal on the target CSRSR. (<b>a</b>) Sparse point cloud with no further coloring; (<b>b</b>) point cloud colored by elevation; (<b>c</b>) density analysis and the corresponding histogram (<b>d</b>). In addition, Table (<b>e</b>) shows the accumulated number of points over the three operators (SURF, ORB and FAST) and the initial and manually cleaned and processed point cloud.</p>
Full article ">Figure 18
<p>Two-view SfM reconstructed 3D sparse point cloud with vegetation segmentation and removal process based on simulated NDVI of the target building. (<b>a</b>) Sparse point cloud with no further coloring; (<b>b</b>) point cloud colored by elevation; (<b>c</b>) density analysis and (<b>d</b>) the histogram. In addition, (<b>e</b>) lists the accumulated number of points over the three operators (SURF, ORB and FAST) after segmentation, with 0.5 NDVI as the threshold to mask vegetation in SURF and ORB, and the initial and manually cleaned point cloud.</p>
Full article ">
17 pages, 30409 KiB  
Article
Data Fusion of RGB and Depth Data with Image Enhancement
by Lennard Wunsch, Christian Görner Tenorio, Katharina Anding, Andrei Golomoz and Gunther Notni
J. Imaging 2024, 10(3), 73; https://doi.org/10.3390/jimaging10030073 - 21 Mar 2024
Cited by 2 | Viewed by 2467
Abstract
Since 3D sensors became popular, imaged depth data are easier to obtain in the consumer sector. In applications such as defect localization on industrial objects or mass/volume estimation, precise depth data is important and, thus, benefits from the usage of multiple information sources. [...] Read more.
Since 3D sensors became popular, imaged depth data are easier to obtain in the consumer sector. In applications such as defect localization on industrial objects or mass/volume estimation, precise depth data is important and, thus, benefits from the usage of multiple information sources. A combination of RGB images and depth images can not only improve our understanding of objects, allowing one to gain more information about them, but also enhance data quality. Combining different camera systems through data fusion can yield higher-quality data, since the disadvantages of one sensor can be compensated by another. Data fusion itself consists of data preparation and data registration. A challenge in data fusion is the differing resolutions of the sensors; therefore, up- and downsampling algorithms are needed. This paper compares multiple up- and downsampling methods, such as different direct interpolation methods, joint bilateral upsampling (JBU), and Markov random fields (MRFs), in terms of their potential to create RGB-D images and improve the quality of depth information. In contrast to the literature, in which imaging systems are adjusted to acquire data of the same section simultaneously, the laboratory setup in this study was based on conveyor-based optical sorting processes, and therefore, the data were acquired at different times and at different spatial locations, making data assignment and data cropping necessary. To evaluate the results, the root mean square error (RMSE), signal-to-noise ratio (SNR), correlation (CORR), universal quality index (UQI), and contour offset are monitored. JBU outperformed the other upsampling methods, achieving a mean RMSE = 25.22, mean SNR = 32.80, mean CORR = 0.99, and mean UQI = 0.97. Full article
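The quality measures named in this abstract (RMSE, SNR, CORR, UQI) are standard full-reference metrics and can be computed directly from a fused depth map and a reference. The NumPy sketch below only illustrates those textbook formulas; the variable names are placeholders and the global single-window form of UQI is used here, whereas UQI is often evaluated over sliding windows and averaged.

```python
import numpy as np

def depth_quality_metrics(fused, reference):
    """Standard full-reference quality measures between two depth maps."""
    f = np.asarray(fused, dtype=float).ravel()
    r = np.asarray(reference, dtype=float).ravel()
    rmse = np.sqrt(np.mean((f - r) ** 2))
    snr = 10.0 * np.log10(np.sum(r ** 2) / np.sum((f - r) ** 2))   # in dB
    corr = np.corrcoef(f, r)[0, 1]                                  # Pearson correlation
    # Universal Quality Index (Wang & Bovik), global form:
    # Q = 4 * cov * mean_f * mean_r / ((var_f + var_r) * (mean_f^2 + mean_r^2))
    mf, mr = f.mean(), r.mean()
    vf, vr = f.var(), r.var()
    cov = np.mean((f - mf) * (r - mr))
    uqi = (4 * cov * mf * mr) / ((vf + vr) * (mf ** 2 + mr ** 2) + 1e-12)
    return {"RMSE": rmse, "SNR": snr, "CORR": corr, "UQI": uqi}
```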
(This article belongs to the Section Image and Video Processing)
Show Figures

Figure 1
<p>Setup used for 3D and RGB imaging.</p>
Full article ">Figure 2
<p>Example objects utilized: (<b>a</b>) stone-A, (<b>b</b>) stone-B, and the (<b>c</b>) validation coin.</p>
Full article ">Figure 3
<p>Results of stone one (object). (<b>a</b>) False color images of depth shadows imaged by the 3D laser line scanner (height legend: yellow &gt; 0 mm, and purple = 0 mm) and (<b>b</b>) the improved depth map.</p>
Full article ">Figure 4
<p>Flowchart of the data acquisition process.</p>
Full article ">Figure 5
<p>Flowchart of the data fusion process.</p>
Full article ">Figure 6
<p>RGB-D point cloud of the validation coin fused by JBU.</p>
Full article ">Figure 7
<p>RGB-D point cloud after data fusion, for example, stones-A (<b>a</b>) and -B (<b>b</b>) fused by JBU.</p>
Full article ">Figure 8
<p>Results of object stone-A. False color image depth maps during the fusion process (<b>a</b>,<b>c</b>,<b>e</b>,<b>g</b>) and the edge correlation of the depth map and RGB image (<b>b</b>,<b>d</b>,<b>f</b>,<b>h</b>). (<b>a</b>,<b>b</b>) show the process result of the cropped depth data, with minimized depth shadows; (<b>c</b>,<b>d</b>) show the process result of the depth data after the synthesis; (<b>e</b>,<b>f</b>) show the process result after JBU data fusion; (<b>g</b>,<b>h</b>) show the process result after the feature extraction process.</p>
Full article ">
15 pages, 1207 KiB  
Article
From Movements to Metrics: Evaluating Explainable AI Methods in Skeleton-Based Human Activity Recognition
by Kimji N. Pellano, Inga Strümke and Espen A. F. Ihlen
Sensors 2024, 24(6), 1940; https://doi.org/10.3390/s24061940 - 18 Mar 2024
Cited by 4 | Viewed by 1503
Abstract
The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human–computer interaction. This paper tackles a well-known gap in the field, which is the lack of testing in the applicability [...] Read more.
The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human–computer interaction. This paper tackles a well-known gap in the field, which is the lack of testing in the applicability and reliability of XAI evaluation metrics in the skeleton-based HAR domain. We have tested established XAI metrics, namely faithfulness and stability, on Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM), to address this problem. This study introduces a perturbation method that produces variations within the error tolerance of motion sensor tracking, ensuring that the resultant skeletal data points remain within the plausible output range of human movement as captured by the tracking device. We used the NTU RGB+D 60 dataset and the EfficientGCN architecture for HAR model training and testing. The evaluation involved systematically perturbing the 3D skeleton data by applying controlled displacements at different magnitudes to assess the impact on XAI metric performance across multiple action classes. Our findings reveal that faithfulness may not consistently serve as a reliable metric across all classes for the EfficientGCN model, indicating its limited applicability in certain contexts. In contrast, stability proves to be a more robust metric, showing dependability across different perturbation magnitudes. Additionally, CAM and Grad-CAM yielded almost identical explanations, leading to closely similar metric outcomes. This suggests a need for the exploration of additional metrics and the application of more diverse XAI methods to broaden the understanding and effectiveness of XAI in skeleton-based HAR. Full article
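The perturbation described here (and illustrated in the entry's Figure 1) displaces each 3D joint by a fixed magnitude r in a random direction given by spherical coordinates. The sketch below is a reconstruction of that idea, not the authors' code; the uniform-on-the-sphere sampling of the polar angle is an assumption on my part.

```python
import numpy as np

def perturb_joint(p, r, rng=None):
    """Displace a 3D joint p = (x, y, z) by magnitude r in a random direction.

    A random azimuthal angle theta and polar angle phi pick a direction on a
    sphere of radius r, so the perturbed joint stays within a fixed distance
    of the original tracking output.
    """
    rng = rng or np.random.default_rng()
    theta = rng.uniform(0.0, 2.0 * np.pi)        # azimuthal angle
    phi = np.arccos(rng.uniform(-1.0, 1.0))      # polar angle, uniform over the sphere
    offset = r * np.array([np.sin(phi) * np.cos(theta),
                           np.sin(phi) * np.sin(theta),
                           np.cos(phi)])
    return np.asarray(p, dtype=float) + offset
```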
Show Figures

Figure 1
<p>Illustration of perturbing a point P(x, y, z) in 3D space to a new position P′(x′, y′, z′) using spherical coordinates. The perturbation magnitude is represented by <span class="html-italic">r</span>, with azimuthal angle <math display="inline"><semantics> <mi>θ</mi> </semantics></math> and polar angle <math display="inline"><semantics> <mi>ϕ</mi> </semantics></math>.</p>
Full article ">Figure 2
<p>The EfficientGCN pipeline showing the variables for calculating faithfulness and stability. Perturbation is performed in the Data Preprocess stage.</p>
Full article ">Figure 3
<p>(<b>left</b>) CAM, Grad-CAM, and baseline random attributions for a data instance in ‘standing up’ (class 8), averaged for all frames and normalized. The color gradient denotes the score intensity: blue indicates 0, and progressing to red indicates a score of 1; (<b>right</b>) the numerical values of the attribution scores, with k denoting the body point number.</p>
Full article ">Figure 4
<p>Evaluation metric outcomes for ‘Writing’ (Class 11, i.e., the weakest class), showing CAM (blue), Grad-CAM (orange), and the random (green) methods for (<b>a</b>) PGI, (<b>b</b>) PGU, (<b>c</b>) RISb, (<b>d</b>) RISj, (<b>e</b>) RISv, (<b>f</b>) ROS, and (<b>g</b>) RRS. The <span class="html-italic">y</span>-axis measures the metric values, while the <span class="html-italic">x</span>-axis shows the perturbation magnitude. CAM and Grad-CAM graphs overlap due to extremely similar metric outcomes.</p>
Full article ">Figure 5
<p>Evaluation metric outcomes for ‘Jump Up’ (Class 26, i.e., the strongest class), showing CAM (blue), Grad-CAM (orange), and the random (green) methods for (<b>a</b>) PGI, (<b>b</b>) PGU, (<b>c</b>) RISb, (<b>d</b>) RISj, (<b>e</b>) RISv, (<b>f</b>) ROS, and (<b>g</b>) RRS. The <span class="html-italic">y</span>-axis measures the metric values, while the <span class="html-italic">x</span>-axis shows the perturbation magnitude. CAM and Grad-CAM graphs overlap due to extremely similar metric outcomes.</p>
Full article ">Figure A1
<p>Evaluation metric outcomes for ‘checking time on watch’ (Class 32), showing CAM (blue), Grad-CAM (orange), and the random (green) methods for (<b>a</b>) PGI, (<b>b</b>) PGU, (<b>c</b>) RISb, (<b>d</b>) RISj, (<b>e</b>) RISv, (<b>f</b>) ROS, and (<b>g</b>) RRS. Despite increasing perturbation magnitudes, CAM and Grad-CAM exhibit only marginally better performance compared with the random method in terms of PGI. This suggests that the expected correlation between increasing the perturbation magnitude of important features and significant changes in prediction output may not consistently apply to this particular case. Conversely, for PGU, CAM and Grad-CAM demonstrate more effective identification of unimportant features compared with the random method. As perturbation magnitude increases, the random method results in a significantly larger discrepancy between the prediction probabilities of the original and perturbed data.</p>
Full article ">Figure A2
<p>Evaluation metric outcomes for ‘clapping’ (Class 9), showing CAM (blue), Grad-CAM (orange), and the random (green) methods for (<b>a</b>) PGI, (<b>b</b>) PGU, (<b>c</b>) RISb, (<b>d</b>) RISj, (<b>e</b>) RISv, (<b>f</b>) ROS, and (<b>g</b>) RRS. Similar to class 32, the PGI results for class 9 show only a slight difference in performance between CAM and Grad-CAM versus the random method, even as perturbation magnitude increases. The PGU results still echo those in class 32, with CAM and Grad-CAM outperforming the random method in distinguishing unimportant features.</p>
Full article ">
16 pages, 9434 KiB  
Article
Omnidirectional-Sensor-System-Based Texture Noise Correction in Large-Scale 3D Reconstruction
by Wenya Xie and Xiaoping Hong
Sensors 2024, 24(1), 78; https://doi.org/10.3390/s24010078 - 22 Dec 2023
Viewed by 1186
Abstract
The evolution of cameras and LiDAR has propelled the techniques and applications of three-dimensional (3D) reconstruction. However, due to inherent sensor limitations and environmental interference, the reconstruction process often entails significant texture noise, such as specular highlight, color inconsistency, and object occlusion. Traditional [...] Read more.
The evolution of cameras and LiDAR has propelled the techniques and applications of three-dimensional (3D) reconstruction. However, due to inherent sensor limitations and environmental interference, the reconstruction process often entails significant texture noise, such as specular highlight, color inconsistency, and object occlusion. Traditional methodologies struggle to mitigate such noise, particularly in large-scale scenes, due to the voluminous data produced by imaging sensors. In response, this paper introduces an omnidirectional-sensor-system-based texture noise correction framework for large-scale scenes, which consists of three parts. Initially, we obtain a colored point cloud with luminance values by organizing the LiDAR points and RGB images. Next, we apply a voxel hashing algorithm during geometry reconstruction to accelerate computation and save computer memory. Finally, we propose the key innovation of our paper, the frame-voting rendering and neighbor-aided rendering mechanisms, which effectively eliminate the aforementioned texture noise. In the experiments, a processing rate of one million points per second demonstrates real-time applicability, and the texture-optimization outputs exhibit a significant reduction in texture noise. These results indicate that our framework has advanced performance in correcting multiple types of texture noise in large-scale 3D reconstruction. Full article
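The frame-voting idea in this abstract can be pictured as follows: each 3D point is observed in several frames, and those observations vote on its final color so that view-dependent artefacts such as specular highlights are rejected. The snippet below is a hedged reconstruction of that concept (a per-point vote based on median luminance), not the authors' exact mechanism; the luminance-plus-RGB observation format is assumed for illustration.

```python
import numpy as np

def frame_vote_color(observations):
    """Vote on one point's color from its observations in several frames.

    observations : array of shape (n_frames, 4) holding (L, R, G, B) per frame,
                   where L is the CIELAB luminance of the sampled pixel.
    A specular highlight inflates L in only a minority of frames, so the
    observation whose luminance is closest to the median wins the vote.
    """
    obs = np.asarray(observations, dtype=float)
    winner = np.argmin(np.abs(obs[:, 0] - np.median(obs[:, 0])))
    return obs[winner, 1:]   # the voted RGB color
```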
(This article belongs to the Special Issue Sensing and Processing for 3D Computer Vision: 2nd Edition)
Show Figures

Figure 1
<p>(<b>a</b>) Specular highlight phenomenon. (<b>b</b>) The position of the highlight areas in the image changes with the variation of the sensor pose. In the image, the red box indicates the most prominent highlight noise, and the green box indicates the door, which serves as a positional reference.</p>
Full article ">Figure 2
<p>Color inconsistency phenomenon. P1–P3 are three consecutive images in terms of position. (<b>a</b>) Normal situation with consistent color between frames. (<b>b</b>) Inconsistent color between frames caused by variations in the intensity of the light source or changes in its relative position to the sensor.</p>
Full article ">Figure 3
<p>Pipeline of the whole process, consisting of data organization, geometry reconstruction, and texture optimization.</p>
Full article ">Figure 4
<p>Process flow of data organization. (<b>a</b>) RGB image. (<b>b</b>) CIELAB color space image transformed from RGB image, which facilitates luminance evaluation in the subsequent section of our work. (<b>c</b>) LiDAR point cloud. (<b>d</b>) Fusion of LiDAR point cloud with RGB image. (<b>e</b>) Fusion of LiDAR point cloud with CIELAB color space image.</p>
Full article ">Figure 5
<p>Voxel hashing schematic. The mapping between point coordinates and voxel block indices is achieved through a hash table, thereby efficiently allocating points while making reasonable use of computer storage resources.</p>
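The voxel-hashing scheme sketched in this figure maps a point's integer voxel coordinate to a bucket through a hash function so that only occupied voxel blocks consume memory. The minimal dictionary-based sketch below is an illustration of that general technique; the voxel size and the three large hash primes (common in the spatial-hashing literature) are assumptions, not values from the paper.

```python
from collections import defaultdict

def voxel_coord(point, voxel_size=0.05):
    """Integer voxel coordinate containing a 3D point (voxel_size in metres)."""
    return tuple(int(c // voxel_size) for c in point)

def voxel_hash(vx, vy, vz, table_size=2**20):
    """Spatial hash of an integer voxel coordinate using three large primes."""
    return ((vx * 73856093) ^ (vy * 19349663) ^ (vz * 83492791)) % table_size

# The hash table maps a bucket index to the voxel blocks (and their points)
# that fall into it, so unoccupied space is never allocated.
buckets = defaultdict(dict)
for p in [(0.01, 0.02, 0.03), (0.012, 0.021, 0.031), (1.0, 1.0, 1.0)]:
    v = voxel_coord(p)
    buckets[voxel_hash(*v)].setdefault(v, []).append(p)
```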
Full article ">Figure 6
<p>Motivation for proposing the neighbor-aided rendering mechanism: points are randomly distributed among voxels; thus, some voxels lack sufficient points for self-optimization.</p>
Full article ">Figure 7
<p>Neighbor-aided rendering mechanism. The figure illustrates the configuration of a voxel block and the interconnections between adjacent voxels.</p>
Full article ">Figure 8
<p>Sensor setup for data collection.</p>
Full article ">Figure 9
<p>Input data. The dataset consists of four spots, and each spot consists of five specified poses.</p>
Full article ">Figure 10
<p>Highlight noise correction in scene 1 using frame-voting rendering. Regions (<b>a</b>)–(<b>c</b>) show the specular highlight phenomenon on the screen and wall surfaces in the scene.</p>
Full article ">Figure 11
<p>Elimination of object occlusion in scene 2 with frame-voting rendering. (<b>a</b>) Comparison diagram of the elimination of misimaging caused by table occlusion. (<b>b</b>) Comparison diagram of the elimination of misimaging caused by chair occlusion.</p>
Full article ">Figure 12
<p>Enhanced outcome with neighbor-aided optimization. Regions A–C exhibit pronounced contrast. (<b>a</b>) Demonstration area of the original point cloud containing numerous types of texture noise. (<b>b</b>) The result optimized using only frame-voting rendering. (<b>c</b>) The result optimized further with neighbor-aided rendering.</p>
Full article ">Figure 13
<p>Comparing results of highlight removal method. (<b>a</b>) Projection of raw model (input). The white boxes indicate areas with noise that should be corrected. The red box indicates area that should not be corrected (lights). (<b>b</b>) Projection of texture optimized model (ours). (<b>c</b>) Yang et al. (2010) [<a href="#B2-sensors-24-00078" class="html-bibr">2</a>]. (<b>d</b>) Shen et al. (2013) [<a href="#B3-sensors-24-00078" class="html-bibr">3</a>]. (<b>e</b>) Fu et al. (2019) [<a href="#B4-sensors-24-00078" class="html-bibr">4</a>]. (<b>f</b>) Jin et al. (2023) [<a href="#B8-sensors-24-00078" class="html-bibr">8</a>].</p>
Full article ">
24 pages, 8567 KiB  
Article
Multi-Sensor Fusion Simultaneous Localization Mapping Based on Deep Reinforcement Learning and Multi-Model Adaptive Estimation
by Ching-Chang Wong, Hsuan-Ming Feng and Kun-Lung Kuo
Sensors 2024, 24(1), 48; https://doi.org/10.3390/s24010048 - 21 Dec 2023
Cited by 6 | Viewed by 3156
Abstract
In this study, we designed a multi-sensor fusion technique based on deep reinforcement learning (DRL) mechanisms and multi-model adaptive estimation (MMAE) for simultaneous localization and mapping (SLAM). The LiDAR-based point-to-line iterative closest point (PLICP) and RGB-D camera-based ORBSLAM2 methods were utilized to estimate [...] Read more.
In this study, we designed a multi-sensor fusion technique based on deep reinforcement learning (DRL) mechanisms and multi-model adaptive estimation (MMAE) for simultaneous localization and mapping (SLAM). The LiDAR-based point-to-line iterative closest point (PLICP) and RGB-D camera-based ORBSLAM2 methods were utilized to estimate the localization of mobile robots. Residual-value anomaly detection was combined with a Proximal Policy Optimization (PPO)-based DRL model to optimally adjust the weights among the different localization algorithms. Two kinds of indoor simulation environments were established using the Gazebo simulator to validate the localization performance of the proposed multi-model adaptive estimation. The experimental results confirmed that the proposed method can effectively fuse the localization information from multiple sensors and enable mobile robots to obtain higher localization accuracy than the traditional PLICP and ORBSLAM2. It was also found that the proposed method increases the localization stability of mobile robots in complex environments. Full article
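At its core, the fusion step described here amounts to blending the PLICP and ORBSLAM2 position estimates with weights that reflect how trustworthy each source currently appears (e.g., judged from its residuals). The sketch below shows that weighted-combination idea in its simplest form; the inverse-residual weighting rule is a stand-in assumption for the PPO-trained MMAE weight adjustment described in the paper.

```python
import numpy as np

def fuse_estimates(x_plicp, x_orb, res_plicp, res_orb, eps=1e-6):
    """Blend two pose estimates with weights inversely related to their residuals.

    x_plicp, x_orb     : position estimates (e.g., np.array([x, y])) from each front end
    res_plicp, res_orb : recent residual magnitudes used as a proxy for reliability
    In the paper, the weights come from a PPO-trained policy within an MMAE scheme;
    here a simple inverse-residual rule stands in for that learned adjustment.
    """
    w_plicp = 1.0 / (abs(res_plicp) + eps)
    w_orb = 1.0 / (abs(res_orb) + eps)
    return (w_plicp * np.asarray(x_plicp) + w_orb * np.asarray(x_orb)) / (w_plicp + w_orb)
```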
(This article belongs to the Section Sensors and Robotics)
Show Figures

Figure 1
<p>Architecture diagram of the proposed multi-sensor fusion-based simultaneous localization and mapping (SLAM) localization system.</p>
Full article ">Figure 2
<p>PLICP flow chart.</p>
Full article ">Figure 3
<p>UKF trajectory of PLICP. (<b>a</b>) Full trajectory. (<b>b</b>) <span class="html-italic">x</span>-axis trajectory. (<b>c</b>) <span class="html-italic">y</span>-axis trajectory.</p>
Full article ">Figure 4
<p>UKF trajectory of ORBSLAM2. (<b>a</b>) Full trajectory. (<b>b</b>) <span class="html-italic">x</span>-axis trajectory. (<b>c</b>) <span class="html-italic">y</span>-axis trajectory.</p>
Full article ">Figure 5
<p>Two-Tailed validation.</p>
Full article ">Figure 6
<p>Original multi-model estimation schematic.</p>
Full article ">Figure 7
<p>Mobile robot oscillates when turning a corner. (<b>a</b>) <span class="html-italic">x</span>-axis trajectory of PLICP. (<b>b</b>) <span class="html-italic">x</span>-axis residual of PLICP. (The red circles mark the turning section.)</p>
Full article ">Figure 8
<p>Multi-model adaptive estimation architecture.</p>
Full article ">Figure 9
<p><span class="html-italic">x</span>-axis trajectory.</p>
Full article ">Figure 10
<p><span class="html-italic">y</span>-axis trajectory.</p>
Full article ">Figure 11
<p><span class="html-italic">x</span>-axis residual value. (<b>a</b>) <span class="html-italic">x</span>-axis difference. (<b>b</b>) <span class="html-italic">x</span>-axis residual of PLICP. (<b>c</b>) <span class="html-italic">x</span>-axis residual of ORBSLAM2.</p>
Full article ">Figure 12
<p><span class="html-italic">y</span>-axis residual value. (<b>a</b>) <span class="html-italic">y</span>-axis difference. (<b>b</b>) <span class="html-italic">y</span>-axis residual of PLICP. (Red circle refer to the unstable response) (<b>c</b>) <span class="html-italic">y</span>-axis residual of ORBSLAM2. (Red circle refer to the stable response).</p>
Full article ">Figure 13
<p>Mobile robot oscillates when turning a corner. (<b>a</b>) <span class="html-italic">z</span>-value of PLICP. (<b>b</b>) <span class="html-italic">z</span>-value of ORBSLAM2.</p>
Full article ">Figure 14
<p>y-axis weight adjustment.</p>
Full article ">Figure 15
<p>Overall trajectory of PLICP failure events.</p>
Full article ">Figure 16
<p>DRL structure.</p>
Full article ">Figure 17
<p>Result of reward function.</p>
Full article ">Figure 18
<p>Simulation Scene 1.</p>
Full article ">Figure 19
<p>Occupancy grid mapping for Simulation Scene 1.</p>
Full article ">Figure 20
<p>Minimal error map for original PLICP area. (The green point is the start point of the mobile robot, the red point is the end point of the mobile robot, and the red box is the main area that causes the error).</p>
Full article ">Figure 21
<p>Minimal error map of PLICP area after improvement. (The green point is the start point of the mobile robot, the red point is the end point of the mobile robot).</p>
Full article ">Figure 22
<p>Simulation Scene 2.</p>
Full article ">Figure 23
<p>Occupancy grid mapping for Simulation Scene 2.</p>
Full article ">Figure 24
<p>Minimal error map for original PLICP area in Scene 2. (The green point is the start point of the mobile robot, the red point is the end point of the mobile robot, and the red box is the main area that causes the error).</p>
Full article ">Figure 25
<p>Minimal error map of PLICP area after improvement in Scene 2. (The green point is the start point of the mobile robot, the red point is the end point of the mobile robot).</p>
Full article ">
17 pages, 7251 KiB  
Article
Depth Map Super-Resolution Based on Semi-Couple Deformable Convolution Networks
by Botao Liu, Kai Chen, Sheng-Lung Peng and Ming Zhao
Mathematics 2023, 11(21), 4556; https://doi.org/10.3390/math11214556 - 5 Nov 2023
Cited by 2 | Viewed by 1566
Abstract
Depth images obtained from lightweight, real-time depth estimation models and consumer-oriented sensors typically have low-resolution issues. Traditional interpolation methods for depth image up-sampling result in a significant information loss, especially in edges with discontinuous depth variations (depth discontinuities). To address this issue, this [...] Read more.
Depth images obtained from lightweight, real-time depth estimation models and consumer-oriented sensors typically have low-resolution issues. Traditional interpolation methods for depth image up-sampling result in a significant information loss, especially in edges with discontinuous depth variations (depth discontinuities). To address this issue, this paper proposes a semi-coupled deformable convolution network (SCD-Net) based on the idea of guided depth map super-resolution (GDSR). The method employs a semi-coupled feature extraction scheme to learn unique and similar features between RGB images and depth images. We utilize Coordinate Attention (CA) to suppress redundant information in the RGB features. Finally, a deformable convolution module is employed to restore the original resolution of the depth image. The model is tested on NYUv2, Middlebury, Lu, and a Real-sense real-world dataset created using an Intel Real-sense D455 structured-light camera. The super-resolution accuracy of SCD-Net at multiple scales is much higher than that of traditional methods and superior to recent state-of-the-art (SOTA) models, which demonstrates the effectiveness and flexibility of our model on GDSR tasks. In particular, our method further solves the problem of RGB texture being over-transferred in GDSR tasks. Full article
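The "semi-coupled" feature extraction mentioned in this abstract can be pictured as two branches (RGB guidance and depth) that share part of their convolutional weights while keeping branch-specific layers for modality-unique detail. The PyTorch snippet below is only a schematic of that sharing idea under assumed layer sizes; it is not the SCD-Net architecture itself.

```python
import torch
import torch.nn as nn

class SemiCoupledBlock(nn.Module):
    """Two-branch block: one shared (coupled) conv plus per-branch (private) convs."""
    def __init__(self, channels=32):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)        # common structure
        self.private_rgb = nn.Conv2d(channels, channels, 3, padding=1)   # RGB-specific detail
        self.private_depth = nn.Conv2d(channels, channels, 3, padding=1) # depth-specific detail
        self.act = nn.ReLU(inplace=True)

    def forward(self, f_rgb, f_depth):
        # The shared kernel is applied to both branches; the private kernels keep
        # modality-specific information (e.g., RGB texture vs. depth discontinuities).
        f_rgb = self.act(self.shared(f_rgb) + self.private_rgb(f_rgb))
        f_depth = self.act(self.shared(f_depth) + self.private_depth(f_depth))
        return f_rgb, f_depth
```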
Show Figures

Figure 1
<p>Overview of SCD-Net.</p>
Full article ">Figure 2
<p>Statistics of the Real-sense dataset. (<b>a</b>) The scenes and the corresponding hierarchical content structures of the Real-sense dataset. (<b>b</b>) Example from the NYUv2 dataset (from left to right: RGB, raw depth, and GT). (<b>c</b>) Measure from the Real-sense dataset (from left to right: RGB, raw depth, and GT). Red and blue borders indicate the missing depth value and invalid boundary for NYU image data, respectively.</p>
Full article ">Figure 3
<p>Method procedure.</p>
Full article ">Figure 4
<p>Semi-couple feature extractor.</p>
Full article ">Figure 5
<p>Resampling at R = 3. (The elements in (<b>a</b>) are uniformly distributed into the channels corresponding to the colors in (<b>b</b>).)</p>
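Read as a space-to-depth rearrangement, the resampling at R = 3 shown here redistributes each 3 × 3 neighborhood of a feature map into R² = 9 channels. The NumPy sketch below illustrates that standard rearrangement under this interpretation; it is not claimed to be the paper's exact operator.

```python
import numpy as np

def space_to_depth(x, r=3):
    """Rearrange an (H, W) map into an (H//r, W//r, r*r) tensor.

    Each r x r block of pixels becomes one spatial location with r*r channels,
    matching the uniform redistribution of elements illustrated for R = 3.
    """
    h, w = x.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    return (x.reshape(h // r, r, w // r, r)
             .transpose(0, 2, 1, 3)
             .reshape(h // r, w // r, r * r))

# Example: a 6 x 6 map becomes 2 x 2 spatial positions with 9 channels each.
tiles = space_to_depth(np.arange(36).reshape(6, 6), r=3)
```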
Full article ">Figure 6
<p>Deformable convolution module.</p>
Full article ">Figure 7
<p>Deformable kernel.</p>
Full article ">Figure 8
<p>Visual comparison of ×8 depth map SR results on the NYUv2 dataset. The depth maps are visualized using the JET color bar. (<b>a</b>) RGB, (<b>b</b>) DJF [<a href="#B47-mathematics-11-04556" class="html-bibr">47</a>], (<b>c</b>) DJFR [<a href="#B54-mathematics-11-04556" class="html-bibr">54</a>], (<b>d</b>) DKN [<a href="#B24-mathematics-11-04556" class="html-bibr">24</a>], (<b>e</b>) the proposed SCD-Net, and (<b>f</b>) GT images. (The area marked with red squares is enlarged for display.)</p>
Full article ">Figure 9
<p>Visual comparison of ×8 depth map SR results for Real-sense. (<b>a</b>) RGB image, (<b>b</b>) low-resolution depth map, (<b>c</b>) ground truth, (<b>d</b>) DKN [<a href="#B23-mathematics-11-04556" class="html-bibr">23</a>] trained on the Real-sense dataset, (<b>e</b>) SCD-Net trained on the NYUv2 dataset, (<b>f</b>) SCD-Net trained on the Real-sense dataset. (The area marked with red squares is enlarged for display.)</p>
Full article ">