Search Results (169)

Search Parameters:
Keywords = residual encoding-decoding network

15 pages, 3524 KiB  
Article
Effective Detection of Cloud Masks in Remote Sensing Images
by Yichen Cui, Hong Shen and Chan-Tong Lam
Sensors 2024, 24(23), 7730; https://doi.org/10.3390/s24237730 - 3 Dec 2024
Viewed by 429
Abstract
Effective detection of the contours of cloud masks and estimation of their distribution can be of practical help in studying weather changes and natural disasters. Existing deep learning methods are unable to extract the edges of clouds and backgrounds in a refined manner when detecting cloud masks (shadows) due to their unpredictable patterns, and they are also unable to accurately identify small targets such as thin and broken clouds. To address these problems, we propose MDU-Net, a multiscale dual up-sampling segmentation network based on an encoder–decoder–decoder structure. The model uses an improved residual module to capture the multi-scale features of clouds more effectively. MDU-Net first extracts feature maps using four residual modules at different scales and then sends them to the context information full-flow module for the first up-sampling. This operation refines the edges of clouds and shadows, enhancing detection performance. Subsequently, the second up-sampling module concatenates feature-map channels to fuse contextual spatial information, which effectively reduces the false-detection rate for unpredictable targets hidden in cloud shadows. On a self-made cloud and cloud-shadow dataset based on the Landsat 8 satellite, MDU-Net achieves 95.61% PA and 84.97% MIoU, outperforming the other models in both metrics and visual results. Additionally, generalization experiments on the landcover.ai dataset show that MDU-Net also achieves excellent visualization results.
(This article belongs to the Section Sensing and Imaging)
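To make the residual down-sampling and dual up-sampling ideas above concrete, here is a minimal PyTorch sketch; the module names, channel sizes, and layer choices are illustrative assumptions, not the authors' exact MDU-Net implementation.

```python
import torch
import torch.nn as nn

class DownResidualBlock(nn.Module):
    """Residual block that halves spatial resolution (illustrative)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch))
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=2)  # match shape for the residual sum
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

class DualUpSample(nn.Module):
    """Two up-sampling paths whose outputs are concatenated channel-wise (illustrative)."""
    def __init__(self, ch):
        super().__init__()
        self.up_a = nn.ConvTranspose2d(ch, ch // 2, 2, stride=2)
        self.up_b = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(ch, ch // 2, 3, padding=1))

    def forward(self, x):
        return torch.cat([self.up_a(x), self.up_b(x)], dim=1)

x = torch.randn(1, 64, 128, 128)
feat = DownResidualBlock(64, 128)(x)   # 1 x 128 x 64 x 64
out = DualUpSample(128)(feat)          # 1 x 128 x 128 x 128
```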
Figures:
Figure 1: The structure of MDU-Net.
Figure 2: The structure of the residual module. (a) Downsampling Residual Module. (b) Standard Residual Module.
Figure 3: The structure of dual up-sampling module.
Figure 4: The backgrounds of self-made dataset. (a) Water areas. (b) Cities. (c) Vegetation. (d) Deserts.
Figure 5: Visualization results of different models in rural and vegetated regions. (a) Original image, (b) Label image, (c) FCN, (d) UNet, (e) MultiResUNet, (f) PSPNet, (g) RSAGUNet, (h) AFMUNet, (i) MDU-Net.
Figure 6: Prediction pictures of different algorithms in saline and snow-covered areas. (a) Original image, (b) Label image, (c) FCN, (d) UNet, (e) MultiResUNet, (f) PSPNet, (g) RSAGUNet, (h) AFMUNet, (i) MDU-Net.
Figure 7: Visualization results of different models on landcover.ai dataset: (a) Original image, (b) FCN, (c) MultiResUNet, (d) ResUNet, (e) UNet, (f) PSPNet, (g) RSAGUNet, (h) AFMUNet, (i) MDU-Net.
20 pages, 2388 KiB  
Article
The Spectrum Difference Enhanced Network for Hyperspectral Anomaly Detection
by Shaohua Liu, Huibo Guo, Shiwen Gao and Wuxia Zhang
Remote Sens. 2024, 16(23), 4518; https://doi.org/10.3390/rs16234518 - 2 Dec 2024
Viewed by 408
Abstract
Most deep learning-based hyperspectral anomaly detection (HAD) methods focus on modeling or reconstructing the hyperspectral background to obtain residual maps from the original hyperspectral images. However, these methods typically do not pay enough attention to spectral similarity in complex environments, resulting in inadequate distinction between background and anomalies. Moreover, some anomalies and background regions are different objects yet are recognized as having the same spectrum. To address these issues, this paper proposes a Spectrum Difference Enhanced Network (SDENet) for HAD, which employs variational mapping and a Transformer to amplify spectrum differences. The proposed network is based on an encoder–decoder structure comprising a CSWin-Transformer encoder, a Variational Mapping Module (VMModule), and a CSWin-Transformer decoder. First, the CSWin-Transformer encoder and decoder are designed to supplement image information by extracting deep and semantic features, where a cross-shaped window self-attention mechanism provides strong modeling capability at minimal computational cost. Second, to enhance the spectral difference characteristics between anomalies and background, a randomly sampling VMModule is presented for feature space transformation. Finally, all fully connected mapping operations are replaced with convolutional layers to reduce the model parameters and computational load. The effectiveness of the proposed SDENet is verified on three datasets, and experimental results show that it achieves better detection accuracy and lower model complexity than existing methods.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
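The variational mapping step can be illustrated with a short PyTorch sketch of a reparameterized latent mapping applied to feature maps; the 1x1-convolution heads and channel sizes are assumptions for illustration rather than the paper's exact VMModule.

```python
import torch
import torch.nn as nn

class VariationalMapping(nn.Module):
    """Maps features to a latent space via mean/log-variance heads and
    random sampling (reparameterization). Illustrative sketch only."""
    def __init__(self, in_ch, latent_ch):
        super().__init__()
        # 1x1 convolutions rather than fully connected layers, as the abstract describes
        self.mu = nn.Conv2d(in_ch, latent_ch, 1)
        self.logvar = nn.Conv2d(in_ch, latent_ch, 1)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)   # random sampling step
        return mu + eps * std         # reparameterized latent features

feats = torch.randn(2, 64, 32, 32)    # B x C x H x W hyperspectral features
z = VariationalMapping(64, 32)(feats) # 2 x 32 x 32 x 32
```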
Figures:
Figure 1: The structure of the Spectrum Difference Enhanced Network.
Figure 2: (a) The Airplane1 dataset. (b) The ground truth.
Figure 3: (a) The HYDICE1 dataset. (b) The ground truth.
Figure 4: (a) The Salinas1 dataset. (b) The ground truth.
Figure 5: The visualization of anomaly detection results on the Airport1 dataset: (a) Auto-AD, (b) DeCNNAD, (c) FEBPAD, (d) LRSNCR, (e) RGAE, (f) AETNet, (g) GT-HAD, (h) SDENet (Ours), and (i) the ground truth.
Figure 6: The visualization of anomaly detection results on the HYDICE1 dataset: (a) Auto-AD, (b) DeCNNAD, (c) FEBPAD, (d) LRSNCR, (e) RGAE, (f) AETNet, (g) GT-HAD, (h) SDENet (Ours), and (i) the ground truth.
Figure 7: The visualization of anomaly detection results on the Salinas1 dataset: (a) Auto-AD, (b) DeCNNAD, (c) FEBPAD, (d) LRSNCR, (e) RGAE, (f) AETNet, (g) GT-HAD, (h) SDENet (Ours), and (i) the ground truth.
Figure 8: Anomaly detection results of SDENet on three datasets with varying latent dimensions.
Figure 9: AUC values of SDENet w/o CSWin and SDENet algorithms on three datasets.
Figure 10: AUC values of SDENet w/o VMModule and SDENet algorithms on three datasets.
21 pages, 4627 KiB  
Article
CFF-Net: Cross-Hierarchy Feature Fusion Network Based on Composite Dual-Channel Encoder for Surface Defect Segmentation
by Ke’er Qian, Xiaokang Ding, Xiaoliang Jiang, Yingyu Ji and Ling Dong
Electronics 2024, 13(23), 4714; https://doi.org/10.3390/electronics13234714 - 28 Nov 2024
Viewed by 403
Abstract
In industries spanning manufacturing to software development, defect segmentation is essential for maintaining high standards of product quality and reliability. However, traditional segmentation methods often struggle to accurately identify defects due to challenges such as noise interference, occlusion, and feature overlap. To solve these problems, we propose CFF-Net, a cross-hierarchy feature fusion network based on a composite dual-channel encoder for surface defect segmentation. Specifically, in the encoder of CFF-Net, we design a composite dual-channel module (CDCM), which combines standard convolution with dilated convolution and adopts a dual-path parallel structure to enhance the model's feature extraction capability. Then, a dilated residual pyramid module (DRPM) is integrated at the junction of the encoder and decoder, which uses dilated convolutions with different dilation rates to effectively capture multi-scale context information. In the final output phase, we introduce a cross-hierarchy feature fusion strategy (CFFS) that combines outputs from different layers or stages, thereby improving the robustness and generalization of the network. Finally, we conducted comparative experiments evaluating CFF-Net against several mainstream segmentation networks across three distinct datasets: the publicly available Crack500 dataset, a self-built Bearing dataset, and the publicly available SD-saliency-900 dataset. The results demonstrate that CFF-Net consistently outperformed competing methods in segmentation tasks. Specifically, on the Crack500 dataset, CFF-Net achieved an Mcc of 73.36%, a Dice coefficient of 74.34%, and a Jaccard index of 59.53%. On the Bearing dataset, it recorded an Mcc of 76.97%, a Dice coefficient of 77.04%, and a Jaccard index of 63.28%. Similarly, on the SD-saliency-900 dataset, CFF-Net achieved an Mcc of 84.08%, a Dice coefficient of 85.82%, and a Jaccard index of 75.67%. These results underscore CFF-Net's effectiveness and reliability in handling diverse segmentation challenges across different datasets.
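A minimal PyTorch sketch of the dual-path idea behind the composite dual-channel module is shown below; the channel split, dilation rate, and fusion layer are assumptions, not the published CDCM design.

```python
import torch
import torch.nn as nn

class CompositeDualChannel(nn.Module):
    """Parallel standard-conv and dilated-conv paths fused by concatenation.
    A minimal sketch of the dual-path idea; details are assumed."""
    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        self.std_path = nn.Sequential(
            nn.Conv2d(in_ch, out_ch // 2, 3, padding=1),
            nn.BatchNorm2d(out_ch // 2), nn.ReLU(inplace=True))
        self.dil_path = nn.Sequential(
            nn.Conv2d(in_ch, out_ch // 2, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(out_ch // 2), nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(out_ch, out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.std_path(x), self.dil_path(x)], dim=1))

y = CompositeDualChannel(32, 64)(torch.randn(1, 32, 96, 96))  # -> 1 x 64 x 96 x 96
```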
Figures:
Figure 1: Visual illustrations of defects on the Crack500 dataset and corresponding annotations.
Figure 2: Visual illustrations of defects on the Bearing dataset and corresponding annotations.
Figure 3: Visual illustrations of defects on the SD-Saliency-900 dataset and corresponding annotations.
Figure 4: Overall architecture of CFF-Net.
Figure 5: Structure of original selective kernel module.
Figure 6: Structure of composite dual-channel module.
Figure 7: Comparison experiment of dilatational convolution. The first row: consecutive convolutions with different expansions (rate = 1, 2, 4). The second row: consecutive convolution with the same expansion (rate = 2).
Figure 8: Structure of dilated residual pyramid module.
Figure 9: The loss and accuracy curves for the validation phases of all methods. The first column illustrates the results obtained on the Crack500 dataset, the second column displays the corresponding results for the Bearing dataset, and the last column shows the results for the SD-saliency-900 dataset.
Figure 10: Visual comparison of different methods on the Crack500 dataset.
Figure 11: Visual comparison of different methods on the Bearing dataset.
Figure 12: Visual comparison of different methods on the SD-saliency-900 dataset.
Figure 13: Ablation experiments on the Crack500 dataset.
Figure 14: Bar chart of ablation experiment.
Figure 15: Examples of poor segmentation on the Crack500 dataset.
Figure 16: Examples of poor segmentation on the Bearing dataset.
Figure 17: Examples of poor segmentation on the SD-saliency-900 dataset.
21 pages, 10985 KiB  
Article
A Novel Multi-Scale Feature Enhancement U-Shaped Network for Pixel-Level Road Crack Segmentation
by Jing Wang, Benlan Shen, Guodong Li, Jiao Gao and Chao Chen
Electronics 2024, 13(22), 4503; https://doi.org/10.3390/electronics13224503 - 16 Nov 2024
Viewed by 506
Abstract
Timely and accurate detection of pavement cracks, the most common type of road damage, is essential for ensuring road safety. Automatic image segmentation of cracks can accurately locate their pixel positions. This paper proposes a Multi-Scale Feature Enhanced U-shaped Network (MFE-UNet) for pavement crack detection. The network uses a Residual Detail-Enhanced Block (RDEB) instead of a conventional convolution throughout the encoder–decoder, combining it with Efficient Multi-Scale Attention to enhance feature extraction. Multi-Scale Gating Feature Fusion (MGFF) is incorporated into the skip connections, improving the fusion of multi-scale features to capture finer crack details while maintaining rich semantic information. Furthermore, we created a pavement crack image dataset named China_MCrack, consisting of 1500 images collected from road surfaces using smartphone-mounted motorbikes. The proposed network was trained and tested on the China_MCrack, DeepCrack, and Crack-Forest datasets, with additional generalization experiments on the BochumCrackDataset, and the results were compared with those of U-Net, ResUNet, and Attention U-Net. The experimental results show that the proposed MFE-UNet achieves accuracies of 82.95%, 91.71%, and 69.02% on the China_MCrack, DeepCrack, and Crack-Forest datasets, respectively, and its F1_score is improved by 1–4% compared with the other networks. These results demonstrate that the proposed method is effective for detecting cracks at the pixel level.
(This article belongs to the Special Issue Emerging Technologies in Computational Intelligence)
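The gated skip-connection fusion described above can be sketched as follows in PyTorch; the sigmoid gate and channel counts are illustrative assumptions and are not taken from the MGFF specification.

```python
import torch
import torch.nn as nn

class GatedSkipFusion(nn.Module):
    """Gates an encoder skip feature with a decoder feature before fusion.
    Channel counts and the sigmoid gate are illustrative assumptions."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())
        self.out = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, skip, decoder):
        g = self.gate(torch.cat([skip, decoder], dim=1))   # per-pixel gate in [0, 1]
        gated_skip = skip * g                              # suppress irrelevant skip detail
        return self.out(torch.cat([gated_skip, decoder], dim=1))

skip = torch.randn(1, 64, 64, 64)
dec = torch.randn(1, 64, 64, 64)
fused = GatedSkipFusion(64)(skip, dec)                     # 1 x 64 x 64 x 64
```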
Figures:
Figure 1: Network architecture of MFE-UNet.
Figure 2: The architecture of RDEB.
Figure 3: The derivation of HDC.
Figure 4: The architecture of EMA.
Figure 5: The architecture of MGFF.
Figure 6: Example of manually labeled pixels using the Labelme Image Annotation tool.
Figure 7: Labelme labels the image and result: (a) original image; (b) true label.
Figure 8: Comparison of the F1_score of the China_MCrack training set on four networks.
Figure 9: Comparison of the F1_score of the DeepCrack training set on four networks.
Figure 10: Comparison of the F1_score of the CFD training set on four networks.
Figure 11: Comparison of prediction results of four networks in China_MCrack. The crack images in different cases: (a) contains branches, (b,c) contain tiny cracks, (d) includes the entire road background, (e) has a thin boundary, and (f) has unclear edges.
Figure 12: Comparison of prediction results of four networks in DeepCrack. The crack images in different cases: (a) contain leaves, (b) contain tiny cracks, (c) coarse cracks, (d) have blurred edges, (e) contain other edge interference, and (f–h) contain a lot of texture information.
Figure 13: Comparison of prediction results of four networks in CFD. The crack images in different cases: (a) contains cross cracks, (b) contains tiny cracks, (c) contains a lot of noise, (d) has blurred edges, (e) contains complex texture information, and (f) has low contrast.
Figure 14: Comparison of prediction results on the BochumCrackDataset for models trained by four networks on China_MCrack. The crack images in different cases: (a) contain small cracks, (b,c) have complex backgrounds, (d) contain a lot of noise, and (e,f) have different image background colors.
Figure 15: MFE-UNet model training results on different datasets in the detection results of the BochumCrackDataset. The crack images in different cases: (a,b) coarse cracks, (c,d) fine cracks.
16 pages, 4534 KiB  
Article
AER-Net: Attention-Enhanced Residual Refinement Network for Nuclei Segmentation and Classification in Histology Images
by Ruifen Cao, Qingbin Meng, Dayu Tan, Pijing Wei, Yun Ding and Chunhou Zheng
Sensors 2024, 24(22), 7208; https://doi.org/10.3390/s24227208 - 11 Nov 2024
Viewed by 519
Abstract
The accurate segmentation and classification of nuclei in histological images are crucial for the diagnosis and treatment of colorectal cancer. However, the aggregation of nuclei and intra-class variability in histology images present significant challenges for nuclei segmentation and classification. In addition, the imbalance of the various nuclei classes exacerbates the difficulty of nuclei classification and segmentation using deep learning models. To address these challenges, we present a novel attention-enhanced residual refinement network (AER-Net), which consists of one encoder and three decoder branches that share the same network structure. In addition to the nuclei instance segmentation branch and the nuclei classification branch, one branch predicts the vertical and horizontal distances from each pixel to its nuclear center, which are combined with the output of the segmentation branch to improve the final segmentation results. AER-Net uses an attention-enhanced encoder module to focus on more valuable features. To further refine predictions and achieve more accurate results, an attention-enhancing residual refinement module is employed at the end of each decoder branch. Moreover, the coarse and refined predictions are combined using a loss function that employs cross-entropy loss and generalized Dice loss to efficiently tackle the class imbalance among nuclei in histology images. Compared with other state-of-the-art methods on two colorectal cancer datasets and a pan-cancer dataset, AER-Net demonstrates outstanding performance, validating its effectiveness in nuclear segmentation and classification.
(This article belongs to the Section Biomedical Sensors)
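The combined cross-entropy plus generalized Dice objective can be written compactly; the sketch below follows the standard generalized Dice formulation with inverse-square class weights, and the weighting factor alpha is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def generalized_dice_loss(logits, target, eps=1e-6):
    """Generalized Dice loss with inverse-square class weighting.
    logits: B x C x H x W, target: B x H x W integer labels. Illustrative."""
    num_classes = logits.shape[1]
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    w = 1.0 / (onehot.sum(dim=(0, 2, 3)) ** 2 + eps)       # down-weight frequent classes
    inter = (w * (probs * onehot).sum(dim=(0, 2, 3))).sum()
    union = (w * (probs + onehot).sum(dim=(0, 2, 3))).sum()
    return 1.0 - 2.0 * inter / (union + eps)

def combined_loss(logits, target, alpha=0.5):
    """Cross-entropy plus generalized Dice, as a sketch of the combined objective."""
    return alpha * F.cross_entropy(logits, target) + (1 - alpha) * generalized_dice_loss(logits, target)

logits = torch.randn(2, 5, 64, 64)      # 5 nuclei classes
labels = torch.randint(0, 5, (2, 64, 64))
loss = combined_loss(logits, labels)
```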
Figures:
Figure 1: The number of nuclei per type in the CoNIC2022 dataset.
Figure 2: Illustration of overall architecture.
Figure 3: Illustration of CSAR block.
Figure 4: The structure of the decoder block.
Figure 5: The structure of the attention-enhancing residual refinement module.
Figure 6: Example visualization results on the CoNSeP dataset.
Figure 7: Example visualization results on the Lizard dataset.
13 pages, 2093 KiB  
Article
Speech Enhancement Algorithm Based on Microphone Array and Lightweight CRN for Hearing Aid
by Ji Xi, Zhe Xu, Weiqi Zhang, Li Zhao and Yue Xie
Electronics 2024, 13(22), 4394; https://doi.org/10.3390/electronics13224394 - 9 Nov 2024
Viewed by 739
Abstract
To address the performance and computational complexity issues in speech enhancement for hearing aids, a speech enhancement algorithm based on a microphone array and a lightweight two-stage convolutional recurrent network (CRN) is proposed. The algorithm consists of two main modules: a beamforming module and a post-filtering module. The beamforming module utilizes directional features and a complex time-frequency long short-term memory (CFT-LSTM) network to extract local representations and perform spatial filtering. The post-filtering module uses an analogous encoder and two symmetric decoder structures, with stacked CFT-LSTM blocks in between. It further reduces residual noise and improves filtering performance by passing spatial information through an inter-channel masking module. Experimental results show that this algorithm outperforms existing methods on the generated hearing-aid dataset and the CHIME-3 dataset, with fewer parameters and lower model complexity, making it suitable for hearing-aid scenarios with limited computational resources.
(This article belongs to the Special Issue Signal, Image and Video Processing: Development and Applications)
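As a rough illustration of the convolutional recurrent pattern (convolutional encoder, recurrent bottleneck over time, convolutional decoder predicting a mask), here is a generic PyTorch sketch; it is not the paper's two-stage CRN, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class TinyCRN(nn.Module):
    """Minimal convolutional recurrent network: conv encoder, LSTM over time,
    transposed-conv decoder predicting a mask. A generic sketch, not the paper's model."""
    def __init__(self, freq_bins=64, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, (3, 3), stride=(1, 2), padding=1), nn.ReLU())
        self.rnn = nn.LSTM(16 * (freq_bins // 2), hidden, batch_first=True)
        self.proj = nn.Linear(hidden, 16 * (freq_bins // 2))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(16, 1, (3, 3), stride=(1, 2), padding=1, output_padding=(0, 1)),
            nn.Sigmoid())

    def forward(self, spec):                      # spec: B x 1 x T x F magnitude spectrogram
        b, _, t, f = spec.shape
        z = self.enc(spec)                        # B x 16 x T x F/2
        seq = z.permute(0, 2, 1, 3).reshape(b, t, -1)
        seq, _ = self.rnn(seq)                    # temporal modeling
        z = self.proj(seq).reshape(b, t, 16, f // 2).permute(0, 2, 1, 3)
        mask = self.dec(z)                        # B x 1 x T x F mask in [0, 1]
        return spec * mask                        # enhanced spectrogram

out = TinyCRN()(torch.randn(2, 1, 100, 64))
```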
Figures:
Figure 1: Overall structure of the model.
Figure 2: Schematic diagram of CFT-LSTM network structure.
Figure 3: Intermodal mask estimation module.
Figure 4: Comparison of ablation experiment results on the hearing aid dataset.
Figure 5: Comparison of algorithm performance on the hearing aid dataset.
Figure 6: Comparison of algorithm performance on the CHIME-3 dataset.
18 pages, 13017 KiB  
Article
DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices
by Fei Huang, Shengshu Liu, Guangqian Zhang, Bingsen Hao, Yangkai Xiang and Kun Yuan
Sensors 2024, 24(21), 7007; https://doi.org/10.3390/s24217007 - 31 Oct 2024
Viewed by 671
Abstract
To address the challenges of suboptimal remote detection and significant computational burden in existing multi-sensor information fusion 3D object detection methods, a novel approach based on Bird's-Eye View (BEV) is proposed. The method uses an enhanced lightweight EdgeNeXt feature extraction network, incorporating residual branches to address the network degradation caused by the excessive depth of STDA encoding blocks. Meanwhile, deformable convolution is used to expand the receptive field and reduce computational complexity. The feature fusion module constructs a two-stage fusion network to optimize the fusion and alignment of multi-sensor features. This network aligns image features to supplement environmental information with point cloud features, thereby obtaining the final BEV features. Additionally, a Transformer decoder that emphasizes global spatial cues is employed to process the BEV feature sequence, enabling precise detection of distant small objects. Experimental results demonstrate that this method surpasses the baseline network, with improvements of 4.5% in the NuScenes detection score and 5.5% in average precision for detected objects. Finally, the model is converted and accelerated using TensorRT for deployment on mobile devices, achieving an inference time of 138 ms per frame on the Jetson Orin NX embedded platform, thus enabling real-time 3D object detection.
(This article belongs to the Special Issue AI-Driving for Autonomous Vehicles)
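Deformable convolution with learned sampling offsets, as used here to enlarge the receptive field, can be sketched with torchvision's DeformConv2d; the offset-prediction head and channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """Deformable 3x3 convolution whose sampling offsets are predicted from the
    input, enlarging the effective receptive field. Channel sizes are illustrative."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # 2 offsets (dx, dy) per kernel position -> 2 * 3 * 3 = 18 channels
        self.offset = nn.Conv2d(in_ch, 18, 3, padding=1)
        self.dconv = DeformConv2d(in_ch, out_ch, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.dconv(x, self.offset(x)))

y = DeformBlock(64, 64)(torch.randn(1, 64, 32, 32))   # 1 x 64 x 32 x 32
```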
Figures:
Figure 1: Overall framework of the network. DeployFusion introduces an improved EdgeNeXt feature extraction network, using residual branches to address degradation and deformable convolutions to increase the receptive field and reduce complexity. The feature fusion module aligns image and point cloud features to generate optimized BEV features. A Transformer decoder is used to process the sequence of BEV features, enabling accurate identification of small distant objects.
Figure 2: Comparison of convolutional encoding block. (a) DW Encode. (b) DDW Encode.
Figure 3: Feature channel separation attention.
Figure 4: Feature channel separation attention.
Figure 5: Transposed attention.
Figure 6: Comparison of standard and variable convolution kernels in receptive field regions. (a) Receptive field area of standard convolutional kernel. (b) Receptive field area of deformable convolutional kernel.
Figure 7: Experimental results of dynamic loss and NDS. (a) Dynamic loss graph. (b) Dynamic NDS score graph.
Figure 8: Comparison of EdgeNeXt_DCN with other fusion networks of inference results.
Figure 9: Comparisons of detection accuracy in different feature fusion networks. (a) Primitive feature extraction network. (b) EdgeNeXt_DCN feature extraction network.
Figure 10: Results of object detection for each category.
Figure 11: Comparison of detection results from multi-sensor fusion detection method in BEV.
Figure 12: Performance of object detection in BEV of this method. (a) Scene 1. (b) Scene 2.
Figure 13: Jetson Orin NX mobile device.
Figure 14: Workflow of TensorRT.
Figure 15: Comparison of computation time before and after operator fusion.
Figure 16: Comparison of detection methods in various quantifiers and accuracy levels.
Figure 17: Comparison of inference time before and after model quantification in detection.
Figure 18: Detection result of method on mobile devices.
17 pages, 1520 KiB  
Article
A Strip Steel Surface Defect Salient Object Detection Based on Channel, Spatial and Self-Attention Mechanisms
by Yange Sun, Siyu Geng, Huaping Guo, Chengyi Zheng and Li Zhang
Electronics 2024, 13(21), 4277; https://doi.org/10.3390/electronics13214277 - 31 Oct 2024
Viewed by 600
Abstract
Strip steel is extensively utilized in industries such as automotive manufacturing and aerospace due to its superior machinability, economic benefits, and adaptability. However, defects on the surface of steel strips, such as inclusions, patches, and scratches, significantly affect the performance and service life of the product. Therefore, salient object detection of surface defects on strip steel is crucial to ensure the quality of the final product. Many factors, such as the low contrast of surface defects on strip steel, the diversity of defect types, complex texture structures, and irregular defect distribution, hinder existing detection technologies from accurately identifying and segmenting defect areas against complex backgrounds. To address these problems, we propose a novel detector, S3D-SOD, for the salient object detection of strip steel surface defects. In the encoding stage, a residual self-attention block is proposed to explore semantic cues in high-level features to locate and guide low-level feature information. In addition, we apply a general residual channel and spatial attention to low-level features, enabling the model to adaptively focus on the key channels and spatial areas of high-resolution feature maps, thereby enhancing the encoder features and accelerating the convergence of the model. In the decoding stage, a simple residual decoder block with an up-sampling operation is proposed to realize the integration and interaction of feature information between different layers. The simple residual decoder block is used for feature integration based on the following observation: backbone networks such as ResNet and the Swin Transformer, after being pretrained on the large ImageNet dataset and then fine-tuned on a smaller strip steel surface defect dataset, can extract feature maps that contain both general image features and the specific characteristics required for the salient object detection of strip steel surface defects. Experimental results on the SD-saliency-900 dataset show that S3D-SOD outperforms advanced methods and has strong generalization ability and robustness.
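A CBAM-style sketch of residual channel and spatial attention applied to low-level features is given below; the reduction ratio, kernel size, and residual placement are assumptions rather than the exact RCSA design.

```python
import torch
import torch.nn as nn

class ResidualChannelSpatialAttention(nn.Module):
    """Channel attention followed by spatial attention, added back residually.
    A CBAM-style sketch of the 'residual channel and spatial attention' idea."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        y = x * self.channel(x)                                   # reweight channels
        s = torch.cat([y.mean(dim=1, keepdim=True),
                       y.max(dim=1, keepdim=True).values], dim=1)
        y = y * self.spatial(s)                                   # reweight spatial positions
        return x + y                                              # residual connection

out = ResidualChannelSpatialAttention(64)(torch.randn(1, 64, 56, 56))
```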
Figures:
Figure 1: The framework of our S3D-SOD.
Figure 2: Architectural diagram of RSAB.
Figure 3: Architectural diagram of RCSA.
Figure 4: Three defects on SD-saliency-900.
Figure 5: Quantitative evaluation of different models on SD-saliency-900.
Figure 6: Visualization comparison of different models on SD-saliency-900. (a) Input images, (b) ground truth, (c) RCRR, (d) 2LSG, (e) BC, (f) SMD, (g) MIL, (h) PFANet, (i) NLDF, (j) DSS, (k) R3Net, (l) BMPM, (m) PoolNet, (n) PiCANet, (o) CPD, (p) BASNet, (q) SAMNet, (r) ITSD, (s) F3Net, (t) MINet, (u) EDRNet, (v) DACNet, and (w) ours.
14 pages, 16241 KiB  
Article
Seismic Random Noise Attenuation Using DARE U-Net
by Tara P. Banjade, Cong Zhou, Hui Chen, Hongxing Li, Juzhi Deng, Feng Zhou and Rajan Adhikari
Remote Sens. 2024, 16(21), 4051; https://doi.org/10.3390/rs16214051 - 30 Oct 2024
Viewed by 730
Abstract
Seismic data processing plays a pivotal role in extracting valuable subsurface information for various geophysical applications. However, seismic records often suffer from inherent random noise, which obscures meaningful geological features and reduces the reliability of interpretations. In recent years, deep learning methodologies have shown promising results in noise attenuation tasks on seismic data. In this research, we propose modifications to the standard U-Net structure by integrating dense and residual connections, which form the foundation of our approach, the dense and residual (DARE) U-Net. Dense connections enhance the receptive field and ensure that information from different scales is considered during the denoising process. Our model implements local residual connections between layers within the encoder, which allows earlier layers to connect directly with deeper layers. This promotes the flow of information, allowing the network to utilize both filtered and unfiltered input. The combined network mechanisms preserve spatial information during the contraction process so that the decoder can locate features more accurately by retaining high-resolution features, enabling precise localization in seismic image denoising. We evaluate this adapted architecture on synthetic and real data sets, using the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), and the results confirm the effectiveness of the method.
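The residual dense block idea (dense connectivity inside the block plus a local residual back to the input) can be sketched as follows; the growth rate, depth, and fusion layer are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Dense connections inside the block, with a local residual connection
    back to the input. Growth rate and depth are illustrative choices."""
    def __init__(self, ch, growth=16, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch + i * growth, growth, 3, padding=1) for i in range(layers))
        self.fuse = nn.Conv2d(ch + layers * growth, ch, 1)          # local feature fusion
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(self.act(conv(torch.cat(feats, dim=1))))   # dense connectivity
        return x + self.fuse(torch.cat(feats, dim=1))               # local residual

denoised_feat = ResidualDenseBlock(32)(torch.randn(1, 32, 64, 64))
```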
Figures:
Graphical abstract.
Figure 1: DARE U-Net architecture.
Figure 2: Residual connection.
Figure 3: Local residual connection within each layer of an encoder.
Figure 4: Structure of residual dense block.
Figure 5: A sample of the training data. (a) Noise-free data. (b) Noisy data.
Figure 6: Test on four sets of seismic data. First to fifth column: noise-free data, noisy data, denoised by wavelet, U-Net, and DARE U-Net.
Figure 7: (a) Noise-free data. (b) Noisy data. (c) Denoised by wavelet. (d) Denoised by U-Net. (e) Denoised by DARE U-Net.
Figure 8: FK spectrum comparisons. (a) Noise-free data. (b) Noisy data. (c) Denoised by wavelet. (d) Denoised by U-Net. (e) Denoised by DARE U-Net.
Figure 9: Real data test. (a) Noise-free data. (b) Noisy data. (c) Denoised by wavelet. (d) Denoised by U-Net. (e) Denoised by DARE U-Net.
Figure 10: Residual section of denoised real data. (a) Wavelet. (b) U-Net. (c) DARE U-Net.
17 pages, 5637 KiB  
Article
Precision Segmentation of Subretinal Fluids in OCT Using Multiscale Attention-Based U-Net Architecture
by Prakash Kumar Karn and Waleed H. Abdulla
Bioengineering 2024, 11(10), 1032; https://doi.org/10.3390/bioengineering11101032 - 16 Oct 2024
Viewed by 982
Abstract
This paper presents a deep-learning architecture for segmenting retinal fluids in patients with Diabetic Macular Oedema (DME) and Age-related Macular Degeneration (AMD). Accurate segmentation of multiple fluid types is critical for diagnosis and treatment planning, but existing techniques often struggle with precision. We propose an encoder–decoder network inspired by U-Net that processes enhanced OCT images and their edge maps. The encoder incorporates Residual and Inception modules with an autoencoder-based multiscale attention mechanism to extract detailed features. Our method shows superior performance across several datasets. On the RETOUCH dataset, the network achieved F1 scores of 0.82 for intraretinal fluid (IRF), 0.93 for subretinal fluid (SRF), and 0.94 for pigment epithelial detachment (PED). The model also performed well on the OPTIMA and DUKE datasets, demonstrating high precision, recall, and F1 scores. This architecture significantly enhances segmentation accuracy and edge precision, offering a valuable tool for diagnosing and managing retinal diseases. Its integration of dual-input processing, multiscale attention, and advanced encoder modules highlights its potential to improve clinical outcomes and advance retinal disease treatment.
(This article belongs to the Section Biosignal Processing)
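An Inception-style encoder block with a residual shortcut, of the kind described above, might look like the following sketch; the branch widths and kernel sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ResidualInceptionBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 branches (Inception-style) with a residual
    shortcut, as an illustrative sketch of the encoder block described above."""
    def __init__(self, ch):
        super().__init__()
        b = ch // 4
        self.b1 = nn.Conv2d(ch, b, 1)
        self.b3 = nn.Conv2d(ch, b, 3, padding=1)
        self.b5 = nn.Conv2d(ch, 2 * b, 5, padding=2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)  # ch channels total
        return self.act(x + y)                                      # residual shortcut

feat = ResidualInceptionBlock(64)(torch.randn(1, 64, 128, 128))
```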
Figures:
Figure 1: General Procedure of Retinal Fluid Segmentation.
Figure 2: ROI generated from the Original OCT scan (Left) and its Edge map (Right).
Figure 3: Proposed multiscale attention-based U-Net Model.
Figure 4: Detailed Architecture of Encoder block of Proposed Architecture.
Figure 5: Retinal fluid Segmentation using the proposed model on the RETOUCH.
Figure 6: ROC curve for RETOUCH dataset from various vendors.
Figure 7: ROC Curve for OPTIMA Dataset with AUC.
Figure 8: Prediction on OPTIMA Dataset.
Figure 9: Predicted output from DUKE dataset.
Figure 10: ROC curve for DUKE DSC.
21 pages, 6596 KiB  
Article
MRACNN: Multi-Path Residual Asymmetric Convolution and Enhanced Local Attention Mechanism for Industrial Image Compression
by Zikang Yan, Peishun Liu, Xuefang Wang, Haojie Gao, Xiaolong Ma and Xintong Hu
Symmetry 2024, 16(10), 1342; https://doi.org/10.3390/sym16101342 - 10 Oct 2024
Viewed by 1078
Abstract
The rich information and complex backgrounds of industrial images make achieving a high compression rate a challenging task. Current learning-based image compression methods mostly use customized convolutional neural networks (CNNs), which struggle to cope with the complex production backgrounds of industrial images. This causes useful information to be lost among abundant irrelevant data, making it difficult to accurately extract important features during the feature extraction stage. To address this, a Multi-path Residual Asymmetric Convolutional Compression Network (MRACNN) is proposed. First, a Multi-path Residual Asymmetric Convolution Block (MRACB) is introduced, which includes the Multi-path Residual Asymmetric Convolution Down-sampling Module for down-sampling in the encoder to extract key features, and the Multi-path Residual Asymmetric Convolution Up-sampling Module for up-sampling in the decoder to recover details and reconstruct the image. This feature transfer and information flow enables better capture of image details and important information, thereby improving the quality and efficiency of image compression and decompression. Furthermore, a two-branch enhanced local attention mechanism and a channel-squeezing entropy model based on the compression-based enhanced local attention module are proposed to enhance compression performance. Extensive experimental evaluations demonstrate that the proposed method outperforms state-of-the-art techniques, achieves superior rate–distortion performance, and excels at preserving local details.
(This article belongs to the Special Issue Symmetry/Asymmetry in Neural Networks and Applications)
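The asymmetric convolution idea (replacing a k x k kernel with a 1 x k followed by a k x 1 kernel inside a residual path) can be sketched as follows; the activation placement and channel counts are assumptions.

```python
import torch
import torch.nn as nn

class AsymmetricConvBranch(nn.Module):
    """Replaces a k x k convolution with a 1 x k followed by a k x 1 convolution,
    combined with a residual path. Purely an illustrative sketch."""
    def __init__(self, ch, k=3):
        super().__init__()
        self.path = nn.Sequential(
            nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2)),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0)))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.path(x))    # residual asymmetric convolution

y = AsymmetricConvBranch(96)(torch.randn(1, 96, 16, 16))
```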
Figures:
Figure 1: Comparison between the original image and the compressed image based on symmetric convolution.
Figure 2: Schematic diagram of symmetric and asymmetric convolution structures.
Figure 3: The end-to-end image compression model (MRACNN). MRACD↓2 represents the Multi-Path Residual Asymmetric Convolution Down-Sampling module, MRACU↓2 signifies the Multi-Path Residual Asymmetric Convolution Up-Sampling module. ELAM denotes the Enhanced Local Attention Module, Q signifies Quantization, AE and AD represent Arithmetic Encoder and Arithmetic Decoder.
Figure 4: Multi-Path Residual Asymmetric Convolution Block (MRACB). Above is the Multi-Path Residual Asymmetric Convolution Up-Sampling module (MRACU), and below is the Multi-Path Residual Asymmetric Convolution Down-Sampling module (MRACD).
Figure 5: The Enhanced Local Attention Module.
Figure 6: Channel-aware Squeezing Entropy Model. ELAT represents the Enhanced Local Attention Module applied to the channel entropy model, E and D represent arithmetic encoding and decoding, μ and σ represent variance and mean, and LRP represents latent residual error.
Figure 7: Compressed Enhanced Local Attention Block (CLAB) in the Channel-aware Squeezing Entropy Model.
Figure 8: Rate–distortion curves for different methods on the Kodak dataset. (a) Evaluations on the Kodak dataset in terms of PSNR. (b) Evaluations on the Kodak dataset in terms of MS-SSIM.
Figure 9: Rate–distortion curves for different methods on the CLIC-PRO dataset. (a) Evaluations on the CLIC-PRO dataset in terms of PSNR. (b) Evaluations on the CLIC-PRO dataset in terms of MS-SSIM.
Figure 10: Feature map of potential feature in industrial production scene images after down-sampling.
Figure 11: Visualization of reconstructed images from industrial production scenes. The indicator is [bpp↓/PSNR↑/MS-SSIM↑].
Figure 12: The ablation study on the MRACB.
Figure 13: The impact of the number of convolution kernel sizes on MRACB module.
Figure 14: Visual comparison of reconstructed images from industrial production workshops. The indicator is [bpp↓/PSNR↑/MS-SSIM↑].
25 pages, 38912 KiB  
Article
Thin Cloud Removal Generative Adversarial Network Based on Sparse Transformer in Remote Sensing Images
by Jinqi Han, Ying Zhou, Xindan Gao and Yinghui Zhao
Remote Sens. 2024, 16(19), 3658; https://doi.org/10.3390/rs16193658 - 30 Sep 2024
Viewed by 1146
Abstract
Thin clouds in Remote Sensing (RS) imagery can negatively impact subsequent applications. Current Deep Learning (DL) approaches often prioritize information recovery in cloud-covered areas but may not adequately preserve information in cloud-free regions, leading to color distortion, detail loss, and visual artifacts. This study proposes a Sparse Transformer-based Generative Adversarial Network (SpT-GAN) to solve these problems. First, a global enhancement feature extraction module is added to the generator's top layer to enhance the model's ability to preserve ground feature information in cloud-free areas. Then, the processed feature map is reconstructed using the sparse transformer-based encoder and decoder with an adaptive threshold filtering mechanism to ensure sparsity. This mechanism enables the model to preserve robust long-range modeling capability while disregarding irrelevant details. In addition, inverted residual Fourier transformation blocks are added at each level of the structure to filter redundant information and enhance the quality of the generated cloud-free images. Finally, a composite loss function is created to minimize error in the generated images, resulting in improved resolution and color fidelity. SpT-GAN achieves outstanding cloud-removal results both quantitatively and visually, with Structural Similarity Index (SSIM) values of 98.06% and 92.19% and Peak Signal-to-Noise Ratio (PSNR) values of 36.19 dB and 30.53 dB on the RICE1 and T-Cloud datasets, respectively. On the T-Cloud dataset in particular, with its more complex cloud components, the superior ability of SpT-GAN to restore ground details is more evident.
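A simplified stand-in for sparse attention with threshold filtering is sketched below: only the strongest scores per query are kept before the softmax. The top-k rule is an assumed simplification of the paper's adaptive threshold mechanism.

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, keep_ratio=0.5):
    """Scaled dot-product attention that keeps only the strongest scores per query
    (the rest are masked out before softmax). A simplified stand-in for adaptive
    threshold filtering."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5       # B x N x N
    n_keep = max(1, int(scores.shape[-1] * keep_ratio))
    thresh = scores.topk(n_keep, dim=-1).values[..., -1:]        # per-query threshold
    scores = scores.masked_fill(scores < thresh, float("-inf"))  # enforce sparsity
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 256, 64)                              # B x tokens x dim
out = sparse_attention(q, k, v)                                  # 2 x 256 x 64
```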
Figures:
Figure 1: Overview of SpT-GAN: (a) U-Net-based generator and (b) Patch-GAN-based discriminator.
Figure 2: (a) Illustration of the GEFE module and (b) illustration of the coordinate attention mechanism.
Figure 3: (a) Illustration of the sparse attention module and (b) illustration of the FFN module.
Figure 4: Illustration of the filtering mechanism.
Figure 5: Dataset samples: (a) RICE1 dataset and (b) T-Cloud dataset.
Figure 6: Box plots of the MSE results produced by each DL-based method: (a) RICE1 dataset and (b) T-Cloud dataset.
Figure 7: Results of DL-based methods on the RICE1 dataset: (a) cloudy images, (b) McGAN [78], (c) SpA-GAN [28], (d) AMGAN-CR [79], (e) MemoryNet [81], (f) CVAE [25], (g) MSDA-CR [80], (h) our proposed method, (i) cloud-free images.
Figure 8: Magnified details of the results of each DL-method on the RICE1 dataset: (a) cloudy images, (b) McGAN [78], (c) SpA-GAN [28], (d) AMGAN-CR [79], (e) MemoryNet [81], (f) CVAE [25], (g) MSDA-CR [80], (h) our proposed method, (i) cloud-free images.
Figure 9: Results of each DL-method on the T-Cloud dataset: (a) cloudy images, (b) McGAN [78], (c) SpA-GAN [28], (d) AMGAN-CR [79], (e) MemoryNet [81], (f) CVAE [25], (g) MSDA-CR [80], (h) our proposed method, (i) cloud-free images.
Figure 10: Magnified details of the results of each DL-method on the T-Cloud dataset: (a) cloudy images, (b) McGAN [78], (c) SpA-GAN [28], (d) AMGAN-CR [79], (e) MemoryNet [81], (f) CVAE [25], (g) MSDA-CR [80], (h) our proposed method, (i) cloud-free images.
Figure 11: Comparison showing the effectiveness of adding the IRFT block and GEFE module: (a) cloudy image, (b) ground truth, (c) complete SpT-GAN, (d) SpT-GAN without IRFT block and GEFE module, (e) SpT-GAN without IRFT block, (f) SpT-GAN without GEFE module, (g) SpT-GAN with the GEFE module replaced by a transformer block.
Figure 12: Attention heatmaps of the GEFE module and the transformer block: (a) Extensive cloud coverage, (b) moderate cloud coverage, (c) uneven cloud coverage, (d) slight cloud coverage.
Figure 13: Visual effect of cloudless image processing; the PSNR and SSIM values for each image pair are as follows: (a) PSNR 30.21 dB and SSIM 95.10%, (b) PSNR 33.67 dB and SSIM 98.04%, (c) PSNR 31.57 dB and SSIM 95.88%, (d) PSNR 29.37 dB and SSIM 97.21%.
19 pages, 4019 KiB  
Article
Vessel Trajectory Prediction Based on Automatic Identification System Data: Multi-Gated Attention Encoder Decoder Network
by Fan Yang, Chunlin He, Yi Liu, Anping Zeng and Longhe Hu
J. Mar. Sci. Eng. 2024, 12(10), 1695; https://doi.org/10.3390/jmse12101695 - 24 Sep 2024
Viewed by 749
Abstract
Utilizing time-series data from ship trajectories to forecast their subsequent movement is crucial for enhancing safety in maritime traffic environments. The application of deep learning techniques leveraging Automatic Identification System (AIS) data has emerged as a pivotal area in maritime traffic studies, and within this domain the precise forecasting of ship trajectories stands as a central challenge. In this study, we propose the multi-gated attention encoder decoder (MGAED) network, an encoder–decoder model specialized for predicting ship trajectories in canals. The model employs a long short-term memory network (LSTM) as the encoder, combined with multiple Gated Recurrent Units (GRUs) and an attention mechanism in the decoder. Long-term dependencies in the time-series data are captured through the GRUs, the attention mechanism strengthens the model's ability to capture key information, and a soft-threshold residual structure is introduced to handle sparse features, thus enhancing the model's generalization ability and robustness. The efficacy of our model is substantiated by an extensive evaluation against current deep learning benchmarks: compared with existing deep learning methods, our model shows significant improvements in prediction accuracy, with at least a 9.63% reduction in the mean absolute error (MAE) and at least a 20.0% reduction in the mean square error (MSE), providing a new solution for improving the accuracy and efficiency of ship trajectory prediction.
(This article belongs to the Section Ocean Engineering)
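An LSTM-encoder / GRU-decoder with dot-product attention over encoder states, of the general kind described above, can be sketched as follows; the feature layout (lat, lon, SOG, COG), hidden sizes, and prediction horizon are assumptions, and the soft-threshold residual structure is omitted.

```python
import torch
import torch.nn as nn

class Seq2SeqTrajectory(nn.Module):
    """LSTM encoder with a GRU decoder and dot-product attention over encoder
    states, predicting the next positions. Dimensions are illustrative."""
    def __init__(self, in_dim=4, hidden=64, out_dim=2, horizon=5):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(out_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, out_dim)

    def forward(self, hist):                             # hist: B x T x 4 (lat, lon, SOG, COG), assumed layout
        enc, (h, _) = self.encoder(hist)
        state = h                                        # init decoder with encoder state
        step = hist[:, -1:, :2]                          # last known position
        preds = []
        for _ in range(self.horizon):
            dec, state = self.decoder(step, state)
            attn = torch.softmax(dec @ enc.transpose(1, 2), dim=-1)  # B x 1 x T
            ctx = attn @ enc                                          # attention context
            step = self.head(torch.cat([dec, ctx], dim=-1))           # next position
            preds.append(step)
        return torch.cat(preds, dim=1)                   # B x horizon x 2

track = torch.randn(8, 20, 4)
future = Seq2SeqTrajectory()(track)                      # 8 x 5 x 2
```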
Figures:
Figure 1: Overview of our model, illustrating the input data flow through the encoder and decoder layers, and the final output prediction.
Figure 2: LSTM cell structure; h_{t-1} and C_{t-1} are the hidden state and cell state of the previous LSTM cell, and x_t denotes the input at the current time step.
Figure 3: GRU cell structure; h_{t-1} is the hidden state of the previous GRU cell, and x_t denotes the input at the current time step.
Figure 4: Tracks of two rivers. The Columbia River is shown in (a), while the Hudson River is shown in (b).
Figure 5: Different aspects of the Columbia River and Hudson River datasets. (a) Track lengths in the Columbia River dataset. (b) Track lengths in the Hudson River dataset. (c) Sampling time for the Columbia River dataset. (d) Sampling time for the Hudson River dataset. (e) Sampling distance for the Columbia River dataset. (f) Sampling distance for the Hudson River dataset.
Figure 6: Sliding window schematic.
Figure 7: Loss over epochs for different models. (a) Loss curves for different models on the Columbia River dataset. (b) Loss curves for different models on the Hudson River dataset. (c) Validation and training loss curves for our model on the Columbia River dataset. (d) Validation and training loss curves for our model on the Hudson River dataset.
Figure 8: Comparison of predicted and actual trajectories for the Columbia River.
Figure 9: Comparison of predicted and actual trajectories of the Hudson River.
22 pages, 45055 KiB  
Article
SA-SatMVS: Slope Feature-Aware and Across-Scale Information Integration for Large-Scale Earth Terrain Multi-View Stereo
by Xiangli Chen, Wenhui Diao, Song Zhang, Zhiwei Wei and Chunbo Liu
Remote Sens. 2024, 16(18), 3474; https://doi.org/10.3390/rs16183474 - 19 Sep 2024
Viewed by 962
Abstract
Satellite multi-view stereo (MVS) is a fundamental task in large-scale Earth surface reconstruction. Recently, learning-based multi-view stereo methods have shown promising results in this field. However, these methods are mainly developed by transferring the general learning-based MVS framework to satellite imagery, which lacks consideration of the specific terrain features of the Earth's surface and results in inadequate accuracy. In addition, mainstream learning-based methods mainly use equal height-interval partitioning, which insufficiently utilizes the height hypothesis surface and results in inaccurate height estimation. To address these challenges, we propose SA-SatMVS, an end-to-end terrain feature-aware height estimation network for large-scale Earth surface multi-view stereo that integrates information across different scales. First, we transform the Sobel operator into slope feature-aware kernels to extract terrain features, and a dual encoder–decoder architecture with residual blocks is applied to incorporate slope information and geometric structural characteristics to guide the reconstruction process. Second, we introduce a pixel-wise unequal-interval partition method using a Laplacian distribution based on the probability volume obtained from other scales, resulting in more accurate height hypotheses for height estimation. Third, we apply an adaptive spatial feature extraction network to search for the optimal fusion of feature maps at different scales. Extensive experiments on the WHU-TLC dataset demonstrate that our proposed model achieves an MAE of 1.875 and an RMSE of 3.785, constituting state-of-the-art performance.
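Turning the Sobel operator into fixed slope-aware convolution kernels can be sketched directly in PyTorch; treating the input as a single-channel height or feature map is an assumption for illustration.

```python
import torch
import torch.nn as nn

class SlopeFeatureConv(nn.Module):
    """Fixed Sobel kernels applied as a convolution to approximate terrain slope
    in x and y from a height (or feature) map. Illustrative sketch."""
    def __init__(self):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        kernels = torch.stack([sobel_x, sobel_y]).unsqueeze(1)           # 2 x 1 x 3 x 3
        self.conv = nn.Conv2d(1, 2, 3, padding=1, bias=False)
        self.conv.weight = nn.Parameter(kernels, requires_grad=False)    # frozen Sobel weights

    def forward(self, height_map):                 # B x 1 x H x W
        return self.conv(height_map)               # B x 2 x H x W slope components

slopes = SlopeFeatureConv()(torch.randn(1, 1, 128, 128))
```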
Show Figures

Figure 1

Figure 1
<p>Illustration of the overall SA-SatMVS architecture. The network consists of feature extraction, cost volume construction, cost volume regularization, height map regression, and DSM production, following a typical multi-stage coarse-to-fine framework. ASFE, UBHS, and TPGF are our novel modules. The baseline architecture is derived from SatMVS (CasMVSNet) [<a href="#B5-remotesensing-16-03474" class="html-bibr">5</a>].</p>
Full article ">Figure 2
<p>Illustration of the adaptive spatial feature extraction network. The left, middle, and right parts represent the optimal integration of the three stages.</p>
Full article ">Figure 3
<p>Illustration of Slope-Net. The middle part is the SK Conv Layer, which contains two directional Sobel kernels.</p>
Full article ">Figure 4
<p>Comparison of the convolutional layers and geometric convolutional layers. A geometric convolutional layer augments a convolutional layer by concatenating three extra channels (X, Y, and Z) to the input.</p>
Full article ">Figure 5
<p>Illustration of the terrain-prior-guided feature fusion network. The middle part consists of dual encoder–decoder architecture. The right part contains the basic FPN network.</p>
Full article ">Figure 6
<p>The height map results of SatMVS (CasMVSNet), SatMVS (UCS-Net), SatMVS (RED-Net), and SA-SatMVS.</p>
Full article ">Figure 7
<p>The DSM results of SatMVS (CasMVSNet), SatMVS (UCS-Net), SatMVS (RED-Net), and SA-SatMVS.</p>
Full article ">
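">
As a side note on Figures 3 and 4 above, a minimal PyTorch sketch of the two mechanisms they depict is given here (an assumption for illustration only; the module name, channel counts, and the fusion step are not taken from the paper). Fixed directional Sobel kernels produce slope channels that are concatenated to the input before an ordinary convolution; the geometric convolutional layer follows the same pattern with per-pixel X, Y, Z coordinates instead of slope channels.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SlopeAwareConv(nn.Module):
    # Concatenates Sobel-derived slope channels to the input, then convolves.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.],
                                [-2., 0., 2.],
                                [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        # Two fixed (non-trainable) directional kernels, shape (2, 1, 3, 3).
        self.register_buffer("slope_kernels", torch.stack([sobel_x, sobel_y]).unsqueeze(1))
        self.fuse = nn.Conv2d(in_channels + 2, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Estimate x/y slopes from the first band, e.g. a grayscale image or height map.
        slopes = F.conv2d(x[:, :1], self.slope_kernels, padding=1)   # (N, 2, H, W)
        return self.fuse(torch.cat([x, slopes], dim=1))

A geometric convolutional layer in the sense of Figure 4 would replace the slope channels with an (N, 3, H, W) tensor of X, Y, and Z coordinates concatenated in the same way.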
19 pages, 5464 KiB  
Article
A Multi-Scale Liver Tumor Segmentation Method Based on Residual and Hybrid Attention Enhanced Network with Contextual Integration
by Liyan Sun, Linqing Jiang, Mingcong Wang, Zhenyan Wang and Yi Xin
Sensors 2024, 24(17), 5845; https://doi.org/10.3390/s24175845 - 9 Sep 2024
Viewed by 1029
Abstract
Liver cancer is one of the malignancies with high mortality rates worldwide, and its timely detection and accurate diagnosis are crucial for improving patient prognosis. To address the limitations of traditional image segmentation techniques and the U-Net network in capturing fine image features, [...] Read more.
Liver cancer is one of the malignancies with high mortality rates worldwide, and its timely detection and accurate diagnosis are crucial for improving patient prognosis. To address the limitations of traditional image segmentation techniques and the U-Net network in capturing fine image features, this study proposes an improved model based on the U-Net architecture, named RHEU-Net. By replacing the traditional convolution modules in the encoder and decoder with improved residual modules, the network’s feature extraction capability and gradient stability are enhanced. A Hybrid Gated Attention (HGA) module is integrated before the skip connections; it computes channel and spatial attention in parallel, optimizes the feature fusion strategy, and helps restore image details. A Multi-Scale Feature Enhancement (MSFE) layer is introduced at the bottleneck, using multi-scale feature extraction to further enlarge the receptive field and enrich contextual information, improving the overall feature representation. Testing on the LiTS2017 dataset demonstrated that RHEU-Net achieved Dice scores of 95.72% for liver segmentation and 70.19% for tumor segmentation. These results validate the effectiveness of RHEU-Net and underscore its potential for clinical application. Full article
(This article belongs to the Section Intelligent Sensors)
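The abstract's Hybrid Gated Attention can be pictured as two attention branches computed in parallel and blended by a learnable gate before the skip connection. The PyTorch sketch below is a minimal illustration under that reading (the branch designs, the scalar gate, and all names are assumptions, not the authors' implementation):

import torch
import torch.nn as nn

class HybridGatedAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel branch: global pooling followed by a bottleneck 1x1-conv MLP.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: a single wide convolution producing a per-pixel weight map.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Learnable scalar gate balancing the two branches.
        self.gate = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        ca = x * self.channel_att(x)   # channel-reweighted features
        sa = x * self.spatial_att(x)   # spatially reweighted features
        g = torch.sigmoid(self.gate)   # gate value in (0, 1)
        return g * ca + (1.0 - g) * sa

In an encoder–decoder network, such a block would sit on the encoder feature map just before it is concatenated with the corresponding decoder feature map.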
Show Figures

Figure 1

Figure 1
<p>Architecture of RHEU-Net, where Module A denotes the residual module, Module B refers to the Multi-Scale Feature Enhancement module (MSFE), Module C indicates the Hybrid Gated Attention module (HGA), and Module D includes convolution operations.</p>
Full article ">Figure 2
<p>(<b>a</b>) Structure of the residual module in ResNet; (<b>b</b>) Structure of the residual module used in the encoder; (<b>c</b>) Structure of the residual module used in the decoder.</p>
Full article ">Figure 3
<p>Structure of the Hybrid Gated Attention module.</p>
Full article ">Figure 4
<p>Structure of the Channel Attention Module.</p>
Full article ">Figure 5
<p>Structure of the spatial attention module.</p>
Full article ">Figure 6
<p>Structure of the Hybrid Gated Attention Module.</p>
Full article ">Figure 7
<p>Structure of the Multi-Scale Feature Enhancement Module.</p>
Full article ">Figure 8
<p>(<b>a</b>) Original image; (<b>b</b>) Flip horizontal; (<b>c</b>) Flip vertical; (<b>d</b>) Left rotation; (<b>e</b>) Right rotation.</p>
Full article ">Figure 9
<p>Segmentation results from various networks on selected test set images in the ablation experiment. From left to right: (<b>a</b>) original CT image, (<b>b</b>) gold standard, (<b>c</b>) U-Net, (<b>d</b>) Res+U-Net, (<b>e</b>) HGA+U-Net, (<b>f</b>) MSFE+U-Net, (<b>g</b>) Res+HGA+U-Net, and (<b>h</b>) RHEU-Net (method described in this study).</p>
Full article ">Figure 10
<p>Training loss trends of different models.</p>
Full article ">Figure 11
<p>Comparison of liver segmentation results from different networks against the gold standard. From left to right, the images represent: (<b>a</b>) original CT image, (<b>b</b>) gold standard, (<b>c</b>) U-Net, (<b>d</b>) AttentionUnet, (<b>e</b>) ResUnet-a, (<b>f</b>) CAUnet, (<b>g</b>) Res Unet++, (<b>h</b>) RIUNet, (<b>i</b>) RHEU-Net (method described in this study).</p>
Full article ">
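">
Finally, for the Multi-Scale Feature Enhancement layer placed at the bottleneck (Figure 7 above), one common realization of multi-scale feature extraction that enlarges the receptive field is a set of parallel dilated convolutions fused back into the trunk. The PyTorch sketch below is such a generic variant, offered as an assumption for illustration rather than the paper's MSFE:

import torch
import torch.nn as nn

class MultiScaleFeatureEnhancement(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        # One 3x3 branch per dilation rate; larger dilations see wider context.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(multi_scale)   # residual fusion of multi-scale context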