Search Results (132)

Search Parameters:
Keywords = dual-branch feature fusion

17 pages, 18083 KiB  
Article
Robust Multi-Subtype Identification of Breast Cancer Pathological Images Based on a Dual-Branch Frequency Domain Fusion Network
by Jianjun Li, Kaiyue Wang and Xiaozhe Jiang
Sensors 2025, 25(1), 240; https://doi.org/10.3390/s25010240 - 3 Jan 2025
Viewed by 232
Abstract
Breast cancer (BC) is one of the most lethal cancers worldwide, and its early diagnosis is critical for improving patient survival rates. However, the extraction of key information from complex medical images and the attainment of high-precision classification present a significant challenge. In the field of signal processing, texture-rich images typically exhibit periodic patterns and structures, which are manifested as significant energy concentrations at specific frequencies in the frequency domain. Given the above considerations, this study is designed to explore the application of frequency domain analysis in BC histopathological classification. This study proposes the dual-branch adaptive frequency domain fusion network (AFFNet), designed to enable each branch to specialize in distinct frequency domain features of pathological images. Additionally, two different frequency domain approaches, namely Multi-Spectral Channel Attention (MSCA) and Fourier Filtering Enhancement Operator (FFEO), are employed to enhance the texture features of pathological images and minimize information loss. Moreover, the contributions of the two branches at different stages are dynamically adjusted by a frequency-domain-adaptive fusion strategy to accommodate the complexity and multi-scale features of pathological images. The experimental results, based on two public BC histopathological image datasets, corroborate the idea that AFFNet outperforms 10 state-of-the-art image classification methods, underscoring its effectiveness and superiority in this domain. Full article
(This article belongs to the Special Issue AI-Based Automated Recognition and Detection in Healthcare)
Figures:
Figure 1: The hierarchical network architecture of AFFNet.
Figure 2: The basic architecture in each stage of AFFNet.
Figure 3: ROC and confusion matrix.
Figure 4: Visualization of t-SNE of different networks.
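
The AFFNet abstract above leans on a Fourier Filtering Enhancement Operator (FFEO) to amplify texture in the frequency domain. The paper's exact operator is not reproduced here; the sketch below is a hypothetical high-frequency boosting filter in PyTorch, with the cutoff ratio and boost factor chosen only for illustration.

```python
import torch

def fourier_filter_enhance(x: torch.Tensor, cutoff: float = 0.1, boost: float = 1.5) -> torch.Tensor:
    """Hypothetical frequency-domain texture enhancement (not the paper's exact FFEO).

    x: feature map of shape (B, C, H, W). High-frequency components outside a centered
    low-frequency square of relative size `cutoff` are amplified by `boost`.
    """
    B, C, H, W = x.shape
    spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))

    # Build a mask that keeps low frequencies at 1.0 and boosts the rest.
    mask = torch.full((H, W), boost, dtype=x.dtype, device=x.device)
    ch, cw = H // 2, W // 2
    rh, rw = max(1, int(H * cutoff)), max(1, int(W * cutoff))
    mask[ch - rh:ch + rh, cw - rw:cw + rw] = 1.0

    spec = spec * mask
    out = torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1)), norm="ortho")
    return out.real

# Example: enhance a random two-image, three-channel batch.
enhanced = fourier_filter_enhance(torch.randn(2, 3, 64, 64))
```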
21 pages, 6626 KiB  
Article
A Text-Based Dual-Branch Person Re-Identification Algorithm Based on the Deep Attribute Information Mining Network
by Ke Han, Xiyan Zhang, Wenlong Xu and Long Jin
Symmetry 2025, 17(1), 64; https://doi.org/10.3390/sym17010064 - 2 Jan 2025
Viewed by 277
Abstract
Text-based person re-identification enables the retrieval of specific pedestrians from a large image library using textual descriptions, effectively addressing the issue of missing pedestrian images. The main challenges in this task are to learn discriminative image–text features and achieve accurate cross-modal matching. Despite the potential of leveraging semantic information from pedestrian attributes, current methods have not yet fully harnessed this resource. To this end, we introduce a novel Text-based Dual-branch Person Re-identification Algorithm based on the Deep Attribute Information Mining (DAIM) network. Our approach employs a Masked Language Modeling (MLM) module to learn cross-modal attribute alignments through mask language modeling, and an Implicit Relational Prompt (IRP) module to extract relational cues between pedestrian attributes using tailored prompt templates. Furthermore, drawing inspiration from feature fusion techniques, we developed a Symmetry Semantic Feature Fusion (SSF) module that utilizes symmetric relationships between attributes to enhance the integration of information from different modes, aiming to capture comprehensive features and facilitate efficient cross-modal interactions. We evaluated our method using three benchmark datasets, CUHK-PEDES, ICFG-PEDES, and RSTPReid, and the results demonstrated Rank-1 accuracy rates of 78.17%, 69.47%, and 68.30%, respectively. These results indicate a significant enhancement in pedestrian retrieval accuracy, thereby validating the efficacy of our proposed approach. Full article
(This article belongs to the Section Computer)
Figures:
Figure 1: The overall network framework of DAIM.
Figure 2: Masked language modeling module. This module utilizes a cross-modal encoder to effectively match image and text features by learning to predict the masked text.
Figure 3: Implicit relational prompt module.
Figure 4: Semantic feature fusion module.
Figure 5: The architecture of the AF network. Green represents the text features; blue represents the image features.
Figure 6: Influence of ε in the ID loss on Rank-1 and mAP performance on CUHK-PEDES.
Figure 7: Comparison of MLM model results with different mask rates on CUHK-PEDES.
Figure 8: (a) The effect of the number m on CUHK-PEDES. (b) The effect of the number r on CUHK-PEDES.
Figure 9: Effect of different combinations of m and r on model performance on CUHK-PEDES.
Figure 10: Rank-1 values for 20 independent runs of four models on the CUHK-PEDES dataset.
Figure 11: Procedure of K-fold cross-validation. Light blue represents the training folds; dark blue indicates the test fold.
Figure 12: (a) Changes in Rank-1 and mAP across epochs on the CUHK-PEDES dataset; (b) changes in Rank-1 and mAP across epochs on the ICFG-PEDES dataset.
Figure 13: (a) The original pedestrian image; (b) the heatmap of the Baseline; (c) the heatmap of DAIM.
Figure 14: Comparison of Baseline-based and DAIM-based visualization of retrieval results.
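
The DAIM entry above relies on a Masked Language Modeling (MLM) module that predicts masked caption tokens to align image and text features. As a generic illustration of the masking step only (the mask rate, mask token id, and ignore index are standard BERT-style choices, not the paper's settings):

```python
import torch

def mask_tokens(token_ids: torch.Tensor, mask_id: int, mask_rate: float = 0.15):
    """Randomly mask tokens for an MLM objective (illustrative, not DAIM's exact scheme)."""
    labels = token_ids.clone()
    probs = torch.rand_like(token_ids, dtype=torch.float)
    masked = probs < mask_rate                      # positions the model must predict
    labels[~masked] = -100                          # default ignore index for cross-entropy
    inputs = token_ids.clone()
    inputs[masked] = mask_id                        # replace selected tokens with [MASK]
    return inputs, labels

ids = torch.randint(5, 1000, (4, 32))               # batch of 4 captions, 32 tokens each
inputs, labels = mask_tokens(ids, mask_id=0)
```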
23 pages, 104931 KiB  
Article
Applications of the FusionScratchNet Algorithm Based on Convolutional Neural Networks and Transformer Models in the Detection of Cell Phone Screen Scratches
by Zhihong Cao, Kun Liang, Sheng Tang and Cheng Zhang
Electronics 2025, 14(1), 134; https://doi.org/10.3390/electronics14010134 - 31 Dec 2024
Viewed by 293
Abstract
Screen defect detection has become a crucial research domain, propelled by the growing necessity of precise and effective quality control in mobile device production. This study presents the FusionScratchNet (FS-Net), a novel algorithm developed to overcome the challenges of noise interference and to characterize indistinct defects and subtle scratches on mobile phone screens. By integrating the transformer and convolutional neural network (CNN) architectures, FS-Net effectively captures both global and local features, thereby enhancing feature representation. The global–local feature integrator (GLFI) module effectively fuses global and local information through unique channel splitting, feature dependency characterization, and attention mechanisms, thereby enhancing target features and suppressing noise. The bridge attention (BA) module calculates an attention feature map based on the multi-layer fused features, precisely focusing on scratch characteristics and recovering details lost during downsampling. Evaluations using the PKU-Market-Phone dataset demonstrated an overall accuracy of 98.04%, an extended intersection over union (EIoU) of 88.03%, and an F1-score of 65.13%. In comparison to established methods like you only look once (YOLO) and retina network (RetinaNet), FS-Net demonstrated enhanced detection accuracy, computational efficiency, and resilience against noise. The experimental results demonstrated that the proposed method effectively enhances the accuracy of scratch segmentation. Full article
Figures:
Figure 1: Structure of the FS-Net designed for mobile screen scratch detection.
Figure 2: (a) Part of the network structure of ResNet50; (b) residual connection block.
Figure 3: GLFI module structure.
Figure 4: Spatial attention and channel attention modules.
Figure 5: BA attention module.
Figure 6: Sample dataset display.
Figure 7: Different overlap levels with the same IoU values.
Figure 8: Overlap between predicted and actual boxes.
Figure 9: (a) Regression error curves of different loss functions; (b) variation trends of IoU box plots under different loss functions.
Figure 10: (a) Regression error curves of different loss functions; (b) trends in the IoU box plots under different loss functions.
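
The GLFI module described above fuses global and local information through channel splitting and attention. The published design is not available here; the following is a minimal sketch of that general pattern, where the kernel sizes and the pooled channel gate are assumptions.

```python
import torch
import torch.nn as nn

class GlobalLocalFuse(nn.Module):
    """Split channels, process halves with a local conv and a global pooled gate, then fuse."""
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.local = nn.Conv2d(half, half, kernel_size=3, padding=1)   # local texture path
        self.global_gate = nn.Sequential(                              # global context path
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(half, half, kernel_size=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)
        a = self.local(a)
        b = b * self.global_gate(b)                        # channel reweighting from global stats
        return self.merge(torch.cat([a, b], dim=1)) + x    # residual keeps the original signal

fused = GlobalLocalFuse(64)(torch.randn(1, 64, 32, 32))
```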
18 pages, 6655 KiB  
Article
Curiosity-Driven Camouflaged Object Segmentation
by Mengyin Pang, Meijun Sun and Zheng Wang
Appl. Sci. 2025, 15(1), 173; https://doi.org/10.3390/app15010173 - 28 Dec 2024
Viewed by 213
Abstract
Camouflaged object segmentation refers to the task of accurately extracting objects that are seamlessly integrated within their surrounding environment. Existing deep-learning methods frequently encounter challenges in accurately segmenting camouflaged objects, particularly in capturing their complete and intricate details. To this end, we propose a novel method based on the Curiosity-Driven network, which is motivated by the innate human tendency for curiosity when encountering ambiguous regions and the subsequent drive to explore and observe objects’ details. Specifically, the proposed fusion bridge module aims to exploit the model’s inherent curiosity to fuse these features extracted by the dual-branch feature encoder to capture the complete details of the object. Then, drawing inspiration from curiosity, the curiosity-refinement module is proposed to progressively refine the initial predictions by exploring unknown regions within the object’s surrounding environment. Notably, we develop a novel curiosity-calculation operation to discover and remove curiosity, leading to accurate segmentation results. Extensive quantitative and qualitative experiments demonstrate that the proposed model significantly outperforms the existing competitors on three challenging benchmark datasets. Compared with the recently proposed state-of-the-art method, our model achieves performance gains of 1.80% on average for Sα. Moreover, our model can be extended to the polyp and industrial defects segmentation tasks, validating its robustness and effectiveness. Full article
Figures:
Figure 1: Visual comparison of COS in different challenging scenarios, including large objects, small objects, multiple objects, occluded objects, and objects with background matching (from top to bottom in the figure). The local heatmap is placed in the bottom right corner. Compared with the recently proposed CNN-based method FEDER [22] and Transformer-based HitNet [25], our method provides superior performance with more accurate object localization and more complete object segmentation, mainly due to the proposed fusion bridge module and curiosity-refinement module.
Figure 2: The overall architecture of the proposed CDNet consists of three key components, i.e., the dual-branch feature encoder (DFE), fusion bridge module (FBM), and curiosity-refinement module (CRM).
Figure 3: The detailed architecture of the local–global feature block (LGFB) and curiosity fusion block (CFB) in the fusion bridge module (FBM).
Figure 4: F-measure (top) and precision–recall (bottom) curves on the three camouflaged object datasets.
Figure 5: Visual comparison of the proposed model with state-of-the-art methods in several challenging scenarios, including large objects, small objects, multiple objects, occluded objects, and objects with background matching.
Figure 6: Visualization of intermediate results of our CDNet in several challenging scenarios, including large objects, small objects, multiple objects, occluded objects, and objects with background matching.
Figure 7: Extension applications. Visualization results in medicine (1st and 2nd rows) and industry (3rd and 4th rows).
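
The curiosity-refinement module above revisits ambiguous regions of an initial prediction. As a purely illustrative stand-in for the paper's curiosity-calculation operation, one can score each pixel by how close its predicted foreground probability is to 0.5 and reweight features toward those regions:

```python
import torch

def curiosity_map(logits: torch.Tensor) -> torch.Tensor:
    """Ambiguity score in [0, 1] per pixel: 1 where the prediction is most uncertain.

    logits: (B, 1, H, W) raw segmentation logits.
    """
    p = torch.sigmoid(logits)
    return 1.0 - torch.abs(2.0 * p - 1.0)      # peaks at p = 0.5, zero at confident pixels

def refine_with_curiosity(features: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
    """Reweight features toward uncertain regions so a refinement head can focus on them."""
    c = curiosity_map(logits)
    return features * (1.0 + c)                # emphasize ambiguous areas, keep the rest intact

refined = refine_with_curiosity(torch.randn(1, 64, 88, 88), torch.randn(1, 1, 88, 88))
```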
13 pages, 7018 KiB  
Article
Image Classification of Tree Species in Relatives Based on Dual-Branch Vision Transformer
by Qi Wang, Yanqi Dong, Nuo Xu, Fu Xu, Chao Mou and Feixiang Chen
Forests 2024, 15(12), 2243; https://doi.org/10.3390/f15122243 - 20 Dec 2024
Viewed by 408
Abstract
Tree species in relatives refer to species belonging to the same genus with high morphological similarity and small botanical differences, making it difficult to perform classification and usually requiring manual identification by experts. To reduce labor costs and achieve accurate species identification, we conducted research on the image classification of tree species in relatives based on deep learning and proposed a dual-branch feature fusion Vision Transformer model. This model is designed with a dual-branch architecture and two effective blocks, a Residual Cross-Attention Transformer Block and a Multi-level Feature Fusion method, to enhance the influence of shallow network features on the final classification and enable the model to capture both overall image information and detailed features. Finally, we conducted ablation studies and comparative experiments to validate the effectiveness of the model, achieving an accuracy of 90% on the tree relatives dataset. Full article
Figures:
Figure 1: Overall architecture diagram of the dual-branch Feature Fusion ViT. The constituent parts and process diagrams of the global branch and detail branch are described.
Figure 2: Residual Cross-Attention Transformer Block and related internal blocks. (a) The Transformer Encoder structure. (b) The Residual Cross-Attention Transformer Block structure. Each branch is concatenated with several Transformer Encoders, and the first attention score is added as a residual to the following Transformer Encoders. The results are passed into the Cross Attention step to achieve the interaction of dual-branch information.
Figure 3: Structure of the Multi-level Feature Fusion. The output features of three RC-Attention Transformer Blocks are concatenated and fused to obtain the averaged features.
Figure 4: Heat maps of the RC-Attention Transformer Block in three stages with dual branches.
Figure 5: The loss and accuracy curves of the Taxus dataset on the proposed approach.
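
The Residual Cross-Attention Transformer Block above lets the global and detail branches exchange information through cross-attention. A minimal sketch with torch.nn.MultiheadAttention follows; the embedding size, head count, and residual placement are illustrative rather than the paper's exact block.

```python
import torch
import torch.nn as nn

class CrossAttentionExchange(nn.Module):
    """Each branch attends to the other branch's tokens (simplified cross-attention)."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.g2d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.d2g = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_g = nn.LayerNorm(dim)
        self.norm_d = nn.LayerNorm(dim)

    def forward(self, global_tokens, detail_tokens):
        # The global branch queries the detail branch and vice versa; residual connections
        # keep each branch's own representation.
        g, _ = self.g2d(global_tokens, detail_tokens, detail_tokens)
        d, _ = self.d2g(detail_tokens, global_tokens, global_tokens)
        return self.norm_g(global_tokens + g), self.norm_d(detail_tokens + d)

g, d = torch.randn(2, 197, 256), torch.randn(2, 197, 256)
g_out, d_out = CrossAttentionExchange()(g, d)
```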
30 pages, 13159 KiB  
Article
GLMAFuse: A Dual-Stream Infrared and Visible Image Fusion Framework Integrating Local and Global Features with Multi-Scale Attention
by Fu Li, Yanghai Gu, Ming Zhao, Deji Chen and Quan Wang
Electronics 2024, 13(24), 5002; https://doi.org/10.3390/electronics13245002 - 19 Dec 2024
Viewed by 447
Abstract
Integrating infrared and visible-light images facilitates a more comprehensive understanding of scenes by amalgamating dual-sensor data derived from identical environments. Traditional CNN-based fusion techniques are predominantly confined to local feature emphasis due to their inherently limited receptive fields. Conversely, Transformer-based models tend to prioritize global information, which can lead to a deficiency in feature diversity and detail retention. Furthermore, methods reliant on single-scale feature extraction are inadequate for capturing extensive scene information. To address these limitations, this study presents GLMAFuse, an innovative dual-stream encoder–decoder network, which utilizes a multi-scale attention mechanism to harmoniously integrate global and local features. This framework is designed to maximize the extraction of multi-scale features from source images while effectively synthesizing local and global information across all layers. We introduce the global-aware and local embedding (GALE) module to adeptly capture and merge global structural attributes and localized details from infrared and visible imagery via a parallel dual-branch architecture. Additionally, the multi-scale attention fusion (MSAF) module is engineered to optimize attention weights at the channel level, facilitating an enhanced synergy between high-frequency edge details and global backgrounds. This promotes effective interaction and fusion of dual-modal features. Extensive evaluations using standard datasets demonstrate that GLMAFuse surpasses the existing leading methods in both qualitative and quantitative assessments, highlighting its superior capability in infrared and visible image fusion. On the TNO and MSRS datasets, our method achieves outstanding performance across multiple metrics, including EN (7.15, 6.75), SD (46.72, 47.55), SF (12.79, 12.56), MI (2.21, 3.22), SCD (1.75, 1.80), VIF (0.79, 1.08), Qbaf (0.58, 0.71), and SSIM (0.99, 1.00). These results underscore its exceptional proficiency in infrared and visible image fusion. Full article
(This article belongs to the Special Issue Artificial Intelligence Innovations in Image Processing)
Figures:
Figure 1: The qualitative fusion results based on CNN, AE, GAN, Transformer frameworks, and GLMAFuse.
Figure 2: Deep model based on feature-level fusion.
Figure 3: The framework of the proposed GLMAFuse for IVIF (where I_F denotes the fused image, I_ir the infrared image, and I_vis the visible image).
Figure 4: Overview of the global-aware and local embedding module.
Figure 5: Overview of the dual-modal interactive residual fusion block.
Figure 6: Comparative analysis of visual fusion methods on the TNO dataset. Subfigure (a) displays images of a rural scene, while subfigure (b) shows images of an urban scene.
Figure 7: Cumulative distribution of 8 metrics from the TNO dataset. A point (x, y) on the curve indicates that (100 × x)% of the image pairs have metric values not exceeding y.
Figure 8: Qualitative comparison results on the MSRS dataset. Subfigure (a) displays images of a nighttime road scene, while subfigure (b) shows images of a daytime road scene.
Figure 9: Cumulative distribution of 8 metrics from the MSRS dataset. A point (x, y) on the curve indicates that (100 × x)% of the image pairs have metric values not exceeding y.
Figure 10: Qualitative results of the ablation experiment of GALE on the RoadScene dataset.
Figure 11: Qualitative results of the ablation experiment of MSAF on the RoadScene dataset.
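
The MSAF module above learns channel-level attention weights that balance infrared and visible features. A compact sketch of that idea is given below; the gating network and the softmax normalization over the two modalities are assumptions, not the published module.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Fuse two modality feature maps with learned per-channel weights that sum to 1."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
        )
        self.channels = channels

    def forward(self, f_ir, f_vis):
        logits = self.gate(torch.cat([f_ir, f_vis], dim=1))             # (B, 2C, 1, 1)
        w = torch.softmax(logits.view(-1, 2, self.channels, 1, 1), dim=1)
        return w[:, 0] * f_ir + w[:, 1] * f_vis                         # convex combination per channel

fused = ChannelAttentionFusion(64)(torch.randn(1, 64, 48, 48), torch.randn(1, 64, 48, 48))
```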
20 pages, 3034 KiB  
Article
HDCTfusion: Hybrid Dual-Branch Network Based on CNN and Transformer for Infrared and Visible Image Fusion
by Wenqing Wang, Lingzhou Li, Yifei Yang, Han Liu and Runyuan Guo
Sensors 2024, 24(23), 7729; https://doi.org/10.3390/s24237729 - 3 Dec 2024
Viewed by 530
Abstract
The purpose of infrared and visible image fusion is to combine the advantages of both and generate a fused image that contains target information and has rich details and contrast. However, existing fusion algorithms often overlook the importance of incorporating both local and global feature extraction, leading to missing key information in the fused image. To address these challenges, this paper proposes a dual-branch fusion network combining convolutional neural network (CNN) and Transformer, which enhances the feature extraction capability and motivates the fused image to contain more information. Firstly, a local feature extraction module with CNN as the core is constructed. Specifically, the residual gradient module is used to enhance the ability of the network to extract texture information. Also, jump links and coordinate attention are used in order to relate shallow features to deeper ones. In addition, a global feature extraction module based on Transformer is constructed. Through the powerful ability of Transformer, the global context information of the image can be captured and the global features are fully extracted. The effectiveness of the proposed method in this paper is verified on different experimental datasets, and it is better than most of the current advanced fusion algorithms. Full article
Figures:
Figure 1: General framework of the proposed network.
Figure 2: Local feature extraction module.
Figure 3: Coordinate attention module.
Figure 4: Global feature extraction module.
Figure 5: Subjective comparison of three pairs of images on the MSRS dataset. (a) Infrared, (b) Visible, (c) DeepFuse, (d) DenseFuse, (e) RFN-Nest, (f) SeAFusion, (g) SwinFuse, (h) U2, (i) ITFuse, (j) Ours.
Figure 6: Objective comparison of eight indicators on ten image pairs from the MSRS dataset.
Figure 7: Subjective comparison of three pairs of images on the TNO dataset. (a) Infrared, (b) Visible, (c) DeepFuse, (d) DenseFuse, (e) RFN-Nest, (f) SeAFusion, (g) SwinFuse, (h) U2, (i) ITFuse, (j) Ours.
Figure 8: Objective comparison of eight indicators on ten image pairs from the TNO dataset.
Figure 9: Subjective comparison of three pairs of images on the RoadScene dataset. (a) Infrared, (b) Visible, (c) DeepFuse, (d) DenseFuse, (e) RFN-Nest, (f) SeAFusion, (g) SwinFuse, (h) U2, (i) ITFuse, (j) Ours.
Figure 10: Objective comparison of eight indicators on ten image pairs from the RoadScene dataset.
Figure 11: Network frameworks for the LFEM ablation experiments. (a) The network with only convolutional layers, (b) the network with the residual gradient module, (c) the network with the residual gradient module and CA.
Figure 12: Network frameworks for the GFEM ablation experiments. (a) The proposed network without the GFEM, (b) the proposed network.
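
The local branch above uses a residual gradient module to strengthen texture extraction. Residual gradient blocks are often built around fixed Sobel kernels, so the sketch below follows that assumption; it is not taken from the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualGradientBlock(nn.Module):
    """Adds Sobel gradient-magnitude features back onto a convolutional feature map."""
    def __init__(self, channels: int):
        super().__init__()
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", kx.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.register_buffer("ky", kx.t().reshape(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        c = x.shape[1]
        gx = F.conv2d(x, self.kx, padding=1, groups=c)    # depthwise Sobel, horizontal
        gy = F.conv2d(x, self.ky, padding=1, groups=c)    # depthwise Sobel, vertical
        grad = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
        return x + self.fuse(torch.cat([self.conv(x), grad], dim=1))   # residual connection

out = ResidualGradientBlock(32)(torch.randn(1, 32, 64, 64))
```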
19 pages, 41938 KiB  
Article
MMYFnet: Multi-Modality YOLO Fusion Network for Object Detection in Remote Sensing Images
by Huinan Guo, Congying Sun, Jing Zhang, Wuxia Zhang and Nengshuang Zhang
Remote Sens. 2024, 16(23), 4451; https://doi.org/10.3390/rs16234451 - 27 Nov 2024
Viewed by 624
Abstract
Object detection in remote sensing images is crucial for airport management, hazard prevention, traffic monitoring, and more. The precise ability for object localization and identification enables remote sensing imagery to provide early warnings, mitigate risks, and offer strong support for decision-making processes. While traditional deep learning-based object detection techniques have achieved significant results in single-modal environments, their detection capabilities still encounter challenges when confronted with complex environments, such as adverse weather conditions or situations where objects are obscured. To overcome the limitations of existing fusion methods in terms of complexity and insufficient information utilization, we innovatively propose a Cosine Similarity-based Image Feature Fusion (CSIFF) module and integrate it into a dual-branch YOLOv8 network, constructing a lightweight and efficient target detection network called Multi-Modality YOLO Fusion Network (MMYFNet). This network utilizes cosine similarity to divide the original features into common features and specific features, which are then refined and fused through specific modules. Experimental and analytical results show that MMYFNet performs excellently on both the VEDAI and FLIR datasets, achieving mAP values of 80% and 76.8%, respectively. Further validation through parameter sensitivity experiments, ablation studies, and visual analyses confirms the effectiveness of the CSIFF module. MMYFNet achieves high detection accuracy with fewer parameters, and the CSIFF module, as a plug-and-play module, can be integrated into other CNN-based cross-modality network models, providing a new approach for object detection in remote sensing image fusion. Full article
Figures:
Graphical abstract
Figure 1: General overview of the Multi-Modality YOLO Fusion Network (MMYFNet).
Figure 2: The overall architecture of Cosine Similarity-based Image Feature Fusion (CSIFF).
Figure 3: Cosine Similarity Diagram: Representing Similarity between Vectors through the Cosine of the Angle Between Them.
Figure 4: Structure of Feature Splitting (FS).
Figure 5: Structure of Distinct Feature Processing (DFP).
Figure 6: Structure of Similar Feature Processing (SFP).
Figure 7: Curve graph of model performance varying with adaptive parameter α1.
Figure 8: Feature visualization using the t-SNE method. (a) Distribution of original input features. (b) Features after partitioning by the FS module. (c) Features processed by the DFP and SFP modules.
Figure 9: Detection results of different fusion modules.
Figure 10: Detection results of different fusion modules.
Figure 11: Scatter plot of Params vs. mAP@0.5 for different network architectures on VEDAI (a) and FLIR (b).
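
CSIFF divides each modality's features into common and specific parts using cosine similarity. A minimal sketch of the splitting step is shown below; the per-pixel granularity and the 0.5 threshold are assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

def split_by_cosine(f_rgb: torch.Tensor, f_ir: torch.Tensor, thresh: float = 0.5):
    """Split features into 'common' and 'specific' parts via per-pixel cosine similarity.

    f_rgb, f_ir: (B, C, H, W). Pixels whose channel vectors are similar across modalities
    contribute to the common features; the remainder are treated as modality-specific.
    """
    sim = F.cosine_similarity(f_rgb, f_ir, dim=1, eps=1e-6).unsqueeze(1)   # (B, 1, H, W)
    common_mask = (sim > thresh).float()
    common = 0.5 * (f_rgb + f_ir) * common_mask
    specific_rgb = f_rgb * (1.0 - common_mask)
    specific_ir = f_ir * (1.0 - common_mask)
    return common, specific_rgb, specific_ir

common, s_rgb, s_ir = split_by_cosine(torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40))
```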
20 pages, 4950 KiB  
Article
A Dual-Branch Residual Network with Attention Mechanisms for Enhanced Classification of Vaginal Lesions in Colposcopic Images
by Haima Yang, Yeye Song, Yuling Li, Zubei Hong, Jin Liu, Jun Li, Dawei Zhang, Le Fu, Jinyu Lu and Lihua Qiu
Bioengineering 2024, 11(12), 1182; https://doi.org/10.3390/bioengineering11121182 - 22 Nov 2024
Viewed by 412
Abstract
Vaginal intraepithelial neoplasia (VAIN), linked to HPV infection, is a condition that is often overlooked during colposcopy, especially in the vaginal vault area, as clinicians tend to focus more on cervical lesions. This oversight can lead to missed or delayed diagnosis and treatment for patients with VAIN. Timely and accurate classification of VAIN plays a crucial role in the evaluation of vaginal lesions and the formulation of effective diagnostic approaches. The challenge is the high similarity between different classes and the low variability in the same class in colposcopic images, which can affect the accuracy, precision, and recall rates, depending on the image quality and the clinician’s experience. In this study, a dual-branch lesion-aware residual network (DLRNet), designed for small medical sample sizes, is introduced, which classifies vaginal lesions by examining the relationship between cervical and vaginal lesions. The DLRNet model includes four main components: a lesion localization module, a dual-branch classification module, an attention-guidance module, and a pretrained network module. The dual-branch classification module combines the original images with segmentation maps obtained from the lesion localization module using a pretrained ResNet network to fine-tune parameters at different levels, explore lesion-specific features from both global and local perspectives, and facilitate layered interactions. The feature guidance module focuses the local branch network on vaginal-specific features by using spatial and channel attention mechanisms. The final integration involves a shared feature extraction module and independent fully connected layers, which represent and merge the dual-branch inputs. The weighted fusion method effectively integrates multiple inputs, enhancing the discriminative and generalization capabilities of the model. Classification experiments on 1142 collected colposcopic images demonstrate that this method raises the existing classification levels, achieving the classification of VAIN into three lesion grades, thus providing a valuable tool for the early screening of vaginal diseases. Full article
(This article belongs to the Section Biosignal Processing)
Figures:
Graphical abstract
Figure 1: Three states of vaginal epithelium under iodine staining.
Figure 2: (Left) Variability in characteristics within the same type of vaginal epithelial lesion. (Right) Similar characteristics across different lesion types.
Figure 3: DLRNet comprises four main modules: the lesion localization and segmentation module, the dual-branch classification module, the attention-guidance module, and the weighted fusion module.
Figure 4: Localized colposcopic image: the red contours indicate the key area of interest in the vagina. The images above show the original with a red boundary, and the images below show the segmented results.
Figure 5: Network architecture of attention-guided blocks.
Figure 6: Confusion matrix of colposcopic predictions by physicians.
Figure 7: t-SNE visualization of ablation experimental results on the test set. (a) Single (global), (b) Single (local), (c) Dual (no pretraining), (d) Dual + Attention, (e) Dual* (pretrained), (f) Dual* + Attention.
Figure 8: ROC curves for each disease, comparing classic models and the proposed method. (a) 0: Normal, (b) 1: LSIL, (c) 2: HSIL+, (d) Mean.
Figure 9: Grad-CAM visualizations comparing the proposed method with other classical methods: (a) Endoscopic images; (b) CNN; (c) VGG Net-D & Net-E; (d) MobileNets; (e) ResNet (ILSVRC'15); (f) DenseNet-BC; (g) GoogLeNet; (h) EfficientNet; (i) DLRNet.
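
The dual-branch classifier above merges global and local representations through a weighted fusion before the final fully connected layers. A minimal sketch of such a fusion of branch logits follows; the single learnable mixing scalar is an assumption, not the paper's scheme.

```python
import torch
import torch.nn as nn

class WeightedLogitFusion(nn.Module):
    """Combine global- and local-branch class logits with a learnable mixing weight."""
    def __init__(self, feat_dim: int = 512, num_classes: int = 3):
        super().__init__()
        self.fc_global = nn.Linear(feat_dim, num_classes)
        self.fc_local = nn.Linear(feat_dim, num_classes)
        self.alpha = nn.Parameter(torch.tensor(0.5))   # learned balance between the branches

    def forward(self, global_feat, local_feat):
        w = torch.sigmoid(self.alpha)
        return w * self.fc_global(global_feat) + (1 - w) * self.fc_local(local_feat)

logits = WeightedLogitFusion()(torch.randn(4, 512), torch.randn(4, 512))
```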
13 pages, 14573 KiB  
Article
A Feature Integration Network for Multi-Channel Speech Enhancement
by Xiao Zeng, Xue Zhang and Mingjiang Wang
Sensors 2024, 24(22), 7344; https://doi.org/10.3390/s24227344 - 18 Nov 2024
Viewed by 548
Abstract
Multi-channel speech enhancement has become an active area of research, demonstrating excellent performance in recovering desired speech signals from noisy environments. Recent approaches have increasingly focused on leveraging spectral information from multi-channel inputs, yielding promising results. In this study, we propose a novel feature integration network that not only captures spectral information but also refines it through shifted-window-based self-attention, enhancing the quality and precision of the feature extraction. Our network consists of blocks containing a full- and sub-band LSTM module for capturing spectral information, and a global–local attention fusion module for refining this information. The full- and sub-band LSTM module integrates both full-band and sub-band information through two LSTM layers, while the global–local attention fusion module learns global and local attention in a dual-branch architecture. To further enhance the feature integration, we fuse the outputs of these branches using a spatial attention module. The model is trained to predict the complex ratio mask (CRM), thereby improving the quality of the enhanced signal. We conducted an ablation study to assess the contribution of each module, with each showing a significant impact on performance. Additionally, our model was trained on the SPA-DNS dataset using a circular microphone array and the Libri-wham dataset with a linear microphone array, achieving competitive results compared to state-of-the-art models. Full article
(This article belongs to the Section Sensor Networks)
Figures:
Figure 1: This diagram illustrates our proposed feature integration network. This architecture comprises multiple feature integration blocks, each containing a full- and sub-band module (the blue box) coupled with a global–local attention fusion module (the green box). * N means the integration block (the gray box) is repeated N times.
Figure 2: Diagram of the global and local attention fusion layer. It comprises two branches, a global branch and a local branch, along with a spatial attention (SA) module.
Figure 3: The window partition operation.
Figure 4: Spectrograms of the noisy, clean, and the five cases in Table 1 (A–E).
Figure 5: The influence of the reverberation time in terms of ΔPESQ, ΔSTOI, and ΔSI-SDR is shown in (a–c).
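
The model above predicts a complex ratio mask (CRM) that is applied to the noisy spectrum. Applying a CRM amounts to complex multiplication of mask and spectrogram; a small sketch with illustrative shapes (the frequency/frame sizes and tanh bounding are assumptions, not the paper's configuration):

```python
import torch

def apply_crm(noisy_stft: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Apply a complex ratio mask to a noisy STFT.

    noisy_stft: (B, F, T, 2) with real/imag parts in the last dim; mask has the same shape.
    Enhanced spectrum = mask * noisy, using complex multiplication.
    """
    nr, ni = noisy_stft[..., 0], noisy_stft[..., 1]
    mr, mi = mask[..., 0], mask[..., 1]
    er = mr * nr - mi * ni          # real part of the product
    ei = mr * ni + mi * nr          # imaginary part of the product
    return torch.stack([er, ei], dim=-1)

enhanced = apply_crm(torch.randn(2, 257, 100, 2), torch.tanh(torch.randn(2, 257, 100, 2)))
```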
18 pages, 3490 KiB  
Article
MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images
by Yan Wang, Li Cao and He Deng
Sensors 2024, 24(22), 7266; https://doi.org/10.3390/s24227266 - 13 Nov 2024
Viewed by 1208
Abstract
Semantic segmentation of remote sensing images is a fundamental task in computer vision, holding substantial relevance in applications such as land cover surveys, environmental protection, and urban building planning. In recent years, multi-modal fusion-based models have garnered considerable attention, exhibiting superior segmentation performance when compared with traditional single-modal techniques. Nonetheless, the majority of these multi-modal models, which rely on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for feature fusion, face limitations in terms of remote modeling capabilities or computational complexity. This paper presents a novel Mamba-based multi-modal fusion network called MFMamba for semantic segmentation of remote sensing images. Specifically, the network employs a dual-branch encoding structure, consisting of a CNN-based main encoder for extracting local features from high-resolution remote sensing images (HRRSIs) and of a Mamba-based auxiliary encoder for capturing global features on its corresponding digital surface model (DSM). To capitalize on the distinct attributes of the multi-modal remote sensing data from both branches, a feature fusion block (FFB) is designed to synergistically enhance and integrate the features extracted from the dual-branch structure at each stage. Extensive experiments on the Vaihingen and the Potsdam datasets have verified the effectiveness and superiority of MFMamba in semantic segmentation of remote sensing images. Compared with state-of-the-art methods, MFMamba achieves higher overall accuracy (OA) and a higher mean F1 score (mF1) and mean intersection over union (mIoU), while maintaining low computational complexity. Full article
(This article belongs to the Section Remote Sensors)
Figures:
Figure 1: The overall architecture of our proposed MFMamba.
Figure 2: (a) The detailed architecture of a VSS block. (b) The visualization of an SS2D unit.
Figure 3: (a) The overall architecture of an FFB. (b) The structure of an MCKA unit. (c) The structure of an EAA unit.
Figure 4: (a) The structure of a GLTB. (b) The structure of an FRH.
Figure 5: Samples (a,b) are 256 × 256 from Vaihingen and (c,d) are 256 × 256 from Potsdam. The first row shows the orthophotos with three channels (NIRRG for Vaihingen and RGB for Potsdam). The second and third rows show the corresponding depth information and semantic labels in pixel-wise mapping.
Figure 6: Visualization of the segmentation results from different methods on the Vaihingen dataset. (a) NIRRG images, (b) DSM, (c) Ground Truth, (d) CMFNet, (e) ABCNet, (f) TransUNet, (g) UNetFormer, (h) MAResU-Net, (i) CMTFNet, (j) RS3Mamba, and (k) the proposed MFMamba. Two purple boxes are added to each subfigure to highlight the differences.
Figure 7: Visualization of the segmentation results from different methods on the Potsdam dataset. (a) RGB images, (b) DSM, (c) Ground Truth, (d) CMFNet, (e) ABCNet, (f) TransUNet, (g) UNetFormer, (h) MAResU-Net, (i) CMTFNet, (j) RS3Mamba, and (k) the proposed MFMamba. Two purple boxes are added to each subfigure to highlight the differences.
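
At each stage, the FFB above injects the Mamba auxiliary branch's global (DSM) features into the CNN main branch. The sketch below shows one generic way to do such cross-branch enhancement with channel gating; it is an assumption about the mechanism, not the published FFB.

```python
import torch
import torch.nn as nn

class FeatureFusionBlock(nn.Module):
    """Let the auxiliary branch gate the main branch's channels, then merge both."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, main_feat, aux_feat):
        gated = main_feat * self.gate(aux_feat)                    # auxiliary context reweights main features
        return self.merge(torch.cat([gated, aux_feat], dim=1)) + main_feat

out = FeatureFusionBlock(96)(torch.randn(1, 96, 64, 64), torch.randn(1, 96, 64, 64))
```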
19 pages, 3896 KiB  
Article
No-Reference Quality Assessment Based on Dual-Channel Convolutional Neural Network for Underwater Image Enhancement
by Renzhi Hu, Ting Luo, Guowei Jiang, Zhiqiang Lin and Zhouyan He
Electronics 2024, 13(22), 4451; https://doi.org/10.3390/electronics13224451 - 13 Nov 2024
Viewed by 380
Abstract
Underwater images are important for underwater vision tasks, yet their quality often degrades during imaging, promoting the generation of Underwater Image Enhancement (UIE) algorithms. This paper proposes a Dual-Channel Convolutional Neural Network (DC-CNN)-based quality assessment method to evaluate the performance of different UIE algorithms. Specifically, inspired by the intrinsic image decomposition, the enhanced underwater image is decomposed into reflectance with color information and illumination with texture information based on the Retinex theory. Afterward, we design a DC-CNN with two branches to learn color and texture features from reflectance and illumination, respectively, reflecting the distortion characteristics of enhanced underwater images. To integrate the learned features, a feature fusion module and attention mechanism are conducted to align efficiently and reasonably with human visual perception characteristics. Finally, a quality regression module is used to establish the mapping relationship between the extracted features and quality scores. Experimental results on two public enhanced underwater image datasets (i.e., UIQE and SAUD) show that the proposed DC-CNN method outperforms a variety of the existing quality assessment methods. Full article
Figures:
Figure 1: General framework of the proposed DC-CNN method.
Figure 2: The decomposition results of different underwater images enhanced by three UIE algorithms. (a) RD-based [9]; (b) the reflectance of (a); (c) the illumination of (a); (d) Retinex [10]; (e) the reflectance of (d); (f) the illumination of (d); (g) Water-Net [17]; (h) the reflectance of (g); (i) the illumination of (g).
Figure 3: Schematic of the residual module.
Figure 4: Structure of the feature fusion module.
Figure 5: Fitted scatter plots of predicted scores (predicted by the NR-IQA methods) versus subjective mean opinion score (MOS) values (provided by the UIQE dataset). (a–r) correspond to DIIVINE [29], BRISQUE [30], GLBP [31], SSEQ [32], BMPRI [33], CNN-IQA [34], MUSIQ [37], VCRNet [38], UIQM [39], UCIQE [40], CCF [56], FDUM [41], NUIQ [43], UIQEI [40], Twice-Mixing [49], Uranker [50], UIQI [44], and the proposed method, respectively.
Figure 6: Subjective MOS of different enhanced underwater images in the UIQE database compared to the objective quality scores predicted by the proposed method.
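
The method above decomposes each enhanced underwater image into reflectance (color) and illumination (texture/lighting) following Retinex theory. A common simple instantiation, used here only to illustrate the decomposition (the paper's estimator may differ), takes illumination as a Gaussian-smoothed image and reflectance as the log-domain residual:

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(ksize: int = 15, sigma: float = 5.0) -> torch.Tensor:
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, ksize, ksize)

def retinex_decompose(img: torch.Tensor, ksize: int = 15, sigma: float = 5.0):
    """Split an image (B, C, H, W), values in (0, 1], into illumination and reflectance."""
    eps = 1e-4
    k = gaussian_kernel(ksize, sigma).repeat(img.shape[1], 1, 1, 1)
    illumination = F.conv2d(img + eps, k, padding=ksize // 2, groups=img.shape[1])
    reflectance = torch.log(img + eps) - torch.log(illumination + eps)   # log-domain residual
    return reflectance, illumination

refl, illum = retinex_decompose(torch.rand(1, 3, 128, 128))
```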
20 pages, 9098 KiB  
Article
Local–Global Feature Adaptive Fusion Network for Building Crack Detection
by Yibin He, Zhengrong Yuan, Xinhong Xia, Bo Yang, Huiting Wu, Wei Fu and Wenxuan Yao
Sensors 2024, 24(21), 7076; https://doi.org/10.3390/s24217076 - 3 Nov 2024
Viewed by 889
Abstract
Cracks represent one of the most common types of damage in building structures and it is crucial to detect cracks in a timely manner to maintain the safety of the buildings. In general, tiny cracks require focusing on local detail information while complex long cracks and cracks similar to the background require more global features for detection. Therefore, it is necessary for crack detection to effectively integrate local and global information. Focusing on this, a local–global feature adaptive fusion network (LGFAF-Net) is proposed. Specifically, we introduce the VMamba encoder as the global feature extraction branch to capture global long-range dependencies. To enhance the ability of the network to acquire detailed information, the residual network is added as another local feature extraction branch, forming a dual-encoding network to enhance the performance of crack detection. In addition, a multi-feature adaptive fusion (MFAF) module is proposed to integrate local and global features from different branches and facilitate representative feature learning. Furthermore, we propose a building exterior wall crack dataset (BEWC) captured by unmanned aerial vehicles (UAVs) to evaluate the performance of the proposed method used to identify wall cracks. Other widely used public crack datasets are also utilized to verify the generalization of the method. Extensive experiments performed on three crack datasets demonstrate the effectiveness and superiority of the proposed method. Full article
(This article belongs to the Special Issue Sensor-Fusion-Based Deep Interpretable Networks)
Figures:
Figure 1: Network framework of the proposed LGFAF-Net.
Figure 2: Network structure of the VSS block [41].
Figure 3: Network structure of the proposed MFAF module.
Figure 4: Network structure of the decoding stage.
Figure 5: Some examples from the DeepCrack dataset.
Figure 6: Some examples of the BEWC dataset.
Figure 7: Some examples from the CrackSeg9k dataset.
Figure 8: Visualization results of comparison experiments on the DeepCrack dataset. (a) Raw image; (b) ground truth; (c) CrackSegNet; (d) EMRA-Net; (e) CrackFormer-II; (f) APFNet; (g) proposed LGFAF-Net. Distinct regions are marked with red boxes.
Figure 9: Visualization results of comparison experiments on the BEWC dataset. (a) Raw image; (b) ground truth; (c) CrackSegNet; (d) EMRA-Net; (e) CrackFormer-II; (f) APFNet; (g) proposed LGFAF-Net. Distinct regions are marked with red boxes.
Figure 10: Visualization results of comparison experiments on the CrackSeg9k dataset. (a) Raw image; (b) ground truth; (c) CrackSegNet; (d) EMRA-Net; (e) CrackFormer-II; (f) APFNet; (g) proposed LGFAF-Net. Distinct regions are marked with red boxes.
Figure 11: Visualization results of ablation experiments. (a) Raw image; (b) ground truth; (c) baseline; (d) baseline + CNN encoder; (e) proposed LGFAF-Net.
Figure 12: Calculation of crack geometry information. The green line represents the center skeleton line of the crack and the red line represents the edge line of the crack.
Figure 13: Some examples of estimation of crack geometry information. (a) Raw image; (b) ground truth; (c) the edge lines and central skeleton lines; (d) crack width.
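
The MFAF module above adaptively weights the local (ResNet) and global (VMamba) branch features before fusing them. The following generic sketch assumes per-pixel soft weights predicted from the concatenated branches; it does not reproduce the published module.

```python
import torch
import torch.nn as nn

class AdaptiveBranchFusion(nn.Module):
    """Predict per-pixel soft weights for two branches and blend them."""
    def __init__(self, channels: int):
        super().__init__()
        self.weight_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, local_feat, global_feat):
        w = torch.softmax(self.weight_head(torch.cat([local_feat, global_feat], dim=1)), dim=1)
        fused = w[:, 0:1] * local_feat + w[:, 1:2] * global_feat   # spatially adaptive blend
        return self.refine(fused)

out = AdaptiveBranchFusion(64)(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
```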
21 pages, 37600 KiB  
Article
A Multi-Hierarchical Complementary Feature Interaction Network for Accelerated Multi-Modal MR Imaging
by Haotian Zhang, Qiaoyu Ma, Yiran Qiu and Zongying Lai
Appl. Sci. 2024, 14(21), 9764; https://doi.org/10.3390/app14219764 - 25 Oct 2024
Viewed by 696
Abstract
Magnetic resonance (MR) imaging is widely used in the clinical field due to its non-invasiveness, but the long scanning time is still a bottleneck for its popularization. Using the complementary information between multi-modal imaging to accelerate imaging provides a novel and effective MR fast imaging solution. However, previous technologies mostly use simple fusion methods and fail to fully utilize their potential sharable knowledge. In this study, we introduced a novel multi-hierarchical complementary feature interaction network (MHCFIN) to realize joint reconstruction of multi-modal MR images with undersampled data and thus accelerate multi-modal imaging. Firstly, multiple attention mechanisms are integrated with a dual-branch encoder–decoder network to represent shared features and complementary features of different modalities. In the decoding stage, the multi-modal feature interaction module (MMFIM) acts as a bridge between the two branches, realizing complementary knowledge transfer between different modalities through cross-level fusion. The single-modal feature fusion module (SMFFM) carries out multi-scale feature representation and optimization of the single modality, preserving better anatomical details. Extensive experiments are conducted under different sampling patterns and acceleration factors. The results show that this proposed method achieves obvious improvement compared with existing state-of-the-art reconstruction methods in both visual quality and quantity. Full article
Figures:
Figure 1: Illustration of different MRI modalities and the reconstruction comparison results on the fastMRI brain datasets. (a) Fully sampled T1WI and T2WI. (b) Zero-filled T1WI and T2WI with 1D random sampling at an 8× acceleration factor. (c) Reconstruction results of DuDoRNet [14]. (d) Reconstruction results of STUN [15]. (e) Reconstruction results of MHCFIN (ours). Green boxes highlight detailed structures.
Figure 2: Overall architecture of the proposed multi-hierarchical complementary feature interaction network (MHCFIN).
Figure 3: Detailed architecture of the proposed encoder–decoder and detailed construction of the triple attention.
Figure 4: Detailed structure of the three types of attention (channel, spatial, and gate attention).
Figure 5: The proposed multi-modal feature interaction module consists of a double cross-attention, used to interact with multi-scale features of different modalities at different hierarchical levels.
Figure 6: Detailed architecture of the single-modal feature fusion module.
Figure 7: Qualitative comparison with different reconstruction methods using a 1D random undersampling pattern with acceleration factor 8× on the fastMRI brain datasets. Ground truth (GT), zero-filled (ZF), reconstructed MR images (T1WI and T2WI), error maps, and zoomed-in details are provided.
Figure 8: Qualitative comparison with different reconstruction methods using a 1D equispaced undersampling pattern with acceleration factor 12× on the fastMRI brain datasets.
Figure 9: Qualitative comparison with different reconstruction methods using a 1D random undersampling pattern with acceleration factor 10× on the fastMRI knee datasets.
Figure 10: Bar charts for (a) PSNR, (b) SSIM, and (c) RLNE depicting the performance of various reconstruction methods using an 8× acceleration factor mask on the fastMRI brain dataset. The black arrows represent the standard deviation (mean ± standard deviation) for each method.
Figure 11: The training loss of different reconstruction methods on the fastMRI brain dataset.
Figure 12: Ablation study of the key components in the proposed method using a 1D random undersampling pattern with acceleration factor 8× on the fastMRI brain datasets.
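
The experiments above compare reconstructions against zero-filled baselines obtained by undersampling k-space. As a reference for what a zero-filled reconstruction at a given acceleration factor means, the sketch below uses a simple 1D random-column Cartesian mask with a fully sampled center, chosen purely for illustration.

```python
import torch

def zero_filled_recon(image: torch.Tensor, accel: int = 8, center_frac: float = 0.04) -> torch.Tensor:
    """Simulate 1D random Cartesian undersampling and return the zero-filled reconstruction.

    image: (H, W) tensor treated as a fully sampled MR image.
    """
    H, W = image.shape
    kspace = torch.fft.fftshift(torch.fft.fft2(image.to(torch.complex64), norm="ortho"))
    keep = torch.rand(W) < (1.0 / accel)                       # random phase-encode columns
    center = int(W * center_frac)
    keep[W // 2 - center // 2: W // 2 + center // 2] = True    # always keep low frequencies
    mask = keep.view(1, W).expand(H, W)
    under = torch.where(mask, kspace, torch.zeros_like(kspace))  # drop unsampled columns
    return torch.fft.ifft2(torch.fft.ifftshift(under), norm="ortho").abs()

zf = zero_filled_recon(torch.rand(256, 256))
```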
21 pages, 2583 KiB  
Article
MDAR: A Multiscale Features-Based Network for Remotely Measuring Human Heart Rate Utilizing Dual-Branch Architecture and Alternating Frame Shifts in Facial Videos
by Linhua Zhang, Jinchang Ren, Shuang Zhao and Peng Wu
Sensors 2024, 24(21), 6791; https://doi.org/10.3390/s24216791 - 22 Oct 2024
Viewed by 785
Abstract
Remote photoplethysmography (rPPG) refers to a non-contact technique that measures heart rate through analyzing the subtle signal changes of facial blood flow captured by video sensors. It is widely used in contactless medical monitoring, remote health management, and activity monitoring, providing a more convenient and non-invasive way to monitor heart health. However, factors such as ambient light variations, facial movements, and differences in light absorption and reflection pose challenges to deep learning-based methods. To solve these difficulties, we put forward a measurement network of heart rate based on multiscale features. In this study, we designed and implemented a dual-branch signal processing framework that combines static and dynamic features, proposing a novel and efficient method for feature fusion, enhancing the robustness and reliability of the signal. Furthermore, we proposed an alternate time-shift module to enhance the model’s temporal depth. To integrate the features extracted at different scales, we utilized a multiscale feature fusion method, enabling the model to accurately capture subtle changes in blood flow. We conducted cross-validation on three public datasets: UBFC-rPPG, PURE, and MMPD. The results demonstrate that MDAR not only ensures fast inference speed but also significantly improves performance. The two main indicators, MAE and MAPE, achieved improvements of at least 30.6% and 30.2%, respectively, surpassing state-of-the-art methods. These conclusions highlight the potential advantages of MDAR for practical applications. Full article
(This article belongs to the Special Issue Multi-Sensor Data Fusion)
Figures:
Figure 1: MDAR network structure diagram.
Figure 2: ATSM network structure diagram.
Figure 3: Sample dataset pictures.
Figure 4: MAE stacked bar plot.
Figure 5: MAPE stacked bar plot.
Figure 6: Visual example diagram.
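
The alternate time-shift module above deepens the temporal receptive field by moving part of the channels across frames, in the spirit of temporal shift modules. A simplified sketch follows; the shift fraction and the even/odd alternation of direction are assumptions, not the paper's exact ATSM.

```python
import torch

def alternate_time_shift(x: torch.Tensor, block_index: int, fold: int = 8) -> torch.Tensor:
    """Shift a fraction of channels along the time axis, alternating direction per block.

    x: (B, T, C, H, W) clip features. 1/fold of the channels move one frame forward or
    backward depending on whether the block index is even or odd; shifted-in frames are zero-padded.
    """
    B, T, C, H, W = x.shape
    n = C // fold
    out = x.clone()
    if block_index % 2 == 0:
        out[:, 1:, :n] = x[:, :-1, :n]     # shift forward in time
        out[:, 0, :n] = 0
    else:
        out[:, :-1, :n] = x[:, 1:, :n]     # shift backward in time
        out[:, -1, :n] = 0
    return out

shifted = alternate_time_shift(torch.randn(2, 16, 32, 8, 8), block_index=0)
```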