Search Results (529)

Search Parameters:
Keywords = Siamese

20 pages, 7507 KiB  
Article
Sliding-Window Dissimilarity Cross-Attention for Near-Real-Time Building Change Detection
by Wen Lu and Minh Nguyen
Remote Sens. 2025, 17(1), 135; https://doi.org/10.3390/rs17010135 - 2 Jan 2025
Viewed by 395
Abstract
A near-real-time change detection network can consistently identify unauthorized construction activities over a wide area, empowering authorities to enforce regulations efficiently. Furthermore, it can promptly assess building damage, enabling expedited rescue efforts. The extensive adoption of deep learning in change detection has prompted a predominant emphasis on enhancing detection performance, primarily through the expansion of the depth and width of networks, overlooking considerations regarding inference time and computational cost. To accurately represent the spatio-temporal semantic correlations between pre-change and post-change images, we create an innovative transformer attention mechanism named Sliding-Window Dissimilarity Cross-Attention (SWDCA), which detects spatio-temporal semantic discrepancies by explicitly modeling the dissimilarity of bi-temporal tokens, departing from the mono-temporal similarity attention typically used in conventional transformers. In order to fulfill the near-real-time requirement, SWDCA employs a sliding-window scheme to limit the range of the cross-attention mechanism within a predetermined window/dilated window size. This approach not only excludes distant and irrelevant information but also reduces computational cost. Furthermore, we develop a lightweight Siamese backbone for extracting building and environmental features. Subsequently, we integrate an SWDCA module into this backbone, forming an efficient change detection network. Quantitative evaluations and visual analyses of thorough experiments verify that our method achieves top-tier accuracy on two building change detection datasets of remote sensing imagery, while also achieving a real-time inference speed of 33.2 FPS on a mobile GPU. Full article
(This article belongs to the Special Issue Remote Sensing and SAR for Building Monitoring)
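The core operation described above, cross-attention restricted to local windows and driven by bi-temporal dissimilarity rather than similarity, can be sketched roughly as follows. This is a minimal PyTorch illustration under assumptions of my own (non-overlapping windows, no learned projections or multi-head split, dissimilarity taken as a negated scaled dot product); it is not the authors' implementation.

```python
import torch

def window_partition(x, ws):
    # x: (B, H, W, C) -> (num_windows * B, ws * ws, C); H and W must divide by ws
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def sliding_window_dissimilarity_cross_attention(feat_t1, feat_t2, ws=8):
    """Cross-attend post-change tokens to pre-change tokens inside each window,
    weighting by bi-temporal *dissimilarity* (negative scaled dot product)."""
    q = window_partition(feat_t2, ws)   # queries from the post-change image
    k = window_partition(feat_t1, ws)   # keys from the pre-change image
    v = k                               # values taken from the pre-change tokens (no projection)
    scale = q.shape[-1] ** -0.5
    dissim = -torch.einsum('bnc,bmc->bnm', q, k) * scale   # large when tokens differ
    attn = dissim.softmax(dim=-1)
    return torch.einsum('bnm,bmc->bnc', attn, v)

# toy bi-temporal feature maps: batch 1, 32x32 spatial grid, 64 channels
t1 = torch.randn(1, 32, 32, 64)
t2 = torch.randn(1, 32, 32, 64)
out = sliding_window_dissimilarity_cross_attention(t1, t2)
print(out.shape)   # torch.Size([16, 64, 64]): 16 windows, 64 tokens per window, 64 channels
```

Restricting attention to local windows keeps the cost linear in the number of windows, which is the source of the efficiency argument made in the abstract.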
Figures

Figure 1. The size of circles represents the number of network parameters; circles positioned closer to the top left indicate better performance. The computational complexity, quantified by Multiply–Accumulate Operations (MACs), was evaluated using bi-temporal image pairs with a resolution of 512 × 512 pixels.
Figure 2. The structure of the sliding-window dissimilarity cross-attention module, where (A, B) ∈ {(1, 2), (2, 1)} denote two time points and ⊗ represents matrix multiplication.
Figure 3. The architecture of our efficient change detection network, where ⊕ represents element-wise addition.
Figure 4. Structures of MBConv and Fused-MBConv [30].
Figure 5. Comparison of building change predictions generated by BAT and the SWDCA network on the LEVIR-CD+ dataset.
Figure 6. Comparison of building change predictions generated by BAT and the SWDCA network on the S2looking dataset.
Figure 7. Comparison of building change predictions generated by various methods on the S2looking dataset.
Figure 8. Failure cases on the S2looking dataset.
24 pages, 6638 KiB  
Article
Fault Diagnosis of Bearings with Small Sample Size Using Improved Capsule Network and Siamese Neural Network
by Jarula Yasenjiang, Yang Xiao, Chao He, Luhui Lv and Wenhao Wang
Sensors 2025, 25(1), 92; https://doi.org/10.3390/s25010092 - 27 Dec 2024
Viewed by 299
Abstract
This paper addresses the challenges of low accuracy and long transfer learning time in small-sample bearing fault diagnosis, which are often caused by limited samples, high noise levels, and poor feature extraction. We propose a method that combines an improved capsule network with a Siamese neural network. Multi-view data partitioning is used to enrich data diversity, and Markov transformation converts one-dimensional vibration signals into two-dimensional images, enhancing the visualization of signal features. The dynamic routing mechanism of the capsule network effectively captures and integrates key fault features, improving the model’s feature representation and robustness. The Siamese network shares weights to optimize feature matching, while SKNet dynamically adjusts feature fusion to enhance generalization performance. By integrating the Siamese neural network with SKNet, we improve transfer efficiency, reduce the number of parameters, and lighten the model to reduce complexity and shorten transfer time. Experimental results demonstrate that this method can accurately identify faults under conditions of limited samples and high noise, thereby improving diagnostic accuracy and reducing transfer time. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
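A minimal sketch of the shared-weight Siamese pairing with a contrastive margin loss, the general pattern this abstract builds on, is shown below. The toy backbone, the margin value, and the pair construction are placeholders and do not reflect the paper's capsule network, SKNet, or CBAM components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Shared-weight encoder applied to both images of a pair."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, emb_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def contrastive_loss(z1, z2, same_class, margin=1.0):
    """Pull same-class pairs together, push different-class pairs beyond the margin."""
    d = F.pairwise_distance(z1, z2)
    return (same_class * d.pow(2) +
            (1 - same_class) * F.relu(margin - d).pow(2)).mean()

enc = SiameseEncoder()
# stand-ins for 2-D images (e.g., produced by a Markov transformation of vibration signals)
x1, x2 = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 2, (8,)).float()   # 1 = same fault class, 0 = different
loss = contrastive_loss(enc(x1), enc(x2), labels)
loss.backward()
```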
Figures

Figure 1. Schematic diagram of a capsule unit.
Figure 2. Schematic diagram of dynamic routing.
Figure 3. Schematic diagram of a Siamese capsule neural network.
Figure 4. Schematic diagram of multi-view joint optimization for feature extraction.
Figure 5. Schematic diagram of the improved capsule network workflow. X is the input SKNet data; Ũ1, Ũ2, and Ũ3 are SKNet channels with convolution kernels of different sizes; Ũs1, Ũs2, and Ũs3 are three different channels after feature extraction by SKNet; V is the final output data of SKNet and also the input data of the CBAM attention mechanism; Ṽ is the data processed by CBAM. All arrows show the data direction.
Figure 6. Schematic diagram of the transfer network workflow.
Figure 7. Comparative ablation accuracy for different transfer tasks (CWRU dataset).
Figure 8. Accuracy of different models under various signal-to-noise ratios (CWRU dataset).
Figure 9. Dimensionality reduction visualization of the model on CWRU transfer task 0→1 (CWRU dataset).
Figure 10. Confusion matrix of the model (CWRU dataset).
Figure 11. Laboratory bearing fault test rig.
Figure 12. Various fault categories of bearings.
Figure 13. Comparative ablation accuracy for different transfer tasks (laboratory bearing dataset).
Figure 14. Accuracy of different models under various signal-to-noise ratios (laboratory bearing dataset).
Figure 15. Dimensionality reduction visualization of the model on transfer task A→B (laboratory bearing dataset).
Figure 16. Confusion matrix of the model (laboratory bearing dataset).
21 pages, 5645 KiB  
Article
Study on Few-Shot Fault Diagnosis Method for Marine Fuel Systems Based on DT-SViT-KNN
by Shankai Li, Liang Qi, Jiayu Shi, Han Xiao, Bin Da, Runkang Tang and Danfeng Zuo
Sensors 2025, 25(1), 6; https://doi.org/10.3390/s25010006 - 24 Dec 2024
Viewed by 266
Abstract
The fuel system serves as the core component of marine diesel engines, and timely and effective fault diagnosis is the prerequisite for the safe navigation of ships. To address the challenge of current data-driven fault-diagnosis-based methods, which have difficulty in feature extraction and low accuracy under small samples, this paper proposes a fault diagnosis method based on digital twin (DT), Siamese Vision Transformer (SViT), and K-Nearest Neighbor (KNN). Firstly, a diesel engine DT model is constructed by integrating the mathematical, mechanism, and three-dimensional physical models of the Medium-speed diesel engines of 6L21/31 Marine, completing the mapping from physical entity to virtual entity. Fault simulation calculations are performed using the DT model to obtain different types of fault data. Then, a feature extraction network combining Siamese networks with Vision Transformer (ViT) is proposed for the simulated samples. An improved KNN classifier based on the attention mechanism is added to the network to enhance the classification efficiency of the model. Meanwhile, a Weighted-Similarity loss function is designed using similarity labels and penalty coefficients, enhancing the model’s ability to discriminate between similar sample pairs. Finally, the proposed method is validated using a simulation dataset. Experimental results indicate that the proposed method achieves average accuracies of 97.22%, 98.21%, and 99.13% for training sets with 10, 20, and 30 samples per class, respectively, which can accurately classify the fault of marine fuel systems under small samples and has promising potential for applications. Full article
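The final classification stage, matching a query against labelled examples by nearest neighbours in the learned embedding space, can be illustrated as follows. The embedding dimensionality, the value of k, the cosine metric, and the random stand-in features are assumptions; the paper's attention-based KNN refinement is not reproduced here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Stand-ins for embeddings that a trained Siamese ViT encoder would produce
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(90, 128))    # e.g. a few dozen simulated samples per fault class
train_lbl = np.repeat(np.arange(3), 30)   # 3 fault classes as a toy example
test_emb = rng.normal(size=(12, 128))

# Nearest-neighbour matching in embedding space (k and metric are illustrative choices)
knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
knn.fit(train_emb, train_lbl)
print(knn.predict(test_emb))
```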
Figures

Figure 1. Composition of the diesel engine digital twin system.
Figure 2. Simulation schematic of the 6L21/31 marine diesel engine.
Figure 3. Twin model of the 6L21/31 marine diesel engine.
Figure 4. Structure of the Siamese network.
Figure 5. Structure of the SViT-KNN model.
Figure 6. Overall diagnostic flowchart.
Figure 7. Structure of the SCNN model.
Figure 8. Experimental results of the three methods with different sample sizes.
Figure 9. Confusion matrix of the three methods for different sample sizes.
Figure 10. Structure of the 1D-CNN and OSLNet models.
Figure 11. Experimental results of the four methods with different sample sizes.
Figure 12. Confusion matrix of the four methods for different sample sizes.
26 pages, 5609 KiB  
Article
DSiam-CnK: A CBAM- and KCF-Enabled Deep Siamese Region Proposal Network for Human Tracking in Dynamic and Occluded Scenes
by Xiangpeng Liu, Jianjiao Han, Yulin Peng, Qiao Liang, Kang An, Fengqin He and Yuhua Cheng
Sensors 2024, 24(24), 8176; https://doi.org/10.3390/s24248176 - 21 Dec 2024
Viewed by 316
Abstract
Despite the accuracy and robustness attained in the field of object tracking, algorithms based on Siamese neural networks often over-rely on information from the initial frame, neglecting necessary updates to the template; furthermore, in prolonged tracking situations, such methodologies encounter challenges in efficiently addressing issues such as complete occlusion or instances where the target exits the frame. To tackle these issues, this study enhances the SiamRPN algorithm by integrating the convolutional block attention module (CBAM), which enhances spatial channel attention. Additionally, it integrates the kernelized correlation filters (KCFs) for enhanced feature template representation. Building on this, we present DSiam-CnK, a Siamese neural network with dynamic template updating capabilities, facilitating adaptive adjustments in tracking strategy. The proposed algorithm is tailored to elevate the Siamese neural network’s accuracy and robustness for prolonged tracking, all the while preserving its tracking velocity. In our research, we assessed the performance on the OTB2015, VOT2018, and LaSOT datasets. Our method, when benchmarked against established trackers, including SiamRPN on OTB2015, achieved a success rate of 92.1% and a precision rate of 90.9%. On the VOT2018 dataset, it excelled, with a VOT-A (accuracy) of 46.7%, a VOT-R (robustness) of 135.3%, and a VOT-EAO (expected average overlap) of 26.4%, leading in all categories. On the LaSOT dataset, it achieved a precision of 35.3%, a normalized precision of 34.4%, and a success rate of 39%. The findings demonstrate enhanced precision in tracking performance and a notable increase in robustness with our method. Full article
(This article belongs to the Section Intelligent Sensors)
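For reference, the CBAM block that the tracker inserts into SiamRPN combines channel attention (from pooled channel descriptors) with spatial attention (from channel-wise statistics). A compact PyTorch sketch follows; the reduction ratio, kernel size, and example feature-map shape are assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention from average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention from channel-wise average and max maps
        sp = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(sp))

feat = torch.randn(2, 256, 31, 31)    # an illustrative backbone feature map
print(CBAM(256)(feat).shape)          # torch.Size([2, 256, 31, 31])
```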
Figures

Figure 1. SiamRPN integrated with CBAM.
Figure 2. Convolutional block attention module.
Figure 3. Channel attention modules.
Figure 4. Spatial attention modules.
Figure 5. Structure of CBAM- and KCF-enabled deep Siamese region proposal network.
Figure 6. DSiam-CnK algorithm process based on template updating.
Figure 7. Illustration of the OTB dataset.
Figure 8. Illustration of the VOT2018 dataset.
Figure 9. Illustration of the LaSOT dataset.
Figure 10. Illustration of template image annotation.
Figure 11. Comparative results: SiamRPN vs. DSiam-CnK.
Figure 12. Comparison between SiamRPN and DSiam-CnK: IoU and Euclidean distances.
Figure 13. Comparison between SiamRPN and DSiam-CnK: IoU and Euclidean distance variations when the target is occluded (yellow rectangle).
Figure 14. Success plots of OPE.
Figure 15. Precision plots of OPE.
28 pages, 14547 KiB  
Article
A Contrastive-Augmented Memory Network for Anti-UAV Tracking in TIR Videos
by Ziming Wang, Yuxin Hu, Jianwei Yang, Guangyao Zhou, Fangjian Liu and Yuhan Liu
Remote Sens. 2024, 16(24), 4775; https://doi.org/10.3390/rs16244775 - 21 Dec 2024
Viewed by 409
Abstract
With the development of unmanned aerial vehicle (UAV) technology, the threat of UAV intrusion is no longer negligible. Therefore, drone perception, especially anti-UAV tracking technology, has gathered considerable attention. However, both traditional Siamese and transformer-based trackers struggle in anti-UAV tasks due to the small target size, clutter backgrounds and model degradation. To alleviate these challenges, a novel contrastive-augmented memory network (CAMTracker) is proposed for anti-UAV tracking tasks in thermal infrared (TIR) videos. The proposed CAMTracker conducts tracking through a two-stage scheme, searching for possible candidates in the first stage and matching the candidates with the template for final prediction. In the first stage, an instance-guided region proposal network (IG-RPN) is employed to calculate the correlation features between the templates and the searching images and further generate candidate proposals. In the second stage, a contrastive-augmented matching module (CAM), along with a refined contrastive loss function, is designed to enhance the discrimination ability of the tracker under the instruction of contrastive learning strategy. Moreover, to avoid model degradation, an adaptive dynamic memory module (ADM) is proposed to maintain a dynamic template to cope with the feature variation of the target in long sequences. Comprehensive experiments have been conducted on the Anti-UAV410 dataset, where the proposed CAMTracker achieves the best performance compared to advanced tracking algorithms, with significant advantages on all the evaluation metrics, including at least 2.40%, 4.12%, 5.43% and 5.48% on precision, success rate, success AUC and state accuracy, respectively. Full article
Figures

Figure 1. The main architecture of CAMTracker. The whole tracker mainly contains four parts, including a pair of backbones, an instance-guided region proposal network (IG-RPN), a contrastive-augmented matching module (CAM) and an adaptive dynamic memory module (ADM).
Figure 2. The structure of the IG-RPN, which contains a correlation encoder and an RPN head to generate possible proposals.
Figure 3. The illustration of CAM. The module is mainly composed of a regular branch and an embedding branch. In the figure, GAP, MLP and Norm mean global average pooling, multi-layer perceptron and normalization, respectively.
Figure 4. The demonstration of ADM.
Figure 5. The overall precision plot (a) and success plot (b) of CAMTracker and other compared trackers on the test set of Anti-UAV410.
Figure 6. Attribute-based comparisons of CAMTracker and other trackers on AntiUAV-410. The attributes include fast motion (FM), occlusion (OC), out-of-view (OV), scale variation (SV), thermal crossover (TC) and dynamic background clutter (DBC). Among the subplots, (a–f) are precision plots, and (g–l) are success plots. The numbers in the legend indicate the precision scores or success AUC scores of the corresponding trackers.
Figure 7. Target size-based comparisons on precision plots of CAMTracker and other trackers on AntiUAV-410. The sizes include normal size (a), medium size (b), small size (c) and tiny size (d).
Figure 8. Target size-based comparisons on success plots of CAMTracker and other trackers on AntiUAV-410. The sizes include normal size (a), medium size (b), small size (c) and tiny size (d).
Figure 9. Qualitative evaluation on some challenging sequences.
Figure 10. The precision plot (a) and success plot (b) of CAMTracker and other compared trackers on the test set of LSOTB-TIR.
Figure 11. Qualitative evaluation for sequences containing close distractors.
Figure 12. Failure cases on some challenging sequences.
17 pages, 6857 KiB  
Article
Lightweight Siamese Network with Global Correlation for Single-Object Tracking
by Yuxuan Ding and Kehua Miao
Sensors 2024, 24(24), 8171; https://doi.org/10.3390/s24248171 - 21 Dec 2024
Viewed by 273
Abstract
Recent advancements in the field of object tracking have been notably influenced by Siamese-based trackers, which have demonstrated considerable progress in their performance and application. Researchers frequently emphasize the precision of trackers, yet they tend to neglect the associated complexity. This oversight can restrict real-time performance, rendering these trackers inadequate for specific applications. This study presents a novel lightweight Siamese network tracker, termed SiamGCN, which incorporates global feature fusion alongside a lightweight network architecture to improve tracking performance on devices with limited resources. MobileNet-V3 was chosen as the backbone network for feature extraction, with modifications made to the stride of its final layer to enhance extraction efficiency. A global correlation module, which was founded on the Transformer architecture, was developed utilizing a multi-head cross-attention mechanism. This design enhances the integration of template and search region features, thereby facilitating more precise and resilient tracking capabilities. The model underwent evaluation across four prominent tracking benchmarks: VOT2018, VOT2019, LaSOT, and TrackingNet. The results indicate that SiamGCN achieves high tracking performance while simultaneously decreasing the number of parameters and computational costs. This results in significant benefits regarding processing speed and resource utilization. Full article
(This article belongs to the Section Sensing and Imaging)
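The global correlation module described above, with queries from the search branch and keys and values from the template branch followed by normalization and a feed-forward network, can be approximated with a stock multi-head attention layer. The sketch below uses illustrative token counts, embedding width, and head count, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GlobalCorrelation(nn.Module):
    """Cross-attention fusion: queries from the search region, keys/values from the template."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))

    def forward(self, search_tokens, template_tokens):
        fused, _ = self.attn(search_tokens, template_tokens, template_tokens)
        x = self.norm1(search_tokens + fused)        # residual + norm
        return self.norm2(x + self.ffn(x))           # FFN + residual + norm

# flattened feature maps: template 8x8 tokens, search 16x16 tokens, 128 channels
template = torch.randn(1, 64, 128)
search = torch.randn(1, 256, 128)
print(GlobalCorrelation()(search, template).shape)   # torch.Size([1, 256, 128])
```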
Figures

Figure 1. Comparison of state-of-the-art trackers according to the expected average overlap (EAO) and tracking speed (FPS) on VOT2019. Larger circles represent FLOPs, while smaller circles indicate parameters. A higher EAO and FPS are preferable, indicating better performance and faster tracking. Our method achieves a good balance between performance and efficiency, with a competitive EAO and high FPS, while maintaining low FLOPs and parameters.
Figure 2. Cross-correlation for feature fusion. The blue squares represent the convolution kernels derived from the template features, while the green squares represent the feature maps obtained from the search region. Cross-correlation generates a single-channel correlation response map.
Figure 3. Depthwise cross-correlation for feature fusion. The blue squares and green squares represent the same as cross-correlation. Independent sliding kernels are applied to each channel of the template features and the search region features, producing a multi-channel correlation response map.
Figure 4. The SiamGCN architecture comprises three primary components: a feature extractor, a feature fusion network, and a prediction head. The template and search regions undergo processing via shared weights within the feature extractor. The extracted features are subsequently integrated through the global correlation module. The prediction head comprises distinct branches for both classification and regression tasks. The classification branch is responsible for predicting the correspondence of each region to either the foreground or background, whereas the regression branch is tasked with predicting the bounding box dimensions.
Figure 5. Prediction head architecture. Both heads are composed of several depthwise separable convolution (DSC) layers.
Figure 6. The global correlation module leverages a cross-attention Transformer layer, designed based on multi-head attention, to fuse features from the template and search branches. Queries Q are generated from the search branch, while keys K and values V are derived from the template branch. The multi-head cross-attention mechanism captures global relationships between features, followed by normalization and a feed-forward network (FFN) for enhanced feature stability.
Figure 7. Visualizations of the tracking results of SiamGCN compared to other trackers, SiamFC++ [22], SiamBAN [2], SiamDW [35], and SiamRPN++ [19], across three representative video sequences from LaSOT [16]. The red bounding box corresponds to SiamGCN, which consistently demonstrates accurate tracking under challenging conditions, such as aspect ratio changes, fast motion, and partial occlusion. In contrast, the other trackers show coarser bounding boxes, often failing to tightly fit the target. These results indicate the superior tracking stability and accuracy of SiamGCN, especially in maintaining precise prediction even when the object undergoes significant changes or occlusion.
Figure 8. Visualization of the tracking results underscores the inherent challenges of object tracking under dense fog weather conditions. The red bounding box represents the ground truth, while the green bounding box denotes the tracking result produced by our model. The severely reduced visibility and indistinct features make it difficult for the model to consistently learn and capture feature information, highlighting the need for further optimization to enhance performance in such challenging scenarios.
22 pages, 7963 KiB  
Article
WTSM-SiameseNet: A Wood-Texture-Similarity-Matching Method Based on Siamese Networks
by Yizhuo Zhang, Guanlei Wu, Shen Shi and Huiling Yu
Information 2024, 15(12), 808; https://doi.org/10.3390/info15120808 - 16 Dec 2024
Viewed by 371
Abstract
In tasks such as wood defect repair and the production of high-end wooden furniture, ensuring the consistency of the texture in repaired or jointed areas is crucial. This paper proposes the WTSM-SiameseNet model for wood-texture-similarity matching and introduces several improvements to address the issues present in traditional methods. First, to address the issue that fixed receptive fields cannot adapt to textures of different sizes, a multi-receptive field fusion feature extraction network was designed. This allows the model to autonomously select the optimal receptive field, enhancing its flexibility and accuracy when handling wood textures at different scales. Secondly, the interdependencies between layers in traditional serial attention mechanisms limit performance. To address this, a concurrent attention mechanism was designed, which reduces interlayer interference by using a dual-stream parallel structure that enhances the ability to capture features. Furthermore, to overcome the issues of existing feature fusion methods that disrupt spatial structure and lack interpretability, this study proposes a feature fusion method based on feature correlation. This approach not only preserves the spatial structure of texture features but also improves the interpretability and stability of the fused features and the model. Finally, by introducing depthwise separable convolutions, the issue of a large number of model parameters is addressed, significantly improving training efficiency while maintaining model performance. Experiments were conducted using a wood texture similarity dataset consisting of 7588 image pairs. The results show that WTSM-SiameseNet achieved an accuracy of 96.67% on the test set, representing a 12.91% improvement in accuracy and a 14.21% improvement in precision compared to the pre-improved SiameseNet. Compared to CS-SiameseNet, accuracy increased by 2.86%, and precision improved by 6.58%. Full article
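Of the improvements listed, the depthwise separable convolution is the simplest to illustrate: a per-channel spatial convolution followed by a 1×1 pointwise convolution, which sharply reduces parameter count. The sketch below is generic and not tied to the paper's layer sizes.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (per-channel spatial filtering) followed by a pointwise 1x1 conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

std = nn.Conv2d(128, 256, 3, padding=1)
dsc = DepthwiseSeparableConv(128, 256)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(std), count(dsc))   # the separable version uses far fewer parameters
```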
Figures

Figure 1. Diagram of the SiameseNet architecture.
Figure 2. Diagram of the WTSM-SiameseNet architecture.
Figure 3. Diagram of the MRF-Resnet architecture.
Figure 4. Multi-scale receptive field fusion.
Figure 5. Concurrent attention.
Figure 6. CBAM attention.
Figure 7. Texture feature aggregation and matching module.
Figure 8. Sample dataset.
Figure 9. Training loss.
Figure 10. Wood-texture-similarity matching example.
11 pages, 755 KiB  
Article
Demographics of Feline Lymphoma in Australian Cat Populations: 1705 Cases
by Peter Bennett, Peter Williamson and Rosanne Taylor
Vet. Sci. 2024, 11(12), 641; https://doi.org/10.3390/vetsci11120641 - 11 Dec 2024
Viewed by 850
Abstract
Lymphoma is the most common haematopoietic cancer in cats with few large studies evaluating breed and sex as risk factors for the disease. Australia’s geographic isolation and quarantine rules have led to a potentially restricted genetic pool and, currently, there have not been any large local epidemiological studies reported. A total of 1705 lymphoma cases were identified from several sources and compared to a reference population of 85,741 cats, and represent cats that are presented to veterinary clinics. Odds ratios were calculated for each breed that included lymphoma cases, as well as sex, retroviral status, and immunophenotype. The distributions of age and weight in the lymphoma and control populations and proportions of lymphoma cases in anatomic locations were compared. Eight breeds were identified as displaying increased potential risk of lymphoma and three at decreased risk. Male cats were found to be at increased risk (OR 1.2, 95%CI: 1.1 to 1.3, p = 0.002). The lymphoma cases were older, with a median age of 11.7 years compared to 9.0 years (p < 0.0001), and weighed less, with a median weight of 3.7 kg compared to 4.0 kg (p = 0.010), than the control population. Several breeds were found to have significant variations in the proportions of anatomical presentations including the Siamese, Burmilla, Australian mist, ragdoll, British shorthair, and domestic cats. These findings require confirmation in future studies that address the limitations of this study, as outlined in the discussion. Full article
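For readers unfamiliar with the statistic, an odds ratio and its 95% confidence interval can be computed from a 2×2 table as in the short sketch below; the counts are invented for illustration and are not the study's data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI from a 2x2 table:
    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    or_ = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# illustrative counts only (not the study's data)
print(odds_ratio_ci(a=900, b=40000, c=805, d=44000))
```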
Figures

Figure 1. Age (years) and weight (kg) distributions of the control and lymphoma populations. Frequency values are percentages of the total for each age or weight.
Figure 2. Odds ratios for lymphoma with 95% CI for all breeds with cases. (A) All cases and controls, (B) cases and controls from referral centres, and (C) cases and controls from non-referral centres. Circles represent the OR and the line is the 95% CI. The X-axis scale is logarithmic.
19 pages, 13055 KiB  
Article
Siamese-RCNet: Defect Detection Model for Complex Textured Surfaces with Few Annotations
by Dandan Guo, Chunying Zhang, Guanghui Yang, Tao Xue, Jiang Ma, Lu Liu and Jing Ren
Electronics 2024, 13(24), 4873; https://doi.org/10.3390/electronics13244873 - 10 Dec 2024
Viewed by 386
Abstract
The surface texture of objects in industrial scenes is complex and diverse, and the characteristics of surface defects are often very similar to the surrounding environment and texture background, so it is difficult to accurately detect the defect area. However, when deep learning technology is used to detect complex texture surface defects, the detection accuracy is not high, due to the lack of large-scale pixel-level label datasets. Therefore, a defect detection model Siamese-RCNet for complex texture surface with a small number of annotations is proposed. The Cascade R-CNN target detection network is used as the basic framework, making full use of unlabeled image feature information, and fusing the nonlinear relationship learning ability of Siamese network and the feature extraction ability of the Res2Net backbone network to more effectively capture the subtle features of complex texture surface defects. The image difference measurement method is used to calculate the similarity between different images, and the attention module is constructed to weight the feature map of the feature extraction pyramid, so that the model can focus more on the defect area and suppress the influence of complex background texture area, so as to improve the accuracy of detection. To verify the effectiveness of the Siamese-RCNet model, a series of experiments were carried out on the DAGM2007 dataset of weakly supervised learning texture surface defects for industrial optical inspection. The results show that even if only 20% of the labeled datasets are used, the mAP@0.5 of the Siamese-RCNet model can still reach 96.9%. Compared with the traditional Cascade R-CNN and Faster R-CNN target detection networks, the Siamese-RCNet model has high accuracy, can reduce the workload of manual labeling, and provides strong support for practical applications. Full article
Figures

Figure 1. The network structure diagram of the Siamese network.
Figure 2. The network structure diagram of the Siamese-RCNet network.
Figure 3. The framework of SMRes2Net-101. (a) The basic architecture of the feature extraction network in SMRes2Net-101. (b) Details of the Res2Net block.
Figure 4. Feature pyramid and attention module design structure.
Figure 5. The framework of Cascade R-CNN.
Figure 6. Search results for data augmentation strategies in DAGM2007.
Figure 7. mAP@0.5 results on the DAGM2007 validation set at different percentages.
Figure 8. Training loss and mAP@0.5 curve results on the DAGM2007 validation set.
Figure 9. Results of mAP@0.5 curves on DAGM2007 at different ratios.
Figure 10. Visualization results of Siamese-RCNet and other reference models on 40% of the DAGM2007 dataset.
19 pages, 3414 KiB  
Article
Deep Contrastive Survival Analysis with Dual-View Clustering
by Chang Cui, Yongqiang Tang and Wensheng Zhang
Electronics 2024, 13(24), 4866; https://doi.org/10.3390/electronics13244866 - 10 Dec 2024
Viewed by 389
Abstract
Survival analysis aims to analyze the relationship between covariates and events of interest, and is widely applied in multiple research fields, especially in clinical fields. Recently, some studies have attempted to discover potential sub-populations in survival data to assist in survival prediction with clustering. However, existing models that combine clustering with survival analysis face multiple challenges: incomplete representation caused by single-path encoders, the incomplete information of pseudo-samples, and misleading effects of boundary samples. To overcome these challenges, in this study, we propose a novel deep contrastive survival analysis model with dual-view clustering. Specifically, we design a Siamese autoencoder to construct latent spaces in two views and conduct dual-view clustering to more comprehensively capture patient representations. Moreover, we consider the dual views as mutual augmentations rather than introducing pseudo-samples and, based on this, triplet contrastive learning is proposed to fully utilize clustering information and dual-view representations to enhance survival prediction. Additionally, we employ a self-paced learning strategy in the dual-view clustering process to ensure the model handles samples from easy to hard in training, thereby avoiding the misleading effects of boundary samples. Our proposal achieves an average C-index and IBS of 0.6653 and 0.1786 on three widely used clinical datasets, both exceeding the existing best methods, which demonstrates its advanced discriminative and calibration performance. Full article
(This article belongs to the Special Issue Machine Learning for Biomedical Applications)
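The dual-view idea, two autoencoders without parameter sharing whose latent views are fused and passed to the survival backbone, might be skeletonized as follows. Layer sizes, the reconstruction loss, and fusion by concatenation are assumptions, and the clustering, triplet contrastive, and self-paced components are omitted.

```python
import torch
import torch.nn as nn

class DualViewAutoencoder(nn.Module):
    """Two autoencoders without parameter sharing map the same covariates
    into two latent views; the fused representation would feed a survival head."""
    def __init__(self, in_dim=30, latent_dim=16):
        super().__init__()
        def ae():
            enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
            dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))
            return enc, dec
        self.enc1, self.dec1 = ae()
        self.enc2, self.dec2 = ae()

    def forward(self, x):
        z1, z2 = self.enc1(x), self.enc2(x)
        recon_loss = ((self.dec1(z1) - x) ** 2).mean() + ((self.dec2(z2) - x) ** 2).mean()
        fused = torch.cat([z1, z2, x], dim=1)      # dual-view latents plus raw covariates
        return fused, recon_loss

x = torch.randn(32, 30)                            # a batch of patient covariates
fused, rec = DualViewAutoencoder()(x)
print(fused.shape, rec.item())                     # torch.Size([32, 62]) and a scalar loss
```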
Figures

Figure 1. The overall architecture of the DVC-Surv model. The Siamese autoencoder consists of two autoencoders without parameter sharing, mapping patient covariates into latent spaces of two views. Subsequently, the dual-view clustering module integrates the representations from dual views to cluster the samples. Lastly, the fused representation of the two views and covariates is fed into the survival backbone to obtain an estimation of the survival distribution.
Figure 2. Schematic diagram of triple contrastive learning, including (a) inter-view cluster-guided contrastive learning, (b) intra-view instance-wise contrastive learning, and (c) intra-view cluster-wise contrastive learning.
Figure 3. The visualization of dual-view clustering with t-SNE. The t-SNE algorithm can map high-dimensional data to a low-dimensional space (such as two-dimensional space) while preserving the similarity between data points, thereby enabling the visualization of high-dimensional data distributions. Specifically, the clustering results in two views at the end of pre-training and training are shown. In each figure, different clusters are represented by different colors, with censored and uncensored samples indicated by '×' and '·', respectively.
Figure 4. The feature importance of the model is determined using the SHAP algorithm. Specifically, the SHAP algorithm evaluates the contribution of each feature to the model's predictions by calculating the marginal effect of each feature on each sample's prediction. Higher SHAP values indicate that the feature plays a more significant role in the model's prediction outcomes. Based on this, the average ranking of features' SHAP values across all samples represents the importance ranking of the features. This can help us identify the features on which the model relies when making predictions, thereby better understanding the model's decision-making process.
15 pages, 6962 KiB  
Article
Perceptual Quality Assessment for Pansharpened Images Based on Deep Feature Similarity Measure
by Zhenhua Zhang, Shenfu Zhang, Xiangchao Meng, Liang Chen and Feng Shao
Remote Sens. 2024, 16(24), 4621; https://doi.org/10.3390/rs16244621 - 10 Dec 2024
Viewed by 483
Abstract
Pan-sharpening aims to generate high-resolution (HR) multispectral (MS) images by fusing HR panchromatic (PAN) and low-resolution (LR) MS images covering the same area. However, due to the lack of real HR MS reference images, how to accurately evaluate the quality of a fused image without reference is challenging. On the one hand, most methods evaluate the quality of the fused image using the full-reference indices based on the simulated experimental data on the popular Wald’s protocol; however, this remains controversial to the full-resolution data fusion. On the other hand, existing limited no reference methods, most of which depend on manually crafted features, cannot fully capture the sensitive spatial/spectral distortions of the fused image. Therefore, this paper proposes a perceptual quality assessment method based on deep feature similarity measure. The proposed network includes spatial/spectral feature extraction and similarity measure (FESM) branch and overall evaluation network. The Siamese FESM branch extracts the spatial and spectral deep features and calculates the similarity of the corresponding pair of deep features to obtain the spatial and spectral feature parameters, and then, the overall evaluation network realizes the overall quality assessment. Moreover, we propose to quantify both the overall precision of all the training samples and the variations among different fusion methods in a batch, thereby enhancing the network’s accuracy and robustness. The proposed method was trained and tested on a large subjective evaluation dataset comprising 13,620 fused images. The experimental results suggested the effectiveness and the competitive performance. Full article
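The similarity measure at the heart of the FESM branch, comparing corresponding pairs of deep spatial/spectral features, can be illustrated with a per-location cosine similarity as in the hedged sketch below; the paper's actual branch structure and feature sources are more elaborate.

```python
import torch
import torch.nn.functional as F

def deep_feature_similarity(f_a, f_b, eps=1e-8):
    """Per-location cosine similarity between two feature maps of shape (B, C, H, W),
    averaged into one scalar per image pair."""
    sim = F.cosine_similarity(f_a, f_b, dim=1, eps=eps)   # (B, H, W)
    return sim.flatten(1).mean(dim=1)                     # (B,)

# stand-ins for deep features extracted from the fused image and from the PAN/MS inputs
f_fused = torch.randn(4, 256, 32, 32)
f_ref = torch.randn(4, 256, 32, 32)
print(deep_feature_similarity(f_fused, f_ref))            # one similarity score per pair
```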
Figures

Figure 1. Flowchart of the DFSM-net.
Figure 2. Flowchart of the feature extraction and similarity measure block.
Figure 3. SRCC by two different training methods.
Figure 4. Performance evaluation on six satellite datasets.
Figure 5. A sample pair of a PAN/MS image and the fusion images fused by different methods.
Figure 6. Scatter plots of the predicted scores against the subjective difference mean opinion scores (DMOS). (a) Loss function w/o L_CV. (b) Loss function with L_CV.
17 pages, 5291 KiB  
Article
Dynamic-Aware Network for Moving Object Detection
by Hongrui Zhang, Luxia Yang and Xiaona Du
Symmetry 2024, 16(12), 1620; https://doi.org/10.3390/sym16121620 - 6 Dec 2024
Viewed by 448
Abstract
Moving object detection (MOD) plays an important role in many applications that aim to identify regions of interest in videos. However, most existing MOD methods ignore the variability brought by time-varying information. Additionally, many network frameworks primarily focus on low-level feature learning, neglecting the higher-level contextual understanding required for accurate detection. To solve the above issues, we propose a symmetric Dynamic-Aware Network (DAN) for MOD. DAN explores the interactions between different types of information via structural design and feature optimization. To locate the object position quickly, we build a Siamese convolutional network to emphasize changes in the scene. Subsequently, a Change-Aware Module (CAM) is designed, which can maximize the perception of object change cues by exploiting complementary depth-varying features and different levels of disparity information, thereby enhancing the feature discrimination capability of the network. Moreover, to reinforce the effective transfer between features, we devise a Motion-Attentive Selection Module (MASM) to construct an autonomous decoder for augmenting detail representation. Experimental results on benchmark datasets indicate the rationality and validity of the proposed approach. Full article
(This article belongs to the Section Computer)
Figures

Figure 1. The network architecture of DAN.
Figure 2. Illustration of the architecture of the Siamese convolutional network (SCN).
Figure 3. The structure of the change-aware module.
Figure 4. Detailed configuration of the motion-attentive selection module.
Figure 5. Visual results of the ablation analysis on the LASIESTA and CDnet2014 datasets.
Figure 6. Analysis of the performance of various approaches on the LASIESTA dataset (metrics are F1 and average F1).
Figure 7. Analysis of the performance of various approaches on the CDnet2014 dataset (metrics are F1 and average F1).
Figure 8. Visual results on the CDnet2014 dataset.
Figure 9. Visual results on the INO dataset.
16 pages, 952 KiB  
Article
SiCRNN: A Siamese Approach for Sleep Apnea Identification via Tracheal Microphone Signals
by Davide Lillini, Carlo Aironi, Lucia Migliorelli, Leonardo Gabrielli and Stefano Squartini
Sensors 2024, 24(23), 7782; https://doi.org/10.3390/s24237782 - 5 Dec 2024
Viewed by 658
Abstract
Sleep apnea syndrome (SAS) affects about 3–7% of the global population, but is often undiagnosed. It involves pauses in breathing during sleep, for at least 10 s, due to partial or total airway blockage. The current gold standard for diagnosing SAS is polysomnography (PSG), an intrusive procedure that depends on subjective assessment by expert clinicians. To address the limitations of PSG, we propose a decision support system, which uses a tracheal microphone for data collection and a deep learning (DL) approach—namely SiCRNN—to detect apnea events during overnight sleep recordings. Our proposed SiCRNN processes Mel spectrograms using a Siamese approach, integrating a convolutional neural network (CNN) backbone and a bidirectional gated recurrent unit (GRU). The final detection of apnea events is performed using an unsupervised clustering algorithm, specifically k-means. Multiple experimental runs were carried out to determine the optimal network configuration and the most suitable type and frequency range for the input data. Tests with data from eight patients showed that our method can achieve a Recall score of up to 95% for apnea events. We also compared the proposed approach to a fully convolutional baseline, recently introduced in the literature, highlighting the effectiveness of the Siamese training paradigm in improving the identification of SAS. Full article
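The processing chain the abstract describes, Mel-spectrogram segments passed through a CNN backbone and a bidirectional GRU with k-means separating apnea from non-apnea embeddings, can be outlined as follows. Layer counts, embedding size, and segment shape are assumptions rather than the tuned configuration reported in the paper.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class CRNNEmbedder(nn.Module):
    """CNN front-end over Mel spectrograms followed by a bidirectional GRU;
    the final hidden states are used as the segment embedding."""
    def __init__(self, n_mels=64, emb=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.gru = nn.GRU(32 * (n_mels // 4), emb, batch_first=True, bidirectional=True)

    def forward(self, x):                        # x: (B, 1, n_mels, T)
        h = self.cnn(x)                          # (B, 32, n_mels/4, T/4)
        h = h.permute(0, 3, 1, 2).flatten(2)     # (B, T/4, 32 * n_mels/4), time-major sequence
        _, hn = self.gru(h)                      # hn: (2, B, emb)
        return torch.cat([hn[0], hn[1]], dim=1)  # (B, 2 * emb)

embedder = CRNNEmbedder()
segments = torch.randn(16, 1, 64, 128)           # 16 Mel-spectrogram segments
with torch.no_grad():
    z = embedder(segments).numpy()
labels = KMeans(n_clusters=2, n_init=10).fit_predict(z)   # apnea vs. non-apnea clusters
print(labels)
```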
Figures

Figure 1. The scatter plots illustrate the output of the principal component analysis (PCA) applied to the output of the final GRU layer in the SiCRNN model. The resulting embeddings are derived from two patients under two conditions: (a) noise-free patient embeddings and (b) noisy patient embeddings. The observed distances between the apnea and non-apnea clusters are 2.0 in the noise-free scenario and 0.87 in the presence of noise, respectively.
Figure 2. Overview of the proposed SiCRNN framework. The purple dashed line highlights the Siamese configuration employed during the training phase, whereas the green dashed line corresponds to the inference phase, which is carried out through the k-means clustering algorithm.
Figure 3. The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the number of GRU hidden layers used during training. On the x-axis, Precision values are reported, while the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of orange represent the number of convolutional blocks used in the model's training.
Figure 4. The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the dimension of the kernel size used during training. On the x-axis, Precision values are reported, while the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of gray represent the kernel size used in the model's training.
Figure 5. The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the number of Mel bands selected for each input sample frequency during training. On the x-axis, Precision values are reported, while the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of blue represent the number of Mel bands used in the model's training.
Figure 6. The scatter density plot shows the results of the hyperparameter tuning by relating the Precision, Recall, and F1 score metrics to the number of convolutional blocks used during training. On the x-axis, Precision values are reported, while the y-axis represents Recall values, and the size of the points indicates the F1 score. The different shades of green represent the number of convolutional blocks used in the model's training.
Figure 7. (a) The region located below the top of the red mask indicates the apnea events; (b,c) spectrograms with the labeled red mask display an apnea event with significant spectral content. The time associated with each individual bin in the spectrograms is 11.56 ms.
24 pages, 10105 KiB  
Article
SiamRhic: Improved Cross-Correlation and Ranking Head-Based Siamese Network for Object Tracking in Remote Sensing Videos
by Afeng Yang, Zhuolin Yang and Wenqing Feng
Remote Sens. 2024, 16(23), 4549; https://doi.org/10.3390/rs16234549 - 4 Dec 2024
Viewed by 474
Abstract
Object tracking in remote sensing videos is a challenging task in computer vision. Recent advances in deep learning have sparked significant interest in tracking algorithms based on Siamese neural networks. However, many existing algorithms fail to deliver satisfactory performance in complex scenarios due to challenging conditions and limited computational resources. Thus, enhancing tracking efficiency and improving algorithm responsiveness in complex scenarios are crucial. To address tracking drift caused by similar objects and background interference in remote sensing image tracking, we propose an enhanced Siamese network based on the SiamRhic architecture, incorporating a cross-correlation and ranking head for improved object tracking. We first use convolutional neural networks for feature extraction and integrate the CBAM (Convolutional Block Attention Module) to enhance the tracker’s representational capacity, allowing it to focus more effectively on the objects. Additionally, we replace the original depth-wise cross-correlation operation with asymmetric convolution, enhancing both speed and performance. We also introduce a ranking loss to reduce the classification confidence of interference objects, addressing the mismatch between classification and regression. We validate the proposed algorithm through experiments on the OTB100, UAV123, and OOTB remote sensing datasets. Specifically, SiamRhic achieves success, normalized precision, and precision rates of 0.533, 0.786, and 0.812, respectively, on the OOTB benchmark. The OTB100 benchmark achieves a success rate of 0.670 and a precision rate of 0.892. Similarly, in the UAV123 benchmark, SiamRhic achieves a success rate of 0.621 and a precision rate of 0.823. These results demonstrate the algorithm’s high precision and success rates, highlighting its practical value. Full article
(This article belongs to the Section Remote Sensing Image Processing)
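As context for the cross-correlation change mentioned above, the depthwise cross-correlation that SiamRhic revisits slides each template channel over the corresponding search channel. A minimal sketch follows; the feature sizes are illustrative, and the paper's asymmetric-convolution replacement is not shown.

```python
import torch
import torch.nn.functional as F

def depthwise_xcorr(search, template):
    """Depthwise cross-correlation: each template channel is correlated with the matching
    search channel, producing a multi-channel response map."""
    b, c, h, w = search.shape
    x = search.reshape(1, b * c, h, w)                       # fold the batch into channels
    kernel = template.reshape(b * c, 1, *template.shape[2:]) # one kernel per (batch, channel)
    out = F.conv2d(x, kernel, groups=b * c)                  # per-channel correlation
    return out.reshape(b, c, out.shape[2], out.shape[3])

template_feat = torch.randn(2, 256, 7, 7)                    # template branch features
search_feat = torch.randn(2, 256, 31, 31)                    # search branch features
print(depthwise_xcorr(search_feat, template_feat).shape)     # torch.Size([2, 256, 25, 25])
```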
Show Figures

Graphical abstract

Graphical abstract
Full article ">Figure 1
<p>The network architecture starts by taking a template image and a search image. It then extracts deep features using an enhanced ResNet50 network with a weighted attention mechanism. The CBAM attention mechanism is incorporated between the third, fourth, and fifth convolutional layers of the feature extraction network. These features are then input into an adaptive head network for cross-correlation and multi-layer feature fusion. Finally, ranking loss is applied to suppress the classification confidence scores of interfering items and reduce the mismatch between classification and regression.</p>
Figure 2">
Figure 2
Attention mechanism. Feature maps from the third, fourth, and fifth convolutional blocks are processed through both channel and spatial attention mechanisms before being sent to the head network. The red box represents the channel attention mechanism, while the blue box represents the spatial attention mechanism.
Figure 3
Channel attention module (CAM) and spatial attention module (SAM).
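For readers unfamiliar with CBAM, the sketch below shows generic channel and spatial attention modules of the kind this figure refers to, following the standard CBAM formulation. The reduction ratio, kernel size, and sequential ordering are common defaults assumed here, not necessarily SiamRhic's exact configuration.

```python
# Generic CBAM-style attention: channel attention followed by spatial attention.
# Hyperparameters below are common defaults, assumed for illustration.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))      # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))       # global max pooling branch
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)       # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)      # (B, 1, H, W)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))              # channel attention first, then spatial
```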
Figure 4">
Figure 4
Asymmetric convolution. (a) DW-Xcorr. (b) A naive approach for fusing feature maps of varying sizes. (c) Symmetric convolution.
Figure 5
Ranking loss. We focus on samples with high classification confidence and increased IoU to achieve higher rankings, leveraging the relationship between the classification and regression branches. The red points represent the center point of the object obtained by classification, and the red boxes represent the bounding box of the object obtained by regression.
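As a rough illustration of the idea in this caption, the sketch below implements one generic pairwise ranking penalty: anchors with higher IoU should also receive higher classification confidence. This is only a plausible formulation of a ranking-style loss, assumed for illustration; the concrete loss used in SiamRhic may be defined differently.

```python
# Generic pairwise ranking penalty coupling classification scores with IoU.
# Illustrative only; not the paper's exact ranking loss.
import torch

def pairwise_ranking_loss(scores: torch.Tensor, ious: torch.Tensor, margin: float = 0.1) -> torch.Tensor:
    """scores, ious: (N,) per-anchor classification confidence and IoU with the target."""
    score_diff = scores.unsqueeze(1) - scores.unsqueeze(0)   # (N, N): s_i - s_j
    iou_diff = ious.unsqueeze(1) - ious.unsqueeze(0)         # (N, N): iou_i - iou_j
    # Only penalise pairs whose IoU ordering is violated (or nearly violated)
    # by the classification scores.
    violators = (iou_diff > 0).float()
    loss = violators * torch.clamp(margin - score_diff, min=0.0)
    return loss.sum() / violators.sum().clamp(min=1.0)
```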
Figure 6">
Figure 6
The precision and success rates of our tracker compared to other trackers on the OTB100 dataset. (a) Success plots; (b) Precision plots.
Figure 7
The success rate of our tracker compared to other trackers across the 11 challenges of the OTB100 dataset. (a) In-plane Rotation; (b) Fast Motion; (c) Out-of-view; (d) Low Resolution; (e) Occlusion; (f) Illumination Variation; (g) Deformation; (h) Motion Blur; (i) Out-of-plane Rotation; (j) Scale Variation; (k) Background Clutter.
Figure 8
The precision of our tracker in comparison to other trackers across the 11 challenges of the OTB100 dataset. (a) In-plane Rotation; (b) Fast Motion; (c) Out-of-view; (d) Low Resolution; (e) Occlusion; (f) Illumination Variation; (g) Deformation; (h) Motion Blur; (i) Out-of-plane Rotation; (j) Background Clutter; (k) Scale Variation.
Figure 9
The precision and success rates of our tracker, along with those of the comparison trackers, are evaluated on the UAV123 dataset. (a) Success plots; (b) Precision plots.
Figure 10">
Figure 10
The success rates of our tracker, along with those of the comparison trackers, are assessed across the twelve challenges of the UAV123 dataset. (a) Viewpoint Change; (b) Similar Object; (c) Fast Motion; (d) Out-of-view; (e) Full Occlusion; (f) Illumination Variation; (g) Background Clutter; (h) Aspect Ratio Variation; (i) Scale Variation; (j) Partial Occlusion; (k) Low Resolution; (l) Camera Motion.
Figure 11
The precision of our tracker, as well as that of the comparison trackers, is evaluated across the twelve challenges presented in the UAV123 dataset. (a) Viewpoint Change; (b) Similar Object; (c) Fast Motion; (d) Out-of-view; (e) Full Occlusion; (f) Illumination Variation; (g) Background Clutter; (h) Aspect Ratio Variation; (i) Scale Variation; (j) Partial Occlusion; (k) Low Resolution; (l) Camera Motion.
Figure 12
The precision, normalized precision, and success rates of both our tracker and the comparison trackers are assessed on the OOTB dataset. (a) Precision plots; (b) Normalized precision plots; (c) Success plots.
Figure 13
The precision and success rates of our tracker compared to other trackers on the LaSOT dataset. (a) Success plots; (b) Precision plots.
Figure 14">
Figure 14
Visualization of the tracking results for our tracker and the comparative trackers across four video sequences from the OOTB dataset. The tracking results, displayed from left to right and top to bottom, correspond to the videos car_11_1, plane_1_1, ship_12_1, and train_1_1.
Figure 15
Visualization of the tracking results for our tracker and the comparative trackers across four video sequences from the OTB dataset.
Figure 16
Visualization of the tracking results for our tracker and the comparative trackers across four video sequences from the UAV123 dataset.
15 pages, 1050 KiB  
Article
Siamese Network-Based Lightweight Framework for Tomato Leaf Disease Recognition
by Selvarajah Thuseethan, Palanisamy Vigneshwaran, Joseph Charles and Chathrie Wimalasooriya
Computers 2024, 13(12), 323; https://doi.org/10.3390/computers13120323 - 4 Dec 2024
Viewed by 444
Abstract
In this paper, a novel Siamese network-based lightweight framework is proposed for automatic tomato leaf disease recognition. This framework achieves the highest accuracy of 96.97% on the tomato subset obtained from the PlantVillage dataset and 95.48% on the Taiwan tomato leaf disease dataset. Experimental results further confirm that the proposed framework is effective with imbalanced and small data. The backbone network integrated with this framework is lightweight, with approximately 2.9629 million trainable parameters, second only to SqueezeNet and significantly lower than that of other lightweight deep networks. Automatic tomato disease recognition from leaf images is vital for avoiding crop losses by applying control measures on time. Even though recent deep learning-based tomato disease recognition methods with classical training procedures showed promising recognition results, they demand large amounts of labeled data and involve expensive training. The traditional deep learning models proposed for tomato disease recognition also consume substantial memory and storage because of their large number of parameters. While lightweight networks overcome some of these issues to a certain extent, they continue to show low performance and struggle to handle imbalanced data. Full article
(This article belongs to the Special Issue Emerging Trends in Machine Learning and Artificial Intelligence)
Show Figures

Figure 1
The overall architecture of the proposed tomato disease recognition framework. I1 and I2 are the input images. The weights w are shared between the two streams (G) of the Siamese network. Figure 2 illustrates the architecture of the backbone network G. The distance D is estimated as the Euclidean distance between the outputs G(I1) and G(I2). The contrastive loss L is then calculated based on D.
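A minimal sketch of the pairing and loss computation this caption describes, assuming a standard margin-based contrastive loss: both images pass through the same backbone G (shared weights), D is the Euclidean distance between the two embeddings, and the loss pulls same-class pairs together while pushing different-class pairs apart. The backbone, margin value, and label convention are assumptions; the paper's actual layer configuration is given in its Table 1 and is not reproduced here.

```python
# Siamese pairing with a margin-based contrastive loss on Euclidean distance.
# "backbone" is any embedding network; the margin and labels are assumed values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiamesePair(nn.Module):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.G = backbone                          # one set of weights, used for both streams

    def forward(self, i1, i2):
        return self.G(i1), self.G(i2)

def contrastive_loss(e1, e2, same_class: torch.Tensor, margin: float = 1.0):
    """same_class: 1.0 if the two leaves share a disease label, else 0.0."""
    d = F.pairwise_distance(e1, e2)                # Euclidean distance D
    pos = same_class * d.pow(2)                    # pull matching pairs together
    neg = (1 - same_class) * torch.clamp(margin - d, min=0.0).pow(2)
    return 0.5 * (pos + neg).mean()
```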
Figure 2">
Figure 2
The block diagram of the lightweight deep network used as the backbone of the proposed framework. The layer configuration details are further given in Table 1.
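To verify a parameter budget such as the roughly 2.9629 million trainable parameters quoted above, a count like the one below can be run against any candidate backbone; build_backbone is a hypothetical placeholder for the network under test.

```python
# Count trainable parameters of a PyTorch module; useful for checking claims
# such as "~2.96 M trainable parameters". build_backbone() is a placeholder.
import torch.nn as nn

def count_trainable_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# print(f"{count_trainable_params(build_backbone()) / 1e6:.4f} M trainable parameters")
```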
Figure 3">
Figure 3
A proposed testing scheme dedicated to Siamese network-based tomato plant disease recognition. This scheme follows a majority voting mechanism.
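One plausible reading of a majority-voting test scheme for a Siamese model is sketched below: the query embedding is compared against several labeled reference images, the k closest references each cast a vote for their class, and the most-voted class is returned. The value of k, the reference selection, and the voting rule are assumptions for illustration, not the paper's exact scheme.

```python
# Majority-voting inference for a Siamese embedding model (illustrative only).
from collections import Counter
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_by_voting(model, query_img, references, k: int = 5):
    """references: list of (class_name, reference_image) pairs; images are CHW tensors."""
    q = model(query_img.unsqueeze(0))                         # (1, E) query embedding
    labels, dists = [], []
    for cls, ref in references:
        r = model(ref.unsqueeze(0))                           # (1, E) reference embedding
        labels.append(cls)
        dists.append(F.pairwise_distance(q, r).item())
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    votes = Counter(labels[i] for i in nearest)               # each near reference votes
    return votes.most_common(1)[0][0]
```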
Figure 4">
Figure 4
Sample images taken from the PlantVillage and Taiwan datasets for each tomato disease class.
Figure 5
An example image taken from the Black mold disease class and its augmented samples. Seven different augmentation functions are applied to the leaf images.
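The caption states that seven augmentation functions are applied per leaf image, but the listing does not name them, so the pipeline below uses seven common stand-ins (flips, rotation, color jitter, crop, blur, translation) purely as an illustration of how such an expansion might look; these are not the authors' exact choices.

```python
# Seven illustrative augmentation functions for leaf images (stand-ins only).
from torchvision import transforms

leaf_augmentations = [
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.RandomVerticalFlip(p=1.0),
    transforms.RandomRotation(degrees=30),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.RandomResizedCrop(size=224, scale=(0.7, 1.0)),
    transforms.GaussianBlur(kernel_size=5),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
]

def augment(image):
    """Return the original image plus one variant per augmentation function."""
    return [image] + [t(image) for t in leaf_augmentations]
```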
Figure 6">
Figure 6
Querying a test sample from the Gray spot disease class of the Taiwan dataset.
Figure 7
The loss observed for the model trained on the whole PlantVillage tomato leaf dataset.
Figure 8
Performance evaluation with imbalanced data: per-class sample counts and class-wise accuracies for the PlantVillage dataset are compared.