Search Results (1,131)

Search Parameters:
Keywords = bounding box

16 pages, 20081 KiB  
Article
YOLO-ACE: Enhancing YOLO with Augmented Contextual Efficiency for Precision Cotton Weed Detection
by Qi Zhou, Huicheng Li, Zhiling Cai, Yiwen Zhong, Fenglin Zhong, Xiaoyu Lin and Lijin Wang
Sensors 2025, 25(5), 1635; https://doi.org/10.3390/s25051635 - 6 Mar 2025
Abstract
Effective weed management is essential for protecting crop yields in cotton production, yet conventional deep learning approaches often falter in detecting small or occluded weeds and can be restricted by large parameter counts. To tackle these challenges, we propose YOLO-ACE, an advanced extension of YOLOv5s, which was selected for its optimal balance of accuracy and speed, making it well suited for agricultural applications. YOLO-ACE integrates a Context Augmentation Module (CAM) and Selective Kernel Attention (SKAttention) to capture multi-scale features and dynamically adjust the receptive field, while a decoupled detection head separates classification from bounding box regression, enhancing overall efficiency. Experiments on the CottonWeedDet12 (CWD12) dataset show that YOLO-ACE achieves notable mAP@0.5 and mAP@0.5:0.95 scores—95.3% and 89.5%, respectively—surpassing previous benchmarks. Additionally, we tested the model’s transferability and generalization across different crops and environments using the CropWeed dataset, where it achieved a competitive mAP@0.5 of 84.3%, further showcasing its robust ability to adapt to diverse conditions. These results confirm that YOLO-ACE combines precise detection with parameter efficiency, meeting the exacting demands of modern cotton weed management. Full article
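Since the abstract leans on the decoupled-head idea, a minimal PyTorch sketch of what "separating classification from bounding box regression" typically looks like may help. It is not the authors' implementation; the channel count, activation, anchor count, and the class count (12, matching CWD12) are assumptions.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Minimal decoupled detection head: separate conv branches for
    classification and bounding-box regression (illustrative sketch only)."""
    def __init__(self, in_channels: int, num_classes: int, num_anchors: int = 3):
        super().__init__()
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(in_channels, num_anchors * num_classes, 1),
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(in_channels, num_anchors * 4, 1),  # (x, y, w, h) per anchor
        )

    def forward(self, feat: torch.Tensor):
        return self.cls_branch(feat), self.reg_branch(feat)

# Example: one 80x80 feature map from a YOLOv5s-like neck (channel count assumed).
head = DecoupledHead(in_channels=128, num_classes=12)
cls_out, reg_out = head(torch.randn(1, 128, 80, 80))
print(cls_out.shape, reg_out.shape)  # torch.Size([1, 36, 80, 80]) torch.Size([1, 12, 80, 80])
```

Each branch specializes: one predicts per-class scores, the other four box offsets per anchor, which is the separation the abstract credits with the efficiency gain.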
Figures:
Figure 1. (a) Diagram of the algorithm structure of YOLOv5s; (b) diagram of the algorithm of YOLO-ACE; (c) components in the algorithmic structure.
Figure 2. YOLO-ACE module integration flowchart.
Figure 3. Network architecture diagram of Context Augmentation Module.
Figure 4. Fusion methods of CAM: (a) and (b) show direct feature map integration via weighting and concatenation, respectively, while (c) employs an adaptive fusion—combining convolution, splicing, and softmax—to merge information from three channels.
Figure 5. Selective kernel attention.
Figure 6. Examples of the 12 categories of weeds in CottonWeedDet12.
Figure 7. Convergence curves of YOLO-ACE: (a) mAP at IoU = 0.5; (b) mAP across IoU thresholds from 0.5 to 0.95.
Figure 8. Detection comparative analysis of cotton weed detection under challenging conditions: (a) weeds exhibiting diverse shapes, sizes, and dimensions; (b) YOLOv5s failing to detect small and occluded targets; (c) YOLO-ACE demonstrating enhanced detection of small and occluded targets.
Figure 9. Comparative heatmap visualizations of YOLOv5 variants with module integrations—none, decoupled head (DH), SKAttention (SK), Context Augmentation Module (CAM), and Full Integration (YOLO-ACE).
Figure 10. Robust weed detection under variable conditions: (a–d) present detection outcomes under diverse lighting conditions and viewing angles.
Figure 11. Analysis of YOLO-ACE detection failures: (a,b) reveal that severe occlusion or overlap can lead to missed weed detections due to inherent ambiguities. (c,d) show that enhanced feature extraction may misclassify subtle plant features as weeds, resulting in false positives.
16 pages, 2124 KiB  
Article
SmartDENM—A System for Enhancing Pedestrian Safety Through Machine Vision and V2X Communication
by Abdulagha Dadashev and Árpád Török
Electronics 2025, 14(5), 1026; https://doi.org/10.3390/electronics14051026 - 4 Mar 2025
Abstract
A pivotal moment in the leap toward autonomous vehicles in recent years has revealed the need to enhance vehicle-to-everything (V2X) communication systems so as to improve road safety. A key challenge is to integrate real-time pedestrian detection to permit the use of timely alerts in situations where vulnerable road users, especially pedestrians, might pose a risk. Seeing that, in this article, a YOLO-based object detection model was used to identify pedestrians and extract key data such as bounding box coordinates and confidence levels. These data were encoded afterward into decentralized environmental notification messages (DENM) using ASN.1 schemas to ensure compliance with V2X standards, allowing for real-time communication between vehicles and infrastructure. This research identified that the integration of pedestrian detection with V2X communication brought about a reliable system wherein the roadside unit (RSU) broadcasts DENM alerts to vehicles. These vehicles, upon receiving the messages, initiate appropriate responses such as slowing down or lane changing, with the testing demonstrating reliable message transmission and high pedestrian detection accuracy in simulated–controlled environments. To conclude, this work demonstrates a scalable framework for improving road safety by combining machine vision with V2X communication. Full article
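As a rough illustration of the detection-to-message flow described above, the sketch below runs an off-the-shelf YOLOv5 model and packs pedestrian bounding boxes and confidences into a DENM-like payload. The paper encodes ASN.1 DENM messages; JSON is used here purely as a stand-in, and the field names (stationId, causeCode) are hypothetical.

```python
import json
import torch

# Off-the-shelf YOLOv5 from the Ultralytics hub, as a stand-in for the paper's detector.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")

def pedestrians_to_message(image_path: str, station_id: int = 1001) -> str:
    """Detect pedestrians and pack bounding boxes + confidences into a
    DENM-like JSON payload (illustration only; the paper uses ASN.1 DENM)."""
    results = model(image_path)
    detections = results.pandas().xyxy[0]  # columns: xmin, ymin, xmax, ymax, confidence, class, name
    persons = detections[detections["name"] == "person"]
    payload = {
        "stationId": station_id,            # hypothetical RSU identifier
        "causeCode": "pedestrianOnRoad",    # illustrative cause code
        "detections": [
            {
                "bbox": [float(row.xmin), float(row.ymin), float(row.xmax), float(row.ymax)],
                "confidence": float(row.confidence),
            }
            for row in persons.itertuples()
        ],
    }
    return json.dumps(payload)

print(pedestrians_to_message("crosswalk.jpg"))
```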
Figures:
Figure 1. Smart I2V system process flowchart for pedestrian detection and vehicle coordination. The system integrates V2X communication protocols, AI-based detection, and vehicle behavior coordination to enhance real-time traffic safety.
Figure 2. Detection workflow with YOLOv5: model loading, inference, and JSON-based result publishing for downstream processing.
Figure 3. Four selected tested images: the first two images depict scenes with two individuals present, while the latter two images represent scenarios without any individuals.
Figure 4. Scenario: Pedestrian Crossing: demonstrates the integration of real-time pedestrian detection and V2X communication, showcasing how alerts enable vehicles to adapt their behavior for enhanced safety.
23 pages, 10794 KiB  
Article
Hand–Eye Separation-Based First-Frame Positioning and Follower Tracking Method for Perforating Robotic Arm
by Handuo Zhang, Jun Guo, Chunyan Xu and Bin Zhang
Appl. Sci. 2025, 15(5), 2769; https://doi.org/10.3390/app15052769 - 4 Mar 2025
Abstract
In subway tunnel construction, current hand–eye integrated drilling robots use a camera mounted on the drilling arm for image acquisition. However, dust interference and long-distance operation cause a decline in image quality, affecting the stability and accuracy of the visual recognition system. Additionally, the computational complexity of high-precision detection models limits deployment on resource-constrained edge devices, such as industrial controllers. To address these challenges, this paper proposes a dual-arm tunnel drilling robot system with hand–eye separation, utilizing the first-frame localization and follower tracking method. The vision arm (“eye”) provides real-time position data to the drilling arm (“hand”), ensuring accurate and efficient operation. The study employs an RFBNet model for initial frame localization, replacing the original VGG16 backbone with ShuffleNet V2. This reduces model parameters by 30% (135.5 MB vs. 146.3 MB) through channel splitting and depthwise separable convolutions to reduce computational complexity. Additionally, the GIoU loss function is introduced to replace the traditional IoU, further optimizing bounding box regression through the calculation of the minimum enclosing box. This resolves the gradient vanishing problem in traditional IoU and improves average precision (AP) by 3.3% (from 0.91 to 0.94). For continuous tracking, a SiamRPN-based algorithm combined with Kalman filtering and PID control ensures robustness against occlusions and nonlinear disturbances, increasing the success rate by 1.6% (0.639 vs. 0.629). Experimental results show that this approach significantly improves tracking accuracy and operational stability, achieving 31 FPS inference speed on edge devices and providing a deployable solution for tunnel construction’s safety and efficiency needs. Full article
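The GIoU term mentioned above penalizes the empty area of the minimum enclosing box, which keeps gradients informative even when predicted and ground-truth boxes do not overlap. Below is a generic re-implementation sketch of the GIoU loss for axis-aligned boxes; the paper applies it inside RFBNet, whereas this standalone version is for illustration only.

```python
import torch

def giou_loss(box1: torch.Tensor, box2: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """GIoU loss for boxes in (x1, y1, x2, y2) format.
    GIoU = IoU - |C \\ (A U B)| / |C|, with C the smallest enclosing box; loss = 1 - GIoU.
    Illustrative re-implementation, not the paper's code."""
    # Intersection
    ix1 = torch.max(box1[..., 0], box2[..., 0])
    iy1 = torch.max(box1[..., 1], box2[..., 1])
    ix2 = torch.min(box1[..., 2], box2[..., 2])
    iy2 = torch.min(box1[..., 3], box2[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    area1 = (box1[..., 2] - box1[..., 0]) * (box1[..., 3] - box1[..., 1])
    area2 = (box2[..., 2] - box2[..., 0]) * (box2[..., 3] - box2[..., 1])
    union = area1 + area2 - inter
    iou = inter / (union + eps)

    # Smallest enclosing box C
    cx1 = torch.min(box1[..., 0], box2[..., 0])
    cy1 = torch.min(box1[..., 1], box2[..., 1])
    cx2 = torch.max(box1[..., 2], box2[..., 2])
    cy2 = torch.max(box1[..., 3], box2[..., 3])
    c_area = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (c_area - union) / (c_area + eps)
    return 1.0 - giou

# Disjoint boxes still produce a useful gradient signal (plain IoU would saturate at 0).
pred = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
gt   = torch.tensor([[20.0, 20.0, 30.0, 30.0]])
print(giou_loss(pred, gt))  # approx. 1.78, i.e. > 1 because GIoU is negative when boxes are disjoint
```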
Figures:
Figure 1. Hand–eye separation schematic.
Figure 2. Network structure of initial frame positioning model based on improved RFBNet.
Figure 3. ShuffleNet V2 block.
Figure 4. Comparison of tracking results of the proposed method with the baseline algorithm and the classical algorithm.
Figure 5. Tracking and positioning process diagram of drilling robot arm based on SiamRPN.
Figure 6. PID control system block diagram.
Figure 7. Comparison of initial frame positioning model effect of improved RFBNet.
Figure 8. Comparison of success rate and accuracy rate between proposed algorithm and baseline algorithm.
Figure 9. Comparison of success rate and accuracy rate between proposed algorithm and classical algorithm.
Figure 10. Comparison of accuracy rates between the proposed method and the classical algorithm.
Figure 11. Comparison of accuracy rates between the proposed method and the classical algorithm.
Figure 12. Comparison of tracking results of the proposed method with the baseline algorithm and the classical algorithm.
25 pages, 152810 KiB  
Article
QEDetr: DETR with Query Enhancement for Fine-Grained Object Detection
by Chenguang Dong, Shan Jiang, Haijiang Sun, Jiang Li, Zhenglei Yu, Jiasong Wang and Jiacheng Wang
Remote Sens. 2025, 17(5), 893; https://doi.org/10.3390/rs17050893 - 3 Mar 2025
Abstract
Fine-grained object detection aims to accurately localize the object bounding box while identifying the specific model of the object, which is more challenging than conventional remote sensing object detection. Transformer-based object detector (DETR) can capture remote inter-feature dependencies by using attention, which is suitable for fine-grained object detection tasks. However, most existing DETR-like object detectors are not specifically optimized for remote sensing detection tasks. Therefore, we propose an oriented fine-grained object detection method based on transformers. First, we combine denoising training and angle coding to propose a baseline DETR-like object detector for oriented object detection. Next, we propose a new attention mechanism for extracting finer-grained features by constraining the angle of sampling points during the attentional process, ensuring that the sampling points are more evenly distributed across the object features. Then, we propose a multiscale fusion method based on bilinear pooling to obtain the enhanced query and initialize a more accurate object bounding box. Finally, we combine the localization accuracy of each query with its classification accuracy and propose a new classification loss to further enhance the high-quality queries. Evaluation results on the FAIR1M dataset show that our method achieves an average accuracy of 48.5856 mAP and the highest accuracy of 49.7352 mAP in object detection, outperforming other methods. Full article
(This article belongs to the Section AI Remote Sensing)
Figures:
Figure 1. Overall structure of QEDetr: (a) extraction of input image features and inputting to encoder uses the same process as Deformable DETR; (b) multiscale fusion of feature maps and screening of top-k queries; (c) the RADA module used for the decoder process generates the categories layer-by-layer, iteratively refining the reference box and angle; (d) QEDetr uses the regression and IoU losses for the reference box and angle along with our proposed AIL for the categorization loss.
Figure 2. Decoder of QEDetr, showing the whole process of the decoder and its iterative layer-by-layer refinement of the reference box and angle coding.
Figure 3. Demonstration of the alignment process for each type of attention: (a) the alignment process for deformable attention, which does not include sample point rotations; (b) the alignment process for RDA, which includes sample point rotations but does not take into account the shape distribution of the sample points themselves; and (c) the alignment process for RADA, which we present here.
Figure 4. In multiscale bilinear fusion, a shared MLP is used to compute the foreground scores for each scale feature map, then the high-level feature map and foreground scores are upsampled and fused with the low-level information, and the output features and foregrounds from each layer are finally spliced into the outputs.
Figure 5. Number of instances per category in the FAIR1M2.0 dataset after multiscale cropping.
Figure 6. Loss curve maps of QEDetr.
Figure 7. Results on the FAIR1M dataset.
Figure 8. Comparison visualizing the results of different object detection algorithms on FAIR1M: (a) baseline (DN+PSC), (b) ARSDetr, and (c) QEDetr.
Figure 9. Heatmap visualization of the backbone.
Figure 10. Sample point visualization results: (a) deformable attention and (b) RADA. The colors of the sampled points represent their weights in the attention process.
19 pages, 7601 KiB  
Article
Mixture of Expert-Based SoftMax-Weighted Box Fusion for Robust Lesion Detection in Ultrasound Imaging
by Se-Yeol Rhyou, Minyung Yu and Jae-Chern Yoo
Diagnostics 2025, 15(5), 588; https://doi.org/10.3390/diagnostics15050588 - 28 Feb 2025
Abstract
Background/Objectives: Ultrasound (US) imaging plays a crucial role in the early detection and treatment of hepatocellular carcinoma (HCC). However, challenges such as speckle noise, low contrast, and diverse lesion morphology hinder its diagnostic accuracy. Methods: To address these issues, we propose CSM-FusionNet, a novel framework that integrates clustering, SoftMax-weighted Box Fusion (SM-WBF), and padding. Using raw US images from a leading hospital, Samsung Medical Center (SMC), we applied intensity adjustment, adaptive histogram equalization, low-pass, and high-pass filters to reduce noise and enhance resolution. Data augmentation generated ten images per one raw US image, allowing the training of 10 YOLOv8 networks. The mAP@0.5 of each network was used as SoftMax-derived weights in SM-WBF. Threshold-lowered bounding boxes were clustered using Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and outliers were managed within clusters. SM-WBF reduced redundant boxes, and padding enriched features, improving classification accuracy. Results: The accuracy improved from 82.48% to 97.58% with sensitivity reaching 100%. The framework increased lesion detection accuracy from 56.11% to 95.56% after clustering and SM-WBF. Conclusions: CSM-FusionNet demonstrates the potential to significantly improve diagnostic reliability in US-based lesion detection, aiding precise clinical decision-making. Full article
(This article belongs to the Special Issue Advances in Medical Image Processing, Segmentation and Classification)
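A compact sketch of the clustering-plus-fusion step described above is given below: per-network mAP@0.5 scores are turned into SoftMax weights, box centres are grouped with DBSCAN, and each cluster is fused by a weighted average of coordinates. The eps value, min_samples, and the exact fusion rule are assumptions rather than the paper's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cluster_and_fuse(boxes, model_ids, model_maps, eps=30.0):
    """Rough sketch of clustering + SoftMax-weighted box fusion.
    boxes: (N, 4) [x1, y1, x2, y2] boxes pooled from all networks,
    model_ids: (N,) index of the network that produced each box,
    model_maps: per-network mAP@0.5 scores turned into SoftMax weights."""
    boxes = np.asarray(boxes, dtype=float)
    weights = softmax(np.asarray(model_maps, dtype=float))[np.asarray(model_ids)]

    # Cluster box centres so that boxes covering the same lesion are grouped together.
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    labels = DBSCAN(eps=eps, min_samples=1).fit_predict(centers)

    fused = []
    for lbl in np.unique(labels):
        m = labels == lbl
        w = weights[m] / weights[m].sum()
        fused.append((boxes[m] * w[:, None]).sum(axis=0))  # weighted average of coordinates
    return np.array(fused)

# Toy example: three networks, two of them agreeing on the same region.
boxes      = [[100, 100, 150, 150], [105, 98, 152, 149], [300, 300, 340, 350]]
model_ids  = [0, 1, 2]
model_maps = [0.91, 0.88, 0.85]
print(cluster_and_fuse(boxes, model_ids, model_maps))
```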
Figures:
Figure 1. Example of US images from dataset. (a) Benign, (b) malignant.
Figure 2. Flowchart of our proposed network: CSM-FusionNet.
Figure 3. The original US image serves as the input image, and through the application of four distinct filters combined with alpha and beta values, a total of ten augmented images are generated.
Figure 4. Lesion suspected regions, green bounding box, detected by YOLOv8.
Figure 5. Determination of bounding boxes using clustering, SM-WBF, and padding. (a) All bounding boxes detected by the ten networks. (b) Clustering of bounding boxes into four regions using DBSCAN. (c) Application of SM-WBF with SoftMax weights to the clustered regions. (d) Addition of padding to the bounding boxes in (c).
Figure 6. Method for measuring lesion detection accuracy. Among the four detected bounding boxes, at least one box has an IoU of 0.9 or higher with the ground truth, and, therefore, the detection is considered successful.
Figure 7. (a) An image with all bounding boxes detected by YOLOv8 overlaid, and the results of bounding box optimization through (b) clustering, (c) SM-WBF, and (d) padding. (e) Ground truth bounding box.
Figure 8. The lesions were classified into three classes using EfficientNet-b0.
22 pages, 52708 KiB  
Article
CSMR: A Multi-Modal Registered Dataset for Complex Scenarios
by Chenrui Li, Kun Gao, Zibo Hu, Zhijia Yang, Mingfeng Cai, Haobo Cheng and Zhenyu Zhu
Remote Sens. 2025, 17(5), 844; https://doi.org/10.3390/rs17050844 - 27 Feb 2025
Abstract
Complex scenarios pose challenges to tasks in computer vision, including image fusion, object detection, and image-to-image translation. On the one hand, complex scenarios involve fluctuating weather or lighting conditions, where even images of the same scenarios appear to be different. On the other hand, the large amount of textural detail in the given images introduces considerable interference that can conceal the useful information contained in them. An effective solution to these problems is to use the complementary details present in multi-modal images, such as visible-light and infrared images. Visible-light images contain rich textural information while infrared images contain information about the temperature. In this study, we propose a multi-modal registered dataset for complex scenarios under various environmental conditions, targeting security surveillance and the monitoring of low-slow-small targets. Our dataset contains 30,819 images, where the targets are labeled as three classes of “person”, “car”, and “drone” using Yolo format bounding boxes. We compared our dataset with those used in the literature for computer vision-related tasks, including image fusion, object detection, and image-to-image translation. The results showed that introducing complementary information through image fusion can compensate for missing details in the original images, and we also revealed the limitations of visual tasks in single-modal images with complex scenarios. Full article
(This article belongs to the Special Issue Recent Advances in Infrared Target Detection)
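Since the targets are labeled with YOLO-format bounding boxes, the helper below shows how one such normalized label line converts to pixel coordinates; the image size and the class-index mapping in the example are assumptions, not taken from the dataset.

```python
from typing import List, Tuple

def yolo_to_xyxy(label_line: str, img_w: int, img_h: int) -> Tuple[int, List[float]]:
    """Convert one line of a YOLO-format label file
    ('class cx cy w h', all normalized to [0, 1]) into pixel (x1, y1, x2, y2)."""
    cls, cx, cy, w, h = label_line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(cls), [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# Example annotation for a 640x512 infrared frame; class 2 might be "drone".
print(yolo_to_xyxy("2 0.50 0.25 0.10 0.08", img_w=640, img_h=512))
# (2, [288.0, 107.52, 352.0, 148.48])
```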
Figures:
Figure 1. The structure of our CSMR dataset. The original images first need to be registered and labeled. We test our dataset on three visual tasks: image fusion, object detection, and image-to-image translation.
Figure 2. Examples of the related datasets. The first row corresponds to the visible-light images and the second row corresponds to the infrared images. From left to right: (a) TNO, (b) OSU, (c) CVC-14, (d) KAIST, (e) FLIR, (f) LLVIP.
Figure 3. Our collection equipment contains a binocular camera platform and a portable computer.
Figure 4. Cameras for image collection.
Figure 5. Examples of the scenarios. The first row corresponds to the visible-light images and the second row corresponds to the infrared images. From left to right: (a) street at night, (b) intersection from top-down perspective, (c) disguised person in field, (d) waterside, (e) drone in sky.
Figure 6. Registration result of our dataset.
Figure 7. Examples of drones in our CSMR dataset. The images in the first row are visible-light images. The images in the second row are infrared images.
Figure 8. Examples of fusion algorithms on our CSMR dataset.
Figure 9. Examples of “erratic temperature”. From left to right: a single car; two different-colored cars; two same-colored cars.
Figure 10. Examples of pedestrian and car detection on our CSMR dataset.
Figure 11. Examples of failure cases.
Figure 12. Examples of drone detection on our CSMR dataset.
Figure 13. Examples of image-to-image translation results on our CSMR dataset.
25 pages, 20488 KiB  
Article
SAR Small Ship Detection Based on Enhanced YOLO Network
by Tianyue Guan, Sheng Chang, Chunle Wang and Xiaoxue Jia
Remote Sens. 2025, 17(5), 839; https://doi.org/10.3390/rs17050839 - 27 Feb 2025
Abstract
Ships are important targets for marine surveillance in both military and civilian domains. Since the rise of deep learning, ship detection in synthetic aperture radar (SAR) images has achieved significant progress. However, the variability in ship size and resolution, especially the widespread presence of numerous small-sized ships, continues to pose challenges for effective ship detection in SAR images. To address the challenges posed by small ship targets, we propose an enhanced YOLO network to improve the detection accuracy of small targets. Firstly, we propose a Shuffle Re-parameterization (SR) module as a replacement for the C2f module in the original YOLOv8 network. The SR module employs re-parameterized convolution along with channel shuffle operations to improve feature extraction capabilities. Secondly, we employ the space-to-depth (SPD) module to perform down-sampling operations within the backbone network, thereby reducing the information loss associated with pooling operations. Thirdly, we incorporate a Hybrid Attention (HA) module into the neck network to enhance the feature representation of small ship targets while mitigating the interference caused by surrounding sea clutter and speckle noise. Finally, we add the shape-NWD loss to the regression loss, which emphasizes the shape and scale of the bounding box and mitigates the sensitivity of Intersection over Union (IoU) to positional deviations in small ship targets. Extensive experiments were carried out on three publicly available datasets—namely, LS-SSDD, HRSID, and iVision-MRSSD—to demonstrate the effectiveness and reliability of the proposed method. In the small ship dataset LS-SSDD, the proposed method exhibits a notable improvement in average precision at an IoU threshold of 0.5 (AP50), surpassing the baseline network by over 4%, and achieving an AP50 of 77.2%. In the HRSID and iVision-MRSSD datasets, AP50 reaches 91% and 95%, respectively. Additionally, the average precision for small targets (AP) exhibits an increase of approximately 2% across both datasets. Furthermore, the proposed method demonstrates outstanding performance in comparison experiments across all three datasets, outperforming existing state-of-the-art target detection methods. The experimental results offer compelling evidence supporting the superior performance and practical applicability of the proposed method in SAR small ship detection. Full article
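For readers unfamiliar with the NWD term in the loss, the sketch below implements one common formulation of the Normalized Wasserstein Distance between two boxes modeled as 2-D Gaussians; the normalizing constant and the exact way it is combined with the shape term in shape-NWD are assumptions, not the paper's definition.

```python
import math

def nwd(box1, box2, c: float = 12.8) -> float:
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).
    Each box is modeled as a 2-D Gaussian; the squared 2-Wasserstein distance
    between the Gaussians has the closed form below, and NWD = exp(-sqrt(W2)/c).
    The constant c is dataset-dependent (the value here is an assumption)."""
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    w2_sq = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2 + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)

# A 3-pixel shift on an 8x6 ship drops plain IoU to roughly 0.45,
# while NWD stays high, which is why it is gentler on small targets.
print(nwd((50, 50, 8, 6), (53, 50, 8, 6)))  # approx. 0.79
```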
Figures:
Figure 1. The overall architecture of the proposed method.
Figure 2. The structure of re-parameterized convolution block (RepConv).
Figure 3. (a) The structure of Shuffle Re-parameterization block (SRB). (b) The structure of Shuffle Re-parameterization module (SR).
Figure 4. The structure of Space-to-Depth module. (a) is the structure of SPD layer. The character C represents the feature map’s channel numbers, and the character s denotes the width and height of the input feature map. After the SPD layer, both the height and width of the feature map are reduced by half, while the number of channels is increased fourfold. (b) is the SPD module, which consists of the SPD layer.
Figure 5. The structure of Hybrid Attention (HA). (a) is the Multi-axis External Weights module (MEW). (b) is the Spatial Attention module (SA). (c) is the Hybrid Attention (HA) module, which consists of the MEW module and the SA module.
Figure 6. Precision–recall curves when adding different modules.
Figure 7. Visualization of the detection results. (a,d,g) represent the ground truth. (b,e,h) represent the baseline. (c,f,i) represent the proposed method. Green: ground truths. Yellow: detection results. Red: missed detections. Blue: false alarms.
Figure 8. Visualization of the detection results. (a,d,g,j,m) represent the ground truth. (b,e,h,k,n) represent the baseline. (c,f,i,l,o) represent the proposed method. Green: ground truths. Yellow: detection results. Red: missed detections. Blue: false alarms.
Figure 9. Visualization of the detection results of various methods on LS-SSDD (a)–(l). (a) Ground truth. (b) Faster R-CNN. (c) CenterNet. (d) FCOS. (e) ATSS. (f) YOLOv5n. (g) YOLOv8n. (h) YOLOv10n. (i) YOLOv11n. (j) SHIP-YOLO. (k) LHSDNet. (l) The proposed method. The green boxes indicate the ground truths; the red boxes indicate detection results.
19 pages, 36008 KiB  
Article
An Enhanced Algorithm for Detecting Small Traffic Signs Using YOLOv10
by Hongrui Liu, Ke Wang, Yudi Wang, Ming Zhang, Qinghua Liu and Wentao Li
Electronics 2025, 14(5), 955; https://doi.org/10.3390/electronics14050955 - 27 Feb 2025
Abstract
Recognizing traffic signs is crucial for autonomous driving systems, as it significantly impacts their safety and dependability. However, challenges like the diminutive size of objects and intricate background environments limit the effectiveness of current object detection models. To improve small traffic sign detection, this research introduces an enhanced detection algorithm built on YOLOv10. First, a custom-designed layer for detecting small objects is integrated into the neck section of the network, enhancing the feature extraction process for these objects. Second, a refined downsampling module, called Triple-Branch Downsampling (TBD), utilizes a multi-branch structure and hybrid pooling strategy to boost feature extraction efficiency within the model. Finally, the loss function is optimized by integrating the Normalized Wasserstein Distance (NWD) and Wise-MPDIoU mechanisms, increasing the accuracy of bounding box matching and regression. The experimental findings indicate that the enhanced algorithm reaches a mAP@0.5 of 84.8%, marking a 4% increase over YOLOv10. The classification accuracy and recall are 73.4% and 82.9%, respectively. Moreover, the parameter count decreases by approximately 10%, while the computational complexity is reduced by around 5%. Full article
Figures:
Figure 1. Overall architecture of YOLOv10.
Figure 2. Overall architecture of the improved network model.
Figure 3. Small-object detection layer structure.
Figure 4. Diagram of the TBD module structure.
Figure 5. IoU sensitivity evaluation for objects at tiny and normal scales. (a) Zoomed-in view of the positional offset on a small-scale object; (b) zoomed-in view of the positional offset on a medium-scale object.
Figure 6. Traffic sign classifications included in the dataset. (a) Samples from the instruction class, (b) samples from the prohibit class, which include traffic signs in Chinese characters indicating to slow down and yield, and (c) samples from the warning class.
Figure 7. Dataset labeling information. (a) Dataset category distribution. (b) Distribution of target positions. (c) Distribution of target sizes.
Figure 8. Performance curve comparison chart.
Figure 9. Comparison of practical road environments. The left column displays the original road scene images without detection, the middle column illustrates the detection results from the YOLOv10 algorithm, and the right column highlights the outcomes identified by the improved algorithm proposed in this study.
23 pages, 26465 KiB  
Article
DHS-YOLO: Enhanced Detection of Slender Wheat Seedlings Under Dynamic Illumination Conditions
by Xuhua Dong and Jingbang Pan
Agriculture 2025, 15(5), 510; https://doi.org/10.3390/agriculture15050510 - 26 Feb 2025
Abstract
The precise identification of wheat seedlings in unmanned aerial vehicle (UAV) imagery is fundamental for implementing precision agricultural practices such as targeted pesticide application and irrigation management. This detection task presents significant technical challenges due to two inherent complexities: (1) environmental interference from variable illumination conditions and (2) morphological characteristics of wheat seedlings characterized by slender leaf structures and flexible posture variations. To address these challenges, we propose DHS-YOLO, a novel deep learning framework optimized for robust wheat seedling detection under diverse illumination intensities. Our methodology builds upon the YOLOv11 architecture with three principal enhancements: First, the Dynamic Slender Convolution (DSC) module employs deformable convolutions to adaptively capture the elongated morphological features of wheat leaves. Second, the Histogram Transformer (HT) module integrates a dynamic-range spatial attention mechanism to mitigate illumination-induced image degradation. Third, we implement the ShapeIoU loss function that prioritizes geometric consistency between predicted and ground truth bounding boxes, particularly optimizing for slender plant structures. The experimental validation was conducted using a custom UAV-captured dataset containing wheat seedling images under varying illumination conditions. Compared to the existing models, the proposed model achieved the best performance with precision, recall, mAP50, and mAP50-95 values of 94.1%, 91.0%, 95.2%, and 81.9%, respectively. These results demonstrate our model’s effectiveness in overcoming illumination variations while maintaining high sensitivity to fine plant structures. This research contributes an optimized computer vision solution for precision agriculture applications, particularly enabling automated field management systems through reliable crop detection in challenging environmental conditions. Full article
(This article belongs to the Special Issue Computational, AI and IT Solutions Helping Agriculture)
Figures:
Figure 1. Wheat seedling dataset at the six different illumination intensities (II), i.e., (a) II level #1 (II ≤ 0.2k Lux), (b) II level #2 (0.2k Lux < II ≤ 0.5k Lux), (c) II level #3 (0.5k Lux < II ≤ 10k Lux), (d) II level #4 (10k Lux < II ≤ 40k Lux), (e) II level #5 (40k Lux < II ≤ 100k Lux), and (f) II level #6-Shadow (10k < II ≤ 100k, with over 50% shadow area), where Lux is the universal unit of II and k means 1000. The image data in (a,b) were collected when the sky was overcast, while (c) was collected when it was cloudy. The image data in (d–f) were collected on a sunny day, with a lot of shadows in (f).
Figure 2. Overall structure of DHS-YOLO.
Figure 3. Dynamic Slender Convolution (DSC) module, which learns the deformation according to the input feature map and adaptively focuses on the slender local features of the wheat leaves under the knowledge of the slender structure morphology.
Figure 4. DSC block.
Figure 5. Architecture of our Histogram Transformer module for illumination removal. The main components are the Dynamic-range Histogram Self-Attention (DHSA) part and the Dual-scale Gated Feed-Forward (DGFF) part. There are two types of reshaping mechanisms in DHSA: Bin-wise Histogram Reshaping (BHR) and Frequency-wise Histogram Reshaping (FHR).
Figure 6. HT block.
Figure 7. Graphical representation for the introduction of ShapeIoU loss.
Figure 8. Heatmap of YOLOv11 with different modules in different illumination intensities. From left to right, the columns represent the original images, heatmap of original YOLOv11, heatmap of YOLOv11 with DSC modules, heatmap of YOLOv11 with HT modules, and heatmap of our model (YOLOv11 with DSC and HT modules), respectively. From top to bottom, the rows correspond to illumination intensities of II level #1, #2, #3, #4, and #5, respectively. The heatmap uses a rainbow color coding scheme, with a gradient of color bands from blue (low active degree value) to red (high active degree value).
Figure 9. Comparison of wheat seedling detection results of different models using the prediction image dataset under six different illumination conditions. From left to right, the columns represent the GT (real bounding box of wheat seedlings), the detection results of YOLOv11, WAS-YOLO, and our proposed algorithm, respectively. From top to bottom, the rows represent the six different illumination conditions in Table 1. The blue boxes denote the detection results, the detection count is displayed in the bottom left corner of the image (e.g., DetNum: 24), and regions with markers (A, B, and C) indicate incorrect detection areas. Marker A means over-detection, marker B represents missed detection, and marker C means an incorrect detection box with low IoU.
Figure 10. Comparison of wheat seedling detection results of different models under four different density conditions. From left to right, the columns represent the GT (real bounding box of wheat seedlings), the detection results of YOLOv11, Deformable DETR, and our proposed algorithm, respectively. From top to bottom, the rows represent the four different density conditions of wheat seedling distribution (density levels #1, #2, #3, and #4). The blue boxes denote the detection results, the detection count is displayed in the bottom left corner of the image (e.g., DetNum: 38), and regions with markers (A, B, and C) indicate incorrect detection areas. Marker A means over-detection, marker B represents missed detection, and marker C means an incorrect detection box with low IoU.
26 pages, 13085 KiB  
Article
Image Augmentation Approaches for Building Dimension Estimation in Street View Images Using Object Detection and Instance Segmentation Based on Deep Learning
by Dongjin Hwang, Jae-Jun Kim, Sungkon Moon and Seunghyeon Wang
Appl. Sci. 2025, 15(5), 2525; https://doi.org/10.3390/app15052525 - 26 Feb 2025
Abstract
There are numerous applications for building dimension data, including building performance simulation and urban heat island investigations. In this context, object detection and instance segmentation methods—based on deep learning—are often used with Street View Images (SVIs) to estimate building dimensions. However, these methods typically depend on large and diverse datasets. Image augmentation can artificially boost dataset diversity, yet its role in building dimension estimation from SVIs remains under-studied. This research presents a methodology that applies eight distinct augmentation techniques—brightness, contrast, perspective, rotation, scale, shearing, translation augmentation, and a combined “sum of all” approach—to train models in two tasks: object detection with Faster Region-Based Convolutional Neural Networks (Faster R-CNNs) and instance segmentation with You Only Look Once (YOLO)v10. Comparing the performance with and without augmentation revealed that contrast augmentation consistently provided the greatest improvement in both bounding-box detection and instance segmentation. Using all augmentations at once rarely outperformed the single most effective method, and sometimes degraded the accuracy; shearing augmentation ranked as the second-best approach. Notably, the validation and test findings were closely aligned. These results, alongside the potential applications and the method’s current limitations, underscore the importance of carefully selected augmentations for reliable building dimension estimation. Full article
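Because contrast augmentation turned out to be the most effective single technique, a minimal photometric-augmentation sketch using Pillow is shown below. Brightness and contrast changes leave the bounding-box and mask labels untouched, whereas the geometric augmentations studied in the paper (rotation, shear, perspective, scale, translation) would also require transforming the annotations; the enhancement factors here are illustrative, not the paper's settings.

```python
from PIL import Image, ImageEnhance

def augment_contrast_brightness(path: str, contrast: float = 1.5, brightness: float = 1.2) -> Image.Image:
    """Photometric augmentation of a street-view image. Factors > 1 increase
    contrast/brightness; factors < 1 decrease them. Labels need no update."""
    img = Image.open(path).convert("RGB")
    img = ImageEnhance.Contrast(img).enhance(contrast)
    img = ImageEnhance.Brightness(img).enhance(brightness)
    return img

# Write an augmented copy next to the original (file names are hypothetical).
augment_contrast_brightness("street_view.jpg").save("street_view_contrast.jpg")
```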
Figures:
Figure 1. Workflow of proposed methods.
Figure 2. Examples of images with brightness augmentation.
Figure 3. Examples of images with contrast augmentation.
Figure 4. Examples of images with scale augmentation.
Figure 5. Examples of images with perspective augmentation.
Figure 6. Examples of images with rotation augmentation.
Figure 7. Examples of images with translation augmentation.
Figure 8. Examples of images with shearing augmentation.
Figure 9. Workflow of Faster R-CNN.
Figure 10. Workflow of YOLOv10.
Figure 11. Geographic scope of data collection area.
Figure 12. Example images in London.
Figure 13. Examples of unusable images.
Figure 14. Examples of augmented images.
Figure 15. Examples of labeling for object detection and instance segmentation.
Figure 16. Difference between each technique and baseline (AP50).
Figure 17. Difference between each technique and baseline (AP50:95).
Figure 18. Difference between each technique and baseline (IOU).
Figure 19. Validation results for each method.
Figure 20. Results of testing validation and test sets in best model.
Figure 21. Examples of visual results from Grad-CAM analysis.
22 pages, 2410 KiB  
Article
DAHD-YOLO: A New High Robustness and Real-Time Method for Smoking Detection
by Jianfei Zhang and Chengwei Jiang
Sensors 2025, 25(5), 1433; https://doi.org/10.3390/s25051433 - 26 Feb 2025
Abstract
Recent advancements in AI technologies have driven the extensive adoption of deep learning architectures for recognizing human behavioral patterns. However, the existing smoking behavior detection models based on object detection still have problems, including poor accuracy and insufficient real-time performance. Especially in complex environments, the existing models often struggle with erroneous detections and missed detections. In this paper, we introduce DAHD-YOLO, a model built upon the foundation of YOLOv8. We first designed the DBCA module to replace the bottleneck component in the backbone. The architecture integrates a diverse branch block and a contextual anchor mechanism, effectively improving the backbone network’s ability to extract features. Subsequently, at the end of the backbone, we introduce adaptive fine-grained channel attention (AFGCA) to effectively facilitate the fusion of both overarching patterns and localized details. We introduce the ECA-FPN, an improved version of the feature pyramid network, designed to refine the extraction of hierarchical information and enhance cross-scale feature interactions. The decoupled detection head is also updated via the reparameterization approach. The wise–powerful intersection over union (Wise-PIoU) is adopted as the new bounding box regression loss function, resulting in quicker convergence speed and improved detection outcomes. Our system achieves superior results compared to existing models using a self-constructed smoking detection dataset, reducing computational complexity by 23.20% while trimming the model parameters by 33.95%. Moreover, the mAP50 of our model has increased by 5.1% compared to the benchmark model, reaching 86.0%. Finally, we deploy the improved model on the RK3588. After optimizations such as quantization and multi-threading, the system achieves a detection rate of 50.2 fps, addressing practical application demands and facilitating the precise and instantaneous identification of smoking activities. Full article
Figures:
Figure 1. The architecture of DAHD-YOLO.
Figure 2. The component diagram of the DBCA module, including two sub-diagrams, namely ConvDBB and CAA.
Figure 3. Diagram of AFGCA.
Figure 4. Diagram of ECA-FPN.
Figure 5. Regression results guided by different BBR losses.
Figure 6. Comparison of feature visualization after adding different attention mechanisms in FPN.
Figure 7. Comparison of convergence speeds between Wise-PIoU and traditional loss functions.
Figure 8. Comparison of heatmap effects: (a) YOLOv8 base model; (b) improved model with AFGCA.
Figure 9. Comparison of the detection effects of the original model and the improved model in complex scenarios.
22 pages, 3970 KiB  
Article
YOLO-ALW: An Enhanced High-Precision Model for Chili Maturity Detection
by Yi Wang, Cheng Ouyang, Hao Peng, Jingtao Deng, Lin Yang, Hailin Chen, Yahui Luo and Ping Jiang
Sensors 2025, 25(5), 1405; https://doi.org/10.3390/s25051405 - 25 Feb 2025
Abstract
Chili pepper, a widely cultivated and consumed crop, faces challenges in accurately determining maturity due to issues such as occlusion, small target size, and similarity between fruit color and background. This study presents an enhanced YOLOv8n-based object detection model, YOLO-ALW, designed to address these challenges. The model introduces the AKConv (Alterable Kernel Convolution) module in the head section, which adaptively adjusts the convolution kernel shape and size based on the target and scene, improving detection performance under occlusion and dense environments. In the backbone, the SPPF_LSKA (Spatial Pyramid Pooling Fast-Large Separable Kernel Attention) module enhances the integration of multi-scale features, facilitating accurate differentiation of peppers at various maturity stages while maintaining low computational complexity. Additionally, the Wise-IoU (Wise Intersection over Union) loss function optimizes bounding box learning, further improving the detection of peppers in occluded or background-similar scenarios. Experimental results demonstrate that YOLO-ALW achieves a mean average precision (mAP0.5) of 99.1%, with precision and recall rates of 98.3% and 97.8%, respectively, outperforming the original YOLOv8n by 3.4%, 5.1%, and 9.0%, respectively. Grad-CAM feature visualization highlights the model’s improved focus on key fruit features. YOLO-ALW shows significant promise for high-precision chili pepper detection and maturity recognition, offering valuable support for automated harvesting applications. Full article
(This article belongs to the Section Smart Agriculture)
Figures:
Figure 1. Chili pepper images under different shooting angles and weather conditions.
Figure 2. Categorization of maturity.
Figure 3. Data presentation in different environmental conditions.
Figure 4. The network structure of YOLO-ALW.
Figure 5. Schematic diagram of the AKConv module.
Figure 6. Structural diagram of the LSKA module.
Figure 7. Structural diagram of the SPPF_LSKA module.
Figure 8. Model training and validation metrics.
Figure 9. Comparison of model detection results.
Figure 10. Comparison of model heatmap results.
33 pages, 23014 KiB  
Article
Underwater Target Tracking Method Based on Forward-Looking Sonar Data
by Wenjing Zeng, Renzhe Li, Heng Zhou and Tiedong Zhang
J. Mar. Sci. Eng. 2025, 13(3), 430; https://doi.org/10.3390/jmse13030430 - 25 Feb 2025
Abstract
Underwater dynamic targets often display significant blurriness in their forward-looking sonar imagery, accompanied by sparse feature representation. This phenomenon presents several challenges, including disturbances in the trajectories of underwater targets and alterations in target identification throughout the tracking process, thereby complicating the continuous monitoring of moving targets. This research proposes a new framework for underwater acoustic data interpolation and underwater object tracking. Considering the character of underwater acoustic images, a Swin Transformer is integrated in the architecture of the YOLOv5 network; then, an improved Deep Simple Online and Real-time Tracking method is developed. By enlarging the bounding box output generated by the detector and subsequently integrating it into the tracker, the sensing horizon of the tracker is broadened. This strategy enables the extraction of noise features surrounding the target, thereby augmenting the target’s characteristics and improving the stability of the tracking process. The experimental results demonstrate that the proposed method effectively reduces the frequency of changes in target identification numbers, minimizes the occurrence of trajectory interruptions, and decreases the overall percentage of trajectory interruptions. Additionally, it significantly enhances tracking stability, particularly in scenarios involving intersecting target paths and encounters. Full article
(This article belongs to the Section Ocean Engineering)
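The key tracker modification described above is enlarging the detector's bounding box before it enters the DeepSORT appearance branch, so the crop also carries the noise context around the target. A minimal sketch of such an expansion step is given below; the scale factor and the clamping behavior are assumptions for illustration, not the authors' exact settings.

```python
def expand_box(box, scale: float, img_w: int, img_h: int):
    """Enlarge a detector bounding box (x1, y1, x2, y2) about its centre
    before handing it to the tracker, clamped to the sonar image bounds."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * scale, (y2 - y1) * scale
    return (
        max(0.0, cx - w / 2),
        max(0.0, cy - h / 2),
        min(float(img_w), cx + w / 2),
        min(float(img_h), cy + h / 2),
    )

# A 40x30 detection expanded by 1.5x so the appearance feature also covers
# the noisy background around the target.
print(expand_box((100, 80, 140, 110), scale=1.5, img_w=512, img_h=400))
# (90.0, 72.5, 150.0, 117.5)
```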
Show Figures

Figure 1

Figure 1
<p>Diagram showing scanning procedure [<a href="#B44-jmse-13-00430" class="html-bibr">44</a>].</p>
Full article ">Figure 2
<p>Acoustic image of target.</p>
Full article ">Figure 3
<p>(<b>a</b>) STr architecture [<a href="#B45-jmse-13-00430" class="html-bibr">45</a>]. (<b>b</b>) STr block [<a href="#B45-jmse-13-00430" class="html-bibr">45</a>].</p>
Full article ">Figure 4
<p>YOLOv5 architecture.</p>
Full article ">Figure 5
<p>YOLOv5 incorporating STr block.</p>
Full article ">Figure 6
<p>Confusion matrix results. (<b>a</b>) Results obtained by Y5s; (<b>b</b>) results obtained by Y5S1; (<b>c</b>) results obtained by Y5S2; (<b>d</b>) results obtained by Y5S3.</p>
Full article ">Figure 7
<p>Results obtained by Y5s.</p>
Full article ">Figure 8
<p>Results obtained by Y5S3.</p>
Full article ">Figure 9
<p>Heatmaps.</p>
Full article ">Figure 10
<p>DeepSORT tracing framework.</p>
Full article ">Figure 11
<p>The extended target frame of the DeepSORT structure.</p>
Full article ">Figure 12
<p>Original target box (solid line) and expanded target box (dashed line) in sonar imagery.</p>
Full article ">Figure 13
<p>Comparison of classification accuracy. (<b>a</b>) Results of the ResNet50 network; (<b>b</b>) results of the MobileNet network.</p>
Full article ">Figure 14
<p>Data collection environment.</p>
Full article ">Figure 15
<p>Five types of targets and their sonar images.</p>
Full article ">Figure 16
<p>Underwater object trajectory.</p>
Full article ">Figure 17
<p>Layout of test waters and object placement. (<b>a</b>) Movement trajectory; (<b>b</b>) object placement.</p>
Full article ">Figure 18
<p>Sonar image sequence of targets in lake. (<b>a</b>) The 30th image; (<b>b</b>) the 90th image; (<b>c</b>) the 130th image; (<b>d</b>) the 170th image.</p>
Full article ">Figure 19
<p>Sonar-simulated image (sphere).</p>
Full article ">Figure 20
<p>Comparison results of different targets in genuine acoustic images (<b>left</b>), simulated images (<b>middle</b>), and generated images (<b>right</b>). (<b>a</b>) Sphere target; (<b>b</b>) dummy model; (<b>c</b>) cylinder target; (<b>d</b>) tire target.</p>
Full article ">Figure 21
<p>Sonar image sequences generated by Pix2PixHD net. (<b>a</b>) Movement trajectory of object in the simulated image; (<b>b</b>) movement trajectory of object in the generated image.</p>
Full article ">Figure 22
<p>Object trajectory based on expanded data.</p>
Full article ">Figure 23
<p>Image sequences of intersection trajectory (spherical). (<b>a</b>) The 10th image; (<b>b</b>) the 60th image; (<b>c</b>) the 90th image; (<b>d</b>) the 100th image; (<b>e</b>) the 130th image; (<b>f</b>) the 180th image.</p>
Full article ">Figure 24
<p>Image sequences of the straight line-crossing trajectory (spherical). (<b>a</b>) The 10th image; (<b>b</b>) the 60th image; (<b>c</b>) the 90th image; (<b>d</b>) the 100th image; (<b>e</b>) the 130th image; (<b>f</b>) the 180th image.</p>
Full article ">Figure 25
<p>An overview of indexes used to assess target tracking performance. (<b>a</b>) ID Switch in the case of trajectory interruption (ID Switch = 1, Frag Ratio = 0.2); (<b>b</b>) ID Switch in the case of trajectory interruption (ID Switch = 1, Frag Ratio = 0.0); (<b>c</b>) trajectory interruption without ID Switch (Frag Ratio = 0.2); (<b>d</b>) trajectory interruption without ID Switch (Frag Ratio = 0.4).</p>
Full article ">Figure 26
<p>Tracking results of the dummy model. (<b>a</b>) The target motion trajectory; (<b>b</b>) results obtained by the SORT method; (<b>c</b>) results obtained by the DeepSORT method; (<b>d</b>) results obtained by the ExDeepSORT method.</p>
Full article ">Figure 26 Cont.
<p>Tracking results of the dummy model. (<b>a</b>) The target motion trajectory; (<b>b</b>) results obtained by the SORT method; (<b>c</b>) results obtained by the DeepSORT method; (<b>d</b>) results obtained by the ExDeepSORT method.</p>
Full article ">Figure 27
<p>Tracking results of a dummy model by ExDeepSORT. (<b>a</b>) The 60th image; (<b>b</b>) the 90th image; (<b>c</b>) the 150th image; (<b>d</b>) the 180th image.</p>
Full article ">Figure 28
<p>Tracking results of the dummy model. (<b>a</b>) The target motion trajectory; (<b>b</b>) results obtained by the SORT method; (<b>c</b>) results obtained by the DeepSORT method; (<b>d</b>) results obtained by the ExDeepSORT method.</p>
Full article ">Figure 29
<p>Tracking results of the dummy model. (<b>a</b>) The 60th image; (<b>b</b>) the 90th image; (<b>c</b>) the 150th image; (<b>d</b>) the 180th image.</p>
Full article ">Figure 30
<p>Tracking results of the diver. (<b>a</b>) The target motion trajectory; (<b>b</b>) results obtained by the SORT method; (<b>c</b>) results obtained by the DeepSORT method; (<b>d</b>) results obtained by the ExDeepSORT method.</p>
Full article ">Figure 31
<p>Tracking results of single diver in a pool using ExDeepSORT. (<b>a</b>) The 70th image; (<b>b</b>) the 150th image; (<b>c</b>) the 200th image; (<b>d</b>) the 250th image.</p>
Figure 31 Cont.">
Full article ">Figure 32
<p>Tracking results based on sea trial data. (<b>a</b>) Target motion trajectory; (<b>b</b>) results obtained by SORT method; (<b>c</b>) results obtained by DeepSORT method; (<b>d</b>) results obtained by ExDeepSORT method.</p>
Full article ">Figure 33
<p>Tracking results obtained by ExDeepSORT method. (<b>a</b>) The 50th image; (<b>b</b>) the 100th image; (<b>c</b>) the 150th image; (<b>d</b>) the 200th image.</p>
Full article ">
15 pages, 5230 KiB  
Article
Vehicle Exhaust Estimation Using YOLOv7 and Support Vector Regression with Image Features
by Yun-Sin Lin, Ting-Yu Chen, Jiun-Jian Liaw, Hsi-Hsien Yang and Cheng-Hsiung Hsieh
Information 2025, 16(3), 168; https://doi.org/10.3390/info16030168 - 24 Feb 2025
Viewed by 207
Abstract
Vehicle exhaust is a major source of air pollution that contributes to environmental degradation and poses risks to public health. This paper presents an image-based method to estimate opacity (OP) and particulate matter (PM) from vehicle exhaust. In the proposed method, YOLOv7 was [...] Read more.
Vehicle exhaust is a major source of air pollution that contributes to environmental degradation and poses risks to public health. This paper presents an image-based method for estimating opacity (OP) and particulate matter (PM) from vehicle exhaust. In the proposed method, YOLOv7 was used to identify vehicles and, thus, the region of interest (ROI). A support vector regression model was then trained using four image features extracted from the ROI as inputs and OP or PM as the output. The method was verified by experiments covering moving and static scenarios with three ROIs: the exhaust pipe area (EPA), the vehicle bounding box (VBB), and a white background (WBG). In the moving scenario, the EPA and VBB ROIs were considered. For the VBB ROI, the average R² values for OP and PM were 0.834 and 0.894, respectively; for the EPA ROI, they were 0.838 and 0.910. In the static scenario, the EPA and WBG ROIs were considered. For the EPA ROI, the average R² values for OP and PM were 0.619 and 0.612, respectively; for the WBG ROI, they were 0.748 and 0.732. The results suggest that the EPA ROI is preferable in the moving scenario and the WBG ROI in the static scenario for estimating OP and PM from vehicle exhaust. The satisfactory R² values indicate that the proposed method is a promising approach to estimating OP and PM from vehicle exhaust. Full article
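Editor's note: the abstract describes a pipeline in which YOLOv7 localizes the ROI and a support vector regression maps four image features to OP or PM. The following is a minimal sketch of the regression stage with scikit-learn; the synthetic feature values, kernel choice, and hyperparameters are placeholders and not the authors' settings.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

# Each row stands in for [f_se, f_dc, f_en, f_rms] extracted from one ROI frame.
# The values below are random placeholders, not real measurements.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y_opacity = X @ np.array([0.4, 0.3, 0.2, 0.1]) + 0.05 * rng.standard_normal(200)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(X[:150], y_opacity[:150])          # train on the first 150 frames
print("R^2:", r2_score(y_opacity[150:], model.predict(X[150:])))
```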
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)
Show Figures

Figure 1

Figure 1
<p>The VBB, EPA, and WBG ROIs: (<b>a</b>) VBB (red lines) and EPA (blue lines) for the moving scenario; (<b>b</b>) EPA (blue lines) and (<b>c</b>) WBG (green lines) for the static scenario.</p>
Full article ">Figure 2
<p>The Sobel edge feature <span class="html-italic">f<sub>se</sub></span> and the corresponding (<b>a</b>) OP and (<b>b</b>) PM.</p>
Full article ">Figure 3
<p>The dark channel feature <span class="html-italic">f<sub>dc</sub></span> and the corresponding (<b>a</b>) OP and (<b>b</b>) PM.</p>
Full article ">Figure 4
<p>The entropy feature <span class="html-italic">f<sub>en</sub></span> and the corresponding (<b>a</b>) OP and (<b>b</b>) PM.</p>
Full article ">Figure 5
<p>The feature <span class="html-italic">f<sub>rms</sub></span> and the corresponding (<b>a</b>) OP and (<b>b</b>) PM.</p>
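Editor's note: Figures 2–5 relate four ROI features (Sobel edge strength, dark channel, entropy, and an RMS-based feature) to the measured OP and PM. The sketch below shows one plausible way to compute such features with OpenCV and NumPy; the patch sizes, normalization, and the exact RMS definition are assumptions, since the paper's formulas are not reproduced here.

```python
import cv2
import numpy as np

def roi_features(roi_bgr):
    """Compute illustrative versions of the four ROI features.

    Window sizes, normalisation, and the RMS definition are assumptions;
    the paper's precise feature formulas may differ.
    """
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)

    # Sobel edge strength, averaged over the ROI.
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    f_se = float(np.mean(np.hypot(gx, gy)))

    # Dark channel: per-pixel minimum over B, G, R, then a local minimum filter.
    dark = cv2.erode(roi_bgr.min(axis=2), np.ones((15, 15), np.uint8))
    f_dc = float(np.mean(dark))

    # Shannon entropy of the grey-level histogram.
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    f_en = float(-np.sum(p[p > 0] * np.log2(p[p > 0])))

    # Root-mean-square intensity.
    f_rms = float(np.sqrt(np.mean(gray.astype(np.float64) ** 2)))
    return f_se, f_dc, f_en, f_rms

# Example on a uniform grey patch.
print(roi_features(np.full((64, 64, 3), 128, np.uint8)))
```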
Full article ">Figure 6
<p>Block diagrams of the proposed method: (<b>a</b>) the training stage; (<b>b</b>) the testing stage.</p>
Full article ">Figure 7
<p>Data collection setup: (<b>a</b>) the moving scenario; (<b>b</b>) the static scenario.</p>
Full article ">Figure 8
<p>The OP and PM measurements in the moving scenario: (<b>a</b>) Video 1; (<b>b</b>) Video 2; (<b>c</b>) Video 3.</p>
Full article ">Figure 9
<p>The OP and PM measurements in the static scenario: (<b>a</b>) Videos 4 and 8; (<b>b</b>) Videos 5 and 9; (<b>c</b>) Videos 6 and 10; (<b>d</b>) Videos 7 and 11.</p>
Full article ">Figure 10
<p>Scatter plots for Video 1 for the EPA ROI in the moving scenario: (<b>a</b>) PM estimation (<span class="html-italic">R</span><sup>2</sup> = 0.937); (<b>b</b>) OP estimation (<span class="html-italic">R</span><sup>2</sup> = 0.919).</p>
Full article ">Figure 11
<p>Scatter plots for Video 7 for the WBG ROI in the static scenario: (<b>a</b>) PM estimation (<span class="html-italic">R</span><sup>2</sup> = 0.755); (<b>b</b>) OP estimation (<span class="html-italic">R</span><sup>2</sup> = 0.848).</p>
Full article ">
27 pages, 65983 KiB  
Article
Automatic Prompt Generation Using Class Activation Maps for Foundational Models: A Polyp Segmentation Case Study
by Hanna Borgli, Håkon Kvale Stensland and Pål Halvorsen
Mach. Learn. Knowl. Extr. 2025, 7(1), 22; https://doi.org/10.3390/make7010022 - 24 Feb 2025
Viewed by 292
Abstract
We introduce a weakly supervised segmentation approach that leverages class activation maps and the Segment Anything Model to generate high-quality masks using only classification data. A pre-trained classifier produces class activation maps that, once thresholded, yield bounding boxes encapsulating the regions of interest. [...] Read more.
We introduce a weakly supervised segmentation approach that leverages class activation maps (CAMs) and the Segment Anything Model (SAM) to generate high-quality masks using only classification data. A pre-trained classifier produces CAMs that, once thresholded, yield bounding boxes encapsulating the regions of interest. These boxes prompt SAM to generate detailed segmentation masks, which are then refined by selecting, via the intersection over union (IoU) metric, the best overlap with the masks produced automatically by the foundational model. In a polyp segmentation case study, our approach outperforms existing zero-shot and weakly supervised methods, achieving a mean IoU of 0.63. The method offers an efficient and general solution for image segmentation tasks where segmentation data are scarce. Full article
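Editor's note: as a rough illustration of the thresholding step described in the abstract, the sketch below converts a normalized CAM into a box prompt. The 0.6 threshold mirrors the 50–60% range reported later for polyps, but the function name, fallback behavior, and synthetic input are illustrative assumptions.

```python
import numpy as np

def cam_to_box(cam, threshold=0.6):
    """Turn a normalised CAM (H, W, values in [0, 1]) into an (x1, y1, x2, y2) box.

    The threshold is a tunable hyperparameter; 0.6 is only an assumed default.
    """
    ys, xs = np.where(cam >= threshold * cam.max())
    if ys.size == 0:
        return None                        # CAM never reaches the threshold
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

cam = np.zeros((256, 256))
cam[80:140, 100:180] = 1.0                 # synthetic activation blob
print(cam_to_box(cam))                     # (100, 80, 179, 139)
```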
(This article belongs to the Section Data)
Show Figures

Graphical abstract

Graphical abstract
Full article ">Figure 1
<p>System flow chart: We first use a trained image classifier to generate a CAM for an input image. A bounding box is then extracted from the heatmap using a threshold that acts as a tuning hyperparameter. From our experiments, a threshold of 50% to 60% works for polyps. Next, we apply MobileSAM’s predictor class [<a href="#B18-make-07-00022" class="html-bibr">18</a>], which is faster than standard SAM’s, to generate an initial mask using the bounding box as a prompt. Then, SAM’s automatic mask generator creates a list of object masks. We filter out masks too large or located at the image corners and compare the remaining masks with the initial mask, selecting the one with the best IoU overlap, requiring at least 0.1 IoU. If no suitable overlap is found, we use the initial mask. In the chart, each mask in the list of filtered masks is shown with its own color.</p>
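Editor's note: Figure 1 describes filtering SAM's automatically generated masks and keeping the one that best overlaps the box-prompted mask, with a fallback when no candidate reaches 0.1 IoU. A minimal sketch of that selection step follows; the single size filter used here is a simplified stand-in for the paper's size and corner-location criteria.

```python
import numpy as np

def iou(a, b):
    """IoU of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def select_mask(box_mask, auto_masks, img_area, min_iou=0.1, max_area_frac=0.5):
    """Pick the automatic mask that best overlaps the box-prompted mask.

    Masks covering more than `max_area_frac` of the image are discarded
    (a simplified stand-in for the paper's size/corner filtering); if no
    candidate reaches `min_iou`, the box-prompted mask is kept.
    """
    best, best_iou = None, 0.0
    for m in auto_masks:
        if m.sum() > max_area_frac * img_area:
            continue                       # too large, likely background
        score = iou(box_mask, m)
        if score > best_iou:
            best, best_iou = m, score
    return best if best is not None and best_iou >= min_iou else box_mask

# Tiny example with 8x8 masks.
box_mask = np.zeros((8, 8), bool); box_mask[2:6, 2:6] = True
cand = np.zeros((8, 8), bool); cand[3:6, 2:6] = True
print(select_mask(box_mask, [cand], img_area=64).sum())
```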
Full article ">Figure 2
<p>The figure shows a subset of images from the Kvasir-SEG dataset, shown without their corresponding ground truth masks. Images can come from both gastroscopic and endoscopic examinations, and we do not know where in the body the polyps are located. The images span a wide range of resolutions and have been processed to blacken out the box used for orienting the camera in the body; however, text artifacts remain in some images.</p>
Full article ">Figure 3
<p>The figure shows a subset of images from the colon polyps class from the GastroVision dataset. We use this class to classify the polyps in Kvasir-SEG regardless of where the polyp is located in the body. The images are distributed among seven different resolutions and have different artifacts, such as text and boxes for orienting the camera in the body.</p>
Full article ">Figure 4
<p>The graph shows mIoU, Dice, precision, and recall scores for bounding box thresholds increasing from 10% to 90%. The method used is the highest-scoring CAM method, ScoreCAM, with augmentation smoothing enabled. The graph shows that the best results are achieved at a bounding box threshold of around 60%.</p>
Full article ">Figure 5
<p>We show four graphs plotting mIoU for different CAM methods over bounding box thresholds increasing in 10% steps from 10% to 90%. Each graph shows the methods under a different combination of enabled and disabled smoothing techniques. The graphs show that both the choice of CAM method and the smoothing techniques strongly affect the results. Augmentation smoothing increases the results only marginally, while eigen smoothing shifts the best bounding box threshold and slightly decreases mIoU. Combining the two techniques has a similar effect to using eigen smoothing alone.</p>
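Editor's note: Figure 5 compares CAM variants with augmentation and eigen smoothing toggled on and off. Assuming an implementation along the lines of the widely used pytorch-grad-cam package (the paper's exact tooling is not stated here), these options would be toggled roughly as sketched below; the backbone model, target layer, and class index are placeholders.

```python
import torch
from torchvision.models import densenet121
from pytorch_grad_cam import ScoreCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Placeholder classifier; the paper trains its own polyp classifier.
model = densenet121(weights="IMAGENET1K_V1").eval()
cam = ScoreCAM(model=model, target_layers=[model.features[-1]])

input_tensor = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed image
targets = [ClassifierOutputTarget(0)]        # force a chosen class index

# aug_smooth averages CAMs over test-time augmentations; eigen_smooth
# projects the activations onto their first principal component.
heatmap = cam(input_tensor=input_tensor, targets=targets,
              aug_smooth=True, eigen_smooth=False)
print(heatmap.shape)                          # (1, 224, 224)
```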
Full article ">Figure 6
<p>The figure shows eight results from experiment six, where we use the full method. Each row of images shows the mask generated from the bounding box extracted from the CAM; the automatically generated masks from SAM, where each color represents one mask; and the best mask, which is the automatically generated mask with the highest IoU overlap with the bounding box-generated mask, or the bounding box-generated mask itself if no automatic mask exceeds an IoU of 0.1. Finally, we see the ground truth, against which the best mask can be compared.</p>
Full article ">Figure 7
<p>A set of six edge cases from the bounding box mask generation method when running ScoreCAM with augmentation smoothing. Each case shows the ground truth, the generated CAM, and the bounding box with the mask generated by SAM. Cases 1 and 6 show masks that overlap completely, but because the CAM activates on only part of the polyp, the bounding box captures only part of it. In case 2, the classifier predicts the correct label, but the CAM is inaccurate and too large, causing the bounding box to miss the polyp. In cases 3 and 4, the classifier fails to predict the correct label, and the CAM misses the polyp completely. In case 5, one of the polyps in the image is classified correctly with an accurate CAM, but the second polyp is missed entirely, and SAM fails to segment the detected polyp correctly.</p>
Full article ">Figure 8
<p>The scatter plot shows the calculated ROAD combined score and the IoU between the generated mask and the ground truth mask for each image in the Kvasir-SEG dataset. A high ROAD score means the CAM represents the object, whereas a low score means the CAM does not significantly impact the classifier’s confidence. The mask generation method was the best result from the Bayesian optimization experiment, namely ScoreCAM with augmentation smoothing and a bounding box threshold of 0.62. The plot shows a correlation between higher ROAD combined scores and higher IoU, but ROAD combined scores are also widely spread among images with low IoU.</p>
Full article ">Figure 9
<p>The figure shows visualizations picked from two different edge cases when calculating the ROAD combined score. Each column shows a step in the process. In the column “SAM Automatic Mask Generator”, each mask is denoted by a separate color. In the column “Best Mask”, pink means we use the best overlapping mask and blue means we use the bounding box-generated mask. The first case is when we have a low IoU score but a high ROAD combined score. This indicates that the CAM activates on parts of the image that are important for classification, so we would expect the IoU to be high as well. However, we see from the first row that the CAM activates on only the central part of the polyp, while the polyp is actually huge. The second row shows a case where there are two polyps in the image: the smaller one is found, but the second, larger one is not recognized. The third row shows a case where the CAM covers the polyp, but SAM fails to segment it correctly. The second case is when we have a low ROAD combined score but a high IoU score. In the first two rows of the second case, the CAM covers the object well, and we obtain a good mask. However, we obtain a low ROAD score because the classifier has a low confidence score for polyps. We force the activation of the polyp class for the CAM, but the confidence score is almost zero. When the classifier’s baseline confidence is very low, even accurate CAM regions cause only a small drop in output upon perturbation. This minimal change leads to a low ROAD score because the metric’s sensitivity is reduced when the overall class score is low. This is a case where the ROAD combined score does not correctly reflect the performance of the CAM. In the final row, the classifier labels the image as containing a colon polyp, but the polyp is so large that the first CAM does not cover it properly. Removing the CAM region allows the classifier to see the parts not covered by the CAM, so confidence does not drop, and the ROAD score is low.</p>
Full article ">Figure 10
<p>A selection of visually evaluated good results obtained using either the filtering method (shown with pink masks) or the bounding box method (shown with blue masks and boxes), both of which greatly improved results over the zero-shot methods evaluated in this paper. The images are from selected classes in the Gastrovision [<a href="#B32-make-07-00022" class="html-bibr">32</a>] dataset and do not have segmentation masks available to the authors.</p>
Full article ">Figure 11
<p>We show six images from the ImageNet-1k validation split processed with the method presented in our paper. Each row shows the steps of the process. First, we obtain the CAM and extract a bounding box based on a threshold of our choosing. We then generate a mask from the bounding box, shown in blue. Next, we find the best overlap with the automatically generated masks from the image, shown in a different color for each mask. The best overlapping mask is chosen as our final mask, shown in pink. We use the pre-trained DenseNet-121 model available in PyTorch, which is trained on the ImageNet [<a href="#B37-make-07-00022" class="html-bibr">37</a>] dataset. The figure shows that our method can also be extended to other datasets; however, it struggles when there is more than one instance of the object or when the object is best segmented as unconnected masks.</p>
Full article ">Figure 12
<p>The size of the bounding box can make a big difference for mask generation or selection. In this figure, we show that for this particular image, the belt is chosen as the best overlapping mask. However, a bigger bounding box is able to capture the whole object. Therefore, the bounding box threshold must be adjusted according to the size of the object we try to segment.</p>
Full article ">