Search Results (2,782)

Search Parameters:
Keywords = video detection

14 pages, 10252 KiB  
Article
A New Log-Transform Histogram Equalization Technique for Deep Learning-Based Document Forgery Detection
by Yong-Yeol Bae, Dae-Jea Cho and Ki-Hyun Jung
Symmetry 2025, 17(3), 395; https://doi.org/10.3390/sym17030395 (registering DOI) - 5 Mar 2025
Abstract
Recent advancements in image processing technology have benefited fields such as image, document, and video production. However, the negative implications of these advancements have also grown, with document image manipulation being a prominent issue. Document image manipulation involves the forgery or alteration of documents such as receipts, invoices, certificates, and confirmations, and the use of such manipulated documents can cause significant economic and social disruption. To prevent these issues, various methods for detecting forged document images are being researched, with recent proposals focused on deep learning techniques. An essential step in deep learning-based detection is enhancing the distinctive characteristics of document images before they are input into a model, which is crucial for achieving high accuracy. One such characteristic is the inherent symmetry of document images, including consistent text alignment, structural balance, and uniform pixel distribution. This study investigates document forgery detection through a symmetry-aware approach: by focusing on the symmetric structures found in document layouts and pixel distributions, the proposed LTHE (Log-Transform Histogram Equalization) technique enhances feature extraction in deep learning-based models. LTHE increases low pixel values through a log transformation and then increases image contrast through histogram equalization, making image features more prominent; it was developed and evaluated using three general-purpose CNN models. Experimental results show that LTHE achieves higher accuracy than other enhancement methods, indicating its potential to aid the development of deep learning-based forgery detection algorithms in the future.
(This article belongs to the Special Issue Symmetry in Image Processing: Novel Topics and Advancements)
Figure 1: Images purposely forged for research purposes. (a) Copy-move method. (b) Insertion method. (c) Splicing method.
Figure 2: The proposed feature extraction and enhancement steps for deep learning-based detection methods.
Figure 3: The logarithmic transformation results. (a) A histogram of the original image. (b) A histogram of the original image after applying log transformation. (c) A histogram of the forged image using the insertion method. (d) A histogram of the forged image using the insertion method after applying log transformation. (e) A histogram of the forged image using the copy-move method. (f) A histogram of the forged image using the copy-move method after applying log transformation.
Figure 4: A process diagram of the proposed LTHE method.
Figure 5: The results after applying LTHE. (a) The non-processed original image. (b) The forged image using the insertion method. (c) The forged image using the copy-move method. (d) The LTHE-applied original image. (e) The LTHE-applied forged image using the insertion method. (f) The LTHE-applied forged image using the copy-move method.
Figure 6: For performance evaluation purposes, two types of datasets were used, with the forged regions indicated by red boxes. (a) A custom test dataset. (b) The ICPR 2018 Fraud Contest dataset. (c) The DocTamper dataset.
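The abstract describes LTHE as a log transform that lifts low pixel values followed by histogram equalization that stretches contrast. As a rough illustration only, a minimal grayscale sketch of those two steps might look like the following; the authors' exact scaling constants, color handling, and integration with the CNN pipeline are not given in this listing, so those details are assumptions.

```python
import numpy as np

def log_transform(img_u8):
    """Boost low pixel values with c * log(1 + I), rescaled back to the 0-255 range."""
    img = img_u8.astype(np.float64)
    c = 255.0 / np.log1p(max(float(img.max()), 1.0))
    return np.clip(c * np.log1p(img), 0, 255).astype(np.uint8)

def histogram_equalization(img_u8):
    """Classic global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img_u8.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    lut = np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255).astype(np.uint8)
    return lut[img_u8]

def lthe(img_u8):
    """Log transform followed by histogram equalization, mirroring the LTHE description above."""
    return histogram_equalization(log_transform(img_u8))
```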
16 pages, 13674 KiB  
Article
Enhancing Meteor Observations with Photodiode Detectors
by Adam Popowicz, Jerzy Fiołka, Jacek Chęciński and Krzysztof Bernacki
Appl. Sci. 2025, 15(5), 2828; https://doi.org/10.3390/app15052828 (registering DOI) - 5 Mar 2025
Abstract
This article introduces an innovative meteor detection system that integrates high-speed photodiode detectors with traditional camera-based systems. The system employs four photodiodes to record changes in sky brightness at 100 Hz, enabling meteor detection and the observation of their dynamics. This technology serves as a valuable complement to existing imaging techniques, offering a cost-effective solution for measuring meteor ablation at frequencies beyond the capabilities of camera-based systems. We showcase findings from the Perseid meteor shower, demonstrating the potential of our system. Moreover, our system addresses the current limitations in meteor radiometry, where many existing instruments either remain in developmental stages or have not been validated with a substantial number of confirmed meteor events. Our approach successfully addresses these limitations, demonstrating effectiveness across multiple meteor events simultaneously recorded on video. Full article
(This article belongs to the Section Applied Physics General)
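The abstract focuses on the hardware (four photodiodes sampling sky brightness at 100 Hz) rather than on a specific detection algorithm, so the following is only a generic sketch of how transient brightness events could be flagged in such a signal; the baseline window and threshold are illustrative assumptions, not the authors' method.

```python
import numpy as np

def flag_transients(brightness, fs=100.0, k=5.0, win=201):
    """Flag samples that rise k robust sigmas above a sliding-median baseline of the 100 Hz signal."""
    x = np.asarray(brightness, dtype=float)
    pad = win // 2
    padded = np.pad(x, pad, mode="edge")
    baseline = np.array([np.median(padded[i:i + win]) for i in range(x.size)])
    resid = x - baseline
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust noise estimate (MAD)
    hits = np.flatnonzero(resid > k * max(sigma, 1e-12))
    return hits / fs  # candidate event times in seconds
```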
17 pages, 8074 KiB  
Article
Automated Segmentation of Breast Cancer Focal Lesions on Ultrasound Images
by Dmitry Pasynkov, Ivan Egoshin, Alexey Kolchev, Ivan Kliouchkin, Olga Pasynkova, Zahraa Saad, Anis Daou and Esam Mohamed Abuzenar
Sensors 2025, 25(5), 1593; https://doi.org/10.3390/s25051593 - 5 Mar 2025
Abstract
Ultrasound (US) remains the main modality for the differential diagnosis of changes revealed by mammography. However, US images themselves are subject to various types of noise and artifacts from reflections, which can worsen the quality of their analysis. Deep learning methods have a number of disadvantages, including the often insufficient substantiation of the model and the complexity of collecting a representative training database. Therefore, it is necessary to develop effective algorithms for the segmentation, classification, and analysis of US images. The aim of this work is to develop a method for the automated detection and segmentation of pathological lesions in breast US images. The proposed method includes two stages of video image processing: (1) searching for a region of interest using a random forest classifier that classifies normal tissues, and (2) selecting the contour of the lesion based on differences in the brightness of image pixels. The test set included 52 ultrasound videos containing histologically proven suspicious lesions. Lesions were detected in 91.89% of frames on average, and the average accuracy of contour selection according to the IoU metric was 0.871. The proposed method can be used to segment a suspicious lesion.
(This article belongs to the Section Sensing and Imaging)
Figure 1: Frames from two different ultrasound video sequences; the arrows indicate (a) histologically proven mucinous breast cancer and (b) histologically proven ductal breast cancer.
Figure 2: Block diagram of the proposed method for identifying the contour of a lesion in ultrasound video frames.
Figure 3: (a,c) Original ultrasound images; (b,d) ultrasound images with marked tissues (skin, fat, fibrous tissue, glandular tissue, and artifacts).
Figure 4: (a) Original ultrasound image; (b) result of tissue classification by the random forest classifier; (c) after applying the morphological dilation operation and median filter; (d) ground truth.
Figure 5: (a) Rays drawn from the center of lesion A; (b) brightness values of the pixels P_i lying on the rays; (c) graphs of the brightness gradients ΔP_i of the pixels lying on the rays.
Figure 6: (a) Points corresponding to the extrema of the brightness gradient on the rays; (b) the averaged cubic regression constructed for them (solid thick yellow line).
Figure 7: Results of tissue classification on frames from the ultrasound video sequences corresponding to Figure 1. (a) Histologically proven mucinous breast carcinoma; (b) histologically proven ductal breast carcinoma from a different video sequence of another patient; (c,d) ground truths.
Figure 8: The result of identifying unclassified objects in the frames of ultrasound video sequences corresponding to Figure 1. (a,b) The highlighted objects after additional shape filtering; (c,d) the rectangular areas circled around them, which form the ROI.
Figure 9: (a,b) Segmented contours of lesions using pixel intensity gradient detection along rays drawn from the center of gravity of the contour (yellow); (c,d) contours of the lesion outlined by the specialist physician (red). The images correspond to Figure 1.
Figure 10: Frame-by-frame processing of the video (left to right, top to bottom: frames 1, 10, 20, 30, 40, and 50). In frames 40 and 50, the ROI was not detected due to the absence of a suspicious lesion.
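The second stage described in the abstract (and illustrated by Figures 5 and 6) traces rays from the lesion center and looks for extrema of the brightness gradient along each ray. A simplified sketch of that idea is shown below; the paper additionally fits an averaged cubic regression to the extrema, which is omitted here, and the ray count and maximum radius are arbitrary assumptions.

```python
import numpy as np

def contour_by_ray_gradient(img, center, n_rays=72, max_r=120):
    """For each ray from the ROI center, keep the radius with the strongest brightness-gradient extremum."""
    cy, cx = center
    points = []
    rs = np.arange(1, max_r)
    for theta in np.linspace(0.0, 2.0 * np.pi, n_rays, endpoint=False):
        ys = np.clip(np.round(cy + rs * np.sin(theta)).astype(int), 0, img.shape[0] - 1)
        xs = np.clip(np.round(cx + rs * np.cos(theta)).astype(int), 0, img.shape[1] - 1)
        profile = img[ys, xs].astype(float)        # brightness values P_i along the ray
        grad = np.gradient(profile)                # brightness gradients delta P_i
        r_star = rs[int(np.argmax(np.abs(grad)))]  # strongest edge response on this ray
        points.append((cy + r_star * np.sin(theta), cx + r_star * np.cos(theta)))
    return np.array(points)  # candidate contour points around the lesion
```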
17 pages, 12823 KiB  
Article
Remote Sensing Small Object Detection Network Based on Multi-Scale Feature Extraction and Information Fusion
by Junsuo Qu, Tong Liu, Zongbing Tang, Yifei Duan, Heng Yao and Jiyuan Hu
Remote Sens. 2025, 17(5), 913; https://doi.org/10.3390/rs17050913 - 5 Mar 2025
Abstract
Nowadays, object detection algorithms are widely used in various scenarios, but some special scenarios place additional demands on small object detection. Because small objects offer fewer usable features, suffer from unbalanced samples, require higher positioning accuracy, and have fewer datasets, small object detection is more difficult than general object detection, and detection performance on small objects is often not ideal. This paper therefore takes YOLOXs as the benchmark network and enhances the feature information on small objects by improving the network's structure. Specifically, because neck networks based on an FPN and its variants are prone to information loss when fusing features from non-adjacent layers, this paper proposes a feature fusion and distribution module (FFDN) that replaces the deep-to-shallow information transmission path in the YOLOXs neck. The method first fuses the feature layers that the backbone network uses for prediction to obtain global feature information covering objects of multiple sizes. The global feature information is then distributed to each prediction branch so that high-level semantic and fine-grained information are integrated more efficiently, helping the model learn discriminative information on small objects and classify them correctly. The method was evaluated on the VisDrone2021 dataset, whose images have a standard 1080p resolution (1920 × 1080) and whose videos are typically recorded at 30 frames per second (fps); this high spatial and temporal resolution makes the dataset suitable for detecting objects of various sizes and for dynamic object detection tasks. When the module was integrated into a YOLOXs network that also incorporates three improvements to the feature layers, channel numbers, and maximum pooling (the FE-YOLO network), the mAP and APs increased by 1.0% and 0.8%, respectively. Compared with YOLOv5m, YOLOv7-Tiny, FCOS, and other advanced models, it obtains the best performance.
Figure 1: Two common feature fusion diagrams.
Figure 2: Convolution and deconvolution diagram.
Figure 3: PANet information fusion diagram.
Figure 4: Schematic diagram of the improved neck network.
Figure 5: Schematic diagram of the FFDN module.
Figure 6: mAP0.5 during the training of FE-YOLO and FFDN-YOLO.
Figure 7: Comparison between the FE-YOLO and FFDN-YOLO models.
Figure 8: Comparison of model detection and manual labeling.
Figure 9: Comparison of detection results of different models.
Figure 10: Variation of the loss value during model training.
Figure 11: Dataset test diagram.
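The abstract describes the fusion-and-distribution idea at the data-flow level: fuse the backbone's prediction feature layers into one global representation, then distribute it back to every prediction branch. The sketch below only illustrates that fuse-then-redistribute flow with plain nearest-neighbour resizing and channel concatenation; the module's actual convolutions, channel counts, and pooling changes are not specified in this listing and are therefore not modeled.

```python
import numpy as np

def resize_nn(fmap, h, w):
    """Nearest-neighbour resize of an (H, W, C) feature map, with no external dependencies."""
    ys = np.arange(h) * fmap.shape[0] // h
    xs = np.arange(w) * fmap.shape[1] // w
    return fmap[ys][:, xs]

def fuse_and_distribute(feature_maps):
    """Fuse multi-scale maps into one global feature, then hand a copy back to each branch at its own scale."""
    target_h = max(f.shape[0] for f in feature_maps)
    target_w = max(f.shape[1] for f in feature_maps)
    fused = np.concatenate([resize_nn(f, target_h, target_w) for f in feature_maps], axis=-1)
    return [resize_nn(fused, f.shape[0], f.shape[1]) for f in feature_maps]
```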
23 pages, 7793 KiB  
Article
A New, Robust, Adaptive, Versatile, and Scalable Abandoned Object Detection Approach Based on DeepSORT Dynamic Prompts, and Customized LLM for Smart Video Surveillance
by Merve Yilmazer and Mehmet Karakose
Appl. Sci. 2025, 15(5), 2774; https://doi.org/10.3390/app15052774 - 4 Mar 2025
Abstract
Video cameras are one of the important elements in ensuring security in public areas. Videos inspected by expert personnel using traditional methods may have a high error rate and take a long time to process. In this study, a new deep learning-based method is proposed for the detection of abandoned objects, such as bags and suitcases left unattended in public areas. Transfer learning-based keyframe detection was first performed to remove unnecessary and repetitive frames from the ABODA dataset. Then, human and object classes were detected using the weights of the YOLOv8l model, which offers fast and effective object detection. Abandoned object detection is achieved by tracking these classes across consecutive frames with the DeepSORT algorithm and measuring the distance between them. In addition, the location information of the human and object classes in the frames was analyzed by a large language model supported by prompt engineering, producing an explanatory output on the location, size, and estimation rate of the object and human classes for the authorities. The proposed model produces promising results comparable to state-of-the-art methods for suspicious object detection in videos, with success metrics of 97.9% precision, 97.0% recall, and a 97.4% F1-score.
Figure 1: Abandoned object detection based on background subtraction [22].
Figure 2: Block diagram of the proposed approach.
Figure 3: Block diagram of keyframe detection based on ResNet101v2.
Figure 4: Block diagram of YOLOv8-based person and object detection.
Figure 5: Abandoned object detection flowchart.
Figure 6: Performance comparison of YOLOv8 deep neural network sub-architectures.
Figure 7: Mean average precision change curve and precision/recall curve.
Figure 8: Person and object detection model input and output.
Figure 9: YOLOv8-based person and object detection model outputs.
Figure 10: Object and human detection image and sample txt file.
Figure 11: Prompt and sample output generated for the LLM.
Figure 12: Comparison of abandoned object detection TP, FP, and FN values.
Figure 13: Ablation experiment results.
Figure 14: Proposed method outputs.
Figure 15: Proposed method precision comparison.
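The abstract states that abandonment is decided by tracking person and object classes across consecutive frames with DeepSORT and measuring the distance between them. The exact distance rule and time window are not given, so the following sketch uses illustrative thresholds purely to show the shape of such a check once per-frame track centers are available.

```python
import math

def is_abandoned(object_centers, person_centers_by_frame, min_dist_px=200, frames_required=150):
    """Flag an object whose nearest tracked person stays beyond min_dist_px for frames_required consecutive frames."""
    consecutive = 0
    for frame_idx, (ox, oy) in enumerate(object_centers):
        persons = person_centers_by_frame.get(frame_idx, [])
        nearest = min((math.hypot(px - ox, py - oy) for px, py in persons), default=float("inf"))
        consecutive = consecutive + 1 if nearest > min_dist_px else 0
        if consecutive >= frames_required:
            return True
    return False
```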
16 pages, 37656 KiB  
Article
Smoke and Fire-You Only Look Once: A Lightweight Deep Learning Model for Video Smoke and Flame Detection in Natural Scenes
by Chenmeng Zhao, Like Zhao, Ka Zhang, Yinghua Ren, Hui Chen and Yehua Sheng
Fire 2025, 8(3), 104; https://doi.org/10.3390/fire8030104 - 4 Mar 2025
Abstract
Owing to the demand for smoke and flame detection in natural scenes, this paper proposes a lightweight deep learning model, SF-YOLO (Smoke and Fire-YOLO), for video smoke and flame detection in such environments. Firstly, YOLOv11 is employed as the backbone network, combined with a C3k2 module based on a two-path residual attention mechanism and a detection head with an embedded attention mechanism. This combination enhances the response of unobscured regions to compensate for feature loss in occluded regions, thereby addressing the occlusion problem in dynamic backgrounds. Then, a two-channel loss function (W-SIoU) based on dynamic tuning and intelligent focusing is designed to enhance loss computation in boundary regions, improving the YOLOv11 model's ability to recognize targets with ambiguous boundaries. Finally, the proposed algorithms are experimentally validated on the self-generated dataset S-Firedata and the public smoke and flame virtual dataset M4SFWD, which are built from frames extracted from internet smoke and flame videos and from open-source smoke and flame dataset images, respectively. The experimental results demonstrate that, compared with deep learning models such as YOLOv8, Gold-YOLO, and Faster-RCNN, the proposed SF-YOLO model is more lightweight and exhibits higher detection accuracy and robustness. The metrics mAP50 and mAP50-95 improve by 2.5% and 2.4%, respectively, on the self-generated dataset S-Firedata, and by 0.7% and 1.4%, respectively, on the public dataset M4SFWD. The research presented in this paper provides practical methods for the automatic detection of smoke and flame in natural scenes, which can further enhance the effectiveness of fire monitoring systems.
Figure 1: Overall technical flowchart of the algorithm in this paper.
Figure 2: The two modules of C3k2_DWR.
Figure 3: SEAMHead's fully connected network architecture.
Figure 4: W-SIoU schematic.
Figure 5: Smoke and flame detection by different deep neural network models on a remote sensing fire target; only YOLOv11 and SF-YOLO detected the target. Panels (a-h): the original image and the detection results of Centernet, Faster-RCNN, Gold-YOLO, YOLOv7, YOLOv8, YOLOv11, and the proposed algorithm.
Figure 6: Detection in a multi-target scene; Centernet detects only a portion of the targets, and only SF-YOLO detects the small targets in the image. Panels (a-h) as in Figure 5.
Figure 7: Detection in a scene containing small targets; only SF-YOLO successfully recognizes all targets in the image. Panels (a-h) as in Figure 5.
Figure 8: Detection of targets in a dark environment; Centernet and Faster-RCNN miss detections, and all other models have lower detection accuracy than SF-YOLO. Panels (a-h) as in Figure 5.
Figure 9: Detection of occluded targets; all models except YOLOv11 and SF-YOLO detect only the flame targets that are not obscured by foliage. Panels (a-h) as in Figure 5.
Figure 10: Detection of fire-like targets; only SF-YOLO and Faster-RCNN succeed in identifying the confusing fire-like target. Panels (a-h) as in Figure 5.
Figure 11: Detection effectiveness of the SF-YOLO algorithm on the Los Angeles hill fire. The algorithm accurately detects and identifies the scattered small fires, with a confidence level of about 40% for the tiny targets.
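The W-SIoU loss itself is not specified in this listing, but like other IoU-based box losses it starts from the plain intersection-over-union between a predicted and a ground-truth box. The helper below shows only that baseline quantity; the dynamic weighting and boundary-focused terms the abstract mentions are left out and would need the paper's definitions.

```python
def box_iou(a, b):
    """Plain IoU between two (x1, y1, x2, y2) boxes; W-SIoU-style losses add weighting on top of terms like this."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)
```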
16 pages, 2735 KiB  
Article
AI-Driven Framework for Enhanced and Automated Behavioral Analysis in Morris Water Maze Studies
by István Lakatos, Gergő Bogacsovics, Attila Tiba, Dániel Priksz, Béla Juhász, Rita Erdélyi, Zsuzsa Berényi, Ildikó Bácskay, Dóra Ujvárosy and Balázs Harangi
Sensors 2025, 25(5), 1564; https://doi.org/10.3390/s25051564 - 4 Mar 2025
Viewed by 92
Abstract
The Morris Water Maze (MWM) is a widely used behavioral test to assess the spatial learning and memory of animals, and it is particularly valuable in studying neurodegenerative disorders such as Alzheimer's disease. Traditional methods for analyzing MWM experiments often face limitations in capturing the complexity of animal behaviors. In this study, we present a novel AI-based automated framework to process and evaluate MWM test videos, aiming to enhance behavioral analysis through machine learning. Our pipeline involves video preprocessing, animal detection using convolutional neural networks (CNNs), trajectory tracking, and postprocessing to derive detailed behavioral features. We propose a concentric-circle segmentation of the pool in addition to the quadrant-based division, and we extract 32 behavioral metrics for each zone, which are employed in classification tasks to differentiate between younger and older animals. Several machine learning classifiers, including random forest and neural networks, are evaluated, with feature selection techniques applied to improve the classification accuracy. Our results demonstrate a significant improvement in classification performance, particularly through the integration of feature sets derived from the concentric zone analysis. This automated approach offers a robust solution for MWM data processing, providing enhanced precision and reliability, which is critical for the study of neurodegenerative disorders.
(This article belongs to the Section Biomedical Sensors)
Figure 1: Flow chart of the MWM evaluation process.
Figure 2: The experiment room with the pool setup.
Figure 3: A frame of (a) the unprocessed raw video, with the preprocessing steps: (b) cropping, (c) masking, and (d) rotating of the image.
Figure 4: The process of detecting the animal in the image of the pool: (a) extracted ROI, (b) Gaussian-blurred image, (c) precalculated comparison background, (d) differing image parts, and (e) successfully detected animal.
Figure 5: Dispersion of the normalized properties of the input bounding boxes from the training set: (a) center point positions and (b) dimensions.
Figure 6: Path states (highlighted in red) during postprocessing: (a) unprocessed detected path from YOLOv5, (b) low-entropy discard at the start of the video, (c) filtering based on the 95th percentile, (d) interpolation where the segments are filtered out, and (e) the final postprocessed corrected path.
Figure 7: The different area-based segmentations: (a) the four quadrants and (b) the concentric circles.
Figure 8: Mean and standard error of the mean (SEM) values for (a) escape latency, (b) average speed, and (c) distance traveled.
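The concentric-circle segmentation described in the abstract (and shown in Figure 7) amounts to assigning every tracked position both a quadrant and a ring index relative to the pool center before per-zone metrics are computed. A small sketch of that assignment, with the number of rings as an assumption, could be:

```python
import numpy as np

def zone_occupancy(track_xy, center, pool_radius, n_rings=3):
    """Count how many tracked samples fall in each quadrant and each concentric ring of the pool."""
    xy = np.asarray(track_xy, dtype=float) - np.asarray(center, dtype=float)
    r = np.hypot(xy[:, 0], xy[:, 1])
    quadrant = (xy[:, 0] >= 0).astype(int) * 2 + (xy[:, 1] >= 0).astype(int)   # quadrants 0..3
    ring = np.minimum((r / pool_radius * n_rings).astype(int), n_rings - 1)    # rings 0..n_rings-1
    return np.bincount(quadrant, minlength=4), np.bincount(ring, minlength=n_rings)
```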
7 pages, 7488 KiB  
Proceeding Paper
Enhancing Fabric Detection and Classification Using YOLOv5 Models
by Makara Mao, Jun Ma, Ahyoung Lee and Min Hong
Eng. Proc. 2025, 89(1), 33; https://doi.org/10.3390/engproc2025089033 - 3 Mar 2025
Viewed by 48
Abstract
The YOLO series is widely recognized for its efficiency in the real-time detection of objects within images and videos. Accurately identifying and classifying fabric types in the textile industry is vital to ensuring quality, managing supply, and increasing customer satisfaction. We developed a method for fabric type classification and object detection using the YOLOv5 architecture. The model was trained on a diverse dataset containing images of different fabrics, including cotton, hanbok, dyed cotton yarn, and a plain cotton blend. We conducted a dataset preparation process, including data collection, annotation, and training procedures for data augmentation to improve model generalization. The model’s performance was evaluated using precision, recall, and F1-score. The developed model detected and classified fabrics with an accuracy of 81.08%. YOLOv5s allowed a faster performance than other models. The model can be used for automated quality control, inventory tracking, and retail analytics. The deep learning-based object detection method with YOLOv5 addresses challenges related to fabric classification, improving the abilities and productivity of manufacturing and operations. Full article
Figure 1: Implementation process for the YOLOv5 model.
Figure 2: Architecture and processes of YOLOv5.
Figure 3: Sample images of each category of fabric.
Figure 4: YOLOv5 labeling.
Figure 5: Results from testing the types of fabrics.
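The evaluation in this entry relies on precision, recall, and F1-score; for reference only, these follow directly from per-class true positive, false positive, and false negative counts, as in the small illustrative helper below.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```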
18 pages, 3451 KiB  
Article
Integrating Neural Networks for Automated Video Analysis of Traffic Flow Routing and Composition at Intersections
by Maros Jakubec, Michal Cingel, Eva Lieskovská and Marek Drliciak
Sustainability 2025, 17(5), 2150; https://doi.org/10.3390/su17052150 - 2 Mar 2025
Viewed by 272
Abstract
Traffic flow at intersections is influenced by spatial design, control methods, technical equipment, and traffic volume. This article focuses on detecting traffic flows at intersections using video recordings, employing a YOLO-based framework for automated analysis. We compare manual evaluation with machine processing to demonstrate the efficiency improvements in traffic engineering tasks through automated traffic data analysis. The output data include traditionally immeasurable parameters, such as speed and vehicle gaps within the observed intersection area. The traffic analysis incorporates findings from monitoring groups of vehicles, focusing on their formation and speed as they traverse the intersection. Our proposed system for monitoring and classifying traffic flow was implemented at a selected intersection in the city of Zilina, Slovak Republic, as part of a pilot study for this research initiative. Based on evaluations using local data, the YOLOv9c detection model achieved an mAP50 of 98.2% for vehicle localization and classification across three basic classes: passenger cars, trucks, and buses. Despite the high detection accuracy of the model, the automated annotations for vehicle entry and exit at the intersection showed varying levels of accuracy compared to manual evaluation. On average, the mean absolute error between annotations by traffic specialists and the automated framework for the most frequent class, passenger cars, was 2.73 across all directions at 15 min intervals. This indicates that approximately three passenger cars per 15 min interval were either undetected or misclassified. Full article
Figure 1: The structure of the YOLOv9 network.
Figure 2: Map of the intersection selected for the pilot analysis and a traffic diagram of the intersection, where (A-C) indicate the directions from which vehicles approach the intersection.
Figure 3: Video recording setup at the intersection: (a) camera on a tripod placed indoors; (b) close-up of the camera setup; (c) camera display showing the live view of the monitored intersection.
Figure 4: Proposed framework for evaluating traffic flow routing.
Figure 5: Comparison of vehicle types in MS COCO [31] and those commonly found on Slovak roads.
Figure 6: Examples of vehicle detection performed using YOLOv9c.
Figure 7: Example of evaluating the formation of vehicle groups on the Pod Hájom street profile.
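The reported error of 2.73 passenger cars per 15 min interval is a mean absolute error between the specialists' counts and the automated counts. A sketch of that comparison over binned interval counts (variable names are illustrative) is:

```python
import numpy as np

def interval_mae(manual_counts, auto_counts):
    """Mean absolute error between manual and automated per-interval vehicle counts (e.g., 15 min bins)."""
    manual = np.asarray(manual_counts, dtype=float)
    auto = np.asarray(auto_counts, dtype=float)
    return float(np.mean(np.abs(manual - auto)))

# An MAE of about 2.73 means roughly three passenger cars per interval are undetected or misclassified.
```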
14 pages, 19850 KiB  
Article
Intelligent Deep Learning and Keypoint Tracking-Based Detection of Lameness in Dairy Cows
by Zongwei Jia, Yingjie Zhao, Xuanyu Mu, Dongjie Liu, Zhen Wang, Jiangtan Yao and Xuhui Yang
Vet. Sci. 2025, 12(3), 218; https://doi.org/10.3390/vetsci12030218 - 2 Mar 2025
Viewed by 216
Abstract
With the ongoing development of computer vision technologies, the automation of lameness detection in dairy cows urgently requires improvement. To address the challenges of detection difficulties and technological limitations, this paper proposes an automated scoring method for cow lameness that integrates deep learning with keypoint tracking. First, the DeepLabCut tool is used to efficiently extract keypoint features during the walking process of dairy cows, enabling automated monitoring and output of positional information. Then, the extracted positional data are combined with temporal data to construct a scoring model for cow lameness. The experimental results demonstrate that the proposed method accurately tracks the keypoints of cow movement in visible-light videos and satisfies the requirements for real-time detection. The model classifies the walking states of the cows into four levels, i.e., normal, mild, moderate, and severe lameness (corresponding to scores of 0, 1, 2, and 3, respectively). The detection results obtained in real-world environments exhibit high extraction accuracy of the keypoint positional information, with an average error of only 4.679 pixels and an overall accuracy of 90.21%. The detection accuracy was 89.0% for normal cows, 85.3% for mild lameness, 92.6% for moderate lameness, and 100.0% for severe lameness. These results demonstrate that applying keypoint detection technology to the automated scoring of lameness provides an effective solution for intelligent dairy management.
Figure 1: A schematic diagram of the limp video capture location.
Figure 2: The number of cows with different degrees of lameness (a) and the number of videos (b).
Figure 3: A diagram illustrating the positions of the four key feature points.
Figure 4: The workflow diagram of DeepLabCut.
Figure 5: The structure of multiscale fusion.
Figure 6: Loss curves.
Figure 7: The tracking performances of the key points.
Figure 8: The trajectory plot of key points in space.
Figure 9: The characteristic triangle of the back and the characteristic angle.
Figure 10: The feature angle change plot.
Figure 11: A diagram of angular α and β variation for different degrees of claudication.
Figure 12: The model test scoring results.
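The lameness score is built from keypoint positions over time; Figures 9-11 suggest that angles of a characteristic back triangle are among the extracted features. A generic helper for the angle at one keypoint, offered only as an illustration of that kind of feature, is:

```python
import numpy as np

def keypoint_angle(a, b, c):
    """Angle (in degrees) at vertex b of the triangle formed by three tracked keypoints a, b, c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Tracking this angle frame by frame yields "feature angle change" curves of the kind used to grade lameness.
```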
25 pages, 11063 KiB  
Article
Evaluating the Accuracy of Smartphone-Based Photogrammetry and Videogrammetry in Facial Asymmetry Measurement
by Luiz Carlos Teixeira Coelho, Matheus Ferreira Coelho Pinho, Flávia Martinez de Carvalho, Ana Luiza Meneguci Moreira Franco, Omar C. Quispe-Enriquez, Francisco Airasca Altónaga and José Luis Lerma
Symmetry 2025, 17(3), 376; https://doi.org/10.3390/sym17030376 - 1 Mar 2025
Viewed by 196
Abstract
Facial asymmetry presents a significant challenge for health practitioners, including physicians, dentists, and physical therapists. Manual measurements often lack the precision needed for accurate assessments, highlighting the appeal of imaging technologies like structured light scanners and photogrammetric systems. However, high-end commercial systems remain cost prohibitive, especially for public health services in developing countries. This study aims to evaluate cell-phone-based photogrammetric methods for generating 3D facial models to detect facial asymmetries. For this purpose, 15 patients had their faces scanned with the ACADEMIA 50 3D scanner, as well as with cell phone images and videos using photogrammetry and videogrammetry, resulting in 3D facial models. Each 3D model (coming from a 3D scanner, photogrammetry, and videogrammetry) was half-mirrored to analyze dissimilarities between the two ideally symmetric face sides using Hausdorff distances between the two half-meshes. These distances were statistically analyzed through various measures and hypothesis tests. The results indicate that, in most cases, both photogrammetric and videogrammetric approaches are as reliable as 3D scanning for detecting facial asymmetries. The benefits and limitations of using images, videos, and 3D scanning are also presented. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry in Computer Vision and Graphics)
Figure 1: Workflow for the experiment, comprising data collection, data treatment, and statistical analysis.
Figure 2: Academia 50 3D scanner in use.
Figure 3: A screenshot of Agisoft Metashape showing image orientation for one of the patients who volunteered for this project.
Figure 4: Diagram showing the landmarks on which stickers were placed.
Figure 5: Patient being scanned with the 3D scanner (left) and the cell phone for images and video (right).
Figure 6: Diagram showing the cell phone path for capturing images and video.
Figure 7: Close-up of Patient 8's mouth. Involuntary movements lead to poor-quality meshes, especially for the ACADEMIA 50 3D scanner (c); this is less apparent in the photogrammetry (a) and videogrammetry (b) 3D models.
Figure 8: Three-dimensional models for Patient 4 side by side: (a) 3D scanner, (b) photogrammetry, (c) videogrammetry.
Figure 9: Facial landmarks defining the plane used to cut the face models in two halves (shown in the diagram in green and red). The model is cut along the plane defined by those marks; its left part is then mirrored, and the asymmetries between both surfaces are calculated.
Figure 10: Hausdorff distances between the two halves (for the photogrammetry models of Patient 15) represented as a heatmap, with higher distances in blue and lower distances in red (units in mm).
Figure 11: Spatial distribution of Hausdorff distances (in mm) between the two halves of Patient 13's face, according to the models derived from (a) photogrammetry, (b) videogrammetry, and (c) the 3D scanner. Areas in blue are more asymmetrical, whereas areas in red are much more symmetrical.
Figure 12: Frequency histograms of Hausdorff distances for the two halves of each patient. For each histogram, the mean is shown as a dotted black line, the standard deviations as dotted red lines, and the median as a dotted green line.
Figure 13: Data extracted from Table 3, shown as a box plot.
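Asymmetry in this entry is quantified as the Hausdorff distance between one half of the face mesh and the mirrored other half. A simplified point-cloud sketch using SciPy is shown below; it assumes the model has already been aligned so that the cutting plane is x = 0, whereas the paper derives the plane from facial landmarks.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def mirror_asymmetry(points, mirror_axis=0):
    """Symmetric Hausdorff distance (same units as the points, e.g., mm) between a point cloud and its mirror image."""
    points = np.asarray(points, dtype=float)
    mirrored = points.copy()
    mirrored[:, mirror_axis] *= -1.0
    d_ab = directed_hausdorff(points, mirrored)[0]
    d_ba = directed_hausdorff(mirrored, points)[0]
    return max(d_ab, d_ba)
```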
9 pages, 1106 KiB  
Article
Automatic Movement Recognition for Evaluating the Gross Motor Development of Infants
by Yin-Zhang Yang, Jia-An Tsai, Ya-Lan Yu, Mary Hsin-Ju Ko, Hung-Yi Chiou, Tun-Wen Pai and Hui-Ju Chen
Children 2025, 12(3), 310; https://doi.org/10.3390/children12030310 - 28 Feb 2025
Viewed by 136
Abstract
Objective: The objective of this study was the early detection of gross motor abnormalities through video-based analysis in Taiwanese infants aged 2–6 months. Background: The current diagnosis of infant developmental delays primarily relies on clinical examinations. However, during clinical visits, infants may show atypical behaviors due to unfamiliar environments, which might not reflect their true developmental status. Methods: This study utilized videos of infants recorded in their home environments. Two pediatric neurologists manually annotated these clips, assessing each infant's gross motor movements to identify characteristics of gross motor delay. Using transfer learning techniques, four pose recognition models, including ViTPose, HRNet, DARK, and UDP, were applied to the infant gross motor dataset. Four machine learning classification models, including random forest, support vector machine, logistic regression, and XGBoost, were used to predict the developmental status of infants. Results: The experimental results of pose estimation and tracking indicate that the ViTPose model provided the best performance for pose recognition. A total of 227 features related to kinematics, motions, and postures were extracted and calculated. A one-way ANOVA analysis revealed 106 significant features that were retained for constructing prediction models. The results show that a random forest model achieved the best performance, with an average F1-score of 0.94, a weighted average AUC of 0.98, and an average accuracy of 94%.
(This article belongs to the Section Pediatric Neurology & Neurodevelopmental Disorders)
Figure 1: System architecture.
Figure 2: Example of pose estimation annotation. Red bounding box: encompasses the infant's body; colored lines: movement lines of the 17 body joints.
Figure 3: Eight joint angle features of the shoulders (a,b), elbows (c,d), hips (e,f), and knees (g,h).
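The abstract reports that a one-way ANOVA reduced 227 pose-derived features to 106 significant ones before classification. A compact sketch of that filtering step with SciPy (the significance level is an assumption) might be:

```python
import numpy as np
from scipy.stats import f_oneway

def significant_feature_indices(features, labels, alpha=0.05):
    """Keep feature columns whose one-way ANOVA p-value across the label groups falls below alpha."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    keep = []
    for j in range(X.shape[1]):
        groups = [X[y == g, j] for g in np.unique(y)]
        _, p_value = f_oneway(*groups)
        if p_value < alpha:
            keep.append(j)
    return keep
```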
19 pages, 10608 KiB  
Article
Urban Waterlogging Monitoring and Recognition in Low-Light Scenarios Using Surveillance Videos and Deep Learning
by Jian Zhao, Xing Wang, Cuiyan Zhang, Jing Hu, Jiaquan Wan, Lu Cheng, Shuaiyi Shi and Xinyu Zhu
Water 2025, 17(5), 707; https://doi.org/10.3390/w17050707 - 28 Feb 2025
Viewed by 205
Abstract
With the intensification of global climate change, extreme precipitation events are occurring more frequently, making the monitoring and management of urban flooding a critical global issue. Urban surveillance camera sensor networks, characterized by their large-scale deployment, rapid data transmission, and low cost, have emerged as a key complement to traditional remote sensing techniques. These networks offer new opportunities for high-spatiotemporal-resolution urban flood monitoring, enabling real-time, localized observations that satellite and aerial systems may not capture. However, in low-light environments—such as during nighttime or heavy rainfall—the image features of flooded areas become more complex and variable, posing significant challenges for accurate flood detection and timely warnings. To address these challenges, this study develops an imaging model tailored to flooded areas under low-light conditions and proposes an invariant feature extraction model for flooding areas within surveillance videos. By using extracted image features (i.e., brightness and invariant features of flooded areas) as inputs, a deep learning-based flood segmentation model is built on the U-Net architecture. A new low-light surveillance flood image dataset, named UWs, is constructed for training and testing the model. The experimental results demonstrate the efficacy of the proposed method, achieving an mRecall of 0.88, an mF1_score of 0.91, and an mIoU score of 0.85. These results significantly outperform the comparison algorithms, including LRASPP, DeepLabv3+ with MobileNet and ResNet backbones, and the classic DeepLabv3+, with improvements of 4.9%, 3.0%, and 4.4% in mRecall, mF1_score, and mIoU, respectively, compared to Res-UNet. Additionally, the method maintains its strong performance in real-world tests, and it is also effective for daytime flood monitoring, showcasing its robustness for all-weather applications. The findings of this study provide solid support for the development of an all-weather urban surveillance camera flood monitoring network, with significant practical value for enhancing urban emergency management and disaster reduction efforts. Full article
(This article belongs to the Section Urban Water Management)
Figure 1: Infrared images from the surveillance camera. (a-d) show the changes in the water features before and after a car drives by; (e-h) show the changes in water features at different surveillance camera positions and attitudes. The flooding area is labeled in red.
Figure 2: False color images from the surveillance camera. (a-d) show the changes in the water features before and after a car drives by; (e-h) show the variation in water features at different surveillance camera positions and poses. The flooding area is labeled in red.
Figure 3: Architecture of Aunet. The Dis_LA module separates the invariant features of the flooding area in the low-light scene, and the U-Net [43] network segments the flooding region of the low-light scene. The white areas represent flood.
Figure 4: Composition of the Dis_LA module, where 7 × 7 @128 denotes 128 sets of convolutional kernels of size 7 × 7. [N, H, W, C] denotes the input dimension of the module, where N is the batch size, H the height, W the width, and C the number of channels.
Figure 5: The architecture of the Swin Transformer block [47].
Figure 6: Some surveillance scenarios within the constructed dataset.
Figure 7: Flood segmentation by each model in the black-and-white image. The red areas represent floods. The green-framed areas are of particular interest.
Figure 8: Segmentation of flooding areas by each model in the false color image. The red areas represent floods. The green-, yellow-, and blue-framed areas are of particular interest.
Figure 9: Effect of Aunet and the comparison models on negative samples (water-free regions at night). The red areas represent floods.
Figure 10: Flooding area segmentation by Aunet and the comparison models in daytime flood images. The red areas represent floods.
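The reported mRecall, mF1_score, and mIoU are averages of per-image recall, F1, and IoU computed on binary flood masks. A minimal per-image version of those metrics, assuming boolean prediction and ground-truth masks, is sketched below.

```python
import numpy as np

def flood_mask_metrics(pred_mask, gt_mask):
    """Recall, F1, and IoU for one binary flood mask; averaging over images gives mRecall, mF1_score, and mIoU."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    recall = tp / (tp + fn + 1e-9)
    precision = tp / (tp + fp + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    iou = tp / (tp + fp + fn + 1e-9)
    return recall, f1, iou
```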
21 pages, 325 KiB  
Article
“VID-KIDS” Video-Feedback Interaction Guidance for Depressed Mothers and Their Infants: Results of a Randomized Controlled Trial
by Panagiota D. Tryphonopoulos, Deborah McNeil, Monica Oxford, Cindy-Lee Dennis, Jason Novick, Andrea J. Deane, Kelly Wu, Stefan Kurbatfinski, Keira Griggs and Nicole Letourneau
Behav. Sci. 2025, 15(3), 279; https://doi.org/10.3390/bs15030279 - 27 Feb 2025
Viewed by 242
Abstract
VID-KIDS (Video-Feedback Interaction Guidance for Depressed Mothers and their Infants) is a positive parenting programme comprising three brief nurse-guided video-feedback sessions (offered in-person or virtually via Zoom) that promote “serve and return” interactions by helping depressed mothers to be more sensitive and responsive to infant cues. We examined whether mothers who received the VID-KIDS programme demonstrated improved maternal–infant interaction quality. The secondary hypotheses examined VID-KIDS’ effects on maternal depression, anxiety, perceived parenting stress, infant developmental outcomes, and infant cortisol patterns. A parallel group randomized controlled trial (n = 140) compared the VID-KIDS programme to standard care controls (e.g., a resource and referral programme). The trial was registered in the US Clinical Trials Registry (number NCT03052374). Outcomes were assessed at baseline, nine weeks post-randomization (immediate post-test), and two months post-intervention. Maternal–infant interaction quality significantly improved for the intervention group with moderate to large effects. These improvements persisted during the post-test two months after the final video-feedback session. No significant group differences were detected for secondary outcomes. This study demonstrated that nurse-guided video-feedback can improve maternal–infant interaction in the context of PPD. These findings are promising, as sensitive and responsive parenting is crucial for promoting children’s healthy development. Full article
18 pages, 4436 KiB  
Article
QRNet: A Quaternion-Based Retinex Framework for Enhanced Wireless Capsule Endoscopy Image Quality
by Vladimir Frants and Sos Agaian
Bioengineering 2025, 12(3), 239; https://doi.org/10.3390/bioengineering12030239 - 26 Feb 2025
Viewed by 149
Abstract
Wireless capsule endoscopy (WCE) offers a non-invasive diagnostic alternative for the gastrointestinal tract using a battery-powered capsule. Despite advantages, WCE encounters issues with video quality and diagnostic accuracy, often resulting in missing rates of 1–20%. These challenges stem from weak texture characteristics due to non-Lambertian tissue reflections, uneven illumination, and the necessity of color fidelity. Traditional Retinex-based methods used for image enhancement are suboptimal for endoscopy, as they frequently compromise anatomical detail while distorting color. To address these limitations, we introduce QRNet, a novel quaternion-based Retinex framework. QRNet performs image decomposition into reflectance and illumination components within hypercomplex space, maintaining inter-channel relationships that preserve color fidelity. A quaternion wavelet attention mechanism refines essential features while suppressing noise, balancing enhancement and fidelity through an innovative loss function. Experiments on Kvasir-Capsule and Red Lesion Endoscopy datasets demonstrate notable improvements in metrics such as PSNR (+2.3 dB), SSIM (+0.089), and LPIPS (−0.126). Moreover, lesion segmentation accuracy increases by up to 5%, indicating the framework’s potential for improving early-stage lesion detection. Ablation studies highlight the quaternion representation’s pivotal role in maintaining color consistency, confirming the promise of this advanced approach for clinical settings. Full article
Figure 1: An overview of the proposed QRNet pipeline for low-light WCE enhancement. The input image is decomposed into two feature maps, Q1 (reflectance) and Q2 (illumination), each processed by a separate SSM Mamba-based encoder-decoder branch. Wavelet-transformed sub-bands guide an attention mechanism that refines both reflectance and illumination features. Finally, the enhanced quaternions are combined via the Hamilton product to generate the output image with improved contrast and preserved color fidelity.
Figure 2: Visual comparison on the Kvasir-Capsule dataset. GT is the ground truth. Our method preserves natural color balance and detail better than competing approaches.
Figure 3: Visual comparison on the RLE dataset. Ground truth (top left) vs. various enhancement methods. Our framework balances brightness and color fidelity, aiding lesion visibility.
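QRNet recombines the enhanced reflectance and illumination quaternions with the Hamilton product (Figure 1). The product itself is standard quaternion algebra, shown below for a single (w, x, y, z) pair; how QRNet applies it per pixel is not detailed in this listing.

```python
import numpy as np

def hamilton_product(q1, q2):
    """Hamilton product of two quaternions given as (w, x, y, z) sequences."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])
```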