Search Results (1,051)

Search Parameters:
Keywords = bounding box

32 pages, 19029 KiB  
Article
Towards Context-Rich Automated Biodiversity Assessments: Deriving AI-Powered Insights from Camera Trap Data
by Paul Fergus, Carl Chalmers, Naomi Matthews, Stuart Nixon, André Burger, Oliver Hartley, Chris Sutherland, Xavier Lambin, Steven Longmore and Serge Wich
Sensors 2024, 24(24), 8122; https://doi.org/10.3390/s24248122 - 19 Dec 2024
Abstract
Camera traps offer enormous new opportunities in ecological studies, but current automated image analysis methods often lack the contextual richness needed to support impactful conservation outcomes. Integrating vision–language models into these workflows could address this gap by providing enhanced contextual understanding and enabling advanced queries across temporal and spatial dimensions. Here, we present an integrated approach that combines deep learning-based vision and language models to improve ecological reporting using data from camera traps. We introduce a two-stage system: YOLOv10-X to localise and classify species (mammals and birds) within images and a Phi-3.5-vision-instruct model to read YOLOv10-X bounding box labels to identify species, overcoming its limitation with hard-to-classify objects in images. Additionally, Phi-3.5 detects broader variables, such as vegetation type and time of day, providing rich ecological and environmental context to YOLO’s species detection output. When combined, this output is processed by the model’s natural language system to answer complex queries, and retrieval-augmented generation (RAG) is employed to enrich responses with external information, like species weight and IUCN status (information that cannot be obtained through direct visual analysis). Combined, this information is used to automatically generate structured reports, providing biodiversity stakeholders with deeper insights into, for example, species abundance, distribution, animal behaviour, and habitat selection. Our approach delivers contextually rich narratives that aid in wildlife management decisions. By providing contextually rich insights, our approach not only reduces manual effort but also supports timely decision making in conservation, potentially shifting efforts from reactive to proactive. Full article
(This article belongs to the Section Electronic Sensors)
Show Figures
Figure 1: Flow chart illustrating an overview of the workflow for the YOLOv10-X and Phi-3.5-vision-instruct model integration for context-rich camera trap data processing.
Figure 2: Class distribution for the Sub-Saharan Africa dataset used to train the YOLOv10-X model to localise and detect mammals, birds, people, and cars.
Figure 3: Overview of the YOLOv10 architecture.
Figure 4: Image from Limpopo Province in South Africa showing the detection of a zebra at night using a camera trap.
Figure 5: Image from Limpopo Province in South Africa showing the detection of multiple blue wildebeest and zebras using a camera trap.
Figure 6: Precision–recall (PR) curve for the YOLOv10-X model trained on 29 Sub-Saharan African species, vehicles, and human subjects.
Figure 7: Precision–confidence curve for the model trained on Sub-Saharan African species, vehicles, and human subjects.
Figure 8: Recall–confidence curve for the model trained on Sub-Saharan African species, vehicles, and human subjects.
Figure 9: F1–confidence curve for the model trained on Sub-Saharan African species, vehicles, and human subjects.
Figure 10: The confusion matrix provides a detailed analysis of the model's classification performance across all Sub-Saharan African species, vehicles, and human subjects.
Figure 11: The confusion matrix provides a detailed breakdown of the classifications made by the Phi-3.5-vision model when applied to raw images without YOLOv10-X object detection support.
Figure 12: Confusion matrix for the Phi-3.5 model using the bounding boxes from the test case images.
Figure 13: Alpaca JSON format showing the question–answer pairs.
Figure 14: Sample report using Alpaca Q&A.
Figure A1: Q1. Read the label on the bounding box to identify the animal. What is the species identified in the image, and what is its IUCN conservation status?
Figure A2: Q2. Read the label on the bounding box to identify the animal. What is the average weight of the species identified, and does this species have any notable characteristics or behaviours?
Figure A3: Q3. Was the image taken during the day or night, and what environmental factors can be observed (e.g., forest, bush, water sources)?
Figure A4: Q4. Read the label on the bounding box to identify the animal. How does the species identified in the image compare to other species in the same habitat in terms of size, behaviour, and diet?
Figure A5: Q5. Read the label on the bounding box to identify animals. Can you identify other animals or objects in the image, such as nearby trees, water bodies, or structures?
Figure A6: Q6. Read the labels on the bounding boxes to identify animals. What animals are in the image and how many are there of each animal species identified?
Figure A7: Q7. Based on the species and its habits, what predictions can be made about its activity at the time the camera trap image was taken (e.g., hunting, foraging, resting)?
Figure A8: Q8. Read the label on the bounding box around the animal to determine the species. What potential threats, either natural or human-induced, are most relevant to the species in the image, given its current IUCN status and environment?
Figure A9: Q9. Read the label on the bounding box around the animal to determine the species. What is the species' role in the ecosystem, and how does its presence affect other species or the environment in the area where the image was captured?
Figure A10: Q10. Read the label on the bounding box around the animal to determine the species. What are the known predators or threats to the species in the image, and are there any visible indicators in the environment that suggest the presence of these threats?
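
The two-stage pipeline described in the abstract above (a detector localises and labels species, a vision–language model then reads the labelled boxes and adds scene context, and a retrieval step supplies external facts) can be sketched roughly as follows. This is an illustrative outline only: the weight file, the query_vlm helper, and the lookup_species_facts retrieval stub are hypothetical placeholders, not the authors' code.

```python
# Sketch: a detector draws labelled boxes, a vision-language model then answers
# questions about the annotated frame, and a retrieval stub adds external facts.
import cv2
from ultralytics import YOLO

detector = YOLO("yolov10x_camera_traps.pt")  # assumed fine-tuned weights


def query_vlm(image, prompt):
    """Placeholder for a Phi-3.5-vision-style call; replace with a real VLM client."""
    raise NotImplementedError


def lookup_species_facts(name):
    """Placeholder retrieval step (e.g., IUCN status, average weight)."""
    return {"iucn_status": "unknown", "avg_weight_kg": None}


def annotate(frame):
    result = detector(frame)[0]
    species = []
    for box, cls_id in zip(result.boxes.xyxy, result.boxes.cls):
        x1, y1, x2, y2 = map(int, box.tolist())
        label = result.names[int(cls_id)]
        species.append(label)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, label, (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame, species


def build_report(frame):
    annotated, species = annotate(frame)
    answer = query_vlm(annotated, "Read the labels on the bounding boxes. "
                                  "What species are present, and is it day or night?")
    facts = {s: lookup_species_facts(s) for s in species}  # RAG-style enrichment (stub)
    return {"vlm_answer": answer, "external_facts": facts}
```
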
28 pages, 12307 KiB  
Article
Enhance the Concrete Crack Classification Based on a Novel Multi-Stage YOLOV10-ViT Framework
by Ali Mahmoud Mayya and Nizar Faisal Alkayem
Sensors 2024, 24(24), 8095; https://doi.org/10.3390/s24248095 - 19 Dec 2024
Viewed by 175
Abstract
Early identification of concrete cracks and multi-class detection can help to avoid future deformation or collapse in concrete structures. Available traditional detection methodologies require enormous effort and time. To overcome such difficulties, current vision-based deep learning models can effectively detect and classify various concrete cracks. This study introduces a novel multi-stage deep learning framework for crack detection and type classification. First, the recently developed YOLOV10 model is trained to detect possible defective regions in concrete images. After that, a modified vision transformer (ViT) model is trained to classify concrete images into three main types: normal, simple cracks, and multi-branched cracks. The evaluation process includes feeding concrete test images into the trained YOLOV10 model, identifying the possible defect regions, and finally delivering the detected regions into the trained ViT model, which decides the appropriate crack type of those detected regions. Experiments are conducted using the individual ViT model and the proposed multi-stage framework. To improve the generalization ability, multi-source datasets of concrete structures are used. For the classification part, a concrete crack dataset consisting of 12,000 images of three classes is utilized, while for the detection part, a dataset composed of various materials from historical buildings containing 1116 concrete images with their corresponding bounding boxes is utilized. Results prove that the proposed multi-stage model accurately classifies crack types with 90.67% precision, 90.03% recall, and 90.34% F1-score. The results also show that the proposed model outperforms the individual classification model by 10.9%, 19.99%, and 19.2% for precision, recall, and F1-score, respectively. The proposed multi-stage YOLOV10-ViT model can be integrated into the construction systems which are based on crack materials to obtain early warning of possible future deformation in concrete structures. Full article
Show Figures
Figure 1: Flowchart of the proposed multi-stage crack detection and classification framework.
Figure 2: Some samples of the utilized (a) concrete classification and (b) detection datasets.
Figure 3: Deep learning models: (a) YOLOV10 detection model; (b,c) ViT backbone (feature extraction head); and (d) ViT backbone with a classification part; * is the extra learnable [class] embedding.
Figure 4: Samples of the training set of the YOLOV10 and ViT models and their corresponding data: (a) albumentation, (b) augmentation.
Figure 5: Training and validation accuracy and loss curves of the trained ViT model: (a) Accuracy; (b) Loss.
Figure 6: Confusion matrix and ROC plot for the trained ViT model using the multi-class crack dataset: (a) Confusion Matrix; (b) ROC plot.
Figure 7: Correctly classified samples of the crack test set, their true labels, and the predicted labels with corresponding probabilities.
Figure 8: Trained YOLOV10 loss calculation per training epoch: (a) Box Loss; (b) CLS Loss; (c) DFL Loss.
Figure 9: Validation metrics of the trained YOLOV10.
Figure 10: The predicted bounding boxes of the trained YOLOV10 model on some test examples: (a) Ground truth; (b) Prediction result.
Figure 11: The outputs of the multi-stage crack prediction and classification model using some test samples.
Figure 12: Misclassified samples of the crack test set.
Figure 13: Evaluation metrics of the trained YOLOV10: (a) Precision-Confidence curve; (b) Recall-Confidence curve; (c) F1-Confidence curve; (d) Precision-Confidence curve.
Figure 14: Comparison between the individual ViT model and the proposed multi-stage YOLOV10-ViT model response: (a) prediction based on the YOLOV10-ViT model; (b) prediction based on the ViT model.
Figure 15: Comparison between the individual ViT model and the proposed multi-stage YOLOV10-ViT model performance: (a) precision, recall, and F1-score comparison; (b) average values comparison.
Figure 16: Training and validation curves for the new training parameters (patch = 16, input size = 224 × 224 × 3).
Figure 17: Training and validation curves resulting from training the entire ViT layers with eight multi-head attention layers and eight transformer layers: (a) Lr = 0.0001, (b) Lr = 0.001.
Figure 18: Accuracy, precision, recall, and F1-score comparison of the ViT model trained using different optimizers and learning rates.
Figure 19: Accuracy comparison of the ViT-trained models under different patience factor values.
Figure 20: Performance comparison of the YOLO model under different hyperparameter values: (a) evaluation metrics; (b) time.
Figure 21: Some misclassified/missed detection samples: (a) single crack case, (b) multi-branched crack case, and (c) multi-branched crack case with a small undetected crack region.
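
A minimal sketch of the detect-then-classify idea from the abstract above: a detector proposes defect regions and each crop is then assigned one of the three crack classes. The weight path is assumed, and torchvision's stock ViT stands in for the paper's modified ViT.

```python
# Sketch of the multi-stage idea: detect candidate defect regions, then classify
# each crop as normal / simple crack / multi-branched crack. The weight path and
# the torchvision ViT stand-in are illustrative, not the authors' exact models.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models import vit_b_16
from ultralytics import YOLO

CLASSES = ["normal", "simple_crack", "multi_branched_crack"]

detector = YOLO("yolov10_crack_regions.pt")       # assumed detection weights
classifier = vit_b_16(num_classes=len(CLASSES))   # stand-in for the modified ViT
classifier.eval()

to_tensor = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])


def classify_cracks(image_path):
    image = Image.open(image_path).convert("RGB")
    boxes = detector(image)[0].boxes.xyxy.tolist()
    results = []
    for x1, y1, x2, y2 in boxes:
        crop = image.crop((int(x1), int(y1), int(x2), int(y2)))
        with torch.no_grad():
            logits = classifier(to_tensor(crop).unsqueeze(0))
        results.append(((x1, y1, x2, y2), CLASSES[int(logits.argmax())]))
    return results
```
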
15 pages, 13255 KiB  
Article
AI-Based Analysis of Archery Shooting Time from Anchoring to Release Using Pose Estimation and Computer Vision
by Seungkeon Lee, Ji-Yeon Moon, Jinman Kim and Eui Chul Lee
Appl. Sci. 2024, 14(24), 11838; https://doi.org/10.3390/app142411838 - 18 Dec 2024
Viewed by 259
Abstract
This study presents a novel method for automatically analyzing archery shooting time using AI and computer vision technologies, with a particular focus on the critical anchoring to release phase, which directly influences performance. The proposed approach detects the start of the anchoring phase using pose estimation and accurately measures the shooting time by detecting the bowstring within the athlete’s facial bounding box, utilizing Canny edge detection and the probabilistic Hough transform. To ensure stability, low-pass filtering was applied to both the facial bounding box and pose estimation results, and an algorithm was implemented to handle intermittent bowstring detection due to various external factors. The proposed method was validated by comparing its results with expert manual measurements obtained using Dartfish software v2022 achieving a mean absolute error (MAE) of 0.34 s and an R2 score of 0.95. This demonstrates a significant improvement compared to the bowstring-only method, which resulted in an MAE of 1.4 s and an R2 score of 0.89. Previous research has demonstrated a correlation between shooting time and arrow accuracy. Therefore, this method can provide real-time feedback to athletes, overcoming the limitations of traditional manual measurement techniques. It enables immediate technical adjustments during training, which can contribute to overall performance improvement. Full article
(This article belongs to the Special Issue Advances in Motion Monitoring System)
Show Figures
Figure 1: Illustration of the archery shooting process, showcasing key stages from aiming to release, as captured comprehensively in each video clip of the dataset.
Figure 2: Mediapipe pose estimation landmarks and corresponding index numbers.
Figure 3: Example of calculating arm angles based on 3D pose estimation.
Figure 4: Example of a process to detect the bowstring within the archer's face bounding box.
Figure 5: Example of applying the proposed method to measure archery shooting time using actual video clips: (a) illustrates the measurement of an archer's shooting time; (b) visualizes real-time changes in the angle of the right arm as a graph to identify the anchoring start point.
Figure 6: Scatter plot with regression line showing the relationship between the anchoring start points detected using the proposed pose estimation-based method and the actual anchoring start points, along with the R² score indicating the model's accuracy.
Figure 7: Scatter plot with regression line showing the relationship between measured shooting times using the proposed method and actual shooting times, along with the R² score indicating the model's accuracy.
Figure 8: Scatter plot with a regression line showing the relationship between the shooting time measured using only the bowstring and the actual shooting time, along with the R² score indicating the accuracy of the model.
Figure 9: Scatter plot with a regression line showing the relationship between the shooting time measured using only the arm angle and the actual shooting time, along with the R² score showing the accuracy of the model.
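
Two building blocks mentioned in the abstract above can be sketched directly with OpenCV: detecting a near-vertical bowstring segment inside the face bounding box via Canny edges and the probabilistic Hough transform, and low-pass filtering of noisy box/landmark coordinates. All thresholds here are illustrative guesses, not the paper's tuned values.

```python
# Sketch of two ingredients from the abstract: bowstring detection inside the face
# bounding box (Canny + probabilistic Hough transform) and a simple low-pass filter
# for smoothing jittery box/landmark coordinates.
import cv2
import numpy as np


def bowstring_detected(frame, face_box, canny_lo=50, canny_hi=150):
    x1, y1, x2, y2 = face_box                     # integer pixel coordinates
    roi = cv2.cvtColor(frame[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(roi, canny_lo, canny_hi)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=(y2 - y1) // 2, maxLineGap=10)
    if lines is None:
        return False
    # Keep only near-vertical segments, which is how a drawn bowstring appears.
    for x_a, y_a, x_b, y_b in lines[:, 0]:
        angle = abs(np.degrees(np.arctan2(y_b - y_a, x_b - x_a)))
        if 75 <= angle <= 105:
            return True
    return False


class LowPass:
    """First-order exponential smoother for jittery box or landmark coordinates."""

    def __init__(self, alpha=0.3):
        self.alpha, self.state = alpha, None

    def update(self, value):
        value = np.asarray(value, dtype=float)
        if self.state is None:
            self.state = value
        else:
            self.state = self.alpha * value + (1 - self.alpha) * self.state
        return self.state
```
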
16 pages, 2492 KiB  
Article
Improving the Perception of Objects Under Daylight Foggy Conditions in the Surrounding Environment
by Mohamad Mofeed Chaar, Jamal Raiyn and Galia Weidl
Vehicles 2024, 6(4), 2154-2169; https://doi.org/10.3390/vehicles6040105 - 18 Dec 2024
Viewed by 204
Abstract
Autonomous driving (AD) technology has seen significant advancements in recent years; however, challenges remain, particularly in achieving reliable performance under adverse weather conditions such as heavy fog. In response, we propose a multi-class fog density classification approach to enhance the AD system performance. By categorizing fog density into multiple levels (25%, 50%, 75%, and 100%) and generating separate datasets for each class using the CARLA simulator, we improve the perception accuracy for each specific fog density level and analyze the effects of varying fog intensities. This targeted approach offers benefits such as improved object detection, specialized training for each fog class, and increased generalizability. Our results demonstrate enhanced perception of various objects, including cars, buses, trucks, vans, pedestrians, and traffic lights, across all fog densities. This multi-class fog density method is a promising advancement toward achieving reliable AD performance in challenging weather, improving both the precision and recall of object detection algorithms under diverse fog conditions. Full article
Show Figures
Figure 1: For a start, determine the fog density of the input image, then use a model specifically trained for that fog density level. In this example, the fog density is 75%.
Figure 2: An RGB camera sensor in the CARLA simulator captures an image; the horizontal field of view (fov) is 90.0 degrees and the dimensions are 1280 × 720 (see Appendix B), with bounding boxes overlaid on the image to identify objects within the scene. The bounding boxes are generated automatically by the simulator and provide a visual representation of the objects' positions and dimensions. The fog density in this image is 100%.
Figure 3: We tested our object detection model in heavy fog conditions (fog density 100%) using a model that was trained with labels for all objects up to 200 m. The model trained with data under fog density 100% outperformed the model trained on clear data in detecting objects at long distances. As shown in (a), on the left, the fog-trained model successfully detected distant objects, while the clear-data model struggled to do so, as evident in (b) on the right. This difference in performance primarily increases the recall of the model. The red boxes mark cars, the orange boxes mark vans, and the green boxes mark traffic lights.
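
The routing idea in the abstract above (classify fog density first, then run the detector trained for that density) can be sketched as follows; the weight file names and the contrast-based fog heuristic are placeholders, not the authors' pipeline.

```python
# Sketch of the multi-class fog idea: estimate the fog density class first, then
# run the detector that was trained specifically for that class.
import cv2
from ultralytics import YOLO

FOG_CLASSES = [0, 25, 50, 75, 100]  # percent fog density, as in the abstract
detectors = {c: YOLO(f"yolo_fog_{c}.pt") for c in FOG_CLASSES}  # assumed weights


def estimate_fog_class(image_bgr):
    """Very rough stand-in heuristic: denser fog lowers global contrast."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    contrast = gray.std()
    thresholds = [60, 45, 30, 15]              # illustrative cut points
    for level, t in zip(FOG_CLASSES, thresholds):
        if contrast > t:
            return level
    return 100


def detect(image_bgr):
    fog = estimate_fog_class(image_bgr)
    return fog, detectors[fog](image_bgr)[0].boxes
```
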
16 pages, 9121 KiB  
Technical Note
A Benchmark Dataset for Aircraft Detection in Optical Remote Sensing Imagery
by Jianming Hu, Xiyang Zhi, Bingxian Zhang, Tianjun Shi, Qi Cui and Xiaogang Sun
Remote Sens. 2024, 16(24), 4699; https://doi.org/10.3390/rs16244699 - 17 Dec 2024
Viewed by 231
Abstract
Existing aircraft detection datasets rarely simultaneously consider the diversity of target features and the complexity of environmental factors, which has become an important factor restricting the effectiveness and reliability of aircraft detection algorithms. Although a large amount of research has been devoted to breaking through few-sample-driven aircraft detection technology, most algorithms still struggle to effectively solve the problems of missed target detection and false alarms caused by numerous environmental interferences in bird's-eye optical remote sensing scenes. To further aircraft detection research, we have established a new dataset, Aircraft Detection in Complex Optical Scene (ADCOS), sourced from various platforms including Google Earth, Microsoft Map, Worldview-3, Pleiades, Ikonos, Orbview-3, and Jilin-1 satellites. It integrates 3903 meticulously chosen images of over 400 famous airports worldwide, containing 33,831 annotated instances employing the oriented bounding box (OBB) format. Notably, this dataset encompasses a wide range of target characteristics including multi-scale, multi-direction, multi-type, multi-state, and dense arrangement, along with complex relationships between targets and backgrounds like cluttered backgrounds, low contrast, shadows, and occlusion interference conditions. Furthermore, we evaluated nine representative detection algorithms on the ADCOS dataset, establishing a performance benchmark for subsequent algorithm optimization. The latest dataset will soon be available on GitHub. Full article
(This article belongs to the Section Earth Observation Data)
Show Figures
Figure 1: Building steps of the proposed dataset.
Figure 2: Annotation results of typical scenes. (a) Typical scene examples and (b) images containing incomplete targets due to occlusion or field of view.
Figure 3: Examples under different detection conditions. (a) Different platforms. (b) Different airports. (c) Different target states. (d) Different target types.
Figure 4: Typical scenes that reflect the complex distribution characteristics of targets. (a) Multidirectional and multi-scale issue. (b) Dense permutation issue.
Figure 5: Scale distribution of aircraft in the dataset. (a) Height and width distribution. (b) Area distribution.
Figure 6: Typical scenes that reflect the complex relationship between targets and background. (a) Low-contrast issue. (b) Cluttered background interference. (c) Shadow issue. (d) Occlusion issue.
Figure 7: Typical scenes that reflect detector imaging anomalies.
Figure 8: Experimental results of typical advanced detection methods: (a) Rotated Faster R-CNN, (b) S²A-Net, (c) Oriented RepPoints, and (d) ground truth. It should be noted that green represents ground truth, red represents algorithm prediction results, and yellow represents error detection.
Figure 9: Experimental results for typical examples using YOLOv8: (a) detection result of YOLOv8 and (b) ground truth. It should be noted that pink represents ground truth, and red represents algorithm prediction results.
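
Since ADCOS annotates instances with oriented bounding boxes, a typical preprocessing step is converting four corner points into a centre/size/angle representation. The sketch below assumes a DOTA-style "x1 y1 ... x4 y4 class" line layout, which may differ from the actual ADCOS files.

```python
# Sketch: converting a four-corner oriented bounding box (a common DOTA-style
# convention; the actual ADCOS annotation layout may differ) into a
# (cx, cy, w, h, angle) representation with OpenCV.
import cv2
import numpy as np


def corners_to_rotated_box(line):
    """line: 'x1 y1 x2 y2 x3 y3 x4 y4 class' -> ((cx, cy), (w, h), angle_deg, class)."""
    parts = line.split()
    pts = np.array(parts[:8], dtype=np.float32).reshape(4, 2)
    (cx, cy), (w, h), angle = cv2.minAreaRect(pts)
    return (cx, cy), (w, h), angle, parts[8]


example = "10 10 110 10 110 60 10 60 aircraft"
print(corners_to_rotated_box(example))  # approx centre (60, 35), size 100 x 50
```
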
19 pages, 6294 KiB  
Article
Lightweight Detection Counting Method for Pill Boxes Based on Improved YOLOv8n
by Weiwei Sun, Xinbin Niu, Zedong Wu and Zhongyuan Guo
Electronics 2024, 13(24), 4953; https://doi.org/10.3390/electronics13244953 - 16 Dec 2024
Viewed by 287
Abstract
Vending machines have evolved into a critical element of the intelligent healthcare service system. To enhance the precision of pill box detection counting and cater to the lightweight requirements of its internal embedded controller for deep learning frameworks, an enhanced lightweight YOLOv8n model is introduced. A dataset comprising 4080 images is initially compiled for model training and assessment purposes. The refined YOLOv8n-ShuffleNetV2 model is crafted, featuring the integration of ShuffleNetv2 as the new backbone network, the incorporation of the VoVGSCSP module to bolster feature extraction capabilities, and the utilization of the Wise-IoU v3 loss function for bounding box regression enhancement. Moreover, a model pruning strategy based on structured pruning (SFP) and layer-wise adaptive magnitude pruning (LAMP) is implemented. Comparative experimental findings demonstrate that the enhanced and pruned model has elevated the mean Average Precision (mAP) rate from 94.5% to 95.1%. Furthermore, the model size has been reduced from 11.1 MB to 6.0 MB, and the inference time has been notably decreased from 1.97 s to 0.34 s. The model’s accuracy and efficacy are validated through experiments conducted on the Raspberry Pi 4B platform. The outcomes of the experiments underscore how the refined model significantly amplifies the deployment efficiency of the deep learning model on resource-limited devices, thus greatly supporting the advancement of intelligent medicine management and medical vending machine applications. Full article
Show Figures
Figure 1: Example of a conveyor belt scenario with different numbers of pill boxes: (a) a scene with a solitary box; (b) a scene with two pill boxes side by side; (c) a scene with a grouping of three boxes.
Figure 2: LabelImg labeling process.
Figure 3: Label field structure.
Figure 4: Examples of data enhancement sections: (a) original image; (b) mirror image; (c) panning image; (d) rotation image; (e) noise addition image; (f) light and shadow transformations image.
Figure 5: Network structure diagram of YOLOv8n.
Figure 6: Network structure diagram of improved YOLOv8n.
Figure 7: Structure diagram of ShuffleNetv2-Block.
Figure 8: Network structure diagram of improved YOLOv8n.
Figure 9: Pruning optimization flowchart.
Figure 10: Comparison of channels before and after pruning.
Figure 11: Examples of pill box test results.
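
The abstract above combines structured pruning with layer-wise adaptive magnitude pruning (LAMP). As a much simpler stand-in for that scheme, the sketch below shows a global L1 magnitude pruning pass with PyTorch's pruning utilities; in practice the pruned model would then be fine-tuned.

```python
# Sketch of a magnitude-based pruning pass (a simple stand-in, not the paper's
# exact SFP + LAMP scheme): prune a fraction of the smallest-magnitude conv
# weights globally, then fine-tune the model afterwards.
import torch.nn as nn
import torch.nn.utils.prune as prune


def prune_conv_layers(model: nn.Module, amount: float = 0.3) -> nn.Module:
    params = [(m, "weight") for m in model.modules() if isinstance(m, nn.Conv2d)]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    for module, name in params:          # make the pruning masks permanent
        prune.remove(module, name)
    return model


# Toy usage with a stand-in backbone; in practice this would be the YOLOv8n model.
toy = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
prune_conv_layers(toy, amount=0.3)
zeros = sum((m.weight == 0).sum().item() for m in toy.modules() if isinstance(m, nn.Conv2d))
print("zeroed weights:", zeros)
```
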
30 pages, 6897 KiB  
Article
Research on UAV Autonomous Recognition and Approach Method for Linear Target Splicing Sleeves Based on Deep Learning and Active Stereo Vision
by Guocai Zhang, Guixiong Liu and Fei Zhong
Electronics 2024, 13(24), 4872; https://doi.org/10.3390/electronics13244872 - 10 Dec 2024
Viewed by 386
Abstract
This study proposes an autonomous recognition and approach method for unmanned aerial vehicles (UAVs) targeting linear splicing sleeves. By integrating deep learning and active stereo vision, this method addresses the navigation challenges faced by UAVs during the identification, localization, and docking of splicing sleeves on overhead power transmission lines. First, a two-stage localization strategy, LC (Local Clustering)-RB (Reparameterization Block)-YOLO (You Only Look Once)v8n (OBB (Oriented Bounding Box)), is developed for linear target splicing sleeves. This strategy ensures rapid, accurate, and reliable recognition and localization while generating precise waypoints for UAV docking with splicing sleeves. Next, virtual reality technology is utilized to expand the splicing sleeve dataset, creating the DSS dataset tailored to diverse scenarios. This enhancement improves the robustness and generalization capability of the recognition model. Finally, a UAV approach splicing sleeve (UAV-ASS) visual navigation simulation platform is developed using the Robot Operating System (ROS), the PX4 open-source flight control system, and the GAZEBO 3D robotics simulator. This platform simulates the UAV’s final approach to the splicing sleeves. Experimental results demonstrate that, on the DSS dataset, the RB-YOLOv8n(OBB) model achieves a mean average precision (mAP0.5) of 96.4%, with an image inference speed of 86.41 frames per second. By incorporating the LC-based fine localization method, the five rotational bounding box parameters (x, y, w, h, and angle) of the splicing sleeve achieve a mean relative error (MRE) ranging from 3.39% to 4.21%. Additionally, the correlation coefficients (ρ) with manually annotated positions improve to 0.99, 0.99, 0.98, 0.95, and 0.98, respectively. These improvements significantly enhance the accuracy and stability of splicing sleeve localization. Moreover, the developed UAV-ASS visual navigation simulation platform effectively validates high-risk algorithms for UAV autonomous recognition and docking with splicing sleeves on power transmission lines, reducing testing costs and associated safety risks. Full article
(This article belongs to the Section Computer Science & Engineering)
Show Figures
Figure 1: UAV carrying DR equipment approaches and docks with the splicing sleeve on overhead transmission lines. (a) Splicing sleeve; (b) DR; (c) UAV; (d) approaching; (e) docking/hanging.
Figure 2: Aerial views of splicing sleeves on overhead transmission lines. (a) Distant view; (b) medium-distance view; (c) close-up view; (d) third-person aerial view showing the UAV inspecting the transmission line and splicing sleeves.
Figure 3: Block diagram of the UAV-ASS method.
Figure 4: The network architecture diagram of the RB-YOLOv8(OBB) model.
Figure 5: Structure and reparameterization process of the RepBlock module.
Figure 6: Schematic diagram of the rotated bounding box output for the splicing sleeve from rapid localization.
Figure 7: Diagram of the fine localization principle for rotational object detection using LC. (a) Boundary calculation of the coarsely localized rectangular box for the splicing sleeve in the local region R_DE; (b) boundary calculation of the coarsely localized rectangular box for the splicing sleeve in the local region R_DE, and clustering and fitting of the depth values D_R in region R_DE; (c) fine localization of the splicing sleeve's rotated bounding box B_F_RGB.
Figure 8: Rules for obtaining D_UAV-SS.
Figure 9: UAV-ASS coordinate system.
Figure 10: Schematic diagram of the UAV approaching the splicing sleeve.
Figure 11: Virtual scenario of the splicing sleeve for dataset augmentation.
Figure 12: Comparison of model size, mAP0.5, and speed for different rotational object detection models. (a) Model size (MB) vs. FPS; (b) mAP0.5 (%) vs. FPS.
Figure 13: Different scene recognition effect diagrams, with (a–c), (d–f), and (g–i) corresponding to the hazy static, real/virtual, and UAV aerial dynamic scenarios, respectively.
Figure 14: Diagram of localization using the three methods.
Figure 15: Comparison of the five coordinate parameters using different methods across various metrics. (a) MAE of parameters; (b) MRE of parameters; (c) RMSE of parameters; (d) ρ of parameters.
Figure 16: Positioning results of B_C_RGB, B_F_RGB, and B_F_Depth. Panels (a–c) and (d–f) represent the B_C_RGB, B_F_RGB, and B_F_Depth results when the distance between the UAV and the splicing sleeve is 4.8 m and 1.2 m, respectively. Panels (g–i) display the positioning results of B_C_RGB, B_F_RGB, and B_F_Depth using an Intel D455 depth camera in a laboratory environment. The resolution of the images in panels (a–f) is 848 × 480, while the resolution in panels (g–i) is 640 × 480.
Figure 17: UAV-ASS visual simulation system.
Figure 18: UAV-ASS visual simulation system interface. (a) Main interface of the UAV-ASS simulation; (b) resulting RGB image of the UAV's visual recognition and localization of the splicing sleeve; (c) depth map of the UAV's visual recognition and localization of the splicing sleeve.
Figure 19: UAV calculating D_UAV-SS using B_C_Depth and B_F_Depth for splicing sleeve localization. (a) UAV fixed-point rotation; (b) D_UAV-SS extraction using B_C_Depth localization; (c) D_UAV-SS extraction using B_F_Depth localization.
Figure 20: Video screenshots of the UAV body coordinate trajectory and the B_F_RGB-located splicing sleeve position during the UAV recognition and docking process.
Figure 21: Changes in the UAV pose adjustment process during approach and docking.
Figure 22: UAV initial positions at different starting points.
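
One step in the abstract above is turning the depth pixels inside a detected box into a UAV-to-sleeve distance while rejecting background returns. The sketch below uses a simple percentile-based filter as a stand-in for the paper's local clustering (LC) step; the 0.3 m tolerance and the synthetic depth map are purely illustrative.

```python
# Sketch: estimate target distance from the depth pixels inside a detected box,
# keeping only the dominant (nearest) depth cluster to suppress background.
import numpy as np


def distance_from_depth(depth_m: np.ndarray, box_xyxy) -> float:
    x1, y1, x2, y2 = map(int, box_xyxy)
    roi = depth_m[y1:y2, x1:x2].reshape(-1)
    roi = roi[np.isfinite(roi) & (roi > 0)]          # drop invalid depth readings
    if roi.size == 0:
        return float("nan")
    near = np.percentile(roi, 25)                    # anchor on the nearer cluster
    cluster = roi[np.abs(roi - near) < 0.3]          # 0.3 m tolerance (illustrative)
    return float(np.median(cluster)) if cluster.size else float(near)


depth = np.full((480, 848), 8.0)                     # background at 8 m
depth[200:260, 400:520] = 1.2                        # sleeve roughly 1.2 m away
print(distance_from_depth(depth, (380, 180, 540, 280)))  # ~1.2
```
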
13 pages, 2146 KiB  
Article
Developing an Alert System for Agricultural Protection: Sika Deer Detection Using Raspberry Pi
by Sandhya Sharma, Buchaputara Pansri, Suresh Timilsina, Bishnu Prasad Gautam, Yoshifumi Okada, Shinya Watanabe, Satoshi Kondo and Kazuhiko Sato
Electronics 2024, 13(23), 4852; https://doi.org/10.3390/electronics13234852 - 9 Dec 2024
Viewed by 460
Abstract
Agricultural loss due to the overpopulation of Sika deer poses a significant challenge in Japan, leading to frequent human–wildlife conflicts. We conducted a study in Muroran, Hokkaido (42°22′56.1″ N–141°01′51.5″ E), with the objective of monitoring Sika deer and notifying farmers and locals. We deployed a Sika deer detection model (YOLOv8-nano) on a Raspberry Pi, integrated with an infrared camera that captured images only when a PIR sensor was triggered. To further understand the timing of Sika deer visits and potential correlations with environmental temperature and humidity, respective sensors were installed on Raspberry Pi and the data were analyzed using an ANOVA test. In addition, a buzzer was deployed to deter Sika deer from the study area. The buzzer was deactivated in the first 10 days after deployment and was activated in the following 20 days. The Sika deer detection model demonstrated excellent performance, with precision and recall values approaching 1, and a bounding box creation latency of 0.82 frames per second. Once a bounding box was established after Sika deer detection, alert notifications were automatically sent via email and the LINE messaging application, with an average notification time of 0.32 s. Regarding the buzzer’s impact on Sika deer, 35% of the detected individuals reacted by standing upright with alert ears, while 65% immediately fled the area. Analysis revealed that the time of day for Sika deer visits was significantly correlated with humidity (F = 8.95, p < 0.05), but no significant association with temperature (F = 0.681, p > 0.05). These findings represent a significant step toward mitigating human–wildlife conflicts and reducing agricultural production losses through effective conservation measures. Full article
Show Figures
Figure 1: Visualization of the research motivation.
Figure 2: Circuit diagram for the Raspberry Pi-based Sika deer monitoring and alert system.
Figure 3: Workflow illustrating the Sika deer real-time detection and alert mechanism.
Figure 4: Components connected to the Raspberry Pi for Sika deer detection in the field: (a) external components housed inside a plastic box; (b) sensors, camera, and switch positioned outside the plastic box; (c) solar panel providing a continuous power supply; and (d) Raspberry Pi with components enclosed in a plastic box alongside a solar panel installed in the study field.
Figure 5: Curves representing box and class loss for training and validation datasets.
Figure 6: Performance metrics across multiple epochs.
Figure 7: The number of herds observed daily before the buzzer activation was recorded. Day 1 represented the first day without the buzzer activation on the Raspberry Pi, followed by subsequent days.
Figure 8: Images of individual Sika deer within each herd: (a) Herd 1, consisting of three individual Sika deer; and (b) Herd 2, consisting of two individual Sika deer.
Figure 9: Visualization of alert mechanisms following Sika deer detection in captured images: (a) identification of Sika deer using bounding boxes in captured images; (b) alert notification via email; (c) alert notification through the LINE application; and (d) analysis of Sika deer behavior following buzzer activation.
Figure 10: Herd observations recorded at different times throughout the day, along with environmental temperature and humidity, after buzzer activation. The term "Day" refers to the number of days since the buzzer was activated; for example, "Day 1" indicates the first day after activation, "Day 4" refers to the fourth day, and so forth. The visualization only includes days when herds of Sika deer were observed, excluding days without any sightings.
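
The trigger–detect–alert loop described in the abstract above might look roughly like the sketch below. The GPIO pin, camera index, SMTP account, and weight file are assumptions, and the LINE notification channel described in the abstract is omitted.

```python
# Sketch of the trigger -> detect -> alert loop on a Raspberry Pi.
import smtplib
from email.message import EmailMessage

import cv2
from gpiozero import MotionSensor
from ultralytics import YOLO

pir = MotionSensor(4)                              # PIR on GPIO4 (assumed wiring)
model = YOLO("yolov8n_sika_deer.pt")               # assumed fine-tuned weights


def send_alert(n_deer: int) -> None:
    msg = EmailMessage()
    msg["Subject"] = f"Sika deer detected ({n_deer})"
    msg["From"], msg["To"] = "alert@example.org", "farmer@example.org"
    msg.set_content("Sika deer detected by the field camera.")
    with smtplib.SMTP("smtp.example.org", 587) as s:  # placeholder SMTP server
        s.starttls()
        s.login("alert@example.org", "app-password")
        s.send_message(msg)


while True:
    pir.wait_for_motion()                          # camera captures only on trigger
    cam = cv2.VideoCapture(0)
    ok, frame = cam.read()
    cam.release()
    if not ok:
        continue
    boxes = model(frame)[0].boxes
    if len(boxes) > 0:
        send_alert(len(boxes))
    pir.wait_for_no_motion()
```
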
17 pages, 5445 KiB  
Article
CaLiJD: Camera and LiDAR Joint Contender for 3D Object Detection
by Jiahang Lyu, Yongze Qi, Suilian You, Jin Meng, Xin Meng, Sarath Kodagoda and Shifeng Wang
Remote Sens. 2024, 16(23), 4593; https://doi.org/10.3390/rs16234593 - 6 Dec 2024
Viewed by 578
Abstract
Three-dimensional object detection has been a key area of research in recent years because of its rich spatial information and superior performance in addressing occlusion issues. However, the performance of 3D object detection still lags significantly behind that of 2D object detection, owing to challenges such as difficulties in feature extraction and a lack of texture information. To address this issue, this study proposes a 3D object detection network, CaLiJD (Camera and Lidar Joint Contender for 3D object Detection), guided by two-dimensional detection results. CaLiJD creatively integrates advanced channel attention mechanisms with a novel bounding-box filtering method to improve detection accuracy, especially for small and occluded objects. Bounding boxes are detected by the 2D and 3D networks for the same object in the same scene as an associated pair. The detection results that satisfy the criteria are then fed into the fusion layer for training. In this study, a novel fusion network is proposed. It consists of numerous convolutions arranged in both sequential and parallel forms and includes a Grouped Channel Attention Module for extracting interactions among multi-channel information. Moreover, a novel bounding-box filtering mechanism was introduced, incorporating the normalized distance from the object to the radar as a filtering criterion within the process. Experiments were conducted using the KITTI 3D object detection benchmark. The results showed that a substantial improvement in mean Average Precision (mAP) was achieved by CaLiJD compared with the baseline single-modal 3D detection model, with an enhancement of 7.54%. Moreover, the improvement achieved by our method surpasses that of other classical fusion networks by an additional 0.82%. In particular, CaLiJD achieved mAP values of 73.04% and 59.86%, respectively, thus demonstrating state-of-the-art performance for challenging small-object detection tasks such as those involving cyclists and pedestrians. Full article
(This article belongs to the Special Issue Point Cloud Processing with Machine Learning)
Show Figures
Figure 1: Quantitative analysis of camera, LiDAR, fusion, and proposed methods. It shows the performance comparison between CaLiJD and other state-of-the-art models on the KITTI dataset, where the horizontal axis represents the different kinds of methods and the vertical axis represents the AP values (Moderate) of the different methods on the KITTI dataset.
Figure 2: The overall network architecture of CaLiJD. It is mainly divided into a backbone network, a data selection module, and a fusion layer. SECOND and C_RCNN represent the 3D and 2D backbone networks of CaLiJD, which are applied to obtain the candidate boxes required for the fusion network. The obtained candidates are filtered and input to the fusion layer for training.
Figure 3: Selection mechanism for CaLiJD. (a) demonstrates the data screening method in the traditional late-fusion network, and (b) demonstrates the data screening method in CaLiJD.
Figure 4: The overall structure of the feature fusion layer.
Figure 5: Grouped Channel Attention Module. H is the feature map obtained after fusion of features in the fusion layer. H′ denotes the feature map that has been assigned weights by GCAM. h1, h2, and h3 represent three different features extracted from different grouped convolutions.
Figure 6: Grouped Channel Attention Module.
Figure 7: Visualization of the KITTI dataset. (a,d,g,j) show 2D images from four different scenes. (b,e,h,k) present visualizations of 3D detection using SECOND for these four scenes. (c,f,i,l) show the visualization results for CaLiJD. The green 3D bounding boxes represent the detection results, whereas the areas within the yellow circles indicate erroneous detections.
Figure 8: Comparison of mAP values for detection results on the car split. The figure presents a bar chart that visually compares the AP values of three algorithms, including CaLiJD, on the KITTI dataset's car split under recall levels of 11 and 40. Yellow represents CaLiJD, red represents CLOCs, and blue represents SECOND. (a) shows the results for 3D object detection, while (b) displays the results for BEV detection.
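
The late-fusion idea in the abstract above pairs 2D and 3D candidates for the same object before the fusion layer. The sketch below shows a generic association step, matching projected 3D boxes to 2D boxes by IoU with a simple distance-aware gate; the thresholds and gating form are illustrative, not CaLiJD's actual filtering criterion.

```python
# Sketch of the association step in a late-fusion detector: match each projected
# 3D candidate with 2D candidates by IoU and keep pairs that pass a simple
# distance-normalised gate.


def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def pair_candidates(boxes2d, boxes3d_proj, ranges_m, iou_thr=0.5, max_range=80.0):
    """boxes3d_proj: 3D candidates projected into the image plane; ranges_m: LiDAR range."""
    pairs = []
    for i, b2 in enumerate(boxes2d):
        for j, (b3, r) in enumerate(zip(boxes3d_proj, ranges_m)):
            # Relax the IoU requirement slightly for distant (small) objects.
            gate = iou_thr * (1.0 - 0.3 * min(r / max_range, 1.0))
            if iou(b2, b3) >= gate:
                pairs.append((i, j))
    return pairs


print(pair_candidates([(100, 100, 180, 200)], [(105, 110, 178, 195)], [35.0]))  # [(0, 0)]
```
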
17 pages, 9263 KiB  
Article
HHS-RT-DETR: A Method for the Detection of Citrus Greening Disease
by Yi Huangfu, Zhonghao Huang, Xiaogang Yang, Yunjian Zhang, Wenfeng Li, Jie Shi and Linlin Yang
Agronomy 2024, 14(12), 2900; https://doi.org/10.3390/agronomy14122900 - 4 Dec 2024
Viewed by 473
Abstract
Background: Given the severe economic burden that citrus greening disease imposes on fruit farmers and related industries, rapid and accurate disease detection is particularly crucial. This not only effectively curbs the spread of the disease, but also significantly reduces reliance on manual detection within extensive citrus planting areas. Objective: In response to this challenge, and to address the issues posed by resource-constrained platforms and complex backgrounds, this paper designs and proposes a novel method for the recognition and localization of citrus greening disease, named the HHS-RT-DETR model. The goal of this model is to achieve precise detection and localization of the disease while maintaining efficiency. Methods: Based on the RT-DETR-r18 model, the following improvements are made: the HS-FPN (high-level screening-feature pyramid network) is used to improve the feature fusion and feature selection part of the RT-DETR model, and the filtered feature information is merged with the high-level features by filtering out the low-level features, so as to enhance the feature selection ability and multi-level feature fusion ability of the model. In the feature fusion and feature selection sections, the HWD (hybrid wavelet-directional filter banks) downsampling operator is introduced to prevent the loss of effective information in the channel and reduce the computational complexity of the model. By using the ShapeIoU loss function to enable the model to focus on the shape and scale of the bounding box itself, the prediction of the bounding box of the model will be more accurate. Conclusions and Results: This study has successfully developed an improved HHS-RT-DETR model which exhibits efficiency and accuracy on resource-constrained platforms and offers significant advantages for the automatic detection of citrus greening disease. Experimental results show that the improved model, when compared to the RT-DETR-r18 baseline model, has achieved significant improvements in several key performance metrics: the precision increased by 7.9%, the frame rate increased by 4 frames per second (f/s), the recall rose by 9.9%, and the average accuracy also increased by 7.5%, while the number of model parameters was reduced by 0.137 × 10⁷. Moreover, the improved model has demonstrated outstanding robustness in detecting occluded leaves within complex backgrounds. This provides strong technical support for the early detection and timely control of citrus greening disease. Additionally, the improved model has showcased advanced detection capabilities on the PASCAL VOC dataset. Discussions: Future research plans include expanding the dataset to encompass a broader range of citrus species and different stages of citrus greening disease. In addition, the plans involve incorporating leaf images under various lighting conditions and different weather scenarios to enhance the model’s generalization capabilities, ensuring the accurate localization and identification of citrus greening disease in diverse complex environments. Lastly, the integration of the improved model into an unmanned aerial vehicle (UAV) system is envisioned to enable the real-time, regional-level precise localization of citrus greening disease. Full article
(This article belongs to the Section Precision and Digital Agriculture)
Show Figures
Figure 1: Images of selected greening datasets (images of rock sugar oranges, Wokan oranges, and grapefruits in natural and simple backgrounds).
Figure 2: Partial results of dataset expansion.
Figure 3: HHS-RT-DETR model structure (the structural diagram of the improved model).
Figure 4: Structure of the feature selection module (feature selection network structure in the HS-FPN network).
Figure 5: Structure of the SPFF feature fusion module (feature fusion network structure in the HS-FPN network).
Figure 6: ChannelAttention_HSFPN structure.
Figure 7: Feature selection module (feature selection network module in the ChannelAttention-HSFPN network).
Figure 8: Feature fusion module (feature fusion network module in the ChannelAttention-HSFPN network).
Figure 9: HWD module structure.
Figure 10: Comparison of detection results between the RT-DETR-r18 model and the HHS-RT-DETR model. (a) presents the detection results of the original RT-DETR-r18 model, while (b) displays the outcomes of the enhanced HHS-RT-DETR model. A comparison between the two figures reveals that the area indicated by the yellow arrow was not detected by the original model, but it has been successfully identified by the improved model.
Figure 11: HWD module compared with other modules in reducing the loss of context information (comparison of the HWD downsampling method with max pooling, average pooling, and strided convolution methods).
Figure 12: Comparison curves of different loss functions.
Figure 13: Comparison of the heatmap effect between the original model and the improved model ((a) is the original image, (b) is the object detection heatmap from the HHS-RT-DETR model, and (c) is the object detection heatmap from the RT-DETR-r18 benchmark model).
Figure 14: Comparison curves of different models.
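
The HWD operator in the abstract above replaces ordinary downsampling with a wavelet decomposition so that channel information is not discarded. The sketch below illustrates the general idea with a plain 2×2 Haar transform followed by a 1×1 convolution; it is not the authors' exact module.

```python
# Minimal illustration of Haar-wavelet downsampling for a feature map: a 2x2 Haar
# transform halves the spatial size while keeping all information in four
# sub-bands, which are concatenated and compressed with a 1x1 convolution.
import torch
import torch.nn as nn


class HaarDownsample(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(4 * in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = x[:, :, 0::2, 0::2]
        b = x[:, :, 0::2, 1::2]
        c = x[:, :, 1::2, 0::2]
        d = x[:, :, 1::2, 1::2]
        ll = (a + b + c + d) / 2          # low-frequency average
        lh = (a - b + c - d) / 2          # horizontal detail
        hl = (a + b - c - d) / 2          # vertical detail
        hh = (a - b - c + d) / 2          # diagonal detail
        return self.proj(torch.cat([ll, lh, hl, hh], dim=1))


x = torch.randn(1, 64, 80, 80)
print(HaarDownsample(64, 128)(x).shape)   # torch.Size([1, 128, 40, 40])
```
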
22 pages, 3079 KiB  
Article
A De-Nesting Hybrid Reliability Analysis Method and Its Application in Marine Structure
by Chenfeng Li, Tenglong Jin, Zequan Chen and Guanchen Wei
J. Mar. Sci. Eng. 2024, 12(12), 2221; https://doi.org/10.3390/jmse12122221 - 4 Dec 2024
Viewed by 403
Abstract
In recent years, marine structures have been widely used in the world, making significant contributions to the utilization of marine resources. In the design of marine structures, there is a hybrid reliability problem arising from aleatory uncertainty and epistemic uncertainty. In many cases, epistemic uncertainty is estimated by interval parameters. Traditional methods for hybrid reliability analysis usually require a nested optimization framework, which will lead to too many calls to the limit state function (LSF) and result in poor computational efficiency. In response to this problem, this paper proposes a de-nesting hybrid reliability analysis method creatively. Firstly, it uses the p-box model to describe the epistemic uncertainty variables, and then the linear approximation (LA) model and the two-point adaptive nonlinear approximation (TANA) model are combined to approximate the upper and lower bounds of LSF with epistemic uncertainty. Based on the first-order reliability method (FORM), an iterative operation is used to obtain the interval of the non-probability hybrid reliability index. The traditional nested optimization structure is effectively eliminated by the above approximation method, which efficiently reduces the times of LSF calls and increases the calculation speed while preserving sufficient accuracy. Finally, one numerical example and two engineering examples are provided to show the greater effectiveness of this method than the traditional nested optimization method. Full article
Show Figures

Figure 1. The upper and lower limits of the CDF of the p-box variable y.
Figure 2. The surfaces of the non-probability hybrid LSF.
Figure 3. Comparison between the LA model and the TANA model at the design point x = 1.
Figure 4. Flow chart of the de-nesting analysis method for the non-probability hybrid reliability index.
Figure 5. Upper and lower limits of the CDF of the p-box variables in the numerical examples: (a) normal distribution variable y1; (b) normal distribution variable y2.
Figure 6. Comparison of the method of this paper with the traditional nested method when calculating (a) the minimum reliability index β^L and (b) the maximum reliability index β^R.
Figure 7. Upper and lower limits of the reliability index for epistemic uncertain variables with different degrees of uncertainty (method of this paper and the traditional nested method).
Figure 8. Schematic view of a stiffened panel subjected to longitudinal compression and its geometric parameters.
Figure 9. (a) Schematic view of the cross-section of a ship hull girder; (b) cross-sectional stress distribution of the ship hull girder in the sagging limit state, as assumed in this paper.
23 pages, 10799 KiB  
Article
OMAD-6: Advancing Offshore Mariculture Monitoring with a Comprehensive Six-Type Dataset and Performance Benchmark
by Zewen Mo, Yinyu Liang, Yulin Chen, Yanyun Shen, Minduan Xu, Zhipan Wang and Qingling Zhang
Remote Sens. 2024, 16(23), 4522; https://doi.org/10.3390/rs16234522 - 2 Dec 2024
Viewed by 487
Abstract
Offshore mariculture is critical for global food security and economic development. Advances in deep learning and data-driven approaches, enable the rapid and effective monitoring of offshore mariculture distribution and changes. However, detector performance depends heavily on training data quality. The lack of standardized [...] Read more.
Offshore mariculture is critical for global food security and economic development. Advances in deep learning and data-driven approaches enable the rapid and effective monitoring of offshore mariculture distribution and changes. However, detector performance depends heavily on training data quality, and the lack of standardized classifications and public datasets for offshore mariculture facilities currently hampers effective monitoring. Here, we propose to categorize offshore mariculture facilities into six types: TCC, DWCC, FRC, LC, RC, and BC. Based on these categories, we introduce a benchmark dataset called OMAD-6. This dataset includes over 130,000 instances and more than 16,000 high-resolution remote sensing images. The images, with a spatial resolution of 0.6 m, were sourced from the Google Earth platform and cover key regions in China, Chile, Norway, and Egypt. All instances in OMAD-6 were meticulously annotated manually with horizontal bounding boxes and polygons. Compared to existing remote sensing datasets, OMAD-6 has three notable characteristics: (1) it is comparable to large, published datasets in instances per category, image quantity, and sample coverage; (2) it exhibits high inter-class similarity; (3) it shows significant intra-class diversity in facility sizes and arrangements. Based on the OMAD-6 dataset, we evaluated eight state-of-the-art methods to establish baselines for future research. The experimental results demonstrate that the OMAD-6 dataset effectively represents various real-world scenarios, which pose considerable challenges for current instance segmentation algorithms. Our evaluation confirms that the OMAD-6 dataset has the potential to improve offshore mariculture identification. Notably, the QueryInst and PointRend algorithms distinguished themselves as top performers on the OMAD-6 dataset, robustly identifying offshore mariculture facilities even against complex environmental backgrounds. The dataset's ongoing development and application will play a pivotal role in future offshore mariculture identification and management. Full article
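Each instance carries both a polygon and a horizontal bounding box annotation, and the reported confusion matrices count a prediction as a match only when its mask IoU with the ground truth reaches 0.75. The sketch below illustrates these two underlying operations, deriving a horizontal box from a polygon and computing mask IoU; the array shapes and toy masks are illustrative assumptions, not the dataset's actual storage format.

```python
import numpy as np

def polygon_to_hbb(polygon):
    """Derive a horizontal bounding box (x_min, y_min, x_max, y_max)
    from a polygon given as an (N, 2) array of (x, y) vertices."""
    pts = np.asarray(polygon, dtype=float)
    return pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max()

def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two boolean masks of equal shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

# Toy example: two overlapping square masks on a 100 x 100 grid
a = np.zeros((100, 100), dtype=bool); a[10:60, 10:60] = True
b = np.zeros((100, 100), dtype=bool); b[30:80, 30:80] = True
print(f"mask IoU = {mask_iou(a, b):.3f}")  # a prediction counts as correct only at IoU >= 0.75
```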
Show Figures

Figure 1. Samples of each category in the OMAD-6 dataset. (a–f) are in the order of TCC, DWCC, FRC, LC, RC, and BC.
Figure 2. Visualization of representative OMAD-6 dataset annotations; (a–f) are in the order of TCC, DWCC, FRC, LC, RC, and BC.
Figure 3. Geographic sources of the OMAD-6 dataset: coastal China, Chile, Norway, and the Nile Basin, Egypt.
Figure 4. High inter-class similarity. (a–c) are in the order of BC vs. FRC, RC vs. LC, and TCC vs. FRC.
Figure 5. Object size distribution of each class.
Figure 6. High intra-class diversity in the dataset. (a–f) are in the order of TCC, DWCC, FRC, LC, RC, and BC.
Figure 7. The mask (IoU = 0.75) confusion matrices of the three instance segmentation methods on the OMAD-6 dataset. (a–c) are in the order of QueryInst, PointRend, and CondInst.
Figure 8. Detection results from the three methods: (a–d) are in the order of Ground Truth, QueryInst, PointRend, and CondInst.
Figure 9. Comparison of misdetections: (b) QueryInst and (d) PointRend vs. (a,c) Ground Truth.
21 pages, 3849 KiB  
Article
CCW-YOLO: A Modified YOLOv5s Network for Pedestrian Detection in Complex Traffic Scenes
by Zhaodi Wang, Shuqiang Yang, Huafeng Qin, Yike Liu and Jinyan Ding
Information 2024, 15(12), 762; https://doi.org/10.3390/info15120762 - 1 Dec 2024
Viewed by 737
Abstract
In traffic scenes, pedestrian target detection faces significant issues of misdetection and omission due to factors such as crowd density and obstacle occlusion. To address these challenges and enhance detection accuracy, we propose an improved CCW-YOLO algorithm. The algorithm first introduces a lightweight [...] Read more.
In traffic scenes, pedestrian detection suffers from significant misdetections and omissions caused by factors such as crowd density and occlusion by obstacles. To address these challenges and improve detection accuracy, we propose an improved CCW-YOLO algorithm. The algorithm first introduces a lightweight convolutional layer based on GhostConv and incorporates an enhanced C2f module to improve the network's detection performance. It also integrates the Coordinate Attention module to better capture key points of the targets. Next, the CIoU bounding box loss at the output of YOLOv5 is replaced with the WiseIoU loss to improve adaptability to diverse detection scenarios and further increase accuracy. Finally, we develop a pedestrian count detection system using PyQt5 to enhance human–computer interaction. Experimental results on the public INRIA dataset show that our algorithm achieved a detection accuracy of 98.4%, a 10.1% improvement over the original YOLOv5s algorithm. This advancement significantly enhances the detection of small objects in images and effectively addresses misdetection and omission issues in complex environments. These findings have important practical implications for ensuring traffic safety and optimizing traffic flow. Full article
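Since GhostConv is the building block behind the lightweight layer mentioned above, a minimal PyTorch sketch of the idea is given below: a regular convolution produces a few intrinsic feature maps, and a cheap depthwise convolution generates the remaining "ghost" maps from them. The kernel sizes, SiLU activation, and 50/50 channel split are assumptions in the style of common YOLOv5 Ghost variants, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a regular conv produces the 'intrinsic' feature maps,
    and a cheap depthwise conv generates the remaining 'ghost' maps from them."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_hidden = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())
        self.cheap = nn.Sequential(
            nn.Conv2d(c_hidden, c_hidden, 5, 1, 2, groups=c_hidden, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 64, 80, 80)
print(GhostConv(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```

The cheap depthwise branch is what makes the layer lightweight: roughly half of the output channels are produced without a full dense convolution.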
Show Figures

Graphical abstract
Figure 1. YOLOv5s network structure.
Figure 2. Improved CCW-YOLO network structure.
Figure 3. Comparison of ghost convolution and normal convolution. (a) Normal convolution; (b) ghost convolution.
Figure 4. Comparison of C3 and C2f network structures.
Figure 5. Coordinate attention structure.
Figure 6. WiseIoU calculation block diagram.
Figure 7. Comparison of pedestrian detection results between the original and improved algorithms: (a) YOLOv5s; (b) CCW-YOLO.
Figure 8. Pedestrian count detection system interface: (a) time period 1; (b) time period 2.
17 pages, 6810 KiB  
Article
Breast Tumor Detection and Diagnosis Using an Improved Faster R-CNN in DCE-MRI
by Haitian Gui, Han Jiao, Li Li, Xinhua Jiang, Tao Su and Zhiyong Pang
Bioengineering 2024, 11(12), 1217; https://doi.org/10.3390/bioengineering11121217 - 1 Dec 2024
Viewed by 598
Abstract
AI-based breast cancer detection can improve the sensitivity and specificity of detection, especially for small lesions, which has clinical value in realizing early detection and treatment so as to reduce mortality. The two-stage detection network performs well; however, it adopts an imprecise ROI [...] Read more.
AI-based breast cancer detection can improve the sensitivity and specificity of detection, especially for small lesions, which has clinical value in enabling early detection and treatment and thereby reducing mortality. Two-stage detection networks perform well; however, they adopt an imprecise ROI during classification, which can easily include surrounding tumor tissue. Additionally, fuzzy noise is a significant contributor to false positives. We adopted Faster R-CNN as the architecture, introduced RoI Align to minimize quantization errors and a feature pyramid network (FPN) to extract features at different resolutions, and added a bounding box quadratic regression feature map extraction network and three convolutional layers to reduce interference from the tissue surrounding the tumor and to extract more accurate, deeper feature maps. Our approach outperformed Faster R-CNN, Mask R-CNN, and YOLOv9 in breast cancer detection across 485 internal cases, achieving superior mAP, sensitivity, and false positive rate ((0.752, 0.950, 0.133) vs. (0.711, 0.950, 0.200) vs. (0.718, 0.880, 0.120) vs. (0.658, 0.680, 0.405)), which represents a 38.5% reduction in false positives compared to manual detection. Additionally, on a public dataset of 220 cases, our model also demonstrated the best performance. It showed improved sensitivity and specificity, effectively assisting doctors in diagnosing cancer. Full article
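The RoI Align step is what removes the quantization error of classical RoI pooling: box coordinates are kept fractional and the feature map is sampled with bilinear interpolation. Below is a minimal sketch using torchvision's roi_align on a single hypothetical FPN level; the feature map size, stride, and box coordinates are illustrative and do not come from the paper.

```python
import torch
from torchvision.ops import roi_align

# One FPN level: batch of 1 feature map, 256 channels, stride 16 w.r.t. the input image
features = torch.randn(1, 256, 64, 64)

# Boxes in image coordinates as (batch_index, x1, y1, x2, y2)
boxes = torch.tensor([[0, 100.0, 120.0, 260.0, 300.0],
                      [0, 400.0, 80.0, 520.0, 220.0]])

# RoI Align samples the feature map with bilinear interpolation instead of
# quantizing box coordinates, which is the step that reduces quantization error
pooled = roi_align(features, boxes, output_size=(7, 7),
                   spatial_scale=1.0 / 16, sampling_ratio=2, aligned=True)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])
```

Each pooled 7 x 7 feature would then feed the downstream classification and box-refinement heads.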
Show Figures

Figure 1. Flowchart of the study procedure.
Figure 2. The architecture of our proposed model, BC R-CNN.
Figure 3. PDN structure.
Figure 4. Four-quadrant location.
Figure 5. Background noise reduction: (a) MRI before noise reduction; (b) MRI after noise reduction; (c) segmented breast from the original MRI; (d) segmented breast from the noise-reduced MRI.
Figure 6. U-Net++ breast edge segmentation: (a) sagittal breast MRI and single-breast MRI at the axial plane; (b) masks; (c) segmented breasts.
Figure 7. AUC performance comparison of different models: (a) internal dataset; (b) public dataset.
Figure 8. Breast tumor localization and diagnosis comparison between Faster R-CNN and our proposed model: (a1,b1,c1) detected by Faster R-CNN; (a2,b2,c2) detected by our proposed model. (a1,b1) false positives; (c1) diagnosed with a lower score; (c2) diagnosed with a higher score.
27 pages, 4935 KiB  
Article
Diverse Dataset for Eyeglasses Detection: Extending the Flickr-Faces-HQ (FFHQ) Dataset
by Dalius Matuzevičius
Sensors 2024, 24(23), 7697; https://doi.org/10.3390/s24237697 - 1 Dec 2024
Viewed by 539
Abstract
Facial analysis is an important area of research in computer vision and machine learning, with applications spanning security, healthcare, and user interaction systems. The data-centric AI approach emphasizes the importance of high-quality, diverse, and well-annotated datasets in driving advancements in this field. However, [...] Read more.
Facial analysis is an important area of research in computer vision and machine learning, with applications spanning security, healthcare, and user interaction systems. The data-centric AI approach emphasizes the importance of high-quality, diverse, and well-annotated datasets in driving advancements in this field. However, current facial datasets, such as Flickr-Faces-HQ (FFHQ), lack detailed annotations for detecting facial accessories, particularly eyeglasses. This work addresses that limitation by extending the FFHQ dataset with precise bounding box annotations for eyeglasses detection, enhancing its utility for data-centric AI applications. The extended dataset comprises 70,000 images, including over 16,000 containing eyewear, and exceeds the CelebAMask-HQ dataset in size and diversity. A semi-automated protocol was employed to efficiently generate accurate bounding box annotations, minimizing the need for extensive manual labeling. The enriched dataset serves as a valuable resource for training and benchmarking eyewear detection models. Baseline benchmark results for eyeglasses detection are also presented using deep learning methods, including YOLOv8 and MobileNetV3. The evaluation, conducted through cross-dataset validation, demonstrated the robustness of models trained on the extended FFHQ dataset, which outperformed models trained on the existing CelebAMask-HQ alternative. The extended dataset has been made publicly available and is expected to support future research and development in eyewear detection, contributing to advancements in facial analysis and related fields. Full article
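For the YOLOv8 baseline, training and cross-dataset validation can be driven through the Ultralytics API; the sketch below shows the general shape of such a run. The dataset configuration filenames ("ffhq_eyeglasses.yaml", "celebamask_hq_eyeglasses.yaml") are hypothetical placeholders for configs pointing at the respective images and bounding box labels, and the training hyperparameters are illustrative, not the values used in the paper.

```python
# Baseline sketch using the Ultralytics YOLOv8 API (pip install ultralytics).
from ultralytics import YOLO

# Hypothetical dataset configs describing image folders and YOLO-format
# eyeglasses bounding box labels for each dataset.
FFHQ_CFG = "ffhq_eyeglasses.yaml"
CELEBA_CFG = "celebamask_hq_eyeglasses.yaml"

model = YOLO("yolov8n.pt")                         # pretrained nano detector
model.train(data=FFHQ_CFG, epochs=100, imgsz=640)  # train on the extended FFHQ labels

# Cross-dataset validation: evaluate the FFHQ-trained model on the
# CelebAMask-HQ-derived labels to probe robustness.
metrics = model.val(data=CELEBA_CFG)
print(metrics.box.map50)                           # mAP@0.5 for the eyeglasses class
```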
Show Figures

Figure 1. Heatmaps of the eyewear bounding box (BBox) locations in the Flickr-Faces-HQ (FFHQ) dataset (BBox labels are the result of this research) (a,c) and the CelebAMask-HQ dataset (b,d): (a,b) 2D heatmaps; (c,d) 3D heatmaps. Both heatmaps show the bounding box locations of glasses, where the average images of individuals wearing glasses from each dataset are superimposed. The heatmaps are represented in log-scale intensity and show that the FFHQ dataset exhibits greater variability in the locations of glasses compared to the CelebAMask-HQ dataset. CelebAMask-HQ originally contained glasses segmentation masks, which were used to generate its bounding box labels.
Figure 2. Distributions of the eyeglasses bounding box (BBox) area. Plots (a,b) show the distribution of eyeglasses BBox areas in the FFHQ dataset, while (c) represents the distribution for the CelebAMask-HQ dataset. Plot (b) differs from (a) in that it illustrates the BBox area distribution for a subset of the FFHQ dataset used in the baseline model experiments, excluding extremely small glasses objects. BBox areas were computed for 1024 × 1024 resolution images.
Figure 3. Co-occurrence plot of the largest and second-largest eyeglasses object areas in the same image. The plot visualizes the relationship between the areas of the largest and second-largest eyeglasses objects within images that contain more than one instance of eyeglasses. This provides insights into the relative sizes of multiple glasses in the same image, offering a deeper understanding of the distribution of object sizes in such cases.
Figure 4. Confusion matrices showing the discrepancies in glasses attributes between the FFHQ dataset extensions. The figure presents confusion matrices that compare the glasses attribute (glasses or no glasses) across three FFHQ dataset extensions: Glasses Detection, Aging, and Features. Plots (a,b) compare Glasses Detection with Aging, while (c,d) compare Glasses Detection with Features. Plot (e) compares Aging with Features. In plots (b,d), the Glasses Detection subset without very small glasses objects is used, as the Aging and Features datasets typically label glasses on the central face in the image. For comparisons involving the Features dataset, only instances with available Features labels are included. The green font or background represents correct results; the red font or background represents incorrect results.
Figure 5. Confusion matrix showing the gender attribute discrepancies between the Aging and Features FFHQ dataset extensions. The confusion matrix illustrates the discrepancies in gender labeling (female or male) between the Aging and Features extensions, considering only images that have glasses labels in the Glasses Detection extension. The green font or background represents correct results; the red font or background represents incorrect results.
Figure 6. Age distributions in the Aging and Features extensions of the FFHQ dataset. The plots present the age distributions across the Aging and Features extensions, considering only images with glasses labels from the Glasses Detection extension (except for (d)): (a) shows the age distribution in the Aging extension, with labels grouped according to the original intervals in the dataset; (b) presents the age distribution in the Features extension, with labels grouped similarly to those in the Aging extension; (c) displays the age distribution in the Features extension with labels grouped into equal intervals; and (d,e) are scatter plots comparing ages between the Aging and Features datasets, with (d) including all images and (e) focusing only on images where glasses are present in the Glasses Detection extension (age binning was based on the groups provided by the Aging dataset, with noise added to the bin IDs to prevent instance overlap in the plots).
Figure 7. Scatter plots comparing the head pose values, yaw (a), pitch (b), and roll (c), between the Aging and Features extensions of the FFHQ dataset. Only images with glasses labels from the Glasses Detection extension were included in this analysis. The plots provide insights into the alignment and discrepancies in the head pose annotations across the two dataset extensions.
Figure 8. Flowchart of the semi-manual labeling protocol for eyeglasses detection. The protocol iteratively refines bounding box (BBox) annotations by integrating outputs from multiple detection models, including DETIC, GroundingDINO, and OWLv2, with additional dataset features. The process ensures a systematic progression from simpler cases requiring minimal manual correction to more complex cases necessitating greater human intervention. Iterative refinement incorporates model consensus and additional data sources, and it is supported by thresholds designed to select a subset of images for manual review at each step.
Figure 9. Overall block diagram of the experiment.