Search Results (6,003)

Search Parameters: Keywords = computer architecture

19 pages, 4020 KiB  
Article
Power Converter Fault Detection Using MLCA–SpikingShuffleNet
by Li Wang, Feiyang Zhu, Fengfan Jiang and Yuwei Yang
World Electr. Veh. J. 2025, 16(1), 36; https://doi.org/10.3390/wevj16010036 (registering DOI) - 12 Jan 2025
Abstract
With the widespread adoption of electric vehicles, the power converter, as a key component, plays a crucial role. Traditional fault detection methods often face challenges in real-time performance and computational efficiency, making it difficult to meet the demands of electric vehicle power converters for efficient and accurate fault diagnosis. To address this challenge, this paper proposes a novel fault detection model—SpikingShuffleNet. This paper first designs an efficient SpikingShuffle Unit that integrates grouped convolutions and channel shuffle techniques, effectively reducing the model’s computational complexity by optimizing feature extraction and channel interaction. Next, by appropriately stacking SpikingShuffle Units and refining the network architecture, a complete lightweight diagnostic network is constructed for real-time fault detection in electric vehicle power converters. Finally, the Mixed Local Channel Attention mechanism is introduced to address the potential limitations in feature representation caused by grouped convolutions, further enhancing fault detection accuracy and robustness by balancing local detail preservation and global feature integration. Experimental results show that SpikingShuffleNet exhibits excellent accuracy and robustness in the fault detection task for power converters, fulfilling the real-time fault diagnosis requirements for low-power embedded devices. Full article
(This article belongs to the Special Issue Power Electronics for Electric Vehicles)
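The unit described above combines grouped convolutions, channel shuffling, and spiking activations. Below is a minimal PyTorch sketch of that combination, assuming illustrative channel counts, group size, and a hard-threshold LIF-style activation; it is not the authors' MLCA–SpikingShuffleNet implementation.

```python
# Hypothetical sketch of a grouped-convolution + channel-shuffle block with a simple
# spiking activation; channel count, group count, and threshold are assumptions.
import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so grouped convolutions can mix information."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


class LIFSpike(nn.Module):
    """Very simplified single-timestep integrate-and-fire activation."""
    def __init__(self, threshold: float = 1.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x):
        # Hard threshold for illustration; real SNN training uses surrogate gradients.
        return (x >= self.threshold).float()


class SpikingShuffleUnit(nn.Module):
    def __init__(self, channels: int = 64, groups: int = 4):
        super().__init__()
        self.gconv1 = nn.Conv2d(channels, channels, 1, groups=groups, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.dwconv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.gconv2 = nn.Conv2d(channels, channels, 1, groups=groups, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.spike = LIFSpike()
        self.groups = groups

    def forward(self, x):
        out = self.spike(self.bn1(self.gconv1(x)))
        out = channel_shuffle(out, self.groups)   # recover cross-group interaction
        out = self.bn2(self.dwconv(out))
        out = self.spike(self.bn3(self.gconv2(out)))
        return out + x                            # residual connection


x = torch.randn(2, 64, 32, 32)
print(SpikingShuffleUnit()(x).shape)  # torch.Size([2, 64, 32, 32])
```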
32 pages, 1290 KiB  
Review
Vision Transformers for Image Classification: A Comparative Survey
by Yaoli Wang, Yaojun Deng, Yuanjin Zheng, Pratik Chattopadhyay and Lipo Wang
Technologies 2025, 13(1), 32; https://doi.org/10.3390/technologies13010032 (registering DOI) - 12 Jan 2025
Viewed by 76
Abstract
Transformers were initially introduced for natural language processing, leveraging the self-attention mechanism. They require minimal inductive biases in their design and can function effectively as set-based architectures. Additionally, transformers excel at capturing long-range dependencies and enabling parallel processing, which allows them to outperform traditional models, such as long short-term memory (LSTM) networks, on sequence-based tasks. In recent years, transformers have been widely adopted in computer vision, driving remarkable advancements in the field. Previous surveys have provided overviews of transformer applications across various computer vision tasks, such as object detection, activity recognition, and image enhancement. In this survey, we focus specifically on image classification. We begin with an introduction to the fundamental concepts of transformers and highlight the first successful Vision Transformer (ViT). Building on the ViT, we review subsequent improvements and optimizations introduced for image classification tasks. We then compare the strengths and limitations of these transformer-based models against classic convolutional neural networks (CNNs) through experiments. Finally, we explore key challenges and potential future directions for image classification transformers. Full article
(This article belongs to the Collection Review Papers Collection for Advanced Technologies)
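As a reminder of the core ViT recipe the survey reviews (patch embedding, a class token, a transformer encoder, and a classification head), here is a compact PyTorch sketch; the image size, patch size, and depth are illustrative choices only.

```python
# Minimal ViT-style classifier sketch: patchify, embed, encode, classify from [CLS].
import torch
import torch.nn as nn


class TinyViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=192, depth=6, heads=3, num_classes=10):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return self.head(tokens[:, 0])                            # classify from [CLS]


logits = TinyViT()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```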
24 pages, 2827 KiB  
Article
RWA-BFT: Reputation-Weighted Asynchronous BFT for Large-Scale IoT
by Guanwei Jia, Zhaoyu Shen, Hongye Sun, Jingbo Xin and Dongyu Wang
Sensors 2025, 25(2), 413; https://doi.org/10.3390/s25020413 (registering DOI) - 12 Jan 2025
Viewed by 138
Abstract
This paper introduces RWA-BFT, a reputation-weighted asynchronous Byzantine Fault Tolerance (BFT) consensus algorithm designed to address the scalability and performance challenges of blockchain systems in large-scale IoT scenarios. Traditional centralized IoT architectures often face issues such as single points of failure and insufficient reliability, while blockchain, with its decentralized and tamper-resistant properties, offers a promising solution. However, existing blockchain consensus mechanisms struggle to meet the high throughput, low latency, and scalability demands of IoT applications. To address these limitations, RWA-BFT adopts a two-layer blockchain architecture; the first layer leverages reputation-based filtering to reduce computational complexity by excluding low-reputation nodes, while the second layer employs an asynchronous consensus mechanism to ensure efficient and secure communication among high-reputation nodes, even under network delays. This dual-layer design significantly improves performance, achieving higher throughput, lower latency, and enhanced scalability, while maintaining strong fault tolerance even in the presence of a substantial proportion of malicious nodes. Experimental results demonstrate that RWA-BFT outperforms HB-BFT and PBFT algorithms, making it a scalable and secure blockchain solution for decentralized IoT applications. Full article
(This article belongs to the Section Internet of Things)
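A rough sketch of the two-layer idea summarized above: filter nodes by reputation first, then run consensus among the surviving committee. The reputation threshold and the 2n/3 quorum rule below are generic BFT conventions used for illustration, not the paper's exact formulas.

```python
# Illustrative first-layer filtering for a reputation-weighted BFT committee.
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    reputation: float  # e.g., decayed history of valid votes vs. detected misbehaviour


def select_consensus_nodes(nodes, threshold=0.6):
    """Layer 1: keep only nodes whose reputation meets the threshold."""
    return [n for n in nodes if n.reputation >= threshold]


def quorum_reached(votes, committee_size):
    """Classic BFT rule: with n = 3f + 1 nodes, a decision needs more than 2n/3 votes."""
    return votes > (2 * committee_size) // 3


nodes = [Node(i, rep) for i, rep in enumerate([0.9, 0.82, 0.3, 0.75, 0.1, 0.95, 0.7])]
committee = select_consensus_nodes(nodes)
print([n.node_id for n in committee])                      # high-reputation committee
print(quorum_reached(votes=4, committee_size=len(committee)))
```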
Figure 1: Schematic diagram of the block structure.
Figure 2: Schematic diagram of the system model.
Figure 3: Schematic diagram of Reputation-Weighted Asynchronous BFT.
Figure 4: Consensus Node Election Model Diagram.
Figure 5: Throughput comparison between RWA-BFT, PBFT, and HB-BFT.
Figure 6: Comparison of the latency between RWA-BFT, PBFT, and HB-BFT when the number of nodes continues to increase.
Figure 7: Comparison of the latency between RWA-BFT, PBFT, and HB-BFT when the number of nodes continues to increase under different proportions of malicious nodes.
22 pages, 6310 KiB  
Article
The Multivariate Fusion Distribution Characteristics in Physician Demand Prediction
by Jiazhen Zhang, Wei Chen and Xiulai Wang
Mathematics 2025, 13(2), 233; https://doi.org/10.3390/math13020233 (registering DOI) - 11 Jan 2025
Viewed by 251
Abstract
Aiming at the optimization of the big data infrastructure in China’s healthcare system, this study proposes a lightweight time series physician demand prediction model, which is especially suitable for the field of telemedicine. The model incorporates multi-head attention mechanisms and generates statistical information, which significantly improves the ability to process nonlinear data, adapt to different data sources, improve the computational efficiency, and process high-dimensional features. By combining variational autoencoders and LSTM units, the model can effectively capture complex nonlinear relationships and long-term dependencies, and the multi-head attention mechanism overcomes the limitations of traditional algorithms. This lightweight architecture design not only improves the computational efficiency but also enhances the stability in high-dimensional data processing and reduces feature redundancy by combining the normalization process with statistics. The experimental results show that the model has wide applicability and excellent performance in a telemedicine consulting service system. Full article
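A minimal PyTorch sketch of the general CNN + multi-head attention + LSTM pattern the abstract describes, with assumed feature counts, window length, and head count; it is not the authors' model.

```python
# Sketch of a CNN + multi-head attention + LSTM time-series regressor for demand forecasting.
import torch
import torch.nn as nn


class DemandForecaster(nn.Module):
    def __init__(self, n_features=8, hidden=64, heads=4):
        super().__init__()
        self.conv = nn.Conv1d(n_features, hidden, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(hidden, num_heads=heads, batch_first=True)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, time, features)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.attn(h, h, h)              # self-attention over time steps
        h, _ = self.lstm(h)
        return self.out(h[:, -1])              # predict next-step physician demand


x = torch.randn(16, 7, 8)                      # 16 samples, 7 past days, 8 features
print(DemandForecaster()(x).shape)             # torch.Size([16, 1])
```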
Figure 1: Theoretical logic structure.
Figure 2: Lightweight medical demand forecasting model embedded with enhanced mixed factor distribution characteristics.
Figure 3: Generation model for statistical parameters of mixed distribution based on VAE.
Figure 4: Convolutional stacked LSTM network architecture.
Figure 5: Prediction results of CNN–Multi-Head Attention–Multi-LSTM (heads = 2, dims = 90).
Figure 6: Prediction results of CNN–Multi-Head Attention–Multi-LSTM (heads = 3, dims = 90).
Figure 7: Prediction results of CNN–Multi-Head Attention–Multi-LSTM (heads = 4, dims = 120).
Figure 8: Prediction results of CNN–Multi-Head Attention–Multi-LSTM (heads = 5, dims = 100).
Figure 9: Comparison of training errors of undetermined parameter combinations.
Figure 10: Performance variations in CNN–Multi-Head Attention–Multi-LSTM due to feature fusion.
Figure 11: Comparative analysis of performance across multiple models.
Figure 12: Bandwidth allocation comparison.
Figure 13: Dataset visualization.
Figure 14: Presentation of model generalization performance metrics.
20 pages, 1849 KiB  
Article
Speech Emotion Recognition Model Based on Joint Modeling of Discrete and Dimensional Emotion Representation
by John Lorenzo Bautista and Hyun Soon Shin
Appl. Sci. 2025, 15(2), 623; https://doi.org/10.3390/app15020623 - 10 Jan 2025
Viewed by 247
Abstract
This paper introduces a novel joint model architecture for Speech Emotion Recognition (SER) that integrates both discrete and dimensional emotional representations, allowing for the simultaneous training of classification and regression tasks to improve the comprehensiveness and interpretability of emotion recognition. By employing a joint loss function that combines categorical and regression losses, the model ensures balanced optimization across tasks, with experiments exploring various weighting schemes using a tunable parameter to adjust task importance. Two adaptive weight balancing schemes, Dynamic Weighting and Joint Weighting, further enhance performance by dynamically adjusting task weights based on optimization progress and ensuring balanced emotion representation during backpropagation. The architecture employs parallel feature extraction through independent encoders, designed to capture unique features from multiple modalities, including Mel-frequency Cepstral Coefficients (MFCC), Short-term Features (STF), Mel-spectrograms, and raw audio signals. Additionally, pre-trained models such as Wav2Vec 2.0 and HuBERT are integrated to leverage their robust latent features. The inclusion of self-attention and co-attention mechanisms allows the model to capture relationships between input modalities and interdependencies among features, further improving its interpretability and integration capabilities. Experiments conducted on the IEMOCAP dataset using a leave-one-subject-out approach demonstrate the model’s effectiveness, with results showing a 1–2% accuracy improvement over classification-only models. The optimal configuration, incorporating the joint architecture, dynamic weighting, and parallel processing of multimodal features, achieves a weighted accuracy of 72.66%, an unweighted accuracy of 73.22%, and a mean Concordance Correlation Coefficient (CCC) of 0.3717. These results validate the effectiveness of the proposed joint model architecture and adaptive balancing weight schemes in improving SER performance. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
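The joint objective described above can be sketched as a weighted sum of a categorical loss and a dimensional-regression loss. The 1 - CCC regression term and the fixed alpha below are illustrative assumptions rather than the paper's exact weighting schemes.

```python
# Sketch of a joint classification + dimensional-regression loss for SER.
import torch
import torch.nn.functional as F


def ccc(pred, target, eps=1e-8):
    """Concordance Correlation Coefficient for one dimensional-emotion axis."""
    pm, tm = pred.mean(), target.mean()
    pv, tv = pred.var(unbiased=False), target.var(unbiased=False)
    cov = ((pred - pm) * (target - tm)).mean()
    return (2 * cov) / (pv + tv + (pm - tm) ** 2 + eps)


def joint_loss(class_logits, class_labels, dim_preds, dim_targets, alpha=0.5):
    """alpha * cross-entropy over discrete emotions + (1 - alpha) * mean (1 - CCC)."""
    cls_loss = F.cross_entropy(class_logits, class_labels)
    reg_loss = torch.stack([1 - ccc(dim_preds[:, i], dim_targets[:, i])
                            for i in range(dim_preds.size(1))]).mean()
    return alpha * cls_loss + (1 - alpha) * reg_loss


logits = torch.randn(8, 4)                     # 4 discrete emotion classes
labels = torch.randint(0, 4, (8,))
dim_preds, dim_targets = torch.randn(8, 3), torch.randn(8, 3)   # valence/arousal/dominance
print(joint_loss(logits, labels, dim_preds, dim_targets))
```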
Figure 1: Plutchik's Wheel of Emotion.
Figure 2: (a) Russell's Circumplex Model, (b) Mehrabian's PAD Model.
Figure 3: Overview of the proposed joint model architecture.
Figure 4: Joint model block diagram.
20 pages, 279 KiB  
Article
A Survey on Hardware Accelerators for Large Language Models
by Christoforos Kachris
Appl. Sci. 2025, 15(2), 586; https://doi.org/10.3390/app15020586 - 9 Jan 2025
Viewed by 298
Abstract
Large language models (LLMs) have emerged as powerful tools for natural language processing tasks, revolutionizing the field with their ability to understand and generate human-like text. As the demand for more sophisticated LLMs continues to grow, there is a pressing need to address the computational challenges associated with their scale and complexity. This paper presents a comprehensive survey of hardware accelerators designed to enhance the performance and energy efficiency of large language models. By examining a diverse range of accelerators, including GPUs, FPGAs, and custom-designed architectures, we explore the landscape of hardware solutions tailored to meet the unique computational demands of LLMs. The survey encompasses an in-depth analysis of architecture, performance metrics, and energy efficiency considerations, providing valuable insights for researchers, engineers, and decision-makers aiming to optimize the deployment of LLMs in real-world applications. Full article
(This article belongs to the Special Issue Applied Intelligence in Natural Language Processing)
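The survey compares accelerators along axes such as speedup and energy efficiency (cf. Figure 1). The toy snippet below only illustrates how those two metrics are typically derived from throughput and power measurements; all numbers are made-up placeholders, not results from the surveyed papers.

```python
# Toy comparison of throughput speedup and energy efficiency (tokens per joule).
measurements = {
    # accelerator: (tokens_per_second, average_power_watts) -- illustrative values only
    "baseline_cpu": (20.0, 150.0),
    "gpu": (900.0, 300.0),
    "fpga": (400.0, 60.0),
    "custom_asic": (1500.0, 80.0),
}

base_tps, _ = measurements["baseline_cpu"]
for name, (tps, watts) in measurements.items():
    speedup = tps / base_tps
    tokens_per_joule = tps / watts
    print(f"{name:12s} speedup = {speedup:6.1f}x  efficiency = {tokens_per_joule:6.2f} tokens/J")
```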
Figure 1: Speedup versus energy efficiency.
23 pages, 1432 KiB  
Article
ASDNet: An Efficient Self-Supervised Convolutional Network for Anomalous Sound Detection
by Dewei Kong, Guoshun Yuan, Hongjiang Yu, Shuai Wang and Bo Zhang
Appl. Sci. 2025, 15(2), 584; https://doi.org/10.3390/app15020584 - 9 Jan 2025
Viewed by 271
Abstract
Anomalous Sound Detection (ASD) is crucial for ensuring industrial equipment safety and enhancing production efficiency. However, existing methods, while pursuing high detection accuracy, are often associated with high computational complexity, making them unsuitable for resource-constrained environments. This study proposes an efficient self-supervised ASD framework that integrates spectral features, lightweight neural networks, and various anomaly scoring methods. Unlike traditional Log-Mel features, spectral features retain richer frequency domain details, providing high-quality inputs that enhance detection accuracy. The framework includes two network architectures: the lightweight ASDNet, optimized for resource-limited scenarios, and SpecMFN, which combines SpecNet and MobileFaceNet for advanced feature extraction and classification. These architectures employ various anomaly scoring methods, enabling complex decision boundaries to effectively detect diverse anomalous patterns. Experimental results demonstrate that ASDNet achieves an average AUC of 94.42% and a pAUC of 87.18%, outperforming existing methods by 6.75% and 9.34%, respectively, while significantly reducing FLOPs (85.4 M, a 93.81% reduction) and parameters (0.51 M, a 41.38% reduction). SpecMFN achieves AUC and pAUC values of 94.36% and 88.60%, respectively, with FLOPs reduced by 86.6%. These results highlight the framework’s ability to balance performance and computational efficiency, making it a robust and practical solution for ASD tasks in industrial and resource-constrained environments. Full article
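A minimal sketch of the self-supervised recipe outlined above: a small 1D-convolutional classifier trained on an auxiliary task (here, machine-ID classification), with an anomaly score taken from how poorly a clip matches its own class. Layer sizes and the negative log-probability score are assumptions, not the paper's ASDNet or its scoring methods.

```python
# Sketch of a lightweight 1D-conv classifier and a simple anomaly score for ASD.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyASDNet(nn.Module):
    def __init__(self, n_bins=128, n_ids=6):
        super().__init__()
        layers, ch = [], n_bins
        for out_ch in (128, 128, 64):                 # a few 1D conv + ReLU stages
            layers += [nn.Conv1d(ch, out_ch, 3, padding=1), nn.ReLU()]
            ch = out_ch
        self.encoder = nn.Sequential(*layers)
        self.head = nn.Linear(ch, n_ids)

    def forward(self, spec):                          # spec: (batch, freq_bins, frames)
        z = self.encoder(spec).mean(dim=-1)           # global average pool over time
        return self.head(z)


def anomaly_score(model, spec, machine_id):
    """Higher score = the clip looks less like its own machine ID = more anomalous."""
    with torch.no_grad():
        logp = F.log_softmax(model(spec), dim=-1)
    return -logp[:, machine_id]


model = TinyASDNet()
print(anomaly_score(model, torch.randn(4, 128, 64), machine_id=2))
```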
Figure 1: The framework of the proposed method for anomalous sound detection.
Figure 2: Architectural designs of the proposed classification networks for ASD. (a) SpecMFN: combines SpecNet for further spectral feature processing and MobileFaceNet for classification, optimized for high-sensitivity detection. (b) ASDNet: a lightweight network tailored for resource-constrained environments, featuring eight layers of 1D convolution and ReLU activation to efficiently extract latent features while maintaining a balance between performance and efficiency.
Figure 3: Visualization of K-Means clustering results for valve_id_04 (k = 4).
Figure 4: Comparison of the model size and complexity (number of parameters and FLOPs), where the model size is illustrated by the size of the circles, and the number of FLOPs is presented in logarithmic scale.
Figure 5: Comparison of AUC and pAUC for different distance metrics used in the LOF algorithm. The y-axis represents the performance percentage, and the x-axis represents the different distance metrics.
20 pages, 7167 KiB  
Article
Accelerating Deep Learning-Based Morphological Biometric Recognition with Field-Programmable Gate Arrays
by Nourhan Zayed, Nahed Tawfik, Mervat M. A. Mahmoud, Ahmed Fawzy, Young-Im Cho and Mohamed S. Abdallah
AI 2025, 6(1), 8; https://doi.org/10.3390/ai6010008 - 9 Jan 2025
Viewed by 361
Abstract
Convolutional neural networks (CNNs) are increasingly recognized as an important and potent artificial intelligence approach, widely employed in many computer vision applications, such as facial recognition. Their importance resides in their capacity to acquire hierarchical features, which is essential for recognizing complex patterns. Nevertheless, the intricate architectural design of CNNs leads to significant computing requirements. To tackle these issues, it is essential to construct a system based on field-programmable gate arrays (FPGAs) to speed up CNNs. FPGAs provide fast development capabilities, energy efficiency, decreased latency, and advanced reconfigurability. A facial recognition solution that leverages deep learning and is subsequently deployed on an FPGA platform is proposed. The system detects whether a person has the necessary authorization to enter/access a place. The FPGA is responsible for processing this system with utmost security and without any internet connectivity. Several facial recognition networks are evaluated, including AlexNet, ResNet, and VGG-16. The findings show that the GoogLeNet network is the best fit due to its lower computational resource requirements, speed, and accuracy. The system was deployed on three hardware kits to appraise the performance of different programming approaches in terms of accuracy, latency, cost, and power consumption. The software programming on the Raspberry Pi-3B kit had a recognition accuracy of around 70–75% and relied on a stable internet connection for processing. This dependency on internet connectivity increases bandwidth consumption and fails to meet the required security criteria, contrary to ZYBO-Z7 board hardware programming. Nevertheless, the hardware/software co-design on the PYNQ-Z2 board achieved an accuracy rate of 85% to 87%. It operates independently of an internet connection, making it a standalone system and saving costs. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
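The software side of such a pipeline usually starts from a pretrained backbone fine-tuned on the face-identity dataset before hardware deployment. The sketch below uses torchvision's ResNet18 for brevity (the paper found GoogLeNet the best fit) and assumes a 40-identity head matching the AT&T dataset; the FPGA/PYNQ deployment step is not shown.

```python
# Transfer-learning sketch: freeze a pretrained backbone, train a new identity head.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 40)      # 40 subjects in the AT&T face dataset

# Freeze the feature extractor; train only the new classification head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 40, (8,))  # dummy batch
model.train()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```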
Figure 1: AT&T dataset (downloaded from https://www.kaggle.com/datasets/kasikrit/att-database-of-faces; accessed on 13 October 2024).
Figure 2: PYNQ-Z2.
Figure 3: Zybo Z7-20 Zynq-7000 SoC development board.
Figure 4: Raspberry Pi 3 Model B.
Figure 5: OV7670 camera module.
Figure 6: A flowchart of the proposed model.
Figure 7: Top-level block diagram. The light blue block (3) is a regular IP, while blue blocks (1, 2, and 4) are hierarchy blocks, grouping IP blocks together. Block no. 1, named camera_in, is the original data producer; it groups together the IP blocks needed to decode image data coming from the camera and to format it to suit our needs. Block no. 2, named video_out, is the ultimate data consumer; it groups IP blocks doing DVI encoding, so that the image data can be displayed on a monitor. Block no. 3 is an actual IP, named axi_vdma, a Xilinx IP with the full name AXI Video Direct Memory Access. VDMA sits in the middle of the video data flow and is needed to decouple two incompatible video interfaces, the image sensor's MIPI CSI-2 and the monitor's DVI.
Figure 8: The hierarchy of the control block, which illustrates the input, output, and control interfaces modelled in C/C++.
Figure 9: AlexNet accuracy.
Figure 10: AlexNet loss.
Figure 11: ResNet18 accuracy.
Figure 12: ResNet18 loss.
Figure 13: Accuracy of the VGG16 network.
Figure 14: Loss curve of the VGG16 network.
Figure 15: GoogLeNet accuracy.
Figure 16: GoogLeNet loss curve.
16 pages, 1512 KiB  
Article
An End-To-End Speech Recognition Model for the North Shaanxi Dialect: Design and Evaluation
by Yi Qin and Feifan Yu
Sensors 2025, 25(2), 341; https://doi.org/10.3390/s25020341 - 9 Jan 2025
Viewed by 211
Abstract
The coal mining industry in Northern Shaanxi is robust, with a prevalent use of the local dialect, known as “Shapu”, characterized by a distinct Northern Shaanxi accent. This study addresses the practical need for speech recognition in this dialect. We propose an end-to-end speech recognition model for the North Shaanxi dialect, leveraging the Conformer architecture. To tailor the model to the coal mining context, we developed a specialized corpus reflecting the phonetic characteristics of the dialect and its usage in the industry. We investigated feature extraction techniques suitable for the North Shaanxi dialect, focusing on the unique pronunciation of initial consonants and vowels. A preprocessing module was designed to accommodate the dialect’s rapid speech tempo and polyphonic nature, enhancing recognition performance. To enhance the decoder’s text generation capability, we replaced the Conformer decoder with a Transformer architecture. Additionally, to mitigate the computational demands of the model, we incorporated Connectionist Temporal Classification (CTC) joint training for optimization. The experimental results on our self-established voice dataset for the Northern Shaanxi coal mining industry demonstrate that the proposed Conformer–Transformer–CTC model achieves a 9.2% and 10.3% reduction in the word error rate compared to the standalone Conformer and Transformer models, respectively, confirming the advancement of our method. The next step will involve researching how to improve the performance of dialect speech recognition by integrating external language models and extracting pronunciation features of different dialects, thereby achieving better recognition results. Full article
(This article belongs to the Section Intelligent Sensors)
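The joint optimization mentioned above pairs an attention-decoder cross-entropy loss with a CTC loss on the encoder outputs. A hedged sketch of that combined objective is shown below, with random tensors standing in for Conformer encoder and Transformer decoder outputs, and an illustrative weighting of 0.3.

```python
# Sketch of a joint CTC + attention-decoder ASR loss.
import torch
import torch.nn as nn

vocab_size, blank_id = 200, 0
ctc_loss = nn.CTCLoss(blank=blank_id, zero_infinity=True)
ce_loss = nn.CrossEntropyLoss()


def joint_asr_loss(enc_logits, dec_logits, targets, input_lens, target_lens, lam=0.3):
    """lam * CTC(encoder outputs) + (1 - lam) * cross-entropy(attention decoder outputs)."""
    # CTC expects (time, batch, vocab) log-probabilities.
    log_probs = enc_logits.log_softmax(-1).transpose(0, 1)
    loss_ctc = ctc_loss(log_probs, targets, input_lens, target_lens)
    loss_att = ce_loss(dec_logits.reshape(-1, vocab_size), targets.reshape(-1))
    return lam * loss_ctc + (1 - lam) * loss_att


B, T_enc, T_dec = 4, 120, 20
enc_logits = torch.randn(B, T_enc, vocab_size)          # stand-in for Conformer encoder output
dec_logits = torch.randn(B, T_dec, vocab_size)          # stand-in for Transformer decoder output
targets = torch.randint(1, vocab_size, (B, T_dec))
input_lens = torch.full((B,), T_enc, dtype=torch.long)
target_lens = torch.full((B,), T_dec, dtype=torch.long)
print(joint_asr_loss(enc_logits, dec_logits, targets, input_lens, target_lens))
```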
Figure 1: The construction process of dialect corpora.
Figure 2: End-to-end dialect Conformer–Transformer–CTC speech recognition system.
Figure 3: Preprocessing module.
Figure 4: Codec structure.
18 pages, 2687 KiB  
Article
A Robust Blood Vessel Segmentation Technique for Angiographic Images Employing Multi-Scale Filtering Approach
by Agne Paulauskaite-Taraseviciene, Julius Siaulys, Antanas Jankauskas and Gabriele Jakuskaite
J. Clin. Med. 2025, 14(2), 354; https://doi.org/10.3390/jcm14020354 - 8 Jan 2025
Viewed by 379
Abstract
Background: This study focuses on the critical task of blood vessel segmentation in medical image analysis, essential for diagnosing cardiovascular diseases and enabling effective treatment planning. Although deep learning architectures often produce very high segmentation results in medical images, coronary computed tomography angiography (CTA) images are more challenging than invasive coronary angiography (ICA) images due to noise and the complexity of vessel structures. Methods: Classical architectures for medical images, such as U-Net, achieve only moderate accuracy, with an average Dice score of 0.722. Results: This study introduces Morpho-U-Net, an enhanced U-Net architecture that integrates advanced morphological operations, including Gaussian blurring, thresholding, and morphological opening/closing, to improve vascular integrity, reduce noise, and achieve a higher Dice score of 0.9108, a precision of 0.9341, and a recall of 0.8872. These enhancements demonstrate superior robustness to noise and intricate vessel geometries. Conclusions: This pre-processing filter effectively reduces noise by grouping neighboring pixels with similar intensity values, allowing the model to focus on relevant anatomical structures, thus outperforming traditional methods in handling the challenges posed by CTA images. Full article
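A minimal OpenCV sketch of the kind of morphological pre-processing named in the abstract (Gaussian blurring, thresholding, opening/closing); kernel sizes and the Otsu threshold are illustrative assumptions, not the exact Morpho-U-Net pipeline.

```python
# Sketch of a denoising/binarisation pre-processing step for angiographic slices.
import cv2
import numpy as np


def preprocess_angiogram(gray: np.ndarray) -> np.ndarray:
    """Return a denoised binary vessel-mask candidate from a grayscale slice."""
    blurred = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.0)
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)    # remove speckle noise
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)   # bridge small vessel gaps
    return closed


if __name__ == "__main__":
    dummy = (np.random.rand(256, 256) * 255).astype(np.uint8)    # stand-in for a CTA slice
    mask = preprocess_angiogram(dummy)
    print(mask.shape, mask.dtype, np.unique(mask))
```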
Figure 1: Examples of precise contouring challenges in medical imaging of blood vessels.
Figure 2: Visual comparison of ICA images (a) [26], (b) [30], (c) [15], (d) [2], (e) vs. CTA (f), emphasizing the high-resolution and detailed visualization typical of ICA, in contrast to the noise and contrast limitations of non-invasive CTA images.
Figure 3: DuckNet architecture overview.
Figure 4: RCA vessel images from the same patient, highlighting variations despite identical vessel anatomy and patient characteristics.
Figure 5: Examples of (a) ground truth and (b) model predictions for coronary artery blood vessel segmentations.
Figure 6: Instances of annotated vessels including initial and repeated annotation and their differences.
Figure 7: Boundary inaccuracies representing minor differences between the ground truth and predicted masks.
Figure 8: The pipeline of the proposed segmentation solution.
Figure 9: Original image with (a) applied threshold, (b) region fill and threshold, (c) applied Frangi filter and (d) ground truth segmentation.
Figure 10: Examples of segmentation results using Morpho-U-Net, resulting in Dice values of 0.927 for image (A) and 0.759 for image (B).
Figure 11: Segmentation results for calcified, mixed, and non-calcified plaques.
Figure 12: Examples of incomplete annotations.
25 pages, 8441 KiB  
Article
Reinforcement Learning of a Six-DOF Industrial Manipulator for Pick-and-Place Application Using Efficient Control in Warehouse Management
by Ahmed Iqdymat and Grigore Stamatescu
Sustainability 2025, 17(2), 432; https://doi.org/10.3390/su17020432 - 8 Jan 2025
Viewed by 414
Abstract
This study investigates the integration of reinforcement learning (RL) with optimal control to enhance precision and energy efficiency in industrial robotic manipulation. A novel framework is proposed, combining Deep Deterministic Policy Gradient (DDPG) with a Linear Quadratic Regulator (LQR) controller, specifically applied to the ABB IRB120, a six-degree-of-freedom (6-DOF) industrial manipulator, for pick-and-place tasks in warehouse automation. The methodology employs an actor–critic RL architecture with a 27-dimensional state input and a 6-dimensional joint action output. The RL agent was trained using MATLAB’s Reinforcement Learning Toolbox and integrated with ABB’s RobotStudio simulation environment via TCP/IP communication. LQR controllers were incorporated to optimize joint-space trajectory tracking, minimizing energy consumption while ensuring precise control. The novelty of this research lies in its synergistic combination of RL and LQR control, addressing energy efficiency and precision simultaneously—an area that has seen limited exploration in industrial robotics. Experimental validation across 100 diverse scenarios confirmed the framework’s effectiveness, achieving a mean positioning accuracy of 2.14 mm (a 28% improvement over traditional methods), a 92.5% success rate in pick-and-place tasks, and a 22.7% reduction in energy consumption. The system demonstrated stable convergence after 458 episodes and maintained a mean joint angle error of 4.30°, validating its robustness and efficiency. These findings highlight the potential of RL for broader industrial applications. The demonstrated accuracy and success rate suggest its applicability to complex tasks such as electronic component assembly, multi-step manufacturing, delicate material handling, precision coordination, and quality inspection tasks like automated visual inspection, surface defect detection, and dimensional verification. Successful implementation in such contexts requires addressing challenges including task complexity, computational efficiency, and adaptability to process variability, alongside ensuring safety, reliability, and seamless system integration. This research builds upon existing advancements in warehouse automation, inverse kinematics, and energy-efficient robotics, contributing to the development of adaptive and sustainable control strategies for industrial manipulators in automated environments. Full article
(This article belongs to the Special Issue Smart Sustainable Techniques and Technologies for Industry 5.0)
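The LQR half of the proposed RL + LQR combination computes a state-feedback gain from a linearised model by solving a Riccati equation. The sketch below does this for a generic per-joint double integrator with assumed Q and R weights; it is not the ABB IRB120 model or the paper's controller.

```python
# LQR gain for a generic linearised joint model x' = Ax + Bu (illustrative stand-in).
import numpy as np
from scipy.linalg import solve_continuous_are

# Per-joint double integrator: state = [angle error, angular velocity], input = torque.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([10.0, 1.0])   # penalise position error more than velocity
R = np.array([[0.1]])      # penalise control effort (a proxy for energy use)

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P          # optimal state-feedback gain, u = -K x

x = np.array([[0.2], [0.0]])            # 0.2 rad tracking error, at rest
u_lqr = -K @ x
print("LQR gain:", K, "control torque:", u_lqr.ravel())
```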
Figure 1: Proposed algorithm flowchart.
Figure 2: Top-level architecture.
Figure 3: Internal architecture of IRB120 Pick-and-Place.
Figure 4: Proposed efficient controller.
Figure 5: Policy process for the RL agent.
Figure 6: Critic and actor network for the RL agent.
Figure 7: Flowchart of the Adaptive Momentum Estimation (ADAM) algorithm.
Figure 8: Integrated system architecture.
Figure 9: Reward of each episode (each colored line corresponds to one training episode).
Figure 10: Cumulative and average cumulative reward.
Figure 11: Distribution of robot positioning errors.
Figure 12: Error comparison chart between the RL agent and the reference method.
Figure 13: Simulated learned trajectory vs. desired trajectory.
Figure 14: Comparison of joint angle trajectories between DH and RL solutions.
26 pages, 29211 KiB  
Article
Performance Evaluation of Deep Learning Image Classification Modules in the MUN-ABSAI Ice Risk Management Architecture
by Ravindu G. Thalagala, Oscar De Silva, Dan Oldford and David Molyneux
Sensors 2025, 25(2), 326; https://doi.org/10.3390/s25020326 - 8 Jan 2025
Viewed by 244
Abstract
The retreat of Arctic sea ice has opened new maritime routes, offering faster shipping opportunities; however, these routes present significant navigational challenges due to the harsh ice conditions. To address these challenges, this paper proposes a deep learning-based Arctic ice risk management architecture with multiple modules, including ice classification, risk assessment, ice floe tracking, and ice load calculations. A comprehensive dataset of 15,000 ice images was created using public sources and contributions from the Canadian Coast Guard, and it was used to support the development and evaluation of the system. The performance of the YOLOv8n-cls model was assessed for the ice classification modules due to its fast inference speed, making it suitable for resource-constrained onboard systems. The training and evaluation were conducted across multiple platforms, including Roboflow, Google Colab, and Compute Canada, allowing for a detailed comparison of their capabilities in image preprocessing, model training, and real-time inference generation. The results demonstrate that Image Classification Module I achieved a validation accuracy of 99.4%, while Module II attained 98.6%. Inference times were found to be less than 1 s in Colab and under 3 s on a stand-alone system, confirming the architecture’s efficiency in real-time ice condition monitoring. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)
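Fine-tuning and querying a YOLOv8n-cls classifier like the ICM modules above can be sketched with the ultralytics package, assuming its standard YOLO training/inference interface; the dataset path, folder layout, and epoch count below are placeholders, not the paper's setup.

```python
# Sketch of YOLOv8n-cls fine-tuning and inference (assumes the ultralytics package).
from ultralytics import YOLO

# Fine-tune the small classification variant on an image-folder dataset
# (ice_dataset/train/<class_name>/*.jpg, ice_dataset/val/<class_name>/*.jpg).
model = YOLO("yolov8n-cls.pt")
model.train(data="ice_dataset", epochs=50, imgsz=640)   # placeholder path and epochs

# Single-image inference, e.g., on a frame from an onboard camera feed.
results = model("sample_ice_image.jpg")                 # placeholder image path
probs = results[0].probs                                # per-class confidence scores
print(results[0].names[probs.top1], float(probs.top1conf))
```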
Figure 1: Object detection and classification typically used in autonomous full-scale aerial applications, carried out using the datasets from our previous work [36].
Figure 2: Semantic segmentation of an in situ ice field using PSPNet101 in our previous work [13]. Reproduced with permission from Benjamin Dowden, "Sea Ice Classification via Deep Neural Network Semantic Segmentation"; published by IEEE, 2020.
Figure 3: Details of the AI sub-modules within the architecture. The image feed is processed by Ice Classification Module I, after which the images pass through each subsequent sub-module. The numbered lists within each box represent the specific classes or outputs generated by that module. A GPU-based event mechanics model [45] is denoted as GEM.
Figure 4: Graphical image annotation tool found in Roboflow [48].
Figure 5: Image preprocessing carried out on ICM-I. The two rows of forward_looking images (original on the left) were resized (on the right) to 640 × 640 pixels.
Figure 6: Image augmentation carried out on preprocessed images from the dataset. The top row indicates the grayscale augmentation; the bottom row indicates the addition of noise to the image. The images were used with consent from Envi.
Figure 7: ICM-I results using Roboflow, with percentages indicating the model's confidence in predicting the correct class for each image.
Figure 8: ICM-II results using Roboflow, with percentages indicating the model's confidence in predicting the correct class for each image.
Figure 9: The confusion matrix generated from YOLOv8 model training in Colab.
Figure 10: The class training accuracy plot stopped at 200 epochs in Google Colab due to resource constraints.
Figure 11: Confusion matrix for the 10,000-image dataset with 1000 training epochs for ICM-I.
Figure 12: Model training accuracy graph for the 10,000-image dataset with 1000 training epochs.
Figure 13: Confusion matrix for the 6000-image dataset with 1000 training epochs for ICM-II.
Figure 14: Model training accuracy graph for ICM-II for 1000 training epochs.
Figure 15: Inference speed results from ICM-I on Google Colab.
Figure 16: Inference speed results from ICM-II on Google Colab.
Figure 17: Model-testing web interface. The left side shows the ICM-I model test and the right side shows the ICM-II model test interface.
18 pages, 5484 KiB  
Article
AI-Assisted Forecasting of a Mitigated Multiple Steam Generator Tube Rupture Scenario in a Typical Nuclear Power Plant
by Sonia Spisak and Aya Diab
Energies 2025, 18(2), 250; https://doi.org/10.3390/en18020250 - 8 Jan 2025
Viewed by 280
Abstract
This study is focused on developing a machine learning (ML) meta-model to predict the progression of a multiple steam generator tube rupture (MSGTR) accident in the APR1400 reactor. The accident was simulated using the thermal–hydraulic code RELAP5/SCDAPSIM/MOD3.4. The model incorporates a mitigation strategy executed through operator interventions. Following this, uncertainty quantification employing the Best Estimate Plus Uncertainty (BEPU) methodology was undertaken by coupling RELAP5/SCDAPSIM/MOD3.4 with the statistical software, DAKOTA 6.14.0. The analysis concentrated on critical safety parameters, including Reactor Coolant System (RCS) pressure and temperature, as well as reactor vessel upper head (RVUH) void fraction. These simulations generated a comprehensive dataset, which served as the foundation for training three ML architectures: Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Convolutional LSTM (CNN+LSTM). Among these models, the CNN+LSTM hybrid configuration demonstrated superior performance, excelling in both predictive accuracy and computational efficiency. To bolster the model’s transparency and interpretability, Integrated Gradients (IGs)—an advanced Explainable AI (XAI) technique—was applied, elucidating the contribution of input features to the model’s predictions and enhancing its trustworthiness. Full article
(This article belongs to the Section B4: Nuclear Energy)
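A compact PyTorch sketch of the CNN+LSTM hybrid named above, forecasting a single safety parameter (e.g., RCS pressure) from a window of past signals; channel counts, window length, and the single-step output are assumptions for illustration.

```python
# Sketch of a CNN+LSTM meta-model for forecasting one thermal-hydraulic safety parameter.
import torch
import torch.nn as nn


class CNNLSTMForecaster(nn.Module):
    def __init__(self, n_signals=10, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_signals, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                                 # x: (batch, time, signals)
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)   # local temporal features
        h, _ = self.lstm(h)                               # long-range dependencies
        return self.out(h[:, -1])                         # next-step RCS pressure


window = torch.randn(32, 60, 10)             # 32 samples, 60 past timesteps, 10 signals
print(CNNLSTMForecaster()(window).shape)     # torch.Size([32, 1])
```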
Figure 1: Workflow chart.
Figure 2: RNN architecture.
Figure 3: GRU architecture.
Figure 4: LSTM architecture.
Figure 5: CNN-LSTM architecture.
Figure 6: Spearman's correlation matrix.
Figure 7: RCS and SG pressure vs. time.
Figure 8: BEPU analysis results for (a) RCS pressure and (b) RCS temperature.
Figure 9: BEPU results for RVUH void fraction.
Figure 10: Loss vs. epoch for RCS pressure.
Figure 11: Loss vs. epoch for RCS temperature.
Figure 12: Loss vs. epoch for RVUH.
Figure 13: Predicted and actual values of RCS pressure.
Figure 14: Predicted and actual values of RCS temperature.
Figure 15: Predicted and actual values of RVUH.
23 pages, 1068 KiB  
Article
Utilization of a Lightweight 3D U-Net Model for Reducing Execution Time of Numerical Weather Prediction Models
by Hyesung Park and Sungwook Chung
Atmosphere 2025, 16(1), 60; https://doi.org/10.3390/atmos16010060 - 8 Jan 2025
Viewed by 235
Abstract
Conventional weather forecasting relies on numerical weather prediction (NWP), which solves atmospheric equations using numerical methods. The Korea Meteorological Administration (KMA) adopted the Met Office Global Seasonal Forecasting System version 6 (GloSea6) NWP model from the UK and runs it on a supercomputer. However, due to high task demands, the limited resources of the supercomputer have caused job queue delays. To address this, the KMA developed a low-resolution version, Low GloSea6, for smaller-scale servers at universities and research institutions. Despite its ability to run on less powerful servers, Low GloSea6 still requires significant computational resources like those of high-performance computing (HPC) clusters. We integrated deep learning with Low GloSea6 to reduce execution time and improve meteorological research efficiency. Through profiling, we confirmed that deep learning models can be integrated without altering the original configuration of Low GloSea6 or complicating physical interpretation. The profiling identified “tri_sor.F90” as the main CPU time hotspot. By combining the biconjugate gradient stabilized (BiCGStab) method, used for solving the Helmholtz problem, with a deep learning model, we reduced unnecessary hotspot calls, shortening execution time. We also propose a convolutional block attention module-based Half-UNet (CH-UNet), a lightweight 3D-based U-Net architecture, for faster deep-learning computations. In experiments, CH-UNet showed 10.24% lower RMSE than Half-UNet, which has fewer FLOPs. Integrating CH-UNet into Low GloSea6 reduced execution time by up to 71 s per timestep, averaging a 2.6% reduction compared to the original Low GloSea6, and 6.8% compared to using Half-UNet. This demonstrates that CH-UNet, with balanced FLOPs and high predictive accuracy, offers more significant execution time reductions than models with fewer FLOPs. Full article
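The speed-up idea described above amounts to giving an iterative Helmholtz-type solve a better starting point so it converges in fewer iterations. The SciPy sketch below warm-starts BiCGStab with a near-solution initial guess standing in for a CH-UNet prediction; the tridiagonal test matrix and the "surrogate" are illustrative assumptions, not Low GloSea6 code.

```python
# Compare BiCGStab iteration counts for a cold start vs. a surrogate-style warm start.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import bicgstab

n = 200
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")  # SPD test system
b = np.random.default_rng(0).standard_normal(n)

iters = {"cold": 0, "warm": 0}


def counter(key):
    def cb(xk):
        iters[key] += 1
    return cb


# Cold start: zero initial guess.
x_cold, _ = bicgstab(A, b, x0=np.zeros(n), callback=counter("cold"))

# "Learned" warm start: pretend a surrogate model predicts a solution close to the truth.
x_true = np.linalg.solve(A.toarray(), b)
x0_pred = x_true + 1e-3 * np.random.default_rng(1).standard_normal(n)
x_warm, _ = bicgstab(A, b, x0=x0_pred, callback=counter("warm"))

print(iters)  # the warm-started solve typically needs far fewer iterations
```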
Figure 1: The overall execution process of GloSea6.
Figure 2: Operational structure of "um-atmos.exe".
Figure 3: Correlation heatmap of variables used in BiCGStab.
Figure 4: Resolution size of the 3D grid data: (a) matches the latitude and longitude grid size of the Low GloSea6 UM model, and (b) is adjusted to be a multiple of 2 to facilitate the upsampling process in the U-Net architecture.
Figure 5: U-Net architecture [30].
Figure 6: Half-UNet architecture [35].
Figure 7: CBAM-based Half-UNet (CH-UNet) architecture.
Figure 8: Overall structure of CBAM and Sub-Attention Modules [36].
Figure 9: Hybrid-DL NWP model structure integrating CH-UNet in the UM model of Low GloSea6.
Figure 10: Comparison of "um-atmos.exe" file execution time for each timestep.
Figure 11: Comparison of RMSE for each deep network model's prediction results during Low GloSea6 execution by timestep.
21 pages, 4169 KiB  
Article
Enhancing Deepfake Detection Through Quantum Transfer Learning and Class-Attention Vision Transformer Architecture
by Bekir Eray Katı, Ecir Uğur Küçüksille and Güncel Sarıman
Appl. Sci. 2025, 15(2), 525; https://doi.org/10.3390/app15020525 - 8 Jan 2025
Viewed by 276
Abstract
The widespread use of the internet, coupled with the increasing production of digital content, has caused significant challenges in information security and manipulation. Deepfake detection has become a critical research topic in both academic and practical domains, as it involves identifying forged elements in artificially generated videos using various deep learning and artificial intelligence techniques. In this dissertation, an innovative model was developed for detecting deepfake videos by combining the Quantum Transfer Learning (QTL) and Class-Attention Vision Transformer (CaiT) architectures. The Deepfake Detection Challenge (DFDC) dataset was used for training, and a system capable of detecting spatiotemporal inconsistencies was constructed by integrating QTL and CaiT technologies. In addition to existing preprocessing methods in the literature, a novel preprocessing function tailored to the requirements of deep learning models was developed for the dataset. The advantages of quantum computing offered by QTL were merged with the global feature extraction capabilities of the CaiT. The results demonstrated that the proposed method achieved a remarkable performance in detecting deepfake videos, with an accuracy of 90% and ROC AUC score of 0.94 achieved. The model’s performance was compared with other methods evaluated on the DFDC dataset, highlighting its efficiency in resource utilization and overall effectiveness. The findings reveal that the proposed QTL-CaiT-based system provides a strong foundation for deepfake detection and contributes significantly to the academic literature. Future research should focus on testing the model on real quantum devices and applying it to larger datasets to further enhance its applicability. Full article
(This article belongs to the Section Quantum Science and Technology)
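One way to realise a quantum-transfer-learning head on a frozen CaiT backbone is sketched below, using timm for the vision model and PennyLane's TorchLayer for a small variational circuit; the backbone variant, 4 qubits, 2 entangling layers, and the angle-embedding choice are assumptions, not the paper's exact configuration.

```python
# Sketch of a frozen CaiT backbone feeding a small variational quantum circuit classifier.
import timm
import torch
import torch.nn as nn
import pennylane as qml

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)


@qml.qnode(dev)
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))          # encode classical features
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))   # trainable entangling layers
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]


class QTLCaiT(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = timm.create_model("cait_xxs24_224", pretrained=True, num_classes=0)
        for p in self.backbone.parameters():      # transfer learning: freeze the backbone
            p.requires_grad = False
        self.reduce = nn.Linear(self.backbone.num_features, n_qubits)
        self.qlayer = qml.qnn.TorchLayer(circuit, {"weights": (n_layers, n_qubits)})
        self.classifier = nn.Linear(n_qubits, 2)  # real vs. fake

    def forward(self, x):
        feats = self.backbone(x)
        return self.classifier(self.qlayer(torch.tanh(self.reduce(feats))))


print(QTLCaiT()(torch.randn(2, 3, 224, 224)).shape)   # torch.Size([2, 2])
```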
Figure 1: Model flowchart for deepfake detection based on CaiT and QTL.
Figure 2: Diagram of the process for extracting faces from videos and data transformation.
Figure 3: Illustrates the stages of superposition, entanglement, and measurement in a quantum circuit. The numbers 0, 1, 2, and 3 represent the qubits in the circuit. "H" denotes the Hadamard Gate, "RY" represents the Rotation-Y Gate, and the circuit also includes CNOT Gates and measurement symbols.
Figure 4: Model construction and integration process.
Figure 5: Data processing flowchart for the model training process.
Figure 6: Confusion matrix for test data.
Figure 7: ROC curve and AUC of the model. The dashed blue line represents the performance of a random classifier, where predictions are no better than chance, with equal true and false positive rates.
Figure 8: (a) Confusion matrix showing classification performance on the cross-dataset evaluation; (b) ROC curve representing the model's performance for the cross-dataset evaluation. The dashed blue line represents the performance of a random classifier, where predictions are no better than chance, with equal true and false positive rates.