Search Results (1,302)

Search Parameters:
Keywords = deep auto-encoder

14 pages, 743 KiB  
Article
AD-VAE: Adversarial Disentangling Variational Autoencoder
by Adson Silva and Ricardo Farias
Sensors 2025, 25(5), 1574; https://doi.org/10.3390/s25051574 - 4 Mar 2025
Abstract
Face recognition (FR) is a less intrusive biometrics technology with various applications, such as security, surveillance, and access control systems. FR remains challenging, especially when there is only a single image per person as a gallery dataset and when dealing with variations like pose, illumination, and occlusion. Deep learning techniques have shown promising results in recent years using VAE and GAN, with approaches such as patch-VAE, VAE-GAN for 3D Indoor Scene Synthesis, and hybrid VAE-GAN models. However, in Single Sample Per Person Face Recognition (SSPP FR), the challenge of learning robust and discriminative features that preserve the subject’s identity persists. To address these issues, we propose a novel framework called AD-VAE, specifically for SSPP FR, using a combination of variational autoencoder (VAE) and Generative Adversarial Network (GAN) techniques. The proposed AD-VAE framework is designed to learn how to build representative identity-preserving prototypes from both controlled and wild datasets, effectively handling variations like pose, illumination, and occlusion. The method uses four networks: an encoder and decoder similar to VAE, a generator that receives the encoder output plus noise to generate an identity-preserving prototype, and a discriminator that operates as a multi-task network. AD-VAE outperforms all tested state-of-the-art face recognition techniques, demonstrating its robustness. The proposed framework achieves superior results on four controlled benchmark datasets—AR, E-YaleB, CAS-PEAL, and FERET—with recognition rates of 84.9%, 94.6%, 94.5%, and 96.0%, respectively, and achieves remarkable performance on the uncontrolled LFW dataset, with a recognition rate of 99.6%. The AD-VAE framework shows promising potential for future research and real-world applications. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
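To make the encoder/generator interplay described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of the reparameterization step and of a generator that consumes the latent code concatenated with noise. Layer sizes, names, and the fully connected layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): VAE-style encoder + prototype generator.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_dim=4096, latent_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, latent_dim)      # mean of N(mu, sigma)
        self.logvar = nn.Linear(512, latent_dim)  # log-variance of N(mu, sigma)

    def forward(self, x):
        h = self.backbone(x)
        return self.mu(h), self.logvar(h)

def reparameterize(mu, logvar):
    # c ~ N(mu, sigma), drawn with the reparameterization trick
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

class Generator(nn.Module):
    """Maps [latent code c, noise z] to an identity-preserving prototype."""
    def __init__(self, latent_dim=128, noise_dim=64, out_dim=4096):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + noise_dim, 512), nn.ReLU(),
                                 nn.Linear(512, out_dim), nn.Tanh())

    def forward(self, c, z):
        return self.net(torch.cat([c, z], dim=1))

x = torch.randn(8, 4096)                 # a toy batch of flattened face images
mu, logvar = Encoder()(x)
c = reparameterize(mu, logvar)
z = torch.randn(8, 64)                   # noise vector z ~ N(0, 1)
prototype = Generator()(c, z)            # generated prototype of x
print(prototype.shape)                   # torch.Size([8, 4096])
```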
Figure 1. The first part of the proposed AD-VAE, which works as a variational adversarial autoencoder. x denotes the image data from X, and x^dec denotes the decoder reconstruction of x. The encoder Enc takes the image x as input and produces two outputs, the mean (μ) and the log-variance (σ), which define the parameters of a normal distribution N(μ, σ). From the distribution N(μ, σ), we draw a latent vector c ~ N(μ, σ) that serves as input to the decoder Dec, which outputs the reconstruction x^dec.

Figure 2. The second part of the proposed AD-VAE architecture, where x denotes an image from the SSPP data X, x^rp denotes the real prototype of image x, and x̂ is the generated prototype of image x. The pre-trained (first-part) encoder Enc generates the mean μ and variance σ of x. From the distribution N(μ, σ), we draw a latent vector c ~ N(μ, σ) that is concatenated with a noise vector z ~ N(0, 1) to serve as the input to the generator Gen, which outputs the prototype x̂ of x. The discriminator D (1) determines the id and variation of x; (2) determines the id, variation, and whether x̂ is real or fake; and (3) determines whether x^rp is real or fake.

Figure 3. The prototypes generated by AD-VAE: (a) the sample image with variations, (b) the generated prototype of image (a), and (c) the real prototype of image (a). On the right side, the name of the dataset and the variation are indicated.
37 pages, 34201 KiB  
Article
Measuring the Level of Aflatoxin Infection in Pistachio Nuts by Applying Machine Learning Techniques to Hyperspectral Images
by Lizzie Williams, Pancham Shukla, Akbar Sheikh-Akbari, Sina Mahroughi and Iosif Mporas
Sensors 2025, 25(5), 1548; https://doi.org/10.3390/s25051548 - 2 Mar 2025
Viewed by 148
Abstract
This paper investigates the use of machine learning techniques on hyperspectral images of pistachios to detect and classify different levels of aflatoxin contamination. Aflatoxins are toxic compounds produced by moulds, posing health risks to consumers. Current detection methods are invasive and contribute to food waste. This paper explores the feasibility of a non-invasive method using hyperspectral imaging and machine learning to classify aflatoxin levels accurately, potentially reducing waste and enhancing food safety. Hyperspectral imaging with machine learning has shown promise in food quality control. The paper evaluates models including Dimensionality Reduction with K-Means Clustering, Residual Networks (ResNets), Variational Autoencoders (VAEs), and Deep Convolutional Generative Adversarial Networks (DCGANs). Using a dataset from Leeds Beckett University with 300 hyperspectral images, covering three aflatoxin levels (<8 ppn, >160 ppn, and >300 ppn), key wavelengths were identified to indicate contamination presence. Dimensionality Reduction with K-Means achieved 84.38% accuracy, while a ResNet model using the 866.21 nm wavelength reached 96.67%. VAE and DCGAN models, though promising, were constrained by dataset size. The findings highlight the potential for machine learning-based hyperspectral imaging in pistachio quality control, and future research should focus on expanding datasets and refining models for industry application. Full article
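As a hedged illustration of the "Dimensionality Reduction with K-Means Clustering" baseline mentioned in the abstract, the sketch below reduces a toy hyperspectral cube with PCA and clusters the pixels. The cube shape, component count, and cluster count are placeholders, and the paper's 'selectbands' band-selection step is not reproduced.

```python
# Illustrative only: PCA-based dimensionality reduction followed by K-Means on pixels.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
cube = rng.random((64, 64, 462))           # toy hyperspectral image: H x W x bands

pixels = cube.reshape(-1, cube.shape[-1])  # one spectrum per pixel
reduced = PCA(n_components=10).fit_transform(pixels)

labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(reduced)
cluster_map = labels.reshape(cube.shape[:2])

# Fraction of pixels assigned to one cluster, as a simple contamination indicator
print((cluster_map == 6).mean())
```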
Figure 1. Classification system implemented in this research for each of the classifiers developed. For ResNet, VAE, and DCGAN, the hyperspectral images are broken down into individual wavelengths, but for K-Means Clustering, the images have their dimensionality reduced.

Figure 2. Point, line, and area scanning representation.

Figure 3. The camera setup used to gather the pistachio image dataset for this paper. The camera uses the line scanning technique.

Figure 4. Residual Network Block implemented in the Residual Network Architecture.

Figure 5. Representation of the 453.8 nm wavelength at the 3 different aflatoxin levels: (a) Less Than 8 μg/kg, (b) Greater Than 160 μg/kg, and (c) Greater Than 300 μg/kg.

Figure 6. RGB representation of hyperspectral images at the 3 different aflatoxin levels: (a) Less Than 8 μg/kg, (b) Greater Than 160 μg/kg, and (c) Greater Than 300 μg/kg.

Figure 7. Three different bands of the same hyperspectral image of pistachios from the Greater Than 300 μg/kg level: (a) Band 151 (534 nm), (b) Band 241 (628 nm), and (c) Band 461 (1003 nm).

Figure 8. Histogram of the five most informative bands for each Less Than 8 ppn hyperspectral image using the ‘selectbands’ function as described above. There are five peaks at [1, 11], [141, 151], [211, 221], [361, 371], and [451, 462].

Figure 9. Histogram of the five most informative bands for each Greater Than 160 ppn hyperspectral image using the ‘selectbands’ function as described above. There are five areas which contain the majority of the selected bands: [1, 11], [21, 51], [211, 251], [341, 371], and [451, 462].

Figure 10. Histogram of the five most informative bands for each Greater Than 300 ppn hyperspectral image using the ‘selectbands’ function as described above. There are four peaks on the histogram, at [1, 11], [151, 171], [361, 371], and [451, 462].

Figure 11. Pairs of the most informative bands for the hyperspectral images from each level: Less Than 8 ppn (blue circles), Greater Than 160 ppn (orange circles), and Greater Than 300 ppn (green circles). At the Greater Than 160 ppn level, there are two clear, dense clusters, centred approximately around the points (355, 45) and (350, 225). The Less Than 8 ppn and Greater Than 300 ppn distributions of points are much more comparable; their main clusters both appear to be centred approximately at the point (365, 150), although the standard deviation of the points for both levels is much greater than for the Greater Than 160 ppn level. Ignoring outliers, the 1st Principal Band ranges from band 300 to 425, and the 2nd Principal Band ranges from band 25 to 241.

Figure 12. K-Means clustering results following dimensionality reduction: (a) for a less than 8 ppn hyperspectral image with 2 clusters; (b) for a greater than 300 ppn hyperspectral image with 2 clusters; (c) for a less than 8 ppn hyperspectral image with 4 clusters; and (d) for a greater than 300 ppn hyperspectral image with 4 clusters.

Figure 13. K-Means clustering results following dimensionality reduction with various clusters: (a) less than 8 ppn hyperspectral image with 13 clusters; (b) greater than 300 ppn hyperspectral image with 13 clusters; (c) less than 8 ppn hyperspectral image with 15 clusters; and (d) greater than 300 ppn hyperspectral image with 15 clusters.

Figure 14. K-Means clustering results following dimensionality reduction with various clusters: (a) less than 8 ppn hyperspectral image with 10 clusters; (b) greater than 300 ppn hyperspectral image with 10 clusters.

Figure 15. Distribution of colours within each K-Means clustering of the reduced hyperspectral images after dimensionality reduction, using 10 clusters.

Figure 16. Percentage of colour 6 pixels compared to the overall pixel count for each K-Means clustered image.

Figure 17. Fraction of colour 6 pixels compared to the overall pixel count for the extended K-Means clustered images.

Figure 18. Implemented ResNet Block with ReLU activation functions.

Figure 19. Training and validation losses for each of the ResNet classifiers trained on the dataset of individual wavelength images from bands 11, 151, 241, 361, and 461. The classifier trained on the dataset for band 361 on average had the lowest training and validation loss, which also fluctuated less than the loss of the other ResNet classifiers. The graph shows losses from 5 epochs onwards so that the losses are not saturated by the steep learning curve.

Figure 20. Implemented VAE architecture.

Figure 21. Loss functions for the dataset of band 151 images using β = 0.1 to β = 1 for 100 epochs. The training losses for the Total Loss and Reconstruction Loss continue to decrease, while the test losses for both increase after approximately 25 epochs, indicating that the model is overfitting. The KL Divergence for both the training and test datasets appears to increase slowly and then, at approximately 60 epochs, begins decreasing. The Total and Reconstruction Loss curves appear similar to one another, but the KL Divergence loss curves do not.

Figure 22. Reconstructed images for the training dataset of band 151 images using β = 0.1 to β = 1 for 100 epochs. The images are reconstructions of images from each of the 3 aflatoxin levels.

Figure 23. Reconstructed images for the test dataset of band 151 images using β = 0.1 to β = 1 for 100 epochs. These images are all reconstructions of pistachios without shells from the Greater Than 160 ppn dataset.

Figure 24. Generated image samples for both the training and test datasets after training the β-VAE using an incremental β from 0.1 to 1 over 100 epochs on the input dataset of band 151: (a) generated images for the training dataset; (b) generated images for the test dataset.

Figure 25. Two-dimensional t-SNE representation of the latent space for the training data after training the β-VAE using an incremental β from 0.1 to 1 over 100 epochs on the input dataset of band 151.

Figure 26. Reconstructed images for the training and test datasets of band 361 images using β = 1 to β = 10 for 200 epochs: (a) reconstructed images for the training dataset, representing each of the 3 aflatoxin levels; (b) reconstructed images for the test dataset, representing the Greater Than 160 ppn level.

Figure 27. Generated image samples for both the training and test datasets after training the β-VAE using an incremental β from 1 to 10 over 200 epochs on the input dataset of band 361: (a) generated images for the training dataset; (b) generated images for the test dataset.

Figure 28. Loss functions for the dataset of band 361 images using β = 1 to β = 10 for 200 epochs with batch size 100. The training and test losses for the Total Loss and Reconstruction Loss decrease. The KL Divergence for both the training and test datasets appears to be increasing at a decreasing rate, indicating that it may begin to decrease and converge to a loss of 0.

Figure 29. Two-dimensional t-SNE representation of the latent space for the training data after training the β-VAE using an incremental β from 1 to 10 over 200 epochs on the input dataset of band 361.

Figure 30. Implemented DCGAN architecture.

Figure 31. Training losses for both DCGAN models implemented: (a) discriminator and generator training loss for a DCGAN model trained for 100 epochs using the image dataset of band 11; the discriminator loss gradually decreases while the generator loss gradually increases. (b) Discriminator and generator training loss for a DCGAN model trained for 10 epochs using a dataset of each wavelength image from 8 hyperspectral images from each of the Less Than 8 ppn and Greater Than 300 ppn levels; the discriminator loss decreases while the generator loss increases.
27 pages, 4959 KiB  
Article
Deep Learning Autoencoders for Fast Fourier Transform-Based Clustering and Temporal Damage Evolution in Acoustic Emission Data from Composite Materials
by Serafeim Moustakidis, Konstantinos Stergiou, Matthew Gee, Sanaz Roshanmanesh, Farzad Hayati, Patrik Karlsson and Mayorkinos Papaelias
Infrastructures 2025, 10(3), 51; https://doi.org/10.3390/infrastructures10030051 - 2 Mar 2025
Viewed by 343
Abstract
Structural health monitoring (SHM) in fiber-reinforced polymer (FRP) composites is essential to ensure safety and reliability during service, particularly in critical industries such as aerospace and wind energy. Traditional methods of analyzing Acoustic Emission (AE) signals in the time domain often fail to accurately detect subtle or early-stage damage, limiting their effectiveness. The present study introduces a novel approach that integrates frequency-domain analysis using the fast Fourier transform (FFT) with deep learning techniques for more accurate and proactive damage detection. AE signals are first transformed into the frequency domain, where significant frequency components are extracted and used as inputs to an autoencoder network. The autoencoder model reduces the dimensionality of the data while preserving essential features, enabling unsupervised clustering to identify distinct damage states. Temporal damage evolution is modeled using Markov chain analysis to provide insights into how damage progresses over time. The proposed method achieves a reconstruction error of 0.0017 and a high R-squared value of 0.95, indicating the autoencoder’s effectiveness in learning compact representations while minimizing information loss. Clustering results, with a silhouette score of 0.37, demonstrate well-separated clusters that correspond to different damage stages. Markov chain analysis captures the transitions between damage states, providing a predictive framework for assessing damage progression. These findings highlight the potential of the proposed approach for early damage detection and predictive maintenance, which significantly improves the effectiveness of AE-based SHM systems in reducing downtime and extending component lifespan. Full article
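The chain described above (frequency-domain features, clustering of AE events into damage states, and a Markov transition matrix over the resulting state sequence) can be sketched as follows on synthetic data. The autoencoder the study uses for dimensionality reduction is omitted here for brevity, and all signal lengths and cluster counts are assumptions.

```python
# Hedged sketch: FFT magnitude features -> K-Means clusters -> Markov transition matrix.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
segments = rng.standard_normal((200, 1024))       # toy AE signal segments (events)

# Frequency-domain features: magnitude of the one-sided FFT per segment
features = np.abs(np.fft.rfft(segments, axis=1))

# Cluster events into candidate damage states (2 states, as in the study)
states = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Estimate the Markov transition matrix from the time-ordered state sequence
T = np.zeros((2, 2))
for a, b in zip(states[:-1], states[1:]):
    T[a, b] += 1
T = T / T.sum(axis=1, keepdims=True)
print(T)   # row i = probabilities of moving from state i to each state
```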
Figure 1. High-level architecture of the proposed methodology.

Figure 2. Sensor positions for the tensile test (left) and 3-point bend (right).

Figure 3. Raw acoustic data captured from a sample strip in the 3-point bending machine: (a) the whole duration of the experiment; (b) a zoomed-in visualization of a single AE event.

Figure 4. Preprocessing pipeline for AE data, including noise removal, DC offset correction, segmentation, and frequency-domain transformation using FFT.

Figure 5. Comparative literature values for the frequency ranges of damage in CFRPs. Data adapted from [25,27,28,29], illustrating variations in reported frequency bands for matrix cracking, delamination, debonding, fiber breakage, and fiber pull-out.

Figure 6. Examples of peak frequency assessment of a tensile-tested CFRP sample coupon: (a) time vs. peak frequency and (b) magnitude vs. peak frequency.

Figure 7. Time- and frequency-domain AE signals: (a–c) data with a detected AE event; (d) data without a detected AE event. The DC offset has been removed from the time-domain signals.

Figure 8. Thresholding on the frequency domain (mean FFT) for detection of events (example from the lab experiment).

Figure 9. Bayesian optimization convergence.

Figure 10. Examples of reconstructed and original FFT signals.

Figure 11. Analysis of the mean frequency content of two clusters over time. The top plot shows the evolution of the mean frequency content for Cluster 1 and Cluster 2 with accumulated mean frequency curves. The bottom plots display the normalized amplitude of the mean frequency signals for Cluster 1 (left) and Cluster 2 (right), highlighting distinct peaks at specific frequencies within each cluster (results for the lab experiment).

Figure 12. State transition diagram (Markov chain) representing the probability of transitioning between two clusters. The diagram shows the self-transition probabilities for Cluster 1 and Cluster 2, as well as the transition probabilities between the two clusters. The width of the arrows is proportional to the transition probabilities, with values indicated along each transition path.
14 pages, 494 KiB  
Article
Denoising-Autoencoder-Aided Euclidean Distance Matrix Reconstruction for Connectivity-Based Localization: A Low-Rank Perspective
by Woong-Hee Lee, Mustafa Ozger, Ursula Challita and Taewon Song
Appl. Sci. 2025, 15(5), 2656; https://doi.org/10.3390/app15052656 - 1 Mar 2025
Viewed by 262
Abstract
In contrast to conventional localization methods, connectivity-based localization is a promising approach that leverages wireless links among network nodes. Here, the Euclidean distance matrix (EDM) plays a pivotal role in implementing the multidimensional scaling technique for the localization of wireless nodes based on pairwise distance measurements. This is based on the representation of complex datasets in lower-dimensional spaces, resulting from the mathematical property of an EDM being a low-rank matrix. However, EDM data are inevitably susceptible to contamination due to errors such as measurement imperfections, channel dynamics, and clock asynchronization. Motivated by the low-rank property of the EDM, we introduce a new pre-processor for connectivity-based localization, namely denoising-autoencoder-aided EDM reconstruction (DAE-EDMR). The proposed method is based on optimizing the neural network by inputting and outputting vectors of the eigenvalues of the noisy EDM and the original EDM, respectively. The optimized NN denoises the contaminated EDM, leading to an exceptional performance in connectivity-based localization. Additionally, we introduce a relaxed version of DAE-EDMR, i.e., truncated DAE-EDMR (T-DAE-EDMR), which remains operational regardless of variations in the number of nodes between the training and test phases in NN operations. The proposed algorithms show a superior performance in both EDM denoising and localization accuracy. Moreover, the method of T-DAE-EDMR notably requires a minimal number of training datasets compared to that in conventional approaches such as deep learning algorithms. Overall, our proposed algorithms reduce the required training dataset’s size by approximately one-tenth while achieving more than twice the effectiveness in EDM denoising, as demonstrated through our experiments. Full article
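A short sketch of the objects involved may help: constructing a Euclidean distance matrix from node positions, contaminating it, and extracting the eigenvalue vectors that would form the input/output pairs of the denoising autoencoder. Only the EDM construction follows standard definitions; the noise model and scales are assumptions.

```python
# Hedged sketch: EDM construction, contamination, and eigenvalue extraction.
import numpy as np

rng = np.random.default_rng(2)
nodes = rng.uniform(0, 100, size=(12, 2))           # 12 wireless nodes in a 100 m square

# Squared-distance EDM (rank at most dimension + 2, hence "low rank")
gram = nodes @ nodes.T
sq_norms = np.diag(gram)
edm = sq_norms[:, None] + sq_norms[None, :] - 2 * gram

# Contaminated EDM, e.g. from ranging errors (noise model is an assumption)
noisy_edm = edm + rng.normal(scale=25.0, size=edm.shape)
noisy_edm = (noisy_edm + noisy_edm.T) / 2           # keep it symmetric

# Eigenvalue vectors: the denoising autoencoder would map noisy -> clean spectra
clean_eigs = np.linalg.eigvalsh(edm)
noisy_eigs = np.linalg.eigvalsh(noisy_edm)
print(clean_eigs[-4:])   # only a few eigenvalues dominate, reflecting the low rank
```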
Figure 1. An illustration of an example of the proposed DAE-EDMR. (Additionally, truncated DAE-EDMR (T-DAE-EDMR) is presented as a relaxed version of DAE-EDMR. This additional work makes the optimized NN model more efficient by using the dominant k + 2 eigenvalues as the input data. Details can be found in Section 2.3.)

Figure 2. Performance comparison according to the number of training datasets: (a) NMSE between ground-truth and denoised EDMs and (b) localization error in meters.

Figure 3. Performance comparison according to the NLoS probability: (a) NMSE between ground-truth and denoised EDMs and (b) localization error in meters.

Figure 4. Performance comparison according to the utilized matrices: (a) NMSE between ground-truth and denoised EDMs and (b) localization error in meters.
22 pages, 873 KiB  
Article
EEG-Based Music Emotion Prediction Using Supervised Feature Extraction for MIDI Generation
by Oscar Gomez-Morales, Hernan Perez-Nastar, Andrés Marino Álvarez-Meza, Héctor Torres-Cardona and Germán Castellanos-Dominguez
Sensors 2025, 25(5), 1471; https://doi.org/10.3390/s25051471 - 27 Feb 2025
Viewed by 222
Abstract
Advancements in music emotion prediction are driving AI-driven algorithmic composition, enabling the generation of complex melodies. However, bridging neural and auditory domains remains challenging due to the semantic gap between brain-derived low-level features and high-level musical concepts, making alignment computationally demanding. This study proposes a deep learning framework for generating MIDI sequences aligned with labeled emotion predictions through supervised feature extraction from neural and auditory domains. EEGNet is employed to process neural data, while an autoencoder-based piano algorithm handles auditory data. To address modality heterogeneity, Centered Kernel Alignment is incorporated to enhance the separation of emotional states. Furthermore, regression between feature domains is applied to reduce intra-subject variability in extracted Electroencephalography (EEG) patterns, followed by the clustering of latent auditory representations into denser partitions to improve MIDI reconstruction quality. Using musical metrics, evaluation on real-world data shows that the proposed approach improves emotion classification (namely, between arousal and valence) and the system’s ability to produce MIDI sequences that better preserve temporal alignment, tonal consistency, and structural integrity. Subject-specific analysis reveals that subjects with stronger imagery paradigms produced higher-quality MIDI outputs, as their neural patterns aligned more closely with the training data. In contrast, subjects with weaker performance exhibited auditory data that were less consistent. Full article
(This article belongs to the Special Issue Advances in ECG/EEG Monitoring)
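Centered Kernel Alignment, used above to relate the neural and auditory feature spaces, has a simple linear form. The sketch below computes linear CKA between two toy feature matrices; choosing the linear rather than a kernelized variant is an assumption.

```python
# Linear CKA between two feature matrices X (n x p) and Y (n x q); toy data only.
import numpy as np

def linear_cka(X, Y):
    X = X - X.mean(axis=0)                 # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    norm_x = np.linalg.norm(X.T @ X, 'fro')
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(3)
eeg_feats = rng.standard_normal((120, 64))          # e.g., EEGNet embeddings
midi_feats = eeg_feats @ rng.standard_normal((64, 32)) + 0.1 * rng.standard_normal((120, 32))
print(linear_cka(eeg_feats, midi_feats))            # close to 1 for strongly aligned spaces
```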
Figure 1. Proposed deep learning framework for EEG-based emotion prediction using supervised feature extraction for MIDI generation. Stages: (i) segment-wise preprocessing; (ii) supervised deep feature extraction for emotion classification; and (iii) affective-based MIDI prediction and feature alignment.

Figure 2. Visualization of emotion labels and MIDI feature representations. (a) Emotion labels set by Subject #1, where the x-axis represents arousal and the y-axis represents valence. (b) Two-dimensional t-SNE projection (n_components = 2, perplexity = 5) of the piano-roll arrays, illustrating clustering of MIDI features based on emotion labels. Colors indicate the class of each audio stimulus. (c) Embedding space obtained from the bottleneck representation of the piano-roll autoencoder trained with the CKA loss. Dots correspond to training data, while crosses (×) represent test data. Of note, the axes are resized to provide better visual perception of the plotted values.

Figure 3. Comparison between the original MIDI (top) and the reconstructed MIDI (bottom). Of note, the axes are resized to provide better visual perception of the plotted values.

Figure 4. Probability density functions (PDFs) of model characteristics for all subjects in fold 1. Each subfigure corresponds to a specific feature extracted from the MIDI data: (a) Feature 0 (e.g., pitch range); (b) Feature 1 (e.g., total used pitch); (c) Feature 2 (e.g., average IOI); (d) Feature 3 (e.g., pitch-class histogram).

Figure 5. Violin plots comparing metrics for the best- and worst-performing subjects.
18 pages, 8946 KiB  
Article
Estimation of Nitrogen Content in Hevea Rubber Leaves Based on Hyperspectral Data Deep Feature Fusion
by Wenfeng Hu, Longfei Zhang, Zhouyang Chen, Xiaochuan Luo and Cheng Qian
Sustainability 2025, 17(5), 2072; https://doi.org/10.3390/su17052072 - 27 Feb 2025
Viewed by 179
Abstract
Leaf nitrogen content is a critical quantitative indicator for the growth of rubber trees, and accurately determining this content holds significant value for agricultural management and precision fertilization. This study introduces a novel feature extraction framework—SFS-CAE—that integrates the Sequential Feature Selection (SFS) method with Convolutional Autoencoder (CAE) technology to enhance the accuracy of nitrogen content estimation. Initially, the SFS algorithm was employed to select spectral bands from hyperspectral data collected from rubber tree leaves, thereby extracting feature information pertinent to nitrogen content. Subsequently, a CAE was utilized to further explore deep features within the dataset. Ultimately, the selected feature subset was concatenated with deep features to create a comprehensive input feature set, which was then analyzed using partial least squares regression (PLSR) for nitrogen content regression estimation. To validate the effectiveness of the proposed methodology, comparisons were made against commonly used competitive adaptive reweighted sampling (CARS), successive projection algorithm (SPA), and uninformative variable elimination (UVE) feature selection algorithms. The results indicate that SFS-CAE outperforms traditional SFS methods on the test set; notably, CARS-CAE achieved optimal performance with a coefficient of determination (R2) of 0.9064 and a root mean square error (RMSE) of 0.1405. This approach not only effectively integrates deep features derived from hyperspectral data but also optimizes both band selection and feature extraction processes, offering an innovative solution for the efficient estimation of nitrogen content in rubber tree leaves. Full article
(This article belongs to the Section Sustainable Agriculture)
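The band-selection-plus-regression portion of the pipeline can be sketched with scikit-learn's sequential feature selector and PLS regression. The deep features from the convolutional autoencoder are not reproduced here, and all data shapes and component counts are placeholders.

```python
# Hedged sketch: sequential band selection + PLSR, without the CAE deep features.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
spectra = rng.random((120, 200))                    # 120 leaf samples x 200 bands (toy)
nitrogen = spectra[:, 10] * 2.0 + rng.normal(scale=0.05, size=120)

# Use a 1-component PLS during selection so small candidate subsets stay valid
selector = SequentialFeatureSelector(PLSRegression(n_components=1),
                                     n_features_to_select=20, direction='forward')
selector.fit(spectra, nitrogen)

selected = spectra[:, selector.get_support()]       # selected band subset
# In the paper, CAE deep features would be concatenated with `selected` here.
pls = PLSRegression(n_components=5)
pls.fit(selected, nitrogen)
print(pls.score(selected, nitrogen))                # R^2 on the (toy) training data
```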
Figure 1. Flow chart of the estimation of nitrogen content in rubber tree leaves based on hyperspectral deep feature fusion. The process consists of four main stages: data acquisition, data preprocessing, feature engineering, and modeling analysis and evaluation. HSI stands for hyperspectral image; ROI stands for region of interest.

Figure 2. Research area bitmap.

Figure 3. Three-dimensional hyperspectral images of leaves, images of different bands, and original average spectral curves.

Figure 4. IFA anomaly detection results. The numbers in the figure are the sample numbers.

Figure 5. Statistical histogram of nitrogen content.

Figure 6. Pearson correlation coefficient heat map.

Figure 7. SFS-CAE.

Figure 8. Spectral curves after different pretreatment methods: (a) SG smoothing, (b) SNV, (c) WAVE, (d) FOD.

Figure 9. Results of 1000 CARS runs: (a) the frequency with which each band occurs; (b) boxplots of the corresponding indicators for the training and test sets.

Figure 10. CAE training results: (a) loss curves of the training set and validation set during CAE training; (b) one of the original curves in the validation set and its reconstruction.

Figure 11. Effect of the number of deep features on PLSR. The red vertical line marks 110 deep features; an R² of 0.9064 corresponds to the modeling accuracy obtained by combining the deep features with the 42 original features.

Figure 12. Scatter plots of modeling results using different feature selection methods.

Figure 13. Extraction results of sensitive spectral bands using various algorithms. The average spectrum is the average spectrum of all samples: (a) Raw, (b) SPA, (c) UVE, (d) CARS1000.
17 pages, 1774 KiB  
Article
Training a Minesweeper Agent Using a Convolutional Neural Network
by Wenbo Wang and Chengyou Lei
Appl. Sci. 2025, 15(5), 2490; https://doi.org/10.3390/app15052490 - 25 Feb 2025
Viewed by 294
Abstract
The Minesweeper game is modeled as a sequential decision-making task, for which a neural network architecture, state encoding, and reward function were herein designed. Both a Deep Q-Network (DQN) and supervised learning methods were successfully applied to optimize the training of the game. The experiments were conducted on the AutoDL platform using an NVIDIA RTX 3090 GPU for efficient computation. The results showed that in a 6 × 6 grid with four mines, the DQN model achieved an average win rate of 93.3% (standard deviation: 0.77%), while the supervised learning method achieved 91.2% (standard deviation: 0.9%), both outperforming human players and baseline algorithms and demonstrating high intelligence. The mechanisms of the two methods in the Minesweeper task were analyzed, with the reasons for the faster training speed and more stable performance of supervised learning explained from the perspectives of means–ends analysis and feedback control. Although there is room for improvement in sample efficiency and training stability in the DQN model, its greater generalization ability makes it highly promising for application in more complex decision-making tasks. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
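As a hedged illustration of the dual-channel state encoding referred to above, the sketch below encodes a small board as a "revealed counts" channel plus a "revealed mask" channel. The board size and cell values are invented for the example and are not the paper's exact encoding.

```python
# Illustrative dual-channel encoding of a Minesweeper board state (toy example).
import numpy as np

H, W = 6, 6
revealed = np.zeros((H, W), dtype=bool)       # which cells have been opened
numbers = np.zeros((H, W), dtype=np.float32)  # adjacent-mine counts for opened cells

revealed[2, 2] = True
numbers[2, 2] = 3.0                           # e.g., the opened cell touches 3 mines

# Channel 0: normalized counts where revealed (0 elsewhere); channel 1: revealed mask.
state = np.stack([numbers / 8.0 * revealed, revealed.astype(np.float32)])
print(state.shape)                            # (2, 6, 6) -> input to the CNN agent
```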
Figure 1. Minesweeper game interface on the Windows XP system (beginner level, 9 × 9 grid, 10 mines).

Figure 2. Illustration of the single-channel encoding representation.

Figure 3. Illustration of the full encoding representation.

Figure 4. Illustration of the dual-channel encoding representation.

Figure 5. Convolutional neural network model of the Minesweeper game agent.

Figure 6. Schematic of the Minesweeper reward function (the red box represents the currently clicked cell).

Figure 7. Training curves of the two methods (smoothing parameter = 0.6).
25 pages, 1516 KiB  
Article
Deep Learning Approach for Automatic Heartbeat Classification
by Roger de T. Guerra, Cristina K. Yamaguchi, Stefano F. Stefenon, Leandro dos S. Coelho and Viviana C. Mariani
Sensors 2025, 25(5), 1400; https://doi.org/10.3390/s25051400 - 25 Feb 2025
Viewed by 194
Abstract
Arrhythmia is an irregularity in the rhythm of the heartbeat, and detecting it is a primary means of identifying cardiac abnormalities. The electrocardiogram (ECG) identifies arrhythmias and is one of the methods used to diagnose cardiac issues. Traditional arrhythmia detection methods are time-consuming, error-prone, and often subjective, making it difficult for doctors to discern between distinct patterns of arrhythmia. To interpret ECG signals, this study presents a multi-class classifier and an autoencoder with long short-term memory (LSTM) network layers for extracting signal properties, evaluated on a dataset from the Massachusetts Institute of Technology and Boston’s Beth Israel Hospital (MIT-BIH). The proposed model achieved an accuracy of 98.57% on the arrhythmia dataset and 97.59% on the supraventricular dataset. In contrast to other deep learning models, the proposed model avoids the vanishing-gradient problem in classification tasks. Full article
(This article belongs to the Section Biomedical Sensors)
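A compact sketch of an LSTM autoencoder of the kind described above is given below in PyTorch. The layer sizes, beat length, and the way the decoder is seeded from the latent vector are assumptions for illustration, not the study's architecture.

```python
# Hedged sketch: sequence-to-sequence LSTM autoencoder for ECG beats (toy shapes).
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features=1, hidden=64, latent=16):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent)
        self.from_latent = nn.Linear(latent, hidden)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                       # x: (batch, seq_len, n_features)
        _, (h, _) = self.encoder(x)
        z = self.to_latent(h[-1])               # compressed beat representation
        seed = self.from_latent(z).unsqueeze(1).repeat(1, x.size(1), 1)
        dec, _ = self.decoder(seed)
        return self.out(dec), z                 # reconstruction and latent features

beats = torch.randn(32, 187, 1)                 # toy batch of fixed-length ECG beats
recon, latent = LSTMAutoencoder()(beats)
loss = nn.functional.mse_loss(recon, beats)     # latent features would feed the classifier
print(recon.shape, latent.shape, loss.item())
```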
Figure 1. Diagram for identifying and adding papers to the literature review of this study.

Figure 2. Diagram of the proposed algorithm.

Figure 3. Diagram of the adaptive filter used.

Figure 4. Example of an original signal without filtering.

Figure 5. Example of a signal after filtering.

Figure 6. Diagram of the coding layer structure.

Figure 7. Diagram of the decoding layer structure.

Figure 8. Classification of heartbeat types.
17 pages, 4186 KiB  
Article
Anomaly-Guided Double Autoencoders for Hyperspectral Unmixing
by Hongyi Liu, Chenyang Zhang, Jianing Huang and Zhihui Wei
Remote Sens. 2025, 17(5), 800; https://doi.org/10.3390/rs17050800 - 25 Feb 2025
Viewed by 119
Abstract
Deep learning has emerged as a prevalent approach for hyperspectral unmixing. However, most existing unmixing methods employ a single network, resulting in moderate estimation errors and less meaningful endmembers and abundances. To address this limitation, this paper proposes a novel double autoencoders-based unmixing method, consisting of an endmember extraction network and an abundance estimation network. In the endmember network, to improve the spectral discrimination, a logarithm spectral angle distance (SAD), integrated with an anomaly-guided weight, is developed as the loss function. Specifically, the logarithm function is used to boost the reliability of a pixel based on its high SAD similarity to other pixels. Moreover, the anomaly-guided weight mitigates the influence of outliers. As for the abundance network, a spectral convolutional autoencoder combined with a channel attention module is employed to exploit the spectral features. Additionally, the decoder weight is shared between the two networks to reduce computational complexity. Extensive comparative experiments with state-of-the-art unmixing methods demonstrate that the proposed method achieves superior performance in both endmember extraction and abundance estimation. Full article
(This article belongs to the Special Issue Recent Advances in the Processing of Hyperspectral Images)
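One plausible reading of the anomaly-weighted logarithmic SAD loss mentioned above is sketched below; the exact weighting and placement of the logarithm in the paper may differ, so this should be read as an assumption-labeled illustration rather than the authors' formulation.

```python
# Hedged sketch: anomaly-weighted logarithmic spectral angle distance (SAD) loss.
import torch

def weighted_log_sad(x, x_hat, anomaly_score, eps=1e-8):
    # x, x_hat: (batch, bands); anomaly_score: (batch,), higher = more anomalous
    cos = torch.nn.functional.cosine_similarity(x, x_hat, dim=1).clamp(-1 + eps, 1 - eps)
    sad = torch.acos(cos)                        # spectral angle per pixel
    weight = 1.0 / (1.0 + anomaly_score)         # assumed form: down-weight outliers
    return (weight * torch.log1p(sad)).mean()    # assumed form: log compresses large angles

x = torch.rand(16, 198)                          # toy pixel spectra
x_hat = x + 0.05 * torch.randn_like(x)           # toy reconstructions
score = torch.rand(16)                           # toy anomaly scores (e.g., from a detector)
print(weighted_log_sad(x, x_hat, score))
```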
Figure 1. The architecture of the proposed ADAEU network, including the endmember extraction network (EE-Net) and the abundance estimation network (AE-Net).

Figure 2. The datasets: (a) Jasper Ridge; (b) Urban; (c) Samson.

Figure 3. Endmembers on the Jasper Ridge dataset: (a) VCA, (b) SGSNMF, (c) CNNAEU, (d) OSPAEU, (e) MTAEU, (f) ENDNET, (g) PGMSU, (h) A2SAN, (i) ADAEU.

Figure 4. Abundance maps on the Jasper Ridge dataset: (a) GT, (b) VCA, (c) SGSNMF, (d) CNNAEU, (e) OSPAEU, (f) MTAEU, (g) ENDNET, (h) PGMSU, (i) A2SAN, (j) ADAEU.

Figure 5. Endmembers on the Urban dataset: (a) VCA, (b) SGSNMF, (c) CNNAEU, (d) OSPAEU, (e) MTAEU, (f) ENDNET, (g) PGMSU, (h) A2SAN, (i) ADAEU.

Figure 6. Abundance maps on the Urban dataset: (a) GT, (b) DMAXD, (c) SGSNMF, (d) CNNAEU, (e) OSPAEU, (f) MTAEU, (g) ENDNET, (h) PGMSU, (i) A2SAN, (j) ADAEU.

Figure 7. Endmembers on the Samson dataset: (a) VCA, (b) SGSNMF, (c) CNNAEU, (d) OSPAEU, (e) MTAEU, (f) ENDNET, (g) PGMSU, (h) A2SAN, (i) ADAEU.

Figure 8. Abundance maps on the Samson dataset: (a) GT, (b) VCA, (c) SGSNMF, (d) CNNAEU, (e) OSPAEU, (f) MTAEU, (g) ENDNET, (h) PGMSU, (i) A2SAN, (j) ADAEU.
26 pages, 17412 KiB  
Article
Enhancing Maritime Safety: Estimating Collision Probabilities with Trajectory Prediction Boundaries Using Deep Learning Models
by Robertas Jurkus, Julius Venskus, Jurgita Markevičiūtė and Povilas Treigys
Sensors 2025, 25(5), 1365; https://doi.org/10.3390/s25051365 - 23 Feb 2025
Viewed by 165
Abstract
We investigate maritime accidents near Bornholm Island in the Baltic Sea, focusing on one of the most recent vessel collisions and a way to improve maritime safety as a prevention strategy. By leveraging Long Short-Term Memory autoencoders, a class of deep recurrent neural networks, this research demonstrates a unique approach to forecasting vessel trajectories and assessing collision risks. The proposed method integrates trajectory predictions with statistical techniques to construct probabilistic boundaries, including confidence intervals, prediction intervals, ellipsoidal prediction regions, and conformal prediction regions. The study introduces a collision risk score, which evaluates the likelihood of boundary overlaps as a metric for collision detection. These methods are applied to simulated test scenarios and a real-world case study involving the 2021 collision between the Scot Carrier and Karin Hoej cargo ships. The results demonstrate that CPR, a non-parametric approach, reliably forecasts collision risks with 95% confidence. The findings underscore the importance of integrating statistical uncertainty quantification with deep learning models to improve navigational decision-making and encourage a shift towards more proactive, AI/ML-enhanced maritime risk management protocols. Full article
(This article belongs to the Section Intelligent Sensors)
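The conformal prediction region (CPR) idea can be sketched in a few lines: compute nonconformity scores (here, position errors in kilometres) on a calibration set and take the finite-sample 95% quantile as the boundary radius around each forecast point. The split-conformal form and the error metric are assumptions made for illustration.

```python
# Hedged sketch: split-conformal radius for trajectory forecasts (toy numbers, km).
import numpy as np

rng = np.random.default_rng(5)
calib_errors = np.abs(rng.normal(scale=0.4, size=500))   # |predicted - actual| positions

alpha = 0.05                                             # 95% coverage target
n = len(calib_errors)
# Finite-sample conformal quantile: ceil((n + 1) * (1 - alpha)) / n
q_level = np.ceil((n + 1) * (1 - alpha)) / n
radius = np.quantile(calib_errors, min(q_level, 1.0))

print(f"CPR boundary radius: {radius:.2f} km around each predicted position")
# Two vessels would be flagged as at risk when their prediction regions overlap,
# i.e. distance(pred_A, pred_B) < radius_A + radius_B.
```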
Figure 1. Workflow diagram of vessel trajectory prediction boundaries and collision detection.

Figure 2. AIS data resampling based on time series. Grey dots represent all AIS observations, while red boundary circles indicate points selected using the k-nearest neighbour method for standardizing time steps.

Figure 3. Illustration of boundary width determination in CPR using nonconformity scores.

Figure 4. AIS study region with the highlighted evaluation area.

Figure 5. Distribution of data points for selected sequence predictions.

Figure 6. Spatio-temporal analysis of the maritime collision accident.

Figure 7. Graphs of collision risk assessment: (a) calculation of CPA and TCPA for the cargo vessels half an hour before the collision; (b) schematic representation of Imazu problem case 4 (extract from the source [52]).

Figure 8. Comparative trajectory predictions from deep learning models. Triangles in the ‘Single Best Model’ subplot indicate the vessel’s moving direction.

Figure 9. Visualisation of prediction and confidence intersection zones for bi-variate spatial data.

Figure 10. Ellipsoidal prediction regions. Numbers next to the red crosses represent the corresponding prediction time steps.

Figure 11. Graphs of accuracy and evaluation results: (a) average model error by time series of trajectory prediction (km); (b) sequences with all time steps within the calculated boundary widths (coverage probability estimation); and (c) coverage probability count across individual time steps.

Figure 12. Comparison of method boundary widths in a marine accident.

Figure A1. Cross-validation results for different models: (a) cross-validation plot for a random model; (b) cross-validation plot for the best model.
23 pages, 2239 KiB  
Article
Securing IoT Networks Against DDoS Attacks: A Hybrid Deep Learning Approach
by Noor Ul Ain, Muhammad Sardaraz, Muhammad Tahir, Mohamed W. Abo Elsoud and Abdullah Alourani
Sensors 2025, 25(5), 1346; https://doi.org/10.3390/s25051346 - 22 Feb 2025
Viewed by 228
Abstract
The Internet of Things (IoT) has revolutionized many domains. Due to the growing interconnectivity of IoT networks, several security challenges persist that need to be addressed. This research presents the application of deep learning techniques for Distributed Denial-of-Service (DDoS) attack detection in IoT networks. This study assesses the performance of various deep learning models, including Latent Autoencoders, LSTM Autoencoders, and convolutional neural networks (CNNs), for DDoS attack detection in IoT environments. Furthermore, a novel hybrid model is proposed, integrating CNNs for feature extraction, Long Short-Term Memory (LSTM) networks for temporal pattern recognition, and Autoencoders for dimensionality reduction. Experimental results on the CICIOT2023 dataset show the enhanced performance of the proposed hybrid model, which achieves a training and testing accuracy of 96.78% together with a validation accuracy of 96.60%. This demonstrates its efficiency in addressing complex attack patterns within IoT networks. Analysis of the results shows that the proposed hybrid model outperforms the others. However, this research has limitations in detecting rare attack types, and it emphasizes the importance of addressing data imbalance to further enhance DDoS attack detection capabilities in future work. Full article
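A compact sketch of the hybrid idea (CNN feature extraction, LSTM temporal modelling, an autoencoder-style bottleneck, and a classification head) is shown below in PyTorch. Layer widths, window length, and the placement of the bottleneck are assumptions rather than the paper's exact configuration.

```python
# Hedged sketch: CNN + LSTM + bottleneck hybrid for flow-window classification.
import torch
import torch.nn as nn

class HybridDDoS(nn.Module):
    def __init__(self, n_features=40, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_features, 64, kernel_size=3, padding=1),
                                 nn.ReLU())
        self.lstm = nn.LSTM(64, 32, batch_first=True)
        self.bottleneck = nn.Sequential(nn.Linear(32, 8), nn.ReLU(),   # AE-style compression
                                        nn.Linear(8, 32), nn.ReLU())
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                    # x: (batch, window, n_features)
        h = self.cnn(x.transpose(1, 2))      # Conv1d expects (batch, channels, time)
        h, _ = self.lstm(h.transpose(1, 2))  # back to (batch, time, channels)
        h = self.bottleneck(h[:, -1])        # last time step -> compressed representation
        return self.head(h)

windows = torch.randn(64, 20, 40)            # toy batch of flow-feature windows
logits = HybridDDoS()(windows)
print(logits.shape)                          # torch.Size([64, 2])
```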
Figure 1. Workflow of the proposed model.

Figure 2. Validation and training loss of the Autoencoder.

Figure 3. Distribution of reconstruction errors.

Figure 4. A histogram illustrating the distribution of reconstruction errors.

Figure 5. Training and validation loss of the LSTM model.

Figure 6. Anomaly detection in test data for the LSTM-based Autoencoder.

Figure 7. Distribution of reconstruction error on test data.

Figure 8. Performance of the CNN model during the training and validation phases.

Figure 9. Confusion matrix of the CNN.

Figure 10. Performance of the hybrid model during the training and validation phases.

Figure 11. Confusion matrix of the hybrid model.
17 pages, 1533 KiB  
Article
Multimodal Brain Growth Patterns: Insights from Canonical Correlation Analysis and Deep Canonical Correlation Analysis with Auto-Encoder
by Ram Sapkota, Bishal Thapaliya, Bhaskar Ray, Pranav Suresh and Jingyu Liu
Information 2025, 16(3), 160; https://doi.org/10.3390/info16030160 - 20 Feb 2025
Viewed by 208
Abstract
Today’s advancements in neuroimaging have been pivotal in enhancing our understanding of brain development and function using various MRI techniques. This study utilizes images from T1-weighted imaging and diffusion-weighted imaging to identify gray matter and white matter coherent growth patterns within 2 years from 9–10-year-old participants in the Adolescent Brain Cognitive Development (ABCD) Study. The motivation behind this investigation lies in the need to comprehend the intricate processes of brain development during adolescence, a critical period characterized by significant cognitive maturation and behavioral change. While traditional methods like canonical correlation analysis (CCA) capture the linear interactions of brain regions, a deep canonical correlation analysis with an autoencoder (DCCAE) nonlinearly extracts brain patterns. The study involves a comparative analysis of changes in gray and white matter over two years, exploring their interrelation based on correlation scores, extracting significant features using both CCA and DCCAE methodologies, and finding an association between the extracted features with cognition and the Child Behavior Checklist. The results show that both CCA and DCCAE components identified similar brain regions associated with cognition and behavior, indicating that brain growth patterns over this two-year period are linear. The variance explained by CCA and DCCAE components for cognition and behavior suggests that brain growth patterns better account for cognitive maturation compared to behavioral changes. This research advances our understanding of neuroimaging analysis and provides valuable insights into the nuanced dynamics of brain development during adolescence. Full article
(This article belongs to the Section Biomedical Information and Health)
Show Figures

Figure 1. First, second, and third components of GM identified through CCA.
Figure 2. First, second, and third components of FA identified through CCA.
Figure 3. First, second, and third components of GM identified through DCCAE.
Figure 4. First, second, and third components of FA identified through DCCAE.
23 pages, 10921 KiB  
Article
A Weakly Supervised and Self-Supervised Learning Approach for Semantic Segmentation of Land Cover in Satellite Images with National Forest Inventory Data
by Daniel Moraes, Manuel L. Campagnolo and Mário Caetano
Remote Sens. 2025, 17(4), 711; https://doi.org/10.3390/rs17040711 - 19 Feb 2025
Viewed by 182
Abstract
National Forest Inventories (NFIs) provide valuable land cover (LC) information but often lack spatial continuity and an adequate update frequency. Satellite-based remote sensing offers a viable alternative, employing machine learning to extract thematic data. State-of-the-art methods such as convolutional neural networks rely on [...] Read more.
National Forest Inventories (NFIs) provide valuable land cover (LC) information but often lack spatial continuity and an adequate update frequency. Satellite-based remote sensing offers a viable alternative, employing machine learning to extract thematic data. State-of-the-art methods such as convolutional neural networks rely on fully pixel-level annotated images, which are difficult to obtain. Although reference LC datasets have been widely used to derive annotations, NFIs consist of point-based data, providing only sparse annotations. Weakly supervised and self-supervised learning approaches help address this issue by reducing dependence on fully annotated images and leveraging unlabeled data. However, their potential for large-scale LC mapping needs further investigation. This study explored the use of NFI data with deep learning and weakly supervised and self-supervised methods. Using Sentinel-2 images and the Portuguese NFI, which covers other LC types beyond forest, as sparse labels, we performed weakly supervised semantic segmentation with a convolutional neural network to create an updated and spatially continuous national LC map. Additionally, we investigated the potential of self-supervised learning by pretraining a masked autoencoder on 65,000 Sentinel-2 image chips and then fine-tuning the model with NFI-derived sparse labels. The weakly supervised baseline achieved a validation accuracy of 69.60%, surpassing Random Forest (67.90%). The self-supervised model achieved 71.29%, performing on par with the baseline using half the training data. The results demonstrated that integrating both learning approaches enabled successful countrywide LC mapping with limited training data. Full article
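The practical core of the weakly supervised setup, training a segmentation network when only a few pixels per image chip carry NFI-derived labels, can be expressed as a masked loss. The PyTorch sketch below uses ignore_index so that unlabeled pixels contribute nothing to the gradient; the class count, the sentinel value, and the labeled-window placement are assumptions for illustration, not the authors' exact training code.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 13        # placeholder for the land cover legend size
UNLABELED = -1          # sentinel for pixels without an NFI-derived label

# Sparse label map: almost every pixel is UNLABELED; only a small 3x3
# window around each photo-point carries a class index.
labels = torch.full((4, 256, 256), UNLABELED, dtype=torch.long)
labels[:, 100:103, 100:103] = 5          # e.g., one labeled window per chip

# Stand-in for the segmentation network's per-pixel class scores
logits = torch.randn(4, NUM_CLASSES, 256, 256, requires_grad=True)

# Cross-entropy evaluated only at labeled pixels; unlabeled pixels are skipped
criterion = nn.CrossEntropyLoss(ignore_index=UNLABELED)
loss = criterion(logits, labels)
loss.backward()
```

The self-supervised part of the pipeline would then amount to initializing the encoder of the segmentation model from masked-autoencoder pretraining before fine-tuning with this sparse loss.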
(This article belongs to the Section Earth Observation Data)
Show Figures

Figure 1. Study area and location of sample areas used for model training and validation.
Figure 2. Example of NFI photo-points: (a) with matching point-patch labels; (b) located at the interface between distinct land covers; and (c) with mismatching point-patch labels.
Figure 3. Illustration of distinctly labeled training data. High-resolution image (a), dense labels used in typical fully supervised methods (b) and sparse labels used in our weakly supervised approach (c). Colored and grey pixels correspond to labeled and unlabeled pixels, respectively. The labels in (c) are derived from the photo-point, seen in the center of the 3 × 3 window.
Figure 4. Network architecture of our ConvNext-V2 Atto U-Net. The figure also exhibits the ConvNext-V2 block. LN, GRN and GELU stand for Layer Normalization, Global Response Normalization and Gaussian Error Linear Unit, respectively. Conv K × K refers to a convolutional layer with a kernel size of K × K.
Figure 5. MAE architecture, illustrating the reconstruction of masked patches. Image representations learned at the encoder can be transferred and applied to different downstream tasks. Each patch corresponds to 8 × 8 pixels.
Figure 6. Overall accuracy of the baseline and self-supervised pretrained models. The values represent the average of 10 runs with a 95% confidence interval and were computed on the validation split.
Figure 7. Validation split accuracy of the three tested models with distinct training set sizes. The reported values are the average of 10 runs with a 95% confidence interval.
Figure 8. Model performance per land cover class measured by the F1-score. For other coniferous, no F1-score was reported for Random Forest, as the model did not predict any sampling units belonging to this class.
Figure 9. Example of land cover maps produced by Random Forest, ConvNext-V2 baseline and ConvNext-V2 self-supervised pretrained models.
Figure 10. Land cover map of Portugal (2023).
Figure A1. Example of 30 × 30 m windows used for training a Random Forest classifier for the homogeneity filter. Annotations as non-homogeneous or homogeneous considered not only the high-resolution images (seen in the figure) but also Sentinel-2 images.
9 pages, 4313 KiB  
Article
Power Load Forecasting System of Iron and Steel Enterprises Based on Deep Kernel–Multiple Kernel Joint Learning
by Yan Zhang, Junsheng Wang, Jie Sun, Ruiqi Sun and Dawei Qin
Processes 2025, 13(2), 584; https://doi.org/10.3390/pr13020584 - 19 Feb 2025
Viewed by 271
Abstract
The traditional power load forecasting learning method has problems such as overfitting and incomplete learning of time series information when dealing with complex nonlinear data, which affects the accuracy of short–medium term power load forecasting. A joint learning method, LSVM-MKL, was proposed based [...] Read more.
Traditional power load forecasting methods suffer from problems such as overfitting and incomplete learning of time-series information when dealing with complex nonlinear data, which affects the accuracy of short- to medium-term power load forecasting. A joint learning method, LSVM-MKL, was proposed based on the mutual reinforcement of deep kernel learning (DKL) and multiple kernel learning (MKL). The multiple kernel method was combined with the input layer and the highest encoding layer of the stacked autoencoder (SAE) network to obtain more comprehensive information. At the same time, the deep kernel was integrated into the optimization training of the Gaussian multi-kernel by means of a nonlinear product, forming a nonlinear composite kernel. Experiments on a large number of reference datasets and on actual industrial data showed that, compared with the Elman and LSTM-Seq2Seq methods, the proposed method improved prediction accuracy by 4.32%, verifying its adaptability to complex, time-varying power load forecasting processes and greatly improving the accuracy of power load forecasting. Full article
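The central construction, a nonlinear composite kernel formed as the product of a deep kernel computed on learned features and a weighted Gaussian multi-kernel, can be sketched as follows. The deep_features function is a stand-in for the trained stacked autoencoder, and the kernel widths, mixing weights, and toy load windows are placeholders rather than the paper's configuration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_multi_kernel(X, Y, gammas=(0.1, 1.0, 10.0), betas=(0.3, 0.4, 0.3)):
    """Weighted sum of Gaussian kernels at several widths (the multi-kernel part)."""
    return sum(b * rbf_kernel(X, Y, gamma=g) for g, b in zip(gammas, betas))

def deep_features(X):
    """Placeholder for the SAE encoding; a fixed random projection with tanh
    stands in for the learned nonlinear mapping."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], 8)) * 0.1
    return np.tanh(X @ W)

def composite_kernel(X, Y):
    """Nonlinear product of the deep kernel and the Gaussian multi-kernel."""
    K_deep = rbf_kernel(deep_features(X), deep_features(Y), gamma=1.0)
    return K_deep * gaussian_multi_kernel(X, Y)

# Toy load-forecasting data: windows of past load -> next-step load
rng = np.random.default_rng(1)
X_train, y_train = rng.standard_normal((200, 24)), rng.standard_normal(200)
X_test = rng.standard_normal((20, 24))

svr = SVR(kernel="precomputed")
svr.fit(composite_kernel(X_train, X_train), y_train)
y_pred = svr.predict(composite_kernel(X_test, X_train))
```

In the paper the multi-kernel weights and the deep features are optimized jointly; the fixed weights here only illustrate how the composite kernel feeds a kernel machine.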
Show Figures

Figure 1. Joint learning framework.
Figure 2. Stack autoencoder structure.
Figure 3. Classification accuracy under three kinds of multi-kernel instances. (a) LSVM-MKL1; (b) Elman1; (c) LSTM-Seq2Seq1; (d) LSVM-MKL2; (e) Elman2; and (f) LSTM-Seq2Seq2.
Figure 4. Datasets and timing databases.
Figure 5. Power load forecasting results. (a) Training values and actual values; and (b) predicted values and real values.
15 pages, 1877 KiB  
Article
GraphEPN: A Deep Learning Framework for B-Cell Epitope Prediction Leveraging Graph Neural Networks
by Feng Wang, Xiangwei Dai, Liyan Shen and Shan Chang
Appl. Sci. 2025, 15(4), 2159; https://doi.org/10.3390/app15042159 - 18 Feb 2025
Viewed by 271
Abstract
B-cell epitope prediction is crucial for advancing immunology, particularly in vaccine development and antibody-based therapies. Traditional experimental techniques are hindered by high costs, time consumption, and limited scalability, making them unsuitable for large-scale applications. Computational methods provide a promising alternative, enabling high-throughput screening [...] Read more.
B-cell epitope prediction is crucial for advancing immunology, particularly in vaccine development and antibody-based therapies. Traditional experimental techniques are hindered by high costs, time consumption, and limited scalability, making them unsuitable for large-scale applications. Computational methods provide a promising alternative, enabling high-throughput screening and accurate predictions. However, existing computational approaches often struggle to capture the complexity of protein structures and intricate residue interactions, highlighting the need for more effective models. This study presents GraphEPN, a novel B-cell epitope prediction framework combining a vector quantized variational autoencoder (VQ-VAE) with a graph transformer. The pre-trained VQ-VAE captures both discrete representations of amino acid microenvironments and continuous structural embeddings, providing a comprehensive feature set for downstream tasks. The graph transformer further processes these features to model long-range dependencies and interactions. Experimental results demonstrate that GraphEPN outperforms existing methods across multiple datasets, achieving superior prediction accuracy and robustness. This approach underscores the significant potential for applications in immunodiagnostics and vaccine development, merging advanced deep learning-based representation learning with graph-based modeling. Full article
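The vector-quantization step that turns each residue's continuous microenvironment embedding into a discrete codebook entry is the core of the VQ-VAE component. A minimal PyTorch sketch of that step is given below, using the standard straight-through gradient trick; the codebook size, embedding width, and commitment weight are assumed values, not necessarily those used in GraphEPN.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Sketch of VQ-VAE quantization: map each latent vector to its nearest
    codebook entry and pass gradients through with the straight-through trick."""
    def __init__(self, num_codes=512, dim=128, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):
        # z: (num_residues, dim) continuous encoder outputs
        d = torch.cdist(z, self.codebook.weight)        # distances to all codes
        idx = d.argmin(dim=1)                           # discrete code index per residue
        z_q = self.codebook(idx)                        # quantized embeddings
        # Codebook and commitment losses (standard VQ-VAE terms)
        loss = ((z_q - z.detach()) ** 2).mean() + self.beta * ((z_q.detach() - z) ** 2).mean()
        # Straight-through estimator: forward uses z_q, backward flows into z
        z_q = z + (z_q - z).detach()
        return z_q, idx, loss

vq = VectorQuantizer()
z = torch.randn(300, 128)            # e.g., one embedding per residue of a protein
z_q, codes, vq_loss = vq(z)
```

The discrete indices and the quantized embeddings would then serve as node features for the downstream graph transformer described in the abstract.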
Show Figures

Figure 1. Schematic view of the GraphEPN architecture. (a) Overall framework: The 3D structure of a protein (PDB ID: 1OTU_A) is represented as a graph, where nodes correspond to amino acid residues. The VQ-VAE encodes and quantizes node features into discrete embeddings, which are passed to a graph transformer for epitope prediction. (b) VQ-VAE module: The encoder extracts latent features and maps them to the nearest codebook vectors, generating discrete representations, while the decoder reconstructs original features. (c) Graph transformer architecture: The model applies graph attention networks (GAT) to capture residue interactions, followed by residual connections and feed-forward layers.
Figure 2. Performance evaluation of the GraphEPN model. (a) ROC curves of 5-fold cross-validation for GraphEPN. (b) AUPRC curves of 5-fold cross-validation for GraphEPN. (c) Comparison of the ROC curves between GraphEPN and peer methods. (d) Comparison of AUPRC curves between GraphEPN and peer methods.
Figure 3. Visualization of epitope predictions for a test case (PDB ID: 6ad8_A, chain A) across multiple methods. (a) Reference epitopes. (b–f) Predictions by GraphEPN, BepiPred 3.0, SEPPA 3.0, SEMA 2.0, and ElliPro, respectively. In each model, correctly predicted epitope residues (true positives) are shown in green, residues incorrectly predicted as epitopes (false positives) are shown in red, and residues that should have been predicted as epitopes but were missed (false negatives) are highlighted in yellow. Silver represents non-epitope residues.
Figure 4. Visualization of GraphEPN model predictions for protein 2j88_A. (a) Predicted epitopes are highlighted on the protein’s 3D structure. Residues with high prediction scores are shown as cyan sticks, with secondary structure elements in green and unlabeled regions in silver. (b) Epitope prediction scores along the sequence, where the x-axis corresponds to the sequence positions of amino acids and the y-axis represents the predicted epitope scores. The color gradient indicates the predicted confidence, with yellow representing high-confidence predictions. The blue dashed line marks the prediction threshold.