AI, Volume 6, Issue 1 (January 2025) – 2 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open them.
21 pages, 473 KiB  
Article
Feature Selection in Cancer Classification: Utilizing Explainable Artificial Intelligence to Uncover Influential Genes in Machine Learning Models
by Matheus Dalmolin, Karolayne S. Azevedo, Luísa C. de Souza, Caroline B. de Farias, Martina Lichtenfels and Marcelo A. C. Fernandes
AI 2025, 6(1), 2; https://doi.org/10.3390/ai6010002 - 27 Dec 2024
Viewed by 128
Abstract
This study investigates the use of machine learning (ML) models combined with explainable artificial intelligence (XAI) techniques to identify the most influential genes in the classification of five recurrent cancer types in women: breast cancer (BRCA), lung adenocarcinoma (LUAD), thyroid cancer (THCA), ovarian cancer (OV), and colon adenocarcinoma (COAD). Gene expression data from RNA-seq, extracted from The Cancer Genome Atlas (TCGA), were used to train ML models, including decision trees (DTs), random forest (RF), and XGBoost (XGB), which achieved accuracies of 98.69%, 99.82%, and 99.37%, respectively. However, the challenges in this analysis included the high dimensionality of the dataset and the lack of transparency in the ML models. To mitigate these challenges, the SHAP (Shapley Additive Explanations) method was applied to generate a list of features, aiming to understand which characteristics influenced the models’ decision-making processes and, consequently, the prediction results for the five tumor types. The SHAP analysis identified 119, 80, and 10 genes for the RF, XGB, and DT models, respectively, totaling 209 genes, resulting in 172 unique genes. The new list, representing 0.8% of the original input features, is coherent and fully explainable, increasing confidence in the applied models. Additionally, the results suggest that the SHAP method can be effectively used as a feature selector in gene expression data. This approach not only enhances model transparency but also maintains high classification performance, highlighting its potential in identifying biologically relevant features that may serve as biomarkers for cancer diagnostics and treatment planning.
(This article belongs to the Section Medical & Healthcare AI)
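The abstract describes ranking genes by SHAP values and reusing the top-ranked genes as a reduced feature set. The sketch below illustrates that general workflow; it is not the authors' pipeline, and the synthetic stand-in for the TCGA expression matrix, the model settings, and the 80-feature cutoff are illustrative assumptions.

```python
# Minimal sketch of SHAP-based feature selection for a multiclass classifier.
# Not the authors' pipeline: the synthetic data, model settings, and the
# 80-feature cutoff are illustrative assumptions.
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a samples x genes expression matrix with 5 tumor types
X, y = make_classification(n_samples=500, n_features=200, n_informative=30,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Multiclass gradient-boosted trees, roughly analogous to the paper's XGB model
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="mlogloss")
model.fit(X_train, y_train)

# SHAP values for tree ensembles; the returned layout differs across shap versions
explainer = shap.TreeExplainer(model)
sv = np.asarray(explainer.shap_values(X_test))
if sv.ndim == 3 and sv.shape[-1] == len(np.unique(y)):  # class axis last -> move it first
    sv = np.moveaxis(sv, -1, 0)                          # -> (classes, samples, features)

# Rank features by the maximum (over classes) of the mean absolute SHAP value,
# then keep only the top-ranked features as the reduced feature list
importance = np.abs(sv).mean(axis=1).max(axis=0)
selected = np.argsort(importance)[::-1][:80]
print("Selected feature indices:", selected[:10])
```

Under these assumptions, the indices in `selected` play the role of the reduced gene list that the abstract describes.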
Figures:
Figure 1. Flowchart of activities to obtain the most important characteristics in the classification.
Figure 2. Number of samples existing in the database before applying the undersampling balancing technique.
Figure 3. Training curves for different models: (a) Training curve for the DT model; (b) Training curve for the RF model; and (c) Training curve for the XGB model.
Figure 4. SHAP summary plot for the decision tree multiclass classification model. The plot displays the contributions of features (genes) to the prediction of cancer types: breast cancer (BRCA), lung adenocarcinoma (LUAD), thyroid cancer (THCA), ovarian cancer (OV), and colon adenocarcinoma (COAD). Features are ranked by maximum average SHAP values, highlighting the most important genes for distinguishing between the classes.
Figure 5. SHAP summary plot for the random forest multiclass classification model. The plot displays the contributions of features (genes) to the prediction of cancer types: breast cancer (BRCA), lung adenocarcinoma (LUAD), thyroid cancer (THCA), ovarian cancer (OV), and colon adenocarcinoma (COAD). Features are ranked by maximum average SHAP values, highlighting the most important genes for distinguishing between the classes.
Figure 6. SHAP summary plot for the XGBoost multiclass classification model. The plot displays the contributions of features (genes) to the prediction of cancer types: breast cancer (BRCA), lung adenocarcinoma (LUAD), thyroid cancer (THCA), ovarian cancer (OV), and colon adenocarcinoma (COAD). Features are ranked by maximum average SHAP values, highlighting the most important genes for distinguishing between the classes.
33 pages, 3678 KiB  
Article
A Step Towards Neuroplasticity: Capsule Networks with Self-Building Skip Connections
by Nikolai A. K. Steur and Friedhelm Schwenker
AI 2025, 6(1), 1; https://doi.org/10.3390/ai6010001 - 24 Dec 2024
Viewed by 41
Abstract
Background: Integrating nonlinear behavior into the architecture of artificial neural networks is regarded as an essential requirement for constituting their effective learning capacity for solving complex tasks. This claim seems to hold for moderate-sized networks, i.e., those with a lower double-digit number of layers. However, going deeper with neural networks regularly leads to a destructive tendency of gradual performance degeneration during training. To circumvent this degradation problem, the prominent neural architectures Residual Network and Highway Network establish skip connections with additive identity mappings between layers. Methods: In this work, we unify the mechanics of both architectures into Capsule Networks (CapsNets) by showing their inherent ability to learn skip connections. As a necessary precondition, we introduce the concept of Adaptive Nonlinearity Gates (ANGs), which dynamically steer and limit the usage of nonlinear processing. We propose practical methods for the realization of ANGs, including biased batch normalization, the Doubly-Parametric ReLU (D-PReLU) activation function, and Gated Routing (GR) dedicated to extremely deep CapsNets. Results: Our comprehensive empirical study using MNIST substantiates the effectiveness of the developed methods and delivers valuable insights for the training of very deep networks of any kind. The final experiments on Fashion-MNIST and SVHN demonstrate the potential of pure capsule-driven networks with GR.
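The abstract names the Doubly-Parametric ReLU (D-PReLU) as one realization of an Adaptive Nonlinearity Gate. The paper defines the exact parameterization; the sketch below is only a hedged interpretation, assuming a ReLU-like unit with independently learnable per-channel slopes on both sides of zero, initialized to the identity so the unit starts out linear and only becomes nonlinear as training moves the two slopes apart.

```python
# Hedged sketch of a "doubly-parametric" ReLU-style activation: an
# interpretation, not the paper's exact definition. Both slopes are learnable
# per channel and start at 1.0, so the unit is initially linear.
import torch
import torch.nn as nn

class DPReLUSketch(nn.Module):
    def __init__(self, num_channels: int, init_neg: float = 1.0, init_pos: float = 1.0):
        super().__init__()
        self.neg_slope = nn.Parameter(torch.full((num_channels,), init_neg))
        self.pos_slope = nn.Parameter(torch.full((num_channels,), init_pos))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-channel slopes broadcast over the batch dimension
        return torch.where(x >= 0, self.pos_slope * x, self.neg_slope * x)

# Usage on a batch of 8 feature vectors with 32 channels
act = DPReLUSketch(num_channels=32)
out = act(torch.randn(8, 32))
```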
Figures:
Figure 1. Visualization of the degradation problem in relation to the network depth based on (a) plain networks and (b) CapsNets with distinct activation functions, using the MNIST classification dataset. A plain network contains 32 neurons per layer, while a CapsNet consists of eight capsules with four neurons each. Network depth is stated as the number of intermediate blocks, including an introducing convolutional layer and a closing classification head. Each block consists of a fully connected layer followed by BN and the application of the activation function. In the case of CapsNets, signal flow between consecutive capsule layers is controlled by a specific routing procedure. The final loss (as cross-entropy) and accuracy, both based on the training set, are reported as an average over five runs with random network initialization. Each run comprises 2n training epochs, where n equals the number of intermediate blocks.
Figure 2. Shortcut and skip connections (highlighted in red) in residual learning. (a) Original definition of a shortcut connection with projection matrix based on [5]. (b) Pattern for self-building skip connections in a CapsNet with SR and an activation function with a suitable linear interval.
Figure 3. Replacement of the static signal propagation in a CapsNet with a nonlinear routing procedure to form parametric information flow gates. (a) Basic pattern with a single routing gate. (b) Exemplary skip path (highlighted in red) crossing multiple layers and routing gates.
Figure 4. Customizing the initialization scheme for BN(β, γ) allows the training of deeper networks by constraining the input distribution (in blue) of an activation function to be positioned in a mostly linear section. Exemplary initializations are shown for (a) sigmoid with BN(0, 0.5), and (b) Leaky ReLU with BN(−2, 1). (A code sketch of this initialization appears after the figure list.)
Figure 5. Parametric versions of ReLU with (a) single and (b) four degree(s) of freedom using an exemplary parameter range of ρ_i ∈ [0, 1]. (a) PReLU learns a nonlinearity specification ρ for input values below zero and directly passes signals above zero. (b) SReLU applies the identity function within the interval [t_min, t_max], and learns two individual nonlinearity specifications ρ_1 and ρ_2 outside of the centered interval.
Figure 6. (a) Generic model architecture with (b) one-layer Feature Extractor (FE), a classification head with z classes and (c) intermediate blocks consisting of fully-connected layers. Dense blocks are specified via capsules or scalar neurons (plain) for the fully-connected units.
Figure 7. First two rows: Mean (first row) and best (second row) training loss progressions over five runs for each BN(β, γ) initialization scheme per activation function. Last two rows: Mean deviation per BN layer of the final β_i and γ_i parameters from their initial values, using the identified superior BN initialization scheme for each activation function. Per plot the model parameter deviations are shown for the best run and as average over all five runs.
Figure 8. (a) Mean and (b) best training loss development over five runs using 90 intermediate blocks, AMSGrad and the superior BN(β, γ) initialization strategy per activation function. Both subfigures provide an inset as a zoom-in for tight regions.
Figure 9. (a) Percentage gain in accuracy for the remaining epochs measured in relation to the final accuracy. Accuracy gains below one percentage point (red line) are gray. (b) Mean training loss development over five runs for varying network depths using ReLU, AMSGrad and BN(2, 1) initialization strategy.
Figure 10. Each row summarizes the experiment results of the parametric activation functions PReLU, SReLU/D-PReLU and APLU, respectively. First two columns: Mean (first column) and best (second column) training loss development over five runs using AMSGrad and varying initialization strategies for BN(β, γ) and the activation function parameters. Insets are provided as zoom-in for tight regions. Second two columns: Mean parameter deviations per layer from their initial values with respect to BN and the parametric activation function. In each case, the identified superior configuration strategy is used. For APLU the configuration with s = 1 is preferred against s = 5 for the benefit of proper visualization. Last column: Mean training loss progress over five runs for varying network depths using the identified superior configuration strategy.
Figure 11. Mean training loss development over five runs using CapsNets with a depth of 500 intermediate blocks and varying routing procedures, activation functions and BN initializations.
Figure 12. (a) Mean training (solid) and validation (dotted) loss progressions over five runs for the pure capsule-driven architecture. (b) Mean bias parameter deviation of GR after training from their initial value of −3.
Figure A1. Row-wise 20 random samples for each dataset in Table A1.
Figure A2. Final training loss (left) and training accuracy (right) averaged over five runs using CapsNets with increasing network depth and distinct configurations.
Figure A3. Convolutional capsule unit with GR between two layers of identical dimensionality and image downsampling using grouped convolutions.
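The Figure 4 caption above describes biased batch normalization: choosing the initial BN shift β and scale γ so that the input distribution of the following activation starts in a mostly linear region. Below is a minimal sketch of that initialization, using the caption's example values; the helper name and the choice of a 1-D batch-norm module are illustrative assumptions.

```python
# Minimal sketch of "biased" batch-norm initialization BN(beta, gamma):
# set the learnable shift/scale so the following activation initially
# operates in a mostly linear region. Values follow the Figure 4 examples.
import torch.nn as nn

def biased_batch_norm(num_features: int, beta: float, gamma: float) -> nn.BatchNorm1d:
    bn = nn.BatchNorm1d(num_features)
    nn.init.constant_(bn.bias, beta)     # beta: initial shift of the normalized activations
    nn.init.constant_(bn.weight, gamma)  # gamma: initial scale of the normalized activations
    return bn

# e.g. BN(0, 0.5) ahead of a sigmoid, or BN(-2, 1) ahead of a Leaky ReLU
bn_for_sigmoid = biased_batch_norm(64, beta=0.0, gamma=0.5)
bn_for_leaky_relu = biased_batch_norm(64, beta=-2.0, gamma=1.0)
```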