Open AccessArticle

Semi-Supervised Deep Subspace Embedding for Binary Classification of Sella Turcica

Kaushlesh Singh Shakya

^1,2,3

Azadeh Alavi

^1,*

Julie Porteous

Priti Khatri

^2,3

Amit Laddi

^2,3,*

Manojkumar Jaiswal

^4,*

and

Vinay Kumar

⁴

School of Computing Technologies, RMIT University, Melbourne, VIC 3000, Australia

Academy of Scientific & Innovative Research (AcSIR), Ghaziabad 201002, India

CSIR—Central Scientific Instruments Organisation, Chandigarh 160030, India

⁴

Oral Health Sciences Centre, Post Graduate Institute of Medical Education & Research (PGIMER), Chandigarh 160012, India

Authors to whom correspondence should be addressed.

Appl. Sci. 2024, 14(23), 11154; https://doi.org/10.3390/app142311154

Submission received: 11 November 2024 / Revised: 25 November 2024 / Accepted: 27 November 2024 / Published: 29 November 2024

(This article belongs to the Special Issue Application of Artificial Intelligence in Biomedical Informatics)

Download

Browse Figures

Versions Notes

Abstract

In orthodontics, the manual tracing of cephalometric radiographs is a common practice, where the Sella Turcica (ST) serves as a reference point. The radiologist often manually traces the outline of the sella using manual tools (e.g., calipers on radiographs). Perhaps the inherent complexity and variability in the shapes of sella and the lack of advanced assessment tools make the classification of sella challenging, as it requires extensive training, skills, time, and manpower to detect subtle changes that often may not be apparent. Moreover, existing semi-supervised learning (SSL) methods face key limitations such as shift invariance, inadequate feature representation, overfitting on small datasets, and a lack of generalization to unseen variations in ST morphology. Medical imaging data are often unlabeled, limiting the training of automated classification systems for ST morphology. To address these limitations, a novel semi-supervised deep subspace embedding (SSLDSE) framework is proposed. This approach integrates real-time stochastic augmentation to significantly expand the training dataset and introduce natural variability in the ST morphology, overcoming the constraints of small and non-representative datasets. Non-linear features are extracted and mapped to a non-linear subspace using Kullback–Leibler divergence, which ensures that the model remains consistent despite image transformations, thus resolving issues related to shift invariance. Additionally, fine-tuning the Inception-ResNet-v2 network on these enriched features reduces retraining costs when new unlabeled data becomes available. t-distributed stochastic neighbor embedding (t-SNE) is employed for effective feature representation through manifold learning, capturing complex patterns that previous methods might miss. Finally, a zero-shot classifier is utilized to accurately categorize the ST, addressing the challenge of classifying new or unseen variations. Further, the proposed SSLDSE framework is evaluated through comparative analysis with the existing methods (Active SSL, GAN SSL, Contrastive SSL, Modified Inception-ResNet-v2) for ST classification using various evaluation metrics. The SSLDSE and the existing methods are trained on our dataset (sourced from PGI Chandigarh, India), and a blind test is conducted on the benchmark dataset (IEEE ISBI 2015). The proposed method improves classification accuracy by 15% compared to state-of-the-art models and reduces retraining costs.

Keywords:

sella turcica; deep learning; semi supervised learning; manifold learning; medical images; classification

1. Introduction

The Sella Turcica (ST), a saddle-shaped depression in the sphenoid bone housing the pituitary gland, plays a crucial role in craniofacial orthodontics. It serves as a key reference point in cephalometric analysis, essential for assessing craniofacial growth and guiding orthodontic treatment planning, particularly in aligning the jaws [1] and facial bones [2]. Alterations in the shape or size of the ST often signify growth disturbances, which may lead to dentofacial deformities, such as malocclusions and jaw misalignments [3,4,5,6]. According to the literature, the ST is divided into three segments: the anterior wall, floor, and posterior wall [7]. In addition, Camp subdivided the standard sella shape into three types: oval, round or circular, and flat (Figure 1) [8,9]. Among these, the oval shape is the most prevalent, while the flat shape is the least common. Another interesting morphological shape of the ST is the convergence of the anterior and posterior clinoid lobes, also known as bridging, illustrated in Figure 1. This morphology is strongly associated with certain syndromes and malformations involving skeletal and dental structures [10]. The accurate classification of ST morphology is essential, as specific variations such as bridging or flattening often correlate with dentofacial deformities, syndromes, and even endocrine disorders due to the proximity of the ST to the pituitary gland [11,12,13]. Currently, manual assessment tools, including visual inspection, linear measurements [7], and shape analysis of cephalometric X-rays [8], dominate clinical practice. However, these methods are time intensive, prone to variability, and heavily reliant on specialized expertise, underscoring the acute need for automated and objective assessment techniques [9].

Deep learning techniques have proven highly effective in various computer vision tasks. However, the effectiveness of these approaches heavily relies on the availability of labeled datasets. In unsupervised learning scenarios, evaluating the reliability of extracted features proves challenging due to the lack of prior information, making it difficult to assess their robustness [14]. Conversely, supervised learning methods are more commonly employed across diverse applications. Although supervised learning produces more reliable features, the process of annotating extensive datasets is often extremely time intensive and expensive. Consequently, there has been a growing interest in semi-supervised learning (SSL) methodologies that utilize deep learning models [15,16]. Previous studies mainly used only unlabeled data during pre-training [17]. However, recent research efforts have focused on incorporating unlabeled data throughout training [18].

Semi-supervised learning (SSL) techniques are often used to improve model performance by extracting important features or insights from unlabeled data [19,20,21]. However, these approaches have two main limitations. First, many current methods do not effectively address contours in areas with high uncertainty and fail to explicitly model boundary distances. This can result in poor coverage of the targets or inaccurate boundary predictions, especially in medical images. Second, these approaches typically only apply semi-supervision at the final layer of the network, which can lead to the propagation of incorrect guidance throughout the network and undervalue the importance of labeled data compared to unlabeled data. This issue is particularly significant when training convolutional neural networks with a large number of parameters and complex architecture, especially when labeled data are scarce [22,23,24]. Furthermore, recent strategies that attempt to cluster inputs by incorporating noise and smoothing mapping functions locally are less effective when the raw data already contain significant noise.

To address the challenges mentioned earlier, a novel four-step approach for the binary classification (bridging and non-bridging) of non-linear sella shapes using advanced deep SSL has been developed. Initially, fine-tuning is employed to extract a set of discriminative features. Subsequently, deep subspace-based descriptors are applied for feature extraction and descriptor generation, transforming any given image into a non-linear subspace and leveraging manifold learning techniques. Following this, t-distributed stochastic neighbor embedding (t-SNE) is used for dimensionality reduction, with convergence criteria defined to stop the optimization process by minimizing the Kullback–Leibler (KL) divergence between probability distributions. This dimensionality reduction is critical for preserving the local structure of the data while simplifying the representation, making it easier to differentiate between bridging and non-bridging sella shapes. Next, a zero-shot classifier (ZsC) is integrated for binary classification using weighted binary cross-entropy loss, yielding the final output representing the dedicated class of sella structures and leveraging unlabeled data. Finally, the features computed in step two are used to train the Inception-ResNet-v2 network for classification purposes. The contributions of this study condense as follows:

The method proposed in this study introduces a novel approach for conducting a non-linear embedding with KL divergence for semi-supervised binary classification.
The proposed semi-supervised deep subspace embedding (SSLDSE) methodology creates a shift-invariant image representation by first producing an augmented image set for each input. This approach addresses the limitations of existing models, such as Active SSL and Contrastive SSL, which struggle with dataset variability and shifts. SSLDSE improves feature representation and robustness to transformations by mapping each image set to a non-linear subspace defined by a KL divergence point.
The model employs t-SNE for feature representations, utilizing manifold learning to capture non-linear structures that previous methods, such as GAN SSL and Contrastive SSL, may not effectively recognize. This enables SSLDSE to handle complex variations in medical images more effectively, improving classification performance by preserving essential relationships in the data.
SSLDSE also integrates a zero-shot classifier (ZsC) that leverages KL divergence loss to classify previously unseen or unlabeled ST structures. This is particularly important in medical imaging, where acquiring labeled data is challenging. Unlike GAN SSL and Active SSL, which require retraining to adapt to new data, SSLDSE generalizes to unseen classes without additional labeling, enhancing model adaptability and efficiency.
By combining real-time stochastic augmentation with KL divergence, SSLDSE generates shift-invariant features and generalizes better across different datasets, mitigating the risk of overfitting. In contrast, models like Active SSL, which may overfit with limited labeled data, show less consistency, while SSLDSE demonstrates accurate performance across diverse datasets.
The SSLDSE model is benchmarked against state-of-the-art semi-supervised learning models, including Active SSL, GAN SSL, and Contrastive SSL. Through experiments on both our proprietary dataset and the IEEE ISBI 2015 dataset, SSLDSE achieves superior performance in terms of accuracy, precision, recall, and F1 score. It outperforms these models by delivering robust classification with fewer labeled data, demonstrating its effectiveness in real-world applications.

The approach was assessed against state-of-the-art SSL models using our proprietary dataset and the IEEE ISBI 2015 [25,26] blind test dataset, and the outcomes showed promising performance. The outcomes indicated that the suggested deep relation-based semi-supervised binary-label classification approach surpasses the enhanced Inception-ResNet-v2 network in terms of performance.

2. Materials and Methods

This section details the proprietary lateral cephalometric X-ray dataset, provides an overview of relevant state-of-the-art semi-supervised methods, and describes the network architecture used in this study. The subsections provide a comprehensive explanation of the architectural framework of the SSLDSE, emphasizing the integration of deep subspace embeddings designed to capture latent feature representations by mapping input data to a lower-dimensional space that retains crucial structural information. These embeddings are enhanced by KL divergence, a statistical method used to measure the difference between two probability distributions, and the t-SNE manifold learning technique used to reduce the dimensionality of data while preserving its structure. These techniques facilitate the efficient extraction of meaningful features from both labeled and unlabeled images. Furthermore, the zero-shot classifier is incorporated into the SSL framework to improve generalization, allowing the model to classify unseen categories without explicit supervision. The proposed combination ensures robust performance even with limited labeled data, particularly in the binary classification of bridging and non-bridging-ST structures. Subsequent sections elaborate on the experimental settings and evaluation measures, with detailed experimental results discussed in Section 3. Finally, the discussion and conclusions are presented in Section 4 and Section 5, respectively.

2.1. Dataset and Pre-Processing

The study uses ethically sourced datasets, including 2254 cephalometric images for training and 627 for testing, supplemented with the IEEE ISBI 2015 dataset [25,26] for validation. The ethically sourced dataset comprises lateral cephalometric X-ray images from the Indian population, provided in 8-bit greyscale with a spatial resolution of 768 × 1024 pixels. In the initial study, min-max scaling is applied to mathematically standardize the input images to a conventional normal distribution, while contrast-limited adaptive histogram equalization (CLAHE) is used to enhance visual contrast during pre-processing [13]. The dataset includes X-ray images of 57% female and 43% male patients, excluding post-operative and artifact-affected X-ray images. Furthermore, for the classification task and to analyze the characteristics of the ST, a

256 \times 256

Region of Interest (RoI) corresponding to the ST is extracted from the full X-ray images. These datasets are used to identify various representative conditions and a pre-determined set of potential issues, creating transient scenarios within the data. The scenarios are then randomly divided into two groups,

S L_{i}

and

S L_{\hat{i}}

, representing labeled and unlabeled cases, respectively, as illustrated in Figure 2. Let

n_{i}

and

n_{\hat{i}}

represent the number of

S L_{i}

and

S L_{\hat{i}}

cases, respectively.

Experts label each case in

S L_{i}

as bridging or non-bridging, with approximately 40% of the dataset labeled and the remainder unlabeled. Cases in

S L_{\hat{i}}

are analyzed using a swift feature extraction method [27,28], with analysis terminated if it exceeds a certain threshold. To manage the computational load,

n_{i}

is kept smaller than

n_{\hat{i}}

. After analysis, responsive data

x_{i}

and stability labels

y_{i}

(if applicable) are collected. The cases are categorized into labeled and unlabeled sets (

S L_{i}

and

S L_{\hat{i}}

). Following this, trajectory feature extraction techniques, such as convolutional neural networks (CNNs), are applied, and the resulting feature vectors form a featured database

L = {L_{i}, L_{\hat{i}}}

, facilitating model development.

2.2. Semi-Supervised Methods

Semi-supervised learning (SSL) is a promising approach to tackle the challenges of creating large-scale annotations. It involves training models to generate meaningful representations from unlabeled data [29]. Unlike traditional supervised learning methods, SSL allows the development of versatile models that can be adapted for various tasks without needing extensive labeled datasets. This capability is particularly valuable in the field of medical imaging, where obtaining expert annotations is significantly more expensive. Therefore, applying SSL techniques in this domain has substantial implications for real-world clinical applications. In recent decades, researchers have focused on various semi-supervised problems in medical imaging. Previous SSL studies covered segmentation and classification tasks across different imaging modalities, including Alzheimer’s disease classification [30], nucleus classification [31] in histopathology images, cervical cell instance segmentation [32], and COVID-19 lesion segmentation from CT images [33]. Earlier research extensively used techniques such as self-training, co-training, graph-based algorithms [34], and semi-supervised SVM [35,36,37].

2.2.1. Active Learning

In active learning (AL), the method of instance selection is crucial and is determined by assessing the informational value of unlabeled data points. Uncertainty sampling [38,39], a widely employed approach, involves the learning algorithm choosing the instance with considerable labeling uncertainty. Another technique, query-by-committee, utilizes a voting system by maintaining a group of competing models that cast votes on labels [40]. The typical case with the highest disagreement among models is then chosen. An additional, more theoretically grounded selection method is rooted in decision theory and identifies the instance that would lead to the most significant decrease in the model’s generalization error if its label were known [41,42].

2.2.2. Generative Adversarial Networks

Recently, deep learning has significantly advanced semi-supervised classification in medical imaging. Various studies have shown that generative adversarial networks (GANs) are productive tools for semi-supervised learning, particularly in medical image analysis for the automated diagnosis of skin, heart, and retinal conditions. Madani et al. (2018) [43] used GANs to address challenges related to the scarcity of labeled data and variations in data domains in chest X-ray classification. The Conditional GAN [44] modified GANs by incorporating class labels into both the generator and discriminator components. The semi-supervised GAN [45] went a step further by compelling the discriminator network to produce class labels. Goodfellow et al. [46] introduced a theoretical framework for GANs and used them to generate images without supervised information. Subsequently, Radford et al. [47] introduced deep convolutional GANs (DCGANs) for unsupervised representations. To resolve the issue of gradient vanishing, WGAN [48] was developed, using the Wasserstein distance instead of the Jensen–Shannon divergence to compare the dataset distribution with the learning distribution from G. Furthermore, training on synthetic instances rather than real ones under a supervised learning scenario has been applied in various tasks [49].

2.2.3. Contrastive Learning

Contemporary studies have shown that contrastive learning (CL) is effective in extracting representations from unlabeled data. Its main advantage is its ability to learn invariant representations by comparing different views of input samples. The SimCLR [50] approach treats an augmented version of the original image as a positive sample, while views of other images in the same batch are considered negative samples. In contrast, BYOL [51] uses an augmented view of the image and trains an online network to predict the representation generated by a target network for the same image under various augmentations. The target network is updated using a moving average of the online network’s parameters. SimSiam achieves notable performance through self-supervised learning alone, stopping gradient operations without requiring negative sample pairs, large batch sizes, or momentum encoders [52]. An alternative method involves clustering [53]. The SwAV model introduces K prototype variables, with each prototype representing a shared feature of a sample type, enabling the more reliable clustering of samples within a batch. The model then utilizes contrastive learning based on the clustering outcomes to enhance sample representation.

While contrastive learning and other semi-supervised methods show significant promise, a dearth of research addresses the concurrent use of coarsely labeled and unlabeled data in medical imaging. One exception is a study on histological image classification [54], where the feature network is pre-trained using self-supervised learning to extract information from all training instances, followed by supervised learning with a margin-based loss function in a deep attention-based MIL network [55]. However, to our knowledge, SSLDSE learning has not been applied to the binary classification of ST. This approach, which combines a limited amount of coarsely labeled data with a larger set of unlabeled data, offers significant potential for reducing the annotation burden. The proposed SSLDSE framework underwent performance comparison against state-of-the-art models, including AL, GAN, CL, and a modified Inception-ResNet-v2, using accuracy, precision, recall, F1 score, blind test results, and ROC curve values for ST classification. This analysis aims to understand the framework’s effectiveness in tackling the challenges of classifying ST morphological structures while reducing the reliance on large labeled datasets.

2.3. Proposed Semi-Supervised Deep Subspace Embedding (SSLDSE) Architecture

Unlike conventional medical classification methods that predominantly depend on fully labeled datasets for accurate predictions, this approach introduces an innovative combination of semi-supervised deep subspace learning targeted at the binary classification of ST anomalies. It employs labeled and unlabeled data, utilizing deep subspace embeddings, KL divergence to refine probability distributions, and manifold learning to capture non-linear relationships within cephalometric data. This method focuses explicitly on binary classification, aiming to differentiate between normal and abnormal ST structures, particularly associated with dentofacial anomalies. Furthermore, using a zero-shot classifier enhances the model’s ability to identify new or previously unobserved abnormalities, even with limited labeled data. This approach significantly improves classification accuracy and robustness, especially in scenarios marked by noise and variability in cephalometric images. The flow diagram in Figure 3 illustrates the overall process of the proposed framework.

2.3.1. Exploring Non-Linear Embeddings with Kullback–Leibler Divergence

The Kullback–Leibler (KL) [56,57] divergence is a quantitative measure that assesses the disparity between two probability density functions (PDF),

f_{1} (x)

and,

f_{2} (x)

. In this context,

f_{1} (x)

represents the PDF for bridging sella shapes, and

f_{2} (x)

represents the PDF for non-bridging sella shapes. The discrimination information function

D (f_{1}, f_{2})

is defined as the difference between the two PDFs using

f_{2}

as the reference distribution. It is calculated as

D (f_{1}, f_{2}) = \int_{- \infty}^{\infty} f_{1} (x) ln (f_{1} (x)) d x - \int_{- \infty}^{\infty} f_{1} (x) ln (f_{2} (x)) d x,

(1)

as described in [6,29]. It is evident that

D (f_{1}, f_{2}) > 0

, and this value is only zero when the two PDFs are nearly identical. Therefore, the KL divergence is always non-negative, and higher values signify a more significant distinction between the two distributions [30].

In the present study, KL divergence is employed to map image sets to a non-linear subspace, where each point corresponds to a KL divergence value [58]. The process commences by extracting features from each image and representing them as probability distributions. Initially, various types of features are experimented with, including raw pixel values, histograms, and intermediate outputs from a CNN. Although raw pixel values and histograms are considered, they are ultimately not used in the final KL divergence calculation due to their limitations in dimensionality and lack of spatial information. Instead, intermediate CNN outputs are exclusively used due to their superior ability to capture both low-level and high-level features.

Given the computational intensity of calculating KL divergence for all possible pairs of images, particularly in large datasets, our methodology incorporates an efficient strategy to manage the computational load. Instead of calculating the KL divergence for every possible pair of images, a sampling technique is employed to select a representative subset of image pairs. This approach reduces the number of pairwise comparisons without compromising the integrity of the analysis. Additionally, parallel processing is utilized to distribute the computational workload across multiple cores, further accelerating the KL divergence calculations.

[CNN-Based Feature Extraction and Probability Distribution Estimation] The rich representational power of intermediate CNN outputs, specifically from Inception-ResNet-v2 [59,60,61], allows for the utilization of these features as the primary inputs for the KL divergence calculations [62]. These CNN-extracted features encapsulate both low-level and high-level morphological traits in a reduced dimensional space, making them particularly suitable for this form of probabilistic analysis. The process begins with extracting deep features from each image, followed by converting these features into probability distributions. To estimate these distributions for a pair of images, a binning technique is employed, where features are binned into fixed intervals. The number of bins as well as the binning strategy (e.g., equal width or equal frequency) are optimized for the dataset. To ensure numerical stability and to avoid zero probabilities in the binning process, a smoothing constant $ε$ is added to each bin count prior to normalization. Given two sets of features P (from image 1) and Q (from image 2), the KL divergence between the distributions of their CNN outputs is calculated as

$D_{K L} (P ‖ Q) = \sum_{i} P (Y_{i} ∣ X) ln (\frac{P (Y_{i} ∣ X)}{Q (Y_{i})}),$

(2)

Here, i represents the bin index for the estimated feature distributions. This approach enables a fine-grained comparison between the image pairs, capturing the inherent differences in ST shapes.

[Efficient Computation and Manifold Learning Integration] Computing the KL divergence for all image pairs in large datasets can be computationally prohibitive. Therefore, an optimized sampling method is applied to select a representative subset of image pairs for divergence calculation. This reduces the computational load significantly while maintaining analytical precision. Parallel processing across multiple cores further enhances the efficiency of these calculations, allowing for the rapid processing of large datasets without sacrificing detail. To capture the complex, non-linear relationships inherent in these features, manifold learning techniques, particularly t-distributed Stochastic Neighbor Embedding (t-SNE), are incorporated [63,64]. t-SNE helps embed high-dimensional divergence values into a lower-dimensional space while preserving both the local and global structures of the data. This allows the model to generalize effectively across variations in ST. The embedding reveals clusters and patterns that are not immediately obvious in the original high-dimensional feature space.
Let $I_{1}$ and $I_{2}$ denote the sets of images represented by bridging and non-bridging ST forms, respectively. For each image I in $I_{1}$ and $I_{2}$ , a feature representation $f (I)$ is extracted. The KL divergence $D (f (I_{1}), (I_{2}))$ is computed for each image pair $I_{1}$ in $I_{1}$ and $I_{2}$ in $I_{2}$ , resulting in a matrix of divergence values. A non-linear embedding technique is then used to visualize these relationships in a lower-dimensional space. $I_{1}$ and $I_{2}$ are the sets of feature representations for the bridging and non-bridging ST images to formalize this process. The embedding of D in a lower-dimensional space is given by

$D = Embed (D (I_{1}, I_{2})),$

(3)

where $D (I_{1}, I_{2})$ is the matrix of KL divergence values between all pairs of feature representations from $I_{1}$ and $I_{1}$ , and $E m b e d ()$ is the embedding function t-SNE. Because both $D (f_{1}, f_{2})$ and $D (f_{2}, f_{1})$ are non-negative and unbounded, this visualization highlights the divergence patterns, elucidating the morphological differences between bridging and non-bridging ST images. The embedding further enhances the model’s ability to distinguish between classes by preserving the non-linear relationships inherent in the data.

[Normalized KL Divergence and Information Distinguishability] While the KL divergence offers valuable insight into the separability of feature distributions, its unbounded nature can make comparisons across different datasets or conditions challenging. To address this, the Information Distinguishability $(I D)$ measure proposed by Soffi et al. [65] is adopted, which normalizes the KL divergence into a bounded metric between 0 and 1:

$I D (f_{1}, f_{2}) = 1 - exp [- D (f_{1}, f_{2})],$

(4)

This measure ranges from 0 to 1, where $I D = 0$ indicates identical marker distributions $f_{1} (x)$ and $f_{2} (x)$ , and $I D = 1$ indicates the complete separation of the distributions. This transformation provides a standardized, bounded measure useful for comparing different biomarkers under study, similar to the AUC and Youden index, especially for the assessed pre-test and post-test rule-in and rule-out potentials. Integrating manifold learning with KL divergence ensures that the learned embeddings are robust, capturing the intricate non-linearities in the data without compromising the model’s ability to distinguish between different classes. This combination of techniques enhances clustering performance and ensures robust, shift-invariant embeddings.

2.3.2. Semi-Supervised Learning with Deep Subspace Descriptor

In typical classification tasks, a CNN is designed to generate predictions

p_{i}

for

i = 1, . . ., n

by utilizing a labeled dataset

S L_{i} = {S L_{i} \in S L}_{1}^{n}

. This is achieved by mapping a high-dimensional data space

S L

to a smaller space using techniques such as max-pooling and fully convolutional layers. As the dimensions of each layer are reduced, the essential information of the data is distilled, effectively transforming each layer into a feature embedding of the data.

On the other hand, clustering algorithms aim to organize an unlabeled dataset

S L_{\hat{i}} = {S L_{\hat{i}} \in S L}_{1}^{n}

into x clusters, represented by their means

μ_{j}

for

j = 1, . . ., k

. Unlike supervised CNNs, clustering algorithms do not require labels to form clusters [24]. However, they suffer from the curse of dimensionality and perform better when the dimensionality of data space

S L

is reduced.

The primary goal of this study is to combine the advantages of both methods: the powerful capabilities of a deep CNN and the label-free clustering capabilities of a robust clustering algorithm based on unsupervised deep embedding [66]. By merging these approaches, the study addresses the limitations typically associated with clustering algorithms [67,68]. Within this framework, the CNN learns a mapping from data

S L

to a lower-dimensional embedding Z. At the same time, the clustering algorithm groups the data points into clusters from this lower-dimensional embedding. These two tasks—embedding, learning, and clustering—are conducted simultaneously.

[Algorithmic Framework] The framed algorithm follows this structure by applying stochastic augmentation to dataset SL, then processing it through a transfer learning approach using the Inception-ResNet-v2 network with fine-tuning and dropout [59,60,61]. The output from this network is then passed through a deep subspace descriptor, specifically, the deep embedding clustering algorithm [69,70], which further reduces the dimensionality and generates embeddings $S L_{i}$ as illustrated in Figure 4.
To effectively integrate labeled and unlabeled data, our approach employs a sophisticated semi-supervised learning framework that maximizes the utility of both datasets. This integration occurs through several key steps:
- Data Augmentation: Both labeled $S L_{i}$ and unlabeled $S L_{\hat{i}}$ datasets undergo stochastic augmentation. This step enhances the diversity and robustness of the training samples by introducing variations in the data, such as rotations, translations, and noise, which helps the model generalize better to unseen data.
- Feature Extraction: Features from both labeled and unlabeled datasets are extracted using the Inception-ResNet-v2 network. This deep learning model is fine-tuned to adapt to the specific task at hand, ensuring that the extracted features are relevant and informative for distinguishing between bridging and non-bridging ST shapes.
- Joint Training: The extracted features from both datasets are jointly utilized in the training process. The model performs supervised learning for the labeled data $S L_{i}$ , using the ground truth labels to guide the learning process. This helps the model to learn the direct mappings from input features to the desired output labels. Simultaneously, the unlabeled data $S L_{\hat{i}}$ are used in unsupervised learning, where the model learns to identify and group similar patterns within the data. This unsupervised learning process aids in discovering the underlying structure of the data, which might not have been apparent from the labeled data alone.
- Deep Embedding Clustering: The deep subspace descriptor is employed to combine the strengths of supervised and unsupervised learning. This involves using a deep embedding clustering algorithm, which reduces the dimensionality of the feature space while preserving its essential characteristics. The algorithm generates embeddings $S L_{i}$ representing the data in a lower-dimensional space, facilitating the clustering of labeled and unlabeled data. This clustering helps form a more coherent and discriminative feature space, improving the overall classification performance.

[Deep Embedding and Clustering Process] The deep embedding layer operates similarly to t-SNE, as it seeks to find a lower-dimensional representation that preserved the clustering structure of the data. It accomplishes this by employing a clustering loss that refines the embeddings for better clustering as measured by the similarity between an embedding $Z_{i}$ and the mean of a cluster $μ_{j}$ :

$p_{i, j} = \frac{(1 + ∥ Z_{i} - μ_{j} ∥^{2}) / β^{\frac{- (β + 1)}{2}}}{\sum_{j^{'}} {(\frac{1 + ∥ Z_{i} - μ_{j^{'}} ∥^{2}}{β})}^{\frac{- (β + 1)}{2}}},$

(5)

where $p_{i}, j$ represents the probability that data point i belongs to cluster j and functions as prediction p. This layer learns the means or centroids of different clusters from the embeddings of the penultimate layer, while a dense layer learns the mapping between embeddings and predictions. The deep embedding clustering algorithm is applied to the feature vectors in the embedding space. This algorithm is chosen due to its ability to handle complex, high-dimensional data and its efficiency in forming well-separated clusters. It complements the deep learning framework by effectively grouping similar feature vectors, thus enhancing the overall performance of the SSL model.
The use of non-linear embeddings, analyzed with KL divergence, ensures that the essential data structures are preserved while making the embeddings robust to shifts and variations. Non-linear embeddings capture complex patterns in the data, effectively reducing dimensionality and preserving important relationships. KL divergence, used as a clustering loss, helps maintain the structural integrity of the data by aligning the learned embeddings with the true data distribution. This combination of techniques enhances the clustering performance and ensures robust, shift-invariant embeddings. In all the experiments, the hyperparameter $β$ is set to 1.

2.3.3. Enhancing Model Robustness with Zero-Shot Classifier

The concluding phase of the proposed methodology, depicted through the detailed Figure 5 functional architectural framework, involves bolstering model resilience using a zero-shot classifier

Z s C

module guided by KL divergence loss [9]. This sophisticated model is engineered to recognize examples from categories absent during training, which is essential when data for certain classes are unavailable or limited. The

Z s C

utilizes semantic connections between familiar and unfamiliar classes, allowing the model to extrapolate beyond its training dataset. In our strategy, the classifier employs KL divergence loss to synchronize the learned embeddings with the distribution of known classes, thus enabling the identification of novel classes [71,72].

[Zero-Shot Classifier Integrated with KL Divergence] To execute $Z s C$ using KL divergence loss, a CNN is initially trained on the existing data to extract significant features from the input examples. This pre-trained CNN produces robust feature representations $f (x)$ , which serve as the foundation for subsequent classification tasks. Each class, including both known and unknown categories, is characterized by a semantic vector. These vectors are constructed using deep visual features extracted from pre-trained CNNs and enhanced with supplementary information such as textual descriptions or hierarchical attributes. The semantic vectors encapsulate the characteristics and relationships among various classes within a high-dimensional semantic space, thereby enabling the model to deduce the existence of unseen classes based on their semantic resemblance to familiar categories.
The KL divergence loss plays a crucial role in this process by assessing the difference between the predicted probability distribution of input data embeddings and the target distribution defined by semantic vectors of known classes. As outlined in Equation (2), the KL divergence measures the disparity between the predicted distribution $P (Y_{i} ∣ X)$ and the target distribution $Q (Y_{i})$ , and is defined as

$D_{K L} (P (Y_{i} ∣ X) ‖ Q (Y_{i})) = \sum_{i} P (Y_{i} ∣ X) ln (\frac{P (Y_{i} ∣ X)}{Q (Y_{i})}),$

(6)

This loss function guides the model in synchronizing the predicted embeddings $Q_{i}$ with the semantic vectors, ensuring that network weights are adjusted to reduce divergence. This synchronization helps align the embeddings of known classes closely with their corresponding semantic vectors.
Through iterative minimization of the KL divergence between predicted and target distributions, the model develops the ability to generalize effectively beyond the training data. This enables the classifier to map embeddings $q_{i}$ to appropriate class labels $O_{i}$ , even without explicit examples of those classes. The capacity to identify unseen classes is achieved by utilizing the semantic relationships encoded in the model, allowing it to deduce the existence of new classes based on their resemblance to known classes. This approach enhances the model’s resilience and adaptability, enabling it to function effectively in varied and dynamic settings where new classes may continually emerge. By aligning the learned embeddings with semantic distributions, the $Z s C$ framework ensures enhanced performance in recognizing novel and unobserved classes, offering a scalable solution for real-world applications where data for certain classes may be limited or unavailable.

2.4. Experimental Settings

The proposed methodology is executed using Python 3.10, PyTorch 1.12.0, and CUDA 11.3 on a Windows 10 platform. All the input frames are resized to uniform dimensions of

256 \times 256

pixels. The model is trained using the AdamW optimizer, with a learning rate of

2.5 \times 10^{- 5}

and a weight decay of

10^{- 4}

. AdamW is chosen for its improved weight decay management, which mitigates the overfitting risk by separating the decay factor from the gradient updates. The training process incorporates diffusion timesteps of

T = 1000

, employing a linear schedule that ranges from an initial

β_{1}

10^{- 4}

to a final

β_{T}

0.02

. The training is carried out on a Nvidia RTX 3080 GPU for 75,000 iterations, with batch sizes varying between 6 and 8, depending on the memory constraints.

An important aspect of the study is selecting the appropriate evaluation methods for classification algorithms. In single-label learning, classification is viewed as correct or incorrect. However, binary or multilabel classification allows for partially correct or inaccurate results, enabling a more nuanced analysis [73]. Various metrics, including accuracy, recall, average precision, and F1 score, are employed to assess the model’s performance. The confusion matrix provides insights into the distribution of true positives, false positives, true negatives, and false negatives during both the training and testing phases, offering a comprehensive view of the model’s accuracy.

The receiver operating characteristic (ROC) curve is used to evaluate the model’s classification performance across our dataset and the IEEE ISBI 2015 datasets for testing and validation. The ROC curve effectively demonstrates the model’s ability to differentiate between classes. Furthermore, t-SNE is employed as a dimensionality reduction technique. The t-SNE plot visually illustrates the model’s capability to cluster bridging and non-bridging ST structures in a lower-dimensional space. t-SNE is integrated as an evaluation tool to provide a qualitative perspective on data separability and clustering, complementing quantitative metrics such as accuracy and the ROC curve.

Following this, a detailed representation of the evaluation methods for SSLDSE is provided. Let

S L = {(S L_{i}, S L_{\hat{i}})}_{i = 1}^{n}

denote a Hybrid Case Base dataset, where

S L_{i}

represents the labeled dataset, and

h (L_{i})

indicates the label classifier. The variable n denotes the total number of instances within the dataset, and

I ⌈ \cdot ⌉

acts as the indicator function.

Accuracy: It represents the proportion of labels that have been correctly classified:

A c c u r a c y = \frac{1}{n} \sum_{i = 1}^{n} (\frac{| S L_{i} \cap h (L_{i}) |}{| S L_{i} \cup h (L_{i}) |}),

(7)

Recall: It quantifies the average percentage of relevant labels correctly identified for instances classified as appropriate:

R e c a l l = \frac{1}{n} \sum_{i = 1}^{n} \frac{|S L_{i} \cap h (L_{i})|}{|S L_{i}|},

(8)

Average Precision: It represents the mean proportion of relevant labels that are ranked higher than a particular label:

A v e r a g e p r e c i s i o n = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{|S L_{i}^{l}|} \sum_{s l_{i} \in S L_{i}^{l}} \frac{|\{s l_{q} \in S L_{i}^{l} : R_{i} (s l_{q} \leq R_{i} (s l_{i}))\}|}{R_{i} (s l_{i})},

(9)

where

R_{i} (s l_{i})

represents the predicted rank of label

s l_{i}

; for instance,

l_{i}

. For each label

s l_{i}

, precision is calculated as the ratio of correctly ranked labels

s l_{q}

(those with

R_{i} (s l_{q}) < R_{i} (s l_{i})

) up to

s l_{i}

, with the average taken across all instances and labels.

F1 score: This metric, derived from precision and recall, offers a weighted evaluation. It is often considered a more reliable performance indicator than using precision and recall individually:

F 1 - score = 2 \times \frac{precision \times recall}{precision + recall},

(10)

3. Results

This section summarizes the results of the binary classification experiment. The dataset is divided, with 70% allocated for training and the remaining 30% evenly split between validation and testing sets. The results are presented at various epoch intervals, along with the average performance metrics. The evaluation includes accuracy, precision, recall, and F1 score for each class individually, as well as for both classes combined.

3.1. Experimental Results of Binary Classification

The binary classification outcomes for the bridging and non-bridging ST class are presented in Table 1. The model’s effectiveness for bridging ST samples is declined at the 50th, 100th, and 250th epochs. Specifically, precision values are decreased to 86.82% at the 100th epoch and 85.42% at the 250th epoch. Similarly, the lowest recall rates recorded are 90.00%, 90.93%, and 90.77% for the 50th, 100th, and 250th epochs. The F1 scores are 88.45% at the 100th epoch and 88.49% at the 250th epoch. Overall, the mean values across these epochs are 87.08% for precision, 91.37% for recall, and 89.17% for the F1 score. Respectively, for the non-bridging ST class, the lowest recall rates are recorded at the 100th and 150th epochs, with values of 88.75% and 89.60%, respectively. In terms of precision, the model achieves 85.75% at the 150th epoch and 86.50% at the 250th epoch. The F1 score reaches 91.73% during the 50th epoch and is improved to 92.85% at the 150th epoch.

Table 2 illustrates the performance metrics for binary classification across both categories, highlighting accuracy, precision, recall, and F1 score. The total accuracy achieved is 97.28%. The precision, recall, and F1 scores are 96.12%, 95.65%, and 97.77%, respectively.

3.2. Experimental Validation Results of Classification

The IEEE ISBI 2015 dataset is used along with our dataset to evaluate the effectiveness of the model and to reduce the risk of overfitting. The IEEE ISBI 2015 dataset comprises 400 lateral cephalogram X-ray images. This approach for validating the proposed SSLDSE model is considered innovative. A 5-fold cross-validation is performed, and the resulting confusion matrix is shown in Figure 6.

For our dataset, the model achieves 83.18% precision for the bridging ST class and 84.62% for the non-bridging ST class. The recall rates are 87.50% for the bridging ST class and 88% for the non-bridging ST class. The F1 scores reach 85.29% for the bridging ST class and 86.28% for the non-bridging ST class, resulting in an overall average accuracy of 86%. This underscores the model’s effectiveness. Similar performance is observed on the IEEE ISBI dataset, with 87% precision for the bridging ST class and 85.23% for the non-bridging ST class. The recall rates are 77.68% for the bridging ST class and 75% for the non-bridging ST class. F1 scores for both classes are 82.04% and 79.79%, respectively, resulting in an overall accuracy of 81.50%. These results further confirm the robustness and reliability of the model, as illustrated by the ROC curve in Figure 6, for the external dataset.

In addition to the quantitative assessment, a t-SNE plot is employed to visualize the learned data embeddings from both the dataset and the IEEE ISBI dataset in a reduced-dimensional space. As shown in Figure 7, the t-SNE plot provides a qualitative evaluation of the model’s ability to distinguish between bridging and non-bridging ST classes. The plot reveals distinct clusters, validating the model’s effective separation of the two classes based on their characteristics. These clusters support the quantitative results regarding precision, recall, and F1 scores while demonstrating the model’s capacity to generalize to unseen data.

However, despite these positive outcomes, several areas for improvement remain. The model struggles with subtle bridging cases, particularly where the anterior and posterior clinoid processes are only partially connected. Future work could focus on enhancing feature learning for these challenging cases by incorporating domain-specific knowledge or spatial features. Additionally, better pre-processing techniques, such as super-resolution methods, could improve the handling of low-quality images. Fine-tuning the model on various bridging subtypes and employing ensemble learning methods may further enhance its robustness. Active learning could also be utilized to select the most informative bridging cases for labeling, addressing the issue of a limited dataset. Lastly, improving cross-dataset generalization through domain adaptation techniques would enhance performance across diverse datasets.

3.3. Error Analysis

The classification method that demonstrates the lowest error rate is recognized as the most effective. To gain a deeper understanding of the classification dynamics and the limitations of the proposed approach, a comprehensive error analysis is conducted. This analysis involves comparing the classification accuracy of the utilized SSL techniques, including contrastive, GAN, and active SSL, alongside a modified Inception-ResNet-V2 model against our proposed SSLDSE method. This analysis is supplemented by both statistical measures and visualization techniques to provide a complete view of the proposed method’s behavior under different conditions. The analysis is initiated with a bar plot that includes error bars, which serves to assess the training accuracy and highlight potential misclassification risks.

Furthermore, Figure 8, Figure 9 and Figure 10 present the visual outcomes of the structure classifications, providing additional insights into the efficacy of the proposed method. As a statistical evaluation, shown in the boxplot with error bars depicted in Figure 8a, the suggested method accurately classifies about 84 bridging cases in our dataset, resulting in a 16.00% error rate, indicating a moderate level of misclassification for bridging instances. In contrast, the model performs better with non-bridging cases, correctly identifying approximately 88 cases with a lower 12.00% error rate, demonstrating improved accuracy in distinguishing non-bridging instances within our dataset.

Conversely, the performance on the IEEE ISBI 2015 dataset, depicted in Figure 8b, reveals that the SSLDSE method correctly identifies roughly 78 bridging cases, yielding a higher 22.00% error rate. This suggests increased difficulty in accurately classifying bridging cases within this dataset. The model also correctly identifies about 87 non-bridging cases, with a 13.00% error rate, indicating slightly reduced accuracy compared to our dataset. In addition, Section 3.2. Experimental Validation Results of Classification, presents a t-SNE plot (Figure 7) that offers both visual and analytical insights into the model’s classification capabilities, reinforcing the experimental results and clearly marking error-prone data points. The t-SNE plot displays distinct clusters with minimal overlap on our dataset, suggesting high classification accuracy and low error rates. For the IEEE dataset, however, the t-SNE plot reveals greater intermixing between classes, corresponding with the observed higher error rates and increased misclassifications.

A more detailed examination of the errors reveals that the model struggles particularly with the subtle variations in the morphology of bridging ST cases. In instances where the bridging is less pronounced, or the anterior and posterior clinoid processes are only partially connected, the model often misclassifies these as non-bridging ST cases. This is especially evident in lower-quality images, where noise and lower resolution contribute to these misclassifications. The misclassification of these challenging bridging ST cases highlights a key limitation of the model in identifying complex, nuanced morphological features.

To rigorously assess the efficacy of the implemented SSL methodologies in relation to the proposed approach, boxplots are created to visualize the classification results. Figure 9 showcases the boxplots for the SSLDSE technique and alternative SSL methods employed in this investigation, specifically for distinguishing between bridging and non-bridging ST categories. The graphical representations indicate that SSLDSE exhibits superior performance, attaining classification accuracies of 95.19% for bridging instances and 97.37% for non-bridging cases while demonstrating minimal fluctuation, thereby establishing itself as the most robust approach among those evaluated.

In contrast, GAN-based techniques display inferior classification accuracies, particularly for bridging cases, with substantial variability and an accuracy of 69.82%, suggesting less consistent outcomes. A more comprehensive examination of each method’s accuracy is presented in Section 4. Moreover, error rates, determined by subtracting the accuracy from one, highlight the discrepancies in misclassification rates across various approaches. For instance, the GAN method for the bridging ST class exhibits the highest error rate of 30.18%, reflecting its lower accuracy, whereas SSLDSE for the non-bridging ST class demonstrates the lowest error rate of 2.63%, emphasizing its superior reliability and effectiveness.

Figure 10 supplements these statistical analyses by illustrating instances where the model encounters classification difficulties. This visual representation sheds light on both accurate and inaccurate predictions, with a particular emphasis on areas that prove challenging for the proposed model. Within the bridging class, red dashed circles highlight regions where errors occurred, indicating that the model occasionally fails to capture subtle, complex characteristics inherent to these structures. These misclassifications underscore the considerable challenge of accurately detecting such nuanced patterns, given the dynamic and intricate nature of bridging formations. By identifying these problematic features, valuable insights into the model’s limitations and the ambiguity present in certain data points are gained, which inform future refinements and improvements to the system.

4. Discussion

Based on our knowledge, no prior research employed an SSL model with enhanced modules to categorize the extensive range of ST classes commonly encountered in clinical practice. Our SSLDSE model was evaluated against state-of-the-art SSL models trained on our dataset. However, it is important to note that this comparison was limited by variations in data samples and SSL technique parameters. The number of architectures developed by computational researchers is growing rapidly, especially those designed to support healthcare professionals. Various models, including U-Net, VGG19, ResNet34, Google-InceptionV3, ResNeXt50, and SellaMorph-Net (designed explicitly for ST structure segmentation), were introduced for the automated segmentation and classification of ST structures. In this section, the results of the proposed SSLDSE model are compared with state-of-the-art SSL models trained on our dataset as outlined in Table 3.

The proposed SSLDSE method outperforms other SSL models in terms of binary classification effectiveness. The modified InceptionResNetV2 framework maintains a notable accuracy of 89.27% for bridging ST and 91.92% for non-bridging ST categories. Moreover, the active SSL technique demonstrates robust precision, attaining 81.67% for bridging ST and 85.10% for non-bridging ST groups. However, a more in-depth comparison of these methods reveals the distinct strengths and limitations for each approach. For instance, contrastive SSL is effective during initial feature extraction but struggles with ambiguous cases, particularly when negative samples are inadequately represented. Although GAN-based SSL has the potential to generate additional training data, it exhibits high variability in the classification results, especially for bridging cases. Active SSL shows promise in selecting informative samples but requires substantial manual intervention for labeling, which impacts its efficiency. In contrast, SSLDSE achieves superior performance by integrating deep subspace analysis with KL divergence, leading to higher accuracy and more stable results, particularly for the challenging bridging ST cases. Table 4 comprehensively analyzes various SSL methods compared to the proposed SSLDSE, including binary classification test outcomes from our proprietary dataset and the IEEE ISBI 2015 dataset.

The accuracy levels reported by the methods in Table 4 range from 68.61% to 90.06% on the IEEE ISBI 2015 dataset and from 68.93% to 94.72% on our dataset. While our method exhibits lower accuracy compared to other state-of-the-art models, it shows higher AUC of ROC and F1 scores. Specifically, the proposed SSLDSE achieves AUC of ROC values of 91.02% for the IEEE ISBI 2015 dataset and 93.57% for our proprietary dataset. Additionally, the F1 scores for the proposed method are 92.08% and 95.37%, respectively.

The authors employ stratified five-fold cross-validation to evaluate the performance of the proposed SSLDSE. Table 1 showcases the precision, recall, and F1 score for various epochs in the binary classification of bridging and non-bridging ST classes, respectively. Table 2 provides a comprehensive overview of accuracy, precision, recall, and F1 scores for classifying both bridging and non-bridging ST classes across different epochs. The results from Table 3 guide the selection of five-fold cross-validation for validating data samples not included in the training phase. Section 3.2. Experimental Validation Results of Classification elaborates on the validation test outcomes. Figure 5 illustrates (a) the confusion matrix and (b) the AUC-ROC curve for binary classification using the provided dataset. The model demonstrates a validation accuracy of 86%, which closely matches the testing accuracy of 93.57%. This alignment suggests effective model generalization without overfitting. Additionally, the (c) confusion matrix and (d) AUC-ROC curve reveal a validation accuracy of 81.50% on the IEEE ISBI 2015 dataset, comparable to the testing accuracy of 91.02%. These results further confirm that the SSLDSE model avoids overfitting and performs well on unseen data. The t-SNE plot in Figure 6 offers a visual assessment of the ability of the proposed SSLDSE to differentiate between classes. The distinct clustering observed in the plot validates the effectiveness of the model in distinguishing between the two classes based on their feature representations.

The proposed SSLDSE model’s limitations and potential areas for enhancement are elucidated through a comprehensive error analysis. Figure 9 presents visual interpretations of classification errors, revealing a higher prevalence of misclassifications within the bridging ST class, which suggests the model’s heightened sensitivity to certain structural intricacies. Further investigation of these errors using confusion matrices and AUC-ROC curves, as shown in Figure 5, demonstrates that despite the high overall accuracy of the proposed SSLDSE, specific error patterns underline the model’s struggle in distinguishing between structurally similar entities. This examination of errors allows for the identification of areas where the model could be improved, such as through enhanced data augmentation techniques or more advanced feature extraction methods, ultimately boosting its capacity to differentiate between subtle morphological variations.

According to the existing literature and the author’s knowledge, no dedicated system has been specifically developed for classifying ST morphological structures using the enhanced modules proposed here. First, the model utilizes a fine-tuned InceptionResNetV2 for feature extraction. Then, the SSLDSE model is evaluated using a five-fold stratified cross-validation approach, which optimizes image selection for training and testing. This method minimizes bias by ensuring that each image is used four times for training and once for testing. Furthermore, the model is validated with 627 images separated from the author’s dataset that are not included in the training phase. The validation results show consistent accuracy, confirming that the model does not exhibit overfitting. A more detailed analysis can be found in Section 3.2. Experimental Validation Results of Classification.

The AdamW optimization method, which combines the advantages of weight decay and Adam, is employed. Unlike the standard Adam optimizer, AdamW separates weight decay from the gradient update, leading to better generalization. Recent studies often favor AdamW for its superior performance compared to traditional optimizers like RMSProp. In this study, AdamW is used alongside advanced techniques such as Self-Organizing Maps (SOMs) and Principal Component Analysis (PCA) to boost model performance. Moreover, the Albumentations library is utilized for image pre-processing during each fold. Instead of its usual purpose of expanding the dataset size, it is applied here to increase learning efficiency, reduce overfitting, and enhance execution time. Additionally, the Albumentations library is utilized for image pre-processing during each fold. Instead of its usual purpose of expanding the dataset size, it is applied here to increase learning efficiency, reduce overfitting, and enhance the execution time.

In conclusion, the researchers emphasize the encouraging outcomes of their proposed SSLDSE technique for automatically classifying ST morphological structures. They recommend the implementation of SSL approaches and enhanced modules, which effectively tackle non-linear issues and reduce data labeling expenses by enabling the use of smaller labeled datasets. This research adds to the existing knowledge base by offering an efficient solution for the automated categorization of ST structures and linking the related dentofacial abnormalities. Further work in the future could enhance the model’s ability to address subtle bridging cases using advanced feature learning with domain-specific spatial information or pre-processing techniques like super-resolution. Fine-tuning for bridging subtypes and employing ensemble learning could improve robustness, while active learning and domain adaptation may address cross-dataset generalization challenges. It is important to note that systems like the one proposed are not intended to replace medical practitioners; instead, these methodologies, as an automated platform, aim to support and improve their overall experience.

5. Conclusions

This research presents an automated system for classifying ST structures. The proposed method employs an SSL framework and has undergone evaluation through five-fold cross-validation. The validation process utilizes both the author’s and an external dataset from IEEE ISBI 2015. In binary classification, the SSLDSE approach demonstrates an average accuracy of 96.70%, with average recall, precision, and F1 scores of 96.69%, 97.59%, and 97.11%, respectively. To the researcher’s knowledge, this study is the first to propose an automated ST binary classification method that incorporates SSL techniques with enhanced modules. The researchers acknowledge certain constraints in current state-of-the-art methods. First, the critical number of samples is dynamically variable and complex, bridging ST structures. Additionally, available X-ray datasets suffer from artifacts and contain a significant amount of unlabeled data, leading to datasets lacking robustness. However, the existing empirical evidence on SSL applications indicates that performance improves as the number of high-quality training samples increases. Furthermore, it is crucial to thoroughly evaluate such methods, taking into account the diverse ST structures found in different patients. While the proposed approach shows potential in classifying the advanced stages of ST, it is important to focus on improving performance in the initial stages, where the results may be less optimal. This study has significant implications for orthodontics and medical imaging. Automating ST classification offers critical insights into craniofacial growth patterns and anomalies, thereby aiding in early diagnosis and treatment planning while reducing reliance on manual assessments. Moreover, the SSLDSE framework illustrates the potential of SSL to effectively utilize unlabeled data, providing a robust solution for challenging tasks in medical imaging. Future work will extend this framework to include the multiclass classification of ST subtypes and apply it across diverse imaging modalities. This advancement could pave the way for the integration of machine learning tools into clinical practice, ultimately enhancing patient care.

Author Contributions

Conceptualization, K.S.S., A.A., J.P. and A.L.; methodology, K.S.S., A.A. and J.P.; validation, A.A., J.P., M.J. and A.L.; formal analysis, V.K. and P.K.; investigation, K.S.S., A.A., J.P. and A.L.; resources, A.A., J.P., M.J. and A.L.; data curation, K.S.S., M.J. and V.K.; writing—original draft preparation, K.S.S. and P.K.; writing—review and editing, K.S.S., A.A., J.P., P.K. and A.L.; supervision, A.A., J.P., M.J., A.L. and V.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was approved by the Ethics Committee of the Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India (Registration No: PGI/IEC/2021/001488 and IEC No: IEC-09/2021-2119) on 19 October 2021.

Informed Consent Statement

Informed consent for participation was obtained by the Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India, as per their standard procedure.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy regulation.

Acknowledgments

The authors would like to express their gratitude to the Oral Health Sciences Centre at the Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India, for providing cephalometric X-ray data under ethical clearance number IEC-09/2021-2119. They also thank the faculty of the CSIR—Central Scientific Instruments Organisation (CSIO), Chandigarh, India, and the School of Science at RMIT University, Melbourne, Australia, for their valuable assistance.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ST	Sella Turcica
SSL	Semi-Supervised Learning
SSLDSE	Semi-Supervised Deep Subspace Embedding
KL divergence	Kullback–Leibler Divergence
t-SNE	t-Distributed Stochastic Neighbor Embedding
CNN	Convolutional Neural Network
AL	Active Learning
GAN	Generative Adversarial Network
CL	Contrastive Learning
PDF	Probability Density Function
ID	Information Distinguishability
ZsC	Zero-Shot Classifier
ROC	Receiver Operating Characteristic

References

Khouw, F.; Proffit, W.; White, R. Cephalometric evaluation of patients with dentofacial disharmonies requiring surgical correction. Oral Surgery Oral Med. Oral Pathol. 1970, 29, 789–798. [Google Scholar] [CrossRef] [PubMed]
Alkofide, E.A. The shape and size of the sella turcica in skeletal Class I, Class II, and Class III Saudi subjects. Eur. J. Orthod. 2007, 29, 457–463. [Google Scholar] [CrossRef] [PubMed]
Tekiner, H.; Acer, N.; Kelestimur, F. Sella turcica: An anatomical, endocrinological, and historical perspective. Pituitary 2015, 18, 575–578. [Google Scholar] [CrossRef] [PubMed]
Shakya, K.S.; Jaiswal, M.; Priti, K.; Alavi, A.; Kumar, V.; Li, M.; Laddi, A. A novel SM-Net model to assess the morphological types of Sella Turcica using Lateral Cephalogram 2022. Available online: https://www.researchsquare.com/article/rs-2046354/v1 (accessed on 13 June 2024).
Sathyanarayana, H.P.; Kailasam, V.; Chitharanjan, A.B. Sella turcica-Its importance in orthodontics and craniofacial morphology. Dent. Res. J. 2013, 10, 571. [Google Scholar]
Shakya, K.S.; Laddi, A.; Jaiswal, M. Automated methods for sella turcica segmentation on cephalometric radiographic data using deep learning (CNN) techniques. Oral Radiol. 2023, 39, 248–265. [Google Scholar] [CrossRef]
Teal, J. Radiology of the adult sella turcica. Bull. Los Angeles Neurol. Soc. 1977, 42, 111–174. [Google Scholar]
Camp, J.D. The normal and pathologic anatomy of the sella turcica as revealed by roentgenograms. Am. J. Roentgenol. Radium Ther. 1924, 12, 143–156. [Google Scholar]
Shakya, K.S.; Jaiswal, M.; Porteous, J.; K, P.; Kumar, V.; Alavi, A.; Laddi, A. SellaMorph-Net: A Novel Machine Learning Approach for Precise Segmentation of Sella Turcica Complex Structures in Full Lateral Cephalometric Images. Appl. Sci. 2023, 13, 9114. [Google Scholar] [CrossRef]
Leonardi, R.; Barbato, E.; Vichi, M.; Caltabiano, M. A sella turcica bridge in subjects with dental anomalies. Eur. J. Orthod. 2006, 28, 580–585. [Google Scholar] [CrossRef]
Khaitan, T.; Vishal; Gupta, P.; Naik, S.R.; Shukla, A.K. Morphometric Analysis of Sella Turcica and a Proposed Novel Sella Turcica Index–A Digital Lateral Cephalometric Study. Indian J. Otolaryngol. Head Neck Surg. 2024, 76, 73–77. [Google Scholar] [CrossRef]
Kucharczyk, W. The sella turcica and parasellar region. In Magnetic Resonance Imaging of the Brain and Spine. 1996. Available online: https://archive.org/details/magneticresonanc0002unse/page/870/mode/2up (accessed on 25 June 2024).
Shakya, K.S.; Priti, K.; Jaiswal, M.; Laddi, A. Segmentation of Sella Turcica in X-ray Image based on U-Net Architecture. Procedia Comput. Sci. 2023, 218, 828–835. [Google Scholar] [CrossRef]
Ghasedi Dizaji, K.; Herandi, A.; Deng, C.; Cai, W.; Huang, H. Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5736–5745. [Google Scholar]
Van Engelen, J.E.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn. 2020, 109, 373–440. [Google Scholar] [CrossRef]
Shakya, K.S.; Alavi, A.; Porteous, J.; K, P.; Laddi, A.; Jaiswal, M. A Critical Analysis of Deep Semi-Supervised Learning Approaches for Enhanced Medical Image Classification. Information 2024, 15, 246. [Google Scholar] [CrossRef]
Bennett, K.; Demiriz, A. Semi-supervised support vector machines. Adv. Neural Inf. Process. Syst. 1998, 11, 369–374. [Google Scholar]
Seeger, M. Learning with Labeled and Unlabeled Data 2000. Available online: http://www.cs.columbia.edu/~dplewis/candidacy/seeger01learning.pdf (accessed on 25 June 2024).
Ouali, Y.; Hudelot, C.; Tami, M. An overview of deep semi-supervised learning. arXiv 2020, arXiv:2006.05278. [Google Scholar]
Taha, K. Semi-supervised and un-supervised clustering: A review and experimental evaluation. Inf. Syst. 2023, 114, 102178. [Google Scholar] [CrossRef]
Li, Q.; Han, Z.; Wu, X.M. Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
Ponzio, F.; Urgese, G.; Ficarra, E.; Di Cataldo, S. Dealing with lack of training data for convolutional neural networks: The case of digital pathology. Electronics 2019, 8, 256. [Google Scholar] [CrossRef]
Abdelhafiz, D.; Yang, C.; Ammar, R.; Nabavi, S. Deep convolutional neural networks for mammography: Advances, challenges and applications. BMC Bioinform. 2019, 20, 281. [Google Scholar] [CrossRef]
Chougrad, H.; Zouaki, H.; Alheyane, O. Deep convolutional neural networks for breast cancer screening. Comput. Methods Programs Biomed. 2018, 157, 19–30. [Google Scholar] [CrossRef]
Kim, H.; Shim, E.; Park, J.; Kim, Y.J.; Lee, U.; Kim, Y. Web-based fully automated cephalometric analysis by deep learning. Comput. Methods Programs Biomed. 2020, 194, 105513. [Google Scholar] [CrossRef]
Wang, C.W.; Huang, C.T.; Hsieh, M.C.; Li, C.H.; Chang, S.W.; Li, W.C.; Vandaele, R.; Marée, R.; Jodogne, S.; Geurts, P.; et al. Evaluation and comparison of anatomical landmark detection methods for cephalometric x-ray images: A grand challenge. IEEE Trans. Med. Imaging 2015, 34, 1890–1900. [Google Scholar] [CrossRef] [PubMed]
Golhar, M.; Bobrow, T.L.; Khoshknab, M.P.; Jit, S.; Ngamruengphong, S.; Durr, N.J. Improving colonoscopy lesion classification using semi-supervised deep learning. IEEE Access 2020, 9, 631–640. [Google Scholar] [CrossRef] [PubMed]
Ha, Y.; Meng, X.; Du, Z.; Tian, J.; Yuan, Y. Semi-supervised graph learning framework for apicomplexan parasite classification. Biomed. Signal Process. Control 2023, 81, 104502. [Google Scholar] [CrossRef]
Zhang, X.Y.; Shi, H.; Zhu, X.; Li, P. Active semi-supervised learning based on self-expressive correlation with generative adversarial networks. Neurocomputing 2019, 345, 103–113. [Google Scholar] [CrossRef]
Moradi, E.; Pepe, A.; Gaser, C.; Huttunen, H.; Tohka, J.; Alzheimer’s Disease Neuroimaging Initiative. Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects. Neuroimage 2015, 104, 398–412. [Google Scholar] [CrossRef]
Su, H.; Shi, X.; Cai, J.; Yang, L. Local and global consistency regularized mean teacher for semi-supervised nuclei classification. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13–17 October 2019; pp. 559–567. [Google Scholar]
Zhou, Y.; Chen, H.; Lin, H.; Heng, P.A. Deep semi-supervised knowledge distillation for overlapping cervical cell instance segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, 4–8 October 2020; Proceedings, Part I 23. pp. 521–531. [Google Scholar]
Li, Y.; Luo, L.; Lin, H.; Chen, H.; Heng, P.A. Dual-consistency semi-supervised learning with uncertainty quantification for COVID-19 lesion segmentation from CT images. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Proceedings, Part II 24. pp. 199–209. [Google Scholar]
Li, C.H.; Yuen, P.C. Semi-supervised learning in medical image database. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Hong Kong, China, 16–18 April 2001; pp. 154–160. [Google Scholar]
Filipovych, R.; Davatzikos, C.; For the Alzheimer’s Disease Neuroimaging Initiative. Semi-supervised pattern classification of medical images: Application to mild cognitive impairment (MCI). NeuroImage 2011, 55, 1109–1119. [Google Scholar] [CrossRef]
Batmanghelich, K.N.; Dong, H.Y.; Pohl, K.M.; Taskar, B.; Davatzikos, C. Disease classification and prediction via semi-supervised dimensionality reduction. In Proceedings of the 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Chicago, IL, USA, 30 March–2 April 2011; pp. 1086–1090. [Google Scholar]
Batmanghelich, N.K.; Taskar, B.; Davatzikos, C. Generative-discriminative basis learning for medical imaging. IEEE Trans. Med. Imaging 2011, 31, 51–69. [Google Scholar] [CrossRef]
Culotta, A.; McCallum, A. Reducing labeling effort for structured prediction tasks. In Proceedings of the AAAI, Pittsburgh, PA, USA, 9–13 July 2005; Volume 5, pp. 746–751. [Google Scholar]
Settles, B.; Craven, M. An analysis of active learning strategies for sequence labeling tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, USA, 25–27 October 2008; pp. 1070–1079. [Google Scholar]
Melville, P.; Mooney, R.J. Diverse ensembles for active learning. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 74. [Google Scholar]
Zhang, X.Y.; Wang, S.; Zhu, X.; Yun, X.; Wu, G.; Wang, Y. Update vs. upgrade: Modeling with indeterminate multi-class active learning. Neurocomputing 2015, 162, 163–170. [Google Scholar] [CrossRef]
Zhang, X.Y.; Wang, S.; Yun, X. Bidirectional active learning: A two-way exploration into unlabeled and labeled data set. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 3034–3044. [Google Scholar] [CrossRef]
Madani, A.; Ong, J.R.; Tibrewal, A.; Mofrad, M.R. Deep echocardiography: Data-efficient supervised and semi-supervised deep learning towards automated diagnosis of cardiac disease. NPJ Digit. Med. 2018, 1, 1–11. [Google Scholar] [CrossRef]
Mirza, M. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
Odena, A. Semi-supervised learning with generative adversarial networks. arXiv 2016, arXiv:1606.01583. [Google Scholar]
Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 139–144. [Google Scholar] [CrossRef]
Radford, A. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning. Sydney, NSW, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
Shrivastava, A.; Pfister, T.; Tuzel, O.; Susskind, J.; Wang, W.; Webb, R. Learning from simulated and unsupervised images through adversarial training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2107–2116. [Google Scholar]
Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Virtual Event, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent-a new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284. [Google Scholar]
Chen, X.; He, K. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 15750–15758. [Google Scholar]
Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. Adv. Neural Inf. Process. Syst. 2020, 33, 9912–9924. [Google Scholar]
Lu, M.Y.; Chen, R.J.; Wang, J.; Dillon, D.; Mahmood, F. Semi-supervised histology classification using deep multiple instance learning and contrastive predictive coding. arXiv 2019, arXiv:1910.10825. [Google Scholar]
Wang, X.; Tang, F.; Chen, H.; Cheung, C.Y.; Heng, P.A. Deep semi-supervised multiple instance learning with self-correction for DME classification from OCT images. Med. Image Anal. 2023, 83, 102673. [Google Scholar] [CrossRef]
Chen, Y.; Liginlal, D. A maximum entropy approach to feature selection in knowledge-based authentication. Decis. Support Syst. 2008, 46, 388–398. [Google Scholar] [CrossRef]
Deng, X.; Cai, P.; Cao, Y.; Wang, P. Two-step localized kernel principal component analysis based incipient fault diagnosis for nonlinear industrial processes. Ind. Eng. Chem. Res. 2020, 59, 5956–5968. [Google Scholar] [CrossRef]
Cappelli, R.; Maltoni, D. Multispace KL for pattern representation and classification. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 977–996. [Google Scholar] [CrossRef]
Banerjee, A.; Bhattacharya, R.; Bhateja, V.; Singh, P.K.; Sarkar, R. COFE-Net: An ensemble strategy for computer-aided detection for COVID-19. Measurement 2022, 187, 110289. [Google Scholar] [CrossRef] [PubMed]
Ananda, A.; Ngan, K.H.; Karabağ, C.; Ter-Sarkisov, A.; Alonso, E.; Reyes-Aldasoro, C.C. Classification and visualisation of normal and abnormal radiographs; a comparison between eleven convolutional neural network architectures. Sensors 2021, 21, 5381. [Google Scholar] [CrossRef] [PubMed]
Demir, A.; Yilmaz, F. Inception-ResNet-v2 with LeakyReLU and averagepooling for more reliable and accurate classification of chest X-ray images. In Proceedings of the 2020 Medical Technologies Congress (TIPTEKNO), Antalya, Turkey, 19–20 November 2020; pp. 1–4. [Google Scholar]
Alsayed, A.; Arif, M.; Qadah, T.M.; Alotaibi, S. A Systematic Literature Review on Using the Encoder-Decoder Models for Image Captioning in English and Arabic Languages. Appl. Sci. 2023, 13, 10894. [Google Scholar] [CrossRef]
Shen, F.; Shen, C.; Shi, Q.; Van den Hengel, A.; Tang, Z.; Shen, H.T. Hashing on nonlinear manifolds. IEEE Trans. Image Process. 2015, 24, 1839–1851. [Google Scholar] [CrossRef]
Sarfraz, S.; Koulakis, M.; Seibold, C.; Stiefelhagen, R. Hierarchical nearest neighbor graph embedding for efficient dimensionality reduction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 336–345. [Google Scholar]
Soofi, E.S.; Ebrahimi, N.; Habibullah, M. Information distinguishability with application to analysis of failure data. J. Am. Stat. Assoc. 1995, 90, 657–668. [Google Scholar] [CrossRef]
Xie, J.; Girshick, R.; Farhadi, A. Unsupervised deep embedding for clustering analysis. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 478–487. [Google Scholar]
Gan, H.; Sang, N.; Huang, R.; Tong, X.; Dan, Z. Using clustering analysis to improve semi-supervised classification. Neurocomputing 2013, 101, 290–298. [Google Scholar] [CrossRef]
Bair, E. Semi-supervised clustering methods. Wiley Interdiscip. Rev. Comput. Stat. 2013, 5, 349–361. [Google Scholar] [CrossRef]
Cao, W.; Zhang, Z.; Liu, C.; Li, R.; Jiao, Q.; Yu, Z.; Wong, H.S. Unsupervised discriminative feature learning via finding a clustering-friendly embedding space. Pattern Recognit. 2022, 129, 108768. [Google Scholar] [CrossRef]
Hou, C.; Nie, F.; Yi, D.; Tao, D. Discriminative embedded clustering: A framework for grouping high-dimensional data. IEEE Trans. Neural Netw. Learn. Syst. 2014, 26, 1287–1299. [Google Scholar]
Li, X.; Guo, Y.; Schuurmans, D. Semi-supervised zero-shot classification with label representation learning. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4211–4219. [Google Scholar]
Shojaee, S.M.; Baghshah, M.S. Semi-supervised zero-shot learning by a clustering-based approach. arXiv 2016, arXiv:1605.09016. [Google Scholar]
Li, G.Z.; Yang, J.Y.; Lu, W.C.; Li, D.; Yang, M.Q. Improving prediction accuracy of drug activities by utilising unlabelled instances with feature selection. Int. J. Comput. Biol. Drug Des. 2008, 1, 1–13. [Google Scholar] [CrossRef]

Figure 1. Sample images of pre-defined Sella Turcica (ST) shapes: (A) Oval ST, (B) Circular ST, (C) Flat ST, and (D) Bridging ST. This study classified Circular ST as non-bridging, and Bridging ST was used for binary classification.

Figure 2. The schematic representation of a Hybrid Database (L) and Hybrid Case Base (

S L

) from labeled (

S L_{i}

) and unlabeled (

S L_{\hat{i}}

) case data. Feature extraction using KL divergence, mean (

μ

), and standard deviation (

σ

) is applied to both databases. Labeled data form a featured database with labels (

L_{i}

), while unlabeled data create a featured database without labels (

L_{\hat{i}}

). A dynamic responsive data and label mechanism integrates both, resulting in (1) a Hybrid Database (L) and (2) a Hybrid Case Base (

S L

) for further analysis.

Figure 2. The schematic representation of a Hybrid Database (L) and Hybrid Case Base (

S L

) from labeled (

S L_{i}

) and unlabeled (

S L_{\hat{i}}

) case data. Feature extraction using KL divergence, mean (

μ

), and standard deviation (

σ

) is applied to both databases. Labeled data form a featured database with labels (

L_{i}

), while unlabeled data create a featured database without labels (

L_{\hat{i}}

). A dynamic responsive data and label mechanism integrates both, resulting in (1) a Hybrid Database (L) and (2) a Hybrid Case Base (

S L

) for further analysis.

Figure 3. Process flow diagram of the proposed SSLDSE framework.

Figure 4. The figure illustrates a comprehensive framework of the proposed SSLDSE that integrates labeled (

S L_{i}

) and unlabeled case databases (

S L_{\hat{i}}

). Features are extracted using Kullback–Leibler divergence, mean (

μ

), and standard deviation (

σ

), forming a Hybrid Database. The data undergo stochastic augmentation and are processed through an Inception-ResNet-V2 model. A deep subspace descriptor with t-SNE refines the feature representations, and the outputs are classified by a zero-shot classifier (

Z s C

) with KL divergence loss, enabling the model to handle unseen or unlabeled ST structures.

Figure 4. The figure illustrates a comprehensive framework of the proposed SSLDSE that integrates labeled (

S L_{i}

) and unlabeled case databases (

S L_{\hat{i}}

). Features are extracted using Kullback–Leibler divergence, mean (

μ

), and standard deviation (

σ

Z s C

) with KL divergence loss, enabling the model to handle unseen or unlabeled ST structures.

Figure 5. The illustrated SSLDSE architectural framework processes labeled (L) and semi-labeled (

S L_{i}

) data using Inception-ResNet-V2 as the CNN backbone to extract features (P, Q) and estimate pairwise probability densities (

P_{i, j}

Q_{i, j}

). KL divergence (

D_{KL} (P ‖ Q)

) minimizes divergence through the optimization of (Y). Manifold learning maps feature matrices (X) to t-SNE representations (Y) while preserving structural relationships. The SSL framework employs deep embedding and clustering (mean:

μ_{j}

, covariance:

Σ_{j}

) for feature representation. The zero-shot classifier constructs semantic vectors and applies KL divergence loss for output prediction (

O_{i}

Figure 5. The illustrated SSLDSE architectural framework processes labeled (L) and semi-labeled (

S L_{i}

) data using Inception-ResNet-V2 as the CNN backbone to extract features (P, Q) and estimate pairwise probability densities (

P_{i, j}

Q_{i, j}

). KL divergence (

D_{KL} (P ‖ Q)

μ_{j}

, covariance:

Σ_{j}

) for feature representation. The zero-shot classifier constructs semantic vectors and applies KL divergence loss for output prediction (

O_{i}

Figure 6. Confusion matrix and ROC curve showcasing the validation results of the binary classifier, highlighting the proposed model’s classification performance through true positive/negative rates and the AUC-ROC score.

Figure 7. t-SNE plots visualizing the quantitative assessment of the proposed SSLDSE method. The plots illustrate the effective separation between bridging and non-bridging labels from our proprietary and IEEE ISBI 2015 datasets, demonstrating a clear class distinction.

Figure 8. Boxplots illustrate a detailed comparison of the classification error rates for the proposed SSLDSE method, showing the distribution of error rates across (a) the proprietary dataset and (b) the IEEE ISBI 2015 dataset, highlighting the variability and consistency in classification accuracy.

Figure 9. Boxplots comparing classification error rates across utilized SSL approaches and the proposed SSLDSE method, illustrating performance differences and the effectiveness of SSLDSE in reducing classification errors.

Figure 10. Visual interpretation of errors in ST-binary classification predictions, illustrating misclassified instances and highlighting the areas where the model’s predictions diverge from the true labels.

Table 1. Experimental results of the proposed SSLDSE method for binary classification between the bridging and non-bridging ST classes across 50-epoch intervals up to 300 epochs, considering the precision, recall, and F1 score as evaluation metrics.

Epochs	Precision		Recall		F1 Score
Epochs	Brd	NBrd	Brd	NBrd	Brd	NBrd
50	88.03%	91.45%	90.00%	93.25%	90.12%	91.73%
100	86.82%	90.30%	90.93%	88.75%	88.45%	96.47%
150	87.25%	85.75%	92.51%	89.60%	89.01%	92.85%
200	87.79%	92.10%	91.42%	95.40%	89.29%	93.29%
250	85.42%	86.50%	90.77%	96.10%	88.49%	96.82%
300	87.17%	90.00%	92.79%	91.85%	89.65%	94.51%
Average	87.08%	89.62%	91.37%	92.73%	89.17%	94.14%

Note: Brd—Bridging; NBrd—Non-Bridging.

Table 2. Experimental results present the overall SSLDSE method scores for binary ST classification, highlighting the model’s effectiveness with average accuracy, precision, recall, and F1 score metrics for bridging and non-bridging classes.

Epochs	Accuracy	Precision	Recall	F1 Score
50	98.50%	97.25%	96.80%	98.50%
100	96.75%	95.40%	94.20%	96.30%
150	97.10%	96.75%	95.00%	97.90%
200	97.90%	95.85%	94.85%	97.25%
250	96.25%	96.00%	96.10%	98.10%
300	97.95%	94.50%	94.50%	97.60%
Average	97.28%	96.12%	95.65%	97.77%

Table 3. Comparison of classification accuracy between the utilized SSL approaches and the proposed SSLDSE method illustrates the performance improvements and demonstrates the effectiveness of the SSLDSE method in achieving higher accuracy across the evaluated models.

Epochs	GAN		Contrastive		Active		MIncepResV2		SSLDSE
Epochs	Brd	NBrd	Brd	NBrd	Brd	NBrd	Brd	NBrd	Brd	NBrd
50	68.95%	75.25%	77.85%	83.65%	82.45%	85.45%	90.35%	92.15%	95.85%	97.80%
100	70.40%	73.80%	78.60%	84.90%	80.90%	84.75%	88.75%	90.85%	94.30%	96.50%
150	69.20%	74.45%	79.20%	82.75%	81.30%	86.30%	89.10%	91.60%	95.50%	98.10%
200	71.05%	76.10%	77.30%	85.20%	83.15%	83.85%	90.85%	93.25%	96.10%	97.25%
250	67.75%	72.90%	78.95%	83.40%	79.75%	85.90%	87.65%	92.70%	93.95%	96.90%
300	71.55%	75.15%	78.95%	84.85%	82.55%	84.35%	88.95%	90.85%	95.45%	98.00%
Average	69.82%	74.77%	78.29%	84.13%	81.67%	85.10%	89.27%	91.92%	95.19%	97.37%

Note: MIncepResV2—Modified Inception-ResNet-v2; Brd—Bridging; NBrd—Non-Bridging.

Table 4. Accuracy, AUC of ROC, and F1 score of the proposed SSLDSE method are validated and compared with state-of-the-art methods through blind testing on the IEEE ISBI 2015 and proprietary datasets, demonstrating the outstanding performance of SSLDSE across both datasets.

Models	Accuracy		AUC-ROC		F1 Score
Models	IEEE	Our	IEEE	Our	IEEE	Our
GAN SSL	68.61%	68.93%	64.88%	65.73%	62.86%	67.01%
Contrastive SSL	70.66%	73.11%	74.40%	74.07%	72.06%	73.17%
Active SSL	84.92%	85.64%	83.73%	83.00%	83.00%	85.07%
MIncepResV2	85.07%	86.12%	85.97%	84.03%	83.02%	82.00%
Proposed SSLDSE	90.06%	92.72%	91.02%	93.57%	92.08%	95.37%

Note: MIncepResV2—Modified Inception-ResNet-v2.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shakya, K.S.; Alavi, A.; Porteous, J.; Khatri, P.; Laddi, A.; Jaiswal, M.; Kumar, V. Semi-Supervised Deep Subspace Embedding for Binary Classification of Sella Turcica. Appl. Sci. 2024, 14, 11154. https://doi.org/10.3390/app142311154

AMA Style

Shakya KS, Alavi A, Porteous J, Khatri P, Laddi A, Jaiswal M, Kumar V. Semi-Supervised Deep Subspace Embedding for Binary Classification of Sella Turcica. Applied Sciences. 2024; 14(23):11154. https://doi.org/10.3390/app142311154

Chicago/Turabian Style

Shakya, Kaushlesh Singh, Azadeh Alavi, Julie Porteous, Priti Khatri, Amit Laddi, Manojkumar Jaiswal, and Vinay Kumar. 2024. "Semi-Supervised Deep Subspace Embedding for Binary Classification of Sella Turcica" Applied Sciences 14, no. 23: 11154. https://doi.org/10.3390/app142311154

APA Style

Shakya, K. S., Alavi, A., Porteous, J., Khatri, P., Laddi, A., Jaiswal, M., & Kumar, V. (2024). Semi-Supervised Deep Subspace Embedding for Binary Classification of Sella Turcica. Applied Sciences, 14(23), 11154. https://doi.org/10.3390/app142311154

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Semi-Supervised Deep Subspace Embedding for Binary Classification of Sella Turcica

Abstract

1. Introduction

2. Materials and Methods

2.1. Dataset and Pre-Processing

2.2. Semi-Supervised Methods

2.2.1. Active Learning

2.2.2. Generative Adversarial Networks

2.2.3. Contrastive Learning

2.3. Proposed Semi-Supervised Deep Subspace Embedding (SSLDSE) Architecture

2.3.1. Exploring Non-Linear Embeddings with Kullback–Leibler Divergence

2.3.2. Semi-Supervised Learning with Deep Subspace Descriptor

2.3.3. Enhancing Model Robustness with Zero-Shot Classifier

2.4. Experimental Settings

3. Results

3.1. Experimental Results of Binary Classification

3.2. Experimental Validation Results of Classification

3.3. Error Analysis

4. Discussion

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI