Dimensionality Reduction and Clustering Strategies for Label Propagation in Partial Discharge Data Sets
"> Figure 1
<p>The proposed label propagation strategy.</p> "> Figure 2
<p>Effects of pre-processing on PRPDs: (<b>a</b>) original PRPD; (<b>b</b>) PRPD after amplitude scaling; (<b>c</b>) PRPD after amplitude scaling (detail); (<b>d</b>) scaled PRPD after grey-scale closing.</p> "> Figure 3
<p>Detailed view of the PRPD data set clustering procedure. The variables <span class="html-italic">d</span> and <span class="html-italic">k</span> denote the dimensionality of the latent space and the number of clusters, respectively.</p> "> Figure 4
<p>General experimental setup for acquisition and analysis of partial discharge data.</p> "> Figure 5
<p>Example of a PRPD pattern (<math display="inline"><semantics> <mrow> <mn>256</mn> <mspace width="3.33333pt"/> <mo>×</mo> <mspace width="3.33333pt"/> <mn>256</mn> </mrow> </semantics></math> matrix).</p> "> Figure 6
<p>Number of PRPD diagrams for each hydroelectric generator in our data set.</p> "> Figure 7
<p>Silhouette scores for each configuration. A grey bar refers to the weighted average of Silhouette scores (<math display="inline"><semantics> <msub> <mover accent="true"> <mi>α</mi> <mo>¯</mo> </mover> <mi>w</mi> </msub> </semantics></math>, left vertical axis) across hydroelectric generators, while the black dot indicates the corresponding standard deviation (<math display="inline"><semantics> <msub> <mi>σ</mi> <mi>α</mi> </msub> </semantics></math>, right vertical axis).</p> "> Figure 8
<p>Caliński–Harabasz scores for each configuration. A grey bar refers to the weighted average of Caliński–Harabasz scores (<math display="inline"><semantics> <msub> <mover accent="true"> <mi>β</mi> <mo>¯</mo> </mover> <mi>w</mi> </msub> </semantics></math>, left vertical axis) across hydroelectric generators, while the black dot indicates the corresponding standard deviation (<math display="inline"><semantics> <msub> <mi>σ</mi> <mi>β</mi> </msub> </semantics></math>, right vertical axis).</p> "> Figure 9
<p>Davies–Bouldin scores for each configuration. A grey bar refers to the weighted average of Davies–Bouldin scores (<math display="inline"><semantics> <msub> <mover accent="true"> <mi>γ</mi> <mo>¯</mo> </mover> <mi>w</mi> </msub> </semantics></math>, left vertical axis) across hydroelectric generators, while the black dot indicates the corresponding standard deviation (<math display="inline"><semantics> <msub> <mi>σ</mi> <mi>γ</mi> </msub> </semantics></math>, right vertical axis).</p> "> Figure 10
<p>Weighted average Silhouette scores (<math display="inline"><semantics> <msub> <mover accent="true"> <mi>α</mi> <mo>¯</mo> </mover> <mi>w</mi> </msub> </semantics></math>) vs. weighted average Chaliński–Harabasz scores (<math display="inline"><semantics> <msub> <mover accent="true"> <mi>β</mi> <mo>¯</mo> </mover> <mi>w</mi> </msub> </semantics></math>).</p> "> Figure 11
<p>Weighted average Silhouette scores (<math display="inline"><semantics> <msub> <mover accent="true"> <mi>α</mi> <mo>¯</mo> </mover> <mi>w</mi> </msub> </semantics></math>) vs. weighted average Davies–Bouldin scores (<math display="inline"><semantics> <msub> <mover accent="true"> <mi>γ</mi> <mo>¯</mo> </mover> <mi>w</mi> </msub> </semantics></math>).</p> "> Figure 12
<p>Representative PRPDs from clusters 0 (<b>a</b>) and 1 (<b>b</b>) obtained from the best configuration selected by the Silhouette score for the machine 12. Vertical and horizontal axes indicate the indices of a 256 × 256 PRPD matrix.</p> "> Figure 13
<p>Examples of two very distinct PRPDs associated to one cluster (cluster 0) obtained from the best configuration selected by the Davies–Bouldin score for the machine 12. Vertical and horizontal axes indicate the indices of a 256 × 256 PRPD matrix.</p> ">
Abstract
1. Introduction
- The proposal of a functional label propagation technique, tested on PRPDs obtained from online hydrogenerators;
- A complete methodology for optimizing the setup of the label propagation algorithm by assessing clustering performance;
- A pre-processing procedure to cope with PRPDs having different amplitude scales.
2. The Proposed Label Propagation Algorithm
2.1. Pre-Processing
2.2. Dimensionality Reduction
- Principal component analysis (PCA): a well-known mapping that projects data points onto the directions given by the eigenvectors of the data covariance matrix. These eigenvectors can be ordered according to how much of the data variance they represent [18]. Although proposed more than a century ago, the technique and its variants are still used in many current applications [19].
- Kernel PCA: an extension of PCA in which the data are subjected to a kernel transformation before PCA is applied. This preprocessing helps PCA capture nonlinear relations between data points [20]. The kernels tested in this work are the radial basis function (RBF) and cosine kernels.
- Pairwise controlled manifold approximation projection (PaCMAP): a recent algorithm that aims to preserve both local and global structures of the original data points while reducing dimensionality. The technique considers three types of data pairs (near, mid-near, and further pairs) and associates a loss function with each. A total loss, defined as the linear combination of the three loss functions, guides the optimization, giving better control over the attractive and repulsive forces between data points [21]. A minimal usage sketch of the three mappings follows this list.
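To make the three options concrete, the following is a minimal sketch in Python, assuming scikit-learn and the reference pacmap package; the data and the parameter values (latent dimensionality, number of neighbors) are illustrative placeholders, not the settings tuned in this work.

```python
# Hedged sketch: applying the three dimensionality-reduction options to a
# stack of flattened PRPD matrices. Array shapes and parameter values are
# illustrative only.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
import pacmap  # reference code: https://github.com/YingfanWang/PaCMAP

rng = np.random.default_rng(0)
X = rng.random((200, 1024))  # placeholder for flattened PRPD patterns

Z_pca = PCA(n_components=2).fit_transform(X)
Z_rbf = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)
Z_cos = KernelPCA(n_components=2, kernel="cosine").fit_transform(X)
Z_pacmap = pacmap.PaCMAP(
    n_components=2, n_neighbors=10, MN_ratio=0.5, FP_ratio=2.0
).fit_transform(X)
```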
2.3. Clustering
Algorithm 1: K-means
Input: number of clusters k, data set X
Output: data set labels
Choose k centroids randomly among the data points;
repeat
    Assign each data point to the nearest cluster centroid;
    Recalculate each cluster centroid as the mean (in the Euclidean sense) of all data points assigned to that cluster;
until cluster assignments no longer change significantly
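As a complement to the pseudocode, here is a minimal NumPy sketch of Algorithm 1; names such as `kmeans`, `max_iter`, and `tol` are our own, and the stopping rule ("no significant change") is realized as a centroid-movement threshold.

```python
# Minimal NumPy sketch of Algorithm 1 (k-means); illustrative, not the
# paper's implementation.
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Choose k centroids randomly among the data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each data point to the nearest cluster centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recalculate each centroid as the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop when centroids (hence assignments) no longer change significantly
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids
```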
Algorithm 2: K-means++
Input: number of clusters k, data set X
Output: data set labels
Select the first centroid randomly from the data points;
repeat
    Compute the Euclidean distance of each data point to its nearest centroid;
    Determine a new centroid by sampling from a weighted probability distribution, in which each data point is chosen with probability proportional to the square of its Euclidean distance to its nearest centroid;
until k centroids have been chosen
Run the standard k-means algorithm, skipping its first step (random choice of the k centroids).
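The seeding loop of Algorithm 2 can be sketched as follows; this is our reading of the pseudocode, with function and variable names of our own choosing.

```python
# Minimal sketch of the k-means++ seeding (Algorithm 2); illustrative only.
import numpy as np

def kmeans_pp_seeds(X, k, seed=0):
    rng = np.random.default_rng(seed)
    # Select the first centroid randomly from the data points
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        # Squared Euclidean distance of each point to its nearest centroid
        d2 = np.min(
            ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2),
            axis=1,
        )
        # New centroid drawn with probability proportional to that squared distance
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)
```

An equivalent, optimized route is scikit-learn's `KMeans(n_clusters=k, init="k-means++")`, which performs the same seeding before running standard k-means.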
2.4. Feature Scaling
3. Evaluation Methodology
3.1. PRPD Data Set
3.2. Amplitude Scaling Factor
3.3. Number of Reduced Dimensions
- Number of components: integers in the interval .
- Number of neighbors: 10 (default value in the reference code (https://github.com/YingfanWang/PaCMAP, accessed on 17 November 2024)).
- MN_ratio: ten equally spaced samples in the interval .
- FP_ratio: integers in the interval . A sketch of the resulting hyperparameter grid is shown below.
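For illustration only, the sweep can be assembled as below; the numeric interval bounds are placeholders (the exact ranges did not survive extraction here), and only the default number of neighbors (10) is taken from the text.

```python
# Hypothetical construction of the PaCMAP hyperparameter grid; all interval
# bounds are placeholders, not the ranges used in the paper.
from itertools import product

import numpy as np

n_components_values = range(2, 11)           # placeholder integer interval
n_neighbors_values = [10]                    # default from the reference code
mn_ratio_values = np.linspace(0.1, 1.0, 10)  # ten equally spaced samples (placeholder bounds)
fp_ratio_values = range(1, 5)                # placeholder integer interval

grid = list(product(n_components_values, n_neighbors_values,
                    mn_ratio_values, fp_ratio_values))
print(f"{len(grid)} PaCMAP configurations to evaluate")
```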
3.4. Number of Clusters
3.5. Performance Assessment of the Clustering System
3.5.1. Silhouette Score [26]
3.5.2. Caliński–Harabasz Score [27]
3.5.3. Davies–Bouldin Score [28]
3.5.4. Fowlkes–Mallows Index [29,30]
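All four scores are available in scikit-learn; a self-contained sketch, with synthetic data standing in for the latent PRPD features of one generator, is:

```python
# Hedged sketch: computing the four clustering scores with scikit-learn on
# synthetic data; Z stands in for the latent features, labels for clusterings.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             fowlkes_mallows_score, silhouette_score)

rng = np.random.default_rng(0)
Z = rng.random((300, 5))
labels_a = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
labels_b = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(Z)

alpha = silhouette_score(Z, labels_a)            # in [-1, 1], higher is better
beta = calinski_harabasz_score(Z, labels_a)      # >= 0, higher is better
gamma = davies_bouldin_score(Z, labels_a)        # >= 0, lower is better
fmi = fowlkes_mallows_score(labels_a, labels_b)  # compares two labelings, in [0, 1]
```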
4. Results and Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Raymond, W.J.K.; Illias, H.A.; Bakar, A.H.A.; Mokhlis, H. Partial discharge classifications: Review of recent progress. Measurement 2015, 68, 164–181.
2. Luo, Y.; Li, Z.; Wang, H. A Review of Online Partial Discharge Measurement of Large Generators. Energies 2017, 10, 1694.
3. Angrisani, L.; Daponte, P.; Lupò, G.; Petrarca, C.; Vitelli, M. Analysis of ultrawide-band detected partial discharges by means of a multiresolution digital signal-processing method. Measurement 2000, 27, 207–221.
4. IEC-60034-27-2; Rotating Electrical Machines—Part 27-2: Online Partial Discharge Measurements on Stator Winding Insulation of Rotating Electrical Machines. Technical Report; International Electrotechnical Commission: Geneva, Switzerland, 2012.
5. Satish, L.; Gururaj, B. Use of hidden Markov models for partial discharge pattern classification. IEEE Trans. Electr. Insul. 1993, 28, 172–182.
6. Salama, M.; Bartnikas, R. Fuzzy logic applied to PD pattern classification. IEEE Trans. Dielectr. Electr. Insul. 2000, 7, 118–123.
7. Hunter, J.A.; Lewin, P.L.; Hao, L.; Walton, C.; Michel, M. Autonomous classification of PD sources within three-phase 11 kV PILC cables. IEEE Trans. Dielectr. Electr. Insul. 2013, 20, 2117–2124.
8. Satish, L.; Gururaj, B.I. Partial discharge pattern classification using multilayer neural networks. IEE Proc. A (Sci. Meas. Technol.) 1993, 140, 323–330.
9. Catterson, V.M.; Sheng, B. Deep neural networks for understanding and diagnosing partial discharge data. In Proceedings of the 2015 IEEE Electrical Insulation Conference (EIC), Seattle, WA, USA, 7–10 June 2015; pp. 218–221.
10. Barrios, S.; Buldain, D.; Comech, M.P.; Gilbert, I.; Orue, I. Partial Discharge Classification Using Deep Learning Methods—Survey of Recent Progress. Energies 2019, 12, 2485.
11. Karimi, M.; Majidi, M.; MirSaeedi, H.; Arefi, M.M.; Oskuoee, M. A Novel Application of Deep Belief Networks in Learning Partial Discharge Patterns for Classifying Corona, Surface, and Internal Discharges. IEEE Trans. Ind. Electron. 2020, 67, 3277–3287.
12. Mantach, S.; Ashraf, A.; Janani, H.; Kordi, B. A Convolutional Neural Network-Based Model for Multi-Source and Single-Source Partial Discharge Pattern Classification Using Only Single-Source Training Set. Energies 2021, 14, 1355.
13. Lu, S.; Chai, H.; Sahoo, A.; Phung, B.T. Condition Monitoring Based on Partial Discharge Diagnostics Using Machine Learning Methods: A Comprehensive State-of-the-Art Review. IEEE Trans. Dielectr. Electr. Insul. 2020, 27, 1861–1888.
14. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. Available online: http://www.deeplearningbook.org (accessed on 24 November 2024).
15. Zhu, X.; Ghahramani, Z. Learning from Labeled and Unlabeled Data with Label Propagation; Technical Report CMU-CALD-02-107; Carnegie Mellon University: Pittsburgh, PA, USA, 2002.
16. Arthur, D.; Vassilvitskii, S. K-means++: The Advantages of Careful Seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035.
17. Soille, P. Morphological Image Analysis: Principles and Applications, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2004.
18. Pearson, K. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1901, 2, 559–572.
19. Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202.
20. Schölkopf, B.; Smola, A.; Müller, K.R. Kernel principal component analysis. In Artificial Neural Networks—ICANN'97, Proceedings of the 7th International Conference, Lausanne, Switzerland, 8–10 October 1997; Gerstner, W., Germond, A., Hasler, M., Nicoud, J.D., Eds.; Springer: Berlin/Heidelberg, Germany, 1997; pp. 583–588.
21. Wang, Y.; Huang, H.; Rudin, C.; Shaposhnik, Y. Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMap, and PaCMAP for Data Visualization. J. Mach. Learn. Res. 2021, 22, 1–73.
22. MacQueen, J.B. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 1967; Volume 1, pp. 281–297.
23. Amorim, H.P.; De Carvalho, A.T.; de Oliveira Fo, O.B.; Levy, A.S.F.; Sans, J. Instrumentation for Monitoring and Analysis of Partial Discharges Based on Modular Architecture. In Proceedings of the 2008 International Conference on High Voltage Engineering and Application, Chongqing, China, 9–12 November 2008; pp. 596–599.
24. Zhu, M.; Ghodsi, A. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Comput. Stat. Data Anal. 2006, 51, 918–930.
25. Murphy, K.P. Probabilistic Machine Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2022.
26. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
27. Caliński, T.; Harabasz, J. A dendrite method for cluster analysis. Commun. Stat. 1974, 3, 1–27.
28. Davies, D.L.; Bouldin, D.W. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 224–227.
29. Fowlkes, E.B.; Mallows, C.L. A Method for Comparing Two Hierarchical Clusterings. J. Am. Stat. Assoc. 1983, 78, 553–569.
30. Meilă, M. Comparing clusterings—An information based distance. J. Multivar. Anal. 2007, 98, 873–895.
31. Spearman, C. The Proof and Measurement of Association between Two Things. Am. J. Psychol. 1904, 15, 72–101.
| $\bar{\alpha}_w$ | $\sigma_\alpha$ | d | k | ID |
|---|---|---|---|---|
| 0.5791 | 0.1078 | 10 | 7 | 64 |
| 0.5775 | 0.1213 | 10 | 8 | 42 |
| 0.5766 | 0.1283 | 10 | 8 | 10 |
| 0.5711 | 0.1162 | 10 | 9 | 32 |
| 0.5626 | 0.1331 | 10 | 7 | 20 |
| 0.5580 | 0.1693 | 10 | 9 | 54 |
| 0.5481 | 0.0908 | 10 | 9 | 55 |
| 0.5464 | 0.0927 | 10 | 6 | 33 |
| 0.5441 | 0.0893 | 10 | 7 | 21 |
| 0.5411 | 0.0986 | 10 | 7 | 43 |
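As a reading of the table and figure captions, the weighted aggregation across generators might look like the sketch below, assuming each machine's weight is proportional to its number of PRPDs; this is our interpretation, not a reproduction of the paper's code.

```python
# Hedged sketch of a weighted average score and its standard deviation across
# generators, with weights assumed proportional to each machine's PRPD count.
import numpy as np

def weighted_stats(scores, counts):
    s = np.asarray(scores, dtype=float)
    w = np.asarray(counts, dtype=float)
    w /= w.sum()
    mean = np.sum(w * s)
    std = np.sqrt(np.sum(w * (s - mean) ** 2))
    return mean, std

alpha_per_machine = [0.61, 0.55, 0.58]  # illustrative per-machine Silhouette scores
prpd_counts = [268, 192, 230]           # PRPD counts per machine (cf. the machine table)
alpha_w, sigma_alpha = weighted_stats(alpha_per_machine, prpd_counts)
```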
| $\bar{\beta}_w$ | $\sigma$ | d | k | ID |
|---|---|---|---|---|
| 3.1184 × 10³ | 2.4845 × 10⁻¹ | 35 | 19 | 2 |
| 2.7782 × 10³ | 2.7054 × 10⁻¹ | 35 | 25 | 3 |
| 2.7754 × 10³ | 2.4850 × 10⁻¹ | 19 | 14 | 4 |
| 2.2416 × 10³ | 2.5429 × 10⁻¹ | 19 | 15 | 5 |
| 1.6682 × 10³ | 2.2756 × 10⁻¹ | 25 | 17 | 14 |
| 1.5380 × 10³ | 2.2156 × 10⁻¹ | 25 | 15 | 15 |
| 4.3822 × 10² | 1.2832 × 10⁻¹ | 10 | 8 | 10 |
| 4.2722 × 10² | 1.3319 × 10⁻¹ | 10 | 7 | 20 |
| $\bar{\gamma}_w$ | $\sigma$ | d | k | ID |
|---|---|---|---|---|
| 0.5023 | 0.2048 | 27 | 17 | 37 |
| 0.5078 | 0.2073 | 27 | 16 | 36 |
| 0.5837 | 0.2215 | 25 | 15 | 15 |
| 0.6126 | 0.1213 | 10 | 8 | 42 |
| 0.6165 | 0.1078 | 10 | 7 | 64 |
| 0.6220 | 0.1283 | 10 | 8 | 10 |
| 0.6241 | 0.1162 | 10 | 9 | 32 |
| 0.6396 | 0.1693 | 10 | 9 | 54 |
| 0.6522 | 0.1331 | 10 | 7 | 20 |
| 0.6560 | 0.2094 | 96 | 36 | 35 |
| Configuration | SIL | CHA | DBO |
|---|---|---|---|
| ID | 64 | 2 | 37 |
| AS | Yes | No | Yes |
| Closing | Yes | No | No |
| DR | PaCMAP | PCA | PCA |
| Clustering | k-means++ | k-means++ | k-means++ |
| FS–DR | Yes | No | Yes |
| FS–CL | No | No | Yes |
| ID | AS | Closing | DR | Clustering | FS–DR | FS–CL |
|---|---|---|---|---|---|---|
| 42 | Yes | No | PaCMAP | k-means++ | Yes | No |
| 10 | No | No | PaCMAP | k-means++ | No | No |
| 20 | No | No | PaCMAP | k-means++ | Yes | No |
| 54 | Yes | Yes | PaCMAP | k-means++ | No | No |
|  | $\bar{\alpha}_w$ | $\bar{\beta}_w$ | $\bar{\gamma}_w$ |
|---|---|---|---|
| $\bar{\alpha}_w$ | 1.0000 | 0.7731 | −0.7553 |
| $\bar{\beta}_w$ | 0.7731 | 1.0000 | −0.4100 |
| $\bar{\gamma}_w$ | −0.7553 | −0.4100 | 1.0000 |
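The matrix above is consistent with a Spearman rank correlation between the three weighted-average scores across configurations; a sketch of the computation with SciPy, using illustrative score vectors, is:

```python
# Hedged sketch: Spearman rank-correlation matrix between the three
# weighted-average scores; the score vectors below are illustrative only.
import numpy as np
from scipy.stats import spearmanr

alpha_w = np.array([0.58, 0.55, 0.40, 0.62, 0.51])       # Silhouette (higher is better)
beta_w = np.array([900.0, 700.0, 300.0, 1100.0, 650.0])  # Caliński–Harabasz
gamma_w = np.array([0.62, 0.66, 0.95, 0.58, 0.70])       # Davies–Bouldin (lower is better)

rho, _ = spearmanr(np.column_stack([alpha_w, beta_w, gamma_w]))
print(rho)  # 3x3 symmetric matrix with unit diagonal
```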
| Statistics | ID 02 | ID 37 | ID 64 |
|---|---|---|---|
| Mean | 0.6394 | 0.6462 | 0.6504 |
| Standard deviation | 0.1727 | 0.1344 | 0.0864 |
| Minimum | 0.3262 | 0.4113 | 0.4667 |
| Maximum | 0.9276 | 0.9171 | 0.8604 |
| Machine | Number of PRPDs | Number of clusters (k) | k/N | Reduction |
|---|---|---|---|---|
| 1 | 268 | 3 | 0.0112 | 98.8806% |
| 2 | 192 | 2 | 0.0104 | 98.9583% |
| 3 | 230 | 2 | 0.0087 | 99.1304% |
| 4 | 94 | 2 | 0.0213 | 97.8723% |
| 5 | 314 | 3 | 0.0096 | 99.0446% |
| 6 | 338 | 2 | 0.0059 | 99.4083% |
| 7 | 338 | 2 | 0.0059 | 99.4083% |
| 8 | 68 | 2 | 0.0294 | 97.0588% |
| 9 | 268 | 5 | 0.0187 | 98.1343% |
| 10 | 104 | 2 | 0.0192 | 98.0769% |
| 11 | 254 | 2 | 0.0079 | 99.2126% |
| 12 | 128 | 2 | 0.0156 | 98.4375% |
| 13 | 69 | 2 | 0.0290 | 97.1014% |
| 14 | 118 | 2 | 0.0169 | 98.3051% |
| 15 | 144 | 2 | 0.0139 | 98.6111% |
| 16 | 143 | 2 | 0.0140 | 98.6014% |
| 17 | 144 | 4 | 0.0278 | 97.2222% |
| 18 | 96 | 2 | 0.0208 | 97.9167% |
| 19 | 120 | 2 | 0.0167 | 98.3333% |
| 20 | 72 | 2 | 0.0278 | 97.2222% |
| 21 | 120 | 2 | 0.0167 | 98.3333% |
| 22 | 96 | 2 | 0.0208 | 97.9167% |
| 23 | 120 | 2 | 0.0167 | 98.3333% |
| 24 | 121 | 2 | 0.0165 | 98.3471% |
| 25 | 120 | 2 | 0.0167 | 98.3333% |
| Total | 4079 | 57 | 0.0140 | 98.6026% |
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).