
1 Introduction

Power quality (PQ) evaluates how closely voltage and current waveforms match a pure sinusoid with a single frequency component (e.g. 60 Hz) [1]. Non-linear electronic devices such as alternators and current starters introduce undesirable harmonic distortion into the network [2]. Other sources that distort pure electric signals include high electrical loads, electrical faults, and capacitor bank switching [2]. Depending on their source, PQ disturbances take different forms, such as harmonics, swell/sag events, interruption events, voltage fluctuations, and transients [1].

The automatic classification of PQ disturbances can be seen as a pattern recognition problem, in which the types of distortion are differentiated by their features [3]. For feature extraction, PQ disturbances can be decomposed into time-frequency components by time-frequency or time-scale transforms, known as dictionaries [3, 4]. The short-time Fourier transform (STFT), the wavelet transform (WT), and the S-transform (ST) are commonly used to represent PQ disturbances for feature extraction [3, 5]. For the pattern recognition step, classifiers based on K-nearest neighbours (K-NN), support vector machines (SVM), and artificial neural networks (ANN) have been employed [3].

Research on PQ disturbance classification has used different combinations of dictionaries and classifiers to improve the discrimination of PQ disturbances [3]. However, previous works have been limited to a single dictionary in the feature extraction step; that is, they use only one of the Gabor transform (GT, an STFT with a Gaussian window), the WT, or the ST, together with some specific classifiers [3, 5]. Studies in signal processing have shown that combining several dictionaries improves the robustness of the feature extraction step [4, 6]. Combinations of complete dictionaries are known as overcomplete representations (OR); they increase the richness of the representation by removing the uncertainty of choosing the “proper” dictionary, and they increase the flexibility to match any structure in the data [7]. However, OR increases the dimension of the vector of representation coefficients extracted from the signals, which then includes irrelevant information [4].

To remove this redundant information before the classification step, the machine learning community has proposed different methods for automatically performing basis selection in linear models, known as sparse linear models (SLMs) [8]. Combined with OR, SLMs tend to increase the accuracy of the PQ representation, while their sparsity simplifies the PQ classification problem [6]. Because PQ disturbance representation involves variables (representation coefficients) that are grouped by dictionary (GT, WT, or ST), we employ an SLM known as Group Lasso. We evaluate Group Lasso with different classifiers, using either one dictionary at a time or an OR. We employ the discrete-time forms of the GT, WT, and ST as dictionaries, and the following classifiers for PQ classification: K-NN, SVM, ANN, and two Bayesian approaches.

This paper is organized as follows. Materials and methods are given in Sect. 2. Sect. 3 discusses the results. Finally, Sect. 4 presents the conclusions.

2 Materials and Methods

2.1 Power Quality (PQ) Disturbances

PQ describes the suitability of electric power for consumer devices by evaluating the quality of voltage and current signals. There are two main types of PQ disturbances: events and variations. Events are sudden distortions that occur over specific time intervals, whereas variations are steady-state or quasi-steady-state disturbances that require continuous measurement [1]. This paper focuses on six common types of disturbances: sag events, swell events, interruption events, voltage fluctuations, harmonic distortions, and oscillatory transients [1,2,3]. Table 1 summarizes the characteristics of these PQ disturbances. Figure 1 shows examples of PQ disturbances simulated with the electrical power distribution model of Fig. 2.

Table 1. Summary of PQ disturbances.
Fig. 1. PQ disturbance examples for (a) a sag event, (b) voltage fluctuations, and (c) an oscillatory transient, simulated with the Simulink model of Fig. 2.

Fig. 2. Simulink diagram of the electrical power distribution model.

2.2 Overcomplete Representation

Let \(\mathbf y \in \mathbb {R}^{N}\) be a vector that represents a discrete-time PQ disturbance of length N. To analyse the different types of waveform distortion, we want to represent the disturbance \(\mathbf y \) completely as a linear superposition of atoms \(\phi _\gamma \), i.e. \(\mathbf y = \varPhi \varvec{\beta }\), where the collection of atoms \(\varPhi = (\phi _\gamma : \gamma \in \varGamma )\) is known as a dictionary [4]. The vector \(\varvec{\beta }\in \mathbb {R}^M\) contains the representation coefficients, with M the number of atoms in \(\varPhi \). The interpretation of the index \(\gamma \) depends on how the dictionary \(\varPhi \) is indexed (e.g. by time and frequency).

Table 2 shows the atom structures of the dictionaries most commonly applied in PQ disturbance analysis: the Gabor transform (GT), the wavelet transform (WT), and the Stockwell transform (ST). For the WT, we employ the Mexican hat wavelet transform (MHWT) because its atoms are easy to implement. However, other mother wavelets (e.g. Morlet) could also be used.
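
Because Table 2 is not reproduced here, the following is only a minimal sketch of plausible atom forms, all of them assumptions: a Gabor atom as a Gaussian-windowed cosine, the Mexican hat (Ricker) wavelet, and an S-transform-style atom whose Gaussian width shrinks as 1/f. It builds a small GT dictionary \(\varPhi \) with unit-norm columns and synthesizes \(\mathbf y = \varPhi \varvec{\beta }\) from a sparse \(\varvec{\beta }\). The helper names, the coarse 9-point location grid and the width factor `k` are hypothetical choices made to keep the example small.

```python
import numpy as np

# Assumed atom forms (Table 2 is not reproduced); t, m, sigma, s in seconds, f in Hz.
def gabor_atom(t, m, f, sigma):
    """Gaussian window of width sigma centred at m, modulating a cosine of frequency f."""
    return np.exp(-((t - m) ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * f * t)

def mexican_hat_atom(t, m, s):
    """Mexican hat (Ricker) wavelet centred at m with scale s (unnormalised)."""
    u = (t - m) / s
    return (1 - u ** 2) * np.exp(-(u ** 2) / 2)

def stockwell_atom(t, m, f, k=1.0):
    """S-transform-style atom: a Gabor atom whose Gaussian width k/f shrinks with f."""
    return gabor_atom(t, m, f, k / f)

def unit_columns(Phi):
    """Scale every atom (column) to unit l2 norm."""
    return Phi / np.linalg.norm(Phi, axis=0, keepdims=True)

t = np.linspace(0.05, 0.45, 2000)        # 2000 samples, as in Sect. 2.4
centres = np.linspace(0.05, 0.45, 9)     # coarse location grid, kept small on purpose
freqs = 60.0 * np.arange(1, 4)           # first three harmonics of 60 Hz
Phi = unit_columns(np.column_stack(
    [gabor_atom(t, m, f, 0.005) for m in centres for f in freqs]))

# Synthesis y = Phi @ beta with a sparse coefficient vector
beta = np.zeros(Phi.shape[1])
beta[[4, 13]] = [1.0, 0.5]
y = Phi @ beta
```

Each column of `Phi` is one atom \(\phi _\gamma \), and here \(\gamma \) is the (location, frequency) pair; the MHWT and ST dictionaries would be assembled the same way from their own atoms.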

An overcomplete representation (OR) is obtained by combining several dictionaries, \(\varPsi = (\varPhi _g: g \in G)\), where G denotes the number of dictionaries used for the synthesis [4]. Note that each dictionary \(\varPhi _g = (\phi _{\gamma _g}: \gamma _g \in \varGamma _g)\) has its own indexation parameter \(\gamma _g\), which allows different types of atoms (e.g. frequency, time-frequency, time-scale) to be combined. In this paper, we analyse the PQ disturbances in terms of the coefficient vector \(\varvec{\beta }\) (analysis), which is then used for feature extraction to feed the statistical classifiers. We also use the obtained vectors \(\varvec{\beta }\) to reconstruct the PQ disturbances (synthesis).
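
In matrix form, an OR is a column-wise concatenation of dictionaries, plus a record of which dictionary each column came from; that label vector is what a group-sparse method needs to split \(\varvec{\beta }\) into blocks \(\varvec{\beta }_g\). The sketch below assumes two toy blocks standing in for the GT, MHWT and ST dictionaries: a cosine/sine block with 40 harmonics of 60 Hz and a single-scale Mexican hat block with one atom every 5 ms. The function name `build_overcomplete` is hypothetical.

```python
import numpy as np

def build_overcomplete(dictionaries):
    """Concatenate dictionaries column-wise into Psi and label each column with its group g."""
    Psi = np.hstack(dictionaries)
    groups = np.concatenate(
        [np.full(D.shape[1], g) for g, D in enumerate(dictionaries)])
    return Psi, groups

t = np.linspace(0.05, 0.45, 2000)
h = np.arange(1, 41)                                  # first 40 harmonics of 60 Hz
harmonics = np.hstack([np.cos(2 * np.pi * 60.0 * np.outer(t, h)),
                       np.sin(2 * np.pi * 60.0 * np.outer(t, h))])
centres = np.linspace(0.05, 0.45, 81)                 # one location every 5 ms
u = (t[:, None] - centres[None, :]) / 0.005
mexican_hat = (1 - u ** 2) * np.exp(-(u ** 2) / 2)    # toy single-scale wavelet block

Psi, groups = build_overcomplete([harmonics, mexican_hat])
beta = np.zeros(Psi.shape[1])
beta_g = [beta[groups == g] for g in np.unique(groups)]   # per-dictionary blocks beta_g
```

With the full GT, MHWT and ST dictionaries the number of columns of `Psi` grows far beyond N, which is the dimensionality increase that motivates the sparse models of Sect. 2.3.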

Table 2. Atom structures for the Gabor transform (GT), the Mexican hat Wavelet transform (MHWT), and the Stockwell transform (ST).

2.3 Sparse Linear Models for Grouped Variable Selection

Tibshirani [9] proposed a method for estimation in linear models known as the Lasso. For the inverse problem \(\mathbf y = \varPsi \varvec{\beta }\), the Lasso obtains a sparse representation by minimizing the regularized cost function \(\hat{\varvec{\beta }}_\lambda = {\text {arg min}}_{\varvec{\beta }} \{ \Vert \mathbf y - \varPsi \varvec{\beta }\Vert _{2}^{2} + \lambda \Vert \varvec{\beta }\Vert _1 \}\), where the vector \(\hat{\varvec{\beta }}_\lambda \in \mathbb {R}^M\) depends on the regularization parameter \(\lambda \). The norms \(\Vert \cdot \Vert _{2}\) and \(\Vert \cdot \Vert _{1}\) denote the \(\ell _2\)-norm and the \(\ell _1\)-norm, respectively. The Lasso assumes a one-to-one correspondence between parameters and variables, and therefore selects variables individually. However, individual selection gives unsatisfactory solutions for grouped variables (e.g. when \(\varvec{\beta }_g\) is related only to the group \(\varPhi _g\) of the dictionary \(\varPsi = (\varPhi _g: g \in G)\)) [8]. To deal with grouped variables, the Group Lasso was proposed in [10]. Group Lasso assumes that the vector \(\varvec{\beta }\) is partitioned into G groups, and its penalty is intermediate between \(\ell _1\) and \(\ell _2\) regularization [8]. Group Lasso has the attractive property of selecting variables at the group level, promoting sparsity in \(\hat{\varvec{\beta }}_\lambda \) for large values of \(\lambda \) [8]. For linear regression, the Group Lasso cost function is given by

$$\begin{aligned} \hat{\varvec{\beta }}_\lambda = {\text {arg min}}_{\varvec{\beta }} \bigg \{ \Vert \mathbf y - \varPsi \varvec{\beta }\Vert _{2}^{2} + \lambda \sum _{g=1}^{G}\Vert \varvec{\beta }_g \Vert _{\mathbf K _g} \bigg \}, \end{aligned} \tag{1}$$

with \(\Vert \varvec{\beta }_g \Vert _{\mathbf K _g} = (\varvec{\beta }_g^{\top } \mathbf K _g \varvec{\beta }_g)^{1/2}\) for a positive definite matrix \(\mathbf K _g\). The algorithm that solves Equation (1) is given in [10]. In this paper, each group g corresponds to a single dictionary.
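
Equation (1) can be minimised in several ways. The paper uses the algorithm of [10]; the sketch below swaps in a different, standard solver, proximal gradient descent (ISTA) with block soft-thresholding, and assumes \(\mathbf K _g = w_g^2 \mathbf I\) so that \(\Vert \varvec{\beta }_g \Vert _{\mathbf K _g} = w_g \Vert \varvec{\beta }_g \Vert _2\) (w_g = 1 by default; w_g = sqrt(p_g), with p_g the group size, is a common choice). The function name and the relative-change stopping rule are hypothetical.

```python
import numpy as np

def group_lasso(Psi, y, groups, lam, weights=None, tol=1e-10, max_iter=5000):
    """ISTA sketch for  min_beta ||y - Psi beta||_2^2 + lam * sum_g w_g ||beta_g||_2,
    i.e. Eq. (1) under the assumption K_g = w_g^2 I."""
    labels = np.unique(groups)
    if weights is None:
        weights = {g: 1.0 for g in labels}
    L = 2.0 * np.linalg.norm(Psi, 2) ** 2          # Lipschitz constant of the data-term gradient
    beta = np.zeros(Psi.shape[1])
    for _ in range(max_iter):
        grad = -2.0 * Psi.T @ (y - Psi @ beta)      # gradient of ||y - Psi beta||^2
        z = beta - grad / L                         # gradient step
        new = np.zeros_like(beta)
        for g in labels:                            # block soft-thresholding, one group at a time
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            thr = lam * weights[g] / L
            if norm > thr:                          # otherwise the whole group is set to zero
                new[idx] = (1.0 - thr / norm) * z[idx]
        converged = np.linalg.norm(new - beta) <= tol * max(1.0, np.linalg.norm(beta))
        beta = new
        if converged:
            break
    return beta
```

In the paper's setting, `Psi` would be the GWST matrix, `groups` the dictionary label of each column, and `lam = 1e-2` with `tol = 1e-10` (Sect. 2.4). A group that is thresholded to zero as a whole corresponds to a dictionary that Group Lasso discards for that signal, which is the group-level selection described above.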

2.4 Procedure

Dataset: we simulated the PQ disturbances mentioned in Sect. 2.1 with an electrical power distribution model based on [11], to which we added a variable load module to simulate voltage fluctuations. Figure 2 shows the Simulink diagram of the model. We generated 1200 PQ disturbances, 200 of each type. Each disturbance consists of 2000 discrete-time samples spanning the interval from 0.05 to 0.45 s. The electrical parameters of the different PQ disturbance sources (e.g. RLC values, nominal voltage, and active/reactive power) were tuned manually to simulate disturbances according to Table 1.
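
The sampling implied by the dataset description can be checked quickly: assuming both endpoints are sampled, 2000 samples over 0.05 to 0.45 s give a sampling rate of 1999 / 0.4 = 4997.5 Hz, so the 40th harmonic of 60 Hz (2400 Hz) used later stays below the Nyquist frequency. The sketch also includes a hypothetical parametric sag generator, useful only as a toy input for experimenting; it is not the Simulink model of Fig. 2, and its depth and timing are arbitrary.

```python
import numpy as np

N, t0, t1 = 2000, 0.05, 0.45
t = np.linspace(t0, t1, N)          # 2000 samples between 0.05 s and 0.45 s
fs = 1.0 / (t[1] - t[0])            # 1999 / 0.4 = 4997.5 Hz
nyquist = fs / 2                    # approx. 2498.75 Hz, above 40 * 60 Hz = 2400 Hz

def toy_sag(t, depth=0.5, t_start=0.15, t_end=0.30, f0=60.0):
    """Hypothetical sag: a 60 Hz sinusoid whose amplitude drops by `depth` p.u. between
    t_start and t_end. A toy stand-in, not the Simulink model of Fig. 2."""
    in_event = (t >= t_start) & (t <= t_end)
    return (1.0 - depth * in_event) * np.sin(2 * np.pi * f0 * t)

y = toy_sag(t)
n_types, n_per_type = 6, 200
n_total = n_types * n_per_type      # 1200 disturbances in the dataset
```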

Group Lasso: we implement the algorithm proposed in [10] with a penalty factor \(\lambda = 1\times 10^{-2}\) and a tolerance of \(1\times 10^{-10}\). These values were chosen to obtain a signal reconstruction error lower than \(1 \times 10^{-3}\). The sparsity percentage of \(\varvec{\beta }\) is computed as the number of coefficients with magnitude lower than \(1\times 10^{-4}\), divided by the length of \(\varvec{\beta }\).
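
The sparsity measure and the reconstruction check can be written in a few lines. This sketch assumes that "lower than \(1\times 10^{-4}\)" refers to the coefficient magnitude and that the reconstruction error is a relative \(\ell _2\) error; the text specifies neither, and the function names are hypothetical.

```python
import numpy as np

def sparsity_percentage(beta, threshold=1e-4):
    """Share (in %) of coefficients whose magnitude is below `threshold`."""
    beta = np.asarray(beta, dtype=float)
    return 100.0 * np.count_nonzero(np.abs(beta) < threshold) / beta.size

def relative_reconstruction_error(y, Psi, beta):
    """||y - Psi beta||_2 / ||y||_2 (assumed error metric)."""
    return np.linalg.norm(y - Psi @ beta) / np.linalg.norm(y)

def reconstruction_ok(y, Psi, beta, max_error=1e-3):
    """Acceptance check used when tuning lambda: error must stay below 1e-3."""
    return relative_reconstruction_error(y, Psi, beta) < max_error
```

For example, `sparsity_percentage([0.0, 1e-5, 0.5, -2.0])` returns 50.0, since two of the four coefficients fall below the threshold.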

Dictionaries: we use a location factor \(m_o = 0.005\) s. For the GT and the ST, we focus on the first 40 harmonics with \(\sigma _o = 0.005\) s. For the MHWT, we use a scale \(\sigma _o = 0.005\) s with \(V=10\). We fix six scale terms for the ST. To evaluate the SLM performance with OR, we combine the GT, MHWT and ST dictionaries and denote the result as the GWST dictionary. In all cases, we also add a cosine/sine (Harmonics) dictionary with the first 40 harmonics to improve the PQ synthesis.
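
The Harmonics dictionary and the location grid are straightforward to build; the sketch below also does the size bookkeeping for a GT block. It assumes one cosine and one sine column per harmonic, atom locations every \(m_o\) seconds including both endpoints, and (hypothetically) one GT atom per (location, harmonic) pair; how the MHWT parameter V and the six ST scale terms enter is not specified, so they are left out.

```python
import numpy as np

t = np.linspace(0.05, 0.45, 2000)
f0, n_harm, m_o = 60.0, 40, 0.005

# Cosine/sine ("Harmonics") dictionary: one cosine and one sine column per harmonic.
h = np.arange(1, n_harm + 1)
arg = 2 * np.pi * f0 * np.outer(t, h)
harmonics = np.hstack([np.cos(arg), np.sin(arg)])        # shape (2000, 80)

# Atom locations every m_o seconds across the record (both endpoints included).
n_centres = int(round((t[-1] - t[0]) / m_o)) + 1          # 81 locations
centres = np.linspace(t[0], t[-1], n_centres)

# Hypothetical size of a GT block with one atom per (location, harmonic) pair.
n_gt_atoms = n_centres * n_harm                           # 3240 atoms
```

Even under these modest assumptions, a single GT block already has more atoms than the 2000 samples of a signal, which is why the combined GWST dictionary is strongly overcomplete.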

Feature Extraction: from each coefficient vector in \(\{\varvec{\beta }_p\}_{p=1}^P\) we compute the following features to build the feature vectors \(\{\mathbf{x}_p\}_{p=1}^P\), one per signal p: mean of the absolute values (\(F_1\)), standard deviation (\(F_2\)), kurtosis (\(F_3\)), Shannon’s energy (\(F_4\)), and RMS value (\(F_5\)). We also include the mean of the absolute values of the derivative \(\varvec{\beta }'\) (\(F_6\)). Before the PQ disturbance classification step, we normalize each feature by subtracting its mean value and dividing the result by its standard deviation.
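
The six features can be sketched in NumPy as follows. Several definitions are assumptions because the text does not fix them: Shannon energy is taken as \(-\sum \beta _i^2 \log \beta _i^2\) (with \(0 \log 0 = 0\)), kurtosis is the non-excess (Pearson) form, and \(\varvec{\beta }'\) is approximated by the first difference. Fitting the normalization statistics on the training set only is also an assumption.

```python
import numpy as np

def shannon_energy(x, eps=1e-12):
    """Assumed definition: -sum(x^2 log x^2), with 0 log 0 treated as 0."""
    e = x ** 2
    nz = e > eps
    return -np.sum(e[nz] * np.log(e[nz]))

def kurtosis(x):
    """Pearson (non-excess) kurtosis: E[(x - mu)^4] / sigma^4."""
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

def features(beta):
    """Feature vector [F1, ..., F6] for one coefficient vector beta."""
    return np.array([
        np.mean(np.abs(beta)),             # F1: mean absolute value
        np.std(beta),                      # F2: standard deviation
        kurtosis(beta),                    # F3: kurtosis
        shannon_energy(beta),              # F4: Shannon energy
        np.sqrt(np.mean(beta ** 2)),       # F5: RMS value
        np.mean(np.abs(np.diff(beta))),    # F6: mean |beta'|, first difference as derivative
    ])

def fit_standardiser(X_train):
    """Return a function applying the training mean/std z-score to any feature matrix."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    return lambda X: (X - mu) / sd
```

For instance, `features(np.array([1., -1., 1., -1.]))` gives [1, 1, 1, 0, 1, 2] (up to floating-point sign on the zero).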

Classifiers: we use several state-of-the-art classifiers, whose theory can be found in depth in pattern recognition textbooks [3, 8]: K-nearest neighbours (K-NN), Bayesian classifiers based on linear (LDC) and quadratic (QDC) discriminant functions, support vector machines (SVM), and neural networks (ANN). For the ANN-based classifier, we use the Neural Network Toolbox provided with Matlab R2013a. For the other classifiers, we use the Pattern Recognition toolbox (PRTools). We experimented with different numbers of neighbours for the K-NN classifier and chose 1-NN and 3-NN because they performed best. For the SVM, we use an RBF kernel, \(k(\mathbf{x}, \mathbf{x}')= \exp (-\Vert \mathbf{x} -\mathbf{x}'\Vert ^2/\sigma ^2)\). The bandwidth parameter \(\sigma \) and the regularization parameter of the SVM are tuned by cross-validation. We design an ANN with three hidden layers of 20, 15 and 10 neurons, respectively [12], using sigmoid transfer functions.
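
Two of the ingredients above are easy to make concrete: the K-NN decision rule and the RBF kernel matrix used by the SVM. The sketch is an illustrative NumPy stand-in, not the PRTools or Matlab code used in the paper; it assumes integer class labels 0..5, Euclidean distance, and majority voting with ties broken towards the smallest label.

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """Gram matrix with k(x, x') = exp(-||x - x'||^2 / sigma^2), the kernel given in the text."""
    sq = (np.sum(X ** 2, axis=1)[:, None] + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma ** 2)   # clip tiny negative round-off

def knn_predict(X_train, y_train, X_test, k=1):
    """K-NN with Euclidean distance and majority vote (ties go to the smallest label)."""
    d2 = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])
```

With k = 1 or k = 3 this reproduces the two K-NN variants selected above; the SVM itself (and the cross-validated choice of \(\sigma \)) would still come from a dedicated solver.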

We test all the classifiers twenty times with different training sets. For each type of PQ disturbance, we randomly select \(70\%\) of the samples for training and use the remaining \(30\%\) for testing. The performance of each test experiment is the number of correctly classified samples divided by the total number of test samples. Finally, we compute the mean \(\mu \) and the standard deviation \(\sigma \) of the performance over all the experiments. Figure 3 summarizes the procedure followed in this paper.
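
The evaluation protocol (twenty repetitions of a per-class 70/30 split, accuracy as correct over total test samples, then mean and standard deviation) can be sketched as follows. `fit_predict` is a hypothetical callback wrapping any of the classifiers, and the random seed is an arbitrary choice; with 200 signals per class, each split has 140 training and 60 test signals per class (840 and 360 in total).

```python
import numpy as np

def stratified_split(labels, train_frac, rng):
    """Random per-class split: train_frac of each class for training, the rest for testing."""
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_frac * idx.size))
        train_idx.append(idx[:n_train])
        test_idx.append(idx[n_train:])
    return np.concatenate(train_idx), np.concatenate(test_idx)

def repeated_holdout(X, labels, fit_predict, n_runs=20, train_frac=0.7, seed=0):
    """Mean and standard deviation of test accuracy over n_runs random splits."""
    rng = np.random.default_rng(seed)
    acc = []
    for _ in range(n_runs):
        tr, te = stratified_split(labels, train_frac, rng)
        pred = fit_predict(X[tr], labels[tr], X[te])
        acc.append(np.mean(pred == labels[te]))   # correct / total test samples
    return np.mean(acc), np.std(acc)
```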

Fig. 3. Block diagram of the PQ disturbance classification procedure.

Fig. 4. (a) Swell example. Synthesis results using the GWST dictionary (b) without SLM and (c) using Group Lasso.

3 Results and Discussions

To highlight the advantages of SLMs for PQ representation, Fig. 4 shows the synthesis of (a) a swell example for two cases: (b) without sparsity and (c) using Group Lasso. We use the GWST dictionary, obtained by combining the GT, MHWT, and ST. Both methods synthesize the PQ disturbance in Fig. 4(a) with a low reconstruction error.

To quantify the sparsity produced by Group Lasso over the GWST representation, we perform the synthesis step over the entire PQ disturbance dataset. Without SLMs, the sparsity percentages are lower than \(1\times 10^{-2}\), so this approach tends to use all the coefficients of \(\varvec{\beta }\), which makes subsequent PQ studies (e.g. PQ disturbance classification) more difficult. In contrast, Group Lasso automatically selects the representative coefficients required to synthesize the disturbances. We note that the Harmonics, MHWT, and ST dictionaries contain more relevant components for the disturbance representation than the GT. Table 3 shows the sparsity percentages produced by Group Lasso, computed as described in Sect. 2.4 for each type of PQ disturbance (rows) and each dictionary (columns).
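
The layout of Table 3 (disturbance types as rows, dictionaries as columns) corresponds to a simple per-group bookkeeping. This sketch assumes that group label g indexes the g-th entry of `names` and that per-signal sparsity values are averaged within each disturbance type; the text does not state how the values are aggregated, and the function names are hypothetical.

```python
import numpy as np

def sparsity_by_group(beta, groups, names, threshold=1e-4):
    """Sparsity percentage of each dictionary block beta_g; names[g] labels group g."""
    return {name: 100.0 * np.mean(np.abs(beta[groups == g]) < threshold)
            for g, name in enumerate(names)}

def sparsity_table(betas, types, groups, names, threshold=1e-4):
    """Rows: disturbance types; columns: dictionaries; entries averaged over signals (assumed)."""
    table = {}
    for ptype in sorted(set(types)):
        per_signal = [sparsity_by_group(b, groups, names, threshold)
                      for b, s in zip(betas, types) if s == ptype]
        table[ptype] = {n: float(np.mean([d[n] for d in per_signal])) for n in names}
    return table
```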

To evaluate the performance of SLMs for PQ classification, we apply the procedure described in Fig. 3 to each PQ disturbance with the GT, MHWT, ST, and GWST dictionaries. We repeat the procedure with Group Lasso applied to obtain the feature set \(\{\mathbf{x}_p\}_{p=1}^P\). We train the different classifiers and evaluate their performance on the test set as described in Sect. 2.4. Table 4 shows the PQ classification performance over the test set. When Group Lasso is not applied (first six rows), GT and ST give better results than MHWT and GWST, regardless of the classifier employed. When Group Lasso is applied (last six rows), the accuracy obtained for every representation and every classifier is higher than that obtained with the same representation and the same classifier without Group Lasso. For example, with the ST representation and the ANN classifier, Group Lasso increases the performance by almost \(4\%\). The improvement is even larger with GWST and the LDC classifier, where it is close to \(28\%\). Because some accuracy results are similar, we apply the Wilcoxon rank-sum test to the results of each classifier [8] and conclude that the differences are statistically significant. We note that GWST without Group Lasso gives lower results than the other schemes for all classifiers. This is because the GWST representation tends to introduce redundant information, which leads to misclassification in every classifier. This redundancy is removed when Group Lasso is applied together with GWST, which then outperforms the other schemes for all the statistical classifiers.
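
For readers who want to reproduce the significance check, the Wilcoxon rank-sum test compares two sets of accuracies (e.g. the twenty runs of one classifier with and without Group Lasso) through the ranks of the pooled values. The sketch below uses the normal approximation with average ranks for ties and no tie or continuity correction, which is the same approach as `scipy.stats.ranksums`; the text does not state which variant or implementation the authors used.

```python
import math
import numpy as np

def average_ranks(x):
    """Ranks 1..n of x, with tied values receiving their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(x.size)
    i = 0
    while i < x.size:
        j = i
        while j + 1 < x.size and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0   # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation). Returns (z, p)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    w = average_ranks(np.concatenate([a, b]))[:n1].sum()   # rank sum of sample a
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))                 # = 2 * (1 - Phi(|z|))
    return z, p
```

A call such as `rank_sum_test(acc_with, acc_without)`, with the per-run accuracies of one classifier, yields the z statistic and two-sided p-value.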

Table 3. Sparsity percentages using GWST with Group Lasso.
Table 4. Performance of the classifiers without SLM and using Group Lasso for GT, MHWT, ST and combining all the dictionaries (GWST).

4 Conclusions

We introduced the concepts of overcomplete representations (OR) and sparse linear models (SLM) for PQ disturbance classification. We combined different time-frequency dictionaries that are well known in the PQ literature, and we introduced Group Lasso by treating each dictionary as a group in the SLM.

As shown experimentally, Group Lasso improves the performance of PQ disturbance classification for both linear and non-linear classifiers compared to methods without SLM. Because SLMs can exploit OR while ensuring high PQ disturbance classification performance, this framework removes the uncertainty about which dictionary should be used for each type of distortion.