
CN109447153A - Divergence-excitation self-encoding encoder and its classification method for lack of balance data classification - Google Patents

Divergence-excitation self-encoding encoder and its classification method for lack of balance data classification Download PDF

Info

Publication number
CN109447153A
CN109447153A
Authority
CN
China
Prior art keywords
data
layer
divergence
loss
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811269919.5A
Other languages
Chinese (zh)
Inventor
雒瑞森
房鑫彤
王琛
孙超
徐耀
余勤
龚薇
涂海燕
朱颜
蒋荣华
王建
任小梅
曾晓东
郑秀娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201811269919.5A priority Critical patent/CN109447153A/en
Publication of CN109447153A publication Critical patent/CN109447153A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a divergence-excited autoencoder for imbalanced data classification and a classification method based on it. The autoencoder comprises a data input layer, an encoding-layer unit, a bottleneck layer, a decoding-layer unit and a data output layer connected in sequence; the data input layer is also connected to the data output layer. The loss function in the bottleneck layer computes a divergence loss over the data fed into the bottleneck layer and explicitly encourages the separation of samples with different labels; the loss function in the data output layer computes the reconstruction loss of the data produced by the decoding-layer unit. The proposed classification method for imbalanced data outperforms convolutional-neural-network-based methods in both F1 score and convergence speed; the autoencoder has fewer parameters that require manual tuning than a loss-sensitive convolutional neural network, whose hyperparameters must be optimized; and the proposed autoencoder model performs well when the number of minority-class samples is very small.

Description

Divergence-excited autoencoder for imbalanced data classification and classification method thereof

Technical Field

The invention belongs to the technical field of imbalanced data classification, and in particular relates to a divergence-excited autoencoder for imbalanced data classification and a classification method using it.

Background

Imbalanced data are common in practical classification problems: the sample sizes of the different classes differ greatly. Commonly used classification algorithms are generally built on the assumption that the classes are roughly evenly represented, yet in practice extremely skewed data are often unavoidable, that is, the samples of one class make up most of the data set. When such data are encountered, the performance of these algorithms usually degrades significantly. Mitigating the problems caused by imbalanced data has therefore been a challenge for many years, and most previous studies have proposed solutions from the perspectives of data resampling and algorithmic improvement.

Traditionally, two strategies are mainly adopted for data sets with imbalanced samples: resampling the data before classification, or classifying with a loss function that is sensitive to the class imbalance.

Data resampling methods oversample the minority class to increase its sample size, or undersample the majority class to reduce its sample size. Although such techniques can produce a balanced data set, they cause information loss (undersampling) or repeated representation (oversampling). Imbalance-sensitive loss functions place extra weight on the usually neglected minority-class samples so that the classifier pays more attention to them, but these methods require manual tuning of hyperparameters and the results are often unsatisfactory. Other algorithms concentrate on learning the features of the minority class alone; however, this discards the rich information carried by the majority class and becomes infeasible when the absolute number of minority-class samples is very small (for example, 20 samples).

Summary of the Invention

In view of the above deficiencies of the prior art, the divergence-excited autoencoder for imbalanced data classification and the classification method provided by the present invention solve the problems of slow convergence and insufficiently accurate results when classifying imbalanced data.

To achieve the above purpose, the present invention adopts the following technical solution: a divergence-excited autoencoder for imbalanced data classification, comprising a data input layer, an encoding-layer unit, a bottleneck layer, a decoding-layer unit and a data output layer connected in sequence;

the data input layer is connected to the data output layer;

both the bottleneck layer and the data output layer have loss functions;

the loss function in the bottleneck layer computes the divergence loss of the data fed into the bottleneck layer, thereby explicitly encouraging the separation of samples with different labels;

the loss function in the data output layer computes the reconstruction loss of the data produced by the decoding-layer unit.

Further, the data input layer is used to input the data to be classified;

the encoding-layer unit comprises several sequentially connected convolutional layers and encodes the data to be classified provided by the data input layer;

the bottleneck layer comprises a feature layer and a loss-function layer; the feature layer takes the output of the encoding-layer unit and splits it into minority-class and majority-class samples according to their label values, and the loss-function layer computes the divergence loss and its gradients for the minority and majority samples;

the decoding-layer unit comprises several sequentially connected deconvolution layers and decodes the data output by the bottleneck layer to reconstruct the data to be classified;

the data output layer computes the reconstruction loss and its gradient from the reconstructed data and the original data to be classified, and outputs the classified data.
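For illustration only, the following is a minimal PyTorch sketch of this encoder / bottleneck / decoder structure. It loosely follows the configuration used in the experiments below (3 convolutional layers, 3 deconvolution layers, 100-dimensional bottleneck) for 28x28 inputs; the channel widths, kernel sizes and strides are assumptions, not taken from the patent.

    import torch
    import torch.nn as nn

    class DivergenceExcitedAE(nn.Module):
        """Encoder -> bottleneck -> decoder; the bottleneck features feed the
        divergence loss, the decoder output feeds the reconstruction loss."""
        def __init__(self, bottleneck_dim=100):
            super().__init__()
            # Encoding-layer unit: sequentially connected convolutional layers.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28x28 -> 14x14
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14x14 -> 7x7
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 7x7  -> 4x4
            )
            # Bottleneck (feature) layer.
            self.to_z = nn.Linear(64 * 4 * 4, bottleneck_dim)
            self.from_z = nn.Linear(bottleneck_dim, 64 * 4 * 4)
            # Decoding-layer unit: sequentially connected deconvolution layers.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1), nn.ReLU(),                    # 4x4  -> 7x7
                nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),  # 7x7  -> 14x14
                nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),              # 14x14 -> 28x28
            )

        def forward(self, x):
            h = self.encoder(x)
            z = self.to_z(h.flatten(1))            # bottleneck features used by the divergence loss
            h = self.from_z(z).view(-1, 64, 4, 4)
            x_hat = self.decoder(h)                # logits; sigmoid is applied inside the reconstruction loss
            return z, x_hat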

Further, in the data output layer:

when the output of the decoding-layer unit is normalized data, the decoding-layer unit is connected to the data output layer through a sigmoid function, and the reconstruction loss computed by the loss function of the data output layer is the cross-entropy loss;

when the output of the decoding-layer unit is not normalized data, the decoding-layer unit is connected directly to the data output layer, and the reconstruction loss computed by the loss function of the data output layer is the least-squares loss;

wherein, when the output of the decoding-layer unit is normalized data, the sigmoid function σ(·) is

σ(X̂_{i,j}) = 1 / (1 + exp(-X̂_{i,j}))

where X̂_{i,j} is the pixel value of the reconstructed data to be classified at coordinates (i, j);

the cross-entropy loss L_r(·) of the data output layer is

L_r(X, X̂) = -Σ_{i,j} [ X_{i,j} ln σ(X̂_{i,j}) + (1 - X_{i,j}) ln(1 - σ(X̂_{i,j})) ]

where X_{i,j} is the pixel value of the input data to be classified at coordinates (i, j);

X is the input data to be classified;

X̂ is the reconstructed data to be classified;

the gradient corresponding to the cross-entropy loss is:

where * is the pairwise (element-wise) multiplication operator;

when the output of the decoding-layer unit is not normalized data, the least-squares loss L_r(·) of the data output layer is:

where the superscript T is the transpose operator;

the gradient corresponding to the least-squares loss is:

where m is the number of samples fed into the data output layer.
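For illustration, a minimal sketch of the two reconstruction-loss variants described above, assuming the decoder output X̂ has the same shape as the input X; the 1/m averaging and the factor 1/2 in the least-squares form are assumptions, since the exact normalisation constants are not reproduced in the source.

    import torch
    import torch.nn.functional as F

    def reconstruction_loss(x, x_hat, normalized_input: bool):
        """Cross-entropy loss when the data are normalized to [0, 1],
        least-squares loss otherwise (averaged over the m input samples)."""
        m = x.shape[0]
        if normalized_input:
            # Decoder output passed through a sigmoid and compared pixel-wise with x.
            return F.binary_cross_entropy_with_logits(x_hat, x, reduction="sum") / m
        # Decoder connected directly to the output layer.
        return 0.5 * ((x_hat - x) ** 2).sum() / m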

Further, the divergence loss L_d(·) in the bottleneck layer is:

where

Z_N* is the matrix of bottleneck-layer feature values of the sampled majority-class samples;

Z_K is the matrix of bottleneck-layer feature values of the minority-class samples;

σ(·) is the sigmoid function;

v is a function that replicates the data over m rows so that it has the same size as the matrix it is subtracted from;

m_1 is the number of majority-class samples used in the loss (equal, after sampling, to the number of minority-class samples);

z_i^{N*} is the bottleneck-layer data of the i-th majority-class data sample;

z_i^{K} is the bottleneck-layer data of the i-th minority-class data sample;

z_{i,j}^{N*} is the j-th dimension of the bottleneck-layer data of the i-th majority-class data sample;

z_{i,j}^{K} is the j-th dimension of the bottleneck-layer data of the i-th minority-class data sample;

σ(z_{i,j}^{K}) is the output of the sigmoid function for the j-th dimension of the i-th minority-class data sample;

the gradient corresponding to the divergence loss comprises the gradient with respect to Z_N* and the gradient with respect to Z_K;

wherein the gradient with respect to Z_N* is:

and the gradient with respect to Z_K is:

A classification method using the divergence-excited autoencoder for imbalanced data classification comprises the following steps:

S1, construct the divergence-excited autoencoder;

S2, determine the training set used to train the divergence-excited autoencoder and feed it to the data input layer;

S3, train the divergence-excited autoencoder by taking, in turn, minimization of the reconstruction loss and optimization of the divergence loss as the training objectives;

S4, feed the data in the training set and the test data into the trained divergence-excited autoencoder and classify the test data with a cosine-distance classifier.

Further, the training set in step S2 includes the training data X, the labels y_n of the majority-class data in X and the labels y_k of the minority-class data in X.

Further, step S3 is specifically:

S31, feed the training data X, the labels y_n of the majority-class data in X and the labels y_k of the minority-class data in X into the divergence-excited autoencoder in sequence;

S32, determine the label index of the minority-class data y_k and the label index of the majority-class data y_n respectively;

where the label index of the minority-class data y_k is

and the label index of the majority-class data y_n is

S33, according to the label indices of the minority-class data y_k and the majority-class data y_n, perform the gradient computations corresponding to the reconstruction loss on the divergence-excited autoencoder, back-propagate the gradient obtained in each computation to the data output layer, and form a training subset with balanced labels;

S34, according to the training subset with balanced labels, compute the gradients corresponding to the divergence loss and back-propagate them to the bottleneck layer;

S35, evaluate the reconstruction loss and the divergence loss of the current divergence-excited autoencoder; when neither loss shows a significant gradient any more, the training of the divergence-excited autoencoder is complete.

Further, step S4 is specifically:

S41, feed the test data X* into the trained divergence-excited autoencoder;

S42, obtain, from the trained divergence-excited autoencoder, the feature representation z* of every sample x* in the test data X*;

S43, feed the training data X into the trained divergence-excited autoencoder to obtain the minority-class data X_K and the majority-class data X_N with their class labels;

S44, combining the feature representation z* of every sample x* in the test data X*, the samples in the minority-class data X_K and the samples in the majority-class data X_N, determine the class of every sample x* in the test data X* in turn with the cosine-distance classifier, completing the classification of the test data X*.

Further, the classification of one sample x* in step S44 is specifically:

S441, compute the cosine distance D_N between z* and every sample z_i in the majority-class data X_N;

S442, compute the cosine distance D_K between z* and every sample z_j in the minority-class data X_K;

S443, determine the medians of D_N and D_K and compare them;

S444, assign the sample x* corresponding to z* to the class of the labelled data that give the smaller median.

Further, it is characterized in that:

the cosine distance D_N in step S441 is:

the cosine distance D_K in step S442 is:

where || · || is the norm operator.

The beneficial effects of the present invention are as follows. The divergence-excited autoencoder for imbalanced data classification and its classification method have these advantages: first, the proposed autoencoder-based method outperforms the convolutional-neural-network-based method in both F1 score and convergence speed; second, the autoencoder has fewer parameters that require manual tuning, unlike a loss-sensitive convolutional neural network whose hyperparameters must be optimized; finally, the proposed autoencoder model performs well when the number of minority-class samples is very small.

Brief Description of the Drawings

Fig. 1 is a structural diagram of the divergence-excited autoencoder for imbalanced data classification in an embodiment of the present invention.

Fig. 2 is a flowchart of the classification method of the divergence-excited autoencoder for imbalanced data classification in an embodiment of the present invention.

Fig. 3 is a schematic diagram of the inverse divergence loss function in an embodiment of the present invention.

Fig. 4 is a flowchart of the training method of the divergence-excited autoencoder in an embodiment of the present invention.

Fig. 5 is a schematic diagram of the gradient curves of the two variables in an embodiment of the present invention.

Fig. 6 is a flowchart of the data classification method of the divergence-excited autoencoder in an embodiment of the present invention.

Fig. 7 compares the classification results of the three methods when the minority class has only 30 samples, in an embodiment of the present invention.

Fig. 8 compares the classification results of the three methods when the minority class has only 10 samples, in an embodiment of the present invention.

Fig. 9 compares the classification results of the three methods as the number of minority-class samples varies, in an embodiment of the present invention.

Fig. 10 compares the performance of the method of the present invention on 5 keyword samples against 5000 majority background samples, in an embodiment of the present invention.

Fig. 11 compares the performance of the method of the present invention on 10 keyword samples against 5000 majority background samples, in an embodiment of the present invention.

Detailed Description

Specific embodiments of the present invention are described below so that those skilled in the art can understand the invention, but it should be clear that the invention is not limited to the scope of the specific embodiments. For those of ordinary skill in the art, as long as the various changes fall within the spirit and scope of the invention as defined and determined by the appended claims, these changes are obvious, and all inventions and creations that make use of the inventive concept are within the scope of protection.

As shown in Fig. 1, a divergence-excited autoencoder for imbalanced data classification comprises a data input layer, an encoding-layer unit, a bottleneck layer, a decoding-layer unit and a data output layer connected in sequence;

the data input layer is connected to the data output layer;

both the bottleneck layer and the data output layer have loss functions;

the loss function in the bottleneck layer computes the divergence loss of the data fed into the bottleneck layer and explicitly encourages the separation of samples with different labels;

the loss function in the data output layer computes the reconstruction loss of the data produced by the decoding-layer unit.

The data input layer is used to input the data to be classified;

the encoding-layer unit comprises several sequentially connected convolutional layers and encodes the data to be classified provided by the data input layer;

the bottleneck layer comprises a feature layer and a loss-function layer; the feature layer takes the output of the encoding-layer unit and splits it into minority-class and majority-class samples according to their label values, and the loss-function layer computes the divergence loss and its gradients for the minority and majority samples;

the decoding-layer unit comprises several sequentially connected deconvolution layers and decodes the data output by the bottleneck layer to reconstruct the data to be classified;

the data output layer computes the reconstruction loss and its gradient from the reconstructed data and the original data to be classified, and outputs the classified data.

The loss functions in the proposed autoencoder comprise the reconstruction loss and the divergence loss, where the reconstruction loss can be either a cross-entropy loss or a least-squares loss.

When the output of the decoding-layer unit is normalized data, the decoding-layer unit is connected to the data output layer through a sigmoid function, and the reconstruction loss computed by the loss function of the data output layer is the cross-entropy loss;

when the output of the decoding-layer unit is not normalized data, the decoding-layer unit is connected directly to the data output layer, and the reconstruction loss computed by the loss function of the data output layer is the least-squares loss.

When the output of the decoding-layer unit is normalized data, the sigmoid function σ(·) is

σ(X̂_{i,j}) = 1 / (1 + exp(-X̂_{i,j}))

where X̂_{i,j} is the pixel value of the reconstructed data to be classified at coordinates (i, j).

The cross-entropy loss L_r(·) of the data output layer is

L_r(X, X̂) = -Σ_{i,j} [ X_{i,j} ln σ(X̂_{i,j}) + (1 - X_{i,j}) ln(1 - σ(X̂_{i,j})) ]

where X_{i,j} is the pixel value of the input data to be classified at coordinates (i, j);

X is the input data to be classified;

X̂ is the reconstructed data to be classified.

The output gradient corresponding to the cross-entropy loss is:

Written in matrix form, the above gradient expression becomes:

where * is the pairwise (element-wise) multiplication operator.

When the output of the decoding-layer unit is not normalized data, the least-squares loss L_r(·) of the data output layer is:

where the superscript T is the transpose operator.

The gradient corresponding to the least-squares loss is:

where m is the number of samples fed into the data output layer.

For both the cross-entropy loss and the least-squares loss above, the l2-norm penalty w^T w can be added.

In addition, the divergence loss resides in the bottleneck layer; its purpose is to explicitly encourage the separation of samples with different labels.

For the bottleneck layer, suppose the minority-class data Z_K contain m_1 samples and the majority-class data Z_N contain m_2 samples, with m_2 >> m_1. We can sample m_1 samples from Z_N to obtain Z_N* so that the two sample counts are balanced, and then apply the sigmoid function to Z_N* and Z_K.

The following loss can then be applied:

It is worth noting that the above divergence loss is not exactly the KL divergence, because the KL divergence is defined on distributions rather than on data. Nevertheless, the method can successfully encourage the difference between the two classes. To consider the gradient of this loss, the gradients with respect to Z_K and Z_N* can be analysed separately.

The gradient with respect to Z_K can be computed as:

The gradient with respect to Z_N* can be computed as:

The above loss may encourage the autoencoder to fit noise; to reduce the negative impact of such cases, the intra-class variance is further added as a penalty:

Finally, the divergence loss L_d(·) is:

where

Z_N* is the matrix of bottleneck-layer feature values of the sampled majority-class samples;

Z_K is the matrix of bottleneck-layer feature values of the minority-class samples;

σ(·) is the sigmoid function;

v is a function that replicates the data over m rows so that it has the same size as the matrix it is subtracted from;

m_1 is the number of majority-class samples used in the loss (equal, after sampling, to the number of minority-class samples);

z_i^{N*} is the bottleneck-layer data of the i-th majority-class data sample;

z_i^{K} is the bottleneck-layer data of the i-th minority-class data sample;

z_{i,j}^{N*} is the j-th dimension of the bottleneck-layer data of the i-th majority-class data sample;

z_{i,j}^{K} is the j-th dimension of the bottleneck-layer data of the i-th minority-class data sample;

σ(z_{i,j}^{K}) is the output of the sigmoid function for the j-th dimension of the i-th minority-class data sample.

After the intra-class variance is taken into account, the gradients corresponding to the divergence loss, namely the gradient with respect to Z_N* and the gradient with respect to Z_K, can be rewritten in matrix form;

wherein the gradient with respect to Z_N* is:

and the gradient with respect to Z_K is:
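The exact expression for L_d is not reproduced in this text, so the following sketch is only one plausible reading of the description above: a KL-style term between the sigmoid-activated bottleneck features of the minority class and of an equally sized sample of the majority class, negated so that minimizing the loss pushes the two groups apart, plus an intra-class variance penalty weighted by η (η = 12.0 in the experiments below). All function and variable names are illustrative assumptions.

    import torch

    def divergence_loss(z_k, z_n, eta=12.0):
        """Hypothetical divergence loss on bottleneck features.

        z_k: bottleneck features of the minority class, shape (m1, d)
        z_n: bottleneck features of the majority class, shape (m2, d), m2 >> m1
        """
        m1 = z_k.shape[0]
        # Sample m1 rows from the majority class so both groups are balanced (Z_N*).
        idx = torch.randint(0, z_n.shape[0], (m1,))
        z_n_star = z_n[idx]
        sk, sn = torch.sigmoid(z_k), torch.sigmoid(z_n_star)
        eps = 1e-8
        # KL-style term between the sigmoid activations of the two groups;
        # the sign is chosen so that minimizing the loss increases the separation.
        kl = (sk * torch.log((sk + eps) / (sn + eps))).sum(dim=1).mean()
        divergence = -kl
        # Intra-class variance penalty, weight eta, discourages fitting noise.
        penalty = eta * (sk.var(dim=0).mean() + sn.var(dim=0).mean())
        return divergence + penalty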

In one embodiment of the present invention, a classification method using the divergence-excited autoencoder for imbalanced data classification is also provided; as shown in Fig. 2, it comprises the following steps:

S1, construct the divergence-excited autoencoder;

S2, determine the training set used to train the divergence-excited autoencoder and feed it to the data input layer;

the training set in S2 includes the training data X, the labels y_n of the majority-class data in X and the labels y_k of the minority-class data in X;

S3, train the divergence-excited autoencoder by taking, in turn, minimization of the reconstruction loss and optimization of the divergence loss as the training objectives;

S4, feed the data in the training set and the test data into the trained divergence-excited autoencoder and classify the test data with the cosine-distance classifier.

In step S3, equations (4), (6), (11) and (12) can be used to compute the gradients and train the autoencoder. Regarding the detailed training procedure, however, two questions remain to be settled: the training paradigm and the convergence of the network.

First, training with the two losses simultaneously: since the loss functions belong to different layers, it is difficult to merge their gradients during training. Therefore, instead of explicitly merging the gradients and feeding them into the network, the autoencoder of the present invention adopts a two-stage training paradigm: in each epoch, the reconstruction loss (equation 4 or 6) is minimized first, and the divergence loss (equations 11 and 12) is optimized afterwards. When training with the divergence loss, samples are drawn from the majority class with replacement to form a training subset with balanced labels.

The training process is similar to that of GANs: in GANs the discriminator and the generator are trained alternately, whereas the training method of the present invention improves the data reconstruction and the class-discrimination performance of the autoencoder in turn; in fact the two objectives are not necessarily "adversarial". In practice it is normal if the two losses show some "conflict" during the first few iterations, because they focus on different problems. Separating the two training stages brings another advantage: optimizing the reconstruction loss "compresses" the values z_{i,j} into a relatively restricted range, which mitigates the instability during the optimization of the divergence loss.

Second, the convergence of the network: the goal of the method is to minimize the inverse divergence loss, so the objective function (equation 10) itself is not necessarily convex. In fact, considering the one-dimensional case of σ(z_K) and σ(z_N*), the Hessian matrix is indefinite and the loss surface looks like Fig. 3; the optimization of the divergence loss is therefore analysed here.

To sum up, as shown in Fig. 4, step S3 is specifically:

S31, feed the training data X, the labels y_n of the majority-class data in X and the labels y_k of the minority-class data in X into the divergence-excited autoencoder in sequence;

S32, determine the label index of the minority-class data y_k and the label index of the majority-class data y_n respectively;

where the label index of the minority-class data y_k is

and the label index of the majority-class data y_n is

S33, according to the label indices of the minority-class data y_k and the majority-class data y_n, perform the gradient computations corresponding to the reconstruction loss on the divergence-excited autoencoder, back-propagate the gradient obtained in each computation to the data output layer, and form a training subset with balanced labels;

S34, according to the training subset with balanced labels, compute the gradients corresponding to the divergence loss and back-propagate them to the bottleneck layer;

S35, evaluate the reconstruction loss and the divergence loss of the current divergence-excited autoencoder; when neither loss shows a significant gradient any more, the training of the divergence-excited autoencoder is complete.

When computing the gradient of each variable with equation (11), since there is no interaction between dimensions in the divergence loss, for convenience only a single one-dimensional datum (scalar input) is considered. Expanding the gradient of the sigmoid form of z_K gives:

For the gradient of z_N*, one obtains:

Similarly, the proportionality step is obtained by treating z_K as a given value. From these equations we find that the gradient of z_K (equation 13) is a concave/convex function, and it is clear that when z_K reaches a large positive/negative value its gradient saturates towards 0, which indicates convergence of the loss. Likewise, the gradient of z_N* (equation 14) saturates towards 1 for large negative values and towards 0 for large positive values, and is monotonic in between. This shows that, for a given z_K, an approximately stationary optimization target for z_N* can also be found. The training method is therefore highly effective, and in practice a flat-gradient point of the divergence loss can usually be reached; the two gradient curves are shown in Fig. 5.

The analysis of the gradient curves also provides a justification for two-phase training: if the divergence gradient were applied to the network directly, gradient instability could occur because the values of Z span a wide range; with two-phase training, the Z values lie in a relatively dense range after the reconstruction loss has been optimized, so the divergence optimization is more stable. Based on the above analysis, it is reasonable to terminate the training simply when the gradients of the reconstruction loss and the divergence loss have vanished.
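The following is a minimal sketch of this two-stage training paradigm, assuming the model, reconstruction_loss and divergence_loss sketches given earlier. The optimizer choice, the use of full-batch updates and the stopping threshold are assumptions for illustration, not taken from the patent.

    import torch

    def train(model, x, y, epochs=100, eta=12.0, lr=1e-3, tol=1e-4):
        """Two-stage training: per epoch, first minimize the reconstruction loss,
        then optimize the divergence loss on a label-balanced subset."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        idx_min = (y == 1).nonzero(as_tuple=True)[0]   # minority-class indices
        idx_maj = (y == 0).nonzero(as_tuple=True)[0]   # majority-class indices
        for epoch in range(epochs):
            # Stage 1: reconstruction loss on all data.
            _, x_hat = model(x)
            rec = reconstruction_loss(x, x_hat, normalized_input=True)
            opt.zero_grad(); rec.backward(); opt.step()

            # Stage 2: divergence loss; the majority class is resampled inside
            # divergence_loss so that the labels are balanced.
            z_k, _ = model(x[idx_min])
            z_n, _ = model(x[idx_maj])
            div = divergence_loss(z_k, z_n, eta=eta)
            opt.zero_grad(); div.backward(); opt.step()

            # Stop when the remaining gradients are no longer significant.
            grad_norm = sum(p.grad.abs().sum() for p in model.parameters() if p.grad is not None)
            if grad_norm < tol:
                break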

As shown in Fig. 6, step S4 is specifically:

S41, feed the test data X* into the trained divergence-excited autoencoder;

S42, obtain, from the trained divergence-excited autoencoder, the feature representation z* of every sample x* in the test data X*;

S43, feed the training data X into the trained divergence-excited autoencoder to obtain the minority-class data X_K and the majority-class data X_N with their class labels;

S44, combining the feature representation z* of every sample x* in the test data X*, the samples in the minority-class data X_K and the samples in the majority-class data X_N, determine the class of every sample x* in the test data X* in turn with the cosine-distance classifier, completing the classification of the test data X*.

The classification of one sample x* in step S44 is specifically:

S441, compute the cosine distance D_N between z* and every sample z_i in the majority-class data X_N;

the cosine distance D_N in step S441 is:

S442, compute the cosine distance D_K between z* and every sample z_j in the minority-class data X_K;

the cosine distance D_K in step S442 is:

where || · || is the norm operator;

S443, determine the medians of D_N and D_K and compare them;

S444, assign the sample x* corresponding to z* to the class of the labelled data that give the smaller median.
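A minimal sketch of the cosine-distance classifier in steps S441 to S444. It assumes that the cosine distance is 1 minus the cosine similarity (the exact formula is not reproduced in the source), and all names are illustrative.

    import torch

    def cosine_distance(z, z_star):
        """1 - cosine similarity between each row of z and the single vector z_star."""
        return 1.0 - (z @ z_star) / (z.norm(dim=1) * z_star.norm() + 1e-8)

    def classify(z_star, z_minority, z_majority):
        """Assign z_star to the class whose samples give the smaller median cosine distance."""
        d_k = cosine_distance(z_minority, z_star).median()
        d_n = cosine_distance(z_majority, z_star).median()
        return "minority" if d_k < d_n else "majority"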

In one embodiment of the present invention, the validity and effect of the method are verified by comparing it with a convolutional neural network and a loss-sensitive convolutional neural network, as follows.

An imbalanced data set is generated by sampling from the benchmark MNIST data set, and performance is tested with a convolutional neural network, a loss-sensitive convolutional neural network and the proposed divergence-excited-autoencoder-based algorithm. Several commonly used imbalanced-classification metrics are used to evaluate the networks, including TPr and TNr, the recall of the majority and minority classes, and TPv and TNv, the corresponding precision of the classification results; in addition, the F1 score and the G-mean are used as indicators of "overall performance". To explain how these metrics are computed, TP, TN, FP and FN denote the numbers of samples classified as true positives, true negatives, false positives and false negatives, respectively; the precision and recall of the two classes can then be expressed as

TPr = TP / (TP + FN), TNr = TN / (TN + FP), TPv = TP / (TP + FP), TNv = TN / (TN + FN).

The F1 score can be expressed as

F1 = 2 · Precision · Recall / (Precision + Recall)

Similarly, the G-mean can be expressed as

G-mean = sqrt(TPr · TNr)
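For illustration, a minimal sketch of drawing an imbalanced two-class subset from MNIST and computing the metrics above; the choice of which digit is treated as the minority class, the sample counts and which class is taken as "positive" are assumptions for illustration.

    import numpy as np

    def imbalanced_subset(x, y, majority_digit=0, minority_digit=1,
                          n_majority=5000, n_minority=30, seed=0):
        """Draw an imbalanced two-class subset from a labelled data set such as MNIST."""
        rng = np.random.default_rng(seed)
        maj = rng.choice(np.where(y == majority_digit)[0], n_majority, replace=False)
        mino = rng.choice(np.where(y == minority_digit)[0], n_minority, replace=False)
        idx = np.concatenate([maj, mino])
        return x[idx], (y[idx] == minority_digit).astype(int)   # 1 marks the minority class

    def scores(tp, tn, fp, fn):
        """Per-class recall/precision, F1 of the positive class and G-mean."""
        tpr, tnr = tp / (tp + fn), tn / (tn + fp)   # recalls
        tpv, tnv = tp / (tp + fp), tn / (tn + fn)   # precisions
        f1 = 2 * tpv * tpr / (tpv + tpr)
        g_mean = np.sqrt(tpr * tnr)
        return {"TPr": tpr, "TNr": tnr, "TPv": tpv, "TNv": tnv, "F1": f1, "G-mean": g_mean}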

Based on the experiments, the parameter is set to η = 12.0. The proposed divergence-excited autoencoder consists of 3 convolutional layers followed by 3 deconvolution layers, with a 100-dimensional bottleneck layer. The detailed configurations of the two convolutional neural networks and of the autoencoder used in the experiments are given in Table 1.

Table 1: Structure and parameters of the convolutional neural network and of the loss-sensitive convolutional neural network

For a clearer comparison, the classification results of the three methods when the minority class has only 30 and only 10 samples are analysed, and the recall of the key minority class is plotted; the comparisons are shown in Fig. 7 and Fig. 8. From left to right, the three sub-figures (a), (b) and (c) show the results of the conventional convolutional neural network, the improved loss-sensitive convolutional neural network and the method proposed by the present invention. Each small square is the classification result for the corresponding pair of handwritten digits, and the darkness of its colour indicates the recall, with darker colours meaning a higher recall for the key minority class (the exact values are marked in the figures). The figures show that the proposed method achieves the highest recall and still performs well when the sample size is reduced further (10 samples).

To further examine how the recall of the minority class changes as the number of minority samples decreases, the recall is computed for the convolutional neural network, the improved loss-sensitive convolutional neural network and the proposed method and plotted in Fig. 9. The figure shows that, when classifying handwritten digits, the recall of the proposed divergence-excited autoencoder is consistently higher than that of the other methods; in particular, the proposed method is more robust when the minority class contains few samples.

In addition, compared with the two-class imbalanced classification task above, the keyword-spotting task of recognizing a few keywords against a noisy background places higher demands on the classifier; this is also tested in this embodiment. Fig. 10 and Fig. 11 compare the detection performance for keywords 3 and 1 respectively, and show that the recall, F1 score and G-mean of the proposed method are better than those of the other methods.

The comparison experiments above illustrate the performance of the ordinary convolutional neural network, the loss-sensitive convolutional neural network and the proposed divergence-excited autoencoder under different tasks and parameter settings. It can be observed that the proposed divergence-excited autoencoder is generally more favourable than the convolutional networks:

(1) Performance when the number of minority samples is small: for the tested models, the performance of the convolutional networks tends to fluctuate and becomes unsatisfactory when the number of minority samples is very small, whereas the proposed autoencoder is more stable.

(2) Convergence speed: the results above show that the divergence-excited-autoencoder-based method generally converges faster than the convolutional neural networks.

(3) F1 score and G-mean: the minority-class F1 score and the G-mean are used to evaluate the overall performance of the algorithms. Under both metrics, the proposed divergence-excited autoencoder outperforms the convolutional neural networks.

In summary, the divergence-excited autoencoder for imbalanced data classification and its classification method proposed by the invention have the following advantages: first, the proposed autoencoder-based method outperforms the convolutional-neural-network-based method in both F1 score and convergence speed; second, the autoencoder has fewer parameters that require manual tuning, unlike a loss-sensitive convolutional neural network whose hyperparameters must be optimized; finally, the proposed autoencoder model performs well when the number of minority-class samples is small.

Claims (10)

1. A divergence-excited autoencoder for imbalanced data classification, characterized by comprising a data input layer, an encoding layer unit, a bottleneck layer, a decoding layer unit and a data output layer which are connected in sequence;
the data input layer is connected with the data output layer;
the bottleneck layer and the data output layer unit both have loss functions;
the loss function in the bottleneck layer is used for calculating the divergence loss of the data input to the bottleneck layer so as to explicitly encourage the separation of samples with different labels;
the loss function in the data output layer unit is used for calculating the reconstruction loss of the output data of the decoding layer unit.
2. The divergence-excited autoencoder for imbalanced data classification according to claim 1, characterized in that the data input layer is used to input data to be classified;
the encoding layer unit comprises a plurality of convolution layers which are connected in sequence and is used for encoding the data to be classified input by the data input layer;
the bottleneck layer comprises a feature layer and a loss-function layer; the feature layer is used for deriving the output data of the encoding layer unit and dividing it into minority samples and majority samples according to their label values, and the loss-function layer is used for calculating the divergence loss and gradient of the minority samples and the majority samples;
the decoding layer unit comprises a plurality of deconvolution layers which are connected in sequence and is used for decoding the data output by the bottleneck layer and reconstructing the data to be classified;
and the data output layer calculates the reconstruction loss and gradient of the output data according to the reconstructed data to be classified and the data to be classified, and outputs the classified data.
3. The divergence-excited autoencoder for imbalanced data classification according to claim 1, characterized in that, in the data output layer:
when the output data of the decoding layer unit are normalized data, the decoding layer unit is connected with the data output layer through a sigmoid function, and the reconstruction loss calculated by the loss function of the data output layer is the cross-entropy loss;
when the output data of the decoding layer unit are not normalized data, the decoding layer unit is directly connected with the data output layer, and the reconstruction loss calculated by the loss function of the data output layer is the least-squares loss;
when the output data of the decoding layer unit are normalized data, the sigmoid function σ(·) is as follows:
wherein X̂_{i,j} is the pixel value at coordinates (i, j) of the reconstructed data to be classified;
the cross-entropy loss L_r(·) of the data output layer is:
wherein X_{i,j} is the pixel value of the input data to be classified at coordinates (i, j);
X is the input data to be classified;
X̂ is the reconstructed data to be classified;
the gradient corresponding to the cross-entropy loss is:
wherein * is the pairwise multiplication operator;
when the output data of the decoding layer unit are not normalized data, the least-squares loss L_r(·) of the data output layer is:
wherein the superscript T is the transpose operator;
the gradient corresponding to the least-squares loss is:
wherein m is the number of samples input to the data output layer.
4. The divergence-excited autoencoder for imbalanced data classification according to claim 1, characterized in that the divergence loss L_d(·) in the bottleneck layer is:
wherein
Z_N* is the matrix of bottleneck-layer feature values of the majority-class samples;
Z_K is the matrix of bottleneck-layer feature values of the minority-class samples;
σ(·) is the sigmoid function;
v is a function that replicates the data over m rows so that it has the same size as the matrix it is subtracted from;
m_1 is the number of samples in the majority sample class;
z_i^{N*} is the bottleneck-layer data of the i-th majority-class data sample;
z_i^{K} is the bottleneck-layer data of the i-th minority-class data sample;
z_{i,j}^{N*} is the j-th dimension of the bottleneck-layer data of the i-th majority-class data sample;
z_{i,j}^{K} is the j-th dimension of the bottleneck-layer data of the i-th minority-class data sample;
σ(z_{i,j}^{K}) is the output of the sigmoid function for the j-th dimension of the i-th minority-class data sample;
the gradient corresponding to the divergence loss comprises the gradient of Z_N* and the gradient of Z_K;
wherein the gradient of Z_N* is:
and the gradient of Z_K is:
5. A classification method using the divergence-excited autoencoder for imbalanced data classification, comprising the following steps:
S1, constructing the divergence-excited autoencoder;
S2, determining the training set used to train the divergence-excited autoencoder and inputting it to the data input layer;
S3, taking the minimization of the reconstruction loss and the optimization of the divergence loss in turn as training targets to finish the training of the divergence-excited autoencoder;
and S4, inputting the data in the training set and the test data into the trained divergence-excited autoencoder, and classifying the test data by using a cosine-distance classifier.
6. The classification method of the divergence-excited autoencoder for imbalanced data classification according to claim 5, characterized in that the data in the training set in step S2 include the training data X, the labels of the majority-class data y_n in the training data X and the labels of the minority-class data y_k in the training data X.
7. The classification method of the divergence-excited autoencoder for imbalanced data classification according to claim 6, characterized in that step S3 is specifically:
S31, inputting the training data X, the labels of the majority-class data y_n in the training data X and the labels of the minority-class data y_k in the training data X into the divergence-excited autoencoder in sequence;
S32, respectively determining the label index of the minority-class data y_k and the label index of the majority-class data y_n;
wherein the label index of the minority-class data y_k is:
and the label index of the majority-class data y_n is:
S33, according to the label index of the minority-class data y_k and the label index of the majority-class data y_n, performing the gradient calculations corresponding to the reconstruction loss on the divergence-excited autoencoder, back-propagating the gradient corresponding to the reconstruction loss of each calculation to the data output layer, and forming a training subset with balanced labels;
S34, according to the training subset with balanced labels, calculating the gradients corresponding to the divergence loss and back-propagating them to the bottleneck layer;
and S35, respectively evaluating the reconstruction loss and the divergence loss of the current divergence-excited autoencoder, and completing the training of the divergence-excited autoencoder when the reconstruction loss and the divergence loss have no significant gradient.
8. The classification method of the divergence-excited autoencoder for imbalanced data classification according to claim 7, characterized in that step S4 is specifically:
S41, inputting the test data X* into the trained divergence-excited autoencoder;
S42, obtaining, from the trained divergence-excited autoencoder, the feature representation z* of each sample x* in the test data X*;
S43, inputting the training data X into the trained divergence-excited autoencoder to obtain the minority-class data X_K and the majority-class data X_N with classification labels;
and S44, combining the feature representation z* of each sample x* in the test data X*, the samples in the minority-class data X_K and the samples in the majority-class data X_N, and determining in turn, with the cosine-distance classifier, the class of each sample x* in the test data X*, completing the classification of the test data X*.
9. The classification method of the divergence-excited autoencoder for imbalanced data classification according to claim 8, characterized in that the classification of one sample x* in step S44 is specifically:
S441, calculating the cosine distance D_N between z* and each sample z_i in the majority-class data X_N;
S442, calculating the cosine distance D_K between z* and each sample z_j in the minority-class data X_K;
S443, respectively determining the medians of D_N and D_K and comparing them;
and S444, according to the smaller median, assigning the sample x* corresponding to z* to the class of the corresponding labelled data.
10. The classification method of the divergence-excited autoencoder for imbalanced data classification according to claim 9, characterized in that:
the cosine distance D_N in step S441 is:
the cosine distance D_K in step S442 is:
wherein || · || is the operator that takes the norm.
CN201811269919.5A 2018-10-29 2018-10-29 Divergence-excitation self-encoding encoder and its classification method for lack of balance data classification Pending CN109447153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811269919.5A CN109447153A (en) 2018-10-29 2018-10-29 Divergence-excitation self-encoding encoder and its classification method for lack of balance data classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811269919.5A CN109447153A (en) 2018-10-29 2018-10-29 Divergence-excitation self-encoding encoder and its classification method for lack of balance data classification

Publications (1)

Publication Number Publication Date
CN109447153A true CN109447153A (en) 2019-03-08

Family

ID=65549219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811269919.5A Pending CN109447153A (en) 2018-10-29 2018-10-29 Divergence-excitation self-encoding encoder and its classification method for lack of balance data classification

Country Status (1)

Country Link
CN (1) CN109447153A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929767A (en) * 2019-10-24 2020-03-27 云从科技集团股份有限公司 Font processing method, system, device and medium
CN110929767B (en) * 2019-10-24 2021-05-18 云从科技集团股份有限公司 Font processing method, system, device and medium
CN111461190A (en) * 2020-03-24 2020-07-28 华南理工大学 A non-equilibrium ship classification method based on deep convolutional neural network
CN111461190B (en) * 2020-03-24 2023-03-28 华南理工大学 Deep convolutional neural network-based non-equilibrium ship classification method
CN111665066A (en) * 2020-05-18 2020-09-15 东华大学 Equipment fault self-adaptive upper and lower early warning boundary generation method based on convolutional neural network
CN111665066B (en) * 2020-05-18 2021-06-11 东华大学 Equipment fault self-adaptive upper and lower early warning boundary generation method based on convolutional neural network
CN112070772A (en) * 2020-08-27 2020-12-11 闽江学院 Blood leukocyte image segmentation method based on UNet + + and ResNet
CN112070772B (en) * 2020-08-27 2024-01-12 闽江学院 Blood leukocyte image segmentation method based on UNet++ and ResNet

Similar Documents

Publication Publication Date Title
Ko et al. POPQORN: Quantifying robustness of recurrent neural networks
CN110084221B (en) Serialized human face key point detection method with relay supervision based on deep learning
CN107577990B (en) A large-scale face recognition method based on GPU-accelerated retrieval
CN108898180B (en) A deep clustering method for single-particle cryo-EM images
CN106570464B (en) A face recognition method and device for fast processing of face occlusion
CN109241536A (en) It is a kind of based on deep learning from the sentence sort method of attention mechanism
CN114930352A (en) Method for training image classification model
WO2019179403A1 (en) Fraud transaction detection method based on sequence width depth learning
CN110781829A (en) A lightweight deep learning face recognition method for smart business halls
CN102136024B (en) Biometric feature identification performance assessment and diagnosis optimizing system
CN109447153A (en) Divergence-excitation self-encoding encoder and its classification method for lack of balance data classification
CN114842343B (en) ViT-based aerial image recognition method
CN111401219B (en) Palm key point detection method and device
CN108288048B (en) Facial emotion recognition feature selection method based on improved brainstorming optimization algorithm
CN113723472B (en) An image classification method based on dynamic filtering equivariant convolutional network model
CN108681689B (en) Frame rate enhanced gait recognition method and device based on generation of confrontation network
CN110147782A (en) It is a kind of based on projection dictionary to the face identification method and device of study
CN109598220A (en) A kind of demographic method based on the polynary multiple dimensioned convolution of input
CN106096517A (en) A kind of face identification method based on low-rank matrix Yu eigenface
CN109344713A (en) A pose-robust face recognition method
CN104182734A (en) Linear-regression based classification (LRC) and collaborative representation based two-stage face identification method
CN110598636B (en) A Ship Target Recognition Method Based on Feature Migration
CN106650765A (en) Hyperspectral data classification method through converting hyperspectral data to gray image based on convolutional neural network
CN113409157B (en) Cross-social network user alignment method and device
CN114398611A (en) Bimodal identity authentication method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190308)