CN115758125A

CN115758125A - Industrial sewage treatment soft measurement method based on feature structure optimization and deep learning

Info

Publication number: CN115758125A
Application number: CN202211565490.0A
Authority: CN
Inventors: 曹佳斐; 薛安克; 杨勇; 张乐; 杨洁; 胡晓静
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2022-12-07
Filing date: 2022-12-07
Publication date: 2023-03-07

Abstract

The invention discloses a soft-sensing method for industrial sewage treatment based on feature structure optimization and deep learning. The method first removes overlapping features from the data collected at an industrial sewage treatment plant. It then applies complementary ensemble empirical mode decomposition (CEEMD) to decompose the input feature sequences and historical data into their intrinsic mode functions (IMFs), and screens the IMFs with the ReliefF feature selection method. Finally, a new hybrid deep learning model, which combines a convolutional neural network and a gated recurrent unit network and is optimized through an attention mechanism, predicts four effluent indicators. Compared with a soft-sensing algorithm without feature optimization, the proposed method improves prediction performance.

Description

Soft-sensing method for industrial wastewater treatment based on feature structure optimization and deep learning

Technical field

The invention belongs to the field of water-quality soft sensing in sewage treatment system control, and in particular relates to an industrial sewage soft-sensing method based on feature structure optimization and deep learning.

Background

Industrial wastewater refers to the wastewater, sewage, and waste liquid produced in industrial production. It comes in many types, has a complex composition, and often contains a variety of toxic substances. With the rapid development of industry, the pollution of water bodies by industrial wastewater has become increasingly widespread and severe, contaminating soil and air and threatening human health and safety. The treatment of industrial wastewater is therefore more important than that of urban sewage. How to take purification measures matched to the composition and concentration of the pollutants in wastewater, so that it can be comprehensively utilized and treated, is one of the core problems to be solved for sustainable development. Although industrial wastewater treatment plants (IETP) can treat wastewater effectively by physical, chemical, and biological methods, fluctuations in disturbance factors such as wastewater quality and quantity often weaken the treatment performance of an IETP. In addition, the coupling in the time and frequency domains between the other process variables in a treatment plant and a given effluent indicator has rarely been studied.

Summary of the invention

To address the limitations of existing soft-sensing methods for industrial sewage treatment, the present invention proposes a soft-sensing method for industrial sewage treatment based on feature structure optimization and deep learning.

In the present invention, Pearson correlation analysis (PCC) is first performed on the data features collected at the industrial sewage treatment plant, and features with overlapping information are eliminated. The input feature sequences and historical data are then decomposed with the complementary ensemble empirical mode decomposition (CEEMD) method. The resulting intrinsic mode function (IMF) components contain the main feature information, but some IMFs represent noise in the original signal, and these spurious components would hinder subsequent processing. The ReliefF feature selection method is therefore used to select the main process variables among the IMFs as the input sequence. A new hybrid deep learning model that combines a convolutional neural network (CNN) and a gated recurrent unit (GRU) network, optimized through an attention mechanism (AM), then outputs the predicted effluent concentrations of CODcr, NH₃-N, TN, and TP.

The technical solution adopted by the present invention comprises the following steps:

Step 1. Eliminate overlapping features from the data collected at the industrial sewage treatment plant;

Step 2. Perform empirical mode decomposition on the plant data from which overlapping features have been eliminated;

Step 3. Perform feature screening on the decomposed IMFs;

Step 4. Design the CNN-GRU-AM hybrid network model;

Step 5. Train and verify the proposed model network, and output the predicted indicators.

Preferably, the overlapping features in Step 1 are eliminated as follows:

Before the model is built, Pearson correlation analysis (PCC) is performed on the data collected at the industrial sewage treatment plant; the strength of the correlation indicates whether the input information overlaps. The Pearson correlation coefficient ρ_xy of signals x(t) and y(t) is defined as:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

where σ_xy is the covariance of x(t) and y(t), and σ_x and σ_y are the standard deviations of x(t) and y(t), respectively. The coefficient is normalized, ρ_xy ∈ [-1, 1], and the closer ρ_xy is to 1, the stronger the correlation between the two signals. A threshold is set, and when ρ_xy exceeds it the two input signals are regarded as overlapping.
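The thresholding rule above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patented procedure: the function name `drop_overlapping_features`, the threshold of 0.9, the use of the absolute value of ρ_xy, the greedy keep-first order, and the synthetic columns are all assumptions, since the source does not state them.

```python
import numpy as np

def drop_overlapping_features(X, names, threshold=0.9):
    """Greedily keep features whose |rho| with every kept feature is <= threshold.

    The threshold value and the use of |rho| are assumptions for illustration.
    """
    corr = np.corrcoef(X, rowvar=False)  # (n_features, n_features) Pearson matrix
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, i]) <= threshold for i in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic stand-in data: `flow_copy` is almost a rescaled duplicate of `flow`.
rng = np.random.default_rng(0)
flow = rng.normal(size=200)
ph = rng.normal(size=200)
flow_copy = 2.0 * flow + 0.01 * rng.normal(size=200)
X = np.column_stack([flow, ph, flow_copy])

kept_names = drop_overlapping_features(X, ["flow", "ph", "flow_copy"])
print(kept_names)
```

Keeping the first feature of each correlated pair is only one possible tie-break; a plant engineer might instead keep the variable that is cheaper or more reliable to measure.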

Preferably, the empirical mode decomposition of the features in Step 2 is performed as follows:

The raw data are processed with the complementary ensemble empirical mode decomposition (CEEMD) method. Together with the feature screening algorithm, this removes high-frequency noise and spurious components and retains the components that carry useful information. CEEMD keeps the good adaptivity of empirical mode decomposition (EMD), inherits the ability of ensemble empirical mode decomposition (EEMD) to suppress mode mixing, and overcomes the problem of residual Gaussian white noise, which gives the overall model good noise-reduction performance. The specific steps are as follows:

Step ①. Add n pairs of positive and negative white noise ω_i(t) to the original signal s(t):

$$s_i^{+}(t) = s(t) + \omega_i(t), \qquad s_i^{-}(t) = s(t) - \omega_i(t), \qquad i = 1, \dots, n$$

Step ②. Decompose each noise-added signal with EMD to obtain two sets of intrinsic mode functions (IMFs) and residuals r_i(t), then average them:

$$C_j(t) = \frac{1}{2n}\sum_{i=1}^{n}\left[C_{ij}^{+}(t) + C_{ij}^{-}(t)\right], \qquad r(t) = \frac{1}{2n}\sum_{i=1}^{n}\left[r_i^{+}(t) + r_i^{-}(t)\right]$$

where C_ij^+(t) and C_ij^-(t) denote the j-th IMF component obtained in the i-th trial from the positive-noise and negative-noise signals, and r_i^+(t) and r_i^-(t) denote the residuals of the i-th trial.

Data at industrial sewage treatment plants are often acquired in a complex environment, so the acquired data usually contain a large amount of noise. CEEMD decomposes the original signal at different time scales into IMF components ordered from high to low frequency; the process is an adaptive, approximately orthogonal filtering. The IMF components contain the main feature information, but some IMFs represent noise in the original signal, and these spurious components would hinder subsequent processing. In the next step, the present invention selects the effective IMF components by setting a screening threshold.
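The paired-noise averaging of Steps ① and ② can be sketched as follows. This is a sketch under stated assumptions, not the patented implementation: `ceemd_average`, its parameters, and the noise scaling are hypothetical, and the decomposer passed in is a simple moving-average split that stands in for a real EMD routine. It assumes the decomposer always returns the same number of components, which a real EMD does not guarantee.

```python
import numpy as np

def ceemd_average(signal, decompose, n_pairs=50, noise_std=0.2, seed=0):
    """Average components over n_pairs of (+noise, -noise) copies of `signal`.

    `decompose` is any callable mapping a 1-D array to an array of shape
    (n_components, len(signal)); a real pipeline would pass an EMD routine.
    """
    rng = np.random.default_rng(seed)
    scale = noise_std * np.std(signal)
    total = None
    for _ in range(n_pairs):
        noise = scale * rng.standard_normal(signal.shape)
        for noisy in (signal + noise, signal - noise):   # Step 1: noise pair
            comps = decompose(noisy)                      # Step 2: decompose
            total = comps if total is None else total + comps
    return total / (2 * n_pairs)                          # mean over 2n runs

def two_band_split(x, width=9):
    """Stand-in decomposer (NOT EMD): the remainder plus a moving-average trend."""
    kernel = np.ones(width) / width
    trend = np.convolve(x, kernel, mode="same")
    return np.stack([x - trend, trend])

t = np.linspace(0.0, 1.0, 400)
s = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
components = ceemd_average(s, two_band_split)
reconstruction_error = np.max(np.abs(components.sum(axis=0) - s))
```

Because each positive noise realization is matched by its negative, the added noise cancels when the components are summed, so the averaged components reconstruct the original signal up to rounding error. This cancellation is the residual-noise advantage of the paired scheme.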

Preferably, the feature screening of the decomposed IMFs in Step 3 is performed as follows:

The decomposed IMFs are screened with the ReliefF algorithm. Specifically, k nearest-neighbor samples are selected from the samples of the same class and from the samples of each different class, and their contributions are averaged to obtain the weight of each feature, which expresses the relevance of that feature to the class. The features are then sorted by weight, a threshold determines whether each feature is valid or invalid, and the n features with the largest weights are kept while the others are removed.

The steps are as follows:

Let D be the feature set, m the number of sampling iterations, δ the feature statistic index, k the number of nearest-neighbor samples, and T the output statistic of each feature.

(1) Set all feature weights to 0.

(2) For i = 1 to m do

(3) Draw a sample R at random from D; find the k nearest neighbors H_j (j = 1, 2, ..., k) of R among the samples of the same class, and for each class C different from that of R find the k nearest neighbors M_j(C) (j = 1, 2, ..., k).

(4) For each feature A = 1 to N do

The statistic of feature A is updated as:

$$W(A) = W(A) - \frac{1}{mk}\sum_{j=1}^{k}\mathrm{diff}(A, R, H_j) + \frac{1}{mk}\sum_{C \neq \mathrm{Class}(R)}\left[\frac{p(C)}{1 - p(\mathrm{Class}(R))}\sum_{j=1}^{k}\mathrm{diff}(A, R, M_j(C))\right]$$

where diff(A, R_1, R_2) is the difference between samples R_1 and R_2 on feature A, p(C) is the proportion of samples in class C, and p(Class(R)) is the proportion of samples in the class of the randomly drawn sample R.

(6) Sort W.

Once the feature weights have been computed, a larger weight indicates that the feature discriminates between samples more strongly, and a new feature subset is selected by setting a threshold.
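The loop above can be illustrated with a compact NumPy version of ReliefF for numeric features. This is a hedged sketch, not the inventors' code: the function name `relieff`, the min-max scaling, the Manhattan distance, and the toy two-class data are assumptions. The source also does not say how continuous effluent concentrations are mapped to classes, so class labels are taken as given here.

```python
import numpy as np

def relieff(X, y, m=100, k=5, seed=0):
    """ReliefF feature weights for numeric features and class labels y."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                       # avoid division by zero
    Xn = (X - X.min(axis=0)) / span             # scale so diff() lies in [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n_samples))
    W = np.zeros(n_features)                    # (1) all weights start at 0
    for _ in range(m):                          # (2) m sampling iterations
        r = rng.integers(n_samples)             # (3) draw a random sample R
        diffs = np.abs(Xn - Xn[r])              # diff(A, R, .) for every sample
        dist = diffs.sum(axis=1)
        dist[r] = np.inf                        # R is never its own neighbour
        for c in classes:
            idx = np.flatnonzero(y == c)
            nearest = idx[np.argsort(dist[idx])[:k]]
            nearest = nearest[np.isfinite(dist[nearest])]
            contrib = diffs[nearest].sum(axis=0) / (m * k)
            if c == y[r]:
                W -= contrib                    # nearest hits H_j
            else:                               # nearest misses M_j(C)
                W += prior[c] / (1.0 - prior[y[r]]) * contrib
    return W

# Toy data: feature 0 follows the class label, feature 1 is pure noise.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=300)
informative = y + 0.1 * rng.standard_normal(300)
noise = rng.standard_normal(300)
X = np.column_stack([informative, noise])

weights = relieff(X, y)
ranking = np.argsort(weights)[::-1]             # (6) sort, largest weight first
```

In the setting of this method, each column of `X` would be one IMF component, and the components whose weight falls below the chosen threshold would be discarded.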

Preferably, the CNN-GRU-AM network model in Step 4 is designed as follows:

The convolutional neural network (CNN) structure comprises a convolutional layer, a pooling layer, a fully connected layer, and an output layer. The CNN parameters are trained by gradient descent, and the trained model can learn the features in time-series data.

The gated recurrent unit (GRU) neural network contains an update gate and a reset gate. Data propagate through a GRU unit as follows. First, the states of the reset gate r_t and the update gate z_t are obtained from the hidden state h_{t-1} passed from the previous step and the input x_t of the current node:

$$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$$

$$z_t = \sigma(x_t W_{xz} + h_{t-1} W_{hz} + b_z)$$

Next, the candidate state memorized at the current time step is computed from x_t and the reset-gated previous state, and the GRU then updates its state h_t by blending h_{t-1} with the candidate state under the control of the update gate z_t.

Finally, the output of the forward pass is:

$$y_t = \sigma(W_o \cdot h_t)$$

where h_t is the output of the GRU, W denotes the weights, and b denotes the bias vectors of the GRU.

The gates are computed from the state h_{t-1} passed from the previous step and the input x_t of the current node. σ is the sigmoid function, which maps the data into the range 0 to 1 so that it can act as a gating signal; h_{t-1} contains the past information, and ⊙ denotes the Hadamard product.
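A single GRU forward step can be written out in NumPy to make the data flow concrete. Several details are assumptions, because the corresponding formulas are not legible in the source: the candidate state uses tanh, the update rule follows the convention h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t (some references swap the roles of z_t and 1 - z_t), the update gate has its own weights W_xz and W_hz, and the names W_xh, W_hh, and b_h are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU forward step; `p` maps (partly hypothetical) names to arrays."""
    r_t = sigmoid(x_t @ p["W_xr"] + h_prev @ p["W_hr"] + p["b_r"])  # reset gate
    z_t = sigmoid(x_t @ p["W_xz"] + h_prev @ p["W_hz"] + p["b_z"])  # update gate
    # Candidate state: the reset gate scales how much past state is used.
    h_cand = np.tanh(x_t @ p["W_xh"] + (r_t * h_prev) @ p["W_hh"] + p["b_h"])
    # Assumed convention: z_t close to 1 keeps the old state.
    h_t = z_t * h_prev + (1.0 - z_t) * h_cand
    y_t = sigmoid(h_t @ p["W_o"])                                   # output
    return h_t, y_t

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 32, 1
shapes = {
    "W_xr": (n_in, n_hidden), "W_hr": (n_hidden, n_hidden), "b_r": (n_hidden,),
    "W_xz": (n_in, n_hidden), "W_hz": (n_hidden, n_hidden), "b_z": (n_hidden,),
    "W_xh": (n_in, n_hidden), "W_hh": (n_hidden, n_hidden), "b_h": (n_hidden,),
    "W_o": (n_hidden, n_out),
}
params = {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}

h = np.zeros(n_hidden)
for x_t in rng.standard_normal((10, n_in)):    # a window of 10 time steps
    h, y = gru_step(x_t, h, params)
```

The hidden width of 32 and the window of 10 steps mirror the values given later in the embodiment; the input width of 4 is arbitrary.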

The attention mechanism (AM) is implemented in the following steps.

(1) Compute the similarity between each state h_j and the target state S_{t-1}, that is, the unnormalized weight of each state h_j at time t:

$$e_{tj} = a(S_{t-1}, h_j)$$

(2) Normalize the weight coefficients:

$$\alpha_{tj} = \frac{\exp(e_{tj})}{\sum_{k=1}^{T}\exp(e_{tk})}$$

(3) Take the weighted average of the states h_j:

$$c_t = \sum_{j=1}^{T}\alpha_{tj} h_j$$

where T is the total number of time steps in the input sequence.
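The three attention steps can be sketched as below. The scoring function a(·,·) is not specified in the source, so a plain dot product is assumed here purely for illustration; the name `attention_context` and the array sizes are likewise hypothetical.

```python
import numpy as np

def attention_context(s_prev, H):
    """Return (alpha, context) for states H of shape (T, d) and target s_prev (d,)."""
    e = H @ s_prev                        # (1) scores e_tj; dot product assumed
    e = e - e.max()                       # stabilise the exponentials
    alpha = np.exp(e) / np.exp(e).sum()   # (2) normalised weights, sum to 1
    context = alpha @ H                   # (3) weighted average of the h_j
    return alpha, context

rng = np.random.default_rng(0)
H = rng.standard_normal((10, 32))         # T = 10 GRU states of width 32
s_prev = rng.standard_normal(32)
alpha, context = attention_context(s_prev, H)
```

Subtracting the maximum score before exponentiating does not change the normalized weights, but it keeps the computation numerically safe when scores are large.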

The beneficial effects of the present invention are:

1. The invention proposes a feature structure optimization method for industrial sewage treatment plant data. Compared with a soft-sensing algorithm without feature optimization, it improves prediction performance.

2. To evaluate the prediction accuracy of the proposed CEEMD-ReliefF-CNN-GRU-AM model, the effluent CODcr, NH₃-N, TN, and TP of a real industrial sewage treatment plant were taken as examples. Comparative experiments assessed the efficiency and stability of the hybrid model on the real data set and showed a substantial improvement in prediction performance.

Description of drawings

Fig. 1 is a flow chart of the industrial sewage treatment soft-sensing method based on feature structure optimization and deep learning according to the present invention;

Fig. 2-1 is a schematic evaluation of the effluent CODcr predicted by the trained model in an embodiment of the present invention;

Fig. 2-2 is a schematic evaluation of the effluent NH₃-N predicted by the trained model in an embodiment of the present invention;

Fig. 2-3 is a schematic evaluation of the effluent TN predicted by the trained model in an embodiment of the present invention;

Fig. 2-4 is a schematic evaluation of the effluent TP predicted by the trained model in an embodiment of the present invention.

Detailed description

The implementation steps of the present invention are described in further detail below with reference to Fig. 1.

A soft-sensing method for industrial sewage treatment based on feature structure optimization and deep learning specifically comprises the following steps:

Step 1. Eliminate overlapping features, as follows:

Before the model is built, Pearson correlation analysis (PCC) is performed on the data collected at the industrial sewage treatment plant; the strength of the correlation indicates whether the input information overlaps. The Pearson correlation coefficient ρ_xy of signals x(t) and y(t) is defined as:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

where σ_xy is the covariance of x(t) and y(t), and σ_x and σ_y are their standard deviations.

Step 2. Perform empirical mode decomposition on the features, as follows:

Because the data in an industrial sewage treatment plant are highly nonlinear and non-stationary and are mixed with uncertain noise, the present invention processes the raw data with the complementary ensemble empirical mode decomposition (CEEMD) method. Together with the feature screening algorithm, this removes high-frequency noise and spurious components and retains the components that carry useful information. CEEMD keeps the good adaptivity of empirical mode decomposition, inherits the ability of ensemble empirical mode decomposition to suppress mode mixing, and overcomes the problem of residual Gaussian white noise, which gives the overall model good noise-reduction performance. The algorithm follows Steps ① and ② given above.

Step 3. Perform feature screening on the decomposed IMFs, as follows:

The basic idea of the ReliefF algorithm is to assign a weight to each feature of the samples, update the weights iteratively, sort the features by weight, and select a feature subset accordingly, so that good features bring samples of the same class together and push samples of different classes apart. It can handle multi-class problems and regression problems, and it copes with data sets that contain noise, incomplete features, and multi-class attributes. The algorithm selects k nearest-neighbor samples from the samples of the same class and from the samples of each different class, and averages their contributions to obtain the weight of each feature, which expresses the relevance of that feature to the class. The features are then sorted by weight, a threshold determines whether each feature is valid or invalid, and the n features with the largest weights are kept while the others are removed.

The steps are as follows:

Let D be the feature set, m the number of sampling iterations, δ the feature statistic index, k the number of nearest-neighbor samples, and T the output statistic of each feature;

(1) Set all feature weights to 0.

(2) For i = 1 to m do

(3) Draw a sample R at random from D; find the k nearest neighbors H_j of R among the samples of the same class, and for each class C different from that of R find the k nearest neighbors M_j(C), where j = 1, 2, ..., k;

(4) For each feature A = 1 to N do

$$W(A) = W(A) - \frac{1}{mk}\sum_{j=1}^{k}\mathrm{diff}(A, R, H_j) + \frac{1}{mk}\sum_{C \neq \mathrm{Class}(R)}\left[\frac{p(C)}{1 - p(\mathrm{Class}(R))}\sum_{j=1}^{k}\mathrm{diff}(A, R, M_j(C))\right]$$

where diff(A, R_1, R_2) is the difference between samples R_1 and R_2 on feature A, p(C) is the proportion of samples in class C, and p(Class(R)) is the proportion of samples in the class of the randomly drawn sample R;

(6) Sort W.

Once the feature weights have been computed, a larger weight indicates that the feature discriminates between samples more strongly, and a new feature subset can be selected by setting a threshold.

Step 4. Design the CNN-GRU-AM network model, as follows:

The modules of the CNN-GRU-AM hybrid model are defined as follows. A convolutional neural network (CNN) is a multi-layer feed-forward artificial neural network characterized by pooling operations, local connections, and weight sharing, and it is widely used in many fields. A CNN has two notable advantages, sparse connectivity and weight sharing, which allow it to extract deeper features from the data while reducing the complexity of the network model. The CNN structure consists mainly of convolutional layers, pooling layers, fully connected layers, and an output layer. This structure reduces the number of weights and so simplifies the network; time-series data can also be fed directly into the network, which reduces the complexity of feature extraction and data reconstruction. The CNN parameters are trained by gradient descent, and the trained model can learn the features in time-series data.

The gated recurrent unit (GRU) neural network is an optimized variant of the LSTM that reduces the number of training parameters while maintaining prediction accuracy. The GRU contains an update gate and a reset gate; compared with the three-gate LSTM it has fewer parameters, converges faster, and learns data features well. Data propagate through a GRU unit as follows. First, the states of the reset gate r_t and the update gate z_t are obtained from the hidden state h_{t-1} passed from the previous step and the input x_t of the current node:

$$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$$

$$z_t = \sigma(x_t W_{xz} + h_{t-1} W_{hz} + b_z)$$

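Next, the candidate state memorized at the current time step is computed from x_t and the reset-gated previous state, and the GRU updates its state h_t by blending h_{t-1} with the candidate state under the control of the update gate z_t.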

Finally, the output of the forward pass is:

$$y_t = \sigma(W_o \cdot h_t)$$

where h_t is the output of the GRU, W denotes the weights, and b denotes the bias vectors of the GRU;

The gates are computed from the state h_{t-1} passed from the previous step and the input x_t of the current node. σ is the sigmoid function, which maps the data into the range 0 to 1 so that it can act as a gating signal; h_{t-1} contains the past information, and ⊙ denotes the Hadamard product;

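The attention mechanism (AM) is implemented in the following steps:

(1) Compute the similarity between each state h_j and the target state S_{t-1}:

$$e_{tj} = a(S_{t-1}, h_j)$$

(2) Normalize the weight coefficients:

$$\alpha_{tj} = \frac{\exp(e_{tj})}{\sum_{k=1}^{T}\exp(e_{tk})}$$

(3) Take the weighted average of the states h_j, where T is the total number of time steps in the input sequence:

$$c_t = \sum_{j=1}^{T}\alpha_{tj} h_j$$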

Step 5. Train and verify the proposed model network, and then evaluate the soft-sensing method, as follows:

The 11532 real records collected at an industrial sewage treatment plant are split into a 70% training set and a 30% test set. The window length of each sampling point in the prediction model is set to 10; each module consists of a GRU layer with 32 nodes and a DNN layer with 16 nodes, and the model is trained for 100 epochs. The proposed model is evaluated through three experiments.
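The data preparation described above can be sketched with NumPy. The source gives only the record count, the 70/30 ratio, and the window length, so the rest is assumed for illustration: the function name `make_windows`, the choice of the next time step as the label, the chronological (unshuffled) split, the placeholder feature count of 6, and the random stand-in data.

```python
import numpy as np

def make_windows(features, target, window=10):
    """Stack `window` consecutive feature rows as one input; label = next target."""
    n = len(features) - window
    X = np.stack([features[i:i + window] for i in range(n)])
    y = target[window:]
    return X, y

n_records, n_features = 11532, 6           # 6 features is a placeholder
rng = np.random.default_rng(0)
features = rng.standard_normal((n_records, n_features))   # stand-in plant data
target = rng.standard_normal(n_records)                   # stand-in effluent value

X, y = make_windows(features, target)
split = int(0.7 * len(X))                   # chronological 70/30 split
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```

Splitting chronologically rather than at random keeps future samples out of the training set, which is a common precaution for time-series soft sensors.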

Specifically, the prediction model is evaluated by the mean absolute error (MAE), the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the R² score, and is compared with the models of the comparative experiments:

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|\hat{y}_t - y_t\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{y}_t - y_t\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{t=1}^{N}\left|\frac{\hat{y}_t - y_t}{y_t}\right|$$

$$R^2 = 1 - \frac{\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{N}\left(y_t - \bar{y}\right)^2}$$

where N is the length of the time series, ŷ_t is the predicted value, y_t is the true value, and ȳ is the mean of the true values. MAE and RMSE are the basic measures of the gap between predicted and true values; the smaller they are, the smaller the prediction error. MAPE accounts not only for the error between the predicted and true values but also for the ratio of the error to the true value. The R² score typically lies between 0 and 1, and the closer it is to 1, the closer the model's predictions are to the true values. The results are shown in Fig. 2-1, Fig. 2-2, Fig. 2-3, and Fig. 2-4.

Claims

1. A soft-sensing method for industrial sewage treatment based on feature structure optimization and deep learning, characterized in that the method comprises the following steps:

Step 1. Eliminate overlapping features from the data collected at the industrial sewage treatment plant;

Step 2. Perform empirical mode decomposition on the plant data from which overlapping features have been eliminated;

Step 3. Perform feature screening on the decomposed IMFs;

Step 4. Design the CNN-GRU-AM hybrid network model;

Step 5. Train and verify the proposed model network, and output the predicted indicators.

2. The soft-sensing method for industrial sewage treatment based on feature structure optimization and deep learning according to claim 1, characterized in that the overlapping features in Step 1 are eliminated as follows:

Before the model is built, Pearson correlation analysis (PCC) is performed on the data collected at the industrial sewage treatment plant; the strength of the correlation indicates whether the input information overlaps. The Pearson correlation coefficient ρ_xy of signals x(t) and y(t) is defined as:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

where σ_xy is the covariance of x(t) and y(t), and σ_x and σ_y are the standard deviations of x(t) and y(t), respectively. The coefficient is normalized, ρ_xy ∈ [-1, 1], and the closer ρ_xy is to 1, the stronger the correlation between the two signals. A threshold is set, and when ρ_xy exceeds it the two input signals are regarded as overlapping.

3. The soft-sensing method for industrial sewage treatment based on feature structure optimization and deep learning according to claim 1, characterized in that the empirical mode decomposition of the features in Step 2 is performed as follows:

The raw data are processed with the complementary ensemble empirical mode decomposition method and, together with the feature screening algorithm, high-frequency noise and spurious components are removed and the components that carry useful information are retained. The specific steps are as follows:

Step ①. Add n pairs of positive and negative white noise ω_i(t) to the original signal s(t):

$$s_i^{+}(t) = s(t) + \omega_i(t), \qquad s_i^{-}(t) = s(t) - \omega_i(t), \qquad i = 1, \dots, n$$

Step ②. Decompose each noise-added signal with EMD to obtain two sets of intrinsic mode functions (IMFs) and residuals r_i(t), then average them:

$$C_j(t) = \frac{1}{2n}\sum_{i=1}^{n}\left[C_{ij}^{+}(t) + C_{ij}^{-}(t)\right], \qquad r(t) = \frac{1}{2n}\sum_{i=1}^{n}\left[r_i^{+}(t) + r_i^{-}(t)\right]$$

where C_ij^+(t) and C_ij^-(t) denote the j-th IMF component obtained in the i-th trial from the positive-noise and negative-noise signals, and r_i^+(t) and r_i^-(t) denote the residuals of the i-th trial.

4. The soft-sensing method for industrial sewage treatment based on feature structure optimization and deep learning according to claim 3, characterized in that the feature screening of the decomposed IMFs in Step 3 is performed as follows:

The decomposed IMFs are screened with the ReliefF algorithm. Specifically, k nearest-neighbor samples are selected from the samples of the same class and from the samples of each different class, and their contributions are averaged to obtain the weight of each feature, which expresses the relevance of that feature to the class; the features are then sorted by weight, a threshold determines whether each feature is valid or invalid, and the n features with the largest weights are kept while the others are removed;

The steps are as follows:

Let D be the feature set, m the number of sampling iterations, δ the feature statistic index, k the number of nearest-neighbor samples, and T the output statistic of each feature;

(1) Set all feature weights to 0.

(2) For i = 1 to m do

(3) Draw a sample R at random from D; find the k nearest neighbors H_j of R among the samples of the same class, and for each class C different from that of R find the k nearest neighbors M_j(C), where j = 1, 2, ..., k;

(4) For each feature A = 1 to N do

The statistic of feature A is updated as:

$$W(A) = W(A) - \frac{1}{mk}\sum_{j=1}^{k}\mathrm{diff}(A, R, H_j) + \frac{1}{mk}\sum_{C \neq \mathrm{Class}(R)}\left[\frac{p(C)}{1 - p(\mathrm{Class}(R))}\sum_{j=1}^{k}\mathrm{diff}(A, R, M_j(C))\right]$$

where diff(A, R_1, R_2) is the difference between samples R_1 and R_2 on feature A, p(C) is the proportion of samples in class C, and p(Class(R)) is the proportion of samples in the class of the randomly drawn sample R;

(6) Sort W.

Once the feature weights have been computed, a larger weight indicates that the feature discriminates between samples more strongly, and a new feature subset is selected by setting a threshold.

5. The soft-sensing method for industrial sewage treatment based on feature structure optimization and deep learning according to claim 4, characterized in that the CNN-GRU-AM network model in Step 4 is designed as follows:

The convolutional neural network (CNN) structure comprises a convolutional layer, a pooling layer, a fully connected layer, and an output layer; the CNN parameters are trained by gradient descent, and the trained model can learn the features in time-series data;

The gated recurrent unit (GRU) neural network contains an update gate and a reset gate; data propagate through a GRU unit as follows: first, the states of the reset gate r_t and the update gate z_t are obtained from the hidden state h_{t-1} passed from the previous step and the input x_t of the current node:

$$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$$

$$z_t = \sigma(x_t W_{xz} + h_{t-1} W_{hz} + b_z)$$

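Next, the candidate state memorized at the current time step is computed from x_t and the reset-gated previous state, and the GRU then updates its state h_t by blending h_{t-1} with the candidate state under the control of the update gate z_t;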

Finally, the output of the forward pass is:

$$y_t = \sigma(W_o \cdot h_t)$$

where h_t is the output of the GRU, W denotes the weights, and b denotes the bias vectors of the GRU;

The gates are computed from the state h_{t-1} passed from the previous step and the input x_t of the current node; σ is the sigmoid function, which maps the data into the range 0 to 1 so that it can act as a gating signal; h_{t-1} contains the past information, and ⊙ denotes the Hadamard product;

The attention mechanism (AM) is implemented in the following steps:

(1) Compute the similarity between each state h_j and the target state S_{t-1}, that is, the unnormalized weight of each state h_j at time t:

$$e_{tj} = a(S_{t-1}, h_j)$$

(2) Normalize the weight coefficients:

$$\alpha_{tj} = \frac{\exp(e_{tj})}{\sum_{k=1}^{T}\exp(e_{tk})}$$

(3) Take the weighted average of the states h_j:

$$c_t = \sum_{j=1}^{T}\alpha_{tj} h_j$$

where T is the total number of time steps in the input sequence.