CN108932671A

CN108932671A - An LSTM wind power load forecasting method using deep Q neural network parameter tuning

Info

Publication number: CN108932671A
Application number: CN201810575699.2A
Authority: CN
Inventors: 赵坤; 张挺
Original assignee: Shanghai University of Electric Power
Current assignee: Shanghai University of Electric Power
Priority date: 2018-06-06
Filing date: 2018-06-06
Publication date: 2018-12-04

Abstract

The present invention relates to an LSTM wind power load forecasting method using deep Q neural network (DQN) parameter tuning. The method includes the following steps: 1) collect raw data from the power system environment and select a training set and a prediction set; 2) use an LSTM as the prediction model and adjust its hyperparameters with a DQN, where the adjustment comprises environment parameter adjustment, state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate; 3) feed the training set into the prediction model with the adjusted parameters, and feed the training results back to the DQN using experience replay for parameter optimization, yielding the optimal LSTM prediction model; 4) perform wind power load forecasting with the optimal LSTM prediction model. Compared with the prior art, the invention does not require specialists to retune the model for each region, which greatly improves forecasting efficiency.

Description

An LSTM Wind Power Load Forecasting Method Using Deep Q Neural Network Parameter Tuning

Technical field

The invention relates to the technical field of electric power information, and in particular to an LSTM wind power load forecasting method that uses a deep Q neural network for parameter tuning.

Background

Wind power load forecasting is an important part of power dispatching, and its quality directly determines whether wind power can be connected to the grid. Wind power load is a time series that is continuously updated over time. A recurrent neural network (RNN) with a long short-term memory (LSTM) structure can effectively address the vanishing gradient problem that RNNs face over time, and the special network structure of RNNs gives them a unique advantage on time series data.

A recurrent neural network has a special structure in the input of its hidden layer: in addition to the input-layer input at the current time step, the hidden layer also receives the input from the previous time step, as shown in Figure 1. In Figure 1, x, x1, and x2 are the inputs at different time nodes, o, o1, and o2 are the outputs at the corresponding times, and U and V are linear relationship matrices shared across the whole RNN. Data related to wind power load, including time, wind speed at the wind farm, real-time power, frequency, wind direction, and outdoor temperature, are used as the input of the prediction model. The network computes the output o, which is compared with the corresponding wind power load to obtain the error. The model is then trained with gradient descent and BPTT (Back-Propagation Through Time), where BPTT uses backpropagation to compute the gradients and update the network weights. When the loop in the RNN is unrolled, each layer passes information to the next, which is why RNNs are well suited to time series data. Because the parameters are shared across the unrolled layers, only one layer's parameters need to be trained rather than a separate set for every layer.

A plain RNN may suffer from vanishing or exploding gradients over long time spans. An LSTM preserves the error so that it can be propagated backward through time and layers. By keeping the error at a more constant level, LSTMs allow recurrent networks to learn over many time steps (more than 1000), which opens a channel for linking causes and effects that are far apart.

LSTMs store information in gated cells outside the normal information flow of the recurrent network. These cells can store, write, or read information, much like data in computer memory. By opening and closing its gates, a cell decides what information to store and when to allow it to be read, written, or cleared. Unlike the digital memory in a computer, however, these gates are analog: they are implemented as element-wise multiplication by sigmoid functions whose outputs all lie between 0 and 1. Compared with digital storage, analog values have the advantage of being differentiable and are therefore suitable for backpropagation. The gates open and close according to the signals they receive and, like the nodes of a neural network, they filter information with their own sets of weights, deciding whether to let it pass based on its strength and content. These weights, like the weights that modulate the input and the hidden state, are adjusted by the learning process of the recurrent network. In other words, the memory cell learns when to let data enter, leave, or be deleted through an iterative process of making guesses, backpropagating the error, and adjusting the weights by gradient descent. Its structure is shown in Figure 2.

The three arrows at the bottom of Figure 2 show information flowing into the memory cell from multiple points. The current input and the past cell state are fed not only into the memory cell itself but also into the cell's three gates, and these gates decide how the input is handled. The black dots are the gates; by multiplying with different coefficients, they determine when new input is allowed to enter (y^in), when the current cell state is cleared, and when the cell state is allowed to affect the network output at the current time step (y^out). S_c is the current state of the memory cell, and g y^in is the current input. Each gate can be open or closed, and the gates recombine their open and closed states at every time step. At every time step the memory cell can decide whether to forget its state, whether to allow writing, and whether to allow reading.

The accuracy of LSTM predictions depends directly on the hyperparameters, so suitable hyperparameters allow the prediction model to reach, or come very close to, the global optimum. The prior art usually adopts the Q-Learning algorithm, whose flow is as follows:

Initialize Q(s, a) arbitrarily for all a ∈ A(s), with Q(terminal-state) = 0;

Repeat (for each episode):

Initialize state S;

Repeat (for each step of the episode):

Using some policy, such as ε-greedy, select an action to execute according to state S;

After executing the action, observe the reward and the new state S′;

Q(S_t, A_t) ← Q(S_t, A_t) + α(R_{t+1} + λ max_a Q(S_{t+1}, a) - Q(S_t, A_t))

S ← S′

Loop until termination.
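The loop above can be sketched in a few lines of Python. This is only an illustrative tabular version, not the patent's implementation: the toy chain environment `chain_step`, the function name `q_learning`, the random tie-breaking, and all default values are assumptions introduced for the example. The symbols `alpha` and `lam` correspond to α and λ in the update rule.

```python
import random
from collections import defaultdict


def q_learning(step, actions, start_state, episodes=200,
               alpha=0.1, lam=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; step(state, action) -> (reward, next_state, done)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a) starts at 0, so terminal states keep Q = 0
    for _ in range(episodes):
        state, done = start_state, False
        while not done:
            # epsilon-greedy policy; ties between equal Q values are broken randomly
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: (q[(state, a)], rng.random()))
            reward, next_state, done = step(state, action)
            # target Q value: R_{t+1} + lam * max_a Q(S_{t+1}, a)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + lam * best_next - q[(state, action)])
            state = next_state
    return q


def chain_step(state, action):
    """Hypothetical toy environment: states 0..3, reaching state 3 pays reward 1."""
    next_state = min(3, max(0, state + action))
    done = next_state == 3
    return (1.0 if done else 0.0), next_state, done


q_table = q_learning(chain_step, actions=[-1, 1], start_state=0)
```

In this toy setting the learned table should prefer moving right from every state, since that is the only way to reach the rewarded terminal state.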

In the algorithm, α is the learning rate, which controls how much of the difference between the previous Q value and the newly proposed Q value is taken into account. Q denotes the corresponding Q value, and λ is the discount factor. When the discount factor is 0, the prediction model tends to make decisions from the current table; when it is 1, it tends to try actions it has not taken before in order to expand the contents of the Q table. In general, the discount factor is set to a number between 0 and 1 to balance immediate reward against exploration. R_{t+1} + λ max_a Q(S_{t+1}, a) is the target Q value, and the Q-Learning algorithm mainly drives Q(S_t, A_t) toward it. This helps optimize the wind power forecasting model and adapt it to different regions. However, wind power load is strongly affected by the regional environment, and model parameters differ considerably between regions. When the prediction model is applied in a different region, specialists are needed to tune it, and tuning the model's parameters is labor-intensive and inconvenient.

Summary of the invention

The purpose of the present invention is to overcome the above defects of the prior art by providing an LSTM wind power load forecasting method using deep Q neural network parameter tuning that tunes parameters automatically, improves forecasting efficiency, and adapts to different regions.

The purpose of the present invention can be achieved through the following technical solution:

An LSTM wind power load forecasting method using deep Q neural network parameter tuning, comprising the following steps:

S1: Collect raw data from the power system environment, and select a training set and a prediction set;

S2: Use an LSTM as the prediction model, and use a DQN to adjust the hyperparameters of the prediction model;

Adjusting the parameters of the prediction model with the DQN comprises environment parameter adjustment, state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate. The environment parameter adjustment combines the LSTM prediction model with a series of actions to form a Markov decision model; state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate are implemented on the basis of this Markov decision model.

The environment parameter adjustment is as follows:

The learning rate adjustment function f(x) is used to adapt the learning rate, and the regularization parameter adjustment function g(x) is used to adapt the regularization parameter. Let (p, y) be a training sample, where p is the input, including the learning rate x_t and the regularization parameter z_t, y is the expected output, and a is the actual output. Then:

where n is the number of samples.

The state adjustment is as follows:

The state is represented by a feature vector containing six state features. The features include the hyperparameter to be adjusted, the candidate iterate's objective value, the maximum objective value over the past M steps, the dot product between the descent direction and the gradient, the MIN/MAX encoding, the number of function evaluations, and the alignment measure. Then:

Let F be the list of the M lowest objective values obtained up to time t-1. The state encoding [S_t] is determined by the following formula:

where the encoding is 1 when f(x_t) is less than the minimum value of F, 0 when it lies among the previous M values in F, and -1 otherwise;

The state adjustment [s_t]_alignment is given by:

The descent direction is expressed as:

In the formula, the symbols denote, respectively, a gradient and the mean value of the learning rate.

The action selection is as follows:

For a given state, action selection uses the strategy of resetting the learning rate or regularization parameter to its initial value after an iterate is accepted. When controlling the learning rate, there are two actions: keep the learning rate or halve it. When adjusting the regularization coefficient, in addition to these two options, it may also be increased by a quarter.

The reinforcement learning reward for adjusting the learning rate, r_id(f, x_t), is expressed as:

where f_lb is the target lower bound of the function value and c is the target lower bound value.

S3: Feed the training set into the prediction model with the adjusted parameters, and feed the training results back to the DQN for parameter optimization to obtain the optimal LSTM prediction model;

Experience replay is applied during training: each time the parameters of the neural network are updated, some of the earlier training results are drawn at random from the stored data and used to update the DQN, which in turn yields the optimal LSTM prediction model.

S4: Perform wind power load forecasting with the optimal LSTM prediction model.

Compared with the prior art, the present invention uses a DQN so that the prediction model learns to adjust its hyperparameters by itself. The model can therefore adapt to wind power forecasting in different regions without requiring specialists to retune it for each region, which greatly improves forecasting efficiency.

Description of drawings

Figure 1 is a structure diagram of an RNN;

Figure 2 is a structure diagram of an LSTM;

Figure 3 is a flow diagram of the method of the present invention;

Figure 4 shows the convergence behavior in the embodiment when the DQN learning rate is 0.05 and e_greedy = 0.01;

Figure 5 compares the prediction model accuracy obtained with Q gradient descent and with ordinary gradient descent in the embodiment;

Figure 6 compares the convergence of the prediction model error obtained with Q gradient descent and with ordinary gradient descent in the embodiment.

Detailed description

The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.

Embodiment

As shown in Figure 3, the present invention relates to an LSTM wind power load forecasting method using deep Q neural network parameter tuning. The main steps of the method are:

1) Collect raw data from the power system environment, and select a training set and a prediction set.

2) Use an LSTM as the prediction model, and use a DQN to dynamically adapt the hyperparameters of the prediction model and obtain the model's output. Dynamically adapting the hyperparameters with the DQN comprises environment parameter adjustment, state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate.

3) Feed the training set into the prediction model with the adjusted parameters, and feed the training results back to the DQN for parameter optimization to obtain the optimal LSTM prediction model;

4) Perform wind power load forecasting with the optimal LSTM prediction model.

The LSTM serves as the prediction model, and a deep Q neural network (DQN) dynamically adapts its hyperparameters. Each time the DQN takes an action, it selects a learning rate value; this value is applied to the prediction model, which then produces an output, and a reward is estimated for that output. The DQN records the action and its reward estimate in a Q table. Because the amount of data is huge, a deep network is needed to record the results of previous attempts, so that the DQN can learn from the table how to tune the hyperparameters. The environment, actions, and rewards are defined as follows:

1. Environment

In the formulas, the learning rate adjustment function f(x_t) and the regularization parameter adjustment function g(z_t) narrow the gap between the prediction model's output and the expected output under different learning rates and regularization parameters, where x_t and z_t denote the learning rate and the regularization parameter respectively. (p, y) is a training sample, n is the number of samples, p is the input, including the learning rate x_t and the regularization parameter z_t, y is the expected output, and a is the actual output. Both f(x_t) and g(z_t) use the cross-entropy cost function, so the weights are updated quickly when the error is large and slowly when it is small. f(x) is used to adapt the learning rate, and g(x) is used to adapt the regularization parameter. The environment combines the prediction model with a series of actions and other necessary elements to form a Markov decision model, in which the choice of action depends only on the current state and not on earlier history. State adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate are all carried out on the basis of this Markov decision model.
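The expressions for f and g do not appear in this text. Because the description names the cross-entropy cost function, one plausible form, given here purely as an assumption, is the standard cross-entropy averaged over the n training samples:

```latex
f(x_t) = -\frac{1}{n} \sum_{(p,\,y)} \left[\, y \ln a + (1 - y) \ln (1 - a) \,\right]
```

Here a is the model's actual output for input p when trained with learning rate x_t; g(z_t) would take the same form with a obtained under regularization parameter z_t. For a sigmoid output unit, the gradient of this cost with respect to the unit's pre-activation is proportional to (a - y), which is consistent with the remark that weights update quickly when the error is large and slowly when it is small.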

2. State

The state is represented by a feature vector with six state features. The features are the hyperparameter to be adjusted, the candidate iterate's objective value, the maximum objective value over the past M steps, the dot product between the descent direction and the gradient, the MIN/MAX encoding, the number of function evaluations, and the alignment measure. The first four features can be obtained directly; the last two are defined as follows:

Let F be the list of the M lowest objective values obtained up to time t-1. The state encoding [S_t] is determined by the following formula:

where the encoding is 1 when f(x_t) is smaller than the smallest value in F, 0 when it lies among the previous M values in F, and -1 otherwise. The descent direction is expressed as:

In addition, to make the state features independent of the specific objective function, all features are transformed to lie in the interval [-1, 1].
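The encoding rule and the rescaling to [-1, 1] can be illustrated with a short Python sketch. The function names are hypothetical, "lies among the previous M values" is interpreted here as falling between the smallest and largest stored values, and linear rescaling with clipping is only one possible transformation; the text does not specify which one is used.

```python
def min_max_encoding(f_value, lowest_values):
    """Encode f(x_t) relative to the list of the M lowest objective values.

    Returns 1 if f_value beats every stored value, 0 if it falls within the
    range of the stored values (an assumed reading of the text), else -1.
    """
    if f_value < min(lowest_values):
        return 1
    if f_value <= max(lowest_values):
        return 0
    return -1


def scale_to_unit(value, low, high):
    """Linearly map value from [low, high] to [-1, 1], clipping outliers.

    The linear map is an assumption made for illustration.
    """
    scaled = 2.0 * (value - low) / (high - low) - 1.0
    return max(-1.0, min(1.0, scaled))
```

For example, with stored values `[1.0, 2.0, 3.0]`, a new objective value of 0.5 is encoded as 1, a value of 2.5 as 0, and a value of 4.0 as -1.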

3. Action

For a given state, an action specifies how the learning rate and the regularization parameter are changed. In general, the learning rate and the regularization parameter are very small. Therefore, a strategy is adopted in which the learning rate or regularization parameter is reset to its initial value after an iterate is accepted. When controlling the learning rate, there are thus two actions: keep the learning rate or halve it. When adjusting the regularization coefficient, in addition to these two options, it may also be increased by a quarter.
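A minimal sketch of this action set is shown below. The dictionary names, action labels, and helper functions are hypothetical and chosen only for illustration. The multiplicative factors follow the text: keep, halve, and (for the regularization coefficient only) increase by a quarter.

```python
# Multiplicative factors for each action (labels are illustrative).
LEARNING_RATE_ACTIONS = {"keep": 1.0, "halve": 0.5}
REGULARIZATION_ACTIONS = {"keep": 1.0, "halve": 0.5, "increase_quarter": 1.25}


def propose(value, action, factors):
    """Apply the chosen action to the current hyperparameter value."""
    return value * factors[action]


def after_step(value, initial_value, accepted):
    """Reset to the initial value once an iterate has been accepted."""
    return initial_value if accepted else value


# Example: halve a learning rate of 0.1, then reset after the iterate is accepted.
learning_rate = propose(0.1, "halve", LEARNING_RATE_ACTIONS)
learning_rate = after_step(learning_rate, initial_value=0.1, accepted=True)
```

Keeping the factors in small lookup tables makes the action space explicit, so the number of DQN outputs can be read directly from the table sizes.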

4. Reward

To adjust the learning rate, the reward is defined as the inverse distance from the training loss objective to its lower bound. The reinforcement learning reward for adjusting the learning rate, r_id(f, x_t), is given by:

where f_lb is the target lower bound of the function value; in general it can be set to zero when the objective is a sum of loss functions. c is the target lower bound.
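The reward expression itself does not appear in this text. Going only by the phrase "inverse distance to a lower bound", one plausible form is r_id(f, x_t) = c / (f(x_t) - f_lb). The sketch below implements that assumed form, treating c as a positive scaling constant; both the formula and that reading of c are assumptions, not something the text confirms.

```python
def inverse_distance_reward(f_value, f_lower_bound=0.0, c=1.0):
    """Assumed reward: c / (f(x_t) - f_lb); larger as the loss nears its bound."""
    gap = f_value - f_lower_bound
    if gap <= 0:
        raise ValueError("objective value must be above its lower bound")
    return c / gap
```

With f_lb set to zero, as the text suggests for a sum of loss functions, a training loss of 0.5 gives a reward of 2.0 and a loss of 0.25 gives 4.0, so the reward grows as the loss shrinks.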

5. Experience replay

Experience replay is applied during training: each time the parameters of the neural network are updated, a small batch of earlier training results is drawn at random from the stored data to help train the neural network.

An experience consists of a tuple (s_i, a_i, r_{i+1}, s_{i+1}, label)^j, where i denotes the time step and j denotes the e_greed value. These tuples are stored in the experience memory E. In addition to updating the DQN with the most recent experiences, a subset S ⊆ E is drawn from memory to update the DQN in mini-batches.
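The replay memory described above can be sketched with the Python standard library. The class name `ReplayMemory`, the fixed capacity, and the field names of `Experience` are illustrative assumptions; the text specifies only that tuples are stored and that random subsets are drawn for mini-batch updates.

```python
import random
from collections import deque, namedtuple

# One stored experience (s_i, a_i, r_{i+1}, s_{i+1}, label).
Experience = namedtuple(
    "Experience", ["state", "action", "reward", "next_state", "label"]
)


class ReplayMemory:
    """Fixed-size experience memory E with uniform random mini-batch sampling."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, label):
        self.buffer.append(Experience(state, action, reward, next_state, label))

    def sample(self, batch_size):
        """Draw a random subset S of E without replacement."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from past experiences breaks the correlation between consecutive updates, which is the usual motivation for experience replay in DQN training.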

6. Training results

In this embodiment, the LSTM prediction model has 6 inputs (time, wind speed at the wind farm, real-time power, frequency, wind direction, outdoor temperature) and one output (load). The recurrent neural network has 3 layers with 128 hidden units, and softsign is used as the activation function. The experiment uses 8932 load records, of which 80% serve as training data and 20% as the test set. The discount factor λ is initially set to 0.99, and the exploration probability is set to 1 and decays uniformly to 0.1 within 100 steps. With a DQN learning rate of 0.05 and e_greed of 0.01, the DQN loss, shown on the vertical axis of Figure 4, begins to converge clearly. After 8 hours of training and 100 iteration steps, the DQN reaches an accuracy of about 40%, which is 10% higher than the baseline, and its loss also converges faster than the baseline (gradient descent). In Figure 5, the horizontal axis is the number of iteration steps and the vertical axis is the prediction model accuracy; compared with gradient descent, the accuracy of the Q gradient descent model rises markedly faster and further. In Figure 6, the horizontal axis is the number of iteration steps and the vertical axis is the prediction model error for Q gradient descent and gradient descent; the error of the Q gradient descent model falls more quickly. Limited by computing power, we ran only 100 iterations, but the results show that the DQN's learning ability makes the accuracy of the target network rise quickly and its error converge well.
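Two of the settings above are easy to make concrete: the exploration probability that decays uniformly from 1 to 0.1 within 100 steps, and the 80%/20% split of the 8932 records. The sketch below assumes a linear decay and rounds the training count down; the function names and the rounding choice are assumptions, since the text gives only the percentages and endpoints.

```python
def exploration_probability(step, start=1.0, end=0.1, decay_steps=100):
    """Linearly decay the exploration probability from start to end."""
    if step >= decay_steps:
        return end
    return start + (end - start) * step / decay_steps


def split_counts(total, train_fraction=0.8):
    """Return (train_count, test_count), rounding the training share down."""
    train = int(total * train_fraction)
    return train, total - train


train_count, test_count = split_counts(8932)
```

Under these assumptions the 8932 records split into 7145 training samples and 1787 test samples, and the exploration probability is 0.55 halfway through the decay.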

The above is only a specific embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Anyone skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. An LSTM wind power load forecasting method using deep Q neural network parameter tuning, characterized in that the method comprises the following steps:

1) collecting raw data from the power system environment, and selecting a training set and a prediction set;

2) using an LSTM as the prediction model, and using a DQN to adjust the hyperparameters of the prediction model;

3) feeding the training set into the prediction model with the adjusted parameters, and feeding the training results back to the DQN for parameter optimization to obtain the optimal LSTM prediction model;

4) performing wind power load forecasting with the optimal LSTM prediction model.

2. The LSTM wind power load forecasting method using deep Q neural network parameter tuning according to claim 1, characterized in that, in step 2), adjusting the parameters of the prediction model with the DQN comprises environment parameter adjustment, state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate; the environment parameter adjustment combines the LSTM prediction model with a series of actions to form a Markov decision model, and state adjustment, action selection, and the reinforcement learning reward for adjusting the learning rate are implemented on the basis of this Markov decision model.

3. The LSTM wind power load forecasting method using deep Q neural network parameter tuning according to claim 2, characterized in that the environment parameter adjustment is as follows:

The learning rate adjustment function f(x) is used to adapt the learning rate, and the regularization parameter adjustment function g(x) is used to adapt the regularization parameter. Let (p, y) be a training sample, where p is the input, including the learning rate x_t and the regularization parameter z_t, y is the expected output, and a is the actual output. Then:

where n is the number of samples.

4. The LSTM wind power load forecasting method using deep Q neural network parameter tuning according to claim 3, characterized in that the state adjustment is as follows:

The state is represented by a feature vector containing six state features. The features include the hyperparameter to be adjusted, the candidate iterate's objective value, the maximum objective value over the past M steps, the dot product between the descent direction and the gradient, the MIN/MAX encoding, the number of function evaluations, and the alignment measure. Then:

Let F be the list of the M lowest objective values obtained up to time t-1. The state encoding [S_t] is determined by the following formula:

where the encoding is 1 when f(x_t) is less than the minimum value of F, 0 when it lies among the previous M values in F, and -1 otherwise;

The state adjustment [s_t]_alignment is given by:

5. The LSTM wind power load forecasting method using deep Q neural network parameter tuning according to claim 4, characterized in that the descent direction is expressed as:

In the formula, the symbols denote, respectively, a gradient and the mean value of the learning rate.

6. The LSTM wind power load forecasting method using deep Q neural network parameter tuning according to claim 5, characterized in that the action selection is as follows:

For a given state, action selection uses the strategy of resetting the learning rate or regularization parameter to its initial value after an iterate is accepted. When controlling the learning rate, there are two actions: keep the learning rate or halve it. When adjusting the regularization coefficient, in addition to these two options, it may also be increased by a quarter.

7. The LSTM wind power load forecasting method using deep Q neural network parameter tuning according to claim 6, characterized in that the reinforcement learning reward for adjusting the learning rate, r_id(f, x_t), is expressed as:

where f_lb is the target lower bound of the function value and c is the target lower bound value.

8. The LSTM wind power load forecasting method using deep Q neural network parameter tuning according to claim 1, characterized in that step 3) is as follows:

Experience replay is applied during training: each time the parameters of the neural network are updated, some of the earlier training results are drawn at random from the stored data and used to update the DQN, which in turn yields the optimal LSTM prediction model.