CN113962364B

CN113962364B - A Multi-factor Power Load Forecasting Method Based on Deep Learning

Info

Publication number: CN113962364B
Application number: CN202111232185.5A
Authority: CN
Inventors: 朱敏; 明章强; 闫建荣; 张万利; 赵志龙
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2021-10-22
Filing date: 2021-10-22
Publication date: 2023-04-18
Anticipated expiration: 2041-10-22
Also published as: CN113962364A

Abstract

The invention discloses a multi-factor power load forecasting method based on deep learning. First, power load data and environmental impact data are acquired and stored. The data are then preprocessed and standardized through abnormal data detection based on the k-nearest neighbor (KNN) algorithm and an improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization. Next, an improved CNN-LSTM power load prediction model is proposed: a CNN feature extraction module learns local features of the input data; the result is fed into an LSTM sequence learning model, which extracts sequence feature information; meanwhile, a self-attention mechanism is introduced into the LSTM to learn the features of the LSTM hidden layers, extracting key features by assigning different attention weights so as to improve the final prediction accuracy. Finally, the power load is predicted. The invention can promote the digital upgrade of the power grid, meet the personalized needs of users, and support industry correlation analysis, power generation dispatching, prediction of power consumption trends, guidance for the resumption of work and production, and the like.

Description

A multi-factor electricity load forecasting method based on deep learning

Technical Field

The present invention relates to the intersection of artificial intelligence and power load forecasting, and more specifically to a multi-factor user power load forecasting method based on deep learning.

Background Art

In recent years, with the rapid, high-quality development of the national economy, living standards have continued to improve and the demand for electricity has continued to grow. The state has also accelerated the deployment of power projects to ensure a sufficient power supply. However, because the generation stage of the related projects lags behind, decision makers in the power sector cannot accurately track changes in grid load; this leads to decision-making errors and a failure to maintain a reasonable balance between power supply and demand, and wastes a large amount of power resources. Forecasting user power load is significant in the following ways: (1) Mining user power consumption patterns. Analyzing user consumption information and mining its regularities, combined with factors such as region and time, makes it possible to further explore consumption patterns and the reasons for their differences. Grid load forecasts help the power department dispatch power resources and provide personalized services across regions and users. (2) Guiding power transmission at the geographic scale. Grid load forecasts at the geographic scale can be combined with regional differences in power generation so that the power department can dispatch and transmit power resources rationally. (3) Supporting a visual interactive system. A prediction and visual analysis model based on power data is built on the load forecasting results and, supported by forecasting methods that compare well in analysis, a grid load forecasting system is realized. Such a system can vividly display the predicted load for different regions, time periods, and categories of power users, and the factors influencing the load can also be shown in the visualization system.

A power grid load forecasting system takes into account influencing factors such as users, region, time, environment, and climate, and forecasts the power load at a future moment or period by building and training a model. Some earlier methods simulated a logistic growth curve model, a multiple linear regression model, a grey prediction model, and a neural network model to build prediction models, forecast the power supply load and electricity consumption of the Harbin area, and determined the most accurate method through comparative analysis. Some methods are based on BP neural networks and consider the influence of multiple factors such as temperature, obtaining fairly accurate outputs. Some researchers proposed a Lasso-PCA data reduction and feature extraction model to reduce the model's computation and parameter count and improve its execution efficiency, and used an improved adaptive genetic algorithm to optimize BP neural network training; their analysis concluded that the method achieves higher power load prediction accuracy. To address the shortcomings of single-scale studies, a recurrent neural network prediction model using backpropagation on a time-series decomposition can build a prediction model that integrates the influence of many factors and forecast users' electricity consumption in future periods. Other researchers proposed an electricity consumption forecasting model that improves a radial basis function (RBF) neural network with the water wave optimization algorithm, addressing inaccurate load forecasting at the user level. By optimizing the hidden-layer centers and the spread constant of the RBF neural network, they further showed that the method achieves higher accuracy.

With the development of smart grids and the growing demands of the power industry, power load forecasting has become increasingly important, and the required forecasting accuracy keeps rising. A review of similar research and technology shows that most existing grid load forecasting methods fall into two groups. Some rely on traditional algorithms such as regression analysis and time series methods, which cannot account for complex factors such as meteorological data, so their predictions differ greatly from the true values. Others use machine learning but draw on a single data source, such as current and historical power load. Although their short-term forecasts improve on traditional methods, user power load is often strongly affected by environmental factors such as weather, temperature, humidity, and holidays, so these methods still predict poorly and with large deviations. Exploring new multi-factor power load forecasting based on deep learning has therefore become a current research hotspot.

Summary of the Invention

In view of the above problems, the purpose of the present invention is to provide a multi-factor user power load forecasting method based on deep learning that accurately predicts power load across different user categories, regions, and environmental conditions. The technical solution is as follows:

A multi-factor power load forecasting method based on deep learning, comprising the following steps:

Step 1: Obtain power load data covering different regions, years, and electricity consumption categories, as well as environmental impact data including temperature, humidity, and wind force, and store them in a SQLite database;

Step 2: Clean and preprocess the power load data and environmental impact data, including: abnormal data detection based on the k-nearest neighbor algorithm and an improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization;

Step 3: Introduce a self-attention mechanism to reconstruct the features of the LSTM hidden layers and enable end-to-end deployment, thereby building a power load prediction neural network model based on an improved CNN-LSTM, using the power load data and environmental impact data processed in Step 2 as the training and test sets;

Step 4: Forecast the power load using the improved CNN-LSTM power load prediction neural network model.

Furthermore, the power load data includes user number, region, urban/rural grid, user classification, electricity consumption category, customer category, voltage level, industry category, contract capacity, operating capacity, date of first power supply, frozen electricity date, and 15-minute frozen electricity; the environmental impact data includes current date, region, maximum temperature, minimum temperature, average humidity, average wind force, weather type, current temperature at 15-minute intervals, and whether the day is a holiday; and the timestamps of the environmental impact data are synchronized with those of the power load data.

Furthermore, the abnormal data detection based on the k-nearest neighbor algorithm and the improved DBSCAN algorithm in Step 2 is specifically as follows:

Step 2.1: Define the average distance between a sample and its neighboring samples as the anomaly score of that sample. Use the improved KNN algorithm to obtain the anomaly score of the power load data at each moment, and take the total distance from each cluster to its k neighbors as the final anomaly score;

The set N_k(c_i) of the k nearest neighbors of the power load data c_i at a moment i is expressed as:

[formula not reproduced in the source text]

where one symbol (missing from the source text) denotes one of the k neighbors of c_i, d_k(c_i) is the average distance from c_i to its k neighbors, and a second missing symbol denotes the distance between c_i and that neighbor;

The anomaly score of the power load data c_i is then expressed as:

[formula not reproduced in the source text]

where N_k(c_i) denotes the set of the k neighboring points of c_i;

Finally, the first m clusters in the list sorted by anomaly score are output as the outliers of the power load data;
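The scoring rule in Step 2.1 can be sketched as follows. This is a minimal illustration, not the patented "improved KNN": it scores individual one-dimensional readings by their mean absolute distance to their k nearest neighbors and returns the m highest-scoring indices. The function names, the use of absolute difference as the distance, and the per-point (rather than per-cluster) scoring are assumptions.

```python
def knn_anomaly_scores(values, k):
    """Mean distance from each reading to its k nearest neighbors (assumed 1-D data)."""
    if not 0 < k < len(values):
        raise ValueError("k must satisfy 0 < k < number of samples")
    scores = []
    for i, v in enumerate(values):
        # Distances to every other reading, smallest first.
        dists = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def top_m_outliers(values, k, m):
    """Indices of the m readings with the highest anomaly score."""
    scores = knn_anomaly_scores(values, k)
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    return ranked[:m]


# Hypothetical 15-minute load readings with one obvious spike at index 3.
readings = [10.0, 10.2, 9.9, 50.0, 10.1]
outliers = top_m_outliers(readings, k=2, m=1)
```

Replacing the mean with the sum of the k distances gives the "total distance" variant mentioned in the step; the ranking is the same for a fixed k.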

Step 2.2: Use an improved version of the clustering-based anomaly detection algorithm DBSCAN. First, density-cluster small samples of the data using local parameters; then iteratively cluster the local clustering results to obtain the final global clustering result, and mark points that do not belong to any cluster as outliers. Specifically:

Step a) Parameter update:

Set the size M of the clustering sliding window, calculate the average distance difference of the power load data within the window, set the first k neighboring power loads as MinPts, and set the Euclidean distance between power load data as Eps;

A weight is set for each power load data point to reduce its impact on the final clustering result. The weight w(c_i, c_j) is calculated by the following formula:

[formula not reproduced in the source text]

where Cov(c_i, c_j) is the covariance of the power load data c_i at time i and the power load data c_j at time j, Var(c_i) is the variance of c_i, and Var(c_j) is the variance of c_j; r(c_i, c_j) denotes the correlation coefficient of c_i and c_j, and the smaller the value, the greater the correlation;

Step b) Obtain the anomaly detection result for the power load data by analyzing the clustering results:

During clustering, a point marked as an outlier for the first time is set as a candidate outlier and its anomaly score is incremented by 1. As the iterative clustering proceeds to the next candidate outlier, the anomaly scores are updated; a point whose anomaly score S equals the number of clusters C is marked as an outlier.
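One possible reading of the sliding-window voting in Step 2.2 is sketched below. It is an illustration under several assumptions, not the patented algorithm: each window is labelled with plain DBSCAN noise detection on one-dimensional values, the correlation-based weights are omitted, Eps and MinPts are passed in as fixed values, and C is interpreted as the number of clustering passes that covered a point. The function names and the `stride` parameter are hypothetical.

```python
from collections import defaultdict


def dbscan_noise(values, eps, min_pts):
    """Indices that plain DBSCAN labels as noise: not core, and no core point within eps."""
    n = len(values)
    neighbors = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points (itself included) within eps.
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    return {
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    }


def windowed_outliers(series, window, stride, eps, min_pts):
    """Flag points marked as noise in every window-level clustering pass that covers them."""
    score = defaultdict(int)   # S: times a point was a candidate outlier
    passes = defaultdict(int)  # C: clustering passes that covered the point (assumption)
    last_start = max(len(series) - window, 0)
    for start in range(0, last_start + 1, stride):
        chunk = series[start:start + window]
        noise = dbscan_noise(chunk, eps, min_pts)
        for offset in range(len(chunk)):
            passes[start + offset] += 1
            if offset in noise:
                score[start + offset] += 1
    # Points at the tail that no window covers are left unflagged.
    return sorted(i for i in passes if score[i] == passes[i])


series = [10.0, 10.2, 9.9, 10.1, 50.0, 10.0, 10.3, 9.8, 10.2, 10.1]
flagged = windowed_outliers(series, window=6, stride=2, eps=0.5, min_pts=3)
```

Requiring agreement across all overlapping windows is deliberately conservative: a point that looks isolated in only one local window is kept as a candidate rather than reported.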

Furthermore, the autoregressive interpolation in Step 2 is specifically:

Lagrange interpolation is used to fill in missing power load values. Let the polynomial of degree n-1, $y = a_0 + a_1 x + a_2 x^2 + \dots + a_{n-1} x^{n-1}$, pass through the n points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \dots, (x_n, y_n)$. The Lagrange interpolation function is then expressed as:

$$L(x) = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \ne i}^{n} \frac{x - x_j}{x_i - x_j}$$

where x_i and x_j denote the i-th and j-th moments of the power user, y_i denotes the user's power load at the i-th moment, and n denotes the total number of moments of power load.
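The interpolation step can be illustrated with a short function that evaluates the standard Lagrange polynomial through the known readings at a missing time index. The function name and the sample values are hypothetical; in practice only a few neighboring readings would normally be used, since high-degree polynomials can oscillate.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through (xs[i], ys[i]) at position x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Basis polynomial factor: 1 at x = xi, 0 at every other node.
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Known readings at time steps 1, 2 and 4; the reading at step 3 is missing.
filled = lagrange_interpolate([1, 2, 4], [1.0, 4.0, 16.0], 3)
```

Here the three known points lie on y = x², so the value recovered at step 3 is 9 up to floating-point rounding.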

Furthermore, the formula for normalizing the sequence data in Step 2 is:

[formula not reproduced in the source text]

where X = {X_1, X_2, X_3, …, X_N} denotes power load data or environmental impact data of the same category, Y = {Y_1, Y_2, Y_3, …, Y_N} denotes its normalized values, l ∈ [1, N], and N is the total number of power load or environmental impact data points in that category.
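Because the normalization formula itself is missing from the text, the sketch below assumes min-max scaling to [0, 1], a common choice for neural network inputs. That choice, the function name, and the handling of a constant series are all assumptions; the patent may use a different formula such as z-score standardization.

```python
def min_max_normalize(xs):
    """Scale one category of readings to [0, 1] (assumed min-max normalization)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Each category (load, temperature, humidity, and so on) would be normalized separately, and the minimum and maximum from the training data would need to be kept to map predictions back to physical units.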

Furthermore, Step 3 specifically includes:

Step 3.1: Load and prepare the data for the network model

Load the dataset and split it into training, validation, and test sets, then construct a Dataloader as the data reader for the training set and for the test set. During training the model computes on one batch at a time: the Dataloader loads data into memory according to the batch size and shuffles the order of the data in each batch to improve the robustness of model training;
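The split-and-batch behavior described in Step 3.1 can be sketched with the standard library alone. This is a stand-in for a framework data loader, not the loader the patent uses; the 70/10/20 split, the chronological ordering of the split, and the function names are assumptions.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.1):
    """Split samples in order into training, validation and test subsets."""
    n = len(samples)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return (
        samples[:n_train],
        samples[n_train:n_train + n_val],
        samples[n_train + n_val:],
    )


def iter_batches(samples, batch_size, shuffle=True, seed=None):
    """Yield mini-batches, optionally visiting the samples in shuffled order."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


train_set, val_set, test_set = split_dataset(list(range(10)))
batches = list(iter_batches(train_set, batch_size=3, seed=0))
```

Splitting before shuffling keeps later readings out of the training set, which matters for time series; shuffling then only changes the order in which training windows are visited.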

Step 3.2: Construct the power load prediction neural network model based on the improved CNN-LSTM

The neural network model consists of three parts: a CNN feature learning module, an LSTM sequence learning module, and a self-attention mechanism module:

1) The CNN feature learning module includes three one-dimensional convolutional layers, with a MaxPooling layer and a ReLU layer inserted between each pair of consecutive convolutional layers. The convolution operations learn features of the standardized power load data and environmental impact data, which the convolutional layers output as feature maps; the MaxPooling layers are added to alleviate the limitation of the invariance of the generated feature maps, and the ReLU activation function enhances the model's ability to learn complex structures;

2) The LSTM sequence learning module includes three LSTM layers, each containing twenty neurons. The first two LSTM layers output the complete sequence of hidden states, while the last LSTM layer outputs only the hidden state of the final time step;

The input of an LSTM layer is the influencing parameter x_t at time t and the predicted value h_{t-1} at the previous time; the prediction function F yields the predicted value h_t at time t:

h_t = F(x_t, h_{t-1})

3) A self-attention mechanism module is inserted between each pair of adjacent LSTM layers. It assigns weights to the hidden-layer features extracted by the LSTM layer, thereby mining more discriminative features of the power load data and environmental impact data;

At time t, the hidden-layer output sequence of the penultimate LSTM layer is assigned the self-attention weight w_tl, expressed as:

[formula not reproduced in the source text]

where L_h is the length of the sequence output by the LSTM hidden layer, l is the index within that output sequence, and s_tl is the similarity between sequence l of the LSTM hidden layer at time t and the other sequences;

The feature h_tl of sequence l is multiplied by its corresponding weight w_tl to form the new feature sequence h'_tl, which is input to the next LSTM layer;
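The reweighting in item 3) can be illustrated as follows. Since the weight formula is missing from the text, the sketch assumes the usual softmax normalization of the similarity scores over the L_h sequence positions, that is, w_tl = exp(s_tl) / Σ_k exp(s_tk). That formula, the function names, and the sample scores are assumptions, and how the similarities s_tl are computed is not specified here.

```python
import math


def attention_weights(similarities):
    """Softmax over similarity scores (assumed form of the missing weight formula)."""
    peak = max(similarities)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(hidden, similarities):
    """Multiply each hidden feature vector h_tl by its attention weight w_tl."""
    weights = attention_weights(similarities)
    return [[w * v for v in h] for w, h in zip(weights, hidden)]


# Two sequence positions with equal similarity scores receive equal weights.
new_features = reweight([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

The reweighted vectors play the role of h'_tl and would be passed to the next LSTM layer in place of the raw hidden states.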

Step 3.3: Define the optimizer and set the model training parameters

Use the mean absolute error as the loss function and monitor the validation loss. Use the adaptive optimizer Adam, which adapts the learning rate during training, and set the number of training epochs, the initial loss function, and the batch size.
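Steps 3.2 and 3.3 together might look like the following PyTorch sketch. The patent does not name a framework, so PyTorch is an assumption, as are the class name `AttnCNNLSTM`, the channel count, kernel size, dropout rate, learning rate, the input shape (96 steps corresponds to one day of 15-minute readings), and the learned linear scorer used to produce attention scores. Only the layer counts, the twenty hidden units, the MAE loss, the Adam optimizer, and the Dropout and FC output stage come from the text.

```python
import torch
from torch import nn


class AttnCNNLSTM(nn.Module):
    """Three Conv1d layers, three 20-unit LSTM layers, attention between LSTM layers."""

    def __init__(self, n_features, hidden=20, channels=32, dropout=0.2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, channels, kernel_size=3, padding=1),
            nn.MaxPool1d(2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.MaxPool1d(2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
        )
        self.lstm1 = nn.LSTM(channels, hidden, batch_first=True)
        self.lstm2 = nn.LSTM(hidden, hidden, batch_first=True)
        self.lstm3 = nn.LSTM(hidden, hidden, batch_first=True)
        # Assumption: attention scores come from a learned linear layer.
        self.score1 = nn.Linear(hidden, 1)
        self.score2 = nn.Linear(hidden, 1)
        self.head = nn.Sequential(nn.Dropout(dropout), nn.Linear(hidden, 1))

    @staticmethod
    def _attend(seq, scorer):
        # Softmax over the time axis, then reweight each hidden state.
        weights = torch.softmax(scorer(seq), dim=1)  # (batch, steps, 1)
        return seq * weights

    def forward(self, x):  # x: (batch, steps, n_features)
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (batch, steps/4, channels)
        h1, _ = self.lstm1(z)
        h2, _ = self.lstm2(self._attend(h1, self.score1))
        h3, _ = self.lstm3(self._attend(h2, self.score2))
        return self.head(h3[:, -1, :])  # hidden state of the last time step only


model = AttnCNNLSTM(n_features=8)
loss_fn = nn.L1Loss()  # mean absolute error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on random stand-in data.
x = torch.randn(4, 96, 8)
y = torch.randn(4, 1)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

A real training loop would repeat this step over all batches for the chosen number of epochs and evaluate the same loss on the validation set after each epoch.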

Furthermore, the power load forecasting in Step 4 includes ultra-short-term, short-term, and medium- and long-term power load forecasting.

The beneficial effects of the present invention are:

1) The multi-factor power load forecasting method based on deep learning proposed by the present invention addresses the shortcomings of existing methods, which draw on a single data source, cannot account for complex factors such as meteorological data, and produce predictions that differ greatly from the true values. By introducing external environmental factors such as temperature, humidity, and wind force as influencing factors of the power load and combining them with historical power load data as input features, the method can forecast the current power load, and taking environmental factors into account effectively improves prediction accuracy.

2) The method proposes an approach to detecting abnormal power load data that combines KNN with an improved DBSCAN. It can effectively identify outlying values in the power load data, and restoring missing or abnormal load data reduces the impact of abnormal values on the prediction model.

3) The method proposes an improved CNN-LSTM power load prediction model. First, a CNN feature extraction module learns the local features of the input data; the result is then fed into an LSTM sequence learning model to extract sequence feature information. Meanwhile, a self-attention mechanism is introduced into the LSTM to learn the features of the LSTM hidden layers, extracting key features by assigning different attention weights so as to improve the final prediction accuracy. Finally, the power load is predicted through a Dropout layer and a fully connected (FC) layer. In addition, the improved CNN-LSTM prediction model not only achieves higher prediction accuracy but also simplifies the neural network construction process and supports end-to-end deployment, so that research results can be put to use quickly.

4) The method forecasts the ultra-short-term, short-term, and medium- and long-term power loads of different categories of users, visualizes the forecast results, and supports analysis of electricity consumption before and after the epidemic, analysis of the distribution of electricity consumption, and power user profiling. The present invention can promote the digital upgrade of the power grid, meet the personalized needs of users, and support industry correlation analysis, power generation dispatching, prediction of electricity consumption trends, and guidance for the resumption of work and production. The present invention can run as an independent system or be embedded as a component in the power department's existing systems, which saves considerable resources and improves work efficiency.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the overall process of the present invention.

FIG. 2 is a schematic diagram of the data processing flow of the present invention.

FIG. 3 is a schematic diagram of the structure of the improved CNN-LSTM power load prediction model proposed by the present invention.

FIG. 4 is a line chart comparing the predicted and actual ultra-short-term commercial power load for a certain region: (a) compares the predicted and actual May 1 commercial power load in the region in 2019; (b) compares the region's May 1 commercial power load in 2018, 2019, and 2020.

FIG. 5 is a line chart comparing the predicted and actual short-term commercial power load for a certain region.

FIG. 6 is a line chart comparing the predicted and actual medium- and long-term commercial power load for a certain region: (a) is the weekly power load forecast for the region; (b) is the monthly power load forecast for the region.

FIG. 7 is a schematic diagram of an application scheme of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The present invention introduces external environmental factors such as temperature, humidity, and wind force as influencing factors of the power load, and preprocesses and standardizes the data using abnormal data detection based on the k-nearest neighbor (KNN) algorithm and an improved DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, autoregressive interpolation, and sequence data normalization. It then proposes an improved CNN-LSTM power load prediction model: a CNN feature extraction module first learns the local features of the input data; the result is fed into an LSTM sequence learning model to extract sequence feature information; meanwhile, a self-attention mechanism is introduced into the LSTM to learn the features of the LSTM hidden layers, extracting key features by assigning different attention weights so as to improve the final prediction accuracy. Finally, ultra-short-term, short-term, and medium- and long-term power load forecasting is performed, supporting analysis of electricity consumption before and after the epidemic, analysis of the distribution of electricity consumption, and power user profiling. The present invention can promote the digital upgrade of the power grid, meet the personalized needs of users, and support industry correlation analysis, power generation dispatching, prediction of electricity consumption trends, and guidance for the resumption of work and production.

As shown in FIG. 1, the implementation flow of the deep-learning-based multi-factor power load forecasting method of the present invention includes: acquisition and storage of power load data and external environment data; data cleaning and preprocessing based on KNN and the improved DBSCAN algorithm; construction and training of the improved CNN-LSTM model; and power load forecasting. The specific implementation steps are as follows:

Step 1: Obtain load data for different regions, years, and electricity consumption categories, together with data on external environmental factors such as temperature, humidity, and wind force, and store them in a SQLite database. The specific steps are as follows:

1) Load data for different regions, years, and electricity consumption categories is obtained from channels such as the State Grid user electricity consumption information release platform. The data fields include user number (anonymized), region, urban/rural grid, user classification (such as high voltage, charging pile, grid-connected high-voltage self-owned power plant, low-voltage non-residential, and low-voltage residential), electricity consumption category (household, commercial, urban, rural, etc.), customer category (normal customers and customers with changed service), voltage level (ultra-high, high, low), industry category, contract capacity, operating capacity, date of first power supply, frozen electricity date, and 15-minute frozen electricity. Table 1 describes the power load data fields. The power load data is stored in a SQLite database to ease data migration and system deployment.

Table 1 Description of power load data fields

2) Data on external environmental factors such as temperature, humidity, and wind force is acquired and stored by a Python crawler program. The crawling process involves simulated login, page fetching and parsing, data structure design, and storing the external environment data in the database. The data is obtained from the official interface of the national meteorological website. The data fields include current date, region, maximum temperature, minimum temperature, average humidity, average wind force, weather type (a value in the range [1, 4] indicating the degree from overcast to sunny), current temperature at 15-minute intervals, and whether the day is a holiday (0 or 1). Table 2 describes the external environment data fields. The timestamps of the environmental data are synchronized with those of the power load data. Likewise, the environmental data is stored in a SQLite database to ease data migration and system deployment.

Table 2 Description of external environment data fields

Field name | Field meaning
WEATHER_DATE | Current date
WEATHER_ADDR | Region
MAX_TEMP | Maximum temperature
LOW_TEMP | Minimum temperature
AVERAGE_HUMIDITY | Average humidity
AVERAGE_WIND | Average wind force
WEATHER_TYPE | Weather type (value in [1, 4], from overcast to sunny)
R1_WEATHER | Current temperature at 15-minute intervals
IS_HOLIDAY | Whether the day is a holiday (0 or 1)
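A minimal sketch of how the Table 2 fields could be stored with Python's built-in `sqlite3` module is shown below. The column names come from Table 2; the table name `weather`, the column types, the CHECK constraints, and the sample row are assumptions made for illustration, and an in-memory database is used so the example leaves no file behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would be used in a real deployment
conn.execute(
    """
    CREATE TABLE weather (
        WEATHER_DATE     TEXT NOT NULL,
        WEATHER_ADDR     TEXT NOT NULL,
        MAX_TEMP         REAL,
        LOW_TEMP         REAL,
        AVERAGE_HUMIDITY REAL,
        AVERAGE_WIND     REAL,
        WEATHER_TYPE     INTEGER CHECK (WEATHER_TYPE BETWEEN 1 AND 4),
        R1_WEATHER       REAL,
        IS_HOLIDAY       INTEGER CHECK (IS_HOLIDAY IN (0, 1))
    )
    """
)

# Hypothetical record for one region on a public holiday.
conn.execute(
    "INSERT INTO weather VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("2019-05-01", "region-A", 26.0, 15.0, 60.0, 2.0, 4, 18.5, 1),
)
conn.commit()

holiday_rows = conn.execute(
    "SELECT WEATHER_DATE, IS_HOLIDAY FROM weather WHERE IS_HOLIDAY = 1"
).fetchall()
```

Because SQLite keeps the whole database in a single file, copying that file is enough to move the data to another machine, which matches the migration and deployment motive given in the text.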

Step 2: Clean and preprocess the power load data and the environmental impact data. This includes abnormal data detection based on the k-nearest neighbor (KNN) algorithm and an improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization. Figure 2 is a schematic diagram of the data processing flow of the present invention.

The specific steps are as follows:

1) Detection of abnormal power load data based on the k-nearest neighbor (KNN) algorithm and the improved DBSCAN algorithm. Because of power equipment failures, human error, and similar factors, the acquired user load data contain a small number of missing or abnormal values. These values strongly affect the accuracy of load forecasting, so they must be detected and then restored to normal values by elimination or interpolation.

The present invention proposes an outlier detection method based on an improved KNN algorithm. The basic rule of KNN is to find the $k$ nearest (most similar) samples of a sample $c$ in the feature space ($1 < k < N$, where $N$ is the total number of samples); if most of these $k$ neighbors belong to one class, then $c$ belongs to that class as well. The average distance between $c$ and its neighbors is taken as the anomaly score of $c$: the higher the score, the more anomalous the sample. To better reflect anomalies in the power load data, after the improved KNN algorithm has produced an anomaly score for the load data at each moment, the total distance from each cluster to its respective $k$ neighbors is used as the final anomaly score. The set $N_k(c_i)$ of the $k$ neighbors of the load data $c_i$ at moment $i$ is expressed as:

[Formula not reproduced in the source text.]

In this formula, the first symbol (not reproduced) denotes one of the $k$ neighboring points of $c_i$, $d_k(c_i)$ is the average distance from $c_i$ to its $k$ neighbors, and the second symbol (not reproduced) denotes the distance between $c_i$ and that neighboring point. The anomaly score of the load data $c_i$ is then expressed as:

[Formula not reproduced in the source text.]

where $N_k(c_i)$ denotes the set of the $k$ neighboring points of $c_i$.

Finally, the top $m$ clusters in the list sorted by anomaly score are output as the outliers of the power load data.
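The scoring idea described above (mean distance to the $k$ nearest neighbors, then reporting the top $m$ scores) can be sketched as follows. This is a plain KNN distance score on one-dimensional load values, not the patent's improved variant, whose formulas are not available here; the function names and the sample data are hypothetical.

```python
def knn_anomaly_scores(values, k):
    """Return, for each value, the mean distance to its k nearest neighbours."""
    if not 1 < k < len(values):
        raise ValueError("k must satisfy 1 < k < N")
    scores = []
    for i, v in enumerate(values):
        # Distances from sample i to every other sample (1-D absolute difference).
        dists = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def top_m_outliers(values, k, m):
    """Indices of the m samples with the highest anomaly score."""
    scores = knn_anomaly_scores(values, k)
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    return ranked[:m]


# Hypothetical load readings with one obvious spike at index 4.
load = [10.2, 10.5, 9.8, 10.1, 55.0, 10.3, 9.9]
print(top_m_outliers(load, k=3, m=1))  # [4]
```

The brute-force neighbor search is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by an indexed search on long 15-minute series.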

Because abnormal and normal clusters interfere with each other when the KNN algorithm computes the $k$ neighbors of a cluster $c_i$, the detection results are prone to false detections and missed detections. To address this, the present invention proposes an improved clustering-based anomaly detection algorithm built on DBSCAN. DBSCAN is a typical density-based clustering algorithm: it groups densely connected samples into clusters, and points that belong to no cluster are regarded as anomalies. Whereas the traditional DBSCAN algorithm clusters with globally uniform parameters Eps and MinPts, the present invention proposes an improved data partitioning method based on a sliding window: density clustering is first performed on small samples of data using local parameters, and the local clustering results are then clustered iteratively to obtain the final global clustering result. The improved DBSCAN algorithm comprises three steps: parameter update, clustering, and anomaly detection.

During parameter update, the size $M$ of the clustering sliding window is set first and the average distance difference of the load data within the window is calculated; the first $k$ neighboring loads are set as MinPts, and the Euclidean distance between load data is set as Eps. To mitigate inconsistency among the load data, a weight is assigned to each load datum to reduce its influence on the final clustering result. The weight $w(c_i, c_j)$ is calculated as follows:

[Formula not reproduced in the source text.]

where $\mathrm{Cov}(c_i, c_j)$ is the covariance of the load data $c_i$ at time $i$ and the load data $c_j$ at time $j$, $\mathrm{Var}(c_i)$ is the variance of $c_i$, and $\mathrm{Var}(c_j)$ is the variance of $c_j$; $r(c_i, c_j)$ denotes the correlation coefficient of $c_i$ and $c_j$, and the smaller its value, the greater the correlation. The anomaly detection result for the load data is obtained by analyzing the clustering results. During clustering, a point marked as anomalous for the first time is set as a candidate anomaly and its anomaly score is incremented by 1 (initial value 0); the iterative clustering loop then moves on to the next candidate anomaly and updates its score. If the anomaly score $S$ equals the cluster number $C$, the point is marked as an anomaly.
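The weight formula itself is not available in this text, but the quantities it is described with (a covariance and two variances) are those of the standard Pearson correlation coefficient. Assuming that is what $r(c_i, c_j)$ denotes, it would read:

```latex
r(c_i, c_j) = \frac{\operatorname{Cov}(c_i, c_j)}{\sqrt{\operatorname{Var}(c_i)}\,\sqrt{\operatorname{Var}(c_j)}}
```

How the weight $w(c_i, c_j)$ is derived from $r(c_i, c_j)$ is not stated here and is left open.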

2) Missing power load values are usually filled in by interpolation; the present invention uses Lagrange interpolation to fill in the missing load data. The idea of interpolation is to construct a suitable interpolation function $f(x)$ from the known points; the value at an unknown point $x_i$ is then $f(x_i)$, so that $(x_i, f(x_i))$ can be used to approximate the missing point.

The idea of Lagrange interpolation is to make a polynomial of degree $n-1$, $y = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$, pass through the $n$ points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$. The Lagrange interpolation function can then be expressed as:

$$f(x) = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$

where $x_i$ denotes the $i$-th moment of the power user, $y_i$ denotes the power load of the user at the $i$-th moment, and $n$ denotes the total number of moments of power load.
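A direct evaluation of the Lagrange interpolation polynomial makes the gap-filling step concrete. The sketch below is illustrative only: the function name and the sample readings are hypothetical, and the test data are chosen to follow $y = x^2$ so that the interpolated value is easy to check by hand.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Basis factor: equals 1 at x == xi and 0 at every other node.
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Known readings at time steps 1, 2, 4, 5; the reading at step 3 is missing.
times = [1, 2, 4, 5]
loads = [1.0, 4.0, 16.0, 25.0]  # follows y = x**2, so the gap should be close to 9
filled = lagrange_interpolate(times, loads, 3)
print(round(filled, 6))
```

In practice only a handful of points around each gap would normally be used, since a single high-degree polynomial through many evenly spaced points can oscillate strongly between them.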

3) The user power consumption data and the environmental data are normalized. Data of different categories, and even of the same category, differ greatly in magnitude: load data and temperature have very different numerical ranges, and the load data of different user types also differ widely. Without normalization, model training is strongly affected, and the influence of a given parameter may be masked or amplified. The normalization formula for each indicator is therefore:

[Formula not reproduced in the source text.]

where $X = \{X_1, X_2, X_3, \ldots, X_N\}$ denotes data of a single category, such as power load, temperature, or wind force, $Y = \{Y_1, Y_2, Y_3, \ldots, Y_N\}$ denotes the corresponding normalized values, and $N$ is the total number of load or environmental data points in that category.
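The normalization formula is not available in this text. Min-max scaling to [0, 1] is one common choice for putting load and weather series on the same scale, and it is used below purely as an assumption; the function name and sample values are hypothetical.

```python
def min_max_normalize(xs):
    """Scale one category of data to [0, 1] (assumed min-max normalization)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


temperatures = [12.0, 18.0, 30.0]     # made-up daily maximum temperatures
normalized = min_max_normalize(temperatures)
print(normalized)
```

Each category (load, temperature, humidity, and so on) would be normalized separately, and the minimum and maximum kept so that predictions can be mapped back to physical units.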

Step 3: Construct the power load prediction network model based on the improved CNN-LSTM. Figure 3 is a schematic diagram of the power load prediction model based on the improved CNN-LSTM proposed by the present invention. CNN and LSTM are among the most widely used deep learning techniques. A CNN can extract valuable features and filter noise out of the input data, while an LSTM network can capture sequential patterns. Although an LSTM network can process time-correlated sequence information, it only uses the attributes provided in the training set; a CNN, by contrast, can extract local features and the same features appearing in different regions, but it cannot process temporal information. A hybrid model that combines the strengths of both techniques can therefore improve prediction accuracy. In addition, the present invention introduces a self-attention mechanism to improve the CNN-LSTM by reconstructing the features of the LSTM hidden layers.

The specific steps are as follows:

Step 3.1) Preparation of the training data for the power load prediction model. The power load data and environmental data processed in Step 2 are used as training data. Taking short-term (daily) load forecasting as an example, the training data are the preprocessed daily power load of a user over the 365 days of a year together with environmental factors such as maximum temperature, minimum temperature, average temperature, and average humidity. The power load prediction model is a neural network. The data set is first loaded and divided into training, validation, and test sets in the ratio 7:2:1, and a Dataloader is then constructed for the training set and the test set respectively to read the data. During training, computation is performed one batch at a time: the Dataloader loads data into memory according to the batch size and shuffles the data of each batch to improve the robustness of training. The present invention sets the batch size to 10.
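The 7:2:1 split and the batching by 10 can be sketched in plain Python. Whether the patent splits chronologically or at random is not stated; the sketch below assumes a contiguous, in-order split, and the helper names are hypothetical.

```python
def split_7_2_1(samples):
    """Split samples into contiguous training, validation, and test parts (7:2:1)."""
    n = len(samples)
    n_train = n * 7 // 10          # integer arithmetic avoids float rounding
    n_val = n * 2 // 10
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


def batches(samples, batch_size=10):
    """Yield consecutive batches; the last one may be shorter."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


days = list(range(365))            # stand-in for 365 daily samples
train, val, test = split_7_2_1(days)
print(len(train), len(val), len(test))   # 255 73 37
print(len(list(batches(train))))         # 26
```

In a PyTorch pipeline the batching and shuffling would typically be delegated to `torch.utils.data.DataLoader` with `batch_size=10` and `shuffle=True`.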

Step 3.2) Construction of the power load prediction model. The deep learning model in the present invention is implemented in Python, and the improved CNN-LSTM power load prediction model is built on the PyTorch deep learning library. The model comprises three parts: a CNN feature learning module, an LSTM sequence learning module, and a self-attention mechanism module.

The CNN feature learning module consists of three one-dimensional convolutional layers, with a MaxPooling layer and a ReLU layer inserted between consecutive convolutional layers. The convolution operations learn features of the normalized power load and environmental data and can extract more discriminative features from the input. The feature map output by a convolutional layer has one limitation: it records the precise position of the features in the input. A MaxPooling layer is therefore usually added after the convolutional layer to relieve this limitation by making the feature map less sensitive to position, while the ReLU activation function strengthens the model's ability to learn complex structures, reduces the overall computational load, and makes the network easier to train.

When forecasting power load, the temporal correlation of the load data should be fully considered. Compared with a traditional recurrent neural network, LSTM can learn long-term dependencies in a time series and is well suited to long-period load data, so the present invention uses LSTM for load forecasting to obtain higher prediction accuracy. The inputs are the influencing parameters $x_t$ at time $t$ and the predicted value $h_{t-1}$ of the previous time step; the prediction function $F$ yields the predicted value $h_t$ at time $t$:

$$h_t = F(x_t, h_{t-1})$$

By adding memory cells and control gates to the recurrent neural network, LSTM strengthens its ability to handle long-sequence dependencies and performs well in power load forecasting.

In the LSTM sequence learning module, the present invention uses three LSTM layers, each containing twenty neurons. The first two LSTM layers output the complete sequence of hidden states, while the last LSTM layer outputs the hidden state of the final time step only.

To fully exploit the contribution of the LSTM hidden-layer features to the final predictive ability of the model, the present invention adds a self-attention module between each pair of adjacent LSTM layers in the sequence learning module. The self-attention mechanism assigns weights to the hidden-layer features extracted by the LSTM, thereby mining more discriminative features of the power load and environmental data and improving the final prediction accuracy. The hidden-layer output sequence of the penultimate LSTM layer at time $t$ is assigned the self-attention weight $w_{tl}$, expressed as:

[Formula not reproduced in the source text.]

The feature $h_{tl}$ of sequence $l$ is multiplied by its corresponding weight $w_{tl}$ to form the new feature sequence $h'_{tl}$, which is input to the next LSTM layer. Weighting the LSTM hidden-layer features with self-attention helps capture the key information in the sequence and thus improves the accuracy and efficiency of prediction.

In deep learning models, a dropout layer randomly selects neurons and deactivates some of them during training to prevent overfitting. In the present invention, a dropout layer is added between the CNN feature extraction block and the LSTM sequence learning block. The output of the LSTM sequence learning block is likewise connected to a dropout layer, followed by a fully connected (FC) layer that produces the final output.
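The architecture described in Step 3.2 (three Conv1d layers with MaxPooling and ReLU between them, dropout, three LSTM layers of twenty units with attention between them, dropout, and a fully connected output) can be sketched in PyTorch as follows. Everything not stated in the text is an assumption: the channel count of 32, kernel size 3, pooling size 2, dropout rate 0.2, the class names, and in particular the attention scoring, which here is a learned linear score followed by a softmax over time steps because the patent's weight formula is not available.

```python
import torch
from torch import nn


class AttentionReweight(nn.Module):
    """Assumed attention: score each time step, softmax, rescale hidden states."""

    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, h):                          # h: (batch, steps, hidden)
        w = torch.softmax(self.score(h), dim=1)    # weights sum to 1 over the steps
        return h * w                               # reweighted feature sequence


class CnnLstmAttention(nn.Module):
    def __init__(self, n_features, hidden_size=20, dropout=0.2):
        super().__init__()
        # Three 1-D convolutions with MaxPooling and ReLU between consecutive ones.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.MaxPool1d(kernel_size=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1),
            nn.MaxPool1d(kernel_size=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1),
        )
        self.drop_in = nn.Dropout(dropout)         # between CNN block and LSTM block
        self.lstm1 = nn.LSTM(32, hidden_size, batch_first=True)
        self.att1 = AttentionReweight(hidden_size)
        self.lstm2 = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.att2 = AttentionReweight(hidden_size)
        self.lstm3 = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.drop_out = nn.Dropout(dropout)        # after the LSTM block
        self.fc = nn.Linear(hidden_size, 1)        # final load prediction

    def forward(self, x):                          # x: (batch, steps, n_features)
        z = self.cnn(x.transpose(1, 2))            # Conv1d expects (batch, channels, steps)
        z = self.drop_in(z.transpose(1, 2))        # back to (batch, steps', 32)
        z, _ = self.lstm1(z)                       # full hidden-state sequence
        z, _ = self.lstm2(self.att1(z))            # full hidden-state sequence
        z, _ = self.lstm3(self.att2(z))
        last = z[:, -1, :]                         # last time step of the final layer
        return self.fc(self.drop_out(last))


model = CnnLstmAttention(n_features=5)
out = model(torch.randn(10, 28, 5))                # 10 windows of 28 steps, 5 factors
print(out.shape)                                   # torch.Size([10, 1])
```

Separate `nn.LSTM` modules are used instead of one stacked LSTM so that the attention reweighting can be applied to the hidden-state sequence between layers.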

Step 3.3) Optimizer definition and training parameter settings. The training, validation, and test data are in the ratio 7:2:1, and the mean absolute error (MAE) is used as the loss function to monitor the validation loss. The optimizer is the adaptive optimizer Adam with an initial learning rate of 0.001; Adam adaptively updates the learning rate during training, which speeds up convergence and makes parameter tuning easier. In addition, the number of training epochs is set to 700, the initial loss value of each iteration is set to 0, and the batch size is set to 10.
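The training settings in Step 3.3 (MAE loss, Adam with learning rate 0.001, batch size 10, validation loss monitored every epoch) map onto a short PyTorch loop. The sketch below runs on random stand-in data with a placeholder linear model so that it is self-contained; the data, the model, the helper name `run_epoch`, and the 5-epoch run are assumptions for illustration, whereas the text specifies 700 epochs.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Random stand-in data: 100 samples, 5 input factors, one load target each.
features, targets = torch.randn(100, 5), torch.randn(100, 1)
train_loader = DataLoader(TensorDataset(features[:70], targets[:70]),
                          batch_size=10, shuffle=True)
val_loader = DataLoader(TensorDataset(features[70:90], targets[70:90]),
                        batch_size=10)

model = nn.Linear(5, 1)                        # placeholder for the CNN-LSTM model
criterion = nn.L1Loss()                        # mean absolute error (MAE)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)


def run_epoch(loader, train):
    """Run one pass over the loader and return the mean MAE per sample."""
    model.train(train)
    total, count = 0.0, 0                      # loss accumulator starts at 0
    with torch.set_grad_enabled(train):
        for x, y in loader:
            loss = criterion(model(x), y)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item() * len(x)
            count += len(x)
    return total / count


history = []
for epoch in range(5):                         # the text uses 700 epochs
    train_mae = run_epoch(train_loader, train=True)
    val_mae = run_epoch(val_loader, train=False)
    history.append((train_mae, val_mae))
print(history[-1])
```

Tracking the validation MAE per epoch is what allows the best checkpoint to be kept or training to be stopped early if the validation loss stops improving.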

Step 4: Power load forecasting mainly comprises ultra-short-term, short-term, and medium- and long-term forecasting. Figures 4, 5, and 6 show line charts comparing the ultra-short-term, short-term, and medium- and long-term forecasts, respectively, with the users' actual power load.

In Figure 4, (a) compares the predicted and actual commercial power load of a certain place on May Day 2019. Analysis of the actual and predicted values for that day shows that the predicted commercial load largely reflects actual consumption and can provide dispatch guidance for power plants. (b) compares the May Day commercial power load of the same place in 2018, 2019, and 2020; the comparison shows that the May Day 2020 commercial load was markedly affected by the epidemic relative to the two preceding years.

Figure 5 is a line chart comparing the short-term forecast with the actual commercial power load of a certain region. With temperature introduced as a factor, the commercial load of the region in 2019 is forecast; the predicted values largely reflect the actual load trend and also follow the temperature trend. An anomaly appears around February 5, because most businesses in the region were closed for the Spring Festival holiday.

Figure 6 is a line chart comparing the medium- and long-term forecast with the actual commercial power load of a certain region. (a) shows the weekly load forecast: comparing the actual and predicted weekly commercial load shows that the predicted values largely reflect the trend of the actual values. (b) shows the monthly load forecast: comparing the actual and predicted monthly commercial load shows that the predicted values likewise largely reflect the trend of the actual values.

The specific steps are as follows:

Taking the commercial power consumption of a certain city as an example, the power load forecasting module is designed and implemented mainly from the perspective of the power plant's generation department and forecasts the load along ultra-short-term, short-term, and medium- and long-term dimensions, so that the generation department can keep track of the city's daily, weekly, and monthly commercial consumption and dispatch power effectively. For example, the city's 2019 commercial load data are acquired at the per-moment, daily, weekly, and monthly levels, together with environmental data such as the daily maximum temperature, minimum temperature, average humidity, average wind force, and weather type. After preprocessing and standardization, these data are fed to the LSTM model, which outputs the predicted 2019 commercial load trends at the per-moment, daily, weekly, and monthly levels. Comparing the actual and predicted values shows the accuracy of the model's load forecasts, and comparing the trends of factors such as temperature with the consumption trend shows the influence of environmental factors on the load.

The present invention is mainly applied in the power sector: it can guide power plants in generation dispatch, assist power management departments in analyzing consumption and mining user consumption information, and support research and development departments in mining user consumption patterns, thereby improving the efficiency of power departments. Figure 7 shows the application scheme of the present invention. The grid load is affected by factors such as region, time, temperature, consumption type, and voltage type. The deep-learning-based multi-factor power load forecasting system designed here first preprocesses and standardizes the data using abnormal data detection based on the k-nearest neighbor (KNN) algorithm and the improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization. An improved CNN-LSTM power load forecasting model is then proposed: a CNN feature extraction module first learns the local features of the input data; the result is fed into the LSTM sequence learning model to extract sequence feature information; and a self-attention mechanism is introduced into the LSTM to learn the features of the LSTM hidden layers, extracting key features by assigning different attention weights so as to improve the final prediction accuracy. Finally, ultra-short-term, short-term, and medium- and long-term power load forecasting is carried out, supporting analysis of consumption before and after the epidemic, analysis of consumption distribution, and power user profiling. The present invention can be used by the power sector to forecast users' consumption trends and plan consumption by stage, can help dispatch the output of power plants, and can help guide the resumption of work and production and inter-industry correlation analysis.

Claims

1. A multi-factor power load prediction method based on deep learning is characterized by comprising the following steps:

step 1: acquiring power load data including different areas, years and power utilization categories and environmental influence data including temperature, humidity and wind power, and storing the data in a sqlite database;

step 2: cleaning and preprocessing the power load data and the environmental impact data, comprising: abnormal data detection based on a k-nearest neighbor (KNN) algorithm and an improved DBSCAN algorithm, autoregressive interpolation, and sequence data normalization;

step 3: introducing a self-attention mechanism, performing feature reconstruction on the hidden layer of the LSTM to realize end-to-end deployment, constructing an electrical load data prediction neural network model based on the improved CNN-LSTM, and using the electrical load data and the environmental impact data obtained by processing in step 2 as a training set and a test set;

the step 3 specifically includes:

step 3.1: for the loading and preparation of network model data:

loading a data set, dividing it into a training set, a verification set and a test set, constructing a Dataloader as a data reader for the training set and the test set respectively, computing on the data of one batch at a time during model training, loading the data into memory by the Dataloader according to the batch size, and shuffling the data of each batch to improve the robustness of the model training;

step 3.2: constructing an electrical load data prediction neural network model based on the improved CNN-LSTM:

the neural network model comprises a CNN feature learning module, an LSTM sequence learning module and an attention mechanism module:

1) The CNN feature learning module comprises three one-dimensional convolution layers, wherein a MaxPooling layer and a ReLU layer are added between two consecutive convolution layers; the features of the normalized power load data and the normalized environmental impact data are learned through the convolution operation and output as the feature map of the convolution layer; the MaxPooling layer is added to relieve the limitation on the invariance of the generated feature map, and the activation function ReLU enhances the capability of the model to learn complex structures;

2) The LSTM sequence learning module comprises three LSTM layers, each layer comprising twenty neurons; the first two LSTM layers output the complete sequence of the hidden state, and the last LSTM layer outputs the last time step of the hidden state;

the input of the LSTM layer is the influencing parameter $x_t$ at time $t$ and the predicted value $h_{t-1}$ of the previous moment, and the predicted value $h_t$ at time $t$ is obtained through the prediction function $F$; the function expression is as follows:

$$h_t = F(x_t, h_{t-1})$$

3) A self-attention mechanism module is added between each pair of adjacent layers among the three LSTM layers and assigns weights to the hidden-layer features extracted by the LSTM layers, so as to mine more discriminative features of the power load data and the environmental impact data; the hidden-layer output sequence of the penultimate layer of the LSTM network at time $t$ is assigned the self-attention weight $w_{tl}$, whose expression is as follows:

[Formula not reproduced in the source text.]

in the formula, $L_h$ is the sequence length of the LSTM hidden-layer output, $l$ denotes the index of the LSTM hidden-layer output sequence, and $s_{tl}$ represents the similarity between sequence $l$ of the LSTM hidden layer at time $t$ and the other sequences;

the feature $h_{tl}$ of sequence $l$ is multiplied by its corresponding weight $w_{tl}$ to form a new feature sequence $h'_{tl}$, which is input into the next LSTM;

step 3.3: defining an optimizer, setting model training parameters:

monitoring verification loss by using the average absolute error as a loss function, and setting a self-adaptive optimizer Adam to adaptively update the learning rate in the training process; setting training iteration times epoch, an initial loss function and the size of batch;

step 4: predicting the electric load through the improved CNN-LSTM electric load data prediction neural network model;

the electric load data comprises a user number, a region to which the user belongs, an urban/rural network, a user classification, an electricity utilization type, a client type, a voltage grade, an industry type, a contract capacity, an operation capacity, a first power transmission date, a frozen electric quantity date and a frozen electric quantity of 15 minutes; the environmental impact data comprises the current date, the region, the highest temperature, the lowest temperature, the average humidity, the average wind power, the weather type, the current temperature of 15 minutes and whether the day is a holiday or not; and the time of the environmental impact data is synchronized with the time of the electrical load data.

2. The method for predicting the electrical load with the multiple factors based on the deep learning according to claim 1, wherein in the step 2, the abnormal data detection based on the k-neighbor algorithm and the improved DBSCAN algorithm is specifically as follows:

step 2.1: defining the average distance between a sample and an adjacent sample as the abnormal score of the sample, obtaining the abnormal score of the electricity load data at each moment by using a modified KNN algorithm, and taking the total distance from each cluster to each k adjacent samples as the final abnormal score;

the set $N_k(c_i)$ of the $k$ neighbors of the electricity load data $c_i$ at a certain time $i$ is expressed as:

[Formula not reproduced in the source text.]

in the formula, the first symbol (not reproduced) denotes one of the $k$ neighboring points of $c_i$, $d_k(c_i)$ is the average distance of the $k$ neighbors of $c_i$, and the second symbol (not reproduced) denotes the distance between $c_i$ and its neighboring point;

the abnormal score of the electrical load data $c_i$ is expressed as:

[Formula not reproduced in the source text.]

in the formula, $N_k(c_i)$ denotes the set of the $k$ neighboring points of $c_i$;

finally, outputting the first m clusters of the abnormal score ranking list as abnormal values of the electric load data;

step 2.2: adopting an improved clustering-based anomaly detection algorithm DBSCAN, firstly using local parameters to perform density clustering on small samples of data, then iteratively clustering the local clustering results to obtain the final global clustering result, and marking points which do not belong to any cluster as anomaly points; the method specifically comprises the following steps:

step a) updating parameters:

setting the size $M$ of the clustering sliding window, calculating the average distance difference of the electric load data in the window, setting the first $k$ neighboring electric loads as MinPts, and setting the Euclidean distance between the electric load data as Eps;

setting a weight for each load datum to reduce its impact on the final clustering result, the weight $w(c_i, c_j)$ being calculated as follows:

[Formula not reproduced in the source text.]

wherein $\mathrm{Cov}(c_i, c_j)$ is the covariance of the electrical load data $c_i$ at time $i$ and the electrical load data $c_j$ at time $j$, $\mathrm{Var}(c_i)$ is the variance of the electrical load data $c_i$ at time $i$, and $\mathrm{Var}(c_j)$ is the variance of the electrical load data $c_j$ at time $j$; $r(c_i, c_j)$ denotes the correlation coefficient of $c_i$ and $c_j$, and the smaller the value, the greater the correlation;

step b) obtaining an abnormal detection result of the electric load data by analyzing the clustering result:

in the clustering process, setting a point marked as abnormal for the first time as a candidate abnormal point, incrementing its abnormal score by 1, entering the next candidate abnormal point in the cyclic iterative clustering process, and updating the abnormal score; if the abnormal score $S$ is equal to the cluster number $C$, the point is marked as an abnormal point.

3. The method for predicting the multi-factor electrical load based on the deep learning of claim 1, wherein in the step 2, the autoregressive interpolation is specifically as follows:

completing the missing electric load data values by Lagrange interpolation, such that a polynomial of degree $n-1$, $y = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$, passes through the $n$ points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$; the function expression of the Lagrange interpolation is then expressed as:

$$f(x) = \sum_{i=1}^{n} y_i \prod_{j=1,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$

in the formula, $x_i$ and $x_j$ denote the $i$-th and $j$-th moments of the power user, respectively, $y_i$ denotes the power load of the power user at the $i$-th moment, and $n$ denotes the total number of moments of the electrical load.

4. The method for predicting the electrical load according to claim 1, wherein in the step 2, the processing formula of the normalization of the sequence data is as follows:

[Formula not reproduced in the source text.]

wherein $X = \{X_1, X_2, X_3, \ldots, X_N\}$ denotes electrical load data or environmental impact data of the same category; the normalized values are $Y = \{Y_1, Y_2, Y_3, \ldots, Y_N\}$, $l \in [1, N]$, and $N$ is the total number of electrical load data or environmental impact data of that category.

5. The method for predicting the multi-factor electric load based on the deep learning of claim 1, wherein in the step 4, the prediction of the electric load comprises: ultra-short-term power load prediction, short-term power load prediction, and medium- and long-term power load prediction.